Troubleshooting a “Forever” Recovering Database

1.         Step 1 – check the ErrorLog to see the progress, the remaining seconds, and the spid.

2.         Step 2 - estimate the recovering progress by running the script below every several minutes to see if the usages of the diskI/O and CPU have increased (or you can also find similar info from the errorlog). If yes, it is in progress, but just takes a while.

SELECT * FROM sys.sysprocesses
SELECT * FROM sys.dm_exec_requests 

3.         Step 3 – find out the number of VLFs. The larger the VLFs, the longer the recovering
-- If the database is online
DBCC LogInfo (DBName) 
 -- If the database is offline, you can use the Windbg tool to analyze the dump file
http://msdn.microsoft.com/en-us/windows/hardware/hh852365.aspx

4.         Step 4 – If no changes on I/O and CPU for a long time, consider to restart SQL Server. In some case, you may need to install the patches. For details, see

a.         FIX: Slow performance when you recover a database if there are many VLFs inside the transaction log in SQL Server 2005, in SQL Server 2008 or in SQL Server 2008 R2 (https://support.microsoft.com/kb/2455009)

b.         FIX: It takes a long time to restore a database in SQL Server 2008 R2 or in SQL Server 2008 or in SQL 2012 (https://support.microsoft.com/kb/2653893)

c.          Procedures for using the patches above: (1) Stop SQL Server in production, (2) install the patch on test, (3) copy the servervr.exe and SQLOS.dll to production, and (4) restart the SQL Server in production.

5.         Sometime, when you try to kill a spid, but the spid keeps rolling back, you can run the following to see how long it will be done. If it takes days to finish the roll back, you can either wait or restart SQL Server. Remember, after the restart, SQL Server will still be in recovering state (rolling back) for a long time if the VLFs are too large.

 KILL <spid> WITH statusonly

6.         Best practices to prevent a long time ‘recovering’
a.         Log file – auto-growth by 200-300 MB, do not use %
b.         Frequently backup log files to reduce VLFs, or
c.          Shrink the log files
d.         Install the latest updates and patches