Trouble-shooting SQL I/O Problems

Summary: If you have faced I/O related error, you basically have three options: (1) Replacing the damaged hardware - often the best solution, (2) Restoring the backup - the cleanest solution if hardware errors can be excluded, and (3) Using DBCC CHECKDB - a quick-and-dirty fix.

 

Scenario: If you have faced repeatedly occurring database problems, most likely it is caused by the I/O subsystem (e.g., Windows, controller, drives, cables, disk, RAM, and many other hardware problems). SQL Server itself rarely is the root cause of the database problem.

 

1.  How SQL Server writes data?

The I/O from an instance of the Database Engine includes logical and physical writes. A logical write occurs when data is modified in a page in the buffer cache. A physical write occurs when the page is written from the buffer cache to disk.

 

When a page is modified in the buffer cache, it is not immediately written back to disk; instead, the page is marked as dirty. This means that a page can have more than one logical write made before it is physically written to disk. For each logical write, a transaction log record is inserted in the log cache that records the modification. The log records must be written to disk before the associated dirty page is removed from the buffer cache and written to disk. SQL Server uses a technique known as write-ahead logging that prevents writing a dirty page before the associated log record is written to disk. This is essential to the correct working of the recovery manager.

 

The following illustration shows the process for writing a modified data page.

 

 

https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjmWO4Laywjyh-VKqpBA9y_2IXvwAcUunkr9T_2KUVIEOjzdzAF8wb_xG1SWs-oeL7TQRGweB_n6uurxh7axCQfgFLQ2K004taZRWYh8PXRu5_6ntJoLWap6mcWD-JJhbj1bNPcoRfbY_c/s1600/writing+a+modified+page.gif

 

When the buffer manager writes a page, it searches for adjacent dirty pages that can be included in a single gather-write operation. Adjacent pages have consecutive page IDs and are from the same file; the pages do not have to be contiguous in memory. The search continues both forward and backward until one of the following events occurs:

·        A clean page is found.

·        32 pages have been found.

·        A dirty page is found whose log sequence number (LSN) has not yet been flushed in the log.

·        A page is found that cannot be immediately latched.

In this way, the entire set of pages can be written to disk with a single gather-write operation.

 

Just before a page is written, the form of page protection specified in the database is added to the page. If torn page protection is added, the page must be latched EX(clusively) for the I/O. This is because the torn page protection modifies the page, making it unsuitable for any other thread to read. If checksum page protection is added, or the database uses no page protection, the page is latched with an UP(date) latch for the I/O. This latch prevents anyone else from modifying the page during the write, but still allows readers to use it.

 

A dirty page is written to disk in one of three ways.

·        Lazy writing

The lazy writer is a system process that keeps free buffers available by removing infrequently used pages from the buffer cache. Dirty pages are first written to disk.

·        Eager writing

The eager write process writes dirty data pages associated with nonlogged operations such as bulk insert and select into. This process allows creating and writing new pages to take place in parallel. That is, the calling operation does not have to wait until the entire operation finishes before writing the pages to disk.

·        Checkpoint

The checkpoint process periodically scans the buffer cache for buffers with pages from a specified database and writes all dirty pages to disk. Checkpoints save time during a later recovery by creating a point at which all dirty pages are guaranteed to have been written to disk. The user may request a checkpoint operation by using the CHECKPOINT command, or the Database Engine may generate automatic checkpoints based on the amount of log space used and time elapsed since the last checkpoint. In addition, a checkpoint is generated when certain activities occur. For example, when a data or log file is added or removed from a database, or when the instance of SQL Server is stopped.

 

The lazy writing, eager writing, and checkpoint processes do not wait for the I/O operation to complete. They always use asynchronous (or overlapped) I/O and continue with other work, checking for I/O success later. This allows SQL Server to maximize both CPU and I/O resources for the appropriate tasks.

 

2.  Troubleshooting Some Common Errors related to SQL Server I/O

2.1.               Error: 823

2.1.1.                   Symptom – a physical consistency problem has occurred

 

2010-03-06 22:41:19.55 spid58      Error: 823, Severity: 24, State: 2.

2010-03-06 22:41:19.55 spid58      The operating system returned error 38(Reached the end of the file.) to SQL Server during a read at offset 0x000000a72c0000 in file 'C:\Program Files\Microsoft SQL Server\MSSQL10.SQL2008\MSSQL\DATA\my_db.mdf'. Additional messages in the SQL Server error log and system event log may provide more detail. This is a severe system-level error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

 

2009-12-09 04:53:49.170 spid25s      Error: 823, Severity: 24, State: 2. 
2009-12-09 04:53:49.170 spid25s      The operating system returned error 21(The device is not ready.) to SQL Server during a read at offset 0x00000000132000 in file 'D:\tempdb\tempdb_data.mdf'. Additional messages in the SQL Server error log and system event log may provide more detail. This is a severe system-level error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online. 

 

2.1.2.                   Cause

 

SQL Server uses Windows API's [e.g. ReadFile, WriteFile, ReadFileScatter, WriteFileGather] to perform the I/O operations. After performing these I/O operations, SQL Server checks for any error conditions associated with these API calls. If these API calls fail with an Operating System error, then SQL Server reports Error 823.

 

The 823 error message contains the following information:

·       The database file against which the I/O operation was performed

·       The offset within the file where the I/O operation was attempted

·       Is the I/O operation a read or write request

·       The Operating System Error code and Error Description

 

The 823 error message usually indicates that there is a problem with underlying storage system or the hardware or a driver that is in the path of the I/O request. You can encounter this error when there are inconsistencies in the file system or if the database file is damaged.

 

From the mechanism of error generation, the Error 823 occurs when read/write a page. It is nothing to do with the content of the page. So Error 823 is less likely caused by SQL Server. Usually a physical file corruption causes this error, but may also be caused by a device driver. If Error 823 repeatedly occurs on a data file, often it signifies a problem with the hardware device, or the data file is very serious damaged. This basically means that the some valid data has been lost, generally DBCCCHECKDB is hard to repair the problems.

(从错误产生的机制可以看出,823错误是发出一个页面读写请求时发生的,和读写的内容没有关系。所以823错误和SQL Server本身无关。通常是物理文件损坏导致此错误,但也可能是设备驱动程序导致的。如果某个数据文件上反复出现823错误,要不就是硬件设备出了问题,要不就是数据文件已经发生了非常严重的损坏。这个错误基本上意味着数据页里的有效数据已经丢失,一般DBCCCHECKDB很难修复。)

 

2.1.3.                   Solution

 

·       Review the suspect_pages table in msdb to check if other pages [in the same database or different databases] are encountering this problem.

·       Check the consistency of the databases that are located in the same volume [as the one reported in the 823 message] using DBCC CHECKDB command. If you find inconsistencies from the DBCC CHECKDB command, please use the guidance from Knowledge Base article How to troubleshoot database consistency errors reported by DBCC CHECKB

·       Review the Windows Event logs for any errors or messages reported from the Operating System or a Storage Device or a Device Driver. If they are related to this error in some manner, please address those errors first. For example, apart from the 823 message, you may also notice an event like "The driver detected a controller error on \Device\Harddisk4\DR4" reported by the Disk source in the Event Log. In that case, you have to evaluate if this file is present on this device and then first correct those disk errors.

·       Use the SQLIOSim utility to find out if these 823 errors can be reproduced outside of regular SQL Server I/O requests. Note the SQLIOSim tool ships with SQL Server 2008 so there is no need for a separate download on this version and later.

·       Work with your hardware vendor or device manufacturer to ensure

o   The hardware devices and the configuration confirms to the I/O requirements of SQL Server

o   The device drivers and other supporting software components of all devices in the I/O path are updated

·       If the hardware vendor or device manufacturer provided you with any diagnostic utilities, please use them to evaluate the health of the I/O system

·       Evaluate if there are Filter Drivers that exist in the path of these I/O requests that encounter problems.

o   Check if there are any updates to these filter drivers

o   Can these filter drivers be removed or disabled to observe if the problem that results in the 823 error goes away

·       Check to make sure you are not running into any known issue that can result in a 823 error to be reported:

o   FIX: The Database Mirroring session is disconnected when you manually perform a failover in SQL Server 2008

o   FIX: Error message when you run the CHECKPOINT statement after you restore a database that has a read-only user-defined filegroup in SQL Server 2005: "The operating system returned error 5(Access is denied.) to SQL Server"

 

2.2.               Error: 824

2.2.1.                   Symptom – a logical consistency error

 

You might encounter the following error message in the SQL Server error log or the Windows Application event log if a logical consistency check fails after reading or writing a database page:

 

2009-11-02 15:46:42.90 spid51      Error: 824, Severity: 24, State: 2.

2009-11-02 15:46:42.90 spid51      SQL Server detected a logical consistency-based I/O error: incorrect pageid (expected 1:43686; actual 0:0). It occurred during a read of page (1:43686) in database ID 23 at offset 0x0000001554c000 in file 'H:\MSSQL.SQL2008\MSSQL\DATA\my_db.mdf'.  Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

 

2.2.2.                   Cause

 

SQL Server uses Windows API's [e.g. ReadFile, WriteFile, ReadFileScatter, WriteFileGather] to perform the I/O operations. After performing these I/O operations, SQL Server checks for any error conditions associated with these API calls. If these API calls fail with an Operating System error, then SQL Server reports the Error 823. There can be situations where the Windows API call actually succeeds but the data transferred by the I/O operation might have encountered a logical consistency problem. These logical consistency problems are reported through Error 824.

 

The 824 error contains the following information:

·       The database file against which the I/O operation is performed

·       The offset with the file where the I/O operation was attempted

·       The database to which this file belongs

·       The page number which was involved in the I/O operation

·       Was the operation a read or write operation

·       Details about the logical consistency check that failed [The type of check, actual value and expected value used for this check]

 

These logical consistency checks are additional integrity checks performed by SQL Server to ensure certain key aspects of the data that was involved in the data transfer was maintained through out the I/O Operation. The checks include checksum, Torn Page, Short Transfer, Bad Page Id, Stale Read, Page Audit Failure. The nature of the checks performed vary depending on different configuration options at the database and server level. 

 

The 824 error message usually indicates that there is a problem with underlying storage system or the hardware or a driver that is in the path of the I/O request. You can encounter this error when there are inconsistencies in the file system or if the database file is damaged.

 

As it can be seen from the above description, although Error 824 is a "logic error" on data corruption detected by SQL Server, the sources of damage, mostly are not SQL Server itself. The main problem is that the expected writing is not completely done. Therefore, the reason for the Error 824, is basically in the I/O subsystem. Because SQL Server read/write requests are sent to Windows, and then Windows distributes the requests to the underlying disk system, so the problem may occur at each layer below Windows such as a disk drive failure, a problematic disk firmware, and device drivers incorrect and so on. We are pretty sure that SQL Server itself does not cause the 824 errors.

 

As 824 error occurs as a logic error on a page, DBCC CHECKDB can be used to repair the errors in many cases. However, this repair is only for the logical error. For the data lost on the page before the Error 824, SQL Server cannot get them back. Thus, Error 824 typically indicates part of the data has been lost.

 

(从上面的介绍可以看出,824虽然是一个逻辑错误,是SQL Server主动发现的数据损坏,但是损坏的来源,大都不是SQL Server自己。这里的错误,主要是由于预期的写入没有完全完成导致的。所以824错误的原因,基本上还是在I/O子系统。由于SQL Server的读写请求是发给Windows,再由Windows发给底层的磁盘系统的,所以问题有可能发生在Windows以下的每一层,例如磁盘驱动器存在故障、磁盘固件存在问题、设备驱动程序不正确等。可以负责任地说,SQL Server自己是不会导致824错误的

 

由于824错误是发生在页面这一级的逻辑错误,所以很多时候DBCC CHECKDB能够修复。但是这种修复也仅仅是逻辑上的,页面里面存储的数据在824错误发生之前就已经丢失,SQL Server无法将它们修复回来。所以824错误基本也意味着部分数据丢失)

 

2.2.3.                   Solution

 

·       Review the suspect_pages table in msdb to check if other pages [in the same database or different databases] are encountering this problem.

·       Check the consistency of the databases that are located in the same volume [as the one reported in the 824 message] using DBCC CHECKDB command. If you find inconsistencies from the DBCC CHECKDB command, please use the guidance from Knowledge Base article How to troubleshoot database consistency errors reported by DBCC CHECKDB.

·       If the database that encounters these 824 errors does not have the PAGE_VERIFY CHECKSUM database option turned on, please do so immediately. 824 errors can occur for other reasons than a checksum failure but CHECKSUM provides the best option to verify consistency of the page after it has been written to disk.

·       Review the Windows Event logs for any errors or messages reported from the Operating System or a Storage Device or a Device Driver. If they are related to this error in some manner, please address those errors first. For example, apart from the 824 message, you may also notice an event like "The driver detected a controller error on \Device\Harddisk4\DR4" reported by the Disk source in the Event Log. In that case, you have to evaluate if this file is present on this device and then first correct those disk errors.

·       Use the SQLIOSim utility to find out if these 824 errors can be reproduced outside of regular SQL Server I/O requests. Note that SQLIOSim ships with SQL Server 2008 so there is no need for a separate download on this version or later.

·       Work with your hardware vendor or device manufacturer to ensure:

o   The hardware devices and the configuration confirms to the I/O requirements of SQL Server

o   The device drivers and other supporting software components of all devices in the I/O path are updated

·       If the hardware vendor or device manufacturer provided you with any diagnostic utilities, please use them to evaluate the health of the I/O system

·       Evaluate if there are Filter Drivers that exist in the path of these I/O requests that encounter problems.

o   Check if there are any updates to these filter drivers

o   Can these filter drivers be removed or disabled to observe if the problem that results in the 824 error goes away

 

2.3.               Error: 605

2.3.1.                   Symptom – a page or page allocation error

 

A query can fail with an error like the following if it needs to read a database page where the allocation unit as stored in the page header (also referred to as the actual allocation unit) does not match the expected allocation unit based on table(s) referenced in the query:

Msg 605, Level 21, State 3, Line 1
Attempt to fetch logical page (1:224) in database 11 failed. It belongs to allocation unit 72058253744865280 not to 72057594040287232

When the severity is Level 21, the user session is disconnected and the error is written into the SQL ERRORLOG and Windows Application Event Log as EventID=605. The severity could also be 12 which would only result in the error returned to the client application. Read below in the remaining portions of the article for more information about the severity 12 scenario.

In this message the first allocation unit listed (after the "belongs to...") is the actual allocation unit and the second allocation unilt (the one after "not to") is the expected allocation unit. An allocation unit is an ID that identifies which index and object a page belongs to.

 

2.3.2.                   Cause

 

Severity = 21

This problem could be an issue with the database page or a problem with the SQL Server engine when determining what the expected allocation unit should be. See the Resolution section for more details.

Severity = 12

The severity of this error can also be Level 12 if the query fails because it is using read uncommitted isolation level or the NOLOCK query hint (also known as a "dirty read").

 

严重级别为21表示可能存在数据损坏。造成的原因包括损坏的页链、损坏的IAM或该对象的sys.objects目录视图中存在无效条目。这些错误通常由硬件或磁盘设备驱动程序故障而引起

严重级别为12表示可能存在暂时性错误,即在缓存中出现错误,但不表示对磁盘上的数据造成破坏

由于605这个错误意味着一些页面分配出了问题,所以也是一个非常严重的数据库损坏。一般用DBCC CHECKDB也很难修复

 

2.3.3.                   Solution

 

Severity = 21

If DBCC CHECKDB or CHECKTABLE reports an error (which should be Msg 2533), then the page is damaged or an incorrect page. You should restore from a backup to resolve the problem. If you cannot restore from a backup, use the repair options with DBCC CHECKDB. To find the cause of this problem you should use techniques as you would with other database corruption errors including:

·       Check for hardware or system level problems (A tool such as SQLIOSIM can be helpful to do this. SQLIOSIM ships with SQL Server 2008 and SQL Server 2008 R2 and does not require a separate download)

·       Ensure you have the PAGE_VERIFY=CHECKSUM database option turned on. While it is possible to still encounter an Msg 605 error with checksum enabled (Ex. "scribbler" problem or incorrect page provided from I/O system), enabling checksum can provide definitive proof that the page was modified incorrectly within the I/O system.

·       Try to restore a database backup you know that will be "clean" (no errors from CHECKDB) and transaction log backups you know span the time when the error was encountered. If you can "replay" this problem by restoring a "clean" database backup and transaction logs then contact Microsoft Technical Support for assistance.

·       If DBCC CHECKDB does not report an error and the problem occurs frequently, you should contact Microsoft Technical Support for assistance. Be prepared to help find the query that is encountering the Msg 605 error. The More Information section talks about how to identify the query that has encountered this error.

 

Severity = 12

·       Avoid using a read uncommitted isolation level (or NOLOCK hint)

·       Be prepared to retry the query until you do not see the error

·       Avoid making changes to the affected table(s) while running the query.

2.4.               Other Errors

2.4.1.                   Error 8946 – Allocation page has invalid PFS_PAGE page header values.

2.4.2.                   Error 8966 – Could not read and latch page with latch type SH. Sysindexes failed.

2.4.3.                   Error 2570 – Column ‘C1” value is out of range for data type “datetime”.

2.4.4.                   Error 3854 – Meta-data error such as an object (e.g. a constraint) has a parent that has an invalid object type in a system table.

2.4.5.                   Solutions:

Try DBCC CHECKDB. Some errors can be fixed without data loss, some can be fixed with data loss, whereas others cannot be fixed even with data loss.

 

3.  How to Do a System Health Checking?

3.1.               Are there any unexpected shutdown or blue screen? – Check the Windows System Log

3.2.               Any hardware or device problem, especially related to the read/write? – Check Windows System Log

3.3.               The antivirus program should not scan the data or log files. Temporarily disable the antivirus problem to make sure it is not caused by the antivirus program.

3.4.               How to Do a Stress Test? – SQL Server is an I/O heavy program. The hardware must pass the stress test and keep the data in integrity when the power is off.

 

Test Goal

The Best Test Strategies

Transaction Safety: Commit

1)    Modify data in a table, commit the tran

2)    Take off the power cord

3)    Restart SQL Server

4)    Run DBCC CHECKDB, make sure the DB is not corrupted.

5)    Make sure the data was changed.

Transaction Safety: Rollback

1)    Insert a group of data into a table, commit the tran

2)    Begin transactions to modify data, but not commit them

3)    Run CHECKPOINT on each database

4)    Take off Power

5)    Restart SQL Server

6)    RUN DBCC CHECKDB to make sure the Database is not corrupted.

7)    Make sure the committed transaction is saved and the uncommitted ones were rolled back.

Backup Safety

1)    Make a full backup

2)    Make a log backup

3)    Do insert, update, and delete with a large volume of data

4)    While modifying, do another log backup

5)    After the log back up is done, immediately take off the power

6)    Restart SQL Server

7)    Run DBCC CHECKDB to make sure the database is not corrupted.

8)    Make sure you can do a STOPAT restore with these backups.

 

It is highly recommended to run SQLIOSim for every SQL Server, to make sure the I/O system is fine.

 

4.  How does SQL Server Detect/Fix IO errors?

 

4.1.               Checksum (default, heavy, thorough) and Torn_Page_Detection

 

You can change the setting:

 

ALTER DATABASE xxx PAGE_VERIFY(CHECKSUM |TORN_PAGE_DETECTION|NONE)

 

4.2.               Trace flags for detecting changes on the page in the buffer pool: 831, 815, 818.

 

DBCC TRACEON(815,-1)-- latch the pages in buffer for unexpected modification requests

DBCC TRACEON(831,-1)-- also performs a checksum check

DBCC TRACEON(818, -1)-- for diagnosing the possible stale read/lost write symptoms.

 

4.3.               Automatic Page Repair was introduced in SQL Server 2008 Database Mirroring.

 

Automatic page repair is supported by database mirroring and by AlwaysOn Availability Groups. After certain types of errors corrupt a page, making it unreadable, a database mirroring partner (principal or mirror) or an availability replica (primary or secondary) attempts to automatically recover the page. The partner/replica that cannot read the page requests a fresh copy of the page from its partner or from another replica. If this request succeeds, the unreadable page is replaced by the readable copy, and this usually resolves the error.

 

Note: Automatic page repair differs from DBCC repair. All of the data is preserved by an automatic page repair. In contrast, correcting errors by using the DBCC REPAIR_ALLOW_DATA_LOSS option might require that some pages, and therefore data, be deleted.

 

4.3.1.                   Error Types That Cause an Automatic Page-Repair Attempt

 

Database mirroring automatic page repair tries to repair only pages in a data file on which an operation has failed for one of the errors listed in the following table.

 

Error number

Description

Instances that cause automatic page-repair attempt

823

Action is taken only if the operating system performed a cyclic redundancy check (CRC) that failed on the data.

ERROR_CRC. The operating-system value for this error is 23.

824

Logical errors.

Logical data errors, such as torn write or bad page checksum.

829

A page has been marked as restore pending.

All.

 

4.3.2.                   Page Types That Cannot Be Automatically Repaired

 

Automatic page repair cannot repair the following control page types:

·       File header page (page ID 0).

·       Page 9 (the database boot page).

·       Allocation pages: Global Allocation Map (GAM) pages, Shared Global Allocation Map (SGAM) pages, and Page Free Space (PFS) pages.

 

4.3.3.                   Handling I/O Errors on the Principal/Primary Database

 

On the principal/primary database, automatic page repair is tried only when the database is in the SYNCHRONIZED state and the principal/primary is still sending log records for the database to the mirror/secondary. The basic sequence of actions in an automatic page-repair attempt are as follows:

1)     When a read error occurs on a data page in the principal/primary database, the principal/primary inserts a row in the suspect_pages table with the appropriate error status. For database mirroring, the principal then requests a copy of the page from the mirror. For AlwaysOn Availability Groups, the primary broadcasts the request to all the secondaries and gets the page from the first to respond. The request specifies the page ID and the LSN that is currently at the end of the flushed log. The page is marked as restore pending. This makes it inaccessible during the automatic page-repair attempt. Attempts to access this page during the repair attempt will fail with error 829 (restore pending).

2)     After receiving the page request, the mirror/secondary waits until it has redone the log up to the LSN specified in the request. Then, the mirror/secondary tries to access the page in its copy of the database. If the page can be accessed, the mirror/secondary sends the copy of the page to the principal/primary. Otherwise, the mirror/secondary returns an error to the principal/primary, and the automatic page-repair attempt fails.

3)     The principal/primary processes the response that contains the fresh copy of the page.

4)     After the automatic page-repair attempt fixes a suspect page, the page is marked in the suspect_pages table as restored (event_type = 5).

5)     If the page I/O error caused any deferred transactions, after you repair the page, the principal/primary tries to resolve those transactions.

 

4.3.4.                   Handling I/O Errors on the Mirror/Secondary Database

 

I/O errors on data pages that occur on the mirror/secondary database are handled in generally the same way by database mirroring and by AlwaysOn Availability Groups.

1)     With database mirroring, if the mirror encounters one or more page I/O errors when it redoes a log record, the mirroring session enters the SUSPENDED state. With AlwaysOn Availability Groups, if a secondary replica encounters one or more page I/O errors when it redoes a log record, the secondary database enters the SUSPENDED state. At that point, the mirror/secondary inserts a row in the suspect_pages table with the appropriate error status. The mirror/secondary then requests a copy of the page from the principal/primary.

2)     The principal/primary tries to access the page in its copy of the database. If the page can be accessed, the principal/primary sends the copy of page to the mirror/secondary.

3)     If the mirror/secondary receives copies of every page it has requested, the mirror/secondary tries to resume the mirroring session. If an automatic page-repair attempt fixes a suspect page, the page is marked in the suspect_pages table as restored (event_type = 4).

 

If a mirror/secondary does not receive a page that it requested from the principal/primary, the automatic page-repair attempt fails. With database mirroring, the mirroring session remains suspended. With AlwaysOn Availability Groups, the secondary database remains suspended. If the mirroring session or secondary database is resumed manually, the corrupted pages will be hit again during the synchronization phase.

 

4.3.5.                   Developer Best Practice

 

An automatic page repair is an asynchronous process that runs in the background. Therefore, a database operation that requests an unreadable page fails and returns the error code for whatever condition caused the failure. When developing an application for a mirrored database or an availability database, you should intercept exceptions for failed operations. If the SQL Server error code is 823, 824, or 829, you should retry the operation later. 

 

4.3.6.                   How To: View Automatic Page-Repair Attempts

 

The following dynamic management views return rows for the latest automatic page-repair attempts on a given availability database or mirrored database, with a maximum of 100 rows per database.

 

·       AlwaysOn Availability Groups:

sys.dm_hadr_auto_page_repair (Transact-SQL)

Returns a row for every automatic page-repair attempt on any availability database on an availability replica that is hosted for any availability group by the server instance.

 

·       Database mirroring:

sys.dm_db_mirroring_auto_page_repair (Transact-SQL)

Returns a row for every automatic page-repair attempt on any mirrored database on the server instance.