How to diagnose resource bottleneck from the status and wait type of a request

     Summary: First run sys.dm_exec_requests (sys.dm_exec_connectionssys.dm_exec_sessions) to understand the status and the wait type of each connection.   Then, identity the resources in bottleneck based on the wait type.

1.     sys.sysprocesses (SQL Server 2000 and earlier)
a.  What is it?
Contains information about processes that are running on an instance of SQL Server. These processes can be client processes or system processes. To access sysprocesses, you must be in the master database context, or you must use the master.dbo.sysprocesses three-part name.

b.  How to use it?
SELECT  [spid] ,
        [kpid] ,
        [blocked] ,
        [waittype] ,
        [waittime] ,
        [lastwaittype] ,
        [waitresource] ,
        [dbid] ,
        [uid] ,
        [cpu] ,
        [physical_io] ,
        [memusage] ,
        [login_time] ,
        [last_batch] ,
        [open_tran] ,
        [status] ,
        [host_name] ,
        [program_name] ,
        [hostprocess] ,
        [cmd] ,
        [net_library] ,
FROM    sys.[sysprocesses]
WHERE   [spid] > 50

c.      Limitations
-   Deprecated. It is replaced with three DMVs in SQL Server 2005+: sys.dm_exec_connections sys.dm_exec_sessions, and sys.dm_exec_requests

-   Does not support MARS

In SQL Server 2000 and earlier, database applications could not maintain multiple active statements on a connection. When using SQL Server default result sets, the application had to process or cancel all result sets from one batch before it could execute any other batch on that connection. SQL Server 2005 introduced a new connection attribute that allows applications to have more than one pending request per connection, and in particular, to have more than one active default result set per connection.

2.     Background info about connection, session, and request
a.  The connection is the physical communication channel between SQL Server and the application: the TCP socket, the named pipe, the shared memory region.

b. The session in SQL Server corresponds to the Wikipedia definition of a session: a semi-permanent container of state for an information exchange. In other words the sessions stores settings like cache of your login information, current transaction isolation level, session level SET values etc etc.

Normally there is one session on each connection, but there could be multiple session on a single connection (Multiple Active Result Sets, MARS) and there are sessions that have no connection (SSB activated procedures, system sessions). There are also connections w/o sessions, namely connections used for non-TDS purposes, like database mirroring sys.dm_db_mirroring_connections or Service Broker connections sys.dm_broker_connections.

c.  An execution request in SQL Server means SQL Server is processing a user or system command.

The Common user command types include the following:
·        SELECT
·        INSERT
·        UPDATE
·        DELETE
·        BACKUP LOG
·        DBCC
·        FOR

Internal system processes set the command based on the type of task they perform. Tasks can include the following:
·        WRITER

a.  What is it?
Returns information about the connections established to this instance of SQL Server and the details of each connection (since startup of the server)

b.  How to use it?
--Find all of the connections to the server since its startup
SELECT * FROM sys.dm_exec_connections

a.  What is it?
Returns one row per authenticated session on SQL Server (also since its startup). sys.dm_exec_sessions is a server-scope view that shows information about all active user connections and internal tasks. Use sys.dm_exec_sessions to first view the current system load and to identify a session of interest, and then learn more information about that session by using other dynamic management views or dynamic management functions.

b.  How to use it?
--Example 1: Finding users that are connected to the server and the number of sessions for each user.
SELECT login_name ,COUNT(session_id) AS session_count
FROM sys.dm_exec_sessions
GROUP BY login_name;

--Example 2: Finding long-running cursors, who created the cursors, and what session the cursors are on.
USE master;
SELECT creation_time ,cursor_id
    ,name ,c.session_id ,login_name
FROM sys.dm_exec_cursors(0) AS c
JOIN sys.dm_exec_sessions AS s
   ON c.session_id = s.session_id
WHERE DATEDIFF(mi, c.creation_time, GETDATE()) > 5;

-- Example 3: Finding idle sessions that have open transactions
FROM sys.dm_exec_sessions AS s
    SELECT *
    FROM sys.dm_tran_session_transactions AS t
    WHERE t.session_id = s.session_id
    SELECT *
    FROM sys.dm_exec_requests AS r
    WHERE r.session_id = s.session_id
--Example 4: Finding information about a query's own connection
    c.session_id, c.net_transport, c.encrypt_option,
    c.auth_scheme, s.host_name, s.program_name,
    s.client_interface_name, s.login_name, s.nt_domain,
    s.nt_user_name, s.original_login_name, c.connect_time,
FROM sys.dm_exec_connections AS c
JOIN sys.dm_exec_sessions AS s
    ON c.session_id = s.session_id
WHERE c.session_id = @@SPID;

a.  What is it?
     Returns information about each request that is executing within SQL Server (instant, not accumulative).

b.  How to use it?
--Example 1: Finding the query text for a running batch.

--First, get the SQL_Handle 0x020000005097B03AC054C5C0E9C91CEE079B942A54BBDA7D
SELECT * FROM sys.dm_exec_requests;

--Then, use the sql_handle with system function sys.dm_exec_sql_text to obtain the statement text
SELECT * FROM sys.dm_exec_sql_text(0x020000005097B03AC054C5C0E9C91CEE079B942A54BBDA7D);

--Example 2: Finding all locks that a running batch is holding

-- First find the interesting batch and copy its transaction_id from the output: 368020.
SELECT * FROM sys.dm_exec_requests;

--Then, to find lock information, use the transaction_id with the system function sys.dm_tran_locks.
SELECT * FROM sys.dm_tran_locks
WHERE request_owner_type = N'TRANSACTION'
    AND request_owner_id = 368020;

--Example 3Finding all currently blocked requests
SELECT session_id ,status ,blocking_session_id
    ,wait_type ,wait_time ,wait_resource
FROM sys.dm_exec_requests
WHERE status = N'suspended';

--Example 4: Join the three DMVs to return similar data as sysprocesses
 FROM sys.[dm_exec_requests] r
 RIGHT OUTER JOIN sys.[dm_exec_sessions] s
 ON r.[session_id]=s.[session_id]
 RIGHT OUTER JOIN sys.[dm_exec_connections] c
 ON s.[session_id]=c.[session_id]
 WHERE s.[session_id]>50

6.     sys.dm_os_wait_stats
a.     What’s for?
      Returns information about all the waits encountered by threads that executed. You can use this aggregated view to diagnose performance issues with SQL Server and also with specific queries and batches.

   Column name
   Name of the wait type. For more information, see Types of Waits, later in this topic.
   Number of waits on this wait type. This counter is incremented at the start of each wait.
   Total wait time for this wait type in milliseconds. This time is inclusive of signal_wait_time_ms.
   Maximum wait time on this wait type.
   Difference between the time that the waiting thread was signaled and when it started running.

It shows the time for waits that have completed. It does not show current waits.

The contents of this dynamic management view can be reset by running the following command:

DBCC SQLPERF ('sys.dm_os_wait_stats', CLEAR);

b.     How to use it?
                               SELECT * FROM sys.[dm_os_wait_stats]
c.      The common wait types
                                          i.     LCK_XX –blocking is the most popular wait type. It starts with LCK_

                                        ii.     PAGEIOLATCH_XX - blocking is the buffer pool. PAGEIOLATCH_SH and PAGEIOLATCH_EX are the two most common ones.
-        PAGEIOLATCH_SH:shared buffer page io latch: a bottleneck in memory
-        PAGEIOLATCH_EX:exclusive buffer page io latch: a bottleneck in Disk IO
-        PAGEIOLATCH_DT:destroy buffer page io latch
-        PAGEIOLATCH_KP:keep buffer page io latch
-        PAGEIOLATCH_NL:null buffer page io latch
-        PAGEIOLATCH_UP:update buffer page io latch

                                      iii.     Writelog- waiting for writing the log records in memory to log file: a bottleneck in Disk I/O

                                      iv.     PAGELATCH_XX – blocking on hot pages when inserting data in the buffer pool, it’s nothing to do with IO.

In order to make the page less hot, or writing to multiple pages simultaneously. two possible ways to reduce PAGELATCH_XX:
1.     Create the clustered index on other column, rather than on the identity column
2.     Partition the table based on another table column.

                                        ii.     PAGELATCH_XX can also occur on tempdb, resulting in potential bottleneck. The solution to reduce PAGELATCH_XX in tempdb is:
1.     Create multiple data files for temdb, =number of CPUs
2.     The same file size
3.     Provide sufficient initial size for the tempdb files, and do not let tempdb auto-grow happening, which would be on one file only.

                                        v.     LATCH_XX – in addition to PAGELATCH_XX for synchronizing buffer pages, SQL Server also needs to synchronize other resources. If there are many latches, check
1.     SQL Server running well?
2.     Any exceptions or errors occurred? - Checking access violation in the error log
3.     Side effects of other bottlenecks?
4.     Upgrade

                                      vi.     ASYNC_NETWORK_IO (called NETWORK_IO in SQL 2000) – SQL Server returns faster than the application accepts, output cache is full. Solutions:
1.     Do you really need to return the massive data from SQL? Can the application program handle the massive data?
2.     Why the client application acknowledge so slow? – Any bottlenecks in CPU/IO/Memory?

                                    vii.     CMEMTHREAD, SOS_RESERVEDMEMBLOCKLIST, RESOURCE_SEMAPHORE_QUERY_COMPILE – memory-related waiting types.
1.     If you see these, check the memory bottleneck first.

                                  viii.     SQLTrace_XX – caused by SQL Trace or Profiler
1.     Stop SQL Trace or Profiler if possible

7.     Troubleshooting too many tasks are runnable
a.       If SQL Server runs well, the number of running and runnable SPIDs should not be too high. If high, check to see if

b.      Is CPU near 100%? If yes,
                                     i.     Optimize the most expensive query or application
                                   ii.     Add more CPUs

c.      If CPU is less than 50%, it is likely caused by spinlock (A spinlock is like a latch a lightweight synchronization object used by the storage engine to synchronize thread access to shared data structures. The main difference to a latch is that you actively wait for the spinlock – without leaving the CPU. A “wait” on a spinlock always happens on the CPU, in the RUNNING state. )
                                     i.     SQL Server 2005 64-bit likely has the MUTEX spinlock problem if caching too many execution plan security contexts. Run DBCC freesystemcache(tokenandpermuserstore)
                                   ii.     Use trace flag /T 4618 and /T 4610 to start SQL, which makes SQL uses different cache management mechanism.
                                 iii.     The problem rarely happens in 2008+

8.     The Life Cycle of a SQL request and Wait Type
a.       Stage 1 - The client send requests to SQL Server, requests transmit over the network, SQL Server receives the requests.
                                     i.     No wait types
                                   ii.     But the performance is influenced by the commands in the request (number and complexity of commands) and network speed. Sometimes,
1.     Put commands into batch can help
2.     Put SQL Server and applications on the same machine can help

b.      Stage 2 – SQL Server checks the commands syntactically, compiles them, generates execution plans or finds the cached ones. The possible bottlenecks and wait types are:
                                     i.     CPU Bottlenecks: high CPU %, busy SQLOS scheduler
                                   ii.     Memory Bottlenecks: cmemthread, sos_reservedmemblocklist, resource_semaphore_query_compile (if not reporting Error 701), spinlock due to searching for the cached execution plan
                                 iii.     Schema lock in the compilation/recompile process – Bottleneck: LCK_M_

c.       Stage 3 – Execute the commands
                                     i.     Step 1 – SQL Server request memory for the commands
1.     Wait type: resource_semaphore_xx if the new commands cannot get the requested memory.
                                   ii.     Step 2 - If the data are not cached, SQL Server needs to get the data from disk to memory. If not enough free pages in cache buffer, SQL Server needs to manage the buffer and do paging to create enough space for the data.
1.     Wait type: PAGEIOLATCH_xx
                                 iii.     Step 3 – find the records based on the execution plan
1.     Wait type: LCK_XX
                                 iv.     Step 4 – union or computation (sum, max, sort, etc.)
1.     Bottleneck: high CPU %
                                   v.     Step 5 - SQL Server may use tempdb for temp tables, table variables, or other objects.
1.     Bottleneck: tempdb
                                 vi.     Step 6 – If the command is to modify data, SQL Server will change the page data in the buffer pool.
1.     Wait type: PageLatch_XX
                               vii.     Step 7 – If the command is to modify data, SQL Server needs to record the changes in the log file first before committing the transaction.
1.     Wait type: writelog
                             viii.      Step 8- return the results to the client
1.     Wait type: ASYNC_NETWORK_IO

d.      All of the actions depends on the availability of a worker or thread and a scheduler in a CPU.
                                     i.     If no worker or thread available, the task has to wait. The wait status is Ox46UMSTHREAD.
                                   ii.     If worker is available but the scheduler is not available, the status of the request is runnable
                                 iii.     If scheduler is available now, the status is running. If the task is CPU expensive, it will run a while on the CPU with a high %