Troubleshooting Scheduler and CPU in SQL Server


Summary:

 

Due to the introduction of SQLOS, CPU often is not a trouble maker for SQL Server performances. But problems do occur. The typical problems are:

1.   A long-running task occupies the CPU and makes the server hang, you will find Errors 17883, 17884, and 17888 in the error log. 

2.   High % of CPU usage. In this case, you need to identify the trouble maker. 

Some problems can be solved by upgrading the hardware, some are side effects of memory and disk I/O problems. But, more than often, the problems are caused by bad database designs, bad queries, or inappropriate use of queries. 

 

1.  For SQL Server professionals, we first need to separate OLTP from OLAP (Data Warehouse), then use different strategies to prevent/lessen/solve the problems which make CPU unhappy.

1)            For OLTP:

·       Use small-size indexes for the tables requiring many modifications.

·       Decrease the CPU usage % by reusing the execution plans and joining fewer tables with a flatten design 

·       Optimize indexes.

·       Use small transaction sizes

·       Avoid parallel execution of OLTP queries

2)            For OLAP:

·       You can build as many indexes as you need.

·       For complex queries, recompiling is more efficient than re-using the non-optimal existing plans

·       Consider adding index if necessary.

·       If large scans are inevitable, use index rebuild/reorganization to reduce fragmentation.

·       Parallel execution are often good for OLAP queries             

==================================================================================================

2.  SQLOS (see more on http://blogs.technet.com/b/josebda/archive/2009/03/30/sql-server-2008-sqlos.aspx)

1)            Key components

2.1.1            Scheduler

2.1.2            Memory management

2.1.3            Error and Exception handling

2.1.4            Deadlock detection and resolution

2.1.5            Run 3rd party codes (Dll, Extended SP etc.)

2)              Why use it?

2.2.1            Only the running tasks are assigned with threads, idle connections are ignored

2.2.2            Sharply reduce the context switch in Windows

2.2.3            Very helpful, 1000-2000 simultaneous developers only need 100-200 threads.

3)            Basics

2.3.1            What is it?

i.      User mode layer between SQL Server and the Windows OS.

ii.    Use Thread mode or fiber mode. Default is thread.

2.3.2            Scheduler

i.      One scheduler on a SQL Server, corresponds to one logical CPU

ii.    One time, only one worker can run. Other workers must wait.

2.3.3            Worker

i.      A worker = A thread or fiber, the execution unit.

ii.    Scheduler schedule workers not threads or fibers

iii.  When a worker runs, it will try to finish the entire task unless it yields

iv.  When the worker is idle over 15 minutes, scheduler will delete the worker and the thread/fiber to release the resources

v.    There is a max limit of threads (workers) depending on the number of CPU and 32 vs. 64 bit, such as 256 on a 32-bit, <=4 processors machine.

2.3.4            Task

i.      The smallest execution unit such as a batch

ii.    When it runs, it executes or waits if it is blocked.

2.3.5            Yielding – yielding to other workers

i.      All workers are non-preemptive (vs. preemptive in Windows)

ii.    SQL Server defines many rules for yielding to refrain the running time of a task in the scheduler.

iii.  If for some reason, a task runs over 60s and does not yield, SQL Server Health Monitor will create a mini-dump and a 17833, 17884, or 17888 error in error logs.

·       17883 – a single scheduler has trouble, check the abnormal number of user mode, kernel mode, system idle%, or process utilization% etc. But often it is cause by system IO.

·       17884 – all schedulers have threads waiting or blocked

·       17888 – over 50% scheduler have threads waiting or blocked

iv.  Check the connection status of thread, workers, schedulers etc.

 

--了解scheduler和用户连接所使用的thread的情况UMS = User Mode Scheduling

DBCC SQLPERF(umsstats)

SELECT * FROM sys.sysprocesses WHERE kpid<>0

 

--More details

SELECT * FROM sys.dm_os_schedulers

SELECT * FROM sys.dm_os_workers

SELECT * FROM sys.dm_os_threads

SELECT * FROM sys.dm_os_tasks

SELECT * FROM sys.dm_os_waiting_tasks

SELECT * FROM sys.dm_os_ring_buffers

 

2.3.6            Overall architecture

http://3.bp.blogspot.com/-3K_KiAzzxQU/VBKNFSJwdCI/AAAAAAAAAbk/jBYSfMblWkM/s1600/11.jpg

 

对于每个CPUSQL都会有一个scheduler与之对应。在每个scheduler里,会有若干个worker,对应于每个线程。在客户端发过来请求之后,SQL会将其分解成一个或多个task。根据每个scheduler的繁忙程度,task会被分配到某个scheduler上。如果scheduler里有空闲的workertask就会被分配到某个worker上。如果没有,scheduler会创建新的worker,供task使用。如果scheduler里的worker已经到了他的上限值,而他们都有task要运行,那么新的task只好进入等待worker的状态.

 

3.  SQL Server CPU 100%

1)            Often not a problem, much less memory and disk, typically is low, 60-70% is very high to SQL

2)            When CPU is used?

3.2.1            Compile and recompile

3.2.2            Sort and aggregation

3.2.3            Join

3.2.4            Decompression

3)              How to set CPU in SQL Server (use sp_configure)

3.3.1            Priority boost – not recommended as it may mess up windows

3.3.2            Affinity mask

3.3.3            Lightweight pooling – if on, fiber instead of thread is used, complicated but will reduce the number of context switch. Not suggested to change

3.3.4            Max degree of parallelism = CPU number

3.3.5            Cost Threshold of Parallelism – default to 5, rarely change it

3.3.6            Max workers threads – default to 0, auto decide. Theoretically, 256 on a 32-bit 4CU machine and 512 on a 64-bit 4CPU machine

3.3.7            Bottom line: often caused by workloads, little can be done in SQL Server to improve it. So need to find who caused the high %CPU and anyway to redesign the db.

4)            Troubleshooting

3.4.1            How much is CPU% on the server, how much is from SQL Server? – Check the counters in performance monitor

i.      Entire server

·       Processor%Processor Time: 该计数器可以监视CPU执行的非闲置线程所用的时间。持续80%90%的状态可能表明需要升级CPU或需要增加更多的处理器。

·       Processor%Privileged TimeKernel Mode):对应于处理器执行Microsoft Windows内核命令(如处理SQL Server I/O请求)所用时间百分比。如果Physical Disk计数器的值很高时,该计数器的值也一直很高,则考虑安装速度更快或效率更高的磁盘子系统。

·       Processor%User TimeUser Mode):对应于处理器执行用户进程(例如 SQL Server)所用时间的百分比。

·       SystemProcessor queue length:对应于等待处理器时间的线程数。当一个进程的线程需要的处理器循环数超过可获得的循环数时,就产生了处理器瓶颈。
        
如果有很多进程在争用处理器时间,可能需要安装一个速度更快的处理器。如果使用的是多处理器系统,则可以增加一个处理器。

·       Context switches/sec:指计算机上的所有处理器全都从一个线程转换到另一个线程的综合速率。
        
当正在运行的线程自动放弃处理器时出现上下文转换,由一个有更高优先就绪的线程占先或在用户模式和特权 (内核模式之间转换以使用执行或分系统服务。
        
如果此计数器的数值较大,则表明锁定竞争很激烈,或者线程在用户和内核模式之间频繁切换。 

ii.    SQL Server Process

·       Processor%Processor Tim

·       Processor%Privileged TimeKernel Mode

·       Processor%User TimeUser Mode

3.4.2            Any 17883/17884 or Access Violation errors? – Check the errorlog

3.4.3            Find the most expensive queries and optimize them

i.      SQL Trace is the better, but negative impacts making it impossible to use

ii.    SQL 2005+, use the following DMV, or standard reports

 

--SQL2005 DMV: top 50 queries utilizing CPU since startup

SELECT

    highest_cpu_queries.*,

    highest_cpu_queries.total_worker_time,

    DB_NAME(q.dbidAS dbname,

    q.[text] AS qtext

FROM (SELECT TOP 50

    qs.*

FROM sys.dm_exec_query_stats qs

ORDER BY qs.total_worker_time DESC)

AS highest_cpu_queries

CROSS APPLY sys.dm_exec_sql_text(plan_handle) AS q

ORDER BY highest_cpu_queries.total_worker_time DESC

 

--find the top 25 recomplied usps

SELECT TOP 25

    sql_text.text AS sqltext,

    sql_handle AS sqlhandle,

    plan_generation_num AS plangenerationnum,

    execution_count AS execcount,

    DB_NAME(dbidAS dbname,

    objectid AS objectid

FROM sys.dm_exec_query_stats a

CROSS APPLY sys.dm_exec_sql_text(sql_handleAS sql_text

WHERE plan_generation_num > 1

ORDER BY plan_generation_num DESC

 

iii.  ReadTrace – see http://www.sqlskills.com/blogs/erin/rml-utilities-and-sql-server-2012/

iv.  After finding the problematic queries, the key now is to optimize the database and queries

v.    If you cannot change or re-design, migrate some databases to other servers and upgrade the hardware.

 

4.  How to Improve Performance of an OLTP system?

1)            Database Design?

4.1.1            The frequent run queries have 4+ table join.

i.      Find them

 

--Return the most popular top 100 queries

SELECT TOP 100

        cp.cacheobjtype,

        cp.usecounts,

        cp.size_in_bytes,

        qs.statement_start_offset,

        qs.statement_end_offset,

        qt.dbid,

        qt.objectid,

        SUBSTRING(qt.text, qs.statement_start_offset / 2, CASE

               WHEN qs.statement_end_offset = -1 THEN LEN(CONVERT(nvarchar(MAX), qt.text)) * 2 ELSE qs.statement_end_offset

        END - qs.statement_start_offset / 2) AS statement

FROM sys.dm_exec_query_stats qs

CROSS APPLY sys.dm_exec_sql_text(qs.sql_handleAS qt

INNER JOIN sys.dm_exec_cached_plans AS cp

        ON qs.plan_handle = cp.plan_handle

WHERE cp.plan_handle = qs.plan_handle

AND cp.usecounts > 4

ORDER BY dbid, usecounts DESC

 

ii.    Recommendation: make the tables less normalized.

 

4.1.2            More than 3 indexes are frequently updated

i.      Find them.

--Find the top 100 indexes frequently updated

--From databaseid,objectid,indexidpartitionnumber, determine whhich index on which table

 

SELECT TOP 100 DB_NAME(database_id),*

FROM sys.dm_db_index_operational_stats(NULL,NULL,NULL,null)

ORDER BY leaf_insert_count+leaf_delete_count+leaf_update_count DESC

 

ii.    Suggestions: really needed?

 

4.1.3            A large number of table scans and range scans

i.      Find them: high value of SQL Server: Access Methods – Full Scans/sec and Range Scans/sec, and run the query below:

--return the top 50 queries with the most I/O and their execution plans

SELECT TOP 50

        total_logical_reads / execution_count AS avg_logical_read,

        total_logical_writes / execution_count AS avg_logical_write,

        total_physical_reads / execution_count AS avg_phys_read,

        execution_count AS execcount,

        statement_start_offset AS stmt_start_offset,

        statement_end_offset AS stmt_end_offset,

        SUBSTRING(sql_text.text, (statement_start_offset / 2),

        CASE

               WHEN (statement_end_offset - statement_start_offset) / 2 <= 0 THEN 64000 ELSE (statement_end_offset - statement_start_offset) / 2

        ENDAS exec_statment,

        sql_text.text AS text,

        plan_text.*

FROM sys.dm_exec_query_stats

CROSS APPLY sys.dm_exec_sql_text(sql_handleAS sql_text

CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS plan_text

ORDER BY (total_logical_reads + total_logical_writes) / execution_count DESC

 

ii.    Add appropriate indexes

 

4.1.4            Indexes not used

i.      Find them: indexes not in sys.dm_db_index_usage_stats

ii.    Get rid of them

 

2)            CPU Threshold in the OLTP

4.2.1            Signal waits should be <25%

SELECT CONVERT(NUMERIC(5,4),SUM(signal_wait_time_ms)/SUM(wait_time_ms))

FROM sys.dm_os_wait_stats

4.2.2              Re-use rate of execution plans in OLTP should be >95%, computed from SQL Server Statistics

Initial compilation/sec = SQL Compilation/sec – SQL Re-compilation/sec

Re-use rate of execution plans = (Batch request/sec – Initial Compilation/sec)/Batch requests/sec

4.2.3              The % of Cxpacket/wait in parallel execution should be <5% - parallel execution in OLTP is bad

--Calculate % of expacket/wait

DECLARE @cxpacket BIGINT

DECLARE @sumwait BIGINT

 

SELECT @cxpacket=wait_time_ms

FROM sys.dm_os_wait_stats

WHERE wait_type='cxpacket'

 

SELECT @sumwait=SUM(wait_time_ms)

FROM sys.dm_os_wait_stats

 

SELECT CONVERT(NUMERIC(5,4),@cxpacket/@sumwait)

 

3)            Memory in the OLTP

4.3.1              Page life expectancy should be >300s - Check SQLServer: Buffer Manager, SQLServer: Buffer Nodes

4.3.2              Page life expectancy should not decrease over 50% - Check SQLServer: Buffer Manager

4.3.3              Memory Grants Pending should be <=1 - Check SQL Server Memory Manager: Memory Grants Pending

4.3.4              SQL Cache hit ratio should >90 – Check SQLServer: Plan Cache

4)              I/O in the OLTP

4.4.1              Average disk sec/read should be <20ms, often 4-8 ms, - Check Physical Disk

4.4.2              Average disk sec/write should be <20ms, often 1 ms,for log writing - Check Physical Disk

4.4.3              Big IOs Table Scans and Range Scans should be <1 – Check SQLServer:Access Method: Full Scans/sec and Range Scans/sec if they are high, adding proper indexes

4.4.4              These (asynch_io_completion , io_completion, logmgr, writelog, and pageiolatch_x) should not be in the top 2 list when running the following query:

SELECT TOP 2 wait_type FROM sys.dm_os_wait_stats ORDER BY wait_time_ms DESC

5)              Blocking in the OLTP

4.5.1            % of blocking should be <2%, run the following script to check

DECLARE @dbid INT

SELECT

        @dbid = DB_ID()

SELECT

        dbid = database_id,

        objectname = OBJECT_NAME(s.object_id),

        indexname = i.name,

        i.index_id,

        partition_number,

        row_lock_count,

        row_lock_wait_count,

        [block%] = CAST(100 * row_lock_wait_count / (1 + row_lock_count) AS numeric(15, 2)),

        row_lock_wait_in_ms,

        [avg row lock waits in ms] = CAST(1 * row_lock_wait_in_ms / (1 + row_lock_wait_count) AS numeric(15, 2))

FROM    sys.dm_db_index_operational_stats(@dbid, NULL, NULL, NULL) AS s,

               sys.indexes AS i

WHERE OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1

AND i.object_id = s.object_id

AND i.index_id = s.index_id

ORDER BY row_lock_wait_count DESC

4.5.2            Automatically report blocking over 30s in SQL Trace, - Check sp_configure ‘blocked process threshold’

4.5.3            The average block time should be <100ms, run the script as above

4.5.4            The top 2 blocks should not have LCK_M_?? when running the following script:

SELECT TOP 2 wait_type FROM sys.dm_os_wait_stats ORDER BY wait_time_ms DESC

4.5.5              Deadlock should be less than 5 every hour. Turn on Trace flag 1204 or check SQL Trace

6)            Networking in the OLTP

4.6.1            Output queue length should be <2 – check Network Interface

4.6.2            Packets Outbound Discarded, Packets Outbound Errors, Packets Received Discarded, Packets Received Errors – check Network Interface.

5.  How to Improve Performance of an OLAP system?

1)            Database Design

5.1.1            More indexes are fine as most of DMLs are read only, covered index is recommended.

5.1.2            Reduce the fragmentation <25% by using rebuilding indexes

USE master --改为你要扫描索引碎片的那个数据库

DECLARE @dbid INT

SELECT

    @dbid = DB_ID()

SELECT

    *

FROM sys.dm_db_index_physical_stats(@dbid, NULL, NULL, NULL, NULL)

WHERE avg_fragmentation_in_percent > 25

ORDER BY avg_fragmentation_in_percent DESC

 

5.1.3            Add proper indexes

--Find the possible missing indexes in the current database

SELECT

    d.*,

    s.avg_total_user_cost AS avgtotalusercost,

    s.avg_user_impact AS avguserimpact,

    s.last_user_seek AS lastuserseek,

    s.unique_compiles AS uniquecompile

FROM sys.dm_db_missing_index_groups g

INNER JOIN sys.dm_db_missing_index_group_stats s

    ON s.group_handle = g.index_group_handle

INNER JOIN sys.dm_db_missing_index_details d

    ON g.index_handle = d.index_handle

WHERE s.group_handle = g.index_group_handle

AND d.index_handle = g.index_handle

ORDER BY s.avg_user_impact DESC

 

--推荐建立索引的字段

 DECLARE @handle INT

SELECT @handle=d.index_handle

FROM sys.dm_db_missing_index_group_stats s,

 sys.dm_db_missing_index_groups g,

 sys.dm_db_missing_index_details d

 WHERE s.group_handle=g.index_group_handle

 AND d.index_handle=g.index_handle

 SELECT * FROM sys.dm_db_missing_index_columns(@handle)

 ORDER BY column_id

 

5.1.4            Do not add useless indexes

 

2)            CPU Threshold in the OLAP

5.2.1              Signal waits should be <25%

SELECT CONVERT(NUMERIC(5,4),SUM(signal_wait_time_ms)/SUM(wait_time_ms))

FROM sys.dm_os_wait_stats

5.2.2              Re-use rate of execution plans in OLTP should be <25% (precise execution plan is much more important than compile/recompile), computed from SQL Server Statistics

Initial compilation/sec = SQL Compilation/sec – SQL Re-compilation/sec

Re-use rate of execution plans = (Batch request/sec – Initial Compilation/sec)/Batch requests/sec

5.2.3              The % of Cxpacket/wait in parallel execution should be >10% - parallel execution in OLAP is popular

--Calculate % of expacket/wait

DECLARE @cxpacket BIGINT

DECLARE @sumwait BIGINT

 

SELECT @cxpacket=wait_time_ms

FROM sys.dm_os_wait_stats

WHERE wait_type='cxpacket'

 

SELECT @sumwait=SUM(wait_time_ms)

FROM sys.dm_os_wait_stats

 

SELECT CONVERT(NUMERIC(5,4),@cxpacket/@sumwait)

 

3)            Memory in the OLTP

5.3.1              Page life expectancy should not decrease over 50% (data paging is normal in OLAP, but often decrease indicates memory problem, try to see if you can improve by optimizing indexes) - Check SQLServer: Buffer Manager

5.3.2              Memory Grants Pending should be <=1 - Check SQL Server Memory Manager: Memory Grants Pending

4)              I/O in the OLAP – IO is more than in OLTP, but much IO is still a problem.

5.4.1            Average disk sec/read should be <20ms, often 4-8 ms, - Check Physical Disk

5.4.2            Average disk sec/write should be <20ms, often 1 ms,for log writing - Check Physical Disk

5.4.3            Big IO Scans should be <1 (large scan indicates missing indexes) – Check SQLServer:Access Method: Full Scans/sec and Range Scans/sec if they are high, adding proper indexes

5.4.4            These (asynch_io_completion , io_completion, logmgr, writelogand pageiolatch_x) should not be in the top 2 list when running the following query:

SELECT TOP 2 wait_type FROM sys.dm_os_wait_stats ORDER BY wait_time_ms DESC

5.4.5            Locks in the OLAP

i.      % of locking should be <2%, run the following script to check

DECLARE @dbid INT

SELECT

        @dbid = DB_ID()

SELECT

        dbid = database_id,

        objectname = OBJECT_NAME(s.object_id),

        indexname = i.name,

        i.index_id,

        partition_number,

        row_lock_count,

        row_lock_wait_count,

        [block%] = CAST(100 * row_lock_wait_count / (1 + row_lock_count) AS numeric(15, 2)),

        row_lock_wait_in_ms,

        [avg row lock waits in ms] = CAST(1 * row_lock_wait_in_ms / (1 + row_lock_wait_count) AS numeric(15, 2))

FROM    sys.dm_db_index_operational_stats(@dbid, NULL, NULL, NULL) AS s,

               sys.indexes AS i

WHERE OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1

AND i.object_id = s.object_id

AND i.index_id = s.index_id

ORDER BY row_lock_wait_count DESC

ii.    Automatically report locks over 30s in SQL Trace, - Check sp_configure ‘blocked process threshold’

iii.  The average lock time should be <100ms, run the script as above

iv.  The top 2 locks should not have LCK_M_?? when running the following script:

SELECT TOP 2 wait_type FROM sys.dm_os_wait_stats ORDER BY wait_time_ms DESC