Troubleshooting Memory-related Issues in SQL Server

Summary:

·        Make sure Windows itself has sufficient memory. Check it from the performance monitor

·        Make sure the memory for SQL Server is not stole by Windows or other programs

·        Set up SQL Server correctly, such as enable AWE for a 32-bit machine with memory over 4GB, for 64-bit machine, enable Lock pages in memory for the service account.

·        Check the sum for Database pages, stolen memory and multi-pages from sys.dm_os_memory_clerks

·        Check the waittype from sys.sysprocesses and other object counters in performance monitor to see where is memory exactly: database pages, stolen memory, or multi-page?

·        Check errorlog and the results from sys.dm_os_memory_clerks to decide which part experienced pressure when it occurred: database pages, stolen memory, or multi-page?

·        If the pressure is on database pages, use DMV and SQL Server trace to decide which database, which application, and which query caused the problem?

·        If the pressure is on stolen memory, what's the limit for stolen memory? Which memory clerk causes the problem? Which requests? How about the connections? Why lazy writers did not solve the problem?

·        If the pressure is on stolen memory, what's the limit for multi-page? Is SQL Server or 3rd party codes requesting multi-pages? If it's SQL Server, which memory clerk? Can you use -g parameter to delay the problem?

1.  Basic concepts

1)  Two levels: Windows OS and SQLOS (resource management in SQL Server)

2)  The 3 key ones: memory, CPU, and Disk

3)  Memory-related concepts in Windows OS

(1)          Virtual address space

i.                On a 32-bit machine, the maximum of the address is 232, that is, 4294967269B, about 4GB.

ii.              Windows will decide when to use physical memory or paging file to store the data.

(2)          Physical memory – much faster than disk

(3)          Reserved memory – a chunk of memory in virtual address space, not in the physical memory

(4)          Committed memory: first reserve, then commit, after commitment, the committed pages become effective pages in physical memory. The key idea is to reduce the use of physical memory by delaying the committed pages.

(5)          Shared memory – visible to multiple threads

(6)          Private bytes – the non-shared part in the committed memory

(7)          Working set – the part stored in the physical memory

(8)          Page fault – A page fault occurs when you visit a virtual address space which is not in the working set of physical memory. Windows will examine and handle the error.

i.                Is it access violation?

ii.              Is it hard fault? – The target page is stored in the page file in physical disk.

iii.            Is it soft fault? – The target page is in physical memory, but has not been in working set yet.

(9)          System working set

i.                Windows system uses physical memory and working set.

ii.              Include system cache, page pool, non page pool, and system mapped views

(10)     Stack

(11)     In process

(12)     Out of process

(13)     Memory leaking – keep reserving and committing, never release the memory, even it is done.

i.                Case 1: SQL Server keep requesting memory from Windows, rarely happening

ii.              Case 2: A process in SQL Server keeps requesting memory from SQL Server, typically caused by inappropriate action from the client.

4)  SQL Server can have only a maximum of 2G in 32 bit

(1)          Default: Total 4G in 32-bit Windows, 2G for Kernel Mode, 2G for User Mode (SQL Server runs in user mode)

(2)          Enhancement 1: You can change boot.ini to change kernel part to 1GB, and increase the user part to 3GB.

(3)          Enhancement 2: use Address Windowing Extension (AWE) to increase the memory to 64GB for SQL Server.

i.                Sp_configure ‘AWE enabled’, 1

ii.              Restart SQL Server

iii.            The service account for SQL Server need to have the Lock pages in memory permission in Windows

iv.            You can check if AWE is abled or not from the error log

v.              Not every memory request in SQL Server utilizes AWE, only those in the reserve-first-then-commit will use AWE.

vi.            The virtual memory space in 64-bit for a user application such as SQL Server has increased to 8TB

2.  Memory check at the operating system level

1)  Use performance monitor

2)  Need to look at trends and interactions of the object counters, not individual indices.

3)  Three perspective

(1)          Overall usage analysis

i.                COMMITTED BYTES – Total memory used by all threads, including data in the physical memory and file paging.

ii.              Commit Limit – the maximum memory available. If committed bytes is close to commit limit and the paging file cannot be automatically grown, the system cannot provide additional memory anymore.

iii.            Available Mbytes – free physical memory. SQL Server often leaves little to the operating system. The higher, the better.

iv.            Page file: %usage and page file: % peak usage – the higher, the worse.

v.              Pages/sec: the total number of pages (read and write) from the disk for windows and SQL Server and other applications. The lower, the better

(2)          Memory usage by Windows itself

i.                <1G on a 32-bit machine, 1-2 GB on a 64-bit machine.

ii.              Counters

·       Memory: Cache Bytes = the working set for the Windows system = system cache resident bytes + system driver resident bytes + system code resident bytes + pool paged resident bytes

(3)          System pool

·       Check two counters - Memory: Pool nonpaged bytes and Memory: Pool Paged Resident Bytes

3.    Check the process

·       Need to check many counters under the process object for each process, especially the private bytes and working set, to find out which process uses the largest memory, why the memory usage keep growing, or when the sudden changes happen.

4.  Memory and other system resources

1)  Memory is the most important resource

2)  Memory and disk: a busy disk=>caused memory paging or by disk itself?

3)  Memory and CPU: high CPU time under the user mode indicates no bottleneck on memory or disk, if high CPU time in the kernel mode, it often means there is memory bottleneck, but the overall CPU usage is often is <50%.

5.  Recommendation on analyzing memory-related problems

1)  32-bit o 64-bit?

2)  Counters are related, do not make decisions based on individual numbers

3)  Almost start with memory analysis

4)  Do not forget the memory used by the operating system itself

5)  Tracking the counters for each process

(1)          Private bytes keep growing?

(2)          Working set keep growing?

(3)          Handle leak or thread leak?

(4)          Does the growth related to other problems?

6)  Think memory, CPU, and disk together

6.  How SQL Server share memory with Windows and other programs

1)  First, as an application program, SQL Server is constrained by the virtual address space (e.g., 2G on a 32-bit machine)

2)  Second, SQL Server likes to occupy as much as memory possible

3)  Third, SQL Server manages memory dynamically by changing the max server memory and target server memory

7.  How to prevent the sharp decrease of memory used by SQL Server?

1)  It may happen after Windows Server 2003

2)  Often caused by Windows, hardware drivers, or other applications

3)  Solutions:

(1)          Make the SQL Server start account has the Lock page in memory permission (gpedit.msc)

(2)          Sp_configure ‘max server memory’ to leave Windows OS enough memory

(3)          Update hardware drivers to reduce the possibility caused by hardware.

8.  How to allocate memory to SQL Server reasonably

1)  Leave OS and other programs enough memory

2)  Then assign as much as memory to SQL Server

(1)          Use 64-bit if possible

(2)          Use AWE, do not use /3GB in boot.ini for a 32-bit machine

(3)          Set Max Server Memory for SQL Server such 2.8GB for a 4-GB machine, 6GB for a 8GB machine, or 3-4GB to the OS for a machine with memory over 8GB. Also deduct the part for other programs if there are any

(4)          Give the start account ‘Lock in pages in memory’ permission if SQL Server Enterprise is used.

9.  How does SQL Server manage memory?

1)  Min server memory

2)  Max server memory

3)  Set working set size (do not use it now)

4)  AWE enabled

5)  Lock pages in memory

10.      Different types of memories used in SQL Server

1)  From the perspective of purpose of usage

(1)          Database cache – for database pages

(2)          Consumers: connections, query plans, optimizer, utilities such as BCP, Log manager, backup, and others

(3)          Thread – 512 KB for each thread

(4)          Third-party codes such as COM, XP, CLR

2)  From the perspective of request way

(1)          Reserve a large chunk of memory, then commit piece by piece – Database Cache works in this way, so AWE is applicable to database cache

(2)          Direct commit from virtual memory space to physical memory – called stolen, works for others such as consumers, threads, 3rd party codes etc., cannot use AWE

3)  From the perspective of request size

(1)          Buffer pool contains many single pages (for pages <=8K)

(2)          Multi-page or MemToLeave contains multiple page allocations (for pages >8k)

4)  The relationship

Type

Database cache

Consumer

3rd party code

threads

Way of Request

     Reserved/commit

Yes

Mostly No

Mostly No

No

     Stolen

No

Yes

Yes

Yes

Size of Request

     Buffer pool(single page)

Yes

Almost

No

No

     MemToLeave(multi-page)

No

Little

All

All

 

11.      Limits on various parts of memory

1)  Not every memory request can utilize the free memory in SQL Server, especially on a 32-bit machine

2)  Case 1 – 32-bit

·        4GB – 2GB (for kernel) = 2GB (for virtual memory space)è384MB for MemToLeave or Multi-page (default 256 +256 thread*0.5=384) + 1664MB(the max for buffer pool)

3)  Case 2 – 32-bit with AWE enabled:  still a max of 384MB for MemToLeave and a max of 1.6GB for stolen memory in the buffer pool, but database cache is extended by AWE now

4)  Case 3 – 32-bit SQL Server on a 64-bit Machine:  still a max of 384MB for MemToLeave and a max of 3.6GB for buffer pool

5)  Case 4 – 32-bit SQL Server on a 64-bit Machine with AWE enabled:  still a max of 384MB for MemToLeave and a max of 3.6GB for stolen memory, use the extended AWE for database cache

6)  Case 5 – 64-bit: no limit on multi-page (MemToLeave is not applicable anymore) or single page. They can use the available space in the VAS freely. Sometimes, they even steal the space in the data cache in the buffer pool.

12.      Implications: more memory does not always work!

1)  Where is the bottleneck? Database cache, stolen, buffer pool, or mem-to-leave?

2)  Have reached the limit? Will the newly added memory be used? For instance, on a 32-bit + AWE server, the max of stolen memory is 1.6GB, if this is not enough, you have to upgrade the machine to 64-bit. Adding memory does not help

3)  Typically the newly added memory is used by buffer pool. Does the buffer pool have the memory pressure? If no, it does not help. If SQL Server has cached too many dynamic SQL execution plans, it even has a reverse effect as it has to maintain the plans which rarely to be reused.

13.      How to examine the status of memory usage in SQL Server?

1)  SQL Server Object Counters

(1)          Memory Manager

The Memory Manager object in Microsoft SQL Server provides counters to monitor overall server memory usage. Monitoring overall server memory usage to gauge user activity and resource usage can help you to identify performance bottlenecks. Monitoring the memory used by an instance of SQL Server can help determine:

·               If bottlenecks exist from inadequate physical memory for storing frequently accessed data in cache. If memory is inadequate, SQL Server must retrieve the data from disk.

·               If query performance can be improved by adding more memory or by making more memory available to the data cache or SQL Server internal structures.

SQL Server Memory Manager counters

Description

Connection Memory (KB)

Specifies the total amount of dynamic memory the server is using for maintaining connections.

Database Cache Memory (KB)

Specifies the amount of memory the server is currently using for the database pages cache.

Free Memory (KB)

Specifies the amount of committed memory currently not used by the server.

Granted Workspace Memory (KB)

Specifies the total amount of memory currently granted to executing processes, such as hash, sort, bulk copy, and index creation operations.

Lock Blocks

Specifies the current number of lock blocks in use on the server (refreshed periodically). A lock block represents an individual locked resource, such as a table, page, or row.

Lock Blocks Allocated

Specifies the current number of allocated lock blocks. At server startup, the number of allocated lock blocks plus the number of allocated lock owner blocks depends on the SQL Server Locks configuration option. If more lock blocks are needed, the value increases.

Lock Memory (KB)

Specifies the total amount of dynamic memory the server is using for locks.

Lock Owner Blocks

Specifies the number of lock owner blocks currently in use on the server (refreshed periodically). A lock owner block represents the ownership of a lock on an object by an individual thread. Therefore, if three threads each have a shared (S) lock on a page, there will be three lock owner blocks.

Lock Owner Blocks Allocated

Specifies the current number of allocated lock owner blocks. At server startup, the number of allocated lock owner blocks and the number of allocated lock blocks depend on the SQL Server Locks configuration option. If more lock owner blocks are needed, the value increases dynamically.

Maximum Workspace Memory (KB) 

Indicates the maximum amount of memory available for executing processes, such as hash, sort, bulk copy, and index creation operations.

Memory Grants Outstanding

Specifies the total number of processes that have successfully acquired a workspace memory grant.

Memory Grants Pending

Specifies the total number of processes waiting for a workspace memory grant. – a signal of bottleneck

Optimizer Memory (KB)

Specifies the total amount of dynamic memory the server is using for query optimization.

Reserved Server Memory (KB)

Indicates the amount of memory the server has reserved for future usage. This counter shows the current unused amount of memory initially granted that is shown in Granted Workspace Memory (KB).

SQL Cache Memory (KB)

Specifies the total amount of dynamic memory the server is using for the dynamic SQL cache.

Stolen Server Memory (KB)

Specifies the amount of memory the server is using for purposes other than database pages.

Target Server Memory (KB)

Indicates the ideal amount of memory the server can consume.

Total Server Memory (KB)

Specifies the amount of memory the server has committed using the memory manager. – the size of buffer pool

 

(2)          Buffer Manager

The Buffer Manager object provides counters to monitor how SQL Server uses:

·       Memory to store data pages.

·       Counters to monitor the physical I/O as SQL Server reads and writes database pages.

·       Buffer pool extension to extend the buffer cache by using fast non-volatile storage such as solid-state drives (SSD).

Monitoring the memory and the counters used by SQL Server helps you determine:

·       If bottlenecks exist from inadequate physical memory. If it cannot store frequently accessed data in cache, SQL Server must retrieve the data from disk.

·       If query performance can be improved by adding more memory, or by making more memory available to the data cache or SQL Server internal structures.

·       How often SQL Server needs to read data from disk. Compared with other operations, such as memory access, physical I/O consumes a lot of time. Minimizing physical I/O can improve query performance.

 

SQL Server Buffer Manager counters

Description

Buffer cache hit ratio

Indicates the percentage of pages found in the buffer cache without having to read from disk. The ratio is the total number of cache hits divided by the total number of cache lookups over the last few thousand page accesses. After a long period of time, the ratio moves very little. Because reading from the cache is much less expensive than reading from disk, you want this ratio to be high. Generally, you can increase the buffer cache hit ratio by increasing the amount of memory available to SQL Server or by using the buffer pool extension feature.

Checkpoint pages/sec

Indicates the number of pages flushed to disk per second by a checkpoint or other operation that require all dirty pages to be flushed.

Database pages

Indicates the number of pages in the buffer pool with database content.

Extension allocated pages 

Total number of non-free cache pages in the buffer pool extension file.

Extension free pages

Total number of free cache pages in the buffer pool extension file.

Extension in use as percentage

Percentage of the buffer pool extension paging file occupied by buffer manager pages.

Extension outstanding IO counter

I/O queue length for the buffer pool extension file.

Extension page evictions/sec

Number of pages evicted from the buffer pool extension file per second.

Extension page reads/sec

Number of pages read from the buffer pool extension file per second.

Extension page unreferenced time

Average seconds a page will stay in the buffer pool extension without references to it.

Extension pages writes/sec

Number of pages written to the buffer pool extension file per second.

Free list stalls/sec

Indicates the number of requests per second that had to wait for a free page.

Lazy writes/sec

Indicates the number of buffers written per second by the buffer manager's lazy writer. The lazy writer is a system process that flushes out batches of dirty, aged buffers (buffers that contain changes that must be written back to disk before the buffer can be reused for a different page) and makes them available to user processes. The lazy writer eliminates the need to perform frequent checkpoints in order to create available buffers.

Page life expectancy

Indicates the number of seconds a page will stay in the buffer pool without references.

Page lookups/sec

Indicates the number of requests per second to find a page in the buffer pool.

Page reads/sec

Indicates the number of physical database page reads that are issued per second. This statistic displays the total number of physical page reads across all databases. Because physical I/O is expensive, you may be able to minimize the cost, either by using a larger data cache, intelligent indexes, and more efficient queries, or by changing the database design.

Page writes/sec

Indicates the number of physical database page writes that are issued per second.

Readahead pages/sec

Indicates the number of pages read per second in anticipation of use.

 

2)  DMV:

(1)          sys.dm_os_memory_clerk – shows how memory is currently allocated and used. SQL Server 2005+ uses memory clerk to manage SQL Server memory allocation and collection

 

--In SQL Server 2012, the following columns have been renamed.

--single_pages_kb==>pages_kb, the column multi_pages_kb has been removed.

-- the new pages_kb includes both the previous single_pages and Multi_pages.

SELECT

    a.type,

    SUM(a.virtual_memory_reserved_kbAS vm_reserved,

    SUM(a.virtual_memory_committed_kbAS vm_committed,

    SUM(a.awe_allocated_kbAS awe_allcoated,

    SUM(a.shared_memory_reserved_kbAS sm_reserved,

    SUM(a.shared_memory_committed_kbAS sm_commited,

    SUM(multi_page_kbAS multipage_allocatoer,

    SUM(single_pages_kbAS sinlgepage_allocator

FROM sys.dm_os_memory_clerks a

GROUP BY a.type

ORDER BY a.type

 

(2)          sys.dm_os_buffer_descriptors - Returns information about all the data pages that are currently in the SQL Server buffer pool. 

 

--Returning cached page count for each database

SELECT COUNT(*)AS cached_pages_count

    ,CASE database_id

        WHEN 32767 THEN 'ResourceDb'

        ELSE db_name(database_id)

        END AS database_name

FROM sys.dm_os_buffer_descriptors

GROUP BY DB_NAME(database_id) ,database_id

ORDER BY cached_pages_count DESC;

 

--Returning cached page count for each object in the current database

SELECT COUNT(*)AS cached_pages_count

    ,name ,index_id

FROM sys.dm_os_buffer_descriptors AS bd

    INNER JOIN

    (

        SELECT object_name(object_idAS name

            ,index_id ,allocation_unit_id

        FROM sys.allocation_units AS au

            INNER JOIN sys.partitions AS p

                ON au.container_id = p.hobt_id

                    AND (au.type = 1 OR au.type = 3)

        UNION ALL

        SELECT object_name(object_idAS name  

            ,index_idallocation_unit_id

        FROM sys.allocation_units AS au

            INNER JOIN sys.partitions AS p

                ON au.container_id = p.partition_id

                    AND au.type = 2

    ) AS obj

        ON bd.allocation_unit_id = obj.allocation_unit_id

WHERE database_id = DB_ID()

GROUP BY name, index_id

ORDER BY cached_pages_count DESC;

 

(3)          sys.dm_exec_cached_plans - Returns a row for each query plan that is cached by SQL Server for faster query execution. You can use this dynamic management view to find cached query plans, cached query text, the amount of memory taken by cached plans, and the reuse count of the cached plans.

 

SELECT usecounts, refcounts,size_in_bytes, cacheobjtype, objtype, text

FROM sys.dm_exec_cached_plans

CROSS APPLY sys.dm_exec_sql_text(plan_handle)

WHERE usecounts > 1

ORDER BY usecounts DESC;

GO

 

14.      Troubleshooting Buffer Pool

1)  The 2nd most popular problems in SQL Server performance, blocking is the 1st.

2)  Data paging symptoms

(1)          Many lazy writes frequently: SQL Server: Buffer Manager – Lazy writes/sec

(2)          Many page reads - SQL Server: Buffer Manager – Page reads/sec. Occasionally high is fine, but frequent high indicates problems.

(3)          Stolen in the buffer pool is low as the execution plan is often cleared. Check SQL Server: Buffer Manager –Stolen pages.  If there is a memory pressure on buffer pool, SQL Server clears the execution plans in the buffer pool first, making the stolen pages sharply decreases.

(4)          Page life expectancy is often not very high and often decreasing as the cached pages are often cleared. Check SQL Server: Buffer Manager – Page Life

(5)          Run sys.sysprocessed, you will see some sessions are suspended, waiting for SYNC_IO_COMPLETION

3)  Analysis of the reasons

(1)          Caused by Windows OS and other applications

i.                How to know?

a.     Check if SQL Server: Memory Manager – Total Server Member reduced

b.    Check if the Memory:Available Mbytes has decreased to a low value

c.     If AWE or Lock page in memory is not used, then check if Process: Private Bytes – sqlservr and Process: working set – sqlservr has sharply decreased

ii.              Solution

a.      Give SQL Server more memory: Set Max server memory

b.      Install SQL Server on dedicated server

(2)          Caused by usage of database page – the total server memory for SQL Server has reached Max Server Memory or SQL Server cannot get more memory from the OS, and the data pages are much larger than buffer pool.

i.                Symptoms

a.      SQL Server: Memory Manager – Total Server Memory = SQL Server: Memory Manager – Target Server Memory (different from case 1 caused by OS or other programs)

b.      The other symptoms are similar to those in case 1

ii.              Solutions

a.      Scale up – enable AWE or adding more memory

b.      Scale out – migrate some databases to other servers

c.      Re-design the tables (partition the tables), indexes, calling procedures to reduce the use of buffer pool.

d.      Find the queries using most of the memory

·       Sys.dm_exec_query_stats - Returns aggregate performance statistics for cached query plans in SQL Server. The view contains one row per query statement within the cached plan, and the lifetime of the rows are tied to the plan itself. But it has two serious problems: (a) Some query plans may be cleared from the cache due to the memory pressure, (b) the statistics are cumulative since the startup of SQL Server. We do not know when the big query was executed.

 

--find the top 50 physical read

 

SELECT TOP 50

        qs.total_physical_reads,

        qs.execution_count,

        qs.total_physical_reads / qs.execution_count AS [Avg Physical Read IO],

        qs.sql_handle,

        qs.plan_handle,

        qs.statement_start_offset,

        qs.statement_end_offset,

        qt.dbid,

        dbname = DB_NAME(qt.dbid),

        qt.objectid, --only procedure,trigger,view,or function have an id

        SUBSTRING(qt.text,qs.statement_start_offset / 2,

               (CASE

                 WHEN qs.statement_end_offset = -1 THEN LEN(CONVERT(nvarchar(MAX), qt.text)) * 2

                 ELSE qs.statement_end_offset

                END - qs.statement_start_offset) / 2 + 1

               ) AS query_text              

FROM sys.dm_exec_query_stats qs

CROSS APPLY sys.dm_exec_sql_text(qs.sql_handleAS qt

ORDER BY qs.total_physical_reads DESC

 

 

--find the top 50 logical reads

 

SELECT TOP 50

        qs.total_logical_reads,

        qs.execution_count,

        qs.total_logical_reads / qs.execution_count AS [Avg Logical Read IO],

        qs.sql_handle,

        qs.plan_handle,

        qs.statement_start_offset,

        qs.statement_end_offset,

        qt.dbid,

        dbname = DB_NAME(qt.dbid),

        qt.objectid,

        --这里必须要加1,因为有时statement_start_offset0,那么需要取的长度要加上1,否则取出来的就会少一个字符: select substring('abc',0,3) 会输出'ab',所以要写成:select substring('abc',0,3 + 1)

        SUBSTRING(qt.text,qs.statement_start_offset / 2,

        CASE

               WHEN qs.statement_end_offset = -1

                       THEN (DATALENGTH(CAST(qt.text AS nvarchar(MAX))) - qs.statement_start_offset) / 2 + 1

               ELSE (qs.statement_end_offset - qs.statement_start_offset) / 2 + 1

        END

        ) AS query_text

FROM sys.dm_exec_query_stats qs

CROSS APPLY sys.dm_exec_sql_text(qs.sql_handleAS qt

ORDER BY qs.total_logical_reads DESC

 

·       Use SQL Trace – save it to a trace file, use fn_trace_gettable to input it into a SQL Server table, and query the table for analysis

 

SELECT * INTO #SAMPLE

FROM sys.fn_trace_gettable('C:\Users\Administrator\Desktop\1.trc',DEFAULT)

WHERE EventClass IN(10,12) -- 10=RPC:Completed, often for usp, 12=SQL BatchCompleted

 

SELECT TOP 100 * FROM #SAMPLE

 

SELECT TOP 1000 TextData,DatabaseID,HostName,ApplicationName,LoginName,SPID,

StartTime,EndTime,Duration,reads,writes,CPU

FROM #SAMPLE

-- find which client application request the large reads

SELECT DatabaseID,HostName,ApplicationName,SUM(reads) AS reads

FROM #SAMPLE

GROUP BY DatabaseID,HostName,ApplicationName

ORDER BY SUM(reads) DESC

--find the top 1000 queries based on the number of reads descending order. Note, all reads in SQL trace is logical read, not physical reads.

SELECT TOP 1000 TextData,DatabaseID,HostName,ApplicationName,LoginName,

SPID,StartTime,EndTime,Duration,reads,writes,CPU

FROM #SAMPLE

ORDER BY Reads DESC

(3)          Caused by stolen memory

i.                Symptoms

·       Rarely happening as if there is a pressure on buffer pool, stolen should clear the execution plans. However, some users may forget to close the objects such as cursors, or un-prepare the execution plans. Those cannot be automatically cleared.

·       Use sys.dm_os_memory_clerks to find them(single_pages_kb)

ii.              Solution – find out who uses so high stolen memory and why

(4)          Caused by multi-page or MemToLeave

i.                How

·       In 32-bit case, VAS is only 2GB, MemToLeave is defaulted to 384MB as a max. But if you use the –g parameter to increase the max for MemToLeave.

·       In 64-bit case, no limit for multi pages anymore, but if memory leak happens when using 3rd party codes.

ii.              Solution

·       Find out who and why use so much MemToLeave memories

 

15.      Troubleshooting Stolen Memory – use sys.dm_os_memory_clerks to find if there is a pressure and use sys.sysprocess waittype to check the details of bottlebeck

1)  Basics

(1)          Also 8k pages in buffer pool, but commit directly

(2)          Do not store cached pages, but used for query analysis, optimization, and storage of execution plans, connection, ordering and calculations, context of user safety etc.

(3)          Some part in stolen are cached such as execution plans or context of user safety, thus, the more, the better. Some parts do not need caches, such as query analysis, optimization, ordering, or hashing. Thus, although stolen <2GB on 32-bit machine, it rarely causes problems

(4)          On a 64-bit machine, no limit on VAS, the limit for stolen memory is determined by the version of SQL Server such as

i.                SQL Server 2000: up to 4GB

ii.              SQL Server 2005 RTM and SP1:

·       Target memory <8GB, Stolen up to target memory*75%

·       Target memory: 8~64 GB, Stolen up to 8GB*75% + (target memory-8GB)*50%

·       Target memory >64GB, Stolen up to 8GB*75% +(64-8)*50%+ (target memory-64GB)*25%

iii.            SQL Server 2005 SP2+:

·       Target memory <8GB, Stolen up to target memory*75%

·       Target memory: 8~64 GB, Stolen up to 8GB*75% + (target memory-8GB)*10%

·       Target memory >64GB, Stolen up to 8GB*75% +(64-8)*10%+ (target memory-64GB)* 5%

iv.            Usually 1-2GB is sufficient. If the execution plans are not repeatedly used, it’s useless if cached. That’s why sometime it is better to routinely do

·       DBCC FreeProcCacheRemoves all elements from the plan cache.

·       DBCC DROPCLEANBUFFERS - Removes all clean buffers from the buffer pool, it is used to test queries with a cold buffer cache without shutting down and restarting the server.

2)  Relationship between stolen and buffer pool for data pages

(1)          Both stolen and cached data are in buffer pool.

(2)          Normally, stolen will not steal memory from database pages as stolen releases the memory after use. Even for the cached plans, SQL Server has the aging mechanism to release the oldest unused plans under memory pressure.

(3)          Only in rare cases, when the memory in stolen is always been used and cannot be released from stolen, in that case, the stolen memory becomes larger and larger, and finally competes memory with database pages.

3)  Memory pressure

(1)          From outside – when there is a memory pressure from OS or other programs, SQL Server needs to release memory form the buffer pool, including both database pages and stolen memory.

(2)          From SQL Server

i.                Expensive query with huge database pages that buffer pool cannot handle

ii.              Eaten up by un-released objects or behaviors in stolen such as un-closed cursors.

Diagnosis: from the result of sys.dm_os_memory_clerks, check sum(pages_kb) if there is stolen memory pressure.

4)  Symptoms of Stolen Memory Pressure

(1)          Requests cannot be completed, return error 701 etc. Check errorlog

(2)          Request will be done but slow

5)  Solution - Check the waiting status in sys.sysporcesses, the solution depends on the wait type

(1)          0x00B9 – too many simultaneous requests, change the user behaviors

i.                Use USPs or parameterized T-SQL

ii.              Reduce the use of query compile

iii.            Use existing plans if possible

iv.            Avoid to a huge request of query compilation simultaneously

(2)          0x007B – the queries contain many parameters or a long IN subquery, the execution plan is too large to be in a single page, and need to be in the multi-page or MemToLeave.

i.                Do not use queries with many parameters or long “IN” subquery. Put the parameters into temp table and join these temp tables

ii.              Increase MemToLeave

iii.            Run DBCC FreeProcCache routinely or occasionally

(3)          0x011A

i.                1.      Happens when a batch or usp is extremely complex, the subsequent queries must wait unti the complex done is done.

ii.              2.      Change client query behaviors

iii.            3.      Make the query less complex

iv.            4.      Run DBCC FREEPROCCAHCE routinely

16.      Troubleshooting Multi-page (>8K pages)

1)  Basics

(1)          MemToLeave in SQL Server 2000, from 2005, it is called multi-page allocation

(2)          On a 32-bit machine, the default max is 256MB + 0.5MB* (Max threads number in sp_configue) often 256. If you need to increase the limit, use –g

(3)          On a 64-bit machine, no limit, also note the max server memory is only for buffer pool, not for multi-page memory.

(4)          Three purposes of multi-page memory

i.                For threads, each thread needs 512KB or 0.5MB

ii.              The Stolen memory over 8K  from SQL Server

iii.            3rd party codes (heaps, COM, OLE-DB, xproc etc.)

(5)          If threads over the limit, new login cannot connect SQL Server anymore.

2)  Where does the pressure come from?

(1)          Outside SQL Server: SQL CLR, Linked Server, COM, xsp

(2)          From SQL Server – when the 8K page cannot handle

i.                Queries with many parameters, long “IN” subquery

ii.              Network packet size – default to 4KB. If changed to over 8KB, SQL Server will multi-page. If you have to, test, test, and test.

iii.            Complex and heavy use of XML functions.

3)  Symptom and solution

(1)          Out of Memory – Error 701 in SQL Server 2005+ and 17803 in SQL 2000

i.                Rarely happened in 64-bit environment

ii.              Check the error log and find out the reasons

iii.            Solutions

·       Use –g parameter to increase MemToLeave

·       Restart SQL

·       Upgrade to 64-bit

·       But most often, it is caused by application programs such as a complex script, 3rd code not releasing memory, heavy use of linked server, cursors not closed in Java

(2)          Time out occurred while waiting for memory resources to execute the query – Error 8645 (insufficient memory in buffer pool)

i.                Why? SQL Server has limited free buffer pool or a complex query makes free buffer pool is not enough

ii.              Symptoms

·       Login failed

·       Connections are disconnected with various error msg

·       If you run select * from sys.sysprocesses, check waittype, you will find connection are waiting for 0x40 or 0x0040 (Resource_Semaphore)

·       SQL Server: Memory Manager – Memory Grants Pending !=0

iii.            Solutions:

·       Stop other applications competing with SQL Server

·       Increase Max Server Memory

·       Run DBCC FREESYSTEMCACHE, DBCC FREESESSIONCACHE, DBCC FREEPROCCACHE

·       Simplifying the queries that overuse memory, or adjust the application codes.

·       Run DMVs: sys.dm_exec_query_resource_semaphores and sys.dm_exec_query_memory_grants

 

(3)          Could not create a server event tread – Error 17189 in SQL 2005+ and 17802 in SQL 2000

i.                How many threads are being used (SQL itself uses about 20)?

ELECT COUNT(*) FROM sys.sysprocesses WHERE kpid<>0

 

ii.              If not, is it caused by MemToLeave?

SELECT  type ,

         SUM(virtual_memory_reserved_kbAS [vm reserved] ,

         SUM(virtual_memory_committed_kbAS [vm commited] ,

         SUM(awe_allocated_kbAS [awe allocated] ,

         SUM(shared_memory_reserved_kbAS [sm reserved] ,

         SUM(shared_memory_committed_kbAS [sm committed] ,

         SUM(single_pages_kbAS [singlepage allocator],

        SUM(multi_pages_kbAS [multi page allocated]

 FROM    sys.dm_os_memory_clerks

 GROUP BY type

 ORDER BY type