Note: This post elaborates and demonstrates the example in the
book “Microsoft
SQL Server企业级平台管理实践” by 徐海蔚 (Microsoft SQL Server Enterprise Platform Management by Haiwei Xu: 1.1.5 案例:通过脚本监视 tempdb空间使用))。
1. Challenges
a. Some objects such as #tempTable
or @tableVariable are reflected in sys.partitions and sys.allocation_units.
But others such as the internal objects or version-based storage used by SQL
Server are not reflected in sys.partitions and sys.allocation_units, thus, we cannot use sp_spaceused, which employs sys.partitions
and sys.allocation_units for the calculation, to find
the precise information about the space usage by temp DB.
b. DBCC ShowFileStats
only gives us the overall information, but not the detailed breakdowns.
USE tempdb
GO
DBCC ShowFileStats
/*
Fileid FileGroup TotalExtents
UsedExtents
Name FileName
1 1
128
39
tempdev D:\xxx\tempdb.mdf
*/
c. Sys.dm_dm_file_space_usage can
give us the snapshot information on the space used by the user objects,
internal objects, or version-based objects in the temp DB. By itself alone,
however, we cannot monitor the space usage in the tempDB
dynamically. But, but, but, we can keep running the query periodically to
monitor the space usage in the tempDB by SQL Server.
This DMV serves the foundation of the approach presented in this post.
USE tempdb
GO
SELECT * FROM sys.dm_db_file_space_usage
/****manually edited ******
database_id
2
file_id
1
filegroup_id
1
total_page_count
1024
allocated_extent_page_count 312
unallocated_extent_page_count 712
version_store_reserved_page_count 0
user_object_reserved_page_count 128
internal_object_reserved_page_count 8
mixed_extent_page_count
176
*/
2. How
to monitor the space usage in temp DB?
a. Scenario:
Suppose you just notice a sudden huge increase in the temp DB, what causes it?
Is it normal or something wrong? How to monitor the SQL Server behavior?
b. Option
1 – SQL Profiler or SQL Trace. The problem with this option is that we need to
keep the SQL Trace or SQL Profiler always on as we usually do not know when the
increase occurs. Obviously, there is a big negative impact on performance,
which makes it less optimal, especially on a busy box.
c. Option 2 – A less impact method is to run the DBCC/DMV/DMF
command at a certain interval and output the result to a file for a later
analysis. This option is recommended on a heavy system and demonstrated below.
3. How
to use the script to monitor the space usage in the temp DB?
a. Scenario: Suppose we need to perform the following
queries. Of course, we can do a quick qualitative analysis as below. But how do
we monitor the use of space in tempDB quantitatively?
-- First, open a new query window. Let’s say: Connection A (spid=55 in my case)
select @@spid
go
use adventureworks2012
go
select getdate()
go
-- create a temp table
-- This operation should employ user objects page
select * into #mySalesOrderDetail
from Sales.SalesOrderDetail
go
waitfor delay '0:0:2'
select getdate()
go
-- drop the temp table
-- This operation should cause the user object page decreased
drop table #mySalesOrderDetail
go
waitfor delay '0:0:2'
select getdate()
go
-- This join should use many internal objects.
select top 100000 *
from [Sales].[SalesOrderDetail]
INNER JOIN [Sales].[SalesOrderHeader]
ON [Sales].[SalesOrderHeader] .[SalesOrderID] = [Sales].[SalesOrderDetail].[SalesOrderID];
select getdate()
go
-- After the join, the use of the internal objects page should
dramatically decrease
SELECT getdate()
go
b. Use the following the script to monitor
-- Purpose: To monitor the space usage in the tempdb
--Step 1: In a separate connection, let’s say connection B (spid=53 in my case) ,choose Query-Results to file, run the
entire script below. Choose a location and file name for saving the query
result. Keep it running until the query in connection
A is done.
USE tempdb
GO
WHILE 1 = 1
BEGIN
SELECT
GETDATE()
-- check the space usage for tempdb at
the database level
dbcc showfilestats
-- Query 1
-- return the reserved spaces in KB for the user
objects, internal objects, version-stored objects and free space in the
file for tempdb.
SELECT
'Tempdb' AS DB,
GETDATE() AS Time,
SUM(user_object_reserved_page_count) * 8 AS user_objects_kb,
SUM(internal_object_reserved_page_count) * 8 AS internal_objects_kb,
SUM(version_store_reserved_page_count) * 8 AS version_store_kb,
SUM(unallocated_extent_page_count) * 8 AS freespace_kb
FROM sys.dm_db_file_space_usage
WHERE database_id = 2
-- Query 2
-- This query returns the allocated and deallocated pages of the
internal and user objects for the sessions utilizing the tempdb
SELECT
t1.session_id,
t1.internal_objects_alloc_page_count,
t1.user_objects_alloc_page_count,
t1.internal_objects_dealloc_page_count,
t1.user_objects_dealloc_page_count,
t3.*
FROM sys.dm_db_session_space_usage t1 -- Returns the number of pages allocated and deallocated by each
session for the tempdb.
JOIN sys.dm_exec_sessions AS t3 -- Returns one row per authenticated session, it shows information
about all active user connections and internal tasks.
ON t1.session_id = t3.session_id
WHERE (t1.internal_objects_alloc_page_count > 0
OR t1.user_objects_alloc_page_count > 0
OR t1.internal_objects_dealloc_page_count > 0
OR t1.user_objects_dealloc_page_count > 0)
-- Query 3
-- Returns the queries that are running and have utilized the tempdb
SELECT
t1.session_id,
st.text
FROM sys.dm_db_session_space_usage AS t1,
sys.dm_exec_requests AS t4
CROSS APPLY sys.dm_exec_sql_text(t4.sql_handle) AS st
WHERE t1.session_id = t4.session_id
AND t1.session_id > 50
AND (t1.internal_objects_alloc_page_count > 0
OR t1.user_objects_alloc_page_count > 0
OR t1.internal_objects_dealloc_page_count > 0
OR t1.user_objects_dealloc_page_count > 0)
WAITFOR DELAY '0:0:1'
END
--Step 2: run the
queries in connection A (@spid=55). The result is as
below. Notice the four time stamps from the result:
2014-08-26 12:29:14
2014-08-26 12:29:16
2014-08-26 12:29:18
2014-08-26 12:29:24
After the queries in connection A are done, manually
cancel the running query in connection B. The usage information of the spaces
in the tempdb in connection A should be captured in
the file produced by the queries in connection B.
--Step 3: Now is the hard part: to find and interpret the
information captured in the file. We need a text editor for that purpose. We
choose UltraEdit.
--Step 3a. Open the
file (checkTempUsage.rtf in our case) in UltraEdit.
We will check the result from DBCC ShowFileStats
first. On the menu, Search|Find|enter ‘tempDev’ and select ‘列出包含字符串的行’. That should return 36 rows.
--Step 3a1. In the panel of the 36 rows, right click, copy to
clipboard, paste it a text file. Then manually copy the header information and
paste it this .txt file as below. Save it.
Fileid FileGroup TotalExtents UsedExtents Name
Filename
--------- --------- -------------- ----------
-------- --------
1 1
251
42
tempdev D:\xxx\tempdb.mdf
1 1
251
42
tempdev D:\xxx\tempdb.mdf
---Other
34 rows are omitted. The filename is not fully presented------
--Step 3a2. If we do not need the filename column, we can open the
newly saved .txt file in UltraEdit, from the menu Edit|Select Range, enter the beginning and ending numbers
for rows and columns. Copy the result to a new .txt file. That’s the result from the DBCC ShowFileStats
command that we will examine on.
Fileid
FileGroup TotalExtents
UsedExtents
Name
----------- -----------
-------------------- -------------------- ---------
********the 1st 16
rows, identical to the 17th row, are omitted******
1
1
251
42
tempdev
1
1
251
229
tempdev (1st increase
– temp table created)
1
1
251
229
tempdev
1
1
251
229
tempdev
1
1
251
42
tempdev
(1st decrease – temp table dropped)
1
1
251
57
tempdev
1
1
251
116
tempdev
1
1
251
144
tempdev
(2nd increase – a join query)
1
1
251
144
tempdev
1
1
251
144
tempdev
1
1
251
144
tempdev
1
1
251
42
tempdev
(2nd decrease – the query done)
*****Eight identical
rows are omitted********
--------------------------------------------------------------------------------------------------------------------------------
--Step 3b. Now let’s check the result from query 1 in the monitoring connection (spid=53). On the menu, Search|Find|enter ‘tempdb 2014’ and select ‘列出包含字符串的行’. That should return 36 rows as well.
Manually adding the header information.
DB
Time
user_objects_kb internal_objects_kb version_store_kb
freespace_kb
…..omitted 21 rows,
identical to the 22nd row……….
Tempdb 2014-08-26 12:29:14.237
1024
256
0
13376
(1st increase
from 1024 to 12992 due to creation of the temp table)
The
time is slightly after 2014-08-26 12:29:14
Notice
the increase occurs in the time period of 12:29:14 - 12:29:15
Tempdb 2014-08-26
12:29:15.240 12992
256
0
1408
Tempdb 2014-08-26
12:29:16.243
12992
256
0
1408
Tempdb 2014-08-26
12:29:17.243
12992
256 0
1408
(1st decrease
from 12992 to 1024 due to dropping of the temp table)
The
time is slightly after 2014-08-26 12:29:16
Notice
the decrease occurs in the time period of 12:29:17 - 12:29:18
Tempdb 2014-08-26 12:29:18.247
1024
256
0
13376
Tempdb 2014-08-26
12:29:19.250
1024
1216
0
12416
Tempdb 2014-08-26
12:29:20.270
1024
4992
0
8640
(2nd increase
from 256 to 6784 for using internal objects by the join)
The
time is slightly after 2014-08-26 12:29:18
Notice
the increase occurs in the time period of 12:29:18 - 12:29:21
Tempdb 2014-08-26
12:29:21.273
1024
6784
0
6848
Tempdb 2014-08-26
12:29:22.277
1024
6784
0
6848
Tempdb 2014-08-26
12:29:23.280
1024
6784
0
6848
Tempdb 2014-08-26
12:29:24.300
1024
6784
0
6848
(2nd decrease
from 6784 to 256 after the join is done)
The
time is slightly after 2014-08-26 12:29:24
Notice
the decrease starts at 12:29:24.300. But we do not know when it is done exactly
Tempdb 2014-08-26
12:29:25.303
1024
256
0
13376
Tempdb 2014-08-26
12:29:26.310
1024
256 0
13376
Tempdb 2014-08-26
12:29:27.310
1024 256
0
13376
Tempdb 2014-08-26
12:29:28.317
1024 256
0
13376
Tempdb 2014-08-26
12:29:29.317
1024
256
0
13376
Tempdb 2014-08-26 12:29:30.330
1024
256
0
13376
Tempdb 2014-08-26
12:29:31.333
1024
256
0
13376
Tempdb 2014-08-26
12:29:32.337
1024
256
0
13376
Tempdb 2014-08-26
12:29:33.340
1024
256
0
13376
--Step 3c. Now let’s check the result from query 2 in the monitor connection. On
the menu, Search|Find|enter ’55 [0-9]’. 55 is the session ID for the main query. After it, there is
9 spaces, followed by a number. This is the regular expression in UltraEdit. In this way, we will find
the result beginning with session_id 55, followed by internal_object_alloc_page_counts. That should return 36
rows as well. Manually adding the header information. Save it in .txt. Open the
.txt file in UntraEdit for editing (deleting
un-wanted columns as explained below)
But, the results contain much more columns than what we
need. We can use the Column|Delete Columns….menu. Unfortunately, you can only delete 100 spaces each
time. Finally, we get what we need:
session_id
internal_objects user_objects status
last_request_start_tm
last_request_end_tm
-------- ---------------------- ------------
----------
-----------------------
---------------------
55
816
1537
sleeping
2014-08-26 12:25:45.830 2014-08-26
12:25:45.830
55
816
1537
sleeping
2014-08-26 12:25:45.830 2014-08-26
12:25:45.830
55
816
1537
sleeping
2014-08-26 12:25:45.830 2014-08-26
12:25:45.830
………………………………………………………………………………….
55
816
1537
sleeping
2014-08-26 12:25:45.830 2014-08-26 12:25:45.830
55
816
1537
running
2014-08-26 12:29:14.217 2014-08-26 12:29:14.120
55
816
3074
running
2014-08-26 12:29:14.623 2014-08-26 12:29:14.620
55
816
3074 running
2014-08-26 12:29:14.623 2014-08-26 12:29:14.620
55
816
3074
running
2014-08-26 12:29:16.697 2014-08-26 12:29:16.697
55
816
3074
running
2014-08-26 12:29:16.697 2014-08-26 12:29:16.697
55
816
3074
running
2014-08-26 12:29:18.783 2014-08-26 12:29:18.697
55
816
3074 running
2014-08-26 12:29:18.783 2014-08-26 12:29:18.697
55
816
3074
running
2014-08-26 12:29:18.783 2014-08-26 12:29:18.697
55
816
3074
running
2014-08-26 12:29:18.783 2014-08-26 12:29:18.697
55
816 3074
running
2014-08-26 12:29:18.783 2014-08-26 12:29:18.697
55
816
3074
running
2014-08-26 12:29:18.783 2014-08-26 12:29:18.697
……………………………………………………………………………………
55
1632 3074
sleeping
2014-08-26 12:29:24.633 2014-08-26 12:29:24.633
What do they mean? Remember the four time periods of changes on
the space usage in the tempdb from the result in
query 1?
Result from Query
1 Result
from Query 2
12:29:14 - 12:29:15
Running!
12:29:17 - 12:29:18
Running!
12:29:18 - 12:29:21
Running!
12:29:24.300 - ???
(The last session is done somewhere between 12:29.24.300 and
12:29:24.633)
Surely, the running status for connection A (spid=55)
is on when the tempdb is utilized, no matter how we
monitor the usage. Thus, query 2 confirms the running/sleeping status of
connection A. Query 2 also produces much information we have
not discussed yet in this demo.
--Step 3d. Now let’s check the result from query
3 in the monitor connection.
From the result of query 1, we know that from 12:29:18.247 to
12:29:21.273, the internal_object_used has increased
from 256KB to 6784KB, what causes it? Let’s see which query is running on
connection A (spid=55) utilizing the space in tempdb during the time period? That’s query 3’s show!
Time
User_Obj Internal_Obj
----------------------------------------------------------------------------------
2014-08-26
12:29:18.247
1024
256
--The query running on connection A (spid=55)
during the time period is:
session_id text
---------- --------------
55 waitfor delay '0:0:2'
Time
User_Obj Internal_Obj
----------------------------------------------------------------------------------
2014-08-26
12:29:19.250
1024
1216
--The query running on connection A (spid=55)
during the time period is:
session_id text
---------- --------------
55 select top 100000 *
From [Sales].[SalesOrderDetail
INNER JOIN [Sales].[SalesOrderHeader
ON [Sales].[SalesOrderHeader].[SalesOrderID = [Sales].[SalesOrderHeader].[SalesOrderID];
Time
User_Obj Internal_Obj
----------------------------------------------------------------------------------
2014-08-26
12:29:20.270
1024
4992
--The query running on connection A (spid=55)
during the time period is:
session_id text
---------- --------------
55 select top 100000 *
from[Sales].[SalesOrderDetail]
INNER JOIN [Sales].[SalesOrderHeader]
ON [Sales].[SalesOrderHeader].[SalesOrderID] = [Sales].[SalesOrderHeader].[SalesOrderID];
Time
User_Obj Internal_Obj
----------------------------------------------------------------------------------
2014-08-26
12:29:21.273
1024
6784
The
result clearly shows that the big join query on connection A (spid=55) causes the utilization of internal objects in tempdb, which mainly occurred between 12:29:19.250 and 12:29:21.273.