1. Is blocking is caused by application? If yes,
a. Error
messages from the applications?
b. In
which step does the error occur? What does it do? Remember the slow response
and no response are not necessarily caused by SQL Server.
c. If
it is a SQL Server error, blocking more likely raise the timeout errors. If
something else, maybe blocking are not the reason.
d. If
the problem is caused by blocking, what are the symptoms?
i. When
does it happen
ii. How
long?
iii. Related
to workload?
iv. Related
to certain tasks?
v. How
is it gone, automatically or you have to restart SQL Server?
2. Are
the problems are caused by SQL Server? If yes, what do we want to know for troubleshooting?
If it is a
SQL Server error, blocking more likely raises the timeout errors. If something
else, maybe blocking is not the reason.
--Let’s test it
--Window 1:
USE [AdventureWorks]
BEGIN TRAN
UPDATE [dbo].[Employee_Demo_Heap]
SET [Title]='aaa'
WHERE [EmployeeID]=70
UPDATE [dbo].[Employee_Demo_BTree]
SET [Title]='aaa'
WHERE [EmployeeID]=70
SELECT [EmployeeID],[LoginID],[Title] FROM [dbo].[Employee_Demo_Heap]
WHERE [EmployeeID]=70
--Window 2. It should be blocked as it uses a table scan.
USE [AdventureWorks]
SELECT [EmployeeID],[LoginID],[Title] FROM [dbo].[Employee_Demo_Heap] WHERE
[EmployeeID] IN (3,30,)
a. Any blocking?
– check the blocked column from sysprocesses.
If yes,
--Window 1. If blocked !=0
SELECT * FROM master.[sys].[sysprocesses]WHERE spid>50
/*
51 is blocked by 54, while 54 is not blocked. 54 is the root lock.
kpid = thread is
spid kpid blocked
51 15348 54
52 0 0
53 0 0
54 15192 0
*/
--Window 1:
--FIND THE DATABASE
SELECT name,dbid FROM sys.[sysdatabases] WHERE
dbid=8
b. When? – checking the
waittime column from sysprocesses
c. Which Databases, tables, or resources?
--FIND THE DATABASE
SELECT name,dbid FROM sys.[sysdatabases] WHERE
dbid=8
--Find the table or other resources
--Option 1 - Find the object ID is locked. This
script only works for a single object blocked
-- If more than one object, find the rsc_objid
then use Select Object_Name(rsc_objid), one by one.
SELECT CONVERT(SMALLINT, [req_spid]) AS spid ,
[rsc_dbid] AS dbid ,
[rsc_objid] AS objid ,
o.name AS name,
[rsc_indid] AS indid ,
SUBSTRING(v.[name], 1, 4) AS TYPE ,
SUBSTRING([rsc_text], 1, 32) AS RESOURCE ,
SUBSTRING(u.[name], 1, 8) AS mode ,
SUBSTRING(x.[name], 1, 5) AS status
FROM master.[dbo].[syslockinfo] ,
master.[dbo].[spt_values] v
,
master.[dbo].[spt_values] x
,
master.[dbo].[spt_values] u,
sys.objects o
WHERE master.[dbo].[syslockinfo].[rsc_type]
= v.[number]
AND v.[type] = 'LR'
AND master.[dbo].[syslockinfo].[req_status]
= x.[number]
AND x.[type] = 'LS'
AND master.[dbo].[syslockinfo].[req_mode]
+ 1 = u.[number]
AND u.[type] = 'L'
AND SUBSTRING(x.[name], 1, 5) = 'WAIT'
AND o.object_id=rsc_objid
ORDER BY [SPID]
--Option 2: Find the object IDs are locked using
sp_lock
--This option insert the returned results from
sys.sp_lock into a temp table
--using OPENROWSET. Need to turn on the option
first.
sp_configure 'Show Advanced Options',
1
GO
RECONFIGURE
GO
sp_configure 'Ad Hoc Distributed Queries',
1
GO
RECONFIGURE
GO
SELECT * INTO #TestTable FROM OPENROWSET('SQLNCLI', 'Server=localhost\SQL5Instance1;Trusted_Connection=yes;',
'EXEC sys.sp_lock')
--the blokced ones have the 'WAIT' status
SELECT *
FROM #TestTable
WHERE STATUS='WAIT'
--find the object name from object ID
SELECT OBJECT_NAME(1179)
--find the indexes of a table by using object_ID
SELECT * FROM sys.[indexes] WHERE
[object_id]=1179
d. From whom? - check the hostname, program_name, loginname
etc. from sysdatabases
e.
How
did the blocking occur? Why so?
i. Blocking
may occur in the running statements of the current connection – use the
following script to find all of the running connections including blocked or
suspended ones. The idle/sleep connections will not be returned.
SELECT p.[session_id] ,
p.[request_id] ,
p.[start_time] ,
p.[status] ,
p.[command] ,
p.[blocking_session_id] ,
p.[wait_type] ,
p.[wait_time] ,
p.[wait_resource] ,
p.[total_elapsed_time]
,
p.[open_transaction_count] ,
p.[transaction_isolation_level] ,
SUBSTRING(qt.[text], p.[statement_start_offset] /
2,
( CASE WHEN p.[statement_end_offset]
= -1
THEN LEN(CONVERT(NVARCHAR(MAX), qt.[text])) * 2
ELSE p.[statement_end_offset]
END - p.[statement_start_offset] )
/ 2) AS 'sql statement' ,
p.[statement_end_offset] ,
batch = qt.[text]
FROM master.sys.[dm_exec_requests] p
CROSS APPLY [sys].[dm_exec_sql_text](p.[sql_handle]) AS qt
WHERE p.[session_id]
> 50
ii. Blocking
may occur before the running statements of the current connection – use SQL Trance
before blocking occur. If SQL Trace did not start, we would not know. But we
can find the last executed statement before it became idle using DBCC
INPUTBUFFER.
/*
运行DBCC INPUTBUFFER(SPID)可以获得从客户端发送到SQL实例的最后一个批处理语句。这句话的优点是不管连接是否正在运行,都会返回结果。缺点是他返回的是整个批处理语句 (All statements in a batch),而不是当前正在执行的子句。所以对于正在运行的连接,第一种方法比较好。*/
DBCC INPUTBUFFER(@spid)
f.
What
is the current status of the root lock?
Check kpid and waittype
in the sysprocesses.
If both are 0, the thread is idle or in
sleeping.
If
kpid !=0 and waittype=0, it is running or runnable.
If
both are not 0, the thread is running or suspended/blocked waiting for
resources.
i. Is it running or sleeping? If it is
running/runnable, why not finishing so long?
Check the Open_tran column, if it is
!=0, it has an open tran
ii. If it is sleeping, why not releasing
the locks
If both are 0, the thread is idle or in
sleeping. It should release the resources, if not, check the open_tran column.
If it is >0, it indicates an open tran has not been committed. Then use DBCC
INPUTBUFFER to find the statement.
g.
What
did the blocked connections try to do? Why did they also need to use the locked
resources?
i. First,
find the blocked connections
SELECT CONVERT(SMALLINT, [req_spid]) AS spid ,
[rsc_dbid] AS dbid ,
[rsc_objid] AS objid ,
o.name AS name,
[rsc_indid] AS indid ,
SUBSTRING(v.[name], 1, 4) AS TYPE ,
SUBSTRING([rsc_text], 1, 32) AS RESOURCE ,
SUBSTRING(u.[name], 1, 8) AS mode ,
SUBSTRING(x.[name], 1, 5) AS status
FROM master.[dbo].[syslockinfo] ,
master.[dbo].[spt_values] v
,
master.[dbo].[spt_values] x
,
master.[dbo].[spt_values] u,
sys.objects o
WHERE master.[dbo].[syslockinfo].[rsc_type]
= v.[number]
AND v.[type] = 'LR'
AND master.[dbo].[syslockinfo].[req_status]
= x.[number]
AND x.[type] = 'LS'
AND master.[dbo].[syslockinfo].[req_mode]
+ 1 = u.[number]
AND u.[type] = 'L'
AND SUBSTRING(x.[name], 1, 5) = 'WAIT'
AND o.object_id=rsc_objid
ORDER BY
[SPID]
ii. Then
compare the returned result with that from sp_lock to estimate if the number of
locks requested is reasonable.
iii. If
not, try to optimize the indexes and queries.
3. Five
common types of lock problems and solutions
Type
|
waittype
|
open_tran
|
status
|
Auto-disappear
|
Reasons
|
Solutions
|
1
|
!=0
|
>=0
|
runnable
|
Yes, when the execution is done.
|
Just need to take time to finish.
Waiting for system resources such as IO/CPU/Memory
|
1. Optimize database design and query
2. Scale up - Solve the hardware bottleneck
3. Scale out – separate OLAP from OLTP
|
2
|
0x0000
|
>0
|
sleeping
|
No. can kill it
|
1. The client has a command timeout (ADO.NET often sets 30s,
if an execution is not finished within 30s, SQL Server stops the
statement/batch execution, but keep the connection open for the client), or the
client has cancelled the execution of the last statement, but have not rolled
back the open tran. There will an Attention event in SQL Trace.
2. SET IMPLICIT_TRANSACTION ON (Some DB connection drivers or
ActiveX use this mode, but application failed)
|
1. The application program can: (a) catch up the error and
handle it such as “IF @@TranCount>0 Rollback Tran, or (b) SET XACT_ABORT
ON to roll back the entire tran., or (c) use sp_reset_connection
2. If caused by implication transaction on, (a) let SQL
Server rather than DB drivers or ActiveX to handle events if possible, (b)
Set it off and use the auto-commit mode
|
3
|
0x0000
0x0800 0x0063 |
>=0
|
runnable
|
No, until you rollback/disconnect/kill (up to 30s)
|
The client has not received all
of the results before the timeout. Althugh open_tran=0 and
TIL is READ COMMITTED, the resource is still locked.
|
1. Return only what needed, in small chunks if possible
2. Separate OLAP and OLTP
3. Use Read Uncommitted
|
4
|
0x0000
|
>0
|
rollback
|
Yes
|
The blocking spid is always in
rollback. This is because SQL Server need to roll back all of the uncommitted
trans for data consistency when disconnected or a spid is killed. Some roll
back may take hours. You may choose to shutdown SQL Server to stop the
rollback, but the rollback still needs to be done after restart.
You will see an Attention in SQL
Trace, indicating the client has a timeout or is rolling back.
|
1. Make the transactions small
2. Do the large chunk operation at off-peak time.
3. If it has been in rollback for a long time, just let it
go and wait for its completion.
|
5
|
various
|
>=0
|
runnable
|
No, until you rollback/disconnect/kill (up to 30s)
|
The application program causes
deadlock, but it appears as a lock in SQL Server. The hostname for the
blocking and blocked connections are the same.
|
Set a timeout in the application
program, and
catch and handle the command
timeout error.
|
4. How
to reduce blocking in application programs?
a. Blocking
is often caused by application programs, rather than by SQL Server itself. Many
factors in the application may influence blocking in SQL Server. Some of them
are:
i. Use
which transaction isolation level?
ii. When
to start and end a transaction?
iii. How
to connect and discharge the connection?
iv. How
complicated of the commands?
v. The
size of returned results
vi. How
long of the execution time?
vii. How
to catch and handle the timeout command?
viii.
Does the application program use a
database control component? If yes, understand it and use it carefully
1. How
does it establish connection, begin a transaction, and return a result set?
2. Does
it use implicit transaction?
3. Does
it use cursors?
4. Does
it use Unicode as the data type of passing parameters?
5. Which
T. I. L. does it use?
b. Recommendations:
i. Choose
the lowest T. I. L. that meets the business requirement
ii. Control
the granularity and complexity of transactions properly
iii. Catch
and handle timeout errors
iv. Good
database and query design
v. Separate
OLAP from OLTP
5. Connection
Pooling and Blocking
a. ADO
and ADO.NET uses connection pooling as the default
b. A
connection pool can be used by multiple users. sp_reset_connection can be used
to reset a connection if neccessary
c. Advantages
i. Reduce
physical connections
ii. Increase
response speed
iii. Lessen
SQL Server workload
d. Disadvantages
i. Bring
extra problems on blocking and others if the errors and exceptions are not
handled well in the applications
ii. Hard
to debug if problems occur in the application when the application codes handle
the transaction
e. Recommendations
i. It’s
better to let T-SQL handle the transactions. If the application codes have to
be used, make sure to add extra codes to catch/handle errors, to make sure the
tran is committed when the connection is closed.
6. Troubleshooting
Deadlocks
a. Deadlocks
can be on:
i. Locks
(on row, page, meta-data, or application programs)
ii. Queued
threads (one thread is sleeping, whereas other threads are waiting for it)
iii. Memory
(on parallel queries)
iv. Deadlock
on threads when concurrent execution of a query
b. Deadlock
detection mechanism
i. Runs
every 5s
ii. The
victim is the one can be rolled back with less resources
c. Locking,
Blocking, and Deadlock
i. LOCKING occurs when a connection needs access
to a piece of data in database. It is necessary for SQL Server to manage
concurrent connections.
ii. Blocking occurs when one SPID holds a lock on a specific
resource and a second SPID attempts to acquire a conflicting lock type on the
same resource. By default, it will wait forever unless you set Lock_Timeout
or exceeds the timeout limit set up by application.
SET LOCK_TIMEOUT timeout_period -
Specifies the number of milliseconds a statement
waits for a lock to be released. A
value of -1 (default) indicates no time-out period (that is, wait forever). A
value of 0 means to not wait at all and return a message as soon as a lock is
encountered. When a wait for a lock exceeds the time-out value, an error is
returned.
iii. A deadlock occurs when two or more tasks permanently block each
other by each task having a lock on a resource which the other tasks are trying
to lock (i.e., mutual blocking)
d. Diagnosing
i. Trace
Flag 1204(for SQL 2000) and Trace 1222(for SQL 2005+) – in Errorlog
DBCC
TRACEON (1222,-1)
ii. SQL
Server Profiler – Locks – Deadlock Graph (+ 1222 Flag or SQL Trace if necessary)
e. Best
Practices
i. Visit
objects in the same order
ii. Minimize
the interdependency of transactions
iii. Keep
transaction short and in batches (instead of multiple parallel execution threads.
The batch approach will minimize the network transmission)
iv. Use
lower transaction isolation level
v. Optimize
the execution plans to reduce the locks
vi. In
some cases of high concurrency, deadlock is unavoidable, one workaround is to
increase the granularity to a higher level, making one spid is blocked first
(i.e., use blocking to replace deadlock)
f. Case
study
i. Run
the codes in window 1 first and then that in window 2
--In windows 1
DBCC TRACEON (1222,-1) –record the deadlock info
in errorlog
USE [AdventureWorks]
GO
SET NOCOUNT ON
GO
--Run this one first
WHILE 1 = 1
BEGIN
BEGIN TRAN
UPDATE [dbo].[Employee_Demo_Heap]
SET
[BirthDate] = GETDATE()
WHERE
[NationalIDNumber] = '480951955'
SELECT *
FROM [dbo].[Employee_Demo_Heap]
WHERE
[NationalIDNumber] = '480951955'
COMMIT TRAN
END
--in Window 2
USE [AdventureWorks]
GO
SET NOCOUNT ON
GO
--then run the code below
WHILE 1 = 1
BEGIN
BEGIN TRAN
UPDATE dbo.[Employee_Demo_Heap]
SET
[BirthDate] = GETDATE()
WHERE
[NationalIDNumber] = '407505660'
SELECT *
FROM [dbo].[Employee_Demo_Heap]
WHERE
[NationalIDNumber] = '407505660'
COMMIT TRAN
END
ii. Due
to the table scan on the heap table, the two non-related queries will result in
a deadlock. From the errors recorded by trace flag 1222 in error log, you will
find the locked RIDs. Then use DBCC TRACEON (3604) DBCC (dbid, fileid, pagenum,
3) to find details about the page.
iii. Options:
1. Optimize
the index – create a nonclustered index on NationalIDNumber.
USE [AdventureWorks]
CREATE NONCLUSTERED
INDEX NationalIDNumber ON
[Employee_Demo_Heap]([NationalIDNumber] ASC)
a. Pros:
Increase the efficiency and eliminate the deadlock, highly recommended. If not feasible,
try the other three “rude” options below.
b. Cons:
the rows in the two connections not overlapping, the permission to add an index,
may have other negative impacts.
2. Use
‘with (nolock)’ - let SELECT does not put S lock on the queried table
WHILE 1 = 1
BEGIN
BEGIN TRAN
UPDATE [dbo].[Employee_Demo_Heap]
SET
[BirthDate] = GETDATE()
WHERE
[NationalIDNumber] = '480951955'
SELECT *
FROM [dbo].[Employee_Demo_Heap] WITH ( NOLOCK )
WHERE
[NationalIDNumber] = '480951955'
COMMIT TRAN
END
a. Pros:
instant
b. Cons:
possible dirty reads, only good for S locks, need the permission to change
query in SQL. Developers may need to use the dynamic SQL in the application
codes.
DECLARE @a NVARCHAR(4000)
SET @a='SELECT * FROM
[dbo].[Employee_Demo_Heap] WITH (NOLOCK)
WHERE [NationalIDNumber]='+'480951955'
EXEC(@a)
3. Increase
lock granularity such as using PAGLOCK instead of the RIDs on the same page –
making a deadlock become a blocking
WHILE 1 = 1
BEGIN
BEGIN TRAN
UPDATE [dbo].[Employee_Demo_Heap] WITH ( PAGLOCK )
SET
[BirthDate] = GETDATE()
WHERE
[NationalIDNumber] = '480951955'
SELECT *
FROM [dbo].[Employee_Demo_Heap] WITH ( PAGLOCK )
WHERE
[NationalIDNumber] = '480951955'
COMMIT TRAN
END
a. Pros:
no need to change the T.I.L.
b. Cons:
easily blocked
4. Use
SNAPSHOT transaction isolation level before the transaction
SET TRANSACTION ISOLATION
LEVEL SNAPSHOT
GO