Troubleshooting and Optimizing SQL Server Queries


1.     Main Strategies
a.      Caused by I/O?
                                          i.     How to know if there are physical I/O activities?
SET STATISTICS IO ON
SET STATISTICS TIME ON
                                         ii.     If the performance problem is related to I/O, what to do?
1.      Memory bottleneck? Page out/Page In problems? If no
2.      Is the query heavily used or occasionally used? If rarely used, but with large data, suggest the client to run the query at non-peak time.
3.      Is it possible to reduce the volume of data by examining the execution plan?
4.      Hardware (disk) improvement

b.      Caused by heavy compile or recompile?
                                          i.     Focus on two types of queries
1.      Simple but frequently used ones – is it possible to reduce the compile time via database re-design
2.      Complex queries
                                         ii.     Two ways to calculate the compile time
1.      SET STATISTICS TIME ON – cannot be used for application codes
2.      SQL Trace
a.      For Batches, compile time = SQL:StmtStarting – SQL:BatchStarting
b.      For SP, compile time = SQL:StmtStarting – SP:StmtStarting
c.      For RPC, compile time = SQL:StmtStarting – RPC:Starting
d.      For dynamic queries, compile time = 2nd SP:StmtStarting (for the actual query) – 1st SP:StmtStarting (for @Exec)

USE [AdventureWorks]
 GO
DROP PROC LongCompile
 GO

CREATE PROC LongCompile ( @i INT )
AS
    DECLARE @cmd VARCHAR(MAX)
    DECLARE @j INT
    SET @j = 0
    SET @cmd = '
          SELECT COUNT(b.SalesOrderID) ,
                     SUM(p.Weight)
          FROM   dbo.SalesOrderHeader_test a
                     INNER JOIN dbo.SalesOrderDetail_test b ON a.SalesOrderID = b.SalesOrderID
                     INNER JOIN Production.Product p ON b.ProductID = p.ProductID
          where a.SalesOrderID in(43659'
    WHILE @j < @i
        BEGIN
            SET @cmd = @cmd + ',' + STR(@j + 43659)
            SET @j = @j + 1
        END
    SET @cmd = @cmd + ')'
    PRINT @cmd
    EXEC(@cmd)
 GO

 SET STATISTICS TIME ON


 EXEC LongCompile 100

                                        iii.     How to improve if the compile time is too long?
1.      Make the query shorter or simpler
2.      Delete extra indexes
3.      Reuse the plans if possible

c.      Wrong execution plan in each step?
                                          i.     Compare the actual returned rows and estimated rows.
                                         ii.     Index seek or table scan? – Generally Seek is better than Scan, but if a wrong plan is used due to parameter mismatch, scan may be better in some cases. So it depends.
                                        iii.     Right join? Is Nested Loops, Merge, or Hash Join right?
                                        iv.     Filter (i.e., the WHERE clause) after join or join after filer? - filter
                                         v.     Summary of possible reasons for using wrong execution plan
1.      No statistics or out dated – auto update or manual/job to update
2.      Complex query (e.g., complex WHERE) makes SQL Server hard to estimate the cost
3.      Use a wrong plan due to uneven data distribution
a.      Parameter sniffing – performance degradation due to wrong parameter
b.      Reason: uneven data distribution and passing parameters quite different
c.      Solutions
                                                                                                      i.     Use Exec(@cmd+@i )
  USE [AdventureWorks]
  GO
  DROP PROC NOSniff
  GO
  CREATE PROC NOSniff(@i INT)
  AS
  DECLARE @cmd VARCHAR()
  SET @cmd='
  SELECT  
  COUNT(b.[SalesOrderID]),SUM(p.[Weight])
  FROM [dbo].[SalesOrderHeader_test] a
  INNER JOIN [dbo].[SalesOrderDetail_test] b
  ON a.[SalesOrderID]=b.[SalesOrderID]
  INNER JOIN [Production].[Product] p
  ON b.[ProductID]=p.[ProductID]
  WHERE a.[SalesOrderID]='
  EXEC(@cmd+@i)
  GO
                                                                                                     ii.     Use Local Variables
  CREATE PROC Sniff2 ( @i INT )
  AS
    DECLARE @j INT
    SET @j = @i
    SELECT  COUNT(b.[SalesOrderID]) ,
            SUM(p.[Weight])
    FROM    [dbo].[SalesOrderHeader_test] a
            INNER JOIN [dbo].[SalesOrderDetail_test] b ON a.[SalesOrderID] = b.[SalesOrderID]
            INNER JOIN [Production].[Product] p ON b.[ProductID] = p.[ProductID]
    WHERE   a.[SalesOrderID] = @j
  GO
                                                                                                    iii.     Query Hint – Option(Recompile)
 USE [AdventureWorks]
 GO
 DROP PROC NoSniff_QueryHint_Recompile
 GO
 CREATE PROC NoSniff_QueryHint_Recompile ( @i INT )
AS
    SELECT  COUNT(b.[SalesOrderID]) ,
            SUM(p.[Weight])
    FROM    [dbo].[SalesOrderHeader_test] a
            INNER JOIN [dbo].[SalesOrderDetail_test] b ON a.[SalesOrderID] = b.[SalesOrderID]
            INNER JOIN [Production].[Product] p ON b.[ProductID] = p.[ProductID]
    WHERE   a.[SalesOrderID] = @i
    OPTION  ( RECOMPILE )
 GO
                                                                                                    iv.     Create Proc xxx with Recompile
 USE [AdventureWorks]
 GO
 DROP PROC NoSniff_SPCreate_Recompile
 GO
CREATE PROC NoSniff_SPCreate_Recompile ( @i INT )
    WITH RECOMPILE
AS
    SELECT  COUNT(b.[SalesOrderID]) ,
            SUM(p.[Weight])
    FROM    [dbo].[SalesOrderHeader_test] a
            INNER JOIN [dbo].[SalesOrderDetail_test] b ON a.[SalesOrderID] = b.[SalesOrderID]
            INNER JOIN [Production].[Product] p ON b.[ProductID] = p.[ProductID]
    WHERE   a.[SalesOrderID] = @i
 GO
                                                                                                     v.     Force SQL Server to use Nested Lopps/Merger/Hash Join
CREATE PROC NoSniff_QueryHint_JoinHint ( @i INT )
 AS
    SELECT  COUNT(b.[SalesOrderID]) ,
            SUM(p.[Weight])
    FROM    [dbo].[SalesOrderHeader_test] a
            INNER JOIN [dbo].[SalesOrderDetail_test] b ON a.[SalesOrderID] = b.[SalesOrderID]
            INNER HASH JOIN [Production].[Product] p ON b.[ProductID] = p.[ProductID]
    WHERE   a.[SalesOrderID] = @i
 GO
                                                                                                    vi.     Optimize For
USE [AdventureWorks]
 GO
DROP PROC NoSniff_QueryHint_OptimizeFor
 GO
CREATE PROC NoSniff_QueryHint_OptimizeFor ( @i INT )
AS
    SELECT  COUNT(b.[SalesOrderID]) ,
            SUM(p.[Weight])
    FROM    [dbo].[SalesOrderHeader_test] a
            INNER JOIN [dbo].[SalesOrderDetail_test] b ON a.[SalesOrderID] = b.[SalesOrderID]
            INNER HASH JOIN [Production].[Product] p ON b.[ProductID] = p.[ProductID]
    WHERE   a.[SalesOrderID] = @i
    OPTION  ( OPTIMIZE FOR ( @i=74 ) )
 GO
                                                                                                  vii.     Plan Guide
USE [AdventureWorks]
 GO
 EXEC [sys].[sp_create_plan_guide] @name = N'Guide1',
    @stmt = N'SELECT COUNT(b.[SalesOrderID]),SUM(p.[Weight])
 FROM [dbo].[SalesOrderHeader_test] a
 INNER JOIN [dbo].[SalesOrderDetail_test] b
 ON a.[SalesOrderID]=b.[SalesOrderID]
 INNER JOIN [Production].[Product] p
 ON b.[ProductID]=p.[ProductID]
 WHERE a.[SalesOrderID]=@i', @type = N'OBJECT', @module_or_batch = N'Sniff',
    @params = NULL, @hints = N'option(optimize for(@i=75124))';
 GO
                                                                                                 viii.     Comparisons of the methods on fighting the parameter sniffing

Method
Need to change the Proc
Need to recompile each time
Accuracy of execution plan
Exec(@DynamicSQL)
Yes
Yes
Excellent
Local variable
Yes
No
Good
Query hint+"Recompile"
Yes
Yes
Excellent
Query  hint with specified join
Yes
No
Fair
Query hint optimize for
Yes
No
Good+
Plan Guide
No
No
Good+

4.      Used non-sargable in WHERE, unable to use index

d.      Caused by index, table structure, or DML statements? – modify index and use plan guides
                                          i.     Covering index vs. Include
1.      Covering index covers some or all of the columns in the query (A covering index—which is used in covered queries—can provide some or all of the indexed columns that a covered query uses. If a covering index is also a composite index (i.e., it indexes more than one column in its base table or view), it might contain columns that aren't used in the covered query but are used instead in other queries that have overlapping columns.)
2.      Select the 1st column carefully as statistics is only on the 1st column
3.      Do not have to include all columns, especially the lengthy column as covering index arranges the column in B-tree.
4.      Index with Included Columns (A include B) is an alternative to covering index (A, B) – less overhead due to no-B-Tree structure

                                         ii.     Primary Key and Clustered Index
1.      PK is logical, clustered index is physical
2.      By default, PK column has a clustered index, but it can be a non-clustered index as well.

                                        iii.     How to know if it is an index problem?
1.      Step 1 – find the high cost one
2.      Step 2 – Is it table scan or index scan
3.      Step 3 – compare the number of rows return with the number of rows in the table, if it is much less, then SCAN is not a good option. It lacks an index to seek the data.

                                        iv.     How to find missing indexes?
1.      SQL Profiler– use the event Performance – Showplan XML Statistics Profiles, then export to a file (right click)
2.      SELECT * FROM sys.dm_db_missing_index_details
3.      DTA

                                         v.     Using plan guides
1.      Useful in some limited scenarios

2.      Command Tune Up - What else can we do, after solutions on memory/IO bottleneck, blocking and deadlock, compile/recompile, parameter sniffing, index, and plan guide?
a.      Use SARGable filters, do not use NOT or <>, built-in functions, or computations in filters. If you have to use UPPER(name), for instance, add a computed column and index it.

b.      If the passing parameter changes inside the stored procedure, two options: (1) use Option(Recompile), (2) break the sp into a main and a sub one.
CREATE PROC GetRecentSales ( @date DATETIME )
 AS
BEGIN
        --Note: if NULL, it will be replaced with a new date
        IF @date IS NULL
            SET @date = DATEADD(mm, -3,
                                ( SELECT    MAX([OrderDate])
                                  FROM      [dbo].[SalesOrderHeader_test]
                                ))
        SELECT  SUM(d.[OrderQty])
        FROM    [dbo].[SalesOrderHeader_test] h ,
                [dbo].[SalesOrderDetail_test] d
        WHERE   h.[SalesOrderID] = d.[SalesOrderID]
                AND h.[OrderDate] > @date
        OPTION  ( RECOMPILE )
    END
   
 EXEC [sys].[sp_recompile] @objname = N'GetRecentSales'
 GO

 DBCC FREEPROCCACHE
 GO

--Use the query hint: Option(Recompile)

ALTER PROC GetRecentSales ( @date DATETIME )
 AS
    BEGIN
        IF @date IS NULL
            SET @date = DATEADD(mm, -3,
                                ( SELECT    MAX([OrderDate])
                                  FROM      [dbo].[SalesOrderHeader_test]
                                ))
        SELECT  SUM(d.[OrderQty])
        FROM    [dbo].[SalesOrderHeader_test] h ,
                [dbo].[SalesOrderDetail_test] d
        WHERE   h.[SalesOrderID] = d.[SalesOrderID]
                AND h.[OrderDate] > @date
        OPTION  ( RECOMPILE )
    END
   
 --还有一种方法,是把可能出问题的语句单独做成一个子存储过程,让原来
 --的存储过程调用子存储过程,而不是语句本身。例如

 USE [AdventureWorks]
 GO

----the sub-sp

CREATE PROCEDURE GetRecentSalesHelper ( @date DATETIME )
 AS
    BEGIN
        SELECT  SUM(d.[OrderQty])
        FROM    [dbo].[SalesOrderHeader_test] h ,
                [dbo].[SalesOrderDetail_test] d
        WHERE   h.[SalesOrderID] = d.[SalesOrderID]
                AND h.[OrderDate] > @date
               
    END


--the main sp
 
ALTER PROC GetRecentSales ( @date DATETIME )
    AS
        BEGIN
            IF @date IS NULL
                SET @date = DATEADD(mm, -3,
                                    ( SELECT    MAX([OrderDate])
                                      FROM      [dbo].[SalesOrderHeader_test]
                                    ))
            EXEC GetRecentSalesHelper @date --call the sub sp

        END
c.      Temp table vs. Table variable
                                          i.     Temp table – can have statistics and indexes, for large number of rows
                                         ii.     Table variable – for small number of rows (<100)

d.      Simplify complex query
                                          i.     If you have to use dynamic query, limit the conditions of choices and rows returned
                                         ii.     If too many tables in join – de-normalizing them
                                        iii.     Avoid of too many levels of view in view or sp in sp
                                        iv.     Use Order By if you have to
                                         v.     Do you have to return too many rows ? If not, use TOP to limit
                                        vi.     Use several simple statements to replace a complex one.