Index, Statistics, and Optimization

1.     Why do you need optimization? What are the symptoms?
1.1.  The query need to visit a large number of page
1.1.1.    Why need to visit so many pages?
1.1.2.    Scan the table?
1.1.3.    Fragmentation?
1.2.  The execution time is high, even the requested data are cached in memory. Remember Execution time = compile/recompile+ execution, not including the time to return the results.
1.3.  CPU % is fine in single execution, but too high in concurrent executions – need to check the CPU % in the compile and execution stage.
1.4.  Single execution is fine, but concurrent execution results in blocking or deadlock – use lower granularity level of resources if possible

2.     How can indexes improve lookup?
2.1.  Table scan - No indexes at all

-------------------------------Data set up---------------------------------------------
USE adventureworks
GO
--如果表已存在,删
IF EXISTS (SELECT * FROM sys.tables WHERE name='SalesOrderHeader_test' AND schema_id=SCHEMA_ID('dbo') AND type='U')
DROP TABLE dbo.SalesOrderHeader_test
GO

IF EXISTS (SELECT * FROM sys.tables WHERE name='SalesOrderDetail_test' AND schema_id=SCHEMA_ID('dbo') AND type='U')
DROP TABLE dbo.SalesOrderDetail_test
GO

--创建
--Note in the origina Sales.SalesOrderHeader, SalesOrderID column is the identity column (1,1)
--When copying data into the dbo.SalesOrderHeader_test, the identity property of the column is retained.
--But the key, constraints, and indexes are not retained.
SELECT  *
INTO dbo.SalesOrderHeader_test
FROM Sales.SalesOrderHeader
GO

SELECT  *
INTO dbo.SalesOrderDetail_test
FROM Sales.SalesOrderDetail
GO

--创建索
CREATE CLUSTERED INDEX SalesOrderHeader_test_CL
ON dbo.SalesOrderHeader_test (SalesOrderID)
GO

CREATE INDEX SalesOrderDetail_test_NCL
ON dbo.SalesOrderDetail_test (SalesOrderID)
GO

DECLARE @i INT
SET @i = 1
WHILE @i <= 9
  BEGIN
      INSERT  INTO dbo.SalesOrderHeader_test
              ( --SalesOrderID is identity column, inherited from the Sales.SalesOrderHeader table. IDENTITY_INSERT is ON by default
                --It is not needed here.
             RevisionNumber ,
             OrderDate ,
             DueDate ,
             ShipDate ,
             Status ,
             OnlineOrderFlag ,
             SalesOrderNumber ,
             PurchaseOrderNumber ,
             AccountNumber ,
             CustomerID ,
             ContactID,
             SalesPersonID ,
             TerritoryID ,
             BillToAddressID ,
             ShipToAddressID ,
             ShipMethodID ,
             CreditCardID ,
             CreditCardApprovalCode ,
             CurrencyRateID ,
             SubTotal ,
             TaxAmt ,
             Freight ,
            TotalDue ,
             Comment ,
             rowguid ,
             ModifiedDate
              )
              SELECT
                   RevisionNumber ,
                   OrderDate ,
                   DueDate ,
                   ShipDate ,
                   Status ,
                   OnlineOrderFlag ,
                   SalesOrderNumber ,
                   PurchaseOrderNumber ,
                   AccountNumber ,
                   CustomerID ,
                   ContactID,
                   SalesPersonID ,
                   TerritoryID ,
                   BillToAddressID ,
                   ShipToAddressID ,
                   ShipMethodID ,
                   CreditCardID ,
                   CreditCardApprovalCode ,
                   CurrencyRateID ,
                   SubTotal ,
                   TaxAmt ,
                   Freight ,
                   TotalDue ,
                   Comment ,
                   rowguid ,
                   ModifiedDate
              FROM      dbo.SalesOrderHeader_Test
              WHERE   SalesOrderID = 75123
     
      INSERT  INTO dbo.SalesOrderDetail_test
              ( SalesOrderID ,
                --SalesOrderDetailID: the identty column is not needed
             CarrierTrackingNumber ,
             OrderQty ,
             ProductID ,
             SpecialOfferID ,
             UnitPrice ,
             UnitPriceDiscount ,
             LineTotal ,
             rowguid ,
             ModifiedDate
              )
              SELECT  75123 + @i ,
                   CarrierTrackingNumber ,
                   OrderQty ,
                   ProductID ,
                   SpecialOfferID ,
                   UnitPrice ,
                   UnitPriceDiscount ,
                   LineTotal ,
                   rowguid ,
                   GETDATE()
              FROM Sales.SalesOrderDetail
      SET @i = @i + 1
  END
GO
----------------------------数据准备完毕--------------------------------

SET STATISTICS PROFILE ON
SET STATISTICS IO ON
SET STATISTICS time ON
--without an index: table scan, 12.86 note this is not in seconds or ms, no unit for the cost!!!)
--http://blogs.msdn.com/b/sqlqueryprocessing/archive/2006/10/11/what-is-this-cost.aspx
SELECT
[SalesOrderDetailID],[UnitPrice]
FROM [dbo].[SalesOrderDetail_test]
WHERE [UnitPrice]>200

2.2.  Clustered index scan
2.2.1.    No indexes on the columns in the WHERE clause, but there is a cluster index on the table

CREATE CLUSTERED INDEX SalesOrderDetail_test_CL ON [dbo].[SalesOrderDetail_test]([SalesOrderDetailID])

--clustered index on the identity column: clustered index scan, 12.86 seconds
SELECT
[SalesOrderDetailID],[UnitPrice]
FROM [dbo].[SalesOrderDetail_test]
WHERE [UnitPrice]>200

2.2.2.    There are indexes on the columns in the WHERE clause, and there is a cluster index on the table, but the query need to return other columns as well

CREATE CLUSTERED INDEX SalesOrderDetail_test_CL ON [dbo].[SalesOrderDetail_test]([SalesOrderDetailID])

--If querying other columns, it would use clustered index scan
--Clustered index scan: 13.03
SELECT [SalesOrderID],[SalesOrderDetailID],[UnitPrice]
FROM [dbo].[SalesOrderDetail_test]
WHERE [UnitPrice]>200

2.3.  Nonclustered index scan (also called Index Scan) - No clustered index on the table, but there is a nonclustered index on the table. However, when I tested with a nonclustered index on SalesOrderID as the only index on the SalesOrderDetail_test, SQL Server chooses to use table scan instead of an index scan. So I have to use a different query to demonstrate this scenario.
/*
The addres table has a nonclustered index IX_Address_AddressLine1_AddressLine2_City_StateProvinceID_PostalCode which is defined as:
CREATE UNIQUE NONCLUSTERED INDEX [IX_Address_AddressLine1_AddressLine2_City_StateProvinceID_PostalCode] ON [Person].[Address]
(
 [AddressLine1] ASC,
 [AddressLine2] ASC,
 [City] ASC,
 [StateProvinceID] ASC,
 [PostalCode] ASC
)*/
--This query uses Index Scan
SELECT AddressID, AddressLine1, AddressLine2
FROM Person.Address

2.4.  Index seek (i.e. nonclustered seek) - indexes on the columns in the WHERE clause.

CREATE INDEX SalesOrderDetail_test_NCL_Price ON [dbo].[SalesOrderDetail_test]([UnitPrice])
GO

--Non-clustered index on the unit price column: index seek, 1.2
SELECT
[SalesOrderDetailID],[UnitPrice]
FROM [dbo].[SalesOrderDetail_test]
WHERE [UnitPrice]>200

2.5.  Index seek or nonclustered seek + bookmark lookup or nested loop - indexes on the columns in the WHERE clause and querying other columns as well.
2.5.1.    Indexing the columns in the WHERE clause and INCLUDE other returned columns can make the query from clustered index scan to index seek, significant improvement.

--If forcing it to use the non-clustered index,
--it would be non-clustered seek + Nested loop or bookmark lookup/nested loop, 146.93, very expensive
SELECT [SalesOrderID],[SalesOrderDetailID],[UnitPrice]
FROM [dbo].[SalesOrderDetail_test] WITH (INDEX (salesorderdetail_test_ncl_price))
WHERE [UnitPrice]>200

--If a nonclustered index on the column in the WHERE clause and INCLUDE other columns,
--it would be index seek
CREATE INDEX SalesOrderDetail_test_NCL_Include ON [dbo].[SalesOrderDetail_test]([UnitPrice])
    INCLUDE ([SalesOrderID],[SalesOrderDetailID])
GO
-- index seek: 1.37
SELECT [SalesOrderID],[SalesOrderDetailID],[UnitPrice]
FROM [dbo].[SalesOrderDetail_test]
WHERE [UnitPrice]>200

3.     Statistics
3.1.  The same query with the same index structure may have different execution plans based on the volume of data in the table, or the parameters

3.2.  When SQL Server will Auto Update Statistics? - SQL Server 2005+ determines whether to update statistics based on changes to column modification counters (colmodctrs).  If the statistics is defined on a regular table, it is out of date if:
·       The table size has gone from 0 to >0 rows (test 1)
·       The number of rows in the table when the statistics were gathered was 500 or less, and the colmodctr of the leading column of the statistics object has changed by more than 500 since then (test 2)
·       The table had more than 500 rows when the statistics were gathered, and the colmodctr of the leading column of the statistics object has changed by more than 500 + 20% of the number of rows in the table when the statistics were gathered (test 3)

3.3.  For filtered statistics, the colmodctr is first adjusted by the selectivity of the filter before these conditions are tested. For example, for filtered statistics with predicate selecting 50% of the rows, the colmodctr is multiplied by 0.5.

3.4.  The limitation of the automatic update logic is that it tracks changes to columns in the statistics, but not changes to columns in the predicate. If there are many changes to the columns used in predicates of filtered statistics, consider using manual updates to keep up with the changes.

3.5.  If the statistics object is defined on a temporary table, it is out of date as discussed above, except that there is an additional threshold for recomputation at 6 rows, with a test otherwise identical to test 2 in the previous list.

3.6.  Table variables do not have statistics at all.

4.     Compile and Recompile
4.1.  Use the following query to find out all of the cached pans. Note dbid 32767 is for the hidden Resources database

SELECT * FROM sys.[syscacheobjects]

4.2.  Reuse or compile?
4.2.1.    Batch statement – have to be exactly the same for reuse. SQL Server parameterizes ad hoc queries automatically. However, SQL Server is very conservative with plan reuse. It does not want to use a wrong plan. SQL Server decides to reuse a plan only when it is sure that the cached plan is the correct one for a query. Changes in parameter data type, in some SET options, in security context, and more, can produce a new plan when you would expect reuse of an existing cached plan.

      You can check the plans in the cache and the number of executions by using the following query.
--DBCC FREEPROCCACHE if necessary
SELECT  qs.execution_count AS cnt , qt.text 
FROM    sys.dm_exec_query_stats AS qsCROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt 
WHERE   qt.text LIKE N'%Orders%' AND qt.text NOT LIKE N'%qs.execution_count%' 
ORDER BY qs.execution_count;
To demonstrate how SQL Server is conservative with plan reuse, the following three queries use the custid in the WHERE clause. The first query returns a single row, the second two rows, and the third 31 rows. SQL Server cannot be sure about the selectivity of the custid column in the Sales.Orders table.
-- One row 
SELECT orderid, custid, empid, orderdate
FROM Sales.Orders
WHERE custid = 13 
GO
-- Two rows 

SELECT orderid, custid, empid, orderdate
FROM Sales.Orders
WHERE custid = 33; 
GO
-- 31 rows 
SELECT orderid, custid, empid, orderdate
FROM Sales.Orders
WHERE custid = 71; 
GO
If you cleared the cache before executing the previous three queries, then the plan reuse query returns the following output.
      cnt text
--- ----
1 SELECT orderid, custid, empid, orderdate FROM Sales.Orders WHERE custid = 33;
1 SELECT orderid, custid, empid, orderdate FROM Sales.Orders WHERE custid = 13;
1 SELECT orderid, custid, empid, orderdate FROM Sales.Orders WHERE custid = 71;
Note that the queries were not parameterized.
/*Demo: Ad hoc batch queries have to be exactly the same for plan reuse.
These two queries have different plans because (1) they have different format, (2) SQL Server cannot parameterize them.*/

--free the cache
DBCC FREEPROCCACHE
GO

--Query 1: no carriage return
SELECT * FROM sys.[syscacheobjects]
GO

--check the plans
select * from sys.syscacheobjects WHERE dbid=8

--Query 2: the same query as the above, but with a carriage return, it will have a new plan
SELECT *
FROM sys.[syscacheobjects]
GO

--check it
select * from sys.syscacheobjects  WHERE dbid=8

4.2.2      Dynamic SQL using Exec () – parameter values have to be the same for reuse

--will be two different plans
DECLARE @dbid VARCHAR(10)
SET @dbid='9'
EXEC('select * from sys.syscacheobjects where dbid<='+@dbid)

SET @dbid='10'
EXEC('select * from sys.syscacheobjects where dbid<='+@dbid)

4.2.3      Parameterization: Simple or Forced – a database property. By default, SQL Server use the Simple option to parameterize simple queries. For other queries, SQL Server will create different plans for different parameters unless it uses the FORCED option.

DBCC FREEPROCCACHE

USE [AdventureWorks]
GO

DBCC FREEPROCCACHE
GO

--will create two plans: one for 50, another one is for @int
SELECT [ProductID] ,
   [SalesOrderID]
FROM   sales.[SalesOrderDetail]
WHERE  [ProductID] > 750
GO

SELECT usecountscounts, sql
FROM   sys.syscacheobjects
WHERE  dbid = 8

--will have two, one for the ad hoc one 850, another one is to reuse the parameterized one as the uecount is 2 now.
SELECT [ProductID] ,
   [SalesOrderID]
FROM   sales.[SalesOrderDetail]
WHERE  [ProductID] > 850
GO

SELECT usecountscounts, sql
FROM   sys.syscacheobjects
WHERE  dbid = 8

4.2.4      sp_executesql promotes plan reuse

--sp_executesql will create only the parameterized plan, no more ad hoc ones.
USE [AdventureWorks]
GO

DBCC FREEPROCCACHE
GO

EXEC [sys].[sp_executesql] N'SELECT p.[ProductID],p.[Name],p.[ProductNumber] FROM [Production].[Product] p
INNER JOIN [Production].[ProductDescription] pd
ON p.[ProductID]=pd.[ProductDescriptionID]
WHERE p.[ProductID]=@a', N'@a int', 320
GO

EXEC [sys].[sp_executesql] N'SELECT p.[ProductID],p.[Name],p.[ProductNumber] FROM [Production].[Product] p
INNER JOIN [Production].[ProductDescription] pd
ON p.[ProductID]=pd.[ProductDescriptionID]
WHERE p.[ProductID]=@a', N'@a int', 870
GO

SELECT *
FROM   sys.[syscacheobjects]
GO

4.2.5      Stored procedures – the best option

4.3      When does the recompile occur?
4.3.1      Object schema changes
4.3.2      sp_recompile
4.3.3      Some actions causes all cached plans gone from the instance
·       Detach a database
·       Server upgrade
·       DBCC FREEPROCCACHE
·       RECONFIGURE
·       ALTER DATABASE…MODIFY FILEGROUP
·       ALTER DATABASE…COLLATE
4.3.4      Some actions causes all cached plans gone from the database
·       ALTER DATABASE…set online/offline
·       ALTER DATABASE…SET EMERGENCY
·       DROP DATABASE
·       DBCC CHECKDB is done.
4.3.5      When a database is automatically shut down (what does it mean exactly?)
4.4      Changes on some of the SET options (e.g., ANSI_NULLS…)
4.5      Statistics changes make the plans recompile. See chart below.
4.6      Create procedure...with recompile and exec ....with recompile


5           How to view/trace the cached plans?
5.1      sys.syscacheobjects - view the cached objects
5.2      DBCC FREEPROCCACHE - clear the cached objects
5.3      Some Events in SQL TRACE/Profiler
5.3.1      CURSORS –cursorrecompile: if the objects used in the cursor have schema changes, the plan for the cursor will be recompiled
5.3.2      PERFORMANCE -auto stats: if auto create and auto update occurred
5.3.3      SP:CacheHit
5.3.4      SP:CacheInsert
5.3.5      SP:CacheMiss
5.3.6      SP:CacheRemove
5.3.7      SP:Recompile. It has the following EventSubClass
·        1 = Schema Changed
·        2 = Statistics Changed
·        3 = Recompile DNR
·        4 = Set Option Changed
·        5 = Temp Table Changed
·        6 = Remote Rowset Changed
·        7 = For Browse Perms Changed
·        8 = Query Notification Environment Changed
·        9 = MPI View Changed
·        10 = Cursor Options Changed
·        11 = With Recompile Option

Note: in SQL Server 2000, 只要批处理中的任何一句语句导致重新编译,就会重新编译整个批处理,无论此批处理是通过存储过程、触发器、即席批查询,还是通过预定义的语句进行提交 (that is, it uses the SP event). SQL2005和更高版本中,只会重新编译批处理中导致重新编译的语句。从这一点来看 SQL2005SQL2000更节约一点。因为这种变化,在SQL2005以后,要改用“TSQL-SQL:StmtRecompile” 这个事件来跟踪重编译,因为现在的重编译都发生在语句一级. This event did not exist in SQL Server 2000. See below for the event classes in SQL Server 2000.


5.4      Object Counters in Perfmon
5.4.1      SQLSERVER:BUFFER MANAGER: buffer cache hit ratio,lazy writes/sec ,procedure cache pages,total pages
5.4.2      SQLSERVERCache Manager: cache hit ratio,cache object counts,cache pages ,cache use
5.4.3      SQLSERVER:MEMORY MANAGER: sql cache memory(kb)
5.4.4      SQLSERVER:SQL STATISTICS: auto-param attmpts/sec,batch request/sec,failed auto-params/sec,safe auto-param/sec, sql compilations/sec,sql re-compilations/sec,unsafe auto-params/sec