Summary: We have three primary ways to get the execution plan:

1) Display Estimated or Include Actual Execution Plan (omitted),

2) SET Commands (SET SHOWPLAN_ALL ON, SET SHOWPLAN_XML ON, and SET STATISTICS PROFILE ON) and

3) SQL Trace.

In understanding the execution plan, one key part is to understand the UNION option. 在SQL Server做联接的时候，会按照输入数据集所基于的表格的结构，衡量可能利用的索引，也根据统计信息，预估两个输入数据集的大小，选择使用三种Join方式其中的一种。如果选得不对，可能就会造成Join的速度非常慢。There are other operators such as aggregation, sort, union, and parallelism. SET STATISTICS IO ON and SET STATISTICS TIME ON are also useful in understanding the execution plan, especially for a single query.

1. Two ways to get execution plan

1) The SET command – USED FOR TESTING A SINGLE QUERY

I. SET SHOWPLAN_ALL ON - Causes Microsoft SQL Server not to execute Transact-SQL statements. Instead, SQL Server returns detailed information about how the statements are executed and provides estimates of the resource requirements for the statements. Return the results before executing the statements.

II. SET SHOWPLAN_XML ON - Causes SQL Server not to execute Transact-SQL statements. Instead, SQL Server returns detailed information about how the statements are going to be executed in the form of a well-defined XML document. Only this option can display the missing indexes and if any plans reused. Return the results before executing the statements as well.

III. SET STATISTICS PROFILE ON - Displays the profile information for a statement. STATISTICS PROFILE works for ad hoc queries, views, and stored procedures. When STATISTICS PROFILE is ON, each executed query returns its regular result set, followed by an additional result set that shows a profile of the query execution. The additional result set contains the SHOWPLAN_ALL columns for the query and these additional columns: Rows - Actual number of rows produced by each operator, and Executes - Number of times the operator has been executed. It return the results after executing the statements!

USE [AdventureWorks]

SET SHOWPLAN_ALL ON

--只返回执行计划，没有结果集返回

--执行计划里只有“EsitmateRows”，没有“Rows”(实际行数)

SELECT a.[SalesOrderID] ,

a.[OrderDate] ,

a.[CustomerID] ,

b.[SalesOrderDetailID] ,

b.[ProductID] ,

b.[OrderQty] ,

b.[UnitPrice]

FROM [dbo].[SalesOrderHeader_test] a

INNER JOIN [dbo].[SalesOrderDetail_test] b ON a.[SalesOrderID] = b.[SalesOrderID]

WHERE a.[SalesOrderID] = 43659

SET SHOWPLAN_ALL OFF

SET STATISTICS PROFILE ON

--先返回结果集，再返回执行计划

--执行计划里有“EsitmateRows”，也有“Rows”(实际行数)

SELECT a.[SalesOrderID] ,

a.[OrderDate] ,

a.[CustomerID] ,

b.[SalesOrderDetailID] ,

b.[ProductID] ,

b.[OrderQty] ,

b.[UnitPrice]

FROM [dbo].[SalesOrderHeader_test] a

INNER JOIN [dbo].[SalesOrderDetail_test] b ON a.[SalesOrderID] = b.[SalesOrderID]

WHERE a.[SalesOrderID] = 43659

SET STATISTICS PROFILE OFF

IV. Other helpful SET commands: SET STATISTICS IO ON, SET STATISTICS TIME ON

-- 清除buffer pool里的所有缓存的数据

DBCC DROPCLEANBUFFERS

SET STATISTICS IO ON

SELECT DISTINCT

ProductID ,

UnitPrice

FROM dbo.SalesOrderDetail_test

WHERE ProductID = 777

-- 清除buffer pool里的所有缓存的数据

dbcc dropcleanbuffers

-- 清除buffer pool里的所有缓存的执行计划

DBCC freeproccache

Set statistics time on

select distinct ProductID, UnitPrice

from dbo.SalesOrderDetail_test

where ProductID = 777

union

select ProductID, UnitPrice

from dbo.SalesOrderDetail_test

where ProductID = 777

Set statistics time off

2) SQL Trace

I. PERFORMANCE: SHOWPLAN ALL (=SET SHOWPLAN_ALL ON)

II. PERFORMANCE: SHOWPLAN STATISTICS PROFILE (=SET STATISTICS PROFILE ON)

III. PERFORMANCE: SHOWPLAN XML STATISTICS PROFILE (=SET SHOWPLAN_XML ON)

2. THREE TYPES OF JOIN

1) Nested Loops Join

I. Logic

FOR each row r1 IN the OUTER TABLE

begin

FOR each row r2 IN the INNER TABLE

IF r1 joins WITH r2

OUTPUT(r1,r2)

IF r1 did NOT JOIN

OUTPUT(r1,NULL)

END

II. Chart

III. Pros: no additional spaces are needed, tempdb is not needed, save memory as well

IV. Cons: not good for big tables, it will scan the table a*b times (a is the number of rows met criteria in the outer table)

V. When is it used?

· Small tables or the number of row in the outer table met the conditions is small.

VI. How to improve?

· Indexing the outer table

· Indexing the searched column in the inner table as well

2) Merge Join

I. Logic

GET FIRST row r1 FROM INPUT1

GET FIRST row r2 FROM INPUT2

WHILE NOT at the END OF either INPUT

begin

IF r1 joins WITH r2

begin

OUTPUT(r1,r2)

GET NEXT row r2 FROM INPUT 2

END

ELSE

IF r1>r2 GET NEXT row r1 FROM INPUT 1

ELSE GET NEXT row2 FROM INPUT 2

END

--也就是说，从两边的数据集里各取一个值，比较一下。如果相等，就把这两行联接起来返回。

--如果不相等，那就把小的那个值丢掉，按顺序取下一个更大的。两边的数据集有一边遍历

--结束，整个join的过程就结束。所以整个算法的复杂度最大就是大的那个数据集里的记录

--数量，这个比起nested loops join两个数据集相乘的复杂度，的确是小了很多。所以

--在数据集很大的情况下，merge join的优势是非常明显的

II. Chart

III. Pros: not a*b, when one table is done, the join is done. Much less computation than nested loops for large tables.

IV. Cons: the tables need to be sorted before the join. If there are multiple rows for a value, SQL Server needs to create worktable in tempdb or memory. This Many-to-Many join could be very expensive.

V. When is it used?

· For big tables.

VI. How to improve?

· Add unique for the indexes if ever possible!!!

3) Hash Join （Build and Probe the hash table）

I. Logic

for each row R1 in the build table

begin

calculate hash value on join key(s) of R1

insert R1 into the appropriate hash bucket

end

for each row R2 in the probe table

begin

calculate hash value on join key(s) of R2

for each row R1 in the corresponding hash bucket

if R1 joins with R2

output (R1, R2)

end

算法描述：

选择两个需要join的表中的一个a，对于a中的每一个记录R1，计算其联接列的hash值，然后根据hash值将R1插入到hash bucket当中。

选择另外一张表b，对于b中的每一条记录R2，我们也计算其联接列的hash值，然后去hash bucket上查找。如果hash bucket上有R1能够跟R2进行连接，那么久输出(R1,R2)的联接结果，可能有多个R1的记录。

II. Chart

(This is only the part on building the hash table)

III. Pros: only need to scan the two tables once, do not need to sort or have indexes, support multiple CPUs executions

IV. Cons: need to use CPU, memory and tempdb to build the hash table and probe the data.

V. When to use it?

· Large tables without indexes

VI. How to improve?

· Add indexes

3. Comparisons of the three joins

4. Other operations

1) Aggregation

I. For aggregation calculations

II. Two types:

· stream aggregation - stream first, then aggregate

· hash aggregation – build the hash table first, then aggregate

2) UNION

I. Union All

II. Union - more expensive

On the Journey to be a SQL Server Professional

Understand the execution plan

1. Two ways to get execution plan

1) The SET command – USED FOR TESTING A SINGLE QUERY

2) SQL Trace

2. THREE TYPES OF JOIN

1) Nested Loops Join

2) Merge Join

3) Hash Join （Build and Probe the hash table）

3. Comparisons of the three joins

4. Other operations

1) Aggregation

2) UNION

3) PARALLELISM – if the cost is high and you have multiple CPUs.