Summary: We have three primary ways to get the execution plan:
1)
Display
Estimated or Include Actual Execution Plan (omitted),
2)
SET
Commands (SET SHOWPLAN_ALL ON, SET SHOWPLAN_XML ON, and SET STATISTICS
PROFILE ON) and
3)
SQL
Trace.
In understanding the execution plan, one key part is to understand the UNION option. 在SQL Server做联接的时候,会按照输入数据集所基于的表格的结构,衡量可能利用的索引,也根据统计信息,预估两个输入数据集的大小,选择使用三种Join方式其中的一种。如果选得不对,可能就会造成Join的速度非常慢。There are other operators such as aggregation, sort, union, and parallelism. SET STATISTICS IO ON and SET STATISTICS TIME ON are also useful in understanding the execution plan, especially for a single query.
1. Two ways to get execution plan
1) The SET command – USED FOR TESTING A SINGLE QUERY
I.
SET SHOWPLAN_ALL ON - Causes Microsoft SQL
Server not to execute Transact-SQL statements. Instead, SQL Server returns
detailed information about how the statements are executed and provides
estimates of the resource requirements for the statements. Return the results
before executing the statements.
II.
SET SHOWPLAN_XML ON - Causes SQL Server not
to execute Transact-SQL statements. Instead, SQL Server returns detailed
information about how the statements are going to be executed in the form of a
well-defined XML document. Only this option can display the missing indexes and
if any plans reused. Return the results before executing the statements as
well.
III.
SET STATISTICS PROFILE ON - Displays
the profile information for a statement. STATISTICS PROFILE works for ad hoc
queries, views, and stored procedures. When STATISTICS PROFILE is ON, each
executed query returns its regular result set, followed by an additional result
set that shows a profile of the query execution. The additional result set
contains the SHOWPLAN_ALL columns for the query and these additional columns:
Rows - Actual number of rows produced by each operator, and Executes -
Number of times the operator has been executed. It return the results after
executing the statements!
USE [AdventureWorks]
GO
SET SHOWPLAN_ALL ON
GO
--只返回执行计划,没有结果集返回
--执行计划里只有“EsitmateRows”,没有“Rows”(实际行数)
SELECT a.[SalesOrderID] ,
a.[OrderDate] ,
a.[CustomerID] ,
b.[SalesOrderDetailID] ,
b.[ProductID] ,
b.[OrderQty] ,
b.[UnitPrice]
FROM
[dbo].[SalesOrderHeader_test]
a
INNER JOIN [dbo].[SalesOrderDetail_test]
b ON a.[SalesOrderID] = b.[SalesOrderID]
WHERE
a.[SalesOrderID] = 43659
GO
SET SHOWPLAN_ALL OFF
GO
SET STATISTICS PROFILE ON
GO
--先返回结果集,再返回执行计划
--执行计划里有“EsitmateRows”,也有“Rows”(实际行数)
SELECT a.[SalesOrderID] ,
a.[OrderDate] ,
a.[CustomerID] ,
b.[SalesOrderDetailID] ,
b.[ProductID] ,
b.[OrderQty] ,
b.[UnitPrice]
FROM
[dbo].[SalesOrderHeader_test]
a
INNER JOIN [dbo].[SalesOrderDetail_test]
b ON a.[SalesOrderID] = b.[SalesOrderID]
WHERE
a.[SalesOrderID] = 43659
GO
SET STATISTICS PROFILE OFF
GO
IV.
Other helpful SET commands: SET STATISTICS IO
ON, SET STATISTICS TIME ON
-- 清除buffer pool里的所有缓存的数据
DBCC DROPCLEANBUFFERS
GO
SET STATISTICS IO ON
GO
SELECT DISTINCT
ProductID ,
UnitPrice
FROM
dbo.SalesOrderDetail_test
WHERE
ProductID = 777
GO
-- 清除buffer pool里的所有缓存的数据
dbcc dropcleanbuffers
-- 清除buffer pool里的所有缓存的执行计划
DBCC freeproccache
go
Set statistics time on
go
select distinct ProductID, UnitPrice
from dbo.SalesOrderDetail_test
where ProductID = 777
union
select ProductID, UnitPrice
from dbo.SalesOrderDetail_test
where ProductID = 777
go
Set statistics time off
GO
2) SQL Trace
I.
PERFORMANCE:
SHOWPLAN ALL (=SET SHOWPLAN_ALL ON)
II.
PERFORMANCE: SHOWPLAN STATISTICS PROFILE (=SET
STATISTICS PROFILE ON)
III.
PERFORMANCE: SHOWPLAN XML STATISTICS PROFILE (=SET
SHOWPLAN_XML ON)
2. THREE TYPES OF JOIN
1) Nested Loops Join
I.
Logic
FOR each row r1
IN the OUTER TABLE
begin
FOR each
row r2 IN the INNER TABLE
IF
r1 joins WITH r2
OUTPUT(r1,r2)
IF r1 did
NOT JOIN
OUTPUT(r1,NULL)
END
II.
Chart
III.
Pros: no additional spaces are needed, tempdb is not needed, save memory as well
IV.
Cons: not good for big tables, it will scan the
table a*b times (a is the number of rows met criteria in the outer table)
V.
When is it used?
·
Small tables or the number of row in the outer
table met the conditions is small.
VI.
How to improve?
·
Indexing the outer table
·
Indexing the searched column in the inner table
as well
2) Merge Join
I.
Logic
GET FIRST row r1
FROM INPUT1
GET FIRST row r2
FROM INPUT2
WHILE NOT
at the END OF either INPUT
begin
IF r1
joins WITH r2
begin
OUTPUT(r1,r2)
GET NEXT
row r2 FROM INPUT 2
END
ELSE
IF
r1>r2 GET NEXT row r1 FROM INPUT 1
ELSE GET
NEXT row2 FROM INPUT 2
END
--也就是说,从两边的数据集里各取一个值,比较一下。如果相等,就把这两行联接起来返回。
--如果不相等,那就把小的那个值丢掉,按顺序取下一个更大的。两边的数据集有一边遍历
--结束,整个join的过程就结束。所以整个算法的复杂度最大就是大的那个数据集里的记录
--数量,这个比起nested loops join两个数据集相乘的复杂度,的确是小了很多。所以
--在数据集很大的情况下,merge join的优势是非常明显的
II.
Chart
III.
Pros: not
a*b, when one table is done, the join is done. Much less computation than
nested loops for large tables.
IV.
Cons: the tables need to be sorted before the
join. If there are multiple rows for a value, SQL Server needs to create
worktable in tempdb or memory. This Many-to-Many join
could be very expensive.
V.
When is it used?
·
For big tables.
VI.
How to improve?
·
Add unique for the indexes if ever possible!!!
3) Hash Join (Build and Probe the hash table)
I.
Logic
for each row R1 in the build table
begin
calculate hash value on join key(s) of R1
insert R1 into the appropriate hash bucket
end
for each row R2 in the probe table
begin
calculate hash value on join key(s) of R2
for each row R1 in the corresponding hash bucket
if R1 joins with R2
output (R1, R2)
end
算法描述:
选择两个需要join的表中的一个a,对于a中的每一个记录R1,计算其联接列的hash值,然后根据hash值将R1插入到hash bucket当中。
选择另外一张表b,对于b中的每一条记录R2,我们也计算其联接列的hash值,然后去hash bucket上查找。如果hash bucket上有R1能够跟R2进行连接,那么久输出(R1,R2)的联接结果,可能有多个R1的记录。
II.
Chart
(This is only the
part on building the hash table)
III.
Pros: only need to scan the two tables once, do
not need to sort or have indexes, support multiple CPUs executions
IV.
Cons: need to use CPU, memory and tempdb to build the hash table and probe the data.
V.
When to use it?
·
Large tables without indexes
VI.
How to improve?
·
Add indexes
3. Comparisons of the three joins
4. Other operations
1) Aggregation
I.
For aggregation calculations
II.
Two types:
·
stream aggregation - stream first, then
aggregate
·
hash aggregation – build the hash table first,
then aggregate
2) UNION
I.
Union
All
II.
Union
- more expensive