1. What
are the operators?
Operators
describe how SQL Server executes a query or a Data Manipulation Language (DML)
statement. The query optimizer uses operators to build a query plan to create
the result specified in the query, or to perform the operation specified in the
DML statement. The query plan is a tree consisting of physical operators.
Because these
operators often iterate through rowsets, they are also called iterators.
You can view the
query plan by using the SET SHOWPLAN statements, the graphical execution plan
options in SQL Server Management Studio, or the SQL Server Profiler Showplan
event classes.
2. Logical
and Physical Operators
Logical
operators describe the relational algebraic operation used to process a
statement. In other words, logical operators describe conceptually what
operation needs to be performed.
Physical
operators implement the operation described by logical operators. Each physical
operator is an object or routine that performs an operation. For example, some
physical operators access columns or rows from a table, index or view. Other
physical operators perform other operations such as calculations, aggregations,
data integrity checks or joins. Physical operators have costs associated with
them.
The query optimizer creates a query plan as a tree consisting of logical operators. After the query optimizer creates the plan, the query optimizer chooses the most efficient physical operator for each logical operator. The query optimizer uses a cost-based approach to determine which physical operator will implement a logical operator.
Usually, a logical operation can be implemented by multiple physical operators. However, in rare cases, a physical operator can implement multiple logical operations as well.
3. Descriptions
of Operators:
Some operators are logical
only, some are physical only, some can be both, and others are neither such as those
for language elements (e.g., if, assign etc.).
Please note logical operators
can appear in the Showplan output as well.
Also please be aware that operators
may vary from version to version. Here is a list of operators possibly
appearing on the showplan output listed on Showplan
Logical and Physical Operators Reference:
Aggregate
Arithmetic Expression
Assert
Assign
Asnyc Concat
Bitmap
Bitmap Create
Bookmark Lookup
Branch Repartition
Broadcast
Build Hash
Cache
Clustered Index Delete
Clustered Index Insert
Clustered Index Merge
Clustered Index Scan
Clustered Index Seek
Clustered Index Update
Collapse
Columnstore Index Scan
Compute Scalar
Concatenation
Constant Scan
Convert
Cross Join
Catchall
Cursor
Declare
Delete
Deleted Scan
Distinct
Distinct Sort
Distribute Streams
Dynamic
Eager Spool
Fetch Query
Filter
Flow Distinct
Full Outer Join
Gather Streams
Hash Match
If
Inner Join
Insert
Inserted Scan
Intrinsic
Iterator
Key Lookup
Keyset
Language Element
Lazy Spool
Left Anti Semi Join
Left Outer Join
Left Semi Join
Log Row Scan
Merge Interval
Merge Join
Nested Loops
Nonclustered Index Delete
Index Insert
Index Scan
Index Seek
Index Spool
Nonclustered Index Update
Online Index Insert
Parallelism
Parameter Table Scan
Partial Aggregate
Population Query
Refresh Query
Remote Delete
Remote Index Scan
Remote Index Seek
Remote Insert
Remote Query
Remote Scan
Remote Update
Repartition Streams
Result
RID Lookup
Right Anti Semi Join
Right Outer Join
Right Semi Join
Row Count Spool
Segment
Segment Repartition
Sequence
Sequence Project
Snapshot
Sort
Split
Spool
Stream Aggregate
Switch
Table Delete
Table Insert
Table Merge
Table Scan
Table Spool
Table Update
Table-valued Function
Top
Top N Sort
UDX
Union
Update
While
Window Spool
4. Table Assess Operators
Scan
|
Seek
|
|
Heap
|
Table Scan
|
|
Clustered Index
|
Clustered Index Scan
|
Clustered Index Seek
|
Non-clustered Index
|
Index Scan
|
Index Seek
|
(Source:
http://blogs.msdn.com/b/craigfr/archive/2006/06/26/647852.aspx)
a. Table
Scan – scan the physical allocation order
Allocation
order scan or physical allocation order - If a table is
organized as a heap, then the only access method available to SQL Server is a table
scan. SQL Server uses Index Allocation Map (IAM) pages to do the scan in
physical allocation order. Even if you select only a few columns from this
table, and even if you use a very selective WHERE clause that limits the result
set to a single row like the following query shows, SQL Server uses the Table
Scan iterator to retrieve the data.
If the isolation level is Read
Uncommitted, or if you are working in a read-only environment. SQL Server can
use the allocation order scan when a
table is clustered as well. SQL Server uses an allocation order scan for a
clustered table if a query does not request any specific order.
An allocation order scan is
faster if a table is less physically fragmented; the scan is slower if the
physical fragmentation is higher. Allocation order scans are not affected by
the logical fragmentation because they are physical order scan.
b. Index
Scan – scan the logical order
Clustered index order scan or
logical order scan - When SQL Server scans a clustered index, it can also scan
in the logical order of the index by using the index order scan. SQL Server
uses the index leaf–level’s linked list to perform an index order scan. Index
order scans are affected negatively by both logical and physical fragmentation.
Nonclustered index order scan
or logical order scan - As with the Clustered Index Scan iterator, SQL Server
can perform an allocation or an index order scan when scanning a nonclustered
index. In addition, a nonclustered index can also cover a query. Covering a
query means that SQL Server can find all data needed for the query in a
nonclustered index and does not need to do any lookups in the base table.
c. Index
Seek - A seek efficiently returns rows from one or
more ranges of an index based on a predicate.
Note 1: SQL Server does not have an operator for partial
scan. But Index Seek can be implemented as starting with an index seek, then with
a partial scan, if the returned data is ordered. This is true for both
clustered and nonclustered index. See Kalen Delaney’s Ordered
Seeks and Scans for details
Note 2: Maybe the most common access method SQL Server
uses in online transaction processing (OLTP) environments is a nonclustered
index seek with an ordered partial scan and then a lookup into the base table
for each row found in the nonclustered index. Such plans are common for
selective queries. The base table can be organized as a heap or as a balanced
tree.
When the table is organized as
a heap, SQL Server uses the RID Lookup operator to retrieve the rows from the
base table. SQL Server finds rows in the base table by using their RIDs.
If a table is clustered, then
SQL Server uses the Key Lookup operator instead of the RID Lookup operator.
In very rare cases, SQL Server
can also decide to use an unordered nonclustered index scan and then perform
either key or RID lookups in the base table. In order to get such a plan, a query
must be selective enough, no optimal covering nonclustered index can be included,
and the index used must not maintain the sought keys in order.
5. Operators
for Join Algorithms
a. Nested
loops
The nested loops algorithm is
a very simple and, in many cases, efficient algorithm. SQL Server uses one
table for the outer loop, typically the table with fewer rows. For each row in
this outer input, SQL Server seeks for matching rows in the second table, which
is the inner table. SQL Server uses the join condition to find the matching
rows. The join can be a non-equijoin, meaning that the Equals operator does not
need to be part of the join predicate. If the inner table has no supporting
index for a seek, then SQL Server scans the inner input for each row of the
outer input. This is not an efficient scenario. A nested loops join is
efficient when SQL Server can perform an index seek in the inner input. The
following query uses the Nested Loops iterator to join the Sales.Orders and the
Sales.OrderDetails tables. Note that the query filters orders to create smaller
inputs; without the WHERE clause, SQL Server would use the merge join
algorithm.
b. Merge
joins,
Merge join is a very efficient
join algorithm. However, it has its own limitations. It needs at least one
equijoin predicate and sorted inputs from both sides. This means that the merge
join should be supported by indexes on both tables involved in the join. In
addition, if one input is much smaller than another, then the nested loops join
could be more efficient than a merge join.
The merge join scans both
inputs only once. It starts by finding the first rows on both sides. If the end
of input is not reached, the merge join checks the join predicate to determine
whether the rows match. If the rows match, they are added to the output. Then
the algorithm checks the next rows from the other side and adds them to the
output until they match the predicate. If the rows from the inputs do not
match, then the algorithm reads the next row from the side with the lower
value. It reads from this side and compares the row to the row from the other
side until the value is bigger than the value from the other side. Then it
continues reading from the other side, and so on.
In a many-to-many scenario,
the merge join algorithm uses a worktable to put the rows from one input side
aside for reusage when duplicate matching rows from the other input exist.
The following query uses the
Merge Join iterator to join the Sales.Orders and the Sales.OrderDetails tables.
Note that the query uses an equijoin. Both inputs are supported by a clustered
index.
SELECT O.custid,
O.orderdate, OD.orderid, OD.productid, OD.qty
FROM Sales.Orders AS
O
INNER JOIN Sales.OrderDetails
AS OD
ON O.orderid =
OD.orderid;
If none of the inputs is
supported by an index and an equijoin predicate is used, then the hash join
algorithm might be the most efficient one.
c. Hash
joins.
Hash joins uses a searching
structure named a hash table. This is not a searching structure you can build,
like a balanced tree used for indexes. SQL Server builds the hash table
internally. It uses a hash function to split the rows from the smaller input
into buckets. This is the build phase. SQL Server uses the smaller input for
building the hash table because SQL Server wants to keep the hash table in memory.
If it needs to get spilled to disk, then the algorithm might become much
slower. The hash function creates buckets of approximately equal size.
After the hash table is built,
SQL Server applies the hash function on each of the rows from the other input.
It checks to see into which bucket the row fits. Then it scans through all rows
from the bucket. This phase is called the probe phase.
A hash join is a kind of
compromise between creating a full balanced tree index and then using a
different join algorithm, and performing a full scan of one side’s input for
each row of the other input. At least in the first phase, a seek of the
appropriate bucket is used.
You might think that the hash
join algorithm is not efficient. It is true that in a single-thread mode it is
usually slower than merge and nested loops join algorithms that are supported
by existing indexes. However, SQL Server can split rows from the probe input in
advance, and perform partial joins in multiple threads. The hash join is
actually very scalable. This kind of optimization of a hash join is called a
bitmap filtered hash join. It is typically used in a data warehousing scenario,
where you can have large inputs for a query and few concurrent users, so SQL
Server can execute a query in parallel. Although a regular hash join can be
executed in parallel as well, the bitmap filtered hash join is even more efficient,
because SQL Server can use bitmaps for early elimination of rows not used in
the join from the bigger table involved in the join. A bitmap filtered hash
join could be treated as the fourth algorithm, or as an enhancement of the
third, the hash algorithm.
The following two queries
create two heaps that don’t have an index from the Sales.Orders and the
Sales.OrderDetails tables.
SELECT orderid, productid, unitprice, qty, discount
INTO Sales.OrderDetailsHeap
FROM Sales.OrderDetails;
SELECT orderid, custid,
orderdate
INTO Sales.OrdersHeap
FROM Sales.Orders;
The next query uses the hash
join algorithm to join the tables.
SELECT O.custid,
O.orderdate, OD.orderid, OD.productid, OD.qty
FROM Sales.OrdersHeap AS O
INNER JOIN Sales.OrderDetailsHeap
AS OD
ON O.orderid =
OD.orderid;
6. Other
plan Iterators
a. Sort
SQL Server uses the Sort
operator whenever it has to sort an input. There might be many reasons to sort
the input. For example, SQL Server might decide to sort an input so it can use the
merge join algorithm. A very typical example of Sort operator usage is for
queries that request an ordered rowset when the order is not supported by an
index.
The following query requests
an ordered rowset. The rowset should be ordered by the qty column of the
Sales.OrderDetails table. However, the table has no index on this column.
SELECT orderid, productid, qty
FROM Sales.OrderDetails
ORDER BY qty;
b. Streaming
Aggregations
SQL Server uses two different
algorithms for calculating aggregations. If an input is ordered by the columns
used in the GROUP BY clause, then SQL Server uses the stream aggregation algorithm,
which is implemented in the Stream Aggregate operator. Stream aggregation is
very efficient. SQL Server might even decide to sort the input before
performing the aggregation in order to make it possible to use the Stream
Aggregate operator.
The following query uses the
Stream Aggregate operator. Note that it groups rows from the Sales.OrderDetails
table by the productid column. A nonclustered index over this column exists. In
addition, because the query does not require any other column, the index is
covering.
SELECT productid, COUNT(*) AS num
FROM Sales.OrderDetails
GROUP BY productid;
c. Hash Aggregation
If the input for the
aggregation is not ordered and the input is so big that sorting it would be
inefficient, then SQL Server uses the hash aggregation algorithm. The operator
used for this kind of aggregation is the Hash Match Aggregate operator.
The
icon is the same as the icon for the Hash Match Join operator. The hash
aggregation algorithm builds the hash table from the input like it builds it
for a hash join. However, the buckets are used to store the groups.
The following query groups
rows from the Sales.OrderDetails table by the qty column; the aggregation is
not supported by an index.
SELECT qty, COUNT(*) AS num
FROM Sales.OrderDetails
GROUP BY qty;