1. Clustered Index Structure
1.1. Architecture
1.2. Main Characteristics
1) Indexes are organized as B-trees.
2) Data is stored in order based on the clustered index key at the leaf level
3) When to use it? Almost always recommended.
2. Non-clustered Index Structure
2.1. Architecture
2.2. Main Characteristics
1) Nonclustered indexes have the same B-tree structure as clustered indexes, except for the following significant differences:
· The data rows of the underlying table are not sorted and stored in order based on their nonclustered keys.
· The leaf layer of a nonclustered index is made up of index pages instead of data pages as in the Clustered Index Structure. So nonclustered indexes have a structure separate from the data rows.
2) Nonclustered indexes can be defined on a table or view with a clustered index or a heap.
3) Each index row in the nonclustered index contains the nonclustered key value and a row locator. This locator points to the data row in the clustered index or heap having the key value:
· If the table is a heap, which means it does not have a clustered index, the row locator is a pointer to the row. The pointer is built from the file identifier (ID), page number, and number of the row on the page. The whole pointer is known as a Row ID (RID).
· If the table has a clustered index, or the index is on an indexed view, the row locator is the clustered index key for the row.
2.3. When to
use the non-clustered index
1)
Nonclustered indexes
should be designed to improve the performance of frequently used queries that
are NOT covered by the clustered index.
2)
Primarily read-only and low update tables can
benefit from many nonclustered indexes.
3) Consider using a nonclustered index for queries that have the following attributes:
·
Use JOIN or GROUP BY clauses.
Create multiple nonclustered indexes on columns involved in join and
grouping operations, and a clustered index on any foreign key columns.
·
Queries that do not return large result sets.
Create filtered
indexes to cover queries that return a well-defined subset of rows from a large
table.
·
Contain columns frequently involved in search
conditions of a query, such as WHERE clause, that return exact matches.
3. Heap
Structure – a table without a clustered index.
3.1. Architecture
3.2. Main
Characteristics
1)
SQL Server uses the IAM (SQL Server uses the
(Index Allocation Map) pages to move through the heap.
2)
Data is not stored in any particular
order. Although usually data is initially stored in the order in which is
the rows are inserted into the table, but the Database Engine can move data
around in the heap to store the rows efficiently; so the data order cannot be
predicted.
3)
Data pages are not linked, so sequential access
needs to refer back to the index allocation map (IAM) pages
3.3. When to
use a heap table?
1)
If a table is a heap AND does not have any nonclustered indexes, then the entire table must be
examined (a table scan) to find any row. This can be acceptable when the table
is tiny, such as a list of the 12 regional offices of a company.
2)
When a table is stored as a heap, individual
rows are identified by reference to a row identifier (RID) consisting of the
file number, data page number, and slot on the page. The row id is a small and
efficient structure. Sometimes data architects use heaps when data is always
accessed through nonclustered indexes and the RID is
smaller than a clustered index key.
4. Will
indexes require more spaces? – Nonclustered indexes do,
but clustered index doesn’t!
--Comparison of the Space Usage by Different Table Structures
--Conclusion: nonclustered index table requires more space to store the non-clustered indexes.
--Case 1: A heap table, no indexes at all.
USE [AdventureWorks2012]
GO
CREATE TABLE [Sales].[salesorderdetail_hash]
(
[salesorderid]
[INT] NOT NULL,
[salesorderdetailid]
[INT],
[carriertrackingnumber]
[NVARCHAR](25) NULL,
[orderqty]
[SMALLINT] NOT NULL,
[productid]
[INT] NOT NULL,
[specialofferid]
[INT] NOT NULL,
[unitprice]
[MONEY] NOT NULL,
[unitpricediscount]
[MONEY] NOT NULL,
[linetotal] NUMERIC(38, 6),
[rowguid]
[UNIQUEIDENTIFIER],
[modifieddate]
[DATETIME]
)
ON [PRIMARY]
INSERT INTO [Sales].[salesorderdetail_hash]
SELECT *
FROM
[Sales].[salesorderdetail]
SELECT *
FROM sys.Dm_db_index_physical_stats(Db_id('Adventureworks2012'),
Object_id('[Sales].[SalesOrderDetail_hash]'), NULL, NULL, NULL)
/****Notice the
heap tables requires 1494 pages
database_id object_id index_id index_type_desc alloc_unit_type_desc
page_count
7
388196433 0 HEAP
IN_ROW_DATA
1494
*/
-- Case 2: The same table, but with a clustered index
CREATE TABLE [Sales].[salesorderdetail_c]
(
[salesorderid]
[INT] NOT NULL,
[salesorderdetailid]
[INT],
[carriertrackingnumber]
[NVARCHAR](25) NULL,
[orderqty]
[SMALLINT] NOT NULL,
[productid]
[INT] NOT NULL,
[specialofferid]
[INT] NOT NULL,
[unitprice]
[MONEY] NOT NULL,
[unitpricediscount]
[MONEY] NOT NULL,
[linetotal] NUMERIC(38, 6),
[rowguid]
[UNIQUEIDENTIFIER],
[modifieddate]
[DATETIME],
CONSTRAINT [PK_SalesOrderDetailC_SalesOrderID_SalesOrderDetailID] PRIMARY
KEY CLUSTERED ( [salesorderid] ASC, [salesorderdetailid] ASC ) ON [PRIMARY]
)
ON [PRIMARY]
INSERT INTO [Sales].[salesorderdetail_c]
SELECT *
FROM
[Sales].[salesorderdetail]
SELECT *
FROM sys.Dm_db_index_physical_stats(Db_id('Adventureworks2012'),
Object_id('[Sales].[SalesOrderDetail_C]'), NULL, NULL, NULL)
/****Notice the
clustered-index tables also requires 1494 pages
database_id object_id index_id index_type_desc alloc_unit_type_desc
page_count
7
404196490 1
CLUSTERED INDEX IN_ROW_DATA
1494
*/
-- Case 3: The same table, but with a nonclustered index on the Primary Key columns. So it is a heap + a B-tree.
CREATE TABLE [Sales].[salesorderdetail_n]
(
[salesorderid]
[INT] NOT NULL,
[salesorderdetailid]
[INT],
[carriertrackingnumber]
[NVARCHAR](25) NULL,
[orderqty]
[SMALLINT] NOT NULL,
[productid]
[INT] NOT NULL,
[specialofferid]
[INT] NOT NULL,
[unitprice]
[MONEY] NOT NULL,
[unitpricediscount]
[MONEY] NOT NULL,
[linetotal] NUMERIC(38, 6),
[rowguid]
[UNIQUEIDENTIFIER],
[modifieddate]
[DATETIME],
CONSTRAINT [PK_SalesOrderDetailN_SalesOrderID_SalesOrderDetailID] PRIMARY
KEY NONCLUSTERED ( [salesorderid] ASC, [salesorderdetailid] ASC ) ON
[PRIMARY]
)
ON [PRIMARY]
INSERT INTO [Sales].[salesorderdetail_n]
SELECT *
FROM
[Sales].[salesorderdetail]
go
SELECT *
FROM sys.Dm_db_index_physical_stats(Db_id('Adventureworks2012'),
Object_id('[Sales].[SalesOrderDetail_N]'), NULL, NULL, NULL)
/****Notice
the nonclustered-index tables requires additional 330
pages.
database_id object_id index_id index_type_desc alloc_unit_type_desc
page_count
7
436196604 0
HEAP
IN_ROW_DATA 1494
7
436196604 0
NONCLUSTERED INDEX IN_ROW_DATA
330
*/
5. Table
Organization
Tables and indexes are stored as a collection of 8-KB pages (but log data are not!).
The following illustration shows the organization of a table.
So what is
partition?
A partition is a
user-defined unit of data organization. By default, a table or index has only
one partition that contains all the table or index pages. Example:
SELECT * FROM sys.partitions WHERE object_id = object_id('[Sales].[SalesOrderDetail_N]')
/****Notice
that this is a non-partitioned table, even with two index IDs
partition_id object_id index_id partition_number hobt_id
rows
72057594070237184
36196604 0
1
72057594070237184
121317
72057594070302720
436196604 2
1
72057594070302720 121317
*/
When a table or
index uses multiple partitions, the data is partitioned horizontally so that
groups of rows are mapped into individual partitions, based on a specified
column. The partitions can be put on one or more filegroups
in the database. You can use sys.partitions to view
the partitions used by a table or index.
When a heap or a
clustered table has multiple partitions, each partition has a heap or B-tree
structure that contains the group of rows for that specific partition. For
example, if a clustered table has four partitions, there are four B-trees; one
in each partition.
For an XML column
in a table, one primary and several secondary XML indexes can be created. An
XML index is a shredded and persisted representation of the XML binary large
objects (BLOBs) in the xml data type column. XML indexes,
different from the regular indexes, are stored as internal tables. You can use sys.xml_indexes and sys.internal_tables to find the
information about xml indexes.