1. Clustered Index Structure

1.1. Architecture

1.2. Main Characteristics

1) Indexes are organized as B-trees.

2) Data is stored in order based on the clustered index key at the leaf level

3) When to use it? Almost always recommended.

2. Non-clustered Index Structure

2.1. Architecture

2.2. Main Characteristics

1) Nonclustered indexes have the same B-tree structure as clustered indexes, except for the following significant differences:

· The data rows of the underlying table are not sorted and stored in order based on their nonclustered keys.

· The leaf layer of a nonclustered index is made up of index pages instead of data pages as in the Clustered Index Structure. So nonclustered indexes have a structure separate from the data rows.

2) Nonclustered indexes can be defined on a table or view with a clustered index or a heap.

3) Each index row in the nonclustered index contains the nonclustered key value and a row locator. This locator points to the data row in the clustered index or heap having the key value:

· If the table is a heap, which means it does not have a clustered index, the row locator is a pointer to the row. The pointer is built from the file identifier (ID), page number, and number of the row on the page. The whole pointer is known as a Row ID (RID).

· If the table has a clustered index, or the index is on an indexed view, the row locator is the clustered index key for the row.

2.3. When to use the non-clustered index

1) Nonclustered indexes should be designed to improve the performance of frequently used queries that are NOT covered by the clustered index.

2) Primarily read-only and low update tables can benefit from many nonclustered indexes.

3) Consider using a nonclustered index for queries that have the following attributes:

· Use JOIN or GROUP BY clauses.

Create multiple nonclustered indexes on columns involved in join and grouping operations, and a clustered index on any foreign key columns.

· Queries that do not return large result sets.

Create filtered indexes to cover queries that return a well-defined subset of rows from a large table.

· Contain columns frequently involved in search conditions of a query, such as WHERE clause, that return exact matches.

3. Heap Structure – a table without a clustered index.

3.1. Architecture

3.2. Main Characteristics

1) SQL Server uses the IAM (SQL Server uses the (Index Allocation Map) pages to move through the heap.

2) Data is not stored in any particular order. Although usually data is initially stored in the order in which is the rows are inserted into the table, but the Database Engine can move data around in the heap to store the rows efficiently; so the data order cannot be predicted.

3) Data pages are not linked, so sequential access needs to refer back to the index allocation map (IAM) pages

3.3. When to use a heap table?

1) If a table is a heap AND does not have any nonclustered indexes, then the entire table must be examined (a table scan) to find any row. This can be acceptable when the table is tiny, such as a list of the 12 regional offices of a company.

2) When a table is stored as a heap, individual rows are identified by reference to a row identifier (RID) consisting of the file number, data page number, and slot on the page. The row id is a small and efficient structure. Sometimes data architects use heaps when data is always accessed through nonclustered indexes and the RID is smaller than a clustered index key.

4. Will indexes require more spaces? – Nonclustered indexes do, but clustered index doesn’t!

--Comparison of the Space Usage by Different Table Structures

--Conclusion: nonclustered index table requires more space to store the non-clustered indexes.

--Case 1: A heap table, no indexes at all.

USE [AdventureWorks2012]

CREATE TABLE [Sales].[salesorderdetail_hash]

(

[salesorderid] [INT] NOT NULL,

[salesorderdetailid] [INT],

[carriertrackingnumber] [NVARCHAR](25) NULL,

[orderqty] [SMALLINT] NOT NULL,

[productid] [INT] NOT NULL,

[specialofferid] [INT] NOT NULL,

[unitprice] [MONEY] NOT NULL,

[unitpricediscount] [MONEY] NOT NULL,

[linetotal] NUMERIC(38, 6),

[rowguid] [UNIQUEIDENTIFIER],

[modifieddate] [DATETIME]

)

ON [PRIMARY]

INSERT INTO [Sales].[salesorderdetail_hash]

SELECT *

FROM [Sales].[salesorderdetail]

SELECT *

FROM sys.Dm_db_index_physical_stats(Db_id('Adventureworks2012'),

Object_id('[Sales].[SalesOrderDetail_hash]'), NULL, NULL, NULL)

/****Notice the heap tables requires 1494 pages

database_id object_id index_id index_type_desc alloc_unit_type_desc page_count

7 388196433 0 HEAP IN_ROW_DATA 1494

-- Case 2: The same table, but with a clustered index

CREATE TABLE [Sales].[salesorderdetail_c]

(

[salesorderid] [INT] NOT NULL,

[salesorderdetailid] [INT],

[carriertrackingnumber] [NVARCHAR](25) NULL,

[orderqty] [SMALLINT] NOT NULL,

[productid] [INT] NOT NULL,

[specialofferid] [INT] NOT NULL,

[unitprice] [MONEY] NOT NULL,

[unitpricediscount] [MONEY] NOT NULL,

[linetotal] NUMERIC(38, 6),

[rowguid] [UNIQUEIDENTIFIER],

[modifieddate] [DATETIME],

CONSTRAINT [PK_SalesOrderDetailC_SalesOrderID_SalesOrderDetailID] PRIMARY

KEY CLUSTERED ( [salesorderid] ASC, [salesorderdetailid] ASC ) ON [PRIMARY]

)

ON [PRIMARY]

INSERT INTO [Sales].[salesorderdetail_c]

SELECT *

FROM [Sales].[salesorderdetail]

SELECT *

FROM sys.Dm_db_index_physical_stats(Db_id('Adventureworks2012'),

Object_id('[Sales].[SalesOrderDetail_C]'), NULL, NULL, NULL)

/****Notice the clustered-index tables also requires 1494 pages

database_id object_id index_id index_type_desc alloc_unit_type_desc page_count

7 404196490 1 CLUSTERED INDEX IN_ROW_DATA 1494

-- Case 3: The same table, but with a nonclustered index on the Primary Key columns. So it is a heap + a B-tree.

CREATE TABLE [Sales].[salesorderdetail_n]

(

[salesorderid] [INT] NOT NULL,

[salesorderdetailid] [INT],

[carriertrackingnumber] [NVARCHAR](25) NULL,

[orderqty] [SMALLINT] NOT NULL,

[productid] [INT] NOT NULL,

[specialofferid] [INT] NOT NULL,

[unitprice] [MONEY] NOT NULL,

[unitpricediscount] [MONEY] NOT NULL,

[linetotal] NUMERIC(38, 6),

[rowguid] [UNIQUEIDENTIFIER],

[modifieddate] [DATETIME],

CONSTRAINT [PK_SalesOrderDetailN_SalesOrderID_SalesOrderDetailID] PRIMARY

KEY NONCLUSTERED ( [salesorderid] ASC, [salesorderdetailid] ASC ) ON

[PRIMARY]

)

ON [PRIMARY]

INSERT INTO [Sales].[salesorderdetail_n]

SELECT *

FROM [Sales].[salesorderdetail]

SELECT *

FROM sys.Dm_db_index_physical_stats(Db_id('Adventureworks2012'),

Object_id('[Sales].[SalesOrderDetail_N]'), NULL, NULL, NULL)

/****Notice the nonclustered-index tables requires additional 330 pages.

database_id object_id index_id index_type_desc alloc_unit_type_desc page_count

7 436196604 0 HEAP IN_ROW_DATA 1494

7 436196604 0 NONCLUSTERED INDEX IN_ROW_DATA 330

5. Table Organization

Tables and indexes are stored as a collection of 8-KB pages (but log data are not!).

The following illustration shows the organization of a table.

So what is partition?

A partition is a user-defined unit of data organization. By default, a table or index has only one partition that contains all the table or index pages. Example:

SELECT * FROM sys.partitions WHERE object_id = object_id('[Sales].[SalesOrderDetail_N]')

/****Notice that this is a non-partitioned table, even with two index IDs

partition_id object_id index_id partition_number hobt_id rows

72057594070237184 36196604 0 1 72057594070237184 121317

72057594070302720 436196604 2 1 72057594070302720 121317

When a table or index uses multiple partitions, the data is partitioned horizontally so that groups of rows are mapped into individual partitions, based on a specified column. The partitions can be put on one or more filegroups in the database. You can use sys.partitions to view the partitions used by a table or index.

When a heap or a clustered table has multiple partitions, each partition has a heap or B-tree structure that contains the group of rows for that specific partition. For example, if a clustered table has four partitions, there are four B-trees; one in each partition.

For an XML column in a table, one primary and several secondary XML indexes can be created. An XML index is a shredded and persisted representation of the XML binary large objects (BLOBs) in the xml data type column. XML indexes, different from the regular indexes, are stored as internal tables. You can use sys.xml_indexes and sys.internal_tables to find the information about xml indexes.

On the Journey to be a SQL Server Professional

Table Structure and Organization in SQL Server

1. Clustered Index Structure

1.1. Architecture

1.2. Main Characteristics

1) Indexes are organized as B-trees.

2) Data is stored in order based on the clustered index key at the leaf level

3) When to use it? Almost always recommended.

2. Non-clustered Index Structure

2.1. Architecture

2.2. Main Characteristics

1) Nonclustered indexes have the same B-tree structure as clustered indexes, except for the following significant differences:

2) Nonclustered indexes can be defined on a table or view with a clustered index or a heap.

3) Each index row in the nonclustered index contains the nonclustered key value and a row locator. This locator points to the data row in the clustered index or heap having the key value:

2.3. When to use the non-clustered index

1) Nonclustered indexes should be designed to improve the performance of frequently used queries that are NOT covered by the clustered index.

2) Primarily read-only and low update tables can benefit from many nonclustered indexes.

3) Consider using a nonclustered index for queries that have the following attributes:

3. Heap Structure – a table without a clustered index.

3.1. Architecture

3.2. Main Characteristics

1) SQL Server uses the IAM (SQL Server uses the (Index Allocation Map) pages to move through the heap.

2) Data is not stored in any particular order. Although usually data is initially stored in the order in which is the rows are inserted into the table, but the Database Engine can move data around in the heap to store the rows efficiently; so the data order cannot be predicted.

3) Data pages are not linked, so sequential access needs to refer back to the index allocation map (IAM) pages

3.3. When to use a heap table?

1) If a table is a heap AND does not have any nonclustered indexes, then the entire table must be examined (a table scan) to find any row. This can be acceptable when the table is tiny, such as a list of the 12 regional offices of a company.

4. Will indexes require more spaces? – Nonclustered indexes do, but clustered index doesn’t!

5. Table Organization