Table Structure and Organization in SQL Server

1.  Clustered Index Structure

1.1.               Architecture

 

https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgMhDmdR-PLLLaHiLe9OHOBDeFGHsc2PpB5lWkUDB6SWi3zfCsyEWyV7-dy7BQa9RVYKvvISIMrAd7F6F9axK_4wMGecztyfYaSIB7ZsFVS173vfFTuDiXfHy_caMgSTfAxz0WCEgbxsSQ/s1600/clustered+index.gif

 

1.2.               Main Characteristics

1)       Indexes are organized as B-trees.

2)       Data is stored in order based on the clustered index key at the leaf level

3)       When to use it? Almost always recommended.

 

2.  Non-clustered Index Structure

2.1.               Architecture

 

https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEip5k8fkDGAsvnT7iy7as3YV-obhyphenhyphendgz29AfZxysdVz41jCsUaIdKoskOWAQPmGUtz1VS71vC0e1KFeDCx5FermFs13lWuK_N5vKKJxCeULkFLT1jgewlofur_gL18xrMkq1Wcf7myvryk/s1600/non-clustered+index.gif

 

2.2.               Main Characteristics

1)       Nonclustered indexes have the same B-tree structure as clustered indexes, except for the following significant differences:

·       The data rows of the underlying table are not sorted and stored in order based on their nonclustered keys.

·       The leaf layer of a nonclustered index is made up of index pages instead of data pages as in the Clustered Index Structure. So nonclustered indexes have a structure separate from the data rows. 

2)       Nonclustered indexes can be defined on a table or view with a clustered index or a heap.

3)       Each index row in the nonclustered index contains the nonclustered key value and a row locator. This locator points to the data row in the clustered index or heap having the key value:

·       If the table is a heap, which means it does not have a clustered index, the row locator is a pointer to the row. The pointer is built from the file identifier (ID), page number, and number of the row on the page. The whole pointer is known as a Row ID (RID).

·       If the table has a clustered index, or the index is on an indexed view, the row locator is the clustered index key for the row. 

2.3. When to use the non-clustered index

1)         Nonclustered indexes should be designed to improve the performance of frequently used queries that are NOT covered by the clustered index.

2)         Primarily read-only and low update tables can benefit from many nonclustered indexes.

3)       Consider using a nonclustered index for queries that have the following attributes:

·        Use JOIN or GROUP BY clauses.

 

Create multiple nonclustered indexes on columns involved in join and grouping operations, and a clustered index on any foreign key columns.

 

·        Queries that do not return large result sets.

 

Create filtered indexes to cover queries that return a well-defined subset of rows from a large table.

 

·        Contain columns frequently involved in search conditions of a query, such as WHERE clause, that return exact matches.

3.     Heap Structure – a table without a clustered index.

3.1. Architecture

https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhu_KXVvkp1dOE82FwwjvLXi8e0JTj5R-M7X55JtzhLlvKRKJQKUtZD4UCyhtkoyi66_jj4xBnXUZnNbvVJHP5SS4HBS4WkgBJnGJYeu0gVbTfyPQzGcpUTzhKRDKJJQmMvM9HOAF9m9_8/s1600/heap_structure.gif

3.2.  Main Characteristics

1)         SQL Server uses the IAM (SQL Server uses the (Index Allocation Map) pages to move through the heap. 

2)         Data is not stored in any particular order. Although usually data is initially stored in the order in which is the rows are inserted into the table, but the Database Engine can move data around in the heap to store the rows efficiently; so the data order cannot be predicted.

3)         Data pages are not linked, so sequential access needs to refer back to the index allocation map (IAM) pages

3.3. When to use a heap table?

1)         If a table is a heap AND does not have any nonclustered indexes, then the entire table must be examined (a table scan) to find any row. This can be acceptable when the table is tiny, such as a list of the 12 regional offices of a company.

2)         When a table is stored as a heap, individual rows are identified by reference to a row identifier (RID) consisting of the file number, data page number, and slot on the page. The row id is a small and efficient structure. Sometimes data architects use heaps when data is always accessed through nonclustered indexes and the RID is smaller than a clustered index key.

4.     Will indexes require more spaces? – Nonclustered indexes do, but clustered index doesn’t!

 

--Comparison of the Space Usage by Different Table Structures

--Conclusion: nonclustered index table requires more space to store the non-clustered indexes.

 

--Case 1: A heap table, no indexes at all. 

 

USE [AdventureWorks2012]

GO

 

CREATE TABLE [Sales].[salesorderdetail_hash]

  (

     [salesorderid]          [INT] NOT NULL,

     [salesorderdetailid]    [INT],

     [carriertrackingnumber] [NVARCHAR](25) NULL,

     [orderqty]              [SMALLINT] NOT NULL,

     [productid]             [INT] NOT NULL,

     [specialofferid]        [INT] NOT NULL,

     [unitprice]             [MONEY] NOT NULL,

     [unitpricediscount]     [MONEY] NOT NULL,

     [linetotal]             NUMERIC(38, 6),

     [rowguid]               [UNIQUEIDENTIFIER],

     [modifieddate]          [DATETIME]

  )

ON [PRIMARY]

 

INSERT INTO [Sales].[salesorderdetail_hash]

SELECT *

FROM   [Sales].[salesorderdetail]

 

SELECT *

FROM   sys.Dm_db_index_physical_stats(Db_id('Adventureworks2012'),

              Object_id('[Sales].[SalesOrderDetail_hash]'), NULL, NULL, NULL)

 

/****Notice the heap tables requires 1494 pages

database_id         object_id   index_id  index_type_desc  alloc_unit_type_desc   page_count

7                          388196433    0         HEAP             IN_ROW_DATA            1494

*/

 

 

-- Case 2: The same table, but with a clustered index 

CREATE TABLE [Sales].[salesorderdetail_c]

  (

     [salesorderid]          [INT] NOT NULL,

     [salesorderdetailid]    [INT],

     [carriertrackingnumber] [NVARCHAR](25) NULL,

     [orderqty]              [SMALLINT] NOT NULL,

     [productid]             [INT] NOT NULL,

     [specialofferid]        [INT] NOT NULL,

     [unitprice]             [MONEY] NOT NULL,

     [unitpricediscount]     [MONEY] NOT NULL,

     [linetotal]             NUMERIC(38, 6),

     [rowguid]               [UNIQUEIDENTIFIER],

     [modifieddate]          [DATETIME],

     CONSTRAINT [PK_SalesOrderDetailC_SalesOrderID_SalesOrderDetailID] PRIMARY

     KEY CLUSTERED ( [salesorderid] ASC, [salesorderdetailid] ASC ) ON [PRIMARY]

  )

ON [PRIMARY]

 

INSERT INTO [Sales].[salesorderdetail_c]

SELECT *

FROM   [Sales].[salesorderdetail]

 

SELECT *

FROM   sys.Dm_db_index_physical_stats(Db_id('Adventureworks2012'),

              Object_id('[Sales].[SalesOrderDetail_C]'), NULL, NULL, NULL)

 

/****Notice the clustered-index tables also requires 1494 pages

database_id         object_id   index_id  index_type_desc  alloc_unit_type_desc   page_count

7                  404196490      1       CLUSTERED INDEX  IN_ROW_DATA            1494

*/

 

-- Case 3: The same table, but with a nonclustered index on the Primary Key columns. So it is a heap + a B-tree.

CREATE TABLE [Sales].[salesorderdetail_n]

  (

     [salesorderid]          [INT] NOT NULL,

     [salesorderdetailid]    [INT],

     [carriertrackingnumber] [NVARCHAR](25) NULL,

     [orderqty]              [SMALLINT] NOT NULL,

     [productid]             [INT] NOT NULL,

     [specialofferid]        [INT] NOT NULL,

     [unitprice]             [MONEY] NOT NULL,

     [unitpricediscount]     [MONEY] NOT NULL,

     [linetotal]             NUMERIC(38, 6),

     [rowguid]               [UNIQUEIDENTIFIER],

     [modifieddate]          [DATETIME],

     CONSTRAINT [PK_SalesOrderDetailN_SalesOrderID_SalesOrderDetailID] PRIMARY

     KEY NONCLUSTERED ( [salesorderid] ASC, [salesorderdetailid] ASC ) ON

     [PRIMARY]

  )

ON [PRIMARY]

 

INSERT INTO [Sales].[salesorderdetail_n]

SELECT *

FROM   [Sales].[salesorderdetail]

go

 

SELECT *

FROM   sys.Dm_db_index_physical_stats(Db_id('Adventureworks2012'),

              Object_id('[Sales].[SalesOrderDetail_N]'), NULL, NULL, NULL)

 

/****Notice the nonclustered-index tables requires additional 330 pages.

 

database_id         object_id   index_id  index_type_desc  alloc_unit_type_desc   page_count

7                  436196604      0       HEAP                                        IN_ROW_DATA        1494

7                  436196604      0       NONCLUSTERED INDEX     IN_ROW_DATA        330

*/

 

5.     Table Organization    

Tables and indexes are stored as a collection of 8-KB pages (but log data are not!).

 

The following illustration shows the organization of a table.

https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjNVa2tSLCdKBhNUbyLvFh4V4la_3iTL5jblnS5XzHnWTc_Mx6pKM5r2CvFWImUwQkPYTkxHziX2Fc03Ml6v17Zg9jeD59cdhlS3XsWaJo-Wu4PdayA4KBFF7zDy4oBWUOdP9IpXG32xto/s1600/Table+organization.gif

 

 

So what is partition?

 

A partition is a user-defined unit of data organization. By default, a table or index has only one partition that contains all the table or index pages.  Example:

 

SELECT * FROM sys.partitions WHERE object_id = object_id('[Sales].[SalesOrderDetail_N]')

 

/****Notice that this is a non-partitioned table, even with two index IDs

 

partition_id        object_id  index_id partition_number  hobt_id                            rows       

72057594070237184    36196604  0        1                    72057594070237184      121317       

72057594070302720    436196604 2       1                    72057594070302720      121317 

*/

 

When a table or index uses multiple partitions, the data is partitioned horizontally so that groups of rows are mapped into individual partitions, based on a specified column. The partitions can be put on one or more filegroups in the database. You can use sys.partitions to view the partitions used by a table or index.

 

When a heap or a clustered table has multiple partitions, each partition has a heap or B-tree structure that contains the group of rows for that specific partition. For example, if a clustered table has four partitions, there are four B-trees; one in each partition.

 

For an XML column in a table, one primary and several secondary XML indexes can be created. An XML index is a shredded and persisted representation of the XML binary large objects (BLOBs) in the xml data type column. XML indexes, different from the regular indexes, are stored as internal tables. You can use sys.xml_indexes and sys.internal_tables to find the information about xml indexes.