Iterations for Operations That Must Be Done per Row

Problem Scenarios:

It should be understood that some tasks simply have to be handled by using iterative solutions. Consider management tasks that need to be done per object in a set, such as a set of databases, tables, or indexes. You need to query a catalog view or other system object to return the set of objects in question, iterate through the result rows one at a time, and then perform the task at hand per object. An example of such a management task is rebuilding indexes that have a higher level of fragmentation than a specific percentage that you decide on.

As another example of a task that requires an iterative solution, suppose that you have a stored procedure that performs some work for an input customer. The work involves multiple statements implemented in the procedure’s body. The logic cannot be implemented for multiple customers at once; it can be implemented only for a single customer.
The following code defines such a procedure, called Sales.ProcessCustomer.

USE TSQL2012;
IF OBJECT_ID('Sales.ProcessCustomer') IS NOT NULL
    DROP PROC Sales.ProcessCustomer;
GO
CREATE PROC Sales.ProcessCustomer ( @custid AS INT )
AS
    PRINT 'Processing customer ' + CAST(@custid AS VARCHAR(10));
GO

The PRINT statement represents the part that would normally implement the work for the input customer. Suppose that you now need to write code that executes the stored procedure for each customer from the Sales.Customers table. You have to iterate through the customer rows one at a time, obtain the customer ID, and execute the procedure with that ID as input.

Three possible options to achieve the goal:

Option 1 – Using a Cursor

You can implement a solution by using a cursor. You first use the DECLARE command to declare the cursor based on a query that returns all customer IDs from the Sales.Customers table. You can use the FAST_FORWARD option to make it a read-only, forward-only cursor.

SET NOCOUNT ON;
DECLARE @curcustid AS INT;
DECLARE cust_cursor CURSOR FAST_FORWARD
FOR
    SELECT  custid
    FROM    Sales.Customers;

OPEN cust_cursor;
FETCH NEXT FROM cust_cursor INTO @curcustid;
WHILE @@FETCH_STATUS = 0
    BEGIN
        EXEC Sales.ProcessCustomer @custid = @curcustid;
        FETCH NEXT FROM cust_cursor INTO @curcustid;
    END;
CLOSE cust_cursor;
DEALLOCATE cust_cursor;
GO

Option 2 – Using TOP(1) and Min()

You can also achieve the same task by using another iterative solution, but one that doesn’t use a cursor. You can use a query with a TOP (1) option ordered by the custid column to return the minimum customer ID. Then loop while the last query does not return a NULL. In each iteration of the loop, execute the stored procedure by using the current customer ID as input.

To get the next customer ID, issue a query with a TOP (1) option, where the custid column is greater than the previous one, ordered by custid. Here’s how the complete solution looks.

SET NOCOUNT ON;
DECLARE @curcustid AS INT;
SET @curcustid = ( SELECT TOP ( 1 )
                            custid
                   FROM     Sales.Customers
                   ORDER BY custid
                 );

WHILE @curcustid IS NOT NULL
    BEGIN
        EXEC Sales.ProcessCustomer @custid = @curcustid;
        SET @curcustid = ( SELECT TOP ( 1 )
                                    custid
                           FROM     Sales.Customers
                           WHERE    custid > @curcustid
                           ORDER BY custid
                         );
    END;
GO

Option 3 – Using Min() and Max()

You may have considered another option; to retrieve the minimum and maximum customer IDs from the table, form a loop that keeps incrementing the current customer ID by 1 until it is equal to the maximum. If at any point gaps appear between existing customer IDs (due to deletions or aspects related to the generation of the keys), your code will end up trying to process customer IDs that don’t exist in your table. In short, this approach isn’t recommended even if currently there are no gaps between keys.

In terms of performance, the first solution (with the cursor) doesn’t really need any special indexes to support it. The second solution (without the cursor) does. You need an index on the custid column. If you don’t have one, each query with the TOP option will end up scanning all table rows and apply a TOP N sort in the plan. In other words, without an index, the second solution will perform badly. Even with an index in place, the second solution does more I/O operations than the first because it needs to perform a seek operation in the index per row.

In conclusion, if you need to do iterations for operations that must be done per row, the cursor option is probably the right choice.

(Source: Training Kit (Exam 70-461) Querying Microsoft SQL Server 2012 (MCSA))



How to handle database fragmentation?

Two Types of Fragmentation:
  • Internal Fragmentation: When records are stored non-contiguously inside the page, then it is called internal fragmentation. In other words, internal fragmentation is said to occur if there is unused space between records in a page. This fragmentation occurs through the process of data modifications (INSERT, UPDATE, and DELETE statements) that are made against the table and therefore, to the indexes defined on the table. As these modifications are not equally distributed among the rows of the table and indexes, the fullness of each page can vary over time. This unused space causes poor cache utilization and more I/O, which ultimately leads to poor query performance. You can control the internal fragmentation with the FILLFACTOR option for the leaf-level pages and with the PAD_INDEX option for the higher-level pages of the CREATE INDEX statement.
  • External Fragmentation: When on disk, the physical storage of pages and extents is not contiguous. When the extents of a table are not physically stored contiguously on disk, switching from one extent to another causes higher disk rotations, and this is called Extent Fragmentation. Index pages also maintain a logical order of pages inside the extent. Every index page is linked with previous and next page in the logical order of column data. However, because of Page Split, the pages turn into out-of-order pages. An out-of-order page is a page for which the next physical page allocated to the index is not the page pointed to by the next-page pointer in the current leaf page. This is called Logical Fragmentation.
External fragmentation mainly slows down scans, which should not be that frequent in OLTP environments; however, they are very important in the data warehousing area. Internal fragmentation is a problem in both scenarios because the table is much bigger than it would be with a sequential key.

How to estimate fragmentation?
You can check both the internal and external fragmentation with the DMF - sys.dm_db_index_physical_stats: the internal fragmentation - the avg_page_space_used_in_ percent column, and the external fragmentation - the avg_fragmentation_in_percent column.

SELECT  OBJECT_NAME(OBJECT_ID) ,
        index_id ,
        index_type_desc ,
        index_level ,
        avg_fragmentation_in_percent ,
        avg_page_space_used_in_percent ,
        page_count
FROM    sys.dm_db_index_physical_stats(DB_ID(N'AdventureWorks2014'), NULL, NULL, NULL, 'SAMPLED')
ORDER BY avg_fragmentation_in_percent DESC

How to reduce database fragmentation?
  • Reducing Fragmentation in a Heap: To reduce the fragmentation of a heap, create a clustered index on the table. Creating the clustered index, rearrange the records in an order, and then place the pages contiguously on disk.
  • Reducing Fragmentation in an Index: There are three choices for reducing external fragmentation, and we can choose one according to the percentage of fragmentation:
    • If avg_fragmentation_in_percent > 5% and < 30%, then use ALTER INDEXREORGANIZE: This statement is replacement for DBCC INDEXDEFRAG to reorder the leaf level pages of the index in a logical order. As this is an online operation, the index is available while the statement is running.
    • If avg_fragmentation_in_percent > 30%, then use ALTER INDEX REBUILD: This is replacement for DBCC DBREINDEX to rebuild the index online or offline. In such case, we can also use the drop and re-create index method.
    • NOT recommended - Drop and re-create the clustered index: Re-creating a clustered index redistributes the data and results in full data pages. The level of fullness can be configured by using the FILLFACTOR option in CREATE INDEX. See more on this option from a reply from Paul Randal:
Don’t ever suggest create/drop clustered index for heap fragmentation removal, or drop/create of a clustered index for clustered index fragmentation removal. That’s almost the worst advice you can give. Both will cause all non-clustered indexes to be removed twice.

Don’t ever suggest drop/recreate nonclustered index either – it allows any constraints being enforced to be broken.

Rebuild or reorganize, using the thresholds I put in Books Online, or whatever works for you.

References:
1. http://blog.sqlauthority.com/2010/01/12/sql-server-fragmentation-detect-fragmentation-and-eliminate-fragmentation/)
2. Training Kit (Exam 70-461) Querying Microsoft SQL Server 2012 (MCSA)  


Performance Tune-up with DMVs

Imagine that you get complaints from end users about SQL Server performance. You have to start investigating the problem immediately. How would you start? 

In a production system, end users can submit thousands of queries per hour. Which query would you analyze first?

You could try to start an Extended Events monitoring session. You could use SQL Trace. In both cases, you would have to wait for quite a while before you gather enough data to start analysis and find the most problematic queries. And what if the problematic queries are not executed soon again after you start your monitoring session? You could only hope that you would be able to catch the problems in a reasonable time.

This is the point at which DMOs become extremely helpful. With DMOs, a lot of the data that you need is already gathered. All you need to do is query appropriate DMOs with regular T-SQL queries and extract useful information. DMOs are not materialized in any database; DMOs are virtual objects that give you access to the data SQL Server collects in memory. 

Although DMOs are really useful, they have some drawbacks. The most important issue you should take care of is when the last restart of the instance you are inspecting occurred.

Cumulative information is useless if the instance was restarted recently.
You can start an analyzing session by gathering some system information about your instance, as the following query shows.

SELECT cpu_count AS logical_cpu_count,
cpu_count / hyperthread_ratio AS physical_cpu_count,
CAST(physical_memory_kb / 1024. AS int) AS physical_memory__mb,
sqlserver_start_time
FROM sys.dm_os_sys_info;

The SQLOS-related sys.dm_os_waiting_tasks DMO gives you information about sessions that are currently waiting on something. For example, the sessions could be blocked by another session because of locking. You can join this DMO to the execution-related sys.dm_exec_sessions DMO to get information about the user, host, and application that are waiting.

You can also use the is_user_process flag from the sys.dm_exec_sessions DMO to filter out system sessions. The following query gives this information.

SELECT S.login_name, S.host_name, S.program_name,
WT.session_id, WT.wait_duration_ms, WT.wait_type,
WT.blocking_session_id, WT.resource_description
FROM sys.dm_os_waiting_tasks AS WT
INNER JOIN sys.dm_exec_sessions AS S
ON WT.session_id = S.session_id
WHERE s.is_user_process = 1;

The sys.dm_exec_requests execution-related DMO returns information about currently executing requests. It includes a column called sql_handle, which is a hash map of the T-SQL batch text that is executed. You can use this handle to retrieve the complete text of the batch with the help of the execution-related sys.dm_exec_sql_text dynamic management function that accepts this handle as a parameter. The following query joins information about current requests, their waits, and text of their SQL batch with the sys.dm_exec_sessions dynamic management view to also get user, host, and application info.

SELECT S.login_name, S.host_name, S.program_name,
R.command, T.text,
R.wait_type, R.wait_time, R.blocking_session_id
FROM sys.dm_exec_requests AS R
INNER JOIN sys.dm_exec_sessions AS S
ON R.session_id = S.session_id
OUTER APPLY sys.dm_exec_sql_text(R.sql_handle) AS T
WHERE S.is_user_process = 1;

You can retrieve a lot of information about executed queries from the execution-related sys.dm_exec_query_stats DMO. You can retrieve information about disk IO per query, CPU consumption per query, elapsed time per query, and more. With the help of the sys.dm_exec_sql_text DMO, you can retrieve the text of the query as well. You can extract specific query text from batch text with the help of the statement_start_offset and statement_end_offset columns from the sys.dm_exec_query_stats DMO. The extraction is somewhat tricky. The
following query lists five queries that used the most logical disk IO with their query text extracted from the batch text.

SELECT TOP (5)
(total_logical_reads + total_logical_writes) AS total_logical_IO,
execution_count,
(total_logical_reads/execution_count) AS avg_logical_reads,
(total_logical_writes/execution_count) AS avg_logical_writes,
(SELECT SUBSTRING(text, statement_start_offset/2 + 1,
(CASE WHEN statement_end_offset = -1
THEN LEN(CONVERT(nvarchar(MAX),text)) * 2
ELSE statement_end_offset
END - statement_start_offset)/2)
FROM sys.dm_exec_sql_text(sql_handle)) AS query_text
FROM sys.dm_exec_query_stats
ORDER BY (total_logical_reads + total_logical_writes) DESC;

Find nonclustered indexes that were not used from the last start of the instance by using the following query. It is important is to have a representative sample of queries collected before using DMOs that return cumulative values.

SELECT OBJECT_NAME(I.object_id) AS objectname,
I.name AS indexname,
I.index_id AS indexid
FROM sys.indexes AS I
INNER JOIN sys.objects AS O
ON O.object_id = I.object_id
WHERE I.object_id > 100
AND I.type_desc = 'NONCLUSTERED'
AND I.index_id NOT IN
(SELECT S.index_id
FROM sys.dm_db_index_usage_stats AS S
WHERE S.object_id=I.object_id
AND I.index_id=S.index_id
AND database_id = DB_ID('TSQL2012'))
ORDER BY objectname, indexname;

In the following exercise, you find missing indexes.

1.  Quickly create the table and the index on that table from the practice for the previous lesson, but use 10 times less data. Then index it and execute the query that could benefit from an additional index. Here is the code.

SELECT N1.n * 100000 + O.orderid AS norderid,
O.*
INTO dbo.NewOrders
FROM Sales.Orders AS O
CROSS JOIN (VALUES(1),(2),(3)) AS N1(n);
GO
CREATE NONCLUSTERED INDEX idx_nc_orderid
ON dbo.NewOrders(orderid);
GO
SELECT norderid
FROM dbo.NewOrders
WHERE norderid = 110248
ORDER BY norderid;
GO

2.  Find missing indexes by using index-related DMOs. Use the following query.

SELECT MID.statement AS [Database.Schema.Table],
MIC.column_id AS ColumnId,
MIC.column_name AS ColumnName,
MIC.column_usage AS ColumnUsage,
MIGS.user_seeks AS UserSeeks,
MIGS.user_scans AS UserScans,
MIGS.last_user_seek AS LastUserSeek,
MIGS.avg_total_user_cost AS AvgQueryCostReduction,
MIGS.avg_user_impact AS AvgPctBenefit
FROM sys.dm_db_missing_index_details AS MID
CROSS APPLY sys.dm_db_missing_index_columns (MID.index_handle) AS MIC
INNER JOIN sys.dm_db_missing_index_groups AS MIG
ON MIG.index_handle = MID.index_handle
INNER JOIN sys.dm_db_missing_index_group_stats AS MIGS
ON MIG.index_group_handle=MIGS.group_handle
ORDER BY MIGS.avg_user_impact DESC;

(Source: Training Kit (Exam 70-461) Querying Microsoft SQL Server 2012 (MCSA) (Microsoft Press Training Kit) by Dejan SarkaItzik Ben-Gan, Ron Talmage)

How to back up all of the databases with a stored procedure?

IF OBJECT_ID('dbo.BackupDatabases', 'P') IS NOT NULL
    DROP PROCEDURE dbo.BackupDatabases;
GO
CREATE PROCEDURE dbo.BackupDatabases
    @databasetype AS NVARCHAR(30)
AS
    BEGIN
        DECLARE @databasename AS NVARCHAR(128) ,
            @timecomponent AS NVARCHAR(50) ,
            @sqlcommand AS NVARCHAR(1000);
        IF @databasetype NOT IN ( 'User', 'System' )
            BEGIN
                THROW 50000, 'dbo.BackupDatabases: @databasename must be User or System', 0;
                RETURN;
            END;
        IF @databasetype = 'System'
            SET @databasename = ( SELECT    MIN(name)
                                  FROM      sys.databases
                                  WHERE     name IN ( 'master', 'model',
                                                      'msdb' )
                                );
        ELSE
            SET @databasename = ( SELECT    MIN(name)
                                  FROM      sys.databases
                                  WHERE     name NOT IN ( 'master', 'model',
                                                          'msdb', 'tempdb' )
                                );
        WHILE @databasename IS NOT NULL
            BEGIN
                SET @timecomponent = REPLACE(REPLACE(REPLACE(CONVERT(NVARCHAR, GETDATE(), 120),
                                                             ' ', '_'), ':',
                                                     ''), '-', '');
                SET @sqlcommand = 'BACKUP DATABASE ' + @databasename
                    + ' TO DISK =
''C:\Backups\' + @databasename + '_' + @timecomponent + '.bak''';
                PRINT @sqlcommand;
--EXEC(@sqlcommand);
                IF @databasetype = 'System'
                    SET @databasename = ( SELECT    MIN(name)
                                          FROM      sys.databases
                                          WHERE     name IN ( 'master',
                                                              'model', 'msdb' )
                                                    AND name > @databasename
                                        );
                ELSE
                    SET @databasename = ( SELECT    MIN(name)
                                          FROM      sys.databases
                                          WHERE     name NOT IN ( 'master',
                                                              'model', 'msdb',
                                                              'tempdb' )
                                                    AND name > @databasename
                                        );
            END;
        RETURN;
    END
GO