On the Journey to be a SQL Server Professional: 4/12/15

Iterations for Operations That Must Be Done per Row

Problem Scenarios:

It should be understood that some tasks simply have to be handled by using iterative solutions. Consider management tasks that need to be done per object in a set, such as a set of databases, tables, or indexes. You need to query a catalog view or other system object to return the set of objects in question, iterate through the result rows one at a time, and then perform the task at hand per object. An example of such a management task is rebuilding indexes that have a higher level of fragmentation than a specific percentage that you decide on.

As another example of a task that requires an iterative solution, suppose that you have a stored procedure that performs some work for an input customer. The work involves multiple statements implemented in the procedure’s body. The logic cannot be implemented for multiple customers at once; it can be implemented only for a single customer.

The following code defines such a procedure, called Sales.ProcessCustomer.

USE TSQL2012;

IF OBJECT_ID('Sales.ProcessCustomer') IS NOT NULL

DROP PROC Sales.ProcessCustomer;

CREATE PROC Sales.ProcessCustomer ( @custid AS INT )

PRINT 'Processing customer ' + CAST(@custid AS VARCHAR(10));

The PRINT statement represents the part that would normally implement the work for the input customer. Suppose that you now need to write code that executes the stored procedure for each customer from the Sales.Customers table. You have to iterate through the customer rows one at a time, obtain the customer ID, and execute the procedure with that ID as input.

Three possible options to achieve the goal:

Option 1 – Using a Cursor

You can implement a solution by using a cursor. You first use the DECLARE command to declare the cursor based on a query that returns all customer IDs from the Sales.Customers table. You can use the FAST_FORWARD option to make it a read-only, forward-only cursor.

SET NOCOUNT ON;

DECLARE @curcustid AS INT;

DECLARE cust_cursor CURSOR FAST_FORWARD

FOR

SELECT custid

FROM Sales.Customers;

OPEN cust_cursor;

FETCH NEXT FROM cust_cursor INTO @curcustid;

WHILE @@FETCH_STATUS = 0

BEGIN

EXEC Sales.ProcessCustomer @custid = @curcustid;

FETCH NEXT FROM cust_cursor INTO @curcustid;

END;

CLOSE cust_cursor;

DEALLOCATE cust_cursor;

Option 2 – Using TOP(1) and Min()

You can also achieve the same task by using another iterative solution, but one that doesn’t use a cursor. You can use a query with a TOP (1) option ordered by the custid column to return the minimum customer ID. Then loop while the last query does not return a NULL. In each iteration of the loop, execute the stored procedure by using the current customer ID as input.

To get the next customer ID, issue a query with a TOP (1) option, where the custid column is greater than the previous one, ordered by custid. Here’s how the complete solution looks.

SET NOCOUNT ON;

DECLARE @curcustid AS INT;

SET @curcustid = ( SELECT TOP ( 1 )

custid

FROM Sales.Customers

ORDER BY custid

);

WHILE @curcustid IS NOT NULL

BEGIN

EXEC Sales.ProcessCustomer @custid = @curcustid;

SET @curcustid = ( SELECT TOP ( 1 )

custid

FROM Sales.Customers

WHERE custid > @curcustid

ORDER BY custid

);

END;

Option 3 – Using Min() and Max()

You may have considered another option; to retrieve the minimum and maximum customer IDs from the table, form a loop that keeps incrementing the current customer ID by 1 until it is equal to the maximum. If at any point gaps appear between existing customer IDs (due to deletions or aspects related to the generation of the keys), your code will end up trying to process customer IDs that don’t exist in your table. In short, this approach isn’t recommended even if currently there are no gaps between keys.

In terms of performance, the first solution (with the cursor) doesn’t really need any special indexes to support it. The second solution (without the cursor) does. You need an index on the custid column. If you don’t have one, each query with the TOP option will end up scanning all table rows and apply a TOP N sort in the plan. In other words, without an index, the second solution will perform badly. Even with an index in place, the second solution does more I/O operations than the first because it needs to perform a seek operation in the index per row.

In conclusion, if you need to do iterations for operations that must be done per row, the cursor option is probably the right choice.

(Source: Training Kit (Exam 70-461) Querying Microsoft SQL Server 2012 (MCSA))

How to handle database fragmentation?

Two Types of Fragmentation:

Internal Fragmentation: When records are stored non-contiguously inside the page, then it is called internal fragmentation. In other words, internal fragmentation is said to occur if there is unused space between records in a page. This fragmentation occurs through the process of data modifications (INSERT, UPDATE, and DELETE statements) that are made against the table and therefore, to the indexes defined on the table. As these modifications are not equally distributed among the rows of the table and indexes, the fullness of each page can vary over time. This unused space causes poor cache utilization and more I/O, which ultimately leads to poor query performance. You can control the internal fragmentation with the FILLFACTOR option for the leaf-level pages and with the PAD_INDEX option for the higher-level pages of the CREATE INDEX statement.

External Fragmentation: When on disk, the physical storage of pages and extents is not contiguous. When the extents of a table are not physically stored contiguously on disk, switching from one extent to another causes higher disk rotations, and this is called Extent Fragmentation. Index pages also maintain a logical order of pages inside the extent. Every index page is linked with previous and next page in the logical order of column data. However, because of Page Split, the pages turn into out-of-order pages. An out-of-order page is a page for which the next physical page allocated to the index is not the page pointed to by the next-page pointer in the current leaf page. This is called Logical Fragmentation.

External fragmentation mainly slows down scans, which should not be that frequent in OLTP environments; however, they are very important in the data warehousing area. Internal fragmentation is a problem in both scenarios because the table is much bigger than it would be with a sequential key.

How to estimate fragmentation?

You can check both the internal and external fragmentation with the DMF - sys.dm_db_index_physical_stats: the internal fragmentation - the avg_page_space_used_in_ percent column, and the external fragmentation - the avg_fragmentation_in_percent column.

SELECT OBJECT_NAME(OBJECT_ID) ,

index_id ,

index_type_desc ,

index_level ,

avg_fragmentation_in_percent ,

avg_page_space_used_in_percent ,

page_count

FROM sys.dm_db_index_physical_stats(DB_ID(N'AdventureWorks2014'), NULL, NULL, NULL, 'SAMPLED')

ORDER BY avg_fragmentation_in_percent DESC

How to reduce database fragmentation?

Reducing Fragmentation in a Heap: To reduce the fragmentation of a heap, create a clustered index on the table. Creating the clustered index, rearrange the records in an order, and then place the pages contiguously on disk.

Reducing Fragmentation in an Index: There are three choices for reducing external fragmentation, and we can choose one according to the percentage of fragmentation:

If avg_fragmentation_in_percent > 5% and < 30%, then use ALTER INDEXREORGANIZE: This statement is replacement for DBCC INDEXDEFRAG to reorder the leaf level pages of the index in a logical order. As this is an online operation, the index is available while the statement is running.
If avg_fragmentation_in_percent > 30%, then use ALTER INDEX REBUILD: This is replacement for DBCC DBREINDEX to rebuild the index online or offline. In such case, we can also use the drop and re-create index method.
NOT recommended - Drop and re-create the clustered index: Re-creating a clustered index redistributes the data and results in full data pages. The level of fullness can be configured by using the FILLFACTOR option in CREATE INDEX. See more on this option from a reply from Paul Randal:

Don’t ever suggest create/drop clustered index for heap fragmentation removal, or drop/create of a clustered index for clustered index fragmentation removal. That’s almost the worst advice you can give. Both will cause all non-clustered indexes to be removed twice.

Don’t ever suggest drop/recreate nonclustered index either – it allows any constraints being enforced to be broken.

Rebuild or reorganize, using the thresholds I put in Books Online, or whatever works for you.

References:
1. http://blog.sqlauthority.com/2010/01/12/sql-server-fragmentation-detect-fragmentation-and-eliminate-fragmentation/)
2. Training Kit (Exam 70-461) Querying Microsoft SQL Server 2012 (MCSA)

Performance Tune-up with DMVs

Imagine that you get complaints from end users about SQL Server performance. You have to start investigating the problem immediately. How would you start?

In a production system, end users can submit thousands of queries per hour. Which query would you analyze first?

You could try to start an Extended Events monitoring session. You could use SQL Trace. In both cases, you would have to wait for quite a while before you gather enough data to start analysis and find the most problematic queries. And what if the problematic queries are not executed soon again after you start your monitoring session? You could only hope that you would be able to catch the problems in a reasonable time.

This is the point at which DMOs become extremely helpful. With DMOs, a lot of the data that you need is already gathered. All you need to do is query appropriate DMOs with regular T-SQL queries and extract useful information. DMOs are not materialized in any database; DMOs are virtual objects that give you access to the data SQL Server collects in memory.

Although DMOs are really useful, they have some drawbacks. The most important issue you should take care of is when the last restart of the instance you are inspecting occurred.

Cumulative information is useless if the instance was restarted recently.

You can start an analyzing session by gathering some system information about your instance, as the following query shows.

SELECT cpu_count AS logical_cpu_count,

cpu_count / hyperthread_ratio AS physical_cpu_count,

CAST(physical_memory_kb / 1024. AS int) AS physical_memory__mb,

sqlserver_start_time

FROM sys.dm_os_sys_info;

The SQLOS-related sys.dm_os_waiting_tasks DMO gives you information about sessions that are currently waiting on something. For example, the sessions could be blocked by another session because of locking. You can join this DMO to the execution-related sys.dm_exec_sessions DMO to get information about the user, host, and application that are waiting.

You can also use the is_user_process flag from the sys.dm_exec_sessions DMO to filter out system sessions. The following query gives this information.

SELECT S.login_name, S.host_name, S.program_name,

WT.session_id, WT.wait_duration_ms, WT.wait_type,

WT.blocking_session_id, WT.resource_description

FROM sys.dm_os_waiting_tasks AS WT

INNER JOIN sys.dm_exec_sessions AS S

ON WT.session_id = S.session_id

WHERE s.is_user_process = 1;

The sys.dm_exec_requests execution-related DMO returns information about currently executing requests. It includes a column called sql_handle, which is a hash map of the T-SQL batch text that is executed. You can use this handle to retrieve the complete text of the batch with the help of the execution-related sys.dm_exec_sql_text dynamic management function that accepts this handle as a parameter. The following query joins information about current requests, their waits, and text of their SQL batch with the sys.dm_exec_sessions dynamic management view to also get user, host, and application info.

SELECT S.login_name, S.host_name, S.program_name,

R.command, T.text,

R.wait_type, R.wait_time, R.blocking_session_id

FROM sys.dm_exec_requests AS R

INNER JOIN sys.dm_exec_sessions AS S

ON R.session_id = S.session_id

OUTER APPLY sys.dm_exec_sql_text(R.sql_handle) AS T

WHERE S.is_user_process = 1;

You can retrieve a lot of information about executed queries from the execution-related sys.dm_exec_query_stats DMO. You can retrieve information about disk IO per query, CPU consumption per query, elapsed time per query, and more. With the help of the sys.dm_exec_sql_text DMO, you can retrieve the text of the query as well. You can extract specific query text from batch text with the help of the statement_start_offset and statement_end_offset columns from the sys.dm_exec_query_stats DMO. The extraction is somewhat tricky. The

following query lists five queries that used the most logical disk IO with their query text extracted from the batch text.

SELECT TOP (5)

(total_logical_reads + total_logical_writes) AS total_logical_IO,

execution_count,

(total_logical_reads/execution_count) AS avg_logical_reads,

(total_logical_writes/execution_count) AS avg_logical_writes,

(SELECT SUBSTRING(text, statement_start_offset/2 + 1,

(CASE WHEN statement_end_offset = -1

THEN LEN(CONVERT(nvarchar(MAX),text)) * 2

ELSE statement_end_offset

END - statement_start_offset)/2)

FROM sys.dm_exec_sql_text(sql_handle)) AS query_text

FROM sys.dm_exec_query_stats

ORDER BY (total_logical_reads + total_logical_writes) DESC;

Find nonclustered indexes that were not used from the last start of the instance by using the following query. It is important is to have a representative sample of queries collected before using DMOs that return cumulative values.

SELECT OBJECT_NAME(I.object_id) AS objectname,

I.name AS indexname,

I.index_id AS indexid

FROM sys.indexes AS I

INNER JOIN sys.objects AS O

ON O.object_id = I.object_id

WHERE I.object_id > 100

AND I.type_desc = 'NONCLUSTERED'

AND I.index_id NOT IN

(SELECT S.index_id

FROM sys.dm_db_index_usage_stats AS S

WHERE S.object_id=I.object_id

AND I.index_id=S.index_id

AND database_id = DB_ID('TSQL2012'))

ORDER BY objectname, indexname;

In the following exercise, you find missing indexes.

1. Quickly create the table and the index on that table from the practice for the previous lesson, but use 10 times less data. Then index it and execute the query that could benefit from an additional index. Here is the code.

SELECT N1.n * 100000 + O.orderid AS norderid,

O.*

INTO dbo.NewOrders

FROM Sales.Orders AS O

CROSS JOIN (VALUES(1),(2),(3)) AS N1(n);

CREATE NONCLUSTERED INDEX idx_nc_orderid

ON dbo.NewOrders(orderid);

SELECT norderid

FROM dbo.NewOrders

WHERE norderid = 110248

ORDER BY norderid;

2. Find missing indexes by using index-related DMOs. Use the following query.

SELECT MID.statement AS [Database.Schema.Table],

MIC.column_id AS ColumnId,

MIC.column_name AS ColumnName,

MIC.column_usage AS ColumnUsage,

MIGS.user_seeks AS UserSeeks,

MIGS.user_scans AS UserScans,

MIGS.last_user_seek AS LastUserSeek,

MIGS.avg_total_user_cost AS AvgQueryCostReduction,

MIGS.avg_user_impact AS AvgPctBenefit

FROM sys.dm_db_missing_index_details AS MID

CROSS APPLY sys.dm_db_missing_index_columns (MID.index_handle) AS MIC

INNER JOIN sys.dm_db_missing_index_groups AS MIG

ON MIG.index_handle = MID.index_handle

INNER JOIN sys.dm_db_missing_index_group_stats AS MIGS

ON MIG.index_group_handle=MIGS.group_handle

ORDER BY MIGS.avg_user_impact DESC;

(Source: Training Kit (Exam 70-461) Querying Microsoft SQL Server 2012 (MCSA) (Microsoft Press Training Kit) by Dejan Sarka, Itzik Ben-Gan, Ron Talmage)

How to back up all of the databases with a stored procedure?

IF OBJECT_ID('dbo.BackupDatabases', 'P') IS NOT NULL

DROP PROCEDURE dbo.BackupDatabases;

CREATE PROCEDURE dbo.BackupDatabases

@databasetype AS NVARCHAR(30)

BEGIN

DECLARE @databasename AS NVARCHAR(128) ,

@timecomponent AS NVARCHAR(50) ,

@sqlcommand AS NVARCHAR(1000);

IF @databasetype NOT IN ( 'User', 'System' )

BEGIN

THROW 50000, 'dbo.BackupDatabases: @databasename must be User or System', 0;

RETURN;

END;

IF @databasetype = 'System'

SET @databasename = ( SELECT MIN(name)

FROM sys.databases

WHERE name IN ( 'master', 'model',

'msdb' )

);

ELSE

SET @databasename = ( SELECT MIN(name)

FROM sys.databases

WHERE name NOT IN ( 'master', 'model',

'msdb', 'tempdb' )

);

WHILE @databasename IS NOT NULL

BEGIN

SET @timecomponent = REPLACE(REPLACE(REPLACE(CONVERT(NVARCHAR, GETDATE(), 120),

' ', '_'), ':',

''), '-', '');

SET @sqlcommand = 'BACKUP DATABASE ' + @databasename

+ ' TO DISK =

''C:\Backups\' + @databasename + '_' + @timecomponent + '.bak''';

PRINT @sqlcommand;

--EXEC(@sqlcommand);

IF @databasetype = 'System'

SET @databasename = ( SELECT MIN(name)

FROM sys.databases

WHERE name IN ( 'master',

'model', 'msdb' )

AND name > @databasename

);

ELSE

SET @databasename = ( SELECT MIN(name)

FROM sys.databases

WHERE name NOT IN ( 'master',

'model', 'msdb',

'tempdb' )

AND name > @databasename

);

END;

RETURN;

END

Subscribe to: Posts ( Atom )