On the Journey to be a SQL Server Professional: Iterations for Operations That Must Be Done per Row

Problem Scenarios:

It should be understood that some tasks simply have to be handled by using iterative solutions. Consider management tasks that need to be done per object in a set, such as a set of databases, tables, or indexes. You need to query a catalog view or other system object to return the set of objects in question, iterate through the result rows one at a time, and then perform the task at hand per object. An example of such a management task is rebuilding indexes that have a higher level of fragmentation than a specific percentage that you decide on.

As another example of a task that requires an iterative solution, suppose that you have a stored procedure that performs some work for an input customer. The work involves multiple statements implemented in the procedure’s body. The logic cannot be implemented for multiple customers at once; it can be implemented only for a single customer.

The following code defines such a procedure, called Sales.ProcessCustomer.

USE TSQL2012;

IF OBJECT_ID('Sales.ProcessCustomer') IS NOT NULL

DROP PROC Sales.ProcessCustomer;

CREATE PROC Sales.ProcessCustomer ( @custid AS INT )

PRINT 'Processing customer ' + CAST(@custid AS VARCHAR(10));

The PRINT statement represents the part that would normally implement the work for the input customer. Suppose that you now need to write code that executes the stored procedure for each customer from the Sales.Customers table. You have to iterate through the customer rows one at a time, obtain the customer ID, and execute the procedure with that ID as input.

Three possible options to achieve the goal:

Option 1 – Using a Cursor

You can implement a solution by using a cursor. You first use the DECLARE command to declare the cursor based on a query that returns all customer IDs from the Sales.Customers table. You can use the FAST_FORWARD option to make it a read-only, forward-only cursor.

SET NOCOUNT ON;

DECLARE @curcustid AS INT;

DECLARE cust_cursor CURSOR FAST_FORWARD

FOR

SELECT custid

FROM Sales.Customers;

OPEN cust_cursor;

FETCH NEXT FROM cust_cursor INTO @curcustid;

WHILE @@FETCH_STATUS = 0

BEGIN

EXEC Sales.ProcessCustomer @custid = @curcustid;

FETCH NEXT FROM cust_cursor INTO @curcustid;

END;

CLOSE cust_cursor;

DEALLOCATE cust_cursor;

Option 2 – Using TOP(1) and Min()

You can also achieve the same task by using another iterative solution, but one that doesn’t use a cursor. You can use a query with a TOP (1) option ordered by the custid column to return the minimum customer ID. Then loop while the last query does not return a NULL. In each iteration of the loop, execute the stored procedure by using the current customer ID as input.

To get the next customer ID, issue a query with a TOP (1) option, where the custid column is greater than the previous one, ordered by custid. Here’s how the complete solution looks.

SET NOCOUNT ON;

DECLARE @curcustid AS INT;

SET @curcustid = ( SELECT TOP ( 1 )

custid

FROM Sales.Customers

ORDER BY custid

);

WHILE @curcustid IS NOT NULL

BEGIN

EXEC Sales.ProcessCustomer @custid = @curcustid;

SET @curcustid = ( SELECT TOP ( 1 )

custid

FROM Sales.Customers

WHERE custid > @curcustid

ORDER BY custid

);

END;

Option 3 – Using Min() and Max()

You may have considered another option; to retrieve the minimum and maximum customer IDs from the table, form a loop that keeps incrementing the current customer ID by 1 until it is equal to the maximum. If at any point gaps appear between existing customer IDs (due to deletions or aspects related to the generation of the keys), your code will end up trying to process customer IDs that don’t exist in your table. In short, this approach isn’t recommended even if currently there are no gaps between keys.

In terms of performance, the first solution (with the cursor) doesn’t really need any special indexes to support it. The second solution (without the cursor) does. You need an index on the custid column. If you don’t have one, each query with the TOP option will end up scanning all table rows and apply a TOP N sort in the plan. In other words, without an index, the second solution will perform badly. Even with an index in place, the second solution does more I/O operations than the first because it needs to perform a seek operation in the index per row.

In conclusion, if you need to do iterations for operations that must be done per row, the cursor option is probably the right choice.

(Source: Training Kit (Exam 70-461) Querying Microsoft SQL Server 2012 (MCSA))