Case Scenario 1 (Chapter 1 - Case Scenario 1): Importance of Theory
You and a colleague on your team get into a discussion about the importance of understanding the theoretical foundations of T-SQL. Your colleague argues that there's no point in understanding the foundations, and that it's enough to just learn the technical aspects of T-SQL to be a good developer and to write correct code. Answer the following questions posed to you by your colleague:

Question: Can you give an example of an element from set theory that can improve your understanding of T-SQL?
Answer: One of the most typical mistakes that T-SQL developers make is to assume that a query without an ORDER BY clause always returns the data in a certain order—for example, clustered index order. But if you understand that in set theory a set has no particular order to its elements, you know that you shouldn't make such assumptions. The only way in SQL to guarantee that the rows will be returned in a certain order is to add an ORDER BY clause. That's just one of many examples of aspects of T-SQL that can be better understood if you understand the foundations of the language.
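
To make the point concrete, here is a minimal sketch; the Sales.Orders table and its columns are hypothetical stand-ins:

-- No ORDER BY: the rows may come back in any order,
-- regardless of any clustered index on the table.
SELECT orderid, orderdate
FROM Sales.Orders;

-- Only an ORDER BY clause guarantees presentation order.
SELECT orderid, orderdate
FROM Sales.Orders
ORDER BY orderdate, orderid;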

Question: Can you explain why understanding the relational model is important for people who write T-SQL code?
Answer: Even though T-SQL is based on the relational model, it deviates from it in a number of ways. But it gives you enough tools that, if you understand the relational model, you can write in a relational way. Following the relational model helps you write code more correctly. Here are some examples:
■ You shouldn't rely on the order of columns or rows.
■ You should always name result columns.
■ You should eliminate duplicates if they are possible in the result of your query.
Case Scenario 2 (Chapter 1 - Case Scenario 2): Interviewing for a Code Reviewer Position
You are interviewed for a position as a code reviewer to help improve code quality. The organization's application has queries written by untrained people. The queries have numerous problems, including logical bugs. Your interviewer poses a number of questions and asks for a concise answer of a few sentences to each question. Answer the following questions addressed to you by your interviewer:

Question: Is it important to use standard code when possible, and why?
Answer: It is important to use standard SQL code. This way, both the code and people's knowledge are more portable. Especially in cases where there are both standard and nonstandard forms for a language element, it's recommended to use the standard form.

Question: We have many queries that use ordinal positions in the ORDER BY clause. Is that a bad practice, and if so, why?
Answer: Using ordinal positions in the ORDER BY clause is a bad practice. From a relational perspective, you are supposed to refer to attributes by name, and not by ordinal position. Also, what if the SELECT list is revised in the future and the developer forgets to revise the ORDER BY list accordingly?
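
As a quick illustration (hypothetical table and columns):

-- Fragile: ORDER BY 2 silently changes meaning if the SELECT list is reordered.
SELECT custid, companyname
FROM Sales.Customers
ORDER BY 2;

-- Robust: refer to the attribute by name.
SELECT custid, companyname
FROM Sales.Customers
ORDER BY companyname;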

Question: If a query doesn't have an ORDER BY clause, what is the order in which the records are returned?
When the query doesn’t have an ORDER BY clause, there are no assurances for any
particular order in the result. The order should be considered arbitrary. You also notice
that the interviewer used the incorrect term record instead of row. You might want to
mention something about this, because the interviewer may have done so on purpose
to test you.

Question: Would you recommend putting a DISTINCT clause in every query?
Answer: From a pure relational perspective, this actually could be valid, and perhaps even recommended. But from a practical perspective, there is the chance that SQL Server will try to remove duplicates even when there are none, and this will incur extra cost. Therefore, it is recommended that you add the DISTINCT clause only when duplicates are possible in the result and you're not supposed to return the duplicates.
Case Scenario 3 (Chapter 2 - Case Scenario 1): Reviewing the Use of Types
You are hired as a consultant to help address performance issues in an existing system. The system was developed originally by using SQL Server 2005 and has recently been upgraded to SQL Server 2012. Write rates in the system are fairly low, and their performance is more than adequate. Also, write performance is not a priority. However, read performance is a priority, and currently it is not satisfactory. One of the main goals of the consulting engagement is to provide recommendations that will help improve read performance. You have a meeting with representatives of the customer, and they ask for your recommendations in different potential areas for improvement. One of the areas they inquire about is the use of data types. Your task is to respond to the following customer queries:

Question: We have many attributes that represent a date, like order date, invoice date, and so on, and currently we use the DATETIME data type for those. Do you recommend sticking to the existing type or replacing it with another? Any other recommendations along similar lines?
Answer: The DATETIME data type uses 8 bytes of storage. SQL Server 2012 supports the DATE data type, which uses 3 bytes of storage. For all those attributes that represent a date only, it is recommended to switch to using DATE. The lower the storage requirement, the better the reads can perform.
As for other recommendations, the general rule "smaller is better, provided that you cover the needs of the attribute in the long run" is suitable for read performance. For example, if you have descriptions of varying lengths stored in a CHAR or NCHAR type, consider switching to VARCHAR or NVARCHAR, respectively. Also, if you're currently using Unicode types but need to store strings of only one language—say, US English—consider using regular characters instead.
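
A minimal sketch of the date change (the table and column are hypothetical, and the statement assumes no constraints or indexes depend on the column):

-- DATETIME (8 bytes) replaced with DATE (3 bytes) for a date-only attribute;
-- this also rules out accidental time-of-day values.
ALTER TABLE Sales.Orders
  ALTER COLUMN orderdate DATE NOT NULL;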

Question: We have our own custom table partitioning solution because we're using the Standard edition of SQL Server. We use a surrogate key of a UNIQUEIDENTIFIER type with the NEWID function invoked by a default constraint expression as the primary key for the tables. We chose this approach because we do not want keys to conflict across the different tables. This primary key is also our clustered index key. Do you have any recommendations concerning our choice of a key?
Answer: For one, the UNIQUEIDENTIFIER type is large—16 bytes. And because it's also the clustered index key, it is copied to all nonclustered indexes. Also, due to the random order in which the NEWID function generates values, there's probably a high level of fragmentation in the index. A different approach to consider (and test!) is switching to an integer type and using the sequence object to generate keys that do not conflict across tables. Due to the reduced size of the type, with the multiplied effect on nonclustered indexes, performance of reads will likely improve. The values will be increasing, and as a result, there will be less fragmentation, which will also likely have a positive effect on reads.
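
A sketch of the alternative (all names are hypothetical); a single sequence feeds the keys of several tables, so values never collide across them:

CREATE SEQUENCE dbo.SeqKeys AS BIGINT START WITH 1 INCREMENT BY 1;

CREATE TABLE dbo.Orders
(
  orderid BIGINT NOT NULL
    CONSTRAINT DFT_Orders_orderid DEFAULT (NEXT VALUE FOR dbo.SeqKeys)
    CONSTRAINT PK_Orders PRIMARY KEY,
  orderdate DATE NOT NULL
);

-- Other tables can use the same sequence in their default constraints.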
Case Scenario 4 (Chapter 2 - Case Scenario 2): Reviewing the Use of Functions
The same company that hired you to review their use of data types would like you to also review their use of functions. They pose the following question to you:

Question: Our application has worked with SQL Server so far, but due to a recent merger with another company, we need to support other database platforms as well. What can you recommend in terms of the use of functions?
Answer: To improve the portability of the code, it's important to use standard code when possible, and this of course applies more specifically to the use of built-in functions. For example, use COALESCE and not ISNULL, use CURRENT_TIMESTAMP and not GETDATE, and use CASE and not IIF.
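
For example (hypothetical table and columns):

-- Proprietary forms:
SELECT ISNULL(region, 'N/A') AS region, GETDATE() AS now_dt
FROM Sales.Customers;

-- Standard, more portable forms:
SELECT COALESCE(region, 'N/A') AS region, CURRENT_TIMESTAMP AS now_dt
FROM Sales.Customers;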
Case Scenario 5 (Chapter 3 - Case Scenario 1): Filtering and Sorting Performance Recommendations
You are hired as a consultant to help address query performance problems in a beer factory running SQL Server 2012. You trace a typical workload submitted to the system and observe very slow query run times. You see a lot of network traffic. You see that many queries return all rows to the client and then the client handles the filtering. Queries that do filter data often manipulate the filtered columns. All queries have ORDER BY clauses, and when you inquire about this, you are told that it's not really needed, but the developers got accustomed to doing so—just in case. You identify a lot of expensive sort operations. The customer is looking for recommendations to improve performance and asks you the following questions:

Question: Can anything be done to improve the way filtering is handled?
Answer: For one thing, as much filtering as possible should be done in the database. Doing most of the filtering in the client means that you're scanning more data, which increases the stress on the storage subsystem, and also that you cause unnecessary network traffic. When you do filter in the database, for example by using the WHERE clause, you should use search arguments that increase the likelihood of efficient use of indexes. You should try as much as possible to avoid manipulating the filtered columns.
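
A classic illustration of a search argument (hypothetical table, with an index on orderdate assumed):

-- Not a search argument: manipulating the column prevents an index seek.
SELECT orderid
FROM Sales.Orders
WHERE YEAR(orderdate) = 2012;

-- Search argument: the column is left alone, so an index on orderdate can be used.
SELECT orderid
FROM Sales.Orders
WHERE orderdate >= '20120101' AND orderdate < '20130101';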

Question: Is there any harm in specifying ORDER BY even when the data doesn't need to be returned ordered?
Answer: Adding an ORDER BY clause means that SQL Server needs to guarantee returning the rows in the requested order. If there are no existing indexes to support the ordering requirements, SQL Server will have no choice but to sort the data. Sorting is expensive with large sets. So the general recommendation is to avoid adding ORDER BY clauses to queries when there are no ordering requirements. And when you do need to return the rows in a particular order, consider arranging supporting indexes that can prevent SQL Server from needing to perform expensive sort operations.

Question: Any recommendations related to queries with TOP and OFFSET-FETCH?
Answer: The main way to help queries with TOP and OFFSET-FETCH perform well is by arranging indexes to support the ordering elements. This can prevent scanning all the data, in addition to sorting.
Case Scenario 6 (Chapter 3 - Case Scenario 2): Tutoring a Junior Developer
You are tutoring a junior developer regarding filtering and sorting data with T-SQL. The developer seems to be confused about certain topics and poses some questions to you. Answer the following to the best of your knowledge:

Question: When I try to refer to a column alias that I defined in the SELECT list in the WHERE clause, I get an error. Can you explain why this isn't allowed and what the workarounds are?
Answer: To understand why you can't refer in the WHERE clause to an alias that was defined in the SELECT list, you need to understand logical query processing. Even though the keyed-in order of the clauses is SELECT-FROM-WHERE-GROUP BY-HAVING-ORDER BY, the logical query processing order is FROM-WHERE-GROUP BY-HAVING-SELECT-ORDER BY. As you can see, the WHERE clause is evaluated prior to the SELECT clause, and therefore aliases defined in the SELECT clause aren't visible to the WHERE clause. As for workarounds, you can define the alias in a table expression such as a CTE or derived table and then refer to it in the outer query's WHERE clause, as in the sketch that follows.
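
A minimal sketch of the workaround (hypothetical table and columns):

-- Fails: WHERE is evaluated before SELECT, so the alias isn't visible.
-- SELECT orderid, quantity * unitprice AS totalvalue
-- FROM Sales.OrderDetails
-- WHERE totalvalue > 1000.00;

-- Works: define the alias in a CTE and filter in the outer query.
WITH C AS
(
  SELECT orderid, quantity * unitprice AS totalvalue
  FROM Sales.OrderDetails
)
SELECT orderid, totalvalue
FROM C
WHERE totalvalue > 1000.00;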

Question: Referring to a column alias in the ORDER BY clause seems to be supported. Why is that?
Answer: Logical query processing order explains why the ORDER BY clause can refer to aliases defined in the SELECT clause. That's because the ORDER BY clause is logically evaluated after the SELECT clause.

Question: Why is it that Microsoft made it mandatory to specify an ORDER BY clause when using OFFSET-FETCH but not when using TOP? Does this mean that only TOP queries can have nondeterministic ordering?
Answer: The ORDER BY clause is mandatory when using OFFSET-FETCH because this clause is standard, and standard SQL made it mandatory. Microsoft simply followed the standard. As for TOP, this feature is proprietary, and when Microsoft designed it, they chose to allow using TOP in a completely nondeterministic manner—without an ORDER BY clause. Note that the fact that OFFSET-FETCH requires an ORDER BY clause doesn't mean that you must use deterministic ordering. For example, if your ORDER BY list isn't unique, the ordering isn't deterministic. And if you want the ordering to be completely nondeterministic, you can specify ORDER BY (SELECT NULL), which is equivalent to not specifying an ORDER BY clause at all.
Case Scenario 7 (Chapter 4 - Case Scenario 1): Code Review
You are asked to review the code in a system that suffers from both code maintainability problems and performance problems. You come up with the following findings and need to determine what to recommend to the customer:

Question: You find many queries that use a number of nesting levels of derived tables, making it very hard to follow the logic. You also find a lot of queries that join multiple derived tables that are based on the same query, and you find that some queries are repeated in a number of places in the code. What can you recommend to the customer to reduce the complexity and improve maintainability?
Answer: To address the nesting complexity of derived tables, in addition to the duplication of derived table code, you can use CTEs. CTEs don't nest; instead, they are more modular. Also, you can define a CTE once and refer to it multiple times in the outer query. As for queries that are repeated in different places in your code, for reusability you can use views and inline table-valued functions. Use the former if you don't need to pass parameters and the latter if you do.
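
A brief sketch of the CTE approach (hypothetical names); the CTE is defined once and referenced twice in the outer query:

WITH SalesByYear AS
(
  SELECT YEAR(orderdate) AS orderyear, SUM(val) AS totalval
  FROM Sales.OrderValues
  GROUP BY YEAR(orderdate)
)
SELECT cur.orderyear, cur.totalval, prv.totalval AS prevtotalval
FROM SalesByYear AS cur
  LEFT JOIN SalesByYear AS prv
    ON cur.orderyear = prv.orderyear + 1;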

Question: During your review, you identify a number of cases where cursors are used to access the instances of a certain entity (like customer, employee, shipper) one at a time; next, the code invokes a query per each of those instances, storing the result in a temporary table; then the code just returns all the rows from the temporary tables. The customer has both code maintainability and performance problems with the existing code. What can you recommend?
Answer: The customer should evaluate the use of the APPLY operator instead of the cursor plus the query per row. The APPLY operator involves less code and therefore improves maintainability, and it does not incur the performance hit that cursors usually do.

Question: You identify performance issues with joins. You realize that there are no indexes created explicitly in the system; there are only the ones created by default through primary key and unique constraints. What can you recommend?
Answer: The customer should examine the foreign key relationships and evaluate creating indexes on the foreign key columns.
Case Scenario 8 (Chapter 4 - Case Scenario 2): Explaining Set Operators
You are presenting a session about set operators at a conference. At the end of the session, you give the audience an opportunity to ask questions. Answer the following questions presented to you by attendees:

Question: In our system, we have a number of views that use a UNION operator to combine disjoint sets from different tables. We see performance problems when querying the views. Do you have any suggestions to try to improve the performance?
Answer: The UNION operator returns distinct rows. When the unified sets are disjoint, there are no duplicates to remove, but the SQL Server Query Optimizer may not realize it. Trying to remove duplicates even when there are none involves extra cost. So when the sets are disjoint, it's important to use the UNION ALL operator and not UNION. Also, adding CHECK constraints that define the ranges supported by each table can help the optimizer realize that the sets are disjoint. Then, even when using UNION, the optimizer can realize that it doesn't need to remove duplicates.
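
A short sketch (hypothetical yearly tables):

-- CHECK constraints telling the optimizer the sets are disjoint:
ALTER TABLE dbo.Orders2011 ADD CONSTRAINT CHK_Orders2011_dt
  CHECK (orderdate >= '20110101' AND orderdate < '20120101');
ALTER TABLE dbo.Orders2012 ADD CONSTRAINT CHK_Orders2012_dt
  CHECK (orderdate >= '20120101' AND orderdate < '20130101');

-- UNION ALL avoids the duplicate-removal step altogether.
SELECT orderid, orderdate FROM dbo.Orders2011
UNION ALL
SELECT orderid, orderdate FROM dbo.Orders2012;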

Question: Can you point out the advantages of using set operators like INTERSECT and EXCEPT compared to the use of inner and outer joins?
Answer: Set operators have a number of benefits. They allow simpler code because you don't explicitly compare the columns from the two inputs like you do with joins. Also, when set operators compare two NULLs, they consider them the same, which is not the case with joins. When this is the desired behavior, it is easier to use set operators. With joins, you have to add predicates to get such behavior.
Case Scenario 9 (Chapter 5 - Case Scenario 1): Improving Data Analysis Operations
You are a data analyst in a financial company that uses SQL Server 2012 for its database. The company has just recently upgraded the system from SQL Server 2000. You often use T-SQL queries against the company's database to analyze the data. So far, you were limited to code that was compatible with SQL Server 2000, relying mainly on joins, subqueries, and grouped queries. Your queries were often complex and slow. You are now evaluating the use of features available in SQL Server 2012.

Question: You often need to compute things like running totals, year-to-date calculations, and moving averages. What will you consider now to handle those? What are the things you should watch out for in order to get good performance?
Answer: Window aggregate functions are excellent for such computations. As for things to watch out for, with the current implementation in SQL Server 2012, you should generally try to avoid using the RANGE window frame unit. And remember that without an explicit window frame clause, you get RANGE by default, so you want to be explicit and use the ROWS option.
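
A running-total sketch (hypothetical table); note the explicit ROWS frame:

SELECT custid, orderdate, orderid, val,
  SUM(val) OVER(PARTITION BY custid
                ORDER BY orderdate, orderid
                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS runningtotal
FROM Sales.OrderValues;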

Question: Occasionally, you need to create crosstab reports where you rotate the data from rows to columns or the other way around. So far, you imported data to Microsoft Excel and handled such needs there, but you prefer to do it in T-SQL. What will you consider using for this purpose? What should you be careful about when using the features you're considering?
Answer: The PIVOT and UNPIVOT operators are handy for crosstab queries. One thing to be careful about when using PIVOT is related to the fact that the grouping element is determined by elimination—it's what's left from the input table that wasn't specified as either the spreading or the aggregation element. Therefore, it is recommended to always define a table expression returning only the grouping, spreading, and aggregation elements, and use that table expression as the input to the PIVOT operator.
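
A sketch (hypothetical columns); the derived table limits the input to exactly the grouping (custid), spreading (orderyear), and aggregation (val) elements:

SELECT custid, [2010], [2011], [2012]
FROM (SELECT custid, YEAR(orderdate) AS orderyear, val
      FROM Sales.OrderValues) AS D
  PIVOT(SUM(val) FOR orderyear IN ([2010], [2011], [2012])) AS P;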

Question: In many of your queries, you need to perform recency computations—that is, identify the time passed between a previous event and the current one, or between the current event and the next. So far, you used subqueries for this. What will you consider now instead?
Answer: The LAG and LEAD functions are natural for this purpose.
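
For example, computing the days since a customer's previous order (hypothetical table):

SELECT custid, orderid, orderdate,
  DATEDIFF(day,
           LAG(orderdate) OVER(PARTITION BY custid ORDER BY orderdate, orderid),
           orderdate) AS dayssinceprevorder
FROM Sales.Orders;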
Case Scenario 10 (Chapter 5 - Case Scenario 2): Interviewing for a Developer Position
You are interviewed for a position as a T-SQL developer. Respond to the following questions presented to you by your interviewer.

Question: Describe the difference between ROW_NUMBER and RANK.
Answer: The ROW_NUMBER function isn't sensitive to ties in the window ordering values. Therefore, the computation is deterministic only when the window ordering is unique. When the window ordering isn't unique, the function isn't deterministic. The RANK function is sensitive to ties and produces the same rank value for all rows with the same ordering value. Therefore, it is deterministic even when the window ordering isn't unique.

Question: Describe the difference between the ROWS and RANGE window frame units.
Answer: The difference between ROWS and RANGE is actually similar to the difference between ROW_NUMBER and RANK, respectively. When the window ordering isn't unique, ROWS doesn't include peers, and therefore it isn't deterministic, whereas RANGE includes peers, and therefore it is deterministic. Also, the ROWS option can be optimized with an efficient in-memory spool; RANGE is optimized with an on-disk spool and therefore is usually slower.

More info: According to standard SQL, the RANGE window frame extent allows you to define delimiters based on logical offsets from the current row's sort key, whereas ROWS defines the delimiters based on physical offsets in terms of a number of rows from the current row. However, SQL Server 2012 has a very limited implementation of the RANGE option, supporting only UNBOUNDED PRECEDING or FOLLOWING and CURRENT ROW as delimiters. One subtle difference between ROWS and RANGE when using the same delimiters is that the former doesn't include peers (tied rows in terms of the sort key) and the latter does.

Important (ROWS vs. RANGE): In SQL Server 2012, the ROWS option usually gets optimized much better than RANGE when using the same delimiters. If you define a window with a window order clause but without a window frame clause, the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Therefore, unless you are after the special behavior you get from RANGE that includes peers, make sure you explicitly use the ROWS option.

Question: Why can you not refer to a window function in the WHERE clause of a query, and what is the workaround for that?
Answer: Window functions are allowed only in the SELECT and ORDER BY clauses because the initial window they are supposed to work with is the underlying query's result set. If you need to filter rows based on a window function, you need to use a table expression like a CTE or derived table. You specify the window function in the inner query's SELECT clause and assign the target column an alias. You can then filter the rows by referring to that column alias in the outer query's WHERE clause.
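
A minimal sketch (hypothetical table):

WITH C AS
(
  SELECT orderid, custid, val,
    ROW_NUMBER() OVER(PARTITION BY custid ORDER BY val DESC) AS rownum
  FROM Sales.OrderValues
)
SELECT orderid, custid, val
FROM C
WHERE rownum <= 3;  -- the three highest-value orders per customer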
Case Scenario 11 (Chapter 6 - Case Scenario 1): Enhancing the Searches
After you deploy a line-of-business (LOB) application to your customer, you realize it is not user friendly enough. End users have to perform many searches; however, they always have to know the exact phrase they are searching for.

Question: How could you enhance the end users' experience?
Answer: You should use the Full-Text Search feature of SQL Server.

Question: How should you change your queries to support the enhanced user interface?
Answer: You should revise your queries to include the full-text predicates, or use the full-text and semantic search table-valued functions.
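
For example, a sketch assuming a full-text index exists on a description column and that productid is the table's full-text key column (all names hypothetical):

-- Full-text predicate: matches inflectional forms, not just exact phrases.
SELECT productid, description
FROM Production.ProductDescriptions
WHERE FREETEXT(description, N'light hiking boots');

-- Table-valued function variant, with relevance ranking.
SELECT d.productid, ft.[RANK]
FROM FREETEXTTABLE(Production.ProductDescriptions, description,
                   N'light hiking boots') AS ft
  INNER JOIN Production.ProductDescriptions AS d
    ON d.productid = ft.[KEY];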
Case Scenario 12 (Chapter 6 - Case Scenario 2): Using the Semantic Search
You need to analyze some Microsoft Word documents to find the documents that are semantically similar to a document that you get from your manager. You need to provide a quick and simple solution for this problem.

Question: Would you create a Microsoft .NET application or use T-SQL queries for this problem?
Answer: A T-SQL solution is simpler in this scenario because the SQL Server Full-Text Search and Semantic Search features support the functionality you need out of the box.

Question: If you decide to use a T-SQL solution, which T-SQL function would you use?
Answer: You should use the SEMANTICSIMILARITYTABLE function.
Case Scenario 13 (Chapter 7 - Case Scenario 1): Reports from XML Data
A company that hired you as a consultant uses a website to get reviews of their products from their customers. They store those reviews in an XML column called reviewsXML of a table called ProductReviews. The XML column is validated against a schema and contains, among others, firstname, lastname, and datereviewed elements. The company wants to generate a report with the names of the reviewers and the dates of the reviews. Additionally, because there are already many very long reviews, the company worries about the performance of this report.

Question: How could you get the data needed for the report?
Answer: You could use the value() XML data type method to retrieve the scalar values needed for the report.
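
A sketch of such a query; the element paths are hypothetical because the actual schema isn't shown, and if the schema uses a target namespace, the paths would also need namespace declarations:

SELECT
  reviewsXML.value('(/review/firstname)[1]', 'NVARCHAR(50)') AS firstname,
  reviewsXML.value('(/review/lastname)[1]', 'NVARCHAR(50)') AS lastname,
  reviewsXML.value('(/review/datereviewed)[1]', 'DATE') AS datereviewed
FROM dbo.ProductReviews;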

Question: What would you do to maximize the performance of the report?
Answer: You should consider using XML indexes in order to maximize the performance of the report.
Case Scenario 14 (Chapter 7 - Case Scenario 2): Dynamic Schema
You need to provide a solution for a dynamic schema for the Products table in your company. All products have the same basic attributes, like product ID, product name, and list price. However, different groups of products have different additional attributes. Besides the dynamic schema for the variable part of the attributes, you need to ensure at least basic constraints, like data types, for these variable attributes.

Question: How would you make the schema of the Products table dynamic?
Answer: You could use an XML data type column to store the variable attributes in XML format.

Question: How would you ensure that at least basic constraints would be enforced?
Answer: You could validate the XML against an XML schema collection.
Case Scenario 15 (Chapter 8 - Case Scenario 1): Working with Table Constraints
As the lead database developer on a new project, you notice that database validation occurs in the client application. As a result, database developers periodically run very costly queries to verify the integrity of the data. You have decided that your team should refactor the database to improve the integrity of the database and shorten the costly nightly validation queries. Answer the following questions about the actions you might take.

Question: How can you ensure that certain combinations of columns in a table have a unique value?
Answer: You can ensure that certain columns or combinations of columns in a table are unique by applying primary key and unique constraints. You can also apply a unique index. Normally, it is preferable to use the declared primary key and unique constraints because they are easy to find and recognize within the SQL Server metadata and management tools. If the uniqueness of a row cannot be specified using a constraint or a unique index, you may be able to use a trigger.

Question: How can you enforce that values in certain tables are restricted to specified ranges?
Answer: For simple restrictions of ranges in a table, you can use a check constraint. You can then specify the restriction in the constraint's expression.

Question: How can you enforce that all columns that contain values from lookup tables are valid?
Answer: To enforce that lookup values are valid, you should normally use foreign key constraints. Foreign key constraints are declared constraints, and as such are known through metadata to SQL Server and the query optimizer. When joining a table that has a foreign key constraint to its lookup table, it is helpful to add an index on the foreign key column to assist join performance.

Question: How can you ensure that all tables have a primary key, even tables that right now do not have any primary key declared?
Answer: You cannot actively enforce every table to have a primary key constraint. However, you can query the catalog views, such as sys.tables and sys.key_constraints, to monitor the tables and make sure that every table includes a primary key.
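
A sketch of such a monitoring query:

-- Tables that currently have no primary key declared.
SELECT t.name AS tablename
FROM sys.tables AS t
WHERE NOT EXISTS
  (SELECT *
   FROM sys.key_constraints AS kc
   WHERE kc.parent_object_id = t.object_id
     AND kc.type = 'PK');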
Case Scenario 16 (Chapter 8 - Case Scenario 2): Working with Unique and Default Constraints
As you examine the database of your current project more closely, you find that there are more data integrity problems than you first realized. Here are some of the problems you found. How would you solve them?

Question: Most of the tables have a surrogate key, which you have implemented as a primary key. However, there are other columns or combinations of columns that must be unique, and a table can have only one primary key. How can you enforce that certain other columns or combinations of columns will be unique?
Answer: You can create a unique constraint on a column or set of columns to ensure their unique values, in addition to the primary key.

Question: Several columns allow NULLs, even though the application is supposed to always populate them. How can you ensure that those columns will never allow NULLs?
Answer: You can prevent a column from ever having NULLs by altering the table and redefining the column as NOT NULL.

Question: Often the application must specify specific values for every column when inserting a row. How can you set up the columns so that if the application does not insert a value, a standard default value will be inserted automatically?
Answer: You can create a default constraint on a column to ensure that if no value is inserted, a default value will be inserted in its place.
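
A combined sketch of these three fixes (hypothetical table and columns):

-- Enforce uniqueness of a column combination beyond the primary key.
ALTER TABLE dbo.Customers
  ADD CONSTRAINT UQ_Customers_email UNIQUE (email);

-- Disallow NULLs in a column the application must always populate.
ALTER TABLE dbo.Customers
  ALTER COLUMN country NVARCHAR(50) NOT NULL;

-- Supply a standard default when no value is inserted.
ALTER TABLE dbo.Customers
  ADD CONSTRAINT DFT_Customers_created DEFAULT (SYSDATETIME()) FOR created;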
Case Scenario 17 (Chapter 9 - Case Scenario 1): Comparing Views, Inline Functions, and Synonyms
As the lead database developer on a new project, you need to expose a logical view of the database to applications that produce daily reports. Your job is to prepare a report for the DBA team, showing the advantages and disadvantages of views, inline functions, and synonyms for creating that logical view of the database. What would you recommend using, based on each of the following conditions: views, inline functions, or synonyms?

Question: The application developers do not want to work with complex joins for their reports. For updating data, they will rely on stored procedures.
Answer: To remove the need for developers to work with complex joins, you can present them with views and inline functions that hide the complexity of the joins. Because the developers will use stored procedures to update data, you do not need to ensure that the views are updatable.

Question: In some cases, you need to be able to change the names of tables or views without having to recode the application.
Answer: You can change the names or definitions of views and change table names without affecting the application if the application refers to synonyms. You will have to drop and re-create the synonym when the underlying table or view has a name change, and that will have to be done when the application is offline.

Question: In other cases, the application needs to filter report data on the database by passing parameters, but the developers do not want to use stored procedures for retrieving the data.
Answer: You can use inline functions to provide viewlike objects that can be filtered by parameters. Stored procedures are not required because users can reference the inline function in the FROM clause of a query.
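
A minimal inline table-valued function sketch (hypothetical names):

CREATE FUNCTION Sales.GetCustomerOrders (@custid AS INT)
RETURNS TABLE
AS
RETURN
  SELECT orderid, orderdate, val
  FROM Sales.OrderValues
  WHERE custid = @custid;
GO

-- Used like a parameterized view:
SELECT orderid, orderdate, val
FROM Sales.GetCustomerOrders(42);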
Case Scenario 18 (Chapter 9 - Case Scenario 2): Converting Synonyms to Other Objects
You have just been assigned the database developer responsibility for a database that makes extensive use of synonyms in place of tables and views. Based on user feedback, you need to replace some of the synonyms. In the following cases, identify what actions you can take that will not cause users or applications to change their code.

Question: Some synonyms refer to tables. However, some of the tables must be filtered. You need to leave the synonym in place but somehow filter what the table returns.
Answer: To filter the data coming from the table, you can create a view or inline function that filters the data appropriately, and re-create the synonym to reference the view or function.

Question: Some synonyms refer to tables. Sometimes column names of the table can change, but the synonym still needs to return the old column names.
Answer: To keep synonyms working even if column names of a table are changed, you can create a view that refers to the table and aliases the new column names back to the old ones, and re-create the synonym to refer to the view.

Question: Some synonyms refer to views. You need to make it possible for users to see the names and data types of the columns returned by the views when the users browse the database by using SSMS.
Answer: Synonyms cannot expose metadata. Therefore, when browsing a database in SSMS, users will not see column names and their data types under the synonym. In order to enable users to see the column names and data types of the underlying data, you must replace the synonym with a view.
Case Scenario 19 (Chapter 10 - Case Scenario 1): Using Modifications That Support Optimized Logging
You are a consultant for the IT department of a large retail company. The company has a nightly process that first clears all rows from a table by using a DELETE statement, and then populates the table with the result of a query against other tables. The result contains a few dozen million rows. The process is extremely slow. You are asked to provide recommendations for improvements.

Question: Provide recommendations for improving the delete part of the process.
Answer: Regarding the delete process, if the entire table needs to be cleared, the customer should consider using the TRUNCATE TABLE statement, which is minimally logged.

Question: Provide recommendations for improving the insert part of the process.
Answer: Regarding the insert process, it could be that it's currently very slow because it doesn't benefit from minimal logging. The customer should evaluate the feasibility of using minimally logged inserts like the SELECT INTO statement (which would require dropping the target table first), the INSERT SELECT statement with the TABLOCK option, and others. Note that the recovery model of the database needs to be simple or bulk logged, so the customer should evaluate whether this is acceptable in terms of the organization's requirements for recovery capabilities.
Case Scenario 20 (Chapter 10 - Case Scenario 2): Improving a Process That Updates Data
The same company that hired you to consult about its inefficient nightly process from the first scenario hires you again. They ask for your advice regarding the following update processes:

Question: The database has a table holding about 100 million rows. About a third of the existing rows need to be updated. Can you provide recommendations as to how to handle the update in order not to cause unnecessary performance problems in the system?
Answer: The customer should consider developing a process that handles the large update in chunks. If done in one big transaction, the process will very likely result in a significant increase in the transaction log size. The process will also likely result in lock escalation leading to blocking problems.
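
A common chunking pattern (hypothetical table and filter); each iteration updates a limited batch in its own transaction:

WHILE 1 = 1
BEGIN
  UPDATE TOP (5000) dbo.BigTable
    SET status = 1
  WHERE status = 0;  -- hypothetical filter identifying rows to update

  IF @@ROWCOUNT < 5000 BREAK;  -- last chunk processed
END;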

Question: There's an UPDATE statement that modifies rows in one table based on information from related rows in another table. The UPDATE statement currently uses a separate subquery for each column that needs to be modified, obtaining the value of the respective column from the related row in the source table. The statement also uses a subquery to filter only rows that have matches in the source table. The process is very slow. Can you suggest ways to improve it?
Answer: The customer should consider using an UPDATE statement based on a join instead of the existing use of subqueries. The amount of code will be significantly reduced, and the performance will likely improve. Each subquery requires a separate visit to the related row. So using multiple subqueries to obtain values from multiple columns will result in multiple visits to the data. With a join, through one visit to the matching row, you can obtain any number of column values that you need.
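
A sketch of the rewrite (hypothetical tables and columns):

-- Replaces one correlated subquery per column plus a filtering subquery.
UPDATE T
  SET col1 = S.col1,
      col2 = S.col2
FROM dbo.TargetTable AS T
  INNER JOIN dbo.SourceTable AS S
    ON T.keycol = S.keycol;  -- the join also filters to matching rows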
Case Scenario 21 (Chapter 11 - Case Scenario 1): Providing an Improved Solution for Generating Keys
You’re a member of the database administrator (DBA) group in a company that manufactures hiking gear. Most tables in the company’s OLTP database currently use an IDENTITY property but require more flexibility. For example, often the application needs to generate the new key before using it. Sometimes the application needs to update the key column, overwriting it with new values. Also, the application needs to produce keys that do not conflict across multiple tables.- Suggest an alternative to using the IDENTITY column property. Answer.
- Explain how the alternative solution solves the existing problems. Answer.
Answer: You can address all of the existing problems with the IDENTITY property by using the sequence object instead.

Question: Explain how the alternative solution solves the existing problems.
Answer: With the sequence object, you can generate values before using them by invoking the NEXT VALUE FOR function and storing the result in a variable. Unlike with the IDENTITY property, you can update a column that normally gets its values from a sequence object. Also, because a sequence object is not tied to a specific column in a specific table, but instead is an independent object in the database, you can generate values from one sequence and use them in different tables.
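
A short sketch (hypothetical names); the same sequence can feed other tables, so keys never conflict across them:

CREATE SEQUENCE dbo.SeqOrderIDs AS INT START WITH 1 INCREMENT BY 1;

-- Generate the key before using it.
DECLARE @neworderid AS INT = NEXT VALUE FOR dbo.SeqOrderIDs;

INSERT INTO dbo.Orders (orderid, custid)
  VALUES (@neworderid, 42);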
Case Scenario 22 (Chapter 11 - Case Scenario 2): Improving Modifications
You work in the database group of a company that has recently upgraded the database from SQL Server 2000 to SQL Server 2005 and then to SQL Server 2012. The code is still SQL Server 2000–compatible. There are issues with modifications submitted by the application to the database.

The application uses a procedure that accepts as inputs attributes of a row. The procedure then uses logic that checks whether the key already exists in the target table, and if it does, updates the target row. If it doesn't, the procedure inserts a new row into the target. The problem is that occasionally the procedure fails due to a primary key violation. This happens when the existence check doesn't find a row, but between that check and the insertion, someone else managed to insert a new row with the same key.

The application has a monthly process that archives data that it needs to purge. Currently, the application first copies data that needs to be deleted to the archive table in one statement and then deletes those rows in another statement. Both statements use a filter that is based on a date column called dt. You need to filter the rows where dt is earlier than a certain date. The problem is that sometimes rows representing late arrivals are inserted into the table between the copying and the deletion of rows, and the deletion process ends up deleting rows that were not archived.

You are tasked with finding solutions to the existing problems.

Question: Can you suggest a solution to the existing problem with the procedure that updates the row when the source key exists in the target and inserts a row if it doesn't?
Answer: A recommended solution is to use the MERGE statement. Define the source for the MERGE statement as a derived table based on the VALUES clause, with a row made of the input parameters for the procedure. Specify the table hint HOLDLOCK or SERIALIZABLE against the target to prevent conflicts such as the ones that currently exist in the system. Then use the WHEN MATCHED clause to issue an UPDATE action if the target row exists, and the WHEN NOT MATCHED clause to issue an INSERT action if the target row doesn't exist.
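
A sketch of such a procedure body (hypothetical table, columns, and parameters):

MERGE INTO dbo.Customers WITH (HOLDLOCK) AS T
USING (VALUES (@custid, @companyname)) AS S (custid, companyname)
  ON T.custid = S.custid
WHEN MATCHED THEN
  UPDATE SET companyname = S.companyname
WHEN NOT MATCHED THEN
  INSERT (custid, companyname)
  VALUES (S.custid, S.companyname);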

Question: Can you suggest a solution to the problem with the archiving process that prevents deleting rows that were not archived?
Answer: One option is to work with the SERIALIZABLE isolation level, handling both the statement that copies the rows to the archive environment and the statement that deletes the rows in one transaction. But a simpler solution is to do both tasks in one statement—a DELETE with an OUTPUT INTO clause. This ensures that only rows that are copied to the archive table are deleted. And if for whatever reason the copying of the rows to the archive table fails, the delete operation also fails, because both activities are part of the same transaction.
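
A sketch of the single-statement approach (hypothetical names):

DELETE FROM dbo.Orders
OUTPUT deleted.orderid, deleted.custid, deleted.dt
INTO dbo.OrdersArchive (orderid, custid, dt)
WHERE dt < '20120101';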
Case Scenario 23 (Chapter 12 - Case Scenario 1): Implementing Error Handling
As a database developer on a key project for your company, you have been asked to refactor a set of stored procedures in your production database server. You have observed that the stored procedures have practically no error handling, and when they do have it, it is ad hoc and unstructured. None of the stored procedures are using transactions. You need to put a plan together to justify your activity.

Question: When should you recommend using explicit transactions?
Answer: Whenever more than one data change occurs in a stored procedure, and it is important that the data changes be treated as a logical unit of work, you should add transaction logic to the stored procedure.

Question: When should you recommend using a different isolation level?
Answer: You need to adapt the isolation levels to the requirements for transactional consistency. You should investigate the current application and the database for instances of blocking and especially deadlocking. If you find deadlocks, and establish that they are not due to mistakes in T-SQL coding, you can use various methods of lowering the isolation level in order to make deadlocks less likely. However, be aware that some transactions may require higher levels of isolation.

Question: What type of error handling should you recommend?
Answer: You should use TRY/CATCH blocks in every stored procedure where errors might occur, and encourage your team to standardize on that usage. By funneling all errors to the CATCH block, you can handle errors in just one place in the code.
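
A typical skeleton for such procedures (a sketch; the procedure name and body are placeholders):

CREATE PROCEDURE dbo.DoWork
AS
BEGIN
  SET NOCOUNT ON;
  BEGIN TRY
    BEGIN TRANSACTION;
    -- Data changes treated as one logical unit of work go here.
    COMMIT TRANSACTION;
  END TRY
  BEGIN CATCH
    IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;
    THROW;  -- rethrow the original error to the caller (SQL Server 2012)
  END CATCH;
END;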

Question: What plans should you include for refactoring dynamic SQL?
Answer: Check the stored procedures for the use of dynamic SQL, and where possible, replace calls to the EXECUTE command with the sp_executesql stored procedure.
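
For example (hypothetical query and parameter); sp_executesql parameterizes the input instead of concatenating it into the string:

DECLARE @sql AS NVARCHAR(200) =
  N'SELECT orderid, orderdate FROM Sales.Orders WHERE custid = @custid;';

EXEC sp_executesql
  @stmt = @sql,
  @params = N'@custid INT',
  @custid = 42;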
Case Scenario 24 (Chapter 12 - Case Scenario 2): Implementing Transactions
You have just been assigned to a new project as the database developer on the team. The application will use stored procedures for performing some of the financial operations. You have decided to use T-SQL transactions. Answer the following questions about what you would recommend in the specified situations.

Question: In some transactions that update tables, after a session reads a particular value from another table, it is critical that the other table's value not change until the transaction is finished. What is the appropriate transaction isolation level to accomplish this?
Answer: To ensure that whenever data is read in a transaction the data will not change until the end of the transaction, you can use the REPEATABLE READ transaction isolation level. This is the least restrictive level that will satisfy the requirements.

Question: You will use T-SQL scripts to deploy new objects such as tables, views, or T-SQL code to the database. If any kind of T-SQL error occurs, you want the entire deployment script to quit. How can you accomplish this without adding complex error handling?
Answer: When you deploy new database objects by using T-SQL scripts, you can wrap the batches in a single transaction and use SET XACT_ABORT ON right after the BEGIN TRANSACTION statement. Then if any T-SQL error occurs, the entire transaction will abort and you will not have to add complex error handling.
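
A deployment-script skeleton along those lines (the object definition is a placeholder):

BEGIN TRANSACTION;
SET XACT_ABORT ON;  -- any T-SQL error now aborts and rolls back the transaction

CREATE TABLE dbo.NewTable (id INT NOT NULL PRIMARY KEY);
-- ...further objects and changes...

COMMIT TRANSACTION;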

Question: One of the stored procedures will transfer money from one account to another. During that transfer period, neither account can have any data changed, inserted, or deleted for the range of values read by the transaction. What is the appropriate transaction isolation level to accomplish this?
Answer: To ensure that, for the range of values read by the transaction, none of the rows being read can be changed, no new rows can be inserted, and none deleted, you can use the SERIALIZABLE isolation level. This is the most restrictive isolation level and can lead to a lot of blocking, so you need to ensure that the transactions complete as quickly as possible.
Case Scenario 25 (Chapter 13 - Case Scenario 1): Implementing Stored Procedures and UDFs
You have been assigned to a new project. As the lead database developer, you notice that almost all data validation against the database occurs in the client software. Sometimes fatal bugs in the client software have caused database inconsistency, and you want to refactor the system by using stored procedures to help protect the database. Answer the following questions about what actions you can take to improve the reliability of the application.

Question: What steps can be taken to prevent duplicates or inconsistencies on unique keys and mismatched foreign keys?
Answer: To prevent inconsistency in the database, ensure that the proper constraints are in place: primary key and unique key constraints on tables, check constraints on columns, and foreign key constraints between tables. Other more complex business rules can be enforced by using triggers.

Question: How can you present a standard interface from the application code to the database?
Answer: To present a standard interface to the database, use data-tier stored procedures—that is, use standard insert, update, and delete stored procedures for every table. The client software should only be allowed to change data in tables by using those stored procedures.

Question: The client developers would like to put parameters on views, but T-SQL doesn't allow them. What can you use in place of parameterized views?
Answer: You can use table-valued functions in place of views, and define parameters to match the requirements of the application developers. You can then call the function from inside a stored procedure that accepts those parameters and sends the results back to the client.

Question: There is one large table that is searched often based on three different columns, but the user can choose any of the columns and leave the others blank. How can you use stored procedures to make this searching more efficient?
Answer: Consider making a search stored procedure that consists of a driver procedure, and have it call subprocedures, one for each combination of parameters. Each of those subprocedures will always have the same query plan, so the procedures will not need to be recompiled.
Case Scenario 26 (Chapter 13 - Case Scenario 2): Implementing Triggers
You have been asked to review the T-SQL code of an existing database application and recommend improvements. Answer the following questions about recommendations you can make about the design.

Question: You notice that the system uses a lot of triggers to enforce foreign key constraints, and the triggers are error-prone and difficult to debug. What changes can you recommend to reduce the use of triggers?
Answer: Foreign key relationships can be enforced by using triggers, but the code can become complex and error prone. You can recommend instead that the database developers implement true referential integrity by using declared T-SQL foreign key constraints rather than triggers.

Question: You also observe that there are some complex operations that use nested triggers, which have never been made to work correctly in the application. What action can you recommend to eliminate the use of nested triggers?
Answer: You can recommend disabling the nested triggers option on the development server so that the database developers can get used to the idea of completing all necessary actions within only one level of a trigger. That should help simplify the trigger code and improve the ability to debug it.

Question: The application must often insert data into a main table and several subsidiary tables in the same action, making the application code very complex. What can you recommend as a way of moving some of that complexity into the database and out of the application?
Answer: When the application inserts data into one table and must also insert into other subsidiary tables in the same action, you can recommend that the database developers use an INSTEAD OF trigger. In that trigger, the inserts into the subsidiary tables can be made along with the insert into the main table.

Question: There is an important table that requires some simple logging actions to take place after any changes to the data. The logging is to a custom table built especially to meet application requirements. What recommendation might you make to help implement such a logging action?
Answer: To support simple logging, you can recommend that the database developers use a DML AFTER trigger. This type of trigger executes after an INSERT, UPDATE, or DELETE statement, and it can write to the logging table.
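
A minimal AFTER trigger sketch (hypothetical tables and columns):

CREATE TRIGGER trg_Orders_Log
ON dbo.Orders
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
  SET NOCOUNT ON;
  DECLARE @logdt AS DATETIME2 = SYSDATETIME();

  -- inserted holds new row versions (inserts and updates);
  -- deleted holds old row versions (deletes and updates).
  INSERT INTO dbo.OrdersLog (orderid, logdt, versiontype)
    SELECT orderid, @logdt, 'new' FROM inserted
    UNION ALL
    SELECT orderid, @logdt, 'old' FROM deleted;
END;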
Case Scenario 27 (Chapter 14 - Case Scenario 1): Analysis of Queries
You got an urgent call from a manager of a company where you are maintaining SQL Server. The manager complains that their SQL Server database has been unresponsive for a couple of hours. Your task is to optimize one query only, but as soon as possible. However, you first need to find the most problematic query. You connect to the SQL Server instance. You realize there are hundreds of concurrent users, and neither an Extended Events session nor a SQL Trace session is running. You also find out that SQL Server has been running without interruption for six months.

Question: How do you start the analysis in this situation?
Answer: You should use the execution-related DMOs, such as sys.dm_exec_query_stats, to find the most problematic query.

Question: When you find the most problematic query, how do you proceed?
Answer: You could use a graphical estimated execution plan for this query to find the operators that have the highest cost. You could also check whether there are any missing indexes reported by the index-related DMOs that this query could benefit from.
Case Scenario 28 (Chapter 14 - Case Scenario 2): Constant Monitoring
You need to monitor your SQL Server instance constantly in order to detect potential bottlenecks. Your SQL Server instance is used heavily. You should not overload it with monitoring procedures.

Question: Which tool would you use for monitoring?
Answer: You should use SQL Server Extended Events as a very lightweight performance monitoring system.

Question: How would you minimize the impact of the tool?
Answer: You should monitor only the few events that are most important for your case. You should capture only the fields you need. You should also filter the session events to include only the events you really need.
Case Scenario 29 (Chapter 15 - Case Scenario 1): Table Scans
Database administrators from a company where you are maintaining a SQL Server database complain that SQL Server scans entire tables for most of the queries, although the queries are very selective. The performance is not acceptable. You need to help them improve the performance.

Question: What physical structures should you check?
Answer: You should check whether the queries are supported by indexes. In addition, you should check whether the statistics for the indexes are created and updated.

Question: Would you check some code as well?
Answer: Yes. You should check whether the queries use appropriate search arguments.
Case Scenario 30 (Chapter 15 - Case Scenario 2): Slow Updates
End users from a company where you are responsible for database optimization complain that data updates are slow, even when updating a single row. Seeking the updated row is supported by appropriate indexes, and SELECT queries are performing well. This is what you expected, because you created nonclustered indexes on all columns used in these queries. You need to improve the performance of the database for updates as well.

Question: What would you suspect to be the reason for the slow updates?
Answer: Too many indexes might slow down updates. You probably created many indexes that are useless for queries; however, because SQL Server has to maintain them during modifications, the updates are slow.

Question: How would you investigate the possible problems?
Answer: You can query the sys.dm_db_index_usage_stats dynamic management object to find which indexes are used for seeks and which are used for updates only.
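
A sketch of such a check:

-- Indexes with high maintenance cost (user_updates) and little or no
-- read benefit (user_seeks, user_scans, user_lookups) are candidates for removal.
SELECT OBJECT_NAME(s.object_id) AS tablename, i.name AS indexname,
       s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
FROM sys.dm_db_index_usage_stats AS s
  INNER JOIN sys.indexes AS i
    ON i.object_id = s.object_id
   AND i.index_id = s.index_id
WHERE s.database_id = DB_ID();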
Case Scenario 31 (Chapter 16 - Case Scenario 1): Performance Improvement Recommendations for Cursors and Temporary Objects
You are hired as a consultant by a startup company that develops an application that uses SQL Server as the database. The company is currently facing performance and scalability problems. You examine the company's code and identify a number of things. Almost all solutions use cursors; when you examine the solutions, you see that they are not the types that have to be implemented with iterative logic. Some solutions store intermediate results in table variables and then query the table variables; large numbers of rows are stored in the table variables.

Question: Can you provide recommendations concerning the fact that most solutions use cursors?
Answer: The customer should evaluate the use of set-based solutions instead of cursor-based ones. If most of their solutions are using cursors, there could be a problem with a lack of knowledge and understanding of relational concepts. It would probably be a good idea to recommend that the company's developers take training on the subject.

Question: Can you provide recommendations concerning the use of table variables?
Answer: When large numbers of rows need to be stored in a temporary object, the optimizer's ability to produce accurate selectivity estimates becomes more important for the efficiency of the plan. The one exception is when the plans are trivial. SQL Server does not maintain distribution statistics (histograms) on table variables, so with those, the estimates tend to be inaccurate. Inaccurate estimates can lead to suboptimal plans. The customer should examine the query plans and look for bad estimates. If they find them, they should evaluate whether to use temporary tables in those cases instead. SQL Server does maintain histograms for temporary tables, and therefore the execution plans for those tend to be more optimal.

Question: Can you explain to the customer what the circumstances are in which cursors and table variables should be used?
In some cases, it’s appropriate to use table variables—for example, when the amount
of data is very small, like a page or two. In such a case, the efficiency of the plan is
simply not important. Also, when the table is large but the plan is trivial, the optimizer doesn’t need histograms in order to choose an efficient plan. The fact that table
variables don’t have histograms does give you some benefits. You don’t pay the costs
associated with maintaining them. You also don’t pay for recompilations of the execution plans that are related to refreshes of the histograms.
As for cursors, in some cases, you have to run a process per each row from a table. For example, for maintenance purposes you might need to perform some work per each index, table, database, or other object. Cursors are designed for such purposes. As for data manipulation, there could be cases where the SQL Server Query Optimizer doesn’t do a good job optimizing a query, and you cannot find a way to help the optimizer generate an efficient plan. With cursors, despite the higher overhead, sometimes you can achieve better results because you do have more control. But such cases are the exception rather than the norm.
Case Scenario 32 (Chapter 16 - Case Scenario 2): Identifying Inaccuracies in Answers
At a conference, you attend a lecture about T-SQL. At the end of the lecture, the speaker conducts a Q&A session. Following are questions members of the audience present to the speaker and the speaker's answers. Identify the inaccuracies in the speaker's responses.

Question: From a performance perspective, what are the differences between temporary tables and table variables?
Speaker's answer: There are no differences. Microsoft just wants to give you a dozen different ways to do the same thing.
Inaccuracies: There are performance-related differences between temporary tables and table variables. One important difference is the fact that SQL Server maintains distribution statistics (histograms) against temporary tables but not against table variables. This means that with temporary tables, the optimizer can usually make better selectivity estimates. So the plans involving temporary tables tend to be more optimal than plans involving table variables.

Question: I have a multirow UPDATE trigger that sets the value of a column called lastmod in the modified rows to the value returned by the function SYSDATETIME(). The trigger uses a cursor against the inserted table to handle one row at a time. The trigger performs badly. Any suggestions on how to improve the trigger's performance?
Speaker's answer: Instead of using a cursor, write a set-based solution that uses a WHILE loop and a TOP query to iterate through the keys one at a time.
Inaccuracies: Using a loop-based solution with a TOP query instead of a cursor neither makes the solution set-based nor necessarily more efficient than the existing cursor-based solution. You're still handling the rows one at a time. A better approach would be to use a single UPDATE or MERGE statement that joins the inserted table with the underlying table, and to update all target rows by using one set-based operation.
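
A sketch of the set-based trigger (the table and key names are hypothetical):

CREATE TRIGGER trg_Orders_U_lastmod
ON dbo.Orders
AFTER UPDATE
AS
BEGIN
  SET NOCOUNT ON;
  -- One set-based UPDATE for all modified rows; no cursor, no loop.
  UPDATE O
    SET lastmod = SYSDATETIME()
  FROM dbo.Orders AS O
    INNER JOIN inserted AS I
      ON O.orderid = I.orderid;
END;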

Question: Can you give an example for which table expressions are useful?
Speaker's answer: One example is when you want to persist the result of an expensive query and then need to refer to that result a number of times.
Inaccuracies: The result of the table expression's inner query doesn't get persisted in a work table. SQL Server unnests all references to table expressions and interacts with the underlying objects directly. Multiple references get unnested multiple times, so the work is repeated. If you want to persist the result of an expensive query to avoid repeating the work, you should consider using temporary tables or table variables. An example of the usefulness of table expressions is when you need to refer to a column alias that was generated in the SELECT list in clauses that are logically processed before the SELECT, like WHERE, GROUP BY, and HAVING.
Case Scenario 33 (Chapter 17 - Case Scenario 1): Query Optimization
Database administrators from a company where you are maintaining SQL Server complain that SQL Server does not use indexes for some queries from a third-party application. The queries are generated in the application directly, so you cannot modify them.

Question: What actions can you take?
Answer: You can check whether the indexes are the most appropriate for the queries. You can also check whether the statistics for the indexes are updated. In addition, you can create plan guides for the problematic queries.

Question: Is it possible to optimize the queries you cannot modify?
Answer: You cannot use optimizer hints, because you cannot modify the text of the problematic queries. However, as mentioned in the previous answer, you can still influence such queries through indexes, statistics, and plan guides, which attach hints to queries without changing their text.
Case Scenario 34 (Chapter 17 - Case Scenario 2): Table Hint
There is a query in a stored procedure for which you suspect that SQL Server does not use an optimal plan. Although the very selective WHERE clause of the query is supported by a nonclustered index, SQL Server does not use it. The statistics for all indexes are up to date. You need to optimize this query.

Question: Can you modify the query to use the index?
Answer: Yes. Because the query is inside a stored procedure and not embedded in an application, you can modify it.

Question: How would you accomplish this task?
Answer: You can use a table hint to force the index usage.
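
For example (hypothetical table, index, and filter):

-- Force the use of a specific nonclustered index with a table hint.
SELECT orderid, custid, orderdate
FROM dbo.Orders WITH (INDEX (idx_Orders_custid))
WHERE custid = 42;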