Case Scenario 1: Creating an ETL Solution
The business development department of Adventure Works has requested that you imple- ment a data mart that it can use to analyze reseller sales against salesperson sales targets. Your first task is to create a series of SSIS packages that move data from the source Enterprise Resource Planning (ERP) system to a data mart database that contains fact tables and dimen- sion tables.- How would you work within BIDS to create SSIS project structures, packages, project data sources, and package connections? Answer.
- What transformations would you use, and how would you implement the data flow that loads dimension tables? Answer.
- What transformations would you use, and how would you implement the data flow that loads fact tables? Answer.
The best practice for creating a set of packages that all work together is to create a single
SSIS project within BIDS. You would then create multiple packages within the project, one
for each of the different dimensions and fact tables. Because all the packages would use the
same source connection string to the ERP system and the same destination connection string
to the data mart, you should create two data sources within the project: one for the source
and one for the destination. Last, each package would need to reference the project data
sources, so within each package, you would create package connections based on the project
data sources.
Hide the Answer.
Hide the Answer.
The SSIS data flow contains a Slowly Changing Dimension Transformation that can handle
changes and new records in the dimension tables. Before using the Slowly Changing Dimension Transformation, you need to create a source adapter to the ERP system that pulls the
data to be compared with the dimension tables. You might need to use a transformation to
clean and correct any data anomalies, and after those steps are complete, you can connect the
data to the Slowly Changing Dimension Transformation. Ideally, you would create a separate
SSIS package for each dimension package so that they could be reused for different groups
of package executions.
Hide the Answer.
Hide the Answer.
Fact tables contain surrogate keys that reference the dimension tables but also contain the
business keys from the source. So as you are pulling the reseller sales data and the sales
quotas, you can use a Lookup Transformation to get the surrogate keys from dimension
tables by joining across the business keys. You can then insert new rows into the fact tables
by using the surrogate keys and measures. As you do when using dimensions, you will typically have a separate package for each fact table that needs processing.
Hide the Answer.
Hide the Answer.
Case Scenario 2: Troubleshooting and Handling Errors in SSIS Packages
You are creating a set of SSIS packages that move data from a source transactional system to data mart tables. As you develop the packages, you need a way to troubleshoot both your control flow development and your data flow development. You also need to ensure that the data in your destination database is in a consistent state and not in an intermediate state when an error occurs. In addition, you need to provide an audit trail of information and build alerts into your package design. How would you handle the following requirements during your package development and implementation?- In SSIS, you need to use debugging techniques in the control flow and data flow to speed up package development and troubleshooting so that you can complete your packages quickly with minimal frustration. Answer.
- Each destination table in your data mart must have the inserts, updates, and deletes fully complete and committed, or you need to roll back the changes so that the table is in a consistent state. You also need a way to restart your packages from the point of failure. Answer.
- You need to capture both the count of rows that are inserted into the destination within your data mart and the time when the last row was sent to each destination in the data flows. Answer.
- When a package fails, you must immediately send e-mail messages that identify the task that failed and describe the error in detail. Answer.
When you are developing in the control flow, you can use breakpoints to pause packages
during execution so that you can examine the intermediate state and the results of your tasks
and constraints. When you are working in the data flow, you can use data viewers on your
data paths and error paths to catch errors, and you can watch the rows to isolate any errors
and help determine the best way to fix them.
Hide the Answer.
Hide the Answer.
Because the commit level is configured on a table-by-table basis, all the data flow and control flow tasks that operate on a single task need to be grouped together in a container, and
the TransactionOption property must be set to Required for each container. You should also
implement checkpoints on the containers, which will let you restart the packages at the point
of failure after you have resolved any problems. You can simplify this implementation by
creating a master package that has checkpoints turned on and that uses the Execute Package
Task to call child packages for each destination table that has transactions enabled.
Hide the Answer.
Hide the Answer.
To capture the destination row count, you add several Row Count Transformations to your
package. Place a Row Count Transformation in the pipeline before each destination. The
Row Count Transformation will store in a predefined variable the number of rows that flow
through the component, so you can create a separate package variable for each destination. To capture the variable values, set the RaiseChangeEvent property to True for all new
variables and add the OnVariableValueChange event handler. This event fires when each Row
Count Transformation updates the identified variable, which subsequently calls a SQL statement that adds the VariableName, count, and EventHandlerStartTime to a tracking table.
Hide the Answer.
Hide the Answer.
Using the OnError event, you create a new event handler on the package executable file level.
This event handler contains a single Send Mail Task that you configure to use the SourceName
variable, which is the task or container name that experienced the error as the e-mail message subject and the ErrorDescription variable as the e-mail message body. You hard-code
your e-mail Simple Mail Transport Protocol (SMTP) server and your support team’s Distribution List (DL) address so that all parties will be e-mailed when a failure occurs.
Hide the Answer.
Hide the Answer.
Case Scenario 3: Deploying SSIS Packages
You are asked to manage a set of 25 SSIS packages generated by a development team. These packages perform data consolidation from three source systems: IBM DB2, Oracle, and flat fi les exported from a legacy system. The destination is a SQL Server database that is used for Customer Relationship Management (CRM) and call-center reporting. The sources do not support Windows Authentication, but a user name and password are provided immediately after the monthly password change as part of the company’s security-compliance policy. You need to define deployment procedures and make the packages independent of source user-connection information. How would you satisfy the following requirements?- The packages must point to the correct servers and be updated with the user names and passwords—all without editing the packages in production. However, the packages the development team gave you contain hard-coded connections. How can you implement a flexible architecture to handle the changes? Answer.
- Your organization has invested in a test environment that mirrors the production environment, so the deployment objective is to deploy all the packages for a development cycle to the test server, on which the packages can go through an extensive quality-assurance process before they are deployed to production. Your deployment strategy needs to include a complete set of files with an automated deployment process, and your packages need to be deployed to SQL Server for both environments so that the packages can be backed up through your database backup procedures. How do you proceed? Answer.
- Occasionally, as bugs are identified, you are given a single SSIS package to deploy. To minimize human error in the deployment, you need an automated procedure to deploy the file to the test and development servers. How can you achieve this? Answer.
- The packages contain schema information about a financial database, so you need to be sure that the packages are encrypted. You also need to ensure that the shared connection password in the SQL Server configuration table is secure. Answer.
- The packages need to be scheduled to run at 8:00 every morning by using SQL Server Agent. Answer.
To make the connection strings configurable without manual package editing every time a
server or user name and password changes, you should implement package configurations.
Across the 25 packages, only two source systems contain user names and passwords (the
files are on a network share), so you should create a single XML file that contains the connection strings. First you would create the configurations file in one package by using SSIS
package configurations, and then you would have all the packages point to the same file. If
your policies prevent storing user names and passwords in a flat file, you can choose to use
a Microsoft SQL Server table to store these items. If you do use a SQL Server table, be aware
that you will have a second configuration entry that will be applied first—it will point to the
SQL Server database in which the connection strings are entered. You will need to replicate
the SSIS package configurations on the test and production computers, being sure to use the
appropriate server, user name, and password entries where applicable. This way, as changes
happen, you can modify the connections without opening the packages.
Hide the Answer.
Hide the Answer.
Because SQL Server is the package destination for both the test and production environments, an easy way to automate your deployment is to create a deployment installer set by
using the Package Deployment Utility in BIDS. You can run the Package Installation Wizard
on your test server and deploy all the packages to SQL Server in one step. When the tests
are complete, the same installer set can be deployed on your production server through the
Package Installation Wizard.
Hide the Answer.
Hide the Answer.
You can deploy single SSIS packages manually, but a better choice is to use the DTUtil command-line tool, which lets you write a command for the deployment and then automate the
process. You could deploy the package to the local SQL Server using Windows Authentication
by including a simple command, such as this:
Hide the Answer.
dtutil.exe / FILE c:\IntegrateCRM.dtsx / COPY SQL;IntegrateCRM
Other command-line switches let you specify the destination SQL Server and SQL Server user
name and password as needed.
Note: DTUtil is for package managment and deployment. It allows you to copy, move, delete or verify a package, whereas DTExec is for package configurtion and execution.
Hide the Answer.
Case Scenario 4: Securing and Scheduling SSIS Packages
Your SSIS ETL packages have been deployed to SQL Server, and it is now your responsibility as the database administrator (DBA) to secure and schedule the packages for execution on your production server. One of the shared connections references a legacy database system that requires a user name and password. Therefore, a SQL Server configuration has been set up to manage the user name and password and to share them between packages. Your task is to secure the packages and schedule them for execution, taking into account the following requirements:
To encrypt all the content of the packages, you need to use the DTUtil utility with the
/ENCRYPT command-line parameter, running the utility from the folder of the packages.
Each DTUtil statement will look like the following:
Hide the Answer.
dtutil.exe /file MyPackage.dtsx /encrypt file;MyPackage.dtsx;3;EncPswd
In addition, because all the packages contain a shared connection user name and password,
you can use SSMS to deny access to the configuration table from all accounts except the SQL
Server Agent service account or the SQL Server Agent Proxy account that will be executing
the package.
Hide the Answer.
To schedule the packages to be run, create a SQL Server Agent job and several job steps
to execute the packages. When you identify the package, you need to enter the password
that you specified in the DTUtil command-line statement in step 1 to encrypt the packages
(EncPswd). The job could also be run by using a proxy account. That account would need to
have the proper security credentials to access the SQL Server configuration table so that the
connection user name and password are updated at execution time.
Hide the Answer.
Hide the Answer.