SSIS Interview Questions - Part 3 of 3 (On Applications from TK 70-448)

SSIS Application Questions

Case Scenario 1: Creating an ETL Solution

The business development department of Adventure Works has requested that you imple- ment a data mart that it can use to analyze reseller sales against salesperson sales targets. Your first task is to create a series of SSIS packages that move data from the source Enterprise Resource Planning (ERP) system to a data mart database that contains fact tables and dimen- sion tables.
  1. How would you work within BIDS to create SSIS project structures, packages, project data sources, and package connections? Answer.

  2. The best practice for creating a set of packages that all work together is to create a single SSIS project within BIDS. You would then create multiple packages within the project, one for each of the different dimensions and fact tables. Because all the packages would use the same source connection string to the ERP system and the same destination connection string to the data mart, you should create two data sources within the project: one for the source and one for the destination. Last, each package would need to reference the project data sources, so within each package, you would create package connections based on the project data sources.
    Hide the Answer.
  3. What transformations would you use, and how would you implement the data flow that loads dimension tables? Answer.

  4. The SSIS data flow contains a Slowly Changing Dimension Transformation that can handle changes and new records in the dimension tables. Before using the Slowly Changing Dimension Transformation, you need to create a source adapter to the ERP system that pulls the data to be compared with the dimension tables. You might need to use a transformation to clean and correct any data anomalies, and after those steps are complete, you can connect the data to the Slowly Changing Dimension Transformation. Ideally, you would create a separate SSIS package for each dimension package so that they could be reused for different groups of package executions.
    Hide the Answer.
  5. What transformations would you use, and how would you implement the data flow that loads fact tables? Answer.

  6. Fact tables contain surrogate keys that reference the dimension tables but also contain the business keys from the source. So as you are pulling the reseller sales data and the sales quotas, you can use a Lookup Transformation to get the surrogate keys from dimension tables by joining across the business keys. You can then insert new rows into the fact tables by using the surrogate keys and measures. As you do when using dimensions, you will typically have a separate package for each fact table that needs processing.
    Hide the Answer.

Case Scenario 2: Troubleshooting and Handling Errors in SSIS Packages

You are creating a set of SSIS packages that move data from a source transactional system to data mart tables. As you develop the packages, you need a way to troubleshoot both your control flow development and your data flow development. You also need to ensure that the data in your destination database is in a consistent state and not in an intermediate state when an error occurs. In addition, you need to provide an audit trail of information and build alerts into your package design. How would you handle the following requirements during your package development and implementation?
  1. In SSIS, you need to use debugging techniques in the control flow and data flow to speed up package development and troubleshooting so that you can complete your packages quickly with minimal frustration. Answer.

  2. When you are developing in the control flow, you can use breakpoints to pause packages during execution so that you can examine the intermediate state and the results of your tasks and constraints. When you are working in the data flow, you can use data viewers on your data paths and error paths to catch errors, and you can watch the rows to isolate any errors and help determine the best way to fix them.
    Hide the Answer.
  3. Each destination table in your data mart must have the inserts, updates, and deletes fully complete and committed, or you need to roll back the changes so that the table is in a consistent state. You also need a way to restart your packages from the point of failure. Answer.

  4. Because the commit level is configured on a table-by-table basis, all the data flow and control flow tasks that operate on a single task need to be grouped together in a container, and the TransactionOption property must be set to Required for each container. You should also implement checkpoints on the containers, which will let you restart the packages at the point of failure after you have resolved any problems. You can simplify this implementation by creating a master package that has checkpoints turned on and that uses the Execute Package Task to call child packages for each destination table that has transactions enabled.
    Hide the Answer.
  5. You need to capture both the count of rows that are inserted into the destination within your data mart and the time when the last row was sent to each destination in the data flows. Answer.

  6. To capture the destination row count, you add several Row Count Transformations to your package. Place a Row Count Transformation in the pipeline before each destination. The Row Count Transformation will store in a predefined variable the number of rows that flow through the component, so you can create a separate package variable for each destination. To capture the variable values, set the RaiseChangeEvent property to True for all new variables and add the OnVariableValueChange event handler. This event fires when each Row Count Transformation updates the identified variable, which subsequently calls a SQL statement that adds the VariableName, count, and EventHandlerStartTime to a tracking table.
    Hide the Answer.
  7. When a package fails, you must immediately send e-mail messages that identify the task that failed and describe the error in detail. Answer.

  8. Using the OnError event, you create a new event handler on the package executable file level. This event handler contains a single Send Mail Task that you configure to use the SourceName variable, which is the task or container name that experienced the error as the e-mail message subject and the ErrorDescription variable as the e-mail message body. You hard-code your e-mail Simple Mail Transport Protocol (SMTP) server and your support team’s Distribution List (DL) address so that all parties will be e-mailed when a failure occurs.
    Hide the Answer.

Case Scenario 3: Deploying SSIS Packages

You are asked to manage a set of 25 SSIS packages generated by a development team. These packages perform data consolidation from three source systems: IBM DB2, Oracle, and flat fi les exported from a legacy system. The destination is a SQL Server database that is used for Customer Relationship Management (CRM) and call-center reporting. The sources do not support Windows Authentication, but a user name and password are provided immediately after the monthly password change as part of the company’s security-compliance policy. You need to define deployment procedures and make the packages independent of source user-connection information. How would you satisfy the following requirements?
  1. The packages must point to the correct servers and be updated with the user names and passwords—all without editing the packages in production. However, the packages the development team gave you contain hard-coded connections. How can you implement a flexible architecture to handle the changes? Answer.

  2. To make the connection strings configurable without manual package editing every time a server or user name and password changes, you should implement package configurations. Across the 25 packages, only two source systems contain user names and passwords (the files are on a network share), so you should create a single XML file that contains the connection strings. First you would create the configurations file in one package by using SSIS package configurations, and then you would have all the packages point to the same file. If your policies prevent storing user names and passwords in a flat file, you can choose to use a Microsoft SQL Server table to store these items. If you do use a SQL Server table, be aware that you will have a second configuration entry that will be applied first—it will point to the SQL Server database in which the connection strings are entered. You will need to replicate the SSIS package configurations on the test and production computers, being sure to use the appropriate server, user name, and password entries where applicable. This way, as changes happen, you can modify the connections without opening the packages.
    Hide the Answer.
  3. Your organization has invested in a test environment that mirrors the production environment, so the deployment objective is to deploy all the packages for a development cycle to the test server, on which the packages can go through an extensive quality-assurance process before they are deployed to production. Your deployment strategy needs to include a complete set of files with an automated deployment process, and your packages need to be deployed to SQL Server for both environments so that the packages can be backed up through your database backup procedures. How do you proceed? Answer.

  4. Because SQL Server is the package destination for both the test and production environments, an easy way to automate your deployment is to create a deployment installer set by using the Package Deployment Utility in BIDS. You can run the Package Installation Wizard on your test server and deploy all the packages to SQL Server in one step. When the tests are complete, the same installer set can be deployed on your production server through the Package Installation Wizard.
    Hide the Answer.
  5. Occasionally, as bugs are identified, you are given a single SSIS package to deploy. To minimize human error in the deployment, you need an automated procedure to deploy the file to the test and development servers. How can you achieve this? Answer.

  6. You can deploy single SSIS packages manually, but a better choice is to use the DTUtil command-line tool, which lets you write a command for the deployment and then automate the process. You could deploy the package to the local SQL Server using Windows Authentication by including a simple command, such as this:
    dtutil.exe / FILE c:\IntegrateCRM.dtsx / COPY SQL;IntegrateCRM Other command-line switches let you specify the destination SQL Server and SQL Server user name and password as needed.
    Note: DTUtil is for package managment and deployment. It allows you to copy, move, delete or verify a package, whereas DTExec is for package configurtion and execution.
    Hide the Answer.

Case Scenario 4: Securing and Scheduling SSIS Packages

Your SSIS ETL packages have been deployed to SQL Server, and it is now your responsibility as the database administrator (DBA) to secure and schedule the packages for execution on your production server. One of the shared connections references a legacy database system that requires a user name and password. Therefore, a SQL Server configuration has been set up to manage the user name and password and to share them between packages. Your task is to secure the packages and schedule them for execution, taking into account the following requirements:
  1. The packages contain schema information about a financial database, so you need to be sure that the packages are encrypted. You also need to ensure that the shared connection password in the SQL Server configuration table is secure. Answer.
  2. To encrypt all the content of the packages, you need to use the DTUtil utility with the /ENCRYPT command-line parameter, running the utility from the folder of the packages. Each DTUtil statement will look like the following:
    dtutil.exe /file MyPackage.dtsx /encrypt file;MyPackage.dtsx;3;EncPswd
    In addition, because all the packages contain a shared connection user name and password, you can use SSMS to deny access to the configuration table from all accounts except the SQL Server Agent service account or the SQL Server Agent Proxy account that will be executing the package.
    Hide the Answer.

  3. The packages need to be scheduled to run at 8:00 every morning by using SQL Server Agent. Answer.

  4. To schedule the packages to be run, create a SQL Server Agent job and several job steps to execute the packages. When you identify the package, you need to enter the password that you specified in the DTUtil command-line statement in step 1 to encrypt the packages (EncPswd). The job could also be run by using a proxy account. That account would need to have the proper security credentials to access the SQL Server configuration table so that the connection user name and password are updated at execution time.
    Hide the Answer.