SSIS Interview Questions - Part 1 of 3 (On Concepts from TK 70-448)

1.   Creating SSIS Packages

a.     Creating an SSIS Package: Using the built-in Import and Export Wizard or BIDS

b.     Creating a Data Source

                                     I.          Data sources contain connection strings that point to files or databases, and you can reference them within one or more packages. Thus, a data source resides outside the package and can be used as the connection reference for more than one package.

                                   II.          Data sources are optional within SSIS, but they are beneficial during development if you have a large number of packages that need to use the same database or fi le connection.

                                  III.          But be aware that after a package is deployed to a new environment and executed outside the project, the connection string is no longer updated by the project data source. Instead, you must use package configurations to share connection strings.

                                 IV.          When you create a package connection from a data source (see below), that connection is updated only during development whenever the package is opened and the data source has been changed. Package connections are not updated when they are run separately from the associated SSIS project—for example, when they are run from the command line.

c.      Creating Package Connection Managers

                                     I.          It is independent of project data sources. However, package connections can reference a project data source.

                                   II.          A package connection lets the different components in SSIS communicate with an object (such as a database, file, or server) outside the package. You can use package connections as source adapters, FTP or e-mail servers, or flat files.

2.   Creating and Editing Control Flow Objects

a.     Control flow tasks - Workflow objects that perform operational-level jobs

b.     Control flow containers - Provide a grouping mechanism for tasks and other containers

                                     I.          Sequence Container - Lets you organize subordinate tasks by grouping them together, and lets you apply transactions or assign logging to the container.

                                   II.          For Loop Container - Runs a control flow repeatedly by testing a condition.

                                  III.          Foreach Loop Container - Runs a control flow repeatedly by using an enumerator.

                                 IV.          Task Host container - Provides services to a single task. It is not visible within the control flow toolbox, it is designed to provide service to a single task.

c.      Constraints - Let you connect tasks and containers and define execution ordering and precedence

d.     Package variables

                                     I.          Passing values between tasks and containers,

                                   II.          Accepting values from external sources,

                                  III.          Building code dynamically that is then executed,

                                 IV.          Using variables for auditing and logging.

3.   Data Flow Task

a.     Data flow source adapters

                                     I.          ADO.NET Source 

                                   II.          Excel Source

                                  III.          Flat File Source

                                 IV.          OLE DB Source

                                   V.          Raw File Source

                                 VI.          XML Source

b.     Data flow transformations

N = non-blocking transformations - Each row is passed through the transformation without any waits.
P = partial-blocking transformation - Waits until a sufficient number of rows is stored and then it passes them through.
B = blocking transformation- All rows must be read into the transformation before the transformation can pass them through.

                                     I.          Logical Row-Level Transformations

1)              Audit

Adds additional columns to each row based on system package variables such as ExecutionStartTime andPackageName.

N

2)              Cache Transform

Allows you to write data to a cache with the Cache connection manager. The data can then be used by the Lookup transformation. This is useful if you are using multiple Lookup transformations against the same data, because SSIS will cache the needed data only once and not for each Lookup component.

N

3)              Character Map

Performs common text operations such as Uppercase and allows advanced linguistic bit-conversion operations.

N

4)              Copy Column

Duplicates column values in each row to a new named column.

N

5)              Data Conversion

Creates a new column in each row based on a new data type converted from the existing column. An example is converting text to numeric data or text to Unicode text.

N

6)              Derived Column

Creates or replaces a column for each row based on a specified SSIS expression. This is the most often used logical row-level transformation because it enables the replacement of column values or the creation of new columns based on existing columns, variables, and parameters.

N

7)              Export Column

Exports binary large objects (BLOB) columns, one row at a time, to a file.

N

8)              Import Column

Loads binary files such as images into the pipeline; intended for a BLOB data type destination.

N

9)              Row Count

Tracks the number of rows that flow through the transformation and stores the number in a package variable after the final row.

N

 

                                   II.          Multi-Input and Multi-Output Transformations

10)           Balanced Data Distributor

Takes a single input buffer and distributes the incoming rows to one or more outputs uniformly via multithreading.

N

11)           CDC Splitter

Splits a single flow of changed rows from the CDC source component into multiple data flows based on the type of the source data change (that is, whether it is an insert, update, or delete operation). CDC Splitter routes the data based on the __$operation column into three possible outputs. his transformation is like a specific version of the Conditional Split transformation that automatically handles the standard values of the __$operationcolumn.

N

12)           Conditional Split

Routes or filters data based on a Boolean expression to one or more outputs, from which each row can be sent out only one output path.

N

13)           Lookup

Performs a lookup operation between a current row and an external dataset on one or more columns. Additional columns can be added to the data flow from the external dataset.

N

14)           Merge

Combines the rows of two similar sorted inputs, one on top of the other, based on a defined sort key.

P

15)           Merge Join

Joins the rows of two sorted inputs based on a defined join column or columns, adding columns from each source.

P

16)           Multicast

Generates one or mode identical outputs, from which every row is sent out every output. This transformation creates a logical copy of the data.

N

17)           Union All

Combines one or more similar inputs, stacking rows one on top of another, based on matching columns. The number of rows in the output of Union All is the combined row counts of all the inputs.

P

 

                                  III.          Multi-Row Transformations

18)           Aggregate

Associates rows based on defined grouping and generates aggregations such as SUM, MAX, MIN, and COUNT.

B

19)           Percent Sampling

Filters the input rows by allowing only a defined percent to be passed to the output path.

Percent sample transformation takes the same percentage of each buffer. It knows in advance, how much rows to take from each buffer (based on the size of the buffer), so it can process a buffer immediately and let it fly down the pipeline.

N

20)           Pivot

Takes multiple input rows and pivots the rows to generate an output with more columns based on the original row values.

P

21)           Row Sampling

Generates a fixed number of rows, sampling the data from the entire input, no matter how much larger than the defined output the input is.

Row sample transformation, on the other hand, doesn't know in advance, how many rows it needs from each buffer, because it doesn't know how much buffers it is waiting for. So this transformation has to wait, count all the buffers, then calculate, how much rows to take from each buffer, and then process all those buffers. 

B

22)           Sort

Orders the input based on defined sort columns and sort directions. The Sort transformation also allows the removal of duplicates across the sort columns.

B

23)           Unpivot

Takes a single row and generates multiple rows, moving column values to the new row based on defined columns.

P

 

                                 IV.          Advanced Data-Preparation Transformations

24)           DQS Cleansing

Validates rows by automatically performing data cleansing using an existing knowledge base in Data Quality Services (DQS).

P

25)           OLE DB Command

Performs database operations such as updates or deletions, one row at a time, based on mapped parameters from input rows.

N

26)           Slowly Changing Dimension

Generates transformations necessary to support loading dimension tables in data warehouse scenarios. This transformation handles SCD (Slowly Changing Dimension) Type 1 and Type 2 and also has support for inferred members.

1)      Fixed Attribute, which means the change should not happen;

2)   Changing Attribute, which means that an update happens; or

3)   Historical Attribute.

N

27)           Data Mining Query

Applies input rows against a data mining model for prediction.

P

28)           Fuzzy Grouping

Performs de-duplication based on similarity of string values in selected columns.

B

29)           Fuzzy Lookup

Joins a data flow input to a reference table based on column similarity. The Similarity Threshold setting specifies the closeness of allowed matches—a high setting means that matching values are close in similarity.

B

30)           Script Component

Applies custom .NET scripting capabilities against rows, columns, inputs, and outputs in the data flow pipeline. This is the most powerful component. Chapter 19, “Implementing Custom Code in SSIS Packages” looks at some of its possibilities.

N

31)           Term Extraction

Analyzes text input columns for English-language nouns and noun phrases.

P

32)           Term Lookup

Analyzes text input columns against a user-defined set of words for association.

P

 

Non-Blocking

1)     Audit

2)     Balanced Data Distributor (added in SQL 2012)

3)     Cache Transform

4)     CDC Splitter

5)     Character Map

6)     Conditional Split

7)     Copy Column

8)     Data Conversion

9)     Derived Column

10)  Import Column

11)  Lookup

12)  Multicast

13)  OLE DB

14)  Output Column

15)  Percentage sampling

16)  Row count

17)  SCD

18)  Script component

Partially Blocking

4)     Data mining

5)     DQS Cleansing

6)     Merge

7)     Merge Join

8)     Pivot

9)     Unpivot

10)  Term Extraction

11)  Term Lookup

12)  Union All

Blocking

1)     Aggregate

2)     Fuzzy Grouping

3)     Fuzzy Lookup

4)     Sort

5)     Row Sampling

 

Changes of the Data Transformations in SSIS from SQL 2005 to SQL Server 2014

2005

2008/2008R2

2012/2014

Business Intelligence Transformations

-    Perform business intelligence operations such as cleaning data, mining text, and running data mining prediction queries.

Slowly Changing Dimension Transformation

Fuzzy Grouping Transformation

Fuzzy Lookup Transformation

Term Extraction Transformation

Term Lookup Transformation

Data Mining Query Transformation

 

 

 

 

 

 

DQS Cleansing Transformation added in 2012

Row Transformations

-    Update column values and create new columns. The transformation is applied to each row in the transformation input.

Character Map Transformation

Copy Column Transformation

Data Conversion Transformation

Derived Column Transformation

Export Column Transformation

Import Column Transformation

Script Component

OLE DB Command Transformation

Rowset Transformations

-    Create new rowsets. The rowset can include aggregate and sorted values, sample rowsets, or pivoted and unpivoted rowsets.

Aggregate Transformation

Sort Transformation

Percentage Sampling Transformation

Row Sampling Transformation

Pivot Transformation

Unpivot Transformation

Split and Join Transformations

-   Distribute rows to different outputs, create copies of the transformation inputs, join multiple inputs into one output, and perform lookup operations.

Conditional Split Transformation

Multicast Transformation

Union All Transformation

Merge Transformation

Merge Join Transformation

Lookup Transformation

 

 

 

 

 

 

Cache Transformation added in 2008

 

 

 

 

 

 

 

Balanced Data Distributor Transformation added in 2012

Auditing Transformations

-    Add audit information and count rows.

Audit Transformation

Row Count Transformation

Custom Transformations

-    Write custom transformations.

 

c.      Data flow destination adapters

                                     I.          ADO.NET Destination - Allows insertion of data by using an ADO.NET managed provider.

                                   II.          Excel Destination - Used for inserting data into Excel, including Excel 2007.

                                  III.          Flat File Destination - Allows insertion of data to a flat fi le such as a comma-delimited or tab-delimited file.

                                 IV.          OLE DB Destination - Uses the OLE DB provider to insert rows into a destination system that allows an OLE DB connection.

·       Generally the fast load option is the preferred method from the performance stand point. But this option commits rows in batches. If you want to capture only the offending rows when an error occurs in an OLE DB destination, then you should not use this option. 

                                   V.          Raw File Destination - Stores native SSIS data in a binary fi le type useful for data staging.

                                 VI.          Recordset Destination - Takes the data flow data and creates a recordset in a package variable of type object.

                                VII.          DataReader Destination - Lets you put data in an ADO.NET Recordset that can be programmatically referenced.

                              VIII.          SQL Server Compact Destination - Lets you send data to a mobile device running SQL Mobile.

                                  IX.          SQL Server Destination - Provides a high-speed destination specific to SQL Server 2008 if the package is running on SQL Server.

                                   X.          Data Mining Model Training - Lets you pass data from the data flow into a data mining model in SSAS.

                                  XI.          Dimension Processing - Lets SSAS dimensions be processed directly from data flowing through the data flow.

                                XII.          Partition Processing - Allows SSAS partitions to be processed directly from data flowing through the data flow.

                              XIII.          ODBC since SQL Server 2008R2

4.   Debugging and Error Handling in SSIS

a.     Defining Package and Task Transaction Settings

                                     I.          You can set package transactions at the entire package level (set the TransactionOption property at the control flow level for the package) or at any control flow container level or task level.

                                   II.          Transactions in SSIS use the Microsoft Distributed Transaction Coordinator (MSDTC)

                                  III.          To enable a transaction within a package, you need to set the TransactionOption property of the task or container to Required.

                                 IV.          Transactions work at the control flow level and not within a data flow.

                                   V.          At times, you might want to enable a transaction for a container but exclude some of the tasks within the container. For example, if you have a couple of Execute SQL Tasks in the container that are used for auditing purposes but the Transactionoption property for the container is set to Required, if an error occurs, the logging tasks will also be rolled back. To prevent the auditing tasks from rolling back, set the Transactionoption property for those tasks to not Supported. This will still let the other tasks in the container be in the transaction, but it will not include the auditing tasks as part of the transaction.

b.     Implementing Restartability Checkpoints

                                     I.          First enabling a package to use checkpoints at the package level

·       Package properties | Set the SaveCheckpoints = True, CheckpointsUsage = IfExists, CheckpointsFileName = …

·       Using checkpoints is not allowed if you have set the Transactionoption of the package to “Required” because the entire package will be rolled back if there is an error and thus you cannot write checkpoints for any specific tasks or containers within the package.

                                   II.          Second setting the specific tasks and containers to write checkpoints.

·       Set the FailPackageOnFailure property at each task or container to True.

                                  III.          BEST PRACTICES - Manage multiple tasks with transactions and restartability

·       Sometimes you will need to roll back a series of tasks if one of the tasks fails, but you want to go back only to the point of failure, not to the start of the package. To do this, you need to enable checkpoints in the package and then use a Sequence Container that holds all the associated tasks. The Sequence Container needs to have the transactions turned on by setting the TransactionOption to ‘Required’ at the container level.

c.      Configuring Execution Logging

                                     I.          You enable logging on a package-by-package basis,

                                   II.          How?

1)              You turn on logging for a package by first opening a package in BIDS and then selecting Logging from the SSIS menu.

2)              Choosing what to log – Typically package, but container and tasks OK too

3)              Setting the log provider type - Text Files, SQL Server, Windows Event Log, SQL Server Profiler, or an XML file.

4)              Selecting the log events

d.     Connecting Control flow objects with Precedence

                                     I.          Precedence Constraint - The connectors between objects in the control flow are precedence constraints, whereas the connectors in the data flow are data paths.

                                   II.          Expression - Expressions in the precedence constraints are advanced Boolean evaluators

e.     Handling Data flow Errors and Debugging

                                     I.          Data paths

                                   II.          Error paths - to configure the error output

1)              Fail Transformation

2)              Ignore Failure

3)              Redirect Row

                                  III.          Using Data Viewers to Identify Data flow Issues – not working in command-line

f.      Handling Package Errors with Event Handlers

                                     I.          Event handlers

a.      To track package execution;

b.     To audit the execution,

c.      To capture the errors that occur in a task.

d.     To notify an administrator of a certain predefined condition that requires a special response by using the Sending Mail task

                                   II.          Debugging the Control flow with Breakpoints – F9

5.   Package Configurations

a.     What’s for?

·       Package configurations let you update properties, variable values, and connections at run time.

b.     Why?

·       Letting you update package settings without having to open each package in BIDS.

·       Giving you a central location for settings that can be shared between packages. For example, using package configurations lets you maintain the connection strings and the variable settings for all your packages in a single location. This can save you time and effort when you need to change multiple packages to deploy them to a new server environment.

·       Configurations are very beneficial when you are moving packages from one environment to another and when you need to update connections or settings for multiple packages at one time.

c.      What can be configured by using package configurations?

·       Package variable properties

·       Connection properties

·       Package properties - These include any property that you defined at the package level, such as package-level transaction options, checkpoint settings, and security settings.

·       Task and container properties

·       The above properties are grouped by the following folders in SSIS wizard :

o   Variables 

o   Connection managers 

o   Log providers - Lets you dynamically set the log configuration, which is the connection that the log provider will use.

o   Properties - Displays all the package-level (rather than task-level or container-level) properties that you can use to configure your package.

o   Executables - Contains the tree structure of your tasks and containers. By navigating through this tree, you can configure the specific properties of your tasks and containers.

·       Note: Data flow component properties CANNOT be updated by Configurations - Most data flow properties cannot be configured by using package configurations. You can configure only the Data flow Container itself. However, outside configurations, there are ways to make some of the data flow components dynamic at run time, such as by using variables and SSIS expressions to modify properties.

d.     How?

                                     I.          Use the Package Configurations Organizer

1)              Enable or disable a package’s package configurations.

2)              Add and remove configurations assigned to the package.

3)              Define the order in which the configurations are applied.

                                   II.          Creating a Configuration

1)              Select Configuration Type

·       XML Configuration File

·       Environment Variable - indirect configuration. Indirect configurations are useful when the location of the file changes from the development server to the deployment server. To use the indirect configuration, you first need to create the file by using the wizard and then go back to edit the configuration and assign the environment variable.

·       Registry Entry

·       Parent Package Variable - Provides a way to inherit the value of a variable from a parent package. When a package is executed from another SSIS package by using the Execute Package Task, the values of its variables are available to the child package through the Parent Package Variable configuration. With this configuration type, you can choose only one package property setting at a time.

·       SQL Server

o   Two ways to store the SQL Server configurations

§  You can specify an environment variable as the location of your configuration (which is the actually data source name)

§  Yu can specify the configuration settings directly.

·       Connection database.

·       Configuration Table.

·       Configuration Filter Multiple SQL Server configurations can share the same table, which you can specify by using the Configuration Filter dropdown list. You can either enter a new filter or choose one that already exists in the table.

o   Advantages - Packages can share SQL Server configurations even if they (i.e., the packages) are running on different servers because the SQL Server configuration does not have to be stored on the local server, unlike the other configuration types.

2)              Adding Properties to your Configuration – see the figure above

                                  III.          Sharing, ordering, and editing your Configurations

e.     When to use which?

                                     I.          If you have multiple packages that need to share a configuration but the packages are executing on different computers, you should use the SQL Server configuration type. All the packages can point to the SQL Server database, which can reside on a different server. You need to create all the other configuration types locally on the server, so you cannot share those configurations across servers.

                                   II.          If you are executing packages together, with one package executing a second package by using the Execute Package Task, and you need to pass the value of a variable from the parent package to the child package, you should use the Parent Package Variable configuration.

                                  III.          If you are planning to move packages from a development server to a test server and then to a production server and each environment needs to use a different location for the configuration (such as the path to an XML Configuration File), you should use the indirect configuration setup for the configuration. This uses an environment variable on the system, which points to the correct XML file.

                                 IV.          If you have more than one configuration entry that you want to put in a single SSIS configuration, you need to use either the XML Configuration File or the SQL Server configuration because these configuration types allow the inclusion of more than one property (such as a connection string and a variable value) per configuration. The other configuration types (Environment Variable, Registry Entry, and Parent Package Variable) allow only one property per configuration.

6.   SSIS Expressions

a.     What’s for? – The expression language helps generate dynamic properties OR test Boolean logic.

b.     Key components

                                     I.          A list of package variables

                                   II.          A list of columns in the data flow

                                  III.          SSIS functions

1)              Mathematical functions

2)              String functions

3)              Date/time functions

4)              NULL functions

5)              Type casts

6)              Operators

c.      Where to use SSIS expressions?

                                     I.          Derived Column Transformation

                                   II.          Conditional Split Transformation

                                  III.          For Loop Container / Foreach Loop Container (I added)

·       Not only can you set package, task, and container properties through expressions, but you can also update the value of connection properties while a package is running. This capability is especially useful when you have a ForEach loop that is iterating over files. You can use the variable that the full path is being captured to and update the connection for the file with a property expression.

                                 IV.          Variable Expression - To use this capability, you need to set the EvaluateAsExpression property to True and then enter an expression in the Expression property.

                                   V.          Property Expressions

7.   Deploying SSIS Packages - moving the package to a new environment

a.     How?

                                     I.          Creating a package installer set by using the Package Deployment Utility.

·       SSIS project |Properties | Deployment Utility property | CreateDeploymentUtility = true | Build è .SSISDeploymentManifest ( and a copy of all the packages, XML configuration files, and the project’s miscellaneous files are all created with the deployment utility. These files are copied to the location specified in the project’s Deployment OutputPath property.)

                                   II.          Deploying the installer set to your new deployment destination.

1)              Manual

·       Move the actual .dtsx files to a new server environment on a share

·       Manually import them into SQL Server through SSMS (Open the Stored Packages folder, right-click the MSDB folder, and then click Import Package to open the Import Package dialog box)

·       Using the DTUtil command-line utility

o   /COPY - Copies packages from a source to a destination

o   /MOVE - Moves packages from a source to a destination

o   /DELETE - Deletes packages from a destination

o   /EXISTS - Checks for the existence of a package

o   /ENCRYPT - Encrypts an existing package

·       0 = DontSaveSensitive

·       1 = EncryptSensitiveWithUserKey

·       2 = EncryptSensitiveWithPassword

·       3 = EncryptAllWithPassword

·       4 = EncryptAllWithUserKey

·       5 = SQLServerStorage

 

-- Encrypting MyPackage.dtsx with the password EncPswd:

dtutil.exe /file MyPackage.dtsx /encrypt file;MyPackage.dtsx;3;EncPswd

 

o   /DECRYPT - Decrypts a package by using a package password

o   /SIGN - Digitally signs a package

 

-- Signing a package named MyPackage, located on the local SQL Server instance:

dtutil.exe /SIGN SQL;MyPackage

 

o   /FCREATE - Creates and manages folders for the SSIS Package Store and in the SQL Server msdb database

2)              Automatic - Using the Package Installation Wizard

·       Whether you are deploying to SQL Server or to the fi le system, the Package Installation wizard copies dependent fi les to the folder you specify. This includes XML configuration files that are used by any package in the project (but only where the XML configuration path is hard-coded in the package and does not use an environment variable). The Package Installation Wizard will also update the XML configuration entries in the package and change them to point to the new XML Configuration file path.

b.     Three options for package storage:

                                     I.          SSIS Deployment to the file system Independently of the projects in which they were created, packages can be placed on file shares or in file folders from which they can then be executed (legacy since SQL Server 2012).

                                   II.          SSIS Deployment to SQL Server – msdb.dbo.sysssispackages (legacy since SQL Server 2012)

                                  III.          SSISDB catalog.

8.   Managing the SSIS Service

a.     What is it?

·       The SSIS service is a Windows service named SQL Server Integration Services 10.0.

·       The SSIS service is installed when you install the Integration Services components of SQL Server 2008 on a server.

·       Be sure to use the SQL Server Configuration Manager to configure the startup requirements and the service account.

b.     What’s for?

·       The SSIS service is not required to run packages, but

·       It can be beneficial to package execution because the service caches the SSIS components into memory, allowing a faster startup time during execution.

·       The biggest benefit to using the SSIS service is the management and securing of packages deployed to SQL Server. After you connect to the SSIS service, the Object Explorer window in SSMS lets you manage, secure, and execute packages deployed to the server.

c.      Managing and Securing the Packages deployed to SQL Server

                                     I.          Packages in SSMS

1)              The Running Packages folder

2)              The Stored Packages folder

·       File System - It does not show all the files on the server, but instead shows only files that are stored in the package store location. Default: %Program Files%\Microsoft SQL Server\100\DTS\Packages\ folder. But you can change it

·       MSDB - packages stored in the SQL Server msdb database in a table named [dbo].[sysssispackages].

                                   II.          Managing Packages in SSMS – Import/export, upgrade, run, delete

                                  III.          Configuring the SSIS Service in a Windows Cluster Environment

·       There can be only one service for SSIS installed per server.

·       The SSIS service is not cluster aware during the installation, so you need to configure it separately if you are working with a Windows cluster.

·       There are two ways to configure the SSIS service in a clustered environment (https://msdn.microsoft.com/en-us/library/ms345193.aspx):

o   Installing the SSIS service independent from the cluster resources - You can install the SSIS components on all nodes of the cluster, and by default, they will not be part of any cluster resource group. The service will be started on all nodes, and you can run packages from any node in the cluster.

§  If all your packages are stored on a network share that is not part of the package store, no further configuration is needed.

§  If you want to centralize all your packages in SQL Server, you need to change the MsDtsSrvr.ini.xml file. Change the <ServerName> element to reference a specific server and instance; if the SQL Server is in the cluster, use the virtual server name and instance. Last, change this file on all the nodes so that you can connect to the SSIS service on any machine and see the same packages.

o   Integrating the SSIS service in a cluster group - You can add the SSIS service as a cluster resource.

§  If you want to store packages in the package store, you would choose this approach because the shared service would be running on only one node at a time and you could reference the virtual name of the server. The service would need to point to a shared MsDtsSrvr.ini.xml fi le on a shared drive in the same cluster resource group as the service. This requires a registry change, which is also documented. The package store location also must be on the shared drive in the same cluster resource group.

§  When you are running a package on a server node of a Windows cluster environment and the node fails, the restartability rules apply. you can turn on checkpoints in your packages and have the checkpoint fi le created on a share so that if the package needs to be restarted, it can locate and use the checkpoint file.

                                 IV.          SSIS Package Security

1)              Why?

·       SSIS packages themselves do not contain any source or destination data, but

·       They could provide information about the underlying data they are processing:

o   Connection information

o   Schema information (i.e., if that person acquires an unencrypted package, he or she would have access to what those sources and destinations look like.

Therefore, you need to consider how to prevent access to either the connections or the entire package.

2)              Three ways to limit access to packages:

·       You can apply SQL Server security roles to any package that has been deployed to the SQL Server msdb database. You can define which SQL logins or Windows groups or users have read and write access to a package, which controls whether someone can execute or export a package.

o   The roles are fixed database-level roles, but you create new database roles or assign users and groups to the database roles through the msdb database.

o   Roles are assigned read and/or write actions as follows:

§  Read actions focus on viewing and executing packages.

§  Write actions apply to moving packages in and out of msdb without executing the packages.

o   Guidelines when deciding which roles to use:

§  For SSIS execution managers who should have execute privileges but should not be adding packages to SQL Server, you should assign the db_ssisoperator role, which can view, execute, and export packages.

§  For users who should be able to work only with packages that they created, you should assign the db_ssisltduser role, which has limited access rights and enables those users to execute and export only packages that those particular users imported to the SQL Server. Users assigned to the db_ssisltduser role cannot work with other users’ folders.

·       You can employ security encryption by using the package protection level, in which either part or all of a package is encrypted. You can set security encryption properties when developing a package in BIDS, when deploying a package, or when importing or exporting a package. You can apply these properties either to the entire package or to the sensitive data only. It is important that you thoroughly understand what securing a package means regarding connection information and schema information.

o   A package’s ProtectionLevel property encrypts package definition information that resides in the underlying XML file.

o   You set the ProtectionLevel property at the package level when you are editing the package in BIDS. This means that you define what should be encrypted and how it should be encrypted for the entire package.

o   You can also set a PackagePassword property, which you use when the ProtectionLevel requires encrypting metadata in the package with a password.

o   ProtectionLevel Options

§  DontSaveSensitive - Sensitive information is not saved in the package at all. Each time the package is reopened in BIDS, the connection passwords must be reentered. For package execution, the password needs to be stored in a package configuration or passed to the package through the command-line setting. This is the recommended setting if you are exclusively using Windows Authentication for your connections.

§  EncryptAllWithPassword - The entire package is encrypted with a password that is set in the PackagePassword property.

§  EncryptAllWithUserKey

§  EncryptSensitiveWithPassword

§  EncryptSensitiveWithUserKey

§  ServerStorage - This setting can be used only when a package is stored in a SQL Server database. Server storage means that nothing in the package definition is encrypted. Instead, the entire package is protected by using a SQL Server database role. This option is not supported when a package is saved to the file system.

o   Guidelines for choosing a package ProtectionLevel setting:

§  If all your connections use Windows Authentication and you do not have sensitive information in your package (with the exception of a custom component that has sensitivity defined). In this situation, you should choose DontSaveSensitive because there is nothing to be encrypted. Your package can be executed on any computer by any user as long as the Windows account running the package has the appropriate security privileges for the connections.

§  Sensitive information cannot be stored in clear text in a package file stored in the file system. So if you have sensitive information in a package, such as a connection password, you have to:

·       Use an SSIS configuration to update the connection password at run time, or

·       Be able to decrypt the password either by using the package password, or

·       By running the package on the same computer by the same user account, or

·       Passing the connection password to the package through the DTExec command-line utility.

·       You can use file-level security, in which you can apply access rights to the file or folder if your packages reside on a computer’s file system.

9.   SSIS Server Security in SQL Server 2012+

a.     The functionalities that were once provided by the SSIS Service in SQL Server 2005 – 2008R2 have been replaced in SQL Server 2012 by the SSIS Server - an instance of SQL Server hosting the SSISDB catalog. SQL Server 2012 uses this new, special, database to store SSIS solutions (projects, packages, and parameters), together with the complete operational historyThe SSISDB catalog is also used in managing SSIS security (assignment of permissions on the SSIS objects stored inside the SSISDB catalog).

b.      The SSISDB catalog serves as the dedicated storage facility for SSIS solutions, providing not only a safe location for storing SSIS projects, packages, and configurations, but also SSIS solution maintenance, monitoring, and troubleshooting

 

c.      Does not need msdb and file system storage anymore - In contrast to the older SSIS package store, SSISDB is not shared with any other part of the SQL Server platform (which is the case with the msdb system database) and does not depend on any external features that are not part of SQL Server (which is the case with the file system). Therefore, for your future work, you should plan on exclusively using the SSISDB catalog as the deployment target for all your SSIS solutions.

d.     SSIS Server Security

                                     I.          They are governed by the same general security model in SQL Server, that is, SSISDB permissions define which SSISDB principals can perform which SSISDB operations on which SSISDB securables. They share the same three possible actions on permissions: grant, deny, and revoke as well. However, there are significant differences between them on securables, principals, and permission granularity.

                                    II.          Securables in SSIS Server?

 

1)                Folders - These represent the topmost securables in the SSISDB object hierarchy; they contain all other objects.

2)                Projects - These represent child securables of the folders they belong to; they contain packages and environment references.

3)                Environments - These represent child securables of the folders they belong to; they contain environment variables.

4)                Operations – They represent actions performed against the SSISDB catalog (deployments and catalog-level configurations) and against SSISDB projects and packages (validations and executions).

 

                                   III.          Principals in SSIS Server?

 

1)                Different from the three fixed database roles (i.e., db_ssisadmindb_ssisltduser, and db_ssisoperator) in the msdb database in SSIS Service Security, there is only one special database role in the SSISDB catalog named ssis_admin. Others are the standard fixed database role in SQL Server, including the Public database role.

2)                This ssis_admin role can perform administrative tasks on the SSIS server; they can view and modify any SSISDB securable. The general db_owner database role is a default member of the ssis_admin role; therefore, the person who creates the SSISDB catalog automatically becomes a member of the ssis_admin role.

3)                The other SSISDB users are implemented the same way as in any other SQL Server 2012 databases. That is, you create a login, a user first, and then assign permissions to the user on the SSISDB securables. A SQL Server user who has been granted access to the SSISDB catalog, also referred to as an SSISDB user, can query any public catalog view and execute any public stored procedures, except for the following administrative stored procedures: catalog.configure_server, catalog.grant_permission, and catalog.revoke_permission.

                                  IV.          Permission in SSIS Server security?

 

Permission Name

Applicable Object Types

READ

Folder, Project, Environment, Operation

MODIFY

Folder, Project, Environment, Operation

EXECUTE

Project

MANAGE_PERMISSIONS

Folder, Project, Environment, Operation

CREATE_OBJECTS

Folder

READ_OBJECTS

Folder

MODIFY_OBJECTS

Folder

EXECUTE_OBJECTS

Folder

MANAGE_OBJECT_PERMISSIONS

Folder

 

                                    V.          How to implement SSIS Server Security via Roles, not via individual users?

 

1)                If you want to allow a login to be able to do anything on SSISDB, make it as a member of the ssis_admin role.

2)                If you want to allow a login to be able to read/execute only one project, but not able to access any other objects (projects or environments) in a folder where the project is in:

·       Make it a member of the SSISDB database role -- public. (This leverages SQL security mechanism.)

·       Grant it Read to the folder, and grant it Read/Execute to the project. (This uses SSIS Catalog security mechanism.)

3)                If you like to allow a login (user or group) to be able to read/execute all projects in a folder:

·       Map it to a member of the SSISDB database role -- public.

·       Grant it Read/Execute/Read Objects to the folder.

10.         Executing Packages

a.     By programmatically using the SSIS object model - Load and execute packages programmatically inside an application.

                                     I.          Namespace: Microsoft.SqlServer.Dts.Runtime

                                   II.          Classes:

1)              The Application class includes the LoadFromSQLServer method and the LoadPackage method—to load a package from either SQL Server or the file system, respectively.

2)              The Package class includes an Execute method, which runs the package and returns the DTSExecResult enumeration with the success or failure of the execution.

b.     Using DTExecUI to Configure Package Execution

                                     I.          Entering DTExecUI in the Run dialog box, or through a command prompt.

                                   II.          DTExecUI Execution Configuration Options:

1)              General - As described earlier in steps 2 and 3, you can specify the package location and name on this property page.

2)              Configurations - In addition to the configurations already defined in the package, you can add XML configurations to a package at execution time through this property page. These XML configurations are added to the configurations that currently exist in the package.

3)              Command Files - The DTExec.exe parameters can be stored in a separate file, and the Command Files properties let you specify the text file that will contain the parameters.

4)              Connection Managers - For the selected package, the connections can be overwritten at run time with new connection information. Select the connections to overwrite, and then manually alter the connection string.

5)              Execution Options - The Execution Options property page allows advanced execution details for the package execution, including the following:

·       Validate Package Without Executing

·       Maximum Concurrent Executables

·       Enable Package Checkpoints

·       Browse To Checkpoint File

·       Override Restart Options

·       Restart Options

6)              Reporting - The Reporting properties define the information returned to the command-line output. The selected events determine the level of information returned.

7)              Logging - Log providers can be added at run time.

8)              Set Values - The Set Values properties let you override package properties, including package variables.

9)              Verification - On this property page, you set verification options that allow a package to run, using the following:

·       Execute Only Signed Packages

·       Verify Package Build

·       Build

·       Verify Package ID

·       Package ID

·       Verify Version ID

·       Version ID

10)           Command Line - The Command Line property page shows the command line that will be passed to DTExec. All the options you configured on the other property pages will be accounted for in the command-line text, which you can edit manually.

c.      Using DTExec for Package Execution

                                     I.          The more common execution method is through the command line. SSIS ships with a command-line utility named DTExec, which can be embedded in any other command-line execution (e.g., a batch file) or scheduling tool, and DTExecUI can help you build the command line. In addition, with SQL Server 2008, SQL Server Agent has direct support for SSIS command-line execution.

                                   II.          It is still good in SQL Server 2012+. The dtexec utility lets you load packages from these sources: the Integration Services server, an .ispac project file, a Microsoft SQL Server database, the SSIS Package Store, and the file system.

                                  III.          When a package is executed from the command line, the package will always execute at the location from which DTExec is run—not the location at which it is stored. SSIS packages can be located on remote fi le shares or remote SQL Servers, but the package will execute where the DTExec statement is located.

                                 IV.          Too many parameters, see https://msdn.microsoft.com/en-us/library/hh231187.aspx for details

                                   V.          Examples:

-- To execute an SSIS package saved to SQL Server with Windows Authentication, use the following code:

dtexec.exe /sql MyPackage /server SQLProd

-- To execute a package in the fi le system, use the following code:

dtexec.exe /file "c:\MyPackage.dtsx"

-- To run a package that has been saved in the fi le system with additional logging options, use the following code:

dtexec.exe /f "c:\MyPackage.dtsx" /l "DTS.LogProviderTextFile;c:\SSISlog.txt"

-- To run a package stored in the fi le system, passing in an XML confi guration fi le, run the following code:

dtexec.exe /f "c:\pkgOne.dtsx" /conf "c:\pkgOneConfig.cfg"

d.     Executing Packages in SSMS with the SSIS Service Manually - SSMS will open the DTExecUI utility. It will be preconfigured with the package location information on the General property page of the tool, and you can configure the other property pages as previously described.

e.     Creating SQL Server Agent Jobs to Execute SSIS Packages – Key steps

·       In the Type drop-down list, select SQL Server Integration Services Package

·       The New Job Step window uses the DTExecUI interface, thus, the DTExec command is the final output.

·       Scheduling the job

·       SQL Server Agent or Proxy Account?

o   Any job scheduled under SQL Server Agent will run under the account through which SQL Server Agent is logged in. By default, this is the Local System. (This account is set under SQL Server Configuration Manager.)

o   If your packages are set up to use Windows Authentication and no configurations have been set up, the user will need permissions to perform each of the database tasks in the package.

o   If necessary, create a proxy account to run the job