Changes to the Data Transformations in SSIS from SQL Server 2005 to SQL Server 2014


Business Intelligence Transformations
-        Perform business intelligence operations such as cleaning data, mining text, and running data mining prediction queries.

In 2005:
Slowly Changing Dimension Transformation
Fuzzy Grouping Transformation
Fuzzy Lookup Transformation
Term Extraction Transformation
Term Lookup Transformation
Data Mining Query Transformation

Added in 2012:
DQS Cleansing Transformation
Row Transformations
-        Update column values and create new columns. The transformation is applied to each row in the transformation input.

In 2005 (unchanged through 2014):
Character Map Transformation
Copy Column Transformation
Data Conversion Transformation
Derived Column Transformation
Export Column Transformation
Import Column Transformation
Script Component
OLE DB Command Transformation


Rowset Transformations
-        Create new rowsets. The rowset can include aggregate and sorted values, sample rowsets, or pivoted and unpivoted rowsets.

In 2005 (unchanged through 2014):
Aggregate Transformation
Sort Transformation
Percentage Sampling Transformation
Row Sampling Transformation
Pivot Transformation
Unpivot Transformation


Split and Join Transformations
-       Distribute rows to different outputs, create copies of the transformation inputs, join multiple inputs into one output, and perform lookup operations.

In 2005:
Conditional Split Transformation
Multicast Transformation
Union All Transformation
Merge Transformation
Merge Join Transformation
Lookup Transformation

Added in 2008:
Cache Transformation

Added in 2014:
Balanced Data Distributor Transformation
Auditing Transformations
-        Add audit information and count rows.

In 2005 (unchanged through 2014):
Audit Transformation
Row Count Transformation


Custom Transformations
-        Write custom transformations.

Changes to the Data Destinations in SSIS from SQL Server 2005 to SQL Server 2014

In 2005:
Data Mining Model Training Destination: trains data mining models.
DataReader Destination: exposes the data in a data flow by using the ADO.NET DataReader interface.
Dimension Processing Destination: loads and processes an SQL Server 2005 Analysis Services (SSAS) dimension.
Excel Destination: writes data to an Excel workbook.
Flat File Destination: writes data to a flat file.
OLE DB Destination: loads data using an OLE DB provider.
Partition Processing Destination: loads and processes an Analysis Services partition.
Raw File Destination: writes raw data to a file.
Recordset Destination: creates an ADO recordset.
Script Component: uses script to extract, transform, or load data.
SQL Server Compact Edition Destination: inserts rows into a SQL Server 2005 Compact Edition database.
SQL Server Destination: bulk inserts data into a SQL Server 2005 table or view.

Added in 2008:
ADO NET Destination

Added in 2012:
ODBC Destination


Changes to the Data Sources in SSIS from SQL Server 2005 to SQL Server 2014

In 2005:
DataReader Source: consumes data from a .NET Framework data provider. Removed, replaced with the ADO NET Source (added in 2008).
Excel Source: extracts data from an Excel file.
Flat File Source: extracts data from a flat file.
OLE DB Source: consumes data from an OLE DB provider.
Raw File Source: extracts raw data from a file.
Script Component: uses script to extract, transform, or load data.
XML Source: extracts data from an XML file.



Changes to the Control Flow Tasks in SSIS from SQL Server 2005 to SQL Server 2014


Data Flow Task
-        Runs data flows to extract data, apply column level transformations, and load data.

In 2005 (unchanged through 2014):
Data Flow


Data Preparation Tasks
-        Copy files and directories; download files and data; run Web methods; apply operations to XML documents; and profile data for cleansing.



In 2005:
File System
FTP
Web Service
XML

Added in 2008:
Data Profiling

Added in 2012:
Expression
Workflow Tasks
-        Communicate with other processes to run packages, run programs or batch files, send and receive messages between packages, send e-mail messages, read Windows Management Instrumentation (WMI) data, and watch for WMI events.

In 2005:
Execute DTS 2000 Package (kept for backward compatibility; removed in 2012)
Execute Package
Execute Process
Message Queue
Send Mail
WMI Data Reader
WMI Event Watcher
SQL Server Tasks
-        Access, copy, insert, delete, and modify SQL Server objects and data.

In 2005:
Bulk Insert
Execute SQL
Transfer Database
Transfer Error Messages
Transfer Jobs
Transfer Logins
Transfer Master Stored Procedures
Transfer SQL Server Objects

Added in 2012:
CDC Control
Scripting Tasks
-        Extend package functionality by using scripts.

In 2005:
ActiveX Script (kept for backward compatibility; removed in 2012)
Script
Analysis Service Tasks
-        Create, modify, delete, and process Analysis Services objects.

In 2005 (unchanged through 2014):
Analysis Services Execute DDL
Analysis Services Processing
Data Mining


Maintenance Tasks
-        Perform administrative functions such as backing up and shrinking SQL Server databases, rebuilding and reorganizing indexes, and running SQL Server Agent jobs.

In 2005:
Backup Database
Check Database Integrity
Execute SQL Server Agent Job
Execute T-SQL Statement
History Cleanup
Notify Operator
Rebuild Index
Reorganize Index
Shrink Database
Update Statistics

Added in 2008:
Maintenance Cleanup

Custom Tasks
-        Write custom tasks using a programming language that supports COM, such as Visual Basic, or a .NET programming language, such as C#.

SQL Server BI Tools - Confusing...

The happy beginning:
·       The BI Tool in SQL Server 2005 uses the Visual Studio 2005 Shell.
·       The BI Tool in SQL Server 2008 uses the Visual Studio 2008 Shell.
·       The BI Tool in SQL Server 2008 R2 uses the Visual Studio 2008 Shell as well.

The separation starts:
SQL Server 2010 was never released, but Visual Studio 2010 was, and it includes SQL Server Data Tools. However, despite the name, it is NOT for SQL Server BI projects. Visual Studio 2010 can install either the add-in tools for building Integration Services (SSIS), Analysis Services (SSAS), and Reporting Services (SSRS) solutions (let’s call that suite SSDT-BIDS) or the add-in tools for building SQL Server database solutions/projects (let’s call that suite SSDT-DB), but not both. More importantly, the SSDT in Visual Studio 2010 is NOT a real BI tool. For instance, if you need to create a Reporting Services project, you still need Visual Studio 2008. The SQL Server Data Tools team on the Visual Studio side was a separate product team that worked only on the database development features; the business intelligence tools were not their priority.

When SQL Server 2012 was released, its installer offered “SQL Server Data Tools” on the Feature Selection page. This option installs SSDT-BIDS as well as the Visual Studio 2010 Integrated Shell, but does not include SSDT-DB (the VS programming languages and the features that support their respective project systems). Instead of installing SSDT-DB, it installs a pointer to a web install that you will see when you try to create a new database project in Visual Studio 2010; in other words, it installs a “stub project” that allows SQL Server 2012 users to acquire the latest SSDT version from the web. So to install SSDT-DB, follow Install SQL Server Data Tools (that’s from the VS team).

If you have not installed SQL Server 2012 (or installed it without selecting “SQL Server Data Tools” on the Feature Selection page) and you go to http://msdn.microsoft.com/en-us/data/gg427686 to download and install SSDT, it will automatically install the Visual Studio 2010 Integrated Shell, apply SP1, and install SSDT-DB, but it does NOT install SSDT-BIDS. If you then install SQL Server 2012 and choose “SQL Server Data Tools” on the Feature Selection page, the installation will install SSDT-BIDS; it will notice that the Visual Studio 2010 Integrated Shell is already installed and won’t install it again.

If you have Visual Studio 2010 Professional Edition or above installed before you install SSDT (through SQL Server 2012 Setup or on its own), you must install SP1 manually before installing SSDT. The SSDT installation will integrate all of its functionality into your existing VS environment.

Please note that the Visual Studio 2012 version of SQL Server Data Tools, like its 2010 predecessor, was just for database development projects, not business intelligence. To their credit, Microsoft’s SQL Server Data Tools team maintained the ability to download and install the tool in two flavors, for Visual Studio 2010 and for Visual Studio 2012. However, neither contained the BI tools.

Things have been changing… the reunion is on the way.
Recently, Microsoft released SQL Server Data Tools - Business Intelligence for Visual Studio 2012, which includes business intelligence project authoring tools and project templates for SQL Server Analysis Services, Reporting Services, and Integration Services, integrated into a Visual Studio 2012 shell.

You can install SQL Server Data Tools - Business Intelligence for Visual Studio 2012 as a standalone business intelligence authoring environment or side by side with an existing SQL Server Data Tools installation from SQL Server 2012.

If you already have Visual Studio 2012 or SQL Server Data Tools installed, running SQL Server Setup to install SQL Server Data Tools - Business Intelligence for Visual Studio 2012 will not overwrite your existing Visual Studio 2012 installation or SQL Server Data Tools installation. Setup will install only the additional business intelligence feature tools and project templates.

In short, the confusion comes from the same term having different meanings: there is a SQL Server version of SSDT and a Visual Studio version of SSDT. The SQL Server version focuses on BI, whereas the Visual Studio version concentrates on database projects. In other words, you cannot do database projects or VS programming in the SQL Server SSDT, and you cannot do BI in the VS version of SSDT. But the picture has changed. With the release of SQL Server Data Tools - Business Intelligence for Visual Studio 2012, we can create BI projects from Visual Studio 2012, and we no longer need to maintain multiple versions of Visual Studio to support classic development projects and business intelligence projects.

SQL Server 2014 basically resembles SQL Server 2012 in this respect: install SQL Server Data Tools for Visual Studio 2013 and/or SQL Server Data Tools - Business Intelligence for Visual Studio 2013.

For SQL Server 2016 (up to CTP 3.3 so far), download and install SQL Server Data Tools Preview in Visual Studio 2015.


References:
http://www.jamesserra.com/archive/2012/04/ssdt-installation-confusion/

A Quick Overview of the Book "Training Kit (Exam 70-463) Implementing a Data Warehouse with Microsoft SQL Server 2012" - Table of Contents and Summaries

The summaries are almost useless, but the table of contents is great. I recommend reading the contents only.

Part I Designing and Implementing a Data Warehouse

Chapter 1 - Data Warehouse Logical Design  3

1.     Lesson 1: Introducing Star and Snowflake Schemas . . . . . . . . . . . . . .  . . . . . . . . . . . 4

a.     Reporting Problems with a Normalized Schema  5

b.     Star Schema  7

c.      Snowflake Schema  9

d.     Granularity Level  12

e.      Auditing and Lineage  13

f.       Lesson Summary  16

·       The Star schema is the most common design for a DW.
·       The Snowflake schema is more appropriate for POC (proof-of-concept) projects.
·       You should also determine the granularity of fact tables, as well as auditing and lineage needs in the DW.
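The reporting benefit of a star schema (one join per dimension, then aggregate) can be sketched with a tiny SQLite example. This is a minimal illustration with hypothetical table and column names, not a schema from the book:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# A dimension table and a fact table referencing it by surrogate key
cur.execute("CREATE TABLE DimCustomer (CustomerKey INTEGER PRIMARY KEY, Country TEXT)")
cur.execute("CREATE TABLE FactSales (CustomerKey INTEGER, Amount REAL)")
cur.executemany("INSERT INTO DimCustomer VALUES (?, ?)", [(1, "US"), (2, "DE")])
cur.executemany("INSERT INTO FactSales VALUES (?, ?)",
                [(1, 100.0), (1, 50.0), (2, 70.0)])
# A typical star-schema report: join fact to dimension, then aggregate
rows = cur.execute("""
    SELECT d.Country, SUM(f.Amount)
    FROM FactSales AS f
    JOIN DimCustomer AS d ON d.CustomerKey = f.CustomerKey
    GROUP BY d.Country
    ORDER BY d.Country
""").fetchall()
print(rows)  # [('DE', 70.0), ('US', 150.0)]
```

A snowflake schema would normalize DimCustomer further, adding joins to the same query.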

2.     Lesson 2: Designing Dimensions . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

a.     Dimension Column Types  17

b.     Hierarchies  19

c.      Slowly Changing Dimensions  21

d.     Lesson Summary  26

·       In a dimension, you have the following column types: keys, names, attributes, member properties, translations, and lineage.
·       Some attributes form natural hierarchies.
·       There are standard solutions for the Slowly Changing Dimensions (SCD) problem: Type 1, Type 2, Type 3, or a mixture.
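The Type 2 approach (preserve history by closing the current row and adding a new one) can be sketched in a few lines of Python. This is a conceptual illustration with hypothetical column names, not the SSIS Slowly Changing Dimension component:

```python
def apply_scd2(dim, business_key, new_attrs, load_date):
    """Close the current row for the business key and insert a new current row."""
    current = next((r for r in dim
                    if r["bk"] == business_key and r["valid_to"] is None), None)
    if current is not None:
        if all(current[k] == v for k, v in new_attrs.items()):
            return  # nothing changed, keep the current version
        current["valid_to"] = load_date   # close the old version
    new_row = {"key": len(dim) + 1, "bk": business_key,
               "valid_from": load_date, "valid_to": None}
    new_row.update(new_attrs)
    dim.append(new_row)

dim = []
apply_scd2(dim, "C1", {"city": "Vienna"}, "2012-01-01")
apply_scd2(dim, "C1", {"city": "Graz"}, "2013-06-01")
print(len(dim))            # 2 versions of customer C1
print(dim[0]["valid_to"])  # 2013-06-01 (old row closed)
```

A Type 1 change would instead overwrite `city` in place, keeping a single row per business key.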

3.     Lesson 3: Designing Fact Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

a.     Fact Table Column Types  28

b.     Additivity of Measures  29

c.      Additivity of Measures in SSAS  30

d.     Many-to-Many Relationships  30

e.      Lesson Summary  33

·       Fact tables include measures, foreign keys, and possibly an additional primary key and lineage columns.
·       Measures can be additive, non-additive, or semi-additive.
·       For many-to-many relationships, you can introduce an additional intermediate dimension.
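The difference between additive and semi-additive measures is easy to show with hypothetical data: sales amounts sum meaningfully over time, while an inventory balance does not and must instead be taken as, say, the period-end value:

```python
sales = {"Jan": 100, "Feb": 120, "Mar": 90}   # additive measure
stock = {"Jan": 500, "Feb": 480, "Mar": 510}  # semi-additive: a balance

total_sales = sum(sales.values())  # summing over time is meaningful: 310
quarter_end_stock = stock["Mar"]   # summing 500+480+510 would be nonsense
print(total_sales, quarter_end_stock)  # 310 510
```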

Chapter 2 Implementing a Data Warehouse  41

1.     Lesson 1: Implementing Dimensions and Fact Tables . . . . . . . . . . . . . . . . . 42

a.     Creating a Data Warehouse Database  42

b.     Implementing Dimensions  45

c.      Implementing Fact Tables  47

d.     Lesson Summary  54

·       For a data warehouse database, you should use the Simple recovery model.
·       When creating a database, allocate enough space for data files and log files to prevent autogrowth of the files.
·       Use surrogate keys in dimensions in which you expect SCD Type 2 changes.
·       Use computed columns.

2.     Lesson 2: Managing the Performance of a Data Warehouse . . . . . . . . . . . 55

a.     Indexing Dimensions and Fact Tables  56

b.     Indexed Views  58

c.      Data Compression  61

d.     Columnstore Indexes and Batch Processing  62

e.      Lesson Summary  69

·       In this lesson, you learned how to optimize data warehouse query performance.
·       In a DW, you should not use many nonclustered indexes.
·       Use small, integer surrogate columns for clustered primary keys.
·       Use indexed views.
·       Use columnstore indexes and exploit batch processing.

3.     Lesson 3: Loading and Auditing Loads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

a.     Using Partitions  71

b.     Data Lineage  73

c.      Lesson Summary  78

·       Table partitioning is extremely useful for large fact tables with columnstore indexes.
·       Partition switch is a metadata operation only if an index is aligned with its base table.
·       You can add lineage information to your dimensions and fact tables to audit changes to your DW on a row level.
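Row-level lineage can be as simple as stamping every loaded row with an execution identifier and a load timestamp. A minimal sketch with hypothetical column names:

```python
from datetime import datetime, timezone

def add_lineage(rows, execution_id):
    """Stamp each incoming row with audit/lineage columns."""
    load_time = datetime.now(timezone.utc).isoformat()
    return [{**r, "LineageExecId": execution_id, "LineageLoadTime": load_time}
            for r in rows]

batch = add_lineage([{"ProductKey": 1, "Amount": 10.0}], execution_id=42)
print(batch[0]["LineageExecId"])  # 42
```

With such columns in place, every row in the DW can be traced back to the load that produced it.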

Part II  Developing SSIS Packages

Chapter 3 Creating SSIS Packages  87

1.     Lesson 1: Using the SQL Server Import and Export Wizard . . . . . . . . . . . . 89

a.     Planning a Simple Data Movement  89

b.     Lesson Summary  99

·       The SQL Server Import and Export Wizard can be used for simple data movement operations.
·       The wizard allows you to create the destination database.
·       Multiple objects can be transferred in the same operation.
·       If the destination objects do not already exist, they can be created by the process.
·       The SSIS package created by the wizard can be saved and reused.

2.     Lesson 2: Developing SSIS Packages in SSDT . . . . . . . . . . . . . . . . . . . . . . . .101

a.     Introducing SSDT  102

b.     Lesson Summary  107

·       SSIS projects are developed by using SSDT, a specialized version of Visual Studio.
·       SSDT provides the complete integrated development environment (IDE) required for efficient development of SSIS packages.
·       The SSIS toolbox is context-aware and will either allow you access to control flow tasks or data flow components, depending on whether you are designing the control flow or the data flow.

3.     Lesson 3: Introducing Control Flow, Data Flow, and Connection Managers . . . . . .109

a.     Introducing SSIS Development  110

b.     Introducing SSIS Project Deployment  110

c.      Lesson Summary  124

·       Existing SSIS packages can be added to SSIS projects in SQL Server Data Tools (SSDT).
·       Control flows contain the definitions of data management operations.
·       Control flows determine the order and the conditions of execution.
·       SSIS package settings can be parameterized, which allows them to be changed without direct access to SSIS package definitions.

Chapter 4 Designing and implementing control flow  131

1.     Lesson 1: Connection Managers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .133

a.     Lesson Summary  144

·       Connection managers are used to establish connections to data sources.
·       Different data sources require different types of connection managers.
·       The usability of a connection manager within an SSIS project or an SSIS package is determined by its scope.

2.     Lesson 2: Control Flow Tasks and Containers . . . . . . . . . . . . . . . . . . . . . . .145

a.     Planning a Complex Data Movement  145

b.     Tasks  147

c.      Containers  155

d.     Lesson Summary  163

·       A rich collection of tasks supporting the most common data management operations is provided by the SSIS design model.
·       Control flow is defined by precedence constraints that determine the order and conditions of execution.
·       Tasks representing logical units of work can be grouped in containers.
·       Loop containers allow a unit of work to be executed repeatedly.

3.     Lesson 3: Precedence Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .164

a.     Lesson Summary  169

·       Precedence constraints determine the order of execution and the conditions that must be met for the process to either continue or stop.
·       Precedence constraints can even be used to allow the process to recover from failures.

Chapter 5 Designing and Implementing Data Flow  177

1.     Lesson 1: Defining Data Sources and Destinations . . . . . . . . . . . . . . . . . . . 178

a.     Creating a Data Flow Task  178

b.     Defining Data Flow Source Adapters  180

c.      Defining Data Flow Destination Adapters  184

d.     SSIS Data Types  187

e.      Lesson Summary  197

·       Use appropriate data source or data destination adapters.
·       Always extract only the columns you need.
·       Use Fast Load or Batch mode when inserting data by using an ODBC or OLE DB destination adapter.
·       Use a Raw File destination if you have to temporarily store data to be used by SSIS later.

2.     Lesson 2: Working with Data Flow Transformations . . . . . . . . . . . . . . . . . .198

a.     Selecting Transformations  198

b.     Using Transformations  205

c.      Lesson Summary  215

·       Remember which transformations are non-blocking, partially blocking, and blocking.
·       Use the Resolve References dialog box to solve mapping errors.
·       Use Derived Column transformation to add new columns or replace the value in existing ones.
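Conceptually, a Derived Column transformation maps an expression over every row to add or replace a column. A sketch in pure Python with hypothetical column names:

```python
def derived_column(rows, name, expression):
    """Add (or replace) a column computed from each row by an expression."""
    for r in rows:
        r[name] = expression(r)
    return rows

rows = [{"first": "Ada", "last": "Lovelace"}]
derived_column(rows, "full_name", lambda r: r["first"] + " " + r["last"])
print(rows[0]["full_name"])  # Ada Lovelace
```

Because each row is handled independently, this kind of transformation is non-blocking.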

3.     Lesson 3: Determining Appropriate ETL Strategy and Tools . . . . . . . . . . .216

a.     ETL Strategy  217

b.     Lookup Transformations  218

c.      Sorting the Data  224

d.     Set-Based Updates  225

e.      Lesson Summary  231

·       Use sorting on the database layer as much as possible.
·       When joining large tables, consider doing so on the database layer.
·       Insert the data that needs to be updated into a temporary table and then perform a set-based update using SQL.
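The recommended pattern (stage the changed rows in a temporary table, then issue one set-based UPDATE instead of a row-by-row OLE DB Command) can be sketched with SQLite; the table and column names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE DimCustomer (bk TEXT PRIMARY KEY, city TEXT)")
cur.execute("CREATE TEMP TABLE Updates (bk TEXT PRIMARY KEY, city TEXT)")
cur.executemany("INSERT INTO DimCustomer VALUES (?, ?)",
                [("C1", "Vienna"), ("C2", "Graz")])
cur.execute("INSERT INTO Updates VALUES ('C1', 'Linz')")
# One set-based statement updates all staged rows at once
cur.execute("""
    UPDATE DimCustomer
    SET city = (SELECT u.city FROM Updates u WHERE u.bk = DimCustomer.bk)
    WHERE bk IN (SELECT bk FROM Updates)
""")
city = cur.execute("SELECT city FROM DimCustomer WHERE bk = 'C1'").fetchone()[0]
print(city)  # Linz
```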

Part III Enhancing SSIS Packages

Chapter 6 Enhancing Control Flow  239

1.     Lesson 1: SSIS Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241

a.     System and User Variables  243

b.     Variable Data Types  245

c.      Variable Scope  248

d.     Property Parameterization  251

e.      Lesson Summary  253

·       You can use variables in SSIS packages to determine certain values once, and then reuse them multiple times.
·       Variable values can be assigned literally or by using expressions. SSIS implements a variety of data types to be used in SSIS variables.
·       SSIS variables can be assigned dynamically as the package is executed, either once per execution or iteratively.
·       Variable accessibility is determined by scope.
·       SSIS variables can be used to parameterize SSIS object properties.

2.     Lesson 2: Connection Managers, Tasks, and Precedence Constraint Expressions . . . .254

a.     Expressions  255

b.     Property Expressions  259

c.      Precedence Constraint Expressions  259

d.     Lesson Summary  263

·       The SSIS runtime provides information about the execution environment and other system information via SSIS system variables.
·       Expressions can be used to compute the values of SSIS object properties at run time.
·       Variables and expressions can also be used to extend the elementary functionality of precedence constraints based on information available at run time that usually is not available at design time.

3.     Lesson 3: Using a Master Package for Advanced Control Flow . . . . . . . .265

a.     Separating Workloads, Purposes, and Objectives  267

b.     Harmonizing Workflow and Configuration  268

c.      The Execute Package Task  269

d.     The Execute SQL Server Agent Job Task  269

e.      The Execute Process Task  270

f.       Lesson Summary  275

·       SSIS operations can be distributed across multiple SSIS packages by using the master package concept.
·       The master package (also referred to as the parent package) can execute dependent packages (or child packages) by using the Execute Package task; this task can also be used to set child package parameters from the parent package at run time.

Chapter 7 Enhancing Data Flow  283

1.     Lesson 1: Slowly Changing Dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . .284

a.     Defining Attribute Types  284

b.     Inferred Dimension Members  285

c.      Using the Slowly Changing Dimension Task  285

d.     Effectively Updating Dimensions  290

e.      Lesson Summary  298

·       Define each attribute’s SCD type in the data modeling phase of the data warehouse project.
·       Use the Slowly Changing Dimension Wizard for dimensions with a small number of rows.
·       Use an alternative solution to the Slowly Changing Dimension Wizard for larger dimensions to solve the problem of updating the Type 1 and Type 2 SCD attributes of a dimension.

2.     Lesson 2: Preparing a Package for Incremental Load . . . . . . . . . . . . . . . . .299

a.     Using Dynamic SQL to Read Data  299

b.     Implementing CDC by Using SSIS  304

c.      ETL Strategy for Incrementally Loading Fact Tables  307

d.     Lesson Summary  316

·       You can use expressions to dynamically set properties for tasks and transformations inside the data flow task.
·       Use CDC when you cannot define an exact rule for changed data and when your source systems support this functionality.
·       Use partitions for your fact tables and apply partition switching instead of deleting data.
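An incremental load driven by a watermark (read only the rows modified since the last successful load, via a dynamically parameterized query) can be sketched with SQLite; all names and dates are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Source (id INTEGER, modified TEXT)")
cur.executemany("INSERT INTO Source VALUES (?, ?)",
                [(1, "2014-01-01"), (2, "2014-02-15"), (3, "2014-03-01")])

last_load = "2014-02-01"  # watermark persisted from the previous run
# Parameterized extraction query: only changed rows are read
changed = cur.execute(
    "SELECT id FROM Source WHERE modified > ? ORDER BY id",
    (last_load,)).fetchall()
print(changed)  # [(2,), (3,)]
```

CDC replaces the hand-maintained watermark when the source system can track changes itself.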

3.     Lesson 3: Error Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317

a.     Using Error Flows  317

b.     Lesson Summary  321

·       Error flow is a very powerful method for making your data flow task robust when you are loading data into a data warehouse.
·       Use error flows to capture problematic rows and store them in a table or file for business users to inspect.
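The idea of an error flow (redirect failing rows to a separate output instead of failing the whole load) can be sketched in Python; the conversion step and column name are hypothetical:

```python
def load_with_error_flow(raw_rows):
    """Convert rows; route rows that fail conversion to an error output."""
    loaded, errors = [], []
    for row in raw_rows:
        try:
            loaded.append({"amount": float(row["amount"])})
        except (KeyError, ValueError) as exc:
            errors.append({"row": row, "error": str(exc)})  # kept for inspection
    return loaded, errors

loaded, errors = load_with_error_flow([{"amount": "10.5"}, {"amount": "oops"}])
print(len(loaded), len(errors))  # 1 1
```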

Chapter 8 Creating a Robust and Restartable Package  327

1.     Lesson 1: Package Transactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .328

a.     Defining Package and Task Transaction Settings  328

b.     Transaction Isolation Levels  331

c.      Manually Handling Transactions  332

d.     Lesson Summary  335

·       You can enlist a container or a task to be part of a transaction.
·       It is important to understand different transaction isolation levels.

2.     Lesson 2: Checkpoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .336

a.     Implementing Restartability Checkpoints  336

b.     Lesson Summary  341

·       Enable checkpoints to restart your package from the last successful step.
·       Enable checkpoints on the package level and then for each task inside the control flow.
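Checkpoint-based restartability can be sketched as: record each completed step, and on a rerun skip the steps already done. The file name and step names below are hypothetical:

```python
import json, os, tempfile

def run_with_checkpoints(steps, checkpoint_file):
    """Run named steps in order, skipping those recorded as completed."""
    done = []
    if os.path.exists(checkpoint_file):
        with open(checkpoint_file) as f:
            done = json.load(f)
    executed = []
    for name, action in steps:
        if name in done:
            continue  # already succeeded in a previous run
        action()
        executed.append(name)
        done.append(name)
        with open(checkpoint_file, "w") as f:
            json.dump(done, f)  # persist progress after every step
    return executed

ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
steps = [("extract", lambda: None), ("load", lambda: None)]
first = run_with_checkpoints(steps, ckpt)
second = run_with_checkpoints(steps, ckpt)  # restart: nothing re-executed
print(first, second)  # ['extract', 'load'] []
```

SSIS works similarly: the checkpoint file records completed tasks so a restarted package resumes at the first incomplete one.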

3.     Lesson 3: Event Handlers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .342

a.     Using Event Handlers  342

b.     Lesson Summary  346

·       Event handlers enable you to have more control over the execution of a package.
·       You can use all the control flow tasks and containers when creating event handlers.
·       Use event handlers if you need to integrate the execution information of SSIS packages into a central logging database.

Chapter 9 Implementing Dynamic Packages  353

1.     Lesson 1: Package-Level and Project-Level Connection Managers and Parameters . . . .354

a.     Using Project-Level Connection Managers  355

b.     Parameters  356

c.      Build Configurations in SQL Server 2012 Integration Services  358

d.     Property Expressions  361

e.      Lesson Summary  366

·       Use parameters to set up connection properties at run time.
·       Parameters and project-level connection managers can only be used with the new project deployment model introduced with SSIS in SQL Server 2012.
·       Use property expressions to change the control flow properties at run time.

2.     Lesson 2: Package Configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .367

a.     Implementing Package Configurations  368

b.     Lesson Summary  377

·       Package configurations are available in the package deployment model.
·       Use package configurations if you are using previous versions of SSIS to set connection properties at run time.
·       Use a combination of XML and SQL Server configurations to provide additional portability for your packages.

Chapter 10 Auditing and Logging  381

1.     Lesson 1: Logging Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .383

a.     Log Providers  383

b.     Configuring Logging  386

c.      Lesson Summary  393

·       Before enabling SSIS package logging, you should determine how much information you actually need to log and choose the appropriate log providers.
·       Depending on the selected log provider, log information is written to the Windows logs, to a SQL Server Profiler trace, to a database, or to files in the file system.

2.     Lesson 2: Implementing Auditing and Lineage . . . . . . . . . . . . . . . . . . . . . .394

a.     Auditing Techniques  395

b.     Correlating Audit Data with SSIS Logs  401

c.      Retention  401

d.     Lesson Summary  405

·       SSIS auditing complements SSIS logging by providing additional information about how the data in the database was affected by the SSIS process.
·       The SSIS run time provides information about the current execution instance, which you can include in your audits.

3.     Lesson 3: Preparing Package Templates . . . . . . . . . . . . . . . . . . . . . . . . . . . .406

a.     SSIS Package Templates  407

b.     Lesson Summary  410

·       You can prepare SSIS package templates to simplify SSIS package development.
·       When you share SSIS package templates with your colleagues, your team can implement common SSIS development practices.

Part IV  Managing and Maintaining SSIS Packages

Chapter 11 Installing SSIS and Deploying Packages  421

1.     Lesson 1: Installing SSIS Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .423

a.     Preparing an SSIS Installation  424

b.     Installing SSIS  428

c.      Lesson Summary  436

·       Before attempting the installation of SSIS, you need to know whether the target environment is going to be used in development and testing or in production, whether the environment is a 32-bit or 64-bit environment, and whether SQL Server has already been installed on the target server.
·       Service accounts demand special consideration for security and manageability reasons.

2.     Lesson 2: Deploying SSIS Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .437

a.     SSISDB Catalog  438

b.     SSISDB Objects  440

c.      Project Deployment  442

d.     Lesson Summary  449

·       In SQL Server 2012, deployed SSIS projects are stored in the SSISDB catalog.
·       The SSISDB catalog is used to store SSIS project and package definitions, as well as other SSIS metadata.

Chapter 12 Executing and Securing Packages  455

1.     Lesson 1: Executing SSIS Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .456

a.     On-Demand SSIS Execution  457

b.     Automated SSIS Execution  462

c.      Monitoring SSIS Execution  465

d.     Lesson Summary  479

·       SSIS packages can be executed manually (for instance, by using SSMS or the DTExec utility) or programmatically (for instance, by using Windows PowerShell or proprietary applications). SSIS execution can even be automated—for instance, by using SQL Server Agent jobs.
·       You are free to determine the most appropriate methods to execute SSIS processes in your environment.
·       SSIS execution monitoring is available in SSMS through standard built-in reports, but you can also implement your own custom reports if needed.

2.     Lesson 2: Securing SSIS Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .480

a.     SSISDB Security  481

b.     Lesson Summary  490

·       SSISDB catalog security is determined based on information about which SSISDB principal (SSISDB user) is allowed access to which SSISDB securables (folders, projects, and environments) and what the nature of this access is (to view, to execute, to modify, or to remove them).
·       Only members of the ssis_admin database role have full access to any of the objects stored in the SSISDB catalog.
·       Permissions on SSIS folders, projects, and environments are managed explicitly; permissions on packages, environment references, and variables, are inherited from the objects they belong to.

Chapter 13 Troubleshooting and Performance Tuning  497

1.     Lesson 1: Troubleshooting Package Execution . . . . . . . . . . . . . . . . . . . . . .498

a.     Design-Time Troubleshooting  498

b.     Production-Time Troubleshooting  506

c.      Lesson Summary  510

·       Use breakpoints when debugging packages at design time.
·       SQL Server 2012 provides new ways to efficiently monitor and troubleshoot packages in a production environment.

2.     Lesson 2: Performance Tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .511

a.     SSIS Data Flow Engine  512

b.     Data Flow Tuning Options  514

c.      Parallel Execution in SSIS  517

d.     Troubleshooting and Benchmarking Performance  518

e.      Lesson Summary  522

·       Package design is crucial for good performance.
·       Monitor package execution with the new functionality of SSISDB catalog views.

Part V  Building Data Quality Solutions

Chapter 14 Installing and Maintaining Data Quality Services  529

1.     Lesson 1: Data Quality Problems and Roles . . . . . . . . . . . . . . . . . . . . . . . . .530

a.     Data Quality Dimensions  531

b.     Data Quality Activities and Roles  535

c.      Lesson Summary  539

·       Data quality issues can be categorized into data quality dimensions.
·       Data governance is the activity of taking care of data quality, and data stewards are people responsible for the quality of particular data.

2.     Lesson 2: Installing Data Quality Services . . . . . . . . . . . . . . . . . . . . . . . . . . .540

a.     DQS Architecture  540

b.     DQS Installation  542

c.      Lesson Summary  548

·       Data Quality Services consists of Data Quality Server and Data Quality Client.
·       There are some prerequisites for installing both components.
·       You start installation with SQL Server Setup.
·       You finish the installation with the DQSInstaller.exe application.

3.     Lesson 3: Maintaining and Securing Data Quality Services . . . . . . . . . . . .549

a.     Performing Administrative Activities with Data Quality Client  549

b.     Performing Administrative Activities with Other Tools  553

c.      Lesson Summary  558

·       DQS administration involves setting up reference data sources, configuring general and log settings, monitoring activity, managing security, and backing up and restoring DQS databases.
·       You perform all of these tasks except for security management and backup and restoration with Data Quality Client.
·       You manage security and backup and restoration of DQS databases with SQL Server Management Studio or with Transact-SQL commands.
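As a sketch of the Transact-SQL approach, the three databases that DQS creates during installation can be backed up like any other user databases (the disk paths below are hypothetical):

```sql
-- Back up the three DQS databases; adjust the target paths for your environment.
BACKUP DATABASE DQS_MAIN         TO DISK = N'C:\Backup\DQS_MAIN.bak';
BACKUP DATABASE DQS_PROJECTS     TO DISK = N'C:\Backup\DQS_PROJECTS.bak';
BACKUP DATABASE DQS_STAGING_DATA TO DISK = N'C:\Backup\DQS_STAGING_DATA.bak';
```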

Chapter 15 Implementing Master Data Services  565

1.     Lesson 1: Defining Master Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .566

a.     What Is Master Data?  567

b.     Master Data Management  569

c.      MDM Challenges  572

d.     Lesson Summary  574

·       Master data management is a set of coordinated processes, policies, tools, and technologies used to create and maintain accurate master data.
·       Master data management has to deal with many challenges.
·       Master data typically needs auditing, versioning, or some other kind of maintenance of history.
·       Reuse increases the value of master data.
·       If you use a data warehouse, you should map your master data to dimensions in the data warehouse.
·       SQL Server 2012 Master Data Services (MDS) is a specialized master data management application.

2.     Lesson 2: Installing Master Data Services . . . . . . . . . . . . . . . . . . . . . . . . . . .575

a.     Master Data Services Architecture  576

b.     MDS Installation  577

c.      Lesson Summary  587

·       During the pre-installation tasks, you have to prepare your system for the MDS database and the web application.
·       During the installation operations, you use SQL Server Setup to install MDS.
·       During the post-installation tasks, you configure the MDS database, the Master Data Manager web application, and the MDS add-in for Excel.

3.     Lesson 3: Creating a Master Data Services Model . . . . . . . . . . . . . . . . . . .588

a.     MDS Models and Objects in Models  588

b.     MDS Objects  589

c.      Lesson Summary  599

·       In MDS, models are containers for other objects.
·       Entities have attributes, derived and explicit hierarchies, collections, and attribute groups.
·       Attributes can be free-form, file, or domain based.
·       You can define several business rules for each attribute.

Chapter 16 Managing Master Data  605

1.     Lesson 1: Importing and Exporting Master Data . . . . . . . . . . . . . . . . . . . .606

a.     Creating and Deploying MDS Packages  606

b.     Importing Batches of Data  607

c.      Exporting Data  609

d.     Lesson Summary  615

·       You can use model deployment packages to export and import metadata, and in some cases data as well.
·       You can use the staging process for importing batches of data.
·       Applications can use the MDS web service and subscription views to get master data from an MDS instance.
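Subscription views are created in the Integration Management area of Master Data Manager and appear in the mdm schema of the MDS database. A minimal sketch of how an application might read master data through such a view (the view name mdm.Product and the version name are hypothetical):

```sql
-- Read master data through a subscription view; mdm.Product is a hypothetical view name.
SELECT Code, Name
FROM mdm.Product
WHERE VersionName = N'VERSION_1';
```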

2.     Lesson 2: Defining Master Data Security . . . . . . . . . . . . . . . . . . . . . . . . . . .616

a.     Users and Permissions  617

b.     Overlapping Permissions  619

c.      Lesson Summary  624

·       Keep security settings simple.
·       An end user must have at least Explorer functional area access and permissions on at least one object from a model in order to get access to the model.
·       Overlapping permissions make security more complicated.

3.     Lesson 3: Using Master Data Services Add-in for Excel . . . . . . . . . . . . . . .624

a.     Editing MDS Data in Excel  625

b.     Creating MDS Objects in Excel  627

c.      Lesson Summary  632

·       With MDS Add-in for Excel, you can edit MDS data in batches by using the full capabilities of Excel.
·       Advanced users can even create new entities and manage attributes from Excel.
·       MDS maintains security and validation for Excel data the same way it does for data updated with Master Data Manager.

Chapter 17 Creating a Data Quality Project to Clean Data  637

1.     Lesson 1: Creating and Maintaining a Knowledge Base . . . . . . . . . . . . . .638

a.     Building a DQS Knowledge Base  638

b.     Domain Management  639

c.      Lesson Summary  645

·       You can start building a knowledge base with knowledge discovery.
·       You can manually edit domains after the discovery process is finished.
·       You can import and export knowledge bases and domains.

2.     Lesson 2: Creating a Data Quality Project . . . . . . . . . . . . . . . . . . . . . . . . . .646

a.     DQS Projects  646

b.     Data Cleansing  647

c.      Lesson Summary  653

·       A DQS project has four stages: mapping, computer-assisted cleansing, interactive cleansing, and export.
·       You can cleanse data that comes from SQL Server or from Excel.
·       You never modify the source data directly; you can export the data with the cleansing info as the result of your cleansing project.

3.     Lesson 3: Profiling Data and Improving Data Quality . . . . . . . . . . . . . . . .654

a.     Using Queries to Profile Data  654

b.     SSIS Data Profiling Task  656

c.      Lesson Summary  659

·       In addition to DQS, you can use many other tools from the SQL Server suite to improve the quality of your data.
·       The SSIS Data Profiling task is easy to use for quick profiling of your data.

Part VI Advanced SSIS and Data Quality Topics

Chapter 18 SSIS and Data Mining  667

1.     Lesson 1: Data Mining Task and Transformation . . . . . . . . . . . . . . . . . . . . .668

a.     What Is Data Mining?  668

b.     SSAS Data Mining Algorithms  670

c.      Using Data Mining Predictions in SSIS  671

d.     Lesson Summary  679

·       You create data mining models with SSDT by using the Analysis Services Multidimensional and Data Mining Project template.
·       You can use mining model predictions in SSIS packages.

2.     Lesson 2: Text Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .679

a.     Term Extraction  680

b.     Term Lookup  681

c.      Lesson Summary  686

·       You can extract terms with the Term Extraction transformation.
·       You can count how many times a term appears in a document with the Term Lookup transformation.

3.     Lesson 3: Preparing Data for Data Mining . . . . . . . . . . . . . . . . . . . . . . . . . .687

a.     Preparing the Data  688

b.     SSIS Sampling  689

c.      Lesson Summary  693

·       You have to prepare your data appropriately before data mining.
·       You can use the Percentage Sampling and Row Sampling transformations to split data into training and test sets.

Chapter 19 Implementing Custom Code in SSIS Packages  699

1.     Lesson 1: Script Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .700

a.     Configuring the Script Task  701

b.     Coding the Script Task  702

c.      Lesson Summary  707

·       You can extend the control flow of your packages with the script task.
·       Before writing the code, you configure the script task.

2.     Lesson 2: Script Component . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .707

a.     Configuring the Script Component  708

b.     Coding the Script Component  709

c.      Lesson Summary  715

·       For the script component, you need to configure inputs and outputs.
·       An output of an SSIS component can be synchronous or asynchronous.
·       The run-time methods of the script component are the same as the run-time methods of a custom component.

3.     Lesson 3: Implementing Custom Components . . . . . . . . . . . . . . . . . . . . . .716

a.     Planning a Custom Component  717

b.     Developing a Custom Component  718

c.      Design Time and Run Time  719

d.     Design-Time Methods  719

e.      Run-Time Methods  721

f.       Lesson Summary  730

·       If a particular problem cannot be solved by using any of the standard built-in SSIS data flow components, you can design your own custom component: a custom source, a custom destination, or a custom transformation component.
·       Before custom components can be used in SSIS development, they need to be deployed; this means that they must be copied to the workstation and registered in the workstation’s global assembly cache. Of course, to be used in deployed SSIS processes, the components must also be deployed on the target server.

Chapter 20 Identity Mapping and De-Duplicating  735

1.     Lesson 1: Understanding the Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . .736

a.     Identity Mapping and De-Duplicating Problems  736

b.     Solving the Problems  738

c.      Lesson Summary  744

·       You can use DQS and SSIS from the SQL Server suite for identity mapping and de-duplicating tasks.
·       Data merging has to be done based on similarities of strings.
·       When performing the matching, you have to avoid cross joins of huge tables.
·       You can even use MDS functions for your own mapping procedures.

2.     Lesson 2: Using DQS and the DQS Cleansing Transformation . . . . . . . . . 745

a.     DQS Cleansing Transformation  746

b.     DQS Matching  746

c.      Lesson Summary  755

·       You can use the DQS Cleansing transformation to take advantage of DQS knowledge bases to correct data in the SSIS data flow.
·       For DQS matching, use a matching policy DQS knowledge base.
·       You can use sample data to speed up the learning process when creating a matching policy DQS knowledge base.

3.     Lesson 3: Implementing SSIS Fuzzy Transformations . . . . . . . . . . . . . . . . .756

a.     Fuzzy Transformations Algorithm  756

b.     Versions of Fuzzy Transformations 758

c.      Lesson Summary  764

·       SSIS Fuzzy Lookup and Fuzzy Grouping can handle larger volumes of data than DQS can.
·       DQS matching is very useful for master data matching when you are using MDS Add-in for Excel to edit your master data.