2005
|
2008/2008R2
|
2012/2014
|
|
Business
Intelligence Transformations
-
Perform business intelligence operations
such as cleaning data, mining text, and running data mining prediction
queries.
|
|||
Slowly
Changing Dimension Transformation
Fuzzy
Grouping Transformation
Fuzzy
Lookup Transformation
Term
Extraction Transformation
Term
Lookup Transformation
Data
Mining Query Transformation
|
DQS Cleansing Transformation added in 2012
|
||
Row
Transformations
-
Update column values and create new
columns. The transformation is applied to each row in the transformation
input.
|
|||
Character Map Transformation
Copy Column Transformation
Data Conversion Transformation
Derived Column Transformation
Export Column Transformation
Import Column Transformation
Script Component
OLE DB Command Transformation
|
|||
Rowset
Transformations
-
Create new rowsets. The rowset can
include aggregate and sorted values, sample rowsets, or pivoted and unpivoted
rowsets.
|
|||
Aggregate Transformation
Sort Transformation
Percentage Sampling Transformation
Row Sampling Transformation
Pivot Transformation
Unpivot Transformation
|
|||
Split and Join
Transformations
- Distribute rows to different outputs, create copies of the transformation inputs, join multiple inputs into one output, and perform lookup operations.
|
|||
Conditional Split Transformation
Multicast Transformation
Union All Transformation
Merge Transformation
Merge Join Transformation
Lookup Transformation
|
Cache Transformation added in 2008
|
Balanced
Data Distributor Transformation added in 2014
|
|
Auditing Transformations
-
Add audit information and count rows.
|
|||
Audit Transformation
Row Count Transformation
|
|||
Custom
Transformations
-
Write custom transformations.
|
SQL Server Administration, Development and B.I. Development related
Changes of the Data Transformations in SSIS from SQL 2005 to SQL Server 2014
Changes of the Data Destinations in SSIS from SQL 2005 to SQL Server 2014
2005
|
2008/2008R2
|
2012/2014
|
ADO
NET Destination added in 2008
|
||
Trains data mining
models.
|
||
Exposes the data in a
data flow by using the ADO.NET DataReader interface.
|
||
Loads and processes
an SQL Server 2005 Analysis Services (SSAS) dimension.
|
||
Writes data to an
Excel Workbook.
|
||
Write data to a flat
file.
|
||
ODBC Destination added in 2012
|
||
Loads data using an
OLE DB provider.
|
||
Loads and processes
an Analysis Services partition.
|
||
Writes raw data to a
file
|
||
Creates an ADO
recordset.
|
||
Uses script to
extract, transform, or load data.
|
||
Inserts rows into a
SQL Server 2005 Compact Edition database.
|
||
Bulk inserts data
into a SQL Server 2005 table or view.
|
Changes of the Data Sources in SSIS from SQL 2005 to SQL Server 2014
2005
|
2008/2008R2
|
2012
|
2014
|
Consumes data from a
.NET Framework data provider.
|
Removed, replaced with
ADO
NET Source in 2008
|
||
CDC Source added
|
|||
Extracts data from an
Excel file.
|
|||
Extracts data from a
flat file.
|
|||
ODBC Source added
|
|||
Consumes data from an
OLE DB provider.
|
|||
Extracts raw data
from a file.
|
|||
Uses script to
extract, transform, or load data.
|
|||
Extracts data from an
XML file.
|
Changes of the Control Flow Tasks in SSIS from SQL 2005 to SQL Server 2014
2005
|
2008/2008R2
|
2012/2014
|
|
Data Flow Task
-
Runs data flows to extract data, apply
column level transformations, and load data.
|
|||
Data Flow
|
|||
Data Preparation Tasks
-
Copy files and directories; download
files and data; run Web methods; apply operations to XML documents; and
profile data for cleansing.
|
|||
File System
FTP
Web Service
XML
|
Data Profiling(added in 08)
|
Expression added in 2012
|
|
Workflow Tasks
-
Communicate with other processes to run
packages, run programs or batch files, send and receive messages between
packages, send e-mail messages, read Windows Management Instrumentation (WMI)
data, and watch for WMI events.
|
|||
Execute DTS 2000 Package
Execute Package
Execute Process
Message Queue
Send Mail
WMI Data Reader
WMI Event Watcher
|
For back-compatibility
|
Removed in 2012
|
|
SQL Server Tasks
-
Access, copy, insert, delete, and modify
SQL Server objects and data.
|
|||
Bulk Insert
Execute SQL
Transfer Database
Transfer Error Message
Transfer Jobs
Transfer Logins
Transfer Master Stored Procedures
Transfer SQL Server Objects
|
CDC Control added in 2012
|
||
Scripting Tasks
-
Extend package functionality by using
scripts.
|
|||
ActiveX Script
Script
|
For back-compatibility
|
Removed in 2012
|
|
Analysis Service Tasks
-
Create, modify, delete, and process
Analysis Services objects.
|
|||
Analysis Services Execute DDL
Analysis Services Processing
Data Mining
|
|||
Maintenance Tasks
-
Perform administrative functions such as
backing up and shrinking SQL Server databases, rebuilding and reorganizing
indexes, and running SQL Server Agent jobs.
|
|||
Backup Database
Check Data Integrity
Execute SQL Server Agent Jobs
Execute T-SQL Statement
History Cleanup
Notify operator
Rebuild index
Reorganize index
Shrink Database
Update Statistics
|
Maintenance
Cleanup added in 08
|
||
Custom Tasks
-
Write custom tasks using a programming
language that supports COM, such as Visual Basic, or a .NET programming
language, such as C#.
|
SQL Server BI Tools - Confusing...
The happy beginning:
· The
BI Tool in SQL Server 2005 uses the Visual Studio 2005 Shell.
· The
BI Tool in SQL Server 2008 uses the Visual Studio 2008 Shell.
· The
BI Tool in SQL Server 2008 R2 uses the Visual Studio 2008 Shell as well.
The separation starts:
SQL Server 2010 did not
get released, but Visual Studio 2010 was released, and it includes SQL Server
Data Tools. However, although it sounds it is, it is NOT for SQL Server BI
projects. It appears that Visual Studio 2010 can install either the add-in
tools for building Integration Services (SSIS), Analysis Services (SSAS) &
Reporting Services (SSRS) solutions (let’s call that suite SSDT-BIDS) or the
add-in tools for building SQL Server database solutions/projects (let’s call that
suite SSDT-DB), but not both. More importantly, the SSDT in Visual
Studio 2010 is NOT real BI tool. For instance, if you needed to create a
reporting services project, you still need Visual Studio 2008. The SQL Server
Data Tools team on Visual Studio was a separate product team that just worked
on the database development features. The business intelligence tools were not
their priority.
When SQL Server 2012 was
released, its installation has “SQL Server Data Tools” on the Feature Selection
page, this option installs SSDT-BIDS as well as the Visual Studio 2010
Integrated Shell, but does not include SSDT-DB – the VS programming
languages and the features that support their respective project systems.
Instead of installing SSDT-DB it installs a pointer to a web install that you
will see when you try to create a new database project in Visual Studio 2010
(in other words it installs a “stub project” that allows SQL Server 2012
users to acquire the latest SSDT version from the web). So to install
SSDT-DB, follow Install
SQL Server Data Tools (that’s from the VS team).
If you have not installed
SQL Server 2012 (or you did install it but did not select “SQL
Server Data Tools” on the Feature Selection page), and go to http://msdn.microsoft.com/en-us/data/gg427686 to
download and install SSDT, it will automatically install the Visual Studio 2010
Integrated Shell, apply SP1, and install SSDT-DB, but does NOT install
SSDT-BIDS. Then you install SQL Server 2012 and choose “SQL Server
Data Tools” on the Feature Selection page, now the installation will install
SSDT-BIDS. It will notice that you already installed the Visual Studio 2010
Integrated Shell and won’t install it again.
If you have Visual
Studio 2010 Professional Edition or above installed before you perform an
installation of SSDT through SQL server 2012 or on its own, then you must
install SP1 manually
before installing SSDT. The SSDT installation will integrate all the
functionalities of SSDT into your existing VS environment.
Please note that the
Visual Studio 2012 version of SQL Server Data Tools, like their 2010
predecessor were just for database development projects and not business
intelligence. To their credit, Microsoft’s SQL Server Data Tools team
maintained the ability to download and install the tool in two flavors, Visual
Studio 2010 and Visual Studio 2012. However, neither contained the BI tools.
Things have been
changing…the re-union is on the way
Recently, Microsoft
releases SQL Server Data Tools - Business Intelligence for Visual Studio
2012, which includes business intelligence project authoring tools and project
templates for SQL Server Analysis Services, Reporting Services, and Integration
Services integrated into a Visual Studio 2012 shell.
You can install SQL
Server Data Tools - Business Intelligence for Visual Studio 2012 as a
standalone business intelligence authoring environment or side-by-side with an
existing SQL Server Data Tools installed with SQL Server 2012.
If you already have
Visual Studio 2012 or SQL Server Data Tools installed, running SQL Server Setup
to install SQL Server Data Tools - Business Intelligence for Visual Studio 2012
will not overwrite your existing Visual Studio 2012 installation or SQL Server
Data Tools installation. Setup will install only the additional business
intelligence feature tools and project templates.
In short, the confusion
comes from the same term with different meanings. It appears that there is an
SQL Server Version of SSDT and a Visual Studio version of SSDT. The SQL Server
version focuses on BI, whereas the Visual Studio version concentrates on
database projects. In other words, you cannot do database projects or VS
programming in SQL Server SSDT, and you cannot do BI in the VS version of SSDT.
But the picture has changed. For instance, with the release of SQL Server
Data Tools - Business Intelligence for Visual Studio 2012, we can create
BI projects starting from Visual Studio 2012. We no longer need to maintain
multiple versions of Visual Studio to support classic development projects and
business intelligence projects.
If necessary, download Microsoft SQL Server Data Tools - Business Intelligence for Visual Studio 2012 here.
SQL Server 2014 basically
resembles SQL Server 2012. What you need is to install SQL Server Data Tools for
Visual Studio 2013 and/or SQL
Server Data Tools - Business Intelligence for Visual Studio 2013.
For SQL Server 2016 (up to CTP 3.3 so far), what you need is to
For SQL Server 2016 (up to CTP 3.3 so far), what you need is to
Download and Install Server Data Tools Preview in Visual Studio 2015.
References:
http://www.jamesserra.com/archive/2012/04/ssdt-installation-confusion/
A Quick Overview of the Book "Training Kit (Exam 70-463) Implementing a Data Warehouse with Microsoft SQL Server 2012" - Table of Contents and Summaries
Almost useless summaries, but great contents. Recommend to read the contents only.
Part I Designing and Implementing a Data Warehouse
Chapter 1 - Data Warehouse Logical Design
3
1.
Lesson 1: Introducing Star and Snowflake Schemas
. . . . . . . . . . . . . . . . . . . .
. . . . . 4
a.
Reporting Problems with a Normalized Schema 5
b.
Star Schema
7
c.
Snowflake Schema
9
d.
Granularity Level 12
e.
Auditing and Lineage 13
f.
Lesson Summary
16
2.
Lesson 2: Designing Dimensions . . . . .. . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . 17
a.
Dimension Column Types 17
b.
Hierarchies
19
c.
Slowly Changing Dimensions 21
d.
Lesson Summary
26
3.
Lesson 3: Designing Fact Tables . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . 27
a.
Fact Table Column Types 28
b.
Additivity of Measures 29
c.
Additivity of Measures in SSAS 30
d.
Many-to-Many Relationships 30
e.
Lesson Summary
33
Chapter 2 Implementing a Data Warehouse
41
1.
Lesson 1: Implementing Dimensions and Fact
Tables . . . . . . . . . . . . . . . . . 42
a.
Creating a Data Warehouse Database 42
b.
Implementing Dimensions 45
c.
Implementing Fact Tables 47
d.
Lesson Summary
54
2.
Lesson 2: Managing the Performance of a Data
Warehouse . . . . . . . . . . . 55
a.
Indexing Dimensions and Fact Tables 56
b.
Indexed Views
58
c.
Data Compression
61
d.
Columnstore Indexes and Batch Processing 62
e.
Lesson Summary
69
3.
Lesson 3: Loading and Auditing Loads . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . 70
a.
Using Partitions
71
b.
Data Lineage
73
c.
Lesson Summary
78
Part II Developing SSIS Packages
Chapter 3 creating ssis packages 87
1.
Lesson 1: Using the SQL Server Import and Export
Wizard . . . . . . . . . . . . 89
a.
Planning a Simple Data Movement 89
b.
Lesson Summary
99
2.
Lesson 2: Developing SSIS Packages in SSDT . . .
. . . . . . . . . . . . . . . . . . . . .101
a.
Introducing SSDT
102
b.
Lesson Summary
107
3.
Lesson 3: Introducing Control Flow, Data Flow,
and
a.
Connection Managers . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . .109
b.
Introducing SSIS Development 110
c.
Introducing SSIS Project Deployment 110
d.
Lesson Summary
124
Chapter 4 Designing and implementing control flow 131
1.
Lesson 1: Connection Managers . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .133
a.
Lesson Summary
144
2.
Lesson 2: Control Flow Tasks and Containers . .
. . . . . . . . . . . . . . . . . . . . .145
a.
Planning a Complex Data Movement 145
b.
Tasks 147
c.
Containers
155
d.
Lesson Summary
163
3.
Lesson 3: Precedence Constraints . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .164
a.
Lesson Summary
169
Chapter 5 Designing and Implementing Data flow 177
1.
Lesson 1: Defining Data Sources and Destinations
. . . . . . . . . . . . . . . . . . . 178
a.
Creating a Data Flow Task 178
b.
Defining Data Flow Source Adapters 180
c.
Defining Data Flow Destination Adapters 184
d.
SSIS Data Types
187
e.
Lesson Summary
197
2.
Lesson 2: Working with Data Flow Transformations
. . . . . . . . . . . . . . . . . .198
a.
Selecting Transformations 198
b.
Using Transformations 205
c.
Lesson Summary
215
3.
Lesson 3: Determining Appropriate ETL Strategy
and Tools . . . . . . . . . . .216
a.
ETL Strategy
217
b.
Lookup Transformations 218
c.
Sorting the Data
224
d.
Set-Based Updates 225
e.
Lesson Summary
231
Part III Enhancing SSIS Packages
Chapter 6 enhancing control flow 239
1.
Lesson 1: SSIS Variables . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
a.
System and User Variables 243
b.
Variable Data Types 245
c.
Variable Scope
248
d.
Property Parameterization 251
e.
Lesson Summary
253
2.
Lesson 2: Connection Managers, Tasks, and
Precedence
a.
Constraint Expressions . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . .254
b.
Expressions
255
c.
Property Expressions 259
d.
Precedence Constraint Expressions 259
e.
Lesson Summary
263
3.
Lesson 3: Using a Master Package for Advanced
Control Flow . . . . . . . .265
a.
Separating Workloads, Purposes, and
Objectives 267
b.
Harmonizing Workflow and Configuration 268
c.
The Execute Package Task 269
d.
The Execute SQL Server Agent Job Task 269
e.
The Execute Process Task 270
f.
Lesson Summary
275
Chapter 7 Enhancing Data flow 283
1.
Lesson 1: Slowly Changing Dimensions . . . . . .
. . . . . . . . . . . . . . . . . . . . . .284
a.
Defining Attribute Types 284
b.
Inferred Dimension Members 285
c.
Using the Slowly Changing Dimension Task 285
d.
Effectively Updating Dimensions 290
e.
Lesson Summary
298
2.
Lesson 2: Preparing a Package for Incremental
Load . . . . . . . . . . . . . . . . .299
a.
Using Dynamic SQL to Read Data 299
b.
Implementing CDC by Using SSIS 304
c.
ETL Strategy for Incrementally Loading Fact
Tables 307
d.
Lesson Summary
316
3.
Lesson 3: Error Flow . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
a.
Using Error Flows 317
b.
Lesson Summary
321
Chapter 8 creating a robust and restartable package 327
1.
Lesson 1: Package Transactions . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . .328
a.
Defining Package and Task Transaction
Settings 328
b.
Transaction Isolation Levels 331
c.
Manually Handling Transactions 332
d.
Lesson Summary
335
2.
Lesson 2: Checkpoints . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .336
a.
Implementing Restartability Checkpoints 336
b.
Lesson Summary
341
3.
Lesson 3: Event Handlers . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . .342
a.
Using Event Handlers 342
b.
Lesson Summary
346
Chapter 9 Implementing Dynamic Packages
353
1.
Lesson 1: Package-Level and Project-Level
Connection
a.
Managers and Parameters . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .354
b.
Using Project-Level Connection Managers 355
c.
Parameters
356
d.
Build Configurations in SQL Server 2012
Integration Services 358
e.
Property Expressions 361
f.
Lesson Summary
366
2.
Lesson 2: Package Configurations . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .367
a.
Implementing Package Configurations 368
b.
Lesson Summary
377
Chapter 10 Auditing and Logging 381
1.
Lesson 1: Logging Packages . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . .383
a.
Log Providers
383
b.
Configuring Logging 386
c.
Lesson Summary
393
2.
Lesson 2: Implementing Auditing and Lineage . .
. . . . . . . . . . . . . . . . . . . .394
a.
Auditing Techniques 395
b.
Correlating Audit Data with SSIS Logs 401
c.
Retention
401
d.
Lesson Summary
405
3.
Lesson 3: Preparing Package Templates . . . . .
. . . . . . . . . . . . . . . . . . . . . . .406
a.
SSIS Package Templates 407
b.
Lesson Summary
410
Part iv Managing and Maintaining SSIS
Packages
Chapter 11 Installing SSIS and Deploying Packages 421
1.
Lesson 1: Installing SSIS Components . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . .423
a.
Preparing an SSIS Installation 424
b.
Installing SSIS
428
c.
Lesson Summary
436
2.
Lesson 2: Deploying SSIS Packages . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .437
a.
SSISDB Catalog
438
b.
SSISDB Objects
440
c.
Project Deployment 442
d.
Lesson Summary
449
Chapter 12 executing and securing packages
455
1.
Lesson 1: Executing SSIS Packages . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . .456
a.
On-Demand SSIS Execution 457
b.
Automated SSIS Execution 462
c.
Monitoring SSIS Execution 465
d.
Lesson Summary
479
2.
Lesson 2: Securing SSIS Packages . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . .480
a.
SSISDB Security
481
b.
Lesson Summary
490
Chapter 13 troubleshooting and performance tuning 497
1.
Lesson 1: Troubleshooting Package Execution . .
. . . . . . . . . . . . . . . . . . . .498
a.
Design-Time Troubleshooting 498
b.
Production-Time Troubleshooting 506
c.
Lesson Summary
510
2.
Lesson 2: Performance Tuning . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . .511
a.
SSIS Data Flow Engine 512
b.
Data Flow Tuning Options 514
c.
Parallel Execution in SSIS 517
d.
Troubleshooting and Benchmarking
Performance 518
e.
Lesson Summary
522
Part V Building Data Quality Solutions
Chapter 14 Installing and Maintaining Data Quality Services 529
1.
Lesson 1: Data Quality Problems and Roles . . .
. . . . . . . . . . . . . . . . . . . . . .530
a.
Data Quality Dimensions 531
b.
Data Quality Activities and Roles 535
c.
Lesson Summary
539
2.
Lesson 2: Installing Data Quality Services . . .
. . . . . . . . . . . . . . . . . . . . . . . .540
a.
DQS Architecture
540
b.
DQS Installation
542
c.
Lesson Summary
548
3.
Lesson 3: Maintaining and Securing Data Quality
Services . . . . . . . . . . . .549
a.
Performing Administrative Activities with Data
Quality Client 549
b.
Performing Administrative Activities with Other
Tools 553
c.
Lesson Summary
558
Chapter 15 Implementing Master Data Services 565
1.
Lesson 1: Defining Master Data . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . .566
a.
What Is Master Data? 567
b.
Master Data Management 569
c.
MDM Challenges
572
d.
Lesson Summary
574
2.
Lesson 2: Installing Master Data Services . . .
. . . . . . . . . . . . . . . . . . . . . . . .575
a.
Master Data Services Architecture 576
b.
MDS Installation
577
c.
Lesson Summary
587
3.
Lesson 3: Creating a Master Data Services Model
. . . . . . . . . . . . . . . . . . .588
a.
MDS Models and Objects in Models 588
b.
MDS Objects
589
c.
Lesson Summary
599
Chapter 16 Managing Master Data 605
1.
Lesson 1: Importing and Exporting Master Data .
. . . . . . . . . . . . . . . . . . .606
a.
Creating and Deploying MDS Packages 606
b.
Importing Batches of Data 607
c.
Exporting Data
609
d.
Lesson Summary
615
2.
Lesson 2: Defining Master Data Security . . . .
. . . . . . . . . . . . . . . . . . . . . . .616
a.
Users and Permissions 617
b.
Overlapping Permissions 619
c.
Lesson Summary
624
3.
Lesson 3: Using Master Data Services Add-in for
Excel . . . . . . . . . . . . . . .624
a.
Editing MDS Data in Excel 625
b.
Creating MDS Objects in Excel 627
c.
Lesson Summary
632
Chapter 17 Creating a Data Quality Project to Clean Data 637
1.
Lesson 1: Creating and Maintaining a Knowledge
Base . . . . . . . . . . . . . .638
a.
Building a DQS Knowledge Base 638
b.
Domain Management 639
c.
Lesson Summary
645
2.
Lesson 2: Creating a Data Quality Project . . .
. . . . . . . . . . . . . . . . . . . . . . .646
a.
DQS Projects
646
b.
Data Cleansing
647
c.
Lesson Summary
653
3.
Lesson 3: Profiling Data and Improving Data
Quality . . . . . . . . . . . . . . . .654
a.
Using Queries to Profile Data 654
b.
SSIS Data Profiling Task 656
c.
Lesson Summary
659
Part VI Advanced SSIS and Data Quality Topics
Chapter 18 SSIS and Data Mining 667
1.
Lesson 1: Data Mining Task and Transformation .
. . . . . . . . . . . . . . . . . . . .668
a.
What Is Data Mining? 668
b.
SSAS Data Mining Algorithms 670
c.
Using Data Mining Predictions in SSIS 671
d.
Lesson Summary
679
2.
Lesson 2: Text Mining . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .679
a.
Term Extraction
680
b.
Term Lookup
681
c.
Lesson Summary
686
3.
Lesson 3: Preparing Data for Data Mining . . . .
. . . . . . . . . . . . . . . . . . . . . .687
a.
Preparing the Data 688
b.
SSIS Sampling
689
c.
Lesson Summary
693
Chapter 19 implementing custom code in ssis packages 699
1.
Lesson 1: Script Task . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .700
a.
Configuring the Script Task 701
b.
Coding the Script Task 702
c.
Lesson Summary
707
2.
Lesson 2: Script Component . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . .707
a.
Configuring the Script Component 708
b.
Coding the Script Component 709
c.
Lesson Summary
715
3.
Lesson 3: Implementing Custom Components . . . .
. . . . . . . . . . . . . . . . . .716
a.
Planning a Custom Component 717
b.
Developing a Custom Component 718
c.
Design Time and Run Time 719
d.
Design-Time Methods 719
e.
Run-Time Methods
721
f.
Lesson Summary
730
Chapter 20 Identity Mapping and De-Duplicating 735
1.
Lesson 1: Understanding the Problem . . . . . .
. . . . . . . . . . . . . . . . . . . . . . .736
a.
Identity Mapping and De-Duplicating
Problems 736
b.
Solving the Problems 738
c.
Lesson Summary
744
2.
Lesson 2: Using DQS and the DQS Cleansing Transformation
. . . . . . . . . 745
a.
DQS Cleansing Transformation 746
b.
DQS Matching
746
c.
Lesson Summary
755
3.
Lesson 3: Implementing SSIS Fuzzy
Transformations . . . . . . . . . . . . . . . . .756
a.
Fuzzy Transformations Algorithm 756
b.
Versions of Fuzzy Transformations 758
c.
Lesson Summary
764
Part I Designing and Implementing a Data Warehouse
Chapter 1 - Data Warehouse Logical Design
3
1.
Lesson 1: Introducing Star and Snowflake Schemas
. . . . . . . . . . . . . . . . . . . .
. . . . . 4
a.
Reporting Problems with a Normalized Schema 5
b.
Star Schema
7
c.
Snowflake Schema
9
d.
Granularity Level 12
e.
Auditing and Lineage 13
f.
Lesson Summary
16
·
The Star schema is the most common design for a
DW.
·
The Snowflake schema is more appropriate for POC
(proof-of-concept) projects.
·
You should also determine the granularity of
fact tables, as well as auditing and lineage needs (i.e., auditing in DW auditing).
2.
Lesson 2: Designing Dimensions . . . . .. . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . 17
a.
Dimension Column Types 17
b.
Hierarchies
19
c.
Slowly Changing Dimensions 21
d.
Lesson Summary
26
·
In a dimension, you have the following column
types: keys, names, attributes, member properties, translations, and lineage.
·
Some attributes form natural hierarchies.
·
There are standard solutions for the Slowly
Changing Dimensions (SCD) problem: Type I, II, III or a mixture.
3.
Lesson 3: Designing Fact Tables . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . 27
a.
Fact Table Column Types 28
b.
Additivity of Measures 29
c.
Additivity of Measures in SSAS 30
d.
Many-to-Many Relationships 30
e.
Lesson Summary
33
·
Fact tables include measures, foreign keys, and
possibly an additional primary key and lineage columns.
·
Measures can be additive, non-additive, or
semi-additive.
·
For many-to-many relationships, you can introduce
an additional intermediate dimension.
Chapter 2 Implementing a Data Warehouse
41
1.
Lesson 1: Implementing Dimensions and Fact
Tables . . . . . . . . . . . . . . . . . 42
a.
Creating a Data Warehouse Database 42
b.
Implementing Dimensions 45
c.
Implementing Fact Tables 47
d.
Lesson Summary
54
·
For a data warehouse database, you should use
the Simple recovery model.
·
When creating a database, allocate enough space
for data files and log files to prevent autogrowth of the files.
·
Use surrogate keys in dimensions in which you
expect SCD Type 2 changes.
·
Use computed columns.
2.
Lesson 2: Managing the Performance of a Data
Warehouse . . . . . . . . . . . 55
a.
Indexing Dimensions and Fact Tables 56
b.
Indexed Views
58
c.
Data Compression
61
d.
Columnstore Indexes and Batch Processing 62
e.
Lesson Summary
69
·
In this lesson, you learned how to optimize data
warehouse query performance.
·
In a DW, you should not use many nonclustered
indexes.
·
Use small, integer surrogate columns for
clustered primary keys.
·
Use indexed views.
·
Use columnstore indexes and exploit batch
processing.
3.
Lesson 3: Loading and Auditing Loads . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . 70
a.
Using Partitions
71
b.
Data Lineage
73
c.
Lesson Summary
78
·
Table partitioning is extremely useful for large
fact tables with columnstore indexes.
·
Partition switch is a metadata operation only if
an index is aligned with its base table.
·
You can add lineage information to your
dimensions and fact tables to audit changes to your DW on a row level.
Part II Developing SSIS Packages
Chapter 3 creating ssis packages 87
1.
Lesson 1: Using the SQL Server Import and Export
Wizard . . . . . . . . . . . . 89
a.
Planning a Simple Data Movement 89
b.
Lesson Summary
99
·
The SQL Server Import and Export Wizard can be
used for simple data movement operations.
·
The wizard allows you to create the destination
database.
·
Multiple objects can be transferred in the same
operation.
·
If the destination objects do not already exist,
they can be created by the process.
·
The SSIS package created by the wizard can be
saved and reused.
2.
Lesson 2: Developing SSIS Packages in SSDT . . .
. . . . . . . . . . . . . . . . . . . . .101
a.
Introducing SSDT
102
b.
Lesson Summary
107
·
SSIS projects are developed by using SSDT, a
specialized version of Visual Studio.
·
SSDT provides the complete integrated
development environment (IDE) required for efficient development of SSIS
packages.
·
The SSIS toolbox is context-aware and will
either allow you access to control flow tasks or data flow components,
depending on whether you are designing the control flow or the data flow.
3.
Lesson 3: Introducing Control Flow, Data Flow,
and
a.
Connection Managers . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . .109
b.
Introducing SSIS Development 110
c.
Introducing SSIS Project Deployment 110
d.
Lesson Summary
124
·
Existing SSIS packages can be added to SSIS
projects in SQL Server Data Tools (SSDT).
·
Control flows contain the definitions of data
management operations.
·
Control flows determine the order and the
conditions of execution.
·
SSIS package settings can be parameterized,
which allows them to be changed without direct access to SSIS package
definitions.
Chapter 4 Designing and implementing control flow 131
1.
Lesson 1: Connection Managers . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .133
a.
Lesson Summary
144
·
Connection managers are used to establish
connections to data sources.
·
Different data sources require different types
of connection managers.
·
The usability of a connection manager within an
SSIS project or an SSIS package is determined by its scope.
2.
Lesson 2: Control Flow Tasks and Containers . .
. . . . . . . . . . . . . . . . . . . . .145
a.
Planning a Complex Data Movement 145
b.
Tasks 147
c.
Containers
155
d.
Lesson Summary
163
·
A rich collection of tasks supporting the most
common data management operations is provided by the SSIS design model.
·
Control flow is defined by precedence
constraints that determine the order and conditions of execution.
·
Tasks representing logical units of work can be
grouped in containers.
·
Loop containers allow a unit of work to be
executed repeatedly.
3.
Lesson 3: Precedence Constraints . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .164
a.
Lesson Summary
169
·
Precedence constraints determine the order of
execution and the conditions that must be met for the process to either
continue or stop.
·
Precedence constraints can even be used to allow
the process to recover from failures.
Chapter 5 Designing and Implementing Data flow 177
1.
Lesson 1: Defining Data Sources and Destinations
. . . . . . . . . . . . . . . . . . . 178
a.
Creating a Data Flow Task 178
b.
Defining Data Flow Source Adapters 180
c.
Defining Data Flow Destination Adapters 184
d.
SSIS Data Types
187
e.
Lesson Summary
197
·
Use appropriate data source or data destination
adapters.
·
Always extract only the columns you need.
·
Use Fast Load or Batch mode when inserting data
by using an ODBC or OLE DB destination adapter.
·
Use a Raw File destination if you have to
temporarily store data to be used by SSIS later.
2.
Lesson 2: Working with Data Flow Transformations
. . . . . . . . . . . . . . . . . .198
a.
Selecting Transformations 198
b.
Using Transformations 205
c.
Lesson Summary
215
·
Remember which transformations are non-blocking,
partly-blocking, and blocking.
·
Use the Resolve References dialog box to solve
mapping errors.
·
Use Derived Column transformation to add new
columns or replace the value in existing ones.
3.
Lesson 3: Determining Appropriate ETL Strategy
and Tools . . . . . . . . . . .216
a.
ETL Strategy
217
b.
Lookup Transformations 218
c.
Sorting the Data
224
d.
Set-Based Updates 225
e.
Lesson Summary
231
·
Use sorting on the database layer as much as
possible.
·
When joining large tables, consider doing so on
the database layer.
·
Insert the data that needs to be updated into a
temporary table and then perform a set-based update using SQL.
Part III Enhancing SSIS Packages
Chapter 6 enhancing control flow 239
1.
Lesson 1: SSIS Variables . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
a.
System and User Variables 243
b.
Variable Data Types 245
c.
Variable Scope
248
d.
Property Parameterization 251
e.
Lesson Summary
253
·
You can use variables in SSIS packages to
determine certain values once, and then reuse them multiple times.
·
Variable values can be assigned literally or by
using expressions. SSIS implements a variety of data types to be used in SSIS
variables.
·
SSIS variables can be assigned dynamically as
the package is execute - either once per execution or iteratively.
·
Variable accessibility is determined by scope.
·
SSIS variables can be used to parameterize SSIS
object properties.
2.
Lesson 2: Connection Managers, Tasks, and
Precedence
a.
Constraint Expressions . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . .254
b.
Expressions
255
c.
Property Expressions 259
d.
Precedence Constraint Expressions 259
e.
Lesson Summary
263
·
The SSIS runtime provides information about the
execution environment and other system information via SSIS system variables.
·
Expressions can be used to compute the values of
SSIS object properties at run time.
·
Variables and expressions can also be used to
extend the elementary functionality of precedence constraints based on
information available at run time that usually is not available at design time.
3.
Lesson 3: Using a Master Package for Advanced
Control Flow . . . . . . . .265
a.
Separating Workloads, Purposes, and
Objectives 267
b.
Harmonizing Workflow and Configuration 268
c.
The Execute Package Task 269
d.
The Execute SQL Server Agent Job Task 269
e.
The Execute Process Task 270
f.
Lesson Summary
275
·
SSIS operations can be distributed across
multiple SSIS packages by using the master package concept.
·
The master package (also referred to as the
parent package) can execute dependent packages (or child packages) by using the
Execute Package task; this task can also be used to set child package
parameters from the parent package at run time.
Chapter 7 Enhancing Data flow 283
1.
Lesson 1: Slowly Changing Dimensions . . . . . .
. . . . . . . . . . . . . . . . . . . . . .284
a.
Defining Attribute Types 284
b.
Inferred Dimension Members 285
c.
Using the Slowly Changing Dimension Task 285
d.
Effectively Updating Dimensions 290
e.
Lesson Summary
298
·
Define each attribute’s SCD type in the data
modeling phase of the data warehouse project.
·
Use the Slowly Changing Dimension Wizard for
dimensions with a small number of rows.
·
Use an alternative solution to the Slowly
Changing Dimension Wizard for larger dimensions to solve the problem of
updating the Type 1 and Type 2 SCD attributes of a dimension.
2.
Lesson 2: Preparing a Package for Incremental
Load . . . . . . . . . . . . . . . . .299
a.
Using Dynamic SQL to Read Data 299
b.
Implementing CDC by Using SSIS 304
c.
ETL Strategy for Incrementally Loading Fact
Tables 307
d.
Lesson Summary
316
·
You can use expressions to dynamically set
properties for tasks and transformations inside the data flow task.
·
Use CDC when you cannot define an exact rule for
changed data and when your source systems support this functionality.
·
Use partitions for your fact tables and apply
partition switching instead of deleting data.
3.
Lesson 3: Error Flow . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
a.
Using Error Flows 317
b.
Lesson Summary
321
·
Error flow is a very powerful method for making
your data flow task robust when you are loading data into a data warehouse.
·
Use error flows to capture problematic rows and
store them in a table or file for business users to inspect.
Chapter 8 creating a robust and restartable package 327
1.
Lesson 1: Package Transactions . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . .328
a.
Defining Package and Task Transaction
Settings 328
b.
Transaction Isolation Levels 331
c.
Manually Handling Transactions 332
d.
Lesson Summary
335
·
You can enlist a container or a task to be part
of a transaction.
·
It is important to understand different
transaction isolation levels.
2.
Lesson 2: Checkpoints . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .336
a.
Implementing Restartability Checkpoints 336
b.
Lesson Summary
341
·
Enable checkpoints to restart your package from
the last successful step.
·
Enable checkpoints on the package level and then
for each task inside the control flow.
3.
Lesson 3: Event Handlers . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . .342
a.
Using Event Handlers 342
b.
Lesson Summary
346
·
Event handlers enable you to have more control
over the execution of a package.
·
You can use all the control flow tasks and containers
when creating event handlers.
·
Use event handlers if you need to integrate the
execution information of SSIS packages into a central logging database.
Chapter 9 Implementing Dynamic Packages
353
1.
Lesson 1: Package-Level and Project-Level
Connection
a.
Managers and Parameters . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .354
b.
Using Project-Level Connection Managers 355
c.
Parameters
356
d.
Build Configurations in SQL Server 2012
Integration Services 358
e.
Property Expressions 361
f.
Lesson Summary
366
·
Use parameters to set up connection properties
at run time.
·
Parameters and project-level connection mangers
can only be used with the new project deployment model introduced with SSIS in
SQL Server 2012.
·
Use property expressions to change the control
flow properties at run time.
2.
Lesson 2: Package Configurations . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .367
a.
Implementing Package Configurations 368
b.
Lesson Summary
377
·
Package configurations are available in the package
deployment model.
·
Use package configurations if you are using
previous versions of SSIS to set connection properties at run time.
·
Use a combination of XML and SQL Server
configurations to provide additional portability for your packages.
Chapter 10 Auditing and Logging 381
1.
Lesson 1: Logging Packages . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . .383
a.
Log Providers
383
b.
Configuring Logging 386
c.
Lesson Summary
393
·
Before enabling SSIS package logging, you should
determine how much information you actually need to log and choose the
appropriate log providers.
·
Depending on the selected log provider, log
information is written to the Windows logs, to a SQL Server Profiler trace, to
a database, or to files in the file system.
2.
Lesson 2: Implementing Auditing and Lineage . .
. . . . . . . . . . . . . . . . . . . .394
a.
Auditing Techniques 395
b.
Correlating Audit Data with SSIS Logs 401
c.
Retention
401
d.
Lesson Summary
405
·
SSIS auditing complements SSIS logging by
providing additional information about how the data in the database was
affected by the SSIS process.
·
The SSIS run time provides information about the
current execution instance, which you can include in your audits.
3.
Lesson 3: Preparing Package Templates . . . . .
. . . . . . . . . . . . . . . . . . . . . . .406
a.
SSIS Package Templates 407
b.
Lesson Summary
410
·
You can prepare SSIS package templates to
simplify SSIS package development.
·
When you share SSIS package templates with your
colleagues, your team can implement common SSIS development practices.
Part iv Managing and Maintaining SSIS
Packages
Chapter 11 Installing SSIS and Deploying Packages 421
1.
Lesson 1: Installing SSIS Components . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . .423
a.
Preparing an SSIS Installation 424
b.
Installing SSIS
428
c.
Lesson Summary
436
·
Before attempting the installation of SSIS, you
need to know whether the target environment is going to be used in development
and testing or in production, whether the environment is a 32-bit or 64-bit environment,
and whether SQL Server has already been installed on the target server.
·
Service accounts demand special consideration
for security and manageability reasons.
2.
Lesson 2: Deploying SSIS Packages . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .437
a.
SSISDB Catalog
438
b.
SSISDB Objects
440
c.
Project Deployment 442
d.
Lesson Summary
449
·
In SQL Server 2012, deployed SSIS projects are
stored in the SSISDB catalog.
·
The SSISDB catalog is used to store SSIS project
and package definitions, as well as other SSIS metadata.
Chapter 12 executing and securing packages
455
1.
Lesson 1: Executing SSIS Packages . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . .456
a.
On-Demand SSIS Execution 457
b.
Automated SSIS Execution 462
c.
Monitoring SSIS Execution 465
d.
Lesson Summary
479
·
SSIS packages can be executed manually (for
instance, by using SSMS or the DTExec utility) or programmatically (for
instance, by using Windows PowerShell or proprietary applications). SSIS
execution can even be automated—for instance, by using SQL Server Agent jobs.
·
You are free to determine the most appropriate
methods to execute SSIS processes in your environment.
·
SSIS execution monitoring is available in SSMS
through standard built-in reports, but you can also implement your own custom
reports if needed.
2.
Lesson 2: Securing SSIS Packages . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . .480
a.
SSISDB Security
481
b.
Lesson Summary
490
·
SSISDB catalog security is determined based on
information about which SSISDB principal (SSISDB user) is allowed access to
which SSISDB securables (folders, projects, and environments) and what the
nature of this access is (to view, to execute, to modify, or to remove them).
·
Only members of the ssis_admin database role
have full access to any of the objects stored in the SSISDB catalog.
·
Permissions on SSIS folders, projects, and
environments are managed explicitly; permissions on packages, environment
references, and variables, are inherited from the objects they belong to.
Chapter 13 troubleshooting and performance tuning 497
1.
Lesson 1: Troubleshooting Package Execution . .
. . . . . . . . . . . . . . . . . . . .498
a.
Design-Time Troubleshooting 498
b.
Production-Time Troubleshooting 506
c.
Lesson Summary
510
·
Use breakpoints when debugging packages at
design time.
·
SQL Server 2012 provides new ways to efficiently
monitor and troubleshoot packages in a production environment.
2.
Lesson 2: Performance Tuning . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . .511
a.
SSIS Data Flow Engine 512
b.
Data Flow Tuning Options 514
c.
Parallel Execution in SSIS 517
d.
Troubleshooting and Benchmarking
Performance 518
e.
Lesson Summary
522
·
Package design is crucial for good performance.
·
Monitor package execution with the new
functionality of SSISDB catalog views.
Part V Building Data Quality Solutions
Chapter 14 Installing and Maintaining Data Quality Services 529
1.
Lesson 1: Data Quality Problems and Roles . . .
. . . . . . . . . . . . . . . . . . . . . .530
a.
Data Quality Dimensions 531
b.
Data Quality Activities and Roles 535
c.
Lesson Summary
539
·
Data quality issues can be categorized into data
quality dimensions.
·
Data governance is the activity of taking care
of data quality, and data stewards are people responsible for the quality of
particular data.
2.
Lesson 2: Installing Data Quality Services . . .
. . . . . . . . . . . . . . . . . . . . . . . .540
a.
DQS Architecture
540
b.
DQS Installation
542
c.
Lesson Summary
548
·
Data Quality Services consists of Data Quality
Server and Data Quality Client.
·
There are some prerequisites for installing both
components.
·
You start installation with SQL Server Setup.
·
You finish the installation with the
DQSInstaller.exe application.
3.
Lesson 3: Maintaining and Securing Data Quality
Services . . . . . . . . . . . .549
a.
Performing Administrative Activities with Data
Quality Client 549
b.
Performing Administrative Activities with Other
Tools 553
c.
Lesson Summary
558
·
DQS administration involves setting up reference
data sources, configuring general and log settings, monitoring activity,
managing security, and backing up and restoring DQS databases.
·
You perform all of these tasks except for
security management and backup and restoration with Data Quality Client.
·
You manage security and backup and restoration
of DQS databases with SQL Server Management Studio or with Transact-SQL
commands.
Chapter 15 Implementing Master Data Services 565
1.
Lesson 1: Defining Master Data . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . .566
a.
What Is Master Data? 567
b.
Master Data Management 569
c.
MDM Challenges
572
d.
Lesson Summary
574
·
Master data management is a set of coordinated
processes, policies, tools, and technologies used to create and maintain
accurate master data.
·
Master data management has to deal with many
challenges.
·
Master data typically needs auditing,
versioning, or some other kind of maintenance of history.
·
Reusage increases the value of master data.
·
If you use a data warehouse, you should map your
master data to dimensions in the data warehouse.
·
SQL Server 2012 Master Data Services (MDS) is a
specialized master data management application.
2.
Lesson 2: Installing Master Data Services . . .
. . . . . . . . . . . . . . . . . . . . . . . .575
a.
Master Data Services Architecture 576
b.
MDS Installation
577
c.
Lesson Summary
587
·
During the pre-installation tasks, you have to
prepare your system for the MDS database and the web application.
·
During the installation operations, you use SQL
Server Setup to install MDS.
·
During the post-installation tasks, you
configure the MDS database, the Master Data Manager web application, and the
MDS add-in for Excel.
3.
Lesson 3: Creating a Master Data Services Model
. . . . . . . . . . . . . . . . . . .588
a.
MDS Models and Objects in Models 588
b.
MDS Objects
589
c.
Lesson Summary
599
·
In MDS, models are containers for other objects.
·
Entities have attributes, derived and explicit
hierarchies, collections, and attribute groups.
·
Attributes can be free-form, file, or domain
based.
·
You can define several business rules for each
attribute.
Chapter 16 Managing Master Data 605
1.
Lesson 1: Importing and Exporting Master Data .
. . . . . . . . . . . . . . . . . . .606
a.
Creating and Deploying MDS Packages 606
b.
Importing Batches of Data 607
c.
Exporting Data
609
d.
Lesson Summary
615
·
You can use model deployment packages to export
and import metadata, and in some cases data as well.
·
You can use the staging process for importing
batches of data.
·
Applications can use the MDS web service and
subscription views to get master data from an MDS instance.
2.
Lesson 2: Defining Master Data Security . . . .
. . . . . . . . . . . . . . . . . . . . . . .616
a.
Users and Permissions 617
b.
Overlapping Permissions 619
c.
Lesson Summary
624
·
Keep security settings simple.
·
An end user must have at least Explorer
functional area access and permissions on at least one object from a model in
order to get access to the model.
·
Overlapping permissions make security more
complicated.
3.
Lesson 3: Using Master Data Services Add-in for
Excel . . . . . . . . . . . . . . .624
a.
Editing MDS Data in Excel 625
b.
Creating MDS Objects in Excel 627
c.
Lesson Summary
632
·
With MDS Add-in for Excel, you can edit MDS data
in batches by using all Excel capabilities.
·
Advanced users can even create new entities and
manage attributes from Excel.
·
MDS maintains security and validation for Excel
data the same way it does for data updated with Master Data Manager.
Chapter 17 Creating a Data Quality Project to Clean Data 637
1.
Lesson 1: Creating and Maintaining a Knowledge
Base . . . . . . . . . . . . . .638
a.
Building a DQS Knowledge Base 638
b.
Domain Management 639
c.
Lesson Summary
645
·
You can start building a knowledge base with
knowledge discovery.
·
You can manually edit domains after the
discovery process is finished.
·
You can import and export knowledge bases and
domains.
2.
Lesson 2: Creating a Data Quality Project . . .
. . . . . . . . . . . . . . . . . . . . . . .646
a.
DQS Projects
646
b.
Data Cleansing
647
c.
Lesson Summary
653
·
A DQS project has four stages: mapping,
computer-assisted cleansing, interactive cleansing, and export.
·
You can cleanse data that comes from SQL Server
or from Excel.
·
You never modify the source data directly; you
can export the data with the cleansing info as the result of your cleansing
project.
3.
Lesson 3: Profiling Data and Improving Data
Quality . . . . . . . . . . . . . . . .654
a.
Using Queries to Profile Data 654
b.
SSIS Data Profiling Task 656
c.
Lesson Summary
659
·
In addition to DQS, you can use many other tools
from the SQL Server suite to improve the quality of your data.
·
The SSIS Data Profiling task is easy to use for
quick profiling of your data.
Part VI Advanced SSIS and Data Quality Topics
Chapter 18 SSIS and Data Mining 667
1.
Lesson 1: Data Mining Task and Transformation .
. . . . . . . . . . . . . . . . . . . .668
a.
What Is Data Mining? 668
b.
SSAS Data Mining Algorithms 670
c.
Using Data Mining Predictions in SSIS 671
d.
Lesson Summary
679
·
You create data mining models with SSDT by using
the Analysis Services Multidimensional and Data Mining Project template.
·
You can use mining model predictions in SSIS
packages.
2.
Lesson 2: Text Mining . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .679
a.
Term Extraction
680
b.
Term Lookup
681
c.
Lesson Summary
686
·
You can extract terms with the Term Extraction
transformation.
·
You can count how many times a term appears in a
document with the Term Lookup transformation.
3.
Lesson 3: Preparing Data for Data Mining . . . .
. . . . . . . . . . . . . . . . . . . . . .687
a.
Preparing the Data 688
b.
SSIS Sampling
689
c.
Lesson Summary
693
·
You can extract terms with the Term Extraction
transformation.
·
You can count how many times a term appears in a
document with the Term Lookup transformation.
Chapter 19 implementing custom code in ssis packages 699
1.
Lesson 1: Script Task . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .700
a.
Configuring the Script Task 701
b.
Coding the Script Task 702
c.
Lesson Summary
707
·
You can extend the control flow of your packages
with the script task.
·
Before writing the code, you configure the
script task.
2.
Lesson 2: Script Component . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . .707
a.
Configuring the Script Component 708
b.
Coding the Script Component 709
c.
Lesson Summary
715
·
For the script component, you need to configure
inputs and outputs.
·
An output of an SSIS component can be
synchronous or asynchronous.
·
The run-time methods of the script component are
the same as the run-time methods of a custom component.
3.
Lesson 3: Implementing Custom Components . . . .
. . . . . . . . . . . . . . . . . .716
a.
Planning a Custom Component 717
b.
Developing a Custom Component 718
c.
Design Time and Run Time 719
d.
Design-Time Methods 719
e.
Run-Time Methods
721
f.
Lesson Summary
730
·
If a particular problem cannot be solved by
using any of the standard built-in SSIS data flow components, you can design
your own custom component: a custom source, a custom destination, or a custom
transformation component.
·
Before custom components can be used in SSIS
development, they need to be deployed; this means that they must be copied to
the workstation and registered in the workstation’s global assembly cache. Of
course, to be used in deployed SSIS processes, the components must also be
deployed on the target server.
Chapter 20 Identity Mapping and De-Duplicating 735
1.
Lesson 1: Understanding the Problem . . . . . .
. . . . . . . . . . . . . . . . . . . . . . .736
a.
Identity Mapping and De-Duplicating
Problems 736
b.
Solving the Problems 738
c.
Lesson Summary
744
·
You can use DQS and SSIS from the SQL Server
suite for identity mapping and de-duplicating tasks.
·
Data merging has to be done based on
similarities of strings.
·
When performing the matching, you have to avoid
cross joins of huge
·
You can even use MDS functions for your own
mapping procedures.
2.
Lesson 2: Using DQS and the DQS Cleansing Transformation
. . . . . . . . . 745
a.
DQS Cleansing Transformation 746
b.
DQS Matching
746
c.
Lesson Summary
755
·
You can use the DQS Cleansing transformation to
take advantage of DQS knowledge bases to correct data in the SSIS data flow.
·
For DQS matching, use a matching policy DQS
knowledge base.
·
You can use sample data to speed up the learning
process when creating a matching policy DQS knowledge base.
3.
Lesson 3: Implementing SSIS Fuzzy
Transformations . . . . . . . . . . . . . . . . .756
a.
Fuzzy Transformations Algorithm 756
b.
Versions of Fuzzy Transformations 758
c.
Lesson Summary
764
·
SSIS Fuzzy Lookup and Fuzzy Grouping can handle
bigger amounts of data than DQS can.
·
DQS matching is very useful for master data
matching when you are using MDS Add-in for Excel to edit your master data.
Subscribe to:
Posts
(
Atom
)