Part I Designing and Implementing a Data Warehouse
Chapter 1 - Data Warehouse Logical Design
3
1.
Lesson 1: Introducing Star and Snowflake Schemas
. . . . . . . . . . . . . . . . . . . .
. . . . . 4
a.
Reporting Problems with a Normalized Schema 5
b.
Star Schema
7
c.
Snowflake Schema
9
d.
Granularity Level 12
e.
Auditing and Lineage 13
f.
Lesson Summary
16
·
The Star schema is the most common design for a
DW.
·
The Snowflake schema is more appropriate for POC
(proof-of-concept) projects.
·
You should also determine the granularity of
fact tables, as well as auditing and lineage needs (i.e., auditing in DW auditing).
2.
Lesson 2: Designing Dimensions . . . . .. . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . 17
a.
Dimension Column Types 17
b.
Hierarchies
19
c.
Slowly Changing Dimensions 21
d.
Lesson Summary
26
·
In a dimension, you have the following column
types: keys, names, attributes, member properties, translations, and lineage.
·
Some attributes form natural hierarchies.
·
There are standard solutions for the Slowly
Changing Dimensions (SCD) problem: Type I, II, III or a mixture.
3.
Lesson 3: Designing Fact Tables . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . 27
a.
Fact Table Column Types 28
b.
Additivity of Measures 29
c.
Additivity of Measures in SSAS 30
d.
Many-to-Many Relationships 30
e.
Lesson Summary
33
·
Fact tables include measures, foreign keys, and
possibly an additional primary key and lineage columns.
·
Measures can be additive, non-additive, or
semi-additive.
·
For many-to-many relationships, you can introduce
an additional intermediate dimension.
Chapter 2 Implementing a Data Warehouse
41
1.
Lesson 1: Implementing Dimensions and Fact
Tables . . . . . . . . . . . . . . . . . 42
a.
Creating a Data Warehouse Database 42
b.
Implementing Dimensions 45
c.
Implementing Fact Tables 47
d.
Lesson Summary
54
·
For a data warehouse database, you should use
the Simple recovery model.
·
When creating a database, allocate enough space
for data files and log files to prevent autogrowth of the files.
·
Use surrogate keys in dimensions in which you
expect SCD Type 2 changes.
·
Use computed columns.
2.
Lesson 2: Managing the Performance of a Data
Warehouse . . . . . . . . . . . 55
a.
Indexing Dimensions and Fact Tables 56
b.
Indexed Views
58
c.
Data Compression
61
d.
Columnstore Indexes and Batch Processing 62
e.
Lesson Summary
69
·
In this lesson, you learned how to optimize data
warehouse query performance.
·
In a DW, you should not use many nonclustered
indexes.
·
Use small, integer surrogate columns for
clustered primary keys.
·
Use indexed views.
·
Use columnstore indexes and exploit batch
processing.
3.
Lesson 3: Loading and Auditing Loads . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . 70
a.
Using Partitions
71
b.
Data Lineage
73
c.
Lesson Summary
78
·
Table partitioning is extremely useful for large
fact tables with columnstore indexes.
·
Partition switch is a metadata operation only if
an index is aligned with its base table.
·
You can add lineage information to your
dimensions and fact tables to audit changes to your DW on a row level.
Part II Developing SSIS Packages
Chapter 3 creating ssis packages 87
1.
Lesson 1: Using the SQL Server Import and Export
Wizard . . . . . . . . . . . . 89
a.
Planning a Simple Data Movement 89
b.
Lesson Summary
99
·
The SQL Server Import and Export Wizard can be
used for simple data movement operations.
·
The wizard allows you to create the destination
database.
·
Multiple objects can be transferred in the same
operation.
·
If the destination objects do not already exist,
they can be created by the process.
·
The SSIS package created by the wizard can be
saved and reused.
2.
Lesson 2: Developing SSIS Packages in SSDT . . .
. . . . . . . . . . . . . . . . . . . . .101
a.
Introducing SSDT
102
b.
Lesson Summary
107
·
SSIS projects are developed by using SSDT, a
specialized version of Visual Studio.
·
SSDT provides the complete integrated
development environment (IDE) required for efficient development of SSIS
packages.
·
The SSIS toolbox is context-aware and will
either allow you access to control flow tasks or data flow components,
depending on whether you are designing the control flow or the data flow.
3.
Lesson 3: Introducing Control Flow, Data Flow,
and
a.
Connection Managers . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . .109
b.
Introducing SSIS Development 110
c.
Introducing SSIS Project Deployment 110
d.
Lesson Summary
124
·
Existing SSIS packages can be added to SSIS
projects in SQL Server Data Tools (SSDT).
·
Control flows contain the definitions of data
management operations.
·
Control flows determine the order and the
conditions of execution.
·
SSIS package settings can be parameterized,
which allows them to be changed without direct access to SSIS package
definitions.
Chapter 4 Designing and implementing control flow 131
1.
Lesson 1: Connection Managers . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .133
a.
Lesson Summary
144
·
Connection managers are used to establish
connections to data sources.
·
Different data sources require different types
of connection managers.
·
The usability of a connection manager within an
SSIS project or an SSIS package is determined by its scope.
2.
Lesson 2: Control Flow Tasks and Containers . .
. . . . . . . . . . . . . . . . . . . . .145
a.
Planning a Complex Data Movement 145
b.
Tasks 147
c.
Containers
155
d.
Lesson Summary
163
·
A rich collection of tasks supporting the most
common data management operations is provided by the SSIS design model.
·
Control flow is defined by precedence
constraints that determine the order and conditions of execution.
·
Tasks representing logical units of work can be
grouped in containers.
·
Loop containers allow a unit of work to be
executed repeatedly.
3.
Lesson 3: Precedence Constraints . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .164
a.
Lesson Summary
169
·
Precedence constraints determine the order of
execution and the conditions that must be met for the process to either
continue or stop.
·
Precedence constraints can even be used to allow
the process to recover from failures.
Chapter 5 Designing and Implementing Data flow 177
1.
Lesson 1: Defining Data Sources and Destinations
. . . . . . . . . . . . . . . . . . . 178
a.
Creating a Data Flow Task 178
b.
Defining Data Flow Source Adapters 180
c.
Defining Data Flow Destination Adapters 184
d.
SSIS Data Types
187
e.
Lesson Summary
197
·
Use appropriate data source or data destination
adapters.
·
Always extract only the columns you need.
·
Use Fast Load or Batch mode when inserting data
by using an ODBC or OLE DB destination adapter.
·
Use a Raw File destination if you have to
temporarily store data to be used by SSIS later.
2.
Lesson 2: Working with Data Flow Transformations
. . . . . . . . . . . . . . . . . .198
a.
Selecting Transformations 198
b.
Using Transformations 205
c.
Lesson Summary
215
·
Remember which transformations are non-blocking,
partly-blocking, and blocking.
·
Use the Resolve References dialog box to solve
mapping errors.
·
Use Derived Column transformation to add new
columns or replace the value in existing ones.
3.
Lesson 3: Determining Appropriate ETL Strategy
and Tools . . . . . . . . . . .216
a.
ETL Strategy
217
b.
Lookup Transformations 218
c.
Sorting the Data
224
d.
Set-Based Updates 225
e.
Lesson Summary
231
·
Use sorting on the database layer as much as
possible.
·
When joining large tables, consider doing so on
the database layer.
·
Insert the data that needs to be updated into a
temporary table and then perform a set-based update using SQL.
Part III Enhancing SSIS Packages
Chapter 6 enhancing control flow 239
1.
Lesson 1: SSIS Variables . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
a.
System and User Variables 243
b.
Variable Data Types 245
c.
Variable Scope
248
d.
Property Parameterization 251
e.
Lesson Summary
253
·
You can use variables in SSIS packages to
determine certain values once, and then reuse them multiple times.
·
Variable values can be assigned literally or by
using expressions. SSIS implements a variety of data types to be used in SSIS
variables.
·
SSIS variables can be assigned dynamically as
the package is execute - either once per execution or iteratively.
·
Variable accessibility is determined by scope.
·
SSIS variables can be used to parameterize SSIS
object properties.
2.
Lesson 2: Connection Managers, Tasks, and
Precedence
a.
Constraint Expressions . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . .254
b.
Expressions
255
c.
Property Expressions 259
d.
Precedence Constraint Expressions 259
e.
Lesson Summary
263
·
The SSIS runtime provides information about the
execution environment and other system information via SSIS system variables.
·
Expressions can be used to compute the values of
SSIS object properties at run time.
·
Variables and expressions can also be used to
extend the elementary functionality of precedence constraints based on
information available at run time that usually is not available at design time.
3.
Lesson 3: Using a Master Package for Advanced
Control Flow . . . . . . . .265
a.
Separating Workloads, Purposes, and
Objectives 267
b.
Harmonizing Workflow and Configuration 268
c.
The Execute Package Task 269
d.
The Execute SQL Server Agent Job Task 269
e.
The Execute Process Task 270
f.
Lesson Summary
275
·
SSIS operations can be distributed across
multiple SSIS packages by using the master package concept.
·
The master package (also referred to as the
parent package) can execute dependent packages (or child packages) by using the
Execute Package task; this task can also be used to set child package
parameters from the parent package at run time.
Chapter 7 Enhancing Data flow 283
1.
Lesson 1: Slowly Changing Dimensions . . . . . .
. . . . . . . . . . . . . . . . . . . . . .284
a.
Defining Attribute Types 284
b.
Inferred Dimension Members 285
c.
Using the Slowly Changing Dimension Task 285
d.
Effectively Updating Dimensions 290
e.
Lesson Summary
298
·
Define each attribute’s SCD type in the data
modeling phase of the data warehouse project.
·
Use the Slowly Changing Dimension Wizard for
dimensions with a small number of rows.
·
Use an alternative solution to the Slowly
Changing Dimension Wizard for larger dimensions to solve the problem of
updating the Type 1 and Type 2 SCD attributes of a dimension.
2.
Lesson 2: Preparing a Package for Incremental
Load . . . . . . . . . . . . . . . . .299
a.
Using Dynamic SQL to Read Data 299
b.
Implementing CDC by Using SSIS 304
c.
ETL Strategy for Incrementally Loading Fact
Tables 307
d.
Lesson Summary
316
·
You can use expressions to dynamically set
properties for tasks and transformations inside the data flow task.
·
Use CDC when you cannot define an exact rule for
changed data and when your source systems support this functionality.
·
Use partitions for your fact tables and apply
partition switching instead of deleting data.
3.
Lesson 3: Error Flow . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
a.
Using Error Flows 317
b.
Lesson Summary
321
·
Error flow is a very powerful method for making
your data flow task robust when you are loading data into a data warehouse.
·
Use error flows to capture problematic rows and
store them in a table or file for business users to inspect.
Chapter 8 creating a robust and restartable package 327
1.
Lesson 1: Package Transactions . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . .328
a.
Defining Package and Task Transaction
Settings 328
b.
Transaction Isolation Levels 331
c.
Manually Handling Transactions 332
d.
Lesson Summary
335
·
You can enlist a container or a task to be part
of a transaction.
·
It is important to understand different
transaction isolation levels.
2.
Lesson 2: Checkpoints . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .336
a.
Implementing Restartability Checkpoints 336
b.
Lesson Summary
341
·
Enable checkpoints to restart your package from
the last successful step.
·
Enable checkpoints on the package level and then
for each task inside the control flow.
3.
Lesson 3: Event Handlers . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . .342
a.
Using Event Handlers 342
b.
Lesson Summary
346
·
Event handlers enable you to have more control
over the execution of a package.
·
You can use all the control flow tasks and containers
when creating event handlers.
·
Use event handlers if you need to integrate the
execution information of SSIS packages into a central logging database.
Chapter 9 Implementing Dynamic Packages
353
1.
Lesson 1: Package-Level and Project-Level
Connection
a.
Managers and Parameters . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .354
b.
Using Project-Level Connection Managers 355
c.
Parameters
356
d.
Build Configurations in SQL Server 2012
Integration Services 358
e.
Property Expressions 361
f.
Lesson Summary
366
·
Use parameters to set up connection properties
at run time.
·
Parameters and project-level connection mangers
can only be used with the new project deployment model introduced with SSIS in
SQL Server 2012.
·
Use property expressions to change the control
flow properties at run time.
2.
Lesson 2: Package Configurations . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .367
a.
Implementing Package Configurations 368
b.
Lesson Summary
377
·
Package configurations are available in the package
deployment model.
·
Use package configurations if you are using
previous versions of SSIS to set connection properties at run time.
·
Use a combination of XML and SQL Server
configurations to provide additional portability for your packages.
Chapter 10 Auditing and Logging 381
1.
Lesson 1: Logging Packages . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . .383
a.
Log Providers
383
b.
Configuring Logging 386
c.
Lesson Summary
393
·
Before enabling SSIS package logging, you should
determine how much information you actually need to log and choose the
appropriate log providers.
·
Depending on the selected log provider, log
information is written to the Windows logs, to a SQL Server Profiler trace, to
a database, or to files in the file system.
2.
Lesson 2: Implementing Auditing and Lineage . .
. . . . . . . . . . . . . . . . . . . .394
a.
Auditing Techniques 395
b.
Correlating Audit Data with SSIS Logs 401
c.
Retention
401
d.
Lesson Summary
405
·
SSIS auditing complements SSIS logging by
providing additional information about how the data in the database was
affected by the SSIS process.
·
The SSIS run time provides information about the
current execution instance, which you can include in your audits.
3.
Lesson 3: Preparing Package Templates . . . . .
. . . . . . . . . . . . . . . . . . . . . . .406
a.
SSIS Package Templates 407
b.
Lesson Summary
410
·
You can prepare SSIS package templates to
simplify SSIS package development.
·
When you share SSIS package templates with your
colleagues, your team can implement common SSIS development practices.
Part iv Managing and Maintaining SSIS
Packages
Chapter 11 Installing SSIS and Deploying Packages 421
1.
Lesson 1: Installing SSIS Components . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . .423
a.
Preparing an SSIS Installation 424
b.
Installing SSIS
428
c.
Lesson Summary
436
·
Before attempting the installation of SSIS, you
need to know whether the target environment is going to be used in development
and testing or in production, whether the environment is a 32-bit or 64-bit environment,
and whether SQL Server has already been installed on the target server.
·
Service accounts demand special consideration
for security and manageability reasons.
2.
Lesson 2: Deploying SSIS Packages . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .437
a.
SSISDB Catalog
438
b.
SSISDB Objects
440
c.
Project Deployment 442
d.
Lesson Summary
449
·
In SQL Server 2012, deployed SSIS projects are
stored in the SSISDB catalog.
·
The SSISDB catalog is used to store SSIS project
and package definitions, as well as other SSIS metadata.
Chapter 12 executing and securing packages
455
1.
Lesson 1: Executing SSIS Packages . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . .456
a.
On-Demand SSIS Execution 457
b.
Automated SSIS Execution 462
c.
Monitoring SSIS Execution 465
d.
Lesson Summary
479
·
SSIS packages can be executed manually (for
instance, by using SSMS or the DTExec utility) or programmatically (for
instance, by using Windows PowerShell or proprietary applications). SSIS
execution can even be automated—for instance, by using SQL Server Agent jobs.
·
You are free to determine the most appropriate
methods to execute SSIS processes in your environment.
·
SSIS execution monitoring is available in SSMS
through standard built-in reports, but you can also implement your own custom
reports if needed.
2.
Lesson 2: Securing SSIS Packages . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . .480
a.
SSISDB Security
481
b.
Lesson Summary
490
·
SSISDB catalog security is determined based on
information about which SSISDB principal (SSISDB user) is allowed access to
which SSISDB securables (folders, projects, and environments) and what the
nature of this access is (to view, to execute, to modify, or to remove them).
·
Only members of the ssis_admin database role
have full access to any of the objects stored in the SSISDB catalog.
·
Permissions on SSIS folders, projects, and
environments are managed explicitly; permissions on packages, environment
references, and variables, are inherited from the objects they belong to.
Chapter 13 troubleshooting and performance tuning 497
1.
Lesson 1: Troubleshooting Package Execution . .
. . . . . . . . . . . . . . . . . . . .498
a.
Design-Time Troubleshooting 498
b.
Production-Time Troubleshooting 506
c.
Lesson Summary
510
·
Use breakpoints when debugging packages at
design time.
·
SQL Server 2012 provides new ways to efficiently
monitor and troubleshoot packages in a production environment.
2.
Lesson 2: Performance Tuning . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . .511
a.
SSIS Data Flow Engine 512
b.
Data Flow Tuning Options 514
c.
Parallel Execution in SSIS 517
d.
Troubleshooting and Benchmarking
Performance 518
e.
Lesson Summary
522
·
Package design is crucial for good performance.
·
Monitor package execution with the new
functionality of SSISDB catalog views.
Part V Building Data Quality Solutions
Chapter 14 Installing and Maintaining Data Quality Services 529
1.
Lesson 1: Data Quality Problems and Roles . . .
. . . . . . . . . . . . . . . . . . . . . .530
a.
Data Quality Dimensions 531
b.
Data Quality Activities and Roles 535
c.
Lesson Summary
539
·
Data quality issues can be categorized into data
quality dimensions.
·
Data governance is the activity of taking care
of data quality, and data stewards are people responsible for the quality of
particular data.
2.
Lesson 2: Installing Data Quality Services . . .
. . . . . . . . . . . . . . . . . . . . . . . .540
a.
DQS Architecture
540
b.
DQS Installation
542
c.
Lesson Summary
548
·
Data Quality Services consists of Data Quality
Server and Data Quality Client.
·
There are some prerequisites for installing both
components.
·
You start installation with SQL Server Setup.
·
You finish the installation with the
DQSInstaller.exe application.
3.
Lesson 3: Maintaining and Securing Data Quality
Services . . . . . . . . . . . .549
a.
Performing Administrative Activities with Data
Quality Client 549
b.
Performing Administrative Activities with Other
Tools 553
c.
Lesson Summary
558
·
DQS administration involves setting up reference
data sources, configuring general and log settings, monitoring activity,
managing security, and backing up and restoring DQS databases.
·
You perform all of these tasks except for
security management and backup and restoration with Data Quality Client.
·
You manage security and backup and restoration
of DQS databases with SQL Server Management Studio or with Transact-SQL
commands.
Chapter 15 Implementing Master Data Services 565
1.
Lesson 1: Defining Master Data . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . .566
a.
What Is Master Data? 567
b.
Master Data Management 569
c.
MDM Challenges
572
d.
Lesson Summary
574
·
Master data management is a set of coordinated
processes, policies, tools, and technologies used to create and maintain
accurate master data.
·
Master data management has to deal with many
challenges.
·
Master data typically needs auditing,
versioning, or some other kind of maintenance of history.
·
Reusage increases the value of master data.
·
If you use a data warehouse, you should map your
master data to dimensions in the data warehouse.
·
SQL Server 2012 Master Data Services (MDS) is a
specialized master data management application.
2.
Lesson 2: Installing Master Data Services . . .
. . . . . . . . . . . . . . . . . . . . . . . .575
a.
Master Data Services Architecture 576
b.
MDS Installation
577
c.
Lesson Summary
587
·
During the pre-installation tasks, you have to
prepare your system for the MDS database and the web application.
·
During the installation operations, you use SQL
Server Setup to install MDS.
·
During the post-installation tasks, you
configure the MDS database, the Master Data Manager web application, and the
MDS add-in for Excel.
3.
Lesson 3: Creating a Master Data Services Model
. . . . . . . . . . . . . . . . . . .588
a.
MDS Models and Objects in Models 588
b.
MDS Objects
589
c.
Lesson Summary
599
·
In MDS, models are containers for other objects.
·
Entities have attributes, derived and explicit
hierarchies, collections, and attribute groups.
·
Attributes can be free-form, file, or domain
based.
·
You can define several business rules for each
attribute.
Chapter 16 Managing Master Data 605
1.
Lesson 1: Importing and Exporting Master Data .
. . . . . . . . . . . . . . . . . . .606
a.
Creating and Deploying MDS Packages 606
b.
Importing Batches of Data 607
c.
Exporting Data
609
d.
Lesson Summary
615
·
You can use model deployment packages to export
and import metadata, and in some cases data as well.
·
You can use the staging process for importing
batches of data.
·
Applications can use the MDS web service and
subscription views to get master data from an MDS instance.
2.
Lesson 2: Defining Master Data Security . . . .
. . . . . . . . . . . . . . . . . . . . . . .616
a.
Users and Permissions 617
b.
Overlapping Permissions 619
c.
Lesson Summary
624
·
Keep security settings simple.
·
An end user must have at least Explorer
functional area access and permissions on at least one object from a model in
order to get access to the model.
·
Overlapping permissions make security more
complicated.
3.
Lesson 3: Using Master Data Services Add-in for
Excel . . . . . . . . . . . . . . .624
a.
Editing MDS Data in Excel 625
b.
Creating MDS Objects in Excel 627
c.
Lesson Summary
632
·
With MDS Add-in for Excel, you can edit MDS data
in batches by using all Excel capabilities.
·
Advanced users can even create new entities and
manage attributes from Excel.
·
MDS maintains security and validation for Excel
data the same way it does for data updated with Master Data Manager.
Chapter 17 Creating a Data Quality Project to Clean Data 637
1.
Lesson 1: Creating and Maintaining a Knowledge
Base . . . . . . . . . . . . . .638
a.
Building a DQS Knowledge Base 638
b.
Domain Management 639
c.
Lesson Summary
645
·
You can start building a knowledge base with
knowledge discovery.
·
You can manually edit domains after the
discovery process is finished.
·
You can import and export knowledge bases and
domains.
2.
Lesson 2: Creating a Data Quality Project . . .
. . . . . . . . . . . . . . . . . . . . . . .646
a.
DQS Projects
646
b.
Data Cleansing
647
c.
Lesson Summary
653
·
A DQS project has four stages: mapping,
computer-assisted cleansing, interactive cleansing, and export.
·
You can cleanse data that comes from SQL Server
or from Excel.
·
You never modify the source data directly; you
can export the data with the cleansing info as the result of your cleansing
project.
3.
Lesson 3: Profiling Data and Improving Data
Quality . . . . . . . . . . . . . . . .654
a.
Using Queries to Profile Data 654
b.
SSIS Data Profiling Task 656
c.
Lesson Summary
659
·
In addition to DQS, you can use many other tools
from the SQL Server suite to improve the quality of your data.
·
The SSIS Data Profiling task is easy to use for
quick profiling of your data.
Part VI Advanced SSIS and Data Quality Topics
Chapter 18 SSIS and Data Mining 667
1.
Lesson 1: Data Mining Task and Transformation .
. . . . . . . . . . . . . . . . . . . .668
a.
What Is Data Mining? 668
b.
SSAS Data Mining Algorithms 670
c.
Using Data Mining Predictions in SSIS 671
d.
Lesson Summary
679
·
You create data mining models with SSDT by using
the Analysis Services Multidimensional and Data Mining Project template.
·
You can use mining model predictions in SSIS
packages.
2.
Lesson 2: Text Mining . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .679
a.
Term Extraction
680
b.
Term Lookup
681
c.
Lesson Summary
686
·
You can extract terms with the Term Extraction
transformation.
·
You can count how many times a term appears in a
document with the Term Lookup transformation.
3.
Lesson 3: Preparing Data for Data Mining . . . .
. . . . . . . . . . . . . . . . . . . . . .687
a.
Preparing the Data 688
b.
SSIS Sampling
689
c.
Lesson Summary
693
·
You can extract terms with the Term Extraction
transformation.
·
You can count how many times a term appears in a
document with the Term Lookup transformation.
Chapter 19 implementing custom code in ssis packages 699
1.
Lesson 1: Script Task . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .700
a.
Configuring the Script Task 701
b.
Coding the Script Task 702
c.
Lesson Summary
707
·
You can extend the control flow of your packages
with the script task.
·
Before writing the code, you configure the
script task.
2.
Lesson 2: Script Component . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . .707
a.
Configuring the Script Component 708
b.
Coding the Script Component 709
c.
Lesson Summary
715
·
For the script component, you need to configure
inputs and outputs.
·
An output of an SSIS component can be
synchronous or asynchronous.
·
The run-time methods of the script component are
the same as the run-time methods of a custom component.
3.
Lesson 3: Implementing Custom Components . . . .
. . . . . . . . . . . . . . . . . .716
a.
Planning a Custom Component 717
b.
Developing a Custom Component 718
c.
Design Time and Run Time 719
d.
Design-Time Methods 719
e.
Run-Time Methods
721
f.
Lesson Summary
730
·
If a particular problem cannot be solved by
using any of the standard built-in SSIS data flow components, you can design
your own custom component: a custom source, a custom destination, or a custom
transformation component.
·
Before custom components can be used in SSIS
development, they need to be deployed; this means that they must be copied to
the workstation and registered in the workstation’s global assembly cache. Of
course, to be used in deployed SSIS processes, the components must also be
deployed on the target server.
Chapter 20 Identity Mapping and De-Duplicating 735
1.
Lesson 1: Understanding the Problem . . . . . .
. . . . . . . . . . . . . . . . . . . . . . .736
a.
Identity Mapping and De-Duplicating
Problems 736
b.
Solving the Problems 738
c.
Lesson Summary
744
·
You can use DQS and SSIS from the SQL Server
suite for identity mapping and de-duplicating tasks.
·
Data merging has to be done based on
similarities of strings.
·
When performing the matching, you have to avoid
cross joins of huge
·
You can even use MDS functions for your own
mapping procedures.
2.
Lesson 2: Using DQS and the DQS Cleansing Transformation
. . . . . . . . . 745
a.
DQS Cleansing Transformation 746
b.
DQS Matching
746
c.
Lesson Summary
755
·
You can use the DQS Cleansing transformation to
take advantage of DQS knowledge bases to correct data in the SSIS data flow.
·
For DQS matching, use a matching policy DQS
knowledge base.
·
You can use sample data to speed up the learning
process when creating a matching policy DQS knowledge base.
3.
Lesson 3: Implementing SSIS Fuzzy
Transformations . . . . . . . . . . . . . . . . .756
a.
Fuzzy Transformations Algorithm 756
b.
Versions of Fuzzy Transformations 758
c.
Lesson Summary
764
·
SSIS Fuzzy Lookup and Fuzzy Grouping can handle
bigger amounts of data than DQS can.
·
DQS matching is very useful for master data
matching when you are using MDS Add-in for Excel to edit your master data.