1.
Classification of DMX
Statements by Task
You can use Data Mining Extensions (DMX) statements to
perform each of these tasks programmatically.
a. Creating mining structures and mining models
i.
CREATE MINING STRUCTURE
To add a new mining
structure to a database.
ii.
ALTER MINING STRUCTURE
To add mining models to the mining
structure.
iii.
CREATE MINING MODEL
To build a new mining model
and associated mining structure.
b. Processing mining structures and mining models
i.
INSERT INTO
To process a mining structure and mining
model.
c. Deleting or dropping mining structures or mining models
i.
DELETE
To remove all the trained
data from a mining model or mining structure.
ii.
DROP MINING STRUCTURE or DROP
MINING MODEL
To completely remove a
mining structure or mining model from a database.
d. Copying mining models
i.
SELECT INTO
To copy the structure of an existing
mining model into a new mining model and to train the new model with the same
data.
e. Browsing mining models
SELECT - to browse the information that
the data mining algorithm calculates and stores in the data mining model during
model training. Much like with Transact-SQL, you can use several clauses with
the SELECT statement, to extend its power. These
clauses include:
i.
SELECT
FROM <model>
The simplest prediction query; also
called an empty prediction join. Returns the most probable value for the
columns specified in the query.
ii.
SELECT DISTINCT FROM <model >
Returns all possible states for the
selected column in the model. The SELECT DISTINCT FROM statement only works
with a single column or with a set of related columns. This clause does not
work with a set of unrelated columns.
iii.
SELECT FROM <model>.CONTENT
Returns the mining model schema rowset for the specified data mining model.
iv.
SELECT FROM <model>.CASES
Supports drillthrough,
and returns the cases that were used to train the model.
v.
SELECT FROM <model>.SAMPLE_CASES
Returns sample cases that are
representative of the cases that are used to train the data mining model.
vi.
SELECT FROM <structure>.CASES
Returns the cases that were
used to create the mining structure.
vii.
SELECT FROM <model>.DIMENSION_CONTENT
A mining model can be used as a
dimension in an OLAP cube, where each node in the model is represented as a
member of the dimension. The SELECT FROM <model>.Dimension_CONTENT
statement returns the content of the model that pertains to its usage as a
dimension.
viii.
SELECT FROM <model> PREDICTION JOIN
Uses a mining model to
predict the states of columns in an external data source.
2.
Classification of DMX
Statements by DDLs and DMLs
a. DMX DDLs
i.
CREATE MINING STRUCTURE - Creates the structure
CREATE MINING STRUCTURE [New Mailing]
(
CustomerKey LONG KEY,
Gender TEXT
DISCRETE,
[Number Cars Owned] LONG DISCRETE,
[Bike Buyer] LONG
DISCRETE
)
ii.
ALTER MINING STRUCTURE - Adds a mining model to
an existing structure
ALTER MINING STRUCTURE [New Mailing]
ADD MINING MODEL [Naive Bayes Women]
(
CustomerKey,
Gender,
[Number Cars Owned],
[Bike Buyer] PREDICT
)
USING Microsoft_Naive_Bayes
WITH FILTER(Gender
= 'F' )
iii.
DROP MINING STRUCTURE - Removes a complete
structure
DROP MINING
STRUCTURE [New Mailing]
iv.
CREATE MINING MODEL - Creates a mining model and
an associated mining structure simultaneously
CREATE MINING MODEL [NBSample]
(
CustomerKey LONG KEY,
Gender TEXT
DISCRETE,
[Number Cars Owned] LONG DISCRETE,
[Bike Buyer] LONG
DISCRETE PREDICT
)
USING Microsoft_Naive_Bayes
v.
DROP MINING MODEL - Removes a model
DROP MINING
MODEL [NBSample]
vi.
EXPORT - Exports a mining model from an SSAS
database
/*Exports
the mining model to a specified file.
Note: abf is the Analysis Services Backup File
created by SSAS. Like other SQL Server backup files, it cannot be opened by
notepad, it can be consumed by XMLA or DMX*/
EXPORT MINING MODEL [TK448 Ch09 Association Rules] TO 'C:\Association_NEW.abf'
vii.
IMPORT - Loads a mining model and mining
structure from an Analysis Services Backup File (.abf)
file onto the server.
DROP MINING MODEL [TK448 Ch09 Association Rules]
DROP
MINING STRUCTURE structure [TK448 Ch09 Association]
IMPORT FROM 'C:\Association_NEW.abf'
viii.
SELECT INTO - Copies the structure of an existing mining model
into a new model and trains it with the same data。
中文意思:使用 SELECT INTO 语句将现有挖掘模型的结构复制到新模型中,并使用相同的数据为其定型 (结构和数据)。Note: This is a DDL.
SELECT * INTO
[Clustering Europe Region] -- the new model name
USING [Microsoft_Clustering]
WITH FILTER(Region='Europe') --Microsoft_Clustering is the algorithm
FROM [TK448 Ch09 Cube Clustering] -- this is the original model
b. DMX DMLs
i.
INSERT INTO - Trains a mining model and create an abstraction of
the mining model.
This statement does not insert the source data
into a data mining model; instead, you use it to create an abstraction that describes the mining model
that the algorithm creates. 中文意思:使用 INSERT INTO 语句为挖掘模型定型。 执行该语句不会将实际源数据插入数据挖掘模型对象,但会创建有关说明算法所创建的挖掘模型的摘要信息。)。Note: This is
a DML.
INSERT INTO NBSample
(
CustomerKey,
Gender,
[Number Cars Owned],
[Bike Buyer]
)
OPENQUERY([Adventure Works DW2008],
'SELECT
CustomerKey,
Gender,
[NumberCarsOwned],
[BikeBuyer]
FROM [dbo].[vTargetMail]');
ii.
UPDATE - Updates a name of a node in the
Clustering and Sequence Clustering models.
/*
Purpose:
Changes the NODE_CAPTION column in the data mining model.
Syntax:
UPDATE <model>.CONTENT
SET NODE_CAPTION='new caption'
[WHERE <condition expression>]
Example:
changes the default name, Cluster 1, for cluster 001 to the more descriptive
name, Likely Customers.
*/
UPDATE [TM Clustering].CONTENT
SET NODE_CAPTION= 'Likely
Customers'
WHERE NODE_UNIQUE_NAME = '001'
iii.
DELETE - Clears the trained content from a
mining model.
/*
Purpose:
Clears a mining model, a mining structure, or a mining structure and all its
associated mining models, depending on the Data Mining Extensions (DMX) clauses
that you use.
Syntax:
DELETE
FROM [MINING MODEL] <model>[.CONTENT]
DELETE FROM
[MINING STRUCTURE] <structure>[.CONTENT]|[.CASES]
Example:
removes all of the content from the NB_Sample model.
*/
DELETE FROM NBSample.CONTENT
iv.
SELECT
Purposes:
The SELECT statement
in Data Mining Extensions (DMX) is used for the following tasks in data mining:
· Browsing the content of an existing mining
model
· Browsing the mining structure
· Creating a copy of an existing mining model
· Creating predictions from an existing mining
model
Syntax:
This is the generic
form. It has various forms.
SELECT [FLATTENED]
[TOP <n>] <select list>
FROM <model/structure>[.aspect]
[WHERE <condition expression>]
[ORDER BY <expression>[DESC|ASC]]
Instantiations:
SELECT FROM <model>
SELECT DISTINCT FROM <model >
SELECT FROM <model>.CONTENT
SELECT FROM <model>.CASES
SELECT FROM <model>.SAMPLE_CASES
SELECT FROM <structure>.CASES
SELECT FROM
<model>.DIMENSION_CONTENT
SELECT FROM <model> PREDICTION
JOIN
1.
SELECT
FROM <model>
·
What's
for?
The simplest
prediction query; also called an empty prediction join.
Returns the most
probable value for the columns specified in the query.
Only the content from
the mining model is used to create the prediction.
The columns in the
expression list in the syntax below must be defined as predict or predict only,
or related to a predictable column.
·
Syntax:
SELECT <expression
list> [TOP <n>] FROM <model>
[WHERE <condition
list>]
[ORDER BY
<expression> [DESC|ASC]]
·
Example
1:
Performs an empty
prediction join on the Bike Buyer column, returning the most likely state in
the mining model.
-- Return 0
SELECT ([Bike Buyer]) FROM
[TK448 Ch09 Prediction Naive Bayes]
· Example 2:
Performs a prediction on the Amount column in
the Forecasting model, returning the next four time steps.
The Model Region column combines bike models
and regions into a single identifier. The query uses the PredictTimeSeries
function to perform the prediction.
-- Return:
Model Region Expression
M200 Europe -Expression
$TIME Amount
200807 264039.420693544
200808 323995.069045674
200809 346405.627525689
200810 337472.763729162
M200 North America +Expression
M200 Pacific +Expression
R250 Europe +Expression
R250 North America +Expression
R250 Pacific +Expression
R750 Europe +Expression
R750 North America +Expression
R750 Pacific +Expression
T1000 Europe +Expression
T1000 North America +Expression
T1000 Pacific +Expression
SELECT [Model Region], PredictTimeSeries(Amount, 4)
FROM [TK448 Ch09 Forecasting Time Series]
2.
SELECT
DISTINCT FROM <model>
·
What's
for?
Returns all possible
states for the selected column in the model. The values that are returned vary depending
on whether the specified column contains discrete values, discretized numeric
values, or continuous numeric values.
The SELECT DISTINCT
FROM statement only works with a single column or with a set of related
columns. This clause does not work with a set of unrelated columns.
The SELECT DISTINCT
FROM statement allows you to directly reference a column inside of a nested
table. For example: <model>.<table column reference>.<column
reference>
The results of the
SELECT DISTINCT FROM <model> statement vary, depending on the column
type.
· Syntax:
SELECT [FLATTENED]
DISTINCT [TOP <n>] <expression list> FROM <model>
[WHERE <condition
list>][ORDER BY <expression>]
· Example 1 - Discrete values
For columns that contain discrete values, the
results always include the Missing state, shown as a null value.
-- Return:
Gender
F
M
SELECT DISTINCT
[Gender]
FROM [TK448 Ch09 Prediction Decision Trees]
·
Example
2: Discretized Column Example
Returns the midpoint,
minimum age, and maximum age by buckets in the column.
/*return
Midpoint Age Minimum
Age Maximum Age
36 30 42
47 42 52
57 52 62
66 62 71
84 71 96
*/
SELECT DISTINCT [Age]
AS [Midpoint Age],
RangeMin([Age])
AS [Minimum Age],
RangeMax([Age])
AS [Maximum Age]
FROM [TK448 Ch09 Prediction Decision Trees]
3.
SELECT
FROM <model>.CONTENT
·
What's
for?
Returns the mining
model schema rowset for the specified data mining
model.
The SELECT FROM
<model>.CONTENT statement returns content that is specific to each
algorithm.
For example, you
might want to use the descriptions of all the rules of an association rules
model in a custom application. You can use a SELECT FROM <model>.CONTENT
statement to return values in the NODE_RULE column of the model.
·
Syntax:
SELECT [FLATTENED] [TOP <n>]
<expression list> FROM <model>.CONTENT
[WHERE <condition expression>]
[ORDER BY <expression> [DESC|ASC]]
·
Example
1: Returns the ID of the parent node for the decision trees model.
Result:
MODEL_NAME NODE_NAME
TK448 Ch09 Prediction
Decision Trees 0
SELECT MODEL_NAME, NODE_NAME FROM
[TK448 Ch09 Prediction Decision Trees].CONTENT
WHERE NODE_TYPE = 1
·
Example
2: Uses the IsDescendant function to return the
immediate children of the node that was returned in the previous query.
Result:
NODE_NAME NODE_TYPE NODE_CAPTION
000000001 2 All
00000000100 3 Number Cars Owned = 0
00000000101 3 Number Cars Owned = 3
00000000102 3 Number Cars Owned = 1
00000000103 3 Number Cars Owned = 4
00000000104 3 Number Cars Owned = 2
0000000010400 3 Region = 'Pacific'
0000000010401 3 Region not = 'Pacific'
000000001040100 4 Yearly Income < 39221.4065840128
............total
127 rows........................
SELECT NODE_NAME, NODE_TYPE, NODE_CAPTION
FROM [TK448 Ch09 Prediction Decision Trees].CONTENT
WHERE ISDESCENDANT('0')
·
Example
3: Returns a single node, the marginal statistics node (NODE_TYPE = 26) from a
Naïve Bayes model.
However, this node
contains a nested table, in the NODE_DISTRIBUTION column. As a result, the
nested table column is flattened and a row is returned for every row in the
nested table. The value of the scalar column MODEL_NAME
is repeated for each row in the nested table.
Results:
MODEL_NAME NODE_DISTR.ATTRIBUTE_NAME NODE_DISTR.ATTRIBUTE_VALUE NODE_DISTR.SUPPORT NODE_DISTR.PROBABILITY
xxx Naive
Bayes Age Missing 0 0
xxx Naive
Bayes Age < 42
3826 0.295695185099312
...................................66
rows.........................
*/
SELECT FLATTENED
MODEL_NAME, NODE_DISTRIBUTION
FROM [TK448 Ch09 Prediction Naive Bayes].CONTENT
WHERE NODE_TYPE = 26
· Example 4: Return only some of the columns from
the nested table by using a sub-select statement.
You can simplify the
display by aliasing the table name of the nested table, as shown.
Result:
MODEL_NAME t.ATTRIBUTE_NAME t.ATTRIBUTE_VALUE t.SUPPORT
TK448 Ch09
Prediction Naive Bayes Age Missing 0
TK448 Ch09
Prediction Naive Bayes Age < 42 3826
TK448 Ch09
Prediction Naive Bayes Age 42 - 52 4303
TK448 Ch09 Prediction
Naive Bayes Age 52 - 62 2872
TK448 Ch09
Prediction Naive Bayes Age 62 - 71 1393
TK448 Ch09
Prediction Naive Bayes Age >= 71 545
TK448 Ch09
Prediction Naive Bayes Bike Buyer Missing 0
TK448 Ch09
Prediction Naive Bayes Bike Buyer 0 6509
TK448 Ch09
Prediction Naive Bayes Bike Buyer 1 6430
.............66
rows....................
SELECT flattened
MODEL_NAME,
(SELECT ATTRIBUTE_NAME, ATTRIBUTE_VALUE, [SUPPORT]
FROM NODE_DISTRIBUTION) AS
t
FROM [TK448 Ch09 Prediction Naive Bayes].CONTENT
WHERE NODE_TYPE = 26
4.
SELECT
FROM <model>.CASES
·
What's
for?
Supports drillthrough, and returns the cases that were used to train
the model.
You can also return
structure columns that are not included in the model, if drillthrough
has been enabled on the mining structure and on the mining model, and if you
have the appropriate permissions.
If drillthrough is not enabled on the mining model, this
statement will fail.
·
Syntax:
SELECT [FLATTENED] [TOP <n>]
<expression list> FROM <model>.CASES
[WHERE <condition expression>][ORDER
BY <expression> [DESC|ASC]]
·
Example
1: Drillthrough to Model Cases and Structure Columns
The following example
returns the columns for all the cases that were used to test the [TK448 Ch09
Prediction Decision Trees] model.
If the mining
structure on which the model is built does not have a holdout test data set,
this query would return 0 cases.
You can use the
expression list to return only the columns that you need like below.
Note: You have to
enable drillthrough for the model. You can enable drillthrough for an existing model in BIDS or SSDT. Find
the model, right-click the property, make AllowDrillThrough
= true, then process the model. Then run the DMX below.
Result:
Customer Key Bike Buyer
11001 1
11005 1
11007 1
11010 1
11015 1
11016 1
11017 1
11018 1
11020 1
11023 0
...........5545 rows.............
SELECT [Customer Key], [Bike Buyer] FROM [TK448 Ch09 Prediction Decision Trees].Cases
WHERE IsTestCase();
· Example 2: Drillthrough
to Training Cases in a Specific Node
The following example
returns just those cases that were used to train Cluster 2. The node for
Cluster 2 has the value '002' for the NODE_UNIQUE_NAME column.
The example also
returns one structure column, [Customer Key], that was
not a part of the mining model, and provides the alias CustomerID
for the column.
Note that the name of
the structure column is passed as a string value and therefore must be enclosed
in quotation marks, not brackets.
Result:
CustomerID
11133
11344
11345
.....return 2169 rows..........
SELECT StructureColumn('Customer Key') AS
CustomerID --, * --disabled
for presentation
FROM [TK448 Ch09 Prediction Clustering].Cases
WHERE IsTrainingCase()
AND IsInNode('002')
5.
SELECT
FROM <model>.SAMPLE_CASES
·
What's
for?
Returns sample cases
that are representative of the cases that are used to train the data mining
model.
To use this
statement, you must enable drillthrough for the
mining model.
Sample cases may be
generated and may not actually exist in the training data.
The returned case is
representative of the specified content node.
Although the
Microsoft Sequence Clustering algorithm is the only Microsoft algorithm that
supports using SELECT FROM <model>.SAMPLE_CASES, third-party algorithms
may also support it.
·
Syntax:
SELECT [FLATTENED] [TOP <n>]
<expression list> FROM <model>.SAMPLE_CASES
[WHERE <condition list>] ORDER BY
<expression> [DESC|ASC]]
· Example:
The following example returns sample cases that
are used to train the TK448 Ch09 Sequence Clustering mining model. Using the IsInNode (DMX) function in the WHERE clause returns only
cases that are associated with the '000000003' node. The node string can be
found in the NODE_UNIQUE_NAME column of the schema rowset.
Result:
Order Number v Assoc Seq Line
Items
SO51176 -v Assoc Seq Line Items
Line
Number Model
1 Mountain Tire Tube
2 Hydration Pack
3 Sport-100
SO51177 +v Assoc Seq Line Items
SO51178 +v Assoc Seq Line Items
.....................200
Rows.......................
Select * from [TK448
Ch09 Sequence Clustering].SAMPLE_Cases
WHERE IsInNode('000000003')
6.
SELECT
FROM <structure>.CASES
·
What's
for?
Returns the cases
that were used to create the mining structure.
If drillthrough is not enabled on the structure, the statement
will fail.
Also, the statement
will fail if the user does not have drillthrough
permissions on the mining structure.
In Analysis Services,
drillthrough on new mining structures is enabled by
default.
You cannot enable or
disable drillthrough on the mining structure by using
Data Mining Extensions (DMX). But you can do it with BIDS or SSDT.
·
Syntax:
SELECT [TOP n] <expression list> FROM
<structure>.CASES
[WHERE <condition expression>][ORDER
BY <expression> [DESC|ASC]]
·
Example
1: Drill through to Structure Cases
The following example
returns a list of the 5 oldest customers in the mining structure. The query
returns all the columns in the mining model, but restricts the rows to those
who purchased a bike, and orders them by age. You can also edit the expression
list to return only the columns that you need as below.
Result:
Age Bike
Buyer Customer Key
84 1 11147
84 1 11251
84 1 11120
84 1 11257
84 1 11252
*/
SELECT TOP 5 Age,
[Bike Buyer],[Customer Key]
FROM [TK448 Ch09 Prediction].Cases
WHERE [Bike Buyer] = 1
ORDER BY Age DESC;
·
Example
2: Drillthrough to Test or Training Cases Only
The following example
returns a list of the structure cases for Targeted Mailing that are reserved for
testing. If the mining structure does not contain a holdout test set, by
default all cases are treated as training cases, and this query would return 0
cases.
Result:
Customer Key Gender Age
11001 M 47
11005 M 47
11007 M 47
11010 F 47
...............5545
Rows.................
SELECT [Customer Key], Gender, Age
FROM [TK448 Ch09 Prediction].Cases
WHERE IsTestCase()
--WHERE IsTrainingCase();
7.
SELECT
FROM <model>.DIMENSION_CONTENT
·
What's
for?
A mining model can be
used as a dimension in an OLAP cube, where each node in the model is
represented as a member of the dimension. The SELECT FROM <model>.Dimension_CONTENT statement returns the content of the
model that pertains to its usage as a dimension.
·
Syntax:
SELECT [FLATTENED] [TOP <n>]
<expression list>
FROM
<model>.Dimension_CONTENT
[WHERE <condition expression>]
[ORDER BY <expression>
[DESC|ASC]]
·
Example -
Selects all columns from the [TK448 Ch09 Prediction Decision Trees] model
content that pertain to using the model as a dimension.
Result:
Too
many columns to fit here. See the pictures below, Note: two screens are for the
same rows. There are total 127 rows in the result set.
SELECT *
FROM [TK448 Ch09 Prediction Decision Trees].Dimension_Content
8.
SELECT
FROM <model> PREDICTION JOIN
·
What's
for?
Uses a mining model
to predict the states of columns in an external data source.
The PREDICTION JOIN
statement matches each case from the source query to the model.
·
Syntax:
SELECT [FLATTENED] [TOP <n>]
<select expression list>
FROM <model> | <sub select>
[NATURAL] PREDICTION JOIN
<source
data query> [ON <join mapping list>]
[WHERE <condition expression>]
[ORDER BY <expression> [DESC|ASC]]
The ON clause defines
the mapping between the columns from the source query and the columns from the
mining model.
This mapping is used
to direct columns from the source query to columns in the mining model so that
the columns can be used as inputs to create the predictions. Columns in the
<join mapping list> are related by using an equal sign (=), as shown in
the following example:
[MiningModel].ColumnA = [source data query].Column1 AND
[MiningModel].ColumnB = [source data query].Column2 AND
·
Example
1: Singleton Query
The following example
shows how to create a query to predict whether a specific person will buy a
bicycle in real time.
In this query the
data is not stored in a table or other data source, but instead is entered
directly into the query.
The person in the
query has the following traits:
35 years old
Owns a house
Owns two cars
Has two children living at home
Using the T [TK448
Ch09 Prediction Decision Trees] and the known characteristics about the
subject, the query returns a Boolean value that describes whether the person
bought the bike and a set of tabular values, returned by the PredictHistogram (DMX) function, that describe how the
prediction was made.
Result:
Bike
Buyer Expression
0 -Expression
Bike
Buyer $SUPPORT $PROBABILITY $ADJUSTEDPROBABILITY $VARIANCE $STDEV
0 2690 0.593763788792469 0.0112902922003023 0 0
1 1840 0.40623621120753 0.00827793729343418 0 0
0 0 0 0 0
SELECT
[TK448 Ch09 Prediction Decision Trees].[Bike Buyer],
PredictHistogram([Bike
Buyer])
FROM
[TK448 Ch09 Prediction Decision Trees]
NATURAL PREDICTION JOIN
(SELECT 35 AS [Age],
'5-10 Miles'
AS [Commute Distance],
'1' AS [House Owner Flag],
2 AS [Number
Cars Owned],
2 AS [Total
Children]) AS t
·
Example
2: Using OPENQUERY
The following example
shows how to create a batch prediction query by using a list of potential customers
stored in an external dataset. Because the table is part of a data source view
that has been defined on an instance of Analysis Services, the query can use
OPENQUERY to retrieve the data.
Because the names of the
columns in the table are different from those in the mining model, the ON clause
must be used to map the columns in the table to the columns in the model.
The query returns the
first and last name of each person in the table, together with a Boolean column
that indicates whether each person is likely to buy a bike, where 0 means
"probably will not buy a bike" and 1 means "probably will buy a
bike". The last column contains the probability for the predicted result.
Result:
LastName FirstName Bike Buyer Expression
Adams Madeline 1 0.558392389
Aggarwal Edwin 1 0.634411008
Aggarwal Misty 1 0.506349356
.............. 1105
Rows.........................
SELECT
t.[LastName],
t.[FirstName],
[TK448 Ch09 Prediction Decision Trees].[Bike Buyer],
PredictProbability([Bike
Buyer])
From
[TK448 Ch09 Prediction Decision Trees]
PREDICTION JOIN
OPENQUERY([Adventure
Works DW2008R2],
'SELECT
[LastName],
[FirstName],
[MaritalStatus],
[Gender],
[YearlyIncome],
[TotalChildren],
[NumberChildrenAtHome],
[Education],
[Occupation],
[HouseOwnerFlag],
[NumberCarsOwned]
FROM
[dbo].[ProspectiveBuyer]
') AS t
ON
[TK448 Ch09 Prediction Decision Trees].[Marital Status] = t.[MaritalStatus]
AND
[TK448 Ch09 Prediction Decision Trees].[Gender] = t.[Gender] AND
[TK448 Ch09 Prediction Decision Trees].[Yearly Income] = t.[YearlyIncome]
AND
[TK448 Ch09 Prediction Decision Trees].[Total Children] = t.[TotalChildren]
AND
[TK448 Ch09 Prediction Decision Trees].[Number Children At Home] = t.[NumberChildrenAtHome]
AND
[TK448 Ch09 Prediction Decision Trees].[English Education] = t.[Education] AND
[TK448 Ch09 Prediction Decision Trees].[English Occupation] = t.[Occupation] AND
[TK448 Ch09 Prediction Decision Trees].[House Owner Flag] = t.[HouseOwnerFlag]
AND
[TK448 Ch09 Prediction Decision Trees].[Number Cars Owned] = t.[NumberCarsOwned]
--To
restrict the data set to only the customers who are predicted to buy a bike,
--and
then sort the list by customer name, you can add a WHERE clause and an ORDER BY
clause to the previous example:
WHERE [BIKE Buyer]=1
ORDER BY [LastName] ASC
·
Example
3: Predicting Associations
The following example
shows how to create a prediction by using a model that is built from the
Microsoft Association algorithm. Predictions on an association model can be
used to recommend related products.
For example, the
following query returns the three products that are most likely to be purchased
together with the following three products:
Mountain Bottle Cage
Mountain Tire Tube
Mountain-200
The Predict (DMX)
function is polymorphic and can be used with all model types. You use the value
3 as an argument to the function to limit the number of items that are returned
by the query. The SELECT list that follows the NATURAL PREDICTION JOIN clause
supplies the values to use as input for prediction.
Note: Because the
column that contains the predictable attribute, [v Assoc
Seq Line Items], is a table column, the query returns
a single column that contains a nested table. By default the nested table column
is named Expression. If your provider does not support hierarchical rowsets, you can use the FLATTENED keyword as shown in this
example to make the results easier to view.
Result:
Expression.Model
Sport-100
Water Bottle
Patch kit
SELECT FLATTENED
PREDICT([TK448
Ch09 Association Rules].[v Assoc Seq
Line Items], 3)
FROM
[TK448 Ch09 Association Rules]
NATURAL PREDICTION JOIN
(SELECT (SELECT 'Mountain Bottle Cage' AS
[Model]
UNION SELECT 'Mountain Tire
Tube' AS [Model]
UNION SELECT 'Mountain-200'
AS [Model]) AS
[v Assoc Seq Line Items]) AS t