A Summary of DMX Statements with Examples

1.              Classification of DMX Statements by Task

You can use Data Mining Extensions (DMX) statements to perform each of these tasks programmatically.

a.     Creating mining structures and mining models

i.                CREATE MINING STRUCTURE

To add a new mining structure to a database.

ii.              ALTER MINING STRUCTURE

To add mining models to the mining structure.

iii.            CREATE MINING MODEL

To build a new mining model and associated mining structure.

b.    Processing mining structures and mining models

i.                INSERT INTO

To process a mining structure and mining model.

c.     Deleting or dropping mining structures or mining models

i.                DELETE

To remove all the trained data from a mining model or mining structure.

ii.              DROP MINING STRUCTURE or DROP MINING MODEL

To completely remove a mining structure or mining model from a database.

d.    Copying mining models

i.                 SELECT INTO

To copy the structure of an existing mining model into a new mining model and to train the new model with the same data.

e.     Browsing mining models

SELECT - to browse the information that the data mining algorithm calculates and stores in the data mining model during model training. Much like with Transact-SQL, you can use several clauses with the SELECT statement, to extend its power. These clauses include:

i.                SELECT FROM <model>

The simplest prediction query; also called an empty prediction join. Returns the most probable value for the columns specified in the query.

ii.              SELECT DISTINCT FROM <model >

Returns all possible states for the selected column in the model. The SELECT DISTINCT FROM statement only works with a single column or with a set of related columns. This clause does not work with a set of unrelated columns.

iii.            SELECT FROM <model>.CONTENT

Returns the mining model schema rowset for the specified data mining model.

iv.            SELECT FROM <model>.CASES

Supports drillthrough, and returns the cases that were used to train the model.

v.              SELECT FROM <model>.SAMPLE_CASES

Returns sample cases that are representative of the cases that are used to train the data mining model.

vi.            SELECT FROM <structure>.CASES

Returns the cases that were used to create the mining structure.

vii.          SELECT FROM <model>.DIMENSION_CONTENT

A mining model can be used as a dimension in an OLAP cube, where each node in the model is represented as a member of the dimension. The SELECT FROM <model>.Dimension_CONTENT statement returns the content of the model that pertains to its usage as a dimension.

viii.        SELECT FROM <model> PREDICTION JOIN

Uses a mining model to predict the states of columns in an external data source.

2.    Classification of DMX Statements by DDLs and DMLs

a.     DMX DDLs

i.                CREATE MINING STRUCTURE - Creates the structure

CREATE MINING STRUCTURE [New Mailing]

(

    CustomerKey LONG KEY,

    Gender TEXT DISCRETE,

    [Number Cars Owned] LONG DISCRETE,

    [Bike Buyer] LONG DISCRETE

)

ii.              ALTER MINING STRUCTURE - Adds a mining model to an existing structure

ALTER MINING STRUCTURE [New Mailing]

ADD MINING MODEL [Naive Bayes Women]

(

    CustomerKey,

    Gender,

    [Number Cars Owned],

    [Bike Buyer] PREDICT

)

USING Microsoft_Naive_Bayes

WITH FILTER(Gender = 'F' )

iii.            DROP MINING STRUCTURE - Removes a complete structure

DROP MINING STRUCTURE [New Mailing]

iv.            CREATE MINING MODEL - Creates a mining model and an associated mining structure simultaneously

CREATE MINING MODEL [NBSample]

(

    CustomerKey LONG KEY,

    Gender TEXT DISCRETE,

    [Number Cars Owned] LONG DISCRETE,

    [Bike Buyer] LONG DISCRETE PREDICT

)

USING Microsoft_Naive_Bayes

v.              DROP MINING MODEL - Removes a model

DROP MINING MODEL [NBSample]

vi.            EXPORT - Exports a mining model from an SSAS database

/*Exports the mining model to a specified file.

Note: abf is the Analysis Services Backup File created by SSAS. Like other SQL Server backup files, it cannot be opened by notepad, it can be consumed by XMLA or DMX*/

EXPORT MINING MODEL [TK448 Ch09 Association Rules] TO 'C:\Association_NEW.abf'

vii.          IMPORT - Loads a mining model and mining structure from an Analysis Services Backup File (.abf) file onto the server.

DROP MINING MODEL  [TK448 Ch09 Association Rules]

DROP MINING STRUCTURE structure [TK448 Ch09 Association]

IMPORT FROM 'C:\Association_NEW.abf'

viii.        SELECT INTO - Copies the structure of an existing mining model into a new model and trains it with the same data

中文意思:使用 SELECT INTO 语句将现有挖掘模型的结构复制到新模型中,并使用相同的数据为其定型 (结构和数据)Note: This is a DDL.

SELECT * INTO [Clustering Europe Region] -- the new model name

USING [Microsoft_Clustering] WITH FILTER(Region='Europe') --Microsoft_Clustering is the algorithm

FROM [TK448 Ch09 Cube Clustering] -- this is the original model

b.    DMX DMLs

i.                INSERT INTO - Trains a mining model and create an abstraction of the mining model.

This statement does not insert the source data into a data mining model; instead, you use it to create an abstraction that describes the mining model that the algorithm creates. 中文意思:使用 INSERT INTO 语句为挖掘模型定型。 执行该语句不会将实际源数据插入数据挖掘模型对象,但会创建有关说明算法所创建的挖掘模型的摘要信息。)Note: This is a DML.

INSERT INTO NBSample

(

CustomerKey,

Gender,

[Number Cars Owned],

[Bike Buyer]

)

OPENQUERY([Adventure Works DW2008],

'SELECT CustomerKey,

Gender,

[NumberCarsOwned],

[BikeBuyer]

FROM [dbo].[vTargetMail]');

ii.              UPDATE - Updates a name of a node in the Clustering and Sequence Clustering models.

/*

Purpose: Changes the NODE_CAPTION column in the data mining model.

Syntax:

         UPDATE <model>.CONTENT

         SET NODE_CAPTION='new caption'

         [WHERE <condition expression>]

Example: changes the default name, Cluster 1, for cluster 001 to the more descriptive name, Likely Customers.

*/

UPDATE [TM Clustering].CONTENT

SET NODE_CAPTION= 'Likely Customers'

WHERE NODE_UNIQUE_NAME = '001'

iii.            DELETE - Clears the trained content from a mining model.

/*

Purpose: Clears a mining model, a mining structure, or a mining structure and all its associated mining models, depending on the Data Mining Extensions (DMX) clauses that you use.

 

Syntax:

 

DELETE FROM [MINING MODEL] <model>[.CONTENT]

DELETE FROM [MINING STRUCTURE] <structure>[.CONTENT]|[.CASES]

Example: removes all of the content from the NB_Sample model.

*/

DELETE FROM NBSample.CONTENT

iv.            SELECT

Purposes:

 

The SELECT statement in Data Mining Extensions (DMX) is used for the following tasks in data mining:

·       Browsing the content of an existing mining model

·       Browsing the mining structure

·       Creating a copy of an existing mining model

·       Creating predictions from an existing mining model

 

Syntax:

This is the generic form. It has various forms.

SELECT [FLATTENED] [TOP <n>] <select list>

         FROM <model/structure>[.aspect]

         [WHERE <condition expression>]

         [ORDER BY <expression>[DESC|ASC]]

Instantiations:

         SELECT FROM <model>

         SELECT DISTINCT FROM <model >

         SELECT FROM <model>.CONTENT

         SELECT FROM <model>.CASES

         SELECT FROM <model>.SAMPLE_CASES

         SELECT FROM <structure>.CASES

         SELECT FROM <model>.DIMENSION_CONTENT

         SELECT FROM <model> PREDICTION JOIN

1.     SELECT FROM <model>

·       What's for?

The simplest prediction query; also called an empty prediction join.

 

Returns the most probable value for the columns specified in the query.

 

Only the content from the mining model is used to create the prediction.

 

The columns in the expression list in the syntax below must be defined as predict or predict only, or related to a predictable column.

 

·       Syntax:

SELECT <expression list> [TOP <n>] FROM <model>

[WHERE <condition list>]

[ORDER BY <expression> [DESC|ASC]]

 

·       Example 1:

Performs an empty prediction join on the Bike Buyer column, returning the most likely state in the mining model.

-- Return 0

SELECT ([Bike Buyer]) FROM [TK448 Ch09 Prediction Naive Bayes]

·       Example 2:

 

Performs a prediction on the Amount column in the Forecasting model, returning the next four time steps.

The Model Region column combines bike models and regions into a single identifier. The query uses the PredictTimeSeries function to perform the prediction.

-- Return:

 

                    Model Region        Expression

                    M200   Europe       -Expression

                                                            $TIME         Amount

                                                            200807 264039.420693544

                                                            200808 323995.069045674

                                                            200809 346405.627525689

                                                            200810 337472.763729162

                    M200 North America  +Expression

                    M200 Pacific        +Expression

                    R250 Europe         +Expression

                    R250 North America  +Expression

                    R250 Pacific        +Expression

                    R750 Europe         +Expression

                    R750 North America  +Expression

                    R750 Pacific        +Expression

                    T1000 Europe        +Expression

                    T1000 North America +Expression

                    T1000 Pacific       +Expression

SELECT [Model Region], PredictTimeSeries(Amount, 4)

FROM [TK448 Ch09 Forecasting Time Series]

 

2.              SELECT DISTINCT FROM <model>

 

·       What's for?

Returns all possible states for the selected column in the model. The values that are returned vary depending on whether the specified column contains discrete values, discretized numeric values, or continuous numeric values.

 

The SELECT DISTINCT FROM statement only works with a single column or with a set of related columns. This clause does not work with a set of unrelated columns.

 

The SELECT DISTINCT FROM statement allows you to directly reference a column inside of a nested table. For example: <model>.<table column reference>.<column reference>

 

The results of the SELECT DISTINCT FROM <model> statement vary, depending on the column type.

 

·       Syntax:

 

SELECT [FLATTENED] DISTINCT [TOP <n>] <expression list> FROM <model>

[WHERE <condition list>][ORDER BY <expression>]

 

·       Example 1 - Discrete values

 

For columns that contain discrete values, the results always include the Missing state, shown as a null value.

 

-- Return:

Gender

F

M

SELECT DISTINCT [Gender]

FROM [TK448 Ch09 Prediction Decision Trees]

 

·       Example 2: Discretized Column Example  

 

Returns the midpoint, minimum age, and maximum age by buckets in the column.

/*return

Midpoint Age              Minimum Age       Maximum Age

            

36                         30                  42

47                         42                  52

57                         52                  62

66                         62                  71

84                         71                  96

*/

SELECT DISTINCT [Age] AS [Midpoint Age],

    RangeMin([Age]) AS [Minimum Age],

    RangeMax([Age]) AS [Maximum Age]

FROM [TK448 Ch09 Prediction Decision Trees]

 

3.              SELECT FROM <model>.CONTENT

 

·       What's for?

 

Returns the mining model schema rowset for the specified data mining model.

 

The SELECT FROM <model>.CONTENT statement returns content that is specific to each algorithm.

For example, you might want to use the descriptions of all the rules of an association rules model in a custom application. You can use a SELECT FROM <model>.CONTENT statement to return values in the NODE_RULE column of the model.

 

·       Syntax:

 

     SELECT [FLATTENED] [TOP <n>] <expression list> FROM <model>.CONTENT

     [WHERE <condition expression>]

     [ORDER BY <expression> [DESC|ASC]]

 

·       Example 1: Returns the ID of the parent node for the decision trees model.

 

                   Result:

                                        MODEL_NAME                              NODE_NAME

                                        TK448 Ch09 Prediction Decision Trees       0

SELECT MODEL_NAME, NODE_NAME FROM [TK448 Ch09 Prediction Decision Trees].CONTENT

WHERE NODE_TYPE = 1

 

·       Example 2: Uses the IsDescendant function to return the immediate children of the node that was returned in the previous query.

 

Result:

          NODE_NAME           NODE_TYPE   NODE_CAPTION

          000000001           2            All

          00000000100         3         Number Cars Owned = 0

          00000000101         3         Number Cars Owned = 3

          00000000102         3         Number Cars Owned = 1

          00000000103         3         Number Cars Owned = 4

          00000000104         3         Number Cars Owned = 2

          0000000010400        3         Region = 'Pacific'

          0000000010401        3         Region not = 'Pacific'

          000000001040100     4         Yearly Income < 39221.4065840128

............total 127 rows........................

 

SELECT NODE_NAME, NODE_TYPE, NODE_CAPTION

FROM [TK448 Ch09 Prediction Decision Trees].CONTENT

WHERE ISDESCENDANT('0')

 

·       Example 3: Returns a single node, the marginal statistics node (NODE_TYPE = 26) from a Naïve Bayes model.

 

However, this node contains a nested table, in the NODE_DISTRIBUTION column. As a result, the nested table column is flattened and a row is returned for every row in the nested table. The value of the scalar column   MODEL_NAME is repeated for each row in the nested table.

 

Results:

MODEL_NAME   NODE_DISTR.ATTRIBUTE_NAME  NODE_DISTR.ATTRIBUTE_VALUE        NODE_DISTR.SUPPORT  NODE_DISTR.PROBABILITY

xxx Naive Bayes     Age                 Missing                           0                    0                         

xxx Naive Bayes     Age                 < 42                             3826                    0.295695185099312

...................................66 rows.........................

*/

 

SELECT FLATTENED MODEL_NAME, NODE_DISTRIBUTION

FROM [TK448 Ch09 Prediction Naive Bayes].CONTENT

WHERE NODE_TYPE = 26

 

·       Example 4: Return only some of the columns from the nested table by using a sub-select statement.

 

You can simplify the display by aliasing the table name of the nested table, as shown.

 

Result:

             MODEL_NAME                       t.ATTRIBUTE_NAME       t.ATTRIBUTE_VALUE          t.SUPPORT

             TK448 Ch09 Prediction Naive Bayes Age                 Missing                    0

             TK448 Ch09 Prediction Naive Bayes Age                 < 42                       3826

             TK448 Ch09 Prediction Naive Bayes Age                 42 - 52                    4303

             TK448 Ch09 Prediction Naive Bayes Age                 52 - 62                    2872

             TK448 Ch09 Prediction Naive Bayes Age                 62 - 71                    1393

             TK448 Ch09 Prediction Naive Bayes Age                 >= 71                      545

             TK448 Ch09 Prediction Naive Bayes Bike Buyer          Missing                    0

             TK448 Ch09 Prediction Naive Bayes Bike Buyer          0                          6509

             TK448 Ch09 Prediction Naive Bayes Bike Buyer          1                          6430

             .............66 rows....................

 

SELECT flattened MODEL_NAME,

(SELECT ATTRIBUTE_NAME, ATTRIBUTE_VALUE, [SUPPORT]

FROM NODE_DISTRIBUTION) AS t

FROM [TK448 Ch09 Prediction Naive Bayes].CONTENT

WHERE NODE_TYPE = 26

 

4.              SELECT FROM <model>.CASES

·       What's for?

Supports drillthrough, and returns the cases that were used to train the model.

You can also return structure columns that are not included in the model, if drillthrough has been enabled on the mining structure and on the mining model, and if you have the appropriate permissions.

 

If drillthrough is not enabled on the mining model, this statement will fail.

 

·       Syntax:

 

     SELECT [FLATTENED] [TOP <n>] <expression list> FROM <model>.CASES

     [WHERE <condition expression>][ORDER BY <expression> [DESC|ASC]]

 

·       Example 1: Drillthrough to Model Cases and Structure Columns

 

The following example returns the columns for all the cases that were used to test the [TK448 Ch09 Prediction Decision Trees] model.

 

If the mining structure on which the model is built does not have a holdout test data set, this query would return 0 cases.

 

You can use the expression list to return only the columns that you need like below.

 

Note: You have to enable drillthrough for the model. You can enable drillthrough for an existing model in BIDS or SSDT. Find the model, right-click the property, make AllowDrillThrough = true, then process the model. Then run the DMX below.

Result:

          Customer Key Bike Buyer

          11001               1

          11005               1

          11007               1

          11010               1

          11015               1

          11016               1

          11017               1

          11018               1

          11020               1

          11023               0

          ...........5545 rows.............

 

SELECT [Customer Key], [Bike Buyer] FROM [TK448 Ch09 Prediction Decision Trees].Cases

WHERE IsTestCase();

 

·       Example 2: Drillthrough to Training Cases in a Specific Node

 

The following example returns just those cases that were used to train Cluster 2. The node for Cluster 2 has the value '002' for the NODE_UNIQUE_NAME column.

 

The example also returns one structure column, [Customer Key], that was not a part of the mining model, and provides the alias CustomerID for the column.

 

Note that the name of the structure column is passed as a string value and therefore must be enclosed in quotation marks, not brackets.

 

Result:

          CustomerID

          11133

          11344

          11345

          .....return 2169 rows..........

SELECT StructureColumn('Customer Key') AS CustomerID --, * --disabled for presentation

FROM [TK448 Ch09 Prediction Clustering].Cases

WHERE IsTrainingCase()

AND IsInNode('002')

 

5.              SELECT FROM <model>.SAMPLE_CASES

 

·       What's for?

 

Returns sample cases that are representative of the cases that are used to train the data mining model.

To use this statement, you must enable drillthrough for the mining model.

 

Sample cases may be generated and may not actually exist in the training data.

The returned case is representative of the specified content node.

 

Although the Microsoft Sequence Clustering algorithm is the only Microsoft algorithm that supports using SELECT FROM <model>.SAMPLE_CASES, third-party algorithms may also support it.

 

·       Syntax:

 

     SELECT [FLATTENED] [TOP <n>] <expression list> FROM <model>.SAMPLE_CASES

     [WHERE <condition list>] ORDER BY <expression> [DESC|ASC]]

 

·       Example:

The following example returns sample cases that are used to train the TK448 Ch09 Sequence Clustering mining model. Using the IsInNode (DMX) function in the WHERE clause returns only cases that are associated with the '000000003' node. The node string can be found in the NODE_UNIQUE_NAME column of the schema rowset.

 

Result:

 

          Order Number v Assoc Seq Line Items

          SO51176                    -v Assoc Seq Line Items

                                           Line Number    Model

                                           1                 Mountain Tire Tube

                                           2                 Hydration Pack

                                           3                 Sport-100

          SO51177                    +v Assoc Seq Line Items

          SO51178                    +v Assoc Seq Line Items

          .....................200 Rows.......................

 

Select * from [TK448 Ch09 Sequence Clustering].SAMPLE_Cases

WHERE IsInNode('000000003')

 

6.              SELECT FROM <structure>.CASES

 

·       What's for?

 

Returns the cases that were used to create the mining structure.

If drillthrough is not enabled on the structure, the statement will fail.

Also, the statement will fail if the user does not have drillthrough permissions on the mining structure.

 

In Analysis Services, drillthrough on new mining structures is enabled by default.

You cannot enable or disable drillthrough on the mining structure by using Data Mining Extensions (DMX). But you can do it with BIDS or SSDT.

 

·       Syntax:

 

     SELECT [TOP n] <expression list> FROM <structure>.CASES

     [WHERE <condition expression>][ORDER BY <expression> [DESC|ASC]]

 

·       Example 1: Drill through to Structure Cases

 

The following example returns a list of the 5 oldest customers in the mining structure. The query returns all the columns in the mining model, but restricts the rows to those who purchased a bike, and orders them by age. You can also edit the expression list to return only the columns that you need as below.

 

Result:

          Age    Bike Buyer   Customer Key

          84     1                   11147

          84     1                   11251

          84     1                   11120

          84     1                   11257

          84     1                   11252

*/

SELECT TOP 5 Age, [Bike Buyer],[Customer Key]

FROM [TK448 Ch09 Prediction].Cases

WHERE [Bike Buyer] = 1

ORDER BY Age DESC;

 

·       Example 2: Drillthrough to Test or Training Cases Only

The following example returns a list of the structure cases for Targeted Mailing that are reserved for testing. If the mining structure does not contain a holdout test set, by default all cases are treated as training cases, and this query would return 0 cases.

Result:

          Customer Key        Gender        Age

          11001               M            47

          11005               M            47

          11007               M            47

          11010               F            47

...............5545 Rows.................

SELECT [Customer Key], Gender, Age

FROM [TK448 Ch09 Prediction].Cases

WHERE IsTestCase()

--WHERE IsTrainingCase();

 

7.              SELECT FROM <model>.DIMENSION_CONTENT

 

·       What's for?

 

A mining model can be used as a dimension in an OLAP cube, where each node in the model is represented as a member of the dimension. The SELECT FROM <model>.Dimension_CONTENT statement returns the content of the model that pertains to its usage as a dimension.

 

·       Syntax:

 

         SELECT [FLATTENED] [TOP <n>] <expression list>

FROM <model>.Dimension_CONTENT

         [WHERE <condition expression>]

         [ORDER BY <expression> [DESC|ASC]]

·       Example - Selects all columns from the [TK448 Ch09 Prediction Decision Trees] model content that pertain to using the model as a dimension.

 

Result:

Too many columns to fit here. See the pictures below, Note: two screens are for the same rows. There are total 127 rows in the result set.

 

SELECT *

FROM [TK448 Ch09 Prediction Decision Trees].Dimension_Content

 

 

 

8.     SELECT FROM <model> PREDICTION JOIN

·       What's for?

 

Uses a mining model to predict the states of columns in an external data source.

The PREDICTION JOIN statement matches each case from the source query to the model.

 

·       Syntax:

 

         SELECT [FLATTENED] [TOP <n>] <select expression list>

         FROM <model> | <sub select> [NATURAL] PREDICTION JOIN

         <source data query> [ON <join mapping list>]

         [WHERE <condition expression>]

         [ORDER BY <expression> [DESC|ASC]]

 

The ON clause defines the mapping between the columns from the source query and the columns from the mining model.

This mapping is used to direct columns from the source query to columns in the mining model so that the columns can be used as inputs to create the predictions. Columns in the <join mapping list> are related by using an equal sign (=), as shown in the following example:

 

[MiningModel].ColumnA = [source data query].Column1 AND

[MiningModel].ColumnB = [source data query].Column2 AND

 

·       Example 1: Singleton Query

The following example shows how to create a query to predict whether a specific person will buy a bicycle in real time.

 

In this query the data is not stored in a table or other data source, but instead is entered directly into the query.

 

The person in the query has the following traits:

       35 years old

       Owns a house

       Owns two cars

       Has two children living at home

 

Using the T [TK448 Ch09 Prediction Decision Trees] and the known characteristics about the subject, the query returns a Boolean value that describes whether the person bought the bike and a set of tabular values, returned by the PredictHistogram (DMX) function, that describe how the prediction was made.

 

Result:

                              Bike Buyer            Expression

                              0                            -Expression

                                                            Bike Buyer            $SUPPORT            $PROBABILITY                     $ADJUSTEDPROBABILITY                  $VARIANCE          $STDEV

                                                            0                            2690                     0.593763788792469               0.0112902922003023                     0                            0

                                                            1                            1840                     0.40623621120753               0.00827793729343418                   0                            0

                                                                                          0                            0                                           0                                                            0                            0

SELECT

  [TK448 Ch09 Prediction Decision Trees].[Bike Buyer],

  PredictHistogram([Bike Buyer])

FROM

  [TK448 Ch09 Prediction Decision Trees]

NATURAL PREDICTION JOIN

(SELECT 35 AS [Age],

  '5-10 Miles' AS [Commute Distance],

  '1' AS [House Owner Flag],

  2 AS [Number Cars Owned],

  2 AS [Total Children]) AS t

 

·       Example 2: Using OPENQUERY

 

The following example shows how to create a batch prediction query by using a list of potential customers stored in an external dataset. Because the table is part of a data source view that has been defined on an instance of Analysis Services, the query can use OPENQUERY to retrieve the data.

           

Because the names of the columns in the table are different from those in the mining model, the ON clause must be used to map the columns in the table to the columns in the model.

 

The query returns the first and last name of each person in the table, together with a Boolean column that indicates whether each person is likely to buy a bike, where 0 means "probably will not buy a bike" and 1 means "probably will buy a bike". The last column contains the probability for the predicted result.

Result:

          LastName     FirstName    Bike Buyer          Expression

          Adams        Madeline     1                0.558392389

          Aggarwal     Edwin        1                0.634411008

          Aggarwal     Misty        1                0.506349356

          .............. 1105 Rows.........................

 

SELECT

  t.[LastName],

  t.[FirstName],

  [TK448 Ch09 Prediction Decision Trees].[Bike Buyer],

  PredictProbability([Bike Buyer])

From

  [TK448 Ch09 Prediction Decision Trees]

PREDICTION JOIN

  OPENQUERY([Adventure Works DW2008R2],

    'SELECT

      [LastName],

      [FirstName],

      [MaritalStatus],

      [Gender],

      [YearlyIncome],

      [TotalChildren],

      [NumberChildrenAtHome],

      [Education],

      [Occupation],

      [HouseOwnerFlag],

      [NumberCarsOwned]

    FROM

      [dbo].[ProspectiveBuyer]

    ') AS t

ON

  [TK448 Ch09 Prediction Decision Trees].[Marital Status] = t.[MaritalStatus] AND

  [TK448 Ch09 Prediction Decision Trees].[Gender] = t.[Gender] AND

  [TK448 Ch09 Prediction Decision Trees].[Yearly Income] = t.[YearlyIncome] AND

  [TK448 Ch09 Prediction Decision Trees].[Total Children] = t.[TotalChildren] AND

  [TK448 Ch09 Prediction Decision Trees].[Number Children At Home] = t.[NumberChildrenAtHome] AND

  [TK448 Ch09 Prediction Decision Trees].[English Education] = t.[Education] AND

  [TK448 Ch09 Prediction Decision Trees].[English Occupation] = t.[Occupation] AND

  [TK448 Ch09 Prediction Decision Trees].[House Owner Flag] = t.[HouseOwnerFlag] AND

  [TK448 Ch09 Prediction Decision Trees].[Number Cars Owned] = t.[NumberCarsOwned]

--To restrict the data set to only the customers who are predicted to buy a bike,

--and then sort the list by customer name, you can add a WHERE clause and an ORDER BY clause to the previous example:

WHERE [BIKE Buyer]=1

ORDER BY [LastName] ASC

 

·       Example 3: Predicting Associations

 

The following example shows how to create a prediction by using a model that is built from the Microsoft Association algorithm. Predictions on an association model can be used to recommend related products.

      

For example, the following query returns the three products that are most likely to be purchased together with the following three products:

            Mountain Bottle Cage

            Mountain Tire Tube

            Mountain-200     

The Predict (DMX) function is polymorphic and can be used with all model types. You use the value 3 as an argument to the function to limit the number of items that are returned by the query. The SELECT list that follows the NATURAL PREDICTION JOIN clause supplies the values to use as input for prediction.

 

Note: Because the column that contains the predictable attribute, [v Assoc Seq Line Items], is a table column, the query returns a single column that contains a nested table. By default the nested table column is named Expression. If your provider does not support hierarchical rowsets, you can use the FLATTENED keyword as shown in this example to make the results easier to view.

Result:

          Expression.Model

          Sport-100

          Water Bottle

          Patch kit

SELECT FLATTENED

  PREDICT([TK448 Ch09 Association Rules].[v Assoc Seq Line Items], 3)

FROM

  [TK448 Ch09 Association Rules]

NATURAL PREDICTION JOIN

(SELECT (SELECT 'Mountain Bottle Cage' AS [Model]

  UNION SELECT 'Mountain Tire Tube' AS [Model]

  UNION SELECT 'Mountain-200' AS [Model]) AS [v Assoc Seq Line Items]) AS t