1. Preparation of Data

For this demo, I use the same project sample (pp. 399-403) in the TK 448 book. But this time, we need a different view - [dbo].[vTimeSeries]. This view contain 7 columns and 360 rows. Below are the partial results. I have abbreviated the columns Quantity to QTY and CalendarYear to CYear.

/******* Return 360 Rows

ModelRegion TimeIndex QTY Amount CYear Month ReportingDate

M200 Europe 200507 6 20324.94 2005 7 2005-07-25 00:00:00.000

M200 Europe 200508 6 20349.94 2005 8 2005-08-25 00:00:00.000

M200 Europe 200509 5 16949.95 2005 9 2005-09-25 00:00:00.000

M200 Europe 200510 5 16949.95 2005 10 2005-10-25 00:00:00.000

M200 Europe 200511 8 27124.92 2005 11 2005-11-25 00:00:00.000

M200 Europe 200512 8 27049.92 2005 12 2005-12-25 00:00:00.000

M200 Europe 200601 8 27124.92 2006 1 2006-01-25 00:00:00.000

*******************/

SELECT * FROM [dbo].[vTimeSeries]

2. The model structure of a time series model

For this particular model, we like to use the past amount and quantity to predict the futre amount or quantity. We are not only interested in the entire population, but we are interested in the ModelRegion level as well. Thus, we use both TimeIndex and ModelRegion as the key. ModelRegion actually is the series key column because there is only one record for each ModeRegion for every time slice.

3. Results

The Time Series model has only two tabs – Model and Charts

a. Model

Let’s start with the items in the ‘Tree’ dropdown list box as below. They are actually all possible predicables because there are 12 ModelRegions in the view and we have chosen Amount and Quantity as the predictable. So there are 24 items in the list.

/************ Total 12 ModelRegion in the view

M200 Europe

M200 North America

M200 Pacific

R250 Europe

R250 North America

R250 Pacific

R750 Europe

R750 North America

R750 Pacific

T1000 Europe

T1000 North America

T1000 Pacific

************************************************/

In general, when we build a time series model, Analysis Services presents the completed model as a tree. If the data contains multiple case series, Analysis Services builds a separate tree for each series. For example, we are predicting sale amount and quantity for model/region. The predictions for each of these Amount/Quantity and ModelRegions are case series. Analysis Services builds a separate tree for each of these series. To view a particular series, select the series from the Tree list.

For each tree, the time series model contains an All node, and then splits into a series of nodes that represent periodic structures discovered by the algorithm. As stated elsewhere, the Microsoft Time Series algorithm includes two separate algorithms for analyzing time series:

· The ARTXP algorithm, which was introduced in SQL Server 2005, is optimized for predicting the next likely value in a series.

· The ARIMA algorithm was added in SQL Server 2008 to improve accuracy for long-term prediction.

By default, the Microsoft Time Series algorithm uses both methods, ARTXP and ARIMA, and blends the results to improve prediction accuracy.

Now let’s pick up a simple tree first – the one on M200 North American: Amount.

It has the root node ‘All’ only. The number of cases is 28 and the ARTxp equation is

Amount = -18044.606 + 0.395 * Amount (M200 North America,-5) -68.127 * Quantity (M200 North America,-5) + 6238.016 * Quantity (R750 Europe,-8) + 947.770 * Quantity (M200 Europe,-5)

The ARIMA equation is:

ARIMA ({1,-0.437850777450388,-0.275446162796555,-0.147157624359275,0.258456568312988},0,{1,0.40589820815716,6.92517291464772E-02,-0.562913240510376,-3.78748928313655E-02,0.764136397379578,0.361501798658526}) X ({1,0.89587094113018},1,{1})(6) Intercept:35168.3950839853

The first predictor in the ARTxp equation – Amount (M200 North American, -5) means the amount for M200 North American 5 periods ago. Other three predictors can be interpreted in a similar way. The equation for the node means that, based on the 28 cases from the dataset, the amount for M200 North American can be best predicated by the four predictors in a relationship as expressed in the equation.

Now let’s look at another one with splitting branches – R250 Europe Amount.

In the figure above, the root node ‘All’ contains the number of cases and the ARIMA equation only. After the root node, the tree splits into separate nodes. For each non-root node, the Mining Legend contains both the ARTxp and ARIMA algorithms, the equation for the node, and the number of cases in the node. The ARTxp equation is listed first, and is labeled as the tree node equation. This is followed by the ARIMA equation.

In our case, the 34 cases in the root node are split into 2 nodes, one for Amount (M200 Europe, 2 periods ago) < 68635.773 with 23 cases and another one for Amount (M200 Europe, 2 periods ago) >= 68635.773 with 11 cases. This implies that the condition - Amount for M200 Europe at 2 periods ago = 68635.773 is the critical criterion to separate the training cases. In a time series model, such a branching indicates the most likely seasonal cycle (i.e., different patterns or equations for periods before and after this point). So depending on the sales amount for M200 Europe at 2 periods ago, they may have different prediction equations. For the 1^st node with amount<68635.773, the ARTxp equation is: Amount = 92555.668 + 46.474 * Quantity (R750 North America,-1). In other words, the amount can be predicted with the sales quantity on R750 in North America at one period ago. For the 2^nd node with 11 cases, the ARTxp equation can be described with four predictors:

Amount = 53890.626 + 0.038 * Amount (M200 Europe,-1) -341.313 * Quantity (R750 North America,-1) -0.014 * Amount (M200 Europe,-2).

b. Charts

The Charts tab displays a graph that shows the behavior of the predicted attribute over time, together with five predicted future values (default). The vertical axis of the chart represents the value of the series, and the horizontal axis represents time.

Let’s first find out what are items in the ‘Tree’ dropdown list box as below. They are the 24 predictable! We can select one or more to display on the chart.

Let’s say we select Prediction Steps = 1 for M200 North America Amount. In other words, we are going to only predict the next period amount after the latest period in the dataset, which is 200807.

/***** 36 rows

TimeIndex

200806

200805

200804

....

SELECT DISTINCT TimeIndex FROM [dbo].[vTimeSeries]

ORDER BY TimeIndex desc

Remember earlier in discussing model, we have found the equation for M200 North American Amount is:

Amount = -18044.606 + 0.395 * Amount (M200 North America,-5) -68.127 * Quantity (M200 North America,-5) + 6238.016 * Quantity (R750 Europe,-8) + 947.770 * Quantity (M200 Europe,-5)

Let’s get the query results on the relevant data: Amount (M200 North America,-5), Quantity (M200 North America,-5), Quantity (R750 Europe,-8), and Quantity (M200 Europe,-5)

ModelRegion TimeIndex Quantity Amount CalendarYear Month ReportingDate

M200 North America 200802 89 205329.11 2008 2 2008-02-25 00:00:00.000

SELECT * FROM [dbo].[vTimeSeries]

WHERE TimeIndex='200802' AND ModelRegion='M200 North America'

ModelRegion TimeIndex Quantity Amount CalendarYear Month ReportingDate

R750 Europe 200710 36 19439.64 2007 11 2007-11-25 00:00:00.000

SELECT * FROM [dbo].[vTimeSeries]

WHERE TimeIndex='200711' AND ModelRegion='R750 Europe'

ModelRegion TimeIndex Quantity Amount CalendarYear Month ReportingDate

M200 Europe 200802 86 198344.14 2008 2 2008-02-25 00:00:00.000

SELECT * FROM [dbo].[vTimeSeries]

WHERE TimeIndex='200802' AND ModelRegion='M200 Europe'

So, the amount for 200807 = -18044.606+ 0.395 * 205329.11-68.127 *89+6238.016 *36+947.77 *86=363074, somewhat different from the value 363390.688 shown on the chart above.

Now let’s pick up another one – R250 Europe Amount. We choose prediction steps = 4, predicting the value for the next 4 periods after 200806. Based on the discussion in the Model section, the prediction equation for the four values will be either Amount = 92555.668 + 46.474 * Quantity (R750 North America,-1) or Amount = 53890.626 + 0.038 * Amount (M200 Europe,-1) -341.313 * Quantity (R750 North America,-1) -0.014 * Amount (M200 Europe,-2), depending on if Amount (M200 Europe, 2 periods ago) < 68635.773.

Finally, let’s look at the Abs button. Sometimes, we have a chart with multiple models. In such a scenario, the scale of the data for each model might be very different. If we use an absolute curve, one model might appear as a flat line, whereas another model shows significant changes. This occurs because the scale of one model is greater than the scale of the other model. For instance, in the chart below, the amount for M200 Europe is often at 1000s, whereas the quantity for M200 Europe is typically less than 100. If we use absolute value, the line for quantity appears to be very low and flat due to the large scale for amount.

In this case, it’s better to use the Abs button to switch to a relative curve. In the chart below, we have changed the scale to show the percentage of change instead of absolute values. This makes it easier to compare models that are based on different scales.

On the Journey to be a SQL Server Professional

How to Interpret the Results from a Time Series Model?

1. Preparation of Data

2. The model structure of a time series model

3. Results

a. Model

b. Charts