1. Preparation of Data
For this demo, I use the same project sample (pp.
399-403) in the TK 448 book. But this time, we need a different view - [dbo].[vTimeSeries]. This
view contain 7 columns and 360 rows. Below are the partial results. I have
abbreviated the columns Quantity to QTY and CalendarYear
to CYear.
/*******
Return 360 Rows
ModelRegion TimeIndex QTY Amount CYear Month ReportingDate
M200 Europe 200507 6
20324.94 2005 7 2005-07-25 00:00:00.000
M200 Europe 200508 6
20349.94 2005 8 2005-08-25 00:00:00.000
M200 Europe 200509 5
16949.95 2005 9 2005-09-25 00:00:00.000
M200 Europe 200510 5
16949.95 2005 10 2005-10-25 00:00:00.000
M200 Europe 200511 8
27124.92 2005 11 2005-11-25 00:00:00.000
M200 Europe 200512 8
27049.92 2005 12 2005-12-25 00:00:00.000
M200 Europe 200601 8
27124.92 2006 1 2006-01-25 00:00:00.000
*******************/
SELECT * FROM [dbo].[vTimeSeries]
2. The model structure of a time series model
For this particular
model, we like to use the past amount and quantity to predict the futre amount or quantity. We are not only interested in the
entire population, but we are interested in the ModelRegion
level as well. Thus, we use both TimeIndex and ModelRegion as the key. ModelRegion
actually is the series key column because there is
only one record for each ModeRegion for every time
slice.
3. Results
The Time Series model has
only two tabs – Model and Charts
a. Model
Let’s start with the
items in the ‘Tree’ dropdown list box as below. They are actually all possible predicables because there are 12 ModelRegions
in the view and we have chosen Amount and Quantity as the predictable. So there
are 24 items in the list.
/************ Total
12 ModelRegion in the view
M200 Europe
M200 North America
M200 Pacific
R250 Europe
R250 North America
R250 Pacific
R750 Europe
R750 North America
R750 Pacific
T1000 Europe
T1000 North America
T1000 Pacific
************************************************/
In general, when we build
a time series model, Analysis Services presents the completed model as a tree.
If the data contains multiple case series, Analysis Services builds a
separate tree for each series. For example, we are predicting sale amount and
quantity for model/region. The predictions for each of these Amount/Quantity
and ModelRegions are case series. Analysis Services
builds a separate tree for each of these series. To view a particular series,
select the series from the Tree list.
For each tree, the time
series model contains an All node, and then splits into a series of nodes that represent
periodic structures discovered by the algorithm. As stated elsewhere, the Microsoft Time Series algorithm includes two
separate algorithms for analyzing time series:
·
The ARTXP algorithm, which was introduced in SQL Server
2005, is optimized for predicting the next likely value in a series.
·
The ARIMA algorithm was added in SQL Server 2008 to improve
accuracy for long-term prediction.
By default, the
Microsoft Time Series algorithm uses both methods, ARTXP and ARIMA, and blends
the results to improve prediction accuracy.
Now let’s pick up a
simple tree first – the one on M200 North American: Amount.
It has the root node
‘All’ only. The number of cases is 28 and the ARTxp equation is
Amount = -18044.606 + 0.395 * Amount (M200 North America,-5)
-68.127 * Quantity (M200 North America,-5) + 6238.016 * Quantity (R750
Europe,-8) + 947.770 * Quantity (M200 Europe,-5)
The ARIMA equation is:
ARIMA ({1,-0.437850777450388,-0.275446162796555,-0.147157624359275,0.258456568312988},0,{1,0.40589820815716,6.92517291464772E-02,-0.562913240510376,-3.78748928313655E-02,0.764136397379578,0.361501798658526})
X ({1,0.89587094113018},1,{1})(6) Intercept:35168.3950839853
The first predictor in
the ARTxp equation – Amount (M200
North American, -5) means the amount for M200 North American 5 periods ago.
Other three predictors can be interpreted in a similar way. The equation for the
node means that, based on the 28 cases from the dataset, the amount for M200
North American can be best predicated by the four predictors in a relationship
as expressed in the equation.
Now let’s look at another
one with splitting branches – R250 Europe Amount.
In the figure above,
the root node ‘All’ contains the number of cases and the ARIMA equation only.
After the root node, the tree splits into separate nodes. For each non-root
node, the Mining Legend contains both the ARTxp and
ARIMA algorithms, the equation for the node, and the number of cases in the
node. The ARTxp equation is listed first, and is
labeled as the tree node equation. This is followed by the ARIMA equation.
In our case, the 34
cases in the root node are split into 2 nodes, one for Amount (M200 Europe, 2
periods ago) < 68635.773 with 23 cases and another one for Amount (M200
Europe, 2 periods ago) >= 68635.773 with 11 cases. This implies that the
condition - Amount for M200 Europe at 2 periods ago = 68635.773 is the critical
criterion to separate the training cases. In a time series model, such a
branching indicates the most likely seasonal cycle (i.e., different patterns or
equations for periods before and after this point). So depending on the sales
amount for M200 Europe at 2 periods ago, they may have different prediction
equations. For the 1st node with amount<68635.773, the ARTxp equation is: Amount =
92555.668 + 46.474 * Quantity (R750 North America,-1). In other words, the
amount can be predicted with the sales quantity on R750 in North America at one
period ago. For the 2nd node with 11 cases, the ARTxp
equation can be described with four predictors:
Amount = 53890.626 +
0.038 * Amount (M200 Europe,-1) -341.313 * Quantity (R750 North America,-1)
-0.014 * Amount (M200 Europe,-2).
b. Charts
The Charts tab displays a graph
that shows the behavior of the predicted attribute over time, together with
five predicted future values (default). The vertical axis of the chart
represents the value of the series, and the horizontal axis represents time.
Let’s first find out what are items in the ‘Tree’
dropdown list box as below. They are the 24 predictable! We can select one or
more to display on the chart.
Let’s say we select Prediction Steps = 1 for M200 North America
Amount. In other words, we are going to only predict the next period amount
after the latest period in the dataset, which is 200807.
/***** 36 rows
TimeIndex
200806
200805
200804
....
*/
SELECT DISTINCT TimeIndex FROM [dbo].[vTimeSeries]
ORDER BY TimeIndex desc
Remember earlier in
discussing model, we have found the equation for M200 North American Amount is:
Amount = -18044.606 +
0.395 * Amount (M200 North America,-5) -68.127 * Quantity (M200 North America,-5)
+ 6238.016 * Quantity (R750 Europe,-8) + 947.770 * Quantity (M200 Europe,-5)
Let’s get the query
results on the relevant data: Amount (M200 North America,-5), Quantity (M200
North America,-5), Quantity (R750 Europe,-8), and Quantity (M200 Europe,-5)
/*
ModelRegion TimeIndex Quantity Amount CalendarYear Month ReportingDate
M200 North America 200802 89 205329.11 2008 2 2008-02-25 00:00:00.000
*/
SELECT * FROM
[dbo].[vTimeSeries]
WHERE TimeIndex='200802' AND ModelRegion='M200 North America'
/*
ModelRegion TimeIndex Quantity Amount CalendarYear Month ReportingDate
R750 Europe 200710 36 19439.64 2007 11 2007-11-25
00:00:00.000
*/
SELECT * FROM
[dbo].[vTimeSeries]
WHERE TimeIndex='200711' AND ModelRegion='R750 Europe'
/*
ModelRegion TimeIndex Quantity Amount CalendarYear Month ReportingDate
M200 Europe 200802 86 198344.14 2008 2 2008-02-25 00:00:00.000
*/
SELECT * FROM
[dbo].[vTimeSeries]
WHERE TimeIndex='200802' AND ModelRegion='M200 Europe'
So, the amount for 200807
= -18044.606+ 0.395 * 205329.11-68.127
*89+6238.016 *36+947.77 *86=363074, somewhat different from the value
363390.688 shown on the chart above.
Now let’s pick up another one – R250 Europe Amount. We choose
prediction steps = 4, predicting the value for the next 4 periods after 200806.
Based on the discussion in the Model section, the prediction equation for the
four values will be either Amount = 92555.668 + 46.474
* Quantity (R750 North America,-1) or Amount = 53890.626 +
0.038 * Amount (M200 Europe,-1) -341.313 * Quantity (R750 North America,-1)
-0.014 * Amount (M200 Europe,-2), depending on if Amount (M200 Europe, 2 periods ago) < 68635.773.
Finally, let’s look at the Abs button. Sometimes, we have a
chart with multiple models. In such a scenario, the scale of the data for each
model might be very different. If we use an absolute curve, one model might
appear as a flat line, whereas another model shows significant changes. This
occurs because the scale of one model is greater than the scale of the other
model. For instance, in the chart below, the amount for M200 Europe is often at
1000s, whereas the quantity for M200 Europe is typically less than 100. If we
use absolute value, the line for quantity appears to be very low and flat due
to the large scale for amount.
In this case, it’s better to use the Abs button to switch to a relative curve. In the
chart below, we have changed the scale to show the percentage of change instead
of absolute values. This makes it easier to compare models that are based on
different scales.