I am still using the same project exercises from the TK 70-448
book as in the previous post. In the project, 12 input variables
are used to find the salient predictors of the predictable variable, BikeBuyer
(yes or no).
CustomerKey is the key column.
BikeBuyer is the predictable column, and an input as
well.
The other 12 input columns are Age, CommuteDistance,
EnglishEducation, EnglishOccupation, Gender, HouseOwnerFlag, MaritalStatus,
NumberCarsOwned, NumberChildrenAtHome, Region, TotalChildren, and YearlyIncome.
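To keep the column roles straight, here is a small Python sketch that records them as plain data. The column names are taken from the list above. The variable names and the dict layout are only illustrative and are not an SSAS or AMO object.

```python
# Column roles in the bike-buyer mining structure, as listed in the text.
# This is an illustrative stand-in, not an SSAS API object.
KEY_COLUMN = "CustomerKey"
PREDICTABLE_COLUMN = "BikeBuyer"  # predictable, and also used as an input

OTHER_INPUTS = [
    "Age", "CommuteDistance", "EnglishEducation", "EnglishOccupation",
    "Gender", "HouseOwnerFlag", "MaritalStatus", "NumberCarsOwned",
    "NumberChildrenAtHome", "Region", "TotalChildren", "YearlyIncome",
]

# Map every column to its role(s).
column_roles = {KEY_COLUMN: {"key"}, PREDICTABLE_COLUMN: {"input", "predictable"}}
column_roles.update({name: {"input"} for name in OTHER_INPUTS})

print(len(OTHER_INPUTS))  # 12
```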
Now let us look at the results from the Neural Network Model.
There is only one viewer for the neural network model: Attribute
Discrimination. However, we can look at the results from two perspectives: the entire
attribute set, or a specific attribute/value.
Attribute Discrimination
for All Attributes – For each attribute, SSAS calculates a score that measures how strongly
the attribute separates the two competing values of the predictable variable, and determines
which of the two values the attribute favors. It then sorts the attributes in descending
order of score, as in other discrimination charts.
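The scoring and ranking step can be sketched in Python. Everything here is an assumption for illustration. The viewer derives its scores from the trained network, whereas this sketch uses a simple stand-in: for each attribute=value pair, it takes the absolute difference between P(pair | BikeBuyer = 1) and P(pair | BikeBuyer = 0). The favored value is whichever side has the higher frequency. The function name `discrimination_scores` and the sample rows are hypothetical.

```python
from collections import Counter

def discrimination_scores(rows, target="BikeBuyer", a=1, b=0):
    """Rank attribute=value pairs by how strongly they separate target values a and b.

    Stand-in score (an assumption, not the SSAS formula):
    |P(attr=value | target=a) - P(attr=value | target=b)|.
    Assumes both target values occur in rows.
    """
    groups = {v: [r for r in rows if r[target] == v] for v in (a, b)}
    counts = {v: Counter((k, x) for r in g for k, x in r.items() if k != target)
              for v, g in groups.items()}
    results = []
    for attr, value in set(counts[a]) | set(counts[b]):
        p_a = counts[a][(attr, value)] / len(groups[a])
        p_b = counts[b][(attr, value)] / len(groups[b])
        favors = a if p_a > p_b else b  # the target value this pair leans toward
        results.append((attr, value, favors, abs(p_a - p_b)))
    # Highest score first; ties broken by name so the order is deterministic.
    results.sort(key=lambda t: (-t[3], t[0], str(t[1])))
    return results

# Hypothetical sample rows (1 = buyer, 0 = non-buyer).
rows = [
    {"Age": "71+", "Gender": "M", "BikeBuyer": 0},
    {"Age": "71+", "Gender": "F", "BikeBuyer": 0},
    {"Age": "30-40", "Gender": "M", "BikeBuyer": 0},
    {"Age": "50-60", "Gender": "F", "BikeBuyer": 0},
    {"Age": "30-40", "Gender": "M", "BikeBuyer": 1},
    {"Age": "30-40", "Gender": "F", "BikeBuyer": 1},
    {"Age": "50-60", "Gender": "M", "BikeBuyer": 1},
]

for attr, value, favors, score in discrimination_scores(rows):
    print(f"{attr}={value}: favors {favors}, score {score:.3f}")
```

In this made-up data, Age=71+ comes out on top and favors 0, which mirrors the shape of the chart but not its actual numbers.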
In the chart above, we use all 12 predictor variables as inputs, and the two competing values of the predictable
variable are 1 and 0. The attribute ‘Age >= 71’ shows the largest difference between
the two values of the dependent variable, and it favors the non-buyer value (0). Its score is
standardized to 100, as explained for an earlier discrimination chart. The
attribute with the 2nd-largest standardized score, 88.92, is
Children at Home = 3, which also favors non-buyers. The 3rd is
Occupation = Manual, and so on. The implication is simple: customers who are older
than 70, have 3 children at home, have a manual occupation, or have 5 children in total
tend NOT to buy a bike. On the other hand, customers with a management
occupation, from the Pacific region, or with a yearly income above $124,634
tend to buy a bike. However, these buyer-favoring attributes have lower scores than the
tend-not-to-buy attributes above, so they separate buyers from non-buyers less strongly. The
other attributes in the chart can be interpreted in the same way.
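The standardization to 100 can be shown with simple arithmetic. This sketch assumes it is a proportional rescaling against the largest score. The raw scores are hypothetical and were chosen only so that their ratio reproduces the 88.92 in the chart. The function name `standardize` is illustrative.

```python
def standardize(raw_scores):
    """Rescale scores proportionally so the largest one becomes 100."""
    top = max(raw_scores.values())
    return {name: round(100 * score / top, 2) for name, score in raw_scores.items()}

# Hypothetical raw scores; only their ratio (0.4446 / 0.50 = 0.8892) matters.
raw = {"Age >= 71": 0.50, "Children at Home = 3": 0.4446}
print(standardize(raw))  # {'Age >= 71': 100.0, 'Children at Home = 3': 88.92}
```

Because only the ratios are kept, the standardized chart shows how the attributes compare with each other. It does not show their absolute strength.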
Attribute Discrimination
for a Specific Attribute – The drop-down box shows the 12 input variables.
Let’s say we
select Gender and then click the value drop-down box on the right. It shows the
possible values for Gender: M, F, and Missing (there are no
missing values in our data set). Let’s say we choose M.
The diagram then shows
the attribute discrimination for male customers only. The scores
are calculated and interpreted in the same way as
in the chart above for the entire attribute set. Interestingly, the conclusions from this chart are very similar to those for the entire population.
In other words, restricting the view to male customers barely changes the picture, so gender is not a good differentiator.
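Outside SSAS, choosing Gender = M can be approximated by keeping only the matching cases and ranking them again. The sketch below is hypothetical. `rank_pairs` and `rank_for_condition` are illustrative names, and the sample rows are made up. The score is a simple frequency difference, whereas the viewer works from the trained network, so this only approximates the viewer's behavior. The sketch compares the top-ranked pair for the full data with the top-ranked pair for the male-only subset, which is the same comparison the text makes.

```python
from collections import Counter

def rank_pairs(rows, target="BikeBuyer", skip=()):
    """Stand-in ranking (not the SSAS formula): score each attribute=value pair by
    |P(pair | buyer) - P(pair | non-buyer)|, highest first.
    Assumes rows contain both buyers (1) and non-buyers (0)."""
    buyers = [r for r in rows if r[target] == 1]
    others = [r for r in rows if r[target] == 0]

    def freq(group):
        return Counter((k, v) for r in group for k, v in r.items()
                       if k != target and k not in skip)

    f1, f0 = freq(buyers), freq(others)
    scored = [(attr, value, abs(f1[(attr, value)] / len(buyers)
                                - f0[(attr, value)] / len(others)))
              for attr, value in set(f1) | set(f0)]
    return sorted(scored, key=lambda t: (-t[2], t[0], str(t[1])))

def rank_for_condition(rows, attr, value):
    """Approximate selecting attr = value: keep the matching rows, then rank
    the remaining attributes (the conditioning attribute itself is skipped)."""
    subset = [r for r in rows if r[attr] == value]
    return rank_pairs(subset, skip=(attr,))

# Hypothetical sample rows (1 = buyer, 0 = non-buyer).
rows = [
    {"Age": "71+", "Gender": "M", "BikeBuyer": 0},
    {"Age": "71+", "Gender": "F", "BikeBuyer": 0},
    {"Age": "71+", "Gender": "M", "BikeBuyer": 0},
    {"Age": "30-40", "Gender": "M", "BikeBuyer": 1},
    {"Age": "30-40", "Gender": "F", "BikeBuyer": 1},
    {"Age": "50-60", "Gender": "M", "BikeBuyer": 1},
    {"Age": "50-60", "Gender": "F", "BikeBuyer": 0},
    {"Age": "30-40", "Gender": "M", "BikeBuyer": 0},
]

overall = rank_pairs(rows)
male_only = rank_for_condition(rows, "Gender", "M")
print(overall[0][:2], male_only[0][:2])  # ('Age', '71+') ('Age', '71+')
```

If the top-ranked pairs for the subset and for the full data largely agree, the conditioning attribute adds little discrimination. That is the reasoning behind the conclusion about gender.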