SSAS Interview Questions - Part 1 of 3 (On Concepts from TK 70-448)

1.   What are the Key Components of UDM in SSAS?

·       Data source

o   Supported: .NET Framework or native OLE DB provider.

o   Not supported: ODBC, JDBC

·       Data source view (Dsv)

o   If the primary keys and table relationships are physically defined within the underlying relational database, the Data Source View Wizard will detect and automatically add them to the DSV. If they are not defined, you must manually add logical (rather than physical) primary key and table relationship definitions to your DSV.

·       Dimensional model

o   Cubes

o   Dimensions with attribute and/or multilevel hierarchies.

·       Calculations (optional)

·       End-user model (optional)

o   KPIs,

o   Actions,

o   Perspectives, and

o   Translations.

·       Management settings

o   Availability,

o   Latency, and

o   Security.

2.   Creating and Modifying SSAS Cubes

a.     Create a Cube by Using the Cube Wizard

b.     Modifying a Cube with the Cube Designer – 10 Tabs

·       Cube Structure

o   Measure Groups and Measures

o   Cube Dimensions

o   DSV

·       Dimension Usage

·       Calculation

·       KPI

·       Actions

·       Partition

·       Aggregation

·       Perspective

·       Translation

·       Browse

3.   Creating and Modifying Dimensions

a.     Creating a Dimension with the Dimension Wizard

b.     Use the Dimension Designer to modify the design of the dimension—adding, changing, and removing attributes and hierarchies.

                                     i.          Four Tabs

1)    Dimension Structure

·       Attributes

·       Hierarchy – define the attribute hierarchy here!

·       DSV

2)    Attribute Relationship – define the attribute relationships here

3)    Translations

4)    Browser

                                    ii.          Modifying Dimension Properties – two important ones

·       Storage Mode

·       Proactive Caching

                                   iii.          Modifying Dimension Attribute Properties

c.      Assigning Dimensions to Cubes

·       A new dimension is not automatically added to the cube in the projects. You need to manually add the dimension to the cube and associate the dimension with the related measure groups.

·       Cube Designer | Cube Structure | Add Cube Dimension button on the toolbar.

4.   Add a new measure group

a.     Why?

·       When a new fact table has been added to the database and you need to bring its data into the cube. Because a measure group typically is bound to a single fact table, you would need as many measure groups as there are fact tables in the data warehouse.

·       Distinct count (such as unique visitor counts on a Web site) calculations provide valuable information but come with a number of performance challenges. The recommended way is to create a measure group with only one measure with a Distinct Count aggregation function. There are other ways (i.e., using the Tabular model or extra dimension), see http://www.mssqltips.com/sqlservertip/3043/different-options-for-creating-a-distinct-count-measure-in-ssas/

b.     How?

·       Add the table to the DSV if necessary.

·       In the BIDS Cube Designer, click the Cube Structure tab. Right-click the Cube node in the Measures pane, and then select New Measure Group.

·       Select the fact table that will become the source of the new measure group.

c.      Measure group properties

·       ErrorConfiguration - Specifies whether the default or the custom error configuration will be used to handle error conditions.

·       EstimatedRows

·       IgnoreUnrelatedDimensions - By default, unrelated dimensions are ignored; if IgnoreUnrelated Dimensions is set to False, unrelated dimensions are used, but the data is considered missing.

·       ProactiveCaching - Specifies the proactive caching settings.

·       ProcessingMode - By default, ProcessingMode is set to Regular, and users cannot access the measure group until processing is complete. If ProcessingMode is set to Lazy Aggregations, data is accessible as soon as it is processed, but processing takes longer.

·       StorageMode

·       Type - Provides client applications with information about the contents of the measure group (for example, Sales). This property is not used by the server. The client can inspect and interpret this property as needed (for example, change the display icon of the measure group).

d.     Adding and Configuring Measures Properties

                                     i.          Add a measure – choose proper aggregation functions for different types of measures on the New Measure Wizard.

·       Additive

·       Semi-additive

·       Nonadditive – rate and %

                                    ii.          Configuring Measure Properties

·       AggregateFunction - Specifies the measure aggregate function.

·       DataType - Needs to be explicitly set for Count and Distinct Count measures only. The default option, Inherited, gets the data type from the underlying DSV.

·       DisplayFolder - Groups measures into logical folders.

·       FormatString - Defines the measure display format. Custom format strings such as #,##0;( #,##0) are supported.

·       MeasureExpression

·       Source - Defines the measure binding. Measures are typically bound to a single numeric column (fact) from the fact table that serves as the source of the measure group (column binding). However, a Count measure could be bound to a row (row binding) if it needs to count rows.

·       Visible

5.   Extending SSAS Cubes

a.     Defining User Hierarchies and Dimension Relationships

                                     i.          Defining Attribute Relationships – in Dimension Designer | Attribute Relationship

                                    ii.          Creating and Modifying User Dimension Hierarchies – in Dimension Designer | Dimension Structure

                                   iii.          Associating Dimensions to Measure Groups – Cube Designer | Dimension Usage

1)    Five Types of Relationship Types

·       Regular - Defines the relationship when a dimension is joined directly to a measure group through a specific attribute called the “granularity” attribute.

·       Fact - Used when the dimension is based on the fact table used to define the measure group.

·       Referenced - Used when a given dimension is related to a measure group through an intermediate dimension.

·       Many-To-Many - Specifies that a dimension is related to a given measure group through an intermediate measure group.

·       Data Mining - Defines the relationship between a dimension based on a data mining model and a given measure group.

b.     Creating KPIs, Actions, Translations, and Perspectives

                                     i.          Understanding KPI Value, Goal, Status, and Trend Properties

1)    Additional KPI Properties

2)    Creating KPIs

3)    Viewing KPIs

4)    MDX KPI Functions: KPIValue(), KPIGoal(), KPIStatus(), and KPITrend()

                                    ii.          Implementing Actions

1)    Regular actions

·       Dataset - The action content is an MDX statement.

·       Proprietary - The action content is client-specific. The client is responsible for interpreting the semantic meaning of the action.

·       Rowset - The action content is a command statement to retrieve data. Unlike the Dataset action, however, a Rowset action targets any OLE DB–compliant data source, including a relational database.

·       Statement - The action content represents an OLE DB command. Unlike the Dataset and Rowset actions, the statement should not yield any results other than success or failure.

·       URL (default) - The action content is a URL and should indicate one of the standard protocols, such as HTTP, HTTPS, FTP, FILE, or MAIL. For security reasons, a client application might ignore protocols other than HTTP and HTTPS.

2)    Drillthrough actions - Let the client request the details behind aggregated cell values in the cube. This is the only action type that the client application can send to SSAS for execution.

3)    Reporting actions - Can be used to request SSRS reports. The action command is the URL report path along with optional report parameters.

                                   iii.          Localizing Cubes Through Translations

1)    Translating Cubes

2)    Translating Dimensions

                                  iv.          Implementing Cube Perspectives

·       Defining Perspectives

6.   Creating calculations and Queries by using MDX

a.     MDX Fundamentals

                                     i.          TUPLES - A tuple is a multidimensional coordinate that identifies a single cell in the cube space.

                                    ii.          SETS - An MDX set is a collection of tuples with the same dimensionality, or attribute hierarchies.

b.     Basic MDX Queries

c.      Calculated MEMBERS – using a WITH clause before the SELECT statement.

With Member [Gross Profit] AS

   '[Measures].[Internet Sales Amount] +

   [Measures].[Reseller Sales Amount]'

Select {[Measures].[Internet Sales Amount],[Gross Profit]} on Columns,

[Product].[Category].Members on Rows

d.     MDX functions

                                     i.          CurrentMember

                                    ii.          Functions for navigating Hierarchies

                                   iii.          MDX provides functions such as PrevMember, Children, and Parent

                                  iv.          Functions for Navigating Time - MDX provides a few functions, such as Lag and ParallelPeriod

e.     Creating Calculated Members - Using the CREATE MEMBER MDX statement

CREATE MEMBER CURRENTCUBE.[MEASURES].[Sales Amount]

AS [Measures].[Internet Sales Amount] +

   [Measures].[Reseller Sales Amount],

FORMAT_STRING = "Currency",

VISIBLE = 1;

f.      Creating Named Sets – using Create Set

                                     i.          A named set is an MDX construct that has an alias and that returns a set of dimension members.

                                    ii.          You can explicitly specify the set tuples, or you can use standard set producing MDX functions, such as Children or Members.

CREATE SET CURRENTCUBE.[Top 50 Most Profitable Customers]

AS

TopCount (

   (     -- --Using the triple notation syntax, Dimension.Attribute.Attribute

Existing [Customer].[Customer].[Customer].Members),

         50,

         [Measures].[Internet Sales Amount]

);

                                   iii.          Static and Dynamic Named Sets - A static named set is evaluated once by the server. In contrast, a dynamic named set, which was introduced in SSAS 2008, is evaluated for each query.

7.   Measure Group Partitions

a.     Reasons for Partitioning

                                     i.          Improved performance

                                    ii.          Improved manageability - Partitions can have different storage modes and aggregation designs, and you can manage them independently.

b.     Considerations for Partitioning – not too small or too big

c.      Creating Measure Group Partitions – Cube Designer | Partition tab | Write the query

d.     Partition Storage Mode

                                     i.          MOLAP – Metadata + Data + Aggregation are on SSAS

                                    ii.          ROLAP – Metadata on SSAS; Data + Aggregation are on Relational Database

                                   iii.          HOLAP – Metadata + Aggregation on SSAS; Data on Relational Database (i.e., Keep the fact data in the source database).

e.     Understanding Proactive Caching

                                     i.          When to use?

·       For MOLAP and HOLAP storage modes

·       When the data source is transaction oriented and you want minimum latency, consider configuring the cube to process automatically by using proactive caching.

                                    ii.          Types of Data Change Notification Event

·       SQL Server - This option uses the Microsoft SQL Server trace events that the relational engine raises when data is changed (SQL Server 2000 and later).

·       Client Initiated - In this case, a client application notifies SSAS when it changes data by sending a NotifyTableChange XML for Analysis (XMLA) command.

·       Scheduled Polling - With this option, the server periodically polls the required tables for changes.

                                   iii.          Enabling Proactive Caching

8.   Understanding Aggregations

a.     Defining Initial Aggregations with the Aggregation Design Wizard

                                     i.          What does “Performance gain reaches” mean?

·       Specify the percentage amount of performance gain for your queries. This amount represents the percentage improvement between the maximum and minimum query times, as represented by the following formula:

PercentGain = 100 * (QTimeMAX - QTimeTARGET) / (QTimeMAX - QTimeMIN)

 

For example, if a query that is not optimized takes twenty-two seconds (QTimeMAX) to execute, and the best possible query performance with maximum aggregations is two seconds (QTimeMIN), specify a 75% desired performance gain to achieve a query time of seven seconds (QTimeTARGET).

 

https://technet.microsoft.com/en-us/library/aa933652%28v=sql.80%29.aspx

 

If desired 30% gain, 30=100*(22-x)/20è6=22-xèx=16, you would achieve a query time of 16s.

 

Note 1: the formula indicates that the % of performance gain is not % of all the possible aggregation combinations.

 

Note 2: the formula implies that the maximum performance gain is 100%.

 

b.     Refining Aggregations with the Usage-Based Optimization Wizard

9.   Deploying SSAS Objects

a.     BIDS - Deploying the latest changes to your local server for testing.

                                     i.          BIDS Design Modes

1)    Project Mode or Disconnected Mode – default, need to deploy it

2)    Connected Mode – connecting to the SSAS database directly, do not need deployment.

b.     Deployment Wizard - Deploying to a test or production environment when you need more granular control.

                                    ii.          Designed to support incremental deployment, so it gives you more control.

                                   iii.          The Deployment Wizard supports two deployment modes: interactive and command-line.

·       In interactive mode, the wizard reads the build files and shows you their settings. You can overwrite these settings if you need to. The end result is an XMLA script that you can run immediately or save for later execution.

·       Clicking the Deployment Wizard link in the Microsoft SQL Server 2008 Analysis Services program group.

c.      XMLA script - Scheduling a deployment task.

·       The deployment script consists of two sections. The first section starts with the Alter XMLA command followed by the definitions of all objects in the database.

·       The second section starts with a Process XMLA command. It instructs the server to process the database by using the processing mode you specified on the Select Processing Options page.

·       Running the Deployment Script

o   Execute it manually in SSMS,

o   Use it with SQL Server Agent

o   Use the Analysis Services Execute DDL Task.

d.     Synchronize Database Wizard - Synchronizing two cubes, such as a staging cube and a production cube.

·       When to use it?

Suppose that you have a farm of production servers and you need to deploy the latest changes to each server. You want to avoid processing the SSAS database on each production server because each cube would take very long to process. Instead, you decide to process the cube on a staging server and then synchronize each production server with the staging server. The Synchronize Database Wizard is designed specifically to handle this deployment model.

·       How to use it?

SSMS |SSAS server | SSAS Databases | Synchronize

e.     Backup and restore - Moving a cube from one server to another

f.      Analysis Management Objects (AMO) - Handling deployment programmatically

10.         Processing SSAS Objects

a.     Which Objects? – Cube, Dimension, Measure Group, Partition

b.     Processing Options for SSAS Objects

                                     i.          Process Default - Performs the minimum number of tasks required to fully initialize the object. The server converts this option to one of the other options based on the object state: All objects (Dimension, cube, measure group, partition + Mining Objects)

                                    ii.          Process Full: All objects

                                   iii.          Unprocess – Delete data: All objects

                                  iv.          Process Structure - Deletes the partition data and applies Process Default to the cube dimensions: Cube only

                                    v.          Process Update - Applies member inserts, deletes, and updates without invalidating the affected cubes: Dimension only

                                  vi.          Process Add - Adds only new data: Dimension, Partition only

                                 vii.          Process Data - Loads the object with data without building indexes and aggregations: Dimension, cube, measure group, partition only

                                viii.          Process Index - Retains data and builds only indexes and aggregations: Dimension, cube, measure group, partition only

c.      Processing Options for Data Mining Structure

                                     i.          Process Default - Performs the minimum number of tasks required to fully initialize a mining object. For example, if you change a model and send Process Default to the structure, the server will process that model only.

                                    ii.          Process Full - Drops the object stores and rebuilds the model. Metadata changes, such as adding a new column to a mining structure, require Process Full.

                                   iii.          Process Structure - Reads and caches the training data in the structure. Does not affect the mining models inside the structure.

                                  iv.          Process Clear Structure - Drops the cached data without affecting the mining models inside the structure. Disables drillthrough features. Consider

Process Clear Structure if you need to dispose of the source data used to train the models to reduce the disk footprint.

                                    v.          Unprocess - Deletes data from the mining object. When this command is sent to a structure, it deletes the cached training data in the structure and the model patterns.

d.     Processing Objects in BIDS and SSMS

e.     Setting Advanced Processing options with Proactive Caching

·       When you enable proactive caching on a partition, you do not have to process the cube explicitly. Instead, the server automatically processes the MOLAP cache when changes are made to the data source.

·       The proactive caching has standard proactive caching modes, such as Automatic MOLAP. However, as an administrator, you can fine-tune proactive caching by setting advanced options.

f.      Using the Analysis Services Tasks in SSIS (Analysis Services Execute DDL Task, Analysis Services Processing Task, Data Mining Query Task)

·       The UDM commonly serves as an OLAP layer on top of a data warehouse or a data mart. Typically, implementing a data warehouse requires ETL processes to extract data from one or more data sources, cleanse and transform that data, and load the data into the data warehouse. As a last step of the ETL process, you can automate UDM processing. SSIS includes an Analysis Services Processing Task that can process SSAS objects.

11.         SSAS Security

·       The SSAS user security architecture is layered on top of Windows security.

·       Group Windows users and groups into roles.

o   Administrators role

o   DATABASE roles

·       A permission defines a security policy for a given object.

o   Database Permissions: Full Control, Process Database, Read Definition

o   Membership – Assign users/groups to the role

o   Data Source: None, Read

o   Cubes: None, Read, Read/Write; DrillThrough (for the cube!); Process

o   Cell Data

§  Read - This permission determines the cells that members of the role can view. For example, the role is allowed access to all measures except Internet Sales Amount:

 

NOT Measures.CurrentMember IS [Measures].[Internet Sales Amount]

 

§  Read-contingent - This permission determines whether cells derived (i.e., [Total Sales Amount]) from restricted cells (i.e.,.[Internet Sales Amount]) are permitted. For example, if you define:

 

[Total Sales Amount]=[Internet Sales Amount]+[Reseller Sales Amount]

 

And you define the following in Read-contingent (but nothing in Read), then you cannot see Internet Sales Amount and Total Sales Amount.

 

NOT Measures.CurrentMember IS [Measures].[Internet Sales Amount]

 

§  Read/Write - This permission identifies the cells that users can update when writing back to the cube, assuming that the partition is enabled for writeback. The measure group partition must be write-enabled for read/write permission to take effect. By default, if the cell is disallowed, it will show #N/A. Alternatively, the client application can use the Secured Cell Value connection string setting to specify a different text, such as NULL, for disallowed cells.

 

o   Dimensions: Read (default), Read/Write, Read Definition, Process. So by default, members of a role have access to all dimensions in the database.

o   Dimension Data

§  With the pessimistic approach, you deny everything except a set of allowed members called an allowed set.

§  The optimistic approach allows all members except a set of denied members, called a denied set.

o   Mining Structure

12.         Managing SSAS High Availability, Backups, and Object Scripting

a.     Backing up and restoring an SSAS Database in SSMS

b.     Detaching and attaching SSAS databases

c.      Scheduling SSAS Backups in SQL Server Agent – using XMLA

d.     Scripting SSAS objects in SSMS

e.     Clustering SSAS:

                                     i.          Clustering for Scalability using NLB

                                    ii.          Clustering for Availability using Clustering

13.         Editing SSAS Server Properties

a.     Setting folder Locations

b.     Configuring Error Logging and Reporting

·       Working with the flight Recorder Trace for trouble shooting

c.      Defining Aggregations with the Usage-Based Optimization Wizard

d.     Populating the Query Log Table

14.         Tracing and Monitoring SSAS Instances

a.     Tracing SSAS Instances with SQL Profiler

b.     Using Performance Monitor to Analyze SSAS Performance

                                     i.          Understanding SSAS Performance Counters

                                    ii.          Configuring Performance Monitor

                                   iii.          Using Dynamic Management Views

15.         SSAS Data Mining

a.     Preparing and Creating Data Mining Structures

                                     i.          Preparing Data for Data Mining

1)    Data Source

·       Relational Data Source

·       UDM Data Source (When you mine OLAP cubes, your case table is a dimension. Any measure from any measure group—that is, any fact table—connected with the selected dimension can be used as a case-level column.)

2)    Create the DSV

                                    ii.          Creating Data Mining Models – Key, Predictable, and Input columns

                                   iii.          Selecting Data Mining Algorithms

1)    Association Rules

2)    Clustering

3)    Decision Trees

4)    Linear Regression

5)    Logistic Regression

6)    Naïve Bayes

7)    Neural Network

8)    Sequence Clustering

9)    Time Series

b.     Building a Mining Structure by Using the Data Mining Wizard.

·       On the Select The Definition Method page, select From Existing Table or Existing Cube.

c.      The Data Mining Designer

                                     i.          Five Tabs

1)    Mining Structure

2)    Mining Models

3)    Mining Model Viewer

4)    Mining Accuracy Chart

5)    Mining Model Prediction

                                    ii.          What can you do?

1)    Modify the mining structure, including adding or deleting a column and changing the properties of a column or structure.

2)    Add additional mining models to the structure, and change the parameters of any model in the structure.

3)    Process the structure and browse the models by using Data Mining Viewers.

4)    Check the accuracy of the models by using a lift chart and other techniques.

5)    Create DMX prediction queries using your models.

                                   iii.          Creating models and applying algorithms

1)    Mapping Mining Structure Attributes to Source Columns

·       Key/ Key sequence/ Key time

·       Input / Predictable / Input & Predictable / Ignored

2)    Using Case Table Definitions and nested Tables

·       Association Rules - Model in the nested table as the Key, input, predictable

·       Sequences - Line Number as the Key Sequence, input; Model as the Input, predictable

·       Predicting a Case Table Column - predict a column of the case table by using the values in the nested table.

                                  iv.          Configuring Algorithm Parameters

d.     Validating Predictive Models

                                     i.          Validity

1)    Lift Charts

2)    Profit Charts

3)    Classification Matrix

                                    ii.          Reliability of predictive models using cross validation (examine the SDs for consistency).

e.     Measuring the Accuracy of Association Rules - Support, Probability, and Importance

f.      Measuring the Accuracy of Clustering and Sequence Clustering

·       RULE 1 – USE THE BUSINESS SENSE RATHER THAN MATHEMATICS

·       RULE 2 – LOOK AT THE MSOLAP_NODE_SCORE FOR THE MODEL (the higher the better)

g.     Measuring the Accuracy of Time Series

By using a specific number of periods from the past, we can try to forecast present values. If the model performs well for forecasting present values, there is a better probability that it will perform well for forecasting future values. We control the creation of historical models by using two algorithm parameters:

·       HISTORICAL_MODEL_COUNT - controls the number of historical models to build.

·       HISTORICAL_MODEL_GAP- specifies the time increments in which historical models to be built.

h.     Creating Data Mining Queries and Reports

                                     i.          Using Prediction Query Builder in BIDS or SSMS

                                    ii.          Using DMX

·       DDL

·       DML

i.       Using Prediction Queries in Reports

·       The Prediction Query Builder is included in SSRS, in the Report Designer and the Report Wizard. This builder already creates output without nested tables (that is, it creates flattened output).

·       If you write a DMX query manually, you have to include the FLATTENED DMX keyword in the query, right after the SELECT keyword.

j.       Configuring SSAS Roles, Permissions, and Data Sources

                                     i.          Impersonation

1)    Use a specific username and password

2)    Use the service account

3)    Use the credentials of the current user

4)    Inherit

                                    ii.          Data Sources permissions

1)    Access—read or access—none

2)    Read Definition

                                   iii.          Mining Structures permissions

1)    Access—read or access—none

2)    Drill through

3)    Read Definition

4)    Process

                                  iv.          Mining Models permissions

1)    Access—read, access—read/Write, or access—none

2)    Drill through

3)    Read Definition

4)    Browse

k.     Data Mining Processing options (see the section 10.c)