Oracle® OLAP Application Developer's Guide 10g Release 2 (10.2) Part Number B14349-02 |
|
|
View PDF |
This chapter explains how to design a logical data model and create a standard form analytic workspace using Analytic Workspace Manager.
This chapter contains the following topics:
Analytic Workspace Manager is the primary tool for creating, developing, and managing analytic workspaces. The main window provides two views: the Model View and the Object View. You can switch between views using the View menu. In addition, there are menus, a toolbar, a navigation tree, and property sheets. When you select an object in the navigation tree, the property sheet to the right provides detailed information about that object. When you right-click an object, you get a choice of menu items with appropriate actions for that object.
You can also conduct an interactive session by opening OLAP Worksheet and using the OLAP DML. You can switch between the console and OLAP Worksheet, and have an up-to-date view of your workspace in each one, because they share the same session.
Analytic Workspace Manager has a full online Help system, which includes context-sensitive Help.
The Model View enables you to define a logical dimensional model composed of dimensions, levels, hierarchies, attributes, measures, calculated measures, and measure folders. The model is stored in the analytic workspace as database standard form metadata.
A drag-and-drop user interface facilitates mapping of the logical objects to columns in relational tables and views in Oracle Database. The source columns can be star, snowflake, or any other schema design that supports the logical model.
Figure 3-1 shows the logical objects created in the GLOBAL
analytic workspace.
Figure 3-1 Model View in Analytic Workspace Manager
The Object View provides a graphical user interface to the OLAP DML. You can create, modify, and delete individual workspace objects. This view is provided for users who are familiar with the OLAP DML and want to upgrade Express databases, modify custom applications, or customize a new analytic workspace. Be very careful when working with a standard form analytic workspace, so that you do not create inconsistencies in the metadata.
Figure 3-2 shows the Object View. A formula named UNITS_CUBE_SALES
is currently selected in the navigation tree, and the right pane shows the Expression tab of the property sheet. This tab shows the OLAP DML expression used to calculate the formula.
Figure 3-2 Object View in Analytic Workspace Manager
OLAP Worksheet provides full use of the OLAP DML for users who need to manage the contents of an object or execute a program. It opens in a separate window from the Analytic Workspace Manager console. This window provides menus, a toolbar, an input pane for OLAP DML commands on the bottom, and an output pane on the top.
Figure 3-3 shows OLAP Worksheet opened from Analytic Workspace Manager. Notice that the GLOBAL
workspace is attached with read/write access in both OLAP Worksheet (as shown by the AW LIST
command) and Analytic Workspace Manager (as shown by the Model View navigation tree). The two applications share the same session.
The OLAP DML Reference is available through the Help menu.
Figure 3-3 OLAP Worksheet Opened From Analytic Workspace Manager
In this section, you will learn how to obtain the Analytic Workspace Manager software, install it on your computer, and make a connection to Oracle Database.
Analytic Workspace Manager is distributed with Oracle Database. Three disks compose the Oracle Database 10g Release 2 installation set: Database (db), Companion, and Client. Analytic Workspace Manager is on the Client disk.
If you are installing on the same system as the database, then choose a Custom installation and install into the same Oracle home directory as the database. Select OLAP Analytic Workspace Manager and Worksheet from the list of components.
If you are installing on a remote system, then choose either an Administrator or a Custom installation.
See Also: An installation guide for your platform, such as:
|
On Windows, use the Start menu to open Analytic Workspace Manager:
Start > All Programs > Oracle - Oracle_home > Integrated Management Tools > OLAP Analytic Workspace Manager and Worksheet
On Linux, open Analytic Workspace Manager from the shell command line:
$ORACLE_HOME/olap/awm/awm.sh
You can define a connection to each database that you use for OLAP. After you have defined a connection, the database instance is listed in the navigation tree for you to access at any time.
To define a database connection:
Right-click the top Databases folder in the navigation tree, then choose Add Database to Tree from the pop-up menu.
Complete the Add Database to Tree dialog box.
Using Analytic Workspace Manager, you can:
Design the logical dimensional model for the analytic workspace
Map logical objects to relational data sources
Load and aggregate the data
These steps are very closely related. The data that supports your logical model must exist in your database, and you must have SELECT
privileges on the tables containing the data so you can load it into your analytic workspace.
Your goal in using Analytic Workspace Manager is to create a multidimensional data store that supports business analysis. The analytic workspace that you create must contain the logical objects described in "The Logical Dimensional Data Model". For the source data to support a logical dimensional data model, these relationships must exist:
Dimensions. You can map dimensions, levels, and attributes to any collection of tables or views that identify the child-parent relationships and the member-attribute relationships. The tables and views can be in one schema or owned by multiple schemas. When mapping dimensions, you can choose from these categories of schemas:
You can identify different dimensions as having different schema characteristics, for example, Customer could be a star schema (all levels and their attributes are in one table) and Time could be a snowflake schema (levels are in two or more tables with their attributes).
Measures. You can map measures to any table or view that contains the appropriate data.
Hierarchies and cubes are strictly metadata objects and are not mapped to data sources.
Tables may contain columns of no importance to your analytic workspace. You can simply omit them from the mappings, and Analytic Workspace Manager will ignore them.
A star schema is the simplest of the three types. It is called a star schema because a diagram of this schema resembles a star, with points radiating from a central table. The center of the star is a fact table and the points of the star are the dimension tables.
Dimension tables define the dimensions. In a star schema, all of the information for a dimension is stored in one table.
Fact tables contain foreign keys from each dimension table and a column for each measure.
Figure 3-4 shows the relationships in a star schema using the GLOBAL
relational tables. These tables provide the data for the Units Cube. These source tables illustrate different types of schema designs:
PRODUCT_DIM
and CHANNEL_DIM
are level-based dimensions in a star schema.
PRODUCT_CHILD_PARENT
is a parent-child table that supports a value-based hierarchy. There are no level columns.
TIME_MONTH_DIM
is the base-level table of a snowflake schema. The Time tables are described in "Snowflake Schema".
A snowflake schema is a type of star schema. It is called a snowflake schema because a diagram of the schema resembles a snowflake. Snowflake schemas normalize dimensions to eliminate redundancy. That is, the dimension data has been divided into multiple tables instead of one large table. Each level may be in a separate table with its attributes.
Figure 3-5 shows the Time dimension in a snowflake schema, with separate tables for months, quarters, and years.
Note that the other dimensions are shown only partially in this snowflake diagram.
Figure 3-5 Normalized Time Dimension in a Snowflake Schema
Any schema can be used that contains the parent-child relationships and the member-attribute relationships needed to implement dimensions in a dimensional data model. In the most extreme case, each parent-child and member-attribute value pairs for each hierarchy may be in a different table.
Figure 3-6 shows the Product dimension in schema that contains the appropriate relationships. Contrast these 11 tables with the single table shown in Figure 3-5 for the Product dimension in a star schema. Whereas in the star schema, the Product dimension has one source table, this schema has been normalized to store each level and each attribute in a separate table.
Note that the other dimensions are not shown in this diagram.
Figure 3-6 Product Dimension in an "Other" Schema Design
Analytic Workspace Manager provides direct mapping of one logical object to one column of a relational table or view. If you need to transform your data, then you can choose between these alternatives:
Create views that perform the necessary transformations.
Use an ETL tool such as Oracle Warehouse Builder to generate a star schema. You can then create the analytic workspace using Analytic Workspace Manager.
Use Oracle Warehouse Builder to generate an analytic workspace in Oracle9i standard form. You can then use Analytic Workspace Manager to upgrade the analytic workspace to Oracle Database 10g standard form. (The second step is not needed when using Oracle Warehouse Builder 10g Release 2.)
Following are some of the basic types of transformations that can be handled by creating views:
Load a selection of data. The Maintenance Wizard loads all rows from a mapped column into the analytic workspace. If you only want a selection of the available data, create a view with a WHERE
clause.
Load multiple levels of data. Analytic Workspace Manager currently permits you to map only one level. Create a view with a WHERE
clause that selects the base level for the analytic workspace.
Both Analytic Workspace Manager and Warehouse Builder can be used to generate analytic workspaces.
Warehouse Builder is designed for Information Technology (IT) professionals who manage production systems. It is a powerful tool that can generate analytic workspaces as one element in a larger ETL process.
Analytic Workspace Manager is an easy-to-use tool designed for application developers, departmental DBAs, and other nonprofessional DBAs. It enables them to design and develop a data model quickly and interactively based on their reporting needs. After the data model has been developed and its design is stable, the IT department may assume responsibility for generating the analytic workspace using Warehouse Builder. Analytic Workspace Manager can be used to enhance the analytic workspaces created by the IT department, such as by adding custom measures.
In the Model View, you can define and build an analytic workspace from relational tables and views. The tables and views can be stored in one or more schemas in which the appropriate data relationships exist, as described in "Identifying the Source Data".
Analytic Workspace Manager saves changes automatically that you make to the analytic workspace. You do not explicitly save your changes.
Saves occur when you take an action such as these:
Click OK or the equivalent button in a dialog box.
For example, when you click Import in the Import From EIF File dialog box, the contents are imported, and the revised analytic workspace is committed to the database. Likewise, when you click Create in the Create Dimension dialog box, the new dimension is committed to the database.
Click Apply in a property sheet.
For example, when you change the labels on the General property page for an object, the change takes effect when you click Apply.
To create an analytic workspace in database standard form:
Configure your database instance for OLAP use. Define permanent, temporary, and undo tablespaces, and set the database parameters to values appropriate for data loads. Refer to Chapter 6 for details.
Define a database user who will own the analytic workspace. Grant the user the OLAP_USER
role and SELECT
privileges on the source data tables.
While you can create the workspace in the same schema as the relational tables, doing so can cause problems in defining unique names within a single namespace.
Examine the sparsity characteristics of your data, so that you can implement a logical model for the best performance.
See "What is Sparsity?" and "Examining Sparsity Characteristics for GLOBAL".
Open Analytic Workspace Manager and connect to your database instance as the user you defined earlier for this purpose.
Create a new analytic workspace container in your database:
In the Model View navigation tree, expand the folders until you see the schema where you want to create the analytic workspace.
Right-click the schema name, then choose Create Analytic Workspace from the pop-up menu.
Complete the Create Analytic Workspace dialog box, then choose Create.
The new analytic workspace appears in the Analytic Workspaces folder for the schema.
Define the logical dimensions for the data.
Define the logical cubes for the data.
Map the logical items to their data sources.
Load the data.
Define measure folders to simplify access for end users.
When you have finished, you will have an analytic workspace populated with the detail data fetched from relational tables or views. You may also have summarized data and calculated measures.
In addition to the basic steps, you can add functionality to an analytic workspace in these ways:
Support multiple languages by adding translations of metadata and attribute values.
Develop one or more calculation plans for the analytic workspace, so that you can establish the order of calculations. Calculation plans enable you to include dependent calculations in the model.
Dimensions are the parents of levels, hierarchies, and attributes in the logical model. You define these supporting objects, in addition to the dimension itself, in order to have a fully functional dimension.
However, you can also define dimensions with value-based hierarchies that do not have levels defined as metadata, or "flat" dimensions that do not have hierarchies or levels. These types of dimensions must be defined with natural keys, as described in the next topic.
Dimensions are lists of unique values that identify and categorize data. They form the edges of a logical cube, and thus of the measures within the cube. Analytic Workspace Manager supports these common dimension styles:
List or flat dimensions have no levels or hierarchies.
Level-based dimensions use parent-child relationships to group members into levels. Most dimensions are level-based.
Value-based dimensions have parent-child relationships among their members, but these relationships do not form meaningful levels.
You can define dimensions as either User or Time dimensions. Business analysis is performed on historical data, so fully defined time periods are vital. A time dimension table must have columns for period end dates and time span. These required attributes support time-series analysis, such as comparisons with earlier time periods. If this information is not available, then you can define Time as a User dimension, but it will not support time-based analysis.
Every dimension member must be a unique value. Depending on your data, you can create a dimension that uses either natural keys or surrogate keys from the relational sources for its members.
Natural keys are read from the relational sources without modification. To use natural keys, the values must be unique across levels. Because each level may be mapped to a different relational column, this uniqueness may not be enforced in the source data.
For example, a Geography source table might have a value of NEW_YORK
in the CITIES
column and a value of NEW_YORK
in the STATES
column. Unless you take steps to assure uniqueness, the second value for NEW_YORK
will overwrite the first.
If a dimension is flat or value-based, then it must use natural keys. You must take whatever steps you need to assure that the dimension members are unique.
Surrogate keys ensure uniqueness by adding a level prefix to the members while loading them into the analytic workspace. For the previous example, surrogate keys create two dimension members named CITIES_NEW_YORK
and STATES_NEW_YORK
, instead of a single member named NEW_YORK
. A dimension that has surrogate keys must be defined with at least one level-based hierarchy.
To create a standard form dimension:
Expand the folder for the analytic workspace.
An analytic workspace folder contains subfolders named Dimensions, Cubes, Measure Folders, and Calculation Plans.
Right-click Dimensions, then choose Create Dimension from the pop-up menu.
The Create Dimension dialog box is displayed.
Complete all tabs.
Click Help for specific information about your choices.
Click Create.
The new dimension appears as a subfolder under Dimensions.
For business analysis, data is typically summarized by level. For example, your database may contain daily snapshots of a transactional database. Days are thus the base level. You might summarize this data at the weekly, quarterly, and yearly levels.
Levels have parent-child or one-to-many relationships, which form a level-based hierarchy. For example, each week summarizes seven days, each quarter summarizes 13 weeks, and each year summarizes four quarters. This hierarchical structure enables analysts to detect trends at the higher levels, then drill down to the lower levels to identify factors that contributed to a trend.
For each level that you define, you must identify a data source for dimension members at that level. Members at all levels are stored in the same dimension. In the previous example, the Time dimension contains members for weeks, quarters, and years.
To create a level:
Expand the folder for the dimension.
A dimension folder contains subfolders named Levels, Hierarchies, and Attributes.
Right-click Levels, then choose Create Level from the pop-up menu.
The Create Level dialog box is displayed.
Complete all tabs of the Create Level dialog box.
Click Help for specific information about these choices.
Click Create.
The new level appears as an item in the Levels folder.
Dimensions can have one or more hierarchies. Most hierarchies are level-based. Analytic Workspace Manager supports these common types of level-based hierarchies:
Normal hierarchies consist of one or more levels of aggregation. Members roll up into the next higher level in a many-to-one relationship, and these members roll up into the next higher level, and so forth to the top level.
Ragged hierarchies contain at least one member with a different base, creating a "ragged" base level for the hierarchy.
Skip-level hierarchies contain at least one member whose parents are more than one level above it, creating a hole in the hierarchy. An example of a skip-level hierarchy is City-State-Country, where at least one city has a country as its parent (for example, Washington D.C. in the United States).
In relational source tables, a skip-level hierarchy may contain nulls in the level columns.
You may also have dimensions with parent-child relations that do not support levels. For example, an employee dimension might have a parent-child relation that identifies each employee's supervisor. However, levels that group together first-, second-, and third-level supervisors and so forth may not be meaningful for analysis. Similarly, you might have a line-item dimension with members that cannot be grouped into meaningful levels. In this situation, you can create a value-based hierarchy defined by the parent-child relations instead of levels. You can create value-based hierarchies only for dimensions that use natural keys, because surrogate keys are formed with the names of the levels.
To create a hierarchy:
Expand the folder for the dimension.
A dimension folder contains subfolders named Levels, Hierarchies, and Attributes.
Right-click Hierarchies, then choose Create Hierarchy from the pop-up menu.
The Create Hierarchy dialog box is displayed.
Complete all tabs of the Create Hierarchy dialog box.
If you define multiple hierarchies, be sure to define one of them as the default hierarchy.
Click Help for specific information about these choices.
Click Create.
The new hierarchy appears as an item in the Hierarchies folder.
Attributes provide information about the individual members of a dimension. They are used for labeling crosstabular and graphical data displays, selecting data, organizing dimension members, and so forth.
Analytic Workspace Manager creates some attributes automatically when creating a dimension. These attributes have a unique type, such as "Member Long Description," which client applications expect to find. You can create additional "User" attributes that provide supplementary information about the dimension members.
All dimensions are created with long and short description attributes. If your source tables include long and short descriptions, then you can map the attributes to the appropriate columns. However, if your source tables include only one set of labels, then you should always map the long description attributes. You can decide whether or not to map the short description attributes to the same column. If you do, the data will be loaded twice.
Discoverer Plus OLAP, Spreadsheet Add-In, and OracleBI Beans use long description attributes in selection lists and for labelling crosstabs and graphs. The Add-In initially makes limited use of short description attributes, but users can switch to long descriptions. If the appropriate descriptions are not available, then these tools use dimension members. For example, if the Product dimension has short descriptions but no long descriptions, then the tools display Product dimension members.
Time dimensions are created with time-span and end-date attributes. This information must be provided for all Time dimension members.
Be sure to examine all of these attribute definitions, because you may wish to change the default settings. In particular, expand the hierarchy tree on the Basic tab to verify that the correct levels are selected. These choices affect the number of columns that you can map to the dimension.
To create a new user attribute:
Expand the folder for the dimension.
A dimension folder contains subfolders named Levels, Hierarchies, and Attributes.
Right-click Attributes, then choose Create Attribute from the pop-up menu.
The Create Attribute dialog box is displayed.
Complete all tabs of the Create Attribute dialog box.
Click Help for specific information about these choices.
Click Create.
The new attribute appears as an item in the Attributes folder.
Cubes are the parents of measures. They are informational objects that identify measures with the exact same dimensions and thus are candidates for being processed together at all stages: data loading, aggregation, storage, and querying.
Cubes define the shape of your business measures. They are defined by a set of ordered dimensions. The dimensions form the edges of a cube, and the measures are the cells in the body of the cube.
To create a cube:
Expand the folder for the analytic workspace.
An analytic workspace folder contains subfolders named Dimensions, Cubes, Measure Folders, and Calculation Plans.
Right-click Cubes, then choose Create Cube from the pop-up menu.
The Create Cube dialog box is displayed.
Complete all tabs of the Create Cube dialog box.
Important: Your decisions have a major impact on the performance of the analytic workspace. Click Help for specific information, and refer to "Making Data Storage Decisions".
Click Create. The new cube appears as a subfolder under Cubes.
Measures store the facts collected about your business. Each measure belongs to a particular cube, and thus shares particular characteristics with other measures in the cube, such as the same dimensions.
To create a measure:
Expand the folder for the cube that has the dimensions of the new measure.
A cube folder contains subfolders named Measures and Calculated Measures.
Right-click Measures, then choose Create Measure from the pop-up menu.
The Create Measure dialog box is displayed.
Complete the General, Translations, and Implementation Details tabs of the Create Measure dialog box. Complete all tabs if you wish to override the cube settings.
Click Help for specific information about these choices.
Click Create.
The new measure appears as an item in the Measures folder.
Calculated measures add valuable information to an analytic workspace. They are created by performing calculations on the measures stored in an analytic workspace. Oracle OLAP offers an extensive range of functions and operators that can be used to define custom measures. Analytic Workspace Manager provides a Calculation Wizard, as shown in Figure 1-2, which provides these calculations:
Basic Arithmetic. Addition, subtractions, multiplication, division, ratio
Advanced Arithmetic. Cumulative total, index, percent markup, percent variance, rank, share, variance
Prior/Future Comparison. Prior value, difference from prior period, percent difference from prior period, future value
Time Frame. Moving average, moving maximum, moving minimum, moving total, year to date
Calculated measures are not stored, and so they do not occupy any significant disk space. The data values are calculated in response to individual queries on the calculated measures. In this respect, calculated measures are similar to relational views.
To create a calculated measure:
Expand the folder for the cube that contains the base measures that will be used in the calculation.
Right-click Calculated Measures, then choose Create Calculated Measure from the pop-up menu.
The Calculation Wizard Welcome page is displayed.
Follow the steps of the wizard.
Click Help for specific information about these choices. When you are done, the name of the new calculated measure appears as an item in the Calculated Measures folder.
The creation of a cube requires several decisions about data storage that affect the performance of the analytic workspace. These choices are on the Implementation Details tab for the cube.
The DBMS_AW
PL/SQL package contains a Sparsity Advisor and an Aggregate Advisor. These advisors analyze the tables in a relational schema and provide recommendations for data storage in an analytic workspace. These recommendations may help you to make the best choices for storing your data. Remember to always evaluate the advice generated by these tools against your own knowledge of the data.
See Also: Oracle OLAP Reference for information about the Sparsity Advisor and the Aggregate Advisor |
Sparsity refers to the extent to which cells contain null (NA) values instead of data. For example, if a cube is 25 percent sparse, then 25 percent of that cube's cells contain NA values and 75 percent contain data. You can also describe this cube as 75% dense.
In general, if a cube's detail-level data is more than 80 percent sparse, then you must manage sparsity by identifying the sparse dimensions in order to promote good performance.
There are two types of sparsity patterns:
Controlled sparsity means that a range of values of one or more dimensions has no data. This is often a result of the way you design your analytic workspace. For example, a Time dimension might contain future time periods that will be populated by a forecast after the data is loaded into the analytic workspace. This type of sparsity is temporary and should be disregarded when evaluating the sparsity of the cube. Other types of controlled sparsity may remain sparse and should be considered as a factor in evaluating sparsity.
Random sparsity means that NA
values are scattered throughout a measure, usually because some combinations of dimension values never have any data. This is often a result of the nature of your business. Random sparsity tends to be more common than controlled sparsity.
Your data may demonstrate one or both types of sparsity.
Measures have the same sparsity pattern when all of the following are true:
They have exactly the same dimensions
They have many of the same empty cells
They have roughly the same number of empty cells
Measures that share these characteristics should be created in the same cube. However, if their sparsity patterns are very different, then they should be created in different cubes even if they share the same dimensions. Typically, measures that are mapped to the same source tables have the same sparsity pattern.
A cube can be dense, sparse, or extremely sparse.
Dense cubes have up to 20% empty cells. For a dense cube, do not identify any dimensions as sparse.
For a sparse cube, identify the sparse dimensions.
For an extremely sparse cube, identify all of the dimensions as sparse and use compressed storage.
Compressed storage is for measures that are extremely sparse. Extreme sparsity often results from these factors:
A cube has a large number of dimensions (seven or more).
One dimension has more than 300,000 members.
Two dimensions have more than 100,000 members each.
Dimension hierarchies have numerous levels, with little change to the number of dimension members from one level to the next, so that many parents have only one descendant for several contiguous levels.
Compressed storage for this type of sparsity uses less space and results in faster aggregation than normal sparse storage.
Sparsity is calculated as the relationship between the number of actual data values in the measures and the number of cells defined by the dimensions of the cube.
To calculate the percent sparsity in a cube, use this equation:
(data values in measure) / (cells in cube) * 100
To obtain the number of data values in a measure, count the number of rows in the data source. The following SQL statement returns the number of values in the UNITS
measure in the GLOBAL
schema:
SELECT COUNT(*) FROM units_history_fact WHERE units IS NOT NULL;
To calculate the number of cells in a cube, first count the number of base-level values for each dimension. The following SQL statement returns the number of base-level Customers in the GLOBAL
schema:
SELECT COUNT(ship_to_id) FROM customer_dim;
Then multiply the number of values for each dimension in the cube:
time periods * customers * products * channels
Note: You must know the number of dimension values so that you can order the dimensions correctly. |
All of the dimensions may be sparse, or all of them may be dense, or the cube may have a mixture of sparse and dense dimensions. You can determine these characteristics by comparing the total number of dimension values with the number of dimension values actually used in a cube. For even more insight, you can compare these numbers at all levels.
For example, the following SQL statement returns the total number of dimension values at each level in the Global TIME_DIM
dimension table:
SELECT COUNT(month_id), COUNT(DISTINCT quarter_id), COUNT(DISTINCT year_id) FROM time_dim;
The next SQL statement returns the number of dimension values at each level of Time that are used in the Global UNITS_HISTORY_FACT
fact table.
SELECT COUNT(DISTINCT a.month_id), COUNT(DISTINCT a.quarter_id), COUNT(DISTINCT a.year_id) FROM time_dim a, units_history_fact b WHERE a.month_id=b.month_id;
The order in which the dimensions are listed for a cube affects performance because it determines the way the data is stored on disk. The first dimension in a cube is the fastest-varying dimension, and the last dimension is the slowest-varying dimension. The data for each measure in a cube is stored as a linear stream, in which the values of the fastest-varying dimension are clustered together and values of the slowest-varying dimension are spread far apart. Performance is optimized when values that are accessed together are stored together, because fewer pages must be swapped in and out of memory.
Data storage may be optimized for querying or loading. To optimize for loading, list the dense dimension (such as Time) before the sparse dimensions. If there is more than one dense dimension, then list the largest one first. To optimize for querying, you need to understand how your users are querying the data.
Partitioning is an method of physically storing the measures in a cube. It improves the performance of large measures in the following ways:
Improves scalability by keeping data structures small. Each partition functions like a smaller measure.
Keeps the working set of data smaller both for queries and maintenance, since the relevant data is stored together.
Enables parallel aggregation during data maintenance. Each partition can be aggregated by a separate process.
Allows different client sessions to have write access to different partitions of the same object at the same time.
Simplifies removal of old data from storage. Old partitions can be dropped as a unit, and new partitions can be added.
Stores each partition of a compressed cube in a separate analytic workspace object. If a compressed cube is not partitioned, then all measures of the cube are stored in one object.
For partitioning a cube in Analytic Workspace Manager, you must choose a dimension and one of its levels as the basis for creating the partitions. For example, you might choose the Quarter level of the Time dimension. Each Quarter and its descendants are stored in a separate partition. If there are three years of data in the analytic workspace, then partitioning on Quarter produces 12 partitions, in addition to the default partition. The default partition contains all remaining levels, that is, those above Quarter (such as Year) and those in other hierarchies (such as Fiscal Year or Year-to-Date). The aggregate levels in the new partitions are calculated and stored in the analytic workspace as a data maintenance step, while the levels in the default partition are calculated on the fly.
Figure 3-7 illustrates the Global Time dimension partitioned by Quarter.
The number of partitions also affects the database resources that can be allocated to loading and aggregating the data in an analytic workspace. Partitions can be aggregated simultaneously when sufficient resources have been allocated, as described in "Maintaining the Data".
As this discussion explains, partitioning affects the extent to which an analytic workspace is optimized for querying or for maintenance. The fewer partitions, the more levels are precalculated (optimized for querying); the more partitions, the more levels are calculated on the fly, and the more resources can be allocated to loading and aggregating the data (optimized for maintenance).
Time is typically a good candidate for partitioning, because this choice supports life-cycle maintenance. Old time periods can be dropped as a unit in a partition, and new time periods can be added in a new partition. Moreover, the partitions will be approximately the same size because of the inherent regularity of the calendar. For example, in a calendar hierarchy, months have 28-31 children, quarters have 3 children, and years have 4 children.
However, some enterprises prefer to redeploy their analytic workspaces with new data instead of maintaining them. This is also the model used by Oracle Warehouse Builder. When life-cycle maintenance is not a factor, you should choose the most dense dimension for partitioning. The most dense dimension is frequently the one with the fewest members.
Analytic Workspace Manager enables you to define summarization rules at three different levels. You can use whatever combination of levels best suits your needs:
Cube. You can define default summarization rules for all measures in a cube. You define these rules when creating or modifying a cube.
Measure. You can define unique summarization rules for a particular measure. These rules override the default cube summarization rules. You define these rules when creating or modifying a measure.
Analytic Workspace. You can define rules for one or more measures and determine the order in which the measures are summarized. In this way, you can support dependencies among the measures. You define these rules when creating or modifying a calculation plan.
Regardless of the level at which you define the summarization rules, the decisions and the user interface are the same.
A data load typically fetches data only at the lowest, or base, level. The data cells at the higher levels are empty until the values are calculated from the base values. In analytic workspaces, aggregate data can be generated at two distinct times:
On the fly in response to a query. Calculated values may be cached for use throughout the session, but they are not shared among sessions.
As part of the build procedure. Calculated values are stored in the analytic workspace and shared by all sessions.
If your dimensions have multiple hierarchies or if the hierarchies have many levels, then fully aggregating the measures can increase the size of your analytic workspace (and thus your database) geometrically. At the same time, much of the intermediate level data may be accessed infrequently or not at all.
The most effective method of summarizing data in an analytic workspace is by storing some aggregates and calculating others on the fly. A typical strategy for doing this is called skip-level aggregation, because some levels are stored and others are skipped until runtime.
Skip-level aggregation is a strategy for identifying levels for data storage by determining the ratio of dimension members at each level, and keeping the ratio of members to be rolled up on the fly at approximately 10:1. This ratio assures that all answer sets can be returned quickly. Either a data value is stored in the analytic workspace so it can simply be retrieved, or it can be calculated quickly from 10 stored values.
This 10:1 rule is best applied with some judgment. You might want to permit a higher ratio for levels that you know are seldom accessed. Or you might want to store levels at a lower ratio if you know they have heavy use.
Slower varying dimensions take longer to aggregate because the data is scattered throughout its storage space. If you are optimizing for data maintenance, then fully aggregate the faster varying dimensions and use skip-level aggregation on the slower varying dimensions.
Aggregation rules identify how and when the aggregate values are calculated. You define the aggregation rules for each cube, and you can override these rules by defining new ones for a particular measure.
Analytic workspaces offer a large selection of aggregation methods, including scaled, weighted, hierarchical, and hierarchical weighted methods. Descriptions of these methods are provided in Analytic Workspace Manager Help.
You can specify different operators for different dimensions. If you do, then order the dimensions in the aggregation rules to achieve the results you want; for example, the sum of averages may yield different values than the average of sums.
After creating logical objects, you can map them to data sources in Oracle Database. Afterward, you can load data into your analytic workspace using the Maintenance Wizard.
The mapping window has a tabular view and a graphical view.
Tabular view. Drag-and-drop the names of individual columns from the schema navigation tree to the rows for the logical objects.
Graphical view. Drag-and-drop icons, which represent tables and views, from the schema navigation tree onto the mapping canvas. Then you draw lines from the columns to the logical objects.
If you want to see the values in a particular source table or view, right-click it in either the schema tree or the mapping canvas. Choose View Data from the menu to fetch up to 1000 rows.
Figure 3-8 shows the CHANNEL
dimension mapped in the tabular view. The toolbar appears across the top and the schema navigation tree is on the left.
Figure 3-8 Dimension Mapped in Tabular View
The following procedure explains how to map a dimension in the graphical view.
To map a dimension in the graphical view, take these steps:
Define the dimension and its levels, hierarchies, and attributes.
In the Model View navigation tree, expand the dimension folder and click Mappings.
The Mapping Window will be displayed in the right pane.
Enlarge the mapping window by dragging the divider to the left.
In the toolbar, identify the source schema as Star Schema, Snowflake Schema, or Other.
In the schema navigation tree, locate the tables with the dimension members and attributes for all levels. Drag-and-drop them onto the mapping canvas.
Draw lines from the source columns to the target objects. To draw a line, click the output connector of the source column and drag it to the input connector of the target object. Be careful to map every logical object to a source column.
Tip: For a star schema with logical names that match the column names, click Auto Map Star Schema in the toolbar. Verify that all logical objects are mapped correctly.
To uncross the lines, click the Auto Arrange Mappings tool.
Click Apply.
When you have mapped all objects for the dimension, drag the divider to the right to restore access to the navigation tree.
Figure 3-9 shows the mapping canvas with the Channel dimension and its attributes mapped to columns in the CHANNEL_DIM
table. The mapping toolbar is at the top, and the schema navigation tree is on the left.
Figure 3-9 GLOBAL CHANNEL Dimension Mapped in Graphical View
To map a cube in the graphical view, take these steps:
Define the cube and its measures.
You can define derived measures at any time, because they are calculated, not loaded.
In the Model View navigation tree, expand the Cubes folder and click Mappings.
The Mapping Window will be displayed in the right pane. You will see a schema navigation tree and a table with rows for the measures, dimensions, and levels.
Enlarge the mapping window by dragging the divider to the left.
In the schema navigation tree, locate the tables with the measures. Drag-and-drop them onto the mapping canvas.
Draw lines from the source columns to the target objects.
To draw a line, click the output connector of the source column and drag it to the input connector of the target object. You must map both the measures and the related dimension keys.
To uncross the lines, click the Auto Arrange Mappings tool.
When you have mapped all objects for the dimension, drag the divider to the right to restore access to the navigation tree.
Figure 3-10 shows the mapping canvas with the Price and Cost cube mapped to columns in the PRICE_AND_COST_HIST_FACT
table. The mapping toolbar is at the top, and the schema navigation tree is on the left.
Figure 3-10 GLOBAL PRICE_AND_COST_CUBE Cube Mapped in Graphical View
The Maintenance Wizard loads and aggregates the data as a single job. You can load all mapped objects in the analytic workspace, or individual dimensions and measures. You can also choose to run the job immediately, enter it in the Oracle job queue, or save it as a SQL script.
To maintain the data:
Right-click the name of the analytic workspace, a measure, or a dimension, then choose Maintenance Wizard from the pop-up menu.
Choose a folder that includes all the items that you want to maintain. For example, if you open the Maintenance Wizard from a particular cube, you will load that cube and summarize its measures. You will not load data or summarize other cubes.
Follow the steps of the wizard.
Click Help for additional information about each step.
Verify the results in the Data Viewer. Right-click a cube, and choose View Data from the pop-up menu.
If you submit a maintenance task to the Oracle job queue, you can specify the maximum number of simultaneous processes the job can use. This number is limited by two factors:
The number of objects in the analytic workspace that can be summarized in parallel. Each cube and each partition (including the default partition) can use a separate process.
The number of simultaneous database processes the user is authorized to run.
This number is controlled by the JOB_QUEUE_PROCESSES
parameter. The setting for this parameter is based on the number of processors, as described in "Initialization Parameters for Oracle OLAP". You can obtain the current parameter setting with the following SQL command:
SHOW PARAMETER JOB_QUEUE_PROCESSES
Specify the smaller of these two numbers when submitting a job.
Oracle Database allocates the specified number of processes (if you have sufficient authorization) regardless of whether all of them can be used simultaneously at any point in the job. For example, if your job can use up to three processes, but you specify five, then two of the processes allocated to your job cannot be used by it or any other job.
When submitting a maintenance task to the job queue, be sure to note the job number so that you can verify that the job completed successfully. Runtime messages are stored in a table named OLAPSYS.XML_LOAD_LOG
. Messages in this file are identified just by the digits in the job number. The following SQL statement returns the messages for job AWXML$_54
:
SELECT XML_MESSAGE FROM OLAPSYS.XML_LOAD_LOG WHERE XML_LOADID='54';
You can manage these jobs using tools such as Oracle Enterprise Manager Scheduler or the DBMS_SCHEDULER
PL/SQL package.
You can define a measure folder for use by OLAP tools, so that the measures can be located and identified quickly by users. They may have access to several analytic workspaces or relational schemas with measures named Sales or Costs, and they will have no means of differentiating them outside of a measure folder.
To create a measure folder:
Expand the folder for the analytic workspace.
Right-click Measure Folders, then choose Create Measure Folders from the pop-up menu.
Complete the General tab of the Create Measure Folder dialog box.
Click Help for specific information about these choices.
A single analytic workspace can support multiple languages. This support enables users of OLAP applications and tools to view the metadata in their native languages. For example, you can provide translations for the display names of measures, cubes, and dimensions. You can also map attributes to multiple columns, one for each language.
The number and choice of languages is restricted only by the database character set and your ability to provide translated text. Languages can be added or removed at any time.
To add support for multiple languages:
In the Model View navigation tree, expand the folder for the analytic workspace.
Click the Languages folder, and select the languages for the analytic workspace on the Basic tab.
For each dimension, level, hierarchy, attribute, cube, measure, calculated measure, and measure folder, open the Translations tab of the property sheet. Enter the object labels and descriptions in each language.
For each dimension, open the Mappings window. Map the attributes to columns for each language.
Calculation plans enable you to create the aggregation rules for one or more measures, as an alternative to the default aggregation plan for the cube or defining individual plans for each measure. You can also identify the order in which you want the measures aggregated when there are interdependencies.
To create a calculation plan:
Expand the folder for the analytic workspace.
Right-click Calculation Plans, then choose Create Calculation Plan from the pop-up menu.
The Create Calculation Plan dialog box is displayed.
Complete the General tab.
Click Help for specific information about these choices.
To create a new aggregation step, click Add.
The Create New Aggregation Step dialog box is displayed.
Complete all tabs, then click Create.
The new aggregation step is listed on the Calculation Plan General tab.
Click Create.
The new calculation plan appears as an item in the Calculation Plans folder.
The following case study explains the choices made in creating an analytic workspace from the GLOBAL
schema. Chapter 2 describes the tables.
This example follows best practices by creating the GLOBAL
analytic workspace in a different schema from the source tables. Example 3-1 lists the SQL commands to define the GLOBAL_AW
user with sufficient access rights to use Analytic Workspace Manager and to access the GLOBAL
star schema. Alternatively, you can define users through Oracle Enterprise Manager.
Example 3-1 SQL Script for Defining the GLOBAL_AW User
CREATE USER "GLOBAL_AW" PROFILE "DEFAULT" IDENTIFIED BY "global_aw" DEFAULT TABLESPACE "GLOBAL" TEMPORARY TABLESPACE "OLAPTEMP" QUOTA UNLIMITED ON "GLOBAL" ACCOUNT UNLOCK; GRANT OLAP_USER TO GLOBAL_AW; GRANT SELECT ON global.channel_dim TO global_aw; GRANT SELECT ON global.product_child_parent TO global_aw; GRANT SELECT ON global.customer_dim TO global_aw; GRANT SELECT ON global.time_month_dim TO global_aw; GRANT SELECT ON global.time_quarter_dim TO global_aw; GRANT SELECT ON global.time_year_dim TO global_aw; GRANT SELECT ON global.units_history_fact TO global_aw; GRANT SELECT ON global.price_and_cost_history_fact TO global_aw;
By using SQL SELECT
commands with the COUNT
and COUNT(DISTINCT)
functions, you can estimate how dense the resulting dimensional cubes will be in the analytic workspace.
The Time dimension has 96 members. However, the last 17 months have no data. These time periods and their aggregates will be used initially for forecasting and later to store actual data. These additional time periods are excluded from the following calculations because they might skew the results in such a small data set.
The PRICE_AND_COST_HISTORY_FACT
table has 2023 rows out of a possible 2844 dimension value combinations (79 historic months * 36 products). The Price cube is mapped to the PRICE_AND_COST_HISTORY_FACT
table and is over 70% dense.
The UNITS_HISTORY_FACT
table has 222,589 rows out of a possible 520,452 dimension value combinations (79 historic months * 36 products * 61 customers * 3 channels). The Units cube, which is mapped to UNITS_HISTORY_FACT
, is over 40% dense.
Because the Global data set is dense, even a regular composite should not be used.
To identify the levels to be precalculated, you must know the number of dimension members at each level. You can easily acquire this information using either SQL statements or OLAP DML commands.
For example, this SQL statement:
SELECT COUNT(DISTINCT year_id) FROM global.time_year_dim;
and this OLAP DML command in the GLOBAL
analytic workspace (after loading the dimension):
SHOW NUMLINES(LIMIT(time TO time_levelrel EQ 'YEAR'))
both return the number of TIME
dimension members at the Year level.
Global is a very small data set, so few adjacent levels have a 10:1 ratio of dimension members. Table 3-1 identifies the levels to be calculated and stored in the analytic workspace.
Table 3-1 Precalculated Levels in the Global Workspace
Dimension | Level | Members | Precalculate |
---|---|---|---|
|
|
96 |
Yes |
|
|
32 |
No |
|
|
8 |
Yes |
|
|
61 |
Yes |
|
|
24 |
No |
|
|
5 |
Yes |
|
|
1 |
No |
|
|
11 |
No |
|
|
3 |
Yes |
|
|
1 |
No |
|
|
48 |
No |
|
|
3 |
Yes |
|
|
1 |
No |
Take these steps to create the GLOBAL
analytic workspace:
Open Analytic Workspace Manager and connect to Oracle Database as the GLOBAL_AW
user.
In the Model View navigation tree, expand the GLOBAL_AW
folder, and right-click Analytic Workspaces.
Choose Create Analytic Workspace from the pop-up menu.
Complete the Create Analytic Workspace dialog box, then choose Create.
This step creates the analytic workspace container and populates it with standard form catalogs and similar objects. You must now define the logical model.
GLOBAL
has four dimensions: TIME
, PRODUCT
, CUSTOMER
, and CHANNEL
. Implement the logical model described in Chapter 2 by following the basic instructions in "Creating Logical Dimensions".
Note these choices:
Time Dimension: On the General tab, select Time Dimension as the dimension type. You can map Time to a star schema (TIME_DIM
table) or to a snowflake schema (TIME_MONTH_DIM
, TIME_QUARTER_DIM
, and TIME_YEAR_DIM
tables).
Product Dimension: You can map Product to a star, level-based table (PRODUCT_DIM
) or to a parent-child table (PRODUCT_CHILD_PARENT
).
All Dimensions: On the Implementation Details tab, select Use Natural Keys From Data Source.
The source tables have numeric surrogate keys that assure unique dimension members across all levels.
All Attributes: On the General tab, verify that the attributes apply to all levels.
Languages: Add French and Dutch.
GLOBAL
has two cubes: UNITS_CUBE
and PRICE_AND_COST_CUBE
.
UNITS_CUBE
is dimensioned by TIME
, PRODUCT
, CUSTOMER
, and CHANNEL
. It contains two measures, UNITS
and SALES
.
PRICE_AND_COST_CUBE
is dimensioned by TIME
and PRODUCT
. It contains two measures, UNIT_PRICE
and UNIT_COST
.
Implement the logical model described in Chapter 2 by following the basic instructions in "Creating Logical Cubes".
UNITS_CUBE
On the Implementation Details page, list the dimensions in this order:
TIME
CUSTOMER
PRODUCT
CHANNEL
Deselect the sparsity check boxes for all dimensions. They are dense.
On the Aggregation page, select the SUM
operator for all dimensions. Use Table 3-1 to select levels for presummarization.
PRICE_AND_COST_CUBE
On the Implementation Details page, list the dimensions in this order:
TIME
PRODUCT
Measures in the Price Cube and the Units Cube will be used together frequently in calculated measures. For performance, the dimensions that the cubes share must be listed in the same order.
Deselect the sparsity check boxes for all dimensions. They are dense.
On the Aggregation page, select Last Non-NA Data Value for Time and Average for Product.
The data for the GLOBAL
analytic workspace is stored in the GLOBAL
schema.
To map the PRODUCT
dimension, take these steps:
Expand the Dimensions folder, then click the Mappings node for PRODUCT
.
Drag the divider to the left to expand the size of the mapping canvas.
In the schema navigation tree, expand the GLOBAL
folder, then drag-and-drop the PRODUCT_CHILD_PARENT
table onto the canvas.
Drag a line from the output connectors in the PRODUCT_CHILD_PARENT
table to the appropriate input connector in the PRODUCT
table.
Click Apply.
Repeat these steps to map CUSTOMER
to the CUSTOMER_DIM
table and CHANNEL
to the CHANNEL_DIM
table. For TIME
, select Snowflake Schema and map to TIME_MONTH_DIM
, TIME_QUARTER_DIM
, and TIME_YEAR_DIM
.
To map UNITS_CUBE
, take these steps:
Expand the Cubes folder, then click the Mappings node for UNITS_CUBE
.
Drag the divider to the left to expand the size of the mapping canvas.
In the schema navigation tree, expand the GLOBAL
folder, then drag-and-drop the UNITS_DETAIL_FACT
table onto the canvas.
Drag lines from the output connectors in the UNITS_DETAIL_FACT
table to the appropriate input connectors in the UNITS_CUBE
table.
Click Apply.
Repeat these steps to map PRICE_AND_COST_CUBE
to the PRICE_AND_COST_HIST_FACT
table.
To load all of the data for GLOBAL
, run the Maintenance Wizard as described in "Maintaining the Data". Note these choices:
Run the Maintenance Wizard from the GLOBAL
folder in the Model navigation tree.
Select Objects page: Select the Add the Dimensions of the Cube box, then move Cubes to the Selected Source Objects column. Click Finish to run the job immediately.
Figure 3-11 shows the results of a query in OracleBI Discoverer Plus OLAP. **Replace this pic***
Figure 3-11 Discoverer Plus OLAP Displays Data from PRICE_AND_COST_CUBE
"Identifying Required Business Facts" identifies the business measures required by the Global Corporation. Only three measures were acquired from the source fact tables: Units, Unit Price, and Unit Cost. The remaining business measures can be calculated from those three. Table 3-2 shows the calculated measures for the Units Cube.
Table 3-2 Custom Measures for the GLOBAL Analytic Workspace
Required Business Measures | Calculation Type | Based On Measures |
---|---|---|
Sales |
Basic Arithmetic > Multiplication |
|
Extended Cost |
Basic Arithmetic > Multiplication |
|
Extended Margin |
Basic Arithmetic > Subtraction |
|
Change in sales from prior period (month, quarter, or year) Change in sales from prior year |
Prior/Future Comparison > Difference from Prior Period |
|
Percent change in sales from prior period Percent change in sales from prior year |
Prior/Future Comparison > Percent Difference from Prior Period |
|
Product share |
Advanced Arithmetic > Share |
|
Channel share |
Advanced Arithmetic > Share |
|
Market share |
Advanced Arithmetic > Share |
|
Extended margin change from prior period Extended margin change from prior year |
Prior/Future Comparison > Difference from Prior Period |
|
Extended margin percent change from prior period Extended margin percent change from prior year |
Prior/Future Comparison > Percent Difference from Prior Period |
|
Units sold, change from prior period |
Prior/Future Comparison > Difference from Prior Period |
|
Extended margin per unit |
Basic Arithmetic > Division |
|
Sales History (SH
) is a sample star schema that is delivered with Oracle Database, along with OLAP Catalog metadata for access directly to the relational tables by OLAP query tools. Although Global is used for most of the examples in this manual, Sales History has a very different set of data characteristics and demonstrates a correspondingly different set of build choices.
You can download a template for a Sales History analytic workspace from
http://www.oracle.com/technology/products/bi/olap/olap.html
Figure 3-12 shows a schema diagram of Sales History.
Take these steps to create the SH
analytic workspace:
Define database parameters for OLAP.
Create permanent and temporary tablespaces specifically for use by the SH analytic workspace.
Define the SH_AW
user.
Open Analytic Workspace Manager and connect to Oracle Database as the SH_AW
user.
Create the SH
analytic workspace, and define the logical dimensions.
Examine the sparsity characteristics of the data and define the logical cube.
Map, load, and summarize the data.
Query the analytic workspace.
When building a large analytic workspace, the parameters for Oracle Database may affect how quickly the build proceeds. Before changing any database parameters, you should monitor performance using the default settings.
Example 3-2 shows a few of the settings in the init.ora
file for a computer with 32G of physical memory and four processors. Note that you must define an undo tablespace before you can specify it in a startup parameter. For more information about these settings, refer to Chapter 6.
While the GLOBAL
analytic workspace has about a half million cells for base-level data in its largest cube, the Sales History SALES
cube has over 18 trillion. This makes the Sales History analytic workspace small to average for a real application, although quite large for a sample data set. It is sufficiently large for a build to fail on a small desktop computer unless resources have been allocated for its use.
You should define temporary and permanent tablespaces for use by Sales History.
Define a tablespace that is large enough to hold the base-level data, stored aggregates, forecast data, and so forth. If multiple physical disks are available, define an extension file for each one. For the best performance, do not use the same tablespace that the star schema uses.
Define a temporary tablespace that is large enough to hold the data for the SALES
cube. Stripe this tablespace across multiple disks the same as for the permanent tablespace. Use a small EXTENT MANAGEMENT SIZE
value, such as 256K
.
Example 3-3 shows how the tablespaces might be defined for Sales History when four disk drives are available.
Example 3-3 SQL Script for Defining Tablespaces for the Sales History Analytic Workspace
/* Create permanent tablespaces on four disks */ CREATE TABLESPACE sh_aw DATAFILE '/disk1/oradata/sh_aw1.dbf' SIZE 64M AUTOEXTEND ON NEXT 64M MAXSIZE 1024M EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO; ALTER TABLESPACE sh_aw ADD DATAFILE '/disk2/oradata/sh_aw2.dbf' SIZE 64M REUSE AUTOEXTEND ON NEXT 64M MAXSIZE 1024M, '/disk3/oradata/sh_aw3.dbf' SIZE 64M REUSE AUTOEXTEND ON NEXT 64M MAXSIZE 1024M, '/disk4/oradata/sh_aw4.dbf' SIZE 64M REUSE AUTOEXTEND ON NEXT 64M MAXSIZE UNLIMITED; /* Create temporary tablespaces on four disks */ CREATE TEMPORARY TABLESPACE sh_temp TEMPFILE '/disk1/oradata/sh_aw1.tmp' SIZE 64M REUSE AUTOEXTEND ON NEXT 64M MAXSIZE 1024M EXTENT MANAGEMENT LOCAL UNIFORM SIZE 256K; ALTER TABLESPACE sh_temp ADD TEMPFILE '/disk2/oradata/sh_aw2.tmp' SIZE 64M REUSE AUTOEXTEND ON NEXT 64M MAXSIZE 1024M, '/disk3/oradata/sh_aw3.tmp' SIZE 64M REUSE AUTOEXTEND ON NEXT 64M MAXSIZE 1024M, '/disk4/oradata/sh_aw4.tmp' SIZE 64M REUSE AUTOEXTEND ON NEXT 64M MAXSIZE UNLIMITED;
Example 3-4 shows a script that is similar to the one used to create the GLOBAL_AW
user in Example 3-1. It defines a user named SH_AW
and authorizes it to access the SH
star schema. The script sets the new permanent and temporary tablespaces as the defaults for the SH_AW
user.
Example 3-4 Script for Creating the SH_AW User
/* Create the user and grant privileges */ CREATE USER sh_aw PROFILE "DEFAULT" IDENTIFIED BY "sh_aw" DEFAULT TABLESPACE sh_perm TEMPORARY TABLESPACE sh_temp QUOTA UNLIMITED ON sh_perm ACCOUNT UNLOCK; GRANT OLAP_USER TO sh_aw; /* Create a directory object*/ CREATE OR REPLACE DIRECTORY sh_scripts AS '/users/oracle/sh_scripts'; GRANT ALL ON DIRECTORY sh_scripts TO PUBLIC; /* Grant access to SH star schema */ GRANT SELECT ON SH.CHANNELS to SH_AW; GRANT SELECT ON SH.PRODUCTS to SH_AW; GRANT SELECT ON SH.TIMES to SH_AW; GRANT SELECT ON SH.CUSTOMERS to SH_AW; GRANT SELECT ON SH.COUNTRIES to SH_AW; GRANT SELECT ON SH.PROMOTIONS to SH_AW; GRANT SELECT ON SH.SALES to SH_AW;
Because Sales History is a star schema, the logical model for the analytic workspace is primarily indicated by the schema design, as shown in Figure 3-12.
The two fact tables, SALES
and COSTS
, are the data sources for two logical cubes. This case study only uses SALES
.
The SALES
table has a primary key composed of foreign keys from five dimension tables, which are named TIMES
, PRODUCTS
, CHANNELS
, PROMOTIONS
, and CUSTOMERS
. CUSTOMERS
is related to a sixth dimension table, COUNTRIES
, by a foreign key. In addition, SALES
has two columns that contain business measures named QUANTITY_SOLD
and AMOUNT_SOLD
. Thus, the star schema defines a logical SALES
cube with five dimensions and two measures for the analytic workspace.
The Times table has a numeric surrogate key for each level, so you can specify natural keys as an implementation detail for TIMES_DIM
.
Each level in a Time dimension must have time-span and end-date attributes. However, the Times table does not have this data for Day or Fiscal Week. One way to correct this problem is to add the columns to the Times table, using SQL statements like the following:
ALTER TABLE times ADD ( days_in_day NUMBER(1) DEFAULT 1, days_in_week NUMBER(1) DEFAULT 7 );
When you have finished mapping the dimension, run the Maintenance Wizard to load the members and attributes. Because they load quickly, you can run the job immediately (instead of in the job queue) to verify that the mappings are correct.
The Customers and Countries tables are related on the Countries key column, and together they support two hierarchies, CUST_ROLLUP
and GEOG_ROLLUP
. Because the two hierarchies share two aggregate levels (CITY
and STATE
), you must generate surrogate keys in the analytic workspace so that each hierarchy has unique dimension members. Otherwise, a single set of aggregates might not be correct for both hierarchies.
Only 7,059 customers have sales data of the 55,500 listed in the Customers table, as shown in Example 3-7. You can choose the way you implement CUSTOMERS_DIM
:
Load all of the customers into the analytic workspace, regardless of their purchasing history. This case study implements this choice.
Create a view of the Customers table with a WHERE
clause in the SELECT
statement that filters the customers so that only those who have made purchases are included in the analytic workspace. Map CUSTOMERS_DIM
to the new view.
Define City as the base level; do not map the Customer level or its attributes. Create a view of the SALES
table with a GROUP BY
clause in the SELECT
statement that aggregates the data to the CITY
level. This choice is appropriate only if data at the Customer level is not needed for analysis.
When you have finished mapping the dimension, run the Maintenance Wizard to load the members and attributes. Because they load quickly, you can run the job immediately (instead of in the job queue) to verify that the mappings are correct.
The three remaining dimensions do not present any new challenges. Their source tables can be identified as star schema in the Mappings canvas, because all levels and attributes are in a single source table.
The measures in the Sales cube use only 4 of the 503 promotions listed in the PROMOTIONS_DIM
dimension table. You have the same choices for handling this dimension as you did for the CUSTOMERS_DIM
dimension, which also has a large percentage of unused key values.
The definition of a cube involves decisions that affect performance. Unlike Global, the Sales History data set is fairly large and sparse like many real data sets. It is a good candidate for using the Sparsity Advisor. The Sparsity Advisor analyzes the sparsity characteristics of the data as it is stored in the relational source tables.
The Sparsity Advisor consists of several subprograms in the DBMS_AW
PL/SQL package.
SPARSITY_ADVICE_TABLE
procedure creates a table for storing the advice generated by the ADVISE_SPARSITY
procedure.
ADD_DIMENSION_SOURCE
procedure populates a table type named DBMS_AW$_DIMENSION_SOURCES_T
with information about the dimensions of a cube. This information is analyzed by the ADVISE_SPARSITY
procedure.
ADVISE_SPARSITY
procedure analyzes a fact table for sparsity using information about its dimensions provided by the ADD_DIMENSION_SOURCE
procedure. It populates a table created by the SPARSITY_ADVICE_TABLE
procedure with the results of its analysis.
ADVISE_DIMENSIONALITY
function returns an OLAP DML definition of a composite dimension and the dimension order for variables in the cube, based on the sparsity recommendations generated by the ADVISE_SPARSITY
procedure for a particular partition.
ADVISE_DIMENSIONALITY
procedure evaluates the information provided by the ADVISE_SPARSITY
procedure and generates the OLAP DML commands for defining a composite and a variable in the analytic workspace.
It returns its results in two forms: as a table with its recommendations and as OLAP DML commands that implement these recommendations. When creating an analytic workspace using Analytic Workspace Manager, you cannot use the OLAP DML commands directly. However, you can use the table or the commands (or both) to guide your choices on the property sheets.
See Also: Oracle OLAP Reference for complete information about the Sparsity Advisor |
Example 3-5 shows a sample PL/SQL program for evaluating the sparsity characteristics of the tables in the SH
schema. The ADD_DIMENSION_SOURCE
procedure provides the Sparsity Advisor with information about the dimensions, levels, and hierarchies.
Example 3-5 PL/SQL Program for Using the Sparsity Advisor on Sales History
--Connect and set display parameters CONNECT sh/sh SET ECHO ON SET LINESIZE 300 SET PAGESIZE 300 SET LONG 8000 SET SERVEROUT ON FORMAT WRAPPED --Create a table for results BEGIN dbms_aw.sparsity_advice_table('sh_sparsity_advice'); EXCEPTION WHEN OTHERS THEN NULL; END; / TRUNCATE TABLE sh_sparsity_advice; --Define program variables DECLARE dimsources dbms_aw$_dimension_sources_t; dimlist VARCHAR2(500); sparsedim VARCHAR2(500); counter NUMBER(2) := 1; maxpart NUMBER(2); defs CLOB; BEGIN --Describe the dimension hierarchies dbms_aw.add_dimension_source('channel', 'channel_id', dimsources, 'channels', dbms_aw.hier_levels, dbms_aw$_columnlist_t('channel_id', 'channel_class_id', 'channel_total_id')); dbms_aw.add_dimension_source('product', 'prod_id', dimsources, 'products', dbms_aw.hier_levels, dbms_aw$_columnlist_t('prod_id', 'prod_subcategory_id', 'prod_category_id', 'prod_total_id')); dbms_aw.add_dimension_source('customer', 'cust_id', dimsources, 'customers', dbms_aw.hier_levels, dbms_aw$_columnlist_t('cust_id', 'cust_city_id', 'cust_state_province_id', 'cust_total_id')); dbms_aw.add_dimension_source('time', 'time_id', dimsources, 'times', dbms_aw.hier_levels, dbms_aw$_columnlist_t('time_id', 'calendar_month_id','calendar_quarter_id', 'calendar_year_id')); dbms_aw.add_dimension_source('promotion', 'promo_id', dimsources, 'promotions', dbms_aw.hier_levels, dbms_aw$_columnlist_t('promo_id', 'promo_subcategory_id', 'promo_category_id', 'promo_total_id')); --Analyze tables using default settings dbms_aw.advise_sparsity('sales', 'sales_cube', dimsources, dbms_aw.advice_default, dbms_aw.partby_default, 'sh_sparsity_advice'); commit; -- Get recommendations as OLAP DML commands select max(partnum) into maxpart from sh_sparsity_advice; WHILE counter <= maxpart LOOP dimlist := dbms_aw.advise_dimensionality('sales_cube', sparsedim, 'sales_cube_composite', counter, 'sh_sparsity_advice'); dbms_output.put_line('Dimension list: ' || dimlist); dbms_output.put_line('Sparse dimension: ' || sparsedim); counter := counter+1; END LOOP; dbms_aw.advise_dimensionality(defs,'sales_cube', 'sales_cube_composite', 'DECIMAL', 'sh_sparsity_advice'); dbms_output.put_line('Definitions: '); dbms_aw.printlog(defs); END; / -- Get recommendations directly from the output table COLUMN cubename FORMAT a8 COLUMN fact FORMAT a20 COLUMN dimension FORMAT a12 COLUMN dimcolumn FORMAT a12 COLUMN dimsource FORMAT a12 COLUMN nmem FORMAT 99999 COLUMN nleaf FORMAT 99999 COLUMN advice FORMAT a12 COLUMN pos FORMAT 99 COLUMN density FORMAT 9.9999 COLUMN partnum FORMAT 99 COLUMN partby FORMAT a20 COLUMN parttops FORMAT a20 --Get basic information about the dimensions SELECT DISTINCT(dimension), dimcolumn, position pos, membercount nmem, leafcount nleaf FROM sh_sparsity_advice WHERE cubename='sales_cube' ORDER BY position; --Get partitioning advice about the dimensions SELECT dimcolumn, advice, partnum, parttops FROM sh_sparsity_advice WHERE cubename='sales_cube' ORDER BY partnum; --Identify the partition tops SELECT DISTINCT(calendar_year_id), calendar_year FROM times ORDER BY calendar_year;
The Sparsity Advisor makes very detailed recommendations, but you can derive general recommendations that can be implemented in Analytic Workspace Manager:
List the dimensions in this order: TIME
, CHANNEL
, PRODUCT
, PROMOTION
, CUSTOMER
.
Include all dimensions in a compressed composite.
Create five partitions for the five years of data.
On the Implementation Details page for SALES_CUBE
, do the following to implement these recommendations:
In the Dimension Order and Sparsity table, list the dimensions in the recommended order and mark all of them as sparse.
Select Use Compression.
Select Partition Cube.
For partitioning, select the TIMES_DIM
dimension, CAL_ROLLUP
hierarchy, and YEAR
level.
Example 3-6 shows the recommendations as OLAP DML commands. The dimensions are listed in order within angle brackets (<>), and definition of the sparse dimension shows all dimensions in a compressed composite. The partition template creates five partitions on TIME
.
Example 3-6 Recommendations for Sales History in OLAP DML
Dimension list: <sales_cube_composite<time channel product promotion customer>> Sparse dimension: DEFINE sales_cube_composite COMPOSITE COMPRESSED <time channel product promotion customer> DEFINE sales_cube_pt PARTITION TEMPLATE <time channel product promotion customer> - PARTITION BY LIST (time) - (PARTITION p1 VALUES () <sales_cube_composite_p1<>> - PARTITION p2 VALUES () <sales_cube_composite_p2<>> - PARTITION p3 VALUES () <sales_cube_composite_p3<>> - PARTITION p4 VALUES () <sales_cube_composite_p4<>> - PARTITION p5 VALUES () <sales_cube_composite_p5<>>) . . .
The Sparsity Advisor bases its OLAP DML commands on the information stored in the output table, which in this example is named SH_SPARSITY_ADVICE
. You can also view this information directly in the table.
Example 3-7 queries SH_SPARSITY_ADVICE
. The first SELECT
statement retrieves basic information about the dimensions. The POSITION
column identifies their recommended order. MEMBERCOUNT
shows the total number of dimension members defined in the dimension tables, and LEAFCOUNT
shows the number with data in the SALES
fact table.
The second SELECT
statement retrieves partitioning advice. It shows five partitions based on Time, and compression is the recommendation for four of the five. A SELECT
statement on the TIMES
table shows that the partition tops are the five years.
Example 3-7 Recommendations for Sales History in the Advice Table
SELECT distinct(dimension), dimcolumn, position, membercount, leafcount FROM sh_sparsity_advice WHERE cubename='sales_cube' ORDER BY position; DIMENSION DIMCOLUMN POSITION MEMBERCOUNT LEAFCOUNT ------------ ------------ -------- ----------- --------- time time_id 1 1826 1460 channel channel_id 2 5 4 product prod_id 3 72 72 promotion promo_id 4 503 4 customer cust_id 5 55500 7059 SELECT dimcolumn, advice, partnum, parttops FROM sh_sparsity_advice WHERE cubename='sales_cube' ORDER BY partnum; DIMCOLUMN ADVICE PARTNUM PARTTOPS ------------ ------------ ------- -------------------- time_id COMPRESSED 1 1803 channel_id COMPRESSED 1 prod_id COMPRESSED 1 promo_id COMPRESSED 1 cust_id COMPRESSED 1 time_id COMPRESSED 2 1805 channel_id COMPRESSED 2 prod_id COMPRESSED 2 promo_id COMPRESSED 2 cust_id COMPRESSED 2 time_id COMPRESSED 3 1804 channel_id COMPRESSED 3 prod_id COMPRESSED 3 promo_id COMPRESSED 3 cust_id COMPRESSED 3 time_id COMPRESSED 4 1813 channel_id COMPRESSED 4 prod_id COMPRESSED 4 promo_id COMPRESSED 4 cust_id COMPRESSED 4 time_id SPARSE 5 1802 channel_id SPARSE 5 prod_id SPARSE 5 promo_id SPARSE 5 cust_id SPARSE 5 25 rows selected. SELECT DISTINCT(calendar_year_id), calendar_year FROM times ORDER BY calendar_year; CALENDAR_YEAR_ID CALENDAR_YEAR ---------------- ------------- 1802 1998 1803 1999 1804 2000 1805 2001 1813 2002
When building the cube, submit the maintenance task to the job queue, either to run immediately or at a later time. If you are running Oracle Database on a single-processor computer, keep the number of processes at 1. Otherwise, check the value of JOB_QUEUE_PROCESSES
to see how many jobs you can run simultaneously. As defined in this example, Sales History can use up to six processes (1 cube + 5 partitions).