Oracle® OLAP Reference 10g Release 2 (10.2) Part Number B14350-01 |
|
|
View PDF |
Data can be stored in several different forms in an analytic workspace, depending on whether it is dense, sparse, or very sparse. The Sparsity Advisor is a group of subprograms in DBMS_AW
that you can use to analyze the relational source data and get recommendations for storing it in an analytic workspace.
Analytic workspaces analyze and manipulate data in a multidimensional format that allocates one cell for each combination of dimension members. The cell can contain a data value, or it can contain an NA
(null). Regardless of its content, the cell size is defined by the data type, for example, every cell in a DECIMAL
variable is 8 bytes.
Variables can be either dense (they contain 30% or more cells with data values) or sparse (less than 30% data values). Most variables are sparse and many are extremely sparse.
Although data can also be stored in the multidimensional format used for analysis, other methods are available for storing sparse variables that make more efficient use of disk space and improve performance. Sparse data can be stored in a variable defined with a composite dimension. A composite has as its members the dimension-value combinations (called tuples) for which there is data. When a data value is added to a variable dimensioned by a composite, that action triggers the creation of a composite tuple. A composite is an index into one or more sparse data variables, and is used to store sparse data in a compact form. Very sparse data can be stored in a variable defined with a compressed composite, which uses a different algorithm for data storage from regular composites.
In contrast to dimensional data, relational data is stored in tables in a very compact format, with rows only for actual data values. When designing an analytic workspace, you may have difficulty manually identifying sparsity in the source data and determining the best storage method. The Sparsity Advisor analyzes the source data in relational tables and recommends a storage method. The recommendations may include the definition of a composite and partitioning of the data variable.
The Sparsity Advisor consists of these procedures and functions:
The Sparsity Advisor also provides a public table type for storing information about the dimensions of the facts being analyzed. Three objects are used to define the table type:
DBMS_AW$_COLUMNLIST_T
DBMS_AW$_DIMENSION_SOURCE_T
DBMS_AW$_DIMENSION_SOURCES_T
The following SQL DESCRIBE
statements show the object definitions.
SQL> describe dbms_aw$_columnlist_t dbms_aw$_columnlist_t TABLE OF VARCHAR2(100) SQL> describe dbms_aw$_dimension_source_t Name Null? Type ----------------------------------------- -------- ---------------------------- DIMNAME VARCHAR2(100) COLUMNNAME VARCHAR2(100) SOURCEVALUE VARCHAR2(32767) DIMTYPE NUMBER(3) HIERCOLS DBMS_AW$_COLUMNLIST_T PARTBY NUMBER(9) SQL> describe dbms_aw$_dimension_sources_t dbms_aw$_dimension_sources_t TABLE OF DBMS_AW$_DIMENSION_SOURCE_T
Take these steps to use the Sparsity Advisor:
Call SPARSITY_ADVICE_TABLE
to create a table for storing the evaluation of the Sparsity Advisor.
Call ADD_DIMENSION_SOURCE
for each dimension related by one or more columns to the fact table being evaluated.
The information that you provide about these dimensions is stored in a DBMS_AW$_DIMENSION_SOURCES_T
variable.
Call ADVISE_SPARSITY
to evaluate the fact table.
Its recommendations are stored in the table created by SPARSITY_ADVICE_TABLE
. You can use these recommendations to make your own judgements about defining variables in your analytic workspace, or you can continue with the following step.
Call the ADVISE_DIMENSIONALITY
procedure to get the OLAP DML object definitions for the recommended composite, partitioning, and variable definitions.
or
Use the ADVISE_DIMENSIONALITY
function to get the OLAP DML object definition for the recommended composite and the dimension order for the variable definitions for a specific partition.
Example 24-1 provides a SQL script for evaluating the sparsity of the UNITS_HISTORY_FACT
table in the GLOBAL
schema. In the GLOBAL
analytic workspace, UNITS_HISTORY_FACT
defines the Units Cube and will be the source for the UNITS
variable. UNITS_HISTORY_FACT
is a fact table with a primary key composed of foreign keys from four dimension tables. A fifth column contains the facts for Unit Sales.
The CHANNEL_DIM
and CUSTOMER_DIM
tables contain all of the information for the Channel and Customer dimensions in a basic star configuration. Three tables in a snowflake configuration provide data for the Time dimension: MONTH_DIM
, QUARTER_DIM
, and YEAR_DIM
. The PRODUCT_CHILD_PARENT
table is a parent-child table and defines the Product dimension.
Example 24-1 Sparsity Advisor Script for GLOBAL
CONNECT global/global SET ECHO ON SET LINESIZE 300 SET PAGESIZE 300 SET SERVEROUT ON FORMAT WRAPPED -- Define and initialize an advice table named GLOBAL_SPARSITY_ADVICE BEGIN dbms_aw.sparsity_advice_table(); EXCEPTION WHEN OTHERS THEN NULL; END; / TRUNCATE TABLE aw_sparsity_advice; DECLARE dimsources dbms_aw$_dimension_sources_t; dimlist VARCHAR2(500); sparsedim VARCHAR2(500); defs CLOB; BEGIN -- Provide information about all dimensions in the cube dbms_aw.add_dimension_source('channel', 'channel_id', dimsources, 'channel_dim', dbms_aw.hier_levels, dbms_aw$_columnlist_t('channel_id', 'total_channel_id')); dbms_aw.add_dimension_source('product', 'item_id', dimsources, 'product_child_parent', dbms_aw.hier_parentchild, dbms_aw$_columnlist_t('product_id', 'parent_id')); dbms_aw.add_dimension_source('customer', 'ship_to_id', dimsources, 'customer_dim', dbms_aw.hier_levels, dbms_aw$_columnlist_t('ship_to_id', 'warehouse_id', 'region_id', 'total_customer_id')); dbms_aw.add_dimension_source('time', 'month_id', dimsources, 'SELECT m.month_id, q.quarter_id, y.year_id FROM time_month_dim m, time_quarter_dim q, time_year_dim y WHERE m.parent=q.quarter_id AND q.parent=y.year_id', dbms_aw.hier_levels, dbms_aw$_columnlist_t('month_id', 'quarter_id', 'year_id')); -- Analyze fact table and provide advice without partitioning dbms_aw.advise_sparsity('units_history_fact', 'units_cube', dimsources, dbms_aw.advice_default, dbms_aw.partby_none); commit; -- Generate OLAP DML for composite and variable definitions dimlist := dbms_aw.advise_dimensionality('units_cube', sparsedim, 'units_cube_composite'); dbms_output.put_line('Dimension list: ' || dimlist); dbms_output.put_line('Sparse dimension: ' || sparsedim); dbms_aw.advise_dimensionality(defs, 'units_cube'); dbms_output.put_line('Definitions: '); dbms_aw.printlog(defs); END; /
The script in Example 24-1 generates the following information.
Dimension list: <channel units_cube_composite<product customer time>> Sparse dimension: DEFINE units_cube_composite COMPOSITE <product customer time> Definitions: DEFINE units_cube.cp COMPOSITE <product customer time> DEFINE units_cube NUMBER VARIABLE <channel units_cube.cp<product customer time>> PL/SQL procedure successfully completed.
This SQL SELECT
statement shows some of the columns from the GLOBAL_SPARSITY_ADVICE
table, which is the basis for the recommended OLAP DML object definitions.
SELECT fact, dimension, dimcolumn, membercount nmem, leafcount nleaf, advice, density from aw_sparsity_advice WHERE cubename='units_cube'; FACT DIMENSION DIMCOLUMN NMEM NLEAF ADVICE DENSITY -------------------- ------------ ------------ ----- ------ ------------ -------- units_history_fact channel channel_id 3 3 DENSE .46182 units_history_fact product item_id 48 36 SPARSE .94827 units_history_fact customer ship_to_id 61 61 SPARSE .97031 units_history_fact time month_id 96 79 SPARSE .97664