7 Using the Java API

This chapter provides information to help you get started using the Oracle Data Mining Java API. It describes the general design of the API, and it explains how to use the API to perform major mining operations in your application.

See Also:

Oracle Data Mining Java API Reference (javadoc).
JDM 1.0 javadoc at http://www.oracle.com/technology/products/bi/odm

This chapter includes the following topics:

The Java Sample Applications
Setting up Your Development Environment
Connecting to the Data Mining Server
API Design Overview
Describing the Mining Data
Build Settings
Executing Mining Tasks
Building a Mining Model
Exploring Model Details
Testing a Model
Applying a Model for Scoring Data
Using a Cost Matrix
Using Prior Probabilities
Using Automated Prediction and Explain Tasks
Preparing the Data

7.1 The Java Sample Applications

The samples included in this chapter are taken from the Data Mining sample applications available on the Database companion CD. When you install the companion CD, the Data Mining sample applications are copied to the following directory.

$ORACLE_HOME/rdbms/demo     (on Unix)
or
(%ORACLE_HOME%\rdbms\demo     (on NT)

To obtain a listing of the sample applications , simply type the following on Unix:

ls $ORACLE_HOME/rdbms/demo/dm*

Use an equivalent command on other operating systems.

Table 7-1 lists the Java sample applications.

Table 7-1 The Java Sample Applications for Data Mining

Application	Description
`dmabdemo.java`	Creates an Adaptive Bayes Network model (classification).
`dmaidemo.java`	Creates an Attribute Importance model.
`dmardemo.java`	Creates an Association Rules model.
`dmtreedemo.java`	Creates a Decision Tree model (classification).
`dmkmdemo.java`	Creates a k_means model (clustering).
`dmnbdemo.java`	Creates a Naive Bayes model (classification).
`dmnmdemo.java`	Creates a Non_Negative Matrix Factorization model (feature extraction).
`dmocdemo.java`	Creates an O-Cluster model (clustering).
`dmsvcdemo.java`	Creates a Support Vector Machine model (classification).
`dmsvodemo.java`	Creates a Support Vector Machine model (one-class classification).
`dmsvrdemo.java`	Creates a Support Vector Machine model (regression).
`dmtxtnmfdemo.java`	Text mining using NMF feature extraction.
`dmtxtsvmdemo.java`	Text mining using SVM classification.
`dmxfdemo.java`	Transformations using the Java API.
`dmpademo.java`	Predictive Analytics using the Java API.
`dmapplydemo.java`	Apply classification model in different ways.
`dmexpimpdemo.java`	Native import/export of models.

See Also:

Oracle Data Mining Administrator's Guide for information about installing, running, and viewing the sample programs.

7.2 Setting up Your Development Environment

The ODM Java API requires Oracle Database 10g Release 2 (10.2)and J2SE 1.4.2.

To use the ODM Java API, include the following libraries in your CLASSPATH:

$ORACLE_HOME/rdbms/jlib/jdm.jar
$ORACLE_HOME/rdbms/jlib/ojdm_api.jar
$ORACLE_HOME/rdbms/jlib/xdb.jar
$ORACLE_HOME/jdbc/lib/ojdbc14.jar
$ORACLE_HOME/oc4j/j2ee/home/lib/connector.jar
$ORACLE_HOME/jlib/orai18n.jar 
$ORACLE_HOME/jlib/orai18n-mapping.jar 
$ORACLE_HOME/lib/xmlparserv2.jar

7.3 Connecting to the Data Mining Server

The first job of a data mining application is to connect to the Data Mining Server (DMS), which is the data mining engine and metadata repository within the Oracle Database.

Note:

The JDM API uses the general term DME (Data Mining Engine). In the ODM Java API, the term DME refers to the Oracle DMS.

The DMS connection is encapsulated in a Connection object, which provides the framework for a data mining application. The Connection object serves the following purposes:

Authenticates users
Supports retrieval and storage of named objects
Supports the execution of mining tasks
Provides version information for the JDM implementation and provider

The DMS Connection object is described in detail in "Features of a DMS Connection".

7.3.1 Connection Factory

A Connection is created from a ConnectionFactory, an interface provided by the JDM standard API. You can lookup a ConnectionFactory from the JNDI server, or you can create a ConnectionFactory using an OraConnectionFactory object.

7.3.1.1 Create a ConnectionFactory Using OraConnectionFactory

//Create OraConnectionFactory
javax.datamining.resource.ConnectionFactory connFactory = 
                 oracle.dmt.jdm.resource.OraConnectionFactory();

7.3.1.2 Create a ConnectionFactory From the JNDI Server

//Setup the initial context to connect to the JNDI server
Hashtable env = new Hashtable();
env.put( Context.INITIAL_CONTEXT_FACTORY,
"oracle.dmt.jdm.resource.OraConnectionFactory" );
env.put( Context.PROVIDER_URL, "http://myHost:myPort/myService" );
env.put( Context.SECURITY_PRINCIPAL, "user" );
env.put( Context.SECURITY_CREDENTIALS, "password" );
InitialContext jndiContext = new javax.naming.InitialContext( env );
// Perform JNDI lookup to obtain the connection factory
javax.datamining.resource.ConnectionFactory dmeConnFactory =
(ConnectionFactory) jndiContext.lookup("java:comp/env/jdm/MyServer");
//Lookup ConnectionFactory
javax.datamining.resource.ConnectionFactory connFactory = 
  (ConnectionFactory) jndiContext.lookup("java:comp/env/jdm/MyServer");

7.3.2 Managing the DMS Connection

You can choose to pre-create the JDBC connection to the DMS, or you can manage it through the ODM Java API. If you pre-create the JDBC connection, your data mining application can access the connection caching features of JDBC. When the ODM Java API manages the JDBC connection, caching is not available to your application.

See Also:

Oracle Database JDBC Developer's Guide and Reference for information about connection caching.

7.3.2.1 Pre-Create the JDBC Connection

To pre-create the JDBC connection, create an OracleDataSource for an OraConnectionFactory.

//Create an OracleDataSource
OracleDataSource ods = new OracleDataSource();
ods.setURL(URL);
ods.setUser(user);
ods.setPassword(password);
 
//Create a connection factory using the OracleDataSource
javax.datamining.resource.ConnectionFactory connFactory = 
  oracle.dmt.jdm.resource.OraConnectionFactory(ods);
//Create DME Connection
javax.datamining.resource.Connection dmeConn = 
    connFactory.getConnection();

7.3.2.2 Use a ConnectionSpec for the DMS Connection

To manage the JDBC connection within the ODM Java API, create an empty ConnectionSpec instance using the getConnectionSpec() method of OraConnectionFactory.

//Create ConnectionSpec
ConnectionSpec connSpec = m_dmeConnFactory.getConnectionSpec();
connSpec.setURI("jdbc:oracle:thin:@host:port:sid");
connSpec.setName("user");
connSpec.setPassword("password");          
 
//Create DME Connection
javax.datamining.resource.Connection m_dmeConn = 
m_dmeConnFactory.getConnection(connSpec);

7.3.3 Features of a DMS Connection

In the ODM Java API, the DMS Connection is the primary factory object. The Connection instantiates the object factories using the getFactory method. The Connection object provides named object lookup, persistence, and task execution features.

7.3.3.1 Create Object Factories

The Connection.getFactory method creates a factory object. For example, to create a factory for the PhysicalDataSet object, pass the absolute name of the object to this method. The getFactory method creates an instance of PhysicalDataSetFactory.

javax.datamining.data.PhysicalDataSetFactory pdsFactory =
                dmeConn.getFactory("javax.datamining.data.PhysicalDataSet");

7.3.3.2 Provide Access to Mining Object Metadata

The Connection object provides methods for retrieving metadata about mining objects.

Method	Description
`getCreationDate`	Returns the creation date of the specified named object. getCreationDate(java.lang.String objectName, NamedObject objectType) returns java.util.Date
`getDescription`	Returns the description of the specified mining object. getDescription(java.lang.String objectName, NamedObject objectType) returns java.lang.String
`getObjectNames`	Returns a collection of the names of the objects of the specified type. getObjectNames(NamedObject objectType) returns java.util.Collection

You can obtain additional information about persistent mining objects by querying the Oracle data dictionary tables.

7.3.3.3 Save and Retrieve Mining Objects

The Connection object provides methods for retrieving mining objects and saving them in the DMS. Persistent objects are stored as database objects. Transient objects are stored in memory.

Method	Description
`saveObject`	Saves the named object in the metadata repository associated with the connection. saveObject(java.lang.String name, MiningObject object, boolean replace)
`retrieveObject`	Retrieves a copy of the specified named object from the metadata repository associated with the connection. retrieveObject(java.lang.String objectIdentifier) returns MiningObject
`retrieveObject`	Retrieves a copy of the object with the specified name and type from the metadata repository associated with the connection. retrieveObject(java.lang.String name, NamedObject objectType) returns MiningObject

See Also:

7.3.3.4 Execute Mining Tasks

The Connection object provides an execute method, which can execute mining tasks either asynchronously or synchronously. The DMS uses the database Scheduler to execute mining tasks, which are stored in the user's schema as Scheduler jobs.

Task Execution	execute method syntax
asynchronous	execute(java.lang.String taskName) returns ExecutionHandle
synchronous	execute(Task task,java.lang.Long timeout)) returns ExecutionHandle

Synchronous execution is typically used with single record scoring, but it may be used in other contexts as well.

See Also:

"Task Object"
"Executing Mining Tasks"
Oracle Database Administrator's Guide for information about the database Scheduler.

7.3.3.5 Retrieve DMS Capabilities and Metadata

The Connection object provides methods for obtaining information about the DMS at runtime.

Method	Description
`getMetaData`	Returns information about the underlying DMS instance represented through an active connection. `ConnectionMetaData` provides version information for the JDM implementation and Oracle Database. getMetaData() returns ConnectionMetaData
`getSupportedFunctions`	Returns an array of mining functions that are supported by the implementation. getSupportedFunctions() returns MiningFunction[]
`getSupportedAlgorithms`	Returns an array of mining algorithms that are supported by the specified mining function. getSupportedAlgorithms(MiningFunction function) returns MiningAlgorithm[]
`supportsCapability`	Returns true if the specified combination of mining capabilities is supported. If an algorithm is not specified, returns true if the specified function is supported. supportsCapability(MiningFunction function, MiningAlgorithm algorithm, MiningTask taskType) returns boolean

7.3.3.6 Retrieve Version Information

The Connection object provides methods for retrieving JDM standard version information and Oracle version information.

Method	Description
`getVersion`	Returns the version of the JDM Standard API. It must be "JDM 1.0" for the first release of JDM. getVersion() returns String
`getMajorVersion`	Returns the major version number. For the first release of JDM, this is "1". getMajorVersion() returns int
`getMinorVersion`	Returns the minor version number. For the first release of JDM, this is "0". getMinorVersion() returns int
`getProviderName`	Returns the provider name as "Oracle Corporation". getProviderName() returns String
`getProviderVersion`	Returns the version of the Oracle Database that shipped the Oracle Data Mining Java API jar file. getProviderVersion() returns String

7.4 API Design Overview

Object factories are central to the design of JDM. The ODM Java API uses object factories for instantiating mining objects.

javax.datamining is the base package for the JDM standard defined classes.

oracle.dmt.jdm is the base package for the Oracle extensions to the JDM standard.

The packages in the JDM standard API are organized by mining functions and algorithms. For example, the javax.datamining.supervised package contains all the classes that support supervised functions. It has subpackages for classification and regression classes.

javax.datamining.supervised.classification
javax.datamining.supervised.regression

Similarly, javax.datamining.algorithm is the base package for all algorithms. Each algorithm has its own subpackage. The JDM standard supports algorithms such as naive bayes and support vector machines.

javax.datamining.algorithm.naivebayes
javax.datamining.algorithm.svm

The ODM Java API follows a similar package structure for the extensions. For example, the ODM Java API supports Feature Extraction, a non-JDM standard function, and the Non-Negative Matrix Factorization algorithm that is used for feature extraction.

oracle.dmt.jdm.featureextraction
oracle.dmt.jdm.algorithm.nmf

The JDM standard has core packages that define common classes and packages for tasks, model details, rules and statistics. Figure 7-1 illustrates the inheritance hierarchy of the named objects.

Figure 7-1 JDM Named Objects Class Diagram

Description of "Figure 7-1 JDM Named Objects Class Diagram"

7.5 Describing the Mining Data

The JDM standard defines physical and logical data objects to describe the mining attribute characteristics of the data as well as statistical computations for describing the data.

In the ODM Java API, only physical data objects are supported. Data can be logically represented with database views. The DBMS_STATS package can be used for statistical computations.

The javax.datamining.data package contains all the data-related classes. The class diagram in Figure 7-2 illustrates the class relationships of the data objects supported by the ODM Java API.

Figure 7-2 Data Objects in Oracle Data Mining Java API

Description of "Figure 7-2 Data Objects in Oracle Data Mining Java API"

The following code illustrates the creation of a PhysicalDataSet object. It refers to the view DMUSER.MINING_DATA_BUILD_V and specifies the column cust_id as case-id using the PhysicalAttributeRole.

//Create PhysicalDataSetFactory
PhysicalDataSetFactory pdsFactory = 
     (PhysicalDataSetFactory)m_dmeConn.getFactory
     ("javax.datamining.data.PhysicalDataSet");
//Create a PhysicalDataSet object
PhysicalDataSet buildData = 
     pdsFactory.create("DMUSER.MINING_DATA_BUILD_V", false);
//Create PhysicalAttributeFactory 
PhysicalAttributeFactory paFactory =
     (PhysicalAttributeFactory)m_dmeConn.getFactory
     ("javax.datamining.data.PhysicalAttribute");
//Create PhysicalAttribute object
PhysicalAttribute pAttr = paFactory.create
     ("cust_id", AttributeDataType.integerType, PhysicalAttributeRole.caseId );
//Add the attribute to the PhysicalDataSet object
buildData.addAtribute(pAttr);
//Save the physical data set object
dmeConn.saveObject("JDM_BUILD_PDS", buildData, true);

7.6 Build Settings

In the ODM Java API, the BuildSettings object is saved as a table in the database. The settings table is compatible with the DBMS_DATA_MINING.CREATE_MODEL procedure. The name of the settings table must be unique in the user's schema. Figure 7-3 illustrates the build settings class hierarchy.

Figure 7-3 Build Settings Class Diagram.

Description of "Figure 7-3 Build Settings Class Diagram."

The following code illustrates the creation and storing of a classification settings object with a tree algorithm.

//Create a classification settings factory
ClassificationSettingsFactory clasFactory = 
(ClassificationSettingsFactory)dmeConn.getFactory
     ("javax.datamining.supervised.classification.ClassificationSettings");
//Create a ClassificationSettings object
ClassificationSettings clas = clasFactory.create();
//Set target attribute name
clas.setTargetAttributeName("AFFINITY_CARD");
//Create a TreeSettingsFactory
TreeSettingsFactory treeFactory =
(TreeSettingsFactory)dmeConn.getFactory
     ("javax.datamining.algorithm.tree.TreeSettings");
//Create TreeSettings instance
TreeSettings treeAlgo = treeFactory.create();
treeAlgo.setBuildHomogeneityMetric(TreeHomogeneityMetric.entropy);
treeAlgo.setMaxDepth(10);
treeAlgo.setMinNodeSize( 10, SizeUnit.count );
//Set algorithm settings in the classification settings
clas.setAlgorithmSettings(treeAlgo);
//Save the build settings object in the database
dmeConn.saveObject("JDM_TREE_CLAS", clas, true);

7.7 Executing Mining Tasks

The ODM Java API uses the DBMS_SCHEDULER infrastructure for executing mining tasks either synchronously or asynchronously in the database. A mining task is saved as a DBMS_SCHEDULER job in the user's schema. Its initial state is DISABLED. When the user calls the execute method in the DMS Connection, the job state is changed to ENABLED.

The class diagram in Figure 7-4 illustrates the different types of tasks that are available in the ODM Java API.

Figure 7-4 Task Class Diagram

Class diagram of different types of tasks.

Description of "Figure 7-4 Task Class Diagram"

DBMS_SCHEDULER provides additional scheduling and resource management features. You can extend the capabilities of ODM tasks by using the Scheduler infrastructure.

See Also:

Oracle Database Administrator's Guide for information about the database scheduler.

7.8 Building a Mining Model

The javax.datamining.task.BuildTask class is used to build a mining model. Prior to building a model, a PhysicalDataSet object and a BuildSettings object must be saved.

The following code illustrates the building of a tree model using the PhysicalDataSet described in "Describing the Mining Data" and the BuildSettings described in "Build Settings".

//Create BuildTaskFactory
BuildTaskFactory buildTaskFactory =
     dmeConn.getFactory("javax.datamining.task.BuildTask");
//Create BuildTask object
BuildTask buildTask = buildTaskFactory.create
     ( "JDM_BUILD_PDS","JDM_TREE_CLAS","JDM_TREE_MODEL"); 
//Save BuildTask object
dmeConn.saveObject("JDM_BUILD_TASK", buildTask, true);
//Execute build task asynchronously in the database
ExecutionHandle execHandle = dmeConn.execute("JDM_BUILD_TASK");
//Wait for completion of the task

7.9 Exploring Model Details

After building a model using the BuildTask, a model object is persisted in the database. It can be retrieved to explore the model details.

The class diagram in Figure 7-5 illustrates the different types of model objects and model details objects supported by the ODM Java API.

Figure 7-5 Model and Model Detail Class Diagram

Class diagram of Model objects and model details

Description of "Figure 7-5 Model and Model Detail Class Diagram"

The following code illustrates the retrieval of the classification tree model built in "Building a Mining Model" and its TreeModelDetail.

//Retrieve classification model from the DME
ClassificationModel treeModel = (ClassificationModel)dmeConn.retrieveObject
     ( "JDM_TREE_MODEL", NamedObject.model);
//Retrieve tree model detail from the model
TreeModelDetail treeDetail = (TreeModelDetail)treeModel.getModelDetail();
//Get the root node
TreeNode rootNode = treeDetail.getRootNode();
//Get child nodes
TreeNode[] childNodes = rootNode.getChildren();
//Get details of the first child node
int nodeId = childNodes[0].getIdentifier();
long caseCount = childNodes[0].getCaseCount();
Object prediction = childNodes[0].getPrediction();

7.10 Testing a Model

Once a supervised model has been built, it can be evaluated using a test operation. The JDM standard defines two types of test operations: one that takes the mining model as input, and the other that takes the apply output table with the actual and predicted value columns.

javax.datamining.supervised.TestTask is the base class for the model- based test tasks, and javax.datamining.supervised.TestMetricsTask is the base class for the apply output table-based test tasks.

The test operation creates and persists a test metrics object in the DMS. For classification model testing, either of the following can be used:

javax.datamining.supervised.classification.ClassificationTestTask
javax.datamining.supervised.classification.ClassificationTestMetricsTask

Both of these tasks create a named object:

javax.datamining.supervised.classification.ClassificationTestMetrics

The ClassificationTestMetrics named object is stored as a table in the user's schema. The name of the table is the name of the object. The confusion matrix, lift results, and ROC associated with the ClassificationTestMetrics object are stored in separate tables whose names are the ClassificationTestMetrics object name followed by the suffix _CFM, _LFT, or _ROC. Tools such as Oracle Discoverer can display the test results by querying these tables.

Similarly for regression model testing, either of the following can be used:

javax.datamining.supervised.regression.RegressionTestTask
javax.datamining.supervised.regression.RegressionTestMtericsTask

Both these tasks create a named object

javax.datamining.supervised.regression.RegressionTestMetrics

and store it as a table in the user schema.

The class diagram in Figure 7-6 illustrates the test metrics class hierarchy. It refers to "Build Settings" for the class hierarchy of test tasks.

Figure 7-6 Test Metrics Class Hierarchy

Description of "Figure 7-6 Test Metrics Class Hierarchy"

The following code illustrates the test of a tree model JDM_TREE_MODEL using the ClassificationTestTask on the dataset MINING_DATA_TEST_V.

//Create & save PhysicalDataSpecification      
PhysicalDataSet testData = m_pdsFactory.create(
        "MINING_DATA_TEST_V", false );
PhysicalAttribute pa = m_paFactory.create("cust_id", 
        AttributeDataType.integerType, PhysicalAttributeRole.caseId );
testData.addAttribute( pa );
m_dmeConn.saveObject( "JDM_TEST_PDS", testData, true );
//Create ClassificationTestTaskFactory
ClassificationTestTaskFactory testTaskFactory =  
  (ClassificationTestTaskFactory)dmeConn.getFactory(
     "javax.datamining.supervised.classification.ClassificationTestTask");
//Create, store & execute Test Task
ClassificationTestTask testTask = testTaskFactory.create( 
        "JDM_TEST_PDS", "JDM_TREE_MODEL", "JDM_TREE_TESTMETRICS" );
testTask.setNumberOfLiftQuantiles(10);
testTask.setPositiveTargetValue(new Integer(1));
//Save TestTask object
dmeConn.saveObject("JDM_TEST_TASK", testTask, true);
//Execute test task asynchronously in the database
ExecutionHandle execHandle = dmeConn.execute("JDM_TEST_TASK");
//Wait for completion of the task
ExecutionStatus execStatus = execHandle.waitForCompletion(Integer.MAX_VALUE);
//Explore the test metrics after successful completion of the task
if(ExecutionState.success.equals(execStatus.getState())) {
  //Retrieve the test metrics object
  ClassificationTestMetrics testMetrics =  
          (ClassificationTestMetrics)dmeConn.getObject("JDM_TREE_TESTMETRICS");
  //Retrieve confusion matrix and accuracy
  Double accuracy = testMetrics.getAccuracy();
  ConfusionMatrix cfm = testMetrics.getConfusionMatrix();
  //Retrieve lift 
  Lift lift = testMetrics.getLift();
  //Retrieve ROC
  ReceiverOperatingCharacterics roc = testMetrics.getROC();
}

In the preceding example, a test metrics object is stored as a table called JDM_TREE_TESTMETRICS. The confusion matrix is stored in the JDM_TREE_TESTMETRICS_CFM table, lift is stored in the JDB_TREE_TESTMETRICS_LFT table, and ROC is stored in the JDM_TREE_TESTMETRICS_ROC table. You can use BI tools like Oracle Discoverer to query these tables and create reports.

7.11 Applying a Model for Scoring Data

All supervised models can be applied to data to find the prediction. Some of the unsupervised models, such as clustering and feature extraction, support the apply operation to find the cluster id or feature id for new records.

The JDM standard API provides an ApplySettings object to specify the type of output for the scored results. javax.datamining.task.apply.ApplySettings is the base class for all apply settings. In the ODM Java API, the ApplySettings object is transient; it is stored in the Connection context, not in the database.

The class diagram in Figure 7-7 illustrates the class hierarchy of the apply settings available in the ODM Java API.

Figure 7-7 Apply Settings

Description of "Figure 7-7 Apply Settings"

In the ODM Java API, default apply settings produce the apply output table in fixed format. The list in Table 7-2 illustrates the default output formats for different functions.

Table 7-2 Default Output Formats for Different Functions

Mining Function
Classification without Cost	Case ID	Prediction	Probability
Classification with Cost	Case ID	Prediction	Probability	Cost
Regression	Case ID	Prediction
Clustering	Case ID	Cluster ID	Probability
Feature extraction	Case ID	Feature ID	Value

All types of apply settings support source and destination attribute mappings. For example, if the original apply table has customer name and age columns that need to be carried forward to the apply output table, it can be done by specifying the source destination mappings in the apply settings.

In the ODM Java API, classification apply settings support map by rank, top prediction, map by category, and map all predictions. Regression apply settings support map prediction value. Clustering apply settings support map by rank, map by cluster id, map top cluster, and map all clusters. Feature extraction apply settings support map by rank, map by feature id, map top feature, and map all features.

The following code illustrates the applying of a tree model JDM_TREE_MODEL using ClassificationApplyTask on the dataset MINING_DATA_APPLY_V.

//Create & save PhysicalDataSpecification      
PhysicalDataSet applyData = m_pdsFactory.create( "MINING_DATA_APPLY_V", false );
PhysicalAttribute pa = m_paFactory.create("cust_id", 
        AttributeDataType.integerType, PhysicalAttributeRole.caseId );
applyData.addAttribute( pa );
m_dmeConn.saveObject( "JDM_APPLY_PDS", applyData, true );
//Create ClassificationApplySettingsFactory
ClassificationApplySettingsFactory applySettingsFactory =  
  (ClassificationApplySettingsFactory)dmeConn.getFactory(
     "javax.datamining.supervised.classification. ClassificationApplySettings");
//Create & save ClassificationApplySettings
ClassificationApplySettings clasAS = applySettingsFactory.create();
m_dmeConn.saveObject( "JDM_APPLY_SETTINGS", clasAS, true);
//Create DataSetApplyTaskFactory
DataSetApplyTaskFactory applyTaskFactory =  
  (DataSetApplyTaskFactory)dmeConn.getFactory(
     "javax.datamining.task.apply.DataSetApplyTask");
//Create, store & execute apply Task
DataSetApplyTask applyTask = m_dsApplyFactory.create(
        " JDM_APPLY_PDS ", "JDM_TREE_MODEL", " JDM_APPLY_SETTINGS ", 
        "JDM_APPLY_OUTPUT_TABLE");
//Save ApplyTask object
dmeConn.saveObject("JDM_APPLY_TASK", applyTask, true);
//Execute test task asynchronously in the database
ExecutionHandle execHandle = dmeConn.execute("JDM_APPLY_TASK");
//Wait for completion of the task
ExecutionStatus execStatus = execHandle.waitForCompletion(Integer.MAX_VALUE);

7.12 Using a Cost Matrix

The class javax.datamining.supervised.classification.CostMatrix is used to represent the costs of the false positive and false negative predictions. It is used for classification problems to specify the costs associated with the false predictions.

In the ODM Java API, cost matrix is supported in apply and test operations for all classification models. For the decision tree algorithm, a cost matrix can be specified at build time. For more information about cost matrix, see Oracle Data Mining Concepts.

The following code illustrates how to create a cost matrix object where the target has two classes: YES (1) and NO (0). Suppose a positive (YES) response to the promotion generates $2 and the cost of the promotion is $1. Then the cost of misclassifying a positive responder is $2. The cost of misclassifying a non-responder is $1.

//Create category set factory & cost matrix factory
CategorySetFactory catSetFactory = (CategorySetFactory)m_dmeConn.getFactory(
      "javax.datamining.data.CategorySet" );
CostMatrixFactory costMatrixFactory = (CostMatrixFactory)m_dmeConn.getFactory(
      "javax.datamining.supervised.classification.CostMatrix");
//Create categorySet
CategorySet catSet = m_catSetFactory.create(AttributeDataType.integerType);
//Add category values
catSet.addCategory(new Integer(0), CategoryProperty.valid);
catSet.addCategory(new Integer(1), CategoryProperty.valid);
//create cost matrix
CostMatrix costMatrix = m_costMatrixFactory.create(catSet);
costMatrix.setValue(new Integer(0), new Integer(0), 0);
costMatrix.setValue(new Integer(1), new Integer(1), 0);
costMatrix.setValue(new Integer(0), new Integer(1), 2);
costMatrix.setValue(new Integer(1), new Integer(0), 1);
//Save cost matrix in the DME
dmeConn.saveObject("JDM_COST_MATRIX", costMatrix);

7.13 Using Prior Probabilities

Prior probabilities are used for classification problems if the actual data has a different distribution for target values than the data provided for the model build. A user can specify the prior probabilities in the classification function settings, using setPriorProbabilitiesMap. For more information about prior probabilities, see Oracle Data Mining Concepts.

Note:

Priors are not supported with decision trees.

The following code illustrates how to create a PriorProbabilities object, when the target has two classes: YES (1) and NO (0), and probability of YES is 0.05, probability of NO is 0.95.

//Set target prior probabilities
Map priorMap = new HashMap();
priorMap.put(new Double(0), new Double(0.7));
priorMap.put(new Double(1), new Double(0.3));
buildSettings.setPriorProbabilitiesMap("affinity_card", priorMap);

7.14 Using Automated Prediction and Explain Tasks

The ODM Java API provides oracle.dmt.jdm.task.OraPredictTask and oracle.dmt.jdm.task.OraExplainTask for generating predictions and explaining attribute importance. These tasks automate the predict and explain operations for data mining novice users.

OraPredictTask predicts the value of a target column based on cases where the target is not null. OraPredictTask uses known data values to automatically create a model and populate the unknown values in the target.

OraExplainTask identifies attribute columns that are important for explaining the variation of values in a given column. OraExplainTask analyzes the data and builds a model that identifies the important attributes and ranks their importance.

Both of these tasks do the automated data preparation where needed.

The following code illustrates OraPredictTask and OraExplainTask.

//Predict task
   //Create predict task factory and task object
   OraPredictTaskFactory predictFactory = 
    (OraPredictTaskFactory)m_dmeConn.getFactory(
      "oracle.dmt.jdm.task.OraPredictTask");
   OraPredictTask predictTask = m_predictFactory.create(
                     "MINING_DATA_BUILD_V", //Input table
                     "cust_id", //Case id column
                     "affinity_card", //target column
                     "JDM_PREDICTION_TABLE"); //prediction output table
   //Save predict task object
   dmeConn.saveObject("JDM_PREDICT_TASK", predictTask, true);
   //Execute test task asynchronously in the database
   ExecutionHandle execHandle = dmeConn.execute("JDM_PREDICT_TASK");
   //Wait for completion of the task
   ExecutionStatus execStatus = 
       execHandle.waitForCompletion(Integer.MAX_VALUE);                         
//Explain task
   //Create explain task factory and task object
   OraExplainTaskFactory explainFactory =
      (OraExplainTaskFactory)m_dmeConn.getFactory(
      "oracle.dmt.jdm.task.OraExplainTask");
   OraExplainTask explainTask = m_explainFactory.create(
                     "MINING_DATA_BUILD_V", //Input table
                     "affinity_card", //explain column
                     "JDM_EXPLAIN_TABLE"); //explain output table
   //Save predict task object
   dmeConn.saveObject("JDM_EXPLAIN_TASK", explainTask, true);
   //Execute test task asynchronously in the database
   ExecutionHandle execHandle = dmeConn.execute("JDM_ EXPLAIN_TASK");
   //Wait for completion of the task
   ExecutionStatus execStatus = execHandle.waitForCompletion(Integer.MAX_VALUE);

7.15 Preparing the Data

In the ODM Java API, data must be prepared before building, applying, or testing a model. The oracle.dmt.jdm.task.OraTransformationTask class supports common transformations used in data mining: binning, normalization, clipping, and text transformations. For more information about transformations, see Oracle Data Mining Concepts.

The class diagram in Figure 7-8 illustrates the OraTransformationTask and its relationship with other objects.

Figure 7-8 OraTransformationTask and its Relationship With Other Objects

Description of "Figure 7-8 OraTransformationTask and its Relationship With Other Objects"

7.15.1 Using Binning/Discretization Transformation

Binning is the process of grouping related values together, thus reducing the number of distinct values for an attribute. Having fewer distinct values typically leads to a more compact model and one that builds faster, but it can also lead to some loss in accuracy.

The class diagram in Figure 7-9 illustrates the binning transformation classes.

Figure 7-9 OraBinningTransformation Class Diagram

Description of "Figure 7-9 OraBinningTransformation Class Diagram"

Here, OraBinningTransformation contains all the settings required for binning. The ODM Java API supports top-n, custom binning for categorical attributes, and equi-width, quantile and custom binning for numerical attributes. After running the binning transformations, it creates a transformed table and bin boundary tables in the user's schema. The user can specify the bin boundary table names, or the system will generate the names for the bin boundary tables. This facilitates the reusing of the bin boundary tables that are created for binning build data for apply and test data.

The following code illustrates the binning operation on the view MINING_BUILD_DATA_V

//Create binning transformation instance
OraBinningTransformFactory binXformFactory = 
   (OraBinningTransformFactory)dmeConn.getFactory(
      "oracle.dmt.jdm.transform.binning.OraBinningTransform");
OraBinningTransform binTransform = m_binXformFactory.create(
      "MINING_DATA_BUILD_V", // name of the input data set
      "BINNED_DATA_BUILD_V", // name of the transformation result 
      true); // result of the transformation is a view  
// Specify the number of numeric bins
binTransform.setNumberOfBinsForNumerical(10);
// Specify the number of categoric bins
binTransform.setNumberOfBinsForCategorical(8);
// Specify the list of excluded attributes
String[] excludedList = new String[]{"CUST_ID", "CUST_GENDER"};
binTransform.setExcludeColumnList(excludedList);
// Specify the type of numeric binning: equal-width or quantile
       ( default is quantile )
binTransform.setNumericalBinningType(binningType);
// Specify the type of categorical binning as Top-N: by default it is none   
binTransform.setCategoricalBinningType(OraCategoricalBinningType.top_n);
//Create transformation task
OraTransformationTask xformTask = m_xformTaskFactory.create(binTransform);
//Save transformation task object
dmeConn.saveObject("JDM_BINNING_TASK", xformTask, true);
//Execute transformation task asynchronously in the database
ExecutionHandle execHandle = dmeConn.execute("JDM_ BINNING _TASK");
//Wait for completion of the task
ExecutionStatus execStatus = execHandle.waitForCompletion(Integer.MAX_VALUE);

7.15.2 Using Normalization Transformation

Normalizing converts individual attribute values in such a way that all attribute values lie in the same range. Normally, values are converted to be in the range 0.0 to 1.0 or the range -1 to +1. Normalization ensures that attributes do not receive artificial weighting caused by differences in the ranges that they span.

The class diagram in Figure 7-10 illustrates the normalization transformation classes.

Figure 7-10 OraNormalizeTransformation Class Diagram

Normalization transformation related classes.

Description of "Figure 7-10 OraNormalizeTransformation Class Diagram"

Here, OraNormalizeTransformation contains all the settings required for normalization. The ODM Java API supports z-Score, min-max, and linear scale normalizations. Normalization is required for SVM, NMF, and k-Means algorithms.

The following code illustrates normalization on the view MINING_BUILD_DATA_V.

//Create OraNormalizationFactory
OraNormalizeTransformFactory normalizeXformFactory = 
  (OraNormalizeTransformFactory)m_dmeConn.getFactory(
      "oracle.dmt.jdm.transform.normalize.OraNormalizeTransform");
//Create OraNormalization
OraNormalizeTransform normalizeTransform = m_normalizeXformFactory.create(
      "MINING_DATA_BUILD_V", // name of the input data set
      "NORMALIZED_DATA_BUILD_V", // name of the transformation result 
      true, // result of the transformation is a view
      OraNormalizeType.z_Score, //Normalize type
      new Integer(6) ); //Rounding number    
// Specify the list of excluded attributes
String[] excludedList = new String[]{"CUST_ID", "CUST_GENDER"};
normalizeTransform.setExcludeColumnList(excludedList);
//Create transformation task
OraTransformationTask xformTask = m_xformTaskFactory.create(normalizeTransform);
//Save transformation task object
dmeConn.saveObject("JDM_NORMALIZE_TASK", xformTask, true);
//Execute transformation task asynchronously in the database
ExecutionHandle execHandle = dmeConn.execute("JDM_NORMALIZE_TASK");
//Wait for completion of the task
ExecutionStatus execStatus = execHandle.waitForCompletion(Integer.MAX_VALUE);

7.15.3 Using Clipping Transformation

Some computations on attribute values can be significantly affected by extreme values. One approach to achieving a more robust computation is to either winsorize or trim the data using clipping transformations.

Winsorizing involves setting the tail values of a particular attribute to some specified value. For example, for a 90% winsorization, the bottom 5% are set equal to the minimum value in the 6th percentile, while the upper 5% are set equal to the value corresponding to the maximum value in the 95th percentile.

Trimming "removes" the tails in the sense that trimmed values are ignored in further values. This is achieved by setting the tails to NULL.

The class diagram in Figure 7-11 illustrates the clipping transformation classes.

Figure 7-11 OraClippingTransformation Class Diagram

Description of "Figure 7-11 OraClippingTransformation Class Diagram"

Here, OraClippingTransformation contains all the settings required for clipping. The ODM Java API supports winsorize and trim types of clipping.

The following code illustrates clipping on the view MINING_BUILD_DATA_V.

//Create OraClippingTransformFactory
OraClippingTransformFactory clipXformFactory = 
  (OraClippingTransformFactory)dmeConn.getFactory(
      "oracle.dmt.jdm.transform.clipping.OraClippingTransform");
//Create OraClippingTransform
OraClippingTransform clipTransform = clipXformFactory.create(
      "MINING_DATA_BUILD_V", // name of the input data set
      "WINSORISED_DATA_BUILD_V", // name of the transformation result 
      true );// result of the transformation is a view    
//Specify the list of excluded attributes
String[] excludedList = new String[]{"CUST_ID", "CUST_GENDER"};
clipTransform.setExcludeColumnList(excludedList);
//Specify the type of clipping
clipTransform.setClippingType(OraClippingType.winsorize);
// Specify the tail fraction as 3% of values on both ends
clipTransform.setTailFraction(0.03);
//Create and save transformation task
OraTransformationTask xformTask = xformTaskFactory.create(clipTransform);
//Save transformation task object
dmeConn.saveObject("JDM_CLIPPING_TASK", xformTask, true);
//Execute transformation task asynchronously in the database
ExecutionHandle execHandle = dmeConn.execute("JDM_CLIPPING_TASK");
//Wait for completion of the task
ExecutionStatus execStatus = execHandle.waitForCompletion(Integer.MAX_VALUE);

7.15.4 Using Text Transformation

Text columns need to be transformed to nested table structure to do the mining on text columns. This transformation converts the text columns to nested table columns. A features table is created by text transformation. A model build text data column features table must be used for apply and test tasks to get the correct results.

The class diagram in Figure 7-12 illustrates the text transformation classes.

Figure 7-12 Text Transformation Class Diagram

Description of "Figure 7-12 Text Transformation Class Diagram"

Here, OraTextTransformation is used to specify the text columns and the feature tables associated with the text columns.

The following code illustrates clipping on the table MINING_BUILD_TEXT.

//Create OraTextTransformFactory
OraTextTransformFactory textXformFactory = dmeConn.getFactory(
      "oracle.dmt.jdm.transform.text.OraTextTransform");
//Create OraTextTransform
OraTextTransform txtXform = (OraTextTransformImpl)textXformFactory.create(
      "MINING_BUILD_TEXT", // name of the input data set
      "NESTED_TABLE_BUILD_TEXT ", // name of the transformation result
      "CUST_ID", //Case id column
      new String[] { "COMMENTS" } ); //Text column names 
      );
//Create transformation task
OraTransformationTask xformTask = m_xformTaskFactory.create(txtXform);
//Save transformation task object
dmeConn.saveObject("JDM_TEXTXFORM_TASK", xformTask, true);
//Execute transformation task asynchronously in the database
ExecutionHandle execHandle = dmeConn.execute("JDM_TEXTXFORM_TASK");
//Wait for completion of the task
ExecutionStatus execStatus = execHandle.waitForCompletion
     (Integer.MAX_VALUE);