6 Java API Overview

This chapter introduces the new Oracle Data Mining Java API. You can use the Java API to create thin client applications that access the rich data mining functionality within the Oracle Database.

The Oracle Data Mining (ODM) Java API is an Oracle implementation of the Java Data Mining (JDM) 1.0 standard API for data mining. It implements Oracle-specific extensions to JDM 1.0, in compliance with the JSR-73 standards extension framework. The full range of data mining functions and algorithms available in the Database, including the new predictive analytics features in the DBMS_PREDICTIVE_ANALYTICS PL/SQL package, is exposed through the ODM Java API.

The ODM Java API replaces the proprietary Java API for data mining that was available with Oracle 10.1. It is fully compatible with the Oracle 10g Release 2 (10.2) PL/SQL API for data mining.

This chapter includes the following topics:

The JDM 1.0 Standard
Oracle Extensions to JDM 1.0
Principal Objects in the ODM Java API

6.1 The JDM 1.0 Standard

JDM 1.0 is an industry-standard Java API for data mining, developed under the Java Community Process (JCP). It defines Java interfaces that vendors can implement for their Data Mining Engines.

JDM interfaces support mining functions including classification, regression, clustering, attribute importance, and association, as well as specific mining algorithms including Naive Bayes, support vector machines, decision trees, and k-means.

For a complete description of the JDM 1.0 standard, visit the JSR-000073 Data Mining API page of the Java Community Process Web Site.

http://jcp.org/aboutJava/communityprocess/final/jsr073

You can download the JDM 1.0 javadoc from the Oracle Data Mining page of the Oracle Technology Network.

http://www.oracle.com/technology/products/bi/odm/index.html

The Java packages defined by the JDM standard are summarized in Table 6-1.

Table 6-1 JDM 1.0 Standard High-Level Packages

Package	Description
`javax.datamining`	Defines the classes and interfaces used in JDM subpackages.
`javax.datamining.base`	Defines the top-level objects and interfaces. This package was introduced to avoid cyclic package dependencies.
`javax.datamining.resource`	Defines objects that support connecting to the Data Mining Server and executing tasks.
`javax.datamining.data`	Defines objects that support logical and physical data, model signature, taxonomy, category set, and the generic superclass category matrix.
`javax.datamining.statistics`	Defines objects that support attribute statistics.
`javax.datamining.rule`	Defines objects that support rules and their predicate components.
`javax.datamining.task`	Defines objects that support tasks for building, computing statistics, importing, and exporting models. The `task` package has an optional `apply` subpackage, which is mainly used for supervised and clustering functions.
`javax.datamining.association`	Defines objects that support the build settings and model for association rules.
`javax.datamining.clustering`	Defines objects that support the build settings, models and apply output for clustering.
`javax.datamining.attributeimportance`	Defines objects that support the build settings and model for attribute importance.
`javax.datamining.supervised`	Defines objects that support the build settings and model for supervised learning functions. This package includes optional subpackages for classification and regression and a test task that is common to both.
`javax.datamining.algorithm`	Defines objects that support algorithm-specific settings. This package has optional subpackages for different algorithms.
`javax.datamining.modeldetail`	Defines objects that support the details of various model representations. This package includes optional subpackages for different types of models.

6.2 Oracle Extensions to JDM 1.0

The ODM Java API adds functionality that is not part of the JDM standard. The Oracle extensions to the JDM API provide the following major additional features:

Feature Extraction with the Non-Negative Matrix Factorization (NMF) algorithm
Orthogonal Partitioning Clustering (O-Cluster), an Oracle-proprietary clustering algorithm
Adaptive Bayes Network (ABN), an Oracle-proprietary classification algorithm
Transformations, including discretization (binning), normalization, clipping, and text transformations
Predictive analytics (OraPredictTask and OraExplainTask interfaces)
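
To make the transformation terms in the list above concrete, the following is a minimal sketch of one common variant of each: min-max normalization, equal-width binning, and clipping to fixed bounds. The class `TransformSketch` and its methods are hypothetical and written in plain Java. They are not part of the `oracle.dmt.jdm.transform` package, and the variants that ODM actually implements may differ.

```java
/** Hypothetical helpers that show what the named transformations do; not the ODM API. */
final class TransformSketch {
    /** Min-max normalization: rescales x from [min, max] to [0, 1]. */
    static double normalize(double x, double min, double max) {
        return (x - min) / (max - min);
    }

    /** Equal-width binning: maps x in [min, max] to a bin index in 0..bins-1. */
    static int bin(double x, double min, double max, int bins) {
        int index = (int) Math.floor((x - min) / (max - min) * bins);
        // Clamp so that x == max lands in the last bin and out-of-range values stay valid.
        return Math.max(0, Math.min(bins - 1, index));
    }

    /** Clipping: pulls values outside [lower, upper] back to the nearest bound. */
    static double clip(double x, double lower, double upper) {
        return Math.max(lower, Math.min(upper, x));
    }

    public static void main(String[] args) {
        // Illustrative values only: an attribute assumed to range from 20 to 60.
        System.out.println("normalized: " + normalize(30, 20, 60));
        System.out.println("bin index:  " + bin(30, 20, 60, 4));
        System.out.println("clipped:    " + clip(75, 20, 60));
    }
}
```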

See Also:

Oracle Data Mining Java API Reference (javadoc) for detailed information about the ODM Java API.

The Java packages defined by the Oracle extensions to the JDM standards are summarized in Table 6-2.

Table 6-2 Oracle High-Level Packages that Extend the JDM 1.0 Standards

Package	Description
`oracle.dmt.jdm.featureextraction`	Defines objects related to the feature extraction function, which supports the scoring operation.
`oracle.dmt.jdm.algorithm.nmf`	Defines objects related to the Non-Negative Matrix Factorization (NMF) algorithm.
`oracle.dmt.jdm.algorithm.ocluster`	Defines objects related to the Orthogonal Partitioning Clustering (O-Cluster) algorithm.
`oracle.dmt.jdm.algorithm.abn`	Defines objects related to the Adaptive Bayes Network (ABN) classification algorithm.
`oracle.dmt.jdm.transform`	Defines objects related to data transformations.

6.3 Principal Objects in the ODM Java API

In the JDM standard API, named objects are objects that can be saved using the saveObject method of a Connection instance. All named objects inherit from the javax.datamining.MiningObject interface.

The JDM standard supports both persistent and transient named objects. Persistent objects (persistentObject) are saved permanently in the database. Transient objects (transientObject) exist only for the duration of the session.

The persistent and transient named objects supported by the Oracle extensions to the JDM API are listed in Table 6-3.

Table 6-3 Named Objects in ODM Java API

Persistent Objects	Transient Objects
`Model`	`ApplySettings`
`BuildSettings`	`PhysicalDataSet`
`Task`
`CostMatrix`
`TestMetrics`

Note:

The LogicalData and Taxonomy objects in the standard JDM API are not supported by Oracle.

The named objects in the ODM Java API are described in the following sections.

6.3.1 PhysicalDataSet Object

A PhysicalDataSet object refers to the data to be used as input to a data mining operation. In JDM, PhysicalDataSet objects reference specific data through a Uniform Resource Identifier (URI), which can specify a table, a file, or some other data source.

In the ODM Java API, a PhysicalDataSet must reference a table or a view within the database instance referenced in the Connection. The syntax of a physical data set URI in the ODM Java API is the Oracle syntax for specifying a table or a view:

[SchemaName.]TableName

[SchemaName.]ViewName
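
As an illustration of this syntax, the following is a minimal sketch of how an application might split such a URI into its optional schema and its table or view name. The class `DatasetUri` is hypothetical and not part of the ODM Java API, the names `DMUSER` and `MINING_DATA_BUILD_V` are placeholders, and the sketch assumes simple unquoted identifiers.

```java
import java.util.Optional;

/** Hypothetical helper (not an ODM class): splits "[SchemaName.]TableName". */
final class DatasetUri {
    final Optional<String> schema; // empty when the URI has no schema prefix
    final String name;             // table or view name

    private DatasetUri(Optional<String> schema, String name) {
        this.schema = schema;
        this.name = name;
    }

    /** Assumes unquoted identifiers, so at most one dot separates schema and name. */
    static DatasetUri parse(String uri) {
        String trimmed = uri.trim();
        int dot = trimmed.indexOf('.');
        if (trimmed.isEmpty() || dot == 0 || dot == trimmed.length() - 1
                || dot != trimmed.lastIndexOf('.')) {
            throw new IllegalArgumentException("Expected [SchemaName.]TableName: " + uri);
        }
        return dot < 0
                ? new DatasetUri(Optional.empty(), trimmed)
                : new DatasetUri(Optional.of(trimmed.substring(0, dot)),
                                 trimmed.substring(dot + 1));
    }

    public static void main(String[] args) {
        DatasetUri qualified = parse("DMUSER.MINING_DATA_BUILD_V");
        System.out.println(qualified.schema.orElse("(current schema)") + " / " + qualified.name);

        DatasetUri bare = parse("MINING_DATA_BUILD_V");
        System.out.println(bare.schema.orElse("(current schema)") + " / " + bare.name);
    }
}
```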

In JDM, PhysicalDataSet objects can support multiple data representations. Oracle Data Mining supports two types of data representation: single-record case and wide data. The Oracle implementation requires users to specify the case-id column in the physical data set. Refer to Oracle Data Mining Concepts for more details.

In the ODM Java API, a PhysicalDataSet object is transient. It is stored in the Connection as an in-memory object.

See Also:

"Describing the Mining Data".

6.3.2 BuildSettings Object

A BuildSettings object captures the high-level specifications used to build a model. The ODM Java API supports a variety of mining functions: classification, regression, attribute importance, association, clustering, and feature extraction.

A BuildSettings object can specify a type of desired result without identifying a particular algorithm. If an algorithm is not specified in the BuildSettings object, the Data Mining Server (DMS) selects an algorithm based on the build settings and the characteristics of the data.

BuildSettings has a verify method, which validates the input specifications for a model. Input must satisfy the requirements of the ODM Java API.

In the ODM Java API, a BuildSettings object is persistent. It is stored as a table with a user-specified name in the user schema. This settings table is interoperable with the PL/SQL API for data mining. Normally, you should not modify the build settings table manually.

See Also:

"Build Settings" and "Model Settings".

6.3.3 Task Object

A Task object represents all the information needed to perform a mining operation. The execute method of the Connection object is used to start the execution of a mining task.

Mining operations, which often process input tables with millions of records, can be time consuming. For this reason, the JDM API supports the asynchronous execution of mining tasks.

Mining tasks are stored as DBMS_SCHEDULER job objects in the user schema. The saved job object is in a DISABLED state until the execute method causes it to start execution.

The execute method returns a javax.datamining.ExecutionHandle object, which provides methods for monitoring an asynchronous task. ExecutionHandle methods include waitForCompletion and getStatus.
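
The submit-then-monitor pattern described above can be sketched with standard Java concurrency classes. This is only an analogy under stated assumptions: `SketchHandle` is a hypothetical stand-in, not the javax.datamining.ExecutionHandle interface. Its method names merely echo the ones mentioned in the text, its signatures and status values are invented for illustration, and the simulated task runs in the local JVM rather than as a DBMS_SCHEDULER job.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Stand-in for an execution handle; NOT the javax.datamining.ExecutionHandle interface. */
final class SketchHandle {
    enum Status { EXECUTING, SUCCESS, ERROR }

    private final CompletableFuture<Void> job;

    private SketchHandle(CompletableFuture<Void> job) {
        this.job = job;
    }

    /** Starts the work on a background thread and returns immediately. */
    static SketchHandle execute(Runnable miningTask) {
        return new SketchHandle(CompletableFuture.runAsync(miningTask));
    }

    /** Non-blocking status check. */
    Status getStatus() {
        if (!job.isDone()) return Status.EXECUTING;
        return job.isCompletedExceptionally() ? Status.ERROR : Status.SUCCESS;
    }

    /** Blocks for at most timeoutSeconds, then reports the status at that moment. */
    Status waitForCompletion(int timeoutSeconds) {
        try {
            job.get(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        } catch (TimeoutException | ExecutionException e) {
            // Timed out or failed: report whatever the status is now.
        }
        return getStatus();
    }

    public static void main(String[] args) {
        SketchHandle handle = execute(() -> {
            try {
                Thread.sleep(200); // simulated long-running model build
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("after submit: " + handle.getStatus());
        System.out.println("after wait:   " + handle.waitForCompletion(5));
    }
}
```

The caller stays free to do other work between submitting the task and waiting for it, which is the main benefit of asynchronous execution for long-running builds.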

See Also:

"Executing Mining Tasks".
Oracle Database PL/SQL Packages and Types Reference for more information about DBMS_SCHEDULER.

6.3.4 Model Object

A Model object results from the application of an algorithm to data, as specified in a BuildSettings object.

Models can be used in several operations. They can be:

inspected, for example to examine the rules produced from a decision tree or association
tested for accuracy
applied to data for scoring
exported to an external representation such as native format or PMML
imported for use in the DMS

When a model is applied to data, it is submitted to the DMS for interpretation. A Model references its BuildSettings object as well as the Task that created it.

See Also:

"Exploring Model Details".

6.3.5 TestMetrics Object

A TestMetrics object results from the testing of a supervised model with test data. Different test metrics are computed, depending on the type of mining function. For classification models, the accuracy, confusion matrix, lift, and receiver operating characteristic (ROC) can be computed to assess the model. Similarly, for regression models, R-squared and root mean square (RMS) error can be computed.
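
To show what some of these metrics measure, the following is a minimal sketch that computes accuracy from a confusion matrix, and RMS error and R-squared from actual and predicted values, using their textbook definitions. The class `MetricsSketch` and the sample numbers are hypothetical. This is not the TestMetrics API, and this chapter does not state exactly how ODM computes these values.

```java
/** Hypothetical helper: computes the metrics by hand; not the ODM TestMetrics API. */
final class MetricsSketch {
    /** Accuracy = correctly classified cases (the diagonal) / all cases. */
    static double accuracy(long[][] confusion) {
        long correct = 0, total = 0;
        for (int i = 0; i < confusion.length; i++) {
            for (int j = 0; j < confusion[i].length; j++) {
                total += confusion[i][j];
                if (i == j) correct += confusion[i][j];
            }
        }
        return (double) correct / total;
    }

    /** Root mean square error between actual and predicted values. */
    static double rmsError(double[] actual, double[] predicted) {
        double sumSq = 0;
        for (int i = 0; i < actual.length; i++) {
            double err = actual[i] - predicted[i];
            sumSq += err * err;
        }
        return Math.sqrt(sumSq / actual.length);
    }

    /** R-squared = 1 - (residual sum of squares / total sum of squares). */
    static double rSquared(double[] actual, double[] predicted) {
        double mean = 0;
        for (double a : actual) mean += a;
        mean /= actual.length;
        double ssRes = 0, ssTot = 0;
        for (int i = 0; i < actual.length; i++) {
            ssRes += (actual[i] - predicted[i]) * (actual[i] - predicted[i]);
            ssTot += (actual[i] - mean) * (actual[i] - mean);
        }
        return 1.0 - ssRes / ssTot;
    }

    public static void main(String[] args) {
        // Illustrative two-class confusion matrix: 85 of 100 cases on the diagonal.
        long[][] confusion = { {50, 10}, {5, 35} };
        System.out.println("accuracy  = " + accuracy(confusion));

        double[] actual = {1, 2, 3, 4};
        double[] predicted = {1, 2, 3, 6};
        System.out.println("RMS error = " + rmsError(actual, predicted));
        System.out.println("R-squared = " + rSquared(actual, predicted));
    }
}
```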

See Also:

"Testing a Model".

6.3.6 ApplySettings Object

An ApplySettings object allows users to tailor the results of an apply task. It contains a set of ordered items. Output can consist of:

Data to be passed through to the output from the input dataset, for example key attributes
Values computed from the apply itself, for example score, probability, and in the case of decision trees, rule identifiers
Multi-class categories and their associated probabilities. For example, in a classification model with target favoriteColor, users could select specific colors to receive the probability that each of those colors is the favorite

Each mining function class defines a method to construct a default ApplySettings object. This reduces the programmer's effort when only standard output is desired. For example, typical output for a classification apply would include the top prediction and its probability.
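
The two kinds of classification output mentioned above, the top prediction with its probability and the probabilities of user-selected categories, can be illustrated with a small sketch. The class `ApplyOutputSketch`, its methods, and the probability values are hypothetical. The sketch operates on an in-memory map for a single case and does not use the ApplySettings API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of apply output content; not the ODM ApplySettings API. */
final class ApplyOutputSketch {
    /** Returns the category with the highest probability: the "top prediction". */
    static Map.Entry<String, Double> topPrediction(Map<String, Double> probabilities) {
        Map.Entry<String, Double> best = null;
        for (Map.Entry<String, Double> e : probabilities.entrySet()) {
            if (best == null || e.getValue() > best.getValue()) best = e;
        }
        if (best == null) throw new IllegalArgumentException("no categories");
        return best;
    }

    /** Keeps only the probabilities of the requested categories, in the requested order. */
    static Map<String, Double> probabilitiesFor(Map<String, Double> probabilities,
                                                List<String> categories) {
        Map<String, Double> selected = new LinkedHashMap<>();
        for (String c : categories) {
            if (probabilities.containsKey(c)) selected.put(c, probabilities.get(c));
        }
        return selected;
    }

    public static void main(String[] args) {
        // Made-up class probabilities for one scored case with target favoriteColor.
        Map<String, Double> favoriteColor = new LinkedHashMap<>();
        favoriteColor.put("red", 0.25);
        favoriteColor.put("green", 0.60);
        favoriteColor.put("blue", 0.15);

        Map.Entry<String, Double> top = topPrediction(favoriteColor);
        System.out.println("top prediction: " + top.getKey() + ", probability " + top.getValue());
        System.out.println("selected: " + probabilitiesFor(favoriteColor, Arrays.asList("blue", "red")));
    }
}
```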

See Also:

"Applying a Model for Scoring Data".