Oracle® Data Mining Application Developer's Guide, 10g Release 2 (10.2) Part Number B14340-01
This chapter introduces the new Oracle Data Mining Java API. You can use the Java API to create thin client applications that access the rich data mining functionality within the Oracle Database.
The ODM Java API is an Oracle implementation of the Java Data Mining (JDM) 1.0 standard API for data mining. The ODM Java API implements Oracle-specific extensions to JDM 1.0, in compliance with the JSR-73 standards extension framework. The full range of data mining functions and algorithms available in the Database, including the new predictive analytics features in the DBMS_PREDICTIVE_ANALYTICS PL/SQL package, is exposed through the ODM Java API.
The ODM Java API replaces the proprietary Java API for data mining that was available with Oracle 10.1. It is fully compatible with the Oracle 10g Release 2 (10.2) PL/SQL API for data mining.
This chapter includes the following topics:
JDM 1.0 is an industry standard Java API for data mining, developed under the Java Community Process (JCP). It defines Java interfaces that vendors can implement for their Data Mining Engines.
JDM interfaces support mining functions, including classification, regression, clustering, attribute importance, and association, and specific mining algorithms, including Naïve Bayes, support vector machines, decision trees, and k-means.
For a complete description of the JDM 1.0 standard, visit the JSR-000073 Data Mining API page of the Java Community Process Web Site.
http://jcp.org/aboutJava/communityprocess/final/jsr073
You can download the JDM 1.0 javadoc from the Oracle Data Mining page of the Oracle Technology Network.
http://www.oracle.com/technology/products/bi/odm/index.html
The Java packages defined by the JDM standard are summarized in Table 6-1.
Table 6-1 JDM 1.0 Standard High-Level Packages
| Package | Description |
|---|---|
| | Defines the classes and interfaces used in JDM subpackages. |
| | Defines the interfaces for top-level objects and interfaces. This package was introduced to avoid cyclic package dependencies. |
| | Defines objects that support connecting to the Data Mining Server and executing tasks. |
| | Defines objects that support logical and physical data, model signature, taxonomy, category set, and the generic super class category matrix. |
| | Defines objects that support attribute statistics. |
| | Defines objects that support rules and their predicate components. |
| | Defines objects that support tasks for building, computing statistics, importing, and exporting models. The |
| | Defines objects that support the build settings and model for association rules. |
| | Defines objects that support the build settings, models, and apply output for clustering. |
| | Defines objects that support the build settings and model for attribute importance. |
| | Defines objects that support the build settings and model for supervised learning functions. This package includes optional subpackages for classification and regression and a test task that is common to both. |
| | Defines objects that support algorithm-specific settings. This package has optional subpackages for different algorithms. |
| | Defines objects that support the details of various model representations. This package includes optional subpackages for different types of models. |
The ODM Java API adds functionality that is not part of the JDM standards. The Oracle extensions to the JDM API provide the following major additional features:
Feature Extraction with the Non-Negative Matrix Factorization (NMF) algorithm
Orthogonal Partitioning Clustering (O-Cluster), an Oracle-proprietary clustering algorithm
Adaptive Bayes Network (ABN), an Oracle-proprietary classification algorithm
Transformations, including discretization (binning), normalization, clipping, and text transformations.
Predictive analytics (OraPredictTask and OraExplainTask interfaces)
See Also: Oracle Data Mining Java API Reference (javadoc) for detailed information about the ODM Java API.
The Java packages defined by the Oracle extensions to the JDM standards are summarized in Table 6-2.
Table 6-2 Oracle High-Level Packages that Extend the JDM 1.0 Standards
| Package | Description |
|---|---|
| | Defines objects related to feature extraction, which supports the scoring operation. |
| | Defines objects related to the Non-Negative Matrix Factorization (NMF) algorithm. |
| | Defines objects related to the Orthogonal Partitioning Clustering (O-Cluster) algorithm. |
| | Defines objects related to the Adaptive Bayes Network (ABN) classification algorithm. |
| | Defines objects related to data transformations. |
In the JDM standard API, named objects are objects that can be saved using the saveObject method of a Connection instance. All named objects inherit from the javax.datamining.MiningObject interface.
The JDM standard supports both permanent and temporary named objects. Permanent objects (persistentObject) are saved permanently in the database. Temporary objects (transientObject) exist only for the duration of the session.
The persistent and transient named objects supported by the Oracle extensions to the JDM API are listed in Table 6-3.
Table 6-3 Named Objects in ODM Java API
Persistent Objects | Transient Objects |
---|---|
Note: The LogicalData and Taxonomy objects in the standard JDM API are not supported by Oracle.
The named objects in the ODM Java API are described in the following sections.
A PhysicalDataSet object refers to the data to be used as input to a data mining operation. In JDM, PhysicalDataSet objects reference specific data through a Uniform Resource Identifier (URI), which could specify a table, a file, or some other data source.
In the ODM Java API, a PhysicalDataSet must reference a table or a view within the database instance referenced in the Connection. The syntax of a physical data set URI in the ODM Java API is the Oracle syntax for specifying a table or a view:
[SchemaName.]TableName
or
[SchemaName.]ViewName
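Because the URI is just an optionally schema-qualified table or view name, a client can sanity-check it before creating a data set. The following is a minimal sketch, not part of the ODM Java API: the class DataSetUri and its methods are hypothetical, it accepts only unquoted Oracle identifiers (a letter followed by letters, digits, _, $, or #), and it does not handle quoted identifiers or database links. The schema and view names in main are illustrative.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of the ODM Java API) that splits a physical
 * data set URI of the form [SchemaName.]TableName or [SchemaName.]ViewName.
 */
final class DataSetUri {
    // Optional "schema." prefix, then the table or view name. Each part must be an
    // unquoted Oracle identifier: a letter followed by letters, digits, _, $, or #.
    private static final Pattern URI_PATTERN = Pattern.compile(
            "^(?:([A-Za-z][A-Za-z0-9_$#]*)\\.)?([A-Za-z][A-Za-z0-9_$#]*)$");

    private final Optional<String> schema;
    private final String objectName;

    private DataSetUri(Optional<String> schema, String objectName) {
        this.schema = schema;
        this.objectName = objectName;
    }

    /** Parses the URI, or throws IllegalArgumentException if it does not match. */
    static DataSetUri parse(String uri) {
        Matcher m = URI_PATTERN.matcher(uri.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Expected [SchemaName.]TableName, got: " + uri);
        }
        // group(1) is null when no schema prefix was given.
        return new DataSetUri(Optional.ofNullable(m.group(1)), m.group(2));
    }

    /** Schema name, if the URI was qualified. */
    Optional<String> schema() { return schema; }

    /** Table or view name. */
    String objectName() { return objectName; }

    public static void main(String[] args) {
        for (String uri : new String[] {"DMUSER.MINING_DATA_BUILD_V", "CUSTOMERS"}) {
            DataSetUri parsed = DataSetUri.parse(uri);
            // An unqualified name is left for the database to resolve.
            System.out.println(parsed.schema().orElse("(unqualified)") + " / " + parsed.objectName());
        }
    }
}
```

Rejecting malformed names on the client only saves a round trip; the database remains the authority on whether the table or view actually exists and is accessible.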
In JDM, PhysicalDataSet objects can support multiple data representations. Oracle Data Mining supports two types of data representation: single-record case and wide data. The Oracle implementation requires users to specify the case ID column in the physical data set. Refer to Oracle Data Mining Concepts for more details.
In the ODM Java API, a PhysicalDataSet object is transient. It is stored in the Connection as an in-memory object.
A BuildSettings object captures the high-level specifications used to build a model. The ODM Java API supports a variety of mining functions: classification, regression, attribute importance, association, clustering, and feature extraction.
A BuildSettings object can specify the desired type of result without identifying a particular algorithm. If an algorithm is not specified in the BuildSettings object, the Data Mining Server (DMS) selects an algorithm based on the build settings and the characteristics of the data.
BuildSettings has a verify method, which checks that the input specifications for a model satisfy the requirements of the ODM Java API.
In the ODM Java API, a BuildSettings object is persistent. It is stored as a table with a user-specified name in the user schema. This settings table is interoperable with the PL/SQL API for data mining. Normally, you should not modify the build settings table manually.
A Task object represents all the information needed to perform a mining operation. The execute method of the Connection object starts the execution of a mining task.
Mining operations, which often process input tables with millions of records, can be time-consuming. For this reason, the JDM API supports the asynchronous execution of mining tasks.
Mining tasks are stored as DBMS_SCHEDULER job objects in the user schema. The saved job object remains in a DISABLED state until the execute method starts it.
The execute method returns a javax.datamining.ExecutionHandle object, which provides methods for monitoring an asynchronous task. ExecutionHandle methods include waitForCompletion and getStatus.
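The asynchronous pattern described above (execute returns at once, and the caller then polls for status or blocks until completion) can be illustrated without a database. The sketch below is a hypothetical analogue built on java.util.concurrent.CompletableFuture. It is not the JDM ExecutionHandle: the class AsyncTaskDemo, its Handle class, the Status values, and the method signatures are assumptions chosen only to mirror the roles of getStatus and waitForCompletion.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/**
 * Hypothetical analogue of asynchronous task execution. Handle and Status are
 * illustrative stand-ins for an execution handle and its status; they are not
 * the JDM types.
 */
final class AsyncTaskDemo {
    enum Status { RUNNING, COMPLETED, FAILED }

    /** Monitors one submitted task; mirrors the roles of getStatus and waitForCompletion. */
    static final class Handle {
        private final CompletableFuture<String> future;

        Handle(CompletableFuture<String> future) { this.future = future; }

        /** Non-blocking status check. */
        Status getStatus() {
            if (!future.isDone()) return Status.RUNNING;
            return future.isCompletedExceptionally() ? Status.FAILED : Status.COMPLETED;
        }

        /** Blocks for up to timeoutSeconds, then reports the status at that moment. */
        Status waitForCompletion(long timeoutSeconds) {
            try {
                future.get(timeoutSeconds, TimeUnit.SECONDS);
            } catch (ExecutionException | CancellationException e) {
                // The task failed or was cancelled; getStatus() reports FAILED.
            } catch (TimeoutException e) {
                // Still running; the caller may poll or wait again.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
            }
            return getStatus();
        }
    }

    /** Submits the work and returns immediately, like an asynchronous execute call. */
    static Handle execute(ExecutorService pool, Supplier<String> work) {
        return new Handle(CompletableFuture.supplyAsync(work, pool));
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Handle handle = execute(pool, () -> {
                pause(200); // stands in for a long-running model build
                return "model built";
            });
            System.out.println("Just after execute: " + handle.getStatus());
            System.out.println("After waiting:      " + handle.waitForCompletion(10));
        } finally {
            pool.shutdown();
        }
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Returning a handle immediately lets a client start a long model build and continue with other work, polling or blocking only when it needs the result.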
A Model object results from the application of an algorithm to data, as specified in a BuildSettings object.
Models can be used in several operations. They can be:
inspected, for example to examine the rules produced from a decision tree or association
tested for accuracy
applied to data for scoring
exported to an external representation such as native format or PMML
imported for use in the DMS
When a model is applied to data, it is submitted to the DMS for interpretation. A Model references its BuildSettings object as well as the Task that created it.
A TestMetrics object results from testing a supervised model with test data. The test metrics that are computed depend on the type of mining function. For classification models, accuracy, the confusion matrix, lift, and the receiver operating characteristic (ROC) can be computed to assess the model. Similarly, for regression models, R-squared and root mean square (RMS) error can be computed.
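The metrics named above have standard textbook definitions: accuracy is the number of correctly classified cases divided by all cases (the diagonal of the confusion matrix over its total), RMS error is the square root of the mean squared residual, and R-squared is 1 - SS_res / SS_tot. As a hedged sketch, the class TestMetricsSketch and its methods below are hypothetical and are not the ODM TestMetrics API, which computes these values in the database and may differ in details such as case weighting. The confusion matrix is assumed to be indexed as confusion[actual][predicted].

```java
/**
 * Hypothetical helpers (not the ODM TestMetrics API) that spell out the
 * standard definitions of accuracy, RMS error, and R-squared.
 */
final class TestMetricsSketch {
    /** Accuracy = correctly classified cases / all cases; confusion[actual][predicted]. */
    static double accuracy(long[][] confusion) {
        long correct = 0;
        long total = 0;
        for (int actual = 0; actual < confusion.length; actual++) {
            for (int predicted = 0; predicted < confusion[actual].length; predicted++) {
                total += confusion[actual][predicted];
                if (actual == predicted) {
                    correct += confusion[actual][predicted]; // diagonal = correct predictions
                }
            }
        }
        if (total == 0) {
            throw new IllegalArgumentException("confusion matrix is empty");
        }
        return (double) correct / total;
    }

    /** RMS error = sqrt(mean of squared residuals). */
    static double rmsError(double[] actual, double[] predicted) {
        requireSameNonEmptyLength(actual, predicted);
        double sumSquares = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double residual = actual[i] - predicted[i];
            sumSquares += residual * residual;
        }
        return Math.sqrt(sumSquares / actual.length);
    }

    /** R-squared = 1 - SS_res / SS_tot, where SS_tot is measured around the mean of actual. */
    static double rSquared(double[] actual, double[] predicted) {
        requireSameNonEmptyLength(actual, predicted);
        double mean = 0.0;
        for (double y : actual) {
            mean += y;
        }
        mean /= actual.length;
        double ssRes = 0.0;
        double ssTot = 0.0;
        for (int i = 0; i < actual.length; i++) {
            ssRes += (actual[i] - predicted[i]) * (actual[i] - predicted[i]);
            ssTot += (actual[i] - mean) * (actual[i] - mean);
        }
        if (ssTot == 0.0) {
            throw new IllegalArgumentException("R-squared is undefined when all actual values are equal");
        }
        return 1.0 - ssRes / ssTot;
    }

    private static void requireSameNonEmptyLength(double[] actual, double[] predicted) {
        if (actual.length == 0 || actual.length != predicted.length) {
            throw new IllegalArgumentException("need equal-length, non-empty arrays");
        }
    }

    public static void main(String[] args) {
        long[][] confusion = {{50, 10}, {5, 35}}; // rows: actual class, columns: predicted class
        System.out.println("accuracy  = " + accuracy(confusion));         // 0.85
        double[] actual = {3.0, 5.0, 7.0};
        double[] predicted = {2.5, 5.0, 7.5};
        System.out.println("RMS error = " + rmsError(actual, predicted));
        System.out.println("R-squared = " + rSquared(actual, predicted)); // 0.9375
    }
}
```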
An ApplySettings object allows users to tailor the results of an apply task. It contains a set of ordered items. Output can consist of:
Data passed through from the input data set to the output, for example key attributes
Values computed by the apply itself, for example score, probability, and, in the case of decision trees, rule identifiers
Multi-class categories and their associated probabilities. For example, in a classification model with target favoriteColor, users could select specific colors to receive the probability that each color is the favorite
Each mining function class defines a method to construct a default ApplySettings object. This saves programming effort when only standard output is needed. For example, typical output for a classification apply includes the top prediction and its probability.
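To make the apply output concrete, here is a minimal sketch of what such output could contain for a single case: a pass-through key, the top prediction, and its probability, plus the probabilities of user-selected categories as in the favoriteColor example. The class DefaultApplySketch, its methods, the tie-breaking rule, and the case ID and class labels are all hypothetical; in the ODM Java API this output is described by ApplySettings and computed by the apply task in the database.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not the ODM ApplySettings API) of per-case apply output
 * for a classification model.
 */
final class DefaultApplySketch {
    /** One output row: pass-through key, top prediction, and its probability. */
    static final class ApplyRow {
        private final String caseId;
        private final String prediction;
        private final double probability;

        ApplyRow(String caseId, String prediction, double probability) {
            this.caseId = caseId;
            this.prediction = prediction;
            this.probability = probability;
        }

        String caseId() { return caseId; }
        String prediction() { return prediction; }
        double probability() { return probability; }

        @Override
        public String toString() {
            return caseId + ", " + prediction + ", " + probability;
        }
    }

    /** Default-style output: the most probable class (ties go to the alphabetically first label). */
    static ApplyRow defaultApply(String caseId, Map<String, Double> classProbabilities) {
        String best = null;
        double bestProb = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : classProbabilities.entrySet()) {
            double p = e.getValue();
            boolean higher = p > bestProb;
            boolean winsTie = best != null && p == bestProb && e.getKey().compareTo(best) < 0;
            if (higher || winsTie) {
                best = e.getKey();
                bestProb = p;
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("no class probabilities");
        }
        return new ApplyRow(caseId, best, bestProb);
    }

    /** Probabilities for user-selected categories, in the order requested. */
    static Map<String, Double> selectedProbabilities(Map<String, Double> classProbabilities,
                                                     List<String> categories) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (String category : categories) {
            Double p = classProbabilities.get(category);
            if (p == null) {
                throw new IllegalArgumentException("unknown category: " + category);
            }
            out.put(category, p);
        }
        return out;
    }

    public static void main(String[] args) {
        // Illustrative probabilities for a target such as favoriteColor.
        Map<String, Double> probs = Map.of("red", 0.2, "blue", 0.7, "green", 0.1);
        System.out.println(defaultApply("CUST_1001", probs));                     // CUST_1001, blue, 0.7
        System.out.println(selectedProbabilities(probs, List.of("red", "green"))); // {red=0.2, green=0.1}
    }
}
```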