What's New in Oracle Text?

This chapter describes new features of Oracle Text (formerly Oracle8i interMedia Text) and provides pointers to additional information. The following topics are covered:
- Release 9.2 New Features in Oracle Text
- Release 9.0.1 New Features in Oracle Text

Release 9.2 New Features in Oracle Text

The following features are new for this release:

Document Classification

The new CTX_CLS.TRAIN procedure enables you to generate rules for routing documents to different categories.

See Also:
TRAIN in Chapter 6, "CTX_CLS Package"
User-Defined Lexer

The user-defined lexer enables you to create lexing solutions for indexing and querying languages that Oracle Text does not otherwise support, such as Arabic.

See Also:
BASIC_LEXER in Chapter 2, "Indexing"
Query Templating

CONTAINS and CATSEARCH are no longer limited to their respective CONTEXT and CTXCAT grammars. Query templating enables you to use the CONTEXT grammar and its operators in CATSEARCH queries, and vice versa.

See Also:
CATSEARCH in Chapter 1, "SQL Statements and Operators"
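As an illustration, the following sketch runs a CATSEARCH query that switches to the CONTEXT grammar through a query template, so that a CONTEXT operator such as NEAR can be used. The table AUCTION, its CTXCAT-indexed column TITLE, and the search terms are hypothetical, and the template element and attribute names (query, textquery, grammar) reflect our reading of the 9.2 syntax; verify them against the CATSEARCH reference.

```sql
-- Hypothetical table AUCTION with a CTXCAT index on TITLE.
-- The grammar attribute asks CATSEARCH to parse the query text
-- with the CONTEXT grammar, which makes NEAR available.
SELECT title
  FROM auction
 WHERE CATSEARCH(title,
         '<query>
            <textquery grammar="CONTEXT">
              NEAR((digital, camera), 5)
            </textquery>
          </query>',
         NULL) > 0;
```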
CREATE INDEX ONLINE Support

Using CREATE INDEX with the ONLINE option, you can create a CONTEXT index while still allowing inserts, updates, and deletes on the base table.

See Also:
CREATE INDEX in Chapter 1, "SQL Statements and Operators"
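A minimal sketch of an online index build follows. The table DOCS, its column TEXT, and the index name are hypothetical; the placement of ONLINE directly after the INDEXTYPE clause is our understanding of the 9.2 syntax and should be checked against the CREATE INDEX reference.

```sql
-- Hypothetical table DOCS with a text column TEXT.
-- ONLINE lets DML on DOCS continue while the CONTEXT index is built.
CREATE INDEX docs_text_idx ON docs(text)
  INDEXTYPE IS ctxsys.context ONLINE;
```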
Parallel Indexing Enhancements

Parallel indexing is now supported for non-partitioned tables. You can use parallelism with CREATE INDEX, and with ALTER INDEX when you specify the REPLACE, RESUME, or SYNC parameter. You can also run CTX_DDL.SYNC_INDEX and CTX_DDL.OPTIMIZE_INDEX with a parallel degree.

See Also:

CREATE INDEX in Chapter 1, "SQL Statements and Operators"

SYNC_INDEX in Chapter 7, "CTX_DDL Package"
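The sketch below builds a CONTEXT index on a non-partitioned table in parallel and then synchronizes and optimizes it with a parallel degree. The table, index name, and degree of 4 are hypothetical. Named notation is used because the parameter names (idx_name, optlevel, parallel_degree) are recalled from the 9.2 CTX_DDL documentation rather than taken from this chapter; confirm them in Chapter 7 before use.

```sql
-- Hypothetical non-partitioned table DOCS.
-- Build the CONTEXT index with a parallel degree of 4.
CREATE INDEX docs_text_idx ON docs(text)
  INDEXTYPE IS ctxsys.context
  PARALLEL 4;

-- Later: synchronize pending DML, then fully optimize,
-- each with a parallel degree of 4 (parameter names assumed).
BEGIN
  CTX_DDL.SYNC_INDEX(idx_name => 'docs_text_idx', parallel_degree => 4);
  CTX_DDL.OPTIMIZE_INDEX(idx_name        => 'docs_text_idx',
                         optlevel        => 'FULL',
                         parallel_degree => 4);
END;
/
```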
Stem Indexing

Stem indexing improves the performance of stem ($) queries by indexing the stem of each word in addition to the word itself.

See Also:
BASIC_LEXER in Chapter 2, "Indexing"
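For illustration, stem indexing is enabled through a BASIC_LEXER preference. The preference name MY_LEXER and the table DOCS are hypothetical, and the INDEX_STEMS attribute with the value ENGLISH is recalled from the 9.2 BASIC_LEXER documentation; check the attribute list in Chapter 2.

```sql
-- Hypothetical lexer preference with English stem indexing.
BEGIN
  CTX_DDL.CREATE_PREFERENCE('my_lexer', 'BASIC_LEXER');
  CTX_DDL.SET_ATTRIBUTE('my_lexer', 'INDEX_STEMS', 'ENGLISH');
END;
/

CREATE INDEX docs_text_idx ON docs(text)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('lexer my_lexer');

-- Stem query: finds words that share the stem of "run".
SELECT id FROM docs WHERE CONTAINS(text, '$run') > 0;
```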
Chinese Lexer

The new CHINESE_LEXER enables you to index traditional and simplified Chinese text more efficiently.

See Also:
CHINESE_LEXER in Chapter 2, "Indexing"
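A minimal sketch of using the new lexer follows; the preference name CHINESE_LEX and the table ZH_DOCS are hypothetical.

```sql
-- Hypothetical lexer preference based on CHINESE_LEXER.
BEGIN
  CTX_DDL.CREATE_PREFERENCE('chinese_lex', 'CHINESE_LEXER');
END;
/

-- Hypothetical table ZH_DOCS holding Chinese text in column TEXT.
CREATE INDEX zh_docs_idx ON zh_docs(text)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('lexer chinese_lex');
```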
URIType Indexing

You can create CONTEXT indexes on URIType columns.

See Also:
CREATE INDEX in Chapter 1, "SQL Statements and Operators"
CTXXPATH

The CTXXPATH indextype enables you to speed up ExistsNode() queries on XMLType columns.

See Also:
Syntax for CTXXPATH Indextype in Chapter 1, "SQL Statements and Operators"

Oracle9i XML Database Developer's Guide - Oracle XML DB
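As a sketch, the following creates a CTXXPATH index on an XMLType column and runs an existsNode() query that the index can speed up. The table XML_TAB, its XMLType column DOC, and the XPath expression are hypothetical.

```sql
-- Hypothetical table XML_TAB with an XMLType column DOC.
CREATE INDEX xml_tab_xpath_idx ON xml_tab(doc)
  INDEXTYPE IS ctxsys.ctxxpath;

-- An existsNode() query that can use the CTXXPATH index.
SELECT COUNT(*)
  FROM xml_tab x
 WHERE existsNode(x.doc, '/A/B[C="dog"]') = 1;
```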
ORA:CONTAINS Support in ExistsNode()

You can call the ora:contains XPath function within an existsNode() query without creating a Text index.

See Also:

Oracle9i XML Database Developer's Guide - Oracle XML DB

CREATE_POLICY in Chapter 7, "CTX_DDL Package".
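The following sketch calls ora:contains inside an existsNode() XPath expression. The table XML_TAB and its XMLType column DOC are hypothetical. The namespace URI bound to the ora prefix, and passing it as the third argument of existsNode(), are recalled from the Oracle XML DB documentation; verify both in the XML DB Developer's Guide.

```sql
-- Hypothetical table XML_TAB with an XMLType column DOC.
-- No Text index on DOC is required for this query.
SELECT COUNT(*)
  FROM xml_tab x
 WHERE existsNode(x.doc,
         '/A/B/C[ora:contains(text(), "dog") > 0]',
         'xmlns:ora="http://xmlns.oracle.com/xdb"') = 1;
```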

Release 9.0.1 New Features in Oracle Text

The following sections outline the new features in this release.

Document Classification

A document classification application classifies an incoming stream of documents based on their content. Such applications are also known as document routing or filtering applications. For example, an online news agency might need to classify incoming articles into categories such as politics, crime, and sports as they arrive.

Oracle Text enables you to build such applications with the new CTXRULE index type. This index type indexes the rules (queries) that define the classification or routing criteria. As documents arrive, you can use the new MATCHES operator to categorize and route each one.

Note:
Oracle Text supports document classification only for plain text, XML, and HTML documents.

See Also:
CREATE INDEX and MATCHES in Chapter 1, "SQL Statements and Operators".

Oracle Text Application Developer's Guide for more information about document classification.
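A minimal sketch of the news-routing example above follows. The table NEWS_RULES, its columns, the category queries, and the bind variable :article_text are hypothetical; the point is that the CTXRULE index is built on the rule queries, and MATCHES tests one incoming document against all of them.

```sql
-- Hypothetical rule table: one query per category.
CREATE TABLE news_rules (
  rule_id  NUMBER PRIMARY KEY,
  category VARCHAR2(30),
  query    VARCHAR2(4000)
);

INSERT INTO news_rules VALUES (1, 'politics', 'election OR senate');
INSERT INTO news_rules VALUES (2, 'sports',   'soccer OR baseball');
INSERT INTO news_rules VALUES (3, 'crime',    'robbery OR arrest');
COMMIT;

-- Index the rules (queries), not the documents.
CREATE INDEX news_rules_idx ON news_rules(query)
  INDEXTYPE IS ctxsys.ctxrule;

-- Route an incoming article: return every category whose rule it matches.
SELECT category
  FROM news_rules
 WHERE MATCHES(query, :article_text) > 0;
```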
Local Partitioned Index Support

You can create local partitioned indexes on partitioned text tables. To do so, use CREATE INDEX with the LOCAL clause, optionally naming each index partition with a PARTITION subclause. You can also rebuild individual index partitions with ALTER INDEX.

See Also:
CREATE INDEX and ALTER INDEX in Chapter 1, "SQL Statements and Operators".
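The sketch below creates a local CONTEXT index on a range-partitioned table and then rebuilds one index partition. The table NEWS, its partitions, and the index partition names are hypothetical.

```sql
-- Hypothetical range-partitioned table NEWS.
CREATE TABLE news (
  art_id   NUMBER,
  pub_date DATE,
  article  CLOB
)
PARTITION BY RANGE (pub_date) (
  PARTITION p2000 VALUES LESS THAN (TO_DATE('2001-01-01', 'YYYY-MM-DD')),
  PARTITION p2001 VALUES LESS THAN (TO_DATE('2002-01-01', 'YYYY-MM-DD')),
  PARTITION pmax  VALUES LESS THAN (MAXVALUE)
);

-- One index partition per table partition, listed in the same order.
CREATE INDEX news_idx ON news(article)
  INDEXTYPE IS ctxsys.context
  LOCAL (PARTITION p2000_ix, PARTITION p2001_ix, PARTITION pmax_ix);

-- Rebuild a single index partition.
ALTER INDEX news_idx REBUILD PARTITION p2000_ix;
```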
IGNORE Format Column Value

The format column in your text table lets you specify, for each row, whether the text column stores binary or text data.

A new format column value, IGNORE, is provided. When you issue CREATE INDEX and specify a format column, any row whose format column is set to IGNORE is skipped during indexing. This is useful for text columns that also contain data unsuited to text indexing, such as images or raw binary data.

See Also:
CREATE INDEX in Chapter 1, "SQL Statements and Operators".
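For illustration, the following names a format column when creating the index; rows whose format column holds IGNORE are then skipped. The table MIXED_DOCS and its columns are hypothetical.

```sql
-- Hypothetical table: FMT holds TEXT, BINARY, or IGNORE for each row.
CREATE TABLE mixed_docs (
  id   NUMBER PRIMARY KEY,
  fmt  VARCHAR2(10),
  data BLOB
);

-- Rows whose FMT is 'IGNORE' (for example, images) are not indexed.
CREATE INDEX mixed_docs_idx ON mixed_docs(data)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('filter ctxsys.inso_filter format column fmt');
```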
USER_DATASTORE Enhancement

When you specify your user procedure for the USER_DATASTORE, the procedure can return permanent BLOB and CLOB locators in its IN OUT parameter.

See Also:
USER_DATASTORE in Chapter 2, "Indexing".
New Korean Lexer

In this release, Oracle Text continues to support the indexing and querying of Korean text with a new Korean lexer, KOREAN_MORPH_LEXER. The KOREAN_MORPH_LEXER lexer offers the following benefits over the KOREAN_LEXER:
- better morphological analysis of Korean text
- faster indexing
- smaller indexes
- more accurate query searching
  
See Also:
KOREAN_MORPH_LEXER in Chapter 2, "Indexing".
New Japanese Lexer

In this release, Oracle Text continues to support the indexing and querying of Japanese text with a new Japanese lexer, JAPANESE_LEXER. This lexer offers the following benefits over the JAPANESE_VGRAM_LEXER:
- generates a smaller index
- better query response time
- generates real word tokens resulting in better query precision
  
See Also:
JAPANESE_LEXER in Chapter 2, "Indexing".
XMLType Indexing

Oracle Text supports the indexing of text columns of type XMLType.

Note:
XMLType indexing is supported only for the CONTEXT index type.

See Also:
Oracle Text Application Developer's Guide for more information about XMLType indexing.
All Language Stopwords

In a MULTI_STOPLIST stoplist, you can now add stopwords that are stopped in every language by specifying ALL as the stopword's language. For example, you can use ALL stopwords when you index international documents that contain English fragments.

See Also:
ADD_STOPWORD in Chapter 7, "CTX_DDL Package".
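A minimal sketch follows; the stoplist name INTL_STOPLIST and the stopwords are hypothetical. The third argument of ADD_STOPWORD is the stopword's language, and ALL marks a word that is stopped in every language.

```sql
BEGIN
  CTX_DDL.CREATE_STOPLIST('intl_stoplist', 'MULTI_STOPLIST');
  -- Stopped in documents of every language.
  CTX_DDL.ADD_STOPWORD('intl_stoplist', 'the', 'ALL');
  -- Stopped only in German documents.
  CTX_DDL.ADD_STOPWORD('intl_stoplist', 'der', 'GERMAN');
END;
/
```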
UTF-16 Auto-detection

Oracle Text can convert UTF-16 documents to the database character set with the charset and Inso filters. These filters can convert documents that are UTF-16 big-endian (AL16UTF16) or little-endian (AL16UTF16LE).

Oracle Text also detects the byte order (endianness) automatically when the charset column or the charset filter is set to UTF16AUTO.

See Also:
CHARSET_FILTER in Chapter 2, "Indexing".
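As a sketch, the following defines a charset filter that auto-detects UTF-16 byte order and uses it in an index. The preference name UTF16_FILTER and the table UTF16_DOCS are hypothetical; the CHARSET attribute name is recalled from the CHARSET_FILTER documentation in Chapter 2.

```sql
BEGIN
  CTX_DDL.CREATE_PREFERENCE('utf16_filter', 'CHARSET_FILTER');
  -- Detect big-endian or little-endian UTF-16 for each document.
  CTX_DDL.SET_ATTRIBUTE('utf16_filter', 'CHARSET', 'UTF16AUTO');
END;
/

-- Hypothetical table UTF16_DOCS with UTF-16 documents in column DATA.
CREATE INDEX utf16_docs_idx ON utf16_docs(data)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('filter utf16_filter');
```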
INSO_FILTER Timeout Attribute

The INSO_FILTER document filter has a new TIMEOUT attribute that lets you specify the maximum time Oracle waits for a document to be filtered during indexing. Use this attribute to keep an index operation from hanging on a document that cannot be filtered.

See Also:
INSO_FILTER in Chapter 2, "Indexing".
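A minimal sketch of setting the timeout follows. The preference name MY_INSO and the value of 60 are hypothetical, and the value is assumed to be in seconds; check the INSO_FILTER attribute description in Chapter 2.

```sql
BEGIN
  CTX_DDL.CREATE_PREFERENCE('my_inso', 'INSO_FILTER');
  -- Assumed to be seconds; 60 is an illustrative value.
  CTX_DDL.SET_ATTRIBUTE('my_inso', 'TIMEOUT', '60');
END;
/
```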

XML Path Searching

XML documents can have parent-child tag structures such as the following:

<A><B><C>dog</C></B></A>

In this example, tag C is a child of tag B, which is a child of tag A.

Oracle Text now enables you to do path searching with the new PATH_SECTION_GROUP. This section group lets you specify direct parentage in queries; for example, you can find all documents that contain the term dog in element C, where C is a child of element B, which is in turn a child of element A.

The new section group also allows you to do tag attribute value searching and attribute equality testing.

The new operators associated with this feature are:
- INPATH
- HASPATH

See Also:
Oracle Text Application Developer's Guide for more information about path section searching with XML documents.
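The following sketch indexes XML text with the predefined path section group and queries the <A><B><C>dog</C></B></A> structure described above with INPATH and HASPATH. The table XML_DOCS and its columns are hypothetical.

```sql
-- Hypothetical table XML_DOCS holding XML text in column DOC.
CREATE INDEX xml_docs_idx ON xml_docs(doc)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('section group ctxsys.path_section_group');

-- Documents containing "dog" in element C, child of B, child of top-level A.
SELECT id FROM xml_docs WHERE CONTAINS(doc, 'dog INPATH(/A/B/C)') > 0;

-- Documents that contain the path /A/B/C at all.
SELECT id FROM xml_docs WHERE CONTAINS(doc, 'HASPATH(/A/B/C)') > 0;
```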

CTX_DDL Updated Procedures

The following procedures in the CTX_DDL PL/SQL package have been updated:
- CTX_DDL.SYNC_INDEX
  
  This procedure has two new parameters for specifying memory size and partition name.
- CTX_DDL.SET_ATTRIBUTE
  
  This procedure accepts ON and OFF as boolean attribute values, in addition to TRUE, T, FALSE, F, YES, Y, NO, and N.

See Also:
Chapter 7, "CTX_DDL Package".
CTX_DOC New Procedure
The CTX_DOC package has the following new procedure:
- CTX_DOC.IFILTER
  
  Use this procedure when your USER_DATASTORE procedure needs to filter binary data to text before concatenating it.

See Also:
Chapter 8, "CTX_DOC Package".
CTX_OUTPUT New Procedures

The CTX_OUTPUT package has the following new procedures:
- CTX_OUTPUT.ADD_EVENT
- CTX_OUTPUT.REMOVE_EVENT
Use CTX_OUTPUT.ADD_EVENT to add rowid information to the index log file, which is useful for debugging an index operation. Use CTX_OUTPUT.REMOVE_EVENT to remove an event that you added.

See Also:
Chapter 9, "CTX_OUTPUT Package".
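As a sketch, the following logs the rowid of each row while an index is built. The log file name and the index are hypothetical, and the event constant CTX_OUTPUT.EVENT_INDEX_PRINT_ROWID is recalled from the CTX_OUTPUT documentation rather than taken from this chapter; confirm it in Chapter 9.

```sql
BEGIN
  CTX_OUTPUT.START_LOG('docs_idx.log');
  -- Write each row's rowid to the log as it is indexed (constant assumed).
  CTX_OUTPUT.ADD_EVENT(CTX_OUTPUT.EVENT_INDEX_PRINT_ROWID);
END;
/

-- Hypothetical index build, run in the same session as the log.
CREATE INDEX docs_text_idx ON docs(text) INDEXTYPE IS ctxsys.context;

BEGIN
  CTX_OUTPUT.REMOVE_EVENT(CTX_OUTPUT.EVENT_INDEX_PRINT_ROWID);
  CTX_OUTPUT.END_LOG;
END;
/
```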
New and Updated Views

The following views have been updated for this release:
- CTX_VERSION
- CTX_PENDING
- CTX_USER_PENDING
The CTX_VERSION view has a new column, VER_CODE, which contains the version number of the Oracle Text code linked into the Oracle shadow process. Use this column to detect and verify patch releases.

The following views are new. Use the first four to query information about sub-lexers in a multi-lexer preference:

See Also:

Appendix G, "Views".