Oracle Text Reference Release 9.2 Part Number A96518-01
This chapter describes new features of Oracle Text (formerly Oracle8i interMedia Text) and provides pointers to additional information.
The following features are new for this release:
The new CTX_CLS.TRAIN procedure enables you to generate rules for routing documents to different categories.
The user-defined lexer enables you to create lexing solutions for indexing and querying languages not supported by Oracle Text such as Arabic.
CONTAINS and CATSEARCH are no longer limited to their respective CONTEXT and CTXCAT grammars. Query templating enables you to use the CONTEXT grammar and associated operators in CATSEARCH queries and vice-versa.
You can create a CONTEXT index while allowing inserts, updates, and deletes to your base table.
Parallel indexing is now supported for non-partitioned tables. You can use parallelism with CREATE INDEX and ALTER INDEX with the parameters replace, resume, and sync. You can also run CTX_DDL.SYNC_INDEX and CTX_DDL.OPTIMIZE_INDEX with a parallel degree.
Stem indexing enables better performance for stem ($) queries by indexing the stem form in addition to the base form.
The new CHINESE_LEXER enables you to index traditional and simplified Chinese text more efficiently.
You can create CONTEXT indexes on URIType columns.
The CTXXPATH indextype enables you to speed up ExistsNode() queries on XMLType columns.
You can call the CONTAINS function within an ExistsNode() statement without a Text index.
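As a rough sketch of how two of the features listed above might be used, the statements below build a CONTEXT index in parallel, synchronize and optimize it with a parallel degree, and then run a CATSEARCH query that uses the CONTEXT grammar through a query template. The tables `docs` and `auction`, their columns, the index name, and the degree of 4 are assumptions made for illustration only.

```sql
-- Assumed table: docs(id NUMBER PRIMARY KEY, text CLOB), not partitioned.
-- Build the CONTEXT index with a parallel degree of 4.
CREATE INDEX docs_idx ON docs(text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARALLEL 4;

-- Synchronize pending changes, then run a full optimization,
-- both with a parallel degree of 4.
BEGIN
  CTX_DDL.SYNC_INDEX(idx_name => 'docs_idx', parallel_degree => 4);
  CTX_DDL.OPTIMIZE_INDEX(idx_name        => 'docs_idx',
                         optlevel        => 'FULL',
                         parallel_degree => 4);
END;
/

-- Assumed table: auction(item_id NUMBER, title VARCHAR2(100))
-- with a CTXCAT index on title.
-- The query template switches CATSEARCH to the CONTEXT grammar,
-- so the NEAR operator can be used.
SELECT item_id
  FROM auction
 WHERE CATSEARCH(title,
         '<query><textquery grammar="context">camera NEAR lens</textquery></query>',
         NULL) > 0;
```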
The following sections outline the new features in this release.
A document classification application is one that classifies an incoming stream of documents based on their content. These applications are also known as document routing or filtering applications. For example, an online news agency might need to classify its incoming stream of articles as they arrive into categories such as politics, crime, and sports.
Oracle Text enables you to build such applications with the new CTXRULE index type. This index type indexes the rules (queries) that define classifications or routing criteria. When documents arrive, the new MATCHES operator can be used to categorize and route each document.
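The news-routing scenario could be sketched as follows. The table `news_rules`, its columns, and the rule queries are hypothetical; only the CTXRULE index type and the MATCHES operator come from the text above.

```sql
-- Hypothetical rule table: one row per category, holding the query
-- that defines which documents belong to it.
CREATE TABLE news_rules (
  rule_id  NUMBER PRIMARY KEY,
  category VARCHAR2(30),
  query    VARCHAR2(2000)
);

INSERT INTO news_rules VALUES (1, 'politics', 'election OR parliament OR senate');
INSERT INTO news_rules VALUES (2, 'crime',    'robbery OR burglary OR arrest');
INSERT INTO news_rules VALUES (3, 'sports',   'tennis OR football OR soccer');
COMMIT;

-- Index the rules (not the documents) with the CTXRULE index type.
CREATE INDEX news_rules_idx ON news_rules(query)
  INDEXTYPE IS CTXSYS.CTXRULE;

-- Classify an incoming article: MATCHES returns the rules
-- whose query the document text satisfies.
SELECT category
  FROM news_rules
 WHERE MATCHES(query,
         'Police made an arrest after the robbery of a downtown bank.') > 0;
```

Because the index is built on the rule table, adding a new category is an ordinary insert into that table (followed by the usual index maintenance); incoming documents do not need to be stored to be classified.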
See Also:
CREATE INDEX and MATCHES statements in Chapter 1, "SQL Statements and Operators". Oracle Text Application Developer's Guide for more information about document classification.
You can create local partitioned indexes on partitioned text tables. To do so, use CREATE INDEX with the LOCAL PARTITION clause. You can also rebuild partitioned indexes with ALTER INDEX.
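A minimal sketch of a local partitioned index follows. The table `part_docs`, its partition bounds, and the partition and index names are assumptions chosen for illustration.

```sql
-- Hypothetical range-partitioned text table.
CREATE TABLE part_docs (
  id   NUMBER PRIMARY KEY,
  text VARCHAR2(4000)
)
PARTITION BY RANGE (id) (
  PARTITION p1 VALUES LESS THAN (1000),
  PARTITION p2 VALUES LESS THAN (MAXVALUE)
);

-- Create one index partition per table partition.
CREATE INDEX part_docs_idx ON part_docs(text)
  INDEXTYPE IS CTXSYS.CONTEXT
  LOCAL (PARTITION p1, PARTITION p2);

-- Rebuild a single index partition.
ALTER INDEX part_docs_idx REBUILD PARTITION p1;
```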
See Also:
CREATE INDEX and ALTER INDEX in Chapter 1, "SQL Statements and Operators".
The format column in your text table allows you to specify whether binary or text data is stored in the text column.
A new format column value of IGNORE is provided. When you issue the CREATE INDEX statement and specify a format column, any row whose format column is set to IGNORE is ignored during indexing. This feature is useful for indexing text columns that contain data incompatible with text indexing such as images or raw binary data.
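For illustration, the format column might be declared and used as sketched below. The table `mixed_docs`, the column name `fmt`, and the row identifier are hypothetical.

```sql
-- Hypothetical table: fmt holds TEXT, BINARY, or IGNORE for each row.
CREATE TABLE mixed_docs (
  id      NUMBER PRIMARY KEY,
  fmt     VARCHAR2(10),
  content BLOB
);

-- Mark a row that holds an image so that indexing skips it.
UPDATE mixed_docs SET fmt = 'IGNORE' WHERE id = 42;
COMMIT;

-- Name the format column when creating the index.
CREATE INDEX mixed_docs_idx ON mixed_docs(content)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('format column fmt');
```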
When you specify your user procedure for the USER_DATASTORE, you can return permanent BLOB and CLOB locators for your IN/OUT parameter.
In this release, Oracle Text continues to support the indexing and querying of Korean text with a new Korean lexer, KOREAN_MORPH_LEXER. The KOREAN_MORPH_LEXER lexer offers the following benefits over the KOREAN_LEXER:
In this release, Oracle Text continues to support the indexing and querying of Japanese text with a new Japanese lexer, JAPANESE_LEXER. This lexer offers the following benefits over the JAPANESE_VGRAM_LEXER:
Oracle Text supports the indexing of text columns of type XMLType.
See Also:
Oracle Text Application Developer's Guide for more information.
You can create a MULTI_STOPLIST
type stoplist that contains words that are to be stopped in more than one language. This new stopword type is called ALL. For example, you can use an ALL stopword when you need to index international documents that contain English fragments.
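A short sketch of such a stoplist follows. The stoplist name and the stopwords are assumptions; the language value ALL marks a stopword that applies to every language, while a specific language name restricts it to documents in that language.

```sql
BEGIN
  -- Hypothetical multi-language stoplist.
  CTX_DDL.CREATE_STOPLIST('intl_stoplist', 'MULTI_STOPLIST');

  -- Stopped in every language, for example English fragments
  -- embedded in international documents.
  CTX_DDL.ADD_STOPWORD('intl_stoplist', 'the', 'ALL');

  -- Stopped only in German documents.
  CTX_DDL.ADD_STOPWORD('intl_stoplist', 'die', 'german');
END;
/
```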
Oracle Text supports UTF-16 conversion to the database character set with the charset and Inso filters. These filters can convert documents that are UTF-16 big-endian (AL16UTF16) or little-endian (AL16UTF16LE).
Oracle Text also supports endian auto-detection when the character set column or charset filter is set to UTF16AUTO.
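With the charset filter, auto-detection might be configured as in the sketch below. The preference name `utf16_filter` and the table `utf16_docs` are hypothetical.

```sql
BEGIN
  -- Hypothetical charset filter preference with endian auto-detection.
  CTX_DDL.CREATE_PREFERENCE('utf16_filter', 'CHARSET_FILTER');
  CTX_DDL.SET_ATTRIBUTE('utf16_filter', 'charset', 'UTF16AUTO');
END;
/

-- Assumed table: utf16_docs(id NUMBER PRIMARY KEY, content BLOB).
CREATE INDEX utf16_docs_idx ON utf16_docs(content)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('filter utf16_filter');
```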
The INSO_FILTER document filter has a new timeout attribute that allows you to specify the maximum time Oracle waits for a document to be filtered during indexing. You can use this mechanism to avoid hanging during the index operation.
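A filter preference with a timeout might be set up as sketched here. The preference name and the value of 60 (taken to be seconds) are assumptions for illustration.

```sql
BEGIN
  -- Hypothetical filter preference: stop waiting on any single
  -- document after 60 seconds instead of blocking the index build.
  CTX_DDL.CREATE_PREFERENCE('inso_with_timeout', 'INSO_FILTER');
  CTX_DDL.SET_ATTRIBUTE('inso_with_timeout', 'timeout', '60');
END;
/
```

The preference would then be named in the index PARAMETERS string, for example `PARAMETERS ('filter inso_with_timeout')`.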
XML documents can have parent-child tag structures such as the following:
<A> <B> <C> dog </C> </B> </A>
In this example, tag C is a child of tag B which is a child of tag A.
Oracle Text now enables you to do path searching with the new PATH_SECTION_GROUP. This section group allows you to specify direct parentage in queries, such as to find all documents that contain the term dog in element C which is a child of element B and so on.
The new section group also allows you to do tag attribute value searching and attribute equality testing.
The new operators associated with this feature are:
INPATH
HASPATH
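Applied to the A/B/C example above, path searching could look like the following sketch. The section group name and the table `xml_docs` are hypothetical.

```sql
BEGIN
  -- Hypothetical section group of the new PATH_SECTION_GROUP type.
  CTX_DDL.CREATE_SECTION_GROUP('xml_path_group', 'PATH_SECTION_GROUP');
END;
/

-- Assumed table: xml_docs(id NUMBER PRIMARY KEY, text CLOB).
CREATE INDEX xml_docs_idx ON xml_docs(text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('section group xml_path_group');

-- Documents with the term dog in element C, where C is a child of B
-- and B is a child of A.
SELECT id FROM xml_docs
 WHERE CONTAINS(text, 'dog INPATH (A/B/C)') > 0;

-- Documents that contain the path A/B/C at all, whatever its content.
SELECT id FROM xml_docs
 WHERE CONTAINS(text, 'HASPATH (A/B/C)') > 0;
```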
See Also:
INPATH and HASPATH operators in Chapter 3, "CONTAINS Query Operators". Oracle Text Application Developer's Guide for more information about path section searching with XML documents.
The following procedures in the CTX_DDL PL/SQL package have been updated:
CTX_DDL.SYNC_INDEX
This procedure has two new parameters for specifying memory size and partition name.
CTX_DDL.SET_ATTRIBUTE
This procedure accepts ON and OFF as values for boolean attributes in addition to TRUE, T, FALSE, F, YES, Y, NO, and N.
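Both changes might be exercised as in the sketch below. The index name `part_docs_idx`, the partition name `p1`, the memory size, and the lexer preference name are hypothetical.

```sql
BEGIN
  -- Synchronize one partition of a hypothetical local index,
  -- using 2 megabytes of indexing memory.
  CTX_DDL.SYNC_INDEX(idx_name  => 'part_docs_idx',
                     memory    => '2M',
                     part_name => 'p1');

  -- Set a boolean lexer attribute with the ON spelling.
  CTX_DDL.CREATE_PREFERENCE('my_lexer', 'BASIC_LEXER');
  CTX_DDL.SET_ATTRIBUTE('my_lexer', 'mixed_case', 'ON');
END;
/
```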
CTX_DOC.IFILTER
Use this procedure when you need your USER_DATASTORE procedure to filter binary data to text before concatenation.
The CTX_OUTPUT package has the following new procedures:
CTX_OUTPUT.ADD_EVENT
CTX_OUTPUT.REMOVE_EVENT
Use the first procedure to augment the index log file with rowid information, which is useful for debugging an index operation.
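A debugging session could be sketched as follows. The log file name, the table `docs`, and the index name are assumptions; logging applies to the current session, so the index operation is run from the same session.

```sql
BEGIN
  -- Start logging and ask for rowids to be written to the log.
  CTX_OUTPUT.START_LOG('docs_idx.log');
  CTX_OUTPUT.ADD_EVENT(CTX_OUTPUT.EVENT_INDEX_PRINT_ROWID);
END;
/

-- Assumed table: docs(id NUMBER PRIMARY KEY, text CLOB).
CREATE INDEX docs_idx ON docs(text)
  INDEXTYPE IS CTXSYS.CONTEXT;

BEGIN
  -- Turn the event off again and close the log.
  CTX_OUTPUT.REMOVE_EVENT(CTX_OUTPUT.EVENT_INDEX_PRINT_ROWID);
  CTX_OUTPUT.END_LOG;
END;
/
```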
The following views have been updated for this release:
The CTX_VERSION view has a new column VER_CODE which is the version number of the Oracle Text code linked in to the Oracle shadow process. Use this column to detect and verify patch releases.
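For example, a query along these lines compares the version of the data dictionary with the version of the linked-in code. It assumes the view is queried through the CTXSYS schema and that the session has access to it.

```sql
-- VER_DICT: version of the Oracle Text data dictionary.
-- VER_CODE: version of the code linked in to the shadow process.
SELECT ver_dict, ver_code
  FROM ctxsys.ctx_version;
```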
The following views are new. Use the first four for querying information about sub-lexers with multi-lexer preference:
Copyright © 1998, 2002 Oracle Corporation. All Rights Reserved.