Oracle® Text Application Developer's Guide
10g Release 2 (10.2)

Part Number B14217-01

Contents

List of Figures

List of Tables

Title and Copyright Information

Send Us Your Comments

Preface

Audience
Documentation Accessibility
Structure
Related Documents
Conventions

1 Understanding Oracle Text Application Development

1.1 What is Oracle Text?
1.2 Designing Your Application
1.3 Text Queries on Document Collections
1.3.1 Flowchart of Text Query Application
1.4 Queries on Catalog Information
1.4.1 Flowchart for Catalog Query Application
1.5 Document Classification
1.6 XML Searching
1.6.1 Using Oracle Text
1.6.2 Using the Oracle XML DB Framework
1.6.3 Combining Oracle Text features with Oracle XML DB
1.6.3.1 Using the Text-on-XML Method
1.6.3.2 Using the XML-on-Text Method

2 Getting Started with Oracle Text

2.1 Overview of Getting Started with Oracle Text
2.2 Creating an Oracle Text User
2.3 Query Application Quick Tour
2.3.1 Building Web Applications with the Oracle Text Wizard
2.3.1.1 Oracle JDeveloper
2.3.1.2 Oracle Text Wizard Addins
2.3.1.3 Oracle Text Wizard Instructions
2.4 Catalog Application Quick Tour
2.5 Classification Application Quick Tour
2.5.1 Steps for Creating a Classification Application

3 Indexing with Oracle Text

3.1 About Oracle Text Indexes
3.1.1 Types of Oracle Text Indexes
3.1.2 Structure of the Oracle Text CONTEXT Index
3.1.2.1 Merged Word and Theme Index
3.1.3 The Oracle Text Indexing Process
3.1.3.1 Datastore Object
3.1.3.2 Filter Object
3.1.3.3 Sectioner Object
3.1.3.4 Lexer Object
3.1.3.5 Indexing Engine
3.1.4 Partitioned Tables and Indexes
3.1.4.1 Querying Partitioned Tables
3.1.5 Creating an Index Online
3.1.6 Parallel Indexing
3.1.7 Indexing and Views
3.2 Considerations For Indexing
3.2.1 Location of Text
3.2.1.1 Supported Column Types
3.2.1.2 Storing Text in the Text Table
3.2.1.3 Storing File Path Names
3.2.1.4 Storing URLs
3.2.1.5 Storing Associated Document Information
3.2.1.6 Format and Character Set Columns
3.2.1.7 Supported Document Formats
3.2.1.8 Summary of DATASTORE Types
3.2.2 Document Formats and Filtering
3.2.2.1 No Filtering for HTML
3.2.2.2 Filtering Mixed-Format Columns
3.2.2.3 Custom Filtering
3.2.3 Bypassing Rows for Indexing
3.2.4 Document Character Set
3.2.4.1 Mixed Character Set Columns
3.2.5 Document Language
3.2.5.1 Language Features Outside BASIC_LEXER
3.2.5.2 Indexing Multi-language Columns
3.2.6 Indexing Special Characters
3.2.6.1 Printjoins Character
3.2.6.2 Skipjoins Character
3.2.6.3 Other Characters
3.2.7 Case-Sensitive Indexing and Querying
3.2.8 Language-Specific Features
3.2.8.1 Indexing Themes
3.2.8.2 Base-Letter Conversion for Characters with Diacritical Marks
3.2.8.3 Alternate Spelling
3.2.8.4 Composite Words
3.2.8.5 Korean, Japanese, and Chinese Indexing
3.2.9 Fuzzy Matching and Stemming
3.2.10 Better Wildcard Query Performance
3.2.11 Document Section Searching
3.2.12 Stopwords and Stopthemes
3.2.12.1 Multi-Language Stoplists
3.2.13 Index Performance
3.2.14 Query Performance and Storage of LOB Columns
3.3 Index Creation
3.3.1 Procedure for Creating a CONTEXT Index
3.3.2 Creating Preferences
3.3.2.1 Datastore Examples
3.3.2.2 NULL_FILTER Example: Indexing HTML Documents
3.3.2.3 PROCEDURE_FILTER Example
3.3.2.4 BASIC_LEXER Example: Setting Printjoins Characters
3.3.2.5 MULTI_LEXER Example: Indexing a Multi-Language Table
3.3.2.6 BASIC_WORDLIST Example: Enabling Substring and Prefix Indexing
3.3.3 Creating Section Groups for Section Searching
3.3.3.1 Example: Creating HTML Sections
3.3.4 Using Stopwords and Stoplists
3.3.4.1 Multi-Language Stoplists
3.3.4.2 Stopthemes and Stopclasses
3.3.4.3 PL/SQL Procedures for Managing Stoplists
3.3.5 Creating an Index
3.3.6 Creating a CONTEXT Index
3.3.6.1 CONTEXT Index and DML
3.3.6.2 Default CONTEXT Index Example
3.3.6.3 Custom CONTEXT Index Example: Indexing HTML Documents
3.3.7 Creating a CTXCAT Index
3.3.7.1 CTXCAT Index and DML
3.3.7.2 About CTXCAT Sub-Indexes and Their Costs
3.3.7.3 Creating CTXCAT Sub-indexes
3.3.7.4 Creating CTXCAT Index
3.3.8 Creating a CTXRULE Index
3.3.8.1 Create a Table of Queries
3.3.8.2 Create the CTXRULE Index
3.3.8.3 Classifying a Document
3.4 Index Maintenance
3.4.1 Viewing Index Errors
3.4.2 Dropping an Index
3.4.3 Resuming a Failed Index
3.4.3.1 Example: Resuming a Failed Index
3.4.4 Rebuilding an Index
3.4.4.1 Example: Rebuilding an Index
3.4.5 Dropping a Preference
3.4.5.1 Example
3.5 Managing DML Operations for a CONTEXT Index
3.5.1 Viewing Pending DML
3.5.2 Synchronizing the Index
3.5.2.1 Setting Background DML
3.5.3 Index Optimization
3.5.3.1 CONTEXT Index Structure
3.5.3.2 Index Fragmentation
3.5.3.3 Document Invalidation and Garbage Collection
3.5.3.4 Single Token Optimization
3.5.3.5 Viewing Index Fragmentation and Garbage Data
3.5.3.6 Examples: Optimizing the Index

4 Querying with Oracle Text

4.1 Overview of Queries
4.1.1 Querying with CONTAINS
4.1.1.1 CONTAINS SQL Example
4.1.1.2 CONTAINS PL/SQL Example
4.1.1.3 Structured Query with CONTAINS
4.1.2 Querying with CATSEARCH
4.1.2.1 CATSEARCH SQL Query
4.1.2.2 CATSEARCH Example
4.1.3 Querying with MATCHES
4.1.3.1 MATCHES SQL Query
4.1.3.2 MATCHES PL/SQL Example
4.1.4 Word and Phrase Queries
4.1.4.1 CONTAINS Phrase Queries
4.1.4.2 CATSEARCH Phrase Queries
4.1.5 Querying Stopwords
4.1.6 ABOUT Queries and Themes
4.1.6.1 Querying Stopthemes
4.1.7 Query Expressions
4.1.7.1 CONTAINS Operators
4.1.7.2 CATSEARCH Operator
4.1.7.3 MATCHES Operator
4.1.8 Case-Sensitive Searching
4.1.8.1 Word Queries
4.1.8.2 ABOUT Queries
4.1.9 Query Feedback
4.1.10 Query Explain Plan
4.1.11 Using a Thesaurus in Queries
4.1.12 Document Section Searching
4.1.13 Using Query Templating
4.1.14 Query Rewrite
4.1.15 Query Relaxation
4.1.16 Query Language
4.1.17 Alternative Scoring
4.1.18 Alternative Grammar
4.1.19 Query Analysis
4.1.20 Other Query Features
4.2 The CONTEXT Grammar
4.2.1 ABOUT Query
4.2.2 Logical Operators
4.2.3 Section Searching
4.2.4 Proximity Queries with NEAR and NEAR_ACCUM Operators
4.2.5 Fuzzy, Stem, Soundex, Wildcard and Thesaurus Expansion Operators
4.2.6 Using CTXCAT Grammar
4.2.7 Stored Query Expressions
4.2.7.1 Defining a Stored Query Expression
4.2.7.2 SQE Example
4.2.8 Calling PL/SQL Functions in CONTAINS
4.2.9 Optimizing for Response Time
4.2.9.1 Other Factors that Influence Query Response Time
4.2.10 Counting Hits
4.2.10.1 SQL Count Hits Example
4.2.10.2 Counting Hits with a Structured Predicate
4.2.10.3 PL/SQL Count Hits Example
4.3 The CTXCAT Grammar
4.3.1 Using CONTEXT Grammar with CATSEARCH

5 Presenting Documents in Oracle Text

5.1 Highlighting Query Terms
5.1.1 Text Highlighting
5.1.2 Theme Highlighting
5.1.3 CTX_DOC Highlighting Procedures
5.1.3.1 Markup Procedure
5.1.3.2 Highlight Procedure
5.1.3.3 Concordance
5.2 Obtaining Lists of Themes, Gists, and Theme Summaries
5.2.1 Lists of Themes
5.2.1.1 In-Memory Themes
5.2.1.2 Result Table Themes
5.2.2 Gist and Theme Summary
5.2.2.1 In-Memory Gist
5.2.2.2 Result Table Gists
5.2.2.3 Theme Summary
5.3 Document Presentation and Highlighting
5.3.1 Highlighting Example
5.3.2 Document List of Themes Example
5.3.3 Gist Example

6 Classifying Documents in Oracle Text

6.1 Overview
6.1.1 Classification Applications
6.2 Classification Solutions
6.3 Rule-Based Classification
6.3.1 Rule-based Classification Example
6.3.2 CTXRULE Parameters and Limitations
6.4 Supervised Classification
6.4.1 Decision Tree Supervised Classification
6.4.1.1 Decision Tree Supervised Classification Example
6.4.2 SVM-Based Supervised Classification
6.4.2.1 SVM-Based Supervised Classification Example
6.5 Unsupervised Classification (Clustering)
6.5.1 Clustering Example

7 Tuning Oracle Text

7.1 Optimizing Queries with Statistics
7.1.1 Collecting Statistics
7.1.1.1 Example
7.1.2 Re-Collecting Statistics
7.1.3 Deleting Statistics
7.2 Optimizing Queries for Response Time
7.2.1 Other Factors that Influence Query Response Time
7.2.2 Improved Response Time with FIRST_ROWS(n) for ORDER BY Queries
7.2.2.1 About the FIRST_ROWS Hint
7.2.3 Improved Response Time using Local Partitioned CONTEXT Index
7.2.3.1 Range Search on Partition Key Column
7.2.3.2 ORDER BY Partition Key Column
7.2.4 Improved Response Time with Local Partitioned Index for Order by Score
7.3 Optimizing Queries for Throughput
7.3.1 CHOOSE and ALL_ROWS Modes
7.3.2 FIRST_ROWS Mode
7.4 Tracing
7.5 Parallel Queries
7.6 Tuning Queries with Blocking Operations
7.7 Frequently Asked Questions About Query Performance
7.7.1 What is Query Performance?
7.7.2 What is the fastest type of text query?
7.7.3 Should I collect statistics on my tables?
7.7.4 How does the size of my data affect queries?
7.7.5 How does the format of my data affect queries?
7.7.6 What is a functional versus an indexed lookup?
7.7.7 What tables are involved in queries?
7.7.8 Does sorting the results slow a text-only query?
7.7.9 How do I make an ORDER BY score query faster?
7.7.10 Which Memory Settings Affect Querying?
7.7.11 Does out-of-line LOB storage of wide base table columns improve performance?
7.7.12 How can I make a CONTAINS query on more than one column faster?
7.7.13 Is it OK to have many expansions in a query?
7.7.14 How can local partition indexes help?
7.7.15 Should I query in parallel?
7.7.16 Should I index themes?
7.7.17 When should I use a CTXCAT index?
7.7.18 When is a CTXCAT index NOT suitable?
7.7.19 What optimizer hints are available, and what do they do?
7.8 Frequently Asked Questions About Indexing Performance
7.8.1 How long should indexing take?
7.8.2 Which index memory settings should I use?
7.8.3 How much disk overhead will indexing require?
7.8.4 How does the format of my data affect indexing?
7.8.5 Can parallel indexing improve performance?
7.8.6 How can I improve index performance for creating local partitioned index?
7.8.7 How can I tell how much indexing has completed?
7.9 Frequently Asked Questions About Updating the Index
7.9.1 How often should I index new or updated records?
7.9.2 How can I tell when my indexes are getting fragmented?
7.9.3 Does memory allocation affect index synchronization?

8 Searching Document Sections in Oracle Text

8.1 About Oracle Text Document Section Searching
8.1.1 Enabling Oracle Text Section Searching
8.1.1.1 Create a Section Group
8.1.1.2 Define Your Sections
8.1.1.3 Index Your Documents
8.1.1.4 Section Searching with the WITHIN Operator
8.1.1.5 Path Searching with INPATH and HASPATH Operators
8.1.2 Oracle Text Section Types
8.1.2.1 Zone Section
8.1.2.2 Field Section
8.1.2.3 Stop Section
8.1.2.4 MDATA Section
8.1.2.5 Attribute Section
8.1.2.6 Special Sections
8.2 HTML Section Searching with Oracle Text
8.2.1 Creating HTML Sections
8.2.2 Searching HTML Meta Tags
8.2.2.1 Example: Creating Sections for <META> Tags
8.3 XML Section Searching with Oracle Text
8.3.1 Automatic Sectioning
8.3.2 Attribute Searching
8.3.2.1 Creating Attribute Sections
8.3.2.2 Searching Attributes with the INPATH Operator
8.3.3 Creating Document Type Sensitive Sections
8.3.4 Path Section Searching
8.3.4.1 Creating an Index with PATH_SECTION_GROUP
8.3.4.2 Top-Level Tag Searching
8.3.4.3 Any-Level Tag Searching
8.3.4.4 Direct Parentage Searching
8.3.4.5 Tag Value Testing
8.3.4.6 Attribute Searching
8.3.4.7 Attribute Value Testing
8.3.4.8 Path Testing
8.3.4.9 Section Equality Testing with HASPATH

9 Working With a Thesaurus in Oracle Text

9.1 Overview of Oracle Text Thesaurus Features
9.1.1 Oracle Text Thesaurus Creation and Maintenance
9.1.1.1 CTX_THES Package
9.1.1.2 Thesaurus Operators
9.1.1.3 ctxload Utility
9.1.2 Using a Case-sensitive Thesaurus
9.1.3 Using a Case-insensitive Thesaurus
9.1.4 Default Thesaurus
9.1.5 Supplied Thesaurus
9.1.5.1 Supplied Thesaurus Structure and Content
9.1.5.2 Supplied Thesaurus Location
9.2 Defining Terms in a Thesaurus
9.2.1 Defining Synonyms
9.2.2 Defining Hierarchical Relations
9.3 Using a Thesaurus in a Query Application
9.3.1 Loading a Custom Thesaurus and Issuing Thesaurus-based Queries
9.3.1.1 Advantage
9.3.1.2 Limitations
9.3.2 Augmenting Knowledge Base with Custom Thesaurus
9.3.2.1 Advantage
9.3.2.2 Limitations
9.3.2.3 Linking New Terms to Existing Terms
9.3.2.4 Loading a Thesaurus with ctxload
9.3.2.5 Compiling a Loaded Thesaurus
9.4 About the Supplied Knowledge Base
9.4.1 Adding a Language-Specific Knowledge Base
9.4.1.1 Limitations

10 Administering Oracle Text

10.1 Oracle Text Users and Roles
10.1.1 CTXSYS User
10.1.2 CTXAPP Role
10.1.3 Granting Roles and Privileges to Users
10.2 DML Queue
10.3 The CTX_OUTPUT Package
10.4 The CTX_REPORT Package
10.5 Servers
10.6 Administration Tool

11 Migrating Oracle Text Applications

11.1 Migrating to Oracle Text 10g Release 2 (10.2)
11.1.1 New Filter (INSO_FILTER versus AUTO_FILTER)
11.1.1.1 Migrating to the AUTO_FILTER Filter Type
11.2 Migrating to Oracle Text 10g Release 1 (10.1)
11.2.1 Security Improvements in Oracle Text 10g Release 1
11.2.1.1 CTXSYS No Longer Has DBA Permissions
11.2.1.2 Migrating CTXSYS-Owned Procedures
11.2.1.3 Effective User During Indexing
11.2.1.4 Procedures Do Not Need to Be Owned by CTXSYS
11.2.1.5 Synching and Optimizing of Other Users' Indexes
11.2.1.6 CTX Packages and Invoker's Rights
11.2.1.7 CREATE TABLE Permissions
11.2.2 Migrating Back to Previous Releases from Release 10.1

A CONTEXT Query Application

A.1 Web Query Application Overview
A.2 The PSP Web Application
A.2.1 Web Application Prerequisites
A.2.2 Building the Web Application
A.2.3 PSP Sample Code
A.2.3.1 loader.ctl
A.2.3.2 loader.dat
A.2.3.3 search_htmlservices.sql
A.2.3.4 search_html.psp
A.3 The JSP Web Application
A.3.1 Web Application Prerequisites
A.3.2 JSP Sample Code
A.3.2.1 search_html.jsp

B CATSEARCH Query Application

B.1 CATSEARCH Web Query Application Overview
B.2 The JSP Web Application
B.2.1 Building the JSP Web Application
B.2.2 JSP Sample Code
B.2.2.1 loader.ctl
B.2.2.2 loader.dat
B.2.2.3 catalogSearch.jsp

Glossary

Index