Oracle Ultra Search Online Documentation Release 9.2
Oracle Ultra Search uses the Oracle Text engine to index and search documents. When an end user enters a query string, Oracle Ultra Search transforms it into an Oracle Text query expression. This process is called query syntax expansion.
You can customize Ultra Search to use your own implementation of the query syntax expansion. In previous releases, the default query syntax expansion implementation was contained in the WK_QUERYEXP PL/SQL package.
The Contains query lets you specify a query syntax similar to most internet search engines. The syntax boosts scores for documents that match the user's query in the 'title' StringAttribute. The syntax for Contains is the same when used on the document content and on StringAttributes.
You can customize this syntax by subclassing the Contains query and overriding the expand() method with your own implementation. Alternatively, you can implement the Query interface directly and not use the provided Contains query at all. This works because the query API accepts any object that implements the Query interface.
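The second approach (implementing Query directly) can be sketched minimally as follows. The example is a custom query that treats the whole user string as one exact phrase and boosts title matches. It assumes a stand-in Query interface that declares only the compile() method documented later in this section; the real interface may declare more. The class name ExactPhraseQuery and its titleSection parameter are hypothetical. TITLE__31 is simply the title section name that appears in the expansion examples in this document.

```java
/** Stand-in for the Ultra Search Query interface; only compile() is modeled (assumption). */
interface Query {
    String compile();
}

/** Hypothetical custom query: the whole user string is treated as one exact phrase. */
final class ExactPhraseQuery implements Query {
    private final String phrase;
    private final String titleSection; // hypothetical: name of the title section in the index

    ExactPhraseQuery(String phrase, String titleSection) {
        this.phrase = phrase.trim();
        this.titleSection = titleSection;
    }

    @Override
    public String compile() {
        // Braces tell Oracle Text to treat the enclosed words literally.
        // This sketch does not handle phrases that themselves contain braces.
        String expr = "({" + phrase + "})";
        // Title boost in the same shape as the default rules: (expr within title)*2, expr
        return "((" + expr + " within " + titleSection + ")*2," + expr + ")";
    }
}
```

For example, new ExactPhraseQuery("Oracle Applications", "TITLE__31").compile() yields the same string that the default implementation produces for ["Oracle Applications"].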
This document describes how you can customize the query syntax expansion implementation to suit your organization's preferences.
The default query syntax expansion implementation directly affects the following:
- The way the end user enters a query string (known as the "End User Query Syntax")
- The way the documents matching the query are scored (known as "Scoring")
- The way the end user's query string is transformed into an Oracle Text query string (known as the "Expansion Rules")
The end user query syntax defined by the default query syntax expansion implementation is similar to the standard text query syntax employed by most search engines on the Web.
Token
A token is a single word, or a phrase enclosed in double quotes (").
Operators
The default implementation defines three operators: [+], [-], and [*]. You can replace these operators with whatever you prefer in your own custom implementation.
Plus operator [+]: The token immediately following it must appear in every document included in the search result.
Minus operator [-]: The token immediately following it must not appear in any document included in the search result.
Asterisk [*]: Specifies a wildcard search; it matches zero or more characters. A token that starts with an asterisk is ignored. The asterisk can appear only at the end (right side) or in the middle of a token. For example, "hel*o" and "hell*" use the asterisk correctly, but "*ello" is not allowed.
Summary
The following table summarizes the rules for the Ultra Search end user query syntax:
Note: In this table, all end user query strings are enclosed in square brackets. For example, the end user query string Oracle Applications is written as [Oracle Applications].
Single word search
Entering one word finds documents that contain that word. For example, searching for [Oracle] finds all documents that contain the word "Oracle" anywhere in the document.
Multiple word search
Entering more than one word finds documents that contain any of those words, in any order. For example, searching for [Oracle Applications] finds documents that contain "Oracle", "Applications", or "Oracle Applications".
Compulsory inclusion [+]
Attaching a [+] in front of a word requires that the word be found in every matching document. For example, searching for [Oracle + Applications] finds only documents that contain the word "Applications". Note: In a multiple word search, you can attach a [+] in front of every token, including the very first token.
Compulsory exclusion [-]
Attaching a [-] in front of a word requires that the word not be found in any matching document. For example, searching for [Oracle - Applications] finds only documents that do not contain the word "Applications". Note: In a multiple word search, you can attach a [-] in front of every token except the very first token.
Phrase matching ["..."]
Putting quotes around a set of words finds only documents that contain that precise phrase. For example, searching for ["Oracle Applications"] finds only documents that contain the string "Oracle Applications".
Wildcard matching [*]
Attaching a [*] to the right-hand side of a word returns left-side partial matches, that is, words that begin with the characters before the asterisk. For example, searching for [Ora*] finds documents that contain words beginning with "Ora", such as "Oracle" and "Orator". You can also insert an asterisk in the middle of a word. For example, searching for [A*e] retrieves documents that contain words such as "Apple", "Ate", and "Ape". Wildcard matching requires more processing and is generally slower than other types of queries.
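The following Java code is a toy, in-memory model that makes the rules in the table concrete by deciding which documents a query matches. It is not how Ultra Search evaluates queries; Oracle Text does that from the expanded query. Several parts are assumptions of the sketch: the hypothetical Term class, the pre-parsed query, case-insensitive comparison, and splitting documents into words at non-alphanumeric characters. Following the expanded form of [Oracle - Applications] shown later, a document must also contain at least one term that is not a minus term.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

/** Toy model of the end user query rules; not how Ultra Search evaluates queries. */
final class QueryRulesModel {
    enum Op { NONE, PLUS, MINUS }

    /** Hypothetical pre-parsed term: one word (may contain '*') or a multi-word phrase. */
    static final class Term {
        final String text;
        final Op op;

        Term(String text, Op op) {
            this.text = text.trim().toLowerCase(Locale.ROOT); // assumption: case-insensitive
            this.op = op;
        }
    }

    /** Splits a document into lowercase words at non-letter, non-digit characters. */
    static List<String> words(String doc) {
        return Arrays.asList(doc.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"));
    }

    /** '*' matches zero or more characters; all other characters are literal. */
    static Pattern wildcard(String word) {
        String[] pieces = word.split("\\*", -1);
        StringBuilder re = new StringBuilder();
        for (int i = 0; i < pieces.length; i++) {
            if (i > 0) re.append(".*");
            re.append(Pattern.quote(pieces[i]));
        }
        return Pattern.compile(re.toString());
    }

    /** True if a single word (wildcards allowed) or an exact phrase occurs in the document. */
    static boolean occurs(Term t, List<String> docWords) {
        List<String> parts = Arrays.asList(t.text.split("\\s+"));
        if (parts.size() == 1) {
            Pattern p = wildcard(parts.get(0));
            return docWords.stream().anyMatch(w -> p.matcher(w).matches());
        }
        for (int i = 0; i + parts.size() <= docWords.size(); i++) { // consecutive words
            if (docWords.subList(i, i + parts.size()).equals(parts)) return true;
        }
        return false;
    }

    /** Every + term must occur, no - term may occur, and some non-minus term must occur. */
    static boolean matches(List<Term> query, String doc) {
        List<String> docWords = words(doc);
        boolean anyPositive = false;
        for (Term t : query) {
            if (t.text.startsWith("*")) continue; // tokens starting with '*' are ignored
            boolean hit = occurs(t, docWords);
            if (t.op == Op.PLUS && !hit) return false;
            if (t.op == Op.MINUS && hit) return false;
            if (t.op != Op.MINUS && hit) anyPositive = true;
        }
        return anyPositive;
    }
}
```

Under this model, [Oracle + Applications] rejects a document that mentions only "Oracle" and accepts one that mentions only "Applications", as the table describes.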
Documents are matched against an end user query string in three ways, known as scoring "classes." A document that satisfies the requirements of a higher class receives a higher score and rank. Within each class, documents are further ranked by how well they meet that class's conditions.
Class 1 is the most heavily weighted class. The score is derived from the number of occurrences of the precise phrase in a document: a document with more occurrences of the precise phrase has a higher score than a document with fewer occurrences.
Class 2 is the next most heavily weighted class. In this class, the closer together the tokens appear in a document, the higher the score. For example, the end user query string [Oracle Applications Financials] might find three documents, none of which contains the precise phrase "Oracle Applications Financials." Document X contains all three tokens, "Oracle", "Applications", and "Financials", in the same sentence, separated by other words. Document Y contains the three tokens in the same paragraph but in different sentences. Document Z contains the three tokens, but each in a different paragraph. In this scenario, document X has the highest score because its tokens are closest together, and Y has a higher score than Z.
Class 3 is the least heavily weighted class. The more of the query tokens a document contains, the higher its score. For example, the end user query string [Oracle Applications Financials] might find three documents. Document X might contain all three tokens, document Y only the tokens "Oracle" and "Applications", and document Z only the token "Oracle". In this scenario, document X has a higher score than Y, and Y has a higher score than Z.
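The following Java sketch is a toy model of how the three classes order results. It illustrates the ranking described above and is not Oracle Text's scoring algorithm. As assumptions, it approximates each class with a simple measure:

- Class 1 counts exact phrase occurrences.
- Class 2 measures the length, in words, of the shortest stretch of text that contains every token.
- Class 3 counts the distinct tokens present.

Documents are then compared class by class. The class name ScoringClassesModel and these measures are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Toy model of the three scoring classes; not Oracle Text's scoring algorithm. */
final class ScoringClassesModel {

    /** Lowercase words, split at non-letter, non-digit characters. */
    static List<String> words(String doc) {
        List<String> out = new ArrayList<>();
        for (String w : doc.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) out.add(w);
        }
        return out;
    }

    /** Class 1: occurrences of the precise phrase (all tokens, consecutive, in order). */
    static int phraseCount(List<String> w, List<String> tokens) {
        int count = 0;
        for (int i = 0; i + tokens.size() <= w.size(); i++) {
            if (w.subList(i, i + tokens.size()).equals(tokens)) count++;
        }
        return count;
    }

    /** Class 2: length of the shortest window holding every token; MAX_VALUE if one is missing. */
    static int shortestWindow(List<String> w, List<String> tokens) {
        int best = Integer.MAX_VALUE;
        for (int start = 0; start < w.size(); start++) {
            List<String> missing = new ArrayList<>(tokens);
            for (int end = start; end < w.size(); end++) {
                missing.remove(w.get(end));
                if (missing.isEmpty()) {
                    best = Math.min(best, end - start + 1);
                    break;
                }
            }
        }
        return best;
    }

    /** Class 3: number of distinct query tokens present. */
    static long distinctTokens(List<String> w, List<String> tokens) {
        return tokens.stream().distinct().filter(w::contains).count();
    }

    /** Orders documents best first: compare class 1, then class 2, then class 3. */
    static Comparator<String> ranking(List<String> queryTokens) {
        List<String> q = new ArrayList<>();
        for (String t : queryTokens) q.add(t.toLowerCase(Locale.ROOT));
        return Comparator.comparingInt((String d) -> -phraseCount(words(d), q))
                .thenComparingInt(d -> shortestWindow(words(d), q))
                .thenComparingLong(d -> -distinctTokens(words(d), q));
    }
}
```

Sorting a list of documents with ranking(List.of("Oracle", "Applications", "Financials")) puts a document with the precise phrase first. It then orders the remaining documents by how close together their tokens are, and finally by how many tokens they contain.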
As mentioned earlier, the end user query is expanded into an Oracle Text query. The rules for the expanded query string are expressed below in BNF (Backus-Naur Form) notation. These are the rules of the default query syntax expansion implementation in Ultra Search.
Rules
The following rules define an expanded query:
<expanded query> ::= (<expression> within <title section>)*2, <expression>
<expression> ::= <generic query expression> | <simple query expression>
<generic query expression> ::= (([ <plus expression>*100 & ]) (<main expression>)) [ <minus expression> ]
<simple query expression> ::= (<phrase expression>)*2, (<main expression>)
<main expression> ::= (<near expression>)*2, (<accum expression>)
The following terms are used in the preceding rules:
- A <plus expression> is an AND expression of all plus tokens.
- A <minus expression> is a NOT expression of all minus tokens.
- A <phrase expression> is a PHRASE formed by all tokens in the <main expression>.
- A <near expression> is a NEAR expression of all tokens except minus tokens.
- An <accum expression> is an ACCUMULATE expression of all tokens except minus tokens.
- A <simple query expression> is used only when the end user query has multiple tokens and contains no operator or double quote. Otherwise, a <generic query expression> is used.
- If there is no token that is neither a plus token nor a minus token, then the <plus expression> and the <accum expression> are eliminated.
Examples of Applying the Rules
The following table shows how the default query syntax expansion implementation converts end user query strings into query strings that Oracle Text understands.
End User Query String: [Oracle]
Expanded Query String: ((({Oracle}) within TITLE__31)*2,({Oracle}))
End User Query String: [Oracle + Applications]
Expanded Query String: ((((({Applications})*10)*10&(({Oracle};{Applications})*2,({Oracle},{Applications}))) within TITLE__31)*2,((({Applications})*10)*10&(({Oracle};{Applications})*2,({Oracle},{Applications}))))
End User Query String: [Oracle - Applications]
Expanded Query String: (((({Oracle})~{Applications}) within TITLE__31)*2,(({Oracle})~{Applications}))
End User Query String: ["Oracle Applications"]
Expanded Query String: ((({Oracle Applications}) within TITLE__31)*2,({Oracle Applications}))
End User Query String: [Ora*]
Expanded Query String: ((((Ora%)) within TITLE__31)*2,((Ora%)))
End User Query String: [Oracle Applications]
Expanded Query String: (((({Oracle Applications})*2,(({Oracle};{Applications})*2,({Oracle},{Applications}))) within TITLE__31)*2,(({Oracle Applications})*2,(({Oracle};{Applications})*2,({Oracle},{Applications}))))
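The following Java sketch applies the rules above and reproduces each of the six expansions in the preceding examples. It is an illustration, not Oracle's implementation. The class name DefaultExpansionSketch and its tokenizer are hypothetical. The rendering of a wildcard token as (Ora%) is inferred from the [Ora*] example. The & and ~ joins used when there are several plus or minus tokens are assumptions. The sketch does not implement the rule that eliminates the <plus expression> and <accum expression>, and it does not reject a minus sign on the first token. The title section name (TITLE__31 in the examples) is passed in as a parameter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch of the default expansion rules; not Oracle's implementation. */
final class DefaultExpansionSketch {
    enum Op { NONE, PLUS, MINUS }

    static final class Token {
        final String text; final Op op; final boolean quoted;
        Token(String text, Op op, boolean quoted) { this.text = text; this.op = op; this.quoted = quoted; }
        boolean wildcard() { return !quoted && text.contains("*"); }
        /** Braces make words and phrases literal; a wildcard token becomes e.g. (Ora%). */
        String render() { return wildcard() ? "(" + text.replace('*', '%') + ")" : "{" + text + "}"; }
    }

    /** Splits the user string into words and quoted phrases; + or - applies to the next token. */
    static List<Token> tokenize(String query) {
        List<Token> tokens = new ArrayList<>();
        Op pending = Op.NONE;
        int i = 0, n = query.length();
        while (i < n) {
            char c = query.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (c == '+' || c == '-') { pending = c == '+' ? Op.PLUS : Op.MINUS; i++; continue; }
            boolean quoted = c == '"';
            String text;
            if (quoted) {                                   // phrase: up to the closing quote
                int end = query.indexOf('"', i + 1);
                if (end < 0) end = n;
                text = query.substring(i + 1, end).trim();
                i = end + 1;
            } else {                                        // word: up to the next whitespace
                int start = i;
                while (i < n && !Character.isWhitespace(query.charAt(i))) i++;
                text = query.substring(start, i);
            }
            if (!text.isEmpty() && !text.startsWith("*")) tokens.add(new Token(text, pending, quoted));
            pending = Op.NONE;
        }
        return tokens;
    }

    /** <main expression> ::= (<near expression>)*2, (<accum expression>); one token stands alone. */
    static String mainExpression(List<Token> kept) {
        if (kept.size() == 1) return kept.get(0).render();
        String near = kept.stream().map(Token::render).collect(Collectors.joining(";"));
        String accum = kept.stream().map(Token::render).collect(Collectors.joining(","));
        return "(" + near + ")*2,(" + accum + ")";
    }

    static String expand(String query, String titleSection) {
        List<Token> tokens = tokenize(query);
        List<Token> kept = new ArrayList<>(), plus = new ArrayList<>(), minus = new ArrayList<>();
        boolean operatorOrQuote = false;
        for (Token t : tokens) {
            (t.op == Op.MINUS ? minus : kept).add(t);
            if (t.op == Op.PLUS) plus.add(t);
            if (t.op != Op.NONE || t.quoted || t.wildcard()) operatorOrQuote = true;
        }
        if (kept.isEmpty()) throw new IllegalArgumentException("query needs a token that is not a minus token");
        String main = mainExpression(kept);
        String expr;
        if (tokens.size() > 1 && !operatorOrQuote) {
            // <simple query expression> ::= (<phrase expression>)*2, (<main expression>)
            String phrase = kept.stream().map(t -> t.text).collect(Collectors.joining(" "));
            expr = "(({" + phrase + "})*2,(" + main + "))";
        } else {
            // <generic query expression>: AND of plus tokens weighted 100, written (...)*10)*10
            // because one Oracle Text weight is at most 10; then main; then NOT (~) of minus tokens
            String core = plus.isEmpty() ? "(" + main + ")"
                    : "(((" + plus.stream().map(Token::render).collect(Collectors.joining("&"))
                      + ")*10)*10&(" + main + "))";
            String not = minus.stream().map(t -> "~" + t.render()).collect(Collectors.joining());
            expr = minus.isEmpty() ? core : "(" + core + not + ")";
        }
        // <expanded query> ::= (<expression> within <title section>)*2, <expression>
        return "((" + expr + " within " + titleSection + ")*2," + expr + ")";
    }
}
```

For example, DefaultExpansionSketch.expand("Oracle - Applications", "TITLE__31") returns the third expanded string in the table. Changing render() or the operator characters in tokenize() is the kind of change a custom expand() implementation would make.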
You can customize this expansion to suit your organization by defining and implementing your own query syntax expansion. To do so, you need to understand the requirements of Oracle Text queries, which are beyond the scope of this document. For details, see the Oracle9i Text Application Developer's Guide and the Oracle9i Text Reference.
To customize Ultra Search to use your own implementation of the query syntax expansion, use the Contains query. The Contains query finds documents that contain the specified text in their content or in their string attributes; it does not apply to date or number attributes. If no attribute is specified, then Contains searches the document content. A match in the document's title attribute receives a higher score than a match in the document content.
Constructors
Contains(StringAttribute, String, InstanceMetaData)
public Contains(StringAttribute att, java.lang.String val, InstanceMetaData instmd)
Constructs a Contains query on a string attribute.
Contains(String, InstanceMetaData)
public Contains(java.lang.String val, InstanceMetaData instmd)
Constructs a Contains query on the document content.
Methods
compile()
public java.lang.String compile()
Compiles this query into a query string.
Specified By:
compile() in interface Query
Returns:
a query string representing this query.
expand(StringAttribute, String, InstanceMetaData)
public java.lang.String expand(StringAttribute att, java.lang.String str, InstanceMetaData instmd)
Translates a user's Contains query string on a string attribute into an Oracle Text query.
Parameters:
att - a string attribute
str - the main query string
instmd - the InstanceMetaData object
Returns:
The translated Oracle Text query string (contains clause)
expand(String, InstanceMetaData)
public java.lang.String expand(java.lang.String str, InstanceMetaData instmd)
Translates a user query string into an Oracle Text query.
Parameters:
str - the main query string
instmd - the InstanceMetaData object
Returns:
The translated Oracle Text query string (contains clause)
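The first approach, subclassing Contains and overriding expand(), can be sketched as follows. So that the code compiles on its own, it declares stand-ins for Query, Contains, StringAttribute, and InstanceMetaData. These mirror only the signatures documented above, and their bodies are placeholders, not the Ultra Search implementation. The sketch also assumes that compile() obtains its query string from expand(), as the description of overriding expand() implies. The subclass AllWordsContains is hypothetical: it changes the syntax so that every word the user enters is required.

```java
import java.util.StringJoiner;

// Stand-ins mirroring only the documented signatures; bodies are placeholders.
interface Query {
    String compile();
}

class InstanceMetaData { }

class StringAttribute {
    final String name;
    StringAttribute(String name) { this.name = name; }
}

class Contains implements Query {
    private final StringAttribute att; // null: query the document content
    private final String val;
    private final InstanceMetaData instmd;

    public Contains(StringAttribute att, String val, InstanceMetaData instmd) {
        this.att = att;
        this.val = val;
        this.instmd = instmd;
    }

    public Contains(String val, InstanceMetaData instmd) {
        this(null, val, instmd);
    }

    @Override
    public String compile() {
        // Assumption: compile() delegates to the matching expand() overload.
        return att == null ? expand(val, instmd) : expand(att, val, instmd);
    }

    /** Placeholder; the real default applies the expansion rules described earlier. */
    public String expand(String str, InstanceMetaData instmd) {
        return "({" + str.trim() + "})";
    }

    /** Placeholder; this sketch ignores the attribute and reuses the content expansion. */
    public String expand(StringAttribute att, String str, InstanceMetaData instmd) {
        return expand(str, instmd);
    }
}

/** Hypothetical custom syntax: every word the user enters is required (AND). */
class AllWordsContains extends Contains {
    AllWordsContains(String val, InstanceMetaData instmd) {
        super(val, instmd);
    }

    @Override
    public String expand(String str, InstanceMetaData instmd) {
        StringJoiner and = new StringJoiner("&", "(", ")");
        for (String word : str.trim().split("\\s+")) {
            and.add("{" + word + "}"); // braces make each word literal
        }
        return and.toString();
    }
}
```

With the real API, AllWordsContains would extend the Ultra Search Contains class instead of the stand-in. Because the query API accepts any Query, an instance could then be passed wherever a Contains query is accepted.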
Copyright © 2002 Oracle Corporation. All Rights Reserved.