Oracle® Ultra Search Administrator's Guide
10g Release 2 (10.1.2) Part No. B14041-01
This section describes Oracle Ultra Search new features, with pointers to additional information. It also explains the Oracle Ultra Search release history.
Document Service API
The Document Service crawler agent API allows attribute data to be generated from document contents. It also accepts robots metatag instructions from the agent for the target document, and it can transform the original document contents to control what is indexed.
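Robots metatag instructions conventionally use the `noindex` and `nofollow` keywords, with `none` as shorthand for both. The Python sketch below assumes the agent passes the directive as a plain comma-separated string and turns it into two flags. The function `parse_robots_directives` and its return format are hypothetical and are not part of the Document Service API.

```python
def parse_robots_directives(content: str) -> dict:
    """Hypothetical helper: map a robots META value such as "noindex, follow"
    to index/follow flags. Anything not restricting keeps the permissive default."""
    flags = {"index": True, "follow": True}
    for token in (t.strip() for t in content.lower().split(",")):
        if token == "noindex":
            flags["index"] = False
        elif token == "nofollow":
            flags["follow"] = False
        elif token == "none":  # shorthand for "noindex, nofollow"
            flags["index"] = flags["follow"] = False
    return flags
```

For example, `parse_robots_directives("NOINDEX, follow")` yields `{"index": False, "follow": True}`.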
Keyword in Context
The query application has been designed to showcase keyword in context and highlighting features. Keyword in context shows a section of the original document that contains the search terms. Highlighting shows the entire document with the search terms in a different color.
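The two features can be sketched in Python under a few assumptions:

* matching is a case-insensitive, literal search for a single term;
* context is a fixed character window;
* highlighting uses the HTML `<mark>` tag.

The function names, the 20-character default window, and the markup are illustrative choices. They do not describe how the shipped query application actually works.

```python
import re

def keyword_in_context(text: str, term: str, width: int = 20) -> list:
    """Return one snippet per match: the term plus up to `width`
    characters of surrounding text on each side."""
    snippets = []
    for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        snippets.append(text[start:end])
    return snippets

def highlight(text: str, term: str) -> str:
    """Return the whole document with every match wrapped in <mark> tags,
    which browsers render in a different color. `text` is assumed to be
    HTML-safe already; this sketch does not escape it."""
    return re.sub(re.escape(term),
                  lambda m: f"<mark>{m.group(0)}</mark>",
                  text, flags=re.IGNORECASE)
```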
Secure Crawling
Oracle Ultra Search provides secure crawling using the following types of authentication:
Digest Authentication: Oracle Ultra Search supports HTTP digest authentication, so the Oracle Ultra Search crawler can authenticate itself to Web servers that use the HTTP digest authentication scheme. Digest authentication is based on a simple challenge-response paradigm; however, the password is never sent in clear text. Instead, the client sends a hash computed from the password and the server's challenge.
HTML Form Authentication: HTML form-based authentication is the most commonly used authentication scheme on the Web. Oracle Ultra Search lets you register HTML forms that you want the Oracle Ultra Search crawler to fill in automatically during Web crawling. HTML form authentication requires HTTP cookie functionality to be enabled, which is the default.
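Under HTTP digest authentication as defined in RFC 2617, the client answers the server's challenge with an MD5 hash rather than with the password itself. The following Python sketch computes that `response` value for `qop="auth"`. It illustrates the protocol only and is not taken from the Oracle Ultra Search crawler.

```python
import hashlib

def _md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def digest_response(username: str, realm: str, password: str,
                    method: str, uri: str, nonce: str,
                    nc: str, cnonce: str, qop: str = "auth") -> str:
    """RFC 2617 digest `response` for qop="auth" with the MD5 algorithm."""
    ha1 = _md5_hex(f"{username}:{realm}:{password}")  # the password only ever appears hashed
    ha2 = _md5_hex(f"{method}:{uri}")                 # ties the answer to this request
    # Mixing in the server nonce and client cnonce makes the response
    # specific to this challenge.
    return _md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
```

With the worked example from RFC 2617, the function returns `6629fae49393a05397450978507c4ef1`, the response shown in that RFC. The example values are:

* user `Mufasa`
* realm `testrealm@host.com`
* password `Circle Of Life`
* method `GET`
* URI `/dir/index.html`
* nonce `dcd98b7102dd2f0e8b11d0f600bfb0c093`
* nc `00000001`
* cnonce `0a4f113b`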
Indexing Control of Dynamically Generated Web Pages
The crawler can be configured not to index dynamically generated Web pages (for example, pages whose URLs contain a question mark).
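A minimal sketch of that rule, using the question-mark heuristic from the example above. The names `should_index` and `index_dynamic_pages` are hypothetical, not actual crawler settings.

```python
def is_dynamic(url: str) -> bool:
    """Heuristic from the text: a URL containing "?" (a query string)
    is treated as dynamically generated."""
    return "?" in url

def should_index(url: str, index_dynamic_pages: bool) -> bool:
    """Always index static pages; index dynamic pages only if allowed."""
    return index_dynamic_pages or not is_dynamic(url)
```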
HTTPS
Oracle Ultra Search supports Secure Sockets Layer (SSL). In addition to HTTP-based URLs, Oracle Ultra Search can therefore access HTTPS-based URLs (that is, HTTP over SSL).
Secure Search
Secure searches return only documents that the search user is allowed to view. Each indexed document can be protected by an access control list (ACL). During searches, the ACL is evaluated. If the user performing the search has permission to read the protected document, then the document is returned by the query API. Otherwise, it is not returned.
Oracle Ultra Search stores ACLs in the Oracle XML DB repository. Oracle Ultra Search also uses Oracle XML DB functionality to evaluate ACLs.
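The evaluation step can be pictured as a filter applied to each hit before it is returned. The Python sketch below simplifies under these assumptions:

* an ACL is modeled as a set of user and group names granted read access;
* a document without an ACL is treated as readable by everyone;
* the `Document` class and the function names are hypothetical.

In the product itself, evaluation is performed by Oracle XML DB against its own ACL format.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional

@dataclass(frozen=True)
class Document:
    url: str
    acl: Optional[FrozenSet[str]] = None  # None: not protected by an ACL (assumption)

def can_read(doc: Document, user: str, groups: Iterable[str]) -> bool:
    """Grant read access if the user, or any group the user belongs to, is in the ACL."""
    if doc.acl is None:
        return True
    return user in doc.acl or not doc.acl.isdisjoint(groups)

def secure_search(hits: Iterable[Document], user: str, groups: Iterable[str]) -> List[Document]:
    """Return only the hits the searching user is allowed to read."""
    groups = set(groups)
    return [doc for doc in hits if can_read(doc, user, groups)]
```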
Remote Crawler JDBC Caching Support
It is now possible to use the remote crawler without mounting the remote cache directory to the server machine. Instead, the cache files are sent over the crawler's JDBC connection to the server cache directory.
Manual Launch Scheduling
A schedule can be created with no scheduled launch time, so that it can only be started on demand.
Crawler Log File Versioning
For each data source, the crawler preserves the three most recent log files, so a recrawl no longer erases the log file from the previous crawl.
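One way to keep the three most recent logs is to shift older files down one slot before each recrawl. In this Python sketch, the `rotate_logs` function and its naming scheme (`<name>`, `<name>.1`, `<name>.2`) are assumptions. This document does not say how the crawler names its log files.

```python
from pathlib import Path

def rotate_logs(log_dir: Path, name: str, keep: int = 3) -> Path:
    """Make room for a new log, keeping at most `keep` files in total,
    and return the path the new crawl should write to."""
    current = log_dir / name
    oldest = log_dir / f"{name}.{keep - 1}"
    if oldest.exists():
        oldest.unlink()  # this version would otherwise become the (keep+1)th file
    # Shift remaining older logs up by one, highest index first,
    # so no rename ever overwrites an existing file.
    for i in range(keep - 2, 0, -1):
        src = log_dir / f"{name}.{i}"
        if src.exists():
            src.rename(log_dir / f"{name}.{i + 1}")
    if current.exists():
        current.rename(log_dir / f"{name}.1")  # previous crawl's log
    return current
```

Calling `rotate_logs` before every crawl leaves at most three files: the current log plus the logs from the two previous crawls.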
New PL/SQL Administration APIs
Oracle Ultra Search includes APIs for various administration tasks, such as crawler, schedule, and instance administration.
Integration with Oracle Internet Directory
Oracle Internet Directory is Oracle's native LDAP v3-compliant directory service, built as an application on top of the Oracle Database. Oracle Ultra Search integrates with Oracle Internet Directory in the following areas:
Oracle Ultra Search administration groups and group membership are stored in Oracle Internet Directory.
Users are authenticated through Oracle Application Server Single Sign-On and Oracle Internet Directory.
Oracle Internet Directory performs authorization on Oracle Ultra Search users' administration privileges.
Cookie Support
Cookies preserve context between HTTP requests. For example, after a user logs on, the server can send a cookie that identifies the session; on later requests, the server then knows that the user has already logged on and does not need to log on again. Cookie support is enabled by default.
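To illustrate the logon example, the sketch below does two things:

* it stores the `Set-Cookie` value a server returns after logon;
* it replays that value as a `Cookie` header on later requests to the same host.

It uses Python's standard `http.cookies.SimpleCookie` parser. The `SessionCookies` class is hypothetical and deliberately ignores cookie domain, path, and expiry rules, which a real crawler must honor (Python's `http.cookiejar` module implements them).

```python
from http.cookies import SimpleCookie

class SessionCookies:
    """Simplified per-host cookie store (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._jar = {}  # host name -> SimpleCookie

    def store(self, host: str, set_cookie_header: str) -> None:
        """Remember the cookies from one Set-Cookie response header."""
        self._jar.setdefault(host, SimpleCookie()).load(set_cookie_header)

    def cookie_header(self, host: str):
        """Build the Cookie request header for `host`, or None if no cookies."""
        jar = self._jar.get(host)
        if not jar:
            return None
        return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())
```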
Crawler Cache Deletion Control
During crawling, documents are stored in the cache directory. Every time the preset size is reached, crawling stops and indexing starts. In previous releases, the cache file was always deleted when indexing was done. You can now specify not to delete the cache file when indexing is done. This option applies to all data sources. The default is to delete the cache file after indexing.
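The cycle described above can be simulated in a few lines of Python. This is a sketch only: `crawl_with_cache`, the byte sizes, and the `delete_cache_after_indexing` flag are hypothetical stand-ins for the crawler's cache size setting and its cache deletion option.

```python
def crawl_with_cache(documents, cache_limit_bytes, delete_cache_after_indexing=True):
    """`documents` is a sequence of (name, size_in_bytes) pairs.
    Returns (batches indexed, batches whose cache was kept on disk)."""
    cache, cache_size = [], 0
    indexed, retained = [], []

    def flush():
        # Crawling pauses here: the cached batch is indexed, then the
        # cache is either deleted (default) or kept.
        nonlocal cache, cache_size
        indexed.append(cache)
        if not delete_cache_after_indexing:
            retained.append(cache)
        cache, cache_size = [], 0

    for name, size in documents:
        cache.append(name)
        cache_size += size
        if cache_size >= cache_limit_bytes:  # preset cache size reached
            flush()
    if cache:
        flush()  # index whatever remains when crawling finishes
    return indexed, retained
```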
URL Boundary Rules Include Port Number Inclusion or Exclusion
You can set URL boundary rules to refine the crawling space. You can now include or exclude Web sites with a specific port. For example, you can include www.oracle.com but not www.oracle.com:8080. By default, all ports are crawled.
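A sketch of how such a rule could be evaluated, assuming inclusion is by host name and exclusion is by (host, port) pair. The function `within_boundary` and the rule representation are hypothetical. A URL without an explicit port is treated as using its scheme's default port, so `http://www.oracle.com/` counts as port 80.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def within_boundary(url, included_hosts, excluded_host_ports=frozenset()):
    """True if the URL's host is included and its (host, port) is not excluded.
    With no port exclusions, every port of an included host is crawled."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    if host not in included_hosts:
        return False
    return (host, port) not in excluded_host_ports
```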
Hostname Prefix Allowed in Web Data Source URL Boundary Specification
In previous releases, you could only specify suffix inclusion rules; for example, crawl only URLs ending with oracle.com. You can now also specify prefix rules; for example, crawl oracle.com but not stores.oracle.com.
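The combination of a suffix inclusion rule with a prefix rule can be sketched as follows. This example makes two assumptions:

* the prefix rule acts as an exclusion;
* suffixes match only on a label boundary, so `notoracle.com` does not match `oracle.com`.

`host_allowed` is a hypothetical name.

```python
def host_allowed(host, include_suffixes, exclude_prefixes=()):
    """The host must end with an included suffix (the older rule type)
    and must not start with an excluded prefix (the newer rule type)."""
    host = host.lower()
    if not any(host == s or host.endswith("." + s) for s in include_suffixes):
        return False
    return not any(host.startswith(p) for p in exclude_prefixes)
```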
Default Oracle Ultra Search Instance and Schema
Oracle Ultra Search automatically creates a default Oracle Ultra Search instance based on the default Oracle Ultra Search test user, so you can test Oracle Ultra Search functionality with the default instance right after installation.
Monitoring Oracle Ultra Search Components with Oracle Enterprise Manager
You can use Enterprise Manager's Grid Control to monitor Oracle Ultra Search components. Using Grid Control, you can set up notification rules to send out e-mail notification automatically whenever a schedule status reaches certain severity states.
See Also: Oracle Enterprise Manager Concepts for more information on using Grid Control to monitor Oracle Ultra Search components
Crawler Recrawl Policy
You can update the recrawl policy to process documents that have changed or to process all documents.
In previous releases, processing all documents did not help when the crawling scope had been narrowed. For example, if crawling depth was reduced from seven to five, the PDF mimetype was deleted, or a host inclusion rule was removed, then the affected documents would have to be removed manually in a SQL*Plus session.
With this release, all crawled URLs are subject to crawler setting enforcement, not just newly crawled URLs.
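Conceptually, enforcing the current settings on all crawled URLs means re-checking every stored URL against the narrowed scope, not only the new ones. The Python sketch below does this for the three examples in the text (depth, mimetype, host). The `CrawledUrl` and `CrawlerSettings` types and the `urls_to_remove` function are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List

@dataclass(frozen=True)
class CrawledUrl:
    url: str
    host: str
    depth: int
    mimetype: str

@dataclass(frozen=True)
class CrawlerSettings:
    max_depth: int
    mimetypes: FrozenSet[str]
    hosts: FrozenSet[str]

def urls_to_remove(crawled: Iterable[CrawledUrl], settings: CrawlerSettings) -> List[CrawledUrl]:
    """Return previously crawled URLs that now fall outside the crawling scope."""
    return [u for u in crawled
            if u.depth > settings.max_depth          # crawling depth reduced
            or u.mimetype not in settings.mimetypes  # mimetype deleted
            or u.host not in settings.hosts]         # host inclusion rule removed
```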
Federated Search
Traditionally, Oracle Ultra Search used centralized search: it gathered data on a regular basis and updated a single index that cataloged all searchable data. This provided fast searching, but it required that a data source be made crawlable before it could be searched. Oracle Ultra Search now also provides federated search, which lets a single search span multiple indexes. Each index can be maintained separately. Because the data source is queried at search time, search results are always current. User credentials can be passed to the data source and authenticated by the data source itself, and queries can be processed efficiently in the data's native format.
To use federated search, you must deploy an Oracle Ultra Search search adapter, or searchlet, and create an Oracle Database source. A searchlet is a Java module deployed in the middle tier (inside OC4J) that searches the data in an enterprise information system on behalf of a user. Every searchlet is a JCA 1.0 compliant resource adapter.
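The fan-out at the core of federated search can be sketched in Python. Here a searchlet is modeled as a plain function `(query, credentials) -> [(url, score)]`. This signature is hypothetical: real Oracle Ultra Search searchlets are Java JCA 1.0 resource adapters deployed in OC4J. Sorting by raw score assumes the sources return comparable scores, which is a simplification.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Hit = Tuple[str, float]  # (document URL, relevance score)
# Hypothetical stand-in for a searchlet: searches one source on the user's behalf.
Searchlet = Callable[[str, Dict[str, str]], List[Hit]]

def federated_search(query: str, credentials: Dict[str, str],
                     searchlets: Sequence[Searchlet], limit: int = 10) -> List[Hit]:
    merged: List[Hit] = []
    for searchlet in searchlets:
        # Each source is queried live, so its results are current, and the
        # source itself decides what these credentials are allowed to see.
        merged.extend(searchlet(query, credentials))
    # Naive merge: assumes scores from different sources are comparable.
    merged.sort(key=lambda hit: hit[1], reverse=True)
    return merged[:limit]
```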
Oracle Ultra Search is released with the Oracle Database, Oracle Application Server, and Oracle Collaboration Suite. Previously, release numbers varied by Oracle product, and Oracle Ultra Search took its version number from the product in which it was packaged. As a result, a higher version number does not necessarily mean a later release. For example, Oracle Ultra Search 9.2.0 is an older release than Oracle Ultra Search 9.0.4.
The following table shows the Oracle Ultra Search versions in increasing order:
Oracle Ultra Search Version | Release Vehicle |
---|---|
Oracle Ultra Search release 8.1.7 | Oracle database version 8.1.7 |
Oracle Ultra Search release 9.0.1 | Oracle9i release 1 (9.0.1) |
Oracle Ultra Search release 9.0.2 | Oracle9iAS release 2 (9.0.2) |
Oracle Ultra Search release 9.2 | Oracle9i release 9.2 |
Oracle Ultra Search release 9.0.3 | Oracle Collaboration Suite 9.0.3 |
Oracle Ultra Search release 9.0.4 | Oracle Application Server release 10g (9.0.4) |
Oracle Ultra Search release 10.1 | Oracle Database 10g and Oracle Application Server 10g |
Note: Beginning with release 10g, the Oracle Ultra Search backend version is always the same as the database version. For OracleAS releases, the Oracle Ultra Search version number is the same as the version of the database used in the Metadata Repository. However, there is one exception: if your Metadata Repository is Oracle9i release 9.2, then you will have Oracle Ultra Search release 9.0.4.