Oracle Ultra Search Online Documentation Release 9.2
Use this page to schedule data synchronization and index optimization. Data synchronization means keeping the Ultra Search index up to date with all data sources. Index optimization means keeping the updated index optimized for best query performance.
To learn more about data synchronization, see About the Ultra Search Crawler.
The tables on this page display information about synchronization schedules. A synchronization schedule has one or more data sources assigned to it. The synchronization schedule frequency specifies when the assigned data sources should be synchronized. Schedules are sorted first by name. Within a synchronization schedule, individual data sources are listed and can be sorted by source name or source type.
Creating Synchronization Schedules
To create a new schedule, click Create New Schedule and follow these steps:
- Name the schedule.
- Choose one of four frequency modes and determine whether the schedule should automatically accept all URLs for indexing or examine URLs before indexing. You can also associate the schedule with a remote crawler profile.
- Assign data sources to the schedule. After a data source has been assigned to one schedule, it cannot be assigned to another schedule.
Note: The option to examine URLs applies only to Web data sources. Table, file, and email data sources cannot be assigned to a schedule that uses this option.
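The two assignment rules above can be modeled in a few lines. This is a minimal sketch for illustration only, not part of the Oracle Ultra Search product: the class `ScheduleRegistry`, its methods, and the source type strings are hypothetical names.

```python
# Hypothetical model of the assignment rules; none of these names are Oracle APIs.
ACCEPT_ALL = "Automatically accept all URLs for indexing"
EXAMINE = "Examine URLs before indexing"


class ScheduleRegistry:
    def __init__(self):
        self._mode = {}   # schedule name -> crawling mode
        self._owner = {}  # data source name -> schedule name

    def create(self, schedule, mode=ACCEPT_ALL):
        self._mode[schedule] = mode

    def assign(self, schedule, source, source_type):
        # Rule 1: only Web sources fit a schedule that examines URLs first.
        if self._mode[schedule] == EXAMINE and source_type != "web":
            raise ValueError(f"{source_type} source cannot use examine mode")
        # Rule 2: a data source belongs to at most one schedule.
        owner = self._owner.get(source)
        if owner is not None and owner != schedule:
            raise ValueError(f"{source!r} is already assigned to {owner!r}")
        self._owner[source] = schedule

    def unassign(self, source):
        self._owner.pop(source, None)

    def sources_of(self, schedule):
        return sorted(s for s, o in self._owner.items() if o == schedule)
```

In this model, a source must be unassigned from its current schedule before it can be assigned to a different one.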
Updating Schedules
You can update the indexing option in the Update Schedule page. If you choose to examine URLs before indexing for the schedule, then after you run the schedule, the schedule status is shown as "Indexing pending."
In data harvesting mode, begin crawling first. After crawling is done, click Examine URL to examine document URLs and their status, remove unwanted documents, and start indexing. After you click Begin index, the schedule status changes through launching, executing, scheduled, and so on.
After you click the link for a specific host, you see a list of the document URLs that have been crawled for that host. You can delete document URLs in this section.
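The crawl, examine, and index sequence can be pictured as a small state model. The sketch below is an illustration under stated assumptions: the class `ExamineRun` and its methods are hypothetical, and only the status strings "Indexing pending" and "Scheduled" come from this page.

```python
from collections import defaultdict
from urllib.parse import urlparse


class ExamineRun:
    """Hypothetical model of a schedule that examines URLs before indexing."""

    def __init__(self):
        self.status = "Scheduled"
        self.pending = []   # crawled URLs waiting for review
        self.indexed = []   # URLs that have been indexed

    def crawl(self, discovered_urls):
        # Crawling collects URLs but does not index them yet.
        self.pending = list(discovered_urls)
        self.status = "Indexing pending"

    def by_host(self):
        # Group pending URLs per host, like the per-host list in the tool.
        hosts = defaultdict(list)
        for url in self.pending:
            hosts[urlparse(url).netloc].append(url)
        return dict(hosts)

    def remove(self, url):
        # Drop an unwanted document before indexing starts.
        self.pending.remove(url)

    def begin_index(self):
        # Index whatever survived the review, then return to the idle state.
        self.indexed.extend(self.pending)
        self.pending = []
        self.status = "Scheduled"
```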
Editing Synchronization Schedules
After a synchronization schedule has been defined, you can do the following in the Synchronization Schedules List:
- To assign the schedule to either a crawler that runs on the database host or a remote crawler that runs on a separate host, click Hostname.
- To change its frequency, click the schedule interval text.
- To alter its status, click Status.
- To delete it, click Delete.
To edit its name, data source assignments, recrawl policy, or crawling mode, click Edit. When the crawler retrieves a document, it checks to see if it has changed. By default, if the document has not changed, the crawler does not process it. In certain situations, you might want to force the crawler to reprocess all documents. Click Edit to edit schedules in the following ways:
- Update schedule name. This step is optional. To change the schedule name, specify a name for the schedule, and click Update schedule name.
- Assign data sources to schedule. To assign a data source, choose one or more available sources and click >>. After a data source has been assigned to one schedule, it cannot be assigned to any other schedule. To undo the assignment of a data source, choose one or more scheduled sources and click <<.
- Update crawler recrawl policy. When the Oracle Ultra Search crawler retrieves a Web site, file, or table source document, it checks to see if that document has changed. By default, if the document has not changed, then the Oracle Ultra Search crawler does not process it. This significantly speeds up the crawling process. However, in certain situations, you might want to force the crawler to reprocess all documents.
- Update crawling mode. This lets you change the crawling mode to one of the following:
  - Automatically accept all URLs for indexing
  - Examine URLs before indexing
  - Index only
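The recrawl policy described in the list above can be illustrated with a short sketch. This page does not say how the crawler detects a change, so the content hash used here is an assumption, and `crawl_pass` and its parameters are hypothetical names rather than product APIs.

```python
import hashlib


def crawl_pass(documents, seen, force=False):
    """Return the URLs that need processing in this pass.

    documents: dict mapping URL -> document bytes as fetched now.
    seen:      dict mapping URL -> digest from the previous pass (updated here).
    force:     True models the "reprocess all documents" policy.
    """
    processed = []
    for url, body in documents.items():
        # Assumption: a SHA-256 digest stands in for the change check.
        digest = hashlib.sha256(body).hexdigest()
        if force or seen.get(url) != digest:
            processed.append(url)
            seen[url] = digest
    return processed
```

Skipping unchanged documents is what makes the default policy fast; setting `force=True` trades that speed for a full reprocessing pass.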
Launching Synchronization Schedules
You can launch a synchronization schedule in the following ways:
- Set a schedule frequency and wait for the predetermined launch time.
- Execute it immediately. To do so, click Status, then Execute immediately.
Note: Launching a synchronization schedule can take a very long time. If a schedule has been launched before, then the next time it is launched, all URLs that belong to the data sources crawled by the schedule are copied into a queue table. Depending on the number of URLs associated with those data sources, the copy operation can take a long time. The administration tool displays the schedule state as 'Launching' until the copy is complete.
Scheduled Optimization
To ensure fast query results, the Oracle Ultra Search crawler maintains an active index of all documents crawled over all data sources. This page lets you schedule when you would like the index to be optimized. The index should be optimized during hours of low usage.
Note: Increasing the crawler temporary directory size can reduce index fragmentation.
Optimization Process Duration
You can specify a maximum amount of time for the index optimization process. Specifying a longer optimization time results in a more optimized index. Alternatively, you can specify that the optimization continue until it is finished.
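The trade-off between a time limit and running to completion can be shown with a toy model. The sketch below is not how the product optimizes its index: it assumes, for illustration only, that optimization means merging index fragments, and `optimize`, `max_seconds`, and `clock` are hypothetical names.

```python
import time


def optimize(fragments, max_seconds=None, clock=time.monotonic):
    """Merge fragments until one remains or the time budget is used up.

    max_seconds=None models "continue until finished".
    """
    start = clock()
    fragments = [list(f) for f in fragments]
    while len(fragments) > 1:
        # Stop early when the configured time budget has elapsed.
        if max_seconds is not None and clock() - start >= max_seconds:
            break
        a = fragments.pop()
        b = fragments.pop()
        fragments.append(sorted(a + b))
    return fragments
```

A longer budget leaves fewer fragments behind, which corresponds to a more optimized index in this model.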
Immediate Optimization
This option lets you run the index optimization process immediately.
Copyright © 2002 Oracle Corporation. All Rights Reserved.