Proactive Database Monitoring

The Oracle Database makes it easy to proactively monitor the health and performance of your database. It monitors the vital signs (or metrics) related to database health, analyzes the workload running against the database, and automatically identifies any issues that need your attention as an administrator. The identified issues are either presented as alerts and performance findings in Enterprise Manager or, if you prefer, can be sent to you through e-mail.

This section discusses the following topics:

Alerts
Performance Self-Diagnostics: Automatic Database Diagnostics Monitor
Monitoring General Database State and Workload
Managing Alerts

Alerts

Alerts help you monitor your database. Most alerts notify you of when particular metric thresholds are crossed. For each alert, you can set critical and warning threshold values. These threshold values are meant to be boundary values that when crossed indicate that the system is in an undesirable state. For example, when a tablespace becomes 97 percent full this can be considered undesirable, and Oracle will generate a critical alert.

Other alerts correspond to database events such as Snapshot Too Old or Resumable Session suspended. These types of alerts indicate that the event has occurred.

In addition to notification, you can set alerts to perform some action such as running a script. For instance, scripts that shrink tablespace objects can be useful for a Tablespace Usage warning alert.

By default, Oracle enables the following alerts:

Table Space Usage (warning at 85 percent full, critical at 97 percent full)
Snapshot Too Old
Recovery Area Low on Free Space
Resumable Session Suspended

You can modify these alerts or enable others by setting their metrics.

For more information, see "Managing Alerts".

Performance Self-Diagnostics: Automatic Database Diagnostics Monitor

Oracle Database includes a self-diagnostic engine called the Automatic Database Diagnostic Monitor (ADDM). ADDM makes it possible for the Oracle Database to diagnose its own performance and determine how any identified problems can be resolved.To facilitate automatic performance diagnosis using ADDM, Oracle Database periodically collects snapshots of the database state and workload. The default collection interval is one hour. Snapshots provide a statistical summary of the state of the system at any point in time. These snapshots are stored in the Automatic Workload Repository (AWR), residing in the SYSAUX tablespace. The snapshots are stored in this repository for a set time (a week by default) before they are purged to make room for new snapshots.ADDM analyses data captured in AWR to determine the major problems in the system and in many cases recommends solutions and quantifies expected benefits.

Generally, the performance problems ADDM can flag include the following:

Resource bottlenecks, such as when your database is using large amounts of CPU time or memory, for example as a result of high load SQL statements
Poor connection management, such as when your application is making too many logons to the database
Lock contention in a multiuser environment, when a lock to update data causes other sessions to wait, slowing down the database

Monitoring General Database State and Workload

The Enterprise Manager home page enables you to monitor the health of your database. It provides a central place for general database state information and is updated periodically. This page reports information that is helpful for monitoring database state and workload. See Figure 10-1, "Enterprise Manager Home Page".

Figure 10-1 Enterprise Manager Home Page

Description of "Figure 10-1 Enterprise Manager Home Page"

The General section provides a quick view of the database, such as whether the database is Up or Down, the time the database was last started, instance name, and host name.

The Host CPU section shows the percentage of CPU time used in the overall system. This chart breaks down CPU percentage into time used by the database and time used by other processes. If your database is taking up most of the CPU time, you can explore the cause further by looking at the Active Sessions summary. This summary tells you what the database processes are doing, such as which ones are using CPU, or waiting on I/O. You can drill down for more information by clicking a link, such as CPU.

If other processes are taking up most of your CPU time, this indicates that some other application running on the database machine may be causing the performance problems. To investigate this further, click the Host link under the General section. This link takes you to the machine overview page where you can see some general information about the machine, such as what operating system it is running, how long the machine has been up, and any potential problems.

Click the Performance property page to see a summary of CPU utilization, average active sessions, instance disk I/O, and instance throughput. Additional Monitoring Links enable you to drill down to Top Activity and other data. The type of actions you can take to improve host performance depends on your system, but can include eliminating unnecessary processes, adding memory, or adding CPUs.

On the Home page, the Diagnostic Summary summarizes the latest ADDM performance findings. This section also summarizes any critical or warning alerts listed in the Alerts section.

The Alerts table provides information about any alerts that have been issued along with the severity rating of each. An alert is a notification that a metric threshold has been crossed. For example, an alert can be triggered when a tablespace is running out of space.

When an alert is triggered, the name of the metric causing it is displayed in the Name column. The severity icon (Warning or Critical) is displayed, along with time of alert, and alert value. You can click the message to learn more about the cause. For more information, see "Alerts" .

The Performance Analysis section provides a quick summary of the latest ADDM findings, highlighting the issues that are causing the most significant performance impact. This analysis can identify problems such as SQL statements that are consuming significant database time. See Figure 10-2, "Performance Analysis on Home Page".

Figure 10-2 Performance Analysis on Home Page

Description of "Figure 10-2 Performance Analysis on Home Page"

For more information, see "Performance Self-Diagnostics: Automatic Database Diagnostics Monitor".

Managing Alerts

The following sections describe how to manage alerts.

Viewing Metrics and Thresholds

Metrics are a set of statistics for certain system attributes as defined by Oracle. They are computed and stored by the Automatic Workload Repository, and are displayed on the All Metrics page, which is viewable by clicking All Metrics under Related Links on the Database Home page.

Figure 10-3 shows a portion of the All Metrics page, displaying some of the metrics that AWR computes. When you click a specific metric link, a detail page appears, with more information about the metric. Online Help for this page gives you a description of the metric.

Figure 10-3 All Metrics Page

Description of "Figure 10-3 All Metrics Page"

For each of these metrics, you are able to define warning and critical threshold values, and whenever the threshold is crossed, Oracle issues an alert.

Alerts are displayed on the Database Home page under the Alerts heading (or Related Alerts for non-database alerts such as a component of Oracle Net). Figure 10-4 shows five alerts including two Tablespace Full warnings.

Figure 10-4 Alerts Section of Enterprise Manager Home Page

This image of the alerts section shows all current alerts.

Description of "Figure 10-4 Alerts Section of Enterprise Manager Home Page"

Setting thresholds is discussed in "Setting Metric Thresholds". Actions you might take to respond to alerts are discussed in "Responding to Alerts".

When the condition that triggered the alert is resolved and the metric's value is no longer outside the boundary, Oracle clears the alert. Metrics are important for measuring the health of the database and serve as input for self-tuning and recommendations made by Oracle advisors.

Setting Metric Thresholds

Oracle provides a set of predefined metrics, some of which initially have thresholds defined for them. There may be times when you want to set thresholds for other metrics, or you want to alter existing threshold settings.

One means of setting a threshold was introduced in "Creating a Tablespace", where you set warning and critical thresholds on the amount of space consumed in a tablespace.

A more general means of setting thresholds is available using the Edit Thresholds page.

The following steps describe how to set metric thresholds:

From the Database Home page, click Manage Metrics under the Related Links heading.

The Manage Metrics page is displayed. It displays the existing thresholds for metrics and any response actions that have been specified.
Click Edit Thresholds.

The Edit Thresholds page, shown in Figure 10-5, is displayed. On this page you can enter new Warning Threshold and Critical Threshold values, or you can modify existing values. In the Response Actions field, you can enter an operating system command or a script that you want executed when an alert is issued because a threshold has been crossed.

For example, to modify the warning threshold for Tablespace Space Used (%) metric, you can enter 87 as the percentage. Under Response Actions, you can optionally specify a fully qualified path to an operating system script that will clean up or increase the size of the tablespace.

Oracle provides a number of alerts that are not enabled by default. You can enable them by specifying threshold values. For example, to enable the alert for Cumulative Logons (for each second), enter 10 for warning and 25 for critical. This will cause the system to warn you when the number of logons each second exceeds 10.

Click OK to save your changes.

Figure 10-5 Edit Thresholds Page

Description of "Figure 10-5 Edit Thresholds Page"
For more comprehensive management of threshold settings for the different alerts, click the radio button in the Select column for that metric, then click Specify Multiple Thresholds.

The Specify Multiple Thresholds: metric_name page appears. This page enables you to specify metric thresholds for each object.

For example, to set the Tablespace Space Used (%) metric thresholds for individual tablespaces, select this metric then click Specify Multiple Thresholds. Enter the tablespace name and its warning and critical values. Click OK.

Responding to Alerts

When you receive an alert, follow any recommendations it provides, or consider running ADDM or another advisor, as appropriate to get more detailed diagnostics of system or object behavior.

For example, if you receive a Tablespace Space Usage alert, you might take a corrective measure by running the Segment Advisor on the tablespace to identify possible objects for shrinking. You can then shrink the objects to free space. See "Reclaiming Wasted Space" in Chapter 6, "Managing Database Storage Structures".

Additionally, as a response, you can set a corrective script to run as described in "Setting Metric Thresholds".

Clearing Alerts

Most alerts, such as the Out of Space alert, are cleared automatically when the cause of the problem disappears. However, other alerts such as Generic Alert Log Error are sent to you for notification and need to be acknowledged by you, the system administrator.

After taking the necessary corrective measures, you can acknowledge an alert by clearing or purging it. Clearing an alert sends the alert to the Alert History, which is viewable from the home page under Related Links. Purging an alert removes it from the Alert History.

To clear an alert such as Generic Alert Log Error, from the Home page under Diagnostic Summary, click the Alert Log link.The Alert Log Errors page appears. Select the alert to clear and click Clear. To purge an alert, select it and click Purge. You can also Clear Every Open Alert or Purge Every Alert using these buttons.

Setting up Direct Alert Notification

Enterprise Manager will display all alerts on the home page. However, you can optionally specify that Enterprise Manager provide you direct notification when specific alerts arise. For example, if you specify that you want e-mail notification for critical alerts, and you have a critical threshold set for the system response time for each call metric, then you might be sent an e-mail containing a message similar to the following:

Host Name=mydb.us.mycompany.com
Metric=Response Time per Call
Timestamp=08-NOV-2003 10:10:01 (GMT -7:00)
Severity=Critical
Message=Response time per call has exceeded the threshold. See the
lattest ADDM analysis.
Rule Name=
Rule Owner=SYSMAN

The e-mail contains a link to the host name and the latest ADDM analysis.

By default, alerts in critical state such as DB Down, Generic Alert Log Error Status, and Tablespace Used are set up for notification. However, to receive these notifications, you must set up your e-mail information.You can do so as follows:

From any Database Control page, click the Setup link, which is visible in the header and footer area.
On the Setup page, select Notification Methods.
Enter the required information into the Mail Server portion of the Notifications Methods page. See the online help for assistance.

There are other methods of notification, including scripts and SNMP (Simplified Network Management Protocol) traps. The latter can be used to communicate with third-party applications.

So far, you have set up a method of notification, but you have not set up an e-mail address to receive the notification. To do so, complete the following steps.
From any Database Control page, click the Preferences link, which is visible in the header and footer area.
On the Preferences page, select General. Enter your e-mail address in the E-mail Addresses section.
You can optionally edit notification rules, such as to change the severity state for receiving notification. To do so, select Notification Rules. The Notification Rules page appears. For more information about configuring notification rules, see Oracle Enterprise Manager Advanced Configuration.