
Oracle Enterprise Manager Event Test Reference Manual
Release 9.2.0

Part Number A96675-01

1 Overview

The Event System within Oracle Enterprise Manager assists the DBA with automatic problem detection and correction. Using the Event System, the DBA can establish boundary thresholds for warning and critical conditions within the network environment for problem monitoring.

The Enterprise Manager base product comes with a set of event tests called Base Event Tests. These event tests consist of UpDown event tests that check whether a database, listener, or node is available. "Base Event Tests" gives a brief description of these UpDown event tests.

More comprehensive monitoring is available through Advanced Event Tests. This manual provides a complete description of all the events available through Oracle Enterprise Manager. The sub-categories of events are:

Base event tests are included as part of the Enterprise Manager base product and do not require an additional license. To use any of the other event tests, you must license the Oracle Diagnostics Pack, the Oracle Management Pack for Oracle Applications (for the Concurrent Manager events), or the Oracle Management Pack for SAP R/3 (for the SAP R/3 events).


Note:

For information on using the Oracle Enterprise Manager Event System, see the Oracle Enterprise Manager Administrator's Guide.


Base Event Tests

The Base Event Tests are provided with the Enterprise Manager base product and consist of the UpDown event tests. These event tests check whether a database, listener, or node is available. With the UpDown event for databases or listeners, you can use the Startup Database or Startup Listener task as a fixit job to restart the database or listener. See Descriptions of Base and Common Node Event Tests for a full description of these events.

Table 1-1 Base Event Tests
UpDown Event Test | Description

Data Gatherer UpDown

This event test checks whether the Intelligent Agent data gathering service on a node can be accessed from the Console. If the Intelligent Agent data gathering service is down, this test is triggered. Note: This event test is valid only for releases of the Intelligent Agent prior to release 9i.

Database UpDown

This event test checks whether the database being monitored is running. If this test is triggered, other database events are ignored.

Note: If the listener serving a database is down, this event may be triggered because the Intelligent Agent uses the listener to communicate with the database. This note applies to Intelligent Agents released before 8.0.5.

(See User Audit for additional information.)

EM Web Site UpDown

This event test, introduced in Oracle9iAS Release 2 (9.0.2), checks whether the Enterprise Manager Web Site is running. A critical alert is generated whenever the value is 0, that is, whenever the Enterprise Manager Web Site stops.

HTTP Server UpDown

This event test checks whether the HTTP server being monitored is running.

HTTP Server UpDown (Oracle9iAS Release 2 (9.0.2))

This event test checks whether the HTTP server is running. A critical alert is generated whenever the value is 0, that is, whenever the HTTP server stops.

JServ UpDown

This event test, introduced in Oracle9iAS Release 2 (9.0.2), checks whether JServ is running. A critical alert is generated whenever the value is 0, that is, whenever JServ stops.

Listener Oracle Net UpDown

This event test checks whether the listener on the node being monitored is available. This test is a listener fault management event test.

Note: The Startup Listener job task can be set up as a fixit job for automatically correcting the problem.

Node UpDown

This event test checks the status of the target node as well as the agent. If the agent is down or communication between the node and the Management Server is lost, this test is triggered.

The node up/down event test differs from other event tests because this test is initiated by the Management Server, not the Agent. By default, this check is performed every 2 minutes and is NOT controlled by the event's polling schedule.

OC4J UpDown

This event test, introduced in Oracle9iAS Release 2 (9.0.2), checks whether the OC4J server is running. A critical alert is generated whenever the value is 0, that is, whenever OC4J stops.

Web Cache UpDown

This event test, introduced in Oracle9iAS Release 2 (9.0.2), checks whether Web Cache is running. A critical alert is generated whenever the value is 0, that is, whenever Web Cache stops.

User-Defined Event Test

This event test allows you to define your own script.

Table 1-2 User-Defined Event Test
Event Test | Description

User-Defined Event Test

User-Defined Event tests allow you to define events based on your own monitoring scripts. The monitoring scripts can be written in any language, as long as the monitored node has the appropriate runtime requirements for the script. User-Defined Event tests thus allow administrators to extend the Event system to monitor any type of service or condition specific to their environments. Refer to the Oracle Enterprise Manager Administrator's Guide for more information on setting up User-Defined Event tests.

User-Defined SQL Event Test

This event test allows you to define your own SQL script.

Table 1-3 User-Defined SQL Event Test
Event Test | Description

User-Defined SQL Event Test

The User-Defined SQL event test allows you to define your own SQL script to evaluate an event condition. Write the event tests as queries (that is, SELECT statements) that return the condition values you want to monitor. These values are checked against the Critical and Warning threshold limits you specify, and the event is triggered when a threshold limit is reached.

Example: You have a custom application that runs against the Oracle database. Each time it encounters an application error, it inserts a row into a table called "error_log". Using the User-Defined SQL event test, you can write an event test that notifies you when the table contains at least 50 errors. Specifically, you define the following SQL statement:

select count(*) from error_log

This query returns the number of rows in the error_log table. Because you want a critical alert raised when the count reaches 50, you specify the Operator ">=", a Critical value of 50, and perhaps a Warning value of 30.
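The threshold comparison in this example can be sketched as follows. This is a minimal Python illustration, not the Event System's implementation: an in-memory SQLite database stands in for the Oracle database, and the names evaluate and run_sql_test are hypothetical.

```python
import sqlite3

# Thresholds from the example: operator ">=", Warning 30, Critical 50.
WARNING = 30
CRITICAL = 50


def evaluate(value, warning=WARNING, critical=CRITICAL):
    """Return 'CRITICAL', 'WARNING', or 'CLEAR' for a '>=' threshold test."""
    if value >= critical:
        return "CRITICAL"
    if value >= warning:
        return "WARNING"
    return "CLEAR"


def run_sql_test(conn):
    # The monitoring query from the example; it returns a single value.
    (count,) = conn.execute("select count(*) from error_log").fetchone()
    return count, evaluate(count)


if __name__ == "__main__":
    # SQLite stands in for Oracle here (assumption for the demo only).
    conn = sqlite3.connect(":memory:")
    conn.execute("create table error_log (msg text)")
    conn.executemany("insert into error_log values (?)", [("app error",)] * 35)
    print(run_sql_test(conn))  # 35 rows -> (35, 'WARNING')
```

Keeping the query and the comparison separate mirrors the division of labor described above: the SQL only returns a value, and the thresholds you configure decide the severity.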

If your event condition requires more complex processing than a single SELECT statement allows, you can first create a PL/SQL function that contains the extra processing steps and then use that function in the User-Defined SQL event test. (See User-Defined SQL Event Test for additional information.)

Microsoft® SQL Server Event Test

This test checks whether the Microsoft SQL Server being monitored is running.

Table 1-4 Microsoft SQL Server Event Test
Event Test | Description

UpDown (SQL Server)

This test checks whether the Microsoft SQL Server being monitored is running.

SQL Server is installed as a service on Windows NT platforms. You can start the server either from the NT Service Manager or from SQL Server Enterprise Manager. The service can also be started from the command line with the "net start mssqlserver" command.

On Windows 95 and Windows 98, where services are not available, you can start SQL Server by executing the following command:

C:\> sqlservr -c -dc <full path name of master database> -ec <location of the log file>

The master database is one of the SQL Server system databases; it holds SQL Server's dictionary information. It is similar to the Oracle SYSTEM tablespace, except that it is shared across all SQL Server databases on a node. This command can also be used to start SQL Server as a foreground process on Windows NT.

(See UpDown (SQL Server) for additional information.)

Common Node Event Tests

The Common Node Event Tests apply to all operating system platforms that can run the Oracle Intelligent Agent. The Node event tests are divided into the following categories: fault management, performance management, and space management.

See Descriptions of Base and Common Node Event Tests for a full description of these events.

Table 1-5 Node Fault Management Event Test
Event Test | Description

Data Gatherer Alert

This event test signifies that the Data Gatherer has generated errors to the Data Gatherer alert file since the last sample time. The Data Gatherer alert file is a special trace file containing a chronological log of messages and errors. Note that the Data Gatherer alert log file is different from the Database alert log file. An alert is displayed when Data Gatherer (ODG-xxxxx) messages are written to the Data Gatherer alert file.

Data Gatherer UpDown

This test checks whether the Intelligent Agent data gathering service on a node can be accessed from the Console. If the Intelligent Agent data gathering service is down, this test is triggered.

Node UpDown

This event test checks the status of the target node as well as the agent. If the agent is down or communication between the node and the Management Server is lost, this test is triggered.

The node up/down event test differs from other event tests because this test is initiated by the Management Server, not the Agent. By default, this check is performed every 2 minutes and is NOT controlled by the event's polling schedule.

Table 1-6 Node Performance Management Event Tests
Event Test | Description

CPU Paging

This test checks the CPU paging rate (kilobytes/second paged in/out) against the threshold values specified by the threshold arguments. If the paging rate exceeds a threshold value for the specified number of occurrences, a warning or critical alert is generated.

CPU Utilization

This test checks the CPU utilization (percentage used) against the threshold values specified by the threshold arguments. If the utilization exceeds a threshold value for the specified number of occurrences, a warning or critical alert is generated.

Table 1-7 Node Space Management Event Tests
Event Test | Description

Disk Full

This test checks for available space on the disk specified by the disk name parameter, such as c: (Windows) or /tmp (UNIX). If the space available is less than the values specified in the threshold arguments, then a warning or critical alert is generated.

Disk Full (%)

This event test monitors the same file systems as the Disk Full event test. The Disk Full (%) event test, however, returns the percentage of space remaining on the disk destinations.

Swap Full

This test checks for available swap space. If the space available falls below the values specified in the threshold arguments, then a warning or critical alert is generated.

Descriptions of Base and Common Node Event Tests

Alert (Data Gatherer)

This event test signifies that the Data Gatherer has generated errors to the Data Gatherer alert file since the last sample time. The Data Gatherer alert file is a special trace file containing a chronological log of messages and errors. Note that the Data Gatherer alert log file is different from the Database alert log file. An alert is displayed when Data Gatherer (ODG-xxxxx) messages are written to the Data Gatherer alert file.

Parameters

None

Output

Alert log error messages since the last sample time.

Recommended Frequency

60 seconds

User Action

Examine the Data Gatherer alert log file (alert_dg.log) for additional information. The alert log file can be found in the ORACLE_HOME/odg/log directory for the Intelligent Agent.

Note: This event test is valid only for releases of the Intelligent Agent prior to 9i.
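The check described above amounts to reading the part of the alert file written since the previous sample and picking out ODG- messages. A minimal Python sketch, assuming the log is plain text and that the caller stores the byte offset between samples; the function name and offset handling are hypothetical, not part of the Intelligent Agent:

```python
import os
import re

# Data Gatherer messages are matched by an "ODG-" prefix followed by digits
# (the "ODG-xxxxx" form mentioned above); the digit count is not assumed.
ODG_PATTERN = re.compile(r"\bODG-\d+")


def new_odg_messages(path, last_offset):
    """Return (odg_lines, new_offset) for text appended after last_offset.

    last_offset is the byte position recorded at the previous sample; pass 0
    on the first run. If the file has shrunk (for example, it was truncated),
    scanning restarts from the beginning.
    """
    if last_offset > os.path.getsize(path):
        last_offset = 0
    with open(path, "rb") as f:
        f.seek(last_offset)
        chunk = f.read()
        new_offset = f.tell()
    text = chunk.decode("utf-8", errors="replace")
    odg_lines = [line for line in text.splitlines() if ODG_PATTERN.search(line)]
    return odg_lines, new_offset
```

Called once per sample (for example, at the recommended 60-second frequency) against alert_dg.log, a non-empty result corresponds to the alert condition, and new_offset is kept for the next call.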

CPU Paging

This event test checks the CPU paging rate (kilobytes/second paged in/out) against the threshold values specified by the threshold arguments. If the paging rate exceeds a threshold value for the specified number of occurrences, a warning or critical alert is generated.

Parameters
Output

Current rate

CPU Utilization

This event test checks the CPU utilization (percentage used) against the threshold values specified by the threshold arguments. If the utilization exceeds a threshold value for the specified number of occurrences, a warning or critical alert is generated.

Parameters
Output

Current value
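The CPU Paging and CPU Utilization tests both compare each sample against warning and critical thresholds and take a number of occurrences into account. A minimal Python sketch, assuming that "number of occurrences" means the number of consecutive samples that must exceed a threshold before the alert fires; the class and its arguments are hypothetical:

```python
from collections import deque


class OccurrenceThreshold:
    """Raise WARNING/CRITICAL only after N consecutive samples exceed a limit.

    Hypothetical model of an event test with warning and critical thresholds
    plus a "number of occurrences" argument; higher values are worse.
    """

    def __init__(self, warning, critical, occurrences):
        self.warning = warning
        self.critical = critical
        # Keep only the most recent `occurrences` samples.
        self.recent = deque(maxlen=occurrences)

    def sample(self, value):
        self.recent.append(value)
        if len(self.recent) < self.recent.maxlen:
            return "CLEAR"  # not enough samples collected yet
        if all(v > self.critical for v in self.recent):
            return "CRITICAL"
        if all(v > self.warning for v in self.recent):
            return "WARNING"
        return "CLEAR"
```

With warning=80, critical=95, and occurrences=3, a single spike to 99 percent does not raise an alert; three consecutive samples above 95 do.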

Disk Full

This event test checks for available space on the disk specified by the disk name parameter, such as c: (Windows) or /tmp (UNIX). If the space available is less than the values specified in the threshold arguments, then a warning or critical alert is generated.

Parameters
Output

Disk name and space available in kilobytes on the disk.
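The Disk Full comparison is "lower is worse": an alert fires when available space drops below a threshold. A minimal Python sketch using the standard library's shutil.disk_usage; the function name and the kilobyte thresholds are illustrative assumptions, not Intelligent Agent parameters:

```python
import shutil


def disk_full_check(path, warning_kb, critical_kb):
    """Return (path, available_kb, severity); less free space is worse."""
    usage = shutil.disk_usage(path)
    available_kb = usage.free // 1024  # bytes -> kilobytes
    if available_kb < critical_kb:
        severity = "CRITICAL"
    elif available_kb < warning_kb:
        severity = "WARNING"
    else:
        severity = "CLEAR"
    return path, available_kb, severity
```

For example, disk_full_check("/tmp", 500000, 100000) reports a warning when less than about 500 MB is free and a critical alert below about 100 MB; the returned tuple matches the Output described above (disk name and space available in kilobytes).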

Disk Full (%)

This event test monitors the same file systems as the Disk Full event test. The Disk Full (%) event test, however, returns the percentage of space remaining on the disk destinations.

Parameters
Output

Disk name and percentage of space available on the disk.
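The percentage variant differs only in what is compared. A minimal Python sketch, again built on shutil.disk_usage, with hypothetical function names and thresholds expressed as percent of capacity still available:

```python
import shutil


def percent_available(free, total):
    """Percentage of capacity still available (0-100); 0.0 if total is 0."""
    return 100.0 * free / total if total else 0.0


def disk_full_pct_check(path, warning_pct, critical_pct):
    """Return (path, percent_free, severity); a lower percentage is worse."""
    usage = shutil.disk_usage(path)
    pct = percent_available(usage.free, usage.total)
    if pct < critical_pct:
        return path, pct, "CRITICAL"
    if pct < warning_pct:
        return path, pct, "WARNING"
    return path, pct, "CLEAR"
```

Percentages make one threshold setting meaningful across disks of very different sizes, which is the practical reason to prefer Disk Full (%) over the absolute kilobyte test.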

EM Web Site UpDown

See EM Web Site UpDown.

HTTP Server UpDown

This event test checks whether the HTTP server being monitored is running.

Parameters

None

HTTP Server UpDown for Oracle9iAS Release 2 (9.0.2)

See HTTP Server UpDown.

JServ UpDown

See JServ UpDown.

OC4J UpDown

See OC4J UpDown.

Oracle Net UpDown

This event test checks whether the listener on the node being monitored is available. This event test is a listener fault management event test.

Parameters

None

User Action

The Startup Listener job task can be set up as a fixit job to correct the problem automatically. To prevent the fixit job from running when the listener has been brought down intentionally, turn off the fixit job option.

Swap Full

This event test checks for available swap space. If the space available falls below the values specified in the threshold arguments, then a warning or critical alert is generated.

Parameters
Output

Percentage of available space.
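The Swap Full test can be sketched the same way. The example below assumes a Linux host, where swap totals appear as SwapTotal and SwapFree lines (in kB) in /proc/meminfo; other platforms report swap differently, and both function names are hypothetical:

```python
def swap_percent_available(meminfo_text):
    """Parse /proc/meminfo-style text; return % of swap free, or None if no swap."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])  # /proc/meminfo values are in kB
    total = fields.get("SwapTotal", 0)
    if total == 0:
        return None  # no swap configured on this host
    return 100.0 * fields.get("SwapFree", 0) / total


def swap_full_severity(pct, warning_pct, critical_pct):
    """Classify a swap-free percentage; a lower percentage is worse."""
    if pct < critical_pct:
        return "CRITICAL"
    if pct < warning_pct:
        return "WARNING"
    return "CLEAR"
```

On Linux, you would read the contents of /proc/meminfo once per sample, pass them to swap_percent_available, and hand a non-None result to swap_full_severity.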

UpDown (Data Gatherer)

This event test checks whether the Intelligent Agent data gathering service on a node can be accessed from the Console. If the Intelligent Agent data gathering service is down, this test is triggered.

Parameters

None

Output

None

Recommended Frequency

60 seconds

User Action

Restart the Oracle Data Gatherer.

Note: This event test is valid only for releases of the Intelligent Agent prior to 9i.

UpDown (Node)

This event test checks the status of the target node as well as the agent. If the agent is down or communication between the node and the Management Server is lost, this test is triggered.

The node up/down event test differs from other event tests because this test is initiated by the Management Server, not the Agent. By default, this check is performed every 2 minutes and is NOT controlled by the event's polling schedule.

Parameters

None
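The Management Server's periodic check, which runs on its own fixed interval rather than the event's polling schedule, can be approximated by a simple TCP connection attempt. This is a minimal Python sketch under stated assumptions: the agent host and port are placeholders you supply, a successful TCP connect is treated as "up", and the actual Management Server protocol is not reproduced. The function names are hypothetical.

```python
import socket
import time

POLL_INTERVAL_SECONDS = 120  # the default 2-minute interval described above


def agent_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def poll_node(host, port, rounds, interval=POLL_INTERVAL_SECONDS):
    """Yield 'UP' or 'DOWN' once per interval, on a schedule of its own."""
    for i in range(rounds):
        yield "UP" if agent_reachable(host, port) else "DOWN"
        if i < rounds - 1:
            time.sleep(interval)
```

A "DOWN" result here cannot distinguish a failed node from a stopped agent or a network fault; that distinction is what the VNI messages below provide.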

Possible Error Messages and User Actions

If the node Up/Down event test identifies a problem, one of the following messages may be generated:


VNI-4009: Cannot contact agent on the node -- agent may be down or network communication to the node has failed.

Cause: There may be network congestion or problems with the hardware/software on the node.

Action: Check the node and make sure it is operational. Check the network connection by pinging the node. For network problems, contact your network administrator.


VNI-4038: Out of memory! Large job outputs could cause this.

Cause: There is a problem allocating memory on the Management Server node.

Action: Free up more memory on the node running the Management Server.


VNI-4040: Agent state is corrupted.

Cause: The Oracle Management Server repository is out of sync with the agent's queue files. The queue files of the agent may have been corrupted or deleted. This could be caused in one of three ways:

Situation 1: A new agent was installed into a new Oracle home but the "*.q" files were not migrated over from the old Oracle home.

Action: Bring the agent down, copy the "*.q" files from the old Oracle home into the new Oracle home, and bring the agent back up. Refresh the node from within the Oracle Enterprise Manager console. Ping the node to see whether the Oracle Management Server and agent are now synchronized.

Situation 2: The "*.q" files were deleted.

Action: Remove the node from the Oracle Enterprise Manager console navigator. This will prompt you to remove existing jobs/events. Once the jobs and events have been removed, collapse and expand the console navigator to refresh the tree and see that the node is removed.

Situation 3: Two or more agents are on the same node. At some point, jobs and events were submitted against one agent. That agent was brought down and another agent was brought up. Jobs and events were then submitted against the second agent.

Action: Bring up the correct agent and refresh the node from the Oracle Enterprise Manager console navigator.


VNI-4044: Cannot contact agent. Node may be down, or the network may be down or slow.

Cause: There are problems contacting the node itself.

Action: Check the node and make sure it is up and running. Check the physical network connections to the node. Try doing a "ping" and make sure the node is responding. If there are network problems, contact your network administrator.


VNI-4045: Cannot contact agent. Agent is not running on the node.

Cause: The node is accessible, but the agent is not running.

Action: Start the agent. On Windows NT, start the agent service from Services in the Control Panel. On UNIX, for releases of the Intelligent Agent prior to 9i, use: lsnrctl dbsnmp_start. For release 9i of the Intelligent Agent, use: agentctl start [agent].


VNI-4046: Agent is not responding. Agent may be busy or in an invalid state.

Cause: The agent is not able to respond in a timely manner. This is most likely due to internal communication problems with the agent.

Action: Restart the agent. If this error occurs repeatedly, turn on agent tracing and contact Oracle Worldwide Support.


VNI-4047: Error accessing queue files on the Agent node.

Cause: There are problems accessing .q files on the agent node.

Action: Check the $ORACLE_HOME/network/agent directory, where $ORACLE_HOME is the directory where the agent is installed. Make sure there is disk space available and permissions are set such that the agent executable (dbsnmp) has read/write permissions on that directory and its files.


VNI-4048: Agent internal error (For example, Out of memory, Operating system error)

Cause: This is an internal problem.

Action: Try restarting the agent. If the problem occurs again, turn on agent tracing and contact Oracle Worldwide Support.


VNI-4049: Communications error. (e.g., oms communications software error)

Cause: This is usually a transient type of error.

Action: Check the network connection between the OMS node and the agent node.

UpDown (SQL Server)

This event test checks whether the Microsoft® SQL Server being monitored is running.

Parameters

None

User Action

SQL Server is installed as a service on Windows NT platforms. You can start the server either from the NT Service Manager or from SQL Server Enterprise Manager. The service can also be started from the command line with the "net start mssqlserver" command.

Use the following command to start SQL Server as a foreground process on Windows NT:

C:\> sqlservr -c -dc <full path name of master database> -ec <location of the log file>

The master database is one of the SQL Server system databases; it holds SQL Server's dictionary information. It is similar to the Oracle SYSTEM tablespace, except that it is shared across all SQL Server databases on a node.

Web Cache UpDown

See Web Cache UpDown.


Copyright © 2001, 2002 Oracle Corporation.

All Rights Reserved.