Oracle Enterprise Manager Event Test Reference Manual Release 9.2.0 Part Number A96675-01
The Event System within Oracle Enterprise Manager assists the DBA with automatic problem detection and correction. Using the Event System, the DBA can establish boundary thresholds for warning and critical conditions within the network environment for problem monitoring.
The Enterprise Manager base product comes with a set of event tests called Base Event Tests. These event tests consist of UpDown event tests that check whether a database, listener, or node is available. "Base Event Tests" gives a brief description of these UpDown event tests.
More comprehensive monitoring is available through Advanced Event Tests. This manual provides a complete description of all the events available through Oracle Enterprise Manager. The sub-categories of events are:
Base event tests are included as part of the Enterprise Manager base product and do not require an additional license. To use all the other event tests, you must have licensed the Oracle Diagnostics Pack, the Oracle Management Pack for Oracle Applications (for the Concurrent Manager events), or the Oracle Management Pack for SAP R/3 (for the SAP R/3 events).
Note: For information on using the Oracle Enterprise Manager Event System, see the Oracle Enterprise Manager Administrator's Guide.
The Base Event Tests are provided with the Enterprise Manager base product and consist of the UpDown event tests. These event tests check whether a database, listener, or node is available. With the UpDown event for databases or listeners, you can use the Startup Database or Startup Listener task as a fixit job to restart the database or listener. See Descriptions of Base and Common Node Event Tests for a full description of these events.
UpDown Event Test | Description |
---|---|
Data Gatherer UpDown | This event test checks whether the Intelligent Agent data gathering service on a node can be accessed from the Console. If the Intelligent Agent data gathering service is down, this test is triggered. Note: This event test is valid only for releases of the Intelligent Agent prior to release 9i. |
Database UpDown | This event test checks whether the database being monitored is running. If this test is triggered, other database events are ignored. Note: If the listener serving a database is down, this event may be triggered because the Intelligent Agent uses the listener to communicate with the database. This note applies to Intelligent Agents released before 8.0.5. (See User Audit for additional information.) |
EM Web Site UpDown | This event test, introduced in Oracle9iAS Release 2 (9.0.2), checks whether the Enterprise Manager Web Site is running. A critical alert is generated whenever the value is 0, that is, whenever the Enterprise Manager Web Site stops. |
HTTP Server UpDown | This event test checks whether the HTTP server being monitored is running. |
HTTP Server UpDown (Oracle9iAS Release 2 (9.0.2)) | This event test checks whether the HTTP server is running. A critical alert is generated whenever the value is 0, that is, whenever the HTTP server stops. |
JServ UpDown | This event test, introduced in Oracle9iAS Release 2 (9.0.2), checks whether JServ is running. A critical alert is generated whenever the value is 0, that is, whenever JServ stops. |
Listener Oracle Net UpDown | This event test checks whether the listener on the node being monitored is available. This test is a listener fault management event test. Note: The Startup Listener job task can be set up as a fixit job for automatically correcting the problem. |
Node UpDown | This event test checks the status of the target node as well as the agent. If the agent is down or communication between the node and the Management Server is lost, this test is triggered. The node up/down event test differs from other event tests because this test is initiated by the Management Server, not the Agent. By default, this check is performed every 2 minutes and is NOT controlled by the event's polling schedule. |
OC4J UpDown | This event test, introduced in Oracle9iAS Release 2 (9.0.2), checks whether the OC4J server is running. A critical alert is generated whenever the value is 0, that is, whenever OC4J stops. |
Web Cache UpDown | This event test, introduced in Oracle9iAS Release 2 (9.0.2), checks whether Web Cache is running. A critical alert is generated whenever the value is 0, that is, whenever Web Cache stops. |
This event test allows you to define your own script.
Event Test | Description |
---|---|
User-Defined Event Test | User-Defined Event tests allow you to define events based on your own monitoring scripts. The monitoring scripts can be written in any language, as long as the monitored node has the appropriate runtime requirements for the script. User-Defined Event tests thus allow administrators to extend the Event system to monitor any type of service or condition specific to their environments. Refer to the Oracle Enterprise Manager Administrator's Guide for more information on setting up User-Defined Event tests. |
This event test allows you to define your own SQL script.
Event Test | Description |
---|---|
User-Defined SQL Event Test | The User-Defined SQL event test allows you to define your own SQL script that evaluates an event test. The event tests you define should be written as queries (that is, SELECT statements) that return the condition values you are monitoring. These values are checked against the Critical and Warning threshold limits you specify, and the event is triggered if the threshold limits are reached. Example: You have a custom application that runs against the Oracle database. Each time it finds an application error, it creates an entry in a table called "error_log". Using the User-Defined SQL event test, you can write an event test that notifies you when it finds at least 50 errors. Specifically, you define the following SQL statement: "select count(*) from error_log". This returns the number of rows in the error_log table. Since you want a critical alert raised when the count reaches at least 50, you specify the Operator ">=", a Critical value of 50, and perhaps a Warning value of 30. If your query for the event condition requires more complex processing than is allowed in a single SELECT statement, you can first create a PL/SQL function that contains the extra processing steps, and then use the PL/SQL function with the User-Defined SQL event test. (See User-Defined SQL Event Test for additional information.) |
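The threshold logic in the error_log example can be sketched in a few lines. This is only an illustration, not how the Intelligent Agent evaluates the test: it uses Python with an in-memory SQLite database as a stand-in for the Oracle database, and the names `evaluate_sql_event`, `demo`, and the `OPERATORS` table are hypothetical.

```python
import operator
import sqlite3

# Comparison operators an event test might offer (an assumed set, for illustration).
OPERATORS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
    "==": operator.eq,
}


def evaluate_sql_event(conn, query, op, warning, critical):
    """Run a single-value query and classify the result against the thresholds."""
    value = conn.execute(query).fetchone()[0]
    compare = OPERATORS[op]
    # Check the critical threshold first so it takes precedence over warning.
    if compare(value, critical):
        return value, "critical"
    if compare(value, warning):
        return value, "warning"
    return value, "clear"


def demo(error_count):
    """Build a throwaway error_log table with error_count rows and evaluate it."""
    conn = sqlite3.connect(":memory:")  # stand-in for the monitored database
    conn.execute("create table error_log (message text)")
    conn.executemany(
        "insert into error_log values (?)",
        [("application error",)] * error_count,
    )
    return evaluate_sql_event(
        conn, "select count(*) from error_log", ">=", warning=30, critical=50
    )


if __name__ == "__main__":
    for n in (10, 30, 50):
        print(n, demo(n))
```

Checking the critical threshold before the warning threshold matters here: with the ">=" operator, any value that satisfies the Critical limit also satisfies the Warning limit.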
This test checks whether the Microsoft SQL Server being monitored is running.
Event Test | Description |
---|---|
UpDown (SQL Server) | This test checks whether the Microsoft SQL Server being monitored is running. SQL Server is installed as a service on Windows NT platforms. You can start the server either from the NT service manager or from SQL Server Enterprise Manager. This service can also be started from the command line using the "net start mssqlserver" command. On Windows 95 and Windows 98 environments, where services are not available, SQL Server can be started by executing the following command: "C:\> sqlservr -c -dc <full path name of master database> -ec <location of the log file>". The master database is one of the SQL Server system databases and holds its dictionary information. It is similar to the Oracle SYSTEM tablespace except that it is shared across all SQL Server databases on a node. This command can also be used to start SQL Server as a foreground process on Windows NT. (See UpDown (SQL Server) for additional information.) |
The Common Node Event Tests apply to all operating system platforms that can run the Oracle Intelligent Agent. The Node event tests are divided into the following categories:
See Descriptions of Base and Common Node Event Tests for a full description of these events.
This event test signifies that the Data Gatherer has generated errors to the Data Gatherer alert file since the last sample time. The Data Gatherer alert file is a special trace file containing a chronological log of messages and errors. Note that the Data Gatherer alert log file is different than the Database alert log file. An alert is displayed when Data Gatherer (ODG-xxxxx) messages are written to the Data Gatherer alert file.
None
Alert log error messages since the last sample time.
60 seconds
Examine the Data Gatherer alert log file (alert_dg.log) for additional information. The alert log file can be found in the ORACLE_HOME/odg/log directory for the Intelligent Agent.
Note: This event test is valid only for releases of the Intelligent Agent prior to 9i.
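When examining the alert file by hand, the idea of "messages written since the last sample time" can be approximated by remembering how far the file was read on the previous pass. The sketch below is an illustration only, not the Data Gatherer's own mechanism: the function name `new_alert_messages` is hypothetical, and the pattern assumes that ODG-xxxxx codes carry five digits.

```python
import re

# ODG-xxxxx message codes, as described above; the five-digit form is an assumption.
ODG_PATTERN = re.compile(r"ODG-\d{5}")


def new_alert_messages(path, last_offset=0):
    """Return (messages, new_offset) for ODG lines written after last_offset.

    last_offset is the byte position reached by the previous call, so each
    call only reports lines appended since the last sample.
    """
    messages = []
    with open(path, "rb") as log:
        log.seek(last_offset)
        for raw in log:
            line = raw.decode("utf-8", errors="replace").rstrip("\r\n")
            if ODG_PATTERN.search(line):
                messages.append(line)
        new_offset = log.tell()
    return messages, new_offset
```

A caller would pass the path to alert_dg.log under ORACLE_HOME/odg/log, keep the returned offset, and supply it again on the next pass, for example once every 60 seconds.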
This event test checks the CPU paging rate (kilobytes/second paged in/out) against the threshold values specified by the threshold arguments. If the number of occurrences exceeds the values specified, then a warning or critical alert is generated.
Current rate
This event test checks for the CPU utilization (percentage used) against the threshold values specified by the threshold arguments. If the number of occurrences exceeds the values specified, then a warning or critical alert is generated.
Current value
This event test checks for available space on the disk specified by the disk name parameter, such as c: (Windows) or /tmp (UNIX). If the space available is less than the values specified in the threshold arguments, then a warning or critical alert is generated.
Disk name and space available in kilobytes on the disk.
This event test monitors the same file systems as the Disk Full event test. The Disk Full (%) event test, however, returns the percentage of space remaining on the disk destinations.
Disk name and percentage of space available on the disk.
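The two disk checks differ only in the unit being compared. A minimal sketch of that comparison, assuming Python's standard `shutil.disk_usage` as the measurement source (the event tests themselves are run by the Intelligent Agent, and the function names below are hypothetical):

```python
import shutil


def classify_free_space(free, warning, critical):
    """Alert when free space falls below a threshold; critical takes precedence."""
    if free < critical:
        return "critical"
    if free < warning:
        return "warning"
    return "clear"


def disk_full_kb(path, warning_kb, critical_kb):
    """Disk Full style check: disk name, free kilobytes, and alert level."""
    free_kb = shutil.disk_usage(path).free // 1024
    return path, free_kb, classify_free_space(free_kb, warning_kb, critical_kb)


def disk_full_percent(path, warning_pct, critical_pct):
    """Disk Full (%) style check: disk name, free percentage, and alert level."""
    usage = shutil.disk_usage(path)
    free_pct = 100.0 * usage.free / usage.total
    level = classify_free_space(free_pct, warning_pct, critical_pct)
    return path, round(free_pct, 1), level


if __name__ == "__main__":
    # Threshold values here are arbitrary examples.
    print(disk_full_kb(".", warning_kb=500_000, critical_kb=100_000))
    print(disk_full_percent(".", warning_pct=20, critical_pct=10))
```

Because these tests alert when space falls below the thresholds, the Critical value is expected to be lower than the Warning value, which is the reverse of tests such as CPU Utilization that alert on values exceeding a limit.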
See EM Web Site UpDown.
This event test checks whether the HTTP server being monitored is running.
None
See HTTP Server UpDown.
See JServ UpDown.
See OC4J UpDown.
This event test checks whether the listener on the node being monitored is available. This event test is a listener fault management event test.
None
The Startup Listener job task can be set up as a fixit job for automatically correcting the problem. To avoid the fixit job executing when the listener was brought down intentionally, turn off the fixit job option.
This event test checks for available swap space. If the space available falls below the values specified in the threshold arguments, then a warning or critical alert is generated.
Percentage of available space.
This event test checks whether the Intelligent Agent data gathering service on a node can be accessed from the Console. If the Intelligent Agent data gathering service is down, this test is triggered.
None
None
60 seconds
Restart the Oracle Data Gatherer.
Note: This event test is valid only for releases of the Intelligent Agent prior to 9i.
This event test checks the status of the target node as well as the agent. If the agent is down or communication between the node and the Management Server is lost, this test is triggered.
The node up/down event test differs from other event tests because this test is initiated by the Management Server, not the Agent. By default, this check is performed every 2 minutes and is NOT controlled by the event's polling schedule.
None
If the node Up/Down event test identifies a problem, one of the following messages may be generated:
Cause: There may be network congestion or problems with the hardware/software on the node.
Action: Check the node and make sure it is operational. Check the network connection by pinging the node. For network problems, contact your network administrator.
Cause: There is a problem allocating memory on the Management Server node.
Action: Free up more memory on the node running the Management Server.
Cause: The Oracle Management Server repository is out of sync with the agent's queue files. The queue files of the agent may have been corrupted or deleted. This could be caused in one of three ways:
Situation 1: A new agent was installed into a new Oracle home but the "*.q" files were not migrated over from the old Oracle home.
Action: Bring the agent down, copy over the old "*.q" files, and bring the agent back up. Refresh the node from within the Oracle Enterprise Manager console. Ping the node to see if the Oracle Management Server and agent are now synchronized.
Situation 2: The "*.q" files were deleted.
Action: Remove the node from the Oracle Enterprise Manager console navigator. This will prompt you to remove existing jobs/events. Once the jobs and events have been removed, collapse and expand the console navigator to refresh the tree and see that the node is removed.
Situation 3: Two or more agents are on the same node. At some point, jobs and events were submitted against one agent. That agent was brought down and another agent was brought up. Jobs and events were then submitted against the second agent.
Action: Bring up the correct agent and refresh the node from the Oracle Enterprise Manager console navigator.
Cause: There are problems contacting the node itself.
Action: Check the node and make sure it is up and running. Check the physical network connections to the node. Try doing a "ping" and make sure the node is responding. If there are network problems, contact your network administrator.
Cause: The node is accessible, but the agent is not running.
Action: Start the agent. For Windows NT, start the agent service from the Control Panel Services. For UNIX, for releases of the Intelligent Agent prior to 9i, use: lsnrctl dbsnmp_start. For release 9i of the Intelligent Agent use: agentctl start [agent].
Cause: The agent is not able to respond in a timely manner. This is most likely due to internal communication problems with the agent.
Action: Restart the agent. If this error occurs repeatedly, turn on agent tracing and contact Oracle Worldwide Support.
Cause: There are problems accessing .q files on the agent node.
Action: Check the $ORACLE_HOME/network/agent directory, where $ORACLE_HOME is the directory where the agent is installed. Make sure there is disk space available and permissions are set such that the agent executable (dbsnmp) has read/write permissions on that directory and its files.
Cause: This is an internal problem.
Action: Try restarting the agent. If the problem occurs again, turn on agent tracing and contact Oracle Worldwide Support.
Cause: This is usually a transient type of error.
Action: Check the network connection between the OMS node and the agent node.
This event test checks whether the Microsoft® SQL Server being monitored is running.
None
SQL Server is installed as a service on Windows NT platforms. You can start the server either from the NT service manager or from SQL Server Enterprise Manager. This service can also be started from the command line using the "net start mssqlserver" command.
Use the following command to start SQL Server as a foreground process on Windows NT:
C:\> sqlservr -c -dc <full path name of master database> -ec <location of the log file>
Master database is one of the SQL server system databases which holds its dictionary information. This master database is similar to the Oracle SYSTEM tablespace except that it is shared across all SQL Server databases on a node.
See Web Cache UpDown.
Copyright © 2001, 2002 Oracle Corporation. All Rights Reserved.