A Troubleshooting Enterprise Manager

This appendix describes solutions to common problems and scenarios that you might encounter when installing/upgrading Enterprise Manager.

Configuration Assistants Fail During Enterprise Manager Installation

During the installation, if any of the configuration assistants fail to run successfully, you can choose to run the configuration assistants in standalone mode.

Invoking the One-Off Patches Configuration Assistant in Standalone Mode

During the installation process, this configuration assistant is executed before the Management Service Configuration Assistant is run.

This configuration assistant applies the one-off patches that are required for a successful Enterprise Manager 10g R2 installation.

To run this configuration assistant in standalone mode, you must execute the following command from the Management Service Oracle home:

<OMS_HOME>/perl/bin/perl <OMS_HOME>/install/oneoffs/applyOneoffs.pl

Invoking the Database Configuration Assistant in Standalone Mode

To run the DBConfig Assistant, you must invoke the runConfig.sh as:

DB_Home/oui/bin/runConfig.sh ORACLE_HOME=<DB_HOME> ACTION=Configure MODE=Perform

Invoking the OMS Configuration Assistant in Standalone Mode

To run the OMSConfig Assistant, you must invoke the runConfig.sh as:

OMS_Home/oui/bin/runConfig.sh ORACLE_HOME=<OMS_HOME> ACTION=Configure MODE=Perform COMPONENT_XML={oracle.sysman.top.oms.10_2_0_1_0.xml}

Invoking the Agent Configuration Assistant in Standalone Mode

To run the AgentConfig Assistant, you must invoke the runConfig.sh as:

Agent_Home/oui/bin/runConfig.sh ORACLE_HOME=<AGENT_HOME> ACTION=Configure MODE=Perform COMPONENT_XML={oracle.sysman.top.agent.10_2_0_1_0.xml}

Note:

Although the above command can be used to execute the agentca script, Oracle recommends that you execute the following command to invoke the configuration assistant:

Agent_Home/bin/agentca -f

If you want to run the agentca script on a RAC, you must execute the following command on each of the cluster nodes:

Agent_Home/bin/agentca -f -c "node1,node2,node3,...."

See Chapter 6, "Agent Reconfiguration and Rediscovery" for more information.

Invoking the OC4J Configuration Assistant in Standalone Mode

If you want to deploy only the Rules Manager, execute the following commands:

/scratch/OracleHomes/oms10g/jdk/bin/java -Xmx512M -DemLocOverride=/scratch/OracleHomes/oms10g -classpath /scratch/OracleHomes/oms10g/dcm/lib/dcm.jar:/scratch/OracleHomes/oms10g/jlib/emConfigInstall.jar:/scratch/OracleHomes/oms10g/lib/classes12.zip:/scratch/OracleHomes/oms10g/lib/dms.jar:/scratch/OracleHomes/oms10g/j2ee/home/oc4j.jar:/scratch/OracleHomes/oms10g/lib/xschema.jar:/scratch/OracleHomes/oms10g/lib/xmlparserv2.jar:/scratch/OracleHomes/oms10g/opmn/lib/ons.jar:/scratch/OracleHomes/oms10g/dcm/lib/oc4j_deploy_tools.jar oracle.j2ee.tools.deploy.Oc4jDeploy -oraclehome /scratch/OracleHomes/oms10g -verbose -inifile /scratch/OracleHomes/oms10g/j2ee/deploy.master -redeploy
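
The command above repeats the Management Service Oracle home in every classpath entry. As a minimal sketch, assuming the same jar layout as the example, the command line can be assembled from a single variable. The OMS_HOME variable and the oc4j_deploy_cmd function are illustrative names and are not part of Enterprise Manager.

```shell
#!/bin/sh
# Sketch only: OMS_HOME and oc4j_deploy_cmd are illustrative names.
OMS_HOME="${OMS_HOME:-/scratch/OracleHomes/oms10g}"

# Print the Oc4jDeploy command line for the Oracle home given as $1.
oc4j_deploy_cmd() {
    home="$1"
    classpath=""
    # Jar paths relative to the Oracle home, in the same order as above.
    for jar in dcm/lib/dcm.jar jlib/emConfigInstall.jar lib/classes12.zip \
               lib/dms.jar j2ee/home/oc4j.jar lib/xschema.jar \
               lib/xmlparserv2.jar opmn/lib/ons.jar \
               dcm/lib/oc4j_deploy_tools.jar
    do
        # Join entries with ":" without a leading separator.
        classpath="${classpath:+$classpath:}$home/$jar"
    done
    echo "$home/jdk/bin/java -Xmx512M -DemLocOverride=$home" \
         "-classpath $classpath" \
         "oracle.j2ee.tools.deploy.Oc4jDeploy -oraclehome $home -verbose" \
         "-inifile $home/j2ee/deploy.master -redeploy"
}

# Only print the command so that it can be reviewed before it is run.
oc4j_deploy_cmd "$OMS_HOME"
```

The sketch prints the command instead of running it, so the resulting line can be compared with the example before it is executed.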

Enterprise Manager Upgrade/Recovery

The Enterprise Manager 10g R2 upgrade is an out-of-place upgrade (see Footnote 1), meaning that Enterprise Manager 10g R2 Oracle homes are separate from the old homes. If you decide to abort the upgrade and continue using the 10g R1 installation, you must perform the following steps.

Agent Recovery

Follow the instructions below to perform an agent recovery.

After exiting the installer, open a new terminal and change to the <New_AgentHome>/bin directory.
Execute the ./upgrade_recover script.
You can then start the old agent and continue using it. If you want to remove the installed bits of the new agent home, use the Remove Productions function of the installer.

OMS Recovery

If the schema has been upgraded or the upgrade was incomplete, you must manually restore the database to the backup that was taken prior to executing the OMS upgrade.

You can determine the status of the repository upgrade by looking into the log file at <New_OMSHome>/sysman/log/emrepmgr.log.<proc_id>. The last line of the log file provides the status of the upgrade. If the upgrade was completed without errors, it reads Repository Upgrade Successful. If not, the message Repository Upgrade has errors… is displayed.
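
As a minimal sketch of the log check described above, the following helper reads the last line of the log file and reports the upgrade status. The function name repo_upgrade_status is illustrative, and the sketch assumes that the status message is the final line of the file with no trailing blank lines.

```shell
#!/bin/sh
# Sketch only: repo_upgrade_status is an illustrative helper name.
# Usage: repo_upgrade_status <New_OMSHome>/sysman/log/emrepmgr.log.<proc_id>
repo_upgrade_status() {
    log="$1"
    if [ ! -r "$log" ]; then
        echo "unknown: cannot read $log"
        return 2
    fi
    # The status is reported on the last line of the log file.
    last_line=$(tail -n 1 "$log")
    case "$last_line" in
        *"Repository Upgrade Successful"*)
            echo "success"
            return 0 ;;
        *"Repository Upgrade has errors"*)
            echo "errors"
            return 1 ;;
        *)
            echo "unknown: $last_line"
            return 2 ;;
    esac
}
```

A result other than success suggests that the database should be restored from the backup before the old OMS is started.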

Follow the instructions below to perform an OMS recovery:

Note:

Before you attempt to restore the database, you must exit the Upgrade wizard. You must also ensure there are no OMS processes that are running. See Chapter 8, "Shutdown Enterprise Manager Before Upgrade" for more information on shutting down the Enterprise Manager processes.

ATTENTION:

Ensure all OMS processes are completely shut down. If not, the system may become unstable after the upgrade.

Restore the database to the backup. See the Oracle Database Administrator's Guide for more information.
After the database is restored, start the database and listener to ensure successful restoration.
Open a new terminal window and change to the <New_OMSHome>/bin directory.
Execute the ./upgrade_recover script.

Start the old OMS and continue to use it. If you want to remove the bits of the newly installed OMS home, use the Remove Productions function in the installer.

Recreating the Repository

If the Management Service configuration plug-in fails due to the repository creation failure, rerunning the configuration tool from the Oracle Universal Installer drops the repository and recreates it. However, if you want to manually drop the repository, complete the following steps:

Dropping the Repository

Stop the Management Service (<OMS_HOME>/bin/emctl stop oms) and Agent (<AGENT_HOME>/bin/emctl stop agent) before recreating the repository.
Set ORACLE_HOME to OMS_OracleHome
Execute OMS_Home/sysman/admin/emdrep/RepManager <hostname> <port> <SID> -action drop -output_file <log_file>

Creating the Repository

Set ORACLE_HOME to OMS_OracleHome
Execute OMS_Home/sysman/admin/emdrep/RepManager <hostname> <port> <SID> -action create -output_file <log_file>

IMPORTANT:

After recreating the repository, you must run the following command on all the Management Service Oracle homes to reconfigure the emkey:

emctl config emkey -repos -force

This command overwrites the emkey.ora file with the newly generated emkey.
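
The drop, create, and emkey steps above can be sketched as one sequence. This is a hedged outline, not a supported script: the variable names (OMS_HOME, AGENT_HOME, DB_HOST, DB_PORT, DB_SID, LOG_DIR, DRY_RUN), their placeholder defaults, and the log file names are assumptions made for illustration. By default the sketch only prints the commands.

```shell
#!/bin/sh
# Sketch only. OMS_HOME, AGENT_HOME, DB_HOST, DB_PORT, DB_SID, LOG_DIR, and
# DRY_RUN are illustrative names with placeholder defaults.
OMS_HOME="${OMS_HOME:-/path/to/oms_home}"
AGENT_HOME="${AGENT_HOME:-/path/to/agent_home}"
DB_HOST="${DB_HOST:-dbhost.example.com}"
DB_PORT="${DB_PORT:-1521}"
DB_SID="${DB_SID:-emrep}"
LOG_DIR="${LOG_DIR:-.}"

# Print each command when DRY_RUN=1 (the default); run it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recreate_repository() {
    rep_manager="$OMS_HOME/sysman/admin/emdrep/RepManager"
    # Stop the Management Service and the agent first.
    run "$OMS_HOME/bin/emctl" stop oms || return 1
    run "$AGENT_HOME/bin/emctl" stop agent || return 1
    # RepManager is run with ORACLE_HOME set to the OMS Oracle home.
    ORACLE_HOME="$OMS_HOME"
    export ORACLE_HOME
    run "$rep_manager" "$DB_HOST" "$DB_PORT" "$DB_SID" \
        -action drop -output_file "$LOG_DIR/repmanager_drop.log" || return 1
    run "$rep_manager" "$DB_HOST" "$DB_PORT" "$DB_SID" \
        -action create -output_file "$LOG_DIR/repmanager_create.log" || return 1
    # Reconfigure the emkey; repeat this on every Management Service home.
    run "$OMS_HOME/bin/emctl" config emkey -repos -force
}

recreate_repository
```

Review the printed commands first, then set DRY_RUN=0 to execute them. The sequence stops at the first command that fails.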

ATTENTION:

While recreating the repository using ./RepManager -action create, you may encounter the following error message:

java.sql.SQLException: ORA-28000: the account is locked

Workaround

This error may occur if there are processes or multiple Management Services that are trying to connect to the database with incorrect SYSMAN credentials. If there are multiple login failures, the account becomes locked in the database and the monitoring agent shuts down.

You can resolve this issue by shutting down all the Management Services connected to the database, along with the monitoring agent.

Repository Creation Fails

When installing Enterprise Manager using an existing database, the repository creation fails.

This may happen if the Password Verification option in the database is enabled. To resolve this issue:

Disable the Password Verification option.
Create the repository using RepManager.
Enable the Password Verification option.

Collection Errors After Upgrade

If you upgrade only the Management Service to 10g R2 without upgrading the monitoring agent, you may encounter the following collection errors:

Target Management Services and Repository
Type OMS and Repository
Metric Response
Collection Timestamp <session_time_stamp>
Error Type Collection Failure
Message Target is in Broken State. Reason - Target deleted from agent

To resolve this issue, upgrade the agent monitoring the Management Service to 10g R2 as well.

The OMS Upgrade Stops

You may encounter problems during OMS upgrade where the upgrade process aborts due to the following reasons.

OMS Upgrade Stops at Oracle iAS Upgrade Assistant (iASUA) Failure

The installation dialog and the configuration framework log file (located at <New_OracleHome>/cfgtoollogs/cfgfw/oracle.sysman.top.oms_#date.log) list SEVERE messages indicating the reason iASUA (the Oracle Application Server Upgrade Assistant) fails.

If the message displays permission denied on certain files, it means that the user running the installer may not have the correct privileges to run certain iAS configurations.

To resolve this issue, comment out the iAS configuration that contains these files and then retry the upgrade. You can reapply the configurations after the upgrade is successfully completed.

OMS Configuration Stops at EMDeploy Failure

The most common reasons for EMDeploy to fail are:

Not all Enterprise Manager processes are shut down completely.

To shut down Enterprise Manager, execute the following commands:
```
<Oracle_Home>/bin/emctl stop oms
<Oracle_Home>/bin/emctl stop em
<Oracle_Home>/opmn/bin/opmnctl stopall
```
See Chapter 8, "Shutdown Enterprise Manager Before Upgrade" for more information.

Symbolic links have been used instead of hard links.

The <Oracle_Home>/Apache/<component> configuration files must be examined to ensure only hard links (and no symbolic links) were referenced. See Chapter 8, "Check for Symbolic Links" for more information.

After you have successfully resolved these issues, perform the redeploy steps manually and click Retry on the Upgrade wizard.

OMS Configuration Stops at Repository Schema Failure (RepManager)

The most common reason the repository schema configuration fails is when it is not able to connect to the listener. The configuration framework log file (<New_OracleHome>/cfgtoollogs/cfgfw/oracle.sysman.top.oms_#date.log) indicates the reason for the repository schema upgrade failure.

To resolve this issue, you must verify whether the listener connecting to the OMS is valid and active.

Also, if you have installed the OMS using the Install Enterprise Manager Using New Database installation type, ensure there are no symbolic links being referenced. After you have successfully established the listener connections, click Retry on the Upgrade wizard.

Monitoring Agent Does Not Discover Upgraded Targets

If you have upgraded an Enterprise Manager target (for example, a database) independently (that is, using a regular upgrade mechanism other than the Oracle Enterprise Manager Installer), the monitoring agent may fail to discover this upgraded target.

This can happen if the Oracle home value that you specified for the upgraded target differs from the existing one.

To resolve this issue, you must manually configure the targets.xml file of the monitoring agent to update the configuration details of the upgraded Oracle home information, or log in to the Enterprise Manager console, select the appropriate target, and modify its configuration parameters to reflect the upgraded target parameters.

CSA Collector is Not Discovered During Agent Upgrade

When a 10g R1 Management Service and its associated (monitoring) agent are upgraded at the same time, the agent upgrade does not discover the CSA Collector target.

To discover this target, you must run the agent configuration assistant (the agentca script) using the rediscovery option. See Chapter 6, "Rediscover/Reconfigure Targets on Standalone Agents" for more information.

Enterprise Manager Deployment Fails

Enterprise Manager deployment may fail due to the Rules Manager deployment failure.

To resolve this issue, redeploy Enterprise Manager by following these steps:

Move OH/j2ee/deploy.master to OH/j2ee/deploy.master.bak
Execute the OH/bin/EMDeploy script.
Restore the OH/j2ee/deploy.master. That is, execute mv OH/j2ee/deploy.master.bak OH/j2ee/deploy.master
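
The three redeploy steps above can be sketched as a small helper that restores deploy.master even when EMDeploy fails. The function name redeploy_em is illustrative, and the sketch assumes that EMDeploy is run without arguments, as in the steps above.

```shell
#!/bin/sh
# Sketch only: redeploy_em is an illustrative helper name; the Management
# Service Oracle home (OH) is passed as the first argument.
redeploy_em() {
    oh="$1"
    master="$oh/j2ee/deploy.master"
    # Step 1: move deploy.master out of the way.
    mv "$master" "$master.bak" || return 1
    # Step 2: run EMDeploy and remember its exit status.
    status=0
    "$oh/bin/EMDeploy" || status=$?
    # Step 3: restore deploy.master whether or not EMDeploy succeeded.
    mv "$master.bak" "$master" || return 1
    return "$status"
}
```

For example, redeploy_em /scratch/OracleHomes/oms10g would apply the steps to the Oracle home used in the examples of this appendix.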

To Execute the OC4J Configuration Assistant in Standalone Mode

If you want to deploy only the Rules Manager, execute the following commands:

/scratch/OracleHomes/oms10g/jdk/bin/java -Xmx512M -DemLocOverride=/scratch/OracleHomes/oms10g -classpath /scratch/OracleHomes/oms10g/dcm/lib/dcm.jar:/scratch/OracleHomes/oms10g/jlib/emConfigInstall.jar:/scratch/OracleHomes/oms10g/lib/classes12.zip:/scratch/OracleHomes/oms10g/lib/dms.jar:/scratch/OracleHomes/oms10g/j2ee/home/oc4j.jar:/scratch/OracleHomes/oms10g/lib/xschema.jar:/scratch/OracleHomes/oms10g/lib/xmlparserv2.jar:/scratch/OracleHomes/oms10g/opmn/lib/ons.jar:/scratch/OracleHomes/oms10g/dcm/lib/oc4j_deploy_tools.jar oracle.j2ee.tools.deploy.Oc4jDeploy -oraclehome /scratch/OracleHomes/oms10g -verbose -inifile /scratch/OracleHomes/oms10g/j2ee/deploy.master -redeploy

Installation Fails With an Abnormal Termination

If a daily cron job that cleans up the /tmp/ directory runs on the system where you are installing Grid Control, the installation might fail with an abnormal termination, and the installActions.err file logs the following error: java.lang.UnsatisfiedLinkError: no nio in java.library.path.

The workaround is to set the TMP and TEMP environment variables to a directory other than the default /tmp and then execute ./runInstaller.
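
A minimal sketch of this workaround follows. The helper name use_install_tmp and the directory used in the usage note are illustrative choices; any directory that the cron job does not clean up should work.

```shell
#!/bin/sh
# Sketch only: use_install_tmp is an illustrative helper name.
# Creates the directory given as $1 and exports TMP and TEMP so that a
# process started from this shell uses it instead of /tmp.
use_install_tmp() {
    dir="$1"
    mkdir -p "$dir" || return 1
    TMP="$dir"
    TEMP="$dir"
    export TMP TEMP
}
```

For example, calling use_install_tmp "$HOME/em_install_tmp" and then starting ./runInstaller from the same shell session passes both variables to the installer.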

Management Agent Installation Fails

If the Management Agent installation fails, look into the emctl status log to diagnose the reason for installation failure. You can view this log by executing the following command:

<AGENT_HOME>/bin/emctl status agent

A sample log file is shown below. Typical problem areas are the OMS Version, Agent URL, Repository URL, and Last successful upload entries. The resolutions for these issues are described after the sample log file.

Oracle Enterprise Manager 10g Release 10.2.0.0.0.
Copyright (c) 1996, 2005 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
Agent Version     : 10.2.0.0.0
OMS Version       : 10.2.0.0.0
Protocol Version  : 10.2.0.0.0
Agent Home        : /scratch/OracleHomes2/agent10g
Agent binaries    : /scratch/OracleHomes2/agent10g
Agent Process ID  : 9985
Parent Process ID : 29893
Agent URL         : https://stadv21.us.oracle.com:1831/emd/main/
Repository URL    : https://stadv21.us.oracle.com:1159/em/upload
Started at        : 2005-09-25 21:31:00
Started by user   : tthakur
Last Reload       : 2005-09-25 21:31:00
Last successful upload                       : (none)
Last attempted upload                        : (none)
Total Megabytes of XML files uploaded so far :     0.00
Number of XML files pending upload           :     2434
Size of XML files pending upload(MB)         :    21.31
Available disk space on upload filesystem    :    17.78%
Last attempted heartbeat to OMS              : 2005-09-26 02:40:40
Last successful heartbeat to OMS             : unknown
---------------------------------------------------------------
Agent is Running and Ready

Prerequisite Check Fails With Directories Not Empty Error During Retry

During an agent installation using Agent Deploy, the installation fails abruptly displaying the Failure page. On clicking Retry, the installation fails again at the Prerequisite Check phase with an error stating the directories are not empty.

This could be because Oracle Universal Installer (OUI) is still running though the SSH connection is closed on the remote host.

To resolve this issue, check whether OUI is still running on the remote host. Execute the following command to verify this:

ps -aef | grep -i ora

If OUI is still running, wait until the OUI processes are complete and restart the SSH daemon. Now, you can click Retry to perform the installation.

Troubleshooting Typical Installation Issues

This section lists the typical issues that can cause the agent installation to fail, along with their recommended resolutions.

Timezone Prerequisite Check Fails

The timezone prerequisite check (timezone_check) will fail if the TZ environment variable is not set on the SSH Daemon of the remote host.

To resolve this issue, you must set the TZ environment variable on the SSH Daemon of the remote host. See Appendix E, "Setting Up the Timezone Variable on Remote Hosts" for more information.

Alternatively, you can do the following:

If you are installing/upgrading the agent from the default software location, set the timezone environment variable by specifying the following in the Additional Parameters section of the Agent Deploy application:
```
-z <timezone>
```
For example, -z PST8PDT
If you are installing the agent from a non-default software location, you must specify the timezone environment variable using the following command:
```
s_timeZone=<timezone>
```
For example, s_timeZone=PST8PDT

OMS Version is Not Displayed

If the OMS Version is not displayed in the log file, it could mean that the installed agent is not registered with any Management Service (OMS).

To resolve this issue, you must manually secure the Management Agent by executing the following command:

<AGENT_HOME>/bin/emctl secure agent <password>

Discrepancy Between Agent and Repository URL Protocols

If the agent installation is successful, the protocols for the agent and repository URLs are the same. That is, both URLs start with the https protocol (meaning both are secure).

If the protocol for the agent URL is displayed as http instead of https, this means that the agent is not secure.

To resolve this issue, you must secure the agent manually by executing the following command:

<AGENT_HOME>/bin/emctl secure agent <password>

Last Successful Upload Does Not Have a Timestamp

If there is no timestamp against this parameter in the log (displays Null), it means that the agent is unable to upload any data.

To resolve this issue, you must execute the following command and check the log again:

<AGENT_HOME>/bin/emctl upload
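
The three checks described above (OMS version, agent URL protocol, and last successful upload) can be sketched as a filter over the emctl status agent output. The function name check_agent_status and the wording of its messages are illustrative. The sketch assumes that each entry has the "label : value" layout of the sample log.

```shell
#!/bin/sh
# Sketch only: check_agent_status is an illustrative helper name.
# Reads `emctl status agent` output on standard input and prints one line
# for each problem area that it finds.
check_agent_status() {
    awk -F' : ' '
        # Field 1 is the label and field 2 the value; trim padding on both.
        {
            label = $1
            value = $2
            gsub(/^[ \t]+|[ \t]+$/, "", label)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
        }
        label == "OMS Version"            { oms = value }
        label == "Agent URL"              { agent_url = value }
        label == "Last successful upload" { upload = value }
        END {
            if (oms == "")
                print "OMS Version not shown: secure the agent"
            if (agent_url ~ /^http:/)
                print "Agent URL uses http: secure the agent"
            if (upload == "" || upload == "(none)" || upload == "Null")
                print "No successful upload: run emctl upload"
        }'
}
```

For example, the output of <AGENT_HOME>/bin/emctl status agent can be piped into check_agent_status. No output means that none of the three problem areas was detected.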

The emctl status Log File is Empty

If the agent is not ready and running, the emctl status log displays only the copyright information. None of the parameters listed in the sample log are displayed.

The issue can occur due to any of the following reasons:

Agent is not secure: To manually secure the agent, execute the following command:
```
<AGENT_HOME>/bin/emctl secure agent <password>
```
Agent is not running: Check if the agent is running. If not, you can start the agent manually by executing the following command:
```
<AGENT_HOME>/bin/emctl start agent
```
Agent port is not correct: Verify whether the agent is connecting to the correct port. To verify the port, look into the sysman/config/emd.properties file.

You must also ensure the following are correctly set in the emd.properties file:
1. REPOSITORY_URL: Verify this URL (http://<hostname>:<port>/em/upload). Here, ensure the host name and port are correct.
2. emdWalletSrcURL: Verify that the host name and port are correct in this URL (http://<hostname>:<port>/em/wallets/emd).
3. agentTZRegion: Ensure the time zone that is configured is correct.
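
To review the three properties listed above in one pass, a minimal sketch such as the following can print their current values. The function name show_agent_endpoints is illustrative, and the sketch assumes that emd.properties stores each setting as an uncommented key=value line.

```shell
#!/bin/sh
# Sketch only: show_agent_endpoints is an illustrative helper name.
# Usage: show_agent_endpoints <AGENT_HOME>/sysman/config/emd.properties
show_agent_endpoints() {
    props="$1"
    for key in REPOSITORY_URL emdWalletSrcURL agentTZRegion; do
        # Take the value of the last uncommented key=value line for the key.
        value=$(sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$props" | tail -n 1)
        if [ -n "$value" ]; then
            echo "$key=$value"
        else
            echo "$key is not set"
        fi
    done
}
```

The host name and port in the two URLs can then be compared with the Management Service host, and the time zone with the one configured on the agent host.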

SSH User Equivalence Verification Fails During Agent Installation

The most common reasons for SSH User Equivalence Verification to fail are:

The server settings in the /etc/ssh/sshd_config file do not allow ssh for user $USER
The server may have disabled the public key-based authentication
The client public key on the server may be outdated
You may not have passed the -shared option for shared remote users, or may have passed this option for non-shared remote users

Verify the server setting and rerun the script to set up SSH User Equivalence successfully.

Agent Deployment on Linux RAC 10.2 Cluster Fails

Agent deployment on a 10.2 RAC cluster may fail due to a lost SSH connection during the installation process.

This can happen if the LoginGraceTime value in the sshd_config file is 0 (zero). The zero value gives an indefinite time for SSH authentication.

To resolve this issue, modify the LoginGraceTime value in the /etc/ssh/sshd_config file to be a higher value. The default value is 120 seconds, which means that the server disconnects after this time if you have not successfully logged in. If the value is set to 0 (zero), there is no definite time limit for authentication.
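
Before changing the setting, it can help to see which value is currently in effect. The following is a minimal sketch with an illustrative function name, login_grace_time. It assumes that a missing or commented-out entry means the 120-second default applies, and it relies on sshd using the first value it finds for a keyword.

```shell
#!/bin/sh
# Sketch only: login_grace_time is an illustrative helper name.
# Usage: login_grace_time /etc/ssh/sshd_config
login_grace_time() {
    # Keywords are case-insensitive; commented lines start with "#" and
    # therefore never match. Only the first uncommented entry is used.
    value=$(awk 'tolower($1) == "logingracetime" { print $2; exit }' "$1")
    # Fall back to the documented default when no entry is present.
    echo "${value:-120}"
}
```

After editing the file, the SSH daemon typically has to reload its configuration before the new value takes effect.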

Agent Does Not Start Up After Upgrade

During an agent upgrade from 10.1.0.2 to 10.1.0.3, the agent may fail to start up after upgrade if the timezone that is configured for the upgraded agent is different from the originally configured agent.

You can correct this issue by changing the timezone. To do this, execute the following command from the upgraded agent home:

emctl resetTZ agent

This command will correct the agent side timezone, and specify an additional command to be run against the repository to correct the value there.

IMPORTANT:

Before you change the timezone, check if there are any blackouts that are currently running or scheduled to run on any of the targets that are monitored by the upgraded agent. Do the following to check this:

In the Grid Control console, go to the All Targets page under Targets and locate the Agent in the list of targets. Click the agent name link. The Agent home page appears.
The list of targets monitored by the agent will be listed in the Monitored Targets section.
For each target in the list, click the target name to view the target home page.
Here, in the Related Links section, click Blackouts to check any blackouts that are currently running or may be scheduled to run in the future.
If such blackouts exist, you must stop all the blackouts that are running on all the targets monitored by this agent.
From the console, stop all the blackouts that are scheduled to run on any of these monitored targets.
Now, run the following command from the agent home to reset the timezone:
```
emctl resetTZ agent
```
After the timezone is reset, you can create new blackouts on the targets.

Sample sshd_config File

A sample sshd_config file, which is a server-wide configuration file with all the variables, is shown below.

# $OpenBSD: sshd_config,v 1.59 2002/09/25 11:17:16 markus Exp $
# This is the sshd server system-wide configuration file.  See
# sshd_config(5) for more information.
# This sshd was compiled with PATH=/usr/local/bin:/bin:/usr/bin
# The strategy used for options in the default sshd_config shipped with
# OpenSSH is to specify options with their default value where
# possible, but leave them commented.  Uncommented options change a
# default value.
#Port 22
#Protocol 2,1
#ListenAddress 0.0.0.0
#ListenAddress ::
# HostKey for protocol version 1
#HostKey /etc/ssh/ssh_host_key
# HostKeys for protocol version 2
#HostKey /etc/ssh/ssh_host_rsa_key
#HostKey /etc/ssh/ssh_host_dsa_key
# Lifetime and size of ephemeral version 1 server key
#KeyRegenerationInterval 3600
#ServerKeyBits 768
# Logging
#obsoletes QuietMode and FascistLogging
#SyslogFacility AUTH
SyslogFacility AUTHPRIV
#LogLevel INFO

# Authentication:
#LoginGraceTime 120
#PermitRootLogin yes
#StrictModes yes

#RSAAuthentication yes
#PubkeyAuthentication yes
#AuthorizedKeysFile      .ssh/authorized_keys
# rhosts authentication should not be used
#RhostsAuthentication no
# Don't read the user's ~/.rhosts and ~/.shosts files
#IgnoreRhosts yes
# For this to work you will also need host keys in /etc/ssh/ssh_known_hosts
#RhostsRSAAuthentication no
# similar for protocol version 2
#HostbasedAuthentication no
# Change to yes if you don't trust ~/.ssh/known_hosts for
# RhostsRSAAuthentication and HostbasedAuthentication
#IgnoreUserKnownHosts no
# To disable tunneled clear text passwords, change to no here!
#PasswordAuthentication yes
#PermitEmptyPasswords no
# Change to no to disable s/key passwords
#ChallengeResponseAuthentication yes
# Kerberos options
#KerberosAuthentication no
#KerberosOrLocalPasswd yes
#KerberosTicketCleanup yes
#AFSTokenPassing no
# Kerberos TGT Passing only works with the AFS kaserver
#KerberosTgtPassing no
# Set this to 'yes' to enable PAM keyboard-interactive authentication
# Warning: enabling this may bypass the setting of 'PasswordAuthentication'
#PAMAuthenticationViaKbdInt no
#X11Forwarding no
X11Forwarding yes
#X11DisplayOffset 10
#X11UseLocalhost yes
#PrintMotd yes
#PrintLastLog yes
#KeepAlive yes
#UseLogin no
#UsePrivilegeSeparation yes
#PermitUserEnvironment no
#Compression yes
#MaxStartups 10
# no default banner path
#Banner /some/path
#VerifyReverseMapping no
#ShowPatchLevel no
# override default of no subsystems
Subsystem sftp    /usr/libexec/openssh/sftp-server

Storage Data Has Metric Collection Errors

The following Enterprise Manager collection error message may appear from agents installed via silent or agentdownload install mechanisms:

snmhsutl.c:executable nmhs should have root suid enabled.

Perform the required root install actions (via root.sh script) to resolve this issue. It may take up to 24 hours before the resolution is reflected.

Cannot Add Systems to Grid Environment from the Grid Control Console

You cannot add new targets to your grid environment if you do not have an agent already installed.

To install the agent from your Grid Control console:

Log in to the Grid Control console and go to the Deployments page.
Click Install Agent under the Agent Installation section.
In the Agent Deploy home page that appears, select the appropriate installation option that you want to perform. See Chapter 5, "Agent Deploy Installation Prerequisites" for more information.

Need More Help

If this appendix does not solve the problem you encountered, try these other sources:

Oracle Enterprise Manager Release Notes, available on the Oracle Technology Network website (http://www.oracle.com/technology/documentation).
OracleMetaLink (http://metalink.oracle.com).

If you do not find a solution for your problem, log a service request.

Footnote Legend

Footnote 1: The upgrade process creates a new OMS home and a new database home. The Upgrade assistants upgrade the datafiles and SYSMAN schema, and then configure the new Oracle homes.