23 Troubleshooting the Backup and Recovery Tool

This chapter describes common problems that you might encounter when using the Backup and Recovery Tool, and explains how to solve them. It contains the following topic:

Problems and Solutions

23.1 Problems and Solutions

This section describes common problems and solutions. It contains the following topics:

Receiving restore_config Operation Fails Error
Receiving Missing Files Messages During restore_config Operation
File-Based Repository Restoration Fails
Cannot Run a Cold Backup on Identity Management or J2EE Instance
Failure Due to Loss or Corruption of OPMN.XML File
A restore_config Operation Fails
Backup Operation Fails on a DCM File-Based Repository
Timeout Occurs While Trying to Stop Processes Using opmnctl stopall
Using the Backup and Recovery Tool to Perform a Recovery Fails Due to an Unknown Log Sequence Number
Enterprise Manager Cannot Access Restored Nodes on New Hosts
Restore of Portal Fails After Deleting OC4J Instance
Cold Backups Do Not Shut Down All Databases in RAC Environment
A restore_instance Fails at restore_repos Stage
Changing ORACLE_HOME May Cause Backup or Recovery Failure
Restore Operation Changes Farm Topology Leaving an Instance in Inconsistent State
Post-deployment Changes to Configuration Files Are Lost After Restoring DCM-Managed Components

23.1.1 Receiving restore_config Operation Fails Error

A restore_config operation fails.

Problem

A restore_config operation fails with the following error:

C:\OracleAS\IM_1128/dcm/bin/dcmctl.bat applyarchiveto -archive 
2004-11-29_11-23-18 -script 

ADMN-906025 
Base Exception:
The exception, 100999, occurred at Oracle Application Server instance 
"im_1128.stajx14.us.oracle.com"
"See base exception for details.See base exception for details."
Resolution:
Resolve the indicated problem at the Oracle Application Server instance where
it occurred then resync the instance
java.lang.Exception: Could not delete file
C:\OracleAS\IM_1128\j2ee\OC4J_SECURITY\application-deployments\wirelesssso\jazn-data.xml.
Please check file permissions.
at oracle.security.jazn.smi.JAZNPlugin.commit(Unknown Source)
at oracle.ias.sysmgmt.repository.DcmPlugin.commit(Unknown Source)

Solution

If you see an error similar to "Could not delete file jazn-data.xml", execute the following steps:

Stop all the OC4J processes using the following command:

ORACLE_HOME/opmn/bin/opmnctl stopproc ias-component=OC4J

Rerun the restore_config operation.

23.1.2 Receiving Missing Files Messages During restore_config Operation

A restore_config operation generates missing file messages.

Problem

During a restore_config operation, you receive messages indicating that files are missing, for example:

Could not copy file C:\Product\OracleAS\Devkit_1129/testdir/ to 
C:\Product\OracleAS\Devkit_1129\backup_restore\cfg_bkp/2004-12-01_03-26-22.

Solution

During a restore_config operation, a temporary configuration backup is taken so that, if the restore fails, the temporary backup can be restored returning the instance to the same state as before the restore.

If some files are deleted (including files/directories specified in config_misc_files.inp) before a restore operation, then, during the temporary backup, messages are displayed indicating that certain files are missing. These error/warning messages should be ignored since the missing files are restored as part of the restore_config operation.

23.1.3 File-Based Repository Restoration Fails

A file-based repository restoration fails.

Problem

File-based repository restoration fails with an error indicating that the dcm-daemon processes across the farm could not be restarted:

C:\fbfhost\backup_restore>bkp_restore.bat -m restore_repos -t 
2004-12-07_13-49-13 
 
C:\fbfhost\backup_restore>echo off
Stopping dcm-daemon across the farm ...
Importing file based repository ...
Restarting dcm-daemon across the farm ...
  Problem running command (Returned 150)
  c:\fbfhost/opmn/bin/opmnctl @farm restartproc ias-component=dcm-daemon
The file based repository has been restored.
But, dcm daemons across farm could not be restarted.
Please take the appropriate action.
See c:\logs/2004-12-07_13-50-18_restore_repos.log for more info

Solution

At this point, the file-based repository has been restored successfully. Now, perform the following steps on the repository host:

Stop the dcm-daemon process on the file-based repository host:

ORACLE_HOME/opmn/bin/opmnctl stopproc ias-component=dcm-daemon

Start the dcm-daemon processes across the farm:

ORACLE_HOME/opmn/bin/opmnctl @farm startproc ias-component=dcm-daemon
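The two steps above can be wrapped in a small shell function for UNIX hosts. This is only a sketch: the function name `restart_dcm_daemons` is hypothetical, it assumes `ORACLE_HOME` is set to the Oracle home of the file-based repository host, and it assumes that `opmnctl` returns a nonzero exit status on failure. Treating a failed stop as a warning (the daemon may already be down) is also an assumption, not documented behavior.

```shell
#!/bin/sh
# Sketch: recover after restore_repos could not restart dcm-daemon across the farm.
# Assumption: ORACLE_HOME points at the file-based repository host's Oracle home.

restart_dcm_daemons() {
    opmnctl="${ORACLE_HOME:?ORACLE_HOME must be set}/opmn/bin/opmnctl"

    # Step 1: stop the dcm-daemon process on the repository host only.
    # Assumption: a failure here is not fatal, because the daemon may already be stopped.
    if ! "$opmnctl" stopproc ias-component=dcm-daemon; then
        echo "warning: could not stop the local dcm-daemon; continuing" >&2
    fi

    # Step 2: start the dcm-daemon processes across the farm.
    # The function's exit status is the exit status of this command.
    "$opmnctl" @farm startproc ias-component=dcm-daemon
}
```

After sourcing the file on the repository host, calling `restart_dcm_daemons` runs both commands in order.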

23.1.4 Cannot Run a Cold Backup on Identity Management or J2EE Instance

You cannot run a cold backup on Identity Management or a J2EE instance.

Problem

When backup_cold is attempted on Identity Management or a J2EE instance, the following error message is displayed:

C:\Product\OracleAS\SSO_1203\backup_restore>bkp_restore.bat -v -m backup_cold
 
 C:\Product\OracleAS\SSO_1203\backup_restore>echo off
   ======================================== 
 Running command:
 C:\Product\OracleAS\SSO_1203/dcm/bin/dcmctl.bat whichfarm -v -script >>
 C:\Product\OracleAS\SSO_1203\backups\log_path/2004-12-09_03-56-55_whichfarm.log
 C:/Product/OracleAS/SSO_1203/backup_restore/config/config.inp: Invalid
 'database backup_path' specified
 VALUE_NOT_SET - No such file or directory
 Consider using '-f' to force creation of this path
 Failure: backup_cold failed

Solution

The backup_cold operation should be used only on repository hosts: a Metadata Repository instance or any instance hosting a file-based repository.

23.1.5 Failure Due to Loss or Corruption of OPMN.XML File

The loss or corruption of the opmn.xml file is causing a failure.

Problem

The loss or corruption of the opmn.xml file caused the following error:

ADMN-906025 
Base Exception:
The exception, 100999, occurred at Oracle Application Server instance
"J2EE_1123.stada07.us.oracle.com"

Resolution

Perform the following steps to restore the opmn.xml file:

Run

bkp_restore.bat -m restore_config -t <timestamp>

If that command fails, stop the OC4J processes.

Rerun

bkp_restore.bat -m restore_config -t <timestamp>
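On UNIX, the same try, stop, and retry sequence could be scripted roughly as follows. This is a sketch under several assumptions: the function name `restore_config_with_retry` is hypothetical, the tool is assumed to live at `$ORACLE_HOME/backup_restore/bkp_restore.sh`, both commands are assumed to signal failure through a nonzero exit status, and the OC4J processes are stopped with the `opmnctl stopproc ias-component=OC4J` command shown in Section 23.1.1.

```shell
#!/bin/sh
# Sketch: run restore_config; if it fails, stop the OC4J processes and rerun it once.
# Usage: restore_config_with_retry <timestamp>
# Assumption: bkp_restore.sh is located in $ORACLE_HOME/backup_restore.

restore_config_with_retry() {
    timestamp="$1"
    tool="${ORACLE_HOME:?ORACLE_HOME must be set}/backup_restore/bkp_restore.sh"
    opmnctl="$ORACLE_HOME/opmn/bin/opmnctl"

    # First attempt.
    if "$tool" -m restore_config -t "$timestamp"; then
        return 0
    fi

    # The first attempt failed: stop the OC4J processes, then rerun the restore.
    echo "restore_config failed; stopping OC4J processes and retrying" >&2
    "$opmnctl" stopproc ias-component=OC4J
    "$tool" -m restore_config -t "$timestamp"
}
```

The timestamp argument is the same value that would be passed to `-t` on the command line; the function returns the exit status of the last restore attempt.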

23.1.6 A restore_config Operation Fails

A restore_config operation fails or the ORACLE_HOME/j2ee/OC4J_SECURITY directory is deleted.

Problem:

The ORACLE_HOME/j2ee/OC4J_SECURITY directory is accidentally deleted or a restore_config operation fails with the following error:

ADMN-906025 
Base Exception:
The exception, 806212, occurred at Oracle Application Server instance
"OID.stada07.us.oracle.com"
"OPMN Request: /start?mode=sync&process-type=OC4J_SECURITY

OPMN Response: HTTP/1.1 204 No Content
Content-Length: 724 
Content-Type: text/html
Response: 0 of 1 processes started.
.
<?xml version='1.0' encoding='US-ASCII'?> 
<response>
<opmn id="stada07:6200" http-status="204" http-response="0 of 1 processes started.">
  <ias-instance id="OID.stada07.us.oracle.com">
    <ias-component id="OC4J"> 
      <process-type id="OC4J_SECURITY">
        <process-set id="default_island"> 
          <process id="511967353" pid="956" status="Init" index="1"
log="C:\Product\OracleAS\OID\opmn\logs\OC4J~OC4J_SECURITY~default_island~1"
.
operation="request" result="failure"> 
        <msg code="-21" text="failed to start a managed process after the maximum retry limit">

Solution:

To resolve this problem, run the following command:

On UNIX systems:

bkp_restore.sh -m restore_config -F DCM-resyncforce

On Windows systems:

bkp_restore.bat -m restore_config -F DCM-resyncforce

23.1.7 Backup Operation Fails on a DCM File-Based Repository

The backup of a DCM file-based repository fails.

Problem:

The backup of a DCM file-based repository fails because of missing or corrupted files in the repository.

Solution:

If *.bom files are missing, use restore_config to restore the repository and then back up the repository.

For all other files, use restore_repos to restore the repository, and then run any of the backup options to back up the repository.

23.1.8 Timeout Occurs While Trying to Stop Processes Using opmnctl stopall

During backup_instance_cold, backup_instance_cold_incr, and restore_instance operations, a timeout may occur while trying to stop processes using the opmnctl stopall command.

Problem:

During some operations involving the backup or restore of a server instance, a timeout may occur while trying to stop processes using the opmnctl stopall command. This can occur because of heavy machine load or a process taking a long time to shut down. Under these conditions, you may receive an error message similar to the following:

Oracle Application Server instance backup failed.
Stopping all opmn managed processes ... 

Failure : backup_instance_cold_incr failed 

Unable to stop opmn managed processes !!!

Solution:

Running opmnctl stopall a second time should resolve this problem.
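If the timeout recurs often, the retry can be scripted before starting the backup or restore. The following is a minimal sketch, not part of the Backup and Recovery Tool: the function name `stopall_with_retry`, the default of two attempts, and the 30-second pause are all illustrative choices, and the sketch assumes that `opmnctl stopall` exits with a nonzero status when it times out.

```shell
#!/bin/sh
# Sketch: retry "opmnctl stopall" a limited number of times.
# Usage: stopall_with_retry [max_attempts] [pause_seconds]
# Assumption: opmnctl exits nonzero when the stop request times out.

stopall_with_retry() {
    max_attempts="${1:-2}"
    pause="${2:-30}"
    opmnctl="${ORACLE_HOME:?ORACLE_HOME must be set}/opmn/bin/opmnctl"
    attempt=1

    while [ "$attempt" -le "$max_attempts" ]; do
        if "$opmnctl" stopall; then
            echo "opmnctl stopall succeeded on attempt $attempt"
            return 0
        fi
        echo "opmnctl stopall failed (attempt $attempt of $max_attempts)" >&2
        # Pause before the next try, but not after the final attempt.
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$pause"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}
```

Once the function returns successfully, the backup or restore operation can be rerun.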

23.1.9 Using the Backup and Recovery Tool to Perform a Recovery Fails Due to an Unknown Log Sequence Number

When performing a recovery using the Backup and Recovery Tool, the RMAN recovery fails due to an unknown log sequence number. Use the following command to correct the problem:

sqlplus> alter database open resetlogs;

23.1.10 Enterprise Manager Cannot Access Restored Nodes on New Hosts

After using Loss of Host Automation to restore the nodes to new hosts, Enterprise Manager cannot access the nodes.

Problem

The scenario is that all nodes on a farm were lost. After using Loss of Host Automation to restore the nodes to new hosts, Enterprise Manager cannot access the nodes. The cause of this problem is that the dcmCache.xml files are not updated between restores of the individual nodes.

Solution

After restoring the first node, save a copy of dcmCache.xml from the second node. After restoring the second node, copy the saved copy of dcmCache.xml to the second node. Restart all processes on both nodes.

23.1.11 Restore of Portal Fails After Deleting OC4J Instance

A restore of a Portal instance fails after deleting an OC4J instance that was part of the backup being restored.

Problem

After a successful backup of an Infrastructure and a Portal with an OC4J instance, a restore of the Infrastructure succeeds, but the restore of the Portal fails. The OC4J instance was deleted before the restore.

Solution

Before running a restore on the Portal, run the following command:

dcmctl resyncInstance -force

23.1.12 Cold Backups Do Not Shut Down All Databases in RAC Environment

If the Oracle Application Server Metadata Repository is installed in an existing Oracle database (RepCA database) that is configured as a Real Application Clusters (RAC) database, you must shut down all instances in the cluster database before performing a Full Cold Backup using Enterprise Manager or running backup_instance_cold or backup_cold in command-line mode. To do so, you can use Enterprise Manager to shut down the entire cluster database, run srvctl stop database to stop all started instances, or use SQL*Plus to shut down each started instance.

23.1.13 A restore_instance Fails at restore_repos Stage

Running restore_instance fails when trying to restore the database (restore_repos).

Problem

Restoring an instance fails with the following error:

unable to find archive log 
archive log thread=1 sequence=3
released channel: dev1 
RMAN-00571: =========================================================== 
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== 
RMAN-00571: =========================================================== 
RMAN-03002: failure of recover command at <time>
RMAN-06054: media recovery requesting unknown log: thread <> seq <> lowscn <>

Solution

Perform the following steps to resolve the problem:

Complete database recovery by running the following command:
```
sqlplus> alter database open resetlogs;
```
Configuration recovery: start all OPMN-managed processes by running the following command:
```
opmnctl startall
```

Configuration restore:

On UNIX:

bkp_restore.sh -m restore_config -t <timestamp>

On Windows:

bkp_restore.bat -m restore_config -t <timestamp>

23.1.14 Changing ORACLE_HOME May Cause Backup or Recovery Failure

Changing ORACLE_HOME from the ORACLE_HOME used to start the database may result in an error while performing backup or recovery operations.

Problem

Changing ORACLE_HOME to a different directory from the directory used to start the database may result in errors when trying to perform backup or recovery. For example, if you started the database with ORACLE_HOME set to home/foo and later try to connect to private/foo, you will not be able to connect to the original instance.

Solution

To verify where ORACLE_HOME resides, run the following command:

$ /usr/ucb/ps -auxeww | grep pmon

If the value returned for ORACLE_HOME is different from the environment ORACLE_HOME, restart the database with the ORACLE_HOME set for the environment.
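The comparison can also be automated. The sketch below parses the output of the `ps` command shown above and reports any pmon process whose `ORACLE_HOME` differs from the current environment. The function names are hypothetical, and the sketch assumes that `ps` appends each process environment as space-separated `NAME=value` pairs and that the Oracle home path contains no spaces.

```shell
#!/bin/sh
# Sketch: compare the ORACLE_HOME recorded in the environment of running pmon
# processes (as printed by "/usr/ucb/ps -auxeww") with the current ORACLE_HOME.
# Assumption: environment entries are space-separated and contain no spaces.

# Reads ps output on stdin; prints each distinct ORACLE_HOME found on pmon lines.
pmon_oracle_homes() {
    grep pmon \
        | grep -v grep \
        | tr ' ' '\n' \
        | sed -n 's/^ORACLE_HOME=//p' \
        | sort -u
}

# Reads ps output on stdin; returns 0 if no pmon process reports a different
# ORACLE_HOME (this includes the case where no pmon process is found at all).
check_oracle_home() {
    status=0
    for home in $(pmon_oracle_homes); do
        if [ "$home" != "$ORACLE_HOME" ]; then
            echo "mismatch: database was started with ORACLE_HOME=$home" >&2
            status=1
        fi
    done
    return $status
}
```

On a host that provides the BSD-style `ps`, it might be invoked as `/usr/ucb/ps -auxeww | check_oracle_home`.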

23.1.15 Restore Operation Changes Farm Topology Leaving an Instance in Inconsistent State

A restore operation on one instance can change the farm topology leaving another instance on the farm in an inconsistent state.

Problem

The scenario: install core1 as a file-based repository host and take a cold backup. Install core2 and join it to core1 as a file-based repository client. Restore the file-based repository for core1. This will corrupt core2 as it was joined to core1 after the cold backup. Core2 points to core1 as the file-based repository host, but there is no record of core2 in core1 after the restore.

Resolution

Before restoring the file-based host (core1), run dcmctl leavefarm on core2. After restoring the repository, run dcmctl joinfarm on core2.

Alternatively, restore core2 with a backup taken prior to joining it to the core1 file-based repository.

23.1.16 Post-deployment Changes to Configuration Files Are Lost After Restoring DCM-Managed Components

Post-deployment changes to configuration files are lost after restoring DCM-managed component configurations.

Problem

After deploying Oracle Application Server, changes made to configuration files, such as web.xml (one per application), are lost when the Backup and Recovery Tool restores DCM-managed component configurations.

Solution

After the restore operation completes, the web.xml files can be copied from the configuration backup using the following manual procedure:

Find the config_backup_path value in the ORACLE_HOME/backup_restore/config/config.inp file.
Change the current directory to the config_backup_path directory:
```
cd config_backup_path
```
Locate the config backup jar file containing the web.xml files with the changes.
Copy the config backup jar file to a temporary location:
```
cp config_bkp_yyyy-mm-dd_hh-mm-ss.jar /tmp
```
Unjar the config backup jar file at the temporary location:
```
cd /tmp
jar xvf config_bkp_yyyy-mm-dd_hh-mm-ss.jar
```

Find the web.xml files in the config backup directory:

cd config_bkp_yyyy-mm-dd_hh-mm-ss

On UNIX:

find . -name web.xml -print
./j2ee/home/applications/dms/WEB-INF/web.xml
./j2ee/home/applications/BC4J/webapp/WEB-INF/web.xml
./j2ee/home/default-web-app/WEB-INF/web.xml

Restore the web.xml files into the ORACLE_HOME:

cp j2ee/home/applications/dms/WEB-INF/web.xml \
   ORACLE_HOME/j2ee/home/applications/dms/WEB-INF/web.xml
cp j2ee/home/applications/BC4J/webapp/WEB-INF/web.xml \
   ORACLE_HOME/j2ee/home/applications/BC4J/webapp/WEB-INF/web.xml
cp j2ee/home/default-web-app/WEB-INF/web.xml \
   ORACLE_HOME/j2ee/home/default-web-app/WEB-INF/web.xml

Alternatively, you can combine the last two steps (finding and restoring the web.xml files) in a script. This can be done in a UNIX C shell (csh) as follows:

CSH> foreach i (`find . -name web.xml -print`)
CSH> cp $i $ORACLE_HOME/$i
CSH> end
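For environments that use a Bourne-compatible shell instead of csh, the same find-and-copy loop might look like the following sketch. The function name `restore_web_xml` and its two arguments are hypothetical. Skipping files whose target directory does not exist is a design choice made here to avoid copying into an application that is no longer deployed; it is not behavior described by the procedure above.

```shell
#!/bin/sh
# Sketch: copy every web.xml found under the unpacked config backup directory
# to the same relative path under the target Oracle home.
# Usage: restore_web_xml <unpacked_backup_dir> <oracle_home>

restore_web_xml() {
    backup_dir="$1"
    oracle_home="$2"

    # Run in a subshell so the caller's working directory is left unchanged.
    (
        cd "$backup_dir" || exit 1
        find . -name web.xml -type f | while IFS= read -r file; do
            rel="${file#./}"                 # strip the leading "./"
            target="$oracle_home/$rel"
            # Copy only when the target directory already exists.
            if [ -d "$(dirname "$target")" ]; then
                cp "$file" "$target" && echo "restored $rel"
            else
                echo "skipped $rel (no such directory in target)" >&2
            fi
        done
    )
}
```

With the backup unpacked as described above, a call such as `restore_web_xml /tmp/config_bkp_yyyy-mm-dd_hh-mm-ss "$ORACLE_HOME"` would print one line per restored file.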