Oracle® Application Server Administrator's Guide
10g Release 2 (10.1.2)
B13995-06

22 Recovery Strategies and Procedures

This chapter describes Oracle Application Server recovery strategies and procedures for different types of failures and outages.

22.1 Recovery Strategies

This section describes Oracle Application Server recovery strategies for two classes of outage: critical outages that involve data loss, host failure, or media failure, and non-critical process failures and system outages.

22.1.1 Recovery Strategies for Data Loss, Host Failure, or Media Failure (Critical)

This section describes recovery strategies for outages that involve data loss or corruption, host failure, or media failure in which the host or disk cannot be restarted and is permanently lost. Failures of this type require data to be restored before the Oracle Application Server environment (middle tier, Infrastructure, or both) can be restarted and resume normal processing.

The strategies in this section use point-in-time recovery of the middle tier and Infrastructure. No matter where the loss occurred, the Infrastructure and the middle tier are always restored together so that they are in sync, as they were at the time of the last backup. In any recovery of an Oracle Application Server environment, the Infrastructure is always restored before the middle tier.

Assumptions

The following assumptions apply to the recovery strategies in this section:

  • ARCHIVELOG mode was enabled for all Metadata Repository backups.

  • Complete recovery of the database can be performed, that is, no redo log files have been lost.

  • No administrative changes were made since the last backup. If administrative changes were made since the last backup, they will need to be reapplied after recovery is complete.


    See Also:

    Appendix G, "Examples of Administrative Changes" to learn more about administrative changes

Determining Which Strategy to Use

Recovery strategies are listed in the following tables:

If the loss occurred in both the Infrastructure and middle tier, follow the Infrastructure recovery strategy first, then the middle tier.

Table 22-1 Recovery Strategies for Data Loss, Host Failure, and Media Failure in Infrastructures

Type of Loss | Recovery Strategies

Loss of host

You can restore to a new host that has the same hostname.

Follow the procedure in Section 22.2.3, "Restoring an Infrastructure to a New Host".

Oracle software/binary loss or corruption

If any Oracle binaries have been lost or corrupted, you must recover the entire Infrastructure.

Follow the procedure in Section 22.2.2, "Restoring an Infrastructure to the Same Host".

Database or data failure of the Metadata Repository (datafile loss, control file loss, media failure, disk corruption)

If the Metadata Repository is corrupted due to data loss or media failure, you can restore and recover it.

Follow the procedure in Section 22.2.5, "Restoring and Recovering the Metadata Repository".

Deletion or corruption of configuration files

If you lose any configuration files in the Infrastructure Oracle home, you can restore them.

Follow the procedure in Section 22.2.6, "Restoring Infrastructure Configuration Files".

Deletion or corruption of configuration files and data failure of the Metadata Repository

If you lose configuration files and the Metadata Repository is corrupted, you can restore and recover both.

Follow these procedures:

  1. Section 22.2.6, "Restoring Infrastructure Configuration Files"

  2. Section 22.2.5, "Restoring and Recovering the Metadata Repository"


Table 22-2 Recovery Strategies for Data Loss, Host Failure, and Media Failure in Middle-Tier Instances

Type of Loss | Recovery Strategies

Loss of host

If the host has been lost, you have two options:

  • You can restore to a new host that has the same hostname and IP address.

  • You can restore to a new host that has a different hostname and IP address.

In either case, follow the procedure in Section 22.2.8, "Restoring a Middle-Tier Installation to a New Host".

Note that if the original host had both a middle-tier installation and an Infrastructure, you cannot restore the middle tier to a host with a different hostname or IP address.

Oracle software/binary deletion or corruption

If any Oracle binaries have been lost or corrupted, you must restore the entire middle tier to the same host.

Follow the procedure in Section 22.2.7, "Restoring a Middle-Tier Installation to the Same Host".

Deletion or corruption of configuration files

If you lose any configuration files in the middle tier Oracle home, you can restore them.

Follow the procedure in Section 22.2.9, "Restoring Middle-Tier Configuration Files".
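The two tables above, together with the rule that the Infrastructure is restored before the middle tier, can be read as a simple lookup. The following is a minimal sketch of that lookup, not part of any Oracle tool. The section numbers are copied from Tables 22-1 and 22-2; the dictionary keys and the function name recovery_plan are hypothetical labels chosen for this illustration.

```python
# Section numbers come from Tables 22-1 and 22-2.
# The short keys ("host", "binaries", ...) are hypothetical labels.
INFRASTRUCTURE = {
    "host": ["22.2.3"],
    "binaries": ["22.2.2"],
    "repository": ["22.2.5"],
    "config": ["22.2.6"],
    # Configuration files first, then the Metadata Repository.
    "config+repository": ["22.2.6", "22.2.5"],
}

MIDDLE_TIER = {
    "host": ["22.2.8"],
    "binaries": ["22.2.7"],
    "config": ["22.2.9"],
}


def recovery_plan(infrastructure_loss=None, middle_tier_loss=None):
    """Return (tier, section) pairs in the order they should be followed.

    Infrastructure procedures always come before middle-tier procedures.
    """
    plan = []
    if infrastructure_loss is not None:
        for section in INFRASTRUCTURE[infrastructure_loss]:
            plan.append(("Infrastructure", section))
    if middle_tier_loss is not None:
        for section in MIDDLE_TIER[middle_tier_loss]:
            plan.append(("middle tier", section))
    return plan


if __name__ == "__main__":
    for tier, section in recovery_plan("config+repository", "config"):
        print(f"{tier}: follow Section {section}")
```

Keeping the two tiers in separate dictionaries makes the ordering rule explicit in one place, rather than relying on the caller to remember it.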


22.1.2 Recovery Strategies for Process Failures and System Outages (Non-Critical)

This section describes recovery strategies for process failures and system outages. These outages do not involve data loss, so no files need to be recovered. In some cases the failure is transparent and the failed component recovers without manual intervention; in other cases you must restart a process or component manually. Although these strategies are not strictly backup and recovery, they are included in this book for completeness.

Determining Which Strategy to Use

Recovery strategies for process failures and system outages are listed in the following tables:

Table 22-3 Recovery Strategies for Process Failures and System Outages in Infrastructures

Type of Outage | How to Check Status and Restart

Host failure - no data loss

To restart:

  1. Restart the host.

  2. Start the Infrastructure. Refer to Section 3.2.3, "Starting OracleAS Infrastructure".

Metadata Repository instance failure (loss of the contents of a buffer cache or data residing in memory)

To check status:

  1. Try connecting to the database using SQL*Plus.

  2. Check the state as follows:

SQL> select status from v$instance;

To restart:

sqlplus /nolog
SQL> connect sys/password as sysdba
SQL> startup
SQL> quit

Metadata Repository listener failure

To check status:

lsnrctl status

To restart:

lsnrctl start

Oracle Internet Directory server process (oidldapd) failure

To check status:

ldapcheck

To restart:

opmnctl startproc ias-component=OID

Oracle Internet Directory monitor process (oidmon) failure

To check status:

ldapcheck

To restart:

opmnctl startproc ias-component=OID

Application Server Control Console failure

To check status:

emctl status iasconsole

To restart:

emctl start iasconsole

Oracle HTTP Server process failure

To check status:

opmnctl status

To restart:

opmnctl startproc ias-component=HTTP_Server

OC4J instance failure

To check status:

opmnctl status

To restart:

opmnctl startproc process-type=OC4J_instance_name

Delegated Administration Service instance failure

To check status:

opmnctl status

To restart:

opmnctl startproc ias-component=OC4J process-type=OC4J_SECURITY

OPMN daemon failure

To check status:

opmnctl status

To restart:

opmnctl start

Table 22-4 Recovery Strategies for Process Failures and System Outages in Middle-Tier Instances

Type of Outage | How to Check Status and Restart

Host failure - no data loss

To restart:

  1. Restart the host.

  2. Start the middle tier. Refer to Section 3.2.5, "Starting a Middle-Tier Instance".

Application Server Control Console failure

To check status:

emctl status iasconsole

To restart:

emctl start iasconsole

Oracle HTTP Server process failure

To check status:

opmnctl status

To restart:

opmnctl startproc ias-component=HTTP_Server

OC4J instance failure

To check status:

opmnctl status

To restart:

opmnctl startproc process-type=OC4J_instance_name

OPMN daemon failure

To check status:

opmnctl status

To restart:

opmnctl start

OracleAS Web Cache failure

To check status:

opmnctl status

To restart:

opmnctl startproc ias-component=WebCache
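Most rows of Tables 22-3 and 22-4 pair one status command with one restart command, so they can be held in a small lookup structure. The following sketch only returns the command strings listed in the tables; it does not run them. The dictionary keys, the function names, and the OC4J instance name "home" in the example are hypothetical and chosen for illustration.

```python
# (status command, restart command) pairs copied from Tables 22-3 and 22-4.
# The keys are hypothetical labels, not Oracle component identifiers.
COMMANDS = {
    "listener": ("lsnrctl status", "lsnrctl start"),
    "oid": ("ldapcheck", "opmnctl startproc ias-component=OID"),
    "console": ("emctl status iasconsole", "emctl start iasconsole"),
    "http_server": ("opmnctl status",
                    "opmnctl startproc ias-component=HTTP_Server"),
    "web_cache": ("opmnctl status",
                  "opmnctl startproc ias-component=WebCache"),
    "opmn": ("opmnctl status", "opmnctl start"),
}


def check_command(component):
    """Return the command the tables list for checking a component."""
    if component == "oc4j":
        return "opmnctl status"
    return COMMANDS[component][0]


def restart_command(component, oc4j_instance=None):
    """Return the command the tables list for restarting a component.

    OC4J is handled separately because its restart command needs the
    name of the OC4J instance as the process-type value.
    """
    if component == "oc4j":
        if not oc4j_instance:
            raise ValueError("an OC4J instance name is required")
        return f"opmnctl startproc process-type={oc4j_instance}"
    return COMMANDS[component][1]


if __name__ == "__main__":
    print(check_command("oid"))
    print(restart_command("oid"))
    print(restart_command("oc4j", "home"))  # "home" is a hypothetical name
```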

22.2 Recovery Procedures

This section contains the procedures for performing different types of recovery.

22.2.1 Using Application Server Control Console to Recover an Oracle Application Server Instance

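You can use the Oracle Enterprise Manager 10g Application Server Control Console to manage backup and recovery of an Oracle Application Server instance. This section describes how to recover an instance: if the instance is in a cluster, first stop the OC4J processes, then perform the recovery from the console, and finally restart the OC4J processes.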

Before performing a restore operation (restore_instance or restore_config) on an instance in a cluster, all OC4J processes across the cluster must be stopped. Use the following command to stop the processes:

ORACLE_HOME/opmn/bin/opmnctl @cluster stopproc ias-component=OC4J

Some OC4J components (such as Wireless) do not have ias-component=OC4J. For these components use the uniqueid value to stop the OC4J process. To determine which components have a uniqueid, use the following command:

ORACLE_HOME\opmn\bin\opmnctl @cluster status -fmt %typ%uid%prt -noheaders

The following is an example of the output from the command:

CUSTOM | N/A | DSA
LOGLDR | N/A | logloaderd
DCMDaemon | 1444413512 | dcm-daemon
WebCache | 1500577871 | WebCache
WebCache-admin | 1500577872 | WebCacheAdmin
OHS | 1500577870 | HTTP_Server
performance | 1500577873 | performance_server
messaging | 1500577874 | messaging_server
OC4J | 1500577865 | OC4J_Wireless

For each OC4J process whose second column (uid) is not "N/A", stop the process by passing that uid as the uniqueid value, as in the following command:

ORACLE_HOME\opmn\bin\opmnctl @cluster stopproc uniqueid=1500577865

opmnctl: stopping opmn managed processes...
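Picking the uid values out of the status listing by hand is error-prone when a cluster has many components. The following is a minimal sketch that parses output in the three-column form shown above and builds one stopproc command per OC4J row that has a uid. It assumes the columns are separated by "|" exactly as in the sample; the function names are hypothetical, and the commands are only built as strings, with the ORACLE_HOME path prefix left out.

```python
# Sample output copied from the example above.
SAMPLE_STATUS = """\
CUSTOM | N/A | DSA
LOGLDR | N/A | logloaderd
DCMDaemon | 1444413512 | dcm-daemon
WebCache | 1500577871 | WebCache
WebCache-admin | 1500577872 | WebCacheAdmin
OHS | 1500577870 | HTTP_Server
performance | 1500577873 | performance_server
messaging | 1500577874 | messaging_server
OC4J | 1500577865 | OC4J_Wireless
"""


def oc4j_unique_ids(status_output):
    """Return the uid of every OC4J row whose uid column is not N/A."""
    ids = []
    for line in status_output.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        process_type, uid, _name = fields
        if process_type == "OC4J" and uid != "N/A":
            ids.append(uid)
    return ids


def stop_commands(status_output):
    """Build one stopproc command per OC4J uid found in the listing."""
    return [
        f"opmnctl @cluster stopproc uniqueid={uid}"
        for uid in oc4j_unique_ids(status_output)
    ]


if __name__ == "__main__":
    for command in stop_commands(SAMPLE_STATUS):
        print(command)
```

For the sample listing, the only OC4J row is OC4J_Wireless, so a single command with uniqueid=1500577865 is produced.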

  1. From the Home page for an application server instance, click Backup/Recovery to display the Backup/Recovery page.

  2. Click Perform Recovery. Depending on the type of installation, the middle tier recovery screen or the Infrastructure recovery screen displays:

    [Figure: Middle-tier recovery screen (asadm048.gif)]

    [Figure: Infrastructure recovery screen (asadm049.gif)]

  3. On the Infrastructure recovery screen, you can select the Recover Control Files check box to recover the control files for the instance. Click OK to perform the restore.

After the restore operation is complete, use the following command to restart the OC4J processes across the cluster:

ORACLE_HOME/opmn/bin/opmnctl @cluster startproc ias-component=OC4J

For components that use uniqueid, you can restart their process by using the appropriate ias-component value or by using the following command:

opmnctl startall

22.2.2 Restoring an Infrastructure to the Same Host

This section describes how to restore an Infrastructure to the same host. You can use this procedure when you have lost some or all of your Oracle binaries.

Refer to Section 21.3.5, "Recovering an Instance on the Same Host" to restore the image backup of the Infrastructure Oracle home from your complete Oracle Application Server environment backup.


Note:

If your Infrastructure is split and has Identity Management in one Oracle home, and the Metadata Repository in another Oracle home, perform this step on both Oracle homes.


Note:

If you receive a WWC-41439 error while trying to log in to the Portal Home page, do one or more of the following:
  • Remove aliases from your Apache configuration.

  • Include the domain in the ServerName parameter.

  • Fix the Host in the IASInstance element and ListenPort in the WebCacheComponent element in iasconfig.xml and run ptlconfig -dad portal-site. The ptlconfig script and the iasconfig.xml file are normally located in the directory portal/conf under the OracleAS Portal and OracleAS Wireless middle-tier home.


22.2.3 Restoring an Infrastructure to a New Host

Refer to Section 21.3.3, "Restoring a Node on a New Host" to perform the following types of restores:

  • Restore an Infrastructure to the same host after the operating system has been reinstalled. The hostname must remain the same on the host.

  • Restore an Infrastructure to a new host that has the same hostname as the original host.


Note:

If your Infrastructure is split and has Identity Management in one Oracle home, and the Metadata Repository in another Oracle home, perform the procedures on both Oracle homes as described in Section 22.2.4, "Restoring an Identity Management Instance to a New Host" and Section 22.2.5.2, "Restoring and Recovering the Metadata Repository to a New Host".

22.2.4 Restoring an Identity Management Instance to a New Host

Refer to Section 21.3, "Recovering a Loss of Host Automatically" to perform the following types of restores:

  • Restore Identity Management to the same host after the operating system has been reinstalled. The hostname must remain the same on the host.

  • Restore Identity Management to a new host that has either the same hostname as the original host or a different hostname.

22.2.5 Restoring and Recovering the Metadata Repository

This section describes how to restore and recover the Metadata Repository. Use this procedure when only the Metadata Repository is corrupted and no other files in the Oracle home are affected.

Restore and recover the Metadata Repository from your latest backup using your own procedure or the OracleAS Backup and Recovery Tool. Restart all Infrastructure processes after restoring a Metadata Repository.

The following sections describe Oracle-recommended procedures for using the OracleAS Backup and Recovery Tool to restore and recover the Metadata Repository:

22.2.5.1 Restoring and Recovering the Metadata Repository to the Same Host

This section covers several circumstances under which you may need to restore and recover the Metadata Repository to the same host:

Corrupted or Lost Datafile

If a datafile is corrupted or lost, you can use the following command to restore from the latest backup and perform a full recovery:

For UNIX:

bkp_restore.sh -m restore_repos

For Windows:

bkp_restore.bat -m restore_repos

Corrupted or Lost Control File

If a control file is corrupted or lost, you can use the following command to restore a control file backup, restore the datafiles, and perform a full recovery:

For UNIX:

bkp_restore.sh -m restore_repos -c

For Windows:

bkp_restore.bat -m restore_repos -c

The -c option restores the control file. This removes the entries for tempfiles in locally managed temporary tablespaces, so you must add a new tempfile to the TEMP tablespace. Otherwise, Oracle displays error ORA-25153: Temporary Tablespace is Empty.

To add a tempfile to the TEMP tablespace:

SQL> alter tablespace "TEMP" add tempfile 'ORACLE_HOME/oradata/GDB/temp01.dbf' size 5120K autoextend on next 8k maxsize unlimited;

GDB is the first part of the global database name.
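Because the tempfile path depends on both the Oracle home and the first part of the global database name, it is easy to mistype. The following sketch builds the statement shown above from those two values. The function name and the example values /u01/app/oracle/infra and orcl are hypothetical; the size and autoextend clauses are copied unchanged from the statement above.

```python
def add_tempfile_sql(oracle_home, gdb):
    """Build the ALTER TABLESPACE statement for a new TEMP tempfile.

    oracle_home: path of the Oracle home that contains oradata
    gdb: first part of the global database name
    """
    path = f"{oracle_home}/oradata/{gdb}/temp01.dbf"
    if "'" in path:
        # A quote would end the SQL string literal early.
        raise ValueError("path must not contain a single quote")
    return (
        f"alter tablespace \"TEMP\" add tempfile '{path}' "
        "size 5120K autoextend on next 8k maxsize unlimited;"
    )


if __name__ == "__main__":
    # Both arguments are hypothetical example values.
    print(add_tempfile_sql("/u01/app/oracle/infra", "orcl"))
```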

Note that when you restore a control file, the tool performs an "alter database open resetlogs". This invalidates all backups and archivelogs. You should immediately perform a complete cold backup of the Metadata Repository, which will serve as the new baseline for your subsequent partial online backups.

Point-in-Time Recovery and Flashback Recovery

If you lost configuration files in your middle-tier or Infrastructure installation and restored them, you may want to restore or flash back the database to the same point in time as the configuration file backup. You can do this using one of the following commands:

For UNIX:

bkp_restore.sh -m restore_repos -u timestamp

bkp_restore.sh -m flashback_repos -u timestamp

For Windows:

bkp_restore.bat -m restore_repos -u timestamp

bkp_restore.bat -m flashback_repos -u timestamp

Flashback recovery to a point-in-time can undo any logical data corruption or user error. Flashback cannot undo physical data corruption due to media failure. Using the restore_repos command, you can recover and restore the database to a point-in-time for both logical and physical data corruption. However, Flashback is faster at recovering logical data corruption because it does not require restoring backups.
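The choice between the two modes described above reduces to one question: is the corruption physical? The following sketch expresses that rule. It is an illustration only, and the function name is hypothetical; the mode names restore_repos and flashback_repos are the ones used in this section.

```python
def repository_recovery_mode(physical_corruption):
    """Pick the Backup and Recovery Tool mode for a point-in-time recovery.

    Flashback cannot undo physical corruption caused by media failure,
    so that case needs restore_repos. For logical corruption or user
    error, flashback_repos is preferred because it does not require
    restoring backups.
    """
    if physical_corruption:
        return "restore_repos"
    return "flashback_repos"


if __name__ == "__main__":
    print(repository_recovery_mode(physical_corruption=False))
    print(repository_recovery_mode(physical_corruption=True))
```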

You can specify any time between the time of your first backup and the current time, as long as none of the online redo logs were compromised. If any online redo logs are missing or corrupted, the latest time that can be specified is the time at which the last backup was made.
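The rule above defines a window of valid recovery targets whose upper bound depends on the state of the online redo logs. The following is a minimal sketch of that check, using Python datetime values rather than any tool-specific timestamp format. The function names and the dates in the example are hypothetical.

```python
from datetime import datetime


def recovery_window(first_backup, last_backup, now, online_redo_intact):
    """Return (earliest, latest) valid point-in-time recovery targets.

    If any online redo log is missing or corrupted, the latest valid
    target is the time of the last backup rather than the current time.
    """
    latest = now if online_redo_intact else last_backup
    return first_backup, latest


def is_valid_target(target, first_backup, last_backup, now,
                    online_redo_intact):
    """Check whether a requested recovery time falls inside the window."""
    earliest, latest = recovery_window(
        first_backup, last_backup, now, online_redo_intact
    )
    return earliest <= target <= latest


if __name__ == "__main__":
    # All dates below are hypothetical example values.
    first = datetime(2004, 9, 1, 0, 0, 0)
    last = datetime(2004, 9, 21, 6, 12, 45)
    now = datetime(2004, 9, 22, 12, 0, 0)
    target = datetime(2004, 9, 22, 0, 0, 0)
    print(is_valid_target(target, first, last, now, True))
    print(is_valid_target(target, first, last, now, False))
```

In the example, the target lies after the last backup, so it is accepted only when the online redo logs are intact.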

Note that when you do point-in-time recovery, the tool performs an "alter database open resetlogs". This invalidates all backups and archivelogs. You should immediately perform a complete cold backup of the Metadata Repository, which will serve as the new baseline for your subsequent partial online backups.

The Backup and Recovery Tool supports point-in-time recovery through resetlogs in all Oracle databases: Infrastructure with Identity Manager and Metadata Repository, RepCA, and generic Oracle databases (for example, OCS Infostore). The following is an example of a point-in-time recovery through resetlogs:

  • At time T1, a backup of the database is taken. Changes are then made to the database.

  • At time T2, a new backup is taken. More changes are made to the database.

  • At time T3, another backup is taken. More changes are made.

  • At time T4, the user restores and recovers the database to T3. Because this is a point-in-time recovery, the Backup and Recovery Tool opens the database with resetlogs to start a new log sequence after the recovery.

  • At time T5, the user restores and recovers the database to T2 through the resetlogs created at T4.

Multiple backward point-in-time recoveries are supported for backups taken using backup_instance_cold, backup_instance_online, and backup_instance_incr. To perform multiple backward point-in-time recoveries using backup_cold, backup_online, and backup_incr, you must follow the backup operation immediately with backup_config.

22.2.5.2 Restoring and Recovering the Metadata Repository to a New Host

When you restore the Metadata Repository to a new host (with the same hostname), the new host does not have the online redo logs that existed on the original host. Therefore, you cannot perform a full recovery; RMAN would return an error stating that it cannot find a certain log file (the online redo log file). Instead, perform a point-in-time recovery to a time between the first and the most recent backup. You do this by specifying the appropriate timestamp for the LOHA reconfigure operation. Use the procedure in Section 21.3.3, "Restoring a Node on a New Host" to restore the Metadata Repository.

During the LOHA reconfigure process, if the RMAN command returns an error and the log shows that the datafiles were restored and recovered, then LOHA issues an "alter database open resetlogs" and the database is opened in a consistent state. If no datafiles were restored and recovered, the timestamp that was specified is most likely too early. Retry the command with a later timestamp.

LOHA uses the -c option during the restore process, which means that the control file is restored from backup. This causes entries for tempfiles in locally managed temporary tablespaces to be removed and a new TEMP tablespace to be added automatically. Restoring the control file means that an "alter database open resetlogs" is always performed, which invalidates all backups and archivelogs. You should immediately perform a complete cold backup of the Metadata Repository, which will serve as the new baseline for your subsequent partial online backups.

22.2.6 Restoring Infrastructure Configuration Files

This section describes how to restore the configuration files in an Infrastructure Oracle home. You can use this procedure when configuration files have been lost or corrupted.

It contains the following tasks:

Task 1: Stop the Infrastructure

Refer to Section 3.2.4, "Stopping OracleAS Infrastructure" for instructions.

Task 2: Restore Infrastructure Configuration Files


Note:

If your Infrastructure is split and has Identity Management in one Oracle home, and the Metadata Repository in another Oracle home, perform this task on both Oracle homes.

Restore all configuration files from your most recent backup. You can perform this task using your own procedure or the OracleAS Backup and Recovery Tool. For example, to do this using the tool:

  • On UNIX systems:

    bkp_restore.sh -m restore_config -t timestamp
    
    
  • On Windows systems:

    bkp_restore.bat -m restore_config -t timestamp
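If the restore is driven from a script that runs on both platforms, the only difference between the two commands above is the script extension. The following sketch builds the argument list for either platform. The function name is hypothetical, and the timestamp in the example is a placeholder value shaped like the one shown in Section 22.2.11; substitute the timestamp of your own backup.

```python
def restore_config_command(timestamp, windows=False):
    """Build the restore_config argument list for the chosen platform.

    The result is a list so that it can be passed to a process launcher
    without shell quoting. Nothing is executed here.
    """
    script = "bkp_restore.bat" if windows else "bkp_restore.sh"
    return [script, "-m", "restore_config", "-t", timestamp]


if __name__ == "__main__":
    # Placeholder timestamp, not taken from a real backup.
    print(" ".join(restore_config_command("2004-09-21_06-12-45")))
    print(" ".join(restore_config_command("2004-09-21_06-12-45",
                                          windows=True)))
```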
    

Task 3: Apply Recent Administrative Changes

If you made any administrative changes since the last time you did an online backup, reapply them now.


See Also:

Appendix G, "Examples of Administrative Changes" to learn more about administrative changes

Task 4: Start the Infrastructure

Refer to Section 3.2.3, "Starting OracleAS Infrastructure" for instructions.

22.2.7 Restoring a Middle-Tier Installation to the Same Host

To restore a middle-tier installation to the same host, refer to Section 21.3.5, "Recovering an Instance on the Same Host".

22.2.8 Restoring a Middle-Tier Installation to a New Host

This section describes how to restore and recover a middle-tier installation to a new host. You can use this procedure to:

  • Restore a middle-tier installation to the same host after the operating system has been reinstalled.

  • Restore a middle-tier installation to a new host. The new host may have the same hostname and IP address as the original host, or a different hostname, IP address, or both.

If the DCM repository is a database, start the OPMN and Oracle Internet Directory processes on the corresponding Infrastructure instance.

To check whether the DCM repository is a database or a file-based repository, use the following command:

    ORACLE_HOME/dcm/bin/dcmctl whichfarm

The command returns one of the following messages:

    Repository Type: Database => uses a database repository
    Repository Type: Distributed File Based => uses a file based repository

If the repository is a database:

  • Use the following command to start the OPMN process:

    opmnctl start

  • Use the following command to start the Oracle Internet Directory process:

    opmnctl startproc ias-component=OID
    
    

Perform the steps in Section 21.3.3, "Restoring a Node on a New Host" to restore the image backup, system files and instance reconfiguration. Note that the middle-tier configuration remains in the same state as the original instance. If the hostname remains the same, run an instance restore to bring the instance to the desired point in time. If the hostname is different, the state cannot be changed since backups of the original host are not valid for a different hostname.
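The repository-type check described earlier in this section can be scripted by reading the "Repository Type:" line from the dcmctl whichfarm output. The following sketch assumes the output contains a line in one of the two forms shown above; any other lines that the command prints are ignored. The function name and the return values "database" and "file" are hypothetical.

```python
def repository_kind(whichfarm_output):
    """Classify the DCM repository from dcmctl whichfarm output.

    Returns "database", "file", or None if no recognizable
    "Repository Type:" line is present.
    """
    for line in whichfarm_output.splitlines():
        line = line.strip()
        if not line.startswith("Repository Type:"):
            continue
        value = line.split(":", 1)[1].strip()
        if value.startswith("Database"):
            return "database"
        if value.startswith("Distributed File Based"):
            return "file"
    return None


if __name__ == "__main__":
    kind = repository_kind("Repository Type: Database")
    if kind == "database":
        # Only a database repository needs OPMN and OID started first.
        print("opmnctl start")
        print("opmnctl startproc ias-component=OID")
```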


Note:

A special step is required to update OracleAS Portal and OracleAS Wireless when you change the hostname. When you change the hostname, the OracleAS Wireless server URL changes to use the new hostname. You must update OracleAS Portal with the new OracleAS Wireless service URL. Refer to the section on "Updating the Oracle AS Wireless Portal Service URL Reference" in "Oracle Application Server Portal Configuration Guide" for instructions.

22.2.9 Restoring Middle-Tier Configuration Files

This section describes how to restore the configuration files in a middle-tier Oracle home. Use this procedure when configuration files have been lost or corrupted.

It contains the following tasks:

Task 1: Stop the Middle-Tier Instance

Refer to Section 3.2.6, "Stopping a Middle-Tier Instance" for instructions.

If the middle-tier instance uses a DCM repository (file-based or database), make sure the DCM repository is up.

Task 2: Restore Middle-Tier Configuration Files

Restore all configuration files from your most recent backup. You can perform this task using your own procedure or the OracleAS Backup and Recovery Tool. For example, to do this using the tool:

  • For UNIX systems:

    bkp_restore.sh -m restore_config -t timestamp
    
    
  • For Windows systems:

    bkp_restore.bat -m restore_config -t timestamp
    

Task 3: Apply Recent Administrative Changes

If you made any administrative changes since the last time you did an online backup, reapply them now.


See Also:

Appendix G, "Examples of Administrative Changes" to learn more about administrative changes

Task 4: Start the Middle-Tier Instance

Refer to Section 3.2.5, "Starting a Middle-Tier Instance" for instructions.

22.2.10 Restoring a File-Based Repository to a New Host

This section describes how to restore a DCM file-based repository to a new host. This section contains the following tasks:

Task 1: Restore Image Backup, System Files and Instance Reconfiguration

If the DCM repository is a database, start the OPMN and Oracle Internet Directory processes on the corresponding Infrastructure instance.

To check whether the DCM repository is a database or a file-based repository, use the following command:

    ORACLE_HOME/dcm/bin/dcmctl whichfarm

The command returns one of the following messages:

    Repository Type: Database => uses a database repository
    Repository Type: Distributed File Based => uses a file based repository

If the repository is a database:

  • Use the following command to start the OPMN process:

    opmnctl start

  • Use the following command to start the Oracle Internet Directory process:

    opmnctl startproc ias-component=OID
    
    

Perform the steps in Section 21.3.3, "Restoring a Node on a New Host" to restore the image backup, system files and instance reconfiguration.

Task 2: Inform the Original Host That It Is No Longer a Repository Host (If Required)

Now that the file-based repository is restored to the new host, the original host may need to be informed that it is no longer a repository host. If the new host was already a part of the farm and is not a replacement for the original host, and the original host is still part of the farm, execute the following command on the original host:

dcmctl repositoryrelocated
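The condition for running dcmctl repositoryrelocated combines three facts about the farm. The following sketch writes that condition as a single boolean expression so that each part is explicit. The function and parameter names are hypothetical; the logic is taken from the paragraph above.

```python
def must_run_repositoryrelocated(new_host_was_in_farm,
                                 new_host_replaces_original,
                                 original_still_in_farm):
    """Decide whether to run dcmctl repositoryrelocated on the original host.

    The command is needed only when the new host was already part of
    the farm, it is not a replacement for the original host, and the
    original host is still part of the farm.
    """
    return (
        new_host_was_in_farm
        and not new_host_replaces_original
        and original_still_in_farm
    )


if __name__ == "__main__":
    if must_run_repositoryrelocated(True, False, True):
        print("dcmctl repositoryrelocated")
```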

22.2.11 Restoring an Oracle Application Server Instance

Use the following command to restore an Oracle Application Server instance to a particular point in time (bkp_restore.sh on UNIX, bkp_restore.bat on Windows):

bkp_restore.sh -m restore_instance -t 2004-09-21_06-12-45 -c

bkp_restore.bat -m restore_instance -t 2004-09-21_06-12-45 -c
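A mistyped timestamp is easier to catch before the restore is started than after. The following sketch checks a timestamp against the shape of the example above (year-month-day, an underscore, then hour-minute-second). It assumes that every -t timestamp follows the same shape as this one example, which this chapter does not state explicitly; the function name and the constant are hypothetical.

```python
from datetime import datetime

# Format inferred from the example timestamp 2004-09-21_06-12-45.
TIMESTAMP_FORMAT = "%Y-%m-%d_%H-%M-%S"


def parse_backup_timestamp(text):
    """Parse a timestamp shaped like 2004-09-21_06-12-45.

    Raises ValueError if the text does not match the expected shape
    or does not describe a real date and time.
    """
    return datetime.strptime(text, TIMESTAMP_FORMAT)


if __name__ == "__main__":
    print(parse_backup_timestamp("2004-09-21_06-12-45"))
```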

Before performing a restore operation (restore_instance or restore_config) on an instance in a cluster, all OC4J processes across the cluster must be stopped. Use the following command to stop the processes:

ORACLE_HOME/opmn/bin/opmnctl @cluster stopproc ias-component=OC4J

Some OC4J components (such as Wireless) do not have ias-component=OC4J. For these components use the uniqueid value to stop the OC4J process. To determine which components have a uniqueid, use the following command:

ORACLE_HOME\opmn\bin\opmnctl @cluster status -fmt %typ%uid%prt -noheaders

The following is an example of the output from the command:

CUSTOM | N/A | DSA
LOGLDR | N/A | logloaderd
DCMDaemon | 1444413512 | dcm-daemon
WebCache | 1500577871 | WebCache
WebCache-admin | 1500577872 | WebCacheAdmin
OHS | 1500577870 | HTTP_Server
performance | 1500577873 | performance_server
messaging | 1500577874 | messaging_server
OC4J | 1500577865 | OC4J_Wireless

For each OC4J process whose second column (uid) is not "N/A", stop the process by passing that uid as the uniqueid value, as in the following command:

ORACLE_HOME\opmn\bin\opmnctl @cluster stopproc uniqueid=1500577865

opmnctl: stopping opmn managed processes...

After the restore operation is complete, use the following command to restart the OC4J processes across the cluster:

ORACLE_HOME/opmn/bin/opmnctl @cluster startproc ias-component=OC4J

For components that use uniqueid, you can restart their process by using the appropriate ias-component value or by using the following command:

opmnctl startall