Oracle® Application Server Administrator's Guide
10g Release 2 (10.1.2) B13995-06
This chapter describes Oracle Application Server recovery strategies and procedures for different types of failures and outages.
It contains the following topics:
This section describes Oracle Application Server recovery strategies for different types of failures and outages. It contains the following topics:
Recovery Strategies for Data Loss, Host Failure, or Media Failure (Critical)
Recovery Strategies for Process Failures and System Outages (Non-Critical)
This section describes recovery strategies for outages that involve actual data loss or corruption, host failure, or media failure in which the host or disk cannot be restarted and is permanently lost. This type of failure requires data restoration before the Oracle Application Server environment (middle tier, Infrastructure, or both) can be restarted and continue with normal processing.
The strategies in this section use point-in-time recovery of the middle tier and Infrastructure. This means that, no matter where the loss occurred, the Infrastructure and the middle tier are always restored together so they are in sync as they were at the time of the last backup. Notice that in an Oracle Application Server environment recovery, the Infrastructure is always restored before the middle tier.
Assumptions
The following assumptions apply to the recovery strategies in this section:
ARCHIVELOG mode was enabled for all Metadata Repository backups.
Complete recovery of the database can be performed, that is, no redo log files have been lost.
No administrative changes were made since the last backup. If administrative changes were made since the last backup, they will need to be reapplied after recovery is complete.
See Also: Appendix G, "Examples of Administrative Changes" to learn more about administrative changes
Determining Which Strategy to Use
Recovery strategies are listed in the following tables:
Table 22-1, "Recovery Strategies for Data Loss, Host Failure, and Media Failure in Infrastructures"
Use this table if you experience data loss, host failure, or media failure in an Infrastructure installation. Find the type of loss and follow the recommended procedure. The procedures apply to Infrastructures that are installed into a single Oracle home, as well as Infrastructures with Identity Management in one Oracle home and a Metadata Repository in another Oracle home or host.
Table 22-2, "Recovery Strategies for Data Loss, Host Failure, and Media Failure in Middle-Tier Instances"
Use this table if you experience data loss, host failure, or media failure in a middle-tier installation. Find the type of loss and follow the recommended procedure.
If the loss occurred in both the Infrastructure and middle tier, follow the Infrastructure recovery strategy first, then the middle tier.
Table 22-1 Recovery Strategies for Data Loss, Host Failure, and Media Failure in Infrastructures
Type of Loss | Recovery Strategies |
---|---|
Loss of host | You can restore to a new host that has the same hostname. Follow the procedure in Section 22.2.3, "Restoring an Infrastructure to a New Host". |
Oracle software/binary loss or corruption | If any Oracle binaries have been lost or corrupted, you must recover the entire Infrastructure. Follow the procedure in Section 22.2.2, "Restoring an Infrastructure to the Same Host". |
Database or data failure of the Metadata Repository (datafile loss, control file loss, media failure, disk corruption) | If the Metadata Repository is corrupted due to data loss or media failure, you can restore and recover it. Follow the procedure in Section 22.2.5, "Restoring and Recovering the Metadata Repository". |
Deletion or corruption of configuration files | If you lose any configuration files in the Infrastructure Oracle home, you can restore them. Follow the procedure in Section 22.2.6, "Restoring Infrastructure Configuration Files". |
Deletion or corruption of configuration files and data failure of the Metadata Repository | If you lose configuration files and the Metadata Repository is corrupted, you can restore and recover both. Follow these procedures: |
Table 22-2 Recovery Strategies for Data Loss, Host Failure, and Media Failure in Middle-Tier Instances
Type of Loss | Recovery Strategies |
---|---|
Loss of host | If the host has been lost, you have two options: In either case, follow the procedure in Section 22.2.8, "Restoring a Middle-Tier Installation to a New Host". Note that if the original host had a middle-tier installation and an Infrastructure, you cannot restore the middle-tier to a host with a different hostname or IP address. |
Oracle software/binary deletion or corruption | If any Oracle binaries have been lost or corrupted, you must restore the entire middle tier to the same host. Follow the procedure in Section 22.2.7, "Restoring a Middle-Tier Installation to the Same Host". |
Deletion or corruption of configuration files | If you lose any configuration files in the middle tier Oracle home, you can restore them. Follow the procedure in Section 22.2.9, "Restoring Middle-Tier Configuration Files". |
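The selection logic in Tables 22-1 and 22-2 can be sketched as a simple lookup. The dictionary keys and the function name `recovery_plan` are labels invented for this illustration; they are not part of any Oracle tool. The combined configuration-file and Metadata Repository case is left out because its procedure list is not reproduced in the table.

```python
# Hypothetical lookup built from Tables 22-1 and 22-2.
# Keys are illustrative labels; values are the section references from the tables.
INFRASTRUCTURE = {
    "host": '22.2.3, "Restoring an Infrastructure to a New Host"',
    "binaries": '22.2.2, "Restoring an Infrastructure to the Same Host"',
    "metadata_repository": '22.2.5, "Restoring and Recovering the Metadata Repository"',
    "config_files": '22.2.6, "Restoring Infrastructure Configuration Files"',
}

MIDDLE_TIER = {
    "host": '22.2.8, "Restoring a Middle-Tier Installation to a New Host"',
    "binaries": '22.2.7, "Restoring a Middle-Tier Installation to the Same Host"',
    "config_files": '22.2.9, "Restoring Middle-Tier Configuration Files"',
}


def recovery_plan(infrastructure_loss=None, middle_tier_loss=None):
    """Return the sections to follow, Infrastructure first, then middle tier."""
    plan = []
    if infrastructure_loss is not None:
        plan.append("Section " + INFRASTRUCTURE[infrastructure_loss])
    if middle_tier_loss is not None:
        plan.append("Section " + MIDDLE_TIER[middle_tier_loss])
    return plan


if __name__ == "__main__":
    for step in recovery_plan("metadata_repository", "config_files"):
        print(step)
```

The function always lists the Infrastructure procedure first, which mirrors the rule that a loss in both tiers is handled by recovering the Infrastructure before the middle tier.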
This section describes recovery strategies for process failures and system outages. These types of outages do not involve any data loss, and therefore do not require any files to be recovered. In some cases, failure may be transparent and no manual intervention is required to recover the failed component. However, in some cases, manual intervention is required to restart a process or component. While these strategies do not strictly fit into the category of backup and recovery, they are included in this book for completeness.
Determining Which Strategy to Use
Recovery strategies for process failures and system outages are listed in the following tables:
Table 22-3, "Recovery Strategies for Process Failures and System Outages in Infrastructures"
Use this table if you experience a failure or outage in an Infrastructure. Find the type of outage and follow the recommended procedure. The procedures apply to Infrastructures that are installed into a single Oracle home, as well as Infrastructures with Identity Management in one Oracle home and a Metadata Repository in another Oracle home or host.
Table 22-4, "Recovery Strategies for Process Failures and System Outages in Middle-Tier Instances"
Use this table if you experience a failure or outage on a middle-tier installation. Find the type of outage and follow the recommended procedure. The table contains UNIX commands. You can use the same commands on Windows by inverting the slashes, or you can use the Services tool in the Control Panel.
Table 22-3 Recovery Strategies for Process Failures and System Outages in Infrastructures
Type of Outage | How to Check Status | How to Restart |
---|---|---|
Host failure - no data loss | | |
Metadata Repository instance failure (loss of the contents of a buffer cache or data residing in memory) | SQL> select status from v$instance; | sqlplus /nolog, then SQL> connect sys/password as sysdba, SQL> startup, SQL> quit |
Metadata Repository listener failure | lsnrctl status | lsnrctl start |
Oracle Internet Directory server process ( | ldapcheck | opmnctl startproc ias-component=OID |
Oracle Internet Directory monitor process ( | ldapcheck | opmnctl startproc ias-component=OID |
Application Server Control Console failure | | emctl start iasconsole |
Oracle HTTP Server process failure | opmnctl status | opmnctl startproc ias-component=HTTP_Server |
OC4J instance failure | opmnctl status | opmnctl startproc process-type=OC4J_instance_name |
 | opmnctl status | opmnctl startproc ias-component=OC4J process-type=OC4J_SECURITY |
OPMN daemon failure | opmnctl status | opmnctl start |
Table 22-4 Recovery Strategies for Process Failures and System Outages in Middle-Tier Instances
Type of Outage | How to Check Status | How to Restart |
---|---|---|
Host failure - no data loss | | |
Application Server Control Console failure | | emctl start iasconsole |
Oracle HTTP Server process failure | opmnctl status | opmnctl startproc ias-component=HTTP_Server |
OC4J instance failure | opmnctl status | opmnctl startproc process-type=OC4J_instance_name |
OPMN daemon failure | opmnctl status | opmnctl start |
OracleAS Web Cache failure | opmnctl status | opmnctl startproc ias-component=WebCache |
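For scripted monitoring, the status and restart commands in Tables 22-3 and 22-4 can be kept in one place. The following is a minimal sketch: the dictionary keys and the function `restart_command` are names chosen for this example, and only commands that appear in the tables are included. It builds argument lists and does not run anything.

```python
# Commands transcribed from Tables 22-3 and 22-4 (UNIX form).
# The dictionary keys are labels chosen for this sketch only.
COMMANDS = {
    "metadata_repository_listener": {
        "status": ["lsnrctl", "status"],
        "restart": ["lsnrctl", "start"],
    },
    "oracle_internet_directory": {
        "status": ["ldapcheck"],
        "restart": ["opmnctl", "startproc", "ias-component=OID"],
    },
    "http_server": {
        "status": ["opmnctl", "status"],
        "restart": ["opmnctl", "startproc", "ias-component=HTTP_Server"],
    },
    "web_cache": {
        "status": ["opmnctl", "status"],
        "restart": ["opmnctl", "startproc", "ias-component=WebCache"],
    },
    "opmn": {
        "status": ["opmnctl", "status"],
        "restart": ["opmnctl", "start"],
    },
}


def restart_command(component, oc4j_instance=None):
    """Return the restart command for a component as an argument list."""
    if component == "oc4j":
        if not oc4j_instance:
            raise ValueError("an OC4J instance name is required")
        # The table shows: opmnctl startproc process-type=OC4J_instance_name
        return ["opmnctl", "startproc", "process-type=" + oc4j_instance]
    return list(COMMANDS[component]["restart"])


if __name__ == "__main__":
    print(" ".join(restart_command("http_server")))
```

Each returned list can be passed to `subprocess.run` on a host where the corresponding Oracle utilities are on the path.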
This section contains the procedures for performing different types of recovery.
It contains the following topics:
You can use the Oracle Enterprise Manager 10g Application Server Control Console to manage backup and recovery of an Oracle Application Server instance. Use the following procedure to recover an Oracle Application Server instance:
Before performing a restore operation (restore_instance or restore_config) on an instance in a cluster, all OC4J processes across the cluster must be stopped. Use the following command to stop the processes:
ORACLE_HOME/opmn/bin/opmnctl @cluster stopproc ias-component=OC4J
Some OC4J components (such as Wireless) do not have ias-component=OC4J. For these components, use the uniqueid value to stop the OC4J process. To determine which components have a uniqueid, use the following command:
ORACLE_HOME\opmn\bin\opmnctl @cluster status -fmt %typ%uid%prt -noheaders
The following is an example of the output from the command:
CUSTOM | N/A | DSA
LOGLDR | N/A | logloaderd
DCMDaemon | 1444413512 | dcm-daemon
WebCache | 1500577871 | WebCache
WebCache-admin | 1500577872 | WebCacheAdmin
OHS | 1500577870 | HTTP_Server
performance | 1500577873 | performance_server
messaging | 1500577874 | messaging_server
OC4J | 1500577865 | OC4J_Wireless
Stop each OC4J process whose second-column (uid) value is not "N/A" by passing that uid to the following command:
ORACLE_HOME\opmn\bin\opmnctl @cluster stopproc uniqueid=1500577865
opmnctl: stopping opmn managed processes...
From the Home page for an application server instance, click Backup/Recovery to display the Backup/Recovery page.
Click Perform Recovery. Depending on the type of installation, the middle tier recovery screen or the Infrastructure recovery screen displays:
For the Infrastructure recovery screen, you can click the Recover Control Files check box to recover the control files for the instance. Click OK to perform the restore.
After the restore operation is complete, use the following command to restart the OC4J processes across the cluster:
ORACLE_HOME/opmn/bin/opmnctl @cluster startproc ias-component=OC4J
For components that use uniqueid, you can restart their process by using the appropriate ias-component value or by using the following command:
opmnctl startall
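The uniqueid lookup in the procedure above can be automated. The following sketch parses status output in the three-column form shown in the example and builds one stopproc command per OC4J process. The function names are hypothetical, and the rule for recognizing an OC4J row (either name column begins with "OC4J") is an assumption based on the sample output only.

```python
SAMPLE = """\
CUSTOM | N/A | DSA
LOGLDR | N/A | logloaderd
DCMDaemon | 1444413512 | dcm-daemon
WebCache | 1500577871 | WebCache
WebCache-admin | 1500577872 | WebCacheAdmin
OHS | 1500577870 | HTTP_Server
performance | 1500577873 | performance_server
messaging | 1500577874 | messaging_server
OC4J | 1500577865 | OC4J_Wireless
"""


def oc4j_uids(status_output):
    """Return the uid of every OC4J row whose uid column is not "N/A"."""
    uids = []
    for line in status_output.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        typ, uid, prt = fields
        # Assumption: a row is an OC4J process if either name column starts with "OC4J".
        is_oc4j = typ.startswith("OC4J") or prt.startswith("OC4J")
        if is_oc4j and uid != "N/A":
            uids.append(uid)
    return uids


def stop_commands(status_output, opmnctl="ORACLE_HOME/opmn/bin/opmnctl"):
    """Build one cluster-wide stopproc command per OC4J uid."""
    return [
        [opmnctl, "@cluster", "stopproc", "uniqueid=" + uid]
        for uid in oc4j_uids(status_output)
    ]


if __name__ == "__main__":
    for command in stop_commands(SAMPLE):
        print(" ".join(command))
```

Replace the `opmnctl` argument with the real path under your Oracle home before running the generated commands.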
This section describes how to restore an Infrastructure to the same host. You can use this procedure when you have lost some or all of your Oracle binaries.
Refer to Section 21.3.5, "Recovering an Instance on the Same Host" to restore the image backup of the Infrastructure Oracle home from your complete Oracle Application Server environment backup.
Note: If your Infrastructure is split and has Identity Management in one Oracle home, and the Metadata Repository in another Oracle home, perform this step on both Oracle homes.
Note: If you receive a WWC-41439 error while trying to login to the Portal Home page, do one or all of the following:
|
Refer to Section 21.3.3, "Restoring a Node on a New Host" to perform the following types of restores:
Restore an Infrastructure to the same host after the operating system has been reinstalled. The hostname must remain the same on the host.
Restore an Infrastructure to a new host that has the same hostname as the original host.
Note: If your Infrastructure is split and has Identity Management in one Oracle home, and the Metadata Repository in another Oracle home, perform the procedures on both Oracle homes as described in Section 22.2.4, "Restoring an Identity Management Instance to a New Host" and Section 22.2.5.2, "Restoring and Recovering the Metadata Repository to a New Host".
Refer to Section 21.3, "Recovering a Loss of Host Automatically" to perform the following types of restores:
Restore Identity Management to the same host after the operating system has been reinstalled. The hostname must remain the same on the host.
Restore Identity Management to a new host that has the same or different hostname as the original host.
This section describes how to restore and recover the Metadata Repository. Use this procedure when only the Metadata Repository has been corrupted and no other files in the Oracle home are affected.
Restore and recover the Metadata Repository from your latest backup using your own procedure or the OracleAS Backup and Recovery Tool. Restart all Infrastructure processes after restoring a Metadata Repository.
The following sections describe Oracle recommended procedures for using the OracleAS Backup and Recovery Tool to restore and recover the Metadata Repository:
Restoring and Recovering the Metadata Repository to the Same Host
Restoring and Recovering the Metadata Repository to a New Host
This section covers several circumstances under which you may need to restore and recover the Metadata Repository to the same host:
Corrupted or Lost Datafile
If a datafile is corrupted or lost, you can use the following command to restore from the latest backup and perform a full recovery:
For UNIX:
bkp_restore.sh -m restore_repos
For Windows:
bkp_restore.bat -m restore_repos
Corrupted or Lost Control File
If a control file is corrupted or lost, you can use the following command to restore a control file backup, restore the datafiles, and perform a full recovery:
For UNIX:
bkp_restore.sh -m restore_repos -c
For Windows:
bkp_restore.bat -m restore_repos -c
The -c option restores the control file, which removes the entries for tempfiles in locally-managed temporary tablespaces. You must then add a new tempfile to the TEMP tablespace; otherwise, Oracle displays error ORA-25153: Temporary Tablespace is Empty.
To add a tempfile to the TEMP tablespace:
SQL> alter tablespace "TEMP" add tempfile 'ORACLE_HOME/oradata/GDB/temp01.dbf' size 5120K autoextend on next 8k maxsize unlimited;
In this statement, GDB is the first part of the global database name.
Note that when you restore a control file, the tool performs an "alter database open resetlogs." This invalidates all backups and archivelogs. You should immediately perform a complete cold backup of the Metadata Repository, which will serve as the new baseline for your subsequent partial online backups.
Point-in-Time Recovery and Flashback Recovery
If you lost configuration files in your middle-tier or Infrastructure installation and restored those, you may want to restore or flashback the database to the same point-in-time as the configuration file backup. You can do this using one of the following commands:
For UNIX:
bkp_restore.sh -m restore_repos -u timestamp
bkp_restore.sh -m flashback_repos -u timestamp
For Windows:
bkp_restore.bat -m restore_repos -u timestamp
bkp_restore.bat -m flashback_repos -u timestamp
Flashback recovery to a point-in-time can undo any logical data corruption or user error. Flashback cannot undo physical data corruption due to media failure. Using the restore_repos command, you can recover and restore the database to a point-in-time for both logical and physical data corruption. However, Flashback is faster at recovering logical data corruption because it does not require restoring backups.
You can specify any time between the time of your first backup and the current time, as long as none of the online redo logs were compromised. If any online redo logs are missing or corrupted, the latest time that can be specified is the time at which the last backup was made.
Note that when you do point-in-time recovery, the tool performs an "alter database open resetlogs." This invalidates all backups and archivelogs. You should immediately perform a complete cold backup of the Metadata Repository, which will serve as the new baseline for your subsequent partial online backups.
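The choice between the two modes can be expressed as a small helper. This is a sketch only: the function name and the "logical"/"physical" labels are invented for the example, the timestamp is passed through unchanged because its format is not shown here, and it assumes flashback_repos is supplied with -m in the same way as the other modes.

```python
def repository_recovery_command(corruption, timestamp, windows=False):
    """Build a point-in-time recovery command for the Metadata Repository.

    corruption: "logical" (user error, logical corruption) or
                "physical" (media failure, physical corruption).
    timestamp:  passed to -u unchanged; the caller supplies the format.
    """
    script = "bkp_restore.bat" if windows else "bkp_restore.sh"
    if corruption == "logical":
        # Flashback is faster because it does not restore backups.
        mode = "flashback_repos"
    elif corruption == "physical":
        # Flashback cannot undo physical corruption; restore from backup.
        mode = "restore_repos"
    else:
        raise ValueError("corruption must be 'logical' or 'physical'")
    return [script, "-m", mode, "-u", timestamp]


if __name__ == "__main__":
    print(" ".join(repository_recovery_command("logical", "timestamp")))
```

restore_repos also handles logical corruption, so a caller that is unsure of the cause can pass "physical" and accept the longer recovery time.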
The Backup and Recovery Tool supports point-in-time recovery through resetlogs in all Oracle databases: Infrastructure with Identity Management and Metadata Repository, RepCA, and generic Oracle databases (for example, OCS Infostore). The following is an example of a point-in-time recovery through resetlogs:
At time T1, a backup of the database is taken. Changes are made to the database. At time T2, a new backup is taken. More changes are made to the database. At time T3, another backup is taken. More changes are made. At time T4, the user restores and recovers the database to T3. Since this is a point-in-time recovery, the Backup and Recovery Tool opens the database with resetlogs to start a new log sequence after the recovery. At time T5, the user restores and recovers the database to T2 through the resetlogs created at T4.
Multiple backward point-in-time recoveries are supported for backups taken using backup_instance_cold, backup_instance_online, and backup_instance_incr. To perform multiple backward point-in-time recoveries using backup_cold, backup_online, and backup_incr, you must follow the backup operation immediately with backup_config.
When you restore the Metadata Repository to a new host (with the same hostname), the new host will not have the online redo logs that existed on the original host. Therefore, you cannot perform a full recovery; RMAN would give an error stating that it cannot find a certain log file (the online redo log file). Instead, perform a point-in-time recovery to a time between the first and the most recent backup. You can do this by specifying the proper timestamp for the LOHA reconfigure operation. Use the procedure in Section 21.3.3, "Restoring a Node on a New Host" to restore the Metadata Repository.
During the LOHA reconfigure process, if the RMAN command returns an error and the log shows that the datafiles were restored and recovered, then LOHA will issue an "alter database open resetlogs" and the database will be opened in a consistent state. If no datafiles were restored and recovered, it is most likely that an early timestamp was specified. You should retry the command with a later timestamp.
LOHA uses the -c option during the restore process which means that the control file is restored from backup. This causes entries for tempfiles in locally-managed temporary tablespaces to be removed and a new TEMP tablespace to be added automatically. Restoring the control file means that an "alter database open resetlogs" is always performed, which invalidates all backups and archivelogs. You should immediately perform a complete cold backup of the Metadata Repository, which will serve as the new baseline for your subsequent partial online backups.
This section describes how to restore the configuration files in an Infrastructure Oracle home. You can use this procedure when configuration files have been lost or corrupted.
It contains the following tasks:
Task 1: Stop the Infrastructure
Refer to Section 3.2.4, "Stopping OracleAS Infrastructure" for instructions.
Task 2: Restore Infrastructure Configuration Files
Note: If your Infrastructure is split and has Identity Management in one Oracle home, and the Metadata Repository in another Oracle home, perform this task on both Oracle homes.
Restore all configuration files from your most recent backup. You can perform this task using your own procedure or the OracleAS Backup and Recovery Tool. For example, to do this using the tool:
On UNIX systems:
bkp_restore.sh -m restore_config -t timestamp
On Windows systems:
bkp_restore.bat -m restore_config -t timestamp
Task 3: Apply Recent Administrative Changes
If you made any administrative changes since the last time you did an online backup, reapply them now.
See Also: Appendix G, "Examples of Administrative Changes" to learn more about administrative changes
Task 4: Start the Infrastructure
Refer to Section 3.2.3, "Starting OracleAS Infrastructure" for instructions.
To restore a middle-tier installation to the same host, refer to Section 21.3.5, "Recovering an Instance on the Same Host".
This section describes how to restore and recover a middle-tier installation to a new host. You can use this procedure to:
Restore a middle-tier installation to the same host after the operating system has been reinstalled.
Restore a middle-tier installation to a new host. The new host may have the same hostname and IP address as the original host, or a different hostname, IP address, or both.
If the DCM repository is a database, start the OPMN and Oracle Internet Directory processes on the corresponding infrastructure instance.
Use the following command to start the OPMN process:
opmnctl start
Use the following command to start the Oracle Internet Directory process:
opmnctl startproc ias-component=OID
Use the following command to check if the DCM repository is a database or a file-based repository:
ORACLE_HOME/dcm/bin/dcmctl whichfarm
The preceding command returns one of the following messages:
Repository Type: Database => uses a database repository
Repository Type: Distributed File Based => uses a file based repository
Perform the steps in Section 21.3.3, "Restoring a Node on a New Host" to restore the image backup, system files and instance reconfiguration. Note that the middle-tier configuration remains in the same state as the original instance. If the hostname remains the same, run an instance restore to bring the instance to the desired point in time. If the hostname is different, the state cannot be changed since backups of the original host are not valid for a different hostname.
Note: There is a special step required for updating OracleAS Portal and OracleAS Wireless when you change the hostname. When you change the hostname, the OracleAS Wireless server URL changes to use the new hostname. You must update OracleAS Portal with the new OracleAS Wireless service URL. Refer to the section on "Updating the Oracle AS Wireless Portal Service URL Reference" in "Oracle Application Server Portal Configuration Guide" for instructions.
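The repository check in the procedure above lends itself to a small parser. The sketch below reads the text printed by dcmctl whichfarm and reports the repository type. The function name and the return labels are invented for this example, and it assumes only that the output contains a line beginning with "Repository Type:" followed by one of the two values quoted above.

```python
def repository_kind(whichfarm_output):
    """Return "database", "file", or None based on the Repository Type line."""
    for line in whichfarm_output.splitlines():
        line = line.strip()
        if not line.startswith("Repository Type:"):
            continue
        value = line.split(":", 1)[1].strip()
        if value.startswith("Database"):
            return "database"
        if value.startswith("Distributed File Based"):
            return "file"
    return None  # no recognizable Repository Type line


def needs_infrastructure_processes(whichfarm_output):
    """OPMN and Oracle Internet Directory must be started only for a database repository."""
    return repository_kind(whichfarm_output) == "database"


if __name__ == "__main__":
    print(repository_kind("Repository Type: Distributed File Based"))
```

When the result is "database", start the OPMN and Oracle Internet Directory processes on the Infrastructure instance before continuing with the restore.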
This section describes how to restore the configuration files in a middle-tier Oracle home. Use this procedure when configuration files have been lost or corrupted.
It contains the following tasks:
Task 1: Stop the Middle-Tier Instance
Refer to Section 3.2.6, "Stopping a Middle-Tier Instance" for instructions.
If the middle-tier instance uses a DCM repository (file-based or database), make sure the DCM repository is up.
Task 2: Restore Middle-Tier Configuration Files
Restore all configuration files from your most recent backup. You can perform this task using your own procedure or the OracleAS Backup and Recovery Tool. For example, to do this using the tool:
For UNIX systems:
bkp_restore.sh -m restore_config -t timestamp
For Windows systems:
bkp_restore.bat -m restore_config -t timestamp
Task 3: Apply Recent Administrative Changes
If you made any administrative changes since the last time you did an online backup, reapply them now.
See Also: Appendix G, "Examples of Administrative Changes" to learn more about administrative changes
Task 4: Start the Middle-Tier Instance
Refer to Section 3.2.5, "Starting a Middle-Tier Instance" for instructions.
This section describes how to restore a DCM file-based repository to a new host. This section contains the following tasks:
Task 1: Restore Image Backup, System Files and Instance Reconfiguration
Task 2: Inform the Original Host That It Is No Longer a Repository Host (If Required)
Task 1: Restore Image Backup, System Files and Instance Reconfiguration
If the DCM repository is a database, start the OPMN and Oracle Internet Directory processes on the corresponding infrastructure instance.
Use the following command to start the OPMN process:
opmnctl start
Use the following command to start the Oracle Internet Directory process:
opmnctl startproc ias-component=OID
Use the following command to check if the DCM repository is a database or a file-based repository:
ORACLE_HOME/dcm/bin/dcmctl whichfarm
The preceding command returns one of the following messages:
Repository Type: Database => uses a database repository
Repository Type: Distributed File Based => uses a file based repository
Perform the steps in Section 21.3.3, "Restoring a Node on a New Host" to restore the image backup, system files and instance reconfiguration.
Task 2: Inform the Original Host That It Is No Longer a Repository Host (If Required)
Now that the file-based repository is restored to the new host, the original host may need to be informed that it is no longer a repository host. If the new host was already a part of the farm and is not a replacement for the original host, and the original host is still part of the farm, execute the following command on the original host:
dcmctl repositoryrelocated
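The three conditions in Task 2 must all hold before the command is run on the original host. As a minimal sketch, with parameter names invented for this example:

```python
def must_run_repositoryrelocated(new_host_already_in_farm,
                                 new_host_replaces_original,
                                 original_host_in_farm):
    """Return True when the original host must be told it is no longer the repository host."""
    return (
        new_host_already_in_farm
        and not new_host_replaces_original
        and original_host_in_farm
    )


if __name__ == "__main__":
    # New host was already a farm member, is not a replacement,
    # and the original host is still in the farm.
    print(must_run_repositoryrelocated(True, False, True))
```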
Use the following command to restore an Oracle Application Server instance to a particular point in time:
For UNIX:
bkp_restore.sh -m restore_instance -t 2004-09-21_06-12-45 -c
For Windows:
bkp_restore.bat -m restore_instance -t 2004-09-21_06-12-45 -c
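When the restore is scripted, the -t value can be generated from a date and time. The following sketch assumes the YYYY-MM-DD_HH-MM-SS pattern inferred from the single example above; the function name and its parameters are hypothetical.

```python
from datetime import datetime


def restore_instance_command(when, windows=False, restore_control_file=True):
    """Build a restore_instance command for the given point in time."""
    script = "bkp_restore.bat" if windows else "bkp_restore.sh"
    # Pattern inferred from the example value 2004-09-21_06-12-45.
    stamp = when.strftime("%Y-%m-%d_%H-%M-%S")
    command = [script, "-m", "restore_instance", "-t", stamp]
    if restore_control_file:
        command.append("-c")  # -c also restores the control file
    return command


if __name__ == "__main__":
    print(" ".join(restore_instance_command(datetime(2004, 9, 21, 6, 12, 45))))
```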
Before performing a restore operation (restore_instance or restore_config) on an instance in a cluster, all OC4J processes across the cluster must be stopped. Use the following command to stop the processes:
ORACLE_HOME/opmn/bin/opmnctl @cluster stopproc ias-component=OC4J
Some OC4J components (such as Wireless) do not have ias-component=OC4J. For these components, use the uniqueid value to stop the OC4J process. To determine which components have a uniqueid, use the following command:
ORACLE_HOME\opmn\bin\opmnctl @cluster status -fmt %typ%uid%prt -noheaders
The following is an example of the output from the command:
CUSTOM | N/A | DSA
LOGLDR | N/A | logloaderd
DCMDaemon | 1444413512 | dcm-daemon
WebCache | 1500577871 | WebCache
WebCache-admin | 1500577872 | WebCacheAdmin
OHS | 1500577870 | HTTP_Server
performance | 1500577873 | performance_server
messaging | 1500577874 | messaging_server
OC4J | 1500577865 | OC4J_Wireless
Stop each OC4J process whose second-column (uid) value is not "N/A" by passing that uid to the following command:
ORACLE_HOME\opmn\bin\opmnctl @cluster stopproc uniqueid=1500577865
opmnctl: stopping opmn managed processes...
After the restore operation is complete, use the following command to restart the OC4J processes across the cluster:
ORACLE_HOME/opmn/bin/opmnctl @cluster startproc ias-component=OC4J
For components that use uniqueid, you can restart their process by using the appropriate ias-component value or by using the following command:
opmnctl startall