4
Backup and Recovery Strategies

This chapter offers guidelines and considerations for developing an effective backup and recovery strategy. It covers two topics: backup strategies, and restore and recovery strategies.

Backup Strategies

Before you create an Oracle database, decide how to protect the database against potential media failures. If you do not develop a backup strategy before creating your database, then you may not be able to perform recovery if a disk failure damages the datafiles, online redo log files, or control files.

This section describes general guidelines that can help you decide when to perform database backups and which parts of a database you should back up. Of course, the specifics of your strategy depend on the constraints under which you are operating.

Obeying the Golden Rule of Backup and Recovery

The set of files needed to recover from the failure of any Oracle database file--a datafile, control file, or online redo log--is called the redundancy set. The redundancy set contains:

The last backup of the control file and all the datafiles
All archived redo logs generated after the last backup was taken
A duplicate of the online redo log files generated by Oracle multiplexing, operating system mirroring, or both
A duplicate of the current control file generated by Oracle multiplexing, operating system mirroring, or both
Configuration files such as the server parameter file, tnsnames.ora, and listener.ora

The golden rule of backup and recovery is: the set of disks or other media that contain the redundancy set should be separate from the disks that contain the datafiles, online redo logs, and control files. This strategy ensures that the failure of a disk that contains a datafile does not also cause the loss of the backups or redo logs needed to recover the datafile. Consequently, a minimal production-level database requires at least two disk drives: one to hold the files in the redundancy set and one to hold the database files.
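One way to audit a layout against the golden rule is to map each file to the disk that holds it and look for any disk that appears in both sets. The following Python sketch is illustrative only: the file names, the disk labels, and the `shared_disks` helper are hypothetical and are not part of any Oracle tool.

```python
def shared_disks(primary_files, redundancy_files):
    """Return the sorted list of disks that hold both primary and redundancy-set files.

    Each argument maps a file name to the disk (or volume) that stores it.
    An empty result means the layout keeps the two sets on separate media.
    """
    primary_disks = set(primary_files.values())
    redundancy_disks = set(redundancy_files.values())
    return sorted(primary_disks & redundancy_disks)


# Hypothetical two-disk layout; all names are examples only.
primary = {
    "system01.dbf": "disk1",
    "users01.dbf": "disk1",
    "redo01a.log": "disk1",
    "control01.ctl": "disk1",
}
redundancy = {
    "backup_system01.bkp": "disk2",
    "arch_0001.arc": "disk2",
    "redo01b.log": "disk2",
    "control02.ctl": "disk2",
}

violations = shared_disks(primary, redundancy)
print("Golden rule satisfied" if not violations else f"Shared disks: {violations}")
```

A check like this only covers the mapping you give it. Separate file systems on the same physical device would need to be mapped to the same label to be detected.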

Always keep the redundancy set separate from the primary files in every way possible: on separate volumes, separate file systems, and separate RAID devices. These systems are reliable, but they can and do fail. Keeping the redundancy set separate ensures that you can recover from a failure without losing committed transactions.

You can implement a system that follows the golden rule in several different ways. Oracle recommends following these guidelines:

Multiplex the online redo log files and current control file at the Oracle level, not only at the operating system or hardware level. Multiplexing at the Oracle level has the advantage that an I/O failure or lost write should only corrupt one of the copies.
Use operating system or hardware mirroring for at least the control file, because Oracle does not provide complete support for control file multiplexing: if one multiplexed copy of the control file fails, then the Oracle instance shuts down.
Use operating system or hardware mirroring for the primary datafiles if possible to avoid having to apply media recovery for simple disk failures.
Keep at least one copy of the entire redundancy set--including the most recent backup--on hard disk.

If the redundancy copy is created by splitting a local mirror, then it is not as good as a backup created through operating system or RMAN commands because it relies on the mirroring subsystem for both the primary files and redundancy set copy. The last file backup, such as the last backup to tape, is the redundancy set copy. Hence, keep archived logs needed to recover this copy.
If your database is stored on a RAID device, then place the redundancy set on a set of devices that is not in the same RAID device.
If you keep the redundancy set on tapes, then maintain at least two copies of the data because tapes can fail. Also, if you have more than one copy of the same data, then consider keeping backups from different points in time. In this way, if one backup or split mirror was done when the database was corrupted, then you have an older backup when the database was not corrupted.

Choosing the Database Archiving Mode

Before you create an Oracle database, decide how you plan to protect it against potential failures. Answer the following questions:

Is it acceptable to lose any data if a disk failure damages some of the files that constitute a database?

If not, then run the database in ARCHIVELOG mode, ideally with a multiplexed online redo log, a multiplexed control file, and multiplexed archived redo logs. If you can afford to lose all data from your last backup to the point of failure, then you can operate in NOARCHIVELOG mode and avoid the extra maintenance chores. You may have alternative ways of re-creating the data.

Will you need to recover to a noncurrent time?

If you need to perform incomplete recovery to correct an erroneous change to the database, then run in ARCHIVELOG mode and perform control file backups whenever making structural changes. Incomplete recovery is easiest when you have a backup control file reflecting the database structure at the desired time.

Does the database need to be available at all times?

High-availability databases always operate in ARCHIVELOG mode to take advantage of online datafile backups.
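The three questions above can be read as a simple decision rule: any answer that calls for archiving selects ARCHIVELOG mode. The following Python sketch restates that rule under this reading; the function name and parameter names are hypothetical, chosen only for illustration.

```python
def choose_archiving_mode(can_lose_data_since_last_backup,
                          needs_noncurrent_time_recovery,
                          needs_continuous_availability):
    """Return 'NOARCHIVELOG' only when none of the three questions calls for archiving."""
    if not can_lose_data_since_last_backup:
        # Complete recovery to the point of failure requires archived redo.
        return "ARCHIVELOG"
    if needs_noncurrent_time_recovery:
        # Incomplete recovery to a chosen time requires archived redo.
        return "ARCHIVELOG"
    if needs_continuous_availability:
        # Online datafile backups are available only in ARCHIVELOG mode.
        return "ARCHIVELOG"
    return "NOARCHIVELOG"


print(choose_archiving_mode(True, False, False))
print(choose_archiving_mode(False, False, False))
```

The sketch treats the questions as independent yes/no inputs. In practice the answers are often linked. A database that must always be available, for example, can rarely afford data loss either.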

After you have answered these questions and determined which mode to use, follow the guidelines in either "Backing Up a NOARCHIVELOG Database" or "Backing Up an ARCHIVELOG Database".

Backing Up a NOARCHIVELOG Database

If you run the database in NOARCHIVELOG mode, Oracle does not archive filled groups of online redo log files. Therefore, the only protection against a disk failure is the most recent whole backup of the database. Follow these guidelines:

Make whole database backups regularly, according to the amount of work that you can afford to lose. For example, if you can afford to lose the amount of work accomplished in one week, then make a consistent whole database backup once every week. If you can afford to lose only a day's work, then make a consistent whole database backup every day. For large databases with a high amount of activity, you usually cannot afford to lose work. In this case, you should operate the database in ARCHIVELOG mode.
Whenever you alter the physical structure of a database operating in NOARCHIVELOG mode, immediately take a consistent whole database backup. A whole database backup fully reflects the new structure of the database.

Backing Up an ARCHIVELOG Database

If you run your database in ARCHIVELOG mode, then the archiver archives groups of online redo log files. The archived redo log, together with the online redo log and datafile backups, can therefore protect the database from a disk failure: you can recover completely to the instant that the failure occurred, or to a desired noncurrent time. Following are common backup strategies for a database operating in ARCHIVELOG mode:

Back up the entire database after you create it. This initial whole database backup is the foundation of your backups because it provides backups of all datafiles and the control file of the associated database.

Note:

When you perform this initial whole database backup, make sure that the database is in ARCHIVELOG mode first. Otherwise, the backup control files will contain the NOARCHIVELOG mode setting.

Make backups of tablespaces when the database is open or closed to keep the database backups up-to-date. So long as you have the necessary archived logs to recover the backup, you never have to shut down the database to make a backup.

In particular, back up the datafiles of extensively used tablespaces frequently to reduce database recovery time. If a more recent datafile backup restores a damaged datafile, then you need to apply less redo (or incremental backups) to the restored datafile to roll it forward to the time of the failure.

You can also use a datafile copy taken while the database is open and the tablespace is online to restore datafiles. You must apply the appropriate redo log files to these restored datafiles to make the data consistent and bring it forward to the specified point in time.
Back up the control file every time you make a structural change to the database. If you run in ARCHIVELOG mode and the database is open, then use either RMAN or the SQL statement ALTER DATABASE BACKUP CONTROLFILE.
Back up archived logs frequently. It is strongly recommended that you keep at least two copies of archived logs: one on disk and another on offline storage such as tape or optical disk. Keep the logs on disk as long as possible but back them up as soon as possible.

Multiplexing Control Files, Online Redo Logs, and Archived Redo Logs

Control files, online redo logs, and archived redo logs are crucial files for backup and recovery operations. The loss of any of these files can cause you to lose data irrevocably. You should maintain:

At least two copies of the control file on different disks mounted under different disk controllers. You should use Oracle to multiplex the copies and your operating system to mirror each copy.
Two or more copies of your online redo log on different disks. The online redo data is crucial for instance, crash, and media recovery.
Two or more copies of your archived redo log on different disks and, if possible, different media.

See Also:
Oracle9i Database Concepts for a conceptual overview of all Oracle data structures.

Performing Backups Frequently and Regularly

Frequent backups are essential for any recovery scheme. Base the frequency of backups on the rate or frequency of database changes such as:

Addition and deletion of tables
Insertions and deletions of rows in existing tables
Updates to data within tables

If users generate a significant amount of DML, then database backup frequency should be proportionally high. Alternatively, if a database is mainly read-only, and if updates are issued only infrequently, then you can back up the database less frequently.

You can use either RMAN or user-managed methods to create backup scripts. If you set persistent configurations using RMAN's CONFIGURE command, however, then you should not typically need to write extensive scripts. You can regularly run BACKUP DATABASE PLUS ARCHIVELOG.

See Also:

Oracle9i Recovery Manager User's Guide to learn how to create, delete, replace, and print stored scripts

Performing Backups Before and After You Make Structural Changes

Administrators as well as users make changes to a database. Back up the appropriate portion of your database immediately before and after you make any of the following structural changes:

Create or drop a tablespace.
Add or rename a datafile in an existing tablespace.
Add, rename, or drop an online redo log group or member.

The part of the database that you should back up depends on your archiving mode:

Mode	Action
`ARCHIVELOG`	Make a control file backup (using RMAN or using the `ALTER DATABASE` statement with the `BACKUP CONTROLFILE` option) before and after a structural alteration. Of course, you can back up other parts of the database as well.
`NOARCHIVELOG`	Make a consistent whole database backup immediately before and after the modification.

Backing Up Often-Used Tablespaces

Many DBAs find that regular whole database backups are not in themselves sufficient for a robust backup strategy. If you run in ARCHIVELOG mode, then you can back up the datafiles of an individual tablespace or even a single datafile. This option is useful if a portion of a database is used more extensively than others, for example, the SYSTEM tablespace and automatic undo tablespaces.

By making more frequent backups of the extensively used datafiles of a database, you avoid a long recovery time. For example, you may make a whole database backup once every two weeks. If the database experiences heavy traffic during the week, then a media failure on Friday can force you to apply a tremendous amount of redo during recovery. If you had backed up your most frequently accessed tablespaces three times a week, then you could apply a smaller number of changes to roll the restored file forward to the time of the failure.

See Also:

Oracle9i Database Administrator's Guide for information about managing undo tablespaces

Performing Backups After Unrecoverable Operations

If users are creating tables or indexes using the UNRECOVERABLE option, then make backups after the objects are created. When tables and indexes are created as UNRECOVERABLE, Oracle does not log redo data, which means that you cannot recover these objects from existing backups.

Note:

If using RMAN, then you can make an incremental backup.

See Also:

Oracle9i SQL Reference for information about the UNRECOVERABLE option of the CREATE TABLE ... AS SELECT and CREATE INDEX statements.

Performing Whole Database Backups After Opening with the RESETLOGS Option

After you open a database with the RESETLOGS option, Oracle Corporation recommends that you immediately perform a whole database backup. If you do not, and if a disaster occurs, then it is possible to lose all changes made after opening the database.

In certain cases, you can restore a backup made prior to a RESETLOGS and recover the database, but the procedure is complicated and requires you to have a control file backup from before and after the RESETLOGS operations. A whole database backup created after a RESETLOGS protects against this situation.

See Also:

Oracle9i Recovery Manager User's Guide to learn how to recover using a backup created before a RESETLOGS

Archiving Older Backups

You may need to store older backups for two basic reasons:

An older backup is necessary for performing incomplete recovery to a time before your most recent backup
Your most recent backup is corrupted

If you want to recover to a noncurrent time, then you need a database backup that completed before the desired time. For example, if you make backups on the 1st and 14th of February, then decide at the end of the month to recover your database to February 7th, you must use the February 1st (or earlier) backup.
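The selection rule in the February example can be sketched in a few lines of Python: among the backups that completed before the target time, take the most recent one. The `backup_for_recovery` helper and the year used below are assumptions made for illustration only.

```python
from datetime import datetime


def backup_for_recovery(backup_completion_times, target_time):
    """Return the most recent backup that completed before target_time, or None."""
    candidates = [t for t in backup_completion_times if t < target_time]
    return max(candidates) if candidates else None


# Backups on the 1st and 14th of February (the year is arbitrary).
backups = [datetime(2002, 2, 1), datetime(2002, 2, 14)]

chosen = backup_for_recovery(backups, datetime(2002, 2, 7))
print(chosen.date())  # 2002-02-01
```

A result of `None` means that no retained backup is old enough for the requested time, which is the situation that archiving older backups is meant to prevent.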

For a database operating in NOARCHIVELOG mode, the backup that you use must be a consistent whole database backup. Of course, you cannot perform media recovery using this backup. For a database operating in ARCHIVELOG mode, the whole database backup:

Does not need to be consistent because redo is available to recover it
Should have completed before the intended recovery time
Should have all archived logs necessary to recover the datafiles to the required point-in-time
Should be recovered with a control file that reflects the database's structure at the point-in-time that ends the recovery

For added protection, keep two or more database backups (with associated archived redo logs) previous to the current backup. Thus, if your most recent backups are not usable, then you will not lose all of your data.

Knowing the Constraints for Distributed Database Backups

If the database is a member of a distributed database system, then all databases in the system should operate in the same archiving mode. Note the consequences and constraints contained in the following table.

Mode	Constraint	Consequence
`ARCHIVELOG`	Closed cleanly	Backups at each node can be performed autonomously, that is, individually and without time coordination.
`NOARCHIVELOG`	Closed cleanly	Consistent whole database backups must be performed at the same global time to plan for global distributed database recovery. For example, if a database in New York is backed up at midnight EST, the database in San Francisco should be backed up at 9 PM PST.
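The "same global time" constraint for NOARCHIVELOG nodes is a time zone conversion. The following Python sketch reproduces the New York and San Francisco example with fixed UTC offsets for EST and PST. The date and the `local_backup_time` helper are hypothetical, and daylight saving time is deliberately ignored.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset zones for the example; daylight saving time is not modeled.
EST = timezone(timedelta(hours=-5), "EST")
PST = timezone(timedelta(hours=-8), "PST")


def local_backup_time(global_time, site_zone):
    """Express one global instant in the local time of a given site."""
    return global_time.astimezone(site_zone)


# Midnight EST in New York (the date is arbitrary).
new_york = datetime(2002, 2, 14, 0, 0, tzinfo=EST)
san_francisco = local_backup_time(new_york, PST)

print(new_york.isoformat())
print(san_francisco.isoformat())
```

Both values denote the same instant. The San Francisco backup runs at 21:00 local time on the previous calendar day.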

See Also:

Oracle9i Database Administrator's Guide to learn how to manage distributed database systems

Exporting Data for Added Protection and Flexibility

Because the Oracle Export utility can selectively export specific objects, consider exporting portions or all of a database for supplemental protection and flexibility in a database's backup strategy. This strategy is especially useful for logical backups of the RMAN recovery catalog, because you can quickly reimport this data into any database and rebuild the catalog if the recovery catalog database is lost.

Database exports are not a substitute for whole database backups and cannot provide the same complete recovery advantages that the built-in functionality of Oracle offers. For example, you cannot apply archived logs to logical backups in order to update lost changes. An export provides a snapshot of the logical data (tables, stored procedures, and so forth) in a database when the export was made.

See Also:

Oracle9i Database Utilities for an account of the Export utility

Avoiding the Backup of Online Redo Logs

Although it may seem that you should back up online redo logs along with the datafiles and control file, this technique is dangerous. You should not back up online redo logs for the following reasons:

The best method for protecting the online logs against media failure is by multiplexing them, that is, having multiple log members in each group, on different disks and disk controllers.
If your database is in ARCHIVELOG mode, then the archiver is already archiving the filled redo logs.
If your database is in NOARCHIVELOG mode, then the only type of backups that you should perform are closed, consistent, whole database backups. The files in this type of backup are all consistent and do not need recovery, so the online logs are not needed.
You may accidentally restore backups of online redo logs while not intending to, thereby corrupting the database.

A number of situations are possible in which restoring the online logs causes significant problems for the database. The following sections describe scenarios that illustrate how restoring backed up online logs severely compromises recovery.

Unintentionally Restoring Online Redo Logs: Scenario

When a crisis occurs, it is easy to make a simple mistake. When restoring the whole database, you can accidentally restore the online redo logs, thus overwriting the current online logs with the older, useless backups. This action forces you to perform incomplete recovery instead of the intended complete recovery, thereby losing the ability to recover valuable data contained in the overwritten redo logs.

Erroneously Creating Multiple Parallel Redo Log Timelines: Scenario

If you face a problem where the best course of action is to restore the database from a consistent backup and not perform any recovery, then you may think it is safe to restore the online logs and thereby avoid opening the database with the RESETLOGS option. The problem is that Oracle eventually generates a log sequence number that was already generated by the database during the previous timeline.

For example, say that the most recent archived log for database prod1 has a log sequence number of 100. Assume that you restore a consistent backup of the database along with backed up online redo logs and then do not open with the RESETLOGS option. Assume also that the restored online log is at log sequence 50. Eventually, the database archives a log with the log sequence number of 100--so you now have two copies of log 100 with completely different contents.

If you then face another disaster and need to restore from this backup and roll forward, then you may find it difficult to identify which log with sequence number 100 is the correct one. If you had reset the logs, then you would have created a new incarnation of the database. You could only apply archived logs created by this new incarnation to this incarnation.
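The collision described in this scenario can be shown with a toy model in which each archived log is identified by a pair of incarnation number and log sequence number. The Python sketch below is a simplification for illustration: the helper names and the incarnation numbering are assumptions, and it assumes that sequences 1 through 100 were archived before the restore.

```python
def archive_logs(incarnation, first_seq, last_seq):
    """Return the (incarnation, sequence) identifiers of the logs archived in a range."""
    return [(incarnation, seq) for seq in range(first_seq, last_seq + 1)]


def ambiguous_sequences(*timelines):
    """Return the sorted sequence numbers that occur more than once in the same incarnation."""
    seen, duplicates = set(), set()
    for timeline in timelines:
        for log_id in timeline:
            if log_id in seen:
                duplicates.add(log_id[1])
            seen.add(log_id)
    return sorted(duplicates)


# Original timeline: sequences 1 through 100 archived in incarnation 1.
original = archive_logs(incarnation=1, first_seq=1, last_seq=100)

# Restoring backed-up online logs at sequence 50 and opening without RESETLOGS
# stays in the same incarnation, so sequences 50 through 100 are generated again.
without_resetlogs = archive_logs(incarnation=1, first_seq=50, last_seq=100)

# Opening with RESETLOGS starts a new incarnation with the sequence reset to 1.
with_resetlogs = archive_logs(incarnation=2, first_seq=1, last_seq=100)

print(100 in ambiguous_sequences(original, without_resetlogs))  # True
print(ambiguous_sequences(original, with_resetlogs))            # []
```

In the model, the new incarnation reuses the same sequence numbers without ambiguity because the incarnation is part of each log's identity.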

Note:

RMAN does not permit you to back up online redo logs.

Keeping Records of the Hardware and Software Configuration of the Server

During the stress of a recovery situation, it is important that you have all necessary information at your disposal. This is especially true if for some reason you need to contact Oracle Support because you run into a problem that you do not understand. You should have the following documentation about the hardware configuration:

The name of the node that hosts the database
The make and model of the production machine
The version and patch of the operating system
The disk capacity of the host
The number of disks and disk controllers
The disk capacity and free space
The media management vendor (if you use a third-party media manager)
The type and number of media management devices

You should also keep the following documentation about the software configuration:

The name of the database instance (SID)
The database identifier (DBID)
The version and patch release of the Oracle database server
The version and patch release of the networking software
The method (RMAN or user-managed) and frequency of database backups
The method of restore and recovery (RMAN or user-managed)
The datafile mount points

You should keep this information both in electronic and hardcopy form. For example, if you save this information in a text file on the network or in an email message, then if the entire system goes down, you may not have this data available.

Restore and Recovery Strategies

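Oracle provides a variety of procedures and tools to assist you with recovery. To develop an effective recovery strategy, test your procedures, validate your backups, and plan your responses to media and non-media failures, as described in the following sections.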

Testing Backup and Recovery Strategies

Practice backup and recovery techniques in a test environment before and after you move to a production system. In this way, you can measure the thoroughness of your strategies and minimize problems before they occur in a real situation. Performing test recoveries regularly ensures that your archiving, backup, and recovery procedures work. It also helps you stay familiar with recovery procedures, so that you are less likely to make a mistake in a crisis.

If you use RMAN, then run the DUPLICATE command to create a test database using backups of your production database. If you perform user-managed backup and recovery, then you can either create a new database, a standby database, or a copy of an existing database by using a combination of operating system and SQL*Plus commands.

When testing your backup and recovery strategy, ask yourself these questions:

If a disk failed and destroyed some of the database files, could I perform a full recovery of the files on this disk? Test separately for loss of datafiles, control files, and online redo logs.
If a user accidentally dropped a table, how could I recover from it? Test scenarios involving incomplete recovery of the whole database, tablespace point-in-time recovery, and using the Import utility.
What if the alert_SID.log revealed that one or more tables contained corrupt blocks? Test block recovery using the RMAN BLOCKRECOVER command. Also, troubleshoot recovery with the SQL*Plus RECOVER ... TEST command.
If the entire data center were destroyed by a fire, could I perform disaster recovery? Assume that all I have is an archived tape containing backups. How would I recover the database?

See Also:
Oracle9i Recovery Manager User's Guide for RMAN testing methods, and Oracle9i User-Managed Backup and Recovery Guide to learn how to troubleshoot SQL*Plus recovery

Validating Backups and Restores Using RMAN

If you use RMAN, then you can use the VALIDATE keyword on the BACKUP and RESTORE commands. BACKUP VALIDATE tests whether you are able to make a valid backup of database files. RESTORE VALIDATE tests whether you are able to restore an RMAN backup. Note that neither of these commands produces any actual output files.

Planning a Response to Media Failures

Media failure is the biggest threat to your data. A media failure is a physical problem that occurs when a computer unsuccessfully attempts to read from or write to a file necessary to operate the database. Common types of media problems include:

A disk drive that holds one of the database files experiences a head crash.
A datafile, online or archived redo log, or control file is accidentally deleted, overwritten, or corrupted.

The technique you use to recover from media failure of a database file depends heavily on the type of media failure that occurred. For example, the strategy you use to recover from a corrupted datafile is different from the strategy for recovering from the loss of the control file.

The basic steps for media recovery are:

Determine which files to recover.
Determine the type of media recovery required: complete or incomplete, open database or closed database.

Restore backups or copies of necessary files: datafiles, control files, and the archived redo logs necessary to recover the datafiles.

Note:

If you do not have a backup, then you can still perform recovery if you have the necessary redo logs and the control file contains the name of the damaged file. If you cannot restore a file to its original location, then you must relocate the restored file and rename the file in the control file.

Apply redo records to recover the datafiles. (When using Recovery Manager, apply redo records or incremental backups, or both.)
Reopen the database. If you perform incomplete recovery or restore a backup control file, then you must open the database with the RESETLOGS option.

See Also:
Oracle9i Recovery Manager User's Guide to learn how to perform media recovery using RMAN

When you perform datafile media recovery, you choose either complete recovery or incomplete recovery. The type of recovery method you use depends on the situation. Table 4-1 displays typical scenarios and strategies.

Table 4-1 Typical Media Failures and Recovery Strategies

Lost/Inaccessible Files	Archiving Mode	Status	Strategy
One or more datafiles	`NOARCHIVELOG`	Closed	Restore whole database from a consistent database backup. All changes made after the backup are lost. Open the database with the `RESETLOGS` option. Note: The only time you can open a database without performing `RESETLOGS` after restoring a `NOARCHIVELOG` backup is when you have not already overwritten the online log files that were current at the time of the most recent backup.
One or more datafiles and an online redo log	`NOARCHIVELOG`	Closed	Restore whole database from consistent backup. You lose all changes made after the last backup. Open the database with the `RESETLOGS` option.
One or more datafiles and all control files	`NOARCHIVELOG`	Closed	Restore the whole database and control file from consistent backup. You lose all changes made after the last backup. Open the database with the `RESETLOGS` option.
One or more (but not all) datafiles	`ARCHIVELOG`	Open	Perform tablespace or datafile recovery while the database is open. The tablespaces or datafiles are taken offline, restored from backups, recovered, and placed online. No changes are lost and the database remains available during the recovery.
All datafiles	`ARCHIVELOG`	Closed	Restore the backup datafiles, then mount the control file and recover the database completely. Assuming all redo logs are available, you can open the database as normal (that is, do not perform a `RESETLOGS`).
One or more datafiles and an archived redo log required for recovery	`ARCHIVELOG`	Open	Perform TSPITR on the tablespaces containing the lost datafiles up to the point of the latest available archived redo log.
All control files and possibly one or more datafiles	`ARCHIVELOG`	Not open	Restore the lost control files and datafiles from backups and recover the datafiles. No changes are lost, but the database is unavailable during recovery. Open the database with the `RESETLOGS` option.
All control files and possibly one or more datafiles, as well as an archived or online redo log required for recovery	`ARCHIVELOG`	Not open	Restore the necessary files from backups, then perform incomplete recovery of the database up to the point of the most recent available log. You will lose all changes contained in the lost log and in all subsequent logs. Open the database with the `RESETLOGS` option.

Online Redo Log Recovery

The method of recovery from loss of all members in an online log group depends on a number of factors, such as:

The state of the database (open, crashed, closed cleanly, and so on)
Whether the lost group was current
Whether the lost group has been archived

For example:

If you lose the current group, and the database is not closed cleanly (either it is open, or it has crashed), then you must restore an old backup and perform point-in-time recovery, then open the database with the RESETLOGS option. You will lose all transactions that were in the lost log.
If you lose the current group, and the database is closed cleanly, then you can open the database with the RESETLOGS option with no transaction loss. You should immediately take a new full backup.
If you lose a noncurrent group, then you can use the ALTER DATABASE CLEAR LOGFILE statement to re-create all members in the group. No transactions are lost.
If the group that you lost was archived before it was lost, nothing further is required. If the group was not archived, you should immediately take a new full backup.
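The cases above can be summarized as a small decision function. The Python sketch below only restates the four rules in this section; the function name, the parameter names, and the wording of the returned actions are hypothetical.

```python
def redo_group_loss_plan(is_current, closed_cleanly, was_archived):
    """Return (actions, transactions_lost) for the loss of all members of one log group."""
    if is_current:
        if closed_cleanly:
            return (["open the database with RESETLOGS",
                     "take a new full backup"], False)
        return (["restore an older backup",
                 "perform point-in-time recovery",
                 "open the database with RESETLOGS"], True)

    # Noncurrent group: the members can be re-created in place.
    actions = ["re-create the group with ALTER DATABASE CLEAR LOGFILE"]
    if not was_archived:
        # The cleared redo was never archived, so older backups cannot be rolled past it.
        actions.append("take a new full backup")
    return (actions, False)


actions, lost = redo_group_loss_plan(is_current=True, closed_cleanly=False, was_archived=False)
print(actions, lost)
```

For a current group the `was_archived` argument has no effect, because a group that is still being written has not yet been archived.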

Planning a Response to Datafile Block Corruption

If selected blocks within a datafile are corrupt, then you may not have to restore and recover the whole datafile. Instead, you can perform block media recovery. The Recovery Manager BLOCKRECOVER command can restore and recover specified data blocks while the database is open and the corrupted datafile is online.

See Also:

Oracle9i Recovery Manager User's Guide to learn how to perform block media recovery

Planning the Response to Non-Media Failures

Although media recovery is your primary concern when developing your recovery strategy, you should understand the basic types of non-media failures as well as the causes and solutions for each.

Statement Failure

A statement failure is a logical failure in the handling of a statement in an Oracle program. The Oracle database server or the operating system usually returns an error code and a message when a statement failure occurs.

User Error

User errors are any mistakes that users make in adding data to or deleting data from the database. If you have a logical backup of a table from which data has been lost, sometimes you can simply import it back into the table.

Depending on the scenario, you may have to perform some type of incomplete media recovery to correct user errors. You can perform either database point-in-time recovery (DBPITR) or tablespace point-in-time recovery (TSPITR). The following table explains the difference between these types of incomplete recovery.

Type	Description
DBPITR	Restore whole database backup. Recover the database to the time just before the error. Open the database with the `RESETLOGS` option.
TSPITR	Create auxiliary instance with RMAN or user-managed methods. Recover the tablespace on the auxiliary to the time just before the error. Import data back into the primary database.

Instance Failure

Instance failure occurs when an instance abnormally terminates. An instance failure can occur because:

A power outage causes the server to crash.
The server becomes unavailable because of hardware problems.
The operating system crashes.
One of the Oracle background processes fails.
You issue a SHUTDOWN ABORT statement.

Fortunately, Oracle performs instance recovery automatically: all you need to do is restart the database. Oracle automatically detects that the database was not shut down cleanly, then applies committed and uncommitted redo records in the redo log to the datafiles and rolls back uncommitted data. Finally, Oracle synchronizes the datafiles and control file and opens the database.
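The two phases described above, rolling forward all redo and then rolling back uncommitted work, can be illustrated with a toy model. The Python sketch below is not how Oracle stores redo or undo: the record layout, the transaction names, and the `instance_recovery` function are assumptions made purely to show the order of operations.

```python
def instance_recovery(datafile, redo_log):
    """Roll forward every redo record, then roll back uncommitted transactions.

    redo_log is a list of tuples, either
    ("change", txn, key, old_value, new_value) or ("commit", txn).
    """
    data = dict(datafile)  # work on a copy of the datafile contents
    committed = set()
    applied = []

    # Roll forward: apply committed and uncommitted changes alike.
    for record in redo_log:
        if record[0] == "commit":
            committed.add(record[1])
        else:
            _, txn, key, old_value, new_value = record
            data[key] = new_value
            applied.append((txn, key, old_value))

    # Roll back: undo the changes of uncommitted transactions, newest first.
    for txn, key, old_value in reversed(applied):
        if txn not in committed:
            data[key] = old_value
    return data


datafile = {"a": 1, "b": 10}
redo = [
    ("change", "T1", "a", 1, 2),
    ("change", "T2", "b", 10, 20),
    ("commit", "T1"),  # T2 never committed before the failure
]
print(instance_recovery(datafile, redo))  # {'a': 2, 'b': 10}
```

The committed change from T1 survives, while the uncommitted change from T2 is applied during roll forward and then undone.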
