Oracle® Database Oracle Clusterware and Oracle Real Application Clusters Administration and Deployment Guide 10g Release 2 (10.2) Part Number B14197-03
The Oracle Clusterware includes two important components: the voting disk and the Oracle Cluster Registry (OCR). The voting disk is a file that manages information about node membership, and the OCR is a file that manages cluster and RAC database configuration information. This chapter describes how to administer the voting disks and the OCR under the following topics:
Administering Voting Disks in Real Application Clusters
Administering the Oracle Cluster Registry in Real Application Clusters
Administering Multiple Cluster Interconnects on UNIX-Based Platforms
Oracle recommends that you select the option to configure multiple voting disks during Oracle Clusterware installation to improve availability. After installation, use the following procedures to regularly back up your voting disks and to recover them as needed:
Note: You can dynamically add voting disks after you complete the Oracle Clusterware and RAC installation processes.
See Also: "Administering the Oracle Cluster Registry in Real Application Clusters" for more information about administering the OCR |
Run the following command to back up a voting disk. Perform this operation on every voting disk as needed, where voting_disk_name is the name of the active voting disk and backup_file_name is the name of the file to which you want to back up the voting disk contents:
dd if=voting_disk_name of=backup_file_name
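For example, assuming the active voting disk resides on the hypothetical raw device /dev/raw/raw2 and the backup destination is the file /backups/votedisk.bak, the command would be:
dd if=/dev/raw/raw2 of=/backups/votedisk.bak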
Note: You can use the ocopy command in Windows environments or use the crsctl commands described in the following note.
Run the following command to recover a voting disk, where backup_file_name is the name of the voting disk backup file and voting_disk_name is the name of the active voting disk:
dd if=backup_file_name of=voting_disk_name
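Continuing the hypothetical example above, the following command writes the backup file back onto the voting disk device:
dd if=/backups/votedisk.bak of=/dev/raw/raw2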
Note: If you have multiple voting disks, then you can remove the voting disks and add them back into your environment using the crsctl delete css votedisk path and crsctl add css votedisk path commands respectively, where path is the complete path of the location on which the voting disk resides.
You can dynamically add and remove voting disks after installing Real Application Clusters. Do this using the following commands, where path is the fully qualified path for the additional voting disk.
Run the following command as the root user to add a voting disk:
crsctl add css votedisk path
Run the following command as the root user to remove a voting disk:
crsctl delete css votedisk path
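For example, assuming a new voting disk on the hypothetical raw device /dev/raw/raw5, the add and delete operations would look like this when run as the root user:
crsctl add css votedisk /dev/raw/raw5
crsctl delete css votedisk /dev/raw/raw5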
Note: If your cluster is down, then you can use the -force option to modify the voting disk configuration with either of these commands without interacting with active Oracle Clusterware daemons. However, using the -force option while any cluster node is active may corrupt your configuration.
This section describes how to administer the OCR. The OCR contains information about the cluster node list, instance-to-node mapping information, and information about Oracle Clusterware resource profiles for applications that you have customized as described in Chapter 14, "Making Applications Highly Available Using Oracle Clusterware". The procedures discussed in this section are:
Managing Backups and Recovering the OCR Using OCR Backup Files
Diagnosing OCR Problems with the OCRDUMP and OCRCHECK Utilities
Overriding the Oracle Cluster Registry Data Loss Protection Mechanism
Implementing the Oracle Hardware Assisted Resilient Data Initiative for the OCR
Upgrading and Downgrading the OCR Configuration in Real Application Clusters
The Oracle installation process for RAC gives you the option of automatically mirroring the OCR. This creates a second OCR to duplicate the original OCR. You can put the mirrored OCR on an Oracle cluster file system disk, on a shared raw device, or on a shared raw logical volume.
You can also manually mirror the OCR if you:
Upgraded to release 10.2 but did not choose to mirror the OCR during the upgrade
Created only one OCR during the Oracle Clusterware installation
Note: Oracle strongly recommends that you use mirrored OCRs if the underlying storage is not RAID. This prevents the OCR from becoming a single point of failure.
In addition to mirroring the OCR, you can also replace the OCR if Oracle displays an OCR failure alert in Enterprise Manager or in the Oracle Clusterware alert log file. You can also repair an OCR location if there is a misconfiguration or other type of OCR error. In addition, you can remove an OCR location if, for example, your system experiences a performance degradation due to OCR processing or if you transfer your OCR to RAID storage devices and no longer want to use multiple OCRs. Use the following procedures to perform these tasks:
Note: The operations in this section affect the OCR cluster-wide: they change the OCR configuration information in the ocr.loc file on UNIX-based systems and the Registry keys on Windows-based systems. However, the ocrconfig command cannot modify OCR configuration information for nodes that are shut down or for nodes on which Oracle Clusterware is not running.
You can add an OCR location after an upgrade or after completing the RAC installation. If you already mirror the OCR, then you do not need to add an OCR location; Oracle automatically manages two OCRs when it mirrors the OCR. RAC environments do not support more than two OCRs, a primary OCR and a second OCR.
Note: If your OCR resides on a cluster file system file or if the OCR is on a network file system, then create the target OCR file before performing the procedures in this section.
Run the following command to add an OCR location using either destination_file or disk to designate the target location of the additional OCR:
ocrconfig -replace ocr destination_file or disk
Run the following command to add an OCR mirror location using either destination_file or disk to designate the target location of the additional OCR:
ocrconfig -replace ocrmirror destination_file or disk
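For example, to add a mirrored OCR on a hypothetical shared raw device, you might run the following as the root user:
ocrconfig -replace ocrmirror /dev/raw/raw3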
Note: You must be the root user to run ocrconfig commands.
You can replace a mirrored OCR using the following procedure as long as one OCR-designated file remains online:
Verify that the OCR that you are not going to replace is online.
Verify that Oracle Clusterware is running on the node on which you are going to perform the replace operation.
Note: The OCR that you are replacing can be either online or offline. In addition, if your OCR resides on a cluster file system file or if the OCR is on a network file system, then create the target OCR file before continuing with this procedure.
Run the following command to replace the OCR using either destination_file or disk to indicate the target OCR:
ocrconfig -replace ocr destination_file or disk
Run the following command to replace an OCR mirror location using either destination_file or disk to indicate the target OCR:
ocrconfig -replace ocrmirror destination_file or disk
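For example, to move the OCR mirror to a new, hypothetical shared raw device while the primary OCR remains online, you might run the following as the root user:
ocrconfig -replace ocrmirror /dev/raw/raw4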
If any node that is part of your current RAC environment is shut down, then run the ocrconfig -repair command on the stopped node to enable that node to rejoin the cluster after you restart it.
You may need to repair an OCR configuration on a particular node if your OCR configuration changes while that node is stopped. For example, you may need to repair the OCR on a node that was not up while you were adding, replacing, or removing an OCR. To repair an OCR configuration, run the following command on the node on which you have stopped the Oracle Clusterware daemon:
ocrconfig -repair ocrmirror device_name
This operation only changes the OCR configuration on the node from which you run this command. For example, if the OCR mirror device name is /dev/raw1, then use the command syntax ocrconfig -repair ocrmirror /dev/raw1 on this node to repair its OCR configuration.
Note: You cannot perform this operation on a node on which the Oracle Clusterware daemon is running.
To remove an OCR location, at least one other OCR must be online. You can remove an OCR location to reduce OCR-related overhead or to stop mirroring your OCR because you moved your OCR to redundant storage such as RAID. Perform the following procedure to remove an OCR location from your RAC environment:
Ensure that at least one OCR other than the OCR that you are removing is online.
Caution: Do not perform this OCR removal procedure unless there is at least one other active OCR online.
Run the following command on any node in the cluster to remove the OCR:
ocrconfig -replace ocr
Run the following command on any node in the cluster to remove the mirrored OCR:
ocrconfig -replace ocrmirror
These commands update the OCR configuration on all of the nodes on which Oracle Clusterware is running.
See Also: You can also use the -backuploc option to move the OCR to another location as described in Appendix D, "Oracle Cluster Registry Configuration Tool Command Syntax"
Note: When removing an OCR location, the remaining OCR must be online. If you remove a primary OCR, then the mirrored OCR becomes the primary OCR.
This section describes two methods for copying OCR content and using it for recovery. The first method uses automatically generated OCR file copies and the second method uses manually created OCR export files.
The Oracle Clusterware automatically creates OCR backups every four hours. At any one time, Oracle always retains the last three backup copies of the OCR. The CRSD process that creates the backups also creates and retains an OCR backup for each full day and at the end of each week.
You cannot customize the backup frequencies or the number of files that Oracle retains. You can use any backup software to copy the automatically generated backup files at least once daily to a different device from where the primary OCR resides.
The default location for generating backups on UNIX-based systems is CRS_home/cdata/cluster_name, where cluster_name is the name of your cluster. The Windows-based default location for generating backups uses the same path structure.
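For example, one way to implement the daily copy recommendation above, assuming a hypothetical CRS home of /u01/crs and a cluster named crs, is a cron entry such as the following, which copies the automatically generated backup files to a different device:
0 2 * * * cp /u01/crs/cdata/crs/* /backupdevice/ocrbackups/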
Note: You must be the root user to run ocrconfig commands.
If an application fails, then before attempting to restore the OCR, restart the application. As a definitive verification that the OCR failed, run ocrcheck; if the command returns a failure message, then both the primary OCR and the OCR mirror have failed. Attempt to correct the problem using one of the following platform-specific OCR restoration procedures.
Note: You cannot restore your configuration from an automatically created OCR backup file using the -import option, which is explained in "Administering the Oracle Cluster Registry with OCR Exports". You must instead use the -restore option as described in the following sections. |
Use the following procedure to restore the OCR on UNIX-based systems:
Identify the OCR backups using the ocrconfig -showbackup command. Review the contents of the backup using ocrdump -backupfile file_name, where file_name is the name of the backup file.
Stop Oracle Clusterware on all of the nodes in your RAC database by executing the init.crs stop command on all of the nodes.
Perform the restore by applying an OCR backup file that you identified in Step 1 using the following command, where file_name is the name of the OCR backup file that you want to restore. Make sure that the OCR devices that you specify in the OCR configuration exist and that these OCR devices are valid before running this command.
ocrconfig -restore file_name
Restart Oracle Clusterware on all of the nodes in your cluster by restarting each node or by running the init.crs start command.
Run the following command to verify the OCR integrity, where the -n all argument retrieves a listing of all of the cluster nodes that are configured as part of your cluster:
cluvfy comp ocr -n all [-verbose]
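Putting these steps together, a hypothetical restore session might look like the following. The backup file name is illustrative, the ocrconfig and init.crs commands must be run as the root user, and the init.crs stop and init.crs start commands must be run on every node:
ocrconfig -showbackup
ocrdump -backupfile /u01/crs/cdata/crs/backup00.ocr
init.crs stop
ocrconfig -restore /u01/crs/cdata/crs/backup00.ocr
init.crs start
cluvfy comp ocr -n all -verbose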
Use the following procedure to restore the OCR on Windows-based systems:
Identify the OCR backups using the ocrconfig -showbackup command. Review the contents of the backup using ocrdump -backupfile file_name, where file_name is the name of the backup file.
On all of the remaining nodes, disable the following OCR clients and stop them using the Service Control Panel: OracleClusterVolumeService, OracleCSService, OracleCRService, and OracleEVMService.
Execute the restore by applying an OCR backup file that you identified in Step 1 with the ocrconfig -restore file_name command. Make sure that the OCR devices that you specify in the OCR configuration exist and that these OCR devices are valid.
Start all of the services that were stopped in step 2. Restart all of the nodes and resume operations in cluster mode.
Run the following command to verify the OCR integrity, where the -n all argument retrieves a listing of all of the cluster nodes that are configured as part of your cluster:
cluvfy comp ocr -n all [-verbose]
See Also: "Cluster Verification Utility Oracle Clusterware Component Verifications" for more information about enabling and using CVU |
You can use the OCRDUMP and OCRCHECK utilities to diagnose OCR problems as described in this section.
Use the OCRDUMP utility to write the OCR contents to a file so that you can examine the OCR content.
The OCR has a mechanism that prevents data loss due to accidental overwrites. If you configure a mirrored OCR and if Oracle Clusterware cannot access one of the two mirrored OCR locations and also cannot verify that the available OCR contains the most recent configuration, then the OCR prevents further modification to the available OCR. The OCR prevents overwriting by prohibiting Oracle Clusterware from starting on the node on which the OCR resides. In such cases, Oracle displays an alert message in either Enterprise Manager, the Oracle Clusterware alert log files, or both.
Sometimes this problem is local to only one node and you can use other nodes to start your cluster database. In such cases, Oracle displays an alert message in Enterprise Manager, the Oracle Clusterware alert log, or both.
However, if you are unable to start any cluster nodes in your environment and if you cannot repair the OCR, then you can override the protection mechanism. This procedure enables you to start the cluster using the available OCR, thus enabling you to use the updated OCR file to start your cluster. However, this can result in the loss of data that was not available at the time that the previous known good state was created.
Note: Overriding the OCR using this procedure can result in the loss of OCR updates that were made between the time of the last known good OCR update made to the currently accessible OCR and the time at which you performed this procedure. In other words, running the ocrconfig -overwrite command can result in data loss if the OCR that you are using to perform the overwrite does not contain the latest configuration updates for your cluster environment.
Perform the following procedure to overwrite the OCR if a node cannot start and if the alert log contains a CLSD-1009 or CLSD-1011 message.
Attempt to resolve the cause of the CLSD-1009 or CLSD-1011 message. Do this by comparing the node's OCR configuration (ocr.loc on UNIX-based systems and the Registry on Windows-based systems) with the configuration on other nodes on which Oracle Clusterware is running. If the configurations do not match, then run ocrconfig -repair. If the configurations match, then ensure that the node can access all of the configured OCRs by running an ls command on UNIX-based systems or a dir command on Windows-based systems. Oracle issues a warning when one of the configured OCR locations is not available or if the configuration is incorrect.
Ensure that the most recent OCR contains the latest OCR updates. Do this by examining the output of the ocrdump command and determining whether it contains your latest updates.
If you cannot resolve the CLSD message, then run the command ocrconfig -overwrite to bring up the node.
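As an illustration of the preceding steps on a UNIX-based node, a hypothetical session might look like the following; the ocr.loc path and device names are illustrative and vary by platform:
cat /etc/oracle/ocr.loc
ls -l /dev/raw/raw1 /dev/raw/raw2
ocrdump /tmp/ocr_current.dmp
ocrconfig -overwrite
The first two commands check the local OCR configuration and device access, ocrdump writes the current OCR contents to a file for review, and ocrconfig -overwrite is run only if the CLSD message cannot otherwise be resolved.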
In addition to using the automatically created OCR backup files, you should also export the OCR contents before and after making significant configuration changes, such as adding or deleting nodes from your environment, modifying Oracle Clusterware resources, or creating a database. Do this by using the ocrconfig -export command. This exports the OCR content to a file format.
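For example, to export the OCR contents to a hypothetical file before adding a node, run the following as the root user:
ocrconfig -export /backups/ocr/ocr_before_addnode.dmp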
Using the ocrconfig -export command enables you to restore the OCR using the -import option if your configuration changes cause errors. For example, if you have unresolvable configuration problems, or if you are unable to restart your clusterware after such changes, then restore your configuration using one of the following platform-specific procedures:
Importing Oracle Cluster Registry Content on UNIX-Based Systems
Importing Oracle Cluster Registry Content on Windows-Based Systems
Note: Most configuration changes that you make not only change the OCR contents but also cause file and database object creation. Some of these changes are often not restored when you restore the OCR. Do not perform an OCR restore as a correction to revert to previous configurations if some of these configuration changes fail. This may result in an OCR that has contents that do not match the state of the rest of your system.
Use the following procedure to import the OCR on UNIX-based systems:
Identify the OCR export file that you want to import. This is the file that you previously created using the ocrconfig -export file_name command.
Stop Oracle Clusterware on all of the nodes in your RAC database by executing the init.crs stop command on all of the nodes.
Perform the import by applying an OCR export file that you identified in Step 1 using the following command, where file_name is the name of the OCR file from which you want to import OCR information:
ocrconfig -import file_name
Restart Oracle Clusterware on all of the nodes in your cluster by restarting each node.
Run the following Cluster Verification Utility (CVU) command to verify the OCR integrity, where the -n all argument retrieves a listing of all of the cluster nodes that are configured as part of your cluster:
cluvfy comp ocr -n all [-verbose]
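For example, assuming an export file at the hypothetical location /backups/ocr/ocr_before_addnode.dmp, the import in Step 3 might be run as the root user as follows:
ocrconfig -import /backups/ocr/ocr_before_addnode.dmp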
Note: You cannot restore an OCR export file using the -restore option, which is described in "Managing Backups and Recovering the OCR Using OCR Backup Files". You must instead use the -import option as described in the following sections.
Use the following procedure to import the OCR on Windows-based systems:
Identify the OCR export file that you want to import by running the ocrconfig -showbackup command.
Stop the following OCR clients on each node in your RAC environment using the Service Control Panel: OracleClusterVolumeService, OracleCMService, OracleEVMService, OracleCSService, and the OracleCRService.
Import an OCR export file using the ocrconfig -import command from one node.
Restart all of the affected services on all nodes.
Run the following Cluster Verification Utility (CVU) command to verify the OCR integrity, where the -n all argument retrieves a listing of all of the cluster nodes that are configured as part of your cluster:
cluvfy comp ocr -n all [-verbose]
The Oracle Hardware Assisted Resilient Data (HARD) initiative prevents data corruptions from being written to permanent storage. If you enable HARD, then the OCR writes HARD-compatible blocks. To determine whether the device used by the OCR supports HARD and then enable it, review the Oracle HARD white paper at:
http://www.oracle.com/technology/deploy/availability/htdocs/HARD.html
When you install Oracle Clusterware, Oracle automatically runs the ocrconfig -upgrade command. To downgrade, follow the downgrade instructions for each component and also downgrade the OCR using the ocrconfig -downgrade command. If you are upgrading the OCR to Oracle Database 10g release 10.2, then you can use the cluvfy command to verify the integrity of the OCR. If you are downgrading, you cannot use the Cluster Verification Utility (CVU) commands to verify the OCR for pre-10.2 release formats.
In Oracle9i, the OCR did not write HARD-compatible blocks. If the device used by OCR is enabled for HARD, then use the method described in the HARD white paper to disable HARD for the OCR before downgrading your OCR. If you do not disable HARD, then the downgrade operation fails.
In RAC environments that run on UNIX-based platforms, you can use the CLUSTER_INTERCONNECTS initialization parameter to specify an alternative interconnect for the private network.
The CLUSTER_INTERCONNECTS initialization parameter requires the IP address of the interconnect instead of the device name. It enables you to specify multiple IP addresses, separated by colons. RAC network traffic is distributed between the specified IP addresses.
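For example, assuming hypothetical interconnect addresses of 10.0.0.1 and 10.0.1.1 on the local node, the following setting distributes interconnect traffic across both addresses:
CLUSTER_INTERCONNECTS=10.0.0.1:10.0.1.1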
The CLUSTER_INTERCONNECTS initialization parameter is useful only in UNIX-based environments where UDP IPC is enabled. The CLUSTER_INTERCONNECTS parameter enables you to specify an interconnect for all IPC traffic, including Oracle Global Cache Service (GCS), Global Enqueue Service (GES), and Interprocessor Parallel Query (IPQ) traffic.
Overall cluster stability and performance may improve when you force Oracle GCS, GES, and IPQ over a different interconnect by setting the CLUSTER_INTERCONNECTS initialization parameter. For example, to use the network interface whose IP address is 129.34.137.212 for all GCS, GES, and IPQ IPC traffic, set the CLUSTER_INTERCONNECTS parameter as follows:
CLUSTER_INTERCONNECTS=129.34.137.212
Use the ifconfig or netstat command to display the IP address of a device. This command provides a map between device names and IP addresses. For example, to determine the IP address of a device, run the following command as the root user:
# /usr/sbin/ifconfig -a
fta0: flags=c63<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST,SIMPLEX>
      inet 129.34.137.212 netmask fffffc00 broadcast 129.34.139.255 ipmtu 1500
lo0: flags=100c89<UP,LOOPBACK,NOARP,MULTICAST,SIMPLEX,NOCHECKSUM>
      inet 127.0.0.1 netmask ff000000 ipmtu 4096
ics0: flags=1100063<UP,BROADCAST,NOTRAILERS,RUNNING,NOCHECKSUM,CLUIF>
      inet 10.0.0.1 netmask ffffff00 broadcast 10.0.0.255 ipmtu 7000
sl0: flags=10<POINTOPOINT>
tun0: flags=80<NOARP>
In the preceding example, the interface fta0: has an IP address of 129.34.137.212 and the interface ics0: has an IP address of 10.0.0.1.
Bear in mind the following important points when using the CLUSTER_INTERCONNECTS initialization parameter:
The IP addresses specified for the different instances of the same database on different nodes must belong to network adapters that connect to the same network. If you do not follow this rule, then inter-node traffic may pass through bridges and routers or there may not be a path between the two nodes at all.
Specify the CLUSTER_INTERCONNECTS initialization parameter in the parameter file, setting a different value for each database instance.
If you specify multiple IP addresses for this parameter, then list them in the same order for all instances of the same database. For example, if the parameter for instance 1 on node 1 lists the IP addresses of the alt0:, fta0:, and ics0: devices in that order, then the parameter for instance 2 on node 2 must list the IP addresses of the equivalent network adapters in the same order (see the sketch at the end of this section).
If the interconnect IP address specified is incorrect or does not exist on the system, then Oracle Database uses the default cluster interconnect device. On Tru64 UNIX, for example, the default device is ics0:.
Some operating systems support run-time failover and failback. However, if you use the CLUSTER_INTERCONNECTS initialization parameter, then failover and failback are disabled.
Note: Failover and failback and the CLUSTER_INTERCONNECTS parameter are not supported on AIX systems.
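As a sketch of the ordering rule described earlier, assume a hypothetical two-node cluster with instances named prod1 and prod2, where each node has two private network adapters. The addresses below are illustrative, and each instance lists the address of the equivalent adapter in the same position; in a shared server parameter file the settings might look like this:
prod1.CLUSTER_INTERCONNECTS=10.0.0.1:10.0.1.1
prod2.CLUSTER_INTERCONNECTS=10.0.0.2:10.0.1.2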