2 Oracle Application Server High Availability Framework

Whereas Chapter 1 provided an overview of high availability in general, this chapter introduces you to the specific sets of features, services, and environments that Oracle Application Server provides to ensure high availability for all its components and services. It contains the following sections:

Section 2.1, "Redundant Architectures"
Section 2.2, "High Availability Services in Oracle Application Server"

2.1 Redundant Architectures

Oracle Application Server provides redundancy by supporting multiple instances that serve the same workload. These redundant configurations increase availability through a distributed workload, through a failover setup, or both.

From the entry point to an Oracle Application Server system (content cache) to the back end layer (data sources), all the tiers that are crossed by a request can be configured in a redundant manner with Oracle Application Server. The configuration can be an active-active configuration using OracleAS Cluster or an active-passive configuration using OracleAS Cold Failover Cluster.

In the following sections, we describe the basics of these configurations:

Section 2.1.1, "Oracle Application Server Active-Active Configurations: Oracle Application Server Clusters"
Section 2.1.2, "Oracle Application Server Active-Passive Configurations: Oracle Application Server Cold Failover Clusters"

2.1.1 Oracle Application Server Active-Active Configurations: Oracle Application Server Clusters

Oracle Application Server provides an active-active redundant model for all its components with OracleAS Clusters. In an OracleAS Cluster, two or more Oracle Application Server instances are configured to serve the same application workload. These instances can reside on the same machine or on different machines. The active instances may be front-ended by an external load balancer, which can redirect requests to any of the active instances, or by some other application-level configuration, such as address lists, that distributes the requests.

The most common properties of an OracleAS Cluster configuration include:

Identical instance configuration

The instances are meant to serve the same workload or application. Their configuration guarantees that they deliver the same exact reply to the same request. Some configuration properties may be identical and others may be instance-specific, such as local host name information.
Managed collectively

Changes made to one system will usually need to be propagated to the other systems in an active-active configuration.
Operate independently

In order to provide maximum availability, the loss of one Oracle Application Server instance in an active-active configuration should not affect the ability of the other instances to continue to serve requests.

The advantages of an OracleAS Cluster configuration include:

Increased availability

An active-active configuration is a redundant configuration. The loss of one instance can be tolerated because the other instances can continue to serve the same requests.
Increased scalability and performance

Multiple identically configured instances make it possible to distribute the workload across different machines and processes. If configured correctly, new instances can also be added as demand on the application grows.

In general, the term OracleAS Cluster describes clustering at the Oracle Application Server instance level. However, if it is necessary to call out the specific type of instances being clustered, this document will use OracleAS Cluster (type) to characterize the cluster solution. For example:

two or more J2EE instances are known as OracleAS Cluster (J2EE)
two or more OracleAS Portal instances are known as OracleAS Cluster (Portal)
two or more Oracle Identity Management instances are known as OracleAS Cluster (Identity Management)

2.1.2 Oracle Application Server Active-Passive Configurations: Oracle Application Server Cold Failover Clusters

Oracle Application Server provides an active-passive model for all its components using OracleAS Cold Failover Clusters. In an OracleAS Cold Failover Cluster configuration, two or more application server instances are configured to serve the same application workload but only one is active at any particular time. These instances can reside on the same machine or on different machines.

The most common properties of an OracleAS Cold Failover Cluster configuration include:

Shared storage

The passive Oracle Application Server instance in an active-passive configuration has access to the same Oracle binaries, configuration files, and data as the active instance.
Virtual hostname

During OracleAS Infrastructure installation, you can specify a virtual hostname in the Specify Virtual Hostname screen. This OracleAS Infrastructure virtual hostname can be managed by a hardware cluster or a load balancer, and the middle-tier and OracleAS Infrastructure components use it to access the OracleAS Infrastructure. This applies regardless of whether the OracleAS Infrastructure is a single-node installation or part of an OracleAS Cold Failover Cluster or OracleAS Cluster solution.

The virtual hostname is the hostname associated with the virtual IP. This is the name that is chosen to give the Oracle Application Server middle-tier a single system view of the OracleAS Infrastructure with the help of a hardware cluster or load balancer. This name-IP entry must be added to the DNS that the site uses, so that the middle-tier nodes can associate with the OracleAS Infrastructure without having to add this entry into their local /etc/hosts (or equivalent) file. For example, if the two physical hostnames of the hardware cluster are node1.mycompany.com and node2.mycompany.com, the single view of this cluster can be provided by the name selfservice.mycompany.com. In the DNS, selfservice maps to the virtual IP address of the OracleAS Infrastructure, which either floats between node1 and node2 via a hardware cluster or maps to node1 and node2 by a load balancer, all without the middle tier knowing which physical node is active and actually servicing a particular request.

See Also:
Section 1.2.2, "Oracle Application Server Base Architecture"

You cannot specify a virtual hostname during Oracle Application Server middle-tier installation, but you can still use a virtual hostname via a hardware cluster or load balancer by following the post-installation configuration steps for cold failover cluster middle tiers. See the Oracle Application Server Installation Guide.
Failover procedure

An active-passive configuration also includes a set of scripts and procedures that detect failure of the active instance and fail over to the passive instance while minimizing downtime.
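The virtual hostname and the failover procedure described above work together: clients always address the virtual hostname, and failover changes which physical node answers on it. The following minimal Python sketch models that interplay under stated assumptions. It is not an Oracle script: the class `ColdFailoverCluster`, the threshold of three missed heartbeats, and representing a virtual IP "move" as a change to an owner field are all hypothetical simplifications of what a hardware cluster does.

```python
from dataclasses import dataclass, field

@dataclass
class ColdFailoverCluster:
    """Hypothetical active-passive pair sharing one virtual IP (VIP)."""
    virtual_hostname: str          # e.g. "selfservice.mycompany.com"
    nodes: list                    # physical hostnames; nodes[0] starts active
    max_missed: int = 3            # assumed heartbeat-loss threshold
    vip_owner: str = ""
    missed: dict = field(default_factory=dict)

    def __post_init__(self):
        self.vip_owner = self.nodes[0]
        self.missed = {node: 0 for node in self.nodes}

    def resolve(self) -> str:
        """Physical node currently answering on the virtual hostname."""
        return self.vip_owner

    def heartbeat(self, node: str, alive: bool) -> None:
        """Record one health check; fail over once the active node is lost."""
        self.missed[node] = 0 if alive else self.missed[node] + 1
        if node == self.vip_owner and self.missed[node] >= self.max_missed:
            self.fail_over()

    def fail_over(self) -> None:
        # "Move" the VIP to the first passive node that is still healthy.
        # A real hardware cluster would also mount the shared storage there
        # and start the Oracle processes before accepting traffic.
        for candidate in self.nodes:
            if candidate != self.vip_owner and self.missed[candidate] < self.max_missed:
                self.vip_owner = candidate
                return
        raise RuntimeError("no healthy passive node available")


cluster = ColdFailoverCluster("selfservice.mycompany.com",
                              ["node1.mycompany.com", "node2.mycompany.com"])
for _ in range(3):                      # three missed heartbeats from node1
    cluster.heartbeat("node1.mycompany.com", alive=False)
print(cluster.resolve())                # node2.mycompany.com
```

Requiring several consecutive missed heartbeats, rather than one, is a common way to avoid failing over on a single transient glitch; the right threshold depends on the cluster software in use.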

The advantages of an OracleAS Cold Failover Cluster configuration include:

Increased availability

If the active instance fails for any reason or must be taken offline, an identically configured passive instance is prepared to take over at any time.
Reduced operating costs

In an active-passive configuration, only one set of processes is up and serving requests. Managing a single active instance generally requires less effort than managing an array of active instances.
Application independence

Some applications may not be suited to an active-active configuration, such as applications that rely heavily on application state or on information stored locally. An active-passive configuration suits these applications because only one instance serves requests at any particular time.

In general, the term OracleAS Cold Failover Cluster describes clustering at the Oracle Application Server instance level. However, if it is necessary to call out the specific type of instances being clustered, this document will use OracleAS Cold Failover Cluster (type) to characterize the cluster solution. For example:

OracleAS Cold Failover Cluster (Identity Management)
OracleAS Cold Failover Cluster (Middle-Tier)

From the entry point of an Oracle Application Server system (content cache) to the back end layer (data sources), all the tiers that are crossed by a client request can be configured in a redundant manner either in an active-active configuration using OracleAS Clusters or in an active-passive configuration using OracleAS Cold Failover Clusters.

2.2 High Availability Services in Oracle Application Server

Oracle Application Server provides different features and topologies to support high availability across its stack. This includes solutions that extend across both the OracleAS middle-tier and the OracleAS Infrastructure tier.

This section describes the following high availability services in Oracle Application Server:

Section 2.2.1, "Process Death Detection and Automatic Restart"
Section 2.2.2, "Configuration Management"
Section 2.2.3, "State Replication"
Section 2.2.4, "Server Load Balancing and Failover"
Section 2.2.5, "Backup and recovery"
Section 2.2.6, "Disaster Recovery"

2.2.1 Process Death Detection and Automatic Restart

An Oracle Application Server instance consists of many different running processes to serve client requests. Ensuring high availability means ensuring that all these processes run smoothly, fulfill requests, and do not experience any unexpected hangs or failures.

The interdependency of these processes must also be managed so that they are brought up in the proper sequence, with processes starting up only after the processes that they are dependent on have started successfully.
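Starting processes only after their prerequisites is a topological sort of a dependency graph. The sketch below uses Python's standard `graphlib` module (Python 3.9+). The process names and dependency edges are hypothetical placeholders and do not describe the actual dependencies that OPMN enforces.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each process maps to the set of processes
# that must be running before it starts. Names are placeholders, not the
# real dependencies between Oracle Application Server components.
DEPENDENCIES = {
    "cache": {"http_listener"},
    "http_listener": {"j2ee_container"},
    "j2ee_container": {"config_daemon"},
    "config_daemon": set(),
}

def startup_order(deps):
    """Return a start sequence in which every process follows its prerequisites.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(deps).static_order())

print(startup_order(DEPENDENCIES))
# ['config_daemon', 'j2ee_container', 'http_listener', 'cache']
```

Reversing the same order gives a safe shutdown sequence, and rejecting circular dependencies up front prevents a manager from waiting forever on processes that can never start.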

Oracle Application Server provides high availability and management services at the process level with Oracle Process Manager and Notification Server (OPMN).

2.2.1.1 Process Management with Oracle Process Manager and Notification Server

OPMN has the following capabilities:

Provides automatic death detection of Oracle Application Server processes.
Provides an integrated way to operate Oracle Application Server components.
Provides automatic restart of Oracle Application Server processes when they become unresponsive, terminate unexpectedly, or become unreachable as determined by ping and notification operations.
Channels all events from different Oracle Application Server component instances to all Oracle Application Server components that can utilize them.
Enables gathering of host and Oracle Application Server process statistics and tasks.
Does not depend on any other Oracle Application Server component being up and running before it can be started and used.
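The death-detection and automatic-restart capabilities listed above can be illustrated with a simple supervisor pass. This is a minimal sketch, not OPMN's implementation: `ManagedProcess`, its `ping()` method, and the restart limit are hypothetical stand-ins, and a real process manager also handles startup timeouts, ordering, and notifications.

```python
from dataclasses import dataclass

@dataclass
class ManagedProcess:
    """Hypothetical stand-in for a managed process such as an OC4J instance."""
    name: str
    alive: bool = True
    restarts: int = 0

    def ping(self) -> bool:
        # A real process manager would send a ping request or rely on
        # notifications; here liveness is just a flag.
        return self.alive

    def restart(self) -> None:
        self.alive = True
        self.restarts += 1


def supervise_once(processes, max_restarts=3):
    """Run one monitoring pass and return the names of processes given up on."""
    abandoned = []
    for proc in processes:
        if proc.ping():
            continue                      # healthy, nothing to do
        if proc.restarts < max_restarts:
            proc.restart()                # death detected: restart it
        else:
            abandoned.append(proc.name)   # too many restarts: leave for an operator
    return abandoned


procs = [ManagedProcess("OC4J_home"), ManagedProcess("HTTP_Server")]
procs[0].alive = False                    # simulate an unexpected termination
print(supervise_once(procs))              # []
print(procs[0].alive, procs[0].restarts)  # True 1
```

The restart limit keeps a process that crashes immediately after every start from being restarted endlessly.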

2.2.1.1.1 Automated Process Management with OPMN

OPMN can be used to explicitly manage the following Oracle Application Server processes:

Oracle HTTP Server
Oracle Application Server Containers for J2EE
Distributed Configuration Management daemon
OracleAS Log Loader
OracleAS Guard (for disaster recovery)
Oracle Internet Directory
OracleAS Port Tunnel
OracleAS Web Cache
Oracle Business Intelligence Discoverer
OracleAS Wireless

In addition, OPMN implicitly manages any applications that rely on the above components. For example, any J2EE applications that run under OC4J are managed by OPMN.

OPMN is also extensible, providing the capability to add information about custom processes including load environment information, stopping procedures, and methods for death detection and restart.

2.2.1.1.2 Distributed Process Control with OPMN

Although OPMN can manage processes on a local Oracle Application Server instance, OPMN daemons running on different instances can also work together to provide distributed process management and control.

For example, a command issued on one machine can be used to start all processes or a specific process type across all local and remote Oracle Application Server instances.
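Distributed control amounts to fanning one command out to every instance and collecting the per-instance results. In this hedged sketch, the instance names and the `send_command` function are hypothetical; a real deployment would route the request through OPMN's own remote communication rather than a local Python call.

```python
from concurrent.futures import ThreadPoolExecutor

def send_command(instance: str, command: str, process_type: str) -> str:
    # Hypothetical transport: a real system would forward the request to the
    # process manager on the target instance. Here it only reports success.
    return f"{instance}: {command} {process_type} ok"


def run_everywhere(instances, command, process_type="all"):
    """Send the same command to every instance in parallel, keeping input order."""
    with ThreadPoolExecutor(max_workers=max(1, len(instances))) as pool:
        futures = [pool.submit(send_command, inst, command, process_type)
                   for inst in instances]
        return [future.result() for future in futures]


results = run_everywhere(["instance1.host1", "instance2.host2"], "start", "OC4J")
print(results)
# ['instance1.host1: start OC4J ok', 'instance2.host2: start OC4J ok']
```

Issuing the requests in parallel keeps the total time close to that of the slowest instance rather than the sum of all of them.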

OPMN consists of two major components:

Oracle Notification Server (ONS)

The ONS is the transport mechanism for failure, recovery, startup, and other related notifications between components in Oracle Application Server. It operates according to a publish-subscribe model: an Oracle Application Server component receives a notification of a certain type per its subscription to ONS. When such a notification is published, ONS sends it to the appropriate subscribers.
Oracle Process Manager (PM)

The PM is the centralized process management mechanism in Oracle Application Server and is used to manage Oracle Application Server processes. It is responsible for starting, restarting, stopping, and monitoring every process it manages. The PM handles all requests sent to OPMN that control a process or obtain status about a process. The PM is also responsible for performing death detection and automatic restart of the processes it manages. The Oracle Application Server processes that the PM is configured to manage are specified in a file named opmn.xml. The PM waits for a user command before starting specific processes or all processes, and it stops specific processes or all processes when it receives a stop request, as specified by the request parameters.
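The publish-subscribe model that ONS follows can be shown with a small in-process broker. This is an illustrative sketch only: the `NotificationBus` class and the event type strings are hypothetical, and ONS itself delivers notifications across processes and hosts rather than through Python callbacks.

```python
from collections import defaultdict
from typing import Callable

class NotificationBus:
    """Hypothetical in-process publish-subscribe broker."""

    def __init__(self):
        self._subscribers = defaultdict(list)   # event type -> callbacks

    def subscribe(self, event_type: str, callback: Callable[[dict], None]) -> None:
        self._subscribers[event_type].append(callback)

    def publish(self, event_type: str, payload: dict) -> int:
        """Deliver payload to every subscriber of event_type; return the count."""
        callbacks = self._subscribers.get(event_type, [])
        for callback in callbacks:
            callback(payload)
        return len(callbacks)


received = []
bus = NotificationBus()
bus.subscribe("process.down", received.append)   # e.g. a request router
bus.publish("process.down", {"process": "OC4J_home", "host": "node1"})
bus.publish("process.up", {"process": "OC4J_home", "host": "node1"})  # no subscribers
print(received)   # [{'process': 'OC4J_home', 'host': 'node1'}]
```

Publishers do not need to know who is listening: a component such as a request router subscribes only to the event types it cares about and ignores the rest.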

2.2.2 Configuration Management

Managing and ensuring component high availability involves not only managing processes but also the configuration information for those processes both locally and across a set of Oracle Application Server instances.

2.2.2.1 Configuration Management with Distributed Configuration Management

Distributed Configuration Management (DCM) is a management framework that enables you to create and manage multiple Oracle Application Server instances as one. Multiple instances enable Oracle Application Server to handle large volumes of traffic reliably since the workload is distributed among the instances.

DCM enables you to:

keep a configuration synchronized across multiple Oracle Application Server instances
archive and restore versions of configurations
export and import configurations between Oracle Application Server instances and clusters

DCM enables you to archive, import and export, and synchronize the configurations of multiple OracleAS instances as if they were a single Oracle Application Server instance. To provide this management functionality, DCM keeps information about each Oracle Application Server instance's configuration in a repository that is either file-based or database-based; the database-based repository is the OracleAS Metadata Repository.

The OracleAS Metadata Repository contains:

configuration files for Oracle HTTP Server, OC4J, OPMN, and OracleAS JAAS Provider components
deployed J2EE applications
information about the OPMN instance or OracleAS Cluster

2.2.2.1.1 Configuration Synchronization and Management with DCM

With DCM, you can manage configuration information for the following Oracle Application Server components and applications:

Oracle HTTP Server
Oracle Application Server Containers for J2EE
Oracle Process Manager and Notification Server
Oracle Application Server Java Authentication and Authorization Service (JAAS) Provider
J2EE applications

The configuration information for each of these components is stored in the metadata repository for each OracleAS instance. Once an OracleAS instance is managed by DCM, configuration information can then be:

archived for future use
restored locally from a previous archive
replicated to another OracleAS instance to provide configuration synchronization across a cluster of OracleAS instances
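The archive, restore, and replicate operations listed above can be pictured as a versioned configuration store. The sketch below is hypothetical: `ConfigRepository`, the archive label, and the dictionary representation of configuration are illustrative assumptions and are neither DCM's data model nor the `dcmctl` interface.

```python
import copy

class ConfigRepository:
    """Hypothetical per-instance configuration store with named archives."""

    def __init__(self, config: dict):
        self.config = config
        self.archives = {}

    def archive(self, label: str) -> None:
        # Deep copy so later edits cannot alter the archived version.
        self.archives[label] = copy.deepcopy(self.config)

    def restore(self, label: str) -> None:
        self.config = copy.deepcopy(self.archives[label])

    def replicate_to(self, other: "ConfigRepository") -> None:
        """Push this instance's configuration to another instance."""
        other.config = copy.deepcopy(self.config)


inst1 = ConfigRepository({"httpd": {"MaxClients": 150}})
inst2 = ConfigRepository({"httpd": {"MaxClients": 100}})
inst1.archive("baseline")
inst1.config["httpd"]["MaxClients"] = 300   # an unwanted change
inst1.restore("baseline")
inst1.replicate_to(inst2)
print(inst2.config)   # {'httpd': {'MaxClients': 150}}
```

Copying rather than sharing the configuration objects keeps each instance and each archive independent, so a later edit on one instance cannot silently change another.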

2.2.2.1.2 Distributed Application Deployment with DCM

Oracle's Distributed Configuration Management tool, dcmctl, enables synchronization of configuration information across a cluster of OracleAS instances. This includes the ability to deploy new J2EE applications on one instance of the cluster and then have the same application automatically deployed by each member of the cluster.

Once an application has been deployed in this way, any instance in the cluster can then receive and serve requests for that application.
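Deploying on one member and having the rest follow can be sketched as a shared cluster record that every member synchronizes against. This is a hypothetical model, not how `dcmctl` is implemented: `Cluster`, `Member`, and `sync()` are illustrative names, and the application name and version are placeholders.

```python
class Cluster:
    """Hypothetical shared record of what should be deployed cluster-wide."""

    def __init__(self):
        self.applications = {}    # application name -> version
        self.members = []

    def join(self, member: "Member") -> None:
        self.members.append(member)
        member.sync(self)         # a new member picks up existing applications


class Member:
    def __init__(self, name: str):
        self.name = name
        self.deployed = {}

    def deploy(self, cluster: Cluster, app: str, version: str) -> None:
        # Record the deployment for the cluster, then let every member
        # (including this one) synchronize to the new state.
        cluster.applications[app] = version
        for member in cluster.members:
            member.sync(cluster)

    def sync(self, cluster: Cluster) -> None:
        self.deployed = dict(cluster.applications)


cluster = Cluster()
a, b = Member("instance_a"), Member("instance_b")
cluster.join(a)
cluster.join(b)
a.deploy(cluster, "myapp", "1.0")
print(b.deployed)   # {'myapp': '1.0'}
```

Because every member converges on the same record, any of them can serve the application afterwards, which is the property the paragraph above describes.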

2.2.3 State Replication

One of the advantages of a distributed application is the ability to set up multiple redundant processes that can all serve the same requests. In the event that one of these application processes becomes unavailable, another application process can service the request.

Some applications may require Oracle Application Server to maintain stateful information across consecutive requests. In order to provide transparent failover of these requests, it is necessary to recreate this application state across multiple processes. Oracle Application Server enables the replication of state in J2EE applications through OracleAS Cluster (OC4J). In an OracleAS Cluster (OC4J), several processes work together to deliver the same J2EE application and replicate the state created by it. This enables the transparent failover of requests between the participants in the cluster. Two different types of state are typically maintained in a J2EE application: HTTP session state (updated by servlets and JSPs) and stateful session EJB state (updated by stateful session EJB instances). OracleAS Cluster (OC4J) enables the replication of both.

See Also:

The OC4J Clustering chapter in the Oracle Application Server Containers for J2EE User's Guide.
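The failover benefit of replicating HTTP session state can be shown with a toy model. This minimal sketch assumes synchronous replication of every session update to all live peers, which is a simplification. `ClusterNode`, `form_cluster`, and the session identifier are hypothetical and do not reflect how OC4J actually replicates state.

```python
import copy

class ClusterNode:
    """Hypothetical application process holding replicated session state."""

    def __init__(self, name: str):
        self.name = name
        self.sessions = {}      # session id -> attribute dict
        self.peers = []
        self.up = True

    def set_attribute(self, session_id: str, key: str, value) -> None:
        self.sessions.setdefault(session_id, {})[key] = value
        # Synchronously copy the whole updated session to every live peer.
        for peer in self.peers:
            if peer.up:
                peer.sessions[session_id] = copy.deepcopy(self.sessions[session_id])

    def get_attribute(self, session_id: str, key: str):
        return self.sessions.get(session_id, {}).get(key)


def form_cluster(nodes):
    """Make every node a replication peer of every other node."""
    for node in nodes:
        node.peers = [other for other in nodes if other is not node]


n1, n2 = ClusterNode("oc4j_1"), ClusterNode("oc4j_2")
form_cluster([n1, n2])
n1.set_attribute("session-42", "cart", ["book"])
n1.up = False                                  # the serving process fails
print(n2.get_attribute("session-42", "cart"))  # ['book']
```

Copying the entire session on every update keeps the sketch simple but is expensive; production implementations typically make different trade-offs, such as replicating asynchronously or sending only changed attributes.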

2.2.4 Server Load Balancing and Failover

Load balancing involves the ability to distribute requests among two or more processes.

Features of a software or hardware external load balancer include:

load balancing algorithm

A rule or set of rules for how to allocate requests across the different instances. The most common load balancing algorithms include simple round-robin or assignment based on some weighted property of the instance such as the response time or capacity of that instance relative to other instances.
death detection

The ability to recognize failed requests to one or more instances, and additionally, the ability to mark those instances as inactive so that no further requests will be forwarded to them.
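The two features above, a load balancing algorithm and death detection, can be combined in a few lines. The sketch below is a hypothetical illustration of weighted round-robin that skips inactive instances; it is not the algorithm of any particular load balancer product, the instance names and weights are placeholders, and repeating each instance by its weight is only one simple way to implement weighting.

```python
import itertools

class WeightedRoundRobin:
    """Hypothetical balancer: weighted round-robin that skips inactive instances."""

    def __init__(self, weights: dict):
        self.inactive = set()
        # Repeat each instance according to its integer weight, then cycle.
        expanded = [inst for inst, w in weights.items() for _ in range(w)]
        self._cycle = itertools.cycle(expanded)
        self._slots = len(expanded)

    def mark_failed(self, instance: str) -> None:
        """Death detection: stop forwarding requests to this instance."""
        self.inactive.add(instance)

    def mark_recovered(self, instance: str) -> None:
        self.inactive.discard(instance)

    def next_instance(self) -> str:
        # Look at most one full cycle ahead for an active instance.
        for _ in range(self._slots):
            candidate = next(self._cycle)
            if candidate not in self.inactive:
                return candidate
        raise RuntimeError("no active instances")


lb = WeightedRoundRobin({"ohs1": 2, "ohs2": 1})
print([lb.next_instance() for _ in range(6)])
# ['ohs1', 'ohs1', 'ohs2', 'ohs1', 'ohs1', 'ohs2']
lb.mark_failed("ohs1")
print([lb.next_instance() for _ in range(3)])
# ['ohs2', 'ohs2', 'ohs2']
```

Marking an instance inactive, rather than removing it, makes it easy to return the instance to rotation with `mark_recovered()` once it passes health checks again.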

2.2.4.1 Internal Load Balancing Mechanism Provided in Oracle Application Server

Oracle Application Server provides different load balancing mechanisms for communication between the components of an Oracle Application Server system. Load balancing takes place:

from Oracle Application Server Web Cache to Oracle HTTP Servers
from Oracle HTTP Servers to OC4J processes for J2EE applications
from Oracle HTTP Servers to the database for PL/SQL applications
between OC4J processes, from the presentation layer components (servlets and JSPs) to the business layer components (EJBs)
from OC4J processes to databases

All sub-tiers in Oracle Application Server can handle failures in the connections that they establish with the next tier, as follows:

Connections established from OracleAS Web Cache to Oracle HTTP Servers: OracleAS Web Cache detects failures in the replies returned by Oracle HTTP Servers and routes the new requests to the available Oracle HTTP Servers.
Connections established from Oracle HTTP Servers to OC4J processes: Oracle HTTP Server maintains a routing table of available OC4J processes and routes new requests only to those OC4J processes that are up and running.
Connections established from Oracle HTTP Servers to databases: mod_plsql detects failures in the database and routes requests to the available database nodes.
Connections established between OC4J processes: OC4J detects failures in the RMI invocations to the EJB tier and fails communication over to available EJB nodes.
Connections established between OC4J processes and databases: OC4J drivers can detect failures of database nodes and re-route requests to available nodes.
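The common pattern behind these connection-level mechanisms is to keep a table of candidate nodes, skip those known to be down, and retry on the next node when a call fails. The sketch below is generic and hypothetical: `call_with_failover`, the node names, and the use of `ConnectionError` to signal a failed node are assumptions, not the behavior of mod_oc4j, mod_plsql, or any Oracle driver.

```python
def call_with_failover(nodes, request, send, routing_table):
    """Send request to the first available node, failing over on errors.

    nodes: ordered candidate node names
    send: function(node, request) -> reply; raises ConnectionError on failure
    routing_table: dict node -> bool (True if believed up); updated in place
    """
    last_error = None
    for node in nodes:
        if not routing_table.get(node, True):
            continue                      # already known to be down
        try:
            return node, send(node, request)
        except ConnectionError as exc:
            routing_table[node] = False   # remember the failure for next time
            last_error = exc
    raise RuntimeError("all nodes failed") from last_error


def fake_send(node, request):
    # Simulated transport: oc4j_1 has crashed, oc4j_2 answers.
    if node == "oc4j_1":
        raise ConnectionError("connection refused")
    return f"{node} handled {request}"


table = {"oc4j_1": True, "oc4j_2": True}
print(call_with_failover(["oc4j_1", "oc4j_2"], "/app/index.jsp", fake_send, table))
# ('oc4j_2', 'oc4j_2 handled /app/index.jsp')
print(table)   # {'oc4j_1': False, 'oc4j_2': True}
```

Retrying is only safe when a failed attempt cannot have taken effect or the request is idempotent; otherwise a retry on another node may repeat side effects.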

2.2.4.2 External Load Balancers

To load balance requests among many Oracle Application Server instances in an active-active configuration, Oracle recommends the use of an external load balancer.

When several Oracle Application Server instances are grouped to work together, they present themselves as a single virtual entry point to the system, which hides the multiple instance configuration. External load balancers can send requests to any application server instance in a cluster, as any instance can service any request. An administrator can raise the capacity of the system by introducing additional application server instances. These instances can be installed on separate nodes to allow for redundancy in case of node failure.

There are different types of external load balancers you can use with Oracle Application Server instances. Table 2-1 summarizes the different types.

Table 2-1 Types of External Load Balancers

Load Balancer Type	Description
Hardware load balancer	Hardware load balancing involves placing a hardware load balancer in front of a group of Oracle Application Server instances or OracleAS Web Cache. The hardware load balancer routes requests to the Oracle HTTP Server or OracleAS Web Cache instances in a client-transparent fashion.
Software load balancer	Software load balancing involves using a process that intercepts calls to an application server and routes those requests to redundant components.
LVS (Linux Virtual Server) network load balancer for Linux	With some Linux operating systems, you can use the operating system to perform network load balancing.
Windows Network Load Balancer (applicable to the Windows version of Oracle Application Server)	With some Windows operating systems, you can use the operating system to perform network load balancing. For example, with Microsoft Advanced Server, the NLB functionality enables you to send requests to different machines that share the same virtual IP or MAC address. The servers themselves do not need to be clustered at the operating system level.

External Load Balancer Requirements

Oracle does not provide external load balancers. You can get external load balancers from other companies.

To ensure that your external load balancer can work with Oracle Application Server, check that your external load balancer meets the requirements listed in Table 2-2.

Note that you may not need to meet all the requirements listed in the table. The requirements for an external load balancer depend on the topology being considered and on the Oracle Application Server components that are being load balanced.

Table 2-2 External Load Balancer Requirements

External Load Balancer Requirement	Description
Virtual servers and port configuration	A virtual server is a logical address created in a load balancer. The virtual server maps to a group of resources that are load balanced for a request. You need to be able to create virtual server names and ports on your load balancer, and the virtual server names and ports must meet the following requirements: The load balancer should allow configuration of multiple virtual servers. For each virtual server, the load balancer should allow configuration of traffic management on more than one port. For example, for OracleAS Cluster (Identity Management), the load balancer needs to be configured with a virtual server and port for HTTP / HTTPS traffic, and separate virtual servers and ports for LDAP and LDAPS traffic. The virtual server names must be associated with IP addresses and be part of your DNS. Clients must be able to access the external load balancer through the virtual server names.
Persistence / stickiness	Persistence (sometimes called stickiness) refers to the load balancer's ability to establish an identifier for a connection and, based on that identifier, route all subsequent connections from the same client to the same destination host. Some components of Oracle Application Server use persistence or stickiness in an external load balancer. Here are some examples: For Oracle Delegated Administration Services, you need to configure cookie persistence on the external load balancer for HTTP traffic. Specifically, you need to set up cookie persistence for URIs starting with `/oiddas/`. This is the URI for Oracle Delegated Administration Services. Cookie-based persistence is highly recommended. If your external load balancer does not allow you to set cookie persistence at the URI level, then set the cookie persistence for all HTTP traffic. In either case, set the cookie to expire when the browser session expires. Refer to your external load balancer documentation for details. For Oracle Internet Directory, do not configure persistence on the external load balancer. For OracleAS Single Sign-On, a persistence setting is not required. However, you may set persistence or stickiness compatible with Oracle HTTP Server. For OracleAS Portal, enable cookie-based persistence for OracleAS Web Cache. For Reports Server, a persistence setting may be needed in certain cases. See Section 5.4.3, "OracleAS Reports Services in Active-Active Configurations" for details.
Resource monitoring / port monitoring / process failure detection	You need to set up the external load balancer to detect service and node failures (through notification or some other means) and to stop directing non-Oracle Net traffic to the failed node. If your external load balancer has the ability to automatically detect failures, you should use it. For example, for OracleAS Cluster (Identity Management), specific components that the external load balancer should monitor are Oracle Internet Directory, OracleAS Single Sign-On, and Oracle Delegated Administration Services. To monitor these components, set up monitors for the following: LDAP and LDAPS listen ports, and HTTP and HTTPS listen ports (depending on the deployment type). These monitors should use the respective protocols to monitor the services. That is, use LDAP for the LDAP port, LDAP over SSL for the LDAP SSL port, and HTTP/HTTPS for the Oracle HTTP Server port. If your external load balancer does not offer these monitors, consult your external load balancer documentation for the best method of setting up the external load balancer to automatically stop routing incoming requests to a service that is unavailable.
Network Address Translation (NAT)	The load balancer should have the capability to perform network address translation (NAT) for traffic being routed from clients to the Oracle Application Server nodes. This is specifically required for OracleAS Portal deployments, where the load balancer should allow enabling NAT for requests originating from within the OracleAS Portal node to the load balancer virtual server (for example, requests such as Parallel Page Engine (PPE) loopbacks and cache invalidation requests).
Fault tolerant mode	It is highly recommended that you configure the load balancer to be in fault-tolerant mode.
Other	It is highly recommended that you configure the load balancer virtual server to return immediately to the calling client when the backend services to which it forwards traffic are unavailable. This is preferred over the client disconnecting on its own after a timeout based on the TCP/IP settings on the client machine because the timeout may be set to a long period of time.
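The cookie-based persistence requirement for Oracle Delegated Administration Services in Table 2-2, scoped to URIs starting with `/oiddas/`, can be sketched as follows. This is a hypothetical model of load balancer behavior, not configuration syntax for any product: the cookie name `LB_ROUTE`, the server names, the sample paths, and the round-robin fallback are assumptions.

```python
import itertools

STICKY_PREFIX = "/oiddas/"     # URI prefix that needs persistence (Table 2-2)
COOKIE_NAME = "LB_ROUTE"       # hypothetical cookie name


class StickyBalancer:
    def __init__(self, servers):
        self.servers = list(servers)
        self._rr = itertools.cycle(self.servers)

    def route(self, uri: str, cookies: dict):
        """Return (server, cookies_to_set) for one request."""
        if uri.startswith(STICKY_PREFIX):
            pinned = cookies.get(COOKIE_NAME)
            if pinned in self.servers:
                return pinned, {}                 # honor existing persistence
            server = next(self._rr)
            # Issued without an expiry, i.e. as a session cookie, so the
            # persistence ends when the browser session ends.
            return server, {COOKIE_NAME: server}
        return next(self._rr), {}                 # no persistence needed


lb = StickyBalancer(["infra1", "infra2"])
server, set_cookies = lb.route("/oiddas/some/page", {})
again, _ = lb.route("/oiddas/some/page", set_cookies)
print(server, again)   # infra1 infra1
```

A real load balancer would also re-pin the client when the pinned server fails its health check, which ties persistence to the monitoring requirement in the same table.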

Figure 2-1 depicts an example deployment of a hardware load balancing router with Oracle Application Server.

Figure 2-1 Example load balancing router deployment with Oracle Application Server


Load balancing improves scalability by providing an access point through which requests are routed to one of many available instances. Instances can be added to the group that the external load balancer serves to accommodate additional users. Load balancing improves availability by routing requests to the most available instances. If one instance goes down, or is particularly busy, the external load balancer can send requests to another active instance.

See Also:

Section 4.7, "Using OracleAS Single Sign-On with OracleAS Cluster (Middle-Tier)"

2.2.5 Backup and recovery

Protecting against data loss in any system component is critical to maintaining a highly available environment. Regular, complete backups of the entire Oracle Application Server environment are recommended.

A complete Oracle Application Server environment backup includes:

A full backup of all files in the middle-tier Oracle homes (this includes Oracle software files and configuration files).
A full backup of all files in the OracleAS Infrastructure Oracle home (this includes Oracle software files and configuration files).
A complete cold backup of the OracleAS Metadata Repository.
A full backup of the Oracle system files on each host in your environment.
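A file-level backup of Oracle home directories can be sketched with Python's standard `tarfile` module. This is a hypothetical illustration, not the OracleAS Backup and Recovery Tool described in the next section; the directory paths in the comment are placeholders, and this approach does not cover the cold backup of the OracleAS Metadata Repository, which is a database backup taken while the database is shut down.

```python
import tarfile
import time
from pathlib import Path


def backup_directories(directories, dest_dir):
    """Write one timestamped .tar.gz containing every listed directory."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest / f"oracle_homes_{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for directory in directories:
            path = Path(directory)
            # arcname keeps only the directory's own name inside the archive.
            tar.add(path, arcname=path.name)
    return archive

# Hypothetical usage (placeholder paths):
# backup_directories(["/u01/app/oracle/midtier_home",
#                     "/u01/app/oracle/infra_home"], "/backup")
```

Two directories with the same base name would collide inside the archive, and two runs within the same second would overwrite each other, so a production script would need unique names for both.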

2.2.5.1 Oracle Application Server Backup and Recovery Tool

The most frequently changing critical files in an Oracle installation are configuration files and data files. Oracle provides the Oracle Application Server Backup and Recovery Tool (OracleAS Backup and Recovery Tool) to back up these configuration and data files.

The OracleAS Backup and Recovery Tool is a Perl script and associated configuration files. You can use this tool to back up and recover the following types of files:

configuration files in the middle-tier and OracleAS Infrastructure Oracle homes
OracleAS Metadata Repository files

The OracleAS Backup and Recovery Tool is installed by default whenever you install Oracle Application Server. The tool is installed in the $ORACLE_HOME/backup_restore directory.

The OracleAS Backup and Recovery Tool supports the following installation types:

J2EE and Web Cache
Portal and Wireless
OracleAS Infrastructure (Identity Management and Metadata Repository)
OracleAS Infrastructure (Identity Management only)
OracleAS Infrastructure (Metadata Repository only)
OracleAS TopLink (standalone or installed into an OracleAS middle-tier Oracle home)
Oracle Application Server Integration Business Activity Monitoring
Oracle Content Management Software Development Kit

2.2.6 Disaster Recovery

Disaster recovery refers to how a system can be recovered from catastrophic site failures caused by natural or man-made disasters. Disaster recovery can also refer to how a system is managed for planned outages. For most disaster recovery situations, the solution involves replicating an entire site, not just pieces of hardware or subcomponents. This also applies to the Oracle Application Server Disaster Recovery (OracleAS Disaster Recovery) solution.

In the most common configuration, a standby site is created to mirror the production site. Under normal operation, the production site actively services client requests. The standby site is maintained to mirror the applications and content hosted by the production site.

2.2.6.1 Oracle Application Server Guard

OracleAS Guard automates the restoration of a production site on its corresponding standby site. To protect a complete Oracle Application Server environment from disasters, OracleAS Guard performs the following operations:

Instantiates the standby site: instantiates an Oracle Application Server standby farm that mirrors a primary farm.
Verifies configuration: verifies that a farm meets the requirements to be used as a standby farm for the corresponding primary farm.
Site synchronization: synchronizes the production and the standby sites.
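The verify operation listed above can be thought of as comparing the topologies of the two farms. The sketch below is a hypothetical simplification: it represents each farm as a mapping from instance name to installation type and Oracle home path, and it assumes that a valid standby must mirror the primary exactly on those two attributes. OracleAS Guard's actual checks are more extensive and are not reproduced here.

```python
def verify_standby(primary: dict, standby: dict) -> list:
    """Return a list of problems; an empty list means the standby mirrors primary.

    Each farm maps instance name -> (install_type, oracle_home).
    """
    problems = []
    for name, (install_type, home) in primary.items():
        if name not in standby:
            problems.append(f"missing instance {name}")
            continue
        s_type, s_home = standby[name]
        if s_type != install_type:
            problems.append(f"{name}: type {s_type} != {install_type}")
        if s_home != home:
            problems.append(f"{name}: Oracle home {s_home} != {home}")
    for name in sorted(standby.keys() - primary.keys()):
        problems.append(f"unexpected instance {name}")
    return problems


primary = {"infra": ("Infrastructure", "/u01/infra"),
           "mid1": ("J2EE and Web Cache", "/u01/mid")}
standby = {"infra": ("Infrastructure", "/u01/infra"),
           "mid1": ("Portal and Wireless", "/u01/mid")}
print(verify_standby(primary, standby))
# ['mid1: type Portal and Wireless != J2EE and Web Cache']
```

Returning every problem instead of stopping at the first lets an administrator fix the standby farm in one pass before attempting instantiation or synchronization.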