Oracle® Application Server Wireless Administrator's Guide
10g Release 2 (10.1.2)
B13820-02

13 Optimizing Oracle Application Server Wireless

This chapter discusses the factors that enable application developers to optimize Oracle Application Server Wireless.

13.1 Overview of OracleAS Wireless Optimization

Oracle Application Server Wireless, when installed, initializes a default setup that is appropriate for the performance of most applications. However, you may need to use additional tuning knobs to adjust performance, since applications vary in features, hardware setup, and performance requirements.

This chapter discusses the tuning options and methods available within Oracle Application Server Wireless and the performance logger utility. It also discusses JVM tuning, JDBC connection performance, and TCP/IP stack tuning.


Note:

Throughout the documentation, you can substitute UNIX for Solaris in all instances except for this chapter. The tuning knobs described in this chapter are Solaris-specific.

13.2 Transport Performance Monitoring

You can view the performance statistics of the Transport system from Oracle Enterprise Manager 10g Application Server Control. First select the middle-tier node on the Farm page, and then click Wireless on the Application Server Home page to access the OracleAS Wireless system management functions described in Chapter 3, "Managing the OracleAS Wireless Server". Click the Site Performance tab of the Wireless Home page. In the Component Performance section, select Messaging Servers. The Messaging Server performance page appears (Figure 13-1).

Figure 13-1 The Messaging Server Performance Metrics


This page displays the client side and server side Messaging Server performance metrics. For each of the Messaging Server performance metrics, Wireless displays performance data by process name and delivery type (for example, SMS).

The client side performance metrics include:

Average Sending Response Time

The average time of a sending method. On the client side, a sending method is called to send a message. This time is the period from when the method is called to the time the method returns. When the method returns, the message is saved in a database persistently, but is not delivered.

Total Number of Sending Requests

The total number of times that the sending method is called by the client process. A sending method called once to send a message to a set of destinations counts as a single sending request.

Total Number of Sending Requests Sent

The total number of successful calls, where a message is delivered to a proper gateway and its receipt is acknowledged. The client process can call the sending method many times to send many messages. Some of these requests fail, as in the case where a destination cannot be reached. Other requests could be undergoing processing.

Total Number of Sending Requests Failed

The total number of all calls that are known to have failed.

Average Receiving Process Time

The performance of the listener in terms of the time taken by the onMessage call-back.

The server-side performance metrics include:

Average Sending Process Time

The performance of a driver in terms of the time taken by the sending method of the driver. Driver performance is measured by delivery type (for example, SMS), process time (the time taken by a driver to send a message to the proper gateway), dequeue time, and driver process time. When you measure the performance of the transport system, you can subtract the process time, because the transport system is waiting while the driver sends a message. If the driver is fast, then the system does not wait long.

Average Receiving Response Time

Once a transport driver receives a message, the message is passed to the transport system by an onMessage method. The response time is the time taken by the onMessage method. Once the onMessage returns, the received message is saved in a database for dispatching.

Total Number of Received Messages

The total number of times the transport drivers call the onMessage call-back method.

Total Number of Received Messages Dispatched

The total number of received messages which are dispatched to, and are accepted by, the listeners. Among received messages, some may be in processing. Others may not have been dispatched to listeners, or listeners may have failed to process dispatched messages.

Total Number of Received Messages Dispatch Failed

The total number of received messages which failed to dispatch to a listener.

For more information on the Site Performance tab, see Section 3.8.

13.2.1 Factors Affecting Transport Performance

This section describes the factors that affect Messaging Server performance.

13.2.1.1 The Sending and Receiving Threads of a Driver

The sending and receiving thread settings specify the number of threads spawned by the Messaging Server to call a driver's send and receive methods.

Every delivery type has an associated driver queue in which the Messaging Server queues the messages that are to be sent using a specific delivery type. Every messaging client has an associated service queue in which the Messaging Server queues the messages that it receives from the gateway.

By increasing the number of sending threads, you also increase the number of threads that both dequeue the message from the driver queue and pass it to the send method that performs the send operation by submitting the message to the gateway. If the dequeue rate from the driver queue is low, then increasing the number of sending threads enhances performance.

Likewise, by increasing the number of receiving threads, you also increase the number of threads that both receive messages from the gateway and queue them to the service queue for a messaging client. The messaging client then receives messages from this queue. Increasing the number of working threads improves performance if the enqueue rate is low or if there are many pending messages in the gateway.


Tips:

  • To find the optimum number of sending threads, study the impact of varying the number of sending threads on the resource utilization of the database machine (mainly I/O and CPU) and the dequeue rate from the driver queue.

  • To find the optimum number of receiving threads, study the impact of varying the number of receiving threads on the resource utilization of the database machine (mainly I/O and CPU) and the enqueue rate of the service queue.

  • Setting the number of sending threads for the driver between 7 and 10 and the number of receiving threads between 3 and 5 at average load yields decent performance from the Messaging Server.



13.2.1.1.1 Finding the Queue Table for a Driver's send Method

The name of the table that queues the messages for the driver to send is of the form trans_t_<queue_id>. The <queue id> for a delivery type is found in the trans_driver_queue table. For example, if the <queue id> for the SMS delivery type is 1, then all of the messages sent using the SMS delivery type are in the table, trans_t_1.
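You can check the mapping and the queue depth from SQL*Plus while connected to the repository as the wireless user. The following is only a sketch; the column layout of trans_driver_queue is not documented here, so inspect it with DESCRIBE first and substitute the queue ID that your query actually returns:

SQL> DESCRIBE trans_driver_queue
SQL> SELECT * FROM trans_driver_queue;
SQL> -- if the SMS delivery type maps to queue ID 1, its pending messages are in trans_t_1
SQL> SELECT COUNT(*) FROM trans_t_1;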

13.2.1.1.2 Finding the Queue Table for a Service or Client

The Messaging Server receives a message from the gateway and then queues it into AQ if any client or service has been registered for the message. All messages received by a driver for a particular service or client are queued in a queue table of the form trans_t_<queue_id>. The <queue_id> for a service name is found in the trans_service_queue table. For example, if the <queue_id> for a service called ASYNCAGENT is 2, then all of the messages for this service are stored in the queue table called trans_t_2.
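The lookup is analogous on the service side. Again, this is only a sketch; verify the column names with DESCRIBE and substitute your own queue ID:

SQL> SELECT * FROM trans_service_queue;
SQL> -- if the ASYNCAGENT service maps to queue ID 2, its pending messages are in trans_t_2
SQL> SELECT COUNT(*) FROM trans_t_2;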

13.2.1.1.3 Increasing the Number of Sending and Receiving Threads for Driver Instance

You can increase the number of sending and receiving threads by editing a driver instance. To edit a driver instance, first select the Messaging Server process that contains the driver instance you need to edit from the Standalone Processes section of the OracleAS Wireless System Manager Home page (accessed through Oracle Enterprise Manager 10g Application Server Control, as described in Section 13.2). From the Driver Instances section of the detail page for the selected Messaging Server process, select the driver and then click Edit. The Properties page for the selected driver appears (Figure 13-2), with its fields populated by the values set for the selected driver. Change the values for the Sending Threads and Receiving Threads parameters as needed and then click OK to commit the changes.

Figure 13-2 The Driver Instance Properties Page


13.2.1.2 Messaging Server Client Threads

The Messaging Server client is a service that is registered with the Messaging Server for sending and receiving messages (for example, the Async Listener and the Notification Engine are clients of the Messaging Server). A service queue is associated with every client that is registered to receive messages from the Messaging Server. The service queue contains all of the messages intended for the client and received by the Messaging Server.

Messaging Server client threads are the number of threads that the client uses to dequeue the messages from its service queue. To find the name of the service queue table for a client, refer to Section 13.2.1.1.2. If the dequeuing rate from the service queue is low, then you can improve performance by increasing the number of Messaging Server client threads. You can increase the client threads for Messaging Server clients using the appropriate configuration page for the Messaging Server client in the OracleAS Wireless system management described in Chapter 3, "Managing the OracleAS Wireless Server". For example, increasing the thread pool size of the Messaging Server client for the Async Listener configures the Messaging Server client threads for the Async Listener. For more information, see Section 3.10.2.1.

13.2.1.3 JDBC Connection Pool

The JDBC Connection Pool parameter defines the size of the database connection pool. If you increase the number of driver or client threads to more than 10, then increasing the maximum number of connections from 10 (the default value) to 50 ensures decent performance at peak load.

13.2.1.4 Performance Monitor Process

If a very high load is expected to hit the Messaging Server and if performance data logged by the Performance Monitor is not required, then stopping the Performance Monitor process will improve performance. At high loads, this process has been observed to consume considerable resources, resulting in low throughput.

13.2.1.5 AQ Tuning

AQ (Advanced Queuing) operations result in a high number of insertions into, and deletions from, the database. Hence, I/O on the database is high and needs careful tuning. Based on the volume of operations, you may want to increase the number of I/O controllers on the machine.

In the test environment, the following observations have been verified.

  • With 3 I/O controllers, a throughput of 40 messages per second was achieved with 7 sending threads.

  • With 12 I/O controllers, a throughput of 100 messages per second was achieved with 9 sending threads.

13.2.1.6 Cleansing Messaging Server Tables

Cleaning database tables can improve performance if the Messaging Server or the Performance Monitor process has been running for a long time. You can delete the contents of the following tables if the Messaging Server has been running for a long time and if the listed tables contain many rows:

  • Trans_message

  • Trans_store

  • Trans_ids


Note:

Before deleting these tables, be sure that you do not need message details about the messages received and sent by the Messaging Server.

If the Performance Monitor process has been running for a long time and if the listed tables contain many rows, then you can truncate the following tables to improve performance (a combined SQL sketch follows the note below):

  • trans_request_log

  • trans_handle_log

  • trans_process_log

  • trans_enqueue_log

  • trans_dequeue_log


Note:

Before truncating these tables, be sure that you do not need any performance data.
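The following SQL*Plus sketch purges both sets of tables. It assumes that you are connected to the repository as the wireless user, that the Messaging Server and Performance Monitor processes are stopped, and that neither the message details nor the performance data is still needed:

SQL> DELETE FROM trans_message;
SQL> DELETE FROM trans_store;
SQL> DELETE FROM trans_ids;
SQL> COMMIT;
SQL> TRUNCATE TABLE trans_request_log;
SQL> TRUNCATE TABLE trans_handle_log;
SQL> TRUNCATE TABLE trans_process_log;
SQL> TRUNCATE TABLE trans_enqueue_log;
SQL> TRUNCATE TABLE trans_dequeue_log;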

13.2.1.6.1 Recreating the Transport Repository

To recreate the transport repository from scratch, run the following scripts (located in the wireless/repository/sql directory) from SQL*Plus while connected to the repository as the wireless user. Be sure to stop the middle tier before running these scripts. A sketch of the sequence follows the list:

  • trans_clean.sql

  • trans_setup.sql

  • trans_setup.pls

  • trans_setup.plb
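In this sketch, the ORACLE_HOME prefix, the wireless password, and the connect string are placeholders for your own environment:

> cd $ORACLE_HOME/wireless/repository/sql
> sqlplus wireless/<wireless_password>@<repository_connect_string>
SQL> @trans_clean.sql
SQL> @trans_setup.sql
SQL> @trans_setup.pls
SQL> @trans_setup.plb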

13.3 Optimizing the Async Listener Performance

The performance logging framework at the Web Server level collects performance-related data for an Async Listener. To view this data, you first select an Async Listener process (located in the Web-Based Processes section of the OracleAS Wireless System Manager Home page). The detail page for the selected process appears. Clicking the Performance tab of the detail page invokes the process's Performance page. The page includes the following performance metrics:

Number of Messages Received

The number of messages received, grouped by process ID.

Average Message Response Time (seconds)

The average time a message stayed on the server.

Average Message Queue Size

The average size of the message queue on an hourly basis for today.

Service Access Count

The number of times that each application was accessed today.

User Access Count

The number of messages issued by each user device.

Number of Errors

The number of errors on an hourly basis.

13.3.1 Tuning the Performance of the Async Listener

The following sections describe the tuning knobs available in OracleAS Wireless that affect Async Listener performance:

13.3.1.1 Tuning the Working Threads for the Async Listener

The Async Listener Configuration page (Figure 13-3) enables you to change the number of working threads for the Async Listener. By default, the value for the Working Threads parameter is 10. You can increase this parameter to a higher value to accommodate a higher request rate.

Figure 13-3 Configuring the Async Worker Threads


13.3.1.2 Adjusting the Thread Pool Size of Messaging Server Client

Increasing the size of the thread pool enables the Messaging Server client to handle higher loads. You can adjust the size of the thread pool from the Messaging Server Client page (Figure 13-4). To access this page, select Messaging Server Client (located under Notification Engine in the Component Configuration section of the Administration tab). For more information on configuring the Messaging Server client, see Section 3.10.2.1.

Figure 13-4 Adjusting the Thread Pool Size


13.3.1.3 Adjusting the Sending and Receiving Threads

You can also speed up de-queuing and enqueuing by increasing the number of sending and receiving threads. For more information, see Section 13.2.1.1.3.

13.4 Optimizing Data Feeder Performance

Parsing input is a costly operation. The performance of such operations depends largely on the amount of memory available to the Java Virtual Machine (JVM). To handle a high feed size, you can increase the heap size of the Data Feeder process. Normally, parsing XML feeds consumes more resources than parsing CSV (comma-separated values) feeds.


13.5 Optimizing the Performance of the Oracle HTTP Server

The following sections describe the configuration directives that you can tune in httpd.conf (located in the ORACLE_HOME/Apache/Apache/conf/ directory) to enhance the performance of the Oracle HTTP Server (OHS). A consolidated example follows the directive descriptions.

13.5.1 MaxClients

This is the maximum number of server processes that can run. Use an optimum number based on the load: too low a number causes clients to be locked out, while a high number of servers consumes more resources.

13.5.2 MaxRequestsPerChild

The number of requests that a child process handles before it expires and is re-spawned. The default value of 0 means that it never expires; as a result, you should limit this value. Generally, 10000 is sufficient.

13.5.3 MaxSpareServers

This is the maximum number of pre-spawned processes available in the pool of Apache processes that handle connections. The suggested value may vary, but 10 suffices for most requirements.

13.5.4 MinSpareServers

This is the minimum number of child processes that need to be pre-spawned all the time. The value 5 will suffice for most requirements.

13.5.5 StartServers

The number of servers to start initially. If a sudden load is expected on startup, then this value should be increased.

13.5.6 Timeout

The number of seconds before incoming receives and outgoing sends time out. The recommended value is 300 seconds.
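Putting these directives together, an illustrative httpd.conf excerpt for a moderately loaded site might look like the following. The MaxClients value is taken from Section 13.10, the StartServers value is an assumption, and all values should be adjusted to your own load:

# Illustrative excerpt from ORACLE_HOME/Apache/Apache/conf/httpd.conf
MaxClients          1024
StartServers        10
MinSpareServers     5
MaxSpareServers     10
MaxRequestsPerChild 10000
Timeout             300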

13.6 Optimizing the Oracle Process Management and Notification Service (OPMN)

Because the default number of file descriptors per JVM is low, you should increase it in the ORACLE_HOME/opmn/bin/opmnctl script by modifying (or adding) the following line:

> ulimit -n 2048


Note:

You must bounce OPMN after you increase the default file descriptor value.
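For example, one way to bounce OPMN and the processes it manages from the command line:

> $ORACLE_HOME/opmn/bin/opmnctl stopall
> $ORACLE_HOME/opmn/bin/opmnctl startall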

13.7 Optimizing the Database Connections

Oracle Application Server uses database connections for Single Sign-On (SSO), Oracle Internet Directory (OID), and other components. Because the default number of connections may not suffice for a high number of users, increase this number as the user population grows. You can increase this number by modifying the relevant files in the database.
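As an illustration only: if the database instance itself is the limiting factor, the initialization parameters that typically govern the number of connections are PROCESSES and SESSIONS. The values below are placeholders, and the exact settings to change for SSO and OID connections depend on your configuration:

SQL> ALTER SYSTEM SET processes=300 SCOPE=SPFILE;
SQL> ALTER SYSTEM SET sessions=335 SCOPE=SPFILE;
SQL> -- restart the database instance for these static parameters to take effect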

13.8 Optimizing the Capacity of Webcache

The Web Cache capacity should be set to a high value, depending on the load. For example, if you are handling 50 requests per second, then set the capacity to 1000. You can increase the Web Cache capacity by editing the properties of the Web Cache's origin server. To edit the origin server:

  1. Click Web Cache in the Application Server Home page of the Enterprise Manager. The Web Cache Home page appears.

  2. Click the Administration tab.

  3. Click Origin Servers (located in the Application section under Properties). The Origin Server page appears.

  4. Select an origin server and then click Edit. The Edit Origin Server page appears.

  5. Increase the value in the Capacity field.

  6. Click OK to commit the change.


Note:

Depending on the incoming requests and the size of documents to be cached, you must also change the maximum incoming connections and the maximum cache size. You can change these values using the Resource Limits and Timeouts page (accessed by clicking Resource Limits and Timeouts in the Web Cache section of the Web Cache Administration tab).

13.9 Optimizing JVM Performance

Because Java applications run within the context of the JVM, some default properties of the JVM must be modified to enable the applications to run faster and consume fewer resources.

Since garbage collection (GC) was not a parallel process until Java 1.3.1, it can become a principal performance bottleneck as the number of CPUs increases. Java 1.4.2 implements the concept of generational garbage collection. Generations are memory pools that hold objects of different ages. There are two GC cycles: Minor Collection and Major Collection.

Minor Collection

Typically, young objects, which comprise the young generation, die fast. Minor Collection occurs when the memory pool (or generation) of young objects fills up. During Minor Collection, live objects from the young generation memory pool are eventually copied to a tenured pool. The young generation consists of Eden, where objects are initially allocated, and two survivor spaces. One of these survivor spaces remains empty and serves as the destination to which live objects in Eden and the other survivor space are copied. Objects are copied between the survivor spaces until they become old enough to be tenured (copied to the tenured generation).

Major Collection

The collection of the older generation (tenured) objects. Typically, a Major Collection is slower than a Minor Collection because it involves all live objects.

The first step in tuning is to observe the frequency of GC by using the following command-line option:

> java -verbose:gc classname

This command produces output similar to that shown in Example 13-1.

Example 13-1 GC Frequency

> [GC 866K->764K(1984K), 0.0037943 secs]
> [GC 1796K->1568K(2112K), 0.0068823 secs]
> [Full GC 2080K->1846K(3136K), 0.0461094 secs]
> [GC 2047K->1955K(3136K), 0.0157263 secs]

For more information on parallel GC, see Section 13.10.1.

13.9.1 Modifying the Default GC Behavior

The following knobs (available within Java 1.4.2) change the default GC behavior by modifying the heap and generation size.

-Xms and -Xmx

The total size of the heap is bounded by the -Xms and -Xmx values. -Xms is the minimum size of the heap, and -Xmx is the maximum size to which the heap can grow. Having a larger heap reduces the frequency of collections. Increase the heap size as the number of processors increases, since allocation can be done in parallel.

-XX:NewSize and -XX:MaxNewSize

These options are specific to Sun Microsystems' HotSpot VM. The young generation size is bounded by these values. A smaller young generation, which means a faster rate of Minor Collections and a lower frequency of Major Collections, is best suited for Web applications.

By changing these parameters, you can change the frequency of collections as desired by the application.

13.9.1.1 Improving GC Performance

Other knobs that improve GC performance include:

-XX:NewRatio

NewRatio controls the young generation size. For example, setting -XX:NewRatio=3 means that the ratio between the young and tenured generations is 1:3.

-XX:SurvivorRatio

Use the SurvivorRatio parameter to tune the size of the survivor spaces. For example, -XX:SurvivorRatio=6 sets the ratio between each survivor space and Eden at 1:6. In other words, each survivor space is one-eighth of the young generation (not one-seventh, because there are two survivor spaces).

-XX:+UseParallelGC

The throughput collector is a generational collector similar to the default collector but with multiple threads that perform minor collection. Enable the throughput collector using the command-line flag -XX:+UseParallelGC.

-XX:ParallelGCThreads

Use the ParallelGCThreads command-line option to control the number of garbage collector threads.

-Xss

This is the size of the stack on a per-thread basis. Its default value varies from platform to platform. If the number of threads running in the application is high, then you can decrease the default size. If, for example, the threads require a large stack for parsing operations and recursive calls, then increasing the stack size can provide a significant performance increase.

Tune the values of these options according to the application type. Table 13-1 describes a typical setup for an E420/Solaris box with four 450 MHz processors and 4 GB of RAM to support 2000 concurrent users. A sample command line using these values follows the table.

Table 13-1 Typical Setup for the E420/Solaris Box

Option                                                Recommended Value
-Xms                                                  1024m
-Xmx                                                  1536m
-XX:NewRatio                                          2
-XX:SurvivorRatio                                     16
-XX:ParallelGCThreads (use with -XX:+UseParallelGC)   4
-Xss                                                  256k
-XX:+UseLWPSynchronization                            Use this thread model.
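On the command line, the Table 13-1 values translate into JVM options such as the following. This is a sketch; MyApplication is a placeholder for the actual class being launched:

> java -Xms1024m -Xmx1536m -XX:NewRatio=2 -XX:SurvivorRatio=16 \
       -XX:+UseParallelGC -XX:ParallelGCThreads=4 -Xss256k \
       -XX:+UseLWPSynchronization MyApplication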


13.9.2 Modifying JVM Parameters

Modify these Java attributes in the Java Options field, located in the Command Line Options section of the Oracle Enterprise Manager Server Properties page (Figure 13-5). To access this page, first select the OC4J_Wireless process on the Home page and then select Administration. Next, select Server Properties (located under Instance Properties). The Server Properties page appears.

Figure 13-5 Editing the Java Options


13.10 Optimizing the OC4J_Wireless Server Instance

The JVM heap-tuning options -Xms512m and -Xmx1024m increase the maximum memory size to 1 GB (or more) and enable the OC4J_Wireless server instance to support a large number of concurrent users.

To support higher hit rates, increase the MaxClients parameter in httpd.conf. For example, setting the MaxClients parameter to 1024 in httpd.conf allows up to 1024 concurrent HTTP requests. Consequently, increased requests result in an increased number of Oracle Application Server threads in the OC4J_Wireless server instance. To support large numbers of Oracle Application Server threads in the OC4J_Wireless server instance, reduce the thread stack size to 256k using the JVM option, -Xss256k. The default thread stack size in the Solaris environment is 512k.

For more information on the JVM options, see Section 13.9.

13.10.1 Enabling Parallel Garbage Collection

For OC4J_Wireless server instances running on multi-CPU machines, set the JVM options to enable the Parallel Garbage Collection (GC) algorithm in JDK 1.4. Set the ParallelGCThreads parameter to the number of CPUs in the host. The following JVM options for JDK 1.4 increase the performance of the OC4J_Wireless Server instance in 4-CPU Solaris machines:

-XX:+UseParallelGC -XX:ParallelGCThreads=4

The following GC tuning parameters provide improved performance for the OC4J_Wireless server instance:

-XX:NewRatio=2 -XX:SurvivorRatio=16
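Combining the settings in this section with those in Section 13.10, the Java Options field described in Section 13.9.2 might contain a string such as the following for a 4-CPU Solaris host. This is a sketch; keep any options that are already present in the field:

-Xms512m -Xmx1024m -Xss256k -XX:+UseParallelGC -XX:ParallelGCThreads=4 -XX:NewRatio=2 -XX:SurvivorRatio=16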

13.11 Tuning the Performance of the Operating System

This section describes tuning methods for the operating system's performance of Oracle Application Server Wireless.

13.11.1 TCP/IP Tuning

Correctly tuned TCP/IP settings improve performance. The indicators for changing default parameters are primarily TCP connection drops while making the three-way handshake, and the system refusing connections at a certain load.


Note:

The information in this section applies only to Solaris.

Use the following UNIX command to check for TCP connection drops:

netstat -s | grep Drop

Note the following values:

tcpListenDrop, tcpListenDropQ0, tcpHalfOpenDrop

Any value other than zero suggests the need to increase the TCP connection queue size. A non-zero tcpListenDrop value indicates a bottleneck in executing the accept() call, while a non-zero tcpListenDropQ0 value indicates a SYN flood or denial-of-service attack.

Use the following UNIX command to check if connections should be replenished more quickly:

netstat | grep TIME_WAIT | wc -l

Note the number of connections in the TIME_WAIT state. If the rate of establishing connections (load) is known, then you can compute the time taken to run out of connections. To ensure that new connections are readily available, you can decrease the tcp_time_wait_interval to a low value of 10000 ms.

You can set most of these values using the UNIX command ndd. For example:

> ndd -set /dev/tcp tcp_time_wait_interval 10000

These parameters (described in Table 13-2) take effect after the application is restarted. Add them to the system startup file so that they are not lost after a reboot; a sketch of the startup-script entries follows Table 13-2.

You must change tcp_conn_hash_size in the file /etc/system; this change takes effect only after a reboot.

Table 13-2 Operating System Performance Parameters

Parameter                                  Setting   Comment
tcp_time_wait_interval                     10000     The timeout for disposing of closed-connection information. This makes new connections readily available.
tcp_xmit_hiwat                             65536     The size of the TCP transfer window for sending and receiving data, which determines how much data can be sent without waiting for an acknowledgment. This can speed up large data transfers significantly.
tcp_conn_req_max_q, tcp_conn_req_max_q0    10240     The sizes of the complete and incomplete connection queues. Generally, the default values are sufficient; however, increasing these values to 10240 is recommended if connection drops are observed.
tcp_slow_start_initial                     4         This setting controls the initial data transmission rate. Changing it is important to work around bugs in some operating systems' implementations of the slow start algorithm.
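A sketch of the corresponding startup-script entries, run as root and using the settings from Table 13-2:

# Solaris TCP settings from Table 13-2
ndd -set /dev/tcp tcp_time_wait_interval 10000
ndd -set /dev/tcp tcp_xmit_hiwat 65536
ndd -set /dev/tcp tcp_conn_req_max_q 10240
ndd -set /dev/tcp tcp_conn_req_max_q0 10240
ndd -set /dev/tcp tcp_slow_start_initial 4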


Solaris Kernel Recommendations

To enhance performance, change the Solaris Kernel performance parameters (described in Table 13-3) in the file /etc/system. An example /etc/system excerpt follows the table.

Table 13-3 Solaris Kernel Performance Parameters

Parameter                 Value    Comment
rlim_fd_max               8192     The hard limit on the number of file descriptors.
rlim_fd_cur               2048     The soft limit on the number of file descriptors.
lwp_default_stksize       0x4000   The LWP stack size.
rpcmod:svc_run_stksize    0x4000   The NFS stack size.
sq_max_size               1600     Increasing sq_max_size increases the number of message blocks (mblk) that can be in any given syncq. Add 25 to this value for every 64 MB of RAM; as a result, the value for 4 GB is 1600.
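In /etc/system, the Table 13-3 settings take the following form. This is a sketch; a reboot is required for the changes to take effect:

* Solaris kernel settings from Table 13-3
set rlim_fd_max=8192
set rlim_fd_cur=2048
set lwp_default_stksize=0x4000
set rpcmod:svc_run_stksize=0x4000
set sq_max_size=1600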