OracleAS TopLink Writing Optimization Features

Table 10-3 lists the OracleAS TopLink write optimization features.

Table 10-3 Write Optimization Features

Feature	Effect on Performance
Unit of Work	Improves performance by updating only the changed fields and objects. Minimizes the amount of tracking required (which can be expensive) by registering only those objects that will change. Note: The Unit of Work supports marking classes as read-only, thus avoiding tracking of objects that do not change.
Parameterized SQL	Improves performance for frequently executed SQL statements.
Batch writing	Allows you to group all insert, update, and delete commands from a transaction into a single database call. This dramatically reduces the number of calls to the database.
Sequence number preallocation	Dramatically improves insert performance.
Does exist alternatives	The does exist call on write object can be avoided in certain situations by checking the cache for does exist, or assuming the existence of the object.

Writing Case 1: Batch Writes

The most common write performance problem occurs when a batch job inserts a large volume of data into the database. For example, consider a batch job that loads a large amount of data from one database and then migrates the data into another. The objects involved:

Are simple individual objects with no relationships
Use generated sequence numbers as their primary key
Have an address that also uses a sequence number

The batch job loads 10,000 employees from the first database and inserts them into the target database. With no optimization, the batch job reads all the records from the source database, acquires a Unit of Work from the target database, registers all objects, and commits the Unit of Work.

Example 10-14 No Optimization

/* Read all the employees, acquire a Unit of Work and register them. */

// Read all the employees from the database. This requires 1 SQL call, but will be very memory intensive as 10,000 objects will be read.
Vector employees = sourceSession.readAllObjects(Employee.class);

//SQL: Select * from Employee

// Acquire a Unit of Work and register the employees.
UnitOfWork uow = targetSession.acquireUnitOfWork();
uow.registerAllObjects(employees);
uow.commit();

//SQL: Begin transaction
//SQL: Update Sequence set count = count + 1 where name = 'EMP'
//SQL: Select count from Sequence
//SQL: ... repeat this 10,000 times + 10,000 times for the addresses ...
//SQL: Commit transaction
//SQL: Begin transaction
//SQL: Insert into Address (...) values (...)
//SQL: ... repeat this 10,000 times
//SQL: Insert into Employee (...) values (...)
//SQL: ... repeat this 10,000 times
//SQL: Commit transaction}

This batch job performs poorly, because it requires 60,000 SQL executions. It also reads huge amounts of data into memory, which can raise memory performance issues. OracleAS TopLink offers several optimization features to improve the performance of this batch job.

To improve this operation:

Leverage OracleAS TopLink batch reads and cursor support.
Implement sequence number preallocation.
Use batch writing to write to the database.

If your database does not support batch writing, use parameterized SQL to implement the write query.
Implement multiprocessing.

Cursors and Batch Writes

To optimize the query in Example 10-14, use a cursored stream to read the employees from the source database. You can also employ a cache identity map, rather than a full identity map, in both the source and target databases.

To address the potential for memory problems, use the releasePrevious() method after each read to stream the cursor in groups of 100. Register each batch of 100 employees in a new Unit of Work and commit them.

Although this procedure does not reduce the amount of executed SQL, it does address potential out-of-memory issues. When your system runs out of memory, the result is performance degradation that increases over time, and excessive disk activity caused by memory swapping on disk.

Sequence Number Preallocation

SQL select calls are more resource-intensive than SQL modify calls, so you can realize large performance gains by reducing the number of select calls you issue. The code in Example 10-14 uses the select calls to acquire sequence numbers. You can substantially improve performance if you use sequence number preallocation.

In OracleAS TopLink, you can configure the sequence preallocation size on the login object (the default size is 50). Example 10-14 uses a preallocation size of 1 to demonstrate this point. If you stream the data in batches of 100 as suggested in "Cursors and Batch Writes", then set the sequence preallocation size to 100. Because employees and addresses in the example both use sequence numbering, you further improve performance by letting them share the same sequence. If you set the preallocation size to 200, the number of SQL executions is reduced from 60,000 to 20,200.

Batch Writing

Batch writing enables you to combine a group of SQL statements into a single statement, and send it to the database as a single database execution. This feature reduces the communication time between the application and the server, and substantially improves performance.

You can enable batch writing on the login object with the useBatchWriting() method. If you add batch writing to Example 10-14, you execute each batch of 100 employees as a single SQL execution. The number of SQL executions is reduced from 20,200 to 300.

Parameterized SQL

OracleAS TopLink supports parameterized SQL and prepared statement caching. Using parameterized SQL improves write performance, because it avoids the prepare cost of a SQL execution.

You cannot use batch writing and parameterized SQL together, because batch writing does not use individual statements. The performance benefits of batch writing are much greater than those of parameterized SQL, so use batch writing if it is supported by your database.

Parameterized SQL avoids the prepare component of SQL execution, but does not reduce the number of executions. Because of this, it normally offers only moderate performance gains. However, if your database does not support batch writing, parameterized SQL improves performance. If you add parameterized SQL in Example 10-14, you must still execute 20,200 SQL executions, but parameterized SQL reduces the number of SQL PREPAREs to 4.

Multiprocessing

You can use multiple processes or multiple systems to split the batch job into several smaller jobs. In this example, splitting the batch job across threads enables you to synchronize reads from the cursored stream, and use parallel Units of Work on a single system.

This leads to a performance increase, even if the system has only a single processor, because it takes advantage of the wait times inherent in SQL execution. While one thread waits for a response from the server, another thread uses the waiting cycles to process its own database operation.

Example 10-15 illustrates the optimized code for splitting the batch job across threads. Note that it does not illustrate multiprocessing.

Example 10-15 Fully Optimized

/* Read each batch of employees, acquire a Unit of Work and register them. */
targetSession.getLogin().useBatchWriting();
targetSession.getLogin().setSequencePreallocationSize(200);

// Read all the employees from the database, into a stream. This requires 1 SQL call, but none of the rows will be fetched.
ReadAllQuery query = new ReadAllQuery();
query.setReferenceClass(Employee.class);
query.useCursoredStream();
CursoredStream stream;
stream = (CursoredStream) sourceSession.executeQuery(query);
//SQL: Select * from Employee. Process each batch
while (! stream.atEnd()) {
    Vector employees = stream.read(100);
    // Acquire a Unit of Work to register the employees
    UnitOfWork uow = targetSession.acquireUnitOfWork();
    uow.registerAllObjects(employees);
    uow.commit();
}
//SQL: Begin transaction
//SQL: Update Sequence set count = count + 200 where name = 'SEQ'
//SQL: Select count from Sequence where name = 'SEQ'
//SQL: Commit transaction
//SQL: Begin transaction
//BEGIN BATCH SQL: Insert into Address (...) values (...)
//... repeat this 100 times
//Insert into Employee (...) values (...)
//... repeat this 100 times
//END BATCH SQL:
//SQL: Commit transactionJava optimization