Oracle® Application Server Globalization Support Guide
10g Release 2 (10.1.2) B14004-02
A centralized Unicode database is a feature of both the monolingual approach and the multilingual approach to developing global Internet applications. Using a centralized database has the following advantages:
It provides a complete view of your data. For example, you can query for the number of customers worldwide or the worldwide inventory level of a product.
It is easier to manage a centralized database than several distributed databases.
The database character set should be Unicode so that you can store and manipulate data in several languages. Unicode is a universal character set that defines the characters of almost all languages in the world. Oracle databases can store Unicode data either in the UTF-8 encoding form, as the database character set, or in the UTF-16 encoding form, as the national character set for the SQL NCHAR data types.
There are several methods by which Internet applications can access the database server through Oracle Application Server. Any Java-based Internet applications that use technologies such as Java Servlets, JSPs, and EJBs can use the Oracle JDBC drivers for database connectivity.
Because Java strings are always Unicode-encoded, JDBC transparently converts text data from the database character set to and from Unicode. Java Servlets and JSPs that interact with an Oracle database should ensure the following:
The Java strings returned from the database are converted to the encoding of the HTML page being constructed
Form inputs are converted from the encoding of the HTML form to Unicode before being used in calling the JDBC driver
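The second step can be sketched in pure Java, without the Servlet API. This is a minimal illustration, not servlet code from this guide: the class and method names are hypothetical, and it assumes a container that decoded UTF-8 form bytes with its ISO-8859-1 default, so the original bytes must be recovered and re-decoded before use.

```java
import java.nio.charset.StandardCharsets;

public class FormDecode {
    // Recover the raw bytes of a parameter that the container decoded with
    // ISO-8859-1 (a lossless byte-to-char mapping) and re-decode them as
    // UTF-8, the encoding of the HTML form that submitted them.
    static String reDecode(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulate a UTF-8 form value that was decoded as ISO-8859-1
        byte[] utf8Bytes = "Müller".getBytes(StandardCharsets.UTF_8);
        String misdecoded = new String(utf8Bytes, StandardCharsets.ISO_8859_1);
        System.out.println(reDecode(misdecoded)); // prints "Müller"
    }
}
```

The resulting Unicode string can then be passed to the JDBC driver directly.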
For non-Java Internet applications that use programming technologies such as Perl, PL/SQL, and C/C++, text data retrieved from or inserted into a database is encoded in the character set specified by the NLS_LANG parameter. The character set used for the POSIX locale should match the NLS_LANG character set so that data from the database can be processed directly with the POSIX locale-sensitive functions in the applications.
For multilingual applications, the NLS_LANG character set and the page encoding should both be UTF-8 to avoid character set conversion and possible data loss.
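The data loss mentioned above is easy to demonstrate. The following standalone sketch (not taken from this guide) round-trips a multilingual string through a single-byte character set and through UTF-8:

```java
import java.nio.charset.StandardCharsets;

public class LossyConversion {
    public static void main(String[] args) {
        String multilingual = "Grüße 日本語";

        // Encoding through a single-byte character set silently replaces
        // every character it cannot represent with '?'.
        String viaLatin1 = new String(
                multilingual.getBytes(StandardCharsets.ISO_8859_1),
                StandardCharsets.ISO_8859_1);

        // UTF-8 can represent every Unicode character, so the round trip
        // is lossless.
        String viaUtf8 = new String(
                multilingual.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.UTF_8);

        System.out.println(viaLatin1);                    // Japanese characters become '?'
        System.out.println(viaUtf8.equals(multilingual)); // true
    }
}
```

This is why a multilingual deployment should keep UTF-8 end to end rather than converting between a page encoding and a different NLS_LANG character set.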
Use the Oracle JDBC drivers provided in Oracle Application Server for Oracle database access when you use JSPs and Java Servlets. Oracle Application Server provides two client-side JDBC drivers that you can deploy with middle-tier applications:
JDBC OCI driver, which requires the Oracle client library
JDBC Thin driver, which is a pure Java driver
Oracle JDBC drivers transparently convert character data between the database character set and Unicode for both the SQL CHAR data types and the SQL NCHAR data types. As a result of this transparent conversion, JSPs and Java Servlets that call Oracle JDBC drivers can bind and define database columns with Java strings and fetch data into Java strings from the result set of a SQL execution.
For example, you can use Java strings to bind the NAME and ADDRESS columns of a customer table, where the columns are defined as the VARCHAR2 and NVARCHAR2 data types, respectively:
String cname = request.getParameter("cname");
String caddr = request.getParameter("caddress");
OraclePreparedStatement pstmt =
    (OraclePreparedStatement) conn.prepareStatement(
        "insert into CUSTOMERS (NAME, ADDRESS) values (?, ?)");
pstmt.setString(1, cname);
pstmt.setFormOfUse(2, OraclePreparedStatement.FORM_NCHAR);
pstmt.setString(2, caddr);
pstmt.execute();
To bind a Java string variable to the ADDRESS column, which is defined as NVARCHAR2, call the setFormOfUse() method before the setString() method.
The Oracle JDBC drivers set the NLS_LANGUAGE and NLS_TERRITORY session parameters to the values that correspond to the default Java locale when the database session is initialized. For monolingual applications, the default Java locale is configured to match the user's locale, so the database connection is always synchronized with the user's locale.
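Because the drivers read the default Java locale at connect time, a monolingual middle tier typically sets that locale once to match its user community. The following minimal sketch (class name and the choice of Locale.GERMANY are illustrative, not from this guide) shows the locale the drivers would pick up:

```java
import java.util.Locale;

public class LocaleDemo {
    public static void main(String[] args) {
        // Set the JVM-wide default locale before any database session is
        // created; the Oracle JDBC drivers derive NLS_LANGUAGE and
        // NLS_TERRITORY from this value at connect time.
        Locale.setDefault(Locale.GERMANY);

        Locale l = Locale.getDefault();
        System.out.println(l.getLanguage() + "/" + l.getCountry()); // de/DE
    }
}
```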
See Also: Oracle Database JDBC Developer's Guide and Reference in the Oracle Database Documentation Library
PL/SQL procedures and PSPs use SQL to access data in the local Oracle database. They can also use SQL and database links to access data from a remote Oracle database.
For example, you can call the following PL/SQL procedure from the mod_plsql module. It inserts a record into a customer table with the customer name column defined as VARCHAR2 and the customer address column defined as NVARCHAR2:
procedure addcustomer(
   cname    varchar2 default NULL,
   caddress varchar2 default NULL) is
   caddr customers.address%TYPE;
begin
   ...
   if (cname is not null) then
      caddr := TO_NCHAR(caddress);
      insert into customers (name, address) values (cname, caddr);
      commit;
   end if;
end;
Note that Apache mod_plsql does not support NVARCHAR argument passing. As a result, PL/SQL procedures must use VARCHAR2 for their arguments and convert them to NVARCHAR explicitly before executing the INSERT statement.
The example uses static SQL to access the customer table. You can also use the DBMS_SQL PL/SQL package to access data in the database using dynamic SQL.
See Also: PL/SQL Packages and Types Reference 10g Release 1 (10.1) in the Oracle Database Documentation Library
Perl scripts access Oracle databases using the DBI/DBD driver for Oracle, which is part of Oracle Application Server. The driver calls the Oracle Call Interface (OCI) to access the databases. The data retrieved from or inserted into the databases is encoded in the NLS_LANG character set. Perl scripts should do the following:
Initialize a POSIX locale with the locale specified in the LC_ALL environment variable
Use a character set equivalent to the NLS_LANG character set
This allows you to process data retrieved from the databases with POSIX string manipulation functions.
The following code shows how to insert a row into a customer table in an Oracle database through the DBI/DBD driver:
use Apache::DBI;
...
# Connect to the database
$constr = 'host=dlsun1304.us.oracle.com;sid=icachedb;port=1521';
$usr = 'system';
$pwd = 'manager';
$dbh = DBI->connect("dbi:Oracle:$constr", $usr, $pwd, {AutoCommit=>1})
    || $r->print("Failed to connect to Oracle: " . DBI->errstr);

# Prepare and execute the statement
$sql = 'insert into customers (name, address) values (:n, :a)';
$sth = $dbh->prepare($sql);
$sth->bind_param(':n', $cname);
$sth->bind_param(':a', $caddress);
$sth->execute();
$sth->finish();
$dbh->disconnect();
If the target columns are of the SQL NCHAR data types, then you must specify the form-of-use flag for each bind variable. For example, if the address column is of the NVARCHAR2 datatype, then add the $sth->func() call before executing the SQL statement:
use DBD::Oracle qw(:ora_forms);
...
$sql = 'insert into customers (name, address) values (:n, :a)';
$sth = $dbh->prepare($sql);
$sth->bind_param(':n', $cname);
$sth->bind_param(':a', $caddress);
$sth->func({ ':a' => ORA_NCHAR }, 'set_form');
$sth->execute();
$sth->finish();
$dbh->disconnect();
To properly process UTF-8 data in a multilingual application, Perl scripts should do the following:
Use a POSIX locale associated with the UTF-8 character set
Use the utf8 Perl pragma to indicate that all strings in the Perl scripts are in UTF-8
C/C++ applications access Oracle databases with OCI or Pro*C/C++. You can call OCI directly or use the Pro*C/C++ interface to retrieve and store Unicode data in a UTF-8 database and in the SQL NCHAR data types.
Generally, data retrieved from and inserted into the database is encoded in the NLS_LANG character set. C/C++ programs should use the same character set for their POSIX locale as the NLS_LANG character set. Otherwise, the POSIX string functions cannot be used on the character data retrieved from the database, and the character data encoded in the POSIX locale may be corrupted when it is inserted into the database.
For multilingual applications, you may want to use the Unicode API provided in the OCI library instead of relying on the NLS_LANG character set. This alternative is good for applications written for platforms such as Microsoft Windows, which implement the wchar_t datatype using UTF-16 Unicode. Using the Unicode API on those platforms bypasses some unnecessary data conversions required by the regular OCI API.
This section includes the following topics:
Using the Unicode API Provided with OCI to Access the Database
Using Unicode Bind and Define in Pro*C/C++ to Access the Database
Note: OCI libraries are part of Oracle Application Server. You do not need to install the Oracle database client to use them.
The following example shows how to bind and define the VARCHAR2 and NVARCHAR2 columns of a customer table in C/C++. It uses OCI and is based on the NLS_LANG character set. Note that the text datatype is a macro for unsigned char.
text *sqlstmt = (text *)"SELECT name, address FROM customers WHERE id = :custid";
text cname[100];          /* Customer Name */
text caddr[200];          /* Customer Address */
text custid[10] = "9876"; /* Customer ID */
sb2  insname_ind = 0;     /* Indicator variable */
ub1  cform = SQLCS_NCHAR; /* Form of Use for NCHAR types */
...
OCIStmtPrepare(stmthp, errhp, sqlstmt,
    (ub4)strlen((char *)sqlstmt), (ub4)OCI_NTV_SYNTAX, (ub4)OCI_DEFAULT);

/* Bind the custid buffer */
OCIBindByName(stmthp, &bnd1p, errhp, (text *)":custid",
    (sb4)strlen((char *)":custid"), (dvoid *)custid, sizeof(custid),
    SQLT_STR, (dvoid *)&insname_ind, (ub2 *)0, (ub2 *)0,
    (ub4)0, (ub4 *)0, OCI_DEFAULT);

/* Define the cname buffer for the name column in VARCHAR2 */
OCIDefineByPos(stmthp, &dfn1p, errhp, (ub4)1, (dvoid *)cname,
    (sb4)sizeof(cname), SQLT_STR, (dvoid *)0, (ub2 *)0, (ub2 *)0,
    (ub4)OCI_DEFAULT);

/* Define the caddr buffer for the address column in NVARCHAR2 */
OCIDefineByPos(stmthp, &dfn2p, errhp, (ub4)2, (dvoid *)caddr,
    (sb4)sizeof(caddr), SQLT_STR, (dvoid *)0, (ub2 *)0, (ub2 *)0,
    (ub4)OCI_DEFAULT);
OCIAttrSet((void *)dfn2p, (ub4)OCI_HTYPE_DEFINE, (void *)&cform,
    (ub4)0, (ub4)OCI_ATTR_CHARSET_FORM, errhp);
...
You can use the Unicode API that the OCI library provides for multilingual applications.
Turn on the Unicode API by specifying Unicode mode when you create an OCI environment handle. Any handle inherited from the OCI environment handle is set to Unicode mode automatically. In Unicode mode, all text data arguments to the OCI functions are assumed to be in the Unicode text datatype (utext *) and in UTF-16 encoding. For binding and defining, the data buffers are assumed to be utext buffers in UTF-16 encoding.
The program code for the Unicode API is similar to the code for the non-Unicode OCI API, with the following exceptions:
All text data types are changed to the utext datatype, which is a macro for the unsigned short datatype
All literal strings are changed to Unicode literal strings
All strlen() functions are changed to wcslen() functions to calculate the string length as a number of UTF-16 code units instead of bytes
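The distinction between UTF-16 code units, characters, and bytes matters on both tiers: Java strings measure length the same way the OCI Unicode API does. The following standalone sketch (not from this guide) contrasts the three counts for one string containing a supplementary character:

```java
import java.nio.charset.StandardCharsets;

public class UnicodeLengths {
    public static void main(String[] args) {
        // An ASCII letter, an accented letter, and a supplementary
        // character (outside the Basic Multilingual Plane).
        String s = "aé\uD83D\uDE00"; // "a", "é", and an emoji

        // UTF-16 code units, like wcslen() on a utext buffer: 4
        System.out.println(s.length());
        // Unicode characters (code points): 3
        System.out.println(s.codePointCount(0, s.length()));
        // Bytes in UTF-8, like strlen() on a UTF-8 buffer: 7
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

Buffer sizes passed to bind and define calls must therefore use the unit the active mode expects, never a byte count computed for a different encoding.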
The following Microsoft Windows program shows how to do these tasks:
Create an OCI environment handle with Unicode mode turned on
Bind and define the name column in VARCHAR2 and the address column in NVARCHAR2 of the customers table
utext *sqlstmt = (utext *)L"SELECT name, address FROM customers WHERE id = :custid";
utext cname[100];           /* Customer Name */
utext caddr[200];           /* Customer Address */
utext custid[10] = L"9876"; /* Customer ID */
sb2   insname_ind = 0;      /* Indicator variable */
ub1   cform = SQLCS_NCHAR;  /* Form of Use for NCHAR types */
...
/* Use the Unicode OCI API by specifying UTF-16 mode */
status = OCIEnvCreate((OCIEnv **)&envhp, OCI_UTF16, (void *)0,
    (void *(*)()) 0, (void *(*)()) 0, (void (*)()) 0, (size_t)0, (void **)0);
...
OCIStmtPrepare(stmthp, errhp, (text *)sqlstmt,
    (ub4)wcslen((wchar_t *)sqlstmt), (ub4)OCI_NTV_SYNTAX, (ub4)OCI_DEFAULT);

/* Bind the custid buffer */
OCIBindByName(stmthp, &bnd1p, errhp, (text *)L":custid",
    (sb4)wcslen(L":custid"), (dvoid *)custid, sizeof(custid),
    SQLT_STR, (dvoid *)&insname_ind, (ub2 *)0, (ub2 *)0,
    (ub4)0, (ub4 *)0, OCI_DEFAULT);

/* Define the cname buffer for the name column in VARCHAR2 */
OCIDefineByPos(stmthp, &dfn1p, errhp, (ub4)1, (dvoid *)cname,
    (sb4)sizeof(cname), SQLT_STR, (dvoid *)0, (ub2 *)0, (ub2 *)0,
    (ub4)OCI_DEFAULT);

/* Define the caddr buffer for the address column in NVARCHAR2 */
OCIDefineByPos(stmthp, &dfn2p, errhp, (ub4)2, (dvoid *)caddr,
    (sb4)sizeof(caddr), SQLT_STR, (dvoid *)0, (ub2 *)0, (ub2 *)0,
    (ub4)OCI_DEFAULT);
OCIAttrSet((void *)dfn2p, (ub4)OCI_HTYPE_DEFINE, (void *)&cform,
    (ub4)0, (ub4)OCI_ATTR_CHARSET_FORM, errhp);
...
You can use Unicode bind and define in Pro*C/C++ for multilingual applications.
Pro*C/C++ lets you specify UTF-16 Unicode buffers for bind and define operations. There are two ways to specify UTF-16 buffers in Pro*C/C++:
Use the utext datatype, which is an alias for the unsigned short datatype in C/C++
Use the uvarchar datatype provided by Pro*C/C++, which is preprocessed to a struct with a length field and a utext buffer field:
struct uvarchar {
   ub2   len;     /* length of arr */
   utext arr[1];  /* UTF-16 buffer */
};
typedef struct uvarchar uvarchar;
In the following example, there are two host variables: cname and caddr. The cname host variable is declared as a utext buffer containing 100 UTF-16 code units (unsigned short) for the customer name column, which is of the VARCHAR2 datatype. The caddr host variable is declared as a uvarchar buffer containing 200 UTF-16 code units for the customer address column, which is of the NVARCHAR2 datatype. The len and arr fields are accessible as fields of a struct.
#include <sqlca.h>
#include <sqlucs2.h>

main()
{
   ...
   /* Change to the STRING datatype: */
   EXEC ORACLE OPTION (CHAR_MAP=STRING);
   utext    cname[100];  /* unsigned short type */
   uvarchar caddr[200];  /* Pro*C/C++ uvarchar type */
   ...
   EXEC SQL SELECT name, address INTO :cname, :caddr FROM customers;
   /* cname is NULL-terminated */
   wprintf(L"ENAME = %s, ADDRESS = %.*s\n", cname, caddr.len, caddr.arr);
   ...
}