Oracle® Application Server Globalization Support Guide
10g Release 2 (10.1.2) B14004-02
A centralized Unicode database is a feature of both the monolingual approach and the multilingual approach to developing global Internet applications. Using a centralized database has the following advantages:
It provides a complete view of your data. For example, you can query for the number of customers worldwide or the worldwide inventory level of a product.
It is easier to manage a centralized database than several distributed databases.
The database character set should be Unicode so that you can store and manipulate data in several languages. Unicode is a universal character set that defines the characters of almost all languages in the world. Oracle databases can store Unicode data either in the UTF-8 encoding form, as the database character set, or in the UTF-16 encoding form, as the national character set for the SQL NCHAR data types.
There are several methods by which Internet applications can access the database server through Oracle Application Server. Any Java-based Internet applications that use technologies such as Java Servlets, JSPs, and EJBs can use the Oracle JDBC drivers for database connectivity.
Because Java strings are always Unicode-encoded, JDBC transparently converts text data from the database character set to and from Unicode. Java Servlets and JSPs that interact with an Oracle database should ensure the following:
The Java strings returned from the database are converted to the encoding of the HTML page being constructed
Form inputs are converted from the encoding of the HTML form to Unicode before being used in calling the JDBC driver
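The second step can be sketched in pure Java, without the Servlet API. This is a minimal illustration, not servlet code from this guide: the class and method names are hypothetical, and it assumes a container that decoded UTF-8 form bytes with its ISO-8859-1 default, so the original bytes must be recovered and re-decoded before use.

```java
import java.nio.charset.StandardCharsets;

public class FormDecode {
    // Recover the raw bytes of a parameter that the container decoded with
    // ISO-8859-1 (a lossless byte-to-char mapping) and re-decode them as
    // UTF-8, the encoding of the HTML form that submitted them.
    static String reDecode(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulate a UTF-8 form value that was decoded as ISO-8859-1
        byte[] utf8Bytes = "Müller".getBytes(StandardCharsets.UTF_8);
        String misdecoded = new String(utf8Bytes, StandardCharsets.ISO_8859_1);
        System.out.println(reDecode(misdecoded)); // prints "Müller"
    }
}
```

The resulting Unicode string can then be passed to the JDBC driver directly.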
For non-Java Internet applications that use programming technologies such as Perl, PL/SQL, and C/C++, text data retrieved from or inserted into a database is encoded in the character set specified by the NLS_LANG parameter. The character set used for the POSIX locale should match the NLS_LANG character set so that data from the database can be processed directly with the POSIX locale-sensitive functions in the applications.
For multilingual applications, the NLS_LANG character set and the page encoding should both be UTF-8 to avoid character set conversion and possible data loss.
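The data loss mentioned above is easy to demonstrate. The following standalone sketch (not taken from this guide) round-trips a multilingual string through a single-byte character set and through UTF-8:

```java
import java.nio.charset.StandardCharsets;

public class LossyConversion {
    public static void main(String[] args) {
        String multilingual = "Grüße 日本語";

        // Encoding through a single-byte character set silently replaces
        // every character it cannot represent with '?'.
        String viaLatin1 = new String(
                multilingual.getBytes(StandardCharsets.ISO_8859_1),
                StandardCharsets.ISO_8859_1);

        // UTF-8 can represent every Unicode character, so the round trip
        // is lossless.
        String viaUtf8 = new String(
                multilingual.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.UTF_8);

        System.out.println(viaLatin1);                    // Japanese characters become '?'
        System.out.println(viaUtf8.equals(multilingual)); // true
    }
}
```

This is why a multilingual deployment should keep UTF-8 end to end rather than converting between a page encoding and a different NLS_LANG character set.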
Use the Oracle JDBC drivers provided in Oracle Application Server for Oracle database access when you use JSPs and Java Servlets. Oracle Application Server provides two client-side JDBC drivers that you can deploy with middle-tier applications:
JDBC OCI driver, which requires the Oracle client library
JDBC Thin driver, which is a pure Java driver
Oracle JDBC drivers transparently convert character data between the database character set and Unicode for both the SQL CHAR data types and the SQL NCHAR data types. As a result of this transparent conversion, JSPs and Java Servlets that call Oracle JDBC drivers can bind and define database columns with Java strings and fetch data into Java strings from the result set of a SQL execution.
For example, you can use Java strings to bind the NAME and ADDRESS columns of a customer table, where the columns are defined as the VARCHAR2 and NVARCHAR2 data types, respectively:
String cname = request.getParameter("cname");
String caddr = request.getParameter("caddress");
OraclePreparedStatement pstmt =
    (OraclePreparedStatement) conn.prepareStatement(
        "insert into CUSTOMERS (NAME, ADDRESS) values (?, ?)");
pstmt.setString(1, cname);
pstmt.setFormOfUse(2, OraclePreparedStatement.FORM_NCHAR);
pstmt.setString(2, caddr);
pstmt.execute();
To bind a Java string variable to the ADDRESS column, which is defined as NVARCHAR2, call the setFormOfUse() method before the setString() method.
The Oracle JDBC drivers set the NLS_LANGUAGE and NLS_TERRITORY session parameters to the values that correspond to the default Java locale when the database session is initialized. For monolingual applications, the default Java locale is configured to match the user's locale, so the database connection is always synchronized with the user's locale.
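Because the drivers read the default Java locale at connect time, a monolingual middle tier typically sets that locale once to match its user community. The following minimal sketch (class name and the choice of Locale.GERMANY are illustrative, not from this guide) shows the locale the drivers would pick up:

```java
import java.util.Locale;

public class LocaleDemo {
    public static void main(String[] args) {
        // Set the JVM-wide default locale before any database session is
        // created; the Oracle JDBC drivers derive NLS_LANGUAGE and
        // NLS_TERRITORY from this value at connect time.
        Locale.setDefault(Locale.GERMANY);

        Locale l = Locale.getDefault();
        System.out.println(l.getLanguage() + "/" + l.getCountry()); // de/DE
    }
}
```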
See Also: Oracle Database JDBC Developer's Guide and Reference in the Oracle Database Documentation Library
PL/SQL procedures and PSPs use SQL to access data in the local Oracle database. They can also use SQL and database links to access data from a remote Oracle database.
For example, you can call the following PL/SQL procedure from the mod_plsql module. It inserts a record into a customer table with the customer name column defined as VARCHAR2 and the customer address column defined as NVARCHAR2:
procedure addcustomer(
   cname    varchar2 default NULL,
   caddress varchar2 default NULL) is
   caddr customers.address%TYPE;
begin
   ...
   if (cname is not null) then
      caddr := TO_NCHAR(caddress);
      insert into customers (name, address) values (cname, caddr);
      commit;
   end if;
end;
Note that Apache mod_plsql does not support NVARCHAR argument passing. As a result, PL/SQL procedures must use VARCHAR2 for their arguments and convert them to NVARCHAR explicitly before executing the INSERT statement.
The example uses static SQL to access the customer table. You can also use the DBMS_SQL PL/SQL package to access data in the database using dynamic SQL.
See Also: PL/SQL Packages and Types Reference 10g Release 1 (10.1) in the Oracle Database Documentation Library
Perl scripts access Oracle databases using the DBI/DBD driver for Oracle, which is part of Oracle Application Server. The driver calls the Oracle Call Interface (OCI) to access the databases. The data retrieved from or inserted into the databases is encoded in the NLS_LANG character set. Perl scripts should do the following:
Initialize a POSIX locale with the locale specified in the LC_ALL environment variable
Use a character set equivalent to the NLS_LANG character set
This allows you to process data retrieved from the databases with POSIX string manipulation functions.
The following code shows how to insert a row into a customer table in an Oracle database through the DBI/DBD driver:
use Apache::DBI;
...
# Connect to the database
$constr = 'host=dlsun1304.us.oracle.com;sid=icachedb;port=1521';
$usr = 'system';
$pwd = 'manager';
$dbh = DBI->connect("dbi:Oracle:$constr", $usr, $pwd, {AutoCommit=>1})
    || $r->print("Failed to connect to Oracle: " . DBI->errstr);

# Prepare and execute the statement
$sql = 'insert into customers (name, address) values (:n, :a)';
$sth = $dbh->prepare($sql);
$sth->bind_param(':n', $cname);
$sth->bind_param(':a', $caddress);
$sth->execute();
$sth->finish();
$dbh->disconnect();
If the target columns are of the SQL NCHAR data types, then you must specify the form-of-use flag for each bind variable. For example, if the address column is of the NVARCHAR2 datatype, then add the $sth->func() call before executing the SQL statement:
use DBD::Oracle qw(:ora_forms);
...
$sql = 'insert into customers (name, address) values (:n, :a)';
$sth = $dbh->prepare($sql);
$sth->bind_param(':n', $cname);
$sth->bind_param(':a', $caddress);
$sth->func({ ':a' => ORA_NCHAR }, 'set_form');
$sth->execute();
$sth->finish();
$dbh->disconnect();
To properly process UTF-8 data in a multilingual application, Perl scripts should do the following:
Use a POSIX locale associated with the UTF-8 character set
Use the utf8 Perl pragma to indicate that all strings in the Perl scripts are in UTF-8
C/C++ applications access Oracle databases with OCI or Pro*C/C++. You can call OCI directly or use the Pro*C/C++ interface to retrieve and store Unicode data in a UTF-8 database and in the SQL NCHAR data types.
Generally, data retrieved from and inserted into the database is encoded in the NLS_LANG character set. C/C++ programs should use the same character set for their POSIX locale as the NLS_LANG character set. Otherwise, the POSIX string functions cannot be used on the character data retrieved from the database, and the character data encoded in the POSIX locale may be corrupted when it is inserted into the database.
For multilingual applications, you may want to use the Unicode API provided in the OCI library instead of relying on the NLS_LANG character set. This alternative is good for applications written for platforms such as Microsoft Windows, which implement the wchar_t datatype using UTF-16 Unicode. Using the Unicode API on those platforms bypasses some unnecessary data conversions required by the regular OCI API.
This section includes the following topics:
Using the Unicode API Provided with OCI to Access the Database
Using Unicode Bind and Define in Pro*C/C++ to Access the Database
Note: OCI libraries are part of Oracle Application Server. You do not need to install the Oracle database client to use them.
The following example shows how to bind and define the VARCHAR2 and NVARCHAR2 columns of a customer table in C/C++. It uses OCI and is based on the NLS_LANG character set. Note that the text datatype is a macro for unsigned char.
text *sqlstmt = (text *)"SELECT name, address FROM customers WHERE id = :custid";
text cname[100];          /* Customer Name */
text caddr[200];          /* Customer Address */
text custid[10] = "9876"; /* Customer ID */
sb2  insname_ind = 0;     /* Indicator variable */
ub1  cform = SQLCS_NCHAR; /* Form of Use for NCHAR types */
...
OCIStmtPrepare(stmthp, errhp, sqlstmt,
    (ub4)strlen((char *)sqlstmt), (ub4)OCI_NTV_SYNTAX, (ub4)OCI_DEFAULT);

/* Bind the custid buffer */
OCIBindByName(stmthp, &bnd1p, errhp, (text *)":custid",
    (sb4)strlen((char *)":custid"), (dvoid *)custid, sizeof(custid),
    SQLT_STR, (dvoid *)&insname_ind, (ub2 *)0, (ub2 *)0,
    (ub4)0, (ub4 *)0, OCI_DEFAULT);

/* Define the cname buffer for the name column in VARCHAR2 */
OCIDefineByPos(stmthp, &dfn1p, errhp, (ub4)1, (dvoid *)cname,
    (sb4)sizeof(cname), SQLT_STR, (dvoid *)0, (ub2 *)0, (ub2 *)0,
    (ub4)OCI_DEFAULT);

/* Define the caddr buffer for the address column in NVARCHAR2 */
OCIDefineByPos(stmthp, &dfn2p, errhp, (ub4)2, (dvoid *)caddr,
    (sb4)sizeof(caddr), SQLT_STR, (dvoid *)0, (ub2 *)0, (ub2 *)0,
    (ub4)OCI_DEFAULT);
OCIAttrSet((void *)dfn2p, (ub4)OCI_HTYPE_DEFINE, (void *)&cform,
    (ub4)0, (ub4)OCI_ATTR_CHARSET_FORM, errhp);
...
You can use the Unicode API that the OCI library provides for multilingual applications.
Turn on the Unicode API by specifying Unicode mode when you create an OCI environment handle. Any handle inherited from the OCI environment handle is set to Unicode mode automatically. In Unicode mode, all text data arguments to the OCI functions are assumed to be in the Unicode text datatype (utext *) and in UTF-16 encoding. For binding and defining, the data buffers are assumed to be utext buffers in UTF-16 encoding.
The program code for the Unicode API is similar to the code for the non-Unicode OCI API, with the following exceptions:
All text data types are changed to the utext datatype, which is a macro for the unsigned short datatype
All literal strings are changed to Unicode literal strings
All strlen() functions are changed to wcslen() functions to calculate the string length as a number of UTF-16 code units instead of bytes
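The distinction between UTF-16 code units, characters, and bytes matters on both tiers: Java strings measure length the same way the OCI Unicode API does. The following standalone sketch (not from this guide) contrasts the three counts for one string containing a supplementary character:

```java
import java.nio.charset.StandardCharsets;

public class UnicodeLengths {
    public static void main(String[] args) {
        // An ASCII letter, an accented letter, and a supplementary
        // character (outside the Basic Multilingual Plane).
        String s = "aé\uD83D\uDE00"; // "a", "é", and an emoji

        // UTF-16 code units, like wcslen() on a utext buffer: 4
        System.out.println(s.length());
        // Unicode characters (code points): 3
        System.out.println(s.codePointCount(0, s.length()));
        // Bytes in UTF-8, like strlen() on a UTF-8 buffer: 7
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

Buffer sizes passed to bind and define calls must therefore use the unit the active mode expects, never a byte count computed for a different encoding.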
The following Microsoft Windows program shows how to do these tasks:
Create an OCI environment handle with Unicode mode turned on
Bind and define the name column in VARCHAR2 and the address column in NVARCHAR2 of the customers table
utext *sqlstmt = (utext *)L"SELECT name, address FROM customers WHERE id = :custid";
utext cname[100];           /* Customer Name */
utext caddr[200];           /* Customer Address */
utext custid[10] = L"9876"; /* Customer ID */
sb2   insname_ind = 0;      /* Indicator variable */
ub1   cform = SQLCS_NCHAR;  /* Form of Use for NCHAR types */
...
/* Use the Unicode OCI API by specifying UTF-16 mode */
status = OCIEnvCreate((OCIEnv **)&envhp, OCI_UTF16, (void *)0,
    (void *(*)()) 0, (void *(*)()) 0, (void (*)()) 0, (size_t)0, (void **)0);
...
OCIStmtPrepare(stmthp, errhp, (text *)sqlstmt,
    (ub4)wcslen((wchar_t *)sqlstmt), (ub4)OCI_NTV_SYNTAX, (ub4)OCI_DEFAULT);

/* Bind the custid buffer */
OCIBindByName(stmthp, &bnd1p, errhp, (text *)L":custid",
    (sb4)wcslen(L":custid"), (dvoid *)custid, sizeof(custid),
    SQLT_STR, (dvoid *)&insname_ind, (ub2 *)0, (ub2 *)0,
    (ub4)0, (ub4 *)0, OCI_DEFAULT);

/* Define the cname buffer for the name column in VARCHAR2 */
OCIDefineByPos(stmthp, &dfn1p, errhp, (ub4)1, (dvoid *)cname,
    (sb4)sizeof(cname), SQLT_STR, (dvoid *)0, (ub2 *)0, (ub2 *)0,
    (ub4)OCI_DEFAULT);

/* Define the caddr buffer for the address column in NVARCHAR2 */
OCIDefineByPos(stmthp, &dfn2p, errhp, (ub4)2, (dvoid *)caddr,
    (sb4)sizeof(caddr), SQLT_STR, (dvoid *)0, (ub2 *)0, (ub2 *)0,
    (ub4)OCI_DEFAULT);
OCIAttrSet((void *)dfn2p, (ub4)OCI_HTYPE_DEFINE, (void *)&cform,
    (ub4)0, (ub4)OCI_ATTR_CHARSET_FORM, errhp);
...
You can use Unicode bind and define in Pro*C/C++ for multilingual applications.
Pro*C/C++ lets you specify UTF-16 Unicode buffers for bind and define operations. There are two ways to specify UTF-16 buffers in Pro*C/C++:
Use the utext datatype, which is an alias for the unsigned short datatype in C/C++
Use the uvarchar datatype provided by Pro*C/C++, which is preprocessed to a struct with a length field and a utext buffer field:
struct uvarchar {
   ub2   len;     /* length of arr */
   utext arr[1];  /* UTF-16 buffer */
};
typedef struct uvarchar uvarchar;
In the following example, there are two host variables: cname and caddr. The cname host variable is declared as a utext buffer containing 100 UTF-16 code units (unsigned short) for the customer name column, which is of the VARCHAR2 datatype. The caddr host variable is declared as a uvarchar buffer containing 200 UTF-16 code units for the customer address column, which is of the NVARCHAR2 datatype. The len and arr fields are accessible as fields of a struct.
#include <sqlca.h>
#include <sqlucs2.h>

main()
{
   ...
   /* Change to the STRING datatype: */
   EXEC ORACLE OPTION (CHAR_MAP=STRING);
   utext    cname[100];  /* unsigned short type */
   uvarchar caddr[200];  /* Pro*C/C++ uvarchar type */
   ...
   EXEC SQL SELECT name, address INTO :cname, :caddr FROM customers;
   /* cname is NULL-terminated */
   wprintf(L"ENAME = %s, ADDRESS = %.*s\n", cname, caddr.len, caddr.arr);
   ...
}