This chapter describes how to use Oracle's database access products with Unicode.
Oracle9i offers several database access products for inserting and retrieving Unicode data. Oracle offers database access products for commonly used programming environments such as Java and C/C++. Data is transparently converted between the database and client programs, which ensures that client programs are independent of the database character set and national character set. In addition, client programs are sometimes even independent of the character datatype, such as NCHAR or CHAR, used in the database.
To avoid overloading the database server with data conversion operations, Oracle9i always tries to move them to the client-side database access products. In a few cases, data must be converted in the database, which affects performance. This chapter discusses details of the data conversion paths.
Oracle Corporation offers a comprehensive set of database access products that allow programs from different development environments to access Unicode data stored in the database. These products are listed in Table 6-1.
Figure 6-1 shows how the database access products can access the database.
The Oracle Call Interface (OCI) is the lowest-level API that the rest of the client-side database access products use. It provides a flexible way for C/C++ programs to access Unicode data stored in SQL CHAR and NCHAR datatypes. Using OCI, you can programmatically specify the character set (UTF-8, UTF-16, and others) for the data to be inserted or retrieved. It accesses the database through Oracle Net.
Oracle Pro*C/C++ enables you to embed SQL and PL/SQL in your programs. It uses OCI's Unicode capabilities to provide UTF-16 and UTF-8 data access for SQL CHAR and NCHAR datatypes.
The Oracle ODBC driver enables C/C++, Visual Basic, and VBScript programs running on Windows platforms to access Unicode data stored in SQL CHAR and NCHAR datatypes of the database. It provides UTF-16 data access by implementing the SQLWCHAR interface specified in the ODBC standard specification.
The Oracle OLE DB driver enables C/C++, Visual Basic, and VBScript programs running on Windows platforms to access Unicode data stored in SQL CHAR and NCHAR datatypes. It provides UTF-16 data access through wide string OLE DB datatypes.
Oracle JDBC drivers are the primary Java programmatic interface for accessing an Oracle9i database. Oracle provides two client-side JDBC drivers: the JDBC OCI driver and the JDBC thin driver.
Both drivers support Unicode data access to SQL CHAR and NCHAR datatypes in the database.
Oracle SQLJ acts like a preprocessor that translates embedded SQL in a Java program into a Java source file with JDBC calls. It offers you a higher-level programmatic interface to access databases. Like JDBC, SQLJ provides Unicode data access to SQL CHAR and NCHAR datatypes in the database.
The PL/SQL and SQL engines process PL/SQL programs and SQL statements on behalf of client-side programs such as OCI and server-side PL/SQL stored procedures. They allow PL/SQL programs to declare NCHAR and NVARCHAR2 variables and to access SQL NCHAR datatypes in the database.
The following sections describe how each of the database access products supports Unicode data access to an Oracle9i database and offer examples of using those products.
SQL is the fundamental language with which all programs and users access data in an Oracle database either directly or indirectly. PL/SQL is a procedural language that combines the data manipulating power of SQL with the data processing power of procedural languages. Both SQL and PL/SQL can be embedded in other programming languages. This section describes Unicode-related features in SQL and PL/SQL that you can deploy for multilingual applications.
There are three SQL NCHAR datatypes: NCHAR, NVARCHAR2, and NCLOB.
When you define a table column or a PL/SQL variable as the NCHAR datatype, the length is always specified as the number of characters. For example, the statement
CREATE TABLE table1 (column1 NCHAR(30));
creates a column with a maximum length of 30 characters. The maximum number of bytes for the column is determined as follows:
maximum number of bytes = (maximum number of characters) x (maximum number of bytes per character)
For example, if the national character set is UTF8, the maximum byte length is 30 characters times 3 bytes per character, or 90 bytes.
The national character set, which is used for all NCHAR datatypes, is defined when the database is created. In Oracle9i, the national character set can be either UTF8 or AL16UTF16. The default is AL16UTF16.
The maximum column size allowed is 2000 characters when the national character set is UTF8 and 1000 when it is AL16UTF16. The actual data is subject to the maximum byte limit of 2000. The two size constraints must be satisfied at the same time. In PL/SQL, the maximum length of NCHAR data is 32767 bytes. You can define an NCHAR variable of up to 32767 characters, but the actual data cannot exceed 32767 bytes. If you insert a value that is shorter than the column length, Oracle pads the value with blanks to whichever length is smaller: maximum character length or maximum byte length.
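For example, a minimal sketch (the table name t1 is hypothetical): a three-character value inserted into an NCHAR(10) column is blank-padded to the full column length, which LENGTH() reports in characters.

CREATE TABLE t1 (c NCHAR(10));
INSERT INTO t1 VALUES (N'abc');
SELECT LENGTH(c) FROM t1;   -- returns 10: the value is blank-padded to 10 characters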
The NVARCHAR2 datatype specifies a variable-length character string that uses the national character set. When you create a table with an NVARCHAR2 column, you specify the maximum number of characters for the column. Lengths for NVARCHAR2 are always in units of characters, just as for NCHAR. Oracle subsequently stores each value in the column exactly as you specify it, if the value does not exceed the column's maximum length. Oracle does not pad the string value to the maximum length.
The maximum column size allowed is 4000 characters when the national character set is UTF8 and 2000 when it is AL16UTF16. The maximum length of an NVARCHAR2 column in bytes is 4000. Both the byte limit and the character limit must be met, so the maximum number of characters that is actually allowed in an NVARCHAR2 column is the number of characters that can be written in 4000 bytes.
In PL/SQL, the maximum length for an NVARCHAR2 variable is 32767 bytes. You can define NVARCHAR2 variables of up to 32767 characters, but the actual data cannot exceed 32767 bytes.
The following CREATE TABLE statement creates a table with one NVARCHAR2 column with a maximum length of 2000 characters. If the national character set is UTF8, this creates a column with a maximum character length of 2000 and a maximum byte length of 4000.
CREATE TABLE table2 (column2 NVARCHAR2(2000));
NCLOB is a character large object containing multibyte characters, with a maximum size of 4 gigabytes. Unlike BLOBs, NCLOBs have full transactional support, so that changes made through SQL, the DBMS_LOB package, or OCI participate fully in transactions. Manipulations of NCLOB values can be committed and rolled back. Note, however, that you cannot save an NCLOB locator in a PL/SQL or OCI variable in one transaction and then use it in another transaction or session.
NCLOB values are stored in the database using the fixed-width AL16UTF16 character set, regardless of the national character set. Oracle translates the stored Unicode value to the character set requested on the client or on the server, which can be fixed-width or variable-width. When you insert data into an NCLOB column using a variable-width character set, Oracle converts the data into AL16UTF16 before storing it in the database.
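A minimal sketch (the product_descriptions table is hypothetical), using the explicit TO_NCLOB conversion function described later in this chapter:

CREATE TABLE product_descriptions (id NUMBER, description NCLOB);
INSERT INTO product_descriptions VALUES (1, TO_NCLOB(N'Multilingual product text'));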
See Also:
Oracle9i Application Developer's Guide - Large Objects (LOBs) for more information about NCLOB datatypes
Oracle supports implicit conversions between SQL NCHAR datatypes and other Oracle datatypes, such as CHAR, VARCHAR2, NUMBER, DATE, ROWID, and CLOB. Any implicit conversions for CHAR and VARCHAR2 datatypes are also supported for SQL NCHAR datatypes. You can use SQL NCHAR datatypes the same way as SQL CHAR datatypes.
Keep in mind these points about implicit conversions:

- Implicit conversion between SQL CHAR datatypes and SQL NCHAR datatypes may involve character set conversion when the database and national character sets are different. Padding with blanks may occur if the target data is either CHAR or NCHAR.
- Implicit conversion between CLOB and NCLOB datatypes is not possible. You can, however, use Oracle's explicit conversion functions, TO_CLOB and TO_NCLOB.
Data loss can occur during datatype conversion when character set conversion is necessary. If a character in the first character set is not defined in the target character set, then a replacement character will be used in its place. For example, if you try to insert NCHAR data into a regular CHAR column and the character data in NCHAR (Unicode) form cannot be converted to the database character set, the character will be replaced by a replacement character defined by the database character set. The NLS_NCHAR_CONV_EXCP initialization parameter controls the behavior of data loss during character type conversion. When this parameter is set to TRUE, any SQL statements that result in data loss return an ORA-12713 error and the corresponding operation is aborted. When this parameter is set to FALSE, data loss is not reported and the unconvertible characters are replaced with replacement characters. The default value is TRUE. This parameter works for both implicit and explicit conversion.
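As a sketch, assuming the parameter is dynamically modifiable in your session and using hypothetical tables t_char and t_nchar:

ALTER SESSION SET NLS_NCHAR_CONV_EXCP = TRUE;
-- If the database character set cannot represent every NCHAR character,
-- a lossy conversion such as this one now raises ORA-12713:
INSERT INTO t_char (char_col) SELECT nchar_col FROM t_nchar;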
In PL/SQL, when data loss occurs during conversion of SQL CHAR and NCHAR datatypes, the LOSSY_CHARSET_CONVERSION exception is raised for both implicit and explicit conversion.
In some cases, conversion between datatypes is possible in only one direction. In other cases, conversion in both directions is possible. Oracle defines a set of rules for conversion between datatypes. Table 6-2 contains the rules for conversion between datatypes.
Statement | Rule
---|---
INSERT and UPDATE statements | Values are converted to the datatype of the target database column.
SELECT and FETCH statements | Data from the database is converted to the datatype of the target variable.
Variable assignments | Values on the right of the equal sign are converted to the datatype of the target variable on the left of the equal sign.
Parameters in SQL and PL/SQL functions | ...
Concatenation (||) operation or CONCAT function | If one operand is a SQL ...
SQL ... | Character value is converted to ...
SQL ... | Character value is converted to ...
SQL ... | Character datatypes are converted to ...
SQL ... | Character values are converted to ...
SQL ... | Character values are converted to ...
SQL ... | Character values are converted to ...
SQL ... | Character values are converted to ...
Comparisons between SQL ... | When ... When ... When there is comparison between SQL ...
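As a brief illustration of the first rule (conv_demo is a hypothetical table), an INSERT converts a VARCHAR2 literal to the NVARCHAR2 target column, and an UPDATE converts the NVARCHAR2 value to the VARCHAR2 target column:

CREATE TABLE conv_demo (c VARCHAR2(30), nc NVARCHAR2(30));
INSERT INTO conv_demo (nc) VALUES ('ASCII text');   -- VARCHAR2 literal converted to NVARCHAR2
UPDATE conv_demo SET c = nc;                        -- NVARCHAR2 value converted to VARCHAR2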
SQL NCHAR datatypes can be converted to and from SQL CHAR datatypes and other datatypes using explicit conversion functions. The examples in this section use the table created by the following statement:
CREATE TABLE customers (id NUMBER, name NVARCHAR2(50), address NVARCHAR2(200), birthdate DATE);
The TO_NCHAR function converts the data at run time, while the N function converts the data at compilation time.
INSERT INTO customers VALUES (1000, TO_NCHAR('John Smith'), N'500 Oracle Parkway', sysdate);
The following statement converts the values of name from characters in the national character set to characters in the database character set before selecting them according to the LIKE clause:
SELECT name FROM customers WHERE TO_CHAR(name) LIKE '%Sm%';
You should see the following output:
NAME
--------------------------------------
John Smith
Using the N function shows that either NCHAR or CHAR data can be passed as parameters for the TO_DATE function. The datatypes can be mixed because they are converted at run time.
DECLARE
  ndatestring NVARCHAR2(20) := N'12-SEP-1975';
  ndstr NVARCHAR2(50);
BEGIN
  SELECT name INTO ndstr FROM customers
  WHERE birthdate > TO_DATE(ndatestring, 'DD-MON-YYYY', N'NLS_DATE_LANGUAGE = AMERICAN');
END;
As demonstrated in Example 6-3, SQL NCHAR data can be passed to explicit conversion functions. SQL CHAR and NCHAR data can be mixed together when using multiple string parameters.
See Also:
Oracle9i SQL Reference for more information about explicit conversion functions for SQL NCHAR datatypes
Most SQL functions can take arguments of SQL NCHAR datatypes as well as mixed character datatypes. The return datatype is based on the type of the first argument. If a non-string datatype like NUMBER or DATE is passed to these functions, it is converted to VARCHAR2. The following examples use the customers table created earlier in this chapter.
SELECT INSTR(name, N'Sm', 1, 1) FROM customers;
SELECT CONCAT(name,id) FROM customers;
id is converted to NVARCHAR2 and then concatenated with name.
SELECT RPAD(name,100,' ') FROM customers;
The following output results:
RPAD(NAME,100,'')
------------------------------------------
John Smith
The space character ' ' is converted to the corresponding character in the NCHAR character set and then padded to the right of name until the total display length reaches 100.
You can input Unicode string literals in SQL and PL/SQL as follows:

- Put the letter N in front of a single-quoted string literal. This explicitly indicates that the following string literal is an NCHAR string literal. For example, N'12-SEP-1975' is an NCHAR string literal. For SQL NCHAR datatypes, a string literal is converted to a SQL NCHAR datatype wherever necessary.
- Use the NCHR(n) SQL function, which returns the character having the binary equivalent of n in the national character set, which is UTF8 or AL16UTF16. The result of concatenating several NCHR(n) functions is NVARCHAR2 data. In this way, you can bypass the client and server character set conversions and create an NVARCHAR2 string directly. For example, NCHR(32) represents a blank character.
Because NCHR(n) is associated with the national character set, portability of the resulting value is limited to applications that run in that national character set. If this is a concern, then use the UNISTR function to remove portability limitations.
- Use the UNISTR(string) SQL function. UNISTR(string) takes a string and converts it to Unicode. The result is in the national character set for the database. You can embed escape sequences \bbbb inside the string. The escape represents the value of a UTF-16 code point with hex number 0xbbbb. For example, UNISTR('G\0061ry') represents 'Gary'.

The last two methods can be used to encode any Unicode string literals.
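A brief sketch combining the three methods in one query:

SELECT N'12-SEP-1975' AS nchar_literal,
       NCHR(65) AS nchr_value,             -- the national character set character with code 65 ('A')
       UNISTR('G\0061ry') AS unistr_value  -- 'Gary'
FROM dual;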
The UTL_FILE package has been enhanced in Oracle9i to handle Unicode national character set data. The following functions and procedures have been added:
- FOPEN_NCHAR: This function opens a file in Unicode for input or output, with the maximum line size specified. With this function, you can read or write a text file in Unicode instead of in the database character set.
- GET_LINE_NCHAR: This procedure reads text from the open file identified by the file handle and places the text in the output buffer parameter. With this procedure, you can read a text file in Unicode instead of in the database character set.
- PUT_NCHAR: This procedure writes the text string stored in the buffer parameter to the open file identified by the file handle. With this procedure, you can write a text file in Unicode instead of in the database character set.
- PUT_LINE_NCHAR: This procedure writes the text string stored in the buffer parameter to the open file identified by the file handle, followed by a line terminator. With this procedure, you can write a text file in Unicode instead of in the database character set.
- PUTF_NCHAR: This procedure is a formatted PUT_NCHAR procedure. With this procedure, you can write a text file in Unicode instead of in the database character set.
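A minimal PL/SQL sketch of writing and then reading back a Unicode text file; the directory MY_DIR and the file name are hypothetical and must be accessible to UTL_FILE:

DECLARE
  f    UTL_FILE.FILE_TYPE;
  line NVARCHAR2(100);
BEGIN
  f := UTL_FILE.FOPEN_NCHAR('MY_DIR', 'unicode.txt', 'w', 1024);
  UTL_FILE.PUT_LINE_NCHAR(f, N'A line of Unicode text');
  UTL_FILE.FCLOSE(f);

  f := UTL_FILE.FOPEN_NCHAR('MY_DIR', 'unicode.txt', 'r', 1024);
  UTL_FILE.GET_LINE_NCHAR(f, line);
  UTL_FILE.FCLOSE(f);
END;
/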
See Also:
Oracle9i Supplied PL/SQL Packages and Types Reference for more information about the UTL_FILE package
OCI is the lowest-level API for accessing a database, so it offers the best possible performance. When using Unicode with OCI, consider the topics described in the following sections.
The OCIEnvNlsCreate() function is used to specify a SQL CHAR character set and a SQL NCHAR character set when the OCI environment is created. It is an enhanced version of the OCIEnvCreate() function and has extended arguments for two character set IDs. The OCI_UTF16ID UTF-16 character set ID replaces the Unicode mode introduced in Oracle9i release 1 (9.0.1). For example:
OCIEnv *envhp;
status = OCIEnvNlsCreate((OCIEnv **)&envhp, (ub4)0, (void *)0,
                         (void *(*) ()) 0, (void *(*) ()) 0, (void (*) ()) 0,
                         (size_t) 0, (void **)0,
                         (ub2)OCI_UTF16ID, /* Metadata and SQL CHAR character set */
                         (ub2)OCI_UTF16ID  /* SQL NCHAR character set */);
The Unicode mode, in which the OCI_UTF16 flag is used with the OCIEnvCreate() function, is deprecated.
When OCI_UTF16ID is specified for both SQL CHAR and SQL NCHAR character sets, all metadata and bound and defined data are encoded in UTF-16. Metadata includes SQL statements, user names, error messages, and column names. Thus, all inherited operations are independent of the NLS_LANG setting, and all metatext data parameters (text*) are assumed to be Unicode text datatypes (utext*) in UTF-16 encoding.
To prepare the SQL statement when the OCIEnv() function is initialized with the OCI_UTF16ID character set ID, call the OCIStmtPrepare() function with a (utext*) string. The following example runs on the Windows platform only. You may need to change wchar_t datatypes for other platforms.
const wchar_t sqlstr[] = L"SELECT * FROM EMP WHERE ENAME=:ename";
...
OCIStmt* stmthp;
sts = OCIHandleAlloc(envh, (void **)&stmthp, OCI_HTYPE_STMT, 0, NULL);
status = OCIStmtPrepare(stmthp, errhp, (const text*)sqlstr, wcslen(sqlstr),
                        OCI_NTV_SYNTAX, OCI_DEFAULT);
To bind and define data, you do not have to set the OCI_ATTR_CHARSET_ID attribute because the OCIEnv() function has already been initialized with UTF-16 character set IDs. The bind variable names must also be UTF-16 strings.
/* Inserting Unicode data */
OCIBindByName(stmthp1, &bnd1p, errhp, (const text*)L":ename",
              (sb4)wcslen(L":ename"), (void *) ename, sizeof(ename),
              SQLT_STR, (void *)&insname_ind, (ub2 *) 0, (ub2 *) 0,
              (ub4) 0, (ub4 *)0, OCI_DEFAULT);
OCIAttrSet((void *) bnd1p, (ub4) OCI_HTYPE_BIND, (void *) &ename_col_len,
           (ub4) 0, (ub4)OCI_ATTR_MAXDATA_SIZE, errhp);
...
/* Retrieving Unicode data */
OCIDefineByPos(stmthp2, &dfn1p, errhp, (ub4)1, (void *)ename,
               (sb4)sizeof(ename), SQLT_STR, (void *)0, (ub2 *)0,
               (ub2*)0, (ub4)OCI_DEFAULT);
The OCIStmtExecute() function performs the operation.
Unicode character set conversions take place between an OCI client and the database server if the client and server character sets are different. The conversion occurs on either the client or the server depending on the circumstances, but usually on the client side.
You can lose data during conversion if you call an OCI API inappropriately. If the server and client character sets are different, you can lose data when the destination character set is a smaller set than the source character set. You can avoid this potential problem if both character sets are Unicode character sets (for example, UTF8 and AL16UTF16).
When you bind or define SQL NCHAR datatypes, you should set the OCI_ATTR_CHARSET_FORM attribute to SQLCS_NCHAR. Otherwise, you can lose data because the data is converted to the database character set before being converted to or from the national character set. This occurs only if the database character set is not Unicode.
Redundant data conversions can cause performance degradation in your OCI applications. These conversions occur in two cases:

- When you bind or define SQL CHAR datatypes and set the OCI_ATTR_CHARSET_FORM attribute to SQLCS_NCHAR, data conversions take place from the client character set to the national character set, and from the national character set to the database character set. No data loss is expected, but two conversions happen, even though only one is required.
- When you bind or define SQL NCHAR datatypes and do not set OCI_ATTR_CHARSET_FORM, data conversions take place from the client character set to the database character set, and from the database character set to the national character set. In the worst case, data loss can occur if the database character set is smaller than the client's.

To avoid performance problems, you should always set OCI_ATTR_CHARSET_FORM correctly, based on the datatype of the target columns. If you do not know the target datatype, you should set the OCI_ATTR_CHARSET_FORM attribute to SQLCS_NCHAR when binding and defining.
Table 6-3 contains information about OCI character set conversions.
Data conversion can result in data expansion, which can cause a buffer to overflow. For binding operations, you need to set the OCI_ATTR_MAXDATA_SIZE attribute to a size large enough to hold the expanded data on the server. If this is difficult to do, you need to consider changing the table schema. For defining operations, client applications need to allocate enough buffer space for the expanded data. The size of the buffer should be the maximum length of the expanded data. You can estimate the maximum buffer length with the following calculation:

maximum buffer length = (column size + 1) x (maximum number of bytes per character in the client character set)
This method is the simplest and quickest way, but it may not be accurate and can waste memory. It is applicable to any character set combination. For example, for UTF-16 data binding and defining, the following example calculates the client buffer:
ub2 csid = OCI_UTF16ID;
oratext *selstmt = (oratext *)"SELECT ename FROM emp";
counter = 1;
...
OCIStmtPrepare(stmthp, errhp, selstmt, (ub4)strlen((char*)selstmt),
               OCI_NTV_SYNTAX, OCI_DEFAULT);
OCIStmtExecute(svchp, stmthp, errhp, (ub4)0, (ub4)0,
               (CONST OCISnapshot*)0, (OCISnapshot*)0, OCI_DESCRIBE_ONLY);
OCIParamGet(stmthp, OCI_HTYPE_STMT, errhp, &myparam, (ub4)counter);
OCIAttrGet((void*)myparam, (ub4)OCI_DTYPE_PARAM, (void*)&col_width,
           (ub4*)0, (ub4)OCI_ATTR_DATA_SIZE, errhp);
...
maxenamelen = (col_width + 1) * sizeof(utext);
cbuf = (utext*)malloc(maxenamelen);
...
OCIDefineByPos(stmthp, &dfnp, errhp, (ub4)1, (void *)cbuf,
               (sb4)maxenamelen, SQLT_STR, (void *)0, (ub2 *)0,
               (ub2*)0, (ub4)OCI_DEFAULT);
OCIAttrSet((void *) dfnp, (ub4) OCI_HTYPE_DEFINE, (void *) &csid,
           (ub4) 0, (ub4)OCI_ATTR_CHARSET_ID, errhp);
OCIStmtFetch(stmthp, errhp, 1, OCI_FETCH_NEXT, OCI_DEFAULT);
...
You can use UTF8 and AL32UTF8 by setting NLS_LANG for OCI client applications. If you do not need supplementary characters, then it does not matter whether you choose UTF8 or AL32UTF8. However, if your OCI applications might handle supplementary characters, then you need to make a decision. Because UTF8 can require up to three bytes for each character, one supplementary character is represented in two code points, totalling six bytes. In AL32UTF8, one supplementary character is represented in one code point, totalling four bytes.
Do not set NLS_LANG to AL16UTF16, because AL16UTF16 is the national character set for the server. If you need to use UTF-16, then you should specify the client character set as OCI_UTF16ID, using the OCIAttrSet() function when binding or defining data.
To specify a Unicode character set for binding and defining data with SQL CHAR datatypes, you may need to call the OCIAttrSet() function to set the appropriate character set ID after calling the OCIBind() or OCIDefine() APIs. There are two typical cases:
- Call OCIBind() or OCIDefine() followed by OCIAttrSet() to specify the UTF-16 Unicode character set encoding. For example:
...
ub2 csid = OCI_UTF16ID;
utext ename[100];            /* enough buffer for ENAME */
...
/* Inserting Unicode data */
OCIBindByName(stmthp1, &bnd1p, errhp, (oratext*)":ENAME",
              (sb4)strlen((char *)":ENAME"), (void *) ename, sizeof(ename),
              SQLT_STR, (void *)&insname_ind, (ub2 *) 0, (ub2 *) 0,
              (ub4) 0, (ub4 *)0, OCI_DEFAULT);
OCIAttrSet((void *) bnd1p, (ub4) OCI_HTYPE_BIND, (void *) &csid,
           (ub4) 0, (ub4)OCI_ATTR_CHARSET_ID, errhp);
OCIAttrSet((void *) bnd1p, (ub4) OCI_HTYPE_BIND, (void *) &ename_col_len,
           (ub4) 0, (ub4)OCI_ATTR_MAXDATA_SIZE, errhp);
...
/* Retrieving Unicode data */
OCIDefineByPos(stmthp2, &dfn1p, errhp, (ub4)1, (void *)ename,
               (sb4)sizeof(ename), SQLT_STR, (void *)0, (ub2 *)0,
               (ub2*)0, (ub4)OCI_DEFAULT);
OCIAttrSet((void *) dfn1p, (ub4) OCI_HTYPE_DEFINE, (void *) &csid,
           (ub4) 0, (ub4)OCI_ATTR_CHARSET_ID, errhp);
...
If bound buffers are of the utext datatype, you should add a cast (text*) when OCIBind() or OCIDefine() is called. The value of the OCI_ATTR_MAXDATA_SIZE attribute is usually determined by the column size of the server character set because this size is only used to allocate temporary buffer space for conversion on the server when you perform binding operations.
- Call OCIBind() or OCIDefine() with the NLS_LANG character set specified as UTF8 or AL32UTF8.
UTF8 or AL32UTF8 can be set in the NLS_LANG environment variable. You call OCIBind() and OCIDefine() in exactly the same manner as when you are not using Unicode. Set the NLS_LANG environment variable to UTF8 or AL32UTF8 and run the following OCI program:
...
oratext ename[100];          /* enough buffer size for ENAME */
...
/* Inserting Unicode data */
OCIBindByName(stmthp1, &bnd1p, errhp, (oratext*)":ENAME",
              (sb4)strlen((char *)":ENAME"), (void *) ename, sizeof(ename),
              SQLT_STR, (void *)&insname_ind, (ub2 *) 0, (ub2 *) 0,
              (ub4) 0, (ub4 *)0, OCI_DEFAULT);
OCIAttrSet((void *) bnd1p, (ub4) OCI_HTYPE_BIND, (void *) &ename_col_len,
           (ub4) 0, (ub4)OCI_ATTR_MAXDATA_SIZE, errhp);
...
/* Retrieving Unicode data */
OCIDefineByPos(stmthp2, &dfn1p, errhp, (ub4)1, (void *)ename,
               (sb4)sizeof(ename), SQLT_STR, (void *)0, (ub2 *)0,
               (ub2*)0, (ub4)OCI_DEFAULT);
...
Oracle Corporation recommends that you access SQL NCHAR datatypes using UTF-16 binding or defining when using OCI. Starting from Oracle9i, SQL NCHAR datatypes are Unicode datatypes with an encoding of either UTF8 or AL16UTF16. To access data in SQL NCHAR datatypes, set the OCI_ATTR_CHARSET_FORM attribute to SQLCS_NCHAR between binding or defining and execution so that it performs an appropriate data conversion without data loss. The length of data in SQL NCHAR datatypes is always in the number of Unicode code points.
The following program is a typical example of inserting and fetching data against an NCHAR data column:
...
ub2 csid = OCI_UTF16ID;
ub1 cform = SQLCS_NCHAR;
utext ename[100];            /* enough buffer for ENAME */
...
/* Inserting Unicode data */
OCIBindByName(stmthp1, &bnd1p, errhp, (oratext*)":ENAME",
              (sb4)strlen((char *)":ENAME"), (void *) ename, sizeof(ename),
              SQLT_STR, (void *)&insname_ind, (ub2 *) 0, (ub2 *) 0,
              (ub4) 0, (ub4 *)0, OCI_DEFAULT);
OCIAttrSet((void *) bnd1p, (ub4) OCI_HTYPE_BIND, (void *) &cform, (ub4) 0,
           (ub4)OCI_ATTR_CHARSET_FORM, errhp);
OCIAttrSet((void *) bnd1p, (ub4) OCI_HTYPE_BIND, (void *) &csid, (ub4) 0,
           (ub4)OCI_ATTR_CHARSET_ID, errhp);
OCIAttrSet((void *) bnd1p, (ub4) OCI_HTYPE_BIND, (void *) &ename_col_len,
           (ub4) 0, (ub4)OCI_ATTR_MAXDATA_SIZE, errhp);
...
/* Retrieving Unicode data */
OCIDefineByPos(stmthp2, &dfn1p, errhp, (ub4)1, (void *)ename,
               (sb4)sizeof(ename), SQLT_STR, (void *)0, (ub2 *)0,
               (ub2*)0, (ub4)OCI_DEFAULT);
OCIAttrSet((void *) dfn1p, (ub4) OCI_HTYPE_DEFINE, (void *) &csid, (ub4) 0,
           (ub4)OCI_ATTR_CHARSET_ID, errhp);
OCIAttrSet((void *) dfn1p, (ub4) OCI_HTYPE_DEFINE, (void *) &cform, (ub4) 0,
           (ub4)OCI_ATTR_CHARSET_FORM, errhp);
...
In order to write (bind) and read (define) UTF-16 data for CLOB or NCLOB columns, the UTF-16 character set ID must be specified in the calls to OCILobWrite() and OCILobRead(). When you write UTF-16 data into a CLOB column, call OCILobWrite() as follows:
...
ub2 csid = OCI_UTF16ID;
err = OCILobWrite(ctx->svchp, ctx->errhp, lobp, &amtp, offset, (void *) buf,
                  (ub4) BUFSIZE, OCI_ONE_PIECE, (void *)0, (sb4 (*)()) 0,
                  (ub2) csid, (ub1) SQLCS_IMPLICIT);
The amtp parameter is the data length in number of Unicode code points. The offset parameter indicates the offset of data from the beginning of the data column. The csid parameter must be set for UTF-16 data.
To read UTF-16 data from CLOB columns, call OCILobRead() as follows:
...
ub2 csid = OCI_UTF16ID;
err = OCILobRead(ctx->svchp, ctx->errhp, lobp, &amtp, offset, (void *) buf,
                 (ub4) BUFSIZE, (void *) 0, (sb4 (*)()) 0,
                 (ub2) csid, (ub1) SQLCS_IMPLICIT);
The data length is always represented in the number of Unicode code points. Note that one Unicode supplementary character is counted as two code points, because the encoding is UTF-16. After binding or defining a LOB column, you can measure the data length stored in the LOB column using OCILobGetLength(). The returned value is the data length in the number of code points if you bind or define as UTF-16.
err = OCILobGetLength(ctx->svchp, ctx->errhp, lobp, &lenp);
If you are using an NCLOB, you must set OCI_ATTR_CHARSET_FORM to SQLCS_NCHAR.
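For the OCILobWrite() and OCILobRead() calls shown above, one way to do this is to pass SQLCS_NCHAR as the character set form parameter; a sketch, reusing the same variables:

err = OCILobWrite(ctx->svchp, ctx->errhp, lobp, &amtp, offset, (void *) buf,
                  (ub4) BUFSIZE, OCI_ONE_PIECE, (void *)0, (sb4 (*)()) 0,
                  (ub2) csid, (ub1) SQLCS_NCHAR);  /* SQLCS_NCHAR for NCLOB columns */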
Pro*C/C++ provides the following ways to insert or retrieve Unicode data into or from the database:

- Using the VARCHAR Pro*C/C++ datatype or the native C/C++ text datatype, a program can access Unicode data stored in SQL CHAR datatypes of a UTF8 or AL32UTF8 database.
- Using the UVARCHAR Pro*C/C++ datatype or the native C/C++ utext datatype, a program can access Unicode data stored in NCHAR datatypes of a database.
- Using the NVARCHAR Pro*C/C++ datatype, a program can access Unicode data stored in NCHAR datatypes. The difference between UVARCHAR and NVARCHAR in a Pro*C/C++ program is that the data for the UVARCHAR datatype is stored in a utext buffer while the data for the NVARCHAR datatype is stored in a text datatype.

Pro*C/C++ does not use the Unicode OCI API for SQL text. As a result, embedded SQL text must be encoded in the character set specified in the NLS_LANG environment variable.
Data conversion occurs in the OCI layer, but it is the Pro*C/C++ preprocessor that instructs OCI which conversion path should be taken based on the datatypes used in a Pro*C/C++ program. Table 6-4 illustrates the conversion paths:
The Pro*C/C++ VARCHAR datatype is preprocessed to a struct with a length field and a text buffer field. The following example uses the C/C++ text native datatype and the VARCHAR Pro*C/C++ datatype to bind and define table columns.
#include <sqlca.h>
main()
{
  ...
  /* Change to STRING datatype: */
  EXEC ORACLE OPTION (CHAR_MAP=STRING);
  text ename[20];          /* native C/C++ text type */
  varchar address[50];     /* Pro*C/C++ varchar type */

  EXEC SQL SELECT ename, address INTO :ename, :address FROM emp;
  /* ename is NULL-terminated */
  printf("ENAME = %s, ADDRESS = %.*s\n", ename, address.len, address.arr);
  ...
}
When you use the VARCHAR datatype or native text datatype in a Pro*C/C++ program, the preprocessor assumes that the program intends to access columns of SQL CHAR datatypes instead of SQL NCHAR datatypes in the database. The preprocessor generates C/C++ code to reflect this fact by doing a bind or define using the SQLCS_IMPLICIT value for the OCI_ATTR_CHARSET_FORM attribute. As a result, if a bind or define variable is bound to a column of SQL NCHAR datatypes in the database, implicit conversion happens in the database server to convert the data from the database character set to the national character set and vice versa. During the conversion, data loss occurs when the database character set is a smaller set than the national character set.
The Pro*C/C++ NVARCHAR datatype is similar to the Pro*C/C++ VARCHAR datatype. It should be used to access SQL NCHAR datatypes in the database. It tells the Pro*C/C++ preprocessor to bind or define a text buffer to the column of SQL NCHAR datatypes. The preprocessor specifies the SQLCS_NCHAR value for the OCI_ATTR_CHARSET_FORM attribute of the bind or define variable. As a result, no implicit conversion occurs in the database.
If the NVARCHAR buffer is bound against columns of SQL CHAR datatypes, the data in the NVARCHAR buffer (encoded in the NLS_LANG character set) is converted to or from the national character set in OCI, and the data is then converted to the database character set in the database server. Data can be lost when the NLS_LANG character set is a larger set than the database character set.
The UVARCHAR datatype is preprocessed to a struct with a length field and a utext buffer field. The following example code contains two host variables, ename and address. The ename host variable is declared as a utext buffer containing 20 Unicode characters. The address host variable is declared as a uvarchar buffer containing 50 Unicode characters; the len and arr fields are accessible as fields of the struct.
#include <sqlca.h>
#include <sqlucs2.h>
main()
{
  ...
  /* Change to STRING datatype: */
  EXEC ORACLE OPTION (CHAR_MAP=STRING);
  utext ename[20];         /* unsigned short type */
  uvarchar address[50];    /* Pro*C/C++ uvarchar type */

  EXEC SQL SELECT ename, address INTO :ename, :address FROM emp;
  /* ename is NULL-terminated */
  wprintf(L"ENAME = %s, ADDRESS = %.*s\n", ename, address.len, address.arr);
  ...
}
When you use the UVARCHAR datatype or native utext datatype in Pro*C/C++ programs, the preprocessor assumes that the program intends to access SQL NCHAR datatypes. The preprocessor generates C/C++ code by binding or defining using the SQLCS_NCHAR value for the OCI_ATTR_CHARSET_FORM attribute. As a result, if a bind or define variable is bound to a column of a SQL NCHAR datatype, an implicit conversion of the data from the national character set occurs in the database server. However, no data is lost in this scenario because the national character set is always a larger set than the database character set.
Oracle provides three JDBC drivers for Java programs to access Unicode data in the database: the JDBC OCI driver, the JDBC thin driver, and the JDBC server-side internal driver.
Java programs can insert or retrieve Unicode data to and from columns of SQL CHAR and NCHAR datatypes. Specifically, JDBC enables Java programs to bind or define Java strings to SQL CHAR and NCHAR datatypes. Because Java's string datatype is UTF-16 encoded, data retrieved from or inserted into the database must be converted from UTF-16 to the database character set or the national character set and vice versa. The SQLJ preprocessor enables Java programs to embed SQL statements to simplify database access code. It translates the embedded SQL statements of a Java program to the corresponding JDBC calls. Similar to JDBC, SQLJ enables programs to bind or define Java strings to a SQL CHAR or NCHAR column. JDBC and SQLJ also allow you to specify the PL/SQL and SQL statements in Java strings so that any non-ASCII schema object names can be referenced in Java programs.
Oracle JDBC drivers allow you to access SQL CHAR datatypes in the database using Java string bind or define variables. The following code illustrates how to bind or define a Java string to a CHAR column:
int empno = 12345;
String ename = "Joe";
PreparedStatement pstmt = conn.prepareStatement
    ("INSERT INTO emp (ename, empno) VALUES (?, ?)");
pstmt.setString(1, ename);
pstmt.setInt(2, empno);
pstmt.execute();              /* execute to insert into first row */
empno += 1;                   /* next employee number */
ename = "\uFF2A\uFF4F\uFF45"; /* Unicode characters in name */
pstmt.setString(1, ename);
pstmt.setInt(2, empno);
pstmt.execute();              /* execute to insert into second row */
For binding or defining Java string variables to SQL NCHAR datatypes, Oracle extends the JDBC specification to add the PreparedStatement.setFormOfUse() method through which you can explicitly specify the target column of a bind variable to be a SQL NCHAR datatype. The following code illustrates how to bind a Java string to an NCHAR column:
int empno = 12345;
String ename = "Joe";
oracle.jdbc.OraclePreparedStatement pstmt =
    (oracle.jdbc.OraclePreparedStatement)
    conn.prepareStatement("INSERT INTO emp (ename, empno) VALUES (?, ?)");
pstmt.setFormOfUse(1, oracle.jdbc.OraclePreparedStatement.FORM_NCHAR);
pstmt.setString(1, ename);
pstmt.setInt(2, empno);
pstmt.execute();              /* execute to insert into first row */
empno += 1;                   /* next employee number */
ename = "\uFF2A\uFF4F\uFF45"; /* Unicode characters in name */
pstmt.setString(1, ename);
pstmt.setInt(2, empno);
pstmt.execute();              /* execute to insert into second row */
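On retrieval, no form of use argument is needed; getString() returns the column data as a UTF-16 Java string whether the column is CHAR or NCHAR. A minimal sketch, assuming the standard java.sql imports and reusing the conn connection from the examples above:

PreparedStatement query = conn.prepareStatement("SELECT ename FROM emp WHERE empno = ?");
query.setInt(1, 12345);
ResultSet rs = query.executeQuery();
while (rs.next()) {
    String ename = rs.getString(1);   /* returned as a UTF-16 Java string */
}
rs.close();
query.close();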
You can bind or define a Java string against an NCHAR column without explicitly specifying the form of use argument. This implies the following:
- If you do not specify the form of use argument in the setString() method, JDBC assumes that the bind or define variable is for a SQL CHAR column. As a result, it tries to convert the data to the database character set. When the data gets to the database, the database implicitly converts the data in the database character set to the national character set. During this conversion, data can be lost when the database character set is a subset of the national character set. Because the national character set is either UTF8 or AL16UTF16, data loss would happen if the database character set is not UTF8 or AL32UTF8.
- Because the implicit conversion from SQL CHAR to SQL NCHAR datatypes happens in the database, database performance is degraded.

In addition, if you bind or define a Java string for a column of SQL CHAR datatypes but specify the form of use argument, performance of the database will be degraded. However, data should not be lost because the national character set is always a larger set than the database character set.
Because Java strings are always encoded in UTF-16, JDBC drivers transparently convert data from the database character set to UTF-16 or the national character set. The conversion paths taken are different for the three JDBC drivers:

For the OCI driver, SQL statements are always converted to the database character set by the driver before they are sent to the database for processing. For Java string bind or define variables, Table 6-5 summarizes the conversion paths taken for different scenarios:
For the thin driver, SQL statements are always converted to either the database character set or to UTF-8 by the driver before they are sent to the database for processing. The thin driver also notifies the database that a SQL statement requires further conversion before being processed. The database, in turn, converts the SQL statement to the database character set. For Java string bind and define variables, the conversion paths shown in Table 6-6 are taken for the thin driver:
The JDBC server-side internal driver runs in the server. All conversions are done in the database server. SQL statements specified as Java strings are converted to the database character set. Java string bind or define variables are converted to the database character set if the form of use argument is not specified. Otherwise, they are converted to the national character set.
You should use Oracle's ODBC and OLE DB drivers to access Oracle9i when using a Windows platform. This section describes how these drivers support Unicode.
Oracle's ODBC and OLE DB drivers can handle Unicode data properly without data loss. For example, you can run a Unicode ODBC application containing Japanese data on English Windows if you install Japanese fonts and an input method editor for entering Japanese characters.
In Oracle9i, Oracle provides Windows platform-specific ODBC and OLE DB drivers only. For Unix platforms, contact your vendor.
OCI Unicode binding and defining features are used by the ODBC and OLE DB drivers to handle Unicode data. OCI Unicode data binding and defining features are independent from NLS_LANG. This means Unicode data is handled properly, irrespective of the NLS_LANG setting on the platform.
In general, no redundant data conversion occurs unless you specify a different client datatype from that of the server. If you bind a Unicode buffer SQL_C_WCHAR with a Unicode data column like NCHAR, for example, the ODBC and OLE DB drivers bypass the conversion between the application and the OCI layer.
If you do not specify datatypes before fetching, but call SQLGetData with the client datatypes instead, then the conversions in Table 6-7 occur.
You must specify the datatype for inserting and updating operations.
The datatype of the ODBC client buffer is given when you call SQLGetData, but not immediately. Hence, SQLFetch does not have the information.
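A minimal sketch of this deferred typing (assuming an open statement handle with a pending result set):

/* The client datatype SQL_C_WCHAR is supplied only at SQLGetData() time */
SQLWCHAR name[31];
SQLLEN   ind;
SQLFetch(StatementHandle);
SQLGetData(StatementHandle, 1, SQL_C_WCHAR, name, sizeof(name), &ind);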
Because the ODBC driver guarantees data integrity, if you perform implicit bindings, redundant conversion may result in performance degradation. Your choice is the trade-off between performance with explicit binding and usability with implicit binding.
Unlike ODBC, OLE DB only enables you to perform implicit bindings for inserting, updating, and fetching data. The conversion algorithm for determining the intermediate character set is the same as the implicit binding cases of ODBC.
In ODBC Unicode applications, use SQLWCHAR to store Unicode data. All standard Windows Unicode functions can be used for SQLWCHAR data manipulations. For example, wcslen counts the number of characters of SQLWCHAR data:
SQLWCHAR sqlStmt[] = L"select ename from emp";
len = wcslen(sqlStmt);
Microsoft's ODBC 3.5 specification defines three Unicode datatype identifiers for the client: SQL_C_WCHAR, SQL_C_WVARCHAR, and SQL_C_WLONGVARCHAR; and three Unicode datatype identifiers for the server: SQL_WCHAR, SQL_WVARCHAR, and SQL_WLONGVARCHAR.
For binding operations, specify the datatypes for both client and server using SQLBindParameter. The following is an example of Unicode binding, where the client buffer Name indicates that Unicode data (SQL_C_WCHAR) is bound to the first bind variable associated with the Unicode column (SQL_WCHAR):
SQLBindParameter(StatementHandle, 1, SQL_PARAM_INPUT, SQL_C_WCHAR,
                 SQL_WCHAR, NameLen, 0, (SQLPOINTER)Name, 0, &Name);
Table 6-9 shows the datatype mappings of the ODBC Unicode datatypes for the server against SQL NCHAR datatypes.
ODBC Datatype | Oracle Datatype
---|---
SQL_WCHAR | NCHAR
SQL_WVARCHAR | NVARCHAR2
SQL_WLONGVARCHAR | NCLOB
According to ODBC specifications, SQL_WCHAR, SQL_WVARCHAR, and SQL_WLONGVARCHAR are treated as Unicode data, and are therefore measured in the number of characters instead of the number of bytes.
OLE DB offers the wchar_t*, BSTR, and OLESTR datatypes for the Unicode client C datatype. In practice, wchar_t is the most common datatype and the others are for specific purposes. The following example assigns a static SQL statement:
wchar_t *sqlStmt = OLESTR("SELECT ename FROM emp");
The OLESTR macro works exactly like an "L" modifier to indicate the Unicode string. If you need to allocate a Unicode data buffer dynamically using OLESTR, use the IMalloc allocator (for example, CoTaskMemAlloc). However, using OLESTR is not the normal method for variable-length data; use wchar_t* instead for generic string types. BSTR is similar. It is a string with a length prefix in the memory location preceding the string. Some functions and methods can accept only BSTR Unicode datatypes. Therefore, BSTR Unicode strings must be manipulated with special functions like SysAllocString for allocation and SysFreeString for freeing memory.
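A brief sketch of the BSTR allocation pattern:

BSTR stmt = SysAllocString(OLESTR("SELECT ename FROM emp"));
/* ... pass stmt to an interface method that requires a BSTR ... */
SysFreeString(stmt);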
Unlike ODBC, OLE DB does not allow you to specify the server datatype explicitly. When you set the client datatype, the OLE DB driver automatically performs data conversion if necessary.
Table 6-10 illustrates OLE DB datatype mapping.
OLE DB Datatype | Oracle Datatype
---|---
DBTYPE_WCHAR | NCHAR or NVARCHAR2
If DBTYPE_BSTR is specified, it is assumed to be DBTYPE_WCHAR because both are Unicode strings.
ADO is a high-level API for accessing databases with the OLE DB and ODBC drivers. Most database application developers use the ADO interface on Windows because it is easily accessible from Visual Basic, the primary scripting language for Active Server Pages (ASP) for the Internet Information Server (IIS). To OLE DB and ODBC drivers, ADO is simply an OLE DB consumer or an ODBC application. ADO assumes that OLE DB and ODBC drivers are Unicode-aware components; hence, it always attempts to manipulate Unicode data.