Oracle® Application Server Containers for J2EE Support for JavaServer Pages Developer's Guide
10g Release 2 (10.1.2)
B14014-02

9 JSP Globalization Support

The JSP container in OC4J provides standard globalization support (also known as National Language Support, or NLS) according to the JSP specification, and also offers extended support for servlet environments that do not support multibyte parameter encoding.

Standard Java support for localized content depends on the use of Unicode for uniform internal representation of text. Unicode is used as the base character set for conversion to alternative character sets. (The Unicode version depends on the JDK version. You can find the Unicode version through the Sun Microsystems Javadoc for the java.lang.Character class.)

This chapter describes key aspects of JSP support for globalization and internationalization, covering content type settings and JSP support for multibyte parameter encoding.


Note:

For detailed information about Oracle Application Server Globalization Support, see the Oracle Application Server Globalization Guide.

Content Type Settings

The following sections cover standard ways to statically or dynamically specify the content type for a JSP page. There is also discussion of an Oracle extension method that enables you to specify a non-IANA (Internet Assigned Numbers Authority) character set for the JSP writer object.

Content Type Settings in the page Directive

The page directive has two attributes, pageEncoding and contentType, that affect the character encoding of the JSP page source (during translation) or response (during runtime). The contentType attribute also affects the MIME type of the response. The function of each attribute is as follows:

  • You can use contentType to set the character encoding of the page source and response, and the MIME type of the response.

  • You can use pageEncoding to set the character encoding of the page source. The main purpose of this attribute, which was introduced in the JSP 1.2 specification, is to allow you to set a page source character encoding that is different from the response character encoding. However, this setting also acts as a default for the response character encoding if there is no contentType attribute that specifies a character set.

There is more information about the relationship between contentType and pageEncoding later in this section.

Use the following syntax for contentType:

contentType="TYPE; charset=character_set"

Alternatively, to set the MIME type while using the default character set:

contentType="TYPE"

Use the following syntax for pageEncoding:

pageEncoding="character_set"

Use the following syntax to set everything:

<%@ page ... contentType="TYPE; charset=character_set" 
             pageEncoding="character_set" ... %>

TYPE is an IANA MIME type; character_set is an IANA character set. When specifying a character set through the contentType attribute, the space after the semicolon is optional.
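To make the syntax concrete, the following is a minimal sketch of how a contentType value breaks down into a MIME type and an optional character set. The ContentTypeValue class and its parse() method are hypothetical names used only for illustration; they are not part of OC4J or the servlet API, and the JSP container's actual parsing may differ.

```java
import java.util.Locale;

// Hypothetical helper (not part of OC4J or the servlet API) that splits a
// contentType value into its MIME type and optional charset parameter.
class ContentTypeValue {
    final String mimeType;
    final String charset; // null when no charset parameter is present

    ContentTypeValue(String mimeType, String charset) {
        this.mimeType = mimeType;
        this.charset = charset;
    }

    static ContentTypeValue parse(String contentType) {
        String[] parts = contentType.split(";");
        String mime = parts[0].trim();
        String charset = null;
        for (int i = 1; i < parts.length; i++) {
            // trim() tolerates the optional space after the semicolon
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                charset = param.substring("charset=".length()).trim();
            }
        }
        return new ContentTypeValue(mime, charset);
    }

    public static void main(String[] args) {
        String[] samples = {
            "text/html",
            "text/html; charset=ISO-8859-1",
            "text/html;charset=ISO-8859-1"
        };
        for (String s : samples) {
            ContentTypeValue v = parse(s);
            System.out.println(s + " -> type=" + v.mimeType + ", charset=" + v.charset);
        }
    }
}
```

A value with no charset parameter yields a null charset in this sketch, which corresponds to the case where the default character set applies.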

Here are some examples of contentType and pageEncoding settings:

<%@ page language="java" contentType="text/html" %>

or:

<%@ page language="java" contentType="text/html; charset=ISO-8859-1" %>

or:

<%@ page language="java" contentType="text/html; charset=ISO-8859-1" 
                         pageEncoding="US-ASCII" %>

Without any page directive settings, default settings are as follows:

  • The default MIME type is text/html for traditional JSP pages; it is text/xml for JSP XML documents.

  • The default for the page source character encoding (for translation) is ISO-8859-1 (also known as Latin-1) for traditional JSP pages; it is UTF-8 or UTF-16 for JSP XML documents.

  • The default for the response character encoding is ISO-8859-1 for traditional JSP pages; it is UTF-8 or UTF-16 for JSP XML documents.

The determination of UTF-8 versus UTF-16 is according to "Autodetection of Character Encodings" in the XML specification, at the following location:

http://www.w3.org/TR/REC-xml.html

Be aware, however, that there is a relationship between pageEncoding and contentType regarding character encodings, as documented in Table 9-1.

Table 9-1 Effect of pageEncoding and contentType on Character Encodings

pageEncoding Status   contentType Status   Page Source Encoding Status   Response Encoding Status
-------------------   ------------------   ---------------------------   ------------------------
Specified             Specified            According to pageEncoding     According to contentType
Specified             Not specified        According to pageEncoding     According to pageEncoding
Not specified         Specified            According to contentType      According to contentType
Not specified         Not specified        According to default          According to default
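The rules in Table 9-1 can be sketched as two small lookup functions. The EncodingResolver class below is a hypothetical model written for illustration, not container code; it assumes a traditional JSP page, so the default is ISO-8859-1, and it treats null as "not specified".

```java
// Hypothetical model of Table 9-1; class and method names are illustrative only.
class EncodingResolver {
    // Default for traditional JSP pages (JSP XML documents default to UTF-8 or UTF-16).
    static final String DEFAULT_ENCODING = "ISO-8859-1";

    // pageEncoding wins for the page source; the contentType charset is the fallback.
    static String pageSourceEncoding(String pageEncoding, String contentTypeCharset) {
        if (pageEncoding != null) {
            return pageEncoding;
        }
        if (contentTypeCharset != null) {
            return contentTypeCharset;
        }
        return DEFAULT_ENCODING;
    }

    // The contentType charset wins for the response; pageEncoding is the fallback.
    static String responseEncoding(String pageEncoding, String contentTypeCharset) {
        if (contentTypeCharset != null) {
            return contentTypeCharset;
        }
        if (pageEncoding != null) {
            return pageEncoding;
        }
        return DEFAULT_ENCODING;
    }

    public static void main(String[] args) {
        // Both attributes specified, as in the third page directive example.
        System.out.println(pageSourceEncoding("US-ASCII", "ISO-8859-1"));
        System.out.println(responseEncoding("US-ASCII", "ISO-8859-1"));
    }
}
```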


Be aware of the following important usage notes.

  • A page directive that sets contentType or pageEncoding should appear as early as possible in the JSP page.

  • When a page is a JSP XML document, any pageEncoding setting is ignored. The JSP container will instead use the XML encoding declaration of the document. Consider the following example:

    <?xml version="1.0" encoding="EUC-JP" ?>
    <jsp:root xmlns:jsp="http://java.sun.com/JSP/Page" version="1.2">
    <jsp:directive.page contentType="text/html;charset=Shift_Jis" />
    <jsp:directive.page pageEncoding="UTF-8" />
    ...
    
    

    The effective page encoding would be EUC-JP, not UTF-8.

  • You should use pageEncoding only for pages where the byte sequence represents legal characters in the target character set.

  • You should use contentType only for pages or response output where the byte sequence represents legal characters in the target character set.

  • The target character set of the response output (as specified by contentType, for example) should be a superset of the character set of the page source. For example, UTF-8 is a superset of Big5, but ISO-8859-1 is not.

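  • The parameters of a page directive are static. If a page discovers during execution that a different character set specification is necessary for the response, it can do one of the following:

    • Call the response.setContentType() method to set the content type dynamically, as described in "Dynamic Content Type Settings".

    or:

    • Forward the request to another JSP page or to a servlet.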

  • A traditional JSP page source (not a JSP XML document) written in a character set other than ISO-8859-1 must set the appropriate character set in a page directive (through the contentType or pageEncoding attribute). The character set for the page encoding cannot be set dynamically, because the JSP container has to be aware of the setting during translation.

  • This manual, for simplicity, assumes the typical case that the page text, request parameters, and response parameters all use the same encoding (although other scenarios are technically possible). Request parameter encoding is controlled by the browser, although Netscape and Internet Explorer browsers follow the setting you specify for the response parameters.
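The superset requirement in the usage notes above can be checked with the standard java.nio.charset API. The following sketch tests whether a sample string can be represented in a candidate response character set; the CharsetCoverage class name and the sample text are assumptions made for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative check only; the class name is hypothetical.
class CharsetCoverage {
    // Returns true if every character of the text can be encoded in the target charset.
    static boolean canRepresent(String text, Charset target) {
        return target.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // Two Chinese characters of the kind found in a Big5 page source.
        String sample = "\u4E2D\u6587";
        System.out.println("UTF-8:      " + canRepresent(sample, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1: " + canRepresent(sample, StandardCharsets.ISO_8859_1));
    }
}
```

A check like this covers only the text it is given, so it is a spot check for particular content rather than proof that one character set is a superset of another.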

The IANA maintains a registry of MIME types. See the following site for a list of types:

http://www.iana.org/assignments/media-types-parameters

The IANA maintains a registry of character encodings at the following site. Use the indicated "preferred MIME name" if one is listed:

http://www.iana.org/assignments/character-sets

You should use only character sets from the IANA list, except for any additional Oracle extensions as described in "Oracle Extension for the Character Set of the JSP Writer Object".

Dynamic Content Type Settings

For situations where the appropriate content type for the HTTP response is not known until runtime, you can set it dynamically in the JSP page. The standard javax.servlet.ServletResponse interface specifies the following method for this purpose:

void setContentType(java.lang.String contenttype)


Important:

To use dynamic content type settings in an OC4J environment, you must enable the JSP static_text_in_chars configuration parameter. See "JSP Configuration Parameters" for a description.

The implicit response object of a JSP page is a javax.servlet.http.HttpServletResponse instance, where the HttpServletResponse interface extends the ServletResponse interface.

The setContentType() method input, like the contentType setting in a page directive, can include a MIME type only, or both a character set and a MIME type. For example:

response.setContentType("text/html; charset=UTF-8");

or:

response.setContentType("text/html");

As with a page directive, the default MIME type is text/html for traditional JSP pages or text/xml for JSP XML documents, and the default character encoding is ISO-8859-1.

Set the content type as early as possible in the page, before writing any output to the JspWriter object.

The setContentType() method has no effect on interpreting the text of the JSP page during translation. If a particular character set is required during translation, that must be specified in a page directive, as described in "Content Type Settings in the page Directive".
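As a rough sketch of a runtime decision, the helper below builds a content type string from a locale. The DynamicContentType class is hypothetical, and its language-to-charset mapping is an assumption chosen purely for illustration; a real application would use whatever mapping suits its content.

```java
import java.util.Locale;

// Hypothetical helper, not part of OC4J or the servlet API. The mapping from
// language to character set is an assumption made for this example.
class DynamicContentType {
    static String contentTypeFor(Locale locale) {
        String language = locale.getLanguage();
        String charset;
        if ("ja".equals(language)) {
            charset = "Shift_JIS";
        } else if ("zh".equals(language)) {
            charset = "Big5";
        } else {
            charset = "UTF-8";
        }
        return "text/html; charset=" + charset;
    }

    public static void main(String[] args) {
        System.out.println(contentTypeFor(Locale.JAPAN));
        System.out.println(contentTypeFor(Locale.FRANCE));
    }
}
```

In a JSP page, the resulting string would be passed to response.setContentType() before any output is written to the JspWriter object.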


Note:

In servlet 2.2 and higher environments, such as OC4J, the response object has a setLocale() method that takes a java.util.Locale object as input and sets the character set based on the specified locale. For example, the following method call results in a character set of Shift_JIS:
response.setLocale(new Locale("ja", "JP"));

For dynamic specification of the character set, the most recent call to setContentType() or setLocale() takes precedence.


Oracle Extension for the Character Set of the JSP Writer Object

In standard usage, the character set of the content type of the response object, as determined by the page directive contentType parameter or the response.setContentType() method, automatically becomes the character set of the JSP writer object as well. The JSP writer object is a javax.servlet.jsp.JspWriter instance.

There are some character sets, however, that are not recognized by IANA and therefore cannot be used in a standard content type setting. For this reason, OC4J provides the static setWriterEncoding() method of the oracle.jsp.util.PublicUtil class:

static void setWriterEncoding(JspWriter out, String encoding)

You can use this method to specify the character set of the JSP writer directly, overriding the character set of the response object. The following example uses Big5 as the character set of the content type, but specifies MS950, a non-IANA Hong Kong dialect of Big5, as the character set of the JSP writer:

<%@ page contentType="text/html; charset=Big5" %>
<% oracle.jsp.util.PublicUtil.setWriterEncoding(out, "MS950"); %>


Note:

Use the setWriterEncoding() method as early as possible in the JSP page.

JSP Support for Multibyte Parameter Encoding

The servlet specification has a method, setCharacterEncoding(), in the javax.servlet.ServletRequest interface. This method is useful in case the default encoding of the servlet container is not suitable for multibyte request parameters and bean property settings, such as for a getParameter() call in Java code or a jsp:setProperty tag to set a bean property in JSP code.

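The setCharacterEncoding() method and equivalent Oracle extensions affect parameter names and values, such as the output of getParameter() calls and the bean property values set through jsp:setProperty tags.

The following sections cover the standard method and the Oracle extensions.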

Standard setCharacterEncoding() Method

Beginning with the servlet 2.3 specification, the setCharacterEncoding() method is specified in the javax.servlet.ServletRequest interface as the standard mechanism for specifying a nondefault character encoding for reading HTTP requests. The signature of this method is as follows:

void setCharacterEncoding(java.lang.String enc)
                          throws java.io.UnsupportedEncodingException

The enc parameter is a string specifying the name of the desired character encoding and overrides the default character encoding. Call this method before reading request parameters or reading input through the getReader() method, which is also specified in the ServletRequest interface.

There is also a corresponding getter method:

String getCharacterEncoding()
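To show why the request encoding matters for multibyte parameters, the following sketch decodes the same URL-encoded value with two different character sets. It uses the standard java.net.URLDecoder class as a stand-in for the container's parameter decoding, which is an assumption made for illustration; the ParameterDecoding class name is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Illustration only: URLDecoder stands in for the servlet container's own
// request parameter decoding, which this sketch does not reproduce exactly.
class ParameterDecoding {
    static String decodeParameter(String raw, String enc)
            throws UnsupportedEncodingException {
        return URLDecoder.decode(raw, enc);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Two Japanese characters (U+65E5 U+672C), URL-encoded as UTF-8 bytes.
        String raw = "%E6%97%A5%E6%9C%AC";

        String correct = decodeParameter(raw, "UTF-8");
        String garbled = decodeParameter(raw, "ISO-8859-1");

        System.out.println("UTF-8 length:      " + correct.length());
        System.out.println("ISO-8859-1 length: " + garbled.length());
    }
}
```

Decoding with the matching character set yields the two original characters, whereas decoding with ISO-8859-1 turns each of the six bytes into a separate character. Calling setCharacterEncoding() before reading parameters is the standard way to avoid the second outcome.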

Overview of Oracle Extensions for Older Servlet Environments

In pre-2.3 servlet environments, the setCharacterEncoding() method is not available. For such environments, Oracle provides two alternative mechanisms:

  • oracle.jsp.util.PublicUtil.setReqCharacterEncoding() static method (preferred)

  • translate_params configuration parameter (or equivalent code)