Oracle® XML Developer's Kit Programmer's Guide
10g Release 2 (10.1.2)
Part No. B14033-01

3 XML Parser for Java

This chapter contains these topics:

XML Parser for Java Overview

Oracle provides XML parsers for Java, C, C++, and PL/SQL. This chapter discusses the parser for Java only. Each of these parsers is a standalone XML component that parses an XML document (and possibly also a standalone document type definition (DTD) or XML Schema) so that your application can process it. The application examples in this chapter are written in Java.

XML Schema is a W3C recommendation that introduces data types to XML documents and replaces the syntax of DTDs with one based on XML. Checking an XML document against a DTD or XML Schema is called validation.

To use an external DTD, include a reference to the DTD in your XML document. Without it there is no way for the parser to know what to validate against. Including the reference is the XML standard way of specifying an external DTD. Otherwise you need to embed the DTD in your XML Document.

Figure 3-1 shows an XML document as input to the XML Parser for Java. The DOM or SAX parser interface parses the XML document. The parsed XML is then transferred to the application for further processing.

The XML Parser for Java includes an integrated XSL Transformation (XSLT) Processor for transforming XML data using XSL stylesheets. Using the XSLT Processor, you can transform XML documents from XML to XML, XML to HTML, or to virtually any other text-based format.

If a stylesheet is used, the DOM or SAX interface also parses and outputs the XSL commands. These are sent together with the parsed XML to the XSLT Processor where the selected stylesheet is applied and the transformed (new) XML document is then output. Figure 3-1 shows a simplified view of the XML Parser for Java.

Figure 3-1 XML Parser for Java


The XML Parser for Java processor reads XML documents and provides access to their content and structure. An XML processor does its work on behalf of another module, your application. This parsing process is illustrated in Figure 3-2.

Figure 3-2 XML Parsing Process





Namespace Support

The XML Parser for Java also supports XML Namespaces. Namespaces are a mechanism to resolve or avoid name collisions between element types (tags) or attributes in XML documents.

This mechanism provides "universal" namespace element types and attribute names. Such tags are qualified by uniform resource identifiers (URIs), such as:

<oracle:EMP xmlns:oracle="http://www.oracle.com/xml"/>

For example, namespaces can be used to identify an Oracle <EMP> data element as distinct from another company's definition of an <EMP> data element. This enables an application to more easily identify elements and attributes it is designed to process.

The XML Parser for Java can parse universal element types and attribute names, as well as unqualified "local" element types and attribute names.
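The difference between a qualified name, a local name, and a namespace URI can be sketched with the standard JAXP DOM parser that ships with the JDK. This is an illustration under that assumption, not the Oracle parser classes; the class name NamespaceSketch and the method parseRoot() are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class NamespaceSketch {

    /** Parses the XML text with namespace processing on and returns the root element. */
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // JAXP factories are not namespace-aware unless asked to be
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        Element emp = parseRoot("<oracle:EMP xmlns:oracle=\"http://www.oracle.com/xml\"/>");
        System.out.println("Qualified name: " + emp.getNodeName());
        System.out.println("Local name    : " + emp.getLocalName());
        System.out.println("Namespace URI : " + emp.getNamespaceURI());
    }
}
```

An application that only handles Oracle EMP elements could compare getNamespaceURI() and getLocalName() instead of the prefixed tag name, because the prefix itself is arbitrary.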




XML Parser for Java Validation Modes

Validation involves checking whether or not the attribute names and element tags are legal, whether nested elements belong where they are, and so on.

The DTD file named in the <!DOCTYPE> declaration must be specified relative to the location of the input XML document. If the input comes from an InputStream, use the setBaseURL(url) method to set the base URL against which the relative address of the DTD is resolved.

If you are parsing an InputStream, the parser does not know where that InputStream came from, so it cannot look for the DTD in the same directory as the current file. The solution is to call setBaseURL() on the DOMParser to give the parser a base URL from which it can derive the location of the DTD.
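The same problem and remedy can be sketched with the standard JAXP parser in the JDK, where the base is passed as a system identifier to parse() rather than through setBaseURL(). This is an analogy under that assumption, not the Oracle API; the class name BaseUrlSketch, the file names, and the company entity are hypothetical.

```java
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class BaseUrlSketch {

    /** Writes emp.dtd and emp.xml to a temporary directory, then parses emp.xml from a stream. */
    static String parseFromStream() {
        try {
            Path dir = Files.createTempDirectory("xdk-sketch");
            Path dtd = dir.resolve("emp.dtd");
            Path xml = dir.resolve("emp.xml");
            Files.write(dtd, ("<!ELEMENT EMP (#PCDATA)>\n"
                    + "<!ENTITY company \"Oracle\">\n").getBytes(StandardCharsets.UTF_8));
            Files.write(xml, ("<?xml version=\"1.0\"?>\n"
                    + "<!DOCTYPE EMP SYSTEM \"emp.dtd\">\n"
                    + "<EMP>&company;</EMP>\n").getBytes(StandardCharsets.UTF_8));

            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            try (InputStream in = Files.newInputStream(xml)) {
                // A bare stream carries no location, so the second argument supplies
                // the base against which the relative name "emp.dtd" is resolved.
                Document doc = builder.parse(in, xml.toUri().toString());
                return doc.getDocumentElement().getTextContent();
            }
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        // The entity is declared only in the external DTD, so the text can
        // be expanded only if the DTD was located through the base.
        System.out.println(parseFromStream());
    }
}
```

Without the base argument the parser would resolve emp.dtd against the current working directory and most likely fail to find it.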

XML documents are made up of storage units called entities, which contain either parsed or unparsed data. Parsed data is made up of characters, some of which form character data, and some of which form markup.

Markup encodes a description of the document's storage layout and logical structure. XML provides a mechanism to impose constraints on the storage layout and logical structure.

The parser method setValidationMode(mode) parses XML in the mode values shown in Table 3-1.

Table 3-1 XML Parser for Java Validation Modes

Name of Mode | Mode Value in Java | Description
Non-Validating Mode | NONVALIDATING | The parser verifies that the XML is well-formed and parses the data into a tree of objects that can be manipulated by the DOM API.
DTD Validating Mode | DTD_VALIDATION | The parser verifies that the XML is well-formed and validates the XML data against the DTD (if any).
Partial Validation Mode | PARTIAL_VALIDATION | Partial validation validates all or part of the input XML document according to the DTD or XML Schema, if one is present. If one is not present, the mode is set to Non-Validating Mode. With this mode, the schema validator locates and builds schemas and validates the whole or a part of the instance document based on the schemaLocation and noNamespaceSchemaLocation attributes. See code example XSDSample.java in directory /xdk/demo/java/schema.
Schema Validation Mode | SCHEMA_VALIDATION | The XML document is validated according to the XML Schema specified for the document.
Lax Validation | SCHEMA_LAX_VALIDATION | The validator tries to validate part or all of the instance document as long as it can find the schema definition. It does not raise an error if it cannot find the definition. This is shown in the sample XSDLax.java in the schema directory.
Strict Validation | SCHEMA_STRICT_VALIDATION | The validator tries to validate the whole instance document, raising errors if it cannot find the schema definition or if the instance does not conform to the definition.
Auto Validation Mode | See description. | If a DTD is available, the mode value is set to DTD_VALIDATION. If a Schema is present, it is set to SCHEMA_VALIDATION. If neither is available, it is set to the NONVALIDATING mode value, which is the default.
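The practical difference between the non-validating and DTD validating modes can be sketched with the standard JAXP parser in the JDK, where setValidating() plays a role comparable to setValidationMode(). This is an illustration under that assumption, not the Oracle API; the class name ValidationModeSketch is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class ValidationModeSketch {

    // Well-formed, but EMP lacks the ENAME child that the internal DTD requires
    static final String XML =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE EMP [\n"
        + "  <!ELEMENT EMP (ENAME)>\n"
        + "  <!ELEMENT ENAME (#PCDATA)>\n"
        + "]>\n"
        + "<EMP/>";

    /** Parses XML and returns the validity errors that the parser reported. */
    static List<String> parse(boolean validating) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validating);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage());   // validity errors are recoverable
                }
            });
            builder.parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println("non-validating: " + parse(false));
        System.out.println("validating    : " + parse(true));
    }
}
```

The same document passes in one mode and produces an error in the other, because well-formedness and validity are separate checks.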

Instead of having the validator build the schema itself, you can use XSDBuilder to build schemas and set them on the validator with the setXMLSchema() method. See code example XSDSetSchema.java. When you use the setXMLSchema() method, the validation mode is automatically set to SCHEMA_STRICT_VALIDATION, and both the schemaLocation and noNamespaceSchemaLocation attributes are ignored. You can also change the validation mode to SCHEMA_LAX_VALIDATION.
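The idea of building a schema once and handing it to a validator can be sketched with the standard javax.xml.validation package in the JDK. This is an analogy under that assumption and does not use XSDBuilder or setXMLSchema(); the class name PrebuiltSchemaSketch, its methods, and the EMP schema are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class PrebuiltSchemaSketch {

    static final String XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"EMP\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"ENAME\" type=\"xs:string\"/>"
        + "<xs:element name=\"SAL\" type=\"xs:decimal\"/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    /** Compiles the schema once; the result can be reused for many documents. */
    static Schema compile() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
    }

    /** Validates one instance document against the prebuilt schema. */
    static boolean isValid(Schema schema, String xml) {
        try {
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;   // the instance does not conform to the schema
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Schema schema = compile();
        System.out.println(isValid(schema, "<EMP><ENAME>MARY</ENAME><SAL>100.5</SAL></EMP>"));
        System.out.println(isValid(schema, "<EMP><ENAME>MARY</ENAME><SAL>lots</SAL></EMP>"));
    }
}
```

Because the schema is supplied programmatically, the instance documents need no schemaLocation or noNamespaceSchemaLocation attribute.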

Using DTDs with the XML Parser for Java

The following is a discussion of the use of DTDs. It contains the sections:

Enabling DTD Caching

DTD caching is optional and is not enabled automatically.

The XML Parser for Java provides for validating and non-validating DTD caching through the setDoctype() function. After you set the DTD using this function, XMLParser will cache this DTD for further parsing.

If your application has to parse several XML documents with the same DTD, after you parse the first XML document, you can get the DTD from parser and set it back:

dtd = parser.getDoctype();
parser.setDoctype(dtd);

The parser will cache this DTD and use it for parsing the following XML documents.

If the cached DTD object is used only for validation, set the DOMParser.USE_DTD_ONLY_FOR_VALIDATION attribute:

parser.setAttribute(DOMParser.USE_DTD_ONLY_FOR_VALIDATION,Boolean.TRUE);

Otherwise, the XML parser will copy the DTD object and add it to the result DOM tree.

The method to set the DTD is setDoctype(). Here is an example:

// Test using InputSource 
parser = new DOMParser(); 
parser.setErrorStream(System.out); 
parser.showWarnings(true); 
 
FileReader r = new FileReader(args[0]); 
InputSource inSource = new InputSource(r); 
inSource.setSystemId(createURL(args[0]).toString()); 
parser.parseDTD(inSource, args[1]); 
dtd = (DTD)parser.getDoctype(); 
 
r = new FileReader(args[2]); 
inSource = new InputSource(r); 
inSource.setSystemId(createURL(args[2]).toString()); 
// ********************
parser.setDoctype(dtd); 
// ********************
parser.setValidationMode(DOMParser.DTD_VALIDATION); 
parser.parse(inSource); 
 
doc = (XMLDocument)parser.getDocument(); 
doc.print(new PrintWriter(System.out)); 

Recognizing External DTDs

To recognize external DTDs, the XML Parser for Java has the setBaseURL() method.

The way to redirect the DTD is by using resolveEntity():

  1. Parse your External DTD using a DOM parser's parseDTD() method.

  2. Call getDoctype() to get an instance of oracle.xml.parser.v2.DTD.

  3. On the document where you want to set your DTD programmatically, use the call setDoctype(yourDTD). Use this technique to read a DTD out of your product's JAR file.
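Redirecting a DTD reference through resolveEntity() can be sketched with the standard SAX EntityResolver interface and the JAXP parser in the JDK. This is an illustration under that assumption, not the Oracle parseDTD() and setDoctype() sequence; the class name DtdRedirectSketch is hypothetical, and the in-memory DTD string stands in for a DTD read from a JAR file.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DtdRedirectSketch {

    // Stands in for a DTD that the application bundles with its own code
    static final String DTD =
          "<!ELEMENT EMP (#PCDATA)>\n"
        + "<!ENTITY company \"Oracle\">\n";

    // The document points at emp.dtd, which does not exist on disk
    static final String XML =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE EMP SYSTEM \"emp.dtd\">\n"
        + "<EMP>&company;</EMP>";

    static String parseWithRedirect() {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // The parser calls the resolver before it opens any external entity
            builder.setEntityResolver((publicId, systemId) -> {
                if (systemId != null && systemId.endsWith("emp.dtd")) {
                    return new InputSource(new StringReader(DTD));
                }
                return null;   // let the parser open every other entity itself
            });
            Document doc = builder.parse(new InputSource(new StringReader(XML)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseWithRedirect());
    }
}
```

In a real application the resolver could return an InputSource wrapped around getResourceAsStream(), which is how a DTD packaged in a JAR file is usually reached.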

Loading External DTDs from a JAR File

The parser supports a base URL method (setBaseURL()), but that just points to a place where all the DTDs are exposed.

Do the following steps:

  1. Load the DTD as an InputStream:

    InputStream is = YourClass.class.getResourceAsStream("/foo/bar/your.dtd");

    This opens ./foo/bar/your.dtd in the first relative location on the CLASSPATH that it can be found, including out of your JAR if it is in the CLASSPATH.

  2. Parse the DTD:

    DOMParser d = new DOMParser();
    d.parseDTD(is, "rootelementname");
    d.setDoctype(d.getDoctype());
    
    
  3. Parse your document:

    d.parse("yourdoc");
    

Checking the Correctness of Constructed XML Documents

No validation is done while creating the DOM tree using DOM APIs. So setting the DTD in the document does not help validate the DOM tree that is constructed. The only way to validate an XML file is to parse the XML document using the DOM parser or the SAX parser.

Parsing a DTD Object Separately from an XML Document

The parseDTD() method enables you to parse a DTD file separately and get a DTD object. Here is some sample code to do this:

DOMParser domparser = new DOMParser();
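domparser.setValidationMode(DOMParser.DTD_VALIDATION); 
/* Parse the DTD file. The second argument names the root element;
   rootName is a placeholder for your document's root element name. */
domparser.parseDTD(new FileReader(dtdfile), rootName);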
DTD dtd = domparser.getDoctype();

XML Parsers Case-Sensitivity

XML is inherently case-sensitive, therefore the parsers enforce case sensitivity in order to be compliant. When you run in non-validation mode only well-formedness counts. However <test></Test> signals an error even in non-validation mode.
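The mismatched-case example can be checked with any conforming parser. The following minimal sketch uses the standard JAXP parser in the JDK, which is non-validating by default; the class name CaseSensitivitySketch and the method isWellFormed() are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class CaseSensitivitySketch {

    /** Returns true if the text parses without a well-formedness error. */
    static boolean isWellFormed(String xml) {
        try {
            // No validation is requested, so only well-formedness is checked
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException("parser setup or I/O failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("<test></test> : " + isWellFormed("<test></test>"));
        System.out.println("<test></Test> : " + isWellFormed("<test></Test>"));
    }
}
```")) throw new AssertionError("matching tags rejected");
if (CaseSensitivitySketch.isWellFormed("<test></Test>")) throw new AssertionError("mismatched case accepted");
</test>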

Allowed File Extensions in External Entities

The file extension for external entities is unimportant so you can change it to any convenient extension, including no extension.

Creating a DOCUMENT_TYPE_NODE

There is no way to create a new DOCUMENT_TYPE_NODE object using the DOM APIs. The only way to get a DTD object is to parse the DTD file or the XML file using the DOM parser, and then use the getDoctype() method.

The following statement does not create a DTD object. It creates an XMLNode object with the type set to DOCUMENT_TYPE_NODE, which is not allowed. A ClassCastException is raised because appendChild() expects a DTD object (based on the type).

appendChild(new XMLNode("test",Node.DOCUMENT_TYPE_NODE));

Standard DTDs That Can be Used for Orders, Shipments, and So On

Basic, standard DTDs to build on for orders, shipments, and acknowledgements are found on this Web site, which has been set up for that purpose:

http://www.xml.org/

About DOM and SAX APIs

XML APIs for parsing are of two kinds: tree-based APIs, such as DOM, and event-based APIs, such as SAX.

Consider the following simple XML document:

<?xml version="1.0"?>
  <EMPLIST>
    <EMP>
     <ENAME>MARY</ENAME>
    </EMP>
    <EMP>
     <ENAME>SCOTT</ENAME>
    </EMP>
  </EMPLIST>

DOM: Tree-Based API

A tree-based API (such as DOM) builds an in-memory tree representation of the XML document. It provides classes and methods for an application to navigate and process the tree.

In general, the DOM interface is most useful for structural manipulations of the XML tree, such as reordering elements, adding or deleting elements and attributes, renaming elements, and so on. For example, for the immediately preceding XML document, the DOM creates an in-memory tree structure as shown in Figure 3-3.

SAX: Event-Based API

An event-based API (such as SAX) uses calls to report parsing events to the application. Your Java application deals with these events through customized event handlers. Events include the start and end of elements and characters.

Unlike tree-based APIs, event-based APIs usually do not build in-memory tree representations of the XML documents. Therefore, in general, SAX is useful for applications that do not need to manipulate the XML tree, such as search operations, among others. The preceding XML document becomes a series of linear events as shown in Figure 3-3.
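The contrast between the two styles can be sketched by reading the EMPLIST document both ways with the standard JAXP parsers in the JDK. This is an illustration under that assumption, not the Oracle DOMParser and SAXParser classes; the class name DomVersusSaxSketch and its methods are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class DomVersusSaxSketch {

    static final String XML =
          "<?xml version=\"1.0\"?>\n"
        + "<EMPLIST>\n"
        + "  <EMP><ENAME>MARY</ENAME></EMP>\n"
        + "  <EMP><ENAME>SCOTT</ENAME></EMP>\n"
        + "</EMPLIST>";

    /** Tree-based: the whole document is in memory and can be queried in any order. */
    static List<String> domNames() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList enames = doc.getElementsByTagName("ENAME");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < enames.getLength(); i++) {
                names.add(enames.item(i).getTextContent());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("DOM parse failed", e);
        }
    }

    /** Event-based: callbacks arrive in document order and no tree is kept. */
    static List<String> saxEvents() {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                text.setLength(0);
                events.add("start " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);   // text may arrive in several chunks
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                String value = text.toString().trim();
                if (!value.isEmpty()) {
                    events.add("text " + value);   // whitespace-only text is skipped
                }
                text.setLength(0);
                events.add("end " + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(XML)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parse failed", e);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println("DOM names : " + domNames());
        System.out.println("SAX events: " + saxEvents());
    }
}
```

The DOM method can jump straight to any ENAME element after parsing, while the SAX handler sees each element exactly once and must record whatever it needs as the events go by.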

Figure 3-3 Comparing DOM (Tree-Based) and SAX (Event-Based) APIs


Guidelines for Using DOM and SAX APIs

Here are some guidelines for using the DOM and SAX APIs:

DOM

  • Use the DOM API when you need to use random access.

  • Use DOM when you are performing XSL Transformations.

  • Use DOM when you are calling XPath. SAX does not support it.

  • Use DOM when you want to have tree iterations and need to walk through the entire document tree.

  • Customize DOM tree building: org.w3c.dom.ls.DOMBuilderFilter.

  • Avoid parsing external DTDs if no validation is required: DOMParser.setAttribute(DOMParser.STANDALONE, Boolean.TRUE);.

  • Avoid including the DTD object in DOM unless necessary: DOMParser.setAttribute(DOMParser.USE_DTD_ONLY_FOR_VALIDATION, Boolean.TRUE);.

  • Use DTD caching for DTD validations: DOMParser.setDoctype(dtd);.

  • Build DOM asynchronously using DOM 3.0 Load and Save: DOMImplementationLS.MODE_ASYNCHRONOUS.

  • A unified DOM API supports both XMLType columns and XML documents.

  • When using the DOM interface, use more attributes than elements in your XML to reduce the pipe size.
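The guideline about XPath can be sketched with the standard javax.xml.xpath package in the JDK, which evaluates expressions against an in-memory DOM tree. This is an illustration under that assumption, not the Oracle XPath classes; the class name XPathOnDomSketch and its methods are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathOnDomSketch {

    static final String XML =
          "<EMPLIST>"
        + "<EMP><ENAME>MARY</ENAME></EMP>"
        + "<EMP><ENAME>SCOTT</ENAME></EMP>"
        + "</EMPLIST>";

    static Document parse() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    /** Random access: fetch the name of the second employee directly. */
    static String secondName(Document doc) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate("/EMPLIST/EMP[2]/ENAME", doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    /** Counts the EMP elements with an XPath function call. */
    static double employeeCount(Document doc) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Double) xpath.evaluate("count(/EMPLIST/EMP)", doc, XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse();
        System.out.println("Second employee: " + secondName(doc));
        System.out.println("Employee count : " + employeeCount(doc));
    }
}
```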


    See Also:

    "DOM Specifications" for information on what is supported for this release

SAX

  • Use the SAX API when your data is mostly streaming data.

  • Use SAX to save memory. DOM consumes more memory.

  • To increase the speed of retrieval of XML documents from a database, use the SAX interface instead of DOM. Make sure to select the COUNT(*) of an indexed column (the more selective the index the better). This way the optimizer can satisfy the count query with a few inputs and outputs of the index blocks instead of a full-table scan.

  • Use SAX 2.0, because SAX 1.0 is deprecated.

  • There are output options for SAX: print formats, XML declaration, CDATA, DTD.

  • Multi-task the SAX processing to improve throughput (using multi-handlers and enabling multiple processing in callbacks). Multiple handler registrations per SAX parsing: oracle.xml.parser.v2.XMLMultiHandler.

  • Use the built-in XML serializer to simplify output creation: oracle.xml.parser.v2.XMLSAXSerializer.

About XML Compressor

The XML Compressor supports binary compression of XML documents. The compression is based on tokenizing the XML tags. The assumption is that any XML document repeats a number of tags, so tokenizing these tags yields considerable compression. The compression achieved therefore depends on the type of input document: the larger the tags and the smaller the text content, the better the compression.

The goal of compression is to reduce the size of the XML document without losing the structural and hierarchical information of the DOM tree. The compressed stream contains all the "useful" information to create the DOM tree back from the binary format. The compressed stream can also be generated from the SAX events.

XML Parser for Java can also compress XML documents. Using the compression feature, an in-memory DOM tree or the SAX events generated from an XML document are compressed to generate a binary compressed output. The compressed stream generated from DOM and SAX are compatible, that is, the compressed stream generated from SAX can be used to generate the DOM tree and vice versa.

As with XML documents in general, you can store the compressed XML data output as a BLOB (Binary Large Object) in the database.

Sample programs to illustrate the compression feature are described in Table 3-2, "XML Parser for Java Sample Programs".
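Why tokenizing tags saves space can be seen in a toy encoder. The sketch below is not the XML Compressor and does not reproduce its binary format; it is a simplified illustration that assumes elements without attributes and text without markup characters, and it is built on the standard JAXP SAX parser in the JDK. The class name TagTokenSketch, its methods, and the byte layout are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class TagTokenSketch {
    private static final int START = 1, END = 2, TEXT = 3;

    /** Encodes SAX events so that each tag name is spelled out only once. */
    static byte[] encode(String xml) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        Map<String, Integer> tokens = new HashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts)
                    throws SAXException {
                try {
                    Integer token = tokens.get(qName);
                    out.writeByte(START);
                    if (token == null) {              // first use: define the token
                        out.writeShort(tokens.size());
                        out.writeUTF(qName);
                        tokens.put(qName, tokens.size());
                    } else {                          // later uses: two bytes only
                        out.writeShort(token);
                    }
                } catch (IOException e) { throw new SAXException(e); }
            }
            @Override
            public void characters(char[] ch, int start, int length) throws SAXException {
                try {
                    out.writeByte(TEXT);
                    out.writeUTF(new String(ch, start, length));
                } catch (IOException e) { throw new SAXException(e); }
            }
            @Override
            public void endElement(String uri, String local, String qName) throws SAXException {
                try { out.writeByte(END); } catch (IOException e) { throw new SAXException(e); }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) { throw new IllegalStateException("encode failed", e); }
        return bytes.toByteArray();
    }

    /** Rebuilds the markup from the token stream. */
    static String decode(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            List<String> names = new ArrayList<>();
            Deque<String> open = new ArrayDeque<>();
            StringBuilder xml = new StringBuilder();
            while (in.available() > 0) {
                int kind = in.readByte();
                if (kind == START) {
                    int index = in.readShort();
                    if (index == names.size()) names.add(in.readUTF());   // token definition
                    open.push(names.get(index));
                    xml.append('<').append(names.get(index)).append('>');
                } else if (kind == END) {
                    xml.append("</").append(open.pop()).append('>');
                } else {
                    xml.append(in.readUTF());
                }
            }
            return xml.toString();
        } catch (IOException e) { throw new IllegalStateException("decode failed", e); }
    }

    /** Builds a repetitive sample document with the given number of records. */
    static String sample(int records) {
        StringBuilder xml = new StringBuilder("<EMPLIST>");
        for (int i = 0; i < records; i++) xml.append("<EMPLOYEE><ENAME>MARY</ENAME></EMPLOYEE>");
        return xml.append("</EMPLIST>").toString();
    }

    public static void main(String[] args) {
        String xml = sample(50);
        byte[] packed = encode(xml);
        System.out.println("original characters: " + xml.length() + ", tokenized bytes: " + packed.length);
    }
}
```

Each repeated start tag costs three bytes in this layout regardless of the length of its name, and end tags cost one byte, which is why documents with long tags and little text shrink the most.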

XML Serialization and Compression

An XML document is compressed into a binary stream by means of the serialization of an in-memory DOM tree. When a large XML document is parsed and a DOM tree is created in memory corresponding to it, it may be difficult to satisfy memory requirements and this can affect performance. The XML document is compressed into a byte stream and stored in an in-memory DOM tree. This can be expanded at a later time into a DOM tree without performing validation on the XML data stored in the compressed stream.

The compressed stream can be treated as a serialized stream, but the information in the stream is more controlled and managed, compared to the compression implemented by Java's default serialization.

There are two kinds of XML compressed streams:

  • DOM based compression: The in-memory DOM tree, corresponding to a parsed XML document, is serialized, and a compressed XML output stream is generated. This serialized stream regenerates the DOM tree when read back.

  • SAX based compression: The compressed stream is generated when an XML file is parsed using a SAX parser. SAX events generated by the SAX parser are handled by the SAX compression utility, which handles the SAX events to generate a compressed binary stream. When the binary stream is read back, the SAX events are generated.


    Note:

    Oracle Text cannot search a compressed XML document. Decompression reduces performance. If you are transferring files between client and server, then HTTP compression can be easier.

    Compression is supported only in the XDK Java components.


Running the Sample Applications for XML Parser for Java

The directory demo/java/parser contains some sample XML applications to show how to use the XML Parser for Java. The following are the sample Java files in its subdirectories (common, comp, dom, jaxp, sax, xslt):

Table 3-2 XML Parser for Java Sample Programs

Sample Program | Purpose
XSLSample | A sample application using XSL APIs
DOMSample | A sample application using DOM APIs
DOMNamespace | A sample application using Namespace extensions to DOM APIs
DOM2Namespace | A sample application using DOM Level 2.0 APIs
DOMRangeSample | A sample application using DOM Range APIs
EventSample | A sample application using DOM Event APIs
NodeIteratorSample | A sample application using DOM Iterator APIs
TreeWalkerSample | A sample application using DOM TreeWalker APIs
SAXSample | A sample application using SAX APIs
SAXNamespace | A sample application using Namespace extensions to SAX APIs
SAX2Namespace | A sample application using SAX 2.0
Tokenizer | A sample application using XMLToken interface APIs
DOMCompression | A sample application to compress a DOM tree
DOMDeCompression | A sample to read back a DOM from a compressed stream
SAXCompression | A sample application to compress the SAX output from a SAX Parser
SAXDeCompression | A sample application to regenerate the SAX events from the compressed stream
JAXPExamples | Samples using the JAXP 1.1 API

The Tokenizer application implements the XMLToken interface, which you must register using the setTokenHandler() method. A request for the XML tokens is registered using the setToken() method. During tokenizing, the parser does not validate the document and does not include or read internal or external entities.

To run the sample programs:

  1. Use make (for UNIX) or make.bat (for Windows) in the directory xdk/demo/java to generate .class files.

  2. Add xmlparserv2.jar and the current directory to the CLASSPATH.

The following list does not have to be done in order, except for decompressing:

Using XML Parser for Java: DOMParser Class

To write DOM-based parser applications you can use the following classes:

Since DOMParser extends XMLParser, all methods of XMLParser are also available to DOMParser. Figure 3-4, "XML Parser for Java: DOMParser()" shows the main steps you need when coding with the DOMParser class.

Without DTD Input

In some applications, it is not necessary to validate the XML document. In this case, a DTD is not required.

  1. A new DOMParser() is called. Some of the methods to use with this object are:

    • setValidationMode()

    • setPreserveWhiteSpace()

    • setDoctype()

    • setBaseURL()

    • showWarnings()

  2. The results of DOMParser() are passed to XMLParser.parse() along with the XML input. The XML input can be a file, a string buffer, or URL.

  3. Use the XMLParser.getDocument() method.

  4. Optionally, you can apply other DOM methods such as:

    • print()

    • DOMNamespace() methods

  5. The Parser outputs the DOM tree XML (parsed) document.

  6. Optionally, use DOMParser.reset() to clean up any internal data structures, once the DOM API has finished building the DOM tree.

With a DTD Input

If validation of the input XML document is required, a DTD is used.

  1. A new DOMParser() is called. The methods to apply to this object are:

    • setValidationMode()

    • setPreserveWhiteSpace()

    • setDoctype()

    • setBaseURL()

    • showWarnings()

  2. The results of DOMParser() are passed to XMLParser.parseDTD() method along with the DTD input.

  3. The XMLParser.getDoctype() method sends the resulting DTD object back to the new DOMParser() and the process continues until the DTD has been applied.

Figure 3-4 XML Parser for Java: DOMParser()


Comments on Example 1: DOMSample.java

These comments are for Example 1: DOMSample.java which follows immediately after this section.

  1. Declare a new DOMParser() instance:

    DOMParser parser = new DOMParser();
    
    
  2. The XML input is a URL generated from the input filename:

    URL url = DemoUtil.createURL(argv[0]);
    
    
  3. The DOMParser class has several methods you can use. The example uses:

    parser.setErrorStream(System.err);
    parser.setValidationMode(DOMParser.DTD_VALIDATION);
    parser.showWarnings(true);
    
    
  4. The input document is parsed:

    parser.parse(url);
    
    
  5. The DOM tree document is obtained:

    XMLDocument doc = parser.getDocument();
    
    
  6. This program applies the node class methods:

    • getElementsByTagName()

    • getTagName()

    • getAttributes()

    • getNodeName()

    • getNodeValue()

  7. The attributes of each element are printed.


    Note:

    No DTD input is shown in DOMSample.java.

XML Parser for Java Example 1: DOMSample.java

This example shows the Java code that uses the preceding steps.

/* Copyright (c) Oracle Corporation 2000, 2001. All Rights Reserved. */

/**
 * DESCRIPTION
 * This file demonstrates a simple use of the parser and DOM API.
 * The XML file that is given to the application is parsed and the
 * elements and attributes in the document are printed.
 * The use of setting the parser options is demonstrated.
 */

import java.net.URL;

import org.w3c.dom.Node;
import org.w3c.dom.Element;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.NamedNodeMap;

import oracle.xml.parser.v2.DOMParser;
import oracle.xml.parser.v2.XMLDocument;

public class DOMSample
{
   static public void main(String[] argv)
   {
      try
      {
         if (argv.length != 1) 
         {
            // Must pass in the name of the XML file.
            System.err.println("Usage: java DOMSample filename");
            System.exit(1);
         }

         // Get an instance of the parser
         DOMParser parser = new DOMParser();

         // Generate a URL from the filename.
         URL url = DemoUtil.createURL(argv[0]);

         // Set various parser options: validation on,
         // warnings shown, error stream set to stderr.
         parser.setErrorStream(System.err);
         parser.setValidationMode(DOMParser.DTD_VALIDATION);
         parser.showWarnings(true);

         // Parse the document.
         parser.parse(url);

         // Obtain the document.
         XMLDocument doc = parser.getDocument();

         // Print document elements
         System.out.print("The elements are: ");
         printElements(doc);

         // Print document element attributes
         System.out.println("The attributes of each element are: ");
         printElementAttributes(doc);
      }
      catch (Exception e)
      {
         System.out.println(e.toString());
      }
   }

   static void printElements(Document doc)
   {
      NodeList nl = doc.getElementsByTagName("*");
      Node n;
         
      for (int i=0; i<nl.getLength(); i++)
      {
         n = nl.item(i);
         System.out.print(n.getNodeName() + " ");
      }

      System.out.println();
   }

   static void printElementAttributes(Document doc)
   {
      NodeList nl = doc.getElementsByTagName("*");
      Element e;
      Node n;
      NamedNodeMap nnm;

      String attrname;
      String attrval;
      int i, len;

      len = nl.getLength();

      for (int j=0; j < len; j++)
      {
         e = (Element)nl.item(j);
         System.out.println(e.getTagName() + ":");
         nnm = e.getAttributes();

         if (nnm != null)
         {
            for (i=0; i<nnm.getLength(); i++)
            {
               n = nnm.item(i);
               attrname = n.getNodeName();
               attrval = n.getNodeValue();
               System.out.print(" " + attrname + " = " + attrval);
            }
         }
         System.out.println();
      }
   }
}

Using XML Parser for Java: DOMNamespace Class

Figure 3-4 illustrates the main processes involved when parsing an XML document using the DOM interface. The following example illustrates how to use the DOMNamespace class:

XML Parser for Java Example 2: Parsing a URL — DOMNamespace.java

See the comments in this source code for a guide to the use of methods. The program begins with these comments:

/**
 * DESCRIPTION
 * This file demonstrates a simple use of the parser and Namespace
 * extensions to the DOM APIs. 
 * The XML file that is given to the application is parsed and the
 * elements and attributes in the document are printed.
 */

The methods used on XMLElement from the NSName interface, which provides Namespace support for element and attribute names, are:

  • getQualifiedName() returns the qualified name

  • getLocalName() returns the local name

  • getNamespace() returns the resolved Namespace for the name

  • getExpandedName() returns the fully resolved name.

Here is how they are used later in the code:

         // Use the methods getQualifiedName(), getLocalName(), getNamespace()
         // and getExpandedName() in NSName interface to get Namespace
         // information.
         
         qName = nsElement.getQualifiedName();
         System.out.println("  ELEMENT Qualified Name:" + qName);
         
         localName = nsElement.getLocalName();
         System.out.println("  ELEMENT Local Name    :" + localName);
         
         nsName = nsElement.getNamespace();
         System.out.println("  ELEMENT Namespace     :" + nsName);
         
         expName = nsElement.getExpandedName();
         System.out.println("  ELEMENT Expanded Name :" + expName);
      }
      

For the attributes, the method getNodeValue() returns the value of this node, depending on its type. Here is another excerpt from later in this program:

         nnm = e.getAttributes();

         if (nnm != null)
         {
            for (i=0; i < nnm.getLength(); i++)
            {
               nsAttr = (XMLAttr) nnm.item(i);

               // Use the methods getExpandedName(), getQualifiedName(),
               // getNodeValue() in NSName 
               // interface to get Namespace information.

               attrname = nsAttr.getExpandedName();
               attrqname = nsAttr.getQualifiedName();
               attrval = nsAttr.getNodeValue();

No DTD input is shown in DOMNamespace.java.

Using XML Parser for Java: SAXParser Class

Applications can register a SAX handler to receive notification of various parser events. XMLReader is the interface that an XML parser's SAX2 driver must implement. This interface enables an application to set and query features and properties in the parser, to register event handlers for document processing, and to initiate a document parse.

All SAX interfaces are assumed to be synchronous: the parse methods must not return until parsing is complete, and readers must wait for an event-handler callback to return before reporting the next event.

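This interface replaces the (now deprecated) SAX 1.0 Parser interface. The XMLReader interface contains two important enhancements over the old parser interface: it adds a standard way to query and set features and properties, and it adds Namespace support, which is required for many higher-level XML standards.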

Table 3-3 lists the SAXParser methods.

Table 3-3 SAXParser Methods

Method | Description
getContentHandler() | Returns the current content handler
getDTDHandler() | Returns the current DTD handler
getEntityResolver() | Returns the current entity resolver
getErrorHandler() | Returns the current error handler
getFeature(java.lang.String name) | Looks up the value of a feature
getProperty(java.lang.String name) | Looks up the value of a property
setContentHandler(ContentHandler handler) | Enables an application to register a content event handler
setDocumentHandler(DocumentHandler handler) | Deprecated as of SAX 2.0; replaced by setContentHandler()
setDTDHandler(DTDHandler handler) | Enables an application to register a DTD event handler
setEntityResolver(EntityResolver resolver) | Enables an application to register an entity resolver
setErrorHandler(ErrorHandler handler) | Enables an application to register an error event handler
setFeature(java.lang.String name, boolean value) | Sets the state of a feature
setProperty(java.lang.String name, java.lang.Object value) | Sets the value of a property
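Several of these methods can be seen working together in a short SAX 2.0 sketch. It obtains an XMLReader from the standard JAXP factory in the JDK rather than from the Oracle SAXParser class, so it is an illustration under that assumption; the class name Sax2ReaderSketch and the method countElements() are hypothetical.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class Sax2ReaderSketch {

    // Standard SAX 2.0 feature name for namespace processing
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    /** Counts elements by expanded name, written as {namespace-URI}local-name. */
    static Map<String, Integer> countElements(String xml) {
        Map<String, Integer> counts = new TreeMap<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                counts.merge("{" + uri + "}" + localName, 1, Integer::sum);
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();

            // Features are queried by name, as with getFeature() in the table
            System.out.println("namespaces feature: " + reader.getFeature(NAMESPACES));

            reader.setContentHandler(handler);   // replaces SAX 1.0 setDocumentHandler()
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("SAX parse failed", e);
        }
        return counts;
    }

    public static void main(String[] args) {
        String xml = "<o:EMPLIST xmlns:o=\"http://www.oracle.com/xml\">"
                   + "<o:EMP/><o:EMP/></o:EMPLIST>";
        System.out.println(countElements(xml));
    }
}
```

The handler never sees the o: prefix in its keys, because the SAX 2.0 startElement() callback reports the namespace URI and local name separately from the qualified name.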

Figure 3-5 shows the main steps for coding with the SAXParser class.

  1. Create a new handler for the parser:

    SAXSample sample = new SAXSample();
    
    
  2. Declare a new SAXParser() object. Table 3-3 lists all the available methods.

    Parser parser = new SAXParser();
    
    
  3. Set validation mode as DTD_VALIDATION.

  4. Convert the input file to URL and parse:

    parser.parse(DemoUtil.createURL(argv[0]).toString());
    
    
  5. Parse methods return when parsing completes. Meanwhile the process waits for an event-handler callback to return before reporting the next event.

  6. The parsed XML document is available for output by this application. Interfaces used are:

    • DocumentHandler

    • EntityResolver

    • DTDHandler

    • ErrorHandler

Figure 3-5 Using SAXParser Class


XML Parser for Java Example 3: Using the Parser and SAX API (SAXSample.java)

This example illustrates how you can use SAXParser class and several handler interfaces. See the comments in this source code for a guide to the use of methods.

SAX is a standard interface for event-based XML parsing. The parser reports parsing events directly through callback functions such as setDocumentLocator() and startDocument(). This application uses handlers to deal with the different events.

/* Copyright (c) Oracle Corporation 2000, 2001. All Rights Reserved. */

/**
 * DESCRIPTION
 * This file demonstrates a simple use of the parser and SAX API.
 * The XML file that is given to the application is parsed and 
 * prints out some information about the contents of this file.
 */

import java.net.URL;

import org.xml.sax.Parser;
import org.xml.sax.Locator;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

import oracle.xml.parser.v2.SAXParser;

public class SAXSample extends HandlerBase
{
   // Store the locator
   Locator locator;

   static public void main(String[] argv)
   {
      try
      {
         if (argv.length != 1)
         {
            // Must pass in the name of the XML file.
            System.err.println("Usage: SAXSample filename");
            System.exit(1);
         }
         // Create a new handler for the parser
         SAXSample sample = new SAXSample();

         // Get an instance of the parser
         Parser parser = new SAXParser();
         
         // set validation mode
         ((SAXParser)parser).setValidationMode(SAXParser.DTD_VALIDATION);
         // Set Handlers in the parser
         parser.setDocumentHandler(sample);
         parser.setEntityResolver(sample);
         parser.setDTDHandler(sample);
         parser.setErrorHandler(sample);
    
         // Convert file to URL and parse
         try
         {
            parser.parse(DemoUtil.createURL(argv[0]).toString());
         }
         catch (SAXParseException e) 
         {
            System.out.println(e.getMessage());
         }
         catch (SAXException e) 
         {
            System.out.println(e.getMessage());
         }  
      }
      catch (Exception e)
      {
         System.out.println(e.toString());
      }
   }

   //////////////////////////////////////////////////////////////////////
   // Sample implementation of DocumentHandler interface.
   //////////////////////////////////////////////////////////////////////

   public void setDocumentLocator (Locator locator)
   {
      System.out.println("SetDocumentLocator:");
      this.locator = locator;
   }

   public void startDocument() 
   {
      System.out.println("StartDocument");
   }

   public void endDocument() throws SAXException 
   {
      System.out.println("EndDocument");
   }
      
   public void startElement(String name, AttributeList atts) 
                                                  throws SAXException 
   {
      System.out.println("StartElement:"+name);
      for (int i=0;i<atts.getLength();i++)
      {
         String aname = atts.getName(i);
         String type = atts.getType(i);
         String value = atts.getValue(i);

         System.out.println("   "+aname+"("+type+")"+"="+value);
      }
      
   }

   public void endElement(String name) throws SAXException 
   {
      System.out.println("EndElement:"+name);
   }

   public void characters(char[] cbuf, int start, int len) 
   {
      System.out.print("Characters:");
      System.out.println(new String(cbuf,start,len));
   }

   public void ignorableWhitespace(char[] cbuf, int start, int len) 
   {
      System.out.println("IgnorableWhiteSpace");
   }
   
   
   public void processingInstruction(String target, String data) 
              throws SAXException 
   {
      System.out.println("ProcessingInstruction:"+target+" "+data);
   }
   

      
   //////////////////////////////////////////////////////////////////////
   // Sample implementation of the EntityResolver interface.
   //////////////////////////////////////////////////////////////////////


   public InputSource resolveEntity (String publicId, String systemId)
                      throws SAXException
   {
      System.out.println("ResolveEntity:"+publicId+" "+systemId);
      System.out.println("Locator:"+locator.getPublicId()+" "+
                  locator.getSystemId()+
                  " "+locator.getLineNumber()+" "+locator.getColumnNumber());
      return null;
   }

   //////////////////////////////////////////////////////////////////////
   // Sample implementation of the DTDHandler interface.
   //////////////////////////////////////////////////////////////////////

   public void notationDecl (String name, String publicId, String systemId)
   {
      System.out.println("NotationDecl:"+name+" "+publicId+" "+systemId);
   }

   public void unparsedEntityDecl (String name, String publicId,
         String systemId, String notationName)
   {
      System.out.println("UnparsedEntityDecl:"+name + " "+publicId+" "+
                         systemId+" "+notationName);
   }

   //////////////////////////////////////////////////////////////////////
   // Sample implementation of the ErrorHandler interface.
   //////////////////////////////////////////////////////////////////////


   public void warning (SAXParseException e)
              throws SAXException
   {
      System.out.println("Warning:"+e.getMessage());
   }

   public void error (SAXParseException e)
              throws SAXException
   {
      throw new SAXException(e.getMessage());
   }


   public void fatalError (SAXParseException e)
              throws SAXException
   {
      System.out.println("Fatal error");
      throw new SAXException(e.getMessage());
   }

}

XML Parser for Java Example 4: (SAXNamespace.java)

See the comments in this source code for use of the SAX APIs.

/* Copyright (c) Oracle Corporation 2000, 2001. All Rights Reserved. */

/**
 * DESCRIPTION
 * This file demonstrates a simple use of the Namespace extensions to 
 * the SAX 1.0 APIs.
 */

import java.net.URL;

import org.xml.sax.HandlerBase;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Extensions to the SAX Interfaces for Namespace support.
import oracle.xml.parser.v2.XMLDocumentHandler;
import oracle.xml.parser.v2.DefaultXMLDocumentHandler;
import oracle.xml.parser.v2.NSName;
import oracle.xml.parser.v2.SAXAttrList;

import oracle.xml.parser.v2.SAXParser;

public class SAXNamespace {

  static public void main(String[] args) {
     
     String fileName;

     //Get the file name
     
     if (args.length == 0)
     {
        System.err.println("No file Specified!!!");
        System.err.println("USAGE: java SAXNamespace <filename>");
        return;
     }
     else
     {
        fileName = args[0];
     }
     

     try {

        // Create handlers for the parser

        // Use the XMLDocumentHandler interface for namespace support 
        // instead of org.xml.sax.DocumentHandler
        XMLDocumentHandler xmlDocHandler = new XMLDocumentHandlerImpl();

        // For all the other interfaces, use the defaults provided by
        // HandlerBase
        HandlerBase defHandler = new HandlerBase();

        // Get an instance of the parser
        SAXParser parser = new SAXParser();
           
        // set validation mode
        ((SAXParser)parser).setValidationMode(SAXParser.DTD_VALIDATION);

        // Set Handlers in the parser
        // Set the DocumentHandler to XMLDocumentHandler
        parser.setDocumentHandler(xmlDocHandler);

        // Set the other Handler to the defHandler
        parser.setErrorHandler(defHandler);
        parser.setEntityResolver(defHandler);
        parser.setDTDHandler(defHandler);
           
        try 
        {
           parser.parse(DemoUtil.createURL(fileName).toString());
        }
        catch (SAXParseException e) 
        {
           System.err.println(args[0] + ": " + e.getMessage());
        }
        catch (SAXException e) 
        {
           System.err.println(args[0] + ": " + e.getMessage());
        }  
     }
     catch (Exception e) 
     {
        System.err.println(e.toString());
     }
  }

}

/***********************************************************************
 Implementation of XMLDocumentHandler interface. Only the new
 startElement and endElement interfaces are implemented here. All other
 interfaces are implemented in the class HandlerBase.
**********************************************************************/

class XMLDocumentHandlerImpl extends DefaultXMLDocumentHandler
{

   public XMLDocumentHandlerImpl()
   {
   }

      
   public void startElement(NSName name, SAXAttrList atts) throws SAXException 
   {

      // Use the methods getQualifiedName(), getLocalName(), getNamespace()
      // and getExpandedName() in NSName interface to get Namespace
      // information.

      String qName;
      String localName;
      String nsName;
      String expName;

      qName = name.getQualifiedName();
      System.out.println("ELEMENT Qualified Name:" + qName);

      localName = name.getLocalName();
      System.out.println("ELEMENT Local Name    :" + localName);

      nsName = name.getNamespace();
      System.out.println("ELEMENT Namespace     :" + nsName);

      expName = name.getExpandedName();
      System.out.println("ELEMENT Expanded Name :" + expName);

      for (int i=0; i<atts.getLength(); i++)
      {

      // Use the methods getQualifiedName(), getLocalName(), getNamespace()
      // and getExpandedName() in SAXAttrList interface to get Namespace
      // information.

         qName = atts.getQualifiedName(i);
         localName = atts.getLocalName(i);
         nsName = atts.getNamespace(i);
         expName = atts.getExpandedName(i);

         System.out.println(" ATTRIBUTE Qualified Name   :" + qName);
         System.out.println(" ATTRIBUTE Local Name       :" + localName);
         System.out.println(" ATTRIBUTE Namespace        :" + nsName);
         System.out.println(" ATTRIBUTE Expanded Name    :" + expName);


         // You can get the type and value of the attributes either
         // by index or by the Qualified Name.

         String type = atts.getType(qName);
         String value = atts.getValue(qName);

         System.out.println(" ATTRIBUTE Type             :" + type);
         System.out.println(" ATTRIBUTE Value            :" + value);

         System.out.println();

      }      
   }

  public void endElement(NSName name) throws SAXException 
   {
      // Use the methods getQualifiedName(), getLocalName(), getNamespace()
      // and getExpandedName() in NSName interface to get Namespace
      // information.

      String expName = name.getExpandedName();
      System.out.println("ELEMENT Expanded Name  :" + expName);
   }
   
}

Using the XML Parser for Java

Here are some helpful hints for using the XML Parser for Java. This section contains these topics:

Using DOM and SAX APIs for Java

Here is some further information about the DOM and SAX APIs.

Using the DOM API to Count Tagged Elements

To count the elements with a particular tag name using the parser, you can use the getElementsByTagName() method, which returns a node list of all descendant elements with a given tag name. The length of that node list is the number of elements with that tag name.
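
As a minimal sketch of this counting technique, the following code uses the standard org.w3c.dom interfaces through the JAXP DocumentBuilder shipped with the JDK; the Oracle DOMParser is assumed to expose the same getElementsByTagName() call on the document it returns. The class name TagCounter and its count() helper are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class TagCounter {

    /** Returns how many descendant elements of the document have the given tag name. */
    static int count(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // The node list holds every descendant element with this tag name
            NodeList matches = doc.getElementsByTagName(tagName);
            return matches.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<family><member/><member/><pet/></family>";
        System.out.println(count(xml, "member"));
    }
}
```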

Creating a Node with a Value to Be Set Later

If you check the DOM specification, referring to the table discussing the node type, you will find that if you are creating an element node, its node value is null, and cannot be set. However, you can create a text node and append it to the element node. You can then put the value in the text node.
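
A short sketch of this approach, written against the standard DOM interfaces in the JDK rather than the Oracle classes: the element is created first, an empty text node is appended to it, and the value is placed in the text node later. The class name DeferredValueSketch and the element name "name" are illustrative assumptions.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

final class DeferredValueSketch {

    /** Builds <name>…</name>, setting the text only after the nodes exist. */
    static String build(String laterValue) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element name = doc.createElement("name");
            doc.appendChild(name);

            // The element itself has a null node value, so attach a text node
            Text text = doc.createTextNode("");
            name.appendChild(text);

            // Later, when the value is known, put it in the text node
            text.setNodeValue(laterValue);
            return name.getFirstChild().getNodeValue();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("macy"));
    }
}
```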

Traversing the XML Tree Using XPATH

You can traverse the tree by using the DOM API. Alternatively, you can use the selectNodes() method, which takes an XPath expression, to navigate through the XML document. selectNodes() is part of oracle.xml.parser.v2.XMLNode.
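
For comparison, the same kind of XPath navigation can be sketched with the javax.xml.xpath package of the JDK. This is not the Oracle selectNodes() method; it is a stand-in that assumes only the standard API. The class name XPathSelectSketch, the select() helper, and the sample family document are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathSelectSketch {

    /** Returns the node values selected by the XPath expression, in document order. */
    static List<String> select(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getNodeValue());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<family>"
                + "<member memberid='m1'>Sarah</member>"
                + "<member memberid='m2'>Bob</member>"
                + "</family>";
        System.out.println(select(xml, "/family/member/text()"));
        System.out.println(select(xml, "//member/@memberid"));
    }
}
```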

Finding the First Child Node Element Value

Here is how to efficiently obtain the value of the first child node of an element without going through the DOM tree. If you do not need the entire tree, use the SAX interface to return the desired data. Because SAX is event-driven, your handler receives the data as soon as the parser reaches it and does not have to wait for the whole document to be parsed.
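
One way to sketch this with the SAX2 classes in the JDK is a handler that collects the first text of the first matching element and then stops the parse by throwing an exception. The early stop is an assumption of this sketch, not something the Oracle samples do, and it assumes that the first child of the element is a text node. The names FirstValueSketch, FirstTextHandler, and firstText() are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class FirstValueSketch {

    /** Collects the first text node of the first element with the given name. */
    static final class FirstTextHandler extends DefaultHandler {
        private final String elementName;
        private final StringBuilder text = new StringBuilder();
        private boolean inside;
        boolean done;

        FirstTextHandler(String elementName) {
            this.elementName = elementName;
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) throws SAXException {
            if (inside) stop();   // a child element ends the first text node
            inside = qName.equals(elementName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inside) text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            if (inside) stop();
        }

        private void stop() throws SAXException {
            done = true;
            throw new SAXException("value found, stopping early");
        }

        String value() {
            return text.toString();
        }
    }

    static String firstText(String xml, String elementName) {
        FirstTextHandler handler = new FirstTextHandler(elementName);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Only the deliberate early stop is swallowed
            if (!handler.done) throw new IllegalStateException("parse failed", e);
        }
        return handler.value();
    }

    public static void main(String[] args) {
        System.out.println(
            firstText("<doc><name>macy</name><name>other</name></doc>", "name"));
    }
}
```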

Using the XMLNode.selectNodes() Method

The selectNodes() method is available on XMLElement and XMLDocument nodes. It extracts content from the tree or subtree based on the select patterns allowed by XSL. The optional second parameter of selectNodes() is used to resolve namespace prefixes (that is, it returns the expanded namespace URI given a prefix). XMLElement implements NSResolver, so it can be sent as the second parameter. XMLElement resolves the prefixes based on the input document. You can supply your own implementation of the NSResolver interface if you need to override the namespace definitions. The following sample code uses selectNodes().

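import org.w3c.dom.NodeList;
import oracle.xml.parser.v2.DOMParser;
import oracle.xml.parser.v2.XMLDocument;
import oracle.xml.parser.v2.XMLElement;

public class SelectNodesTest {
   public static void main(String[] args) throws Exception {
      // Default select pattern; an optional second argument overrides it
      String pattern = "/family/member/text()";
      String file    = args[0];

      if (args.length == 2)
         pattern = args[1];

      DOMParser dp = new DOMParser();

      dp.parse(createURL(file));  // Include createURL from DOMSample
      XMLDocument xd = dp.getDocument();
      XMLElement e = (XMLElement) xd.getDocumentElement();
      // e is both the context node and the NSResolver for prefixes
      NodeList nl = e.selectNodes(pattern, e);
      for (int i = 0; i < nl.getLength(); i++) {
         System.out.println(nl.item(i).getNodeValue());
      }
   }
}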

> java SelectNodesTest family.xml
Sarah
Bob
Joanne
Jim

> java SelectNodesTest family.xml //member/@memberid
m1
m2
m3
m4

Generating an XML Document from Data in Variables

Here is an example of XML document generation starting from information contained in simple variables, such as when a client fills in a Java form and wants to obtain an XML document.

If you have two variables in Java:

String firstname = "Gianfranco";
String lastname = "Pietraforte";

The two ways to get this information into an XML document are as follows:

  1. Make an XML document in a string and parse it:

    String xml = "<person><first>"+firstname+"</first>"+
         "<last>"+lastname+"</last></person>";
    DOMParser d = new DOMParser();
    d.parse( new StringReader(xml));
    Document xmldoc = d.getDocument();
    
    
  2. Use DOM APIs to construct the document and append it together:

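    Document xmldoc = new XMLDocument();
    Element e1 = xmldoc.createElement("person");
    xmldoc.appendChild(e1);
    Element e2 = xmldoc.createElement("first");
    e1.appendChild(e2);
    Text t = xmldoc.createTextNode(firstname);
    e2.appendChild(t);
    // Repeat for the last name
    Element e3 = xmldoc.createElement("last");
    e1.appendChild(e3);
    e3.appendChild(xmldoc.createTextNode(lastname));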
    

Using the DOM API to Print Data in the Element Tags

For DOM, <name>macy</name> is actually an element named name with a child node (Text Node) of value macy. The sample code is:

String value = myElement.getFirstChild().getNodeValue();

Building XML Files from Hash Table Value Pairs

Suppose you have a hash table with the entries key = value, name = george, and zip = 20000, and you want to generate the following XML:

<key>value</key><name>george</name><zip>20000</zip>

  1. Get the enumeration of keys from your hash table.

  2. Loop while enum.hasMoreElements().

  3. For each key in the enumeration, use the createElement() method of the DOM document to create an element named after the key, with a child text node that holds the hash table value for that key.
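
The steps above can be sketched as follows with the standard DOM interfaces in the JDK. Because a DOM document needs a single root element, this sketch wraps the generated elements in a hypothetical <entries> element that does not appear in the text above; the class name HashtableToXml and the toDocument() helper are also assumptions. Note that a Hashtable does not guarantee the order of its keys.

```java
import java.util.Enumeration;
import java.util.Hashtable;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class HashtableToXml {

    /** Creates one child element per hash table entry under a wrapper root. */
    static Document toDocument(Hashtable<String, String> table) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("entries");  // hypothetical wrapper
            doc.appendChild(root);

            // Step 1: get the enumeration of keys
            Enumeration<String> keys = table.keys();
            // Step 2: loop while there are more keys
            while (keys.hasMoreElements()) {
                String key = keys.nextElement();
                // Step 3: element named after the key, text node with the value
                Element entry = doc.createElement(key);
                entry.appendChild(doc.createTextNode(table.get(key)));
                root.appendChild(entry);
            }
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation", e);
        }
    }

    public static void main(String[] args) {
        Hashtable<String, String> table = new Hashtable<>();
        table.put("name", "george");
        table.put("zip", "20000");
        Document doc = toDocument(table);
        System.out.println(
            doc.getElementsByTagName("zip").item(0).getTextContent());
    }
}
```

The keys must be valid XML names for createElement() to accept them.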

DOM Exception WRONG_DOCUMENT_ERR on Node.appendChild()

If you have the following code snippet:

  Document doc1 = new XMLDocument();
  Element element1 = doc1.createElement("foo");
  Document doc2 = new XMLDocument();
  Element element2 = doc2.createElement("bar");
  element1.appendChild(element2);  

You will get a DOM exception of WRONG_DOCUMENT_ERR on calling the appendChild() routine, since the owner document of element1 is doc1 while that of element2 is doc2. appendChild() only works within a single tree, and the example uses two different ones. Use importNode() or adoptNode() to bring the node into the target document before appending it.
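
The following sketch reproduces the failure and the importNode() remedy with the DOM implementation bundled with the JDK, on the assumption that the Oracle XMLDocument behaves the same way for this standard call. The class name ImportNodeSketch and its helper methods are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class ImportNodeSketch {

    private static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation", e);
        }
    }

    /** Returns true if appending a node owned by another document is rejected. */
    static boolean crossDocumentAppendFails() {
        Element element1 = newDocument().createElement("foo");
        Element element2 = newDocument().createElement("bar");
        try {
            element1.appendChild(element2);
            return false;
        } catch (DOMException e) {
            return e.code == DOMException.WRONG_DOCUMENT_ERR;
        }
    }

    /** Imports the foreign node into doc1 first, then appends the copy. */
    static String importThenAppend() {
        Document doc1 = newDocument();
        Element element1 = doc1.createElement("foo");
        doc1.appendChild(element1);
        Element element2 = newDocument().createElement("bar");

        Node copy = doc1.importNode(element2, true);  // deep copy owned by doc1
        element1.appendChild(copy);
        return element1.getFirstChild().getNodeName();
    }

    public static void main(String[] args) {
        System.out.println(crossDocumentAppendFails());
        System.out.println(importThenAppend());
    }
}
```

importNode() leaves the original node in its own document and appends a copy; adoptNode() moves the node instead.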

Getting DOMException when Setting Node Value

If you create an element node, its nodeValue is null and hence cannot be set. Trying to set the value of a newly created node, as in the following code, produces the error oracle.xml.parser.XMLDOMException: Node cannot be modified.

  String eName="Mynode";
  XMLNode aNode = new XMLNode(eName, Node.ELEMENT_NODE);
  aNode.setNodeValue(eValue);

Extracting Embedded XML from a CDATA Section

Here is an example of extracting XML from a CDATA section. The document is:

<PAYLOAD>
<![CDATA[<?xml version = '1.0' encoding = 'ASCII' standalone = 'no'?>
<ADD_PO_003>
   <CNTROLAREA>
      <BSR>
         <VERB value="ADD">ADD</VERB>
         <NOUN value="PO">PO</NOUN>
         <REVISION value="003">003</REVISION>
      </BSR>
   </CNTROLAREA>
</ADD_PO_003>]]>
</PAYLOAD>

Extracting PAYLOAD to do Extra Processing

You cannot use a different encoding on the nested XML document included as text inside the CDATA, so having the XML declaration of the embedded document seems of little value. If you do not need the XML declaration, then embed the message as real elements into the <PAYLOAD> instead of as a text chunk, which is what CDATA does for you.

Use the following code:

String s = YourDocumentObject.valueOf("/OES_MESSAGE/PAYLOAD");

The data is not parsed because it is in a CDATA section when you select the value of PAYLOAD.

You have asked for it to be a big text chunk, which is what it will give you. You must parse the text chunk yourself (another benefit of not using the CDATA approach) this way:

YourParser.parse( new StringReader(s));

where s is the string you got in the previous step.
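
The two-step approach (take the CDATA content as text, then parse that text as its own document) can be sketched with the standard JAXP classes as shown below. This is a sketch under stated assumptions, not the Oracle API: it uses getTextContent() in place of the Oracle node-selection methods, the class name CdataPayloadSketch and the innerRootName() helper are hypothetical, and the outer document is reduced to a bare PAYLOAD element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class CdataPayloadSketch {

    /** Parses the outer document, then parses the text held in PAYLOAD. */
    static String innerRootName(String outerXml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document outer = builder.parse(
                    new InputSource(new StringReader(outerXml)));

            // Step 1: the CDATA content comes back as one unparsed text chunk
            String s = outer.getElementsByTagName("PAYLOAD")
                    .item(0).getTextContent().trim();

            // Step 2: parse the text chunk as a document of its own
            Document inner = builder.parse(new InputSource(new StringReader(s)));
            return inner.getDocumentElement().getNodeName();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String outer = "<PAYLOAD><![CDATA[<?xml version='1.0'?>"
                + "<ADD_PO_003><VERB value=\"ADD\">ADD</VERB></ADD_PO_003>"
                + "]]></PAYLOAD>";
        System.out.println(innerRootName(outer));
    }
}
```

The trim() call matters because an XML declaration in the embedded text must be the very first thing the second parse sees.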

Using Character Sets with the XML Parser for Java

Here are hints about character sets:

Reading a Unicode XML File

When reading an XML document stored in an operating system file, do not use the FileReader class. Instead, use the XML Parser for Java to automatically detect the character encoding of the document. Given a binary input stream with no external encoding information, the parser automatically figures out the character encoding based on the byte order mark and encoding declaration of the XML document. Any well-formed document in any supported encoding can be successfully parsed using the following sample code:

import java.io.*;
import oracle.xml.parser.v2.*;

public class I18nSafeXMLFileReadingSample
{
  public static void main(String[] args) throws Exception
  {
    // create an instance of the xml file
    File file = new File("myfile.xml");
    // create a binary input stream
    FileInputStream fis = new FileInputStream(file);
    // buffering for efficiency
    BufferedInputStream in = new BufferedInputStream(fis);
    // get an instance of the parser
    DOMParser parser = new DOMParser();
    // parse the xml file; the parser detects the encoding itself
    parser.parse(in);
  }
}

Writing an XML File in UTF-8

The FileWriter class should not be used for writing XML files because it depends on the default character encoding of the runtime environment. The output file can suffer from a parsing error or data loss if the document contains characters that are not available in the default character encoding.

UTF-8 encoding is popular for XML documents, but UTF-8 is not usually the default file encoding of Java. Using a Java class that assumes the default file encoding can cause problems. The following example shows how to avoid these problems:

import java.io.*;
import oracle.xml.parser.v2.*;

public class I18nSafeXMLFileWritingSample
{
  public static void main(String[] args) throws Exception
  {
    // create a test document
    XMLDocument doc = new XMLDocument();
    doc.setVersion("1.0");
    doc.appendChild(doc.createComment("This is a test empty document."));
    doc.appendChild(doc.createElement("root"));

    // create a file
    File file = new File("myfile.xml");

    // create a binary output stream to write to the file just created
    FileOutputStream fos = new FileOutputStream(file);

    // create a Writer that converts Java character stream to UTF-8 stream
    OutputStreamWriter osw = new OutputStreamWriter( fos,"UTF8");

    // buffering for efficiency
    Writer w = new BufferedWriter(osw);

    // create a PrintWriter to adapt to the printing method
    PrintWriter out = new PrintWriter(w);

    // print the document to the file through the connected objects
    doc.print(out);

    // flush the buffered characters and close the file
    out.close();
  }
}

Parsing XML Stored in NCLOB with UTF-8 Encoding

The following problem with parsing XML stored in an NCLOB column using UTF-8 encoding was reported.

An XML sample that is loaded into the database contains two UTF-8 multibyte characters. The text is supposed to be:

G(0xc2,0x82)otingen, Br(0xc3,0xbc)ck_W

A Java stored function was written that uses the default connection object to connect to the database, runs a select query, gets the OracleResultSet, calls the getCLOB() method and calls the getAsciiStream() method on the CLOB object. Then it executes the following code to get the XML into a DOM object:

DOMParser parser = new DOMParser();
parser.setPreserveWhitespace(true);
parser.parse(istr);   // istr is the stream returned by getAsciiStream()
XMLDocument xmldoc = parser.getDocument();

The code throws an exception stating that the XML contains an invalid UTF-8 encoding. The character (0xc2, 0x82) is valid UTF-8. The character can be distorted when getAsciiStream() is called.

To solve this problem, use getUnicodeStream() or getBinaryStream() instead of getAsciiStream().

If this does not work, print out the characters to make sure that they are not distorted before they are sent to the parser in the step parser.parse(istr).

Parsing a Document Containing Accented Characters

This is the way to parse a document containing accented characters:

DOMParser parser=new DOMParser(); 
parser.setPreserveWhitespace(true); 
parser.setErrorStream(System.err); 
parser.setValidationMode(false); 
parser.showWarnings(true);
parser.parse ( new FileInputStream(new File("PruebaA3Ingles.xml")));

Storing Accented Characters in an XML Document

If you have stored accented characters, for example, an é, in your XML file and then attempt to parse the XML file with the XML Parser for Java, the parser may throw the following exception:

'Invalid UTF-8 encoding' 

You can read in accented characters in their hex or decimal format within the XML document, for example:

&#xe9; 

but if you prefer not to do this, set the encoding based on the character set you were using when you created the XML file. Try setting the encoding to ISO-8859-1 (Western European ASCII). Use that encoding or something different, depending on the tool or operating system you are using.

If you explicitly set the encoding to UTF-8 (or do not specify it at all), the parser interprets the byte of your accented character (which has a value greater than 127) as the first byte of a UTF-8 multibyte sequence. If the subsequent bytes do not form a valid UTF-8 sequence, you get an error.

This error just means that your editor is not saving the file with UTF-8 encoding. For example, it might be saving it with ISO-8859-1 encoding. The encoding is a particular scheme used to write the Unicode character number representation to disk. Just adding this string to the top of the document does not cause your editor to write out the bytes representing the file to disk using UTF-8 encoding:

<?xml version="1.0" encoding="UTF-8"?>

Notepad uses UTF-8 on Windows systems.
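
The mismatch described in this section can be reproduced with the JDK alone. The following sketch encodes the same text once as ISO-8859-1 and once as UTF-8, then checks with a strict decoder whether each byte sequence is valid UTF-8. It only illustrates the byte-level cause; it does not call the XML parser. The class name Utf8CheckSketch and the isValidUtf8() helper are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8CheckSketch {

    /** Returns true only if the bytes form a well-formed UTF-8 sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // U+00E9 is the letter e with an acute accent
        String text = "<a>\u00e9</a>";

        // ISO-8859-1 stores it as the single byte 0xE9, which UTF-8 treats
        // as the start of a multibyte sequence that never completes
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        // UTF-8 stores it as the two bytes 0xC3 0xA9
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 bytes valid as UTF-8: " + isValidUtf8(latin1));
        System.out.println("UTF-8 bytes valid as UTF-8: " + isValidUtf8(utf8));
    }
}
```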

You Cannot Dynamically Set the Encoding for an Input XML File

You need to include the proper encoding declaration in your document according to the specification. You cannot use setEncoding() to set the encoding for your input document. setEncoding() is used with oracle.xml.parser.v2.XMLDocument to set the correct encoding for printing.

Using System.out.println() and Special Characters

System.out.println() is not reliable for printing special characters. You need to use an output stream that is encoding aware (for example, OutputStreamWriter). You can construct an OutputStreamWriter and use the write(char[], int, int) method to print.

/* Example */
/* "8859_1" is the Java encoding string for ISO8859-1 */
OutputStreamWriter out = new OutputStreamWriter(System.out, "8859_1");
General Questions About XML Parser for Java

These are general questions:

Including Binary Data in an XML Document

There is no way to directly include binary data within the document; however, there are two ways to work around this:

  • Binary data can be referenced as an external unparsed entity that resides in a different file.

  • Binary data can be converted into ASCII text and included in a CDATA section. For example, it can be uuencoded (converted by the UUENCODE program) or encoded as base64, the format used by MIME-encoded documents, for which a command line utility of the same name exists. The only restriction on the encoding technique is that it must produce only legal characters for the CDATA section.
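
As a sketch of the encoding workaround, the following code uses the java.util.Base64 class of the JDK (Java 8 and later) instead of a command line utility. The element name attachment, its encoding attribute, and the class and method names are hypothetical. Because the Base64 alphabet contains no markup characters, this sketch places the encoded text directly in the element rather than in a CDATA section.

```java
import java.io.StringReader;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class Base64PayloadSketch {

    /** Wraps binary data as Base64 text inside a hypothetical element. */
    static String wrap(byte[] data) {
        String encoded = Base64.getEncoder().encodeToString(data);
        return "<attachment encoding=\"base64\">" + encoded + "</attachment>";
    }

    /** Parses the element and decodes its text back into the original bytes. */
    static byte[] unwrap(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            String encoded = doc.getDocumentElement().getTextContent().trim();
            return Base64.getDecoder().decode(encoded);
        } catch (Exception e) {
            throw new IllegalStateException("could not read attachment", e);
        }
    }

    public static void main(String[] args) {
        byte[] data = {0, 1, 2, (byte) 0xFF, '<', '&'};
        String xml = wrap(data);
        System.out.println(xml);
        System.out.println(unwrap(xml).length + " bytes recovered");
    }
}
```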

Displaying an XML Document

If you are using IE5 as your browser you can display the XML document directly. Otherwise, you can use the Oracle XSLT Processor version 2 to create the HTML document using an XSL Stylesheet. The XDK JavaBeans also enable you to view your XML document.

Including an External XML File in Another XML File

IE 5.0 will parse an XML file and show the parsed output. Just load the file as you load an HTML page.

The following works, both browsing it in IE5 as well as parsing it with the XML Parser for Java:

File: a.xml
<?xml version="1.0" ?>
<!DOCTYPE a [<!ENTITY b SYSTEM "b.xml">]>
 <a>&b;</a>

File: b.xml
 <ok/>

When you browse and parse a.xml you get the following:

<a>
  <ok/>
</a>

You Do Not Need Oracle9i or Higher to Run XML Parser for Java

XML Parser for Java can be used with any supported version of the Java VM. The only difference with Oracle9i or higher is that you can load it into the database and use the Oracle9i JVM, which is an internal JVM. For other database versions or servers, you simply run it in an external JVM and, as necessary, connect to a database through JDBC.

Inserting Characters <, >, ', ", and & into XML Documents

You must use the entity references:

  • &gt; for greater than (>)

  • &lt; for less than (<)

  • &apos; for an apostrophe or a single quote (')

  • &quot; for straight double quotes (")

  • &amp; for ampersand (&)
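
When the text comes from a program variable, the substitution can be applied in code before the text is placed in a document. The following is a minimal sketch of such a helper; the class name XmlEscapeSketch and the escape() method are hypothetical and are not part of the XDK. When you build the document through the DOM API, this step is normally not needed, because text nodes are escaped when the document is printed.

```java
final class XmlEscapeSketch {

    /** Replaces the five special characters with their entity references. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '>':  out.append("&gt;");   break;
                case '<':  out.append("&lt;");   break;
                case '\'': out.append("&apos;"); break;
                case '"':  out.append("&quot;"); break;
                case '&':  out.append("&amp;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(
            "<COMPANYNAME>" + escape("A&B") + "</COMPANYNAME>");
    }
}
```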

Invalid Special Characters in Tags

If you have a tag in XML <COMPANYNAME> and use A&B, the parser gives an error with invalid character.

Special characters such as &, $, and # are not allowed in tag names. If you are creating an XML document from scratch, you can work around this by using only valid NameChars, for example, <A_B>, <AB>, or <A_AND_B>. These names are still readable.

If you are generating XML from external data sources such as database tables, then this is a problem which XML 1.0 does not address.

The datatype XMLType addresses this problem by offering a function which maps SQL names to XML names. The SQL to XML name mapping function will escape invalid XML NameChar in the format of _XHHHH_ where HHHH is a Unicode value of the invalid character. For example, table name V$SESSION will be mapped to XML name V_X0024_SESSION.
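
The escaping format described above can be illustrated with a few lines of Java. This is not the XMLType function itself; it is a simplified sketch that treats only ASCII letters, digits, underscore, hyphen, and period as valid name characters (with a letter or underscore required in the first position) and escapes everything else as _XHHHH_. The class name SqlNameEscaper and its methods are hypothetical.

```java
final class SqlNameEscaper {

    /** Simplified, ASCII-only approximation of the XML name character rules. */
    static boolean isAllowed(char c, boolean first) {
        boolean letter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
        if (first) {
            return letter;
        }
        return letter || (c >= '0' && c <= '9') || c == '-' || c == '.';
    }

    /** Escapes each disallowed character as _XHHHH_ (hex Unicode value). */
    static String toXmlName(String sqlName) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < sqlName.length(); i++) {
            char c = sqlName.charAt(i);
            if (isAllowed(c, i == 0)) {
                out.append(c);
            } else {
                out.append(String.format("_X%04X_", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toXmlName("V$SESSION"));
    }
}
```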

Finally, escaping invalid characters is a workaround to give people a way to serialize names so that they can reload them somewhere else.

Parsing XML from Data of Type String

No parse method accepts the text of an XML document directly as a String argument. You need to wrap the string in an InputStream, Reader, or InputSource before parsing. An easy way is to create a ByteArrayInputStream using the bytes in the string. For example:

/* xmlDoc is a String of xml */
byte aByteArr [] = xmlDoc.getBytes();
ByteArrayInputStream bais = new ByteArrayInputStream (aByteArr, 0, aByteArr.length);
domParser.parse(bais);

Extracting Data from an XML Document into a String

Here is an example to do this:

XMLDocument YourDocument;
/* Parse and Make Mods */
:
StringWriter sw = new StringWriter();
PrintWriter  pw = new PrintWriter(sw);
YourDocument.print(pw);
String YourDocInString = sw.toString();

Illegal Characters in XML Documents

If you limit the character set to 8 bits, then #x0-#x8, #xB, #xC, and #xE-#x1F are not legal.
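
The full rule comes from the Char production of the XML 1.0 specification, which allows #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, and #x10000-#x10FFFF. A small sketch of a check based on that production follows; the class name XmlCharCheck and the isLegal() method are hypothetical.

```java
final class XmlCharCheck {

    /** Returns true if the code point matches the XML 1.0 Char production. */
    static boolean isLegal(int codePoint) {
        return codePoint == 0x9
            || codePoint == 0xA
            || codePoint == 0xD
            || (codePoint >= 0x20 && codePoint <= 0xD7FF)
            || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
            || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    public static void main(String[] args) {
        // List the illegal code points in the 8-bit range
        for (int cp = 0; cp <= 0xFF; cp++) {
            if (!isLegal(cp)) {
                System.out.println("#x" + Integer.toHexString(cp).toUpperCase());
            }
        }
    }
}
```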

Using Entity References with the XML Parser for Java

If the XML Parser for Java does not expand entity references, such as &[whatever] and instead, all values are null, how can you fix this?

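You probably have a simple error defining or using your entities, since Oracle has regression tests that handle entity references without error. A simple example is a document that declares an entity named status in the internal subset of its DOCTYPE declaration (the part that ends with ]>) and then references it in content such as: Alpha, then &status;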
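
To check your own entity declarations, it can help to start from a tiny document that is known to work. The following sketch uses the JAXP parser bundled with the JDK rather than the Oracle parser. The element name release and the entity value Beta are hypothetical; only the entity name status comes from the example above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class EntityExpansionSketch {

    /** Parses the document and returns the text of its root element. */
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String xml =
              "<?xml version=\"1.0\"?>"
            // The entity is declared in the internal DTD subset ...
            + "<!DOCTYPE release [<!ENTITY status \"Beta\">]>"
            // ... and referenced with a leading & and a trailing ;
            + "<release>Alpha, then &status;</release>";
        System.out.println(rootText(xml));
    }
}
```

Common mistakes are a missing semicolon after the entity name and a declaration placed outside the DOCTYPE.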

Merging XML Documents

This is done either using DOM or XSLT.

The XML Parser for Java Does Not Need a Utility to View the Parsed Output

The parsed external entity only needs to be a well-formed fragment. The following program (run with xmlparser.jar from version 1 in your CLASSPATH) shows how to parse a document and print the parsed result. It parses from a string here, but the mechanism is no different for parsing from a file, given its URL.

import oracle.xml.parser.*;
import java.io.*;
import java.net.*;
import org.w3c.dom.*;
import org.xml.sax.*;
/*
** Simple Example of Parsing an XML File from a String
** and, if successful, printing the results.
**
** Usage: java ParseXMLFromString <hello><world/></hello>
*/
public class ParseXMLFromString {
  public static void main( String[] arg ) throws IOException, SAXException {
    String theStringToParse =
       "<?xml version='1.0'?>"+
       "<hello>"+
       "  <world/>"+
       "</hello>";
    XMLDocument theXMLDoc = parseString( theStringToParse );
    // Print the document out to standard out
    theXMLDoc.print(System.out);
  }
  public static XMLDocument parseString( String xmlString ) throws
   IOException, SAXException {
   XMLDocument theXMLDoc     = null;
    // Create an oracle.xml.parser.XMLParser to parse the document.
    XMLParser theParser = new XMLParser();
    // Open an input stream on the string
    ByteArrayInputStream theStream =
         new ByteArrayInputStream( xmlString.getBytes() );
    // Set the parser to work in non-Validating mode
    theParser.setValidationMode(false);
    try {
      // Parse the document from the InputStream
      theParser.parse( theStream );
      // Get the parsed XML Document from the parser
      theXMLDoc = theParser.getDocument();
    }
    catch (SAXParseException s) {
      System.out.println(xmlError(s));
      throw s;
    }
    return theXMLDoc;
  }
  private static String xmlError(SAXParseException s) {
     int lineNum = s.getLineNumber();
     int  colNum = s.getColumnNumber();
     String file = s.getSystemId();
     String  err = s.getMessage();
     return "XML parse error in file " + file +
            "\n" + "at line " + lineNum + ", character " + colNum +
            "\n" + err;
  }
}

Support for Hierarchical Mapping

About the relational mapping of parsed XML data: some users prefer hierarchical storage of parsed XML data. Will XMLType address this concern?

Many customers initially have this concern. It depends on what kind of XML data you are storing. If you are storing XML datagrams that are really just encoding of relational information (for example, a purchase order), then you will get much better performance and much better query flexibility (in SQL) by storing the data contained in the XML documents in relational tables, then reproduce on-demand an XML format when any particular data needs to be extracted.

If you are storing documents that are mixed-content, like legal proceedings, chapters of a book, reference manuals, and so on, then storing the documents in chunks and searching them using Oracle Text's XML search capabilities is the best bet.

The book, Building Oracle XML Applications, by Steve Muench, covers both of these storage and searching techniques with lots of examples.

Support for Ambiguous Content Mode

Are there plans to add an ambiguous content mode to the XDK Parser for Java?

The XML Parser for Java implements all the XML 1.0 standard, and the XML 1.0 standard requires XML documents to have unambiguous content models. Therefore, there is no way a compliant XML 1.0 parser can implement ambiguous content models.

Generating an XML Document Based on Two Tables

Suppose you want to generate an XML document based on two tables with a master-detail relationship, and you have these two tables:

  • PARENT with columns: ID and PARENT_NAME (Key = ID)

  • CHILD with columns: PARENT_ID, CHILD_ID, CHILD_NAME (Key = PARENT_ID + CHILD_ID)

There is a master detail relationship between PARENT and CHILD. How can you generate a document that looks like this?

<?xml version = '1.0'?> 
  <ROWSET> 
     <ROW num="1"> 
       <parent_name>Bill</parent_name> 
         <child_name>Child 1 of 2</child_name> 
         <child_name>Child 2 of 2</child_name> 
      </ROW> 
      <ROW num="2"> 
       <parent_name>Larry</parent_name> 
         <child_name>Only one child</child_name> 
      </ROW> 
  </ROWSET>

Use an object view to generate an XML document from a master-detail structure. In your case, use the following code:

create type child_type is object 
(child_name <data type child_name>) ; 
/ 
create type child_type_nst 
is table of child_type ; 
/ 

create view parent_child 
as 
select p.parent_name 
, cast 
  ( multiset 
    ( select c.child_name 
      from   child c 
      where  c.parent_id = p.id 
    ) as child_type_nst 
  ) child_type 
from parent p 
/ 

A SELECT * FROM parent_child statement, processed by an SQL-to-XML utility, generates a valid XML document for your parent-child relationship. The structure does not match the one you presented, though. It looks like this:

<?xml version = '1.0'?> 
<ROWSET> 
   <ROW num="1"> 
      <PARENT_NAME>Bill</PARENT_NAME> 
      <CHILD_TYPE> 
         <CHILD_TYPE_ITEM> 
            <CHILD_NAME>Child 1 of 2</CHILD_NAME> 
         </CHILD_TYPE_ITEM> 
         <CHILD_TYPE_ITEM> 
            <CHILD_NAME>Child 2 of 2</CHILD_NAME> 
         </CHILD_TYPE_ITEM> 
      </CHILD_TYPE> 
   </ROW>
   <ROW num="2"> 
      <PARENT_NAME>Larry</PARENT_NAME> 
      <CHILD_TYPE> 
         <CHILD_TYPE_ITEM> 
            <CHILD_NAME>Only one child</CHILD_NAME> 
         </CHILD_TYPE_ITEM> 
      </CHILD_TYPE> 
   </ROW>
</ROWSET> 
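
One possible way to reshape the nested output into the flatter structure asked for in the question is to post-process it with an XSLT stylesheet. The following is only a sketch under stated assumptions: the class name ReshapeRowsetSketch, the stylesheet, and the abbreviated sample input are hypothetical, and the code uses the standard JAXP transformation API rather than any XDK-specific class.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ReshapeRowsetSketch {

    // Hypothetical stylesheet: lifts each CHILD_NAME out of
    // CHILD_TYPE/CHILD_TYPE_ITEM and emits lowercase element names.
    static final String FLATTEN_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/ROWSET'>"
      +   "<ROWSET><xsl:apply-templates select='ROW'/></ROWSET>"
      + "</xsl:template>"
      + "<xsl:template match='ROW'>"
      +   "<ROW num='{@num}'>"
      +     "<parent_name><xsl:value-of select='PARENT_NAME'/></parent_name>"
      +     "<xsl:for-each select='CHILD_TYPE/CHILD_TYPE_ITEM/CHILD_NAME'>"
      +       "<child_name><xsl:value-of select='.'/></child_name>"
      +     "</xsl:for-each>"
      +   "</ROW>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Abbreviated sample of the object-view output (first row only).
    static final String NESTED_SAMPLE =
        "<ROWSET><ROW num=\"1\"><PARENT_NAME>Bill</PARENT_NAME><CHILD_TYPE>"
      + "<CHILD_TYPE_ITEM><CHILD_NAME>Child 1 of 2</CHILD_NAME></CHILD_TYPE_ITEM>"
      + "<CHILD_TYPE_ITEM><CHILD_NAME>Child 2 of 2</CHILD_NAME></CHILD_TYPE_ITEM>"
      + "</CHILD_TYPE></ROW></ROWSET>";

    // Applies the flattening stylesheet to the given XML text.
    static String flatten(String nestedXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(FLATTEN_XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(nestedXml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(flatten(NESTED_SAMPLE));
    }
}
```

Keeping the reshaping in a stylesheet leaves the object view unchanged, so other consumers of the view still see the nested form.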

Using JAXP

The Java API for XML Processing (JAXP) enables you to use the SAX, DOM, and XSLT processors from your Java application. JAXP enables applications to parse and transform XML documents using an API that is independent of a particular XML processor implementation.

JAXP has a pluggability layer that enables you to plug in an implementation of a processor. The JAXP APIs consist of abstract classes that provide a thin layer for parser pluggability. Oracle has implemented JAXP based on the Sun Microsystems reference implementation.
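
As a minimal sketch of the pluggability idea, the following program obtains a DOM parser only through the abstract JAXP factory classes and never names a vendor class. The class name JaxpPluggabilitySketch and the sample string are illustrative and are not part of the XDK demos. Which parser actually runs depends on the implementation configured in your environment, for example through the javax.xml.parsers.DocumentBuilderFactory system property.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class JaxpPluggabilitySketch {

    // Parses XML text using whichever JAXP implementation the factory lookup selects.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<ROWSET><ROW num=\"1\"/></ROWSET>");
        System.out.println("Root element: " + doc.getDocumentElement().getTagName());
        // The concrete class name shows which implementation was plugged in.
        System.out.println("Factory in use: "
            + DocumentBuilderFactory.newInstance().getClass().getName());
    }
}
```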

The sample programs JAXPExamples.java and oraContentHandler.java in the directory xdk/demo/java/parser/jaxp demonstrate various ways that the JAXP API can be used to transform any one of the classes of the interface Source:

DOMSource class

StreamSource class

SAXSource class

into any one of the classes of the interface Result:

DOMResult class

StreamResult class

SAXResult class
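
To illustrate moving between Source and Result classes, the sketch below reads a character stream into a DOM tree and then writes the DOM tree back to a character stream. It is not taken from JAXPExamples.java: the class and method names are hypothetical, and a Transformer created without a stylesheet is used so that the content is copied unchanged.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Node;

class SourceToResultSketch {

    // StreamSource -> DOMResult: builds a DOM tree from XML text.
    static Node streamToDom(String xml) {
        try {
            Transformer copy = TransformerFactory.newInstance().newTransformer();
            DOMResult result = new DOMResult();
            copy.transform(new StreamSource(new StringReader(xml)), result);
            return result.getNode();
        } catch (Exception e) {
            throw new IllegalStateException("Stream to DOM failed", e);
        }
    }

    // DOMSource -> StreamResult: serializes a DOM tree to XML text.
    static String domToStream(Node node) {
        try {
            Transformer copy = TransformerFactory.newInstance().newTransformer();
            copy.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            copy.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("DOM to stream failed", e);
        }
    }

    public static void main(String[] args) {
        Node dom = streamToDom("<doc><item>a</item></doc>");
        System.out.println(domToStream(dom));
    }
}
```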

These transformations take XML documents as sample input, optionally apply stylesheets, and optionally use a ContentHandler class defined in the file oraContentHandler.java. For example, one method, identity, performs an identity transformation, in which the output XML document is the same as the input XML document. Another method, xmlFilterChain(), applies three stylesheets in a chain.
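
The chaining idea can be sketched with the standard SAXTransformerFactory.newXMLFilter() call, where each stylesheet becomes an XMLFilter whose parent is the previous stage. This is not the demo's xmlFilterChain() method: it uses two stylesheets instead of three, the stylesheets and class name are hypothetical, and it assumes that the configured TransformerFactory is also a SAXTransformerFactory.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLFilter;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class FilterChainSketch {

    // Hypothetical stylesheet that renames the root element 'from' to 'to'.
    static StreamSource renameRoot(String from, String to) {
        String xsl =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:template match='/" + from + "'>"
          + "<" + to + "><xsl:value-of select='.'/></" + to + ">"
          + "</xsl:template></xsl:stylesheet>";
        return new StreamSource(new StringReader(xsl));
    }

    // Runs parser -> first stylesheet -> second stylesheet -> handler.
    static String runChain(String xml) {
        try {
            // Assumption: the factory supports SAX-based transformation.
            SAXTransformerFactory stf =
                (SAXTransformerFactory) TransformerFactory.newInstance();
            XMLFilter first = stf.newXMLFilter(renameRoot("doc", "stage1"));
            XMLFilter second = stf.newXMLFilter(renameRoot("stage1", "stage2"));

            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader reader = spf.newSAXParser().getXMLReader();

            // Each filter pulls its input from its parent.
            first.setParent(reader);
            second.setParent(first);

            // Record the SAX events that leave the last stage.
            StringBuilder seen = new StringBuilder();
            second.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qName, Attributes atts) {
                    seen.append('<').append(qName).append('>');
                }
                @Override
                public void characters(char[] ch, int start, int length) {
                    seen.append(ch, start, length);
                }
                @Override
                public void endElement(String uri, String local, String qName) {
                    seen.append("</").append(qName).append('>');
                }
            });
            second.parse(new InputSource(new StringReader(xml)));
            return seen.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Filter chain failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runChain("<doc>hello</doc>"));
    }
}
```

Because the stages exchange SAX events, no intermediate document is built between the two stylesheets.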

The drawbacks of JAXP include the additional interface cost, features that lag behind those of "native" parsers, and the fact that a DOM cannot be shared by processing components.


oraxml: XML Parser for Java Command-line

oraxml is a command-line interface to parse an XML document. It checks for well-formedness and validity.

To use oraxml ensure that the following is true:

Table 3-4 lists the oraxml command line options.

Table 3-4 oraxml: Command Line Options

Option                Purpose
-comp fileName        Compresses the input XML file
-decomp fileName      Decompresses the input compressed file
-dtd fileName         Validates the input file with DTD validation
-enc fileName         Prints the encoding of the input file
-help                 Prints the help message
-log logfile          Writes the errors to the output log file
-novalidate fileName  Checks whether the input file is well-formed
-schema fileName      Validates the input file with Schema validation
-version              Prints the release version
-warning              Shows warnings
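
The difference between a well-formedness check (as with -novalidate) and DTD validation (as with -dtd) can be illustrated in Java. The sketch below does not call oraxml; it uses the standard JAXP DocumentBuilderFactory.setValidating() switch, and the class name and sample documents are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedVersusValidSketch {

    // Well-formed, but <from> is not allowed by the embedded DTD.
    static final String INVALID_BUT_WELL_FORMED =
        "<!DOCTYPE note [<!ELEMENT note (to)><!ELEMENT to (#PCDATA)>]>"
      + "<note><from>x</from></note>";

    // Not well-formed: <to> is never closed.
    static final String NOT_WELL_FORMED = "<note><to>x</note>";

    // Returns the problems found; an empty list means the check passed.
    static List<String> check(String xml, boolean validate) {
        List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validate); // true turns on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    // Validity errors are recoverable; collect them and continue.
                    problems.add(e.getMessage());
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // Well-formedness violations are fatal and end the parse.
            problems.add(e.getMessage());
        } catch (Exception e) {
            throw new IllegalStateException("Unexpected failure", e);
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println("well-formed only: " + check(INVALID_BUT_WELL_FORMED, false));
        System.out.println("with validation:  " + check(INVALID_BUT_WELL_FORMED, true));
        System.out.println("broken markup:    " + check(NOT_WELL_FORMED, false));
    }
}
```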