Web Developer's Virtual Library: Encyclopedia of Web Design Tutorials, Articles and Discussions




XML and Java: Definitions

November 16, 1998

To understand the relative pros and cons of the diverse Java XML software discussed in Parts 2 and 3 of this article, there are several terms which must be clarified.

APIs

API stands for Application Programming Interface. An API describes how a programmer must use software written by others. In Java, an API specifies each class name and usually its superclass (parent), along with the class's methods (functions), their return types, and their arguments (parameters). In some languages, this information is referred to as the signature of a function. The following example is the signature of the startElement method of SAX:

    public void startElement(String name,
                             AttributeList attributes) throws SAXException

In Java, APIs are described using javadoc. For example, see the JDK 1.1 API Documentation or the JDK 1.2 API Documentation.
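As a minimal sketch of how an API and its javadoc comments fit together, the interface below declares a method signature and documents its parameter, return value, and exception. The names TagCounter and SimpleTagCounter are hypothetical, invented for this illustration; they are not part of SAX or the JDK.

```java
/** A hypothetical API, invented for illustration only. */
interface TagCounter {
    /**
     * Counts the start tags in a string of markup.
     *
     * @param markup text to scan; must not be null
     * @return the number of '<' characters immediately followed by a letter
     * @throws IllegalArgumentException if markup is null
     */
    int countStartTags(String markup);
}

/** One possible implementation; callers depend only on the interface above. */
class SimpleTagCounter implements TagCounter {
    @Override
    public int countStartTags(String markup) {
        if (markup == null) {
            throw new IllegalArgumentException("markup is null");
        }
        int count = 0;
        for (int i = 0; i + 1 < markup.length(); i++) {
            // '<' followed by a letter: skips end tags, comments, and declarations
            if (markup.charAt(i) == '<' && Character.isLetter(markup.charAt(i + 1))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        TagCounter counter = new SimpleTagCounter();
        System.out.println(counter.countStartTags("<a><b/></a>")); // prints 2
    }
}
```

Running the javadoc tool over such a source file would generate HTML pages in the same style as the JDK API documentation.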

Document Object Model

The DOM, an October 1, 1998 W3C Recommendation, specifies a standard tree-based API for both XML and HTML documents. The DOM provides "a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure and style of documents."

To quote from our own WDVL DOM page, "The goal of the DOM specification is to define a programmatic interface for XML and HTML. It defines the logical structure of documents and the way a document is accessed and manipulated. This specification defines the foundation of a platform- and language-neutral interface to access and update dynamically a document's content, structure, and style. Programmers can build documents, navigate their structure, and add, modify, or delete elements and content. Anything found in an HTML or XML document can be accessed, changed, deleted, or added using the Document Object Model, with a few exceptions."

Since early 1998, a number of APIs and tools have emerged that support the DOM Recommendation or its earlier Working Drafts. The Java API to the DOM, called the Java Language Binding, describes the Java DOM interface which we will examine in Part 2.

Parsing

Parsing is the process of splitting up a stream of information into its constituent pieces (often called tokens). In the context of XML, parsing refers to scanning an XML document (which need not be a physical file; it can be a data stream) in order to split it into its various elements (delimited by tags) and their attributes. XML parsing reveals the structure of the information, since the nesting of elements implies a hierarchy. An XML document will fail to parse if it does not follow the well-formedness rules described in the XML 1.0 Recommendation. A successfully parsed XML document is at a minimum well-formed, and may also be valid.
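The well-formedness check can be sketched as follows. This assumes a current JDK with the javax.xml.parsers package (JAXP), which postdates this article; the class name WellFormedCheck and the sample strings are made up for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    /** Returns true if the text parses as well-formed XML. */
    static boolean isWellFormed(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors as exceptions
        builder.setErrorHandler(new DefaultHandler());
        try {
            // The document is a stream of characters here, not a physical file
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // a well-formedness rule was broken
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isWellFormed("<a><b>text</b></a>")); // prints true
        System.out.println(isWellFormed("<a><b>text</a></b>")); // prints false
    }
}
```

The second document fails because its end tags are improperly nested, so no hierarchy can be derived from it.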

Non-validating Parser

A non-validating parser is the minimal case. It does not check a document against any DTD (Document Type Definition); it only checks that the document is well-formed (that it is properly marked up according to XML syntax rules). Because a non-validating parser is typically smaller than a validating one, it may be more appropriate for use in a Java applet.

Validating Parser

In addition to checking well-formedness, a validating parser verifies that the document conforms to a specific DTD (either internal or external to the XML file being parsed). Although a validating parser is generally larger than a non-validating one, its rigor is necessary in cases where the structural integrity of the XML data is important, such as in database and eCommerce applications. It is likely that web browsers will need to include validating parsers.

Note that for an XML document to be valid, it must either contain or refer to a DTD. Authors of XML documents will provide DTDs in situations where a group (company or industry) wants to standardize on a particular set of elements. A DTD is also necessary to supply default values for attributes and to declare unparsed (binary) entities, which are identified with the NDATA keyword.
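The difference between the two kinds of parser can be sketched with a document that is well-formed but not valid. The example assumes the JAXP classes bundled with current JDKs (which postdate this article); the class name ValidationDemo and the sample note document are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ValidationDemo {
    // Hypothetical document: the internal DTD requires a <to> child,
    // but the body contains an undeclared <from> element instead.
    static final String INVALID_NOTE =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE note [\n"
        + "  <!ELEMENT note (to)>\n"
        + "  <!ELEMENT to (#PCDATA)>\n"
        + "]>\n"
        + "<note><from>Ann</from></note>";

    /** Parses the text and returns the validity errors the parser reported. */
    static List<String> parse(String xml, boolean validating) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(validating); // true: check against the DTD
        DocumentBuilder builder = factory.newDocumentBuilder();
        final List<String> errors = new ArrayList<>();
        builder.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                // Validity violations are recoverable errors, so collect them
                errors.add(e.getMessage());
            }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("non-validating errors: " + parse(INVALID_NOTE, false).size());
        System.out.println("validating errors: " + parse(INVALID_NOTE, true).size());
    }
}
```

The non-validating parse reports nothing because the document is well-formed; only the validating parse notices that the content does not match the DTD. The exact error messages depend on the parser implementation.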

Event-based Parsing (e.g., SAX)

Event-based parsers provide a data-centric view of XML. When the parser encounters an element, the application processes it and then forgets about it. The parser reports the element, its list of attributes, and its content. This is more efficient for many types of applications, especially searches. It typically requires less code and less memory, since there is no need to build a large tree in memory while scanning for a particular element, attribute, and/or content sequence in an XML document.

In "What is an Event-Based Interface?", David Megginson, the SAX proponent, wrote:

"An event-based API.... reports parsing events (such as the start and end of elements) directly to the application through callbacks, and does not usually build an internal tree. The application implements handlers to deal with the different events, much like handling events in a graphical user interface....[A]n event-based API provides a simpler, lower-level access to an XML document: you can parse documents much larger than your available system memory, and you can construct your own data structures using your callback event handlers."
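A search of this kind might look like the sketch below. It assumes the SAX2 classes shipped with current JDKs (DefaultHandler and Attributes) rather than the SAX 1.0 signature quoted in the APIs section; the class name SaxSearch and the order/item vocabulary are invented for the illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxSearch {
    /** Collects the "id" attribute of every <item> element without building a tree. */
    static List<String> itemIds(String xml) throws Exception {
        final List<String> ids = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                // Callback fired once per start tag; the element is then forgotten
                String id = attributes.getValue("id");
                if ("item".equals(qName) && id != null) {
                    ids.add(id);
                }
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return ids;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<order><item id=\"a1\"/><note/><item id=\"b2\">x</item></order>";
        System.out.println(itemIds(xml)); // prints [a1, b2]
    }
}
```

Only the list of matching ids is kept in memory, which is why this style suits searches over large documents.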

Tree-based Parsing (e.g., DOM)

On the other hand, tree-based parsers provide a document-centric view of XML. In tree-based parsing, an in-memory tree is created for the entire document (extremely memory-intensive for large documents). All elements and attributes are available at once, but not until the entire document has been parsed. This technique is useful if you need to navigate around the document and perhaps change various document chunks, which is precisely why it is useful for the Document Object Model (DOM), the aim of which is to manipulate documents via scripting languages or Java.

According to David Megginson,

"A tree-based API compiles an XML document into an internal tree structure, then allows an application to navigate that tree. The Document Object Model (DOM) working group at the World-Wide Web consortium is developing a standard tree-based API for XML and HTML documents....Tree-based APIs are useful for a wide range of applications, but they often put a great strain on system resources, especially if the document is large (under very controlled circumstances, it is possible to construct the tree in a lazy fashion to avoid some of this problem). Furthermore, some applications need to build their own, different data trees, and it is very inefficient to build a tree of parse nodes, only to map it onto a new tree."
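By contrast, a tree-based approach might be sketched as follows, using the org.w3c.dom interfaces and the JAXP DocumentBuilder found in current JDKs (an assumption, since JAXP postdates this article). The class name DomEdit and the order/item vocabulary are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomEdit {
    /** Builds the complete in-memory tree for the document. */
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    /** Marks every existing <item> as checked, then appends a new <item> to the root. */
    static Document markAndAppend(Document doc, String newId) {
        // Navigate: all elements are available once parsing has finished
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            ((Element) items.item(i)).setAttribute("checked", "true");
        }
        // Modify: create a new element and attach it to the tree
        Element added = doc.createElement("item");
        added.setAttribute("id", newId);
        doc.getDocumentElement().appendChild(added);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = markAndAppend(parse("<order><item id=\"a1\"/></order>"), "b2");
        System.out.println(doc.getElementsByTagName("item").getLength()); // prints 2
    }
}
```

The whole document is held in memory for as long as the Document object is referenced, which is the cost of being able to navigate and change it freely.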


Coming Next Month

Visit WDVL in mid-December for Part 2: "XML and Java: A Perfect Pair: APIs", in which we will discuss a number of Java APIs for XML: Sun's XML Library, W3C's DOM, Coins, DOM SDK, SAXON, Koala XML Serialization, JPython, and XML Testbed.

XML and Java: Why These Two
XML and Java: The Perfect Pair: Part 1
