How to Create a Vocabulary
July 24, 2000
So, how do you go about defining a new vocabulary? There are actually three
ways, but the traditional way is to create a Document Type Definition (DTD).
A DTD defines a language by listing the elements that are permitted and how
they may be nested. HTML itself is defined by one: well-written HTML files
contain a declaration that tells the browser which DTD to use. For example:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
This refers to a fixed, public DTD which defines the Transitional variant of
HTML 4.0.
XML lets you create your own DTDs, using Declaration Syntax. Here's an
example, quoted from Peter Flynn's excellent XML FAQ:
<!ELEMENT List (Item)+>
<!ELEMENT Item (#PCDATA)>
This fragment defines a list as an element type containing one or more
items (that's the plus sign), and items as element types containing just
text (Parsed Character Data, ie text with no more markup left in it).
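To make the two declarations concrete, here is a rough Python sketch that checks the same two rules by hand. It is not a DTD validator, just an illustration using the standard library's xml.etree.ElementTree; the function name check_list and the wording of its messages are made up for this example.

```python
import xml.etree.ElementTree as ET

def check_list(xml_text):
    """Return a list of problems; an empty list means the two rules hold."""
    root = ET.fromstring(xml_text)
    if root.tag != "List":
        return [f"root element is {root.tag}, expected List"]
    problems = []
    children = list(root)
    # (Item)+ means one or more Item children and nothing else
    if not children:
        problems.append("List must contain at least one Item")
    for child in children:
        if child.tag != "Item":
            problems.append(f"unexpected element {child.tag} inside List")
        elif len(child) > 0:
            # (#PCDATA) means text only, so nested elements are out
            problems.append("Item must contain only text")
    # Element content leaves no room for text directly inside List
    stray = [root.text or ""] + [child.tail or "" for child in children]
    if any(piece.strip() for piece in stray):
        problems.append("List may not contain text of its own")
    return problems

print(check_list("<List><Item>Apple</Item><Item>Pear</Item></List>"))
print(check_list("<List></List>"))
```

A validating parser does this work for you, for every element the DTD declares, which is the whole point of writing the DTD down once.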
The fragment quoted above is a section of a DTD which defines two elements:
<List>, which must contain one or more <Item> elements, and <Item>,
which may contain only text. XML names are case-sensitive, so a document
must spell them exactly as the DTD declares them. If we call this DTD
"example", and store it here at the WDVL (this is only an example, not a
real DTD), then an XML document based on this DTD would begin like this:
<?xml version="1.0"?>
<!DOCTYPE List SYSTEM "http://wdvl.com/dtds/example.dtd">
The name that follows DOCTYPE must match the document's root element, which
here is List.
A fragment of an XML document using these elements might look like this:
<List>
<Item>
Apple
</Item>
<Item>
Pear
</Item>
<Item>
Banana
</Item>
</List>
Looks like HTML, doesn't it? Note, however, that unlike HTML, XML requires
closing tags for all non-empty elements. Also, remember that DTDs are
written in Declaration Syntax, while XML documents are written in Instance
Syntax. Yes, it does get a bit complicated.
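If you want to see the closing-tag rule in action, a few lines of Python will do. This is only a sketch: it uses the parser in the standard library's xml.etree.ElementTree, which checks that a document is well-formed but does not read a DTD, and the helper name is_well_formed is invented for the example. The element names follow the spelling in the DTD fragment.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """True if the parser accepts the text, False if it reports a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

good = "<List><Item>Apple</Item><Item>Pear</Item></List>"
# HTML habits: the closing tags for Item have been left out
sloppy = "<List><Item>Apple<Item>Pear</List>"
# Tag names are case-sensitive, so </ITEM> does not close <Item>
mixed_case = "<List><Item>Apple</ITEM></List>"

for text in (good, sloppy, mixed_case):
    print(is_well_formed(text), text)
```

Only the first string gets through. An HTML browser would shrug and render the other two anyway, but an XML parser is required to stop.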
DTDs can be very lengthy, as they must specify every single element that
can be used. Knowing that many Web developers are too lazy to write a
proper DTD, the XML holy men provided for a way to create XML documents
without one.
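An XML document with no DTD is referred to as "DTDless". Such a document
need only be "well-formed", meaning that it obeys XML's syntax rules. A
document that has a DTD and conforms to it is referred to (somewhat
confusingly) as "valid".
A DTDless document may include a Standalone Document Declaration (SDD) in
its XML declaration. The SDD is optional, and states that no markup
declarations outside the document affect its content, for example: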
<?xml version="1.0" standalone="yes"?>
When a client encounters a DTDless XML document, it can still recover the
element structure from the way the tags are nested, but it must infer the
meaning of each element from its position and usage. In the list example
above, a reasonably intelligent browser should be able to figure out that a
list is meant to contain items, simply from the fact that the item elements
are nested within the list element.
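Here is a rough idea of what that inference could look like, sketched in Python with the standard library's xml.etree.ElementTree. The function name observed_structure is made up for this example, and real clients are free to do something quite different. It simply records which element names turn up inside which others.

```python
import xml.etree.ElementTree as ET

def observed_structure(xml_text):
    """Map each element name to the sorted names of the child elements seen inside it."""
    root = ET.fromstring(xml_text)
    seen = {}
    for parent in root.iter():
        kids = seen.setdefault(parent.tag, set())
        for child in parent:
            kids.add(child.tag)
    return {name: sorted(kids) for name, kids in seen.items()}

# No DOCTYPE here, so the parser has no declarations to go on
doc = "<List><Item>Apple</Item><Item>Pear</Item><Item>Banana</Item></List>"
print(observed_structure(doc))
```

Note what this cannot tell you: whether an empty list would have been allowed, or whether items may hold anything other than text. Only a DTD (or a schema) answers those questions.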
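The rules for writing DTDless documents differ somewhat from those for
writing documents that do have DTDs. Also, some features of XML are not
available with DTDless documents: without declarations there are no default
attribute values, and no entities beyond the five predefined ones (&lt;,
&gt;, &amp;, &apos; and &quot;).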
As powerful as DTDs are, we endlessly creative (or is it "pain in the
neck"?) developers have already come up against some limitations, leading
to the development of a more sophisticated tool called a "schema".
As a markup language, XML does not define any data types - anything
contained within an element is interpreted as simple text. While a DTD
specifies what is valid syntax, it cannot specify criteria for what is
valid content.
For example, consider an email address, which must contain both a "@"
(which our German-speaking friends call a "monkey's tail") and a "." In
order to minimize errors, a software application (or a Web page with
scripting) can include a validation routine which gives the user an error
message if she or he enters a string that doesn't contain both of these
items. Similar validation routines exist for phone numbers, zip codes, and
any other type of data that has a certain required format.
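A validation routine of this kind can be very short. The Python sketch below is deliberately loose and the name looks_like_email is invented for the example. It is a little stricter than just testing for both characters, since it also wants some text before the "@" and a "." somewhere inside the part after it, but it is nowhere near a full address parser.

```python
def looks_like_email(text):
    """Loose check: text before the "@", and a "." inside the part after it."""
    local, at, domain = text.partition("@")
    # domain[1:-1] rules out a "." that is the first or last character
    return bool(local) and bool(at) and "." in domain[1:-1]

for candidate in ("user@example.com", "userexample.com", "user@example"):
    if looks_like_email(candidate):
        print(candidate, "looks fine")
    else:
        print(candidate, "is missing an @ or a dot")
```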
The XML Schema proposal
allows data types to be specified for the content of particular elements,
so that validation routines can be used to reduce errors. Another advantage
of Schemas is that they are written in XML Instance Syntax, eliminating the
need for Declaration Syntax. Schemas are as yet only a "proposal", not a
formal "recommendation", and it remains to be seen whether they will replace
DTDs or be used as a complement to them.
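To give a feel for what typed content buys you, here is a small Python sketch of the idea. It is not Schema syntax and does not use a schema processor: it just pairs element names with a conversion function and reports content that fails to convert. The element names Order, Quantity and Delivery, and the function type_errors, are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical element names, each paired with a converter that
# raises ValueError when the text is not of the expected type
ELEMENT_TYPES = {
    "Quantity": int,
    "Delivery": date.fromisoformat,
}

def type_errors(xml_text):
    """Return one message per element whose text fails its type check."""
    root = ET.fromstring(xml_text)
    errors = []
    for element in root.iter():
        convert = ELEMENT_TYPES.get(element.tag)
        if convert is None:
            continue  # no type declared for this element
        value = (element.text or "").strip()
        try:
            convert(value)
        except ValueError:
            errors.append(f"{element.tag}: {value!r} is not a valid value")
    return errors

order = "<Order><Quantity>three</Quantity><Delivery>2000-07-24</Delivery></Order>"
print(type_errors(order))
```

A DTD would accept this order without complaint, because "three" is perfectly good text. A type check is what catches it.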