Introducing the Valid XML Document and the DTD
May 3, 1999
In the last section, we reviewed the process of creating a "well-formed" XML document. As you saw, there are many rules you must follow to ensure that your XML document is well-formed. But even when you write well-formed XML documents, you're not quite out of the woods! Making your document well-formed is only half the battle. You must also make sure that the document is valid.
By definition, a valid document is a well-formed XML document. But validity goes one step further: a valid XML document is also a well-formed SGML document, and as such can be read and interpreted as one.
To pass the SGML validity test, an XML document must conform to the specifications defined by a Document Type Definition (DTD). You can think of the DTD as defining the overall structure and syntax of the document. The DTD is, in fact, the meat of the "meta-markup" concept: it defines the grammar and vocabulary of a markup language. In short, the DTD specifies everything a validating parser needs to know in order to interpret a well-formed XML document. This "specification" can be as simple as listing all the valid components (elements, attributes, entities, and so on) that an XML document may contain. It can also be as complex as specifying relationships between those elements (for example, element X must contain either element Y or element Z, but never both).
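The DTD fragment below is a rough illustration of both ends of that range. It declares a few elements and encodes the "Y or Z, but never both" rule as a choice group. X, Y, and Z are the placeholder names from the sentence above, not elements from a real vocabulary. The attribute and entity declarations are likewise hypothetical.

```xml
<!-- Hypothetical DTD fragment; X, Y, Z are placeholder names. -->

<!-- Listing vocabulary: each declaration adds one legal element name. -->
<!ELEMENT Y (#PCDATA)>
<!ELEMENT Z (#PCDATA)>

<!-- A relationship: "|" is a choice, so X must contain exactly one
     child, either a Y or a Z. A Y followed by a Z is not allowed. -->
<!ELEMENT X (Y | Z)>

<!-- Attributes and entities are declared in the same DTD. -->
<!ATTLIST X code NMTOKEN #IMPLIED>
<!ENTITY company "Example Corp.">
```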
We've emphasized in other parts of this tutorial that XML is case
sensitive, something that is difficult to remember for HTML
veterans. The XML processing instruction
<?xml version="1.0" ... ?>
must be all lowercase (regardless of what appears in some books).
But keywords in DTDs must be all UPPERCASE, such as ELEMENT,
ATTLIST, #REQUIRED, #IMPLIED, NMTOKEN, ID, etc. However, your
own elements and attributes may be any case you choose, as
long as you are consistent. So if you name an element "BOOKS", that is not the same as "Books". This tutorial uses UPPERCASE for all made-up elements and attributes. However, this is neither a rule nor a community-wide convention, because the author is free to decide. Another useful pattern is to start element names with an uppercase letter and attribute names with a lowercase letter:
ElementName attributeName
as in:
<!ELEMENT BookList (Book)+ >
<!ATTLIST BookList
genre NMTOKEN #IMPLIED
listAuthor NMTOKEN #REQUIRED
lastUpdated NMTOKEN #REQUIRED >
-Ken Sall
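The sketch below shows case sensitivity at work. It is a minimal, complete document that pairs the BookList declarations above with an internal DTD subset. The declaration for Book and all attribute values are assumptions added for illustration. A validating parser should accept the <Book> child but reject <book>, because "book" and "Book" are different names and only "Book" is declared. The document is still well-formed; only the DTD check fails.

```xml
<?xml version="1.0"?>
<!DOCTYPE BookList [
  <!ELEMENT BookList (Book)+ >
  <!ATTLIST BookList
    genre       NMTOKEN #IMPLIED
    listAuthor  NMTOKEN #REQUIRED
    lastUpdated NMTOKEN #REQUIRED >
  <!-- Assumed declaration: the fragment above uses Book but does not declare it. -->
  <!ELEMENT Book (#PCDATA) >
]>
<BookList listAuthor="ksall" lastUpdated="1999-05-03">
  <Book>Matches the declared name exactly</Book>
  <!-- Validity error: "book" is not declared and is not allowed by (Book)+ -->
  <book>Different case, so a different element</book>
</BookList>
```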
For example, do you remember our CONTACT XML document from previous sections? A CONTACT DTD might specify that every CONTACT has an <ADDRESS> element that must contain <STREET>, <CITY>, <STATE>, and <ZIP> elements, in that particular order. Further, the DTD could specify that an <ADDRESS> element may contain multiple <STREET> elements (though it must contain at least one).
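A DTD capturing those ADDRESS rules might look like the sketch below. It is written as an internal subset so that the example stands alone. The element names come from the text. The simplified CONTACT content model, which shows only ADDRESS, and the sample address are assumptions, because the full CONTACT document from earlier sections is not repeated here.

```xml
<?xml version="1.0"?>
<!-- Hypothetical CONTACT document; CONTACT is simplified to hold only ADDRESS. -->
<!DOCTYPE CONTACT [
  <!ELEMENT CONTACT (ADDRESS)>
  <!-- STREET+ means "one or more STREET elements"; the commas fix the order. -->
  <!ELEMENT ADDRESS (STREET+, CITY, STATE, ZIP)>
  <!ELEMENT STREET  (#PCDATA)>
  <!ELEMENT CITY    (#PCDATA)>
  <!ELEMENT STATE   (#PCDATA)>
  <!ELEMENT ZIP     (#PCDATA)>
]>
<CONTACT>
  <ADDRESS>
    <STREET>123 Main Street</STREET>
    <STREET>Suite 4</STREET>
    <CITY>Springfield</CITY>
    <STATE>IL</STATE>
    <ZIP>62701</ZIP>
  </ADDRESS>
</CONTACT>
```

Dropping both STREET elements, or placing CITY before STREET, would make this document invalid while leaving it well-formed.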
To help you get a feel for the difference between well-formed XML
and valid XML, consider the following well-formed English:
brown jumped
the The fox.
quick over dog
lazy
As you can see, all the words and punctuation represent
well-formed elements of English. However, unless you are
into absurdist poetry, the words and punctuation are virtually
meaningless, and difficult to interpret (especially by a computer).
To be valid English, the words must conform to a standard
grammatical structure. For example:
The quick brown fox jumped over the lazy dog.
In the case of the markup languages defined by XML, the DTD
provides the grammatical structure to bring order to the
elements of the language.
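To push the analogy one step further, the grammatical sentence could itself be described by a tiny, purely illustrative DTD. The element names (SENTENCE, ARTICLE, and so on) are invented for this sketch, and the final period is left out for simplicity.

```xml
<?xml version="1.0"?>
<!-- Illustrative only: a "grammar" for the sample sentence, written as a DTD. -->
<!DOCTYPE SENTENCE [
  <!-- ADJECTIVE* allows any number of adjectives before each noun. -->
  <!ELEMENT SENTENCE (ARTICLE, ADJECTIVE*, NOUN, VERB,
                      PREPOSITION, ARTICLE, ADJECTIVE*, NOUN)>
  <!ELEMENT ARTICLE     (#PCDATA)>
  <!ELEMENT ADJECTIVE   (#PCDATA)>
  <!ELEMENT NOUN        (#PCDATA)>
  <!ELEMENT VERB        (#PCDATA)>
  <!ELEMENT PREPOSITION (#PCDATA)>
]>
<SENTENCE>
  <ARTICLE>The</ARTICLE> <ADJECTIVE>quick</ADJECTIVE> <ADJECTIVE>brown</ADJECTIVE>
  <NOUN>fox</NOUN> <VERB>jumped</VERB> <PREPOSITION>over</PREPOSITION>
  <ARTICLE>the</ARTICLE> <ADJECTIVE>lazy</ADJECTIVE> <NOUN>dog</NOUN>
</SENTENCE>
```

Marking up the scrambled word order with the same tags would still produce a well-formed document. It would not be valid, however, because the children would no longer appear in the order the SENTENCE model requires.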
To specify grammatical rules, DTDs use a notation modeled on regular expressions. Each element declaration describes the pattern of child elements that element may contain. The parser matches the document against those patterns to determine whether or not it is valid. Matching is done conservatively, so anything not specifically allowed by the DTD is forbidden.
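The regular-expression flavor shows up in the content-model operators. The fragment below uses a hypothetical MEMO element, which is not part of this tutorial's examples, to show each operator. A comma means sequence, | means choice, ? means optional, * means zero or more, and + means one or more.

```xml
<!-- Hypothetical MEMO vocabulary illustrating DTD content-model operators:
       ,   sequence: children appear in this order
       |   choice: exactly one of the alternatives
       ?   optional: zero or one
       *   zero or more
       +   one or more -->
<!ELEMENT MEMO       (TO+, CC*, SUBJECT?, (BODY | ATTACHMENT))>
<!ELEMENT TO         (#PCDATA)>
<!ELEMENT CC         (#PCDATA)>
<!ELEMENT SUBJECT    (#PCDATA)>
<!ELEMENT BODY       (#PCDATA)>
<!ELEMENT ATTACHMENT EMPTY>
```

Because matching is conservative, a MEMO containing TO, SUBJECT, and BODY is valid. A MEMO with two SUBJECT elements, a SUBJECT before any TO, or both a BODY and an ATTACHMENT is rejected, because nothing in the model permits those arrangements.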
See Ken Sall's Doing it With XML for examples that include a discussion of the DTD, when to use elements vs. attributes, XML validation, and viewing XML in IE5.
Okay, enough about what DTDs are. Let's look at how you'll build them.
Introduction to XML For Web Developers
The Prolog and The Body