Element Type Declarations (ETDs)
May 3, 1999
As we mentioned parenthetically, the previous DTD is "not
quite valid". The DTD really only says that the parser
should expect a document with a root element named CONTACTS.
It does not say anything about the contents or structure of
that document. However, to be valid, a document's DTD must
declare every element the document uses and specify what
each element may contain!
To specify the structure, we must populate the
"[ELEMENT_DEFINITIONS_GO_HERE]" portion of the DTD with
element declarations. A DTD declares each valid document
element using an Element Type Declaration (ETD).
ETDs specify the name of each element and what, if anything,
that element may contain. Elements may have several types of
content, ranging from nothing at all, to plain parsed
character data, to other elements, to other elements with their
own children, to any of the above.
ETDs follow the generic syntax:
<!ELEMENT ELEMENT_NAME CHILDREN_NAMES>
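To make the generic syntax concrete, here is a small Python sketch (our own illustration, not part of any XML tool) that splits each <!ELEMENT ...> declaration into the element name and everything after it, its content specification. The helper name parse_element_decls is hypothetical, and the regular expression is a simplification that ignores comments and parameter entities.

```python
import re

# Matches one declaration of the form <!ELEMENT name content-spec>.
# Simplified sketch: comments, parameter entities and quoting are not handled.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\S+)\s+(.+?)\s*>", re.DOTALL)

def parse_element_decls(dtd_text):
    """Return (name, content_spec) pairs in the order they appear."""
    return [(m.group(1), m.group(2)) for m in ELEMENT_DECL.finditer(dtd_text)]

dtd = """
<!ELEMENT CONTACTS ANY>
<!ELEMENT CONTACT (NAME)>
<!ELEMENT NAME (#PCDATA)>
"""
print(parse_element_decls(dtd))
# [('CONTACTS', 'ANY'), ('CONTACT', '(NAME)'), ('NAME', '(#PCDATA)')]
```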
In the case of our CONTACTS element we might see something
like the following:
<?xml version = "1.0" encoding="UTF-8" standalone = "yes"?>
<!DOCTYPE CONTACTS [
<!ELEMENT CONTACTS ANY>
]>
<CONTACTS>
</CONTACTS>
In this case, the DTD defines an XML document containing a single
root element named CONTACTS (don't forget that XML is case
sensitive) that may contain ANY type of content, including
parsed character data or other elements. The ANY keyword is
case sensitive too, so it must be written in uppercase.
Note, however, that although CONTACTS "could" contain other
elements, ANY admits only elements that are themselves declared,
and this DTD declares no element other than CONTACTS. Every
element in a valid XML document must be declared in the DTD.
Thus, the following XML, though well-formed, is invalid!
<?xml version = "1.0" encoding="UTF-8" standalone = "yes"?>
<!DOCTYPE CONTACTS [
<!ELEMENT CONTACTS ANY>
]>
<CONTACTS>
<CONTACT>
<NAME>Roger Kaplan</NAME>
</CONTACT>
</CONTACTS>
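The "every element must be declared" rule is mechanical enough to check with a few lines of code. The following is a minimal Python sketch, assuming Python 3 and its standard xml.parsers.expat module; the helper name undeclared_elements is our own invention. Expat is a non-validating parser, so it happily parses the invalid document above; the sketch simply records which names the DTD declares and which names the document uses, and reports the difference. It does not check content models.

```python
import xml.parsers.expat

def undeclared_elements(xml_bytes):
    """Return element names used in the document but not declared
    with <!ELEMENT ...> in its internal DTD subset (a sketch only)."""
    declared = set()
    used = []
    parser = xml.parsers.expat.ParserCreate()
    # Expat calls this once for every <!ELEMENT ...> declaration it reads;
    # the content model argument is ignored here.
    parser.ElementDeclHandler = lambda name, model: declared.add(name)
    # Expat calls this for every start tag in the document body.
    parser.StartElementHandler = lambda name, attrs: used.append(name)
    parser.Parse(xml_bytes, True)
    # dict.fromkeys drops repeated names while keeping first-use order.
    return [name for name in dict.fromkeys(used) if name not in declared]

doc = b"""<?xml version = "1.0" encoding="UTF-8" standalone = "yes"?>
<!DOCTYPE CONTACTS [
<!ELEMENT CONTACTS ANY>
]>
<CONTACTS>
<CONTACT>
<NAME>Roger Kaplan</NAME>
</CONTACT>
</CONTACTS>"""
print(undeclared_elements(doc))  # ['CONTACT', 'NAME']
```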
NOTE: Unlike elements, parsed character data inside an element
declared as ANY does not need to be declared. Thus, the
following XML document would be valid:
<?xml version = "1.0" encoding="UTF-8" standalone = "yes"?>
<!DOCTYPE CONTACTS [
<!ELEMENT CONTACTS ANY>
]>
<CONTACTS>
Here is some plain parsed character data.
</CONTACTS>
For the earlier CONTACTS document to be valid, you must also
declare the <CONTACT> and <NAME> elements:
<?xml version = "1.0" encoding="UTF-8" standalone = "yes"?>
<!DOCTYPE CONTACTS [
<!ELEMENT CONTACTS ANY>
<!ELEMENT CONTACT (NAME)>
<!ELEMENT NAME (#PCDATA)>
]>
<CONTACTS>
<CONTACT>
<NAME>Roger Kaplan</NAME>
</CONTACT>
</CONTACTS>
In this case, we have defined an XML document with a single
root element named CONTACTS. CONTACTS may contain parsed
character data or any declared child elements (ANY). In
particular, CONTACTS may contain the child element CONTACT.
CONTACT must contain exactly one child element, NAME (declared
as (NAME)), and NAME contains only parsed character data
(#PCDATA)!
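To see what each of these three declarations actually permits, here is a toy Python checker. It is an illustration under stated assumptions, not a real validator: the declarations are hand-copied into a hypothetical Python dictionary (DECLS) rather than read from the DTD, and only the three forms used so far are understood: ANY, (#PCDATA), and a parenthesized child name such as (NAME), treated here as "exactly these children, in this order". The document body is parsed with the standard xml.etree.ElementTree module, and check is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical hand-copied form of the three declarations:
#   "ANY"      -> any mix of character data and declared child elements
#   "#PCDATA"  -> character data only, no child elements
#   ("NAME",)  -> exactly these child elements, in this order, and no text
DECLS = {
    "CONTACTS": "ANY",
    "CONTACT": ("NAME",),
    "NAME": "#PCDATA",
}

def check(elem, decls, errors=None):
    """Recursively compare elem and its descendants against decls."""
    if errors is None:
        errors = []
    spec = decls.get(elem.tag)
    children = list(elem)
    if spec is None:
        errors.append(f"{elem.tag}: not declared")
    elif spec == "#PCDATA":
        if children:
            errors.append(f"{elem.tag}: child elements not allowed")
    elif isinstance(spec, tuple):
        if [child.tag for child in children] != list(spec):
            errors.append(f"{elem.tag}: expected children {list(spec)}")
        # Text before the first child and after each child (its tail) belongs
        # to elem; only whitespace is allowed there in element-only content.
        texts = [elem.text] + [child.tail for child in children]
        if any(text and text.strip() for text in texts):
            errors.append(f"{elem.tag}: character data not allowed")
    # "ANY" needs no local check; each child is still checked below,
    # which is how undeclared children of an ANY element get reported.
    for child in children:
        check(child, decls, errors)
    return errors

body = """<CONTACTS>
<CONTACT>
<NAME>Roger Kaplan</NAME>
</CONTACT>
</CONTACTS>"""
print(check(ET.fromstring(body), DECLS))  # []
```

Note that whitespace between child elements is accepted under (NAME); only non-whitespace text, or the wrong children, is reported.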
NOTE: It is bad form to use the ANY keyword for any element
other than the root element. Generally, you should make your
DTD as restrictive as your data allows. Think in terms of
everything being denied except what you specifically allow.
Also, note that the order in which you specify
ETDs does not matter. Thus,
<!ELEMENT NAME (#PCDATA)>
<!ELEMENT CONTACTS ANY>
<!ELEMENT CONTACT (NAME)>
would work just as well as
<!ELEMENT CONTACTS ANY>
<!ELEMENT CONTACT (NAME)>
<!ELEMENT NAME (#PCDATA)>
Finally, note that you may not declare the same element more
than once, for example with two different definitions:
<!ELEMENT CONTACTS ANY>
<!ELEMENT CONTACT (NAME)>
<!ELEMENT CONTACT (EMAIL)>
<!ELEMENT NAME (#PCDATA)>
The double declaration of CONTACT is an error: the XML 1.0
specification requires that no element type be declared more
than once.
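A duplicate declaration like this is easy to detect because a parser reports every declaration it reads. The sketch below, again assuming Python 3 and the standard xml.parsers.expat module (duplicate_declarations is a hypothetical helper name), wraps a list of declarations in a minimal document, collects every declared name, and returns the names that appear more than once. Expat itself does not reject the duplicate, because it is a non-validating parser; a validating parser would report it as an error.

```python
import xml.parsers.expat
from collections import Counter

def duplicate_declarations(declarations):
    """Return element names declared more than once (a sketch only)."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    # Record the name from every <!ELEMENT ...> declaration expat reads.
    parser.ElementDeclHandler = lambda name, model: names.append(name)
    # Wrap the declarations in a minimal document so expat will parse them.
    parser.Parse(f"<!DOCTYPE CONTACTS [\n{declarations}\n]>\n<CONTACTS/>", True)
    return [name for name, count in Counter(names).items() if count > 1]

decls = """<!ELEMENT CONTACTS ANY>
<!ELEMENT CONTACT (NAME)>
<!ELEMENT CONTACT (EMAIL)>
<!ELEMENT NAME (#PCDATA)>"""
print(duplicate_declarations(decls))  # ['CONTACT']
```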
The ANY and #PCDATA keywords are pretty straightforward. And in
this case, the definition of the NAME element as a child of
CONTACT was pretty simple as well.
NOTE:
Element names must begin with a letter, an underscore (_) or
a colon (:), followed by any combination of letters, digits,
periods (.), colons, underscores, or hyphens (-), with no
white space. Names beginning with "xml" in any combination of
upper and lower case are reserved and should not be used. It
is also a good idea not to start a name with a colon even
though it is legal, since a leading colon could be confusing.
Further, although the XML 1.0 standard places no limit on the
length of names, actual XML processors may limit the length
of markup names.
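These naming rules can be approximated with a regular expression. The Python sketch below is only an ASCII approximation: the real XML 1.0 Name production also admits many non-ASCII letters, which this pattern ignores, and check_name is a hypothetical helper. It reports an illegal name, warns about the reserved "xml" prefix, and flags a leading colon as legal but confusing.

```python
import re

# ASCII-only approximation of the naming rules: first character is a letter,
# underscore or colon; the rest are letters, digits, '.', '_', ':' or '-'.
NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")

def check_name(name):
    """Return a list of problems with a proposed element name (empty if none)."""
    problems = []
    if not NAME.fullmatch(name):
        problems.append("illegal characters or illegal first character")
    if name.lower().startswith("xml"):
        problems.append('names beginning with "xml" (any case) are reserved')
    if name.startswith(":"):
        problems.append("legal, but a leading colon is confusing")
    return problems

for candidate in ["CONTACT", "_contact-2.1", "2CONTACTS", "FIRST NAME", "XmlData", ":NAME"]:
    print(candidate, check_name(candidate))
```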
However, as we mentioned before, the regular-expression-style
syntax of DTD content models allows you to be very flexible
in declaring elements and their children.
Let's take a look.