Reprinted from
What is XML - March 8, 1999
Like HTML, XML (also known as Extensible Markup Language) is a
markup language which relies on the concept of rule-specifying tags
and the use of a tag-processing application that knows how
to deal with the tags.
"The correct title of this specification, and the correct
full name of XML, is "Extensible Markup Language".
"eXtensible Markup Language" is just a spelling error.
However, the abbreviation "XML" is not only correct but,
appearing as it does in the title of the specification,
an official name of the Extensible Markup Language.
The name and abbreviation were invented by James Clark; other
options under consideration had included MGML (Minimal
Generalized Markup Language), MAGMA (Minimal Architecture For
Generalized Markup Applications), and SLIM (Structured
Language for Internet Markup)" -
Extensible Markup Language (XML) 1.0 Specs, The Annotated
Version.
However, XML is far more powerful than HTML.
This is because of the "X". XML is "eXtensible".
Specifically, rather than providing a set of pre-defined tags,
as HTML does, XML specifies the rules by which you can define
your own markup languages, each with its own set of tags.
XML is a meta-markup language: it lets you define any number
of markup languages, all based upon the standards defined by XML.
"The design goals for XML are:
- XML shall be straightforwardly usable over the Internet.
- XML shall support a wide variety of applications.
- XML shall be compatible with SGML.
- It shall be easy to write programs which process XML documents.
- The number of optional features in XML is to be kept to the absolute
minimum, ideally zero.
- XML documents should be human-legible and reasonably clear.
- The XML design should be prepared quickly.
- The design of XML shall be formal and concise.
- XML documents shall be easy to create.
- Terseness in XML markup is of minimal importance."
-
Extensible Markup Language (XML) 1.0 Specs,
The Annotated Version.
Let's consider a very simple example. Let's create a new
markup language called SCLML (Selena's Client List Markup
Language). This language will define tags to represent
contact people and information about contact people.
The set of tags will be simple but expressive. Unlike HTML's
<UL> and <LI>, these tags can be understood immediately
just by reading the document.
<CONTACT>
<NAME>Gunther Birznieks</NAME>
<ID>001</ID>
<COMPANY>Bob's Fish Store</COMPANY>
<EMAIL>gunther@bobsfishstore.com</EMAIL>
<PHONE>662-9999</PHONE>
<STREET>1234 4th St.</STREET>
<CITY>New York</CITY>
<STATE>New York</STATE>
<ZIP>10024</ZIP>
</CONTACT>
<CONTACT>
<NAME>Susan Czigany</NAME>
<ID>002</ID>
<COMPANY>Netscape</COMPANY>
<EMAIL>susan@eudora.org</EMAIL>
<PHONE>555-1234</PHONE>
<STREET>9876 Hazen Blvd.</STREET>
<CITY>San Jose</CITY>
<STATE>California</STATE>
<ZIP>90034</ZIP>
</CONTACT>
Note that the use of XML is not limited to text markup. The
very extensibility of XML means that it could just as easily
be applied to sound markup or image markup. A tag such as
<EMPHASIZE> might be rendered as bold text on screen
but as a louder voice in audio!
What you see above is a very simple "XML document". As you
can see, it looks pretty similar to an HTML document.
But don't forget, as we said before, it is not enough to
simply encode (mark up) the data. For the data to be decoded
by someone or something else, the markup language
must follow standard rules covering:
- The syntax for marking up
- The meaning behind the markup
In other words, a processing application must know what
valid markup is (perhaps a tag) and what to do with it if it
is valid. After all, how would Netscape know what to do with
the above document? What in the world is a <PHONE> tag?
Is it a legal tag? How should it be displayed? Our markup
language must somehow communicate both the syntax and the
meaning of the markup so that the processing application
will know what to do with it.
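To make the problem concrete, here is a minimal sketch in Python, using the standard library's xml.etree.ElementTree module. It is an illustration only, not part of SCLML itself. It assumes the contacts are wrapped in a single enclosing element, given the hypothetical name <CONTACTS> here, because an XML parser expects exactly one outermost element. The parser can read the tags, but nothing tells it what a <PHONE> means or how to display one:

```python
import xml.etree.ElementTree as ET

# A shortened SCLML document. The <CONTACTS> wrapper is a hypothetical
# name added for this sketch so that there is a single outermost element.
SCLML = """
<CONTACTS>
  <CONTACT>
    <NAME>Gunther Birznieks</NAME>
    <PHONE>662-9999</PHONE>
  </CONTACT>
  <CONTACT>
    <NAME>Susan Czigany</NAME>
    <PHONE>555-1234</PHONE>
  </CONTACT>
</CONTACTS>
"""

def list_tags(xml_text):
    """Return the distinct tag names in the order they first appear."""
    root = ET.fromstring(xml_text)
    seen = []
    for element in root.iter():  # the root first, then its descendants
        if element.tag not in seen:
            seen.append(element.tag)
    return seen

tags = list_tags(SCLML)
print(tags)
```

The parser reports the tag names happily enough, but to it they are just strings. Whether <PHONE> is legal, and what to do with it, has to come from somewhere else.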
In XML, the definition of valid markup is handled by a
Document Type Definition (DTD), which communicates the
structure of the markup language. The DTD specifies what it
means to be a valid tag (the syntax for marking up).
We'll discuss the details of DTDs later. For now, just get
comfortable with the idea of a DTD as a separate component
of the equation.
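Until then, the following Python sketch gives a rough feel for the kind of question a DTD answers. It is not a DTD and does not use DTD syntax. The ALLOWED_CHILDREN table, the is_valid function, the <CONTACTS> wrapper element, and the <FAX> tag are all hypothetical names invented for this illustration. The table simply records which tags may appear inside which other tags, which is only a small part of what a real DTD can express:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a DTD: which child tags each element may contain.
# Tags that are not listed here (NAME, PHONE, ...) may not contain children.
ALLOWED_CHILDREN = {
    "CONTACTS": {"CONTACT"},
    "CONTACT": {"NAME", "ID", "COMPANY", "EMAIL", "PHONE",
                "STREET", "CITY", "STATE", "ZIP"},
}

def is_valid(element):
    """Check that every child below this element is allowed by the table."""
    allowed = ALLOWED_CHILDREN.get(element.tag, set())
    for child in element:
        if child.tag not in allowed or not is_valid(child):
            return False
    return True

good = ET.fromstring(
    "<CONTACTS><CONTACT><NAME>Susan Czigany</NAME>"
    "<PHONE>555-1234</PHONE></CONTACT></CONTACTS>"
)
# <FAX> and its number are made up here to show a tag the table does not allow.
bad = ET.fromstring(
    "<CONTACTS><CONTACT><NAME>Susan Czigany</NAME>"
    "<FAX>555-0000</FAX></CONTACT></CONTACTS>"
)

print(is_valid(good))
print(is_valid(bad))
```

The point of the sketch is the separation: the rules live in one place (the table) and the data lives in another (the document), so the same checker could handle a different markup language simply by being handed a different table.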
Yet we must communicate the meaning of the markup as
well as the syntax.
To specify what valid tags mean, XML documents are also
associated with "style sheets", which provide presentation
instructions for a processing application like a web browser.
A style sheet, the details of which we will discuss later, might
specify display instructions such as:
- Anytime you see a <CONTACT> tag, display it using a <UL>
tag. Similarly, </CONTACT> tags should be converted to </UL> tags.
- All <NAME> tags should be replaced with <LI> tags, and
</NAME> tags should be ignored.
- All <EMAIL> tags should be replaced with <LI> tags, and
</EMAIL> tags should be ignored.
etc.
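As a rough illustration only, the rules above can be mimicked in a few lines of Python. This is not style sheet syntax, and the RULES table and the to_html function are hypothetical names for this sketch. One deliberate difference: it emits a closing </LI> for each item instead of ignoring the closing tag, so that the HTML it produces is balanced. Tags with no rule, such as <ID>, are simply skipped:

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical rule table: SCLML tag -> HTML tag used to display it.
RULES = {"CONTACT": "UL", "NAME": "LI", "EMAIL": "LI"}

def to_html(element):
    """Convert one SCLML element to HTML, skipping tags that have no rule."""
    html_tag = RULES.get(element.tag)
    if html_tag is None:
        return ""
    text = escape((element.text or "").strip())
    children = "".join(to_html(child) for child in element)
    return f"<{html_tag}>{text}{children}</{html_tag}>"

contact = ET.fromstring(
    "<CONTACT>"
    "<NAME>Gunther Birznieks</NAME>"
    "<ID>001</ID>"
    "<EMAIL>gunther@bobsfishstore.com</EMAIL>"
    "</CONTACT>"
)

output = to_html(contact)
print(output)
```

Changing how contacts are displayed now means editing the RULES table, not the SCLML document.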
In this example, the style sheet utilizes the functionality
of HTML to define the formatting of SCLML. But if the XML
document were being processed by a program other than a web
browser, the HTML translation step might be bypassed.
Processing applications combine the rules of the style sheet
and the DTD with the data of the SCLML document, and display
the data according to those rules.
But wait, isn't this quite complex? Now instead of a single
HTML document which defines the data and the rules to display
the data, we have an SCLML document, a DTD, AND a style sheet.
That's three pieces as opposed to just one.
Further, we need a processing agent that can do the work of
putting the DTD, style sheet, and SCLML document together.
Remember, web browsers are made to read a specific
markup language (like HTML), not just any markup language.
That means we have three documents to pull together plus one
processing program to write or buy. What a mess.
In fact, though there are a few more hurdles to
jump in order to use XML, there are several reasons why all
this is worth it. Let's take a look at them.