Reprinted from
Advantages of XML: Moving Beyond Format - March 8, 1999
However cool the idea of escaping the limitations of
a basic tag set (like HTML) sounds, it isn't even close
to the best thing about XML.
The real power of XML comes from the fact that with XML, not
only can you define your own set of tags, but the rules
specified by those tags need not be limited to formatting
rules. XML allows you to define all sorts of tags with all
sorts of rules, such as tags representing business rules or
tags representing data description or data relationships.
Consider again the case of the contact list in SCLML.
Using standard HTML, a developer might use something like
the following:
<UL>
<LI>Gunther Birznieks
<UL>
<LI>Client ID: 001
<LI>Company: Bob's Fish Store
<LI>Email: gunther@bobsfishstore.com
<LI>Phone: 662-9999
<LI>Street Address: 1234 4th St.
<LI>City: New York
<LI>State: New York
<LI>Zip: 10024
</UL>
<LI>Susan Czigany
<UL>
<LI>Client ID: 002
<LI>Company: Netscape
<LI>Email: susan@eudora.org
<LI>Phone: 555-1234
<LI>Street Address: 9876 Hazen Blvd.
<LI>City: San Jose
<LI>State: California
<LI>Zip: 90034
</UL>
</UL>
While this may be an acceptable way to store and
display your data, it is hardly the most efficient or
powerful. As you are probably aware, there are many potential
problems associated with marking up your data using HTML.
Three particularly serious problems come to mind:
- The GUI is embedded in the data. What happens if
you decide that you like a table-based presentation better
than a list-based presentation? In order to change to a
table-based presentation, you must recode all your HTML!
This could mean editing many pages.
- Searching for information in the data is tough.
How would you get a quick list of only the clients in
California? Certainly, some type of script would be
necessary. But how would that script work? It would
probably have to search through the file word for word
looking for the string "California". And even if it
found matches, it would have no way of knowing that
"California" has a relationship to "New York": both
are states. The relationships between pieces of data,
which are crucial to powerful searching, are lost.
- The data is tied to the logic and language of HTML.
What happens if you want to present your data in a Java
applet? Unfortunately, your Java applet would have
to parse the HTML document, strip out the tags,
and reformat the data. Non-HTML processing applications
should not be burdened with extraneous work.
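To make the search problem concrete, here is a minimal Python sketch of the word-for-word approach, run against a shortened copy of the HTML list above. The function name find_lines is made up for this illustration.

```python
# A shortened copy of the HTML contact list shown earlier.
html = """<UL>
<LI>Gunther Birznieks
<UL>
<LI>Client ID: 001
<LI>Company: Bob's Fish Store
<LI>City: New York
<LI>State: New York
</UL>
<LI>Susan Czigany
<UL>
<LI>Client ID: 002
<LI>Company: Netscape
<LI>City: San Jose
<LI>State: California
</UL>
</UL>"""


def find_lines(text, word):
    """Return every line that contains `word`.

    This is plain string matching: the function has no idea whether
    a hit is a city, a state, or part of a company name.
    """
    return [line for line in text.splitlines() if word in line]


matches = find_lines(html, "New York")
for line in matches:
    print(line)
```

Searching for "New York" returns both the City line and the State line, and neither hit says which client it belongs to. The script could check for the "State:" label, but that label is only display text that happens to be there.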
With XML, these and similar problems are solved.
In XML, the same page would look like the following:
<CLIENT>
<NAME>Gunther Birznieks</NAME>
<ID>001</ID>
<COMPANY>Bob's Fish Store</COMPANY>
<EMAIL>gunther@bobsfishstore.com</EMAIL>
<PHONE>662-9999</PHONE>
<STREET>1234 4th St.</STREET>
<CITY>New York</CITY>
<STATE>New York</STATE>
<ZIP>10024</ZIP>
</CLIENT>
<CLIENT>
<NAME>Susan Czigany</NAME>
<ID>002</ID>
<COMPANY>Netscape</COMPANY>
<EMAIL>susan@eudora.org</EMAIL>
<PHONE>555-1234</PHONE>
<STREET>9876 Hazen Blvd.</STREET>
<CITY>San Jose</CITY>
<STATE>California</STATE>
<ZIP>90034</ZIP>
</CLIENT>
As you can see, custom tags are used to bring meaning to the
data being displayed. When stored this way, data becomes
extremely portable because it carries with it its description
rather than its display. Display is "extracted" from the data
and, as we will see later, incorporated into a "style sheet".
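Style sheets come later. In the meantime, a rough Python sketch can show the same idea: the client data is read once and handed to two different display functions. The <CLIENTS> wrapper element (an XML document needs a single root), the shortened field list, and the names load_clients, as_list, and as_table are all assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Assumption: the <CLIENT> records are wrapped in one <CLIENTS> root element.
xml_text = """<CLIENTS>
<CLIENT>
<NAME>Gunther Birznieks</NAME>
<COMPANY>Bob's Fish Store</COMPANY>
<STATE>New York</STATE>
</CLIENT>
<CLIENT>
<NAME>Susan Czigany</NAME>
<COMPANY>Netscape</COMPANY>
<STATE>California</STATE>
</CLIENT>
</CLIENTS>"""

FIELDS = ["NAME", "COMPANY", "STATE"]


def load_clients(text):
    """Parse the XML once and return one dict per <CLIENT>."""
    root = ET.fromstring(text)
    return [
        {field: client.findtext(field, default="") for field in FIELDS}
        for client in root.findall("CLIENT")
    ]


def as_list(clients):
    """Render the clients as a nested HTML list."""
    items = []
    for c in clients:
        details = "".join(
            "<LI>" + field.title() + ": " + escape(c[field], quote=False)
            for field in FIELDS[1:]
        )
        items.append("<LI>" + escape(c["NAME"], quote=False) + "<UL>" + details + "</UL>")
    return "<UL>" + "".join(items) + "</UL>"


def as_table(clients):
    """Render the same clients as an HTML table, one row per client."""
    rows = "".join(
        "<TR>" + "".join("<TD>" + escape(c[f], quote=False) + "</TD>" for f in FIELDS) + "</TR>"
        for c in clients
    )
    return "<TABLE>" + rows + "</TABLE>"


clients = load_clients(xml_text)
print(as_list(clients))
print(as_table(clients))
```

Switching from a list to a table means calling a different function. The XML itself is never edited.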
Let's consider some of the benefits.
- With XML, the GUI is extracted. Thus, changes to display
do not require futzing with the data. Instead, a separate
style sheet will specify a table display or a list display.
- Searching the data is easy and efficient. Search engines
can simply parse the description-bearing tags rather than
muddling in the data. Tags provide the search engines
with the intelligence they lack.
- Complex relationships like trees and inheritance can be
communicated.
- The code is much more legible to a person coming into the
environment with no prior knowledge. In the above example,
it is obvious that <ID>002</ID> represents
an ID, whereas the meaning of <LI>002 might not be obvious.
XML is self-describing.
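As a small illustration of the searching benefit, here is a Python sketch that answers the earlier question ("which clients are in California?") by asking for the STATE tag instead of hunting for a string. It uses Python's standard xml.etree.ElementTree module. The <CLIENTS> wrapper element and the function names clients_in_state and known_states are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Assumption: the <CLIENT> records are wrapped in one <CLIENTS> root element.
xml_text = """<CLIENTS>
<CLIENT>
<NAME>Gunther Birznieks</NAME>
<CITY>New York</CITY>
<STATE>New York</STATE>
</CLIENT>
<CLIENT>
<NAME>Susan Czigany</NAME>
<CITY>San Jose</CITY>
<STATE>California</STATE>
</CLIENT>
</CLIENTS>"""


def clients_in_state(root, state):
    """Return the names of clients whose <STATE> equals `state`."""
    return [
        client.findtext("NAME")
        for client in root.findall("CLIENT")
        if client.findtext("STATE") == state
    ]


def known_states(root):
    """Return every distinct <STATE> value, sorted alphabetically."""
    return sorted({client.findtext("STATE") for client in root.findall("CLIENT")})


root = ET.fromstring(xml_text)
print(clients_in_state(root, "California"))
print(known_states(root))
```

Because the lookup goes through the STATE tag, the city of New York is never mistaken for the state, and the script can tell that "California" and "New York" are the same kind of thing.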