Wednesday, February 20, 2008

Why do we prefer XML in Webpages?

The Extensible Markup Language (XML) is a way of describing the content elements of a page to a Web browser. XML is syntactically similar to HTML, and it can be used in many of the places where HTML is used today. Here's an example. Imagine that the JDC Tech Tip index was stored in XML instead of HTML.
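
As a rough illustration of what such an index might look like, here is a minimal sketch in Python that embeds a small XML document and reads it back. Only the attribute names (title, author, htmlURL, textURL) come from the text; the element names `tips` and `tip`, the example.com URLs, and every attribute value are placeholders assumed for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical tip index. The element names (tips, tip) and all attribute
# values are placeholders; only the attribute names come from the article.
TIP_INDEX = """\
<tips>
  <tip title="Example Tip One" author="abc"
       htmlURL="http://example.com/tips/one.html"
       textURL="http://example.com/tips/one.txt"></tip>
  <tip title="Example Tip Two" author="xyz"
       htmlURL="http://example.com/tips/two.html"
       textURL="http://example.com/tips/two.txt"></tip>
</tips>
"""

# Parse the document into a tree of elements.
root = ET.fromstring(TIP_INDEX)

# Collect the title attribute of every <tip> element, in document order.
titles = [tip.get("title") for tip in root.iter("tip")]

# Print each tip's title, author, and HTML link.
for tip in root.iter("tip"):
    print(tip.get("title"), "by", tip.get("author"), "at", tip.get("htmlURL"))
```

The parser hands back named elements and attributes, so the program never has to guess which piece of text is a title and which is a link.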
In each case, the document is organized as a hierarchy of elements, where each element is delimited by tags written in angle brackets. As is true for most HTML elements, each XML element consists of a start tag, followed by some data, followed by an end tag: <element>element data</element>
Also as in HTML, XML elements can be annotated with attributes. In the XML example above, each element has several attributes. The 'title' attribute is the name of the tip, the 'author' attribute gives a short form of the author's name, and the 'htmlURL' and 'textURL' attributes contain links to different archived formats of the tip.

The similarities between the two markup languages are an important advantage as the world moves to XML, because hard-earned HTML skills continue to be useful. However, they do raise the question "Why bother to switch to XML at all?" To answer this question, look again at the XML example above, and this time consider the semantics instead of the syntax. Where HTML tells you how to format a document, XML tells you about the content of the document. This capability is very powerful. In an XML world, clients can reorganize data in whatever way is most useful to them. They are not restricted to the presentation format delivered by the server.

Importantly, the XML format has been designed for the convenience of parsers, without sacrificing readability. XML imposes strong guarantees about the structure of documents. To name a few: start tags must have end tags, elements must nest properly, and all attributes must have values. This strictness makes parsing and transforming XML much more reliable than attempting to manipulate HTML.
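
Both points can be sketched briefly in Python with the standard library's `xml.etree.ElementTree` parser. The element names `tips` and `tip`, the sample data, and the helper names `titles_by_author` and `is_well_formed` are all hypothetical, chosen only for illustration. The first helper regroups the index by author on the client side; the second shows that a conforming parser rejects documents that break the structural rules listed above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical tip index; element names and values are placeholders.
INDEX = """\
<tips>
  <tip title="Tip A" author="abc"></tip>
  <tip title="Tip B" author="xyz"></tip>
  <tip title="Tip C" author="abc"></tip>
</tips>
"""

def titles_by_author(xml_text):
    """Regroup the index by author, a view the server did not provide."""
    grouped = defaultdict(list)
    for tip in ET.fromstring(xml_text).iter("tip"):
        grouped[tip.get("author")].append(tip.get("title"))
    return dict(grouped)

def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(titles_by_author(INDEX))

# Each of these breaks one of the rules named in the text.
missing_end_tag = "<tips><tip title='x'></tips>"
improper_nesting = "<tips><tip title='x'></tips></tip>"
attribute_without_value = "<tips><tip title></tip></tips>"

for bad in (missing_end_tag, improper_nesting, attribute_without_value):
    print(is_well_formed(bad), bad)
```

Because malformed input is rejected outright rather than silently repaired, code that transforms the data can rely on the tree it receives.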
