LIS 525 - XML

XML (Extensible Markup Language) is a simplified version (subset) of SGML (Standard Generalized Markup Language).

Here is an example of the contents of an actual XML file, xsl-mappings.xml, a file that specifies default style sheets for output in the Windows Management Instrumentation (WMI) infrastructure in Microsoft Windows XP Embedded:


Unlike HTML, XML has no default way of displaying the data in a file. Display formats have to be defined in separate style sheets written, for example, in Cascading Style Sheets (CSS) or Extensible Stylesheet Language (XSL). In XML, the style sheet is referenced not through a link element, as in HTML, but through a processing instruction of the form

<?xml-stylesheet href="URL" type="text/css"?>
or, for XSL, the same with type "text/xml".
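
As a minimal sketch, a document that references a CSS style sheet might begin as follows. The file name catalog.css and the element names catalog and book are hypothetical, chosen only for illustration.

```xml
<?xml version="1.0"?>
<!-- catalog.css is a hypothetical style sheet in the same directory -->
<?xml-stylesheet href="catalog.css" type="text/css"?>
<catalog>
  <book>Introduction to Indexing</book>
</catalog>
```

Note that the xml-stylesheet line sits in the prolog, before the root element.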

In addition, an XML document may or may not be linked to a Document Type Definition (DTD). Examples of XML code linking documents to DTDs are

<!DOCTYPE toolbar:toolbarlayouts PUBLIC "-// OfficeDocument 1.0//EN" "toolbar.dtd">
which uses a Formal Public Identifier (FPI) and a filename to identify the DTD and
<!DOCTYPE VocLoadFile SYSTEM "">
which uses only a System Identifier. (In XML, a system identifier can be a URL or other URI, which largely nullifies the main advantage of bothering with an FPI.) In both examples, the first name after "DOCTYPE" (toolbar:toolbarlayouts, VocLoadFile) identifies the root element of the document's structure.

The DTD can also be included in the XML file itself, as an internal subset enclosed in square brackets within the DOCTYPE declaration, before the root element.
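
A small sketch of a document that carries its own DTD might look like this. The element names note, to, and body are assumptions made up for the example, not taken from any of the files mentioned here.

```xml
<?xml version="1.0"?>
<!-- The element names note, to, and body are hypothetical -->
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Students</to>
  <body>Class is cancelled on Friday.</body>
</note>
```

Here the name after "DOCTYPE" (note) matches the root element, and the declarations between the square brackets say that a note contains exactly one to element followed by one body element, each holding plain text.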

A more recent alternative to DTDs is XML schemas, which follow a standard intended to overcome limitations of DTDs and to allow document types to be specified in XML itself. An XML schema automatically generated for the XML file xsl-mappings.xml can be seen in the file 525xml.xml.
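
To give a rough idea of what a schema looks like, the following hand-written sketch describes a hypothetical note element that contains a to element followed by a body element, each holding a string. It is an illustration only and is not the generated schema for xsl-mappings.xml.

```xml
<?xml version="1.0"?>
<!-- A hand-written sketch; note, to, and body are hypothetical names -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Because the schema is itself an XML document, it can be checked and processed with the same tools as the documents it describes.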

Optionally, an XML file may begin with an XML declaration, identifying the version of XML and possibly other properties, such as the character encoding; for example,

<?xml version="1.0"?>

Element and attribute names in XML are case-sensitive (again unlike HTML).
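
The following sketch illustrates the point with hypothetical element names. It also shows an XML declaration that specifies an encoding.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical names: <Title> and <title> are two different elements -->
<record>
  <Title>Stored under one element name</Title>
  <title>Stored under a different element name</title>
</record>
<!-- By contrast, <Title>text</title> would not be well-formed,
     because the end tag does not match the start tag -->
```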

Web browsers deal differently with XML files. If the XML file does not specify a style, Netscape Navigator and Firefox display the full XML text with syntax highlighting. Internet Explorer opens it with the default application. Depending on the preferences for the file type, Opera either displays only the text outside the tags or opens the file with the default application.

If the XML file does specify a style in a stylesheet, Netscape Navigator, Firefox, and Opera show the contents according to the stylesheet. Internet Explorer opens it with the default application; if this happens to be Internet Explorer, then the stylesheet is applied.

Some Markup Languages Written in XML

XHTML (Extensible Hypertext Markup Language) is a reformulation of HTML as an XML application. Major differences from standard HTML: all element and attribute names have to be lower-case; all elements have to be terminated explicitly, either with a </element-name> end tag or, for empty elements, with a slash before the > of the start tag; and the id attribute has to be used instead of, or as well as, the name attribute for named anchors. Many Web pages are actually written in XHTML but are given the extension .htm or .html and delivered by servers as text/html so that browsers will handle them appropriately.
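
A short page illustrating these differences might look like the following sketch. The page content is invented for the example; the DOCTYPE shown is the one for XHTML 1.0 Transitional.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Sample page</title>
  </head>
  <body>
    <!-- Named anchor carrying both id and name -->
    <h1><a id="top" name="top">Sample page</a></h1>
    <!-- Non-empty element closed with an end tag -->
    <p>First paragraph.</p>
    <!-- Empty elements closed with a slash before the > -->
    <p>Line one<br />line two</p>
    <hr />
  </body>
</html>
```

The space before the slash in <br /> and <hr /> is not required by XML; it is a common convention intended to help older HTML browsers cope with the syntax.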

SMIL (Synchronized Multimedia Integration Language) is intended to enable Web developers to divide multimedia content into separate files and streams, send these individually to a user's computer, and then have them displayed together as if they were a single multimedia stream. This is supposed to use less bandwidth than transmitting presentations as complete units. SMIL defines elements specifying whether the various multimedia components should be played together or in sequence. SMIL presentations can be played with RealPlayer.
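
As a rough sketch of how playing together and playing in sequence are expressed, the following SMIL fragment uses the par (parallel) and seq (sequence) elements. The media file names are hypothetical.

```xml
<smil>
  <body>
    <!-- seq: children play one after another -->
    <seq>
      <!-- par: children play at the same time -->
      <par>
        <audio src="narration.rm"/>
        <img src="slide1.gif" dur="10s"/>
      </par>
      <video src="closing.rm"/>
    </seq>
  </body>
</smil>
```

In this sketch the narration and the slide are presented together, and the closing video starts only once that parallel group has finished.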

VocML (Vocabulary [Products] MarkUp Language) is a DTD under development to support structured representation of a wide range of knowledge organization resources, "including authority files, hierarchical thesauri (including those with polyhierarchies), classification schemes, digital gazetteers, and subject heading lists."

There is also the somewhat simpler ZThes DTD, designed specifically for thesaurus data.

MathML (Mathematical Markup Language) is an XML application for capturing both structure and content of mathematical notation.
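
For illustration, the expression "x squared" might be marked up as in the following sketch, which pairs presentation markup (how the notation is laid out) with content markup (what it means) by way of the semantics element. The choice of expression is arbitrary.

```xml
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <semantics>
    <!-- Presentation markup: the visual layout of x squared -->
    <msup>
      <mi>x</mi>
      <mn>2</mn>
    </msup>
    <!-- Content markup: the meaning, "x raised to the power 2" -->
    <annotation-xml encoding="MathML-Content">
      <apply>
        <power/>
        <ci>x</ci>
        <cn>2</cn>
      </apply>
    </annotation-xml>
  </semantics>
</math>
```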

WML (Wireless Markup Language) is also written in XML.

DocBook is a DTD widely adopted by authors of various kinds of books and supported by various commercial tools.

For More Information


Last updated October 29, 2007.
This page maintained by Prof. Tim Craven
E-mail (text/plain only):
Faculty of Information and Media Studies
University of Western Ontario,
London, Ontario
Canada, N6A 5B7