XML (eXtensible Markup Language) is a standard maintained by the World Wide Web Consortium for creating special-purpose markup languages. It is general enough that XML-based languages can be used to describe many different kinds of data as well as text. Its primary purpose is to facilitate the sharing of structured text and information across the Internet. Although based on SGML, it is greatly simplified, while including enhancements for portability. Languages based on XML (for example, RDF, SMIL, MathML, and SVG) are themselves described in a formal way, allowing programs to modify and validate documents in these languages without prior knowledge of their particular form.

Strengths and Weaknesses

The features of the XML format that make it particularly appropriate for data transfer are:

  • compatibility with web and internet protocols
  • simultaneously human and machine readable format
  • support for Unicode, which represents all current and many historical character sets
  • the ability to represent the most general computer science data structures (records, lists and trees)
  • the format is self-documenting in that it describes the structure and field names as well as specific values
  • strict syntax makes the necessary parsing algorithms fast and efficient.
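
As a rough illustration of the points about data structures, self-documentation and Unicode, the sketch below uses Python's standard xml.etree.ElementTree module to turn a record containing a list into XML text. The record, its field names and the helper record_to_xml are invented for this example and are not part of any standard.

```python
import xml.etree.ElementTree as ET

def record_to_xml(tag, record):
    """Serialize a dict (record) whose values are scalars or lists of scalars."""
    element = ET.Element(tag)
    for field, value in record.items():
        if isinstance(value, list):
            # A list becomes a run of repeated child elements.
            for item in value:
                ET.SubElement(element, field).text = str(item)
        else:
            # A scalar field becomes a single child element.
            ET.SubElement(element, field).text = str(value)
    return element

# Hypothetical record; the non-ASCII text exercises the Unicode support.
person = {"name": "Zoë", "language": ["English", "Ελληνικά"]}
xml_text = ET.tostring(record_to_xml("person", person), encoding="unicode")
print(xml_text)
```

The field names travel with the values, which is what makes the output self-documenting.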

The weaknesses of the format relate to matters of efficiency, since XML

  • is not compressed
  • still requires further parsing to extract individual values.
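
Both weaknesses can be seen in a small sketch using Python's standard zlib and xml.etree.ElementTree modules. The sample data is hypothetical: one ingredient record repeated 100 times to mimic the repetitive markup of a typical data file.

```python
import zlib
import xml.etree.ElementTree as ET

# Hypothetical data: 100 identical records wrapped in a made-up <list> element.
record = '<ingredient amount="3 cups">Flour</ingredient>'
document = "<list>" + record * 100 + "</list>"

# The markup is stored uncompressed, so a general-purpose compressor
# shrinks it considerably.
raw = document.encode("utf-8")
compressed = zlib.compress(raw)
print(len(raw), len(compressed))

# Parsing yields strings only; "3 cups" still has to be split and converted.
first = ET.fromstring(document)[0]
quantity_text, unit = first.get("amount").split()
quantity = float(quantity_text)
print(quantity, unit)
```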

For generic, loosely coupled data transfer the strengths outweigh the weaknesses. In many applications where efficiency is not a particular concern, an XML format is also being adopted simply because tools to manipulate XML are now conveniently at hand.

Syntax rules for an XML file

XML files are plain text files. The character encoding can be specified in the XML declaration on the first line. The default encoding is UTF-8, which includes, but is not limited to, ASCII.
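
The effect of the encoding declaration can be sketched with Python's standard xml.etree.ElementTree module. The element name word and the sample text are invented for the example.

```python
import xml.etree.ElementTree as ET

# The same text stored in two encodings. The first document declares
# ISO-8859-1; the second has no declaration and is read as UTF-8.
latin1_bytes = '<?xml version="1.0" encoding="ISO-8859-1"?><word>café</word>'.encode("iso-8859-1")
utf8_bytes = "<word>café</word>".encode("utf-8")

word_from_latin1 = ET.fromstring(latin1_bytes).text
word_from_utf8 = ET.fromstring(utf8_bytes).text
print(word_from_latin1 == word_from_utf8 == "café")

# ISO-8859-1 bytes without a declaration are read as UTF-8 and rejected,
# because the byte for "é" is not a valid UTF-8 sequence here.
try:
    ET.fromstring("<word>café</word>".encode("iso-8859-1"))
    undeclared_latin1_accepted = True
except ET.ParseError:
    undeclared_latin1_accepted = False
print(undeclared_latin1_accepted)
```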

Unlike, for example, HTML, XML is highly dependent upon structure, content and integrity for its efficacy. In order for a document to be considered "well-formed", i.e. fully XML-compliant, an XML file must conform (at the very least) to the following:

  • A well-formed XML document must have one (and only one) root element.
  • Elements which have content must possess both an opening and a closing tag. (An empty element may be written as <example/>, a single tag that both opens and closes the element. It merely saves having to write <example></example> to preserve well-formedness.)
  • All attribute values must be quoted.
  • Tags may be nested but may not overlap.

Element names in XML are case-sensitive: for example <Example> and </Example> are a well-formed matching pair whereas <Example> and </example> are not.
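
The rules above can be checked mechanically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree parser as the judge of well-formedness; the helper name is_well_formed and the sample documents are invented for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# One sample that follows the rules, then one violation of each rule.
samples = {
    "single root, empty tag": "<doc><example/></doc>",
    "two root elements": "<doc/><doc/>",
    "missing closing tag": "<doc><item></doc>",
    "unquoted attribute": "<doc id=1/>",
    "overlapping tags": "<a><b></a></b>",
    "case mismatch": "<Example></example>",
}
for label, text in samples.items():
    print(label, is_well_formed(text))
```

Only the first sample is accepted; each of the others breaks one of the rules listed above.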

Also, again unlike HTML, XML tags explain what the data means rather than simply how to display it.

As a concrete example, a simple recipe expressed in an XML representation might be:

        <Recipe name="bread" prep_time="5 mins" cook_time="3 hours" >
            <ingredient amount="3 cups" >Flour</ingredient>
            <ingredient amount="0.25 ounce" >Yeast</ingredient>
            <ingredient amount="1 1/4 cups" >Warm Water</ingredient>
            <ingredient amount="1 teaspoon" >Salt</ingredient>
            <Instructions>Mix all ingredients together, and knead thoroughly.
                 Cover with a cloth, and leave for one hour in warm room.
                 Knead again, place in a tin, and then bake in the oven.
            </Instructions>
        </Recipe>
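
Because the tags name the data, a program can pull individual values out of such a recipe by name. The sketch below does so with Python's standard xml.etree.ElementTree module; the copy of the recipe embedded here shortens the instructions and closes every element, as well-formedness requires.

```python
import xml.etree.ElementTree as ET

recipe_xml = """
<Recipe name="bread" prep_time="5 mins" cook_time="3 hours">
    <ingredient amount="3 cups">Flour</ingredient>
    <ingredient amount="0.25 ounce">Yeast</ingredient>
    <ingredient amount="1 1/4 cups">Warm Water</ingredient>
    <ingredient amount="1 teaspoon">Salt</ingredient>
    <Instructions>Mix all ingredients together, and knead thoroughly.</Instructions>
</Recipe>
""".strip()

recipe = ET.fromstring(recipe_xml)

# Attributes are read with get(); element content is available as .text.
print(recipe.get("name"), recipe.get("cook_time"))
ingredients = [(item.text, item.get("amount")) for item in recipe.findall("ingredient")]
for name, amount in ingredients:
    print(amount, name)
```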

Document Type Definition

Before the advent of generalised data description languages such as SGML and XML, software designers had to define special file formats or small languages to share data between programs. This required writing detailed specifications and special purpose parsers and writers. For a language based on XML, however, the software designer can specify the basic syntax by writing a DTD, or a more detailed description using an XML Schema. There are readily available (and in some cases free) tools which understand these descriptions -- XML parsers and writers (http://www.w3.org/XML/#software). This may significantly reduce life-cycle development cost.

When an XML file both complies with the rules for well-formedness and conforms to the DTD or Schema to which it refers, it is considered a "valid document".
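
The difference between well-formed and valid can be sketched in Python. The standard xml.etree.ElementTree parser is non-validating, so it accepts the document below even though the document breaks its own DTD. The note vocabulary is hypothetical, and the final check is a hand-written stand-in for a single DTD rule, not a general validator.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: the internal DTD says <note> holds exactly one <to>.
document = """<!DOCTYPE note [
  <!ELEMENT note (to)>
  <!ELEMENT to (#PCDATA)>
]>
<note><from>Alice</from></note>"""

# Well-formedness only: the parser reads the DTD but does not enforce it.
root = ET.fromstring(document)
child_tags = [child.tag for child in root]
print(root.tag, child_tags)

# A validating parser would reject this document. This comparison stands in
# for the one rule "note contains exactly one to".
is_valid = child_tags == ["to"]
print(is_valid)
```

Full validation against a DTD or Schema requires a validating parser, which this sketch does not provide.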

Displaying XML files on the web

A further adjunct to XML is the stylesheet language XSL, which allows users to describe visual properties and transformations of XML data without embedding those instructions into the data itself. The resulting file is then typically an HTML file, which uses CSS for rendering.

An XML file may also be rendered directly in some browsers, such as Internet Explorer 5 or Mozilla, using the stylesheet language CSS. This process is not yet stable as of January 2003. The XML file must then include a reference to a style sheet:

 <?xml-stylesheet type="text/css" href="myStyleSheet.css"?>
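
A program can find such a reference because it is an ordinary processing instruction. A minimal sketch using Python's standard xml.dom.minidom module follows; the article and title elements are invented for the example.

```python
from xml.dom import minidom

document = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/css" href="myStyleSheet.css"?>'
    '<article><title>XML</title></article>'
)
dom = minidom.parseString(document)

# Processing instructions placed before the root element are children
# of the document node itself.
stylesheet_pis = [
    node for node in dom.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
    and node.target == "xml-stylesheet"
]
print(stylesheet_pis[0].data)
```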

While browser-based XML rendering develops, the alternative is conversion into HTML, PDF or other formats on the server. Programs like Cocoon (http://xml.apache.org/cocoon/index) process an XML file against a stylesheet (and can perform other processing as well) and send the output back to the user's browser without the user needing to be aware of what has been going on in the background.

Components of XML

  • XPath: It is possible to refer to individual components of an XML document using XPath. This allows stylesheets in (e.g.) XSL or XSLT to dynamically "cherry-pick" pieces of a document in any sequence needed in order to compose the required output (i.e. documents do not need to be processed sequentially).
  • The XML query language XQuery is to XML what SQL is to relational databases.
  • Namespaces enable the same document to contain XML elements and attributes taken from different vocabularies, without any naming collisions occurring. Namespaces are not compatible with DTDs (Schemas must be used).
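
Path expressions and namespaces can be sketched together in Python. The standard xml.etree.ElementTree module supports only a limited subset of XPath, which is enough here. The document, both namespace URIs and the element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing two vocabularies; both URIs are made up.
document = """
<r:recipe xmlns:r="http://example.org/recipe" xmlns:n="http://example.org/nutrition">
  <r:ingredient amount="3 cups">Flour</r:ingredient>
  <r:ingredient amount="1 teaspoon">Salt</r:ingredient>
  <r:amount>1 loaf</r:amount>
  <n:amount unit="kcal">1200</n:amount>
</r:recipe>
""".strip()
root = ET.fromstring(document)
ns = {"r": "http://example.org/recipe", "n": "http://example.org/nutrition"}

# Path expressions pick out pieces directly, in whatever order is needed.
salt = root.find("r:ingredient[@amount='1 teaspoon']", ns).text
names = [element.text for element in root.findall("r:ingredient", ns)]

# Two elements called "amount" coexist because their namespaces differ.
yield_amount = root.find("r:amount", ns).text
energy = root.find("n:amount", ns).text
print(salt, names, yield_amount, energy)
```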

Processing XML files

The APIs most widely used by programming languages to process XML data are the Simple API for XML (SAX) and the Document Object Model (DOM). SAX is used for serial processing, whereas DOM is used for random-access processing.
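
The contrast between the two styles can be sketched with Python's standard xml.sax and xml.dom.minidom modules. The sample document and the handler class IngredientCounter are invented for the example.

```python
import xml.sax
from xml.dom import minidom

document = (
    "<Recipe name='bread'>"
    "<ingredient amount='3 cups'>Flour</ingredient>"
    "<ingredient amount='1 teaspoon'>Salt</ingredient>"
    "</Recipe>"
)

class IngredientCounter(xml.sax.ContentHandler):
    """SAX: the parser calls back once per element as it streams through."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "ingredient":
            self.count += 1

# Serial processing: nothing is kept in memory except the running count.
counter = IngredientCounter()
xml.sax.parseString(document.encode("utf-8"), counter)

# DOM: the whole tree is held in memory, so any node can be reached directly.
dom = minidom.parseString(document)
last = dom.getElementsByTagName("ingredient")[-1]
last_amount = last.getAttribute("amount")
last_name = last.firstChild.data
print(counter.count, last_amount, last_name)
```

SAX suits large documents that are read once from start to end, while DOM suits documents that must be navigated or modified in arbitrary order.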

An XSL processor may be used to render an XML file for display or printing. XSL itself, through its formatting objects, is intended for paginated output such as PDF files. XSLT is for transforming to other formats, including HTML, other vocabularies of XML, and any other plain-text format.

The native file format of OpenOffice.org is XML. Some parts of Microsoft Office-11 will also be able to edit XML files with a user-supplied Schema (but not a DTD). There are dozens of other XML editors available.

Versions of XML

The first version of XML was XML 1.0.

The latest official version of XML is 1.1. XML 1.1 (also known as Blueberry) extends XML 1.0 by adding support for new characters in Unicode 3.0, and by fixing an oversight which led to XML not supporting EBCDIC end of line conventions.

There are also discussions on an XML 2.0, although it remains to be seen if such will ever come about. XML-SW (SW for skunk works), authored by one of the original developers of XML, contains some proposals for what an XML 2.0 might look like: elimination of DTDs from syntax, integration of Namespaces, XML Base and XML Information Set into the base standard.

Anti-XML links:

  • Xml Sucks (http://c2.com/cgi/wiki?XmlSucks) collects relatively cautious criticism from people discontented with XML

All Wikipedia text is available under the terms of the GNU Free Documentation License
