Markup language

A markup language is a text-encoding system that represents a text together with details about its structure and appearance. The name derives from the traditional publishing practice of "marking up" a manuscript, that is, adding printer's instructions in the margins of a paper manuscript. The publishing industry uses markup languages, for example, to pass printed works among authors, editors, and printers. The idea is generally credited to William W. Tunnicliffe, who is said to have first presented it in 1967.


Early examples of markup languages available outside the publishing industry are the Unix typesetting tools "troff" and "nroff". In these systems, formatting commands are inserted into the document text so that the typesetting software can format the text according to the editor's specifications. Getting a document to print correctly was an iterative process of trial and error. The availability of WYSIWYG ("what you see is what you get") publishing software displaced much of the use of these languages among casual users, though serious publishing work still uses markup to specify the non-visual structure of texts.

Over time it became clear that most markup languages had many features in common, and it came to be generally agreed that markup should describe the structure of a document and leave the visual presentation of that structure to the interpreter. This eventually led to the creation of SGML (Standard Generalized Markup Language), which specified a syntax for including markup in documents, as well as a separate system (a so-called "metalanguage") for describing what the markup meant. This allowed authors to create and use any markup they wished, choosing the tags that made the most sense to them. Examples of markup languages based on SGML are TEI and DocBook.

However, SGML was generally found to be cumbersome, a side effect of trying to do too much and to be too flexible. It appeared that SGML would be limited to niche uses while WYSIWYG tools took over the vast majority of document processing.

This changed dramatically when Tim Berners-Lee used some of the SGML syntax, without the metalanguage, to create HTML (Hypertext Markup Language). HTML may be the most widely used document format in the world today.

Another, newer markup language that is growing in importance is XML (Extensible Markup Language). Unlike HTML, which uses a fixed set of "known" tags, XML allows authors to create any tags they wish (hence "extensible") and then to describe those tags in a metalanguage document known as a DTD (Document Type Definition). XML is similar in concept to SGML; in fact, XML is broadly a subset of SGML. The main purpose of XML, as opposed to SGML, is to keep the system simpler by focusing on a particular problem: documents on the Internet. In doing so, its designers hope to avoid the feature creep that complicated SGML.
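As a rough illustration of author-defined tags, the following Python sketch reads a small XML document with the standard library's xml.etree.ElementTree module. The tag names "bird-family" and "member", the attribute "name", and the helper list_members are invented for this example and do not come from any published vocabulary. Note also that ElementTree only checks that the document is well-formed; it does not validate against a DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the tag and attribute names are invented
# for illustration and belong to no standard vocabulary.
DOCUMENT = """\
<bird-family name="Anatidae">
  <member>duck</member>
  <member>goose</member>
  <member>swan</member>
</bird-family>
"""


def list_members(xml_text):
    """Return the family name and the text of each <member> child."""
    root = ET.fromstring(xml_text)  # checks well-formedness only, no DTD validation
    members = [child.text for child in root.findall("member")]
    return root.get("name"), members


if __name__ == "__main__":
    print(list_members(DOCUMENT))
```

The parser accepts these tags without having seen them before, which is the sense in which the language is extensible; giving them meaning is left to the application or to a separate schema.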


A common feature of many markup languages is that they intermix the text of a document with markup instructions in the same data stream or file. Here, for example, is a small section of text marked up in HTML:

 <h1> Anatidae </h1>
 <p>
 The family <i>Anatidae</i> includes ducks, geese, and swans,
 but <em>not</em> the closely-related screamers.
 </p>

The codes enclosed in angle brackets <like this> are markup instructions, while the text between these instructions is the actual text of the document. The codes "h1", "p", and "em" are examples of structural markup, in that they describe the intended purpose or meaning of the text they enclose. Specifically, "h1" means "this is a first-level heading", "p" means "this is a paragraph", and "em" means "this is an emphasized word". A device reading such structural markup may apply its own rules or styles for presenting it, using larger type, boldface, indentation, or whatever style it prefers. The "i" instruction is an example of presentational markup. It specifies the exact appearance of the text (in this case, the use of an italic typeface) without specifying the reason for that appearance.
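The separation of markup instructions from document text can be sketched in Python with the standard library's html.parser module. This is a minimal sketch, not a full HTML processor: the sets STRUCTURAL and PRESENTATIONAL cover only the four codes discussed above, the names MarkupSplitter and split_markup are hypothetical, and SAMPLE restates the Anatidae text with an explicit "p" element.

```python
from html.parser import HTMLParser

# Assumed classification for this sketch only; real HTML defines many more elements.
STRUCTURAL = {"h1", "p", "em"}
PRESENTATIONAL = {"i"}


class MarkupSplitter(HTMLParser):
    """Collect opening tags and document text in separate lists."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # Label each opening tag using the assumed sets above.
        if tag in STRUCTURAL:
            kind = "structural"
        elif tag in PRESENTATIONAL:
            kind = "presentational"
        else:
            kind = "unknown"
        self.tags.append((tag, kind))

    def handle_data(self, data):
        # Everything outside the angle brackets is document text.
        self.text_parts.append(data)


def split_markup(source):
    """Return (list of (tag, kind) pairs, whitespace-normalized text)."""
    parser = MarkupSplitter()
    parser.feed(source)
    parser.close()
    text = " ".join("".join(parser.text_parts).split())
    return parser.tags, text


SAMPLE = """<h1> Anatidae </h1>
<p>The family <i>Anatidae</i> includes ducks, geese, and swans,
but <em>not</em> the closely-related screamers.</p>"""

if __name__ == "__main__":
    tags, text = split_markup(SAMPLE)
    print(tags)
    print(text)
```

A renderer built on the same split would be free to choose its own presentation for the structural tags, whereas the "i" tag leaves it no such choice.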

For the humanities, the Text Encoding Initiative (TEI) has published guidelines on how to encode texts.


References

TEI guidelines (http://www.tei-c.org/Guidelines2/index)

All Wikipedia text is available under the terms of the GNU Free Documentation License
