Compiler

A compiler is a computer program that translates a program written in one computer language (the source language) into an equivalent program written in another computer language (the output or target language).

Usually the translation is from source code (generally in a high-level language) to target code (generally low-level object code or machine language) that can be executed directly by a computer or a virtual machine. However, a compiler from a low-level language to a high-level one is also possible; it is normally known as a decompiler if it reconstructs a high-level program that could have generated the low-level code. Compilers also exist that translate from one high-level language to another, or to an intermediate language that still needs further processing; these are sometimes known as cascaders.

Typical compilers output so-called object files, which contain machine code augmented by information about the names and locations of entry points and external calls (calls to functions not contained in the object file). A set of object files may then be linked together to create the final executable, which can be run directly by a user. The object files need not all come from a single compiler, provided that the compilers used share a common output format.
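
The linking step described above can be sketched as follows. This is a minimal illustration, not a real linker: the ObjectFile class, its fields, and the link function are hypothetical names invented for this example, and real object formats carry far more information (relocations, sections, debug data).

```python
from dataclasses import dataclass, field

@dataclass
class ObjectFile:
    # Hypothetical, heavily simplified model of an object file.
    name: str
    code: list                                    # opaque "machine code" words
    exports: dict = field(default_factory=dict)   # entry point name -> offset in code
    imports: list = field(default_factory=list)   # external symbols this object calls

def link(objects):
    """Concatenate the code of all objects and resolve external calls."""
    image, symbols, base = [], {}, 0
    for obj in objects:
        # Each entry point gets a final address: load base + local offset.
        for sym, offset in obj.exports.items():
            if sym in symbols:
                raise ValueError(f"duplicate symbol: {sym}")
            symbols[sym] = base + offset
        image.extend(obj.code)
        base += len(obj.code)
    for obj in objects:
        # Every external call must be satisfied by some other object.
        for sym in obj.imports:
            if sym not in symbols:
                raise ValueError(f"undefined symbol: {sym} (needed by {obj.name})")
    return image, symbols

main_o = ObjectFile("main.o", ["call helper", "ret"], {"main": 0}, ["helper"])
util_o = ObjectFile("util.o", ["nop", "ret"], {"helper": 0})
image, symbols = link([main_o, util_o])
```

Here main.o and util.o could have come from different compilers; the only requirement the sketch models is that both describe their entry points and external calls in the same format.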

In the past, compilers were divided into many passes to save memory: when each pass finished, the compiler could free the space it had needed during that pass.

Table of contents

1 Compiler design

2 Compiler frontend

3 Compiler backend

4 Compiled vs. Interpreted languages

5 Further reading

Compiler design

Modern compilers share a common 'two stage' design. The first stage, the 'compiler frontend', translates the source language into an intermediate representation. The second stage, the 'compiler backend', works with the intermediate representation to produce code in the output language. Although compiler design is a complex task, this separation allows designers to exchange the frontend to retarget the compiler's source language, or the backend to retarget its output language. As a result, modern compilers are often portable and can compile multiple dialects of a language.
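
The retargeting idea can be illustrated with a toy example. Everything here is an assumption made for illustration: the two tiny source notations (postfix and prefix arithmetic), the tuple-based intermediate representation, and all function names are invented for this sketch and do not correspond to any particular compiler.

```python
def frontend_rpn(text):
    """Frontend 1: postfix source such as '2 3 + 4 *'."""
    ops = {"+": ("add",), "*": ("mul",)}
    return [ops[t] if t in ops else ("push", int(t)) for t in text.split()]

def frontend_prefix(text):
    """Frontend 2: prefix source such as '* + 2 3 4', producing the same IR."""
    tokens = text.split()
    def expr():
        t = tokens.pop(0)
        if t in ("+", "*"):
            left, right = expr(), expr()
            return left + right + [("add",) if t == "+" else ("mul",)]
        return [("push", int(t))]
    return expr()

def backend_stack_asm(ir):
    """Backend 1: text for a hypothetical stack machine."""
    return "\n".join(" ".join(str(x) for x in ins).upper() for ins in ir)

def backend_python_expr(ir):
    """Backend 2: a Python expression string."""
    stack = []
    for ins in ir:
        if ins[0] == "push":
            stack.append(str(ins[1]))
        else:
            right, left = stack.pop(), stack.pop()
            symbol = "+" if ins[0] == "add" else "*"
            stack.append(f"({left} {symbol} {right})")
    return stack.pop()

ir_a = frontend_rpn("2 3 + 4 *")
ir_b = frontend_prefix("* + 2 3 4")
```

Because both frontends emit the same intermediate representation, either one can be paired with either backend: two frontends and two backends give four compilers without any code being written four times.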

Compiler frontend

The compiler frontend consists of multiple phases itself, each informed by formal language theory:

Lexical analysis - breaking the source code text into small pieces ('tokens' or 'terminals'), each representing a single atomic unit of the language, for instance a keyword, identifier or symbol. The token language is typically a regular language, so a finite state automaton constructed from a regular expression can be used to recognize it. This phase is also called lexing or scanning.
Syntax analysis - identifying the syntactic structures of the source code. This phase focuses only on structure: it checks the order of the tokens and recovers the hierarchical structure of the code.
Semantic analysis - recognizing the meaning of the program code and starting to prepare for output. Type checking is done in this phase, and most compiler errors show up here.
Intermediate language generation - an equivalent of the original program is created in an intermediate language.
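
The four phases can be sketched for a small arithmetic language. This is an illustrative sketch under stated assumptions: the grammar, the tuple-based tree and instruction formats, and the function names lex, parse, check and to_ir are all invented for this example, and the semantic check only looks for undeclared variables as a simple stand-in for full type checking.

```python
import re

# Lexical analysis: the token language is regular, so a regular expression suffices.
TOKEN_RE = re.compile(r"(?P<WS>\s+)|(?P<NUM>\d+)|(?P<ID>[A-Za-z_]\w*)|(?P<OP>[-+*/()])")

def lex(source):
    tokens, pos = [], 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if not m:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if m.lastgroup != "WS":            # whitespace separates tokens but is dropped
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

# Syntax analysis: recursive descent over the assumed grammar
#   expr := term (('+'|'-') term)*   term := factor (('*'|'/') factor)*
#   factor := NUM | ID | '(' expr ')'
def parse(tokens):
    pos = 0
    def peek():
        return tokens[pos] if pos < len(tokens) else ("EOF", "")
    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok
    def factor():
        kind, text = advance()
        if kind == "NUM":
            return ("num", int(text))
        if kind == "ID":
            return ("var", text)
        if text == "(":
            node = expr()
            if advance() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {text!r}")
    def term():
        node = factor()
        while peek() in (("OP", "*"), ("OP", "/")):
            op = advance()[1]
            node = (op, node, factor())
        return node
    def expr():
        node = term()
        while peek() in (("OP", "+"), ("OP", "-")):
            op = advance()[1]
            node = (op, node, term())
        return node
    tree = expr()
    if peek()[0] != "EOF":
        raise SyntaxError(f"unexpected token {peek()[1]!r}")
    return tree

# Semantic analysis (stand-in): reject variables that were never declared.
def check(tree, declared):
    if tree[0] == "var" and tree[1] not in declared:
        raise NameError(f"undeclared variable {tree[1]!r}")
    if tree[0] not in ("num", "var"):
        check(tree[1], declared)
        check(tree[2], declared)
    return tree

# Intermediate language generation: postorder walk into stack-machine-like tuples.
def to_ir(tree):
    if tree[0] == "num":
        return [("push", tree[1])]
    if tree[0] == "var":
        return [("load", tree[1])]
    return to_ir(tree[1]) + to_ir(tree[2]) + [(tree[0],)]

tree = check(parse(lex("x + 2 * (y - 1)")), declared={"x", "y"})
ir = to_ir(tree)
```

Note how the phases differ in what they can detect: lex rejects characters outside the token language, parse rejects well-formed tokens in an invalid order, and check rejects structurally valid programs that are still meaningless.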

Compiler backend

While there are applications where only the compiler frontend is necessary, such as static language verification tools, a real compiler hands the intermediate representation generated by the frontend to the backend, which produces a functionally equivalent program in the output language. This is done in multiple steps:

Optimization - the intermediate language representation is transformed into functionally equivalent but faster (or smaller) forms.
Code generation - the transformed intermediate language is translated into the output language, usually the native machine language of the system. This involves resource and storage decisions, such as deciding which variables to keep in registers and which in memory, and the selection and scheduling of appropriate machine instructions.
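
Both backend steps can be sketched on a small expression tree. The assumptions are considerable: the tuple tree format, the instruction mnemonics (LOAD, LOADI, ADD, SUB, MUL) and the target machine are hypothetical, constant folding stands in for the many optimizations a real compiler performs, and the register allocation is deliberately naive (one fresh register per value, with no reuse and no spilling to memory).

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
MNEMONIC = {"+": "ADD", "-": "SUB", "*": "MUL"}

def fold(node):
    """Optimization: constant folding.

    Subtrees whose operands are all known at compile time are replaced
    by their value, giving an equivalent but smaller program.
    """
    if node[0] in ("num", "var"):
        return node
    op, left, right = node[0], fold(node[1]), fold(node[2])
    if left[0] == "num" and right[0] == "num":
        return ("num", OPS[op](left[1], right[1]))
    return (op, left, right)

def generate(tree):
    """Code generation for a hypothetical three-address register machine."""
    code = []
    def emit(node):
        if node[0] == "num":
            body = f"LOADI {node[1]}"          # load an immediate constant
        elif node[0] == "var":
            body = f"LOAD {node[1]}"           # load a variable from memory
        else:
            left, right = emit(node[1]), emit(node[2])
            body = f"{MNEMONIC[node[0]]} {left}, {right}"
        reg = f"r{len(code)}"                  # naive: a fresh register every time
        code.append(f"{reg} = {body}")
        return reg
    emit(tree)
    return code

tree = ("+", ("var", "x"), ("*", ("num", 2), ("num", 3)))
unoptimized = generate(tree)
optimized = generate(fold(tree))
```

In this example folding replaces the multiplication of 2 and 3 by the constant 6, so the optimized program needs three instructions instead of five. A real code generator would also have to cope with a finite number of registers, which is where the storage decisions mentioned above come in.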

Compiled vs. Interpreted languages

Many people divide higher-level programming languages into two categories: compiled languages and interpreted languages. However, most of these languages can in fact be implemented through either compilation or interpretation; the categorisation merely reflects which method is most commonly used to implement the language. (Some interpreted languages, however, cannot easily be implemented through compilation, especially those that allow self-modifying code.)
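
The point that one language can be implemented either way can be sketched with a toy expression language. The tuple tree format and the function names are invented for this example, and "compilation" here means translation into Python closures rather than machine code, which is only a loose analogy to what a real compiler does.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def interpret(node, env):
    """Interpretation: the program tree is walked again on every run."""
    if node[0] == "num":
        return node[1]
    if node[0] == "var":
        return env[node[1]]
    return OPS[node[0]](interpret(node[1], env), interpret(node[2], env))

def compile_to_closure(node):
    """Compilation (by analogy): the tree is translated once into a callable.

    After translation the tree itself is no longer inspected at run time.
    """
    if node[0] == "num":
        value = node[1]
        return lambda env: value
    if node[0] == "var":
        name = node[1]
        return lambda env: env[name]
    op = OPS[node[0]]
    left, right = compile_to_closure(node[1]), compile_to_closure(node[2])
    return lambda env: op(left(env), right(env))

program = ("+", ("var", "x"), ("*", ("num", 2), ("num", 3)))
run = compile_to_closure(program)
result_interpreted = interpret(program, {"x": 4})
result_compiled = run({"x": 4})
```

Both implementations give the same result for the same program and input; the difference lies in when the work of analysing the program is done, not in the language itself.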

Further reading

Compilers: Principles, Techniques and Tools by Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman is considered the standard authority on compiler basics, and makes a good primer for the techniques mentioned above. (It is often called the Dragon book because the picture on its cover shows a Knight of Programming fighting the Dragon of Compiler Design.) [1] (http://www.aw.com/catalog/academic/product/0,4096,0201100886,00)

During the 1990s a large number of free compilers and compiler development tools were developed for all kinds of languages, both as part of the GNU project and through other open-source initiatives. Some of them are considered to be of high quality, and their freely available source code makes a good read for anyone interested in modern compiler concepts.

All Wikipedia text is available under the terms of the GNU Free Documentation License
