

Machine translation

Machine translation (MT) is the automatic translation of text from one natural language to another by a computer. One of the earliest pursuits in computer science, MT has proved an elusive goal, but today a number of systems produce output that, while not perfect, is of sufficient quality to be useful in specific domains or applications.

Translation is anything but simple. It is not a matter of substituting an equivalent for each word; the translator must understand all of the words in a sentence or phrase and how each may influence the others. Human languages involve morphology (the way words are built up from small meaning-bearing units), syntax (sentence structure), and semantics (meaning), and even simple texts can be full of ambiguities.
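A minimal Python sketch shows why word-by-word substitution falls short. The lexicon below is a hypothetical four-word English-to-French table invented for illustration, in which each English word has a single fixed translation.

```python
# Hypothetical toy English-to-French lexicon with exactly one entry per word,
# which is the assumption naive substitution makes.
LEXICON = {"the": "la", "red": "rouge", "car": "voiture", "truck": "camion"}


def word_for_word(sentence: str) -> str:
    """Replace each word with its dictionary entry, keeping English word order."""
    return " ".join(LEXICON.get(w, w) for w in sentence.lower().split())


print(word_for_word("The red car"))    # la rouge voiture (French: la voiture rouge)
print(word_for_word("The red truck"))  # la rouge camion  (French: le camion rouge)
```

Both outputs keep English word order, although French normally places colour adjectives after the noun (syntax), and the fixed entry for "the" cannot agree with the gender of the noun that follows (morphology).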

It is often argued that the problem of machine translation requires the problem of natural language understanding to be solved first. However, a number of heuristic methods of machine translation work surprisingly well, including:

In general terms, rule-based methods (the first three) parse a text, usually creating an intermediate, symbolic representation from which text in the target language is then generated. This approach requires extensive lexicons with morphological, syntactic, and semantic information, as well as large sets of rules.
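The parse-then-generate pipeline can be illustrated with a minimal transfer-style sketch in Python. It assumes a one-rule toy grammar (determiner, adjective, noun) and small hypothetical lexicons; all names are invented for illustration, and real rule-based systems use far larger grammars and lexicons. The `NounPhrase` object plays the role of the intermediate symbolic representation.

```python
from dataclasses import dataclass

# Hypothetical toy lexicons (illustration only): nouns carry grammatical gender,
# adjectives and determiners have masculine/feminine forms.
NOUNS = {"car": ("voiture", "f"), "truck": ("camion", "m")}
ADJECTIVES = {"red": {"m": "rouge", "f": "rouge"}, "green": {"m": "vert", "f": "verte"}}
DETERMINERS = {"the": {"m": "le", "f": "la"}}


@dataclass
class NounPhrase:
    """Symbolic intermediate representation of a 'DET ADJ NOUN' phrase."""
    det: str
    adj: str
    noun: str


def parse(sentence: str) -> NounPhrase:
    """Analyse an English phrase with a one-rule grammar: NP -> DET ADJ NOUN."""
    words = sentence.lower().split()
    if (len(words) != 3 or words[0] not in DETERMINERS
            or words[1] not in ADJECTIVES or words[2] not in NOUNS):
        raise ValueError(f"toy grammar cannot parse {sentence!r}")
    return NounPhrase(det=words[0], adj=words[1], noun=words[2])


def generate(phrase: NounPhrase) -> str:
    """Produce French from the representation, applying agreement and word order."""
    noun, gender = NOUNS[phrase.noun]
    det = DETERMINERS[phrase.det][gender]  # determiner agrees with noun gender
    adj = ADJECTIVES[phrase.adj][gender]   # adjective agrees with noun gender
    return f"{det} {noun} {adj}"           # colour adjectives follow the noun in French


def translate(sentence: str) -> str:
    return generate(parse(sentence))


print(translate("The red car"))      # la voiture rouge
print(translate("the green truck"))  # le camion vert
```

Because gender and word order are decided while generating from the representation rather than word by word, the output agrees in gender and follows French adjective placement. Any phrase outside the toy grammar is rejected, which hints at why rule-based systems need such large rule sets.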

Statistical methods (the last two) avoid manual lexicon building and rule writing and instead generate translations based on bilingual text corpora, such as the Canadian Hansard corpus, the English-French record of the Canadian parliament. Where such corpora are available, impressive results can be achieved when translating texts of a similar kind, but such corpora are still very rare.
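As a rough illustration of learning translations from a bilingual corpus, the following Python sketch implements a simplified IBM Model 1, a classic word-alignment model trained with expectation-maximization (EM). The article does not name any particular statistical model; IBM Model 1 is used here as a well-known example, the NULL word of the full model is omitted, and the three sentence pairs are made up for illustration rather than taken from the Hansard corpus.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate word-translation probabilities t[(e, f)] = P(e | f) with EM.

    `pairs` is a list of (foreign_sentence, english_sentence) strings.
    This simplified version omits the NULL word of the full model.
    """
    corpus = [(f.split(), e.split()) for f, e in pairs]
    english_vocab = {e for _, es in corpus for e in es}
    # Start from a uniform value for every co-occurring word pair.
    t = {(e, f): 1.0 / len(english_vocab)
         for fs, es in corpus for f in fs for e in es}

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of e aligned to f
        total = defaultdict(float)  # expected count of f aligned to anything
        # E-step: share each English word among the foreign words in its
        # sentence in proportion to the current t(e | f).
        for fs, es in corpus:
            for e in es:
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # M-step: renormalise expected counts into probabilities per f.
        t = {(e, f): c / total[f] for (e, f), c in count.items()}
    return t


def best_translation(t, f):
    """Return the English word with the highest P(e | f)."""
    candidates = {e: p for (e, g), p in t.items() if g == f}
    return max(candidates, key=candidates.get)


# Hypothetical toy parallel corpus (French, English).
pairs = [
    ("la maison", "the house"),
    ("la voiture", "the car"),
    ("une voiture", "a car"),
]
t = train_ibm_model1(pairs, iterations=10)
for word in ["la", "maison", "voiture", "une"]:
    print(word, "->", best_translation(t, word))
```

No word pair is ever labelled as a translation; the loop should print la -> the, maison -> house, voiture -> car, and une -> a, inferred purely from co-occurrence across the sentence pairs. Practical statistical systems use much larger corpora and richer models than this sketch.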

Given enough data, most MT programs work well enough for a native speaker of the target language to get the approximate meaning of a text written by a native speaker of the source language. The difficulty is obtaining enough data of the right kind to support the particular method. For example, the large multilingual corpora that statistical methods need are not necessary for grammar-based methods; grammar-based methods, however, need a skilled linguist to carefully design the grammar they use.

Despite their inherent limitations, MT programs are currently used by various organizations around the world. Probably the largest institutional user is the European Commission, which uses a highly customized version of SYSTRAN to translate preliminary drafts of documents for internal use.


All Wikipedia text is available under the terms of the GNU Free Documentation License
