Corpus linguistics is the study of language as expressed in samples (corpora) of "real world" text. The approach runs counter to Noam Chomsky's view that real language is riddled with performance-related errors and therefore calls for careful analysis of small speech samples obtained in a highly controlled laboratory setting. Corpus linguistics does away with Chomsky's competence/performance split, holding that language can only be analysed reliably when the researcher does not interfere with it.
In some areas corpus linguistics overlaps with computational linguistics, as the latter moves towards practical language-processing applications. Such applications must deal with real input data, for which descriptions based on a linguist's intuition are usually of little help.
The COBUILD dictionaries, designed for learners of English as a foreign language, are based on corpus linguistics: their definitions reflect how words are actually used rather than historical accounts of their meaning.
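To illustrate what "how words are used" looks like in practice, here is a minimal keyword-in-context (KWIC) concordance sketch in Python, one common way of inspecting a word's usage in a corpus. The four-sentence `corpus`, the helper names `tokenize`, `concordance` and `collocates`, and the `width` window are all hypothetical choices made for illustration; this is not the COBUILD project's actual tooling, and a real corpus would contain millions of words.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def concordance(texts, keyword, width=3):
    """Keyword-in-context (KWIC) lines: up to `width` tokens either side of each hit.

    Each text is tokenized separately, so a context window never runs
    from one text into the next.
    """
    lines = []
    for text in texts:
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                # Right-align the left context so the keyword column lines up.
                lines.append(f"{left:>20} [{tok}] {right}")
    return lines


def collocates(texts, keyword, width=3):
    """Count the words occurring within `width` tokens of the keyword."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok == keyword:
                counts.update(tokens[max(0, i - width):i])
                counts.update(tokens[i + 1:i + 1 + width])
    return counts


# Hypothetical toy corpus, invented for this example.
corpus = [
    "She ran the company for ten years.",
    "He ran to the station.",
    "The river ran dry in summer.",
    "They ran the program again.",
]

for line in concordance(corpus, "ran"):
    print(line)
print(collocates(corpus, "ran").most_common(1))  # [('the', 4)]
```

Lined up this way, the concordance separates senses that a single historical definition would blur, such as "ran the company" (managed) versus "ran to the station" (moved quickly). The top collocate here is simply "the", which shows why real corpus tools usually weight or filter very frequent function words.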
Some links:
- The Centre for Corpus Linguistics at Birmingham University: http://www.corpus.bham.ac.uk/
All Wikipedia text is available under the terms of the GNU Free Documentation License.