Corpus linguistics is the study of language as expressed in samples (corpora), or "real world" text. The approach runs counter to Noam Chomsky's view that real language is riddled with performance-related errors and therefore requires careful analysis of small speech samples obtained in a highly controlled laboratory setting. Corpus linguistics does away with Chomsky's competence/performance split, holding that language can only be analysed reliably if the researcher does not interfere with it.
Corpus linguistics overlaps in some areas with computational linguistics, as the latter moves towards practical language-processing applications. Such applications must handle real input data, for which descriptions based on a linguist's intuition are usually of little help.
The COBUILD dictionaries, designed for users learning English as a foreign language, are based on corpus linguistics; definitions are based on how words are used rather than on historical definitions of their meaning.
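To illustrate what basing definitions on usage can look like in practice, here is a minimal key-word-in-context (KWIC) concordance sketch in Python. It lines up every occurrence of a word with a few words of context on either side, so different uses can be compared directly. The function name kwic, the simple tokenisation rule and the three-sentence corpus are all hypothetical, invented for illustration. This is not the tooling COBUILD's lexicographers actually used.

```python
import re


def kwic(corpus: list[str], keyword: str, width: int = 3) -> list[str]:
    """Hypothetical helper: return key-word-in-context lines showing
    up to `width` tokens on either side of each occurrence of `keyword`."""
    lines = []
    for text in corpus:
        # Naive tokenisation (an assumption): lowercase runs of letters/apostrophes.
        tokens = re.findall(r"[A-Za-z']+", text.lower())
        for i, tok in enumerate(tokens):
            if tok == keyword.lower():
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                # Right-align the left context so the keyword forms a column.
                lines.append(f"{left:>30} [{tok}] {right}")
    return lines


# A tiny invented corpus standing in for samples of real-world text.
corpus = [
    "She took a deep breath before the interview.",
    "They took the train to Birmingham.",
    "He took offence at the remark.",
]

for line in kwic(corpus, "took"):
    print(line)
```

Reading the aligned contexts side by side makes it easy to see that "took" combines with quite different words (a breath, a train, offence), which is the kind of evidence a usage-based definition draws on. A real corpus would of course contain many thousands of such lines.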
All Wikipedia text is available under the terms of the GNU Free Documentation License.