Corpus

In law, a corpus (Latin: "body") is a collection of documents and sources. See Corpus Juris Civilis.

In linguistics, a corpus (plural corpora) is a large, structured set of texts, now usually stored and processed electronically. A corpus may contain texts in a single language (monolingual corpus) or text data in multiple languages (multilingual corpus). Multilingual corpora that have been specially formatted for side-by-side comparison are called parallel corpora.
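
As a minimal sketch of the side-by-side idea, a parallel corpus can be modelled as a list of aligned segments, each holding the same sentence in several languages. The names AlignedSegment and side_by_side, the language codes, and the sample sentences are illustrative assumptions, not part of any standard corpus format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class AlignedSegment:
    """One aligned unit: the same sentence in several languages (hypothetical type)."""
    texts: Dict[str, str]  # language code -> sentence


def side_by_side(segments: List[AlignedSegment],
                 lang_a: str, lang_b: str) -> List[Tuple[str, str]]:
    """Return (lang_a, lang_b) sentence pairs for segments that contain both languages."""
    return [
        (seg.texts[lang_a], seg.texts[lang_b])
        for seg in segments
        if lang_a in seg.texts and lang_b in seg.texts
    ]


# Invented sample data, for illustration only.
parallel_corpus = [
    AlignedSegment({"en": "The cat sleeps.", "de": "Die Katze schläft."}),
    AlignedSegment({"en": "It is raining.", "de": "Es regnet.", "fr": "Il pleut."}),
]

for en, de in side_by_side(parallel_corpus, "en", "de"):
    print(en, "|", de)
```

Keeping each segment as a mapping from language code to sentence means a monolingual corpus is simply the special case in which every segment has one entry.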

To make corpora more useful for linguistic research, they are often subjected to a process known as part-of-speech tagging, or POS-tagging, in which information about each word's part of speech (verb, noun, adjective, etc.) is added to the corpus in the form of tags. More generally, adding information of any kind to a corpus is called tagging or annotation.
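
The tagging step can be illustrated with a toy dictionary-lookup tagger. This is only a sketch: the lexicon, the tag names (DET, NOUN, VERB, ADJ, X) and the word/TAG output format are assumptions chosen for the example, and practical taggers rely on trained statistical models rather than a fixed word list.

```python
# Hypothetical toy lexicon mapping lowercase words to part-of-speech tags.
LEXICON = {
    "the": "DET",
    "cat": "NOUN",
    "dog": "NOUN",
    "chases": "VERB",
    "sleeps": "VERB",
    "small": "ADJ",
}


def pos_tag(sentence, lexicon=LEXICON, unknown="X"):
    """Return a list of (word, tag) pairs using simple dictionary lookup."""
    tagged = []
    for token in sentence.split():
        word = token.strip(".,;:!?")  # drop surrounding punctuation
        if word:
            tagged.append((word, lexicon.get(word.lower(), unknown)))
    return tagged


def to_tagged_text(tagged):
    """Render tagged words inline as word/TAG, one way tags can be stored in a corpus."""
    return " ".join(f"{word}/{tag}" for word, tag in tagged)


tagged = pos_tag("The small dog chases the cat.")
print(to_tagged_text(tagged))
# prints: The/DET small/ADJ dog/NOUN chases/VERB the/DET cat/NOUN
```

Words missing from the lexicon receive the placeholder tag X, which makes the limits of pure lookup visible: it cannot handle unseen words or words whose part of speech depends on context.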

Corpora are the main knowledge base in corpus linguistics.

Links:

WebCorp - The Web as a Corpus: http://www.webcorp.org.uk/

All Wikipedia text is available under the terms of the GNU Free Documentation License
