Corpus

In law, a corpus (Latin: "body") is a collection of documents and sources. See Corpus Juris Civilis.


In linguistics, a corpus (plural corpora) is a large, structured set of texts, now usually stored and processed electronically. A corpus may contain texts in a single language (a monolingual corpus) or text data in multiple languages (a multilingual corpus). Multilingual corpora that have been specially formatted for side-by-side comparison are called parallel corpora.
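A parallel corpus can be pictured as a list of aligned segment pairs. The Python sketch below assumes sentence-level alignment and pairs segments purely by position. The `align` helper and the English/French sentences are hypothetical illustrations, not drawn from any real corpus. Real parallel corpora are usually aligned with dedicated tools and can contain one-to-many alignments, which this sketch does not handle.

```python
# Minimal sketch of a sentence-aligned parallel corpus.
# The helper and the sentences are hypothetical, illustrative data only.

def align(source, target):
    """Pair the i-th source segment with the i-th target segment (position-based)."""
    if len(source) != len(target):
        raise ValueError("parallel texts must have the same number of segments")
    return list(zip(source, target))

english = ["The cat sleeps.", "The dog barks."]
french = ["Le chat dort.", "Le chien aboie."]

parallel_corpus = align(english, french)

# Print the aligned segments side by side for comparison.
for en, fr in parallel_corpus:
    print(f"{en:<20} | {fr}")
```

Keeping each pair together as a tuple means that any later processing, such as looking up how a phrase was translated, always sees both sides of an alignment at once.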

To make corpora more useful for linguistic research, they are often subjected to part-of-speech tagging, or POS tagging, in which information about each word's part of speech (verb, noun, adjective, etc.) is added to the corpus in the form of tags. More generally, adding any kind of information to a corpus is called tagging (or annotation).
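The following sketch illustrates what POS tags added to a corpus might look like. It uses the simplest possible technique, a lookup in a small hand-written lexicon, in place of the statistical or rule-based taggers used in practice. The lexicon, the tag names (loosely modeled on the Universal Dependencies tag set), and the helpers `pos_tag` and `to_slash_format` are all hypothetical.

```python
# Toy part-of-speech tagger based on dictionary lookup (illustrative only).
# Real taggers use context and statistical models; this one does not.

# Hypothetical lexicon mapping lowercase words to tags.
LEXICON = {
    "the": "DET",
    "cat": "NOUN",
    "dog": "NOUN",
    "sleeps": "VERB",
    "barks": "VERB",
    "old": "ADJ",
}

def pos_tag(sentence):
    """Return (word, tag) pairs; words missing from the lexicon get the tag 'X'."""
    words = sentence.lower().rstrip(".").split()
    return [(w, LEXICON.get(w, "X")) for w in words]

def to_slash_format(tagged):
    """Render tagged tokens in the word/TAG notation, separated by spaces."""
    return " ".join(f"{w}/{t}" for w, t in tagged)

tagged = pos_tag("The old cat sleeps.")
print(to_slash_format(tagged))
```

Storing the tag alongside each token, rather than in a separate file, keeps the annotation attached to the text it describes, which is what makes a tagged corpus searchable by part of speech.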

Corpora are the main knowledge base in corpus linguistics.
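One basic way a corpus serves as a knowledge base is as a source of word-frequency counts. This minimal sketch assumes a toy three-sentence corpus and a naive tokenizer that keeps only runs of lowercase ASCII letters; both are illustrative assumptions rather than standard corpus-linguistics tooling.

```python
import re
from collections import Counter

# Tiny illustrative corpus (made-up sentences, not real corpus data).
corpus = [
    "The cat sleeps.",
    "The dog barks at the cat.",
    "A dog sleeps.",
]

def tokenize(text):
    """Naive tokenizer: lowercase the text and keep runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())

# Count every token across all texts in the corpus.
frequencies = Counter(tok for text in corpus for tok in tokenize(text))

for word, count in frequencies.most_common(3):
    print(word, count)
```

On a real corpus the same counting step would typically run over millions of tokens, and the tokenizer would need to handle punctuation, numbers, and non-ASCII text.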




All Wikipedia text is available under the terms of the GNU Free Documentation License

 