Corpus

In law, a corpus (Latin: "body") is a collection of documents and sources. See Corpus Juris Civilis.


In linguistics, a corpus (plural corpora) is a large, structured set of texts (now usually stored and processed electronically). A corpus may contain texts in a single language (monolingual corpus) or text data in multiple languages (multilingual corpus). Multilingual corpora that have been specially formatted for side-by-side comparison are called parallel corpora.
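As a rough illustration of the side-by-side format, a parallel corpus can be modelled as a list of aligned sentence pairs. The following is a minimal sketch only: the names AlignedPair and build_parallel_corpus are hypothetical, the two sentences are invented sample data, and real parallel corpora use dedicated file formats and alignment tools rather than hand-built lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedPair:
    """One sentence together with its translation (hypothetical structure)."""
    source: str
    target: str


def build_parallel_corpus(source_sentences, target_sentences):
    """Pair sentences by position; both sides must already be aligned."""
    if len(source_sentences) != len(target_sentences):
        raise ValueError("both sides must have the same number of sentences")
    return [AlignedPair(s, t) for s, t in zip(source_sentences, target_sentences)]


# Invented sample data: an English text and its French translation.
english = ["The cat sleeps.", "I read a book."]
french = ["Le chat dort.", "Je lis un livre."]

corpus = build_parallel_corpus(english, french)
for pair in corpus:
    print(f"{pair.source}  |  {pair.target}")
```

Pairing by position assumes the sentences were aligned beforehand; the length check only catches the most obvious misalignment.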

To make corpora more useful for linguistic research, they are often subjected to a process known as part-of-speech tagging, or POS-tagging, in which information about each word's part of speech (verb, noun, adjective, etc.) is added to the corpus in the form of tags. More generally, adding information of any kind to a corpus is called tagging.
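The idea of attaching a tag to each word can be sketched with a toy lookup table. This is an illustration only: the lexicon, the tag names (DET, NOUN, VERB, ADJ, X), and the functions tag_tokens and to_tagged_text are assumptions made for the example. Real POS taggers use statistical or rule-based models that take context into account, because many words can belong to more than one part of speech.

```python
# Toy lexicon mapping lowercase word forms to a part-of-speech tag.
# Both the entries and the tag names are invented for this sketch.
TOY_LEXICON = {
    "the": "DET",
    "dog": "NOUN",
    "barks": "VERB",
    "loud": "ADJ",
}


def tag_tokens(tokens, lexicon=TOY_LEXICON, unknown_tag="X"):
    """Return (word, tag) pairs; words missing from the lexicon get unknown_tag."""
    return [(tok, lexicon.get(tok.lower(), unknown_tag)) for tok in tokens]


def to_tagged_text(tagged):
    """Render the pairs in the common word/TAG notation."""
    return " ".join(f"{word}/{tag}" for word, tag in tagged)


tagged = tag_tokens("The loud dog barks".split())
print(to_tagged_text(tagged))  # The/DET loud/ADJ dog/NOUN barks/VERB
```

Storing the tag next to each token keeps the original text recoverable, so the same corpus can later carry further layers of information.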

Corpora are the main knowledge base in corpus linguistics.

All Wikipedia text is available under the terms of the GNU Free Documentation License

 