Corpus Design Flashcards
1
Q
Types of corpora
A
- Specialized corpus – e.g.
• genre: the language of newspapers
• time: 2005 to the present day
• place: only texts published in Germany
- General/Reference corpus – needs to be much larger, e.g.
• The British National Corpus (BNC) has about 100 million words of spoken and written British English
- Multilingual (comparable) corpus – e.g.
• English and German – texts of the same genre/register/topics etc., e.g. the GeCCO corpus
- Parallel corpus – e.g.
• English and German – exactly the same texts in translation, e.g. the GeCCO corpus
- Learner corpus – language produced by people learning a particular language, e.g.
• the International Corpus of Learner English
- Historical or Diachronic corpus – e.g.
• Helsinki corpus – 1.5 million words of texts from 700AD to 1700AD
• RSC corpus – 100 million words of scientific English texts, 1665-1996
- Monitor corpus – continually being added to, e.g.
• the Bank of English
• DEREKO corpus (German)