Corpus Design Flashcards

1
Q

Types of corpora

A
  1. Specialized corpus – restricted in scope, e.g. by
    • genre: the language of newspapers
    • time: 2005 to the present day
    • place: just texts published in Germany
  2. General/Reference corpus – aims to represent a language as a whole, so it needs to be much larger, e.g.
    • the British National Corpus (BNC): about 100 million words of spoken and written British English
  3. Multilingual (comparable) corpus – e.g.
    • English and German – original texts of the same genre/register/topic etc., e.g. the GeCCO corpus
  4. Parallel corpus – e.g.
    • English and German – the same texts in the original language and in translation, e.g. the GeCCO corpus
  5. Learner corpus – language produced by people learning a particular language, e.g.
    • the International Corpus of Learner English (ICLE)
  6. Historical or Diachronic corpus – e.g.
    • the Helsinki Corpus – 1.5 million words of texts from 700 AD to 1700 AD
    • the RSC (Royal Society Corpus) – 100 million words of scientific English texts from 1665 to 1996
  7. Monitor corpus – continually being added to, e.g.
    • the Bank of English
    • the DeReKo corpus (German)
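The three design criteria for a specialized corpus (genre, time, place) can be read as filters over the metadata of a larger text collection. A minimal Python sketch, assuming each text carries hypothetical `genre`, `year` and `country` fields; the record layout, titles and word counts are invented for illustration and do not describe any real corpus:

```python
from dataclasses import dataclass

# Hypothetical metadata record for one text; field names are
# illustrative assumptions, not the schema of any real corpus.
@dataclass
class Text:
    title: str
    genre: str
    year: int
    country: str
    words: int

def specialized_subcorpus(texts, genre, start_year, country):
    """Keep only texts that match all three design criteria:
    genre, time span (start_year to the present) and place."""
    return [
        t for t in texts
        if t.genre == genre and t.year >= start_year and t.country == country
    ]

def size_in_words(texts):
    """Total size of a (sub)corpus in words."""
    return sum(t.words for t in texts)

# Made-up sample collection.
collection = [
    Text("Report A", "newspaper", 2010, "Germany", 800),
    Text("Report B", "newspaper", 1999, "Germany", 650),    # too early
    Text("Novel C", "fiction", 2012, "Germany", 90000),     # wrong genre
    Text("Report D", "newspaper", 2018, "Austria", 700),    # wrong place
    Text("Report E", "newspaper", 2021, "Germany", 950),
]

# Newspaper texts published in Germany from 2005 onwards.
news_de = specialized_subcorpus(collection, "newspaper", 2005, "Germany")
print([t.title for t in news_de])   # ['Report A', 'Report E']
print(size_in_words(news_de))       # 1750
```

Each added criterion narrows the selection, which is why a specialized corpus can stay small, while a general/reference corpus must keep many genres, periods and places and therefore grows much larger.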