Word Level Analysis 3 Flashcards
Definition of “Text Corpus”
A large body of text
What type of corpus is Brown Corpus?
Categorised
one category per document, categories do not overlap
What type of corpus is Reuters Corpus?
Overlapping
multiple categories per document, categories overlap
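The difference between the two structures can be sketched with NLTK's `CategorizedPlaintextCorpusReader` over a tiny on-disk corpus. This needs the `nltk` package but no corpus downloads. The file names, texts and category labels are hypothetical, chosen only to mimic Brown-style labelling (one category per file) and Reuters-style labelling (several categories per file).

```python
import os
import tempfile

from nltk.corpus.reader import CategorizedPlaintextCorpusReader

# Hypothetical toy documents written to a temporary directory (not cleaned up).
root = tempfile.mkdtemp()
docs = {
    "d1.txt": "The match ended in a draw.",
    "d2.txt": "Shares fell sharply today.",
    "d3.txt": "The club's shares rose after the win.",
}
for name, text in docs.items():
    with open(os.path.join(root, name), "w", encoding="utf8") as fh:
        fh.write(text)

# Categorised (Brown-style): each file is mapped to exactly one category.
categorised = CategorizedPlaintextCorpusReader(
    root,
    r".*\.txt",
    cat_map={"d1.txt": ["sport"], "d2.txt": ["finance"], "d3.txt": ["sport"]},
)

# Overlapping (Reuters-style): d3.txt belongs to two categories at once.
overlapping = CategorizedPlaintextCorpusReader(
    root,
    r".*\.txt",
    cat_map={"d1.txt": ["sport"], "d2.txt": ["finance"], "d3.txt": ["sport", "finance"]},
)

# Number of categories attached to each file.
print([len(categorised.categories(f)) for f in categorised.fileids()])  # [1, 1, 1]
print([len(overlapping.categories(f)) for f in overlapping.fileids()])  # [1, 1, 2]
```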
What type of corpora are gutenberg, webtext and udhr?
Isolated
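An isolated corpus is just a collection of texts with no category layer. A minimal sketch over two hypothetical files (requires `nltk`, no downloads): NLTK's `PlaintextCorpusReader` exposes the files and their words, but it is not a `CategorizedCorpusReader`, so it keeps no file-to-category mapping.

```python
import os
import tempfile

from nltk.corpus.reader import PlaintextCorpusReader
from nltk.corpus.reader.api import CategorizedCorpusReader

# Two hypothetical, unrelated texts with no category information.
root = tempfile.mkdtemp()
for name, text in {"a.txt": "A first text.", "b.txt": "A second text."}.items():
    with open(os.path.join(root, name), "w", encoding="utf8") as fh:
        fh.write(text)

isolated = PlaintextCorpusReader(root, r".*\.txt")

print(isolated.fileids())                             # ['a.txt', 'b.txt']
print(list(isolated.words("a.txt")))                  # ['A', 'first', 'text', '.']
print(isinstance(isolated, CategorizedCorpusReader))  # False
```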
What type of corpus is inaugural?
Temporal
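In a temporal corpus the time dimension can be read off the data itself. NLTK's inaugural file IDs begin with the year of the address (for example `1789-Washington.txt`), so a minimal sketch can work on a short hand-written list of IDs in that format; with the corpus downloaded, `nltk.corpus.inaugural.fileids()` would supply the full list instead. The helper `year_of` is hypothetical.

```python
# A few file IDs in the inaugural corpus's "<year>-<President>.txt" naming style.
fileids = ["1789-Washington.txt", "1793-Washington.txt", "1797-Adams.txt", "1801-Jefferson.txt"]


def year_of(fileid: str) -> int:
    """Return the year encoded in the first four characters of a file ID."""
    return int(fileid[:4])


years = [year_of(f) for f in fileids]
print(years)  # [1789, 1793, 1797, 1801]

# The time axis lets you slice the corpus by period, e.g. addresses before 1800.
before_1800 = [f for f in fileids if year_of(f) < 1800]
print(before_1800)  # ['1789-Washington.txt', '1793-Washington.txt', '1797-Adams.txt']
```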
Name three text corpora available in NLTK.
Gutenberg, Brown and Inaugural
What does fileids() get you?
the files of the corpus
What does fileids([categories]) get you?
the files of the corpus corresponding to these categories
What does categories() get you?
the categories of the corpus
What does categories([fileids]) get you?
the categories of the corpus corresponding to these files
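The four look-ups above can be tried on a hypothetical three-file corpus built with `CategorizedPlaintextCorpusReader` (requires `nltk`, no downloads). `cat_map`, a dict from file ID to a list of categories, is a real keyword of that reader; the file names and labels are made up. `fileids(categories)` and `categories(fileids)` are inverse queries over the same file-to-category mapping.

```python
import os
import tempfile

from nltk.corpus.reader import CategorizedPlaintextCorpusReader

root = tempfile.mkdtemp()
for name in ("news1.txt", "news2.txt", "rev1.txt"):
    with open(os.path.join(root, name), "w", encoding="utf8") as fh:
        fh.write("Placeholder text.")

# Hypothetical labels; news2.txt carries two categories.
reader = CategorizedPlaintextCorpusReader(
    root,
    r".*\.txt",
    cat_map={
        "news1.txt": ["news"],
        "news2.txt": ["news", "finance"],
        "rev1.txt": ["reviews"],
    },
)

print(reader.fileids())                  # ['news1.txt', 'news2.txt', 'rev1.txt']
print(reader.fileids(["news"]))          # ['news1.txt', 'news2.txt']
print(reader.categories())               # ['finance', 'news', 'reviews']
print(reader.categories(["news2.txt"]))  # ['finance', 'news']
```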
What does raw(fileids=[f1,f2,f3]) get you?
the raw content of the specified files
What does raw(categories=[c1,c2]) get you?
the raw content of the specified categories
What does raw() get you?
the raw content of the corpus
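A sketch of the three `raw()` calls on a hypothetical three-file categorised corpus (requires `nltk`, no downloads). `raw()` returns a single string: the contents of the selected files concatenated. Selecting by `categories` first resolves the categories to their (sorted) file IDs.

```python
import os
import tempfile

from nltk.corpus.reader import CategorizedPlaintextCorpusReader

root = tempfile.mkdtemp()
texts = {"a.txt": "Alpha text. ", "b.txt": "Beta text. ", "c.txt": "Gamma text. "}
for name, text in texts.items():
    with open(os.path.join(root, name), "w", encoding="utf8") as fh:
        fh.write(text)

# Hypothetical categories: a.txt and c.txt share category "x".
reader = CategorizedPlaintextCorpusReader(
    root, r".*\.txt", cat_map={"a.txt": ["x"], "b.txt": ["y"], "c.txt": ["x"]}
)

print(repr(reader.raw(fileids=["a.txt", "b.txt"])))  # 'Alpha text. Beta text. '
print(repr(reader.raw(categories=["x"])))            # 'Alpha text. Gamma text. '
print(repr(reader.raw()))                            # 'Alpha text. Beta text. Gamma text. '
```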
What does words() get you?
the words of the whole corpus
What does words(fileids=[f1,f2,f3]) get you?
the words of the specified files
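A sketch of the two `words()` calls on a hypothetical two-file corpus (requires `nltk`, no downloads). Unlike `raw()`, `words()` returns a sequence of tokens rather than one string. The token boundaries here come from `PlaintextCorpusReader`'s default word tokenizer (`WordPunctTokenizer`), so punctuation ends up as separate tokens.

```python
import os
import tempfile

from nltk.corpus.reader import PlaintextCorpusReader

root = tempfile.mkdtemp()
for name, text in {"f1.txt": "Stocks rose.", "f2.txt": "Rain fell."}.items():
    with open(os.path.join(root, name), "w", encoding="utf8") as fh:
        fh.write(text)

# Uses the reader's default word tokenizer (WordPunctTokenizer).
reader = PlaintextCorpusReader(root, r".*\.txt")

print(list(reader.words()))                    # ['Stocks', 'rose', '.', 'Rain', 'fell', '.']
print(list(reader.words(fileids=["f2.txt"])))  # ['Rain', 'fell', '.']
```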