Text Mining Flashcards
1
Q
terminology of text mining
A
a token/term: a word or a group of words
a document: one piece of text
a corpus: a collection of documents
2
Q
document-term matrix
A
each document is a row and each term is a column; each cell holds the term's frequency (or weight) in that document
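A minimal sketch of this layout, assuming whitespace tokenization and raw counts as cell values. The toy corpus and the helper name `document_term_matrix` are made up for illustration and do not come from any particular library.

```python
from collections import Counter

# Hypothetical toy corpus: each string is one document.
corpus = ["the cat sat", "the dog sat", "the cat saw the dog"]

def document_term_matrix(docs):
    """Return (vocab, matrix) where matrix[i][j] counts term j in document i."""
    # Assumption: tokens are whitespace-separated words.
    tokenized = [doc.split() for doc in docs]
    # Columns: every distinct term in the corpus, in sorted order.
    vocab = sorted({tok for toks in tokenized for tok in toks})
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        # One row per document, one entry per term.
        matrix.append([counts[term] for term in vocab])
    return vocab, matrix

vocab, dtm = document_term_matrix(corpus)
for row in dtm:
    print(row)
```

Here `len(dtm)` equals the number of documents and `len(dtm[0])` equals the vocabulary size.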
3
Q
term-document matrix
A
each term is a row and each document is a column (the transpose of the document-term matrix)
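A small sketch of how the two layouts relate, assuming the matrix is stored as a list of rows. The values and the `transpose` helper are hypothetical and only illustrate swapping rows and columns.

```python
# Hypothetical document-term matrix: 2 documents (rows) x 3 terms (columns).
terms = ["cat", "dog", "sat"]
dtm = [
    [1, 0, 1],  # document 0
    [0, 1, 1],  # document 1
]

def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]

# Term-document matrix: 3 terms (rows) x 2 documents (columns).
tdm = transpose(dtm)
for term, row in zip(terms, tdm):
    print(term, row)
```

Transposing again returns the original document-term matrix, so both forms carry the same information.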
4
Q
methods to clean and preprocess text
A
- case normalization
- remove punctuation
- remove numbers
- remove stopwords
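The four steps above can be sketched with the Python standard library alone. This is an illustrative sketch, not a reference pipeline: the `preprocess` helper and the tiny `STOPWORDS` set are assumptions, and real stopword lists are much longer.

```python
import string

# Hypothetical, deliberately tiny stopword list (assumption for this example).
STOPWORDS = {"the", "a", "an", "is", "of", "and", "in", "to"}

def preprocess(text):
    """Lowercase, strip punctuation and digits, then drop stopwords."""
    # Case normalization.
    text = text.lower()
    # Remove punctuation (ASCII punctuation only).
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Remove numbers (ASCII digits only).
    text = text.translate(str.maketrans("", "", string.digits))
    # Whitespace tokenization, then stopword removal.
    return [tok for tok in text.split() if tok not in STOPWORDS]

tokens = preprocess("The 3 Cats sat in the garden, quietly!")
print(tokens)  # ['cats', 'sat', 'garden', 'quietly']
```

Lowercasing comes first so that stopwords such as "The" match the lowercase entries in the stopword set.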