Text Mining Flashcards

1
Q

terminology of text mining

A

a token/term: a word or a group of words
a document: one piece of text
a corpus: a collection of documents
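
The three terms can be illustrated with a minimal Python sketch. The example sentences are made up, and splitting on whitespace is an assumption; real tokenizers are usually more careful.

```python
# Hypothetical example corpus: a collection of documents (each one a string).
corpus = [
    "text mining finds patterns in text",
    "a corpus is a collection of documents",
]

# Each document is split into tokens; whitespace splitting is an assumption.
tokens_per_document = [document.split() for document in corpus]

# A term can also be a group of words, e.g. pairs of adjacent words (bigrams).
first_doc = tokens_per_document[0]
first_doc_bigrams = list(zip(first_doc, first_doc[1:]))
```

Here `corpus` holds two documents, `tokens_per_document[0]` holds the six tokens of the first document, and `first_doc_bigrams` holds its two-word terms.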

2
Q

Document-term matrix

A

each document is a row and each term is a column
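
A small sketch of how such a matrix could be built by hand, assuming whitespace tokenization, raw term counts as cell values, and a made-up two-document corpus. The function name `document_term_matrix` is hypothetical.

```python
from collections import Counter

# Hypothetical toy corpus; whitespace tokenization is an assumption.
corpus = ["the cat sat", "the cat saw the dog"]

# Vocabulary: sorted unique terms, one matrix column per term.
vocabulary = sorted({term for doc in corpus for term in doc.split()})

def document_term_matrix(documents, terms):
    """Return one row of term counts per document."""
    rows = []
    for doc in documents:
        counts = Counter(doc.split())
        rows.append([counts[term] for term in terms])
    return rows

dtm = document_term_matrix(corpus, vocabulary)
```

With this corpus the columns are `cat, dog, sat, saw, the`, and the second row counts `the` twice because it appears twice in the second document.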

3
Q

Term-document matrix

A

each term is a row and each document is a column
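
Swapping rows and columns of a document-by-term count table gives this layout. A minimal sketch, assuming a made-up table of counts for two documents and three terms:

```python
# Hypothetical counts: 2 documents (rows) x 3 terms (columns).
terms = ["cat", "dog", "the"]
counts_by_document = [
    [1, 0, 1],  # document 0
    [1, 1, 2],  # document 1
]

# Transposing swaps rows and columns: now each row is a term
# and each column is a document.
tdm = [list(column) for column in zip(*counts_by_document)]
```

After the transpose, `tdm[2]` is the row for `the`, holding its count in each of the two documents.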

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

methods to clean and preprocess text

A
  1. case normalization
  2. remove punctuation
  3. remove numbers
  4. remove stopwords
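
The four steps can be sketched in Python using only the standard library. The stopword set below is a tiny illustrative assumption, not a standard list, and the function name `preprocess` is hypothetical.

```python
import re
import string

# Small illustrative stopword set (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "and", "of", "in", "is", "to"}

# Translation table that deletes every ASCII punctuation character.
PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)

def preprocess(text):
    """Apply the four cleaning steps and return the remaining tokens."""
    text = text.lower()                      # 1. case normalization
    text = text.translate(PUNCTUATION_TABLE) # 2. remove punctuation
    text = re.sub(r"\d+", "", text)          # 3. remove numbers
    tokens = text.split()                    # whitespace tokenization (assumed)
    return [t for t in tokens if t not in STOPWORDS]  # 4. remove stopwords

cleaned = preprocess("The 3 cats sat in the garden, and 2 dogs barked!")
```

The order matters somewhat: lowercasing first lets a lowercase stopword set match "The" as well as "the".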