text_mining Flashcards

1
Q

What is text mining?

A

The discovery of knowledge through text analysis.

2
Q

What are the characteristics of text?

A

High dimensionality; unstructured form; not readily usable by a computer; huge collections of documents; words have positions, and position matters.

3
Q

What are the steps of text mining?

A

Preprocessing, feature extraction, feature selection, discovery, and interpretation.

4
Q

Provide examples of preprocessing.

A

Stop-word removal, stemming, and removal of punctuation marks.
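
The preprocessing steps named in this card can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the stop-word list is a tiny made-up sample, and `crude_stem` is a hypothetical toy suffix stripper standing in for a real stemmer such as Porter's.

```python
import string

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in"}


def crude_stem(word):
    """Strip a few common English suffixes; a toy stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, drop punctuation marks, remove stop words, then stem."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = cleaned.split()
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


tokens = preprocess("The miners are mining texts, and the texts are mined!")
print(tokens)
```

Stop words are removed before stemming here so that the stop-word list can hold plain surface forms; a different order would need a stemmed stop-word list.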

5
Q

What is transformation and normalization?

A

This includes representing documents in the vector space model.
Inverse document frequency: normalizes word frequency across documents.
Frequency damping: normalizes word frequency within a document.
The resulting normalized frequency of a word is its tf-idf; this normalized frequency can later be used in similarity measurement.
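
As a rough sketch of how these pieces fit together, the snippet below builds tf-idf vectors for a made-up three-document corpus and compares them with cosine similarity. The specific choices (term frequency divided by document length, `log(N / df)` for the inverse document frequency, cosine as the similarity measure) are common conventions assumed here, not something the card specifies; the function names are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return the vocabulary and one tf-idf vector per tokenized document."""
    n_docs = len(docs)
    vocab = sorted({w for d in docs for w in d})
    # Document frequency: in how many documents each word occurs.
    df = Counter(w for d in docs for w in set(d))
    # Inverse document frequency, damped with a logarithm.
    idf = {w: math.log(n_docs / df[w]) for w in vocab}
    vectors = []
    for d in docs:
        counts = Counter(d)
        length = len(d) or 1  # avoid dividing by zero for an empty document
        vectors.append([(counts[w] / length) * idf[w] for w in vocab])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Made-up toy corpus, already tokenized.
docs = [
    ["text", "mining", "finds", "knowledge"],
    ["text", "mining", "uses", "vectors"],
    ["cooking", "uses", "recipes"],
]
vocab, vecs = tf_idf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

The first two documents share "text" and "mining", so their similarity is positive, while the first and third share no words and score zero.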

6
Q

What is captured by tf-idf?

A

If a word is infrequent in the corpus but frequent in a document, then it is interesting for that document.

7
Q

What is the idea behind the inverse document frequency?

A

The more documents a word appears in, the less interesting it is. Because the raw ratio could make us judge rare words as far too interesting, we apply damping and take the log.
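
One common way to write this down is shown below. The symbols are assumptions introduced for illustration: $N$ is the number of documents in the corpus, $n_t$ is the number of documents containing word $t$, and $\mathrm{tf}(t, d)$ is the frequency of $t$ in document $d$.

```latex
\[
\mathrm{idf}(t) = \log\frac{N}{n_t},
\qquad
\text{tf-idf}(t, d) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t)
\]
```

For example, with $N = 1000$ and the natural logarithm, a word found in 10 documents gets $\ln(100) \approx 4.61$, while a word found in all 1000 documents gets $\ln(1) = 0$. Without the log, the first word would weigh 100 times the second.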
