C1 Flashcards

Question 1

Q

3 types of text mining tasks

Answer

A

Question 2

Q

4 challenges of text data

Answer

A

Question 3

Q

bag of words model

Answer

A

the text is the object to be classified
each word becomes a feature
each term in the collection becomes a dimension in the vector space
only a few of all terms occur in any given document => high-dimensional, sparse vectors
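
A minimal sketch of the bag-of-words idea, assuming lowercasing and whitespace tokenization (both are simplifications chosen for illustration; the function name `bag_of_words` and the toy documents are hypothetical). Each term in the collection gets one dimension, and each document becomes a vector of term counts, most of which are zero.

```python
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, vectors) for a list of raw text documents.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    tokenized = [doc.lower().split() for doc in docs]
    # One dimension per distinct term in the whole collection.
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Terms absent from this document get a count of 0 (sparsity).
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors


docs = ["the cat sat", "the dog barked", "a cat and a dog"]
vocabulary, vectors = bag_of_words(docs)
for doc, vector in zip(docs, vectors):
    print(doc, "->", vector)
```

Even with three short documents, every vector already contains zeros; with a realistic vocabulary of tens of thousands of terms, almost all entries are zero.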

Question 4

Q

word embeddings

Answer

A

Question 5

Q

evaluation metrics

Answer

A

Question 6

Q

precision versus recall when identifying terrorists

Answer

A

precision: of the people flagged as terrorists, how many really are terrorists (low precision = many people wrongly flagged)

recall: of the actual terrorists, how many were flagged (low recall = many terrorists missed)
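
A small sketch of how precision and recall could be computed from binary labels, where 1 means "terrorist" and 0 means "not a terrorist". The function name `precision_recall` and the example labels are made up for illustration.

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed
    # Return 0.0 when a denominator is empty, to avoid division by zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical data: 3 actual terrorists among 8 people, 2 people flagged.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here one of the two flagged people is an actual terrorist (precision 1/2), and one of the three actual terrorists was flagged (recall 1/3).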

Question 7

Q

text mining

Answer

A

automatic extraction of knowledge from text

Question 8

Q

text mining pipeline for discovering side effects for hypertension medications

Answer

A

Question 9

Q

Zipf’s law

Answer

A

Given a text collection, the frequency of any word is inversely proportional to its rank in the frequency table
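
Written as a formula, the law says roughly f(r) ≈ C / r, where f(r) is the frequency of the word at rank r and C is a constant for the collection. The sketch below builds the rank/frequency table; the function name `rank_frequency_table` and the toy text are hypothetical. A toy text is far too small to exhibit the law, so this only shows the mechanics: on a large corpus, the product rank × frequency should stay roughly constant.

```python
from collections import Counter


def rank_frequency_table(text):
    """Return a list of (rank, word, frequency), most frequent word first.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    counts = Counter(text.lower().split())
    return [
        (rank, word, freq)
        for rank, (word, freq) in enumerate(counts.most_common(), start=1)
    ]


table = rank_frequency_table("a a a a b b c")
for rank, word, freq in table:
    # Under Zipf's law, rank * freq is approximately constant.
    print(rank, word, freq, rank * freq)
```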

Question 10

Q

extrinsic evaluation

Answer

A

evaluation of the complete application
- human vs. automatic
- are humans helped/satisfied by the results?

Question 11

Q

intrinsic evaluation

Answer

A

evaluation of the components: ground truth labels needed
- existing labels in the data
- human-assigned labels in the data

(11 cards)