#2 Flashcards

1
Q

Intrinsic evaluation

A

Define a metric and check which system scores best on it

2
Q

Extrinsic evaluation

A

Check how much performance improves when you use the output as an input to a larger system

3
Q

Issues with the Naive Bayes (NB) classifier

A

a missing feature
out-of-vocabulary words
stop words

4
Q

How can you deal with a missing feature in the dataset?

A

smoothing

5
Q

stop words

A

very frequent, uninformative words
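
A minimal sketch of stop-word filtering, assuming a tiny hand-written stop list. The name `STOP_WORDS`, the helper `remove_stop_words`, and the example sentence are illustrative assumptions, not part of the deck:

```python
# Hypothetical, deliberately tiny stop list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}


def remove_stop_words(tokens):
    """Return only the tokens that are not in STOP_WORDS (case-insensitive)."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]


tokens = "the plot of the film is a mess".split()
filtered = remove_stop_words(tokens)
print(filtered)
```

Only the content-bearing words remain as features for the classifier.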

6
Q

smoothing

A

Add 1 to the count of every feature (add-one, or Laplace, smoothing), so that no feature ends up with zero probability
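
A minimal sketch of add-one smoothing for the word likelihoods of a single class. The function name `smoothed_likelihoods`, the parameter `alpha`, and the toy data are assumptions made for illustration; the sketch also assumes every training token is in the vocabulary:

```python
from collections import Counter


def smoothed_likelihoods(class_tokens, vocabulary, alpha=1):
    """Estimate P(word | class) with add-alpha smoothing (alpha=1 is add-one)."""
    counts = Counter(class_tokens)
    # Each vocabulary word receives alpha extra counts, so the denominator
    # grows by alpha * |V| to keep the probabilities summing to 1.
    total = len(class_tokens) + alpha * len(vocabulary)
    return {word: (counts[word] + alpha) / total for word in vocabulary}


vocabulary = {"good", "bad", "boring"}
positive_tokens = ["good", "good", "good"]  # "bad" and "boring" never seen
probs = smoothed_likelihoods(positive_tokens, vocabulary)
print(probs)
```

Without smoothing, the unseen words would get probability 0 and zero out the whole product for that class.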

7
Q

generative classifier

A

Learns a model of how the data is generated

8
Q

discriminative classifier

A

Learns which features best predict a certain class

9
Q

How to prevent overfitting

A

By fine-tuning and k-fold cross-validation
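
A minimal sketch of how k-fold cross-validation splits a dataset, written in plain Python. The helper name `k_fold_splits` is hypothetical, and the sketch assumes the samples are already shuffled:

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


splits = list(k_fold_splits(10, 5))
for train, validation in splits:
    print("train:", train, "validation:", validation)
```

Each sample is used for validation exactly once, so the averaged score depends less on one lucky or unlucky split.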

10
Q

Prior probability

A

represents your initial belief about the likelihood of the hypothesis

11
Q

likelihood

A

the evidence that an instance has been generated by a given class

12
Q

Posterior probability

A

represents your updated belief about the likelihood of the hypothesis after taking into account the evidence
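
A small worked example of how prior, likelihood, and posterior fit together under Bayes' rule. The class names and all numbers are invented for illustration, and `posterior` is a hypothetical helper:

```python
def posterior(priors, likelihoods):
    """Apply Bayes' rule: posterior is proportional to prior * likelihood."""
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    evidence = sum(joint.values())  # P(evidence), the normalizing constant
    return {c: joint[c] / evidence for c in joint}


# Assumed numbers: initial belief about each class (prior) and
# P(observed message | class) (likelihood).
priors = {"spam": 0.2, "ham": 0.8}
likelihoods = {"spam": 0.6, "ham": 0.1}

post = posterior(priors, likelihoods)
print(post)
```

Although spam starts with the lower prior, the evidence fits it much better, so its posterior ends up higher than that of ham.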

13
Q

How NB treats a text document

A

Bag of Words
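
A minimal sketch of a bag-of-words representation, assuming whitespace tokenization and lowercasing. The helper `bag_of_words` and the example sentences are illustrative:

```python
from collections import Counter


def bag_of_words(text):
    """Map a document to word counts, discarding word order."""
    return Counter(text.lower().split())


a = bag_of_words("dog bites man")
b = bag_of_words("man bites dog")
print(a)
print(a == b)
```

The two sentences mean different things but produce the same bag, because only the counts are kept.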

14
Q

Advantage of Rule systems

A
  1. Robust
  2. No need for a large dataset
15
Q

Disadvantage of Rule systems

A
  1. Expensive to write
  2. Require domain knowledge
  3. Rigid when dealing with ambiguity
16
Q

4 terms of learning

A

Generalize (understanding new info),

infer (use it within different situations),

analogize (interpret the results),

adapt (the system adapts quickly when the environment is changing)

17
Q

Discriminative classifier

A

Learn which features best predict a certain class

18
Q

Text simplification

A

Mapping a text into another text

19
Q

Main difference between intrinsic and extrinsic evaluation

  • what was your algorithm trained to do
A

Intrinsic: how well the system does what it was trained to do

Extrinsic: how well the system performs when it is used inside a larger environment

20
Q

Does bag of words track co-occurrence?

A

No, bag of words does not track co-occurrence

21
Q

Why is NBC “naive”?

A

Because it assumes that the features are conditionally independent of each other, given the class
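
Written out, the independence assumption lets the class-conditional probability of a document factor into a product over its features. The symbols $c$ (class) and $x_1, \dots, x_n$ (features) are notation introduced here, not taken from the deck:

```latex
P(x_1, \dots, x_n \mid c) = \prod_{i=1}^{n} P(x_i \mid c)
\qquad\Longrightarrow\qquad
\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

Here $P(c)$ is the prior and each $P(x_i \mid c)$ is a per-feature likelihood.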

22
Q

out-of-vocabulary words

A

Words that appear in the test set but not in the training set

23
Q

How to treat OOV words?

A

- Ignore them
- Make a dedicated feature in the training set
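
Both options can be sketched as below. The token name `<UNK>`, the helper names, and the toy vocabulary are assumptions for illustration; for the second option to help, the training data would also need to contain the dedicated token (for example by replacing rare training words with it):

```python
UNK = "<UNK>"  # hypothetical name for the dedicated OOV feature


def drop_oov(tokens, vocabulary):
    """Option 1: ignore words that were never seen in training."""
    return [t for t in tokens if t in vocabulary]


def map_oov(tokens, vocabulary):
    """Option 2: replace unseen words with a dedicated UNK feature."""
    return [t if t in vocabulary else UNK for t in tokens]


vocabulary = {"great", "film", "boring"}
test_tokens = ["great", "cinematography", "film"]

print(drop_oov(test_tokens, vocabulary))
print(map_oov(test_tokens, vocabulary))
```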