# 4 Flashcards

1
Q

How to decide the n-gram size?

A

Use the dev set
Find the parameters that minimize perplexity on the dev set

2
Q

What is the relationship between words that occur rarely (or not at all) in a finite corpus and the number of n-grams?

A

A large number of n-grams contain rare words, so most n-grams are never observed or have very low counts

3
Q

Smoothing (Discounting)

A

The entropy of the continuation distribution rises: probability mass is shifted from observed events onto unseen ones, flattening the distribution

4
Q

Smoothing techniques

A

Laplace (add-one) smoothing

5
Q

Laplace technique

A

Add 1 to all frequency counts before normalization (so that no event ends up with zero probability)
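
As an illustration, here is a minimal, self-contained sketch of add-one smoothing for bigram probabilities; the toy corpus and the function name `laplace_bigram_prob` are assumptions made for this example, not part of the card.

```python
from collections import Counter

# Toy training corpus; in practice these would be the training-set tokens.
tokens = ["<s>", "i", "like", "nlp", "</s>", "<s>", "i", "like", "pizza", "</s>"]

bigram_counts = Counter(zip(tokens, tokens[1:]))
unigram_counts = Counter(tokens)
V = len(unigram_counts)  # vocabulary size

def laplace_bigram_prob(prev_word, word):
    # Add 1 to the raw count and V to the denominator so that even an
    # unseen bigram gets a small non-zero probability.
    return (bigram_counts[(prev_word, word)] + 1) / (unigram_counts[prev_word] + V)

print(laplace_bigram_prob("i", "like"))   # seen bigram
print(laplace_bigram_prob("i", "pizza"))  # unseen bigram, still > 0
```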

6
Q

Backoff

A

Recursively fall back on smaller n-grams (e.g., trigram → bigram → unigram) when the higher-order n-gram has not been observed
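
As a concrete illustration (one possible scheme, not necessarily the one meant by the card), here is a minimal sketch of "stupid backoff" with a fixed 0.4 penalty; the toy counts and names are assumptions. Note that the resulting scores are not normalized probabilities.

```python
from collections import Counter

tokens = ["<s>", "i", "like", "nlp", "</s>", "<s>", "i", "like", "pizza", "</s>"]

def ngram_counts(tokens, n):
    # Count all n-grams of order n in the token sequence.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

counts = {n: ngram_counts(tokens, n) for n in (1, 2, 3)}

def stupid_backoff(word, context, alpha=0.4):
    # Score a word given a context tuple (length <= 2 here), recursively
    # backing off to shorter contexts when the full n-gram was never seen.
    if context:
        ngram = context + (word,)
        if counts[len(ngram)][ngram] > 0:
            return counts[len(ngram)][ngram] / counts[len(context)][context]
        return alpha * stupid_backoff(word, context[1:], alpha)
    # Base case: unigram relative frequency.
    return counts[1][(word,)] / len(tokens)

print(stupid_backoff("nlp", ("i", "like")))     # trigram was seen
print(stupid_backoff("pizza", ("we", "like")))  # backs off to the bigram
```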

7
Q

What do we need to take into consideration when using backoff?

A

We need to discount the maximum-likelihood (ML) estimates so that the probability mass still sums to 1

8
Q

Smoothing (discounting)

A

Relies on observed transitions to estimate unobserved ones

9
Q

Which set do you use to make decisions (e.g., to decide the n-gram size)?

A

Dev set

10
Q

How do you choose the parameters on the dev set?

A

Find the parameters which minimize perplexity on the dev set

11
Q

Perplexity

A

The inverse probability of the test set under the language model, normalized by the number of words
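
For reference, the usual formula for the perplexity of a test set W = w_1 … w_N (standard notation, not taken from the card itself):

$$
PP(W) = P(w_1 w_2 \dots w_N)^{-\frac{1}{N}} = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{P(w_i \mid w_1, \dots, w_{i-1})}}
$$

Lower perplexity means the model assigns higher probability to the test set.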

12
Q

What is computed by perplexity?

A

The amount of surprise of the model when it sees the test data (lower perplexity means less surprise)

13
Q

Trade-off of perplexity

A

The higher the probability the model assigns to valid sentences, the lower the perplexity and the better the LM is

14
Q

What does it mean when a model is too complex for the given data?

A

Higher sparsity, a higher OOV rate, and unreliable probability estimates

15
Q

How to reduce the number of parameters?

A

Normalize (stemming, lemmatization)
Use semantic classes
Treat words that occur only once in the corpus as if they don't exist

16
Q

language is …

A

sequential (words come one after another)

17
Q

usage of LMs

A

- score sentences
- spelling correction

18
Q

n-grams

A

unigram (1-gram: one word)
bigram (2-gram: two words)
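
A tiny, self-contained sketch of extracting n-grams from a token sequence; the example sentence and the helper name `ngrams` are illustrative assumptions.

```python
def ngrams(tokens, n):
    # Slide a window of length n over the token sequence.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "language models predict the next word".split()
print(ngrams(tokens, 1))  # unigrams, e.g. ('language',), ('models',), ...
print(ngrams(tokens, 2))  # bigrams, e.g. ('language', 'models'), ...
```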

19
Q

usage of n-grams

A

- featurize sequences and classify them
- predict the next word

20
Q

the Markov assumption in LMs

A

Approximate the conditional probability of a word given its entire history by conditioning only on the last few words
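
In symbols (the standard formulation, added here for reference): for an n-gram model of order N,

$$
P(w_k \mid w_1, \dots, w_{k-1}) \approx P(w_k \mid w_{k-N+1}, \dots, w_{k-1})
$$

e.g., a bigram model uses only the single preceding word: $P(w_k \mid w_{k-1})$.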

21
Q

evaluation of LM

A

intrinsic (e.g., perplexity)
extrinsic (performance on a downstream task)

22
Q

What happens if we don’t use smoothing

A

we effectively treat most of the test-set data as impossible, because unseen n-grams receive zero probability

23
Q

OOV rate

A

% of words in the test set that never appear in the training data
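
A minimal sketch of how the OOV rate could be computed; the function name and the toy data are assumptions made for the example.

```python
def oov_rate(train_tokens, test_tokens):
    # Percentage of test tokens whose word type never appears in training.
    vocab = set(train_tokens)
    unseen = sum(1 for w in test_tokens if w not in vocab)
    return 100.0 * unseen / len(test_tokens)

print(oov_rate(["a", "b", "c"], ["a", "d", "b", "e"]))  # 50.0
```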