# 4 Flashcards

1
Q

How to decide the n-gram size?

A

Use the dev set
Find the parameters that minimize perplexity on the dev set

2
Q

What is the relationship between words that occur rarely (or not at all) in a finite corpus and the number of n-grams?

A

A large number of n-grams contain rare words, so most n-grams are never observed or have very low counts

3
Q

Smoothing (Discounting)

A

The entropy of the continuation distribution rises: probability mass is shifted from observed events onto unseen ones, flattening the distribution

4
Q

Smoothing techniques

A

Laplace (add-one) smoothing

5
Q

Laplace technique

A

Add 1 to all frequency counts before normalization (so that no event ends up with zero probability)
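
As an illustration, here is a minimal, self-contained sketch of add-one smoothing for bigram probabilities; the toy corpus and the function name `laplace_bigram_prob` are assumptions made for this example, not part of the card.

```python
from collections import Counter

# Toy training corpus; in practice these would be the training-set tokens.
tokens = ["<s>", "i", "like", "nlp", "</s>", "<s>", "i", "like", "pizza", "</s>"]

bigram_counts = Counter(zip(tokens, tokens[1:]))
unigram_counts = Counter(tokens)
V = len(unigram_counts)  # vocabulary size

def laplace_bigram_prob(prev_word, word):
    # Add 1 to the raw count and V to the denominator so that even an
    # unseen bigram gets a small non-zero probability.
    return (bigram_counts[(prev_word, word)] + 1) / (unigram_counts[prev_word] + V)

print(laplace_bigram_prob("i", "like"))   # seen bigram
print(laplace_bigram_prob("i", "pizza"))  # unseen bigram, still > 0
```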

6
Q

Backoff

A

Recursively fall back on smaller n-grams (e.g., trigram → bigram → unigram) when the higher-order n-gram has not been observed
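
As a concrete illustration (one possible scheme, not necessarily the one meant by the card), here is a minimal sketch of "stupid backoff" with a fixed 0.4 penalty; the toy counts and names are assumptions. Note that the resulting scores are not normalized probabilities.

```python
from collections import Counter

tokens = ["<s>", "i", "like", "nlp", "</s>", "<s>", "i", "like", "pizza", "</s>"]

def ngram_counts(tokens, n):
    # Count all n-grams of order n in the token sequence.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

counts = {n: ngram_counts(tokens, n) for n in (1, 2, 3)}

def stupid_backoff(word, context, alpha=0.4):
    # Score a word given a context tuple (length <= 2 here), recursively
    # backing off to shorter contexts when the full n-gram was never seen.
    if context:
        ngram = context + (word,)
        if counts[len(ngram)][ngram] > 0:
            return counts[len(ngram)][ngram] / counts[len(context)][context]
        return alpha * stupid_backoff(word, context[1:], alpha)
    # Base case: unigram relative frequency.
    return counts[1][(word,)] / len(tokens)

print(stupid_backoff("nlp", ("i", "like")))     # trigram was seen
print(stupid_backoff("pizza", ("we", "like")))  # backs off to the bigram
```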

7
Q

What do we need to take into consideration when using backoff?

A

We need to discount the maximum-likelihood (ML) estimates so that the probability mass still sums to 1

8
Q

Smoothing (discounting)

A

Relies on observed transitions to estimate unobserved ones

9
Q

Which set do you use to make decisions (e.g., to decide the n-gram size)?

A

Dev set

10
Q

How do you choose the parameters on the dev set?

A

Find the parameters which minimize perplexity on the dev set

11
Q

Perplexity

A

The inverse probability of the test set under the language model, normalized by the number of words
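
For reference, the usual formula for the perplexity of a test set W = w_1 … w_N (standard notation, not taken from the card itself):

$$
PP(W) = P(w_1 w_2 \dots w_N)^{-\frac{1}{N}} = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{P(w_i \mid w_1, \dots, w_{i-1})}}
$$

Lower perplexity means the model assigns higher probability to the test set.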

12
Q

What is computed by perplexity?

A

The amount of surprise of the model when it sees the test data (lower perplexity means less surprise)

13
Q

Trade-off of perplexity

A

The higher the probability the model assigns to valid sentences, the lower the perplexity and the better the LM is

14
Q

What does it mean when a model is too complex for the given data?

A

Higher sparsity, a higher OOV rate, and unreliable probability estimates

15
Q

How to reduce the number of parameters?

A

Normalize (stemming, lemmatization)
Use semantic classes
Treat words that occur only once in the corpus as if they don't exist

16
Q

language is …

A

sequential (words come one after another)

17
Q

usage of LMs

A

- score sentences
- spelling correction

18
Q

n-grams

A

unigram (1-gram: one word)
bigram (2-gram: two words)
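
A tiny, self-contained sketch of extracting n-grams from a token sequence; the example sentence and the helper name `ngrams` are illustrative assumptions.

```python
def ngrams(tokens, n):
    # Slide a window of length n over the token sequence.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "language models predict the next word".split()
print(ngrams(tokens, 1))  # unigrams, e.g. ('language',), ('models',), ...
print(ngrams(tokens, 2))  # bigrams, e.g. ('language', 'models'), ...
```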

19
Q

usage of n-grams

A

- featurize sequences and classify them
- predict the next word

20
Q

the Markov assumption in LMs

A

Approximate the conditional probability of a word given its entire history by conditioning only on the last few words
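
In symbols (the standard formulation, added here for reference): for an n-gram model of order N,

$$
P(w_k \mid w_1, \dots, w_{k-1}) \approx P(w_k \mid w_{k-N+1}, \dots, w_{k-1})
$$

e.g., a bigram model uses only the single preceding word: $P(w_k \mid w_{k-1})$.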

21
Q

evaluation of LM

A

intrinsic (e.g., perplexity)
extrinsic (performance on a downstream task)

22
Q

What happens if we don’t use smoothing

A

we effectively treat most of the test-set data as impossible, because unseen n-grams receive zero probability

23
Q

OOV rate

A

% of words in the test set that never appear in the training data
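
A minimal sketch of how the OOV rate could be computed; the function name and the toy data are assumptions made for the example.

```python
def oov_rate(train_tokens, test_tokens):
    # Percentage of test tokens whose word type never appears in training.
    vocab = set(train_tokens)
    unseen = sum(1 for w in test_tokens if w not in vocab)
    return 100.0 * unseen / len(test_tokens)

print(oov_rate(["a", "b", "c"], ["a", "d", "b", "e"]))  # 50.0
```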