# 4 Flashcards
How to decide the n-gram size?
Use the dev set
Find the parameters that minimize perplexity on the dev set
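A minimal sketch of this model-selection loop (toy corpus, add-one smoothing; all data and helper names below are illustrative, not from the course):

```python
# Pick the n-gram order that minimizes perplexity on the dev set.
import math
from collections import Counter

def ngrams(tokens, n):
    padded = ["<s>"] * (n - 1) + tokens + ["</s>"]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]

def train(sentences, n):
    grams, contexts = Counter(), Counter()
    for sent in sentences:
        for g in ngrams(sent, n):
            grams[g] += 1
            contexts[g[:-1]] += 1
    return grams, contexts

def perplexity(sentences, grams, contexts, n, vocab_size):
    log_prob, count = 0.0, 0
    for sent in sentences:
        for g in ngrams(sent, n):
            # Add-one smoothing so unseen n-grams get nonzero probability.
            p = (grams[g] + 1) / (contexts[g[:-1]] + vocab_size)
            log_prob += math.log(p)
            count += 1
    return math.exp(-log_prob / count)

train_sents = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
dev_sents = [["the", "cat", "ran"]]
vocab = {w for s in train_sents for w in s} | {"<s>", "</s>"}

best = min(range(1, 4),
           key=lambda n: perplexity(dev_sents, *train(train_sents, n), n, len(vocab)))
print("best n:", best)
```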
What is the relationship between words that don't occur or occur rarely in a finite corpus and the number of n-grams?
A large number of n-grams contain those rare words, so most n-grams are themselves rare or unseen
Smoothing (discounting)
Moves probability mass from observed events to unseen ones, flattening the distribution so that its entropy rises
Smoothing techniques
Laplace technique
Laplace technique
Add 1 to all frequency counts before normalizing (avoids assigning zero probability to unseen events)
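A small sketch contrasting the ML estimate with its add-one (Laplace) version; the corpus is made up for illustration:

```python
from collections import Counter

corpus = ["the", "cat", "sat", "on", "the", "mat"]
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)
V = len(unigrams)  # vocabulary size

def p_mle(w2, w1):
    # Maximum-likelihood estimate: zero for any unseen bigram.
    return bigrams[(w1, w2)] / unigrams[w1]

def p_laplace(w2, w1):
    # Add 1 to every count before normalizing, so no bigram gets probability 0.
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

print(p_mle("sat", "the"))      # 0.0 -> "impossible" event under ML
print(p_laplace("sat", "the"))  # small but nonzero (1/7)
```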
Backoff
Recursively fall back on smaller n-grams when the higher-order n-gram is unseen (e.g., trigram → bigram → unigram)
What do we need to take into consideration when using backoff?
We need to discount the ML estimates so that the probabilities still sum to one
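A simplified sketch of backoff with discounting on bigrams (toy corpus; real Katz backoff derives its discounts from Good-Turing estimates, while this version just subtracts a fixed d = 0.5):

```python
from collections import Counter

corpus = ["the", "cat", "sat", "on", "the", "mat"]
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)
N = len(corpus)
d = 0.5  # fixed discount subtracted from every observed bigram count

def p_unigram(w):
    return unigrams[w] / N

def p_backoff(w2, w1):
    if bigrams[(w1, w2)] > 0:
        # Discounted ML estimate for an observed bigram.
        return (bigrams[(w1, w2)] - d) / unigrams[w1]
    # The mass freed by discounting the observed continuations of w1 ...
    seen = [b for b in bigrams if b[0] == w1]
    alpha = d * len(seen) / unigrams[w1]
    # ... is redistributed over unseen continuations, in proportion to
    # their lower-order (unigram) probabilities.
    unseen_mass = sum(p_unigram(w) for w in unigrams if (w1, w) not in bigrams)
    return alpha * p_unigram(w2) / unseen_mass

print(p_backoff("cat", "the"))  # observed bigram: discounted ML estimate
print(p_backoff("sat", "the"))  # unseen bigram: backs off to the unigram
```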
Smoothing (discounting)
Relies on observed transitions to estimate unobserved ones
Which set do you use to make decisions (e.g., decide the n-gram size)?
The dev set
How to choose the parameters using the dev set?
Find the parameters that minimize perplexity on the dev set
Perplexity
The inverse probability of the test set under the language model, normalized by the number of words
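As a formula, this is the standard definition (notation not taken from the card itself):

$$
\mathrm{PP}(W) = P(w_1 w_2 \ldots w_N)^{-\frac{1}{N}} = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{P(w_i \mid w_1 \ldots w_{i-1})}}
$$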
What is computed by perplexity?
The amount of surprise of the model when it sees the test set
Trade-off of perplexity
The higher the probability the model assigns to valid sentences, the lower the perplexity and the better the LM is
What does it mean for a model to be too complex given the data?
Higher sparsity, a higher rate of OOV words, and unreliable probability estimates
How to reduce the number of parameters?
Normalize (stemming, lemmatization)
Group words into semantic classes
Treat words that occur only once in the corpus as unknown, i.e., pretend they don't exist (map them to an <UNK> token)
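A tiny sketch of that last point, mapping hapax legomena (words seen once) to an `<UNK>` token; the token list is made up for illustration:

```python
from collections import Counter

tokens = ["the", "cat", "sat", "on", "the", "mat", "quickly"]
counts = Counter(tokens)
# Replace every word that occurs only once with <UNK>.
tokens_unk = [w if counts[w] > 1 else "<UNK>" for w in tokens]
print(tokens_unk)  # ['the', '<UNK>', '<UNK>', '<UNK>', 'the', '<UNK>', '<UNK>']
```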