N-gram Modeling Flashcards
Probability of a Sentence
The product of the probability of each word given its context (the words that precede it).
Count-based Models
To get the probability of a word, count how often it occurs in a corpus (optionally conditioned on the previous words) and divide by the count of that context (or by the total number of words for a unigram).
Count-based Model Formula
P(w_i \mid w_{i-n+1}, \ldots, w_{i-1}) = \frac{C(w_{i-n+1}, \ldots, w_{i-1}, w_i)}{C(w_{i-n+1}, \ldots, w_{i-1})}
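A minimal sketch of this count-based estimate for the bigram case (n = 2); the toy corpus and function name are illustrative, not from any particular library:

```python
from collections import Counter

def bigram_prob(corpus_tokens, prev, word):
    """MLE bigram estimate: P(word | prev) = C(prev, word) / C(prev)."""
    bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    unigrams = Counter(corpus_tokens)
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

tokens = "the cat sat on the mat the cat ran".split()
print(bigram_prob(tokens, "the", "cat"))  # 2/3: "the" occurs 3x, "the cat" 2x
```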
Probability of a Sentence Formula
P(S) = P(w_1) \cdot P(w_2 \mid w_1) \cdot P(w_3 \mid w_2) \cdots P(w_n \mid w_{n-1})
(This is the bigram approximation; the full chain rule conditions each word on all preceding words.)
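Multiplying many small probabilities underflows for long sentences, so implementations usually sum log-probabilities instead; a minimal sketch with made-up per-word probabilities:

```python
import math

def sentence_prob(probs):
    """P(S) as a product of per-word probabilities, computed in log space
    to avoid floating-point underflow on long sentences."""
    return math.exp(sum(math.log(p) for p in probs))

# Hypothetical values for P(w1), P(w2|w1), P(w3|w2), P(w4|w3).
print(sentence_prob([0.1, 0.4, 0.25, 0.5]))  # ~0.005
```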
Perplexity
The inverse probability of the sentence, normalized by the number of words in the sentence: PP(S) = P(w_1, \ldots, w_n)^{-1/n}.
HIGHER perplexity means less confident: the model spread probability over more plausible next words.
LOWER perplexity means more confident: the model spread probability over fewer plausible next words.
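A minimal sketch, reusing the same made-up per-word probabilities as above:

```python
import math

def perplexity(probs):
    """PP(S) = P(S)^(-1/n): the geometric mean of the inverse
    per-word probabilities."""
    n = len(probs)
    return math.exp(-sum(math.log(p) for p in probs) / n)

print(perplexity([0.1, 0.4, 0.25, 0.5]))  # ~3.76 (less confident)
print(perplexity([0.9, 0.9, 0.9, 0.9]))   # ~1.11 (more confident)
```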
Ways to Evaluate Models
- Log-likelihood
- Per-word Log Likelihood
- Per-word (cross) Entropy *
- Perplexity ***
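These quantities are simple transformations of one another; a sketch with hypothetical per-token probabilities (cross-entropy is taken in bits here, i.e. log base 2, which is one common convention):

```python
import math

# Hypothetical probabilities a model assigns to each token of a test sentence.
probs = [0.1, 0.4, 0.25, 0.5]
n = len(probs)

log_likelihood = sum(math.log(p) for p in probs)      # higher (closer to 0) is better
per_word_ll    = log_likelihood / n                   # length-normalized
cross_entropy  = -log_likelihood / (n * math.log(2))  # bits per word
perplexity     = math.exp(-log_likelihood / n)        # equals 2 ** cross_entropy

print(log_likelihood, per_word_ll, cross_entropy, perplexity)
assert abs(perplexity - 2 ** cross_entropy) < 1e-9
```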
Ways to Sample Text with LMs
- Greedy decoding
- Beam search
- Nucleus sampling
- Top-k sampling
Greedy Decoding
Choose the next word with the highest probability given the previous word (argmax at every step).
Problem: the most probable next word at each step doesn't necessarily lead to the most probable sentence.
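A minimal sketch of greedy decoding over a toy bigram table (the table and all probabilities are made up for illustration):

```python
def greedy_decode(next_probs, start, max_len=10, eos="</s>"):
    """next_probs(word) -> dict mapping candidate next words to probabilities.
    Always pick the single most probable continuation."""
    sent = [start]
    while len(sent) < max_len:
        dist = next_probs(sent[-1])
        word = max(dist, key=dist.get)  # argmax: the greedy choice
        if word == eos:
            break
        sent.append(word)
    return sent

# Toy bigram distributions (hypothetical).
table = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "</s>": 0.2},
    "cat": {"sat": 0.7, "</s>": 0.3},
    "sat": {"</s>": 1.0},
}
print(greedy_decode(lambda w: table[w], "<s>"))  # ['<s>', 'the', 'cat', 'sat']
```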
Beam Search
Maintain several candidate paths (beams) for the most probable sentences. With a beam width of 2:
1. Choose the two most probable first words.
2. Calculate the total probability of each sentence so far.
3. Choose the two most probable next words for each sentence.
4. Eliminate the two least likely sentences.
5. Repeat.
Problem: the beams fill with many similar sequences, and the output stays too close to the single most probable (often generic) text.
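A sketch of beam search with beam width 2 over a toy bigram table (probabilities made up); note that it recovers "a dog" (probability 0.36), which greedy decoding would miss by committing to "the" first:

```python
import math

def beam_search(next_probs, start, beam_width=2, max_len=4, eos="</s>"):
    """Keep the beam_width highest-scoring partial sentences at each step,
    scored by cumulative log-probability."""
    beams = [([start], 0.0)]
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens[-1] == eos:  # finished hypotheses carry over unchanged
                candidates.append((tokens, score))
                continue
            for word, p in next_probs(tokens[-1]).items():
                candidates.append((tokens + [word], score + math.log(p)))
        # Keep only the beam_width most probable expansions.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

# Toy bigram distributions (hypothetical).
table = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "</s>": 0.2},
    "a":   {"dog": 0.9, "</s>": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}
for tokens, score in beam_search(lambda w: table[w], "<s>"):
    print(" ".join(tokens), math.exp(score))  # "a dog" (0.36) beats "the cat" (0.3)
```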
Nucleus Sampling
Take the smallest set of words whose cumulative probability mass reaches p (the "nucleus"), renormalize, and sample within that set.
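A minimal sketch, assuming the next-word distribution is available as an explicit dict (all names and probabilities illustrative):

```python
import random

def nucleus_sample(dist, p=0.9):
    """Sample from the smallest set of words whose cumulative probability
    mass reaches p (the 'nucleus'), after renormalizing."""
    items = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, cum = [], 0.0
    for word, prob in items:
        nucleus.append((word, prob))
        cum += prob
        if cum >= p:
            break
    words, probs = zip(*nucleus)
    total = sum(probs)
    return random.choices(words, weights=[q / total for q in probs])[0]

dist = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "zebra": 0.05}
print(nucleus_sample(dist, p=0.9))  # samples from {cat, dog, fish}; zebra excluded
```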
Top-k Sampling
Take the k most likely words, renormalize their probabilities, and sample from those.
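A sketch under the same assumption of an explicit distribution dict:

```python
import random

def top_k_sample(dist, k=2):
    """Restrict to the k most probable words, renormalize, and sample."""
    top = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)[:k]
    words, probs = zip(*top)
    total = sum(probs)
    return random.choices(words, weights=[q / total for q in probs])[0]

dist = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "zebra": 0.05}
print(top_k_sample(dist, k=2))  # samples only from {cat, dog}
```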