N-gram Language Model Flashcards
What is the goal of a language model?
Assign a probability to a sentence
Give three reasons why assigning a probability to a sentence is important.
• Machine translation: P(high winds tonight) > P(large winds tonight)
• Spell correction: P(15 minutes walk) > P(15 minuets walk)
• Speech recognition: P(I like the pink flower) > P(I like the pink flour)
A model that computes the probability of a sentence (a sequence of words) or the probability of an upcoming word is called a?
language model
A language model (LM) can be viewed as a?
probabilistic grammar
What is the Markov Assumption?
Markov property refers to the memoryless property of a stochastic process. A stochastic process has the Markov property if the conditional probability distribution of future states of the process (conditional on both past and present states) depends only upon the present state, not on the sequence of events that preceded it.
In an n-gram LM, this simplifies to conditioning each word on only a few preceding words rather than on the entire history.
How do you calculate the probability of a word or sentence in a unigram model?
- Multiply the probabilities of the individual tokens
- The word-sequence information is completely ignored
- Count how often a word appears in the corpus to get its probability, then multiply all the probabilities
- NLTK's FreqDist() can do the counting
P(w1, w2, …, wn) = ∏i P(wi)
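For example, a minimal unigram sketch in Python with NLTK's FreqDist (the toy corpus and the unigram_prob helper are illustrative, not part of the cards):

```python
# Minimal unigram LM sketch; the toy corpus stands in for real training text.
from nltk import FreqDist

corpus = "as far as i know the pink flower is as pretty as the rose"
tokens = corpus.split()
fd = FreqDist(tokens)              # maps each token to its count

def unigram_prob(sentence):
    """P(w1, ..., wn) = product of P(wi); word order is ignored."""
    p = 1.0
    for w in sentence.lower().split():
        p *= fd.freq(w)            # count(w) / total number of tokens
    return p

print(unigram_prob("as far as i know"))
```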
What are the limitations of the N-Gram Language Model?
It only considers words within an n-word window, so it ignores long-distance dependencies between words.
"The computer which I had put into the machine room on the fifth floor crashed." (the verb "crashed" agrees with "computer", far outside any small window)
In practice, though, bigram/trigram LMs strike a good trade-off between cost and performance.
What is a possible way to estimate these probabilities, and why can't it be done in practice?
Count and divide
P(W) = Count(W)/Total_Token_Num
P(know | As, far, as, I) = Count(As far as I know) / Count(As far as I)
• Impossible in practice
• There are too many possible sentences
• We will never have enough data to estimate all the conditional probabilities, especially as the conditioning history gets longer
How do you calculate the probability of a word or sentence in an n-gram language model?
P(w1, w2, …, wn) = ∏i P(wi | wi−1, …, wi−n+1)
(each word is conditioned on the previous n−1 words)
• Count how often each n-gram appears in the corpus, in order, to get its probabilities (see the trigram sketch below)
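A hedged sketch of the trigram (n = 3) case using plain Counter objects; the toy corpus is made up for illustration:

```python
# Trigram sketch: P(wi | wi-2, wi-1) = count(wi-2, wi-1, wi) / count(wi-2, wi-1).
from collections import Counter

tokens = "as far as i know as far as i can tell".split()
trigram_counts = Counter(zip(tokens, tokens[1:], tokens[2:]))
bigram_counts = Counter(zip(tokens, tokens[1:]))

# P(i | far, as) under this toy corpus:
print(trigram_counts[("far", "as", "i")] / bigram_counts[("far", "as")])
```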
How do you calculate the probability of a word or sentence in a bigram model?
P(w1, w2, …, wn) = ∏i P(wi | wi-1)
• Count how often each pair of adjacent words appears in the corpus to get the conditional probabilities (see the sketch below)
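A sketch of the bigram case by count-and-divide (no smoothing or sentence-boundary tokens; the corpus and names are illustrative):

```python
# Bigram LM sketch: P(wi | wi-1) = count(wi-1, wi) / count(wi-1).
from collections import Counter

tokens = "as far as i know the pink flower is as pretty as the rose".split()
unigram_counts = Counter(tokens)
bigram_counts = Counter(zip(tokens, tokens[1:]))

def bigram_prob(sentence):
    """Multiply P(wi | wi-1) over consecutive word pairs in the sentence."""
    words = sentence.lower().split()
    p = 1.0
    for prev, cur in zip(words, words[1:]):
        p *= bigram_counts[(prev, cur)] / unigram_counts[prev]
    return p

print(bigram_prob("as far as i know"))
```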
What could you do with an N-gram language model?
Given a language model, you can:
• Given two sentences, estimate which is more likely to appear (i.e. which has higher probability)
• Given a few words, generate the following words (see the sampling sketch below)
• P(new word | existing n−1 words)
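For the generation use case, one possible sampling sketch (the corpus and the next_word helper are hypothetical):

```python
# Generate following words by repeatedly sampling from P(new word | previous word).
import random
from collections import Counter

tokens = "as far as i know the pink flower is as pretty as the rose".split()
bigram_counts = Counter(zip(tokens, tokens[1:]))

def next_word(prev):
    """Sample the next word in proportion to count(prev, w)."""
    candidates = [(w, c) for (p, w), c in bigram_counts.items() if p == prev]
    if not candidates:
        return None                      # prev never occurred; stop generating
    words, counts = zip(*candidates)
    return random.choices(words, weights=counts, k=1)[0]

generated = ["as"]
for _ in range(5):
    w = next_word(generated[-1])
    if w is None:
        break
    generated.append(w)
print(" ".join(generated))
```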
How would you build an n-gram LM in Python?
Tokenize the text and extract n-grams
Build the LM by count and divide (see the sketch below)
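Putting both steps together, a hedged end-to-end sketch using NLTK (assumes the punkt tokenizer data is installed; swap in str.split() otherwise; the text string is illustrative):

```python
# End-to-end bigram LM: tokenize, extract n-grams, then count and divide.
from nltk import ConditionalFreqDist, word_tokenize
from nltk.util import ngrams

text = "as far as I know the pink flower is as pretty as the rose"
tokens = word_tokenize(text.lower())     # step 1: tokenize
bigrams = ngrams(tokens, 2)              # step 2: extract n-grams (n = 2 here)
cfd = ConditionalFreqDist(bigrams)       # step 3: count, conditioned on the previous word

# P(far | as) by count-and-divide:
print(cfd["as"].freq("far"))             # count(as, far) / count(as, anything)
```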