N-gram Language Model Flashcards
What is the goal of a language model?
Assign a probability to a sentence
Give three reasons why assigning a probability to a sentence is important.
• Machine translation: P(high winds tonight) > P(large winds tonight)
• Spell correction: P(15 minutes walk) > P(15 minuets walk)
• Speech recognition: P(I like the pink flower) > P(I like the pink flour)
A model that computes the probability of a sentence (a sequence of words) or the probability of an upcoming word is called a?
language model
A language model (LM) can be viewed as a?
probabilistic grammar
What is the Markov Assumption?
Markov property refers to the memoryless property of a stochastic process. A stochastic process has the Markov property if the conditional probability distribution of future states of the process (conditional on both past and present states) depends only upon the present state, not on the sequence of events that preceded it.
In an n-gram LM, this simplifies to conditioning each word on only a few preceding words rather than on the entire history.
How do you calculate the probability of a word or sentence in a unigram model?
- Multiply the probabilities of the individual tokens
- The word-sequence information is completely ignored
- Count how often a word appears in the corpus to get its probability, then multiply all the probabilities
- NLTK's FreqDist() can do the counting
P(w1, w2, …, wn) = ∏i P(wi)
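For example, a minimal unigram sketch in Python with NLTK's FreqDist (the toy corpus and the unigram_prob helper are illustrative, not part of the cards):

```python
# Minimal unigram LM sketch; the toy corpus stands in for real training text.
from nltk import FreqDist

corpus = "as far as i know the pink flower is as pretty as the rose"
tokens = corpus.split()
fd = FreqDist(tokens)              # maps each token to its count

def unigram_prob(sentence):
    """P(w1, ..., wn) = product of P(wi); word order is ignored."""
    p = 1.0
    for w in sentence.lower().split():
        p *= fd.freq(w)            # count(w) / total number of tokens
    return p

print(unigram_prob("as far as i know"))
```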
What are the limitations of the N-Gram Language Model?
It only considers words within an n-word window, so it ignores long-distance dependencies between words.
"The computer which I had put into the machine room on the fifth floor crashed." (the verb "crashed" agrees with "computer", far outside any small window)
In practice, though, bigram/trigram LMs strike a good trade-off between cost and performance.
What is a possible way to estimate these probabilities, and why can't it be done in practice?
Count and divide
P(W) = Count(W)/Total_Token_Num
P(know | As, far, as, I) = Count(As far as I know) / Count(As far as I)
• Impossible in practice
• There are too many possible sentences
• We will never have enough data to estimate all the conditional probabilities, especially as the conditioning history gets longer
How do you calculate the probability of a word or sentence in an n-gram language model?
P(w1, w2, …, wn) = ∏i P(wi | wi−1, …, wi−n+1)
(each word is conditioned on the previous n−1 words)
• Count how often each n-gram appears in the corpus, in order, to get its probabilities (see the trigram sketch below)
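A hedged sketch of the trigram (n = 3) case using plain Counter objects; the toy corpus is made up for illustration:

```python
# Trigram sketch: P(wi | wi-2, wi-1) = count(wi-2, wi-1, wi) / count(wi-2, wi-1).
from collections import Counter

tokens = "as far as i know as far as i can tell".split()
trigram_counts = Counter(zip(tokens, tokens[1:], tokens[2:]))
bigram_counts = Counter(zip(tokens, tokens[1:]))

# P(i | far, as) under this toy corpus:
print(trigram_counts[("far", "as", "i")] / bigram_counts[("far", "as")])
```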
How do you calculate the probability of a word or sentence in a bigram model?
P(w1, w2, …, wn) = ∏i P(wi | wi-1)
• Count how often each pair of adjacent words appears in the corpus to get the conditional probabilities (see the sketch below)
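A sketch of the bigram case by count-and-divide (no smoothing or sentence-boundary tokens; the corpus and names are illustrative):

```python
# Bigram LM sketch: P(wi | wi-1) = count(wi-1, wi) / count(wi-1).
from collections import Counter

tokens = "as far as i know the pink flower is as pretty as the rose".split()
unigram_counts = Counter(tokens)
bigram_counts = Counter(zip(tokens, tokens[1:]))

def bigram_prob(sentence):
    """Multiply P(wi | wi-1) over consecutive word pairs in the sentence."""
    words = sentence.lower().split()
    p = 1.0
    for prev, cur in zip(words, words[1:]):
        p *= bigram_counts[(prev, cur)] / unigram_counts[prev]
    return p

print(bigram_prob("as far as i know"))
```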
What could you do with an N-gram language model?
Given a language model, you can:
• Given two sentences, estimate which is more likely to appear (i.e. which has higher probability)
• Given a few words, generate the following words (see the sampling sketch below)
• P(new word | existing n−1 words)
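For the generation use case, one possible sampling sketch (the corpus and the next_word helper are hypothetical):

```python
# Generate following words by repeatedly sampling from P(new word | previous word).
import random
from collections import Counter

tokens = "as far as i know the pink flower is as pretty as the rose".split()
bigram_counts = Counter(zip(tokens, tokens[1:]))

def next_word(prev):
    """Sample the next word in proportion to count(prev, w)."""
    candidates = [(w, c) for (p, w), c in bigram_counts.items() if p == prev]
    if not candidates:
        return None                      # prev never occurred; stop generating
    words, counts = zip(*candidates)
    return random.choices(words, weights=counts, k=1)[0]

generated = ["as"]
for _ in range(5):
    w = next_word(generated[-1])
    if w is None:
        break
    generated.append(w)
print(" ".join(generated))
```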
How would you build an n-gram LM in Python?
Tokenize the text and extract n-grams
Build the LM by count and divide (see the sketch below)
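Putting both steps together, a hedged end-to-end sketch using NLTK (assumes the punkt tokenizer data is installed; swap in str.split() otherwise; the text string is illustrative):

```python
# End-to-end bigram LM: tokenize, extract n-grams, then count and divide.
from nltk import ConditionalFreqDist, word_tokenize
from nltk.util import ngrams

text = "as far as I know the pink flower is as pretty as the rose"
tokens = word_tokenize(text.lower())     # step 1: tokenize
bigrams = ngrams(tokens, 2)              # step 2: extract n-grams (n = 2 here)
cfd = ConditionalFreqDist(bigrams)       # step 3: count, conditioned on the previous word

# P(far | as) by count-and-divide:
print(cfd["as"].freq("far"))             # count(as, far) / count(as, anything)
```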