Module 10 - Probabilities and Language Models Flashcards
Which of the following must be true of the probabilities of two events, A and B, if A is independent of B?
𝑃(𝐴∣𝐵)=𝑃(𝐴)
𝑃(𝐵∣𝐴)=𝑃(𝐵)
𝑃(𝐴∨𝐵)=𝑃(𝐴)+𝑃(𝐵)−𝑃(𝐴∧𝐵)
𝑃(𝐴∧𝐵)=𝑃(𝐴)𝑃(𝐵)
𝑃(𝐴∣𝐵)=𝑃(𝐴)
𝑃(𝐵∣𝐴)=𝑃(𝐵)
𝑃(𝐴∨𝐵)=𝑃(𝐴)+𝑃(𝐵)−𝑃(𝐴∧𝐵)
𝑃(𝐴∧𝐵)=𝑃(𝐴)𝑃(𝐵)
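A quick numerical sketch of why all four identities hold for independent events (the probability values 0.3 and 0.5 are illustrative assumptions, not from the cards):

```python
# Sketch: for independent events A and B, check all four identities.
# p_a and p_b are arbitrary illustrative values.
p_a = 0.3
p_b = 0.5
p_a_and_b = p_a * p_b             # independence: P(A ∧ B) = P(A)P(B)
p_a_or_b = p_a + p_b - p_a_and_b  # inclusion-exclusion (holds for any events)

p_a_given_b = p_a_and_b / p_b     # definition of conditional probability
p_b_given_a = p_a_and_b / p_a

assert abs(p_a_given_b - p_a) < 1e-12  # P(A|B) = P(A)
assert abs(p_b_given_a - p_b) < 1e-12  # P(B|A) = P(B)
print(p_a_or_b)  # 0.65
```

Note that the inclusion-exclusion identity holds for any two events, independent or not; the other three are what independence adds.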
Which of the following are equivalent to the joint probability 𝑃(𝑋1,𝑋2,𝑋3,𝑋4)?
Check all that apply.
𝑃(𝑋1)𝑃(𝑋2∣𝑋1)𝑃(𝑋3∣𝑋2)𝑃(𝑋4∣𝑋3)
𝑃(𝑋1∣𝑋2,𝑋3,𝑋4)𝑃(𝑋4)𝑃(𝑋3∣𝑋4)𝑃(𝑋2∣𝑋3,𝑋4)
𝑃(𝑋2,𝑋1)𝑃(𝑋3∣𝑋2,𝑋1)𝑃(𝑋4∣𝑋3,𝑋2,𝑋1)
𝑃(𝑋1)𝑃(𝑋2∣𝑋1)𝑃(𝑋3∣𝑋2,𝑋1)𝑃(𝑋4∣𝑋3,𝑋2,𝑋1)
𝑃(𝑋1∣𝑋2,𝑋3,𝑋4)𝑃(𝑋4)𝑃(𝑋3∣𝑋4)𝑃(𝑋2∣𝑋3,𝑋4)
𝑃(𝑋2,𝑋1)𝑃(𝑋3∣𝑋2,𝑋1)𝑃(𝑋4∣𝑋3,𝑋2,𝑋1)
𝑃(𝑋1)𝑃(𝑋2∣𝑋1)𝑃(𝑋3∣𝑋2,𝑋1)𝑃(𝑋4∣𝑋3,𝑋2,𝑋1)
Chain rule, applied in different variable orderings. The first option drops conditioning variables (a Markov-style approximation), so it is not equivalent to the joint.
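The chain-rule equivalence can be checked numerically. A sketch, using a randomly generated joint distribution over four binary variables (the distribution itself is an illustrative assumption):

```python
import itertools
import random

random.seed(0)
# An arbitrary joint distribution over four binary variables, normalized to sum to 1.
outcomes = list(itertools.product([0, 1], repeat=4))
weights = [random.random() for _ in outcomes]
total = sum(weights)
joint = {o: w / total for o, w in zip(outcomes, weights)}

def marginal(fixed):
    """Probability of the partial assignment `fixed` (dict: index -> value)."""
    return sum(p for o, p in joint.items()
               if all(o[i] == v for i, v in fixed.items()))

def cond(target, given):
    """P(target | given), both partial assignments as dicts."""
    return marginal({**target, **given}) / marginal(given)

# Chain rule in the order X1, X2, X3, X4 recovers the joint exactly.
for x1, x2, x3, x4 in outcomes:
    lhs = joint[(x1, x2, x3, x4)]
    rhs = (marginal({0: x1})
           * cond({1: x2}, {0: x1})
           * cond({2: x3}, {0: x1, 1: x2})
           * cond({3: x4}, {0: x1, 1: x2, 2: x3}))
    assert abs(lhs - rhs) < 1e-12
```

The same check passes for the reverse ordering (the option starting from P(X4)), since the chain rule holds for every ordering of the variables.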
T/F
In a bigram model, one assumes that words w_i and w_{i-2} are independent for i > 2.
True
A bigram model conditions each word only on w_{i-1}, so w_i is conditionally independent of w_{i-2} given w_{i-1}.
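A minimal sketch of maximum-likelihood bigram estimation; the toy corpus and the `<s>`/`</s>` boundary markers are illustrative assumptions:

```python
from collections import Counter

# Toy corpus (assumed): two sentences with boundary markers.
corpus = [["<s>", "the", "cat", "sat", "</s>"],
          ["<s>", "the", "dog", "sat", "</s>"]]

bigrams = Counter()
unigrams = Counter()
for sent in corpus:
    for prev, word in zip(sent, sent[1:]):
        bigrams[(prev, word)] += 1
        unigrams[prev] += 1

def p_bigram(word, prev):
    """P(word | prev): the model conditions only on the single previous word."""
    return bigrams[(prev, word)] / unigrams[prev]

print(p_bigram("cat", "the"))  # 0.5 -- "the" is followed by "cat" in 1 of 2 cases
```

Nothing before w_{i-1} enters the estimate, which is exactly the independence assumption the card describes.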
When we estimate a trigram probability, which quantities are normalized to sum to 1?
the probabilities of all words w given the context w_{i-2}, w_{i-1}
A trigram model conditions on the previous two words.
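The normalization can be verified on count-based estimates. A sketch (corpus and boundary markers are illustrative assumptions):

```python
from collections import Counter

# Toy corpus (assumed), with doubled start markers for trigram contexts.
corpus = [["<s>", "<s>", "the", "cat", "sat", "</s>"],
          ["<s>", "<s>", "the", "dog", "sat", "</s>"],
          ["<s>", "<s>", "the", "cat", "ran", "</s>"]]

trigram = Counter()
context = Counter()
for sent in corpus:
    for w1, w2, w3 in zip(sent, sent[1:], sent[2:]):
        trigram[(w1, w2, w3)] += 1
        context[(w1, w2)] += 1

def p_trigram(w, ctx):
    """P(w | w_{i-2}, w_{i-1}) by maximum likelihood."""
    return trigram[(ctx[0], ctx[1], w)] / context[ctx]

# For every observed two-word context, P(w | context) sums to 1 over words w.
for ctx in context:
    total = sum(p_trigram(w, ctx) for (a, b, w) in trigram if (a, b) == ctx)
    assert abs(total - 1.0) < 1e-12
```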
How can we estimate the probability of a sentence P(w_1, w_2, ..., w_N)?
By applying the chain rule.
The chain rule follows from repeatedly applying the definition of conditional probability.
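A sketch of the whole pipeline: the chain rule with a bigram (Markov) approximation turns the sentence probability into a product of estimated conditionals. The toy corpus is an illustrative assumption; log-space is used to avoid numerical underflow on long sentences.

```python
from collections import Counter
import math

# Toy corpus (assumed).
corpus = [["<s>", "the", "cat", "sat", "</s>"],
          ["<s>", "the", "cat", "ran", "</s>"]]

bigrams, unigrams = Counter(), Counter()
for sent in corpus:
    for prev, word in zip(sent, sent[1:]):
        bigrams[(prev, word)] += 1
        unigrams[prev] += 1

def sentence_logprob(words):
    # Chain rule with the bigram approximation:
    # log P(w_1..w_N) ≈ sum_i log P(w_i | w_{i-1}).
    return sum(math.log(bigrams[(p, w)] / unigrams[p])
               for p, w in zip(words, words[1:]))

print(math.exp(sentence_logprob(["<s>", "the", "cat", "sat", "</s>"])))  # 0.5
```

Here P(the|&lt;s&gt;) = 1, P(cat|the) = 1, P(sat|cat) = 1/2, and P(&lt;/s&gt;|sat) = 1, so the product is 0.5.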
What is the effect of the Markov assumption in n-gram language models?
It makes it possible to estimate the probabilities from data.
With the Markov assumption, each conditional probability depends on only the previous n - 1 words, so an n-gram model can learn it from counts in a corpus.
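A back-of-the-envelope sketch of why the assumption matters (the vocabulary size and sentence length are illustrative assumptions): without it, the number of distinct histories to estimate grows exponentially with sentence length.

```python
# With vocabulary size V, the full chain rule needs a conditional distribution
# for every possible history, while an n-gram model needs one per
# (n-1)-word context. V and sentence_len below are illustrative.
V = 10_000          # assumed vocabulary size
sentence_len = 20   # history length for the last word of a 20-word sentence

full_histories = V ** (sentence_len - 1)  # contexts the full chain rule must cover
bigram_contexts = V ** 1                  # contexts a bigram model must cover
trigram_contexts = V ** 2                 # contexts a trigram model must cover

print(bigram_contexts, trigram_contexts)  # 10000 100000000
print(full_histories > 10 ** 70)          # True: far too many to estimate from data
```

No realistic corpus contains even one example of most 19-word histories, which is why truncating the context makes estimation possible at all.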