Module 10 - Probabilities and Language Models Flashcards
Which of the following must be true of the probabilities of two events, A and B, if A is independent of B?
𝑃(𝐴∣𝐵)=𝑃(𝐴)
𝑃(𝐵∣𝐴)=𝑃(𝐵)
𝑃(𝐴∨𝐵)=𝑃(𝐴)+𝑃(𝐵)−𝑃(𝐴∧𝐵)
𝑃(𝐴∧𝐵)=𝑃(𝐴)𝑃(𝐵)
𝑃(𝐴∣𝐵)=𝑃(𝐴)
𝑃(𝐵∣𝐴)=𝑃(𝐵)
𝑃(𝐴∨𝐵)=𝑃(𝐴)+𝑃(𝐵)−𝑃(𝐴∧𝐵)
𝑃(𝐴∧𝐵)=𝑃(𝐴)𝑃(𝐵)
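A quick numerical sketch of why all four identities hold for independent events (the probability values 0.3 and 0.5 are illustrative assumptions, not from the cards):

```python
# Sketch: for independent events A and B, check all four identities.
# p_a and p_b are arbitrary illustrative values.
p_a = 0.3
p_b = 0.5
p_a_and_b = p_a * p_b             # independence: P(A ∧ B) = P(A)P(B)
p_a_or_b = p_a + p_b - p_a_and_b  # inclusion-exclusion (holds for any events)

p_a_given_b = p_a_and_b / p_b     # definition of conditional probability
p_b_given_a = p_a_and_b / p_a

assert abs(p_a_given_b - p_a) < 1e-12  # P(A|B) = P(A)
assert abs(p_b_given_a - p_b) < 1e-12  # P(B|A) = P(B)
print(p_a_or_b)  # 0.65
```

Note that the inclusion-exclusion identity holds for any two events, independent or not; the other three are what independence adds.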
Which of the following are equivalent to the joint probability 𝑃(𝑋1,𝑋2,𝑋3,𝑋4)?
Check all that apply.
𝑃(𝑋1)𝑃(𝑋2∣𝑋1)𝑃(𝑋3∣𝑋2)𝑃(𝑋4∣𝑋3)
𝑃(𝑋1∣𝑋2,𝑋3,𝑋4)𝑃(𝑋4)𝑃(𝑋3∣𝑋4)𝑃(𝑋2∣𝑋3,𝑋4)
𝑃(𝑋2,𝑋1)𝑃(𝑋3∣𝑋2,𝑋1)𝑃(𝑋4∣𝑋3,𝑋2,𝑋1)
𝑃(𝑋1)𝑃(𝑋2∣𝑋1)𝑃(𝑋3∣𝑋2,𝑋1)𝑃(𝑋4∣𝑋3,𝑋2,𝑋1)
𝑃(𝑋1∣𝑋2,𝑋3,𝑋4)𝑃(𝑋4)𝑃(𝑋3∣𝑋4)𝑃(𝑋2∣𝑋3,𝑋4)
𝑃(𝑋2,𝑋1)𝑃(𝑋3∣𝑋2,𝑋1)𝑃(𝑋4∣𝑋3,𝑋2,𝑋1)
𝑃(𝑋1)𝑃(𝑋2∣𝑋1)𝑃(𝑋3∣𝑋2,𝑋1)𝑃(𝑋4∣𝑋3,𝑋2,𝑋1)
Chain rule, applied in different variable orderings. The first option drops conditioning variables (a Markov-style approximation), so it is not equivalent to the joint.
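The chain-rule equivalence can be checked numerically. A sketch, using a randomly generated joint distribution over four binary variables (the distribution itself is an illustrative assumption):

```python
import itertools
import random

random.seed(0)
# An arbitrary joint distribution over four binary variables, normalized to sum to 1.
outcomes = list(itertools.product([0, 1], repeat=4))
weights = [random.random() for _ in outcomes]
total = sum(weights)
joint = {o: w / total for o, w in zip(outcomes, weights)}

def marginal(fixed):
    """Probability of the partial assignment `fixed` (dict: index -> value)."""
    return sum(p for o, p in joint.items()
               if all(o[i] == v for i, v in fixed.items()))

def cond(target, given):
    """P(target | given), both partial assignments as dicts."""
    return marginal({**target, **given}) / marginal(given)

# Chain rule in the order X1, X2, X3, X4 recovers the joint exactly.
for x1, x2, x3, x4 in outcomes:
    lhs = joint[(x1, x2, x3, x4)]
    rhs = (marginal({0: x1})
           * cond({1: x2}, {0: x1})
           * cond({2: x3}, {0: x1, 1: x2})
           * cond({3: x4}, {0: x1, 1: x2, 2: x3}))
    assert abs(lhs - rhs) < 1e-12
```

The same check passes for the reverse ordering (the option starting from P(X4)), since the chain rule holds for every ordering of the variables.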
T/F
In a bigram model, one assumes that words w_i and w_{i-2} are independent for i > 2.
True
A bigram model conditions each word only on w_{i-1}, so w_i is conditionally independent of w_{i-2} given w_{i-1}.
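A minimal sketch of maximum-likelihood bigram estimation; the toy corpus and the `<s>`/`</s>` boundary markers are illustrative assumptions:

```python
from collections import Counter

# Toy corpus (assumed): two sentences with boundary markers.
corpus = [["<s>", "the", "cat", "sat", "</s>"],
          ["<s>", "the", "dog", "sat", "</s>"]]

bigrams = Counter()
unigrams = Counter()
for sent in corpus:
    for prev, word in zip(sent, sent[1:]):
        bigrams[(prev, word)] += 1
        unigrams[prev] += 1

def p_bigram(word, prev):
    """P(word | prev): the model conditions only on the single previous word."""
    return bigrams[(prev, word)] / unigrams[prev]

print(p_bigram("cat", "the"))  # 0.5 -- "the" is followed by "cat" in 1 of 2 cases
```

Nothing before w_{i-1} enters the estimate, which is exactly the independence assumption the card describes.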
When we estimate a trigram probability, which quantities are normalized to sum to 1?
the probabilities of all words w given the context w_{i-2}, w_{i-1}
A trigram model conditions on the previous two words.
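The normalization can be verified on count-based estimates. A sketch (corpus and boundary markers are illustrative assumptions):

```python
from collections import Counter

# Toy corpus (assumed), with doubled start markers for trigram contexts.
corpus = [["<s>", "<s>", "the", "cat", "sat", "</s>"],
          ["<s>", "<s>", "the", "dog", "sat", "</s>"],
          ["<s>", "<s>", "the", "cat", "ran", "</s>"]]

trigram = Counter()
context = Counter()
for sent in corpus:
    for w1, w2, w3 in zip(sent, sent[1:], sent[2:]):
        trigram[(w1, w2, w3)] += 1
        context[(w1, w2)] += 1

def p_trigram(w, ctx):
    """P(w | w_{i-2}, w_{i-1}) by maximum likelihood."""
    return trigram[(ctx[0], ctx[1], w)] / context[ctx]

# For every observed two-word context, P(w | context) sums to 1 over words w.
for ctx in context:
    total = sum(p_trigram(w, ctx) for (a, b, w) in trigram if (a, b) == ctx)
    assert abs(total - 1.0) < 1e-12
```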
How can we estimate the probability of a sentence P(w_1, w_2, ..., w_N)?
By applying the chain rule.
The chain rule follows from repeatedly applying the definition of conditional probability.
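A sketch of the whole pipeline: the chain rule with a bigram (Markov) approximation turns the sentence probability into a product of estimated conditionals. The toy corpus is an illustrative assumption; log-space is used to avoid numerical underflow on long sentences.

```python
from collections import Counter
import math

# Toy corpus (assumed).
corpus = [["<s>", "the", "cat", "sat", "</s>"],
          ["<s>", "the", "cat", "ran", "</s>"]]

bigrams, unigrams = Counter(), Counter()
for sent in corpus:
    for prev, word in zip(sent, sent[1:]):
        bigrams[(prev, word)] += 1
        unigrams[prev] += 1

def sentence_logprob(words):
    # Chain rule with the bigram approximation:
    # log P(w_1..w_N) ≈ sum_i log P(w_i | w_{i-1}).
    return sum(math.log(bigrams[(p, w)] / unigrams[p])
               for p, w in zip(words, words[1:]))

print(math.exp(sentence_logprob(["<s>", "the", "cat", "sat", "</s>"])))  # 0.5
```

Here P(the|&lt;s&gt;) = 1, P(cat|the) = 1, P(sat|cat) = 1/2, and P(&lt;/s&gt;|sat) = 1, so the product is 0.5.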
What is the effect of the Markov assumption in n-gram language models?
It makes it possible to estimate the probabilities from data.
With the Markov assumption, each conditional probability depends on only the previous n - 1 words, so an n-gram model can learn it from counts in a corpus.
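A back-of-the-envelope sketch of why the assumption matters (the vocabulary size and sentence length are illustrative assumptions): without it, the number of distinct histories to estimate grows exponentially with sentence length.

```python
# With vocabulary size V, the full chain rule needs a conditional distribution
# for every possible history, while an n-gram model needs one per
# (n-1)-word context. V and sentence_len below are illustrative.
V = 10_000          # assumed vocabulary size
sentence_len = 20   # history length for the last word of a 20-word sentence

full_histories = V ** (sentence_len - 1)  # contexts the full chain rule must cover
bigram_contexts = V ** 1                  # contexts a bigram model must cover
trigram_contexts = V ** 2                 # contexts a trigram model must cover

print(bigram_contexts, trigram_contexts)  # 10000 100000000
print(full_histories > 10 ** 70)          # True: far too many to estimate from data
```

No realistic corpus contains even one example of most 19-word histories, which is why truncating the context makes estimation possible at all.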