Module 10 - Probabilities and Language Models Flashcards

1
Q

Which of the following must be true of the probabilities of two events, A and B, if A is independent of B?

P(A|B) = P(A)
P(B|A) = P(B)
P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
P(A ∧ B) = P(A)P(B)

A

P(A|B) = P(A)
P(B|A) = P(B)
P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
P(A ∧ B) = P(A)P(B)

All four hold: the first, second, and fourth are equivalent statements of independence, while the third (inclusion-exclusion) is true for any two events.
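
A quick numeric check of these identities in Python, using made-up values P(A) = 0.3 and P(B) = 0.6 for a pair of independent events:

# Made-up independent events: P(A) = 0.3, P(B) = 0.6.
p_a, p_b = 0.3, 0.6
p_a_and_b = p_a * p_b             # independence: P(A ∧ B) = P(A)P(B)
p_a_given_b = p_a_and_b / p_b     # definition of conditional probability
p_b_given_a = p_a_and_b / p_a
p_a_or_b = p_a + p_b - p_a_and_b  # inclusion-exclusion, true for ANY events

assert abs(p_a_given_b - p_a) < 1e-12  # P(A|B) = P(A)
assert abs(p_b_given_a - p_b) < 1e-12  # P(B|A) = P(B)
print(p_a_or_b)  # 0.72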

2
Q

Which of the following are equivalent to the joint probability P(X1, X2, X3, X4)?

Check all that apply.

P(X1) P(X2|X1) P(X3|X2) P(X4|X3)
P(X1|X2, X3, X4) P(X4) P(X3|X4) P(X2|X3, X4)
P(X2, X1) P(X3|X2, X1) P(X4|X3, X2, X1)
P(X1) P(X2|X1) P(X3|X2, X1) P(X4|X3, X2, X1)

A

P(X1|X2, X3, X4) P(X4) P(X3|X4) P(X2|X3, X4)
P(X2, X1) P(X3|X2, X1) P(X4|X3, X2, X1)
P(X1) P(X2|X1) P(X3|X2, X1) P(X4|X3, X2, X1)

Chain rule: each of these factorizations conditions every variable on all the variables already introduced. The first option is not equivalent because P(X3|X2) and P(X4|X3) drop conditioning variables (a Markov-style approximation).
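
A minimal Python sketch that checks the three valid factorizations against the joint, using a made-up distribution over four binary variables (all names are illustrative):

import itertools, random

random.seed(0)
# Made-up joint distribution over four binary variables X1..X4
# (indices 0..3 in each state tuple).
states = list(itertools.product([0, 1], repeat=4))
weights = [random.random() for _ in states]
total = sum(weights)
joint = {s: w / total for s, w in zip(states, weights)}

def marginal(fixed):
    # P of the variables in `fixed` ({index: value}), summing out the rest.
    return sum(p for s, p in joint.items()
               if all(s[i] == v for i, v in fixed.items()))

def cond(target, given):
    # P(target | given), both {index: value} dicts.
    return marginal({**target, **given}) / marginal(given)

for x1, x2, x3, x4 in states:
    p = joint[(x1, x2, x3, x4)]
    # P(X1|X2,X3,X4) P(X4) P(X3|X4) P(X2|X3,X4)
    f2 = (cond({0: x1}, {1: x2, 2: x3, 3: x4}) * marginal({3: x4})
          * cond({2: x3}, {3: x4}) * cond({1: x2}, {2: x3, 3: x4}))
    # P(X2,X1) P(X3|X2,X1) P(X4|X3,X2,X1)
    f3 = (marginal({0: x1, 1: x2}) * cond({2: x3}, {0: x1, 1: x2})
          * cond({3: x4}, {0: x1, 1: x2, 2: x3}))
    # P(X1) P(X2|X1) P(X3|X2,X1) P(X4|X3,X2,X1)
    f4 = (marginal({0: x1}) * cond({1: x2}, {0: x1})
          * cond({2: x3}, {0: x1, 1: x2})
          * cond({3: x4}, {0: x1, 1: x2, 2: x3}))
    assert abs(f2 - p) < 1e-9 and abs(f3 - p) < 1e-9 and abs(f4 - p) < 1e-9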

3
Q

T/F
In a bigram model, one assumes that the words w_i and w_{i-2} are independent for i > 2.

A

True

A bigram model conditions w_i only on the immediately preceding word w_{i-1}, so any dependence on w_{i-2} is dropped.
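
A minimal Python sketch of a bigram model estimated by counting, on a made-up toy corpus (the corpus and all names are illustrative):

from collections import Counter

# Made-up toy corpus; <s> and </s> mark sentence boundaries.
corpus = [
    ["<s>", "the", "cat", "sat", "</s>"],
    ["<s>", "the", "dog", "sat", "</s>"],
    ["<s>", "the", "cat", "ran", "</s>"],
]

bigram_counts = Counter()
context_counts = Counter()
for sent in corpus:
    for prev, cur in zip(sent, sent[1:]):
        bigram_counts[(prev, cur)] += 1
        context_counts[prev] += 1

def p_bigram(cur, prev):
    # Maximum-likelihood estimate of P(w_i = cur | w_{i-1} = prev).
    return bigram_counts[(prev, cur)] / context_counts[prev]

# The estimate depends only on w_{i-1}; whatever came before "cat"
# (here always "the") plays no role in the model.
print(p_bigram("sat", "cat"))  # 0.5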

4
Q

When we compute trigram probabilities, we normalize the following so that they add up to 1:

A

The probabilities of all words w given the context (w_{i-2}, w_{i-1}); that is, Σ_w P(w | w_{i-2}, w_{i-1}) = 1.

A trigram model conditions on the previous two words.
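
The same counting scheme sketched for trigrams in Python, checking the normalization on a made-up corpus (all names are illustrative):

from collections import Counter

# Made-up toy corpus, padded with two <s> tokens for trigram contexts.
corpus = [
    ["<s>", "<s>", "the", "cat", "sat", "</s>"],
    ["<s>", "<s>", "the", "dog", "sat", "</s>"],
    ["<s>", "<s>", "the", "cat", "ran", "</s>"],
]

trigram_counts = Counter()
context_counts = Counter()
for sent in corpus:
    for w1, w2, w3 in zip(sent, sent[1:], sent[2:]):
        trigram_counts[(w1, w2, w3)] += 1
        context_counts[(w1, w2)] += 1

# For each context (w_{i-2}, w_{i-1}), the estimates P(w | context)
# sum to 1 because they are counts divided by the context total.
for ctx, c in context_counts.items():
    total = sum(n / c for (w1, w2, _), n in trigram_counts.items()
                if (w1, w2) == ctx)
    assert abs(total - 1.0) < 1e-12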

5
Q

How can we estimate the probability of a sentence, P(w_1, w_2, …, w_N)?

A

By applying the chain rule:

P(w_1, w_2, …, w_N) = P(w_1) P(w_2 | w_1) P(w_3 | w_1, w_2) ⋯ P(w_N | w_1, …, w_{N-1})

The chain rule follows directly from the definition of conditional probability.
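
A minimal Python sketch of this, combining the chain rule with a bigram (Markov) approximation; the probabilities are made-up values for illustration only:

import math

def sentence_logprob(sentence, cond_prob):
    # Chain rule with the bigram approximation:
    # log P(w_1..w_N) ≈ sum_i log P(w_i | w_{i-1}).
    tokens = ["<s>"] + sentence + ["</s>"]
    return sum(math.log(cond_prob(cur, prev))
               for prev, cur in zip(tokens, tokens[1:]))

# Made-up bigram probabilities, for illustration only:
probs = {("<s>", "the"): 1.0, ("the", "cat"): 0.5,
         ("cat", "sat"): 0.5, ("sat", "</s>"): 1.0}
print(sentence_logprob(["the", "cat", "sat"],
                       lambda cur, prev: probs[(prev, cur)]))  # log(0.25)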

6
Q

What is the effect of the Markov assumption in n-gram language models?

A

It makes it possible to estimate the probabilities from data.

Under the Markov assumption, each conditional probability depends only on a short context, so an n-gram model can estimate it from counts in a corpus; full sentence histories almost never repeat, so their probabilities cannot be counted reliably.
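
A small Python illustration of the point, on a made-up corpus: full histories are almost all unique, while one-word contexts recur often enough to count:

from collections import Counter

# Made-up running text.
corpus = ("the cat sat on the mat and the dog sat on the rug "
          "and the cat ran to the mat").split()

# Context of the word at position i: full history vs. previous word only.
full_histories = Counter(tuple(corpus[:i]) for i in range(1, len(corpus)))
prev_contexts = Counter(corpus[i - 1] for i in range(1, len(corpus)))

# Full histories never repeat, so P(w_i | w_1..w_{i-1}) cannot be
# estimated by counting; one-word contexts recur and can be.
print(max(full_histories.values()))  # 1: every full history is unique
print(prev_contexts.most_common(1))  # [('the', 6)]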
