Probability and Stats Flashcards
Explain P(A)P(B) in plain English
The product of the probabilities of A and B. If A and B are independent, this equals the probability that both A and B occur.
Explain P(A and B) in plain English
The probability of events A and B occurring together.
And rule for independence:
P(A and B) = ?
P(A and B) = P(A) * P(B)
assumes A, B are independent events
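A minimal Python sketch (dice example assumed, not from the cards) that checks the independence "and" rule by enumerating two fair dice:

```python
# Check P(A and B) = P(A) * P(B) for two independent events on two fair dice.
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # all 36 equally likely (die1, die2) pairs

A = {o for o in outcomes if o[0] % 2 == 0}        # event A: first die is even
B = {o for o in outcomes if o[1] > 4}             # event B: second die shows 5 or 6

p_A = len(A) / len(outcomes)                      # 1/2
p_B = len(B) / len(outcomes)                      # 1/3
p_A_and_B = len(A & B) / len(outcomes)            # 1/6

assert abs(p_A_and_B - p_A * p_B) < 1e-12         # the "and" rule holds
```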
And rule for dependence
P(A and B) = ?
P(A and B) = P(A) * P(B | A)
P(A and B) = P(B) * P(A | B)
holds for any events A, B, dependent or not (the conditioning event must have nonzero probability)
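A hedged worked example (card-drawing scenario assumed, not from the cards) of the "and" rule when the second event depends on the first:

```python
# P(both cards are aces) when drawing two cards without replacement:
# P(A and B) = P(A) * P(B | A), and B clearly depends on A here.
p_first_ace = 4 / 52                 # P(A): first card is an ace
p_second_ace_given_first = 3 / 51    # P(B | A): one ace already removed from the deck
p_both_aces = p_first_ace * p_second_ace_given_first
print(p_both_aces)                   # 1/221, about 0.0045
```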
Are independent events mutually exclusive?
No! Common fallacy. Independent events are NOT mutually exclusive!
Or rule (generalized to mutually exclusive and non-mutually exclusive events): P(A or B) = ?
P(A or B) = P(A) + P(B) - P(A and B)
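A quick enumeration sketch (single-die events assumed, not from the cards) confirming the general "or" rule:

```python
# Verify P(A or B) = P(A) + P(B) - P(A and B) on one fair six-sided die.
outcomes = set(range(1, 7))
A = {2, 4, 6}                        # even roll
B = {4, 5, 6}                        # roll greater than 3

def p(event):
    return len(event) / len(outcomes)

assert abs(p(A | B) - (p(A) + p(B) - p(A & B))) < 1e-12   # both sides equal 2/3
```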
Or rule for mutually exclusive events:
P(A or B) = ?
P(A or B) = P(A) + P(B)
What’s the definition of conditional probability?
P(A | B) = P(A and B) / P(B)
probability of A given that B occurred
(only valid when P(B) > 0)
Conditioning on B restricts the outcome space to B: you know B happened, so you renormalize within that space, which is why P(B) is the denominator.
P(A | B) = 0.2
What is P(~A | B)?
P(~A | B) = 0.8
Realize that the complement is taken over outcomes of A, not B (B stays fixed), so P(A | B) + P(~A | B) = 1.
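A minimal sketch (hypothetical joint probabilities chosen to match the 0.2 / 0.8 numbers above) computing both conditional probabilities from a joint table:

```python
# P(A | B) = P(A and B) / P(B); the complement is over A with B held fixed.
joint = {                      # hypothetical joint distribution over (A, B) truth values
    (True, True): 0.10,
    (True, False): 0.30,
    (False, True): 0.40,
    (False, False): 0.20,
}

p_B = sum(p for (a, b), p in joint.items() if b)       # P(B) = 0.5
p_A_given_B = joint[(True, True)] / p_B                # 0.2
p_notA_given_B = joint[(False, True)] / p_B            # 0.8
assert abs(p_A_given_B + p_notA_given_B - 1.0) < 1e-12
```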
Conditionalized version of Bayes theorem in the context of general background evidence E:
P(X | Y, E) = ?
P(X | Y, E) = P(X | E) * P(Y | X, E) / P(Y | E)
Conditionalized version of marginalization in the context of general background evidence E:
P(X | E) = ?
P(X | E) = sum over y of P(X, Y = y | E)
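A minimal sketch (made-up values) of marginalizing Y out of a table of P(X, Y | E):

```python
# P(X | E) = sum over y of P(X, Y = y | E)
p_XY_given_E = {                       # keys are (x, y); values are P(X=x, Y=y | E)
    ("rain", "cold"): 0.3,
    ("rain", "warm"): 0.1,
    ("dry",  "cold"): 0.2,
    ("dry",  "warm"): 0.4,
}

p_X_given_E = {}
for (x, y), p in p_XY_given_E.items():
    p_X_given_E[x] = p_X_given_E.get(x, 0.0) + p
print(p_X_given_E)                     # {'rain': 0.4, 'dry': 0.6}
```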
What does the following statement mean in plain english?
P(X | Y, E) = P(X|E)
X is conditionally independent of Y given E
What does the following statement mean in plain english?
P(Y | X, E) = P(Y | E)
X is conditionally independent of Y given E
What does the following statement mean in plain english?
P(X, Y | E) = P(X | E) P(Y | E)
X is conditionally independent of Y given E
Marginal independence: produce 2 other equivalent statements (all three imply each other):
P(X|Y) = P(X)
…
…
P(X|Y) = P(X)
P(Y|X) = P(Y)
P(X, Y) = P(X) P(Y)
All imply each other.
Conditional independence: produce 2 other equivalent statements (all three imply each other):
P(X|Y, E) = P(X | E)
…
…
P(X|Y, E) = P(X | E)
P(Y | X, E) = P(Y | E)
P(X, Y | E) = P(X | E) P(Y | E)
what’s a synonym for disjoint?
mutually exclusive
what’s a synonym for mutually exclusive?
disjoint
what does it mean when A is disjoint from B?
A and B cannot happen at the same time
How do you determine if events A and B are mutually exclusive?
P(A and B) = 0
Can you have events A, B that are independent and disjoint from each other?
If at least one of the events has zero probability, then the two events can be mutually exclusive and independent simultaneously. Let A be the empty set, for example, and let B be any event. Then they are mutually exclusive (because their intersection is empty) and they are independent (because the probability of their intersection is equal to the product of their individual probabilities, which is zero).
However, if both events have non-zero probability, then they cannot be mutually exclusive and independent simultaneously. “Mutually exclusive” implies that the intersection of the two events has zero probability, but the events themselves have non-zero probabilities, so the product of their probabilities cannot be zero.
Can you have events A, B that are independent and NOT disjoint from each other?
Yes. For example, flip two fair coins independently: A = first flip is heads, B = second flip is heads. A and B are independent, and P(A and B) = 1/4 > 0, so they are not disjoint.
Apply conditionalized bayes rule (flip C and D):
P(C|A,B,D) = …
Background evidence = A, B
P(C | A,B,D) = P(D | C,A,B) * P(C | A, B) / P(D | A, B)
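A minimal numeric sketch (made-up conditional probabilities) of plugging values into the flipped rule:

```python
# P(C | A, B, D) = P(D | C, A, B) * P(C | A, B) / P(D | A, B)
p_C_given_AB = 0.25          # prior for C under the background evidence A, B
p_D_given_CAB = 0.80         # likelihood of D given C (and A, B)
p_D_given_AB = 0.40          # normalizer
p_C_given_ABD = p_D_given_CAB * p_C_given_AB / p_D_given_AB
print(p_C_given_ABD)         # 0.5
```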
What is a prior probability?
a probability w/o conditions
e.g. P(A)
What is a posterior probability?
a probability w/ conditions
e.g. P(A | B)
(B) --> (A)
Which node is parent and which is child?
B causes A. A is the child; B is the parent.
(parent) --> (child)
A parent ‘causes’ the child
Does independence imply conditional independence? Are they the same concept?
No. Neither implies the other; they are distinct concepts.
https://math.stackexchange.com/questions/22407/independence-and-conditional-independence-between-random-variables
Define Unigram model Pu(S)
where S is a sequence of letters
Pu(S) = product over l = 1..L of P1(s_l), where s_l is the character at position l and L is the length of S
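A minimal sketch, assuming a hypothetical unigram table P1 is already estimated and stored as a dict; working in log space avoids underflow on long sequences:

```python
import math

def unigram_log_prob(s, p1):
    """log Pu(S) = sum over positions l of log P1(s_l)."""
    return sum(math.log(p1[c]) for c in s)

p1 = {"a": 0.6, "b": 0.4}                       # hypothetical unigram probabilities
print(math.exp(unigram_log_prob("aab", p1)))    # 0.6 * 0.6 * 0.4 = 0.144
```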
Define bigram model Pb(S)
where S is a sequence of letters
Pb(S) = P1(s_1) * (product over l = 2..L of P(s_l | s_{l-1})), where the first character gets its unigram probability
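A companion sketch, assuming hypothetical tables p1 (unigram) and p2 (bigram, keyed as (prev, cur)):

```python
import math

def bigram_log_prob(s, p1, p2):
    """log Pb(S) = log P1(s_1) + sum over l = 2..L of log P(s_l | s_{l-1})."""
    logp = math.log(p1[s[0]])
    for prev, cur in zip(s, s[1:]):
        logp += math.log(p2[(prev, cur)])
    return logp

p1 = {"a": 0.6, "b": 0.4}                                  # hypothetical tables
p2 = {("a", "a"): 0.7, ("a", "b"): 0.3,
      ("b", "a"): 0.5, ("b", "b"): 0.5}
print(math.exp(bigram_log_prob("aab", p1, p2)))            # 0.6 * 0.7 * 0.3 = 0.126
```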
How do you calculate the bigram probability P(s_l | s_{l-1}), aka P(letter | prev_letter)?
= count(prev_letter followed by letter) / count(prev_letter followed by any letter)
S = “aabbaa”
Calculate bigram probability P(letter = a | prev letter = a)
Since S = “aabbaa”,
P(letter = a | prev letter = a) = count(a followed by a) / count(a followed by any letter) = 2/3
Notice the denominator is 3, not 4, even though there are 4 occurrences of 'a'. This is because the 'a' at the last index of the sequence S is not followed by anything, so we don't count it.
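A minimal sketch of the counting rule that reproduces the 2/3 answer for S = "aabbaa":

```python
from collections import Counter

def bigram_mle(s):
    """P(cur | prev) = count(prev followed by cur) / count(prev followed by any letter)."""
    pair_counts = Counter(zip(s, s[1:]))
    prev_counts = Counter(s[:-1])          # the last character is never a "prev"
    return {(prev, cur): c / prev_counts[prev] for (prev, cur), c in pair_counts.items()}

p2 = bigram_mle("aabbaa")
print(p2[("a", "a")])                      # 2/3
```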
In the HW4 bigram programming example, Saul processed the dataset so that it had an end-of-sentence terminator, which simplified our work (we didn't have to account for this edge case).