Exam Prep Flashcards
Explain what capacity implies in practice for an error correction code
The capacity is the highest rate at which a code can transmit information while the probability of decoding error can still be made arbitrarily small (as the block length grows)
Capacity
C = max_p(x) I(X;Y), the maximum taken over all input pmfs p(x)
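Example (not part of the card; the channel and crossover probability eps = 0.1 are my own assumptions): a Python sketch that brute-forces max I(X;Y) over input pmfs for a binary symmetric channel; the result should land near 1 - H(0.1) ≈ 0.531 bits.

import numpy as np

def mutual_information(px, channel):
    # px: input pmf, channel[i, j] = p(y=j | x=i)
    pxy = px[:, None] * channel          # joint p(x, y)
    py = pxy.sum(axis=0)                 # marginal p(y)
    mask = pxy > 0
    return np.sum(pxy[mask] * np.log2(pxy[mask] / (px[:, None] * py[None, :])[mask]))

eps = 0.1                                # assumed crossover probability
bsc = np.array([[1 - eps, eps], [eps, 1 - eps]])
# brute-force search over input pmfs p(x) = (q, 1 - q)
qs = np.linspace(0.001, 0.999, 999)
rates = [mutual_information(np.array([q, 1 - q]), bsc) for q in qs]
print("capacity ~", max(rates))          # about 1 - H(0.1) ~ 0.531 bits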
Conditional Entropy Inequality
“Information can’t hurt”
H(X | Y) <= H(X), with equality iff X and Y are independent
Chain Rule of Entropy
H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)
Conditional Entropy
H(Y|X) = SUM_x p(x) H(Y|X=x)
= - SUM_x SUM_y p(x) p(y|x) log p(y|x)
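Example (illustrative joint pmf of my own choosing): computing H(Y|X) numerically from a 2x2 table.

import numpy as np

# assumed joint pmf p(x, y): rows index x, columns index y
pxy = np.array([[0.25, 0.25],
                [0.40, 0.10]])
px = pxy.sum(axis=1)                     # marginal p(x)
py_given_x = pxy / px[:, None]           # conditional p(y | x)
H_Y_given_X = -np.sum(pxy * np.log2(py_given_x))
print(H_Y_given_X)                       # conditional entropy in bits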
Chain Rule of Conditional Entropy
H(X, Y|Z) = H(Y|Z) + H(X|Y, Z) = H(X|Z) + H(Y|X, Z)
Joint Entropy
H(X, Y) = - SUM_xy p(x,y) log p(x,y)
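Example (same assumed 2x2 joint pmf): compute H(X,Y) and check the chain rule H(X,Y) = H(X) + H(Y|X) from the earlier card.

import numpy as np

pxy = np.array([[0.25, 0.25],            # assumed joint pmf p(x, y)
                [0.40, 0.10]])
H_XY = -np.sum(pxy * np.log2(pxy))       # joint entropy
px = pxy.sum(axis=1)
H_X = -np.sum(px * np.log2(px))
H_Y_given_X = -np.sum(pxy * np.log2(pxy / px[:, None]))
print(H_XY, H_X + H_Y_given_X)           # the two values agree (chain rule)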
Relative Entropy (Kullback-Leibler Distance)
D(p||r) = SUM_x p(x) log p(x)/r(x)
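Example (the pmfs p and r below are assumptions of mine): computing D(p||r) in bits.

import numpy as np

p = np.array([0.5, 0.25, 0.25])          # assumed pmf p
r = np.array([1/3, 1/3, 1/3])            # assumed pmf r
D = np.sum(p * np.log2(p / r))
print(D)                                 # D(p||r) >= 0, with equality iff p == r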
Bayes Theorem
P(B|A) = P(A|B) × P(B)/ P(A)
Conditional Probability
P(B|A) = P(A ∩ B) / P(A)
Total Probability
if {A1, A2, …, Ak} is a partition of Ω,
then for any B ⊂ Ω
P(B) = SUM(i=1 -> k) P(B|A_i)P(A_i)
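Example tying the last three cards together (all numbers invented): total probability gives P(B), then Bayes' theorem gives the posterior P(A1|B).

# assumed prior probabilities of a partition {A1, A2}
P_A = [0.3, 0.7]
# assumed conditional probabilities P(B | Ai)
P_B_given_A = [0.9, 0.2]

# total probability: P(B) = sum_i P(B | Ai) P(Ai)
P_B = sum(pb * pa for pb, pa in zip(P_B_given_A, P_A))

# Bayes: P(A1 | B) = P(B | A1) P(A1) / P(B)
P_A1_given_B = P_B_given_A[0] * P_A[0] / P_B
print(P_B, P_A1_given_B)                 # 0.41 and ~0.659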
Mutual Information
I(X;Y) = H(Y) − H(Y|X) = H(X) − H(X|Y)
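Example (assumed joint pmf again): I(X;Y) computed both ways; its non-negativity is the "information can't hurt" inequality above.

import numpy as np

pxy = np.array([[0.25, 0.25],            # assumed joint pmf p(x, y)
                [0.40, 0.10]])
px, py = pxy.sum(axis=1), pxy.sum(axis=0)

def H(p):
    return -np.sum(p * np.log2(p))

H_Y_given_X = -np.sum(pxy * np.log2(pxy / px[:, None]))
H_X_given_Y = -np.sum(pxy * np.log2(pxy / py[None, :]))
print(H(py) - H_Y_given_X, H(px) - H_X_given_Y)   # both give I(X;Y) >= 0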
Entropy
H(X) = - SUM_x p(x) log p(x)
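Example: a minimal entropy function in bits (the pmf is my own; 0 log 0 is treated as 0).

import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                         # convention: 0 log 0 = 0
    return -np.sum(p * np.log2(p))

print(entropy([0.5, 0.25, 0.25]))        # 1.5 bits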
Two key properties of Entropy (value)
Always nonnegative
Only depends on the probabilities in the pmf of the RV
Binomial pmf X ∼ Bi(n, q)
X ∼ Bi(n, q)
p(k) = (nCk) (q^k) (1-q)^(n-k) for k = 0, 1, 2, …, n
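Example (assumed n = 5, q = 0.3): evaluating the binomial pmf, with math.comb for nCk.

from math import comb

n, q = 5, 0.3                            # assumed parameters
pmf = [comb(n, k) * q**k * (1 - q)**(n - k) for k in range(n + 1)]
print(pmf, sum(pmf))                     # the probabilities sum to 1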
Joint Distribution
Pr(X = x, Y = y) = p(x, y)
Conditional Distribution
p(y|x) = p(x, y)/ p(x)
Marginalisation
p(x) = SUM_y p(x,y)
Independence of RV
p(x,y) = p(x)p(y) for all x, y
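Example covering the last two cards (the joint pmf is an assumption of mine): marginalise a joint table and test whether p(x,y) = p(x)p(y) holds everywhere.

import numpy as np

pxy = np.array([[0.25, 0.25],            # assumed joint pmf p(x, y)
                [0.40, 0.10]])
px = pxy.sum(axis=1)                     # marginal p(x) = SUM_y p(x, y)
py = pxy.sum(axis=0)                     # marginal p(y) = SUM_x p(x, y)
independent = np.allclose(pxy, np.outer(px, py))
print(px, py, independent)               # False here: X and Y are dependent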
Expectation of a discrete RV
E(X) = SUM_x p(x) x
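Example (outcomes and pmf are invented):

values = [1, 2, 3, 4]                    # assumed outcomes x
probs = [0.1, 0.2, 0.3, 0.4]             # assumed pmf p(x)
EX = sum(p * x for p, x in zip(probs, values))
print(EX)                                # 3.0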
Expectation of two RVs
E(X+Y) = E(X) + E(Y) always (linearity of expectation)
If X and Y are independent, then also E(XY) = E(X)E(Y)
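Example checking the corrected statement (the dependent joint pmf is an assumption): linearity holds regardless of dependence, while E(XY) = E(X)E(Y) fails here because X and Y are dependent.

import numpy as np

xs, ys = np.array([0, 1]), np.array([0, 1])
pxy = np.array([[0.25, 0.25],            # assumed (dependent) joint pmf
                [0.40, 0.10]])
EX = np.sum(pxy.sum(axis=1) * xs)
EY = np.sum(pxy.sum(axis=0) * ys)
E_sum = np.sum(pxy * (xs[:, None] + ys[None, :]))
E_prod = np.sum(pxy * (xs[:, None] * ys[None, :]))
print(E_sum, EX + EY)                    # equal: linearity always holds
print(E_prod, EX * EY)                   # differ: X and Y are dependent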
Log 1/x =
- log x
log(x/y) =
log x - log y
When is entropy maximised and minimised
Maximised when p is uniform: H(X) = log|X|
Minimised when one outcome has probability 1 and the rest 0 (X deterministic): H(X) = 0
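Example (random pmfs generated by me as a sanity check): no pmf on 4 outcomes exceeds log2(4) = 2 bits, and a deterministic pmf gives 0.

import numpy as np

rng = np.random.default_rng(0)

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

samples = rng.dirichlet(np.ones(4), size=1000)           # random pmfs on 4 outcomes
print(max(entropy(p) for p in samples) <= np.log2(4))    # True
print(entropy(np.array([1.0, 0.0, 0.0, 0.0])))           # 0.0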