Exam Prep Flashcards
Explain what capacity implies in practice for an error correction code
The capacity is the highest rate at which a code can transmit while the probability of decoding error can still be made arbitrarily small (as the block length grows)
Capacity
C = max_{p(x)} I(X;Y), maximised over all input pmfs p(x)
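A minimal numerical sketch of this maximisation, assuming a two-input channel given as a transition matrix [p(y|x)] and crudely grid-searching over Bernoulli input pmfs (not a general algorithm):

```python
import math

def mutual_information(p_x, channel):
    """I(X;Y) = H(Y) - H(Y|X) for a discrete channel [p(y|x)]."""
    n_y = len(channel[0])
    p_y = [sum(p_x[x] * channel[x][y] for x in range(len(p_x))) for y in range(n_y)]
    h_y = -sum(p * math.log2(p) for p in p_y if p > 0)
    h_y_given_x = -sum(p_x[x] * channel[x][y] * math.log2(channel[x][y])
                       for x in range(len(p_x)) for y in range(n_y)
                       if p_x[x] > 0 and channel[x][y] > 0)
    return h_y - h_y_given_x

# BSC with crossover 0.1: the maximum over inputs (q, 1-q) is the capacity
bsc = [[0.9, 0.1], [0.1, 0.9]]
print(max(mutual_information([q, 1 - q], bsc)
          for q in [i / 1000 for i in range(1, 1000)]))  # ~0.531 = 1 - h(0.1)
```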
Conditional Entropy Inequality
“Information Can’t hurt”
H(X | Y) <= H(X)
Chain Rule of Entropy
H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)
Conditional Entropy
H(Y|X) = SUM_x p(x) H(Y|X=x)
= - SUM_x SUM_y p(x) p(y|x) log p(y|x)
Chain Rule of Conditional Entropy
H(X, Y|Z) = H(Y|Z) + H(X|Y, Z) = H(X|Z) + H(Y|X, Z)
Joint Entropy
H(X, Y) = - SUM_xy p(x,y) log p(x,y)
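A quick Python check of the joint-entropy and chain-rule cards, using a made-up joint pmf:

```python
import math

# Hypothetical joint pmf p(x, y) over X, Y in {0, 1}
p_xy = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}

h_xy = -sum(p * math.log2(p) for p in p_xy.values())               # H(X,Y)
p_x = {x: sum(p for (a, _), p in p_xy.items() if a == x) for x in (0, 1)}
h_x = -sum(p * math.log2(p) for p in p_x.values())                 # H(X)
h_y_given_x = -sum(p * math.log2(p / p_x[x])                       # H(Y|X)
                   for (x, _), p in p_xy.items())
print(h_xy, h_x + h_y_given_x)  # both ~1.75: H(X,Y) = H(X) + H(Y|X)
```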
Relative Entropy (Kullback-Leibler Distance)
D(p||r) = SUM_x p(x) log p(x)/r(x)
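As a one-function Python sketch, assuming p and r are pmfs over the same alphabet with r(x) > 0 wherever p(x) > 0:

```python
import math

def kl_divergence(p, r):
    """D(p||r) in bits; terms with p(x) = 0 contribute 0 by convention."""
    return sum(px * math.log2(px / rx) for px, rx in zip(p, r) if px > 0)

print(kl_divergence([0.5, 0.5], [0.9, 0.1]))  # ~0.737, always >= 0
print(kl_divergence([0.5, 0.5], [0.5, 0.5]))  # 0.0: D(p||p) = 0
```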
Bayes Theorem
P(B|A) = P(A|B) × P(B)/ P(A)
Conditional Probability
P(B|A) = P(A ∩ B) / P(A)
Total Probability
if {A1,A2, . . . ,Ak } is a partition of Ω,
then for any B ⊂ Ω
P(B) = SUM(i=1 -> k) P(B|A_i)P(A_i)
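A numeric check of the last three cards with hypothetical numbers for a two-event partition:

```python
# Made-up values: {A1, A2} partitions the sample space
p_a = {"A1": 0.3, "A2": 0.7}
p_b_given_a = {"A1": 0.9, "A2": 0.2}

# Total probability: P(B) = sum_i P(B|A_i) P(A_i)
p_b = sum(p_b_given_a[a] * p_a[a] for a in p_a)
print(p_b)  # 0.41

# Bayes: P(A1|B) = P(B|A1) P(A1) / P(B)
print(p_b_given_a["A1"] * p_a["A1"] / p_b)  # ~0.659
```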
Mutual Information
I(X;Y) = H(Y) − H(Y|X) = H(X) − H(X|Y)
Entropy
H(X) = - SUM_x p(x) log p(x)
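The same formula as a small Python function (terms with p(x) = 0 are skipped, matching the 0 log 0 = 0 convention):

```python
import math

def entropy(pmf):
    """H(X) = -sum_x p(x) log2 p(x), in bits."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

print(entropy([0.5, 0.5]))    # 1.0 bit (fair coin)
print(entropy([0.9, 0.1]))    # ~0.469 (biased coin, less uncertain)
print(entropy([0.25] * 4))    # 2.0 = log2(4) (uniform on 4 symbols)
```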
Two key properties of Entropy (value)
Always nonnegative
Depends only on the probabilities in the pmf of the RV, not on the values it takes
Binomial pmf X ∼ Bi(n, q)
p(k) = (nCk) q^k (1-q)^(n-k) for k = 0, 1, 2, …, n
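A direct translation using Python's math.comb:

```python
from math import comb

def binomial_pmf(k, n, q):
    """P(X = k) for X ~ Bi(n, q)."""
    return comb(n, k) * q**k * (1 - q) ** (n - k)

print(binomial_pmf(2, 5, 0.3))                         # ~0.3087
print(sum(binomial_pmf(k, 5, 0.3) for k in range(6)))  # ~1.0: pmf sums to 1
```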
Joint Distribution
Pr(X = x, Y = y) = p(x, y)
Conditional Distribution
p(y|x) = p(x, y)/ p(x)
Marginalisation
p(x) = SUM_y p(x,y)
Independence of RV
p(x,y) = p(x)p(y) for all x, y
Expectation of a discrete RV
E(X) = SUM_x p(x) x
Expectation of two RVs
E(X+Y) = E(X) + E(Y) always (linearity of expectation);
if X and Y are also independent, E(XY) = E(X)E(Y)
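A small sampling sketch showing the distinction: linearity of E(X+Y) holds even when X and Y are dependent, while E(XY) = E(X)E(Y) needs independence:

```python
import random

random.seed(0)
xs = [random.randint(0, 5) for _ in range(100_000)]
ys = [x + random.randint(0, 5) for x in xs]   # Y deliberately depends on X
mean = lambda v: sum(v) / len(v)

# Linearity holds regardless of dependence:
print(mean([x + y for x, y in zip(xs, ys)]), mean(xs) + mean(ys))  # ~equal
# The product rule fails here because X and Y are dependent:
print(mean([x * y for x, y in zip(xs, ys)]), mean(xs) * mean(ys))  # ~15.4 vs ~12.5
```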
log(1/x) =
- log x
log(x/y) =
log x - log y
When is entropy maximised and minimised
Maximised when p is uniform: H(X) = log|X|
Minimised when one outcome has probability 1 and the rest 0: H(X) = 0
Fundamental Lower Bound
L(C) >= H(X)
Kraft’s Inequality
SUM(i=1, m) D^(-l_i) <= 1
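A one-function check, assuming codeword lengths l_i over a D-ary code alphabet:

```python
def kraft_ok(lengths, D=2):
    """A prefix code with these lengths exists iff sum_i D^(-l_i) <= 1."""
    return sum(D ** -l for l in lengths) <= 1

print(kraft_ok([1, 2, 3, 3]))  # True: 1/2 + 1/4 + 1/8 + 1/8 = 1
print(kraft_ok([1, 1, 2]))     # False: 1/2 + 1/2 + 1/4 > 1
```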
Average Codeword Length
L(C) = SUM_x p(x) l(x)
Property of H(X|g(X))
H(X|g(X)) >= 0, with equality iff X can be recovered from g(X); via the chain rule this gives H(g(X)) <= H(X)
How many errors can a repetition code detect and correct. What is its main issue
The code can detect up to n − 1 errors and correct up to floor((n−1)/2) errors.
Its main issue is efficiency: the rate is 1/n, so as n tends to infinity the rate tends to 0.
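A minimal sketch of a binary repetition code with majority-vote decoding, illustrating the floor((n−1)/2) correction bound:

```python
def encode(bit, n):
    return [bit] * n

def decode(word):
    # Majority vote: correct as long as flips <= floor((n-1)/2)
    return 1 if sum(word) > len(word) / 2 else 0

n = 5
word = encode(1, n)
word[0] ^= 1; word[3] ^= 1   # 2 flips <= floor((5-1)/2) = 2
print(decode(word))          # 1: recovered correctly
word[1] ^= 1                 # a 3rd flip exceeds the bound
print(decode(word))          # 0: decoding now fails
```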
Rate of an error detection/correction code
rate = len(msg) / (len(msg) + len(redundancy))
measured in bits/ channel use
Data Processing Inequality
if X → Y → Z then
I(X; Y ) ≥ I(X; Z)
AEP
−(1/n) log p(X_1, . . . ,X_n) → H(X) in probability, i.e. p(X_1, . . . ,X_n) ≈ 2^(−nH(X)) for typical sequences
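A simulation sketch for i.i.d. Bernoulli(0.3) draws: the per-symbol log-probability of the observed sequence settles at H(X):

```python
import math, random

random.seed(0)
q, n = 0.3, 100_000
xs = [1 if random.random() < q else 0 for _ in range(n)]
log_p = sum(math.log2(q) if x else math.log2(1 - q) for x in xs)
h = -q * math.log2(q) - (1 - q) * math.log2(1 - q)
print(-log_p / n, h)  # both ~0.881, so p(X_1..X_n) ~ 2^(-nH(X))
```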
Shannon codeword length
l*_i = ceiling(log_D 1/p_i)
Shannon average codeword length and its relationship with entropy
H(X) ≤ L* < H(X) + 1
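A sketch computing Shannon lengths for a made-up pmf and confirming the sandwich bound:

```python
import math

pmf = [0.4, 0.3, 0.2, 0.1]
lengths = [math.ceil(math.log2(1 / p)) for p in pmf]  # l*_i with D = 2
L_star = sum(p * l for p, l in zip(pmf, lengths))     # average codeword length
H = -sum(p * math.log2(p) for p in pmf)
print(H, L_star, H + 1)  # ~1.85 <= 2.4 < ~2.85, i.e. H(X) <= L* < H(X) + 1
```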
What is the probability distribution associated with a transition probability matrix Π
The conditional distribution of the channel output given the input: Π = [p(y|x)], rows indexed by inputs x, columns by outputs y
What is the channel capacity of a BSC
C = 1 − h(ε) bits/channel use, where ε is the crossover probability and h the binary entropy function
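In Python, with h the binary entropy function:

```python
import math

def binary_entropy(p):
    return 0.0 if p in (0, 1) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(eps):
    return 1 - binary_entropy(eps)

print(bsc_capacity(0.0))  # 1.0: noiseless binary channel
print(bsc_capacity(0.1))  # ~0.531
print(bsc_capacity(0.5))  # 0.0: output is independent of input
```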
What is the channel capacity of a Binary Erasure channel
C = 1 − α bits/channel use, where α is the erasure probability
Definition of a symmetric channel
if all the rows of the transition probability matrix Π are permutations of each other, and so are the columns, the channel is said to be symmetric
Definition of a weakly symmetric channel
if all the rows of Π are permutations of each other, and its columns add up to the same value, the channel is weakly symmetric
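Weakly symmetric channels admit the closed form C = log|Y| − H(row of Π), achieved by the uniform input (a standard result not stated on the cards above). A sketch testing the definition and applying that formula to a made-up Π:

```python
import math

def is_weakly_symmetric(PI):
    rows_sorted = [sorted(row) for row in PI]
    col_sums = [sum(col) for col in zip(*PI)]
    return (all(r == rows_sorted[0] for r in rows_sorted)             # rows are permutations
            and all(abs(s - col_sums[0]) < 1e-12 for s in col_sums))  # equal column sums

def weakly_symmetric_capacity(PI):
    h_row = -sum(p * math.log2(p) for p in PI[0] if p > 0)
    return math.log2(len(PI[0])) - h_row

PI = [[1/3, 1/6, 1/2],
      [1/3, 1/2, 1/6]]
print(is_weakly_symmetric(PI))        # True
print(weakly_symmetric_capacity(PI))  # ~0.126 = log2(3) - H(1/3, 1/6, 1/2)
```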
Key property of channel capacity in relation to I(X;Y)
any local maximum of I(X;Y) over input pmfs is the global maximum, since I(X;Y) is concave in p(x) for fixed p(y|x)
binomial theorem
(a+b)^n = SUM(k=0, n) (nCk) a^k b^(n-k)
Probability of undetectable errors in parity checking (excluding zero errors)
SUM over even k from 2 to n of (nCk) q^k (1-q)^(n-k), since a single parity bit misses exactly the even error counts
Pr(E is even) in parity checking under errors (binomial expansion of (1 − 2q)^n)
(1 − 2q)^n = (1 − q − q)^n
= SUM(k=0, n) (nCk) (−1)^k q^k (1−q)^(n−k)
Combining with SUM(k=0, n) (nCk) q^k (1−q)^(n−k) = 1 gives Pr(E is even) = (1 + (1 − 2q)^n)/2
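A numeric check tying the last two cards together: summing the even error counts directly matches the closed form derived from the expansion, Pr(even) − Pr(no errors):

```python
from math import comb

n, q = 8, 0.05
direct = sum(comb(n, k) * q**k * (1 - q) ** (n - k)
             for k in range(2, n + 1, 2))            # even k >= 2
closed = (1 + (1 - 2 * q) ** n) / 2 - (1 - q) ** n   # Pr(even) - Pr(k = 0)
print(direct, closed)  # both ~0.0518
```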
What is the minimum hamming distance and its relation to the number of errors that can be corrected
If d = 2t + 1 we can correct t errors, i.e. t = floor((d − 1)/2)
EX: a (7,4,3) Hamming code can detect two errors (d − 1) and correct one
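A compact sketch of the (7,4,3) Hamming code under one common bit-position convention (parity bits at positions 1, 2, 4), showing single-error correction via the syndrome:

```python
def hamming74_encode(d):
    """d = [d1, d2, d3, d4] -> positions 1..7 = p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(c):
    c = c[:]
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # parity over positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # parity over positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]   # parity over positions 4, 5, 6, 7
    pos = s1 + 2 * s2 + 4 * s3       # syndrome reads out the error position
    if pos:
        c[pos - 1] ^= 1              # flip the single erroneous bit
    return c

word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1  # inject one bit error
print(hamming74_correct(word) == hamming74_encode([1, 0, 1, 1]))  # True
```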
Rate of a hamming code (n, k, d)
k/n bits/ channel use