Shannon's information measures Flashcards
Shannon entropy
Entropy is the uncertainty in a random variable X
H(X) = -\sum_{i=1}^N p(x_i) log(p(x_i))
It is bounded by
0 ≤ H(X) ≤ log(N).
It is non-increasing under functions:
H(X) ≥ H(g(X)),
with equality iff g is invertible on the support of X. Furthermore, conditioning cannot increase entropy.
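A minimal Python sketch of the definition above; the function name `shannon_entropy` and the example pmf are my own, not from the source.

```python
import numpy as np

def shannon_entropy(p, base=2):
    """H(X) = -sum_i p(x_i) log p(x_i); zero-probability terms contribute 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # convention: 0 log 0 = 0
    return -np.sum(p * np.log(p)) / np.log(base)

p = [0.5, 0.25, 0.125, 0.125]         # example pmf over N = 4 symbols
print(shannon_entropy(p))             # 1.75 bits
print(np.log2(4))                     # upper bound log N = 2 bits
```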
Rényi entropy
A generalization of Shannon entropy.
Let β ∈ [0, ∞]; the Rényi entropy of order β is
H_β(X) = 1/(1-β) log [ \sum_{i=1}^N p(x_i)^β ].
In the limit β → 1, Shannon entropy is recovered.
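A small sketch of the formula, assuming the β = 1 case is handled by falling back to the Shannon limit; the pmf is a made-up example.

```python
import numpy as np

def renyi_entropy(p, beta, base=2):
    """H_beta(X) = 1/(1-beta) * log sum_i p(x_i)^beta (Shannon entropy in the limit beta -> 1)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if np.isclose(beta, 1.0):         # beta -> 1 recovers Shannon entropy
        return -np.sum(p * np.log(p)) / np.log(base)
    return np.log(np.sum(p ** beta)) / ((1.0 - beta) * np.log(base))

p = [0.5, 0.25, 0.125, 0.125]
for beta in [0.5, 0.999, 1.0, 2.0]:
    print(beta, renyi_entropy(p, beta))   # beta near 1 approaches the Shannon value 1.75 bits
```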
Joint entropy
H(X, Y) = -\sum_{x,y} p(x,y) log( p(x,y) ) = -E[ log( p(x,y) ) ]
Useful relations:
H(X,Y) = H(X) + H(Y|X)
H(X,Y) = H(Y) + H(X|Y)
Chain rule:
H(X_1, X_2, …, X_n) = \sum_{i=1}^n H(X_i | X_{i-1}, …, X_1)
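A numeric check of H(X,Y) = H(X) + H(Y|X) on a made-up 2×2 joint pmf; the helper `H` and the table values are illustrative only.

```python
import numpy as np

def H(p):
    """Shannon entropy in bits of a pmf given as an array of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# made-up joint pmf p(x, y); rows index x, columns index y
pxy = np.array([[0.25, 0.25],
                [0.40, 0.10]])
px = pxy.sum(axis=1)
# H(Y|X) computed directly as -sum_{x,y} p(x,y) log p(y|x)
H_Y_given_X = -np.sum(pxy * np.log2(pxy / px[:, None]))
print(H(pxy), H(px) + H_Y_given_X)    # both sides of H(X,Y) = H(X) + H(Y|X), ≈ 1.861 bits
```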
Conditional entropy
H(X | Y) = -\sum_{x,y} p(x,y) log( p(x | y) ) = -E[ log( p(x | y) ) ]
Conditioning can never increase the entropy:
H(Y | X) ≤ H(Y),
with equality iff X and Y are independent.
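A quick spot-check of H(Y|X) ≤ H(Y) on random joint pmfs; the random tables are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# rows index x, columns index y; H(Y|X) = H(X,Y) - H(X)
for _ in range(5):
    pxy = rng.random((3, 4))
    pxy /= pxy.sum()
    H_Y = H(pxy.sum(axis=0))
    H_Y_given_X = H(pxy) - H(pxy.sum(axis=1))
    print(round(H_Y_given_X, 4), "<=", round(H_Y, 4))
```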
Differential entropy
It is the entropy of a continuous random variable X with density f(x) and support set S:
h(X) = - \int_S f(x) log( f(x) ) dx
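A sketch that approximates h(X) for a standard Gaussian by a Riemann sum and compares it with the closed form 0.5 log(2πe σ²); the grid limits and step size are arbitrary choices.

```python
import numpy as np

# numeric Riemann sum for h(X) = -∫ f log f dx, standard Gaussian density
sigma = 1.0
x = np.linspace(-10, 10, 200001)
dx = x[1] - x[0]
f = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
h_numeric = -np.sum(f * np.log(f)) * dx          # nats
h_closed = 0.5 * np.log(2 * np.pi * np.e * sigma**2)
print(h_numeric, h_closed)                       # both ≈ 1.4189 nats
```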
Joint and conditional differential entropy
The differential entropy of a set X_1, X_2, …, X_n of random variables with joint density f(x_1, x_2, …, x_n) is defined as
h(X_1, X_2, …, X_n) = −\int f(x^n) log f(x^n) dx^n.
If X, Y have a joint density function f(x, y), we can define the conditional differential entropy h(X|Y) as
h(X|Y) = −\int f(x, y) log f(x|y) dx dy = h(X, Y) − h(Y).
Mutual information and relative entropy for densities
The mutual information I(X;Y) between two random variables with joint density f(x, y) is defined as
I(X;Y) = \int f(x, y) log[ f(x, y) / (f(x) f(y)) ] dx dy.
From the definition it is clear that
I(X;Y) = h(X) − h(X|Y) = h(Y) − h(Y|X) = h(X) + h(Y) − h(X, Y).
The relative entropy (or Kullback–Leibler distance) D(f||g) between two densities f and g is defined by
D(f||g) = \int f log(f/g).
Note that D(f||g) is finite only if the support set of f is contained in the support set of g.
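A Monte Carlo sketch of I(X;Y) = ∫ f(x,y) log[ f(x,y)/(f(x) f(y)) ] dx dy for a standard bivariate Gaussian with correlation ρ, where the closed form is −0.5 log(1 − ρ²) nats; the sample size and seed are arbitrary.

```python
import numpy as np

rho = 0.8
rng = np.random.default_rng(1)
cov = np.array([[1.0, rho], [rho, 1.0]])
xy = rng.multivariate_normal([0.0, 0.0], cov, size=200_000)
x, y = xy[:, 0], xy[:, 1]

# log densities of the standard bivariate Gaussian and its standard normal marginals
log_f_xy = (-np.log(2 * np.pi * np.sqrt(1 - rho**2))
            - (x**2 - 2 * rho * x * y + y**2) / (2 * (1 - rho**2)))
log_f_x = -0.5 * np.log(2 * np.pi) - x**2 / 2
log_f_y = -0.5 * np.log(2 * np.pi) - y**2 / 2

print(np.mean(log_f_xy - log_f_x - log_f_y))     # Monte Carlo estimate of I(X;Y)
print(-0.5 * np.log(1 - rho**2))                 # closed form ≈ 0.5108 nats
```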
Relation between entropy and differential entropy
If the density f(x) of the random variable X is Riemann integrable, then
H(X^Δ) + log Δ → h(f) = h(X), as Δ → 0.
Thus, the entropy of an n-bit quantization of a continuous random variable X is approximately h(X) + n bits.
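A sketch of the quantization limit for a standard Gaussian, using the Riemann-sum approximation p_i ≈ f(x_i) Δ for the bin probabilities; the bin range and the list of Δ values are arbitrary.

```python
import numpy as np

# H(X^Δ) + log Δ → h(X) for a standard Gaussian; h(X) = 0.5 log(2πe) ≈ 1.4189 nats
h_closed = 0.5 * np.log(2 * np.pi * np.e)

for delta in [1.0, 0.5, 0.1, 0.01]:
    centers = np.arange(-12, 12, delta) + delta / 2
    p = np.exp(-centers**2 / 2) / np.sqrt(2 * np.pi) * delta   # p_i ≈ f(x_i) Δ
    p = p / p.sum()                                            # renormalize the tiny tail error
    H_quant = -np.sum(p * np.log(p))
    print(delta, H_quant + np.log(delta), h_closed)
```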
Mutual information
It is the information about X provided by Y:
I(X;Y) = I(Y;X) = \sum_{x,y} p(x,y) log[ p(x,y) / (p(x) p(y)) ] = H(X) + H(Y) - H(X,Y) = H(X) - H(X|Y) = H(Y) − H(Y|X)
It quantifies the reduction in uncertainty about X due to knowledge of Y.
Chain rule:
I(X_1, X_2, …, X_n ; Y) = \sum_{i=1}^n I(X_i ; Y | X_{i-1}, …, X_1)
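A numeric check that the sum form of I(X;Y) matches H(X) + H(Y) − H(X,Y) on a made-up joint pmf; the helper `H` and the table are illustrative only.

```python
import numpy as np

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# toy joint pmf; rows index x, columns index y
pxy = np.array([[0.30, 0.10],
                [0.20, 0.40]])
px, py = pxy.sum(axis=1), pxy.sum(axis=0)
I_direct = np.sum(pxy * np.log2(pxy / np.outer(px, py)))
print(I_direct, H(px) + H(py) - H(pxy))          # the two expressions agree
```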
Conditional mutual information
I(X;Y|Z) = \sum_{x,y,z} p(x,y,z) log[ p(x,y|z) / (p(x|z) * p(y|z) ) ]
I(X;Y|Z) = H(X|Z) - H(X|Y,Z) = H(X|Z) + H(Y|Z) - H(X,Y|Z).
Kullback-Leibler divergence
Also called relative entropy. It is not a distance measure, since it is not symmetric. It is non negative.
It measures the discrepancy between p and q, i.e., the inefficiency of assuming the distribution is q when the true distribution is p:
D(p||q) = \sum_x p(x) log[ p(x)/q(x) ].
For continuity, 0 log(0) = 0.
For q(x) = 0, p(x) > 0, p(x) log(p(x)/0) = ∞.
We often require p and q to have the same support, since D(p||q) is infinite whenever q(x) = 0 for some x with p(x) > 0.
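A minimal sketch of D(p||q) with the conventions above; `kl_divergence` and the example distributions are my own, chosen so that D(p||q) is finite while D(q||p) is infinite.

```python
import numpy as np

def kl_divergence(p, q):
    """D(p||q) = sum_x p(x) log p(x)/q(x) in bits; infinite if q(x)=0 where p(x)>0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    if np.any((q == 0) & (p > 0)):
        return np.inf
    mask = p > 0                                  # convention: 0 log 0 = 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

p = [0.5, 0.5, 0.0]
q = [0.25, 0.25, 0.5]
print(kl_divergence(p, q))                        # 1.0 bit
print(kl_divergence(q, p))                        # inf: not symmetric, and q puts mass where p does not
```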
Directed information
I(X^k → Y^k) = \sum_{i=1}^k I(X^i ; Y_i | Y^{i-1})
Markov chains
Let X ↔ Y ↔ Z form a Markov chain, then
I(X;Z) ≤ I(X;Y).
The intuition is that processing can only decrease mutual information, so variables farther apart in the chain share less information.
However, conditioning can increase mutual information. Let Y be an encryption of X with a secret key Z (e.g., a one-time pad). Then I(X;Y) = 0 and I(X;Z) = 0, yet I(X;Y|Z) > 0, since given Z the plaintext X can be recovered from Y. There is no Markov chain here, so this does not contradict the data-processing inequality.
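A numeric version of the one-time-pad example above, assuming a single fair bit X and an independent fair key bit Z with Y = X ⊕ Z; it shows I(X;Y) = 0 while I(X;Y|Z) = 1 bit.

```python
import numpy as np

# joint pmf p(x, y, z) for Y = X XOR Z, with X and Z independent fair bits
p = np.zeros((2, 2, 2))
for x in range(2):
    for z in range(2):
        p[x, x ^ z, z] = 0.25

def H(pmf):
    pmf = pmf.ravel()
    pmf = pmf[pmf > 0]
    return -np.sum(pmf * np.log2(pmf))

p_xy = p.sum(axis=2)
I_XY = H(p_xy.sum(axis=1)) + H(p_xy.sum(axis=0)) - H(p_xy)
# I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
I_XY_given_Z = H(p.sum(axis=1)) + H(p.sum(axis=0)) - H(p) - H(p.sum(axis=(0, 1)))
print(I_XY)            # 0.0: the ciphertext alone reveals nothing about X
print(I_XY_given_Z)    # 1.0: given the key, Y determines X completely
```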
What is information?
Information is a stimulus that has meaning.
Which distributions maximize the entropy?
For discrete sources, the uniform distribution maximizes entropy.
For continuous sources:
* Given the constraint of finite support: the uniform distribution maximizes differential entropy.
* Given the constraint of fixed finite variance: the Gaussian distribution maximizes differential entropy (the fixed-variance case is illustrated in the sketch below).
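A small check of the fixed-variance case: a Gaussian and a uniform distribution with the same variance, compared via their closed-form differential entropies (in nats).

```python
import numpy as np

# same variance sigma^2 = 1: the Gaussian has the larger differential entropy
sigma = 1.0
h_gauss = 0.5 * np.log(2 * np.pi * np.e * sigma**2)   # ≈ 1.4189 nats
a = np.sqrt(12) * sigma                               # Uniform(0, a) has variance a^2/12 = sigma^2
h_unif = np.log(a)                                    # ≈ 1.2425 nats
print(h_gauss, h_unif, h_gauss >= h_unif)
```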