information theory Flashcards
what is information theory
it is the study of quantifying information: how much information an outcome, message, or random variable carries
what are the 3 axioms shannon proposed that self information should satisfy
1) an event that is 100% certain is perfectly unsurprising and yields no information.
2) the less probable an event is, the more surprising it is, so the more information it yields.
3) if two independent events are measured separately, the total amount of information is the sum of the self-information of the individual events (e.g. two separate fair coin flips together carry twice the information of a single flip).
Given a random variable X with probability mass function P_X(x), the
self-information of measuring X as outcome x is defined as:
I_X(x) = −log_b[P_X(x)] = log_b(1 / P_X(x))
(I can't do pictures on here, so check the slides)
It quantifies the level of surprise in observing a particular outcome x for a random variable X that has PMF P_X(x).
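A minimal Python sketch of this definition (Python and the example probabilities are my own, not from the slides):

import math

def self_information(p, base=2):
    # I_X(x) = -log_b P_X(x): the surprise of observing an outcome of probability p
    return -math.log(p) / math.log(base)

print(self_information(0.5))    # 1.0 bit: a fair coin flip
print(self_information(0.125))  # ~3.0 bits: a 1-in-8 event is more surprising
print(self_information(1.0))    # -0.0, i.e. 0: a certain event carries no information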
in self information if b in log b = 2
what are we measuring in
b = 2 so bits
in self information if b in log b = e
what are we measuring in
b = e so natural units or nats
in self information if b in log b = 10
what are we measuring in
b = 10 so dits, bans, or hartleys
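The three bases measure the same quantity in different units, differing only by a constant factor; a quick Python check (illustrative values, not from the slides):

import math

p = 0.5  # probability of the observed outcome
bits = -math.log2(p)       # base 2  -> bits
nats = -math.log(p)        # base e  -> nats
hartleys = -math.log10(p)  # base 10 -> dits / bans / hartleys

print(bits, nats, hartleys)   # 1.0  0.693...  0.301...
print(nats / math.log(2))     # convert nats back to bits: 1.0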
what is the logit for self information
Self-information and logit (log-odds) are related concepts.
For an event A occurring with probability p (hence the probability of
the event not occurring is P(¬A) = 1 − p), the logit function is defined
as:
logit(A) = log(p / (1 − p)) = log(p) − log(1 − p)
what is the relationship between the logit and self information
Relationship:
logit(A) = I(¬A) − I(A)
where I(A) and I(¬A) represent the self-information of events A and
¬A respectively.
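A small numeric check of this relationship in Python (p = 0.8 is an arbitrary example; natural logs are used throughout):

import math

p = 0.8                      # P(A); so P(not A) = 1 - p = 0.2
logit_A = math.log(p / (1 - p))

I_A = -math.log(p)           # self-information of A (in nats)
I_not_A = -math.log(1 - p)   # self-information of not A

# logit(A) = I(not A) - I(A)
print(logit_A, I_not_A - I_A)   # both ~1.386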
what does entropy mean
it quantifies the average uncertainty in a random variable X; formally, H(X) = E[I_X(X)] = −Σ_x P_X(x) log P_X(x), i.e. the expected self-information of its outcomes
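A minimal sketch for a two-outcome (Bernoulli) variable, assuming the expected-self-information form above; the PMFs are made-up examples:

import math

def entropy(pmf, base=2):
    # H(X) = -sum_x P(x) log_b P(x); terms with P(x) = 0 contribute 0
    return -sum(p * math.log(p, base) for p in pmf if p > 0)

print(entropy([0.5, 0.5]))   # 1.0 bit: a fair coin is maximally uncertain
print(entropy([0.9, 0.1]))   # ~0.47 bits: a biased coin is less uncertain
print(entropy([1.0, 0.0]))   # -0.0, i.e. 0 bits: no uncertainty at all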
what does joint entropy mean
a measure of the uncertainty associated with a set of variables.
For discrete random variables X and Y:
H(X, Y) = −E[log P(X, Y)] = − Σ_{x∈R_X} Σ_{y∈R_Y} P(x, y) log P(x, y)
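A sketch computing H(X, Y) from a joint PMF stored as a dict; the joint distribution below is a made-up example (two independent fair coins):

import math

def joint_entropy(joint_pmf, base=2):
    # H(X, Y) = -sum over (x, y) of P(x, y) log_b P(x, y)
    return -sum(p * math.log(p, base) for p in joint_pmf.values() if p > 0)

P_XY = {
    (0, 0): 0.25, (0, 1): 0.25,
    (1, 0): 0.25, (1, 1): 0.25,
}
print(joint_entropy(P_XY))   # 2.0 bits: two independent fair coins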
what does Kullback-Leibler divergence (D_KL) do
it quantifies how much one probability distribution P diverges from a reference distribution Q; for discrete distributions, D_KL(P ∥ Q) = Σ_x P(x) log(P(x) / Q(x)). it is often described informally as a distance between the 2 distributions, but it is not a true distance metric (it is not symmetric)
by convention what does 0 log (0/Q) equal
0
by convention what does P log (P/0) equal
infinity (+∞), whenever P > 0
what are the properties of KL divergence
D_KL(P ∥ Q) ≥ 0 (non-negative)
D_KL(P ∥ Q) = 0 if and only if P(x) = Q(x) for all x
Not symmetric: D_KL(P ∥ Q) ≠ D_KL(Q ∥ P)
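A sketch that computes D_KL for discrete distributions and illustrates the conventions and properties above (P and Q are arbitrary example distributions):

import math

def kl_divergence(P, Q, base=2):
    # D_KL(P || Q) = sum_x P(x) log_b(P(x) / Q(x))
    total = 0.0
    for p, q in zip(P, Q):
        if p == 0:
            continue             # convention: 0 log(0/Q) = 0
        if q == 0:
            return math.inf      # convention: P log(P/0) = infinity
        total += p * math.log(p / q, base)
    return total

P = [0.5, 0.5]
Q = [0.9, 0.1]
print(kl_divergence(P, P))   # 0.0: identical distributions
print(kl_divergence(P, Q))   # ~0.74 > 0
print(kl_divergence(Q, P))   # ~0.53: a different value, D_KL is not symmetric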
what is mutual information
it measures the information two variables X and Y share
(it quantifies how much knowing one variable reduces the uncertainty about the other)
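One common way to compute it in Python, using the identity I(X; Y) = H(X) + H(Y) − H(X, Y) (a standard result not stated on this card); the joint PMF is a made-up example:

import math

def H(pmf):
    # Shannon entropy (in bits) of an iterable of probabilities
    return -sum(p * math.log2(p) for p in pmf if p > 0)

# joint PMF of (X, Y): Y agrees with X 80% of the time, so they are dependent
P_XY = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
P_X = [0.5, 0.5]   # marginal of X
P_Y = [0.5, 0.5]   # marginal of Y

I_XY = H(P_X) + H(P_Y) - H(P_XY.values())
print(I_XY)        # ~0.28 bits shared between X and Y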
what are the properties of mutual information
Non-negative: I(X; Y) ≥ 0
Symmetric: I(X; Y) = I(Y; X)
Measures statistical dependence:
I(X; Y) = 0 if and only if X and Y are independent.
I(X; Y) increases with the dependence between X and Y and with
their individual entropies H(X) and H(Y).
I(X; X) = H(X) − H(X|X) = H(X) − 0 = H(X)
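A quick check of the last property, I(X; X) = H(X), reusing the identity I(X; Y) = H(X) + H(Y) − H(X, Y) (the PMF is an arbitrary example):

import math

def H(pmf):
    # Shannon entropy (in bits) of an iterable of probabilities
    return -sum(p * math.log2(p) for p in pmf if p > 0)

P_X = [0.25, 0.75]                           # example PMF for X

# the joint PMF of (X, X) puts all its mass on the diagonal, so H(X, X) = H(X)
P_XX = {(x, x): p for x, p in enumerate(P_X)}

print(H(P_X) + H(P_X) - H(P_XX.values()))    # ~0.811 = I(X; X)
print(H(P_X))                                # ~0.811 = H(X)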
explain the "measures statistical dependence" property of mutual information
I(X; Y) = 0 if and only if X and Y are independent: when the joint distribution factorises as P(x, y) = P(x)P(y), knowing X tells us nothing new about Y, so no uncertainty is reduced.
The stronger the dependence between X and Y, the more knowing one reduces the uncertainty about the other, so I(X; Y) grows with their dependence; it also grows with (and is capped by) their individual entropies, since I(X; Y) ≤ min(H(X), H(Y)).
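A sketch contrasting an independent and a dependent pair of variables (both joint PMFs are made-up examples), again via I(X; Y) = H(X) + H(Y) − H(X, Y):

import math

def H(pmf):
    # Shannon entropy (in bits) of an iterable of probabilities
    return -sum(p * math.log2(p) for p in pmf if p > 0)

def mutual_information(P_XY, P_X, P_Y):
    # I(X; Y) = H(X) + H(Y) - H(X, Y)
    return H(P_X) + H(P_Y) - H(P_XY.values())

P_X = [0.5, 0.5]
P_Y = [0.5, 0.5]

# independent: the joint factorises as P(x)P(y), so I(X; Y) = 0
indep = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
print(mutual_information(indep, P_X, P_Y))   # 0.0

# dependent: Y agrees with X 90% of the time, so I(X; Y) > 0
dep = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
print(mutual_information(dep, P_X, P_Y))     # ~0.53 bits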