C2 Flashcards
Shannon’s information theory
How to send messages from sender to receiver in as few bits as possible?
entropy h(n)
the amount of information, in bits, gained by learning which of n equally likely outcomes occurred
h(n) = log n = - log 1/n
with n the number of possible outcomes
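A quick numeric sanity check of this card (a Python sketch; the function name h mirrors the notation above):

```python
import math

def h(n):
    """Information gained, in bits, by learning which of n
    equally likely outcomes occurred: h(n) = log2 n."""
    return math.log2(n)

# A fair coin flip carries 1 bit; one card from a 32-card deck, 5 bits.
print(h(2))   # 1.0
print(h(32))  # 5.0
```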
unequally likely outcomes
denote probability of each possible outcome by p(x)
h(x) = log 1/p(x) = - log p(x)
Shannon’s entropy
H(X) = - sum(p(x) * log(p(x)))
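A minimal Python sketch of this formula (the function name is illustrative):

```python
import math

def entropy(p):
    """Shannon entropy H(X) = -sum p(x) log2 p(x), in bits.
    Outcomes with p(x) == 0 contribute nothing (lim p log p = 0)."""
    return -sum(px * math.log2(px) for px in p if px > 0)

print(entropy([0.5, 0.5]))         # fair coin: 1.0 bit
print(entropy([0.5, 0.25, 0.25]))  # 1.5 bits
```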
Shannon’s Source Coding Theorem
an optimal lossless encoding satisfies
L(a) = - log p(a)
where L(a) is the length in bits of the code word assigned to a
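For a toy distribution (invented for illustration), the optimal lengths L(a) = -log p(a) can be checked against the entropy; a hedged Python sketch:

```python
import math

# Ideal code word lengths L(a) = -log2 p(a) for a toy distribution;
# the lengths are integers here, so a prefix code achieves them exactly.
p = {"a": 0.5, "b": 0.25, "c": 0.25}
lengths = {sym: -math.log2(prob) for sym, prob in p.items()}
print(lengths)  # {'a': 1.0, 'b': 2.0, 'c': 2.0}

# The expected code length equals the entropy H(X) for this distribution.
expected = sum(prob * lengths[sym] for sym, prob in p.items())
print(expected)  # 1.5
```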
decoding
decoding function D
D(y) = x means y is a code word for source word x
E = D^-1 is the encoding relation (not necessarily a function, because a source word can have multiple code words)
Halting problem
Lemma: there is no recursive function g such that for all x, y we have:
1. g(x,y) = 1 if phi_x(y) is defined (i.e. program x halts on input y), and
2. g(x,y) = 0 otherwise
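The standard diagonal argument behind this lemma can be sketched in Python: given any candidate halting decider, we construct an input it must misjudge (halts, make_paradox, and always_no are illustrative names, not a real API):

```python
def make_paradox(halts):
    """Given any candidate decider halts(f, x) -> bool,
    build a function that halts(., .) must misjudge."""
    def paradox(f):
        # Loop forever exactly when halts claims f(f) halts.
        if halts(f, f):
            while True:
                pass
        return None
    return paradox

# Refute a decider that always answers "does not halt":
always_no = lambda f, x: False
paradox = make_paradox(always_no)
# always_no claims paradox(paradox) never halts, yet it returns
# immediately, so always_no is wrong on this input.
result = paradox(paradox)
print(result)  # None
```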
relative complexity
let U be some universal Turing machine and x some finite binary string
the relative complexity C_U of x with regard to U is defined as:
C_U(x) = min{ l(p) | U(p) halts and U(p) = x}
a string x is random relative to U iff
C_U(x) ≥ |x|
additive optimality
A function f is additively optimal for a class of functions F
- if it belongs to F, and
- if for every g ∈ F there is a constant c_{f,g} such that C_f(x) ≤ C_g(x) + c_{f,g} for all x
invariance theorem
for any two universal Turing machines U and V there is a constant c_{U,V} such that
C_U(x) ≤ C_V(x) + c_{U,V} for all x (a constant-length ‘cross-compiler’ exists)
Kolmogorov complexity C(x)
the length of the shortest program that outputs x and halts (given an additively optimal universal Turing machine)
properties of Kolmogorov complexity
x is incompressible iff C(x) ≥ |x|
C(x) can be regarded as the length of the ultimate compressed version from which x can be recovered by a general (lossless!) compression program
BUT C(x) is uncomputable because the halting problem is undecidable
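Although C(x) itself is uncomputable, any lossless compressor gives a computable upper bound on it; a sketch using zlib as a stand-in compressor (the helper name c_upper is invented):

```python
import os
import zlib

def c_upper(x: bytes) -> int:
    """Computable upper bound on C(x): the length in bytes of a zlib
    compression of x (the fixed decompressor counts as O(1))."""
    return len(zlib.compress(x, 9))

regular = b"ab" * 500       # highly regular: compresses well
random_ = os.urandom(1000)  # incompressible with high probability
print(c_upper(regular))     # far below 1000
print(c_upper(random_))     # close to (or above) 1000
```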
Conditional Kolmogorov complexity C(x|y)
the length, in bits, of the shortest computer program that produces x as output when given string y as input
Normalised Information Distance (NID)
e(x,y) = max{C(x|y), C(y|x)} / max{C(x), C(y)}
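Since NID is built from the uncomputable C, in practice it is approximated by the Normalised Compression Distance (NCD), which replaces C with the output length Z of a real compressor; a Python sketch with zlib (the example strings are invented):

```python
import zlib

def Z(x: bytes) -> int:
    """Compressed length of x, a computable stand-in for C(x)."""
    return len(zlib.compress(x, 9))

def ncd(x: bytes, y: bytes) -> float:
    """NCD(x, y) = (Z(xy) - min{Z(x), Z(y)}) / max{Z(x), Z(y)},
    a computable approximation of the NID."""
    return (Z(x + y) - min(Z(x), Z(y))) / max(Z(x), Z(y))

a = b"the quick brown fox jumps over the lazy dog" * 20
b = a.replace(b"fox", b"cat")      # nearly identical to a
c = bytes(range(256)) * 4          # unrelated to a
print(ncd(a, b))  # small: a and b share most of their structure
print(ncd(a, c))  # larger: little shared structure
```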