Lossless data compression Flashcards

1
Q

What is the interpretation of entropy in data compression?

A

The entropy H(X) = H is the data compression limit as well as the number of bits needed in random number generation, and codes achieving H turn out to be optimal from many points of view.

2
Q

Source code

A

A source code C for a random variable X is a mapping from the range of X to D*, the set of finite-length strings of symbols from a D-ary alphabet. Let C(x) denote the codeword corresponding to x and let l(x) denote the length of C(x).

3
Q

Codebook

A

A codebook is a reference table or mapping used in encoding and decoding processes to associate input symbols with their corresponding codewords.

4
Q

Expected codelength

A

The expected length L(C) of a source code C(x) for a random variable X with probability mass function p(x) is given by
L(C) = \sum_{x∈X} p(x) * l(x),
where l(x) is the length of the codeword associated with x.
The expected length of an optimal code for a single symbol is bounded by
H(X) ≤ L < H(X) + 1,
and when blocks of n symbols are encoded together, the expected length per symbol L_n satisfies
H(X) ≤ L_n < H(X) + 1/n.
Hence, by using large block lengths we can achieve an expected code length per symbol arbitrarily close to the entropy.
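A minimal numeric sketch in Python (a hypothetical dyadic four-symbol source with matching codeword lengths, not taken from the card) that verifies the single-symbol bound:

import math

# Hypothetical dyadic source and the matching codeword lengths l(x) = -log2 p(x)
p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
l = {"a": 1, "b": 2, "c": 3, "d": 3}

H = -sum(px * math.log2(px) for px in p.values())  # entropy H(X)
L = sum(p[x] * l[x] for x in p)                    # expected length L(C)
print(H, L)  # 1.75 1.75, so H(X) <= L < H(X) + 1 holds with equality on the left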

5
Q

What is the expected length of a code, if the assumed distribution is incorrect?

A

The expected length under p(x) of the code assignment l(x) = ⌈log(1 / q(x))⌉ satisfies
H(p) + D(p||q) ≤ E_p[ l(X)] < H(p) + D(p||q) + 1.
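A small numeric check in Python (hypothetical distributions p and q) of the penalty D(p||q) paid for designing the code around the wrong distribution:

import math

p = {"a": 0.5, "b": 0.25, "c": 0.25}   # true distribution
q = {"a": 0.25, "b": 0.25, "c": 0.5}   # assumed (wrong) distribution

l = {x: math.ceil(math.log2(1 / q[x])) for x in q}      # l(x) = ceil(log 1/q(x))
E_l = sum(p[x] * l[x] for x in p)                       # E_p[l(X)]
H_p = -sum(px * math.log2(px) for px in p.values())     # H(p)
D_pq = sum(p[x] * math.log2(p[x] / q[x]) for x in p)    # D(p||q)

print(H_p + D_pq, E_l, H_p + D_pq + 1)  # 1.75 1.75 2.75: the bound holds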

6
Q

What are the classes of codes?

A

All codes ⊃ Nonsingular codes ⊃ Uniquely decodable codes ⊃ Instantaneous (prefix) codes

Nonsingular: every element of the range of X maps into a different string in D*
Uniquely decodable: the extension of the code (concatenation of codewords) is nonsingular
Prefix (instantaneous) code: no codeword is a prefix of any other codeword

An instantaneous code can be decoded without reference to future codewords since the end of a codeword is immediately recognizable.
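A small Python helper, not from the card, that tests the prefix (instantaneous) property of a candidate codebook; the example codewords are hypothetical:

def is_prefix_code(codewords):
    # No codeword may be a prefix of any other codeword
    for c1 in codewords:
        for c2 in codewords:
            if c1 != c2 and c2.startswith(c1):
                return False
    return True

print(is_prefix_code(["0", "10", "110", "111"]))  # True: instantaneous
print(is_prefix_code(["0", "01", "11"]))          # False: "0" is a prefix of "01"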

7
Q

Kraft's inequality

A

For any instantaneous code (prefix code) over an alphabet of size D, the codeword lengths l_1, l_2, . . . , l_m must satisfy the inequality

\sum_i D^{−l_i} ≤ 1

Conversely, given a set of codeword lengths that satisfy this inequality, there exists an instantaneous code with these word lengths.
It also holds for countably infinite sets of codewords, and, by the McMillan inequality, for uniquely decodable codes.
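A minimal check of the Kraft sum in Python (hypothetical codeword lengths, binary alphabet D = 2):

def kraft_sum(lengths, D=2):
    # sum_i D^(-l_i); a prefix code with these lengths exists iff the sum is <= 1
    return sum(D ** (-l) for l in lengths)

print(kraft_sum([1, 2, 3, 3]))  # 1.0  -> a prefix code exists (e.g. 0, 10, 110, 111)
print(kraft_sum([1, 1, 2]))     # 1.25 -> no prefix code with these lengths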

8
Q

McMillan inequality

A

It is the Kraft inequality extended to uniquely decodable D-ary codes.

The codeword lengths of any uniquely decodable D-ary code must satisfy the Kraft inequality

\sum D^{−l_i} ≤ 1.

Conversely, given a set of codeword lengths that satisfy this inequality, it is possible to construct a uniquely decodable code with these codeword lengths.
It also holds for countably infinite sets of codewords.

9
Q

Huffman code

A

The code is built as a tree. Initially, all nodes are leaf nodes; each contains the symbol itself, its weight (frequency of appearance) and, optionally, a link to a parent node, which makes it easy to read the code (in reverse) starting from a leaf. Internal nodes contain a weight, links to two child nodes, and an optional link to a parent node.

The process begins with the leaf nodes holding the probabilities of the symbols they represent. At each step, the two nodes with the smallest probability are taken and a new internal node is created with these two nodes as children; its weight is the sum of its children's weights. The process is then repeated on the new internal node and the remaining nodes (the two merged nodes are excluded), until only one node remains, which is the root of the Huffman tree.

Huffman coding generates a prefix code, meaning no codeword is a prefix of another. No other uniquely decodable code for the same source can have a lower expected length than the code constructed by the algorithm.
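A compact sketch of the merging procedure in Python using the standard-library heapq module; the symbol probabilities are hypothetical, and tie-breaking (hence the exact codewords) may differ between implementations:

import heapq

def huffman_code(probs):
    # Each heap entry: (weight, tie_breaker, {symbol: codeword-so-far})
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)   # the two nodes with smallest weight
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + cw for s, cw in c1.items()}
        merged.update({s: "1" + cw for s, cw in c2.items()})
        heapq.heappush(heap, (w1 + w2, count, merged))  # new internal node
        count += 1
    return heap[0][2]

print(huffman_code({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}))
# e.g. {'a': '0', 'b': '10', 'c': '110', 'd': '111'} (labels may vary)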

10
Q

If we know an approximation X’ of the source X, how many bits do we need to send, on average, to know X?

A

We need H(X|X’) bits, which can be upper bounded using Fano’s inequality. This is called conditional lossless source coding.

11
Q

Arithmetic coding

A

Arithmetic coding is a method of lossless data compression that encodes an entire message into a single fractional value between 0 and 1.

Range Division:
Each symbol in the input is assigned a probability (or frequency). The range [0,1) is divided into subranges proportional to these probabilities.

Encoding:
The message is encoded by successively narrowing the range based on the sequence of symbols in the input.

Precision:
Longer messages result in narrower ranges, requiring higher precision to represent.
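Not from the card: a minimal Python sketch of the interval narrowing (hypothetical symbol probabilities; a practical coder also handles finite precision, renormalization, and termination):

def arithmetic_encode(message, probs):
    # Cumulative subranges of [0, 1) proportional to the symbol probabilities
    cum, lo = {}, 0.0
    for s, p in probs.items():
        cum[s] = (lo, lo + p)
        lo += p
    low, high = 0.0, 1.0
    for s in message:
        width = high - low
        s_lo, s_hi = cum[s]
        low, high = low + width * s_lo, low + width * s_hi  # narrow the interval
    return (low + high) / 2  # any number inside [low, high) identifies the message

probs = {"a": 0.5, "b": 0.3, "c": 0.2}
print(arithmetic_encode("aba", probs))  # a single fraction in [0, 1), here 0.2875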

12
Q

Binary fraction to decimal coding

A

Start from the rightmost bit with a running sum of 0
Add the value of the bit to the sum
Divide the sum by 2
Continue with the next bit to the left
Example: the fraction bits 1100 (i.e., 0.1100 in binary)
1/2(0+0) = 0
1/2(0+0) = 0
1/2(1+0) = 0.5
1/2(1+0.5) = 0.75
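The same procedure as a short Python sketch (Horner's method applied to the fraction bits):

def binary_fraction_to_decimal(bits):
    # Process from the rightmost bit: value = (bit + value) / 2
    value = 0.0
    for b in reversed(bits):
        value = (int(b) + value) / 2
    return value

print(binary_fraction_to_decimal("1100"))  # 0.75, i.e. 0.1100 in binary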

13
Q

Operational rate

A

The operational rate, R, refers to the actual rate at which information is transmitted, stored, or processed in a system, taking into account practical constraints such as system design, coding schemes, and channel conditions. It is often expressed in bits per symbol; for an optimal code over blocks of N symbols it is constrained to
H(X) ≤ R < H(X) + 1/N.
It is also known as the bitrate.

14
Q

Fano's inequality

A

It is used to bound how many bits on average are needed to know X based on an approximation X̂.
For any estimator X̂ such that X → Y → X̂, with P_e = Pr(X ≠ X̂), we have

H(P_e) + P_e * log |X| ≥ H(X|X̂) ≥ H(X|Y).

This inequality can be weakened to
1 + P_e * log |X| ≥ H(X|Y)
or
P_e ≥ (H(X|Y) − 1) / log |X|.
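A small numeric illustration in Python (hypothetical values for H(X|Y) and |X|) of the weakened lower bound on the error probability:

import math

H_X_given_Y = 3.0    # hypothetical conditional entropy in bits
alphabet_size = 32   # hypothetical |X|

# P_e >= (H(X|Y) - 1) / log|X|
p_e_lower_bound = (H_X_given_Y - 1) / math.log2(alphabet_size)
print(p_e_lower_bound)  # 0.4: no estimator of X from Y can err less than 40% of the time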

15
Q

Expected code length

A

H(X) ≤ L_n < H(X) + 1/n,
where L_n is the expected codeword length per input symbol when an optimal code is used over blocks of n symbols.

16
Q

How can a random variable be discretized?

A

Quantize the continuous random variable into bins of width Δ along the x-axis; the entropy of the quantized variable X^Δ then satisfies
H(X^Δ) ≈ h(X) − log(Δ).
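A quick numeric check in Python, assuming a hypothetical uniform distribution on [0, 1), for which the differential entropy is h(X) = 0 bits:

import math

delta = 1 / 256            # bin width
n_bins = int(1 / delta)    # uniform on [0, 1): every bin has probability delta
H_quantized = -sum(delta * math.log2(delta) for _ in range(n_bins))
print(H_quantized, 0 - math.log2(delta))  # both 8.0: H(X^Δ) ≈ h(X) - log(Δ)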