L15 - Data Compression: Entropy and Huffman Coding Flashcards

1
Q

What is the purpose of compression?

A

To remove / reduce redundancy in data being transmitted.

2
Q

Define Entropy…

A
  • A measure of the average information content of a set of events, each element of which is associated with a probability P: the expected value of log(1/P) over the set.
  • We can’t compress better than the entropy of the data.
3
Q

What does Entropy provide a benchmark for?

A
  • Benchmark for how well data can be compressed.
  • The average number of bits per symbol of any uniquely decodable code must be greater than or equal to the entropy.
4
Q

What is the formula for Entropy H?

A
  • H = Sum( Pi * log2(1/Pi) ) for i = 1..M, where M is the number of symbol / probability pairs.
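
As a quick illustration, a minimal Python sketch of this formula (the probabilities passed in are illustrative):

```python
import math

def entropy(probs):
    """Shannon entropy: Sum( Pi * log2(1/Pi) ) over all symbol probabilities."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# A fair coin (two equally likely symbols) has exactly 1 bit/symbol of entropy.
print(entropy([0.5, 0.5]))  # 1.0
```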
5
Q

How would you calculate the entropy of written plain English?

A
  • Sum( P(letter) * log2(1/P(letter)) ) over every letter in the alphabet, using each letter's relative frequency in English text as its probability.
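
A minimal Python sketch of this calculation, assuming letter probabilities are estimated from a sample text (the helper name and the toy corpus are illustrative; real estimates need far larger samples):

```python
import math
from collections import Counter

def letter_entropy(text):
    """Estimate per-letter entropy from observed letter frequencies."""
    letters = [c for c in text.lower() if c.isalpha()]
    counts = Counter(letters)
    n = len(letters)
    # (c / n) is the estimated probability of each letter.
    return sum((c / n) * math.log2(n / c) for c in counts.values())

print(letter_entropy("the quick brown fox jumps over the lazy dog"))
```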
6
Q

What effect does an increase or decrease in entropy have on the compressibility of the data?

A
  • Increase in entropy -> Decrease in compression ability.
  • Decrease in entropy -> Increase in compression ability.
7
Q

The limit to which data can be compressed will always be…

A

The entropy of the data source.

8
Q

What is the purpose of the Huffman Coding Algorithm?

A

An algorithm used to compress data from various media, e.g. text, audio, etc.

9
Q

What are the steps of the Huffman Coding Algorithm?

A
  • Sort the symbols by their probabilities in decreasing order (highest at top).
  • Repeatedly merge the two lowest-probability symbols/nodes, giving the new node the sum of their probabilities.
  • Terminate when the sum of the two remaining probabilities == 1 (a single root is left).
  • Assign 0 and 1 to the two branches of each merge; each symbol's code is read from the root to its leaf (see the sketch below).
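
A minimal Python sketch of these steps using a min-heap (the exact 0/1 assignment is a convention, so the codes, though optimal, can differ between implementations):

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build Huffman codes from a {symbol: probability} dict.

    Repeatedly merges the two lowest-probability nodes until a single
    node (probability 1.0) remains, accumulating each symbol's code bits.
    """
    tie = count()  # tie-breaker so heapq never has to compare dicts
    heap = [(p, next(tie), {sym: ""}) for sym, p in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)  # lowest probability
        p2, _, c2 = heapq.heappop(heap)  # second lowest
        # Prefix the two merged subtrees' partial codes with 0 and 1.
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]

print(huffman_codes({"A": 0.16, "B": 0.51, "C": 0.09, "D": 0.13, "E": 0.11}))
```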
10
Q

Do a Huffman coding example on the data { A:0.16, B:0.51, C:0.09, D:0.13, E:0.11 }, calculate the average number of bits per symbol, and compare this to the entropy…

A
  • { B -> 1, A -> 011, D -> 010, E -> 001, C -> 000 }
  • Bits per symbol -> 1*0.51 + 3*0.16 + 3*0.13 + 3*0.11 + 3*0.09 = 1.98 bits/symbol.
  • The entropy is 1.964 bits/symbol.
  • Thus the calculated bits/symbol is close to the entropy, so this can be considered a good (near-optimal) code.
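
Checking these numbers with a few lines of Python (the code lengths are taken from the codeword table above):

```python
import math

probs = {"A": 0.16, "B": 0.51, "C": 0.09, "D": 0.13, "E": 0.11}
lengths = {"B": 1, "A": 3, "D": 3, "E": 3, "C": 3}  # from the codes above

avg = sum(probs[s] * lengths[s] for s in probs)  # average code length
H = sum(p * math.log2(1 / p) for p in probs.values())  # entropy
print(f"average = {avg:.3f} bits/symbol, entropy = {H:.3f}")
# average = 1.980 bits/symbol, entropy = 1.964
```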
11
Q

What type of algorithm is Huffman Coding?

A

Greedy

12
Q

What is the time complexity of Huffman Coding?

A

O( n log n ), where n is the number of distinct symbols.

13
Q

Give the pros and cons of Huffman Coding…

A

Pros:
- Prefix-free -> instantly and uniquely decodable.
- Used in many compression standards -> JPEG, MPEG, etc.

Cons:
- Needs the probabilities of the symbols to be known in advance.
- Error propagation -> a single bit error in the compressed stream can desynchronise the decoder and corrupt the decoding of subsequent symbols.
