Lecture 4 - Huffman Encoding Flashcards

1
Q

What kind of text compression method is Huffman encoding?

A

Statistical

2
Q

What is each character replaced by?

A

A variable-length code

3
Q

What are characters represented by?

A

Unique codewords of varying lengths

4
Q

What is special about frequently occurring characters?

A

They will be represented by a shorter code word than those that are less frequent.

5
Q

Can a codeword be a prefix of another codeword? Why?

A

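No.
- This would make decompression ambiguous: after reading the shorter codeword, the decompressor could not tell whether it is a complete character or the start of the longer codeword.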
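A minimal Python sketch of checking the prefix property, assuming codewords are given as strings of '0' and '1'. The helper name `is_prefix_free` is hypothetical, not from the lecture. It relies on the fact that, after sorting, a codeword that is a prefix of another is immediately followed by a word starting with it, so only neighbours need comparing.

```python
def is_prefix_free(codewords):
    """Return True if no codeword is a prefix of another (hypothetical helper)."""
    # After sorting, a word that prefixes another is immediately followed by
    # a word that starts with it, so checking neighbours is sufficient.
    words = sorted(codewords)
    for current, following in zip(words, words[1:]):
        if following.startswith(current):
            return False  # duplicate or prefix: decoding would be ambiguous
    return True


print(is_prefix_free(["0", "10", "11"]))  # True
print(is_prefix_free(["0", "01", "1"]))   # False
```

With codewords 0, 01 and 1, the bit string 01 could decode as the 01 character or as the 0 character followed by the 1 character, which is exactly the ambiguity the prefix property rules out.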

6
Q

What is this method based on?

A

The Huffman Tree.

7
Q

What is a Huffman tree?

A

A binary tree where each character is represented by a leaf node and the codeword for a character is given by the path from the root to the leaf.

8
Q

What is the bit code for a Huffman tree?

A

left = 0
right = 1
- since characters are stored only at leaves, no root-to-leaf path is a prefix of another, so the prefix property follows
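A minimal Python sketch of reading codewords off a tree with left = 0 and right = 1. It assumes a hypothetical tree representation (not from the lecture): a leaf is a one-character string and an internal node is a (left, right) tuple. `codes_from_tree` is an illustrative name.

```python
def codes_from_tree(tree, prefix="", table=None):
    """Map each leaf char to its root-to-leaf path: '0' for left, '1' for right.

    Hypothetical representation: a leaf is a one-character string and an
    internal node is a (left, right) tuple.
    """
    if table is None:
        table = {}
    if isinstance(tree, str):
        # Leaf: the path taken so far is the codeword. A tree with a single
        # leaf would give an empty path, so we assume "0" in that case.
        table[tree] = prefix or "0"
        return table
    left, right = tree
    codes_from_tree(left, prefix + "0", table)   # going left appends 0
    codes_from_tree(right, prefix + "1", table)  # going right appends 1
    return table


print(codes_from_tree(("a", ("b", "c"))))  # {'a': '0', 'b': '10', 'c': '11'}
```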

9
Q

What are the steps for building a Huffman tree?

A
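  • add leaves (one per character), each weighted by the character's frequency
  • repeatedly add a parent to the two parentless nodes of smallest weight
  • the weight of the new node is the sum of the weights of its child nodes
  • stop when only one parentless node (the root) remains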
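The steps above can be sketched in Python with a min-heap (`heapq`) as the pool of parentless nodes. This is a sketch, not the lecture's implementation: it assumes a hypothetical representation where a leaf is a character string and an internal node is a (left, right) tuple, and `build_huffman_tree` is an illustrative name. A running counter breaks weight ties so the heap never has to compare two nodes directly.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_tree(text):
    """Return a Huffman tree: a leaf is a char, an internal node is (left, right)."""
    freqs = Counter(text)              # weight of each leaf = char frequency
    tiebreak = count()                 # avoids comparing nodes when weights tie
    heap = [(w, next(tiebreak), ch) for ch, w in freqs.items()]
    heapq.heapify(heap)                # one leaf per distinct char
    while len(heap) > 1:
        w1, _, n1 = heapq.heappop(heap)  # the two parentless nodes
        w2, _, n2 = heapq.heappop(heap)  # of smallest weight
        # The new parent's weight is the sum of its children's weights.
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (n1, n2)))
    return heap[0][2] if heap else None  # the last parentless node is the root


print(build_huffman_tree("aaaabbc"))  # (('c', 'b'), 'a')
```

Ties between equal weights can be broken differently, which is one reason a Huffman tree need not be unique.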
10
Q

What is the weighted path length of a Huffman tree?

A

sum of (weight * distance to root) for each leaf
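A minimal Python sketch of the WPL definition, assuming a hypothetical tree representation (a leaf is a character, an internal node is a (left, right) tuple) and a dict of leaf weights. For weights a = 4, b = 2, c = 1 and the tree ((c, b), a), the WPL is 1×2 + 2×2 + 4×1 = 10.

```python
def weighted_path_length(tree, weights, depth=0):
    """Sum of weight * distance-to-root over all leaves of the tree."""
    if isinstance(tree, str):   # leaf: contributes its weight times its depth
        return weights[tree] * depth
    left, right = tree          # internal node: both children are one level deeper
    return (weighted_path_length(left, weights, depth + 1)
            + weighted_path_length(right, weights, depth + 1))


weights = {"a": 4, "b": 2, "c": 1}
print(weighted_path_length((("c", "b"), "a"), weights))  # 10
print(weighted_path_length((("a", "b"), "c"), weights))  # 13
```

The second tree has the same leaves but puts the heaviest one deeper, so its WPL is larger.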

11
Q

What is special about a Huffman tree in relation to WPL?

A

Huffman trees have the minimum WPL over all binary trees with the given leaves and leaf weights.

12
Q

Is a Huffman tree unique?

A

No, there can be many different optimal trees, all with the same minimum WPL.

13
Q

Why do we care about WPL in relation to compression?

A

Because when the leaf weights are the character frequencies, the WPL equals the number of bits in the compressed file (not counting the stored tree):

- bits = sum over chars (frequency of char × code length of char)
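A small Python check of this formula, assuming codewords are stored as strings of '0'/'1' in a hypothetical dict. The code used for "aaaabbc" below is one valid Huffman code for those frequencies (a = 4, b = 2, c = 1); it gives 4×1 + 2×2 + 1×2 = 10 bits, against 14 bits for a fixed 2-bit code.

```python
from collections import Counter


def compressed_bits(text, codes):
    """Number of output bits: sum over chars of frequency(char) * len(codeword)."""
    freqs = Counter(text)
    return sum(freqs[ch] * len(codes[ch]) for ch in freqs)


text = "aaaabbc"
codes = {"a": "1", "b": "01", "c": "00"}   # one Huffman code for this text
fixed = {"a": "00", "b": "01", "c": "10"}  # fixed-length 2-bit code, for comparison

print(compressed_bits(text, codes))            # 10
print(compressed_bits(text, fixed))            # 14
# Sanity check: matches the length of the actual encoded bit string.
print(len("".join(codes[ch] for ch in text)))  # 10
```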

14
Q

What is the complexity of building a Huffman tree?

A

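O(n + m log m) overall, where n is the length of the text and m is the number of distinct characters

  • O(n) to find the frequencies
  • O(m log m) to construct the code: the m leaves are kept in a priority queue (heap), and each iteration removes two nodes and inserts one, at O(log m) per operation
  • there are m-1 iterations before only the root is left in the heap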
15
Q

What is the complexity of building a Huffman tree if m (the number of distinct chars) is treated as a constant?

A

O(n), since the O(m log m) term becomes a constant, leaving only the O(n) frequency count

16
Q

What does compression use?

A

A code table (array of codes indexed by char). The table is filled in using the built tree to get the path (codeword) for each char.

-> O(m log m) to build the table: m characters, so m paths, each of length O(log m) if the tree is reasonably balanced (a very skewed tree can have paths of length up to m-1)

-> O(n) to compress: n characters in the text, so n O(1) lookups

Compression is O(m log m + n)
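A minimal Python sketch of table-based compression, assuming a hypothetical tree representation (a leaf is a character, an internal node is a (left, right) tuple) and, for readability, an output string of '0'/'1' characters rather than packed bits. A dict stands in for the array indexed by char; `build_code_table` and `compress` are illustrative names. The table is built once from the tree, then each character of the text costs one lookup.

```python
def build_code_table(tree):
    """Map each char to its codeword by walking every root-to-leaf path."""
    table = {}
    stack = [(tree, "")]
    while stack:
        node, path = stack.pop()
        if isinstance(node, str):
            table[node] = path or "0"  # assumption: a single-leaf tree uses "0"
        else:
            left, right = node
            stack.append((left, path + "0"))   # left edge = 0
            stack.append((right, path + "1"))  # right edge = 1
    return table


def compress(text, tree):
    """Encode text as a '0'/'1' string using a table built once from the tree."""
    table = build_code_table(tree)
    return "".join(table[ch] for ch in text)  # n lookups, one per character


print(compress("aaaabbc", (("c", "b"), "a")))  # 1111010100
```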

17
Q

What does decompression use?

A

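It uses the tree directly, which means decompression is O(n log m).

This is because each codeword is decoded by following the path from the root of the Huffman tree down to a leaf (length O(log m) in a reasonably balanced tree), once for each of the n characters.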
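A minimal Python sketch of tree-based decompression, assuming a hypothetical tree representation (a leaf is a character, an internal node is a (left, right) tuple) and input given as a string of '0'/'1' characters; `decompress` is an illustrative name. Each bit moves one step down the tree, and reaching a leaf emits a character and restarts at the root, so the work is proportional to the total length of the codewords read.

```python
def decompress(bits, tree):
    """Decode a '0'/'1' string by walking the tree from the root for each codeword."""
    if isinstance(tree, str):
        # Assumption: in a single-leaf tree every bit encodes that one char.
        return tree * len(bits)
    out = []
    node = tree
    for b in bits:
        node = node[0] if b == "0" else node[1]  # left on 0, right on 1
        if isinstance(node, str):                # reached a leaf: emit its char
            out.append(node)
            node = tree                          # restart at the root
    return "".join(out)


print(decompress("1111010100", (("c", "b"), "a")))  # aaaabbc
```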

18
Q

If we assume m is a constant, what is the time complexity of compression and decompression?

A

O(n)

19
Q

What is a problem with Huffman Encoding?

A

The Huffman tree must be stored with the compressed file.

20
Q

Why must the Huffman tree be stored with the compressed file?

A

Because otherwise decompression would be impossible: the decompressor needs the tree to map codewords back to characters.

21
Q

What are the alternatives to Huffman Encoding?

A
  • use a fixed set of frequencies based on typical values for text (will likely reduce the compression ratio)
  • use adaptive Huffman coding: the same tree is built and adapted by both the compressor and the decompressor as characters are encoded/decoded (likely to slow down compression and decompression)