Lecture 4 - Huffman Encoding Flashcards
What kind of text compression method is Huffman encoding?
Statistical
What is each character replaced by?
A variable-length code
What are characters represented by?
Unique codewords of varying lengths
What is special about frequently occurring characters?
They are represented by shorter codewords than less frequent characters.
Can a codeword be a prefix of another codeword? Why?
No
- this would make decompression ambiguous: e.g. with codewords 0, 01 and 1, the input 01 could decode as one character or two
What is this method based on?
The Huffman Tree.
What is a Huffman tree?
A binary tree where each character is represented by a leaf node, and the codeword for a character is given by the path from the root to that leaf.
Bit code for a Huffman tree.
left = 0
right = 1
- the prefix property follows because characters sit only at leaves, so no codeword's root-to-leaf path can continue on to another character
Steps for building a Huffman tree? (a code sketch follows this list)
- add leaves (one per char), weighted by that char's frequency
- repeatedly add a parent to the two parentless nodes of smallest weight
- the weight of the new node equals the sum of the weights of its child nodes
- stop when only one parentless node (the root) remains
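A minimal sketch of the build in Python, using heapq as the priority queue; the function name build_huffman_tree and the nested-tuple tree representation are illustrative (not the lecture's notation), and it assumes at least two distinct characters:

```python
import heapq
from collections import Counter
from itertools import count

def build_huffman_tree(text):
    # Leaves are single chars; internal nodes are (left, right) tuples.
    freqs = Counter(text)
    ids = count()  # tie-breaker so the heap never has to compare two trees
    heap = [(weight, next(ids), ch) for ch, weight in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:                  # m - 1 merges for m distinct chars
        w1, _, a = heapq.heappop(heap)    # the two parentless nodes ...
        w2, _, b = heapq.heappop(heap)    # ... of smallest weight
        heapq.heappush(heap, (w1 + w2, next(ids), (a, b)))  # weight = sum of children
    return heap[0][2]                     # the last parentless node is the root

tree = build_huffman_tree("abracadabra")
```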
What is the weighted path length of a Huffman tree?
sum of (weight * distance to root) for each leaf
What is special about a Huffman tree in relation to WPL?
Huffman trees have a minimum WPL over all binary trees with the given leaf nodes
Does a Huffman tree need to be unique?
No, there can be many solutions that are optimal.
Why do we care about WPL in relation to compression?
Because the WPL is exactly the number of bits in the compressed file (taking each leaf's weight to be its character's frequency)
- bits = sum over chars (frequency of char × code length of char)
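A worked check in Python: below is one optimal Huffman tree for "abracadabra", hard-coded as nested tuples (other optimal shapes exist); the bit count it gives is the tree's WPL:

```python
from collections import Counter

def code_lengths(tree, depth=0):
    # A leaf's depth is the length of its codeword (tree = nested tuples, chars at leaves).
    if isinstance(tree, str):
        return {tree: depth}
    left, right = tree
    return {**code_lengths(left, depth + 1), **code_lengths(right, depth + 1)}

tree = ('a', (('c', 'd'), ('b', 'r')))  # one optimal tree for "abracadabra"
freqs = Counter("abracadabra")          # a:5, b:2, r:2, c:1, d:1
bits = sum(freqs[ch] * d for ch, d in code_lengths(tree).items())
print(bits)                             # 23 = the WPL of this tree
```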
What is the complexity of building a Huffman tree?
O(n + m log m) overall, where n is the length of the text and m is the number of distinct characters
- O(n) to find the frequencies
- O(m log m) to construct the code: O(m) to build the heap, then O(log m) per insertion/removal, with m - 1 merge iterations before a single node remains
What is the complexity of building a Huffman tree if m (the number of distinct chars) is treated as a constant?
O(n)
- the O(n) frequency count dominates, since the O(m log m) tree construction becomes constant
What does compression use?
A code table (an array of codewords indexed by character); each codeword is found by tracing the path from the root to that character's leaf in the built tree.
- O(m log m) to build the table: m characters, so m paths of length <= log m
- O(n) to compress: n characters in the text, so n O(1) table lookups
Compression is O(m log m + n)
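A sketch of the table build and the compression pass, reusing the illustrative nested-tuple tree from the worked example above:

```python
def make_code_table(tree, prefix=""):
    # One walk over the tree: a left edge appends '0', a right edge appends '1'.
    if isinstance(tree, str):             # leaf: the path so far is the codeword
        return {tree: prefix}
    left, right = tree
    table = make_code_table(left, prefix + "0")
    table.update(make_code_table(right, prefix + "1"))
    return table

tree = ('a', (('c', 'd'), ('b', 'r')))
table = make_code_table(tree)             # {'a': '0', 'c': '100', 'd': '101', 'b': '110', 'r': '111'}
compressed = "".join(table[ch] for ch in "abracadabra")  # n O(1) lookups
print(compressed)                         # 01101110100010101101110 (23 bits)
```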
What does decompression use?
The tree directly, which makes decompression O(n log m).
- each codeword is decoded by following its bits from the root down to a leaf (a path of length <= log m) and replacing it with that leaf's character; this happens once per character of the original text
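A sketch of the decoding walk over the same illustrative tree; the bit string below is the output of the compression sketch above:

```python
def decompress(bits, tree):
    # Follow each bit from the root; emit the char on reaching a leaf, then restart.
    out, node = [], tree
    for b in bits:
        node = node[0] if b == "0" else node[1]  # 0 = go left, 1 = go right
        if isinstance(node, str):                # reached a leaf
            out.append(node)
            node = tree                          # next codeword starts at the root
    return "".join(out)

tree = ('a', (('c', 'd'), ('b', 'r')))
print(decompress("01101110100010101101110", tree))  # abracadabra
```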
If we assume m is a constant, what is the time complexity of compression and decompression?
O(n)
What is a problem with Huffman Encoding?
The Huffman tree must be stored with the compressed file.
Why must the Huffman tree be stored with the compressed file?
Because the decompressor needs the same tree to map bit paths back to characters; without it, decompression would be impossible.
What are the alternatives to Huffman Encoding?
- use a fixed set of frequencies based on typical values for text (this will likely reduce the compression ratio)
- use adaptive Huffman coding: the same tree is built and adapted by both the compressor and the decompressor as characters are encoded/decoded (this is likely to slow down compression and decompression)