1.8 Huffman compression Flashcards
What is the basic idea of compression?
To encode frequently-occurring items using fewer bits.
How many bits does each uncompressed ASCII character use?
8 bits each.
Compress the text ‘AAA Go’ using Huffman coding.
0 0 0 10 110 111.
How many bits does the compressed version of ‘AAA Go’ use?
11 bits.
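The bit counts above can be checked with a short sketch. The code assignment (A: 0, space: 10, G: 110, o: 111) is inferred from the answer card and is an assumption; the variable names are illustrative only.

```python
# Codes inferred from the answer "0 0 0 10 110 111" (an assumption).
codes = {"A": "0", " ": "10", "G": "110", "o": "111"}
text = "AAA Go"

# Uncompressed ASCII uses 8 bits per character.
uncompressed_bits = 8 * len(text)

# The compressed size is the sum of the code lengths of each character.
compressed_bits = sum(len(codes[ch]) for ch in text)

print(uncompressed_bits, compressed_bits)  # prints: 48 11
```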
What does the dictionary in Huffman compression do?
It assigns shorter codes to frequent items.
What is the compressed code for the ASCII string ‘00000000 00000000 11111111 00000100’ using the given dictionary?
00 00 01 111.
Is any code in the provided dictionary a prefix of another code?
No.
What is the output for decompressing the code ‘00 01 00’?
00000000 11111111 00000000.
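A minimal sketch of dictionary-based compression and decompression follows. The full dictionary is not reproduced in these cards, so the three entries below are inferred from the answers above and are an assumption; the function names are hypothetical.

```python
# Partial dictionary inferred from the cards above (an assumption).
dictionary = {"00000000": "00", "11111111": "01", "00000100": "111"}

def compress(byte_text):
    # Replace each space-separated 8-bit item with its code.
    return " ".join(dictionary[item] for item in byte_text.split())

def decompress(code_text):
    # Invert the dictionary, then replace each code with its item.
    reverse = {code: item for item, code in dictionary.items()}
    return " ".join(reverse[code] for code in code_text.split())

def is_prefix_free(codes):
    # True if no code is a prefix of a different code.
    codes = list(codes)
    return not any(a != b and b.startswith(a) for a in codes for b in codes)

print(compress("00000000 00000000 11111111 00000100"))  # 00 00 01 111
print(decompress("00 01 00"))  # 00000000 11111111 00000000
print(is_prefix_free(dictionary.values()))  # True
```

Because no code is a prefix of another, a decoder reading the bits left to right never has to guess where one code ends and the next begins.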
What is a character frequency table?
A table that contains each distinct character from the input string and each character’s number of occurrences.
What does the pseudocode ‘BuildCharacterFrequencyTable’ do?
It builds a frequency table for characters in an input string.
What character frequency is assigned to the letter ‘A’ in the string ‘APPLES AND BANANAS’?
5.
What is Huffman coding?
A common compression technique that assigns fewer bits to frequent items using a binary tree.
What is the first step in Huffman coding?
Determine the frequencies of each item.
What is the compressed output for the text ‘aabbbaaccd’ using Huffman coding?
0 0 10 10 10 0 0 110 110 111.
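The output above can be reproduced by looking up each character's code. The codes (a: 0, b: 10, c: 110, d: 111) are inferred from the answer and are an assumption; `encode` is a hypothetical helper name.

```python
# Codes inferred from the answer card (an assumption).
codes = {"a": "0", "b": "10", "c": "110", "d": "111"}

def encode(text, codes):
    # Replace each character with its code, space-separated for readability.
    return " ".join(codes[ch] for ch in text)

encoded = encode("aabbbaaccd", codes)
print(encoded)  # 0 0 10 10 10 0 0 110 110 111

# Total bits used, ignoring the spaces added for readability.
total_bits = len(encoded.replace(" ", ""))
print(total_bits)  # 19
```

The most frequent character, a, gets the 1-bit code, so the 10-character text needs 19 bits rather than 80 bits of uncompressed ASCII.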
How is the encoding for each leaf node obtained in Huffman coding?
By traversing from the top node to the leaf, appending 0 for left branches and 1 for right branches.
Fill in the blank: Prior to compression, a _______ must be built for an input string.
character frequency table.
What occurs when a character appears for the first time in the frequency table?
Its frequency is set to 1.
What happens to the frequency of a character when it appears again in the frequency table?
The existing frequency is incremented.
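The two cases above (first appearance and repeat appearance) can be sketched as follows. This is one possible Python rendering of the idea behind BuildCharacterFrequencyTable, not the original pseudocode; the snake_case name is an assumption.

```python
def build_character_frequency_table(input_string):
    table = {}
    for ch in input_string:
        if ch in table:
            # Character seen before: increment the existing frequency.
            table[ch] += 1
        else:
            # First appearance: set the frequency to 1.
            table[ch] = 1
    return table

table = build_character_frequency_table("APPLES AND BANANAS")
print(table["A"])  # 5
```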
What is the total frequency count of the string ‘seems he fleed’ for the letter ‘e’?
5.
True or False: In Huffman coding, the merging of nodes continues until only one node exists.
True.
What is the first merge in Huffman coding for the frequencies D: 3 and E: 3?
D and E: 6
This merge yields the smallest sum: 3 + 3 = 6.
What is the second merge in Huffman coding after merging D and E?
DE and B: 10
DE is 6 (from the first merge). B is 4. So 6 + 4 = 10.
What is the third merge in Huffman coding after DE and B?
DEB and C: 50
DEB is 10 (from the second merge). C is 40. So 10 + 40 = 50.
What is the fourth merge in Huffman coding after DEB and C?
DEBC and A: 100
DEBC is 50 (from the third merge), and A is 50. So 50 + 50 = 100.
What is the fifth merge in Huffman coding?
None
Only one node remains: DEBCA. No more merges are possible.
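The merge sequence above can be traced with a min-heap. This is a sketch that assumes the frequencies A: 50, C: 40, B: 4, D: 3, E: 3 used in the cards; `merge_sequence` is a hypothetical name, and ties are broken alphabetically by label, so the order of labels within a merge may differ from the cards.

```python
import heapq

def merge_sequence(frequencies):
    # Heap entries are (frequency, label); ties break on the label string.
    heap = [(freq, label) for label, freq in frequencies.items()]
    heapq.heapify(heap)
    merges = []
    while len(heap) > 1:
        # Remove the two lowest-frequency nodes and merge them.
        f1, l1 = heapq.heappop(heap)
        f2, l2 = heapq.heappop(heap)
        merges.append((l1, l2, f1 + f2))
        heapq.heappush(heap, (f1 + f2, l1 + l2))
    return merges

merges = merge_sequence({"A": 50, "C": 40, "B": 4, "D": 3, "E": 3})
sums = [total for _, _, total in merges]
print(sums)  # [6, 10, 50, 100]
```

Five starting nodes always produce exactly four merges, since each merge reduces the node count by one.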
What is the code for character A in Huffman coding?
0
A is one branch to the left. A left branch appends a 0 to the code.
What is the code for character C in Huffman coding?
10
C is a right branch (1) and then a left branch (0), yielding 10.
What is the code for character B in Huffman coding?
110
Right, right, left yields 1, 1, 0, or 110.
What is the code for character D in Huffman coding?
1110
Right, right, right, left yields 1, 1, 1, 0, or 1110.
What is the code for character E in Huffman coding?
1111
Right, right, right, right yields 1111.
If 5 unique characters can each be uniquely encoded in 3 bits, how many bits are needed for a 100-character text?
300
100 * 3 = 300.
For the Huffman code determined, how many bits are needed for the 100-character text with frequencies A: 50, C: 40, B: 4, D: 3, E: 3 and code lengths A: 1, C: 2, B: 3, D: 4, E: 4?
166
50 * 1 + 40 * 2 + 4 * 3 + 3 * 4 + 3 * 4 = 166.
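The comparison between the fixed 3-bit encoding and the Huffman encoding can be checked with a few lines. The frequencies and codes below are taken from the merge and code cards above; the variable names are illustrative.

```python
# Frequencies from the merge cards and codes from the code cards above.
frequencies = {"A": 50, "C": 40, "B": 4, "D": 3, "E": 3}
codes = {"A": "0", "C": "10", "B": "110", "D": "1110", "E": "1111"}

# Fixed-length encoding: 3 bits for each of the 100 characters.
fixed_bits = 3 * sum(frequencies.values())

# Huffman encoding: each character costs frequency * code length.
huffman_bits = sum(frequencies[ch] * len(codes[ch]) for ch in frequencies)

print(fixed_bits, huffman_bits)  # prints: 300 166
```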
What data members do leaf nodes in a Huffman tree have?
A character and an integer frequency.
Leaf nodes represent individual characters and their frequencies.
What data members do internal nodes in a Huffman tree have?
Left and right child nodes, and an integer frequency value.
Internal nodes represent the sum of the frequencies of their child nodes.
What is the first step in building a Huffman tree?
Build the frequency table.
This table lists the frequency of each character in the input string.
How are parent nodes created in a Huffman tree?
By dequeuing the two lowest-frequency nodes from the priority queue and making them the children of a new parent node.
The parent’s frequency is the sum of the child frequencies.
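The tree-building steps described above can be sketched with Python's `heapq` as the priority queue. The `Node` class, `huffman_build_tree`, and `leaf_depths` are hypothetical names, and the sketch assumes a non-empty input string. Which child ends up on the left or right depends on tie-breaking, so the exact bit patterns may differ from the cards even though the code lengths agree.

```python
import heapq
from collections import Counter
from itertools import count

class Node:
    def __init__(self, frequency, character=None, left=None, right=None):
        self.frequency = frequency
        self.character = character  # set only for leaf nodes
        self.left = left
        self.right = right

    def is_leaf(self):
        return self.left is None and self.right is None

def huffman_build_tree(input_string):
    order = count()  # tie-breaker so the heap never compares Node objects
    heap = [(freq, next(order), Node(freq, ch))
            for ch, freq in Counter(input_string).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Dequeue the two lowest-frequency nodes and create their parent.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        parent = Node(f1 + f2, left=left, right=right)
        heapq.heappush(heap, (parent.frequency, next(order), parent))
    return heap[0][2]

def leaf_depths(node, depth=0):
    # The depth of each leaf equals the length of that character's code.
    if node.is_leaf():
        return {node.character: depth}
    depths = leaf_depths(node.left, depth + 1)
    depths.update(leaf_depths(node.right, depth + 1))
    return depths

root = huffman_build_tree("BANANAS")
print(root.frequency)  # 7
```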
What does the function HuffmanGetCodes do?
It builds Huffman codes for each character.
The codes are generated by tracing a path from the root to each character’s leaf node.
What is the output of the HuffmanGetCodes function when called on a tree with root node 7?
A: 0, S: 101, B: 100, N: 11
The output is a dictionary mapping characters to their respective Huffman codes.
How many entries does the character frequency table have for the string ‘zyBooks’?
6
One entry exists per distinct character.
What is the frequency of the parent node for nodes B and k in the priority queue?
2
The parent’s frequency is the sum of the child frequencies, which are both 1.
What is the purpose of the HuffmanGetCodes function?
To generate Huffman codes for each character based on the tree structure
The codes are built recursively; left branches add a 0, and right branches add a 1.
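The recursive code construction can be sketched as below. The tree is written by hand as nested tuples, with its shape inferred from the codes A: 0, S: 101, B: 100, N: 11 listed above; that shape and the snake_case function name are assumptions, not the original pseudocode.

```python
def huffman_get_codes(node, prefix, output):
    # Leaves are 1-character strings; internal nodes are (left, right) tuples.
    if isinstance(node, str):
        output[node] = prefix
    else:
        left, right = node
        huffman_get_codes(left, prefix + "0", output)   # left branch adds a 0
        huffman_get_codes(right, prefix + "1", output)  # right branch adds a 1
    return output

# Assumed shape: the root has A on the left and an internal node on the right;
# that node has the (B, S) parent on the left and N on the right.
tree = ("A", (("B", "S"), "N"))
codes = huffman_get_codes(tree, "", {})
print(codes)  # {'A': '0', 'B': '100', 'S': '101', 'N': '11'}
```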
What is the Huffman code for character A in the tree built from ‘BANANAS’?
0
A’s leaf node is reached via a left branch from the root.
What is the Huffman code for character B in the tree built from ‘BANANAS’?
100
Right, left, left from the root yields 1, 0, 0, or 100.
What is the Huffman code for character S in the tree built from ‘BANANAS’?
101
S is reached by moving right from the root to node 4, left to node 2, then right.
What is the Huffman code for character N in the tree built from ‘BANANAS’?
11
N is reached by moving right from the root to node 4, then right again.
What is the maximum length of a Huffman code in the tree built from ‘BANANAS’?
3
This length corresponds to the longest path from the root to a leaf.
What is the first step in compressing data using Huffman coding?
Obtain Huffman codes for each character
Codes are generated based on the frequency of characters in the input string.
Fill in the blank: The result of HuffmanCompress function is a _______.
compressed binary string
The result is formed by concatenating bit codes corresponding to each character.
What does the HuffmanDecompress function do?
Decompresses Huffman encoded data by tracing the tree based on bit values
It starts at the root and follows left or right based on each bit until a leaf is reached.
What is the output of HuffmanCompress(‘BANANAS’)?
100 0 11 0 11 0 101
Each character is replaced by its code from the tree above (B: 100, A: 0, N: 11, S: 101), and the codes are concatenated.
True or False: Each distinct character in a Huffman tree has a unique code.
True
Each character has its own leaf node, and each leaf has a distinct path from the root, so no two characters share a code.
What is the Huffman code for character P in the tree built from ‘APPLES AND BANANAS’?
011
Left, right, right from the root yields 0, 1, 1, or 011.
What is the length of the longest Huffman code from the tree built from ‘APPLES AND BANANAS’?
4
The longest path from root to leaf determines the maximum code length.
Fill in the blank: To decompress a Huffman coded string, one must start at the _______.
root of the Huffman tree
Traversal follows the left or right child based on the bit value until reaching a leaf.
What character does the bit sequence ‘00’ represent in the decompression process?
space
Left, left from the root yields 0, 0, which reaches the space character's leaf.
What is the first step in the HuffmanDecompress function?
Initialize node to treeRoot and result to an empty string
In the HuffmanDecompress function, what does the bit value determine?
It determines whether to go to the left or right child
What happens when a leaf node is reached in the HuffmanDecompress function?
The character is added to the result and the node is reset to treeRoot
Fill in the blank: The function HuffmanDecompress takes _______ and _______ as parameters.
[compressedString], [treeRoot]
What is the first decoded character from the compressed string 0111101000101 using the provided tree?
D
What does the bit sequence ‘11’ correspond to in the decompression process?
It corresponds to the character O
What is the second decoded character from the compressed string 0111101000101?
O
What character is reached after decoding ‘100’?
A
What is the complete decoded text from the sequence 0111101000101?
DOODADS
True or False: The HuffmanDecompress function only processes bits until it reaches a non-leaf node.
False
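The decompression steps described in the cards above can be sketched as follows. The tree is not reproduced in these cards, so its shape is inferred from the answers (D: 0, O: 11, A: 100, S: 101) and is an assumption; leaves are plain strings, internal nodes are (left, right) tuples, and the snake_case name is hypothetical.

```python
def huffman_decompress(compressed_string, tree_root):
    node = tree_root
    result = ""
    for bit in compressed_string:
        # The bit value selects the left (0) or right (1) child.
        node = node[0] if bit == "0" else node[1]
        if isinstance(node, str):
            # Leaf reached: emit the character and restart at the root.
            result += node
            node = tree_root
    return result

# Assumed tree: D on the left of the root; on the right, an internal node
# with the (A, S) parent on its left and O on its right.
tree = ("D", (("A", "S"), "O"))
print(huffman_decompress("0111101000101", tree))  # DOODADS
```

The loop keeps consuming bits after each leaf, which is why the function processes the whole compressed string rather than stopping at the first character.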