Representing Text Flashcards
What is a character set?
a list of characters and the codes used to represent each one
What does ASCII stand for?
American Standard Code for Information Interchange
The original ASCII had how many bits?
7 bits, giving 128 unique characters
How many control characters are there?
33
How many bits does the “later/new” ASCII have?
8
The first ____ codes of ASCII (0-31) are control/hidden characters, as is the last code, 127 (DEL)
32
At which code do uppercase letters start?
65
At which code do lowercase letters start?
97
What is the numeric difference between the code of an uppercase letter and the code of its lowercase form?
32
Where is 0-9 coded consecutively?
48-57
Where is A-Z coded consecutively?
65-90
Where is a-z coded consecutively?
97-122
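The code ranges above can be checked with Python's built-in `ord()` and `chr()`. This is a small sketch; the helper name `to_lower` is made up for illustration and only handles the ASCII letters A-Z.

```python
# ord() returns a character's code; chr() turns a code back into a character.
print(ord("0"), ord("9"))   # 48 57
print(ord("A"), ord("Z"))   # 65 90
print(ord("a"), ord("z"))   # 97 122

def to_lower(ch):
    """Lowercase one ASCII letter by adding 32; anything else passes through."""
    if "A" <= ch <= "Z":
        return chr(ord(ch) + 32)
    return ch

print(to_lower("G"))  # g
```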
What is Unicode?
superset of ASCII
- bigger character set than ASCII
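How many bits did the original Unicode design use per character?
16 (Unicode has since grown: code points now run up to U+10FFFF, which needs 21 bits)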
True or False
In Unicode, 0-127 are the same as ASCII, while 128-255 mostly hold symbols and accented letters
True
What is data compression?
It is a reduction in the amount of space needed to store a piece of data
What is the compression ratio?
It is the size of the compressed data divided by the size of the original data
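As a quick worked example of that definition (the function name and the byte counts are illustrative only):

```python
def compression_ratio(compressed_size, original_size):
    """Size of the compressed data divided by the size of the original data."""
    return compressed_size / original_size

# A 1000-byte file that compresses to 250 bytes has a ratio of 0.25.
# The smaller the ratio, the better the compression.
print(compression_ratio(250, 1000))  # 0.25
```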
What are the 2 techniques for data compression?
- Lossless
- Lossy
What does Lossless mean?
The data can be retrieved without any loss of the original information
What does Lossy mean?
Some information may be lost in the process of compression
What are 3 example techniques for data compression?
- Keyword encoding
- Run-length encoding
- Huffman encoding
Is Keyword encoding effective?
No, it is the least effective of the three
Which technique is good for compressing data with lots of spaces?
Run-length encoding
Which example is the most effective?
Huffman encoding
Which file formats use Huffman encoding?
JPG, MP3, ZIP
What does Keyword encoding refer to?
Frequently used words are each replaced with a single symbol
What is the limitation of keyword encoding?
Symbols used cannot be in the original data
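A minimal sketch of keyword encoding, assuming words are separated by single spaces. The substitution table and the function names are hypothetical. The encoder refuses any input that already contains one of the symbols, which is the limitation described above.

```python
# Hypothetical substitution table: word -> symbol.
TABLE = {"the": "~", "and": "&", "that": "%"}

def keyword_encode(text, table):
    """Replace each whole word found in the table with its symbol."""
    for symbol in table.values():
        if symbol in text:
            raise ValueError(f"symbol {symbol!r} already appears in the data")
    return " ".join(table.get(word, word) for word in text.split(" "))

def keyword_decode(text, table):
    """Reverse the substitution by mapping each symbol back to its word."""
    reverse = {symbol: word for word, symbol in table.items()}
    return " ".join(reverse.get(word, word) for word in text.split(" "))

encoded = keyword_encode("the cat and the dog", TABLE)
print(encoded)                         # ~ cat & ~ dog
print(keyword_decode(encoded, TABLE))  # the cat and the dog
```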
What does Run-length encoding refer to?
A run of the same character repeated side by side is replaced with the character and a count of how many times it repeats
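Run-length encoding can be sketched with `itertools.groupby`, which groups consecutive equal characters. Storing each run as a (character, count) pair is one possible representation chosen here for clarity, not the only one.

```python
from itertools import groupby

def rle_encode(text):
    """Collapse each run of identical characters into a (char, count) pair."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]

def rle_decode(pairs):
    """Expand (char, count) pairs back into the original string."""
    return "".join(ch * count for ch, count in pairs)

pairs = rle_encode("AAAABBBCD")
print(pairs)              # [('A', 4), ('B', 3), ('C', 1), ('D', 1)]
print(rle_decode(pairs))  # AAAABBBCD
```

Data with long runs shrinks well, while data with no repeats (like the trailing `CD` here) gains nothing.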
What does Huffman coding refer to?
The same idea as Morse code
- Frequent characters are represented with only a few bits
- Code lengths vary, e.g. some characters may take 5 bits and others 6 bits
Does Huffman coding keep the ASCII codes?
No, it replaces them with codes of varying length instead of a fixed 8 bits per character
What are the 4 steps in Huffman’s Algorithm?
- Count the frequency of each symbol
- Sort the symbols in ascending order of frequency
- Repeatedly merge the two lowest-frequency nodes into a tree, giving each new node their combined frequency
- Label the branches
Typically, which branches are labelled with 0 and which are labelled with 1?
Left branch = 0
Right branch = 1
For Huffman’s Algorithm, what is our input?
symbols and their frequency counts
For Huffman’s Algorithm, what is our output?
binary code for each symbol
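A minimal sketch of the algorithm, taking symbols with frequency counts as input and returning a binary code for each symbol. It uses a heap in place of repeated sorting, and assumes symbols are plain strings. The function name and the sample counts are illustrative. The exact bit patterns depend on how ties are broken, but the code lengths do not in this example.

```python
import heapq
from itertools import count

def huffman_codes(frequencies):
    """Return {symbol: bit string} for a {symbol: frequency count} mapping."""
    tie = count()  # tie-breaker so the heap never has to compare tree nodes
    # A leaf is a symbol (string); an internal node is a (left, right) tuple.
    heap = [(freq, next(tie), sym) for sym, freq in frequencies.items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:  # a lone symbol still needs one bit
        return {heap[0][2]: "0"}
    # Merge the two least frequent nodes until a single tree remains.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    codes = {}

    def label(node, prefix):
        if isinstance(node, tuple):
            label(node[0], prefix + "0")  # left branch = 0
            label(node[1], prefix + "1")  # right branch = 1
        else:
            codes[node] = prefix

    label(heap[0][2], "")
    return codes

codes = huffman_codes({"a": 5, "b": 2, "c": 1, "d": 1})
for symbol in sorted(codes):
    print(symbol, codes[symbol])
```

The most frequent symbol (`a`) ends up with a 1-bit code, and the two rarest (`c`, `d`) with 3-bit codes.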
For Huffman’s Algorithm, what property does the output have?
optimum compression ratio with the prefix property (no code is the beginning of another code)
Where does the prefix property show up in the tree diagram?
Every symbol sits at a leaf node, so no symbol's code is the beginning of another's
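Because of the prefix property, a bit stream can be decoded left to right with no separators between codes. The sketch below assumes an illustrative code table; the table and the function name are not from the cards.

```python
# Illustrative prefix code: no code is the beginning of another.
CODES = {"a": "1", "b": "00", "c": "010", "d": "011"}

def decode(bits, codes):
    """Read bits one at a time, emitting a symbol as soon as a code matches."""
    reverse = {code: symbol for symbol, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:  # unambiguous thanks to the prefix property
            out.append(reverse[current])
            current = ""
    return "".join(out)

print(decode("1000101", CODES))  # abca
```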
How is the optimum compression ratio achieved?
merge the least frequent symbols first and the most frequent last, so the most frequent symbols get the shortest codes
What is the bit length of the compressed data?
the sum, over all characters, of code length × frequency count
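A worked example of that sum, using an illustrative code table and made-up frequency counts, compared against a fixed 8 bits per character:

```python
def encoded_bit_length(codes, frequencies):
    """Sum of (code length x frequency count) over every symbol."""
    return sum(len(codes[sym]) * freq for sym, freq in frequencies.items())

codes = {"a": "1", "b": "00", "c": "010", "d": "011"}  # illustrative codes
frequencies = {"a": 5, "b": 2, "c": 1, "d": 1}         # illustrative counts

compressed_bits = encoded_bit_length(codes, frequencies)
fixed_bits = 8 * sum(frequencies.values())

print(compressed_bits)  # 15  (5*1 + 2*2 + 1*3 + 1*3)
print(fixed_bits)       # 72  (9 characters at 8 bits each)
```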
What are examples of run-length encoding?
white spaces in faxes