Representing Text Flashcards
What is a character set?
- A list of characters and the codes used to represent each one
What does ASCII stand for? How many bits are in each character and how many characters total?
- ASCII: American Standard Code for Information Interchange
- Seven bits for each character, 128 unique characters
How many bits was ASCII later extended to?
- ASCII was extended so that all eight bits were used
- The extra codes represent lines, symbols, and letters with accents
How many ASCII characters are control characters?
- The first 32 and the last one (33 in total) are control characters, also called hidden characters
- They control how text appears, but do not appear as text themselves
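The count can be checked with a short Python sketch. The name `control_codes` is only illustrative, and the check relies on Python's `str.isprintable()` as a stand-in for "does not appear as text".

```python
# Codes 0-31 and code 127 are the ASCII control characters.
control_codes = list(range(32)) + [127]

# None of them counts as a printable character in Python.
all_hidden = all(not chr(code).isprintable() for code in control_codes)

print(len(control_codes))  # 33
print(all_hidden)          # True
```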
How to find the code for uppercase/lowercase letters?
Uppercase letters start at 65
• Code for J (10th letter) is 65 + (10 – 1) = 74
Lowercase letters start at 97
• Code for j (10th letter) is 97 + (10 – 1) = 106
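The same arithmetic as a minimal Python sketch. The helper name `letter_code` is made up for illustration; `ord` and `chr` are Python built-ins that convert between a character and its code.

```python
def letter_code(position, uppercase=True):
    """Return the ASCII code of the letter at a 1-based alphabet position."""
    start = 65 if uppercase else 97
    return start + (position - 1)

print(letter_code(10))                   # 74
print(chr(letter_code(10)))              # J
print(letter_code(10, uppercase=False))  # 106
print(ord("j"))                          # 106
```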
What is the issue with the ASCII set?
- It is limited: a lot of symbols are missing, and it is not enough for international use
What is the Unicode character set?
- A superset of ASCII
- The first 128 characters in the Unicode character set correspond exactly to the ASCII character set
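A small Python sketch of that correspondence, offered as an illustration only. It decodes each of the 128 ASCII byte values and checks that the resulting Unicode code number is the same; the variable name `matches` is arbitrary.

```python
# Each ASCII byte value decodes to the Unicode character with the same code.
matches = all(ord(bytes([code]).decode("ascii")) == code for code in range(128))
print(matches)  # True

# A character outside ASCII, such as e with an acute accent, has a code above 127.
print(ord("\u00e9"))  # 233
```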
How many bits per character and characters are represented in the Unicode character set?
- Uses 16+ bits per character and can represent more than 1 million characters
What is lossless compression?
- A compression technique that does not lose any data in the compression process
What is data compression?
- A reduction in the amount of space needed to store a piece of data
What is the compression ratio?
- Size of the compressed data/size of the original data
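As a worked example in Python, with made-up file sizes (the function name `compression_ratio` and the numbers are assumptions for illustration). A smaller ratio means the data shrank more.

```python
def compression_ratio(compressed_size, original_size):
    """Size of the compressed data divided by the size of the original data."""
    return compressed_size / original_size

# Hypothetical sizes: a 1,000-byte file that compresses to 400 bytes.
print(compression_ratio(400, 1000))  # 0.4
```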
What are the two types of data compression?
- Lossless: A compression technique that does not lose any data in the compression process
- Lossy: Some information may be lost in the process of compaction.
What are the three types of lossless techniques?
- Keyword Encoding
- Run-Length Encoding
- Huffman Encoding
What is keyword encoding?
- Frequently used words are replaced with a single character
- The characters used to encode cannot be part of the original text
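A minimal sketch of keyword encoding in Python. The word table, the sample sentence, and the function names are all invented for illustration, and splitting on single spaces is a simplification (it ignores punctuation attached to words).

```python
def keyword_encode(text, table):
    """Replace each whole word found in the table with its single-character code."""
    # The codes must not already appear in the text, or decoding is ambiguous.
    for symbol in table.values():
        if symbol in text:
            raise ValueError(f"symbol {symbol!r} already appears in the text")
    return " ".join(table.get(word, word) for word in text.split(" "))

def keyword_decode(encoded, table):
    """Reverse the substitution to recover the original text."""
    reverse = {symbol: word for word, symbol in table.items()}
    return " ".join(reverse.get(word, word) for word in encoded.split(" "))

table = {"the": "~", "and": "&", "that": "$"}
text = "the cat and the dog saw that the bird and the fish left"
encoded = keyword_encode(text, table)
print(encoded)  # ~ cat & ~ dog saw $ ~ bird & ~ fish left
print(keyword_decode(encoded, table) == text)  # True
```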
What type of technique is keyword encoding usually used with?
- Huffman encoding
What is Run-length encoding?
- Used when a single character is repeated over and over again in a long sequence
- A sequence of repeated characters is replaced by
1. A flag character,
2. Followed by the repeated character,
3. Followed by a single digit that indicates how many times the character is repeated
- Runs of only 1-3 characters do not need to be encoded, because the encoded form is itself three characters long
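The steps above can be sketched in Python as follows. This is one possible reading, not a definitive implementation: the flag `*`, the function names, and the choice to split runs longer than 9 into several groups (so the count stays a single digit) are assumptions.

```python
def rle_encode(text, flag="*"):
    """Run-length encode text as flag + character + single-digit count."""
    if flag in text:
        raise ValueError("the flag character must not appear in the text")
    out = []
    i = 0
    while i < len(text):
        # Measure the run starting at i, capped at 9 so the count fits in one digit.
        run = 1
        while i + run < len(text) and text[i + run] == text[i] and run < 9:
            run += 1
        if run > 3:
            out.append(f"{flag}{text[i]}{run}")
        else:
            # Runs of 1-3 characters are copied as-is: encoding saves nothing.
            out.append(text[i] * run)
        i += run
    return "".join(out)

def rle_decode(encoded, flag="*"):
    """Expand every flag + character + digit group back into the run."""
    out = []
    i = 0
    while i < len(encoded):
        if encoded[i] == flag:
            out.append(encoded[i + 1] * int(encoded[i + 2]))
            i += 3
        else:
            out.append(encoded[i])
            i += 1
    return "".join(out)

print(rle_encode("AAAAAAABBBCDDDDD"))  # *A7BBBC*D5
print(rle_decode("*A7BBBC*D5"))        # AAAAAAABBBCDDDDD
```

Here 16 characters shrink to 10, a compression ratio of 10/16 = 0.625.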
What does Huffman Encoding use to represent each character?
- Uses variable-length bit strings
What is an advantage of Huffman Encoding?
- It saves storage space by using shorter bit strings (fewer bits) for frequent characters and longer bit strings (more bits) for infrequent characters
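A compact Python sketch of how such variable-length codes can be built, using the standard-library `heapq` and `collections.Counter`. The function names and the sample string are made up for illustration; the two least frequent groups are merged repeatedly, so frequent characters end up with fewer bits.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a table mapping each character to a variable-length bit string."""
    counts = Counter(text)
    # Each heap entry: (frequency, unique tie-breaker, {character: code so far}).
    heap = [(freq, i, {ch: ""}) for i, (ch, freq) in enumerate(counts.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A text with one distinct character still needs one bit per character.
        return {ch: "0" for ch in heap[0][2]}
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)  # least frequent group
        f2, _, t2 = heapq.heappop(heap)  # next least frequent group
        merged = {ch: "0" + code for ch, code in t1.items()}
        merged.update({ch: "1" + code for ch, code in t2.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]

def huffman_decode(bits, codes):
    """Read bits left to right, emitting a character whenever a code matches."""
    reverse = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return "".join(out)

text = "aaaabbc"
codes = huffman_codes(text)
encoded = "".join(codes[ch] for ch in text)
print(len(codes["a"]), len(codes["c"]))  # 1 2
print(len(encoded))                      # 10
print(huffman_decode(encoded, codes) == text)  # True
```

The frequent letter `a` gets a 1-bit code while the rarer `b` and `c` get 2 bits each, so the 7 characters need 10 bits in total.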
What is the least effective compression?
- Keyword Encoding
What is the most effective compression?
- Huffman Encoding
What are the applications of Huffman encoding?
- JPG, MP3, ZIP
How to find the compression ratio for Huffman encoding?
- Compressed bit length / (number of characters * bits per character in the original fixed-length encoding)
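A worked example in Python, following the definition of compression ratio given earlier (compressed size divided by original size). The code table, the sample string, and the assumption of 8 bits per original character are all hypothetical.

```python
def huffman_compression_ratio(text, codes, bits_per_char=8):
    """Compressed bit length divided by the fixed-length bit length."""
    compressed_bits = sum(len(codes[ch]) for ch in text)
    original_bits = len(text) * bits_per_char
    return compressed_bits / original_bits

# Hypothetical Huffman code table, written out by hand for illustration.
codes = {"a": "1", "b": "01", "c": "00"}

# 4*1 + 2*2 + 2*2 = 12 compressed bits against 8 characters * 8 bits = 64 bits.
print(huffman_compression_ratio("aaaabbcc", codes))  # 0.1875
```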