Representing Text Flashcards
What is a character set?
- A list of characters and the codes used to represent each one
What does ASCII stand for? How many bits are in each character and how many characters total?
- ASCII: American Standard Code for Information Interchange
- Seven bits for each character, 128 unique characters
How many bits did ASCII later evolve to use?
- ASCII evolved so that all eight bits were used (extended ASCII, 256 characters)
- The extra codes represent lines, symbols, and letters with accents
How many ASCII characters are control characters?
- The first 32 (codes 0 to 31) and the last one (code 127) are control characters, 33 in total. These hidden characters control how text appears but do not appear as text themselves.
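The counts on the ASCII cards above can be checked with a short Python sketch. The variable names are illustrative only; `unicodedata.category` is from the Python standard library, where the category "Cc" marks control characters.

```python
import unicodedata

# Seven bits give 2**7 distinct codes.
ascii_size = 2 ** 7
print(ascii_size)  # 128

# Codes 0-31 and code 127 (DEL) are the ASCII control characters.
control_codes = [code for code in range(ascii_size) if code < 32 or code == 127]
print(len(control_codes))  # 33

# Cross-check against the Unicode database: category "Cc" means control.
cc_codes = [code for code in range(ascii_size)
            if unicodedata.category(chr(code)) == "Cc"]
print(cc_codes == control_codes)  # True
```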
How do you find the code for an uppercase or lowercase letter?
Uppercase letters start at 65
• Code for J (10th letter) is 65 + (10 – 1) = 74
Lowercase letters start at 97
• Code for j (10th letter) is 97 + (10 – 1) = 106
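The letter-position rule above can be sketched in Python. The helper name `letter_code` is hypothetical, introduced only for illustration; `ord` and `chr` are Python built-ins that convert between characters and their codes.

```python
def letter_code(position, uppercase=True):
    """Return the ASCII code of the letter at a 1-based alphabet position."""
    if not 1 <= position <= 26:
        raise ValueError("position must be between 1 and 26")
    base = 65 if uppercase else 97  # codes of 'A' and 'a'
    return base + (position - 1)

print(letter_code(10))                   # 74
print(chr(letter_code(10)))              # J
print(letter_code(10, uppercase=False))  # 106
print(ord("j"))                          # 106
```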
What is the issue with the ASCII set?
- Limited: many symbols are missing, and it is not enough for international use
What is the Unicode character set?
- A superset of ASCII
- The first 128 characters in the Unicode character set correspond exactly to the ASCII character set
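How many bits per character does Unicode use, and how many characters can it represent?
- The number of bits depends on the encoding: UTF-8 uses 8 to 32 bits per character, UTF-16 uses 16 or 32, and UTF-32 always uses 32
- Can represent more than 1 million characters (1,114,112 possible code points)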
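A minimal Python sketch of the Unicode cards above: it checks that the first 128 Unicode code points match ASCII, then shows that characters outside ASCII need more than one byte. The function name `ascii_matches_unicode` is hypothetical; `str.encode`, `chr`, and `ord` are Python built-ins.

```python
def ascii_matches_unicode():
    """Check that code points 0-127 encode to the same single byte as ASCII."""
    for code in range(128):
        ch = chr(code)  # the Unicode character with this code point
        if ch.encode("ascii") != bytes([code]):
            return False
        if ch.encode("utf-8") != bytes([code]):
            return False
    return True

print(ascii_matches_unicode())  # True

e_acute = "\u00e9"      # accented e, outside the 7-bit ASCII range
emoji = "\U0001F600"    # a code point above 65,535

print(ord(e_acute))                     # 233
print(len(e_acute.encode("utf-8")))     # 2 (bytes)
print(len(e_acute.encode("utf-16-be"))) # 2 (bytes, i.e. 16 bits)
print(ord(emoji))                       # 128512
print(len(emoji.encode("utf-16-be")))   # 4 (bytes, i.e. 32 bits)
```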
What is lossless compression?
- A compression technique that does not lose any data in the compression process
What is data compression?
- A reduction in the amount of space needed to store a piece of data
What is the compression ratio?
- Size of the compressed data/size of the original data
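As a worked example of the ratio above, here is a small Python sketch. The function name `compression_ratio` and the byte counts are assumptions made up for illustration.

```python
def compression_ratio(compressed_size, original_size):
    """Return compressed size divided by original size."""
    if original_size <= 0:
        raise ValueError("original size must be positive")
    return compressed_size / original_size

# Hypothetical sizes: a 1000-byte file that compresses to 250 bytes.
print(compression_ratio(250, 1000))  # 0.25
```

By this definition, a ratio close to 0 means the data shrank a lot, and a ratio close to 1 means very little space was saved.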
What are the two types of data compression?
- Lossless: A compression technique that does not lose any data in the compression process
- Lossy: Some information may be lost in the process of compression
What are the three types of lossless techniques?
- Keyword Encoding
- Run-Length Encoding
- Huffman Encoding
What is keyword encoding?
- Frequently used words are replaced with a single character
- The characters used to encode cannot be part of the original text
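The two rules above can be sketched in Python. This is a minimal illustration, not a standard implementation: the function names `keyword_encode` and `keyword_decode`, the code table, and the sample sentence are all assumptions chosen for the example.

```python
import re

def keyword_encode(text, table):
    """Replace each whole word found in `table` with its one-character code."""
    if not table:
        return text
    for word, symbol in table.items():
        if len(symbol) != 1:
            raise ValueError(f"code for {word!r} must be a single character")
        if symbol in text:
            # The codes cannot be part of the original text.
            raise ValueError(f"code {symbol!r} already appears in the text")
    # One pass over the text; \b keeps "the" from matching inside "them".
    pattern = r"\b(" + "|".join(re.escape(w) for w in table) + r")\b"
    return re.sub(pattern, lambda m: table[m.group(1)], text)

def keyword_decode(encoded, table):
    """Expand every code character back into its word."""
    reverse = {symbol: word for word, symbol in table.items()}
    return "".join(reverse.get(ch, ch) for ch in encoded)

# Hypothetical code table for frequently used words.
table = {"as": "^", "the": "~", "and": "+", "that": "$"}
text = "the cat and the dog sat as the sun set"

encoded = keyword_encode(text, table)
print(encoded)                                 # ~ cat + ~ dog sat ^ ~ sun set
print(keyword_decode(encoded, table) == text)  # True
```

Because decoding restores the original text exactly, this is a lossless technique.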
What type of technique is keyword encoding usually used with?
- Huffman encoding