Misc Flashcards
What is the ASCII character set ?
(can also be described as a character encoding or, loosely, as a code page)
Computers store data as 8-bit bytes. The ASCII character set is an encoding scheme that assigns a number to each character.
e.g. ‘A’ = 1000001 (binary) = 65 (decimal), and lowercase ‘a’ = 1100001 (binary) = 97 (decimal)
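A quick way to check the character-to-number mapping, sketched with Python's built-in `ord`, `chr` and `format`. The variable names are only illustrative. Note that uppercase ‘A’ is 65 and lowercase ‘a’ is 97.

```python
# Look up the number assigned to a character, and convert back again.
upper_a = ord("A")                      # decimal value of uppercase A
lower_a = ord("a")                      # decimal value of lowercase a
upper_a_bits = format(upper_a, "07b")   # same number as a 7-bit binary string
lower_a_bits = format(lower_a, "07b")
back_to_char = chr(65)                  # number back to its character

print(upper_a, upper_a_bits, lower_a, lower_a_bits, back_to_char)
```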
ASCII History
How many characters ?
What language ?
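What were the first 32 characters ?
128 (0-127)
English - since it started in the US
Non-printable control characters, originally used to control devices such as teleprinters. Most are now obsolete, though a few (tab, line feed, carriage return) are still in everyday use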
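A small sketch that lists the control characters in the 0-127 range, assuming Python's standard `unicodedata` module, whose category "Cc" marks control characters. The variable names are illustrative.

```python
import unicodedata

# Collect every code in 0-127 that Unicode classifies as a control character.
control_codes = [n for n in range(128) if unicodedata.category(chr(n)) == "Cc"]

# The first 32 codes (0-31) are all control characters; 127 (DEL) is one too.
first_32_are_controls = control_codes[:32] == list(range(32))

# A few control characters that are still in everyday use.
still_common = {"\t": ord("\t"), "\n": ord("\n"), "\r": ord("\r")}

print(len(control_codes), first_32_are_controls, still_common)
```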
What was the Extended ASCII character set ?
Since ASCII only used 7 bits, the 8th bit of each byte was unused.
Non-English users took advantage of this by setting the 8th bit.
With the 8th bit set, it was possible to define an additional 128 characters (values 128-255). This was not official and resulted in clashes: different languages and vendors used different Extended character sets, so the same byte value could mean different characters.
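The clash can be seen by decoding one byte with the 8th bit set under three different extended character sets. This is a sketch using codecs that ship with Python (`latin-1`, `cp437`, `cp1251`); the choice of byte 0xE9 is just an example.

```python
# One byte with the 8th (high) bit set: 0xE9 = 233 decimal.
raw = bytes([0xE9])

western = raw.decode("latin-1")   # ISO 8859-1 (Western European)
dos = raw.decode("cp437")         # original IBM PC code page
cyrillic = raw.decode("cp1251")   # Windows Cyrillic

# Same byte, three different characters.
print(western, dos, cyrillic)
```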
What is the Unicode Consortium ?
A body whose members come from different countries and companies
An attempt to unify character encodings around the globe
What is a Code Point ?
A code point is the number assigned to a character in a character set.
65 is the decimal code point of the character ‘A’
1000001 is the same code point written in binary.
What is a code point encoding ?
Code points can be encoded as bytes in more than one way…
What is the de facto standard for encoding code points in Web Applications ?
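To illustrate, here is one code point turned into bytes by three different encodings. A sketch using Python's built-in `str.encode`; the euro sign is only an example, and the `-be` codec variants are used so that no byte order mark is added.

```python
euro = "\u20ac"  # the euro sign, code point U+20AC

# The same code point, encoded three ways (shown as hex strings).
utf8_hex = euro.encode("utf-8").hex()
utf16_hex = euro.encode("utf-16-be").hex()
utf32_hex = euro.encode("utf-32-be").hex()

print(utf8_hex, utf16_hex, utf32_hex)
```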
UTF-8
How many bytes does UTF-32 use to encode each codepoint ?
Always 4 bytes per code point. It was never widely adopted for storing or sending text because it wastes too much space
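The space cost is easy to measure. A sketch comparing the size of a short English string in UTF-8 and UTF-32, using Python's built-in codecs (`utf-32-be` is chosen so no byte order mark is counted).

```python
text = "hello"

utf8_size = len(text.encode("utf-8"))        # 1 byte per ASCII character
utf32_size = len(text.encode("utf-32-be"))   # 4 bytes per character, always

print(utf8_size, utf32_size)
```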
What is UTF-16 ? How many bytes does it use ?
A variable-width encoding. Depending on the value of the code point it uses either 2 bytes or 4 bytes (a surrogate pair). Not backward compatible with ASCII
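A sketch showing both UTF-16 sizes and the ASCII incompatibility, using Python's `utf-16-be` codec. The two sample characters are arbitrary choices: one inside the 2-byte range and one outside it.

```python
small = "A"            # U+0041, fits in a single 16-bit unit
large = "\U0001F600"   # U+1F600, needs a surrogate pair

small_len = len(small.encode("utf-16-be"))
large_len = len(large.encode("utf-16-be"))
surrogate_hex = large.encode("utf-16-be").hex()

# ASCII stores 'A' as one byte; UTF-16 adds a zero byte, so the bytes differ.
ascii_bytes = small.encode("ascii")
utf16_bytes = small.encode("utf-16-be")

print(small_len, large_len, surrogate_hex, ascii_bytes, utf16_bytes)
```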
What is Big Endian and Little Endian ?
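In UTF-16, the two bytes of each 16-bit unit can be stored in either order. Big Endian puts the most significant byte first; Little Endian puts the least significant byte first.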
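The two byte orders side by side, sketched with Python's `utf-16-be` and `utf-16-le` codecs. The euro sign (U+20AC) is used only because its two bytes are easy to tell apart.

```python
euro = "\u20ac"  # code point U+20AC

big_endian_hex = euro.encode("utf-16-be").hex()     # most significant byte first
little_endian_hex = euro.encode("utf-16-le").hex()  # least significant byte first

print(big_endian_hex, little_endian_hex)
```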
What is a Byte Order Mark ?
A marker (the code point U+FEFF) placed at the start of the text. The order of its two bytes indicates whether the following UTF-16 is Little Endian or Big Endian
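A sketch of the Byte Order Mark in practice, using the constants in Python's standard `codecs` module. The plain `utf-16` codec reads the mark, picks the byte order, and drops the mark from the result.

```python
import codecs

bom_le = codecs.BOM_UTF16_LE   # bytes FF FE
bom_be = codecs.BOM_UTF16_BE   # bytes FE FF

# The same letter stored in each byte order, each prefixed with its mark.
data_le = bom_le + "A".encode("utf-16-le")
data_be = bom_be + "A".encode("utf-16-be")

# Decoding with "utf-16" uses the mark to choose the byte order.
from_le = data_le.decode("utf-16")
from_be = data_be.decode("utf-16")

print(bom_le.hex(), bom_be.hex(), from_le, from_be)
```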
What is UTF-8 ?