Character Encoding Flashcards

1
Q

Unicode

A
  • an attempt to represent text from all languages in a single standard
  • supports electronic rendering of all texts and symbols
  • each character (roughly, each grapheme) is assigned a unique number, called its code point
  • allows different orthographies to co-exist in a single document
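
A minimal Python sketch of the code point idea. The sample string is an assumption chosen for illustration; it mixes Latin, Thai, and Japanese characters in one string.

```python
# Characters from three different scripts living in one string.
text = "Aก日"

# ord() returns the Unicode code point of a single character.
code_points = [ord(ch) for ch in text]
for ch, cp in zip(text, code_points):
    print(f"{ch} -> U+{cp:04X}")

# chr() maps a code point back to its character.
restored = "".join(chr(cp) for cp in code_points)
```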
2
Q

Text documents

A
  • represented as a series of numbers
  • simplest form of encoding is fixed precision, i.e. a fixed number of digits represents the code point of each character
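
To illustrate, a short Python sketch that turns a string into numbers and then writes each number with a fixed number of binary digits. The 8-digit width is an assumption that only works here because the sample characters are small.

```python
text = "Hi!"

# Each character becomes one number (its code point).
numbers = [ord(ch) for ch in text]

# Fixed precision: every number is written with exactly 8 binary digits.
fixed_width = " ".join(f"{n:08b}" for n in numbers)

print(numbers)
print(fixed_width)
```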
3
Q

ASCII

A
  • 128 characters
  • 7-bit
  • compact, but can only encode a small number of characters
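
A small Python check of the 128-character limit. The helper name `fits_in_ascii` is hypothetical, introduced only for this sketch.

```python
def fits_in_ascii(text: str) -> bool:
    """Return True if every character of text can be encoded as ASCII."""
    try:
        encoded = text.encode("ascii")
    except UnicodeEncodeError:
        return False
    # Every ASCII byte fits in 7 bits, so its value is below 128.
    return all(byte < 128 for byte in encoded)

print(fits_in_ascii("hello"))      # plain English letters
print(fits_in_ascii("caf\u00e9"))  # contains an accented e (U+00E9)
```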
4
Q

UTF-32

A
  • 32-bit, fixed-width encoding
  • can encode all Unicode characters, but is bloated (4 bytes per code point)
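
A rough Python comparison of the size cost. The helper `byte_lengths` is hypothetical; the big-endian codec variant is used so that no byte order mark is added to the count.

```python
def byte_lengths(ch: str) -> dict:
    """Bytes needed for one character in UTF-32 and, for comparison, UTF-8."""
    return {
        "utf-32": len(ch.encode("utf-32-be")),  # "-be" avoids a byte order mark
        "utf-8": len(ch.encode("utf-8")),
    }

for ch in ["A", "\u65e5", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", byte_lengths(ch))
```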

5
Q

ISO-8859

A
  • family of single-byte encodings built on top of ASCII, adding an extra 128 characters
  • can represent orthographies such as Thai, but cannot support large orthographies, e.g. Japanese
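
A sketch in Python, assuming ISO-8859-11 (Python codec name `iso8859_11`) as the Thai member of the family. The helper `try_encode` is hypothetical.

```python
THAI_CODEC = "iso8859_11"  # ISO-8859-11, the Thai part of the ISO-8859 family

def try_encode(text: str, codec: str = THAI_CODEC):
    """Encode text with a single-byte codec, or return None if it does not fit."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None

print(try_encode("A"))       # ASCII range is unchanged
print(try_encode("\u0e01"))  # Thai letter ko kai: one byte from the upper 128
print(try_encode("\u65e5"))  # Japanese kanji: not representable
```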
6
Q

Variable-width encoding

A
  • variable number of bytes per character
  • encodes code points using a variable number of fixed-size code units, e.g. UTF-8, UTF-16
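
A Python sketch that counts code units per character. The helper `code_unit_counts` is hypothetical; UTF-8 code units are 1 byte and UTF-16 code units are 2 bytes.

```python
def code_unit_counts(ch: str) -> tuple:
    """Return (UTF-8 code units, UTF-16 code units) needed for one character."""
    utf8_units = len(ch.encode("utf-8"))           # 8-bit units
    utf16_units = len(ch.encode("utf-16-le")) // 2  # 16-bit units, no BOM
    return utf8_units, utf16_units

for ch in ["A", "\u00e9", "\u65e5", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", code_unit_counts(ch))
```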
7
Q

UTF-8

A
  • variable-width encoding with 8-bit code units
  • backward compatible with ASCII (a superset of ASCII)
  • character boundaries are easy to locate: continuation bytes always start with the bits 10
  • used to represent Unicode strings
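
The boundary property can be sketched in Python by counting only the bytes that are not continuation bytes. The function names are hypothetical, and the input is assumed to be valid UTF-8.

```python
def is_continuation(byte: int) -> bool:
    """A UTF-8 continuation byte has the bit pattern 10xxxxxx."""
    return (byte & 0b11000000) == 0b10000000

def count_characters(data: bytes) -> int:
    """Count code points in valid UTF-8 by counting the leading bytes."""
    return sum(1 for byte in data if not is_continuation(byte))

text = "naïve 日本"
data = text.encode("utf-8")
print(len(data), "bytes,", count_characters(data), "characters")
```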
8
Q

Declaring character encoding

A
  • manual: specify the character encoding in the document, e.g. charset=ISO-8859-8
  • automatic: detect the character encoding based on compatibility, user preferences, or a statistical model
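
A rough Python sketch of both approaches. The function names and the fallback list are assumptions; trying a fixed list of encodings in order is only a crude stand-in for real detection with user preferences or a statistical model.

```python
import re

def declared_charset(content_type: str):
    """Extract a charset=... declaration from a header-like string, if any."""
    match = re.search(r'charset\s*=\s*"?([\w.:-]+)"?', content_type, re.IGNORECASE)
    return match.group(1) if match else None

def decode_text(data: bytes, content_type: str = "",
                fallbacks=("utf-8", "iso-8859-1")):
    """Prefer the declared charset, then try each fallback in order."""
    declared = declared_charset(content_type)
    candidates = ([declared] if declared else []) + list(fallbacks)
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown encoding, try the next candidate
    raise ValueError("no candidate encoding could decode the data")

hebrew = "\u05e9\u05dc\u05d5\u05dd".encode("iso-8859-8")
print(decode_text(hebrew, "text/html; charset=ISO-8859-8"))
```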