Unicode Flashcards

1
Q

writing system/script

A

a system of more or less permanent marks used to represent an utterance in such a way that it can be recovered without the intervention of the utterer

2
Q

character

A

the smallest component of a writing system that has a semantic value

3
Q

grapheme

A

the smallest contrastive unit of a writing system (the smallest sound unit of a spoken language is the phoneme, not the grapheme)

4
Q

glyphs

A

the visual shape used to represent a character when it is displayed (a font is a collection of glyphs)

5
Q

Unicode

A

a character encoding standard designed to embrace the characters of all the world's languages; it is emerging as the gold standard for text encoding

6
Q

design principles of Unicode

A

universality, efficiency, characters not glyphs, semantics, plain text, logical order, unification, dynamic composition, stability, convertibility

7
Q

surrogate pairs

A

an extension mechanism in UTF-16 that represents a code point above U+FFFF as two 16-bit code units.
the first value = high surrogate (U+D800 to U+DBFF)
the second value = low surrogate (U+DC00 to U+DFFF)
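
The split into high and low surrogates can be worked through by hand. The following is a minimal Python sketch of the standard UTF-16 arithmetic; the function name `to_surrogate_pair` is made up for this card and is not part of any library.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF are encoded as surrogate pairs")
    offset = code_point - 0x10000       # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits go into the low surrogate
    return high, low


high, low = to_surrogate_pair(0x1F600)  # U+1F600 GRINNING FACE
print(f"{high:04X} {low:04X}")          # D83D DE00
```

Python's own `utf-16-be` codec produces the same two code units, which is a quick way to check the arithmetic.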

8
Q

what are the advantages and disadvantages of UTF-8

A

advantages:
- existing ASCII files are already valid UTF-8
- most broadly supported encoding form today

disadvantages:
- ideographic (CJK) characters require 3 bytes each, so UTF-8 text in those languages is larger than in most existing encodings
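
Both points can be checked with Python's built-in codecs. This is only a sketch: `utf8_len` is a helper name invented for this card, and GB2312 is picked as one example of an existing 2-byte CJK encoding.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes needed to encode one character in UTF-8."""
    return len(ch.encode("utf-8"))


# ASCII text is byte-for-byte identical in ASCII and in UTF-8.
assert "hello".encode("ascii") == "hello".encode("utf-8")

# A (ASCII), e with acute accent (Latin-1 range), and a CJK ideograph.
for ch in ["A", "\u00e9", "\u4e2d"]:
    print(f"U+{ord(ch):04X}: {utf8_len(ch)} byte(s) in UTF-8")

# The same ideograph in a legacy encoding (GB2312 is an assumed example).
print(len("\u4e2d".encode("gb2312")), "bytes in GB2312")
```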

9
Q

what are the advantages and disadvantages of UTF-16

A

advantages:
- allows all Unicode code points to be mapped into one or two 16-bit code units

disadvantages:
- Latin text is twice as large as in single-byte encodings
- not backward/forward compatible with ASCII, so programs that expect single-byte character sets won’t work with UTF-16
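
A short Python sketch of these trade-offs, using the built-in `utf-16-le` codec (an explicit byte order is chosen so that no byte order mark is added). The helper name `utf16_unit_count` is made up for illustration.

```python
def utf16_unit_count(text: str) -> int:
    """Number of 16-bit code units needed to encode text in UTF-16."""
    return len(text.encode("utf-16-le")) // 2


text = "hello"
# Latin text doubles in size compared with a single-byte encoding.
print(len(text.encode("latin-1")), "bytes in Latin-1")
print(len(text.encode("utf-16-le")), "bytes in UTF-16")

# Every ASCII letter gains a zero byte, which breaks byte-oriented ASCII tools.
print("A".encode("utf-16-le"))

# Code points above U+FFFF need two code units (a surrogate pair).
print(utf16_unit_count("A"), utf16_unit_count("\U0001F600"))
```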

10
Q

what are the advantages and disadvantages of UTF-32

A

advantages:
- simple: every code point maps to exactly 1 fixed-length (32-bit) code unit

disadvantages:
- Latin text is four times as large as in single-byte encodings
- not backward/forward compatible with ASCII, so programs that expect single-byte character sets won’t work with UTF-32
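
The fixed width is easy to confirm in Python with the built-in `utf-32-le` codec. This is a sketch only; `utf32_unit_count` is a name invented for this card.

```python
def utf32_unit_count(text: str) -> int:
    """Number of 32-bit code units needed to encode text in UTF-32."""
    return len(text.encode("utf-32-le")) // 4


# One ASCII letter, one CJK ideograph, one emoji: one code unit each.
mixed = "A\u4e2d\U0001F600"
print(utf32_unit_count(mixed))             # 3 code units for 3 code points

# Latin text grows to four bytes per character.
print(len("hello".encode("utf-32-le")))    # 20 bytes for 5 letters
```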

11
Q

encoding model

A

3 level model:

  1. abstract character repertoire
  2. code space
  3. encoding forms
12
Q

code space

A

a set of integers to which the abstract characters are mapped; each integer in the set is known as a code point
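
In Python the character-to-integer mapping is exposed through the built-ins `ord` and `chr`. The sketch below assumes Unicode's code space of U+0000 to U+10FFFF; `describe` and `in_code_space` are helper names made up for this card.

```python
MAX_CODE_POINT = 0x10FFFF  # upper end of the Unicode code space


def describe(ch: str) -> str:
    """Format a character's code point in the usual U+XXXX notation."""
    return f"U+{ord(ch):04X}"


def in_code_space(n: int) -> bool:
    """True if n is an integer inside the Unicode code space."""
    return 0 <= n <= MAX_CODE_POINT


print(describe("A"))                # U+0041
print(chr(0x4E2D) == "\u4e2d")      # chr is the inverse of ord
print(in_code_space(0x110000))      # False: just past the last code point
```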

13
Q

encoding forms

A

once the mapping from the abstract character set to a set of integers is defined, further mappings are required:

the character encoding form & the character encoding scheme

14
Q

character encoding form

A

a mapping from a set of integers to a set of sequences of code units of specified width
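
As an illustration, the same code point can be turned into code units of three different widths. The Python sketch below leans on the built-in codecs; `code_units` and `WIDTHS` are names invented for this card.

```python
# Code unit width in bytes for each encoding form.
WIDTHS = {"utf-8": 1, "utf-16": 2, "utf-32": 4}


def code_units(code_point: int, form: str) -> list[int]:
    """Map one code point to its sequence of code units in the given form."""
    width = WIDTHS[form]
    # Big-endian codecs are used only so each unit can be read back as an integer.
    codec = form if width == 1 else form + "-be"
    data = chr(code_point).encode(codec)
    return [int.from_bytes(data[i:i + width], "big")
            for i in range(0, len(data), width)]


for form in WIDTHS:
    units = code_units(0x00E9, form)   # U+00E9, e with acute accent
    print(form, [hex(u) for u in units])
```

For U+00E9 this gives two 8-bit units in UTF-8 (0xC3, 0xA9) but a single unit (0xE9) in UTF-16 and UTF-32.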

15
Q

character encoding scheme

A

a mapping from a set of sequences of code units to a serialised sequence of bytes
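
Serialisation matters because code units wider than one byte can be written in either byte order. A minimal Python sketch for 16-bit code units using the standard `struct` module; the function name `serialise_utf16` is made up for this card.

```python
import struct


def serialise_utf16(code_units: list[int], byte_order: str) -> bytes:
    """Serialise 16-bit code units as bytes in 'big' or 'little' endian order."""
    prefix = {"big": ">", "little": "<"}[byte_order]
    # "H" is an unsigned 16-bit integer; one per code unit.
    return struct.pack(prefix + "H" * len(code_units), *code_units)


pair = [0xD83D, 0xDE00]                        # surrogate pair for U+1F600
print(serialise_utf16(pair, "big").hex())      # d83dde00
print(serialise_utf16(pair, "little").hex())   # 3dd800de
```

The two results correspond to what Python's `utf-16-be` and `utf-16-le` codecs produce for the same character.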

16
Q

challenges of character encoding

A

generality, character set specification, hardware issues, variable/fixed width, interoperability