Base64, Base32, Base16 (10.12.2022 3M) Flashcards

1
Q

What is the purpose of base encoding of data?

A

The main purpose is to store and transfer data in systems that are restricted to US-ASCII.

2
Q

Describe the discrepancy concerning line feeds in encoded data, and the requirement that follows from it.

A

Multipurpose Internet Mail Extensions (MIME) uses base64, but it limits each line of encoded data to 76 characters, so a MIME encoder must break the output into lines of at most that length.

MIME inherits the encoding from Privacy Enhanced Mail (PEM) [3], stating that it is “virtually identical”; however, PEM uses a line length of 64 characters. The MIME and PEM limits are both due to limits within SMTP.

Implementations MUST NOT add line feeds to base-encoded data unless the specification referring to this document explicitly directs base encoders to add line feeds after a specific number of characters.
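
The difference can be seen with Python's standard `base64` module. This is only an illustrative sketch: `b64encode` adds no line feeds, while `encodebytes` follows the MIME convention and breaks the output into lines of at most 76 characters. The 100-octet sample input is arbitrary.

```python
import base64

data = bytes(range(100))  # arbitrary 100-octet sample input

# RFC 4648 style: one unbroken run of alphabet characters.
plain = base64.b64encode(data)

# MIME style: a newline after every 76 output characters (and at the end).
mime = base64.encodebytes(data)

print(b"\n" in plain)                                # False
print(max(len(line) for line in mime.splitlines()))  # 76
```

Removing the newlines from the MIME-style output gives back the unbroken form.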

3
Q

Describe the interpretation of non-alphabet characters in encoded data.

A

Base encodings use a specific, reduced alphabet to encode binary data. Non-alphabet characters may be exploited as a “covert channel”, where non-protocol data can be sent for nefarious purposes. Non-alphabet characters might also be sent in order to exploit implementation errors leading to, e.g., buffer overflow attacks.

Implementations MUST reject the encoded data if it contains characters outside the base alphabet when interpreting base-encoded data unless the specification referring to this document explicitly states otherwise.

Such specifications may instead state, as MIME does, that characters outside the base encoding alphabet should simply be ignored when interpreting data (“be liberal in what you accept”).

Note that this means that any adjacent carriage return/line feed (CRLF) characters constitute “non-alphabet characters” and are ignored. Furthermore, such specifications MAY ignore the pad character, “=”, treating it as non-alphabet data, if it is present before the end of the encoded data. If more than the allowed number of pad characters is found at the end of the string (e.g., a base 64 string terminated with “===”), the excess pad characters MAY also be ignored.
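
Both behaviours are available in Python's standard `base64` module, which makes for a short illustration. By default `b64decode` discards characters outside the alphabet (the liberal, MIME-like behaviour), while `validate=True` rejects them. The sample string is an assumption chosen for the example.

```python
import base64
import binascii

encoded = b"Zm9v\r\nYmFy"  # "foobar" with a CRLF in the middle

# Lenient (MIME-like): characters outside the alphabet are discarded.
lenient = base64.b64decode(encoded)

# Strict (the default rule in the card above): non-alphabet input is an error.
try:
    base64.b64decode(encoded, validate=True)
    strict_rejected = False
except binascii.Error:
    strict_rejected = True

print(lenient, strict_rejected)
```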

4
Q

What is the number of the RFC that describes base64, base32, and base16?

A

RFC 4648

5
Q

Why is it named base64?

A

Because base64 uses a 64-character subset of US-ASCII, so each character represents 6 bits. Strictly speaking it uses 65 characters, but ‘=’ is a special pad character.

6
Q

Describe the base64 encoding process.

A

Iteratively:
  1. Get the next 24 bits of input (3 octets).
  2. Transform the 24 bits into 4 alphabet characters:
    Iteratively:
    a. get the next 6 bits
    b. interpret them as an integer
    c. use the integer as an index into the alphabet to get the character

The input length is not always a multiple of 24 bits, which is why padding is used. Only two situations are possible:
  1. Two octets are missing at the end of the data.
  2. One octet is missing at the end of the data.

How to handle these situations, respectively:
  1. The final quantum of encoding input is exactly 8 bits; here, the final unit of encoded output will be two characters followed by two “=” padding characters.
  2. The final quantum of encoding input is exactly 16 bits; here, the final unit of encoded output will be three characters followed by one “=” padding character.

Note that there are as many ‘=’ characters as there are missing octets. Every group is expected to yield 4 characters; if only two can be produced, the remaining two are ‘=’.

In the first situation there are only 8 bits of input, yet two characters need 12 bits. The absent bits are set to zero.
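
The steps above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation; the names `B64_ALPHABET` and `b64_encode` are made up for the example.

```python
B64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def b64_encode(data: bytes) -> str:
    """Encode octets as padded base64 text (hypothetical helper)."""
    out = []
    for i in range(0, len(data), 3):
        group = data[i:i + 3]
        missing = 3 - len(group)  # 0, 1 or 2 missing octets
        # Pack the group into 24 bits, filling absent octets with zero bits.
        bits = int.from_bytes(group + b"\x00" * missing, "big")
        # Split the 24 bits into four 6-bit indexes, most significant first.
        chars = [B64_ALPHABET[(bits >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Characters that would carry only filler bits become '='.
        if missing:
            chars[-missing:] = "=" * missing
        out.extend(chars)
    return "".join(out)

print(b64_encode(b"f"), b64_encode(b"fo"), b64_encode(b"foo"))
```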

Standard alphabet of base64:

Value Encoding  Value Encoding  Value Encoding  Value Encoding
    0 A            17 R            34 i            51 z
    1 B            18 S            35 j            52 0
    2 C            19 T            36 k            53 1
    3 D            20 U            37 l            54 2
    4 E            21 V            38 m            55 3
    5 F            22 W            39 n            56 4
    6 G            23 X            40 o            57 5
    7 H            24 Y            41 p            58 6
    8 I            25 Z            42 q            59 7
    9 J            26 a            43 r            60 8
   10 K            27 b            44 s            61 9
   11 L            28 c            45 t            62 +
   12 M            29 d            46 u            63 /
   13 N            30 e            47 v
   14 O            31 f            48 w         (pad) =
   15 P            32 g            49 x
   16 Q            33 h            50 y

7
Q

What is the difference between base64 and base64url?

A

The main difference is in the characters that encode the values 62 and 63.
In base64 they are ‘+’ and ‘/’, which are not “URL friendly”; base64url replaces them with ‘-’ and ‘_’.
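
A small Python check of the two alphabets, using the standard `base64` module. The two-octet input is an assumption picked so that both value 62 and value 63 show up in the output.

```python
import base64

data = b"\xfb\xff"  # chosen so that values 62 and 63 both appear

standard = base64.b64encode(data).decode("ascii")
url_safe = base64.urlsafe_b64encode(data).decode("ascii")

print(standard)  # +/8=
print(url_safe)  # -_8=
```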

8
Q

What is special about the padding character ‘=’ in base64url?

A

The pad character “=” is typically percent-encoded when used in a URL, but if the data length is known implicitly, the padding can be omitted altogether.
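
Both options can be sketched in Python. The helper names `b64url_encode_unpadded` and `b64url_decode_unpadded` are hypothetical; the decoder restores the padding from the text length before handing the string to the standard library.

```python
import base64
import urllib.parse

def b64url_encode_unpadded(data: bytes) -> str:
    """base64url without trailing '=' (hypothetical helper)."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_decode_unpadded(text: str) -> bytes:
    """Re-add the padding implied by the length, then decode."""
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)

padded = base64.urlsafe_b64encode(b"f").decode("ascii")
print(urllib.parse.quote(padded))       # Zg%3D%3D
print(b64url_encode_unpadded(b"f"))     # Zg
```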

9
Q

Is base64url safe only for URLs?

A

No, it is both URL- and filename-safe.

10
Q

What is the Base 64 encoding designed for?

A

The Base 64 encoding is designed to represent arbitrary sequences of octets in a form that allows the use of both upper- and lowercase letters but that need not be human-readable.

It is the most compact of the three encodings (4 characters per 3 octets). If the system is case-sensitive, this is the best choice.

11
Q

What is the Base 32 encoding designed for?

A

The Base 32 encoding is designed to represent arbitrary sequences of octets in a form that needs to be case insensitive but that need not be human-readable.

12
Q

Why is it named base32?

A

Because base32 uses a 32-character subset of US-ASCII, so each character represents 5 bits. Strictly speaking it uses 33 characters, but ‘=’ is a special pad character.

13
Q

Describe the base32 encoding process.

A

Iteratively:
  1. Get the next 40 bits of input (5 octets).
  2. Transform the 40 bits into 8 alphabet characters:
    Iteratively:
    a. get the next 5 bits
    b. interpret them as an integer
    c. use the integer as an index into the alphabet to get the character

The input length is not always a multiple of 40 bits, which is why padding is used. Four situations are possible:
  1. One octet is missing at the end of the data.
  2. Two octets are missing at the end of the data.
  3. Three octets are missing at the end of the data.
  4. Four octets are missing at the end of the data.

How to handle these situations, respectively:
  1. The final quantum of encoding input is exactly 32 bits; here, the final unit of encoded output will be seven characters followed by one “=” padding character.
  2. The final quantum of encoding input is exactly 24 bits; here, the final unit of encoded output will be five characters followed by three “=” padding characters.
  3. The final quantum of encoding input is exactly 16 bits; here, the final unit of encoded output will be four characters followed by four “=” padding characters.
  4. The final quantum of encoding input is exactly 8 bits; here, the final unit of encoded output will be two characters followed by six “=” padding characters.

Note that, unlike base64, the number of ‘=’ characters does not equal the number of missing octets: 1, 2, 3, or 4 missing octets give 1, 3, 4, or 6 pad characters. Padding simply fills the group up to the expected 8 characters; if only two characters can be produced, the remaining six are ‘=’.

In the first situation there are only 32 bits of input, yet seven characters need 35 bits. The absent bits are set to zero.
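
The steps above can be sketched in Python as follows. This is a minimal illustration; `B32_ALPHABET` and `b32_encode` are names invented for the example.

```python
B32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"

def b32_encode(data: bytes) -> str:
    """Encode octets as padded base32 text (hypothetical helper)."""
    out = []
    for i in range(0, len(data), 5):
        group = data[i:i + 5]
        missing = 5 - len(group)  # 0 to 4 missing octets
        # Pack the group into 40 bits, filling absent octets with zero bits.
        bits = int.from_bytes(group + b"\x00" * missing, "big")
        # Split the 40 bits into eight 5-bit indexes, most significant first.
        chars = [B32_ALPHABET[(bits >> shift) & 0x1F] for shift in range(35, -1, -5)]
        # Characters that carry real data: ceil(number_of_data_bits / 5).
        used = (len(group) * 8 + 4) // 5
        out.append("".join(chars[:used]) + "=" * (8 - used))
    return "".join(out)

print(b32_encode(b"f"), b32_encode(b"foob"))
```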

Standard alphabet of base32:

Value Encoding  Value Encoding  Value Encoding  Value Encoding
    0 A             9 J            18 S            27 3
    1 B            10 K            19 T            28 4
    2 C            11 L            20 U            29 5
    3 D            12 M            21 V            30 6
    4 E            13 N            22 W            31 7
    5 F            14 O            23 X
    6 G            15 P            24 Y         (pad) =
    7 H            16 Q            25 Z
    8 I            17 R            26 2

14
Q

What is base32hex?

A

This encoding is identical to base32, except for the alphabet.

The “Extended Hex” Base 32 Alphabet:

Value Encoding  Value Encoding  Value Encoding  Value Encoding
    0 0             9 9            18 I            27 R
    1 1            10 A            19 J            28 S
    2 2            11 B            20 K            29 T
    3 3            12 C            21 L            30 U
    4 4            13 D            22 M            31 V
    5 5            14 E            23 N
    6 6            15 F            24 O         (pad) =
    7 7            16 G            25 P
    8 8            17 H            26 Q

15
Q

What is the main property of base32hex?

A

One property with this alphabet, which the base64 and base32 alphabets lack, is that encoded data maintains its sort order when the encoded data is compared bit-wise.

To check this: take any three original strings and encode each of them with base32hex, which gives three encoded strings. Sort the original strings bitwise, then sort the encoded strings bitwise; both lists come out in the same order.
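
The property can be tried out with a small Python sketch. Two assumptions are made here: the helper `b32hex_encode_unpadded` is invented for the example, and it omits the ‘=’ padding, because in ASCII ‘=’ sorts between the digits and the letters and could interfere when inputs of different lengths are compared. The sample strings are arbitrary.

```python
B32HEX_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUV"

def b32hex_encode_unpadded(data: bytes) -> str:
    """base32hex without '=' padding (hypothetical helper)."""
    bits = "".join(f"{octet:08b}" for octet in data)
    bits += "0" * (-len(bits) % 5)  # zero-fill the last 5-bit unit
    return "".join(
        B32HEX_ALPHABET[int(bits[i:i + 5], 2)] for i in range(0, len(bits), 5)
    )

samples = [b"foobar", b"foo", b"\xff", b"\x00", b"bar", b"fo"]

by_original = sorted(samples)                            # bytewise order
by_encoded = sorted(samples, key=b32hex_encode_unpadded)  # order of encodings

print(by_original == by_encoded)
```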

16
Q

What is another name for the base16 encoding?

A

hex

17
Q

Why is it named base16?

A

Because base16 uses a 16-character subset of US-ASCII, so each character represents 4 bits. Base16 needs no padding, because each group is exactly one octet.

18
Q

Describe the base16 encoding process.

A

Iteratively:
  1. Get the next 8 bits of input (1 octet).
  2. Transform the 8 bits into 2 alphabet characters:
    Iteratively:
    a. get the next 4 bits
    b. interpret them as an integer
    c. use the integer as an index into the alphabet to get the character
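
The steps above can be sketched in Python as follows; `B16_ALPHABET` and `b16_encode` are names invented for the example.

```python
B16_ALPHABET = "0123456789ABCDEF"

def b16_encode(data: bytes) -> str:
    """Encode octets as base16 (hex) text (hypothetical helper)."""
    out = []
    for octet in data:
        out.append(B16_ALPHABET[octet >> 4])    # high 4 bits
        out.append(B16_ALPHABET[octet & 0x0F])  # low 4 bits
    return "".join(out)

print(b16_encode(b"foobar"))
```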

Standard alphabet of base16:

Value Encoding  Value Encoding  Value Encoding  Value Encoding
    0 0             4 4             8 8            12 C
    1 1             5 5             9 9            13 D
    2 2             6 6            10 A            14 E
    3 3             7 7            11 B            15 F

19
Q

What is the simplest way to test your own implementation of the base encodings?

A

RFC 4648 contains test vectors; see Section 10, “Test Vectors”.
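
A possible way to use those vectors is sketched below in Python. The helper `failures` is hypothetical, and the standard `base64` module is used only as a stand-in for the implementation under test.

```python
import base64

# (input, base64, base32, base16) as listed in RFC 4648, Section 10.
TEST_VECTORS = [
    (b"", "", "", ""),
    (b"f", "Zg==", "MY======", "66"),
    (b"fo", "Zm8=", "MZXQ====", "666F"),
    (b"foo", "Zm9v", "MZXW6===", "666F6F"),
    (b"foob", "Zm9vYg==", "MZXW6YQ=", "666F6F62"),
    (b"fooba", "Zm9vYmE=", "MZXW6YTB", "666F6F6261"),
    (b"foobar", "Zm9vYmFy", "MZXW6YTBOI======", "666F6F626172"),
]

def failures(encode64, encode32, encode16):
    """Return the inputs for which any encoder disagrees with the RFC."""
    bad = []
    for raw, e64, e32, e16 in TEST_VECTORS:
        got = (encode64(raw), encode32(raw), encode16(raw))
        if got != (e64, e32, e16):
            bad.append(raw)
    return bad

# Stand-in encoders; replace these with the implementation under test.
stdlib = [
    lambda b, f=f: f(b).decode("ascii")
    for f in (base64.b64encode, base64.b32encode, base64.b16encode)
]
print(failures(*stdlib))  # an empty list means every vector matched
```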

20
Q

Describe the base64 decoding algorithm.

A

The input is a string of characters, not a byte stream.
The length of the decoder input is always a multiple of 4 characters, because of the padding added during encoding.

Recall that base64 encoding iterates over the input in groups of 3 octets (24 bits) and produces 4 characters per group (6 bits per character).

  1. Get the next 4 characters of input.
  2. Transform every character into bits:
    Iteratively:
    a. get the next character
    b. look up the index of the character in the alphabet (for the pad character use 0)
    c. convert the index to 6 bits
    d. append the bits to the group result (at the end, the group result holds 24 bits)
  3. If the group contains no padding, append all 24 bits of the group result to the output;
    if the group contains one pad character, append only the first 16 bits;
    if the group contains two pad characters, append only the first 8 bits.
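
The steps above can be sketched in Python as follows. This is a minimal illustration that assumes padded input and rejects anything outside the alphabet; the names `B64_INDEX` and `b64_decode` are invented for the example.

```python
B64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)
B64_INDEX = {ch: i for i, ch in enumerate(B64_ALPHABET)}

def b64_decode(text: str) -> bytes:
    """Decode padded base64 text (hypothetical helper)."""
    if len(text) % 4:
        raise ValueError("padded base64 input must be a multiple of 4 characters")
    out = bytearray()
    for i in range(0, len(text), 4):
        group = text[i:i + 4]
        pads = len(group) - len(group.rstrip("="))
        # Padding is only valid in the last group, and at most twice.
        if pads > 2 or (pads and i + 4 != len(text)):
            raise ValueError("invalid padding")
        bits = 0
        for ch in group[:4 - pads]:
            if ch not in B64_INDEX:
                raise ValueError(f"character outside the alphabet: {ch!r}")
            bits = (bits << 6) | B64_INDEX[ch]
        bits <<= 6 * pads  # each pad character contributes six zero bits
        # Keep 24, 16 or 8 bits depending on the number of pad characters.
        out += bits.to_bytes(3, "big")[:3 - pads]
    return bytes(out)

print(b64_decode("Zm9vYg=="))
```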

Standard alphabet of base64:

Value Encoding  Value Encoding  Value Encoding  Value Encoding
    0 A            17 R            34 i            51 z
    1 B            18 S            35 j            52 0
    2 C            19 T            36 k            53 1
    3 D            20 U            37 l            54 2
    4 E            21 V            38 m            55 3
    5 F            22 W            39 n            56 4
    6 G            23 X            40 o            57 5
    7 H            24 Y            41 p            58 6
    8 I            25 Z            42 q            59 7
    9 J            26 a            43 r            60 8
   10 K            27 b            44 s            61 9
   11 L            28 c            45 t            62 +
   12 M            29 d            46 u            63 /
   13 N            30 e            47 v
   14 O            31 f            48 w         (pad) =
   15 P            32 g            49 x
   16 Q            33 h            50 y

21
Q

Describe the base32 decoding algorithm.

A

The input is a string of characters, not a byte stream.
The length of the decoder input is always a multiple of 8 characters, because of the padding added during encoding.

Recall that base32 encoding iterates over the input in groups of 5 octets (40 bits) and produces 8 characters per group (5 bits per character).

  1. Get the next 8 characters of input.
  2. Transform every character into bits:
    Iteratively:
    a. get the next character
    b. look up the index of the character in the alphabet (for the pad character use 0)
    c. convert the index to 5 bits
    d. append the bits to the group result (at the end, the group result holds 40 bits)
  3. If the group contains no padding, append all 40 bits of the group result to the output;
    if the group contains one pad character, append only the first 32 bits;
    if the group contains three pad characters, append only the first 24 bits;
    if the group contains four pad characters, append only the first 16 bits;
    if the group contains six pad characters, append only the first 8 bits.
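
The steps above can be sketched in Python as follows. This is a minimal illustration that assumes padded, uppercase input; the names `B32_INDEX`, `OCTETS_FOR_PADS` and `b32_decode` are invented for the example.

```python
B32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
B32_INDEX = {ch: i for i, ch in enumerate(B32_ALPHABET)}

# Number of pad characters -> octets of real data in the final group.
OCTETS_FOR_PADS = {0: 5, 1: 4, 3: 3, 4: 2, 6: 1}

def b32_decode(text: str) -> bytes:
    """Decode padded base32 text (hypothetical helper)."""
    if len(text) % 8:
        raise ValueError("padded base32 input must be a multiple of 8 characters")
    out = bytearray()
    for i in range(0, len(text), 8):
        group = text[i:i + 8]
        pads = len(group) - len(group.rstrip("="))
        # Only 0, 1, 3, 4 or 6 pad characters are valid, and only at the end.
        if pads not in OCTETS_FOR_PADS or (pads and i + 8 != len(text)):
            raise ValueError("invalid padding")
        bits = 0
        for ch in group[:8 - pads]:
            if ch not in B32_INDEX:
                raise ValueError(f"character outside the alphabet: {ch!r}")
            bits = (bits << 5) | B32_INDEX[ch]
        bits <<= 5 * pads  # each pad character contributes five zero bits
        out += bits.to_bytes(5, "big")[:OCTETS_FOR_PADS[pads]]
    return bytes(out)

print(b32_decode("MZXW6YQ="))
```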

Standard alphabet of base32:

Value Encoding  Value Encoding  Value Encoding  Value Encoding
    0 A             9 J            18 S            27 3
    1 B            10 K            19 T            28 4
    2 C            11 L            20 U            29 5
    3 D            12 M            21 V            30 6
    4 E            13 N            22 W            31 7
    5 F            14 O            23 X
    6 G            15 P            24 Y         (pad) =
    7 H            16 Q            25 Z
    8 I            17 R            26 2

22
Q

Describe the base16 decoding algorithm.

A

The input is a string of characters, not a byte stream.
The length of the decoder input is always a multiple of 2 characters.

Recall that base16 encoding iterates over the input one octet (8 bits) at a time and produces 2 characters per octet (4 bits per character).

Base16 has no padding.

  1. Get the next 2 characters of input.
  2. Transform every character into bits:
    Iteratively:
    a. get the next character
    b. look up the index of the character in the alphabet
    c. convert the index to 4 bits
    d. append the bits to the group result (at the end, the group result holds 8 bits)
  3. Append the group result to the output.
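
The steps above can be sketched in Python as follows. The names `B16_INDEX` and `b16_decode` are invented for the example, and accepting only the uppercase letters shown in the alphabet table is a design choice of this sketch.

```python
B16_INDEX = {ch: i for i, ch in enumerate("0123456789ABCDEF")}

def b16_decode(text: str) -> bytes:
    """Decode base16 (hex) text (hypothetical helper, uppercase only)."""
    if len(text) % 2:
        raise ValueError("base16 input must be a multiple of 2 characters")
    out = bytearray()
    for i in range(0, len(text), 2):
        high, low = text[i], text[i + 1]
        if high not in B16_INDEX or low not in B16_INDEX:
            raise ValueError("character outside the alphabet")
        # The first character holds the high 4 bits, the second the low 4 bits.
        out.append((B16_INDEX[high] << 4) | B16_INDEX[low])
    return bytes(out)

print(b16_decode("666F6F626172"))
```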

Standard alphabet of base16:

Value Encoding  Value Encoding  Value Encoding  Value Encoding
    0 0             4 4             8 8            12 C
    1 1             5 5             9 9            13 D
    2 2             6 6            10 A            14 E
    3 3             7 7            11 B            15 F