Base64, Base32, Base16 (10.12.2022 3M) Flashcards
What is the purpose of base encoding of data?
The main purpose is to store and transfer arbitrary data in systems that are restricted to US-ASCII.
Describe the discrepancy around line feeds in encoded data, and the requirement that applies to it.
Multipurpose Internet Mail Extensions (MIME) uses base64.
However, MIME limits the line length of base64-encoded data to 76 characters, so encoders must insert a line feed at least every 76 characters.
MIME inherits the encoding from Privacy Enhanced Mail (PEM) [3], stating that it is “virtually identical”; however, PEM uses a line length of 64 characters. The MIME and PEM limits are both due to limits within SMTP.
Implementations MUST NOT add line feeds to base-encoded data unless the specification referring to this document explicitly directs base encoders to add line feeds after a specific number of characters.
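The two conventions are easy to tell apart in practice. The sketch below uses Python's standard base64 module, where b64encode produces plain RFC 4648 output with no line feeds and encodebytes applies the MIME rule (a newline after every 76 output characters, plus a trailing newline). The 100-byte input is arbitrary.

```python
import base64

data = bytes(range(100))  # 100 arbitrary octets -> 136 base64 characters

# RFC 4648 style: one unbroken line, no line feeds added.
plain = base64.b64encode(data)

# MIME style (RFC 2045): newline after every 76 output characters, plus a trailing one.
mime = base64.encodebytes(data)

print(b"\n" in plain)                             # False
print([len(line) for line in mime.splitlines()])  # [76, 60]

# A decoder that ignores the line feeds recovers the same octets.
print(base64.b64decode(mime) == data)             # True
```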
Describe the interpretation of non-alphabet characters in encoded data.
Base encodings use a specific, reduced alphabet to encode binary data. Non-alphabet characters may be exploited as a “covert channel”, where non-protocol data can be sent for nefarious purposes. Non-alphabet characters might also be sent in order to exploit implementation errors leading to, e.g., buffer overflow attacks.
Implementations MUST reject the encoded data if it contains characters outside the base alphabet when interpreting base-encoded data unless the specification referring to this document explicitly states otherwise.
Such specifications may instead state, as MIME does, that characters outside the base encoding alphabet should simply be ignored when interpreting data (“be liberal in what you accept”).
Note that this means that any adjacent carriage return/line feed (CRLF) characters constitute “non-alphabet characters” and are ignored. Furthermore, such specifications MAY ignore the pad character, “=”, treating it as non-alphabet data, if it is present before the end of the encoded data. If more than the allowed number of pad characters is found at the end of the string (e.g., a base 64 string terminated with “===”), the excess pad characters MAY also be ignored.
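Python's standard base64.b64decode happens to support both policies, which makes the difference concrete. By default (validate=False) it discards characters outside the alphabet, MIME-style; with validate=True it rejects them with binascii.Error, in line with the RFC's default requirement. A minimal sketch, where the input is simply "Hello" encoded with a CRLF inserted in the middle:

```python
import base64
import binascii

encoded = b"SGVs\r\nbG8="  # "Hello" in base64, with a CRLF in the middle

# Liberal, MIME-style decoding: non-alphabet characters (the CRLF) are discarded.
liberal = base64.b64decode(encoded)  # validate defaults to False

# Strict decoding: any non-alphabet character makes the input invalid.
try:
    base64.b64decode(encoded, validate=True)
    strict_rejected = False
except binascii.Error:
    strict_rejected = True

print(liberal)          # b'Hello'
print(strict_rejected)  # True
```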
What is the number of the RFC that describes base64, base32, and base16?
RFC 4648
Why is it called base64?
Because base64 uses a 64-character subset of US-ASCII, so each character carries 6 bits (2^6 = 64). Strictly speaking, it uses 65 characters, but ‘=’ is a special pad character that carries no data.
Describe the base64 encoding process.
Iteratively:
1. Take the next 24 bits of input (3 octets).
2. Transform the 24 bits into 4 alphabet characters:
Iteratively:
a. take the next 6 bits;
b. interpret them as an integer (0 to 63);
c. use the integer as an index into the alphabet to get the character.
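The two nested loops can be sketched in Python for a single, complete 3-octet group. The function name encode_b64_group is hypothetical, and the alphabet string is the standard base64 alphabet listed in the table below; this is an illustration, not a production encoder.

```python
# Standard base64 alphabet (RFC 4648): position in the string = value 0..63.
ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")

def encode_b64_group(group: bytes) -> str:
    """Encode exactly 3 octets (24 bits) into 4 base64 characters."""
    if len(group) != 3:
        raise ValueError("expected exactly 3 octets")
    bits = int.from_bytes(group, "big")  # the 24 input bits as one integer
    chars = []
    for shift in (18, 12, 6, 0):         # a. next 6 bits, most significant first
        index = (bits >> shift) & 0x3F   # b. as an integer 0..63
        chars.append(ALPHABET[index])    # c. alphabet lookup
    return "".join(chars)

print(encode_b64_group(b"Man"))  # TWFu
```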
The input length is not always a multiple of 24 bits, which is why padding is used. Only two cases are possible:
1. Two octets are missing from the final 24-bit group (one octet remains).
2. One octet is missing from the final 24-bit group (two octets remain).
These cases are handled as follows:
1. The final quantum of encoding input is exactly 8 bits; here, the final unit of encoded output will be two characters followed by two “=” padding characters.
2. The final quantum of encoding input is exactly 16 bits; here, the final unit of encoded output will be three characters followed by one “=” padding character.
As you may notice, the number of ‘=’ characters equals the number of missing octets. Every group is expected to produce 4 characters; if only two can be produced, the remaining two are ‘=’.
You may ask: in the first case there are only 8 bits, yet two characters need 12 bits. The answer is that the absent bits are filled with zeroes.
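Handling the final partial quantum amounts to appending zero bits and replacing the characters that carry no input with ‘=’. A minimal sketch in Python; b64_encode is a hypothetical name, and the output can be checked against the standard library's base64.b64encode.

```python
import base64

ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")

def b64_encode(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]              # 3 octets, or fewer at the end
        missing = 3 - len(chunk)           # 0, 1 or 2 missing octets
        # Fill the absent bits with zeroes so there are always 24 bits.
        bits = int.from_bytes(chunk + b"\x00" * missing, "big")
        chars = [ALPHABET[(bits >> s) & 0x3F] for s in (18, 12, 6, 0)]
        # 8 bits -> 2 characters, 16 bits -> 3 characters, 24 bits -> 4.
        used = 4 - missing
        out.append("".join(chars[:used]) + "=" * missing)
    return "".join(out)

print(b64_encode(b"f"))   # Zg==
print(b64_encode(b"fo"))  # Zm8=
print(base64.b64encode(b"fo").decode() == b64_encode(b"fo"))  # True
```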
Standard alphabet of base64:
Value Encoding   Value Encoding   Value Encoding   Value Encoding
    0 A             17 R             34 i             51 z
    1 B             18 S             35 j             52 0
    2 C             19 T             36 k             53 1
    3 D             20 U             37 l             54 2
    4 E             21 V             38 m             55 3
    5 F             22 W             39 n             56 4
    6 G             23 X             40 o             57 5
    7 H             24 Y             41 p             58 6
    8 I             25 Z             42 q             59 7
    9 J             26 a             43 r             60 8
   10 K             27 b             44 s             61 9
   11 L             28 c             45 t             62 +
   12 M             29 d             46 u             63 /
   13 N             30 e             47 v
   14 O             31 f             48 w          (pad) =
   15 P             32 g             49 x
   16 Q             33 h             50 y
What is the difference between base64 and base64url?
The main difference is in the characters for values 62 and 63 of the encoding alphabet.
In base64 they are ‘+’ and ‘/’, which are not “URL friendly”; base64url replaces them with ‘-’ and ‘_’.
What is special about the pad character ‘=’ in base64url?
The pad character “=” is typically percent-encoded when used in a URL, but if the data length is known implicitly, the padding can be omitted altogether.
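Both points can be seen with Python's standard base64 module. The input below is chosen so that its 6-bit groups are exactly the values 62 and 63. The helpers encode_unpadded and decode_unpadded are hypothetical; they assume, as the card does, that the decoder can recompute the padding from the length of the string.

```python
import base64

data = bytes([0xFB, 0xFF, 0xBF])  # bits 111110 111111 111110 111111 -> 62, 63, 62, 63

print(base64.b64encode(data))          # b'+/+/'
print(base64.urlsafe_b64encode(data))  # b'-_-_'

def encode_unpadded(raw: bytes) -> str:
    """Hypothetical helper: base64url without the trailing '=' characters."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

def decode_unpadded(text: str) -> bytes:
    """Hypothetical helper: restore the padding from the length, then decode."""
    padded = text + "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(padded)

token = encode_unpadded(b"f")
print(token)                   # Zg
print(decode_unpadded(token))  # b'f'
```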
Is base64url safe only for URLs?
No, it is both URL and filename safe.
What is the Base 64 encoding designed for?
The Base 64 encoding is designed to represent arbitrary sequences of octets in a form that allows the use of both upper- and lowercase letters but that need not be human-readable.
It is the most compact of the RFC 4648 encodings (4 characters for every 3 octets). If the system is case-sensitive, it is the best choice.
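To put “most compact” into numbers, the sketch below encodes the same 30 octets with the three RFC 4648 encodings available in Python's standard base64 module. The length 30 is chosen because it is a multiple of both 3 and 5, so no padding is involved.

```python
import base64

data = bytes(30)  # 30 arbitrary octets

sizes = {
    "base16": len(base64.b16encode(data)),  # 2 characters per octet
    "base32": len(base64.b32encode(data)),  # 8 characters per 5 octets
    "base64": len(base64.b64encode(data)),  # 4 characters per 3 octets
}
print(sizes)  # {'base16': 60, 'base32': 48, 'base64': 40}
```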
What Base 32 encoding is designed for?
The Base 32 encoding is designed to represent arbitrary sequences of octets in a form that needs to be case insensitive but that need not be human-readable.
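As an illustration of case insensitivity, base32 data that has been lowercased (for example by a system that folds case) can still be decoded. In Python's standard base64 module, b32decode accepts lowercase input only when casefold=True; a minimal sketch:

```python
import base64
import binascii

upper = base64.b32encode(b"hello")  # b'NBSWY3DP'
lower = upper.lower()               # b'nbswy3dp', as if a case-folding system lowercased it

# Default decoding rejects the lowercase letters ...
try:
    base64.b32decode(lower)
    default_accepts = True
except binascii.Error:
    default_accepts = False

# ... but with casefold=True the data decodes regardless of case.
restored = base64.b32decode(lower, casefold=True)

print(default_accepts)  # False
print(restored)         # b'hello'
```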
Why is it called base32?
Because base32 uses a 32-character subset of US-ASCII, so each character carries 5 bits (2^5 = 32). Strictly speaking, it uses 33 characters, but ‘=’ is a special pad character that carries no data.
Describe the base32 encoding process.
Iteratively:
1. Take the next 40 bits of input (5 octets).
2. Transform the 40 bits into 8 alphabet characters:
Iteratively:
a. take the next 5 bits;
b. interpret them as an integer (0 to 31);
c. use the integer as an index into the alphabet to get the character.
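The same loop structure for one complete 5-octet group, as a Python sketch. The name encode_b32_group is hypothetical; the alphabet string is the standard base32 alphabet from the table below.

```python
# Standard base32 alphabet (RFC 4648): position in the string = value 0..31.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"

def encode_b32_group(group: bytes) -> str:
    """Encode exactly 5 octets (40 bits) into 8 base32 characters."""
    if len(group) != 5:
        raise ValueError("expected exactly 5 octets")
    bits = int.from_bytes(group, "big")  # the 40 input bits as one integer
    chars = []
    for shift in range(35, -1, -5):      # a. next 5 bits, most significant first
        index = (bits >> shift) & 0x1F   # b. as an integer 0..31
        chars.append(ALPHABET[index])    # c. alphabet lookup
    return "".join(chars)

print(encode_b32_group(b"hello"))  # NBSWY3DP
```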
The input length is not always a multiple of 40 bits, which is why padding is used. Four cases are possible:
1. One octet is missing from the final 40-bit group (four octets remain).
2. Two octets are missing from the final 40-bit group (three octets remain).
3. Three octets are missing from the final 40-bit group (two octets remain).
4. Four octets are missing from the final 40-bit group (one octet remains).
These cases are handled as follows:
1. The final quantum of encoding input is exactly 32 bits; here, the final unit of encoded output will be seven characters followed by one “=” padding character.
2. The final quantum of encoding input is exactly 24 bits; here, the final unit of encoded output will be five characters followed by three “=” padding characters.
3. The final quantum of encoding input is exactly 16 bits; here, the final unit of encoded output will be four characters followed by four “=” padding characters.
4. The final quantum of encoding input is exactly 8 bits; here, the final unit of encoded output will be two characters followed by six “=” padding characters.
Note that, unlike base64, the number of ‘=’ characters is not equal to the number of missing octets. Every group is expected to produce 8 characters: the characters needed to carry the remaining bits come first, and ‘=’ fills the rest. For example, if only two characters can be produced, the remaining six are ‘=’.
You may ask: in the first case there are only 32 bits, yet seven characters need 35 bits. The answer is that the absent bits are filled with zeroes.
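The padding rule can be derived instead of memorized: a final quantum of r octets needs ceil(8r / 5) characters for its data bits, and ‘=’ fills the rest of the 8 positions. A minimal Python sketch (pad_count is a hypothetical helper) that compares this formula with what the standard library's base64.b32encode actually emits:

```python
import base64

def pad_count(remaining_octets: int) -> int:
    """Number of '=' characters after a final quantum of 1..5 octets."""
    data_bits = 8 * remaining_octets
    data_chars = -(-data_bits // 5)  # ceil(data_bits / 5) without floats
    return 8 - data_chars

for r in range(1, 6):
    encoded = base64.b32encode(b"x" * r)
    # octets, encoded group, predicted '=' count, actual '=' count
    print(r, encoded, pad_count(r), encoded.count(b"="))
```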
Standard alphabet of base32:
Value Encoding   Value Encoding   Value Encoding   Value Encoding
    0 A              9 J             18 S             27 3
    1 B             10 K             19 T             28 4
    2 C             11 L             20 U             29 5
    3 D             12 M             21 V             30 6
    4 E             13 N             22 W             31 7
    5 F             14 O             23 X
    6 G             15 P             24 Y          (pad) =
    7 H             16 Q             25 Z
    8 I             17 R             26 2
What is base32hex?
This encoding is identical to base32, except for the alphabet.
The “Extended Hex” Base 32 Alphabet:
Value Encoding   Value Encoding   Value Encoding   Value Encoding
    0 0              9 9             18 I             27 R
    1 1             10 A             19 J             28 S
    2 2             11 B             20 K             29 T
    3 3             12 C             21 L             30 U
    4 4             13 D             22 M             31 V
    5 5             14 E             23 N
    6 6             15 F             24 O          (pad) =
    7 7             16 G             25 P
    8 8             17 H             26 Q
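Because only the alphabet differs, a base32hex encoder can be sketched by mapping each standard base32 character to the character with the same value in the table above. Python 3.10 and later also provide base64.b32hexencode directly; the translation below (b32hex_encode is a hypothetical name) is meant to show that the bit grouping and padding are unchanged.

```python
import base64

STANDARD = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
EXTENDED_HEX = "0123456789ABCDEFGHIJKLMNOPQRSTUV"
# Map each standard character to the one with the same value; '=' is not in
# the table, so str.translate leaves the padding untouched.
TO_HEX = str.maketrans(STANDARD, EXTENDED_HEX)

def b32hex_encode(data: bytes) -> str:
    """Hypothetical helper: same bit grouping as base32, different alphabet."""
    return base64.b32encode(data).decode("ascii").translate(TO_HEX)

print(b32hex_encode(b"f"))       # CO======
print(b32hex_encode(b"foobar"))  # CPNMUOJ1E8======
```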
What is the main property of base32hex?
One property with this alphabet, which the base64 and base32 alphabets lack, is that encoded data maintains its sort order when the encoded data is compared bit-wise.
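To check this: take any 3 original strings of the same length and encode them with base32hex, so that you also have 3 encoded strings. If you sort the original strings bitwise and then sort the encoded strings bitwise, both lists end up in the same order. (With inputs of different lengths the ‘=’ padding can break this, because ‘=’ sorts after the digits ‘0’ to ‘9’; leaving the padding out avoids the problem.)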
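The check can be run directly. The sketch below uses equal-length inputs so that padding falls in the same positions, derives base32hex from Python's standard base32 output by translating the alphabet, and contrasts it with standard base32, whose digits ‘2’ to ‘7’ (values 26 to 31) sort before ‘A’ in ASCII. The helper names b32 and b32hex are hypothetical.

```python
import base64

STANDARD = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
EXTENDED_HEX = "0123456789ABCDEFGHIJKLMNOPQRSTUV"
TO_HEX = str.maketrans(STANDARD, EXTENDED_HEX)

def b32(data: bytes) -> str:
    return base64.b32encode(data).decode("ascii")

def b32hex(data: bytes) -> str:
    return b32(data).translate(TO_HEX)

# Equal-length inputs, so the '=' padding sits in the same positions everywhere.
inputs = [b"\xd0", b"\x00", b"\x7f", b"\x41"]

by_original = sorted(inputs)          # byte-wise order of the raw data
by_hex = sorted(inputs, key=b32hex)   # order of the base32hex strings
by_std = sorted(inputs, key=b32)      # order of the standard base32 strings

print(by_original == by_hex)  # True
print(by_original == by_std)  # False: b"\xd0" encodes as "2A======", and '2' < 'A'
```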