IEEE 754 Floating Point Flashcards
What is the most common representation for real numbers on computers?
IEEE Standard 754 floating point
What form does floating point encode rational numbers/values?
V = x × 2^y
What types of computations is floating point useful for?
- Very large numbers: |V|»_space; 0
- Numbers very close to 0: |V| «_space;1
- Approximation to real numbers
What is the bit size of single precision in floating point representation?
32 bits
What is the bit size of double precision in floating point representation?
64 bits
What is the range of single precision floating point?
+/- 2^-149 to +/- 2^127
What is the range of double precision floating point?
+/- 2^-1074 to +/- 2^1023
What does overflow mean in floating point representation?
Values have grown too large for the representation
What does underflow mean in floating point representation?
A value too small to be represented, resulting in a loss of precision
What is a characteristic of real numbers in floating point representation?
Having a decimal portion that is not necessarily equal to 0
How is the decimal number 123.14 represented in base 10?
110^2 + 210^1 + 310^0 + 110^-1 + 4*10^-2
What is the digit format for a decimal number?
dmdm-1…d1d0.d-1d-2…d-n
How is a binary number like 110.11 represented in base 2?
12^2 + 12^1 + 02^0 + 12^-1 + 1*2^-2
What is the digit format for a binary number?
bmbm-1…b1b0.b-1b-2…b-n
What happens when you shift the binary point one position left?
Divides the number by 2
What happens when you shift the binary point one position right?
Multiplies the number by 2
True or False: Numbers like 0.111…11 base 2 can represent numbers just below 1.
True
What type of numbers can fractional binary notation represent exactly?
Rational numbers with denominators that are powers of 2
What is an example of a number that cannot be represented exactly in binary?
1/3 and 5/7
What is the general form for numbers that can be represented in fractional binary notation?
x * 2^y (for integers x and y)
How can increasing accuracy be achieved in binary representation?
By lengthening the binary representation
Fill in the blank: Only finite-length encodings can be stored; ______ and 5/7 cannot be represented exactly.
1/3
What is the typical format for a number in scientific notation?
Only 1 digit to the left of the decimal point
Example: 6.385 * 10^4 equals 63,850
What are the four fields when representing a number in scientific notation?
- Sign
- Left side of decimal point
- Right side of decimal point
- Exponent
True or False: The IEEE 754 standard is used for consistent floating-point representation across computers.
True
What does the biased representation allow for negative numbers?
Operations on the biased numbers to be the same as for unsigned integers
What is the bias value typically chosen based on?
The number of bits available for representing an integer
What are the three fields in the encoding of a real number value in IEEE 754?
- S field (sign field)
- E field (exponent field)
- F field (significand/mantissa field)
What does the S field in IEEE 754 represent?
Encodes the sign of the number: 0 for positive, 1 for negative
What is the formula for the real number decimal value represented in IEEE 754?
(-1)^s * (f) * (2^e)
How is the exponent represented in the E field of IEEE 754?
E is an 8-bit biased integer representing the exponent e
What is the bias value for single precision (32-bit) representation in IEEE 754?
127
What is the bias value for double precision (64-bit) representation in IEEE 754?
1023
Fill in the blank: In IEEE 754, the exponent field (E) contains ______ plus the true exponent for single precision.
127
What is the significance of the hidden bit in IEEE 754?
It is implied to be 1 for normalized numbers and not stored in the encoding
What are the three types of values used in IEEE 754?
- Denormalized values
- Normalized values
- Special values (e.g., +/- infinity, NaN)
What does NaN stand for in IEEE 754?
Not a Number
What is the purpose of denormalized values in IEEE 754?
To provide gradual underflow and fill the gap between zero and the smallest normalized number
How is the exponent determined from the biased representation?
e = E - bias
What happens during abrupt underflow in floating-point representation?
The relative change is infinite
True or False: Denormalized values are used to keep the relative difference between tiny numbers small.
True
What is the effect of using denormals compared to abrupt underflow?
Denormals provide a more reasonable relative change in calculations
How do you represent the decimal number 64.2 in IEEE standard single precision?
Follow a 5-step process to obtain Sign(S), Exponent(E), and Mantissa(F)
What is the first step to convert a decimal number into IEEE 754 format?
Get a binary representation for the whole number and fractional parts
What is the binary representation of 64?
0100 0000
What is the binary representation of 0.2?
.0011 0011 0011…
What happens during normalization of the binary representation?
Make it look like scientific notation with a single 1 to the left of the binary point
What is the normalized form of 64.2?
1.0000 0000 1100 1100 1100 110 x 2^6
What is the true exponent (e) for the normalized form of 64.2?
6
What is the binary representation of the exponent E for 64.2?
1000 0101
How many bits are reserved for the mantissa in single precision?
23 bits
What is the mantissa stored (F) for 64.2 in IEEE 754?
000 0000 0110 0110 0110 0110
What is the final IEEE 754 representation for 64.2?
0 1000 0101 000 0000 0110 0110 0110 0110
What is the purpose of denormals in floating-point representation?
Denormals fill the gap between 0 and the smallest floating-point number that is 252 times the size of the difference between the smallest two numbers
Denormals are used to represent numbers that are too small to be represented as normal floating-point numbers, ensuring a gradual underflow instead of abrupt underflow.
What happens during abrupt underflow in floating-point arithmetic?
The difference between two tiny but different numbers becomes zero if it is less than the minimum cutoff value
This leads to the violation of the equivalence x == y when x and y are close but not equal.
How does gradual underflow differ from abrupt underflow?
Gradual underflow allows the difference between two tiny but different normal numbers to become a denormal, preserving the equivalence
In contrast, abrupt underflow results in a loss of distinction between the two values.
Is it advisable to use denormals on purpose in calculations?
No, using denormals on purpose is not advised
Denormals were designed only as a backup mechanism in exceptional cases.
Fill in the blank: Denormals were designed as a _______ mechanism in exceptional cases.
backup