Chapter 2: Floating Point Numbers and Rounding Flashcards
What is truncation?
Dropping the unwanted bits, which rounds toward zero (the magnitude is rounded down)
What is rounding toward +- infinity?
Rounding away from zero: choosing the nearest representable floating-point number of larger magnitude (toward +infinity for positive values, toward -infinity for negative values)
What is rounding to the nearest?
rounding to the closest possible floating-point number irrespective of direction
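The three rounding modes above can be compared with a small sketch that drops the k low-order bits of an integer significand. The function name drop_bits and the mode names are hypothetical, and breaking exact ties toward an even result is an assumption (it matches the IEEE-754 default but is not stated in the cards).

```python
def drop_bits(value: int, k: int, mode: str) -> int:
    """Drop the k low-order bits (k >= 1) of a signed integer significand."""
    sign = -1 if value < 0 else 1
    mag = abs(value)
    kept = mag >> k                  # bits that survive
    lost = mag & ((1 << k) - 1)      # bits that are dropped
    if mode == "truncate":
        pass                         # toward zero: just discard the lost bits
    elif mode == "away":
        if lost:
            kept += 1                # toward +-infinity: magnitude grows
    elif mode == "nearest":
        half = 1 << (k - 1)
        # Assumption: an exact tie goes to the even neighbour.
        if lost > half or (lost == half and kept & 1):
            kept += 1
    else:
        raise ValueError("unknown mode: " + mode)
    return sign * kept


# 0b10101 with two bits dropped: kept = 0b101 (5), lost = 0b01
print(drop_bits(0b10101, 2, "truncate"))  # 5
print(drop_bits(0b10101, 2, "away"))      # 6
print(drop_bits(0b10101, 2, "nearest"))   # 5
```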
What is normalization?
Rewriting a number so that exactly one non-zero digit sits to the left of the radix point, with an exponent that compensates for the shift of the radix point
Explain how to normalize a fractional binary value
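Shift the radix point to the right until exactly one 1 sits to its left, and decrease the exponent by one for each position shifted (e.g. 0.00101 = 1.01 x 2^-3)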
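As a rough illustration of normalization, the sketch below uses Python's math.frexp, which returns a fraction in [0.5, 1) and a power of two, and rescales the pair so that a single 1 leads the radix point. The helper name normalize is hypothetical.

```python
import math


def normalize(x: float):
    """Return (significand, exponent) with 1 <= |significand| < 2
    and x == significand * 2**exponent."""
    if x == 0:
        raise ValueError("zero has no normalized form")
    m, e = math.frexp(x)    # x == m * 2**e with 0.5 <= |m| < 1
    return m * 2, e - 1     # move the radix point one place to the right


# 0.15625 is 0.00101 in binary, i.e. 1.01 x 2^-3 (1.01 binary = 1.25)
print(normalize(0.15625))   # (1.25, -3)
```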
What is the Significand also called?
Mantissa
What is the significand/mantissa?
The significant bits of a normalized number
What is the significand/mantissa represented in?
sign and magnitude format (leading bit is the sign bit)
How many unique zeros are there in IEEE-754 format?
2:
(00000000000000000000000000000000) and (10000000000000000000000000000000)
What is the general bias of 32-bit IEEE-754 format?
127
What does the leading bit of IEEE-754 format represent?
The sign
What do the 8 bits after the first bit represent in IEEE-754 format?
The biased exponent
What do the final 23 bits in IEEE-754 format represent?
The fractional bits
When normalizing a binary number for IEEE-754 format, what is removed?
The leading 1 to the left of the radix point
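The field layout described in the cards above (1 sign bit, 8 biased-exponent bits, 23 fractional bits) can be checked with a short sketch built on Python's struct module. The helper name fields is hypothetical, and the reconstruction formula shown applies to normalized numbers only.

```python
import struct


def fields(x: float):
    """Split the 32-bit IEEE-754 encoding of x into (sign, exponent, fraction)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                 # leading bit
    exponent = (bits >> 23) & 0xFF    # next 8 bits, biased by 127
    fraction = bits & 0x7FFFFF        # final 23 bits, leading 1 removed
    return sign, exponent, fraction


# -6.5 = -1.101 (binary) x 2^2, so the biased exponent is 2 + 127 = 129
sign, exponent, fraction = fields(-6.5)
print(sign, exponent, hex(fraction))  # 1 129 0x500000

# Rebuild the value by restoring the implied leading 1
value = (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (exponent - 127)
print(value)                          # -6.5
```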
If every bit except possibly the first (sign) bit is 0 in IEEE-754 format, the number is:
+- zero
If the exponent is all zeros, but there is at least one non-zero bit in the mantissa in IEEE-754 format:
The number is denormalized (un-normalized): no leading 1 is implied, and the bias is 126
What is the one case where the bias of IEEE-754 format is 126 not 127?
If the exponent is all zeros, but there is at least one non-zero bit in the mantissa
If the exponent is all 1's and every fractional bit is 0 in IEEE-754 format, the number is:
+- infinity, depending on the sign bit
If the exponent is all 1’s and there is at least one non-zero fractional bit in IEEE-754 format, the number is:
NaN (Not a Number)
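The special cases on the preceding cards reduce to two tests on the exponent field and one on the fraction field. The sketch below classifies a raw 32-bit pattern accordingly. The function name classify and the returned labels are hypothetical.

```python
def classify(bits: int) -> str:
    """Classify a raw 32-bit IEEE-754 pattern by its exponent and fraction fields."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0:
        # All-zero exponent: +-zero, or denormalized if any fraction bit is set
        return "zero" if fraction == 0 else "denormalized"
    if exponent == 0xFF:
        # All-one exponent: +-infinity, or NaN if any fraction bit is set
        return "infinity" if fraction == 0 else "NaN"
    return "normalized"


for pattern in (0x00000000, 0x80000000, 0x00000001,
                0x7F800000, 0xFF800000, 0x7FC00000, 0x3F800000):
    print(f"0x{pattern:08X} -> {classify(pattern)}")
```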
Explain floating-point arithmetic:
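Make the exponents the same by shifting the significand of the smaller number, perform the addition or subtraction on the significands, then renormalize and round the result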
For floating-point arithmetic, if the difference between the exponents of the two normalized numbers is greater than the number of significant bits:
The sum is simply the larger of the two numbers, because the smaller one is shifted out entirely
For floating-point arithmetic, when will the sum of two numbers be the larger of them?
If the difference between the exponents of the two normalized numbers is greater than the number of significant bits
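This absorption effect can be observed by rounding a sum to 32-bit precision. Python floats are 64-bit, so the sketch below assumes that packing the exact 64-bit sum into a 32-bit float is an acceptable stand-in for a native 32-bit addition (the sum here is exact in 64 bits, so only one rounding occurs). The helper name to_float32 is hypothetical.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float to the nearest 32-bit IEEE-754 value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


big = 2.0 ** 25     # exponent 25
small = 1.0         # exponent 0: the difference exceeds the 24 significant bits

total = to_float32(big + small)
print(total == big)                 # True: the smaller operand is lost

# With a small exponent difference the sum is still exact
print(to_float32(2.0 ** 10 + 1.0))  # 1025.0
```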
What is the smallest possible normalized value to be represented in IEEE-754 format?
+- E = 1, F = 0x000000
+- 1.17549 x 10^-38
What is the largest possible normalized value to be represented in IEEE-754 format?
+- E = 254, F = 0x7FFFFF
+- 3.40282 x 10^38
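Both limits can be reproduced by assembling the bit patterns from the E and F values given on the cards and reinterpreting them as 32-bit floats. The helper name from_bits is hypothetical.

```python
import struct


def from_bits(bits: int) -> float:
    """Reinterpret a raw 32-bit pattern as an IEEE-754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


smallest = from_bits((1 << 23) | 0x000000)     # E = 1,   F = 0x000000
largest = from_bits((254 << 23) | 0x7FFFFF)    # E = 254, F = 0x7FFFFF

print(f"{smallest:.5e}")   # 1.17549e-38
print(f"{largest:.5e}")    # 3.40282e+38

# The same values from the formula (1 + F / 2^23) x 2^(E - 127)
print(smallest == 2.0 ** -126)                  # True
print(largest == (2 - 2.0 ** -23) * 2.0 ** 127) # True
```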
How many possible NaN values can be represented in IEEE-754 format in TOTAL?
((2^23) - 1) * 2
possible NaN values
How many possible NEGATIVE NaN values can be represented in IEEE-754 format?
((2^23) - 1)
possible NaN values
How many possible POSITIVE NaN values can be represented in IEEE-754 format?
((2^23) - 1)
possible NaN values
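The NaN counts follow from the rule that a NaN has an all-1's exponent and any non-zero pattern in the 23 fractional bits. A short sketch of the arithmetic, with hypothetical variable names:

```python
FRACTION_BITS = 23

# Every fraction pattern except all zeros (which encodes infinity) is a NaN
nan_per_sign = 2 ** FRACTION_BITS - 1
nan_total = 2 * nan_per_sign          # the sign bit doubles the count

print(nan_per_sign)   # 8388607
print(nan_total)      # 16777214
```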