Floating Point Flashcards
Most modern architectures implement the ___ standard for floating-point representations
IEEE 754
True or False: “Floating-point units (FPUs) are outdated and no longer included in CPUs”
False. Most modern CPUs include an FPU
Benefits and purpose of FPUs
FPUs provide instructions that do FP arithmetic very quickly. Without an FPU, FP arithmetic has to be emulated in software, which is very slow.
Explain the difference between fixed and floating point representations
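Fixed: the binary point stays in one position, so the number of fractional digits is constant; more precise (uniform spacing between values), smaller range
Floating: the binary point can move, so the number of fractional digits may vary; less precise (spacing grows with magnitude), large range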
How is the position of the binary point established in a fixed-point number?
By convention. It is agreed upon by the programmer and the system so it may not be the same for every program.
If the binary point comes after the second bit in a fixed point number, 0111, what decimal number will result?
0x2^1 + 1x2^0 + 1x2^(-1) + 1x2^(-2) = 1.75
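The weighted sum above can be sketched in Python as a quick check. The function name `fixed_point_value` and its parameters are made up for illustration; it assumes an unsigned bit string with `int_bits` bits before the binary point.

```python
def fixed_point_value(bits: str, int_bits: int) -> float:
    """Value of an unsigned fixed-point bit string.

    int_bits is the number of bits to the left of the binary point.
    """
    total = 0.0
    for position, bit in enumerate(bits):
        weight = int_bits - 1 - position   # 1, 0, -1, -2 for "0111" with int_bits=2
        total += int(bit) * 2.0 ** weight
    return total


print(fixed_point_value("0111", 2))  # 1.75
```

Moving the binary point changes only the weights: the same bits with `int_bits=4` give 7.0.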
What is the mantissa (significand) and what is used to represent it in a floating point representation?
The significant digits of the number, e.g. 4.65 in 4.65 x10^(-4)
Fixed point is used to represent the mantissa
What are the differences between floating-point single format and scientific notation?
- Base 2 instead of 10
- The significand (mantissa) and exponent are in binary
- The number is normalized so that 1 <= significand < 2
- Since the MSB of a normalized significand is always 1, it’s not stored
- The exponent is biased by adding the constant 127 so it’s stored as a positive (unsigned) integer
- IEEE 754 standard
What does IEEE 754 standard look like for floating point single-format numbers?
- Uses 4 bytes
- Number = (-1)^s x 1.f x 2^(e-127)
s = sign bit
f = fractional part of significand
e = biased exponent (add 127 to unbiased)
-check notes for diagram
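As a rough check of the formula, the sketch below evaluates (-1)^s x 1.f x 2^(e-127) from the three fields and compares it with what Python's `struct` module decodes from the same 32-bit word. The helper names are hypothetical, `f` is taken to be the raw 23-bit fraction field, and only normalized numbers (biased exponent 1 to 254) are handled.

```python
import struct


def single_from_fields(s: int, e: int, f: int) -> float:
    """Evaluate (-1)^s x 1.f x 2^(e-127); f is the 23-bit fraction field."""
    significand = 1.0 + f / 2**23          # restore the hidden leading 1
    return (-1.0) ** s * significand * 2.0 ** (e - 127)


def single_from_word(word: int) -> float:
    """Let struct reinterpret a 32-bit pattern as an IEEE 754 single."""
    return struct.unpack(">f", struct.pack(">I", word))[0]


# s = 0, biased e = 127, fraction = 0.1 (binary), packed into 1 + 8 + 23 bits
word = (0 << 31) | (127 << 23) | (1 << 22)
print(single_from_fields(0, 127, 1 << 22))  # 1.5
print(single_from_word(word))               # 1.5
```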
Range of biased exponents allowed
1-254
- 255 used to represent infinities and not-a-number values (NaNs)
- 0 used for zero and subnormal numbers (tiny fractional quantities)
Range for floating-point single format. Also approximately how many digits of precision?
1.175e-38 to 3.403e+38
1.0 x2^(-126) to (2.0 -ε) x2^(127)
total range:
-3.403e+38 to -1.175e-38
(gap) 0.0 (gap)
+1.175e-38 to +3.403e+38
~7 digits of precision
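The limits above can be reproduced from the formula. This is a sketch that assumes ε means 2^(-23), the spacing between adjacent single-format significands, and estimates the digit count as log10(2^24), since 24 significand bits are available including the hidden one.

```python
import math
import struct

smallest_normal = 1.0 * 2.0 ** -126        # biased exponent 1, fraction 0
epsilon = 2.0 ** -23                       # assumed meaning of ε
largest = (2.0 - epsilon) * 2.0 ** 127     # biased exponent 254, fraction all 1s


def to_single(x: float) -> float:
    """Round-trip a Python float through the 4-byte single format."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


print(f"{smallest_normal:.3e}")            # 1.175e-38
print(f"{largest:.3e}")                    # 3.403e+38
print(to_single(largest) == largest)       # True: exactly representable
print(round(math.log10(2 ** 24), 2))       # 7.22
```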
What are the s, e and f values for |0|01111101|0000000…| (0x3E800000)
sign = 0
biased exponent = 125
fractional part of significand: 0.0
What are the s, e and f values for |0|01111101|0100000…| (0x3EA00000)
sign = 0
biased exponent = 125
fractional part of significand: 0.01 (binary), i.e. 0.25
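Splitting a word into its fields is just shifting and masking. The sketch below uses a hypothetical helper, `split_single`, on the first card's word, then builds the second card's pattern directly from its fields (fraction bits 0100000…) to get the value it represents.

```python
def split_single(word: int):
    """Return (sign, biased exponent, 23-bit fraction field) of a 32-bit word."""
    s = word >> 31
    e = (word >> 23) & 0xFF
    f = word & 0x7FFFFF
    return s, e, f


def value(s: int, e: int, f: int) -> float:
    """(-1)^s x 1.f x 2^(e-127) for a normalized number."""
    return (-1.0) ** s * (1.0 + f / 2**23) * 2.0 ** (e - 127)


print(split_single(0x3E800000))            # (0, 125, 0)
print(value(*split_single(0x3E800000)))    # 0.25

# Second card: s = 0, e = 125, fraction bits 0100000... -> field value 1 << 21
second = (0 << 31) | (125 << 23) | (1 << 21)
print(split_single(second))                # (0, 125, 2097152)
print(value(*split_single(second)))        # 0.3125
```

A fraction of 0.01 in binary is 0.25, so the second value is 1.25 x 2^(-2).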
What does IEEE 754 standard look like for floating point double-format numbers?
- 8 bytes
- Number = (-1)^s x 1.f x 2^(e-1023)
- check diagram
Range for floating-point double format. Also approximately how many digits of precision?
~2.2e-308 to ~1.8e+308
~15 to 17 digits
Also, infinities and NaNs are represented with biased exponent 2047
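The double-format limits follow from the same formula with an 11-bit exponent and a 52-bit fraction. A small sketch, assuming Python's `float` is an IEEE 754 double (true on mainstream platforms), so the results can be compared with `sys.float_info`:

```python
import sys

smallest_normal = 2.0 ** -1022                  # biased exponent 1, fraction 0
largest = (2.0 - 2.0 ** -52) * 2.0 ** 1023      # biased exponent 2046, fraction all 1s

print(f"{smallest_normal:.1e}")                 # 2.2e-308
print(f"{largest:.1e}")                         # 1.8e+308
print(smallest_normal == sys.float_info.min)    # True
print(largest == sys.float_info.max)            # True
```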
Bit patterns for +/- infinity and the sqrt(-1) NaN in single format
+: 0x7f800000
-:0xff800000
root(-1): 0x7fffffff
Bit patterns for +/- infinity and the sqrt(-1) NaN in double format
+: 0x7ff00000 00000000
-:0xfff00000 00000000
root(-1): 0x7fffffff ffffffff
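These patterns can be checked by reinterpreting the raw words as floats. A minimal sketch using Python's `struct` module; the helper names `single` and `double` are made up for this example.

```python
import math
import struct


def single(word: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 single."""
    return struct.unpack(">f", struct.pack(">I", word))[0]


def double(word: int) -> float:
    """Reinterpret a 64-bit pattern as an IEEE 754 double."""
    return struct.unpack(">d", struct.pack(">Q", word))[0]


print(single(0x7F800000), single(0xFF800000))   # inf -inf
print(math.isnan(single(0x7FFFFFFF)))           # True
print(double(0x7FF0000000000000))               # inf
print(double(0xFFF0000000000000))               # -inf
print(math.isnan(double(0x7FFFFFFFFFFFFFFF)))   # True
```

In each case the exponent field is all 1s; a zero fraction means infinity and a nonzero fraction means NaN.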
How can NaN be used?
- in most instructions a NaN operand causes an exception, but a NaN can be compared to a number using fcmp
- can be used as an argument to some functions
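As a loose analogy only, Python shows both behaviours: some operations refuse to produce a NaN and raise instead, while comparisons involving a NaN run without error and simply come out unordered. This sketch illustrates Python semantics, not ARMv8 instruction behaviour.

```python
import math

nan = float("nan")

# Comparisons with a NaN do not raise; every ordered comparison is False.
print(nan == nan)    # False
print(nan < 1.0)     # False
print(nan > 1.0)     # False
print(nan != nan)    # True

# NaN can be passed to some functions without trouble.
print(math.isnan(nan))   # True

# Python's math.sqrt raises instead of returning a NaN for a negative input.
try:
    math.sqrt(-1.0)
    sqrt_raised = False
except ValueError:
    sqrt_raised = True
print(sqrt_raised)       # True
```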
How many floating point registers does ARMv8 have? Also what types of registers are they?
32 128-bit fp registers
s registers use low-order 32 bits for single-precision fp numbers
d registers use low-order 64 bits for double-precision fp numbers
opcode for multiply-negate of a fp number
fnmul
How is #fpimm encoded? How is it expressed?
With 1 sign bit, 4 bits of fraction and 3 bits for exponent
as ±(n/16) x 2^r
n in range: 16-31
r in range: -3 to +4
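The encoding above allows only 2 x 16 x 8 = 256 distinct constants. A short sketch that enumerates them from the stated ranges of n and r; `fpimm_values` is a hypothetical helper, not part of any assembler.

```python
def fpimm_values() -> set:
    """All values of the form +/-(n/16) x 2^r with n in 16..31 and r in -3..4."""
    values = set()
    for sign in (1, -1):
        for n in range(16, 32):
            for r in range(-3, 5):
                values.add(sign * n / 16 * 2.0 ** r)
    return values


vals = fpimm_values()
print(len(vals))                              # 256
print(min(v for v in vals if v > 0))          # 0.125
print(max(vals))                              # 31.0
print(1.0 in vals, 0.0 in vals, 0.1 in vals)  # True False False
```

Note that 0.0 is not among the representable immediates, so it has to be loaded some other way.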
Instruction for converting s <-> d
Instruction for converting s or d to nearest 32-bit or 64-bit signed int
instruction for converting s or d to unsigned ints
fcvt
fcvtns
fcvtnu
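The effect of these conversions can be modelled in Python. This is a hedged sketch, not the instructions themselves: `to_single` mimics narrowing a double to a single via `struct`, and `fcvtns_w_model` is a hypothetical model of converting to a 32-bit signed integer, assuming round-to-nearest with ties to even and clamping to the destination range. NaN and infinity inputs are not handled.

```python
import struct


def to_single(x: float) -> float:
    """Model of d -> s narrowing: round a double to the nearest single."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


def fcvtns_w_model(x: float) -> int:
    """Hypothetical model: nearest integer, ties to even, clamped to 32-bit signed."""
    nearest = round(x)                       # Python's round() is ties-to-even
    return max(-2**31, min(2**31 - 1, nearest))


print(to_single(0.5) == 0.5)     # True: 0.5 is exact in both formats
print(to_single(0.1) == 0.1)     # False: narrowing loses precision
print(fcvtns_w_model(2.5))       # 2
print(fcvtns_w_model(3.5))       # 4
print(fcvtns_w_model(-2.5))      # -2
print(fcvtns_w_model(1e10))      # 2147483647
```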
Instruction for converting Wn or Xn to Sd or Dd
Instruction for converting unsigned ints to floats
scvtf
ucvtf
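The two instructions differ only in how the integer register's bits are interpreted before conversion. A small sketch with hypothetical model functions, assuming a 32-bit source register and a double-precision destination (so every 32-bit integer converts exactly):

```python
def scvtf_model(word32: int) -> float:
    """Treat the 32-bit pattern as a two's-complement signed integer."""
    signed = word32 - 2**32 if word32 & 0x80000000 else word32
    return float(signed)


def ucvtf_model(word32: int) -> float:
    """Treat the 32-bit pattern as an unsigned integer."""
    return float(word32)


print(scvtf_model(0xFFFFFFFF))   # -1.0
print(ucvtf_model(0xFFFFFFFF))   # 4294967295.0
print(scvtf_model(42), ucvtf_model(42))   # 42.0 42.0
```

The same bit pattern gives different floats only when the top bit is set.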
What registers are used to pass fp arguments into a subroutine?
d0-d7 / s0-s7
What are registers d8-d15 called?
Callee-saved
- a subroutine that uses them must save & restore their values on the stack
- only the bottom 64-bits of the 128 bit register need to be preserved
Which registers can be overwritten by a subroutine?
d0-d7 and d16-d31
caller is responsible for saving/restoring if they need to be preserved over a subroutine call
What are two things that are stored in .text?
String literals
floating point values
What is the bias we add to the exponent for floating point double format?
1023