Floating Point Flashcards

1
Q

Most modern architectures implement the ___ standard for floating-point representations

A

IEEE 754

2
Q

True or False: “floating-point units (FPUs) are outdated and no longer included in CPUs”

A

False. Most CPUs include FPUs now

3
Q

Benefits and purpose of FPUs

A

An FPU provides instructions that do FP arithmetic very quickly. Without an FPU, FP arithmetic must be simulated in software, which is very slow.

4
Q

Explain the difference between fixed and floating point representations

A

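Fixed: the point sits at a fixed position, so the number of fractional digits is constant; values are evenly spaced, but the range is small

Floating: the position of the point varies with the exponent; the range is much larger, but the spacing between representable values grows with magnitude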

5
Q

How is the position of the binary point established in a fixed-point number?

A

By convention. It is agreed upon by the programmer and the system so it may not be the same for every program.

6
Q

If the binary point comes after the second bit in a fixed point number, 0111, what decimal number will result?

A

01.11 = 0 x 2^1 + 1 x 2^0 + 1 x 2^(-1) + 1 x 2^(-2) = 1.75
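The same sum can be sketched in a few lines of Python. The helper name `fixed_point_value` and its `int_bits` parameter are made up for this illustration; the sketch assumes an unsigned bit string.

```python
def fixed_point_value(bits: str, int_bits: int) -> float:
    """Interpret an unsigned bit string whose binary point follows the first int_bits bits."""
    value = 0.0
    for position, bit in enumerate(bits):
        weight = int_bits - 1 - position      # 1, 0, -1, -2, ... for int_bits = 2
        value += int(bit) * 2.0 ** weight
    return value

print(fixed_point_value("0111", 2))  # 1.75
```

Moving the point changes the value without changing the bits: with `int_bits=4` the same string reads as 7.0, which is why the position has to be agreed on by convention.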

6
Q

What is the mantissa (significand) and what is used to represent it in a floating point representation?

A

The part of the number that holds the significant digits, e.g. 4.65 in 4.65 x 10^(-4)

Fixed point is used to represent the mantissa

7
Q

What are the differences between floating-point single format and scientific notation?

A
  • Base 2 instead of 10
  • The significand (mantissa) and exponent are stored in binary
  • The number is normalized so that 1 <= significand < 2
    • Since the MSB is always 1, it’s not stored
  • The exponent is biased by adding the constant 127, so it’s stored as a non-negative integer
  • IEEE 754 standard
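A rough sketch of the normalization and biasing steps, using Python's `math.frexp`. The function name `normalize` is hypothetical, and the sketch assumes a nonzero, finite input inside the normalized single-format range.

```python
import math

def normalize(x: float):
    """Return (significand, exponent, biased_exponent) with 1 <= significand < 2."""
    m, e = math.frexp(abs(x))   # abs(x) == m * 2**e with 0.5 <= m < 1
    significand = m * 2         # shift the significand into [1, 2)
    exponent = e - 1            # compensate for the shift
    return significand, exponent, exponent + 127

print(normalize(0.25))   # (1.0, -2, 125)
print(normalize(6.5))    # (1.625, 2, 129)
```

The leading 1 of the significand is the bit that the format does not store; only the part after the point (0.0 and 0.625 here) goes into the fraction field.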
8
Q

What does IEEE 754 standard look like for floating point single-format numbers?

A
  • Uses 4 bytes
  • Number = (-1)^s x 1.f x 2^(e-127)

s = sign bit
f = fractional part of significand
e = biased exponent (unbiased exponent + 127)

-check notes for diagram
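The formula can be checked by splitting a 32-bit word into its fields (1 sign bit, 8 exponent bits, 23 fraction bits) and rebuilding the value. This is a minimal sketch for normalized numbers only; `decode_single` is a name invented here, and 0xC0490FDB is just a sample pattern (the single closest to -pi).

```python
import struct

def decode_single(word: int):
    """Split a 32-bit pattern into s, e, f and rebuild the value (normalized numbers only)."""
    s = word >> 31                     # 1 sign bit
    e = (word >> 23) & 0xFF            # 8 bits of biased exponent
    f = (word & 0x7FFFFF) / 2 ** 23    # 23 fraction bits, read as 0.f
    value = (-1) ** s * (1 + f) * 2.0 ** (e - 127)
    return s, e, f, value

s, e, f, value = decode_single(0xC0490FDB)
# Cross-check against the struct module's interpretation of the same 4 bytes.
reference = struct.unpack(">f", (0xC0490FDB).to_bytes(4, "big"))[0]
print(s, e, value == reference)   # 1 128 True
```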

9
Q

Range of biased exponents allowed

A

1-254

  • 255 is used to represent infinities and not-a-number values (NaNs)
  • 0 is used for zero and subnormal numbers (tiny fractional quantities)
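As an illustration, the exponent field alone is enough to sort a single-format word into these categories. The function name `classify_single` is hypothetical; within exponents 0 and 255 the fraction field decides between zero and subnormal, and between infinity and NaN.

```python
def classify_single(word: int) -> str:
    """Classify a 32-bit single-format pattern by its biased exponent and fraction."""
    e = (word >> 23) & 0xFF
    fraction = word & 0x7FFFFF
    if e == 0:
        return "zero" if fraction == 0 else "subnormal"
    if e == 255:
        return "infinity" if fraction == 0 else "NaN"
    return "normal"    # biased exponents 1 to 254

for word in (0x00000000, 0x00000001, 0x3E800000, 0x7F800000, 0x7FC00000):
    print(hex(word), classify_single(word))
```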
10
Q

Range for floating-point single format. Also approximately how many digits of precision?

A

Normalized magnitudes: 1.175e-38 to 3.403e+38
i.e. 1.0 x 2^(-126) to (2.0 - ε) x 2^(127)

Total range:
-3.403e+38 to -1.175e-38, then a gap to 0.0, then a gap to +1.175e-38 to +3.403e+38

~7 digits of precision
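These limits can be reproduced from the bit patterns at the two ends of the normalized range. The sketch below assumes ε is the weight of the last fraction bit, 2^(-23), and uses a hypothetical helper `single_from_hex`.

```python
import math
import struct

def single_from_hex(hex_word: str) -> float:
    """Interpret 8 hex digits as a big-endian IEEE 754 single."""
    return struct.unpack(">f", bytes.fromhex(hex_word))[0]

smallest_normal = single_from_hex("00800000")   # e = 1, f = 0
largest_finite = single_from_hex("7f7fffff")    # e = 254, f = all ones
epsilon = 2.0 ** -23                            # assumed meaning of ε

print(smallest_normal == 2.0 ** -126)                    # True
print(largest_finite == (2.0 - epsilon) * 2.0 ** 127)    # True
print(f"{smallest_normal:.4g} {largest_finite:.4g}")     # 1.175e-38 3.403e+38
print(round(24 * math.log10(2), 2))                      # 7.22
```

The last line is where "~7 digits" comes from: 24 significand bits (23 stored plus the implicit leading 1) carry about 7.22 decimal digits.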

11
Q

What are the s, e and f values for |0|01111101|0000000…| (0x3E800000)

A

sign = 0
biased exponent = 125
fractional part of significand: 0.0
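Going the other way, the three fields can be packed back into one word to confirm the hex form and see the value it encodes. `pack_single` is a name made up for this sketch; it takes only the leading fraction bits and pads the rest with zeros.

```python
import struct

def pack_single(s: int, e: int, fraction_bits: str) -> int:
    """Assemble sign, biased exponent and leading fraction bits into a 32-bit word."""
    fraction = int(fraction_bits.ljust(23, "0"), 2)   # pad the fraction to 23 bits
    return (s << 31) | (e << 23) | fraction

word = pack_single(0, 0b01111101, "0")
value = struct.unpack(">f", word.to_bytes(4, "big"))[0]
print(hex(word), value)   # 0x3e800000 0.25
```

With e = 125 and f = 0.0 the formula gives 1.0 x 2^(125-127) = 0.25, which matches the unpacked value.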

12
Q

What are the s, e and f values for |0|01111101|0100000…| (0x3EA00000)

A

sign = 0
biased exponent = 125
fractional part of significand: 0.01 (binary), i.e. 0.25

13
Q

What does IEEE 754 standard look like for floating point double-format numbers?

A
  • 8 bytes
  • Number = (-1)^s x 1.f x 2^(e-1023)
  • check diagram
14
Q

Range for floating-point double format. Also approximately how many digits of precision?

A

~2.2e-308 to ~1.8e+308
~15 to 17 significant decimal digits

Infinities and NaNs are represented with biased exponent 2047
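Python's `float` is an IEEE 754 double on mainstream platforms (an assumption of this sketch), so `sys.float_info` gives a quick way to look at these limits.

```python
import sys

info = sys.float_info
print(info.min)        # 2.2250738585072014e-308, smallest normalized double
print(info.max)        # 1.7976931348623157e+308, largest finite double
print(info.mant_dig)   # 53 significand bits (52 stored + 1 implicit)
print(info.dig)        # 15 decimal digits that always survive a round trip
```

The two digit counts answer different questions: 15 decimal digits can always be stored and read back unchanged, while 17 significant digits are enough to identify any double uniquely.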

15
Q

+/- infinity, root(-1) NaNs in single format

A

+infinity: 0x7f800000
-infinity: 0xff800000
root(-1): 0x7fffffff

16
Q

+/- infinity, root(-1) NaNs in double format

A

+infinity: 0x7ff00000 00000000
-infinity: 0xfff00000 00000000
root(-1): 0x7fffffff ffffffff
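The patterns from both formats can be fed through Python's `struct` module to see how they are interpreted. The helpers `single` and `double` are named here for illustration only.

```python
import math
import struct

def single(hex_word: str) -> float:
    """Interpret 8 hex digits as a big-endian IEEE 754 single."""
    return struct.unpack(">f", bytes.fromhex(hex_word))[0]

def double(hex_word: str) -> float:
    """Interpret 16 hex digits as a big-endian IEEE 754 double."""
    return struct.unpack(">d", bytes.fromhex(hex_word))[0]

print(single("7f800000"), single("ff800000"), single("7fffffff"))
# inf -inf nan
print(double("7ff0000000000000"), double("fff0000000000000"), double("7fffffffffffffff"))
# inf -inf nan
```

In both formats the exponent field is all ones; a zero fraction means infinity and any nonzero fraction means NaN, so the all-ones pattern shown for root(-1) is one of many possible NaN encodings.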

17
Q

How can NaN be used?

A
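  • as an operand to most instructions, a signaling NaN raises an exception, while a quiet NaN propagates to the result; a NaN can still be compared to a number using fcmp (the result is "unordered")
  • can be used as an argument to some functions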
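As a rough analogy in Python (not ARM's fcmp itself), comparing a NaN with anything is legal and simply reports that no ordering holds, and functions such as `math.isnan` accept a NaN argument.

```python
import math

nan = float("nan")

print(nan == nan)              # False: a NaN is unordered, even against itself
print(nan < 1.0, nan > 1.0)    # False False
print(math.isnan(nan))         # True: a function that takes NaN as an argument
```

Because every ordered comparison is false, `x != x` is true only when `x` is a NaN.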
18
Q

How many floating point registers does ARMv8 have? Also what types of registers are they?

A

32 128-bit fp registers

s registers use low-order 32 bits for single-precision fp numbers

d registers use low-order 64 bits for double-precision fp numbers

19
Q

opcode for multiply-negate of a fp number

A

fnmul

20
Q

How is #fpimm encoded? How is it expressed?

A

With 1 sign bit, 4 bits of fraction and 3 bits of exponent

Expressed as ±(n/16) x 2^r
n in range: 16 to 31
r in range: -3 to +4
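Since the encoding has only 8 bits, the full set of immediates can be enumerated. The sketch below builds that set straight from the n and r ranges on this card; `fpimm_values` and `is_fpimm` are hypothetical helper names.

```python
def fpimm_values():
    """All values of the form +/-(n/16) * 2^r with 16 <= n <= 31 and -3 <= r <= 4."""
    positive = {(n / 16) * 2.0 ** r for n in range(16, 32) for r in range(-3, 5)}
    return positive | {-v for v in positive}

VALUES = fpimm_values()

def is_fpimm(x: float) -> bool:
    """True if x can be written in the 8-bit immediate form."""
    return x in VALUES

print(len(VALUES))                                   # 256
print(min(v for v in VALUES if v > 0))               # 0.125
print(max(VALUES))                                   # 31.0
print(is_fpimm(1.0), is_fpimm(0.0), is_fpimm(0.1))   # True False False
```

The count of 256 matches the 1 + 4 + 3 bits of the encoding. Note that 0.0 is not in the set, so it cannot be written as an immediate of this form.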

21
Q

Instruction for converting s <-> d

Instruction for converting s or d to nearest 32-bit or 64-bit signed int

Instruction for converting s or d to nearest 32-bit or 64-bit unsigned int

A

fcvt

fcvtns

fcvtnu
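A loose Python model of the "convert to nearest signed int" behavior, for finite inputs only. It assumes ties round to the even neighbor and that out-of-range results are clamped to the destination range; both are assumptions of this sketch, as is the name `to_nearest_signed`.

```python
def to_nearest_signed(x: float, bits: int = 32) -> int:
    """Round to the nearest integer (ties to even), clamped to a signed `bits`-bit range."""
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return max(lo, min(hi, round(x)))   # Python's round() also sends ties to even

print(to_nearest_signed(2.5), to_nearest_signed(3.5), to_nearest_signed(-2.7))   # 2 4 -3
print(to_nearest_signed(1e12))            # 2147483647
print(to_nearest_signed(1e12, bits=64))   # 1000000000000
```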

22
Q

Instruction for converting a signed int in Wn or Xn to a float in Sd or Dd

Instruction for converting unsigned ints to floats

A

scvtf
ucvtf

23
Q

What registers are used to pass fp arguments into a subroutine?

A

d0-d7 / s0-s7

24
Q

What are registers d8-d15 called?

A

Callee-saved
- if a subroutine uses them, it must save & restore their values on the stack
- only the bottom 64 bits of each 128-bit register need to be preserved

25
Q

Which registers can be overwritten by a subroutine?

A

Registers 0-7 and 16-31 (d0-d7 and d16-d31)
The caller is responsible for saving/restoring them if they need to be preserved over a subroutine call

26
Q

What are two things that are stored in .text?

A

String literals
floating point values

27
Q

What is the bias we add to the exponent for floating point double format?

A

1023