Lecture 7 Flashcards

Question 1

Floating point numbers and precision

Answer

x = ± b0.b1b2…bn × 2^m, with L ≤ m ≤ U and bi ∈ {0,1}

Precision p = n + 1 (the number of significand bits b0, b1, …, bn).
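A minimal sketch that evaluates this definition exactly. It assumes the bits are passed as a string "b0b1…bn" and uses Python's `fractions.Fraction` to avoid any rounding. `fp_value` is a hypothetical helper, not something from the lecture.

```python
from fractions import Fraction

def fp_value(sign: int, bits: str, m: int, L: int, U: int) -> Fraction:
    """Exact value of sign * b0.b1b2...bn * 2^m, where bits = "b0b1...bn".

    The precision is p = len(bits) = n + 1.
    """
    if sign not in (1, -1):
        raise ValueError("sign must be +1 or -1")
    if not (L <= m <= U):
        raise ValueError("exponent m must satisfy L <= m <= U")
    if not bits or set(bits) - {"0", "1"}:
        raise ValueError("bits must be a non-empty string of 0s and 1s")
    # b0.b1...bn = sum of b_i * 2^(-i) for i = 0..n
    significand = sum(Fraction(int(b), 2 ** i) for i, b in enumerate(bits))
    return sign * significand * Fraction(2) ** m

# 1.01 x 2^1 with n = 2 (precision p = 3): (1 + 1/4) * 2 = 5/2
print(fp_value(1, "101", 1, L=-2, U=3))  # 5/2
```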

Question 2

Normalized floating point representation

Answer

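x = ± 1.b1b2…bn × 2^m = ± 1.f × 2^m, with L ≤ m ≤ U and bi ∈ {0,1}

Hidden bit representation: in normalized form b0 = 1 always, so it is not stored. The n stored bits of f therefore give precision p = n + 1, one bit more than is actually stored.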
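IEEE 754 double precision uses this layout: n = 52 stored fraction bits (so p = 53), an 11-bit biased exponent field, and a hidden leading 1. The sketch below unpacks a Python `float` with the standard `struct` module and rebuilds its value from the stored bits plus the hidden bit. It assumes a normalized input (not zero, subnormal, infinite or NaN), and the helper names `decode_normalized` and `rebuild` are hypothetical.

```python
import struct

def decode_normalized(x: float):
    """Split a normalized IEEE 754 double into (sign, f_bits, m)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]  # raw 64-bit pattern
    sign = bits >> 63
    exp_field = (bits >> 52) & 0x7FF          # 11-bit biased exponent
    f_bits = bits & ((1 << 52) - 1)           # the n = 52 stored bits of f
    if exp_field in (0, 0x7FF):
        raise ValueError("not normalized (zero, subnormal, inf or NaN)")
    m = exp_field - 1023                      # remove the exponent bias
    return sign, f_bits, m

def rebuild(sign: int, f_bits: int, m: int) -> float:
    # The leading 1 is not stored: value = (-1)^sign * (1 + f) * 2^m
    significand = 1 + f_bits / 2**52
    return (-1) ** sign * significand * 2.0 ** m

s, f, m = decode_normalized(-47.125)
print(s, m, format(f, "052b")[:8])  # 1 5 01111001
print(rebuild(s, f, m))             # -47.125
```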

Question 3

Normalized floating point representation of 47.125

Answer

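47 = 32 + 8 + 4 + 2 + 1 = (101111)_2 and 0.125 = 2^{-3} = (0.001)_2, so

(101111.001)_2 = (1.01111001)_2 × 2^5 (the binary point moves 5 places left).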
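The conversion can be checked mechanically. This sketch uses `math.frexp` (which returns x = mant × 2^e with 0.5 ≤ mant < 1) and rescales to the 1.f form, then peels off the bits of f by repeated doubling. It assumes a positive, finite Python `float`. `normalized_binary` is a hypothetical helper, and every step is exact in binary floating point.

```python
import math

def normalized_binary(x: float) -> str:
    """Return x as '1.f × 2^m' in binary (hypothetical helper, x > 0 and finite)."""
    if not (x > 0 and math.isfinite(x)):
        raise ValueError("x must be positive and finite")
    mant, e = math.frexp(x)           # x = mant * 2**e with 0.5 <= mant < 1
    significand, m = mant * 2, e - 1  # rescale so that 1 <= 1.f < 2
    frac = significand - 1            # the fractional part f
    bits = []
    while frac:                       # a double has at most 52 fraction bits
        frac *= 2                     # move the next bit of f left of the point
        bit = int(frac)
        bits.append(str(bit))
        frac -= bit
    return f"1.{''.join(bits) or '0'} × 2^{m}"

print(normalized_binary(47.125))  # 1.01111001 × 2^5
print((47.125).hex())             # 0x1.7900000000000p+5 (hex 7, 9 = bits 0111 1001)
```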

Question 4

Smallest positive normalized FP number

Answer

1.000…0 × 2^L = 2^L, the underflow level (UFL)
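For IEEE 754 double precision L = -1022, and Python exposes the smallest positive normalized double as `sys.float_info.min`. A quick check, assuming the interpreter uses standard IEEE doubles (as CPython does on common platforms):

```python
import math
import sys

L = -1022                          # minimum exponent of IEEE doubles
ufl = math.ldexp(1.0, L)           # 1.000...0 × 2^L, computed exactly
print(ufl == sys.float_info.min)   # True
print(ufl)                         # 2.2250738585072014e-308
```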

Question 5

Largest positive normalized FP number

Answer

S = 1.111…1 × 2^U = 2^U + 2^{U-1} + … + 2^{U-n}
2S = 2^{U+1} + 2^U + … + 2^{U-n+1}
S = 2S - S = 2^{U+1} - 2^{U-n} = 2^{U+1}(1 - 2^{-(n+1)}) = 2^{U+1}(1 - 2^{-p}), the overflow level (OFL), using p = n + 1
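The telescoping step can be checked exactly with rational arithmetic for a range of small toy systems, and then against IEEE doubles (n = 52, U = 1023), where Python's `sys.float_info.max` is the OFL. The helper names `ofl_by_sum` and `ofl_closed_form` are hypothetical.

```python
from fractions import Fraction
import math
import sys

def ofl_by_sum(n: int, U: int) -> Fraction:
    # S = 2^U + 2^(U-1) + ... + 2^(U-n): the significand 1.11...1 has n + 1 ones
    return sum(Fraction(2) ** (U - k) for k in range(n + 1))

def ofl_closed_form(n: int, U: int) -> Fraction:
    p = n + 1
    return Fraction(2) ** (U + 1) * (1 - Fraction(1, 2 ** p))

# The identity S = 2^(U+1) (1 - 2^-p) holds for every small toy system tried
for n in range(0, 6):
    for U in range(-3, 4):
        assert ofl_by_sum(n, U) == ofl_closed_form(n, U)

# IEEE 754 double: n = 52 (p = 53), U = 1023
print(math.ldexp(1 - 2.0 ** -53, 1024) == sys.float_info.max)    # True
print(ofl_closed_form(52, 1023) == Fraction(sys.float_info.max))  # True
```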

Question 6

Overflow

Answer

A number x overflows when |x| > OFL = 2^{U+1}(1 - 2^{-p}): it is set to -∞ if x < -OFL and to +∞ if x > OFL.
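With IEEE doubles (OFL = 2^1024(1 - 2^-53), available as `sys.float_info.max`), ordinary float arithmetic in Python shows overflow to an infinity of the matching sign. This sketch assumes the default IEEE round-to-nearest mode.

```python
import math
import sys

ofl = sys.float_info.max     # 2^1024 (1 - 2^-53) for IEEE doubles
print(ofl * 2)               # inf
print(-ofl * 2)              # -inf
print(ofl * 2 == math.inf)   # True
```

Not every Python operation returns an infinity on overflow: `2.0 ** 1024` and `math.exp(1000)`, for example, raise `OverflowError` instead.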

Question 7

Underflow

Answer

Without subnormals, a nonzero x with -2^L < x < 2^L (that is, |x| < UFL) underflows and is set to zero.
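A toy model of storing a value in a normalized system without subnormals, covering both the underflow and the overflow thresholds. To match the cards' thresholds exactly, it assumes chopping (truncation toward zero) rather than IEEE's default round-to-nearest. The function name `store_chopped` and the toy parameters n = 2, L = -1, U = 1 (so UFL = 1/2 and OFL = 7/2) are hypothetical choices for illustration.

```python
from fractions import Fraction
import math

def store_chopped(x: Fraction, n: int, L: int, U: int):
    """Store x in a toy normalized system with n fraction bits and
    exponents L <= m <= U, no subnormals, chopping toward zero."""
    if x == 0:
        return Fraction(0)
    sign = 1 if x > 0 else -1
    a = abs(x)
    ufl = Fraction(2) ** L
    ofl = Fraction(2) ** (U + 1) * (1 - Fraction(1, 2 ** (n + 1)))
    if a < ufl:
        return Fraction(0)            # underflow: set to zero
    if a > ofl:
        return sign * math.inf        # overflow: +/- infinity
    # m = floor(log2(a)), computed exactly from the bit lengths of a
    m = a.numerator.bit_length() - a.denominator.bit_length()
    if Fraction(2) ** m > a:
        m -= 1
    # Keep only n bits after the binary point of the significand 1.f
    scaled = math.floor(a / Fraction(2) ** m * 2 ** n)
    return sign * Fraction(scaled, 2 ** n) * Fraction(2) ** m

# Toy system n = 2, L = -1, U = 1: UFL = 1/2, OFL = 7/2
for x in [Fraction(1, 4), Fraction(-2, 5), Fraction(1, 2), Fraction(7, 10), Fraction(4)]:
    print(x, "->", store_chopped(x, n=2, L=-1, U=1))
```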

Question 8

Machine epsilon

Answer

The distance (gap) between 1 and the next larger floating point number. It depends only on n, the number of digits in the fractional part f:
ϵ_m = 0.00…01 × 2^0 = 2^{-n}
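A classic loop finds ϵ_m by halving a candidate until 1 + ϵ/2 rounds back to 1. For IEEE doubles n = 52, so it should stop at 2^{-52}. This sketch assumes round-to-nearest-even double arithmetic, under which 1 + 2^{-53} is a tie that rounds to 1. It also uses `math.nextafter`, which requires Python 3.9 or later.

```python
import math
import sys

# Halve eps until 1 + eps/2 is no longer distinguishable from 1
eps = 1.0
while 1.0 + eps / 2 > 1.0:
    eps /= 2

print(eps == 2.0 ** -52)                      # True: n = 52 for IEEE doubles
print(eps == sys.float_info.epsilon)          # True
print(math.nextafter(1.0, 2.0) - 1.0 == eps)  # True: gap from 1 to its successor
```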

Question 9

Subnormal/denormalized FP representation

Answer

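Set b0 = 0 and fix m = L, giving x = ± 0.b1b2…bn × 2^L. These numbers fill the gap between 0 and UFL = 2^L, so underflow is more gradual, at the cost of lost precision (fewer significant bits) and slower computation.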
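IEEE doubles implement subnormals: an exponent field of 0 encodes b0 = 0 with m = L = -1022. The sketch below shows that halving UFL gives a nonzero subnormal rather than zero, and that a low-order bit which survives in the normal range is lost in the subnormal range. `exponent_field` is a hypothetical helper built on the standard `struct` module.

```python
import math
import struct
import sys

def exponent_field(x: float) -> int:
    """Return the 11-bit biased exponent field of an IEEE 754 double."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return (bits >> 52) & 0x7FF

ufl = sys.float_info.min    # 2^L with L = -1022
tiny = ufl / 2              # 2^-1023 = 0.1 × 2^L in binary, below UFL
print(tiny > 0)             # True: gradual underflow, not flushed to zero
print(exponent_field(ufl), exponent_field(tiny))  # 1 0 (field 0 marks a subnormal)

# Loss of precision: the last bit of 1 + 2^-52 survives at scale 2^0 ...
x = 1.0 + 2.0 ** -52
print(x > 1.0)                                                # True
# ... but is rounded away at scale 2^-1070, inside the subnormal range
print(x * math.ldexp(1.0, -1070) == math.ldexp(1.0, -1070))  # True
```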

Question 10

Number of additional numbers from the subnormal/denormalized FP representation

Answer

2(2^n - 1), where n is the number of digits of f: there are 2^n - 1 nonzero choices of f (f = 0 just gives zero), doubled for positive and negative numbers.
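The count can be confirmed by brute force: enumerate every value of a small toy system with and without subnormals and compare the two sets. The function `toy_numbers` and the parameters n = 3, L = -2, U = 1 are hypothetical choices for illustration, and exact `Fraction` values avoid any rounding.

```python
from fractions import Fraction
from itertools import product

def toy_numbers(n: int, L: int, U: int, subnormals: bool) -> set:
    """All values of a toy binary FP system with n fraction bits, L <= m <= U."""
    values = {Fraction(0)}
    for f_bits in product((0, 1), repeat=n):
        f = sum(Fraction(b, 2 ** (i + 1)) for i, b in enumerate(f_bits))
        for sign in (1, -1):
            for m in range(L, U + 1):
                values.add(sign * (1 + f) * Fraction(2) ** m)  # normalized: 1.f × 2^m
            if subnormals:
                values.add(sign * f * Fraction(2) ** L)        # subnormal: 0.f × 2^L
    return values

n, L, U = 3, -2, 1
extra = toy_numbers(n, L, U, True) - toy_numbers(n, L, U, False)
print(len(extra), 2 * (2 ** n - 1))  # 14 14
```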

Question 11

Smallest positive subnormal number

Answer

0.00…01 × 2^L = 2^{-n} × 2^L = 2^{L-n}
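For IEEE doubles L = -1022 and n = 52, so the smallest positive subnormal is 2^{-1074}. Equivalently, it is UFL × ϵ_m = 2^L × 2^{-n}. A check in Python, which assumes IEEE doubles and Python 3.9+ for `math.nextafter`:

```python
import math
import sys

L, n = -1022, 52                    # IEEE 754 double parameters
smallest = math.ldexp(1.0, L - n)   # 2^(L-n) = 2^-1074
print(smallest)                                                  # 5e-324
print(smallest == math.nextafter(0.0, 1.0))                      # True
print(smallest == sys.float_info.min * sys.float_info.epsilon)   # True: 2^L × 2^-n
print(smallest / 4)                 # 0.0: anything this small underflows to zero
```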

(11 cards)