Lecture 8 Flashcards

Question 1

Q

IEEE-754 Single Precision

Answer

A

32 bits: sign s (1) / exponent c (8) / significand f (23)
x = (-1)^s × 1.f × 2^m
c = m + 127, 1 ≤ c ≤ 254, m ∈ [L = -126, U = 127] (c = 255 and c = 0 are reserved)
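
A minimal sketch of the layout above, assuming Python's standard `struct` module; `decode_float32` is a hypothetical helper name, not part of the lecture.

```python
import struct

def decode_float32(x):
    """Split a float32 into its (s, c, f) bit fields (hypothetical helper)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    s = bits >> 31              # 1 sign bit
    c = (bits >> 23) & 0xFF     # 8 biased exponent bits
    f = bits & 0x7FFFFF         # 23 significand bits
    return s, c, f

s, c, f = decode_float32(-6.5)  # -6.5 = -1.101 (binary) x 2^2
m = c - 127                     # remove the bias
value = (-1) ** s * (1 + f / 2**23) * 2.0**m
print(s, c, m, value)
```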

Question 2

Q

IEEE-754 Single Precision Zero

Answer

A

c = (00000000)
f = (000000...00)

Question 3

Q

IEEE-754 Single Precision Subnormal numbers

Answer

A

c = (00000000) but f ≠ 0
Set m = L = -126 (NOT -127!) and the leading digit to 0.
x = ± 0.f × 2^{-126}
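
A small check of the subnormal formula, assuming Python's `struct` module; `float32_from_bits` is a hypothetical helper name.

```python
import struct

def float32_from_bits(bits):
    """Reinterpret a 32-bit integer as a float32 (hypothetical helper)."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

# c = 0, f = 00...01: the smallest positive subnormal, 0.f x 2^-126
smallest_subnormal = float32_from_bits(0x00000001)
# c = 1, f = 0: the smallest positive normal number, 1.0 x 2^-126
smallest_normal = float32_from_bits(0x00800000)

print(smallest_subnormal == 2.0**-149)  # 2^-23 x 2^-126
print(smallest_normal == 2.0**-126)
```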

Question 4

Q

IEEE-754 Single Precision Infinity

Answer

A

c=(11111111), f=(00…0)

Question 5

Q

IEEE-754 Single Precision NaN

Answer

A

c=(11111111), f≠0
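
A sketch of the two reserved c = 255 patterns (infinity and NaN), assuming Python's `struct` module; `float32_from_bits` is a hypothetical helper and the NaN pattern is one of many valid choices of f ≠ 0.

```python
import math
import struct

def float32_from_bits(bits):
    """Reinterpret a 32-bit integer as a float32 (hypothetical helper)."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

pos_inf = float32_from_bits(0x7F800000)  # s = 0, c = 255, f = 0
neg_inf = float32_from_bits(0xFF800000)  # s = 1, c = 255, f = 0
nan = float32_from_bits(0x7FC00000)      # c = 255, f != 0

print(pos_inf, neg_inf, math.isnan(nan))
print(nan == nan)  # a NaN never compares equal, even to itself
```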

Question 6

Q

IEEE-754 Double Precision

Answer

A

64 bits, sign s (1), exponent c (11), significand f (52)

c = m + 1023, 1 ≤ c ≤ 2046, m ∈ [-1022, 1023]
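
The same field split for double precision, as a sketch assuming Python's `struct` and `sys` modules; `decode_float64` is a hypothetical helper name.

```python
import struct
import sys

def decode_float64(x):
    """Split a float64 into its (s, c, f) bit fields (hypothetical helper)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    s = bits >> 63                # 1 sign bit
    c = (bits >> 52) & 0x7FF      # 11 biased exponent bits
    f = bits & ((1 << 52) - 1)    # 52 significand bits
    return s, c, f

print(decode_float64(1.0))                 # m = 0, so c = 1023
print(decode_float64(sys.float_info.max))  # largest normal: c = 2046
print(decode_float64(sys.float_info.min))  # smallest normal: c = 1
```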

Question 7

Q

IEEE-754 Rounding

Answer

A

Round toward zero (truncate): x_- = ± 1.b1b2…bn × 2^m

Round toward ±∞ (add ϵm × 2^m to the magnitude): x_+ = ± (1.b1b2…bn + 0.00…1) × 2^m

Question 8

Q

IEEE-754 Round up/down

Answer

A

Round up (ceil) toward ∞: x_+ if x positive, x_- if x negative
Round down (floor) toward -∞: x_- if x positive, x_+ if x negative
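
A sketch of x_- and x_+ for a positive value rounded to single precision, assuming NumPy is available; the value 0.1 is an arbitrary choice for illustration.

```python
import numpy as np

x = 0.1                      # "exact" value, held in double precision
nearest = np.float32(x)      # round to nearest float32

# Find the two float32 neighbours that bracket x (x > 0 assumed here).
if float(nearest) <= x:
    x_minus = nearest
    x_plus = np.nextafter(nearest, np.float32(np.inf))
else:
    x_plus = nearest
    x_minus = np.nextafter(nearest, np.float32(0))

round_up = x_plus      # toward +inf, because x is positive
round_down = x_minus   # toward -inf, because x is positive
print(float(round_down), float(round_up))
```

For a negative x the roles swap, as stated on the card.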

Question 9

Q

IEEE-754 Rounding Absolute/relative error

Answer

A

err_abs = |x̃ - x| ≤ |x_+ - x_-| = ϵm × 2^m
err_rel = |x̃ - x| / |x| ≤ ϵm
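
An empirical check of the relative bound, assuming NumPy; double-precision samples stand in for the exact x and the sample range is arbitrary.

```python
import numpy as np

eps_m = np.finfo(np.float32).eps             # 2**-23
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 1000.0, size=10_000)    # "exact" values in float64

# Round each value to float32, then widen again to measure the error.
x_tilde = x.astype(np.float32).astype(np.float64)
err_rel = np.abs(x_tilde - x) / np.abs(x)

print(err_rel.max() <= eps_m)
```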

Question 10

Q

IEEE Single precision, find the smallest ⍺ s.t. 2^8 + ⍺ ≠ 2^8.

Answer

A

x_+ = 2^8 + 2^8 × ϵm, so ⍺ = 2^8 × 2^{-23} = 2^{-15}

The gap between x and the next floating-point number can be estimated as x × ϵm.
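
A quick check of the gap at 2^8 in single precision, assuming NumPy; `np.spacing` returns the distance to the next representable number.

```python
import numpy as np

x = np.float32(2**8)
eps_m = np.finfo(np.float32).eps   # 2**-23
gap = np.spacing(x)                # distance from x to the next float32

print(gap == x * eps_m)            # gap is x * eps_m = 2**-15
print(x + np.float32(2**-15) != x) # adding the full gap changes x
print(x + np.float32(2**-17) == x) # a quarter of the gap is rounded away
```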

Question 11

Q

IEEE Single/double precision Machine epsilon

Answer

A

single: ϵm = 2^{-23} ≈ 1.2 × 10^{-7}
double: ϵm = 2^{-52} ≈ 2.2 × 10^{-16}
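
Both values can be read off with NumPy's `finfo`, assuming NumPy is installed.

```python
import numpy as np

eps32 = np.finfo(np.float32).eps
eps64 = np.finfo(np.float64).eps

print(eps32 == 2.0**-23, float(eps32))
print(eps64 == 2.0**-52, float(eps64))
```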

Question 12

Q

a = 10^5, b = 1.0
while a + b > a:
    b = b / 2
For which b will it stop?

Answer

A

The loop stops when a + b = a in floating point, that is when b ≈ a × ϵm ≈ 10^5 × 10^{-16} = 10^{-11} (double precision, order of magnitude).
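
Running the loop in Python (double precision) is a simple way to check the estimate. Since b is only ever halved, the stopping value is a power of two, so it matches a × ϵm only to within a small factor.

```python
a = 1e5
b = 1.0
while a + b > a:
    b = b / 2

print(b)           # first b that is absorbed by a
print(a + b == a)  # the stopping condition
```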

Question 13

Q

Catastrophic Cancellation

Answer

A

c = a - b when a ≃ b
a = 1.1011 × 2^1
b = 1.1010 × 2^1
Normalization: c = 1.???? × 2^{-3} (the leading digits cancel, so the digits marked ? carry no information)
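
A double-precision illustration of the same effect, with values chosen only for this sketch: subtracting 1 from 1 + x exposes the rounding error made when 1 + x was stored.

```python
x = 1e-8
a = 1.0 + x          # rounded to the nearest double near 1
b = 1.0
c = a - b            # the subtraction itself is exact, but a was already rounded

rel_err = abs(c - x) / x
print(c, rel_err)    # rel_err is far larger than eps_m = 2**-52
```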

Question 14

Q

Cancellation

Answer

A

c = a + b with a ≪ b or b ≪ a: the low-order digits of the smaller operand are lost.
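
A short double-precision sketch with illustrative values: near 10^17 the gap between neighbouring doubles is 16, so adding 1 changes nothing.

```python
a = 1e17
b = 1.0
c = a + b

print(c == a)      # b is completely absorbed
print(c - a)       # the contribution of b is lost
```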

Question 15

Q

x = 0.3721448693 and y = 0.3720214371, compute (x-y) using 5 decimal digits of accuracy. Relative error due to rounding vs. relative error due to subtraction?

Answer

A

Rounding: 1.3 x 10^{-5}
Subtraction: 3 x 10^{-2}
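
The two figures can be reproduced with Python's `decimal` module. This sketch assumes that "5 decimal digits" means rounding to 5 digits after the decimal point.

```python
from decimal import Decimal

x = Decimal("0.3721448693")
y = Decimal("0.3720214371")
q = Decimal("0.00001")           # keep 5 digits after the decimal point

x5 = x.quantize(q)               # 0.37214
y5 = y.quantize(q)               # 0.37202

err_round = abs(x5 - x) / x      # relative error from rounding x
exact = x - y                    # 0.0001234322
approx = x5 - y5                 # 0.00012
err_sub = abs(approx - exact) / abs(exact)

print(float(err_round), float(err_sub))
```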

Question 16

Q

Loss of significance: how to avoid it in f(x) = √(x² + 1) - 1 for x near 0?

Answer

A

Rewrite function to eliminate subtraction of similar numbers:
f(x) = x² / (√(x² + 1) + 1)
Trick: (a-b) = (a-b)[(a+b)/(a+b)] = (a²-b²)/(a+b)
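
A sketch comparing the two forms in double precision; `f_naive` and `f_stable` are hypothetical names and x = 1e-9 is an arbitrary small input.

```python
import math

def f_naive(x):
    # Subtracts two nearly equal numbers when x is small.
    return math.sqrt(x * x + 1) - 1

def f_stable(x):
    # Rewritten with (a - b) = (a^2 - b^2) / (a + b); no cancellation.
    return x * x / (math.sqrt(x * x + 1) + 1)

x = 1e-9
print(f_naive(x))    # x*x + 1 rounds to 1, so all digits are lost
print(f_stable(x))   # close to x^2 / 2
```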

Question 17

Q

Trick to sum numbers without cancellation

Answer

A

sum(np.sort(data)): numbers of similar magnitude are added together first
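
A sketch of why the order matters, assuming NumPy and non-negative data. It uses an explicit left-to-right loop (`naive_sum`, a hypothetical helper) so that the accumulation order is unambiguous, since built-in or library sums may use compensated or pairwise schemes.

```python
import numpy as np

def naive_sum(values):
    """Plain left-to-right accumulation in double precision."""
    total = 0.0
    for v in values:
        total += float(v)
    return total

# One large value followed by many small ones (illustrative data).
data = np.array([1e17] + [1.0] * 1024)

unsorted_total = naive_sum(data)          # each 1.0 is absorbed by 1e17
sorted_total = naive_sum(np.sort(data))   # the 1.0s add up to 1024 first

print(unsorted_total - 1e17, sorted_total - 1e17)
```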

(17 cards)