Floating-point arithmetic Flashcards
What numbers can we store exactly on a computer?
Integers up to some maximum size
What is the largest possible integer that can be stored using 64 bits?
Assuming one bit is used to store the sign ±, the largest possible number is 2^63 - 1
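A quick check of this limit, as a minimal Python sketch. Python integers are arbitrary precision, so the 64-bit limit is not enforced by the language itself; converting to 8 signed bytes is just one way to show what fits.

```python
BITS = 64
largest = 2 ** (BITS - 1) - 1     # one bit is reserved for the sign
print(largest)                    # 9223372036854775807

# The largest value fits into 8 signed bytes...
encoded = largest.to_bytes(8, "little", signed=True)

# ...but one past it does not.
try:
    (largest + 1).to_bytes(8, "little", signed=True)
    fits = True
except OverflowError:
    fits = False
print(fits)                       # False
```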
What is fixed point representation?

What is (10.1)_2?
1 × 2^1 + 0 × 2^0 + 1 × 2^-1 = 2.5
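The same positional sum can be sketched in Python. The function name `fixed_point_value` is made up for this illustration.

```python
def fixed_point_value(digits: str, base: int = 2) -> float:
    """Evaluate a digit string such as '10.1' in the given base."""
    integer_part, _, fractional_part = digits.partition(".")
    value = 0.0
    # Digits left of the point carry powers base^0, base^1, ...
    for power, digit in enumerate(reversed(integer_part)):
        value += int(digit, base) * base ** power
    # Digits right of the point carry powers base^-1, base^-2, ...
    for power, digit in enumerate(fractional_part, start=1):
        value += int(digit, base) * base ** -power
    return value

print(fixed_point_value("10.1"))  # 2.5
```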
With fixed-point numbers, do two different representations ever give the same number?
No - every number has a unique representation
What is a problem with fixed-point representation?
Easy to “escape”
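What is meant by fixed-point representation being easy to escape?
The result of a calculation can fall outside the set of representable numbers, e.g. (0.01)_10 × (0.10)_10 = (0.001)_10 can’t be represented with two digits after the point.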
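A minimal sketch of this "escape", assuming a hypothetical decimal fixed-point format with two digits after the point. Exact fractions are used so that no binary rounding interferes with the illustration.

```python
from fractions import Fraction

FRACTION_DIGITS = 2                 # assumed format: two decimal places
SCALE = 10 ** FRACTION_DIGITS

def representable(x: Fraction) -> bool:
    """True if x is a whole multiple of 10^-FRACTION_DIGITS."""
    return (x * SCALE).denominator == 1

a = Fraction(1, 100)    # (0.01)_10
b = Fraction(10, 100)   # (0.10)_10
product = a * b         # exactly (0.001)_10

print(representable(a), representable(b), representable(product))
# True True False
```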
What is floating-point representation?

What is the (0.d_1 d_2 … d_m)_β in the following called?

- Fraction
- Significand
- Mantissa
What are β and e in the following called?

- Base
- Exponent
What is one advantage and one disadvantage of using floating-point numbers over fixed-point numbers?
- You can represent a much larger range of numbers in a floating-point representation
- However the numbers in floating-point representation are not equally spaced
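The uneven spacing can be seen directly in Python, whose `float` is an IEEE double. This sketch assumes Python 3.9 or later for `math.ulp`, which returns the gap between a float and the next one of larger magnitude.

```python
import math

# Gap to the next representable number at three different magnitudes.
gaps = {x: math.ulp(x) for x in (1.0, 1000.0, 1e16)}
for x, gap in gaps.items():
    print(x, gap)

# Near 1e16 the gap is 2, so adding 1 is lost to rounding.
print(1e16 + 1.0 == 1e16)  # True
```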
In floating-point numbers, if d_1 ≠ 0 then each number in F has a unique representation. What is such a representation called?
Normalised
What is the IEEE standard?
A standard for floating-point arithmetic (IEEE 754), which includes double-precision (64-bit) arithmetic
What are the 64 bits used for in the IEEE standard?
- 52 bits for the fraction
- 11 for the exponent
- 1 for the sign
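The three bit fields can be pulled out of a Python `float` with the `struct` module. This is a sketch only: `ieee_fields` is a hypothetical helper name, and the field order (sign, then exponent, then fraction, from the most significant bit down) is the usual IEEE double layout rather than something stated in these cards.

```python
import struct

def ieee_fields(x: float) -> tuple:
    """Split an IEEE double into its sign, exponent and fraction bit fields."""
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    sign = bits >> 63                    # 1 bit
    exponent = (bits >> 52) & 0x7FF      # 11 bits, stored with a bias
    fraction = bits & ((1 << 52) - 1)    # 52 bits
    return sign, exponent, fraction

sign, exponent, fraction = ieee_fields(-6.5)
print(sign, exponent, f"{fraction:052b}")
```

For -6.5 the sign bit is 1 and the fraction field starts with the bits 101 followed by zeros.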
What is the IEEE representation?

What does exponent bias mean in the IEEE standard?
The stored 11-bit exponent is offset by a fixed bias, so the actual exponents are in the range -1022 to 1025
What are the exponents -1022 and 1025 used to store in the IEEE standard?
±0 and ±∞ respectively
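A sketch that reads the stored exponent field of ±0 and ±∞. The bias of 1022 is an assumption: it is the value that maps the stored field range 0 to 2047 onto the exponents -1022 and 1025 quoted above, for a significand written as (0.d_1 d_2 …). In the actual standard the all-zeros field is shared with subnormal numbers and the all-ones field with NaN.

```python
import math
import struct

BIAS = 1022  # assumed bias for the (0.d1 d2 ...) significand convention

def stored_exponent(x: float) -> int:
    """Return the raw 11-bit exponent field of an IEEE double."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    return (bits >> 52) & 0x7FF

for x in (0.0, -0.0, math.inf, -math.inf):
    e = stored_exponent(x)
    print(x, e, e - BIAS)
# Zeros give field 0 (exponent -1022); infinities give field 2047 (exponent 1025).
```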
When β = 2, what does the first digit being normalised mean?
The first digit is normalised to 1, so doesn’t need to be stored in memory
Define underflow.
If a calculation falls below the lower non-zero limit (in absolute value), it is called underflow.
Define overflow.
If a calculation falls above the upper limit (in absolute value), it is called overflow, and usually results in a floating-point exception
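Both cases can be triggered in Python, as a rough sketch. How overflow is reported depends on the language and operation: in Python, multiplication silently returns `inf`, while the power operator raises `OverflowError`.

```python
import sys

largest = sys.float_info.max       # largest finite double, about 1.8e308
smallest_subnormal = 5e-324        # smallest positive double

overflowed = largest * 2           # too large: becomes inf
underflowed = smallest_subnormal / 2   # too small: becomes 0.0
print(overflowed, underflowed)     # inf 0.0

# Some operations report overflow as an exception instead.
base = 2.0
try:
    base ** 2000
    raised = False
except OverflowError:
    raised = True
print(raised)                      # True
```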
Define rounding.
The mapping from ℝ to F is called rounding.
What is used to denote rounding?
fl(x)
How do you round a number?
Round x to the nearest number in F. If x lies exactly midway between two numbers in F, a method of breaking ties is required; the usual choice is to round to the nearest even digit
How do we count significant figures?
Start with the first non-zero digit from the left, and count all digits thereafter, including final zeros if they are after the decimal point.
What is the equation for fl(x)?
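Tie-breaking to even can be observed in Python in two places: the built-in `round` on exact halves, and ordinary double arithmetic near 1. The variable names are for illustration only.

```python
# round() sends exact halves to the nearest even integer.
print(round(0.5), round(1.5), round(2.5), round(3.5))  # 0 2 2 4

eps = 2.0 ** -52               # gap between 1.0 and the next double
tie_low = 1.0 + eps / 2        # exactly midway between 1 and 1 + eps
tie_high = 1.0 + 3 * eps / 2   # exactly midway between 1 + eps and 1 + 2*eps

# Each tie goes to the neighbour whose last binary digit is 0.
print(tie_low == 1.0)              # True
print(tie_high == 1.0 + 2 * eps)   # True
```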
fl(x) = x(1 + δ)
What is the equation for the relative error incurred by rounding?
|fl(x) - x| / |x| = |δ|
What does δ stand for?
The relative rounding error
How do we find an upper bound of |δ|?

What is the upper bound of |δ|?
|δ| ≤ εM
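The bound can be checked for x = 0.1 using exact rational arithmetic. This sketch assumes IEEE doubles with round to nearest, for which the bound is taken to be εM = 2^-53; that value is an assumption here, not something stated in these cards.

```python
from fractions import Fraction

x = Fraction(1, 10)          # the real number 0.1, held exactly
fl_x = Fraction(0.1)         # exact value of the double nearest to 0.1
delta = (fl_x - x) / x       # rearranged from fl(x) = x(1 + delta)

unit_roundoff = Fraction(1, 2 ** 53)   # assumed value of the bound for doubles
print(float(delta))
print(abs(delta) <= unit_roundoff)     # True
```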
What does εM stand for?
Machine epsilon (or unit roundoff)
Why is the machine epsilon also called the unit roundoff?
It describes rounding near 1 (the unit): with |δ| ≤ εM, real numbers between 1 and 1 + εM are rounded to 1, and the smallest number in F greater than 1 is 1 + 2εM
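A short Python check of behaviour near 1. Two conventions are in use: `sys.float_info.epsilon` is the gap from 1.0 to the next double (2^-52), while the bound on |δ| under round to nearest is half of that. Which of the two is called machine epsilon varies between texts.

```python
import sys

gap = sys.float_info.epsilon   # distance from 1.0 to the next double
unit_roundoff = gap / 2        # bound on |delta| under round to nearest

print(gap == 2.0 ** -52)               # True
print(1.0 + unit_roundoff == 1.0)      # True: rounded back to 1
print(1.0 + gap == 1.0)                # False: the next number in F
```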
What does εM equal?

What is the fundamental axiom of floating-point arithmetic?
For x, y in F and each operation ∗ in {+, -, ×, ÷}: fl(x ∗ y) = (x ∗ y)(1 + δ) for some |δ| ≤ εM
What is the error when we are adding the following two numbers?


What is a major cause of error in floating-point calculations?
Loss of significance
What is loss of significance?
If x and y are very close together, then x - y (or x + y when they have opposite signs) can have an arbitrarily large relative error compared to the errors in the initial values of x and y.
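A minimal Python illustration, with values chosen for this sketch: adding 1e-15 to 1 incurs only a tiny rounding error, but subtracting 1 again exposes that error as roughly 11% of the quantity being recovered.

```python
from fractions import Fraction

x = 1e-15
total = 1.0 + x            # rounded to the nearest double
recovered = total - 1.0    # this subtraction is exact, but x's digits are gone

# Relative error of the rounded sum, computed exactly: at most 2^-53.
exact_sum = 1 + Fraction(x)
rel_err_sum = abs(Fraction(total) - exact_sum) / exact_sum

# Relative error of the recovered value compared with x: about 0.11.
rel_err_diff = abs(recovered - x) / x

print(float(rel_err_sum))
print(rel_err_diff)
```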
Does (a + b) + c = a + (b + c) in floating point arithmetic?
Not always
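A standard counterexample in Python doubles:

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c     # a + b is rounded first, then c is added
right = a + (b + c)    # b + c is rounded first, then added to a

print(left, right, left == right)  # 0.6000000000000001 0.6 False
```

Each addition rounds its result, and the two groupings round different intermediate values.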