Floating Point Arithmetic Flashcards
How are integers represented in computing?
Integers can be represented exactly using a fixed number of bits.
What is the largest unsigned integer that can be represented with n bits?
2^n - 1
How are negative integers represented in computing?
Using two’s complement notation.
Why do we need floating point representation?
Many scientific and engineering calculations involve non-integer values that require fractional representation.
Give an example of a scientific value that requires floating point representation.
Mass of an electron: 9.109 × 10^−31 kg
What key feature makes floating point representation different from integer representation?
The radix point can "float", so a fixed number of significant digits can represent both very large and very small magnitudes.
What components make up a floating point number?
- Significand/Mantissa
- Exponent
- Base
What does the exponent do in floating point numbers?
It determines how much the significand is scaled by the base.
How would 9.109 × 10^−31 be represented in floating point?
- Significand: 9.109
- Base: 10
- Exponent: -31
What is IEEE 754?
The most widely adopted standard for floating point arithmetic.
What does IEEE 754 specify?
- Number representations (e.g., single precision, double precision)
- How operations like addition, subtraction, multiplication, and division behave
What are the two most commonly used floating point precisions in C?
- Single precision: 32-bit, float
- Double precision: 64-bit, double
Why is double precision preferred for scientific computing?
It provides greater accuracy and a wider range than single precision.
What happens if a number exceeds single precision limits?
It overflows and is represented as infinity (inf).
In what applications might single precision be sufficient?
Machine learning and graphics.
How many bits does double precision floating point use?
64 bits (8 bytes)
How are the 64 bits divided in double precision floating point?
- 52 bits for the mantissa
- 11 bits for the exponent
- 1 bit for the sign
How is a normalised floating point number represented in IEEE 754?
x = ±(1.b{1}b{2}…b{52})_2 × 2^(E − 1023)
where:
- b{1}, b{2}, …, b{52} are the mantissa bits
- E is the unsigned integer formed by the exponent bits a{1}, a{2}, …, a{11}
What is the smallest normalised double precision number?
2^(−1022) ≈ 10^(−308)
What is the largest normalised double precision number?
(2−2^(−52)) × 2^(1023) ≈ 10^(308)
Why are floating point numbers not always exact?
Because they have finite precision, leading to rounding errors.
What is machine epsilon?
The smallest difference between 1 and the next representable number, approximately: 2^(−52) ≈ 10^(−16)
Why is machine epsilon important?
It determines the smallest detectable rounding error in double precision calculations.
What are some cases where floating point arithmetic fails?
- 1.0/0.0 → Infinity (inf)
- 0.0/0.0 → Not a Number (NaN)
- sqrt(-1.0) → Not a Number (NaN)
- Operations that exceed the floating point range → overflow or underflow