3.5.4.4 Numbers With a Fractional Part. Flashcards
Outline the use of fixed point representation.
Used if we want to represent fractions.
We can fix the binary point at a set position within the byte, so that the bits to its right represent fractions (the point itself is not actually stored).
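A minimal Python sketch of the idea, assuming an unsigned byte with the point fixed after the first 4 bits. The function name and the choice of 4 fraction bits are illustrative assumptions, not part of the card.

```python
def fixed_point_value(bits: str, fraction_bits: int) -> float:
    """Value of an unsigned fixed point bit string.

    `fraction_bits` is how many bits sit to the right of the implied point.
    The point is never stored; it only changes how the bits are weighted.
    """
    return int(bits, 2) / (2 ** fraction_bits)


# 01101100 read as 0110.1100 = 4 + 2 + 0.5 + 0.25
print(fixed_point_value("01101100", 4))  # 6.75
```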
How do we represent decimals and what does this involve?
Ways we can represent decimals:
325.5 can be represented as 0.3255 x 10^3
This way of representing numbers in different forms involves moving (floating) the decimal point to a new position.
We need a consistent method of representing decimals.
Outline the process of floating point notation.
In floating point notation, real numbers are represented in the following way:
- A sign (0 indicates a positive number, 1 a negative number).
- Some significant digits expressed as a number with a fractional part (mantissa).
- An integer power of 2 (exponent).
0●1011000 0011
mantissa exponent.
Where is the implied binary point?
The implied binary point is always after the first digit.
What is the exam question structure for Floating Point Numbers?
Exam questions will use 12 bit numbers:
8 bits for the mantissa
4 bits for the exponent.
How to calculate floating point numbers.
- Find the sign of the mantissa. If the mantissa is negative, perform two’s complement.
- Find the value of the exponent (positive or negative) by adding up the place values of its bits.
- Move the binary point the distance the exponent asks for (right for a positive exponent, left for a negative exponent).
- Starting at the binary point, work out the value of the mantissa.
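The steps above can be sketched in Python, assuming the 12 bit layout used in these cards: an 8 bit two's complement mantissa with the binary point after the first bit, and a 4 bit two's complement exponent. The function names are illustrative only.

```python
def twos_complement(bits: str) -> int:
    """Read a bit string as a two's complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":              # leading 1 means the number is negative
        value -= 1 << len(bits)
    return value


def decode_float(mantissa: str, exponent: str) -> float:
    """Value of mantissa x 2^exponent, binary point after the first mantissa bit."""
    # Dividing by 2^(n-1) places the binary point after the first bit.
    m = twos_complement(mantissa) / (1 << (len(mantissa) - 1))
    e = twos_complement(exponent)
    return m * 2 ** e


# 0.1011000 with exponent 0011 (3): point moves 3 right, giving 101.1
print(decode_float("01011000", "0011"))  # 5.5
```

Moving the point and then reading off the place values gives the same result as multiplying the mantissa by the power of 2, which is what the sketch does.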
Define a mantissa.
The significant digits of the number you want to store.
Define an exponent.
The position of the binary point in the number.
What must we remember about the exponent?
The exponent does not form part of the end number.
Outline how to float the binary point when working with a negative exponent.
Move the binary point to the left by the number of places given by the exponent, filling the gaps with 0’s (for a negative mantissa, fill with 1’s to keep the sign).
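A small Python sketch of floating the point to the left, assuming a two's complement mantissa written without its point (first bit before the point, the rest after). Filling with copies of the first bit is an assumption that covers both positive (0's) and negative (1's) mantissas. Both function names are hypothetical.

```python
def shift_point_left(mantissa: str, places: int) -> str:
    """Move the binary point `places` positions left, padding with the sign bit."""
    sign, fraction = mantissa[0], mantissa[1:]
    return sign + "." + sign * places + fraction


def point_value(bits_with_point: str) -> float:
    """Value of a two's complement string of the form 's.fff...'."""
    sign_bit, fraction = bits_with_point.split(".")
    value = -int(sign_bit)                      # the first bit is worth -1
    for i, bit in enumerate(fraction, start=1):
        value += int(bit) * 2 ** -i             # then 1/2, 1/4, 1/8, ...
    return value


# Mantissa 0.1000000 with exponent 1110 (-2): move the point 2 places left.
shifted = shift_point_left("01000000", 2)
print(shifted)               # 0.001000000
print(point_value(shifted))  # 0.125
```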
Outline the need for normalisation.
Leading zeros before the most significant bit of the mantissa use up bits that could hold data, so less significant bits can be lost, which leads to an unnecessary error. Normalisation overcomes this by removing the leading zeros.
Outline normalised floating point numbers.
With a fixed number of bits, a normalised representation of a number will display the number to the greatest accuracy possible.
Summarise what normalised numbers do.
In summary normalised numbers:
Give only one representation of a number.
Save space.
Give the most accurate representation of a number in a given number of bits.
Outline how to normalise.
The first two digits must be different.
- 0.1 (positive)
- 1.0 (negative)
Examples of normalisation:
(after floating point notation)
1101.1010 is normalised as 1.0110100 0010 (the binary point moves 2 places to the left, so the exponent is 2).
00101 is normalised as 0.101000 1010.
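A rough Python sketch of normalising, assuming an 8 bit two's complement mantissa (binary point after the first bit) and an exponent held as an ordinary integer. It shifts the mantissa left until the first two bits differ and lowers the exponent by one for each shift. The function names are illustrative, and the example values are made up for this sketch.

```python
def is_normalised(mantissa: str) -> bool:
    """Normalised mantissas start 01 (positive) or 10 (negative)."""
    return mantissa[0] != mantissa[1]


def normalise(mantissa: str, exponent: int):
    """Return an equivalent (mantissa, exponent) pair in normalised form."""
    if set(mantissa) == {"0"}:
        return mantissa, exponent           # zero has no normalised form
    while not is_normalised(mantissa):
        # Drop the redundant bit after the sign bit and pad with 0 on the right.
        mantissa = mantissa[0] + mantissa[2:] + "0"
        exponent -= 1                       # each left shift doubles the mantissa
    return mantissa, exponent


# 0.0010100 x 2^5 and 0.1010000 x 2^3 both represent 5
print(normalise("00010100", 5))  # ('01010000', 3)
```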
Outline precision in both the mantissa and exponent.
There needs to be a decision between the range of a number and the precision.
- If you want a very precise number, use more bits for the mantissa and fewer for the exponent, as this will allow for more places after the binary point.
- If you want a large range of numbers, use more bits for the exponent and fewer for the mantissa.
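The trade-off can be illustrated with a short Python sketch, assuming two's complement for both parts and the binary point after the first mantissa bit. `describe_format` is a hypothetical helper that returns the largest positive value and the smallest step the mantissa can show.

```python
def describe_format(mantissa_bits: int, exponent_bits: int):
    """Return (largest positive value, smallest mantissa step) for a bit split."""
    max_exponent = 2 ** (exponent_bits - 1) - 1     # e.g. 0111 = 7 for 4 bits
    step = 2 ** -(mantissa_bits - 1)                # weight of the last mantissa bit
    max_mantissa = 1 - step                         # e.g. 0.1111111 for 8 bits
    return max_mantissa * 2 ** max_exponent, step


# Same 12 bits split two ways
print(describe_format(8, 4))  # (127.0, 0.0078125)     more precision, less range
print(describe_format(6, 6))  # (2080374784.0, 0.03125) more range, less precision
```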
Outline what is meant by a rounding error in floating point notation.
Occurs when a number cannot be represented exactly in the number of bits we are given, so the closest available value is stored instead.
E.g. 1/3 = 0.33333333...
What will occur if we cannot get perfect precision?
This can lead to errors, namely, a rounding error.
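A minimal Python sketch of a rounding error, assuming 7 bits after the binary point (as in an 8 bit mantissa). `nearest_fixed` is an illustrative name; it uses Python's built-in `round`, which rounds exact halves to the nearest even number.

```python
def nearest_fixed(target: float, fraction_bits: int) -> float:
    """Closest value to `target` that fits in `fraction_bits` bits after the point."""
    scale = 2 ** fraction_bits
    return round(target * scale) / scale


stored = nearest_fixed(1 / 3, 7)
print(stored)          # 0.3359375, i.e. 0.0101011 in binary
print(stored - 1 / 3)  # the rounding error, a small non-zero amount
```

Values such as 0.5 that are exact sums of powers of 2 are stored with no error.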
Define an absolute error.
The difference between the target number and the closest number achieved.
Define a relative error.
The percentage difference between the target number and the rounded value.
Example of an absolute error:
If I want to represent 23.27 in binary, and the closest I can get is 23.25, then the absolute error is 23.27 - 23.25 = 0.02.
Example of a relative error:
If I want to represent 23.27 in binary, and the closest I can get is 23.25, then the relative error is (23.27 - 23.25) / 23.27 = 0.00086, or about 0.09%.
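Both calculations can be checked with a short Python sketch. The function names are illustrative; the values are the ones from the cards above.

```python
def absolute_error(target: float, stored: float) -> float:
    """Size of the difference between the target and the stored value."""
    return abs(target - stored)


def relative_error(target: float, stored: float) -> float:
    """Absolute error as a fraction of the target value."""
    return abs(target - stored) / abs(target)


print(f"{absolute_error(23.27, 23.25):.2f}")  # 0.02
print(f"{relative_error(23.27, 23.25):.2%}")  # 0.09%
```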
Outline an underflow error.
Occurs when a number is too small (too close to zero) to be represented accurately in the number of bits available, because the negative exponent it needs is too small to fit.
Outline an overflow error.
Occurs when a number is too large to be represented in the number of bits available, because the exponent it needs is too large to fit.
Essentially, the opposite of an underflow error.
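A simple Python sketch of both cases, assuming a 4 bit two's complement exponent, which can hold -8 to 7. `exponent_status` is a hypothetical helper that only checks whether the exponent a normalised number needs will fit.

```python
def exponent_status(exponent: int, exponent_bits: int = 4) -> str:
    """Say whether an exponent fits in a two's complement field of the given size."""
    lowest = -(2 ** (exponent_bits - 1))     # -8 for 4 bits
    highest = 2 ** (exponent_bits - 1) - 1   #  7 for 4 bits
    if exponent > highest:
        return "overflow"                    # number too large to store
    if exponent < lowest:
        return "underflow"                   # number too close to zero to store
    return "ok"


print(exponent_status(3))    # ok
print(exponent_status(8))    # overflow
print(exponent_status(-9))   # underflow
```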
Outline a cancellation error.
This type of error occurs during the subtraction of two very similar values where most of the significant digits are lost.
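A hedged Python illustration of cancellation. For readability it assumes values are stored to 4 significant decimal digits rather than a binary mantissa; the helper `round_sig` and the two example values are made up for this sketch.

```python
from math import floor, log10


def round_sig(x: float, digits: int) -> float:
    """Round x to the given number of significant decimal digits."""
    if x == 0:
        return 0.0
    return round(x, digits - 1 - floor(log10(abs(x))))


a, b = 1.2345678, 1.2344321
stored_a, stored_b = round_sig(a, 4), round_sig(b, 4)  # 1.235 and 1.234

true_difference = a - b                  # about 0.0001357
stored_difference = stored_a - stored_b  # about 0.001

print(round(true_difference, 7))
print(round(stored_difference, 6))
```

Each stored value is individually close to the original, but after subtraction the result is several times larger than the true difference, because the digits that distinguished the two numbers were the ones rounded away.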