Measures of location and spread Flashcards
measure of location
A measure of location is a single value describing a position in a data set.
measure of central tendency
A measure of central tendency (an average) is a single value that describes the centre of the data.
Measures of central tendency (averages) - Mean
Some points about the mean:
* The mean uses all the data points
* The mean can be distorted by extreme values
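A minimal Python sketch (using made-up marks) illustrating both points, including how one extreme value drags the mean down:

```python
# Mean = sum of all values / number of values
marks = [45, 50, 43, 49, 52, 58, 48, 10]  # hypothetical marks; 10 is an outlier

mean = sum(marks) / len(marks)  # every data point contributes
print(mean)  # 44.375 - the outlier 10 pulls the mean below the bulk of the marks
```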
Measures of central tendency (averages) - Median
The middle value when the data is arranged in order (or the average of the middle two values).
The position of the median is given by (n+1) / 2 where n is the number of items of data.
Some points about the median:
* The median is not distorted by extreme values
* The median can still be calculated even if some of the data is missing, e.g. times taken for people to finish a race
* The median is the value with the property that half the values are higher than it and half the values are lower than it
* It can be tedious to have to order the data first
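The "(n + 1) / 2" position rule above corresponds to taking the single middle value when n is odd, or averaging the middle two when n is even. A short sketch:

```python
def median(data):
    s = sorted(data)                  # the data must be ordered first
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                 # odd n: the single middle value
    return (s[mid - 1] + s[mid]) / 2  # even n: average of the middle two

print(median([3, 1, 4, 1, 5]))     # 3
print(median([3, 1, 4, 1, 5, 9]))  # (3 + 4) / 2 = 3.5
```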
Measures of central tendency (averages) - Mode:
Mode: most common value
Measures of central tendency (averages) - Modal class:
Modal class:
The class that occurs most often, i.e. has the highest frequency.
Some points about the mode:
* The mode is of little use unless there are repeated values
* It is used when the data set has either a single mode or two modes (bimodal)
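A sketch that handles both the single-mode and bimodal cases by returning every value that shares the highest frequency:

```python
from collections import Counter

def modes(data):
    counts = Counter(data)                          # frequency of each value
    best = max(counts.values())                     # the highest frequency
    return [v for v, c in counts.items() if c == best]

print(modes([2, 3, 3, 5, 7, 3]))  # [3] - a single mode
print(modes([2, 2, 5, 5, 7]))     # [2, 5] - bimodal
```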
Grouped Data - Mean:
Mean:
When the data is grouped into classes, you can obtain an estimate for the mean by using the midpoint of the classes (the mid-interval value). This means that you assume that all the values in each class interval are equally spaced about the mid-point.
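The estimate is computed as (sum of frequency × midpoint) / (total frequency). A sketch using a hypothetical frequency table:

```python
# Hypothetical grouped data: class boundaries and their frequencies
classes = [(0, 10), (10, 20), (20, 30)]
freqs   = [5, 12, 3]

midpoints = [(lo + hi) / 2 for lo, hi in classes]       # mid-interval values: 5, 15, 25
total_fx  = sum(f * x for f, x in zip(freqs, midpoints))
estimated_mean = total_fx / sum(freqs)                  # sum(fx) / sum(f)
print(estimated_mean)  # 14.0
```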
Grouped Data - Modal class:
Modal class:
This is the class which has the highest frequency.
Grouped Data - Class containing the median:
Class containing the median:
This is the class that contains the middle data value.
Other measures of location
Other measures of location include quartiles and percentiles.
To find the lower quartile for discrete data containing n data values you need to use the following rules:
- Lower quartile: Divide n by 4.
→ If this is a whole number, the lower quartile is halfway between this data point and the one above.
→ If this is not a whole number, round up and pick this data point.
To find the upper quartile for discrete data containing n data values you need to use the following rules:
- Upper quartile: Find ¾ of n.
→ If this is a whole number, the upper quartile is halfway between this data point and the one above.
→ If this is not a whole number, round up and pick this data point.
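The two rules above can be sketched directly (assuming discrete, sortable data; the example values are made up):

```python
import math

def quartile(data, fraction):
    """Apply the discrete-data rule at n * fraction (1/4 for LQ, 3/4 for UQ)."""
    s = sorted(data)
    pos = len(s) * fraction
    if pos == int(pos):                # whole number: halfway between this
        i = int(pos)                   # data point and the one above
        return (s[i - 1] + s[i]) / 2
    return s[math.ceil(pos) - 1]       # not whole: round up, pick that point

data = [3, 5, 8, 8, 9, 11, 14, 16]     # n = 8
print(quartile(data, 1 / 4))  # n/4 = 2 (whole), so (5 + 8) / 2 = 6.5
print(quartile(data, 3 / 4))  # 3n/4 = 6 (whole), so (11 + 14) / 2 = 12.5
```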
Measures of spread / dispersion / variation - Range
Range = Largest value – Smallest value
This is simple to calculate but is highly sensitive to outliers.
Consider this set of marks for a maths test:
45, 50, 43, 49, 52, 58, 48, 10, 50, 82, 56, 40, 47, 39, 51
Range = 82 – 10 = 72 marks
This is not a good measure of spread as most of the marks are in the range 40 – 60.
Discounting the ‘10’ and ‘82’ as outliers gives a range of 58 – 39 = 19, which is perhaps more representative of the data.
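Reproducing this calculation, with and without the two outliers:

```python
marks = [45, 50, 43, 49, 52, 58, 48, 10, 50, 82, 56, 40, 47, 39, 51]
print(max(marks) - min(marks))  # 82 - 10 = 72

trimmed = [m for m in marks if m not in (10, 82)]  # discard the outliers
print(max(trimmed) - min(trimmed))  # 58 - 39 = 19
```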
Measures of spread / dispersion / variation - Interquartile Range
One way of refining the range so that it does not rely completely on the most extreme items of data is to use the interquartile range. This gives the spread of the middle 50% of the data and therefore avoids extreme values.
Interquartile Range = Upper Quartile (Q3) – Lower Quartile (Q1)
i.e. IQR = Q3 – Q1
For a large data set, 25% of the data lie below the lower quartile, and 75% of the data lie below the upper quartile. The interquartile range measures the range of the middle 50% of the data.
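Applying the discrete quartile rules to the test-mark data above (n = 15, so n/4 = 3.75 and 3n/4 = 11.25, both rounded up) shows how small the IQR is compared with the range of 72:

```python
marks = sorted([45, 50, 43, 49, 52, 58, 48, 10, 50, 82, 56, 40, 47, 39, 51])
q1 = marks[4 - 1]    # n/4 = 3.75, round up: the 4th ordered value = 43
q3 = marks[12 - 1]   # 3n/4 = 11.25, round up: the 12th ordered value = 52
print(q3 - q1)       # IQR = 9, unaffected by the outliers 10 and 82
```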
Measures of spread / dispersion / variation - Interpercentile Range
This is the difference between the values for two given percentiles.
This is still not affected by extreme values but allows more of the data to be considered.
E.g. the 20th to 80th interpercentile range considers the spread of the middle 60% of the data.
The 10th to 90th interpercentile range considers the spread of the middle 80% of the data.
The 10th to 90th interpercentile range is often used as it includes a lot of the data whilst not being affected by extreme values.
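A sketch of the 10th to 90th interpercentile range using the standard library's `statistics.quantiles` (note this interpolates between data points, so the exact values can differ slightly from the discrete counting rules above; the data here is made up):

```python
import statistics

data = list(range(1, 101))                  # hypothetical data: 1 to 100
deciles = statistics.quantiles(data, n=10)  # 9 cut points: 10th, 20th, ..., 90th percentiles
ipr_10_90 = deciles[-1] - deciles[0]        # 90th percentile minus 10th percentile
print(ipr_10_90)                            # the spread of the middle 80% of the data
```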