UNIT 3: Summarising data: measures of central tendency and dispersion Flashcards
Median from discrete data
To find the middle of an ordered list of n numbers, take the (n+1)/2 th value. If (n+1)/2 is not a whole number, take the value halfway between the two middle values.
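The (n+1)/2 rule can be sketched in Python as below. The function name median_discrete is made up for illustration, and the sketch assumes a non-empty list of numbers.

```python
def median_discrete(values):
    """Median of a non-empty list using the (n+1)/2 th value rule."""
    ordered = sorted(values)            # the rule only works on ordered data
    n = len(ordered)
    position = (n + 1) / 2              # 1-based position of the median
    if position.is_integer():
        # Odd n: the position lands exactly on one data value
        return ordered[int(position) - 1]
    # Even n: the position ends in .5, so average the values either side
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2


print(median_discrete([7, 1, 3, 9, 5]))  # n = 5, so the 3rd value
print(median_discrete([1, 3, 5, 7]))     # n = 4, so the 2.5th value
```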
Negative Skew
- peak on the right
- toes on left feet
- mean<median<mode
Symmetric
- peak in the middle
- mean=median=mode
Positive Skew
- peak on the left
- toes on right feet
- mean>median>mode
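As a rough check of the skew rules, the sketch below compares mean, median and mode using Python's statistics module. The function name skew_direction and the sample data are invented for illustration, and the comparison is only a rule of thumb: real data sets do not always follow it.

```python
from statistics import mean, median, mode


def skew_direction(values):
    """Classify skew from the order of mean, median and mode (rule of thumb)."""
    m, md, mo = mean(values), median(values), mode(values)
    if m > md > mo:
        return "positive"
    if m < md < mo:
        return "negative"
    if m == md == mo:
        return "symmetric"
    return "unclear"    # the three averages do not follow any of the patterns


# Illustrative data only
print(skew_direction([1, 2, 2, 2, 3, 3, 4, 5, 9, 14]))         # long tail on the right
print(skew_direction([1, 6, 10, 11, 12, 12, 13, 13, 13, 14]))  # long tail on the left
print(skew_direction([1, 2, 3, 3, 3, 4, 5]))                   # peak in the middle
```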
Geometric Mean
nth root of (value 1 * value 2 * … * value n)
When to use the geometric mean
- When you are finding an average of percentage increases/decreases or rates.
- When you are comparing things measured on very different scales, e.g. marks out of 100 and marks out of 10 for each item.
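A minimal sketch of the geometric mean applied to percentage changes. The function name and the example figures (+10%, +20%, -10%) are made up; each percentage change is first turned into a multiplier, which is the usual approach.

```python
import math


def geometric_mean(values):
    """nth root of the product of n positive values."""
    n = len(values)
    return math.prod(values) ** (1 / n)


# Hypothetical yearly changes of +10%, +20% and -10%, written as multipliers
multipliers = [1.10, 1.20, 0.90]
average_multiplier = geometric_mean(multipliers)
average_percent_change = (average_multiplier - 1) * 100

print(round(average_percent_change, 1))  # average percentage change per year
```

Python 3.8 and later also provide statistics.geometric_mean, which should agree with this sketch for positive inputs.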
Weighted Mean
∑(weight × value) / ∑weight
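A weighted mean multiplies each value by its weight, adds the results, and divides by the total of the weights. The sketch below assumes that standard definition. The function name and the marks are hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    total_weight = sum(weights)
    weighted_total = sum(w * x for x, w in zip(values, weights))
    return weighted_total / total_weight


# Hypothetical example: coursework mark 60 (weight 1), exam mark 80 (weight 3)
print(weighted_mean([60, 80], [1, 3]))  # (60*1 + 80*3) / 4
```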
Which measure of spread to compare for MODE
Range
Which measure of spread to compare for MEDIAN
Interquartile range (or, if specified in the question, the interpercentile range)
Which measure of spread to compare for MEAN
Standard Deviation
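The three measures of spread can be computed side by side as in the sketch below. Two assumptions are worth flagging: quartiles come from statistics.quantiles with its default method, which may differ slightly from the method your course uses, and the standard deviation is the population version (pstdev). The data is invented.

```python
from statistics import pstdev, quantiles


def spread_summary(values):
    """Range, interquartile range and standard deviation of a data set."""
    ordered = sorted(values)
    q1, _median, q3 = quantiles(ordered, n=4)   # three cut points: Q1, Q2, Q3
    return {
        "range": ordered[-1] - ordered[0],      # largest minus smallest
        "iqr": q3 - q1,                         # spread of the middle 50%
        "sd": pstdev(ordered),                  # population standard deviation (assumed)
    }


summary = spread_summary([2, 4, 4, 5, 7, 9, 10, 12, 15])
print(summary["range"], summary["iqr"], round(summary["sd"], 2))
```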
Advantages of Range
Easy to calculate
Disadvantages of Range
Affected by outliers
Advantages of IQR/interdecile/interpercentile range
Not affected by outliers
Disadvantages of IQR/interdecile/interpercentile range
Doesn’t allow you to calculate skew
Advantages of standard deviation
Use to calculate skew
Disadvantages of standard deviation
Affected by outliers
Doesn’t have much meaning for skewed data
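To see how an outlier affects each measure of spread, the sketch below swaps one value for an extreme one and recalculates. The helper names and data are invented, and the quartiles use the default method of statistics.quantiles, which is an assumption rather than a required method.

```python
from statistics import pstdev, quantiles


def data_range(values):
    return max(values) - min(values)


def iqr(values):
    q1, _median, q3 = quantiles(values, n=4)   # quantiles sorts the data itself
    return q3 - q1


clean = [2, 4, 4, 5, 7, 9, 10, 12, 15]
with_outlier = [2, 4, 4, 5, 7, 9, 10, 12, 150]   # 15 replaced by an outlier

for name, measure in [("range", data_range), ("IQR", iqr), ("SD", pstdev)]:
    # Range and SD jump when the outlier is added; the IQR stays the same here
    print(name, measure(clean), measure(with_outlier))
```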
Advantages of mean
- Uses all the data
- Use to calculate standard deviation
- Use to calculate skew
Disadvantages of mean
- Always affected by extreme values
- Hard to calculate if a table has an open-ended class (e.g. x>300)
Advantages of mode
- Easy to find (calculate)
- Can be found for any type of data, even qualitative data
- Not affected by extreme values
- Always a data value in the set
Disadvantages of mode
- May be no mode
- May be more than one mode
- Cannot use it to calculate a measure of spread
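The sketch below illustrates these points: the mode works on qualitative data, there can be more than one mode, and there can be none. The function name modes is made up, the input is assumed to be non-empty, and "no mode when every value occurs equally often" is a convention assumed here; some textbooks treat that case differently.

```python
from collections import Counter


def modes(values):
    """Return all modes of a non-empty list, or [] if every value is equally common."""
    counts = Counter(values)
    highest = max(counts.values())
    most_common = [value for value, count in counts.items() if count == highest]
    # Assumed convention: if all distinct values tie, report no mode
    if len(counts) > 1 and len(most_common) == len(counts):
        return []
    return most_common


print(modes(["red", "blue", "red", "green"]))  # qualitative data, one mode
print(modes([1, 1, 2, 2, 3]))                  # two modes
print(modes([4, 7, 9]))                        # no mode
```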
Advantages of median
- Easy to calculate (unless using interpolation)
- Not affected by extreme values
- Best to use when data is skewed
- Can be used to calculate IQR and skew and outliers
Disadvantages of median
May not be a data value
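A quick comparison of how the mean and the median react to one extreme value, using Python's statistics module. The data is invented for illustration.

```python
from statistics import mean, median

typical = [20, 22, 23, 25, 27]
with_extreme = [20, 22, 23, 25, 270]   # last value replaced by an extreme one

# The mean is pulled towards the extreme value; the median does not move
print(mean(typical), median(typical))
print(mean(with_extreme), median(with_extreme))
```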