Measures of location and spread Flashcards

1
Q

measure of location

A

A measure of location is a single value describing a position in a data set.

2
Q

measure of central tendency

A

A measure of central tendency (an average) is a single value that describes the centre of the data.

3
Q

Measures of central tendency (averages) - Mean

A

The mean is the sum of all the data values divided by the number of values.
The mean uses all the data points.
The mean can be distorted by extreme values.
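
As a rough illustration of how an extreme value distorts the mean, the sketch below compares the mean of a small data set before and after one extreme value is added. The data values are hypothetical and chosen only for illustration.

```python
from statistics import mean

# Hypothetical data, chosen only for illustration.
typical = [4, 5, 5, 6, 7]
with_extreme = typical + [45]  # the same data plus one extreme value

mean_typical = mean(typical)            # 27 / 5
mean_with_extreme = mean(with_extreme)  # 72 / 6

print("mean without extreme value:", mean_typical)
print("mean with extreme value:   ", mean_with_extreme)
```

A single extreme value more than doubles the mean here, even though five of the six values are 7 or less.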

4
Q

Measures of central tendency (averages) - Median

A

The median is the middle value when the data are arranged in order (or the mean of the middle two values when there is an even number of items).
The position of the median is given by (n + 1) / 2, where n is the number of items of data.

Some points about the median:
* The median is not distorted by extreme values
* The median can still be calculated even if some of the data is missing, provided the missing values are known to lie at the ends, e.g. times taken to finish a race when the slowest runners have not yet finished
* The median is the value with the property that half the values are higher than it and half the values are lower than it
* It can be tedious to have to order the data first
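
The position rule above can be sketched in a few lines of Python. The function name `median_by_position` and the sample data are hypothetical, introduced only for illustration; positions are counted from 1, as in the card.

```python
def median_by_position(values):
    """Median via the (n + 1) / 2 position rule (positions count from 1)."""
    ordered = sorted(values)          # the data must be ordered first
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    position = (n + 1) / 2
    if position.is_integer():
        # Odd n: the position lands exactly on one data value.
        return ordered[int(position) - 1]
    # Even n: the position falls halfway between two values, so average them.
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2


odd_median = median_by_position([7, 1, 5, 3, 45])   # n = 5, position 3
even_median = median_by_position([7, 1, 5, 3])      # n = 4, position 2.5
# Replacing the extreme value 45 with 9 leaves the median unchanged.
same_median = median_by_position([7, 1, 5, 3, 9])
```

The last call illustrates the first bullet point: changing the largest value does not move the middle value.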

5
Q

Measures of central tendency (averages) - Mode:

A

Mode: the value that occurs most often.

6
Q

Measures of central tendency (averages) - Modal class:

A

Modal class:
The class with the highest frequency, i.e. the class that contains the most data values.

Some points about the mode:
* The mode is useless unless there are lots of repeated values
* It is used when the data set has either a single mode or two modes (bimodal)
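
The points about the mode can be checked with a small sketch using `statistics.multimode` from the Python standard library (Python 3.8 or later), which returns every value tied for the highest frequency. The three data sets are hypothetical.

```python
from statistics import multimode

# Hypothetical data sets, chosen only for illustration.
single = multimode([2, 3, 3, 4, 5])      # one value is repeated most often
bimodal = multimode([1, 1, 2, 3, 3])     # two values tie for most frequent
no_repeats = multimode([1, 2, 3, 4])     # no repeats: every value ties

print(single, bimodal, no_repeats)
```

When no value is repeated, every value is returned as a "mode", which is why the mode says little about such data.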

7
Q

Grouped Data - Mean:

A

Mean:
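When the data is grouped into classes, the individual values are not known, so you can only obtain an estimate for the mean. Use the midpoint of each class (the mid-interval value) in place of the values in that class: estimated mean = (sum of frequency × midpoint) / (total frequency). This assumes that the values in each class interval are equally spaced about the midpoint.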
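
A minimal sketch of the midpoint estimate, assuming a hypothetical frequency table stored as (lower bound, upper bound, frequency) tuples:

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency).
classes = [(0, 10, 3), (10, 20, 7), (20, 30, 8), (30, 40, 2)]

total_frequency = sum(freq for _, _, freq in classes)

# Each class contributes its midpoint once for every value it contains.
weighted_sum = sum(((low + high) / 2) * freq for low, high, freq in classes)

estimated_mean = weighted_sum / total_frequency
print("estimated mean:", estimated_mean)
```

The result is only an estimate, because the true values inside each class are unknown.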

8
Q

Grouped Data - Modal class:

A

Modal class:
This is the class which has the highest frequency.

9
Q

Grouped Data - Class containing the median:

A

Class containing the median:
This is the class that contains the middle data value.
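
The sketch below finds the modal class and the class containing the median for a hypothetical frequency table. It assumes the (n + 1) / 2 position from the median card and an odd total frequency, so that the middle value falls in exactly one class; some textbooks use n / 2 for grouped data instead.

```python
# Hypothetical grouped data: (class label, frequency), in increasing order.
table = [("0-10", 2), ("10-20", 8), ("20-30", 5), ("30-40", 6)]

# Modal class: the class with the highest frequency.
modal_class = max(table, key=lambda row: row[1])[0]

# Class containing the median: walk the cumulative frequencies until
# they reach the position of the middle value.
n = sum(freq for _, freq in table)
median_position = (n + 1) / 2  # assumption: same rule as for ungrouped data

cumulative = 0
median_class = None
for label, freq in table:
    cumulative += freq
    if cumulative >= median_position:
        median_class = label
        break

print("modal class:", modal_class)
print("class containing the median:", median_class)
```

In this example the two classes differ, which shows that the modal class and the class containing the median need not coincide.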

10
Q

Other measures of location

A

Other measures of location include quartiles and percentiles.

11
Q

To find the lower quartile for discrete data containing n data values you need to use the following rules:

A
  • Lower quartile: Divide n by 4.
    → If this is a whole number, the lower quartile is halfway between this data point and the one above.
    → If this is not a whole number, round up and pick this data point.
12
Q

To find the upper quartile for discrete data containing n data values you need to use the following rules:

A
  • Upper quartile: Find ¾ of n.
    → If this is a whole number, the upper quartile is halfway between this data point and the one above.
    → If this is not a whole number, round up and pick this data point.
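
The rules for the lower and upper quartiles can be sketched as one small function. The name `quartile` and its parameter `k` (1 for the lower quartile, 3 for the upper) are hypothetical, and the sample data is made up for illustration.

```python
def quartile(values, k):
    """Quartile of discrete data by the n/4 (k=1) or 3n/4 (k=3) rule."""
    if k not in (1, 3):
        raise ValueError("k must be 1 (lower) or 3 (upper)")
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("quartile of an empty data set is undefined")
    whole, remainder = divmod(n * k, 4)
    if remainder == 0:
        # Whole number: halfway between this data point and the one above.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # Not a whole number: round up, i.e. 1-based position whole + 1.
    return ordered[whole]


eight = [1, 3, 4, 6, 7, 9, 12, 15]   # n = 8: n/4 = 2, 3n/4 = 6
seven = [1, 3, 4, 6, 7, 9, 12]       # n = 7: n/4 = 1.75, 3n/4 = 5.25

q1_eight, q3_eight = quartile(eight, 1), quartile(eight, 3)
q1_seven, q3_seven = quartile(seven, 1), quartile(seven, 3)
```

Integer arithmetic with `divmod` avoids any floating-point rounding when testing whether n/4 or 3n/4 is a whole number.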
13
Q

Measures of spread / dispersion / variation - Range

A

Range = Largest value – Smallest value

This is simple to calculate but is highly sensitive to outliers.
Consider this set of marks for a maths test:

45, 50, 43, 49, 52, 58, 48, 10, 50, 82, 56, 40, 47, 39, 51

Range = 82 – 10 = 72 marks
This is not a good measure of spread as most of the marks are in the range 40 – 60.
Discounting the ‘10’ and ‘82’ as outliers gives a range of 58 – 39 = 19 which is perhaps more representative of the data.
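
The effect of the two extreme marks on the range can be checked directly. In this sketch the outliers are discounted simply by dropping the smallest and largest mark, which is an assumption made for illustration rather than a formal outlier test.

```python
marks = [45, 50, 43, 49, 52, 58, 48, 10, 50, 82, 56, 40, 47, 39, 51]

full_range = max(marks) - min(marks)

# Assumption: treat only the single smallest and largest marks as outliers.
trimmed = sorted(marks)[1:-1]
trimmed_range = trimmed[-1] - trimmed[0]

print("range of all marks:", full_range)
print("range without the two extremes:", trimmed_range)
```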

14
Q

Measures of spread / dispersion / variation - Interquartile Range

A

One way of refining the range so that it does not rely completely on the most extreme items of data is to use the interquartile range. This gives the spread of the middle 50% of the data and therefore avoids extreme values.

Interquartile Range = Upper Quartile (Q3) – Lower Quartile (Q1)

i.e. IQR = Q3 – Q1

For a large data set, 25% of the data lie below the lower quartile, and 75% of the data lie below the upper quartile. The interquartile range measures the range of the middle 50% of the data.
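
As a worked example, the quartile rules for discrete data can be applied to the 15 maths test marks from the range card. Treating those marks as the data set is a choice made here for illustration.

```latex
\text{Ordered marks: } 10,\ 39,\ 40,\ 43,\ 45,\ 47,\ 48,\ 49,\ 50,\ 50,\ 51,\ 52,\ 56,\ 58,\ 82 \quad (n = 15)

\tfrac{n}{4} = 3.75 \ \Rightarrow\ \text{round up to the 4th value: } Q_1 = 43

\tfrac{3n}{4} = 11.25 \ \Rightarrow\ \text{round up to the 12th value: } Q_3 = 52

\text{IQR} = Q_3 - Q_1 = 52 - 43 = 9
```

The interquartile range of 9 marks is far smaller than the range of 72 marks, because the extreme marks 10 and 82 play no part in it.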

15
Q

Measures of spread / dispersion / variation - Interpercentile Range

A

This is the difference between the values for two given percentiles.
This is still not affected by extreme values but allows more of the data to be considered.
E.g. the 20th to 80th interpercentile range considers the spread of the middle 60% of the data.
The 10th to 90th interpercentile range considers the spread of the middle 80% of the data.
The 10th to 90th interpercentile range is often used as it includes a lot of the data whilst not being affected by extreme values.
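
A rough sketch of a 10th to 90th interpercentile range, using `statistics.quantiles` from the Python standard library on the maths test marks from the range card. The cards do not say how percentiles should be located, so the interpolation used here (`method="inclusive"`) is an assumption; other conventions give slightly different values.

```python
from statistics import quantiles

marks = [45, 50, 43, 49, 52, 58, 48, 10, 50, 82, 56, 40, 47, 39, 51]

# n=10 splits the data into ten equal parts and returns the nine cut
# points, i.e. the 10th, 20th, ..., 90th percentiles.
deciles = quantiles(marks, n=10, method="inclusive")  # assumed convention
p10, p90 = deciles[0], deciles[-1]

interpercentile_range = p90 - p10
full_range = max(marks) - min(marks)

print("10th to 90th interpercentile range:", interpercentile_range)
print("full range:", full_range)
```

The interpercentile range covers most of the data yet is much smaller than the full range, since the marks 10 and 82 lie outside the two percentiles.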

16
Q

deviation

A

The deviation of an item of data from the mean is the difference between the data item and the mean, i.e. $x - \bar{x}$

Consider a small set of data: {0, 1, 1, 3, 5}

The mean of this data is given by $\bar{x} = (0+1+1+3+5)/5 = 2$

The set of deviations for this set of data is: {-2, -1, -1, 1, 3}
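
A quick check on this set of deviations: deviations from the mean always sum to zero, because the positive and negative values cancel, so they cannot simply be added up to measure spread.

```latex
\sum (x - \bar{x}) = (-2) + (-1) + (-1) + 1 + 3 = 0
```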

17
Q

Sum of squares

A

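To compensate for differing signs, we square the deviations.
The sum of the squares of the deviations is known as the sum of squares and is denoted by $S_{xx}$:

$S_{xx} = \sum (x - \bar{x})^2$

For the data set {0, 1, 1, 3, 5}, which has mean 2:

$S_{xx} = (-2)^2 + (-1)^2 + (-1)^2 + 1^2 + 3^2 = 16$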

18
Q

variance

A

To use $S_{xx}$ as a comparable measure of spread, it is necessary to take into account the number of data items. This allows two data sets of different sizes to be compared.
Therefore, we need to divide this value by n:

$\text{variance} = \dfrac{S_{xx}}{n}$

19
Q

standard deviation

A

The standard deviation is the square root of the variance and is given by

$\text{standard deviation} = \sqrt{\dfrac{S_{xx}}{n}}$
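
The chain from deviations to standard deviation can be followed step by step for the data set {0, 1, 1, 3, 5} from the deviation card. The variable names are illustrative; `statistics.pstdev` (the population standard deviation, which divides by n) is used only as a cross-check.

```python
import math
from statistics import pstdev

data = [0, 1, 1, 3, 5]
n = len(data)

mean = sum(data) / n                      # 10 / 5
deviations = [x - mean for x in data]     # each x minus the mean
s_xx = sum(d ** 2 for d in deviations)    # sum of squared deviations
variance = s_xx / n                       # divide by the number of items
std_dev = math.sqrt(variance)             # square root of the variance

print("S_xx =", s_xx)
print("variance =", variance)
print("standard deviation =", round(std_dev, 3))
```

Because the standard deviation is in the same units as the data, it is usually easier to interpret than the variance.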

20
Q

Coding

A

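Coding
Find the mean and standard deviation of the following data sets. Each of data sets 2 to 12 can be obtained from data set 1 by adding a constant, multiplying by a constant, or both.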
1. 20, 26, 26, 27, 28
2. 21, 27, 27, 28, 29
3. 12, 18, 18, 19, 20
4. 18, 24, 24, 25, 26
5. 40, 46, 46, 47, 48
6. 40, 52, 52, 54, 56
7. 60, 78, 78, 81, 84
8. 2, 2.6, 2.6, 2.7, 2.8
9. 4, 5.2, 5.2, 5.4, 5.6
10. 41, 53, 53, 55, 57
11. 55, 73, 73, 76, 79
12. 11, 14, 14, 14.5, 15
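
The cards do not state the coding rule itself, so the sketch below relies on the standard result, which is an assumption here: if every value is coded as y = ax + b, the mean becomes a × (mean of x) + b and the standard deviation becomes |a| × (standard deviation of x). It checks this for data sets 1 and 10, where each value in set 10 is 2x + 1.

```python
import math
from statistics import mean, pstdev

base = [20, 26, 26, 27, 28]    # data set 1
coded = [41, 53, 53, 55, 57]   # data set 10

a, b = 2, 1                    # coding y = 2x + 1
is_linear_coding = coded == [a * x + b for x in base]

mean_base, sd_base = mean(base), pstdev(base)
mean_coded, sd_coded = mean(coded), pstdev(coded)

# Adding b shifts the mean but not the spread; multiplying by a scales both.
mean_matches = math.isclose(mean_coded, a * mean_base + b)
sd_matches = math.isclose(sd_coded, abs(a) * sd_base)

print(mean_base, sd_base)
print(mean_coded, sd_coded)
```

The same check can be repeated for the other data sets by changing `coded`, `a` and `b`.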