Stats - measures of central tendency and dispersion Flashcards
What is the difference between descriptive and inferential statistics?
Descriptive statistics describe the basic features of the data in a study and help us to simplify (summarise) that data. Inferential statistics, by contrast, are used to draw conclusions that reach beyond the immediate data, for example from a sample to the wider population.
What are the 3 main measures of central tendency?
Mean
Median
Mode
What is the median?
The median is the middle item in a data set that has been arranged in numerical order. It may be an actual item in the data set, or it may need to be calculated (as in the even-numbered case below). It is largely unaffected by outliers.
It is found by arranging the items in order and then selecting the item that has half the items above it and half below it. For example, in the data set below, the median is 3 (the third of five items):
1, 3, 3, 4, 5
When there is an even number of items, the median is halfway between the two middle items. For example, in the data set below, the median is halfway between 3 and 4, which is 3.5:
1, 3, 3, 4, 5, 6
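A minimal Python sketch of the median rule above, using only the standard library. The helper name `median_of` is hypothetical, and the data set `[1, 3, 3, 4, 500]` is an illustrative example (not from the card) showing that an extreme value does not move the median.

```python
import statistics


def median_of(values):
    """Return the median: the middle item, or halfway between the two middle items."""
    ordered = sorted(values)            # arrange the items in numerical order
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one value")
    mid = n // 2
    if n % 2 == 1:                      # odd count: a single middle item
        return ordered[mid]
    # even count: midpoint of the two middle items
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([1, 3, 3, 4, 5]))       # 3
print(median_of([1, 3, 3, 4, 5, 6]))    # 3.5
print(median_of([1, 3, 3, 4, 500]))     # 3  (the extreme last value does not move the median)

# Cross-check against the standard library
assert median_of([1, 3, 3, 4, 5, 6]) == statistics.median([1, 3, 3, 4, 5, 6])
```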
What is the mode?
Often used to summarise categorical data, the mode is the most frequent item in a data set.
Some data sets have two or more modes, for example:
1, 1, 1, 2, 2, 2, 3, 4
The modal values in this case are both 1 and 2 (the data set is bimodal; a data set with more than two modes is multimodal).
Similarly, some data sets have no mode because all the values appear with the same frequency, for example:
0, 1, 2, 3, 4, 5, 6
The mode is used less for continuous variables because, with this type of variable, it is likely that no value will appear more than once (e.g. if you ask 20 people their personal income in the previous year, many may report very similar amounts, but you are unlikely to get exactly the same value for any two people).
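A small Python sketch of finding the mode(s) by counting frequencies. The helper `modes_of` is hypothetical; it follows the card's convention that a data set where every value appears equally often has no mode, which differs from the standard library's `statistics.multimode` (that function returns every tied value instead).

```python
import statistics
from collections import Counter


def modes_of(values):
    """Return all most-frequent values; [] if every distinct value ties (the card's 'no mode')."""
    counts = Counter(values)                  # frequency of each distinct value
    if not counts:
        return []
    highest = max(counts.values())
    modes = [value for value, count in counts.items() if count == highest]
    # Convention from the card: if all values share the same frequency, there is no mode
    if len(counts) > 1 and len(modes) == len(counts):
        return []
    return modes


print(modes_of([1, 1, 1, 2, 2, 2, 3, 4]))   # [1, 2]  (bimodal)
print(modes_of([0, 1, 2, 3, 4, 5, 6]))      # []  (no mode)
print(modes_of(["red", "blue", "red"]))     # ['red']  (works for categorical data too)

# statistics.multimode agrees on the bimodal case but returns every value in the tie case
assert statistics.multimode([1, 1, 1, 2, 2, 2, 3, 4]) == [1, 2]
```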
What is the mean?
The mean is calculated by adding all the items of a data set together and dividing by the number of items.
For example, for the following data set:
1, 2, 2, 2, 3
mean = (1 + 2 + 2 + 2 + 3) / 5
mean = 10 / 5
mean = 2
Unlike the median and the mode, the mean changes whenever any value in the data set changes, which makes it sensitive to outliers and skewed data.
Note: this is the arithmetic mean (as opposed to other means such as the geometric, harmonic and generalised means).
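A minimal Python sketch of the arithmetic mean, reproducing the worked example and then contrasting it with the median when one value is replaced by an outlier. The function name `arithmetic_mean` and the outlier value 30 are illustrative assumptions, not part of the card.

```python
import statistics


def arithmetic_mean(values):
    """Sum all the items and divide by the number of items."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


data = [1, 2, 2, 2, 3]
print(arithmetic_mean(data))              # 2.0  (10 / 5)

# Replacing one value with an outlier shifts the mean but not the median
with_outlier = [1, 2, 2, 2, 30]
print(arithmetic_mean(with_outlier))      # 7.4  (37 / 5)
print(statistics.median(with_outlier))    # 2
```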
What is the preferred measure of central tendency for the following measurement scale:
Categorical
Mode
What is the preferred measure of central tendency for the following measurement scale:
Nominal
Mode
What is the preferred measure of central tendency for the following measurement scale:
Ordinal
Median (or mode)
What is the preferred measure of central tendency for the following measurement scale:
Interval (normal distribution)
Mean (preferable), median or mode
What is the preferred measure of central tendency for the following measurement scale:
Interval (skewed data)
Median
What is the preferred measure of central tendency for the following measurement scale:
Ratio (normal distribution)
Mean (preferable), median or mode
What is the preferred measure of central tendency for the following measurement scale:
Ratio (skewed data)
Median
What is the variance?
What units is it measured in?
The variance indicates how much the items in the data set vary from the mean.
It is a measure of dispersion, calculated from the squared distances between each data point and the mean of the data set.
It is measured in units squared (e.g. cm² for data recorded in cm).
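The card does not give a formula, so here is a Python sketch assuming the usual definitions: the population variance averages the squared deviations from the mean (dividing by N), while the sample variance divides the same sum by n − 1. The function names are hypothetical; the results are cross-checked against `statistics.pvariance` and `statistics.variance`.

```python
import statistics


def population_variance(values):
    """Average squared distance of each item from the mean (divide by N)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def sample_variance(values):
    """Same sum of squared deviations, divided by n - 1 (used when estimating from a sample)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance requires at least two values")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


data = [1, 2, 2, 2, 3]                  # deviations from the mean 2: -1, 0, 0, 0, 1
print(population_variance(data))        # 0.4  (2 / 5)
print(sample_variance(data))            # 0.5  (2 / 4)

# Squaring the deviations is why the variance is in squared units
assert abs(population_variance(data) - statistics.pvariance(data)) < 1e-12
assert abs(sample_variance(data) - statistics.variance(data)) < 1e-12
```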
What is Standard Deviation?
How is it calculated, and what does it mean if this is low or high?
Standard deviation is a statistical measure that quantifies the amount of variation or dispersion in a set of values, i.e. it quantifies scatter.
It is calculated by taking the square root of the variance, which measures how far the numbers in the set are from the mean (average). Because of the square root, the standard deviation is in the same units as the original data.
A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range.
Note: it can never be negative; it is 0 only if all the values are the same; it is affected by outliers; it uses the same units as the original data.
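A short Python sketch of the standard deviation as the square root of the (population) variance, comparing a tightly clustered set, a widely spread set and a set of identical values. The data sets and the helper `population_sd` are illustrative assumptions; the result is cross-checked against `statistics.pstdev`.

```python
import math
import statistics


def population_sd(values):
    """Square root of the population variance, in the same units as the data."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    return math.sqrt(variance)


tight = [9, 10, 10, 10, 11]     # values close to the mean of 10
spread = [0, 5, 10, 15, 20]     # same mean, values far from it
same = [7, 7, 7, 7]             # no variation at all

print(population_sd(tight))     # about 0.632 -> low SD
print(population_sd(spread))    # about 7.071 -> high SD
print(population_sd(same))      # 0.0 -> every value equals the mean

assert abs(population_sd(spread) - statistics.pstdev(spread)) < 1e-9
```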
For normally distributed data, what % of values lie within the following number of SDs above and below the mean:
- 1
- 2
- 3
1 - 68.3%
2 - 95.4%
3 - 99.7%
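These percentages hold for a normal distribution. As a sketch, they can be computed from the standard normal distribution using the identity P(−k < Z < k) = erf(k / √2), with `math.erf` from Python's standard library; the helper name `share_within` is hypothetical.

```python
import math


def share_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    # For a standard normal Z, P(-k < Z < k) = erf(k / sqrt(2))
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k) * 100:.2f}%")
# within 1 SD: 68.27%
# within 2 SD: 95.45%
# within 3 SD: 99.73%
```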