Stats - measures of central tendency and dispersion Flashcards

1
Q

What is the difference between descriptive and inferential statistics?

A

Descriptive statistics are used to describe the basic features of the data in a study. They are typically distinguished from inferential statistics, which help to form conclusions that extend beyond the immediate data. Descriptive statistics help us to simplify data.

2
Q

What are the 3 main measures of central tendency?

A

Mean
Median
Mode

3
Q

What is the median?

A

The median is the middle item in a data set that has been arranged in numerical order. It may be an actual item in the data set or it may be a value that needs to be calculated. It is largely unaffected by outliers.

It is calculated by arranging the items in order and then selecting the item that has half of the remaining items above it and half below it. For example, in the data set below, the middle item, 3, is the median value:

1, 3, 3, 4, 5

In cases where there is an even number of items, the median is halfway between the middle two items. For example, in the data set below, the median is halfway between 3 and 4, which is 3.5:

1, 3, 3, 4, 5, 6
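
The two worked examples above can be checked with a minimal sketch, assuming Python's standard `statistics` module; the variable names are illustrative only.

```python
from statistics import median

odd_data = [1, 3, 3, 4, 5]
even_data = [1, 3, 3, 4, 5, 6]

# Odd number of items: the middle item of the sorted list.
median_odd = median(odd_data)
# Even number of items: halfway between the two middle items.
median_even = median(even_data)

print(median_odd)   # 3
print(median_even)  # 3.5
```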

4
Q

What is the mode?

A

The mode is the most frequent item in a data set. It is used to summarise categorical data.

In some data sets there may be two modes or more. For example see the data set below:

1, 1, 1, 2, 2, 2, 3, 4

The modal values in this case are both 1 and 2 (the data set is bimodal; a set with more than two modes is multimodal).

Similarly, some data sets have no mode because every value appears with the same frequency, as below:

0, 1, 2, 3, 4, 5, 6

The mode is not used as much for continuous variables because with this type of variable, it is likely that no value will appear more than once (e.g. if you ask 20 people their personal income in the previous year, it’s possible that many will have amounts of income that are very close, but that you will never get exactly the same value for two people).
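
As a rough illustration of the two data sets above, the sketch below uses `statistics.multimode` (Python 3.8+), which returns every value tied for the highest frequency. When all values are equally frequent it returns all of them, which corresponds to the "no mode" case on this card.

```python
from statistics import multimode

bimodal = [1, 1, 1, 2, 2, 2, 3, 4]
all_unique = [0, 1, 2, 3, 4, 5, 6]

# Values tied for the highest count, in order of first appearance.
modes_bimodal = multimode(bimodal)
modes_unique = multimode(all_unique)

print(modes_bimodal)  # [1, 2]
print(modes_unique)   # [0, 1, 2, 3, 4, 5, 6]  (every value ties: no useful mode)
```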

5
Q

What is the mean?

A

The mean is calculated by adding all the items of a data set together and dividing by the number of items.

For example, in the following data set:

1, 2, 2, 2, 3

mean = (1 + 2 + 2 + 2 + 3) / 5
mean = 10 / 5
mean = 2

Unlike the median or the mode, the mean is sensitive to a change in any value of the data set, so it is pulled towards outliers and towards the tail of skewed data.

Note: this is the arithmetic mean (as opposed to other means such as the geometric, harmonic, and generalised means).
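
A short sketch of the sensitivity to outliers, using the card's data set plus a hypothetical variant in which the last value is replaced by an outlier (30 is an assumed value chosen for illustration):

```python
from statistics import mean, median

data = [1, 2, 2, 2, 3]
with_outlier = [1, 2, 2, 2, 30]  # hypothetical: last value replaced by an outlier

mean_original = mean(data)            # (1 + 2 + 2 + 2 + 3) / 5
mean_outlier = mean(with_outlier)     # (1 + 2 + 2 + 2 + 30) / 5
median_original = median(data)
median_outlier = median(with_outlier)

print(mean_original, mean_outlier)      # 2 7.4
print(median_original, median_outlier)  # 2 2
```

The mean moves from 2 to 7.4 while the median stays at 2.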

6
Q

What is the preferred measure of central tendency for the following measurement scale:

Categorical

A

Mode

7
Q

What is the preferred measure of central tendency for the following measurement scale:

Nominal

A

Mode

8
Q

What is the preferred measure of central tendency for the following measurement scale:

Ordinal

A

Median / mode

9
Q

What is the preferred measure of central tendency for the following measurement scale:
Interval (Normal distribution)

A

Mean (preferable), median or mode

10
Q

What is the preferred measure of central tendency for the following measurement scale:
Interval (skewed data)

A

Median

11
Q

What is the preferred measure of central tendency for the following measurement scale:
Ratio (normal distribution)

A

Mean (preferable), median or mode

12
Q

What is the preferred measure of central tendency for the following measurement scale:
Ratio (skewed)

A

Median

13
Q

What is the variance?
What units is it measured in?

A

The variance gives an indication of how much the items in the data set vary from the mean.

It is a measure of dispersion: the average of the squared distances between each data point and the mean of the data set.

It is measured in units squared (the square of the data's original units).
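
The calculation can be sketched as follows. The data set is hypothetical, and the split between a population variance (divide by n) and a sample variance (divide by n - 1) is an addition not stated on the card; Python's `statistics` module offers both as `pvariance` and `variance`.

```python
from statistics import pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements, e.g. in cm

m = sum(data) / len(data)  # mean = 5.0
squared_deviations = [(x - m) ** 2 for x in data]

# Population variance divides by n; sample variance divides by n - 1.
population_variance = sum(squared_deviations) / len(data)
sample_variance = sum(squared_deviations) / (len(data) - 1)

print(population_variance, pvariance(data))  # both equal 4 (in cm squared)
print(sample_variance, variance(data))       # both about 4.57
```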

14
Q

What is Standard Deviation?

How is it calculated, and what does it mean if this is low or high?

A

Standard deviation is a statistical measure that quantifies the amount of variation or dispersion in a set of values, i.e. it quantifies scatter.

It is calculated by taking the square root of the variance, which itself measures how far each number in the set is from the mean (or average). Its units are the same as those of the original data.

A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range.

Note: it can never be negative, it is 0 only if all values are the same, it is affected by outliers, and it uses the same units as the original data.
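
A minimal sketch of low versus high standard deviation, using three hypothetical data sets and the population form `statistics.pstdev` (the choice of the population form is an assumption; the card does not say which is meant):

```python
from math import sqrt
from statistics import pstdev, pvariance

tight = [9, 10, 10, 10, 11]  # hypothetical: values close to the mean of 10
wide = [1, 5, 10, 15, 19]    # hypothetical: same mean, far more spread
constant = [7, 7, 7, 7]      # hypothetical: all values identical

sd_tight = pstdev(tight)
sd_wide = pstdev(wide)
sd_constant = pstdev(constant)

# The standard deviation is the square root of the variance.
sd_tight_from_variance = sqrt(pvariance(tight))

print(round(sd_tight, 2), round(sd_wide, 2), sd_constant)
```

Both `tight` and `wide` have a mean of 10, yet the second has a standard deviation roughly ten times larger, and the constant set has a standard deviation of 0.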

15
Q

What % of values lie within the following number of SDs above and below the mean (for normally distributed data):
- 1
- 2
- 3

A

1 - 68.3%
2 - 95.4%
3 - 99.7%
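
These percentages hold for a normal distribution and can be reproduced with a short sketch, assuming `statistics.NormalDist` (Python 3.8+). To two decimal places the figures are about 68.27%, 95.45% and 99.73%.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of a normal distribution lying within k SDs of the mean.
coverage = {}
for k in (1, 2, 3):
    coverage[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} SD: {coverage[k]:.2%}")
```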

16
Q

What is the coefficient of variation?

A

The coefficient of variation is defined as the ratio of standard deviation to mean, often expressed as a percentage. This measure provides a relative understanding about data variability regardless of units or scale.
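
The unit independence can be illustrated with a small sketch. The heights are hypothetical, the helper name `coefficient_of_variation` is made up for this example, and the sample standard deviation is assumed.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Return the sample SD divided by the mean, as a percentage."""
    return stdev(values) / mean(values) * 100

heights_cm = [150, 160, 170, 180, 190]     # hypothetical data in centimetres
heights_m = [h / 100 for h in heights_cm]  # the same data in metres

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)

print(round(cv_cm, 2), round(cv_m, 2))
```

Both calls give a coefficient of variation of about 9.3%, even though the standard deviations themselves differ by a factor of 100.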

17
Q

What are 2 issues with variance, and hence why it can be better to use SD?

A

First, because the deviations of scores from the mean are ‘squared’ (this is done to deal with the negative values), this gives more weight to extreme scores. If our data contains outliers, this can give undue weight to these scores.

Secondly, the variance is not in the same units as the scores in our data set (variance is measured in the units squared). This means we cannot place it on our frequency distribution and cannot directly relate its value to the values in our data set.

Calculating the standard deviation rather than the variance returns the measure to the original units, which rectifies the second problem.

18
Q

What is the standard error of the mean (SEM)?

How is it calculated?

A

The standard error of the mean is an inferential statistic that indicates how precisely the sample mean estimates the TRUE mean of the population.

It is always smaller than the SD when the sample contains more than one observation (n > 1).

It is a measure of the spread expected for the mean of the observations, i.e. how far the calculated sample mean is likely to be from the true population mean.

SEM = s / square root (n)

s = standard deviation of the sample
n = sample size
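
The formula can be applied directly, as in this sketch with a hypothetical sample of eight values:

```python
from math import sqrt
from statistics import stdev

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

s = stdev(sample)  # sample standard deviation
n = len(sample)    # sample size
sem = s / sqrt(n)  # SEM = s / square root (n)

print(round(s, 3), round(sem, 3))  # 2.138 0.756
```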

19
Q

What is the confidence interval?

A

Sample results, such as the mean value, are typically accompanied by a confidence interval. This interval indicates the precision of the sample estimate and suggests a range in which the true population mean is likely to be found with a certain probability.

The confidence level attached to the interval is expressed as a % (e.g. 95%).

Note:
It’s crucial to understand that this does not mean that the mean from all possible samples will be within this interval 95% of the time. Instead, it means that if numerous samples were taken and a 95% confidence interval was calculated for each, about 95% of those intervals would encompass the true population mean.

20
Q

How is the confidence interval calculated?

A

Mean +/- (Critical Value x SEM)

Note: the critical value for a 95% confidence interval is 1.96 when the sample size is large.
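
A minimal sketch of this calculation, assuming a large sample so that the critical value can be taken from the standard normal distribution. The function name `confidence_interval` and the summary statistics are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(mean, sd, n, level=0.95):
    """Return (lower, upper) for mean +/- critical value x SEM."""
    sem = sd / sqrt(n)
    # Two-sided critical value from the standard normal distribution,
    # about 1.96 for a 95% interval.
    critical = NormalDist().inv_cdf(0.5 + level / 2)
    margin = critical * sem
    return mean - margin, mean + margin

# Hypothetical summary statistics: mean 50, SD 10, n = 100, so SEM = 1.
lower, upper = confidence_interval(mean=50, sd=10, n=100)
print(round(lower, 2), round(upper, 2))  # 48.04 51.96
```

For small samples a critical value from the t distribution would usually be used instead, which this sketch does not cover.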

21
Q

Standard Deviation, or Standard Error of the Mean?

1) Quantifies scatter (how much the data varies)
2) Quantifies how precisely you know the mean
3) Gets smaller as samples get larger
4) Uses the same units as the data
5) Can’t be larger than the other
6) Takes into account the other one, plus the sample size

A

1) SD
2) SEM
3) SEM
4) Both
5) SEM can’t be larger than SD
6) SEM takes into account SD
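
Points 3, 5 and 6 can be seen in a short sketch that holds a hypothetical standard deviation fixed and varies only the sample size:

```python
from math import sqrt

s = 10.0  # hypothetical sample standard deviation, held fixed

# SEM = s / sqrt(n): it uses the SD plus the sample size,
# and shrinks as n grows.
sem_by_n = {n: s / sqrt(n) for n in (4, 25, 100, 400)}

for n, sem in sem_by_n.items():
    print(n, sem)
```

The SEM falls from 5.0 at n = 4 to 0.5 at n = 400, while the SD stays at 10.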

22
Q

What is the interquartile range and what does the middle of this represent?

A

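The interquartile range (also called the midspread) is equal to the difference between the 3rd and 1st quartiles, so it spans the middle 50% of the data.

The MEDIAN (the 2nd quartile) lies in the middle of this range, with a quarter of the data between it and each of the other two quartiles.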
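
A rough sketch using `statistics.quantiles` (Python 3.8+) on a hypothetical, right-skewed data set. Quartile conventions differ between textbooks and software, so other methods may give slightly different values; this uses the function's default "exclusive" method.

```python
from statistics import median, quantiles

data = [1, 2, 2, 3, 4, 8, 20]  # hypothetical, right-skewed data

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

print(q1, q2, q3)  # 2.0 3.0 8.0
print(iqr)         # 6.0
```

The 2nd quartile equals the median (3) and falls inside the range from Q1 to Q3, although with skewed data it is not the arithmetic midpoint of the two quartiles.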
