measures of central tendency Flashcards
1
Q
measures of central tendency
A
- In fact, in our audit, the one paper that was exactly reproducible was reproducible because the authors shared the R code and wrote the manuscript using Quarto.
- So we could just re-run the code.
2
Q
mode
A
- most frequent value in a set of measurements
- the easiest way to spot the mode is to draw a plot and then look for the tallest bar
- a set of numbers can have more than one mode
- the mode is the only definition of typical value that works for data that is measured at the nominal/categorical level
- when it comes to truly continuous variables, such as height, the mode is often not very informative. Why?
- because each value in the dataset is probably unique
- for this reason, the mode is rarely used for continuous variables measured at the interval or ratio levels
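- a minimal sketch of these points in Python, assuming made-up data (the eye colours, scores, and heights below are illustrative, not from the course):

```python
from statistics import multimode

# Hypothetical nominal data: eye colours of seven people
eye_colours = ["blue", "green", "blue", "brown", "green", "blue", "brown"]
modes = multimode(eye_colours)  # list of the most frequent value(s)
print(modes)

# A set of numbers can have more than one mode
scores = [1, 2, 2, 3, 3, 4]
score_modes = multimode(scores)
print(score_modes)

# Continuous data: every value is unique, so every value ties for "mode"
heights_cm = [161.2, 170.5, 158.9, 182.3]
height_modes = multimode(heights_cm)
print(height_modes)
```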
3
Q
median
A
- the middle value where half the measurements are above that value and half the measurements are below
- the easiest way to work out the median is to sort our data
- if there are two mid-points (an even number of values), you find the value halfway between the two
- to be able to calculate a meaningful median, the variable must be measured on at least the ordinal level
- if we had categorical data like eye colour, then it wouldn’t make sense to ask for the median of a set of four blue eyes and three green eyes.
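- a short sketch of the sort-and-find-the-middle procedure, assuming made-up numbers and using Python's standard `statistics.median`:

```python
from statistics import median

odd_values = [7, 1, 5, 3, 9]   # five values: one clear middle value
even_values = [7, 1, 5, 3]     # four values: two mid-points

# Sorting makes the middle easy to see
print(sorted(odd_values), median(odd_values))

# With two mid-points (3 and 5), the median is halfway between them
print(sorted(even_values), median(even_values))
```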
4
Q
mean
A
- adding up all the values and then dividing this by the number of values
- this is what most people think of when we talk about the average
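- as a quick illustration with made-up values, the calculation could be written as:

```python
values = [4, 8, 6, 2]  # hypothetical measurements

# Add up all the values, then divide by the number of values
total = sum(values)
count = len(values)
mean_value = total / count
print(mean_value)
```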
5
Q
mean vs median
A
- both the mean and the median have their advantages and disadvantages
- the mean is easier to work with from a mathematical point of view
- means taken from different samples of the same population tend to be more similar to each other than medians taken from those samples
- the mean is sensitive to extreme values in a way the median is not
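- a rough sketch of both trade-offs, assuming made-up measurements and, for the simulation, an assumed normal population (mean 100, SD 15); other populations may behave differently:

```python
import random
from statistics import mean, median, stdev

# Made-up measurements, then the same set with one extreme value
typical = [20, 22, 23, 25, 30]
with_outlier = [20, 22, 23, 25, 300]

print(mean(typical), median(typical))            # mean 24, median 23
print(mean(with_outlier), median(with_outlier))  # mean 78, median 23

# Assumed population: normal, mean 100, SD 15. Draw 2,000 samples of 25.
random.seed(42)
samples = [[random.gauss(100, 15) for _ in range(25)] for _ in range(2000)]

# How much do the means, and the medians, vary from sample to sample?
spread_of_means = stdev([mean(s) for s in samples])
spread_of_medians = stdev([median(s) for s in samples])
print(spread_of_means < spread_of_medians)
```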
6
Q
sample means and population means
A
- so far we’ve just talked about describing the typical value in a set of measurements that we have - our sample
- but what we want to do with statistics is to make inferences about populations from the information that we get from samples
- if you’re interested in the average height of people in the UK, the “easy” way to find an answer to this question is to measure all the people in the UK and then work out the average height
- If you can’t, you could instead select a smaller group, or subset, of people from the UK, measure the height of people in this group, and then try to use this information to figure out plausible values for the average height of people in the UK.
- In this example, the group you’re making claims about is the population, and then the sample is a subset of this population.
7
Q
theoretical populations
A
- We often talk about populations as if they’re a set of actually existing things that we can take our sample from - e.g., all living humans.
- But populations don’t have to be sets of actually existing things. Instead, they can be the set of possible things from which our samples can be drawn.
- Let’s say we want to collect a sample of 2 dice rolls.
- To collect our sample, we take a die and roll it twice.
- We can then work out the typical value from these rolls.
- Our sample is the set of 2 dice rolls we’ve collected, but what is our population?
- One way to think of our population is as the set of possible outcomes that could occur if we rolled the die twice.
- If our population is all possible rolls of two dice then what is the mean of the population?
- We can easily draw out all the possible things that can happen if we roll a die twice.
- From this, we can count up how many times we get a total of 2, 3, 4, etc. from two dice rolls.
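- We’d find that 6 of the 36 possible sequences lead to a total of 7, more than for any other total
- A total of 7 gives a mean of 3.5 (7 ÷ 2), and averaging over all 36 equally likely sequences also gives a population mean of 3.5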
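- the counting above can be sketched in Python by listing every possible sequence of two rolls, assuming a fair six-sided die (variable names are illustrative):

```python
from collections import Counter
from itertools import product

faces = range(1, 7)  # a fair six-sided die

# Every ordered sequence of two rolls: (1, 1), (1, 2), ..., (6, 6)
outcomes = list(product(faces, repeat=2))

# How many sequences lead to each total?
totals = Counter(a + b for a, b in outcomes)
for total in sorted(totals):
    print(total, totals[total])

# Population mean: the average of the two-roll means over all sequences
population_mean = sum((a + b) / 2 for a, b in outcomes) / len(outcomes)
print(population_mean)
```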
8
Q
theoretical populations 2
A
- We can work out the population mean of two dice rolls because we know something about the data generating process.
- Our samples are just a set of instances of data generated by this process.
- Applying this idea to something like the Stroop task, we can say that:
- Our population isn’t all living humans but all humans that might have lived, might be living now, and might be living in the future.
- Our samples are just instances of data generated by the process that goes on in people’s brains when they do the Stroop task.
- For the Stroop task we can’t just work out exactly what the data generating process looks like.
- So we collect samples to try to characterise it.
9
Q
from samples to populations
A
- Our sample is a subset of the population.
- If we want to go from our sample to the population then ideally our sample mean should resemble our population mean.
- But in real-life situations we don’t know the population mean, so how would we know whether our sample mean resembles it?
10
Q
a sample of samples
A
- The sample means don’t always line up exactly with the population mean.
- Sometimes a sample mean is higher, sometimes it’s lower. Sometimes it’s closer and sometimes it’s further away.
- Because we don’t know the population mean, we’d never know whether any particular sample mean was close to it, far from it, higher, or lower.
- Even though we can’t say that a particular sample mean is close to the population mean, there is something else we can say.
- We can say how sample means will behave on average - the sample mean will on average be the same as the population mean.
11
Q
the average of the sample means
A
- We can treat each sample mean (each from a sample of 50 people) as a measurement.
- As we collect more samples, we average together the sample means.
- The average of the sample means gets closer and closer to the population mean as we collect more samples.
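- a simulation sketch of this idea, assuming a made-up population of 100,000 heights (normal, mean 170 cm, SD 10 cm); in real studies the population mean is unknown, so this only works because we invented the population ourselves:

```python
import random
from statistics import mean

random.seed(1)

# Hypothetical population of heights in cm (an assumption for illustration)
population = [random.gauss(170, 10) for _ in range(100_000)]
population_mean = mean(population)

# Collect 2,000 samples of 50 people and record each sample mean
sample_means = [mean(random.sample(population, 50)) for _ in range(2_000)]

# Individual sample means land above and below the population mean
print(min(sample_means), max(sample_means))

# But their average sits very close to the population mean
average_of_sample_means = mean(sample_means)
print(population_mean, average_of_sample_means)
```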