Statistical theory 2 Flashcards
What are we making inferences about when sampling from a population?
We assume there is a population
The population is an abstract concept: it is the full group of people we want to know about
What are ‘estimated population parameters’?
How is this done?
Our best guess of the true population parameter based on our sample of data
This is done with an ‘estimator’
What are the characteristics of a good estimator?
Unbiased (on average they equal the true value, rather than being systematically too low or too high)
Consistent (given enough data, they’ll eventually get the right answer)
Low variance (tend to stay ‘pretty close’ to the true value)
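A minimal sketch of these three properties, written in Python for illustration (the cards themselves only refer to R). The population mean and standard deviation, the sample sizes and the seed are all made-up assumptions. It compares two estimators of the population mean: the sample mean, and the deliberately poor choice of using only the first observation.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

TRUE_MEAN = 100.0  # assumed population mean (illustrative only)
TRUE_SD = 15.0     # assumed population standard deviation (illustrative only)
N = 25             # observations per simulated sample
REPS = 5000        # number of simulated samples

mean_estimates = []   # estimator A: the sample mean
first_estimates = []  # estimator B: the first observation only

for _ in range(REPS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    mean_estimates.append(statistics.fmean(sample))
    first_estimates.append(sample[0])

# Unbiased: averaged over many samples, both estimators land near the true mean.
avg_a = statistics.fmean(mean_estimates)
avg_b = statistics.fmean(first_estimates)

# Low variance: the sample mean stays much closer to the true value.
sd_a = statistics.stdev(mean_estimates)
sd_b = statistics.stdev(first_estimates)

# Consistent: with a very large sample, the sample mean closes in on the truth.
big_sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(20000)]
big_estimate = statistics.fmean(big_sample)

print(f"average estimate  A: {avg_a:.2f}  B: {avg_b:.2f}")
print(f"spread of estimate A: {sd_a:.2f}  B: {sd_b:.2f}")
print(f"sample mean with 20000 observations: {big_estimate:.2f}")
```

Both estimators are unbiased here, but only the sample mean has low variance, which is why unbiasedness alone does not make an estimator good.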
How do you estimate the population mean?
Generate a random sample from the population and find the mean of that
This is the sample mean
The sample mean can only be assumed to estimate the population mean if the sample is random
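To see why randomness matters, here is a small Python sketch. The population of heights is invented for illustration, and "measure only the tallest people" is just one hypothetical way a sample can fail to be random.

```python
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 heights in cm (made-up numbers).
population = [random.gauss(170.0, 10.0) for _ in range(10_000)]
population_mean = statistics.fmean(population)

# Random sample: every member of the population is equally likely to be picked.
random_sample = random.sample(population, 100)
random_estimate = statistics.fmean(random_sample)

# Non-random sample: only the 100 tallest people are measured.
biased_sample = sorted(population)[-100:]
biased_estimate = statistics.fmean(biased_sample)

print(f"population mean:        {population_mean:.1f}")
print(f"random sample mean:     {random_estimate:.1f}")
print(f"non-random sample mean: {biased_estimate:.1f}")
```

The random sample mean should land close to the population mean, while the non-random one overshoots badly even though both use 100 observations.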
In statistics, what are the three types of mean that we need to understand?
For each of them, describe:
What are their symbols?
What are they?
Do we know its value?
In statistics, what are the three types of standard deviation that we need to understand?
For each of them, describe:
What are their symbols?
What are they?
Do we know its value?
What is the sampling distribution of the Mean?
The distribution of the sample means across the set of samples taken from a population
If the same experiment were repeated many times, it is the distribution of the means from all those experiments
What does SEM stand for?
Standard error of the Mean
Why is using the sampling distribution of the mean important?
The sample mean is less variable than individual observations, so the sampling distribution of the mean is narrower than the original distribution
As your sample size grows, the variance (i.e. uncertainty about the mean) does what?
Goes down
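The standard error of the mean is the population standard deviation divided by the square root of the sample size, SEM = σ / √N. The Python sketch below checks this against a simulation. The population values (mean 100, σ = 15) and the sample sizes are assumptions chosen for illustration.

```python
import math
import random
import statistics

random.seed(3)  # fixed seed so the run is repeatable

SIGMA = 15.0  # assumed population standard deviation (illustrative only)
REPS = 2000   # simulated samples per sample size


def simulated_sd_of_means(n, reps=REPS):
    """Draw `reps` samples of size n and return the sd of their means."""
    means = []
    for _ in range(reps):
        sample = [random.gauss(100.0, SIGMA) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)


results = {}  # n -> (theoretical SEM, simulated sd of sample means)
for n in (4, 16, 64):
    theory = SIGMA / math.sqrt(n)
    results[n] = (theory, simulated_sd_of_means(n))
    print(f"N={n:3d}  SEM (theory)={theory:.2f}  simulated={results[n][1]:.2f}")
```

Each fourfold increase in sample size halves the SEM, so uncertainty about the mean goes down, but with diminishing returns.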
What does central limit theorem state?
The sampling distribution of the mean becomes approximately normal as long as you’re averaging lots of independent things
With a large enough sample, the sampling distribution of the mean is approximately normal (and its standard deviation is given by the SEM) regardless of the shape of the true distribution
This means we can calculate a confidence interval around the sample mean which covers the true mean 95% of the time
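A rough Python illustration of the theorem, under assumed settings: the raw data come from a strongly right-skewed exponential distribution with mean 1 and standard deviation 1, and each sample has 50 observations. The `skewness` helper is written here for the example and is not a library function.

```python
import math
import random
import statistics

random.seed(4)  # fixed seed so the run is repeatable

N = 50       # observations per sample (assumed)
REPS = 4000  # number of simulated samples (assumed)
TRUE_MEAN = 1.0            # mean of an exponential distribution with rate 1
SEM = 1.0 / math.sqrt(N)   # its standard deviation is also 1, so SEM = 1 / sqrt(N)


def skewness(values):
    """Sample skewness: near 0 for a symmetric shape, positive for a long right tail."""
    m = statistics.fmean(values)
    var = statistics.fmean([(v - m) ** 2 for v in values])
    third = statistics.fmean([(v - m) ** 3 for v in values])
    return third / var ** 1.5


raw = []           # every individual observation
sample_means = []  # one mean per simulated sample
for _ in range(REPS):
    sample = [random.expovariate(1.0) for _ in range(N)]
    raw.extend(sample)
    sample_means.append(statistics.fmean(sample))

raw_skew = skewness(raw)
means_skew = skewness(sample_means)

# For a normal distribution, about 95% of values fall within 1.96 sd of the centre.
within = sum(abs(m - TRUE_MEAN) <= 1.96 * SEM for m in sample_means) / REPS

print(f"skewness of raw data:     {raw_skew:.2f}")
print(f"skewness of sample means: {means_skew:.2f}")
print(f"share of sample means within 1.96 SEM of the true mean: {within:.3f}")
```

The raw data are far from normal, yet the sample means are close to symmetric and roughly 95% of them fall within 1.96 SEM of the true mean.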
What is a confidence interval (CI)?
What is it typically?
A range that we’re confident covers the true population mean
Typically a 95% CI
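As a sketch of the arithmetic, a 95% interval can be approximated as the sample mean plus or minus 1.96 standard errors. The Python helper below is hypothetical (its name and the data are made up) and uses the normal approximation. For a sample this small, a t-based interval would be somewhat wider.

```python
import math
import statistics


def mean_confidence_interval(data, level=0.95):
    """Normal-approximation confidence interval for the mean (hypothetical helper)."""
    n = len(data)
    mean = statistics.fmean(data)
    sem = statistics.stdev(data) / math.sqrt(n)  # estimated standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return mean - z * sem, mean + z * sem


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data with a sample mean of 5
lower, upper = mean_confidence_interval(scores)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

For these made-up scores the interval runs from roughly 3.5 to 6.5, centred on the sample mean of 5.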
Which one of these phrasings is correct?
Why?
The bottom one.
The top one implies that the ‘true mean’ is the thing you’re making probabilistic claims about. But the true mean is a fixed value, not a repeatable event, so frequentists can’t assign it a probability.
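The repeatable event is the experiment itself: the true mean stays fixed while the interval changes from sample to sample. A Python sketch of that reading, with an assumed population (mean 50, sd 10), 100 observations per experiment and a normal-approximation interval:

```python
import math
import random
import statistics

random.seed(5)  # fixed seed so the run is repeatable

TRUE_MEAN = 50.0  # fixed, and normally unknown; assumed here for the simulation
TRUE_SD = 10.0    # assumed population standard deviation
N = 100           # observations per simulated experiment
REPS = 2000       # number of simulated experiments
Z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96

covered = 0
for _ in range(REPS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    mean = statistics.fmean(sample)
    sem = statistics.stdev(sample) / math.sqrt(N)
    # The interval moves from experiment to experiment; the true mean does not.
    if mean - Z * sem <= TRUE_MEAN <= mean + Z * sem:
        covered += 1

coverage = covered / REPS
print(f"share of intervals that contain the true mean: {coverage:.3f}")
```

The share should come out close to 0.95. That is a statement about the procedure across repeated experiments, not about the probability that one particular interval contains the true mean.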
What is the function for calculating a confidence interval for the mean in R?
ciMean() (from the lsr package)