Statistical theory 2 Flashcards

Question 1

Q

What are we making inferences about when sampling from a population?

Answer

A

The population: we assume there is one, and we use the sample to learn about it

The population is an abstract concept: it is the whole group of people we want to know about

Question 2

Q

What are ‘estimated population parameters’?

How is this done?

Answer

A

Our best guess of the true population parameter based on our sample of data

This is done with an ‘estimator’: a rule for calculating the estimate from the sample data

Question 3

Q

What are the characteristics of a good estimator?

Answer

A

Unbiased (on average they equal the true value, so errors that are too low and too high balance out)

Consistent (given enough data, they’ll eventually get the right answer)

Low variance (tend to stay ‘pretty close’ to the true value)

Question 4

Q

How do you estimate the population mean?

Answer

A

Generate a random sample from the population and find the mean of that

This is the sample mean

The sample mean can only be assumed to be a good estimate of the population mean if the sample is random
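
A minimal Python sketch of this idea, assuming a simulated population (the population of 10,000 scores, its mean of 100 and SD of 15, and the sample size of 100 are all made up for illustration). In real work the population is never fully observed, so only the sample mean would be available.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population; known here only because we simulated it.
population = [rng.gauss(100, 15) for _ in range(10_000)]

# Simple random sample without replacement.
sample = rng.sample(population, k=100)

x_bar = statistics.fmean(sample)   # sample mean: our estimate
mu = statistics.fmean(population)  # true population mean (normally unknown)

print(f"sample mean (estimate): {x_bar:.2f}")
print(f"population mean (true): {mu:.2f}")
```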

Question 5

Q

In statistics, what are the three types of mean that we need to understand?

For each of them, describe:

What is its symbol?

What is it?

Do we know its value?

Question 6

Q

In statistics, what are the three types of standard deviation that we need to understand?

For each of them, describe:

What is its symbol?

What is it?

Do we know its value?

Question 7

Q

What is the sampling distribution of the Mean?

Answer

A

The distribution of the sample means obtained from the set of samples (all of the same size) taken from a population

If the same experiment is repeated many times, it is the distribution of the means from each of those experiments
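
The sampling distribution can be approximated by simulation. The sketch below is illustrative only: the population (normal, mean 100, SD 15), the number of experiments, and the sample size are assumptions chosen for the example.

```python
import random
import statistics

rng = random.Random(7)              # fixed seed so the run is repeatable
N_SAMPLES, SAMPLE_SIZE = 2000, 25   # hypothetical simulation settings

def draw_sample(n):
    """One 'experiment': n observations from a hypothetical population."""
    return [rng.gauss(100, 15) for _ in range(n)]

# One mean per experiment. The collection of these means approximates
# the sampling distribution of the mean.
sample_means = [statistics.fmean(draw_sample(SAMPLE_SIZE))
                for _ in range(N_SAMPLES)]

centre = statistics.fmean(sample_means)
spread = statistics.stdev(sample_means)

print(f"mean of the sample means: {centre:.2f}")
print(f"SD of the sample means:   {spread:.2f} (population SD is 15)")
```

The SD of the sample means comes out much smaller than the population SD, which is why the sampling distribution is less variable than the original distribution.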

Question 8

Q

What does SEM stand for?

Answer

A

Standard error of the Mean

Question 9

Q

Why is using the sampling distribution of the mean important?

Answer

A

It is less variable than the original distribution, so a sample mean pins down the population mean more precisely than a single observation does

Question 10

Q

As your sample size grows, the variance (i.e. uncertainty about the mean) does what?

Answer

A

Goes down
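
One way to see this is through the standard error of the mean, which is the standard deviation divided by the square root of the sample size. The sketch below uses a made-up SD of 15; `standard_error` is a helper name introduced here for illustration.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sd / math.sqrt(n)

POPULATION_SD = 15.0  # hypothetical value

for n in (1, 4, 25, 100, 400):
    print(f"n = {n:>3}: SEM = {standard_error(POPULATION_SD, n):.2f}")
```

Quadrupling the sample size halves the SEM, so the uncertainty shrinks quickly at first and more slowly as the sample gets large.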

Question 11

Q

What does central limit theorem state?

Answer

A

The sampling distribution of the mean becomes approximately normal as long as you’re averaging lots of independent things

With a large enough sample, the sampling distribution of the mean will be close to normal (and its standard deviation is given by the SEM) regardless of the shape of the population distribution, provided that distribution has a finite variance

This means we can calculate a confidence interval around the sample mean which covers the true mean 95% of the time
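
A rough simulation of the 95% claim, under stated assumptions: the data come from a strongly skewed exponential distribution with true mean 1, each sample has 100 observations, and the interval is the normal approximation (sample mean plus or minus 1.96 estimated SEMs). For small samples a t-based multiplier is usually preferred; that refinement is not shown here.

```python
import math
import random
import statistics

rng = random.Random(2024)  # fixed seed so the run is repeatable
TRUE_MEAN = 1.0            # mean of an exponential distribution with rate 1
N, REPEATS = 100, 2000     # hypothetical simulation settings
Z_95 = 1.96                # normal quantile for a 95% interval

def confidence_interval(sample):
    """Normal-approximation 95% CI: mean +/- 1.96 * estimated SEM."""
    mean = statistics.fmean(sample)
    sem = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - Z_95 * sem, mean + Z_95 * sem

covered = 0
for _ in range(REPEATS):
    sample = [rng.expovariate(1.0) for _ in range(N)]  # skewed data
    lower, upper = confidence_interval(sample)
    covered += lower <= TRUE_MEAN <= upper  # True counts as 1

coverage = covered / REPEATS
print(f"{coverage:.1%} of the intervals covered the true mean")
```

The proportion should land near 95% even though the raw data are far from normal, because it is the sample means, not the individual observations, that are approximately normally distributed.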

Question 12

Q

What is a confidence interval (CI)?

What is it typically?

Answer

A

A range of values, calculated from the sample, that we’re confident covers the true population mean

Typically a 95% interval

Question 13

Q

Which one of these phrasings is correct?

Why?

Answer

A

The bottom one.

The top one implies that the ‘true mean’ is the thing you’re making probabilistic claims about. But the true mean is a fixed value, not a repeatable event, so frequentists can’t say this.

Question 14

Q

What is the formula for confidence intervals in R?