Data Analysis week 3 Flashcards
What is the purpose of taking samples
To estimate a population value by calculating the corresponding value in a sample.
When is a sample random
If every element of the population has an equal chance of being included in the sample.
How do you assess whether an estimate is close to the population value
By precision and bias.
What does a high precision mean
High precision means the estimates of a population value differ little between samples.
What does high bias mean
High bias means the estimate is, on average, far from the population value.
What does low precision mean
Low precision means the estimates of a population value differ strongly between samples.
What does low bias mean
Low bias means the estimate is, on average, close to the population value.
In the following R call that draws a sample from a dataset, what does n stand for: s <- slice_sample(diamonds, n = 50)
The number of observations (rows) drawn from the dataset into one sample, i.e. the sample size.
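A minimal sketch of this call in context. It assumes that `slice_sample()` comes from the dplyr package and that `diamonds` is the dataset shipped with ggplot2 (the flashcard does not name the packages). The seed is arbitrary and only makes the draw reproducible.

```r
library(dplyr)    # provides slice_sample()
library(ggplot2)  # assumed source of the diamonds dataset

set.seed(1)  # arbitrary seed, only for reproducibility

# Draw one random sample of n = 50 rows (without replacement by default)
s <- slice_sample(diamonds, n = 50)

nrow(s)        # number of observations in the sample
mean(s$price)  # sample estimate of the mean price
```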
What is a sampling distribution
The distribution of the estimates obtained from a (large) number of repeated independent samples.
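The idea of repeated sampling could be sketched as follows. This is an illustration under assumptions, not a prescribed method: `diamonds` (from ggplot2) is treated as the population, the mean of `price` is the chosen estimate, and 1000 repetitions of size 50 are arbitrary choices.

```r
library(dplyr)
library(ggplot2)

set.seed(1)  # arbitrary seed, only for reproducibility

# Repeat 1000 times: draw a sample of 50 rows and compute the mean price
estimates <- replicate(1000, {
  s <- slice_sample(diamonds, n = 50)
  mean(s$price)
})

# Together, the 1000 estimates approximate the sampling distribution
summary(estimates)
mean(estimates)       # centre of the simulated sampling distribution
mean(diamonds$price)  # population value, if diamonds is the population
```

Comparing the last two numbers gives a rough impression of the bias of the estimate.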
What can you say about the sampling distribution if the sampling distribution is biased
The mean of the sampling distribution differs from the population value that is being estimated.
What is standard error and what does it explain
Standard error is the standard deviation of the sampling distribution. It summarizes the precision of an estimate in one number: it indicates how far an estimate from a single sample typically lies from the mean of the sampling distribution.
What influence does the sample size (generally) have on bias
If the sample size increases, the bias (generally) decreases.
What influence does the sample size have on precision (and on the variation between samples)
If the sample size increases, the precision increases, because the variation between samples decreases.
What influence does the sample size have on the standard error (in factors)
If the sample size increases by a factor of k^2, the standard error decreases by a factor of k. For example, a sample that is 4 times as large halves the standard error.
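This relation can be checked by simulation. The sketch below assumes the mean of `price` in `diamonds` (ggplot2) as the estimate. `simulate_se()` is a hypothetical helper written for this illustration, and the sample sizes 50 and 200 are arbitrary.

```r
library(dplyr)
library(ggplot2)

set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical helper: standard error of the mean price for sample size n,
# computed as the standard deviation of `reps` simulated estimates
simulate_se <- function(n, reps = 1000) {
  estimates <- replicate(reps, mean(slice_sample(diamonds, n = n)$price))
  sd(estimates)
}

se_50  <- simulate_se(50)   # sample size n
se_200 <- simulate_se(200)  # sample size 4n, so k^2 = 4 and k = 2

se_50 / se_200  # should be close to k = 2, up to simulation noise
```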
When is an estimate asymptotically unbiased
If the bias of the estimate gets smaller and smaller (approaches zero) as the sample size increases.
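A standard illustration of bias that shrinks with the sample size is the variance computed with divisor n instead of n - 1. The flashcards do not mention this estimator; it is used here only as an example. `var_n()` and `relative_mean()` are hypothetical helpers, and `diamonds` (ggplot2) is treated as the population.

```r
library(dplyr)
library(ggplot2)

set.seed(1)  # arbitrary seed, only for reproducibility

# Variance with divisor n (instead of n - 1): biased in small samples
var_n <- function(x) mean((x - mean(x))^2)

# Population value, treating diamonds as the population
pop_var <- var(diamonds$price)

# Mean of `reps` simulated estimates, relative to the population value
relative_mean <- function(n, reps = 2000) {
  estimates <- replicate(reps, var_n(slice_sample(diamonds, n = n)$price))
  mean(estimates) / pop_var
}

ratios <- sapply(c(5, 50, 500), relative_mean)
ratios  # expected to move towards 1 as n grows, roughly (n - 1) / n
```

A ratio below 1 means the estimate is too low on average; the closer the ratio is to 1, the smaller the bias.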
For what is bootstrapping a method
Bootstrapping is a method to estimate the sampling distribution.
What does bootstrapping result in and what does this result represent
Bootstrapping results in a bootstrap distribution, which is an estimate of the sampling distribution.
What method of sampling do you use in bootstrapping and how does this work
Sampling with replacement. You draw one sample from the population, and from this sample you repeatedly draw new samples, with replacement, each of the same size as the original sample.
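The two steps could look like this in R. It is a sketch under assumptions: `diamonds` (ggplot2) plays the role of the population, the mean of `price` is the estimate, and the sample size of 50 and the 1000 bootstrap samples are arbitrary choices.

```r
library(dplyr)
library(ggplot2)

set.seed(1)  # arbitrary seed, only for reproducibility

# Step 1: one sample from the population (here: the diamonds dataset)
s <- slice_sample(diamonds, n = 50)

# Step 2: resample the sample itself, with replacement, at the same size
boot_estimates <- replicate(1000, {
  b <- slice_sample(s, n = nrow(s), replace = TRUE)
  mean(b$price)
})

# The 1000 bootstrap estimates form the bootstrap distribution
sd(boot_estimates)  # bootstrap estimate of the standard error
```

Without `replace = TRUE`, every resample of size 50 would contain exactly the same 50 rows and give the same estimate each time.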
What is the 95% confidence interval, how can you interpret it and what does it tell us
The 95% confidence interval is the interval between the 0.025 and 0.975 quantiles (the 2.5th and 97.5th percentiles) of the bootstrap distribution. You can interpret this as: 95% of the bootstrap estimates fall within this interval, so it gives a range of plausible values for the population value. This tells us something about the precision of the estimate: the narrower the interval, the more precise the estimate.
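Given a bootstrap distribution, such an interval can be read off with base R's `quantile()`. The sketch below assumes the mean of `price` in a sample of 50 rows from `diamonds` (ggplot2) as the estimate; the object names are illustrative.

```r
library(dplyr)
library(ggplot2)

set.seed(1)  # arbitrary seed, only for reproducibility

# One sample, then 1000 bootstrap estimates of the mean price
s <- slice_sample(diamonds, n = 50)
boot_estimates <- replicate(1000, {
  mean(slice_sample(s, n = nrow(s), replace = TRUE)$price)
})

# Lower and upper bounds of the 95% interval
ci <- quantile(boot_estimates, probs = c(0.025, 0.975))
ci

# Share of bootstrap estimates inside the interval (about 0.95)
mean(boot_estimates >= ci[1] & boot_estimates <= ci[2])
```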