Week 4 - Bootstrap Flashcards
1
Q
General format for CIs
A
estimate ± quantile × se(estimate)
If we know the quantiles, we can calculate a CI
The quantiles come from the CLT or from a known sampling distribution
Bootstrap is an alternative method when we can’t use the CLT
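A minimal sketch of the general format, assuming the estimate is a sample mean and the CLT applies, so the quantile comes from the standard normal. The function name `normal_ci` and the data values are made up for illustration.

```python
import math
import statistics

def normal_ci(data, level=0.95):
    """CLT-based CI for the mean: estimate ± quantile × se(estimate)."""
    n = len(data)
    estimate = statistics.mean(data)
    # Standard error of the mean: sample sd / sqrt(n)
    se = statistics.stdev(data) / math.sqrt(n)
    # Standard normal quantile, about 1.96 for a 95% interval
    quantile = statistics.NormalDist().inv_cdf(1 - (1 - level) / 2)
    return estimate - quantile * se, estimate + quantile * se

sample = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.5, 4.9]  # illustrative values only
lo, hi = normal_ci(sample)
print(f"95% CI for the mean: ({lo:.2f}, {hi:.2f})")
```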
2
Q
Bootstrap
A
- Create new datasets (bootstrap samples) by randomly picking data points from the original dataset, allowing the same point to be picked more than once (with replacement)
- Each new dataset will have the same size as the original one
- Compute the estimate on each bootstrap sample and repeat this process many times to get an approximate sampling distribution
- We can use this bootstrapping to understand the uncertainty of an estimate and calculate a CI
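The steps above can be sketched as follows. This uses the percentile method to turn the bootstrap distribution into a CI, which is one common choice and an assumption here, since the card does not say which bootstrap interval is meant. The name `bootstrap_ci`, the number of resamples, and the data are illustrative.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for an arbitrary statistic."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement, same size as the original, many times
    boot_stats = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    alpha = 1 - level
    # Read the CI off the empirical quantiles of the bootstrap statistics
    lower = boot_stats[int((alpha / 2) * n_boot)]
    upper = boot_stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

sample = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.5, 4.9]  # illustrative values only
lower, upper = bootstrap_ci(sample)
print(f"95% bootstrap CI for the mean: ({lower:.2f}, {upper:.2f})")
```

Passing a different `stat` (for example `statistics.median`) reuses the same procedure for another estimator.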
3
Q
If the bootstrap sample size is not the same as the original
A
Leads to unreliable and biased estimates of uncertainty
- smaller bootstrap samples -> larger variability (CI too wide)
- larger bootstrap samples -> smaller variability (CI too tight, overstates accuracy)
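A small sketch of this effect, assuming the estimate is the mean. It compares the spread of bootstrap means when each resample is smaller than, equal to, or larger than the original dataset. The helper `boot_sd` and the generated data are hypothetical.

```python
import random
import statistics

def boot_sd(data, k, n_boot=2000, seed=0):
    """Standard deviation of bootstrap means when each resample has size k."""
    rng = random.Random(seed)
    means = [statistics.mean(rng.choices(data, k=k)) for _ in range(n_boot)]
    return statistics.stdev(means)

rng = random.Random(1)
data = [rng.gauss(0, 1) for _ in range(40)]  # illustrative dataset, n = 40

sd_small = boot_sd(data, k=10)    # resamples smaller than the original
sd_same = boot_sd(data, k=40)     # resamples the same size as the original
sd_large = boot_sd(data, k=160)   # resamples larger than the original

print(f"k=10: {sd_small:.3f}  k=40: {sd_same:.3f}  k=160: {sd_large:.3f}")
```

Only the `k=40` spread reflects the uncertainty of an estimate based on 40 observations; the other two would make the CI too wide or too tight.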
4
Q
Why use bootstrap?
A
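Bootstrap works directly from the observed data and assumes no particular distribution, so it is more versatile when the data are non-normal
- When we have a smaller dataset (too small to rely on the CLT)
- Non-normal distribution
- Can be applied to any point estimator
- Can capture skewness in the sampling distribution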
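A sketch of the "any point estimator" point, assuming skewed data drawn from an exponential distribution purely for illustration. The same resampling loop gives a standard error for the median, which has no simple textbook formula. The helper `bootstrap_se` is a hypothetical name.

```python
import random
import statistics

def bootstrap_se(data, stat, n_boot=2000, seed=0):
    """Bootstrap standard error: sd of the statistic across resamples."""
    rng = random.Random(seed)
    n = len(data)
    reps = [stat(rng.choices(data, k=n)) for _ in range(n_boot)]
    return statistics.stdev(reps)

rng = random.Random(42)
skewed = [rng.expovariate(1.0) for _ in range(30)]  # illustrative right-skewed data

se_median = bootstrap_se(skewed, statistics.median)
se_mean = bootstrap_se(skewed, statistics.mean)
print(f"se(median) = {se_median:.3f}, se(mean) = {se_mean:.3f}")
```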
5
Q
Difference between simulation and bootstrap
A
Simulation generates data from an assumed or known model (e.g., normal distribution, Poisson process)
Bootstrap resamples the original dataset, treating it as the only “population” available
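A side-by-side sketch of the two ways of generating a new dataset. The normal model and its parameters (mean 2.5, sd 0.7) are assumptions chosen for illustration, as are the observed values.

```python
import random

rng = random.Random(0)
observed = [2.3, 1.9, 3.1, 2.8, 2.2, 4.0]  # illustrative values only

# Simulation: draw fresh values from an assumed model (here a normal distribution)
simulated = [rng.gauss(2.5, 0.7) for _ in range(len(observed))]

# Bootstrap: draw with replacement from the observed data itself
resampled = rng.choices(observed, k=len(observed))

print("simulated:", [round(x, 2) for x in simulated])
print("resampled:", resampled)
```

Every bootstrap value is one of the observed points, while simulated values can fall anywhere the assumed model allows.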