Bootstrapping Flashcards
1
Q
what is bootstrapping?
A
any test/metric that uses random sampling with replacement
2
Q
What is the empirical distribution function?
A
the distribution function associated with the empirical measure of a sample
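A minimal sketch of the idea, assuming the usual definition F_n(x) = (number of sample values <= x) / n. The helper name `ecdf` and the example data are illustrative only, not from any particular library.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return F_n, where F_n(x) is the fraction of sample values <= x."""
    ordered = sorted(sample)
    n = len(ordered)

    def f_n(x):
        # bisect_right gives the count of sorted values that are <= x
        return bisect_right(ordered, x) / n

    return f_n


F = ecdf([3, 1, 4, 1, 5])
print(F(0), F(1), F(3.5), F(5))  # 0.0 0.4 0.6 1.0
```

F_n is a step function that jumps by 1/n at each observed value (by k/n where a value occurs k times).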
3
Q
What is resampling?
A
any method for:
- estimating the precision of sample statistics (medians, variances, percentiles) by using subsets of the data (jackknifing) or drawing randomly with replacement (bootstrapping)
- validating models using random subsets (bootstrapping, cross-validation)
- exchanging labels on data points for significance tests (permutation tests)
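The label-exchange idea can be sketched as a two-sample permutation test on the difference in means. This is only an illustration: the function name `permutation_test`, the choice of test statistic, and the sample data are assumptions, and the p-value uses the common (count + 1) / (n_perm + 1) correction.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(a, b, n_perm=2000, seed=0):
    """Approximate two-sided p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        # Shuffling the pooled values is equivalent to exchanging group labels
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        if abs(mean(perm_a) - mean(perm_b)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


p_far = permutation_test([1, 2, 3, 4, 5, 6, 7, 8],
                         [101, 102, 103, 104, 105, 106, 107, 108])
p_same = permutation_test([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
print(p_far, p_same)
```

Well-separated groups give a small p-value, while identical groups give a p-value of 1.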
4
Q
intuition for bootstrap
A
- infer info about a population by resampling the sample data
- the sample stands in for the ‘population’, so the quality of inference made from resampled data can be measured
5
Q
what is variability?
A
aka dispersion, scatter
- is the extent to which a distribution is stretched or squeezed
- measures: variance, standard deviation, interquartile range (IQR = Q3 [75%] - Q1 [25%]), median absolute deviation
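A short sketch of these measures using Python's standard `statistics` module. The data set is made up for illustration, population (not sample) variance is an arbitrary choice here, and the exact quartile values depend on the quantile method (the module's default is used).

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

variance = statistics.pvariance(data)  # population variance
std_dev = statistics.pstdev(data)      # population standard deviation

# Quartiles: quantiles(..., n=4) returns the three cut points Q1, Q2, Q3
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Median absolute deviation: median of |x - median(data)|
med = statistics.median(data)
mad = statistics.median([abs(x - med) for x in data])

print(variance, std_dev, iqr, mad)
```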
6
Q
consistent? consistency?
A
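- a procedure is consistent if its result converges to the true value as the number of data items grows indefinitely
- the terms are restricted to cases where essentially the same procedure can be applied to any number of data items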
7
Q
statistic/sample statistic
A
- single measure of some attribute of a sample
- calculated by applying a function (statistical algorithm) to the set of data = values of the items of the sample
8
Q
what is point estimation?
A
- use of sample data to calculate a single value (a ‘statistic’) which is to serve as the best guess/best estimate of an unknown population parameter
9
Q
recommendations for bootstrap
A
- when the distribution of the statistic of interest is unknown or complex
- when the sample size is insufficient for straightforward statistical inference
- when power calculations have to be performed, and a small pilot sample is available
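- MUST be sure that the underlying distribution is NOT power-law/heavy-tailed (the naive bootstrap can be unreliable there, e.g. for the mean when the variance is infinite)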
10
Q
how to do bootstrap (simple case)?
A
- using a Monte Carlo algorithm: resample with replacement, use the same data set size as the original, calculate the statistic of interest, repeat many times to increase the estimate’s precision
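The steps above can be sketched with the standard library alone. The function name `bootstrap`, the sample values, the number of resamples, and the percentile-interval shortcut are all illustrative assumptions rather than a definitive implementation.

```python
import random
import statistics


def bootstrap(sample, statistic, n_resamples=2000, seed=0):
    """Return the statistic computed on n_resamples bootstrap resamples."""
    rng = random.Random(seed)
    n = len(sample)
    estimates = []
    for _ in range(n_resamples):
        # Draw n items with replacement: same size as the original sample
        resample = rng.choices(sample, k=n)
        estimates.append(statistic(resample))
    return estimates


sample = [12, 15, 9, 20, 14, 11, 18, 16, 13, 17]
medians = bootstrap(sample, statistics.median)

# Spread of the bootstrap estimates approximates the standard error
std_error = statistics.stdev(medians)

# Rough 95% percentile interval from the sorted estimates
ordered = sorted(medians)
lower = ordered[int(0.025 * len(ordered))]
upper = ordered[int(0.975 * len(ordered)) - 1]
print(std_error, lower, upper)
```

More resamples reduce the Monte Carlo noise in the estimate, but they do not add information beyond what the original sample contains.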
11
Q
other bootstrap types?
A
- Bayesian
- parametric
- wild
- Gaussian process regression
- smooth