14. Bootstrapping Flashcards

Question 1

Q

What is bootstrapping?

Answer

A

The process of resampling with replacement from the original data to generate many resamples, each the same size (n) as the original data
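Below is a minimal Python sketch of a single resampling step. Python and the made-up data values are illustrative assumptions, not part of the original material. `random.choices` with `k=len(data)` draws n values with replacement, so each resample has the same size as the original. Some values repeat and others are left out.

```python
import random

def resample(data, rng):
    """Draw one bootstrap resample: n draws with replacement from data."""
    return rng.choices(data, k=len(data))

rng = random.Random(1)  # fixed seed so the example is reproducible
original = [4, 8, 15, 16, 23, 42]  # hypothetical data
resamples = [resample(original, rng) for _ in range(3)]
for r in resamples:
    # Same length as the original; values may repeat or be missing
    print(r)
```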

Question 2

Q

What are assumption violations?

Answer

A

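Situations where the data do not meet the assumptions of the linear model. They make it difficult to draw conclusions from linear models because they can affect the estimates (e.g. coefficients) or the inferences (standard errors, p-values, confidence intervals)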

Question 3

Q

What are two possible explanations behind assumption violations?

Answer

A

Model misspecification: the assumptions are violated because the model itself is not correct, e.g.

Failed to include an interaction
Failed to include a non-linear (higher-order) effect

Need for non-linear transformations of the outcome and/or predictors

Often related to non-normal residuals and non-linearity
Can be helped by transforming the predictors and/or the outcome
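The Python sketch below shows how a transformation can help. It assumes an outcome that grows exponentially with the predictor and simulates one with multiplicative noise; the data, seed and `r_squared` helper are all hypothetical. A straight line fits the raw outcome poorly, but it fits log(outcome) well, because the log makes the relationship linear.

```python
import math
import random
import statistics

def r_squared(xs, ys):
    """R^2 of a simple least-squares line y = b0 + b1*x (equals r squared)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

rng = random.Random(5)
# Assumed example: outcome grows exponentially with x (multiplicative noise)
xs = [i / 5 for i in range(1, 51)]  # 0.2 .. 10.0
ys = [math.exp(0.5 * x + rng.gauss(0, 0.2)) for x in xs]

r2_raw = r_squared(xs, ys)
r2_log = r_squared(xs, [math.log(y) for y in ys])
print(f"R^2, raw outcome:  {r2_raw:.3f}")
print(f"R^2, log(outcome): {r2_log:.3f}")
```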

Question 4

Q

What is the generalised linear model used for?

Answer

A

For outcomes that are not continuous or normally distributed, not because of measurement error but because, by their nature, they would not be expected to be

e.g. binary variables

Question 5

Q

What addresses the poor inferences that violated assumptions can lead to?

Answer

A

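Bootstrapped inference: it provides a more reliable basis for inference, because the standard error is estimated by resampling the data rather than from a formula that depends on the model assumptions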

Question 6

Q

What is a ‘good sample’?

Answer

A

A sample of size n drawn at random from the population: it will be unbiased and representative of the whole population

Point estimates from such samples will be good estimates of the population parameters (values that describe the entire population)

Question 7

Q

What is a sampling distribution?

Answer

A

Take a sample of size n from the population and calculate an estimate of the population parameter

Doing this repeatedly and collecting the estimates creates the sampling distribution

Mean of the sampling distribution = good approximation of the population parameter

To quantify sampling variation, use the SD of the sampling distribution (this is the standard error, SE)
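The Python simulation below builds a sampling distribution by repeated sampling. It is a sketch that assumes a normal "population" with known mean 100 and SD 15, samples of size n = 25, and 5000 repeated samples. The mean of the sample means should land near the population mean. Their SD (the SE) should land near σ/√n = 15/5 = 3.

```python
import math
import random
import statistics

rng = random.Random(42)
MU, SIGMA = 100.0, 15.0  # assumed "population" parameters for the simulation
n = 25                   # sample size
n_samples = 5000         # number of repeated samples from the population

sample_means = []
for _ in range(n_samples):
    sample = [rng.gauss(MU, SIGMA) for _ in range(n)]
    sample_means.append(statistics.mean(sample))  # estimate from this sample

mean_of_means = statistics.mean(sample_means)
se_empirical = statistics.stdev(sample_means)  # SD of the sampling distribution
print(f"mean of sampling distribution: {mean_of_means:.2f} (population mean {MU})")
print(f"SD of sampling distribution (SE): {se_empirical:.2f} "
      f"(sigma/sqrt(n) = {SIGMA / math.sqrt(n):.2f})")
```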

Question 8

Q

What are two possible solutions for getting a sense of the variability in sample estimates when collecting samples from a population? (Explain both processes.)

Answer

A

Theoretical solution

Collect one sample
Estimate the standard error using a theoretical formula (e.g. SE = s/√n for the mean)

Bootstrap solution

Collect one sample
Mimic the act of repeated sampling from the population by repeated resampling with replacement from the original sample
Estimate the standard error using the standard deviation of the distribution of resample statistics
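The two solutions can be compared in a short Python simulation. This is a sketch: the simulated sample, the seed and k = 2000 resamples are assumptions for illustration. For the mean, the formula SE is s/√n. The bootstrap SE should come out close to it.

```python
import math
import random
import statistics

rng = random.Random(7)
# One observed sample (simulated here; in practice this is your data)
sample = [rng.gauss(50, 10) for _ in range(40)]
n = len(sample)

# Theoretical solution: SE of the mean = s / sqrt(n)
se_formula = statistics.stdev(sample) / math.sqrt(n)

# Bootstrap solution: resample with replacement, recompute the mean each time,
# then take the SD of the resample means
k = 2000
boot_means = [statistics.mean(rng.choices(sample, k=n)) for _ in range(k)]
se_boot = statistics.stdev(boot_means)

print(f"formula SE:   {se_formula:.3f}")
print(f"bootstrap SE: {se_boot:.3f}")
```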

Question 9

Q

What is a bootstrap distribution?

Answer

A

The distribution of a statistic computed on each of the bootstrap resamples

Question 10

Q

How do you get a bootstrap distribution?

Answer

A

Start with an initial sample of size n.

Take k resamples (sampling with replacement) of size n, and calculate your statistic on each one.

As k→∞, the distribution of the k resample statistics becomes an increasingly close approximation of the sampling distribution.
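These steps can be sketched as a small generic Python function. The function name `bootstrap_distribution` and the example data are hypothetical. Passing a different `statistic` gives the bootstrap distribution of that statistic; the example uses the median.

```python
import random
import statistics
from typing import Callable, List, Optional, Sequence

def bootstrap_distribution(
    sample: Sequence[float],
    statistic: Callable[[Sequence[float]], float],
    k: int = 1000,
    seed: Optional[int] = None,
) -> List[float]:
    """Return k replicates of `statistic`, each computed on a resample of size n."""
    rng = random.Random(seed)
    n = len(sample)
    # Each resample: n draws with replacement from the original sample
    return [statistic(rng.choices(sample, k=n)) for _ in range(k)]

data = [2.1, 3.4, 1.9, 5.6, 4.4, 3.3, 2.8, 6.1, 3.9, 4.0]  # hypothetical sample
boot_medians = bootstrap_distribution(data, statistics.median, k=1000, seed=3)
print(len(boot_medians), min(boot_medians), max(boot_medians))
```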

Question 11

Q

What size should each bootstrap sample be?

Answer

A

The same as the original n

Question 12

Q

What is the bootstrap standard error?

Answer

A

Bootstrap SE = SD of bootstrap distribution
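Written out, the bootstrap SE is the usual sample standard deviation of the bootstrap replicates. The notation here is introduced for illustration: $\hat\theta^*_i$ is the statistic computed on the $i$-th of $k$ resamples.

```latex
\widehat{\mathrm{SE}}_{\text{boot}}
  = \sqrt{\frac{1}{k-1}\sum_{i=1}^{k}\left(\hat\theta^*_i - \bar\theta^*\right)^2},
\qquad
\bar\theta^* = \frac{1}{k}\sum_{i=1}^{k}\hat\theta^*_i
```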

Question 13

Q

What is a confidence interval?

Answer

A

Defines a plausible range of values for the population parameter

To estimate one, we need:

A confidence level
A measure of sampling variability (e.g. SE/bootstrap SE)

Question 14

Q

What does an x% confidence interval mean?

Answer

A

Across repeated samples, x% of the confidence intervals would be expected to contain the true population parameter value.

So for a 95% CI, out of 100 samples we would expect about 95 of the intervals to contain the true population mean

This is subtly different from saying that we are 95% confident that the true mean is inside our interval. The 95% probability is related to the long-run frequencies of our intervals.
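The long-run interpretation can be checked with a Python simulation. This is a sketch that assumes a standard normal population, samples of size 30, and 2000 repetitions. Each sample gets its own 95% interval (mean ± 1.96 × SE). The proportion of intervals containing the true mean should come out close to 95%. It is typically a little under, because each SE is itself estimated from a small sample.

```python
import math
import random
import statistics

rng = random.Random(2024)
MU, SIGMA, n = 0.0, 1.0, 30  # assumed population parameters and sample size
n_reps = 2000                # number of repeated samples

hits = 0
for _ in range(n_reps):
    sample = [rng.gauss(MU, SIGMA) for _ in range(n)]
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    lower, upper = m - 1.96 * se, m + 1.96 * se
    if lower <= MU <= upper:  # did this interval capture the true mean?
        hits += 1

coverage = hits / n_reps
print(f"{coverage:.1%} of intervals contained the true mean")
```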

Question 15

Q

How do you calculate confidence intervals?

Answer

A

68/95/99.7 Rule:

Sampling distributions tend towards normal (as n increases), so we can use the fixed properties of the normal distribution:

68% of the density falls within 1 SD of the mean
95% of the density falls within 1.96 SD of the mean
99.7% of the density falls within 3 SD of the mean

For 95%:

Lower bound = mean - 1.96*SE
Upper bound = mean + 1.96*SE
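The rule and the interval can be checked with Python's standard-library `statistics.NormalDist`. This is a sketch: `normal_ci` is a hypothetical helper, and the estimate 10.0 and SE 2.0 are made-up numbers. The loop prints the proportion of a normal distribution within ±1, ±1.96 and ±3 SDs. The helper then builds estimate ± z* × SE, where z* = 1.96 for 95%.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution
for n_sd in (1, 1.96, 3):
    # Proportion of the density within +/- n_sd SDs of the mean
    print(n_sd, round(z.cdf(n_sd) - z.cdf(-n_sd), 4))

def normal_ci(estimate, se, level=0.95):
    """Normal-approximation CI: estimate +/- z* x SE."""
    z_star = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return estimate - z_star * se, estimate + z_star * se

print(normal_ci(10.0, 2.0))  # roughly (6.08, 13.92)
```

The same helper works with a bootstrap SE in place of a formula SE.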

Question 16

Q

How do we compute a bootstrap distribution of any statistic (other than the mean)?

Answer

A

We can calculate β coefficients, R², F-statistics, etc.

In each case we:

Generate a resample
Run the linear model
Save the statistic of interest
Repeat this K times
Generate the distribution of the K statistics of interest
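Here is a Python sketch of these steps for a simple regression. The data are simulated, `fit_simple_ols` is a hypothetical helper written out by hand, and whole cases (rows) are resampled. This illustrates the idea only; it is not the car package's implementation. The slope and R² are saved from every resample.

```python
import random
import statistics

def fit_simple_ols(xs, ys):
    """Least-squares intercept, slope and R^2 for y = b0 + b1*x."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return b0, b1, 1 - ss_res / ss_tot

rng = random.Random(11)
# Simulated data (an assumption for illustration): y = 2 + 0.5x + noise
xs = [rng.uniform(0, 10) for _ in range(50)]
ys = [2 + 0.5 * x + rng.gauss(0, 1) for x in xs]
rows = list(zip(xs, ys))

K = 1000
boot_slopes, boot_r2 = [], []
for _ in range(K):
    resample = rng.choices(rows, k=len(rows))  # generate a resample of rows
    rx, ry = zip(*resample)
    _, b1, r2 = fit_simple_ols(rx, ry)         # run the linear model
    boot_slopes.append(b1)                     # save the statistics of interest
    boot_r2.append(r2)
# After K repeats, boot_slopes and boot_r2 are the bootstrap distributions

print(f"bootstrap SE of slope: {statistics.stdev(boot_slopes):.3f}")
```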

Question 17

Q

How is bootstrapping done in R?

Answer

A

The Boot() function from the car package

Steps:

Run the model
Load car
Run Boot() on the fitted model
View the summary of the results
Calculate confidence intervals

Question 18

Q

What does each argument in the function Boot() mean?

Answer

A

f = the function that extracts the statistic(s) of interest from the fitted model (the coefficients by default)
R = number of bootstrap resamples
ncores = number of cores to use; values above 1 run the calculations in parallel

Question 19

Q

How do you compute standard error of a bootstrap?

Answer

A

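The bootstrap SE is the SD of the bootstrap distribution (see Question 12)

For the mean, this approximates the theoretical formula: SE = σ (SD) / √n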

(19 cards)