Multiple Imputation Flashcards
What does multiple imputation allow that single imputation doesn’t?
Allows the investigator to obtain valid assessments of the uncertainty due to the missing data
Basic idea of multiple imputation?
Impute each missing value several times, thus creating M>1 complete data sets
Draw the schematic for multiple imputation
see notes
Outline the three steps in multiple imputation in as much detail as possible
1)
- create M copies of incomplete data set
- use an appropriate method to impute missing values in each copy (same method for each copy)
- imputed data sets are composed of a fixed part (the observed data) and a varying part (the imputed values)
- each copy will therefore differ in its imputed values
2)
- for each complete copy of data, carry out statistical analysis as you would if no missing data
- store parameter estimates and variances (or variance-covariance matrix if more than one parameter)
- the estimate of θ obtained from the m-th completed data set is θhat(m), and its estimated variance is U(m)
3)
- the results of the M analyses are combined into a single analysis that takes the uncertainty due to imputation into account (a code sketch of all three steps follows this list)
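A minimal sketch of the three steps, assuming a small NumPy matrix with NaNs, a simple stochastic mean imputation as the (purely illustrative) imputation method, and the mean of the first column as the parameter of interest; none of these specific choices come from the notes.
```python
import numpy as np

rng = np.random.default_rng(0)

def impute_once(X, rng):
    # Step 1 (per copy): fill each missing value with a draw from
    # N(column mean, column sd) -- a deliberately simple stochastic method.
    Xc = X.copy()
    for j in range(Xc.shape[1]):
        col = Xc[:, j]                      # view into Xc, so edits stick
        miss = np.isnan(col)
        mu, sd = np.nanmean(col), np.nanstd(col)
        col[miss] = rng.normal(mu, sd, size=miss.sum())
    return Xc

def analyse(Xc):
    # Step 2: the "analysis" here is estimating the mean of the first
    # column; U is its estimated variance (squared standard error).
    y = Xc[:, 0]
    return y.mean(), y.var(ddof=1) / len(y)

# Toy incomplete data set with NaNs marking the missing values.
X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [4.0, np.nan],
              [2.0, 5.0]])

M = 5
estimates, variances = [], []
for m in range(M):                          # M completed copies, same method each time
    Xc = impute_once(X, rng)
    theta_hat, U = analyse(Xc)
    estimates.append(theta_hat)
    variances.append(U)
# Step 3 (combining the M analyses with Rubin's rules) is sketched further down.
```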
Give the combined estimate of θ
θhat(MI) = 1/M * sum θhat(m)
Give the between imputation variability
B = 1/(M-1) ( sum [ (θhat(m) - θhat(MI) ) ^2 ] )
Give the within imputation variability
Wbar = 1/M * sum [ U(m) ]
Give overall variability in multiple imputation
Vmi = Wbar + B + B/M
How do you find the (1-α)100% confidence interval for multiple imputation
θhat(MI) ± t_ν(α/2) * sqrt(Vmi), where ν is the degrees of freedom of the reference t-distribution
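A hedged sketch of the pooling formulas above, applied to the `estimates` and `variances` lists produced by the earlier sketch; the degrees-of-freedom formula ν = (M-1)(1 + Wbar/((1+1/M)B))^2 is Rubin's standard choice and is an addition not spelled out on the card.
```python
import numpy as np
from scipy import stats

def pool(estimates, variances, alpha=0.05):
    # Rubin's rules: combine the M point estimates and their variances.
    M = len(estimates)
    theta_mi = np.mean(estimates)            # combined estimate θhat(MI)
    W_bar = np.mean(variances)               # within-imputation variability
    B = np.var(estimates, ddof=1)            # between-imputation variability
    V_mi = W_bar + B + B / M                 # overall variability
    # Rubin's (1987) degrees of freedom for the reference t-distribution.
    nu = (M - 1) * (1 + W_bar / ((1 + 1 / M) * B)) ** 2
    half_width = stats.t.ppf(1 - alpha / 2, nu) * np.sqrt(V_mi)
    return theta_mi, (theta_mi - half_width, theta_mi + half_width)

theta_mi, ci = pool(estimates, variances)    # lists from the previous sketch
print(theta_mi, ci)
```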
Summarise where the variability in θhatMI comes from
- Wbar, the ordinary sampling variance, i.e. the variance we would have even with complete data
- B, extra variance caused by the missing values in the sample
- B/M, extra simulation variance caused by the fact that θhat(MI) is itself estimated from a finite number M of imputations
What are traditional choices for M?
3, 5 or 10
What is the relative efficiency of using M samples?
M / (M + λ), where λ is the fraction of missing information
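For example, with M = 5 and λ = 0.3 the relative efficiency is 5 / (5 + 0.3) ≈ 0.94, so five imputations already recover about 94% of the (variance) efficiency of infinitely many imputations.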