Multiple Imputation Flashcards
What does multiple imputation allow that single imputation doesn’t?
Allows the investigator to obtain valid assessments of the uncertainty due to the missing data
Basic idea of multiple imputation?
Impute each missing value several times, thus creating M>1 complete data sets
Draw the schematic for multiple imputation
see notes
Outline the three steps in multiple imputation in as much detail as possible
1)
- create M copies of incomplete data set
- use an appropriate method to impute missing values in each copy (same method for each copy)
- imputed data sets are composed of a fixed part (the observed data) and a varying part (the imputed values)
- each copy will therefore differ in its imputed values
2)
- for each complete copy of data, carry out statistical analysis as you would if no missing data
- store parameter estimates and variances (or variance-covariance matrix if more than one parameter)
- the estimate of θ obtained from the m-th completed data set is θhat(m), and its estimated variance is U(m)
3)
- the results of the M analyses are combined into a single analysis that takes the uncertainty due to imputation into account (a code sketch of all three steps follows this list)
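A minimal sketch of the three steps, assuming a small NumPy matrix with NaNs, a simple stochastic mean imputation as the (purely illustrative) imputation method, and the mean of the first column as the parameter of interest; none of these specific choices come from the notes.
```python
import numpy as np

rng = np.random.default_rng(0)

def impute_once(X, rng):
    # Step 1 (per copy): fill each missing value with a draw from
    # N(column mean, column sd) -- a deliberately simple stochastic method.
    Xc = X.copy()
    for j in range(Xc.shape[1]):
        col = Xc[:, j]                      # view into Xc, so edits stick
        miss = np.isnan(col)
        mu, sd = np.nanmean(col), np.nanstd(col)
        col[miss] = rng.normal(mu, sd, size=miss.sum())
    return Xc

def analyse(Xc):
    # Step 2: the "analysis" here is estimating the mean of the first
    # column; U is its estimated variance (squared standard error).
    y = Xc[:, 0]
    return y.mean(), y.var(ddof=1) / len(y)

# Toy incomplete data set with NaNs marking the missing values.
X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [4.0, np.nan],
              [2.0, 5.0]])

M = 5
estimates, variances = [], []
for m in range(M):                          # M completed copies, same method each time
    Xc = impute_once(X, rng)
    theta_hat, U = analyse(Xc)
    estimates.append(theta_hat)
    variances.append(U)
# Step 3 (combining the M analyses with Rubin's rules) is sketched further down.
```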
Give the combined estimate of θ
θhat(MI) = 1/M * sum θhat(m)
Give the between imputation variability
B = 1/(M-1) ( sum [ (θhat(m) - θhat(MI) ) ^2 ] )
Give the within imputation variability
Wbar = 1/M * sum [ U(m) ]
Give overall variability in multiple imputation
Vmi = Wbar + B + B/M
How do you find the (1-α)100% confidence interval for multiple imputation
θhat(MI) ± t_ν(α/2) * sqrt(Vmi), where ν is the degrees of freedom of the reference t-distribution
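A hedged sketch of the pooling formulas above, applied to the `estimates` and `variances` lists produced by the earlier sketch; the degrees-of-freedom formula ν = (M-1)(1 + Wbar/((1+1/M)B))^2 is Rubin's standard choice and is an addition not spelled out on the card.
```python
import numpy as np
from scipy import stats

def pool(estimates, variances, alpha=0.05):
    # Rubin's rules: combine the M point estimates and their variances.
    M = len(estimates)
    theta_mi = np.mean(estimates)            # combined estimate θhat(MI)
    W_bar = np.mean(variances)               # within-imputation variability
    B = np.var(estimates, ddof=1)            # between-imputation variability
    V_mi = W_bar + B + B / M                 # overall variability
    # Rubin's (1987) degrees of freedom for the reference t-distribution.
    nu = (M - 1) * (1 + W_bar / ((1 + 1 / M) * B)) ** 2
    half_width = stats.t.ppf(1 - alpha / 2, nu) * np.sqrt(V_mi)
    return theta_mi, (theta_mi - half_width, theta_mi + half_width)

theta_mi, ci = pool(estimates, variances)    # lists from the previous sketch
print(theta_mi, ci)
```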
Summarise where the variability in θhatMI comes from
- Wbar, the ordinary sampling variance, i.e. the variance we would have even with complete data
- B, extra variance caused by the missing values in the sample
- B/M, extra simulation variance caused by the fact that θhat(MI) is itself estimated from a finite number M of imputations
What are traditional choices for M?
3, 5 or 10
What is the relative efficiency of using M samples?
M / (M + λ), where λ is the fraction of missing information
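For example, with M = 5 and λ = 0.3 the relative efficiency is 5 / (5 + 0.3) ≈ 0.94, so five imputations already recover about 94% of the (variance) efficiency of infinitely many imputations.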