Single Imputation Methods Flashcards
Describe complete case analysis (note: not an imputation method)
- excludes data for any case that has one or more missing values
- data treated as if cases with missing values simply aren’t there
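A minimal sketch of complete case analysis, assuming pandas is available and that missing values are coded as NaN. The toy data frame and the column names x1 and x2 are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; NaN marks a missing value.
df = pd.DataFrame({
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [2.0, np.nan, 6.0, 8.0, np.nan],
})

# Complete case analysis: drop every row with at least one missing value.
complete_cases = df.dropna()
n_complete = len(complete_cases)

# All estimates are then computed from the remaining rows only.
cca_means = complete_cases.mean()
```

Only three of the five rows survive here, so even the mean of x1 is computed from three cases although x1 itself has no missing values.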
When is CCA good?
Can produce unbiased estimates of regression coefficients, provided missingness depends only on a predictor variable and not on the response variable
Describe available case analysis (note: not an imputation method)
- example: an (X1, X2) data matrix where only X2 is subject to missingness
- all cases would be used for mean and variance of X1, but only complete cases contribute to estimates for X2 and covariance of X1 and X2
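The (X1, X2) example can be sketched as follows, assuming pandas and NaN-coded missing values. The data and variable names are hypothetical; the point is that each estimate uses whichever cases are available for it.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: only x2 has missing values.
df = pd.DataFrame({
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [2.0, np.nan, 6.0, 8.0, np.nan],
})

# Mean and variance of x1 use all five cases.
mean_x1 = df["x1"].mean()
var_x1 = df["x1"].var()
n_x1 = df["x1"].count()

# Estimates involving x2 use only the three cases where x2 is observed.
mean_x2 = df["x2"].mean()            # NaN values are skipped by default
cov_x1_x2 = df["x1"].cov(df["x2"])   # pairwise-complete covariance
n_x2 = df["x2"].count()
```

The two counts differ (5 versus 3), which is why it is unclear what sample size to use for standard errors.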
Problems with ACA?
- not clear which sample size should be used to calculate standard errors
- estimates can be biased if the data are not MCAR
What is the main issue with single imputation methods?
Imputed values are treated as if they were observed, so standard errors of estimates tend to be too low; hence single imputation is not recommended
Describe mean imputation
- each missing value is imputed by overall mean of observed values for that variable
- can use mode for categorical data
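A minimal sketch of mean imputation with NumPy, assuming NaN-coded missing values in a hypothetical variable x. It also compares the sample variance before and after imputation.

```python
import numpy as np

# Hypothetical variable with two missing values (NaN).
x = np.array([2.0, np.nan, 6.0, 8.0, np.nan])

observed = x[~np.isnan(x)]

# Replace every missing value with the mean of the observed values.
imputed = np.where(np.isnan(x), observed.mean(), x)

# The mean is unchanged, but the sample variance shrinks because the
# imputed values add cases without adding any spread.
var_observed = observed.var(ddof=1)
var_imputed = imputed.var(ddof=1)
```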
Why is mean imputation worse than CCA?
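Reduces the variability of the data and attenuates covariances and correlations, so estimates can be biased even when the data are MCAR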
Describe conditional mean imputation
- replaces missing value with a predicted conditional mean from a regression equation
- first step is to fit a regression model, where the variable with the missing values is regressed on the observed ones, then find predictions for missing values
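The two steps can be sketched with a simple linear regression, assuming one fully observed predictor x1 and one incomplete variable x2. The data are hypothetical, and np.polyfit stands in for whatever regression routine is actually used.

```python
import numpy as np

# Hypothetical data: x1 is complete, x2 has missing values (NaN).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.1, np.nan, 5.9, 8.2, np.nan, 11.8])

missing = np.isnan(x2)

# Step 1: regress x2 on x1 using the complete cases only.
# np.polyfit returns coefficients from highest degree down: slope, intercept.
slope, intercept = np.polyfit(x1[~missing], x2[~missing], deg=1)

# Step 2: replace each missing x2 with its predicted conditional mean.
x2_imputed = x2.copy()
x2_imputed[missing] = intercept + slope * x1[missing]
```

Every imputed value falls exactly on the fitted regression line.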
Issue with conditional mean imputation?
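There is no random error: imputed values fall exactly on the regression line, so variability is understated and correlations are overstated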
Describe stochastic regression imputation
- adds a random error term to each predicted conditional mean, drawn from N(0, σ̂^2), where σ̂^2 is the estimated residual variance of the regression model
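A minimal sketch of stochastic regression imputation, assuming a simple linear regression of a hypothetical incomplete variable x2 on a complete variable x1. The seed and the n - 2 degrees of freedom for the residual variance are choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical data: x1 is complete, x2 has missing values (NaN).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.1, np.nan, 5.9, 8.2, np.nan, 11.8])

missing = np.isnan(x2)

# Fit the regression on the complete cases.
slope, intercept = np.polyfit(x1[~missing], x2[~missing], deg=1)

# Estimate the residual standard deviation (n - 2 degrees of freedom).
resid = x2[~missing] - (intercept + slope * x1[~missing])
sigma_hat = np.sqrt(np.sum(resid ** 2) / (resid.size - 2))

# Predicted conditional mean plus a random draw from N(0, sigma_hat^2).
noise = rng.normal(0.0, sigma_hat, size=missing.sum())
x2_imputed = x2.copy()
x2_imputed[missing] = intercept + slope * x1[missing] + noise
```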
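Does stochastic regression imputation fix the issue of underestimated standard errors?
No. It restores variability in the imputed data, but the imputed values are still treated as if they were observed, so standard errors remain too low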
Describe hot deck imputation
- imputes values from “similar” individuals
- replaces the missing value with a random draw from a subsample of individuals that have similar values on a set of matching variables
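A minimal sketch of hot deck imputation, assuming a single categorical matching variable (the hypothetical column group) defines which individuals count as "similar". Real applications typically match on several variables; the helper name hot_deck is also hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical data: "group" is the matching variable, "y" has missing values.
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "y": [1.0, np.nan, 3.0, 10.0, 12.0, np.nan],
})

def hot_deck(values, rng):
    """Fill NaNs with random draws from the observed values in the same group.

    Assumes each group with missing values has at least one observed donor.
    """
    arr = values.to_numpy(dtype=float, copy=True)
    mask = np.isnan(arr)
    arr[mask] = rng.choice(arr[~mask], size=mask.sum())
    return pd.Series(arr, index=values.index)

# Apply the donor draw separately within each matching group.
df["y_imputed"] = df.groupby("group")["y"].transform(lambda s: hot_deck(s, rng))
```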
What is the issue with hot deck imputation?
Can produce substantially biased estimates of correlations and regression coefficients, and underestimates standard errors
Describe last observation carried forward
- in longitudinal studies, imputes a missing value with the same individual's most recent observed value from a previous time point
- assumes that values do not change after the last observed measurement or during the intermittent period where scores are missing
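A minimal sketch of last observation carried forward, assuming long-format longitudinal data in pandas with hypothetical columns subject, visit, and score.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data: one row per subject per visit.
df = pd.DataFrame({
    "subject": [1, 1, 1, 1, 2, 2, 2, 2],
    "visit":   [1, 2, 3, 4, 1, 2, 3, 4],
    "score":   [5.0, 6.0, np.nan, np.nan, 7.0, np.nan, 9.0, np.nan],
})

# Order by time within subject, then forward-fill within each subject so
# that values are never carried across individuals.
df = df.sort_values(["subject", "visit"])
df["score_locf"] = df.groupby("subject")["score"].ffill()
```

Subject 1 illustrates dropout (visits 3 and 4 both receive the visit 2 value), while subject 2 illustrates an intermittent gap at visit 2 as well as a missing final visit.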
Issues with last observation carried forward?
May produce biased results even when the data are MCAR