Week 12 - Missing Data Analysis Flashcards
Listwise deletion
Dropping participants from the analysis who don’t have complete scores on all the variables in the model
Need nonmissing scores on all variables
It reduces the sample size, which makes it harder to find a significant effect
Lowers power, wastes data and can exacerbate bias (especially when missingness is nonignorable)
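A minimal sketch of listwise deletion in plain Python. The participant records and variable names (age, stress, sleep) are made up for illustration; None marks a missing score.

```python
# Hypothetical scores; None marks a missing value.
rows = [
    {"id": 1, "age": 23, "stress": 4.0, "sleep": 7.5},
    {"id": 2, "age": 31, "stress": None, "sleep": 6.0},
    {"id": 3, "age": 45, "stress": 2.5, "sleep": None},
    {"id": 4, "age": 52, "stress": 3.0, "sleep": 8.0},
    {"id": 5, "age": None, "stress": 5.0, "sleep": 5.5},
]


def listwise_delete(rows, variables):
    """Keep only cases with nonmissing scores on every model variable."""
    return [r for r in rows if all(r[v] is not None for v in variables)]


complete = listwise_delete(rows, ["age", "stress", "sleep"])
print(len(rows), "cases before,", len(complete), "after")  # 5 cases before, 2 after
```

Each of the three dropped cases is missing only one score, yet all of their other scores are thrown away too, which is where the wasted data and lost power come from.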
Balanced vs unbalanced data
Balanced design - same number of observations in each cell of the analysis
- make computation easier
Why are scores missing initially?
Participant factor - mortality, attrition (in longitudinal designs)
Experimenter factor - clerical error, malfunction
Balance between not coercing people into giving answers and making it too easy to skip questions
What are the old approaches to missing data?
Listwise deletion, pairwise deletion, mean substitution (mean imputation), regression imputation, last value carried forward
Pairwise deletion
Only available for correlation and factor analysis
Use all cases available for each pair of variables
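A rough sketch of pairwise deletion for correlations, assuming three made-up variables x, y and z with None for missing scores. Each correlation uses whatever cases are complete on that particular pair, so the n differs from one correlation to the next.

```python
from math import sqrt
from statistics import mean

# Hypothetical data; None marks a missing score.
data = {
    "x": [1.0, 2.0, None, 4.0, 5.0, 6.0],
    "y": [2.0, None, 3.0, 5.0, 4.0, 7.0],
    "z": [None, 1.0, 2.0, 2.0, None, 4.0],
}


def pearson(xs, ys):
    """Pearson correlation for two equal-length sequences of numbers."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)


def pairwise_corr(data, a, b):
    """Correlate a and b using every case that is complete on this pair."""
    pairs = [(u, v) for u, v in zip(data[a], data[b])
             if u is not None and v is not None]
    xs, ys = zip(*pairs)
    return pearson(xs, ys), len(pairs)


for a, b in [("x", "y"), ("x", "z"), ("y", "z")]:
    r, n = pairwise_corr(data, a, b)
    print(f"r({a},{b}) = {r:.2f} based on n = {n}")
```

Because each entry of the correlation matrix rests on a different subsample, the matrix as a whole does not describe any single group of participants.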
Regression Imputation
Replace each missing score with the predicted score from a regression based on all available cases
Standard error too small (imputed scores fall exactly on the regression line, so they carry no error variance)
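A small simulation sketch of why regression imputation makes the standard error too small. The data are simulated and the 30% missingness rate and all names are assumptions for illustration. Imputed scores sit exactly on the regression line, so the slope does not change but its SE shrinks as if real data had been added.

```python
import random
from math import sqrt


def ols(xs, ys):
    """Simple regression of y on x: returns intercept, slope, SE of slope."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se_b = sqrt(sse / (n - 2) / sxx)
    return a, b, se_b


random.seed(1)
x = [random.gauss(50, 10) for _ in range(200)]
y = [0.5 * xi + random.gauss(0, 5) for xi in x]
missing = [random.random() < 0.3 for _ in x]  # which y scores are missing

# Fit the imputation regression on the cases with observed y.
obs_x = [xi for xi, m in zip(x, missing) if not m]
obs_y = [yi for yi, m in zip(y, missing) if not m]
a, b, se_obs = ols(obs_x, obs_y)

# Replace each missing y with its predicted score (no error added).
y_imp = [a + b * xi if m else yi for xi, yi, m in zip(x, y, missing)]
a2, b2, se_imp = ols(x, y_imp)

print(f"slope: observed cases {b:.3f}, after imputation {b2:.3f}")
print(f"SE:    observed cases {se_obs:.4f}, after imputation {se_imp:.4f}")
```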
Last value carried forward
No longer considered valid
Approach used in longitudinal designs
With attrition (drop out), a data point is lost (e.g. if a participant drops out at the third wave, their 2nd wave score replaces the missing third wave score)
Used in intention-to-treat analysis
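A minimal sketch of last value carried forward for one participant's scores across waves. The function name locf and the scores are hypothetical; None marks a wave with no score.

```python
def locf(waves):
    """Fill each missing wave with the most recent observed score."""
    filled, last = [], None
    for score in waves:
        if score is not None:
            last = score
        filled.append(last)  # stays None if nothing has been observed yet
    return filled


# Participant drops out after wave 2 of a four-wave study.
print(locf([12, 15, None, None]))  # [12, 15, 15, 15]
```

The filled-in waves assume the participant did not change at all after dropping out, which is one reason the method is no longer recommended.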
Problems with old approaches
Underestimating error variance
SE too small
CI too narrow
Type 1 error too high
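A small sketch of the problem using mean substitution, with made-up scores. Filling in the mean adds cases without adding any spread, so the SD and the SE both shrink even though no new information was collected.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical scores; None marks a missing value.
scores = [4.0, None, 6.0, 9.0, None, 5.0, None, 11.0]

observed = [s for s in scores if s is not None]
m = mean(observed)

# Mean substitution: every missing score becomes the observed mean.
filled = [m if s is None else s for s in scores]

se_observed = stdev(observed) / sqrt(len(observed))
se_filled = stdev(filled) / sqrt(len(filled))

print("mean unchanged:", mean(observed), mean(filled))
print(f"SE from observed scores only: {se_observed:.3f}")
print(f"SE after mean substitution:   {se_filled:.3f}")
```

The smaller SE gives a narrower CI and a larger test statistic, which is how the Type 1 error rate ends up too high.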
Previous approach to missingness
Lessen the impact of missingness (nuisance factor)
Rubin and Little approach to missingness
Estimate missingness statistically
Mechanism of missingness is important
Types of missingness
Ignorable and nonignorable
Ignorable - fewer constraints on the type of analysis and reduced bias, but still problems with power (precision)
Nonignorable - listwise deletion will lead to problems with both bias and precision
Three Types of missing data
MCAR - Missing completely at random
MAR - Missing at random
MNAR - Missing not at random
MCAR
Ignorable
Probability of it being missing on a given variable is not conditional on itself or on other variables in the data set
Cause of missingness completely outside of data
MAR
Ignorable
Probability of being missing on given variable not conditional on itself but IS conditional on other variables in the data set
e.g. older people are less likely to respond to a question on sex (missingness depends on age, not on the answer itself)
MNAR
Non-ignorable
Probability of being missing on a given variable is conditional on the variable itself: missingness is predicted by what the answer would have been
e.g. embarrassed to answer a question because of what the answer would have been (often affects the outcome variable)
Can lead to big bias
Problematic when trying to estimate population prevalence of behaviour or state (people who are too sick, too drunk are missing from the analysis because of what they would have answered)
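A simulation sketch of the three mechanisms, assuming a made-up predictor x and outcome y and arbitrary missingness probabilities. It compares the mean of y among the cases that remain observed with the mean of the full sample.

```python
import random
from statistics import mean

random.seed(7)
n = 5000
x = [random.gauss(0, 1) for _ in range(n)]
y = [50 + 5 * xi + random.gauss(0, 5) for xi in x]


def observed_mean(p_missing):
    """Mean of y among cases that stay observed, given each case's P(missing)."""
    kept = [yi for yi, p in zip(y, p_missing) if random.random() >= p]
    return mean(kept)


full = mean(y)
# MCAR: same probability for everyone, unrelated to x or y.
mcar = observed_mean([0.3] * n)
# MAR: probability depends on the other variable x, not on y itself.
mar = observed_mean([0.8 if xi > 0 else 0.1 for xi in x])
# MNAR: probability depends on the value of y that would have been recorded.
mnar = observed_mean([0.8 if yi > 50 else 0.1 for yi in y])

print(f"full sample mean of y:   {full:.2f}")
print(f"observed mean under MCAR: {mcar:.2f}")
print(f"observed mean under MAR:  {mar:.2f}")
print(f"observed mean under MNAR: {mnar:.2f}")
```

Under MCAR the remaining cases still look like the full sample. Under MAR the remaining cases are off too, but the cause (x) is in the data set, so methods that use x can correct for it. Under MNAR the cause is the missing score itself, so nothing in the data set can account for it.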
Approaches to Missing Data
Listwise deletion (MCAR), multiple imputation, direct maximum likelihood (modelling approach)
Listwise deletion for MCAR
Can be used under MCAR, but if a high proportion is missing it will lead to low power
Bias unacceptable for MAR and MNAR
- Will under- or overestimate the regression weights
Direct Maximum Likelihood
Uses all available data; the handling of missingness is built into the estimation procedure
Statistical way to deal with missingness
Used in SEM and mixed effects regression
Can be used when data are MAR
Multiple Imputation
Fill in missing data with values that include extra random variance
Overcomes the problem with substituting the sample mean or a predicted score: imputed scores with too little variance, which underestimate error variance and inflate Type 1 error
Can use for MAR
Steps to Multiple Imputation
1) Do regression imputation where missing scores are replaced with predicted scores from the regression on all available cases (Imputation model)
2) Add random error to each imputed score
3) Repeat the process m separate times to create m imputed data sets
4) Run the desired stats on each of the m data sets
5) Take the model parameters of interest (M and b-weight), average them across the m data sets, and use the pooled SE (which combines within- and between-imputation variance) to calculate the statistical test
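The five steps can be sketched in plain Python as follows. This is a simplified illustration on simulated data, not a full implementation: the data, the MAR mechanism and m = 20 are assumptions, and dedicated software also redraws the regression coefficients for each imputation rather than reusing one fitted line. The pooling in step 5 follows Rubin's rules, where total variance = average within-imputation variance + (1 + 1/m) x between-imputation variance.

```python
import random
from math import sqrt
from statistics import mean, variance


def ols(xs, ys):
    """Regression of y on x: intercept, slope, residual SD, SE of slope."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    resid_sd = sqrt(sse / (n - 2))
    return a, b, resid_sd, resid_sd / sqrt(sxx)


random.seed(12)
x = [random.gauss(0, 1) for _ in range(300)]
y = [2.0 + 0.5 * xi + random.gauss(0, 1) for xi in x]
# Hypothetical MAR mechanism: y is missing more often when x is high.
y_obs = [None if random.random() < (0.5 if xi > 0 else 0.1) else yi
         for xi, yi in zip(x, y)]

# Step 1: imputation model fitted on the cases with observed y.
complete = [(xi, yi) for xi, yi in zip(x, y_obs) if yi is not None]
a, b, resid_sd, _ = ols(*zip(*complete))

m = 20
estimates, variances = [], []
for _ in range(m):  # Step 3: create m imputed data sets
    # Step 2: predicted score plus random error for each missing y.
    y_imp = [a + b * xi + random.gauss(0, resid_sd) if yi is None else yi
             for xi, yi in zip(x, y_obs)]
    # Step 4: run the analysis of interest on this data set.
    _, b_k, _, se_k = ols(x, y_imp)
    estimates.append(b_k)
    variances.append(se_k ** 2)

# Step 5: pool with Rubin's rules.
pooled_b = mean(estimates)
within = mean(variances)
between = variance(estimates)
pooled_se = sqrt(within + (1 + 1 / m) * between)

print(f"pooled b = {pooled_b:.3f}, pooled SE = {pooled_se:.3f}")
```

The between-imputation term is what regression imputation alone leaves out: it makes the pooled SE larger to reflect uncertainty about the scores that were never observed.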
Can you tell which type of missingness is present?
Can only test for MCAR
- worth testing if planning to use listwise deletion
No way of testing for MAR or MNAR from the data
- Have to use logical and theoretical understanding
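One informal way to look for evidence against MCAR is to compare people who are missing on an item with people who are not, on another variable that was fully observed. The sketch below does this with a Welch t statistic on simulated data; the variable names and the mechanism are assumptions, and this is a simple group comparison, not a formal MCAR test.

```python
import random
from math import sqrt
from statistics import mean, variance

random.seed(3)
age = [random.gauss(45, 12) for _ in range(500)]
# Hypothetical mechanism: older participants skip the item more often.
item_missing = [random.random() < (0.6 if a > 45 else 0.1) for a in age]


def welch_t(g1, g2):
    """Welch t statistic for the difference between two group means."""
    se = sqrt(variance(g1) / len(g1) + variance(g2) / len(g2))
    return (mean(g1) - mean(g2)) / se


age_missing = [a for a, miss in zip(age, item_missing) if miss]
age_observed = [a for a, miss in zip(age, item_missing) if not miss]
t = welch_t(age_missing, age_observed)

print(f"mean age, item missing:  {mean(age_missing):.1f}")
print(f"mean age, item observed: {mean(age_observed):.1f}")
print(f"Welch t = {t:.2f}")
```

A large difference suggests the data are not MCAR. A small difference does not prove MCAR, and no comparison of this kind can separate MAR from MNAR, because that would need the scores that are missing.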
What is the best way to deal with missing data?
Good clear items that have been piloted
Follow up on non-responders
Mandatory questioning
If missingness is below 5% (.05), the approach does not matter much
- unlikely to make a difference to the parameter estimates