Week 12 - Missing Data Analysis Flashcards
Listwise deletion
Dropping participants from the analysis who don’t have complete scores on all the variables in the model
Need nonmissing scores on all variables
It reduces the sample size, which makes it harder to find a significant effect
Lowers power, wastes data and can exacerbate bias (especially when the missingness is nonignorable)
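A minimal sketch of listwise deletion in plain Python, to make the loss of sample size concrete. The data set, the variable names and the helper `listwise_delete` are made up for illustration; `None` stands for a missing score.

```python
# Hypothetical toy data: None marks a missing score.
rows = [
    {"id": 1, "anxiety": 12, "sleep": 7.5, "stress": 3},
    {"id": 2, "anxiety": None, "sleep": 6.0, "stress": 4},
    {"id": 3, "anxiety": 15, "sleep": None, "stress": 5},
    {"id": 4, "anxiety": 9, "sleep": 8.0, "stress": 2},
    {"id": 5, "anxiety": 11, "sleep": 7.0, "stress": None},
]
model_vars = ["anxiety", "sleep", "stress"]


def listwise_delete(rows, variables):
    """Keep only participants with nonmissing scores on every model variable."""
    return [r for r in rows if all(r[v] is not None for v in variables)]


complete = listwise_delete(rows, model_vars)
n_before, n_after = len(rows), len(complete)
print(f"kept {n_after} of {n_before} participants")  # kept 2 of 5 participants
```

Each dropped participant is missing only one score, yet all of their other scores are discarded too.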
Balanced vs unbalanced data
Balanced design - same number of observations in each cell of the analysis
- makes computation easier
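A small sketch of how balance could be checked, assuming a hypothetical 2 x 2 design in which each participant is recorded as a (condition, sex) cell. The helper `is_balanced` is illustrative only and does not detect cells that are entirely empty.

```python
from collections import Counter

# Hypothetical 2 x 2 design: one (condition, sex) tuple per participant.
cells = [
    ("drug", "F"), ("drug", "F"),
    ("drug", "M"), ("drug", "M"),
    ("placebo", "F"), ("placebo", "F"),
    ("placebo", "M"),  # one placebo/M participant has been lost
]


def is_balanced(cells):
    """True when every occupied cell holds the same number of observations."""
    counts = Counter(cells)
    return len(set(counts.values())) == 1


print(Counter(cells))
print(is_balanced(cells))  # False
```

Losing even one participant from one cell turns a balanced design into an unbalanced one.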
Why are scores missing initially?
Participant factors - mortality, attrition (in longitudinal designs)
Experimenter factor - clerical error, malfunction
Balance between not coercing people into giving answers and making it too easy to respond
What are the old approaches to missing data?
Listwise deletion, pairwise deletion, mean substitution (mean imputation), regression imputation, last value carried forward
Pairwise deletion
Only available for correlation and factor analysis
Use all cases available for each pair of variables
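A rough sketch of pairwise deletion for a correlation matrix, written in plain Python. The data and the helpers `pearson` and `pairwise_corr` are invented for illustration. Note that each correlation ends up based on a different N.

```python
from itertools import combinations
from math import sqrt

# Hypothetical scores for six participants; None marks a missing score.
data = {
    "x": [1.0, 2.0, None, 4.0, 5.0, 6.0],
    "y": [2.0, None, 3.0, 5.0, 4.0, 7.0],
    "z": [None, 1.0, 2.0, 2.0, None, 5.0],
}


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / sqrt(saa * sbb)


def pairwise_corr(data):
    """For each pair of variables, use every case observed on both."""
    out = {}
    for u, v in combinations(data, 2):
        pairs = [(p, q) for p, q in zip(data[u], data[v])
                 if p is not None and q is not None]
        a, b = zip(*pairs)
        out[(u, v)] = (pearson(a, b), len(pairs))
    return out


result = pairwise_corr(data)
for (u, v), (r, n) in result.items():
    print(f"r({u},{v}) = {r:.2f} based on N = {n}")
```

Here the x-y correlation uses 4 cases and the other two use 3 each, whereas listwise deletion would have kept only the 2 participants with complete data.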
Regression Imputation
Replace missing data with the predicted score from a regression based on all available cases
Standard errors too small
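A minimal sketch of regression imputation with a single predictor, assuming hypothetical scores `x` (fully observed) and `y` (partly missing). The helper `fit_line` is an ordinary least-squares fit written out by hand for illustration.

```python
# Hypothetical data: x is fully observed, y has two missing scores (None).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, None, 8.2, None, 11.8]


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
    return slope, my - slope * mx


# Fit the regression on the cases where y was observed.
observed = [(a, b) for a, b in zip(x, y) if b is not None]
slope, intercept = fit_line(*zip(*observed))

# Fill each missing y with its predicted score; observed scores are untouched.
y_imputed = [b if b is not None else intercept + slope * a
             for a, b in zip(x, y)]
print(y_imputed)
```

The imputed scores sit exactly on the regression line, so they add cases without adding any residual variability, which is why the standard errors come out too small.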
Last value carried forward
No longer valid
Approach to longitudinal designs
Attrition (dropout) loses data points (if a participant drops out at the third wave, their 2nd-wave score replaces the missing third-wave score)
Intention to treat analysis
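A short sketch of last value carried forward for one participant's scores across waves. The function name `locf` and the scores are made up for illustration.

```python
def locf(waves):
    """Replace each missing wave with the most recent observed score."""
    filled, last = [], None
    for score in waves:
        if score is not None:
            last = score
        filled.append(last)  # stays None if nothing has been observed yet
    return filled


# Hypothetical participant who drops out after the second wave.
participant = [20, 17, None, None]
print(locf(participant))  # [20, 17, 17, 17]
```

The method assumes the participant would not have changed after dropping out, which is one way to see why the approach is no longer considered valid.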
Problems with old approaches
Underestimating error variance
SE too small
CI too narrow
Type I error rate too high
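A small worked example of the problem, using mean substitution on made-up numbers. It assumes five observed scores and five missing scores that are all replaced by the observed mean.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample: 5 observed scores, 5 missing scores.
observed = [4.0, 6.0, 8.0, 10.0, 12.0]
n_missing = 5

# Mean substitution: every missing score becomes the observed mean (8.0).
imputed = observed + [mean(observed)] * n_missing


def standard_error(values):
    """Standard error of the mean: SD / sqrt(N)."""
    return stdev(values) / sqrt(len(values))


se_observed = standard_error(observed)  # about 1.41
se_imputed = standard_error(imputed)    # about 0.67
print(f"SD: {stdev(observed):.2f} -> {stdev(imputed):.2f}")
print(f"SE: {se_observed:.2f} -> {se_imputed:.2f}")
```

The imputed scores add no spread but are counted as real cases, so the standard deviation shrinks and N grows. Both push the standard error down, narrowing the confidence interval and inflating the Type I error rate.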
Previous approach to missingness
Lessen the impact of missingness (nuisance factor)
Rubin and Little approach to missingness
Estimate missingness statistically
Mechanism of missingness is important
Types of missingness
Ignorable and nonignorable
Ignorable - fewer constraints on the type of analysis and reduced bias, but still problems with power (precision)
Nonignorable - listwise deletion will lead to problems with both bias and precision
Three Types of missing data
MCAR - Missing completely at random
MAR - Missing at random
MNAR - Missing not at random
MCAR
Ignorable
Probability of a score being missing on a given variable is not conditional on that variable itself or on other variables in the data set
Cause of missingness is completely outside of the data
MAR
Ignorable
Probability of being missing on a given variable is not conditional on the variable itself but IS conditional on other variables in the data set
e.g. older people are less likely to respond to a question on sex
MNAR
Non-ignorable
Probability of being missing on a given variable is conditional on the variable itself; the missingness is related to what the answer would have been
e.g. too embarrassed to answer a question because of what the answer would have been (often affects the outcome variable)
Can lead to big bias
Problematic when trying to estimate the population prevalence of a behaviour or state (people who are too sick or too drunk are missing from the analysis because of what they would have answered)
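The three mechanisms can be sketched with a small simulation. Everything here is an assumption made for illustration: the population, the variable names, the missingness probabilities and the cut-offs. To keep the example simple, the drinking score is generated independently of age, so the MAR complete-case mean stays close to the true mean. If the score were related to age, the analysis would need to take age into account.

```python
import random
from statistics import mean

random.seed(1)
N = 20_000

# Hypothetical population: age, and weekly drinks generated independently of age.
people = [(random.uniform(18, 80), max(0.0, random.gauss(10, 4)))
          for _ in range(N)]


def is_missing(age, drinks, mechanism):
    """Decide whether the drinks score goes missing under each mechanism."""
    if mechanism == "MCAR":
        p = 0.3                            # same chance for everyone
    elif mechanism == "MAR":
        p = 0.5 if age >= 50 else 0.1      # depends only on observed age
    else:  # MNAR
        p = 0.6 if drinks >= 12 else 0.1   # depends on the unseen answer itself
    return random.random() < p


true_mean = mean(d for _, d in people)
observed_mean = {}
for mech in ("MCAR", "MAR", "MNAR"):
    kept = [d for a, d in people if not is_missing(a, d, mech)]
    observed_mean[mech] = mean(kept)

print(f"true mean: {true_mean:.2f}")
for mech, m in observed_mean.items():
    print(f"{mech}: complete-case mean = {m:.2f}")
```

Under MNAR the heavier drinkers are the ones most likely to be missing, so the complete-case mean underestimates the population value, which is the prevalence problem described in the card above.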