Week 6 Missing Data Flashcards
What am I meant to know this week?
Understand how missing data can be problematic and how they can be addressed
Identify the three common classifications of missing data and how they differ
Understand how multiple imputation (MI) is one robust way to deal with missing data and the general
steps involved in this process
What are the different sources of uncertainty related to MI
How to examine proportion and patterns of missing data visually using plots
What is convergence and how is it relevant to MI
Understand how to analyse MI datasets and interpret their pooled results
What are MAR/MNAR/MCAR?
The three common classifications of missing data
What is MCAR (missing completely at random)?
When the reason or mechanism for missingness is *completely independent* of whatever we are estimating.
E.g. if we are looking at a parameter of interest, the missingness is independent of our outcome of interest.
However, when data are missing NOT at random (MNAR), the missingness mechanism is associated with our estimate: the missingness IS dependent on the unobserved values. So the fact that the data are missing matters.
Why are assumptions about missing data important, and why is it important to get them right?
Because in order to make valid inferences from our results, we have to pick the right assumptions.
But we can NEVER test the validity of the assumptions: there is no empirical way to determine which missing data assumption is correct. So spend time at the beginning of the analysis thinking about the assumptions and justifying them.
How do we explore the assumptions - what type of analysis?
Sensitivity analysis, e.g. imputation vs. listwise deletion
What does missingness mean?
Just whether or not a value is missing: the characteristic of being missing.
What if it’s just MAR (missing at random)?
The reason for missingness is CONDITIONALLY independent of the estimate. This means the missingness is independent of the unobserved values, i.e. it may depend only on the values of a variable X that we were able to collect.
But remember, the big distinction here is MNAR. This is when the missingness mechanism IS associated with the estimate, so the missingness IS dependent on the unobserved values.
This is a problem as we don’t know what those values are! If the reason for missingness has to do with what those values would be if we knew them, that’s a problem. Also, that would be an assumption.
Say a dog eats any homework and doesn’t care what kind of homework it is. What is that?
MCAR
What if the dog didn’t care about an attribute of the homework itself but about an attribute of something related, like the fact it was a student’s homework?
MAR. It’s still at random because the dog only cares that it’s a student’s homework (something we can observe), not about the content of the homework itself.
If the dog wanted to eat BAD homework, the homework is missing, and we can’t tell if it was good or bad, but we assume the dog only eats bad homework. What is this?
The missingness of the homework is related to the VALUE of the homework if we had observed it. But we haven’t observed it, it’s missing, so this is MNAR.
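The three mechanisms can be simulated to see how they differ. This is only a minimal sketch using Python's standard library; the variables `x` and `y`, the missingness probabilities and the helper `gap` are all made up for illustration. Here `x` plays the role of something we always observe (whose homework it is) and `y` the value that may go missing (how good the homework is).

```python
import random

rng = random.Random(1)
rows = []
for _ in range(20_000):
    x = rng.gauss(0, 1)            # covariate we always observe
    y = 0.5 * x + rng.gauss(0, 1)  # value that may go missing
    rows.append({
        "x": x,
        "y": y,
        "MCAR": rng.random() < 0.3,                      # same chance for every row
        "MAR": rng.random() < (0.5 if x > 0 else 0.1),   # depends only on observed x
        "MNAR": rng.random() < (0.5 if y > 0 else 0.1),  # depends on y itself
    })

def mean_y(subset):
    return sum(r["y"] for r in subset) / len(subset)

def gap(mechanism, subset):
    """Mean of y where it would be missing minus mean of y where it is observed."""
    missing = [r for r in subset if r[mechanism]]
    observed = [r for r in subset if not r[mechanism]]
    return mean_y(missing) - mean_y(observed)

# Overall gap between would-be-missing and observed values of y.
gaps = {mech: gap(mech, rows) for mech in ("MCAR", "MAR", "MNAR")}

# Now hold the observed covariate fixed (only rows with x > 0).
positive_x = [r for r in rows if r["x"] > 0]
mar_gap_given_x = gap("MAR", positive_x)
mnar_gap_given_x = gap("MNAR", positive_x)

for mech, g in gaps.items():
    print(f"{mech}: overall gap in mean y = {g:+.2f}")
print(f"MAR gap among rows with x > 0:  {mar_gap_given_x:+.2f}")
print(f"MNAR gap among rows with x > 0: {mnar_gap_given_x:+.2f}")
```

In this setup the MCAR gap should sit near zero. The MAR gap is non-zero overall but should vanish once we condition on the observed `x`, while the MNAR gap remains even after conditioning. In real data we could not compute these gaps at all, because the missing `y` values are not available.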
How do we go about asking ourselves about missing data in our data set?
- First look at your data. Which variables have missing values?
- Is there any theoretical reason why the FACT that they are missing for some participants might be related to what the missing values COULD be?
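A first look at the data can be as simple as counting how much is missing per variable and which combinations of variables go missing together. A minimal sketch in plain Python, assuming a hypothetical toy dataset in which `None` marks a missing value (the variable names and numbers are invented for illustration):

```python
from collections import Counter

# Hypothetical toy records; None marks a missing value.
records = [
    {"age": 34, "income": 52000, "score": 7},
    {"age": 29, "income": None, "score": 5},
    {"age": None, "income": None, "score": 6},
    {"age": 41, "income": 61000, "score": None},
    {"age": 38, "income": None, "score": 8},
]
variables = ["age", "income", "score"]

# Proportion of missing values for each variable.
prop_missing = {
    v: sum(r[v] is None for r in records) / len(records) for v in variables
}

# Missingness patterns: "X" = observed, "." = missing, in the order of `variables`.
patterns = Counter(
    "".join("." if r[v] is None else "X" for v in variables) for r in records
)

for v in variables:
    print(f"{v}: {prop_missing[v]:.0%} missing")
for pattern, count in patterns.most_common():
    print(pattern, count)
```

The pattern counts are the same information a missingness pattern plot displays, just in text form.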
If the data is missing at random (MAR) can you recover unbiased estimates doing listwise deletion?
No, don’t listwise delete. If we use only complete cases, the estimates WILL be biased, and it WILL matter. But with MAR it is possible to recover unbiased estimates if the right other variables are present. So with MAR we need to impute the data.
If the data is missing not at random (MNAR) can you recover unbiased estimates doing listwise deletion?
No. Don’t use complete-case (listwise) deletion, as the estimates will be biased.
With MCAR you can listwise delete and it wouldn’t bias the estimates, but you’d have lower power and there is an ethical issue with throwing out data.
With MAR you can recover unbiased estimates, but the right other variables must be present: some combination of observed variables that, when we condition on them properly, allows us to be unbiased.
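A small simulation can illustrate these answers. This is a sketch under made-up assumptions, not a general proof: the true mean of `y` is 0 by construction, the missingness probabilities are arbitrary, and `regression_adjusted_mean` is a simple stand-in for "conditioning on the right observed variable" rather than full multiple imputation.

```python
import random

def simulate(mechanism, n=20_000, seed=42):
    """Return (x, y, missing) triples; y has true mean 0 by construction."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0, 1)      # always observed
        y = x + rng.gauss(0, 1)  # may be missing
        if mechanism == "MCAR":
            p = 0.4                      # unrelated to x and y
        elif mechanism == "MAR":
            p = 0.7 if x > 0 else 0.1    # depends on observed x only
        elif mechanism == "MNAR":
            p = 0.7 if y > 0 else 0.1    # depends on the unobserved y
        else:
            raise ValueError(mechanism)
        data.append((x, y, rng.random() < p))
    return data

def complete_case_mean(data):
    """Listwise deletion: average y over rows where y is observed."""
    ys = [y for _, y, miss in data if not miss]
    return sum(ys) / len(ys)

def regression_adjusted_mean(data):
    """Fit y ~ x on complete cases, then average the predictions over every x."""
    cc = [(x, y) for x, y, miss in data if not miss]
    mx = sum(x for x, _ in cc) / len(cc)
    my = sum(y for _, y in cc) / len(cc)
    slope = sum((x - mx) * (y - my) for x, y in cc) / sum((x - mx) ** 2 for x, _ in cc)
    intercept = my - slope * mx
    return sum(intercept + slope * x for x, _, _ in data) / len(data)

results = {}
for mech in ("MCAR", "MAR", "MNAR"):
    d = simulate(mech)
    results[mech] = (complete_case_mean(d), regression_adjusted_mean(d))

for mech, (cc, adj) in results.items():
    print(f"{mech}: complete-case mean = {cc:+.2f}, adjusted using x = {adj:+.2f} (truth 0)")
```

In this setup listwise deletion should land near the truth only under MCAR. Under MAR, using the observed `x` should bring the estimate back to roughly 0, and under MNAR neither approach should recover it.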
What are the general steps involved in the MI process?
- Start with the incomplete data (the raw dataset with missing data).
- Generate m datasets with no missingness, by filling in different plausible values for any missing data. We will discuss this more later.
- Perform the analysis of interest on each imputed dataset. That is, the analysis of interest is repeated m times. This generates m different $\hat{Q}$ estimates and associated standard errors.
- Pool the results from the analyses run on each imputed dataset to generate an overall estimate, $\bar{Q}$.
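The four steps above can be sketched end to end in plain Python. Everything specific here is an assumption made for illustration: the simulated data, the use of a bootstrap plus regression with added noise as the imputation model, the mean of `y` as the analysis of interest, and the helper names. Pooling uses Rubin's rules (pooled estimate = average of the m estimates; total variance = average within-imputation variance + (1 + 1/m) × between-imputation variance), which these notes do not spell out.

```python
import random
import statistics

def fit_line(pairs):
    """Least-squares intercept, slope and residual SD for y ~ x."""
    mx = statistics.fmean(x for x, _ in pairs)
    my = statistics.fmean(y for _, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    slope = sum((x - mx) * (y - my) for x, y in pairs) / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in pairs]
    sd = (sum(e * e for e in resid) / (len(pairs) - 2)) ** 0.5
    return intercept, slope, sd

def impute_once(data, rng):
    """Step 2: return one completed copy of y with plausible values filled in."""
    observed = [(x, y) for x, y in data if y is not None]
    # Refit on a bootstrap sample so each imputation reflects parameter uncertainty.
    boot = [rng.choice(observed) for _ in observed]
    a, b, sd = fit_line(boot)
    # Add random noise so imputed values vary like real ones.
    return [y if y is not None else a + b * x + rng.gauss(0, sd) for x, y in data]

def analyse(ys):
    """Step 3: the estimate (mean of y) and its squared standard error."""
    return statistics.fmean(ys), statistics.variance(ys) / len(ys)

def pool(estimates, variances):
    """Step 4: Rubin's rules for the pooled estimate and its standard error."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates)
    total = within + (1 + 1 / m) * between
    return q_bar, total ** 0.5

# Step 1: incomplete data (simulated; the true mean of y is 1.0, missingness is MAR).
rng = random.Random(2024)
data = []
for _ in range(2000):
    x = rng.gauss(0, 1)
    y = 1.0 + 0.8 * x + rng.gauss(0, 1)
    missing = rng.random() < (0.6 if x > 0 else 0.1)
    data.append((x, None if missing else y))

m = 20
fits = [analyse(impute_once(data, rng)) for _ in range(m)]
q_bar, se = pool([q for q, _ in fits], [u for _, u in fits])
print(f"Pooled mean of y = {q_bar:.2f} (SE {se:.3f}) from m = {m} imputed datasets")
```

The between-imputation term is what carries the extra uncertainty due to the missing data: the more the m estimates disagree, the larger the pooled standard error.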