Week 6 Missing Data Flashcards

1
Q

What am I meant to know this week?

A

• Understand how missing data can be problematic and how they can be addressed
• Identify the three common classifications of missing data and how they differ
• Understand how multiple imputation (MI) is one robust way to deal with missing data and the general steps involved in this process
• What are the different sources of uncertainty related to MI
• How to examine proportion and patterns of missing data visually using plots
• What is convergence and how is it relevant to MI
• Understand how to analyse MI datasets and interpret their pooled results

2
Q

What are MAR/MNAR/MCAR?

A

The three common classifications of missing data: missing at random (MAR), missing not at random (MNAR) and missing completely at random (MCAR).

3
Q

What is MCAR (missing completely at random)?

A

When the reason or mechanism for missingness is *completely independent* of whatever we are estimating.

E.g. thinking about our parameter of interest: the missingness is independent of our outcome of interest.

However, when data are missing NOT at random (MNAR), the missingness mechanism is associated with our estimate: the missingness IS dependent on the unobserved values. So those missing data are important.

4
Q

Why are the assumptions about missing data important, and why is it important to get them right?

A

Because in order to make valid inferences from our results, we have to pick the right assumptions.

But we can NEVER test the validity of these assumptions: there is no empirical way to determine which missing data assumption is correct. So spend time at the beginning of the analysis thinking about the assumptions and justifying them.

5
Q

How do we explore the assumptions - what type of analysis?

A

Sensitivity analysis, e.g. comparing results from imputation vs. listwise deletion

6
Q

What does missingness mean?

A

Just whether or not data are missing. It is the characteristic of being missing.

7
Q

What if the data are just MAR (missing at random)?

A

The reason for missingness is CONDITIONALLY independent of the estimate. That is, the missingness is independent of the unobserved values, AKA it depends only on the values of a variable X that we were able to collect.

But remember, the big distinction here is with MNAR. That is when the missingness mechanism IS associated with the estimate, so the missingness IS dependent on the unobserved values.

This is a problem as we don’t know what those values are! If the reason for missingness has to do with what those values would be if we knew them, we have a problem. Also, that would itself be an assumption.

8
Q

Say a dog eats any homework and doesn’t care what kind of homework it is. What is that?

A

MCAR

9
Q

If the dog didn’t care about an attribute of the homework itself but about an attribute of something related, like the fact that it was a student’s homework, what is that?

A

MAR. It’s still at random because the dog only cares that it’s a student’s homework (something we observed), not about the homework itself.

10
Q

If the dog wanted to eat BAD homework: the homework is missing, so we can’t tell if it was good or bad, but we assume the dog only eats bad homework. What is this?

A

Missingness of the homework is related to the VALUE the homework would have had if we had observed it. But we didn’t observe it; it’s missing. So this is MNAR.
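
The three dog scenarios can be simulated to see what each mechanism does to the homework we still have. This is only a sketch using the Python standard library; the variables (hours studied, homework score), the cut-offs and the probabilities are all made up for illustration and are not from the course materials.

```python
import random
import statistics

random.seed(6)  # arbitrary seed so the sketch is reproducible

# Hypothetical data: hours studied (always observed) and a homework score.
n = 20000
hours = [random.gauss(5, 2) for _ in range(n)]
score = [50 + 5 * h + random.gauss(0, 10) for h in hours]

# MCAR: the dog eats 30% of homework, whatever it is.
mcar = [random.random() < 0.3 for _ in range(n)]
# MAR: the dog reacts only to something we observed (few hours studied).
mar = [random.random() < (0.5 if h < 5 else 0.1) for h in hours]
# MNAR: the dog reacts to the score itself, which we never see once eaten.
mnar = [random.random() < (0.5 if s < 75 else 0.1) for s in score]


def observed_mean(values, missing):
    """Mean of the values that were not eaten (complete cases only)."""
    return statistics.fmean(v for v, m in zip(values, missing) if not m)


true_mean = statistics.fmean(score)
mcar_mean = observed_mean(score, mcar)
mar_mean = observed_mean(score, mar)
mnar_mean = observed_mean(score, mnar)

print(f"all homework:      {true_mean:.1f}")
print(f"MCAR, what's left: {mcar_mean:.1f}")
print(f"MAR, what's left:  {mar_mean:.1f}")
print(f"MNAR, what's left: {mnar_mean:.1f}")
```

In this setup the mean of the surviving homework stays close to the truth under MCAR but drifts upward under both MAR and MNAR. From the surviving scores alone the last two look alike; the difference is that under MAR the cause (hours) is something we recorded.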

11
Q

How do we go about asking ourselves about missing data in our data set?

A
  1. First, look at your data. Which variables have missing values?
  2. Is there any theoretical reason why the FACT that values are missing for some participants might be related to what the missing values COULD be?
    3.
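
For the "look at your data" step, a quick tabulation of how much is missing and which variables go missing together can help before any plotting. A minimal sketch with a made-up toy dataset (the variable names and values are hypothetical), using only the standard library:

```python
from collections import Counter

# Hypothetical toy dataset; None marks a missing value.
rows = [
    {"age": 21, "score": 70, "hours": 5},
    {"age": 34, "score": None, "hours": 2},
    {"age": None, "score": None, "hours": 3},
    {"age": 45, "score": 88, "hours": None},
    {"age": 29, "score": 61, "hours": 4},
]
variables = ["age", "score", "hours"]

# Proportion of rows with a missing value, per variable.
prop_missing = {
    v: sum(r[v] is None for r in rows) / len(rows) for v in variables
}

# Missing-data patterns: which variables are missing together, and how often.
patterns = Counter(
    tuple(v for v in variables if r[v] is None) for r in rows
)

for v, p in prop_missing.items():
    print(f"{v}: {p:.0%} missing")
for pattern, count in patterns.items():
    print(f"missing {pattern or 'nothing'}: {count} row(s)")
```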
12
Q

If the data are missing at random (MAR), can you recover unbiased estimates doing listwise deletion?

A

No, not with listwise deletion. If we use complete cases only, the estimates will generally be biased, and it WILL matter. With MAR it is possible to recover unbiased estimates if the right other variables are present. So with MAR we need to impute the data.
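
A small simulation can show why the "right other variables" matter under MAR. In this sketch (made-up data; `hours` and `score` are hypothetical variables) scores go missing more often when few hours were studied. The complete-case mean is then too high, while filling each gap with a prediction from `hours` brings the mean back. This is a single regression fill-in used to show the conditioning idea, not multiple imputation itself.

```python
import random
import statistics

random.seed(12)  # arbitrary seed so the sketch is reproducible

n = 20000
hours = [random.gauss(5, 2) for _ in range(n)]
score = [50 + 5 * h + random.gauss(0, 10) for h in hours]
# MAR: missingness depends only on the observed variable, hours.
missing = [random.random() < (0.5 if h < 5 else 0.1) for h in hours]

# Listwise deletion: keep only the rows where score was observed.
obs = [(h, s) for h, s, m in zip(hours, score, missing) if not m]
complete_case_mean = statistics.fmean(s for _, s in obs)

# Fit score ~ hours on the complete cases (least squares by hand).
mean_h = statistics.fmean(h for h, _ in obs)
mean_s = complete_case_mean
sxy = sum((h - mean_h) * (s - mean_s) for h, s in obs)
sxx = sum((h - mean_h) ** 2 for h, _ in obs)
slope = sxy / sxx
intercept = mean_s - slope * mean_h

# Fill each missing score with its prediction from hours.
filled = [
    intercept + slope * h if m else s
    for h, s, m in zip(hours, score, missing)
]
filled_mean = statistics.fmean(filled)
true_mean = statistics.fmean(score)

print(f"true mean:          {true_mean:.1f}")
print(f"complete-case mean: {complete_case_mean:.1f}")
print(f"filled-in mean:     {filled_mean:.1f}")
```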

13
Q

If the data are missing not at random (MNAR), can you recover unbiased estimates doing listwise deletion?

A

No.

With MCAR, by contrast, you can listwise delete without biasing the estimates, but you would have lower power, and throwing out collected data raises an ethical issue.

14
Q

If the data are missing not at random (MNAR), can you recover unbiased estimates doing listwise deletion?

A

No. We just don’t want to use complete-case (listwise) deletion, as the estimates will be biased.

Unbiased estimates can only be recovered if the right other variables are present: some combination of variables that, once we condition on them properly, accounts for the missingness (which in effect makes the data MAR).

15
Q

What are the general steps involved in the MI process?

A
  1. Start with the incomplete data (the raw dataset with missing data).
  2. Generate m datasets with no missingness, by filling in different plausible values for any missing data. We will discuss this more later.
  3. Perform the analysis of interest on each imputed dataset. That is, the analysis of interest is repeated m times. This generates m different Q̂ estimates and associated standard errors.
  4. Pool the results from the analyses run on each imputed dataset to generate an overall estimate, Q̄.
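
The four steps can be sketched end to end in a few lines. Everything here is an illustrative assumption: the data are simulated, the analysis of interest is simply the mean score, and the imputation model is a regression on `hours` with random noise added. It is also simplified, because the regression coefficients are not redrawn for each imputed dataset as a full MI procedure would do, so treat it as a sketch of the workflow rather than a proper implementation.

```python
import random
import statistics

random.seed(15)  # arbitrary seed so the sketch is reproducible

# Step 1: the incomplete data (hypothetical). None marks a missing score.
n = 2000
hours = [random.gauss(5, 2) for _ in range(n)]
score = [50 + 5 * h + random.gauss(0, 10) for h in hours]
score_obs = [
    None if random.random() < (0.5 if h < 5 else 0.1) else s
    for h, s in zip(hours, score)
]

# Imputation model fitted to the complete cases: score ~ hours.
cc = [(h, s) for h, s in zip(hours, score_obs) if s is not None]
mh = statistics.fmean(h for h, _ in cc)
ms = statistics.fmean(s for _, s in cc)
sxy = sum((h - mh) * (s - ms) for h, s in cc)
sxx = sum((h - mh) ** 2 for h, _ in cc)
slope = sxy / sxx
intercept = ms - slope * mh
resid_sd = statistics.stdev([s - (intercept + slope * h) for h, s in cc])

m = 20  # number of imputed datasets
estimates, variances = [], []
for _ in range(m):
    # Step 2: fill each missing score with a different plausible value.
    completed = [
        s if s is not None
        else intercept + slope * h + random.gauss(0, resid_sd)
        for h, s in zip(hours, score_obs)
    ]
    # Step 3: run the analysis of interest (mean and its squared SE).
    estimates.append(statistics.fmean(completed))
    variances.append(statistics.variance(completed) / n)

# Step 4: pool the m results.
q_bar = statistics.fmean(estimates)   # overall estimate
v_bar = statistics.fmean(variances)   # average sampling variance
b = statistics.variance(estimates)    # between-imputation variance
total_var = v_bar + b + b / m         # total variance of q_bar

print(f"pooled mean: {q_bar:.2f}, pooled SE: {total_var ** 0.5:.2f}")
```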
16
Q

What are the different sources of uncertainty related to MI?

A

The uncertainty in the estimate of our statistic of interest from multiply imputed data, Q̄, is due to:

(1) sampling variation, V̄,
(2) uncertainty in how we imputed the missing data, B, and
(3) uncertainty because we did not generate an infinite number of imputed datasets, B/m.
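
Under Rubin's rules these three pieces simply add up to the total variance of the pooled estimate. The symbol T for the total is an assumption here (it does not appear on the card); m is the number of imputed datasets.

```latex
T = \bar{V} + B + \frac{B}{m}
```

The last term shrinks as m grows, which is why using more imputed datasets reduces this part of the uncertainty.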

18
Q

What is the between-imputed dataset variation (equation) doing?

A

B captures the variance in the estimates, Q̂. Because the observed, non-missing data never change from one imputed dataset to another, the only reason that Q̂ will change is when the plausible values imputed for the missing data change. Thus B is an estimate of the uncertainty in Q̄ due to missing data.
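
The equation this card refers to is not reproduced in these notes. The standard Rubin's rules form, written with the same symbols and with i indexing the m imputed datasets (the index is an assumption), is:

```latex
B = \frac{1}{m-1} \sum_{i=1}^{m} \left( \hat{Q}_i - \bar{Q} \right)^2
```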

19
Q

What is the overall variance estimate (uncertainty estimate) (equation) doing?

A

V̄ is the average uncertainty estimate of Q̂ across the multiply imputed datasets.
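
The equation itself is not reproduced in these notes. In the standard form, writing V_i for the squared standard error of the estimate from imputed dataset i (the symbol V_i is an assumption), the average is:

```latex
\bar{V} = \frac{1}{m} \sum_{i=1}^{m} V_i
```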

20
Q

What is the overall estimate equation showing?

A

This is simply the average of the Q̂ estimates from each individual imputed dataset. What is more unique and forms an important difference between multiple imputation and other simpler methods (e.g., single mean imputation) is the variance estimate.
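
The equation is not reproduced in these notes. Written out in its standard form, with i indexing the m imputed datasets (the index is an assumption), the average is:

```latex
\bar{Q} = \frac{1}{m} \sum_{i=1}^{m} \hat{Q}_i
```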

21
Q

How does the overall variance estimate differ from the overall estimate?

A

The overall variance estimate describes the uncertainty (variance) around the pooled result, whereas the overall estimate is the pooled coefficient itself.
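
A tiny worked example may make the distinction concrete. The three estimates and variances below are invented numbers, and `pool` is a hypothetical helper written for this sketch, applying the standard Rubin's rules.

```python
import statistics


def pool(estimates, variances):
    """Pool per-dataset estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)   # overall estimate (the coefficient)
    v_bar = statistics.fmean(variances)   # average sampling variance
    b = statistics.variance(estimates)    # between-imputation variance (m - 1)
    total = v_bar + b + b / m             # overall variance of q_bar
    return q_bar, v_bar, b, total


# Invented results from m = 3 imputed datasets.
q_bar, v_bar, b, total = pool([10.0, 12.0, 11.0], [4.0, 5.0, 3.0])

print(f"overall estimate: {q_bar}")
print(f"overall variance: {total:.2f} (V-bar {v_bar}, B {b})")
```

Here the overall estimate is 11.0 (the average coefficient), while the overall variance is 4 + 1 + 1/3, about 5.33, which is what the pooled standard error is built from.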

22
Q

In this equation, which is the FIXED and RANDOM intercept?

A

The 1 + is the fixed intercept.

The 1 | ID is the random intercept.
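
The equation this card points to is not reproduced in these notes. Assuming it is a mixed-model formula with no other predictors, where ID identifies the grouping unit (for example, repeated measures on the same participant), the two terms correspond to the following model. All symbols here are assumptions introduced for illustration.

```latex
y_{ij} = \beta_0 + u_{0j} + \varepsilon_{ij},
\qquad u_{0j} \sim N(0, \tau^2),
\quad \varepsilon_{ij} \sim N(0, \sigma^2)
```

Here \beta_0 is the fixed intercept (the 1 +), shared by everyone, and u_{0j} is the random intercept (the 1 | ID), the amount by which the intercept for ID j departs from \beta_0.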