Week 6 Missing Data Flashcards

Question 1

Q

What am I meant to know this week?

Answer

A

- Understand how missing data can be problematic and how they can be addressed
- Identify the three common classifications of missing data and how they differ
- Understand how multiple imputation (MI) is one robust way to deal with missing data and the general steps involved in this process
- Identify the different sources of uncertainty related to MI
- Examine the proportion and patterns of missing data visually using plots
- Understand what convergence is and how it is relevant to MI
- Understand how to analyse MI datasets and interpret their pooled results

Question 2

Q

What are MAR/MNAR/MCAR?

Answer

A

The three common classifications of missing data: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR).

Question 3

Q

What is MCAR (missing completely at random)?

Answer

A

When the reason or mechanism for missingness is *completely independent* of whatever we are estimating.

E.g. the missingness is unrelated to our parameter or outcome of interest.

By contrast, when data are missing NOT at random (MNAR), the missingness mechanism is associated with our estimate: the missingness IS dependent on the unobserved values. In that case the fact that data are missing matters.

Question 4

Q

Why is it important to get the assumptions about missing data right?

Answer

A

Because the inferences we draw from our results are only valid if we pick the right assumptions.

But we can NEVER test the validity of the assumptions: there is no empirical way to determine which missing data assumption is correct. So spend time at the beginning of the analysis thinking about the assumptions and justifying them.

Question 5

Q

How do we explore the assumptions - what type of analysis?

Answer

A

Sensitivity analysis, e.g. comparing the results from imputation vs. listwise deletion.

Question 6

Q

What does missingness mean?

Answer

A

Just whether or not data are missing: the characteristic of being missing.

Question 7

Q

What about MAR (missing at random)?

Answer

A

The reason for missingness is CONDITIONALLY independent of the estimate: the missingness is independent of the unobserved values, i.e. it may depend only on the values of variables (such as X) that we were able to collect.

But remember, the big differentiation here is MNAR. This is when the missingness mechanism IS associated with the estimate, so the missingness IS dependent on the unobserved values.

This is a problem as we don’t know what those values are! If the reason for missingness has to do with what those values would be if we knew them, we have a problem. Also, that would itself be an assumption.
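For reference, the three mechanisms are often written formally as below. This notation is not in the cards and is added here as a sketch: $R$ is the missingness indicator, and $Y_{\text{obs}}$ and $Y_{\text{mis}}$ are the observed and unobserved parts of the data.

```latex
\begin{aligned}
\text{MCAR:} \quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) = P(R) \\
\text{MAR:}  \quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) = P(R \mid Y_{\text{obs}}) \\
\text{MNAR:} \quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) \text{ still depends on } Y_{\text{mis}}
\end{aligned}
```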

Question 8

Q

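Say a dog eats any homework and doesn’t care what kind of homework it is. What is that?

Answer

A

MCAR. The homework goes missing regardless of anything about it, so the missingness is completely at random.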

Question 9

Q

If the dog didn’t care about an attribute of the homework itself but about an attribute of something related, like the fact that it was a student’s homework, what is that?

Answer

A

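MAR. The missingness is still at random with respect to the homework itself: the dog only cares about something we can observe (that it is a student’s homework), not about the content of the homework.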

Question 10

Q

If the dog wanted to eat BAD homework, the homework is missing, and we can’t tell whether it was good or bad, but we assume the dog only eats bad homework, what is this?

Answer

A

The missingness of the homework is related to the VALUE of the homework if we had observed it. But we haven’t observed it, because it is missing. So this is MNAR.
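The three dog scenarios can be sketched as a small simulation. Everything here is hypothetical and chosen only for illustration: the scores, the eating probabilities, and the function name `simulate_dog`. The sketch compares the mean score of eaten vs. surviving homework; only under MNAR should the eaten homework look systematically worse.

```python
import random
import statistics

def simulate_dog(n=10_000, seed=1):
    """Mean score of eaten vs. surviving homework under each mechanism."""
    rng = random.Random(seed)
    # Hypothetical homework: a score and whether it belongs to a student.
    # Here the score is generated independently of who owns the homework.
    homework = [(rng.gauss(60, 15), rng.random() < 0.5) for _ in range(n)]

    def eats(mechanism, score, is_student):
        if mechanism == "MCAR":   # eats any homework, regardless of what it is
            p = 0.30
        elif mechanism == "MAR":  # cares only about an observed attribute
            p = 0.50 if is_student else 0.10
        else:                     # MNAR: cares about the (unobserved) score
            p = 0.60 if score < 50 else 0.10
        return rng.random() < p

    results = {}
    for mechanism in ("MCAR", "MAR", "MNAR"):
        eaten, kept = [], []
        for score, is_student in homework:
            (eaten if eats(mechanism, score, is_student) else kept).append(score)
        results[mechanism] = {
            "eaten_mean": statistics.fmean(eaten),
            "kept_mean": statistics.fmean(kept),
        }
    return results

results = simulate_dog()
```

In practice we never see the eaten scores, which is exactly why the MNAR case cannot be checked from the data.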

Question 11

Q

How do we go about asking ourselves about missing data in our data set?

Answer

A

1. First look at your data. Which variables have missing data?
2. Is there any theoretical reason why the FACT that values are missing for some participants might be related to what the missing values COULD be?
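As a first look at the data, one might tabulate the proportion missing per variable and the missingness patterns across variables. The sketch below assumes missing values are stored as `None`; the helper `missing_summary` and the example rows are hypothetical.

```python
from collections import Counter

def missing_summary(rows, variables):
    """Proportion missing per variable and counts of each missingness pattern."""
    n = len(rows)
    proportion = {v: sum(row[v] is None for row in rows) / n for v in variables}
    # A pattern has one character per variable: O = observed, M = missing.
    patterns = Counter(
        "".join("M" if row[v] is None else "O" for v in variables) for row in rows
    )
    return proportion, patterns

# Hypothetical dataset with four participants.
rows = [
    {"age": 21, "score": 70, "hours": 5},
    {"age": 34, "score": None, "hours": 2},
    {"age": None, "score": None, "hours": 8},
    {"age": 45, "score": 55, "hours": None},
]
proportion, patterns = missing_summary(rows, ["age", "score", "hours"])
```

The same counts are what missing data plots display visually.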

Question 12

Q

If the data are missing at random (MAR), can you recover unbiased estimates using listwise deletion?

Answer

A

No, do not listwise delete. If we use complete cases only, the estimates WILL be biased and it WILL matter. With MAR it is possible to recover unbiased estimates if the right other variables are present, so we need to impute the data under MAR.

Question 13

Q

If the data are missing not at random (MNAR), can you recover unbiased estimates using listwise deletion?

Answer

A

No.

With MCAR you can listwise delete and the estimates would not be biased, but you would have lower power, and there is an ethical issue with throwing out data.

Question 14

Q

If the data are missing not at random (MNAR), can you recover unbiased estimates using listwise deletion?

Answer

A

No. We don’t want to use complete-case (listwise) deletion, as the estimates will be biased.

Unbiased estimates can be recovered only if the right other variables are present: some combination of variables that, once we condition on them properly, accounts for the missingness.
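The listwise deletion cards can be checked with a small simulation. This is a sketch under made-up assumptions: a fully observed `x`, an outcome `y = 2 + 1.5x + noise` whose true mean is 2, and arbitrary missingness probabilities. It compares the complete-case mean of `y` with a mean that conditions on `x` by fitting a regression to the complete cases and predicting for everyone.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def compare(n=20_000, seed=2):
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [2.0 + 1.5 * x + rng.gauss(0, 1) for x in xs]  # true mean of y is 2.0

    mechanisms = {
        "MCAR": lambda x, y: rng.random() < 0.4,                       # ignores x and y
        "MAR": lambda x, y: rng.random() < (0.7 if x > 0 else 0.1),    # depends on observed x
        "MNAR": lambda x, y: rng.random() < (0.7 if y > 2 else 0.1),   # depends on y itself
    }
    out = {}
    for name, is_missing in mechanisms.items():
        seen = [(x, y) for x, y in zip(xs, ys) if not is_missing(x, y)]
        complete_case = sum(y for _, y in seen) / len(seen)
        a, b = fit_line([x for x, _ in seen], [y for _, y in seen])
        adjusted = sum(a + b * x for x in xs) / n  # predict y for everyone from x
        out[name] = {"complete_case": complete_case, "adjusted_for_x": adjusted}
    return out

summary = compare()
```

Under these assumptions the complete-case mean is close to 2 only for MCAR. Conditioning on `x` recovers it for MAR, while for MNAR both estimates stay too low.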

Question 15

Q

What are the general steps involved in the MI process?

Answer

A

1. Start with the incomplete data (the raw dataset with missing data).
2. Generate m datasets with no missingness, by filling in different plausible values for any missing data.
3. Perform the analysis of interest on each imputed dataset. That is, the analysis of interest is repeated m times. This generates m different $\hat{Q}$ estimates and associated standard errors.
4. Pool the results from the analyses run on each imputed dataset to generate an overall estimate, $\bar{Q}$.
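The steps above can be sketched end to end. This is a simplified illustration, not how any particular MI package works: the data are simulated, the imputation model is a single regression of `y` on `x`, and model uncertainty is approximated by refitting on a bootstrap resample for each imputed dataset. All names (`fit_line`, `multiply_impute`) are hypothetical.

```python
import random
import statistics

def fit_line(xs, ys):
    """Least-squares intercept, slope and residual SD for y = a + b * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    resid_sd = statistics.pstdev([y - (a + b * x) for x, y in zip(xs, ys)])
    return a, b, resid_sd

def multiply_impute(xs, ys, m=20, seed=3):
    """Steps 2 to 4: build m completed datasets, analyse each, pool the estimates."""
    rng = random.Random(seed)
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    estimates = []
    for _ in range(m):
        # Step 2: refit the imputation model on a bootstrap resample, then fill
        # each missing y with a prediction plus random residual noise.
        boot = [rng.choice(observed) for _ in observed]
        a, b, sd = fit_line([x for x, _ in boot], [y for _, y in boot])
        completed = [y if y is not None else a + b * x + rng.gauss(0, sd)
                     for x, y in zip(xs, ys)]
        # Step 3: the analysis of interest (here, simply the mean of y).
        estimates.append(statistics.fmean(completed))
    # Step 4: the pooled estimate is the average of the m estimates.
    return statistics.fmean(estimates), estimates

# Step 1: incomplete data (simulated): y is missing more often when x is large.
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(2000)]
ys = [2.0 + 1.5 * x + rng.gauss(0, 1) for x in xs]
ys = [None if rng.random() < (0.6 if x > 0 else 0.1) else y for x, y in zip(xs, ys)]
q_bar, estimates = multiply_impute(xs, ys)
```

The m estimates differ from one another only because the imputed values differ, which is the variation the pooling step has to account for.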

Question 16

Q

What are the different sources of uncertainty related to MI?

Answer

A

The uncertainty in the estimate of our statistic of interest from multiply imputed data, $\bar{Q}$, is due to:

(1) sampling variation, $\bar{V}$,
(2) uncertainty in how we imputed the missing data, $B$, and
(3) uncertainty because we did not generate an infinite number of imputed datasets, $B/m$.
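These three sources are usually added together into one total variance for the pooled estimate (Rubin's rules). The symbol $T$ is not used in the cards and is introduced here for illustration; $m$ is the number of imputed datasets.

```latex
T = \bar{V} + B + \frac{B}{m}
```

The pooled standard error is then $\sqrt{T}$, and the last term shrinks towards zero as $m$ grows.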

Question 18

Q

What is the between-imputed dataset variation (equation) doing?

Answer

A

$B$ captures the variance in the estimates, $\hat{Q}$. Because the observed, non-missing data never change from one imputed dataset to another, the only reason that $\hat{Q}$ will change is when the plausible values imputed for the missing data change. Thus $B$ is an estimate of the uncertainty in $\bar{Q}$ due to missing data.
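The card refers to an equation that is not reproduced here. In the standard form of Rubin's rules it would read as below, where $\hat{Q}_i$ is the estimate from the $i$-th of $m$ imputed datasets (the index $i$ is notation assumed for this sketch).

```latex
B = \frac{1}{m - 1} \sum_{i=1}^{m} \left( \hat{Q}_i - \bar{Q} \right)^2
```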

Question 19

Q

What is the overall variance estimate (uncertainty estimate) (equation) doing?

Answer

A

$\bar{V}$ is the average of the uncertainty (variance) estimates of $\hat{Q}$ across the multiply imputed datasets.
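The equation itself is not shown on the card. A standard way to write it is below, assuming $V_i$ denotes the squared standard error of $\hat{Q}_i$ in the $i$-th of $m$ imputed datasets (the symbol $V_i$ is introduced here).

```latex
\bar{V} = \frac{1}{m} \sum_{i=1}^{m} V_i
```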

Question 20

Q

What is the overall estimate equation showing?

Answer

A

This is simply the average of the $\hat{Q}$ estimates from each individual imputed dataset. What is more unique and forms an important difference between multiple imputation and other simpler methods (e.g., single mean imputation) is the variance estimate.

Question 21

Q

How does the overall variance estimate differ from the overall estimate?

Answer

A

The overall variance estimate describes the uncertainty (variance) around the estimate, whereas the overall estimate is the pooled coefficient itself.
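A minimal sketch of the pooling step makes the difference concrete. The function name `pool` and the example numbers are made up for illustration; the formulas follow the standard Rubin's rules for a single coefficient.

```python
import math
import statistics

def pool(estimates, variances):
    """Pool m estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)   # overall estimate (the coefficient)
    v_bar = statistics.fmean(variances)   # average within-dataset variance
    b = statistics.variance(estimates)    # between-dataset variance (divides by m - 1)
    total = v_bar + b + b / m             # overall variance of q_bar
    return {"estimate": q_bar, "variance": total, "se": math.sqrt(total)}

# Hypothetical results from m = 3 imputed datasets.
pooled = pool(estimates=[1.0, 1.2, 1.4], variances=[0.04, 0.05, 0.06])
```

Here `pooled["estimate"]` is the coefficient we report, and `pooled["se"]` is the uncertainty that goes with it.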

Question 22

Q

In this equation, which is the FIXED intercept and which is the RANDOM intercept?

Answer

A

The 1 + term is the fixed intercept.

The (1 | ID) term is the random intercept.
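The equation the card refers to is not shown, so the following is only a sketch of what a random-intercept model looks like, ignoring any other predictors. The symbols are assumptions for illustration: $i$ indexes observations, $j$ indexes ID, $\beta_0$ is the fixed intercept (the 1 + term), and $u_{0j}$ is the random intercept for each ID (the (1 | ID) term).

```latex
y_{ij} = \beta_0 + u_{0j} + \varepsilon_{ij}, \qquad
u_{0j} \sim \mathcal{N}(0, \tau^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```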