Week 3 - Basic Data Cleaning/Missing Data Flashcards

Question 1

Q

3 sources of missing data

Answer

A

From some participants
From some variables
From a subset of people/measures (only some participants didn’t provide a response on a particular variable)

Question 2

Q

What question is asked when there is data missing from some participants?

Answer

A

Are people who didn’t provide data somehow different from those who did provide data?

Question 3

Q

What question is asked when there is data missing from some variables?

Answer

A

Why would people not provide data here? Is the study affected?

Question 4

Q

What question is asked when data is missing from a subset of people/measures?

Answer

A

Why would only some people withhold a response to some items?

Question 5

Q

Why care about missing data?

Answer

A

Can influence how representative the sample is of the population we wish to generalise to
1. Undermines validity
- Estimated parameters might not be equivalent to population parameters
- If estimated parameters are biased, it undermines validity in the study
- No longer a true reflection of what’s happening in the population
2. Can compromise statistical power
- Reducing sample size
- Influencing ability to correctly detect an effect if it exists
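
As a rough illustration of the power point: the standard error of a mean is sd / sqrt(n), so losing cases to missing data makes estimates less precise. The numbers below (an SD of 10, 100 recruited participants, 40 of them lost) are made up for the sketch and do not come from the lecture.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean for a sample of size n."""
    return sd / math.sqrt(n)


# Hypothetical values, chosen only for illustration
sd = 10.0
se_full = standard_error(sd, 100)    # everyone provided data
se_reduced = standard_error(sd, 60)  # 40 participants lost to missing data

print(f"SE with n=100: {se_full:.2f}")
print(f"SE with n=60:  {se_reduced:.2f}")
```

A larger standard error means wider confidence intervals and a lower chance of detecting an effect that really exists.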

Question 6

Q

What checks are required to determine how problematic missing data might be?

Answer

A

Check whether participants have been adequately sampled and variables adequately assessed, then determine the pattern of missing data

Question 7

Q

Why is it a problem if some participants haven’t been adequately sampled?

Answer

A

It’s as if they haven’t participated in the study at all:
- Reduces power
- If due to systematic reasons, the validity of the study is undermined

Question 8

Q

Why is it a problem if some variables haven’t been adequately assessed?

Answer

A

It’s as if we haven’t assessed the item at all:
- Can compromise ability to address research questions (especially if on variables of interest)
- If due to systematic reasons, the validity of our study is undermined and can lead to inaccurate conclusions

Question 9

Q

What is meant by biased estimated parameters?

Answer

A

Estimates that differ systematically from the population values, e.g., an absence of high scorers underestimates the mean. In addition, relationships between that variable and other variables are likely to be weakened because of restriction of range (the majority of scores are low)
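
A minimal simulation of this card, assuming a made-up population in which a score has mean 50 and SD 10, a second variable is related to it, and everyone scoring above 55 provides no data. The cut-off, sample size and variable names are hypothetical.

```python
import math
import random


def mean(xs):
    return sum(xs) / len(xs)


def pearson_r(xs, ys):
    """Pearson correlation, written out so only the standard library is needed."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population: a score and a second variable related to it
scores = [rng.gauss(50, 10) for _ in range(2000)]
other = [s + rng.gauss(0, 10) for s in scores]

# High scorers (above 55) provide no data, so only lower scorers are observed
observed = [(s, o) for s, o in zip(scores, other) if s <= 55]
obs_scores = [s for s, _ in observed]
obs_other = [o for _, o in observed]

full_mean, obs_mean = mean(scores), mean(obs_scores)
full_r, obs_r = pearson_r(scores, other), pearson_r(obs_scores, obs_other)

print(f"mean: full={full_mean:.1f}, observed={obs_mean:.1f}")
print(f"r:    full={full_r:.2f}, observed={obs_r:.2f}")
```

The observed mean comes out lower than the full mean, and the observed correlation is weaker because the range of scores has been restricted.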

Question 10

Q

Pattern of missing data:
- can’t predict when score will be missing from dataset
- can’t predict what the value of datapoint would be, given that it is missing

DATA NOT MISSING SYSTEMATICALLY

Answer

A

Missing Completely At Random (MCAR)

Question 11

Q

Pattern of missing data:
- can predict when a score will be missing from dataset
- cannot predict what the value of datapoint would have been

DATA ARE MISSING SYSTEMATICALLY BUT THIS DOESN’T INTRODUCE BIAS

Answer

A

Missing At Random (MAR)

Results are still not fully generalisable (e.g., conclusions are based mostly on participants with low endorsement of masculinity), but this does not introduce a large amount of bias, so the findings are not completely wrong

Question 12

Q

Pattern of missing data:
- may (or may not) be able to predict when a score will be missing from dataset
- Can predict what the value of the datapoint is likely to be

DATA ARE MISSING SYSTEMATICALLY AND THIS DOES INTRODUCE BIAS

Answer

A

Missing Not At Random (MNAR)
e.g., participants who don’t provide data are likely to be high scorers
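
The three patterns in Questions 10 to 12 can be sketched with a small simulation. Everything here is an assumption made for illustration: the group labels, the missingness rates, and the choice to keep the group unrelated to the score (which matches the MAR card's "cannot predict the value"; if the group were related to the score, a naive mean could still be biased).

```python
import random

rng = random.Random(7)  # fixed seed so the run is repeatable


def mean(xs):
    return sum(xs) / len(xs)


# Hypothetical sample: each person has a group label and a score.
# The group is unrelated to the score in this sketch.
people = [(rng.choice("AB"), rng.gauss(50, 10)) for _ in range(5000)]


def observed_scores(keep):
    """Scores of the people for whom keep(group, score) returns True."""
    return [score for group, score in people if keep(group, score)]


# MCAR: every score has the same 30% chance of being missing
mcar = observed_scores(lambda g, s: rng.random() > 0.30)

# MAR: missingness depends on the observed group (B skips more often),
# not on the score itself
mar = observed_scores(lambda g, s: rng.random() > (0.50 if g == "B" else 0.10))

# MNAR: missingness depends on the unseen score (high scorers mostly withhold it)
mnar = observed_scores(lambda g, s: s <= 60 or rng.random() > 0.80)

full_mean = mean([s for _, s in people])
for label, obs in [("MCAR", mcar), ("MAR", mar), ("MNAR", mnar)]:
    print(f"{label}: n={len(obs)}, mean={mean(obs):.1f} (full mean {full_mean:.1f})")
```

In this setup the MCAR and MAR means stay close to the full-sample mean, while the MNAR mean is pulled downwards because the missing values are mostly high scores.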

Question 13

Q

How to deal with inadequately sampled participants?

Answer

A

Can delete them, as it is basically as if they didn’t participate anyway (provided the missingness is not attributable to systematic factors and is unlikely to bias parameter estimates)

Question 14

Q

2 types of deletion strategies when dealing with MAR and MCAR data

Answer

A

Listwise deletion
Pairwise deletion

Question 15

Q

Deletion strategy that removes any participants with any missing data from all analyses

Answer

A

Listwise deletion

Question 16

Q

When is listwise deletion generally only seen as appropriate?

Answer

A

When less than 5% of data are missing
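
A quick way to check the 5% guideline, sketched on made-up responses where None marks a missing value. The card does not say whether the 5% refers to individual data points or to participants, so both readings are computed here as an assumption.

```python
# Hypothetical responses: 10 participants x 4 items, None marks a missing value
responses = [
    [4, 3, 5, 2],
    [5, None, 4, 4],
    [3, 3, 2, 1],
    [4, 4, 4, 5],
    [2, 5, 3, 3],
    [1, 2, 2, 4],
    [5, 5, 4, 3],
    [3, 4, 1, 2],
    [4, 2, 5, 5],
    [2, 3, 3, 4],
]

n_cells = sum(len(row) for row in responses)
n_missing_cells = sum(value is None for row in responses for value in row)
n_incomplete_cases = sum(any(value is None for value in row) for row in responses)

# Reading 1: share of individual data points that are missing
pct_cells = 100 * n_missing_cells / n_cells
# Reading 2: share of participants with at least one missing value
pct_cases = 100 * n_incomplete_cases / len(responses)

print(f"Missing data points: {pct_cells}%")      # 2.5%
print(f"Incomplete participants: {pct_cases}%")  # 10.0%
```

The two readings can land on different sides of the threshold, so it is worth confirming which one the guideline intends.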

Question 17

Q

Deletion strategy that removes participants only from the analyses involving the variables on which they have missing data

Answer

A

Pairwise deletion

Question 18

Q

Major drawback of pairwise deletion

Answer

A

Can distort population parameter estimates: different analyses are based on different people, which distorts the overall picture
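
The two deletion strategies from Questions 14 to 18 can be sketched on a tiny made-up dataset. The variable names (age, stress, sleep) and values are hypothetical; None marks a missing value.

```python
from itertools import combinations

# Hypothetical dataset: five participants, three variables
data = [
    {"id": 1, "age": 21, "stress": 30, "sleep": 7},
    {"id": 2, "age": 25, "stress": None, "sleep": 6},
    {"id": 3, "age": 30, "stress": 42, "sleep": None},
    {"id": 4, "age": None, "stress": 35, "sleep": 8},
    {"id": 5, "age": 28, "stress": 38, "sleep": 5},
]
variables = ["age", "stress", "sleep"]

# Listwise deletion: drop anyone with any missing value, for every analysis
listwise = [row for row in data if all(row[v] is not None for v in variables)]
print("Listwise keeps ids:", [row["id"] for row in listwise])  # [1, 5]

# Pairwise deletion: for each pair of variables, drop only the people
# missing one of those two variables
pairwise_ids = {}
for a, b in combinations(variables, 2):
    pairwise_ids[(a, b)] = [
        row["id"] for row in data if row[a] is not None and row[b] is not None
    ]

for pair, ids in pairwise_ids.items():
    print(f"Pairwise {pair} keeps ids: {ids}")
```

Each pairwise analysis keeps three people, but not the same three, which is the drawback described in Question 18.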

Question 19

Q

Review slide - substitution/estimation strategies

Question 20