Week 3 - Basic Data Cleaning/Missing Data Flashcards

1
Q

3 sources of missing data

A
  1. From some participants
  2. From some variables
  3. From a subset of people/measures (only some participants didn’t provide a response on a particular variable)
2
Q

What question is asked when there is data missing from some participants?

A

Are people who didn’t provide data somehow different from those who did provide data?

3
Q

What question is asked when there is data missing from some variables?

A

Why would people not provide data here? Is the study affected?

4
Q

What question is asked when there is data missing from a subset of people/measures?

A

Why would only some people withhold a response to some items?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Why care about missing data?

A

It can influence how representative the sample is of the population we wish to generalise to
1. Undermines validity
- Estimated parameters might not be equivalent to population parameters
- If estimated parameters are biased, it undermines validity in the study
- No longer a true reflection of what’s happening in the population
2. Can compromise statistical power
- Reducing sample size
- Influencing ability to correctly detect an effect if it exists

6
Q

What checks are required to determine how problematic missing data might be?

A

Checking whether participants and variables have been adequately assessed, and determining the pattern of missing data

7
Q

Why is it a problem if some participants haven’t been adequately sampled?

A

It’s as if they haven’t participated in the study at all:
- Reduces power
- If due to systematic reasons, the validity of study is undermined

8
Q

Why is it a problem if some variables haven’t been adequately assessed?

A

It’s as if we haven’t assessed the item at all:
- Can compromise ability to address research questions (especially if on variables of interest)
- If due to systematic reasons, the validity of our study is undermined and can lead to inaccurate conclusions

9
Q

What is meant by biased estimated parameters?

A

Estimates that systematically differ from the population values, e.g., an absence of high scorers means the sample underestimates the mean. In addition, relationships between that variable and other variables are likely to be weakened because of restriction of range (the majority of scores are low)
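
The example in this answer can be illustrated with a small sketch. The data below are made up, and the cut-off (scores above 5 are missing) is an assumption chosen only to show the effect on the mean and on a correlation.

```python
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical complete data for two related variables
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]

# Assumption for illustration: high scorers on x (x > 5) gave no data
observed = [(xi, yi) for xi, yi in zip(x, y) if xi <= 5]
x_obs = [xi for xi, _ in observed]
y_obs = [yi for _, yi in observed]

full_mean = mean(x)      # mean with everyone included
obs_mean = mean(x_obs)   # mean once high scorers are absent
full_r = pearson_r(x, y)
obs_r = pearson_r(x_obs, y_obs)

print(f"mean: full={full_mean}, observed={obs_mean}")
print(f"r:    full={full_r:.2f}, observed={obs_r:.2f}")
```

In this toy dataset the observed mean is lower than the full mean, and the correlation is weaker in the range-restricted subset.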

10
Q

Pattern of missing data:
- can’t predict when a score will be missing from the dataset
- can’t predict what the value of the datapoint would be, given that it is missing

DATA NOT MISSING SYSTEMATICALLY

A

Missing Completely At Random (MCAR)

11
Q

Pattern of missing data:
- can predict when a score will be missing from dataset
- cannot predict what the value of datapoint would have been

DATA ARE MISSING SYSTEMATICALLY BUT THIS DOESN’T INTRODUCE BIAS

A

Missing At Random (MAR)

Results are still not fully generalisable because conclusions are based on, for example, mostly low endorsement of masculinity, but this does not introduce a huge amount of bias, so the findings are not completely wrong

12
Q

Pattern of missing data:
- may (or may not) be able to predict when a score will be missing from dataset
- Can predict what the value of the datapoint is likely to be

DATA ARE MISSING SYSTEMATICALLY AND THIS DOES INTRODUCE BIAS

A

Missing Not At Random (MNAR)
e.g., if someone doesn’t provide data, they are likely to be a high scorer
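
The three patterns (MCAR, MAR, MNAR) can be contrasted with a rough simulation. Everything here is hypothetical: the `group` variable, the missingness probabilities, and the cut-off of 8 are assumptions picked only to make each mechanism visible.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical participants: an observed group plus a score from 1 to 10.
# Both groups get the same spread of scores, so group does not predict score.
people = [{"group": "A" if i % 2 == 0 else "B", "score": (i // 2) % 10 + 1}
          for i in range(100)]

def mcar(p):
    # MCAR: missingness ignores everything about the person
    return rng.random() < 0.3

def mar(p):
    # MAR: missingness depends on an observed variable (group), not on the score
    return p["group"] == "B" and rng.random() < 0.6

def mnar(p):
    # MNAR: missingness depends on the missing value itself (high scorers withhold)
    return p["score"] >= 8

def observed_scores(is_missing):
    """Scores that remain after applying a missingness rule."""
    return [p["score"] for p in people if not is_missing(p)]

def average(values):
    return sum(values) / len(values)

true_mean = average([p["score"] for p in people])
mcar_mean = average(observed_scores(mcar))
mar_mean = average(observed_scores(mar))
mnar_scores = observed_scores(mnar)
mnar_mean = average(mnar_scores)

# Under the MAR rule, only group B members can ever be missing
mar_missing_groups = {p["group"] for p in people if mar(p)}

print(true_mean, mcar_mean, mar_mean, mnar_mean)
```

Under the MNAR rule the observed mean must fall below the true mean, because every high scorer is dropped. The MCAR and MAR means vary with the random draws, so the sketch does not claim specific values for them.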

13
Q

How to deal with inadequately sampled participants?

A

Can delete them, as it is basically as if they didn’t participate anyway (provided the missing data are not attributable to systematic factors and deletion is unlikely to bias parameter estimates)

14
Q

2 types of deletion strategies when dealing with MAR and MCAR data

A

Listwise deletion
Pairwise deletion

15
Q

Deletion strategy that removes any participants with any missing data from all analyses

A

Listwise
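
A minimal sketch of listwise deletion, assuming missing values are stored as `None`. The variable names and values are invented for illustration.

```python
# Hypothetical dataset: one dict per participant, None marks a missing value
rows = [
    {"id": 1, "age": 21, "stress": 4, "sleep": 7},
    {"id": 2, "age": None, "stress": 3, "sleep": 6},
    {"id": 3, "age": 25, "stress": None, "sleep": 8},
    {"id": 4, "age": 30, "stress": 5, "sleep": 5},
]

# Listwise deletion: keep only participants with no missing values at all
complete = [r for r in rows if all(v is not None for v in r.values())]

kept_ids = [r["id"] for r in complete]
print(kept_ids)
```

Participants 2 and 3 each miss a single value, yet both are dropped from every analysis, which is why this strategy can cost a lot of sample size.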

16
Q

When is listwise deletion generally seen as appropriate?

A

Only when less than 5% of the data are missing
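
One way to check this threshold is to count the missing cells. The sketch below assumes "5% of data" refers to the proportion of missing cells across all variables. The card does not say whether cells or participants are meant, so treat that reading as an assumption. The data are made up.

```python
# Hypothetical dataset: one list per variable, None marks a missing value
data = {
    "age":    [21, 22, 25, 30, 19, 24, 28, 31, 26, 23],
    "stress": [4, 3, None, 5, 2, 4, 3, 5, 4, 2],
    "sleep":  [7, 6, 8, 5, 7, 6, 8, 7, 6, 5],
    "mood":   [3, 4, 2, 5, 3, 4, 4, 2, 3, 5],
}

THRESHOLD_PCT = 5.0  # rule-of-thumb cut-off from the card

total_cells = sum(len(values) for values in data.values())
missing_cells = sum(v is None for values in data.values() for v in values)
pct_missing = 100 * missing_cells / total_cells

listwise_ok = pct_missing < THRESHOLD_PCT
print(f"{pct_missing}% missing, listwise deletion acceptable: {listwise_ok}")
```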

17
Q

Deletion strategy that removes participants with missing data only from the analyses involving the variables on which they are missing data

A

Pairwise

18
Q

Major drawback of pairwise deletion

A

Can distort population parameter estimates: different analyses include different people, which distorts the overall picture
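
This drawback can be seen by listing who contributes to each pair of variables under pairwise deletion. The dataset and variable names below are hypothetical, with `None` marking a missing value.

```python
from itertools import combinations

# Hypothetical dataset: one dict per participant, None marks a missing value
rows = [
    {"id": 1, "age": 21, "stress": 4, "sleep": 7},
    {"id": 2, "age": None, "stress": 3, "sleep": 6},
    {"id": 3, "age": 25, "stress": None, "sleep": 8},
    {"id": 4, "age": 30, "stress": 5, "sleep": 5},
]
variables = ["age", "stress", "sleep"]

# Pairwise deletion: for each pair of variables, keep every participant
# who has data on both, even if they are missing something else
pair_ids = {}
for a, b in combinations(variables, 2):
    pair_ids[(a, b)] = [r["id"] for r in rows
                        if r[a] is not None and r[b] is not None]

for pair, ids in pair_ids.items():
    print(pair, "uses participants", ids)
```

Each pair ends up based on a different set of participants, so statistics computed for different pairs do not describe the same group of people.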

19
Q

Review slide - substitution/estimation strategies

A
20
Q

Identify the pattern of missing data

A

MCAR

21
Q

Identify the pattern of missing data

A

MNAR

22
Q

Identify the pattern of missing data

A

MAR

23
Q

Identify the pattern of missing data

A

MNAR

24
Q

Identify the pattern of missing data

A

MCAR