Week 1 Data screening and missing values Flashcards

1
Q

What is the difference between MCAR, MAR and MNAR?

A

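Missing completely at random (MCAR) – whether a value is missing is unrelated to any variable, observed or unobserved.

Missing at random (MAR) – missingness is related to other observed variables (e.g. gender, SES), but not to the missing value itself.

Missing not at random (MNAR) – missingness is related to the missing value itself (e.g. people with higher incomes, or people embarrassed by their answer, skip the question).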
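
A small simulation can make the three mechanisms concrete. This is a sketch in plain Python using hypothetical, randomly generated data: income depends partly on gender, and the missingness rates (20%, 5%/35%, 5%/40%) are arbitrary choices for illustration.

```python
import random
import statistics

# Hypothetical simulated data: income depends partly on gender (0 or 1).
random.seed(1)
N = 10_000
gender = [random.randint(0, 1) for _ in range(N)]
income = [50 + 10 * g + random.gauss(0, 10) for g in gender]

def observed_mean(keep):
    """Mean income among the cases whose value was actually recorded."""
    return statistics.mean(x for x, k in zip(income, keep) if k)

# MCAR: every case has the same 20% chance of being missing.
mcar = [random.random() > 0.20 for _ in range(N)]
# MAR: missingness depends on an observed variable (gender), not on income itself.
mar = [random.random() > (0.35 if g == 1 else 0.05) for g in gender]
# MNAR: missingness depends on the (unobserved) income value itself.
mnar = [random.random() > (0.40 if x > 60 else 0.05) for x in income]

true_mean = statistics.mean(income)
for label, keep in [("MCAR", mcar), ("MAR", mar), ("MNAR", mnar)]:
    print(f"{label}: observed mean {observed_mean(keep):.1f} vs true mean {true_mean:.1f}")
```

Under MCAR the observed mean stays close to the true mean. Under MAR and MNAR it drifts downward because higher incomes go missing more often; the difference is that under MAR the cause (gender) is observed and can be adjusted for, whereas under MNAR the cause is the missing value itself.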

2
Q

Dealing with missing values: What are the pros and cons of row deletion?

A

Pro – very simple

Con – unless the data are MCAR, deleting cases can introduce bias (e.g. ending up with more males than females); it also shrinks the sample and so reduces power.
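
A minimal sketch of row (listwise) deletion in plain Python, using a hypothetical six-case dataset where None marks a missing value. Because the missing anxiety scores all belong to females, dropping incomplete rows changes the gender balance.

```python
# Hypothetical mini dataset: None marks a missing response.
cases = [
    {"gender": "M", "anxiety": 12},
    {"gender": "F", "anxiety": None},
    {"gender": "F", "anxiety": None},
    {"gender": "M", "anxiety": 15},
    {"gender": "F", "anxiety": 9},
    {"gender": "M", "anxiety": 11},
]

def listwise_delete(rows):
    """Keep only cases with no missing value on any variable."""
    return [r for r in rows if all(v is not None for v in r.values())]

complete = listwise_delete(cases)
print(len(cases), "->", len(complete))  # 6 -> 4
print(f"females remaining: {sum(r['gender'] == 'F' for r in complete)}")  # 1
```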

3
Q

Dealing with missing values: What are the pros and cons of mean/median imputation?

A

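Pro – simple

Con – artificially reduces variability, because every missing value is replaced with the same central value; this shrinks the standard deviation and can distort correlations.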
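
The loss of variability is easy to demonstrate. This sketch uses Python's standard `statistics` module and a hypothetical set of ten scores, three of them missing; the imputed values add nothing to the sum of squares but still count towards n, so the SD falls.

```python
import statistics

# Hypothetical scores; None marks a missing value.
scores = [4, 7, None, 10, None, 3, 8, None, 6, 12]

observed = [x for x in scores if x is not None]
mean = statistics.mean(observed)
imputed = [mean if x is None else x for x in scores]  # mean imputation

print(round(statistics.stdev(observed), 2))  # SD of the 7 real values
print(round(statistics.stdev(imputed), 2))   # smaller SD after imputation
```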

4
Q

What is Cohen’s d, and what are the conventional effect sizes?

A

Cohen’s d is a standardised effect size: the difference between two group means divided by their pooled standard deviation, so it expresses how far apart the groups are in SD units. Conventional benchmarks: 0.20 = small, 0.50 = medium, 0.80 = large.
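
A sketch of the calculation in plain Python (standard library only), using the pooled-SD form of d and Cohen's benchmarks. The function names and the example groups are hypothetical.

```python
import math
import statistics

def cohens_d(group1, group2):
    """Standardised mean difference using the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled_sd

def label(d):
    """Cohen's conventional benchmarks, applied to the absolute value of d."""
    d = abs(d)
    if d >= 0.80:
        return "large"
    if d >= 0.50:
        return "medium"
    if d >= 0.20:
        return "small"
    return "below small"

treatment = [5, 6, 7, 8, 9]  # hypothetical scores
control = [4, 5, 6, 7, 8]
d = cohens_d(treatment, control)
print(round(d, 2), label(d))  # 0.63 medium
```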

5
Q

Power (the ability of a test to detect an effect when one actually exists) can be affected by which 3 things?

A

Sample size
Effect size (e.g. Cohen’s d)
Alpha level (the significance criterion, e.g. .05)
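
All three factors can be seen in an approximate power formula for a two-tailed, two-sample comparison with equal group sizes: power ≈ Φ(d·√(n/2) − z₁₋α/2). This sketch uses the normal approximation (not the exact t-based calculation) via Python's `statistics.NormalDist`; the function name is hypothetical.

```python
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-tailed, two-sample test (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)          # e.g. 1.96 for alpha = .05
    shift = d * (n_per_group / 2) ** 0.5       # expected test statistic under H1
    return z.cdf(shift - z_crit)               # ignores the negligible opposite tail

print(round(approx_power(0.5, 20), 2))                # small sample
print(round(approx_power(0.5, 64), 2))                # larger sample
print(round(approx_power(0.8, 64), 2))                # larger effect
print(round(approx_power(0.5, 64, alpha=0.001), 2))   # stricter alpha
```

For d = 0.5 with 64 participants per group this gives roughly 0.8; raising n or d, or relaxing alpha, each increases power.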

6
Q

What is a p value?

A

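It is the probability of obtaining a difference at least as large as the one observed if only random chance were operating (i.e. if the null hypothesis were true). The p value is compared against alpha, the significance criterion, and as alpha increases, power increases. For example, testing at α = .05 gives more power than testing at α = .001, because a less strict criterion makes a real effect easier to detect (at the cost of more Type I errors).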
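
A sketch of a two-tailed p value for a z statistic, using Python's `statistics.NormalDist`. The observed z of 2.50 is a hypothetical value chosen to show the same result being significant at α = .05 but not at α = .001.

```python
from statistics import NormalDist

def two_tailed_p(z):
    """Probability of a z at least this extreme (either direction) if H0 is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

z_observed = 2.50  # hypothetical test statistic
p = two_tailed_p(z_observed)
for alpha in (0.05, 0.001):
    print(f"alpha = {alpha}: p = {p:.4f}, significant = {p < alpha}")
```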

7
Q

How can you screen for out-of-range/miscoded data?

A

Run a frequency distribution table and look for values outside the possible range (e.g. a 6 on a 1–5 scale).
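
The same check sketched in plain Python with `collections.Counter`: build a frequency table and flag any value outside the valid range. The 1–5 item and its responses are hypothetical.

```python
from collections import Counter

# Hypothetical responses to a 1-5 Likert item; 6 and 55 look like keying errors.
responses = [3, 4, 5, 2, 6, 4, 1, 3, 55, 4]
VALID = range(1, 6)

freq = Counter(responses)
for value in sorted(freq):
    flag = "" if value in VALID else "  <- out of range"
    print(f"{value:>3}: {freq[value]}{flag}")

out_of_range = [v for v in responses if v not in VALID]
```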

8
Q

How do you identify a univariate outlier?

A

Calculate z scores. z scores have cut-off points that correspond to p values, so a case can be defined as a statistical outlier if its z score falls beyond ±3.29 (p < .001) or, with a more lenient criterion, beyond ±1.96 (p < .05).
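
A sketch of z-score screening with Python's `statistics` module. The data are hypothetical: nineteen scores around 10 and one score of 40. The cut-off defaults to 3.29 but can be set to 1.96.

```python
import statistics

def z_scores(values):
    """Standardise each value using the sample mean and SD."""
    m, sd = statistics.mean(values), statistics.stdev(values)
    return [(x - m) / sd for x in values]

def univariate_outliers(values, cutoff=3.29):
    """Return the values whose |z| exceeds the cut-off."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > cutoff]

scores = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10,
          10, 11, 9, 10, 12, 8, 10, 11, 9, 40]  # hypothetical
print(univariate_outliers(scores))               # [40]
print(univariate_outliers(scores, cutoff=1.96))  # [40]
```

Note that an extreme case also inflates the SD used to compute every z, so in small samples an outlier can partly mask itself.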

9
Q

Decisions relating to univariate outliers depend on…?

A
  1. patterns of answers to other variables
  2. expectations that arise from your knowledge of the area (past research and theory).
  3. sample size.
  4. statistical technique you intend to use.
10
Q

What is system missing data?

A

System missing data is a blank cell (shown in SPSS as a dot) where someone has not provided a response.

11
Q

What is discrete missing data?

A

Discrete missing data are specific values you define in SPSS as missing (e.g. 99 = refused, 98 = not applicable). SPSS excludes them from analyses, while the code records why the response is missing.
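
Outside SPSS the same idea can be sketched in plain Python: treat the user-defined codes as missing while keeping a tally of the reasons. The codes 98 and 99 and the function name are hypothetical.

```python
# Hypothetical discrete missing-value codes and their meanings.
MISSING_CODES = {98: "not applicable", 99: "refused"}

raw_ages = [23, 99, 31, 98, 27, 99]

def recode_missing(values, codes):
    """Replace user-defined missing codes with None and count each reason."""
    cleaned, reasons = [], {}
    for v in values:
        if v in codes:
            cleaned.append(None)
            reasons[codes[v]] = reasons.get(codes[v], 0) + 1
        else:
            cleaned.append(v)
    return cleaned, reasons

cleaned, reasons = recode_missing(raw_ages, MISSING_CODES)
print(cleaned)  # [23, None, 31, None, 27, None]
print(reasons)  # {'refused': 2, 'not applicable': 1}
```

Without this step, a code such as 99 would be analysed as a real age and would inflate the mean.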

12
Q

What are the two main reasons that data can be missing?

A

Random reasons – reasons that differ for each respondent (e.g. accidentally skipped a question, dropped out of the study, ran out of time, didn’t know the answer).
Systematic reasons – reasons shared by more than one respondent, so they affect some or all respondents in the same way (e.g. unable to answer the question, the question is inappropriate, problems with the question, equipment failure).

13
Q

What can we do once we understand the missing values?

A
Delete cases (listwise)
Delete variables (those with many missing values)
Replace missing values with estimates (imputation)
14
Q

What are some popular methods for replacing missing values when data are MCAR?

A

Mean substitution
Expectation maximisation (EM)
Regression imputation
Full information maximum likelihood (FIML)

15
Q

What does the univariate output tell us?

A

Gives the number and percentage of missing values for each variable.
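
The core of this table, a count and percentage of missing values per variable, can be sketched in plain Python. The dataset and function name are hypothetical.

```python
# Hypothetical dataset: one dict per case, None = missing.
data = [
    {"age": 21, "anxiety": 14, "sleep": None},
    {"age": None, "anxiety": 9, "sleep": 7},
    {"age": 34, "anxiety": None, "sleep": None},
    {"age": 28, "anxiety": 11, "sleep": 6},
]

def missing_summary(rows):
    """Return {variable: (missing count, missing percent)}."""
    n = len(rows)
    summary = {}
    for var in rows[0]:
        count = sum(r[var] is None for r in rows)
        summary[var] = (count, 100 * count / n)
    return summary

for var, (count, pct) in missing_summary(data).items():
    print(f"{var:<8} missing: {count} ({pct:.0f}%)")
```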

16
Q

What does the ‘separate variance T-test’ output tell us?

A

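For each variable, this puts all cases with a missing value in one group and all cases with a present value in another, then compares the two groups on the other variables in the file. A significant difference suggests the missingness is related to that other variable, so the data are probably not MCAR.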
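
"Separate variance" means the two groups' variances are not pooled (Welch's t-test). This sketch computes the Welch t statistic and its approximate degrees of freedom with the standard library; the income/age data are hypothetical. Getting a p value needs the t distribution, for example `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
import statistics

def welch_t(a, b):
    """Separate-variance (Welch) t statistic and approximate degrees of freedom."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical data: does age differ between people who did / did not report income?
income = [None, 52, None, 61, 47, None, 58, 55]
age = [19, 35, 22, 41, 38, 20, 44, 37]

age_missing = [a for a, inc in zip(age, income) if inc is None]
age_present = [a for a, inc in zip(age, income) if inc is not None]
t, df = welch_t(age_missing, age_present)
print(f"t = {t:.2f}, df = {df:.1f}")
```

Here younger respondents are the ones skipping the income question, which would argue against MCAR.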

17
Q

What does the ‘missing and tabulated patterns’ output tell us?

A

This gives you a list of cases with missing values. An “X” on this table indicates missing data.
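
A sketch of how such a pattern table can be built in plain Python with `collections.Counter`: each case is reduced to a tuple with "X" where a value is missing, and identical patterns are counted. The dataset is hypothetical.

```python
from collections import Counter

# Hypothetical dataset; None = missing.
variables = ["age", "anxiety", "sleep"]
data = [
    {"age": 21, "anxiety": 14, "sleep": None},
    {"age": 30, "anxiety": 9, "sleep": None},
    {"age": 34, "anxiety": None, "sleep": None},
    {"age": 28, "anxiety": 11, "sleep": 6},
    {"age": 25, "anxiety": 12, "sleep": 8},
]

# "X" marks a missing value; a blank marks a present one.
patterns = Counter(
    tuple("X" if row[v] is None else " " for v in variables) for row in data
)

print(" ".join(f"{v:^7}" for v in variables), " cases")
for pat, count in patterns.most_common():
    print(" ".join(f"{m:^7}" for m in pat), f" {count}")
```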

18
Q

What does the Little’s MCAR test (EM estimated statistics) output tell us?

A

These are the means, SDs, etc. computed from the available data and again from SPSS’s EM “best guess” of what the missing values might have been. Little’s MCAR test accompanies this output: a non-significant result means the data are plausibly missing completely at random (MCAR), whereas a significant result (p < .05) means the missingness has a pattern. When there is a pattern to the missingness, the results of your analysis may be biased.