Week 1 Data screening and missing values Flashcards

Question 1

Q

What is the difference between MCAR, MAR and MNAR?

Answer

A

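Missing completely at random (MCAR) – Missingness does not depend on anything, observed or unobserved.

Missing at random (MAR) – Missingness is related to other observed variables (male vs. female, SES, etc.) but not to the missing value itself.

Missing not at random (MNAR) – Missingness is related to the value that is missing (for example, leaving an item blank out of embarrassment about the true answer).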
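
The three mechanisms can be compared with a small simulation. This is only a sketch on invented data: the group means, the missingness probabilities and the function name `simulate_missingness` are assumptions made for illustration.

```python
import random

def simulate_missingness(n=20_000, seed=1):
    """Mean of the observed incomes under each mechanism, plus the true mean."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        sex = rng.choice(["male", "female"])
        # Invented population: income differs by sex.
        income = rng.gauss(55 if sex == "male" else 45, 10)
        people.append((sex, income))

    # Each mechanism returns the probability that income is MISSING.
    mechanisms = {
        "MCAR": lambda sex, income: 0.3,                          # depends on nothing
        "MAR": lambda sex, income: 0.5 if sex == "male" else 0.1,  # depends on observed sex
        "MNAR": lambda sex, income: 0.7 if income > 50 else 0.1,   # depends on income itself
    }

    def mean(values):
        return sum(values) / len(values)

    result = {"true": mean([income for _, income in people])}
    for name, p_missing in mechanisms.items():
        observed = [inc for sex, inc in people if rng.random() >= p_missing(sex, inc)]
        result[name] = mean(observed)
    return result

means = simulate_missingness()
for name, value in means.items():
    print(f"{name:5s} {value:.2f}")
```

In this setup the MCAR mean stays close to the true mean, while the MAR and MNAR means drift away from it. Under MAR the drift can be corrected by taking sex into account; under MNAR it cannot be corrected from the observed data alone.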

Question 2

Q

Dealing with missing values: What are the pros and cons of row deletion?

Answer

A

Pro - Very simple

Con – The deletions may not be random, and if they are not, this can introduce bias (ending up with more males than females, for example).
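
A minimal sketch of row (listwise) deletion on a tiny invented data set, showing how the make-up of the sample can shift once incomplete rows are dropped. The rows and the helper names are hypothetical.

```python
# Invented data: None marks a missing value.
rows = [
    {"sex": "male", "age": 34, "income": 52},
    {"sex": "female", "age": 29, "income": None},
    {"sex": "female", "age": None, "income": 47},
    {"sex": "male", "age": 41, "income": 60},
    {"sex": "female", "age": 38, "income": 45},
    {"sex": "female", "age": 25, "income": None},
]

def listwise_delete(rows):
    """Keep only the rows that have no missing value on any variable."""
    return [r for r in rows if all(v is not None for v in r.values())]

def share_female(rows):
    return sum(r["sex"] == "female" for r in rows) / len(rows)

complete = listwise_delete(rows)
print(len(rows), "rows before,", len(complete), "rows after")
print("share female before:", round(share_female(rows), 2))
print("share female after: ", round(share_female(complete), 2))
```

Here the missing values fall mostly on female respondents, so deletion leaves a sample with a lower share of females than the original.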

Question 3

Q

Dealing with missing values: What are the pros and cons of mean/median imputation?

Answer

A

Pro – simple

Con – Artificially reduces variability, because every imputed case is given the same central value.
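
The loss of variability can be seen in a short sketch. The scores are invented, and `None` stands in for a missing value.

```python
from statistics import mean, variance

scores = [12, 15, None, 20, None, 9, 14, None, 18, 11]

observed = [s for s in scores if s is not None]
fill = mean(observed)  # the value every missing case receives
imputed = [fill if s is None else s for s in scores]

var_observed = variance(observed)  # sample variance of the real scores
var_imputed = variance(imputed)    # sample variance after mean imputation

print("mean before/after:", round(mean(observed), 2), round(mean(imputed), 2))
print("variance before/after:", round(var_observed, 2), round(var_imputed, 2))
```

The mean is unchanged, but the imputed cases add nothing to the sum of squared deviations while still increasing n, so the variance shrinks.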

Question 4

Q

What is Cohen’s d, and what are the conventional effect sizes?

Answer

A

Cohen’s d is a standardized measure of effect size: the difference between two group means expressed in standard deviation units. By convention, 0.20 = small, 0.50 = medium, 0.80 = large.
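
A minimal sketch of the calculation, assuming two independent groups and a pooled standard deviation. The function names are hypothetical, and the "negligible" label for values under 0.20 is an assumption added here rather than part of the card.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Mean difference divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

def effect_label(d):
    """Apply the 0.20 / 0.50 / 0.80 benchmarks to the absolute value of d."""
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "negligible"  # assumed label, not from the card

d = cohens_d([2, 4, 6], [1, 3, 5])
print(round(d, 2), effect_label(d))
```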

Question 5

Q

Power (ability of a test to find an effect when it actually exists) can be affected by what 3 things?

Answer

A

Sample size
Effect size (e.g. Cohen’s d)
Alpha (the significance level that the p value is compared against)
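
How the three factors move power can be sketched with a normal approximation for a two-sided comparison of two equal-sized groups. This is an approximation chosen for illustration (a t-based calculation gives slightly different numbers), and `approx_power` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power for a two-sided, two-group test (normal approximation)."""
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha / 2)  # cut-off implied by alpha
    shift = d * sqrt(n_per_group / 2)    # expected test statistic if the effect is d
    return z.cdf(shift - critical) + z.cdf(-shift - critical)

baseline = approx_power(d=0.5, n_per_group=64, alpha=0.05)
print("baseline:        ", round(baseline, 3))
print("larger sample:   ", round(approx_power(0.5, 128, 0.05), 3))
print("larger effect:   ", round(approx_power(0.8, 64, 0.05), 3))
print("stricter alpha:  ", round(approx_power(0.5, 64, 0.001), 3))
```

Increasing the sample size or the effect size raises power; lowering alpha from .05 to .001 reduces it.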

Question 6

Q

What is a p value?

Answer

A

It is the probability of obtaining a difference at least as large as the one observed if there were really no effect (that is, by random chance alone). As alpha (the cut-off that the p value is compared against) increases, power increases. For example, testing at p < .05 has more power than testing at p < .001, as there is more chance of detecting an effect or relationship that exists.
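
One way to make "could have occurred by chance" concrete is an exact permutation test: reshuffle the group labels in every possible way and count how often the mean difference is at least as large as the observed one. This is a swapped-in illustration, not the procedure SPSS uses, and `permutation_p_value` is a hypothetical helper suited to small samples only.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    indices = range(len(pooled))
    extreme = total = 0
    # Every way of choosing which cases are relabelled as group A.
    for chosen in combinations(indices, len(group_a)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

p = permutation_p_value([1, 2, 3], [4, 5, 6])
print(p)
```

With only three cases per group, even the most extreme split yields p = 2/20 = .10, which also shows how a small sample limits the ability to reach p < .05.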

Question 7

Q

How can you screen for out of range/miscoded data?

Answer

A

Run a frequency distribution table for each variable and look for values outside the valid range.
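
A minimal sketch of the same check outside SPSS, assuming a 5-point item coded 1 to 5. The responses are invented.

```python
from collections import Counter

# Invented responses to a 1-5 item; 55 and 0 are likely typing errors.
responses = [1, 3, 5, 4, 2, 55, 3, 0, 4, 5, 3]
VALID = range(1, 6)  # assumed valid codes: 1, 2, 3, 4, 5

table = Counter(responses)  # frequency of each value
for value in sorted(table):
    flag = "" if value in VALID else "  <- out of range"
    print(f"{value:>3}: {table[value]}{flag}")

out_of_range = sorted(v for v in table if v not in VALID)
```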

Question 8

Q

How do you identify a univariate outlier?

Answer

A

Calculate the z score for each case. z scores have cut-off points that correspond to two-tailed p values, so a case can be defined as a statistical outlier if its z score falls beyond ±3.29 (p < .001) or, more liberally, beyond ±1.96 (p < .05).
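
A short sketch of the procedure, using the sample standard deviation and the ±3.29 cut-off by default. The data and function names are invented for illustration.

```python
from statistics import NormalDist, mean, stdev

def z_scores(values):
    """Standardize each value against the sample mean and SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def flag_outliers(values, cutoff=3.29):
    """Return the values whose absolute z score exceeds the cut-off."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > cutoff]

def two_tailed_p(z):
    """Two-tailed p value that corresponds to a z cut-off."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

data = list(range(45, 56)) * 2 + [150]  # invented scores with one extreme case
print(flag_outliers(data))
print(round(two_tailed_p(1.96), 3), round(two_tailed_p(3.29), 3))
```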

Question 9

Q

Decisions relating to univariate outliers depend on…?

Answer

A

Patterns of answers to other variables
Expectations that arise from your knowledge of the area (past research and theory)
Sample size
The statistical technique you intend to use

Question 10

Q

What is system missing data?

Answer

A

System missing data is where you find a blank, or perhaps a dot, in the cell where someone has not provided a response.

Question 11

Q

What is discrete missing data?

Answer

A

Discrete missing data is where you give SPSS a designated code value that flags a response as missing and records why it is missing.
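
The distinction between system missing and discrete missing values can be sketched as follows. The codes 98 and 99 and their meanings are hypothetical, since the researcher chooses them; `None` plays the role of the blank cell.

```python
from collections import Counter

# Hypothetical codes: the numbers and reasons are chosen by the researcher.
MISSING_CODES = {98: "refused to answer", 99: "not applicable"}

def summarize_missing(values):
    """Separate valid responses from missing ones and count the reasons."""
    valid = []
    reasons = Counter()
    for v in values:
        if v is None:                 # blank cell: no reason recorded
            reasons["system missing"] += 1
        elif v in MISSING_CODES:      # coded cell: reason is known
            reasons[MISSING_CODES[v]] += 1
        else:
            valid.append(v)
    return valid, reasons

valid, reasons = summarize_missing([4, 99, None, 2, 98, 5, 99])
print(valid)
print(dict(reasons))
```

Leaving the codes in the data as if they were real scores would distort means and other statistics, which is why they need to be declared as missing.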

Question 12

Q

What are the two main reasons that data can be missing?

Answer

A

Random reasons – These differ from respondent to respondent (accidentally missed a question, dropped out of the study, ran out of time, didn’t know the answer).
Systematic reasons – These occur when more than one person missed responding for the same reason, so they affect some or all respondents systematically (unable to answer the question, the question is inappropriate, problems with the question, equipment failure).

Question 13

Q

What can we do once we understand the missing values?

Answer

A

Delete cases (listwise)
Delete variables (those with many missing values, MVs)
Replace them with something else.

Question 14

Q

What are some popular methods for replacing MVs when data are MCAR?

Answer

A

Mean imputation
Expectation-maximization (EM) imputation
Regression imputation
Full information maximum likelihood.
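
As an illustration of one method from this list, here is a minimal sketch of regression imputation with a single predictor, fitted by ordinary least squares on the complete cases. The data and the helper `regression_impute` are invented; real software typically uses several predictors and may add random error to the predictions.

```python
from statistics import mean

def regression_impute(x, y):
    """Fill missing y values (None) with predictions from a line fitted on complete cases."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    mean_x = mean([xi for xi, _ in pairs])
    mean_y = mean([yi for _, yi in pairs])
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in pairs) / sum(
        (xi - mean_x) ** 2 for xi, _ in pairs
    )
    intercept = mean_y - slope * mean_x
    return [intercept + slope * xi if yi is None else yi for xi, yi in zip(x, y)]

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, None]
filled = regression_impute(x, y)
print(filled[-1])                              # regression-based replacement
print(mean([v for v in y if v is not None]))   # what mean imputation would use
```

The regression estimate follows the trend in the other variable, whereas mean imputation would place the case at the center of the observed scores.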

Question 15

Q

What does the univariate output tell us?

Answer

A

Gives a count of the missing values for each variable.

Question 16

Q

What does the ‘separate variance T-test’ output tell us?

Answer

A

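For each variable, this puts the cases with missing values in one group and the cases with present values in another, then compares these groups on the other variables in the file. A significant difference suggests that missingness on that variable is related to the other variable.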
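
The grouping logic can be sketched with a separate-variance (Welch) t statistic computed by hand. The rows are invented and the function names are hypothetical; the sketch returns t and its degrees of freedom only, without a p value.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Separate-variance t statistic and its Welch-Satterthwaite degrees of freedom."""
    va = variance(group_a) / len(group_a)
    vb = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1)
    )
    return t, df

def compare_by_missingness(rows, target, other):
    """Compare `other` between cases with `target` present and cases with it missing."""
    present = [r[other] for r in rows if r[target] is not None and r[other] is not None]
    missing = [r[other] for r in rows if r[target] is None and r[other] is not None]
    return welch_t(present, missing)

# Invented data: income is missing only for the younger respondents.
rows = [
    {"age": 30, "income": 50}, {"age": 32, "income": 55},
    {"age": 34, "income": 58}, {"age": 36, "income": 61},
    {"age": 20, "income": None}, {"age": 22, "income": None},
    {"age": 24, "income": None},
]
t, df = compare_by_missingness(rows, target="income", other="age")
print(round(t, 2), round(df, 2))
```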

Question 17

Q

What does the ‘missing and tabulated patterns’ output tell us?

Answer

A

This gives you a list of cases with missing values. An “X” on this table indicates missing data.
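
A rough sketch of how such a pattern table could be built. The data are invented, and using "." for a present value is an assumption made here for readability.

```python
from collections import Counter

def tabulate_patterns(rows, variables):
    """Count how many cases share each pattern of missing (X) and present (.) values."""
    patterns = Counter()
    for row in rows:
        pattern = tuple("X" if row[v] is None else "." for v in variables)
        patterns[pattern] += 1
    return patterns

variables = ["age", "income", "stress"]
rows = [
    {"age": 30, "income": 50, "stress": 3},
    {"age": 25, "income": None, "stress": 4},
    {"age": None, "income": None, "stress": 2},
    {"age": 41, "income": None, "stress": 5},
    {"age": 38, "income": 62, "stress": 1},
]
patterns = tabulate_patterns(rows, variables)

print(" ".join(variables))
for pattern, count in patterns.most_common():
    print(" ".join(pattern), "->", count, "case(s)")
```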

Question 18

Q

What does the EM estimated statistics output (with Little’s MCAR test) tell us?

Answer

A

These are the computed means, SDs, etc. based on the available data and again based on SPSS’s “best guess” (the EM estimate) for what the missing values might have been. Little’s MCAR test is reported with this output. A non-significant test means the data are plausibly missing completely at random (MCAR). A significant test (p < .05) means that the missingness has a pattern. When there is a pattern to the missingness, the results of your analysis may be biased.

(18 cards)