Lecture 1 Flashcards
Why Explore Data?
- Greater effectiveness and efficiency in conducting and interpreting analyses
- Reduces ambiguity in interpretation
- Identify spurious relationships
What’s the difference between raw data and data input?
Raw Data:
- Measures should be valid and relevant
- Appropriate response scale and adequate range of responses
- Hard to ‘fix’ retrospectively
Data Input:
- Tedious task when done manually
- Complete variable and value label tabs in the SPSS data input screen
- Data input errors can have a catastrophic effect on any analysis
What is MCAR, and what are some examples and criteria of it?
MCAR = Missing completely at random
Examples:
- Missed question on a questionnaire
- Equipment failure when collecting a data point.
MCAR data points don’t favor any variable:
- Occurs infrequently
- Has no relationship with the IVs or DV in the study
- Missingness is completely unsystematic and random
How is MCAR assessed?
Using Little’s MCAR chi-square test
The null hypothesis is that the missing data are “missing completely at random”. A non-significant result (p > 0.05) retains the null. THIS IS GOOD!
It’s bad if p < 0.05, which suggests the missing data are not “randomly missing” and require further investigation (MAR or MNAR)
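The decision rule above can be written out as a small sketch. It does not compute Little's test itself (SPSS reports that); it only takes the reported p-value as input. The function name and the default alpha of 0.05 are illustrative assumptions.

```python
def interpret_little_mcar(p_value, alpha=0.05):
    """Map the p-value reported by Little's MCAR test to the reading in the notes.

    Assumption: the p-value comes from SPSS (or another package); this
    function only applies the decision rule, it does not run the test.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < alpha:
        # Significant result: the null (MCAR) is rejected.
        return "reject null: not MCAR, investigate MAR or MNAR"
    # Non-significant result: the null (MCAR) is retained.
    return "retain null: consistent with MCAR"


print(interpret_little_mcar(0.41))
print(interpret_little_mcar(0.003))
```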
What is MAR, and what are some examples and criteria of it?
MAR = Missing at Random
MAR data points:
- not random
- specific to sub-groups
- In individuals with specific characteristics (IV)
- DO NOT bias measurement of DV so ‘ignorable’ (Tabachnick & Fidell, 2018)
- Might influence external validity via population inference
What is an example of MAR?
In a study of attitudes to bullying in children, children with poor reading skills may not complete the study due to literacy problems
In real life, can MAR assumptions be verified?
MAR assumptions cannot be verified because the information about the missing values is not available
What will happen if you exclude MAR data?
It will lead to biased estimates
What is MNAR, and what are some examples and criteria of it?
MNAR = Missing not at random; the most problematic “missingness”
MNAR data:
- Directly influences scores on the DV
- Produce bias in measurement and estimates
- An association exists between participant characteristics and DV; ‘non-ignorable’ (Tabachnick & Fidell, 2018)
- Points cannot be easily estimated/imputed from sample data
What are some examples of MNAR?
Examples:
- Reading ability test where poor readers fail to respond to certain test items because they do not understand the text.
- Matched groups used to investigate effectiveness of intervention (face to face vs internet counselling). High dropout in one group would influence post-test scores on DV.
How can you tell whether data are MAR or MNAR?
A t-test in SPSS determines whether participants with missing data on one variable (e.g. study load) differ significantly in their group mean on another variable (e.g. age) from participants without missing data
If there are significant differences on pairs of variables (scale items), but these don’t affect the key DV, the data can be assumed to be MAR (but the results lack external validity)
If the difference is found to affect the DV too, then the data are MNAR (invalid results)
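The group comparison described above can be sketched outside SPSS as follows. This is a minimal illustration, assuming made-up records and using Welch's unequal-variance form of the t-test (SPSS reports both equal- and unequal-variance versions). Only the t statistic and degrees of freedom are computed here; the p-value would come from a t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t, df) for Welch's unequal-variance t-test."""
    na, nb = len(group_a), len(group_b)
    # Squared standard errors of each group mean.
    va, vb = variance(group_a) / na, variance(group_b) / nb
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical records: (age, study_load); None marks a missing study load.
records = [(19, 4), (21, 3), (20, None), (34, None),
           (22, 4), (38, None), (23, 2), (41, None)]

# Split age by whether study load is missing, then compare the group means.
age_missing = [age for age, load in records if load is None]
age_present = [age for age, load in records if load is not None]

t, df = welch_t(age_missing, age_present)
print(f"mean age (missing) = {mean(age_missing)}, mean age (present) = {mean(age_present)}")
print(f"t = {t:.2f}, df = {df:.2f}")
```

A large difference in age between the two groups would suggest that missingness on study load is related to age, which is the pattern the t-test is meant to flag.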
What is Listwise Deletion?
When would it be used?
What are some things to consider?
Listwise Deletion is a procedure that eliminates all cases with one or more missing values on the analysis variables.
It is used if the data is MCAR and missingness is <5%.
Some things to consider are that:
- It reduces power but interpretation is not affected if the sample size is large
Questions to ask:
- What is the minimum sample size needed to run a planned analysis?
- Will deletion of cases with missing values result in an underpowered study?
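The mechanics of listwise deletion can be sketched as below. The dataset, variable names, and helper functions are hypothetical, and the percentage is computed over all cells of the analysis variables, which is one assumed reading of the "<5%" guideline.

```python
# Hypothetical dataset: each row is one case; None marks a missing value.
cases = [
    {"id": 1, "age": 19, "study_load": 4, "stress": 12},
    {"id": 2, "age": 21, "study_load": None, "stress": 15},
    {"id": 3, "age": 20, "study_load": 3, "stress": 9},
    {"id": 4, "age": 34, "study_load": 2, "stress": None},
    {"id": 5, "age": 22, "study_load": 4, "stress": 11},
]
analysis_vars = ["age", "study_load", "stress"]


def listwise_delete(rows, variables):
    """Keep only cases with no missing value on any analysis variable."""
    return [row for row in rows if all(row[v] is not None for v in variables)]


def percent_missing(rows, variables):
    """Percentage of all cells on the analysis variables that are missing."""
    total = len(rows) * len(variables)
    missing = sum(row[v] is None for row in rows for v in variables)
    return 100 * missing / total


complete = listwise_delete(cases, analysis_vars)
print(len(cases), "cases before deletion,", len(complete), "after")
print(f"{percent_missing(cases, analysis_vars):.1f}% of values missing")
```

In this toy data the missingness is well above 5%, so the data only illustrate the mechanics. Comparing `len(complete)` with the minimum sample size for the planned analysis is the power check raised in the questions above.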
What is Pairwise Deletion?
When would it be used?
What are some things to consider?
Pairwise Deletion is a statistical procedure that uses all available data in the dataset (the case is kept, and each analysis runs on only the data that remain for it)
It increases power, but the sample size differs between analyses, which influences the error term (less recommended)
Some things to consider:
- If there is a large amount of “missingness” on several variables you plan to use, but not on the variables of most interest, consider EXCLUDING these problematic variables from analysis.
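A minimal sketch of pairwise deletion, assuming a made-up dataset and a hand-written Pearson correlation: each pair of variables uses every case that has both values, so the n behind each correlation can differ.

```python
from itertools import combinations
from math import sqrt

# Hypothetical dataset; None marks a missing value.
rows = [
    {"age": 19, "study_load": 4, "stress": 12},
    {"age": 21, "study_load": None, "stress": 15},
    {"age": 20, "study_load": 3, "stress": 9},
    {"age": 34, "study_load": 2, "stress": None},
    {"age": 22, "study_load": 4, "stress": 11},
]


def available_pairs(data, x, y):
    """Return (x, y) value pairs from cases where both variables are present."""
    return [(r[x], r[y]) for r in data if r[x] is not None and r[y] is not None]


def pearson_r(pairs):
    """Pearson correlation for a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / sqrt(sxx * syy)


pairwise = {}
for x, y in combinations(["age", "study_load", "stress"], 2):
    pairs = available_pairs(rows, x, y)
    pairwise[(x, y)] = (len(pairs), pearson_r(pairs))

for (x, y), (n, r) in pairwise.items():
    print(f"{x} vs {y}: n = {n}, r = {r:.2f}")
```

The differing n per pair is why the error term is affected: each statistic rests on a different subsample.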
What is Data Imputation?
When would it be used?
What are some things to consider?
Data imputation is a technique in which missing values are replaced with estimated values.
This applies to the actual individual values (items) for a variable, NOT THE VARIABLE TOTAL.
Consider replacing missing values if:
- Larger amounts of missing values (up to 20-25% - pie chart) when the data are MCAR
- Insufficient sample size leading to underpowered results
- Case deletion would introduce bias to estimates (Brick & Kalton, 1996)
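As a rough illustration of item-level imputation, the sketch below fills each missing item with that item's mean and only then computes the scale total. Mean substitution is used purely because it is the simplest technique to show; it is an assumption here, not the method the lecture recommends, and the item names and responses are made up.

```python
from statistics import mean

# Hypothetical item responses (1-5 scale); None marks a missing item.
responses = [
    {"item1": 4, "item2": 5, "item3": 3},
    {"item1": 2, "item2": None, "item3": 2},
    {"item1": 5, "item2": 4, "item3": None},
    {"item1": 3, "item2": 3, "item3": 4},
]
items = ["item1", "item2", "item3"]


def mean_impute(rows, variables):
    """Replace each missing item value with that item's mean over observed cases."""
    item_means = {
        v: mean(r[v] for r in rows if r[v] is not None) for v in variables
    }
    return [
        {v: (r[v] if r[v] is not None else item_means[v]) for v in variables}
        for r in rows
    ]


imputed = mean_impute(responses, items)

# Totals are computed only after the individual items have been filled in.
totals = [sum(r.values()) for r in imputed]
print(imputed)
print(totals)
```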
Which is better: modern missing data imputation techniques or deletion?
Why?
Modern missing data imputation techniques are preferable to deletion because, according to Graham (2009), they can reduce the impact of MNAR