Week 1 - Data cleaning Flashcards

1
Q

What is Data cleaning?

A

Data cleaning is the process of checking that the data we have are correct and complete, so that the results accurately reflect what was measured.

If we skip data cleaning, errors in the data can compromise the research.

2
Q

How do we clean our data?

A
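  • Observe the range: check that every value falls within the possible limits of its scale.
  • Out-of-range values could be caused by faulty Excel formulas.
  • They could also be caused by human error, such as data entry mistakes.
  • Use data software to check that each variable is set to the correct scale, so values are not entered on the wrong scale.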
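
The range check described above can be sketched in a few lines of Python. This is only an illustration: the 1 to 5 scale limits, the `responses` list, and the helper name `find_out_of_range` are all assumptions made for the example, not part of the course material.

```python
# Hypothetical example: answers to one item scored on a 1-5 scale.
# None marks a missing answer; 55 and 0 stand in for data entry errors.
SCALE_MIN, SCALE_MAX = 1, 5  # assumed scale limits


def find_out_of_range(values, low, high):
    """Return (position, value) pairs for entries outside [low, high].

    Missing values (None) are skipped; they are a separate problem.
    """
    return [
        (i, v)
        for i, v in enumerate(values)
        if v is not None and not (low <= v <= high)
    ]


responses = [3, 5, 55, 1, None, 0, 4]
flagged = find_out_of_range(responses, SCALE_MIN, SCALE_MAX)
print(flagged)  # [(2, 55), (5, 0)]
```

Flagged entries should be checked against the original records before anything is changed, since the check only shows that a value is impossible, not what the correct value was.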
3
Q

What are some data cleaning approaches?

A
  • Listwise deletion
    Only use cases that have complete data on all variables.
    Reduces sample size.
    In listwise deletion a case is dropped from an analysis because it has a missing value in at least one of the specified variables. The analysis is only run on cases which have a complete set of data.
  • Casewise deletion
    Another name for listwise deletion: an entire record is excluded from analysis if any single value is missing.
    Reduces sample size.
  • Replace each missing value with the variable's mean or mode from the entire sample.
    Either the mean or the mode can be used.
    Reduces the variation in the data set, because every imputed value sits at the centre of the distribution.
  • Replace each missing value with the mean of the participant's existing scores on that scale.
    This relies on the items of a scale being consistent with one another, which is the idea behind Cronbach's alpha, α.
  • Multiple imputation methods
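
As a rough illustration of the first three approaches, here is a minimal Python sketch. The small `data` table and the function names are hypothetical, and the person-mean version assumes that "existing scores" means the average of the items the participant did answer. Multiple imputation is not shown.

```python
from statistics import mean, variance

# Hypothetical data: one row per participant, three items from one scale.
# None marks a missing answer.
data = [
    [4, 5, 4],
    [2, None, 3],
    [5, 5, 5],
    [1, 2, None],
]


def listwise_delete(rows):
    """Keep only the cases with no missing values."""
    return [row for row in rows if None not in row]


def impute_variable_mean(rows):
    """Replace each missing value with that variable's mean over the sample.

    Assumes every column has at least one observed value.
    """
    col_means = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if row[j] is not None]
        col_means.append(mean(observed))
    return [
        [col_means[j] if v is None else v for j, v in enumerate(row)]
        for row in rows
    ]


def impute_person_mean(rows):
    """Replace each missing value with the mean of that participant's
    own observed items. Assumes each row has at least one observed value.
    """
    result = []
    for row in rows:
        person_mean = mean(v for v in row if v is not None)
        result.append([person_mean if v is None else v for v in row])
    return result


complete = listwise_delete(data)          # 2 of the 4 cases remain
by_variable = impute_variable_mean(data)  # missing item 2 becomes 4
by_person = impute_person_mean(data)      # row 2 becomes [2, 2.5, 3]

# Mean imputation shrinks the spread of the imputed variable (item 2).
observed_item2 = [row[1] for row in data if row[1] is not None]
imputed_item2 = [row[1] for row in by_variable]
print(variance(observed_item2), variance(imputed_item2))  # 3 before, 2 after
```

The last line shows the trade-off noted above: listwise deletion halves this toy sample, while mean imputation keeps all four cases but lowers the variance of the imputed variable.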
4
Q

What can we do about outliers?

A

Outliers can cause skewness.

  • Exclude the case from the analysis if it is agreed that the person does
    not belong to the population being studied.
  • If the outlier is appropriate to keep, you can trim the score back so
    that it sits within 1 to 2 SD of the mean. The outlier will still be the
    highest or lowest score, but within the bounds of 1 to 2 SD.
    Different places use different SD cut-offs for outliers, ranging from
    1 to 2 to 3 SD.
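
Trimming a score back to an SD boundary might look like the sketch below. The `scores` list, the function name `trim_outliers`, and the default cut-off of 2 SD are assumptions for illustration, and the mean and SD are computed from the full sample, outlier included, which is only one possible choice.

```python
from statistics import mean, stdev


def trim_outliers(scores, k=2.0):
    """Pull any score more than k SDs from the mean back to the k-SD bound.

    The mean and sample SD are computed on the untrimmed scores
    (an assumption; protocols differ on this).
    """
    m = mean(scores)
    sd = stdev(scores)
    low, high = m - k * sd, m + k * sd
    return [min(max(s, low), high) for s in scores]


# Hypothetical scores with one extreme value (40).
scores = [10, 12, 11, 13, 12, 11, 10, 12, 11, 40]
trimmed = trim_outliers(scores, k=2.0)

print(trimmed[-1])  # about 32.4: still the highest score, but inside 2 SD
```

Only the extreme score changes. Every other score is left as it was, and the trimmed value remains the maximum of the sample.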
5
Q

What is MCAR?

A

MCAR – Missing completely at random

We can test whether data are missing completely at random with a statistical test called Little's missing completely at random (MCAR) test. If the p value from the test is < .05, the data are not MCAR, and listwise deletion of these cases may lead to biased or inaccurate conclusions. Imputation is justified when the p value from Little's MCAR test is < .05.

Going further into MCAR is beyond the scope of the course at this point.

6
Q

Data cleaning protocols

A

Different researchers follow different protocols. For example, one may remove outliers while another trims them back to stay within 2 to 3 SD.
It is important to note that once you have chosen one approach, for example trimming the data, you need to leave the data as they are. You can't go back and remove the outliers instead to see whether that aligns a little better with what you want to see.
Doing so is deemed scientific fraud.
