Week 3/4 - Cleaning your data Flashcards
What is data cleaning about?
Making sure that there are no errors or other problems in the dataset
What is straight-lining?
When a respondent marks the same response for almost all the items
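A minimal sketch of how straight-lining could be flagged, assuming each respondent's answers are stored as a list of scale values. The function name, the example answers, and the 0.9 cut-off are all hypothetical choices for illustration, not a fixed rule.

```python
from collections import Counter


def straightlining_share(responses):
    """Share of items that received the respondent's most frequent answer."""
    counts = Counter(responses)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(responses)


# Hypothetical answers of two respondents to ten 5-point items
respondent_a = [3, 3, 3, 3, 3, 3, 3, 3, 3, 4]
respondent_b = [1, 4, 2, 5, 3, 2, 4, 1, 5, 3]

THRESHOLD = 0.9  # assumed cut-off for "almost all the items"

flag_a = straightlining_share(respondent_a) >= THRESHOLD
flag_b = straightlining_share(respondent_b) >= THRESHOLD

print(flag_a, flag_b)
```

A flagged respondent is only a candidate for closer inspection; whether to drop the case is a judgment call.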
What are inconsistent answers?
When a respondent gives different answers to similar questions
What do suspicious response patterns do to the validity of the data?
They reduce it
How should data entry errors be corrected?
By going back to the original survey
What should you do if you have data entry errors but can't go back to the original survey, e.g. face-to-face interviews?
The erroneous value should be deleted
What are outliers?
Values that are situated far from all other observations
How can we check for outliers?
Boxplots
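A boxplot conventionally draws its whiskers at 1.5 times the interquartile range beyond the quartiles and plots anything further out as a separate point. The sketch below applies that convention numerically; the 1.5 multiplier is the common default rather than something stated in these cards, and the data values are made up.

```python
import statistics

# Hypothetical observations; 95 sits far from the rest
values = [12, 14, 15, 15, 16, 17, 18, 19, 20, 95]

# quantiles(..., n=4) returns the three cut points Q1, median, Q3
q1, _median, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Assumed convention: points beyond 1.5 * IQR from the quartiles are outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [v for v in values if v < lower_fence or v > upper_fence]
print(outliers)
```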
If there is no clear explanation for outliers, what should you do with them?
Retain them
What are the two levels at which missing data can occur?
Entire surveys are missing (survey non-response) or respondents have not answered all the items (item non-response)
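The two levels can be told apart in a simple count. This is a sketch under the assumption that a sampled person who never returned the survey is stored as None and that a skipped item is stored as None inside the answer list; the respondent ids and answers are hypothetical.

```python
# Hypothetical sample: one entry per sampled person
sample = {
    "r1": [4, 5, 3],
    "r2": [2, None, 4],  # skipped the second item
    "r3": None,          # never returned the survey
}

# Survey non-response: the whole survey is missing
survey_nonresponse = [rid for rid, answers in sample.items() if answers is None]

# Item non-response: count unanswered items per returned survey
item_nonresponse = {
    rid: sum(answer is None for answer in answers)
    for rid, answers in sample.items()
    if answers is not None
}

print(survey_nonresponse)
print(item_nonresponse)
```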
What are the three categories of missing data?
1) Missing completely at random (best type)
2) Missing at random
3) Non-random missing data (worst type)
What is the range?
the difference between the highest and lowest values in a dataset
What is the interquartile range?
the difference between the 3rd and 1st quartile
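Both measures can be computed in a few lines. The values below are invented for illustration, and the quartiles depend on the calculation method (here the "inclusive" method of Python's statistics module), so other software may report slightly different quartiles for the same data.

```python
import statistics

# Hypothetical dataset
values = [2, 4, 4, 5, 7, 9, 10, 12]

# Range: highest value minus lowest value
value_range = max(values) - min(values)

# Interquartile range: 3rd quartile minus 1st quartile
q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

print(value_range, iqr)
```

Unlike the range, the interquartile range ignores the lowest and highest quarter of the data, so a single extreme value does not change it much.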
What is variance?
It tells us how strongly observations vary around the mean
What does low variance show us?
That the observations tend to be very close to the mean
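To see this in numbers, the sketch below compares two made-up datasets with the same mean. It uses the sample variance (sum of squared deviations from the mean divided by n - 1), which is one common definition; the cards do not say which formula the course uses.

```python
import statistics

# Two hypothetical datasets with the same mean of 50
tight = [49, 50, 50, 51]    # observations close to the mean
spread = [10, 30, 70, 90]   # observations far from the mean

# statistics.variance computes the sample variance (divides by n - 1)
var_tight = statistics.variance(tight)
var_spread = statistics.variance(spread)

print(statistics.mean(tight), statistics.mean(spread))
print(var_tight, var_spread)
```

The first dataset has the much smaller variance even though both means are identical.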