Week 3/4 - Cleaning your data Flashcards

1
Q

What is data cleaning about?

A

Making sure that there are no errors or other problems in the data

2
Q

What is straight-lining?

A

When a respondent marks the same response for almost all items
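
A minimal sketch of how straight-lining might be flagged, assuming each respondent's answers are stored as a list of scale values. The function name and the 0.9 cutoff are illustrative assumptions, not a standard threshold.

```python
from collections import Counter


def is_straight_lining(responses, threshold=0.9):
    """Return True if a single answer makes up at least `threshold` of all items.

    `threshold` is a hypothetical cutoff chosen for illustration only.
    """
    if not responses:
        return False
    # Count how often the most frequent answer was chosen.
    most_common_count = Counter(responses).most_common(1)[0][1]
    return most_common_count / len(responses) >= threshold


# One respondent ticks "3" on every item, the other varies.
print(is_straight_lining([3] * 10))
print(is_straight_lining([1, 4, 2, 5, 3, 2, 4, 1, 5, 3]))
```

A flagged respondent is only a candidate for closer inspection; whether to drop the case is a judgment call.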

3
Q

What are inconsistent answers?

A

When a respondent gives different answers to similar questions

4
Q

What do suspicious response patterns do to the validity of the data?

A

They reduce it

5
Q

How should data ENTRY errors be corrected?

A

By going back to the original survey

6
Q

What should you do if you have data entry errors but can't go back to the original survey (e.g. face-to-face interviews)?

A

The erroneous value should be deleted

7
Q

What are outliers?

A

Values that are situated far from all other observations

8
Q

How can we check for outliers?

A

Boxplots
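
A boxplot conventionally draws as separate points any values lying more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that rule numerically. The 1.5 multiplier and the quartile method (Python's default in `statistics.quantiles`) are assumptions; other software may compute quartiles slightly differently.

```python
import statistics


def boxplot_outliers(values, k=1.5):
    """Return the values a boxplot would draw as outlier points.

    Uses the common k * IQR fence rule; k=1.5 is the usual convention.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


# Hypothetical ages with one suspicious entry.
ages = [10, 12, 11, 13, 12, 11, 10, 95]
print(boxplot_outliers(ages))  # -> [95]
```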

9
Q

If there is no clear explanation for outliers what should you do with them?

A

Retain them

10
Q

What are the two levels at which missing data can occur?

A

Entire surveys are missing (survey non-response) or respondents have not answered all the items (item non-response)
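
A minimal sketch of telling the two levels apart, assuming each survey is a dict that maps item names to answers and that `None` marks an unanswered item. The function name and the data layout are hypothetical.

```python
def classify_nonresponse(record):
    """Label one survey record by its level of missingness.

    `record` maps item names to answers; None means the item was left blank.
    """
    answered = [v for v in record.values() if v is not None]
    if not answered:
        return "survey non-response"   # nothing was answered at all
    if len(answered) < len(record):
        return "item non-response"     # some items were skipped
    return "complete"


print(classify_nonresponse({"q1": None, "q2": None}))  # -> survey non-response
print(classify_nonresponse({"q1": 4, "q2": None}))     # -> item non-response
print(classify_nonresponse({"q1": 4, "q2": 2}))        # -> complete
```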

11
Q

What are the three categories of missing data?

A

1) Missing completely at random (MCAR): best type
2) Missing at random (MAR)
3) Non-random missing data: worst type

12
Q

What is the range?

A

The difference between the highest and lowest values in a dataset

13
Q

What is the interquartile range?

A

The difference between the 3rd and 1st quartiles
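
The range and the interquartile range can both be computed in a few lines. This is a sketch using Python's standard library; the function names are illustrative, and the quartile values depend on the interpolation method (here the default of `statistics.quantiles`), so other tools may report slightly different numbers.

```python
import statistics


def value_range(values):
    """Range: highest value minus lowest value."""
    return max(values) - min(values)


def interquartile_range(values):
    """IQR: 3rd quartile minus 1st quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


scores = [1, 2, 3, 4, 5, 6, 7, 8]
print(value_range(scores))          # -> 7
print(interquartile_range(scores))  # -> 4.5
```

Because the IQR ignores the top and bottom quarters of the data, a single extreme value changes the range but barely affects the IQR.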

14
Q

What is variance?

A

A measure of how strongly observations vary around the mean

15
Q

What does low variance show us?

A

That the observations tend to be very close to the mean

16
Q

What does high variance show us?

A

That the observations tend to be far from the mean (widely spread out)
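
To make low versus high variance concrete, the sketch below compares two made-up datasets that share the same mean. It uses the sample variance from Python's `statistics` module; the data values are invented for illustration.

```python
import statistics

# Both datasets have a mean of 50, but very different spread.
close_to_mean = [49, 50, 51]
far_from_mean = [10, 50, 90]

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
low = statistics.variance(close_to_mean)
high = statistics.variance(far_from_mean)

print(statistics.mean(close_to_mean), low)   # -> 50 1
print(statistics.mean(far_from_mean), high)  # -> 50 1600
```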

17
Q

What are measures of dispersion?

A

Statistics that give researchers information about the variability of the data (how far the values are spread out)