Data Cleaning Flashcards

1
Q

Five characteristics a high-quality dataset should have

A

Validity
Accuracy
Completeness
Consistency
Uniformity
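
Some of these characteristics are easier to remember once they are turned into measurable checks. The sketch below, in plain Python with only the standard library, scores completeness and validity on hypothetical records. The field names and the 0 to 120 age range are illustrative assumptions. Accuracy, consistency, and uniformity usually need reference data or cross-field rules, so they are not covered here.

```python
# Hypothetical sample records; None marks a missing value.
records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 3, "age": -5},
    {"id": 4, "age": 51},
]

def completeness(rows, field):
    """Fraction of rows in which `field` is present (not None)."""
    present = sum(1 for r in rows if r.get(field) is not None)
    return present / len(rows)

def validity(rows, field, rule):
    """Fraction of the non-missing values in `field` that satisfy `rule`."""
    values = [r[field] for r in rows if r.get(field) is not None]
    return sum(1 for v in values if rule(v)) / len(values)

# Assumed validity rule: an age must lie between 0 and 120.
print(completeness(records, "age"))                       # 0.75
print(validity(records, "age", lambda a: 0 <= a <= 120))  # 2 of 3 ages valid
```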

2
Q

Four common issues in data cleaning

A

Unnecessary Data
Missing Data
Irregular Data
Inconsistent Data

3
Q

Three types of Unnecessary Data

A

Uninformative/Repetitive
Irrelevant
Duplicates
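
The three types map onto three simple filters. The sketch below, in plain Python, removes exact duplicate rows, drops fields that hold one constant value (uninformative), and drops fields judged irrelevant (pandas users would typically reach for DataFrame.drop_duplicates and DataFrame.drop instead). The records and the IRRELEVANT set are hypothetical. A field that is constant in a small sample may not be constant in the full dataset, so treat the automatic check as a hint rather than a rule.

```python
# Hypothetical records; "source_system" never varies and "notes" is assumed irrelevant.
rows = [
    {"id": 1, "city": "Paris", "source_system": "crm", "notes": "call back"},
    {"id": 2, "city": "Lyon", "source_system": "crm", "notes": ""},
    {"id": 1, "city": "Paris", "source_system": "crm", "notes": "call back"},
]

IRRELEVANT = {"notes"}  # assumption: chosen from domain knowledge

def remove_duplicates(rows):
    """Keep the first occurrence of each identical row."""
    seen, unique = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

def constant_fields(rows):
    """Fields holding a single value in every row carry no information."""
    return {f for f in rows[0] if len({r[f] for r in rows}) == 1}

def drop_fields(rows, fields):
    """Return copies of the rows without the given fields."""
    return [{k: v for k, v in r.items() if k not in fields} for r in rows]

deduped = remove_duplicates(rows)
cleaned = drop_fields(deduped, IRRELEVANT | constant_fields(deduped))
print(cleaned)  # [{'id': 1, 'city': 'Paris'}, {'id': 2, 'city': 'Lyon'}]
```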

4
Q

PII

A

Personally identifiable information: any information that can identify an individual user. It should be removed.
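
A minimal sketch of removing PII from a record: drop the fields known to hold PII, then mask email addresses that leak into free-text fields. The PII_FIELDS set and the simplified email regex are assumptions for illustration; real PII handling usually needs a reviewed field inventory and broader patterns (phone numbers, addresses, ID numbers).

```python
import re

# Hypothetical field names treated as PII; a real list comes from a privacy review.
PII_FIELDS = {"name", "email", "phone"}
# Simplified email pattern for illustration; it will not match every valid address.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def strip_pii(record):
    """Drop PII fields, then mask email addresses inside remaining text fields."""
    kept = {k: v for k, v in record.items() if k not in PII_FIELDS}
    return {k: EMAIL_RE.sub("[REDACTED]", v) if isinstance(v, str) else v
            for k, v in kept.items()}

record = {"name": "Ana Diaz", "email": "ana@example.com", "plan": "pro",
          "comment": "Reach me at ana.d@example.org"}
print(strip_pii(record))
# {'plan': 'pro', 'comment': 'Reach me at [REDACTED]'}
```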

5
Q

Three ways to classify missing data

A

Missing Completely at Random (MCAR)
Missing at Random (MAR)
Missing Not at Random (MNAR)

6
Q

Missing Completely at Random

A

There is no relationship between whether a value is missing and any of the data values, observed or unobserved.

7
Q

Missing at Random

A

Whether a value is missing is related to the observed data, but not to the unobserved (missing) values.
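
A practical consequence of this definition: if missingness in one field varies with another, fully observed field, the data is not MCAR (missing completely at random). The sketch below compares the share of missing income values across age groups in hypothetical survey rows; the field names and values are made up. A difference like this is consistent with MAR, but it cannot by itself rule out MNAR, because the unobserved values are by definition not available to check.

```python
from collections import defaultdict

# Hypothetical survey rows; None marks a missing income.
rows = [
    {"age_group": "18-29", "income": None},
    {"age_group": "18-29", "income": None},
    {"age_group": "18-29", "income": 31000},
    {"age_group": "30-49", "income": 52000},
    {"age_group": "30-49", "income": 61000},
    {"age_group": "30-49", "income": None},
    {"age_group": "50+", "income": 58000},
    {"age_group": "50+", "income": 64000},
]

def missing_rate_by(rows, target, group_field):
    """Share of rows with `target` missing, within each value of an observed field."""
    totals, missing = defaultdict(int), defaultdict(int)
    for r in rows:
        g = r[group_field]
        totals[g] += 1
        if r[target] is None:
            missing[g] += 1
    return {g: missing[g] / totals[g] for g in totals}

print(missing_rate_by(rows, "income", "age_group"))
# 18-29: 2 of 3 missing, 30-49: 1 of 3, 50+: none
```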

8
Q

Missing Not at Random

A

Whether a value is missing is related to the unobserved data itself (e.g., the missing values), so the observed data alone cannot explain it.
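
To see how the three mechanisms differ, the sketch below hides values from a hypothetical, fully known income column in three ways: MCAR (same probability for every row), MAR (probability depends on the observed age), and MNAR (probability depends on the income itself). The data, the probabilities, and the 60000 threshold are made-up assumptions. Comparing the mean of the surviving values with the true mean, MCAR should typically stay close, MAR should drift upward here (younger, lower earners go missing more often), and MNAR should drift downward (high incomes go missing), although with only 50 rows the random noise is large.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable
# Hypothetical complete data: (age, income) pairs where income rises with age.
data = [(age, 20000 + 800 * age + random.gauss(0, 5000)) for age in range(20, 70)]

def mcar(rows, p=0.3):
    """MCAR: hide income with the same probability p for every row."""
    return [(a, None if random.random() < p else inc) for a, inc in rows]

def mar(rows):
    """MAR: hide income more often for younger people (depends only on observed age)."""
    return [(a, None if random.random() < (0.6 if a < 35 else 0.1) else inc)
            for a, inc in rows]

def mnar(rows, threshold=60000):
    """MNAR: hide income more often when the income itself is high (unobserved value)."""
    return [(a, None if random.random() < (0.6 if inc > threshold else 0.1) else inc)
            for a, inc in rows]

true_mean = sum(inc for _, inc in data) / len(data)
for name, mechanism in [("MCAR", mcar), ("MAR", mar), ("MNAR", mnar)]:
    observed = [inc for _, inc in mechanism(data) if inc is not None]
    print(name, round(sum(observed) / len(observed)), "vs true mean", round(true_mean))
```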

9
Q

How can you handle missing data?

A

Drop the feature
Impute the value
Flag the missing values
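
Each of the three options maps onto a short transformation. The sketch below applies them to hypothetical rows in plain Python. Median imputation is one common choice assumed here, not the only one; mean, mode, or model-based imputation are alternatives.

```python
from statistics import median

# Hypothetical rows; None marks a missing value.
rows = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 29, "income": None},
    {"age": 45, "income": 58000},
]

def drop_feature(rows, field):
    """Option 1: remove the column entirely."""
    return [{k: v for k, v in r.items() if k != field} for r in rows]

def impute_median(rows, field):
    """Option 2: replace missing values with the median of the observed ones."""
    fill = median(r[field] for r in rows if r[field] is not None)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]

def flag_missing(rows, field):
    """Option 3: keep the value as-is and add an indicator column for the gap."""
    return [{**r, f"{field}_missing": r[field] is None} for r in rows]

print(drop_feature(rows, "income")[2])   # {'age': 29}
print(impute_median(rows, "age")[1])     # {'age': 34, 'income': 61000}
print(flag_missing(rows, "income")[2])   # {'age': 29, 'income': None, 'income_missing': True}
```

Flagging is often combined with imputation, so that a model can still tell which values were filled in.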

10
Q

Irregular Data

A

Outliers, found with the IQR rule
Data contradicting business or domain knowledge (established findings, data from reliable sources, or intuition)
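
The IQR rule (Tukey's fences) flags values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR, where IQR = Q3 - Q1. The sketch below implements it with Python's statistics.quantiles and, separately, checks a domain rule. The ages and the 13 to 19 age range are hypothetical. Quartile values depend on the interpolation method, so other tools may flag slightly different points near the fences.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def violates_domain(values, rule):
    """Return values that break a business or domain rule."""
    return [v for v in values if not rule(v)]

ages = [12, 14, 14, 15, 16, 17, 18, 19, 95]
print(iqr_outliers(ages))  # [95]
# Hypothetical rule: the program is for ages 13 to 19.
print(violates_domain(ages, lambda a: 13 <= a <= 19))  # [12, 95]
```

Here 12 is statistically unremarkable but still contradicts the assumed domain rule, which is why the two checks complement each other.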

11
Q

Inconsistent Data

A

Data not in a consistent form or syntax, for example:
Capitalization
Formats
Misspelled words
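
A typical fix is to normalize each kind of inconsistency to one canonical form. The sketch below, in plain Python, lowercases and collapses whitespace before title-casing city names, corrects misspellings through a lookup table, and rewrites dates in several formats as ISO 8601. The CORRECTIONS table and the DATE_FORMATS list are hypothetical; in particular "%d/%m/%Y" assumes day-first dates, and %b month names depend on the locale.

```python
from datetime import datetime

# Hypothetical lookup of known misspellings; in practice built from the data itself.
CORRECTIONS = {"new yrok": "new york", "sanfrancisco": "san francisco"}
# Assumed set of date formats seen in the raw data (day-first for slashes).
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%b %d, %Y"]

def normalize_city(raw):
    """Fix capitalization and whitespace first, then known misspellings."""
    key = " ".join(raw.split()).lower()
    return CORRECTIONS.get(key, key).title()

def normalize_date(raw):
    """Parse any known format and re-emit it as ISO 8601 (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None  # unparseable: leave for manual review

print([normalize_city(c) for c in ["NEW YORK", " new  york", "New Yrok", "sanfrancisco"]])
# ['New York', 'New York', 'New York', 'San Francisco']
print([normalize_date(d) for d in ["2023-03-05", "05/03/2023", "Mar 5, 2023"]])
# ['2023-03-05', '2023-03-05', '2023-03-05']
```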
