Pre-Processing Flashcards

1
Q

Why is pre-processing needed?

A

It is needed to ensure that data is accurate, complete, consistent, timely, believable, and interpretable before it is analysed.

2
Q

What are the major preprocessing activities?

A
  • Data cleaning
  • Data integration
  • Data reduction
  • Data transformation
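The four activities can be chained on a toy dataset. This is a minimal sketch, not a standard pipeline: the records, field names, and helper functions (clean, integrate, reduce_attrs, transform) are all hypothetical, and each step uses just one simple technique (dropping missing rows, an inner join, attribute selection, min-max normalisation).

```python
# Hypothetical toy sources; field names are illustrative only.
sales = [
    {"id": 1, "amount": 120.0},
    {"id": 2, "amount": None},   # missing value
    {"id": 3, "amount": 300.0},
]
customers = [
    {"id": 1, "region": "north", "notes": "vip"},
    {"id": 3, "region": "south", "notes": ""},
]

def clean(rows):
    """Data cleaning: drop records whose amount is missing."""
    return [r for r in rows if r["amount"] is not None]

def integrate(left, right, key):
    """Data integration: inner-join two sources on a shared key."""
    index = {r[key]: r for r in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]

def reduce_attrs(rows, keep):
    """Data reduction: keep only the attributes needed for analysis."""
    return [{k: r[k] for k in keep} for r in rows]

def transform(rows, field):
    """Data transformation: min-max normalise one numeric field to [0, 1]."""
    values = [r[field] for r in rows]
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid dividing by zero when all values are equal
    return [{**r, field: (r[field] - low) / span} for r in rows]

result = transform(
    reduce_attrs(integrate(clean(sales), customers, "id"), ["id", "region", "amount"]),
    "amount",
)
print(result)
```

Real projects usually do these steps with a library such as pandas; plain Python is used here only to keep the sketch self-contained.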
3
Q

What are examples of noisy data?

A
  • Truncated fields
  • Text incorrectly split across cells
  • Incorrect data types
  • Data that doesn't make logical sense
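Two of these cases, incorrect data types and values that make no logical sense, can be flagged mechanically. A minimal sketch, assuming a hypothetical "age" column read as strings and an assumed plausible range of 0 to 120; the classify helper is illustrative only.

```python
# Hypothetical raw "age" column as read from a spreadsheet (all strings).
raw_ages = ["34", "29", "abc", "-5", "41", "230"]

def classify(value, lo=0, hi=120):
    """Return 'ok', 'wrong_type', or 'implausible' for one raw cell.

    The 0-120 range is an assumed plausibility rule for ages.
    """
    try:
        age = int(value)
    except ValueError:
        return "wrong_type"      # cannot be parsed as an integer
    if not lo <= age <= hi:
        return "implausible"     # parses, but makes no logical sense as an age
    return "ok"

flags = [classify(v) for v in raw_ages]
print(list(zip(raw_ages, flags)))
```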
4
Q

What is inconsistent data?

A

Data that represents the same information in different ways, or that contains values that don't make sense with the rest of the data

5
Q

What is noisy data?

A

Data that contains errors or meaningless, irrelevant values (called noise) alongside the true values

6
Q

What are examples of inconsistent data?

A
  • Different naming representations
  • Different date formats
  • Inconsistency between cells
  • Values that should be unique being shared (e.g. duplicate IDs)
  • Outliers
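Different naming representations and different date formats are often fixed by mapping every variant to one canonical form. A minimal sketch, assuming a hypothetical list of date formats (day-first, which is itself an assumption, since "05/03/2023" is ambiguous) and a hypothetical name-mapping table; %b matches English month abbreviations in the default C locale.

```python
from datetime import datetime

# Assumed set of date formats present in the source data (day-first).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %b %Y")

# Hypothetical mapping of alternative spellings to one canonical name.
CANONICAL_NAMES = {
    "uk": "United Kingdom",
    "u.k.": "United Kingdom",
    "united kingdom": "United Kingdom",
}

def normalise_date(text):
    """Parse a date written in any of DATE_FORMATS and return ISO 8601."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    raise ValueError(f"unrecognised date format: {text!r}")

def normalise_name(text):
    """Map known variants to a canonical name; leave unknown names unchanged."""
    return CANONICAL_NAMES.get(text.strip().lower(), text.strip())

dates = [normalise_date(d) for d in ["2023-03-05", "05/03/2023", "5 Mar 2023"]]
names = [normalise_name(n) for n in ["UK", " u.k. ", "United Kingdom"]]
print(dates, names)
```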
7
Q

What are disguised missing values?

A

Missing values that have been filled with a default value, such as one predetermined by the program, so they look like real data. To detect them, look for suspicious occurrences in the data set, such as one value appearing far more often than expected.
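One heuristic for "suspicious occurrences" is to flag any value that accounts for an unusually large share of a column. A minimal sketch, assuming a hypothetical birth-date column whose entry form defaulted to 1900-01-01 and an arbitrary 25% threshold; a legitimately common value would also be flagged, so suspects still need manual review.

```python
from collections import Counter

# Hypothetical birth-date column where the entry form defaulted to 1900-01-01.
birth_dates = ["1985-06-12", "1900-01-01", "1972-11-03", "1900-01-01",
               "1990-02-28", "1900-01-01", "1963-09-15", "1900-01-01"]

def suspicious_values(column, threshold=0.25):
    """Return values whose share of the column exceeds `threshold` (assumed cut-off)."""
    counts = Counter(column)
    n = len(column)
    return {value for value, c in counts.items() if c / n > threshold}

suspects = suspicious_values(birth_dates)
# Replace suspected defaults with None so they are treated as genuinely missing.
cleaned = [None if v in suspects else v for v in birth_dates]
print(suspects, cleaned)
```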

8
Q

What is missing or incomplete data?

A

Data in which some cells have no recorded value
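A common first step is to count the missing cells in each column. A minimal sketch, assuming hypothetical records where None, empty strings, and absent keys all count as missing; what counts as "missing" in a real dataset depends on how it was recorded.

```python
# Hypothetical records; None, blank strings, and absent keys count as missing.
records = [
    {"name": "Ann", "age": 34, "email": "ann@example.com"},
    {"name": "Bob", "age": None, "email": ""},
    {"name": "Cy", "age": 51, "email": None},
]

def is_missing(value):
    """Treat None and blank strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")

def missing_counts(rows):
    """Count missing cells per column; a key absent from a row also counts as missing."""
    columns = {key for row in rows for key in row}
    return {col: sum(is_missing(row.get(col)) for row in rows) for col in sorted(columns)}

counts = missing_counts(records)
print(counts)
```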

9
Q

What does MCAR stand for?

A

Missing completely at random. The probability that a value is missing is unrelated to any other variable and to the value of the variable itself.
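A small simulation shows why MCAR is the benign case: removing values purely at random leaves the observed mean close to the true mean. A minimal sketch, assuming a hypothetical normally distributed "income" variable and a fixed 30% missing rate.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

# Hypothetical complete variable: 10,000 simulated incomes.
values = [random.gauss(50_000, 10_000) for _ in range(10_000)]

# MCAR: every value has the same 30% chance of being missing,
# regardless of its own value or of any other variable.
observed = [v for v in values if random.random() >= 0.3]

print(round(statistics.mean(values)), round(statistics.mean(observed)))
```

The two means differ only by sampling noise, which is why dropping MCAR rows loses data but does not bias the mean.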

10
Q

What does MNAR stand for?

A

Missing not at random. Whether a value is missing depends on the value of that variable itself, even after controlling for other variables.
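Unlike MCAR, MNAR distorts what remains. A minimal sketch, assuming a hypothetical income variable where high earners (above 60,000) withhold their income 60% of the time and everyone else 10% of the time; both rates are invented for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical complete variable: 10,000 simulated incomes.
values = [random.gauss(50_000, 10_000) for _ in range(10_000)]

def missing_probability(income):
    """Assumed MNAR mechanism: missingness depends on the income value itself."""
    return 0.6 if income > 60_000 else 0.1

observed = [v for v in values if random.random() >= missing_probability(v)]
bias = statistics.mean(observed) - statistics.mean(values)
print(round(bias))
```

Because high values go missing more often, the observed mean underestimates the true mean, and no other variable in the data can correct for it.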

11
Q

What are some examples of causes of missing data?

A
  • Equipment malfunction
  • Not recorded due to misunderstanding
  • Not considered important at the time of entry
  • Deliberately omitted