Chapter 19 Flashcards
What are the pieces of ETL?
Extraction, Transformation, and Loading (Extract, Transform, Load).
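As a rough illustration of the three ETL pieces, here is a minimal Python sketch. The CSV source, the `dim_employee` table, and the function names are hypothetical, and an in-memory sqlite3 database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data: a CSV export from an operational system.
SOURCE_CSV = """emp_id,name,salary
1, alice ,50000
2,Bob,62000
"""

def extract(text):
    """Extract: read raw rows (as dicts of strings) from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: convert types and normalize values into the warehouse format."""
    return [
        (int(r["emp_id"]), r["name"].strip().title(), int(r["salary"]))
        for r in rows
    ]

def load(rows, conn):
    """Load: insert the transformed rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_employee "
        "(emp_id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)"
    )
    conn.executemany("INSERT INTO dim_employee VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
print(conn.execute("SELECT * FROM dim_employee ORDER BY emp_id").fetchall())
# [(1, 'Alice', 50000), (2, 'Bob', 62000)]
```

Keeping the three steps as separate functions mirrors the three ETL pieces: each step can be tested or replaced on its own.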
What is data cleansing?
Detecting and removing dirty data so that it does not go into the data warehouse.
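A minimal sketch of cleansing before the load step, assuming the validity rules can be written down as simple predicates. The rule names and record fields are hypothetical; in practice a person with domain knowledge decides which rules apply. Records that fail any rule are set aside with their reasons instead of being loaded.

```python
# Hypothetical validity rules supplied by a domain expert.
RULES = {
    "salary must be positive": lambda r: r["salary"] > 0,
    "name must not be empty": lambda r: bool(r["name"].strip()),
}

def cleanse(records, rules):
    """Split records into clean ones (pass every rule) and rejected ones with reasons."""
    clean, rejected = [], []
    for r in records:
        failed = [name for name, check in rules.items() if not check(r)]
        if failed:
            rejected.append((r, failed))
        else:
            clean.append(r)
    return clean, rejected

records = [
    {"emp_id": 1, "name": "Alice", "salary": 50000},
    {"emp_id": 2, "name": "", "salary": 62000},
    {"emp_id": 3, "name": "Carol", "salary": -10},
]
clean, rejected = cleanse(records, RULES)
print([r["emp_id"] for r in clean])  # [1]
for r, reasons in rejected:
    print(r["emp_id"], reasons)
# 2 ['name must not be empty']
# 3 ['salary must be positive']
```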
What is GIGO?
Stands for “Garbage In, Garbage Out.” GIGO is a computer science acronym that implies bad input will result in bad output.
What is dirty data?
It is a relative term. It means data that does not conform to its expected (valid) values.
Who decides whether data is dirty or clean?
A person who has domain knowledge.
What is a toddler employee?
An example of dirty data: the recorded date of birth makes the employee far too young to hold a job.
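A minimal sketch of a "toddler employee" check, assuming the rule is applied to the employee's age on the date of joining. The threshold of 16 years is a hypothetical value; the real limit is a business or legal rule set by someone with domain knowledge.

```python
from datetime import date

MIN_WORKING_AGE = 16  # hypothetical threshold, not taken from the source

def age_on(dob, on_date):
    """Age in whole years on a given date."""
    years = on_date.year - dob.year
    if (on_date.month, on_date.day) < (dob.month, dob.day):
        years -= 1  # birthday not yet reached in that year
    return years

def is_toddler_employee(dob, date_of_joining, min_age=MIN_WORKING_AGE):
    """Flag an employee who would have been younger than min_age when joining."""
    return age_on(dob, date_of_joining) < min_age

print(is_toddler_employee(date(2020, 5, 1), date(2023, 1, 10)))  # True: aged 2 at joining
print(is_toddler_employee(date(1990, 5, 1), date(2015, 1, 10)))  # False: aged 24 at joining
```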
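What is an un-born employee?
An example of dirty data: the employee's date of birth (DOB) is later than the date of joining, i.e., the employee joined before being born.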
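The "un-born employee" rule reduces to a single date comparison. A minimal sketch, with hypothetical field names `dob` and `doj` (date of joining):

```python
from datetime import date

def is_unborn_employee(dob, date_of_joining):
    """Flag records where the date of birth is after the date of joining,
    i.e. the employee 'joined' before being born."""
    return dob > date_of_joining

employees = [
    {"emp_id": 1, "dob": date(1985, 3, 2), "doj": date(2010, 6, 1)},
    {"emp_id": 2, "dob": date(2012, 8, 9), "doj": date(2005, 1, 15)},
]
print([e["emp_id"] for e in employees if is_unborn_employee(e["dob"], e["doj"])])  # [2]
```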
How does dirty data affect govt decision making?
The government may invest where there is no need, which results in a loss of money.
How does dirty data affect direct mail marketing?
The advertising campaign fails, which results in a loss of money.
What are examples of the lighter side of dirty data?
- Toddler Employee
- Un-born Employee
What are the 3 classes of anomalies?
- Syntactically dirty data
- Semantically dirty data
- Coverage anomalies
What are the subclasses of syntactically dirty data?
- Lexical errors
- Irregularities
What are the subclasses of semantically dirty data?
- Integrity constraint violation
- Business rule contradiction
- Duplication
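Two of these semantic anomalies can be sketched as simple checks. This is a minimal illustration under assumptions: the integrity constraint chosen here is referential (every `dept_id` must exist in a hypothetical `departments` set), and duplicates are detected by a naive key of normalized name plus department. Real duplicate detection usually needs fuzzier matching.

```python
from collections import defaultdict

departments = {10, 20}  # hypothetical set of valid department ids
employees = [
    {"emp_id": 1, "name": "Alice Smith", "dept_id": 10},
    {"emp_id": 2, "name": "alice  smith", "dept_id": 10},
    {"emp_id": 3, "name": "Bob Jones", "dept_id": 99},
]

def integrity_violations(rows, valid_depts):
    """Integrity constraint violation: dept_id must reference an existing department."""
    return [r["emp_id"] for r in rows if r["dept_id"] not in valid_depts]

def duplicate_groups(rows):
    """Duplication: group records whose normalized name and dept_id coincide."""
    groups = defaultdict(list)
    for r in rows:
        # Lower-case and collapse whitespace so trivial variations match.
        key = (" ".join(r["name"].lower().split()), r["dept_id"])
        groups[key].append(r["emp_id"])
    return [ids for ids in groups.values() if len(ids) > 1]

print(integrity_violations(employees, departments))  # [3]
print(duplicate_groups(employees))                    # [[1, 2]]
```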
What are coverage anomalies?
- Missing attributes
- Missing records
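A minimal sketch of detecting both coverage anomalies, assuming a hypothetical expected schema (`EXPECTED_ATTRIBUTES`) and that the set of ids that should have arrived is known, e.g. from the source system. An attribute counts as missing if it is absent or `None`.

```python
EXPECTED_ATTRIBUTES = ("emp_id", "name", "dob", "salary")  # hypothetical schema

rows = [
    {"emp_id": 1, "name": "Alice", "dob": "1985-03-02", "salary": 50000},
    {"emp_id": 2, "name": "Bob", "dob": None, "salary": 62000},
    {"emp_id": 4, "name": "Dan", "dob": "1979-11-30"},
]

def missing_attributes(rows, expected):
    """Missing attributes: expected attributes that are absent or None in a record."""
    report = {}
    for r in rows:
        missing = [a for a in expected if r.get(a) is None]
        if missing:
            report[r["emp_id"]] = missing
    return report

def missing_records(rows, expected_ids):
    """Missing records: ids that were expected but never arrived."""
    present = {r["emp_id"] for r in rows}
    return sorted(set(expected_ids) - present)

print(missing_attributes(rows, EXPECTED_ATTRIBUTES))  # {2: ['dob'], 4: ['salary']}
print(missing_records(rows, range(1, 5)))             # [3]
```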
What are lexical errors?
Errors in the structure of the data: the stored values do not match the expected format, e.g., a record with more or fewer fields than the schema defines.
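A minimal sketch of spotting one kind of lexical error, a field-count mismatch, in a CSV extract. The three-field schema (`emp_id, name, salary`) and the raw data are hypothetical.

```python
import csv
import io

EXPECTED_FIELDS = 3  # hypothetical schema: emp_id, name, salary

raw = """1,Alice,50000
2,Bob
3,Carol,Smith,70000
"""

def lexical_errors(text, expected_fields):
    """Return (row_number, row) pairs whose field count does not match the schema."""
    bad = []
    for row_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if len(row) != expected_fields:
            bad.append((row_no, row))
    return bad

for row_no, row in lexical_errors(raw, EXPECTED_FIELDS):
    print(row_no, row)
# 2 ['2', 'Bob']
# 3 ['3', 'Carol', 'Smith', '70000']
```

Row 2 lost a field and row 3 gained one (a surname split into its own column), so in both cases the values no longer line up with the attributes.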