Chapter 44 Flashcards
What is data profiling?
Data profiling is the process of gathering information about a column by running queries against it, with the aim of identifying erroneous records. It should also give a detailed view of the quality of the data.
What is standardization?
Standardization is the process of making the number and types of columns, date formats, and storage conventions consistent across the database.
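As a rough illustration of the date-format part of standardization, the sketch below converts dates written in several styles into one ISO 8601 form. The list of input formats and the sample values are assumptions made up for this example; a real dataset would need its own list.

```python
from datetime import datetime

# Hypothetical input formats, chosen only for illustration.
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")

def standardize_date(raw):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    return None

# Made-up sample values: three spellings of the same day plus one bad record.
raw_dates = ["2021-03-05", "05/03/2021", "20210305", "not a date"]
standardized = [standardize_date(d) for d in raw_dates]
```

Values that match none of the formats come back as None, so they can be reported as erroneous records instead of being silently guessed.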
What do we identify during data profiling?
- Total number of values in a column
- Number of distinct values in a column
- Domain of a column
- Values out of domain of a column
- Validation of business rules
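The checks listed above can be sketched as simple queries. The example below uses Python's built-in sqlite3 module with an in-memory table; the table name, the columns, the allowed domain ('M', 'F'), and the age rule are all assumptions invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and sample rows, used only to demonstrate the checks.
conn.execute("CREATE TABLE customers (id INTEGER, gender TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "M", 34), (2, "F", 29), (3, "X", 41), (4, "F", -5), (5, None, 52)],
)

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

# Total number of (non-NULL) values in the column.
total_values = scalar("SELECT COUNT(gender) FROM customers")
# Number of distinct values in the column.
distinct_values = scalar("SELECT COUNT(DISTINCT gender) FROM customers")
# Observed domain of the column (NULLs excluded).
observed_domain = sorted(
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT gender FROM customers WHERE gender IS NOT NULL"
    )
)
# Values outside the assumed domain ('M', 'F').
out_of_domain = scalar(
    "SELECT COUNT(*) FROM customers WHERE gender NOT IN ('M', 'F')"
)
# Assumed business rule: age must lie between 0 and 120.
rule_violations = scalar(
    "SELECT COUNT(*) FROM customers WHERE age < 0 OR age > 120"
)
```

Note that the NULL gender is not counted as out of domain here, because a comparison with NULL is neither true nor false in SQL; missing values would need a separate IS NULL check.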
How many times should we perform data profiling to measure the effectiveness of a transformation?
Twice: once before the transformation and once after it.
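A minimal sketch of why profiling twice is useful: the same check is run before and after a transformation, and the two counts are compared. The domain, the sample values, and the transform function are hypothetical and exist only for this example.

```python
def count_out_of_domain(values, domain):
    """Profile step: count the values that fall outside the allowed domain."""
    return sum(1 for v in values if v not in domain)

DOMAIN = {"M", "F"}  # assumed domain of the column

# Made-up raw values with inconsistent case, spacing, and spelling.
before = ["M", "f", "F ", "male", "F", "?"]

def transform(value):
    """Hypothetical cleanup: trim, upper-case, and map long forms to codes."""
    cleaned = value.strip().upper()
    return {"MALE": "M", "FEMALE": "F"}.get(cleaned, cleaned)

after = [transform(v) for v in before]

errors_before = count_out_of_domain(before, DOMAIN)  # profile before
errors_after = count_out_of_domain(after, DOMAIN)    # profile after
```

Comparing errors_before with errors_after shows how many erroneous values the transformation fixed and how many remain for manual review.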
Which function is used to add to or subtract from dates?
CAST function
What is a golden copy?
The original copy.