Chapter 18 Flashcards
What is incremental refresh
Updating the DWH with only the incremental changes made in the operational system, rather than reloading all of the data.
What are the 2 sources for a DWH
- Modern system
- Legacy system
What is CDC
Change data capture (CDC) is the process of capturing changes made at the data source and applying them throughout the enterprise.
What is a legacy system
A system that was in use in the 1990s and is still in use to some extent.
What are the CDC techniques in modern systems
- Time stamps
- Triggers
- Partitioning
What is time stamp CDC
Whenever there is a DML operation, the date and time of the transaction are stored in a separate column.
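A minimal sketch of time stamp based extraction, assuming each source row carries a hypothetical `last_modified` column and that the time of the previous extraction is known. The row data and the `extract_changed` helper are made up for illustration.

```python
from datetime import datetime

# Hypothetical source rows; "last_modified" plays the role of the time stamp column.
rows = [
    {"id": 1, "name": "Ali", "last_modified": datetime(2024, 1, 5, 9, 0)},
    {"id": 2, "name": "Sara", "last_modified": datetime(2024, 1, 9, 14, 30)},
    {"id": 3, "name": "Omar", "last_modified": datetime(2024, 1, 10, 8, 15)},
]

def extract_changed(rows, last_extract_time):
    """Return only the rows modified after the previous extraction."""
    return [r for r in rows if r["last_modified"] > last_extract_time]

# Assume the previous extraction ran at midnight on 8 January 2024.
changed = extract_changed(rows, datetime(2024, 1, 8))
changed_ids = [r["id"] for r in changed]
```

Only rows 2 and 3 were modified after the assumed extraction time, so only they are picked up.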
What is trigger
Whenever there is a DML operation, a record with a time stamp is stored in a separate file, which can then be used in extraction.
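A rough sketch of trigger based capture using Python's built-in `sqlite3` module. As an assumption for this example, the separate file is represented by a separate table named `customer_changes`; the table names, trigger names, and columns are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);

-- Separate change table (stands in for the separate file).
CREATE TABLE customer_changes (
    customer_id INTEGER,
    operation   TEXT,
    changed_at  TEXT
);

-- Each DML operation on customer writes a time-stamped change record.
CREATE TRIGGER customer_ins AFTER INSERT ON customer
BEGIN
    INSERT INTO customer_changes VALUES (NEW.id, 'INSERT', datetime('now'));
END;

CREATE TRIGGER customer_upd AFTER UPDATE ON customer
BEGIN
    INSERT INTO customer_changes VALUES (NEW.id, 'UPDATE', datetime('now'));
END;
""")

conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ali')")
conn.execute("UPDATE customer SET name = 'Ali Khan' WHERE id = 1")

# The extraction step reads the change table instead of scanning customer.
change_log = conn.execute(
    "SELECT customer_id, operation FROM customer_changes ORDER BY rowid"
).fetchall()
```

The extraction process only needs to read `customer_changes`, so the source table itself is not scanned.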
What is partitioning
Data is logically divided into partitions. Whenever we need data for some period, we target the table (partition) for that period.
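One way to picture partitioning, assuming the data is partitioned by month. The `sales` rows and the (year, month) partition key are illustrative assumptions, not a specific database feature.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sales rows.
sales = [
    {"sale_date": date(2024, 1, 15), "amount": 100},
    {"sale_date": date(2024, 1, 28), "amount": 250},
    {"sale_date": date(2024, 2, 3), "amount": 75},
]

# Logically divide the rows into one partition per (year, month).
partitions = defaultdict(list)
for row in sales:
    key = (row["sale_date"].year, row["sale_date"].month)
    partitions[key].append(row)

# To extract January 2024, target only that partition.
january = partitions[(2024, 1)]
```

Extracting a period then touches only the matching partition and leaves the rest of the data alone.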
What are the CDC techniques in legacy systems
- Changes are recorded on tapes
- Changes are read and removed from the tapes
- Problems with reading a journal tape are many
(All operations are recorded on tapes instead of tables)
What are the advantages of CDC in legacy systems
- No incremental online
- The log tape captures all update processing
- Log tape processing can be taken off-line
- No haste to make waste
What are major transformation types
- Format revision
- Decoding of fields
- Calculated and derived values
- Splitting of single fields
- Character set conversion
- Unit of measurement conversion
- Date/time conversion
- Summarization
- Key restructuring
- Deduplication
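A small sketch showing a few of these transformation types applied to one record. The field names, the `GENDER_CODES` table, and the source date format are hypothetical choices made for this example.

```python
from datetime import datetime

# Hypothetical code table used for decoding of fields.
GENDER_CODES = {"M": "Male", "F": "Female"}

def transform(record):
    # Splitting of single fields: full name into first and last name.
    first, last = record["full_name"].split(" ", 1)
    return {
        "first_name": first,
        "last_name": last,
        # Decoding of fields: cryptic code to readable value.
        "gender": GENDER_CODES[record["gender"]],
        # Unit of measurement conversion: pounds to kilograms.
        "weight_kg": round(record["weight_lb"] * 0.45359237, 2),
        # Date/time conversion: DD/MM/YYYY to ISO format.
        "joined": datetime.strptime(record["joined"], "%d/%m/%Y").date().isoformat(),
        # Calculated and derived value: total from quantity and unit price.
        "total": record["quantity"] * record["unit_price"],
    }

source = {
    "full_name": "Sara Ahmed",
    "gender": "F",
    "weight_lb": 150,
    "joined": "05/01/2024",
    "quantity": 3,
    "unit_price": 20,
}
result = transform(source)
```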
What is merging
Collecting information from different columns and bringing it together in one place.
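A tiny illustration of merging, assuming the name and the city of a customer arrive separately and share a customer id. The `names` and `cities` data are made up.

```python
# Hypothetical inputs: the same customers described in two separate places.
names = {1: "Ali", 2: "Sara"}
cities = {1: "Lahore", 2: "Karachi"}

# Merge the information into one record per customer id.
merged = {cid: {"name": names[cid], "city": cities[cid]} for cid in names}
```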
What is aggregation
Suppose we already have some calculations. We then combine those pre-defined calculations to get new results. Those results are called an aggregation.
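As a sketch of this idea, assume daily sales totals have already been calculated and are now combined into monthly totals. The `daily_totals` figures are invented for the example.

```python
from collections import defaultdict

# Pre-defined calculations: totals already computed per day.
daily_totals = {"2024-01-15": 100, "2024-01-28": 250, "2024-02-03": 75}

# Combine the daily totals into one total per month (the "YYYY-MM" prefix).
monthly = defaultdict(int)
for day, total in daily_totals.items():
    monthly[day[:7]] += total

monthly_totals = dict(monthly)
```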
What is deduplication
Removing duplicate records during the transformation process.
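A minimal sketch of deduplication, assuming two records are duplicates when their email addresses match after trimming spaces and ignoring case. That matching rule and the sample records are assumptions for this example; real matching rules are usually more involved.

```python
# Hypothetical records; the second is a duplicate of the first.
records = [
    {"id": 1, "email": "ali@example.com"},
    {"id": 2, "email": "Ali@Example.com "},
    {"id": 3, "email": "sara@example.com"},
]

seen = set()
unique = []
for r in records:
    # Normalize the key so trivial differences do not hide duplicates.
    key = r["email"].strip().lower()
    if key not in seen:
        seen.add(key)
        unique.append(r)  # keep the first occurrence only

unique_ids = [r["id"] for r in unique]
```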