Data Warehous Flashcards
Types of Data Warehousing architecture
1) Centralized (default)
2) Component - Based
Centralized Data Warehouse
Single Database
One stop shopping
2 types of data marts
Dependent - Depend on a datawarehouse
Independent - draws information from a source database
2 variations of ETL
Initial
Incremental
- Normally one time only
- Created right before DW goes live
- All relevant data necessary for BI & analytic
- Redo if DW “blows up”
Initial ETL
What to bring into the DW
- Data definitely needed for BI & analytics
- Data probably needed for BI & analytics
- Historical Data
- periodically “refreshed” the DW
- Adds new data
- Modifies Data
- Special handling for deleted data
Incremental ETL
4 major Incremental ETL patters
- Append (adds new data to end)
- In place update (updates current data with new info)
- Complete Replacement
- Rolling Append (when you add new data is deleted the oldest data)
2 Current MOST USED incremental ETL patters
- Append
- In place Update
2 Data Transformation overarching goals
Uniformity
Restructuring
6 common transformation models
- Data Value Unification
- Data type and size unification
- De-duplication
- Dropping columns (vertical slicing)
- Value-based row filtering (horizontal slicing)
- Correcting Known Errors
In DW
Measurements = ?
Dimensional Context = ?
Facts
Dimensions
4 types of Fact Tables
- transaction
- periodic snapshot
- accumulating snapshot
- factless
Record facts (measurement) from transactions
Transaction Fact Table
Track a given measurement at regular intervals
Periodic Snapshot Table
Track the progress of a business process through formally defined stages
Accumulating Snapshot fact tables
1) Record occurrence of a transaction that has no measurements
2) Record coverage or eligibility relationships
Factless Fact table
What is it called when you filter down the data coming from the source databases to only bring in the data that you need.
Change Data Capture
Change Data Capture techniques
- Transactional Data Timestamp
- Database Logs
- Last resort: database scan-and-compare