ETL Process Flashcards
-takes the raw data from operational systems and prepares it for the data warehouse
extract, transformation,and load (ETL)
-need handle different extract schedules
-need to join data from multiple sources for a single dimension.
-need to determine correct to use when multiple versions of a field are avaiable from different sources.
ETL Requirements
-reading and understanding source data and copying needed data into data staging for further processing.
-implemented using gateways, standard interfaces, third-party products or custom development
extraction
(logical extraction approaches)
-entire data source extrated-no need to track changes to data.
full
(logical extraction approaches)
-only data that has changed during specified time interval is extracted
-more efficient, but requires identifying chages to data in operational source systems.
incremental
(extraction in databases)
-captures changes made to tables either synchronously (using triggers) or asynchronously (usin redo log files).
change data capture
(extraction in databases)
-can be used if operational system had these indicatimg date/time a row was last modified.
timestamps
(extraction in databases)
-can be used if source system use range partitioning with a date key
partitioning
(extraction in databases)
-typicallt used in conjuction with timestamp columns - triiger updates timestamp column upon each change on the source table
trigger
online
-can be done via a reference to anohter dataase or schema.
offiline
-extraction process access data in structures outside source system.
–flat files (via SQL Plus, OCI or Pro C programs)
–SSIS packages
–export data utility (show example)
–backup files (via SQL Server Backup Utility)
Extraction in SQL Server
online
-extraction process connects directly to source system to extract data.
-can be done via a databse link or direct schema reference.
offiline
-extraction process access data in structures outside source system.
–flat files (via SQL Plus, OCI or Pro C programs)
–SSIS packages
–export data utility (show example)
–backup files (via SQL Server Backup Utility)
extraction in Oracle
(extraction)
-applications such a informatca and datastage can connect directly to Oracle using Native Oracle drivers.
-queries are designed in a GUI interface, then run by a server engine when
third-party tools
(common types of business rules -transformation)
-only keep records with certain data values or ranges
filtering
(common types of business rules -transformation)
-totaling recording into a summary record
summarizing
(common types of business rules -transformation)
-combining two or more source system record into one output or target record.
merging