ETL Process Flashcards

1
Q

-takes the raw data from operational systems and prepares it for the data warehouse

A

extract, transformation,and load (ETL)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

-need handle different extract schedules
-need to join data from multiple sources for a single dimension.
-need to determine correct to use when multiple versions of a field are avaiable from different sources.

A

ETL Requirements

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

-reading and understanding source data and copying needed data into data staging for further processing.
-implemented using gateways, standard interfaces, third-party products or custom development

A

extraction

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

(logical extraction approaches)
-entire data source extrated-no need to track changes to data.

A

full

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

(logical extraction approaches)
-only data that has changed during specified time interval is extracted
-more efficient, but requires identifying chages to data in operational source systems.

A

incremental

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

(extraction in databases)
-captures changes made to tables either synchronously (using triggers) or asynchronously (usin redo log files).

A

change data capture

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

(extraction in databases)
-can be used if operational system had these indicatimg date/time a row was last modified.

A

timestamps

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

(extraction in databases)
-can be used if source system use range partitioning with a date key

A

partitioning

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

(extraction in databases)
-typicallt used in conjuction with timestamp columns - triiger updates timestamp column upon each change on the source table

A

trigger

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

online
-can be done via a reference to anohter dataase or schema.
offiline
-extraction process access data in structures outside source system.
–flat files (via SQL Plus, OCI or Pro C programs)
–SSIS packages
–export data utility (show example)
–backup files (via SQL Server Backup Utility)

A

Extraction in SQL Server

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

online
-extraction process connects directly to source system to extract data.
-can be done via a databse link or direct schema reference.
offiline
-extraction process access data in structures outside source system.
–flat files (via SQL Plus, OCI or Pro C programs)
–SSIS packages
–export data utility (show example)
–backup files (via SQL Server Backup Utility)

A

extraction in Oracle

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

(extraction)
-applications such a informatca and datastage can connect directly to Oracle using Native Oracle drivers.
-queries are designed in a GUI interface, then run by a server engine when

A

third-party tools

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

(common types of business rules -transformation)
-only keep records with certain data values or ranges

A

filtering

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

(common types of business rules -transformation)
-totaling recording into a summary record

A

summarizing

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

(common types of business rules -transformation)
-combining two or more source system record into one output or target record.

A

merging

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

(common types of business rules -transformation)
-converting codes to English description or data warehouse standard descriptions

A

transposing

17
Q

(common types of business rules -transformation)
-converting codes from differen sources to data warehouse standard values

A

codifying

18
Q

(common types of business rules -transformation)
-applying mathemtical formulas to rpoduce data to be stored in the data warehouse

A

derivations

19
Q

(common types of business rules -transformation)
-ensuring the quality of the data-that it is consistent, is of a known, recognized value, and conforms to the metadata definition for it.

A

cleansing

20
Q

-can be just about any type of data conversion
-during load are most efficient.
-can be done when fact data is inserted or as an update after the fact record has been loaded.
-don’t ignore or limit transformations because they are difficult!
-the difficult it is, the more important it is for the data warehouse designer to calculate if if possible.

A

transformation

21
Q

(transformation applications)
-each transformation implemented as a separate SQL operation, creating a temporary staging table.

A

multistage

22
Q

(transformation applications)
-“transform-whole-load”-transformations performed during loading process, eliminating need to create temporary staging tables.

A

pipelined

23
Q

process of loading transformed records into the warehouse.
-additional preprocessing may be done during load.
–sort
–aggregate
–compute views
–build indexes
–partition
-huge volumes of data to be loaded, yet small time window (usually at night) when the warehouse can be taken off-line.
-typically handled by batch load utlities

A

load

24
Q

(loading in oracle)
-SQL Loader
-external tables
-OCI and Direct Path APIs
-export/import

A

loading mechanisms

25
Q

(loading in oracle)
-loads data from flat files into Oracle
-can perform basic data transformations

A

SQL Loader