ETL Concepts & Ab Initio Basics Flashcards

1
Q

What does ETL stand for?

A

Extract, Transform, Load.

2
Q

What is the purpose of ETL?

A

To access and manipulate source data and load it into target systems.

3
Q

Why is ETL important for organisations?

A

It integrates scattered data from various platforms and architectures for unified reporting and applications.

4
Q

What are the main steps in ETL?

A

Extract → Validate → Clean → Transform → Aggregate → Load.

5
Q

What is the purpose of the “Extract” step?

A

Retrieve data from various source systems.

6
Q

What happens during “Transform”?

A

Data is cleaned, validated, and formatted for target systems.

7
Q

What does “Load” do in ETL?

A

Inserts transformed data into the target system, like a data warehouse.
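
The Extract, Transform, and Load steps from the cards above can be sketched in a few lines of plain Python. This is only an illustration, not how Ab Initio or any other ETL tool implements them: the sample CSV, the `employees` table, and the function names are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real job would read from files or source systems.
SOURCE_CSV = """name,city,hire_date
 alice ,london,2021-03-01
Bob,PARIS,2020-11-15
"""

def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: clean the rows and format them for the target table."""
    return [
        (row["name"].strip().title(), row["city"].strip().title(), row["hire_date"].strip())
        for row in rows
    ]

def load(records, conn):
    """Load: insert the transformed records into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS employees (name TEXT, city TEXT, hire_date TEXT)")
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", records)
    conn.commit()

# An in-memory SQLite database stands in for the data warehouse.
conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
loaded = conn.execute("SELECT name, city, hire_date FROM employees ORDER BY name").fetchall()
print(loaded)
```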

8
Q

Name three common ETL tools and their companies.

A

Informatica by Informatica Corporation.
DataStage by IBM.
Talend by Talend Software Company.

9
Q

What is Ab Initio?

A

A GUI-based ETL tool that supports distributed and parallel data processing.

10
Q

What is the primary component of Ab Initio?

A

A graph, which contains components and flows for data processing.

11
Q

What is GDE in Ab Initio?

A

Graphical Development Environment – used to build ETL applications.

12
Q

What is the role of the Co>Operating System in Ab Initio?

A

It manages graph execution and translates Ab Initio code into OS-specific commands.

13
Q

What is the Enterprise Meta>Environment (EME)?

A

A repository for version control and metadata management in Ab Initio.

14
Q

What is Data Parallelism?

A

Splits data into partitions that are processed simultaneously by multiple copies of the same component.
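
A minimal Python sketch of the idea, assuming a round-robin split and a thread pool in place of Ab Initio's own partitioning components. The names `partition_round_robin` and `clean_partition` are hypothetical and exist only for this example.

```python
from concurrent.futures import ThreadPoolExecutor

def partition_round_robin(records, n):
    """Deal the records out into n partitions, one record at a time."""
    return [records[i::n] for i in range(n)]

def clean_partition(part):
    """The same logic runs on every partition."""
    return [name.strip().upper() for name in part]

records = [" ann", "bob ", " cy ", "dee", "eve "]
partitions = partition_round_robin(records, 2)

# Each partition is handled by its own worker at the same time.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(clean_partition, partitions))

# Gather the partitions back into a single sorted list.
gathered = sorted(name for part in results for name in part)
print(gathered)
```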

15
Q

What is Pipeline Parallelism?

A

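Connected components in a graph run simultaneously on the same data flow: a downstream component starts processing records while the upstream component is still producing them.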
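
One way to picture this in Python is a chain of threads joined by small queues, so each stage can start on a record as soon as the previous stage emits it. The stage names (`reader`, `reformat`, `writer`) and the sentinel convention are assumptions for this sketch, not Ab Initio APIs.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the record stream

def reader(out_q, records):
    """Upstream stage: emit records one at a time."""
    for rec in records:
        out_q.put(rec)
    out_q.put(SENTINEL)

def reformat(in_q, out_q):
    """Middle stage: transform each record as soon as it arrives."""
    while True:
        rec = in_q.get()
        if rec is SENTINEL:
            out_q.put(SENTINEL)
            break
        out_q.put(rec.upper())

def writer(in_q, sink):
    """Downstream stage: collect the transformed records."""
    while True:
        rec = in_q.get()
        if rec is SENTINEL:
            break
        sink.append(rec)

# Small bounded queues play the role of the flows between components.
q1, q2 = queue.Queue(maxsize=2), queue.Queue(maxsize=2)
output = []
stages = [
    threading.Thread(target=reader, args=(q1, ["a", "b", "c", "d"])),
    threading.Thread(target=reformat, args=(q1, q2)),
    threading.Thread(target=writer, args=(q2, output)),
]
for stage in stages:
    stage.start()
for stage in stages:
    stage.join()
print(output)
```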

16
Q

What is Component Parallelism?

A

Different graph components process data simultaneously on separate branches.
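
As a rough Python analogy, two unrelated functions can be submitted to a thread pool so that each works on its own dataset at the same time. The functions `sort_customers` and `filter_orders` and the sample data are hypothetical stand-ins for two branches of a graph.

```python
from concurrent.futures import ThreadPoolExecutor

def sort_customers(customers):
    """Branch 1: sort the customer names."""
    return sorted(customers)

def filter_orders(orders):
    """Branch 2: keep only orders above 100."""
    return [o for o in orders if o["amount"] > 100]

customers = ["Zoe", "Adam", "Mia"]
orders = [{"id": 1, "amount": 250}, {"id": 2, "amount": 40}]

# The two branches are independent, so they can run concurrently.
with ThreadPoolExecutor(max_workers=2) as pool:
    customers_future = pool.submit(sort_customers, customers)
    orders_future = pool.submit(filter_orders, orders)
    sorted_customers = customers_future.result()
    large_orders = orders_future.result()
print(sorted_customers, large_orders)
```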

17
Q

What are Database Components in Ab Initio?

A

Components that read from or write to databases (e.g., Run SQL, Input Table, Output Table).

18
Q

What are Transform Components?

A

Components that apply functions for data integration (e.g., Join, Reformat, Sort, Filter by Expression).

19
Q

What are Dataset Components?

A

Components that handle serial or multi-file data (e.g., Input File, Output File, Lookup File).

20
Q

In an ETL use case, what happens during the “Extract” step?

A

Raw data is collected from source systems, like city names, titles, and hire dates.

21
Q

How is data transformed in ETL?

A

It is cleaned and mapped to target system formats using tools like Ab Initio GDE.

22
Q

What does the “Load” step achieve?

A

Data is inserted into the target system with additional attributes, like EmployeeID and DepartmentCode.

23
Q

Why is validation important in ETL?

A

To ensure the accuracy and integrity of extracted data.
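
A small Python sketch of what a validation step might look like, assuming two illustrative rules: the name must be non-empty and the hire date must be in `YYYY-MM-DD` format. The rules, field names, and the `is_valid` helper are assumptions for the example.

```python
from datetime import datetime

def is_valid(row):
    """Return True only if the row passes both illustrative checks."""
    if not row.get("name", "").strip():
        return False
    try:
        datetime.strptime(row.get("hire_date", ""), "%Y-%m-%d")
    except ValueError:
        return False
    return True

rows = [
    {"name": "Alice", "hire_date": "2021-03-01"},
    {"name": "", "hire_date": "2020-11-15"},       # missing name
    {"name": "Bob", "hire_date": "15/11/2020"},    # wrong date format
]

# Valid rows continue through the pipeline; rejected rows are kept for review.
valid = [r for r in rows if is_valid(r)]
rejected = [r for r in rows if not is_valid(r)]
print(len(valid), len(rejected))
```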

24
Q

Why use a sandbox in Ab Initio?

A

To save and manage graph components for repeatable ETL workflows.

25
Q

How can distributed processing improve ETL performance?

A

It allows components to run on multiple servers simultaneously.