ETL Concepts & Ab Initio Basics Flashcards
What does ETL stand for?
Extract, Transform, Load.
What is the purpose of ETL?
To access and manipulate source data and load it into target systems.
Why is ETL important for organisations?
It integrates scattered data from various platforms and architectures for unified reporting and applications.
What are the main steps in ETL?
Extract → Validate → Clean → Transform → Aggregate → Load.
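The steps above can be sketched in plain Python. This is a minimal illustration, not how any particular ETL tool implements them: the sample records, the field names (`region`, `amount`), and the function names are all assumptions made up for the example, and an in-memory dict stands in for the target system.

```python
from collections import defaultdict

# Hypothetical raw records standing in for an extracted source feed.
RAW_ROWS = [
    {"region": " north ", "amount": "10.5"},
    {"region": "South", "amount": "4"},
    {"region": "north", "amount": "not-a-number"},  # fails validation
    {"region": "", "amount": "3"},                   # fails validation
    {"region": "SOUTH", "amount": "6"},
]

def is_valid(row):
    """Validate: keep rows with a non-blank region and a numeric amount."""
    if not row["region"].strip():
        return False
    try:
        float(row["amount"])
    except ValueError:
        return False
    return True

def clean(row):
    """Clean: trim whitespace and normalise the region to lower case."""
    return {"region": row["region"].strip().lower(),
            "amount": row["amount"].strip()}

def transform(row):
    """Transform: convert fields to the types the target expects."""
    return {"region": row["region"], "amount": float(row["amount"])}

def aggregate(rows):
    """Aggregate: total amount per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)

def load(totals, target):
    """Load: write the aggregated totals into the target (a dict here)."""
    target.update(totals)
    return target

valid_rows = [row for row in RAW_ROWS if is_valid(row)]
cleaned = [clean(row) for row in valid_rows]
transformed = [transform(row) for row in cleaned]
warehouse = load(aggregate(transformed), {})
print(warehouse)
```

Keeping each step as its own function mirrors the way a graph keeps each stage as a separate component, so one stage can be changed without touching the others.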
What is the purpose of the “Extract” step?
Retrieve data from various source systems.
What happens during “Transform”?
Data is cleaned, validated, and formatted for target systems.
What does “Load” do in ETL?
Inserts transformed data into the target system, like a data warehouse.
Name three common ETL tools and their companies.
Informatica by Informatica Corporation.
DataStage by IBM.
Talend by Talend (now part of Qlik).
What is Ab Initio?
A GUI-based ETL tool that supports distributed and parallel data processing.
What is the primary component of Ab Initio?
A graph, which contains components and flows for data processing.
What is GDE in Ab Initio?
Graphical Development Environment – used to build ETL applications.
What is the role of the Co>Operating System in Ab Initio?
It manages graph execution and translates Ab Initio code into OS-specific commands.
What is the Enterprise Meta>Environment (EME)?
Ab Initio's metadata repository, which also provides version control for graphs and related objects.
What is Data Parallelism?
Data is split into partitions, and multiple instances of the same component process those partitions simultaneously.
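A rough Python analogy for data parallelism: the input is dealt out into partitions and the same function runs on each one. The helper names (`partition_round_robin`, `reformat`) and the three-way split are assumptions for illustration only. Python threads are used here to show the structure, not to claim a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

def partition_round_robin(records, ways):
    """Split records into `ways` partitions, dealing them out in turn."""
    return [records[i::ways] for i in range(ways)]

def reformat(partition):
    """The same logic runs on every partition (here: upper-case each value)."""
    return [value.upper() for value in partition]

records = ["a", "b", "c", "d", "e", "f", "g"]
partitions = partition_round_robin(records, 3)

# One worker per partition, all running the same function.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(reformat, partitions))

# Gather the partitions back into a single sorted output.
merged = sorted(value for part in results for value in part)
print(partitions)
print(merged)
```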
What is Pipeline Parallelism?
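Connected components in a graph run at the same time on the same data stream: a downstream component starts processing records as soon as the upstream component emits them, without waiting for it to finish.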
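Pipeline parallelism can be sketched with three stages joined by queues, where each stage works on a record as soon as the previous stage hands it over. The stage names and the doubling "transform" are hypothetical, and the queues only loosely play the role of flows between components.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the record stream

def read_stage(source, out_q):
    """Upstream stage: emit records one at a time."""
    for record in source:
        out_q.put(record)
    out_q.put(SENTINEL)

def transform_stage(in_q, out_q):
    """Middle stage: process each record as soon as it arrives."""
    while True:
        record = in_q.get()
        if record is SENTINEL:
            out_q.put(SENTINEL)
            break
        out_q.put(record * 2)

def write_stage(in_q, sink):
    """Downstream stage: collect records while earlier stages still run."""
    while True:
        record = in_q.get()
        if record is SENTINEL:
            break
        sink.append(record)

source = [1, 2, 3, 4, 5]
sink = []
# Small bounded queues force the stages to overlap instead of running in turn.
q1, q2 = queue.Queue(maxsize=2), queue.Queue(maxsize=2)

stages = [
    threading.Thread(target=read_stage, args=(source, q1)),
    threading.Thread(target=transform_stage, args=(q1, q2)),
    threading.Thread(target=write_stage, args=(q2, sink)),
]
for stage in stages:
    stage.start()
for stage in stages:
    stage.join()
print(sink)
```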
What is Component Parallelism?
Different components process different data simultaneously on separate branches of a graph.
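As a loose analogy for component parallelism, two unrelated functions can run at the same time on two unrelated inputs. The function names and sample data below are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def sort_customers(customers):
    """Branch 1: sort a customer list."""
    return sorted(customers)

def filter_large_orders(orders, minimum):
    """Branch 2: keep only orders at or above a threshold."""
    return [amount for amount in orders if amount >= minimum]

customers = ["Zoe", "Adam", "Mia"]
orders = [120, 15, 300, 42]

# The two branches share no data, so they can run side by side.
with ThreadPoolExecutor(max_workers=2) as pool:
    customers_future = pool.submit(sort_customers, customers)
    orders_future = pool.submit(filter_large_orders, orders, 100)
    sorted_customers = customers_future.result()
    large_orders = orders_future.result()

print(sorted_customers)
print(large_orders)
```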
What are Database Components in Ab Initio?
Used to read from or write to databases (e.g., Run SQL, Input Table, Output Table).
What are Transform Components?
Apply functions for data integration (e.g., Join, Reformat, Sort, Filter by Expression).
What are Dataset Components?
Handle serial or multifile data (e.g., Input File, Output File, Lookup File).
In an ETL use case, what happens during the “Extract” step?
Raw data, such as city names, titles, and hire dates, is collected from source systems.
How is data transformed in ETL?
It is cleaned and mapped to target system formats using tools like Ab Initio GDE.
What does the “Load” step achieve?
Data is inserted into the target system with additional attributes, like EmployeeID and DepartmentCode.
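The use case above can be walked through in a short Python sketch, with an in-memory SQLite table standing in for the target system. The sample rows, the date format, the `DEPARTMENT_CODES` mapping, and the table layout are all assumptions. The cards only name the fields, not how EmployeeID and DepartmentCode are derived.

```python
import sqlite3
from datetime import datetime

# Extract: hypothetical raw rows with city names, titles, and hire dates.
extracted = [
    {"city": "  new york", "title": "sales rep", "hire_date": "03/15/2021"},
    {"city": "CHICAGO ", "title": "Engineer", "hire_date": "11/02/2019"},
]

# Hypothetical title-to-department mapping used to derive DepartmentCode.
DEPARTMENT_CODES = {"Sales Rep": "SAL", "Engineer": "ENG"}

def transform(row, employee_id):
    """Clean each field and map it to the target format."""
    city = row["city"].strip().title()
    title = row["title"].strip().title()
    hire_date = datetime.strptime(row["hire_date"], "%m/%d/%Y").date().isoformat()
    department_code = DEPARTMENT_CODES.get(title, "UNK")
    return (employee_id, city, title, hire_date, department_code)

# Transform: assign a sequential EmployeeID (an assumption) to each row.
rows = [transform(row, employee_id)
        for employee_id, row in enumerate(extracted, start=1)]

# Load: insert the transformed rows into the target table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "EmployeeID INTEGER PRIMARY KEY, City TEXT, Title TEXT, "
    "HireDate TEXT, DepartmentCode TEXT)"
)
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()
loaded = conn.execute("SELECT * FROM employee ORDER BY EmployeeID").fetchall()
print(loaded)
```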
Why is validation important in ETL?
To ensure the accuracy and integrity of extracted data.
Why use a sandbox in Ab Initio?
To provide a working directory where graphs and their related files are saved and managed for repeatable ETL workflows.