Scheduling data processing jobs using Cron and Apache Airflow Flashcards
What is Apache Airflow?
Apache Airflow is an open-source platform to programmatically author, schedule, and monitor workflows.
What is a DAG in Airflow?
DAG stands for Directed Acyclic Graph, and it represents a workflow, detailing the tasks and their dependencies in Airflow.
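To make "directed" and "acyclic" concrete, here is a minimal sketch using only Python's standard library, not Airflow itself. The task names are hypothetical. It shows that a dependency graph yields a valid run order and that a cycle is rejected.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}

# static_order() yields each task only after all of its dependencies.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)

# A cycle (a needs b, b needs a) has no valid order, so it is rejected.
cycle_rejected = False
try:
    list(TopologicalSorter({"a": {"b"}, "b": {"a"}}).static_order())
except CycleError:
    cycle_rejected = True
print(cycle_rejected)
```

The same property is why a workflow expressed as a DAG can always be scheduled: there is always at least one task with no unmet dependencies.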
What are Tasks in Airflow?
Tasks are the basic units of execution in Airflow and are defined as individual steps in a DAG.
How do you define a DAG in Airflow?
A DAG is defined in Python code by instantiating the DAG class imported from the airflow package.
What is an Operator in Airflow?
Operators are templates that define individual tasks in a DAG. They determine what the tasks do, such as running a script, executing a Bash command, or calling an API.
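The "template" idea can be illustrated with a toy class. This is a sketch and not Airflow's real operator API: EchoOperator and its message parameter are hypothetical, and real Airflow operators receive a context argument in execute.

```python
class EchoOperator:
    """Hypothetical operator: a reusable template for 'emit a message' tasks."""

    def __init__(self, task_id, message):
        self.task_id = task_id
        self.message = message

    def execute(self):
        # The operator decides WHAT the task does; the arguments
        # passed at construction time make each task distinct.
        return f"[{self.task_id}] {self.message}"


# One template, two different tasks.
greet = EchoOperator(task_id="greet", message="hello")
farewell = EchoOperator(task_id="farewell", message="goodbye")
outputs = [greet.execute(), farewell.execute()]
print(outputs)
```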
What is a Sensor in Airflow?
Sensors are a special type of operator that waits for a condition to be met, rechecking it at intervals. They are used to wait for external events, such as a file arriving.
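The waiting behaviour can be sketched as a simple polling loop. This is a standard-library illustration and not Airflow's sensor implementation. The function wait_for and the condition file_has_arrived are hypothetical, and the parameter names poke_interval and timeout are chosen to echo the ones Airflow sensors accept.

```python
import time


def wait_for(condition, poke_interval=0.01, timeout=1.0):
    """Call condition() repeatedly until it returns True or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met before timeout")
        time.sleep(poke_interval)


# Hypothetical external event: becomes true on the third check.
checks = {"count": 0}


def file_has_arrived():
    checks["count"] += 1
    return checks["count"] >= 3


result = wait_for(file_has_arrived)
print(result, checks["count"])
```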
What is the purpose of the Airflow Scheduler?
The Scheduler monitors DAGs, triggers DAG runs when their schedules are due, and hands tasks whose dependencies are met to the executor to run.
What is the Airflow Web UI?
The Web UI is a graphical interface provided by Airflow to help users monitor and manage DAGs and tasks.
How does Airflow handle task dependencies?
Task dependencies are managed by setting upstream and downstream tasks in the DAG definition using the >> and << operators.
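How the >> and << operators can record dependencies is shown in the toy model below. It is a sketch of the idea using Python operator overloading, not Airflow's own classes. The Task class here is hypothetical.

```python
class Task:
    """Toy stand-in for a task; not Airflow's real class."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = set()
        self.downstream = set()

    def set_downstream(self, other):
        self.downstream.add(other.task_id)
        other.upstream.add(self.task_id)

    def __rshift__(self, other):
        # a >> b means: b runs after a.
        self.set_downstream(other)
        return other  # returning the right-hand side allows chaining

    def __lshift__(self, other):
        # a << b means: a runs after b.
        other.set_downstream(self)
        return other


extract, transform, load = Task("extract"), Task("transform"), Task("load")
cleanup = Task("cleanup")

extract >> transform >> load   # extract, then transform, then load
cleanup << load                # cleanup runs after load

print(transform.upstream, transform.downstream, cleanup.upstream)
```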
What is a Task Instance in Airflow?
A Task Instance represents a specific run of a task in a DAG, characterized by its execution date (called the logical date in newer Airflow versions) and its state.
What are XComs in Airflow?
XComs, or Cross-Communication, allow tasks to share small amounts of data, enabling inter-task communication.
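The push and pull pattern behind XComs can be sketched with a plain dictionary. This is an in-memory illustration only: the functions xcom_push and xcom_pull below are hypothetical helpers, not Airflow's API. The default key "return_value" mirrors the key Airflow uses for a task's return value.

```python
# Toy in-memory store keyed by (task_id, key).
xcom_store = {}


def xcom_push(task_id, key, value):
    """Record a small value produced by a task."""
    xcom_store[(task_id, key)] = value


def xcom_pull(task_id, key="return_value"):
    """Fetch a value another task pushed, or None if nothing was pushed."""
    return xcom_store.get((task_id, key))


# An upstream task shares a row count; a downstream task reads it.
xcom_push("extract", "return_value", 1250)
row_count = xcom_pull("extract")
missing = xcom_pull("transform")
print(row_count, missing)
```

Only small values such as counts, IDs, or file paths belong here; larger datasets are better passed by reference, for example as a path to stored data.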
What is a Task Group in Airflow?
Task Groups allow for grouping related tasks within a DAG, improving organization and readability.
What are Hooks in Airflow?
Hooks are interfaces to external systems, providing methods to interact with databases, cloud services, and other systems.
How can you handle retries and failures in Airflow tasks?
You can configure retries and failure handling using parameters like retries, retry_delay, and retry_exponential_backoff in the task definition.
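The effect of these parameters on wait times can be sketched as follows. This is a simplified model under stated assumptions: the delay doubles after each failed attempt, and the function retry_delays and its max_delay cap are hypothetical. Airflow's actual backoff calculation may differ in detail.

```python
from datetime import timedelta


def retry_delays(retries, retry_delay, exponential_backoff=False, max_delay=None):
    """Return the wait before each retry attempt (simplified model)."""
    delays = []
    for attempt in range(retries):
        if exponential_backoff:
            delay = retry_delay * (2 ** attempt)  # double on every retry
        else:
            delay = retry_delay                   # constant wait
        if max_delay is not None:
            delay = min(delay, max_delay)         # optional cap
        delays.append(delay)
    return delays


fixed = retry_delays(3, timedelta(minutes=5))
backoff = retry_delays(3, timedelta(minutes=5), exponential_backoff=True)
capped = retry_delays(
    3, timedelta(minutes=5), exponential_backoff=True, max_delay=timedelta(minutes=15)
)
print(fixed, backoff, capped)
```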
What is the Airflow Executor?
The Executor is a key component that determines how task instances are executed. Popular executors include the SequentialExecutor, LocalExecutor, and CeleryExecutor.
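The difference between running tasks one at a time and running them in parallel can be sketched with the standard library. This is an analogy and not Airflow's executor code. The functions run_sequentially and run_in_pool and the sample tasks are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_sequentially(callables):
    """One task at a time, in order (the idea behind a sequential executor)."""
    return [fn() for fn in callables]


def run_in_pool(callables, workers=2):
    """Several tasks at once on one machine (the idea behind a local executor)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fn) for fn in callables]
        # Collect results in submission order, regardless of finish order.
        return [future.result() for future in futures]


# Hypothetical independent tasks.
tasks = [lambda: "extract done", lambda: "validate done", lambda: "archive done"]

sequential_results = run_sequentially(tasks)
pooled_results = run_in_pool(tasks)
print(sequential_results == pooled_results)
```

Both strategies produce the same results; the executor choice changes how much work proceeds at once and where it runs, not what the tasks do.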