Tasks in Airflow Flashcards
- What is a Task in Airflow?
A Task is the basic unit of execution in Airflow. Tasks are arranged into DAGs, and upstream and downstream dependencies are set between them to express the order in which they should run.
- What are the three basic kinds of Task in Airflow?
Operators, Sensors, and a TaskFlow-decorated @task. Operators are predefined task templates, Sensors wait for an external event to happen, and a TaskFlow-decorated @task is a custom Python function packaged up as a Task.
- How are tasks related to each other in Airflow?
Tasks in Airflow are related to each other through their dependencies. An upstream task is a direct parent of another task, and a downstream task is a direct child of it. The terms describe only these direct relationships, not every task that sits higher or lower in the DAG.
- How can dependencies between tasks be declared in Airflow?
Dependencies can be declared using the >> and << (bitshift) operators, or the set_upstream and set_downstream methods.
- How does information pass between Tasks in Airflow?
By default, tasks run independently and don’t pass information to each other. To pass information from one task to another, use XComs (short for “cross-communications”), which are intended for small amounts of data.
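A minimal sketch of the XCom idea, assuming a toy in-memory store: `ToyXComStore`, `extract`, and `load` are hypothetical names, not Airflow's API. In real Airflow a task pushes with `ti.xcom_push(key=..., value=...)` and pulls with `ti.xcom_pull(task_ids=..., key=...)`, and the return value of a TaskFlow `@task` function is pushed under the key `return_value`. The toy only shows the shape of the exchange: an explicit push by the upstream task and an explicit pull by the downstream one.

```python
from typing import Any

# Same key name Airflow uses for TaskFlow return values.
RETURN_KEY = "return_value"


class ToyXComStore:
    """Hypothetical in-memory stand-in for Airflow's XCom storage."""

    def __init__(self) -> None:
        # Entries are keyed by (run_id, task_id, key), so values from
        # different DAG runs never collide.
        self._data: dict[tuple[str, str, str], Any] = {}

    def push(self, run_id: str, task_id: str, value: Any, key: str = RETURN_KEY) -> None:
        self._data[(run_id, task_id, key)] = value

    def pull(self, run_id: str, task_id: str, key: str = RETURN_KEY) -> Any:
        # Returns None when nothing was pushed for this run/task/key.
        return self._data.get((run_id, task_id, key))


store = ToyXComStore()


def extract(run_id: str) -> None:
    # The upstream task shares a small value by pushing it.
    store.push(run_id, "extract", {"rows": 42})


def load(run_id: str) -> int:
    # The downstream task explicitly pulls what the upstream task pushed.
    payload = store.pull(run_id, "extract")
    return payload["rows"]


extract("run_1")
print(load("run_1"))  # 42
```

Nothing flows between the two functions unless one pushes and the other pulls, which mirrors the point of the card: tasks are independent by default.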
- What is a Task Instance in Airflow?
A Task Instance is a specific run of a task for a given DAG run (and thus for a given data interval). Task Instances carry state, which records the stage of the lifecycle the task is in.
- What are the possible states for a Task Instance in Airflow?
The states are none, scheduled, queued, running, success, shutdown, restarting, failed, skipped, upstream_failed, up_for_retry, up_for_reschedule, deferred, and removed.
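A toy encoding of the states listed above, assuming nothing beyond that list. `TIState`, `HAPPY_PATH`, and `follows_happy_path` are hypothetical names; the happy path reflects the ideal lifecycle Airflow's documentation describes for a task that runs without problems (none, scheduled, queued, running, success).

```python
from enum import Enum


class TIState(str, Enum):
    """Hypothetical toy enum mirroring the task instance states above."""
    NONE = "none"
    SCHEDULED = "scheduled"
    QUEUED = "queued"
    RUNNING = "running"
    SUCCESS = "success"
    SHUTDOWN = "shutdown"
    RESTARTING = "restarting"
    FAILED = "failed"
    SKIPPED = "skipped"
    UPSTREAM_FAILED = "upstream_failed"
    UP_FOR_RETRY = "up_for_retry"
    UP_FOR_RESCHEDULE = "up_for_reschedule"
    DEFERRED = "deferred"
    REMOVED = "removed"


# The ideal lifecycle of a task instance that runs without problems.
HAPPY_PATH = [TIState.NONE, TIState.SCHEDULED, TIState.QUEUED,
              TIState.RUNNING, TIState.SUCCESS]


def follows_happy_path(history: list[TIState]) -> bool:
    """Return True if a recorded state history is exactly the ideal lifecycle."""
    return history == HAPPY_PATH


print(len(TIState))  # 14
print(follows_happy_path([TIState.NONE, TIState.SCHEDULED, TIState.QUEUED,
                          TIState.RUNNING, TIState.UP_FOR_RETRY]))  # False
```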
- What is a timeout in Airflow?
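A timeout is the maximum runtime allowed for a task, set with the task’s execution_timeout argument as a datetime.timedelta. If it’s breached, the task times out and AirflowTaskTimeout is raised. Sensors also have a timeout parameter, which matters for sensors in reschedule mode: it limits the total time the sensor may take to succeed, and if it is breached, AirflowSensorTimeout is raised and the sensor fails immediately without retrying.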
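A minimal sketch of an execution timeout, assuming a toy runner built on a daemon thread. `run_with_timeout` and `ToyTaskTimeout` are hypothetical stand-ins for Airflow's `execution_timeout` handling and `AirflowTaskTimeout`. One important difference: Airflow raises the exception inside the running task itself, whereas this toy only stops waiting, because Python threads cannot be killed from outside. The sensor `timeout` is not modelled here.

```python
import threading
import time
from datetime import timedelta
from typing import Any, Callable


class ToyTaskTimeout(Exception):
    """Hypothetical stand-in for Airflow's AirflowTaskTimeout."""


def run_with_timeout(fn: Callable[[], Any], execution_timeout: timedelta) -> Any:
    """Run fn, raising ToyTaskTimeout if it exceeds execution_timeout."""
    outcome: dict[str, Any] = {}

    def target() -> None:
        try:
            outcome["value"] = fn()
        except Exception as exc:  # hand task errors back to the caller
            outcome["error"] = exc

    # Daemon thread: an overrunning task will not keep the interpreter alive.
    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(execution_timeout.total_seconds())
    if worker.is_alive():
        raise ToyTaskTimeout(f"task exceeded execution_timeout={execution_timeout}")
    if "error" in outcome:
        raise outcome["error"]
    return outcome.get("value")


print(run_with_timeout(lambda: "done", timedelta(seconds=1)))  # done

try:
    run_with_timeout(lambda: time.sleep(2), timedelta(seconds=0.1))
except ToyTaskTimeout as exc:
    print("timed out:", exc)
```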
- What is an SLA in Airflow?
An SLA, or Service Level Agreement, is an expectation for the maximum time by which a task should complete, measured from the start of the DAG run. If a task takes longer than this, it appears in the “SLA Misses” part of the user interface and is included in an email listing all tasks that missed their SLA.
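The SLA check boils down to date arithmetic: the deadline is the DAG run’s start plus the SLA, not the task’s own start plus the SLA. A sketch, assuming a hypothetical helper `missed_sla`; in Airflow 2.x the SLA itself is set per task with the `sla` argument (a `datetime.timedelta`), and Airflow performs the check and sends the email itself.

```python
from datetime import datetime, timedelta
from typing import Optional


def missed_sla(dag_run_start: datetime, sla: timedelta,
               finished_at: Optional[datetime], now: datetime) -> bool:
    """Hypothetical check: has the task missed its SLA?

    The deadline is measured from the DAG run's start. A task that is still
    running (finished_at is None) misses once `now` passes the deadline.
    """
    deadline = dag_run_start + sla
    if finished_at is None:
        return now > deadline
    return finished_at > deadline


start = datetime(2024, 1, 1, 0, 0)
sla = timedelta(hours=1)
now = datetime(2024, 1, 1, 2, 0)
print(missed_sla(start, sla, datetime(2024, 1, 1, 0, 45), now))  # False
print(missed_sla(start, sla, datetime(2024, 1, 1, 1, 30), now))  # True
print(missed_sla(start, sla, None, datetime(2024, 1, 1, 1, 5)))  # True
```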
- What are special exceptions in Airflow?
Airflow provides two special exceptions that can be raised from within custom Task/Operator code: AirflowSkipException marks the current task as skipped, and AirflowFailException marks it as failed, ignoring any remaining retry attempts.
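A toy runner showing how the two exceptions change the outcome compared with an ordinary error, assuming simplified retry bookkeeping. `ToySkip`, `ToyFail`, and `run_task` are hypothetical stand-ins; in real task code you would raise `AirflowSkipException` or `AirflowFailException` imported from `airflow.exceptions`.

```python
from typing import Callable


class ToySkip(Exception):
    """Hypothetical stand-in for AirflowSkipException."""


class ToyFail(Exception):
    """Hypothetical stand-in for AirflowFailException."""


def run_task(fn: Callable[[], None], try_number: int, retries: int) -> str:
    """Return the resulting state of one attempt (try_number starts at 1)."""
    try:
        fn()
    except ToySkip:
        return "skipped"
    except ToyFail:
        # Fails immediately, ignoring any retries that remain.
        return "failed"
    except Exception:
        # An ordinary error is retried while attempts remain.
        return "up_for_retry" if try_number <= retries else "failed"
    return "success"


def no_new_files() -> None:
    raise ToySkip("nothing to process today")


def bad_credentials() -> None:
    raise ToyFail("retrying will not help")


def flaky() -> None:
    raise ConnectionError("temporary network error")


print(run_task(no_new_files, try_number=1, retries=3))     # skipped
print(run_task(bad_credentials, try_number=1, retries=3))  # failed
print(run_task(flaky, try_number=1, retries=3))            # up_for_retry
```

The design point is that AirflowFailException is for errors that retrying cannot fix (such as invalid credentials), so spending retry attempts on them would only delay the failure.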
- What are Zombie/Undead Tasks in Airflow?
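Zombie tasks are tasks that are supposed to be running but suddenly died (for example, their process was killed or the machine died); Airflow finds them periodically, cleans them up, and either fails or retries the task depending on its settings. Undead tasks are tasks that are running but are not supposed to be, often because Task Instances were edited manually via the UI; Airflow finds them periodically and terminates them.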
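A toy classifier for the two cases, assuming we can compare the state recorded in the metadata database with whether a task process is actually alive and heartbeating. `classify`, its arguments, and the five-minute threshold are hypothetical; Airflow’s real detection relies on its own heartbeat mechanism and configuration.

```python
from datetime import datetime, timedelta
from typing import Optional


def classify(db_state: str, process_alive: bool,
             last_heartbeat: Optional[datetime], now: datetime,
             threshold: timedelta = timedelta(minutes=5)) -> str:
    """Hypothetical zombie/undead classification for one task instance."""
    heartbeat_stale = last_heartbeat is None or now - last_heartbeat > threshold
    if db_state == "running" and (not process_alive or heartbeat_stale):
        # Recorded as running, but the process died or stopped reporting.
        return "zombie"
    if db_state != "running" and process_alive:
        # A process is still running although the task is not supposed to be.
        return "undead"
    return "ok"


now = datetime(2024, 1, 1, 12, 0)
print(classify("running", False, now - timedelta(minutes=10), now))  # zombie
print(classify("success", True, now, now))                           # undead
print(classify("running", True, now - timedelta(seconds=30), now))   # ok
```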
- What is Executor Configuration in Airflow?
Some executors allow optional per-task configuration; for example, the KubernetesExecutor lets you set the image a task runs on. This is done via the executor_config argument to a Task or Operator.
- How can dependencies between tasks be declared using the bitshift operators?
You can declare dependencies between tasks using the bitshift operators (>> and <<) in Airflow. For example, first_task >> second_task >> [third_task, fourth_task] makes second_task run after first_task, and third_task and fourth_task both run after second_task.
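To see what the bitshift syntax does, here is a toy `Task` class (hypothetical, not Airflow’s `BaseOperator`) that implements `>>` and `<<` on top of `set_downstream` and `set_upstream`, including lists on either side. Airflow’s operators behave analogously: `a >> b` is equivalent to `a.set_downstream(b)`, and the expression returns the right-hand side, which is what makes chains work.

```python
from __future__ import annotations

from typing import Union


class Task:
    """Toy task that records its direct upstream/downstream neighbours."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.upstream: set[str] = set()
        self.downstream: set[str] = set()

    def set_downstream(self, others: Union[Task, list[Task]]) -> None:
        for other in others if isinstance(others, list) else [others]:
            self.downstream.add(other.task_id)
            other.upstream.add(self.task_id)

    def set_upstream(self, others: Union[Task, list[Task]]) -> None:
        for other in others if isinstance(others, list) else [others]:
            other.set_downstream(self)

    def __rshift__(self, others):
        # self >> others: others run after self; return others so chains work.
        self.set_downstream(others)
        return others

    def __lshift__(self, others):
        # self << others: others run before self.
        self.set_upstream(others)
        return others

    def __rrshift__(self, others):
        # [a, b] >> self: lists have no >> for tasks, so Python calls this.
        self.set_upstream(others)
        return self

    def __rlshift__(self, others):
        # [a, b] << self: the listed tasks run after self.
        self.set_downstream(others)
        return self


first_task, second_task = Task("first_task"), Task("second_task")
third_task, fourth_task = Task("third_task"), Task("fourth_task")

first_task >> second_task >> [third_task, fourth_task]
print(sorted(second_task.downstream))  # ['fourth_task', 'third_task']

fifth_task = Task("fifth_task")
[third_task, fourth_task] >> fifth_task  # same as fifth_task.set_upstream([...])
print(sorted(fifth_task.upstream))  # ['fourth_task', 'third_task']
```

Because each operator simply calls one of the two methods, the bitshift form and the set_upstream/set_downstream form produce identical graphs; the choice between them is purely about readability.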
- What is the recommended way to declare dependencies between tasks in Airflow?
The recommended way to declare dependencies between tasks in Airflow is to use the bitshift operators (>> and <<), as they are easier to read in most cases.
- What is the default behavior of a Task in Airflow regarding upstream tasks?
By default, a Task runs only when all of its upstream (parent) tasks have succeeded.
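This default corresponds to Airflow’s `all_success` trigger rule, which can be changed per task with the `trigger_rule` argument. The sketch below is a simplified, hypothetical evaluator (`evaluate` is not an Airflow function) for that rule and one alternative, `all_done`; it ignores many states and rules that real Airflow handles.

```python
FINISHED = {"success", "failed", "upstream_failed", "skipped"}


def evaluate(trigger_rule: str, upstream_states: list[str]) -> str:
    """Hypothetical, simplified trigger-rule check for one task.

    Returns "run", "wait", or the state the task would be set to instead.
    """
    if trigger_rule == "all_success":  # Airflow's default
        if any(s in ("failed", "upstream_failed") for s in upstream_states):
            return "upstream_failed"
        if "skipped" in upstream_states:
            return "skipped"
        if all(s == "success" for s in upstream_states):
            return "run"
        return "wait"
    if trigger_rule == "all_done":
        # Runs once every parent has finished, whatever the outcome.
        return "run" if all(s in FINISHED for s in upstream_states) else "wait"
    raise ValueError(f"rule not modelled in this sketch: {trigger_rule}")


print(evaluate("all_success", ["success", "success"]))  # run
print(evaluate("all_success", ["success", "running"]))  # wait
print(evaluate("all_success", ["success", "failed"]))   # upstream_failed
print(evaluate("all_done", ["success", "failed"]))      # run
```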
- How can you modify the behavior of a Task in Airflow to add branching or change its dependencies?
You can modify the behavior of a Task in Airflow by using Control Flow techniques. These let you add branching, specify which upstream tasks to wait for (trigger rules), or change behavior based on the current run’s history (for example, Latest Only or Depends On Past).
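Branching is one of these techniques: a branch task decides which of its direct downstream tasks to follow, and the rest are skipped. In Airflow 2.x this is done with the `@task.branch` decorator or `BranchPythonOperator`, whose callable returns a task_id (or a list of task_ids) that must be directly downstream of the branch task. The functions below (`apply_branch`, `pick_path`) are a hypothetical model of that skip logic, not Airflow’s implementation.

```python
from typing import Union


def apply_branch(choice: Union[str, list[str]],
                 downstream_ids: list[str]) -> dict[str, str]:
    """Hypothetical model of branching: follow the chosen task_id(s), skip the rest."""
    chosen = {choice} if isinstance(choice, str) else set(choice)
    unknown = chosen - set(downstream_ids)
    if unknown:
        # A branch may only pick tasks that are directly downstream of it.
        raise ValueError(f"not directly downstream: {sorted(unknown)}")
    return {tid: ("follow" if tid in chosen else "skipped") for tid in downstream_ids}


def pick_path(is_weekend: bool) -> str:
    # The branch callable returns the task_id of the path to take.
    return "weekend_report" if is_weekend else "weekday_report"


print(apply_branch(pick_path(True), ["weekday_report", "weekend_report"]))
# {'weekday_report': 'skipped', 'weekend_report': 'follow'}
```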
- What are the two ways to declare dependencies between tasks in Airflow?
The two ways to declare dependencies between tasks in Airflow are the bitshift operators (>> and <<) and the more explicit set_upstream and set_downstream methods.