Tasks in Airflow Flashcards

1
Q
What is a Task in Airflow?
A

A Task is the basic unit of execution in Airflow. Tasks are arranged into DAGs, and upstream and downstream dependencies are set between them to express the order in which they should run.

2
Q
What are the three basic kinds of Task in Airflow?
A

Operators, Sensors, and a TaskFlow-decorated @task. Operators are predefined task templates, Sensors wait for an external event to happen, and a TaskFlow-decorated @task is a custom Python function packaged up as a Task.

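A minimal sketch showing one task of each kind inside a single DAG, assuming Airflow 2.4+ with the bundled FileSensor available; the DAG id, commands, and file path are illustrative:

```python
import pendulum

from airflow.decorators import dag, task
from airflow.operators.bash import BashOperator
from airflow.sensors.filesystem import FileSensor


@dag(schedule=None, start_date=pendulum.datetime(2024, 1, 1, tz="UTC"), catchup=False)
def three_kinds_of_task():
    # 1. An Operator: a predefined task template.
    extract = BashOperator(task_id="extract", bash_command="echo extracting")

    # 2. A Sensor: waits for something external to happen (here, a file appearing).
    wait_for_file = FileSensor(task_id="wait_for_file", filepath="/tmp/data.csv")

    # 3. A TaskFlow-decorated @task: a custom Python function packaged as a Task.
    @task
    def transform():
        print("transforming")

    extract >> wait_for_file >> transform()


three_kinds_of_task()
```
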
3
Q
How are tasks related to each other in Airflow?
A

Tasks in Airflow are related to each other through their dependencies, expressed as upstream and downstream relationships. An upstream task directly precedes another task, while a downstream task is a direct child of (directly follows) another task.

4
Q
How can dependencies between tasks be declared in Airflow?
A

Dependencies can be declared using the >> and << (bitshift) operators, or the set_upstream and set_downstream methods.

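A short sketch of both styles, assuming Airflow 2.4+ (EmptyOperator and the schedule parameter); the DAG id and task ids are illustrative, and both styles declare the same graph:

```python
import pendulum

from airflow import DAG
from airflow.operators.empty import EmptyOperator

with DAG(
    dag_id="dependency_styles",  # illustrative
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    schedule=None,
    catchup=False,
):
    first = EmptyOperator(task_id="first_task")
    second = EmptyOperator(task_id="second_task")
    third = EmptyOperator(task_id="third_task")
    fourth = EmptyOperator(task_id="fourth_task")

    # Bitshift style: first, then second, then third and fourth in parallel.
    first >> second >> [third, fourth]

    # The explicit-method equivalent (commented out so the same
    # edges are not declared twice):
    # first.set_downstream(second)
    # second.set_downstream([third, fourth])
```
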
5
Q
How does information pass between Tasks in Airflow?
A

Tasks don’t pass information to each other by default and run independently. If you want to pass information from one Task to another, you should use XComs.

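A hedged sketch of explicit XCom passing with xcom_push and xcom_pull, assuming Airflow 2.x (which passes context variables like ti to callables automatically); the key, value, and DAG id are illustrative:

```python
import pendulum

from airflow import DAG
from airflow.operators.python import PythonOperator


def produce(ti):
    # Explicitly push a value to XCom under a key.
    ti.xcom_push(key="row_count", value=42)


def consume(ti):
    # Pull the value pushed by the upstream task.
    row_count = ti.xcom_pull(task_ids="produce", key="row_count")
    print(f"row_count = {row_count}")


with DAG(
    dag_id="xcom_example",  # illustrative
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    schedule=None,
    catchup=False,
):
    produce_task = PythonOperator(task_id="produce", python_callable=produce)
    consume_task = PythonOperator(task_id="consume", python_callable=consume)
    produce_task >> consume_task
```
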
6
Q
What is a Task Instance in Airflow?
A

A Task Instance is a specific run of a task for a given DAG Run (and thus for a given data interval). It also represents the stage of the lifecycle the task is currently in.

7
Q
What are the possible states for a Task Instance in Airflow?
A

The states are none, scheduled, queued, running, success, shutdown, restarting, failed, skipped, upstream_failed, up_for_retry, up_for_reschedule, deferred, and removed.

8
Q
What is a timeout in Airflow?
A

A timeout is a maximum runtime for a task. If it’s breached, the task times out and AirflowTaskTimeout is raised. Sensors also have a timeout parameter which is relevant for sensors in reschedule mode.

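A minimal sketch of a task-level timeout, assuming Airflow 2.4+; the DAG id, command, and two-minute window are illustrative:

```python
from datetime import timedelta

import pendulum

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="timeout_example",  # illustrative
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    schedule=None,
    catchup=False,
):
    # If the command runs past two minutes, the task times out
    # and AirflowTaskTimeout is raised.
    slow_task = BashOperator(
        task_id="slow_task",
        bash_command="sleep 300",
        execution_timeout=timedelta(minutes=2),
    )
```
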
9
Q
What is an SLA in Airflow?
A

An SLA, or Service Level Agreement, is an expectation for the maximum time a Task should take to complete relative to the DAG Run start time. If a task runs longer than this, it becomes visible in the “SLA Misses” part of the user interface and is included in the email listing all tasks that missed their SLA.

10
Q
What are special exceptions in Airflow?
A

Airflow provides two special exceptions that can be raised from within custom Task/Operator code: AirflowSkipException marks the current task as skipped, and AirflowFailException marks the current task as failed, ignoring any remaining retry attempts.

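A sketch of both exceptions raised from a TaskFlow task, assuming Airflow 2.x; the validation logic and record shape are invented for illustration:

```python
from airflow.decorators import task
from airflow.exceptions import AirflowFailException, AirflowSkipException


@task
def validate(record: dict):
    if not record:
        # Mark this task instance as skipped.
        raise AirflowSkipException("empty record, nothing to process")
    if "id" not in record:
        # Fail immediately, ignoring any remaining retry attempts.
        raise AirflowFailException("record has no id; retrying won't help")
    print(f"processing record {record['id']}")
```
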
11
Q
What are Zombie/Undead Tasks in Airflow?
A

Zombie tasks are tasks that are supposed to be running but suddenly died. Undead tasks are tasks that are not supposed to be running but are, often caused when you manually edit Task Instances via the UI.

12
Q
What is Executor Configuration in Airflow?
A

Some Executors allow optional per-task configuration. This is achieved via the executor_config argument to a Task or Operator.

13
Q
How can dependencies between tasks be declared using the bitshift operators?
A

You can declare dependencies between tasks using the bitshift operators (>> and <<) in Airflow. For example: first_task >> second_task >> [third_task, fourth_task]

14
Q
What is the recommended way to declare dependencies between tasks in Airflow?
A

The recommended way to declare dependencies between tasks in Airflow is to use the bitshift operators (>> and <<), as they are easier to read in most cases.

15
Q
What is the default behavior of a Task in Airflow regarding upstream tasks?
A

By default, a Task in Airflow runs only once all of its upstream (parent) tasks have succeeded.

16
Q
How can you modify the behavior of a Task in Airflow to add branching or change its dependencies?
A

You can modify the behavior of a Task in Airflow by using Control Flow techniques. These techniques allow you to add branching, specify which upstream tasks to wait for, or change behavior based on the current run’s history.

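A small branching sketch using the @task.branch decorator, assuming Airflow 2.3+; the DAG id, task ids, and the selection logic are illustrative:

```python
import pendulum

from airflow.decorators import dag, task
from airflow.operators.empty import EmptyOperator


@dag(schedule=None, start_date=pendulum.datetime(2024, 1, 1, tz="UTC"), catchup=False)
def branching_example():
    @task.branch
    def choose_path() -> str:
        # Return the task_id (or list of task_ids) to follow;
        # the other direct downstream tasks are skipped.
        return "fast_path"

    fast = EmptyOperator(task_id="fast_path")
    slow = EmptyOperator(task_id="slow_path")

    choose_path() >> [fast, slow]


branching_example()
```
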
17
Q
What are the two ways to declare dependencies between tasks in Airflow?
A

The two ways to declare dependencies between tasks in Airflow are using the bitshift operators (>> and <<) or the more explicit set_upstream and set_downstream methods.

18
Q
What is the purpose of XComs in Airflow?
A

XComs are used in Airflow to pass information between tasks. They provide a way to share data, such as results or intermediate values, from one task to another within a workflow.

19
Q
What is an SLA miss in Airflow?
A

An SLA miss occurs when a task in Airflow takes longer to complete than the defined SLA (Service Level Agreement) timeframe. It is tracked and can be seen in the Airflow UI and can trigger notifications or alerts.

20
Q
How can you set an SLA for a task in Airflow?
A

To set an SLA for a task in Airflow, you can pass a datetime.timedelta object to the task’s “sla” parameter. This defines the maximum time allowed for the task to complete relative to the DAG Run start time.

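A minimal sketch, assuming Airflow 2.4+; the DAG id, task, command, and 30-minute window are illustrative (SLAs are checked against scheduled runs, hence the daily schedule):

```python
from datetime import timedelta

import pendulum

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="sla_example",  # illustrative
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    schedule="@daily",
    catchup=False,
):
    # Recorded as an SLA miss if not finished within 30 minutes
    # of the DAG Run start.
    build_report = BashOperator(
        task_id="build_report",
        bash_command="echo building report",
        sla=timedelta(minutes=30),
    )
```
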
21
Q
What are AirflowSkipException and AirflowFailException used for?
A

AirflowSkipException is used to mark the current task as skipped, while AirflowFailException is used to mark the current task as failed, ignoring any remaining retry attempts. These exceptions provide control over the task’s state from within custom Task/Operator code.

22
Q
What are Zombie tasks and Undead tasks in Airflow?
A

Zombie tasks in Airflow are tasks that are supposed to be running but suddenly died or became inactive. Undead tasks are tasks that are not supposed to be running but are active due to manual editing of Task Instances via the Airflow UI.

23
Q
How can you configure task-specific settings for certain Executors in Airflow?
A

You can configure task-specific settings for certain Executors in Airflow using the “executor_config” argument provided to a Task or Operator. This allows you to specify executor-specific parameters or configurations for individual tasks.

24
Q
What is the purpose of a Task in Airflow?
A

A Task in Airflow represents a unit of work or a specific action that needs to be executed as part of a workflow. It can be an operator, a sensor, or a custom Python function decorated as a Task.

25
Q
What is the relationship between Tasks and Operators in Airflow?
A

In Airflow, Tasks and Operators are somewhat interchangeable concepts. Operators are predefined task templates that can be used to build most parts of a DAG (Directed Acyclic Graph), and when called in a DAG file, they become Tasks.

26
Q
What is the role of Sensors in Airflow?
A

Sensors are a special subclass of Operators in Airflow that are designed for waiting and monitoring external events or conditions before proceeding with the execution of downstream tasks. They help synchronize the workflow with external dependencies.

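A hedged sketch of a sensor in reschedule mode, assuming Airflow 2.4+ with the bundled FileSensor; the DAG id, path, and intervals are illustrative. In reschedule mode the worker slot is freed between checks, and timeout caps the total time spent waiting:

```python
import pendulum

from airflow import DAG
from airflow.sensors.filesystem import FileSensor

with DAG(
    dag_id="sensor_example",  # illustrative
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    schedule=None,
    catchup=False,
):
    # Waits for /tmp/data.csv to appear, checking every 60 seconds.
    wait_for_file = FileSensor(
        task_id="wait_for_file",
        filepath="/tmp/data.csv",
        poke_interval=60,
        mode="reschedule",
        timeout=60 * 60,  # give up after one hour of waiting
    )
```
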
27
Q
How are Tasks and Operators related to the BaseOperator class in Airflow?
A

In Airflow, Tasks and Operators are subclasses of Airflow’s BaseOperator class. The BaseOperator provides a foundation for defining and executing tasks, and it encapsulates common functionality and attributes that are shared among tasks and operators.

28
Q
What is the purpose of upstream and downstream tasks in Airflow?
A

Upstream tasks in Airflow are the tasks that directly precede a given task. Downstream tasks are the tasks that are directly dependent on a given task. These relationships define the order and dependencies between tasks within a DAG.

29
Q
How can you declare dependencies between tasks using the set_upstream and set_downstream methods in Airflow?
A

To declare dependencies between tasks using the set_upstream and set_downstream methods in Airflow, you can use the following syntax: first_task.set_downstream(second_task), third_task.set_upstream(second_task). These methods explicitly define the upstream and downstream dependencies between tasks.

30
Q
What is the purpose of the execution_timeout attribute in Airflow?
A

The execution_timeout attribute in Airflow allows you to set a maximum runtime for a task. If the task exceeds this duration, it will time out and AirflowTaskTimeout will be raised.

31
Q
What is an SLA miss callback in Airflow?
A

An SLA miss callback is a function that can be defined in Airflow to run custom logic when a task misses its SLA (Service Level Agreement) timeframe. It receives parameters such as the DAG, task list, blocking task list, and SLA information.

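A sketch following the documented callback signature, assuming Airflow 2.x; the function name is illustrative and the body does logging only:

```python
def my_sla_miss_callback(dag, task_list, blocking_task_list, slas, blocking_tis):
    # Runs when the scheduler records SLA misses for this DAG.
    print(f"SLA missed in DAG {dag.dag_id}")
    print(f"Tasks past their SLA:\n{task_list}")
    print(f"Tasks blocking them:\n{blocking_task_list}")


# Attached at the DAG level, e.g.:
# DAG(dag_id="my_dag", sla_miss_callback=my_sla_miss_callback, ...)
```
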
32
Q
How can you configure per-task settings for specific Executors in Airflow?
A

In Airflow, per-task settings for specific Executors can be configured using the executor_config argument provided to a Task or Operator. This allows you to specify executor-specific configurations, such as the Docker image for tasks running on the KubernetesExecutor.

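A hedged sketch of the KubernetesExecutor case mentioned above, assuming Airflow 2.x with the CNCF Kubernetes provider and the kubernetes client library installed; the DAG id, image, and registry are illustrative, and the exact executor_config shape can vary across executors and versions:

```python
import pendulum
from kubernetes.client import models as k8s

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="executor_config_example",  # illustrative
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    schedule=None,
    catchup=False,
):
    # Run just this task in a custom image when using the KubernetesExecutor.
    custom_image_task = BashOperator(
        task_id="custom_image_task",
        bash_command="echo running in a custom image",
        executor_config={
            "pod_override": k8s.V1Pod(
                spec=k8s.V1PodSpec(
                    containers=[
                        k8s.V1Container(name="base", image="my-registry/my-image:1.0")
                    ]
                )
            )
        },
    )
```
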
33
Q
What is the purpose of Control Flow in Airflow?
A

Control Flow in Airflow allows you to modify the behavior of tasks and express complex workflow logic. It enables you to add branching, specify conditional dependencies, and control task execution based on the history or context of the current run.

34
Q
What is the purpose of TaskFlow in Airflow?
A

TaskFlow is a feature in Airflow that allows you to define tasks as custom Python functions decorated with @task. It provides a more flexible and expressive way to create tasks compared to using predefined operators.

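A minimal TaskFlow sketch, assuming Airflow 2.x; the DAG id, function names, and values are illustrative. Calling @task functions wires up both the dependencies and the XCom passing automatically:

```python
import pendulum

from airflow.decorators import dag, task


@dag(schedule=None, start_date=pendulum.datetime(2024, 1, 1, tz="UTC"), catchup=False)
def taskflow_example():
    @task
    def add(x: int, y: int) -> int:
        return x + y

    @task
    def report(total: int):
        print(f"total = {total}")

    # report runs after add, receiving its return value via XCom.
    report(add(1, 2))


taskflow_example()
```
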
35
Q
What is the relationship between TaskFlow and Task in Airflow?
A

TaskFlow tasks, decorated with @task, are a type of Task in Airflow. They represent custom Python functions packaged as tasks within a DAG. TaskFlow tasks can be used alongside other types of tasks and operators in a workflow.

36
Q
What are the possible states of a Task Instance in Airflow?
A

The possible states of a Task Instance in Airflow are: none, scheduled, queued, running, success, shutdown, restarting, failed, skipped, upstream_failed, up_for_retry, up_for_reschedule, deferred, and removed. These states represent the various stages in the lifecycle of a task during execution.

37
Q
How can you handle task timeouts in Airflow?
A

To handle task timeouts in Airflow, you can set the “execution_timeout” attribute of a task to a datetime.timedelta value. This defines the maximum allowed runtime for the task. If the task exceeds this duration, it times out and AirflowTaskTimeout is raised.

38
Q
What is the purpose of SLAs in Airflow?
A

SLAs (Service Level Agreements) in Airflow define the maximum time a task should be completed relative to the DAG Run start time. They provide a measure of performance and can trigger alerts or notifications when a task exceeds its defined SLA timeframe.

39
Q
What are Zombie tasks in Airflow?
A

Zombie tasks in Airflow are tasks that are supposed to be running but suddenly died or became inactive. They are typically detected during periodic checks performed by Airflow, and the necessary actions are taken to clean them up or handle their failures.

40
Q
What are Undead tasks in Airflow?
A

Undead tasks in Airflow are tasks that are not supposed to be running but are active. They often occur when Task Instances are manually edited via the Airflow user interface, resulting in tasks being in an active state even when they shouldn’t be.

41
Q
What is Executor Configuration in Airflow?
A

Executor Configuration in Airflow allows you to set per-task configurations for specific Executors. It provides a way to customize the execution environment and behavior of tasks by specifying executor-specific parameters or settings.

42
Q
How can you set an SLA for a task in Airflow?
A

To set an SLA for a task in Airflow, you can use the “sla” parameter of the Task or Operator. By specifying a datetime.timedelta value, you define the maximum time allowed for the task to complete relative to the DAG Run start time.

43
Q
What are the two ways to declare task dependencies in Airflow?
A

In Airflow, task dependencies can be declared using the bitshift operators (>> and <<) or the set_upstream and set_downstream methods. Both approaches allow you to establish upstream and downstream relationships between tasks.

44
Q
What is the purpose of an SLA miss callback in Airflow?
A

An SLA miss callback in Airflow is a function that gets triggered when a task misses its SLA timeframe. It provides a way to handle and react to SLA violations, allowing you to customize actions such as sending notifications, logging, or performing specific tasks.