Airflow Core Concepts - Sheet1 Flashcards
- What is a DAG in Airflow?
A DAG (Directed Acyclic Graph) is the core concept of Airflow, collecting Tasks together, organized with dependencies and relationships to say how they should run.
- What is the role of a DAG in Airflow?
The DAG in Airflow doesn’t care about what is happening inside the tasks; it is only concerned with how to execute them — the order to run them in, how many times to retry them, whether they have timeouts, and so on.
- What does a basic DAG define?
A basic DAG defines the tasks and dictates the order in which they have to run, and which tasks depend on what others. It also states how often to run the DAG.
- What are the three ways to declare a DAG in Airflow?
The three ways to declare a DAG in Airflow are: using a context manager (a with statement), using the standard constructor and passing the DAG into any operators you use, or using the @dag decorator to turn a function into a DAG generator.
- What do DAGs need to run in Airflow?
DAGs need Tasks to run in Airflow, and those usually come in the form of either Operators, Sensors, or TaskFlow.
- What does a Task/Operator in a DAG usually depend on?
A Task/Operator in a DAG usually has dependencies on other tasks (those upstream of it), and other tasks depend on it (those downstream of it).
- How are individual task dependencies declared in a DAG?
Individual task dependencies in a DAG can be declared using the >> and << operators, or using the more explicit set_upstream and set_downstream methods.
- What is the cross_downstream method used for in a DAG?
The cross_downstream method is used in a DAG to make two lists of tasks depend on all parts of each other: every task in the second list is made downstream of every task in the first list.
- What is the chain method used for in a DAG?
The chain method is used in a DAG to chain together dependencies, or to create pairwise dependencies for lists of the same size.
- How does Airflow load DAGs?
Airflow loads DAGs from Python source files, which it looks for inside its configured DAG_FOLDER. It takes each file, executes it, and then loads any DAG objects defined at the top level of that file.
- Can you define multiple DAGs per Python file?
Yes, you can define multiple DAGs per Python file, or even spread one very complex DAG across multiple Python files using imports.
- What does Airflow consider when searching for DAGs inside the DAG_FOLDER?
When searching for DAGs inside the DAG_FOLDER, Airflow only considers Python files that contain the strings airflow and dag (case-insensitively) as an optimization.
- How to consider all Python files when searching for DAGs inside the DAG_FOLDER?
To consider all Python files when searching for DAGs inside the DAG_FOLDER, you should disable the DAG_DISCOVERY_SAFE_MODE configuration flag.
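In airflow.cfg this flag lives under the [core] section; a sketch of the setting (the option name is the flag, lowercased):

```ini
[core]
dag_discovery_safe_mode = False
```

The same setting can be supplied through the standard environment-variable form, AIRFLOW__CORE__DAG_DISCOVERY_SAFE_MODE=False.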
- What is an .airflowignore file?
An .airflowignore file is a file inside your DAG_FOLDER, or any of its subfolders, which describes patterns of files for the loader to ignore.
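For illustration, an .airflowignore using the default regexp syntax might contain patterns like the ones below (borrowed from the Airflow docs' example); any file whose path matches either pattern is skipped by the loader:

```
project_a
tenant_[\d]
```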
- How to control if a python file needs to be parsed by Airflow in a more flexible way?
If the .airflowignore does not meet your needs and you want a more flexible way to control whether a Python file needs to be parsed by Airflow, you can plug in your own callable by setting the might_contain_dag_callable option in the config file.
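A hedged sketch of such a callable: the signature follows the documented interface (a file path plus an optional enclosing zip archive), while the function name and the parse-every-Python-file policy are made up:

```python
import zipfile
from typing import Optional

def might_contain_dag(file_path: str, zip_file: Optional[zipfile.ZipFile] = None) -> bool:
    """Return True if Airflow should parse this file for DAGs.

    Illustrative policy: parse every Python file, regardless of its contents.
    """
    return file_path.endswith(".py")
```

It would then be wired up in the [core] section of airflow.cfg with something like might_contain_dag_callable = my_module.might_contain_dag (module path hypothetical).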
- What’s the difference between context manager and standard constructor for DAG declaration?
The context manager implicitly adds the DAG to any tasks declared inside it, while the standard constructor requires the DAG to be passed explicitly into any operators you use.
- How to use a context manager for DAG declaration?
You can use a context manager for DAG declaration with the with statement and the DAG constructor.
- How to use a standard constructor for DAG declaration?
You can use a standard constructor for DAG declaration by explicitly defining the DAG and passing it into any operators you use.
- What is the @dag decorator for?
The @dag decorator is used to turn a function into a DAG generator in Airflow.
- What’s the purpose of task dependencies in Airflow?
Task dependencies in Airflow dictate the order of task execution based on the dependencies between different tasks.
- How to use the >> and << operators for task dependencies?
The >> and << operators are used to specify downstream and upstream dependencies respectively between different tasks.
- How to use the set_upstream and set_downstream methods for task dependencies?
The set_upstream and set_downstream methods are used to specify upstream and downstream dependencies respectively between different tasks.
- What does the cross_downstream function do?
The cross_downstream function is used to specify dependencies between two lists of tasks, making every task in the second list depend on every task in the first list.
- How to use the chain method for task dependencies?
The chain method is used to specify a series of dependencies between tasks, where each task (or same-sized list of tasks) depends on the previous one.