Nextflow Flashcards
DSL
Domain-specific language: “a computer language specialized to a particular application domain. This is in contrast to a general-purpose language, which is broadly applicable across domains.” (Wikipedia). Nextflow provides its own DSL, built on top of the Groovy programming language. It is the language used to write pipeline scripts.
DSL2
Around 2020, Nextflow released a major update to its DSL (the pipeline code language), called DSL2. The original syntax is now called DSL1, and pipelines written in it are no longer supported by recent Nextflow releases.
Process
The basic building block of a Nextflow pipeline. A process declares inputs, outputs and a script block, plus optional directives such as cpus, memory or container. The script block can be written in any language, but it is usually Bash and typically just runs a bioinformatics command-line tool.
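A minimal sketch of a DSL2 process, assuming a recent Nextflow release. SAY_HELLO is a hypothetical name, and the script block runs echo where a real pipeline would call a bioinformatics tool. The process declares one input, captures its standard output, and is called from the unnamed entry workflow.

```nextflow
// Hypothetical process: one input, one output, one script block.
process SAY_HELLO {
    cpus 1                // a directive: resources requested for each task

    input:
    val name              // one value taken from the input channel

    output:
    stdout                // capture whatever the script prints

    script:
    // A real pipeline would usually run a command-line tool here.
    """
    echo "Hello, ${name}!"
    """
}

// The unnamed workflow is the entry point executed by `nextflow run`.
workflow {
    SAY_HELLO(channel.of('world'))
    SAY_HELLO.out.view()  // print the captured stdout
}
```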
Module
A Nextflow script that can be shared and imported into a pipeline. It typically contains a single process, but it can contain more (as well as sub-workflows and functions).
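A sketch of a module and the pipeline that imports it, shown as two files in one listing. The path modules/reverse_text.nf and the process name REVERSE_TEXT are hypothetical. The second include uses an alias (as), which lets the same process be invoked twice in one workflow.

```nextflow
// ---- File: modules/reverse_text.nf (hypothetical module) ----
process REVERSE_TEXT {
    input:
    val text

    output:
    stdout

    script:
    """
    echo "${text}" | rev
    """
}

// ---- File: main.nf ----
// Import the process; the .nf extension can be omitted from the path.
include { REVERSE_TEXT } from './modules/reverse_text'
// Import it again under an alias so it can be called a second time.
include { REVERSE_TEXT as REVERSE_AGAIN } from './modules/reverse_text'

workflow {
    REVERSE_TEXT(channel.of('nextflow'))
    // Strip the trailing newline captured from stdout, then reverse back.
    REVERSE_AGAIN(REVERSE_TEXT.out.map { s -> s.trim() }).view()
}
```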
Workflow/Sub-Workflow
“Workflow” can be a generic term for a pipeline, but it also has a specific meaning in Nextflow code: a workflow block groups processes and channel logic into a single unit. Every DSL2 pipeline must include a workflow, and the unnamed workflow serves as the entry point. Workflows can also be named, chained together, and run independently. A sub-workflow is a workflow that is called by another workflow.
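A sketch of a named sub-workflow called from the unnamed entry workflow, assuming a recent DSL2 release. SHOUT and UPPERCASE are hypothetical names. In the named workflow, take declares its input channels, main holds its logic, and emit exposes its outputs to the caller.

```nextflow
// Hypothetical process used inside the sub-workflow.
process UPPERCASE {
    input:
    val word

    output:
    stdout

    script:
    """
    echo "${word}" | tr '[:lower:]' '[:upper:]'
    """
}

// A named workflow: reusable as a sub-workflow.
workflow SHOUT {
    take:
    words                 // input channel supplied by the caller

    main:
    UPPERCASE(words)

    emit:
    shouted = UPPERCASE.out   // named output, visible as SHOUT.out.shouted
}

// The unnamed entry workflow calls the sub-workflow like a process.
workflow {
    SHOUT(channel.of('hello', 'nextflow'))
    SHOUT.out.shouted.view()
}
```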
Task
A task is a unit of execution in Nextflow. A process defines a script template together with input and output declarations. When you run the pipeline, Nextflow creates a task for every set of inputs a process receives, resolving the script template with those values and then executing the result. So a single process can spawn many tasks; you can think of a task as an instance of a process.
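A sketch of one process spawning several tasks. The hypothetical GREET process receives a channel of three values, so Nextflow should create three tasks, each with the script template resolved for its own input. The tag directive labels each task in the run log. The order of the printed lines is not guaranteed, because the tasks can run in parallel.

```nextflow
// Hypothetical process: one definition, one task per input value.
process GREET {
    tag "${name}"          // label each task with its input in the log

    input:
    val name

    output:
    stdout

    script:
    // This template is resolved separately for every task,
    // substituting the current value of `name`.
    """
    echo "Task for ${name}"
    """
}

workflow {
    // Three items in the channel -> three GREET tasks.
    GREET(channel.of('sample_A', 'sample_B', 'sample_C')).view()
}
```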
Nextflow Workflow
See: Nextflow Pipeline
Nextflow Pipeline
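A complete Nextflow program: processes, channels and one or more workflows that together take input data and produce results. The terms pipeline and workflow are used interchangeably (see: Nextflow Workflow, Workflow/Sub-Workflow).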
Nextflow Channels
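Nextflow channels are the magic pipes that connect processes. Outputs from a process's tasks are emitted into a channel, and that channel can then be used as an input to another process. Channels are asynchronous dataflow objects, so a downstream task can start as soon as the values it needs arrive. There are two kinds: queue channels, which emit a sequence of values, and value channels, which hold a single value that can be read any number of times.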
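A sketch of two hypothetical processes connected by a channel, assuming a recent DSL2 release. MAKE_FILE writes one file per input value into its output channel, and that channel is passed directly to COUNT_LINES as input.

```nextflow
// Hypothetical upstream process: emits one file per input value.
process MAKE_FILE {
    input:
    val id

    output:
    path "${id}.txt"       // this file is emitted into the output channel

    script:
    """
    echo "data for ${id}" > ${id}.txt
    """
}

// Hypothetical downstream process: consumes the files.
process COUNT_LINES {
    input:
    path infile            // files arrive from MAKE_FILE's output channel

    output:
    stdout

    script:
    """
    wc -l < ${infile}
    """
}

workflow {
    ids = channel.of('a', 'b')     // queue channel with two items
    MAKE_FILE(ids)
    COUNT_LINES(MAKE_FILE.out)     // the channel connects the two processes
    COUNT_LINES.out.view()
}
```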
Operator
Built-in methods for working with channels, for example to filter, transform, fork, combine or reduce them (filter, map, branch, join, collect and many more).
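A sketch of a few common operators applied to a queue channel of numbers, assuming a recent DSL2 release. filter, map, sum and branch are built-in Nextflow operators; the numbers are made up for illustration.

```nextflow
workflow {
    numbers = channel.of(1, 2, 3, 4, 5, 6)

    // filter: keep only items that satisfy a condition (2, 4, 6)
    evens = numbers.filter { n -> n % 2 == 0 }

    // map: transform each item (4, 16, 36)
    squares = evens.map { n -> n * n }

    // sum: reduce the whole channel to a single value
    squares.sum().view()           // prints 56

    // branch: fork one channel into several named channels
    parts = numbers.branch { n ->
        small: n <= 3
        large: true                // everything else
    }
    parts.small.view { n -> "small: $n" }
    parts.large.view { n -> "large: $n" }
}
```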
Executor
An interface between the Nextflow pipeline and the underlying compute infrastructure. Nextflow ships with executors for many compute environments (the local machine, AWS Batch, Azure Batch, Google Cloud Batch, Slurm, LSF, Kubernetes, and others).
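Executors are usually chosen in nextflow.config rather than in the pipeline code. The fragment below is a sketch: the queue names and S3 bucket are hypothetical, and a real AWS Batch setup needs further configuration (credentials, region, and a container for each process).

```nextflow
// nextflow.config (sketch)
process {
    executor = 'slurm'     // default: submit each task as a Slurm job
    queue    = 'short'     // hypothetical partition name; site-specific
    cpus     = 2
    memory   = '4 GB'
}

profiles {
    local {
        process.executor = 'local'             // run tasks on this machine
    }
    awsbatch {
        process.executor = 'awsbatch'
        process.queue    = 'my-batch-queue'    // hypothetical Batch queue
        workDir          = 's3://my-bucket/work' // hypothetical bucket
    }
}
```

A profile is then selected at launch time, for example with nextflow run main.nf -profile awsbatch.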
xpack
Seqera provides Nextflow xpack licenses; xpack is a set of paid extensions to Nextflow.
Nextflow head job
Nextflow orchestrates the execution of many tasks: a single pipeline run can launch thousands of them, with many running at the same time. All of these tasks are managed by a single Nextflow runtime process (not to be confused with a pipeline process), which we refer to as the head job when running in a cloud computing environment or on an HPC cluster. Some people choose to run Nextflow directly on a cluster's login node, but the best practice is to submit Nextflow itself as a job (hence the term head job), which in turn submits a separate job for every task.