Batch Flashcards
What is Batch?
AWS Batch enables you to run batch computing workloads on the AWS Cloud. It is a regional service that simplifies running batch jobs across multiple Availability Zones (AZs) within a Region.
Features of Batch
-Batch manages compute environments and job queues, allowing you to easily run thousands of jobs of any scale using EC2 and EC2 Spot.
-Batch chooses where to run the jobs, launching additional AWS capacity if needed.
-Batch monitors the progress of your jobs and removes capacity when it is no longer needed.
-Batch provides the ability to submit jobs that are part of a pipeline or workflow, enabling you to express any interdependencies that exist between them as you submit jobs.
Batch components?
-Jobs
-Job Definitions
-Job Queues
-Compute Environment
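The four components fit together roughly like this: a job names a job definition (how to run) and a job queue (where to wait), and the queue maps to one or more compute environments in a set order. The sketch below is a hypothetical, simplified model; the class and field names are illustrative only and are not the AWS Batch API schema.

```python
from dataclasses import dataclass, field

# Hypothetical model of how Batch components relate (not the real API schema).

@dataclass
class ComputeEnvironment:
    name: str
    max_vcpus: int  # upper bound on the capacity Batch may launch

@dataclass
class JobQueue:
    name: str
    # Compute environments in the order the scheduler tries them.
    compute_environments: list[ComputeEnvironment] = field(default_factory=list)

@dataclass
class JobDefinition:
    name: str
    image: str  # container image the job runs
    vcpus: int
    memory_mib: int

@dataclass
class Job:
    name: str
    definition: JobDefinition  # how to run the job
    queue: JobQueue            # where the job waits to be scheduled

def candidate_environments(job: Job) -> list[str]:
    """Names of the compute environments a job could be placed in, in queue order."""
    return [ce.name for ce in job.queue.compute_environments]

# Usage with placeholder names:
spot = ComputeEnvironment("spot-ce", max_vcpus=512)
ondemand = ComputeEnvironment("ondemand-ce", max_vcpus=256)
queue = JobQueue("default-queue", [spot, ondemand])
jd = JobDefinition("render-frame", image="my-image:latest", vcpus=2, memory_mib=4096)
job = Job("render-1", jd, queue)
print(candidate_environments(job))  # ['spot-ce', 'ondemand-ce']
```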
What are Jobs?
A unit of work (such as a shell script, a Linux executable, or a Docker container image) that you submit to Batch.
How do Jobs reference other jobs?
Jobs can reference other jobs by name or by ID, and can be dependent on the successful completion of other jobs.
How many job types are there and what are they?
Two:
-Single: a single job
-Array: an array job with an array size of 2 to 10,000 child jobs
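The difference between the two job types shows up in the submit request: an array job adds an array size. This is a minimal sketch that only builds keyword arguments in the shape used by the boto3 Batch client's `submit_job` call (`jobName`, `jobQueue`, `jobDefinition`, `arrayProperties`); the helper name and the queue and job definition names are hypothetical placeholders, and nothing is sent to AWS.

```python
def build_submit_job_request(job_name, job_queue, job_definition, array_size=None):
    """Build submit_job keyword arguments (hypothetical helper).

    array_size=None describes a Single job; an integer describes an Array job.
    """
    request = {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
    }
    if array_size is not None:
        # Array jobs must have between 2 and 10,000 child jobs.
        if not 2 <= array_size <= 10_000:
            raise ValueError(f"array size must be 2..10000, got {array_size}")
        request["arrayProperties"] = {"size": array_size}
    return request

single = build_submit_job_request("single-job", "my-queue", "my-job-def")
array = build_submit_job_request("array-job", "my-queue", "my-job-def", array_size=100)
# With credentials configured, either dict could be passed as
# boto3.client("batch").submit_job(**request).
```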
What is an Array Job?
An array job shares common job parameters, such as the job definition, vCPUs, and memory. It runs as a collection of related, yet separate, basic jobs that may be distributed across multiple hosts and may run concurrently.
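Because every child of an array job runs the same job definition, each child typically uses its array index to decide which slice of the work to do. Batch sets the `AWS_BATCH_JOB_ARRAY_INDEX` environment variable (0-based) in each child. The sketch below assumes a hypothetical list of input files; the environment is passed in explicitly so the function can be tried outside Batch.

```python
import os

# Hypothetical inputs; each child job processes one entry.
INPUT_FILES = ["part-000.csv", "part-001.csv", "part-002.csv"]

def select_input(inputs, environ=os.environ):
    """Pick this child's input using the index Batch sets for array children."""
    index = int(environ["AWS_BATCH_JOB_ARRAY_INDEX"])  # 0-based child index
    return inputs[index]

# Outside Batch, simulate child 1 of a size-3 array job:
print(select_input(INPUT_FILES, {"AWS_BATCH_JOB_ARRAY_INDEX": "1"}))  # part-001.csv
```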
What are Multi-node Parallel Jobs?
Multi-node parallel jobs enable you to run single large-scale, tightly coupled, high-performance computing applications and distributed GPU model training jobs that span multiple Amazon EC2 instances.
Features/Facts of Batch with Multi-node Parallel Jobs?
-Batch lets you specify up to five distinct node groups for each job. Each group can have its own container images, commands, environment variables, and so on.
-Each multi-node parallel job contains a main node, which is launched first. After the main node is up, the child nodes are launched and started. If the main node exits, the job is considered finished, and the child nodes are stopped.
-Multi-node parallel jobs are not supported on compute environments that use Spot Instances.
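Inside a multi-node parallel job, each container can work out whether it is the main node by comparing the node-index environment variables Batch provides (`AWS_BATCH_JOB_NODE_INDEX`, `AWS_BATCH_JOB_MAIN_NODE_INDEX`). Child nodes also receive `AWS_BATCH_JOB_MAIN_NODE_PRIVATE_IPV4_ADDRESS`, which is not set on the main node. This is a minimal sketch; the function names are hypothetical.

```python
import os

def node_role(environ=os.environ):
    """Return 'main' or 'child' for the current node of a multi-node parallel job."""
    node_index = int(environ["AWS_BATCH_JOB_NODE_INDEX"])
    main_index = int(environ["AWS_BATCH_JOB_MAIN_NODE_INDEX"])
    return "main" if node_index == main_index else "child"

def main_node_address(environ=os.environ):
    """Address child nodes use to reach the main node (None on the main node)."""
    return environ.get("AWS_BATCH_JOB_MAIN_NODE_PRIVATE_IPV4_ADDRESS")

# Simulated environments for a main node and a child node:
main_env = {"AWS_BATCH_JOB_NODE_INDEX": "0", "AWS_BATCH_JOB_MAIN_NODE_INDEX": "0"}
child_env = {
    "AWS_BATCH_JOB_NODE_INDEX": "3",
    "AWS_BATCH_JOB_MAIN_NODE_INDEX": "0",
    "AWS_BATCH_JOB_MAIN_NODE_PRIVATE_IPV4_ADDRESS": "10.0.0.12",
}
```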
What is a job dependency and how many dependencies exist per job?
-A job dependency makes a job wait for one or more other jobs to finish before it starts.
-A job may have up to 20 dependencies.
-For Job depends on, enter the job IDs of any jobs that must finish before this job starts.
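Outside the console, the same dependency list goes into the `dependsOn` field of a submit request as a list of `{"jobId": ...}` entries. This is a minimal sketch that adds dependencies to a request dict and enforces the 20-dependency limit; the helper name and the job IDs are hypothetical placeholders.

```python
MAX_DEPENDENCIES = 20  # per-job limit

def with_dependencies(request, job_ids):
    """Return a copy of a submit_job request that waits on the given job IDs."""
    if len(job_ids) > MAX_DEPENDENCIES:
        raise ValueError(f"a job may have at most {MAX_DEPENDENCIES} dependencies")
    updated = dict(request)  # leave the caller's request unchanged
    updated["dependsOn"] = [{"jobId": job_id} for job_id in job_ids]
    return updated

base = {"jobName": "report", "jobQueue": "my-queue", "jobDefinition": "report-def"}
req = with_dependencies(base, ["job-id-a", "job-id-b"])
```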
(Array jobs only) N-to-N job dependencies
(Array jobs only) For N-to-N job dependencies, specify one or more job IDs of array jobs for which each child job index of this job should depend on the corresponding child index job of the dependency. For example, child index 1 of JobB depends on child index 1 of JobA, and so on.
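In a submit request, an N-to-N dependency is a `dependsOn` entry with `"type": "N_TO_N"`. Batch identifies array children as `<array job ID>:<index>`, so child i of the new job waits on `<dependency ID>:i`. This sketch only builds the request and computes that mapping; the helper names, job names, and IDs are hypothetical.

```python
def n_to_n_dependency(array_job_id):
    """dependsOn entry making child i of this array job wait on child i of another."""
    return {"jobId": array_job_id, "type": "N_TO_N"}

def upstream_child(array_job_id, child_index):
    """Child job ID that child `child_index` waits on under an N-to-N dependency."""
    return f"{array_job_id}:{child_index}"

job_b = {
    "jobName": "job-b",
    "jobQueue": "my-queue",
    "jobDefinition": "job-b-def",
    "arrayProperties": {"size": 3},
    "dependsOn": [n_to_n_dependency("job-a-id")],
}
# Child 1 of job-b waits on "job-a-id:1", child 2 on "job-a-id:2", and so on.
```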
(Array jobs only) Run children sequentially
(Array jobs only) Run children sequentially creates a SEQUENTIAL dependency for the current array job. This ensures that each child index job waits for its earlier sibling to finish.
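A sequential array job is expressed as a `dependsOn` entry with `"type": "SEQUENTIAL"` and no job ID, so the array depends on itself and children complete in order starting at index 0. The sketch below builds such a request and, separately, uses a simplified model (not Batch's scheduler) to show which child is eligible to run given the children already finished; the helper names are hypothetical.

```python
def sequential_array_request(base_request, array_size):
    """Copy of a submit_job request that runs array children one after another."""
    request = dict(base_request)
    request["arrayProperties"] = {"size": array_size}
    # SEQUENTIAL with no jobId: each child waits for its earlier sibling.
    request["dependsOn"] = [{"type": "SEQUENTIAL"}]
    return request

def runnable_children(array_size, completed):
    """Simplified model: unfinished children whose earlier sibling has finished."""
    return [
        i for i in range(array_size)
        if i not in completed and (i == 0 or i - 1 in completed)
    ]

req = sequential_array_request(
    {"jobName": "steps", "jobQueue": "my-queue", "jobDefinition": "steps-def"}, 4
)
```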
How many states are there?
7
What are the states?
-SUBMITTED – a job that has been submitted to the queue, and has not yet been evaluated by the scheduler.
-PENDING – a job that resides in the queue and is not yet able to run due to a dependency on another job or resource.
-RUNNABLE – a job that resides in the queue, has no outstanding dependencies, and is therefore ready to be scheduled to a host. Jobs in this state are started as soon as sufficient resources are available in one of the compute environments that are mapped to the job's queue.
-STARTING – the job has been scheduled to a host and the relevant container initiation operations are underway.
-RUNNING – the job is running as a container job on an Amazon ECS container instance within a computing environment. When the job’s container exits, the process exit code determines whether the job succeeded or failed. An exit code of 0 indicates success, and any non-zero exit code indicates failure.
-SUCCEEDED – the job has successfully completed with an exit code of 0. The job state for SUCCEEDED jobs is persisted for 24 hours.
-FAILED – the job has failed all available attempts. The job state for FAILED jobs is persisted for 24 hours.
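The seven states form a mostly linear progression: the scheduler moves a SUBMITTED job to PENDING if it has outstanding dependencies and otherwise straight to RUNNABLE, and the container's exit code decides between SUCCEEDED and FAILED. This is a simplified model of the successful path only; it ignores cancellation, dependency failures, and retries, and the function names are hypothetical.

```python
from enum import Enum

class JobState(Enum):
    SUBMITTED = "SUBMITTED"
    PENDING = "PENDING"
    RUNNABLE = "RUNNABLE"
    STARTING = "STARTING"
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"

TERMINAL_STATES = {JobState.SUCCEEDED, JobState.FAILED}

def path_to_running(has_dependencies):
    """States a job passes through before RUNNING (simplified)."""
    path = [JobState.SUBMITTED]
    if has_dependencies:
        path.append(JobState.PENDING)  # waits here until dependencies finish
    path += [JobState.RUNNABLE, JobState.STARTING, JobState.RUNNING]
    return path

def state_after_exit(exit_code):
    """Final state once the container exits, assuming no retry attempts remain."""
    return JobState.SUCCEEDED if exit_code == 0 else JobState.FAILED
```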
What can you do with Jobs that fail?
You can apply a retry strategy to your jobs and job definitions that allows failed jobs to be automatically retried.
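A retry strategy is a `retryStrategy` object on the submit request or job definition, with `attempts` (1 to 10) and optional `evaluateOnExit` rules that match exit codes and choose `RETRY` or `EXIT`. The first function below only builds that object; the second is a hypothetical, simplified model of re-running a job until it exits with 0 or runs out of attempts, not Batch's actual scheduling logic.

```python
def retry_strategy(attempts, retry_exit_codes=(), exit_exit_codes=()):
    """Build a retryStrategy dict; exit codes are strings (patterns are allowed)."""
    if not 1 <= attempts <= 10:
        raise ValueError("attempts must be between 1 and 10")
    rules = [{"onExitCode": code, "action": "EXIT"} for code in exit_exit_codes]
    rules += [{"onExitCode": code, "action": "RETRY"} for code in retry_exit_codes]
    strategy = {"attempts": attempts}
    if rules:
        strategy["evaluateOnExit"] = rules
    return strategy

def run_with_retries(run_attempt, attempts):
    """Simplified model: re-run until exit code 0 or attempts are exhausted."""
    for attempt in range(1, attempts + 1):
        if run_attempt(attempt) == 0:
            return "SUCCEEDED", attempt
    return "FAILED", attempts

strategy = retry_strategy(3, retry_exit_codes=["137"], exit_exit_codes=["1"])
```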