Unit 14 Orchestration, MLOps and Job Scheduling Flashcards

Objectives: describe the difference between orchestration and scheduling; describe common tools for orchestration and scheduling; discuss the value of MLOps.

1
Q

What is K8S?

A

K8s is shorthand for Kubernetes, an open-source container orchestration system for automating software deployment, scaling, and management. It is often used in container-based environments that need to scale to meet user demand, and it is useful for inference in AI clusters.

2
Q

What is the difference between orchestration and scheduling?

A

Orchestration is container-based, designed for microservices, and has been adapted for AI. It scales services up and down for inferencing, manages entire workflows and processes, and load-balances traffic across containers.
Scheduling is bare-metal based, also supports containers, and is designed for HPC. It has advanced scheduling features built in, assigns tasks and jobs to available resources, and determines which hosts have available resources to run containers.
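The scheduling half of this answer ("assigns tasks and jobs to available resources") can be illustrated with a toy first-fit placement loop. This is only a sketch: the `Host` class, the `schedule` function, and the host and job names are hypothetical, and real schedulers use far richer policies (priorities, queues, fair share).

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    # Hypothetical model of a cluster host; only GPUs are tracked here.
    name: str
    free_gpus: int
    jobs: list = field(default_factory=list)


def schedule(jobs, hosts):
    """Assign each (job_name, gpus_needed) pair to the first host with room.

    Returns a dict mapping job name to host name, or None if the job must wait.
    """
    placement = {}
    for job_name, gpus_needed in jobs:
        placement[job_name] = None  # stays None if no host can take the job
        for host in hosts:
            if host.free_gpus >= gpus_needed:
                host.free_gpus -= gpus_needed  # reserve the resources
                host.jobs.append(job_name)
                placement[job_name] = host.name
                break
    return placement


hosts = [Host("node-a", free_gpus=4), Host("node-b", free_gpus=8)]
jobs = [("train-1", 4), ("train-2", 8), ("train-3", 2)]
placement = schedule(jobs, hosts)
print(placement)
```

In this run the third job finds no host with free GPUs, so it is left unplaced, which is the situation a real scheduler would handle by queueing the job until resources free up.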

3
Q

What is SLURM?

A

SLURM (Simple Linux Utility for Resource Management) is an open-source cluster management and job scheduling system for Linux clusters, widely used by supercomputers and computing clusters around the world. It is highly scalable, fault-tolerant, and requires no kernel modifications. SLURM efficiently schedules jobs across a subset of cluster resources, including CPUs and GPUs, making it ideal for high-performance computing (HPC) tasks and AI training.
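As a rough illustration of how a job asks SLURM for CPUs and GPUs, the sketch below builds the text of a batch script that could be submitted with `sbatch`. The helper `render_batch_script`, the job name, and the `train.py` command are hypothetical; the `#SBATCH` options shown (`--job-name`, `--nodes`, `--gres`, `--time`) are common ones, but a given cluster may require others, such as a partition or account.

```python
def render_batch_script(job_name, nodes, gpus_per_node, time_limit, command):
    """Build the text of a Slurm batch script from a few common options."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --gres=gpu:{gpus_per_node}",  # generic resource request, per node
        f"#SBATCH --time={time_limit}",  # wall-clock limit
        "",
        f"srun {command}",  # launch the command inside the allocation
    ]
    return "\n".join(lines) + "\n"


# Hypothetical training job: 2 nodes, 4 GPUs each, one-hour limit.
script = render_batch_script("demo-train", 2, 4, "01:00:00", "python train.py")
print(script)
```

Writing `script` to a file and passing that file to `sbatch` is the usual next step; SLURM then holds the job in its queue until the requested resources are available.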

4
Q

What is MLOps?

A

MLOps is short for Machine Learning Operations. MLOps tools improve user productivity, speed up workflows, maximize resource utilization, and allow projects to scale. They also bring consistency and repeatability to AI and ML workloads.

5
Q

List the Kubernetes components.

A

Node: a server added to a K8s cluster.
Cluster: a collection of one or more servers (nodes).
Container: a self-contained, deployable application.
Pod: one or more containers and their associated metadata; the smallest deployable unit in K8s.
Volume: attached storage that can be shared in K8s.
Service: networking and ports; a mechanism for connecting applications over a network and managing ports.
Workload management: managing workloads (applications packaged in containers) running in a cluster. In K8s, workloads are managed through objects such as Job, DaemonSet, Deployment, and CronJob.
* Job: Launches a Pod to perform a one-time task and then completes.
* DaemonSet: Ensures a copy of a Pod runs on every (or selected) node; often used for monitoring or network agents.
* Deployment: Maintains a specified number of Pod replicas and redeploys them if they are deleted, keeping Pods in the desired state.
* CronJob: Runs a Job on a repeating schedule.
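The Deployment behavior described above (keep a fixed number of replicas, replace any that disappear) can be pictured as a reconcile step that compares actual state with desired state. The following is a toy model only, not Kubernetes code: `reconcile` and the pod names are hypothetical.

```python
import itertools

# Hypothetical counter used to give each new pod a unique name.
_pod_ids = itertools.count(1)


def reconcile(running_pods, desired_replicas):
    """Return a pod list adjusted to contain exactly `desired_replicas` pods."""
    pods = list(running_pods)
    while len(pods) < desired_replicas:
        pods.append(f"pod-{next(_pod_ids)}")  # start a replacement pod
    while len(pods) > desired_replicas:
        pods.pop()  # remove a surplus pod
    return pods


pods = reconcile([], desired_replicas=3)    # initial rollout
pods.remove("pod-2")                        # simulate a pod being deleted
pods = reconcile(pods, desired_replicas=3)  # the missing replica is replaced
print(pods)
```

The design point is that the caller declares only the desired count; the reconcile step works out what to create or remove, which is why a deleted Pod reappears without anyone asking for it.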

6
Q

NVIDIA GPU Operator

A

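NVIDIA GPU Operator: an open-source tool that gives IT infrastructure teams what they need to deploy and manage GPUs efficiently in Kubernetes environments. It automates the setup of the NVIDIA software that GPU nodes require, such as drivers, the container toolkit, and the device plugin.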

7
Q

Container Engine

A

Container Engine: virtualization software, for instance Docker, that makes developing and deploying applications much easier. It packages an application together with all necessary dependencies, configuration, system tools, and runtime.

8
Q

NVIDIA Network Operator

A

NVIDIA Network Operator: a tool that can be installed on top of the GPU Operator to enable GPUDirect RDMA (remote direct memory access).

9
Q

Name MLOps Partners

A

Run:ai
Paperspace
Determined AI
