GenAI Flashcards
What instances for computer vision?
G5
high GPU performance from NVIDIA A10G Tensor Core GPUs, low latency, high bandwidth
can process lots of real-time imaging data
computer vision
what is it and what AWS services are relevant
allows computers to interpret and analyze visual data
captures images, algorithms process them, makes decisions like identifying objects, classifying images, tracking movements
Amazon Rekognition: For image and video analysis.
SageMaker: To train custom vision models.
EC2 G5 Instances: Ideal for GPU-intensive computer vision tasks.
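A hedged sketch of the Rekognition side: DetectLabels returns a dict with a `Labels` list of `{Name, Confidence}` entries (that response shape is real; the helper function and sample data below are illustrative, not part of boto3):

```python
# Sketch: filtering labels from an Amazon Rekognition DetectLabels-style
# response. The response shape mirrors the real API; the helper function
# and sample data are illustrative only.

def high_confidence_labels(response, min_confidence=90.0):
    """Return label names whose confidence meets the threshold."""
    return [
        label["Name"]
        for label in response.get("Labels", [])
        if label["Confidence"] >= min_confidence
    ]

# Sample response shaped like rekognition.detect_labels(...) output
sample_response = {
    "Labels": [
        {"Name": "Car", "Confidence": 98.1},
        {"Name": "Vehicle", "Confidence": 98.1},
        {"Name": "Bicycle", "Confidence": 55.3},
    ]
}

print(high_confidence_labels(sample_response))  # ['Car', 'Vehicle']
```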
difference between traditional genAI and agentic AI
genAI responds to inputs to generate content: reactive, takes no actions without direct user interaction, no memory or goal setting
agentic AI - extends GenAI by combining content generation with autonomy, can initiate and complete tasks, make decisions, interact with systems without human input
agentic AI can automate end to end processes
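The reactive-vs-agentic split can be sketched in plain Python (toy example, no AWS services; all function names are made up): a GenAI call is one request/response, while an agent loops with a goal and memory until the task is done:

```python
# Toy contrast between reactive GenAI and agentic AI (illustrative only).

def reactive_genai(prompt):
    # Reactive: one input -> one output, no memory, no goal.
    return f"generated content for: {prompt}"

def agentic_ai(goal, tools, max_steps=5):
    # Agentic: keeps state, picks actions, works toward a goal autonomously.
    memory = []
    for _ in range(max_steps):
        action = tools["plan"](goal, memory)   # decide the next step
        result = tools["act"](action)          # execute it
        memory.append((action, result))
        if tools["done"](goal, memory):        # check the goal
            break
    return memory

tools = {
    "plan": lambda goal, mem: f"step {len(mem) + 1} toward {goal}",
    "act":  lambda action: f"did {action}",
    "done": lambda goal, mem: len(mem) >= 3,   # stop after 3 steps
}

history = agentic_ai("classify new images", tools)
print(len(history))  # 3
```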
task & workflow orchestration
relevant AWS services
task orchestration – manages how tasks are triggered, executed, connected
workflow orchestration – organizes + manages the sequence of tasks like a roadmap
services:
-Step Functions: orchestrates workflows by connecting tasks like a flowchart
-AWS Lambda: executes individual tasks
-AWS EventBridge: detects events and routes them to the right service/task based on rules
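A Step Functions workflow is defined in Amazon States Language (JSON). The sketch below builds a minimal two-task definition as a Python dict; the state names and Lambda ARNs are made-up placeholders:

```python
import json

# Minimal Amazon States Language (ASL) definition for a Step Functions
# workflow: two Lambda tasks run in sequence. State names and function
# ARNs are hypothetical placeholders.
definition = {
    "Comment": "Process an uploaded document, then notify",
    "StartAt": "ProcessDocument",
    "States": {
        "ProcessDocument": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:process-doc",
            "Next": "Notify",
        },
        "Notify": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:notify",
            "End": True,
        },
    },
}

# json.dumps(definition) is the string you would register with
# Step Functions (e.g. via create_state_machine).
print(list(definition["States"]))  # ['ProcessDocument', 'Notify']
```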
EventBridge vs. Lambda
eventbridge detects events then routes them to a task/service, Lambda takes these events and executes the task
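The division of labor shows up in a Lambda handler: EventBridge delivers an event envelope (`source`, `detail-type`, `detail` match EventBridge's documented event shape), and Lambda runs the task. The bucket/key payload below is hypothetical:

```python
# Sketch of a Lambda function invoked by an EventBridge rule.
# The envelope fields (source, detail-type, detail) follow EventBridge's
# event structure; the bucket/key payload is made up for illustration.

def lambda_handler(event, context):
    # EventBridge routed the event here; Lambda does the actual work.
    detail = event.get("detail", {})
    bucket = detail.get("bucket")
    key = detail.get("key")
    # ... real code would fetch s3://bucket/key and process it ...
    return {"status": "processed", "object": f"s3://{bucket}/{key}"}

sample_event = {
    "source": "aws.s3",
    "detail-type": "Object Created",
    "detail": {"bucket": "my-images", "key": "cat.png"},
}

print(lambda_handler(sample_event, None)["status"])  # processed
```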
example of AWS service stack for a computer vision agent
API Gateway - doc upload
EventBridge - detects the upload event and routes it
Lambda - runs the processing task
Rekognition - analyzes the image
instance recommendation for complex physics simulations
what types of applications?
p5 instances for power, speed, scalability
NVIDIA H100 GPUs in P5 instances are optimized for speed + precision
applications: robotics, AVs, fluid dynamics (think air or liquid analysis in automotive, healthcare, aerospace)
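As a toy illustration of the fluid-dynamics style workloads above (plain Python, no GPU; real P5 jobs run this kind of stencil math over billions of cells): one explicit time step of 1D heat diffusion:

```python
# Toy 1D heat-diffusion step (explicit finite differences) - the kind of
# math that, scaled up massively, drives demand for P5/H100 instances.

def diffuse_step(u, alpha=0.1):
    """One explicit time step; endpoints held fixed (boundary conditions)."""
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = u[i] + alpha * (u[i - 1] - 2 * u[i] + u[i + 1])
    return new

temps = [0.0, 0.0, 100.0, 0.0, 0.0]   # a hot spot in the middle
temps = diffuse_step(temps)
print(temps)  # heat spreads from the hot cell to its neighbors
```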
instances for training vs inference workloads
training: P5 (think power, precision, physics) on NVIDIA H100
inference: G5 or Inferentia-powered Inf1 instances (think G = guess = infer, G-rated = more mellow) // moderate power
distilling and pruning
both make models smaller/more compact with minimal quality loss, so they need less compute: distillation trains a small "student" model to mimic a large "teacher"; pruning removes low-importance weights/connections
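Magnitude pruning can be sketched in plain Python (toy example; real frameworks such as `torch.nn.utils.prune` apply the same idea to whole tensors): zero out the smallest-magnitude fraction of weights so the model is sparser and cheaper to run:

```python
# Toy magnitude pruning: zero out the smallest |weight| values.
# Real implementations prune whole tensors; distillation instead trains
# a smaller "student" model to mimic a larger "teacher".

def prune_weights(weights, fraction=0.5):
    """Zero out the given fraction of weights with smallest magnitude."""
    k = int(len(weights) * fraction)
    # indices of the k smallest-magnitude weights
    smallest = set(sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k])
    return [0.0 if i in smallest else w for i, w in enumerate(weights)]

w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.002]
print(prune_weights(w, fraction=0.5))  # [0.9, 0.0, 0.4, 0.0, -0.7, 0.0]
```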
common libraries for GenAI
PyTorch and TensorFlow - two of the most popular open-source machine learning (ML) frameworks. They give developers the libraries and building blocks to create, train, and deploy ML models
PyTorch/TensorFlow are the foundational engines/base layers that run the math and train the AI
Hugging Face/NVIDIA NeMo are higher-level libraries that use PyTorch/TensorFlow for the underlying math, then handle the other hard parts of AI (pretrained models, tokenizers, training recipes)
Elastic Fabric Adapter
increases speed (lower latency, higher throughput) for AI/ML/HPC applications by enabling high speed data transfers + fast communication across EC2 instances (allows them to talk to each other)
used for distributed model training (deep learning models), distributed ML frameworks (TensorFlow, PyTorch)
main aws services for GenAI
EC2 Instances with GPUs: For GenAI workloads, P3 or P4 EC2 instances with NVIDIA GPUs are commonly used for training deep learning models. Inf1 instances (based on AWS Inferentia chips) are designed for high-performance inference workloads.
Amazon SageMaker: A fully managed service for building, training, and deploying machine learning models, including GenAI models. SageMaker offers tools like SageMaker Studio, SageMaker Training, and SageMaker Pipelines for end-to-end ML workflows.
AWS Lambda: For serverless, event-driven inference of GenAI models in real time, especially for lightweight model deployment.
Amazon S3: Used for storing training data, model weights, and large datasets used in GenAI.
Amazon EFS or FSx: For distributed storage when handling large-scale datasets and models.
AWS Deep Learning AMIs: Pre-configured AMIs with popular deep learning frameworks like TensorFlow, PyTorch, and MXNet, making it easy to get started with GenAI models.
HPC
use of powerful computing systems to solve complex computational problems like scientific/climate/financial modeling
why GenAI startups would use image repository
container images (stored in a repository like Amazon ECR) let developers package their AI models, dependencies, and environments into containers, making it easy to deploy them consistently across different environments (development, testing, production)
when GenAI would use spot instance
non-critical deep learning workloads that are not time sensitive like batch training or data preprocessing
largest savings (up to 90% vs. on-demand)
when GenAI would use reserved instances
if company will have consistent, predictable workloads (like regular inference or long term production deployments), like app with AI models being offered via API 24/7
1- or 3-year commitment, up to ~75% savings
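The savings figures above are simple arithmetic (the $3.00/hour on-demand price is a made-up example; the discount rates come from the cards: up to ~90% for Spot, up to ~75% for Reserved):

```python
# Toy cost comparison using the discount rates from the cards above.
# The $3.00/hour on-demand price is hypothetical, not a real quote.

on_demand = 3.00                   # $/hour, hypothetical
spot = on_demand * (1 - 0.90)      # up to ~90% off
reserved = on_demand * (1 - 0.75)  # up to ~75% off

hours = 24 * 30                    # one month of 24/7 inference
print(f"on-demand: ${on_demand * hours:.2f}")  # $2160.00
print(f"reserved:  ${reserved * hours:.2f}")   # $540.00
print(f"spot:      ${spot * hours:.2f}")       # $216.00 (but interruptible)
```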
Latest AWS ML/AI Instances
Amazon EC2 P5, P5e and P5en instances, G5, P4, Trn1/Trn2 (Trainium), Inf1/Inf2 (Inferentia)
elastic fabric adapter
high performance networking layer for high speed data transfers between memory of EC2 instances (bypasses CPUs)
increases speed (lower latency, higher throughput) for AI/ML/HPC applications
used for (summary):
- parallel processing, machine learning, deep learning
used for (detailed)
- distributed model training (deep learning models, LLMs)
- running ML inference at scale by synchronizing model inference servers across instances
- distributed ML frameworks (tensorflow, pytorch, hugging face)
- parallel AI / multi-node workflows
Main EC2 Instances for machine learning
- P5 instances (a lot of customers use them to train): Intel Sapphire Rapids CPUs and NVIDIA H100 or H200 Tensor Core GPUs. For deep learning and EFA (Elastic Fabric Adapter) applications
- G5 instances: graphics intensive (i.e. computer vision) and ML inference. NVIDIA A10G tensor core GPUs
- Trn1 (AWS Trainium, for training) and Inf2 (AWS Inferentia2, for inference)
EC2 capacity blocks for ML and which are supported?
reserve GPU instances for a future date to run your machine learning (ML) workloads – think pre-order / upfront payment to secure the capacity
Capacity Blocks supports Amazon EC2 P5en, P5e, P5, and P4d, Trn2 and Trn1 instances
*think Revolve pre-order - guarantee order at certain date