MLA-C01 Flashcards
Cert Exam Study
Before you can use auto scaling, you must have already created an Amazon SageMaker ______________.
model endpoint.
You can have multiple model _____________ for the same endpoint.
versions
Amazon SageMaker ____________ provides tools to help explain how machine learning (ML) models make predictions.
Clarify
An ____________can be thought of as the answer to a Why question that helps humans understand the cause of a prediction.
explanation
On AWS, AI/ML practitioners can use Amazon SageMaker ____________, which uses Shapley values to help answer how different variables influence model behavior.
Clarify
Debug model output tensors from machine learning training jobs in real time and detect non-converging issues using Amazon SageMaker ____________.
Debugger
___________ is the extent to which you can explain the internal mechanics of an ML or deep learning system in human terms.
Explainability
Amazon SageMaker _________ produces metrics that measure the predictive quality of machine learning model candidates.
Autopilot
The ratio of the number of correctly classified items to the total number of (correctly and incorrectly) classified items.
Accuracy
measures how well an algorithm predicts the true positives (TP) out of all of the positives that it identifies.
Precision
uses natural language processing (NLP) to extract insights about the content of documents. It develops insights by recognizing the entities, key phrases, language, sentiments, and other common elements in a document.
Amazon Comprehend
a text translation service that uses advanced machine learning technologies to provide high-quality translation on demand. Use it to translate unstructured text documents or to build applications that work in multiple languages.
Amazon Translate
a fully managed, automatic speech recognition (ASR) service that makes it easy for developers to add speech to text capabilities to their applications.
Amazon Transcribe
a cloud service that converts text into lifelike speech. You can use it to develop applications that increase engagement and accessibility.
Amazon Polly
a cloud-based image and video analysis service that makes it easy to add advanced computer vision capabilities to your applications.
Amazon Rekognition
a fully managed service that uses statistical and machine learning algorithms to deliver highly accurate time-series forecasts.
Amazon Forecast
an AWS service for building conversational interfaces for applications using voice and text.
Amazon Lex
a fully managed machine learning service that uses your data to generate item recommendations for your users. It can also generate user segments based on the users’ affinity for certain items or item metadata.
Amazon Personalize
a machine learning (ML) service that automatically extracts text, handwriting, and data from scanned documents.
Amazon Textract
an intelligent search service that uses natural language processing and advanced machine learning algorithms to return specific answers to search questions from your data.
Amazon Kendra
allows you to conduct a human review of machine learning (ML) systems to guarantee precision.
Amazon Augmented AI (A2I)
uses machine learning (ML) to make it easier for customers to accurately detect anomalies in their metrics.
Amazon Lookout for Metrics
a fully managed service enabling customers to identify potentially fraudulent activities and catch more online fraud faster.
Amazon Fraud Detector
a fully managed, generative-AI powered assistant that you can configure to answer questions, provide summaries, generate content, and complete tasks based on your enterprise data.
Amazon Q Business
Amazon Polly is the Opposite of Amazon ____________.
Transcribe
______________ measures how many actual positives were predicted as positive.
Recall
_____________ is the harmonic mean of precision and recall.
F1-measure
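The accuracy, precision, recall, and F1 cards above can be tied together with a small worked example. This is a minimal sketch that assumes you already have the four confusion-matrix counts and that no denominator is zero; the function name and the sample counts are illustrative only.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts.

    Assumes every denominator is non-zero (at least one predicted positive
    and at least one actual positive).
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)   # correct / all classified items
    precision = tp / (tp + fp)                   # TP out of everything predicted positive
    recall = tp / (tp + fn)                      # TP out of all actual positives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Illustrative counts: 8 true positives, 2 false positives,
# 4 false negatives, 6 true negatives.
metrics = classification_metrics(tp=8, fp=2, fn=4, tn=6)
print(metrics)
```

With these counts, accuracy is 14/20, precision is 8/10, recall is 8/12, and F1 works out to 16/22.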
It measures the ability of the model to predict a higher score for positive examples as compared to negative examples.
AUC (Area Under Curve)
_________ is a method used in machine learning to reduce errors in predictive data analysis.
Boosting
____________ improves machine learning models’ predictive accuracy and performance by converting multiple weak learners into a single strong learning model.
Boosting
____________ are data structures in machine learning that work by dividing the dataset into smaller and smaller subsets based on their features.
Decision trees
Boosting creates an ____________ model by combining several weak decision trees sequentially.
ensemble
In ________, data scientists improve the accuracy of weak learners by training several of them at once on multiple datasets. In contrast, boosting trains weak learners one after another.
bagging
__________ is a popular and efficient open-source implementation of the gradient boosted trees algorithm.
XGBoost
___________ boosting is a supervised learning algorithm that tries to accurately predict a target variable by combining multiple estimates from a set of simpler models.
Gradient
Amazon SageMaker _____________ reduces data prep time for tabular, image, and text data from weeks to minutes.
Data Wrangler
With SageMaker ________________ you can simplify data preparation and feature engineering through a visual and natural language interface.
Data Wrangler
SageMaker ____________ is a no-code ML tool that helps business analysts generate accurate ML predictions without having to write code and without requiring any ML experience.
Canvas
Amazon SageMaker ____________ is a fully managed, purpose-built repository to store, share, and manage features for machine learning (ML) models.
Feature Store
___________are inputs to ML models used during training and inference.
Features
SageMaker ____________ tags and indexes feature groups so they are easily discoverable through the visual interface of Amazon SageMaker Studio.
Feature Store
Amazon SageMaker ____________ offers the most comprehensive set of human-in-the-loop capabilities, allowing you to harness the power of human feedback across the ML lifecycle to improve the accuracy and relevancy of models.
Ground Truth
You can complete a variety of human-in-the-loop tasks with SageMaker ___________, from data generation and annotation to model review, customization, and evaluation, either through a self-service or an AWS-managed offering.
Ground Truth
SageMaker _________ helps identify potential bias during data preparation without writing code.
Clarify
SQL function used for anomaly detection on numeric columns in a stream
RANDOM_CUT_FOREST
is derived from “Linux” and “cluster”
Lustre
a type of parallel distributed file system, for large-scale computing
Lustre
a fully managed Windows file system share drive
FSx for Windows File Server
a network drive you can attach to your instances while they run
EBS
Managed NFS (network file system) that can be mounted on many EC2 instances
EFS
Data Warehouse vs Data Lake. Warehouse is ________________. Lake is ___________
Structured, Unstructured
Binary format that stores both the data and its schema
AVRO
Columnar storage format optimized for analytics.
Parquet
Find K “nearest” (most similar) rows and average their values
K Nearest Neighbor (KNN)
Find linear or non-linear relationships between the missing feature and other features
Regression
Duplicate samples from the minority class
Oversampling
Instead of creating more positive samples, remove negative ones
Undersampling
measures how “spread-out” the data is
Variance
________________ 𝜎 is just the square root of the variance.
Standard Deviation
Data points that lie more than one ___________________ from the mean can be considered unusual.
Standard Deviation
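The variance and standard deviation cards above can be checked by hand with a short example. This is a minimal sketch using the population variance (dividing by N); the function names, the sample data, and the one-standard-deviation threshold default are illustrative assumptions.

```python
import math


def variance(values):
    """Population variance: the average squared distance from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


def unusual_points(values, num_std=1.0):
    """Return the points lying more than num_std standard deviations from the mean."""
    mean = sum(values) / len(values)
    sigma = math.sqrt(variance(values))  # standard deviation is the square root of variance
    return [x for x in values if abs(x - mean) > num_std * sigma]


data = [1, 4, 5, 4, 8]
print(variance(data))        # mean is 4.4, variance is about 5.04
print(unusual_points(data))  # the points far from the mean: 1 and 8
```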
Bucket observations together based on ranges of values.
Binning
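As a rough illustration of binning, the sketch below maps numeric ages onto labeled ranges. The bucket edges, labels, and function name are made up for the example.

```python
def bin_value(value, edges, labels):
    """Return the label of the first bucket whose upper edge exceeds value.

    `labels` must have one more entry than `edges`; the last label
    catches everything at or above the final edge.
    """
    for upper, label in zip(edges, labels):
        if value < upper:
            return label
    return labels[-1]


ages = [5, 17, 18, 42, 70]
edges = [18, 65]                       # hypothetical cut points
labels = ["minor", "adult", "senior"]  # hypothetical bucket names
binned = [bin_value(age, edges, labels) for age in ages]
print(binned)
```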
Create “buckets” for every category. The bucket for your category has a 1; all others have a 0.
One Hot Encoding
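One-hot encoding is easy to see on a tiny example. This sketch sorts the distinct categories so the column order is deterministic; that ordering choice and the function name are assumptions for illustration.

```python
def one_hot_encode(values):
    """Return (categories, rows) where each row has a 1 in its category's bucket."""
    categories = sorted(set(values))  # one bucket per distinct category
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)
print(categories)  # column order
print(encoded)     # one row per input value
```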
______________ for deploying to edge devices
SageMaker Neo
___________ values are used to determine the contribution of each feature toward a model’s predictions
Shapley
Used on the final output layer of a multi-class classification problem
Softmax
Choosing an activation function: For multi-class classification, use _________ on the output layer
softmax
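To make the softmax cards concrete, here is a minimal sketch of the function applied to raw output-layer scores. Subtracting the maximum before exponentiating is a common numerical-stability trick; the sample scores are illustrative.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the max does not change the result but avoids overflow.
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.1]  # hypothetical outputs for three classes
probabilities = softmax(scores)
print(probabilities)
```

The class with the largest raw score always gets the largest probability, which is why softmax suits the final layer of a multi-class classifier.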
Choosing an activation function: ________ do well with Tanh
RNNs
Choosing an activation function: For everything else
Start with ReLU
Choosing an activation function: _________ for really deep networks
Swish
When you have data that doesn’t neatly align into columns
* Images that you want to find features within
* Machine translation
* Sentence classification
* Sentiment analysis
Convolutional Neural Network (CNN)
RNNs: what are they for?
Time-series data
When you want to predict future behavior based on past behavior
Recurrent Neural Network (RNN)
Sequence to sequence, Sequence to vector, Vector to sequence, Encoder -> Decoder
RNN topologies
_________ batch sizes tend to not get stuck in local minima
Small
____________ batch sizes can converge on the wrong solution at random
Large
_________ learning rates can overshoot the correct solution
Large
____________ learning rates increase training time
Small
________________ techniques are intended to prevent overfitting.
Regularization
Preventing overfitting in ML in general
* A regularization term is added as weights are learned
L1 and L2 Regularization
L1: sum of _______________
absolute values of weights
L2: sum of ______________
squares of weights
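The two regularization terms above can be written out directly. This is a minimal sketch, assuming the penalty is simply added to a base loss with a scaling factor for each term; the function names, factor names, and sample weights are illustrative.

```python
def l1_penalty(weights):
    """L1 term: sum of the absolute values of the weights."""
    return sum(abs(w) for w in weights)


def l2_penalty(weights):
    """L2 term: sum of the squares of the weights."""
    return sum(w * w for w in weights)


def regularized_loss(base_loss, weights, l1=0.0, l2=0.0):
    """Add the scaled penalty terms to the unregularized loss."""
    return base_loss + l1 * l1_penalty(weights) + l2 * l2_penalty(weights)


weights = [0.5, -2.0, 0.0, 1.5]
print(l1_penalty(weights))  # 0.5 + 2.0 + 0.0 + 1.5
print(l2_penalty(weights))  # 0.25 + 4.0 + 0.0 + 2.25
print(regularized_loss(1.0, weights, l1=0.1, l2=0.01))
```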
We need to understand true positives and true negatives, as well as false positives and false negatives.
confusion matrix
Percent of positives rightly predicted
Recall
AKA Correct Positives
Precision
Plot of true positive rate (recall) vs. false positive rate at various threshold settings.
ROC Curve
The area under the ROC curve is… wait for it...
AUC
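One way to connect the AUC cards is the ranking view: AUC equals the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The sketch below computes it that way by brute force; the function name and sample data are illustrative, and it assumes at least one example of each class.

```python
def auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly.

    Assumes labels are 0 or 1 and that both classes are present.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly scored higher
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
print(auc(labels, scores))  # 8 of the 9 pairs are ranked correctly
```

The pairwise loop is quadratic, so it is only suitable for small examples like this one.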
Generate N new training sets by random sampling with replacement
Bagging
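The sampling step of bagging can be sketched in a few lines. This example only builds the N resampled training sets (it does not train or combine any models); the function name, the fixed seed, and the sample rows are assumptions for illustration.

```python
import random


def bootstrap_samples(rows, n_sets, seed=0):
    """Build n_sets training sets by sampling rows with replacement.

    Each new set has the same size as the original, so some rows
    repeat and others are left out.
    """
    rng = random.Random(seed)  # seeded so the example is repeatable
    return [[rng.choice(rows) for _ in rows] for _ in range(n_sets)]


rows = ["a", "b", "c", "d", "e"]
training_sets = bootstrap_samples(rows, n_sets=3)
for training_set in training_sets:
    print(training_set)
```

In full bagging, one model would be trained on each set and their predictions combined, for example by voting or averaging.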
Training is sequential; each classifier takes into account the previous one’s success.
Boosting
Define the hyperparameters you care about and the ranges you want to try, and the metrics you are optimizing for
Automatic Model Tuning
Don’t optimize too many hyperparameters at once
* Limit your ranges to as small a range as possible
* Use logarithmic scales when appropriate
* Don’t run too many training jobs concurrently
  * This limits how well the process can learn as it goes
* Make sure training jobs running on multiple instances report the correct objective metric in the end
Automatic Model Tuning: Best Practices
Stop training in a tuning job early if it is not improving the objective significantly
Early Stopping
Uses one or more previous tuning jobs as a starting point
Warm Start
Automates:
* Algorithm selection
* Data preprocessing
* Model tuning
* All infrastructure
* It does all the trial & error for you
SageMaker Autopilot
Visual IDE for machine learning
SageMaker Studio
Create and share Jupyter notebooks with SageMaker Studio
* Switch between hardware configurations (no infrastructure to manage)
SageMaker Notebooks
Organize, capture, compare, and search your ML jobs
SageMaker Experiments
Saves internal model state at periodic intervals
* Gradients / tensors over time as a model is trained
* Define rules for detecting unwanted conditions while training
* A debug job is run for each rule you configure
* Logs & fires a CloudWatch event when the rule is hit
SageMaker Debugger
Catalog your models, manage model versions
* Associate metadata with models
* Manage approval status of a model
* Deploy models to production
* Automate deployment with CI/CD
SageMaker Model Registry
___________________ is a visualization toolkit for TensorFlow or PyTorch
* Visualize loss and accuracy
* Visualize model graph
* View histograms of weights, biases over time
* Project embeddings to lower dimensions
* Profiling
TensorBoard
Compile & optimize training jobs on GPU instances
* Can accelerate training up to 50%
* Converts models into hardware-optimized instructions
* Tested with Hugging Face transformers library, or bring your own model
* Incompatible with SageMaker distributed training libraries
SageMaker Training Compiler
Retain and re-use provisioned infrastructure
* Useful if repeatedly training a model to speed things up
* Use by setting KeepAlivePeriodInSeconds in your training job’s resource config
Warm Pools
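As a rough sketch of where the warm pool setting lives, the dictionary below has the shape of the `ResourceConfig` structure passed to the SageMaker `CreateTrainingJob` API. Only `KeepAlivePeriodInSeconds` comes from the card above; the instance type, instance count, volume size, and the 30-minute keep-alive value are illustrative assumptions, and no AWS call is made here.

```python
# Shape of a training job's ResourceConfig with a warm pool enabled.
# All values are illustrative assumptions, not recommendations.
resource_config = {
    "InstanceType": "ml.g5.xlarge",    # hypothetical instance choice
    "InstanceCount": 1,
    "VolumeSizeInGB": 50,
    "KeepAlivePeriodInSeconds": 1800,  # keep the instances warm for 30 minutes
}

# A later training job that requests a matching configuration can then
# reuse the retained infrastructure instead of provisioning from scratch.
print(resource_config["KeepAlivePeriodInSeconds"])
```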
Creates snapshots during your training
* You can re-start from these points if necessary
* Or use them for troubleshooting, to analyze the model at different points
* Automatic synchronization with S3 (from /opt/ml/checkpoints)
Checkpointing
Run automatically when using ml.g or ml.p instance types
* Replaces any faulty instances
* Runs GPU health checks
* Ensures NVIDIA Collective Communication Library is working
Cluster Health Checks and Automatic Restarts
You can of course run multiple training jobs in parallel
* “job parallelism”
* Individual training jobs can also be parallelized
  * Distributed data parallelism
  * Distributed model parallelism
Distributed Training
Network device attached to your SageMaker instances
* Makes better use of your bandwidth
* Promises performance of an on-premises High Performance Computing (HPC) cluster in the cloud
Elastic Fabric Adapter (EFA)
______________ produces a weighted average of all token embeddings. The magic is in computing the attention weights.
Self-attention
A mask can be applied to prevent tokens from “peeking” into future tokens (words)
Masked Self-Attention
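The two attention cards above can be illustrated with a deliberately simplified sketch. It assumes the token embeddings themselves serve as queries, keys, and values (real transformers apply learned projection matrices first), uses scaled dot products for the scores, and applies the optional mask by setting future positions to negative infinity before the softmax. All names and sample vectors are illustrative.

```python
import math


def softmax(scores):
    """Normalize scores into weights that sum to 1 (masked -inf entries become 0)."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings, causal=False):
    """Return (outputs, weights); each output is a weighted average of embeddings.

    Simplification: embeddings act as queries, keys, and values directly.
    With causal=True, token i cannot attend to any token j > i.
    """
    dim = len(embeddings[0])
    outputs, all_weights = [], []
    for i, query in enumerate(embeddings):
        scores = []
        for j, key in enumerate(embeddings):
            if causal and j > i:
                scores.append(float("-inf"))  # mask out future tokens
            else:
                dot = sum(q * k for q, k in zip(query, key))
                scores.append(dot / math.sqrt(dim))
        weights = softmax(scores)
        output = [sum(w * value[d] for w, value in zip(weights, embeddings))
                  for d in range(dim)]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy token embeddings
_, full_weights = self_attention(tokens)
masked_outputs, masked_weights = self_attention(tokens, causal=True)
print(full_weights[0])    # first token attends to all three tokens
print(masked_weights[0])  # with the mask, it can only attend to itself
```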
Chat!
* Question answering
* Text classification
* e.g., sentiment analysis
* Named entity recognition
* Summarization
* Translation
* Code generation
* Text generation
* e.g., automated customer service
Applications of Transformers
Tokenization, token encoding
* Token embedding
  * Captures semantic relationships between tokens, token similarities
* Positional encoding
  * Captures the position of the token in the input relative to other nearby tokens
  * Uses an interleaved sinusoidal function so it works on any length
LLM Input processing
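The interleaved sinusoidal positional encoding mentioned above can be sketched as follows. This assumes the formulation from the original Transformer design, where even dimensions use sine, odd dimensions use cosine, and the wavelength base is 10000; the function names and the toy embedding are illustrative.

```python
import math


def positional_encoding(position, dim):
    """Interleave sin (even indices) and cos (odd indices) at varying frequencies."""
    encoding = []
    for i in range(dim):
        pair = i // 2  # each sin/cos pair shares one frequency
        angle = position / (10000 ** (2 * pair / dim))
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding


def add_position(embedding, position):
    """Combine a token embedding with its positional encoding by addition."""
    encoding = positional_encoding(position, len(embedding))
    return [e + p for e, p in zip(embedding, encoding)]


print(positional_encoding(0, 4))  # position 0 alternates sin(0) and cos(0)
print(add_position([0.5, 0.5, 0.5, 0.5], position=3))
```

Because the encoding is a function of the position rather than a lookup table, it can be evaluated for any sequence length.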
The stack of decoders outputs a vector at the end
* Multiply this with the token embeddings
* This gives you scores (logits) for each token being the right next token (word) in the sequence; applying softmax to the logits turns them into probabilities
LLM Output processing
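The output step can be sketched with a toy vocabulary. This is a minimal illustration, assuming the final vector is scored against each token embedding with a dot product and that softmax is then applied to turn the logits into probabilities; the vocabulary, embeddings, and output vector are made up.

```python
import math


def next_token_probabilities(hidden, token_embeddings):
    """Score the final vector against every token embedding, then normalize."""
    # One dot product per vocabulary entry gives the logits.
    logits = [sum(h * e for h, e in zip(hidden, embedding))
              for embedding in token_embeddings]
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return logits, [e / total for e in exps]


vocab = ["cat", "dog", "car"]                            # hypothetical vocabulary
token_embeddings = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]  # one row per token
hidden = [2.0, 0.5]                                      # hypothetical decoder output

logits, probabilities = next_token_probabilities(hidden, token_embeddings)
best = vocab[probabilities.index(max(probabilities))]
print(logits)
print(probabilities)
print(best)
```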