AWS ML Associate Flashcards
Pre-training bias metric: Measures the imbalance of positive outcomes (labels) between different facet values.
Difference in proportions of labels (DPL)
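DPL is the share of positive labels in facet a minus the share of positive labels in facet d (q_a - q_d). The range is -1 to 1, and 0 means the labels are balanced. Below is a minimal pure-Python sketch of that arithmetic. The function name and the toy data are hypothetical, chosen only for illustration.

```python
def difference_in_proportions_of_labels(labels, facets, facet_a, facet_d, positive=1):
    """DPL = q_a - q_d, where q_x is the share of positive labels within facet x."""
    def positive_rate(facet):
        group = [y for y, f in zip(labels, facets) if f == facet]
        if not group:
            raise ValueError(f"no rows for facet {facet!r}")
        return sum(1 for y in group if y == positive) / len(group)

    return positive_rate(facet_a) - positive_rate(facet_d)


# Hypothetical toy data: facet "a" is 3/4 positive, facet "d" is 1/4 positive.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
facets = ["a", "a", "a", "a", "d", "d", "d", "d"]
print(difference_in_proportions_of_labels(labels, facets, "a", "d"))  # 0.5
```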
Explainability method: Show how the predicted outcome changes as one input feature varies, averaged over the values of the other features.
Partial dependence plots (PDPs)
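A PDP curve comes from fixing one feature to each value on a grid for every row, then averaging the model's predictions at each grid point. The sketch below computes those values by hand for a toy model. The model and the data are hypothetical. A real workflow would plot the resulting curve, or use a tool such as SageMaker Clarify to generate it.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """For each grid value, overwrite one feature in every row and average the predictions."""
    curve = []
    for value in grid:
        modified = [row[:feature_index] + [value] + row[feature_index + 1:] for row in rows]
        preds = [predict(r) for r in modified]
        curve.append(sum(preds) / len(preds))
    return curve


# Hypothetical toy model: linear in both features.
def toy_model(x):
    return 2.0 * x[0] + x[1]


rows = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
print(partial_dependence(toy_model, rows, 0, [0.0, 1.0, 2.0]))  # [2.0, 4.0, 6.0]
```

The slope of 2 in the curve recovers the coefficient of feature 0, which is the kind of effect a PDP is meant to expose.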
Explainability method: Quantify the contribution of each feature to a prediction.
Shapley values
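A Shapley value averages a feature's marginal contribution over every coalition of the other features. Each coalition of size |S| gets the weight |S|!(n-|S|-1)!/n!. The exact-enumeration sketch below makes one assumption: features outside a coalition take a single baseline value, which is a common SHAP-style convention. Exact enumeration is exponential in the number of features, which is why tools such as SageMaker Clarify use approximations (Kernel SHAP).

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values; absent features are replaced by baseline values (assumption)."""
    n = len(x)

    def value(subset):
        # Features in the coalition keep their real value; the rest use the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


# Hypothetical model: a linear term plus an interaction between features 1 and 2.
def toy_model(z):
    return 3.0 * z[0] + 2.0 * z[1] * z[2]


phis = shapley_values(toy_model, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
print([round(p, 6) for p in phis])  # [3.0, 1.0, 1.0]
```

The values sum to f(x) - f(baseline) = 5, and the interaction term is split evenly between features 1 and 2.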
What should you use for data processing if it involves TensorFlow or PyTorch?
SageMaker Processing (with a TensorFlow or PyTorch framework processor)
What is the simplest way to prevent internet and data access to inference containers?
SageMaker network isolation mode (EnableNetworkIsolation)
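You turn on network isolation when you create the model, through the EnableNetworkIsolation field of the CreateModel request. The sketch below only builds the request dictionary, so it runs without AWS credentials. The model name, image URI, S3 path, and role ARN are hypothetical placeholders.

```python
def build_create_model_request(model_name, image_uri, model_data_url, role_arn):
    """Build a CreateModel request body with network isolation enabled."""
    return {
        "ModelName": model_name,
        "PrimaryContainer": {"Image": image_uri, "ModelDataUrl": model_data_url},
        "ExecutionRoleArn": role_arn,
        # Blocks outbound network calls (internet and data access) from the container.
        "EnableNetworkIsolation": True,
    }


# Hypothetical placeholder values.
request = build_create_model_request(
    "example-isolated-model",
    "123456789012.dkr.ecr.us-east-1.amazonaws.com/example-inference:latest",
    "s3://example-bucket/model/model.tar.gz",
    "arn:aws:iam::123456789012:role/ExampleSageMakerRole",
)
# With credentials configured, you could send it with:
# boto3.client("sagemaker").create_model(**request)
```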
Create a baseline to monitor a SageMaker model's feature attribution drift. For instance, you want the model to weigh personal income over credit history for loan approval. How do you do this?
Create a SHAP baseline using the ModelExplainabilityMonitor class (with a SHAPConfig). This generates a feature attribution baseline. Monitoring jobs then compare observed feature attributions against it and raise a violation when they drift.
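As I understand the SageMaker Clarify documentation, feature attribution drift is detected by ranking features by their live attributions. That ranking is scored with nDCG against the baseline attributions, and an alert fires below a threshold of about 0.90. The pure-Python sketch below reproduces that comparison under those assumptions. The feature names, scores, and the 0.90 threshold should be checked against the current docs.

```python
from math import log2


def attribution_ndcg(baseline, live):
    """nDCG of the live feature ranking, scored with baseline attribution values."""
    live_order = sorted(live, key=live.get, reverse=True)
    ideal_order = sorted(baseline, key=baseline.get, reverse=True)
    # Position i (0-based) is discounted by log2(i + 2).
    dcg = sum(baseline[f] / log2(i + 2) for i, f in enumerate(live_order))
    idcg = sum(baseline[f] / log2(i + 2) for i, f in enumerate(ideal_order))
    return dcg / idcg


DRIFT_THRESHOLD = 0.90  # assumed value

# Hypothetical global attributions: income should outweigh credit history.
baseline = {"income": 0.6, "credit_history": 0.3, "age": 0.1}
drifted = {"income": 0.2, "credit_history": 0.7, "age": 0.1}

score = attribution_ndcg(baseline, drifted)
print(score < DRIFT_THRESHOLD)  # True: credit_history now outranks income
```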
Tool used to detect bias and explain predictions in datasets and models
SageMaker Clarify
Used to visualize and analyze intermediate tensors, for example to identify specific misclassifications in a CNN and adjust the model to improve performance.
SageMaker with TensorBoard
How do you strip PII from text-based user interactions?
Amazon Comprehend (PII detection and redaction)
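Comprehend's DetectPiiEntities API returns a list of Entities. Each entity has a Type, a Score, a BeginOffset, and an EndOffset. The sketch below takes such a response and masks each entity with its type. The response here is hand-written so the code runs offline. In practice it would come from boto3.client("comprehend").detect_pii_entities(Text=text, LanguageCode="en"). The score cutoff is a hypothetical choice, and overlapping entities are not handled.

```python
def redact_pii(text, entities, min_score=0.5):
    """Replace each detected PII span with [TYPE]; min_score is an assumed cutoff."""
    # Work from the end of the string so earlier offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        if ent["Score"] >= min_score:
            text = text[:ent["BeginOffset"]] + f"[{ent['Type']}]" + text[ent["EndOffset"]:]
    return text


text = "Contact Jane at jane@example.com"
# Hand-written stand-in for a DetectPiiEntities response.
response = {"Entities": [
    {"Score": 0.99, "Type": "NAME", "BeginOffset": 8, "EndOffset": 12},
    {"Score": 0.99, "Type": "EMAIL", "BeginOffset": 16, "EndOffset": 32},
]}
print(redact_pii(text, response["Entities"]))  # Contact [NAME] at [EMAIL]
```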
RNN training: Exploding gradients causing a convergence issue. What feature can help address this issue?
SageMaker Training Compiler. Optimizes DL models to accelerate training by using ML GPU instances more efficiently.
What instance types are supported by the AWS Neuron SDK for real-time inference on streaming video?
Inferentia instances (Inf2 family)
What is used to centralize and standardize model documentation?
SageMaker Model Cards
SageMaker Serverless Inference: What is the biggest consideration when deciding whether to use provisioned concurrency?
Low latency (avoiding cold starts)
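Provisioned concurrency is set in the ServerlessConfig of an endpoint configuration's production variant, next to MemorySizeInMB and MaxConcurrency. The sketch below builds that variant dictionary and checks that provisioned concurrency does not exceed max concurrency. The variant and model names and the default sizes are hypothetical. The result would be passed in ProductionVariants to CreateEndpointConfig.

```python
ALLOWED_MEMORY_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def serverless_variant(model_name, memory_mb=2048, max_concurrency=10, provisioned=None):
    """Build a serverless production variant; provisioned keeps warm capacity to avoid cold starts."""
    if memory_mb not in ALLOWED_MEMORY_MB:
        raise ValueError(f"MemorySizeInMB must be one of {ALLOWED_MEMORY_MB}")
    config = {"MemorySizeInMB": memory_mb, "MaxConcurrency": max_concurrency}
    if provisioned is not None:
        if not 1 <= provisioned <= max_concurrency:
            raise ValueError("ProvisionedConcurrency must be between 1 and MaxConcurrency")
        config["ProvisionedConcurrency"] = provisioned
    return {"VariantName": "AllTraffic", "ModelName": model_name, "ServerlessConfig": config}


variant = serverless_variant("example-model", provisioned=2)  # hypothetical model name
```

Leaving provisioned unset gives purely on-demand capacity, which is cheaper but can incur cold starts.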
(CloudWatch) Which feature on the Logs Insights page helps you find recurring patterns for infrastructure monitoring in your query results?
The Patterns tab
What is the primary purpose of Capacity Blocks for machine learning (ML)?
Reserve GPU instances for short-duration machine learning workloads on a future date.
When using an embedded question to query a vector database for RAG, what should be returned?
The full text (not the embeddings) of the nearest-neighbor documents, which is added to the prompt to augment the query
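The sketch below shows the retrieval step in plain Python. Documents are ranked by cosine similarity between their embeddings and the query embedding. Their stored text, not their vectors, then goes into the prompt. The two-dimensional embeddings, document texts, and prompt template are hypothetical stand-ins for a real embedding model and vector database.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve(query_embedding, index, k=2):
    """Return the full text of the k most similar documents."""
    ranked = sorted(index, key=lambda doc: cosine(query_embedding, doc["embedding"]), reverse=True)
    return [doc["text"] for doc in ranked[:k]]


def build_prompt(question, passages):
    context = "\n\n".join(passages)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Hypothetical toy index.
index = [
    {"text": "Capacity Blocks reserve GPU instances for a future date.", "embedding": [0.9, 0.1]},
    {"text": "Model Cards centralize model documentation.", "embedding": [0.0, 1.0]},
    {"text": "GPU reservations suit short ML workloads.", "embedding": [0.7, 0.7]},
]
passages = retrieve([1.0, 0.0], index)
prompt = build_prompt("How do I reserve GPUs?", passages)
```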
How can SageMaker Model Monitor help you retrain your model?
Enable Data Capture on the endpoint, and use the captured inference data to retrain the model.
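Data Capture is enabled through the DataCaptureConfig of an endpoint configuration. SageMaker then writes captured requests and responses to S3 as JSON Lines. The sketch below builds the config and parses one captured record. The record layout (captureData, endpointInput, endpointOutput, data) is my understanding of the capture format and should be checked against your own files. The S3 URI and sample record are hypothetical, and CSV payloads are assumed. Captured outputs are predictions, so ground-truth labels still have to be joined before retraining.

```python
import json


def data_capture_config(destination_s3_uri, sampling_percentage=100):
    """DataCaptureConfig block for CreateEndpointConfig, capturing inputs and outputs."""
    if not 0 <= sampling_percentage <= 100:
        raise ValueError("sampling_percentage must be between 0 and 100")
    return {
        "EnableCapture": True,
        "InitialSamplingPercentage": sampling_percentage,
        "DestinationS3Uri": destination_s3_uri,
        "CaptureOptions": [{"CaptureMode": "Input"}, {"CaptureMode": "Output"}],
    }


def parse_capture_line(line):
    """Extract (features, prediction) from one captured record (CSV payloads assumed)."""
    record = json.loads(line)["captureData"]
    features = [float(v) for v in record["endpointInput"]["data"].split(",")]
    prediction = record["endpointOutput"]["data"]
    return features, prediction


config = data_capture_config("s3://example-bucket/data-capture")  # hypothetical bucket
sample = json.dumps({"captureData": {
    "endpointInput": {"observedContentType": "text/csv", "mode": "INPUT", "data": "5.1,3.5", "encoding": "CSV"},
    "endpointOutput": {"observedContentType": "text/csv", "mode": "OUTPUT", "data": "0.92", "encoding": "CSV"},
}})
features, prediction = parse_capture_line(sample)
```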
Exploratory data visualization used to identify hidden patterns (relationship analysis), such as an increase in purchases of specific items or periods of frequent transactions
Heat Map
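A transaction heat map is usually drawn from a count grid, for example day of week by hour of day, where darker cells mark busy periods. The sketch below builds only that grid from timestamps. Rendering it, for instance with matplotlib's imshow, is left out so the code has no dependencies. The timestamps are hypothetical.

```python
from datetime import datetime


def transaction_heatmap(timestamps):
    """Return a 7x24 grid where grid[weekday][hour] counts transactions (Monday = 0)."""
    grid = [[0] * 24 for _ in range(7)]
    for ts in timestamps:
        grid[ts.weekday()][ts.hour] += 1
    return grid


# Hypothetical transactions: two on Monday morning, one on Tuesday evening.
timestamps = [
    datetime(2024, 1, 1, 9, 15),
    datetime(2024, 1, 1, 9, 45),
    datetime(2024, 1, 2, 18, 0),
]
grid = transaction_heatmap(timestamps)
print(grid[0][9], grid[1][18])  # 2 1
```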