Domain 4: ML Implementation and Operations Flashcards

1
Q

_____ is a service that provides a record of actions taken by a user, role, or an AWS service. It simplifies compliance audits, security analysis, and operational troubleshooting by enabling event history, which allows you to view, search, and download recent AWS account activity.

A

AWS CloudTrail

2
Q

Event history: View the most recent account activity across your AWS infrastructure and troubleshoot operational issues.
CloudTrail Insights: Automatic detection of unusual activity in your account.
Data events: Record API calls made to specific AWS services such as Amazon S3 object-level APIs or AWS Lambda function execution APIs.
Management events: Record API calls that manage the AWS resources.

A

Key features of CloudTrail

3
Q

_____ keeps an eye on every API call made to your AWS account and delivers a log file to an Amazon S3 bucket that you specify. These logs include details such as the identity of the API caller, the time of the API call, the source IP address, and the request parameters.

A

CloudTrail

4
Q

_____ is a monitoring and observability service for AWS cloud resources and the applications you run on AWS. It can monitor AWS resources, such as EC2 instances, Amazon DynamoDB tables, and Lambda functions, and you can collect and access all your performance and operational data in the form of logs and metrics from a single platform.

A

Amazon CloudWatch

5
Q

Metrics: Collect and store key metrics, which are variables you can measure for your resources and applications.
Logs: Collect, monitor, and analyze log files from different AWS services.
Alarms: Watch for specific metrics and automatically react to changes.
Events: Respond to state changes in your AWS resources with EventBridge.

A

Key features of CloudWatch

6
Q

This service allows you to set alarms and automatically react to changes in your AWS resources, and it also integrates with Amazon SNS to notify you when certain thresholds are breached.

A

CloudWatch

7
Q

  1. Enable CloudTrail
  2. Choose events
  3. Specify an S3 bucket
  4. Turn on Insights

A

How to get started w/ CloudTrail monitoring
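
A minimal boto3 sketch of these four steps (the trail name and bucket are illustrative, and the bucket must already carry a CloudTrail bucket policy):

  import boto3

  cloudtrail = boto3.client("cloudtrail")

  # 1 & 3. Enable a trail that delivers logs to the specified S3 bucket.
  cloudtrail.create_trail(Name="ml-ops-trail", S3BucketName="my-cloudtrail-logs")

  # 2. Choose events: record S3 object-level data events alongside management events.
  cloudtrail.put_event_selectors(
      TrailName="ml-ops-trail",
      EventSelectors=[{
          "ReadWriteType": "All",
          "IncludeManagementEvents": True,
          "DataResources": [{"Type": "AWS::S3::Object",
                             "Values": ["arn:aws:s3:::my-training-data/"]}],
      }],
  )

  # Start delivering events, then 4. turn on Insights.
  cloudtrail.start_logging(Name="ml-ops-trail")
  cloudtrail.put_insight_selectors(
      TrailName="ml-ops-trail",
      InsightSelectors=[{"InsightType": "ApiCallRateInsight"}],
  )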

8
Q

  1. Set up metrics
  2. Create alarms
  3. Configure logging
  4. Design a dashboard

A

How to implement monitoring solutions with CloudWatch
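
Steps 1 and 2 might look like the following boto3 sketch (the instance ID, thresholds, and SNS topic ARN are placeholders); dashboards can likewise be created with put_dashboard:

  import boto3

  cloudwatch = boto3.client("cloudwatch")

  # 1. Set up metrics: publish a custom metric from your application.
  cloudwatch.put_metric_data(
      Namespace="MLApp",
      MetricData=[{"MetricName": "InferenceLatencyMs", "Value": 42.0,
                   "Unit": "Milliseconds"}],
  )

  # 2. Create alarms: alert when average CPU stays above 80% for two periods.
  cloudwatch.put_metric_alarm(
      AlarmName="high-cpu-training-host",
      Namespace="AWS/EC2",
      MetricName="CPUUtilization",
      Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
      Statistic="Average",
      Period=300,
      EvaluationPeriods=2,
      Threshold=80.0,
      ComparisonOperator="GreaterThanThreshold",
      AlarmActions=["arn:aws:sns:us-east-1:111122223333:ops-alerts"],
  )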

9
Q

To effectively monitor for errors and anomalies within your machine learning environment, you could set up a combination of _____ and _____.

A

CloudTrail and CloudWatch

10
Q

By deploying applications across multiple Availability Zones, you can protect your applications from the failure of a single location.

A

High Availability

11
Q

Multi-Region deployments can provide a backup in case of a regional service disruption.

A

Fault Tolerance

12
Q

Different regions can serve users from geographically closer endpoints, reducing latency and improving the user experience.

A

Scalability

13
Q

For machine learning applications, having data processing and storage close to the data sources can reduce transfer times and comply with data sovereignty laws.

A

Data Locality

14
Q

One or more discrete data centers within a region with redundant power, networking, and connectivity. Each is physically separated from the others by a meaningful distance of many kilometers.

A

Availability zone

15
Q

You can deploy machine learning models using Amazon EC2 instances configured with _____, which can launch instances across multiple Availability Zones to ensure your application can withstand the loss of an AZ.

A

Auto Scaling

16
Q

For databases backing machine learning applications, a Multi-AZ deployment with _____ can provide high availability and automatic failover support.

A

Amazon RDS

17
Q

Deploying applications _____ can protect against regional outages and provide geographic redundancy.

A

across multiple AWS Regions

18
Q

_____ allows you to replicate data between distant AWS Regions.

A

S3 cross-region replication (CRR)

19
Q

_____ can route traffic to different regions based on geographical location, which can reduce latency for end-users, and its Geoproximity routing lets you balance traffic loads across multiple regions.

A

Amazon Route 53

20
Q

Test Failover Mechanisms: Regularly test your failover to ensure that the systems switch to new regions or zones without issues.

Data Synchronization: Keep data synchronized across regions, considering the cost and traffic implications.

Latency: Use services such as Amazon CloudFront to cache data at edge locations and reduce latency.

Compliance and Data Residency: Be aware of compliance requirements and data residency regulations that may impact data storage and transfer.

Cost Management: Consider the additional costs associated with cross-region data transfer and storage.

A

Best Practices for Multi-Region and Multi-AZ Deployments

21
Q

_____ can be used to package and deploy machine learning applications consistently across different environments. By containerizing machine learning applications, you ensure that the application runs the same way, regardless of where it is deployed.

A

Docker

22
Q

_____ provide a lightweight, standalone, and executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries, and settings.

A

Docker containers

23
Q

Docker containers can be created and managed using services like _____.

A

Amazon Elastic Container Service (ECS), Amazon Elastic Kubernetes Service (EKS), or even directly on EC2 instances

24
Q

A text document that contains all the commands a user could call on the command line to assemble an image.

A

Dockerfile

25
Q

An immutable file that’s essentially a snapshot of a container. Images are built from the instructions for a complete and executable version of an application, which relies on the host OS kernel.

A

Docker Image

26
Q

A runtime instance of a Docker image.

A

Docker Container

27
Q

Steps to create and deploy Docker containers in AWS for machine learning applications

A
  1. Install Docker on your local machine or AWS EC2 instance.
  2. Write a Dockerfile for your machine learning application.
  3. Build your Docker image with the docker build command.
  4. Run your Docker container locally to test with the docker run command.
  5. Push the Docker image to Amazon ECR.
  6. Deploy your Docker container on Amazon ECS or EKS.
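
Steps 3-5 can be scripted; a sketch using boto3 for Amazon ECR plus the Docker CLI via subprocess (the repository name is illustrative, and Docker must be installed where this runs):

  import base64
  import subprocess
  import boto3

  ecr = boto3.client("ecr")
  repo = ecr.create_repository(repositoryName="ml-inference")["repository"]["repositoryUri"]

  # Authenticate the local Docker client against ECR.
  auth = ecr.get_authorization_token()["authorizationData"][0]
  user, password = base64.b64decode(auth["authorizationToken"]).decode().split(":")
  subprocess.run(["docker", "login", "-u", user, "-p", password,
                  auth["proxyEndpoint"]], check=True)

  # Build the image from the Dockerfile in the current directory, then push it.
  subprocess.run(["docker", "build", "-t", f"{repo}:latest", "."], check=True)
  subprocess.run(["docker", "push", f"{repo}:latest"], check=True)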
28
Q

_______ are collections of EC2 instances that are treated as a logical grouping for the purposes of automatic scaling and management.

A

Auto Scaling groups

29
Q

_____ monitors your applications and automatically adjusts capacity to maintain steady, predictable performance at the lowest possible cost.

A

AWS Auto Scaling

30
Q

T/F: Auto Scaling groups can help manage workload variability (training may demand significant computational resources for short periods, while inference may see spikes in demand) by adding/removing resources as needed.

A

True

31
Q

How do you deploy Auto Scaling groups?

A
  1. Create a launch template or launch configuration: Define the EC2 instance configuration that will be used by the Auto Scaling group. This includes the AMI, instance type, key pair, security groups, etc.
  2. Define the Auto Scaling group: Specify the minimum, maximum, and desired number of instances, as well as the availability zones for your instances. Attach the previously created launch template or launch configuration.
  3. Configure scaling policies: Establish the guidelines under which the Auto Scaling group will scale out (add more instances) or scale in (remove instances). Scaling policies can be based on criteria such as CPU utilization, network usage, or custom metrics.
  4. Set up notifications (optional): Create notifications to alert you when the Auto Scaling group launches or terminates instances, or when instances fail health checks.
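
Steps 1-3 in boto3, assuming a launch template named ml-inference-template already exists (the group name, sizes, and subnets are illustrative):

  import boto3

  autoscaling = boto3.client("autoscaling")

  # Step 2: define the group across subnets in two Availability Zones.
  autoscaling.create_auto_scaling_group(
      AutoScalingGroupName="ml-inference-asg",
      LaunchTemplate={"LaunchTemplateName": "ml-inference-template", "Version": "$Latest"},
      MinSize=2,
      MaxSize=10,
      DesiredCapacity=2,
      VPCZoneIdentifier="subnet-aaa111,subnet-bbb222",
  )

  # Step 3: target tracking policy that holds average CPU near 60%.
  autoscaling.put_scaling_policy(
      AutoScalingGroupName="ml-inference-asg",
      PolicyName="cpu-target-60",
      PolicyType="TargetTrackingScaling",
      TargetTrackingConfiguration={
          "PredefinedMetricSpecification": {"PredefinedMetricType": "ASGAverageCPUUtilization"},
          "TargetValue": 60.0,
      },
  )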
32
Q

AWS provides _____ for Auto Scaling groups that you can use to monitor events such as:

  • EC2 instance launch or termination
  • Failed health checks
  • Scaling activities triggered by your policies
A

CloudWatch metrics

33
Q

What are the main resources you’ll be rightsizing?

A

EC2 Instances: Virtual servers in Amazon’s Elastic Compute Cloud (EC2) service.
Provisioned IOPS: The input/output operations per second that a storage volume can handle.
EBS Volumes: Elastic Block Store (EBS) provides persistent block storage volumes for EC2 instances.

34
Q

For machine learning tasks, Amazon has specific instances like _____ that are optimized for GPU-based computations, which are ideal for training and inference in deep learning models.

A

the P and G series

35
Q

When rightsizing EC2 instances, consider:

A

Compute Power: Match your instance’s CPU and GPU power to the training and inference needs of your machine learning model.
Memory: Choose an instance with enough RAM to handle your model and dataset.
Network Performance: Ensure the instance provides adequate network bandwidth for data transfer.

36
Q

_____ are integral to the performance of the storage system, affecting how quickly data can be written to and read from the storage media.

A

IOPS

37
Q

ML workloads can be I/O-intensive, particularly during _____ phases.

A

dataset loading and model training

38
Q

You should choose the volume type and size based on your _____ requirements.

A

ML workload

39
Q

_____ is an ongoing process. You should continuously monitor and analyze your AWS resource usage and performance metrics to identify opportunities to resize instances and storage options.

A

Rightsizing

40
Q

AWS offers tools such as _____ and _____ to track resource utilization and identify instances that are either over or under-utilized.

A

AWS CloudWatch and AWS Trusted Advisor

41
Q

Assess Regularly: Workloads can change over time, requiring different resources.
Use Managed Services: Managed services like Amazon SageMaker can automatically handle some rightsizing for you.
Consider Spot Instances: For flexible workloads, consider using spot instances, which can be cheaper but less reliable than on-demand instances.
Take Advantage of Autoscaling: Use autoscaling to adjust resources in response to changes in demand.

A

Practical Tips for Rightsizing

42
Q

What all is involved in rightsizing AWS resources?

A
  • Choosing the appropriate instance types, provisioned IOPS, and EBS volumes for your specific ML workloads.
  • Balancing between performance needs and cost optimization, ensuring that you’re using just the right amount of resources without underutilizing or overpaying.
  • Regularly monitoring and adjustments are key to maintaining an efficient and cost-effective AWS environment for machine learning applications.
43
Q

_____ plays a crucial role when it comes to managing incoming traffic across multiple targets, such as EC2 instances, in a reliable and efficient manner.

A

Load balancing

44
Q

Suitable for simple load balancing of traffic across multiple EC2 instances.

A

Classic Load Balancer (CLB)

45
Q

Best for load balancing of HTTP and HTTPS traffic, providing advanced routing capabilities tailored to application-level content.

A

Application Load Balancer (ALB)

46
Q

Ideal for handling volatile traffic patterns and large numbers of TCP flows, offering low-latency performance.

A

Network Load Balancer (NLB)

47
Q

Helps you deploy, scale, and manage a fleet of third-party virtual appliances (such as firewalls and intrusion detection/prevention systems).

A

Gateway Load Balancer (GWLB)

48
Q

What is the most commonly used load balancer, and why?

A

Application Load Balancer due to its ability to make routing decisions at the application layer

49
Q

When deploying ML models, what does distributing inference requests across multiple model servers accomplish?

A
  • Ensures high availability and reduces latency for end users.
  • Helps distribute traffic across instances in different Availability Zones for fault tolerance.
50
Q

How is load balancing typically implemented in ML scenarios?

A
  1. Deployment of ML Models: You may have multiple instances of Amazon SageMaker endpoints or EC2 instances serving your machine learning models.
  2. Configuration of Load Balancer: An Application Load Balancer is configured to sit in front of these instances. ALB supports content-based routing, and with well-defined rules, you can direct traffic based on the inference request content.
  3. Auto Scaling: You can set up AWS Auto Scaling to automatically adjust the number of inference instances in response to the incoming application load.
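
For SageMaker endpoints specifically, step 3 goes through Application Auto Scaling; a sketch with illustrative endpoint and variant names:

  import boto3

  aas = boto3.client("application-autoscaling")
  resource_id = "endpoint/ml-endpoint/variant/AllTraffic"

  aas.register_scalable_target(
      ServiceNamespace="sagemaker",
      ResourceId=resource_id,
      ScalableDimension="sagemaker:variant:DesiredInstanceCount",
      MinCapacity=1,
      MaxCapacity=4,
  )

  # Scale so each instance handles roughly 100 invocations per minute.
  aas.put_scaling_policy(
      PolicyName="invocations-per-instance",
      ServiceNamespace="sagemaker",
      ResourceId=resource_id,
      ScalableDimension="sagemaker:variant:DesiredInstanceCount",
      PolicyType="TargetTrackingScaling",
      TargetTrackingScalingPolicyConfiguration={
          "TargetValue": 100.0,
          "PredefinedMetricSpecification": {
              "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"},
      },
  )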
51
Q

To ensure high availability, you will deploy your EC2 instances across multiple _____.

A

Availability Zones

52
Q

The ALB periodically performs _____ on the registered instances and only routes traffic to healthy instances, ensuring reliability.

A

Health Checks

53
Q

The EC2 instances will be part of an _____, which can automatically scale the number of instances up or down based on defined metrics such as CPU utilization or the number of incoming requests.

A

Auto Scaling group

54
Q

_____ adjusts the number of instances based on a target value for a specific metric.

A

Target tracking scaling policy

55
Q

_____ increases or decreases the number of instances based on a set of scaling adjustments.

A

Step scaling policy

56
Q

Monitors your load balancer and managed instances, providing metrics such as request count, latency, and error codes.

A

CloudWatch

57
Q

ALB can log each request it processes, which can be stored in S3 and used for analysis.

A

Access Logs

58
Q

ALB supports request tracing to track HTTP requests from clients to targets.

A

Request Tracing

59
Q

_____ serve as the templates for virtual servers on the AWS platform and are crucial for the rapid deployment of scalable, reliable, and secure applications.

A

Amazon Machine Images (AMIs)

60
Q

An _____ contains all the necessary information to launch a virtual machine (VM) in AWS, including the operating system (OS), application server, applications, and associated configurations.

A

AMI

61
Q

Benefits of Using AMIs

A
  • Consistency: Ensures that each instance you launch has the same setup, reducing variability which leads to fewer errors.
  • Scalability: Streamlines the process of scaling applications by allowing new instances to be spun up with the same configuration.
  • Security: By pre-installing security patches and configuring security settings, you ensure compliance from the moment each instance is launched.
  • Version Control: You can maintain different versions of AMIs to rollback or forward to different configurations if needed.
62
Q

A _____ is a type of AMI that is pre-configured with an optimal set of software and settings for a particular use case. It’s considered such because it’s a tested and proven baseline which teams can use as a stable starting point.

A

golden image

63
Q

Best Practices for Golden Images

A
  • Automation: Automate the creation and maintenance of golden images to reduce manual errors and save time.
  • Security Hardening: Implement security best practices within the image, including minimizing unnecessary software to reduce vulnerabilities.
  • Regular Updates: Continuously integrate the latest security patches and updates.
  • Versioning: Maintain versions of golden images to track changes over time and for audit purposes.
  • Immutable Infrastructure: Treat golden images as immutable; any change requires creating a new image rather than updating an existing one.
64
Q

When working with AMIs for machine learning, you may need to include:

A
  • Machine Learning Frameworks: Like TensorFlow, Keras, or PyTorch pre-installed and configured.
  • GPU Drivers: If leveraging GPUs for computation, ensure proper drivers and libraries are installed.
  • Data Processing Tools: Pre-installation of tools like Apache Spark or Hadoop if needed for data processing.
  • Optimized Libraries: Depending on your machine learning tasks, you might need optimized math libraries such as Intel MKL.
65
Q

Data, both in transit and at rest, should be _____.

A

encrypted

66
Q

AWS offers several mechanisms for encryption, such as _____ for managing keys and _____ for managing SSL/TLS certificates.

A

AWS KMS / AWS Certificate Manager

67
Q

Use the _____ to identify and right-size underutilized instances.

A

AWS Cost Explorer

68
Q

Use _____ to optimize hyperparameters instead of manual experimentation.

A

SageMaker Automatic Model Tuning
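
A sketch of a tuning job with the SageMaker Python SDK, assuming the built-in XGBoost image; the role ARN and S3 paths are illustrative:

  import sagemaker
  from sagemaker.estimator import Estimator
  from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

  session = sagemaker.Session()
  estimator = Estimator(
      image_uri=sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, version="1.7-1"),
      role="arn:aws:iam::111122223333:role/SageMakerRole",
      instance_count=1,
      instance_type="ml.m5.xlarge",
      output_path="s3://my-bucket/output",
  )
  estimator.set_hyperparameters(objective="binary:logistic", num_round=200)

  # Search the eta/max_depth space across 20 training jobs, 4 at a time.
  tuner = HyperparameterTuner(
      estimator=estimator,
      objective_metric_name="validation:auc",
      hyperparameter_ranges={"eta": ContinuousParameter(0.01, 0.3),
                             "max_depth": IntegerParameter(3, 10)},
      max_jobs=20,
      max_parallel_jobs=4,
  )
  tuner.fit({"train": "s3://my-bucket/train", "validation": "s3://my-bucket/val"})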

69
Q

When training machine learning models, _____ can significantly improve input/output operations and reduce training time.

A

caching data

70
Q

_____, which are designed to be more efficient and scalable than their open-source equivalents.

A

Leverage AWS-optimized ML algorithms

71
Q

Implement _____ and _____ for your machine learning models and datasets to facilitate recovery in case of failures.

A

automated backups and versioning

72
Q

For long-running training jobs, use _____ to save interim model states, which will allow you to resume from the last checkpoint rather than starting over in the event of a failure.

A

checkpointing

73
Q

Implement _____ for automated testing, building, and deployment of ML models.

A

continuous integration and continuous delivery (CI/CD) pipelines

74
Q

Use _____ and _____ to automate the deployment of machine learning models trained with SageMaker.

A

AWS CodePipeline and CodeBuild

75
Q

Set up _____ alarms on SageMaker endpoints to monitor the performance of your deployed models and trigger retraining workflows with _____ if necessary.

A

CloudWatch / AWS Step Functions
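
The alarm half of this setup might look like the boto3 sketch below (endpoint, variant, and topic names are illustrative); the SNS topic can then invoke a Lambda that starts the Step Functions retraining workflow:

  import boto3

  cloudwatch = boto3.client("cloudwatch")

  # Alarm when average model latency stays above 500 ms for five minutes.
  cloudwatch.put_metric_alarm(
      AlarmName="ml-endpoint-latency",
      Namespace="AWS/SageMaker",
      MetricName="ModelLatency",
      Dimensions=[{"Name": "EndpointName", "Value": "ml-endpoint"},
                  {"Name": "VariantName", "Value": "AllTraffic"}],
      Statistic="Average",
      Period=60,
      EvaluationPeriods=5,
      Threshold=500000.0,  # ModelLatency is reported in microseconds
      ComparisonOperator="GreaterThanThreshold",
      AlarmActions=["arn:aws:sns:us-east-1:111122223333:retrain-topic"],
  )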

76
Q

Use _____ in multiple AWS Regions and _____ to route traffic for high availability.

A

SageMaker Model Hosting Services / Route 53

77
Q

_____ is a service that turns text into lifelike speech. It utilizes advanced deep learning technologies to synthesize speech that sounds like a human voice, and it supports multiple languages and includes a variety of lifelike voices.

A

Amazon Polly

78
Q
  • Speech (TTS) in a variety of voices and languages.
  • Real-time streaming or batch processing of speech files.
  • Support for Speech Synthesis Markup Language (SSML) for adjusting speech parameters like pitch, speed, and volume.
A

Key features of Amazon Polly

79
Q
  • Creating applications that read out text, such as automated newsreaders or e-learning platforms.
  • Generating voiceovers for videos.
  • Creating conversational interfaces for devices and applications.
A

Use cases for Amazon Polly

80
Q

_____ is an AWS service for building conversational interfaces using voice and text. Powered by the same technology that drives Amazon Alexa, it provides an easy-to-use console for creating sophisticated, natural language chatbots.

A

Amazon Lex

81
Q
  • Natural language understanding (NLU) and automatic speech recognition (ASR) to interpret user intent.
  • Integration with AWS Lambda to execute business logic or fetch data dynamically.
  • Seamless deployment across multiple platforms such as mobile apps, web applications, and messaging platforms.
A

Key features of Amazon Lex

82
Q
  • Customer service chatbots to assist with common requests or questions.
  • Voice-enabled application interfaces that allow for hands-free operation.
  • Enterprise productivity bots integrated with platforms like Slack or Facebook Messenger.
A

Amazon Lex use cases

83
Q

_____ uses deep learning processes to convert speech to text quickly and accurately. It can be used to transcribe customer service calls, automate subtitling, and generate metadata for media assets to create a fully searchable archive.

A

Amazon Transcribe

84
Q
  • High-quality speech recognition that supports various audio formats.
  • Identification of different speakers (speaker diarization) within the audio.
  • Supports custom vocabulary and terms specific to particular domains or industries.
A

Key features of Amazon Transcribe

85
Q
  • Transcribing recorded audio from customer service calls for analysis and insight.
  • Automated generation of subtitles for videos.
  • Creating text-based records of meetings or legal proceedings.
A

Amazon Transcribe use cases

86
Q

T/F: SageMaker built-in algorithms cover classification, regression, and clustering.

A

True

87
Q

Advantages of using Amazon SageMaker built-in algorithms include:

A

Ease of Use: These algorithms are pre-implemented and optimized for performance, allowing you to focus on model training and deployment without worrying about the underlying code.
Performance: Amazon SageMaker algorithms are designed to be highly scalable and performant, benefiting from AWS optimizations.
Integration: Built-in algorithms are tightly integrated with other SageMaker features, including model tuning and deployment.
Cost-Effectiveness: They can offer a cost advantage for certain tasks due to the efficiencies gained from optimization.

88
Q

T/F: Amazon SageMaker supports various built-in algorithms like Linear Learner, XGBoost, and Random Cut Forest, among others.

A

True

89
Q

T/F: Building custom machine learning models allows for greater flexibility and control over the architecture, features, and hyperparameters.

A

True

90
Q

Building custom models is useful when:

A

Unique Requirements: Pre-built algorithms might not be suitable for specific tasks or data types.
Innovative Research: Custom experiments and novel architectures are necessary for cutting-edge machine learning research.
Domain Specialization: Highly specialized tasks may require custom-tailored solutions.
Performance Tuning: When the utmost performance is required, and you need to optimize every aspect of the model yourself.

91
Q

Ease of Use: High
Model Complexity: Low to moderate
Specificity of Application: General use cases
Data Volume: High (optimized for scalability)
Performance Optimization: Pre-optimized, may be limited
Development Time: Shorter
Cost: Potentially lower
Integration with SageMaker: Full

A

When to use built-in algorithms

92
Q

Ease of Use: Low to moderate
Model Complexity: High
Specificity of Application: Specialized/niche use cases
Data Volume: Variable
Performance Optimization: Full control
Development Time: Longer
Cost: Potentially higher
Integration with SageMaker: Requires custom setup

A

When to use custom models

93
Q

_____, also referred to as limits, are the maximum number of resources you can create in an AWS service. They're set by AWS to help with resource optimization, ensuring availability and preventing abuse of services. They can vary by service, and also by regions within a service.

A

Service quotas

94
Q

You can view and manage your AWS service quotas using _____, which provides a central location to manage quotas across your account.

A

the AWS Management Console, AWS CLI, or AWS Service Quotas API.

95
Q

What command do you use to list service quotas using the AWS CLI?

A

aws service-quotas list-service-quotas --service-code <service-code>

96
Q

AWS provides two ways to request a quota increase:

A

through the AWS Service Quotas console or by opening a support case
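
The same operations are available programmatically; a boto3 sketch (the quota code shown is hypothetical and should be looked up from the listing):

  import boto3

  quotas = boto3.client("service-quotas")

  # Inspect current quotas for a service.
  for q in quotas.list_service_quotas(ServiceCode="sagemaker")["Quotas"]:
      print(q["QuotaCode"], q["QuotaName"], q["Value"])

  # Request an increase for a specific quota.
  quotas.request_service_quota_increase(
      ServiceCode="sagemaker",
      QuotaCode="L-1234ABCD",  # hypothetical code taken from the listing above
      DesiredValue=4.0,
  )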

97
Q
  1. Navigate to the Service Quotas console.
  2. Select the service you need a quota increase for.
  3. Choose the specific quota.
  4. Click on the “Request quota increase” button and fill out the required form.
A

How to increase quota using AWS Service Quota

98
Q

Best Practices for Managing Quotas

A

Monitor Usage: Regularly monitor your usage against your service quotas with CloudWatch metrics, AWS Budgets, or custom scripts.

Set Alarms: Create alarms in CloudWatch to notify you when you are approaching a service quota.

Clean Up Resources: Terminate resources you no longer need to free up your service quota for other tasks.

Implement Cost Control: By staying within limits, you also keep costs down, ensuring that you are only paying for the resources you need.

Plan for Scale: Understand how your quotas scale across multiple regions, as some resources might have regional service quotas.

99
Q

When selecting an instance for machine learning workloads on AWS, it’s important to _____.

A

match the instance type to the specific needs of your application

100
Q

Perfect for a variety of workloads, these instances offer a balance of compute, memory, and networking resources. The T and M series fall into this category and can be a cost-effective solution for smaller machine learning tasks.

A

General Purpose Instances

101
Q

The R and X series fall under this umbrella and are designed for memory-intensive applications like large in-memory databases. They can also be useful for ML tasks that require substantial memory, such as certain types of clustering and large-scale linear algebra operations.

A

Memory Optimized Instances

102
Q

For ML workloads involving deep learning and large-scale tensor calculations, _____ like the P and G series are highly recommended. They are equipped with powerful GPUs that significantly accelerate model training and inference.

A

GPU Instances

103
Q

These instances are billed by the second, with no long-term commitments or upfront payments. While they offer flexibility for sporadic or unpredictable workloads, they are often the most expensive option over time.

A

On-demand instances

104
Q

By committing to a one or three-year term, you can save up to 75% over equivalent on-demand capacity, and these instances are ideal for predictable workloads with steady state usage.

A

Reserved Instances

105
Q

_____ offer the opportunity to take advantage of unused EC2 capacity at discounts of up to 90% compared to on-demand prices. They can be interrupted by AWS with a two-minute notification, making them suitable for flexible or fault-tolerant ML workloads.

A

Spot Instances

106
Q
  • Rightsizing instances by benchmarking your ML workloads to select the instance that best matches your performance and cost requirements.
  • Use managed services like Amazon SageMaker that can abstract away the complexities of managing infrastructure and provide cost efficiencies through features like SageMaker Managed Spot Training.
  • Continuously monitor and adjust your usage with tools like AWS Cost Explorer and AWS Budgets.
A

Cost Optimization Strategies for ML

107
Q

Using ____ is a cost-effective strategy for training deep learning models, especially when the workload has flexible start and end times.

A

AWS Spot Instances

108
Q

_____ simplifies the process of deploying batch computing jobs on AWS. When combined with Spot Instances, these technologies facilitate an efficient and scalable approach to handling extensive computational tasks such as deep learning training at a fraction of the cost.

A

AWS Batch

109
Q

_____ allow you to take advantage of unused EC2 computing capacity at up to a 90% discount compared to On-Demand prices. However, these instances can be interrupted by AWS with two minutes of notification when AWS needs the capacity back.

A

Spot Instances

110
Q

Despite the possibility of interruptions, _____ are well-suited for deep learning workloads as they are often resilient to interruptions – models can save checkpoints to persistent storage, allowing training to resume from the last saved state.

A

Spot Instances

111
Q

_____ automates the deployment, management, and scaling of batch jobs. It dynamically provisions the optimal quantity and type of compute resources based on the volume and specific resource requirements of the batch jobs submitted.

A

AWS Batch

112
Q
  • Define a compute environment: Choose ‘Spot’ as the ‘Provisioning model’. Set the ‘Maximum price’ you’re willing to pay per instance hour, which can be up to the On-Demand rate.
  • Create a job queue: Link your compute environment to a job queue by specifying priority levels.
  • Define job definitions: Specify the Docker image to use, vCPUs, memory requirements, and the job role, which should include necessary permissions for the AWS resources your job will access.
  • Submit jobs to the queue: Jobs submitted to this queue are then placed into the Spot Instance-based compute environment you’ve configured.
A

How to configure your compute environment within AWS Batch to use spot resources
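
A boto3 sketch of the compute environment and job queue (role ARNs, subnets, and instance types are illustrative):

  import boto3

  batch = boto3.client("batch")

  batch.create_compute_environment(
      computeEnvironmentName="dl-spot-env",
      type="MANAGED",
      computeResources={
          "type": "SPOT",
          "allocationStrategy": "SPOT_CAPACITY_OPTIMIZED",
          "bidPercentage": 100,  # maximum price as a percentage of On-Demand
          "minvCpus": 0,
          "maxvCpus": 64,
          "instanceTypes": ["g4dn.xlarge", "g5.xlarge"],  # diversify to reduce interruptions
          "subnets": ["subnet-aaa111", "subnet-bbb222"],
          "instanceRole": "arn:aws:iam::111122223333:instance-profile/ecsInstanceRole",
      },
      serviceRole="arn:aws:iam::111122223333:role/AWSBatchServiceRole",
  )

  # Once the environment reports VALID, attach a queue to it.
  batch.create_job_queue(
      jobQueueName="dl-spot-queue",
      priority=1,
      computeEnvironmentOrder=[{"order": 1, "computeEnvironment": "dl-spot-env"}],
  )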

113
Q

Implement _____ in your training code so that your models can resume from the last saved state if interrupted. Store _____ in Amazon S3 or EFS for durability.

A

checkpointing / checkpoints
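
With the SageMaker Python SDK, checkpointing pairs naturally with Spot training; a sketch with illustrative image, role, and S3 locations:

  from sagemaker.estimator import Estimator

  estimator = Estimator(
      image_uri="111122223333.dkr.ecr.us-east-1.amazonaws.com/my-training:latest",
      role="arn:aws:iam::111122223333:role/SageMakerRole",
      instance_count=1,
      instance_type="ml.p3.2xlarge",
      use_spot_instances=True,
      max_run=3600,
      max_wait=7200,  # must be >= max_run when using Spot
      checkpoint_s3_uri="s3://my-bucket/checkpoints/",   # synced to and from S3
      checkpoint_local_path="/opt/ml/checkpoints",       # where the container writes
  )
  estimator.fit("s3://my-bucket/train")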

114
Q

Use Amazon S3 for storing your datasets. It’s a highly available and durable storage service that integrates well with AWS Batch and Spot Instances.

A

Data Locality

115
Q

Set your maximum spot bid price. If the spot market price exceeds your bid, your Spot Instance may be reclaimed.

A

Bid Pricing

116
Q

Spread your spot requests across multiple instance types and Availability Zones to reduce the likelihood of simultaneous interruptions.

A

Diverse Spot Requests

117
Q

Use _____ to manage a group of Spot Instances and On-Demand Instances to optimize for cost and availability.

A

Spot Fleet

118
Q

Have a strategy to automatically fall back to On-Demand Instances when Spot Instances are not available for extended periods.

A

Fallback to On-Demand

119
Q

Monitor your Spot Instances and job execution using _____ to alert you when critical events occur (e.g., Spot Interruption notices).

A

AWS CloudWatch

120
Q

Use _____ in conjunction with CloudWatch alarms to automate checkpointing and job restarts.

A

AWS Lambda functions

121
Q

_____ control inbound and outbound traffic and ensure that the resources, such as Amazon SageMaker instances, Amazon EC2 instances hosting ML models, or databases storing training data, are secure and accessible only by authorized entities.

A

Security groups

122
Q

_____ control the traffic based on rules that you define. You can specify rules that allow traffic to and from your instances, typically configured as a list of allowed IP protocols, ports, and source or destination IP ranges.

A

Security groups

123
Q

T/F: By default, security groups deny all inbound traffic and allow all outbound traffic.

A

True

124
Q

These rules govern incoming traffic to your instance.

A

Inbound rules

125
Q

These rules control the network traffic that leaves your instance, which may include allowing instances to call external APIs or access other AWS services.

A

Outbound

126
Q

Protocol: TCP
Port Range: 22
Source: Your IP
For secure shell access to an instance.

A

SSH

127
Q

Protocol: TCP
Port Range: 8888
Source: Specific IP Range
Jupyter notebooks for ML.

A

Custom TCP

128
Q

Protocol: TCP
Port Range: 80
Source: 0.0.0.0/0
Allow web access to ML Dashboards.

A

HTTP
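
The three example rules above could be created together with boto3 (the VPC ID and the admin CIDR are placeholders):

  import boto3

  ec2 = boto3.client("ec2")
  sg = ec2.create_security_group(
      GroupName="ml-workstation-sg",
      Description="Access rules for an ML workstation",
      VpcId="vpc-0123456789abcdef0",
  )["GroupId"]

  ec2.authorize_security_group_ingress(
      GroupId=sg,
      IpPermissions=[
          # SSH only from a single admin IP.
          {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
           "IpRanges": [{"CidrIp": "203.0.113.10/32"}]},
          # Jupyter from an internal range.
          {"IpProtocol": "tcp", "FromPort": 8888, "ToPort": 8888,
           "IpRanges": [{"CidrIp": "10.0.0.0/16"}]},
          # HTTP open to the world for a public dashboard.
          {"IpProtocol": "tcp", "FromPort": 80, "ToPort": 80,
           "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
      ],
  )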

129
Q

T/F: It is a common practice to allow all outbound traffic, but for enhanced security, you should limit it to only the ports and protocols necessary for your application to function.

A

True

130
Q

Only open up the ports that are necessary for your application to function. For instance, if your ML model only requires HTTP access, avoid opening the SSH port.

A

Principle of Least Privilege

131
Q

Restrict the IP addresses able to access your instance. For business-critical ML systems, access should ideally be from known IP ranges.

A

IP Restrictions

132
Q

Use different security groups for different roles within your infrastructure. For example, an Amazon RDS instance holding your data might have different security requirements compared to your Amazon SageMaker endpoint.

A

Separate Groups for Different Roles

133
Q

Regularly review and update your security group rules to ensure they reflect your current requirements and are free from any legacy configurations that may introduce risks.

A

Regular Reviews and Updates

134
Q

Some AWS services, like AWS PrivateLink for Amazon SageMaker, allow you to keep traffic between your VPC and the service within the AWS network, which reduces exposure to the internet and improves security.

A

Integration with AWS Services

135
Q

Security groups are _____, meaning that if you send a request from your instance, the response traffic for that request is allowed to flow in regardless of inbound security group rules.

A

stateful

136
Q

Additionally, security groups are associated with _____, which means that you can assign multiple security groups to a single network interface for granular control.

A

network interfaces

137
Q

_____ enables you to manage access to AWS services and resources securely by creating and managing AWS users and groups, and use permissions to allow and deny their access to AWS resources.

A

IAM

138
Q

Individuals or services who are granted access to resources in your AWS account.

A

Users

139
Q

A collection of users under a set of permissions. Adding a user to a _____ grants them the permissions of that _____.

A

Groups

140
Q

A set of permissions that grant access to actions and resources in AWS. It does not have standard long-term credentials (password or access keys) associated with it. Instead, when you assume a _____, it provides you with temporary security credentials for your _____ session.

A

Roles

141
Q

Documents that define permissions and can be attached to users, groups, or roles. They are written in JSON and specify what actions are allowed or denied.

A

Policies

142
Q
  1. Least Privilege
  2. Rotate Credentials
  3. Enable MFA
  4. Audit/Log IAM Events
A

Security best practices for IAM

143
Q
  • Use IAM access advisor to check service last accessed information, thereby identifying unused permissions that can be revoked.
  • Conditionally apply permissions based on tags attached to users or resources, minimizing overly broad permissions.
A

How to maintain compliant and secure ML environments

144
Q

_____ are resource-based policies that allow you to manage permissions for your S3 resources. They enable you to grant or deny access to your S3 buckets and objects to both AWS accounts and AWS services. Typically, these permissions revolve around operations such as s3:GetObject, s3:PutObject, and s3:DeleteObject, which relate to reading, writing, and deleting objects within an S3 bucket.

A

S3 bucket policies

145
Q
  1. Granting cross-account access: With bucket policies, you can define permissions that allow these external entities to access the required resources.
  2. Restricting access based on IP address: You might want to restrict access to your ML resources to requests originating from specific IP ranges, especially when dealing with sensitive data.
  3. Enforcing data encryption: Enforcing the use of encryption on uploads ensures that your machine learning data remains secure at rest, which is essential for maintaining data privacy and compliance with various regulations.
  4. Preserving data integrity and versioning: For ML models that rely on consistent data, you may use bucket policies to prevent accidental deletion and ensure versioning is enabled, keeping a record of all changes to the objects in an S3 bucket.
A

Use Cases for S3 Bucket Policies in ML
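
A sketch of a policy covering use case 3 (plus TLS-only access), applied with boto3; the bucket name is illustrative:

  import json
  import boto3

  policy = {
      "Version": "2012-10-17",
      "Statement": [
          {   # Deny uploads that are not encrypted with SSE-KMS.
              "Sid": "RequireKMSEncryption",
              "Effect": "Deny",
              "Principal": "*",
              "Action": "s3:PutObject",
              "Resource": "arn:aws:s3:::ml-training-data/*",
              "Condition": {"StringNotEquals": {
                  "s3:x-amz-server-side-encryption": "aws:kms"}},
          },
          {   # Deny any request made without TLS.
              "Sid": "RequireTLS",
              "Effect": "Deny",
              "Principal": "*",
              "Action": "s3:*",
              "Resource": ["arn:aws:s3:::ml-training-data",
                           "arn:aws:s3:::ml-training-data/*"],
              "Condition": {"Bool": {"aws:SecureTransport": "false"}},
          },
      ],
  }
  boto3.client("s3").put_bucket_policy(Bucket="ml-training-data", Policy=json.dumps(policy))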

146
Q

Attached directly to S3 buckets vs.
Attached to IAM users, groups, or roles

A

S3/IAM Policy difference on scope

147
Q

Broad access control, Cross-account sharing vs.
Fine-grained permissions, User-specific access

A

S3/IAM Policy difference on use case

148
Q

Implicit (the attached bucket) vs.
Must explicitly include S3 resource ARNs

A

S3/IAM Policy difference on resource definition

149
Q

Up to 20 KB per bucket policy vs.
Character limits per IAM policy (e.g., 2,048 for user inline policies, 6,144 for managed policies)

A

S3/IAM Policy difference on size limit

150
Q
  1. Grant least privilege
  2. Regularly audit and rotate keys
  3. Validate your JSON
  4. Use conditions for extra security
A

S3 Bucket Policy Best Practices

151
Q

A _____ is an isolated section of the AWS cloud where you can launch AWS resources in a virtual network that you define, and it closely resembles a traditional network that you might operate in your own data center, with the benefits of using the scalable infrastructure of AWS.

A

VPC

152
Q
  1. Isolation: They provide a logically isolated area within the AWS cloud, ensuring that resources launched within them are not accessible by other ____ by default.
  2. Customization: Users have complete control over the virtual networking environment, including selection of IP address range, creation of subnets, and configuration of route tables and network gateways.
  3. Security: Groups of rules, known as security groups and network access control lists (ACLs), provide security at the protocol and port access level.
  4. Connectivity: Options include connecting to the internet, to your own data centers, or to other VPCs, providing flexibility for various deployment scenarios.
A

Key Features of VPCs

153
Q

Dividing a VPC into _____ allows for efficient allocation of IP ranges based on the network design. These can be public (internet-facing) or private (no direct internet access).

A

subnets

154
Q

_____ act as a virtual firewall for instances, controlling inbound and outbound traffic. _____ provide an additional layer of security, controlling traffic at the subnet level.

A

Security groups / Network ACLs

155
Q

To access AWS services securely without traversing the internet, _____ can be used, which allow private connections to AWS services.

A

VPC endpoints

156
Q

For instances in a private subnet that need to initiate outbound internet traffic, a _____ is necessary, ensuring that the instances can connect to the internet while remaining private and secure.

A

NAT (Network Address Translation) instance or gateway

157
Q

Data Transfer Costs: Transferring data between different AWS services or the internet can incur costs. _____ can help reduce data transfer charges.

A

Efficient VPC design

158
Q

Performance: The choice of VPC components can impact the performance of ML models, especially during training and inference. Carefully consider the _____.

A

placement of resources and routing choices

159
Q

T/F: As machine learning workloads grow, the VPC design should facilitate easy scalability, without compromising on security or performance.

A

True

160
Q

_____ is the process of converting data into a code to prevent unauthorized access. On AWS, encryption ensures the confidentiality and integrity of your data both at rest and in transit.

A

Encryption

161
Q

_____ provides server-side encryption with Amazon S3-managed keys (SSE-S3), AWS KMS-managed keys (SSE-KMS), or customer-provided keys (SSE-C).

A

Amazon S3
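
For example, with boto3 you can encrypt a single upload with SSE-KMS, or make SSE-S3 the bucket default (the bucket, key, and KMS key ARN are illustrative):

  import boto3

  s3 = boto3.client("s3")

  # SSE-KMS on one upload.
  s3.put_object(
      Bucket="ml-training-data",
      Key="datasets/train.csv",
      Body=open("train.csv", "rb"),
      ServerSideEncryption="aws:kms",
      SSEKMSKeyId="arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab",
  )

  # SSE-S3 as the default for every new object in the bucket.
  s3.put_bucket_encryption(
      Bucket="ml-training-data",
      ServerSideEncryptionConfiguration={
          "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]
      },
  )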

162
Q

_____ encrypts volumes with keys managed by the AWS Key Management Service (KMS) or customer-managed keys.

A

Amazon EBS

163
Q

Amazon RDS and Amazon Redshift also support encryption at rest using _____.

A

AWS KMS

164
Q

Encrypting data in transit protects your data as it moves between services or locations. Common protocols include _____.

A

Secure Sockets Layer (SSL) or Transport Layer Security (TLS)

165
Q

Amazon API Gateway for encrypting API calls
Amazon Elastic Load Balancing (ELB) for SSL/TLS encryption
AWS Direct Connect with VPN for secure connections to AWS

A

AWS services that support encryption in transit

166
Q

_____ is the process of either encrypting or removing personally identifiable information from a dataset so that the identity of data subjects cannot be readily inferred.

A

Anonymization

167
Q

Replacing sensitive data with unique identification symbols that retain essential information without compromising its security.

A

Tokenization

168
Q

Obfuscation of specific data within a database so that the data structure remains intact but the information is not easily identifiable.

A

Masking

169
Q

Reducing the granularity of the data, for example, by reporting age in ranges rather than specific values.

A

Generalization

170
Q

The process of replacing private identifiers with fake identifiers or pseudonyms.

A

Pseudonymization

171
Q

By comparing encryption with anonymization, we can see that _____ is reversible, provided you have the necessary keys, while _____ is designed to be irreversible in order to protect identity.

A

encryption / anonymization

172
Q

Common encryption algorithms:

A

AES, RSA, ECC

173
Q

Common anonymization algorithms:

A

Tokenization, Masking, Generalization

174
Q

Before you expose an endpoint, you first need to _____.

A

Train a model

175
Q

Once the endpoint is active, you can _____.

A

Send data for real-time predictions

176
Q

Use _____ functions to manage endpoints.

A

Boto3

177
Q

Boto3 functions can:

A
  1. list all endpoints
  2. describe a specific endpoint
  3. update an endpoint
  4. delete an endpoint
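
A sketch of those four operations, plus a real-time invocation through the runtime client (endpoint and config names are illustrative):

  import boto3

  sm = boto3.client("sagemaker")
  runtime = boto3.client("sagemaker-runtime")

  # 1. List all endpoints.
  for ep in sm.list_endpoints()["Endpoints"]:
      print(ep["EndpointName"], ep["EndpointStatus"])

  # 2. Describe a specific endpoint.
  print(sm.describe_endpoint(EndpointName="ml-endpoint")["EndpointStatus"])

  # 3. Update the endpoint to a new endpoint config (e.g., a new model version).
  sm.update_endpoint(EndpointName="ml-endpoint", EndpointConfigName="ml-endpoint-config-v2")

  # 4. Delete the endpoint when it is no longer needed.
  sm.delete_endpoint(EndpointName="ml-endpoint")

  # Send data to an active endpoint for a real-time prediction.
  resp = runtime.invoke_endpoint(EndpointName="ml-endpoint",
                                 ContentType="text/csv", Body=b"1.0,2.0,3.0")
  print(resp["Body"].read())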
178
Q

T/F: SageMaker endpoints come with IAM role-based access control, and data passed to and from them is encrypted.

A

True

179
Q

You can run endpoints within a _____ for additional network isolation.

A

VPC

180
Q

Exposing endpoints within AWS using Amazon SageMaker consists of _____, _____, _____, and _____.

A
  1. training models
  2. deploying them to endpoints
  3. interacting with these endpoints for predictions
  4. managing and securing these endpoints
181
Q

Machine learning models can be broadly classified into three categories:

A

supervised, unsupervised, and reinforcement learning models. These categories are based on how the models interact with the data presented to them.

182
Q
  1. Utilize labeled datasets to predict outcomes.
  2. Frequently used models include linear regression for continuous outputs, logistic regression for classification tasks, decision trees, and neural networks.
    Example: Predicting house prices based on features such as square footage, number of bedrooms, and location.
A

Supervised Learning Models

183
Q
  1. Work with unlabeled data to uncover hidden patterns.
  2. Common models are clustering algorithms like K-means and hierarchical clustering, and dimensionality reduction techniques like PCA (Principal Component Analysis).
    Example: Segmenting customers into groups based on purchasing behavior.
A

Unsupervised Learning Models

184
Q
  1. Learn optimal actions through trial and error by maximizing a reward function.
  2. Used in scenarios where decision-making is sequential and the environment is dynamic.
    Example: A chess-playing AI that improves by playing numerous games.
A

Reinforcement Learning Models

185
Q

For _____ tasks, you could use metrics like accuracy, precision, recall, F1 score, and ROC AUC (Receiver Operating Characteristic Area Under the Curve).

A

classification

186
Q

For _____ tasks, common metrics include mean squared error (MSE), mean absolute error (MAE), and R-squared.

A

regression

187
Q

_____ are the parameters of the learning algorithm itself, and tuning them is essential to optimize model performance.

A

Hyperparameters

188
Q

_____ can be used to perform hyperparameter optimization by running multiple training jobs with different hyperparameter combinations to find the best version of a model.

A

AWS SageMaker Automatic Model Tuning

189
Q
  • Provides a fully managed service for building, training, and deploying machine learning models.
  • Includes features like Jupyter notebook instances, built-in high-performance algorithms, model tuning, and automatic model deployment in a scalable environment.
A

Amazon SageMaker

190
Q
  • Allows running code in response to events without provisioning or managing servers.
  • Can be used to trigger machine learning model inferences based on real-time data.
A

AWS Lambda

191
Q
  • Provides the ability to attach low-cost GPU-powered inference acceleration to Amazon SageMaker instances or EC2 instances.
  • Useful for reducing costs for compute-intensive inference workloads.
A

AWS Elastic Inference

192
Q
  1. Preprocess data efficiently using Amazon SageMaker Processing.
  2. Use AWS Glue for data cataloging and ETL (extract, transform, load) processes.
  3. Store and retrieve datasets with Amazon S3 (Simple Storage Service).
  4. Monitor model performance over time with Amazon SageMaker Model Monitor.
  5. Enhance security by using AWS Identity and Access Management (IAM) to control access to AWS resources.
A

Best Practices for ML Models on AWS

193
Q

_____ is commonly used to validate the effectiveness of predictive models. It’s a way to compare two or more versions of a model in parallel by exposing them to a real-time environment where they can be evaluated based on actual performance metrics.

A

A/B testing

194
Q

Before you can perform _____, you must train at least two variants of your machine learning model. Using SageMaker, you can train models using built-in algorithms or bring your own custom algorithms.

A

A/B testing

195
Q

After training your models, you can deploy them to an _____ for A/B testing. SageMaker lets you deploy multiple models to a single endpoint and split the traffic between them.

A

endpoint
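
A boto3 sketch of a 50/50 split between two trained models, with a later traffic shift (all names are illustrative):

  import boto3

  sm = boto3.client("sagemaker")

  # Two model variants behind one endpoint, each receiving half the traffic.
  sm.create_endpoint_config(
      EndpointConfigName="ab-test-config",
      ProductionVariants=[
          {"VariantName": "ModelA", "ModelName": "model-a", "InstanceType": "ml.m5.large",
           "InitialInstanceCount": 1, "InitialVariantWeight": 0.5},
          {"VariantName": "ModelB", "ModelName": "model-b", "InstanceType": "ml.m5.large",
           "InitialInstanceCount": 1, "InitialVariantWeight": 0.5},
      ],
  )
  sm.create_endpoint(EndpointName="ab-test-endpoint", EndpointConfigName="ab-test-config")

  # Shift traffic toward the winner later without redeploying.
  sm.update_endpoint_weights_and_capacities(
      EndpointName="ab-test-endpoint",
      DesiredWeightsAndCapacities=[
          {"VariantName": "ModelA", "DesiredWeight": 0.1},
          {"VariantName": "ModelB", "DesiredWeight": 0.9},
      ],
  )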

196
Q

Use _____ to monitor their performance in terms of accuracy, latency, error rates, and other relevant metrics.

A

CloudWatch

197
Q

What are the three steps involved in implementing A/B testing?

A
  1. Model training
  2. Model deployment for A/B testing
  3. Monitor and analyze results
198
Q

A _____ is a sequence of steps designed to automatically refresh your machine learning model with new data, helping to keep the model relevant as the data changes.

A

retraining pipeline

199
Q
  1. Data Collection & Preprocessing: Collecting new data samples and applying the same preprocessing steps as the initial model training.
  2. Model Retraining: Using the updated dataset to retrain the model or incrementally update it using online learning techniques.
  3. Validation & Testing: Evaluating the performance of the model on a validation set to ensure it meets performance thresholds before deployment.
  4. Deployment: Replacing the existing model with the updated one in the production environment.
A

Steps involved in retraining pipeline

200
Q

Stores data and model artifacts securely.

A

Amazon S3

201
Q

Runs code in response to triggers such as a schedule event or a change in data.

A

AWS Lambda

202
Q

Offers a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning models.

A

Amazon SageMaker

203
Q

Coordinates multiple AWS services into serverless workflows so you can build and update apps quickly.

A

AWS Step Functions

204
Q

Monitors your pipeline, triggering retraining based on a schedule or a specific event.

A

Amazon CloudWatch

205
Q

Prepares and loads data, perfect for the ETL (extract, transform, load) jobs required during preprocessing.

A

AWS Glue

206
Q
  1. New data arrives in an Amazon S3 bucket, triggering an AWS Lambda function.
  2. The Lambda function invokes an AWS Step Functions workflow which starts the retraining process.
  3. AWS Glue is used to prepare the data by performing ETL tasks and depositing the processed data back into S3.
  4. An Amazon SageMaker training job is initiated by Step Functions to retrain the model with the new data. The model artifacts are stored in another S3 bucket.
  5. Once the model is retrained, validation and testing are performed using SageMaker’s batch transform feature or live endpoint testing.
  6. If the performance of the new model meets predefined thresholds, a deployment is initiated, where the new model replaces the old one at the SageMaker endpoint.
  7. Amazon CloudWatch is used for monitoring the model’s performance and logging the entire pipeline’s steps for compliance and debugging purposes.
A

How a retraining pipeline might be implemented on AWS
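
Steps 1-2 reduce to a small Lambda handler; a sketch assuming a hypothetical state machine ARN:

  import json
  import boto3

  sfn = boto3.client("stepfunctions")

  def lambda_handler(event, context):
      # Triggered by the S3 "new data" event; kick off the retraining workflow.
      record = event["Records"][0]["s3"]
      sfn.start_execution(
          stateMachineArn="arn:aws:states:us-east-1:111122223333:stateMachine:retrain-pipeline",
          input=json.dumps({"bucket": record["bucket"]["name"],
                            "key": record["object"]["key"]}),
      )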

207
Q

Periodically retrain the model with a batch of new data. This can be scheduled daily, weekly, etc., based on the problem requirements.

A

Batch Retraining

208
Q

Implement a streaming data solution where the model is continuously updated in near-real-time.

A

Continuous Retraining

209
Q

Use triggers such as deterioration in model performance or significant changes in incoming data to initiate retraining.

A

Trigger-based Retraining

210
Q

AWS CodePipeline and AWS CodeBuild, combined with SageMaker, can facilitate _____.

A

CI/CD for ML models, ensuring that retraining pipelines have an automated, reliable flow.

211
Q

Tools like _____ and _____ can help detect when models start to perform poorly compared to their benchmarks, signaling when a retraining cycle should be initiated.

A

Amazon CloudWatch and SageMaker Model Monitor

212
Q
  1. Missing Values: Ensure that your dataset does not have significant amounts of missing data. If it does, you can use techniques like imputation to fill in the gaps.
  2. Outliers: Extreme values can distort training and lead to poor performance. Detecting and handling outliers is a crucial part of the data preprocessing step.
  3. Feature Distribution: Check if your features have a distribution that your algorithm can work with effectively. Some algorithms, like neural networks, may require data normalization or standardization.
A

Various data issues

213
Q

_____ occurs when the model performs well on training data but poorly on unseen data.

A

Overfitting

214
Q

_____ is when the model is too simple to capture the patterns in the data.

A

Underfitting

215
Q

Analyze _____ to diagnose underfitting or overfitting.

A

Learning Curves

216
Q

Plotting training and validation accuracy or loss over epochs can tell you _____.

A

if the model is learning as expected

217
Q

The learning rate in gradient descent algorithms must be chosen carefully to avoid _____.

A

overshooting the minimum, which is why tuning hyperparameters is crucial for model performance

218
Q

Debugging tools

A

SageMaker Debugger
CloudWatch Logs

219
Q

_____ makes it easy to monitor and visualize the training of machine learning models in real-time. It allows you to detect and analyze issues like vanishing gradients, overfitting, and poor weight initialization.

A

SageMaker Debugger

220
Q

_____ helps you monitor and troubleshoot your models. It can collect and track metrics, collect and monitor log files, set alarms, and automatically react to changes in your AWS resources.

A

CloudWatch Logs

221
Q
  1. Divergent loss
  2. Slow training
  3. Poor generalization
A

Common problems when training models

222
Q

If the loss diverges instead of converging, it’s often due to _____.

A

a high learning rate or unstable optimization algorithm

223
Q

Slow training could be due to _____. Optimizing the computation graph or using a more powerful instance type may help.

A

inefficient data loading, suboptimal model architecture, or lack of hardware resources

224
Q

If the model performs well on the training data but poorly on test data, consider _____.

A

using regularization techniques, obtaining more training data, or simplifying the model

225
Q

Changes in data input patterns, which can degrade model performance over time.

A

data drift

226
Q

When model predictions become less accurate due to real-world changes.

A

model drift

227
Q

_____ is a monitoring service that provides you with data and actionable insights to monitor your applications. You can set alarms in this to notify you when certain thresholds are breached, and use it to monitor the compute resources your models are using, like CPU and memory utilization.

A

AWS CloudWatch

228
Q

For monitoring machine learning models, _____ can detect and alert on data drift and other issues that may impact model performance. It continually checks deployed models against a baseline to detect deviations in data quality and automatically alerts you.

A

Amazon SageMaker Model Monitor
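
A sketch with the SageMaker Python SDK: baseline the training data, then check live traffic against it on a schedule (role, bucket, and endpoint names are illustrative):

  from sagemaker.model_monitor import CronExpressionGenerator, DefaultModelMonitor
  from sagemaker.model_monitor.dataset_format import DatasetFormat

  monitor = DefaultModelMonitor(
      role="arn:aws:iam::111122223333:role/SageMakerRole",
      instance_count=1,
      instance_type="ml.m5.xlarge",
  )

  # Compute baseline statistics and constraints from the training set.
  monitor.suggest_baseline(
      baseline_dataset="s3://my-bucket/train/train.csv",
      dataset_format=DatasetFormat.csv(header=True),
      output_s3_uri="s3://my-bucket/monitor/baseline",
  )

  # Compare captured endpoint traffic against the baseline every hour.
  monitor.create_monitoring_schedule(
      monitor_schedule_name="ml-endpoint-data-quality",
      endpoint_input="ml-endpoint",
      output_s3_uri="s3://my-bucket/monitor/reports",
      statistics=monitor.baseline_statistics(),
      constraints=monitor.suggested_constraints(),
      schedule_cron_expression=CronExpressionGenerator.hourly(),
  )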

229
Q

You can also create customized alarms by _____.

A

defining specific metrics that are most indicative of your application’s health

230
Q

If the performance drop is due to resource constraints, you can use _____ to adjust the number of instances or compute capacity based on demand automatically. This can respond to increased latency or load by adding additional resources.

A

AWS Auto Scaling

231
Q

Data drift or changes in the external environment could lead to performance degradation. Amazon SageMaker can be set up for automatic retraining pipelines using _____. You can set triggers based on drift detection metrics indicating when retraining should occur.

A

SageMaker Pipelines

232
Q

You may need to update your model endpoints if a new model has been trained that better reflects current data trends. _____ make it possible to perform A/B testing or directly replace the existing model with minimal downtime.

A

Amazon SageMaker Endpoints

233
Q

If cost-related performance drops are an issue (e.g., due to downscaling instances for budget reasons), you can use _____ for training or inference tasks, and this will let you take advantage of unused EC2 capacity at a discount.

A

Amazon EC2 Spot Instances

234
Q
  1. Accuracy: The ratio of correctly predicted instances to total instances.
  2. Precision: The ratio of true positives to the sum of true and false positives.
  3. Recall: The ratio of true positives to the sum of true positives and false negatives.
  4. F1 score: The harmonic mean of precision and recall.
  5. AUC-ROC: Area Under the Receiver Operating Characteristic Curve.
A

Metrics used to measure performance
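
These five metrics can be computed offline with scikit-learn; a toy sketch on hard-coded labels:

  from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                               recall_score, roc_auc_score)

  y_true = [1, 0, 1, 1, 0, 1, 0, 0]                     # ground-truth labels
  y_pred = [1, 0, 1, 0, 0, 1, 1, 0]                     # hard predictions
  y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.6, 0.3]    # predicted probabilities

  print("accuracy :", accuracy_score(y_true, y_pred))
  print("precision:", precision_score(y_true, y_pred))
  print("recall   :", recall_score(y_true, y_pred))
  print("F1       :", f1_score(y_true, y_pred))
  print("AUC-ROC  :", roc_auc_score(y_true, y_score))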

235
Q
  1. Invocations: The number of times the model endpoint is called.
  2. Invocation errors: Errors encountered when calling a model.
  3. Latency: The response time of invocations.
A

Metrics monitored by CloudWatch

236
Q

AWS SageMaker is an end-to-end machine learning service, which also offers a specific feature for monitoring models called _____. This enables continuous monitoring of your machine learning models for data quality, model quality, and operational aspects, alerting on issues in real time.

A

Model Monitor

237
Q
  1. Data drift
  2. Model performance (e.g., accuracy, AUC-ROC)
  3. Feature attribute importance
A

Metrics tracked by Model Monitor

238
Q

_____ helps improve model transparency and explains predictions by identifying feature attributions that contribute to the prediction outcomes. Such insights can be instrumental in monitoring if the model is relying on the correct features and if it’s making predictions for the right reasons, which can impact performance.

A

Amazon SageMaker Clarify

239
Q
  1. Define clear metrics and thresholds: Know what “good” performance means for your particular model and use-case.
  2. Monitor in real time: Catch issues as they arise by monitoring model performance continuously.
  3. Implement robust logging: Capture all relevant data points to facilitate thorough analysis later.
  4. Handle model drift: Re-evaluate and re-train your models if the input data changes considerably over time.
  5. Ensure model explainability: Being able to explain your model’s decisions is essential for end-user trust and regulatory compliance.
A

Best practices for model monitoring