Machine Learning Quizzes - Nikolai Flashcards

Question

What is the primary purpose of Amazon SageMaker Model Monitor?

Answer 1

To monitor model quality and detect issues like data and model drift. SageMaker Model Monitor automatically tracks the quality of deployed models and detects issues like data drift, model quality drift, bias, and feature attribution drift.

Answer 2

When the data received in production differs from the data used during model training. Data drift occurs when the data used to train the model starts to differ significantly from the data received in production, potentially impacting model performance.
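
The idea of data drift can be illustrated with a very small check: compare the mean of a feature in production against its mean in the training data, measured in training standard deviations. This is only a sketch. The function name, the sample values, and the threshold are all hypothetical, and it is not how SageMaker Model Monitor computes drift.

```python
from statistics import mean, stdev


def mean_shift_in_std(train_values, prod_values):
    """How far the production mean moved, in training standard deviations."""
    return abs(mean(prod_values) - mean(train_values)) / stdev(train_values)


# Hypothetical feature values seen at training time and in production.
train = [10.0, 11.0, 9.0, 10.5, 9.5]
prod_same = [10.2, 9.8, 10.1, 9.9, 10.0]
prod_shifted = [14.0, 15.0, 13.5, 14.5, 15.5]

DRIFT_THRESHOLD = 2.0  # hypothetical cut-off, chosen only for illustration

drift_same = mean_shift_in_std(train, prod_same) > DRIFT_THRESHOLD
drift_shifted = mean_shift_in_std(train, prod_shifted) > DRIFT_THRESHOLD
```

Here `drift_same` is False and `drift_shifted` is True, because only the second production sample has moved far from the training distribution.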

Answer 3

Confusion Matrix. A confusion matrix is used to evaluate the performance of a binary classification model by showing the true positives, true negatives, false positives, and false negatives.
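
As a minimal sketch, the four cells of a binary confusion matrix can be counted directly from the true and predicted labels. The label lists below are made-up example data, with 1 as the positive class.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            counts["tp"] += 1
        elif actual == 0 and predicted == 0:
            counts["tn"] += 1
        elif actual == 0 and predicted == 1:
            counts["fp"] += 1
        else:
            counts["fn"] += 1
    return counts


# Made-up labels for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
matrix = confusion_counts(y_true, y_pred)
```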

Answer 4

To automate and manage machine learning workflows, including building, training, and deploying models. SageMaker Pipelines is a serverless workflow orchestration service designed for automating and managing machine learning workflows, including building, training, and deploying models.

Answer 5

Name, Steps, and Parameters. A SageMaker Pipeline instance is composed of three components: Name (the unique identifier for the pipeline), Steps (which define the actions in the workflow), and Parameters (which allow customization of the pipeline’s behavior).

Answer 6

It handles tens of thousands of concurrent workflows, ensuring scalability. SageMaker Pipelines is designed to be highly scalable, capable of handling tens of thousands of concurrent machine learning workflows in production.

Answer 7

To assign a label to the entire image based on learned features. The main goal of image classification is to assign a label to the entire image by learning relevant features from the image data. The algorithm simplifies complex datasets by labeling unseen images based on learned patterns.

Answer 8

The algorithm learns from labeled data to predict outcomes. Supervised learning involves training a model on labeled data, where the outcome is known, and using this data to predict outcomes for new, unseen data.

Answer 9

To identify hidden patterns or groupings within unlabeled data. Unsupervised learning seeks to uncover hidden patterns or structures in unlabeled data without prior knowledge of outcomes.

Answer 10

The agent interacts with an environment and receives rewards or penalties based on its actions. In reinforcement learning, the agent interacts with an environment and learns by trial and error through receiving rewards or penalties for its actions.
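
The trial-and-error loop can be sketched with tabular Q-learning on a toy environment. Everything here is an assumption made for illustration: a five-cell corridor where the agent starts on the left and is rewarded only when it reaches the rightmost cell, plus hand-picked learning parameters.

```python
import random

N_STATES = 5      # states 0..4; state 4 is the goal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment: move along the corridor, reward 1.0 on reaching the goal."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            # Explore with probability epsilon (or on ties), otherwise exploit.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward = step(state, action)
            done = next_state == N_STATES - 1
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy action for each non-goal state after training.
policy = [0 if q[0] > q[1] else 1 for q in q_table[:-1]]
```

After training, the greedy policy moves right in every non-goal state, which is the behavior the reward signal encourages.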

Answer 11

Optimizing a robot's path in an uncertain environment. Reinforcement learning is particularly useful in dynamic and uncertain environments, like navigation for robots.

Answer 12

It evaluates how well an agent's actions align with its goal. The reward function evaluates the benefit or cost of actions taken by the agent, helping it to learn what actions lead to better outcomes.

Answer 13

The proportion of correctly classified instances out of all instances. Accuracy measures the proportion of correctly classified instances out of the total instances, regardless of their class.
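
A minimal sketch of the accuracy calculation, using made-up labels:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for actual, predicted in zip(y_true, y_pred) if actual == predicted)
    return correct / len(y_true)


# Made-up labels for illustration: 6 of 8 predictions are correct.
acc = accuracy([1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 0, 1, 1, 0])
```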

Answer 14

F1 Score. The F1 score is the harmonic mean of precision and recall, helping to balance both when assessing a model’s performance.
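
As a short worked example, precision, recall, and F1 can be computed from confusion-matrix counts. The counts below are made up so that precision and recall differ.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, computed from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up counts: precision = 8/10 = 0.8, recall = 8/16 = 0.5.
f1 = f1_score(tp=8, fp=2, fn=8)
```

The result (8/13, roughly 0.615) sits closer to the lower of the two values, which is the effect of using a harmonic rather than an arithmetic mean.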

Answer 15

To indicate how much variance in the target variable is explained by the model. R-squared measures the proportion of variance in the target variable that is explained by the regression model, indicating how well the model fits the data.
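
A minimal sketch of the R-squared calculation as one minus the ratio of residual to total sum of squares. The observed and predicted values are made up.

```python
def r_squared(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((actual - predicted) ** 2 for actual, predicted in zip(y_true, y_pred))
    ss_tot = sum((actual - mean_y) ** 2 for actual in y_true)
    return 1 - ss_res / ss_tot


# Made-up values: total sum of squares is 20, residual sum of squares is 0.5.
r2 = r_squared([3, 5, 7, 9], [2.5, 5.5, 7.0, 9.0])
```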

Answer 16

Data remains within the AWS Region of the API call. Data, including prompts and responses, remains within the same AWS Region where the API is called from, ensuring data security and compliance with regional data protection regulations.

Answer 17

Pay-as-you-go based on input and output tokens. Amazon Bedrock offers a pay-as-you-go pricing model based on the number of input and output tokens processed. There is also a provisioned throughput mode for larger, more steady workloads.
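
The token-based arithmetic can be sketched as follows. The per-1,000-token prices are hypothetical placeholders, not real Bedrock prices; actual prices vary by model and Region.

```python
def on_demand_cost(input_tokens, output_tokens, price_in_per_1k, price_out_per_1k):
    """Cost of one request when input and output tokens are priced separately."""
    return (input_tokens / 1000) * price_in_per_1k + (output_tokens / 1000) * price_out_per_1k


# Hypothetical prices in USD per 1,000 tokens, for illustration only.
cost = on_demand_cost(2_000, 500, price_in_per_1k=0.003, price_out_per_1k=0.015)
```

With these placeholder prices, the request costs 0.006 for input plus 0.0075 for output, or 0.0135 in total.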

Answer 18

Ability to deploy AI models without managing infrastructure. The key benefit is the ability to deploy models without the need to manage the underlying infrastructure.

Answer 19

To create custom fraud detection models using machine learning. Amazon Fraud Detector helps in building custom machine learning models specifically designed to detect potential fraud.

Answer 20

Providing a human review (human-in-the-loop) workflow for machine learning predictions. Amazon Augmented AI (A2I) provides a human review system for machine learning predictions, allowing humans to review and validate predictions when necessary.

Answer 21

To analyze and extract insights from text using natural language processing (NLP), for example analyzing sentiment in text data. Amazon Comprehend is an NLP service that analyzes text and extracts insights such as sentiment, key phrases, entities, and language.

Answer 22

Amazon Comprehend identifies entities in text, and A2I allows human review of the identified entities for validation. Amazon Comprehend can extract entities from text, and Amazon Augmented AI (A2I) can be used to allow human reviewers to validate or correct the extracted data.

Answer 23

To extract text, tables, and forms from scanned documents. Amazon Textract automatically extracts text, tables, forms, and other data from scanned documents, making it easy to analyze and process document content.

Answer 24

An enterprise search service that uses natural language to return relevant results. Amazon Kendra is an intelligent search service that allows users to search across internal documents and knowledge bases using natural language queries.

Answer 25

Recognizing faces, objects, and scenes in images and videos. Amazon Rekognition can detect faces, objects, and scenes in images and videos, as well as analyze video content for facial recognition and object detection. Rekognition doesn't perform sentiment analysis; it handles visual content analysis.

Answer 26

The unique identifier of an object within a bucket. In Amazon S3, a "key" uniquely identifies an object within a bucket and includes the full path to the object along with its file name.
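
A small sketch of how a key combines a path-like prefix with a file name. The bucket name and key are hypothetical, and the "folders" are only a naming convention inside the key string.

```python
def split_key(key):
    """Split an S3 object key into its prefix ("path") and file name."""
    prefix, _, file_name = key.rpartition("/")
    return prefix, file_name


bucket = "example-bucket"          # hypothetical bucket name
key = "datasets/2024/train.csv"    # hypothetical object key

prefix, file_name = split_key(key)
s3_uri = f"s3://{bucket}/{key}"
```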

Answer 27

Buckets and objects. Amazon S3's structure consists mainly of Buckets, which store Objects—the actual files.

Answer 28

Bucket names must be globally unique across all AWS accounts. S3 bucket names must be unique globally across all AWS accounts and regions.

Answer 29

Providing a fully managed extract, transform, and load (ETL) service. AWS Glue is a fully managed ETL service that makes it easy to prepare and load data for analytics. It simplifies the process of moving data between data stores and transforming it for analysis.

Answer 30

Data is collected over time and processed all at once. Batch data ingestion involves collecting data over a set period and then processing it all at once. This method is efficient for handling large volumes of data that do not require immediate processing.

Answer 31

To create and update the AWS Glue Data Catalog with table definitions from data stores. An AWS Glue crawler connects to your data stores, determines the schema for your data, and populates the AWS Glue Data Catalog with metadata tables. This makes your data searchable and queryable using services like Amazon Athena and Amazon Redshift Spectrum. AWS Glue crawlers do not execute ETL jobs; they are used to discover data and store metadata. ETL jobs are defined and run separately within AWS Glue.

Answer 32

To schedule and trigger AWS Glue jobs based on predefined events.

Answer 33

To monitor and log AWS Glue job performance in real-time. Monitoring and logging are managed through Amazon CloudWatch and AWS Glue job metrics, not by crawlers.

Answer 34

To enhance query performance by reducing the amount of data scanned. Partitioning data in Amazon Athena allows queries to scan only the relevant partitions. This reduces the amount of data read during a query, improving performance and lowering costs.
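
Partition pruning can be illustrated with plain Python. This is a sketch of the idea, not Athena itself: the object keys use a Hive-style `column=value` layout, and the key names and helper functions are hypothetical.

```python
# Hypothetical object keys laid out in Hive-style partitions.
objects = [
    "sales/year=2023/month=12/part-0.parquet",
    "sales/year=2024/month=01/part-0.parquet",
    "sales/year=2024/month=02/part-0.parquet",
]


def partition_values(key):
    """Extract {column: value} pairs from the column=value segments of a key."""
    return dict(part.split("=", 1) for part in key.split("/") if "=" in part)


def prune(keys, **wanted):
    """Keep only the keys whose partition values match every requested filter."""
    return [
        key
        for key in keys
        if all(partition_values(key).get(col) == val for col, val in wanted.items())
    ]


# A filter like WHERE year = '2024' AND month = '01' touches one object, not three.
scanned = prune(objects, year="2024", month="01")
```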

Answer 35

To query and analyze data across multiple data sources without moving it. Federated queries in Athena let you run SQL queries across various data sources (like DynamoDB, RDS) without the need to extract or load the data into S3.

Answer 36

Compressing data using columnar storage formats like Parquet. Using compressed, columnar storage formats like Parquet reduces the data size and the amount of data scanned during queries, directly lowering costs.

Answer 37

It automates the creation and management of partitions, improving query performance. Partition projection automates the creation and management of partitions, which improves query performance by reducing the need to manually specify partitions and allowing Athena to quickly prune unnecessary partitions during query execution.

Answer 38

Handles each request independently without storing previous data. A stateless system processes each request without maintaining information from prior interactions.

Answer 39

Tracks the context of previous data ingestion events to prevent reprocessing the same data. Stateful ingestion systems maintain context, such as timestamps, to avoid reprocessing and enhance efficiency.

Answer 40

AWS Glue. AWS Glue uses job bookmarks to remember previously processed data, allowing for stateful ingestion and incremental loading.
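
The bookmark idea can be sketched without any AWS API: keep the highest timestamp already processed and only ingest records newer than it. The function name and record fields below are hypothetical, and this is not how Glue stores its bookmarks internally.

```python
def ingest_new(records, bookmark):
    """Return the records newer than the bookmark, plus the advanced bookmark."""
    fresh = [record for record in records if record["ts"] > bookmark]
    new_bookmark = max((record["ts"] for record in fresh), default=bookmark)
    return fresh, new_bookmark


# Hypothetical source data with a timestamp column.
source = [{"id": 1, "ts": 100}, {"id": 2, "ts": 200}]

first_run, bookmark = ingest_new(source, bookmark=0)

# A new record arrives; the second run processes only that record.
source.append({"id": 3, "ts": 300})
second_run, bookmark = ingest_new(source, bookmark)
```

A stateless version would have no `bookmark` argument and would reprocess all three records on the second run.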

Answer 41

Facilitates parallel processing and enhances query performance. Partitioning in AWS Glue allows for parallel processing and boosts query performance by enabling queries to access only the relevant partitions instead of scanning the entire dataset.

Answer 42

Python Shell. Python Shell jobs are ideal for straightforward data transformations and tasks that do not need the complexity of a full Spark environment.

Answer 43

Spark is used for more complex, distributed data processing.

Answer 44

Enables users to create and manage dependencies between ETL jobs. AWS Glue Workflows allow the orchestration of multiple jobs and tasks in a specific sequence by managing complex ETL job dependencies.

Answer 45

Extract, transform, and load (ETL) is the process of combining data from multiple sources into a large, central repository called a data warehouse. ETL is a broad term for data processing tasks and doesn't specify the job type.

Answer 46

Scala is a language used in Spark for more advanced tasks but is not the simplest option for basic transformations.

Answer 47

Data cleaning and transformation. Glue DataBrew offers a visual tool for preparing and transforming data without needing to code.

Answer 48

It automatically scales according to demand. AWS Lambda automatically adjusts its scaling to handle the number of requests, providing seamless scalability without user input.

Answer 49

Each execution is independent and does not save state from prior runs. AWS Lambda functions are stateless by design, meaning each invocation is isolated and doesn’t retain memory from previous executions.

Answer 50

Real-time data processing triggered by new file uploads to an S3 bucket. AWS Lambda is ideal for real-time, event-based tasks like processing files when they are uploaded to an S3 bucket. Lambda has a maximum execution time limit, making it unsuitable for long-running jobs.
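
A minimal sketch of a Lambda handler for S3 upload events. It only reads the bucket name and object key from each record. The sample event is a trimmed, hypothetical payload with just the fields used here, and the keys are URL-decoded because S3 encodes them in event notifications.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Collect (bucket, key) pairs for every object named in an S3 event."""
    uploaded = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        uploaded.append((bucket, key))
    return uploaded


# Trimmed, hypothetical event for local testing.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-bucket"}, "object": {"key": "incoming/my+file.csv"}}}
    ]
}
result = handler(sample_event, None)
```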

Answer 51

For scheduled tasks like monthly data warehousing, services such as AWS Glue are better suited.

Answer 52

Kinesis Data Firehose. Kinesis Data Firehose is specifically designed to stream data into AWS services like Amazon S3 and Redshift, making it ideal for this task.

Answer 53

Easier error handling and improved data throughput. KPL does not specifically enhance data security; it focuses more on efficient data production. KPL is tailored for integration with AWS services and doesn’t directly enhance compatibility with non-AWS streaming sources.

Answer 54

A processing unit with a fixed data throughput capacity. Shards provide a fixed amount of data throughput, typically measured in data and record rates per second.
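
Because each shard has fixed capacity, the number of shards can be estimated from the expected write load. The sketch below assumes the commonly documented provisioned-mode write limits of 1 MB per second and 1,000 records per second per shard. These appear as default parameters and should be checked against the current Kinesis documentation.

```python
import math


def shards_needed(mb_per_sec, records_per_sec,
                  shard_mb_per_sec=1.0, shard_records_per_sec=1000):
    """Shards required so that neither the byte nor the record limit is exceeded."""
    by_bytes = math.ceil(mb_per_sec / shard_mb_per_sec)
    by_records = math.ceil(records_per_sec / shard_records_per_sec)
    return max(1, by_bytes, by_records)


# Hypothetical workload: 3.5 MB/s and 2,500 records/s of incoming data.
shards = shards_needed(mb_per_sec=3.5, records_per_sec=2500)
```

Here the byte rate is the binding constraint, so the estimate is 4 shards.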

Answer 55

Kinesis Data Streams is mainly used to capture and process large volumes of streaming data in real time, not directly for transferring data to storage services.

Answer 56

Kinesis Data Analytics is intended for analyzing and processing streaming data, not for delivering it to storage services.

Answer 57

Managed Apache Flink is used for advanced analytics on streaming data and does not focus on data delivery to storage.

Answer 58

A fully managed service that ingests real-time data and optionally transforms it before storing it. Kinesis Data Firehose is a managed service designed to collect, optionally transform, and load streaming data into AWS storage systems.

Answer 59

To perform real-time data transformation during data ingestion. AWS Lambda can be integrated with Kinesis Data Firehose to process and transform data in real-time during ingestion.
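
A sketch of a transformation Lambda for Firehose. Each incoming record carries a `recordId` and base64-encoded `data`, and the function returns the same `recordId` with a `result` status and the re-encoded payload. The added `source` field and the sample record are hypothetical.

```python
import base64
import json


def handler(event, context):
    """Decode each record, add a field, and return it in the shape Firehose expects."""
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        payload["source"] = "firehose"  # hypothetical enrichment step
        encoded = base64.b64encode((json.dumps(payload) + "\n").encode("utf-8"))
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": encoded.decode("utf-8"),
        })
    return {"records": output}


# Hypothetical input record for local testing.
raw = base64.b64encode(json.dumps({"id": 1}).encode("utf-8")).decode("utf-8")
response = handler({"records": [{"recordId": "r1", "data": raw}]}, None)
```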

Answer 60

It batches incoming data to optimize data transfer to the destination. The buffering mechanism in Kinesis Data Firehose collects data into batches (based on size or time) to improve transfer efficiency and minimize API calls.
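
The size-or-time rule can be sketched with a small in-memory buffer. This is an illustration of the batching idea, not Firehose's implementation. The class name and limits are hypothetical, and for simplicity the time limit is only checked when a record arrives, whereas a real service would also flush on a timer.

```python
class Buffer:
    """Collect records and release a batch when a size or age limit is reached."""

    def __init__(self, max_bytes, max_seconds):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.records = []
        self.size = 0
        self.started = None

    def add(self, record, now):
        """Add a record; return the flushed batch if a limit was hit, else None."""
        if not self.records:
            self.started = now  # the age of a batch counts from its first record
        self.records.append(record)
        self.size += len(record)
        if self.size >= self.max_bytes or now - self.started >= self.max_seconds:
            batch, self.records, self.size = self.records, [], 0
            return batch
        return None


buf = Buffer(max_bytes=10, max_seconds=60)
r1 = buf.add(b"aaaa", now=0)    # 4 bytes buffered, no flush
r2 = buf.add(b"bbbb", now=1)    # 8 bytes buffered, no flush
r3 = buf.add(b"cccc", now=2)    # 12 bytes: size limit reached, batch flushed
r4 = buf.add(b"dd", now=10)     # new batch starts at t=10
r5 = buf.add(b"ee", now=75)     # 65 seconds old: time limit reached, batch flushed
```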

Answer 61

Orchestrating the distribution of data and tasks. The master node in an Amazon EMR cluster is responsible for managing the cluster, including coordinating the distribution of data and tasks among the other nodes.

Answer 62

Instance Store. Instance store is ephemeral (short-lived) storage, which makes it a cost-effective option for temporary data storage during EMR operations.

Answer 63

EMR Serverless. EMR Serverless automatically provisions and scales compute capacity based on the workload's needs.

Answer 64

Do not manage servers; pay on an as-used basis. Serverless computing is a cloud computing execution model that allocates machine resources on an as-used basis. Under a serverless model, developers can build and run applications without having to manage any servers and pay only for the exact amount of resources used.

Answer 65

To allow multiple versions of an object to be stored and retrieved. Enabling versioning on an S3 bucket allows multiple versions of an object to be stored, providing a way to recover and restore previous versions if needed.

Answer 66

By setting up a lifecycle policy to move objects to S3 Intelligent-Tiering. S3 Intelligent-Tiering automatically moves objects between different access tiers based on usage patterns, optimizing cost.

Answer 67

S3 Glacier Instant Retrieval. S3 Glacier Instant Retrieval is designed for rarely accessed data, but it still offers quick retrieval times when the data is needed.

Answer 68

To query and retrieve specific data from an object without downloading the entire object. S3 Select allows users to perform SQL-like queries to retrieve specific data from a larger object, saving time and reducing the cost of data retrieval.

Answer 69

Bucket Policies and IAM Permissions. Bucket policies and IAM permissions control who has access to the S3 bucket and its objects, ensuring security.

Answer 70

A new object is uploaded to the bucket. S3 Event Notifications can be triggered when specific actions occur, such as an object being uploaded, deleted, or restored from Glacier.

Answer 71

Data as a product, with decentralized data ownership and domain-oriented architecture. In a Data Mesh, each domain owns and treats its data as a product, ensuring it is accessible and well-governed by those who know it best.

Answer 72

EBS is block storage, while EFS is file storage. Amazon EBS is block-level storage, which is useful for databases and applications requiring raw storage, while EFS is a managed file system that allows concurrent access to files.

Answer 73

EFS allows concurrent access from multiple instances. The key advantage of EFS is that it supports concurrent access by multiple EC2 instances, making it ideal for shared file storage.

Answer 74

To define permissions and control access to AWS resources. IAM policies are documents that specify permissions, controlling which actions users and roles can perform on AWS resources.

Answer 75

An entity that defines a set of permissions for making AWS service requests. An IAM role is an identity with specific permissions that can be assumed by trusted entities, allowing them to make AWS service requests.

Answer 76

A JSON document that defines permissions for actions on AWS resources. An IAM policy is a JSON document that specifies permissions, defining what actions are allowed or denied on AWS resources.
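
As an illustration, a small identity-based policy can be built as a Python dictionary and serialized to JSON. The bucket name is hypothetical, and the statement grants read access to objects in that bucket only.

```python
import json

# Hypothetical policy: allow reading objects from one example bucket.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-bucket/*",
        }
    ],
}

policy_json = json.dumps(policy, indent=2)
```

Each statement pairs an `Effect` (Allow or Deny) with the actions and resources it applies to.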

Answer 77

To manage encryption keys for AWS services and applications. AWS KMS is a managed service that makes it easy to create and control encryption keys used to encrypt your data across AWS services and applications.

Answer 78

Amazon Macie. Amazon Macie uses machine learning to recognize sensitive data such as personally identifiable information (PII) and intellectual property.

Answer 79

Protection against DDoS attacks. AWS Shield is a managed service that safeguards applications running on AWS against Distributed Denial of Service (DDoS) attacks.

Answer 80

Storing and rotating database credentials, API keys, and other secrets. AWS Secrets Manager helps you securely store, distribute, and rotate credentials and keys.

Answer 81

To enable direct network connectivity between two VPCs. A VPC peering connection allows you to route traffic between two VPCs using private IP addresses, enabling direct connectivity.

Answer 82

Security Groups operate at the instance level; NACLs operate at the subnet level. Security Groups act as a virtual firewall for instances, controlling inbound and outbound traffic at the instance level. NACLs control traffic at the subnet level.

Answer 83

To securely connect an on-premises network to an AWS VPC over the internet. An AWS VPN connection enables a secure communication tunnel between your on-premises network and your AWS VPC over the public internet.

Answer 84

To monitor and log API calls made within your AWS account. AWS CloudTrail records API calls and events for your AWS account, providing a history of AWS API calls made through the AWS Management Console, SDKs, and command-line tools.

Answer 85

Tracking resource inventory and changes over time. AWS Config provides a detailed view of the configuration of AWS resources in your account and tracks changes over time.

Answer 86

Cost Reduction. Cost Reduction is not a pillar; the correct pillar is Cost Optimization. The AWS Well-Architected Framework pillars are: (1) Operational excellence (2) Security (3) Reliability (4) Performance efficiency (5) Cost optimization (6) Sustainability

Answer 87

AWS CloudTrail is an AWS service that helps you enable operational and risk auditing, governance, and compliance of your AWS account.

Answer 88

A text file containing definitions of AWS resources and their configurations. A CloudFormation template is a text file in JSON or YAML format that defines AWS resources and their configurations.
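
A minimal sketch of such a template, built as a Python dictionary and written out as JSON. The logical resource name `ExampleBucket` and the description are hypothetical. The template declares a single S3 bucket with versioning enabled.

```python
import json

# Hypothetical template: one S3 bucket with versioning turned on.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Example template with a single versioned S3 bucket",
    "Resources": {
        "ExampleBucket": {
            "Type": "AWS::S3::Bucket",
            "Properties": {
                "VersioningConfiguration": {"Status": "Enabled"}
            },
        }
    },
}

template_json = json.dumps(template, indent=2)
```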

Answer 89

Fargate launch type is a serverless feature requiring no management of underlying infrastructure. Fargate is a serverless feature where AWS manages the underlying infrastructure.

Answer 90

It ensures tasks are evenly distributed across container instances to improve fault tolerance. The 'spread' strategy ensures tasks are evenly distributed across instances, which improves fault tolerance and availability.

Answer 91

To monitor and provide operational data from AWS resources. Amazon CloudWatch is primarily used to monitor and provide operational data, such as metrics, logs, and alarms, from AWS resources.

Answer 92

A feature that delivers near-real-time metrics to a specified AWS service or external system. CloudWatch Metric Streams continuously stream metrics data to a destination, such as an external system or AWS service, for further analysis or storage.

Answer 93

A specific metric breaching a defined threshold. Amazon CloudWatch Alarms are designed to monitor specific metrics (like CPU utilization, memory usage, etc.) and trigger notifications or actions when those metrics breach thresholds you define.
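
The threshold logic can be sketched in a few lines. This is a simplified model, not the CloudWatch evaluation algorithm: it reports ALARM only when every one of the most recent datapoints is above the threshold, and the metric values are made up.

```python
def alarm_state(datapoints, threshold, evaluation_periods):
    """Simplified alarm: ALARM if the last N datapoints all exceed the threshold."""
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    return "ALARM" if all(value > threshold for value in recent) else "OK"


# Made-up CPU utilization percentages, one datapoint per period.
state_breaching = alarm_state([40, 85, 90, 95], threshold=80, evaluation_periods=3)
state_recovered = alarm_state([40, 85, 60, 95], threshold=80, evaluation_periods=3)
state_too_few = alarm_state([90], threshold=80, evaluation_periods=3)
```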

Answer 94

To convert text to lifelike speech. Amazon Polly is a service that converts text into lifelike speech using advanced deep learning technologies.

Answer 95

Automatically transcribing spoken language into text from audio files. Amazon Transcribe is designed to convert spoken language from audio files into accurate text transcriptions.

Answer 96

Transcribe can convert speech to text, and Polly can then convert that text back into speech. Amazon Transcribe can convert speech into text, and Amazon Polly can take that text and convert it back into speech, making it possible to process and re-synthesize spoken content.


(120 cards)