Machine Learning Quizzes - Nikolai Flashcards

1
Q

What is the primary benefit of using Amazon SageMaker notebooks for machine learning tasks?

A

They provide an interactive environment with fully managed infrastructure for machine learning tasks.

Amazon SageMaker notebooks are fully managed Jupyter notebooks that provide an interactive environment for managing machine learning tasks. They handle the setup and configuration of the underlying infrastructure, including servers, environments, and software.

2
Q

Which feature of SageMaker notebooks ensures that your work and data are not lost between sessions?

A

Persistent Storage

SageMaker notebooks come with persistent storage, ensuring that data and work are saved and can be accessed later, even between sessions.

3
Q

What is a key difference between a SageMaker notebook instance and a SageMaker Studio notebook?

A

Studio notebooks are integrated within SageMaker Studio and allow managing multiple notebooks.

SageMaker Studio notebooks are part of SageMaker Studio, offering an integrated environment where users can manage multiple notebooks, access persistent storage, and perform tasks within a unified interface.

4
Q

Which of the following is a key feature of SageMaker Data Wrangler?

A

It provides over 300 pre-configured data transformations out of the box.

SageMaker Data Wrangler provides more than 300 pre-configured transformations to help with tasks like handling missing values and encoding categorical data.

5
Q

What is one of the key features of SageMaker Data Wrangler that helps in transforming categorical features into numerical ones?

A

One-hot encoding

One-hot encoding is a common technique used to convert categorical data into numerical data, which is essential for certain machine learning algorithms that require numerical input. SageMaker Data Wrangler offers this transformation out-of-the-box.
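
As an illustration outside Data Wrangler, here is a minimal pandas sketch of what one-hot encoding does to a categorical column (the column names are hypothetical):

    import pandas as pd

    df = pd.DataFrame({"color": ["red", "green", "red"], "price": [3, 5, 4]})

    # Each category becomes its own 0/1 indicator column.
    encoded = pd.get_dummies(df, columns=["color"])
    print(encoded)  # columns: price, color_green, color_red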

6
Q

Which of the following is NOT a capability of SageMaker Data Wrangler?

A

Creating SQL queries to fetch data.

SageMaker Data Wrangler does not create SQL queries to fetch data. It focuses on importing data, transforming it, and visualizing it within the SageMaker environment.

7
Q

What is the benefit of using the “human in the loop” feature in SageMaker Ground Truth?

A

It allows a combination of human input and machine learning models to improve labeling accuracy.

“Human in the loop” allows SageMaker Ground Truth to combine human feedback with machine learning models, improving labeling efficiency and accuracy over time.

8
Q

What is a primary function of SageMaker Ground Truth?

A

Labeling datasets for machine learning model training.

SageMaker Ground Truth is primarily used for labeling datasets, which is crucial for preparing data for machine learning model training.

9
Q

What is one of the main purposes of SageMaker Feature Store?

A

To store, manage, and share features for machine learning models.

SageMaker Feature Store is used to store, manage, and share features across different machine learning models and teams, helping streamline feature reuse and collaboration.

10
Q

What is a key advantage of using built-in algorithms in Amazon SageMaker?

A

They are optimized for specific use cases and abstract away infrastructure management.

Built-in algorithms in SageMaker are optimized for specific tasks and come pre-packaged, making it easier to use without needing to manage infrastructure or perform complex setups.

11
Q

Which type of learning is not one of the four main types of SageMaker’s built-in algorithms?

A

Reinforcement Learning

Reinforcement learning is not included in the four main categories of built-in algorithms in SageMaker.

12
Q

What is the primary benefit of using SageMaker Jumpstart for machine learning development?

A

It provides access to pre-trained models and prebuilt solutions, reducing development time.

SageMaker Jumpstart allows users to quickly develop and deploy machine learning models by providing pre-trained models and prebuilt solutions, minimizing the time spent on development.

13
Q

What is a hyperparameter in the context of machine learning?

A

A configuration that controls how a model learns and operates, set before training begins.

Hyperparameters are configurations set before the training process begins that control how a model learns and operates, such as the depth of a decision tree.

14
Q

Which of the following is true about Amazon SageMaker’s automatic model tuning feature?

A

It runs multiple training jobs with different hyperparameter combinations to find the best model.

SageMaker’s automatic model tuning runs multiple training jobs with different hyperparameter combinations to find the best-performing model based on the chosen performance metric.
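
A hedged sketch using the SageMaker Python SDK’s HyperparameterTuner; the estimator, channel inputs, metric name, and ranges are illustrative assumptions:

    from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

    # 'estimator' is an already-configured SageMaker estimator (assumed here).
    tuner = HyperparameterTuner(
        estimator=estimator,
        objective_metric_name="validation:auc",  # assumed metric
        hyperparameter_ranges={
            "eta": ContinuousParameter(0.01, 0.3),
            "max_depth": IntegerParameter(3, 10),
        },
        max_jobs=20,          # total training jobs to run
        max_parallel_jobs=4,  # jobs run concurrently
    )
    tuner.fit({"train": train_input, "validation": validation_input})  # assumed inputs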

15
Q

Which hyperparameter tuning technique ensures that every possible combination of hyperparameters is tested?

A

Grid search

Grid search tests every possible combination of specified hyperparameters by creating a grid and testing each combination in sequence.
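
Conceptually, grid search is an exhaustive loop over the Cartesian product of the candidate values, as in this plain-Python sketch (the scoring function is a stand-in):

    from itertools import product

    def train_and_evaluate(lr, depth):
        return -(lr - 0.1) ** 2 - (depth - 5) ** 2  # stand-in for a real training run

    learning_rates = [0.01, 0.1, 0.3]
    max_depths = [3, 5, 7]

    best_score, best_params = float("-inf"), None
    for lr, depth in product(learning_rates, max_depths):  # 3 x 3 = 9 combinations
        score = train_and_evaluate(lr, depth)
        if score > best_score:
            best_score, best_params = score, (lr, depth)
    print(best_params)  # (0.1, 5)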

16
Q

What is the primary advantage of using custom Docker containers in Amazon SageMaker?

A

Custom containers allow full control over the training and inference environment, including specific libraries and operating system choices.

Custom Docker containers provide full control over the environment, allowing users to include their own code, dependencies, libraries, and operating system.

17
Q

When using a custom Docker container for training in SageMaker, which service is used to store and retrieve the Docker image?

A

Amazon Elastic Container Registry (ECR)

Amazon ECR is used to store Docker images, which are then pulled by SageMaker to run training jobs or inference.

18
Q

In SageMaker, what must a custom Docker container expose during inference to function correctly?

A

A REST API endpoint

During inference, a custom Docker container must expose a REST API endpoint for SageMaker to make predictions.
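
SageMaker expects the serving container to answer GET /ping (health check) and POST /invocations on port 8080. A minimal Flask sketch, with the model logic stubbed out:

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/ping", methods=["GET"])
    def ping():
        # Health check: return 200 once the model is ready to serve.
        return "", 200

    @app.route("/invocations", methods=["POST"])
    def invocations():
        payload = request.get_json(force=True)
        # Stand-in for real inference, e.g. model.predict(payload).
        return jsonify({"prediction": len(payload)})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8080)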

19
Q

What is the main purpose of using SageMaker Experiments?

A

To track, organize, and compare multiple model training runs.

SageMaker Experiments helps track, organize, and compare different model training runs (called trials) to identify the best configuration. It captures all the details of hyperparameters, datasets, and performance metrics to assist in the comparison process.

20
Q

In SageMaker Experiments, what is a trial component?

A

A step or stage within a training run, such as data preprocessing or model evaluation.

A trial component in SageMaker Experiments represents the various stages of a machine learning workflow, such as data preprocessing, model training, or evaluation. It helps capture the metadata for each stage.

21
Q

What is the primary function of SageMaker Clarify?

A

To detect and mitigate biases in machine learning models

SageMaker Clarify is designed to detect and mitigate biases in machine learning models, ensuring fairness and transparency.

22
Q

Which of the following is a method used by SageMaker Data Wrangler to address class imbalance in datasets?

A

Random Undersampling

Random undersampling is one of the techniques used in SageMaker Data Wrangler to handle class imbalance by reducing the number of samples in the majority class.

23
Q

What is the primary purpose of Amazon SageMaker Debugger?

A

To monitor and debug training jobs in real-time

SageMaker Debugger enables real-time monitoring and debugging of model training jobs, allowing users to detect issues such as vanishing gradients or overfitting during the training process.

24
Q

Which type of issue can SageMaker Debugger detect during model training?

A

Overfitting

SageMaker Debugger can detect common training issues like overfitting, vanishing gradients, and underfitting during model training.

25
Q

What is the primary purpose of Amazon SageMaker Model Monitor?

A

To monitor model quality and detect issues like data and model drift

SageMaker Model Monitor automatically tracks the quality of deployed models and detects issues like data drift, model quality drift, bias, and feature attribution drift.

26
Q

What is “data drift” in the context of SageMaker Model Monitor?

A

When the data received in production differs from the data used during model training.

Data drift occurs when the data received in production starts to differ significantly from the data the model was trained on, potentially impacting model performance.

27
Q

Which metric would be most relevant for monitoring a binary classification model in SageMaker Model Monitor?

A

Confusion Matrix

A confusion matrix is used to evaluate the performance of a binary classification model by showing the true positives, true negatives, false positives, and false negatives.
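
A small plain-Python sketch of how the four cells of a binary confusion matrix are counted:

    y_true = [1, 0, 1, 1, 0, 0, 1]
    y_pred = [1, 0, 0, 1, 0, 1, 1]

    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # true negatives
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives
    print(tp, tn, fp, fn)  # 3 2 1 1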

28
Q

What is the primary function of SageMaker Pipelines in the context of machine learning workflows?

A

To automate and manage machine learning workflows, including building, training, and deploying models.

SageMaker Pipelines is a serverless workflow orchestration service designed for automating and managing machine learning workflows, including building, training, and deploying models.

29
Q

In SageMaker Pipelines, what are the three main components that define a pipeline instance?

A

Name, Steps, and Parameters

A SageMaker Pipeline instance is composed of three components: Name (the unique identifier for the pipeline), Steps (which define the actions in the workflow), and Parameters (which allow customization of the pipeline’s behavior).

30
Q

What is a key benefit of using SageMaker Pipelines in production environments?

A

It handles tens of thousands of concurrent workflows, ensuring scalability.

SageMaker Pipelines is designed to be highly scalable, capable of handling tens of thousands of concurrent machine learning workflows in production.

31
Q

What is the primary goal of image classification in machine learning?

A

To assign a label to the entire image based on learned features.

The main goal of image classification is to assign a label to the entire image by learning relevant features from the image data. The trained model then labels unseen images based on these learned patterns.

32
Q

Which of the following statements best describes supervised learning?

A

The algorithm learns from labeled data to predict outcomes.

Supervised learning involves training a model on labeled data, where the outcome is known, and using this data to predict outcomes for new, unseen data.

33
Q

What is the primary goal of unsupervised learning?

A

To identify hidden patterns or groupings within unlabeled data.

Unsupervised learning seeks to uncover hidden patterns or structures in unlabeled data without prior knowledge of outcomes.

34
Q

What is a key feature of reinforcement learning in general?

A

The agent interacts with an environment and receives rewards or penalties based on its actions.

In reinforcement learning, the agent interacts with an environment and learns by trial and error through receiving rewards or penalties for its actions.

35
Q

Which of the following is a task well-suited for reinforcement learning?

A

Optimizing a robot’s path in an uncertain environment.

Reinforcement learning is particularly useful in dynamic and uncertain environments, like navigation for robots.

36
Q

What is the role of the reward function in reinforcement learning?

A

It evaluates how well an agent’s actions align with its goal.

The reward function evaluates the benefit or cost of actions taken by the agent, helping it to learn what actions lead to better outcomes.

37
Q

What does accuracy measure in a classification problem?

A

The proportion of correctly classified instances out of all instances.

Accuracy measures the proportion of correctly classified instances out of the total instances, regardless of their class.

38
Q

Which metric evaluates the balance between precision and recall?

A

F1 Score

The F1 score is the harmonic mean of precision and recall, helping to balance both when assessing a model’s performance.
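
A quick worked example of the harmonic mean, using made-up confusion-matrix counts:

    tp, fp, fn = 3, 1, 1

    precision = tp / (tp + fp)  # 0.75
    recall = tp / (tp + fn)     # 0.75
    f1 = 2 * precision * recall / (precision + recall)
    print(round(f1, 2))         # 0.75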

39
Q

What is R-squared used for in regression models?

A

To indicate how much variance in the target variable is explained by the model.

R-squared measures the proportion of variance in the target variable that is explained by the regression model, indicating how well the model fits the data.
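
A minimal sketch of the computation (1 minus the residual sum of squares over the total sum of squares), with made-up values:

    y_true = [3.0, 5.0, 7.0, 9.0]
    y_pred = [2.8, 5.1, 7.2, 8.9]

    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    r_squared = 1 - ss_res / ss_tot
    print(round(r_squared, 3))  # 0.995, i.e. the model explains ~99.5% of the variance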

40
Q

What feature of Amazon Bedrock ensures that data, including prompts and responses, remains secure?

A

Data remains within the AWS region of the API call

Data, including prompts and responses, remains within the same AWS region where the API is called from, ensuring data security and compliance with regional data protection regulations.

41
Q

Which of the following best describes the pricing model for Amazon Bedrock?

A

Pay-as-you-go based on input and output tokens

Amazon Bedrock offers a pay-as-you-go pricing model based on the number of input and output tokens processed. There is also a provisioned throughput mode for larger, steadier workloads.

42
Q

What is a primary benefit of using Amazon Bedrock for generative AI applications?

A

Ability to deploy AI models without managing infrastructure

The key benefit is the ability to deploy models without the need to manage the underlying infrastructure.
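
A hedged boto3 sketch of invoking a Bedrock model; the model ID and request-body schema vary by model, so treat these values as illustrative:

    import json

    import boto3

    bedrock = boto3.client("bedrock-runtime")  # data stays in this client's region

    response = bedrock.invoke_model(
        modelId="amazon.titan-text-express-v1",            # example model ID
        body=json.dumps({"inputText": "Summarize: ..."}),  # schema is model-specific
    )
    print(json.loads(response["body"].read()))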

43
Q

What is the primary function of Amazon Fraud Detector?

A

To create custom fraud detection models using machine learning.

Amazon Fraud Detector helps in building custom machine learning models specifically designed to detect potential fraud.

44
Q

What is the main benefit of using Amazon Augmented AI?

A

Providing a human review (“human-in-the-loop”) workflow for machine learning predictions.

Amazon Augmented AI (A2I) provides a human review system for machine learning predictions, allowing humans to review and validate predictions when necessary.

45
Q

What is the primary function of Amazon Comprehend?

A

To analyze and extract insights from text, such as sentiment, using natural language processing (NLP).

Amazon Comprehend is an NLP service that analyzes text and extracts insights such as sentiment, key phrases, entities, and language.

46
Q

How can Amazon Comprehend and Amazon Augmented AI be used together?

A

Amazon Comprehend identifies entities in text, and A2I allows human review of the identified entities for validation.

Amazon Comprehend can extract entities from text, and Amazon Augmented AI (A2I) can be used to allow human reviewers to validate or correct the extracted data.
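
A minimal boto3 sketch of the entity-extraction half (the A2I review workflow is configured separately):

    import boto3

    comprehend = boto3.client("comprehend")

    resp = comprehend.detect_entities(
        Text="Jeff ordered 3 laptops from Seattle on Monday.",
        LanguageCode="en",
    )
    for entity in resp["Entities"]:
        # Low-confidence entities are natural candidates for human review via A2I.
        print(entity["Type"], entity["Text"], round(entity["Score"], 2))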

47
Q

What is the primary function of Amazon Textract?

A

To extract text, tables, and forms from scanned documents.

Amazon Textract automatically extracts text, tables, forms, and other data from scanned documents, making it easy to analyze and process document content.
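
A hedged boto3 sketch of extracting text from a scanned document in S3 (bucket and object names are placeholders):

    import boto3

    textract = boto3.client("textract")

    resp = textract.detect_document_text(
        Document={"S3Object": {"Bucket": "my-docs-bucket", "Name": "scan.png"}}
    )
    for block in resp["Blocks"]:
        if block["BlockType"] == "LINE":
            print(block["Text"])  # one detected line of text per block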

48
Q

Which of the following best describes Amazon Kendra?

A

An enterprise search service that uses natural language to return relevant results.

Amazon Kendra is an intelligent search service that allows users to search across internal documents and knowledge bases using natural language queries.

49
Q

What capability does Amazon Rekognition provide?

A

Recognizing faces, objects, and scenes in images and videos.

Amazon Rekognition can detect faces, objects, and scenes in images and videos, as well as analyze video content for facial recognition and object detection.

Rekognition doesn’t perform sentiment analysis; it handles visual content analysis.

50
Q

In Amazon S3, what does the term “key” refer to?

A

The unique identifier of an object within a bucket.

In Amazon S3, a “key” uniquely identifies an object within a bucket and includes the full path to the object along with its file name.
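
For example, in this boto3 sketch the Key argument is that full path-plus-filename identifier (bucket and key are placeholders):

    import boto3

    s3 = boto3.client("s3")

    # The key "reports/2024/sales.csv" uniquely identifies the object in the bucket.
    s3.put_object(Bucket="my-bucket", Key="reports/2024/sales.csv", Body=b"id,amount\n1,100\n")
    obj = s3.get_object(Bucket="my-bucket", Key="reports/2024/sales.csv")
    print(obj["Body"].read().decode())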

51
Q

What are the primary components of Amazon S3’s structure?

A

Buckets and objects.

Amazon S3’s structure consists mainly of Buckets, which store Objects—the actual files.

52
Q

Which of the following statements is true about S3 bucket naming conventions?

A

Bucket names must be globally unique across all AWS accounts.

S3 bucket names must be unique globally across all AWS accounts and regions.

53
Q

What is AWS Glue primarily designed for?

A

Providing a fully managed extract, transform, and load (ETL) service.

AWS Glue is a fully managed ETL service that makes it easy to prepare and load data for analytics. It simplifies the process of moving data between data stores and transforming it for analysis.

54
Q

Which statement best describes batch data ingestion?

A

Data is collected over time and processed all at once.

Batch data ingestion involves collecting data over a set period and then processing it all at once. This method is efficient for handling large volumes of data that do not require immediate processing.

55
Q

What is the primary purpose of an AWS Glue crawler?

A

To create and update the AWS Glue Data Catalog with table definitions from data stores.

An AWS Glue crawler connects to your data stores, determines the schema for your data, and populates the AWS Glue Data Catalog with metadata tables. This makes your data searchable and queryable using services like Amazon Athena and Amazon Redshift Spectrum.

AWS Glue crawlers do not execute ETL jobs; they are used to discover data and store metadata. ETL jobs are defined and run separately within AWS Glue.
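
A hedged boto3 sketch of creating and starting a crawler (names, role ARN, and S3 path are placeholders):

    import boto3

    glue = boto3.client("glue")

    glue.create_crawler(
        Name="sales-crawler",
        Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # placeholder ARN
        DatabaseName="sales_db",
        Targets={"S3Targets": [{"Path": "s3://my-bucket/sales/"}]},
    )
    glue.start_crawler(Name="sales-crawler")  # populates the Data Catalog with table metadata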

56
Q

AWS Glue triggers

A

To schedule and trigger AWS Glue jobs based on predefined events.

57
Q

AWS CloudWatch

A

To monitor and log AWS Glue job performance in real time.

Monitoring and logging are managed through CloudWatch and AWS Glue job metrics, not by Glue crawlers.

58
Q

What is the purpose of partitioning data in Amazon Athena?

A

To enhance query performance by reducing the amount of data scanned.

Partitioning data in Amazon Athena allows queries to scan only the relevant partitions. This reduces the amount of data read during a query, improving performance and lowering costs.
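
A hedged boto3 sketch; filtering on a partition column (here, a hypothetical year partition) limits what Athena scans:

    import boto3

    athena = boto3.client("athena")

    athena.start_query_execution(
        QueryString="SELECT * FROM sales WHERE year = '2024'",  # prunes to one partition
        QueryExecutionContext={"Database": "sales_db"},         # placeholder database
        ResultConfiguration={"OutputLocation": "s3://my-query-results/"},
    )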

59
Q

What is the main advantage of utilizing federated queries in Amazon Athena?

A

To query and analyze data across multiple data sources without moving it.

Federated queries in Athena let you run SQL queries across various data sources (like DynamoDB, RDS) without the need to extract or load the data into S3.

60
Q

Which strategy among the following can lower the cost of executing queries in Amazon Athena?

A

Compressing data using columnar storage formats like Parquet.

Using compressed, columnar storage formats like Parquet reduces the data size and the amount of data scanned during queries, directly lowering costs.
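
A minimal pandas sketch of writing compressed Parquet (assumes pyarrow or fastparquet is installed):

    import pandas as pd

    df = pd.DataFrame({"id": [1, 2, 3], "amount": [100, 250, 75]})

    # Columnar layout plus Snappy compression shrinks the bytes Athena must scan.
    df.to_parquet("sales.parquet", compression="snappy")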

61
Q

What is the main benefit of employing partition projection in Amazon Athena?

A

It automates the creation and management of partitions, improving query performance.

Partition projection automates the creation and management of partitions, which improves query performance by reducing the need to manually specify partitions and allowing Athena to quickly prune unnecessary partitions during query execution.

62
Q

What feature most accurately defines a stateless system?

A

Handles each request independently without storing previous data.

A stateless system processes each request without maintaining information from prior interactions.

63
Q

What is a key benefit of a stateful data ingestion system?

A

Tracks the context of previous data ingestion events to prevent reprocessing the same data.

Stateful ingestion systems maintain context, such as timestamps, to avoid reprocessing and enhance efficiency.

64
Q

Which AWS service supports stateful data ingestion by using bookmarks for incremental data loads?

A

AWS Glue

AWS Glue uses bookmarks to remember previously processed data, allowing for stateful ingestion and incremental loading.

65
Q

What is the main advantage of partitioning data in AWS Glue?

A

Facilitates parallel processing and enhances query performance.

Partitioning in AWS Glue allows for parallel processing and boosts query performance by enabling queries to access only the relevant partitions instead of scanning the entire dataset.

66
Q

Which type of AWS Glue job is most suitable for basic, data-driven transformations without requiring custom code?

A

Python Shell

Python Shell jobs are ideal for straightforward data transformations and tasks that do not need the complexity of a full Spark environment.

67
Q

Spark

A

Spark is used for more complex, distributed data processing.

68
Q

What is one key benefit of using AWS Glue Workflows?

A

Enables users to create and manage dependencies between ETL jobs.

AWS Glue Workflows allow the orchestration of multiple jobs and tasks in a specific sequence by managing complex ETL job dependencies.

69
Q

ETL

A

Extract, transform, and load (ETL) is the process of combining data from multiple sources into a large, central repository called a data warehouse.

ETL is a broad term for the overall data-processing pattern; it does not refer to a specific AWS Glue job type.

70
Q

Scala

A

Scala is a language used in Spark for more advanced tasks but is not the simplest option for basic transformations.

71
Q

Which of the following is a key use case for AWS Glue DataBrew?

A

Data cleaning and transformation

Glue DataBrew offers a visual tool for preparing and transforming data without needing to code.

72
Q

What is a major feature of AWS Lambda that enhances scalability?

A

It automatically scales according to demand.

AWS Lambda automatically adjusts its scaling to handle the number of requests, providing seamless scalability without user input.

73
Q

How does AWS Lambda manage state between function executions?

A

Each execution is independent and does not save state from prior runs.

AWS Lambda functions are stateless by design, meaning each invocation is isolated and doesn’t retain memory from previous executions.

74
Q

Which of the following scenarios is best suited for AWS Lambda in data processing?

A

Real-time data processing triggered by new file uploads to an S3 bucket.

AWS Lambda is ideal for real-time, event-based tasks like processing files when they are uploaded to an S3 bucket.

Lambda has a maximum execution time limit, making it unsuitable for long-running jobs.
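
A minimal Lambda handler sketch for an S3 upload trigger (the processing step is a stand-in):

    def lambda_handler(event, context):
        # Each record describes one S3 event, e.g. an object upload.
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            print(f"Processing s3://{bucket}/{key}")  # replace with real processing
        return {"status": "done"}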

75
Q

For scheduled tasks like monthly data warehousing

A

For scheduled tasks like monthly data warehousing, services such as AWS Glue are better suited.

76
Q

Which Amazon Kinesis service is best suited for delivering streaming data to Amazon S3, Redshift, or other AWS services for long-term storage and analysis?

A

Kinesis Data Firehose

Kinesis Data Firehose is specifically designed to stream data into AWS services like Amazon S3 and Redshift, making it ideal for this task.

77
Q

What is the main advantage of using the Kinesis Producer Library (KPL) over custom coding with the AWS SDK to send data to Kinesis Data Streams?

A

Easier error handling and improved data throughput.

The KPL batches, aggregates, and automatically retries records, giving higher throughput and simpler error handling than hand-written AWS SDK producer code.

KPL does not specifically enhance data security; it focuses on efficient data production. It is tailored for integration with AWS services and doesn’t directly enhance compatibility with non-AWS streaming sources.

78
Q

In the context of Amazon Kinesis Data Streams, what does the term “shard” represent?

A

A processing unit with a fixed data throughput capacity.

Shards provide a fixed amount of data throughput, typically measured in data and record rates per second.

79
Q

Amazon Kinesis Data Streams

A

Kinesis Data Streams is mainly used to capture and process large volumes of streaming data in real time, not directly for transferring data to storage services.

80
Q

Amazon Kinesis Data Analytics

A

Kinesis Data Analytics is intended for analyzing and processing streaming data, not for delivering it to storage services.

81
Q

Amazon Managed Apache Flink

A

Managed Apache Flink is used for advanced analytics on streaming data and does not focus on data delivery to storage.

82
Q

Which of the following best explains the purpose of AWS Kinesis Data Firehose?

A

A fully managed service that ingests real-time data and optionally transforms it before storing it.

Kinesis Data Firehose is a managed service designed to collect, optionally transform, and load streaming data into AWS storage systems.

83
Q

What is a key use case for integrating AWS Lambda with Kinesis Data Firehose?

A

To perform real-time data transformation during data ingestion.

AWS Lambda can be integrated with Kinesis Data Firehose to process and transform data in real-time during ingestion.
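
For transformation, Firehose hands Lambda base64-encoded records, and each must come back with its recordId, a result status, and re-encoded data. A minimal sketch with a stand-in transformation:

    import base64

    def lambda_handler(event, context):
        output = []
        for record in event["records"]:
            payload = base64.b64decode(record["data"]).decode("utf-8")
            transformed = payload.upper()  # stand-in for a real transformation
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",  # or "Dropped" / "ProcessingFailed"
                "data": base64.b64encode(transformed.encode("utf-8")).decode("utf-8"),
            })
        return {"records": output}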

84
Q

In the context of AWS Kinesis Data Firehose, what is the purpose of the buffering mechanism?

A

It batches incoming data to optimize data transfer to the destination.

The buffering mechanism in Kinesis Data Firehose collects data into batches (based on size or time thresholds) to improve transfer efficiency and minimize API calls.
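
A hedged boto3 sketch of setting the buffer size and interval when creating a delivery stream (ARNs and values are placeholders):

    import boto3

    firehose = boto3.client("firehose")

    firehose.create_delivery_stream(
        DeliveryStreamName="events-to-s3",
        ExtendedS3DestinationConfiguration={
            "RoleARN": "arn:aws:iam::123456789012:role/FirehoseRole",  # placeholder
            "BucketARN": "arn:aws:s3:::my-bucket",
            # Flush whichever fills first: 5 MB of data or 300 seconds.
            "BufferingHints": {"SizeInMBs": 5, "IntervalInSeconds": 300},
        },
    )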

85
Q

Which of the following is a primary role of the master node in an Amazon EMR cluster?

A

Orchestrating the distribution of data and tasks.

The master node in an Amazon EMR cluster is responsible for managing the cluster, including coordinating the distribution of data and tasks among the other nodes.

86
Q

Which storage option can be used for transient, cost-effective storage in an Amazon EMR cluster?

A

Instance Store

Instance Store is an ephemeral (short-lived) storage option, which is cost-effective for temporary data storage during EMR operations.

87
Q

Which Amazon EMR deployment option allows for the automatic scaling of the cluster as workloads change?

A

EMR Serverless

EMR Serverless automatically provisions and scales compute capacity based on the workload’s needs.

88
Q

Serverless

A

No servers to manage; resources are billed on an as-used basis.

Serverless computing is a cloud computing execution model that allocates machine resources on an as-used basis. Under a serverless model, developers can build and run applications without having to manage any servers and pay only for the exact amount of resources used.

89
Q

What is the purpose of enabling versioning on an S3 bucket?

A

To allow multiple versions of an object to be stored and retrieved.

Enabling versioning on an S3 bucket allows multiple versions of an object to be stored, providing a way to recover and restore previous versions if needed.
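
Enabling it is a single call; a boto3 sketch with a placeholder bucket name:

    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_versioning(
        Bucket="my-bucket",
        VersioningConfiguration={"Status": "Enabled"},  # "Suspended" pauses versioning
    )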

90
Q

In Amazon S3, how can you optimize the cost of storing objects that are infrequently accessed over time?

A

By setting up a lifecycle policy to move objects to S3 Intelligent-Tiering.

S3 Intelligent-Tiering automatically moves objects between different access tiers based on usage patterns, optimizing cost.
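
A hedged boto3 sketch of a lifecycle rule that transitions objects after 30 days (bucket, prefix, and timing are illustrative):

    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_lifecycle_configuration(
        Bucket="my-bucket",
        LifecycleConfiguration={
            "Rules": [{
                "ID": "tier-old-logs",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Transitions": [{"Days": 30, "StorageClass": "INTELLIGENT_TIERING"}],
            }]
        },
    )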

91
Q

Which S3 storage class is designed for data that is rarely accessed but requires quick retrieval when needed?

A

S3 Glacier Instant Retrieval

S3 Glacier Instant Retrieval is designed for rarely accessed data, but it still offers quick retrieval times when the data is needed.

92
Q

What is the primary benefit of using S3 Select?

A

To query and retrieve specific data from an object without downloading the entire object.

S3 Select allows users to perform SQL-like queries to retrieve specific data from a larger object, saving time and reducing the cost of data retrieval.
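
A hedged boto3 sketch querying a CSV object in place (the object layout and filter are assumptions):

    import boto3

    s3 = boto3.client("s3")

    resp = s3.select_object_content(
        Bucket="my-bucket",
        Key="reports/2024/sales.csv",
        ExpressionType="SQL",
        Expression="SELECT s.id, s.amount FROM S3Object s WHERE CAST(s.amount AS INT) > 100",
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"JSON": {}},
    )
    for event in resp["Payload"]:  # results arrive as a stream of events
        if "Records" in event:
            print(event["Records"]["Payload"].decode())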

93
Q

Which of the following ensures that only authorized users can access objects in an S3 bucket?

A

Bucket Policies and IAM Permissions

Bucket Policies and IAM Permissions control who has access to the S3 bucket and its objects, ensuring security.

94
Q

What can trigger an S3 Event Notification?

A

A new object is uploaded to the bucket.

S3 Event Notifications can be triggered when specific actions occur, such as an object being uploaded, deleted, or restored from Glacier.

95
Q

What is the main principle behind a Data Mesh architecture?

A

Data as a product, with decentralized data ownership and domain-oriented architecture.

In a Data Mesh, each domain owns and treats its data as a product, ensuring it is accessible and well-governed by those who know it best.

96
Q

What is a key difference between Amazon EBS and Amazon EFS?

A

EBS is block storage, while EFS is file storage

Amazon EBS is block-level storage, which is useful for databases and applications requiring raw storage, while EFS is a managed file system that allows concurrent access to files.

97
Q

What is the advantage of using Amazon EFS over Amazon EBS for shared storage between multiple EC2 instances?

A

EFS allows concurrent access from multiple instances.

The key advantage of EFS is that it supports concurrent access by multiple EC2 instances, making it ideal for shared file storage.

98
Q

What is the primary function of an IAM policy in AWS?

A

To define permissions and control access to AWS resources.

IAM policies are documents that specify permissions, controlling which actions users and roles can perform on AWS resources.

99
Q

Which of the following best describes an IAM role?

A

An entity that defines a set of permissions for making AWS service requests.

An IAM role is an identity with specific permissions that can be assumed by trusted entities, allowing them to make AWS service requests.

100
Q

What is an IAM policy in AWS?

A

A JSON document that defines permissions for actions on AWS resources.

An IAM policy is a JSON document that specifies permissions, defining what actions are allowed or denied on AWS resources.
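
For example, a read-only S3 policy document, sketched as a Python dict and created with boto3 (names and ARNs are placeholders):

    import json

    import boto3

    policy_document = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": ["arn:aws:s3:::my-bucket", "arn:aws:s3:::my-bucket/*"],
        }],
    }

    iam = boto3.client("iam")
    iam.create_policy(PolicyName="S3ReadOnly", PolicyDocument=json.dumps(policy_document))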

101
Q

What is the primary purpose of AWS Key Management Service (KMS)?

A

To manage encryption keys for AWS services and applications.

AWS KMS is a managed service that makes it easy to create and control encryption keys used to encrypt your data across AWS services and applications.

102
Q

Which AWS service uses machine learning to automatically discover, classify, and protect sensitive data stored in AWS?

A

Amazon Macie

Amazon Macie uses machine learning to recognize sensitive data such as personally identifiable information (PII) and intellectual property.

103
Q

What type of protection does AWS Shield offer?

A

Protection against DDoS Attacks

AWS Shield is a managed service that protects applications running on AWS against Distributed Denial of Service (DDoS) attacks.

104
Q

What is AWS Secrets Manager primarily used for?

A

Storing and rotating database credentials, API keys, and other secrets.

AWS Secrets Manager helps you securely store, distribute, and rotate credentials and keys.
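
A minimal boto3 sketch of retrieving a secret at runtime (the secret name is a placeholder):

    import boto3

    secrets = boto3.client("secretsmanager")

    resp = secrets.get_secret_value(SecretId="prod/db-credentials")
    credentials = resp["SecretString"]  # often a JSON string holding user/password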

105
Q

What is the primary purpose of a VPC peering connection in AWS?

A

To enable direct network connectivity between two VPCs.

A VPC peering connection allows you to route traffic between two VPCs using private IP addresses, enabling direct connectivity.

106
Q

Which statement best describes the difference between Security Groups and Network ACLs (NACLs) in AWS?

A

Security Groups operate at the instance level; NACLs operate at the subnet level.

Security Groups act as a virtual firewall for instances, controlling inbound and outbound traffic at the instance level. NACLs control traffic at the subnet level.

107
Q

What is the primary function of an AWS VPN connection?

A

To securely connect an on-premises network to an AWS VPC over the internet.

An AWS VPN connection enables a secure communication tunnel between your on-premises network and your AWS VPC over the public internet.

108
Q

What is the primary purpose of AWS CloudTrail?

A

To monitor and log API calls made within your AWS account.

AWS CloudTrail records API calls and events for your AWS account, providing a history of AWS API calls made through the AWS Management Console, SDKs, and command-line tools.

109
Q

What does AWS Config primarily help you with?

A

Tracking resource inventory and changes over time.

AWS Config provides a detailed view of the configuration of AWS resources in your account and tracks changes over time.

110
Q

Which of the following is NOT one of the five pillars of the AWS Well-Architected Framework?

A

Cost Reduction

Cost Reduction is not a pillar; the correct pillar is Cost Optimization.

(1) Operational excellence
(2) Security
(3) Reliability
(4) Performance efficiency
(5) Cost optimization

111
Q

AWS CloudTrail

A

AWS CloudTrail is an AWS service that helps you enable operational and risk auditing, governance, and compliance of your AWS account.

112
Q

Which of the following best defines a CloudFormation template?

A

A text file containing definitions of AWS resources and their configurations.

A CloudFormation template is a text file in JSON or YAML format that defines AWS resources and their configurations.

113
Q

Which of the following statements correctly describes ECS launch types?

A

Fargate launch type is a serverless feature requiring no management of underlying infrastructure.

Fargate is a serverless feature where AWS manages the underlying infrastructure.

114
Q

What is the main advantage of using the ‘spread’ task placement strategy in Amazon ECS?

A

It ensures tasks are evenly distributed across container instances to improve fault tolerance.

The ‘spread’ strategy ensures tasks are evenly distributed across instances, which improves fault tolerance and availability.

115
Q

Which of the following is a primary function of Amazon CloudWatch?

A

To monitor and provide operational data from AWS resources.

Amazon CloudWatch is primarily used to monitor and provide operational data, such as metrics, logs, and alarms, from AWS resources.

116
Q

Which of the following best describes CloudWatch Metrics Streams?

A

A feature that delivers real-time metrics to a specified AWS service or external system.

CloudWatch Metrics Streams continuously stream real-time metrics data to a destination, such as an external system or AWS service, for further analysis or storage.

117
Q

Which of the following conditions can trigger a CloudWatch Alarm?

A

A specific metric breaching a defined threshold.

Amazon CloudWatch Alarms are designed to monitor specific metrics (like CPU utilization, memory usage, etc.) and trigger notifications or actions when those metrics breach thresholds you define.
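
A hedged boto3 sketch of an alarm on EC2 CPU utilization (instance ID, threshold, and SNS topic are placeholders):

    import boto3

    cloudwatch = boto3.client("cloudwatch")

    cloudwatch.put_metric_alarm(
        AlarmName="high-cpu",
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
        Statistic="Average",
        Period=300,           # evaluate 5-minute averages
        EvaluationPeriods=2,  # two consecutive breaches trigger the alarm
        Threshold=80.0,
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=["arn:aws:sns:us-east-1:123456789012:alerts"],  # placeholder topic
    )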

118
Q

What is the primary function of Amazon Polly?

A

To convert text to lifelike speech.

Amazon Polly is a service that converts text into lifelike speech using advanced deep learning technologies.
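
A minimal boto3 sketch (the voice and text are illustrative):

    import boto3

    polly = boto3.client("polly")

    resp = polly.synthesize_speech(
        Text="Hello from Amazon Polly.",
        OutputFormat="mp3",
        VoiceId="Joanna",
    )
    with open("speech.mp3", "wb") as f:
        f.write(resp["AudioStream"].read())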

119
Q

What is the key feature of Amazon Transcribe?

A

Automatically transcribing spoken language into text from audio files.

Amazon Transcribe is designed to convert spoken language from audio files into accurate text transcriptions.

120
Q

How can Amazon Polly and Amazon Transcribe be used together?

A

Transcribe can convert speech to text, and Polly can then convert that text back into speech.

Amazon Transcribe can convert speech into text, and Amazon Polly can take that text and convert it back into speech, making it possible to process and re-synthesize spoken content.
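
A hedged sketch of the transcription half of that loop; the job runs asynchronously, and the bucket URI is a placeholder:

    import boto3

    transcribe = boto3.client("transcribe")

    transcribe.start_transcription_job(
        TranscriptionJobName="meeting-2024-01",
        Media={"MediaFileUri": "s3://my-bucket/audio/meeting.mp3"},  # placeholder URI
        MediaFormat="mp3",
        LanguageCode="en-US",
    )
    # Poll get_transcription_job until TranscriptionJobStatus is "COMPLETED", then
    # fetch the transcript JSON and feed its text to polly.synthesize_speech.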