Machine Learning Quizzes - Nikolai Flashcards
What is the primary benefit of using Amazon SageMaker notebooks for machine learning tasks?
They provide an interactive environment with fully managed infrastructure for machine learning tasks.
Amazon SageMaker notebooks are fully managed Jupyter notebooks that provide an interactive environment for managing machine learning tasks. They handle the setup and configuration of the underlying infrastructure, including servers, environments, and software.
Which feature of SageMaker notebooks ensures that your work and data are not lost between sessions?
Persistent Storage
SageMaker notebooks come with persistent storage, ensuring that data and work are saved and can be accessed later, even between sessions.
What is a key difference between a SageMaker notebook instance and a SageMaker Studio notebook?
Studio notebooks are integrated within SageMaker Studio and allow managing multiple notebooks.
SageMaker Studio notebooks are part of SageMaker Studio, offering an integrated environment where users can manage multiple notebooks, access persistent storage, and perform tasks within a unified interface.
Which of the following is a key feature of SageMaker Data Wrangler?
It provides over 300 pre-configured data transformations out of the box.
SageMaker Data Wrangler provides more than 300 pre-configured transformations to help with tasks like handling missing values and encoding categorical data.
What is one of the key features of SageMaker Data Wrangler that helps in transforming categorical features into numerical ones?
One-hot encoding
One-hot encoding is a common technique used to convert categorical data into numerical data, which is essential for certain machine learning algorithms that require numerical input. SageMaker Data Wrangler offers this transformation out-of-the-box.
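This short sketch shows what one-hot encoding does to a categorical column. It is written in plain Python to show the idea, not how Data Wrangler implements it. The function name `one_hot_encode` and the sample colors are illustrative assumptions.

```python
def one_hot_encode(values):
    """Map each categorical value to a binary vector containing a single 1."""
    # Sort the distinct categories so the column order is deterministic.
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # set the column that matches this category
        encoded.append(row)
    return categories, encoded


colors = ["red", "green", "blue", "green"]
cats, rows = one_hot_encode(colors)
print(cats)  # ['blue', 'green', 'red']
print(rows)  # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Each category becomes its own 0/1 column, so the model does not read a false ordering into the labels. Label encoding (red=2, green=1, blue=0) would imply one.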
Which of the following is NOT a capability of SageMaker Data Wrangler?
Creating SQL queries to fetch data.
SageMaker Data Wrangler does not create SQL queries to fetch data. It focuses on importing data, transforming it, and visualizing it within the SageMaker environment.
What is the benefit of using the “human in the loop” feature in SageMaker Ground Truth?
It allows a combination of human input and machine learning models to improve labeling accuracy.
“Human in the loop” allows SageMaker Ground Truth to combine human feedback with machine learning models, improving labeling efficiency and accuracy over time.
What is a primary function of SageMaker Ground Truth?
Labeling datasets for machine learning model training.
SageMaker Ground Truth is primarily used for labeling datasets, which is crucial for preparing data for machine learning model training.
What is one of the main purposes of SageMaker Feature Store?
To store, manage, and share features for machine learning models.
SageMaker Feature Store is used to store, manage, and share features across different machine learning models and teams, helping streamline feature reuse and collaboration.
What is a key advantage of using built-in algorithms in Amazon SageMaker?
They are optimized for specific use cases and abstract away infrastructure management.
Built-in algorithms in SageMaker are optimized for specific tasks and come pre-packaged, so you can use them without managing infrastructure or performing complex setup.
Which type of learning is not one of the four main types of SageMaker’s built-in algorithms?
Reinforcement Learning
SageMaker's built-in algorithms are grouped into four main categories: supervised learning, unsupervised learning, textual analysis, and image processing. Reinforcement learning is not one of them.
What is the primary benefit of using SageMaker JumpStart for machine learning development?
It provides access to pre-trained models and prebuilt solutions, reducing development time.
SageMaker JumpStart allows users to quickly develop and deploy machine learning models by providing pre-trained models and prebuilt solutions, minimizing the time spent on development.
What is a hyperparameter in the context of machine learning?
A configuration that controls how a model learns and operates, set before training begins.
Hyperparameters are configurations set before the training process begins that control how a model learns and operates, such as the depth of a decision tree.
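This sketch illustrates the difference between hyperparameters and learned parameters with a tiny gradient-descent fit of a line. `learning_rate` and `epochs` are fixed before training starts. `w` and `b` are what training learns. The function name and data are illustrative assumptions.

```python
def train_linear_model(xs, ys, learning_rate, epochs):
    """Fit y = w*x + b with plain gradient descent on mean squared error.

    learning_rate and epochs are hyperparameters (chosen before training);
    w and b are model parameters (learned during training).
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
w, b = train_linear_model(xs, ys, learning_rate=0.05, epochs=2000)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

Changing the hyperparameters changes how training behaves. A learning rate that is too large can make the updates diverge, and too few epochs can stop the fit before it converges.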
Which of the following is true about Amazon SageMaker’s automatic model tuning feature?
It runs multiple training jobs with different hyperparameter combinations to find the best model.
SageMaker’s automatic model tuning runs multiple training jobs with different hyperparameter combinations to find the best-performing model based on the chosen performance metric.
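The loop below is a rough sketch of what a tuning job does: it tries many hyperparameter combinations and keeps the one with the best objective metric. It uses random search, which is one of the strategies automatic model tuning supports (Bayesian optimization is another). `validation_loss` is a hypothetical stand-in for launching a training job and reading its metric.

```python
import random


def validation_loss(learning_rate, max_depth):
    """Hypothetical stand-in for a training job that reports an objective metric."""
    return (learning_rate - 0.1) ** 2 + 0.01 * (max_depth - 6) ** 2


def random_search(n_trials, seed=0):
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        # Sample one hyperparameter combination from the search ranges.
        params = {
            "learning_rate": rng.uniform(0.001, 0.5),
            "max_depth": rng.randint(2, 10),
        }
        loss = validation_loss(**params)
        # Keep the combination with the lowest objective (we are minimizing).
        if best is None or loss < best[0]:
            best = (loss, params)
    return best


best_loss, best_params = random_search(n_trials=50)
print(best_params, best_loss)
```

A managed tuning job also runs these trials as separate training jobs, possibly in parallel, and it reads the metric from each job's output. Both steps are simplified away here.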
Which hyperparameter tuning technique ensures that every possible combination of hyperparameters is tested?
Grid search
Grid search tests every possible combination of the specified hyperparameter values by laying them out as a grid and evaluating each combination.
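A minimal grid search sketch uses `itertools.product` to enumerate every combination of the listed values. `validation_loss` is a hypothetical stand-in for training a model and reading its metric, and the grid values are made up.

```python
import itertools


def validation_loss(learning_rate, max_depth):
    """Hypothetical stand-in for training a model and reading its metric."""
    return (learning_rate - 0.1) ** 2 + 0.01 * (max_depth - 6) ** 2


grid = {
    "learning_rate": [0.01, 0.1, 0.3],
    "max_depth": [3, 6, 9],
}

results = []
# itertools.product yields every combination: 3 x 3 = 9 in total.
for values in itertools.product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    results.append((validation_loss(**params), params))

best_loss, best_params = min(results, key=lambda r: r[0])
print(len(results), best_params)  # 9 {'learning_rate': 0.1, 'max_depth': 6}
```

The cost grows multiplicatively: adding a third hyperparameter with 3 values would triple the number of training runs to 27.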
What is the primary advantage of using custom Docker containers in Amazon SageMaker?
Custom containers allow full control over the training and inference environment, including specific libraries and operating system choices.
Custom Docker containers provide full control over the environment, allowing users to include their own code, dependencies, libraries, and operating system.
When using a custom Docker container for training in SageMaker, which service is used to store and retrieve the Docker image?
Amazon Elastic Container Registry (ECR)
Amazon ECR is used to store Docker images, which are then pulled by SageMaker to run training jobs or inference.
In SageMaker, what must a custom Docker container expose during inference to function correctly?
A REST API endpoint
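During inference, a custom Docker container must run a web server that listens on port 8080 and answers two endpoints: GET /ping for health checks and POST /invocations for prediction requests. SageMaker calls this REST API to get predictions.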
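This standard-library sketch shows the serving contract a custom inference container is expected to follow: an HTTP server on port 8080 with `GET /ping` and `POST /invocations`. The JSON payload shape (`{"features": [...]}`) and the `predict` function are hypothetical. A production container would typically use a sturdier server.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Hypothetical model: returns the sum of the input features."""
    return sum(features)


class InferenceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Health check: SageMaker calls GET /ping and expects HTTP 200.
        if self.path == "/ping":
            self._send(200, {"status": "healthy"})
        else:
            self._send(404, {"error": "not found"})

    def do_POST(self):
        # Prediction requests arrive as POST /invocations.
        if self.path != "/invocations":
            self._send(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        self._send(200, {"prediction": predict(payload["features"])})

    def _send(self, status, body):
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the example


def make_server(port=8080):
    # Inside the container, an entry point would call make_server().serve_forever().
    return HTTPServer(("0.0.0.0", port), InferenceHandler)
```
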
What is the main purpose of using SageMaker Experiments?
To track, organize, and compare multiple model training runs.
SageMaker Experiments helps track, organize, and compare different model training runs (called trials) to identify the best configuration. It captures all the details of hyperparameters, datasets, and performance metrics to assist in the comparison process.
In SageMaker Experiments, what is a trial component?
A step or stage within a training run, such as data preprocessing or model evaluation.
A trial component in SageMaker Experiments represents the various stages of a machine learning workflow, such as data preprocessing, model training, or evaluation. It helps capture the metadata for each stage.
What is the primary function of SageMaker Clarify?
To detect and mitigate biases in machine learning models
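SageMaker Clarify helps detect potential bias in datasets and trained models, both before and after training, and explains model predictions with feature attributions (based on SHAP values), supporting fairness and transparency.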
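Clarify reports pre-training bias metrics such as Class Imbalance (CI) and Difference in Positive Proportions in Labels (DPL). This sketch computes both by hand for two facets (groups) a and d using made-up labels. It illustrates the formulas only and does not call Clarify. Values near 0 suggest the facets are balanced.

```python
def class_imbalance(n_a, n_d):
    """Class Imbalance (CI): (n_a - n_d) / (n_a + n_d).

    n_a and n_d are the number of examples in facets a and d.
    """
    return (n_a - n_d) / (n_a + n_d)


def dpl(labels_a, labels_d):
    """Difference in Positive Proportions in Labels (DPL): q_a - q_d."""
    q_a = sum(labels_a) / len(labels_a)  # share of positive labels in facet a
    q_d = sum(labels_d) / len(labels_d)  # share of positive labels in facet d
    return q_a - q_d


# Hypothetical binary labels (1 = positive outcome) for two facets.
labels_a = [1, 1, 0, 1, 1, 0, 1, 1]  # 8 examples, 6 positive
labels_d = [1, 0, 0, 0]              # 4 examples, 1 positive

ci = class_imbalance(len(labels_a), len(labels_d))
diff = dpl(labels_a, labels_d)
print(round(ci, 3), diff)  # 0.333 0.5
```

Here facet d is underrepresented (CI of about 0.333), and it also receives positive labels far less often (DPL of 0.5). Both results would warrant a closer look before training.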
Which of the following is a method used by SageMaker Data Wrangler to address class imbalance in datasets?
Random Undersampling
Random undersampling is one of the techniques used in SageMaker Data Wrangler to handle class imbalance by reducing the number of samples in the majority class.
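The sketch below shows random undersampling in plain Python. It drops randomly chosen rows from each larger class until every class has as many rows as the smallest one. The function name, row layout, and fraud dataset are illustrative assumptions, not Data Wrangler's implementation.

```python
import random
from collections import Counter


def random_undersample(rows, label_key, seed=0):
    """Randomly drop majority-class rows until every class matches the minority count."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        # rng.sample draws `target` rows without replacement.
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


# Hypothetical imbalanced data: 90 legitimate rows, 10 fraudulent rows.
data = [{"x": i, "fraud": 0} for i in range(90)] + [{"x": i, "fraud": 1} for i in range(10)]
balanced = random_undersample(data, "fraud")
print(Counter(row["fraud"] for row in balanced))
```

The trade-off is that 80 majority rows are discarded. Oversampling the minority class is the usual alternative when data is scarce.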
What is the primary purpose of Amazon SageMaker Debugger?
To monitor and debug training jobs in real time
SageMaker Debugger enables real-time monitoring and debugging of model training jobs, allowing users to detect issues such as vanishing gradients or overfitting during the training process.
Which type of issue can SageMaker Debugger detect during model training?
Overfitting
SageMaker Debugger can detect common training issues like overfitting, vanishing gradients, and underfitting during model training.
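As a rough illustration of the kind of check a Debugger rule performs, the function below flags overfitting when training loss keeps falling while validation loss keeps rising. The rule logic, the `patience` parameter, and the loss values are hypothetical. This is not Debugger's built-in rule implementation.

```python
def detect_overfitting(train_losses, val_losses, patience=3):
    """Return the first step at which overfitting is flagged, or None.

    Hypothetical heuristic: flag when validation loss rises for `patience`
    consecutive steps while training loss keeps falling.
    """
    streak = 0
    for step in range(1, len(val_losses)):
        train_falling = train_losses[step] < train_losses[step - 1]
        val_rising = val_losses[step] > val_losses[step - 1]
        streak = streak + 1 if (train_falling and val_rising) else 0
        if streak >= patience:
            return step
    return None


train = [1.0, 0.8, 0.6, 0.5, 0.4, 0.3, 0.25, 0.2]
val = [1.1, 0.9, 0.7, 0.65, 0.7, 0.75, 0.8, 0.85]
print(detect_overfitting(train, val))  # 6
```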
What is the primary purpose of Amazon SageMaker Model Monitor?
To monitor model quality and detect issues like data and model drift
SageMaker Model Monitor automatically tracks the quality of deployed models and detects issues like data drift, model quality drift, bias, and feature attribution drift.
What is “data drift” in the context of SageMaker Model Monitor?
When the data received in production differs from the data used during model training.
Data drift occurs when the data received in production starts to differ significantly from the data used to train the model, potentially degrading model performance.
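One common way to quantify drift in a single feature is the Population Stability Index (PSI), which compares how a baseline sample and a production sample fall into the same bins. This sketch is a generic illustration under that assumption, not the statistic Model Monitor computes internally. The bin edges and values are made up.

```python
import bisect
import math


def psi(baseline, production, edges):
    """Population Stability Index between two samples of one feature.

    `edges` are sorted bin boundaries; values outside them are clipped into
    the first or last bin.
    """
    n_bins = len(edges) - 1

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            i = bisect.bisect_right(edges, v) - 1
            counts[min(max(i, 0), n_bins - 1)] += 1
        # Floor tiny proportions so an empty bin does not cause log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    expected, actual = proportions(baseline), proportions(production)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


edges = [0, 25, 35, 45, 100]
baseline = [20, 22, 24, 28, 30, 33, 38, 41, 50, 60]    # a feature at training time
production = [24, 30, 33, 40, 42, 44, 50, 55, 61, 70]  # the same feature in production
print(round(psi(baseline, production, edges), 2))  # 0.44
```

Identical distributions give a PSI of 0. A commonly used rule of thumb treats values above about 0.25 as a significant shift, but the threshold is a judgment call.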
Which metric would be most relevant for monitoring a binary classification model in SageMaker Model Monitor?
Confusion Matrix
A confusion matrix is used to evaluate the performance of a binary classification model by showing the true positives, true negatives, false positives, and false negatives.
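This sketch shows how the four cells of a binary confusion matrix are counted from true and predicted labels, treating 1 as the positive class. The labels are made up for illustration.

```python
def confusion_matrix(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
tp, tn, fp, fn = confusion_matrix(y_true, y_pred)
print("            pred 1  pred 0")
print(f"actual 1    {tp:>6}  {fn:>6}")  # TP=3, FN=1
print(f"actual 0    {fp:>6}  {tn:>6}")  # FP=1, TN=3
```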
What is the primary function of SageMaker Pipelines in the context of machine learning workflows?
To automate and manage machine learning workflows, including building, training, and deploying models.
SageMaker Pipelines is a serverless workflow orchestration service designed for automating and managing machine learning workflows, including building, training, and deploying models.
In SageMaker Pipelines, what are the three main components that define a pipeline instance?
Name, Steps, and Parameters
A SageMaker Pipeline instance is composed of three components: Name (the unique identifier for the pipeline), Steps (which define the actions in the workflow), and Parameters (which allow customization of the pipeline’s behavior).
What is a key benefit of using SageMaker Pipelines in production environments?
It handles tens of thousands of concurrent workflows, ensuring scalability.
SageMaker Pipelines is designed to be highly scalable, capable of handling tens of thousands of concurrent machine learning workflows in production.
What is the primary goal of image classification in machine learning?
To assign a label to the entire image based on learned features.
The main goal of image classification is to assign a label to the entire image by learning relevant features from the image data. Once trained, the model can label unseen images based on the patterns it has learned.
Which of the following statements best describes supervised learning?
The algorithm learns from labeled data to predict outcomes.
Supervised learning involves training a model on labeled data, where the outcome is known, and using this data to predict outcomes for new, unseen data.
What is the primary goal of unsupervised learning?
To identify hidden patterns or groupings within unlabeled data.
Unsupervised learning seeks to uncover hidden patterns or structures in unlabeled data without prior knowledge of outcomes.
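k-means clustering is a classic unsupervised algorithm, and this one-dimensional sketch shows the idea: no labels are provided, yet the points separate into two groups by distance alone. The starting centroids and the data are illustrative assumptions.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Tiny k-means on 1-D data: groups points using distances only, no labels."""
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters


points = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
print(centroids)  # [1.5, 10.5]
```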
What is a key feature of reinforcement learning in general?
The agent interacts with an environment and receives rewards or penalties based on its actions.
In reinforcement learning, the agent interacts with an environment and learns by trial and error through receiving rewards or penalties for its actions.
Which of the following is a task well-suited for reinforcement learning?
Optimizing a robot’s path in an uncertain environment.
Reinforcement learning is particularly useful in dynamic and uncertain environments, such as robot navigation.
What is the role of the reward function in reinforcement learning?
It evaluates how well an agent’s actions align with its goal.
The reward function evaluates the benefit or cost of actions taken by the agent, helping it to learn what actions lead to better outcomes.
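The agent, environment, and reward loop can be sketched with tabular Q-learning, one standard reinforcement learning algorithm, on a hypothetical five-cell corridor. The agent starts in cell 0. The reward function (inside `step`) pays +1 only for reaching cell 4. Through trial and error, the agent learns to move right. All constants are illustrative choices.

```python
import random

N_STATES = 5        # cells 0..4; cell 4 is the goal
ACTIONS = (-1, 1)   # move left or move right
GAMMA = 0.9         # discount factor for future rewards
ALPHA = 0.5         # learning rate for Q-value updates
EPSILON = 0.2       # probability of taking a random (exploratory) action


def step(state, action):
    """Environment plus reward function: reaching the goal pays +1, otherwise 0."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def greedy(q, state, rng):
    """Pick the highest-valued action, breaking ties at random."""
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            # Epsilon-greedy: explore sometimes, otherwise exploit current estimates.
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q, state, rng)
            next_state, reward, done = step(state, action)
            # Q-learning update: move the estimate toward reward + discounted future value.
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += ALPHA * (reward + GAMMA * future - q[(state, action)])
            state = next_state
            if done:
                break
    return q


q = train()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)  # {0: 1, 1: 1, 2: 1, 3: 1}  (move right everywhere)
```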
What does accuracy measure in a classification problem?
The proportion of correctly classified instances out of all instances.
Accuracy measures the proportion of correctly classified instances out of the total instances, regardless of their class.
Which metric evaluates the balance between precision and recall?
F1 Score
The F1 score is the harmonic mean of precision and recall, helping to balance both when assessing a model’s performance.
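This sketch computes accuracy, precision, recall, and F1 from confusion-matrix counts. The counts are hypothetical and describe an imbalanced problem, where accuracy looks strong even though recall is mediocre.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


# Hypothetical counts: 8 TP, 85 TN, 2 FP, 5 FN (100 examples in total).
acc, prec, rec, f1 = classification_metrics(tp=8, tn=85, fp=2, fn=5)
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))  # 0.93 0.8 0.615 0.696
```

Because F1 is a harmonic mean, it is pulled toward the weaker of precision and recall. Here that gives about 0.696, which is well below the 0.93 accuracy.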
What is R-squared used for in regression models?
To indicate how much variance in the target variable is explained by the model.
R-squared measures the proportion of variance in the target variable that is explained by the regression model, indicating how well the model fits the data.
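R-squared can be computed as 1 - SS_res / SS_tot. SS_res is the sum of squared prediction errors, and SS_tot is the sum of squared deviations of the targets from their mean. This sketch uses made-up targets and predictions.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((y - mean) ** 2 for y in y_true)               # total variation
    return 1 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.5]
print(round(r_squared(y_true, y_pred), 4))  # 0.9625
```

Here SS_tot = 20 and SS_res = 0.75, so the model explains about 96% of the variance in the target.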
What feature of Amazon Bedrock ensures that data, including prompts and responses, remains secure?
Data remains within the AWS region of the API call
Data, including prompts and responses, remains within the same AWS region where the API is called from, ensuring data security and compliance with regional data protection regulations.
Which of the following best describes the pricing model for Amazon Bedrock?
Pay-as-you-go based on input and output tokens
Amazon Bedrock offers a pay-as-you-go pricing model based on the number of input and output tokens processed. There is also a Provisioned Throughput mode for large, steady workloads.
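On-demand cost is simple arithmetic: count the tokens in and out, then multiply each count by its price. The per-1,000-token prices below are placeholders, not real Bedrock prices. Actual prices vary by model and Region, and the workload figures are hypothetical.

```python
def on_demand_cost(input_tokens, output_tokens, price_in_per_1k, price_out_per_1k):
    """Estimate on-demand cost from token counts and per-1,000-token prices."""
    return (input_tokens / 1000) * price_in_per_1k + (output_tokens / 1000) * price_out_per_1k


# Hypothetical workload: 2,000 requests, about 1,500 input and 500 output tokens each.
requests = 2000
cost = on_demand_cost(
    input_tokens=requests * 1500,
    output_tokens=requests * 500,
    price_in_per_1k=0.003,   # placeholder price, not a published rate
    price_out_per_1k=0.015,  # placeholder price, not a published rate
)
print(f"${cost:.2f}")  # $24.00
```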
What is a primary benefit of using Amazon Bedrock for generative AI applications?
Ability to deploy AI models without managing infrastructure
The key benefit is that Amazon Bedrock is fully managed: you can use foundation models through an API without provisioning or managing the underlying infrastructure.
What is the primary function of Amazon Fraud Detector?
To create custom fraud detection models using machine learning.
Amazon Fraud Detector helps in building custom machine learning models specifically designed to detect potential fraud.
What is the main benefit of using Amazon Augmented AI?
Providing a human review (human-in-the-loop) workflow for machine learning predictions.
Amazon Augmented AI (A2I) provides a human review system for machine learning predictions, allowing humans to review and validate predictions when necessary.
What is the primary function of Amazon Comprehend?
To analyze and extract insights from text using natural language processing (NLP), such as analyzing sentiment in text data.
Amazon Comprehend is an NLP service that analyzes text and extracts insights such as sentiment, key phrases, entities, and language.
How can Amazon Comprehend and Amazon Augmented AI be used together?
Amazon Comprehend identifies entities in text, and A2I allows human review of the identified entities for validation.
Amazon Comprehend can extract entities from text, and Amazon Augmented AI (A2I) can be used to allow human reviewers to validate or correct the extracted data.
What is the primary function of Amazon Textract?
To extract text, tables, and forms from scanned documents.
Amazon Textract automatically extracts text, tables, forms, and other data from scanned documents, making it easy to analyze and process document content.
Which of the following best describes Amazon Kendra?
An enterprise search service that uses natural language to return relevant results.
Amazon Kendra is an intelligent search service that allows users to search across internal documents and knowledge bases using natural language queries.