AIF-C01 Practice Exam (Udemy Course) Flashcards
Which of the following applies to Amazon Bedrock? (Select two)
(a) Smaller models are cheaper to use than larger models
(b) You can use a customized model only in the Provisioned Throughput mode
(c) Larger models are cheaper to use than smaller models
(d) You can use a customized model in the Provisioned Throughput or On-Demand mode
(e) You can use the On-Demand mode only with time-based term commitments
Correct options:
(a) Smaller models are cheaper to use than larger models
(b) You can use a customized model only in the Provisioned Throughput mode
With Amazon Bedrock, you will be charged for model inference and customization. You have a choice of two pricing plans for inference:
1. On-Demand and Batch: This mode allows you to use FMs on a pay-as-you-go basis without having to make any time-based term commitments.
2. Provisioned Throughput: This mode allows you to provision sufficient throughput to meet your application’s performance requirements in exchange for a time-based term commitment.
(a) Smaller models are cheaper to use than larger models
The cost of generative AI models can vary. It’s important to weigh the trade-offs between model size and speed. Larger models tend to be more accurate but are costly and have limited deployment options. In contrast, smaller models are more affordable and faster, offering more deployment flexibility.
(b) You can use a customized model only in the Provisioned Throughput mode
With the Provisioned Throughput mode, you can purchase model units for a specific base or custom model. The Provisioned Throughput mode is primarily designed for large consistent inference workloads that need guaranteed throughput. Custom models can only be accessed using Provisioned Throughput.
via - https://aws.amazon.com/bedrock/pricing/
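To make the distinction concrete, here is a minimal boto3 sketch of the two inference modes. The account ID and provisioned-model ARN are placeholders, and a provisioned model would first need to be purchased; this is an illustration, not the official example:

```python
import boto3
import json

# Minimal sketch: invoking Bedrock in both pricing modes with boto3.
# The IDs/ARNs below are placeholders.
client = boto3.client("bedrock-runtime", region_name="us-east-1")
body = json.dumps({"inputText": "Summarize the AWS shared responsibility model."})

# On-Demand: pass a base model ID; pay-as-you-go, no term commitment
client.invoke_model(
    modelId="amazon.titan-text-express-v1",
    body=body,
    contentType="application/json",
)

# Provisioned Throughput: pass the ARN of the purchased model units.
# Custom (fine-tuned) models can only be invoked this way.
client.invoke_model(
    modelId="arn:aws:bedrock:us-east-1:123456789012:provisioned-model/abc123",
    body=body,
    contentType="application/json",
)
```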
Incorrect options:
(c) Larger models are cheaper to use than smaller models - As mentioned above, larger models tend to be more accurate but are costly and have limited deployment options. In contrast, smaller models are more affordable and faster, offering more deployment flexibility.
(d) You can use a customized model in the Provisioned Throughput or On-Demand mode - Custom models can only be accessed using Provisioned Throughput. So, this option is incorrect.
(e) You can use the On-Demand mode only with time-based term commitments - With the On-Demand mode, you only pay for what you use, with no time-based term commitments. So, this option is incorrect.
References:
https://aws.amazon.com/bedrock/pricing/
https://aws.amazon.com/blogs/machine-learning/best-practices-to-build-generative-ai-applications-on-aws/
Which AWS services/tools can be used to implement Responsible AI practices? (Select two)
(a) Amazon SageMaker Model Monitor
(b) Amazon Inspector
(c) Amazon SageMaker Clarify
(d) AWS Audit Manager
(e) Amazon SageMaker JumpStart
Correct options:
(a) Amazon SageMaker Model Monitor
(c) Amazon SageMaker Clarify
(a) Amazon SageMaker Model Monitor
Amazon SageMaker Model Monitor is a service within the Amazon SageMaker suite that helps developers continuously monitor machine learning models deployed in production. It ensures that models maintain optimal performance and make accurate predictions over time by detecting data quality issues, concept drift, and other anomalies. Amazon SageMaker Model Monitor automatically detects and alerts you to inaccurate predictions from deployed models.
(c) Amazon SageMaker Clarify
Amazon SageMaker Clarify is a service provided by AWS (Amazon Web Services) that helps developers detect biases and explain the predictions made by machine learning models. It is part of the Amazon SageMaker suite of machine learning tools and focuses on enhancing transparency, fairness, and explainability in machine learning workflows.
Tools and resources to build AI responsibly: via - https://aws.amazon.com/machine-learning/responsible-ai/resources/
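For a sense of how Model Monitor is wired up in practice, here is a minimal sketch using the SageMaker Python SDK; the IAM role ARN and S3 paths are placeholders, and a real setup would also attach a monitoring schedule to a deployed endpoint:

```python
from sagemaker.model_monitor import DefaultModelMonitor, DatasetFormat

# Minimal sketch: baseline the training data so that data later
# captured from the endpoint can be compared against it.
monitor = DefaultModelMonitor(
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

monitor.suggest_baseline(
    baseline_dataset="s3://my-bucket/train.csv",          # placeholder
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri="s3://my-bucket/baseline/",             # placeholder
)
# A monitoring schedule would then compare live endpoint traffic to
# this baseline and flag data-quality drift.
```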
Incorrect options:
(b) Amazon Inspector - Amazon Inspector is an automated vulnerability management service that continually scans AWS workloads for software vulnerabilities and unintended network exposure.
(d) AWS Audit Manager - AWS Audit Manager helps you assess internal risk with prebuilt frameworks that translate evidence from cloud services into security IT audit reports.
(e) Amazon SageMaker JumpStart - Amazon SageMaker JumpStart is a machine learning hub with foundation models, built-in algorithms, and prebuilt ML solutions that you can deploy with just a few clicks.
References:
https://aws.amazon.com/machine-learning/responsible-ai/resources/
https://aws.amazon.com/sagemaker/clarify/
https://aws.amazon.com/sagemaker/ml-governance/
Which of the following summarizes the central idea behind machine learning?
(a) Machine learning only functions effectively when data is manually labeled and categorized by humans
(b) Machine learning involves training algorithms on large datasets to identify patterns and make predictions or decisions based on new data
(c) Machine learning is primarily based on hardware configurations and does not rely on software algorithms or data analysis
(d) Machine learning works by using predefined rules to generate outcomes without the need for data input
Correct option:
(b) Machine learning involves training algorithms on large datasets to identify patterns and make predictions or decisions based on new data
Machine learning works by training algorithms on large datasets, allowing them to identify patterns within the data. Once trained, these algorithms can make predictions or decisions when presented with new data, improving their performance over time as they are exposed to more data.
The central idea behind machine learning is that a mathematical relationship exists between any combination of input and output data. The machine learning model does not know this relationship in advance, but it can infer one if given a sufficient number of input-output examples. This means every machine learning algorithm is built around a modifiable mathematical function. The underlying principle can be understood like this:
We ‘train’ the algorithm by giving it the following input/output (i, o) combinations: (2, 10), (5, 19), and (9, 31). The algorithm computes the relationship between input and output to be o = 3*i + 4. We then give it the input 7 and ask it to predict the output, which it correctly determines to be 25.
While this is a basic illustration, machine learning rests on the principle that computer systems can mathematically link even complex data points, given sufficient data and computing power. The accuracy of the output therefore generally correlates with the amount of input data provided. The machine learning phases are given below.
via - https://aws.amazon.com/what-is/machine-learning/
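The toy example above can be reproduced in a few lines of Python. This is a sketch using NumPy’s least-squares line fit, not a claim about any particular ML library’s internals:

```python
import numpy as np

# Training examples from the explanation above, where o = 3*i + 4
i = np.array([2, 5, 9])
o = np.array([10, 19, 31])

# Fit a straight line to the (i, o) pairs
slope, intercept = np.polyfit(i, o, deg=1)
print(slope, intercept)        # ~3.0 and ~4.0

# Predict the output for the unseen input 7
print(slope * 7 + intercept)   # ~25.0
```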
Incorrect options:
(a) Machine learning only functions effectively when data is manually labeled and categorized by humans - While labeled data can improve the performance of supervised learning models, machine learning can also function with unlabeled data through unsupervised learning methods.
(c) Machine learning is primarily based on hardware configurations and does not rely on software algorithms or data analysis - Machine learning is fundamentally based on software algorithms and data analysis rather than solely on hardware configurations.
(d) Machine learning works by using predefined rules to generate outcomes without the need for data input - Machine learning does not rely on predefined rules alone; it uses data to learn and make predictions or decisions.
References:
https://aws.amazon.com/what-is/machine-learning/
Which of the following summarizes the capabilities of a multimodal model?
(a) A multimodal model can accept only a single type of input, however, it can create a mix of output types such as video/image
(b) A multimodal model can accept a mix of input types such as audio/text, however, it can only create a single type of output
(c) A multimodal model can accept a mix of input types such as audio/text and create a mix of output types such as video/image
(d) A multimodal model can accept only a single type of input and it can only create a single type of output
Correct option:
(c) A multimodal model can accept a mix of input types such as audio/text and create a mix of output types such as video/image
A multimodal model is an artificial intelligence system designed to process and understand multiple types of data, such as text, images, audio, and video. Unlike unimodal models, which handle a single type of data, multimodal models can integrate and make sense of information from various sources, allowing them to perform more complex and versatile tasks.
Multimodal models represent a significant advancement in AI, enabling the integration and understanding of multiple types of data. By combining different modalities, these models can perform a wide range of complex tasks, making them highly versatile and powerful tools in various fields.
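As an illustration of mixed input types, here is a minimal boto3 sketch that sends text plus an image to a vision-capable model through the Bedrock Converse API; the model ID and file name are placeholders:

```python
import boto3

# Minimal sketch: one request combining text and image inputs.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

with open("diagram.png", "rb") as f:   # placeholder image file
    image_bytes = f.read()

response = client.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # placeholder
    messages=[{
        "role": "user",
        "content": [
            {"text": "Describe this image."},                # text input
            {"image": {"format": "png",
                       "source": {"bytes": image_bytes}}},   # image input
        ],
    }],
)
print(response["output"]["message"]["content"][0]["text"])
```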
Incorrect options:
(a) A multimodal model can accept only a single type of input, however, it can create a mix of output types such as video/image
(b) A multimodal model can accept a mix of input types such as audio/text, however, it can only create a single type of output
(d) A multimodal model can accept only a single type of input and it can only create a single type of output
These three options contradict the explanation provided above, so these options are incorrect.
References:
https://aws.amazon.com/blogs/industries/multimodal-data-analysis-with-aws-health-and-machine-learning-services/
https://aws.amazon.com/blogs/industries/training-machine-learning-models-on-multimodal-health-data-with-amazon-sagemaker/
Which of the following summarizes the differences between Guardrails for Amazon Bedrock and watermark detection for Amazon Bedrock?
(a) Both Guardrails and watermark detection help control the interaction between users and FMs by filtering undesirable and harmful content
(b) Both Guardrails and watermark detection help identify if an image was created by the Amazon Titan Image Generator model on Bedrock
(c) Guardrails helps control the interaction between users and FMs by filtering undesirable and harmful content, whereas, watermark detection identifies if an image was created by the Amazon Titan Image Generator model on Bedrock
(d) Watermark detection helps control the interaction between users and FMs by filtering undesirable and harmful content, whereas, Guardrails identifies if an image was created by the Amazon Titan Image Generator model on Bedrock
Correct option:
(c) Guardrails helps control the interaction between users and FMs by filtering undesirable and harmful content, whereas, watermark detection identifies if an image was created by the Amazon Titan Image Generator model on Bedrock
Guardrails for Amazon Bedrock help you implement safeguards for your generative AI applications based on your use cases and responsible AI policies. Guardrails for Amazon Bedrock helps control the interaction between users and FMs by filtering undesirable and harmful content as well as redacting personally identifiable information (PII), enhancing content safety and privacy in generative AI applications.
Watermark detection is a security feature in Amazon Bedrock that identifies if an image was created by the Amazon Titan Image Generator model on Bedrock.
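To illustrate the Guardrails side, here is a minimal boto3 sketch of the standalone ApplyGuardrail API; the guardrail ID and version are placeholders for a guardrail created beforehand:

```python
import boto3

# Minimal sketch: evaluate a piece of user input against an existing
# guardrail before it ever reaches a foundation model.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.apply_guardrail(
    guardrailIdentifier="gr-placeholder-id",   # placeholder
    guardrailVersion="1",
    source="INPUT",                            # user input, not model output
    content=[{"text": {"text": "Tell me something harmful."}}],
)
print(response["action"])  # e.g., GUARDRAIL_INTERVENED if content is blocked
```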
Incorrect options:
(a) Both Guardrails and watermark detection help control the interaction between users and FMs by filtering undesirable and harmful content
(b) Both Guardrails and watermark detection help identify if an image was created by the Amazon Titan Image Generator model on Bedrock
(d) Watermark detection helps control the interaction between users and FMs by filtering undesirable and harmful content, whereas, Guardrails identifies if an image was created by the Amazon Titan Image Generator model on Bedrock
These three options contradict the explanation provided above, so these options are incorrect.
References: N/A
Which of the following is correct regarding underfitting/overfitting in machine learning?
(a) Underfit models experience high bias, whereas, overfit models experience high variance
(b) Underfit models experience high bias, whereas, overfit models experience low variance
(c) Underfit models experience low bias, whereas, overfit models experience low variance
(d) Underfit models experience low bias, whereas, overfit models experience high variance
Correct option:
(a) Underfit models experience high bias, whereas, overfit models experience high variance
Your model is underfitting the training data when the model performs poorly on the training data. This is because the model is unable to capture the relationship between the input examples (often called X) and the target values (often called Y). Your model is overfitting your training data when you see that the model performs well on the training data but does not perform well on the evaluation data. This is because the model is memorizing the data it has seen and is unable to generalize to unseen examples.
Underfit models experience high bias: they give inaccurate results for both the training data and the test set. On the other hand, overfit models experience high variance: they give accurate results for the training set but not for the test set. More model training results in less bias, but variance can increase. Data scientists aim to find the sweet spot between underfitting and overfitting when fitting a model. A well-fitted model can quickly establish the dominant trend for seen and unseen data sets.
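The bias/variance trade-off can be made concrete with a small experiment. The following is a sketch, assuming scikit-learn, that fits polynomials of increasing complexity and compares scores on seen versus unseen data:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 30)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.1, 30)

X_train, y_train = X[::2], y[::2]    # half the points for training
X_test, y_test = X[1::2], y[1::2]    # the rest held out as a test set

for degree in (1, 4, 15):            # too simple, about right, too complex
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(degree,
          round(model.score(X_train, y_train), 3),  # R^2 on seen data
          round(model.score(X_test, y_test), 3))    # R^2 on unseen data
```

Degree 1 scores poorly on both sets (underfitting, high bias), while degree 15 scores well on the training set but poorly on the test set (overfitting, high variance).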
Incorrect options:
(b) Underfit models experience high bias, whereas, overfit models experience low variance
(c) Underfit models experience low bias, whereas, overfit models experience low variance
(d) Underfit models experience low bias, whereas, overfit models experience high variance
These three options contradict the explanation provided above, so these options are incorrect.
References:
https://docs.aws.amazon.com/machine-learning/latest/dg/model-fit-underfitting-vs-overfitting.html
https://aws.amazon.com/what-is/overfitting/
A company wants to implement fully managed support for end-to-end Retrieval Augmented Generation (RAG) workflow in Amazon Bedrock. What do you recommend?
(a) Guardrails for Amazon Bedrock
(b) Watermark detection for Amazon Bedrock
(c) Continued pretraining in Amazon Bedrock
(d) Knowledge Bases for Amazon Bedrock
Correct option:
(d) Knowledge Bases for Amazon Bedrock
With Knowledge Bases for Amazon Bedrock, you can give FMs and agents contextual information from your company’s private data sources for RAG to deliver more relevant, accurate, and customized responses.
Knowledge Bases for Amazon Bedrock takes care of the entire ingestion workflow of converting your documents into embeddings (vector) and storing the embeddings in a specialized vector database. Knowledge Bases for Amazon Bedrock supports popular databases for vector storage, including vector engine for Amazon OpenSearch Serverless, Pinecone, Redis Enterprise Cloud, Amazon Aurora (coming soon), and MongoDB (coming soon). If you do not have an existing vector database, Amazon Bedrock creates an OpenSearch Serverless vector store for you.
via - https://aws.amazon.com/bedrock/knowledge-bases/
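For a sense of the fully managed RAG flow, here is a minimal boto3 sketch that queries an existing knowledge base; the knowledge base ID is a placeholder, and the model ARN is one example of a supported foundation model:

```python
import boto3

# Minimal sketch: retrieval and generation in a single managed call.
client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = client.retrieve_and_generate(
    input={"text": "What is our refund policy?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB_ID_PLACEHOLDER",  # placeholder
            "modelArn": ("arn:aws:bedrock:us-east-1::foundation-model/"
                         "anthropic.claude-3-sonnet-20240229-v1:0"),
        },
    },
)
print(response["output"]["text"])  # answer grounded in retrieved context
```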
Incorrect options:
(a) Guardrails for Amazon Bedrock - Guardrails for Amazon Bedrock help you implement safeguards for your generative AI applications based on your use cases and responsible AI policies. It helps control the interaction between users and FMs by filtering undesirable and harmful content, redacts personally identifiable information (PII), and enhances content safety and privacy in generative AI applications. You cannot use guardrails to implement RAG workflow in Amazon Bedrock.
(b) Watermark detection for Amazon Bedrock - The watermark detection mechanism allows you to identify images generated by Amazon Titan Image Generator, a foundation model that allows users to create realistic, studio-quality images in large volumes and at low cost, using natural language prompts. With watermark detection, you can increase transparency around AI-generated content by mitigating harmful content generation and reducing the spread of misinformation. You cannot use a watermark detection mechanism to implement RAG workflow in Amazon Bedrock.
(c) Continued pretraining in Amazon Bedrock - In the continued pretraining process, you provide unlabeled data to pre-train a model by familiarizing it with certain types of inputs. You can provide data from specific topics to expose a model to those areas. The continued pretraining process will tweak the model parameters to accommodate the input data and improve its domain knowledge. You can use continued pretraining or fine-tuning for model customization in Amazon Bedrock. You cannot use continued pretraining to implement RAG workflow in Amazon Bedrock.
References:
https://aws.amazon.com/bedrock/faqs/
https://aws.amazon.com/bedrock/knowledge-bases/
https://aws.amazon.com/about-aws/whats-new/2024/04/watermark-detection-amazon-titan-image-generator-bedrock/
You are a Large Language Model (LLM) developer at a company. The company wants to migrate to AWS Cloud. Which AWS services would you recommend for developing LLMs? (Select two)
(a) Amazon Q
(b) Amazon SageMaker JumpStart
(c) AWS Trainium
(d) AWS Inferentia
(e) Amazon Bedrock
Correct options:
(b) Amazon SageMaker JumpStart
(e) Amazon Bedrock
Large language models (LLMs) are very large deep learning models that are pre-trained on vast amounts of data. The underlying transformer is a set of neural networks that consist of an encoder and a decoder with self-attention capabilities. The encoder and decoder extract meanings from a sequence of text and understand the relationships between words and phrases in it.
Large language models (LLMs) are one class of Foundation Models. For example, OpenAI’s generative pre-trained transformer (GPT) models are LLMs. LLMs are specifically focused on language-based tasks such as summarization, text generation, classification, open-ended conversation, and information extraction.
AWS recommends Amazon Bedrock and Amazon SageMaker JumpStart as the best-fit services for developing LLMs.
(b) Amazon SageMaker JumpStart
Amazon SageMaker JumpStart is a machine learning hub with foundation models, built-in algorithms, and prebuilt ML solutions that you can deploy with just a few clicks. With SageMaker JumpStart, you can access pre-trained models, including foundation models, to perform tasks like article summarization and image generation. Pretrained models are fully customizable for your use case with your data, and you can easily deploy them into production with the user interface or SDK.
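As an illustration of the SDK path, here is a minimal sketch assuming the SageMaker Python SDK; the model ID is one example of a JumpStart foundation model, and availability may vary:

```python
from sagemaker.jumpstart.model import JumpStartModel

# Minimal sketch: deploy a pre-trained JumpStart foundation model
# to a real-time endpoint and send it a prompt.
model = JumpStartModel(model_id="huggingface-llm-falcon-7b-instruct-bf16")
predictor = model.deploy()   # provisions a SageMaker endpoint

print(predictor.predict({"inputs": "What is a foundation model?"}))

predictor.delete_endpoint()  # clean up to stop incurring charges
```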
(e) Amazon Bedrock
Amazon Bedrock is the easiest way to build and scale generative AI applications with foundation models. Amazon Bedrock is a fully managed service that makes foundation models from Amazon and leading AI startups available through an API, so you can choose from various FMs to find the model that’s best suited for your use case. With Bedrock, you can speed up developing and deploying scalable, reliable, and secure generative AI applications without managing infrastructure.
Incorrect options:
(a) Amazon Q - Amazon Q is a generative AI–powered assistant for accelerating software development and leveraging companies’ internal data. Amazon Q generates code, tests, and debugs. It has multistep planning and reasoning capabilities that can transform and implement new code generated from developer requests.
(c) AWS Trainium - AWS Trainium is the machine learning (ML) chip that AWS purpose-built for deep learning (DL) training of 100B+ parameter models. Each Amazon Elastic Compute Cloud (Amazon EC2) Trn1 instance deploys up to 16 Trainium accelerators to deliver a high-performance, low-cost solution for DL training in the cloud.
(d) AWS Inferentia - AWS Inferentia is an ML chip purpose-built by AWS to deliver high-performance inference at a low cost. AWS Inferentia accelerators are designed by AWS to deliver high performance at the lowest cost in Amazon EC2 for your deep learning (DL) and generative AI inference applications.
References:
https://aws.amazon.com/what-is/large-language-model/
https://aws.amazon.com/bedrock/
https://aws.amazon.com/sagemaker/jumpstart/
https://aws.amazon.com/q/
https://aws.amazon.com/machine-learning/trainium/
Which of the following techniques is used by Foundation models to create labels from input data?
(a) Reinforcement learning
(b) Self-supervised learning
(c) Supervised learning
(d) Unsupervised learning
Correct option:
(b) Self-supervised learning
Foundation models use self-supervised learning to create labels from input data. In self-supervised learning, models are provided with vast amounts of raw, mostly or completely unlabeled data, and the models then generate the labels themselves. This means no one has instructed or trained the model with labeled training data sets.
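A toy sketch of the idea: in next-token prediction (the self-supervision objective behind many LLMs), the labels come straight from the raw text itself, with tokenization simplified here to whitespace splitting:

```python
# Raw, unlabeled text
raw_text = "foundation models learn from unlabeled data"
tokens = raw_text.split()

# Each next token becomes the label for the context before it, so
# (input, label) pairs are generated with no human annotation.
examples = [(tokens[:k], tokens[k]) for k in range(1, len(tokens))]
for context, label in examples:
    print(context, "->", label)
```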
Incorrect options:
(a) Reinforcement learning - Reinforcement learning is a method with reward values attached to the different steps that the algorithm must go through. So the model’s goal is to accumulate as many reward points as possible and eventually reach an end goal.
(c) Supervised learning - In supervised learning, models are supplied with labeled and defined training data to assess for correlations. The sample data specifies both the input and the output for the model. For example, images of handwritten figures are annotated to indicate which number they correspond to. A supervised learning system could recognize the clusters of pixels and shapes associated with each number, given sufficient examples.
Data labeling is the process of categorizing input data with its corresponding defined output values. Labeled training data is required for supervised learning. For example, millions of apple and banana images would need to be tagged with the words “apple” or “banana.” Then machine learning applications could use this training data to guess the name of the fruit when given a fruit image.
(d) Unsupervised learning - Unsupervised learning algorithms train on unlabeled data. They scan through new data, trying to establish meaningful connections between the inputs and predetermined outputs. They can spot patterns and categorize data. For example, unsupervised algorithms could group news articles from different news sites into common categories like sports, crime, etc. They can use natural language processing to comprehend meaning and emotion in the article.
References:
https://aws.amazon.com/what-is/foundation-models/
https://aws.amazon.com/what-is/machine-learning/
Which of the following represents the correct statement regarding Amazon Q Developer?
(a) Amazon Q Developer can neither be used in the integrated development environments (IDEs) nor the AWS Management Console
(b) Amazon Q Developer can only be used in the integrated development environments (IDEs)
(c) Amazon Q Developer can only be used in the AWS Management Console
(d) Amazon Q Developer can be used in integrated development environments (IDEs) as well as the AWS Management Console
Correct option:
(d) Amazon Q Developer can be used in integrated development environments (IDEs) as well as the AWS Management Console
You can use Amazon Q Developer in the AWS Management Console, AWS Console Mobile Application, AWS Marketing website, AWS Documentation website, and chat channels integrated with AWS Chatbot to ask questions about AWS. You can ask Amazon Q about AWS architecture, best practices, support, and documentation. Amazon Q can also help with code that you’re writing with the AWS SDKs and AWS Command Line Interface (AWS CLI).
You can also use Amazon Q Developer in integrated development environments (IDEs) to learn about AWS and get assistance with your software development needs. In IDEs, Amazon Q includes capabilities to provide guidance and support across various aspects of software development, such as answering questions about building on AWS, generating and updating code, security scanning, and optimizing and refactoring code.
Incorrect options:
(a) Amazon Q Developer can neither be used in the integrated development environments (IDEs) nor the AWS Management Console
(b) Amazon Q Developer can only be used in the integrated development environments (IDEs)
(c) Amazon Q Developer can only be used in the AWS Management Console
These three options contradict the explanation provided above, so these are incorrect.
References:
https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/q-on-aws.html
https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/q-in-IDE.html
Which of the following is correct regarding Foundation Models (FMs) in the context of generative AI?
(a) FMs use labeled training data sets for supervised learning
(b) FMs use unlabeled training data sets for self-supervised learning
(c) FMs use unlabeled training data sets for supervised learning
(d) FMs use labeled training data sets for self-supervised learning
Correct option:
(b) FMs use unlabeled training data sets for self-supervised learning
In supervised learning, you train the model with a set of input data and a corresponding set of paired labeled output data. Unsupervised machine learning is when you give the algorithm input data without any labeled output data. Then, on its own, the algorithm identifies patterns and relationships in and between the data. Self-supervised learning is a machine learning approach that applies unsupervised learning methods to tasks usually requiring supervised learning. Instead of using labeled datasets for guidance, self-supervised models create implicit labels from unstructured data.
Foundation models use self-supervised learning to create labels from input data. This means no one has instructed or trained the model with labeled training data sets.
via - https://aws.amazon.com/what-is/foundation-models/
Incorrect options:
(a) FMs use labeled training data sets for supervised learning
(c) FMs use unlabeled training data sets for supervised learning
(d) FMs use labeled training data sets for self-supervised learning
These three options contradict the explanation provided above, so these options are incorrect.
References:
https://aws.amazon.com/what-is/foundation-models/
https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-foundation-models-fine-tuning.html
https://aws.amazon.com/compare/the-difference-between-machine-learning-supervised-and-unsupervised/
Which of the following are correct statements regarding the AWS Global Infrastructure? (Select two)
(a) Each AWS Region consists of two or more Edge Locations
(b) Each Availability Zone (AZ) consists of two or more discrete data centers
(c) Each Availability Zone (AZ) consists of one or more discrete data centers
(d) Each AWS Region consists of a minimum of two Availability Zones (AZ)
(e) Each AWS Region consists of a minimum of three Availability Zones (AZ)
Correct options:
(c) Each Availability Zone (AZ) consists of one or more discrete data centers
(e) Each AWS Region consists of a minimum of three Availability Zones (AZ)
AWS has the concept of a Region, which is a physical location around the world where AWS clusters its data centers. AWS calls each group of logical data centers an Availability Zone (AZ). Each AWS Region consists of a minimum of three isolated and physically separate AZs within a geographic area. Each AZ has independent power, cooling, and physical security and is connected via redundant, ultra-low-latency networks.
An Availability Zone (AZ) is one or more discrete data centers with redundant power, networking, and connectivity in an AWS Region. All AZs in an AWS Region are interconnected with high-bandwidth, low-latency networking, over fully redundant, dedicated metro fiber providing high-throughput, low-latency networking between AZs.
AWS Regions and Availability Zones Overview: via - https://aws.amazon.com/about-aws/global-infrastructure/regions_az/
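As a quick way to see this structure, here is a minimal boto3 sketch that lists the Availability Zones visible in one Region (assumes configured AWS credentials):

```python
import boto3

# Minimal sketch: list the Availability Zones in a given Region.
ec2 = boto3.client("ec2", region_name="us-east-1")

for az in ec2.describe_availability_zones()["AvailabilityZones"]:
    print(az["ZoneName"], az["State"])
```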
Incorrect options:
(a) Each AWS Region consists of two or more Edge Locations
(b) Each Availability Zone (AZ) consists of two or more discrete data centers
(d) Each AWS Region consists of a minimum of two Availability Zones (AZ)
These three options contradict the details provided earlier in the explanation, so these options are incorrect.
Reference:
https://aws.amazon.com/about-aws/global-infrastructure/regions_az/
In the context of the shared responsibility model for AWS cloud services, which of the following best describes the division of responsibilities between the customer and AWS?
(a) AWS is responsible for the security “of” the cloud, including infrastructure, hardware, and software, while the customer is responsible for security “in” the cloud, including data, applications, and access management
(b) AWS is responsible for configuring and managing the security settings of the customer’s applications, while the customer is responsible for the underlying hardware infrastructure
(c) Customers are responsible for ensuring the physical security of data centers, while AWS is responsible for monitoring network traffic and managing user identities
(d) AWS handles all security aspects including data encryption, user access management, and application security, while the customer only needs to manage their virtual machines
Correct option:
(a) AWS is responsible for the security “of” the cloud, including infrastructure, hardware, and software, while the customer is responsible for security “in” the cloud, including data, applications, and access management
In the shared responsibility model, AWS is responsible for the security of the cloud, which includes the physical security of data centers, networking infrastructure, and hardware. The customer is responsible for security in the cloud, which includes securing their data, managing access and identity, configuring network settings, and ensuring application security.
Shared Responsibility Model Overview: via - https://aws.amazon.com/compliance/shared-responsibility-model/
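As one concrete example of a customer-side (security “in” the cloud) responsibility, here is a minimal boto3 sketch enforcing default encryption on an S3 bucket; the bucket name is a placeholder:

```python
import boto3

# Minimal sketch: the customer, not AWS, opts their data into
# encryption at rest by setting a default encryption rule on
# their own bucket.
s3 = boto3.client("s3")

s3.put_bucket_encryption(
    Bucket="my-app-data-bucket",  # placeholder
    ServerSideEncryptionConfiguration={
        "Rules": [{
            "ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "aws:kms"}
        }]
    },
)
```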
Incorrect options:
(b) AWS is responsible for configuring and managing the security settings of the customer’s applications, while the customer is responsible for the underlying hardware infrastructure - AWS manages the underlying infrastructure, including the hardware, but the customer is responsible for configuring and managing the security settings of their applications.
(c) Customers are responsible for ensuring the physical security of data centers, while AWS is responsible for monitoring network traffic and managing user identities - Customers do not manage the physical security of data centers; AWS is responsible for this aspect of security.
(d) AWS handles all security aspects including data encryption, user access management, and application security, while the customer only needs to manage their virtual machines - AWS does not handle all aspects of security; customers must manage their own data encryption, user access management, and application security.
Reference:
https://aws.amazon.com/compliance/shared-responsibility-model/
What is the sequence of steps in the machine learning process?
(a) Data preprocessing, Model evaluation, Model training, Data collection
(b) Model training, Data collection, Data preprocessing, Model evaluation
(c) Data collection, Data preprocessing, Model training, Model evaluation
(d) Model evaluation, Model training, Data collection, Data preprocessing
Correct option:
(c) Data collection, Data preprocessing, Model training, Model evaluation
The machine learning process typically follows these steps:
1. Data Collection: Gathering the necessary data from various sources.
2. Data Preprocessing: Cleaning and preparing the data for training, including handling missing values, normalizing data, and splitting it into training and test sets.
3. Model Training: Using the preprocessed data to train a machine learning algorithm, resulting in a trained model.
4. Model Evaluation: Assessing the performance of the trained model using a separate test set to ensure it generalizes well to new, unseen data.
via - https://aws.amazon.com/what-is/machine-learning/
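The four phases map naturally onto a few library calls. Here is a sketch, assuming scikit-learn and its bundled iris dataset as the “collected” data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1. Data collection
X, y = load_iris(return_X_y=True)

# 2. Data preprocessing: split into train/test sets and normalize
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# 3. Model training
model = LogisticRegression(max_iter=200).fit(X_train, y_train)

# 4. Model evaluation on held-out data
print(accuracy_score(y_test, model.predict(X_test)))
```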
Incorrect options:
(a) Data preprocessing, Model evaluation, Model training, Data collection - Data collection should be the first step, not the last.
(b) Model training, Data collection, Data preprocessing, Model evaluation - Data collection should precede model training.
(d) Model evaluation, Model training, Data collection, Data preprocessing - Model evaluation should come after model training, not before data collection and preprocessing.
References:
https://aws.amazon.com/what-is/machine-learning/
How would you differentiate between overfitting and underfitting in the context of machine learning?
(a) Overfitting is desirable as it ensures the model captures all nuances in the training data, while underfitting is desirable as it ensures the model generalizes well to new data
(b) Overfitting occurs when a model performs well on the training data but poorly on new, unseen data, while underfitting occurs when a model performs poorly on both the training data and new, unseen data
(c) Overfitting and underfitting both refer to a model performing equally well on both the training data and new, unseen data
(d) Overfitting occurs when a model is too simple to capture the underlying patterns in the data, while underfitting occurs when a model is too complex and captures noise rather than the actual patterns
Correct option:
(b) Overfitting occurs when a model performs well on the training data but poorly on new, unseen data, while underfitting occurs when a model performs poorly on both the training data and new, unseen data
Overfitting happens when a model learns the training data too well, including noise and outliers, leading to excellent performance on the training data but poor generalization to new, unseen data. Underfitting occurs when a model is too simplistic to capture the underlying patterns in the data, resulting in poor performance on both the training data and new data.
via - https://docs.aws.amazon.com/machine-learning/latest/dg/model-fit-underfitting-vs-overfitting.html
Incorrect options:
(a) Overfitting is desirable as it ensures the model captures all nuances in the training data, while underfitting is desirable as it ensures the model generalizes well to new data - Overfitting is generally undesirable because it reduces a model’s ability to generalize to new data, while underfitting is also undesirable because the model fails to capture important patterns in the data.
(c) Overfitting and underfitting both refer to a model performing equally well on both the training data and new, unseen data - Both overfitting and underfitting are problematic, leading to poor model performance on new data, and not equal performance on training and new data.
(d) Overfitting occurs when a model is too simple to capture the underlying patterns in the data, while underfitting occurs when a model is too complex and captures noise rather than the actual patterns - This option reverses the definitions of overfitting and underfitting.
References:
https://docs.aws.amazon.com/machine-learning/latest/dg/model-fit-underfitting-vs-overfitting.html
https://aws.amazon.com/what-is/overfitting/