AI Practice Test #3 Flashcards

1
Q

Model customization methods

A

Model customization involves further training and changing the weights of the model to enhance its performance. You can use continued pre-training or fine-tuning for model customization in Amazon Bedrock.

Continued Pre-training

Fine-tuning

2
Q

Continued Pre-training

A

In the continued pre-training process, you provide unlabeled data to pre-train a foundation model by familiarizing it with certain types of inputs. You can provide data from specific topics to expose a model to those areas. The Continued Pre-training process will tweak the model parameters to accommodate the input data and improve its domain knowledge.

For example, you can train a model with private data, such as business documents, that are not publicly available for training large language models. Additionally, you can continue to improve the model by retraining the model with more unlabeled data as it becomes available.

3
Q

Fine-tuning

A

While fine-tuning a model, you provide labeled data to train a model to improve performance on specific tasks. By providing a training dataset of labeled examples, the model learns to associate what types of outputs should be generated for certain types of inputs. The model parameters are adjusted in the process and the model’s performance is improved for the tasks represented by the training dataset.
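
As a rough illustration of the labeled versus unlabeled distinction, the sketch below writes both kinds of training records as JSON Lines. The field names (`prompt`/`completion` for labeled examples, `input` for unlabeled text) and the sample records are assumptions for illustration; check the dataset format required by the specific model before relying on them.

```python
import json


def to_jsonl(records):
    """Serialize a list of dicts as JSON Lines, one record per line."""
    return "\n".join(json.dumps(record) for record in records)


# Labeled examples (fine-tuning): each input is paired with the desired output.
# Field names are an assumed prompt/completion convention.
labeled = [
    {"prompt": "Classify the sentiment: The device stopped working.", "completion": "Negative"},
    {"prompt": "Classify the sentiment: Setup took two minutes.", "completion": "Positive"},
]

# Unlabeled examples (continued pre-training): raw domain text only.
unlabeled = [
    {"input": "Internal policy document text goes here."},
]

fine_tuning_jsonl = to_jsonl(labeled)
pretraining_jsonl = to_jsonl(unlabeled)
print(fine_tuning_jsonl)
```

The only structural difference is that the labeled records carry a target output, which is what lets the model learn an input-to-output mapping for a task.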

4
Q

A company is using an Amazon Bedrock-based foundation model in a Retrieval Augmented Generation (RAG) configuration to provide tailored insights and responses based on client data stored in Amazon S3. Each team within the company is assigned to different clients and uses the foundation model to generate insights specific to their clients’ data. To maintain data privacy and security, the company needs to ensure that each team can only access the model responses generated from the data of their respective clients, preventing any unauthorized access to other teams’ client data.

What is the most effective approach to implement this access control and maintain data security?

A

The company should create a service role for Amazon Bedrock for each team, granting access only to that team’s client data in Amazon S3.

This is the correct approach because creating a service role for each team that has specific access to their data in Amazon S3 ensures fine-grained control over who can access which data. By assigning specific service roles to Amazon Bedrock, the company can enforce data security and privacy rules at the team level, ensuring that each team only has access to the data they are authorized to use. This method also aligns with AWS best practices for secure and controlled access management.
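
As a minimal sketch of what one such per-team role might contain, the snippet below builds a trust policy and a permissions policy as plain Python dictionaries. The bucket name, the one-prefix-per-team layout, and the function names are assumptions for illustration; the scenario does not specify them.

```python
import json


def bedrock_trust_policy() -> dict:
    """Trust policy that lets the Amazon Bedrock service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "bedrock.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def team_s3_policy(bucket: str, team_prefix: str) -> dict:
    """Permissions policy limited to one team's prefix in the bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read objects only under this team's prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{team_prefix}/*",
            },
            {
                # List the bucket, but only for keys under this team's prefix.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{team_prefix}/*"]}},
            },
        ],
    }


# Hypothetical bucket and prefix names.
policy = team_s3_policy("example-client-data", "team-a")
print(json.dumps(policy, indent=2))
```

Creating one role per team with a policy like this keeps the S3 scope of each Bedrock configuration separate, so a role for one team cannot read another team's prefix.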

5
Q

Amazon SageMaker Automatic Model Tuning (AMT)

A healthcare analytics company is using Amazon SageMaker Automatic Model Tuning (AMT) to optimize its machine learning models for predicting patient outcomes. To ensure the models are performing at their best, the data science team is configuring the autotune settings but needs to understand which parameters are mandatory for successful tuning. Properly setting these configurations will allow the team to enhance model accuracy and performance efficiently.

Which of the following options is mandatory for the given use case?

A

None

Choosing the correct hyperparameters requires experience with machine learning techniques and can drastically affect your model performance. Even with hyperparameter tuning, you still need to specify multiple tuning configurations, such as hyperparameter ranges, search strategy, and number of training jobs to launch. Getting these settings right is intricate and typically requires multiple experiments, which may incur additional training costs.

Amazon SageMaker Automatic Model Tuning can automatically choose hyperparameter ranges, search strategy, maximum runtime of a tuning job, early stopping type for training jobs, number of times to retry a training job, and model convergence flag to stop a tuning job, based on the objective metric you provide. This minimizes the time required for you to kickstart your tuning process and increases the chances of finding more accurate models with a lower budget.

Incorrect options:

Hyperparameter ranges

Tuning strategy

Number of jobs

6
Q

Serverless Inference

A

Serverless Inference

On-demand Serverless Inference is ideal for workloads that have idle periods between traffic spurts and can tolerate cold starts.

Amazon SageMaker Serverless Inference is a purpose-built inference option that enables you to deploy and scale ML models without configuring or managing any of the underlying infrastructure.

Serverless endpoints automatically launch compute resources and scale them in and out depending on traffic, eliminating the need to choose instance types or manage scaling policies. This takes away the undifferentiated heavy lifting of selecting and managing servers. Serverless Inference integrates with AWS Lambda to offer you high availability, built-in fault tolerance, and automatic scaling. With a pay-per-use model, Serverless Inference is a cost-effective option if you have an infrequent or unpredictable traffic pattern. During times when there are no requests, Serverless Inference scales your endpoint down to 0, helping you to minimize your costs.
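
To make the "no instance types" point concrete, here is a minimal sketch of an endpoint configuration request that uses a serverless variant. The dictionary follows the shape of the SageMaker `CreateEndpointConfig` request; the helper name, the resource names, and the memory and concurrency values are placeholders chosen for illustration.

```python
def serverless_endpoint_config(config_name: str, model_name: str,
                               memory_mb: int = 2048, max_concurrency: int = 5) -> dict:
    """Build a CreateEndpointConfig-style request with a serverless variant."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                # No InstanceType or InstanceCount: capacity is expressed
                # only as memory size and maximum concurrent invocations.
                "ServerlessConfig": {
                    "MemorySizeInMB": memory_mb,
                    "MaxConcurrency": max_concurrency,
                },
            }
        ],
    }


# Hypothetical names; nothing is sent to AWS here.
config = serverless_endpoint_config("demo-serverless-config", "demo-model")
print(config)
```

A request like this could be passed to the boto3 SageMaker client as `create_endpoint_config(**config)`; the sketch only builds the dictionary so it runs without AWS credentials.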

7
Q

Unsupervised learning

A

Unsupervised learning algorithms train on unlabeled data. They scan through new data and try to establish meaningful connections or groupings among the inputs, without being given predetermined outputs. For instance, unsupervised learning algorithms could group news articles from different news sites into common categories like sports and crime.

Clustering

Dimensionality reduction

8
Q

Clustering

A

Clustering is an unsupervised learning technique that groups certain data inputs, so they may be categorized as a whole. There are various types of clustering algorithms depending on the input data. An example of clustering is identifying different types of network traffic to predict potential security incidents.
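
As one possible illustration, the sketch below runs a tiny k-means loop on one-dimensional data. K-means is only one of many clustering algorithms, and the request sizes and starting centroids are made-up values.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Tiny k-means on 1-D data: assign to nearest centroid, then re-average."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical request sizes in KB: routine traffic plus a few large bursts.
sizes = [1.0, 1.2, 0.9, 1.1, 50.0, 52.0, 49.0]
centroids, clusters = kmeans_1d(sizes, centroids=[0.0, 10.0])
print(clusters)
```

No labels are supplied: the two groups emerge only from how close the values are to each other, which is what makes this an unsupervised technique.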

9
Q

Dimensionality reduction

A

Dimensionality reduction is an unsupervised learning technique that reduces the number of features in a dataset. It’s often used to preprocess data for other machine learning functions and reduce complexity and overheads. For example, it may blur out or crop background features in an image recognition application.

10
Q

Decision tree

A

The decision tree is a supervised machine learning technique that takes some given inputs and applies an if-else structure to predict an outcome. An example of a decision tree problem is predicting customer churn.
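
The if-else structure can be sketched by hand as below. In practice the splits and thresholds are learned from labeled training data; the feature names and cut-off values here are invented purely for illustration.

```python
def predict_churn(months_as_customer: int, support_tickets: int, on_contract: bool) -> bool:
    """Hand-written tree; thresholds are illustrative, not learned from data."""
    if on_contract:
        # Contract customers are predicted to churn only after many tickets.
        return support_tickets > 5
    if months_as_customer < 6:
        # New customers without a contract are treated as likely to churn.
        return True
    return support_tickets > 2


print(predict_churn(months_as_customer=3, support_tickets=0, on_contract=False))
```

Each path from the first condition to a `return` corresponds to one root-to-leaf path in the tree.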

11
Q

Neural network

A

A neural network solution is a more complex supervised learning technique. To produce a given outcome, it takes some given inputs and performs one or more layers of mathematical transformation based on adjusting data weightings. An example of a neural network technique is predicting a digit from a handwritten image.
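
The "layers of mathematical transformation" can be sketched as a forward pass through a very small network. The layer sizes, weights, and the sigmoid activation are arbitrary choices for illustration; a real network would learn its weights during training.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """One layer: weighted sum of the inputs per neuron, then a sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Pass the inputs through each (weights, biases) layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = dense(activations, weights, biases)
    return activations


# Hypothetical weights: 2 inputs -> 2 hidden neurons -> 1 output.
layers = [
    ([[0.5, -0.5], [0.25, 0.75]], [0.0, 0.0]),
    ([[1.0, -1.0]], [0.0]),
]
output = forward([1.0, 1.0], layers)
print(output)
```

Training consists of adjusting the numbers in `layers` so that the output moves closer to the labeled target for each example.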

12
Q

Sentiment analysis

A

This is an example of semi-supervised learning. Semi-supervised learning is when you apply both supervised and unsupervised learning techniques to a common problem. This technique relies on using a small amount of labeled data and a large amount of unlabeled data to train systems. When considering the breadth of an organization’s text-based customer interactions, it may not be cost-effective to categorize or label sentiment across all channels. An organization could train a model on the larger unlabeled portion of data first, and then a sample that has been labeled.

13
Q

Amazon SageMaker Data Wrangler

A

Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare tabular and image data for ML from weeks to minutes. With SageMaker Data Wrangler, you can simplify the process of data preparation and feature engineering, and complete each step of the data preparation workflow (including data selection, cleansing, exploration, visualization, and processing at scale) from a single visual interface. You can use SQL to select the data that you want from various data sources and import it quickly. Next, you can use the data quality and insights report to automatically verify data quality and detect anomalies, such as duplicate rows and target leakage.

SageMaker Data Wrangler offers a selection of over 300 prebuilt, PySpark-based data transformations, so you can transform your data and scale your data preparation workflow without writing a single line of code. Preconfigured transformations cover common use cases such as flattening JSON files, deleting duplicate rows, imputing missing data with mean or median, one hot encoding, and time-series–specific transformers to accelerate the preparation of time-series data for ML.

14
Q

Amazon SageMaker Clarify

A

SageMaker Clarify helps identify potential bias during data preparation without writing code. You specify input features, such as gender or age, and SageMaker Clarify runs an analysis job to detect potential bias in those features.

15
Q

Amazon SageMaker Ground Truth

A

Amazon SageMaker Ground Truth offers the most comprehensive set of human-in-the-loop capabilities, allowing you to harness the power of human feedback across the ML lifecycle to improve the accuracy and relevancy of models. You can complete a variety of human-in-the-loop tasks with SageMaker Ground Truth, from data generation and annotation to model review, customization, and evaluation, either through a self-service or an AWS-managed offering.

16
Q

Amazon SageMaker Feature Store

A

Amazon SageMaker Feature Store is a fully managed, purpose-built repository to store, share, and manage features for machine learning (ML) models. Features are inputs to ML models used during training and inference. For example, in an application that recommends a music playlist, features could include song ratings, listening duration, and listener demographics.

17
Q

Which of the following performance metrics would you recommend to the team for evaluating the effectiveness of its classification system?

A

Precision, Recall and F1-Score

Precision, Recall, and F1-Score are standard performance metrics used to evaluate the effectiveness of a classification system.

18
Q

Precision

A

Measures the accuracy of the positive predictions, calculated as the ratio of true positives to the sum of true positives and false positives.

19
Q

Recall (Sensitivity)

A

Measures the ability of the classifier to identify all positive instances, calculated as the ratio of true positives to the sum of true positives and false negatives.

20
Q

F1-Score

A

The harmonic mean of Precision and Recall, providing a single metric that balances both concerns.
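
The definitions of Precision, Recall, and F1-Score can be worked through with a small helper. The counts in the example are made up, and returning 0.0 when a denominator is zero is a convention chosen here, not something the cards specify.

```python
def classification_metrics(tp: int, fp: int, fn: int) -> dict:
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    # Precision: of everything predicted positive, how much was correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything actually positive, how much was found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives.
metrics = classification_metrics(tp=8, fp=2, fn=8)
print(metrics)
```

With these counts precision is 8/10 and recall is 8/16, so the F1-Score sits between the two but closer to the lower value, which is the usual behavior of a harmonic mean.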

21
Q

Amazon SageMaker Clarify

A

You can use SageMaker Clarify to identify potential bias in data preparation, allowing you to detect and measure bias in datasets and models to ensure fairness and transparency in machine learning applications.

SageMaker Clarify is specifically designed to help identify and mitigate bias in machine learning models and datasets. It provides tools to analyze both data and model predictions to detect potential bias, generate reports, and help ensure that models are fair and transparent. It can help identify and measure bias within the data preparation stage and throughout the model’s lifecycle. This capability is essential for building trustworthy AI systems that do not inadvertently discriminate against specific groups.

22
Q

Batch inference

A

Batch inference is the most suitable choice for processing a large payload of several gigabytes with Amazon SageMaker when there is no need for immediate responses. This method allows the company to run predictions on large volumes of data in a single batch job, which is more cost-effective and efficient than processing individual requests in real-time. Batch inference can handle large datasets and is ideal for scenarios where waiting for the responses is acceptable, making it the best fit for this use case.

SageMaker Batch Transform will automatically split your input file of several gigabytes (GBs) into whatever payload size is specified if you use “SplitType”: “Line” and “BatchStrategy”: “MultiRecord”.
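
A minimal sketch of where those two settings sit in a transform job request is shown below. The dictionary follows the shape of the SageMaker `CreateTransformJob` request; the helper name, job and model names, S3 URIs, content type, instance type, and payload limit are placeholder assumptions.

```python
def batch_transform_request(job_name: str, model_name: str,
                            input_s3_uri: str, output_s3_uri: str,
                            max_payload_mb: int = 6) -> dict:
    """Build a CreateTransformJob-style request that splits input by line."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        # Pack as many records as fit under MaxPayloadInMB into each request.
        "BatchStrategy": "MultiRecord",
        "MaxPayloadInMB": max_payload_mb,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3_uri}
            },
            "ContentType": "text/csv",
            # Treat each line of the input file as one record.
            "SplitType": "Line",
        },
        "TransformOutput": {"S3OutputPath": output_s3_uri, "AssembleWith": "Line"},
        "TransformResources": {"InstanceType": "ml.m5.xlarge", "InstanceCount": 1},
    }


# Hypothetical names and locations; nothing is sent to AWS here.
request = batch_transform_request(
    "demo-batch-job", "demo-model",
    "s3://example-bucket/input/", "s3://example-bucket/output/",
)
print(request["BatchStrategy"], request["TransformInput"]["SplitType"])
```

A request like this could be passed to the boto3 SageMaker client as `create_transform_job(**request)`; the sketch only builds the dictionary so it runs without AWS credentials.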

23
Q

Data security and compliance aspects of Amazon Bedrock

A

The company’s data is not used to improve the base Foundation Models (FMs) and it is not shared with any model providers

Amazon Bedrock is a fully managed service that makes high-performing foundation models (FMs) from leading AI startups and Amazon available for your use through a unified API. Using Amazon Bedrock, you can easily experiment with and evaluate top foundation models for your use cases, privately customize them with your data using techniques such as fine-tuning and Retrieval Augmented Generation (RAG), and build agents that execute tasks using your enterprise systems and data sources.

With Amazon Bedrock, your content is not used to improve the base models and is not shared with any model providers.

Your data in Amazon Bedrock is always encrypted in transit and at rest, and you can optionally encrypt the data using your own keys. You can use AWS PrivateLink with Amazon Bedrock to establish private connectivity between your FMs and your Amazon Virtual Private Cloud (Amazon VPC) without exposing your traffic to the Internet.

24
Q

AWS Artifact

A

The company should use AWS Artifact to facilitate on-demand access to AWS compliance reports and agreements, as well as allow users to receive notifications when new compliance documents or reports, including ISV compliance reports, are available

This is the correct option because AWS Artifact is specifically designed to provide access to a wide range of AWS compliance reports, including those from Independent Software Vendors (ISVs). AWS Artifact allows users to configure settings to receive notifications when new compliance documents or reports are available. This capability makes it an ideal choice for a company that needs timely email alerts regarding the availability of ISV compliance reports.

The new third-party reports tab on the AWS Artifact Reports page provides on-demand access to security compliance reports of Independent Software Vendors (ISVs) who sell their products through AWS Marketplace.

You can subscribe to notifications and create configurations to get notified when a new report or agreement, or a new version of an existing report or agreement becomes available on AWS Artifact.

25
Q

AWS Audit Manager

A

AWS Audit Manager is focused on helping users automate evidence collection for auditing purposes and assess their AWS environment against specific compliance frameworks. It does not offer functionality for accessing or receiving notifications about ISV compliance reports from AWS.

26
Q

AWS Trusted Advisor

A

AWS Trusted Advisor is a service that provides guidance to optimize AWS resources by analyzing security, cost, performance, and fault tolerance. However, it does not provide features for managing or receiving notifications about compliance reports, including ISV compliance reports. Therefore, it is not suitable for this requirement.

27
Q

AWS Config

A

AWS Config is a tool for monitoring and recording AWS resource configurations and evaluating them against desired configurations. While it is useful for maintaining configuration compliance, it does not deal with external compliance reports or provide notification capabilities for ISV compliance documents, making it irrelevant to the company’s needs.

28
Q

Amazon Q Business

A

Amazon Q Business is a fully managed, generative-AI powered assistant that you can configure to answer questions, provide summaries, generate content, and complete tasks based on your enterprise data. It allows end users to receive immediate, permissions-aware responses from enterprise data sources with citations, for use cases such as IT, HR, and benefits help desks.

Amazon Q Business is powered by Amazon Bedrock.

29
Q

Amazon Q Apps

A

Amazon Q Business allows web experience users to create lightweight, purpose-built Q Apps to fulfill specific tasks from within their web experience. For example, you can use Amazon Q Business to create an app with a web experience that exclusively generates marketing-related content to improve your marketing team’s productivity.

30
Q

Amazon SageMaker JumpStart

A

Amazon SageMaker JumpStart is a machine learning hub with foundation models, built-in algorithms, and prebuilt ML solutions that you can deploy with just a few clicks. With SageMaker JumpStart, you can access pre-trained models, including foundation models, to perform tasks like article summarization and image generation.

31
Q

Amazon Kendra

A

Amazon Kendra is an intelligent search service that uses natural language processing and advanced machine learning algorithms to return specific answers to search questions from your data.

32
Q

Amazon SageMaker Asynchronous Inference

A

Requests with large payload sizes up to 1GB and long processing times

Amazon SageMaker Asynchronous Inference is a capability in SageMaker that queues incoming requests and processes them asynchronously. This option is ideal for requests with large payload sizes (up to 1GB), long processing times (up to one hour), and near real-time latency requirements. Asynchronous Inference enables you to save on costs by autoscaling the instance count to zero when there are no requests to process, so you only pay when your endpoint is processing requests.

33
Q

Multi-class classification

A

Multi-class classification assigns each instance to one of several possible classes

ML models for multiclass classification problems allow you to generate predictions for multiple classes (predict one of more than two outcomes).

34
Q

Multi-label classification

A

Multi-label classification assigns each instance to one or more classes

With multi-label classification, you can train models and classify your documents with more than one label. For example, you can use multi-label classification to categorize customer contact transcripts with one or more labels to identify departments within your company like Payments, Renewals, or Tech Support. These labels can then be mapped to relevant content in your support library or directed towards the appropriate contacts within your company.
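
The difference between picking exactly one class and picking any number of labels can be sketched as follows. The raw scores, the 0.5 threshold, and the use of softmax and sigmoid are common conventions assumed here for illustration, not details taken from the cards.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_multiclass(scores, labels):
    """Multi-class: return the single most probable label."""
    probs = softmax(scores)
    return labels[probs.index(max(probs))]


def predict_multilabel(scores, labels, threshold=0.5):
    """Multi-label: return every label whose own probability passes the threshold."""
    return [
        label for label, s in zip(labels, scores)
        if 1.0 / (1.0 + math.exp(-s)) >= threshold
    ]


labels = ["Payments", "Renewals", "Tech Support"]
scores = [2.0, 1.5, -3.0]  # hypothetical model outputs for one transcript

single = predict_multiclass(scores, labels)
several = predict_multilabel(scores, labels)
print(single, several)
```

With the same scores, the multi-class prediction is forced to choose one department, while the multi-label prediction can route the transcript to both Payments and Renewals.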

35
Q

Techniques, in increasing order of complexity, for improving the performance of a Foundation Model (FM) used in Amazon Bedrock

A

Prompt engineering

Retrieval Augmented Generation (RAG)

Fine-tuning

36
Q

Prompt engineering

A

Prompt engineering is the practice of carefully designing prompts to efficiently tap into the capabilities of FMs. It involves the use of prompts, which are short pieces of text that guide the model to generate more accurate and relevant responses. With prompt engineering, you can improve the performance of FMs and make them more effective for a variety of applications.
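
One common prompt-engineering pattern is a few-shot prompt, sketched below as a plain string builder. The instruction wording, the example reviews, and the `Review:`/`Sentiment:` layout are illustrative assumptions; no model is called.

```python
def build_prompt(instruction, examples, query):
    """Assemble an instruction, worked examples, and the new input into one prompt."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # End with the new input and an open slot for the model to complete.
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


prompt = build_prompt(
    "Classify each review as Positive or Negative.",
    [("Battery lasts all day.", "Positive"), ("Screen cracked in a week.", "Negative")],
    "Arrived late but works.",
)
print(prompt)
```

The model's weights are untouched; only the text it receives changes, which is why prompt engineering is the least complex of the customization options.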

37
Q

Retrieval Augmented Generation (RAG)

A

Retrieval Augmented Generation (RAG) allows you to customize a model’s responses when you want the model to consider new knowledge or up-to-date information. When your data changes frequently, like inventory or pricing, it’s not practical to fine-tune and update the model while it’s serving user queries. To equip the FM with up-to-date proprietary information, organizations turn to RAG, a technique that involves fetching data from company data sources and enriching the prompt with that data to deliver more relevant and accurate responses.

RAG produces higher-quality results because it augments the prompt with use case-specific context drawn directly from vectorized data stores. Compared to prompt engineering alone, it yields considerably better results with a much lower chance of hallucinations. RAG is more complex than prompt engineering because implementing it requires coding and architecture skills.
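
The fetch-then-enrich flow can be sketched with a toy retriever. Real RAG systems typically rank documents by vector-embedding similarity; this sketch substitutes simple word overlap so it runs with the standard library only, and the documents, question, and prompt wording are invented.

```python
def tokenize(text):
    """Lowercase the text and split it into a set of words, ignoring basic punctuation."""
    for ch in "?.,":
        text = text.replace(ch, " ")
    return set(text.lower().split())


def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:top_k]


def build_rag_prompt(question, documents, top_k=1):
    """Enrich the prompt with the retrieved context before it goes to the model."""
    context = "\n".join(retrieve(question, documents, top_k))
    return (
        "Use only the context below to answer.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical company data that changes too often to fine-tune on.
documents = [
    "The blue widget costs 12 dollars as of this week.",
    "Returns are accepted within 30 days of purchase.",
    "The warehouse holds 40 red widgets in stock.",
]
question = "How much does the blue widget cost?"
rag_prompt = build_rag_prompt(question, documents)
print(rag_prompt)
```

Updating the answer only requires changing the stored documents, not retraining the model.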

38
Q

Fine-tuning

A

Fine-tuning is the process of taking a pre-trained FM, such as Llama 2, and further training it on a downstream task with a dataset specific to that task. The pre-trained model provides general linguistic knowledge, and fine-tuning allows it to specialize and improve performance on a particular task like text classification, question answering, or text generation. With fine-tuning, you provide labeled datasets—which are annotated with additional context—to train the model on specific tasks.

Model customization, such as fine-tuning, has higher complexity than prompt engineering and RAG because the model’s weights and parameters are changed via tuning scripts, which requires data science and ML expertise.

39
Q

Convolutional Neural Networks (CNNs)

A

Convolutional Neural Networks (CNNs) are specifically designed for processing and classifying image data. They use convolutional layers to automatically and adaptively learn spatial hierarchies of features from input images, making them highly effective for tasks such as image recognition and classification.

40
Q

Recurrent Neural Networks (RNNs)

A

Recurrent Neural Networks (RNNs) are typically used for sequence data, such as time series or natural language processing tasks. RNNs are not the best fit for image classification.

41
Q

Generative Adversarial Networks (GANs)

A

Generative Adversarial Networks (GANs) are used for generating new data that resembles the training data, such as creating realistic images, but are not specifically designed for image classification.

42
Q

Retrieval-Augmented Generation (RAG)

A

Retrieval-Augmented Generation (RAG) is the process of optimizing the output of a large language model, so it references an authoritative knowledge base outside of its training data sources before generating a response. While the RAG framework itself is not a single neural network, it integrates multiple neural network components to enhance text generation tasks. RAG is not designed for image classification.

43
Q

Deep learning

A

Deep learning is a subset of machine learning that uses neural networks with many layers to learn from large amounts of data, while traditional machine learning algorithms often require feature extraction and can use various methods such as decision trees or support vector machines

Deep learning is a subset of machine learning that employs neural networks with multiple layers (hence the term “deep”) to automatically learn representations from large datasets. Traditional machine learning, on the other hand, often involves manual feature extraction and can use a variety of algorithms like decision trees, support vector machines, and linear regression.

44
Q

Machine learning - Feature Determination

A

In traditional machine learning, a data scientist manually determines the set of relevant features that the software must analyze, whereas in deep learning, the data scientist gives only raw data to the software and the deep learning network derives the features by itself

Traditional machine learning methods require human input for the machine learning software to work sufficiently well. A data scientist manually determines the set of relevant features that the software must analyze. This limits what the software can learn and makes it tedious to create and manage.

On the other hand, in deep learning, the data scientist gives only raw data to the software. The deep learning network derives the features by itself and learns more independently. It can analyze unstructured datasets like text documents, identify which data attributes to prioritize, and solve more complex problems.