AI Practice Test #3 Flashcards
Model customization methods
Model customization involves further training and changing the weights of the model to enhance its performance. You can use continued pre-training or fine-tuning for model customization in Amazon Bedrock.
Continued Pre-training
Fine-tuning
Continued Pre-training
In the continued pre-training process, you provide unlabeled data to pre-train a foundation model by familiarizing it with certain types of inputs. You can provide data from specific topics to expose the model to those areas. The continued pre-training process tweaks the model's parameters to accommodate the input data and improve its domain knowledge.
For example, you can train a model on private data, such as business documents, that is not publicly available for training large language models. Additionally, you can keep improving the model by retraining it with more unlabeled data as it becomes available.
Fine-tuning
While fine-tuning a model, you provide labeled data to train a model to improve performance on specific tasks. By providing a training dataset of labeled examples, the model learns to associate what types of outputs should be generated for certain types of inputs. The model parameters are adjusted in the process and the model’s performance is improved for the tasks represented by the training dataset.
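A minimal sketch of how either customization type can be started with the boto3 Bedrock client; the job name, model names, role ARN, S3 URIs, and hyperparameter values below are placeholders for illustration only:

```python
import boto3

bedrock = boto3.client("bedrock")

response = bedrock.create_model_customization_job(
    jobName="titan-customization-job",            # hypothetical
    customModelName="titan-custom-model",         # hypothetical
    roleArn="arn:aws:iam::123456789012:role/BedrockCustomizationRole",
    baseModelIdentifier="amazon.titan-text-express-v1",
    # Use "CONTINUED_PRE_TRAINING" with unlabeled domain data,
    # or "FINE_TUNING" with labeled prompt/completion pairs.
    customizationType="FINE_TUNING",
    trainingDataConfig={"s3Uri": "s3://my-bucket/train/data.jsonl"},
    outputDataConfig={"s3Uri": "s3://my-bucket/output/"},
    # Hyperparameter names and values vary by base model; these are examples.
    hyperParameters={"epochCount": "2", "batchSize": "1", "learningRate": "0.00001"},
)
print(response["jobArn"])
```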
A company is using an Amazon Bedrock-based foundation model in a Retrieval Augmented Generation (RAG) configuration to provide tailored insights and responses based on client data stored in Amazon S3. Each team within the company is assigned to different clients and uses the foundation model to generate insights specific to their clients' data. To maintain data privacy and security, the company needs to ensure that each team can only access the model responses generated from the data of their respective clients, preventing any unauthorized access to other teams' client data.
What is the most effective approach to implement this access control and maintain data security?
The company should create a service role for Amazon Bedrock for each team, granting access only to that team's client data in Amazon S3
This is the correct approach because creating a service role for each team that has specific access to their data in Amazon S3 ensures fine-grained control over who can access which data. By assigning specific service roles to Amazon Bedrock, the company can enforce data security and privacy rules at the team level, ensuring that each team only has access to the data they are authorized to use. This method also aligns with AWS best practices for secure and controlled access management.
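As a sketch of what one team's scoped policy might look like (the bucket name, prefix layout, role name, and policy name are all hypothetical), the role assumed by Bedrock for team A would only be allowed to list and read team A's S3 prefix:

```python
import json
import boto3

iam = boto3.client("iam")

# Hypothetical layout: each team's client data lives under its own S3 prefix.
team_a_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ListTeamPrefixOnly",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::client-data-bucket",
            # Listing is restricted to team A's prefix.
            "Condition": {"StringLike": {"s3:prefix": "team-a/*"}},
        },
        {
            "Sid": "ReadTeamObjectsOnly",
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::client-data-bucket/team-a/*",
        },
    ],
}

iam.put_role_policy(
    RoleName="BedrockTeamARole",  # hypothetical service role assumed by Bedrock
    PolicyName="TeamAClientDataAccess",
    PolicyDocument=json.dumps(team_a_policy),
)
```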
Amazon SageMaker Automatic Model Tuning (AMT)
A healthcare analytics company is using Amazon SageMaker Automatic Model Tuning (AMT) to optimize its machine learning models for predicting patient outcomes. To ensure the models are performing at their best, the data science team is configuring the autotune settings but needs to understand which parameters are mandatory for successful tuning. Properly setting these configurations will allow the team to enhance model accuracy and performance efficiently.
Which of the following options is mandatory for the given use case?
None
Choosing the correct hyperparameters requires experience with machine learning techniques and can drastically affect your model's performance. Even with hyperparameter tuning, you still need to specify multiple tuning configurations, such as hyperparameter ranges, the search strategy, and the number of training jobs to launch. Getting these settings right is intricate and typically requires multiple experiments, which may incur additional training costs.
Amazon SageMaker Automatic Model Tuning can automatically choose hyperparameter ranges, search strategy, maximum runtime of a tuning job, early stopping type for training jobs, number of times to retry a training job, and model convergence flag to stop a tuning job, based on the objective metric you provide. This minimizes the time required for you to kickstart your tuning process and increases the chances of finding more accurate models with a lower budget.
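A minimal sketch using the SageMaker Python SDK, assuming its HyperparameterTuner autotune flag: with autotune enabled, only the objective metric must be supplied and the ranges, strategy, and job counts can be omitted. The image URI, role ARN, metric name, and S3 paths are placeholders:

```python
from sagemaker.estimator import Estimator
from sagemaker.tuner import HyperparameterTuner

# Hypothetical training setup; replace the placeholders with real values.
estimator = Estimator(
    image_uri="<training-image-uri>",
    role="<execution-role-arn>",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/output/",
)

# With autotune=True, AMT chooses hyperparameter ranges, search strategy,
# number of jobs, early stopping, and retry/convergence settings itself.
tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:auc",
    hyperparameter_ranges=None,  # not required when autotune is enabled
    autotune=True,
)
tuner.fit({"train": "s3://my-bucket/train/", "validation": "s3://my-bucket/validation/"})
```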
Incorrect options:
Hyperparameter ranges
Tuning strategy
Number of jobs
Serverless Inference
On-demand Serverless Inference is ideal for workloads that have idle periods between traffic spurts and can tolerate cold starts.
Amazon SageMaker Serverless Inference is a purpose-built inference option that enables you to deploy and scale ML models without configuring or managing any of the underlying infrastructure.
Serverless endpoints automatically launch compute resources and scale them in and out depending on traffic, eliminating the need to choose instance types or manage scaling policies. This takes away the undifferentiated heavy lifting of selecting and managing servers. Serverless Inference integrates with AWS Lambda to offer you high availability, built-in fault tolerance, and automatic scaling. With a pay-per-use model, Serverless Inference is a cost-effective option if you have an infrequent or unpredictable traffic pattern. During times when there are no requests, Serverless Inference scales your endpoint down to 0, helping you to minimize your costs.
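A minimal sketch of deploying a serverless endpoint with the SageMaker Python SDK; 'model' is assumed to be an existing sagemaker.model.Model object, and the memory and concurrency values are illustrative:

```python
from sagemaker.serverless import ServerlessInferenceConfig

# 'model' is assumed to be a previously built sagemaker.model.Model.
serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=2048,  # allowed values range from 1024 to 6144 MB
    max_concurrency=5,       # maximum concurrent invocations for the endpoint
)

# No instance types or scaling policies to manage; the endpoint
# scales down to zero when there is no traffic.
predictor = model.deploy(serverless_inference_config=serverless_config)
```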
Unsupervised learning
Unsupervised learning algorithms train on unlabeled data. They scan through new data and uncover meaningful patterns and groupings on their own, without being given predefined outputs. For instance, unsupervised learning algorithms could group news articles from different news sites into common categories like sports and crime.
Clustering
Dimensionality reduction
Clustering
Clustering is an unsupervised learning technique that groups certain data inputs, so they may be categorized as a whole. There are various types of clustering algorithms depending on the input data. An example of clustering is identifying different types of network traffic to predict potential security incidents.
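A minimal sketch of clustering with scikit-learn's KMeans, using randomly generated stand-ins for network traffic features:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy stand-in for network traffic features (e.g., packet size, duration).
rng = np.random.default_rng(seed=0)
traffic_features = rng.normal(size=(300, 2))

# Group the unlabeled records into 3 clusters; unusual clusters could
# then be reviewed as potential security incidents.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(traffic_features)
print(labels[:10], kmeans.cluster_centers_.shape)
```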
Dimensionality reduction
Dimensionality reduction is an unsupervised learning technique that reduces the number of features in a dataset. It’s often used to preprocess data for other machine learning functions and reduce complexity and overheads. For example, it may blur out or crop background features in an image recognition application.
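A minimal sketch of dimensionality reduction with scikit-learn's PCA on synthetic data:

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy dataset: 200 samples with 50 features.
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(200, 50))

# Reduce to 10 components, keeping most of the variance while
# cutting preprocessing cost for downstream models.
pca = PCA(n_components=10)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape, pca.explained_variance_ratio_.sum())
```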
Decision tree
The decision tree is a supervised machine learning technique that takes some given inputs and applies an if-else structure to predict an outcome. An example of a decision tree problem is predicting customer churn.
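A minimal sketch of a decision tree for the churn example, using scikit-learn and a toy labeled dataset:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy churn data: columns = [monthly_spend, support_calls], label = churned (0/1).
X = np.array([[20, 0], [25, 1], [90, 5], [85, 6], [30, 0], [95, 7]])
y = np.array([0, 0, 1, 1, 0, 1])

# The tree learns if-else splits such as "support_calls > 3 -> churn".
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)
print(tree.predict([[88, 4]]))  # likely predicts churn (1)
```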
Neural network
A neural network solution is a more complex supervised learning technique. To produce a given outcome, it takes some given inputs and performs one or more layers of mathematical transformation based on adjusting data weightings. An example of a neural network technique is predicting a digit from a handwritten image.
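A minimal sketch of a small neural network predicting digits from handwritten images, using scikit-learn's built-in digits dataset:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Handwritten digit images (8x8 pixels) flattened to 64 input features.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One hidden layer of 64 units; the weights are adjusted during training.
mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
mlp.fit(X_train, y_train)
print(f"test accuracy: {mlp.score(X_test, y_test):.2f}")
```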
Sentiment analysis
This is an example of semi-supervised learning. Semi-supervised learning is when you apply both supervised and unsupervised learning techniques to a common problem. This technique relies on using a small amount of labeled data and a large amount of unlabeled data to train systems. When considering the breadth of an organization's text-based customer interactions, it may not be cost-effective to categorize or label sentiment across all channels. An organization could train a model on the larger unlabeled portion of the data first, and then refine it with the smaller labeled sample.
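A minimal sketch of the semi-supervised idea using scikit-learn's SelfTrainingClassifier on toy data, where -1 marks the large unlabeled portion:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

# Toy sentiment features; the label -1 marks unlabeled examples.
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(100, 5))
y = np.full(100, -1)
y[:10] = rng.integers(0, 2, size=10)  # only 10 labeled examples

# The classifier first trains on the labeled sample, then iteratively
# pseudo-labels confident predictions from the unlabeled pool.
model = SelfTrainingClassifier(LogisticRegression())
model.fit(X, y)
print(model.predict(X[:5]))
```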
Amazon SageMaker Data Wrangler
Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare tabular and image data for ML from weeks to minutes. With SageMaker Data Wrangler, you can simplify the process of data preparation and feature engineering, and complete each step of the data preparation workflow (including data selection, cleansing, exploration, visualization, and processing at scale) from a single visual interface. You can use SQL to select the data that you want from various data sources and import it quickly. Next, you can use the data quality and insights report to automatically verify data quality and detect anomalies, such as duplicate rows and target leakage.
SageMaker Data Wrangler also offers a selection of over 300 prebuilt, PySpark-based data transformations, so you can transform your data and scale your data preparation workflow without writing a single line of code. Preconfigured transformations cover common use cases such as flattening JSON files, deleting duplicate rows, imputing missing data with mean or median, one-hot encoding, and time-series-specific transformers to accelerate the preparation of time-series data for ML.
Amazon SageMaker Clarify
SageMaker Clarify helps identify potential bias during data preparation without writing code. You specify input features, such as gender or age, and SageMaker Clarify runs an analysis job to detect potential bias in those features.
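A minimal sketch of a pre-training bias analysis with the SageMaker Python SDK's clarify module; the role ARN, S3 paths, and column names are placeholders:

```python
from sagemaker import clarify

# Placeholders: replace the role ARN, bucket paths, and columns with real values.
processor = clarify.SageMakerClarifyProcessor(
    role="<execution-role-arn>",
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

data_config = clarify.DataConfig(
    s3_data_input_path="s3://my-bucket/train.csv",
    s3_output_path="s3://my-bucket/clarify-output/",
    label="outcome",
    headers=["age", "gender", "outcome"],
    dataset_type="text/csv",
)

# Check whether the 'gender' feature is associated with a skewed label distribution.
bias_config = clarify.BiasConfig(
    label_values_or_threshold=[1],
    facet_name="gender",
)

processor.run_pre_training_bias(data_config=data_config, data_bias_config=bias_config)
```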
Amazon SageMaker Ground Truth
Amazon SageMaker Ground Truth offers the most comprehensive set of human-in-the-loop capabilities, allowing you to harness the power of human feedback across the ML lifecycle to improve the accuracy and relevancy of models. You can complete a variety of human-in-the-loop tasks with SageMaker Ground Truth, from data generation and annotation to model review, customization, and evaluation, either through a self-service or an AWS-managed offering.
Amazon SageMaker Feature Store
Amazon SageMaker Feature Store is a fully managed, purpose-built repository to store, share, and manage features for machine learning (ML) models. Features are inputs to ML models used during training and inference. For example, in an application that recommends a music playlist, features could include song ratings, listening duration, and listener demographics.
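A minimal sketch of creating and populating a feature group with the SageMaker Python SDK, reusing the playlist example; the feature group name, S3 URI, and role ARN are placeholders:

```python
import pandas as pd
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

session = sagemaker.Session()

# Hypothetical playlist-recommendation features; the identifier column uses
# the pandas 'string' dtype so feature types can be inferred.
df = pd.DataFrame(
    {
        "listener_id": pd.Series(["u1", "u2"], dtype="string"),
        "song_rating": [4.5, 3.0],
        "listening_minutes": [120.0, 45.0],
        "event_time": [1700000000.0, 1700000001.0],
    }
)

feature_group = FeatureGroup(name="listener-features", sagemaker_session=session)
feature_group.load_feature_definitions(data_frame=df)  # infers types from the DataFrame

feature_group.create(
    s3_uri="s3://my-bucket/feature-store/",  # offline store location
    record_identifier_name="listener_id",
    event_time_feature_name="event_time",
    role_arn="<execution-role-arn>",         # placeholder
    enable_online_store=True,                # low-latency reads at inference time
)
feature_group.ingest(data_frame=df, max_workers=1, wait=True)
```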
Which of the following performance metrics would you recommend to the team for evaluating the effectiveness of its classification system?
Precision, Recall and F1-Score
Precision, Recall, and F1-Score are standard performance metrics used to evaluate the effectiveness of a classification system:
Precision measures the fraction of predicted positives that are truly positive.
Recall measures the fraction of actual positives that the model correctly identifies.
F1-Score is the harmonic mean of Precision and Recall, balancing the two.
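As a quick illustration with scikit-learn (toy labels and predictions; the defining formulas are in the comments):

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Toy ground-truth labels vs. model predictions for a binary classifier.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# precision = TP / (TP + FP): how many predicted positives were correct
# recall    = TP / (TP + FN): how many actual positives were found
# F1        = 2 * precision * recall / (precision + recall)
print(precision_score(y_true, y_pred))  # 0.75 (3 TP, 1 FP)
print(recall_score(y_true, y_pred))     # 0.75 (3 TP, 1 FN)
print(f1_score(y_true, y_pred))         # 0.75
```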