AI Practitioner Terms Flashcards

1
Q

is a centralized logging service that monitors AWS resources and stores application logs and performance metrics. You can use it to monitor and observe resources.

A

Amazon CloudWatch

2
Q

Data preparation, transformation, and feature engineering tool

A

SageMaker Data Wrangler

3
Q

These are a feature of SageMaker that you can use to record information about ML models, such as training details, evaluation metrics, and model performance.

A

Amazon SageMaker Model Cards

4
Q

You can use this to build ML models without needing to write any code. It does not have any models that can perform content moderation of creative content types.

A

SageMaker Canvas

5
Q

is a service that uses a human workforce to create accurate labels for data that you can use to train models. It does not store information about model training and performance for audit purposes.

A

SageMaker Ground Truth

6
Q

This monitors the quality of SageMaker machine learning models in production. You can set up continuous monitoring with a real-time endpoint (or a batch transform job that runs regularly), or on-schedule monitoring for asynchronous batch transform jobs.

A

SageMaker Model Monitor

7
Q

is a feature that you can use when you create generative AI applications. They can automatically call Amazon Bedrock APIs and can enhance foundation model (FM) performance. They do not store information about model training and performance for audit purposes. They handle task coordination and can leverage retrieval-augmented generation (RAG).

A

Agents for Amazon Bedrock

8
Q

is a feature that helps manage generative AI applications. They filter out unwanted topics or content and add safeguards to the model. They do not store information about model training and performance for audit purposes.

A

Guardrails for Amazon Bedrock

9
Q

offers a suite of integrated development environments (IDEs), including JupyterLab, RStudio, and Visual Studio Code - Open Source (Code-OSS). You can use it to build content moderation models that can handle creative content types. However, this solution requires additional operational overhead.

A

SageMaker Studio

10
Q

is a fully managed AI service for image and video analysis. You can use it to identify inappropriate content in images, including drawings, paintings, and animations. It is designed specifically for content moderation of creative content types. Additionally, you can access it directly through an API. Therefore, it requires the least operational overhead.

A

Amazon Rekognition

11
Q

Indicates that the model is not making erroneous assumptions about the training data and is not paying attention to noise in the training data. This is an ideal outcome for model training and would not result in overfitting or underfitting.

A

Low bias & Low variance

12
Q

Indicates that the model is not making erroneous assumptions about the training data but is paying attention to noise in the training data and is overfitting: the model performs well on training data but fails to generalize to new data.

A

Low bias & High variance

12
Q

Indicates that the model is making erroneous assumptions about the training data but is not paying attention to noise in the training data, which leads to underfitting.

A

High bias & Low variance

12
Q

is an audit resource that provides on-demand access to security and compliance documentation for the AWS Cloud.

A

AWS Artifact

12
Q

provides resources and recommendations for cost optimization, security, and resilience. It evaluates your AWS environment, compares environment settings with best practices, and recommends actions to remediate any deviation from best practices.

A

Trusted Advisor

13
Q

is a service that tracks user activity and API usage on AWS. You can use it for audit purposes to record actions taken by users, roles, and services in your AWS account.

A

CloudTrail

15
Q

uses ML to discover, monitor, and protect sensitive data that is stored in Amazon S3. You can use it to identify and protect PII. You can use it to comply with data governance and privacy regulations

A

Macie

16
Q

provides an overview of your AWS resource configurations. You can use it to identify how resources were configured in the past. It can identify settings that do not meet compliance standards, such as an S3 bucket that is publicly accessible.

A

AWS Config

17
Q

is a fully managed, native JSON document database. You can use it to operate critical document workloads at scale without the need to manage infrastructure. It supports vector search, which you can use to store, index, and search millions of vectors with millisecond response times. It can perform real-time similarity queries with low latency.

A

Amazon DocumentDB

18
Q

is a fully managed service that you can use to deploy, scale, and operate search clusters on AWS. You can use its vector database capabilities for many purposes. For example, you can implement semantic search, retrieval-augmented generation (RAG) with large language models (LLMs), recommendation engines, and multimedia search. It supports storing vector embeddings for similarity search with low latency, and it can scale to store millions of embeddings and support high query throughput.

A

OpenSearch Service

19
Q

is a feature of SageMaker that helps you explain how a model makes predictions and whether datasets or models reflect bias. It also includes a library to evaluate FM performance. The foundation model evaluation (FMEval) library includes tools to compare FM quality and responsibility metrics, including bias and toxicity scores. FMEval can use built-in test datasets, or you can provide a test dataset that is specific to your use case.

A

SageMaker Clarify

20
Q

is a hub that consists of hundreds of open source pre-trained models for a wide range of problem types. However, a company cannot add its own models to it.

A

SageMaker JumpStart

21
Q

is a fully managed catalog for ML models. You can use it to manage model versions, associate metadata with models, and manage model approval status. You can use SageMaker Canvas to push built models to it. SageMaker Studio users can then access the registry and the models in it. This solution requires the least operational overhead because the company needs only to register the models to implement the workflow.

A

SageMaker Model Registry

22
are vector representations of content that capture semantic relationships: content with similar meanings has close vector representations. They are a crucial component of text generation models. An embedding model generates a numerical representation of the text; the output is an array of numerical values.
Embeddings
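For example, a minimal boto3 sketch of generating an embedding through Amazon Bedrock (the Titan model ID and request shape below are assumptions; adjust for the model you actually use):

import boto3
import json

bedrock = boto3.client("bedrock-runtime")

# Assumed model ID and request shape for Amazon Titan text embeddings
response = bedrock.invoke_model(
    modelId="amazon.titan-embed-text-v1",
    body=json.dumps({"inputText": "AWS AI Practitioner flashcards"}),
)
payload = json.loads(response["body"].read())
embedding = payload["embedding"]  # an array of numerical values
print(len(embedding), embedding[:5])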
23
is a vulnerability management service that continuously scans workloads for software vulnerabilities and unintended network exposure. It assesses the security and compliance of your AWS resources by performing automated security checks based on best practices and common vulnerabilities. It can assess EC2 instances and Amazon ECR repositories to provide detailed findings and recommendations for remediation. You can use it to maintain a secure and compliant AWS environment.
Amazon Inspector
24
is a service that uses natural language processing (NLP) to extract insights from documents. It can use built-in or custom models to analyze text in real time. You can recognize entities, extract key phrases, detect dominant languages, detect and redact PII, determine sentiment, detect targeted sentiment, or analyze syntax.
Amazon Comprehend
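A minimal boto3 sketch of sentiment detection with this service (the example text is illustrative):

import boto3

comprehend = boto3.client("comprehend")

# Detect the dominant sentiment of a piece of text in real time
result = comprehend.detect_sentiment(
    Text="The new checkout flow is fast and easy to use.",
    LanguageCode="en",
)
print(result["Sentiment"], result["SentimentScore"])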
25
is a service that you can use to add document text detection and analysis to applications. You can use it to identify handwritten text, to extract text from documents, and to extract specific information from documents. It does not provide access to FMs.
Amazon Textract
26
is an intelligent search service that provides answers to questions based on the data that is provided. It uses semantic and contextual understanding to provide specific answers. It does not provide access to FMs.
Amazon Kendra
27
is a generative AI virtual assistant that can answer questions, summarize content, generate content, and complete tasks based on the data that is provided. It does not provide access to FMs.
Amazon Q Business
28
is the process of optimizing the output of a large language model, so it references an authoritative knowledge base outside of its training data sources before generating a response.
Retrieval-Augmented Generation (RAG)
29
refers to the process of taking a pre-trained language model and further training it on a specific task or domain-specific dataset. This allows the model to adapt its knowledge and capabilities to better suit the requirements of the business use case.
Fine-tuning
30
A type of fine-tuning that uses human feedback data, resulting in a model that is better aligned with human preferences.
Reinforcement learning from human feedback (RLHF)
31
To adapt FMs with knowledge more relevant to a domain, you can use this approach, which leverages vast sets of unlabeled data. With it, you can expand the model's understanding to include the language used in your domain and improve the model's overall competency for your business. Also called domain-adaptation fine-tuning.
Continued Pre-Training
32
This approach is a method where a model developed for one task is reused as the starting point for a model on a second task. Widely used for image classification
Transfer learning
33
is a prompt engineering technique that breaks down a complex question into smaller parts. It is the recommended technique for arithmetic and logical tasks that require reasoning.
Chain of thought
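A minimal sketch of such a prompt (the wording is illustrative, not a fixed template):

# Ask the model to reason step by step before answering
prompt = (
    "A store sold 23 apples in the morning and 18 in the afternoon. "
    "Each apple costs $0.50. How much revenue did the store make? "
    "Think step by step: first find the total apples sold, "
    "then multiply by the price per apple."
)
# Expected reasoning: 23 + 18 = 41 apples; 41 * 0.50 = $20.50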
34
Used for conversational voice and text. It is a fully managed artificial intelligence (AI) service with advanced natural language models to design, build, test, and deploy conversational interfaces in applications.
Amazon Lex
35
Converts speech to text
Amazon Transcribe
36
Converts text to speech
Amazon Polly
37
AWS service focused on personalized product recommendations
Amazon Personalize
38
Predicts future points in time series data
Amazon Forecast
38
Translates between 75 languages
Amazon Translate
39
AWS service that detects fraud and fraudulent activities, and checks online transactions, product reviews, checkouts, and payments
Amazon Fraud Detector
40
Unfair prejudice or preference that favors or disfavors a person or group.
Bias
41
Core Dimension of Responsible AI: Ensuring AI models are unbiased and do not discriminate against individuals or groups based on protected characteristics
Fairness
42
When a model performs well on training data but fails to generalize to new data; Low bias and high variance: Low bias indicates that the model is not making erroneous assumptions about the training data. High variance indicates that the model is paying attention to noise in the training data and is overfitting.
Overfitting
43
Occurs when a model performs poorly on both the training data and new, unseen data; high bias and low variance.
Underfitting
44
Core Dimension of Responsible AI: This refers to the characteristic of an AI model to clearly explain or provide justification for its internal mechanisms and decisions so that it is understandable to humans. It helps to understand WHY the model made the decision that it made, and it gives insight into the limitations of a model.
Explainability
45
is a feature of model transparency. It is the degree to which a human can understand the cause of a decision. It provides access into a system so that a human can interpret the model's output based on the weights and features. Users might misinterpret the model's output, which could lead to incorrect conclusions or decisions.
Interpretability
46
Predicts the customer turnover (churn) rate for a telecommunications company; creates a text sentiment analysis application. It does not generate new content.
Traditional ML model
47
Develops a large patent repository of English-to-French translations that includes image processing; builds unique, realistic images or videos from text prompts and descriptions for advertising and marketing campaigns.
Generative AI model
48
A set of metrics used to evaluate automatic summarization of texts, in addition to machine translation quality, in NLP. It is widely used because it is not complex, it is interpretable, and it correlates reasonably well with human judgment, especially when evaluating the recall aspect of summaries.
Recall-Oriented Understudy for Gisting Evaluation (ROUGE)
49
A metric used to evaluate the quality of text that has been machine-translated from one natural language to another. It is fundamentally a precision metric: it checks how many words or phrases in the machine translation appear in the reference translations.
Bilingual Evaluation Understudy (BLEU)
50
is a parameter used in language models to limit the selection of tokens to the K most probable options during text generation, controlling the balance between diversity and predictability in the output. It lets data scientists fine-tune the model's creativity and coherence when deploying and using language models through SageMaker's infrastructure, often in combination with other sampling techniques.
Top-K
51
is a setting that controls the diversity of the text by limiting the number of words that the model can choose from based on their cumulative probabilities. It is set on a scale of 0 to 1.
Top P
52
With a low value (like 0.25), the model will only consider words that make up the top 25% of the total probability distribution. This can help the output be more focused and coherent because the model is limited to choosing from the most probable words given the context.
Low Top P
53
With a high value (like 0.99), the model will consider a broad range of possible words for the next word in the sequence because it will include words that make up the top 99% of the total probability distribution. This can lead to more diverse and creative outputs because the model has a wider pool of words to choose from.
High top P (0.99)
54
Score that balances precision and recall by combining them in a single metric; you can use it to evaluate classification models. F1 = 2 * P * R / (P + R), where P = precision and R = recall.
F1
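A minimal sketch of this formula, together with the accuracy, precision, recall, and false positive rate metrics defined on the cards below, computed from raw counts:

def classification_metrics(tp, fp, fn, tn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    fpr = fp / (fp + tn)
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": accuracy, "fpr": fpr}

# Example: 80 TP, 20 FP, 10 FN, 90 TN
# precision = 0.8, recall ~= 0.889, f1 ~= 0.842
print(classification_metrics(80, 20, 10, 90))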
55
It uses pretrained contextual embeddings from models like BERT to evaluate the quality of text-generation tasks. It computes the cosine similarity between the contextual embeddings of words in the candidate and reference text, so it can capture minor paraphrasing and synonym usage that does not affect the overall meaning conveyed by the text. It is used in cases where capturing the deeper semantic meaning of the text is important, and it measures the semantic similarity between the generated text and the reference text. Therefore, you can use it to assess the similarity between chatbot and human responses.
BERTScore (based on Bidirectional Encoder Representations from Transformers, BERT)
56
Core Dimension of Responsible AI: Ensuring AI systems operate reliably and consistently and are resilient to potential failures or adversarial attacks.
Semantic Robustness
57
is a metric that you can use to evaluate language models, particularly in the context of natural language processing (NLP) tasks. It measures how well a model predicts a given sequence of words, with lower values indicating better performance.
Perplexity
58
This is a fundamental classification metric used to evaluate the overall performance of a binary (or multi-class) classification model. It is a simple and widely used metric that provides a general understanding of how well the model is performing
Accuracy
59
It is a classification metric used to evaluate the performance of a binary classification model. It is particularly useful when the focus is on the reliability of the positive predictions made by the model. True positives/(true positives + false positives)
Precision
60
It is a classification metric used to evaluate the performance of a binary classification model. It is particularly useful when the focus is on correctly identifying instances of the positive (or target) class. This classification metric is True Positives / (True Positives + False Negatives). This metric is particularly important in scenarios where the cost of missing a positive instance (false negative) is high, such as in medical diagnostics, fraud detection, or spam filtering.
Recall
61
False Positives / (False Positives + True Negatives)
False Positive Rate (FPR)
62
This is a metric that you can use to evaluate regression models by measuring the average squared difference between the predicted and actual values. It is not suitable to evaluate the semantic similarity between chatbot and human responses.
Mean Squared Error (MSE)
62
This classification metric shows what the curve of true positives compared to false positives looks like at various thresholds. It is primarily used to evaluate the performance of binary classification models, providing a way to measure the model's ability to distinguish between two classes (typically a positive class and a negative class). It is based on the tradeoff between the true positive rate (TPR) and the false positive rate (FPR) of the model.
Area under the ROC curve (AUC-ROC)
63
A commonly used metric with linear regression problems. It explains the fraction of variance accounted for by the model. It’s like a percentage, reporting a number from 0 to 1. When this metric is close to 1, it usually indicates that a lot of the variance in the data can be explained by the model itself.
R Squared
64
This parameter in generative models is a scaling factor that controls the randomness or diversity of the generated outputs. A higher value increases the probability of sampling from less likely or lower-probability output tokens, resulting in a more diverse and unpredictable response. A lower value favors the most probable outputs, leading to more deterministic and repetitive responses. A higher value generates the most creative, random output.
Temperature
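A minimal boto3 sketch of setting temperature, Top P, and stop sequences when invoking a model through Amazon Bedrock (the Titan model ID and textGenerationConfig shape are assumptions; parameter names vary by model provider):

import boto3
import json

bedrock = boto3.client("bedrock-runtime")

body = json.dumps({
    "inputText": "Write a tagline for a coffee shop.",
    "textGenerationConfig": {
        "temperature": 0.9,         # higher = more creative, random output
        "topP": 0.5,                # sample only from the top 50% of probability mass
        "maxTokenCount": 100,
        "stopSequences": ["\n\n"],  # stop generating at a designated point
    },
})
response = bedrock.invoke_model(modelId="amazon.titan-text-express-v1", body=body)
print(json.loads(response["body"].read())["results"][0]["outputText"])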
65
A fully managed service that data scientists and developers use to quickly build, train, and deploy ML models.
SageMaker
66
A fully managed service that makes FMs from Amazon and leading AI companies available through an API. It has a broad set of capabilities to quickly build and scale genAI applications with security, privacy, and responsible AI. With its serverless experience, you can quickly get started using FMs without the need to manage any infrastructure. You can also privately customize FMs with your own data and seamlessly integrate and deploy them into your apps using AWS tools and capabilities.
Bedrock
67
A hyperparameter that controls the step size at which a model's parameters are updated during training. It determines how quickly or slowly the model's parameters are updated.
Learning Rate
68
Means that the parameters are updated by a large step size, which can lead to faster convergence but may also cause the optimization process to overshoot the optimal solution and become unstable.
A high learning rate
69
means that the parameters are updated by a small step size, which can lead to more stable convergence but at the cost of slower learning.
A low learning rate
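A minimal gradient-descent sketch showing how the learning rate scales each parameter update (toy objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)):

def train(learning_rate, steps=20, w=0.0):
    for _ in range(steps):
        grad = 2 * (w - 3)
        w = w - learning_rate * grad  # step size = learning rate * gradient
    return w

print(train(0.1))  # small steps: converges smoothly toward 3
print(train(1.1))  # steps too large: overshoots 3 and diverges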
70
Accelerate visual content creation in the cloud
Amazon Nimble Studio
71
Generative adversarial networks (GANs), variational autoencoders (VAEs), transformers, diffusion models
Gen AI Architectures
71
3D Content creation
Amazon Sumerian
72
Identify use case -> Feature engineering -> Experiment and select model -> Adapt, align, augment -> Evaluate -> Deploy and integrate -> Monitor
AI Project Lifecycle stages
73
Continually audit your AWS usage to simplify risk and compliance assessment
AWS Audit Manager
74
Real-time inference lets you deploy your model to SageMaker hosting services and get a fully managed, autoscaling endpoint that can be used for real-time inference. Serverless inference lets you deploy and scale without managing any underlying infrastructure. Asynchronous inference queues incoming large requests and processes them asynchronously. Batch transform performs inference on whole batches of data offline.
SageMaker inference
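A minimal boto3 sketch of calling a deployed real-time endpoint (the endpoint name and CSV payload are hypothetical placeholders):

import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="my-churn-model-endpoint",  # hypothetical endpoint name
    ContentType="text/csv",
    Body="42,0,1,199.95",
)
print(response["Body"].read().decode())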
75
Resource to help customers better understand AWS AI services. They are a form of responsible AI documentation that provides a single place to find information on the intended use cases and limitations, responsible AI design choices, and deployment and performance optimization best practices for AWS AI services.
AWS AI Service Cards
76
helps debug and optimize machine learning models by monitoring and profiling training jobs in real time. It does not address label inconsistencies directly.
Amazon SageMaker Debugger
77
is a metric used to evaluate the quality of automatic summarization and machine translation by measuring the overlap of n-grams (sequences of n words) between a system-generated summary and one or more reference summaries created by humans.
ROUGE-N
78
focuses on unigrams (individual words) as part of ROUGE
ROUGE-1
79
assesses bigrams (pairs of consecutive words), with higher scores indicating greater similarity between the generated and reference texts as part of ROUGE
ROUGE-2
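A minimal sketch of ROUGE-N recall (simplified: real ROUGE clips repeated n-gram counts), where n=1 gives ROUGE-1 and n=2 gives ROUGE-2:

def rouge_n_recall(candidate, reference, n=1):
    def ngrams(text, n):
        tokens = text.lower().split()
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    overlap = sum(1 for gram in ref if gram in cand)
    return overlap / len(ref)

print(rouge_n_recall("the cat sat on the mat", "the cat lay on the mat", n=1))  # 5/6
print(rouge_n_recall("the cat sat on the mat", "the cat lay on the mat", n=2))  # 3/5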
80
is a type of machine learning where the algorithm is trained on a labeled dataset, meaning the input data is paired with the correct output. The goal is for the algorithm to learn the mapping between input and output so it can accurately predict outcomes for new, unseen data.
Supervised learning
81
involves training algorithms on unlabeled data, without predefined outputs or correct answers. The goal is for the algorithm to discover hidden patterns, structures, or relationships within the data on its own, often used for clustering, dimensionality reduction, or anomaly detection.
Unsupervised learning
82
is a hybrid approach that combines elements of both supervised and unsupervised learning, using a small amount of labeled data along with a larger amount of unlabeled data. This method aims to leverage the benefits of both approaches, improving model performance when fully labeled datasets are scarce or expensive to obtain.
Semi-supervised learning
83
are specific tokens or phrases that instruct an AI model to cease generating text at a designated point, such as the end of a sentence or list. They can enhance control over output by ensuring that the generated content does not exceed the desired length or format, allowing for more structured and concise responses.
Stop sequences
84
Part of instruction-based fine-tuning; refers to a conversational interaction where the user provides a single message or query and the system responds with a single response, without any further back-and-forth exchange.
Single-Turn Messaging
85
Part of instruction-based fine-tuning for conversational use cases (chatbots); training examples alternate between user and assistant roles.
Multi-turn Messaging
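A sketch of what one multi-turn training record might look like (the field names and roles are assumptions; the exact schema depends on the fine-tuning tool):

record = {
    "messages": [
        {"role": "user", "content": "What is my order status?"},
        {"role": "assistant", "content": "Could you share your order number?"},
        {"role": "user", "content": "It is 12345."},
        {"role": "assistant", "content": "Order 12345 shipped yesterday."},
    ]
}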
86
What are the Workshop on Machine Translation (WMT), the Stanford Question Answering Dataset (SQuAD), General Language Understanding Evaluation (GLUE), and SuperGLUE?
Popular benchmark datasets
87
Amplification, interaction, algorithm, and data are types of what?
Bias
88
Type of bias that can arise from the way humans interact with AI systems or the context in which the AI is deployed. For example, if an AI system for facial recognition is primarily tested on a certain demographic group, it may perform poorly on other groups.
Interaction
89
Type of bias: AI systems can perpetuate existing societal biases if not properly designed and monitored. This can lead to unfair treatment or discrimination against certain groups, even if it was not intentional. With more adoption of AI, there is increased risk of bias growing further, especially through social media platforms.
Amplification
90
Type of bias: these and the models used in AI systems can introduce biases, even if the training data is unbiased. This can happen due to inherent assumptions or simplifications they make, in particular for underrepresented groups, or because machine learning models optimize for performance, not necessarily for fairness.
Algorithms
91
Type of bias: if the training data used to train an AI model is biased or underrepresents certain groups, the resulting model may exhibit biases in its predictions or decisions. For example, if an AI system for hiring is trained on historical data that reflects past adverse decisions toward individuals or groups based on their characteristics, it may perpetuate those biases in its recommendations.
Data
92
Challenge of GenAI: Generating content that is offensive, disturbing, or inappropriate
Toxicity
93
Challenge of GenAI: Assertions or claims that sound true but are incorrect.
Hallucinations
94
Challenge of GenAI: Worries that GenAI can be used to write college essays or writing samples for job applications.
Plagiarism and Cheating
95
Challenge of GenAI: The model might generate different outputs for the same input, which can cause problems in applications where reliability is key.
Nondeterminism
96
Challenge of GenAI: Generative AI models trained on sensitive data might inadvertently generate an output that violates regulations, such as exposing personally identifiable information (PII).
Regulatory violations
97
Challenge of GenAI: The information shared with your model can include personal information and can potentially violate privacy laws.
Data security and privacy concerns
98
Challenge of GenAI: The possibility of unwanted content that might reflect negatively on your organization is a social risk.
Social risks
99
Poisoning, hijacking and prompt injection, exposure, and prompt leaking are types of what?
Prompt Misuses
100
Prompt Misuses: Intentional introduction of malicious or biased data into the model's training dataset.
Poisoning
101
Prompt Misuses: Influencing the outputs by embedding specific instructions
Hijacking and Prompt Injection
102
Prompt Misuses: Risk of exposing sensitive or confidential information during training or inference
Exposure
103
Prompt Misuses: The unintentional disclosure of prompts or inputs used within a model. It can expose protected data or other data used by the model, such as how the model works.
Prompt Leaking
104
Circumventing the constraints and safety measures implemented in a generative model to obtain unauthorized access.
Jailbreaking
105
Human-Centered Design for Explainable AI: Minimize risk and errors in a stressful or high-pressure environment.
Design for amplified decision-making
106
Human-Centered Design for Explainable AI: Decision process is free from bias
Design for unbiased decision-making
107
Human-Centered Design for Explainable AI: Cognitive apprenticeship: AI systems learn from human instructors and experts
Design for human and AI learning
108
An approach to designing AI systems that prioritizes human needs.
Human-Centered Design for Explainable AI
109
Supervised learning: Used to predict a numeric value based on input data. The output variable is continuous, meaning it can take any value within a range.
Regression
110
Supervised learning: Used to predict the categorical label of input data. The output variable is discrete; it falls into a specific category or class.
Classification
111
This is a type of artificial neural network architecture designed to process sequential data, such as text, speech, or time series. These networks are particularly useful for tasks where the output at a given time step depends not only on the current input but also on the previous inputs and the internal state of the network.
Recurrent Neural Network RNN
112
Deep convolutional neural network used for image recognition tasks, object detection, and facial recognition.
Residual Network ResNet
113
ML algorithm for classification and regression
Support Vector Machine SVM
114
Model to generate raw audio waveform, used in speech synthesis
WaveNet
115
An implementation of gradient boosting
XGBoost
116
The process of using domain knowledge to select and transform raw data into meaningful features
Feature Engineering
117
Settings that define the model structure and the learning algorithm and process.
Hyperparameter
118
This type of prompting is a technique where a user presents a task to a generative model without providing any examples or explicit training for that specific task.
Zero-shot prompting
119
Core Dimension of Responsible AI: This refers to the practice of HOW you might communicate information about an AI system. It helps stakeholders make informed choices about their use of the system.
Transparency
120
Core Dimension of Responsible AI: Identifying and mitigating potential risks and unintended consequences associated with AI systems.
Safety
121
Core Dimension of Responsible AI: Maintaining appropriate human control and oversight over AI systems, particularly in high-stakes decision-making scenarios
Controllability
122
is a supervised learning technique used for predicting continuous values. It involves determining the relationship between a dependent variable and one or more independent variables. By analyzing the patterns in historical data, regression models can predict future outcomes, making it ideal for tasks like forecasting stock prices, real estate values, or portfolio performance.
Regression
123
refers to supervised learning models that use one or more inputs to predict a value on a continuous scale. For example, it can predict housing prices: after training a model on historical sales data that includes property characteristics, you could forecast the price of a property based on its location, age, and number of rooms.
Linear regression
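A minimal scikit-learn sketch on toy data (real use would train on historical sales records):

from sklearn.linear_model import LinearRegression

X = [[3, 10], [4, 5], [2, 40], [5, 2]]  # features: rooms, age in years
y = [250000, 340000, 150000, 450000]    # observed sale prices

model = LinearRegression().fit(X, y)
print(model.predict([[4, 20]]))  # forecast price for a 4-room, 20-year-old home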
124
This is the process of identifying data points, events, or observations that deviate significantly from the norm or expected pattern within a dataset. It is a fundamental task in data analysis and machine learning, with applications across various domains, such as fraud detection, network security, system monitoring, and predictive maintenance.
Anomaly detection
125
This is generally used to estimate the likelihood of a random variable falling within a particular range of values, not to predict future values based on historical data.
Probability density
126
This model creates new data by iteratively making controlled random changes to an initial data sample. It starts with the original data and adds subtle changes (noise), progressively making the sample less similar to the original.
Diffusion Model
127
is a technique that combines multiple individual models to produce a single, more robust and accurate prediction. The idea behind it is that by combining the strengths of different models, the resulting model can outperform any individual model.
Ensemble model
128
This network type works by training two neural networks in a competitive manner. The first network, known as the generator, generates fake data samples from random noise. The second network, called the discriminator, tries to distinguish between real data and the fake data produced by the generator.
Generative adversarial networks (GANs)
129
This uses two neural networks—the encoder and the decoder. The encoder neural network maps the input data to a mean and variance for each dimension of the latent space.
Variational autoencoders (VAE)
130
Part of fine-tuning; a method that you can use to customize a pre-trained FM by fine-tuning the model on a specific task or domain-specific information. You can use it to improve the performance of an FM by fine-tuning the model on industry-specific terminology.
Domain adaptation fine-tuning
131
This type of fine-tuning, also known as prompt-based fine-tuning or few-shot learning, is a technique used to adapt large language models (LLMs) to perform specific tasks or respond to specific instructions, often with limited training data. The key idea behind this fine-tuning is to provide the model with a set of examples or instructions that demonstrate the desired task or behavior, rather than training the model on a large dataset of labeled examples. This approach is particularly useful when the available training data is scarce or the desired task is quite specific.
Instruction-based fine-tuning
132
is a framework that you can use to classify generative AI use cases. You can use the framework to determine the level of ownership required for a use case and to prioritize security concerns.
Generative AI Security Scoping Matrix
133
You can use this classification model when the input has to be classified into one of two classes, for example to classify feedback as positive or negative.
Binary Classification Model
134
This involves the selection of data attributes or variables during the development of a predictive model. This is the ML lifecycle stage where you incorporate parameters into the model.
Feature selection
135
You can use this to create, store, share, and manage features that are used in ML models. Features are the data for an ML algorithm to learn patterns from.
SageMaker Feature Store
136
This provides quick and efficient ML model development. This provides pre-designed and optimized solutions for a variety of tasks.
SageMaker built-in algorithms
137
This automates the creation, training, and tuning of ML models. You can use it to develop high-quality models with minimal effort.
SageMaker Autopilot
138
This is an unsupervised ML method that identifies discrete groupings within data. The k-means algorithm is a type of this algorithm.
Clustering algorithm
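A minimal scikit-learn sketch of k-means on toy 2-D data:

from sklearn.cluster import KMeans

points = [[1, 1], [1.2, 0.8], [5, 5], [5.1, 4.9], [9, 1], [8.8, 1.2]]
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)           # discrete grouping assigned to each point
print(kmeans.cluster_centers_)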
139
This prompt injection attack format provides an LLM with malicious instructions that are written in an altered or non-human-readable format, like base64 encoding. This tactic is used to bypass application input filters that might prevent the model from processing harmful instructions.
changing the input
140
An LLM responds differently depending on whether it deems a user friendly or adversarial. This attack format uses friendly language to prompt an LLM with malicious instructions.
exploiting friendliness
141
This type of attack format makes the LLM adopt a new, malicious persona.
prompting persona switches
142
This measures the proportion of correctly classified instances out of all instances. This is a common metric that you can use in classification tasks to evaluate the performance of models in categorizing data into predefined categories.
Classification accuracy
143
This is the mean of the absolute differences between the actual and predicted values, divided by the actual values. You can use it in numeric predictions to understand model prediction errors, for example to predict the annual sales volume of a specific product.
Mean absolute percentage error (MAPE)
144
This measures the average absolute difference between the predicted and actual values. You can use it in numeric predictions to understand model prediction errors, for example to predict the annual sales volume of a specific product.
Mean absolute error (MAE)
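A minimal sketch of these regression-error metrics computed from raw values:

def regression_errors(actual, predicted):
    n = len(actual)
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / n
    mape = sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n
    return {"MSE": mse, "MAE": mae, "MAPE": mape}

# MSE = 200.0, MAE ~= 13.33, MAPE ~= 0.067
print(regression_errors(actual=[100, 200, 400], predicted=[110, 190, 380]))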
145
Challenges of generative AI: The proficiency with which generative AI is able to create compelling text and images, perform well on standardized tests, write entire articles on given topics, and successfully summarize or improve the grammar of provided articles has created some anxiety. There is a concern that some professions might be replaced or seriously disrupted by the technology.
Disruption of Work
146
Core Dimension of Responsible AI: This dimension in responsible AI provides a framework for building and operating AI systems and applications in a way that data is protected from theft and exposure.
Privacy and security
147
Core Dimension of Responsible AI: This dimension refers to the set of processes that are used to define, implement, and enforce responsible AI practices within an organization.
Governance
148
This is a service that helps build the workflows required for human review of ML predictions. It brings human review to all developers and removes the undifferentiated heavy lifting associated with building human review systems or managing large numbers of human reviewers.
Amazon Augmented AI (A2I)
149
With this SageMaker feature, administrators can define minimum permissions in minutes.
SageMaker Role Manager
150
With this SageMaker feature, you can keep your team informed on model behavior in production, all in one place.
SageMaker Model Dashboard
151
overcome bias and variance errors: This is a technique for evaluating ML models by training several ML models on subsets of the available input data and evaluating them on the complementary subset of the data. This should be used to detect overfitting.
Cross-validation
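A minimal scikit-learn sketch of 5-fold cross-validation on a toy dataset:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
# Each model trains on 4/5 of the data and is scored on the held-out 1/5
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores, scores.mean())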
152
overcome bias and variance errors: Add more data samples to increase the learning scope of the model.
Increase Data
153
overcome bias and variance errors: This is a method that penalizes extreme weight values to help prevent linear models from overfitting training data examples. In GenAI, this technique aims to address biases by incorporating additional constraints or penalties during the model training process.
Regularization
154
overcome bias and variance errors: Use these to help with overfitting. If the model is underfitting, the model might already be too simple.
simpler model architectures
155
overcome bias and variance errors: End training early so that the model does not memorize the data.
Stop training early
156
overcome bias and variance errors: This is an unsupervised machine learning algorithm that attempts to reduce the dimensionality (number of features) within a dataset while still retaining as much information as possible.
Dimension reduction
157
This is an approach to creating products and services that are intuitive, easy to use, and meet the needs of the people who will be using them.
Human-centered design (HCD)
158
As part of model evaluation, you can use this SageMaker feature to experiment with multiple combinations of data, algorithms, and parameters, all while observing the impact of incremental changes on model accuracy.
SageMaker Experiments
159
As part of model evaluation, hyperparameter tuning is a way to find the best version of your models. You can use this SageMaker feature to run many jobs with different hyperparameter combinations and measure each one with a metric that you choose.
SageMaker Automatic Model Tuning
160
What type of metrics are the following: accuracy, precision, recall, F1, AUC-ROC?
Classification metrics
161
What type of metrics are the following: mean squared error, R-squared?
Regression metrics
162
What type of matrix can help show why and how a model gets something wrong?
confusion matrix
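A minimal scikit-learn sketch (rows are actual classes, columns are predicted classes):

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_matrix(y_true, y_pred))
# [[3 1]    3 true negatives, 1 false positive
#  [1 3]]   1 false negative, 3 true positives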
163
What do the following make up: processing code in data preparation; training data and training code in model building; candidate models, test, and validation data in model evaluation; metadata during model selection; deployment-ready models and inference code during deployment; production code, models, and data for monitoring?
MLOps
164
What is the goal of feature engineering in the machine learning lifecycle?
To transform data and create variables for the model
165
Which solutions are part of the reinforcement learning from human feedback (RLHF) design?
Data collection; supervised fine-tuning of the language model; building a separate reward model; optimizing the LM with the reward-based model.
166
This prompt attack format asks a model through a prompt template to ignore its instructions. With this attack, a user can instruct the LLM to provide output on a prohibited or harmful topic.
ignoring the prompt template
167
Which learning technique is used by foundation models to create labels from input data?
Self-supervised learning