AWS AI Practitioner Flashcards

Question 1

Q

A portion of the training data is labeled, and feedback is provided in the form of rewards or penalties. What type of learning is this?

Answer

A

Reinforcement learning

Question 2

Q

What are the two types of inferencing?

Answer

A

Batch and Real time

Question 3

Q

Deep learning is commonly used in which two use cases?

Answer

A

Computer vision and NLP

Question 4

Q

What are FMs in generative AI?

Answer

A

Foundation models: large, pre-trained models that can be adapted to many different tasks

Question 5

Q

What are Transformer models?

Answer

A

They build on the encoder-decoder concept in generative AI. They use self-attention to process input data. Self-attention allows the model to weigh the importance of different words in a sentence when encoding a particular word.
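A minimal sketch of the self-attention idea above, using scaled dot-product attention in plain Python. The token vectors are made-up toy numbers, and as a simplifying assumption the queries, keys, and values are the input vectors themselves; a real transformer first multiplies them by learned weight matrices and uses several attention heads.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with Q = K = V = vectors (toy assumption)."""
    d = len(vectors[0])
    outputs, all_weights = [], []
    for q in vectors:
        # Score every token against the current token, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in vectors]
        weights = softmax(scores)  # how much each token matters when encoding this one
        all_weights.append(weights)
        # The output is the attention-weighted sum of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(d)])
    return outputs, all_weights

# Three hypothetical 2-dimensional token vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of weights sums to 1: token 0 attends equally to itself and to token 2 (both dot products are 1) and less to token 1 (dot product 0).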

Question 6

Q

Are FMs pre-trained using reinforcement learning? True or False?

Answer

A

False. FMs are typically pre-trained through self-supervised learning

Question 7

Q

Where are pretext tasks used?

Answer

A

In self-supervised learning

Question 8

Q

Self-supervised learning makes use of the structure within the data to autogenerate labels. True or False?

Answer

A

True

Question 9

Q

Optimization of pre-trained FMs is done using what?

Answer

A

Prompt engineering,
Retrieval-augmented generation (RAG),
Fine-tuning on task-specific data

Question 10

Q

LLMs, diffusion models, and multimodal models are examples of what?

Answer

A

These are types of FMs (foundation models)

Question 11

Q

What are the numerical representations of tokens, where each token is assigned a vector (a list of numbers) that captures its meaning and relationships with other tokens?

Answer

A

Embeddings

Question 12

Q

What is a context window?

Answer

A

The maximum number of tokens an LLM can take into account when generating text

Question 13

Q

What is a vector?

Answer

A

It is an array of numerical values

Question 14

Q

What is the process of vectorization?

Answer

A

Text -> [Tokenization]->Tokens -> [Embeddings Model] -> Vectors
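The pipeline above can be sketched in a few lines. The whitespace tokenizer and the tiny embedding table are hypothetical stand-ins; a real system uses a subword tokenizer and a trained embeddings model whose vectors have hundreds or thousands of dimensions.

```python
# Hypothetical 3-dimensional embedding table (a real embeddings model learns these values).
EMBEDDINGS = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.2, 0.0],
    "chase": [0.1, 0.9, 0.3],
}
UNKNOWN = [0.0, 0.0, 0.0]  # fallback vector for out-of-vocabulary tokens

def tokenize(text):
    # Toy tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def embed(tokens):
    # Embeddings step: look up one vector per token.
    return [EMBEDDINGS.get(t, UNKNOWN) for t in tokens]

tokens = tokenize("Cats chase dogs")
vectors = embed(tokens)
print(tokens)   # ['cats', 'chase', 'dogs']
print(vectors)
```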

Question 15

Q

What is the process of vectorization in Bedrock KBs using RAG?

Answer

A

Customer KB->[Upload in Amazon S3]->[Select a vector DB]->[Select a Model]->[Sync with customer KB]->Vectorization of Customer KB text

Question 16

Q

What is Watermark detection for Amazon Bedrock?

Answer

A

Identify images generated by Amazon Titan Image Generator, a foundation model that allows users to create realistic, studio-quality images in large volumes and at low cost, using natural language prompts

Question 17

Q

What is continued pretraining in Amazon Bedrock?

Answer

A

You provide unlabeled data to pre-train a model by familiarizing it with certain types of inputs

Question 18

Q

Which models start from noise and gradually add more and more meaningful information to it until they end up with a clear and coherent output, like an image or a piece of text?

Answer

A

Diffusion model

Question 19

Q

Which model has generator and discriminator?

Answer

A

Generative adversarial networks

Question 20

Q

Which model has encoders and decoders?

Answer

A

Variational autoencoders (VAEs)

Question 21

Q

What are the components of prompt engineering?

Answer

A

Instructions, Context, Input data and Output indicator

Question 22

Q

What are nondeterministic LLMs popularly called?

Answer

A

Generative Language Models

Question 23

Q

What is a supervised learning process that involves taking a pre-trained model and adding specific, smaller datasets?

Answer

A

Fine tuning

Question 24

Q

What are the two types of fine-tuning?

Answer

A

Instruction fine-tuning and Reinforcement learning from human feedback (RLHF)

Question 25

Q

Does fine-tuning add weight to the data?

Question 26

Q

What is Retrieval-augmented generation (RAG)?

Answer

A

Supplies domain-relevant data as context to produce responses based on that data.

Question 27

Q

To use fine-tuned models in Bedrock, what is the pricing option?

Answer

A

Provisioned Throughput only, which is billed by the hour

Question 28

Q

How is RAG different from fine-tuning?

Answer

A

Rather than having to fine-tune an FM with a small set of labeled examples, RAG retrieves a small set of relevant documents and uses that to provide context to answer the user prompt

Question 29

Q

What are two types of supervised learning?

Answer

A

Classification and Regression

Question 30

Q

Predicting continuous or numerical values based on one or more input variables is called what?

Answer

A

Regression
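A minimal regression sketch: ordinary least squares fitting a straight line to one input variable. The data points are hypothetical and chosen to lie exactly on y = 2x + 1 so the fitted values are easy to check.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one input variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: ad spend (x) vs. sales (y).
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)        # 2.0 1.0
print(slope * 5 + intercept)   # predicted value for x = 5: 11.0
```

The output is a continuous number (11.0), not a class label, which is what separates regression from classification.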

Question 31

Q

Forecasting uses which supervised learning technique?

Answer

A

Regression

Question 32

Q

Diagnostics uses which supervised learning technique?

Answer

A

Classification

Question 33

Q

What are two types of unsupervised learning?

Answer

A

Clustering and Dimensionality reduction

Question 34

Q

What are examples of vector databases for RAG?

Answer

A

Amazon OpenSearch Service (k-NN capability, vector embeddings),
DynamoDB (high performance, vector embeddings),
Aurora (RDS),
RDS for PostgreSQL (RDS and open source),
Neptune (graph database)

Question 35

Q

Grouping of unstructured data is done in which type of unsupervised learning?

Answer

A

Clustering

Question 36

Q

Reducing the number of features or dimensions in a dataset is done in which type of unsupervised learning?

Answer

A

Dimensionality reduction

Question 37

Q

Which learning type continuously improves its model by mining feedback from previous iterations?

Answer

A

Reinforcement learning

Question 38

Q

In which learning the reward of a desired outcome is known, but the path to achieving it isn’t?

Answer

A

Reinforcement learning

Question 39

Q

How can toxicity risk be reduced in generative AI?

Answer

A

Use guardrail models

Question 40

Q

What do guardrail models do?

Answer

A

These models will detect and filter out unwanted content

Question 41

Q

What is the risk term for when model generates inaccurate responses that are not consistent with the training data?

Answer

A

Hallucinations

Question 42

Q

What is the risk term when model might generate different outputs for the same input?

Answer

A

Nondeterminism

Question 43

Q

What is the risk term when the information shared with your model can include personal information and can potentially violate privacy laws?

Answer

A

Data security and privacy concerns

Question 44

Q

What is the risk term when output generated by model has PII?

Answer

A

Regulatory violations

Question 45

Q

Which generative AI model is used for chatbots?

Question 46

Q

Which generative AI model is used for code generation?

Question 47

Q

Which generative AI model is used for gaming?

Answer

A

Stable Diffusion

Question 48

Q

Which generative AI model offers embeddings?

Answer

A

Amazon Titan

Question 49

Q

Which generative AI model has the use case of healthcare: summarizing key ideas from long text?

Question 50

Q

What are the capabilities of generative AI?

Answer

A

Mnemonic: SPARCD

Adaptability
Responsiveness
Simplicity
Creativity and exploration
Data efficiency
Personalization

Question 51

Q

Recommendation engines, gaming, and voice assistance are examples of which type of AI system?

Answer

A

Traditional AI

Question 52

Q

Chatbots, code generation, and text and image generation are examples of which type of AI system?

Answer

A

Generative AI

Question 53

Q

When is a model underfitted?

Answer

A

When the model has high bias

Question 54

Q

When does overfitting happen?

Answer

A

When the model performs well on the training data but does not perform well on the evaluation data

Question 55

Q

SageMaker Clarify

Answer

A

You can automatically evaluate FMs for your generative AI use case with metrics such as accuracy, robustness, and toxicity to support your responsible AI initiative

Question 56

Q

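SageMaker Clarify is used for text-based models only. True or False?

Answer

A

False. SageMaker Clarify also supports tabular and computer vision models, not just natural language processing (NLP) models.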

Question 57

Q

Model evaluation on Amazon Bedrock

Answer

A

Evaluate, compare, and select the best foundation model for your use case in just a few clicks

Question 58

Q

Can model evaluation be automated instead of relying on human evaluation?

Answer

A

Yes, using built-in task types

Question 59

Q

Amazon Bedrock Guardrails

Answer

A

Guardrails helps control the interaction between users and FMs by filtering undesirable and harmful content, redacting personally identifiable information (PII), and enhancing content safety and privacy in generative AI applications

Question 60

Q

Amazon SageMaker Data Wrangler

Answer

A

Offers three balancing operators: random undersampling, random oversampling, and Synthetic Minority Oversampling Technique (SMOTE) to rebalance data in your unbalanced datasets
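A minimal sketch of two of the balancing ideas named above, random oversampling and random undersampling, in plain Python. This is not Data Wrangler's API, only the underlying technique on a hypothetical imbalanced dataset; SMOTE, which synthesizes new minority rows by interpolating between neighbors instead of duplicating them, is left out for brevity.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate random minority-class rows until every class matches the majority count."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        members = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(members))
            out_labels.append(cls)
    return out_rows, out_labels

def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class so every class matches the minority count."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        members = [r for r, l in zip(rows, labels) if l == cls]
        for r in rng.sample(members, target):
            out_rows.append(r)
            out_labels.append(cls)
    return out_rows, out_labels

# Hypothetical imbalanced dataset: 6 "ok" rows and 2 "fraud" rows.
rows = list(range(8))
labels = ["ok"] * 6 + ["fraud"] * 2
_, over = random_oversample(rows, labels)
_, under = random_undersample(rows, labels)
print(Counter(over))   # 6 of each class
print(Counter(under))  # 2 of each class
```

Oversampling keeps all data but repeats minority rows (risking overfitting to them); undersampling discards majority rows (losing information).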

Question 61

Q

Amazon SageMaker Experiments

Answer

A

You can use it to create, manage, analyze, and compare your machine learning experiments.

Providing scores that detail which features contributed the most to a model prediction on a particular input, for tabular, natural language processing (NLP), and computer vision models, is a SageMaker Clarify capability rather than an Experiments capability.

Question 62

Q

Amazon A2I

Answer

A

Human review of ML predictions

Question 63

Q

SageMaker governance tools

Answer

A

Amazon SageMaker Role Manager - define minimum permissions in minutes

Amazon SageMaker Model Cards - capture, retrieve, and share essential model information, such as intended uses, risk ratings, and training details, from conception to deployment

Amazon SageMaker Model Dashboard - You can keep your team informed on model behavior in production, all in one place

Question 64

Q

AWS AI Service Cards

Answer

A

Responsible AI documentation

Basic concepts to help customers better understand the service or service features
Intended use cases and limitations
Responsible AI design considerations
Guidance on deployment and performance optimization

Answer 59

A

Value alignment
Responsible reasoning skills
Appropriate level of autonomy
Transparency and accountability

Answer 60

A

Curating datasets is the process of labeling, organizing, and preprocessing the data

Answer 61

A

Data preprocessing, augmentation and audit

Answer 62

A

AWS AI Service Cards and Amazon SageMaker Model Cards

Answer 63

A

SageMaker Clarify and SageMaker Autopilot

Answer 64

A

SageMaker Canvas, using AutoML powered by SageMaker Autopilot

Answer 65

A

Interpretability

Answer 66

A

They show how a single feature influences the predicted outcome. Used for interpretability and explainability
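Assuming this card describes partial dependence plots (PDPs), here is a minimal sketch of how the values behind such a plot are computed: force one feature to a fixed value in every row, average the model's predictions, and repeat across a grid of values. The model and rows are hypothetical.

```python
def partial_dependence(model, rows, feature_index, grid):
    """Average model prediction when feature `feature_index` is forced to each grid value."""
    curve = []
    for value in grid:
        preds = []
        for row in rows:
            modified = list(row)
            modified[feature_index] = value  # override only this one feature
            preds.append(model(modified))
        curve.append(sum(preds) / len(preds))
    return curve

# Hypothetical model: prediction = 3 * feature0 + feature1.
def toy_model(row):
    return 3 * row[0] + row[1]

rows = [[1, 10], [2, 20], [3, 30]]
print(partial_dependence(toy_model, rows, feature_index=0, grid=[0, 1, 2]))
# [20.0, 23.0, 26.0]: each unit of feature0 adds 3 to the average prediction
```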

Answer 67

A

Design for amplified decision making
Design for unbiased decision making
Design for human and AI learning

Answer 68

A

Design for amplified decision-making

Answer 69

A

Design for unbiased decision-making

Answer 70

A

Design for human and AI learning

Answer 71

A

Reinforcement learning from human feedback (RLHF) is an ML technique that uses human feedback to optimize ML models to self-learn more efficiently

Answer 72

A

Amazon SageMaker Ground Truth

Answer 73

A

It is the process of creating, transforming, extracting, and selecting variables from data to convert raw data into meaningful features

Answer 74

A

80/10/10 or 70/15/15 (percent of the data for training/validation/test)
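A minimal sketch of such a split, assuming the percentages refer to training, validation, and test data. Shuffling first avoids ordering bias (for example, data sorted by date); the fixed seed only makes the example repeatable.

```python
import random

def split_dataset(items, train=0.8, val=0.1, seed=42):
    """Shuffle, then split into train/validation/test; the test set gets the remainder."""
    data = list(items)
    random.Random(seed).shuffle(data)
    n_train = round(len(data) * train)
    n_val = round(len(data) * val)
    return (data[:n_train],
            data[n_train:n_train + n_val],
            data[n_train + n_val:])

train_set, val_set, test_set = split_dataset(range(100))
print(len(train_set), len(val_set), len(test_set))  # 80 10 10
```

Calling `split_dataset(items, train=0.7, val=0.15)` gives the 70/15/15 variant.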

Answer 75

A

Amazon SageMaker Data Wrangler

Answer 76

A

Amazon SageMaker Feature Store

Answer 77

A

Amazon SageMaker Canvas

Answer 78

A

Access to ready-to-use models from Bedrock and JumpStart; no coding is required
It is integrated with Comprehend, Rekognition, and Textract

Answer 79

A

Provides pretrained, open source models that customers can use for a wide range of problem types

Answer 80

A

Amazon SageMaker Experiments

Answer 81

A

Amazon SageMaker Automatic Model Tuning

Answer 82

A

A higher learning rate means the model converges faster, but there is a risk of overshooting the optimal solution because each update step is too large.

A lower learning rate may be more precise, but convergence is slower.
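The trade-off can be seen with plain gradient descent on f(x) = x², whose minimum is at x = 0. The function, starting point, and learning rates are illustrative choices, not tuned values.

```python
def gradient_descent(learning_rate, steps=20, x=5.0):
    """Minimize f(x) = x**2 (minimum at x = 0); the gradient is 2 * x."""
    history = [x]
    for _ in range(steps):
        x = x - learning_rate * 2 * x  # step against the gradient
        history.append(x)
    return history

for lr in (0.01, 0.1, 0.9, 1.1):
    path = gradient_descent(lr)
    overshot = any(v < 0 for v in path)  # crossed past the minimum at 0
    print(f"lr={lr}: x after 20 steps = {path[-1]:.4f}, overshot the minimum: {overshot}")
```

With 0.01 the model is still far from 0 after 20 steps (slow convergence), 0.1 gets close without overshooting, 0.9 overshoots back and forth but still settles, and 1.1 overshoots by more each step and diverges.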

Answer 83

A

Amazon SageMaker Model Monitor

Answer 84

A

(FxKL)
Linear learner
Factorization machines
XGBoost
K-Nearest Neighbors (KNN)

Answer 85

A

Clustering - K-means, LDA
Topic Modeling - LDA
Embeddings - Object2Vec
Anomaly detection - Random Cut Forest, IP Insights
Dimensionality reduction - Principal component analysis (PCA)
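To make the clustering entry concrete, here is a minimal K-means sketch (Lloyd's algorithm) on hypothetical 2-D points. This illustrates the general algorithm only; it is not the SageMaker built-in implementation or its API.

```python
import math
import random

def kmeans(points, k, iterations=10, seed=0):
    """Assign each point to its nearest centroid, then move centroids to their cluster means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical unlabeled 2-D points forming two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, k=2)
for c, members in zip(centroids, clusters):
    print(c, members)
```

No labels are used anywhere: the groups come only from distances between points, which is why clustering is unsupervised.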

Answer 86

A

Image classification - MXNet, TensorFlow
Object detection - MXNet, TensorFlow
Semantic segmentation - FCN, PSP, DeepLab V3
Time series - DeepAR

Answer 87

A

Text classification - BlazingText
Word2Vec - BlazingText
Machine translation - Sequence-to-Sequence
Topic modeling - LDA, NTM
Speech - Sequence-to-Sequence

Answer 88

A

High bias

Answer 89

A

High variance

Answer 90

A

Feature selection to keep the more important features, and using multiple training and test splits of the data

Answer 91

A

Confusion matrix

Answer 92

A

It is used to evaluate the performance of a model that does classification

Answer 93

A

(TP+TN)/(TP+FP+TN+FN)

Answer 94

A

(TP)/(TP+FP)

Answer 95

A

When the cost of false positives is high in your particular business situation

Think about a classification model that identifies emails as spam or not. In this case, you do not want your model labeling a legitimate email as spam and preventing your users from seeing that email.

Answer 96

A

(TP)/(TP+FN)

Answer 97

A

If it is extremely important and vital to the success of the model that it not give false negative results

Think about a model that needs to predict whether a patient has a terminal illness or not
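The accuracy, precision, and recall formulas above can be checked with a small sketch that first counts the confusion-matrix cells. The labels are hypothetical (1 = positive, for example spam or illness present).

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, FP, TN, FN for a binary classifier."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    return tp, fp, tn, fn

# Hypothetical ground truth and model predictions.
actual    = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 1, 0, 0]
tp, fp, tn, fn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / (tp + fp + tn + fn)
precision = tp / (tp + fp)   # of everything flagged positive, how much was right
recall = tp / (tp + fn)      # of all real positives, how many were caught
print(tp, fp, tn, fn)               # 3 2 2 1
print(accuracy, precision, recall)  # 0.625 0.6 0.75
```

Here the two false positives pull precision below recall; in the spam case that would mean legitimate emails landing in the spam folder.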

Answer 98

A

ROC is a probability curve, and AUC represents the degree or measure of separability.

In general, AUC-ROC can show what the curve for true positive compared to false positive looks like at various thresholds.
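A minimal sketch of both ideas using hypothetical model scores: the ROC curve as (false positive rate, true positive rate) points at each threshold, and AUC computed with the equivalent ranking definition, namely the probability that a random positive example scores higher than a random negative one (ties count half).

```python
def roc_points(labels, scores):
    """(false positive rate, true positive rate) at each distinct score threshold."""
    total_pos = sum(labels)
    total_neg = len(labels) - total_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for l, s in zip(labels, scores) if s >= t and l == 1)
        fp = sum(1 for l, s in zip(labels, scores) if s >= t and l == 0)
        points.append((fp / total_neg, tp / total_pos))
    return points

def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predicted probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(roc_points(labels, scores))
print(round(roc_auc(labels, scores), 3))  # 0.889 (8 of 9 pairs ranked correctly)
```

An AUC of 1.0 means perfect separation of the classes; 0.5 is no better than random guessing.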

Answer 99

A

You take the difference between the prediction and the actual value, square that difference, sum up the squared differences for all the observations, and divide by the number of predictions

Answer 100

A

R squared explains the fraction of variance accounted for by the model

Answer 101

A

MSE measures model performance as the average squared prediction error.
R squared provides a measure of the model’s goodness of fit to the data.
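Both metrics can be computed side by side on hypothetical values, following the MSE steps described above and R squared as 1 minus (unexplained variance / total variance).

```python
def mse(actual, predicted):
    """Mean squared error: average of the squared prediction errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """Fraction of the variance in `actual` explained by the predictions."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained
    ss_tot = sum((a - mean) ** 2 for a in actual)                  # total variance
    return 1 - ss_res / ss_tot

actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.0, 7.5, 9.0]
print(mse(actual, predicted))        # 0.125
print(r_squared(actual, predicted))  # 0.975
```

MSE is in squared units of the target (lower is better), while R squared is unitless and closer to 1 means a better fit.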

Answer 102

A

Developers can experiment with two or more variants of a model to help achieve business goals.

Answer 103

A

Real-time

Answer 104

A

Asynchronous

Answer 105

A

Batch transform

Answer 106

A

Serverless

Answer 107

A

Productivity

Answer 108

A

Reliability

Answer 109

A

Manage the entire ML lifecycle in SageMaker Studio

Answer 110

A

Repeatability

Answer 111

A

Auditability

Answer 112

A

Prepare data:
SageMaker Data Wrangler
SageMaker Processing job

Curate features:
SageMaker Feature Store

Experiment tracking:
SageMaker Experiments

Train model:
SageMaker Training job

Evaluate model:
SageMaker Processing job

Register model:
SageMaker Model Registry

Deploy model:
Deployments

Manage model:
SageMaker Model Monitor

Answer 113

A

LLM - text generation, contextual question answering, summarization, and classification

Answer 114

A

Embeddings, text generation, and image generation

Answer 115

A

Art vision and text AI models

Answer 116

A

Text-based responses based on prompts

Answer 117

A

LLM - generate coherent and contextually relevant text

Answer 118

A

Large reasoning capabilities, or highly specialized tasks like synthetic text generation, code generation, RAG, or agents

Answer 119

A

Can generate images from text input.

Answer 120

A

Using prompt engineering, RAG, fine-tuning, or automation agents

Answer 121

A

Augmentation

Answer 122

A

Ensembling

Answer 123

A

Customer support, virtual assistants
Journalism and research
Content marketing

Answer 124

A

Provides the capability to combine data sources into a repository of information

Answer 125

A

RAG without customization

Answer 126

A

Pay as you go
Based on the number of input and output tokens for text-based models

Pay as you go
Based on the number of images in the input and response for image-based models

Batch
Multiple predictions at a time, sent as a single file to S3
50% discount

Provisioned Throughput
Based on the number of input and output tokens processed each minute for text-based models

Answer 127

A

Taking a pre-trained language model and further training it on a specific task or domain-specific dataset

Answer 128

A

Prompt Tuning

Answer 129

A

Labeled examples
Prompt-response pairs

Answer 130

A

Particular field or area of knowledge
Unlabeled data
Domain specific training

Answer 131

A

Reinforcement learning from human feedback (RLHF)

Answer 132

A

Selecting the appropriate neural network architecture, layers, and hyperparameters
A large and diverse dataset must be curated, cleaned, and preprocessed
Model is initialized with random weights and trained using various optimization algorithms

Answer 133

A

Carry out various multi-step tasks related to infrastructure provisioning, application deployment, and operational activities.

Task coordination
Reporting and logging
Scalability and concurrency
Integration and communication

Answer 134

A

From cheapest to most costly:

Prompt Engineering - No model training needed

RAG - Uses external knowledge without changing the FM; there is a cost for using vector DBs

Instruction-based fine-tuning - The FM is fine-tuned with instructions, which can change the tone of the model

Domain adaptation fine-tuning - Domain-specific model training

Answer 135

A

GLUE - text classification, question answering, and natural language inference
SuperGLUE - compositional language understanding
SQuAD - question-answering capabilities
WMT - machine translation systems

Answer 136

A

Perplexity (a measure of how well the model predicts the next token)
BLEU score (for evaluating machine translation)
F1 score (for evaluating classification or entity recognition tasks)
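Perplexity, listed above, is the exponential of the average negative log-probability the model assigned to each actual next token. A minimal sketch with made-up token probabilities:

```python
import math

def perplexity(token_probs):
    """exp(mean negative log-likelihood) over the probabilities of the observed tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

confident = [0.9, 0.8, 0.95, 0.85]  # model was fairly sure of each actual next token
unsure = [0.2, 0.1, 0.25, 0.15]
print(round(perplexity(confident), 2))  # lower is better
print(round(perplexity(unsure), 2))
print(perplexity([0.25] * 4))  # uniform over 4 choices, so roughly 4
```

A perplexity of N can be read as the model being about as uncertain as if it were choosing uniformly among N tokens.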

Answer 137

A

(2 x Precision x Recall) / (Precision + Recall)
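The formula in code, with the common convention (an assumption here) that F1 is 0 when precision and recall are both 0:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1_score(0.5, 1.0))  # about 0.667
print(f1_score(1.0, 0.1))  # about 0.18, far below the arithmetic mean of 0.55
```

Because it is a harmonic mean, F1 stays low unless both precision and recall are reasonably high.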

Answer 138

A

Automated metrics can be useful for rapid iterations and fine-tuning during model development

Answer 139

A

They fail to capture the nuances and complexities of human language and might not align perfectly with human judgments

Answer 140

A

Automatic summarization and machine translation systems
The main idea behind ROUGE is to count the number of overlapping units, such as n-grams, between the generated text and the reference text

Answer 141

A

Similarity between a generated text and one or more reference translations

Used to evaluate the quality of text that has been machine-translated from one natural language to another

Answer 142

A

Computes contextualized embeddings for the input texts, and then calculates the cosine similarity between them

It relies on semantic similarity rather than exact lexical matches

Answer 143

A

Compare N-gram matches
Vs
Evaluate quality (precision, with a penalty for short outputs)
Vs
Semantic similarity (compare embeddings)
Vs
How confident the model is in predicting the next token (lower is better)

Answer 144

A

Assess the performance of an FM in text summarization, machine translation, and open-ended text generation

Answer 145

A

Negative prompting is used to guide the model away from producing certain types of content or exhibiting specific behaviors

Answer 146

A

Instructions, Context, Input data and desired output

Answer 147

A

Inference parameters

Answer 148

A

Temperature - A higher temperature makes the output more diverse and unpredictable, and a lower temperature makes it more focused and predictable

Top P - With a low top p setting, like 0.25, the model will only consider words that make up the top 25 percent of the total probability distribution. A higher top p means more diverse output

Top K - If set to 50, the model will only consider the 50 most likely words for the next word in the sequence
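A minimal sketch of how these three parameters reshape the next-word distribution, using hypothetical logits (raw model scores). Providers may apply top-k and top-p in a different order or define the top-p cutoff slightly differently; this is one common variant, not any specific service's implementation.

```python
import math
import random

def next_token_distribution(logits, temperature=1.0, top_k=None, top_p=None):
    """Turn {token: logit} into a filtered, renormalized list of (token, probability)."""
    # Temperature: divide logits before softmax; < 1 sharpens, > 1 flattens.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exps.values())
    probs = sorted(((tok, e / total) for tok, e in exps.items()),
                   key=lambda pair: pair[1], reverse=True)
    if top_k is not None:
        probs = probs[:top_k]  # keep only the k most likely tokens
    if top_p is not None:
        kept, cumulative = [], 0.0
        for tok, p in probs:
            kept.append((tok, p))
            cumulative += p
            if cumulative >= top_p:
                break  # smallest set of tokens whose probability mass reaches top_p
        probs = kept
    mass = sum(p for _, p in probs)
    return [(tok, p / mass) for tok, p in probs]  # renormalize what survived

def sample_next_token(logits, rng, **settings):
    dist = next_token_distribution(logits, **settings)
    return rng.choices([t for t, _ in dist], weights=[p for _, p in dist], k=1)[0]

# Hypothetical logits for the word after "The pet is a".
logits = {"cat": 2.0, "dog": 1.0, "car": 0.1}
print(next_token_distribution(logits, temperature=0.5))  # more peaked on "cat"
print(next_token_distribution(logits, temperature=2.0))  # flatter, more diverse
print(next_token_distribution(logits, top_k=2))          # "car" removed
print(next_token_distribution(logits, top_p=0.25))       # only "cat" survives
print(sample_next_token(logits, random.Random(0)))
```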

Answer 149

A

Maximum length - Used in text summarization and translation
Stop Sequence - When the model encounters a stop sequence during the inference process, it will terminate the generation regardless of the maximum length setting

Answer 150

A

Running Small Language Models on an edge device

Answer 151

A

Zero-shot - Present the task to the generative model without any examples
Few-shot - Present the task to the generative model with some examples
Chain of Thought - Divides intricate reasoning tasks into smaller, intermediary steps
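A minimal sketch of the difference between zero-shot and few-shot prompts. The task wording, the "Input:/Output:" layout, and the example reviews are all hypothetical; any consistent format works.

```python
def build_prompt(task, query, examples=()):
    """Zero-shot if `examples` is empty; few-shot if (input, output) pairs are given."""
    parts = [task]
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    parts.append(f"Input: {query}\nOutput:")  # the model completes this line
    return "\n\n".join(parts)

task = "Classify the sentiment of the review as positive or negative."
zero_shot = build_prompt(task, "The battery died after a day.")
few_shot = build_prompt(
    task,
    "The battery died after a day.",
    examples=[("Great screen and fast shipping.", "positive"),
              ("It broke within a week.", "negative")],
)
print(zero_shot)
print("---")
print(few_shot)
```

A chain-of-thought variant would add an instruction asking the model to reason through intermediate steps before giving its final answer.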

Answer 152

A

Poisoning - intentional introduction of malicious or biased data

Hijacking and prompt injection - influencing the outputs of generative models by embedding specific instructions

Answer 153

A

Risk of exposing sensitive or confidential information from its training corpus

Answer 154

A

Exposing the prompt or inputs used within the model or data used by the model

Answer 155

A

Modifying or circumventing the constraints and safety measures implemented in a generative model or AI assistant to gain unauthorized access or functionality

Answer 156

A

k-NN or cosine similarity
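A minimal sketch of how a vector store finds relevant documents: cosine similarity between the query embedding and each stored embedding, then keep the k closest (brute-force k-NN). The document IDs and 3-dimensional vectors are hypothetical; vector databases often use approximate nearest-neighbor indexes instead of scanning every vector.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_similar(query, documents, k=2):
    """Brute-force k-NN: rank every stored vector by cosine similarity to the query."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in documents.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

# Hypothetical document embeddings.
documents = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "return-window": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]  # hypothetical embedding of "How do I get my money back?"
print(top_k_similar(query, documents))  # ['refund-policy', 'return-window']
```

In RAG, the text of the retrieved documents would then be added to the prompt as context.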

Answer 157

A

Amazon OpenSearch
pgvector extension in Amazon RDS for PostgreSQL
Amazon Kendra

Answer 158

A

Accuracy
Speed and efficiency
Scalability

Answer 159

A

True. It is a set of questions and answers provided by a subject matter expert (SME). The model's responses to the same questions are compared with the benchmark dataset's answers, and the model's performance is scored

Answer 160

A

Instruction tuning

Answer 161

A

Reinforcement learning from human feedback (RLHF)

Answer 162

A

Continuous pretraining

Answer 163

A

Data curation
Labeling
Governance and compliance
Representativeness and bias checking
Feedback integration

Answer 164

A

ROUGE-N - This metric primarily assesses the fluency of the text and the extent to which it includes key ideas from the reference. It compares N-gram matches between the reference and the generated output

ROUGE-L - It is good at evaluating the coherence and order of the narrative in the outputs. It compares the longest common subsequence of words between the reference and the generated output
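A minimal sketch of both metrics in their recall form (overlap divided by the reference length), on a hypothetical reference and output with simple whitespace tokenization. Published ROUGE implementations also report precision and F-measure variants and apply extra normalization.

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n_recall(reference, candidate, n=1):
    """ROUGE-N recall: overlapping n-grams / n-grams in the reference."""
    ref = Counter(ngrams(reference.split(), n))
    cand = Counter(ngrams(candidate.split(), n))
    overlap = sum((ref & cand).values())  # each n-gram counted at most as often as in both
    return overlap / max(sum(ref.values()), 1)

def lcs_length(a, b):
    """Length of the longest common subsequence (dynamic programming)."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            table[i][j] = table[i - 1][j - 1] + 1 if x == y else max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]

def rouge_l_recall(reference, candidate):
    """ROUGE-L recall: longest common subsequence / reference length."""
    ref, cand = reference.split(), candidate.split()
    return lcs_length(ref, cand) / max(len(ref), 1)

reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
print(round(rouge_n_recall(reference, candidate, n=1), 3))  # 0.833 (5 of 6 unigrams)
print(round(rouge_n_recall(reference, candidate, n=2), 3))  # 0.6 (3 of 5 bigrams)
print(round(rouge_l_recall(reference, candidate), 3))       # 0.833 (LCS: "the cat on the mat")
```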

Answer 165

A

Measures the precision of N-grams in the machine-generated text that appear in the reference texts and applies a penalty for overly short translations (brevity penalty)
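A simplified, sentence-level sketch of BLEU showing the two ideas above: clipped n-gram precision and the brevity penalty. As assumptions for brevity, it uses n-grams up to 2 by default and no smoothing; standard BLEU typically goes up to 4-grams and is usually computed over a whole corpus.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each n-gram counts at most as often as in any one reference."""
    cand_counts = Counter(ngrams(candidate, n))
    max_ref = Counter()
    for ref in references:
        max_ref |= Counter(ngrams(ref, n))  # element-wise max across references
    clipped = sum(min(count, max_ref[g]) for g, count in cand_counts.items())
    return clipped / max(sum(cand_counts.values()), 1)

def bleu(candidate, references, max_n=2):
    cand = candidate.split()
    refs = [r.split() for r in references]
    precisions = [modified_precision(cand, refs, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty, using the reference length closest to the candidate length.
    c = len(cand)
    r = min((abs(len(ref) - c), len(ref)) for ref in refs)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * geo_mean

reference = ["the cat sat on the mat"]
print(bleu("the cat sat on the mat", reference))  # 1.0
print(round(bleu("the cat", reference), 3))       # perfect precision, but heavily penalized for length
```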

Answer 166

A

IAM and NACLs

Answer 167

A

Managing, optimizing, and scaling the organizational AI initiative
Maintaining responsible and trustworthy AI practices
Establish clear policies, guidelines, and oversight mechanisms

Answer 168

A

Data residency

Answer 169

A

Data logging

Answer 170

A

Data analysis

Answer 171

A

Accuracy
Precision
Recall
F1-score
Latency

Answer 172

A

Fine tuning a model using your data vs training a model from scratch using your data

Answer 173

A

It refers to the act of properly attributing and acknowledging the sources of the data used to train the model.

Datasets
Databases
Other sources

Answer 174

A

It provides detailed information about the provenance, or the place of origin of the data used to train the model.

Details about the data collection process
The methods used to curate and clean the data
Any preprocessing or transformations applied to the data

Answer 175

A

Data lineage

Answer 176

A

Cataloging

Answer 177

A

Model cards

Answer 178

A

Scope 1: Consumer app (ChatGPT)
Scope 2: Enterprise app (SaaS, like Amazon Q Developer)
Scope 3: Pre-trained models (Amazon Bedrock)
Scope 4: Fine-tuned models (Amazon Bedrock customized or SageMaker JumpStart)
Scope 5: Self-trained models (SageMaker)

Answer 179

A

Automation and access control - AWS Glue
Data collection - Kinesis, DMS, Glue
Data prep and cleaning - EMR or Glue
Data quality check - Glue DataBrew or Glue Data Quality
Data visualization and analysis - QuickSight or Neptune
IaC deployment - CloudFormation
Monitoring and Debugging - CloudWatch

Answer 180

A

Reuse
Adapt
Customize
Start from scratch

Answer 181

A

Subscription based - Lite and Pro + Data storage for client documents

Answer 182

A

Same as Guardrails

Answer 183

A

Amazon Bedrock

Answer 184

A

Amazon SageMaker

Answer 185

A

Amazon Titan Text Express, Llama 2, Claude, Stability AI

Claude can take a maximum of 200K tokens
Stability AI is for images

Content creation is by Titan
Text generation and customer service by Llama 2
Analysis and forecasting by Claude
Image creation by Stability AI

Answer 186

A

It's a playground on Amazon Bedrock for building generative AI apps
You can access it without having an AWS account

Answer 187

A

Amazon Comprehend

Answer 188

A

Amazon Transcribe to redact PII

Answer 189

A

Amazon Transcribe to transcribe technical terms, jargon, and context

Answer 190

A

Amazon Polly to read specific types of text and add breaks, whispering, etc.

Answer 191

A

Amazon Lex to provide input parameters

Answer 192

A

Comprehend, Connect, Lambda function & Kendra