AWS AI Practitioner Confusions Flashcards

Question 1

Q

ROUGE
Vs
BLEU
Vs
BERTScore
Vs
Perplexity

Answer

A

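Recall-oriented: compares N-gram matches against the reference (summarization)
Vs
Precision-oriented: compares N-gram matches and applies a brevity penalty (translation)
Vs
Semantic similarity (compares embeddings)
Vs
How confidently the model predicts the next token (lower is better)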
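
A minimal sketch of the perplexity idea, assuming we already have the probability the model assigned to each correct next token (the probability lists below are made-up illustration values, not real model output):

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability per token."""
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

# Hypothetical probabilities assigned to the actual next token at each step.
confident = [0.9, 0.8, 0.95]
unsure = [0.2, 0.1, 0.25]

print(perplexity(confident))  # close to 1: the model was rarely surprised
print(perplexity(unsure))     # much higher: the model was often surprised
```

A model that always assigns probability 0.5 to the correct token has a perplexity of 2, roughly "choosing between two equally likely options" at every step.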

Question 2

Q

ROUGE-N
Vs
ROUGE-L

Answer

A

ROUGE-N - Primarily assesses the fluency of the text and the extent to which it includes key ideas from the reference. Compares N-gram matches between the reference and the generated output.

ROUGE-L - Good at evaluating the coherence and order of the narrative in the output. Compares the longest common subsequence of words between the reference and the generated output.
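
The two variants can be sketched in a few lines. This is a simplified, recall-only illustration with whitespace tokenization; the function names are hypothetical, and real ROUGE implementations add stemming and precision/F-measure variants:

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n_recall(reference, candidate, n):
    """Share of reference n-grams that also appear in the candidate."""
    ref = Counter(ngrams(reference.split(), n))
    cand = Counter(ngrams(candidate.split(), n))
    overlap = sum((ref & cand).values())  # clipped n-gram matches
    total = sum(ref.values())
    return overlap / total if total else 0.0

def lcs_length(a, b):
    """Length of the longest common subsequence (words in order, gaps allowed)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l_recall(reference, candidate):
    ref, cand = reference.split(), candidate.split()
    return lcs_length(ref, cand) / len(ref) if ref else 0.0

reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"

print(rouge_n_recall(reference, candidate, 1))  # 5 of 6 unigrams match
print(rouge_n_recall(reference, candidate, 2))  # 3 of 5 bigrams match
print(rouge_l_recall(reference, candidate))     # LCS "the cat on the mat" has 5 words
```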

Question 3

Q

Fine tuned models vs Self trained models

Answer

A

Fine-tuning an existing pretrained model using your data vs training a model from scratch using your data

Question 4

Q

Retrieval-augmented generation (RAG)
Vs
Instruction fine-tuning

Answer

A

Supplies domain-relevant data as context to produce responses based on that data.
Vs
Further trains the model on labeled examples in the form of prompt-response pairs

Question 5

Q

Regression
Vs
Classification

Answer

A

Predicting continuous or numerical values based on one or more input variables
Vs
Predicting discrete categories or labels (e.g. a medical diagnosis: disease present or absent)

Question 6

Q

RealToxicityPrompts
Vs
BOLD
Vs
T-REx
Vs
WikiText-2

Answer

A

RealToxicityPrompts is a dataset for measuring the degree to which racist, sexist, or otherwise toxic language is present in pretrained neural language models (LMs).
(Text Generation-Toxicity)
Vs
Bias in Open-ended Language Generation Dataset (BOLD) is a dataset to evaluate fairness in open-ended language generation in English.
(Text Generation-Toxicity)
Vs
T-REx is used for relation extraction and natural language generation.
(Text Generation-Accuracy and Robustness)
Vs
WikiText-2 is the small release (about 2 million tokens) of the WikiText corpus, which is extracted from the set of verified Good and Featured articles on Wikipedia.
(Text Generation-Robustness)

Question 7

Q

Gigaword
Vs
Women’s Ecommerce Clothing Reviews

Answer

A

Gigaword provides headline-generation on a corpus of article pairs consisting of around 4 million articles.
(Text Summarization)
Vs
Dataset of clothing reviews written by customers
(Text Classification)

Question 8

Q

(Question and answer)
BoolQ
Vs
Natural Questions
Vs
TriviaQA

Answer

A

BoolQ is a question answering dataset for yes/no questions containing 15942 examples.
Vs
NaturalQuestions (NQ) contains real user questions issued to Google search, and answers found from Wikipedia by annotators.
Vs
TriviaQA is a reading comprehension dataset containing over 650K question-answer-evidence triples.

Question 9

Q

Model tuning method comparison

Prompt Engineering
RAG
Instruction-based fine-tuning
Domain adaptation fine-tuning
Transfer Learning

Answer

A

Prompt Engineering - No model training needed

RAG - Uses external knowledge with no FM changes or retraining. Adds the cost of running a vector database

Instruction-based fine-tuning - The FM is fine-tuned on instructions, which can also change the tone of the model. Uses labeled data in the form of prompt-response pairs

Domain adaptation fine-tuning - Domain-specific model training. Uses unlabeled data

Transfer Learning - Reuses a pretrained model for a new, related task. Widely used for image classification

Question 10

Q

Temperature
Vs
Top K
Vs
Top P

Answer

A

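Randomness (creativity) of the model output; higher values give more varied responses
Vs
Sample only from the K most probable next tokens (an integer count)
Vs
Sample only from the most likely tokens whose cumulative probability reaches P (a probability value between 0 and 1)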
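
How the three settings act on a next-token distribution can be sketched as below. This is a simplified illustration, not how any particular service implements it; the function names, logits, and probabilities are assumptions chosen for the example:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Lower temperature sharpens the distribution; higher flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

logits = [2.0, 1.0, 0.5]               # hypothetical raw scores for three tokens
probs = [0.5, 0.3, 0.15, 0.05]         # hypothetical next-token probabilities

print(softmax_with_temperature(logits, 0.5))  # peaked: favors the top token
print(softmax_with_temperature(logits, 2.0))  # flatter: more variety
print(top_k_filter(probs, 2))                 # only tokens 0 and 1 survive
print(top_p_filter(probs, 0.9))               # tokens 0, 1 and 2 survive
```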

Question 11

Q

Amazon Q Business
Vs
Amazon Q Apps
Vs
Amazon Q Developer

Answer

A

Built on Amazon Bedrock, with no control over which FMs are used
Vs
Create GenAI apps using natural language, with no coding
Vs
Generate code and commands related to AWS. Scan code for vulnerabilities. Suggest debugging and optimization improvements

Question 12

Q

Amazon Q Business Lite
Vs
Amazon Q Business Pro

Answer

A

Access to the Q&A feature in Amazon Q Business
Vs
Full access: helps you solve problems, generate content, and find insights in data, and includes Amazon Q in QuickSight, a generative BI assistant that helps consume insights.

Question 13

Q

GPT
Vs
BERT
Vs
RNN
Vs
ResNet
Vs
SVM
Vs
WaveNet
Vs
GAN
Vs
XGBoost

Answer

A

Generate human text or code
Vs
Language understanding by reading text in both directions (e.g. text classification, semantic similarity)
Vs
Speech recognition
Vs
Image recognition
Vs
Classification & Regression
Vs
Speech synthesis
Vs
Data augmentation
Vs
Gradient boosting

Question 14

Q

KNN
Vs
K-Means

Answer

A

Supervised learning algorithm: classifies (or predicts a value for) a new point from its K nearest labeled neighbors
Vs
Unsupervised learning algorithm: clusters unlabeled points into K groups around centroids
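
A toy one-dimensional sketch of the difference, assuming made-up data points and labels. KNN needs labeled examples; K-Means receives only raw points plus starting centroids:

```python
from collections import Counter

def knn_predict(labeled_points, query, k):
    """Supervised: vote among the k labeled points nearest to the query."""
    nearest = sorted(labeled_points, key=lambda item: abs(item[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def k_means(points, centroids, iterations=10):
    """Unsupervised: alternate between assigning points and moving centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

labeled = [(1.0, "small"), (1.5, "small"), (2.0, "small"), (8.0, "large"), (9.0, "large")]
print(knn_predict(labeled, 7.0, k=3))  # large

unlabeled = [1.0, 1.5, 2.0, 8.0, 9.0]
centroids, clusters = k_means(unlabeled, centroids=[0.0, 10.0])
print(centroids)  # [1.5, 8.5]
```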

Question 15

Q

Underfitting
Vs
Overfitting

Answer

A

High bias and low variance
Vs
Low bias and high variance

Question 16

Q

Lexicons
Vs
SSML
Vs
Voice engine
Vs
Speech Mark

Answer

A

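Customize how specific words are pronounced, such as acronyms and abbreviations
Vs
Markup tags that control the speech, such as <break> for pauses and <amazon:effect name="whispered"> for whispering
Vs
The engine that synthesizes the voice, which determines the available voice styles (e.g. standard, neural)
Vs
Metadata about where words and sentences start and end in the audio; helpful for lip syncing or highlighting words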

Question 17

Q

Custom Labels
Vs
Content Moderation
Vs
Amazon A2I

Answer

A

Identify your logo on social media using Amazon Rekognition
Vs
Remove inappropriate content using Amazon Rekognition
Vs
Incorporate human review of ML predictions using Amazon Augmented AI (integrates with Amazon Rekognition)

Question 18

Q

SageMaker real-time deployment
Vs
SageMaker serverless deployment

Question 19

Q

Responsible AI using various AWS tools?
Amazon Bedrock, SageMaker Clarify, SageMaker Data Wrangler, SageMaker Model Monitor & A2I

Answer

A

Amazon Bedrock - Guardrails for redacting PII and blocking undesirable content. Supports human or automatic model evaluation

SageMaker Clarify - FM evaluation for accuracy, robustness, and toxicity, plus bias detection

SageMaker Data Wrangler - Fix bias and augment the data

SageMaker Model Monitor - Quality analysis in production

A2I - Human review of ML predictions

Governance - SageMaker Role Manager, Model Cards, and Model Dashboard

Question 20

Q

Interpretability
Vs
Explainability

Answer

A

Degree to which a human can understand the cause of the decision
Vs
Understand the nature and behaviour of the model

Question 21

Q

Responsible AI
Vs
AI Governance
Vs
AI Compliance

Answer

A

Fairness, explainability, interpretability, transparency, controllability, privacy, safety, robustness
Vs
Managing, optimizing, and scaling the organization's AI activities with policies, guidelines, and risk management, and building public trust
Vs
Compliance with various industry standards for the AI workloads

Question 22

Q

Data Lifecycles
Vs
Data Logging
Vs
Data Residency
Vs
Data Monitoring
Vs
Data Analysis
Vs
Data Retention
Vs
Data Lineage

Answer

A

Collecting, processing, storage, consumption and archival
Vs
Inputs, outputs, performance metrics and system events
Vs
Where the data is processed and stored
Vs
Data quality, identifying anomalies and data drift
Vs
Statistical analysis, visualization and exploration
Vs
Regulatory requirements, historical data for training, cost
Vs
Sources of data, licenses and terms of usage or permissions

Question 23

Q

Threat detection
Vs
Vulnerability Management
Vs
Infrastructure Management

Answer

A

Detect AI-related threats such as generated fake content
Vs
Identify vulnerabilities such as software bugs
Vs
Secure the cloud computing platform

Question 24

Q

Accuracy
Vs
Precision
Vs
Recall
Vs
F1-score
Vs
Latency

Answer

A

Ratio of correct predictions to all predictions: (TP + TN) / total
Vs
Ratio of correct +ve predictions to all +ve predictions: TP / (TP + FP)
Vs
Ratio of correct +ve predictions to all actual +ve cases: TP / (TP + FN)
Vs
Harmonic mean of precision and recall
Vs
Time taken by the model to predict
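
The first four metrics all come from the same confusion-matrix counts. A minimal sketch, assuming hypothetical counts for a binary classifier (the function name and numbers are for illustration only):

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts: 40 true positives, 10 false positives,
# 20 false negatives, 30 true negatives.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, precision (0.8) is higher than recall (about 0.667), and F1 (about 0.727) sits between them but closer to the lower value, which is the point of using a harmonic rather than an arithmetic mean.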

Question 25

Q

Poisoning
Vs
Jailbreaking
Vs
Prompt Leaking
Vs
Exposure
Vs
Hijacking

Answer

A

Introduction of malicious or biased data into the training dataset
Vs
Bypassing safeguards to gain access to offensive or harmful content that is otherwise prevented
Vs
Leaking of prompts and inputs
Vs
Leaking of sensitive data from the training corpus
Vs
Influencing the output by embedding instructions in the prompt

Question 26

Q

Logistic Regression
Vs
Support Vector Machines (SVMs)

Answer

A

Primarily designed for binary classification problems
Vs
SVMs are effective for classification tasks, especially in high-dimensional spaces

Question 27

Q

Pretraining
Vs
Fine Tuning

Answer

A

Uses unlabeled data
Vs
Typically uses labeled data (domain adaptation fine-tuning can use unlabeled data)

Question 28

Q

Data drift
Vs
Hallucination

Answer

A

Input data changes over time, which degrades the output
Vs
Output appears factual but is misleading or incorrect

Question 29

Q

Techniques to prevent overfitting

Early Stopping
Vs
Pruning
Vs
Regularization
Vs
Ensembling
Vs
Data augmentation

Answer

A

Stop the training phase before the model starts learning the noise in the data
Vs
Identify the most important features and remove the irrelevant ones
Vs
Apply a penalty to features with minimal impact
Vs
Combine predictions from different ML models
Vs
Slightly change the sample data each time the model processes it
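
Early stopping is the easiest of these to show in code. A minimal sketch, assuming we track validation loss per epoch and use a hypothetical "patience" setting (the number of epochs without improvement we tolerate); the loss values are invented:

```python
def early_stopping_epoch(validation_losses, patience):
    """Return (best_epoch, stopped_at): the epoch whose weights we would keep,
    and the epoch at which training stops."""
    best_loss = float("inf")
    best_epoch = -1
    stopped_at = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(validation_losses):
        stopped_at = epoch
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation loss keeps rising: likely fitting noise
    return best_epoch, stopped_at

# Hypothetical validation losses: improving, then getting worse (overfitting).
losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70, 0.80]
print(early_stopping_epoch(losses, patience=2))  # (2, 4)
```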

Question 30

Q

Shapley values
Vs
PDP

Answer

A

Shapley values are a local interpretability method
Vs
Provide a global view of the model’s behavior
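
A small sketch of the local vs global contrast, assuming a made-up three-feature scoring function. Shapley values are computed exactly here by averaging each feature's marginal contribution over every feature ordering, which is only feasible for a handful of features; the PDP averages the model output over a dataset while one feature is forced to each grid value:

```python
from itertools import permutations

def shapley_values(model, instance, baseline):
    """Local: contribution of each feature to ONE prediction."""
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous_output = model(current)
        for feature in order:
            current[feature] = instance[feature]  # "switch on" this feature
            output = model(current)
            totals[feature] += output - previous_output
            previous_output = output
    return [t / len(orderings) for t in totals]

def partial_dependence(model, dataset, feature, grid):
    """Global: average prediction over the dataset for each value of one feature."""
    curve = []
    for value in grid:
        outputs = [model(row[:feature] + [value] + row[feature + 1:]) for row in dataset]
        curve.append(sum(outputs) / len(outputs))
    return curve

def toy_model(x):
    # Hypothetical model: features 0 and 1 interact, feature 2 is ignored.
    return 2 * x[0] + x[1] + x[0] * x[1]

values = shapley_values(toy_model, instance=[1, 1, 1], baseline=[0, 0, 0])
print(values)  # [2.5, 1.5, 0.0]

dataset = [[0, 0, 0], [1, 1, 0], [0, 1, 1], [1, 0, 1]]
curve = partial_dependence(toy_model, dataset, feature=0, grid=[0, 1])
print(curve)  # [0.5, 3.0]
```

The Shapley values sum to 4, the difference between the prediction for the instance and the prediction for the baseline.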

Question 31

Q

Sampling bias
Vs
Measurement bias
Vs
Observer bias
Vs
Confirmation bias

Answer

A

Data used to train the model does not accurately reflect the diversity of the real-world population
Vs
Inaccuracies in data collection, such as faulty equipment or inconsistent measurement processes
Vs
Human errors or subjectivity during data analysis or observation
Vs
Selectively searching for or interpreting information to confirm existing beliefs

Question 32

Q

Linear regression
Vs
Document classification
Vs
Neural networks
Vs
Decision tree
Vs
Association rule learning
Vs
Clustering

Which learning techniques?

Answer

A

Supervised Learning
Vs
Semi-supervised learning
Vs
Supervised Learning
Vs
Supervised Learning
Vs
Unsupervised learning
Vs
Unsupervised learning

Question 33

Q

Embedding models
Principal component analysis
Vs
Singular value decomposition
Vs
Word2Vec
Vs
BERT

Answer

A

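Dimensionality reduction technique
Vs
Factorizes a matrix into three matrices (U, Σ, Vᵀ); also used for dimensionality reduction
Vs
Associates words using continuous bag-of-words (CBOW) or skip-gram
Vs
Contextual embeddings that capture semantic similarity by reading text in both directions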

Question 34

Q

AWS Trainium
Vs
AWS Inferentia

Answer

A

ML chip that AWS purpose-built for deep learning (DL) training
Vs
ML chip purpose-built by AWS to deliver high-performance inference at a low cost

Question 35

Q

GenAI
Vs
ML

Answer

A

Generates new content: loosely, gets features from labels
Vs
Predicts outcomes: gets labels from features

Question 36

Q

Model parallelism
Vs
Data parallelism

Answer

A

Splitting a model up between multiple instances or nodes
Vs
Splitting the training set in mini-batches evenly distributed across nodes

Question 37

Q

Model Parameters
Vs
Hyperparameters

Answer

A

Internal variables of the model, learned from the data during training
Vs
External configurations set before the training process
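
The distinction can be seen in a tiny training loop. In this sketch (plain gradient descent on a made-up dataset; the function name is hypothetical), `weight` and `bias` are model parameters learned from the data, while `learning_rate` and `epochs` are hyperparameters chosen before training starts:

```python
def train_linear_model(xs, ys, learning_rate, epochs):
    """Fit y = weight * x + bias by gradient descent on mean squared error."""
    weight, bias = 0.0, 0.0  # model parameters: updated by training
    n = len(xs)
    for _ in range(epochs):  # hyperparameter: number of passes over the data
        grad_w = sum(2 * (weight * x + bias - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (weight * x + bias - y) for x, y in zip(xs, ys)) / n
        weight -= learning_rate * grad_w  # hyperparameter: step size
        bias -= learning_rate * grad_b
    return weight, bias

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1

weight, bias = train_linear_model(xs, ys, learning_rate=0.05, epochs=2000)
print(round(weight, 3), round(bias, 3))  # 2.0 1.0
```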

Question 38

Q

Multi-modal generative model
Vs
Multi-modal embedding model

Answer

A

Generate new output
Vs
Converts inputs such as text and images into vectors for search and similarity comparison (cheaper than a generative model)

Question 39

Q

Training set
Vs
Validation set
Vs
Test set

Answer

A

Used to train an algorithm or ML model. The model iteratively uses the data and learns to provide the desired result.
Vs
Introduces new data to the trained model. You can use a validation set to periodically measure model performance as training is happening, and also tune any hyperparameters of the model. However, validation datasets are optional.
Vs
Used on the final trained model to assess its performance on unseen data. This helps determine how well the model generalizes.
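
A minimal sketch of producing the three sets, assuming an 80/10/10 split and a fixed shuffle seed (both are illustrative choices, not a rule):

```python
import random

def split_dataset(records, train_fraction=0.8, validation_fraction=0.1, seed=42):
    """Shuffle once, then cut into training, validation and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    train_end = int(len(shuffled) * train_fraction)
    validation_end = train_end + int(len(shuffled) * validation_fraction)
    return (shuffled[:train_end],
            shuffled[train_end:validation_end],
            shuffled[validation_end:])  # whatever remains is the test set

train, validation, test = split_dataset(range(100))
print(len(train), len(validation), len(test))  # 80 10 10
```

Each record lands in exactly one set, so the test set stays unseen until the final evaluation.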

Question 40

Q

SHapley Additive exPlanations
Vs
Differential privacy
Vs
Adversarial debiasing
Vs
Fairness-aware preprocessing

Answer

A

Explain model predictions and identify feature importance
Vs
Protects individual privacy
Vs
Bias mitigation method, typically applied during or after training
Vs
Proactive responsible AI strategy that helps reduce bias before the model is trained

Question 41

Q

Diffusion Model
Vs
GAN

Answer

A

Diffusion models have gained popularity over GANs due to their ability to generate high-quality images with superior fine-grained control. However, they are slower than GANs.

Question 42

Q

Recurrent Neural Network (RNN)
Vs
Generative Adversarial Network (GAN)
Vs
Transformer-based vision-language model

Answer

A

Sequential data processing, such as text or time-series analysis
Vs
Image generation, style transfer, and upscaling
Vs
Generating text descriptions from images