Flashcards

AI

1
Q

_________ is a field of computer science dedicated to solving
problems that we commonly associate with human intelligence

A

Artificial Intelligence

2
Q

Used to generate new data that is similar to the data it was trained on
* Text
* Image
* Audio
* Code
* Video…

A

Generative AI

3
Q

To generate data, we must rely on a __________
* ___________ are trained on a wide variety of input data
* The models may cost tens of millions of dollars to train

A

Foundation Model

4
Q

Type of AI designed to generate coherent human-like text
* One notable example: GPT-4 (ChatGPT / OpenAI)
* Trained on large corpus of text data
* Usually very big models
* Billions of parameters
* Trained on books, articles, websites, other textual data
* Can perform language-related tasks
* Translation, Summarization
* Question answering
* Content creation

A

Large Language Models (LLM)

5
Q

We usually interact with the LLM by giving a ____

A

prompt

6
Q

What is the term for below: the generated text may be different for every user that uses
the same prompt

A

Non-deterministic

7
Q

What’s Amazon Titan?

A
  • High-performing Foundation Models from AWS
  • Image, text, and multimodal model choices via fully managed APIs
  • Can be customized with your own data
8
Q

What term goes with this:
-Adapt a copy of a foundation model with your own data

A

Fine Tuning

9
Q

* Improves the performance of a pre-trained FM on domain-specific tasks
* i.e., further trained on a particular field or area of knowledge

A

Instruction based fine tuning

10
Q

* Make a model an expert in a specific domain
* For example: feeding the entire AWS
documentation to a model to make it an expert on AWS
* Good to feed industry-specific terminology
into a model (acronyms, etc…)
* Can continue to train the model as more
data becomes available

A

domain-adaptation fine-tuning

11
Q
  • Part of instruction-based
    fine-tuning
  • system (optional) : context
    for the conversation.
  • messages : An array of
    message objects, each
    containing:
  • role :
    Either user or assistant
  • content : The text content
    of the message
A

single turn messaging
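
For orientation, a minimal sketch of what one single-turn training record with these fields might look like, written as a Python dict; the field names mirror the card above, but the exact JSON schema and values here are illustrative, not the official Bedrock format.

import json

# Hypothetical single-turn record: optional system context plus one user/assistant exchange
single_turn_record = {
    "system": "You are a helpful AWS assistant.",                    # optional context
    "messages": [
        {"role": "user", "content": "What does EC2 stand for?"},     # user turn
        {"role": "assistant", "content": "Elastic Compute Cloud."},  # assistant turn
    ],
}
print(json.dumps(single_turn_record, indent=2))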

12
Q
  • To provide instruction-based fine-tuning for a
    conversation (vs. Single-Turn Messaging)
  • Chatbots = multi-turn
    environment
  • You must alternate
    between “user” and
    “assistant” roles
A

multi turn messaging
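
Likewise, a sketch of a multi-turn record (again illustrative Python, not the official schema), showing the required alternation between "user" and "assistant" roles:

# Hypothetical multi-turn record: roles must alternate between "user" and "assistant"
multi_turn_record = {
    "system": "You are a customer-support chatbot.",
    "messages": [
        {"role": "user", "content": "I want to order a pizza."},
        {"role": "assistant", "content": "Sure - what size would you like?"},
        {"role": "user", "content": "Large, please."},
        {"role": "assistant", "content": "Got it, one large pizza coming up."},
    ],
}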

13
Q

True or false: Instruction-based fine-tuning is usually cheaper than retraining an FM, as computations are
less intense and the amount of data required is usually smaller

A

true

14
Q

_________ the broader concept of re-using a pre-trained model to adapt it to a new related task
* Widely used for image classification
* And for NLP (models like BERT and GPT)

A

transfer learning

15
Q

This is a good use case of _____

  • A chatbot designed with a particular persona or tone, or geared
    towards a specific purpose (e.g., assisting customers, crafting
    advertisements)
  • Training using more up-to-date information than what the language
    model previously accessed
  • Training with exclusive data (e.g., your historical emails or messages,
    records from customer service interactions)
  • Targeted use cases (categorization, assessing accuracy)
A

fine tuning

16
Q

What does it mean to automatically evaluate a model?

A

Evaluate a model for quality control.
Scores are calculated automatically

17
Q

What does it mean to have human evaluation of a model?

A
  • Choose a work team to evaluate
  • Employees of your company
  • Subject-Matter Experts (SMEs)
  • Define metrics and how to evaluate
  • Thumbs up/down, ranking
18
Q
  • Curated collections of data designed specifically
    for evaluating the performance of language
    models
  • Wide range of topics, complexities, linguistic
    phenomena
  • Helpful to measure: accuracy, speed and
    efficiency, scalability
A

benchmark datasets

19
Q

_________
* Semantic similarity between generated and reference text
* Uses pre-trained ___ models (Bidirectional Encoder Representations from Transformers) to compare the
contextualized embeddings of both texts and computes the cosine similarity between them.
* Capable of capturing more nuance between the texts

A
  • BERTScore
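
A quick sketch of computing it with the open-source bert-score package (assumed installed via pip install bert-score); the sentences are made up:

from bert_score import score

candidates = ["The cat sat on the mat."]
references = ["A cat was sitting on the mat."]

# Precision, recall and F1 are computed from cosine similarity of contextual BERT embeddings
P, R, F1 = score(candidates, references, lang="en")
print(f"BERTScore F1: {F1.mean().item():.3f}")
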
20
Q
  • Evaluate the quality of generated text, especially for translations
  • Considers both precision and penalizes too much brevity
  • Looks at a combination of n-grams (1, 2, 3, 4)
A

BLEU: Bilingual Evaluation Understudy
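
A small sketch using NLTK's BLEU implementation (assumed installed; tokens are made up), with smoothing so missing higher-order n-grams don't zero out the score:

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["the", "cat", "is", "on", "the", "mat"]]   # list of tokenized reference translations
candidate = ["the", "cat", "sat", "on", "the", "mat"]    # tokenized generated translation

# Default weights average 1- to 4-gram precision, with a brevity penalty for short outputs
bleu = sentence_bleu(reference, candidate, smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {bleu:.3f}")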

21
Q

Evaluating automatic summarization and machine translation systems
* ____-N – measure the number of matching n-grams between reference and generated text
* _____–L – longest common subsequence between reference and generated text

A
  • ROUGE: Recall-Oriented Understudy for Gisting Evaluation
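
A small sketch using the open-source rouge-score package (assumed installed; the strings are made up):

from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
scores = scorer.score("the cat sat on the mat",        # reference text
                      "a cat was sitting on the mat")  # generated text
print(scores["rouge1"].fmeasure)   # unigram overlap
print(scores["rougeL"].fmeasure)   # longest common subsequence
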
22
Q
  • Allows a Foundation Model to reference a data source outside of its training data
  • Bedrock takes care of creating Vector Embeddings in the database of your choice based on your data
  • Use where real-time data is needed to be fed into the Foundation Model
A
  • RAG = Retrieval-Augmented Generation
23
Q

Search & analytics database
* Real-time similarity queries, store millions of vector embeddings
* Scalable index management and fast nearest-neighbor (kNN) search capability

A

Amazon OpenSearch Service

24
Q

[with MongoDB compatibility] – NoSQL database
* Real-time similarity queries, store millions of vector embeddings

A

Amazon DocumentDB

25
Q

These are examples of what use cases?

  • Customer Service Chatbot
  • Knowledge Base – products, features, specifications, troubleshooting guides, and FAQs
  • ___ application – chatbot that can answer customer queries
  • Legal Research and Analysis
  • Knowledge Base – laws, regulations, case precedents, legal opinions, and expert analysis
  • ___ application – chatbot that can provide relevant information for specific legal queries
  • Healthcare Question-Answering
  • Knowledge base – diseases, treatments, clinical guidelines, research papers, patients…
  • ___ application – chatbot that can answer complex medical queries
A

RAG Use case

26
Q

__________: converting raw text into a sequence of tokens

A

Tokenization
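
For illustration only (every model family ships its own tokenizer), a sketch using the open-source tiktoken package, assumed installed:

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("Tokenization converts raw text into tokens.")
print(len(tokens), tokens)   # number of tokens and their integer IDs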

26
Q
  • The number of tokens an LLM can consider when generating text
A

Context Window

26
Q

What is the first factor to look at when considering a model?

A

the context window. The larger the context window,
the more information and
coherence

27
Q
  • Control the interaction between users and Foundation Models (FMs)
  • Filter undesirable and harmful content
  • Remove Personally Identifiable Information (PII)
  • Enhanced privacy
  • Reduce hallucinations
A

guardrails

27
Q
  • Create vectors (array of numerical values) out of text, images or audio
  • Vectors have a high dimensionality to capture many features for one input
    token, such as semantic meaning, syntactic role, sentiment
  • _____models can power search applications
A

Embedding

28
Q

Manage and carry out various multi-step tasks related to infrastructure
provisioning, application deployment, and operational activities
* Task coordination: perform tasks in the correct order and ensure
information is passed correctly between tasks
* _____are configured to perform specific pre-defined action groups

A

agents

29
Q

Send logs of all invocations to Amazon
CloudWatch and S3
* Can include text, images and embeddings
* Analyze further and build alerting thanks to
CloudWatch Logs Insights

A
  • Model Invocation Logging
30
Q
  • Published metrics from Bedrock to _________
A
  • CloudWatch Metrics
31
Q

_________ – give access to Amazon Bedrock to your team so they can easily create AI-powered applications

A
  • Bedrock Studio
32
Q
  • _________
    – check if
    an image was generated by
    Amazon Titan Generator
A

Watermark detection

33
Q

What is the bedrock pricing model for image models?

A

charged for every image generated

34
Q

What is the bedrock pricing model for embedding models?

A

charged for every input
token processed

35
Q

What is the bedrock pricing model for text models?

A

charged for every input/output
token processed

36
Q

Model Improvement Techniques Cost Order:

Put the cheapest at the top and the expensive at the bottom.

Prompt Engineering, Domain Adaptation Fine-tuning, Instruction-based Fine-tuning, Retrieval Augmented Generation (RAG)

A
  1. Prompt Engineering
    * No model training needed (no additional computation or fine-tuning)
  2. Retrieval Augmented Generation (RAG)
    * Uses external knowledge (FM doesn’t need to ”know everything”, less complex)
    * No FM changes (no additional computation or fine-tuning)
  3. Instruction-based Fine-tuning
    * FM is fine-tuned with specific instructions (requires additional computation)
  4. Domain Adaptation Fine-tuning
    * Model is trained on a domain-specific dataset (requires intensive computation)
37
Q

usually a smaller model will be cheaper (T/F)

A

True

38
Q

developing, designing, and optimizing prompts to
enhance the output of FMs for your needs

A

Prompt engineering

39
Q
  • Prompt gives a lot of guidance and leaves little to the model’s interpretation

True or false

A
  • false, Prompt gives little guidance and leaves a lot to the model’s interpretation
40
Q

What are the four components of an improved prompt?

A
  • Instructions – a task for the model to do (description, how the model should perform)
  • Context – external information to guide the model
  • Input data – the input for which you want a response
  • Output Indicator – the output type or format
41
Q

A technique where you explicitly instruct the model on what not to include
or do in its response

A

negative prompting

42
Q

True or false. Negative prompting aims to avoid Unwanted Content – explicitly states what not to include, reducing the chances
of irrelevant or inappropriate content

A

true

43
Q

What is temperature in prompt engineering?

A

creativity of the model’s output
* Low (ex: 0.2) – outputs are more conservative, repetitive, focused on most likely response
* High (ex: 1.0) – outputs are more diverse, creative, and unpredictable, maybe less coherent
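
A rough sketch of setting temperature when invoking a Bedrock text model with boto3; the model ID and request body follow the Amazon Titan text format, so verify the payload for whichever model you actually use:

import json
import boto3

bedrock = boto3.client("bedrock-runtime")

body = {
    "inputText": "Summarize the AWS shared responsibility model in one sentence.",
    "textGenerationConfig": {
        "temperature": 0.2,     # low = conservative, focused output; ~1.0 = more creative
        "maxTokenCount": 200,
    },
}

response = bedrock.invoke_model(modelId="amazon.titan-text-express-v1", body=json.dumps(body))
print(json.loads(response["body"].read()))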

44
Q

_______ is how fast the model responds

A

prompt latency

45
Q

What type of prompt engineering technique is this:

Present a task to the model
without providing examples or
explicit training for that specific task

A

zero shot prompting

46
Q

What type of prompt engineering technique is this:

Present a task to the model and provide a few examples ("shots") of the task being completed, to guide its output

A

few shots prompting

47
Q

What type of prompt engineering technique is this:

  • Divide the task into a sequence of
    reasoning steps, leading to more structure and coherence
  • Using a sentence like “Think step by step” helps
  • Helpful when solving a problem as a human usually requires several steps
A

Chain of Thought Prompting

48
Q

What type of prompt engineering technique is this:

  • Combine the model’s capability
    with external data sources to
    generate a more informed and
    contextually rich response
A

Retrieval-Augmented Generation (RAG)

49
Q
  • Simplify and standardize the process of generating Prompts
  • Helps with:
  • Processes user input text and output prompts from
    foundation models (FMs)
  • Orchestrates between the FM, action groups, and knowledge bases
  • Formats and returns responses to the user
  • You can also provide examples with few-shots
    prompting to improve the model performance
A

prompt templates

50
Q
  • Amazon QuickSight is used to visualize your
    data and create dashboards about them
  • Amazon Q understands natural language that
    you use to ask questions about your data
  • Create executive summaries of your data
  • Ask and answer questions of data
  • Generate and edit visuals for your dashboards
A

Amazon Q for quicksight

51
Q
  • EC2 instances are the virtual servers you
    can start in AWS
  • Amazon Q for EC2 provides guidance and
    suggestions for EC2 instance types that are
    best suited to your new workload
  • Can provide requirements using natural
    language to get even more suggestions or
    ask for advice by providing other workload
    requirements
A

Amazon Q for EC2

52
Q
  • AWS Chatbot is a way for you to deploy an AWS Chatbot in a Slack
    or Microsoft Teams channel that
    knows about your AWS account
  • Troubleshoot issues, receive
    notifications for alarms, security
    findings, billing alerts, create support
    request
  • You can access Amazon Q directly in
    AWS Chatbot to accelerate
    understanding of the AWS services,
    troubleshoot issues, and identify
    remediation paths
A

Amazon Q for AWS Chatbot

53
Q
  • Fully managed Gen-AI assistant for your employees
  • Based on your company’s knowledge and data
  • Answer questions, provide summaries, generate content, automate tasks
  • Perform routine actions (e.g., submit time-off requests, send meeting invites)
  • Built on Amazon Bedrock (but you can’t choose the underlying FM)
A

Amazon Q for business

54
Q
  • Create Gen AI-powered apps without coding by using natural language
  • Leverages your company’s internal data
  • Possibility to leverage plugins (Jira, etc…)
A

Amazon Q apps

55
Q
  • Answer questions about the AWS
    documentation and AWS service selection
  • Answer questions about resources in your AWS
    account
  • Suggests CLI (Command Line Interface) commands to run
    to make changes to your account
  • Helps you do bill analysis, resolve errors,
    troubleshooting
A

amazon q developer

56
Q

is a broad field for the development
of intelligent systems capable of
performing tasks that typically require
human intelligence:

A

Artificial Intelligence

57
Q

What AI Component is this:

collect vast amount of data

A

data layer

58
Q

What AI Component is this:

data scientists and engineers work together to understand use cases, requirements, and
frameworks that can solve them

A
  • ML Framework and Algorithm Layer
59
Q

What AI Component is this:

implement a model and train it: the structure, the parameters and functions, the optimizer function

A

model layer

60
Q

What AI Component is this:

how to serve the model,
and its capabilities for your users

A

application layer

61
Q
  • _______ is a type of AI for building methods that allow machines to learn
  • Data is leveraged to improve computer performance on a set of tasks
  • Make predictions based on data used to train the model
  • No explicit programming of rules
A

Machine Learning

62
Q
  • Uses neurons and synapses (like our brain) to
    train a model
  • Process more complex patterns in the data
    than traditional ML
A

Deep Learning

63
Q

True or False: Natural Language Processing is NOT an example of deep learning.

A

False, it is

64
Q

Is generative AI a subset of deep learning?

A

Yes

65
Q
  • Powerful models that can understand and generate
    human-like text
  • Trained on vast amounts of text data from the internet,
    books, and other sources, and learn patterns and
    relationships between words and phrases
  • Example: Google BERT, OpenAI ChatGPT
A

Transformer based LLMs

66
Q
  • Able to process a sentence as a whole instead of word by word
  • Faster and more efficient text processing (less
    training time)
  • It gives relative importance to specific words in a
    sentence (more coherent sentences)
A

Transformer based LLMs

67
Q
  • Does NOT rely on a single type of input (text, or images, or audio only)
  • Does NOT create a single type of output
  • Example: a ______ can take a mix of audio, image, and text and output a mix of video and text, for example
A

Multi-modal Models

68
Q

– generate human text or computer code based on input prompt

A

GPT (Generative Pre-trained Transformer)

69
Q

– similar intent to GPT,
but reads the text in two directions

A

BERT (Bidirectional Encoder Representations from Transformers)

70
Q

– meant for sequential data such as time-series or text,
useful in speech recognition, time-series prediction

A

RNN (Recurrent Neural Network)

71
Q

used for image recognition tasks, object detection, facial recognition

A

ResNet (Residual Network) – Deep Convolutional Neural Network (CNN)

72
Q

– ML algorithm for classification and regression

A

SVM (Support Vector Machine)

73
Q

– model to generate raw audio waveform, used in Speech Synthesis

A

WaveNet

74
Q

– models used to generate synthetic data such as images, videos or sounds that resemble the training data. Helpful for data augmentation

A

GAN (Generative Adversarial Network)

75
Q

– an implementation of gradient boosting

A

XGBoost (Extreme Gradient Boosting)

76
Q

Data includes both input features and corresponding output labels

A

labeled data

77
Q

Data includes only input features without
any output labels

A

unlabeled data

78
Q
  • Data is organized in a structured format, often in rows and columns (like Excel)
A

structured data

79
Q

Data is arranged in a table with rows representing records and columns representing features

A

tabular data

80
Q

Data points collected or recorded at successive
points in time

A

time series data

81
Q
  • Data that doesn’t follow a specific structure and is often text-heavy or multimedia content
A

unstructured data

82
Q
  • Learn a mapping function that can predict the output for new unseen input data
  • Needs labeled data: very powerful, but difficult to perform on millions of datapoints
A

supervised learning

83
Q
  • Used to predict a numeric value based on input data
  • The output variable is continuous, meaning it can
    take any value within a range
A

Supervised Learning – Regression

84
Q

What type of supervised learning do these scenarios represent?

  • Predicting House Prices – based on features like size,
    location, and number of bedrooms
  • Stock Price Prediction – predicting the future price of a
    stock based on historical data and other features
  • Weather Forecasting – predicting temperatures based on
    historical weather data
A

regression

85
Q

Used to predict the categorical label of input data
* The output variable is discrete, which means it falls into
a specific category or class
* Use cases: scenarios where decisions or predictions
need to be made between distinct categories (fraud,
image classification, customer retention, diagnostics)

A

Supervised Learning – Classification

86
Q
  • Used to train the model
  • Percentage: typically, 60-80% of the dataset
  • Example: 800 labeled images from a dataset of 1000 images
A

training set

87
Q
  • Used to tune model parameters and validate performance
  • Percentage: typically, 10-20% of the dataset
  • Example: 100 labeled images for hyperparameter tuning
    (tune the settings of the algorithm to make it more efficient)
A

validation set

88
Q
  • Used to evaluate the final model performance
  • Percentage: typically, 10-20% of the dataset
  • Example: 100 labeled images to test the model’s accuracy

A

test set
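
A sketch of producing the 80/10/10 split described in these three cards with scikit-learn (synthetic data, arbitrary random_state):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=42)

# 80% train, then split the remaining 20% in half: 10% validation, 10% test
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.2, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=42)

print(len(X_train), len(X_val), len(X_test))   # 800 100 100
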
89
Q
  • The process of using domain knowledge to
    select and transform raw data into meaningful
    features
  • Helps enhance the performance of machine
    learning models
A

feature engineering

90
Q

– extracting useful information from raw data, such as deriving age from date of birth

A

feature extraction

91
Q

– selecting a subset of relevant features, like choosing important predictors in a regression model

A

Feature Selection

92
Q

– transforming data for better model performance, such as normalizing numerical data

A

Feature Transformation

93
Q
  • _______ – deriving new features like “price per square foot”
  • _________ – identifying and retaining important features such as location
    or number of bedrooms
  • _________ – normalizing features to ensure they are on a similar scale, which helps algorithms like gradient descent converge faster
A

Feature Creation
Feature Selection
Feature Transformation
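
A small pandas illustration of the three steps, using a made-up housing table:

import pandas as pd

df = pd.DataFrame({
    "price": [300_000, 450_000],
    "sqft": [1500, 2000],
    "bedrooms": [3, 4],
    "listing_id": [101, 102],
})

# Feature creation: derive "price per square foot"
df["price_per_sqft"] = df["price"] / df["sqft"]

# Feature selection: keep only the columns judged important
features = df[["sqft", "bedrooms", "price_per_sqft"]]

# Feature transformation: min-max normalize so features share a similar scale
normalized = (features - features.min()) / (features.max() - features.min())
print(normalized)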

94
Q
  • The goal is to discover inherent patterns, structures,
    or relationships within the input data
  • The machine must uncover and create the groups
    itself, but humans still put labels on the output groups
A

unsupervised learning

95
Q
  • Used to group similar data points together into clusters
    based on their features
A

unsupervised learning - clustering

96
Q
  • Use a small amount of labeled data and a
    large amount of unlabeled data to train
    systems
  • After that, the partially trained algorithm
    itself labels the unlabeled data
  • This is called pseudo-labeling
  • The model is then re-trained on the
    resulting data mix without being explicitly
    programmed
A

Semi-supervised Learning

97
Q
  • A type of Machine Learning where an agent
    learns to make decisions by performing actions in
    an environment to maximize cumulative rewards
A

reinforcement learning

98
Q

What are the associated reinforcement learning concepts:

  • Key Concepts
  • __– the learner or decision-maker
  • _____– the external system the agent
    interacts with
  • ____– the choices made by the agent
  • ___– the feedback from the environment based
    on the agent’s actions
  • __– the current situation of the environment
  • __– the strategy the agent uses to determine
    actions based on the state
A
  • Agent – the learner or decision-maker
  • Environment – the external system the agent
    interacts with
  • Action – the choices made by the agent
  • Reward – the feedback from the environment based
    on the agent’s actions
  • State – the current situation of the environment
  • Policy – the strategy the agent uses to determine
    actions based on the state
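
A toy loop (entirely made-up environment and policy) just to label where each concept appears:

import random

state = 0                                    # State: current situation of the environment
policy = lambda s: random.choice([-1, 1])    # Policy: rule the agent uses to pick an action (here: random)

total_reward = 0
for step in range(10):
    action = policy(state)                   # Action: the agent's choice
    state = state + action                   # Environment: reacts to the action and changes state
    reward = 1 if state == 3 else 0          # Reward: feedback from the environment
    total_reward += reward                   # The agent's goal: maximize cumulative reward
print(total_reward)
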
99
Q

The goal of __________ is to maximize cumulative reward over time.

A

reinforcement learning

100
Q

What does RLHF stand for?

A
  • RLHF = Reinforcement Learning from Human Feedback
101
Q
  • Use human feedback to help ML models to self-learn more efficiently
  • In Reinforcement Learning there’s a reward function
  • RLHF incorporates human feedback in the reward function, to be more
    aligned with human goals, wants and needs
  • First, the model’s responses are compared to humans’ responses
  • Then, a human assesses the quality of the model’s responses
A
  • RLHF = Reinforcement Learning from Human Feedback
102
Q

In case your model has poor
performance, you need to look at its ___

A

fit

103
Q

What kind of model fit is this:

  • Performs well on the training data
  • Doesn’t perform well on evaluation data
A

Overfitting

105
Q

What kind of model fit is this:

Model performs poorly on training data
* Could be a problem of having a model too
simple or poor data features

A

Underfitting

106
Q

What kind of model fit is this:

-Neither overfitting nor underfitting

A

Balanced fit

107
Q
  • Difference or error between predicted and actual value
  • Occurs due to the wrong choice in the ML process

A

Bias

108
Q
  • The model doesn’t closely match the training data
  • Example: linear regression function on a non-linear dataset
  • Considered as underfitting

A

High bias

109
Q

How much the performance of a model changes if
trained on a different dataset which has a similar
distribution

A

Variance

110
Q
  • Model is very sensitive to changes in the training data
  • This is the case when overfitting: performs well on
    training data, but poorly on unseen test data
A

high variance

111
Q

how can you reduce variance?

A

Feature selection (less, more important features)
* Split into training and test data sets multiple times

112
Q

Precision or Recall?
True Positives / (True Positives + False Positives)

A

Precision

113
Q

Precision or Recall?

True Positives / (True Positives + False Negatives)

A

Recall

114
Q
  • _____– Best when false positives are costly
  • ____– Best when false negatives are costly
  • ______ – Best when you want a balance between precision and recall, especially in imbalanced datasets
  • ______ – Best for balanced datasets
A

Precision
Recall
F1 Score
Accuracy
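
A worked example with made-up confusion-matrix counts, tying the formulas above together:

tp, fp, fn, tn = 80, 10, 20, 90   # made-up counts

precision = tp / (tp + fp)                            # 0.889 - hurt by false positives
recall = tp / (tp + fn)                               # 0.800 - hurt by false negatives
f1 = 2 * precision * recall / (precision + recall)    # 0.842 - balance of the two
accuracy = (tp + tn) / (tp + fp + fn + tn)            # 0.850 - fine when classes are balanced
print(precision, recall, f1, accuracy)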

115
Q
  • AUC-ROC shows what the curve of the true positive rate compared to the false positive rate looks like at various
    thresholds, with multiple confusion matrices
  • You compare them to one another to find out the threshold you need for your business use case.
A

AUC-ROC
Area Under the Curve – Receiver Operating Characteristic curve

116
Q
  • ________ is when a model makes predictions on new data
A

Inferencing

117
Q
  • Settings that define the model structure and learning algorithm and process
  • Set before training begins
  • Examples: learning rate, batch size, number of epochs, and regularization
A

Hyperparameter

118
Q
  • Finding the best ______ values to optimize the model performance
A

hyperparameters

119
Q

What hyperparameter is this:

How large or small the steps are when updating the model’s weights during training
* High ________ can lead to faster convergence but risks overshooting the optimal
solution, while a low learning rate may result in more precise but slower convergence.

A

learning rate

120
Q

What hyperparamater is this:

  • Number of training examples used to update the model weights in one iteration
  • Smaller batches can lead to more stable learning but require more time to compute,
    while larger batches are faster but may lead to less stable updates.
A

batch size

121
Q

what hyperparameter is this:

  • Refers to how many times the model will iterate over the entire training dataset.
  • Too few epochs can lead to underfitting, while too many may cause overfitting
A

Number of epochs
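
A toy gradient-descent loop (synthetic data) showing where the learning rate, batch size, and number of epochs each appear:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=1000)

learning_rate = 0.01    # step size for each weight update
batch_size = 32         # training examples used per update
epochs = 5              # full passes over the training set

w = np.zeros(3)
for epoch in range(epochs):
    for start in range(0, len(X), batch_size):
        xb, yb = X[start:start + batch_size], y[start:start + batch_size]
        grad = xb.T @ (xb @ w - yb) / len(xb)   # gradient of the squared error on the batch
        w -= learning_rate * grad               # weight update
print(w)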

122
Q
  • _______ is when the model gives good predictions for training data
    but not for the new data
A

Overfitting

123
Q

_______ are pre-trained ML services for your use case

A

AWS AI Services

124
Q
  • For Natural Language Processing – NLP
  • Fully managed and serverless service
  • Uses machine learning to find insights and relationships in text
  • Language of the text
  • Extracts key phrases, places, people, brands, or events
  • Understands how positive or negative the text is
  • Analyzes text using tokenization and parts of speech
  • Automatically organizes a collection of text files by topic
  • Sample use cases:
  • analyze customer interactions (emails) to find what leads to a positive or negative experience
  • Create and group articles by topics that Comprehend will uncover
A

Amazon Comprehend
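
A minimal boto3 sketch of two Comprehend calls (region and credentials assumed to be configured; the text is made up):

import boto3

comprehend = boto3.client("comprehend")
text = "I love the new checkout flow, but shipping to Paris was slow."

sentiment = comprehend.detect_sentiment(Text=text, LanguageCode="en")
entities = comprehend.detect_entities(Text=text, LanguageCode="en")

print(sentiment["Sentiment"])                       # e.g. MIXED
print([e["Text"] for e in entities["Entities"]])    # e.g. ['Paris']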

125
Q

Extracts predefined, general-purpose entities like people, places, organizations, dates, and other standard categories, from text

A

Named Entity Recognition (NER)

126
Q
  • Natural and accurate language translation
  • ________ allows you to localize content - such as websites and
    applications - for international users, and to easily translate large volumes of text efficiently.
A

Amazon Translate
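
A minimal boto3 sketch (credentials and region assumed to be configured):

import boto3

translate = boto3.client("translate")
result = translate.translate_text(Text="Hello, how are you?",
                                  SourceLanguageCode="en",
                                  TargetLanguageCode="fr")
print(result["TranslatedText"])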

127
Q
  • Automatically convert speech to text
  • Uses a deep learning process called automatic speech recognition (ASR) to convert speech to text quickly and accurately
  • Automatically remove Personally Identifiable Information (PII) using Redaction
  • Supports Automatic Language Identification for multi-lingual audio
  • Use cases:
  • transcribe customer service calls
  • automate closed captioning and subtitling
  • generate metadata for media assets to create a fully searchable archive
A

amazon transcribe

128
Q

What managed service:

Turn text into lifelike speech using deep learning
* Allowing you to create applications that talk

A

amazon polly

129
Q

What AWS Managed service:

  • Find objects, people, text, scenes in images and videos using ML
  • Facial analysis and facial search to do user verification, people counting
  • Create a database of “familiar faces” or compare against celebrities
  • Use cases:
  • Labeling
  • Content Moderation
  • Text Detection
  • Face Detection and Analysis (gender, age range, emotions…)
  • Face Search and Verification
  • Celebrity Recognition
  • Pathing (ex: for sports game analysis)
A

Amazon Rekognition

130
Q
  • Fully managed service that uses ML to deliver highly accurate forecasts
  • Example: predict the future sales of a raincoat
  • 50% more accurate than looking at the data itself
  • Reduce forecasting time from months to hours
  • Use cases: Product Demand Planning, Financial Planning, Resource Planning, …
A

Amazon Forecast

131
Q

What managed service:

  • Build chatbots quickly for your
    applications using voice and text
  • Example: a chatbot that allows your customers to order pizzas or book a hotel
  • Supports multiple languages
  • Integration with AWS Lambda, Connect, Comprehend, Kendra
  • The bot automatically understands the
    user intent to invoke the correct
    Lambda function to “fulfill the intent”
  • The bot will ask for ”Slots” (input parameters) if necessary
A

amazon Lex

132
Q
  • Fully managed ML-service to build apps with real-time personalized recommendations
  • Example: personalized product recommendations/re-ranking, customized direct marketing
  • Example: User bought gardening tools, provide recommendations on the next one to buy
  • Same technology used by Amazon.com
  • Integrates into existing websites, applications, SMS, email marketing systems, …
  • Implement in days, not months (you don’t need to build, train, and deploy ML solutions)
  • Use cases: retail stores, media and entertainment…
A

Amazon Personalize

133
Q
  • Automatically extracts text, handwriting, and data from any scanned
    documents using AI and ML
A

amazon textract

134
Q
  • Fully managed document search service powered by Machine Learning
  • Extract answers from within a document (text, pdf, HTML, PowerPoint, MS Word, FAQs…)
  • Natural language search capabilities
  • Learn from user interactions/feedback to promote preferred results (Incremental Learning)
  • Ability to manually fine-tune search results (importance of data, freshness, custom, …)
A

Amazon Kendra

135
Q
  • Crowdsourcing marketplace to perform simple human
    tasks
  • Distributed virtual workforce
  • Example: you have a dataset of 10,000,000 images and you want to
    label these images
  • You distribute the task on Mechanical Turk and humans
    will tag those images
  • You set the reward per image (for example $0.10 per
    image)
  • Use cases: image classification, data collection, business
    processing
A

Amazon Mechanical Turk

136
Q
  • Human oversight of Machine Learning predictions in production
  • Can be your own employees, over 500,000 contractors from AWS, or AWS Mechanical Turk
  • Some vendors are pre-screened for confidentiality requirements
  • The ML model can be built on AWS or elsewhere (SageMaker, Rekognition…)
A

Amazon Augmented AI (A2I)

137
Q
  • Fully autonomous 1/18th scale car race driven by
    Reinforcement Learning (RL)
A

AWS DeepRacer

138
Q
  • Fully managed service for developers / data scientists to build ML models
  • Typically, difficult to do all the processes in one place + provision servers
  • Example: predicting your AWS exam score

A

Amazon SageMaker

139
Q

These are examples of a SageMaker service:

  • Supervised Algorithms
  • Unsupervised Algorithms
  • Textual Algorithms
  • Image Processing
A

sagemaker built in algorithms

140
Q
  • Define the Objective Metric
  • _____ automatically chooses
    hyperparameter ranges, search
    strategy, maximum runtime of a tuning job, and early stop
    condition
  • Saves you time and money
  • Helps you avoid wasting money
    on suboptimal configurations
A

SageMaker – Automatic Model Tuning (AMT)

141
Q

This is an example of batch or asynchronous sagemaker model deployment:

  • For large payload sizes up to 1GB
  • Long processing times
  • Near-real time latency requirements
  • Request and responses are in Amazon S3
A

Asynchronous

142
Q

This is an example of batch or asynchronous sagemaker model deployment:

  • Prediction for an entire dataset (multiple
    predictions)
  • Request and responses are in Amazon S3

A

Batch

143
Q
  • End-to-end ML development from a unified interface
  • Team collaboration
  • Tune and debug ML models
  • Deploy ML models
  • Automated workflow
A

sagemaker studio

144
Q
  • Prepare tabular and image data for machine learning
  • Data preparation, transformation and
    feature engineering
  • Single interface for data selection, cleansing, exploration, visualization,
    and processing
  • SQL support
  • Data Quality tool
A

SageMaker – Data Wrangler

145
Q

_____ are inputs to ML
models used during training and
used for inference
* Example - music dataset: song
ratings, listening duration, and
listener demographics

A

Features

146
Q
  • Ingests features from a variety of sources
  • Ability to define the transformation of data into feature from within Feature Store
  • Can publish directly from SageMaker Data Wrangler into SageMaker Feature Store
  • Features are discoverable within SageMaker Studio
A

SageMaker – Feature Store

147
Q
  • Evaluate Foundation Models
  • Evaluating human-factors such as friendliness or humor
  • Leverage an AWS-managed team or bring your own employees
  • Use built-in datasets or bring your own dataset
  • Built-in metrics and algorithms
A

SageMaker Clarify

148
Q
  • A set of tools to help explain how machine
    learning (ML) models make predictions
  • Understand model characteristics as a whole
    prior to deployment
  • Debug predictions provided by the model
    after it’s deployed
  • Helps increase the trust and understanding of
    the model
  • Example:
  • “Why did the model predict a negative outcome
    such as a loan rejection for a given applicant?”
  • “Why did the model make an incorrect
    prediction?”
A

SageMaker Clarify - Model Explainability

149
Q
  • Ability to detect and
    explain biases in your
    datasets and models
  • Measure bias using
    statistical metrics
  • Specify input features and
    bias will be automatically
    detected
A

SageMaker Clarify – Detect Bias (human)

150
Q

_______ occurs when the training data does not represent the full population fairly, leading to a model that over-represents or disproportionately affects certain groups

A

Sampling bias

151
Q

____ occurs when the tools or measurements used
in data collection are flawed or skewed

A

Measurement bias

152
Q

_________ happens when the person collecting or interpreting the data has personal biases that affect the result

A

Observer bias

153
Q

_________is when individuals interpret or favor information that confirms their preconceptions. This is more applicable to human
decision-making rather than automated model outputs.

A

Confirmation bias

154
Q
  • Model review, customization and evaluation
  • Align model to human preferences
  • Reinforcement learning where human feedback is
    included in the “reward” function
A
  • RLHF – Reinforcement Learning from
    Human Feedback
155
Q
  • Define roles for personas
  • Example: data scientists, MLOps engineers
A
  • SageMaker Role Manager
156
Q
    • Centralized portal where you can view, search, and explore all of your models
  • Information and insights for all models
A
  • SageMaker Model Dashboard
157
Q
  • Monitor the quality of your model in production: continuous or on-schedule
  • Alerts for deviations in the model quality: fix data & retrain model
  • Example: loan model starts giving loans to people who don’t have the correct credit score (drift)
A

SageMaker – Model Monitor

158
Q
  • Centralized repository allows you to track, manage, and version ML models
  • Catalog models, manage model versions, associate metadata with a model
  • Manage approval status of a model, automate model deployment, share models…
A

SageMaker – Model Registry

159
Q
  • a workflow that
    automates the process of building,
    training, and deploying an ML model
  • Continuous Integration and
    Continuous Delivery (CI/CD) service
    for Machine Learning
  • Helps you easily build, train, test, and
    deploy 100s of models automatically
  • Iterate faster, reduce errors (no manual
    steps), repeatable mechanisms
A

SageMaker Pipelines

160
Q
  • ML Hub to find pre-trained Foundation
    Model (FM), computer vision models, or
    natural language processing models
  • Large collection of models from Hugging
    Face, Databricks, Meta, Stability AI…
  • Models can be fully customized for your data
    and use-case
  • Models are deployed on SageMaker directly
    (full control of deployment options)
  • Pre-built ML solutions for demand
    forecasting, credit rate prediction, fraud
    detection and computer vision
A

SageMaker JumpStart

161
Q
  • Build ML models using a visual interface
    (no coding required)
  • Access to ready-to-use models from Bedrock or JumpStart
  • Build your own custom model using AutoML powered by SageMaker Autopilot
  • Part of SageMaker Studio
  • Leverage Data Wrangler for data preparation
A

SageMaker Canvas

162
Q

an open-source tool which
helps ML teams manage the entire ML
lifecycle

A

MLflow

163
Q

What is sagemaker ground truth used for?

A

RLHF, humans for model grading and data labeling

164
Q

Sagemaker role manager is used for ______

A

access control

165
Q
  • Making sure AI systems are transparent and trustworthy
  • Mitigating potential risk and negative outcomes
  • Throughout the AI lifecycle: design, development, deployment,
    monitoring, evaluation
A
  • Responsible AI
166
Q
  • Ensure to add value and manage risk in the operation of business
  • Clear policies, guidelines, and oversight mechanisms to ensure AI
    systems align with legal and regulatory requirements
  • Improve trust
A
  • Governance
167
Q
  • Ensure adherence to regulations and guidelines
  • Sensitive domains such as healthcare, finance, and legal applications
A

Compliance

168
Q
  • Form of responsible AI
    documentation
  • Help understand the service
    and its features
  • Find intended use cases and
    limitations
  • Responsible AI design choices
  • Deployment and
    performance optimization
    best practices
A

AWS AI Service Cards

169
Q
  • The degree to which a human can understand the cause of a decision
  • Access into the system so that a human can interpret the model’s output
  • Answer “why and how”
A
  • Interpretability
170
Q
  • Understand the nature and behavior of the
    model
  • Being able to look at inputs and outputs
    and explain without understanding exactly
    how the model came to the conclusion
A

explainability

171
Q
  • Show how a single feature can
    influence the predicted outcome, while holding other features constant
  • Particularly helpful when the model
    is “black box” (i.e., Neural Networks)
  • Helps with interpretability and
    explainability
A

Partial Dependence Plots (PDP)
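
A sketch using scikit-learn's built-in PDP utility (assumes a reasonably recent scikit-learn plus matplotlib; the data is synthetic):

import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = make_regression(n_samples=500, n_features=4, random_state=0)
model = GradientBoostingRegressor().fit(X, y)

# How feature 0 alone influences the prediction, holding the other features constant
PartialDependenceDisplay.from_estimator(model, X, features=[0])
plt.show()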

172
Q
  • Approach to design AI systems with priorities for humans’ needs
A

Human-Centered Design (HCD) for
Explainable AI

173
Q

Generating content that is offensive, disturbing, or inappropriate

A

Toxicity

174
Q
  • Assertions or claims that sound true, but are incorrect
  • This is due to the next-word probability
    sampling employed by LLMs
A

Hallucinations

175
Q
  • Intentional introduction of malicious or biased data
    into the training dataset of a model
  • Leads to the model producing biased, offensive, or
    harmful outputs (intentionally or unintentionally)

A

Data poisoning

176
Q
  • Influencing the outputs by embedding specific
    instructions within the prompts themselves
  • Hijack the model’s behavior and make it produce
    outputs that align with the attacker’s intentions
    (e.g., generating misinformation or running
    malicious code)
A

Hijacking and Prompt Injection

177
Q
  • The risk of exposing sensitive or confidential
    information to a model during training or
    inference
  • The model can then reveal this sensitive data
    from its training corpus, leading to potential
    data leaks or privacy violations

A

Exposure

178
Q
  • The unintentional disclosure or leakage of the
    prompts or inputs used within a model
  • It can expose protected data or other data used
    by the model, such as how the model works
A

prompt leaking

179
Q
  • AI models are typically trained with certain ethical and safety constraints in place to prevent misuse or harmful outputs (e.g., filtering out offensive content, restricting access
    to sensitive information…)
  • Circumvent the constraints and safety measures implemented in a generative model to gain
    unauthorized access or functionality
A

jailbreaking

180
Q

– principles, guidelines, and responsible AI considerations
* Data management, model training, output validation, safety, and human oversight
* Intellectual property, bias mitigation, and privacy protection

A

Policies

181
Q

– combination of technical, legal, and responsible AI review
* Clear timeline: monthly, quarterly, annually…
* Include Subject Matter Experts (SMEs), legal and compliance teams and end-users

A

review cadence

182
Q
  • Technical reviews on model performance, data quality, algorithm robustness
  • Non-technical reviews on policies, responsible AI principles, regulatory requirements
  • Testing and validation procedure for outputs before deploying a new model
  • Clear decision-making frameworks to make decisions based on review results
A

review strategies

183
Q
  • Publishing information about the AI models, training data, key decisions made
  • Documentation on limitations, capabilities and use cases of AI solutions
  • Channels for end-users and stakeholders to provide feedback and raise concerns
A

transparency standards

184
Q
  • Train on relevant policies, guidelines, and best practices
  • Training on bias mitigation and responsible AI practices
  • Encourage cross-functional collaboration and knowledge-sharing
  • Implement a training and certification program
A

team training requirements

185
Q
  • Responsible framework and guidelines (bias, fairness, transparency, accountability)
  • Monitor AI and Generative AI for potential bias, fairness issue, and unintended consequences
  • Educate and train teams on responsible AI practices
A

responsible AI

186
Q
  • Attributing and acknowledging the sources of the data
  • Datasets, databases, other sources
  • Relevant licenses, terms of use, or permissions
A

source citation

187
Q
  • Example: generating fake content, manipulated data, automated attacks
  • Deploy AI-based threat detection systems
  • Analyze network traffic, user behavior, and other relevant data sources
A

threat detection

188
Q
  • Identify vulnerabilities in AI systems: software bugs, model weaknesses…
  • Conduct security assessment, penetration testing and code reviews
  • Patch management and update processes
A

vulnerability management

189
Q
  • Secure the cloud computing platform, edge devices, data stores
  • Access control, network segmentation, encryption
  • Ensure you can withstand system failures
A

infrastructure protection

190
Q
  • Manipulated input prompts to generate malicious or undesirable content
  • Implement guardrails: prompt filtering, sanitization, validation
A

prompt injection

191
Q

______ – ratio of true positive predictions (correct vs. incorrect positive prediction)
* ______ – ratio of true positive predictions compared to actual positives

A

precision
Recall

192
Q

True or false:

  • AWS responsibility - Security of the Cloud

A

True

193
Q

True or false.

  • Customer responsibility - not Security in the Cloud
A

false

  • For Bedrock, customer is responsible for data management, access controls,
    setting up guardrails, etc…
  • Encrypting application data
194
Q
  • Make sure models aren’t just developed but also deployed, monitored,
    retrained systematically and repeatedly
  • Extension of DevOps to deploy code regularly

A

MLOps

195
Q
  • Users or Groups can be
    assigned JSON documents
    called _____
  • These _____define the
    ______of the users
A

policies
permissions

196
Q
  • ____ are people within your organization, and can be grouped

A

IAM Users

197
Q
  • EC2 =
A

Elastic Compute Cloud

198
Q

_____ is a fully managed data security and data privacy service
that uses machine learning and pattern matching to discover and protect your sensitive data in AWS.
* ___helps identify and alert you to sensitive data, such as personal info.

A

Amazon Macie

199
Q
  • Helps with auditing and recording compliance of your AWS resources
  • Helps record configurations and changes over time
A

AWS Config

200
Q
  • Automated Security Assessments
A

Amazon Inspector

201
Q
  • Provides governance, compliance and audit for your AWS Account
A

AWS CloudTrail

202
Q
  • Portal that provides customers with on-demand access to AWS
    compliance documentation and AWS agreements
A

AWS Artifact

203
Q
  • On-demand access to security compliance
    reports of Independent Software Vendors
    (ISVs)
A

AWS Artifact - third party reports

204
Q
  • Assess risk and compliance of your AWS workloads
  • Continuously audit AWS services usage and prepare audits
A

AWS Audit Manager

205
Q
  • No need to install anything
    – high level AWS account assessment
  • Analyzes your AWS accounts and provides
    recommendations on 6 categories:
  • Cost optimization
  • Performance
  • Security
  • Fault tolerance
  • Service limits
  • Operational Excellence
A

AWS Trusted Advisor

206
Q

private
network to deploy your resources
(regional resource)

A
  • VPC - Virtual Private Cloud
207
Q

_______ allow you to partition your
network inside your VPC
(Availability Zone resource)

A

Subnets

208
Q
  • A __________ is a subnet that is
    accessible from the internet
A

public subnet

209
Q
  • A _________ is a subnet that is
    not accessible from the internet
A

private subnet

210
Q
  • _____ helps our VPC
    instances connect with the internet
A

Internet Gateway

211
Q
  • ______ (AWS-managed) allow
    your instances in your Private Subnets
    to access the internet while remaining
    private
A

NAT Gateways

212
Q

We want to use ________
* Access an AWS service privately without
going over the public internet
* Usually powered by AWS PrivateLink
* Keep your network traffic internal to AWS
* Example: your application deployed in a VPC
can access a Bedrock model privately

A

VPC endpoints