Gen AI & Bedrock Flashcards

Deck for the AWS AI Practitioner BETA Exam

1
Q

Generative AI

A

Subset of Deep Learning focused on generating new data similar to the data it was trained on, such as images, text, audio, video, code, etc.

Unlabeled Data is used to pre-train a Foundation Model backed by a neural network; this model can then be adapted for more specific uses like text generation, info extraction, chatbots, and more

2
Q

Foundation Model

A

Large, deep learning neural networks that are used as starting points to develop ML models that power new applications more quickly and cost-effectively

Trained on a wide variety of input data, can cost tens of millions of dollars to train

3
Q

Large Language Model (LLM)

A

Type of AI designed to generate coherent, human-like text; give it prompts to generate content based on its training data

Trained on a large corpus of text data, and usually very big models; billions of parameters, trained on books, articles, websites, etc.

Can perform language-related tasks, like translation, summarization, question-answering, content creation

Non-deterministic; responses can be different for each user with the same prompt

4
Q

Bedrock

A

AWS service for building GenAI applications; fully managed and serverless, and you keep control of your training data

Leverage wide array of Foundation Models; this service creates a copy of the FM available only to you, which you can further fine-tune with your own data

None of your data is used to train the FM

5
Q

Amazon Titan

A

High-performing series of Foundation Models proprietary to AWS

6
Q

Fine-Tuning

A

Bedrock feature for adapting a copy of an FM with your own data; this changes the weights of the base FM

Training data must adhere to a specific format, and must be stored in S3

Must purchase Provisioned Throughput to use a model with this feature; not all models support this feature

7
Q

Instruction-Based

A

Bedrock fine-tuning method that uses Labeled examples that consist of prompt-response pairs

Improves the performance of a pre-trained FM on domain-specific tasks; further trained on a particular field or area of knowledge

Single-Turn Messaging: each training example contains an optional system prompt and a list of messages, each with a role and content; intended for answering single, specific prompts

Multi-Turn Messaging: provides fine-tuning for conversations such as chatbots; messages must alternate between user and assistant roles

8
Q

Continued Pre-Training

A

Bedrock fine-tuning method where you provide Unlabeled Data to continue training an FM; aka domain-adaptation fine-tuning to make a model expert in a specific domain

Good to feed industry-specific terminology to a model; i.e. give the entire AWS documentation to a model to make it an AWS expert

You can continue training the model as more data becomes available

9
Q

Automatic Evaluation

A

Bedrock model evaluation feature that scores a model automatically against benchmark datasets

Choose a built-in task type (text summarization, question-answering, text classification, open-ended text generation) and use built-in or custom prompt datasets

Scores are calculated automatically using metrics such as BERTScore, ROUGE, BLEU, and F1

10
Q

Human Evaluation

A

Bedrock model evaluation feature where humans grade a model's responses; use a work team made up of your own employees, or an AWS-managed team

You define the task type, the evaluation metrics (e.g. relevance, style, adherence to brand voice), and how responses are rated (thumbs up/down, ranking, etc.)

11
Q

ROUGE

A

Automated metric for FM evaluation for evaluating automatic summarization and machine translation software in natural language processing

N: measure the # of matching n-grams between reference text and generated text

L: find the Longest Common Subsequence between reference text and generated text

Recall-Oriented Understudy for Gisting Evaluation
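The N variant can be sketched in a few lines; this toy version uses whitespace tokenization and computes recall only (real ROUGE implementations add stemming and also report precision and F1):

```python
from collections import Counter

def rouge_n_recall(reference: str, candidate: str, n: int = 1) -> float:
    """Toy ROUGE-N recall: share of reference n-grams found in the candidate."""
    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    ref, cand = ngrams(reference), ngrams(candidate)
    if not ref:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / sum(ref.values())

# 5 of the 6 reference unigrams appear in the candidate -> 5/6
score = rouge_n_recall("the cat sat on the mat", "the cat is on the mat")
```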

12
Q

BLEU

A

Automated metric for FM evaluation that evaluates the quality of generated text, especially for translations

Considers precision and applies a brevity penalty to outputs that are too short

Looks at a combination of n-grams (1, 2, 3, 4)

Bilingual Evaluation Understudy
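A simplified single-reference version can be sketched as follows (toy whitespace tokenization; real BLEU is computed at corpus level and usually smoothed):

```python
import math
from collections import Counter

def bleu(reference: str, candidate: str, max_n: int = 4) -> float:
    """Toy sentence-level BLEU: clipped n-gram precisions plus a brevity penalty."""
    ref_toks, cand_toks = reference.lower().split(), candidate.lower().split()
    precisions = []
    for n in range(1, max_n + 1):
        ref = Counter(tuple(ref_toks[i:i + n]) for i in range(len(ref_toks) - n + 1))
        cand = Counter(tuple(cand_toks[i:i + n]) for i in range(len(cand_toks) - n + 1))
        if not cand:
            return 0.0
        clipped = sum(min(c, ref[g]) for g, c in cand.items())
        precisions.append(clipped / sum(cand.values()))
    if min(precisions) == 0:
        return 0.0
    # brevity penalty: punish candidates shorter than the reference
    bp = 1.0 if len(cand_toks) > len(ref_toks) else math.exp(1 - len(ref_toks) / len(cand_toks))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```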

13
Q

BERTScore

A

Automated metric for FM evaluation that checks the semantic similarity between generated text and reference text

Uses pre-trained models to compare the contextualized embeddings of both texts and computes the cosine similarity between them

Capable of capturing more nuance between the texts
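The cosine-similarity step can be sketched as follows (the short vectors here stand in for the high-dimensional contextual embeddings a model like BERT would produce):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# identical embeddings -> similarity 1.0; orthogonal embeddings -> 0.0
same = cosine_similarity([0.2, 0.8, 0.5], [0.2, 0.8, 0.5])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```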

14
Q

Retrieval-Augmented Generation

A

The process of optimizing the output of an LLM so that it references an authoritative knowledge base outside of its training data sources before generating a response

Bedrock takes care of creating Vector Embeddings in the database of your choice based on your data

Use where real-time data is needed to be fed into the FM; building CS chatbots, legal research and analysis, healthcare Q&A
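The retrieve-then-generate flow can be sketched with a toy bag-of-words "embedding" (the knowledge-base snippets and query here are made up; a real setup would use an embedding model such as Titan Embeddings and a vector database):

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words stand-in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# hypothetical knowledge base, outside the model's training data
knowledge_base = [
    "Refunds are processed within 5 business days.",
    "Support is available 24/7 via chat.",
]

def retrieve(query):
    """Return the snippet most similar to the query."""
    return max(knowledge_base, key=lambda doc: cosine(embed(query), embed(doc)))

# augment the prompt with the retrieved context before calling the FM
question = "How long do refunds take?"
prompt = f"Context: {retrieve(question)}\nQuestion: {question}"
```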

15
Q

Vector Database

A

Database that holds the vector embeddings of your data

Data is stored as high-dimensional points; similarity search finds the records most relevant to a query

Bedrock uses OpenSearch Serverless by default; can use Aurora PostgreSQL, as well as Pinecone, MongoDB Atlas, and Redis Enterprise Cloud

For general AWS use, can use OpenSearch and DocumentDB for real time similarity queries and storing millions of vector embeddings; RDS/Aurora PostgreSQL for relational DBs; Neptune for graph

16
Q

Tokenization

A

The AI process of converting raw text into a sequence of tokens

Word-based: text is split into individual words

Subword: some words are split into smaller subword units; helpful for handling long or rare words
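Subword splitting can be sketched as a greedy longest-match over a tiny vocabulary (the vocabulary here is made up; real tokenizers learn theirs with algorithms like BPE or WordPiece):

```python
def subword_tokenize(word, vocab):
    """Greedy longest-match-first split; unknown characters become single tokens."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab or j == i + 1:  # fall back to a single character
                tokens.append(piece)
                i = j
                break
    return tokens

vocab = {"token", "ization", "un", "likely"}
pieces = subword_tokenize("tokenization", vocab)  # ['token', 'ization']
```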

17
Q

Context Window

A

The number of tokens an LLM can consider when generating text

The larger the window, the more information the model can consider, and the more coherent its output

Larger windows require more memory and processing power as well

First factor to look at when considering a model

18
Q

Embedding

A

Process of converting input (text, images, audio) into vectors of numerical values that capture the meaning of the input; vectors have high dimensionality to capture many features

Inputs with similar meanings produce vectors that are close together, which enables semantic search
19
Q

Guardrails

A

Bedrock feature to control the interaction between users and FMs; filter undesirable and harmful content, deny specific topics, and redact PII

Helps reduce hallucinations and enhance user privacy and safety
20
Q

Agent

A

Bedrock feature for carrying out multi-step tasks; an agent breaks a task into steps, calls APIs and Lambda functions via Action Groups, and can pull information from Knowledge Bases
21
Q

Studio

A

Bedrock feature that gives Bedrock access to your teams so they can easily create AI-powered apps

Must first set up IAM Identity Center (IdC) and create a Workspace; authorized IdC users can access the Workspace

22
Q

Watermark Detection

A

Bedrock feature that checks if an image was generated by Amazon Titan Generator

23
Q

On-Demand

A

Bedrock pricing model where you pay as you go, with no upfront commitment; only works with Base Models

Text Models charge for every input/output token processed

Embedding Models charge for every input token processed

Image Models charge for every image generated
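The token-based billing works out as simple arithmetic; the per-1,000-token rates below are made-up placeholders (check the Bedrock pricing page for a model's real rates):

```python
# hypothetical on-demand rates, in dollars per 1,000 tokens
PRICE_PER_1K_INPUT = 0.0005
PRICE_PER_1K_OUTPUT = 0.0015

def text_model_cost(input_tokens, output_tokens):
    """On-demand cost for a text model: charged per input and output token."""
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT \
         + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT

# e.g. a call with 2,000 input tokens and 1,000 output tokens:
cost = text_model_cost(2000, 1000)  # 0.001 + 0.0015 = 0.0025 dollars
```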

24
Q

Provisioned Throughput

A

Bedrock pricing model where you purchase Model units for a specific time; works with Base Models and custom/fine-tuned models

Throughput represents the maximum number of input/output tokens processed per minute

You must purchase this for custom models; optional for Base Models