Gen AI & Bedrock Flashcards
Deck for the AWS AI Practitioner BETA Exam
Generative AI
Subset of Deep Learning focused on generating new data similar to the data it was trained on, such as images, text, audio, video, code, etc.
Unlabeled Data is used to pre-train a Foundation Model backed by a neural network; this model can then be adapted for more specific uses like text generation, info extraction, chatbots, and more
Foundation Model
Large, deep learning neural networks that are used as starting points to develop ML models that power new applications more quickly and cost-effectively
Trained on a wide variety of input data, can cost tens of millions of dollars to train
Large Language Model (LLM)
Type of AI designed to generate coherent, human-like text; give it prompts to generate content based on its training data
Trained on large corpus of text data, and are usually very big models; billions of parameters, trained on books, articles, websites, etc.
Can perform language-related tasks, like translation, summarization, question-answering, content creation
Non-deterministic; responses can be different for each user with the same prompt
Bedrock
AWS service for building GenAI applications on AWS; fully managed and serverless, you keep control of training data
Leverage wide array of Foundation Models; this service creates a copy of the FM available only to you, which you can further fine-tune with your own data
None of your data is used to train the FM
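A minimal sketch of calling a Bedrock model from your own code, assuming boto3 and the model-agnostic Converse API; the Region and model ID below are placeholders for whatever is enabled in your account:

```python
import boto3

# Bedrock runtime client; the Region is an assumption -- use one
# where you have model access enabled.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Model ID is a placeholder; any text FM enabled in your account works.
response = client.converse(
    modelId="amazon.titan-text-express-v1",
    messages=[{"role": "user",
               "content": [{"text": "Summarize what Amazon Bedrock does in one sentence."}]}],
    inferenceConfig={"maxTokens": 256, "temperature": 0.5},
)

print(response["output"]["message"]["content"][0]["text"])
```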
Amazon Titan
High-performing series of Foundation Models proprietary to AWS
Fine-Tuning
Bedrock feature for adapting a copy of an FM with your own data; this will change the “weights” of the base FM
Training data must adhere to a specific format, and must be stored in S3
Must purchase Provisioned Throughput to use a model with this feature; not all models support this feature
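A hedged sketch of starting a fine-tuning job with boto3's create_model_customization_job; every name, ARN, S3 URI, and hyperparameter below is a placeholder:

```python
import boto3

# Control-plane client for Bedrock (customization jobs live here,
# not on the bedrock-runtime client).
bedrock = boto3.client("bedrock", region_name="us-east-1")

# All identifiers below are placeholders for your own resources.
bedrock.create_model_customization_job(
    jobName="titan-finetune-demo",
    customModelName="my-custom-titan",
    roleArn="arn:aws:iam::123456789012:role/BedrockCustomizationRole",
    baseModelIdentifier="amazon.titan-text-express-v1",
    customizationType="FINE_TUNING",
    trainingDataConfig={"s3Uri": "s3://my-bucket/train.jsonl"},  # data must live in S3
    outputDataConfig={"s3Uri": "s3://my-bucket/output/"},
    hyperParameters={"epochCount": "2", "batchSize": "1", "learningRate": "0.00001"},
)
```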
Instruction-Based
Bedrock fine-tuning method that uses Labeled examples that consist of prompt-response pairs
Improves the performance of a pre-trained FM on domain-specific tasks; further trained on a particular field or area of knowledge
Single-Turn Messaging consists of system, messages, role, and content fields; intended for answering single, specific prompts
Multi-Turn Messaging provides fine-tuning data for conversations, like chatbots; must alternate between user and assistant messages (see the format sketch below)
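A sketch of what the JSONL training records can look like under the system/messages/role/content layout described above; exact field names vary by base model, so treat these as an assumption to verify against the Bedrock docs:

```python
import json

# Single-turn record: one prompt, one response.
single_turn = {
    "system": "You are a helpful AWS assistant.",
    "messages": [
        {"role": "user", "content": "What does Amazon Bedrock provide?"},
        {"role": "assistant",
         "content": "A managed, serverless service for building GenAI apps on foundation models."},
    ],
}

# Multi-turn record for chatbot-style fine-tuning: user and
# assistant messages must alternate.
multi_turn = {
    "system": "You are a support chatbot.",
    "messages": [
        {"role": "user", "content": "My upload to S3 fails."},
        {"role": "assistant", "content": "What error code do you see?"},
        {"role": "user", "content": "AccessDenied."},
        {"role": "assistant", "content": "Check the bucket policy and your IAM permissions."},
    ],
}

# Training data is JSON Lines: one JSON object per line, uploaded to S3.
with open("train.jsonl", "w") as f:
    for record in (single_turn, multi_turn):
        f.write(json.dumps(record) + "\n")
```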
Continued Pre-Training
Bedrock fine-tuning method where you provide Unlabeled Data to continue training an FM; aka domain-adaptation fine-tuning to make a model expert in a specific domain
Good to feed industry-specific terminology to a model; i.e. give the entire AWS documentation to a model to make it an AWS expert
You can continue training the model as more data becomes available
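A sketch of the unlabeled data file for continued pre-training; the {"input": ...} JSONL field is an assumption to check against the Bedrock docs for your base model:

```python
import json

# Continued pre-training takes raw, unlabeled text -- e.g. chunks of
# your internal documentation -- rather than prompt-response pairs.
docs = [
    "Amazon S3 is an object storage service offering scalability and durability ...",
    "AWS Lambda lets you run code without provisioning or managing servers ...",
]

# One JSON object per line; the file is then uploaded to S3 for the job.
with open("pretrain.jsonl", "w") as f:
    for text in docs:
        f.write(json.dumps({"input": text}) + "\n")
```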
Automatic Evaluation
Bedrock method for evaluating a model for quality control; choose built-in task types such as text summarization, question-answering, text classification, and open-ended text generation
Bring your own prompt dataset or use built-in curated datasets
Scores are calculated automatically, using metrics such as ROUGE, BLEU, BERTScore, and F1
Human Evaluation
Bedrock method for evaluating a model with human reviewers; choose a work team of your own employees or subject-matter experts to grade model outputs
You define the evaluation metrics, e.g. relevance, style, adherence to brand voice; grading methods include thumbs up/down and ranking
Supports built-in task types as well as custom tasks
ROUGE
Automated metric for FM evaluation, used to evaluate automatic summarization and machine translation in natural language processing
N: measure the # of matching n-grams between reference text and generated text
L: find the Longest Common Subsequence between reference text and generated text
Recall-Oriented Understudy for Gisting Evaluation
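A minimal illustration of the ROUGE-N idea, recall side only: count the n-grams the generated text shares with the reference, divided by the n-grams in the reference:

```python
from collections import Counter

def rouge_n_recall(reference: str, generated: str, n: int = 1) -> float:
    """ROUGE-N recall: matching n-grams / total n-grams in the reference."""
    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    ref, gen = ngrams(reference), ngrams(generated)
    overlap = sum((ref & gen).values())  # clipped n-gram matches
    return overlap / max(sum(ref.values()), 1)

# 5 of the 6 reference unigrams appear in the generated text -> ~0.83
print(rouge_n_recall("the cat sat on the mat", "the cat lay on the mat"))
```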
BLEU
Automated metric for FM evaluation that evaluates the quality of generated text, especially for translations
Considers precision, and applies a brevity penalty so output that is too short is penalized
Looks at a combination of n-grams (1, 2, 3, 4)
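A simplified sentence-level BLEU sketch, assuming a single reference and no smoothing: the geometric mean of the 1-4-gram precisions, multiplied by a brevity penalty:

```python
import math
from collections import Counter

def bleu(reference: str, candidate: str, max_n: int = 4) -> float:
    ref, cand = reference.lower().split(), candidate.lower().split()
    if not cand:
        return 0.0

    def ngrams(tokens: list, n: int) -> Counter:
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        clipped = sum((cand_counts & ref_counts).values())  # clipped matches
        if clipped == 0:
            return 0.0  # any zero precision zeroes the geometric mean
        log_prec += math.log(clipped / sum(cand_counts.values())) / max_n

    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_prec)

print(bleu("the cat sat on the mat", "the cat sat on a mat"))
```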
BERTScore
Automated metric for FM evaluation that checks the semantic similarity between generated text and reference text
Uses pre-trained models to compare the contextualized embeddings of both texts and computes the cosine similarity between them
Capable of capturing more nuance between the texts
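A toy sketch of the BERTScore matching step, with random vectors standing in for the contextualized embeddings a real implementation would take from a pre-trained model such as BERT:

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Stand-ins for per-token contextualized embeddings (one vector per token).
rng = np.random.default_rng(42)
ref_emb = rng.random((5, 8))   # 5 reference tokens, 8-dim vectors
gen_emb = rng.random((6, 8))   # 6 generated tokens

# Pairwise cosine similarities between every generated/reference token pair.
sims = np.array([[cosine(g, r) for r in ref_emb] for g in gen_emb])

# Greedy matching: each token is paired with its most similar counterpart.
precision = sims.max(axis=1).mean()   # generated tokens vs. reference
recall = sims.max(axis=0).mean()      # reference tokens vs. generated
f1 = 2 * precision * recall / (precision + recall)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```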
Retrieval-Augmented Generation
The process of optimizing the output of an LLM so that it references an authoritative knowledge base outside of its training data sources before generating a response
Bedrock takes care of creating Vector Embeddings in the database of your choice based on your data
Use where real-time data needs to be fed into the FM; e.g. building customer-service chatbots, legal research and analysis, healthcare Q&A
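A hedged sketch of RAG through Bedrock Knowledge Bases using the retrieve_and_generate API; the knowledge base ID and model ARN are placeholders for your own resources:

```python
import boto3

# Runtime client for Bedrock Knowledge Bases (retrieval + generation).
client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = client.retrieve_and_generate(
    input={"text": "What is our refund policy?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB12345678",  # placeholder
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-text-express-v1",
        },
    },
)

# The answer is grounded in chunks retrieved from your knowledge base.
print(response["output"]["text"])
```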
Vector Database
Database that holds the vector embeddings of your data
Data is stored as high-dimensional points; similarity search over these points determines which data is most relevant to a prompt
Bedrock uses OpenSearch Serverless by default; can use Aurora PostgreSQL, as well as Pinecone, MongoDB Atlas, and Redis Enterprise Cloud
For general AWS use: OpenSearch and DocumentDB for real-time similarity queries and storing millions of vector embeddings; RDS/Aurora PostgreSQL for relational DBs; Neptune for graph data
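A toy illustration of what any of these vector stores do at query time: embed the question, then return the stored points nearest to it (brute-force cosine similarity here, on random data):

```python
import numpy as np

rng = np.random.default_rng(0)
db = rng.random((1000, 128))   # 1,000 stored document embeddings
query = rng.random(128)        # embedding of the user's question

# Cosine similarity between the query and every stored embedding.
scores = (db @ query) / (np.linalg.norm(db, axis=1) * np.linalg.norm(query))

# Indices of the 3 closest documents -- what a similarity query returns.
top_k = np.argsort(scores)[-3:][::-1]
print(top_k, scores[top_k])
```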