Amazon Bedrock and Generative AI Flashcards
What is Generative AI?
A subset of deep learning that generates new data similar to its training data.
What types of data can Generative AI be trained on?
Text, images, audio, code, video, and more.
What is a foundation model?
A large, general-purpose AI model trained on massive amounts of data for a variety of tasks.
Name a few companies that create foundation models.
OpenAI, Meta, Amazon, Google, Anthropic.
What is an example of an open-source foundation model?
Meta’s LLaMA, Google’s BERT.
What is an LLM?
A Large Language Model trained to understand and generate human-like text.
How are LLMs trained?
On massive text datasets like books, websites, articles.
What does non-deterministic output mean in LLMs?
Same prompt can produce different outputs due to probabilistic word generation.
Why is LLM output non-deterministic?
It selects next words based on probability distributions, not fixed rules.
What are some tasks LLMs can perform?
Translation, summarization, Q&A, content generation.
How do diffusion models generate images?
By reversing a process that gradually adds noise to images.
What is forward diffusion?
A process where noise is added to an image over time until it’s unrecognizable.
What is reverse diffusion?
The process of removing noise step-by-step to generate an image from random noise.
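The forward/reverse idea can be illustrated with a toy one-dimensional sketch — not a real diffusion model (which learns to predict noise with a neural network), just the iterative "start from noise, repeatedly remove predicted noise" loop:

```python
import random

def toy_reverse_diffusion(steps: int = 100, target: float = 1.0) -> float:
    """Toy 1-D reverse diffusion: start from pure noise and iteratively
    denoise toward a target value. The 'predicted noise' here is computed
    directly; a real model would *learn* this prediction from data."""
    x = random.gauss(0, 1)          # start from random noise
    for _ in range(steps):
        predicted_noise = x - target  # stand-in for a learned noise predictor
        x = x - 0.1 * predicted_noise # remove a small fraction of the noise
    return x

sample = toy_reverse_diffusion()  # converges close to the target
```

In a real diffusion model, `x` is an image tensor and the noise predictor is a trained network, but the step-by-step denoising loop has the same shape.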
What is Stable Diffusion?
A text-to-image model (developed by the company Stability AI) that uses diffusion methods to generate images from text prompts or other images.
Can Gen AI generate text from images?
Yes, it can analyze an image and generate descriptive text or answer questions.
What is Amazon Bedrock?
A fully managed AWS service to build and scale generative AI applications using various foundation models.
Does your data leave your AWS account when using Bedrock?
No, all operations occur within your AWS account; data stays private.
What is the pricing model of Amazon Bedrock?
Pay-per-use.
What is meant by ‘unified API’ in Bedrock?
A standardized interface to access all supported foundation models, simplifying integration.
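As a sketch of what "unified API" means in practice, the snippet below builds keyword arguments in the shape of Bedrock's Converse API, which uses the same request format across models — the model ID shown is an example, and the exact schema should be verified against the current boto3 reference:

```python
def build_converse_request(model_id: str, prompt: str, max_tokens: int = 200) -> dict:
    """Build kwargs for the Bedrock Converse API (shape per AWS docs;
    verify field names against the current boto3 reference)."""
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": max_tokens},
    }

# With AWS credentials configured, the request could be sent with boto3:
#   client = boto3.client("bedrock-runtime")
#   response = client.converse(**build_converse_request(model_id, prompt))
request = build_converse_request(
    "anthropic.claude-3-haiku-20240307-v1:0",  # example model ID; check availability
    "Summarize RAG in one sentence.",
)
```

Swapping models means changing only `model_id` — the request shape stays the same, which is the point of the unified API.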
What companies provide models on Amazon Bedrock?
AI21 Labs, Cohere, Stability AI, Amazon, Anthropic, Meta, Mistral AI, and more.
Can you fine-tune foundation models on Amazon Bedrock?
Yes, using your own data, within your own account.
Does fine-tuning share your data with the model provider?
No, your data is never sent back to the model provider.
What is the Amazon Bedrock Playground?
An interactive interface to experiment with foundation models by submitting prompts.
What advanced features does Amazon Bedrock offer?
RAG (Retrieval-Augmented Generation), LLM agents, knowledge bases, security, and responsible AI features.
What is RAG in Amazon Bedrock?
A method to enhance model answers by retrieving relevant information from external data sources.
What is a knowledge base in Bedrock?
An external data store connected to Bedrock to provide domain-specific context for more accurate responses.
How does Amazon Bedrock support application integration?
Through a single unified API, making it easy to interact with different models programmatically.
Can you use Bedrock to build a chatbot?
Yes, using LLMs and additional tools like knowledge bases and RAG to create intelligent conversational agents.
What factors should you consider when selecting a foundation model on Amazon Bedrock?
Model type, performance, customization options, inference capabilities, licensing, context window, latency, modality support, compliance, and cost.
What is a multimodal foundation model?
A model that can accept and produce multiple types of data, such as text, audio, image, and video.
What is Amazon Titan?
A family of high-performing foundation models developed by AWS, with support for text and image generation, available via Amazon Bedrock.
Can Amazon Titan be customized with your own data?
Yes, it supports fine-tuning using your own data within your AWS account.
What is the trade-off between smaller and larger models?
Smaller models are more cost-effective but have limited knowledge; larger models are more capable but expensive.
What is Llama-2 and who created it?
A foundation model created by Meta, focused on English text generation and large-scale tasks.
What is Claude and who developed it?
A foundation model developed by Anthropic, known for its large context window and strong document analysis capabilities.
What is Stability AI known for on Bedrock?
Image generation using the Stable Diffusion model, useful for advertising and media content.
Why might a larger context window be useful?
It allows you to input large documents, code bases, or books, enabling the model to reason over more content.
What are use cases for Amazon Titan?
Content creation, classification, and educational applications.
What are use cases for Claude?
Analysis, forecasting, and document comparison due to its large context window.
What are use cases for Stability AI?
Image generation for advertising, media, and creative projects.
How does pricing affect foundation model choice?
More capable models may be more expensive; choosing a model that balances cost and performance is crucial.
How is pricing typically measured on Amazon Bedrock?
By the number of tokens processed (e.g., cost per 1,000 tokens).
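A quick back-of-the-envelope cost estimate under token-based pricing — the prices below are illustrative placeholders, not real rates; always check the current Bedrock pricing page:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  price_in_per_1k: float, price_out_per_1k: float) -> float:
    # Input and output tokens are often billed at different per-1,000-token rates.
    return (input_tokens / 1000) * price_in_per_1k + (output_tokens / 1000) * price_out_per_1k

# Illustrative prices only: $0.003 per 1K input tokens, $0.015 per 1K output tokens.
cost = estimate_cost(50_000, 10_000, price_in_per_1k=0.003, price_out_per_1k=0.015)
print(f"${cost:.2f}")  # -> $0.30
```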
What is a potential risk when using foundation models with pay-per-use pricing?
Costs can escalate quickly if usage isn’t carefully monitored.
What is fine-tuning in Amazon Bedrock?
Adapting a copy of a foundation model by training it with your own data to improve performance on domain-specific tasks.
Where must training data be stored for fine-tuning in Amazon Bedrock?
In Amazon S3.
Does fine-tuning change the foundation model itself?
No, the base model is unchanged; fine-tuning trains a private copy of the model and updates that copy’s weights with your data.
What pricing model must you use for a fine-tuned model on Amazon Bedrock?
Provisioned throughput.
Are all models on Amazon Bedrock fine-tunable?
No, only certain models support fine-tuning; check each model’s details in the Bedrock console.
What is instruction-based fine-tuning?
Fine-tuning using labeled data with prompt-response pairs to improve performance on specific tasks.
What kind of data is used for instruction-based fine-tuning?
Labeled data with prompt-response pairs.
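Training data for instruction-based fine-tuning is typically a JSONL file of labeled prompt–response pairs uploaded to S3. The records below are hypothetical examples in the common prompt/completion shape — verify the exact schema required by your chosen model:

```python
import json

# Hypothetical labeled examples (prompt/completion pairs) for a
# sentiment-classification fine-tuning task.
examples = [
    {"prompt": "Classify the sentiment: 'Great service!'", "completion": "positive"},
    {"prompt": "Classify the sentiment: 'Very slow delivery.'", "completion": "negative"},
]

# One JSON object per line (JSONL); this file would be uploaded to Amazon S3
# and referenced when creating the fine-tuning job.
jsonl = "\n".join(json.dumps(e) for e in examples)
```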
What is continued pre-training in Bedrock?
Fine-tuning using unlabeled data to adapt a foundation model to a specific domain.
What is another name for continued pre-training?
Domain-adaptation fine-tuning.
When should you use continued pre-training?
When you have large amounts of unlabeled domain-specific data.
What is an example use case of continued pre-training?
Feeding the entire AWS documentation to make the model an AWS expert.
What are single-turn and multi-turn messaging in fine-tuning?
Fine-tuning approaches that teach a model how to handle one-turn or conversational multi-turn chat interactions.
What roles are defined in multi-turn messaging format?
System (optional context), User, and Assistant.
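A hypothetical multi-turn training record illustrating the three roles — the field names follow the system/user/assistant convention, but the exact schema should be checked against the AWS documentation for your model:

```python
# Optional system context, then alternating user/assistant turns.
record = {
    "system": "You are a helpful support agent.",  # optional context
    "messages": [
        {"role": "user", "content": "My order hasn't arrived."},
        {"role": "assistant", "content": "Sorry to hear that. Can you share the order number?"},
        {"role": "user", "content": "It's 12345."},
        {"role": "assistant", "content": "Thanks -- checking order 12345 now."},
    ],
}

roles = [m["role"] for m in record["messages"]]
```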
Which fine-tuning method is cheaper: instruction-based or continued pre-training?
Instruction-based fine-tuning is generally cheaper and uses less data.
What does continued pre-training require?
A large amount of unlabeled data and more computation, thus higher cost.
What is transfer learning?
Using a pre-trained model and adapting it to a new but related task—fine-tuning is a form of transfer learning.
What is a practical use case for transfer learning in image classification?
Using a pre-trained model for edge detection and adapting it to classify a specific kind of image.
What’s the difference between transfer learning and fine-tuning?
Fine-tuning is a specific application of transfer learning tailored to refining model behavior with new data.
When is fine-tuning a good idea?
When you need a custom tone/persona, work with proprietary data, or aim to improve accuracy for specific tasks.
What kind of data would trigger instruction-based fine-tuning?
Labeled data with prompt-response examples.
What kind of data would trigger continued pre-training?
Unlabeled data, such as raw domain-specific documentation.
Why is provisioned throughput more expensive?
It provides dedicated infrastructure for consistent performance with fine-tuned models.
What type of expert might be needed for fine-tuning a model?
A machine learning engineer, though Bedrock simplifies the process.
What is Automatic Evaluation in Amazon Bedrock?
A feature for evaluating a model for quality control: you submit tasks and benchmark datasets, and its performance is automatically scored using judge models.
What are the built-in task types available for automatic evaluation in Bedrock?
Text summarization, question and answer, text classification, and open-ended text generation.
What are benchmark questions and answers used for?
They help test the model by comparing its generated answers to ideal (benchmark) answers to assess accuracy.
What is the purpose of a judge model in automatic evaluation?
The judge model compares the model-generated answer to the benchmark answer and assigns a score based on similarity.
Can you bring your own benchmark dataset in Amazon Bedrock?
Yes, you can use your own or a curated dataset from AWS.
What are the benefits of using benchmark datasets?
They help measure accuracy, speed, scalability, and detect bias in the model.
What is the difference between automatic and human evaluation?
Automatic uses judge models and metrics, while human evaluation involves people scoring the outputs based on criteria like relevance or correctness.
What kind of metrics are used in human evaluation?
Thumbs up/down, ranking, and other grading scales.
What does ROUGE stand for?
Recall-Oriented Understudy for Gisting Evaluation.
What is ROUGE used for?
Evaluating summarization and machine translation by comparing n-grams in reference and generated text.
What is ROUGE-N?
A ROUGE metric measuring how many n-grams (e.g., 1-gram, 2-gram) match between reference and generated texts.
What is ROUGE-L?
It computes the longest common subsequence between the reference and generated text.
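A minimal sketch of both metrics — ROUGE-N as clipped n-gram recall and ROUGE-L via the longest common subsequence (real implementations such as the `rouge-score` package add stemming and F-measures):

```python
from collections import Counter

def rouge_n_recall(reference: str, candidate: str, n: int = 1) -> float:
    """ROUGE-N recall: matching n-grams (clipped) over reference n-grams."""
    def ngrams(text: str, n: int) -> Counter:
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    ref, cand = ngrams(reference, n), ngrams(candidate, n)
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / max(sum(ref.values()), 1)

def lcs_length(reference: str, candidate: str) -> int:
    """Word-level longest common subsequence, the core of ROUGE-L."""
    a, b = reference.lower().split(), candidate.lower().split()
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)]

r1 = rouge_n_recall("the cat sat on the mat", "the cat is on the mat")  # 5/6
lcs = lcs_length("the cat sat on the mat", "the cat is on the mat")     # 5
```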
What does BLEU stand for?
Bilingual Evaluation Understudy.
What is BLEU used for?
Evaluating the quality of translated text, focusing on precision and penalizing brevity.
What does BERTScore evaluate?
Semantic similarity between texts using embeddings and cosine similarity.
Why is BERTScore better than ROUGE or BLEU for nuanced text?
Because it compares meanings using embeddings rather than just word overlap.
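The comparison at the heart of BERTScore is cosine similarity between embedding vectors. A minimal sketch, with tiny toy vectors standing in for real embeddings (which have hundreds of dimensions):

```python
import math

def cosine_similarity(u: list[float], v: list[float]) -> float:
    # Cosine of the angle between two vectors: 1.0 = same direction.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 3-dimensional "embeddings" of two semantically similar sentences.
sim = cosine_similarity([0.1, 0.9, 0.2], [0.1, 0.8, 0.3])  # close to 1.0
```

This is why BERTScore can rate a paraphrase highly even with little word overlap: similar meanings map to nearby embedding vectors.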
What is perplexity in the context of language models?
A measure of how well the model predicts the next token; lower is better.
What does low perplexity indicate?
That the model is confident and accurate in predicting the next token.
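Perplexity is the exponential of the average negative log-probability the model assigns to each next token — a small worked example using made-up token probabilities:

```python
import math

def perplexity(token_probs: list[float]) -> float:
    # exp of the average negative log-probability; lower = more confident.
    return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))

confident = perplexity([0.9, 0.8, 0.95])  # high probabilities -> low perplexity
uncertain = perplexity([0.2, 0.1, 0.3])   # low probabilities  -> high perplexity
```

A model that assigns every token probability 0.5 has perplexity exactly 2 — intuitively, it is as uncertain as a coin flip at each step.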
What can be done with evaluation metrics in a feedback loop?
They can be used to retrain and improve model outputs over time.
Name some business metrics to evaluate a foundation model.
User satisfaction, average revenue per user, cross-domain performance, conversion rates, efficiency.
Why would you create a custom benchmark dataset?
To evaluate the model using criteria specific to your business needs.
What does RAG stand for in generative AI?
Retrieval Augmented Generation
What is the core idea behind RAG?
It allows a foundation model to reference external data sources without fine-tuning.
What AWS service is used to manage the knowledge base in a RAG system?
Amazon Bedrock
What storage service is commonly used as the data source for the knowledge base in AWS Bedrock?
Amazon S3
What type of database underlies a knowledge base in a RAG system?
Vector database
What does a vector database store in the context of RAG?
Vector embeddings of chunks of data for semantic search
What are embeddings in the context of RAG?
Numerical representations of text used to measure similarity
What happens to a user’s query in RAG before being sent to the foundation model?
It is augmented before being sent: the query is first used to search the vector database (the knowledge base) for related chunks, the retrieved text is combined with the original query (“original query + retrieved text”), and the foundation model then generates the final output from this augmented prompt.
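The retrieve-then-augment flow can be sketched in a few lines. The toy retriever below scores chunks by keyword overlap purely for illustration — a real RAG system retrieves by vector-embedding similarity from the knowledge base:

```python
def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Toy keyword-overlap retrieval; a real system uses embedding similarity.
    def score(chunk: str) -> int:
        return len(set(query.lower().split()) & set(chunk.lower().split()))
    return sorted(chunks, key=score, reverse=True)[:k]

def augment_prompt(query: str, retrieved: list[str]) -> str:
    # "Original query + retrieved text" sent to the foundation model.
    context = "\n".join(retrieved)
    return f"Use the following context to answer.\n\nContext:\n{context}\n\nQuestion: {query}"

chunks = [
    "amazon bedrock is a managed aws service",
    "bananas are rich in potassium",
    "rag retrieves external data at query time",
]
prompt = augment_prompt("what is amazon bedrock",
                        retrieve("what is amazon bedrock", chunks, k=1))
```

The augmented `prompt` is what actually reaches the foundation model, which is why it can answer from data it was never trained on.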
Name two AWS services that can be used as vector databases for RAG.
Amazon OpenSearch Service, Amazon Aurora
Name three third-party vector databases supported by AWS Bedrock.
MongoDB, Redis, Pinecone
What happens if no vector database is specified in AWS Bedrock?
Amazon Bedrock automatically creates an Amazon OpenSearch Serverless vector store by default
Which two models can be used for embeddings in AWS Bedrock?
Amazon Titan, Cohere
Can the embeddings model and foundation model be different in AWS Bedrock?
Yes
What is the purpose of chunking documents in RAG?
To split them into smaller parts for vector embedding and search
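A minimal sketch of one common chunking strategy — fixed-size word windows with overlap, so context is not lost at chunk boundaries (chunk sizes and overlap are tuning choices, not fixed rules):

```python
def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    # Fixed-size word chunks; each chunk shares `overlap` words with the next.
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, max(len(words), 1), step)]

chunks = chunk_text("word " * 120, chunk_size=50, overlap=10)  # 3 chunks
```

Each resulting chunk would then be embedded and stored in the vector database for semantic search.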
What kind of performance does Amazon OpenSearch offer for RAG?
Real-time similarity search with scalable index management and KNN support
What is Amazon DocumentDB best known for in RAG use cases?
A MongoDB-compatible NoSQL database with support for real-time vector similarity search
Which two relational databases are supported for vector storage in AWS Bedrock?
Amazon Aurora, Amazon RDS for PostgreSQL
What AWS service should you choose for graph-based data in a RAG system?
Amazon Neptune
What are the common data sources for AWS Bedrock knowledge bases?
Amazon S3, Confluence, SharePoint, Salesforce, Webpages
Give one use case for RAG in customer support.
Building a chatbot that retrieves answers from product documentation and FAQs
Give one use case for RAG in legal research.
Chatbot answering legal queries based on case law, regulations, and legal opinions
Give one use case for RAG in healthcare.
AI assistant answering medical questions based on treatments and research papers