Amazon Bedrock & Generative AI Flashcards
What is generative AI?
Generative AI takes a set of data (training data), learns patterns from that data and uses those patterns to generate new content. The data can be in the form of:
Text
Image
Audio
Code
Video
What is a foundation model?
A foundation model is a machine learning (ML) model that’s trained on large amounts of data and can be used for many tasks.
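In LLMs, what is a non-deterministic response?
Non-deterministic means that the same question can produce different answers on different runs, even if only slightly. This is because the model samples each output token from a probability distribution, and settings such as temperature control how much the answers vary.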
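A minimal sketch of why answers vary, assuming a toy vocabulary and made-up scores (this is not how Bedrock is implemented internally): the next token is sampled from a temperature-scaled softmax, so repeated runs can differ, while temperature 0 (greedy decoding) always picks the top-scoring token in this toy model.

```python
import math
import random

def softmax(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def next_token(vocab, logits, temperature, rng):
    """Pick the next token: greedy when temperature == 0, otherwise sample."""
    if temperature == 0:
        return vocab[logits.index(max(logits))]  # always the top-scoring token
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical scores a model might assign to candidate answers
vocab = ["Paris", "paris", "the capital", "France's capital"]
logits = [3.0, 1.5, 1.0, 0.5]

rng = random.Random()  # unseeded, so each run of the script can differ
sampled = [next_token(vocab, logits, 1.0, rng) for _ in range(5)]  # varies between runs
greedy = [next_token(vocab, logits, 0, rng) for _ in range(5)]     # always the same
```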
What is a Bedrock Agent?
Agents can do the following:
Manage and carry out multi-step tasks related to infrastructure provisioning, application deployment and operational activities
Task Coordination: Perform tasks in the correct order and ensure information is passed correctly between tasks
Agents are configured to perform specific, predefined action groups
Integrate with other systems, services, databases and APIs to exchange data or initiate actions
Leverage RAG to retrieve information when necessary
So an agent understands what it can do and will pull information from OpenAPI-defined APIs, AWS Lambda functions or knowledge bases to satisfy the tasks asked of it. Think of it like a personal shopper: it responds to your query and interrogates the data sources that are most applicable.
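The orchestration idea can be sketched conceptually in plain Python. This is not the Bedrock Agents API: the action names and the hard-coded plan are hypothetical, whereas a real agent asks the foundation model to choose the steps. It shows steps running in order, a shared context passing information between them, and the agent refusing anything outside its configured action groups.

```python
from typing import Callable, Dict, List

Context = Dict[str, str]

# Hypothetical action groups; in Bedrock these would be backed by
# OpenAPI-described APIs, Lambda functions or knowledge bases.
def check_stock(ctx: Context) -> Context:
    ctx["in_stock"] = "yes" if ctx["item"] == "laptop" else "no"
    return ctx

def place_order(ctx: Context) -> Context:
    ctx["order"] = f"ordered {ctx['item']}" if ctx["in_stock"] == "yes" else "not ordered"
    return ctx

ACTIONS: Dict[str, Callable[[Context], Context]] = {
    "check_stock": check_stock,
    "place_order": place_order,
}

def run_agent(plan: List[str], ctx: Context) -> Context:
    """Run each planned step in order, passing the shared context between steps."""
    for step in plan:
        if step not in ACTIONS:
            raise ValueError(f"agent has no action group for {step!r}")
        ctx = ACTIONS[step](ctx)
    return ctx

result = run_agent(["check_stock", "place_order"], {"item": "laptop"})
```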
What are the functions of Bedrock Guardrails?
Guardrails, as the name suggests, are a protection mechanism that provides the following:
Control the interaction between users and foundation models
Filter undesirable and harmful content
Remove PII
Enhance Privacy
Reduce Hallucinations
Ability to create and monitor multiple guardrails
In Amazon Bedrock, a guardrail can be made up of several types of filter, such as content filters for harmful categories, prompt attack filters, sensitive information (PII) filters and word filters.
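To illustrate what a word filter and a sensitive-information filter do, here is a minimal sketch in plain Python. The blocked words, regex patterns and refusal message are hypothetical; real Bedrock guardrails are configured in the console or API and use their own detection, not this code.

```python
import re
from typing import Tuple

# Hypothetical configuration for illustration only
BLOCKED_WORDS = {"confidential", "password"}
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d -]{8,}\d"),
}

def apply_guardrail(text: str) -> Tuple[str, bool]:
    """Return (possibly masked text, blocked flag)."""
    # Word filter: refuse outright if any blocked word appears
    words = set(re.findall(r"[a-z]+", text.lower()))
    if words & BLOCKED_WORDS:
        return "Sorry, I can't help with that.", True
    # Sensitive-information filter: mask PII instead of blocking
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"{{{label}}}", text)
    return text, False

masked, blocked = apply_guardrail("Email jane@example.com or call +44 7700 900123")
```

Masking rather than blocking keeps the rest of the message usable, which is why the sketch treats PII differently from blocked words.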
What is RAG (Retrieval-Augmented Generation)?
Some prompts can only be answered from a niche data set. A typical example is "Whose role in accounts is it to submit VAT returns?" The answer relies on niche data that is not commonly available, so the foundation model will not have learned it during training.
To make this work, we supply external data (for example in S3) to build a knowledge base. When a prompt is submitted, relevant information is retrieved from the knowledge base and used to augment the prompt, which is then sent to the foundation model to generate a response.
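The retrieve-then-augment flow can be sketched as follows, assuming a tiny hypothetical in-memory knowledge base. Word overlap stands in for real vector search, and the call to the foundation model is left out; the sketch only builds the augmented prompt that would be sent to it.

```python
import re

# A tiny hypothetical knowledge base
KNOWLEDGE_BASE = [
    "The finance team lead submits the quarterly VAT returns.",
    "Expense claims must be approved by a line manager.",
    "The office is closed on public holidays.",
]

def tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(question, k=1):
    """Rank documents by word overlap with the question (a stand-in for vector search)."""
    q = tokens(question)
    ranked = sorted(KNOWLEDGE_BASE, key=lambda doc: len(q & tokens(doc)), reverse=True)
    return ranked[:k]

def augment(question, k=1):
    """Build the augmented prompt that would be sent to the foundation model."""
    context = "\n".join(retrieve(question, k))
    return f"Use only this context to answer.\nContext:\n{context}\nQuestion: {question}"

prompt = augment("Whose role in accounts is it to submit VAT returns?")
```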
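What role in RAG does the vector database play?
The vector database stores the vectorised (embedded) knowledge base content and supplies the most relevant pieces of information when a prompt is submitted. The two main options are Amazon Aurora and Amazon OpenSearch. Other available options are MongoDB, Redis and Pinecone.
Roughly, the data in S3 is split into chunks, each chunk is converted into a vector (an embedding) by a model such as Amazon Titan or Cohere, and the vectors are stored in the vector database. At query time the prompt is embedded the same way and the closest vectors are returned.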
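The chunk, embed, store and search workflow can be sketched end to end in plain Python. Everything here is a stand-in: a hashed bag-of-words replaces a real embedding model such as Titan or Cohere, a list replaces the vector database, and the chunk size and overlap values are arbitrary assumptions.

```python
import math
import re
import zlib

DIM = 256  # toy embedding size; real embedding models produce their own fixed-size vectors

def chunk(text, size=8, overlap=2):
    """Split text into chunks of `size` words that overlap by `overlap` words."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text):
    """Toy stand-in for an embedding model: hashed bag of words, L2-normalised."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z]+", text.lower()):
        vec[zlib.crc32(word.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class VectorStore:
    """In-memory stand-in for a vector database."""
    def __init__(self):
        self.rows = []  # (vector, chunk text)

    def add(self, text):
        self.rows.append((embed(text), text))

    def search(self, query, k=1):
        q = embed(query)
        # Vectors are normalised, so the dot product equals cosine similarity
        scored = [(sum(a * b for a, b in zip(q, v)), t) for v, t in self.rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [t for _, t in scored[:k]]

store = VectorStore()
for doc in ["The finance team lead submits the quarterly VAT returns",
            "Expense claims must be approved by a line manager"]:
    for piece in chunk(doc):
        store.add(piece)
```

Overlapping chunks reduce the chance that a key phrase is split across a chunk boundary and lost to retrieval.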
What data sources can be used to seed a knowledge base?
- S3
- Confluence
- SharePoint
- Salesforce
- Web pages
What are the basic setup steps for a knowledge base in RAG?
There is a chat setup that works like a playground, which you use to get a feel for what is involved. In this playground you can supply a data source and select a model and a template. The template defines how the results coming out of the knowledge base are formatted before they are sent into the foundation model.
You can then interactively ask questions so that you can see how it all works.
Name the four cost areas in Bedrock (from cheapest to most expensive)?
Prompt Engineering
Cheap: no model retraining or fine-tuning
RAG
Some cost to support an external knowledge base
Instruction-Based Fine-Tuning
The FM is fine-tuned with specific instructions; this requires model training, so it costs more than RAG
Domain Adaptation Fine-Tuning
The most expensive option, as the model is trained on a domain-specific dataset, which requires intensive computation
The best strategy for cost savings in Bedrock is to minimise input tokens (prompt engineering) and output tokens (response brevity), because on-demand usage is billed per token.
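The effect of trimming tokens is simple arithmetic, sketched below. The per-1,000-token prices are hypothetical placeholders, not real Bedrock prices (check the Bedrock pricing page for the model you use); as with many models, this made-up price list charges more for output tokens than for input tokens.

```python
# Hypothetical per-1,000-token prices (placeholders, not real Bedrock pricing)
PRICE_PER_1K_INPUT = 0.003
PRICE_PER_1K_OUTPUT = 0.015

def request_cost(input_tokens, output_tokens):
    """On-demand cost of one request: input and output tokens are billed separately."""
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT

verbose = request_cost(2000, 800)  # long prompt, long answer: 0.006 + 0.012 = 0.018
concise = request_cost(500, 200)   # trimmed prompt, brief answer: 0.0015 + 0.003 = 0.0045
```

Cutting both prompt and answer to a quarter of their length cuts the per-request cost to a quarter, and that saving repeats on every call.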
What two CloudWatch functions are available to Bedrock?
Logging
Sends logs of all model invocations to Amazon CloudWatch Logs or S3
Can include text, images and embeddings
Metrics
Metrics such as ContentFilteredCount are available, for example to monitor whether guardrails are working
Alarms can be built on top of metrics
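The logic of a metric alarm can be sketched in plain Python. This only mimics the idea of CloudWatch's threshold, evaluation periods and datapoints-to-alarm settings; it is not the CloudWatch API, it ignores missing-data handling, and the metric values are made up.

```python
def alarm_state(datapoints, threshold, evaluation_periods, datapoints_to_alarm):
    """Return "ALARM" if at least `datapoints_to_alarm` of the last
    `evaluation_periods` datapoints exceed `threshold`, else "OK"."""
    recent = datapoints[-evaluation_periods:]
    breaching = sum(1 for value in recent if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"

# Hypothetical per-minute ContentFilteredCount values
content_filtered = [0, 1, 0, 6, 7, 9]
state = alarm_state(content_filtered, threshold=5, evaluation_periods=3, datapoints_to_alarm=2)
```

Requiring several breaching datapoints rather than one avoids alarming on a single spike.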
Is clustering supervised or unsupervised learning?
Unsupervised: clustering groups data points by similarity without being given any labels
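A minimal k-means sketch on one-dimensional data shows that no labels are needed: the algorithm only uses distances between points. The data and the hand-picked starting centroids are assumptions for illustration; practical k-means implementations usually choose starting centroids randomly or with k-means++.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Group unlabelled 1-D points around k centroids (basic k-means)."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        # Assignment step: each point joins its nearest centroid
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points (keep it if empty)
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.7]  # no labels supplied
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
```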
Does high bias lead to overfitting or underfitting?
Underfitting
Does high variance lead to overfitting or underfitting?
Overfitting
What is the bias-variance trade-off?
The bias-variance trade-off is the challenge of balancing the error due to incorrect or overly simple assumptions in the model (bias) against the error due to the model’s sensitivity to its particular training data, which grows with model complexity (variance). High bias can cause underfitting, and high variance can cause overfitting.
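For squared-error loss, the trade-off is usually written as the standard decomposition below. It assumes the data are generated as y = f(x) + ε with zero-mean noise of variance σ², that the fitted model is f̂, and that expectations are taken over training sets and noise. Reducing one error term typically increases the other, while the noise term cannot be reduced by any model.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```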
What is the terminology hierarchy?
Artificial Intelligence > Machine Learning > Deep Learning > Generative AI
What is the difference between image processing and computer vision?
Image processing focuses on enhancing and manipulating images for visual quality, whereas computer vision involves interpreting and understanding the content of images to make decisions
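A toy contrast between the two, assuming a tiny hypothetical grayscale image stored as nested lists: the image-processing function returns another image, while the computer-vision function returns a decision about the content. Real computer vision uses learned models rather than a fixed brightness threshold; the threshold here is purely illustrative.

```python
def stretch_contrast(img):
    """Image processing: return a new image with pixel values rescaled to 0-255."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    span = (hi - lo) or 1
    return [[round((p - lo) * 255 / span) for p in row] for row in img]

def find_bright_object(img, threshold=200):
    """Toy 'computer vision': decide whether a bright object is present and where.
    Returns a bounding box (top, left, bottom, right) or None."""
    hits = [(r, c) for r, row in enumerate(img) for c, p in enumerate(row) if p >= threshold]
    if not hits:
        return None
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return (min(rows), min(cols), max(rows), max(cols))

img = [[50, 60, 50],
       [60, 120, 110],
       [50, 115, 60]]
enhanced = stretch_contrast(img)      # output is still an image
box = find_bright_object(enhanced)    # output is a decision about the content
```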
What is the difference in feature engineering between structured and unstructured data?
Feature engineering for structured data typically includes tasks like normalization, handling missing values, and encoding categorical variables. For unstructured data, such as text or images, feature engineering involves different tasks like tokenization (breaking down text into tokens), vectorization (converting text or images into numerical vectors), and extracting features that can represent the content meaningfully.
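The tasks listed above can be sketched with the standard library, assuming a small hypothetical table and two hypothetical sentences: mean imputation, min-max normalisation and one-hot encoding for the structured columns, then tokenisation and a bag-of-words count vector for the text. Libraries such as pandas or scikit-learn provide production versions of these steps.

```python
import re

# --- Structured data: hypothetical table rows ---
rows = [
    {"age": 25, "city": "London"},
    {"age": None, "city": "Leeds"},
    {"age": 45, "city": "London"},
]

def impute_mean(values):
    """Handle missing values: replace None with the mean of the present values."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]

def min_max(values):
    """Normalise numbers to the range 0-1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / ((hi - lo) or 1) for v in values]

def one_hot(values):
    """Encode a categorical column as 0/1 indicator columns (sorted category order)."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

ages = min_max(impute_mean([r["age"] for r in rows]))  # [0.0, 0.5, 1.0]
cities = one_hot([r["city"] for r in rows])            # columns: Leeds, London

# --- Unstructured text: tokenise, then vectorise as word counts ---
def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def bag_of_words(docs):
    vocab = sorted({t for d in docs for t in tokenize(d)})
    return vocab, [[tokenize(d).count(w) for w in vocab] for d in docs]

vocab, vectors = bag_of_words(["VAT returns are due", "Returns are late"])
```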
In Bedrock, which deployment type allows me to use a customised model?
Provisioned Throughput
Under what learning approach do FMs create labels?
Foundation models use self-supervised learning to create labels from input data. In self-supervised learning, models are given vast amounts of raw, completely unlabeled data and generate the labels from the data itself, for example by predicting the next word in a sentence. This means no one has to train the model with manually labeled training data sets.
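Next-word prediction is one common self-supervised objective (masked-word prediction is another). The sketch below shows how (context, label) training pairs fall straight out of raw text with no human labelling; the sentence and the context size of three words are arbitrary assumptions.

```python
def next_word_examples(text, context_size=3):
    """Create (context, label) pairs from raw text: the label is simply the next word."""
    words = text.split()
    return [
        (" ".join(words[i - context_size:i]), words[i])
        for i in range(context_size, len(words))
    ]

examples = next_word_examples("the finance team submits the VAT returns")
```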