AWS BEDROCK Flashcards
what is aws bedrock?
one of the fastest-growing AI services; genAI foundation models (FMs) create new data after being trained on a wide variety of input data
bedrock makes a copy of the FM you choose, which you then fine-tune with your own data
it’s fully managed by aws, with a pay-per-use pricing model, unified APIs, and out-of-the-box features like RAG and LLM agents
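the unified API mentioned above is InvokeModel on the bedrock-runtime client. a minimal sketch of building a request body for it, assuming the amazon titan text schema (inputText / textGenerationConfig); actually calling the API requires AWS credentials, so the call itself is shown in a comment:

```python
import json

# hypothetical model id; the request-body fields follow the amazon titan text schema
MODEL_ID = "amazon.titan-text-express-v1"

def build_invoke_body(prompt, max_tokens=256, temperature=0.5):
    """Build the JSON request body for bedrock's InvokeModel API."""
    return json.dumps({
        "inputText": prompt,
        "textGenerationConfig": {
            "maxTokenCount": max_tokens,
            "temperature": temperature,
        },
    })

body = build_invoke_body("Summarize what aws bedrock is in one sentence.")
# the actual call (needs AWS credentials) would look like:
# client = boto3.client("bedrock-runtime")
# response = client.invoke_model(modelId=MODEL_ID, body=body)
print(body)
```

other model families (anthropic, cohere, ...) use their own body schemas, which is part of why bedrock's unified API is convenient.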
what are LLMs?
Large Language Models are designed to generate coherent human-like text, their output is NON-DETERMINISTIC
what are diffusion models?
genAI models that create images or data by transforming noise into the desired output (reversing the diffusion process)
what are multimodal models?
models that accept various types of inputs AND outputs such as text, images, audio or video
what is amazon titan?
high-performing family of aws foundation models
multimodal choices via a fully managed API, customizable with your own data, 8k-token context, 100+ languages
content creation, classification, education
what is fine-tuning in the aws bedrock?
adapting a copy of a FM with your data; fine-tuning changes the weights of the base FM
data must adhere to a specific format and be stored in aws S3
you must use “provisioned throughput” as the pricing model for a fine-tuned FM, as opposed to on-demand pricing for non-fine-tuned FMs
what is instructions-based fine tuning?
improving the performance of the FM on domain-specific tasks with labeled examples called PROMPT-RESPONSE PAIRS
there is also single-turn and multi-turn messaging for chatbots, which captures the back-and-forth between a user and an assistant (bot)
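a sketch of what the training file uploaded to S3 could look like, assuming the JSONL prompt/completion format used by titan text fine-tuning (the example pairs are made up):

```python
import json

# one labeled prompt-response pair per line, serialized as JSONL
# (field names assume the "prompt"/"completion" schema of titan text models)
pairs = [
    {"prompt": "What does EC2 stand for?", "completion": "Elastic Compute Cloud"},
    {"prompt": "What does S3 stand for?", "completion": "Simple Storage Service"},
]

jsonl = "\n".join(json.dumps(p) for p in pairs)
print(jsonl)
```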
what is continued pre-training?
FMs are pre-trained on unlabeled data, so we simply provide more unlabeled data to continue that training
only input, no output
what is domain-adaptation fine-tuning?
to make a model an expert in a specific domain, good technique for feeding industry-specific terminology
as more data becomes available the model can be continually trained, only input no output
what is transfer learning?
using a pre-trained model and adapting it to a new task; widely used for image classification and for NLP models (e.g. ChatGPT)
fine-tuning is a specific kind of transfer learning
what are the built-in task types for automatic evaluation in aws bedrock and how does it work?
text summarization
question and answer
text classification
open-ended text generation
you bring your own prompt dataset or use built-in curated datasets; scores are calculated automatically, and the generated answers are compared across FMs by a genAI model called a judge model
what are benchmark datasets?
curated collection of data designed specifically for evaluating the performance of a language model
very quick, very low administrative effort to evaluate your models for potential bias, you can also create your own
there are also human evaluations, which use the same benchmark questions and answers but the generated answers are compared by humans
what are the automatic metrics to evaluate a FM?
ROUGE (Recall-Oriented Understudy for Gisting Evaluation): for automatic summarization and language translation systems
ROUGE-N: measures matching n-grams between reference and generated text
ROUGE-L: computes longest common subsequence between reference and generated text
F1 Score for BINARY CLASSIFICATION (spam vs. not spam) of UNBALANCED DATASETS
ACCURACY for BINARY CLASSIFICATION OF BALANCED DATASETS
BLEU (Bilingual Evaluation Understudy): evaluates the quality of generated text, especially for translation; penalizes excessive brevity
BERTScore (Bidirectional Encoder Representations from Transformers): for SEMANTIC SIMILARITY, compares embeddings
Perplexity: how well the model predicts the next token; a model that is very confident about the next token has lower perplexity
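two of the metrics above are simple enough to compute by hand; a small sketch of ROUGE-N (n-gram overlap with the reference) and the F1 score (the example texts and counts are made up):

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n(reference, candidate, n=1):
    """ROUGE-N recall: fraction of reference n-grams also found in the candidate."""
    ref = Counter(ngrams(reference.split(), n))
    cand = Counter(ngrams(candidate.split(), n))
    overlap = sum((ref & cand).values())
    return overlap / max(sum(ref.values()), 1)

def f1_score(tp, fp, fn):
    """F1 for binary classification, the better choice on unbalanced datasets."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

ref = "the cat sat on the mat"
cand = "the cat is on the mat"
print(round(rouge_n(ref, cand, 1), 2))  # 5 of the 6 reference unigrams overlap
print(round(f1_score(tp=8, fp=2, fn=2), 2))
```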
what is RAG?
Retrieval Augmented Generation allows the FM to reference a data source outside of its training data without being fine-tuned
knowledge bases are built and managed by aws bedrock with a data source (like your S3 bucket), and when a user needs information from there it will be searched (retrieved if you will)
knowledge bases are backed by vector databases, which bedrock populates by creating vector embeddings, which enable the searches
RAG is used when real-time data is needed to be fed into the FM
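the retrieval step boils down to a nearest-neighbor search over embeddings. a toy sketch with hand-made 3-d vectors standing in for real embeddings (which would come from an embedding model like titan embeddings):

```python
import math

# documents stored as (made-up) vector embeddings in a "vector database"
docs = {
    "billing faq":  [0.9, 0.1, 0.0],
    "vpn setup":    [0.1, 0.8, 0.2],
    "office hours": [0.0, 0.2, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve(query_vec, k=1):
    """Return the k documents whose embeddings are closest to the query's."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
    return ranked[:k]

# a query about billing embeds close to the billing document
print(retrieve([0.8, 0.2, 0.1]))  # ['billing faq']
```

the retrieved text is then stuffed into the FM's prompt as extra context, which is the "augmented generation" part.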
what are RAG vector databases?
aws opensearch service - built to handle search and analytics workloads
aws aurora - online transaction processing workloads
as well as others like mongodb, redis, pinecone
if you don’t specify one, aws will create an opensearch serverless database
what are embedding models?
amazon titan
cohere
they take the docs in your S3 bucket, chunk (split) them and feed them into the embedding model, generating vectors that are stored in the vector database
the outcome is that now these vectors are easily searchable, so when RAG looks into the database it’s able to find what it needs
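the chunking step can be sketched as a sliding window over the document; sizes here are arbitrary characters (real pipelines often chunk by tokens), and the overlap keeps context from being cut mid-thought at chunk boundaries:

```python
def chunk(text, size=100, overlap=20):
    """Split a document into overlapping fixed-size chunks before embedding."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "a" * 250
pieces = chunk(doc)
print([len(p) for p in pieces])  # [100, 100, 90]
```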
what are the types of vector database?
aws opensearch service: search and analytics database
aws documentDB: nosql database
relational databases:
aws aurora: relational database, cloud
aws rds for postgresql: relational, open-source
and also aws neptune: graph database
what is aws bedrock agent?
the model can ask questions, perform multi-step (chain-of-thought) tasks like creating infrastructure or deploying applications, and carry out operations on our systems
by creating action groups, the agents are configured to understand what these action groups do and mean, and be able to integrate with other systems to initiate action
it can also use RAG knowledge bases
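action groups describe the APIs an agent may call using an OpenAPI-style schema; a minimal sketch, where the path, operation name and descriptions are hypothetical:

```python
# minimal OpenAPI-style schema telling the agent what an action group's API does
action_group_schema = {
    "openapi": "3.0.0",
    "info": {"title": "infra-ops", "version": "1.0.0"},
    "paths": {
        "/deploy": {
            "post": {
                "operationId": "deployApplication",
                "description": "deploy an application to the target environment",
                "responses": {"200": {"description": "deployment started"}},
            }
        }
    },
}
print(action_group_schema["paths"]["/deploy"]["post"]["operationId"])
```

the agent reads the operation descriptions to decide when to invoke which action.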
why use cloudwatch for bedrock?
for cloud monitoring, with alarms, logs, and so on
also for model invocation logging
cloudwatch metrics: related to bedrock and also guardrails you set up (like content filtered count)
other features of aws bedrock
aws bedrock studio: give access to your teams so they can easily create AI powered applications
watermark detection: check if an image was generated by aws titan generator
what are the pricing models on aws bedrock?
on demand:
text models: charged for in/output token
embeddings: charged for input token
image models: charged for image
batch mode pricing: multiple predictions at a time (output is a file in s3) and can provide discounts of up to 50%
provisioned throughput: purchase model units for a certain time period, good for capacity and performance but more expensive; necessary for fine-tuned models
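how on-demand token billing and the batch discount combine can be sketched with arithmetic; the per-1k-token prices below are made up for illustration, NOT real aws prices:

```python
# hypothetical per-1k-token prices (not real aws pricing)
PRICE_IN_PER_1K = 0.0005
PRICE_OUT_PER_1K = 0.0015

def on_demand_cost(input_tokens, output_tokens):
    """Text models charge separately for input and output tokens."""
    return (input_tokens / 1000) * PRICE_IN_PER_1K + (output_tokens / 1000) * PRICE_OUT_PER_1K

def batch_cost(input_tokens, output_tokens, discount=0.5):
    """Batch mode can discount on-demand pricing by up to ~50%."""
    return on_demand_cost(input_tokens, output_tokens) * (1 - discount)

print(round(on_demand_cost(10_000, 2_000), 6))  # 0.005 in + 0.003 out = 0.008
print(round(batch_cost(10_000, 2_000), 6))      # 0.004 with the 50% discount
```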
what are the model improvement techniques from cheapest to most expensive?
$ prompt engineering
$$ RAG
$$$ instruction-based fine-tuning
$$$$ domain adaptation fine-tuning
what is prompt engineering and what are some prompt engineering techniques?
prompt engineering is developing, designing and optimizing prompts to enhance the output of FM for your needs
improved prompting technique: instructions, context, input data and output indicators
negative prompting technique: explicit instructions on what NOT to do
zero-shot prompting: presenting a task without any examples or training
few-shot prompting: presenting a task with a few examples of user input and user intent
one-shot prompting: the same, with exactly one example
chain-of-thought prompting: divide tasks into a sequence of reasoning steps, more structure and coherence
retrieval augmented generation: combining the model’s capability with external data sources
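few-shot prompting from the list above is just string construction: labeled examples are prepended to the new input so the model can infer the pattern. a sketch with made-up sentiment examples:

```python
# made-up labeled examples prepended to the prompt (the "shots")
examples = [
    ("I love this product!", "positive"),
    ("Terrible experience.", "negative"),
]

def few_shot_prompt(new_input):
    shots = "\n".join(f"Review: {text}\nSentiment: {label}" for text, label in examples)
    return f"{shots}\nReview: {new_input}\nSentiment:"

prompt = few_shot_prompt("The delivery was quick and easy.")
print(prompt)
```

with an empty examples list this degenerates to zero-shot prompting; with one example it is one-shot.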
what are some customization options for foundation models on bedrock?
system prompts: how the model should behave and reply
temperature: the higher the more creative the model, maybe less coherent
top p: P IS FOR PROBABILITY MASS. sampling is limited to the smallest set of tokens whose cumulative probability reaches p; the lower p, the more coherent the response
top k: K IS FOR COUNT. sampling is limited to the k most probable tokens; the lower k, the more coherent the output
length: of the output
stop sequence: tokens that signal the model to stop generating output
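how top k and top p narrow the candidate next tokens can be shown on a toy next-token distribution (the tokens and probabilities are made up):

```python
# made-up next-token distribution
probs = {"blue": 0.5, "green": 0.25, "red": 0.15, "purple": 0.1}

def top_k(dist, k):
    """Keep only the k most probable tokens."""
    return dict(sorted(dist.items(), key=lambda kv: kv[1], reverse=True)[:k])

def top_p(dist, p):
    """Keep the smallest set of tokens whose cumulative probability reaches p."""
    kept, total = {}, 0.0
    for tok, pr in sorted(dist.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = pr
        total += pr
        if total >= p:
            break
    return kept

print(sorted(top_k(probs, 2)))    # ['blue', 'green']
print(sorted(top_p(probs, 0.7)))  # ['blue', 'green'] (0.5 + 0.25 >= 0.7)
```

temperature then reshapes the probabilities of whatever tokens survive this filtering before one is sampled.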
what is prompt latency?
how fast the model responds, it’s impacted by:
model size
model type
number of in/output tokens
latency is not impacted by top p, top k or temperature
what are prompt templates?
a way to simplify and standardize the process of generating prompts by steering users into supplying specific information that is slotted into a template
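a minimal sketch of such a template using python's stdlib string.Template; the template wording and field names are made up:

```python
from string import Template

# the user only supplies the $-placeholders; the surrounding prompt is fixed
TEMPLATE = Template(
    "Summarize the following $doc_type for a $audience audience "
    "in at most $max_words words:\n$content"
)

prompt = TEMPLATE.substitute(
    doc_type="incident report",
    audience="non-technical",
    max_words=100,
    content="The database failed over at 02:13 UTC...",
)
print(prompt)
```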
what is model invocation logging?
feature in bedrock that allows for detailed logging of all requests and responses during model invocations