Domain 3: Application of Foundation Models 28% Flashcards
What’s the goal with Gen AI?
To architect an efficient, scalable solution that does not reduce model performance.
Cost, latency constraints, and required modalities
considerations for apps that use FMs
Understanding the requirements for your use case in terms of _____ is important when deciding on an AI model.
inference speed
Cost
find the balance between training time, cost, and model performance
Latency
Consider real-time results requirements, inference times
_____ is the duration it takes a model to process data and produce a prediction.
Inference speed
Modalities
specific embedding, multimodal, multilingual, pre-trained (architecture/complexity)
Accuracy, precision, recall, F1 score, root mean squared error (RMSE), mean average precision (MAP), and mean absolute error (MAE).
standard metrics to evaluate and compare different models
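A few of the metrics on this card can be sketched in plain Python. This is an illustrative sketch from first principles, not any particular library's API; the example counts are made up.

```python
# Illustrative sketch: computing some of the standard evaluation metrics
# above from raw counts, in plain Python (no external libraries).

def precision(tp, fp):
    # Of all positive predictions, how many were correct?
    return tp / (tp + fp)

def recall(tp, fn):
    # Of all actual positives, how many did the model find?
    return tp / (tp + fn)

def f1_score(p, r):
    # Harmonic mean of precision and recall.
    return 2 * p * r / (p + r)

def rmse(y_true, y_pred):
    # Root mean squared error, for regression-style outputs.
    n = len(y_true)
    return (sum((t - y) ** 2 for t, y in zip(y_true, y_pred)) / n) ** 0.5

p = precision(tp=8, fp=2)  # 0.8
r = recall(tp=8, fn=8)     # 0.5
f1 = f1_score(p, r)
```

F1 punishes imbalance between precision and recall, which is why it is often preferred over raw accuracy on skewed datasets.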
It’s important when you are choosing an appropriate metric or set of metrics to _____ before selecting a model.
assess your model’s performance
Framework, language, environment, license, documentation, whether the model has been updated and maintained regularly, known issues or limitations, customization, and explainability.
Compatibility factors to consider when using a pre-trained model online
flexible, modular, transparent, provide tools or methods to visualize or interpret their inner workings, interpret and explain model outcomes
What to look for when considering a pre-trained model
T/F: Foundation models are not interpretable by design because they are extremely complex.
True
_____ attempts to explain the black-box nature of FMs by approximating it locally with a simpler model that is interpretable.
Explainability
T/F: If interpretability is a requirement, then pre-trained foundation models might not be the best choice.
True
T/F: Linear regression and decision trees might be better when it comes to explainability.
True
The complexity of a model is important and can help you uncover intricate patterns within the data, but it can add challenges to _____ and _____.
maintenance and interpretability
T/F: Greater complexity might lead to enhanced performance, but can increase costs.
True
T/F: The more complicated the model is, the harder it is to explain the outputs of the model.
True
The _____ is where you process new data through the model to make predictions. It is the process of generating an output from an input that you provide to the model.
inference
_____ gives you the ability to run inference in the foundation model you choose.
Amazon Bedrock
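As a concrete illustration of running inference with Bedrock, the sketch below builds an InvokeModel request body with boto3-style parameters. The model ID and payload shape are illustrative assumptions (each FM family on Bedrock expects its own request schema), and the actual call requires AWS credentials and model access, so it is left commented out.

```python
import json

# Hedged sketch: preparing an Amazon Bedrock InvokeModel request.
# The model ID and body schema below are assumptions for illustration.
model_id = "amazon.titan-text-express-v1"  # hypothetical base-model choice

request_body = json.dumps({
    "inputText": "Summarize the benefits of vector databases.",  # the prompt
    "textGenerationConfig": {
        "temperature": 0.5,    # inference parameter: randomness
        "maxTokenCount": 256,  # inference parameter: response length
    },
})

# Requires AWS credentials and Bedrock model access, so left commented out:
# import boto3
# client = boto3.client("bedrock-runtime")
# response = client.invoke_model(modelId=model_id, body=request_body)
# print(json.loads(response["body"].read()))
```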
A _____, which is an input, is provided to the model for it to generate a response.
prompt
_____ are a set of values that can be adjusted to limit or influence the model's response.
inference parameters
What kind of models can you run inference with?
base models, custom models, and provisioned throughput models, to test FM responses
Amazon Bedrock foundation models support the inference parameters of _____ to control randomness and diversity in the response.
temperature, Top K, Top P
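What these three parameters do to a model's next-token distribution can be shown in a toy sketch, with no ML framework involved. The four-token logit values are made up for illustration.

```python
import math

# Toy sketch of temperature, Top K, and Top P applied to a
# next-token probability distribution.

def softmax(logits, temperature=1.0):
    # Lower temperature sharpens the distribution; higher flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_filter(probs, k):
    # Top K: keep only the k most likely tokens, renormalized.
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

def top_p_filter(probs, p):
    # Top P (nucleus): keep the smallest set of tokens whose
    # cumulative probability reaches p, renormalized.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

logits = [2.0, 1.0, 0.5, 0.1]
probs = softmax(logits, temperature=0.7)
```

Lowering temperature makes the model more deterministic; Top K and Top P both narrow the pool of candidate tokens before sampling.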
What parameters are supported by Bedrock to limit the length of responses?
response length, penalties, and stop sequences
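A toy sketch of how a response-length cap and stop sequences constrain a completion. The function name and whitespace "tokenization" are illustrative stand-ins, not a Bedrock API.

```python
# Toy sketch: truncating a completion with a stop sequence and a
# length limit (whitespace tokens used as a crude stand-in for real
# tokenization; names here are illustrative, not a Bedrock API).

def apply_limits(text, max_tokens=None, stop_sequences=()):
    # Cut the completion at the first stop sequence, if one appears.
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            text = text[:idx]
    # Then enforce the token cap.
    if max_tokens is not None:
        tokens = text.split()
        text = " ".join(tokens[:max_tokens])
    return text

completion = "Step 1: plan. Step 2: build. END Step 3: ship."
print(apply_limits(completion, max_tokens=4, stop_sequences=("END",)))
```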
These inputs guide LLMs to generate an appropriate response or output for a given task or instruction.
Prompts
You can integrate additional domain-specific data from data stores or vector data stores to add semantically relevant inputs to your prompts.
retrieval augmented generation, RAG
A _____ is a collection of data that is stored as mathematical representations.
vector database
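The core idea of a vector database can be sketched in a few lines: store texts as numeric vectors and look up the nearest ones by cosine similarity. The tiny three-dimensional "embeddings" below are made up for illustration; a real system would use a proper embedding model and an indexed store.

```python
import math

# Minimal sketch of a vector store: texts mapped to (made-up) vectors,
# searched by cosine similarity.

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

store = {
    "returns policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.8, 0.2],
    "warranty terms": [0.7, 0.2, 0.1],
}

def search(query_vector, top_n=2):
    # Rank stored documents by similarity to the query embedding.
    ranked = sorted(store,
                    key=lambda k: cosine_similarity(store[k], query_vector),
                    reverse=True)
    return ranked[:top_n]

print(search([0.85, 0.15, 0.05]))
```

Real vector databases add the capabilities the next card lists on top of this basic lookup: indexing for fast approximate search, data management, access control, and a query engine.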
It requires millions of GPU compute hours, terabytes to petabytes of data, trillions of tokens, trial and error, and time; this is how generative AI models learn their capabilities.
pre-training
_____ add additional capabilities for efficient and fast lookup, and provide data management, fault tolerance, authentication, access control, and a query engine.
Vector databases
_____ enhances language models to retrieve and use external knowledge during the generation process. It is a technique in which the retrieval of information from data sources augments the generation of model responses.
RAG
RAG combines two components: a _____ component, which searches through a knowledge base, and a _____ component, which produces outputs based on the retrieved information.
retriever / generator
Why does RAG combine two components?
It helps the model access up-to-date and domain-specific knowledge beyond its training data
The prompt is passed into a query encoder, which embeds the data into the same format as the external data. That embedding is passed to the vector database, which searches for and returns similar embeddings that have already been through the model. If the vector database finds similar data, the retriever retrieves it (the matches can also be mapped back to their original source), the LLM input is augmented by combining the retrieved data with the original prompt, and the augmented prompt is sent to the LLM to return a completion.
How to use a vector database in the real world
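The retrieve-then-augment flow on this card can be sketched end to end. The embedder and generator below are stand-in functions for illustration (word overlap instead of a real embedding model, and a dummy completion instead of a real FM call); the two documents are made up.

```python
# Sketch of a minimal RAG flow: retrieve relevant documents, augment
# the prompt, then generate. embed() and generate() are stand-ins;
# a real system would call an embedding model and a foundation model.

KNOWLEDGE_BASE = [
    "The return window is 30 days.",
    "Orders ship within 2 business days.",
]

def embed(text):
    # Stand-in "embedding": a bag of lowercase words. The same encoder
    # must be used for both documents and queries.
    return set(text.lower().replace(".", "").replace("?", "").split())

def retrieve(query, top_n=1):
    # Retriever component: rank stored documents by similarity.
    q = embed(query)
    scored = sorted(KNOWLEDGE_BASE,
                    key=lambda doc: len(embed(doc) & q),
                    reverse=True)
    return scored[:top_n]

def generate(prompt):
    # Generator component stand-in; a real call would go to an LLM.
    return f"[model completion for prompt of {len(prompt)} chars]"

def rag_answer(query):
    # Augment the original prompt with the retrieved context.
    context = "\n".join(retrieve(query))
    augmented_prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return generate(augmented_prompt)
```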
How does RAG solve hallucinations?
By complementing generative LLMs with an external knowledge base, typically built using a vector database hydrated with vector-encoded knowledge articles
Amazon OpenSearch Service, Amazon Aurora, Redis, Amazon Neptune, Amazon DocumentDB (with MongoDB compatibility), and Amazon RDS for PostgreSQL
AWS services that help store embeddings within vector databases
The _____ delivers low-latency search and aggregations, plus visualization and dashboarding tools. It also has plugins that provide advanced capabilities such as alerting, fine-grained access control, observability, security monitoring, and vector storage and processing. With this service's vector database capabilities, you can implement semantic search, retrieval augmented generation (RAG) with LLMs, recommendation engines, and media search.
OpenSearch search engine
With _____ you can securely connect foundation models (FMs) to your company data, which is stored as embeddings in the vector engine for more relevant, context-specific, and accurate responses without continuously re-training the FM. Amazon RDS for PostgreSQL also supports the pgvector extension to store embeddings and perform efficient searches.
Knowledge Bases for Amazon Bedrock, a fully managed RAG capability
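A querying sketch for the managed RAG flow this card describes: boto3's `bedrock-agent-runtime` client exposes a `retrieve_and_generate` call. The knowledge base ID and model ARN below are placeholders, and the call itself requires AWS credentials, so it is left commented out; only the request parameters are built and inspectable here.

```python
# Hedged sketch: querying a Bedrock knowledge base via boto3's
# retrieve_and_generate. IDs/ARNs are placeholders, not real resources.

query = "What is our refund policy?"

params = {
    "input": {"text": query},
    "retrieveAndGenerateConfiguration": {
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB_ID_PLACEHOLDER",       # hypothetical ID
            "modelArn": "MODEL_ARN_PLACEHOLDER",          # hypothetical ARN
        },
    },
}

# Requires AWS credentials and an existing knowledge base, so commented out:
# import boto3
# client = boto3.client("bedrock-agent-runtime")
# response = client.retrieve_and_generate(**params)
# print(response["output"]["text"])
```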