Domain 3: Application of Foundation Models 28% Flashcards
What does finding the balance between training time, cost, and model performance yield?
An efficient scalable solution that does not reduce model performance.
Cost, latency constraints, and required modalities
considerations for apps that use FMs
Understanding the requirements for your use case in terms of _____ is important when deciding on an AI model.
inference speed
Cost
find the balance between training time, cost, and model performance
Latency
Consider real-time results requirements, inference times
_____ is the duration it takes a model to process data and produce a prediction.
Inference speed
Modalities
specific embedding, multimodal, multilingual, pre-trained (architecture/complexity)
Accuracy, precision, recall, F1 score, root mean squared error or RMSE, mean average precision or MAP, and mean absolute error, MAE.
standard metrics to evaluate and compare different models
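As a rough illustration of the metrics on this card, here is a minimal sketch in plain Python. The function names `classification_metrics` and `regression_errors` are made up for this example and are not part of any AWS service; MAP is omitted to keep the sketch short.

```python
import math

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def regression_errors(y_true, y_pred):
    """Mean absolute error (MAE) and root mean squared error (RMSE)."""
    n = len(y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    return {"mae": mae, "rmse": rmse}
```

Which metric matters depends on the use case: recall when missing a positive is costly, precision when false alarms are costly, and RMSE over MAE when large errors should be penalized more heavily.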
When you are choosing an appropriate metric or set of metrics, it’s important to _____ before selecting a model.
assess your model’s performance
Framework, language, environment, license, documentation, whether the model has been updated and maintained regularly, known issues or limitations, customization, and explainability.
Compatibility factors to consider when using a pre-trained model found online
flexible, modular, transparent, provide tools or methods to visualize or interpret their inner workings, interpret and explain model outcomes
What to look for when considering a pre-trained model
T/F: Foundation models are not interpretable by design because they are extremely complex.
True
_____ attempts to explain the black box nature of FMs, by approximating it locally with a simpler model that is interpretable.
Explainability
T/F: If interpretability is a requirement, then pre-trained foundation models might not be the best choice.
True
T/F: Linear regression and decision trees might be better when it comes to explainability.
True
The complexity of a model is important and can help you uncover intricate patterns within the data, but it can add challenges to _____ and _____.
maintenance and interpretability
T/F: Greater complexity might lead to enhanced performance, but can increase costs.
True
T/F: The more complicated the model is, the harder it is to explain the outputs of the model.
True
_____ is where you process new data through the model to make predictions. It is the process of generating an output from an input that you provide to the model.
inference
_____ gives you the ability to run inference on the foundation model you choose.
Amazon Bedrock
A _____, which is an input, is provided to the model for it to generate a response.
prompt
_____ are a set of values that can be adjusted to limit or influence the model response.
inference parameters
What kind of models can you run inference with?
base models, custom models, and provisioned models (to test FM responses)
Amazon Bedrock foundation models support the inference parameters of _____ to control randomness and diversity in the response.
temperature, Top K, Top P
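To make the effect of these three parameters concrete, here is a minimal sketch of how they are commonly applied to a model's next-token scores. It is a generic illustration, not Amazon Bedrock's implementation; the function name `next_token_distribution` and the toy logits are assumptions made for this example.

```python
import math

def next_token_distribution(logits, temperature=1.0, top_k=None, top_p=None):
    """Turn raw token scores into a filtered, renormalized probability distribution.

    logits: dict mapping token -> raw score (hypothetical toy input).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature: dividing scores by a value < 1 sharpens the distribution,
    # a value > 1 flattens it.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(((tok, e / total) for tok, e in exps.items()),
                    key=lambda kv: kv[1], reverse=True)
    # Top K: keep only the K most likely tokens.
    if top_k is not None:
        ranked = ranked[:top_k]
    # Top P: keep the smallest prefix whose cumulative probability reaches P.
    if top_p is not None:
        kept, cumulative = [], 0.0
        for tok, p in ranked:
            kept.append((tok, p))
            cumulative += p
            if cumulative >= top_p:
                break
        ranked = kept
    norm = sum(p for _, p in ranked)
    return {tok: p / norm for tok, p in ranked}
```

Lower values of any of the three make the output more predictable; higher values allow more diverse completions.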
Which parameters does Amazon Bedrock support to limit the length of responses?
response length, penalties, and stop sequences
These inputs guide LLMs to generate an appropriate response or output for a given task or instruction.
Prompts
With this technique, you can integrate additional domain-specific data from data stores or vector data stores, adding semantically relevant inputs to your prompts.
retrieval augmented generation, RAG
A _____ is a collection of data that is stored as mathematical representations.
vector database
It requires millions of graphics processing unit (GPU) compute hours, terabytes to petabytes of data, trillions of tokens, trial and error, and time. It is how generative AI models learn their capabilities.
pre-training
_____ add additional capabilities for efficient and fast lookup, and to provide data management, fault tolerance, authentication, access control, and query engine.
Vector databases
_____ enhances language models to retrieve and use external knowledge during the generation process. It is a technique in which the retrieval of information from data sources augments the generation of model responses.
RAG
RAG combines two components, a _____ component, which searches through a knowledge base and a _____ component, which produces outputs based on the retrieved information.
retriever / generator
Why does RAG combine two components?
helps the model access up-to-date and domain-specific knowledge beyond their training data
The prompt is passed into the query encoder, which encodes or embeds it into the same format as the external data. The embedding is then passed to the vector database, which searches for and returns similar embeddings of data that has already been through the embedding model. Those embeddings are attached to the new query and can also be mapped back to their original location. If the vector database finds similar data, the retriever retrieves it, the original prompt is augmented with the retrieved text, and the augmented prompt is sent to the LLM to return a completion.
How to use a vector database in the real world
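The retrieve-then-augment flow on this card can be sketched in a few lines. This is a toy illustration only: `embed` uses word counts as a stand-in for a real embedding model, an in-memory list stands in for the vector database, and all function names are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, documents, k=1):
    """Return the k documents whose embeddings are most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def augment(query, documents, k=1):
    """Build the augmented prompt that would be sent to the LLM."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

In a real system the documents are embedded once at ingestion time and stored in the vector database, so only the query is embedded at request time.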
How does RAG solve hallucinations?
By complementing generative LLMs with an external knowledge base that is typically built using a vector database, hydrated with vector-encoded knowledge articles
Amazon OpenSearch Service, Amazon Aurora, Redis, Amazon Neptune, Amazon DocumentDB with MongoDB compatibility, and Amazon RDS for PostgreSQL
AWS services that help store embeddings within vector databases
The _____ delivers low-latency search and aggregations, along with dashboards and visualization tools. It also has plugins that provide advanced capabilities such as alerting, fine-grained access control, observability, security monitoring, and vector storage and processing. With this service’s vector database capabilities, you can implement semantic search, retrieval augmented generation (RAG) with LLMs, recommendation engines, and media search.
OpenSearch search engine
With _____ you can securely connect foundation models, FMs, to your company data. It is stored as embeddings in the vector engine for more relevant, context-specific, and accurate responses without continuously re-training the FM. Amazon RDS for PostgreSQL also supports the pgvector extension to store embeddings and perform efficient searches.
Knowledge Bases for Amazon Bedrock, a fully managed RAG capability
- A fully managed AI capability from AWS to help you build applications with foundation models.
- Can automatically break down tasks and generate the required orchestration logic or write custom code, and they can securely connect to your databases through APIs.
- They can ingest and structure the data for machine consumption and augment it with contextual details to produce more accurate responses and fulfill requests.
- They are an additional piece of software that orchestrates the prompt completion workflows and interactions between the user requests, foundation model, and external data sources or applications.
- They automatically call APIs to take actions and invoke knowledge bases to supplement information for these actions.
Agents for Amazon Bedrock
_____ are a specific set of inputs provided by you the user. They guide LLMs to generate an appropriate response or output for a given task or instruction.
Prompts
A _____ contains components that you want the LLM to perform such as the task or instruction. You might also need the context of that task or instruction and the input text that you want for the response or output.
prompt
Providing a few examples in the prompt to help LLMs perform better and calibrate their output to meet your expectations.
few-shot prompting
A prompt, such as a sentiment classification request, that provides no examples.
zero-shot prompting
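The difference between the two prompting styles is easiest to see in the prompt text itself. The sketch below builds a sentiment classification prompt either way; the helper `build_prompt` and the `Text:` / `Sentiment:` layout are assumptions chosen for illustration, not a required format.

```python
def build_prompt(instruction, text, examples=()):
    """Build a zero-shot prompt (no examples) or a few-shot prompt (with examples).

    examples: iterable of (example_text, label) pairs shown to the model
    before the text it should classify.
    """
    lines = [instruction, ""]
    for example_text, label in examples:
        lines += [f"Text: {example_text}", f"Sentiment: {label}", ""]
    # The prompt ends with an open slot for the model to complete.
    lines += [f"Text: {text}", "Sentiment:"]
    return "\n".join(lines)

instruction = "Classify the text as positive or negative."
zero_shot = build_prompt(instruction, "The battery died after a day.")
few_shot = build_prompt(
    instruction,
    "The battery died after a day.",
    examples=[("I love this phone.", "positive"),
              ("The screen cracked immediately.", "negative")],
)
```

The few-shot version shows the model both the expected labels and the expected output format, which is often enough to calibrate smaller models.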
Where the actual prompt text is replaced with a continuous embedding vector that is optimized during training. This technique helps the prompt to be fine-tuned for a specific task. At the same time, it keeps the rest of the model parameters frozen, which can be more efficient than full fine-tuning.
prompt tuning
The practice of crafting and optimizing input prompts. It selects appropriate words, phrases, sentences, punctuation, and separator characters to effectively use LLMs for a wide variety of applications.
prompt engineering
Classification, question and answer with and without context, summarization, open-ended text generation, code generation, math, and reasoning or logical thinking.
common tasks supported by LLMs on Amazon Bedrock
This is:
1. The encoded knowledge of language in an LLM.
2. The stored patterns of data that capture relationships and, when prompted, reconstruct language from those patterns.
3. An understanding of patterns that the model can use to generate new outputs.
4. A statistical database.
Latent space
When you write a prompt for a language model, that prompt is ingested by the model and _____. It returns a pile of statistics that then get assembled as words.
refers to its latent space against its database of statistics
Designing and refining the input prompts that are fed into the model to guide it towards producing the desired outputs.
prompt engineering
- Be specific and provide clear instructions or specifications for the task at hand. For example, include the desired format, examples, comparison, style, tone, output length, and detailed context.
- Include examples of the desired behavior and direction, such as sample texts, data formats, templates, code, graphs, charts, and more.
- Experiment and use an iterative process to test prompts and understand how the modifications alter the responses.
- Know the strengths and weaknesses of your model.
- Balance simplicity and complexity in your prompts to avoid vague, unrelated, or unexpected answers.
- Specifically for your prompt engineers, use multiple comments to offer more context without cluttering your prompt.
- Add guardrails.
prompt engineering techniques
Prompt manipulation attacks in which a user crafts untrusted input to produce a malicious, undesired, or illicit response.
prompt injection
When an attacker tries to bypass the guardrails that you have established, this is called _____.
jailbreaking
_____ is an attempt to change or manipulate the original prompt with new instructions.
Hijacking
_____ is another risk of prompt engineering where harmful instructions are embedded in messages, emails, web pages, and more.
Poisoning
- Use these services to build applications that generate high-quality text for use cases such as content creation, summarization, question answering, and chatbots.
- Offer pre-trained language models that can be customized and controlled through prompt engineering.
- They provide APIs and tools for constructing and refining prompts, along with monitoring and analyzing the resulting outputs.
Amazon Bedrock and Amazon Titan
What are the key elements of training a foundation model?
They include pre-training, fine-tuning, and continuous pre-training.
With _____, you train the LLM by using huge amounts of unstructured data with self-supervised learning.
pre-training
_____ is a process that extends the training of the model to improve the generation of completions for a specific task. It is a supervised learning process in which you use a dataset of labeled examples to update the weights of the LLM. It helps to adapt foundation models to your custom datasets and use cases.
Fine-tuning
_____ happens when the full fine-tuning process modifies the weights of the original LLM. This can improve performance on the single fine-tuned task, but it can degrade performance on other tasks.
Catastrophic forgetting
Load the model parameters and add memory for the optimizer, gradients, forward activations, and temporary memory.
How to train and tune a foundation model
_____ is a process and set of techniques that freeze or preserve the parameters and weights of the original LLM and fine-tune or train a small number of task-specific adaptor layers and parameters. It reduces the compute and memory that’s needed because it’s fine-tuning a small set of model parameters.
Parameter-efficient fine-tuning, PEFT
_____ is a popular PEFT technique that also preserves or freezes the original weights of the foundation model and inserts new trainable low-rank matrices into each layer of a transformer architecture.
Low-rank adaptation or LoRA
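A small sketch can show why the low-rank matrices are cheap to train. It assumes the usual LoRA formulation, y = Wx + scale * B(Ax), where W is frozen and only A and B are trained; the function names and the plain-list matrices are illustrative choices, not a library API.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Trainable parameters in A (rank x d_in) plus B (d_out x rank)."""
    return rank * (d_in + d_out)

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, scale=1.0):
    """Frozen base output plus the low-rank update: W x + scale * B (A x)."""
    base = matvec(W, x)                 # original weights, never updated
    update = matvec(B, matvec(A, x))    # trainable low-rank path
    return [b + scale * u for b, u in zip(base, update)]

# For a 4096 x 4096 layer, rank 8 trains under 0.4% of the full weight count.
full = 4096 * 4096
adapter = lora_trainable_params(4096, 4096, rank=8)
```

If B starts as all zeros, the adapter contributes nothing at first, so tuning begins from the unmodified base model.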
PEFT and LoRA modify the _____ of your model, but not the representations.
weights
_____ encode semantic information similar to embeddings.
Representations
_____ is a fine-tuning process that freezes the base model and learns task-specific interventions on hidden representations.
Representation fine-tuning, ReFT
The _____ says that concepts are encoded in linear subspaces of representation in a neural network.
linear representation hypothesis
_____ is an extension of fine-tuning a single task. This requires a lot of data. For this process, the training dataset has examples of inputs and outputs for multiple tasks.
Multitask fine-tuning
_____ gives you the ability to use the pre-trained foundation models and adapt them to specific tasks by using limited domain-specific data. You can use this to help your model work with domain-specific language such as industry jargon, technical terms, or other specialized data.
Domain adaptation fine-tuning
_____ provides the capability to fine-tune a large language model, particularly a text generation model, on a domain-specific dataset so you can improve the performance of your model and help it better understand human-like prompts to generate human-like responses.
Amazon SageMaker JumpStart
During _____, you select prompts from your training dataset and pass them to the LLM to generate completions. Then, you compare the distribution of the completions with the training labels to calculate a loss between the two token distributions, which you can use to update your model’s weights so the model’s performance on the task improves.
fine-tuning
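The loss described on this card is typically a cross-entropy between the model's predicted token distribution and the label token. The sketch below shows that calculation only (no weight update); the function names and the toy probabilities are assumptions for illustration.

```python
import math

def token_cross_entropy(predicted, target_token):
    """Loss for one position: negative log-probability of the label token.

    predicted: dict mapping token -> probability assigned by the model.
    """
    return -math.log(predicted[target_token])

def sequence_loss(predicted_steps, target_tokens):
    """Average the per-token loss over every position in the completion."""
    losses = [token_cross_entropy(p, t)
              for p, t in zip(predicted_steps, target_tokens)]
    return sum(losses) / len(losses)

# A model that puts more probability on the labeled token gets a lower loss.
confident = sequence_loss([{"positive": 0.9, "negative": 0.1}], ["positive"])
unsure = sequence_loss([{"positive": 0.5, "negative": 0.5}], ["positive"])
```

During training, the gradient of this loss with respect to the weights tells the optimizer how to shift probability toward the labeled tokens.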
You can define separate evaluation steps to measure your LLM’s performance by using the _____, which gives you the validation accuracy. After you’ve completed your fine-tuning, you can perform a final performance evaluation on a holdout test dataset, which gives you the test accuracy.
holdout validation dataset
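One common way to obtain holdout data is to set aside part of the labeled dataset before fine-tuning begins. The sketch below is a minimal example under that assumption; the function names, the 10% fractions, and the fixed seed are illustrative choices.

```python
import random

def split_dataset(examples, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle once, then carve out disjoint train, validation, and test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, validation, test

def accuracy(predictions, labels):
    """Fraction of predictions that match their labels."""
    return sum(1 for p, y in zip(predictions, labels) if p == y) / len(labels)
```

The validation set is consulted repeatedly while tuning, so the final accuracy is reported on data the model and its tuning decisions have never seen.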