LLM Flashcards
What are 4 common methods of building LLM based applications?
- Training models from scratch
- Fine Tuning Open Source Models
- Using Hosted APIs
- In-Context Learning
https://a16z.com/2023/06/20/emerging-architectures-for-llm-applications/
What is the core idea of In-context learning?
To use LLMs “off the shelf” (i.e. without any fine tuning), then control their behavior through clever prompting and conditioning on private “contextual” data.
https://a16z.com/2023/06/20/emerging-architectures-for-llm-applications/
What is a context window?
The amount of text that can be entered/processed in one prompt.
https://a16z.com/2023/06/20/emerging-architectures-for-llm-applications/
What is the approximate context window of current GPT models?
50 pages of text.
https://a16z.com/2023/06/20/emerging-architectures-for-llm-applications/
How does in-context learning solve the context limit problem?
Instead of sending the complete dataset to be analyzed, we send only a subset, then determine which elements of the complete dataset are most relevant by using... LLMs!
https://a16z.com/2023/06/20/emerging-architectures-for-llm-applications/
What are the 3 stages of the In-context learning workflow?
- Data preprocessing / embedding
- Prompt construction / retrieval
- Prompt execution / inference
https://a16z.com/2023/06/20/emerging-architectures-for-llm-applications/
What is involved in the Data preprocessing / embedding stage of the in-context workflow?
Storing private data (e.g. legal docs) to be retrieved later. Typically these datasets (documents) are broken into chunks, passed through an embedding model and then stored in a specialized database called a vector database.
https://a16z.com/2023/06/20/emerging-architectures-for-llm-applications/
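The preprocessing/embedding stage above can be sketched in a few lines of Python. Note this is a toy illustration: `embed` is a hypothetical stand-in for a real embedding model (e.g. OpenAI's API or Sentence Transformers), here just hashing words into a fixed-size vector so the sketch runs on its own, and the "vector database" is a plain in-memory list.

```python
# Sketch of the preprocessing/embedding stage: chunk documents, embed each
# chunk, store (chunk, vector) pairs for later retrieval.
import hashlib


def embed(text: str, dims: int = 8) -> list[float]:
    # Toy embedding: bucket-count word hashes. NOT semantically meaningful;
    # a real pipeline would call an embedding model here.
    vec = [0.0] * dims
    for word in text.lower().split():
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        vec[h % dims] += 1.0
    return vec


def chunk(document: str, max_words: int = 50) -> list[str]:
    # Break a document into fixed-size word chunks.
    words = document.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


# "Vector database" stand-in: a list of (chunk_text, vector) pairs.
vector_store: list[tuple[str, list[float]]] = []


def ingest(document: str) -> None:
    for piece in chunk(document):
        vector_store.append((piece, embed(piece)))


ingest("This contract governs the obligations of both parties. " * 20)
```

A real deployment would swap `embed` for an API call and `vector_store` for Pinecone, Weaviate, or similar.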
What is involved in the Prompt Construction / retrieval stage of the in-context workflow?
When a user submits a query (e.g. a legal question), the application constructs a series of prompts to submit to the language model.
Each of these compiled prompts consists of:
* Prompt template (often hard coded)
* Examples of valid outputs (few-shot examples)
* Additional required information acquired from external APIs
* Relevant documents retrieved from a vector database (loaded during pre-processing/embedding)
https://a16z.com/2023/06/20/emerging-architectures-for-llm-applications/
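The four ingredients above can be sketched as a simple template fill. All names here (`PROMPT_TEMPLATE`, `build_prompt`) are illustrative, not from any particular framework:

```python
# Sketch of prompt construction: a hard-coded template is filled with
# few-shot examples, context from external APIs, and retrieved documents.
PROMPT_TEMPLATE = """You are a legal assistant. Answer using only the context below.

Examples of good answers:
{few_shot}

Additional information:
{api_context}

Relevant documents:
{documents}

Question: {question}
Answer:"""


def build_prompt(question: str, few_shot: list[str],
                 api_context: str, documents: list[str]) -> str:
    return PROMPT_TEMPLATE.format(
        few_shot="\n".join(few_shot),
        api_context=api_context,
        documents="\n---\n".join(documents),
        question=question,
    )


prompt = build_prompt(
    question="Can the lease be terminated early?",
    few_shot=["Q: Who signs? A: Both parties sign the agreement."],
    api_context="Jurisdiction: California (retrieved from a geo API).",
    documents=["Section 4: Either party may terminate with 30 days notice."],
)
```

Orchestration frameworks like LangChain and LlamaIndex provide richer versions of this assembly step.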
What is involved in the Prompt Execution / Inference stage of the in-context workflow?
After prompts are compiled in the retrieval stage, they are sent to a pre-trained LLM for inference.
Types of LLMs used here can include:
* Proprietary model APIs
* Open Source
* Self Trained
In some cases, logging, caching and validation are implemented in this stage.
https://a16z.com/2023/06/20/emerging-architectures-for-llm-applications/
What is the alternative to in-context learning and why would that not be a palatable option?
Alternative: training the LLM itself, which likely requires a team of ML engineers; in-context learning does not.
Other Benefits:
* No Need to host your own infrastructure
* No need to buy an expensive instance from OpenAI
https://a16z.com/2023/06/20/emerging-architectures-for-llm-applications/
How often does a specific piece of information need to occur in a training set before an LLM will remember it through fine-tuning?
At least ~10 times
https://a16z.com/2023/06/20/emerging-architectures-for-llm-applications/
What is the problem with just increasing the context window of the underlying model?
Currently, the cost and time of inference scale quadratically with the size of the prompt. Even linear scaling would be cost-prohibitive at this point (a 10k-page GPT query could cost hundreds of dollars).
https://a16z.com/2023/06/20/emerging-architectures-for-llm-applications/
What are 4 types of files/documents that would be used for contextual data?
- Text
- PDFs
- CSV
- SQL Extracts
https://a16z.com/2023/06/20/emerging-architectures-for-llm-applications/
What are 2 standard approaches used for data loading and transformation in the pre-processing/embedding stage?
What are two examples of each?
Traditional ETL Tools
* Databricks
* Airflow
Document Loaders built into LLM orchestration frameworks:
* LangChain (powered by Unstructured)
* LlamaIndex (powered by Llama Hub)
https://a16z.com/2023/06/20/emerging-architectures-for-llm-applications/
What is the most commonly used proprietary API (i.e. embedding model) used for embedding? And what is an emerging choice amongst enterprises?
- OpenAI
- Cohere (focuses specifically on embedding)
https://a16z.com/2023/06/20/emerging-architectures-for-llm-applications/
What is a popular open-source library (i.e. embedding model) for embedding?
The Sentence Transformers library from Hugging Face
https://a16z.com/2023/06/20/emerging-architectures-for-llm-applications/
What is a vector database responsible for?
Efficiently storing, comparing and retrieving up to billions of embeddings (i.e. vectors).
Vector databases offer optimized storage and query capabilities for embeddings.
https://a16z.com/2023/06/20/emerging-architectures-for-llm-applications/
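What a vector database does at query time can be sketched as brute-force nearest-neighbour search (real systems use approximate indexes such as HNSW to scale to billions of vectors). The store and vectors below are toy illustrations:

```python
# Minimal sketch of vector-database retrieval: rank stored embeddings by
# cosine similarity to the query vector and return the top k.
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


store = {
    "doc-1": [1.0, 0.0, 0.0],
    "doc-2": [0.9, 0.1, 0.0],
    "doc-3": [0.0, 0.0, 1.0],
}


def top_k(query: list[float], k: int = 2) -> list[str]:
    ranked = sorted(store, key=lambda doc: cosine(query, store[doc]), reverse=True)
    return ranked[:k]


top_k([1.0, 0.05, 0.0])  # doc-1 and doc-2 point in nearly the same direction
```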
What is a vector embedding?
noun
A type of data representation that carries within it semantic information that’s critical for AI systems to gain understanding and maintain a long-term memory they can draw upon when executing complex tasks.
https://www.pinecone.io/learn/vector-database/
What is contained within an embedding and why are they important?
They contain a large number of attributes (or features), each representing different dimensions of the data that are essential for understanding patterns, relationships and underlying structures.
https://www.pinecone.io/learn/vector-database/
What does an embedding model do?
It creates vector embeddings for the context we want to index.
The vector embeddings take the form of arrays, but still maintain relationships between vectors that make sense (in the real world).
https://www.pinecone.io/learn/vector-database/
What are 3 common similarity measures used by Pinecone?
- Cosine Similarity
- Euclidean Distance
- Dot Product
https://www.pinecone.io/learn/vector-database/
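The three measures side by side, implemented in plain Python (no Pinecone dependency) on toy vectors:

```python
# Dot product and cosine are similarities (bigger = more alike);
# Euclidean is a distance (smaller = more alike).
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # b points the same direction as a
results = (dot(a, b), euclidean(a, b), cosine(a, b))
```

Because `b` is just `a` scaled by 2, the cosine similarity is exactly 1 even though the Euclidean distance is nonzero.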
What is involved in the cosine similarity similarity measure?
Measures the cosine of the angle between two vectors, with the result ranging from -1 to 1.
1= identical
0 = orthogonal
-1 = diametrically opposed
https://www.pinecone.io/learn/vector-database/
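A quick check of the three landmark cosine values on simple 2-D vectors:

```python
# Cosine similarity hits 1, 0, and -1 for same-direction, orthogonal,
# and opposite-direction vectors, regardless of their lengths.
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


identical = cosine([1, 0], [2, 0])    # same direction -> 1
orthogonal = cosine([1, 0], [0, 1])   # right angle -> 0
opposed = cosine([1, 0], [-3, 0])     # opposite direction -> -1
```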