Optimizing Foundation Models Flashcards
What are vector embeddings? How do they play a part in RAG?
Embedding is the process by which text, images, and audio are given numerical representation in a vector space. Embedding is usually performed by a machine learning (ML) model.
In RAG, enterprise datasets, such as documents, images, and audio, are tokenized, passed to ML models, and vectorized. The resulting vectors in an n-dimensional space, along with metadata about them, are stored in purpose-built vector databases for fast retrieval.
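A minimal sketch of the idea above: text is mapped to a vector, and similarity between vectors stands in for semantic similarity. The `toy_embed` function here is a hypothetical stand-in (word counts hashed into buckets); real embeddings come from an ML model.

```python
import math

def toy_embed(text: str, dim: int = 8) -> list[float]:
    # Toy stand-in for an embedding model: hash each word into one of
    # `dim` buckets, count occurrences, then L2-normalize.
    # A real system would call a trained embedding model instead.
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

doc_vec = toy_embed("reset my account password")
query_vec = toy_embed("how do I reset a password")

# Cosine similarity of the two unit vectors; shared words ("reset",
# "password") land in the same buckets and push the score above zero.
similarity = sum(a * b for a, b in zip(doc_vec, query_vec))
```

In RAG, the query vector is compared against the stored document vectors this way, and the closest documents are fed to the model as context.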
What is a vector database?
- Compactly stores billions of high-dimensional vectors representing words and entities.
- Provides ultra-fast similarity searches across these billions of vectors in real time.
- Uses k-nearest neighbors (k-NN) search to find similar vectors.
- AWS services for vector databases: OpenSearch, the pgvector extension in RDS, Kendra.
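The k-NN search in the bullets above can be sketched as a brute-force scan; the document IDs and vectors below are made up for illustration. Production vector databases use approximate-NN index structures to avoid scanning billions of vectors.

```python
import heapq
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def knn_search(query: list[float], index: list[tuple], k: int = 2) -> list[tuple]:
    # Return the k entries most similar to the query.
    # Brute force here; real vector DBs use approximate indexes.
    return heapq.nlargest(k, index, key=lambda item: cosine(query, item[1]))

index = [
    ("doc-a", [0.9, 0.1, 0.0]),
    ("doc-b", [0.0, 1.0, 0.0]),
    ("doc-c", [0.7, 0.3, 0.1]),
]
top = knn_search([1.0, 0.0, 0.0], index, k=2)
# doc-a points closest to the query direction, then doc-c.
```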
What are Agents in an AI system?
Agents interact with the environment to perform intermediary operations and coordinate multi-step tasks.
Examples of agents:
A chatbot may have an agent to modify/reset a customer’s password or phone plan
Another agent may send a CSAT survey to the customer when the conversation ends.
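A minimal sketch of the chatbot agents described above, assuming a simple intent-to-tool dispatch; the tool names and the intent string are hypothetical. Real systems typically let the model choose a tool via a function/tool-calling API.

```python
# Hypothetical tools an agent can invoke on behalf of the chatbot.
def reset_password(customer_id: str) -> str:
    return f"Password reset link sent to customer {customer_id}"

def send_csat_survey(customer_id: str) -> str:
    return f"CSAT survey sent to customer {customer_id}"

TOOLS = {
    "reset_password": reset_password,
    "send_csat_survey": send_csat_survey,
}

def run_agent(intent: str, customer_id: str) -> str:
    # The agent coordinates the action the model decided on; if no
    # tool matches, the chatbot answers conversationally instead.
    tool = TOOLS.get(intent)
    if tool is None:
        return "No tool available; answer conversationally instead."
    return tool(customer_id)
```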
How do you evaluate a Gen AI system?
- Human Evaluation - evaluates user experience, contextual appropriateness, creativity, and flexibility.
- Benchmark datasets - a quantitative way to evaluate generative AI models (e.g. Accuracy, Speed, Scalability).
What is involved in creating a benchmark dataset?
SMEs have to do this manually.
They create intelligent questions.
Then they craft answers for them.
These datasets are then used to judge the performance of the model.
A “judge model” can be used to automate this process: it takes the output of the model under evaluation, compares it to the benchmark dataset created by the SMEs, and issues a grading score.
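The judge-model loop above can be sketched as follows. The `judge` function here is a trivial token-overlap grader standing in for a real judge model (which would itself be an LLM prompted to compare answers); the benchmark item is made up.

```python
# Hypothetical SME-written benchmark: questions with reference answers.
benchmark = [
    {"question": "What does RAG stand for?",
     "reference": "retrieval augmented generation"},
]

def judge(model_answer: str, reference_answer: str) -> float:
    # Toy grader: fraction of reference tokens present in the answer.
    # A real judge model would score semantic agreement, not overlap.
    model_tokens = set(model_answer.lower().split())
    ref_tokens = set(reference_answer.lower().split())
    if not ref_tokens:
        return 0.0
    return len(model_tokens & ref_tokens) / len(ref_tokens)

def evaluate(model_fn, benchmark) -> float:
    # Run the model under evaluation over the benchmark, average scores.
    scores = [judge(model_fn(item["question"]), item["reference"])
              for item in benchmark]
    return sum(scores) / len(scores)
```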
What are the benefits of fine tuning?
- Increases specificity
- Improves accuracy
- Reduces biases
- Boosts efficiency
What are the different types of FT?
- Instruction tuning - involves retraining the model on a new dataset that consists of prompts followed by the desired outputs.
- Reinforcement learning from human feedback (RLHF): uses a reward model based on human feedback.
- Adapting models for specific domains - e.g. legal or healthcare
- Continuous pre-training - the initial training phase is extended to keep the model current.
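To make the instruction-tuning bullet concrete: the training data consists of prompt/desired-output pairs, often stored one JSON object per line (JSONL). The record below is a made-up example of that layout.

```python
import json

# Hypothetical instruction-tuning record: a prompt paired with the
# desired output, as described above.
record = {
    "prompt": "Summarize the customer's billing complaint in one sentence.",
    "completion": "The customer was charged twice for the March invoice.",
}

# One JSON object per line (JSONL) is a common dataset layout.
line = json.dumps(record)
```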
What are the key steps in data preparation for fine tuning?
- Data curation - more rigorous than for the base FM: high-impact, highly relevant, labeled data.
- Labeling - accurate labeling is essential
- Governance and compliance - specialized data to be handled with care
- Bias checking - ensure data is balanced and does not introduce any new bias.
- Feedback integration - integrating user feedback back into the training process.
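As a concrete instance of the bias-checking step, a simple class-balance check over the curated labels can flag underrepresented classes; the labels and threshold below are made up for illustration.

```python
from collections import Counter

def label_balance(examples: list[dict], threshold: float = 0.2):
    # Compute each label's share of the dataset and flag any label
    # whose share falls below the threshold (possible imbalance).
    counts = Counter(ex["label"] for ex in examples)
    total = sum(counts.values())
    ratios = {label: n / total for label, n in counts.items()}
    flagged = [label for label, r in ratios.items() if r < threshold]
    return ratios, flagged

# Toy dataset: 9 "approve" examples vs 1 "deny" example.
data = [{"label": "approve"}] * 9 + [{"label": "deny"}]
ratios, flagged = label_balance(data)
# "deny" makes up only 10% of the data, so it gets flagged.
```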
What are a few standard metrics for evaluating LLMs?
- ROUGE - evaluates automatic summarization of texts, as well as machine translation quality in NLP; measures the overlap of unigrams, bigrams, and longer n-grams between machine-generated text and human-written reference text (effectively ensures completeness of information).
- BLEU - evaluates the quality of text that has been machine-translated from one natural language to another (accurate inclusion of critical features).
- BERTScore - evaluates the quality of text-generation tasks; measures the cosine similarity between embeddings of the generated and reference texts, effectively estimating semantic appropriateness.
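As a worked example of the n-gram overlap that ROUGE measures, here is a minimal sketch of ROUGE-1 recall (the unigram case): the fraction of reference tokens that also appear in the generated text. Full implementations (e.g. the `rouge-score` library) compute the whole metric family.

```python
from collections import Counter

def rouge1_recall(generated: str, reference: str) -> float:
    # ROUGE-1 recall: clipped unigram overlap divided by the number
    # of tokens in the reference text.
    gen = Counter(generated.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(gen[t], ref[t]) for t in ref)
    total = sum(ref.values())
    return overlap / total if total else 0.0

score = rouge1_recall("the cat sat on the mat",
                      "the cat lay on the mat")
# 5 of the 6 reference tokens appear in the generation ("lay" is missing).
```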