Retrieval Augmented Generation Flashcards
GPT: Define temperature
Higher temperature -> more randomness
Temperature 0 -> deterministic output
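A minimal sketch of how temperature reshapes the next-token distribution (the function name is my own, not any library's API):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide the logits by the temperature before the softmax.
    T -> 0 concentrates all mass on the argmax (deterministic);
    large T flattens the distribution (more randomness)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

With logits [2.0, 1.0, 0.5], T=0.01 yields an almost one-hot distribution, while T=100 is nearly uniform.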
The idea behind RAG
4
- More precise with respect to private data
- More up-to-date information
- Fewer hallucinations
- Cheaper than fine-tuning
GPT: Define top_p
Nucleus sampling: consider only the smallest set of highest-probability tokens whose cumulative probability reaches top_p when predicting the next token
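The top_p cutoff can be sketched as a toy filter over an already-computed probability distribution (`top_p_filter` is a hypothetical helper name):

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability
    reaches top_p, then renormalize; the next token is sampled from
    this reduced set only."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}
```

For probs [0.5, 0.3, 0.15, 0.05] and top_p=0.9, only the first three tokens survive.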
3 building blocks of the RAG architecture
- Indexing
- Retrieval
- Generation
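The three blocks can be sketched end-to-end with toy stand-ins (word overlap instead of embedding similarity, a prompt string instead of a real LLM call; all names are hypothetical):

```python
def retrieve(query, chunks, k=2):
    """Retrieval: rank the indexed chunks by word overlap with the query
    (a toy stand-in for embedding similarity over a vector index)."""
    q = set(query.lower().split())
    return sorted(chunks, key=lambda c: len(q & set(c.lower().split())), reverse=True)[:k]

def build_prompt(query, chunks):
    """Generation input: ground the LLM prompt in the retrieved chunks."""
    context = "\n".join(retrieve(query, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

Indexing happens offline (chunking plus embedding into a vector DB); here the "index" is just the chunk list.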
Challenges building RAG systems
5
- Prompt Design (Experimentation)
- Grounding & Accuracy (Evaluation)
- Privacy, Security & Compliance (Data may be sensitive)
- Performance (Quality vs Quantity, Inference Time)
- Integration & Adoption (UX needs feedback)
Define chunking/splitting, name problems, name techniques
Split a document for better indexing of the individual segments
-> Larger texts are harder to compare than small texts
Problems:
Which splitting method?
Which chunk size, separator, or overlap?
Techniques:
TokenSplitting
FixedSplitting
DocumentSpecificSplitting (HTML, JSON, Markup)
Chunking - Query Extension (extend the chunk by the rest of the document)
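FixedSplitting with overlap can be sketched as a toy character-window splitter (real splitters usually work on tokens or separators):

```python
def fixed_split(text, chunk_size, overlap):
    """FixedSplitting sketch: windows of length chunk_size, with
    consecutive windows sharing `overlap` characters so that context
    at chunk boundaries is not lost."""
    step = chunk_size - overlap  # assumes overlap < chunk_size
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

For "abcdefghij" with chunk_size=4 and overlap=2, this yields "abcd", "cdef", "efgh", ... with two shared characters per boundary.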
Problem and approach for finding an embedding model
Problems:
- Different goals for “semantic encoding”
- Unclear which model is best
- Embedding is time-consuming
Approach:
- Benchmarks (HF MTEB)
- Evaluations
Problem and approach for finding a vector DB
Problems:
- Difference in Functionality
- Difference in Storage Efficiency
- Which one is best suited?
Approach:
- Pre-select by required functionality and ease of implementation
- Benchmark them
- Consider Scaling and Metadata Storage
What are key capabilities of vector DBs?
4
Vector Indexing: Pre-processing of vectors to speed up distance computations
Inverted Indexing: Fast full-text and keyword search on raw text data (by mapping contents to their locations in the database)
Vector Quantization: Compress the original vectors into a more compact representation
Search Techniques: Dense, Sparse, Hybrid
Name the two types of quantization and describe them
Scalar Quantization
- Float64 to Float32, f16, or int8
Product Quantization
- Split each vector into M subvectors
- Perform clustering on each subvector space
- Assign each subvector to its nearest centroid
- Replace the centroid values with an ID
- Store only the IDs of the subvectors' centroids
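The product-quantization steps can be sketched as follows (assuming the per-subvector centroids were already trained, e.g. by k-means; names are illustrative):

```python
def product_quantize(vector, centroids_per_sub):
    """PQ sketch: split the vector into M subvectors, assign each to its
    nearest centroid, and store only the M centroid IDs.
    `centroids_per_sub[m]` lists the candidate centroids for subvector m;
    in practice these come from k-means on training data."""
    M = len(centroids_per_sub)
    d = len(vector) // M  # subvector dimension
    ids = []
    for m in range(M):
        sub = vector[m * d:(m + 1) * d]
        # nearest-centroid assignment by squared Euclidean distance
        best = min(range(len(centroids_per_sub[m])),
                   key=lambda c: sum((a - b) ** 2
                                     for a, b in zip(sub, centroids_per_sub[m][c])))
        ids.append(best)
    return ids  # the compressed code: M small integers instead of the full vector
```

A 4-dimensional float vector compressed with M=2 is stored as just two centroid IDs.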
Describe Approximate Nearest Neighbor (ANN)
KNN does not scale
ANN:
- Pre-compute distances between vectors to organize and store similar vectors in clusters
- Search only within a cluster
-> Hierarchical Navigable Small World (HNSW)
What is document composition? Name the underlying problem and the solutions within document composition
8
Problem: Too many documents result in irrelevant retrievals
Solutions:
- As much metadata as possible
- Remove duplicates and irrelevant text
- Standardize text (e.g. language)
- Separate indices for different topics
- Reranking
- Dynamic thresholding
- Text summarization
- Diversity Ranker
What are the 3 problems with positioning documents in the query?
Problem 1:
Positioning matters: the middle part of a query is less relevant to the LLM
Adding random documents improved accuracy by 36%
Problem 2:
Retrieved nodes are very similar: redundant and barely relevant
Problem 3:
A top_k cutoff truncates relevant documents; there is no fixed good value for it
What is a Diversity Ranker?
Try to keep diverse statements for better information coverage
- Compute similarity scores of the retrieved docs
- Diversity score: the (negated) sum of a document's pairwise similarities with all other documents
- Keep only the documents with the highest diversity scores
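The diversity scoring described above can be sketched as follows (assuming cosine similarity over the retrieved docs' embeddings; function names are my own):

```python
def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def diversity_rank(embs, keep):
    """Score each doc by the negated sum of its pairwise similarities to
    all other retrieved docs (less similar to the rest = more diverse),
    then keep the `keep` most diverse document indices."""
    scores = []
    for i, e in enumerate(embs):
        sim_sum = sum(cosine(e, other) for j, other in enumerate(embs) if j != i)
        scores.append((-sim_sum, i))
    return [i for _, i in sorted(scores, reverse=True)[:keep]]
```

With two near-duplicate embeddings and one dissimilar one, the dissimilar document ranks first.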
Name and describe 7 reranker strategies
LM-based agents: Use LLMs to score the relevance of the documents according to the user query
Ensemble models: Use multiple language models or algorithms and combine their strengths.
Contextual reranking: Include contextual information, such as preferences and interaction history for reranking
Query expansion: Modify or extend the user query to better capture its intent (e.g., using synonyms, paraphrases, etc.)
Feature-based reranking: Use features, such as term frequency, document length, and entity overlap to score the docs
Learning to rerank: Train a model to predict the most relevant documents given the user query
User feedback: Use user feedback (e.g., likes and ratings) to consider their preferences during reranking
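As one example, feature-based reranking can be sketched with two toy features (term overlap with the query and a document-length penalty; both feature choices are illustrative):

```python
def feature_rerank(query, docs):
    """Feature-based reranking sketch: score each doc by hand-crafted
    features and sort the documents by descending score."""
    q_terms = set(query.lower().split())
    def score(doc):
        terms = doc.lower().split()
        overlap = sum(1 for t in terms if t in q_terms)  # term-overlap feature
        length_penalty = 1.0 / (1.0 + len(terms) / 100)  # mildly prefer shorter docs
        return overlap * length_penalty
    return sorted(docs, key=score, reverse=True)
```

A learning-to-rerank model would replace these hand-crafted scores with weights trained on relevance labels.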