Optimizing Foundation Models Flashcards
Embedding is the process by which
text, images, and audio are given numerical representations in a vector space.
Embedding is usually performed by
a machine learning (ML) model.
Enterprise datasets, such as documents, images, and audio, are passed to ML models as tokens and vectorized. The resulting vectors in an n-dimensional space, along with metadata about them, are stored in purpose-built vector databases for fast retrieval.
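Here is a minimal sketch of that first step, assuming the sentence-transformers library and the all-MiniLM-L6-v2 model (both are illustrative choices; any embedding model works):

```python
# A sketch of embedding documents, assuming the sentence-transformers
# library and the all-MiniLM-L6-v2 model (both illustrative choices).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = ["Quarterly sales report", "Customer support transcript"]
# encode() tokenizes each input and returns one fixed-size vector per input
vectors = model.encode(documents)

print(vectors.shape)  # (2, 384): two documents, 384 dimensions each
```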
Two words that relate to each other will have similar
embeddings.
Here is an example with two words: sea and ocean. Their embeddings are randomly initialized, so early in training they are dissimilar. As the training progresses, their embeddings become more
similar, because the two words often appear close to each other and in similar contexts.
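A quick way to see this, reusing the same illustrative embedding model as above:

```python
# Comparing embeddings of related and unrelated words, reusing the same
# illustrative model as above.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")
sea, ocean, car = model.encode(["sea", "ocean", "car"])

def cosine(a, b):
    # 1.0 means the vectors point in the same direction; 0.0 means unrelated
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(sea, ocean))  # high: "sea" and "ocean" share contexts
print(cosine(sea, car))    # noticeably lower
```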
The core function of vector databases is to
compactly store billions of high-dimensional vectors representing words and entities. Vector databases provide ultra-fast similarity searches across these billions of vectors in real time.
The most common algorithms used to perform the similarity search are
k-nearest neighbors (k-NN)
cosine similarity.
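A brute-force sketch of a k-NN similarity search scored with cosine similarity follows; real vector databases use approximate indexes (for example, HNSW) to stay fast across billions of vectors:

```python
# Brute-force k-NN search with cosine similarity over an in-memory array,
# sketching what a vector database does at far larger scale (production
# systems use approximate indexes such as HNSW instead of a full scan).
import numpy as np

rng = np.random.default_rng(0)
database = rng.normal(size=(1000, 384))  # 1,000 stored 384-d embeddings
query = rng.normal(size=384)

# Normalize so a dot product equals cosine similarity
database /= np.linalg.norm(database, axis=1, keepdims=True)
query /= np.linalg.norm(query)

scores = database @ query                # similarity of query to every vector
top_k = np.argsort(scores)[::-1][:5]     # indices of the 5 nearest neighbors
print(top_k, scores[top_k])
```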
Agents - Intermediary operations
Agents can act as intermediaries, facilitating communication between the generative AI model and various backend systems. The generative AI model handles language understanding and response generation, while the backend systems include databases, CRM platforms, and service management tools.
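A minimal sketch of this routing role follows; the tool-call format, backend function, and names are all hypothetical rather than a specific framework's API:

```python
# A sketch of an agent routing the model's structured tool call to a backend
# system and returning the result. The tool-call format, backend function,
# and names are hypothetical, not a specific framework's API.

def lookup_customer(customer_id: str) -> dict:
    # Stand-in for a real CRM query
    return {"id": customer_id, "plan": "premium", "open_tickets": 2}

BACKENDS = {"crm.lookup_customer": lookup_customer}

def route_tool_call(tool_call: dict) -> dict:
    # The model decides *what* is needed; the agent dispatches *how*,
    # then hands the result back to the model for response generation.
    handler = BACKENDS[tool_call["name"]]
    return handler(**tool_call["arguments"])

# A structured request the model might emit after understanding the user
result = route_tool_call(
    {"name": "crm.lookup_customer", "arguments": {"customer_id": "C-42"}}
)
print(result)  # fed back to the model so it can phrase the answer
```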
Agents - Action launch
Agents can be used to run a wide variety of tasks, such as adjusting service settings, processing transactions, and retrieving documents. These actions are based on the user's specific needs as understood by the generative AI model.
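A sketch of launching an action from a parsed user intent; the intent schema and both handlers are hypothetical placeholders:

```python
# A sketch of launching an action from a parsed user intent. The intent
# schema and both handlers are hypothetical placeholders.

def adjust_service_setting(setting: str, value: str) -> str:
    return f"Setting '{setting}' updated to '{value}'"

def process_transaction(amount: float, account: str) -> str:
    return f"Charged {amount:.2f} to account {account}"

ACTIONS = {
    "adjust_setting": adjust_service_setting,
    "process_transaction": process_transaction,
}

# Suppose the model parsed "upgrade my plan to premium" into this intent
intent = {"action": "adjust_setting",
          "parameters": {"setting": "plan", "value": "premium"}}

print(ACTIONS[intent["action"]](**intent["parameters"]))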
Agents - Feedback integration
Agents can also contribute to the AI system’s learning process by collecting data on the outcomes of their actions. This feedback helps refine the AI model, enhancing its accuracy and effectiveness in future interactions.
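A sketch of what collecting that feedback might look like; the record schema and file name are assumptions:

```python
# A sketch of recording action outcomes for later model refinement.
# The record schema and file name are assumptions.
import json
import time

def record_outcome(action: str, arguments: dict, outcome: str, success: bool):
    record = {
        "timestamp": time.time(),
        "action": action,
        "arguments": arguments,
        "outcome": outcome,
        "success": success,  # usable as a label for evaluation or tuning
    }
    with open("agent_feedback.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")

record_outcome("process_transaction", {"amount": 19.99}, "completed", True)
```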
Human evaluation involves real users interacting with the AI model to provide feedback based on their experience. This method is particularly valuable for assessing qualitative aspects of the model, such as the following:
User experience: How intuitive and satisfying is the interaction with the model from the user's perspective?
Contextual appropriateness: Does the model respond in a way that is contextually relevant and sensitive to the nuances of human communication?
Creativity and flexibility: How well does the model handle unexpected queries or complex scenarios that require a nuanced understanding?
Human evaluation is often used for iterative improvements and for tuning the model to better meet user expectations.
Benchmark datasets, on the other hand, provide a quantitative way to evaluate generative AI models. They pair predefined test data with associated metrics, offering a consistent, objective means to measure model performance, such as the following:
Accuracy
Speed and efficiency
Scalability
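A sketch of scoring a model on two of these metrics, accuracy and speed, against a tiny hypothetical benchmark; the generate() stub stands in for the model under evaluation:

```python
# Scoring a model on two of the metrics above, accuracy and speed, against
# a tiny hypothetical benchmark. generate() is a stub standing in for the
# model under evaluation.
import time

benchmark = [
    {"prompt": "Capital of France?", "expected": "Paris"},
    {"prompt": "2 + 2 =", "expected": "4"},
]

def generate(prompt: str) -> str:
    return {"Capital of France?": "Paris", "2 + 2 =": "4"}[prompt]

correct = 0
start = time.perf_counter()
for item in benchmark:
    if generate(item["prompt"]).strip() == item["expected"]:
        correct += 1
elapsed = time.perf_counter() - start

print(f"accuracy: {correct / len(benchmark):.2f}")
print(f"avg latency: {elapsed / len(benchmark):.4f}s")
```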
Creating a benchmark dataset is a
manual process that is necessary to properly evaluate the performance of LLMs in RAG systems.
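For illustration, a hand-authored entry for a RAG benchmark might look like this; the field names are assumptions, and teams define their own schemas:

```python
# An illustrative hand-authored entry for a RAG benchmark. Field names are
# assumptions; teams define their own schemas.
benchmark_entry = {
    "question": "What is the standard warranty period?",
    "reference_answer": "Two years from the date of purchase.",
    # The document a correct retrieval step should surface
    "relevant_sources": ["warranty_policy.pdf"],
}
```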
In practice, a combination of
both human evaluation and benchmark datasets is often used to provide a comprehensive overview of a model’s performance.
LLM as a judge
Evaluation of LLM performance against a benchmark dataset can be automated using this approach: a second LLM scores the evaluated model's responses against the reference answers.
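A sketch of the pattern; the call_llm() stub is a hypothetical stand-in for however the judge model is actually served:

```python
# A sketch of LLM-as-a-judge scoring. call_llm() is a hypothetical stub
# standing in for however the judge model is actually served.

JUDGE_PROMPT = """You are grading a model's answer against a reference.
Question: {question}
Reference answer: {reference}
Model answer: {answer}
Reply with a single integer score from 1 (wrong) to 5 (fully correct)."""

def call_llm(prompt: str) -> str:
    raise NotImplementedError("Replace with a call to the judge model")

def judge(question: str, reference: str, answer: str) -> int:
    reply = call_llm(JUDGE_PROMPT.format(
        question=question, reference=reference, answer=answer))
    return int(reply.strip())
```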