LLM Terms Flashcards
Large Language Model (LLM)
A type of neural network model designed to understand and generate human-like text by learning from vast amounts of language data.
Transformer Architecture
A deep learning model architecture that uses self-attention mechanisms to process input data in parallel, making it the foundation of modern LLMs like GPT and BERT.
Self-Attention
A mechanism that enables the model to focus on different parts of the input sequence, assigning importance scores to each token relative to others in the context.
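A minimal sketch of single-head scaled dot-product self-attention in NumPy; the random matrices stand in for a real model's learned query/key/value projections.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                    # embedding dimension
x = rng.normal(size=(4, d))              # 4 tokens, d-dim embeddings

# Random stand-ins for the learned query/key/value projections.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = x @ Wq, x @ Wk, x @ Wv

scores = Q @ K.T / np.sqrt(d)            # importance of each token to every other
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
output = weights @ V                     # attention-weighted mix of value vectors
print(output.shape)                      # (4, 8)
```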
Context Length
The maximum number of tokens an LLM can process in a single input. Longer context lengths allow the model to capture more extensive dependencies in the input text.
Tokenization
The process of converting text into smaller units (tokens) that the LLM can process, such as words, subwords, or characters.
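For illustration, OpenAI's tiktoken library exposes the BPE tokenizer its models use; other LLMs ship their own tokenizers, so the exact splits and IDs vary by model.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
ids = enc.encode("Tokenization splits text into model-readable units.")
print(ids)                              # integer token IDs
print([enc.decode([i]) for i in ids])   # the subword each ID maps to
```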
Subword Tokenization
A method of breaking words into smaller units (subwords), enabling the model to handle out-of-vocabulary words and morphological variations.
Byte-Pair Encoding (BPE)
A tokenization technique that iteratively merges the most frequent pairs of characters or character sequences to create subword tokens.
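A toy sketch of the BPE training loop: count adjacent symbol pairs across the corpus and merge the most frequent pair, repeating for a fixed number of merges. Real implementations work over bytes and far larger corpora, but the merge rule is the same.

```python
from collections import Counter

# Words represented as symbol tuples, with corpus frequencies.
vocab = {("l", "o", "w"): 5, ("l", "o", "w", "e", "r"): 2, ("n", "e", "w"): 6}

def most_frequent_pair(vocab):
    pairs = Counter()
    for word, freq in vocab.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge(vocab, pair):
    merged = {}
    for word, freq in vocab.items():
        out, i = [], 0
        while i < len(word):
            if word[i:i + 2] == pair:          # merge the chosen pair
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

for _ in range(3):
    pair = most_frequent_pair(vocab)
    vocab = merge(vocab, pair)
    print("merged", pair, "->", list(vocab))
```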
Prompt Engineering
The practice of crafting input prompts to guide the LLM in generating specific, desired outputs, essential for steering model behavior in applications.
Few-Shot Learning
A model’s ability to learn and generalize from only a few examples provided in the prompt, enabling the generation of contextually relevant responses.
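A sketch of what a few-shot prompt might look like; the task and format are illustrative and not tied to any particular provider's API.

```python
# Two worked examples, then the new input the model should complete
# in the same pattern.
few_shot_prompt = """\
Classify the sentiment of each review as positive or negative.

Review: "The battery lasts all day and the screen is gorgeous."
Sentiment: positive

Review: "It stopped working after a week."
Sentiment: negative

Review: "Setup was painless and support was friendly."
Sentiment:"""

print(few_shot_prompt)  # send this string to any LLM completion endpoint
```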
Zero-Shot Learning
The capability of an LLM to perform a task without explicit examples in the prompt, relying instead on its pre-trained knowledge.
Fine-Tuning
The process of training a pre-trained LLM on a smaller, task-specific dataset to adapt the model for specific applications.
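One common route is the Hugging Face Trainer API; a minimal sketch, assuming a small sentiment-classification task. The model name, dataset, and hyperparameters here are examples, not recommendations.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

dataset = load_dataset("imdb")          # example task-specific dataset

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(1000)),
)
trainer.train()                         # updates the pre-trained weights
```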
Retrieval-Augmented Generation (RAG)
A technique combining information retrieval and LLMs, where relevant documents are retrieved and provided as context to the model during inference.
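A minimal RAG sketch: embed the query, retrieve the most similar documents, and prepend them to the prompt. `embed` and `generate` are hypothetical stand-ins for whatever embedding model and LLM you use.

```python
import numpy as np

def embed(text: str) -> np.ndarray:     # hypothetical: any embedding model
    raise NotImplementedError

def generate(prompt: str) -> str:       # hypothetical: any LLM call
    raise NotImplementedError

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rag_answer(question, documents, k=3):
    q = embed(question)
    # Retrieve the k documents most similar to the question.
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    context = "\n\n".join(ranked[:k])
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)             # answer grounded in retrieved context
```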
Knowledge Base
A structured repository of information used to provide factual data for an LLM, enhancing its ability to generate accurate and contextually appropriate responses.
Semantic Search
A search method that leverages the meaning of words and their relationships to find relevant documents or information, often used in conjunction with LLMs.
Embeddings
Dense vector representations of words, sentences, or documents that capture their semantic meanings and are used for tasks like semantic search in LLM applications.
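A small sketch of how embeddings power semantic search: hand-made 3-dimensional vectors stand in for real model embeddings (which have hundreds or thousands of dimensions), and cosine similarity ranks the documents.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings; in practice an embedding model produces these vectors.
docs = {
    "cat sat on the mat": np.array([0.9, 0.1, 0.0]),
    "kitten on a rug":    np.array([0.8, 0.2, 0.1]),
    "stock prices fell":  np.array([0.0, 0.1, 0.9]),
}
query = np.array([0.85, 0.15, 0.05])    # e.g. embedding of "cat on a rug"

for text, vec in sorted(docs.items(), key=lambda kv: cosine(query, kv[1]),
                        reverse=True):
    print(f"{cosine(query, vec):.3f}  {text}")  # most similar first
```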
Memory Management
Techniques for working within an LLM’s finite context length, such as summarizing, truncating, or selectively retrieving earlier parts of a conversation so that long interactions stay coherent.
Temperature
A parameter that controls the randomness of an LLM’s output. Higher values generate more diverse outputs, while lower values make responses more deterministic.
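Temperature rescales the logits before the softmax; a minimal NumPy sketch showing how the same logits yield sharper or flatter distributions.

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    # Dividing logits by T < 1 sharpens the distribution; T > 1 flattens it.
    z = np.asarray(logits) / temperature
    e = np.exp(z - z.max())
    return e / e.sum()

logits = [2.0, 1.0, 0.2]
print(softmax_with_temperature(logits, 0.5))   # peaked, near-deterministic
print(softmax_with_temperature(logits, 1.0))   # standard softmax
print(softmax_with_temperature(logits, 2.0))   # flatter, more random
```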
Top-k Sampling
A decoding method that restricts sampling to the k most likely tokens at each step, renormalizing their probabilities so that low-probability tokens are never drawn.
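A minimal NumPy sketch of top-k sampling over a toy distribution.

```python
import numpy as np

def top_k_sample(probs, k, rng=np.random.default_rng()):
    top = np.argsort(probs)[-k:]             # indices of the k best tokens
    p = probs[top] / probs[top].sum()        # renormalize within the top k
    return int(rng.choice(top, p=p))

probs = np.array([0.5, 0.2, 0.15, 0.1, 0.05])
print(top_k_sample(probs, k=3))              # only token 0, 1, or 2 is drawn
```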
Top-p (Nucleus) Sampling
A sampling method that draws from the smallest set of tokens whose cumulative probability exceeds a threshold p, letting the candidate pool adapt to how peaked or flat the distribution is.
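The same toy distribution sampled with top-p: the candidate pool (the nucleus) is the smallest set of tokens whose cumulative probability reaches p.

```python
import numpy as np

def top_p_sample(probs, p, rng=np.random.default_rng()):
    order = np.argsort(probs)[::-1]            # tokens, most probable first
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, p)) + 1
    nucleus = order[:cutoff]                   # smallest set covering p
    q = probs[nucleus] / probs[nucleus].sum()  # renormalize the nucleus
    return int(rng.choice(nucleus, p=q))

probs = np.array([0.5, 0.2, 0.15, 0.1, 0.05])
print(top_p_sample(probs, p=0.8))              # nucleus here is tokens 0, 1, 2
```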
Beam Search
A decoding strategy that explores multiple candidate sequences in parallel during text generation, keeping only the k most probable partial sequences (the beam) at each step.
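A compact beam-search sketch over a toy next-token function; `step_fn` is a stand-in for querying an LLM's next-token probabilities.

```python
import math

def beam_search(step_fn, beam_width=3, max_len=5):
    # Each beam is a (sequence, log-probability) pair.
    beams = [([], 0.0)]
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for token, prob in step_fn(seq):
                candidates.append((seq + [token], score + math.log(prob)))
        # Keep only the beam_width most probable partial sequences.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

# Toy model: always offers the same three tokens with fixed probabilities.
toy = lambda seq: [("a", 0.5), ("b", 0.3), ("c", 0.2)]
for seq, score in beam_search(toy, beam_width=2, max_len=3):
    print("".join(seq), f"log-prob={score:.2f}")
```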