Book Flashcards
What does LLM stand for?
Large Language Model
What are some common applications of LLMs?
- Machine translation
- Generating novel texts
- Sentiment analysis
- Text summarization
- Content creation (e.g., writing fiction, articles, code)
- Powering chatbots and virtual assistants
- Knowledge retrieval
What are the three main stages of building and using LLMs?
- Implementing the LLM architecture and data preparation process
- Pretraining the LLM to create a foundation model
- Fine-tuning the foundation model to become a personal assistant or text classifier
What is the core architecture used in many modern LLMs?
The transformer architecture
What are the two submodules of the transformer architecture?
Encoder and decoder
What is the purpose of word embeddings in LLMs?
To represent words as continuous-valued vectors that can be processed by the model
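A minimal PyTorch sketch of an embedding layer; the vocabulary size and embedding dimension are GPT-2-style values used for illustration, and the token IDs are made up:

```python
import torch

vocab_size, embed_dim = 50257, 768           # GPT-2-style sizes (illustrative)
embedding = torch.nn.Embedding(vocab_size, embed_dim)

token_ids = torch.tensor([15496, 995])       # hypothetical token IDs
vectors = embedding(token_ids)               # shape: (2, 768), one vector per token
```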
What is tokenization?
The process of splitting text into individual units (tokens), which can be words or subword units
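A simple regex-based tokenizer sketch that splits on whitespace and punctuation; real LLM tokenizers use subword schemes such as BPE (next card):

```python
import re

text = "Hello, world. Is this-- a test?"
# split on punctuation and whitespace, keeping punctuation as its own tokens
tokens = [t for t in re.split(r'([,.:;?_!"()\']|--|\s)', text) if t.strip()]
print(tokens)
# ['Hello', ',', 'world', '.', 'Is', 'this', '--', 'a', 'test', '?']
```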
What is byte pair encoding (BPE)?
A tokenization scheme that iteratively merges frequent character pairs into subword units
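For example, GPT-2's BPE tokenizer is available through OpenAI's tiktoken library; a short sketch:

```python
import tiktoken

tokenizer = tiktoken.get_encoding("gpt2")    # GPT-2's BPE vocabulary (50,257 tokens)
ids = tokenizer.encode("Unfamiliarwords get split into subword units")
print(ids)                                   # a list of integer token IDs
print(tokenizer.decode(ids))                 # round-trips back to the original text
```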
How are training examples sampled for LLM training?
Using a sliding window over the tokenized text to create input-target pairs, where each target is the input shifted by one token
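A minimal sketch of the sliding-window sampling, using integer stand-ins for token IDs (the context length and stride of 1 are illustrative):

```python
token_ids = list(range(20))                  # stand-in for a tokenized text
context_length = 4                           # illustrative value

inputs, targets = [], []
for i in range(len(token_ids) - context_length):
    inputs.append(token_ids[i : i + context_length])
    targets.append(token_ids[i + 1 : i + context_length + 1])  # shifted by one

print(inputs[0], "->", targets[0])           # [0, 1, 2, 3] -> [1, 2, 3, 4]
```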
What is the purpose of the attention mechanism in LLMs?
To allow the model to weigh the importance of different parts of the input sequence when making predictions
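A simplified self-attention sketch without learned query/key/value projections, just to show the weighting idea:

```python
import torch

x = torch.randn(3, 4)                        # 3 token embeddings, 4 dimensions each
scores = x @ x.T                             # pairwise similarity scores
weights = torch.softmax(scores / x.shape[-1] ** 0.5, dim=-1)  # rows sum to 1
context = weights @ x                        # each row: weighted mix of all tokens
```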
What is the difference between causal attention and self-attention?
Causal attention masks future tokens so the model can attend only to past and current positions, whereas standard (unmasked) self-attention lets every token attend to all tokens in the sequence
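A minimal sketch of causal masking: future positions are set to -inf before the softmax so they receive zero attention weight:

```python
import torch

scores = torch.randn(4, 4)                   # hypothetical attention scores, 4 tokens
mask = torch.triu(torch.ones(4, 4, dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(mask, float("-inf"))  # hide future tokens
weights = torch.softmax(scores, dim=-1)      # row i attends only to tokens 0..i
```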
What is multi-head attention?
An extension of self-attention that runs multiple attention heads in parallel, each with its own learned projections, so the model can attend to different parts of the input sequence simultaneously
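A minimal PyTorch sketch of multi-head attention; the class and parameter names (d_model, num_heads) are illustrative, and causal masking is omitted for brevity:

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model, num_heads):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = d_model // num_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)  # query/key/value projections
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):                    # x: (batch, tokens, d_model)
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # split the embedding dimension across heads so they attend in parallel
        q, k, v = (z.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
                   for z in (q, k, v))
        weights = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        context = (weights @ v).transpose(1, 2).reshape(b, t, d)
        return self.out(context)             # recombine the heads' outputs
```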
What are the main components of a transformer block in an LLM?
- Multi-head attention module
- Feedforward network
- Layer normalization
- Shortcut connections
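A minimal sketch of a GPT-style (pre-LayerNorm) transformer block wiring these components together; MultiHeadAttention is assumed to be defined as in the earlier sketch:

```python
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, d_model, num_heads):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = MultiHeadAttention(d_model, num_heads)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(             # feedforward network with 4x expansion
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        x = x + self.attn(self.norm1(x))     # shortcut connection around attention
        x = x + self.ff(self.norm2(x))       # shortcut connection around feedforward
        return x
```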
What is pretraining in the context of LLMs?
Training the LLM on a large, unlabeled text dataset to develop a general understanding of language
What is fine-tuning in the context of LLMs?
Further training the pretrained LLM on a smaller, labeled dataset to adapt it to a specific task
What are the two most common types of fine-tuning for LLMs?
Instruction fine-tuning and classification fine-tuning
What is the purpose of layer normalization in LLMs?
To stabilize training and improve performance by normalizing the activations of each layer
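A sketch of what layer normalization computes, done manually (PyTorch's nn.LayerNorm adds learnable scale and shift parameters on top of this):

```python
import torch

x = torch.randn(2, 5)                        # 2 examples, 5 activations each
mean = x.mean(dim=-1, keepdim=True)
var = x.var(dim=-1, keepdim=True, unbiased=False)
x_norm = (x - mean) / torch.sqrt(var + 1e-5) # zero mean, unit variance per row
```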
What is the purpose of shortcut connections in LLMs?
To improve training of deep networks by allowing gradients to flow more easily through the layers
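A minimal sketch of the pattern: the layer's output is added to its input, so gradients have a direct identity path during backpropagation (the wrapper name is illustrative, not from the book):

```python
import torch.nn as nn

class ShortcutWrapper(nn.Module):
    def __init__(self, layer):
        super().__init__()
        self.layer = layer

    def forward(self, x):
        return x + self.layer(x)             # identity path plus transformed path
```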
What is the GELU activation function?
A smooth, non-monotonic activation function that has been found to be effective in LLMs
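A sketch of the tanh approximation of GELU used in GPT-2 (PyTorch also ships it as nn.GELU):

```python
import math
import torch

def gelu(x):
    # tanh approximation: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return 0.5 * x * (1 + torch.tanh(
        math.sqrt(2 / math.pi) * (x + 0.044715 * x ** 3)
    ))
```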
What is the purpose of the softmax function in text generation?
To convert model outputs (logits) into a probability distribution over the vocabulary
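A minimal sketch with made-up logits for a 3-token vocabulary:

```python
import torch

logits = torch.tensor([2.0, 1.0, 0.1])       # hypothetical model outputs
probs = torch.softmax(logits, dim=-1)        # non-negative, sums to 1.0
next_token = torch.argmax(probs)             # greedy decoding picks the max
```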
What is temperature scaling in text generation?
A technique for controlling the randomness of generated text by dividing the logits by a temperature value before applying the softmax function
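A sketch with the same made-up logits: temperatures below 1 sharpen the distribution, temperatures above 1 flatten it:

```python
import torch

logits = torch.tensor([2.0, 1.0, 0.1])
for temp in (0.5, 1.0, 2.0):
    probs = torch.softmax(logits / temp, dim=-1)
    print(temp, probs)                       # lower temp -> more peaked distribution
```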
What is top-k sampling in text generation?
A technique that restricts sampling to the k most likely next tokens by masking out all other tokens before sampling
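A minimal sketch with made-up logits: everything below the k-th largest logit is set to -inf so it receives zero probability:

```python
import torch

logits = torch.tensor([2.0, 1.0, 0.1, -1.0, -2.0])
k = 3
top_logits, _ = torch.topk(logits, k)
masked = torch.where(logits < top_logits[-1],
                     torch.tensor(float("-inf")), logits)
probs = torch.softmax(masked, dim=-1)        # only top-k tokens are sampleable
next_token = torch.multinomial(probs, num_samples=1)
```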