Generative AI Flashcards
RLHF
Reinforcement learning from human feedback
PEFT
Parameter-efficient fine-tuning
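As a concrete illustration, here is a minimal PyTorch sketch of one popular PEFT method, LoRA: the pre-trained weights are frozen and only two small low-rank matrices are trained. The layer sizes and rank are arbitrary, and this is a conceptual sketch rather than the `peft` library's implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pre-trained linear layer plus a small trainable low-rank update (LoRA-style)."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # freeze the pre-trained weights
        self.A = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, base.out_features))

    def forward(self, x):
        # output = frozen base projection + trainable low-rank correction
        return self.base(x) + x @ self.A @ self.B

layer = LoRALinear(nn.Linear(512, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)   # 8,192 trainable values vs. 262,656 in the full linear layer
```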
Self-Attention
- In order to predict the next word accurately, models need to be able to see the whole sentence or whole document
- The transformer architecture unlocked this ability
- The model can pay attention to the meaning of all the words it is processing, wherever they appear in the input (hence the paper title "Attention Is All You Need"; a minimal sketch follows)
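A minimal NumPy sketch of scaled dot-product self-attention in the spirit of Attention Is All You Need; the tiny sequence length and dimensions are arbitrary.

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors x."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])             # relevance of every token to every other token
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)   # softmax -> attention weights
    return weights @ V                                  # each output mixes information from the whole sequence

seq_len, d_model = 4, 8
x = np.random.randn(seq_len, d_model)                   # one vector per token
Wq, Wk, Wv = (np.random.randn(d_model, d_model) for _ in range(3))
print(self_attention(x, Wq, Wk, Wv).shape)              # (4, 8)
```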
Multi-headed Self-Attention
- Multiple sets of self-attention weights, or heads, are learned in parallel, independently of each other
- The outputs of the multi-headed attention layers are fed through a feed-forward network to the output of the encoder
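A sketch of the multi-head idea, reusing the `self_attention` function above: several independent sets of projection weights run in parallel, and their outputs are concatenated and projected. The head count and sizes are illustrative.

```python
import numpy as np

def multi_head_attention(x, heads, Wo):
    """Run independent self-attention heads in parallel, then combine their outputs."""
    outputs = [self_attention(x, Wq, Wk, Wv) for Wq, Wk, Wv in heads]
    return np.concatenate(outputs, axis=-1) @ Wo        # concatenated heads -> output projection

n_heads, d_model = 2, 8
d_head = d_model // n_heads                             # each head works in a smaller subspace
heads = [tuple(np.random.randn(d_model, d_head) for _ in range(3)) for _ in range(n_heads)]
Wo = np.random.randn(n_heads * d_head, d_model)         # projects the concatenation back to d_model
print(multi_head_attention(np.random.randn(4, d_model), heads, Wo).shape)   # (4, 8)
```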
How many parameters does a model with general knowledge about the world have?
Hundreds of billions
How much training data do you need for a single task like summarizing dialogue or acting as a customer service agent for a single company?
Often just 500-1,000 examples can result in good performance
Context window
Space available for the prompt
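A tiny illustration of the idea, with a made-up window size and token list: a prompt longer than the context window has to be truncated before inference.

```python
context_window = 4096                      # maximum number of tokens the model can attend to (illustrative)
prompt_tokens = list(range(5000))          # pretend these are token IDs produced by a tokenizer

if len(prompt_tokens) > context_window:
    prompt_tokens = prompt_tokens[-context_window:]   # keep only the most recent tokens

print(len(prompt_tokens))                  # 4096
```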
Inference
- Generating a prediction
- For LLMs, that would be using the model to generate text
Completion
Output of the model
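A hedged sketch of inference producing a completion with the Hugging Face transformers library; the gpt2 checkpoint and generation settings are illustrative choices, not part of the course material.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"                                     # illustrative; any causal LM checkpoint works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "The transformer architecture is important because"
inputs = tokenizer(prompt, return_tensors="pt")         # prompt text -> token IDs

output_ids = model.generate(**inputs, max_new_tokens=30)             # inference
completion = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(completion)                                       # the completion: prompt plus generated text
```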
Entity recognition
Word classification to identify all the people and places
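A hedged example of entity recognition with the transformers token-classification pipeline; the default checkpoint it downloads, and the exact output fields, may vary by version.

```python
from transformers import pipeline

# Word/token classification: label people, places, organizations, etc.
ner = pipeline("ner", aggregation_strategy="simple")
print(ner("Ada Lovelace worked with Charles Babbage in London."))
# roughly: [{'entity_group': 'PER', 'word': 'Ada Lovelace', ...}, ..., {'entity_group': 'LOC', 'word': 'London', ...}]
```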
Foundational models by decreasing number of parameters
PaLM (540B) -> BLOOM (176B) -> GPT-3 (175B) -> LLaMA (65B) -> FLAN-T5 (11B) -> BERT (340M)
RNN
- Recurrent neural networks
- Used by previous generations of language models
What’s so important about the transformer architecture?
- The ability to learn the relevance and context of all the words in a sentence
- It can be scaled efficiently to use multi-core GPUs
- It can process input data in parallel, making use of much larger training datasets
- Dramatically improved performance on natural language tasks over earlier generations of RNNs
Instruction Fine Tuning
Adapting a pre-trained model to a specific task and dataset by training it further on prompt-completion examples that include instructions (an example record is sketched below)
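A sketch of what a single instruction fine-tuning record might look like: an instruction-style prompt paired with the desired completion. The template wording is made up.

```python
# One training record: instruction-style prompt paired with the desired completion
example = {
    "prompt": (
        "Summarize the following conversation.\n\n"
        "Customer: My package never arrived.\n"
        "Agent: I'm sorry, let me check the tracking number.\n\n"
        "Summary:"
    ),
    "completion": " The customer reports a missing package and the agent checks the tracking number.",
}
# During instruction fine-tuning the model is trained to produce `completion` given `prompt`.
```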
RAG
Retrieval Augmented Generation
Knowledge base data is used for the retrieval portion of the solution
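A minimal sketch of the retrieval half of RAG: embed the query, pick the most similar knowledge-base passage, and paste it into the prompt. The `embed` function here is a random stand-in for a real embedding model.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in embedding; a real system would call an embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(16)

knowledge_base = [
    "Refunds are processed within 5 business days.",
    "Our support line is open 9am-5pm on weekdays.",
]
kb_vectors = np.stack([embed(doc) for doc in knowledge_base])

query = "How long do refunds take?"
scores = kb_vectors @ embed(query)                      # similarity of the query to each passage
best_doc = knowledge_base[int(np.argmax(scores))]       # retrieval step

prompt = f"Answer using this context:\n{best_doc}\n\nQuestion: {query}"
# `prompt` is then sent to the LLM for the generation step
```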
Origin of the Transformer Architecture
Attention Is All You Need
What are attention weights?
The model learns the relevance of each word to all other words during training
What are the two distinct parts of the transformer architecture?
Encoder and decoder
Tokenize
- Convert words to numbers, with each number representing a position in a dictionary of all possible words
- There are multiple tokenization methods: token IDs can represent complete words or parts of words
- The same tokenizer used to train the model must also be used when generating text
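A hedged sketch of tokenization using the transformers library (the gpt2 checkpoint is just an example); it shows that token IDs can correspond to whole words or to word pieces, and that the same tokenizer maps IDs back to text.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")       # illustrative checkpoint

text = "Tokenization splits words"
ids = tokenizer.encode(text)                            # words -> positions in the vocabulary
print(ids)                                              # a list of integer token IDs
print(tokenizer.convert_ids_to_tokens(ids))             # some IDs are whole words, some are word pieces
print(tokenizer.decode(ids))                            # the same tokenizer converts IDs back to text
```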
Embedding Layer
- Trainable vector embedding space
- High-dimensional space where each token is represented as a vector and occupies a unique location within that space
- Each token id in the vocabulary is matched to a multi-dimensional vector
- During model training, the vectors learn to encode the meaning and context of individual tokens in the input sequence
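A minimal PyTorch sketch of an embedding layer as a trainable lookup table from token IDs to vectors; the vocabulary size is arbitrary, and 512 dimensions matches the original paper.

```python
import torch
import torch.nn as nn

vocab_size, d_model = 10_000, 512                       # 512 matches Attention Is All You Need
embedding = nn.Embedding(vocab_size, d_model)           # trainable lookup table: token ID -> vector

token_ids = torch.tensor([[5, 42, 7]])                  # a batch containing one 3-token sequence
vectors = embedding(token_ids)                          # each ID pulls its row from the table
print(vectors.shape)                                    # torch.Size([1, 3, 512])
```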
What was the vector size in Attention Is All You Need?
512 dimensions
Positional encoding
Encodes each token's position in the sentence/document so word-order information is preserved while tokens are processed in parallel (see the sketch below)
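A sketch of the sinusoidal positional encoding from Attention Is All You Need: each position gets a fixed vector of sines and cosines at different frequencies, which is added to the token embedding.

```python
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / np.power(10000, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                        # even dimensions
    pe[:, 1::2] = np.cos(angles)                        # odd dimensions
    return pe

print(positional_encoding(seq_len=4, d_model=8).shape)  # (4, 8); added to the token embeddings
```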
What is passed into the encoder/decoder?
- Token vectors and positional encoding
- Processed in parallel
What is passed to the self-attention layer?
- The token vectors and positional encodings, summed into a single vector per token
- The model analyzes the relationships between the tokens in your input sequence
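Tying the sketches above together: the token vectors and positional encodings are summed into one vector per token, and that combined matrix is what the self-attention layer processes in parallel (toy shapes again, reusing `positional_encoding` from the sketch above).

```python
import numpy as np

seq_len, d_model = 4, 8
token_vectors = np.random.randn(seq_len, d_model)            # from the embedding layer
position_vectors = positional_encoding(seq_len, d_model)     # from the sketch above

x = token_vectors + position_vectors    # one combined vector per token
# x is the input to the self-attention layer; all positions are processed in parallel
```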
GPT
Generative Pre-trained Transformers
Limitations of RNNs
- Limited by the amount of compute and memory needed to perform well on generative AI tasks
- Only see a limited window of preceding words when predicting the next word
- No self-attention, so they cannot learn the relevance of every word to every other word
- Scaling them to look at more of the input requires massively more resources, and prediction quality still suffers
Heads
The independently learned sets of self-attention weights in multi-headed attention
Feed Forward Neural Network
- Information moves only in one direction, from the input layer through any hidden layers and finally to the output layer
- There are no cycles or loops in the network
- Connections between the units do not form a cycle, unlike in an RNN
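A minimal PyTorch sketch of a feed-forward block like the one inside each transformer layer: information flows one way through two linear layers with no cycles. The 512/2048 sizes follow the original paper.

```python
import torch
import torch.nn as nn

feed_forward = nn.Sequential(           # input -> hidden -> output, no loops
    nn.Linear(512, 2048),
    nn.ReLU(),
    nn.Linear(2048, 512),
)

x = torch.randn(4, 512)                 # e.g. the attention output for 4 tokens
print(feed_forward(x).shape)            # torch.Size([4, 512])
```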
ReAct Prompting
- Reasoning + Acting
- LLMs are used to generate both reasoning traces and task-specific actions in an interleaved manner
- Reasoning traces help the model induce, track, and update action plans and handle exceptions
- Actions let the LLM interact with external tools to retrieve additional information
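A sketch of the interleaved Thought / Action / Observation format a ReAct prompt asks the model to follow; the `search` tool and the `call_llm`/`run_tool` helpers mentioned in the comments are hypothetical.

```python
REACT_PROMPT = """Answer the question by interleaving Thought, Action and Observation steps.

Question: {question}
Thought: I should look this up with a tool.
Action: search["{question}"]
Observation: {observation}
Thought: I now have enough information to answer.
Answer:"""

def react_step(question: str, observation: str) -> str:
    # In a full loop, a hypothetical call_llm() would generate the next Thought/Action,
    # run_tool() would execute the Action, and the Observation would be fed back in.
    return REACT_PROMPT.format(question=question, observation=observation)

print(react_step("Who wrote Attention Is All You Need?", "<tool output goes here>"))
```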
LangChain
- Provides AI developers with tools to connect language models with external data sources
- Prompt templates
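A hedged LangChain example, assuming a classic install where `PromptTemplate` lives in `langchain.prompts` (newer releases import it from `langchain_core.prompts`).

```python
from langchain.prompts import PromptTemplate   # or langchain_core.prompts in newer releases

template = PromptTemplate(
    input_variables=["context", "question"],
    template="Use the context to answer.\n\nContext: {context}\n\nQuestion: {question}",
)

prompt = template.format(
    context="Refunds are processed within 5 business days.",
    question="How long do refunds take?",
)
print(prompt)   # the filled-in prompt would then be sent to the language model
```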