Generative AI Flashcards
RLHF
Reinforcement learning from human feedback
PEFT
Parameter-efficient fine-tuning
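A minimal sketch of one common PEFT method, LoRA, using the Hugging Face peft library. The checkpoint name, rank, and other hyperparameters here are illustrative assumptions, not values from these cards.

```python
# LoRA sketch (one PEFT method): freeze the base model and train small
# low-rank adapter matrices instead of updating all the original weights.
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, get_peft_model, TaskType

# Illustrative choices -- any seq2seq checkpoint and rank would do.
base_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,                        # rank of the adapter matrices
    lora_alpha=32,              # scaling factor applied to the adapter output
    lora_dropout=0.05,
    target_modules=["q", "v"],  # attach adapters to the query/value projections
)

peft_model = get_peft_model(base_model, lora_config)
# Only a small fraction of the parameters remain trainable.
peft_model.print_trainable_parameters()
```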
Self-Attention
- In order to predict the next word accurately, models need to be able to see the whole sentence or whole document
- The transformer architecture unlocked this ability
- The model is able to pay attention to the meaning of the words it is processing, as in the paper "Attention Is All You Need" (see the sketch below)
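A minimal sketch of scaled dot-product self-attention in NumPy. The toy sequence length, embedding size, and random weights are assumptions for illustration only.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence of token embeddings x."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v           # project tokens to queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])       # how much each token attends to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the whole sequence
    return weights @ v                            # each output mixes information from all tokens

# Toy example: 4 tokens, embedding size 8 (illustrative numbers only).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)     # (4, 8)
```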
Multi-headed Self-Attention
- Multiple sets of self-attention weights, or heads, are learned in parallel, independently of each other
- The outputs of all the heads are concatenated and fed through a feed-forward network to produce the output of the encoder layer (see the sketch below)
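A minimal sketch using PyTorch's built-in multi-head attention followed by a feed-forward network. The embedding size, head count, and feed-forward width are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative sizes: 8 independent attention heads over 512-dim embeddings.
embed_dim, num_heads = 512, 8
attention = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
feed_forward = nn.Sequential(
    nn.Linear(embed_dim, 2048), nn.ReLU(), nn.Linear(2048, embed_dim)
)

x = torch.randn(1, 10, embed_dim)      # a batch of 1 sequence with 10 tokens
attn_out, _ = attention(x, x, x)       # each head learns its own attention weights in parallel
out = feed_forward(attn_out)           # per the card: attention output feeds a feed-forward network
print(out.shape)                       # torch.Size([1, 10, 512])
```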
How many parameters does a model with general knowledge about the world have?
Hundreds of billions
How much training data do you need to fine-tune a model for a single task, like summarizing dialogue or acting as a customer service agent for a single company?
Often just 500-1,000 examples can result in good performance
Context window
The amount of space, measured in tokens, that is available for the prompt (see the sketch below)
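A minimal sketch of checking whether a prompt fits inside a model's context window, using a Hugging Face tokenizer. The checkpoint name and prompt are illustrative assumptions.

```python
from transformers import AutoTokenizer

# Illustrative checkpoint; any tokenizer with a known context length would do.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

prompt = "Summarize the following dialogue: ..."
num_tokens = len(tokenizer(prompt)["input_ids"])

# The context window is counted in tokens, and the completion must fit in it too.
print(f"{num_tokens} prompt tokens of a {tokenizer.model_max_length}-token context window")
```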
Inference
- Generating a prediction
- For LLMs, that would be using the model to generate text
Completion
Output of the model
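A minimal sketch of inference with a causal language model: the prompt goes in, and the generated completion comes out. The checkpoint and generation settings are illustrative assumptions.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint; any causal LM works the same way.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The transformer architecture is important because"
inputs = tokenizer(prompt, return_tensors="pt")

# Inference: use the model to generate new tokens after the prompt.
output_ids = model.generate(
    **inputs, max_new_tokens=30, pad_token_id=tokenizer.eos_token_id
)

# Completion: the text the model produced.
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```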
Entity recognition
Word classification to identify all the people and places
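A minimal sketch of entity recognition with the Hugging Face pipeline API; the example sentence is illustrative, and the default NER checkpoint is downloaded automatically.

```python
from transformers import pipeline

# Word classification: label each word span as a person, place, organization, etc.
ner = pipeline("ner", aggregation_strategy="simple")

text = "Ada Lovelace worked with Charles Babbage in London."
for entity in ner(text):
    print(entity["word"], "->", entity["entity_group"])
```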
Foundation models in decreasing order of parameter count
PaLM -> BLOOM -> GPT -> LLaMA -> Flan-T5 -> BERT
RNN
- Recurrent neural networks
- Used by previous generations of language models
What’s so important about the transformer architecture?
- The ability to learn the relevance and context of all the words in a sentence
- It can be scaled efficiently to use multi-core GPUs
- It can parallel process input data making use of much larger training datasets
- Dramatically improved the performance of natural language tasks over earlier generations of RNNs
Instruction Fine Tuning
Adapting a pre-trained model to specific tasks by training it further on prompt-completion examples that show how it should respond to instructions
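A minimal sketch of how training examples are typically formatted for instruction fine-tuning: each record pairs an instruction-style prompt with the desired completion. The template and example data are illustrative assumptions.

```python
# Instruction fine-tuning trains on prompt-completion pairs that show the model
# how to respond to instructions for a specific task (here, dialogue summarization).
PROMPT_TEMPLATE = "Summarize the following conversation.\n\n{dialogue}\n\nSummary: "

raw_examples = [
    {"dialogue": "Agent: How can I help?\nCustomer: My order never arrived.",
     "summary": "A customer reports a missing order."},
]

training_pairs = [
    {"prompt": PROMPT_TEMPLATE.format(dialogue=ex["dialogue"]),
     "completion": ex["summary"]}
    for ex in raw_examples
]

print(training_pairs[0]["prompt"] + training_pairs[0]["completion"])
```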
RAG
Retrieval Augmented Generation
Knowledge base data is used for the retrieval portion of the solution
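A minimal sketch of the retrieval step in RAG: vectorize the knowledge base and the question, retrieve the closest passage, and prepend it to the prompt. The TF-IDF retriever and toy knowledge base are illustrative assumptions; real systems typically use learned embeddings and a vector store.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy knowledge base standing in for the real one (illustrative content only).
knowledge_base = [
    "Refunds are processed within 5 business days of receiving the returned item.",
    "Our support line is open Monday through Friday, 9am to 5pm.",
    "Orders over $50 ship for free within the continental US.",
]

question = "How long does a refund take?"

# Retrieval: find the knowledge-base passage most similar to the question.
vectorizer = TfidfVectorizer().fit(knowledge_base + [question])
kb_vectors = vectorizer.transform(knowledge_base)
q_vector = vectorizer.transform([question])
best = int(np.argmax((kb_vectors @ q_vector.T).toarray()))

# Augmentation: prepend the retrieved passage so the LLM can ground its answer.
prompt = f"Context: {knowledge_base[best]}\n\nQuestion: {question}\nAnswer:"
print(prompt)
```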