Generative AI Flashcards
RLHF
Reinforcement learning from human feedback
PEFT
Parameter-efficient fine-tuning
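As a concrete illustration, here is a minimal PyTorch sketch of one popular PEFT method, LoRA: the pre-trained weights are frozen and only two small low-rank matrices are trained. The layer sizes and rank are arbitrary, and this is a conceptual sketch rather than the `peft` library's implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pre-trained linear layer plus a small trainable low-rank update (LoRA-style)."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # freeze the pre-trained weights
        self.A = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, base.out_features))

    def forward(self, x):
        # output = frozen base projection + trainable low-rank correction
        return self.base(x) + x @ self.A @ self.B

layer = LoRALinear(nn.Linear(512, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)   # 8,192 trainable values vs. 262,656 in the full linear layer
```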
Self-Attention
- In order to predict the next word accurately, models need to be able to see the whole sentence or whole document
- The transformer architecture unlocked this ability
- The model can pay attention to the meaning of all the words it is processing, wherever they appear in the input (hence the paper title "Attention Is All You Need"; a minimal sketch follows)
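A minimal NumPy sketch of scaled dot-product self-attention in the spirit of Attention Is All You Need; the tiny sequence length and dimensions are arbitrary.

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors x."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])             # relevance of every token to every other token
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)   # softmax -> attention weights
    return weights @ V                                  # each output mixes information from the whole sequence

seq_len, d_model = 4, 8
x = np.random.randn(seq_len, d_model)                   # one vector per token
Wq, Wk, Wv = (np.random.randn(d_model, d_model) for _ in range(3))
print(self_attention(x, Wq, Wk, Wv).shape)              # (4, 8)
```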
Multi-headed Self-Attention
- Multiple sets of self-attention weights, or heads, are learned in parallel, independently of each other
- The outputs of the multi-headed attention layers are fed through a feed-forward network to the output of the encoder
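A sketch of the multi-head idea, reusing the `self_attention` function above: several independent sets of projection weights run in parallel, and their outputs are concatenated and projected. The head count and sizes are illustrative.

```python
import numpy as np

def multi_head_attention(x, heads, Wo):
    """Run independent self-attention heads in parallel, then combine their outputs."""
    outputs = [self_attention(x, Wq, Wk, Wv) for Wq, Wk, Wv in heads]
    return np.concatenate(outputs, axis=-1) @ Wo        # concatenated heads -> output projection

n_heads, d_model = 2, 8
d_head = d_model // n_heads                             # each head works in a smaller subspace
heads = [tuple(np.random.randn(d_model, d_head) for _ in range(3)) for _ in range(n_heads)]
Wo = np.random.randn(n_heads * d_head, d_model)         # projects the concatenation back to d_model
print(multi_head_attention(np.random.randn(4, d_model), heads, Wo).shape)   # (4, 8)
```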
How many parameters does a model with general knowledge about the world have?
Hundreds of billions
How much training data do you need for a single task like summarizing dialogue or acting as a customer service agent for a single company?
Often just 500-1,000 examples can result in good performance
Context window
Space available for the prompt
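A tiny illustration of the idea, with a made-up window size and token list: a prompt longer than the context window has to be truncated before inference.

```python
context_window = 4096                      # maximum number of tokens the model can attend to (illustrative)
prompt_tokens = list(range(5000))          # pretend these are token IDs produced by a tokenizer

if len(prompt_tokens) > context_window:
    prompt_tokens = prompt_tokens[-context_window:]   # keep only the most recent tokens

print(len(prompt_tokens))                  # 4096
```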
Inference
- Generating a prediction
- For LLMs, that would be using the model to generate text
Completion
Output of the model
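A hedged sketch of inference producing a completion with the Hugging Face transformers library; the gpt2 checkpoint and generation settings are illustrative choices, not part of the course material.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"                                     # illustrative; any causal LM checkpoint works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "The transformer architecture is important because"
inputs = tokenizer(prompt, return_tensors="pt")         # prompt text -> token IDs

output_ids = model.generate(**inputs, max_new_tokens=30)             # inference
completion = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(completion)                                       # the completion: prompt plus generated text
```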
Entity recognition
Word classification to identify all the people and places
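A hedged example of entity recognition with the transformers token-classification pipeline; the default checkpoint it downloads, and the exact output fields, may vary by version.

```python
from transformers import pipeline

# Word/token classification: label people, places, organizations, etc.
ner = pipeline("ner", aggregation_strategy="simple")
print(ner("Ada Lovelace worked with Charles Babbage in London."))
# roughly: [{'entity_group': 'PER', 'word': 'Ada Lovelace', ...}, ..., {'entity_group': 'LOC', 'word': 'London', ...}]
```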
Foundational models by decreasing number of parameters
PaLM (540B) -> BLOOM (176B) -> GPT-3 (175B) -> LLaMA (65B) -> FLAN-T5 (11B) -> BERT (340M)
RNN
- Recurrent neural networks
- Used by previous generations of language models
What’s so important about the transformer architecture?
- The ability to learn the relevance and context of all the words in a sentence
- It can be scaled efficiently to use multi-core GPUs
- It can process input data in parallel, making use of much larger training datasets
- Dramatically improved performance on natural language tasks over earlier generations of RNNs
Instruction Fine Tuning
Adapting a pre-trained model to a specific task and dataset by training it further on prompt-completion examples that include instructions (an example record is sketched below)
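A sketch of what a single instruction fine-tuning record might look like: an instruction-style prompt paired with the desired completion. The template wording is made up.

```python
# One training record: instruction-style prompt paired with the desired completion
example = {
    "prompt": (
        "Summarize the following conversation.\n\n"
        "Customer: My package never arrived.\n"
        "Agent: I'm sorry, let me check the tracking number.\n\n"
        "Summary:"
    ),
    "completion": " The customer reports a missing package and the agent checks the tracking number.",
}
# During instruction fine-tuning the model is trained to produce `completion` given `prompt`.
```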
RAG
Retrieval Augmented Generation
Knowledge base data is used for the retrieval portion of the solution
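A minimal sketch of the retrieval half of RAG: embed the query, pick the most similar knowledge-base passage, and paste it into the prompt. The `embed` function here is a random stand-in for a real embedding model.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in embedding; a real system would call an embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(16)

knowledge_base = [
    "Refunds are processed within 5 business days.",
    "Our support line is open 9am-5pm on weekdays.",
]
kb_vectors = np.stack([embed(doc) for doc in knowledge_base])

query = "How long do refunds take?"
scores = kb_vectors @ embed(query)                      # similarity of the query to each passage
best_doc = knowledge_base[int(np.argmax(scores))]       # retrieval step

prompt = f"Answer using this context:\n{best_doc}\n\nQuestion: {query}"
# `prompt` is then sent to the LLM for the generation step
```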
Origin of the Transformer Architecture
Attention Is All You Need
What are attention weights?
The model learns the relevance of each word to all other words during training
What are the two distinct parts of the transformer architecture?
Encoder and decoder
Tokenize
- Convert words to numbers, with each number representing a position in a dictionary of all possible words
- There are multiple tokenization methods: token IDs can represent complete words or parts of words
- The same tokenizer used to train the model must also be used when generating text
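A hedged sketch of tokenization using the transformers library (the gpt2 checkpoint is just an example); it shows that token IDs can correspond to whole words or to word pieces, and that the same tokenizer maps IDs back to text.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")       # illustrative checkpoint

text = "Tokenization splits words"
ids = tokenizer.encode(text)                            # words -> positions in the vocabulary
print(ids)                                              # a list of integer token IDs
print(tokenizer.convert_ids_to_tokens(ids))             # some IDs are whole words, some are word pieces
print(tokenizer.decode(ids))                            # the same tokenizer converts IDs back to text
```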
Embedding Layer
- Trainable vector embedding space
- High-dimensional space where each token is represented as a vector and occupies a unique location within that space
- Each token id in the vocabulary is matched to a multi-dimensional vector
- During model training, the vectors learn to encode the meaning and context of individual tokens in the input sequence
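A minimal PyTorch sketch of an embedding layer as a trainable lookup table from token IDs to vectors; the vocabulary size is arbitrary, and 512 dimensions matches the original paper.

```python
import torch
import torch.nn as nn

vocab_size, d_model = 10_000, 512                       # 512 matches Attention Is All You Need
embedding = nn.Embedding(vocab_size, d_model)           # trainable lookup table: token ID -> vector

token_ids = torch.tensor([[5, 42, 7]])                  # a batch containing one 3-token sequence
vectors = embedding(token_ids)                          # each ID pulls its row from the table
print(vectors.shape)                                    # torch.Size([1, 3, 512])
```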
What was the vector size in Attention Is All You Need?
512 dimensions
Positional encoding
Encodes each token's position in the sentence/document so word-order information is preserved while tokens are processed in parallel (see the sketch below)
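A sketch of the sinusoidal positional encoding from Attention Is All You Need: each position gets a fixed vector of sines and cosines at different frequencies, which is added to the token embedding.

```python
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / np.power(10000, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                        # even dimensions
    pe[:, 1::2] = np.cos(angles)                        # odd dimensions
    return pe

print(positional_encoding(seq_len=4, d_model=8).shape)  # (4, 8); added to the token embeddings
```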
What is passed into the encoder/decoder?
- Token vectors and positional encoding
- Processed in parallel
What is passed to the self-attention layer?
- The token vectors and positional encodings, summed into a single vector per token
- The model analyzes the relationships between the tokens in your input sequence
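Tying the sketches above together: the token vectors and positional encodings are summed into one vector per token, and that combined matrix is what the self-attention layer processes in parallel (toy shapes again, reusing `positional_encoding` from the sketch above).

```python
import numpy as np

seq_len, d_model = 4, 8
token_vectors = np.random.randn(seq_len, d_model)            # from the embedding layer
position_vectors = positional_encoding(seq_len, d_model)     # from the sketch above

x = token_vectors + position_vectors    # one combined vector per token
# x is the input to the self-attention layer; all positions are processed in parallel
```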
GPT
Generative Pre-trained Transformers
Limitations of RNNs
- Limited by the amount of compute and memory needed to perform well on generative AI tasks
- Only see a limited window of preceding words when predicting the next word
- No self-attention, so they cannot learn the relevance of every word to every other word
- Scaling them to look at more of the input requires massively more resources, and prediction quality still suffers
Heads
The independently learned sets of self-attention weights in multi-headed attention
Feed Forward Neural Network
- Information moves only in one direction, from the input layer through any hidden layers and finally to the output layer
- There are no cycles or loops in the network
- Connections between the units do not form a cycle, unlike in an RNN
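A minimal PyTorch sketch of a feed-forward block like the one inside each transformer layer: information flows one way through two linear layers with no cycles. The 512/2048 sizes follow the original paper.

```python
import torch
import torch.nn as nn

feed_forward = nn.Sequential(           # input -> hidden -> output, no loops
    nn.Linear(512, 2048),
    nn.ReLU(),
    nn.Linear(2048, 512),
)

x = torch.randn(4, 512)                 # e.g. the attention output for 4 tokens
print(feed_forward(x).shape)            # torch.Size([4, 512])
```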
ReAct Prompting
- Reasoning + Acting
- LLMs are used to generate both reasoning traces and task-specific actions in an interleaved manner
- Reasoning traces help the model induce, track, and update action plans and handle exceptions
- Actions let the LLM interact with external tools to retrieve additional information
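A sketch of the interleaved Thought / Action / Observation format a ReAct prompt asks the model to follow; the `search` tool and the `call_llm`/`run_tool` helpers mentioned in the comments are hypothetical.

```python
REACT_PROMPT = """Answer the question by interleaving Thought, Action and Observation steps.

Question: {question}
Thought: I should look this up with a tool.
Action: search["{question}"]
Observation: {observation}
Thought: I now have enough information to answer.
Answer:"""

def react_step(question: str, observation: str) -> str:
    # In a full loop, a hypothetical call_llm() would generate the next Thought/Action,
    # run_tool() would execute the Action, and the Observation would be fed back in.
    return REACT_PROMPT.format(question=question, observation=observation)

print(react_step("Who wrote Attention Is All You Need?", "<tool output goes here>"))
```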
LangChain
- Provides AI developers with tools to connect language models with external data sources
- Prompt templates
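A hedged LangChain example, assuming a classic install where `PromptTemplate` lives in `langchain.prompts` (newer releases import it from `langchain_core.prompts`).

```python
from langchain.prompts import PromptTemplate   # or langchain_core.prompts in newer releases

template = PromptTemplate(
    input_variables=["context", "question"],
    template="Use the context to answer.\n\nContext: {context}\n\nQuestion: {question}",
)

prompt = template.format(
    context="Refunds are processed within 5 business days.",
    question="How long do refunds take?",
)
print(prompt)   # the filled-in prompt would then be sent to the language model
```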