Jay Alammar Flashcards
How do we transform words into things that the Blocks can work with?
Words are tokenized into integer token IDs
Each token ID is then mapped to its embedding (a large vector looked up by that ID)
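A minimal sketch with a toy word-level vocabulary (real models use a learned subword tokenizer such as BPE, and the IDs index into a learned embedding matrix):

```python
# Toy tokenizer: map words to integer token IDs.
# Real models use a learned subword tokenizer (e.g. BPE), not a word-level dict.
toy_vocab = {"the": 0, "chicken": 1, "crossed": 2, "road": 3}

def tokenize(text):
    return [toy_vocab[word] for word in text.lower().split()]

print(tokenize("The chicken crossed the road"))  # [0, 1, 2, 0, 3]
```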
What are the two components of an Encoder/Decoder block?
Self-attention
Feed Forward Neural Network
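A rough, runnable sketch of the block structure (assumed simplifications: single head, no layer norm, random weights):

```python
import numpy as np

d_model, seq_len = 8, 4
x = np.random.randn(seq_len, d_model)          # token embeddings entering the block

def self_attention(x):
    # Simplified: each token scores every other token, then mixes them.
    scores = x @ x.T / np.sqrt(d_model)
    w = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)
    return w @ x

def feed_forward(x):
    # Position-wise two-layer MLP (weights would be learned in a real model).
    W1, W2 = np.random.randn(d_model, 32), np.random.randn(32, d_model)
    return np.maximum(0, x @ W1) @ W2

out = x + self_attention(x)                    # component 1 + residual connection
out = out + feed_forward(out)                  # component 2 + residual connection
```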
What’s the name for the scores given to each token in the vocabulary?
Logits
(these are then soft-maxed in the final layer, where they sum up to 1)
We can pick the highest one, or sample from the distribution to allow a wider choice of next tokens
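A small sketch of both options, assuming toy logits:

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.5, -1.0])                   # one score per vocab token
probs = np.exp(logits) / np.exp(logits).sum()               # softmax: sums to 1

greedy_pick = int(np.argmax(probs))                          # always take the top token
sampled_pick = int(np.random.choice(len(probs), p=probs))    # sample for more variety
```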
Picking the highest-probability next word is known as …
Greedy sampling
In Self-Attention what are three matrices that are used to manipulate the inputs?
Query
Key
Value
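A minimal sketch of how the three matrices are used: each one projects the input vectors into queries, keys, and values (random weights here; they are learned in a real model):

```python
import numpy as np

d_model, seq_len = 8, 5
x = np.random.randn(seq_len, d_model)     # embeddings for 5 tokens

W_q = np.random.randn(d_model, d_model)   # Query projection
W_k = np.random.randn(d_model, d_model)   # Key projection
W_v = np.random.randn(d_model, d_model)   # Value projection

Q = x @ W_q   # queries: what each token is looking for
K = x @ W_k   # keys: what each token offers to be matched against
V = x @ W_v   # values: the content that gets mixed into the output
```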
What’s the difference in architecture between GPT and BERT?
GPT - built from a stack of decoder blocks
BERT - built from a stack of encoder blocks
What is auto-regression (in the context of LLMs)?
After each new token is generated, it is appended to the input sequence, which then becomes the input for generating the next token
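A sketch of the loop, assuming a hypothetical `next_token_id(token_ids)` helper that runs the model and returns the chosen next token ID:

```python
def generate(prompt_ids, steps, next_token_id):
    token_ids = list(prompt_ids)
    for _ in range(steps):
        new_id = next_token_id(token_ids)   # model sees everything generated so far
        token_ids.append(new_id)            # new token becomes part of the next input
    return token_ids
```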
What’s the difference between Encoder and Decoder stacks?
They both have Self-Attention and a Feed Forward Neural Network
In addition, the Decoder has an **Encoder-Decoder Attention** layer (cross-attention), and its self-attention is masked so each position can only attend to earlier positions
What is the length of the embedding vector in GPT-3?
The largest version uses 12,288 dimensions
When a token is sent to a Transformer (in GPT), how is it processed? (two steps)
- Embedding vector is looked up
- To this is added the position encoding vector
This is then passed on to the first Decoder block
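A toy sketch of the two steps (tiny sizes for the demo; GPT-3's largest embedding dimension is 12,288):

```python
import numpy as np

vocab_size, max_pos, d_model = 100, 16, 8                 # toy sizes
token_embeddings = np.random.randn(vocab_size, d_model)   # learned lookup table
position_encodings = np.random.randn(max_pos, d_model)    # one vector per position

token_id, position = 42, 0
vector = token_embeddings[token_id] + position_encodings[position]
# `vector` is what gets passed on to the first Decoder block.
```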
Describe what Self-Attention does (give an example)
Its purpose is to create a vector for the current token, modified to take its context into account
E.g. “the chicken crossed the road and then painted it”
When processing the current word “it”, each word in the context gets a self-attention score; here “the” and “road” will score highly. The token vectors are weighted by these scores and summed, producing a vector which is then passed on to the FFNN (Feed Forward Neural Network)
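A toy numeric sketch of the scoring and weighted sum (plain dot products between random vectors stand in for the real query/key/value projections):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

d = 8
words = ["the", "chicken", "crossed", "road", "and", "then", "painted", "it"]
vectors = {w: np.random.randn(d) for w in words}    # stand-ins for token vectors

current = vectors["it"]                              # the token being processed
scores = np.array([vectors[w] @ current for w in words])
weights = softmax(scores / np.sqrt(d))               # one score per context word

# Weighted sum: the contextualized vector for "it", passed on to the FFNN.
output = sum(w * vectors[word] for w, word in zip(weights, words))
```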
What is the list of scores for each token in the vocabulary, output from the model, called?
Logits
How can we consider only the top 5 most probable words?
Set top_k to 5
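A sketch of top-k filtering on the logits (k = 5), assuming NumPy:

```python
import numpy as np

def top_k_probs(logits, k=5):
    # Keep only the k highest-scoring tokens, then renormalize.
    logits = np.asarray(logits, dtype=float)
    top = np.argsort(logits)[-k:]                 # indices of the top-k logits
    filtered = np.full_like(logits, -np.inf)
    filtered[top] = logits[top]
    probs = np.exp(filtered - filtered.max())
    return probs / probs.sum()

probs = top_k_probs(np.random.randn(50), k=5)     # only 5 tokens get non-zero probability
```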