Generative AI Flashcards

1
Q

RLHF

A

Reinforcement learning from human feedback

2
Q

PEFT

A

Parameter-efficient fine-tuning

3
Q

Self-Attention

A
  • To predict the next word accurately, a model needs to be able to see the whole sentence or even the whole document
  • The transformer architecture unlocked this ability
  • While processing each word, the model can pay attention to the meaning of every other word in the input; this is the idea behind the paper title "Attention Is All You Need" (a minimal sketch follows below)
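A minimal sketch of scaled dot-product self-attention in NumPy; the tiny dimensions and random weight matrices are illustrative assumptions, not anything from an actual model:

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention over a sequence of token vectors.

    x: (seq_len, d_model) matrix of token embeddings (+ positional encodings).
    Every output vector is a weighted mix of *all* input vectors, so each
    position can "see" the whole sequence.
    """
    d_model = x.shape[-1]
    rng = np.random.default_rng(0)
    # Learned projections in a real model; random here for illustration.
    W_q, W_k, W_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))

    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    scores = Q @ K.T / np.sqrt(d_model)                 # relevance of every token to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax -> attention weights
    return weights @ V

tokens = np.random.default_rng(1).standard_normal((5, 8))   # 5 tokens, d_model = 8
print(self_attention(tokens).shape)                          # (5, 8)
```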
4
Q

Multi-headed Self-Attention

A

- Multiple sets of self-attention weights, or heads, are learned in parallel and independently of each other (sketched below)
- The outputs of the multi-headed attention layer are fed through a feed-forward network to produce the output of the encoder
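A rough NumPy sketch of the parallel heads (head count, sizes, and random weights are illustrative assumptions):

```python
import numpy as np

def multi_head_attention(x, num_heads=2):
    """Split the model dimension into independent heads, attend per head,
    then concatenate the heads' outputs (a real encoder adds a final linear
    projection and a feed-forward layer after this)."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    rng = np.random.default_rng(0)
    outputs = []
    for _ in range(num_heads):
        # Each head has its own Q/K/V projections (random here for illustration).
        W_q, W_k, W_v = (rng.standard_normal((d_model, d_head)) for _ in range(3))
        Q, K, V = x @ W_q, x @ W_k, x @ W_v
        scores = Q @ K.T / np.sqrt(d_head)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        outputs.append(weights @ V)                  # (seq_len, d_head)
    return np.concatenate(outputs, axis=-1)          # (seq_len, d_model)

x = np.random.default_rng(1).standard_normal((5, 8))
print(multi_head_attention(x).shape)                 # (5, 8)
```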

5
Q

How many parameters does a model with general knowledge about the world have?

A

Hundreds of billions

6
Q

How many training examples do you need to fine-tune a model for a single task, like summarizing dialogue or acting as a customer service agent for a single company?

A

Often just 500-1,000 examples can result in good performance

7
Q

Context window

A

Space available for the prompt

8
Q

Inference

A
  • Generating a prediction
  • For LLMs, that would be using the model to generate text
9
Q

Completion

A

Output of the model

10
Q

Entity recognition

A

Word-level (token) classification that identifies named entities such as all the people and places mentioned in a text (example below)
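A hedged example using the Hugging Face transformers NER pipeline; with no model specified it downloads a default NER model, and the example sentence is made up:

```python
# Requires: pip install transformers torch
from transformers import pipeline

# "ner" is token classification: each word/token span gets an entity label.
ner = pipeline("ner", aggregation_strategy="simple")

text = "Satya Nadella visited Seattle last week."
for entity in ner(text):
    print(entity["word"], "->", entity["entity_group"])   # e.g. person and place labels
```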

11
Q

Foundational models by decreasing number of parameters

A

PaLM -> BLOOM -> GPT -> LLaMA -> Flan-T5 -> BERT

12
Q

RNN

A
  • Recurrent neural networks
  • Used by previous generations of language models
13
Q

What’s so important about the transformer architecture?

A
  • The ability to learn the relevance and context of all the words in a sentence
  • It can be scaled efficiently to run on multi-core GPUs
  • It can process input data in parallel, making use of much larger training datasets
  • It dramatically improved performance on natural language tasks over earlier generations of RNNs
14
Q

Instruction Fine Tuning

A

Adapting a pre-trained model to specific tasks by training it further on a dataset of examples that pair an instruction prompt with the desired response (illustrated below)
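In practice the training data is just instruction-style prompt/completion pairs; the record below is a made-up illustration of that format, and the field names are a common convention rather than a fixed standard:

```python
# One illustrative training example for instruction fine-tuning.
example = {
    "prompt": (
        "Summarize the following conversation.\n\n"
        "Customer: My order arrived damaged.\n"
        "Agent: I'm sorry! I'll ship a replacement today.\n\n"
        "Summary:"
    ),
    "completion": "The customer reported a damaged order and the agent arranged a replacement.",
}
print(example["prompt"])
```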

15
Q

RAG

A

Retrieval-Augmented Generation

Knowledge-base data is retrieved at query time and added to the prompt, so the retrieval portion of the solution grounds the model's answer in external data (see the sketch below)
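A minimal, self-contained sketch of the retrieve-then-generate flow; the toy documents, the stand-in embed() function, and the prompt wording are all assumptions, not a real RAG stack:

```python
import numpy as np

# Toy knowledge base: document names with made-up embedding vectors.
documents = {
    "refund policy": np.array([0.9, 0.1, 0.0]),
    "shipping times": np.array([0.1, 0.8, 0.1]),
    "warranty terms": np.array([0.2, 0.1, 0.9]),
}

def embed(text):
    """Stand-in for a real embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.random(3)

def retrieve(query, k=1):
    """Return the k documents most similar to the query (cosine similarity)."""
    q = embed(query)
    scores = {
        name: float(q @ vec / (np.linalg.norm(q) * np.linalg.norm(vec)))
        for name, vec in documents.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

def build_prompt(query):
    # The retrieved text is prepended to the prompt so the LLM can ground its answer.
    context = ", ".join(retrieve(query))
    return f"Use this context: {context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("How long do refunds take?"))   # this prompt would be sent to the LLM
```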

16
Q

What’s significant about the transformer architecture?

A
  • Can be scaled efficiently to use multi-core GPUs
  • Can process input data in parallel, making use of much larger training datasets
  • Dramatically improved the performance of natural language tasks over earlier generations of RNNs
17
Q

Origin of the Transformer Architecture

A

The 2017 paper "Attention Is All You Need" (Vaswani et al.)

18
Q

What are attention weights?

A

The model learns the relevance of each word to all other words during training

19
Q

What are the two distinct parts of the transformer architecture?

A

Encoder and decoder

20
Q

Tokenize

A
  • Convert words to numbers, with each number representing a position in a dictionary of all possible words
  • There are multiple tokenization methods: token IDs can correspond to complete words or to parts of words (see the example below)
  • The same tokenizer used to train the model must be used when generating text
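A quick illustration with the Hugging Face GPT-2 tokenizer; the choice of tokenizer is an assumption, and any tokenizer shows the same idea of IDs mapping to whole words or sub-word pieces:

```python
# Requires: pip install transformers
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

ids = tokenizer.encode("Tokenization splits text")
print(ids)                                    # a list of integer token IDs
print(tokenizer.convert_ids_to_tokens(ids))   # whole words and sub-word pieces
print(tokenizer.decode(ids))                  # the same tokenizer maps IDs back to text
```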
21
Q

Embedding Layer

A
  • A trainable vector embedding space
  • A high-dimensional space where each token is represented as a vector and occupies a unique location within that space
  • Each token ID in the vocabulary is matched to a multi-dimensional vector (see the sketch below)
  • During model training, the vectors learn to encode the meaning and context of individual tokens in the input sequence
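A minimal PyTorch sketch of an embedding layer as a trainable lookup table; the vocabulary size is an arbitrary choice here, and 512 simply matches the vector size from the original paper:

```python
# Requires: pip install torch
import torch
import torch.nn as nn

vocab_size, d_model = 10_000, 512               # 512 matches the original Transformer
embedding = nn.Embedding(vocab_size, d_model)   # one trainable vector per token ID

token_ids = torch.tensor([[7, 42, 3051]])       # a batch with one 3-token sequence
vectors = embedding(token_ids)
print(vectors.shape)                            # torch.Size([1, 3, 512])
# During training, gradients update these vectors so they encode meaning and context.
```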
22
Q

What was the embedding vector size in "Attention Is All You Need"?

A

512 dimensions

23
Q

Positional encoding

A

Encodes each token's position in the sentence/document so word order is preserved even though tokens are processed in parallel; it is added to the token embedding (see the sketch below)
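A small sketch of the sinusoidal scheme used in "Attention Is All You Need"; some models learn positional embeddings instead, so treat this as one common choice:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(same angle)."""
    positions = np.arange(seq_len)[:, None]          # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]         # even dimensions
    angles = positions / np.power(10000, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_positional_encoding(seq_len=10, d_model=512)
print(pe.shape)   # (10, 512) -- added element-wise to the token embeddings
```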

24
Q

What is passed into the encoder/decoder?

A
  • The token embedding vectors summed with their positional encodings
  • They are processed in parallel
25
Q

What is passed to the self-attention layer?

A
  • The positional encodings and token vectors, combined into a single vector per token
  • There, the model analyzes the relationships between the tokens in the input sequence
26
Q

GPT

A

Generative Pre-trained Transformers

27
Q

Limitations of RNNs

A
  • Limited by the amount of compute and memory needed to perform well on generative AI tasks
  • Predict the next word from only the small window of preceding words they have seen
  • No self-attention
  • Scaling them up to take more of the input into account is not practical
28
Q

Heads

A

Independent sets of self-attention weights learned in parallel

29
Q

Feed Forward Neural Network

A
  • Information moves in only one direction, from the input layer through any hidden layers and finally to the output layer
  • There are no cycles or loops in the network
  • Connections between the units do not form a cycle, unlike in an RNN (see the sketch below)
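A small PyTorch sketch of such a feed-forward block, sized like the position-wise network used after attention in the original Transformer (512 -> 2048 -> 512); the exact sizes are illustrative:

```python
# Requires: pip install torch
import torch
import torch.nn as nn

# Data flows strictly input -> hidden -> output; there are no recurrent connections.
feed_forward = nn.Sequential(
    nn.Linear(512, 2048),   # expand
    nn.ReLU(),
    nn.Linear(2048, 512),   # project back to the model dimension
)

x = torch.randn(1, 5, 512)      # (batch, seq_len, d_model)
print(feed_forward(x).shape)    # torch.Size([1, 5, 512])
```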
30
Q

ReAct Prompting

A
  • Reasoning + Acting
  • The LLM is used to generate both reasoning traces and task-specific actions in an interleaved manner
  • Reasoning traces help the model induce, track, and update action plans and handle exceptions, while the actions let it interact with external tools to retrieve additional information (example format below)
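An illustrative, hand-written ReAct-style prompt showing the interleaved Thought/Action/Observation format; the search/finish tool names and the question are assumptions:

```python
# In practice the LLM generates the Thought/Action lines and your code executes
# each Action and appends the Observation before asking the model to continue.
react_prompt = """\
Question: What year was the company that makes the iPhone founded?
Thought: I need to find out which company makes the iPhone, then its founding year.
Action: search[company that makes the iPhone]
Observation: The iPhone is made by Apple Inc.
Thought: Now I need Apple's founding year.
Action: search[Apple Inc. founding year]
Observation: Apple Inc. was founded in 1976.
Thought: I have the answer.
Action: finish[1976]
"""
print(react_prompt)
```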
31
Q

LangChain

A
  • Provides AI developers with tools to connect language models with external data sources
  • Its building blocks include prompt templates (see the sketch below)
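A small prompt-template sketch; the import path assumes a recent LangChain release (the class has moved between langchain and langchain_core over time), and the template text is made up:

```python
# Requires: pip install langchain-core
from langchain_core.prompts import PromptTemplate

template = PromptTemplate.from_template(
    "Answer the question using only this context:\n{context}\n\nQuestion: {question}"
)

prompt = template.format(
    context="Refunds are processed within 5 business days.",
    question="How long do refunds take?",
)
print(prompt)   # the filled-in prompt you would send to the LLM
```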