Terms Flashcards
Natural Language Processing (NLP)
The ability of machines to understand human language, made possible by machine learning. OpenAI’s ChatGPT is a basic example: it can understand your text queries and generate text in response. Another powerful tool that does NLP is OpenAI’s Whisper speech recognition technology, which the company reportedly used to transcribe audio from more than 1 million hours of YouTube videos to help train GPT-4.
Inference
When a generative AI application actually generates something, like ChatGPT responding to a request about how to make chocolate chip cookies by sharing a recipe. This is the work your computer does when you run an AI model locally.
Tokens
Chunks of text, such as words, parts of words, or even individual characters. For example, LLMs will break text into tokens so that they can analyze them, determine how tokens relate to each other, and generate responses. The more tokens a model can process at once (a quantity known as its “context window”), the more sophisticated the results can be.
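A minimal sketch of the idea (real LLM tokenizers use subword schemes such as byte-pair encoding; the splitting rule and context-window size below are made up for illustration):

```python
# Illustrative only: real tokenizers split text into subword pieces, not plain words.
text = "The sky is blue."
tokens = text.lower().replace(".", " .").split()
print(tokens)                    # ['the', 'sky', 'is', 'blue', '.']

context_window = 4               # hypothetical limit on how many tokens fit at once
print(tokens[:context_window])   # only the first 4 tokens would be processed together
```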
Neural Network
A computer architecture that processes data using interconnected nodes, which can be loosely compared to the neurons in a human brain. Neural networks are critical to popular generative AI systems because they can learn to understand complex patterns without explicit programming: for example, training on medical data to be able to make diagnoses.
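A minimal sketch of one layer of such nodes, assuming made-up random weights (a real network learns its weights from data):

```python
import numpy as np

x = np.array([0.5, -1.2, 3.0])      # input features
W = np.random.randn(4, 3) * 0.1     # weights connecting 3 inputs to 4 nodes
b = np.zeros(4)                     # biases, one per node
hidden = np.maximum(0, W @ x + b)   # each node combines its inputs, then applies a ReLU nonlinearity
print(hidden)                       # activations of the 4 nodes
```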
Transformer
Type of neural network architecture that uses an “attention” mechanism to process how parts of a sequence relate to each other.
Consider this input sequence: “What is the color of the sky?” The transformer model uses an internal mathematical representation that captures the relevance of and relationships between the words color, sky, and blue. It uses that knowledge to generate the output: “The sky is blue.”
Not only are transformers very powerful, but they can also be trained faster than other types of neural networks. Since Google researchers published the first paper on transformers in 2017, they’ve become a huge reason why we’re talking about generative AI technologies so much right now. (The T in ChatGPT stands for transformer.)
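Here is a minimal sketch of the attention computation at the heart of a transformer (scaled dot-product self-attention on random toy vectors, not any real model's weights):

```python
import numpy as np

def self_attention(Q, K, V):
    """Each position scores every other position, then takes a weighted average of the values."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])                                # token-to-token relevance scores
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V

seq_len, dim = 7, 16                 # e.g. the 7 tokens of "What is the color of the sky?"
x = np.random.randn(seq_len, dim)    # toy token embeddings
out = self_attention(x, x, x)        # Q, K, V all come from the same sequence (self-attention)
print(out.shape)                     # (7, 16): one context-aware vector per token
```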
Retrieval-Augmented Generation (RAG)
RAG lets the model find and add context from beyond what it was trained on, which can improve the accuracy of what it ultimately generates.
Let’s say you ask an AI chatbot something that, based on its training, it doesn’t actually know the answer to. Without RAG, the chatbot might just hallucinate a wrong answer. With RAG, however, it can check external sources — like, say, other sites on the internet — and use that data to help inform its answer.
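A minimal sketch of the retrieve-then-generate loop; search_external_sources and llm_generate are hypothetical stand-ins for a real retriever and a real model:

```python
def search_external_sources(query):
    # step 1: retrieval, e.g. a web search or vector-database lookup (canned result here)
    return ["Doc: The 2024 conference was held in Vienna."]

def llm_generate(prompt):
    # stand-in for calling an actual LLM
    return f"<model answer conditioned on: {prompt!r}>"

def answer_with_rag(question):
    context = "\n".join(search_external_sources(question))            # fetch outside knowledge
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"  # step 2: add it to the prompt
    return llm_generate(prompt)

print(answer_with_rag("Where was the 2024 conference held?"))
```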
Emergent Behaviors
The ability of models to perform tasks they have not been directly trained for.
Fine Tuning
Adapting a model to solve specific tasks where its out-of-the-box performance is not at the desired level. Requires significantly less data and computational resources than training an LLM from scratch.
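A minimal sketch of the idea using a toy linear model and synthetic data: start from "pretrained" weights and nudge them with a few gradient steps on a small task dataset, instead of training from scratch.

```python
import numpy as np

rng = np.random.default_rng(0)
w_pretrained = rng.normal(size=3)             # stands in for weights learned on general data

X_task = rng.normal(size=(20, 3))             # small task-specific dataset (20 examples)
y_task = X_task @ np.array([1.0, -2.0, 0.5])  # the behavior we want the model to pick up

w = w_pretrained.copy()
for _ in range(100):                          # far fewer updates than full training
    grad = 2 * X_task.T @ (X_task @ w - y_task) / len(X_task)
    w -= 0.05 * grad                          # small learning rate keeps w close to the pretrained weights
print(w)
```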
Prompt Engineering
The art and science of composing the prompt and the parameters of an LLM to get the desired response.
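A minimal sketch contrasting a vague prompt with a more engineered one; the parameter names below (temperature, max_tokens) are common sampling settings, not tied to any specific API:

```python
naive_prompt = "Write about dogs."

engineered_prompt = (
    "You are a veterinary science writer.\n"
    "Write a 3-sentence summary of how dogs communicate, "
    "aimed at a general audience, in a neutral tone."
)

params = {
    "temperature": 0.2,   # lower values make wording more deterministic
    "max_tokens": 120,    # cap the length of the response
}

print(engineered_prompt)
print(params)
```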
Recurrent Neural Networks
RNNs process input and output sequences sequentially. They generate a sequence of hidden states based on the previous hidden state and the current input. The sequential nature of RNNs makes them compute-intensive and hard to parallelize during training (though recent work in state space modeling is attempting to overcome these challenges).
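A minimal sketch of the recurrence, with made-up random weights; note the strictly sequential loop, which is what makes RNNs hard to parallelize:

```python
import numpy as np

def rnn_forward(inputs, W_x, W_h, b):
    h = np.zeros(W_h.shape[0])                 # initial hidden state
    states = []
    for x_t in inputs:                         # one time step at a time, in order
        h = np.tanh(W_x @ x_t + W_h @ h + b)   # new state from current input and previous state
        states.append(h)
    return states

dim_in, dim_h = 4, 8
inputs = [np.random.randn(dim_in) for _ in range(5)]   # a 5-step toy sequence
states = rnn_forward(inputs,
                     np.random.randn(dim_h, dim_in),
                     np.random.randn(dim_h, dim_h),
                     np.zeros(dim_h))
print(len(states), states[-1].shape)                   # 5 hidden states, each of size 8
```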
Transformer
Developed at Google in 2017. Transformers, in contrast to RNNs, are a type of neural network that can process sequences of tokens in parallel thanks to the self-attention mechanism. This means that transformers can better model long-range context and are easier to parallelize than RNNs, which makes them significantly faster to train and more powerful than RNNs at handling long-term dependencies in long-sequence tasks. However, the cost of self-attention in the original transformer is quadratic in the context length, which limits the size of the context, whereas RNNs have a theoretically infinite context length. Transformers have become the most popular approach for sequence modeling and transduction problems in recent years.
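A minimal sketch of where the quadratic cost comes from: every token scores every other token, so the attention matrix has n x n entries for a context of length n (toy random embeddings):

```python
import numpy as np

for n in (128, 256, 512):
    x = np.random.randn(n, 32)            # n toy token embeddings
    scores = x @ x.T                      # attention score matrix: one score per pair of tokens
    print(n, scores.shape, scores.size)   # number of entries grows as n**2
```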
Input vs. Output Embeddings
Input embeddings are used to represent the input tokens to the model. An input embedding is a high-dimensional vector that represents the meaning of each token in the sentence; this embedding is then fed into the transformer for processing. Output embeddings are used to represent the output tokens that the model predicts.
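A minimal sketch with a made-up 5-word vocabulary and random tables (a trained model learns these); the mean-pooled "hidden state" is just a stand-in for the transformer's actual computation:

```python
import numpy as np

vocab = ["<pad>", "the", "sky", "is", "blue"]
d_model = 8

input_embeddings = np.random.randn(len(vocab), d_model)    # token id -> vector fed into the model
output_embeddings = np.random.randn(len(vocab), d_model)   # used to score each vocabulary word at the output

token_ids = [1, 2, 3]                   # "the sky is"
x = input_embeddings[token_ids]         # (3, 8) input vectors

hidden = x.mean(axis=0)                 # stand-in for the model's final hidden state
logits = output_embeddings @ hidden     # one score per vocabulary word
print(vocab[int(np.argmax(logits))])    # the predicted next token
```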
Input, Output, and Hidden Layers
The input layer (e.g., the embedding layer) takes in the tokens and turns them into vectors the network can work with. The hidden layers (e.g., Multi-Head Attention) are between the input and output layers and are where the magic happens! The output layer (e.g., Softmax) is the final layer that produces the output of the network.
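A minimal sketch of what the Softmax output layer does: it turns the network's raw scores (logits) into a probability distribution over possible outputs.

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())   # subtract the max for numerical stability
    return z / z.sum()

logits = np.array([2.0, 0.5, -1.0, 0.1])   # hypothetical scores for 4 candidate tokens
probs = softmax(logits)
print(probs, probs.sum())                  # probabilities that sum to 1
```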
Encoder
Processes the input sequence into a continuous representation that holds contextual information for each token. The input sequence is first normalized, tokenized, and converted into embeddings. Positional encodings are added to these embeddings to retain sequence order information. Through self-attention mechanisms, each token in the sequence can dynamically attend to any other token, thus understanding the contextual relationships within the sequence. The output from the encoder is a series of embedding vectors.
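A minimal sketch of the positional-encoding step, using the sinusoidal scheme from the original transformer paper on toy random embeddings:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]                            # position of each token
    i = np.arange(d_model)[None, :]                              # index of each embedding dimension
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))  # sine on even dims, cosine on odd dims

seq_len, d_model = 6, 16
embeddings = np.random.randn(seq_len, d_model)   # toy token embeddings
encoder_input = embeddings + positional_encoding(seq_len, d_model)
print(encoder_input.shape)                       # (6, 16): one position-aware vector per token
```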
Decoder
The decoder is tasked with generating an output sequence based on the context provided. It operates in a token-by-token fashion, beginning with a start-of-sequence token. The decoder layers employ two types of attention mechanisms: masked self-attention and encoder-decoder cross-attention. Masked self-attention ensures that each position can only attend to earlier positions in the output sequence, preserving the auto-regressive property. This is crucial for preventing the decoder from having access to future tokens in the output sequence. The encoder-decoder cross-attention mechanism allows the decoder to focus on relevant parts of the input sequence, utilizing the contextual embeddings generated by the encoder. This iterative process continues until the decoder predicts an end-of-sequence token, thereby completing the output sequence generation.
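A minimal sketch of the masked self-attention step on toy vectors: a causal mask stops each position from attending to later positions, which is what preserves the auto-regressive, token-by-token order.

```python
import numpy as np

def masked_self_attention(x):
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)                      # token-to-token scores
    mask = np.triu(np.ones((n, n), dtype=bool), k=1)   # True above the diagonal marks "future" tokens
    scores = np.where(mask, -1e9, scores)              # future positions get effectively zero weight
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return weights @ x

x = np.random.randn(5, 8)       # 5 toy output tokens generated so far
out = masked_self_attention(x)
print(out.shape)                # (5, 8): row t only mixes information from tokens 0..t
```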