Terms Flashcards
Natural Language Processing (NLP)
The ability of machines to understand human language, made possible by machine learning. OpenAI’s ChatGPT is a basic example: it can understand your text queries and generate text in response. Another powerful tool that does NLP is OpenAI’s Whisper speech recognition technology, which the company reportedly used to transcribe audio from more than 1 million hours of YouTube videos to help train GPT-4.
Inference
When a generative AI application actually generates something, like ChatGPT responding to a request about how to make chocolate chip cookies by sharing a recipe. This is the work your computer does when you run an AI model locally.
Tokens
Chunks of text, such as words, parts of words, or even individual characters. For example, LLMs will break text into tokens so that they can analyze them, determine how tokens relate to each other, and generate responses. The more tokens a model can process at once (a quantity known as its “context window”), the more sophisticated the results can be.
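A minimal sketch of the idea (real LLM tokenizers use subword schemes such as byte-pair encoding; the splitting rule and context-window size below are made up for illustration):

```python
# Illustrative only: real tokenizers split text into subword pieces, not plain words.
text = "The sky is blue."
tokens = text.lower().replace(".", " .").split()
print(tokens)                    # ['the', 'sky', 'is', 'blue', '.']

context_window = 4               # hypothetical limit on how many tokens fit at once
print(tokens[:context_window])   # only the first 4 tokens would be processed together
```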
Neural Network
A computer architecture that processes data using interconnected nodes, which can be loosely compared to the neurons in a human brain. Neural networks are critical to popular generative AI systems because they can learn to understand complex patterns without explicit programming: for example, training on medical data to be able to make diagnoses.
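A minimal sketch of one layer of such nodes, assuming made-up random weights (a real network learns its weights from data):

```python
import numpy as np

x = np.array([0.5, -1.2, 3.0])      # input features
W = np.random.randn(4, 3) * 0.1     # weights connecting 3 inputs to 4 nodes
b = np.zeros(4)                     # biases, one per node
hidden = np.maximum(0, W @ x + b)   # each node combines its inputs, then applies a ReLU nonlinearity
print(hidden)                       # activations of the 4 nodes
```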
Transformer
Type of neural network architecture that uses an “attention” mechanism to process how parts of a sequence relate to each other.
Consider this input sequence: “What is the color of the sky?” The transformer model uses an internal mathematical representation that captures the relevance of and relationships between the words color, sky, and blue. It uses that knowledge to generate the output: “The sky is blue.”
Not only are transformers very powerful, but they can also be trained faster than other types of neural networks. Since Google researchers published the first paper on transformers in 2017, they’ve become a huge reason why we’re talking about generative AI technologies so much right now. (The T in ChatGPT stands for transformer.)
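Here is a minimal sketch of the attention computation at the heart of a transformer (scaled dot-product self-attention on random toy vectors, not any real model's weights):

```python
import numpy as np

def self_attention(Q, K, V):
    """Each position scores every other position, then takes a weighted average of the values."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])                                # token-to-token relevance scores
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V

seq_len, dim = 7, 16                 # e.g. the 7 tokens of "What is the color of the sky?"
x = np.random.randn(seq_len, dim)    # toy token embeddings
out = self_attention(x, x, x)        # Q, K, V all come from the same sequence (self-attention)
print(out.shape)                     # (7, 16): one context-aware vector per token
```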
Retrieval-Augmented Generation (RAG)
RAG lets the model find and add context from beyond what it was trained on, which can improve the accuracy of what it ultimately generates.
Let’s say you ask an AI chatbot something that, based on its training, it doesn’t actually know the answer to. Without RAG, the chatbot might just hallucinate a wrong answer. With RAG, however, it can check external sources — like, say, other sites on the internet — and use that data to help inform its answer.
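A minimal sketch of the retrieve-then-generate loop; search_external_sources and llm_generate are hypothetical stand-ins for a real retriever and a real model:

```python
def search_external_sources(query):
    # step 1: retrieval, e.g. a web search or vector-database lookup (canned result here)
    return ["Doc: The 2024 conference was held in Vienna."]

def llm_generate(prompt):
    # stand-in for calling an actual LLM
    return f"<model answer conditioned on: {prompt!r}>"

def answer_with_rag(question):
    context = "\n".join(search_external_sources(question))            # fetch outside knowledge
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"  # step 2: add it to the prompt
    return llm_generate(prompt)

print(answer_with_rag("Where was the 2024 conference held?"))
```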
Emergent Behaviors
The ability of models to perform tasks they have not been directly trained for.
Fine Tuning
Adapting a model to solve specific tasks where its out-of-the-box performance is not at the desired level. Requires significantly less data and computational resources than training an LLM from scratch.
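A minimal sketch of the idea using a toy linear model and synthetic data: start from "pretrained" weights and nudge them with a few gradient steps on a small task dataset, instead of training from scratch.

```python
import numpy as np

rng = np.random.default_rng(0)
w_pretrained = rng.normal(size=3)             # stands in for weights learned on general data

X_task = rng.normal(size=(20, 3))             # small task-specific dataset (20 examples)
y_task = X_task @ np.array([1.0, -2.0, 0.5])  # the behavior we want the model to pick up

w = w_pretrained.copy()
for _ in range(100):                          # far fewer updates than full training
    grad = 2 * X_task.T @ (X_task @ w - y_task) / len(X_task)
    w -= 0.05 * grad                          # small learning rate keeps w close to the pretrained weights
print(w)
```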
Prompt Engineering
The art and science of composing the prompt and the parameters of an LLM to get the desired response.
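A minimal sketch contrasting a vague prompt with a more engineered one; the parameter names below (temperature, max_tokens) are common sampling settings, not tied to any specific API:

```python
naive_prompt = "Write about dogs."

engineered_prompt = (
    "You are a veterinary science writer.\n"
    "Write a 3-sentence summary of how dogs communicate, "
    "aimed at a general audience, in a neutral tone."
)

params = {
    "temperature": 0.2,   # lower values make wording more deterministic
    "max_tokens": 120,    # cap the length of the response
}

print(engineered_prompt)
print(params)
```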
Recurrent Neural Networks
RNNs process input and output sequences sequentially. They generate a sequence of hidden states based on the previous hidden state and the current input. The sequential nature of RNNs makes them compute-intensive and hard to parallelize during training (though recent work in state space modeling is attempting to overcome these challenges).
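A minimal sketch of the recurrence, with made-up random weights; note the strictly sequential loop, which is what makes RNNs hard to parallelize:

```python
import numpy as np

def rnn_forward(inputs, W_x, W_h, b):
    h = np.zeros(W_h.shape[0])                 # initial hidden state
    states = []
    for x_t in inputs:                         # one time step at a time, in order
        h = np.tanh(W_x @ x_t + W_h @ h + b)   # new state from current input and previous state
        states.append(h)
    return states

dim_in, dim_h = 4, 8
inputs = [np.random.randn(dim_in) for _ in range(5)]   # a 5-step toy sequence
states = rnn_forward(inputs,
                     np.random.randn(dim_h, dim_in),
                     np.random.randn(dim_h, dim_h),
                     np.zeros(dim_h))
print(len(states), states[-1].shape)                   # 5 hidden states, each of size 8
```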
Transformer
Developed at Google in 2017. Transformers, in contrast to RNNs, are a type of neural network that can process sequences of tokens in parallel thanks to the self-attention mechanism. This means that transformers can better model long-range context and are easier to parallelize than RNNs, which makes them significantly faster to train and more powerful than RNNs at handling long-term dependencies in long-sequence tasks. However, the cost of self-attention in the original transformer is quadratic in the context length, which limits the size of the context, whereas RNNs have a theoretically infinite context length. Transformers have become the most popular approach for sequence modeling and transduction problems in recent years.
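A minimal sketch of where the quadratic cost comes from: every token scores every other token, so the attention matrix has n x n entries for a context of length n (toy random embeddings):

```python
import numpy as np

for n in (128, 256, 512):
    x = np.random.randn(n, 32)            # n toy token embeddings
    scores = x @ x.T                      # attention score matrix: one score per pair of tokens
    print(n, scores.shape, scores.size)   # number of entries grows as n**2
```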
Input vs. Output Embeddings
Input embeddings are used to represent the input tokens to the model. An input embedding is a high-dimensional vector that represents the meaning of each token in the sentence; this embedding is then fed into the transformer for processing. Output embeddings are used to represent the output tokens that the model predicts.
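A minimal sketch with a made-up 5-word vocabulary and random tables (a trained model learns these); the mean-pooled "hidden state" is just a stand-in for the transformer's actual computation:

```python
import numpy as np

vocab = ["<pad>", "the", "sky", "is", "blue"]
d_model = 8

input_embeddings = np.random.randn(len(vocab), d_model)    # token id -> vector fed into the model
output_embeddings = np.random.randn(len(vocab), d_model)   # used to score each vocabulary word at the output

token_ids = [1, 2, 3]                   # "the sky is"
x = input_embeddings[token_ids]         # (3, 8) input vectors

hidden = x.mean(axis=0)                 # stand-in for the model's final hidden state
logits = output_embeddings @ hidden     # one score per vocabulary word
print(vocab[int(np.argmax(logits))])    # the predicted next token
```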
Input, Output, and Hidden Layers
The input layer (e.g., the embedding layer) takes in the tokens and turns them into vectors the network can work with. The hidden layers (e.g., Multi-Head Attention) are between the input and output layers and are where the magic happens! The output layer (e.g., Softmax) is the final layer that produces the output of the network.
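A minimal sketch of what the Softmax output layer does: it turns the network's raw scores (logits) into a probability distribution over possible outputs.

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())   # subtract the max for numerical stability
    return z / z.sum()

logits = np.array([2.0, 0.5, -1.0, 0.1])   # hypothetical scores for 4 candidate tokens
probs = softmax(logits)
print(probs, probs.sum())                  # probabilities that sum to 1
```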
Encoder
Processes the input sequence into a continuous representation that holds contextual information for each token. The input sequence is first normalized, tokenized, and converted into embeddings. Positional encodings are added to these embeddings to retain sequence order information. Through self-attention mechanisms, each token in the sequence can dynamically attend to any other token, thus understanding the contextual relationships within the sequence. The output from the encoder is a series of embedding vectors.
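A minimal sketch of the positional-encoding step, using the sinusoidal scheme from the original transformer paper on toy random embeddings:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]                            # position of each token
    i = np.arange(d_model)[None, :]                              # index of each embedding dimension
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))  # sine on even dims, cosine on odd dims

seq_len, d_model = 6, 16
embeddings = np.random.randn(seq_len, d_model)   # toy token embeddings
encoder_input = embeddings + positional_encoding(seq_len, d_model)
print(encoder_input.shape)                       # (6, 16): one position-aware vector per token
```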
Decoder
The decoder is tasked with generating an output sequence based on the context provided. It operates in a token-by-token fashion, beginning with a start-of-sequence token. The decoder layers employ two types of attention mechanisms: masked self-attention and encoder-decoder cross-attention. Masked self-attention ensures that each position can only attend to earlier positions in the output sequence, preserving the auto-regressive property. This is crucial for preventing the decoder from having access to future tokens in the output sequence. The encoder-decoder cross-attention mechanism allows the decoder to focus on relevant parts of the input sequence, utilizing the contextual embeddings generated by the encoder. This iterative process continues until the decoder predicts an end-of-sequence token, thereby completing the output sequence generation.
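A minimal sketch of the masked self-attention step on toy vectors: a causal mask stops each position from attending to later positions, which is what preserves the auto-regressive, token-by-token order.

```python
import numpy as np

def masked_self_attention(x):
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)                      # token-to-token scores
    mask = np.triu(np.ones((n, n), dtype=bool), k=1)   # True above the diagonal marks "future" tokens
    scores = np.where(mask, -1e9, scores)              # future positions get effectively zero weight
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return weights @ x

x = np.random.randn(5, 8)       # 5 toy output tokens generated so far
out = masked_self_attention(x)
print(out.shape)                # (5, 8): row t only mixes information from tokens 0..t
```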