Models Flashcards

1
Q

Tokens

A

Tokens are chunks of text, or pieces of an image, encoded as vectors (ordered lists of numbers). LLMs use tokens to parse the given input and generate an output. Each model has a maximum number of tokens it can process, and the cost of using each model is based on the number of tokens in the input prompt and the generated output.

Many words map to single tokens, though longer or less common words often break down into multiple tokens. On average, a token is roughly four characters of English text; short, common words tend to get a token of their own, while longer or uncommon words are split into smaller pieces.

Languages like English and Spanish generally map words to tokens roughly 1:1. However, languages like Korean, Arabic, or German tend to have more tokens per word due to their grammar and morphology, so the same request might be 50% to 100% more expensive than its English equivalent, depending on the language you're using.

A general rule is that about 750 words equals 1000 tokens.
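
To make this concrete, here is a minimal sketch of tokenization using OpenAI's tiktoken library (chosen here for illustration; other models ship their own tokenizers, and exact counts vary by encoding):

```python
# Minimal tokenization sketch with tiktoken (pip install tiktoken).
# "cl100k_base" is the encoding used by GPT-3.5/GPT-4-era models.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "Tokenization splits text into chunks the model can process."
tokens = enc.encode(text)

print(len(tokens))          # how many tokens you would be billed for
print(tokens[:5])           # token IDs: the integers the model actually sees
print(enc.decode(tokens))   # decoding round-trips back to the original text
```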

2
Q

Foundation (aka “Base”) Model

A

A model that can be given additional capabilities to perform a variety of downstream user-facing tasks, such as chat, enabling more efficient creation of AI applications. It can apply knowledge gained from its broad training to new, specific tasks through fine-tuning or prompt engineering (see the sketch below), and it can serve as a starting point for developing more specialized downstream applications, saving time and resources compared to training models from scratch.

Examples: GPT-3, GPT-4, BERT, Llama 2, DALL-E, and Stable Diffusion.
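
As a hedged sketch of that reuse, the snippet below loads a pretrained base model via the Hugging Face transformers library and attaches a small task-specific classification head; the model name (bert-base-uncased) and the two-label setup are assumptions for illustration, not a prescribed recipe:

```python
# Start from a pretrained base model instead of training from scratch.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",  # reuses the base model's broad pretrained knowledge
    num_labels=2,         # adds a small, randomly initialized task head
)

inputs = tokenizer("This movie was great!", return_tensors="pt")
logits = model(**inputs).logits
print(logits)  # meaningless until the head is fine-tuned on labeled data
```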

3
Q

Multi-Modal

A

A model that can operate in multiple modalities, such as text, images, video, and audio.
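
For instance, a multimodal chat model can take text and an image in a single request. The sketch below uses the OpenAI Python SDK's chat-completions format; the model name, the image URL, and an OPENAI_API_KEY in the environment are assumptions for illustration:

```python
# Text + image in one request to a multimodal model (pip install openai).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # an example multimodal model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```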

4
Q

Diffusion

A

Image generation technique in which noise is gradually added to a training image, with each increasingly noisy version serving as a data point for learning to run the process in reverse, starting from pure noise. In other words, a model is trained by first adding noise, such as static, to an image and then learning to reverse that process so it can produce a clear image. There are also diffusion models that work with audio and video.
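
Here is a minimal sketch of the forward (noising) half of that process, assuming a simple DDPM-style linear noise schedule and a toy array standing in for real pixel data:

```python
# Forward diffusion sketch: progressively drown a toy "image" in Gaussian noise.
import numpy as np

rng = np.random.default_rng(0)
image = rng.uniform(-1, 1, size=(8, 8))  # toy image, pixel values in [-1, 1]

T = 1000                                  # number of noising steps
betas = np.linspace(1e-4, 0.02, T)        # per-step noise variance schedule
alpha_bars = np.cumprod(1.0 - betas)      # cumulative fraction of signal kept

def noisy_at_step(x0, t):
    """Jump straight to step t: x_t = sqrt(a_bar)*x0 + sqrt(1 - a_bar)*noise."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1 - alpha_bars[t]) * noise

# Early steps still resemble the image; by t = T-1 it is nearly pure static.
for t in (0, 250, 999):
    x_t = noisy_at_step(image, t)
    print(f"t={t:4d}  fraction of original signal kept: {alpha_bars[t]:.4f}")
```

Training then teaches a network to undo one of these noise steps at a time; chaining the learned denoising steps backward from pure noise is what generates a new image.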

5
Q

Frontier Model

A

An advanced AI system that surpasses the capabilities of existing models, enabling it to perform a wide array of tasks. Frontier models are typically large-scale machine-learning systems trained on extensive datasets, allowing them to generalize across various applications.

The term is often used interchangeably with “foundation model,” though distinctions exist. While both are large-scale AI systems capable of general-purpose tasks, frontier models represent the cutting edge, embodying the latest advancements in AI technology. They are designed to handle more complex tasks and exhibit emergent behaviors not present in earlier models.

Due to their advanced capabilities, frontier models can pose unique challenges, including potential risks to public safety if misused. This has led to discussions about the need for regulatory frameworks to manage their development and deployment responsibly.

In practice, the term is also used by AI companies as a marketing label for their unreleased future models. Theoretically, these models could be far more powerful than the AI models available today, though there are also concerns that they could pose significant risks.

6
Q

Foundation Models

A

These generative AI models are trained on a huge amount of data and, as a result, can be the foundation for a wide variety of applications without specific training for those tasks. OpenAI’s GPT, Google’s Gemini, Meta’s Llama, and Anthropic’s Claude are all examples of foundation models.

7
Q

Multimodal

A

Can process multiple types of data, such as text, images, and video.

8
Q

Training

A

Process by which AI models learn to understand data by analyzing datasets so they can make predictions and recognize patterns. For example, large language models have been trained by “reading” vast amounts of text. That means that when AI tools like ChatGPT respond to your queries, they can “understand” what you are saying and generate answers that sound like natural human language and address the substance of your query.

Training often requires significant resources and computing power, and many companies rely on powerful GPUs to handle it. AI models can be fed different types of data, typically in vast quantities, such as text, images, music, and video. This is, logically enough, known as training data.
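
As a toy illustration of the training loop itself, the sketch below fits a tiny PyTorch model to synthetic data; training an LLM is the same loop at enormously larger scale:

```python
# Toy "training": repeatedly nudge a model's parameters to better fit the data.
import torch

x = torch.linspace(-1, 1, 100).unsqueeze(1)  # training inputs
y = 2 * x + 1                                 # training targets: y = 2x + 1

model = torch.nn.Linear(1, 1)                 # one weight, one bias
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

for step in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)               # how wrong are the predictions?
    loss.backward()                           # compute gradients
    optimizer.step()                          # adjust parameters to reduce loss

print(model.weight.item(), model.bias.item())  # converges toward 2.0 and 1.0
```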

9
Q

Parameters

A

Variables an AI model learns as part of its training: the numbers inside a model that determine how an input (e.g., a chunk of prompt text) is converted into an output (e.g., the next word after the prompt). The process of “training” an AI model consists of using mathematical optimization techniques to tweak the model’s parameter values over and over again until the model is very good at converting inputs to outputs.

Companies sometimes boast about how many parameters a model has as a way to demonstrate that model’s complexity.
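
As a concrete sketch of what a parameter count means, the snippet below tallies the learnable values in a small PyTorch network; the layer sizes are arbitrary stand-ins for a real model's:

```python
# Parameters are the learnable numbers inside a model; here we count them.
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(512, 2048),  # weights: 512*2048, biases: 2048
    torch.nn.ReLU(),             # no parameters
    torch.nn.Linear(2048, 512),  # weights: 2048*512, biases: 512
)

total = sum(p.numel() for p in model.parameters())
print(f"{total:,} parameters")   # 2,099,712, i.e. a "2M-parameter" model
```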

10
Q

GPT-1

A

GPT-1 (Generative Pre-trained Transformer, version 1) was a decoder-only model developed by OpenAI in 2018. It was trained on the BooksCorpus dataset (roughly one billion words) and is able to generate text, translate languages, write different kinds of creative content, and answer questions in an informative way.
