Foundation Models Flashcards

1
Q

What are Foundation Models?

A

A large AI model trained on a wide variety of data that can perform many different tasks without much extra training.

2
Q

What is the difference between Foundation Models and Traditional Models?

A

Foundation Models are built on large, diverse datasets and can adapt to perform well on many different tasks.
Examples: GPT, Bard, LLaMA, BERT, DALL-E

In contrast, Traditional Models specialize in specific tasks by learning from smaller, focused datasets, making them more straightforward and efficient for targeted applications.
Examples: linear regression, decision trees, CNNs
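
To make the contrast concrete, here is a minimal sketch, assuming scikit-learn and the Hugging Face transformers library are installed; the texts, labels, and test sentence are made up, and pipeline() downloads a default pretrained sentiment model on first use:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from transformers import pipeline

# Tiny, task-specific dataset for the traditional model (made-up examples).
texts = ["I loved this movie", "Terrible, a waste of time"]
labels = [1, 0]

# Traditional model: trained from scratch on the small, focused dataset.
vectorizer = TfidfVectorizer()
clf = LogisticRegression().fit(vectorizer.fit_transform(texts), labels)
print(clf.predict(vectorizer.transform(["What a great film"])))

# Foundation model: pretrained on broad data, reused here with no training at all.
sentiment = pipeline("sentiment-analysis")
print(sentiment("What a great film"))
```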

3
Q

What is a common architecture for Gen AI? What are its key features?

A

In modern language models, the predominant architecture is called a Transformer.

Key Features of the Transformer Architecture

1. Self-Attention Mechanism
* Definition: The model “looks” at every position in a sequence and decides how important each other position is when producing an output.
* Benefit: It allows the model to capture relationships between words or tokens without reading the sequence purely in order (left-to-right or right-to-left).
* Example: In a sentence like “The cat, which was hungry, jumped over the fence,” the word “hungry” has a strong relationship to “cat,” even though they’re separated by commas and other words. Self-attention allows the model to link “cat” and “hungry” directly.

2. Parallelization
* Definition: Unlike older models (e.g., RNNs or LSTMs) that read tokens one at a time, Transformers process multiple tokens in parallel.
* Benefit: Training is much faster and more scalable on modern hardware (GPUs, TPUs).
* Comparison: A classic RNN has to iterate through each word step by step. A Transformer can process them all at once in a “matrix multiplication” style, speeding up training.

3. Layers and Blocks
* Definition: Transformers stack multiple “layers” or “blocks” of self-attention and feed-forward sub-layers.
* Benefit: Each layer refines what the previous one learned, so deeper models (with many layers) can capture more sophisticated patterns.
* Example: GPT-based models or LLaMA might have dozens of Transformer blocks stacked on top of each other, each block learning increasingly abstract patterns (e.g., from understanding simple word adjacency at the lower layers to complex semantic relationships at higher layers).
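
A minimal NumPy sketch of single-head scaled dot-product self-attention, the mechanism described above. The random embeddings and projection matrices stand in for learned weights, and a real Transformer adds multiple heads, residual connections, layer normalization, and feed-forward sub-layers:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # Project each token vector to query, key, and value vectors.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Every position scores every other position: an n x n attention map.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Softmax over positions turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted mix of all value vectors; every position
    # is computed at once via matrix multiplication (the parallelism above).
    return weights @ V

rng = np.random.default_rng(0)
d = 8
tokens = rng.normal(size=(5, d))  # 5 tokens, d-dimensional embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
print(self_attention(tokens, Wq, Wk, Wv).shape)  # (5, 8)
```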

4
Q

What is the GLUE Benchmark?

A

GLUE (General Language Understanding Evaluation) is an evaluation benchmark designed to measure the performance of language understanding models across a range of natural language processing (NLP) tasks. It provides a standardized set of diverse NLP tasks, allowing researchers and practitioners to evaluate and compare the effectiveness of different language models.

GLUE consists of a collection of nine representative NLP tasks, including sentence classification, sentiment analysis, and question answering. Each task in the benchmark comes with a training set, a development set for fine-tuning the models, and an evaluation set for testing the performance of the models. Participants in the benchmark can submit their models and evaluate their performance on the GLUE leaderboard, which tracks the progress and advancements in language understanding.
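
A minimal sketch of loading one GLUE task, assuming the Hugging Face datasets library is installed; SST-2 (sentiment classification) is one of the nine tasks:

```python
from datasets import load_dataset

# Each GLUE task ships with train / validation / test splits.
sst2 = load_dataset("glue", "sst2")
print(sst2["train"][0])  # e.g. {'sentence': ..., 'label': 0, 'idx': 0}
```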

5
Q

What is the SuperGLUE benchmark?

A

SuperGLUE is designed as a successor to the original GLUE benchmark. It’s a more advanced benchmark aimed at presenting even more challenging language understanding tasks for AI models.
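
The same loading pattern as the GLUE sketch above works here, assuming the datasets library's super_glue configuration; BoolQ (yes/no question answering) is one of its tasks:

```python
from datasets import load_dataset

# BoolQ: yes/no question answering over a short passage.
boolq = load_dataset("super_glue", "boolq")
print(boolq["train"][0].keys())  # question, passage, label, idx
```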

6
Q

What kind of data is used for training LLMs? Why is this diversity important?

A

Data Used for Training LLMs
* Websites
* Scientific Papers
* Encyclopedias
* Books and Literature
* Conversational Data
* Social Media Posts
* Legal Documents
* Multilingual Texts

This diversity helps the model understand and generate text across various topics and styles.
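
As a sketch of how that diversity is realized in practice, training pipelines typically sample documents from different sources in fixed proportions. The source names and mixture weights below are purely illustrative, not taken from any real model:

```python
import random

# Illustrative data mixture: each source gets a fixed sampling weight.
sources = {
    "web_pages": 0.60,
    "books": 0.15,
    "scientific_papers": 0.10,
    "conversations": 0.10,
    "legal_documents": 0.05,
}

def sample_source():
    """Pick the source of the next training document, weighted by the mixture."""
    names, weights = list(sources), list(sources.values())
    return random.choices(names, weights=weights, k=1)[0]

print([sample_source() for _ in range(5)])
```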

7
Q

What is Common Crawl?

A

Common Crawl maintains a free, open repository of web crawl data that can be used by anyone.
Common Crawl is a 501(c)(3) non-profit founded in 2007.

They make wholesale extraction, transformation and analysis of open web data accessible to researchers.
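
A minimal sketch of querying Common Crawl's public CDX index API for captures of a URL; the crawl ID below is an example and changes with each new crawl:

```python
import requests

# Ask the CDX index which captures exist for a given URL.
resp = requests.get(
    "https://index.commoncrawl.org/CC-MAIN-2024-10-index",
    params={"url": "commoncrawl.org", "output": "json"},
    timeout=30,
)
for line in resp.text.splitlines()[:3]:
    print(line)  # one JSON record per capture: timestamp, WARC file, offset, ...
```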

8
Q

How could one best describe the data in the Common Crawl dataset?

A

Unstructured and noisy. The Common Crawl corpus contains webpages in their original form, and these pages have not been filtered for things like spam.
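
To make “noisy” concrete, here is a sketch of the kind of crude line-level filters often applied to raw web text before it is used for training. The thresholds and example lines are illustrative, not Common Crawl's or any specific pipeline's:

```python
def keep_line(line: str) -> bool:
    words = line.split()
    if len(words) < 5:            # drop menus, buttons, and fragments
        return False
    alpha = sum(c.isalpha() for c in line)
    if alpha / len(line) < 0.5:   # drop markup- or symbol-heavy lines
        return False
    return True

raw = ["Home | Login", "The cat jumped over the fence today.", "<div class='ad'>"]
print([line for line in raw if keep_line(line)])
```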

9
Q

What is Project Gutenberg?

A

Project Gutenberg is a library of over 70,000 free eBooks.
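
A minimal sketch of fetching one of those books as plain text; ebook #1342 is Pride and Prejudice, and the URL follows Gutenberg's plain-text cache pattern:

```python
import requests

# Fetch a single public-domain book as plain text.
url = "https://www.gutenberg.org/cache/epub/1342/pg1342.txt"
text = requests.get(url, timeout=30).text
print(text[:200])
```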

10
Q

What is one ethical risk associated with using foundation models?

A

They can be misused for creating deepfakes or phishing content.

11
Q

Why are foundation models considered a “black box”?

A

Because they lack explainability, making it hard to understand how decisions or outputs are generated.

12
Q

How can foundation models impact privacy?

A

They might inadvertently expose sensitive data if such data was part of the training dataset.

13
Q

What is an environmental concern related to foundation models?

A

Training and deploying large models have a high carbon footprint due to significant energy use.

14
Q

How do hallucinations in foundation models pose risks?

A

They can generate factually incorrect or fabricated information that appears plausible.
