Intro to LLMs Flashcards

1
Q

What is the Turing Test?

A

The Turing Test, proposed by Alan Turing in 1950, is a test to determine if a machine can exhibit human-like intelligence by engaging in a conversation where a judge cannot distinguish it from a human.

2
Q

What is NLP?

A

Natural Language Processing: a field of AI that enables computers to understand, interpret, and generate human language.

3
Q

Define Corpus.

A

A large collection of text data used to train, fine-tune, or evaluate a language model in NLP.

4
Q

What is an Autoregressive Statistical Model?

A

A model architecture where each token is generated sequentially by predicting the next token based on previously generated tokens.
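A minimal Python sketch of the autoregressive loop described above: each token is sampled conditioned on the previous one. The probability table here is made up purely for illustration.

```python
import random

# Hypothetical next-token distributions (toy data, for illustration only).
NEXT = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a":   {"cat": 0.5, "dog": 0.5},
    "cat": {"sleeps": 1.0},
    "dog": {"sleeps": 1.0},
    "sleeps": {"</s>": 1.0},
}

def generate(seed=0):
    """Generate one token at a time, each conditioned on the previous token."""
    rng = random.Random(seed)
    token, out = "<s>", []
    while token != "</s>":
        dist = NEXT[token]
        token = rng.choices(list(dist), weights=list(dist.values()))[0]
        if token != "</s>":
            out.append(token)
    return " ".join(out)

print(generate())  # e.g. "the cat sleeps"
```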

5
Q

What is a Count-based Language Model?

A

A statistical model that predicts the next word in a sequence by analyzing word frequency and co-occurrence in a corpus, often using n-grams and probability distributions.

6
Q

What does P(do | what) represent in language modeling?

A

It represents the conditional probability of the word “do” occurring given that the previous word was “what”.

This is commonly used in n-gram models and probabilistic language modeling to predict the next word based on prior context.
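This estimate can be computed directly from counts; a sketch using a made-up toy corpus:

```python
from collections import Counter

# Tiny hypothetical corpus, just to show the arithmetic.
corpus = "what do you mean what do we know what is that".split()

bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

# Maximum-likelihood estimate: count("what do") / count("what")
p = bigrams[("what", "do")] / unigrams["what"]
print(p)  # 2 of the 3 occurrences of "what" are followed by "do" -> ~0.667
```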

7
Q

Define Generalization.

A

Generalization is the ability of a model to adapt to new and unseen data.

In the context of text generation, it means a language model can produce text that doesn't exist in its training data.

Ex: A model has seen the sentence "The cat jumps on the bed." It can then reasonably produce "The dog jumps on a couch," even though it has never seen that exact sentence.

8
Q

What is Semantic Language Model?

A

A Semantic Language Model understands and generates text based on meaning rather than just statistical word relationships.

It captures context, synonyms, and relationships between words to produce more accurate and natural language responses.

9
Q

What are Word Embeddings?

A

Word Embeddings are numerical vector representations of words that capture their meaning based on context.

They allow machine learning models to understand relationships between words, such as similarity and analogy, by placing them in a multi-dimensional space where similar words are closer together.
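The "closer together" idea can be measured with cosine similarity. A sketch with made-up 3-dimensional vectors (real embeddings have hundreds of dimensions):

```python
import math

# Hypothetical toy embeddings, for illustration only.
emb = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine similarity: near 1.0 for similar directions, near 0.0 for unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

print(cosine(emb["cat"], emb["dog"]))  # high: related meanings
print(cosine(emb["cat"], emb["car"]))  # lower: unrelated meanings
```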

10
Q

What is a Neural Network?

A

A Neural Network is a computational model inspired by the human brain, consisting of layers of interconnected nodes (neurons).

It processes data through weighted connections and activation functions, making it useful for tasks like pattern recognition, language modeling, and decision-making in AI.
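The "weighted connections and activation functions" part can be shown with a single neuron. The weights below are arbitrary; in practice they are learned from data.

```python
import math

def neuron(inputs, weights, bias):
    """One neuron: weighted sum of inputs, passed through a sigmoid activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-z))  # sigmoid squashes the output into (0, 1)

# Hypothetical weights and bias, purely for illustration.
out = neuron([1.0, 0.5], [0.4, -0.2], 0.1)
print(out)
```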

11
Q

What is a Neural Language Model?

A

A Neural Language Model is an AI model that predicts and generates text using neural networks.

It learns statistical patterns in language through deep learning techniques, such as recurrent or transformer-based architectures, to improve tasks like speech recognition, translation, and text generation.

12
Q

The phrase “non-zero probability” means:

A

“It is possible that…”

13
Q

The phrase “zero probability” means:

A

“It is impossible that…”

14
Q

Define Compression.

A

Compression occurs when a language model mistakenly treats a valid sentence as impossible because it hasn’t seen it before.

15
Q

What is Machine Learning?

A

ML refers to a class of algorithms that learn patterns from vast amounts of data to perform tasks like prediction, inference, and generation.

16
Q

What is Deep Learning?

A

Deep Learning is a subset of machine learning methods that use many-layered or deep neural networks.

17
Q

What is a GPT?

A

Generative Pre-Trained Transformers are a class of LLMs that use the transformer architecture to learn from vast text corpora, enabling them to generate coherent “human-sounding” text.

18
Q

Define Temperature.

A

Temperature is a parameter that controls the randomness of an LLM’s responses.

Low Temp (ex. 0.2) = More Deterministic, choosing high-probability words. “The sky is blue.”

High Temp (ex. 1.0 or higher) = More Creative, allowing diverse or unexpected words. “The sky is a vast canvas of shifting hues.”
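Temperature works by scaling the model’s scores before they are turned into probabilities. A sketch with made-up scores (the softmax-with-temperature math, not any particular model’s API):

```python
import math

def softmax_with_temperature(logits, temp):
    """Divide logits by temp before softmax; low temp sharpens the distribution."""
    scaled = [x / temp for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical next-token scores
print(softmax_with_temperature(logits, 0.2))  # near one-hot: deterministic
print(softmax_with_temperature(logits, 1.0))  # flatter: more diverse sampling
```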

19
Q

How do Count-Based Language Models Work?

A

Uses statistics from a large text corpus to predict words.

Relies on n-grams (fixed sequences of words) to estimate probabilities.

Uses frequency counts to determine the likelihood of a word appearing after another.

Ex. A phrasebook - it only knows exact phrases it has seen. If you say something slightly different, it gets confused.
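The phrasebook limitation can be shown in a few lines: a perfectly valid but unseen bigram gets probability zero. The corpus is a made-up toy example.

```python
from collections import Counter

# Toy corpus for illustration only.
corpus = "the cat jumps on the bed".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def bigram_prob(w1, w2):
    """MLE bigram probability: count(w1 w2) / count(w1)."""
    return bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0

print(bigram_prob("the", "cat"))  # seen in the corpus: 0.5
print(bigram_prob("the", "dog"))  # never seen: 0.0, even though it's valid English
```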

20
Q

How do Neural Language Models (Transformers, GPT) work?

A

Uses deep learning (neural networks) to model relationships between words.

Doesn’t rely on explicit frequency counts but learns patterns in language.

Can capture context better, even for unseen words, by understanding meaning through embeddings.

Ex. A person who has studied a language deeply - able to understand new phrases by recognizing patterns and context.