Lecture 3 Flashcards
What is Language Modeling?
Language modeling is the task of predicting the next word in a sequence from the preceding words (equivalently, assigning probabilities to word sequences); it underlies NLP tasks such as text generation and machine translation.
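As a worked equation (the standard chain-rule formulation, stated here for reference rather than quoted from the lecture), a language model scores a whole sequence by multiplying next-word probabilities:

P(w_1, \ldots, w_N) = \prod_{i=1}^{N} P(w_i \mid w_1, \ldots, w_{i-1})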
Describe Causal Language Modeling.
Causal language modeling predicts each token from only the tokens that precede it, generating a sequence incrementally from left to right, much like incremental human language processing.
What is an N-Gram Model?
An N-Gram model uses fixed-length word sequences (e.g., bigrams, trigrams) to estimate the probability of a word given the previous n−1 words.
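A minimal illustration of how such probabilities are typically estimated from corpus counts (the maximum-likelihood estimate for a bigram; the count notation C(·) is assumed, not taken from the slides):

P(w_i \mid w_{i-1}) = \frac{C(w_{i-1}\, w_i)}{C(w_{i-1})}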
Define the Markov Assumption in language modeling.
The Markov assumption limits the dependency of a word to only the previous n−1 words, simplifying the computation of probabilities in N-Gram models.
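In symbols (standard formulation, assuming an n-gram model):

P(w_i \mid w_1, \ldots, w_{i-1}) \approx P(w_i \mid w_{i-n+1}, \ldots, w_{i-1})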
What is Smoothing in N-Gram models?
Smoothing is a technique to handle unseen N-Grams by assigning a small probability instead of zero, improving model robustness.
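One simple variant, shown as an illustrative example rather than the lecture's specific method, is add-one (Laplace) smoothing for a bigram model with vocabulary size |V|:

P_{\text{add-1}}(w_i \mid w_{i-1}) = \frac{C(w_{i-1}\, w_i) + 1}{C(w_{i-1}) + |V|}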
What is Zipf’s Law?
Zipf’s Law states that in a language, a few words are very common, while most words are rare, leading to a “long tail” distribution of word frequency.
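Stated roughly as a formula (an idealized form; the exponent is usually taken to be close to 1): a word's frequency f is inversely proportional to its frequency rank r,

f(r) \propto \frac{1}{r^{\alpha}}, \quad \alpha \approx 1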
How is Perplexity used in language models?
Perplexity measures a model’s ability to predict a sample sequence; lower perplexity indicates better predictive accuracy.
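For reference (standard definition, assuming a test sequence W = w_1, \ldots, w_N):

\text{PPL}(W) = P(w_1, \ldots, w_N)^{-1/N}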
How is Cross-Entropy related to Perplexity?
Cross-entropy measures the average number of bits a model needs to predict each token of a corpus; perplexity is the exponentiation of cross-entropy (e.g., 2 raised to the per-token cross-entropy), representing the model's uncertainty in prediction.
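The relationship in symbols (standard base-2 formulation; the choice of base is an assumption, as the natural logarithm with base e is equally common):

H(W) = -\frac{1}{N} \sum_{i=1}^{N} \log_2 P(w_i \mid w_1, \ldots, w_{i-1}), \qquad \text{PPL}(W) = 2^{H(W)}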
What are Word Embeddings?
Word embeddings are dense vector representations of words, capturing semantic similarity by positioning similar words closer in vector space.
What are the two main training setups in word2vec?
CBOW (Continuous Bag of Words), which predicts a target word from context, and Skip-gram, which predicts context words from a target word.
How does CBOW differ from Skip-gram in word2vec?
CBOW predicts a word from its context, ideal for frequent words; Skip-gram predicts surrounding words from a target word, better for rare words.
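A minimal sketch of training both setups, assuming the gensim library (v4.x API) and a toy corpus invented for illustration:

```python
from gensim.models import Word2Vec

sentences = [["the", "cat", "sat", "on", "the", "mat"],
             ["the", "dog", "lay", "on", "the", "rug"]]

# sg=0 selects CBOW (predict target from context); sg=1 selects Skip-gram.
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

print(skipgram.wv["cat"][:5])           # first 5 dimensions of the "cat" vector
print(skipgram.wv.most_similar("cat"))  # nearest neighbours in the toy corpus
```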
What is the Bag of Words (BoW) approach in sentence embeddings?
BoW represents a sentence by the counts of its words, ignoring grammar and word order, so it does not capture the context in which words appear.
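A minimal bag-of-words sketch using only the Python standard library (illustrative; the whitespace tokenization is deliberately naive):

```python
from collections import Counter

def bag_of_words(sentence: str) -> Counter:
    """Map a sentence to its word counts, discarding grammar and word order."""
    return Counter(sentence.lower().split())

print(bag_of_words("the cat sat on the mat"))
# Counter({'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1})
```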
Describe a Naive Approach to sentence embeddings.
One naive approach averages word embeddings in a sentence, which can serve as a simple baseline but ignores word order and context.
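A minimal NumPy sketch of this averaging baseline; the 3-dimensional embeddings are made up for illustration:

```python
import numpy as np

embeddings = {  # hypothetical word vectors; real ones would come from word2vec or similar
    "the": np.array([0.1, 0.0, 0.2]),
    "cat": np.array([0.7, 0.3, 0.1]),
    "sat": np.array([0.2, 0.8, 0.4]),
}

def sentence_embedding(tokens):
    """Average the embeddings of known tokens; word order is ignored."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vectors, axis=0)

print(sentence_embedding(["the", "cat", "sat"]))
```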
What are LSTMs and their role in language modeling?
Long Short-Term Memory networks (LSTMs) are a type of RNN designed to capture long-distance dependencies through gating mechanisms, making them well suited to modeling sequential data such as text.
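A minimal PyTorch sketch (PyTorch is an assumed dependency, not named in the lecture) showing an LSTM reading a sequence of token embeddings:

```python
import torch
import torch.nn as nn

embed_dim, hidden_dim = 32, 64
lstm = nn.LSTM(input_size=embed_dim, hidden_size=hidden_dim, batch_first=True)

x = torch.randn(1, 10, embed_dim)  # one sequence of 10 token embeddings
outputs, (h_n, c_n) = lstm(x)      # outputs holds the hidden state at every time step
print(outputs.shape, h_n.shape)    # torch.Size([1, 10, 64]) torch.Size([1, 1, 64])
```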
What is the Transformer architecture?
The Transformer uses self-attention to relate all words in a sequence directly, regardless of their distance and without the step-by-step processing of RNNs; it powers models like BERT and GPT.
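The core computation is scaled dot-product self-attention (formula from the original Transformer paper; Q, K, V are the query, key, and value matrices and d_k is the key dimension):

\text{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V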