Lecture 3 Flashcards
What is Language Modeling?
Language modeling is the task of predicting the next word in a sequence from the preceding words (equivalently, assigning probabilities to word sequences); it underlies NLP tasks such as text generation and machine translation.
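As a worked equation (the standard chain-rule formulation, stated here for reference rather than quoted from the lecture), a language model scores a whole sequence by multiplying next-word probabilities:

P(w_1, \ldots, w_N) = \prod_{i=1}^{N} P(w_i \mid w_1, \ldots, w_{i-1})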
Describe Causal Language Modeling.
Causal language modeling predicts each token from only the tokens that precede it, generating a sequence incrementally from left to right, much like incremental human language processing.
What is an N-Gram Model?
An N-Gram model uses fixed-length word sequences (e.g., bigrams, trigrams) to estimate the probability of a word given the previous n−1 words.
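A minimal illustration of how such probabilities are typically estimated from corpus counts (the maximum-likelihood estimate for a bigram; the count notation C(·) is assumed, not taken from the slides):

P(w_i \mid w_{i-1}) = \frac{C(w_{i-1}\, w_i)}{C(w_{i-1})}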
Define the Markov Assumption in language modeling.
The Markov assumption limits the dependency of a word to only the previous n−1 words, simplifying the computation of probabilities in N-Gram models.
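In symbols (standard formulation, assuming an n-gram model):

P(w_i \mid w_1, \ldots, w_{i-1}) \approx P(w_i \mid w_{i-n+1}, \ldots, w_{i-1})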
What is Smoothing in N-Gram models?
Smoothing is a technique to handle unseen N-Grams by assigning a small probability instead of zero, improving model robustness.
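One simple variant, shown as an illustrative example rather than the lecture's specific method, is add-one (Laplace) smoothing for a bigram model with vocabulary size |V|:

P_{\text{add-1}}(w_i \mid w_{i-1}) = \frac{C(w_{i-1}\, w_i) + 1}{C(w_{i-1}) + |V|}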
What is Zipf’s Law?
Zipf’s Law states that in a language, a few words are very common, while most words are rare, leading to a “long tail” distribution of word frequency.
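Stated roughly as a formula (an idealized form; the exponent is usually taken to be close to 1): a word's frequency f is inversely proportional to its frequency rank r,

f(r) \propto \frac{1}{r^{\alpha}}, \quad \alpha \approx 1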
How is Perplexity used in language models?
Perplexity measures a model’s ability to predict a sample sequence; lower perplexity indicates better predictive accuracy.
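For reference (standard definition, assuming a test sequence W = w_1, \ldots, w_N):

\text{PPL}(W) = P(w_1, \ldots, w_N)^{-1/N}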
How is Cross-Entropy related to Perplexity?
Cross-entropy measures the average number of bits a model needs to predict each token of a corpus; perplexity is the exponentiation of cross-entropy (e.g., 2 raised to the per-token cross-entropy), representing the model's uncertainty in prediction.
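The relationship in symbols (standard base-2 formulation; the choice of base is an assumption, as the natural logarithm with base e is equally common):

H(W) = -\frac{1}{N} \sum_{i=1}^{N} \log_2 P(w_i \mid w_1, \ldots, w_{i-1}), \qquad \text{PPL}(W) = 2^{H(W)}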
What are Word Embeddings?
Word embeddings are dense vector representations of words, capturing semantic similarity by positioning similar words closer in vector space.
What are the two main training setups in word2vec?
CBOW (Continuous Bag of Words), which predicts a target word from context, and Skip-gram, which predicts context words from a target word.
How does CBOW differ from Skip-gram in word2vec?
CBOW predicts a word from its context, ideal for frequent words; Skip-gram predicts surrounding words from a target word, better for rare words.
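A minimal sketch of training both setups, assuming the gensim library (v4.x API) and a toy corpus invented for illustration:

```python
from gensim.models import Word2Vec

sentences = [["the", "cat", "sat", "on", "the", "mat"],
             ["the", "dog", "lay", "on", "the", "rug"]]

# sg=0 selects CBOW (predict target from context); sg=1 selects Skip-gram.
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

print(skipgram.wv["cat"][:5])           # first 5 dimensions of the "cat" vector
print(skipgram.wv.most_similar("cat"))  # nearest neighbours in the toy corpus
```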
What is the Bag of Words (BoW) approach in sentence embeddings?
BoW represents a sentence by the counts of its words, ignoring grammar and word order, so it does not capture the context in which words appear.
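A minimal bag-of-words sketch using only the Python standard library (illustrative; the whitespace tokenization is deliberately naive):

```python
from collections import Counter

def bag_of_words(sentence: str) -> Counter:
    """Map a sentence to its word counts, discarding grammar and word order."""
    return Counter(sentence.lower().split())

print(bag_of_words("the cat sat on the mat"))
# Counter({'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1})
```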
Describe a Naive Approach to sentence embeddings.
One naive approach averages word embeddings in a sentence, which can serve as a simple baseline but ignores word order and context.
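A minimal NumPy sketch of this averaging baseline; the 3-dimensional embeddings are made up for illustration:

```python
import numpy as np

embeddings = {  # hypothetical word vectors; real ones would come from word2vec or similar
    "the": np.array([0.1, 0.0, 0.2]),
    "cat": np.array([0.7, 0.3, 0.1]),
    "sat": np.array([0.2, 0.8, 0.4]),
}

def sentence_embedding(tokens):
    """Average the embeddings of known tokens; word order is ignored."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vectors, axis=0)

print(sentence_embedding(["the", "cat", "sat"]))
```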
What are LSTMs and their role in language modeling?
Long Short-Term Memory networks (LSTMs) are a type of RNN designed to capture long-distance dependencies through gating mechanisms, making them well suited to modeling sequential data such as text.
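A minimal PyTorch sketch (PyTorch is an assumed dependency, not named in the lecture) showing an LSTM reading a sequence of token embeddings:

```python
import torch
import torch.nn as nn

embed_dim, hidden_dim = 32, 64
lstm = nn.LSTM(input_size=embed_dim, hidden_size=hidden_dim, batch_first=True)

x = torch.randn(1, 10, embed_dim)  # one sequence of 10 token embeddings
outputs, (h_n, c_n) = lstm(x)      # outputs holds the hidden state at every time step
print(outputs.shape, h_n.shape)    # torch.Size([1, 10, 64]) torch.Size([1, 1, 64])
```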
What is the Transformer architecture?
The Transformer uses self-attention to relate all words in a sequence directly, regardless of their distance and without the step-by-step processing of RNNs; it powers models like BERT and GPT.
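The core computation is scaled dot-product self-attention (formula from the original Transformer paper; Q, K, V are the query, key, and value matrices and d_k is the key dimension):

\text{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V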