lesson_12_flashcards
What is a language model?
A model that estimates the probability of sequences of words and enables applications like predictive typing, text completion, and speech recognition.
What is the chain rule of probability in language modeling?
An identity that factors the probability of a whole sequence into a product of conditional probabilities, one for each word given the words that precede it.
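In symbols (standard notation, not specific to this lesson):

$$P(w_1, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})$$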
What is perplexity in language modeling?
A metric for evaluating language models, defined as the exponentiated average negative log-likelihood per token of a test sample; lower values indicate the model predicts the sample better.
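A minimal sketch of the computation, using made-up per-token probabilities rather than anything from the lesson:

```python
import math

# Perplexity = exp of the average negative log-probability per token.
token_probs = [0.2, 0.1, 0.4, 0.25]   # dummy model probabilities for a 4-token sample
nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
perplexity = math.exp(nll)
print(perplexity)  # ~4.7: roughly as uncertain as a uniform choice among ~4.7 words
```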
What is teacher forcing in RNN training?
A training method where the actual next word from the dataset, not the model’s prediction, is used as the input at each time step.
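A minimal PyTorch sketch (the model sizes and dummy batch are illustrative assumptions, not from the lesson): the inputs are the ground-truth tokens shifted by one position, so the network never consumes its own predictions during training.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 50, 32, 64
embed = nn.Embedding(vocab_size, embed_dim)
rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
head = nn.Linear(hidden_dim, vocab_size)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, vocab_size, (8, 20))    # dummy batch of token ids
inputs, targets = tokens[:, :-1], tokens[:, 1:]   # teacher forcing: feed ground truth, predict next token

hidden_states, _ = rnn(embed(inputs))
logits = head(hidden_states)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
```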
What are recurrent neural networks (RNNs)?
A family of neural architectures for sequence modeling that process inputs one step at a time while maintaining a hidden state vector summarizing the inputs seen so far.
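The core recurrence, sketched in NumPy with arbitrary sizes (an illustration, not the lesson's code): the same weights are applied at every step, and the state vector h carries information forward.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, seq_len = 4, 8, 5
W_x = rng.normal(size=(hidden_dim, input_dim)) * 0.1
W_h = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1
b = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)                   # state vector summarizing past inputs
for x_t in rng.normal(size=(seq_len, input_dim)):
    h = np.tanh(W_x @ x_t + W_h @ h + b)   # h_t = tanh(W_x x_t + W_h h_{t-1} + b)
```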
What is the vanishing gradient problem in RNNs?
Gradients shrink exponentially as they are backpropagated through many time steps, making it difficult for plain RNNs to learn long-range dependencies.
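A toy illustration (the sizes and weight scale are assumptions): the gradient through t steps involves a product of t Jacobians, so with small recurrent weights it decays roughly exponentially in t.

```python
import numpy as np

rng = np.random.default_rng(0)
W_h = rng.normal(size=(16, 16)) * 0.1   # recurrent weights with spectral norm < 1
grad = np.ones(16)
for _ in range(50):
    grad = W_h.T @ grad                 # the tanh derivative (<= 1) would only shrink this further
print(np.linalg.norm(grad))             # vanishingly small: long-range learning signal is lost
```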
What are LSTMs and GRUs?
Gated variants of RNNs (Long Short-Term Memory and Gated Recurrent Unit networks) designed to mitigate the vanishing gradient problem, using gating mechanisms to control what the state retains and forgets over long spans.
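In PyTorch, both are drop-in replacements for a plain RNN layer (the layer sizes here are illustrative):

```python
import torch.nn as nn

lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)  # gates plus a separate cell state
gru = nn.GRU(input_size=32, hidden_size=64, batch_first=True)    # gates with a single state vector
```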
What is masked language modeling?
A pretraining task where certain words in a sequence are masked and the model predicts them, improving performance on downstream NLP tasks.
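A minimal sketch of just the masking step (the 15% rate follows BERT's convention; the sentence is a made-up example):

```python
import random

tokens = "the cat sat on the mat".split()
masked, targets = [], {}
for i, tok in enumerate(tokens):
    if random.random() < 0.15:
        targets[i] = tok          # the model is trained to predict the original token here
        masked.append("[MASK]")
    else:
        masked.append(tok)
print(masked, targets)
```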
What is cross-lingual transfer in masked language models?
The ability of a multilingual masked language model fine-tuned on task data in one language (e.g., English) to perform the same task in another language (e.g., French) without task-specific training in that language.
What is knowledge distillation in NLP?
A technique where a smaller model (student) learns to replicate the predictions of a larger model (teacher), reducing computation costs while retaining accuracy.
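A minimal sketch of the soft-label distillation loss (dummy logits; the temperature T and the T^2 scaling follow the common Hinton-style recipe, which may differ from the lesson's exact setup):

```python
import torch
import torch.nn.functional as F

T = 2.0
teacher_logits = torch.randn(8, 100)                 # from a larger, frozen teacher
student_logits = torch.randn(8, 100, requires_grad=True)

loss = F.kl_div(
    F.log_softmax(student_logits / T, dim=-1),       # student's softened distribution
    F.softmax(teacher_logits / T, dim=-1),           # teacher's softened distribution
    reduction="batchmean",
) * T * T
loss.backward()
```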
What is conditional language modeling?
Language modeling conditioned on additional information (e.g., a topic, an image, or another language) for tasks like translation or image captioning.
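In symbols, with x denoting the conditioning input (standard notation):

$$P(y_1, \dots, y_m \mid x) = \prod_{i=1}^{m} P(y_i \mid y_1, \dots, y_{i-1}, x)$$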
What is the role of cross-entropy in language modeling?
A loss function that measures the mismatch between the model's predicted distribution over the vocabulary and the true next-token distribution; minimizing it is the standard training objective for language models.
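A minimal sketch showing that, for a one-hot target, cross-entropy reduces to the negative log-probability of the correct next token (the vocabulary size and logits are dummies):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(1, 10)       # model scores over a 10-word vocabulary
target = torch.tensor([3])        # index of the true next token
loss = F.cross_entropy(logits, target)
manual = -F.log_softmax(logits, dim=-1)[0, 3]
assert torch.allclose(loss, manual)
```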
What are attention mechanisms in RNNs?
Mechanisms that allow models to focus on specific parts of a sequence dynamically, improving the representation of long-range dependencies.
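A minimal dot-product attention sketch (the dimensions are illustrative; real systems often add learned projections and scaling):

```python
import torch
import torch.nn.functional as F

encoder_states = torch.randn(7, 64)      # one vector per source position
decoder_state = torch.randn(64)          # current decoder state (the query)

scores = encoder_states @ decoder_state  # similarity with each source position
weights = F.softmax(scores, dim=0)       # attention distribution over positions
context = weights @ encoder_states       # weighted summary passed to the decoder
```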
What is sequence-to-sequence modeling?
Mapping an input sequence to an output sequence, used in tasks like machine translation, summarization, and speech recognition.
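A skeleton of an encoder-decoder model in PyTorch (the GRU layers and sizes are assumptions for illustration, not the lesson's architecture):

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=64):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, dim)
        self.tgt_embed = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, tgt_vocab)

    def forward(self, src, tgt):
        _, state = self.encoder(self.src_embed(src))       # compress the source sequence
        out, _ = self.decoder(self.tgt_embed(tgt), state)  # decode conditioned on it
        return self.head(out)                              # next-token logits

model = Seq2Seq(src_vocab=100, tgt_vocab=120)
logits = model(torch.randint(0, 100, (2, 9)), torch.randint(0, 120, (2, 7)))
```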
What is the importance of embeddings in language models?
Word embeddings represent words as dense vectors, capturing semantic relationships and improving the input representation for NLP tasks.
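A minimal lookup sketch (table and batch sizes are arbitrary):

```python
import torch
import torch.nn as nn

embed = nn.Embedding(num_embeddings=10_000, embedding_dim=128)  # one dense vector per vocabulary id
token_ids = torch.tensor([[12, 7, 431]])  # a batch with one 3-token sequence
vectors = embed(token_ids)                # shape: (1, 3, 128)
```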