Lecture 5 Flashcards
What is Semantics in NLP?
Semantics is the study of meaning in language, focusing on what expressions refer to and the conditions under which statements are true in text and speech.
Why are Word Representations important in NLP?
Word representations allow us to find documents with similar meaning rather than exact word matches, improving search and retrieval.
What does “You shall know a word by the company it keeps” mean?
It implies that words used in similar contexts tend to have similar meanings, a foundation for distributional semantics.
What are Word Embeddings?
Word embeddings are vector representations of words, capturing semantic similarity by placing similar words close in vector space.
Describe Word2Vec and its two main methods.
Word2Vec creates word embeddings through Skip-gram (predicting context words from the target word) and CBOW, Continuous Bag of Words (predicting the target word from its context).
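A minimal training sketch using the gensim library (gensim 4.x parameter names assumed; the toy corpus is hypothetical):

```python
from gensim.models import Word2Vec

# Toy corpus: each document is a list of tokens.
corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]

# sg=1 selects Skip-gram (predict context words from the target word);
# sg=0 selects CBOW (predict the target word from its context).
skipgram = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1)
cbow = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=0)

# Each trained model maps a word to a dense vector.
print(skipgram.wv["cat"].shape)              # (50,)
print(cbow.wv.most_similar("cat", topn=2))   # nearest words in vector space
```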
What is a limitation of Word2Vec?
Word2Vec generates a single representation per word, which doesn't account for polysemy, words with multiple meanings (e.g., "bank" as a financial institution vs. a river bank).
What is Cosine Similarity in the context of word embeddings?
Cosine similarity measures the similarity between two word vectors as the normalized dot product of those vectors.
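In symbols, cos(u, v) = (u · v) / (‖u‖ ‖v‖). A minimal NumPy sketch of this definition:

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    # Normalized dot product: cos(u, v) = (u . v) / (||u|| * ||v||), in [-1, 1].
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

u = np.array([1.0, 2.0, 3.0])
v = np.array([2.0, 4.0, 6.0])
print(cosine_similarity(u, v))  # 1.0: parallel vectors are maximally similar
```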
What are Contextualized Embeddings?
Contextualized embeddings, like those from BERT, create word representations that vary depending on the word’s context within a sentence.
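A minimal sketch of context dependence using the Hugging Face transformers library (the model name "bert-base-uncased" and the example sentences are assumptions for illustration):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embedding_of(sentence: str, word: str) -> torch.Tensor:
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (num_tokens, 768)
    # Locate the target word's token position and return its vector.
    idx = inputs["input_ids"][0].tolist().index(
        tokenizer.convert_tokens_to_ids(word)
    )
    return hidden[idx]

a = embedding_of("i deposited cash at the bank.", "bank")
b = embedding_of("we sat on the river bank.", "bank")
# The two vectors for "bank" differ because the surrounding context differs.
print(torch.cosine_similarity(a, b, dim=0).item())
```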
Name two examples of Contextualized Embedding Models.
ELMo and BERT are examples of models that create context-dependent embeddings.
What is Sentence-BERT used for?
Sentence-BERT produces fixed-size sentence embeddings, allowing sentence meanings to be compared efficiently at scale (e.g., with cosine similarity) instead of running a full cross-encoder over every pair.
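A minimal sketch using the sentence-transformers library (the model name "all-MiniLM-L6-v2" is one common choice, assumed here):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = [
    "A man is playing a guitar.",
    "Someone is strumming an instrument.",
    "The stock market fell sharply today.",
]
embeddings = model.encode(sentences)  # one fixed-size vector per sentence

# Pairwise cosine similarity between sentence embeddings.
print(util.cos_sim(embeddings[0], embeddings[1]))  # high: similar meaning
print(util.cos_sim(embeddings[0], embeddings[2]))  # low: unrelated
```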
Define TF-IDF and its purpose.
TF-IDF (Term Frequency-Inverse Document Frequency) is a method to weight terms in a document by combining how often a term appears in that document with how distinctive it is across the corpus, improving information retrieval.
How does TF-IDF work?
TF-IDF assigns higher weights to words that are frequent in a document but rare in the entire corpus, reducing the impact of common words.
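A minimal sketch of one common variant, assuming tf(t, d) = raw count of t in d and idf(t) = log(N / df(t)); the toy corpus is hypothetical:

```python
import math
from collections import Counter

docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "barked"],
    ["the", "cat", "purred"],
]
N = len(docs)
# df(t): number of documents containing term t.
df = Counter(t for d in docs for t in set(d))

def tfidf(term: str, doc: list) -> float:
    tf = doc.count(term)                 # frequency within this document
    idf = math.log(N / df[term])         # rarity across the corpus
    return tf * idf

print(tfidf("the", docs[0]))  # 0.0: appears in every document
print(tfidf("cat", docs[0]))  # > 0: frequent here, rarer in the corpus
```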
What is Intrinsic Evaluation in evaluating embeddings?
Intrinsic evaluation assesses embeddings by comparing algorithm-generated word similarity scores to human-annotated scores.
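Agreement with human judgments is typically reported as a rank correlation. A minimal sketch using SciPy's Spearman correlation (the score lists are hypothetical, in the style of WordSim-353 pairs):

```python
from scipy.stats import spearmanr

# Hypothetical data: human-annotated similarity vs. model cosine similarity,
# one entry per word pair, e.g., (cat, dog), (car, bus), ...
human_scores = [9.2, 7.5, 3.1, 1.0]
model_scores = [0.81, 0.66, 0.30, 0.12]

rho, p_value = spearmanr(human_scores, model_scores)
print(f"Spearman rho = {rho:.2f}")  # closer to 1.0 means better agreement
```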
What is Extrinsic Evaluation in evaluating embeddings?
Extrinsic evaluation tests embeddings in real NLP tasks (e.g., information retrieval) to measure their practical effectiveness.
What are Bilingual Embeddings?
Bilingual embeddings align words from two languages in the same vector space, enabling cross-lingual tasks.
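One classic way to align two monolingual spaces is to learn a linear map from translation pairs, in the spirit of Mikolov et al. (2013). A minimal NumPy sketch with hypothetical data (random vectors stand in for real embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))   # source-language word vectors
Y = rng.normal(size=(1000, 50))   # vectors of their translations (same rows)

# Learn W minimizing ||XW - Y||^2 with ordinary least squares.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Map a source word into the target space, then search for its
# nearest neighbors among target-language vectors.
projected = X[0] @ W
```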