Lecture 5 Flashcards
What is Semantics in NLP?
Semantics is the study of meaning in language, focusing on what expressions refer to and the conditions under which statements are true in text and speech.
Why are Word Representations important in NLP?
Word representations allow us to find documents with similar meaning rather than exact word matches, improving search and retrieval.
What does “You shall know a word by the company it keeps” mean?
It implies that words used in similar contexts tend to have similar meanings, a foundation for distributional semantics.
What are Word Embeddings?
Word embeddings are vector representations of words, capturing semantic similarity by placing similar words close in vector space.
Describe Word2Vec and its two main methods.
Word2Vec creates word embeddings through Skip-gram (predicting context words from the target word) and CBOW, Continuous Bag of Words (predicting the target word from its context).
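A minimal training sketch using the gensim library (gensim 4.x parameter names assumed; the toy corpus is hypothetical):

```python
from gensim.models import Word2Vec

# Toy corpus: each document is a list of tokens.
corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]

# sg=1 selects Skip-gram (predict context words from the target word);
# sg=0 selects CBOW (predict the target word from its context).
skipgram = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1)
cbow = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=0)

# Each trained model maps a word to a dense vector.
print(skipgram.wv["cat"].shape)              # (50,)
print(cbow.wv.most_similar("cat", topn=2))   # nearest words in vector space
```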
What is a limitation of Word2Vec?
Word2Vec generates a single representation per word, which doesn't account for polysemy, words with multiple meanings (e.g., "bank" as a financial institution vs. a river bank).
What is Cosine Similarity in the context of word embeddings?
Cosine similarity measures the similarity between two word vectors as the normalized dot product of those vectors.
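In symbols, cos(u, v) = (u · v) / (‖u‖ ‖v‖). A minimal NumPy sketch of this definition:

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    # Normalized dot product: cos(u, v) = (u . v) / (||u|| * ||v||), in [-1, 1].
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

u = np.array([1.0, 2.0, 3.0])
v = np.array([2.0, 4.0, 6.0])
print(cosine_similarity(u, v))  # 1.0: parallel vectors are maximally similar
```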
What are Contextualized Embeddings?
Contextualized embeddings, like those from BERT, create word representations that vary depending on the word’s context within a sentence.
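A minimal sketch of context dependence using the Hugging Face transformers library (the model name "bert-base-uncased" and the example sentences are assumptions for illustration):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embedding_of(sentence: str, word: str) -> torch.Tensor:
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (num_tokens, 768)
    # Locate the target word's token position and return its vector.
    idx = inputs["input_ids"][0].tolist().index(
        tokenizer.convert_tokens_to_ids(word)
    )
    return hidden[idx]

a = embedding_of("i deposited cash at the bank.", "bank")
b = embedding_of("we sat on the river bank.", "bank")
# The two vectors for "bank" differ because the surrounding context differs.
print(torch.cosine_similarity(a, b, dim=0).item())
```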
Name two examples of Contextualized Embedding Models.
ELMo and BERT are examples of models that create context-dependent embeddings.
What is Sentence-BERT used for?
Sentence-BERT produces fixed-size sentence embeddings, allowing sentence meanings to be compared efficiently at scale (e.g., with cosine similarity) instead of running a full cross-encoder over every pair.
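A minimal sketch using the sentence-transformers library (the model name "all-MiniLM-L6-v2" is one common choice, assumed here):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = [
    "A man is playing a guitar.",
    "Someone is strumming an instrument.",
    "The stock market fell sharply today.",
]
embeddings = model.encode(sentences)  # one fixed-size vector per sentence

# Pairwise cosine similarity between sentence embeddings.
print(util.cos_sim(embeddings[0], embeddings[1]))  # high: similar meaning
print(util.cos_sim(embeddings[0], embeddings[2]))  # low: unrelated
```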
Define TF-IDF and its purpose.
TF-IDF (Term Frequency-Inverse Document Frequency) is a method to weight terms in a document by combining how often a term appears in that document with how distinctive it is across the corpus, improving information retrieval.
How does TF-IDF work?
TF-IDF assigns higher weights to words that are frequent in a document but rare in the entire corpus, reducing the impact of common words.
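A minimal sketch of one common variant, assuming tf(t, d) = raw count of t in d and idf(t) = log(N / df(t)); the toy corpus is hypothetical:

```python
import math
from collections import Counter

docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "barked"],
    ["the", "cat", "purred"],
]
N = len(docs)
# df(t): number of documents containing term t.
df = Counter(t for d in docs for t in set(d))

def tfidf(term: str, doc: list) -> float:
    tf = doc.count(term)                 # frequency within this document
    idf = math.log(N / df[term])         # rarity across the corpus
    return tf * idf

print(tfidf("the", docs[0]))  # 0.0: appears in every document
print(tfidf("cat", docs[0]))  # > 0: frequent here, rarer in the corpus
```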
What is Intrinsic Evaluation in evaluating embeddings?
Intrinsic evaluation assesses embeddings by comparing algorithm-generated word similarity scores to human-annotated scores.
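Agreement with human judgments is typically reported as a rank correlation. A minimal sketch using SciPy's Spearman correlation (the score lists are hypothetical, in the style of WordSim-353 pairs):

```python
from scipy.stats import spearmanr

# Hypothetical data: human-annotated similarity vs. model cosine similarity,
# one entry per word pair, e.g., (cat, dog), (car, bus), ...
human_scores = [9.2, 7.5, 3.1, 1.0]
model_scores = [0.81, 0.66, 0.30, 0.12]

rho, p_value = spearmanr(human_scores, model_scores)
print(f"Spearman rho = {rho:.2f}")  # closer to 1.0 means better agreement
```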
What is Extrinsic Evaluation in evaluating embeddings?
Extrinsic evaluation tests embeddings in real NLP tasks (e.g., information retrieval) to measure their practical effectiveness.
What are Bilingual Embeddings?
Bilingual embeddings align words from two languages in the same vector space, enabling cross-lingual tasks.
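One classic way to align two monolingual spaces is to learn a linear map from translation pairs, in the spirit of Mikolov et al. (2013). A minimal NumPy sketch with hypothetical data (random vectors stand in for real embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))   # source-language word vectors
Y = rng.normal(size=(1000, 50))   # vectors of their translations (same rows)

# Learn W minimizing ||XW - Y||^2 with ordinary least squares.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Map a source word into the target space, then search for its
# nearest neighbors among target-language vectors.
projected = X[0] @ W
```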