NLP Flashcards
What is vector representation in NLP?
A method to represent words or phrases as numerical vectors in a continuous vector space.
What is the primary goal of the CBOW model?
To predict a target word based on its context words.
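A minimal sketch of how CBOW training examples could be assembled from a token list. The function name `cbow_examples` and the `window` parameter are illustrative assumptions, and the neural network that consumes these pairs is left out.

```python
def cbow_examples(tokens, window=2):
    """Build (context_words, target_word) pairs for a CBOW-style model.

    Hypothetical helper: `window` is the number of words taken on each side.
    """
    examples = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = left + right
        if context:  # skip a lone token that has no neighbours
            examples.append((context, target))
    return examples


tokens = "the cat sat on the mat".split()
for context, target in cbow_examples(tokens, window=1):
    print(context, "->", target)
```

Each pair gives the model the surrounding words as input and the middle word as the label to predict.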
Fill in the blank: GloVe stands for __________.
Global Vectors for Word Representation
How does GloVe differ from Word2Vec?
GloVe fits vectors to corpus-wide word co-occurrence statistics, while Word2Vec learns them by predicting words within local context windows.
What is the main advantage of FastText over Word2Vec?
FastText represents each word as a bag of character n-grams, so it captures subword information and can build vectors for rare or unseen words.
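A rough sketch of the character n-gram extraction behind this idea. It assumes the `<` and `>` boundary markers and a 3 to 6 character range; the function name `char_ngrams` is hypothetical. The hashing of n-grams into buckets and the extra whole-word token that a real FastText model uses are omitted.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of `word`, wrapped in boundary markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


print(char_ngrams("where", min_n=3, max_n=3))

# An unseen word still shares n-grams with known words, which is why a
# vector can be composed for it from the n-gram vectors.
shared = set(char_ngrams("wherever", 3, 3)) & set(char_ngrams("where", 3, 3))
print(sorted(shared))
```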
What is Word2Vec primarily used for?
To learn word embeddings that capture semantic meaning from the contexts in which words appear.
True or False: Matrix factorization is a technique used in dimensionality reduction.
True
What are the two main architectures of Word2Vec?
Skip-gram and Continuous Bag of Words (CBOW).
Multiple Choice: Which model is designed to capture the meaning of out-of-vocabulary words?
FastText
What kind of data does GloVe use to create word vectors?
A global word-word co-occurrence matrix computed over the entire corpus.
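A small sketch of how such co-occurrence counts might be collected. The function name `cooccurrence_counts` and the symmetric window are assumptions for illustration; the distance weighting that GloVe applies to co-occurrences is left out, and the counts are stored sparsely in a `Counter` rather than as a dense matrix.

```python
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count how often two words appear within `window` positions of each other."""
    counts = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            # Look only to the right and record both directions,
            # which keeps the counts symmetric.
            for j in range(i + 1, min(len(tokens), i + window + 1)):
                counts[(word, tokens[j])] += 1
                counts[(tokens[j], word)] += 1
    return counts


corpus = [["ice", "is", "cold"], ["steam", "is", "hot"]]
counts = cooccurrence_counts(corpus, window=2)
print(counts[("ice", "cold")], counts[("ice", "hot")])
```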
Fill in the blank: In Word2Vec, the Skip-gram model predicts __________ from a given word.
context words
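A minimal sketch of Skip-gram training pairs, the mirror image of CBOW: each center word is paired with every word in its window. The name `skipgram_pairs` and the `window` parameter are illustrative assumptions; negative sampling and the model itself are not shown.

```python
def skipgram_pairs(tokens, window=2):
    """Build (center_word, context_word) pairs for a Skip-gram-style model."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


for center, context in skipgram_pairs("the cat sat".split(), window=1):
    print(center, "->", context)
```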
True or False: FastText can generate embeddings for words not seen in the training data.
True
What is the purpose of dimensionality reduction in NLP?
To reduce the number of features while preserving important information.
How does matrix factorization work in the context of NLP?
It decomposes a large matrix, such as a word co-occurrence matrix, into a product of smaller matrices whose rows and columns expose latent patterns.
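To make the idea concrete, here is a sketch of the simplest case: a rank-1 factorization found by power iteration in plain Python. The function name `rank1_factor` and the toy matrix are assumptions for illustration, and the sketch assumes a nonzero input matrix. Practical pipelines would more likely use a truncated SVD from a numerical library.

```python
import math


def rank1_factor(matrix, iterations=100):
    """Approximate `matrix` as the outer product of a column u and a row v.

    Assumes a nonzero matrix; power iteration on A^T A finds the dominant
    right singular vector v, and u = A v carries the scale.
    """
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0] * cols
    for _ in range(iterations):
        u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        v = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return u, v


# Toy 3x2 matrix that is exactly rank 1, so the product recovers it.
A = [[2.0, 4.0], [1.0, 2.0], [3.0, 6.0]]
u, v = rank1_factor(A)
approx = [[ui * vj for vj in v] for ui in u]
print(approx)
```

Here six numbers are summarized by a three-entry column and a two-entry row, which is the compression that dimensionality reduction relies on.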
What is a key benefit of using vector representations of words?
They allow for mathematical operations that reveal semantic relationships.
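A toy sketch of the kind of vector arithmetic meant here. The two-dimensional vectors are hand-picked for illustration (roughly "royalty" and "gender" axes), not learned embeddings, and the helper names `cosine` and `analogy` are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors: first axis ~ royalty, second axis ~ gender.
vectors = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "apple": [-1.0, 0.0],
}


def analogy(a, b, c, vectors):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding a, b, c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(analogy("man", "king", "woman", vectors))
```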
Multiple Choice: Which of the following is NOT a method for word vector representation?
Term Frequency-Inverse Document Frequency (TF-IDF)
What determines the dimension of the vectors a Word2Vec model produces?
The embedding size, a hyperparameter chosen before training.
True or False: The primary goal of NLP vector representations is to replace traditional text processing methods.
False