Word2Vec Flashcards
What is Word2Vec?
Word2Vec is a popular algorithm for generating word embeddings (dense, continuous-valued vector representations of words).
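A quick sketch of what this looks like in practice, using the gensim library (an assumption; the cards don't name a specific implementation, and parameter names follow gensim 4.x):

```python
# Minimal sketch: train Word2Vec on a toy corpus and look up embeddings.
from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]

# sg=1 selects skip-gram; sg=0 would select CBOW (both covered below).
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

vector = model.wv["cat"]                      # dense vector for "cat"
print(model.wv.most_similar("cat", topn=3))   # nearest words by cosine similarity
```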
What is the idea behind Word2Vec?
Distributional Hypothesis
What is the distributional hypothesis?
Words that occur in similar contexts tend to have similar meanings.
What are the two primary training methods in Word2Vec?
Skip-gram and CBOW (Continuous Bag of Words).
What happens in the skip-gram model?
Skip-gram predicts the context words given a target word.
What happens in CBOW?
CBOW predicts the target word given its context words.
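A minimal sketch contrasting the training pairs the two models are built from (toy sentence and window size 2 are illustrative assumptions):

```python
# Build (target, context) pairs for skip-gram and (context, target) pairs for CBOW.
sentence = ["the", "quick", "brown", "fox", "jumps"]
window = 2

skipgram_pairs = []   # (target, context word): predict each context word from the target
cbow_pairs = []       # (context words, target): predict the target from its whole context

for i, target in enumerate(sentence):
    context = [sentence[j]
               for j in range(max(0, i - window), min(len(sentence), i + window + 1))
               if j != i]
    cbow_pairs.append((context, target))
    for c in context:
        skipgram_pairs.append((target, c))

print(skipgram_pairs[:4])  # ('the', 'quick'), ('the', 'brown'), ('quick', 'the'), ...
print(cbow_pairs[0])       # (['quick', 'brown'], 'the')
```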
General Training Architecture for Word2Vec?
A shallow neural network with a single hidden layer.
The input is a one-hot encoding of the target word (skip-gram) or the context words (CBOW).
The hidden layer weights hold the learned word embeddings.
The output layer predicts probabilities over the vocabulary for the context words (skip-gram) or the target word (CBOW).
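A minimal numpy sketch of this forward pass for the skip-gram direction (all sizes and variable names are illustrative assumptions, not the original implementation):

```python
import numpy as np

vocab_size, embed_dim = 10, 4
W_in = np.random.randn(vocab_size, embed_dim) * 0.01   # input->hidden weights: one row per word = its embedding
W_out = np.random.randn(embed_dim, vocab_size) * 0.01  # hidden->output weights

target_idx = 3
x = np.zeros(vocab_size)
x[target_idx] = 1.0            # one-hot input for the target word

h = x @ W_in                   # hidden layer = the target word's embedding (a row lookup)
scores = h @ W_out             # one score per vocabulary word
probs = np.exp(scores) / np.exp(scores).sum()   # softmax over the vocabulary
print(probs.shape)             # (10,) -- predicted context-word distribution
```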
General Training Objective of Word2Vec?
Maximize the likelihood of observing the context words given the target word (or vice versa). This is typically achieved by minimizing the negative log-likelihood loss function.
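Written out for the skip-gram case with the standard notation (assumed here: corpus of T words, window size c, input vector v and output vector v' for each word):

```latex
% Skip-gram objective: (negative) average log-probability of the context words
% within a window of size c around each position t, with a softmax over the vocabulary V.
J(\theta) = -\frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{-c \le j \le c \\ j \ne 0}} \log p(w_{t+j} \mid w_t),
\qquad
p(w_O \mid w_I) = \frac{\exp\!\big( {v'_{w_O}}^{\top} v_{w_I} \big)}{\sum_{w=1}^{V} \exp\!\big( {v'_{w}}^{\top} v_{w_I} \big)}
```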
What contributes to efficient training?
Negative sampling or hierarchical softmax, which speed up training by approximating (or avoiding) the full softmax over the entire vocabulary.
Explain the CBOW architecture?
The input is the one-hot encoded vectors of the context words.
The hidden layer's dimensionality is the desired size of the word vectors; the context word embeddings are averaged (or summed) there.
The output layer uses a softmax activation to produce a probability for each word in the vocabulary.
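A minimal numpy sketch of the CBOW-specific step (illustrative sizes; the key point is that the context embeddings are averaged before the softmax):

```python
import numpy as np

vocab_size, embed_dim = 10, 4
W_in = np.random.randn(vocab_size, embed_dim) * 0.01   # embedding matrix
W_out = np.random.randn(embed_dim, vocab_size) * 0.01

context_idxs = [1, 2, 4, 5]                     # indices of the context words
h = W_in[context_idxs].mean(axis=0)             # hidden layer = mean of the context embeddings
scores = h @ W_out
probs = np.exp(scores) / np.exp(scores).sum()   # softmax probability for each vocabulary word
print(probs.argmax())                           # index of the predicted target word
```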
What is Negative Sampling?
Negative sampling involves randomly sampling negative examples (non-context words) to train the model on distinguishing between context and non-context words.
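A minimal numpy sketch of the negative-sampling loss for one (target, context) pair with k sampled negatives (vector values and sizes are assumptions for illustration):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

embed_dim, k = 4, 5
v_target = np.random.randn(embed_dim) * 0.01         # input vector of the target word
v_context = np.random.randn(embed_dim) * 0.01        # output vector of the true context word
v_negatives = np.random.randn(k, embed_dim) * 0.01   # output vectors of k sampled non-context words

# Push the true pair's score up and the sampled negative pairs' scores down.
loss = -np.log(sigmoid(v_context @ v_target)) \
       - np.log(sigmoid(-(v_negatives @ v_target))).sum()
print(loss)
```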
What is hierarchical softmax?
In hierarchical softmax, instead of a flat output layer with one neuron per vocabulary word, the vocabulary is organized into a binary tree (typically a Huffman tree). A word's probability is the product of binary decisions along its path from the root, so it needs only about log2(V) operations instead of V.
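A minimal sketch of that path-probability computation (the tree, the path, and the left/right directions below are made-up assumptions for illustration):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

embed_dim = 4
h = np.random.randn(embed_dim) * 0.01   # hidden-layer vector for the input word

# Each inner node of the tree has its own vector; a word is reached by a
# sequence of left/right decisions (+1 / -1) at the nodes on its path.
path_node_vectors = np.random.randn(3, embed_dim) * 0.01   # 3 inner nodes on this word's path
directions = np.array([+1, -1, +1])

p_word = np.prod([sigmoid(d * (node @ h))
                  for d, node in zip(directions, path_node_vectors)])
print(p_word)   # ~log2(V) sigmoid factors instead of a softmax over all V words
```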