Word2Vec Flashcards

1
Q

What is Word2Vec?

A

Word2Vec is a popular algorithm for generating word embeddings (dense, continuous-valued vector representations of words).
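For concreteness, a minimal training sketch using the gensim library (assuming gensim 4.x; the toy corpus is made up for illustration):

```python
from gensim.models import Word2Vec

# Each sentence is a list of tokens; a real corpus would be much larger.
corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]

# vector_size sets the embedding dimensionality; sg=1 selects skip-gram (sg=0 is CBOW).
model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1)

print(model.wv["cat"])               # the dense 50-dimensional vector for "cat"
print(model.wv.most_similar("cat"))  # nearest neighbors by cosine similarity
```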

2
Q

What is the idea behind Word2Vec?

A

Distributional Hypothesis

3
Q

What is the distributional hypothesis?

A

Words occurring in similar contexts tend to have similar meanings.

4
Q

What are the two primary training methods in Word2Vec?

A

Skip-gram and CBOW (Continuous Bag of Words).

5
Q

What happens in the skip-gram model?

A

Skip-gram predicts the context words given a target word.
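A sketch of how (target, context) training pairs can be extracted from a sentence; the function name and window size are illustrative choices:

```python
# Illustrative sketch: generate (target, context) skip-gram training pairs.
def skipgram_pairs(tokens, window=2):
    pairs = []
    for i, target in enumerate(tokens):
        # Context = words within `window` positions of the target.
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

print(skipgram_pairs(["the", "cat", "sat", "on", "the", "mat"]))
# [('the', 'cat'), ('the', 'sat'), ('cat', 'the'), ('cat', 'sat'), ...]
```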

6
Q

What happens in CBOW?

A

CBOW predicts the target word given its context words.
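The mirror-image sketch for CBOW, pairing each target word with its surrounding context (again, the names and window size are illustrative):

```python
# Illustrative sketch: generate (context_words, target) CBOW training examples.
def cbow_examples(tokens, window=2):
    examples = []
    for i, target in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i]
        examples.append((context, target))
    return examples

print(cbow_examples(["the", "cat", "sat", "on", "the", "mat"]))
# [(['cat', 'sat'], 'the'), (['the', 'sat', 'on'], 'cat'), ...]
```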

7
Q

What is the general training architecture for Word2Vec?

A

A shallow neural network with a single hidden layer.
The input is a one-hot encoding of the target or context words.
The hidden layer corresponds to the learned word embeddings.
The output layer predicts probabilities over the context words (or the target word), as in the sketch below.
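A minimal numpy sketch of this forward pass in the skip-gram direction; the matrix names (W_in, W_out) and the sizes are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 10, 4                      # illustrative sizes

W_in = rng.normal(size=(vocab_size, embed_dim))    # input->hidden weights (the word embeddings)
W_out = rng.normal(size=(embed_dim, vocab_size))   # hidden->output weights

target_id = 3
x = np.zeros(vocab_size)
x[target_id] = 1.0                                 # one-hot input for the target word

h = x @ W_in                                       # hidden layer = embedding row of the target
scores = h @ W_out                                 # one score per vocabulary word
probs = np.exp(scores - scores.max())
probs /= probs.sum()                               # softmax: predicted context-word probabilities
```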

8
Q

What is the general training objective of Word2Vec?

A

Maximize the likelihood of observing the context words given the target word (or vice versa). This is typically achieved by minimizing the negative log-likelihood loss function.
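As a sketch, the negative log-likelihood of one observed context word under a full softmax over the model's output scores (names are illustrative):

```python
import numpy as np

# Illustrative sketch: negative log-likelihood of one observed context word
# given the model's output scores (logits) over the vocabulary.
def nll_loss(scores, context_id):
    m = scores.max()                                             # for numerical stability
    log_probs = scores - (m + np.log(np.exp(scores - m).sum()))  # log-softmax
    return -log_probs[context_id]                                # minimized during training

scores = np.array([0.5, 2.0, -1.0, 0.1])
print(nll_loss(scores, context_id=1))  # small loss: word 1 has the highest score
```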

9
Q

What contributes to efficient training?

A

Negative sampling or hierarchical softmax, which speed up training by approximating the expensive full-softmax computation over the vocabulary.

10
Q

Explain the CBOW architecture.

A

The input is one-hot encoded vectors for the context words, whose embeddings are averaged in the hidden layer.
The hidden layer's dimensionality is the desired size of the word vectors.
The output layer uses a softmax activation to produce a probability for each word in the vocabulary, as in the sketch below.
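A minimal numpy sketch of the CBOW forward pass, assuming the common choice of averaging the context embeddings (names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 10, 4
W_in = rng.normal(size=(vocab_size, embed_dim))  # embedding matrix
W_out = rng.normal(size=(embed_dim, vocab_size))

context_ids = [1, 2, 4, 5]                       # indices of the context words
h = W_in[context_ids].mean(axis=0)               # average the context embeddings

scores = h @ W_out
probs = np.exp(scores - scores.max())
probs /= probs.sum()                             # softmax over the vocabulary
predicted_target = int(np.argmax(probs))         # most likely target word id
```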

11
Q

What is Negative Sampling?

A

Negative sampling randomly samples negative examples (non-context words) and trains the model to distinguish true context words from these negatives.
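A sketch of the standard negative-sampling loss for one (target, context) pair with k sampled negatives; the vector names follow common notation, and the values here are random placeholders:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative negative-sampling loss for one (target, context) pair.
# v_target: the target word's input embedding; u_context: the context word's
# output embedding; u_negatives: output embeddings of k sampled non-context words.
def neg_sampling_loss(v_target, u_context, u_negatives):
    positive = np.log(sigmoid(u_context @ v_target))               # pull the true pair together
    negative = np.sum(np.log(sigmoid(-(u_negatives @ v_target))))  # push sampled pairs apart
    return -(positive + negative)                                  # minimized during training

rng = np.random.default_rng(0)
dim, k = 4, 5
print(neg_sampling_loss(rng.normal(size=dim),
                        rng.normal(size=dim),
                        rng.normal(size=(k, dim))))
```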

12
Q

What is hierarchical softmax?

A

In hierarchical softmax, instead of a flat output layer with one neuron per vocabulary word, the vocabulary is organized into a binary tree (typically a Huffman tree). A word's probability is the product of binary decisions along the path from the root to its leaf, so each prediction touches only O(log V) nodes instead of O(V).
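A sketch of how a word's probability is computed from its tree path; the path vectors and branch directions below are made-up placeholders:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative sketch: a word's probability under hierarchical softmax is the
# product of sigmoid decisions along its root-to-leaf path in the binary tree.
# `path_nodes` holds the inner-node vectors on the path; `directions` is +1/-1
# for branching one way or the other at each inner node.
def hierarchical_prob(h, path_nodes, directions):
    prob = 1.0
    for node_vec, d in zip(path_nodes, directions):
        prob *= sigmoid(d * (node_vec @ h))  # one binary decision per inner node
    return prob                              # cost is O(log V) instead of O(V)

rng = np.random.default_rng(0)
dim, depth = 4, 3                            # path length ~ log2(vocab size)
h = rng.normal(size=dim)                     # hidden-layer vector for the input word
print(hierarchical_prob(h, rng.normal(size=(depth, dim)), [+1, -1, +1]))
```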
