Word2Vec Flashcards
Is Word2Vec a dense or sparse embedding?
Dense
Are dense or sparse embeddings better?
Dense embeddings tend to work better in practice; the reasons are not fully understood, but dense vectors may generalise better and capture similarity between words more directly than sparse ones
What does static embedding mean?
It means each word is mapped to a single, fixed embedding vector that does not change with context
How does contextual embedding differ from static embedding?
The vector for the word will vary depending on its context
What does self-supervision mean?
It means the training signal comes from the running text itself (which words occur near which), so a large hand-labelled dataset is not needed
What does Word2Vec do in simple terms?
It trains a binary classifier to predict whether word A is likely to appear near word B; the learned classifier weights are then used as the embeddings
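One common way to write this classifier (the skip-gram formulation described in the following cards) scores a candidate target/context pair with the sigmoid of the dot product of the two embeddings; the exact notation here is an assumption, not taken from the card itself:

$$ P(+\mid w,c) = \sigma(\mathbf{c}\cdot\mathbf{w}) = \frac{1}{1 + e^{-\mathbf{c}\cdot\mathbf{w}}}, \qquad P(-\mid w,c) = 1 - \sigma(\mathbf{c}\cdot\mathbf{w}) $$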
What model is used by Word2Vec?
The skip-gram model (with negative sampling, SGNS)
What is the skip-gram model?
It pairs each target word with its neighbouring context words (positive examples), samples random words to create negative examples, trains a logistic-regression-style classifier to distinguish the two, and uses the learned weights as the embeddings
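A small sketch of how these training examples could be generated (function and variable names are illustrative; noise words are drawn uniformly here for brevity, whereas Word2Vec uses the weighted unigram sampling covered two cards below):

```python
import random

def skipgram_examples(tokens, window=2, k=2):
    """Build (target, context, label) triples: label 1 for real neighbours
    inside the window, label 0 for k randomly sampled noise words."""
    vocab = list(set(tokens))
    examples = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j == i:
                continue
            examples.append((target, tokens[j], 1))        # positive example
            for _ in range(k):                             # k negative examples per positive
                examples.append((target, random.choice(vocab), 0))
    return examples

print(skipgram_examples("the quick brown fox jumps".split(), window=2, k=1))
```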
What does a skip-gram model store?
A target embedding matrix W for target words and a context embedding matrix C for context and noise words
How do we avoid a bias towards common words when selecting noise words?
We sample the noise words from a weighted unigram frequency distribution, which dampens the advantage of very frequent words
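A sketch of that weighted sampling distribution; the exponent α = 0.75 is the value commonly reported for Word2Vec rather than something stated on this card:

```python
import numpy as np

def noise_distribution(counts, alpha=0.75):
    """P_alpha(w) = count(w)**alpha / sum over w' of count(w')**alpha.
    Raising counts to alpha < 1 flattens the distribution, so very frequent
    words are picked a little less often and rare words a little more."""
    words = list(counts)
    freqs = np.array([counts[w] for w in words], dtype=float) ** alpha
    return dict(zip(words, freqs / freqs.sum()))

counts = {"the": 1000, "quick": 50, "axolotl": 2}
print(noise_distribution(counts))   # "the" still dominates, but less extremely than raw counts
```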
How are Word2Vec embeddings learnt?
They are learnt by minimising a loss function using stochastic gradient descent (SGD)
What does the loss function do in Word2Vec?
It maximises the probability that the positive (real context) words are classified as neighbours of the target word, and maximises the probability that the negative (noise) words are classified as non-neighbours
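One standard way to write this loss for a single target word w, one positive context word c_pos and k sampled noise words c_neg_1 … c_neg_k (the formula is not given on the card, so take the notation as an assumption):

$$ L_{CE} = -\Big[\log \sigma(\mathbf{c}_{pos}\cdot\mathbf{w}) \;+\; \sum_{i=1}^{k}\log \sigma(-\mathbf{c}_{neg_i}\cdot\mathbf{w})\Big] $$

Minimising L_CE pushes σ(c_pos · w) towards 1 and σ(c_neg_i · w) towards 0, which is exactly the behaviour described above.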
Visually, what are we trying to do with Word2Vec?
We are adjusting the embeddings so that the association (similarity) between the target word and the positive examples increases, and the association between the target word and the negative examples decreases
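As a concrete sketch of that weight movement, here is what one stochastic gradient descent step on the loss above might look like in numpy (a minimal illustration assuming integer word indices into W and C; names such as sgd_step are hypothetical):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgd_step(W, C, t, c_pos, c_negs, lr=0.025):
    """One SGD step for a single target word t, its positive context word
    c_pos, and the sampled noise words c_negs (all given as row indices)."""
    w = W[t].copy()                      # use the old target vector for all gradients
    # Positive example: pull the real context word and the target together
    g_pos = sigmoid(C[c_pos] @ w) - 1.0
    grad_w = g_pos * C[c_pos]
    C[c_pos] -= lr * g_pos * w
    # Negative examples: push the sampled noise words and the target apart
    for n in c_negs:
        g_neg = sigmoid(C[n] @ w)
        grad_w += g_neg * C[n]
        C[n] -= lr * g_neg * w
    W[t] -= lr * grad_w

# Tiny usage example with random matrices
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(5, 3))
C = rng.normal(scale=0.1, size=(5, 3))
sgd_step(W, C, t=0, c_pos=1, c_negs=[2, 4])
```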
How are the target matrix and context matrix initialised?
They are randomly initialised, typically with Gaussian noise
How is the final word embedding matrix obtained?
By adding the target matrix (W) and the context matrix (C): W + C (using W alone is a common alternative)
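A minimal sketch tying the last two cards together (matrix sizes and the scale of the Gaussian noise are illustrative assumptions):

```python
import numpy as np

vocab_size, dim = 10_000, 100
rng = np.random.default_rng(0)

# Random initialisation of the two matrices the skip-gram model stores
W = rng.normal(scale=0.1, size=(vocab_size, dim))   # target embeddings
C = rng.normal(scale=0.1, size=(vocab_size, dim))   # context embeddings

# ... W and C are then trained with SGD on the negative-sampling loss ...

embeddings = W + C   # final word embedding matrix (using W alone is a common alternative)
```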