Word Embeddings Flashcards

1
Q

Embeddings

Word Embedding

A
  • Continuous vector representation of words
  • Can represent features of syntax and/or semantics
  • Can support more complex mappings than discrete symbol representations
2
Q

Representations

Distributional Representations

A
  • Words are similar if they appear in similar contexts
  • In contrast: non-distributional representations are created from lexical resources, such as WordNet
3
Q

Embeddings

WordNet

A

Large lexical database of words (not embeddings), including parts of speech, semantic relationships (e.g. synonyms), etc.

4
Q

Representations

Distributed Representations

A
  • Each item is represented by vector of values
  • Each feature in the vector represents a distinct attribute
  • In contrast: local representations are represented using discrete symbols
5
Q

Types of Embeddings

Two types of embeddings

A
  1. Count-based
  2. Prediction-based
6
Q

Types of Embeddings

Count-Based Embeddings

A
  • Represent a word using the normalized counts of its context words
  • Sparse vectors
7
Q

Types of Embeddings

Prediction-Based Embeddings

A
  • Represent a word using a low-dimensional continuous vector
  • Vectors are learned by training a classifier to predict a word from its context (unsupervised)
  • Word embeddings are the byproduct
  • Dense vectors
8
Q

Count-Based Embeddings

How to create word-context count matrix

A
  • Count # of co-occurrences of each word/context pair
  • Rows as words
  • Columns as contexts
  • Frequently co-occurring words like “they” and “the” are low-importance and shouldn’t carry much weight (hence weightings like tf-idf and PMI)
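A minimal sketch of building such a matrix in Python, assuming a toy corpus and a symmetric window of one word; the corpus and variable names are illustrative, not from the card.

```python
import numpy as np

# Toy corpus and window size are illustrative assumptions.
corpus = [["the", "dog", "barks"], ["the", "cat", "meows"]]
window = 1

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Rows = words, columns = context words.
counts = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, word in enumerate(sent):
        lo, hi = max(0, i - window), min(len(sent), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[idx[word], idx[sent[j]]] += 1

# Normalize each row so a word is represented by its context distribution.
normalized = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)
```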
9
Q

Count-Based Embeddings

tf-idf

A
w_(t,d) = tf_(t, d) * idf_t
  • Weight value for word t in document d
  • Words like “the” and “it” have very low idf
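A minimal sketch of this weighting, using the tf and idf definitions from the later cards; the toy documents are an illustrative assumption.

```python
import math

# Toy documents (illustrative assumption).
docs = [["the", "dog", "barks"], ["the", "dog", "runs"], ["the", "cat", "sleeps"]]

def tf(term, doc):
    # tf_(t,d): frequency of term t in document d, normalized by document length.
    return doc.count(term) / len(doc)

def idf(term, docs):
    # idf_t: log of (total num of docs / num of docs containing term t).
    return math.log(len(docs) / sum(1 for d in docs if term in d))

def tfidf(term, doc, docs):
    # w_(t,d) = tf_(t,d) * idf_t
    return tf(term, doc) * idf(term, docs)

print(tfidf("dog", docs[0], docs))  # appears in 2 of 3 docs, so weight > 0
print(tfidf("the", docs[0], docs))  # appears in every doc, so idf = 0 and weight = 0
```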
10
Q

Count-Based Embeddings

PMI

A
PMI(w1, w2) = log( p(w1, w2) / ( p(w1)*p(w2) ) )
  • Pointwise mutual information
  • See if words like “good” appear more often with “great” than we would expect by chance
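A minimal sketch of PMI computed from raw (word, context word) co-occurrence pairs; the pairs themselves are an illustrative assumption.

```python
import math
from collections import Counter

# Illustrative (word, context word) co-occurrence pairs.
pairs = [("good", "great"), ("good", "great"), ("good", "movie"), ("bad", "movie")]

pair_counts = Counter(pairs)
word_counts = Counter(w for w, _ in pairs)
ctx_counts = Counter(c for _, c in pairs)
total = len(pairs)

def pmi(w, c):
    # PMI(w, c) = log( p(w, c) / (p(w) * p(c)) )
    p_wc = pair_counts[(w, c)] / total
    p_w = word_counts[w] / total
    p_c = ctx_counts[c] / total
    return math.log(p_wc / (p_w * p_c))

# "good" co-occurs with "great" more often than chance predicts -> positive PMI.
print(pmi("good", "great"))
```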
11
Q

Count-Based Embeddings

How to measure the closeness?

A

Measure the closeness using cosine similarity
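A minimal sketch of cosine similarity between two embedding vectors; the example vectors are illustrative.

```python
import numpy as np

def cosine_similarity(u, v):
    # cos(u, v) = (u . v) / (||u|| * ||v||); closer to 1 means more similar direction.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

dog = np.array([0.8, 0.1, 0.3])  # illustrative embedding
cat = np.array([0.7, 0.2, 0.3])  # illustrative embedding
print(cosine_similarity(dog, cat))
```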

12
Q

Prediction-Based Methods: Word2Vec

CBOW

A
  • Continuous bag of words
  • NLP model
  • Predict word based on sum of surrounding embeddings
  • Used in Word2Vec
    Input: context of words, a fixed number of words surrounding the target
    Output: the target word the context is associated with
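A minimal NumPy sketch of the CBOW idea: sum the context embeddings, then score every vocabulary word as the candidate target. The vocabulary, dimensions, and random initialization are illustrative assumptions; the real Word2Vec model learns these matrices by gradient descent.

```python
import numpy as np

vocab = ["the", "dog", "barks", "at", "night"]  # illustrative
idx = {w: i for i, w in enumerate(vocab)}
dim = 8  # embedding size (assumption)

rng = np.random.default_rng(0)
emb_in = rng.normal(size=(len(vocab), dim))   # context (input) embeddings
emb_out = rng.normal(size=(len(vocab), dim))  # target (output) embeddings

def cbow_predict(context_words):
    # Sum the embeddings of the surrounding context words...
    h = sum(emb_in[idx[w]] for w in context_words)
    # ...then score every vocabulary word as the target (softmax over scores).
    scores = emb_out @ h
    probs = np.exp(scores - scores.max())
    return probs / probs.sum()

# Fixed-size context around the held-out target "dog".
print(cbow_predict(["the", "barks"]))
```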
13
Q

Prediction-Based Methods: Word2Vec

Skip-gram

A
  • NLP model
  • Predict each word in the context given the word
  • Used in Word2Vec
    Input: target word in the middle of a context
    Output: surrounding context words that are likely to appear around the target word
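A complementary sketch of how skip-gram training pairs (target, context word) would be generated from a sentence; the sentence and window size are illustrative assumptions.

```python
# Each pair trains the model to predict a context word given the target word.
sentence = ["the", "dog", "barks", "at", "night"]  # illustrative
window = 2

pairs = []
for i, target in enumerate(sentence):
    for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
        if j != i:
            pairs.append((target, sentence[j]))

print(pairs[:4])  # ('the', 'dog'), ('the', 'barks'), ('dog', 'the'), ('dog', 'barks')
```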
14
Q

Embeddings

How to learn word embeddings?

A
  1. Pre-train on unsupervised task and use byproduct embeddings
  2. Randomly initialize embeddings then train jointly with the task
  3. Pre-train on a supervised task (e.g. POS tagging), then use the byproduct embeddings on another task (e.g. parsing)
15
Q

Contexts

Small Context Window

A

Creates syntax-based embeddings

16
Q

Contexts

Large Context Window

A

Creates more semantics-based embeddings

17
Q

Contexts

Context based on syntax

A

More functional embeddings, with words of the same inflection grouped together

18
Q

Count-Based Embeddings

tf_(t, d)

A

tf_(t,d) = (num of times term t appears in doc d) / (total num of terms in doc d)

19
Q

Count-Based Embeddings

idf_t

A
idf_t = log( (total num of docs) / (num of documents containing term t) )
20
Q

Prediction-Based Methods

Two Prediction-Based Methods

A
  1. Word2Vec (CBOW, Skip-gram)
  2. Neural Language Models
21
Q

Embeddings

How to use word embeddings?

A

People release pre-trained word embeddings as resources. Two ways to use them:
  • Initialize
  • Concatenate

22
Q

Embeddings

How to initialize word embeddings?

A

Initialize with pre-trained embeddings

23
Q

Embeddings

How to create concatenated word embeddings?

A

Concatenate pre-trained embeddings with learned embeddings. Makes larger embeddings.
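A minimal sketch of both uses with a small lookup table; the pre-trained matrix, vocabulary, and dimensions are illustrative assumptions (in practice the vectors would come from a released resource such as Word2Vec embeddings).

```python
import numpy as np

vocab = ["the", "dog", "cat"]  # illustrative
pretrained = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])  # illustrative pre-trained vectors

# (1) Initialize: start the task's embedding table from the pre-trained vectors,
#     then fine-tune it while training on the task.
task_embeddings = pretrained.copy()

# (2) Concatenate: keep the pre-trained vectors and append task-specific learned
#     embeddings, producing larger combined vectors.
rng = np.random.default_rng(0)
learned = rng.normal(size=(len(vocab), 2))
combined = np.concatenate([pretrained, learned], axis=1)

print(combined.shape)  # (3, 4): pre-trained dims + learned dims
```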

24
Q

Embeddings: Evaluating Embeddings

Two ways to evaluate word embeddings

A
  1. Intrinsic vs Extrinsic
  2. Qualitative vs Quantitative
25
Q

Embeddings: Evaluating Embeddings

Intrinsic vs Extrinsic

A

Intrinsic: How good is it based on its features?
Extrinsic: How useful is it for the downstream tasks?

26
Q

Embeddings: Evaluating Embeddings

Qualitative vs Quantitative

A

Qualitative: Examine the characteristics of examples
Quantitative: Calculate statistics

27
Q

Embeddings: Evaluating Embeddings

Semantic-Syntactic Word Relationship Test

A
  • 5 types of semantic questions
  • 9 types of syntactic questions
  • Evaluation: accuracy on retrieving correct word as the closest word
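A minimal sketch of the analogy-style retrieval used in this test: answer “a is to b as c is to ?” via vector arithmetic and check whether the nearest word is the expected answer. The tiny embedding table is an illustrative assumption; the real test uses full pre-trained embeddings.

```python
import numpy as np

# Tiny illustrative embedding table.
emb = {
    "king":  np.array([0.9, 0.8]),
    "queen": np.array([0.9, 0.2]),
    "man":   np.array([0.1, 0.8]),
    "woman": np.array([0.1, 0.2]),
}

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def analogy(a, b, c):
    # "a is to b as c is to ?": retrieve the word closest to (b - a + c).
    target = emb[b] - emb[a] + emb[c]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

# Accuracy = fraction of questions where the retrieved word matches the expected one.
print(analogy("man", "king", "woman"))  # expected: "queen"
```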
28
Q

Embeddings

Visualization of Embeddings

A

Reduce high-dimensional embeddings to 2D/3D for visualization

29
Q

Embeddings

Non-Linear Projection

A

Groups items that are close in the high-dimensional space (e.g. t-SNE)
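A minimal sketch using t-SNE (one common non-linear projection) from scikit-learn; the random “embeddings” and parameters are illustrative assumptions.

```python
import numpy as np
from sklearn.manifold import TSNE

# Illustrative random vectors standing in for learned word embeddings.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 50))  # 100 words, 50 dimensions

# Project non-linearly down to 2D; points that are close in the original
# high-dimensional space tend to stay grouped together.
projected = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(embeddings)
print(projected.shape)  # (100, 2)
```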

30
Q

Embeddings

Limitations of Embeddings

A
  • Sensitive to superficial differences (dog/dogs)
  • Insensitive to context (financial bank vs. bank of a river)
  • Not necessarily coordinated with knowledge across languages, just based on the training data
  • Not interpretable
  • Can encode bias