Word Embeddings Flashcards
Embeddings
Word Embedding
- Continuous vector representation of words
- Can represent features of syntax and/or semantics
- Can serve as the input to more complex mappings (e.g., further neural network layers)
Representations
Distributional Representations
- Words are similar if they appear in similar contexts
- In contrast: non-distributional representations are created from lexical resources, such as WordNet
Embeddings
WordNet
Large lexical database of words and their senses, organized by parts of speech and semantic relations (synonyms, hypernyms, etc.); a lexical resource, not a database of embeddings
Representations
Distributed Representations
- Each item is represented by a vector of values
- Each feature in the vector represents a distinct attribute
- In contrast: local representations represent each item with a discrete symbol (e.g., a one-hot vector)
Types of Embeddings
Two types of embeddings
- Count-based
- Prediction-based
Types of Embeddings
Count-Based Embeddings
- Represent a word using the normalized counts of its context words
- Sparse vectors
Types of Embeddings
Prediction-Based Embeddings
- Represent a word using a small-dimension continuous vector
- Vectors are learned by training a classifier to predict a word from its context (or vice versa); unsupervised, since only raw text is needed
- Word embeddings are a byproduct of this training
- Dense vectors
Count-Based Embeddings
How to create a word-context count matrix
- Count the number of co-occurrences of each word/context pair
- Rows represent words
- Columns represent contexts
- Frequent co-occurring words like “they” and “the” carry little information and should be down-weighted (e.g., with tf-idf or PMI; see the sketch below)
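A minimal sketch of building such a word-context count matrix with a symmetric window; the toy corpus and window size are illustrative assumptions:

```python
from collections import defaultdict

# Toy corpus and window size (illustrative assumptions)
corpus = [["the", "cat", "sat", "on", "the", "mat"],
          ["the", "dog", "sat", "on", "the", "rug"]]
window = 2

# counts[word][context] = number of times `context` appears
# within `window` positions of `word`
counts = defaultdict(lambda: defaultdict(int))
for sentence in corpus:
    for i, word in enumerate(sentence):
        lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[word][sentence[j]] += 1

print(dict(counts["sat"]))  # {'the': 4, 'cat': 1, 'on': 2, 'dog': 1}
```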
Count-Based Embeddings
tf-idf
w_(t,d) = tf_(t,d) * idf_t
- Weight for term t in document d
- tf_(t,d) is the count of t in d; idf_t = log(N / df_t), where N is the number of documents and df_t is the number of documents containing t
- Words like “the” and “it” occur in nearly every document, so they have very low idf (sketch below)
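A minimal sketch of tf-idf weighting over a tiny document collection; the documents are illustrative assumptions, and idf is computed as log(N / df_t) as above:

```python
import math
from collections import Counter

# Toy documents (illustrative assumption)
docs = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "cat", "ran"]]
N = len(docs)

# df[t]: number of documents containing term t
df = Counter()
for doc in docs:
    df.update(set(doc))

def tf_idf(term, doc):
    tf = doc.count(term)             # tf_(t,d): raw count of t in d
    idf = math.log(N / df[term])     # idf_t = log(N / df_t)
    return tf * idf                  # w_(t,d) = tf * idf

print(tf_idf("the", docs[0]))  # 0.0: "the" appears in every document
print(tf_idf("cat", docs[0]))  # ~0.405: log(3/2)
```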
Count-Based Embeddings
PMI
PMI(w1, w2) = log( p(w1, w2) / ( p(w1)*p(w2) ) )
- Pointwise mutual information
- See if words like “good” appear more often with “great” than we would expect by chance
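A minimal sketch of PMI computed from a toy word-context count matrix (the words, contexts, and counts are illustrative assumptions); in practice negative values are often clipped to zero (PPMI):

```python
import numpy as np

# Toy word-context count matrix: rows = words, columns = contexts
words    = ["good", "bad"]
contexts = ["great", "terrible"]
counts = np.array([[8.0, 2.0],
                   [1.0, 9.0]])

total = counts.sum()
p_wc = counts / total                               # joint p(w, c)
p_w  = counts.sum(axis=1, keepdims=True) / total    # marginal p(w)
p_c  = counts.sum(axis=0, keepdims=True) / total    # marginal p(c)

pmi = np.log(p_wc / (p_w * p_c))    # PMI(w, c) = log( p(w,c) / (p(w) * p(c)) )
ppmi = np.maximum(pmi, 0)           # PPMI: clip negative values to 0

print(pmi[0, 0])  # PMI("good", "great") > 0: they co-occur more than chance
```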
Count-Based Embeddings
How to measure the closeness?
Measure closeness with cosine similarity: cos(u, v) = (u · v) / (|u| |v|) (sketch below)
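A minimal sketch of cosine similarity between two word vectors; the toy vectors are illustrative assumptions:

```python
import numpy as np

def cosine_similarity(u, v):
    # cos(u, v) = (u . v) / (|u| * |v|), ranges from -1 to 1
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Toy embedding vectors (illustrative assumptions)
v_good  = np.array([0.9, 0.1, 0.3])
v_great = np.array([0.8, 0.2, 0.4])
v_bad   = np.array([-0.7, 0.5, 0.1])

print(cosine_similarity(v_good, v_great))  # close to 1: similar directions
print(cosine_similarity(v_good, v_bad))    # much lower
```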
Prediction-Based Methods: Word2Vec
CBOW
- Continuous bag of words
- NLP model
- Predict the target word based on the sum of the surrounding words' embeddings
- Used in Word2Vec
Input: a context of words (a fixed number of words surrounding the target)
Output: the target word the context is associated with (sketch below)
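A minimal PyTorch-style sketch of the CBOW idea, assuming illustrative vocabulary size, embedding dimension, and window; real Word2Vec training adds tricks such as negative sampling:

```python
import torch
import torch.nn as nn

class CBOW(nn.Module):
    def __init__(self, vocab_size, embed_dim):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, embed_dim)  # learned word vectors
        self.out = nn.Linear(embed_dim, vocab_size)            # scores over the vocabulary

    def forward(self, context_ids):
        # context_ids: (batch, 2 * window) indices of the surrounding words
        summed = self.embeddings(context_ids).sum(dim=1)       # sum of context embeddings
        return self.out(summed)                                # logits for the target word

# Illustrative sizes
model = CBOW(vocab_size=5000, embed_dim=100)
context = torch.randint(0, 5000, (8, 4))   # batch of 8, window of 2 on each side
logits = model(context)                    # shape (8, 5000)
loss = nn.functional.cross_entropy(logits, torch.randint(0, 5000, (8,)))
```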
Prediction-Based Methods: Word2Vec
Skip-gram
- NLP model
- Predict each word in the context given the target word
- Used in Word2Vec
Input: the target word in the middle of a context
Output: the surrounding context words that are likely to appear around the target word (sketch below)
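A minimal sketch of how skip-gram training examples are formed: every (target, context) pair within the window becomes one example in which the target word predicts the context word; the sentence and window size are illustrative assumptions:

```python
# Toy tokenized sentence and window size (illustrative assumptions)
sentence = ["the", "quick", "brown", "fox", "jumps"]
window = 2

pairs = []  # (target, context) training examples
for i, target in enumerate(sentence):
    lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
    for j in range(lo, hi):
        if j != i:
            pairs.append((target, sentence[j]))

# e.g. ('brown', 'the'), ('brown', 'quick'), ('brown', 'fox'), ('brown', 'jumps')
print(pairs)
```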
Embeddings
How to learn word embeddings?
- Pre-train on unsupervised task and use byproduct embeddings
- Randomly initialize embeddings, then train them jointly with the task (sketch below)
- Pre-train embeddings as part of one supervised task (e.g. POS tagging), then reuse the byproduct embeddings for another task (e.g. parsing)
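A minimal PyTorch-style sketch of the second option (random initialization plus joint training): the embeddings sit inside a hypothetical bag-of-embeddings classifier and are updated by the task's gradients; all sizes and the task itself are illustrative assumptions:

```python
import torch
import torch.nn as nn

class TextClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim, num_classes):
        super().__init__()
        # Randomly initialized; updated by backprop along with the classifier
        self.embeddings = nn.Embedding(vocab_size, embed_dim)
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids):
        # Average the word embeddings of the input tokens, then classify
        return self.classifier(self.embeddings(token_ids).mean(dim=1))

model = TextClassifier(vocab_size=5000, embed_dim=100, num_classes=2)
tokens = torch.randint(0, 5000, (8, 20))   # batch of 8 sentences, 20 tokens each
loss = nn.functional.cross_entropy(model(tokens), torch.randint(0, 2, (8,)))
loss.backward()                            # gradients also flow into the embeddings
```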
Contexts
Small Context Window
Creates syntax-based embeddings
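A small sketch of where the context window is set when training Word2Vec with gensim; the toy corpus is an illustrative assumption and parameter names follow gensim 4.x:

```python
from gensim.models import Word2Vec

# Toy tokenized corpus (illustrative assumption); real corpora are far larger
sentences = [["the", "cat", "sat", "on", "the", "mat"],
             ["the", "dog", "chased", "the", "cat"]]

# Small window: nearest neighbors tend to reflect syntactic behavior
syntactic = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

# Larger window: nearest neighbors tend to be more topical / semantic
topical = Word2Vec(sentences, vector_size=50, window=10, min_count=1, sg=1)

print(syntactic.wv["cat"].shape)  # (50,)
```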