Word Embeddings Flashcards
Embeddings
Word Embedding
- Continuous vector representation of words
- Can represent features of syntax and/or semantics
- Can introduce more complex mappings
Representations
Distributional Representations
- Words are similar if they appear in similar contexts
- In contrast: non-distributional representations are created from lexical resources, such as WordNet
Embeddings
WordNet
Large lexical database of words and their relationships, including parts of speech, semantic relationships (synonyms, hypernyms), etc. (a lexical resource, not a set of embeddings)
Representations
Distributed Representations
- Each item is represented by vector of values
- Each feature in the vector represents distinct attribute
- In contrast: local representations are represented using discrete symbols
Types of Embeddings
Two types of embeddings
- Count-based
- Prediction-based
Types of Embeddings
Count-Based Embeddings
- Represent a word using the normalized counts of its context words
- Sparse vectors
Types of Embeddings
Prediction-Based Embeddings
- Represent a word using a small-dimension continuous vector
- Vectors are learned by training a classifier to predict the word (unsupervised)
- Word embeddings are the byproduct
- Dense vectors
Count-Based Embeddings
How to create word-context count matrix
- count # of co-occurrences of each word/context pair
- rows as words
- columns as contexts
- Problem: frequently co-occurring words like “they” and “the” carry little information, so raw counts should be re-weighted
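A minimal sketch of building such a word-context count matrix from a toy tokenized corpus, assuming a symmetric window of two words (the corpus and window size here are illustrative, not from the cards):

```python
from collections import Counter, defaultdict

corpus = [["the", "dog", "barked", "at", "the", "cat"],
          ["the", "cat", "sat", "on", "the", "mat"]]
window = 2

# counts[word][context] holds the raw co-occurrence count for one cell
# of the word (row) x context (column) matrix.
counts = defaultdict(Counter)
for sent in corpus:
    for i, word in enumerate(sent):
        lo, hi = max(0, i - window), min(len(sent), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[word][sent[j]] += 1

print(counts["dog"])  # Counter({'the': 1, 'barked': 1, 'at': 1})
```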
Count-Based Embeddings
tf-idf
w_(t,d) = tf_(t, d) * idf_t
- Weight value for word t in document d
- words like “the” and “it” have very low idf
Count-Based Embeddings
PMI
PMI(w1, w2) = log( p(w1, w2) / ( p(w1)*p(w2) ) )
- Pointwise mutual information
- See if words like “good” appear more often with “great” than we would expect by chance
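A minimal sketch of PMI computed from raw counts; the counts below are made-up toy numbers, only meant to show a positive PMI for a pair that co-occurs more often than chance:

```python
import math

def pmi(count_pair, count_w1, count_w2, total_pairs, total_words):
    # PMI(w1, w2) = log( p(w1, w2) / (p(w1) * p(w2)) )
    p_joint = count_pair / total_pairs
    p_w1 = count_w1 / total_words
    p_w2 = count_w2 / total_words
    return math.log(p_joint / (p_w1 * p_w2))

# "good" and "great" co-occurring far more often than chance -> PMI > 0
print(pmi(count_pair=50, count_w1=1000, count_w2=800,
          total_pairs=1_000_000, total_words=1_000_000))  # ~4.1
```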
Count-Based Embeddings
How to measure the closeness?
Measure the closeness using cosine similarity
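A minimal sketch of cosine similarity between two embedding vectors (the toy vectors are illustrative):

```python
import numpy as np

def cosine_similarity(u, v):
    # cos(u, v) = (u . v) / (||u|| * ||v||); 1.0 means identical direction
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

good = np.array([0.9, 0.1, 0.3])
great = np.array([0.8, 0.2, 0.4])
print(cosine_similarity(good, great))  # close to 1.0 for similar words
```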
Prediction-Based Methods: Word2Vec
CBOW
- Continuous bag of words
- NLP model
- Predict word based on sum of surrounding embeddings
- Used in Word2Vec
Input: context of words, fixed num of words surrounding target
Output: the target word that the context is associated with
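A minimal untrained CBOW sketch in NumPy, assuming a toy vocabulary and random weights, just to show the shape of the computation (sum the context embeddings, then score every vocabulary word as the target):

```python
import numpy as np

vocab = ["the", "dog", "barked", "at", "cat"]
dim = 8
rng = np.random.default_rng(0)
emb = rng.normal(size=(len(vocab), dim))   # input (context) embeddings
out = rng.normal(size=(len(vocab), dim))   # output (target) scoring vectors

context = ["the", "barked", "at", "cat"]   # words surrounding the target
h = emb[[vocab.index(w) for w in context]].sum(axis=0)  # sum of context embeddings

scores = out @ h                           # one score per vocabulary word
probs = np.exp(scores - scores.max())
probs /= probs.sum()                       # softmax over the vocabulary
# With random, untrained weights the prediction is arbitrary; training would
# push probability mass toward the true target word ("dog").
print(vocab[int(probs.argmax())])
```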
Prediction-Based Methods: Word2Vec
Skip-gram
- NLP model
- Predict each word in the context given the target word
- Used in Word2Vec
Input: target word in the middle of a context
Output: surrounding context words that are likely to appear around the target word
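A minimal skip-gram sketch, the mirror image of the CBOW sketch above: the target word's embedding scores every vocabulary word as a possible context word (again toy sizes and untrained random weights):

```python
import numpy as np

vocab = ["the", "dog", "barked", "at", "cat"]
dim = 8
rng = np.random.default_rng(0)
emb = rng.normal(size=(len(vocab), dim))   # input (target) embeddings
out = rng.normal(size=(len(vocab), dim))   # output (context) scoring vectors

h = emb[vocab.index("dog")]                # embedding of the target word
scores = out @ h                           # one score per candidate context word
probs = np.exp(scores - scores.max())
probs /= probs.sum()                       # softmax over the vocabulary
# Training would raise the probabilities of the words actually seen in context.
top = sorted(zip(probs.tolist(), vocab), reverse=True)[:2]
print(top)  # two most likely context words under this untrained model
```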
Embeddings
How to learn word embeddings?
- Pre-train on unsupervised task and use byproduct embeddings
- Randomly initialize embeddings then train jointly with the task
- Pre-train on a supervised task (e.g. POS tagging), use the byproduct embeddings, and test on another task (e.g. parsing)
Contexts
Small Context Window
Creates syntax-based embeddings
Contexts
Large Context Window
Creates more semantics-based embeddings
Contexts
Context based on syntax
Creates more functional embeddings, with words of the same inflection grouped together
Count-Based Embeddings
tf_(t, d)
tf_(t,d) = (num of times term t appears in doc d) / (total num of terms in doc d)
Count-Based Embeddings
idf_t
idf_t = log( (total num of docs) / (num of docs containing term t) )
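A minimal tf-idf sketch that follows the tf and idf definitions on these cards (toy documents; real implementations often add smoothing or log-scaled term frequency):

```python
import math

docs = [["the", "dog", "barked"],
        ["the", "cat", "sat", "on", "the", "mat"],
        ["a", "good", "dog"]]

def tf(term, doc):
    return doc.count(term) / len(doc)

def idf(term, docs):
    return math.log(len(docs) / sum(1 for d in docs if term in d))

def tfidf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

print(tfidf("the", docs[1], docs))  # "the" is common across docs -> low weight
print(tfidf("cat", docs[1], docs))  # "cat" is distinctive -> higher weight
```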
Prediction-Based Methods
Two Prediction-Based Methods
- Word2Vec (CBOW, Skip-gram)
- Neural Language Models
Embeddings
How to use word embeddings?
People release pre-trained word embeddings as resources. Two ways to use:
* Initialize
* Concatenate
Embeddings
How to initialize word embeddings?
Initialize with pre-trained embeddings
Embeddings
How to create concatenated word embeddings?
Concatenate pre-trained embeddings with embeddings learned on the task, producing larger embeddings.
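A minimal sketch of both uses with a PyTorch embedding layer; `pretrained` is a stand-in random tensor for vectors you would actually load from a released resource (e.g. GloVe or Word2Vec):

```python
import torch
import torch.nn as nn

vocab_size, pre_dim, learned_dim = 1000, 300, 64
pretrained = torch.randn(vocab_size, pre_dim)   # stand-in for loaded vectors

# 1) Initialize: start from the pre-trained vectors and keep training them
#    jointly with the downstream task (freeze=False).
emb_init = nn.Embedding.from_pretrained(pretrained, freeze=False)

# 2) Concatenate: frozen pre-trained vectors alongside freshly learned ones,
#    giving a larger (pre_dim + learned_dim) embedding per word.
emb_fixed = nn.Embedding.from_pretrained(pretrained, freeze=True)
emb_learned = nn.Embedding(vocab_size, learned_dim)

word_ids = torch.tensor([3, 17, 42])
combined = torch.cat([emb_fixed(word_ids), emb_learned(word_ids)], dim=-1)
print(combined.shape)  # torch.Size([3, 364])
```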
Embeddings: Evaluating Embeddings
Two ways to evaluate word embeddings
- Intrinsic vs Extrinsic
- Qualitative vs Quantitative
Embeddings: Evaluating Embeddings
Intrinsic vs Extrinsic
Intrinsic: How good is it based on its features?
Extrinsic: How useful is it for the downstream tasks?
Embeddings: Evaluating Embeddings
Qualitative vs Quantitative
Qualitative: Examine the characteristics of examples
Quantitative: Calculate statistics
Embeddings: Evaluating Embeddings
Semantic-Syntactic Word Relationship Test
- 5 types of semantic questions
- 9 types of syntactic questions
- Evaluation: accuracy on retrieving correct word as the closest word
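A minimal sketch of a single analogy question from such a test ("man is to woman as king is to ?"): form vec(b) - vec(a) + vec(c) and check whether the correct word is retrieved as the closest remaining word. The toy vectors are hand-made so the example resolves cleanly:

```python
import numpy as np

emb = {
    "man":   np.array([1.0, 0.0, 0.2]),
    "woman": np.array([1.0, 1.0, 0.2]),
    "king":  np.array([0.2, 0.0, 1.0]),
    "queen": np.array([0.2, 1.0, 1.0]),
}

def answer(a, b, c, emb):
    target = emb[b] - emb[a] + emb[c]          # vec(b) - vec(a) + vec(c)
    cos = lambda u, v: np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    candidates = {w: v for w, v in emb.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cos(candidates[w], target))

print(answer("man", "woman", "king", emb))  # "queen" counts as a correct retrieval
```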
Embeddings
Visualization of Embeddings
Reduce high-dimensional embeddings to 2D or 3D for visualization
Embeddings
Non-Linear Projection
Keeps items that are close in high-dimensional space close together in the low-dimensional view
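A minimal sketch using t-SNE (one common non-linear projection) from scikit-learn to reduce toy "embeddings" to 2D for plotting; the random matrix stands in for real word vectors:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(50, 300))     # 50 words, 300-dim vectors

coords = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(embeddings)
print(coords.shape)  # (50, 2) -- ready to scatter-plot with word labels
```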
Embeddings
Limitations of Embeddings
- Sensitive to superficial differences (dog/dogs)
- Insensitive to context (financial bank, bank of river)
- Not necessarily coordinated with real-world knowledge or consistent across languages; reflects only the training data
- Not interpretable
- Can encode bias