Word Embeddings Flashcards

1
Q

Embeddings

Word Embedding

A
  • Continuous vector representation of words
  • Can represent features of syntax and/or semantics
  • Can support more complex mappings than discrete symbol representations
2
Q

Representations

Distributional Representations

A
  • Words are similar if they appear in similar contexts
  • In contrast: non-distributional representations are created from lexical resources, such as WordNet
3
Q

Embeddings

WordNet

A

Large lexical database of words (not embeddings), including parts of speech, semantic relationships (e.g. synonyms), etc.

4
Q

Representations

Distributed Representations

A
  • Each item is represented by vector of values
  • Each feature in the vector represents a distinct attribute
  • In contrast: local representations are represented using discrete symbols
5
Q

Types of Embeddings

Two types of embeddings

A
  1. Count-based
  2. Prediction-based
6
Q

Types of Embeddings

Count-Based Embeddings

A
  • Represent a word using the normalized counts of its context words
  • Sparse vectors
7
Q

Types of Embeddings

Prediction-Based Embeddings

A
  • Represent a word using a low-dimensional continuous vector
  • Vectors are learned by training a classifier to predict a word from its context (unsupervised)
  • Word embeddings are the byproduct
  • Dense vectors
8
Q

Count-Based Embeddings

How to create word-context count matrix

A
  • Count # of co-occurrences of each word/context pair
  • Rows as words
  • Columns as contexts
  • Frequently co-occurring words like “they” and “the” are low-importance and shouldn’t carry much weight (hence weightings like tf-idf and PMI)
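A minimal sketch of building such a matrix in Python, assuming a toy corpus and a symmetric window of one word; the corpus and variable names are illustrative, not from the card.

```python
import numpy as np

# Toy corpus and window size are illustrative assumptions.
corpus = [["the", "dog", "barks"], ["the", "cat", "meows"]]
window = 1

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Rows = words, columns = context words.
counts = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, word in enumerate(sent):
        lo, hi = max(0, i - window), min(len(sent), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[idx[word], idx[sent[j]]] += 1

# Normalize each row so a word is represented by its context distribution.
normalized = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)
```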
9
Q

Count-Based Embeddings

tf-idf

A
w_(t,d) = tf_(t, d) * idf_t
  • Weight value for word t in document d
  • Words like “the” and “it” have very low idf
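A minimal sketch of this weighting, using the tf and idf definitions from the later cards; the toy documents are an illustrative assumption.

```python
import math

# Toy documents (illustrative assumption).
docs = [["the", "dog", "barks"], ["the", "dog", "runs"], ["the", "cat", "sleeps"]]

def tf(term, doc):
    # tf_(t,d): frequency of term t in document d, normalized by document length.
    return doc.count(term) / len(doc)

def idf(term, docs):
    # idf_t: log of (total num of docs / num of docs containing term t).
    return math.log(len(docs) / sum(1 for d in docs if term in d))

def tfidf(term, doc, docs):
    # w_(t,d) = tf_(t,d) * idf_t
    return tf(term, doc) * idf(term, docs)

print(tfidf("dog", docs[0], docs))  # appears in 2 of 3 docs, so weight > 0
print(tfidf("the", docs[0], docs))  # appears in every doc, so idf = 0 and weight = 0
```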
10
Q

Count-Based Embeddings

PMI

A
PMI(w1, w2) = log( p(w1, w2) / ( p(w1)*p(w2) ) )
  • Pointwise mutual information
  • See if words like “good” appear more often with “great” than we would expect by chance
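A minimal sketch of PMI computed from raw (word, context word) co-occurrence pairs; the pairs themselves are an illustrative assumption.

```python
import math
from collections import Counter

# Illustrative (word, context word) co-occurrence pairs.
pairs = [("good", "great"), ("good", "great"), ("good", "movie"), ("bad", "movie")]

pair_counts = Counter(pairs)
word_counts = Counter(w for w, _ in pairs)
ctx_counts = Counter(c for _, c in pairs)
total = len(pairs)

def pmi(w, c):
    # PMI(w, c) = log( p(w, c) / (p(w) * p(c)) )
    p_wc = pair_counts[(w, c)] / total
    p_w = word_counts[w] / total
    p_c = ctx_counts[c] / total
    return math.log(p_wc / (p_w * p_c))

# "good" co-occurs with "great" more often than chance predicts -> positive PMI.
print(pmi("good", "great"))
```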
11
Q

Count-Based Embeddings

How to measure the closeness?

A

Measure the closeness using cosine similarity
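A minimal sketch of cosine similarity between two embedding vectors; the example vectors are illustrative.

```python
import numpy as np

def cosine_similarity(u, v):
    # cos(u, v) = (u . v) / (||u|| * ||v||); closer to 1 means more similar direction.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

dog = np.array([0.8, 0.1, 0.3])  # illustrative embedding
cat = np.array([0.7, 0.2, 0.3])  # illustrative embedding
print(cosine_similarity(dog, cat))
```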

12
Q

Prediction-Based Methods: Word2Vec

CBOW

A
  • Continuous bag of words
  • NLP model
  • Predict word based on sum of surrounding embeddings
  • Used in Word2Vec
    Input: context of words, a fixed number of words surrounding the target
    Output: the target word the context is associated with
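A minimal NumPy sketch of the CBOW idea: sum the context embeddings, then score every vocabulary word as the candidate target. The vocabulary, dimensions, and random initialization are illustrative assumptions; the real Word2Vec model learns these matrices by gradient descent.

```python
import numpy as np

vocab = ["the", "dog", "barks", "at", "night"]  # illustrative
idx = {w: i for i, w in enumerate(vocab)}
dim = 8  # embedding size (assumption)

rng = np.random.default_rng(0)
emb_in = rng.normal(size=(len(vocab), dim))   # context (input) embeddings
emb_out = rng.normal(size=(len(vocab), dim))  # target (output) embeddings

def cbow_predict(context_words):
    # Sum the embeddings of the surrounding context words...
    h = sum(emb_in[idx[w]] for w in context_words)
    # ...then score every vocabulary word as the target (softmax over scores).
    scores = emb_out @ h
    probs = np.exp(scores - scores.max())
    return probs / probs.sum()

# Fixed-size context around the held-out target "dog".
print(cbow_predict(["the", "barks"]))
```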
13
Q

Prediction-Based Methods: Word2Vec

Skip-gram

A
  • NLP model
  • Predict each word in the context given the word
  • Used in Word2Vec
    Input: target word in the middle of a context
    Output: surrounding context words that are likely to appear around the target word
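A complementary sketch of how skip-gram training pairs (target, context word) would be generated from a sentence; the sentence and window size are illustrative assumptions.

```python
# Each pair trains the model to predict a context word given the target word.
sentence = ["the", "dog", "barks", "at", "night"]  # illustrative
window = 2

pairs = []
for i, target in enumerate(sentence):
    for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
        if j != i:
            pairs.append((target, sentence[j]))

print(pairs[:4])  # ('the', 'dog'), ('the', 'barks'), ('dog', 'the'), ('dog', 'barks')
```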
14
Q

Embeddings

How to learn word embeddings?

A
  1. Pre-train on unsupervised task and use byproduct embeddings
  2. Randomly initialize embeddings then train jointly with the task
  3. Pre-train on a supervised task (e.g. POS tagging), then use the byproduct embeddings on another task (e.g. parsing)
15
Q

Contexts

Small Context Window

A

Creates syntax-based embeddings

16
Q

Contexts

Large Context Window

A

Creates more semantics-based embeddings

17
Q

Contexts

Context based on syntax

A

More functional embeddings, with words of the same inflection grouped together

18
Q

Count-Based Embeddings

tf_(t, d)

A

tf_(t,d) = (num of times term t appears in doc d) / (total num of terms in doc d)

19
Q

Count-Based Embeddings

idf_t

A
idf_t = log( (total num of docs) / (num of documents containing term t) )
20
Q

Prediction-Based Methods

Two Prediction-Based Methods

A
  1. Word2Vec (CBOW, Skip-gram)
  2. Neural Language Models
21
Q

Embeddings

How to use word embeddings?

A

People release pre-trained word embeddings as resources. Two ways to use them:
  • Initialize
  • Concatenate

22
Q

Embeddings

How to initialize word embeddings?

A

Initialize with pre-trained embeddings

23
Q

Embeddings

How to create concatenated word embeddings?

A

Concatenate pre-trained embeddings with learned embeddings. Makes larger embeddings.
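A minimal sketch of both uses with a small lookup table; the pre-trained matrix, vocabulary, and dimensions are illustrative assumptions (in practice the vectors would come from a released resource such as Word2Vec embeddings).

```python
import numpy as np

vocab = ["the", "dog", "cat"]  # illustrative
pretrained = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])  # illustrative pre-trained vectors

# (1) Initialize: start the task's embedding table from the pre-trained vectors,
#     then fine-tune it while training on the task.
task_embeddings = pretrained.copy()

# (2) Concatenate: keep the pre-trained vectors and append task-specific learned
#     embeddings, producing larger combined vectors.
rng = np.random.default_rng(0)
learned = rng.normal(size=(len(vocab), 2))
combined = np.concatenate([pretrained, learned], axis=1)

print(combined.shape)  # (3, 4): pre-trained dims + learned dims
```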

24
Q

Embeddings: Evaluating Embeddings

Two ways to evaluate word embeddings

A
  1. Intrinsic vs Extrinsic
  2. Qualitative vs Quantitative
25
Q

Embeddings: Evaluating Embeddings

Intrinsic vs Extrinsic

A

Intrinsic: How good is it based on its features?
Extrinsic: How useful is it for the downstream tasks?

26
Q

Embeddings: Evaluating Embeddings

Qualitative vs Quantitative

A

Qualitative: Examine the characteristics of examples
Quantitative: Calculate statistics

27
Q

Embeddings: Evaluating Embeddings

Semantic-Syntactic Word Relationship Test

A
  • 5 types of semantic questions
  • 9 types of syntactic questions
  • Evaluation: accuracy on retrieving correct word as the closest word
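A minimal sketch of the analogy-style retrieval used in this test: answer “a is to b as c is to ?” via vector arithmetic and check whether the nearest word is the expected answer. The tiny embedding table is an illustrative assumption; the real test uses full pre-trained embeddings.

```python
import numpy as np

# Tiny illustrative embedding table.
emb = {
    "king":  np.array([0.9, 0.8]),
    "queen": np.array([0.9, 0.2]),
    "man":   np.array([0.1, 0.8]),
    "woman": np.array([0.1, 0.2]),
}

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def analogy(a, b, c):
    # "a is to b as c is to ?": retrieve the word closest to (b - a + c).
    target = emb[b] - emb[a] + emb[c]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

# Accuracy = fraction of questions where the retrieved word matches the expected one.
print(analogy("man", "king", "woman"))  # expected: "queen"
```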
28
Q

Embeddings

Visualization of Embeddings

A

Reduce high-dimensional embeddings to 2D/3D for visualization

29
Q

Embeddings

Non-Linear Projection

A

Groups items that are close in the high-dimensional space (e.g. t-SNE)
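A minimal sketch using t-SNE (one common non-linear projection) from scikit-learn; the random “embeddings” and parameters are illustrative assumptions.

```python
import numpy as np
from sklearn.manifold import TSNE

# Illustrative random vectors standing in for learned word embeddings.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 50))  # 100 words, 50 dimensions

# Project non-linearly down to 2D; points that are close in the original
# high-dimensional space tend to stay grouped together.
projected = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(embeddings)
print(projected.shape)  # (100, 2)
```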

30
Q

Embeddings

Limitations of Embeddings

A
  • Sensitive to superficial differences (dog/dogs)
  • Insensitive to context (financial bank vs. bank of a river)
  • Not necessarily coordinated with knowledge across languages, just based on the training data
  • Not interpretable
  • Can encode bias