Vector Semantics and Embeddings Flashcards
Distributional hypothesis
Words that occur in similar contexts tend to have similar meanings.
The hypothesis was formulated in the 1950s by Joos, Harris and Firth, who noticed that words which are synonyms tended to occur in the same environment with the amount of meaning difference between two words “corresponding to the amount of difference in their environments”.
Vector semantics
Vector semantics instantiates the distributional hypothesis by learning representations of the meaning of words, called embeddings, directly from their distributions in texts.
Representation learning
Self-supervised learning, where useful representations of the input text are automatically learned, instead of crafting representations by hand using feature engineering.
Lexical semantics
The linguistic study of word meaning
Propositional meaning
Two words are synonymous if they are substitutable for one another in any sentence without changing the truth conditions of the sentence - the situations in which the sentence would be true.
Principle of contrast
A difference in linguistic form is always associated with some difference in meaning.
E.g. H₂O and water are synonymous. But H₂O is used in scientific contexts and would be inappropriate in a hiking guide.
Semantic field
A set of words which cover a particular semantic domain and bear structured relations with each other.
E.g. the semantic field of hospitals (surgeon, scalpel, nurse, anesthetic, hospital), restaurants (waiter, menu, plate, food, chef).
Semantic frame
A set of words that denote perspectives or participants in a particular type of event.
E.g. a commercial transaction is a kind of event in which one entity trades money to another entity in return for some good or service, after which the good changes hands or perhaps the service is performed.
This event can be encoded lexically by using verbs like buy (the event from the perspective of the buyer), sell (from the perspective of the seller), pay (focussing on the monetary aspect), or nouns like buyer.
Frames have semantic roles (buyer, seller, goods, money) and words in a sentence can take on these roles.
Connotations
Words have affective meanings.
The aspects of a word’s meaning that are related to a writer or reader’s emotions, sentiment, opinions or evaluations.
Sentiment
Positive or negative evaluation language.
Vector semantics
The standard way to represent word meaning in NLP.
The idea is to represent a word as a point in a multidimensional semantic space that is derived from the distributions of word neighbours.
Embeddings
Vectors for representing words.
Co-occurrence matrix
A way of representing how often words co-occur.
Term-document matrix
Each row represents a word in the vocabulary and each column represents a document from some collection of documents.
Each cell represents the number of times a particular word occurs in a particular document.
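A minimal sketch of building a term-document count matrix in Python; the document names and texts below are toy values invented for illustration:

```python
from collections import Counter

# Toy document collection (invented texts, not real word counts).
docs = {
    "doc_as_you_like_it": "battle good fool wit wit fool",
    "doc_julius_caesar": "battle battle soldier fool good",
}

# Count words per document, then build one row per vocabulary word,
# one column per document.
counts = {name: Counter(text.split()) for name, text in docs.items()}
vocab = sorted({w for text in docs.values() for w in text.split()})
term_document = {w: [counts[name][w] for name in docs] for w in vocab}

for word, row in term_document.items():
    print(word, row)   # e.g. fool [2, 1]
```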
Information retrieval
The task of finding the document d from the D documents in some collection that best matches a query q.
Term-term matrix
A matrix of dimensionality |V| x |V|, where each cell records the number of times the row word and the column word co-occur in some context in some training corpus.
The context could be the document. However, it is common to use smaller contexts, generally a window around the word, e.g. 4 words to the left and 4 words to the right.
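A rough sketch of counting a window-based term-term matrix; the corpus, window size, and function name are illustrative assumptions:

```python
from collections import defaultdict

def term_term_matrix(tokens, window=4):
    """Count how often each word co-occurs with words within ±window positions."""
    counts = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[word][tokens[j]] += 1
    return counts

# Toy corpus; real matrices are built from a large training corpus.
tokens = "the quick brown fox jumps over the lazy dog".split()
matrix = term_term_matrix(tokens, window=4)
print(matrix["fox"]["the"])   # 2: "the" appears twice within 4 words of "fox"
```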
Cosine similarity
cosine(v, w) = (v · w) / (|v| |w|)
Value ranges from -1 to 1.
But since raw frequency values are non-negative, the cosine similarity for these vectors ranges from 0 to 1.
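A small sketch of the computation, using plain Python on toy count vectors:

```python
import math

def cosine(v, w):
    """Dot product of v and w divided by the product of their vector lengths."""
    dot = sum(a * b for a, b in zip(v, w))
    return dot / (math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in w)))

# Toy count vectors: non-negative values, so the result falls in [0, 1].
print(cosine([1, 2, 3], [2, 4, 6]))   # ~1.0, vectors point in the same direction
print(cosine([1, 0, 0], [0, 1, 0]))   # 0.0, orthogonal vectors
```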
tf-idf weighting
A product of two terms: term frequency and inverse document frequency
w(t, d) = tf(t, d) × idf(t)
tf-idf
term frequency
The frequency of the word t in the document d:
tf(t, d) = count(t, d)
Commonly a log weighting is used:
tf(t, d) = log₁₀(count(t, d) + 1)
tf-idf
inverse document frequency
The document frequency of a term t, df(t), is the number of documents it occurs in.
The inverse document frequency, idf, where N is the total number of documents in the collection:
idf(t) = N / df(t)
Commonly a log weighting is used:
idf(t) = log₁₀(N / df(t))
The fewer documents a term occurs in, the higher this weight.
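Putting the two weights together, a minimal tf-idf sketch over a toy collection (the documents and the helper names tf/idf/tf_idf are illustrative assumptions):

```python
import math

def tf(term, doc_tokens):
    """Log-weighted term frequency: log10(count(t, d) + 1)."""
    return math.log10(doc_tokens.count(term) + 1)

def idf(term, all_docs):
    """Inverse document frequency: log10(N / df(t))."""
    df = sum(1 for doc in all_docs if term in doc)
    return math.log10(len(all_docs) / df) if df else 0.0

def tf_idf(term, doc_tokens, all_docs):
    return tf(term, doc_tokens) * idf(term, all_docs)

# Toy documents, each a list of tokens.
docs = ["good fool wit".split(), "battle soldier fool".split(), "wit wit fool".split()]
print(tf_idf("wit", docs[2], docs))    # occurs in fewer documents, so weighted up
print(tf_idf("fool", docs[2], docs))   # occurs in every document, so idf = 0
```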
Positive Pointwise Mutual Information
Intuition
The best way to weight the association between two words is to ask how much more two words co-occur in our corpus than we would have a priori expected them to appear by chance.
Pointwise Mutual Information
A measure of how often two events x and y occur, compared with what we would expect if they were independent:
I(x, y) = log₂( P(x, y) / (P(x) P(y)) )
The pointwise mutual information between a target word w and a context word c is then defined as:
PMI(w, c) = log₂( P(w, c) / (P(w) P(c)) )
The numerator tells us how often we observed the two words together (assuming we compute probability by using the MLE).
The denominator tells us how often we would expect the two words to co-occur assuming they each occurred independently.
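A small sketch of computing PPMI (i.e. max(PMI, 0)) from raw co-occurrence counts; the tiny count table below is invented for illustration:

```python
import math

def ppmi(cooc, w, c):
    """Positive PMI from a table of raw co-occurrence counts (dict of dicts),
    using maximum-likelihood estimates of P(w, c), P(w) and P(c)."""
    total = sum(sum(row.values()) for row in cooc.values())
    p_wc = cooc[w].get(c, 0) / total
    p_w = sum(cooc[w].values()) / total
    p_c = sum(row.get(c, 0) for row in cooc.values()) / total
    if p_wc == 0:
        return 0.0
    return max(math.log2(p_wc / (p_w * p_c)), 0.0)

# Toy counts: rows are target words, columns are context words.
cooc = {
    "apricot": {"pie": 1, "data": 0},
    "digital": {"pie": 0, "data": 6},
}
print(ppmi(cooc, "digital", "data"))   # > 0: co-occur more often than chance predicts
print(ppmi(cooc, "digital", "pie"))    # 0.0: never observed together
```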
Word2Vec
Intuition of skip-gram
- Treat the target word and a neighbouring context word as positive examples
- Randomly sample other words in the lexicon to get negative samples.
- Use logistic regression to train a classifier to distinguish these two cases.
- Use the learned weights as the embeddings (see the training sketch below).
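A compact training-loop sketch of this idea using NumPy; the corpus, hyperparameters, and uniform negative sampling are simplifying assumptions (word2vec itself samples negatives from a weighted unigram distribution):

```python
import numpy as np

rng = np.random.default_rng(0)
tokens = "the cat sat on the mat the dog sat on the rug".split()  # toy corpus
vocab = sorted(set(tokens))
idx = {w: i for i, w in enumerate(vocab)}
V, dim, window, k, lr = len(vocab), 8, 2, 2, 0.05

W = rng.normal(scale=0.1, size=(V, dim))  # target-word embeddings
C = rng.normal(scale=0.1, size=(V, dim))  # context-word embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for epoch in range(50):
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j == i:
                continue
            t, pos = idx[word], idx[tokens[j]]
            # one positive (target, context) pair plus k random negatives
            pairs = [(pos, 1.0)] + [(int(rng.integers(V)), 0.0) for _ in range(k)]
            for c, label in pairs:
                p = sigmoid(W[t] @ C[c])       # classifier's P(+ | w, c)
                grad = p - label               # gradient of the logistic loss
                g_w, g_c = grad * C[c], grad * W[t]
                W[t] -= lr * g_w
                C[c] -= lr * g_c

# After training, the rows of W (or W + C) are used as the word embeddings.
print(W[idx["cat"]])
```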
First-order co-occurrence
A.k.a. syntagmatic association
Two words have first-order co-occurrence if they are typically nearby each other.
E.g. “wrote” is a first-order associate of “book” or “poem”.
Second-order co-occurrence
A.k.a. paradigmatic association
Two words have second-order co-occurrence if they have similar neighbours.
E.g. “wrote” is a second-order associate of words like “said” or “remarked”.