Vector Semantics and Embeddings Flashcards
Distributional hypothesis
Words that occur in similar contexts tend to have similar meanings.
The hypothesis was formulated in the 1950s by Joos, Harris and Firth, who noticed that words which are synonyms tended to occur in the same environments, with the amount of meaning difference between two words “corresponding to the amount of difference in their environments”.
Vector semantics
Vector semantics instantiates the distributional hypothesis by learning representations of the meaning of words, called embeddings, directly from their distributions in texts.
Representation learning
Self-supervised learning, where useful representations of the input text are automatically learned, instead of crafting representations by hand using feature engineering.
Lexical semantics
The linguistic study of word meaning
Propositional meaning
Two words are synonymous if they are substitutable for one another in any sentence without changing the truth conditions of the sentence - the situations in which the sentence would be true.
Principle of contrast
A difference in linguistic form is always associated with some difference in meaning.
E.g. H₂O and water are synonymous. But H₂O is used in scientific contexts and would be inappropriate in a hiking guide.
Semantic field
A set of words which cover a particular semantic domain and bear structured relations with each other.
E.g. the semantic field of hospitals (surgeon, scalpel, nurse, anesthetic, hospital), restaurants (waiter, menu, plate, food, chef).
Semantic frame
A set of words that denote perspectives or participants in a particular type of event.
E.g. a commercial transaction is a kind of event in which one entity trades money to another entity in return for some good or service, after which the good changes hands or perhaps the service is performed.
This event can be encoded lexically by using verbs like buy (the event from the perspective of the buyer), sell (from the perspective of the seller), pay (focusing on the monetary aspect), or nouns like buyer.
Frames have semantic roles (buyer, seller, goods, money) and words in a sentence can take on these roles.
Connotations
Words have affective meanings.
The aspects of a word’s meaning that are related to a writer or reader’s emotions, sentiment, opinions or evaluations.
Sentiment
Positive or negative evaluation language.
Vector semantics
The standard way to represent word meaning in NLP.
The idea is to represent a word as a point in a multidimensional semantic space that is derived from the distributions of word neighbours.
Embeddings
Vectors for representing words.
Co-occurrence matrix
A way of representing how often words co-occur.
Term-document matrix
Each row represents a word in the vocabulary and each column represents a document from some collection of documents.
Each cell represents the number of times a particular word occurs in a particular document.
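A minimal sketch of building a term-document matrix in plain Python, using hypothetical toy documents (the document names and texts are made up for illustration):

```python
from collections import Counter

# Toy document collection (hypothetical examples).
docs = {
    "doc1": "the wit of the fool",
    "doc2": "the battle and the good fool",
    "doc3": "good wit good battle",
}

# Vocabulary: every word that appears in any document.
vocab = sorted({w for text in docs.values() for w in text.split()})

# Term-document matrix: one row per vocabulary word, one column per
# document; each cell counts occurrences of the word in that document.
matrix = {
    word: [Counter(text.split())[word] for text in docs.values()]
    for word in vocab
}

print(matrix["good"])  # counts of "good" in doc1, doc2, doc3 -> [0, 1, 2]
```

Each row of this matrix is a crude word vector: words with similar rows occur in similar documents.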
Information retrieval
The task of finding the document d, from the D documents in some collection, that best matches a query q.
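A toy sketch of this idea, assuming the same kind of term-document counts as above; the scoring here is deliberately crude (a raw sum of query-term frequencies, where real systems weight terms and compare vectors, e.g. with tf-idf and cosine similarity), and the documents are hypothetical:

```python
from collections import Counter

# Hypothetical toy collection.
docs = {
    "doc1": "the wit of the fool",
    "doc2": "the battle and the good fool",
    "doc3": "good wit good battle",
}

def best_match(query: str, docs: dict) -> str:
    """Return the name of the document that best matches the query,
    scored by the total raw frequency of the query's terms."""
    q_terms = query.split()
    scores = {
        name: sum(Counter(text.split())[t] for t in q_terms)
        for name, text in docs.items()
    }
    return max(scores, key=scores.get)

print(best_match("good battle", docs))  # -> doc3
```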