Week 7 - Distributional Semantics Flashcards
Semantic Processing
The computer needs to “understand” what words mean in a given context
Distributional Hypothesis
The hypothesis that we can infer the meaning of a word from the context it occurs in
Assumes contextual information alone constitutes a viable representation of linguistic items, in contrast to formal linguistics and the formal theory of grammar
Distributional Semantic Model
Generate a high-dimensional feature vector to characterise a linguistic item
Subsequently, the semantic similarity between the linguistic items can be quantified in terms of vector similarity
Linguistic Items
words (or word senses), phrases, text pieces (windows of words), sentences, documents, etc…
Semantic space
The high-dimensional space computed by the distributional semantic model, also called embedding space, (latent) representation space, etc…
Vector distance function
Used to measure how dissimilar two vectors corresponding to linguistic items are
Vector similarity function
Used to measure how similar two vectors corresponding to linguistic items are
Examples of vector distance/similarity functions
Euclidean Distance
Cosine Similarity
Inner Product Similarity
Euclidean Distance
Given two d-dimensional vectors p and q:
sqrt( sum( (pi-qi)^2 for i=1->d ) )
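A minimal sketch of the Euclidean distance in plain Python, assuming both vectors are equal-length lists of numbers. The function name `euclidean_distance` is illustrative and not part of the original notes.

```python
import math

def euclidean_distance(p, q):
    """Euclidean distance between two equal-length vectors (illustrative helper)."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same dimension")
    # Sum the squared coordinate differences, then take the square root.
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

print(euclidean_distance([1.0, 2.0, 3.0], [4.0, 6.0, 3.0]))  # 5.0
```

A distance of 0 means the two vectors are identical; larger values mean the items are more dissimilar.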
Inner Product Similarity
Given two d-dimensional vectors p and q:
sum( pi*qi for i=1->d )
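The inner product can be sketched in a few lines of plain Python, again assuming equal-length lists of numbers. The name `inner_product` is a hypothetical helper chosen for illustration.

```python
def inner_product(p, q):
    """Inner (dot) product of two equal-length vectors (illustrative helper)."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same dimension")
    # Multiply matching coordinates and add up the products.
    return sum(pi * qi for pi, qi in zip(p, q))

print(inner_product([1, 2, 3], [4, 5, 6]))  # 32
```

Unlike cosine similarity, this value depends on the lengths of the vectors as well as the angle between them.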
Cosine Similarity
Given two d-dimensional vectors p and q:
sum( pi*qi for i=1->d ) / ( sqrt( sum( pi^2 for i=1->d ) ) * sqrt( sum( qi^2 for i=1->d ) ) )
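A minimal sketch of the cosine function in plain Python, assuming equal-length lists of numbers. The name `cosine_similarity` and the choice to raise an error for zero vectors are assumptions made for this example; the formula itself is undefined when either vector has zero length.

```python
import math

def cosine_similarity(p, q):
    """Cosine of the angle between two equal-length vectors (illustrative helper)."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same dimension")
    dot = sum(pi * qi for pi, qi in zip(p, q))
    norm_p = math.sqrt(sum(pi * pi for pi in p))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    if norm_p == 0 or norm_q == 0:
        # Assumption: treat the undefined zero-vector case as an error.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_p * norm_q)

print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal vectors)
print(cosine_similarity([3.0, 4.0], [6.0, 8.0]))  # 1.0 (same direction)
```

The second call shows that scaling a vector does not change the result: only the direction matters.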
Vector Space Model
Count-based
Algebraic model for representing a text object (referred to as a document) as a vector of indexed terms (e.g. words, phrases)
In the document vector, each feature value is the number of times an indexed term appears in the corresponding piece of text
Collecting many document vectors and storing them as matrix rows (or columns) yields the document-term matrix
The context of a word can also be treated as a mini-document
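As a rough illustration of the card above, the sketch below builds a small document-term matrix with one row per document. The lowercase whitespace tokenisation, the alphabetical term order, the toy documents, and the name `document_term_matrix` are all assumptions made for this example.

```python
from collections import Counter

def document_term_matrix(documents):
    """Return (vocabulary, matrix) with one row of term counts per document."""
    # Assumption: lowercase + whitespace splitting stands in for real tokenisation.
    tokenised = [doc.lower().split() for doc in documents]
    # The indexed terms are all distinct tokens, in alphabetical order.
    vocabulary = sorted({tok for doc in tokenised for tok in doc})
    matrix = []
    for doc in tokenised:
        counts = Counter(doc)
        # Counter returns 0 for terms that do not occur in this document.
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix

docs = ["the cat sat on the mat", "the dog sat"]
vocab, dtm = document_term_matrix(docs)
print(vocab)  # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(dtm)    # [[1, 0, 1, 1, 1, 2], [0, 1, 0, 0, 1, 1]]
```

Passing the context windows of a word in place of full documents would give the "mini-document" variant.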
VSM term weighting schemes
Binary Weight
Term Frequency (tf)
Term Frequency-Inverse Document Frequency (tf-idf)
VSM Binary Weighting
Each element in the document-term matrix is the binary presence (or absence) of a word in a document
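Binary weighting can be sketched as a simple transformation of a count matrix, assuming a list of rows with one row per document. The toy counts and the name `binary_weights` are made up for illustration.

```python
def binary_weights(count_matrix):
    """Replace every count with 1 (term present) or 0 (term absent)."""
    return [[1 if count > 0 else 0 for count in row] for row in count_matrix]

# Toy count matrix: 2 documents x 6 indexed terms (values are illustrative).
counts = [[1, 0, 1, 1, 1, 2],
          [0, 1, 0, 0, 1, 1]]
print(binary_weights(counts))  # [[1, 0, 1, 1, 1, 1], [0, 1, 0, 0, 1, 1]]
```

Note that the count of 2 in the first row becomes 1: binary weighting discards how often a term occurs.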
VSM Term Frequency Weighting
Each element in the document-term matrix is the number of times a word appears in a document, called term frequency (tf)
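A small sketch of term frequency weighting for a single document against a fixed list of indexed terms. The whitespace tokenisation, the example vocabulary, and the name `term_frequencies` are assumptions for illustration only.

```python
from collections import Counter

def term_frequencies(document, vocabulary):
    """Raw term frequency of each indexed term in one document."""
    # Assumption: lowercase + whitespace splitting stands in for real tokenisation.
    counts = Counter(document.lower().split())
    # Words outside the vocabulary are ignored; absent terms get 0.
    return [counts[term] for term in vocabulary]

vocabulary = ["cat", "mat", "sat", "the"]
print(term_frequencies("The cat sat on the mat", vocabulary))  # [1, 1, 1, 2]
```

Each such list is one row of the document-term matrix under tf weighting.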