Distributional Semantics Flashcards
How do we learn new words?
- look in a dictionary
- from experience of how they are used
- by analogy with similar words seen in the past
What is the distributional hypothesis?
Words that appear in similar contexts tend to have similar meanings
In distributional semantics, we want to find f, where f is
a function that takes a word's contexts and transforms and compresses them into a vector that captures the word's meaning:
meaning(w) = f(c1, c2, c3, c4)
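As a toy illustration of what the contexts c1, c2, … might look like, the sketch below collects the neighbouring words around each occurrence of a target word; the ±2-word window and the example sentence are assumptions, not part of the cards.

```python
# Sketch: collect the contexts c1, c2, ... of a target word.
# The window size of 2 is an assumed choice; the cards do not fix one.

def contexts(tokens, target, window=2):
    """Return one list of neighbouring words per occurrence of `target`."""
    ctxs = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            ctxs.append(left + right)
    return ctxs

tokens = "the cat sat on the mat while the dog sat on the rug".split()
print(contexts(tokens, "sat"))
# [['the', 'cat', 'on', 'the'], ['the', 'dog', 'on', 'the']]
```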
How do we find the function f?
use co-occurrence vectors
What is a co-occurrence vector?
- collect a corpus of documents or sentences
- apply basic preprocessing such as lower-casing
- count how many times word u appears with word v
- the meaning of u is the vector [count(u, v1), count(u, v2), …]
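A minimal sketch of this procedure, assuming a symmetric ±2-word context window and lower-casing as the only preprocessing (both are assumptions the card leaves open):

```python
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Map each word u to a sparse vector of counts count(u, v)."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()          # basic preprocessing: lower-casing
        for i, u in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[u][tokens[j]] += 1      # count(u, v) for each v in u's window
    return counts

corpus = ["The cat sat on the mat", "The dog sat on the rug"]
vectors = cooccurrence_vectors(corpus)
print(vectors["sat"])   # e.g. Counter({'the': 4, 'on': 2, 'cat': 1, 'dog': 1})
```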
What are the benefits of co-occurrence vectors? (3)
- the meaning of a word is a vector, so we can compute similarities such as cosine similarity (see the sketch after this list)
- can visualise word meanings
- can directly use these vectors as input to machine learning models
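For instance, cosine similarity over sparse count vectors can be computed directly; the toy vectors below use made-up counts, purely for illustration:

```python
import math

def cosine(u_counts, v_counts):
    """Cosine similarity between two sparse count vectors (dicts of counts)."""
    dot = sum(c * v_counts.get(w, 0) for w, c in u_counts.items())
    norm_u = math.sqrt(sum(c * c for c in u_counts.values()))
    norm_v = math.sqrt(sum(c * c for c in v_counts.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy co-occurrence vectors (the counts are invented for illustration).
cat = {"the": 4, "sat": 2, "purrs": 1}
dog = {"the": 4, "sat": 2, "barks": 1}
car = {"the": 4, "drives": 2, "fast": 1}

print(cosine(cat, dog))   # high: contexts mostly overlap
print(cosine(cat, car))   # lower: contexts overlap less
```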
What are the disadvantages of co-occurrence vectors?
- hard to extend distributional semantics beyond single words
- can't capture all aspects of semantics