Topic 5: Word Sense Flashcards
Recap on word ambiguity
the same word can be used to mean different things.
“mouse”
- small rodent
- hand-operated device to control a cursor
“bank”
- financial institution (e.g., holds investments in a custodial account)
- river bank
such a word is called polysemous, from the Greek for "having many senses."
what is word sense?
a discrete representation of one aspect of the meaning of a word
Wordnet
an online thesaurus: a database that represents word senses
Word sense disambiguation
the task of determining which sense of a word is being used in a particular context.
Homonymy and orthographic form
example: bank1 and bank2 have the same orthographic form, but their senses are unrelated
in this case, they are homonyms.
homonyms are words that have the same spelling and pronunciation but different meanings and origins.
Homograph
words with the same spelling but different meanings and origins; they are not necessarily pronounced the same.
bow1 and bow2
Homophones
two or more words with the same pronunciation but different meanings, origins, or spellings.
Dictionaries or thesauruses
documents/databases that give textual definitions for senses, called glosses.
Dictionary
contain many fine-grained senses to capture meaning differences
Glosses
not a formal meaning representation; glosses are written for people
Sentence embedding
glosses come with example sentences, and both can be embedded to help build sense representations.
Relation between sense: synonymy
two senses of two different words that are identical or nearly identical
synonymy is a relationship between senses rather than words
example
couch/sofa
vomit/throw up
car/automobile
Relation between sense: antonym
words with an opposite meaning. examples: long/short, big/little, fast/slow, cold/hot, dark/light, rise/fall, up/down, in/out
Taxonomic relations: hyponym and hypernym
One sense is a hyponym of another sense if the first sense is more specific
example
car is hyponym of vehicle
dog is hyponym of animal
mango is hyponym of fruit
hypernym is the other way around: one sense is a hypernym of another if it is more general
superordinate is often used instead of hypernym, and subordinate instead of hyponym
superordinate - subordinate
meronymy
part-whole relation
example
leg is part of chair
wheel is part of car
Wordnet
English WordNet consists of nouns, verbs, adjectives, and adverbs
8 senses for the noun "bass"
entries usually have a gloss, a synonym set, and a usage example
Synset
a set of near-synonyms for a WordNet sense: a way of representing a concept
synsets are the fundamental unit associated with WordNet entries.
WordNet also labels each synset with a lexicographic category drawn from a semantic field.
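A synset can be thought of as a small record grouping lemmas, a gloss, usage examples, and a lexicographic category. A minimal sketch, assuming a simplified structure (field names and the two "bass" entries here are illustrative, not WordNet's actual API):

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model of a WordNet entry: a synset groups
# near-synonymous lemmas around one sense, with a gloss, usage examples,
# and a lexicographic category (semantic field).
@dataclass
class Synset:
    lemmas: frozenset                     # set of near-synonym lemmas
    gloss: str                            # informal definition, written for people
    examples: list = field(default_factory=list)
    lexname: str = ""                     # lexicographic category, e.g. "noun.food"

bass_sound = Synset(
    lemmas=frozenset({"bass"}),
    gloss="the lowest part of the musical range",
    lexname="noun.attribute",
)
bass_fish = Synset(
    lemmas=frozenset({"bass"}),
    gloss="the lean flesh of a saltwater fish",
    examples=["we had bass for dinner"],
    lexname="noun.food",
)

# The same orthographic form "bass" maps to two distinct synsets (senses).
print(bass_sound.lexname, bass_fish.lexname)
```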
Sense relation in wordnet
WordNet has two kinds of taxonomic entities: classes and instances
Verb relation in Wordnet
WordNet records verb relations such as hypernym, troponym, entailment, and antonym
Hyponymy Chain
Hyponymy chains for two separate senses of the lemma bass.
Note that the chains are completely distinct, converging only at the very abstract level whole, unit.
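Walking a hyponymy chain just means following hypernym links up to the root. A small sketch on a toy hypernym map (the specific chains below are assumed for illustration, not WordNet's exact ones):

```python
# Toy hypernym map (child sense -> parent sense), assumed for illustration.
HYPERNYM = {
    "bass(music)": "pitch", "pitch": "sound_property",
    "sound_property": "attribute", "attribute": "abstraction",
    "abstraction": "entity",
    "bass(fish)": "percoid_fish", "percoid_fish": "fish",
    "fish": "vertebrate", "vertebrate": "animal",
    "animal": "organism", "organism": "entity",
}

def hypernym_chain(sense):
    """Follow hypernym links from a sense up to the root."""
    chain = [sense]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain

music = hypernym_chain("bass(music)")
fish = hypernym_chain("bass(fish)")
print(music)
print(fish)
# The two chains share nothing until the very abstract root.
```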
Thesaurus Method
Use the structure of the thesaurus to define word similarity.
Can use any information, from glosses to synonyms; in practice the hypernym/hyponym hierarchy is used.
The intuition: words or senses are more similar if there is a shorter path between them in the thesaurus graph.
Path Length
Measure the number of edges between the two concept nodes in the thesaurus graph and add one:
pathlen(c1, c2) = 1 + the number of edges in the shortest path in the thesaurus graph between sense nodes c1 and c2.
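The definition above can be computed with a breadth-first search over the thesaurus graph. A minimal sketch, using a toy fragment of a hierarchy (the edge list is assumed for illustration):

```python
from collections import deque

# Toy thesaurus graph as undirected hypernym edges, assumed for illustration.
EDGES = [
    ("nickel", "coin"), ("dime", "coin"), ("coin", "coinage"),
    ("coinage", "currency"), ("currency", "medium_of_exchange"),
    ("money", "medium_of_exchange"), ("fund", "money"), ("budget", "fund"),
]

GRAPH = {}
for a, b in EDGES:
    GRAPH.setdefault(a, set()).add(b)
    GRAPH.setdefault(b, set()).add(a)

def pathlen(c1, c2):
    """1 + number of edges on the shortest path between sense nodes c1 and c2."""
    seen = {c1}
    frontier = deque([(c1, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if node == c2:
            return 1 + dist
        for nbr in GRAPH.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, dist + 1))
    return None  # no path in the graph

print(pathlen("nickel", "dime"))    # nickel - coin - dime: 2 edges, so 3
print(pathlen("nickel", "budget"))
```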
Path-Length based Similarity
sim_path(c1, c2) = -log pathlen(c1, c2)
For most applications we do not have sense-tagged data, so word similarity algorithms give the similarity between words by taking the maximum sense similarity:
wordsim(w1, w2) = max over c1 in senses(w1), c2 in senses(w2) of sim(c1, c2)
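Taking the maximum sense similarity over all sense pairs can be sketched as below; the sense inventory and the precomputed path lengths are hypothetical toy values, not real WordNet numbers:

```python
import math

# Hypothetical sense inventory and precomputed path lengths, for illustration.
SENSES = {"bass": ["bass_music", "bass_fish"], "trout": ["trout_fish"]}
PATHLEN = {
    frozenset({"bass_music", "trout_fish"}): 12,
    frozenset({"bass_fish", "trout_fish"}): 3,
}

def sim_path(c1, c2):
    """Path-length based sense similarity: -log pathlen(c1, c2)."""
    return -math.log(PATHLEN[frozenset({c1, c2})])

def wordsim(w1, w2):
    """Word similarity = maximum sense similarity over all sense pairs."""
    return max(sim_path(c1, c2) for c1 in SENSES[w1] for c2 in SENSES[w2])

# Dominated by the fish senses, which are closer in the graph.
print(wordsim("bass", "trout"))
```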
Information-Content Word Similarity
Rely on the structure of the thesaurus but also add probabilistic
information derived from a corpus
Define P(c) as the probability that a randomly selected word in a corpus is an instance of concept c.
P(root) = 1, since any word is subsumed by the root concept.
Intuitively, the lower a concept in the hierarchy, the lower its
probability.
Thesaurus with Probability
A fragment of the WordNet concept hierarchy augmented with the probabilities P(c)
Information Content Theory
Need two more definitions for the similarity computation: Information Content (IC) and the Lowest Common Subsumer (LCS)
equation
IC(c) = - logP(c)
Lowest Common Subsumer
LCS(c1, c2) = the lowest node in the hierarchy that subsumes both c1 and c2.
Resnik Similarity
Think of similarity between two words as related to their
common information.
sim_resnik(c1, c2) = -log P(LCS(c1, c2))
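Putting IC, LCS, and Resnik similarity together on a toy hierarchy (the parent links and the probabilities P(c) below are assumed example values, not corpus-derived):

```python
import math

# Toy concept hierarchy and assumed probabilities P(c); P(root) = 1.
PARENT = {"nickel": "coin", "dime": "coin", "coin": "money", "money": "entity"}
P = {"nickel": 0.01, "dime": 0.01, "coin": 0.05, "money": 0.2, "entity": 1.0}

def ancestors(c):
    """The concept itself plus every node above it, bottom-up."""
    chain = [c]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def lcs(c1, c2):
    """Lowest common subsumer: lowest ancestor of c1 that also subsumes c2."""
    anc2 = set(ancestors(c2))
    for a in ancestors(c1):
        if a in anc2:
            return a

def ic(c):
    """Information content: IC(c) = -log P(c)."""
    return -math.log(P[c])

def sim_resnik(c1, c2):
    """Resnik similarity: IC of the lowest common subsumer."""
    return ic(lcs(c1, c2))

print(lcs("nickel", "dime"))        # coin
print(sim_resnik("nickel", "dime"))  # -log P(coin)
```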
Word Sense Disambiguation recap
the task of selecting the correct sense for a word
a WSD algorithm takes as input a word in context and a fixed inventory of potential word senses, and outputs the correct word sense in context
WSD Datasets
The inventory of sense tags depends on the task.
need to make sure it is from the same domain
use a set of senses from a resource like WordNet, or supersenses if a coarser-grained set is wanted
Baselines for WSD Systems
A surprisingly strong baseline is simply to choose the most frequent sense for each word from the senses in a labeled corpus.
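The most-frequent-sense baseline is just per-word counting over the labeled corpus. A minimal sketch on a hypothetical toy corpus of (word, sense) pairs:

```python
from collections import Counter

# Tiny hypothetical labeled corpus: (word, sense) pairs.
LABELED = [
    ("bass", "bass_fish"), ("bass", "bass_fish"), ("bass", "bass_music"),
    ("bank", "bank_finance"), ("bank", "bank_finance"), ("bank", "bank_river"),
]

def most_frequent_sense(corpus):
    """For each word, pick the sense it is most often labeled with."""
    counts = {}
    for word, sense in corpus:
        counts.setdefault(word, Counter())[sense] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

baseline = most_frequent_sense(LABELED)
print(baseline["bass"], baseline["bank"])
```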
Supervised Word Sense Disambiguation
Labeled dataset: context sentences labeled with the correct sense for the target word.
Can use any standard classification algorithm. As for the features:
- collocation features: words or n-grams of lengths 1, 2, 3 around the target
- bag of words: words that occur in the neighborhood of the target
- weighted average of embeddings
- part-of-speech tags (for a window of 3 words on each side, stopping at sentence boundaries)
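The collocation and bag-of-words features can be sketched as simple window extractors; the feature-name format and window sizes below are illustrative choices, not a fixed standard:

```python
# Sketch of feature extraction for supervised WSD, assumed conventions.
def collocation_features(tokens, i, window=2):
    """Words at specific relative positions around the target token at index i."""
    feats = {}
    for offset in range(-window, window + 1):
        j = i + offset
        if offset != 0 and 0 <= j < len(tokens):
            feats[f"w[{offset}]={tokens[j]}"] = 1
    return feats

def bag_of_words(tokens, i, window=5):
    """Unordered neighborhood words around the target (position discarded)."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    return {f"bow={t}": 1 for j, t in enumerate(tokens[lo:hi], lo) if j != i}

sent = "he sat on the river bank and fished".split()
i = sent.index("bank")
feats = {**collocation_features(sent, i), **bag_of_words(sent, i)}
print(sorted(feats))
```

These feature dictionaries would then be fed to any standard classifier (e.g., logistic regression).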