Week 5 - Word Sense Disambiguation Flashcards
Word Sense
One of the meanings of a word in a linguistic context
Word Sense Disambiguation
(WSD) is the NLP task of selecting which sense of a word is used in a given piece of text (e.g. a sentence) from a set of multiple known possibilities
WSD applications
Machine translation - lexical choices for words that need different translations for different senses
Information Retrieval - search choices for queries whose terms are relevant to different topics under different senses
Bioinformatics - Assign a species identifier (e.g. human, mouse) to a gene and gene product entity (e.g. proteins)
Medical Applications - Find the correct meaning of acronyms in clinical text
Typical WSD approaches
Knowledge-based
- Use external lexical resources such as dictionaries and thesauri
Supervised machine learning
- Use labelled training examples
Lesk Algorithm
Examine the definition overlap in all possible sense combinations among all the words in a given text
Implementation
- Retrieve from the dictionary all sense definitions of the words in the given piece of text
- Calculate the definition overlaps for all possible sense configurations
- Choose the senses that offer the highest overlap
Disadvantage:
Very impractical for long sentences
Disambiguating all words in the sentence requires evaluating m1 × m2 × m3 × … × mn sense combinations, where mi is the number of definitions of the i-th word
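The three implementation steps can be sketched as follows. This is a minimal illustration, not a reference implementation: the two-word dictionary `TOY_DICT`, its sense labels, and the helper names are hypothetical, and overlap is counted on raw whitespace tokens with no stop-word removal.

```python
from itertools import product

# Hypothetical toy dictionary: word -> {sense label: definition}.
TOY_DICT = {
    "pine": {
        "tree": "kind of evergreen tree with needle shaped leaves",
        "grieve": "waste away through sorrow or illness",
    },
    "cone": {
        "shape": "solid body which narrows to a point",
        "fruit": "fruit of certain evergreen tree",
    },
}


def overlap(def_a, def_b):
    """Number of distinct words shared by two definitions."""
    return len(set(def_a.split()) & set(def_b.split()))


def lesk_exhaustive(words, dictionary):
    """Score every sense configuration and keep the best one."""
    # Step 1: retrieve all sense definitions of every word.
    sense_lists = [list(dictionary[w].items()) for w in words]
    best_config, best_score, tried = None, -1, 0
    # Step 2: iterate over all m1 x m2 x ... x mn configurations.
    for config in product(*sense_lists):
        tried += 1
        score = sum(
            overlap(config[i][1], config[j][1])
            for i in range(len(config))
            for j in range(i + 1, len(config))
        )
        # Step 3: keep the configuration with the highest overlap.
        if score > best_score:
            best_config, best_score = config, score
    senses = {w: label for w, (label, _) in zip(words, best_config)}
    return senses, best_score, tried


senses, score, tried = lesk_exhaustive(["pine", "cone"], TOY_DICT)
print(senses, score, tried)
```

With two senses per word the loop visits 2 × 2 = 4 configurations; the count grows multiplicatively with every extra word, which is the impracticality noted above.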
Simplified Lesk Algorithm
A faster version of Lesk for longer sentences
Examines the overlap between each sense definition of a word and its current context
Compare the senses to the context (the given sentence)
Disambiguating all words in the sentence requires m1 + m2 + m3 + … + mn overlap comparisons, where mi is the number of definitions of the i-th word
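A minimal sketch of the simplified variant for a single target word is shown below. The sense inventory `BANK_SENSES`, the small `STOP_WORDS` list, and the function names are assumptions made for illustration only.

```python
# Hypothetical toy sense inventory for "bank": sense label -> definition.
BANK_SENSES = {
    "finance": "an institution that accepts deposits of money and makes loans",
    "river": "sloping land beside a body of water such as a river",
}

# Small illustrative stop-word list (an assumption, not a standard resource).
STOP_WORDS = {"a", "an", "the", "of", "and", "that", "as", "such", "to", "in", "on"}


def content_words(text):
    """Lower-cased whitespace tokens with stop words removed."""
    return {w for w in text.lower().split() if w not in STOP_WORDS}


def simplified_lesk(sentence, senses):
    """Compare each sense definition with the sentence; one comparison per sense."""
    context = content_words(sentence)
    scores = {
        label: len(content_words(definition) & context)
        for label, definition in senses.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


best, scores = simplified_lesk(
    "she sat on the bank of the river and watched the water", BANK_SENSES
)
print(best, scores)
```

Each word is disambiguated independently against the sentence, so the work per word is just its own number of senses, which is why the total is a sum rather than a product.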
Corpus Lesk approach
Enhance performance using labelled data
Enhance the sense definition with labelled data
Add labelled examples to the definitions
Weigh each overlapped word using a weight
For example, use the IDF (inverse document frequency) of each word that overlaps between the target sentence and the sense definition
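One possible way to combine the enhanced definitions with IDF weighting is sketched below. The `SIGNATURES` data is hypothetical, and treating each definition or labelled example as one document when computing IDF is an assumption of this sketch, not something the notes specify.

```python
import math

# Hypothetical sense signatures: the definition plus labelled example sentences.
SIGNATURES = {
    "finance": [
        "an institution that accepts deposits of money and makes loans",
        "he deposited his money in the bank",  # labelled example
    ],
    "river": [
        "sloping land beside a body of water such as a river",
        "they fished from the bank of the river",  # labelled example
    ],
}


def tokens(text):
    return set(text.lower().split())


def idf_table(documents):
    """idf(w) = log(N / df(w)), with each definition or example as one document."""
    n = len(documents)
    df = {}
    for doc in documents:
        for w in tokens(doc):
            df[w] = df.get(w, 0) + 1
    return {w: math.log(n / count) for w, count in df.items()}


def corpus_lesk(sentence, signatures):
    """Score each sense by the summed IDF of the words it shares with the sentence."""
    documents = [doc for docs in signatures.values() for doc in docs]
    idf = idf_table(documents)
    context = tokens(sentence)
    scores = {}
    for label, docs in signatures.items():
        signature = set().union(*(tokens(d) for d in docs))
        scores[label] = sum(idf[w] for w in signature & context)
    return max(scores, key=scores.get), scores


best, scores = corpus_lesk("the bank of the river was muddy", SIGNATURES)
print(best, scores)
```

Words that appear in many definitions and examples (such as "of") receive a low weight, so they contribute less to the overlap score than rarer, more informative words.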
Supervised machine learning - goal
To predict the output for an input data pattern
Training examples
A set of example data patterns are provided, where the ground-truth output is known for each example
Predictive mapping
A mapping from an input data pattern to the desired output, built from training examples
Annotated training corpus
A collection of training examples
Classification
Assign an input data pattern to one of a pre-defined set of classes (categorical output)
Converting WSD to classification
An input data pattern: a word in context
Pre-defined set of classes: dictionary senses (called the tag set)
Training corpus: A collection of words tagged in context with their sense
One option is to train one classifier per word to identify its sense. N words in the dictionary require building N classifiers
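The one-classifier-per-word setup can be pictured as splitting the tagged corpus by target word. The sketch below uses a hypothetical corpus `TAGGED` and hypothetical helper names; each resulting group would feed one classifier.

```python
from collections import defaultdict

# Hypothetical sense-tagged corpus: (target word, context sentence, sense tag).
TAGGED = [
    ("bass", "he plays bass in a jazz band", "music"),
    ("bass", "we caught a bass in the lake", "fish"),
    ("bank", "she opened an account at the bank", "finance"),
    ("bank", "the boat drifted to the bank", "river"),
]


def split_by_target(tagged):
    """Group training examples by target word: one training set per classifier."""
    per_word = defaultdict(list)
    for word, context, sense in tagged:
        per_word[word].append((context, sense))
    return dict(per_word)


def tag_sets(per_word):
    """The set of classes (tag set) seen for each target word."""
    return {
        word: sorted({sense for _, sense in examples})
        for word, examples in per_word.items()
    }


per_word = split_by_target(TAGGED)
print(len(per_word), tag_sets(per_word))
```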
Building a WSD classifier
Given an annotated corpus
Find a way to characterise each word pattern (along with its context) with a set of features (feature extraction)
With existing tools:
Choose a classifier (classification algorithm)
Train the classifier using the training examples
Test the trained classifier using new examples (evaluation)
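The extract, train, and test steps can be sketched end to end as follows. Naive Bayes with add-one smoothing is used here only as a stand-in for whichever classifier is chosen; the notes do not name one. The class `NaiveBayesWSD`, the `TRAIN` and `TEST` data, and the whole-sentence feature extractor are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical annotated examples for the target word "bass": (context, sense).
TRAIN = [
    ("he plays bass guitar in a band", "music"),
    ("the bass player tuned his guitar", "music"),
    ("we caught a large bass in the lake", "fish"),
    ("grilled bass is a popular fish dish", "fish"),
]
TEST = [
    ("a guitar and bass duo played", "music"),
    ("bass swim in the lake", "fish"),
]


def extract_features(context):
    """Feature extraction: here simply the lower-cased words of the context."""
    return context.lower().split()


class NaiveBayesWSD:
    def fit(self, examples):
        self.sense_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for context, sense in examples:
            self.sense_counts[sense] += 1
            for w in extract_features(context):
                self.word_counts[sense][w] += 1
                self.vocab.add(w)
        return self

    def predict(self, context):
        total = sum(self.sense_counts.values())
        best_sense, best_logp = None, -math.inf
        for sense, count in self.sense_counts.items():
            logp = math.log(count / total)  # log prior of the sense
            denom = sum(self.word_counts[sense].values()) + len(self.vocab)
            for w in extract_features(context):
                if w in self.vocab:  # ignore words never seen in training
                    logp += math.log((self.word_counts[sense][w] + 1) / denom)
            if logp > best_logp:
                best_sense, best_logp = sense, logp
        return best_sense


def accuracy(model, examples):
    """Evaluation: fraction of held-out examples tagged with the correct sense."""
    return sum(model.predict(c) == s for c, s in examples) / len(examples)


model = NaiveBayesWSD().fit(TRAIN)
print(model.predict("a guitar and bass duo played"), accuracy(model, TEST))
```

In practice an existing toolkit would replace the hand-written class; the three calls (feature extraction, fit, evaluate on new examples) mirror the steps listed above.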
Bag of word features (WSD)
Example:
“An electric guitar and bass player stand off to one side not really part of the scene”
With a +/-2 window, what is the set of features for “bass”?
Based on words occurring anywhere within a window of the target word
Consider frequency (occurrence counts)
Answer: {guitar, and, player, stand}
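The example can be reproduced with a short sketch. The function name `window_features` is hypothetical, tokenisation is plain whitespace splitting, and only the first occurrence of the target word is considered.

```python
from collections import Counter


def window_features(sentence, target, window=2):
    """Bag-of-words counts for words within +/- `window` positions of the target."""
    tokens = sentence.lower().split()
    i = tokens.index(target)  # first occurrence of the target word
    left = tokens[max(0, i - window):i]
    right = tokens[i + 1:i + 1 + window]
    return Counter(left + right)


SENTENCE = (
    "An electric guitar and bass player stand off to one side "
    "not really part of the scene"
)
features = window_features(SENTENCE, "bass", window=2)
print(sorted(features))
```

Using a `Counter` keeps the occurrence counts mentioned above; in this sentence every word in the window occurs once, so the counts are all 1.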