Classification, with word sense disambiguation example
Supervised machine learning
Learn from training data in which each word's sense is already labelled.
Apply the learned model to unseen test data.
Approaches (both sketched below):
- Bag of words
- Word n-grams
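A minimal sketch of both representations in Python, assuming a toy context sentence around the ambiguous word "bank" (the sentence and tokens are illustrative, not from the cards):

```python
from collections import Counter

# Toy tokenised context around the ambiguous word "bank"
tokens = ["she", "sat", "by", "the", "bank", "of", "the", "river"]

# Bag of words: unordered counts of the context words
bag_of_words = Counter(tokens)

# Word n-grams (here bigrams): ordered pairs of adjacent words
bigrams = [tuple(tokens[i:i + 2]) for i in range(len(tokens) - 1)]

print(bag_of_words)  # Counter({'the': 2, 'she': 1, ...})
print(bigrams)       # [('she', 'sat'), ('sat', 'by'), ...]
```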
Support vector machines
Encode the features as numbers and think of them as coordinates, so each feature list becomes a point in space.
Interpret the feature lists geometrically and try to derive a separating hyperplane between the positive and negative examples (see the sketch below).
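A minimal sketch, assuming scikit-learn (the library choice, the toy contexts, and the sense labels are illustrative assumptions, not from the cards):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Toy labelled contexts for the ambiguous word "bank"
contexts = ["river water bank shore", "money loan bank account"]
labels = ["RIVER_BANK", "FINANCIAL_BANK"]

vectorizer = CountVectorizer()          # encode features as numbers...
X = vectorizer.fit_transform(contexts)  # ...so each context is a point in space

clf = LinearSVC()                       # fits a separating hyperplane
clf.fit(X, labels)

print(clf.predict(vectorizer.transform(["deposit money at the bank"])))
```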
Naive Bayes classifiers
Probabilistic model
Learn
- The probability of each label
- The probability of each feature co-occurring with each label
The assumptions used to derive the probability of a sense given a feature list
- Bayes rule
- The assumption that features are independent of one another given the sense
How to compute the probability of a sense given a feature list
P(sense | feature list)
Using Bayes rule, the best sense equals argmax over senses of P(feature list | sense) * P(sense).
(The denominator P(feature list) is the same for every sense, so it can be dropped.)
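A toy hand-computation of the argmax (all probabilities are made-up numbers for illustration):

```python
# Assumed estimates from training data
priors = {"RIVER_BANK": 0.4, "FINANCIAL_BANK": 0.6}           # P(sense)
likelihoods = {"RIVER_BANK": 0.020, "FINANCIAL_BANK": 0.004}  # P(feature list | sense)

# Score each sense by P(feature list | sense) * P(sense), take the argmax
scores = {sense: likelihoods[sense] * priors[sense] for sense in priors}
best_sense = max(scores, key=scores.get)
print(best_sense, scores)  # RIVER_BANK: 0.020*0.4 = 0.008 beats 0.004*0.6 = 0.0024
```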
How to compute the probability of a sense
P(sense)
= count(sense) / count(all senses)
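A minimal sketch of this count-based estimate, assuming a toy list of training labels:

```python
from collections import Counter

# Toy sense labels from a labelled training set (made-up data)
training_labels = ["RIVER_BANK", "FINANCIAL_BANK", "FINANCIAL_BANK",
                   "RIVER_BANK", "FINANCIAL_BANK"]

counts = Counter(training_labels)
total = sum(counts.values())  # count of all senses

priors = {sense: count / total for sense, count in counts.items()}
print(priors)  # {'RIVER_BANK': 0.4, 'FINANCIAL_BANK': 0.6}
```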
How to compute the probability of a feature given a sense
Assume that all features occur independently given the sense:
P(feature list | sense)
= P(feature1 | sense) * … * P(featureN | sense)
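A minimal sketch of the product, assuming the per-feature conditional probabilities have already been estimated (the numbers are illustrative):

```python
import math

# Assumed estimates of P(feature | sense) for one sense
p_feature_given_sense = {"river": 0.30, "water": 0.20, "shore": 0.10}
features = ["river", "water", "shore"]

# Independence assumption: multiply the per-feature probabilities
likelihood = math.prod(p_feature_given_sense[f] for f in features)
print(likelihood)  # 0.3 * 0.2 * 0.1 = 0.006 (up to float rounding)
```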
Calculate using counts (maximum likelihood estimation), e.g. for the label W = “well-known person”:
P(“red” | W)
= count(label W & feature “red”) / count(label W)
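A minimal sketch of this count-based estimate, assuming a toy labelled data set (the labels and feature lists are made up):

```python
# Each instance pairs a sense label with its feature list
data = [
    ("WELL_KNOWN_PERSON", ["red", "carpet"]),
    ("WELL_KNOWN_PERSON", ["famous", "actor"]),
    ("OTHER", ["red", "paint"]),
]

label, feature = "WELL_KNOWN_PERSON", "red"

# count(label W) and count(label W & feature "red")
count_label = sum(1 for sense, _ in data if sense == label)
count_joint = sum(1 for sense, feats in data if sense == label and feature in feats)

print(count_joint / count_label)  # 1 / 2 = 0.5
```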