POS Tagging Flashcards
What are backoff and smoothing?
Smoothing redistributes probability mass so that unseen events receive a non-zero probability. The most primitive way of achieving this is add-one smoothing, which simply adds one to all counts.
Another approach is to back off to unigram probabilities, i.e., to distribute the probability mass reserved for unseen events in proportion to their unigram probabilities.
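As a rough illustration, the two ideas might be sketched as follows. The toy counts, the tagset, the unigram probabilities, and the reserved_mass parameter are all hypothetical values chosen for the example, and the backoff function is one simple way to realise the idea rather than a standard named algorithm.

```python
from collections import Counter


def add_one_prob(count, total, num_outcomes):
    """Add-one estimate: every possible outcome gets one extra count."""
    return (count + 1) / (total + num_outcomes)


def backoff_probs(counts, unigram_probs, reserved_mass=0.1):
    """Give seen tags their (discounted) relative frequency and share
    reserved_mass among unseen tags in proportion to unigram_probs.

    Assumes every unseen tag has a non-zero unigram probability.
    """
    total = sum(counts.values())
    if total == 0:
        # Nothing observed at all: fall back entirely to the unigrams.
        return dict(unigram_probs)
    unseen = [t for t in unigram_probs if counts.get(t, 0) == 0]
    unseen_norm = sum(unigram_probs[t] for t in unseen)
    # Only discount the seen tags if there is something to back off to.
    scale = (1 - reserved_mass) if unseen else 1.0
    probs = {}
    for tag in unigram_probs:
        if counts.get(tag, 0) > 0:
            probs[tag] = scale * counts[tag] / total
        else:
            probs[tag] = reserved_mass * unigram_probs[tag] / unseen_norm
    return probs


# Hypothetical toy data: tags observed for one word in training.
tag_counts = Counter({"VERB": 3, "NOUN": 1})
tagset = ["NOUN", "VERB", "ADJ"]
total = sum(tag_counts.values())

# Add-one smoothing over the assumed three-tag tagset.
smoothed = {t: add_one_prob(tag_counts[t], total, len(tagset)) for t in tagset}

# Backoff over an assumed four-tag unigram distribution.
unigram_probs = {"NOUN": 0.5, "VERB": 0.3, "ADJ": 0.15, "ADV": 0.05}
backed_off = backoff_probs(tag_counts, unigram_probs)
```

With these numbers, add-one smoothing gives the unseen tag ADJ a probability of 1/7, while the backoff sketch gives ADJ three times the probability of ADV, mirroring their unigram ratio.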
How is POS tagging typically evaluated?
POS tagging algorithms are evaluated in terms of the percentage of correctly assigned tags. The standard assumption is that every word should be tagged with exactly one tag, which is deemed either correct or incorrect. However, some words can only be tagged in one way (e.g., punctuation), so they are trivially tagged correctly. High success rates can therefore be misleading.
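A minimal sketch of this evaluation, assuming gold and predicted tags are given as parallel lists. The sentence, the tags, and the set of ambiguous words are made up for illustration; the second function is one possible way to see how much unambiguous tokens inflate the overall score.

```python
def tagging_accuracy(gold, predicted):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def ambiguous_accuracy(words, gold, predicted, ambiguous_words):
    """Accuracy restricted to tokens that can take more than one tag."""
    pairs = [(g, p) for w, g, p in zip(words, gold, predicted)
             if w in ambiguous_words]
    if not pairs:
        raise ValueError("no ambiguous tokens to evaluate")
    return sum(g == p for g, p in pairs) / len(pairs)


# Hypothetical toy example.
words = ["The", "dog", "runs", "."]
gold = ["DET", "NOUN", "VERB", "PUNCT"]
predicted = ["DET", "NOUN", "NOUN", "PUNCT"]

overall = tagging_accuracy(gold, predicted)
hard_only = ambiguous_accuracy(words, gold, predicted, {"runs"})
```

Here the overall accuracy is 0.75 even though the only ambiguous token is tagged wrongly, so the accuracy on ambiguous tokens alone is 0.0.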
What is the baseline for POS tagging?
A simple baseline is to assign each word its most probable tag, as estimated from the training data. The ceiling is set by the performance of human annotators.
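One way such a most-frequent-tag baseline could look, assuming the training data is a list of sentences made of (word, tag) pairs. The training sentences are invented, and falling back to the overall most frequent tag for unknown words is an assumption of this sketch, not something the card specifies.

```python
from collections import Counter, defaultdict


def train_baseline(tagged_sentences):
    """Record each word's most frequent tag and a default for unknown words."""
    word_tag_counts = defaultdict(Counter)
    tag_counts = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            word_tag_counts[word][tag] += 1
            tag_counts[tag] += 1
    most_frequent = {
        word: counts.most_common(1)[0][0]
        for word, counts in word_tag_counts.items()
    }
    # Assumption: unknown words get the most frequent tag overall.
    default_tag = tag_counts.most_common(1)[0][0]
    return most_frequent, default_tag


def tag_baseline(words, most_frequent, default_tag):
    """Tag each word independently, ignoring its context."""
    return [most_frequent.get(word, default_tag) for word in words]


# Hypothetical toy training data.
training = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("runs", "VERB")],
    [("long", "ADJ"), ("runs", "NOUN")],
]

most_frequent, default_tag = train_baseline(training)
tags = tag_baseline(["the", "runs", "fish"], most_frequent, default_tag)
```

In this toy data "runs" is tagged VERB twice and NOUN once, so the baseline always tags it VERB, and the unseen word "fish" receives the default tag NOUN.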