POS Tagging Flashcards
What are POS tags?
Labels from a tagset that are used to annotate each word with its part of speech (grammatical category)
What different tagsets are there?
The Penn Treebank tagset and the universal scheme
What are the pros and cons of the Penn Treebank tagset?
Pro: thorough (fine-grained). Con: not intuitive; tags such as NNS or VBZ have to be memorised
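A sketch of how the Penn Treebank's finer distinctions collapse into the universal scheme. The mapping below is a partial, simplified one written for illustration and is not an official conversion table. It covers only a few noun, verb, adjective, adverb and interjection tags, sends every VB* tag to VERB even though the universal scheme tags auxiliaries as AUX, and falls back to X for any tag it does not list.

```python
# Partial, simplified Penn Treebank -> universal (UPOS) mapping, for illustration only.
# Simplification: every VB* tag becomes VERB, although auxiliaries would be AUX.
PENN_TO_UPOS = {
    "NN": "NOUN", "NNS": "NOUN",                # common nouns: singular/mass, plural
    "NNP": "PROPN", "NNPS": "PROPN",            # proper nouns: singular, plural
    "VB": "VERB", "VBD": "VERB", "VBG": "VERB",
    "VBN": "VERB", "VBP": "VERB", "VBZ": "VERB",
    "JJ": "ADJ", "JJR": "ADJ", "JJS": "ADJ",    # base, comparative, superlative
    "RB": "ADV", "RBR": "ADV", "RBS": "ADV",
    "UH": "INTJ",                               # interjection
}


def to_universal(tagged):
    """Convert (word, Penn tag) pairs to (word, UPOS tag); unlisted tags fall back to 'X'."""
    return [(word, PENN_TO_UPOS.get(tag, "X")) for word, tag in tagged]


penn = [("Dogs", "NNS"), ("barked", "VBD"), ("loudly", "RB")]
print(to_universal(penn))  # [('Dogs', 'NOUN'), ('barked', 'VERB'), ('loudly', 'ADV')]
print(len(PENN_TO_UPOS), "Penn tags ->", len(set(PENN_TO_UPOS.values())), "universal tags")
# 17 Penn tags -> 6 universal tags
```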
What are the pros and cons of the universal scheme tagset?
Pro: there are fewer, more generic tags, so it is easier to learn and use. Con: it is less fine-grained than the Penn Treebank
The tags fall into three classes: open class, closed class and other
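Assuming the universal scheme here is the Universal Dependencies UPOS tagset (the tag names on the later cards, such as PROPN and INTJ, match it), the three classes can be written down as a simple lookup. UPOS_CLASSES and tag_class are names made up for this sketch.

```python
# Universal (UPOS) tags grouped into the three classes used by Universal Dependencies.
UPOS_CLASSES = {
    "open": {"ADJ", "ADV", "INTJ", "NOUN", "PROPN", "VERB"},
    "closed": {"ADP", "AUX", "CCONJ", "DET", "NUM", "PART", "PRON", "SCONJ"},
    "other": {"PUNCT", "SYM", "X"},
}


def tag_class(tag):
    """Return 'open', 'closed' or 'other' for a universal POS tag."""
    for class_name, tags in UPOS_CLASSES.items():
        if tag in tags:
            return class_name
    raise ValueError(f"unknown universal POS tag: {tag}")


print(tag_class("NOUN"))   # open
print(tag_class("DET"))    # closed
print(tag_class("PUNCT"))  # other
```

All the tags on the ADJ to VERB cards belong to the open class.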
What are the challenges in POS Tagging?
Syntactic ambiguity: the same word can play different syntactic roles
E.g. "book" is a noun in "read a book" but a verb in "book a flight"
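A toy illustration of syntactic ambiguity. LEXICON is a small hypothetical lexicon made up for this example, in which some words can take more than one tag; looking a word up on its own cannot decide between them.

```python
# Hypothetical toy lexicon: each word maps to the set of universal tags it can take.
LEXICON = {
    "i": {"PRON"},
    "will": {"AUX", "NOUN"},   # "I will go" vs "a legal will"
    "book": {"NOUN", "VERB"},  # "read a book" vs "book a flight"
    "a": {"DET"},
    "flight": {"NOUN"},
}


def ambiguous_tokens(tokens):
    """Return the tokens whose lowercased form has more than one possible tag."""
    return [t for t in tokens if len(LEXICON.get(t.lower(), set())) > 1]


print(ambiguous_tokens(["I", "will", "book", "a", "flight"]))  # ['will', 'book']
```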
How can we disambiguate a token?
Use the surrounding tokens (the context).
We can also return multiple possible taggings, but within each tagging every token has exactly one tag
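A minimal sketch of both ideas, assuming a hypothetical table of candidate tags per token. all_taggings lists every possible tagging of a sentence, each giving every token exactly one tag, and context_score is a made-up rule that prefers taggings whose neighbouring tags fit together. It is a toy stand-in for real context-based disambiguation, not a standard algorithm.

```python
from itertools import product

# Hypothetical candidate tags per token (a real tagger would get these from a lexicon or model).
CANDIDATES = {
    "the": ["DET"],
    "duck": ["NOUN", "VERB"],
    "flies": ["NOUN", "VERB"],
}


def all_taggings(tokens):
    """Yield every tagging of the sentence; in each one, every token gets exactly one tag."""
    options = [CANDIDATES[t] for t in tokens]
    for tags in product(*options):
        yield list(zip(tokens, tags))


def context_score(tagging):
    """Toy context rule: reward NOUN after DET and VERB after NOUN."""
    score = 0
    for (_, prev_tag), (_, cur_tag) in zip(tagging, tagging[1:]):
        if prev_tag == "DET" and cur_tag == "NOUN":
            score += 1
        if prev_tag == "NOUN" and cur_tag == "VERB":
            score += 1
    return score


taggings = list(all_taggings(["the", "duck", "flies"]))
print(len(taggings))  # 4
print(max(taggings, key=context_score))
# [('the', 'DET'), ('duck', 'NOUN'), ('flies', 'VERB')]
```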
What are the approaches to POS tagging?
Rule-based, using syntactic knowledge
Statistical, using corpus evidence
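A minimal sketch contrasting the two approaches. The suffix rules and the four-sentence training corpus are hypothetical; the statistical side is the simplest corpus-based tagger, which gives each word the tag it received most often in training.

```python
from collections import Counter, defaultdict

# Rule-based: hand-written suffix rules (hypothetical, for illustration only).
SUFFIX_RULES = [("ly", "ADV"), ("ing", "VERB"), ("ed", "VERB"), ("ous", "ADJ")]


def rule_based_tag(word, default="NOUN"):
    """Return the tag of the first suffix rule that matches, else the default tag."""
    for suffix, tag in SUFFIX_RULES:
        if word.lower().endswith(suffix):
            return tag
    return default


# Statistical: a most-frequent-tag (unigram) tagger estimated from a tagged corpus.
def train_unigram(tagged_sentences):
    """Map each word to the tag it receives most often in the corpus."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def unigram_tag(model, tokens, default="NOUN"):
    """Tag each token with its most frequent tag; unseen words get the default."""
    return [(t, model.get(t.lower(), default)) for t in tokens]


# Tiny made-up training corpus.
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("they", "PRON"), ("run", "VERB"), ("quickly", "ADV")],
    [("a", "DET"), ("run", "NOUN"), ("helps", "VERB")],
    [("we", "PRON"), ("run", "VERB")],
]

print([rule_based_tag(w) for w in ["quickly", "walking", "famous", "cat"]])
# ['ADV', 'VERB', 'ADJ', 'NOUN']
model = train_unigram(corpus)
print(unigram_tag(model, ["the", "dog", "run"]))
# [('the', 'DET'), ('dog', 'NOUN'), ('run', 'VERB')]
```

The unigram tagger ignores context, so it tags "run" as VERB everywhere, even in "a run"; taggers that also use the surrounding tokens can avoid this.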
What is a POS Corpus?
A gold-standard corpus in which every token has been labelled with a POS tag by an expert
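A gold-standard corpus is commonly used to evaluate a tagger by comparing its output with the expert labels token by token. A minimal sketch of per-token accuracy on made-up data (tagging_accuracy is a name chosen for this example):

```python
def tagging_accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag matches the gold-standard tag."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and the same length")
    correct = 0
    for (p_word, p_tag), (g_word, g_tag) in zip(predicted, gold):
        if p_word != g_word:
            raise ValueError(f"token mismatch: {p_word!r} vs {g_word!r}")
        correct += p_tag == g_tag
    return correct / len(gold)


gold = [("the", "DET"), ("duck", "NOUN"), ("flies", "VERB")]
predicted = [("the", "DET"), ("duck", "VERB"), ("flies", "VERB")]
print(tagging_accuracy(predicted, gold))  # 2 of 3 tags correct
```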
What are the two formats for POS-tagged data?
- each line is a word-tag pair
- one sentence per line
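A sketch of reading both formats into the same list of (word, tag) sentences. The exact layouts are assumptions, since the cards do not spell them out: format 1 is taken to be one whitespace-separated "word TAG" pair per line with a blank line between sentences, and format 2 one sentence per line with each token written as word/TAG.

```python
def parse_word_per_line(text):
    """Format 1 (assumed layout): one 'word TAG' pair per line, blank line between sentences."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if current:  # a blank line closes the current sentence
                sentences.append(current)
                current = []
            continue
        word, tag = line.split()
        current.append((word, tag))
    if current:
        sentences.append(current)
    return sentences


def parse_sentence_per_line(text):
    """Format 2 (assumed layout): one sentence per line, tokens written as word/TAG."""
    sentences = []
    for line in text.splitlines():
        if line.strip():
            # rsplit on the last '/' keeps words that themselves contain '/' intact
            sentences.append([tuple(tok.rsplit("/", 1)) for tok in line.split()])
    return sentences


format1 = "The DET\ndog NOUN\nbarks VERB\n\nHurrah INTJ\n"
format2 = "The/DET dog/NOUN barks/VERB\nHurrah/INTJ\n"
print(parse_word_per_line(format1) == parse_sentence_per_line(format2))  # True
```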
ADJ
adjective
ADV
adverb
INTJ
interjection
word or phrase that is grammatically independent from the words around it
E.g. hurrah, cheers, aww
NOUN
noun
PROPN
proper noun
VERB
verb