POS tagging Flashcards
What is POS
Part of speech
a linguistic category of words, generally
defined by their syntactic or morphological behaviour
POS explains not what the word is about, but how it is used
What are open word classes
Constantly acquire new members
verb, noun, adverb, adjective, interjection
e.g. nouns: Internet, blog, Covid
e.g. verbs: to google, to tweet, to self-isolate
Two classes for all words in POS tagging
open word classes
closed word classes
What are closed word classes
Generally do not acquire new members
pronouns, prepositions, conjunctions
e.g. prepositions: to, from, in
e.g. pronouns: I, you, he/she/it, we, you, they
What are content (lexical) words
words that carry the content or the meaning of a sentence
open class words
What are function words
have little lexical meaning, but instead serve to
express grammatical relationships
e.g. articles (the) and conjunctions (and), which can be found in almost any utterance, no matter what it is about
Usually not inflected
What is the main issue with POS tagging
same word form can have different POS tags depending on the context: walk as a verb or a noun
The main task of POS tagging is to resolve the lexical ambiguities given the context
Internal Cues for POS tagging
Morphology is used for unknown words
extract-ed -> the -ed suffix is common for verbs
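A minimal sketch of this suffix heuristic for unknown words; the suffix-to-tag table below is a toy assumption for illustration, not from the flashcards.

```python
# Toy suffix-to-tag table (assumed for illustration).
SUFFIX_TAGS = {
    "ed": "VBD",   # extract-ed -> past-tense verb
    "ing": "VBG",  # runn-ing -> gerund/present participle
    "ly": "RB",    # quick-ly -> adverb
    "tion": "NN",  # extrac-tion -> noun
}

def guess_tag(word, default="NN"):
    """Guess a POS tag for an unknown word from its suffix (internal cue)."""
    for suffix, tag in SUFFIX_TAGS.items():
        if word.lower().endswith(suffix):
            return tag
    return default

print(guess_tag("extracted"))  # VBD
print(guess_tag("quickly"))    # RB
```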
External Cues for POS tagging
Using context
I will book a ticket
In this context, book is a verb, as will is usually followed by a verb
What are rule-based taggers
Uses a large set of rules
1. Starts with a dictionary
2. Assign all possible tags to words from dict
3. Apply rules to selectively remove tags leaving the correct tag for each word
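The three steps above can be sketched as follows; the toy dictionary and the two constraint rules are invented for illustration, not a real tagger's rule set.

```python
# Step 1: start with a dictionary of possible tags per word (toy lexicon).
LEXICON = {
    "i": {"PRP"},
    "will": {"MD", "NN"},
    "book": {"VB", "NN"},
    "a": {"DT"},
    "ticket": {"NN"},
}

def tag(words):
    # Step 2: assign all possible tags to each word from the dictionary.
    candidates = [set(LEXICON.get(w.lower(), {"NN"})) for w in words]
    # Step 3: apply rules to selectively remove tags.
    for i in range(1, len(words)):
        # Rule: a modal (MD) is usually followed by a base-form verb (VB).
        if "MD" in candidates[i - 1] and "VB" in candidates[i]:
            candidates[i] = {"VB"}
        # Rule: after a determiner (DT), prefer a noun reading (NN).
        if "DT" in candidates[i - 1] and "NN" in candidates[i]:
            candidates[i] = {"NN"}
    # Any leftover ambiguity is broken arbitrarily (alphabetically) here.
    return [sorted(c)[0] for c in candidates]

print(tag("I will book a ticket".split()))  # ['PRP', 'MD', 'VB', 'DT', 'NN']
```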
What are stochastic taggers
resolve ambiguities by using a training dataset
to estimate the probability of given word having
a given POS in a given context
Considers all possible sequences of tags
Chooses the most probable tag sequence given the word sequence
Requires a POS-tagged training corpus
Want the highest P(t1…tn | w1…wn)
Stochastic tagging formula
argmax f(x) means the x for which f(x) is maximised
argmax P(POS | word) = argmax P(word | POS) × P(POS)
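A worked toy example of this argmax: score each candidate tag t by P(word | t) × P(t) and keep the highest. The probabilities below are invented for illustration.

```python
# Invented toy probabilities for the ambiguous word "book".
emission = {"NN": {"book": 0.002}, "VB": {"book": 0.004}}  # P(word | tag)
prior = {"NN": 0.30, "VB": 0.20}                           # P(tag)

def best_tag(word):
    """Return argmax over tags of P(word | tag) * P(tag)."""
    scores = {t: emission[t].get(word, 0.0) * prior[t] for t in prior}
    return max(scores, key=scores.get)

# VB wins: 0.004 * 0.20 = 0.0008 beats NN: 0.002 * 0.30 = 0.0006
print(best_tag("book"))  # VB
```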
What is the transition probability
what is the probability of tag ti following tag ti-1 (like an n-gram model does for words)
Eg tags: DT, NN
P(DT | NN) = C(NN, DT) / C(NN)
We are calculating the probability of seeing the DT tag after the NN tag
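Estimating this from data is just counting tag bigrams; the tiny tag sequence below is an invented corpus for illustration.

```python
from collections import Counter

# Invented toy tag sequence to estimate P(DT | NN) = C(NN, DT) / C(NN).
tags = ["NN", "DT", "NN", "VBZ", "NN", "DT", "NN"]

unigrams = Counter(tags[:-1])           # counts of each tag as the previous tag
bigrams = Counter(zip(tags, tags[1:]))  # counts of (previous, next) tag pairs

# C(NN, DT) = 2 and C(NN) = 3 here, so P(DT | NN) = 2/3.
p_dt_given_nn = bigrams[("NN", "DT")] / unigrams["NN"]
print(p_dt_given_nn)
```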
What is the emission probability
given a tag ti, how likely is it that the corresponding word is wi
P(is | VBZ) = C(VBZ, is) / C(VBZ)
C(VBZ, is): the count of occurrences where the POS tag is VBZ and the word is 'is'
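The same counting works for emission probabilities; the small word/tag corpus below is invented for illustration.

```python
from collections import Counter

# Invented toy tagged corpus to estimate P(is | VBZ) = C(VBZ, is) / C(VBZ).
corpus = [("this", "DT"), ("is", "VBZ"), ("fine", "JJ"),
          ("she", "PRP"), ("is", "VBZ"), ("here", "RB"),
          ("he", "PRP"), ("runs", "VBZ")]

tag_counts = Counter(t for _, t in corpus)  # C(tag)
pair_counts = Counter(corpus)               # C(tag, word)

# C(VBZ, is) = 2 and C(VBZ) = 3 here, so P(is | VBZ) = 2/3.
p_is_given_vbz = pair_counts[("is", "VBZ")] / tag_counts["VBZ"]
print(p_is_given_vbz)
```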
What are transformation taggers
shares features of both rule-based and stochastic tagging
rules are automatically induced from a
previously tagged training dataset
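A sketch of how such induced rules are applied, Brill-style: start from an initial tagging (e.g. each word's most frequent tag) and apply transformation rules of the form "change tag A to tag B when the previous tag is C". The initial tags and the single rule below are invented for illustration.

```python
# One invented transformation rule: change NN -> VB when the previous tag is MD.
RULES = [("NN", "VB", "MD")]

def apply_rules(tags):
    """Apply (old, new, previous) transformation rules left to right."""
    tags = list(tags)
    for old, new, prev in RULES:
        for i in range(1, len(tags)):
            if tags[i] == old and tags[i - 1] == prev:
                tags[i] = new
    return tags

# "I will book a ticket", initially tagged with each word's most frequent tag.
initial = ["PRP", "MD", "NN", "DT", "NN"]
print(apply_rules(initial))  # ['PRP', 'MD', 'VB', 'DT', 'NN']
```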