POS Tagging and HMMs Flashcards
POS Tagging
Tree Representation
- Language has hierarchical structure, so it can be represented with trees
- Syntax requires trees: sentences can share the same shallow analysis (e.g., the same POS tags) yet differ in structure
POS Tagging
POS Tagging
- Many words have multiple possible POS tags
- Different tag choices produce different sentence meanings
POS Tagging
What governs correct POS tag choice?
- Word + context
- Word identity: most words have <=2 tags, may have just one
- Context: nouns start sentences, nouns follow verbs
POS Tagging
What is POS tagging good for?
- Text-to-speech (e.g., record and lead are pronounced differently depending on their POS)
- Preprocessing step for syntactic parsers or other tasks
- Very shallow information extraction (isn’t able to get full meaning)
HMMs
Hidden Markov Models (HMMs)
- Model the sequence of tags y over words x as a Markov process
- Uses Markov property
HMMs
Markov Property
- Future is conditionally independent of the past given the present
- The next tag depends only on the current tag, not anything before
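- In symbols, the first-order assumption: P(yi | y1, …, yi−1) = P(yi | yi−1)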
HMMs
Notations
- Input x = (x1, …, xn), the sequence of words
- xi ∈ V = vocabulary of words
- Output y = (y1, …, yn), the sequence of POS tags
- yi ∈ T = set of possible tags (including STOP)
HMMs
Equation
P(x, y) = P(y1) · ∏_{i=2}^{n} P(yi | yi−1) · ∏_{i=1}^{n} P(xi | yi)
Initial Dist: P(y1)
Transition Probs: ∏_{i=2}^{n} P(yi | yi−1)
Emission Probs: ∏_{i=1}^{n} P(xi | yi)
HMMs
Parameters
Initial Dist: |T|x1 vector (dist over initial states)
Emission Dist: |T|x|V| matrix (dist over words per tag)
Transition Dist: |T|x|T| matrix (dist over next tags per tag)
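A minimal sketch of these parameters in code: the toy tag set, vocabulary, and probability values below are made-up illustrations (not from the cards), and joint_log_prob simply follows the factorization from the equation card above.

```python
import math

# Toy tag set and vocabulary (illustrative only)
tags = ["N", "V"]
vocab = ["dogs", "bark"]

# Initial distribution: one probability per tag, P(y1)
initial = {"N": 0.7, "V": 0.3}

# Transition distribution: |T| x |T|, P(next tag | current tag)
transition = {"N": {"N": 0.3, "V": 0.7},
              "V": {"N": 0.6, "V": 0.4}}

# Emission distribution: |T| x |V|, P(word | tag)
emission = {"N": {"dogs": 0.8, "bark": 0.2},
            "V": {"dogs": 0.1, "bark": 0.9}}

def joint_log_prob(words, tag_seq):
    """log P(x, y) = log P(y1) + sum_i log P(yi|yi-1) + sum_i log P(xi|yi)."""
    lp = math.log(initial[tag_seq[0]])
    for i in range(1, len(tag_seq)):
        lp += math.log(transition[tag_seq[i - 1]][tag_seq[i]])
    for word, tag in zip(words, tag_seq):
        lp += math.log(emission[tag][word])
    return lp

print(joint_log_prob(["dogs", "bark"], ["N", "V"]))
```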
Training HMMs
Maximum Likelihood Estimation
- Read counts off the POS-tagged training data
- Normalize counts to get distributions
Training HMMs
Transitions
- Count up all pairs (yi, yi+1) in the training data
- For each tag T, count up occurrences of the tags it transitions to
- Normalize to get distribution for P(next tag|T)
- Dist must be smoothed
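A sketch of the transition estimate, assuming the training data comes as lists of (word, tag) pairs and that the tag set includes STOP, as in the notation card; add-alpha smoothing and the helper name estimate_transitions are stand-ins, not from the cards.

```python
from collections import defaultdict

def estimate_transitions(tagged_sentences, tag_set, alpha=1.0):
    """MLE of P(next tag | tag) from tag bigram counts, with add-alpha smoothing.
    tagged_sentences: list of sentences, each a list of (word, tag) pairs."""
    counts = defaultdict(lambda: defaultdict(float))
    for sent in tagged_sentences:
        seq = [t for _, t in sent] + ["STOP"]   # STOP is included in the tag set T
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    transition = {}
    for prev in tag_set:
        total = sum(counts[prev].values()) + alpha * len(tag_set)
        transition[prev] = {nxt: (counts[prev][nxt] + alpha) / total
                            for nxt in tag_set}
    return transition
```

Normalizing per previous tag makes each row a distribution over next tags, and the add-alpha term keeps unseen transitions from getting probability zero.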
Training HMMs
Emissions
Similar to transitions, but smoothing is harder (the vocabulary is large, and test-time words may be unseen)
Training HMMs
Initials
- Count up occurrences of first-position tags
- Perform smoothing
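The same count-and-normalize recipe covers emissions and initials. In this sketch the helper names and the <UNK> convention are assumptions; the emission estimate reserves mass for unseen words, which is the part that makes emission smoothing harder than transition smoothing.

```python
from collections import defaultdict

def estimate_emissions(tagged_sentences, tag_set, vocab, alpha=1.0):
    """MLE of P(word | tag) with add-alpha smoothing over the vocabulary,
    plus an <UNK> entry for words never seen in training."""
    counts = defaultdict(lambda: defaultdict(float))
    for sent in tagged_sentences:
        for word, tag in sent:
            counts[tag][word] += 1
    extended_vocab = list(vocab) + ["<UNK>"]
    emission = {}
    for tag in tag_set:
        total = sum(counts[tag].values()) + alpha * len(extended_vocab)
        emission[tag] = {w: (counts[tag][w] + alpha) / total
                         for w in extended_vocab}
    return emission

def estimate_initials(tagged_sentences, tag_set, alpha=1.0):
    """MLE of P(y1) from first-position tag counts, with add-alpha smoothing."""
    counts = defaultdict(float)
    for sent in tagged_sentences:
        counts[sent[0][1]] += 1
    total = sum(counts.values()) + alpha * len(tag_set)
    return {t: (counts[t] + alpha) / total for t in tag_set}
```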
HMM Inference
Viterbi Algorithm
- Used for inference in HMMs
- At test time, for a given sentence, what POS tag sequence should be predicted?
- “Think about” all possible immediate prior state values. Everything before that has already been accounted for by earlier stages
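A minimal Viterbi sketch, reusing the dict layout assumed in the earlier parameter sketch and omitting the STOP transition for brevity. At each position it maximizes over the immediate prior tag only, since everything earlier is already summarized in the previous column of scores.

```python
import math

def viterbi(words, tag_set, initial, transition, emission):
    """Return the most probable tag sequence argmax_y P(x, y) for one sentence."""
    n = len(words)
    # best[i][t] = best log-prob of any tag sequence for words[:i+1] ending in tag t
    best = [{} for _ in range(n)]
    back = [{} for _ in range(n)]
    for t in tag_set:
        best[0][t] = math.log(initial[t]) + math.log(emission[t][words[0]])
    for i in range(1, n):
        for t in tag_set:
            # "Think about" every possible immediate prior tag; earlier positions
            # are already accounted for in best[i - 1].
            prev = max(tag_set,
                       key=lambda p: best[i - 1][p] + math.log(transition[p][t]))
            back[i][t] = prev
            best[i][t] = (best[i - 1][prev] + math.log(transition[prev][t])
                          + math.log(emission[t][words[i]]))
    # Follow backpointers from the best final tag.
    last = max(tag_set, key=lambda t: best[n - 1][t])
    seq = [last]
    for i in range(n - 1, 0, -1):
        seq.append(back[i][seq[-1]])
    return list(reversed(seq))
```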
HMM Inference
Inference Problem
argmax_y P(y|x) = argmax_y P(y,x)
(because P(y|x) = P(y,x)/P(x) and P(x) does not depend on y)
There are exponentially many possible tag sequences y (|T|^n)
What is the solution? The Viterbi algorithm (dynamic programming)