POS Tagging and HMMs Flashcards

1
Q

POS Tagging

Tree Representation

A
  • Language has hierarchical structure, so it can be represented with trees
  • Syntax requires the use of trees (sentences with the same shallow analysis can differ in structure and meaning)
2
Q

POS Tagging

POS Tagging

A
  • The same word can have multiple possible POS tags
  • Different tag choices produce different sentence meanings
3
Q

POS Tagging

What governs correct POS tag choice?

A
  • Word + context
  • Word identity: most words have <= 2 tags; many have just one
  • Context: nouns start sentences, nouns follow verbs
4
Q

POS Tagging

What is POS tagging good for?

A
  • Text-to-speech: pronunciation depends on the tag (e.g., record, lead)
  • Preprocessing step for syntactic parsers or other tasks
  • Very shallow information extraction (can't recover full meaning)
5
Q

HMMs

Hidden Markov Models (HMMs)

A
  • Model the sequence of tags y over words x as a Markov process
  • Uses Markov property
6
Q

HMMs

Markov Property

A
  • Future is conditionally independent of the past given the present
  • The next tag depends only on the current tag, not on anything before it
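
Stated as a formula (a standard way to write the property, using the same y notation as the later cards):

P(y_{i+1} \mid y_1, \ldots, y_i) = P(y_{i+1} \mid y_i)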
7
Q

HMMs

Notations

A
  • Input x = (x1, …, xn), the sequence of words
  • xi ∈ V = vocabulary of words
  • Output y = (y1, …, yn), the sequence of POS tags
  • yi ∈ T = set of possible tags (including STOP)
8
Q

HMMs

Equation

A
P(x, y) = P(y_1) \prod_{i=2}^{n} P(y_i \mid y_{i-1}) \prod_{i=1}^{n} P(x_i \mid y_i)

Initial Dist: P(y_1)
Transition Probs: \prod_{i=2}^{n} P(y_i \mid y_{i-1})
Emission Probs: \prod_{i=1}^{n} P(x_i \mid y_i)
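
As an illustrative sketch (the mini tag set, sentence, and probability values below are made up, not from the source), the factorization multiplies out like this:

# Toy HMM tables; the values are made up just to illustrate the factorization.
initial = {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2}                     # P(y1)
transition = {("DET", "NOUN"): 0.8, ("NOUN", "VERB"): 0.6,
              ("DET", "VERB"): 0.1, ("NOUN", "NOUN"): 0.2}           # P(yi | yi-1)
emission = {("DET", "the"): 0.4, ("NOUN", "dog"): 0.05,
            ("VERB", "runs"): 0.03}                                  # P(xi | yi)

def joint_prob(words, tags):
    """P(x, y) = P(y1) * prod_i P(yi | yi-1) * prod_i P(xi | yi)."""
    p = initial[tags[0]]
    for i in range(1, len(tags)):
        p *= transition[(tags[i - 1], tags[i])]
    for word, tag in zip(words, tags):
        p *= emission[(tag, word)]
    return p

print(joint_prob(["the", "dog", "runs"], ["DET", "NOUN", "VERB"]))
# 0.5 * 0.8 * 0.6 * 0.4 * 0.05 * 0.03 = 0.000144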

9
Q

HMMs

Parameters

A

Initial Dist: |T| x 1 vector (distribution over initial states)
Emission Dist: |T| x |V| matrix (distribution over words per tag)
Transition Dist: |T| x |T| matrix (distribution over next tags per tag)
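
A minimal sketch of these shapes (the tag/vocab sizes and uniform values below are placeholders, not from the source):

import numpy as np

T, V = 45, 10000                        # hypothetical sizes: 45 tags, 10k-word vocabulary
initial = np.full(T, 1.0 / T)           # |T| x 1 vector: distribution over the first tag
transition = np.full((T, T), 1.0 / T)   # |T| x |T| matrix: row s = distribution over next tag given tag s
emission = np.full((T, V), 1.0 / V)     # |T| x |V| matrix: row t = distribution over words given tag t
# each row (and the initial vector itself) sums to 1, since each is a probability distribution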

10
Q

Training HMMs

Maximum Likelihood Estimation

A
  • Read POS tag counts off the tagged training data
  • Normalize the counts to get distributions
11
Q

Training HMMs

Transitions

A
  • Count up all pairs (yi, yi+1) in the training data
  • Count up occurrences of what each tag T can transition to
  • Normalize to get the distribution P(next tag | T)
  • Distribution must be smoothed
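
A rough sketch of this count-and-normalize step (tag_sequences is a hypothetical list of per-sentence tag lists; add-alpha smoothing is just one simple way to smooth):

from collections import Counter

def estimate_transitions(tag_sequences, tag_set, alpha=1.0):
    """Estimate P(next tag | tag) by counting and normalizing, with add-alpha smoothing."""
    pair_counts = Counter()
    for tags in tag_sequences:
        for prev, nxt in zip(tags, tags[1:]):
            pair_counts[(prev, nxt)] += 1
    probs = {}
    for prev in tag_set:
        total = sum(pair_counts[(prev, nxt)] for nxt in tag_set)
        for nxt in tag_set:
            # add-alpha smoothing keeps unseen transitions at nonzero probability
            probs[(prev, nxt)] = (pair_counts[(prev, nxt)] + alpha) / (total + alpha * len(tag_set))
    return probs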
12
Q

Training HMMs

Emissions

A

Similar to transitions, but smoothing is harder (many rare and unseen words)

13
Q

Training HMMs

Initials

A
  • Count up occurrences of tags in the first position
  • Perform smoothing
14
Q

HMM Inference

Viterbi Algorithm

A
  • Used for inference in HMMs
  • At test time, for a given sentence, what POS tag sequence should be predicted?
  • “Think about” all possible immediate prior state values. Everything before that has already been accounted for by earlier stages
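
A compact sketch of Viterbi in log space (not from the source; sent, tags, word_index, and the log_* tables are assumed inputs whose shapes match the Parameters card):

import numpy as np

def viterbi(sent, tags, log_init, log_trans, log_emit, word_index):
    """Most likely tag sequence under the HMM, working in log space.

    log_init[t]     = log P(y1 = tag t)
    log_trans[s, t] = log P(yi = tag t | yi-1 = tag s)
    log_emit[t, w]  = log P(xi = word w | yi = tag t)
    word_index maps a word to its column in log_emit.
    """
    n, T = len(sent), len(tags)
    score = np.full((n, T), -np.inf)    # best log prob of any sequence ending in tag t at position i
    back = np.zeros((n, T), dtype=int)  # backpointers for recovering the argmax sequence
    score[0] = log_init + log_emit[:, word_index[sent[0]]]
    for i in range(1, n):
        for t in range(T):
            # consider every possible previous tag; earlier positions are already folded into score[i-1]
            cand = score[i - 1] + log_trans[:, t]
            back[i, t] = int(np.argmax(cand))
            score[i, t] = cand[back[i, t]] + log_emit[t, word_index[sent[i]]]
    # follow backpointers from the best final tag
    best = [int(np.argmax(score[-1]))]
    for i in range(n - 1, 0, -1):
        best.append(int(back[i, best[-1]]))
    return [tags[t] for t in reversed(best)]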
15
Q

HMM Inference

Inference Problem

A
argmax_y P(y|x) = argmax_y P(y,x)

(because P(y|x) = P(y,x) / P(x), and P(x) does not depend on y)
There are exponentially many possible sequences y

What is the solution? …

16
Q

HMM Inference

Inference Problem Solution

A

Dynamic programming (possible because of Markov structure)

17
Q

HMM POS Tagging

Baseline POS Tagging

A

Assign each word its most frequent tag (about 90% accuracy)
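
A sketch of this baseline (tagged_words is a hypothetical list of (word, tag) training pairs; falling back to NOUN for unseen words is just one common default):

from collections import Counter, defaultdict

def train_baseline(tagged_words):
    """Map each word to the tag it occurs with most often in training."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        counts[word][tag] += 1
    return {word: tag_counts.most_common(1)[0][0] for word, tag_counts in counts.items()}

def tag_baseline(words, most_frequent_tag, default="NOUN"):
    # unseen words fall back to a default tag (NOUN is one common choice)
    return [most_frequent_tag.get(w, default) for w in words]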

18
Q

HMM POS Tagging

HMM-Based POS Taggers

A

Higher accuracy on both known and unknown/unseen words