Algorithms for HMM Flashcards
What is the formal definition of the goal of an HMM?
Find the best tag sequence T for a given untagged sentence S: argmax_T P(T|S)
That is, given a sequence of words, find the sequence of tags most likely to have produced it.
How an HMM relates to an FSM
States: the tags
State transition probabilities: p(t_i | t_{i-1})
An output alphabet: the words
Emission probabilities: p(w_i | t_i)
Initial state: the start of the sentence (a minimal sketch of this view follows)
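A minimal sketch of this FSM view in Python; the two-tag tag set, the words, and the probabilities are toy values invented for illustration, and "<s>" is an assumed start-of-sentence state:

    # Toy HMM as plain dictionaries.
    transitions = {                      # p(t_i | t_{i-1}): moving between states
        ("<s>", "DET"): 0.6, ("<s>", "NOUN"): 0.4,
        ("DET", "DET"): 0.1, ("DET", "NOUN"): 0.9,
        ("NOUN", "DET"): 0.7, ("NOUN", "NOUN"): 0.3,
    }
    emissions = {                        # p(w_i | t_i): each state emits a word
        ("DET", "the"): 0.7, ("DET", "a"): 0.3,
        ("NOUN", "dog"): 0.5, ("NOUN", "park"): 0.5,
    }
    tagset = ["DET", "NOUN"]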
What does P(S|T)P(T) represent and why are we trying to find it?
By Bayes' rule, P(T|S) = P(S|T)P(T) / P(S).
P(S|T)P(T) is the joint probability P(S, T): the probability of the tag sequence T occurring and producing the words S.
Since P(S) is the same for every candidate T, the T that maximises P(S|T)P(T) also maximises P(T|S), so it is the most probable tag sequence.
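A sketch of scoring one candidate tag sequence by P(S|T)P(T), reusing the toy dictionaries above; the helper name score() is an invention for illustration:

    def score(tags, words, transitions, emissions, start="<s>"):
        # P(S|T) * P(T) under the HMM independence assumptions
        p, prev = 1.0, start
        for t, w in zip(tags, words):
            # one transition factor p(t_i|t_{i-1}) and one emission factor p(w_i|t_i)
            p *= transitions.get((prev, t), 0.0) * emissions.get((t, w), 0.0)
            prev = t
        return p

For example, score(["DET", "NOUN"], ["the", "dog"], transitions, emissions) gives 0.6 · 0.7 · 0.9 · 0.5 = 0.189.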
If we compared the probabilities of all possible tag sequences to find the maximum, how many sequences would we have to compare for a sentence of n words with c possible tags?
c^n: each of the n positions can take any of the c tags. A sketch of this brute-force search follows.
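A sketch of the brute-force search (the one Viterbi avoids), reusing score() and the toy model from above:

    from itertools import product

    def brute_force(words, tagset, transitions, emissions):
        # enumerate all c**n tag sequences and keep the highest-scoring one
        best = max(product(tagset, repeat=len(words)),
                   key=lambda tags: score(tags, words, transitions, emissions))
        return best, score(best, words, transitions, emissions)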
Why do we use the Viterbi algorithm?
To avoid the exponential cost of scoring every possible tag sequence: Viterbi finds the same argmax in O(n·c²) time instead of O(c^n).
What type of algorithm is the Viterbi algorithm?
Dynamic Programming.
Explain the Viterbi algorithm to someone
Work through the sentence left to right, keeping for each tag the probability of the best path that ends in that tag at the current word, plus a backpointer to the previous tag on that path. Each column is computed from the previous column's values rather than by re-enumerating whole sequences. At the end, take the highest-scoring final tag and follow the backpointers to recover the best tag sequence. A sketch follows.
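A minimal Viterbi sketch over the toy dictionary HMM above (no log probabilities or smoothing, so it suits only short illustrative inputs):

    def viterbi(words, tagset, transitions, emissions, start="<s>"):
        # best[t]: probability of the best path ending in tag t at the current word
        best = {t: transitions.get((start, t), 0.0) *
                   emissions.get((t, words[0]), 0.0) for t in tagset}
        backpointers = []
        for w in words[1:]:
            column, back = {}, {}
            for t in tagset:
                # choose the previous tag that maximises the path probability into t
                prev = max(tagset, key=lambda s: best[s] * transitions.get((s, t), 0.0))
                column[t] = (best[prev] * transitions.get((prev, t), 0.0) *
                             emissions.get((t, w), 0.0))
                back[t] = prev
            best = column
            backpointers.append(back)
        # backtrack from the highest-scoring final tag
        tag = max(tagset, key=best.get)
        path = [tag]
        for back in reversed(backpointers):
            tag = back[tag]
            path.append(tag)
        return path[::-1], max(best.values())

On the toy model, viterbi(["the", "dog"], tagset, transitions, emissions) returns (["DET", "NOUN"], 0.189), matching the brute-force search.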
Explain the Expectation-Maximisation algorithm to someone
Start from an initial guess of the parameters. E-step: with the current parameters fixed, compute expected counts of the hidden events (for an HMM, how often each transition and emission is used, obtained from the forward-backward probabilities). M-step: re-estimate the parameters from those expected counts as if they had been observed. Repeat until convergence; each iteration cannot decrease the likelihood of the data. A sketch follows.
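A compact Baum-Welch (EM for HMMs) sketch with numpy, assuming integer-coded observations; the names A (transitions), B (emissions), pi (initial distribution) and the lack of numerical scaling are simplifications for illustration:

    import numpy as np

    def baum_welch(obs, n_states, n_symbols, n_iter=20, seed=0):
        rng = np.random.default_rng(seed)
        A = rng.random((n_states, n_states)); A /= A.sum(axis=1, keepdims=True)
        B = rng.random((n_states, n_symbols)); B /= B.sum(axis=1, keepdims=True)
        pi = np.full(n_states, 1.0 / n_states)
        T = len(obs)
        for _ in range(n_iter):
            # E-step: forward (alpha) and backward (beta) probabilities
            alpha = np.zeros((T, n_states)); beta = np.zeros((T, n_states))
            alpha[0] = pi * B[:, obs[0]]
            for t in range(1, T):
                alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
            beta[-1] = 1.0
            for t in range(T - 2, -1, -1):
                beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
            gamma = alpha * beta                        # expected state occupancy
            gamma /= gamma.sum(axis=1, keepdims=True)
            xi = np.zeros((T - 1, n_states, n_states))  # expected transitions
            for t in range(T - 1):
                xi[t] = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
                xi[t] /= xi[t].sum()
            # M-step: re-estimate parameters from the expected counts
            pi = gamma[0]
            A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
            for k in range(n_symbols):
                B[:, k] = gamma[obs == k].sum(axis=0)
            B /= gamma.sum(axis=0)[:, None]
        return A, B, pi

    # usage: learn a 2-state model from a toy integer-coded sequence
    A, B, pi = baum_welch(np.array([0, 1, 0, 2, 1, 0]), n_states=2, n_symbols=3)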
Besides finding the most probable sequence of tags, what other tasks can the same trellis-based dynamic programming be used for?
1) (Forward algorithm) Compute the likelihood of a sequence of outputs, P(O|λ), i.e. the probability of a sentence regardless of tags (a language model!): marginalise out the states by summing, rather than maximising, over paths (see the sketch below).
2) (Forward-backward algorithm, a.k.a. Baum-Welch) Learn the best set of parameters λ = (A, B) given only an unannotated corpus of sentences.
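A sketch of the forward algorithm over the toy dictionary HMM from earlier: identical in shape to viterbi(), but summing over predecessor states where Viterbi takes the max:

    def forward(words, tagset, transitions, emissions, start="<s>"):
        alpha = {t: transitions.get((start, t), 0.0) *
                    emissions.get((t, words[0]), 0.0) for t in tagset}
        for w in words[1:]:
            alpha = {t: sum(alpha[s] * transitions.get((s, t), 0.0) for s in tagset) *
                        emissions.get((t, w), 0.0) for t in tagset}
        return sum(alpha.values())   # P(O|λ): marginalise out the final state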
What type of training does an HMM require? (supervised, unsupervised, semi-supervised)
Supervised, in the standard case: the transition and emission probabilities are counted from a tagged corpus. The parameters can also be learned unsupervised from untagged text with the forward-backward (EM) algorithm.
What are the features of an HMM model for POS-tagging?
-States (the POS tags)
-State transition probabilities (between tags)
-An output alphabet (the words)
-Emission probabilities (from tags to words)
-Start and End states
What is the EM-algorithm guaranteed to do?
Find a local maximum of the likelihood. It does not necessarily find the global maximum, so it will probably need to be run multiple times from different initialisations.
Name a major assumption of HMMs that makes them not viable in some contexts
The strong independence assumptions: each tag depends only on the previous tag (the Markov assumption) and each word depends only on its own tag, so longer-range context is ignored. The model is also only as good as its probability estimates, and EM training finds only local maxima.
How to overcome EM-algorithm local maxima
Early stopping, better initialisation of the parameters, and random restarts (a sketch of random restarts follows).
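A sketch of random restarts around the baum_welch() sketch above: run EM from several seeds and keep the parameters with the highest likelihood, computed here with an inline forward pass:

    import numpy as np

    def log_likelihood(obs, A, B, pi):
        # forward algorithm in probability space (short sequences only)
        alpha = pi * B[:, obs[0]]
        for o in obs[1:]:
            alpha = (alpha @ A) * B[:, o]
        return np.log(alpha.sum())

    def best_of_restarts(obs, n_states, n_symbols, n_restarts=10):
        best_params, best_ll = None, -np.inf
        for seed in range(n_restarts):
            # a different seed gives a different initialisation of A and B,
            # so EM can converge to a different local maximum
            params = baum_welch(obs, n_states, n_symbols, seed=seed)
            ll = log_likelihood(obs, *params)
            if ll > best_ll:
                best_params, best_ll = params, ll
        return best_params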