Hidden Markov Models Flashcards
What is a Hidden Markov Model?
Combines aspects of the Markov chain model and a feature-based classifier.
What is the difference between a Markov model and an HMM?
In an HMM the states of the underlying Markov chain cannot be observed. Instead of producing a sequence of states, each state in the HMM emits a feature vector according to an emission probability p(xt | st). All we observe is a sequence of feature vectors, and we cannot be sure which states produced them.
What is an emission probability distribution?
Describes the distribution of feature vectors while each utterance (silence, "yes" or "no" in this example) is being spoken; see the sketch after the examples below.
p(xt | st = SIL)
p(xt | st = yes)
p(xt | st = no)
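As an illustration only (the state names match the examples above, but the feature dimension and Gaussian parameters below are made up), the emission distributions could be represented as one multivariate Gaussian per state, so that p(xt | st) is just that state's density evaluated at xt:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical 2-dimensional feature vectors; one Gaussian emission
# density per state. Means and covariances are illustrative only.
emission = {
    "SIL": multivariate_normal(mean=[0.0, 0.0], cov=np.eye(2)),
    "yes": multivariate_normal(mean=[2.0, 1.0], cov=np.eye(2)),
    "no":  multivariate_normal(mean=[-1.5, 2.5], cov=np.eye(2)),
}

x_t = np.array([0.3, -0.2])                # one observed feature vector
p_x_given_sil = emission["SIL"].pdf(x_t)   # p(x_t | s_t = SIL)
```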
In an HMM, every time we make a transition to a new state we…
multiply in the probability that the current feature vector was emitted by that state, i.e.
p(x1, x2, …, xT, s1, s2, …, sT) = p(s1)p(x1 | s1) p(s2 | s1)p(x2 | s2) … p(sT | sT-1)p(xT | sT) p(STOP | sT)
However, we do not know the state sequence s1, …, sT, as it is hidden.
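A minimal sketch of that product for a known state sequence, assuming a hypothetical representation in which init, trans and stop are dictionaries of probabilities and emit(s, x) returns p(x | s) (a real implementation would work in log-probabilities to avoid underflow):

```python
def joint_prob(xs, state_seq, init, trans, emit, stop):
    """p(x1, ..., xT, s1, ..., sT) for a *known* state sequence state_seq.

    init[s]      : p(s1 = s)
    trans[s][s2] : p(s_{t+1} = s2 | s_t = s)
    emit(s, x)   : p(x | s), the emission probability
    stop[s]      : p(STOP | sT = s)
    """
    p = init[state_seq[0]] * emit(state_seq[0], xs[0])
    for t in range(1, len(xs)):
        p *= trans[state_seq[t - 1]][state_seq[t]] * emit(state_seq[t], xs[t])
    return p * stop[state_seq[-1]]
```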
In what two ways can we use an HMM given an observed sequence of feature vectors?
Classification and decoding.
How do we classify an observed sequence of feature vectors?
Bayes' theorem:
p(C1 | x1, x2, …, xT) = p(x1, x2, …, xT | C1)p(C1) / [p(x1, x2, …, xT | C1)p(C1) + p(x1, x2, …, xT | C2)p(C2)]
Calculate p(x1, x2, …, xT | C1) using the forward algorithm:
p(x1, s1) = p(x1 | s1)p(s1)
p(x1, x2, …, xt, st) = p(xt | st) · sum over st-1 of p(st | st-1) p(x1, x2, …, xt-1, st-1), for t = 2, …, T
p(x1, x2, …, xT) = sum over sT of p(x1, x2, …, xT, sT) p(STOP | sT)
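A minimal sketch of the forward recursion above, using the same hypothetical init/trans/emit/stop representation as the earlier sketch and ordinary probabilities rather than logs for clarity:

```python
def forward_prob(xs, states, init, trans, emit, stop):
    """p(x1, ..., xT): probability of the feature sequence xs,
    summed over all possible hidden state sequences.
    """
    # Initialisation: alpha_1(s) = p(x1, s1 = s) = p(x1 | s) p(s)
    alpha = {s: emit(s, xs[0]) * init[s] for s in states}
    # Recursion: alpha_t(s) = p(xt | s) * sum over s2 of p(s | s2) alpha_{t-1}(s2)
    for x in xs[1:]:
        alpha = {s: emit(s, x) * sum(trans[s2][s] * alpha[s2] for s2 in states)
                 for s in states}
    # Termination: p(x1..xT) = sum over s of alpha_T(s) p(STOP | s)
    return sum(alpha[s] * stop[s] for s in states)
```

Given one HMM per class, forward_prob run with class C1's parameters gives p(x1, …, xT | C1), and Bayes' theorem above then combines these likelihoods with the priors p(C1) and p(C2).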
How do we recognise whole sentences?
Decoding. We want to determine which state sequence corresponds to a particular sequence of feature vectors, i.e. calculate the state sequence s* that is most likely given the feature vectors.
s* = argmax over (s1, s2, …, sT) of p(s1, s2, …, sT | x1, x2, …, xT)
argmax means the value of (s1, s2, …, sT) that maximises p(s1, s2, …, sT | x1, x2, …, xT)
Why is decoding an optimisation problem?
Because we are searching over all possible state sequences for the one that maximises a particular quantity (the probability of the state sequence given the feature vectors).
Which algorithm is used to compute the decoding?
The Viterbi algorithm.
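A minimal sketch of the Viterbi algorithm, again under the same hypothetical init/trans/emit/stop representation: it is the forward recursion with the sum replaced by a max, plus back-pointers so the best state sequence can be read off at the end.

```python
def viterbi(xs, states, init, trans, emit, stop):
    """Most likely state sequence s* for the feature sequence xs."""
    # Initialisation: delta_1(s) = p(x1 | s) p(s)
    delta = {s: emit(s, xs[0]) * init[s] for s in states}
    backptrs = []
    # Recursion: keep the max (not the sum) over previous states.
    for x in xs[1:]:
        new_delta, bp = {}, {}
        for s in states:
            best_prev = max(states, key=lambda s2: delta[s2] * trans[s2][s])
            bp[s] = best_prev
            new_delta[s] = emit(s, x) * trans[best_prev][s] * delta[best_prev]
        delta = new_delta
        backptrs.append(bp)
    # Termination: include p(STOP | sT), then trace the back-pointers.
    last = max(states, key=lambda s: delta[s] * stop[s])
    path = [last]
    for bp in reversed(backptrs):
        path.append(bp[path[-1]])
    return list(reversed(path))
```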
How are transition and emission probabilities estimated?
By fitting them to training data. Transition probabilities are estimated from counts of state-to-state transitions; emission probabilities are estimated in a similar fashion to the normal density models.
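A sketch of the count-based transition estimate, assuming labelled training state sequences are available (the sequences below are made up); fitting the emission densities (e.g. one normal density per state) is not shown:

```python
from collections import Counter, defaultdict

def estimate_transitions(state_sequences):
    """Estimate p(s_next | s) by counting transitions in labelled training data."""
    counts = defaultdict(Counter)
    for seq in state_sequences:
        for s, s_next in zip(seq, seq[1:]):
            counts[s][s_next] += 1
    # Normalise each row of counts into a probability distribution.
    return {s: {s2: c / sum(row.values()) for s2, c in row.items()}
            for s, row in counts.items()}

# Example with made-up state sequences:
seqs = [["SIL", "yes", "yes", "SIL"], ["SIL", "no", "no", "no", "SIL"]]
trans = estimate_transitions(seqs)   # e.g. trans["SIL"]["yes"] == 0.5
```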