Hidden Markov Models Flashcards

1
Q

What is a Hidden Markov Model?

A

Combines aspects of the Markov chain model and a feature-based classifier.

2
Q

What is the difference between a Markov model and an HMM?

A

In an HMM the states of the Markov chain cannot be observed. Instead of producing a sequence of states, each state emits a feature vector according to an emission probability p(xt | st). All we observe is a sequence of feature vectors, and we cannot be sure which state produced them.
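
A minimal sketch of this structure in Python (the states, probabilities, and emission parameters below are invented for a toy silence/yes/no recogniser, not taken from the source):

# The state sequence is hidden; only the feature vectors the states emit are observed.
states = ["SIL", "yes", "no"]

# p(s1): probability of starting in each state.
initial = {"SIL": 1.0, "yes": 0.0, "no": 0.0}

# p(st | st-1): transition probabilities, including a STOP event.
transition = {
    "SIL": {"SIL": 0.6, "yes": 0.15, "no": 0.15, "STOP": 0.1},
    "yes": {"SIL": 0.3, "yes": 0.6, "no": 0.0, "STOP": 0.1},
    "no":  {"SIL": 0.3, "yes": 0.0, "no": 0.6, "STOP": 0.1},
}

# p(xt | st): one (mean, std) Gaussian per state over a 1-D feature.
emission_params = {"SIL": (0.0, 1.0), "yes": (3.0, 1.0), "no": (-3.0, 1.0)}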

3
Q

What is an emission probability distribution?

A

Describes the distribution of feature vectors while each utterance (state) is being spoken, e.g.
p(xt | st = SIL)
p(xt | st = yes)
p(xt | st = no)
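
A minimal sketch of such a distribution, assuming (purely for illustration) a 1-D Gaussian density per state with invented means and variances:

import math

# Hypothetical Gaussian emission model: p(xt | st) for each state.
emission_params = {"SIL": (0.0, 1.0), "yes": (3.0, 1.0), "no": (-3.0, 1.0)}

def emission_prob(x, state):
    # Gaussian density p(xt | st) for a scalar feature x.
    mean, std = emission_params[state]
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

print(emission_prob(2.8, "yes"))  # high density: x is near the "yes" mean
print(emission_prob(2.8, "no"))   # near zero: x is far from the "no" mean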

4
Q

In an HMM, every time we make a transition to a new state we…

A

multiply in the probability that the current feature vector was emitted by that state, i.e.
p(x1, x2, …, xT, s1, s2, …, sT) = p(s1)p(x1 | s1)p(s2 | s1)p(x2 | s2) … p(sT | sT-1)p(xT | sT)p(STOP | sT)
However, we don't know the state sequence s1…sT as it is hidden.
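
A sketch of this factorisation for a known state sequence, reusing the invented toy model from the earlier cards: with the states given, the joint probability is a running product of transition and emission terms.

import math

initial = {"SIL": 1.0, "yes": 0.0, "no": 0.0}                    # p(s1)
transition = {                                                    # p(st | st-1)
    "SIL": {"SIL": 0.6, "yes": 0.15, "no": 0.15, "STOP": 0.1},
    "yes": {"SIL": 0.3, "yes": 0.6, "no": 0.0, "STOP": 0.1},
    "no":  {"SIL": 0.3, "yes": 0.0, "no": 0.6, "STOP": 0.1},
}
emission_params = {"SIL": (0.0, 1.0), "yes": (3.0, 1.0), "no": (-3.0, 1.0)}

def emission_prob(x, s):                                          # p(xt | st)
    mean, std = emission_params[s]
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def joint_prob(xs, ss):
    # p(s1) p(x1|s1) . p(s2|s1) p(x2|s2) ... p(sT|sT-1) p(xT|sT) p(STOP|sT)
    p = initial[ss[0]] * emission_prob(xs[0], ss[0])
    for t in range(1, len(xs)):
        p *= transition[ss[t - 1]][ss[t]] * emission_prob(xs[t], ss[t])
    return p * transition[ss[-1]]["STOP"]

print(joint_prob([0.1, 2.9, 3.2], ["SIL", "yes", "yes"]))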

5
Q

What two approaches allow us to compute the probability of the observed sequence of features?

A

Classification (summing over all possible state sequences, via the forward algorithm) and decoding (finding the single best state sequence, via the Viterbi algorithm).

6
Q

How can we calculate the probability of a sequence of features using classification?

A

Bayes' theorem:
p(C1 | x1, x2, …, xT) = p(x1, x2, …, xT | C1)p(C1) / [p(x1, x2, …, xT | C1)p(C1) + p(x1, x2, …, xT | C2)p(C2)]
Calculate p(x1, x2, …, xT | C1) using the forward algorithm:
Initialisation: p(x1, s1) = p(x1 | s1)p(s1)
Recursion, for t = 2, …, T: p(x1, x2, …, xt, st) = p(xt | st) . sum over st-1 of p(st | st-1)p(x1, x2, …, xt-1, st-1)
Termination: p(x1, x2, …, xT) = sum over sT of p(x1, x2, …, xT, sT)p(STOP | sT)
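
A sketch of the forward algorithm under the same invented toy model; alpha[s] holds p(x1, …, xt, st = s) at each frame:

import math

initial = {"SIL": 1.0, "yes": 0.0, "no": 0.0}
transition = {
    "SIL": {"SIL": 0.6, "yes": 0.15, "no": 0.15, "STOP": 0.1},
    "yes": {"SIL": 0.3, "yes": 0.6, "no": 0.0, "STOP": 0.1},
    "no":  {"SIL": 0.3, "yes": 0.0, "no": 0.6, "STOP": 0.1},
}
emission_params = {"SIL": (0.0, 1.0), "yes": (3.0, 1.0), "no": (-3.0, 1.0)}
states = list(initial)

def emission_prob(x, s):
    mean, std = emission_params[s]
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def forward(xs):
    # Initialisation: alpha[s] = p(x1 | s1 = s) p(s1 = s)
    alpha = {s: initial[s] * emission_prob(xs[0], s) for s in states}
    # Recursion: alpha[s] = p(xt | s) * sum over s' of p(s | s') alpha[s']
    for x in xs[1:]:
        alpha = {s: emission_prob(x, s)
                    * sum(transition[sp][s] * alpha[sp] for sp in states)
                 for s in states}
    # Termination: p(x1..xT) = sum over sT of alpha[sT] p(STOP | sT)
    return sum(alpha[s] * transition[s]["STOP"] for s in states)

print(forward([0.1, 2.9, 3.2]))  # p(x1, x2, x3), summed over all state paths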

7
Q

How do we recognise whole sentences?

A

Decoding. We want to determine which state sequence corresponds to a particular sequence of feature vectors: calculate the state sequence s* that is most likely given the feature vectors.
s* = argmax over (s1, s2, …, sT) of p(s1, s2, …, sT | x1, x2, …, xT)
i.e. p(s1, s2, …, sT | x1, x2, …, xT) maximised with respect to (s1, s2, …, sT)
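
The maximisation is carried out by the Viterbi algorithm (see the later cards). A sketch under the same invented toy model: it is the forward recursion with the sum replaced by a max, plus backpointers to recover s*.

import math

initial = {"SIL": 1.0, "yes": 0.0, "no": 0.0}
transition = {
    "SIL": {"SIL": 0.6, "yes": 0.15, "no": 0.15, "STOP": 0.1},
    "yes": {"SIL": 0.3, "yes": 0.6, "no": 0.0, "STOP": 0.1},
    "no":  {"SIL": 0.3, "yes": 0.0, "no": 0.6, "STOP": 0.1},
}
emission_params = {"SIL": (0.0, 1.0), "yes": (3.0, 1.0), "no": (-3.0, 1.0)}
states = list(initial)

def emission_prob(x, s):
    mean, std = emission_params[s]
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def viterbi(xs):
    # delta[s] = probability of the best path ending in state s at this frame.
    delta = {s: initial[s] * emission_prob(xs[0], s) for s in states}
    backpointers = []
    for x in xs[1:]:
        new_delta, bp = {}, {}
        for s in states:
            # Best previous state for landing in s at this frame.
            prev = max(states, key=lambda sp: delta[sp] * transition[sp][s])
            bp[s] = prev
            new_delta[s] = delta[prev] * transition[prev][s] * emission_prob(x, s)
        delta = new_delta
        backpointers.append(bp)
    # Best final state, accounting for the STOP transition.
    last = max(states, key=lambda s: delta[s] * transition[s]["STOP"])
    path = [last]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    return list(reversed(path))

print(viterbi([0.1, 2.9, 3.2]))  # s*, e.g. ['SIL', 'yes', 'yes'] for this toy model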

8
Q

Why is decoding an optimisation problem?

A

We are finding the state sequence that optimises (maximises) a particular quantity: the probability of the state sequence given the observed feature vectors.

9
Q

Which algorithm computes the decoding?

A

The Viterbi algorithm.

10
Q

How are transition and emission probabilities estimated?

A

By fitting them to training data. Transition probabilities are calculated using counts; emission probabilities are estimated in a similar fashion to the normal density models.
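
A sketch of these estimates, assuming (hypothetically) fully labelled training data where the state of every frame is known; in practice the alignment itself has to be inferred, but the counting idea is the same.

from collections import Counter, defaultdict
import statistics

# Invented labelled data: (feature, state) pairs per training utterance.
utterances = [
    [(0.2, "SIL"), (3.1, "yes"), (2.7, "yes")],
    [(-0.1, "SIL"), (-2.8, "no"), (-3.3, "no")],
]

# Transition probabilities from counts: p(s' | s) = count(s -> s') / count(s -> *).
pair_counts, out_counts = Counter(), Counter()
for utt in utterances:
    seq = [s for _, s in utt] + ["STOP"]
    for s, s_next in zip(seq, seq[1:]):
        pair_counts[(s, s_next)] += 1
        out_counts[s] += 1
transition = {pair: c / out_counts[pair[0]] for pair, c in pair_counts.items()}

# Emission parameters per state, fitted like a normal density model:
# sample mean and standard deviation of the frames assigned to each state.
frames = defaultdict(list)
for utt in utterances:
    for x, s in utt:
        frames[s].append(x)
emission_params = {s: (statistics.mean(xs),
                       statistics.stdev(xs) if len(xs) > 1 else 1.0)
                   for s, xs in frames.items()}

print(transition)
print(emission_params)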
