Algorithms for HMM Flashcards

1
Q

What is the formal definition of the goal of an HMM?

A

Find the best tagging sequence T for a given untagged sequence S: argmax_T P(T|S)

That is, given a sequence of words, we want to find the sequence of tags that most probably generated it.

2
Q

How does an HMM relate to an FSM?

A

States: the tags
State transition probabilities: p(t_i | t_{i-1})
An output alphabet: the words
Emission probabilities: p(w_i | t_i)
Initial state: the start of the sentence

3
Q

What does P(S|T)P(T) represent and why are we trying to find it?

A

P(T|S) = P(S|T)P(T) / P(S)

P(S|T)P(T) is the joint probability of the tag sequence T and the sentence S. Since P(S) is the same for every candidate tag sequence, maximising P(S|T)P(T) is equivalent to maximising P(T|S).

The tag sequence T that maximises this quantity is therefore the most probable tagging of S.
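
Under the HMM's independence assumptions (each tag depends only on the previous tag, and each word only on its own tag; see cards 2 and 13), this joint probability factorises for a sentence of n words as

P(S|T) P(T) ≈ ∏_{i=1..n} P(w_i | t_i) · P(t_i | t_{i-1})

which is exactly a product of the HMM's emission and transition probabilities.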

4
Q

If we compared the probabilities of all possible tagging sequences to find the maximum, how many different sequences would we have to compare for a sentence of n words with c possible tags per word?

A

c^n (c to the power of n), since each of the n words can take any of the c tags.
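
To get a sense of scale (illustrative numbers only, not from the cards): with c = 45 tags, roughly the size of the Penn Treebank tagset, and a 10-word sentence, brute force would have to compare about 3.4 × 10^16 sequences, while the Viterbi algorithm (see cards 5-7) only needs on the order of n·c^2 trellis updates.

# Illustrative only: brute-force search space vs. Viterbi's work (assumed c and n)
c, n = 45, 10           # 45 tags, 10-word sentence
print(c ** n)           # 34050628916015625 candidate tag sequences
print(n * c ** 2)       # 20250, order of the trellis updates Viterbi needs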

5
Q

Why do we use the Viterbi algorithm?

A

To avoid the exponential cost of comparing all c^n possible tagging sequences against one another: Viterbi uses dynamic programming to find the most probable sequence in time on the order of n·c^2.

6
Q

What type of algorithm is the Viterbi algorithm?

A

Dynamic Programming.

7
Q

Explain the Viterbi algorithm to someone

A

Work through the sentence left to right, building a trellis with one column per word and one cell per tag. Each cell stores the probability of the best tag sequence ending in that tag at that word (the best path into the cell, times the transition and emission probabilities), plus a backpointer to the previous tag on that best path. At the end of the sentence, take the highest-scoring final cell and follow the backpointers backwards to read off the most probable tag sequence. Because each cell only looks at the c cells in the previous column, the search takes on the order of n·c^2 steps instead of c^n.
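
A minimal sketch in Python (not the cards' own code; the dict-of-dicts probability tables and the "<s>" start state are assumptions for illustration):

# Minimal Viterbi sketch for an HMM POS tagger (illustrative data structures).
# trans[prev_tag][tag] = p(tag | prev_tag), with "<s>" as the start state
# emit[tag][word]      = p(word | tag)
def viterbi(words, tags, trans, emit):
    best = [{} for _ in words]   # best[i][t] = prob. of best path for words[:i+1] ending in tag t
    back = [{} for _ in words]   # back[i][t] = previous tag on that best path
    for t in tags:               # first column: start transition times emission
        best[0][t] = trans.get("<s>", {}).get(t, 0.0) * emit.get(t, {}).get(words[0], 0.0)
        back[0][t] = None
    for i in range(1, len(words)):            # fill the trellis left to right
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] * trans.get(p, {}).get(t, 0.0)) for p in tags),
                key=lambda x: x[1],
            )
            best[i][t] = score * emit.get(t, {}).get(words[i], 0.0)
            back[i][t] = prev
    last = max(tags, key=lambda t: best[-1][t])   # best final cell...
    path = [last]
    for i in range(len(words) - 1, 0, -1):        # ...then follow backpointers
        path.append(back[i][path[-1]])
    return list(reversed(path))

In practice the products are replaced by sums of log probabilities to avoid numerical underflow on long sentences.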

8
Q

Explain the Expectation-Maximisation algorithm to someone

A

Start with an initial guess for the model parameters (the transition and emission probabilities). E-step: use the current parameters to compute, over the unannotated data, the expected counts of each transition and emission (for HMMs this is done with the forward-backward algorithm). M-step: re-estimate the parameters from those expected counts as if they had been observed. Repeat: each iteration is guaranteed not to decrease the likelihood of the data, and the process converges to a local maximum (see cards 12 and 14).

9
Q

Besides finding the most probable sequence of tags (decoding), what other HMM tasks can be solved with the same dynamic-programming trellis?

A

1) (Forward algorithm) Compute the likelihood of a sequence of outputs, P(O | λ), i.e. the probability of a sentence regardless of its tags (a language model!), by marginalising out the states to get the probability of the outputs.

2) (Forward-Backward algorithm) Learn the best set of parameters λ = (A, B) given only an unannotated corpus of sentences.
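
A minimal sketch of the forward algorithm, using the same assumed dict-of-dicts tables as the Viterbi sketch in card 7; the only change is that each trellis cell sums over the previous tags instead of taking the maximum:

# Forward algorithm: P(O | lambda) = sum over all tag sequences of P(words, tags).
def forward(words, tags, trans, emit):
    alpha = [{} for _ in words]
    for t in tags:
        alpha[0][t] = trans.get("<s>", {}).get(t, 0.0) * emit.get(t, {}).get(words[0], 0.0)
    for i in range(1, len(words)):
        for t in tags:
            total = sum(alpha[i - 1][p] * trans.get(p, {}).get(t, 0.0) for p in tags)
            alpha[i][t] = total * emit.get(t, {}).get(words[i], 0.0)
    return sum(alpha[-1].values())   # likelihood of the sentence, marginalised over tags

The forward-backward (Baum-Welch) algorithm combines this left-to-right pass with a symmetric right-to-left pass to compute the expected counts that the EM M-step re-estimates the parameters from.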

10
Q

What type of training does an HMM require? (supervised, unsupervised, semi-supervised)

A

Typically supervised: the transition and emission probabilities are estimated by counting from a POS-tagged corpus. (The parameters can also be learned from untagged text with the EM / forward-backward algorithm, as in cards 9 and 12.)
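
A minimal sketch of the supervised case, assuming a hypothetical tagged corpus given as a list of sentences, each a list of (word, tag) pairs; these are plain relative-frequency estimates with no smoothing:

from collections import defaultdict

# Supervised HMM training: estimate p(tag | prev_tag) and p(word | tag) by counting.
def train_hmm(corpus):
    trans_counts = defaultdict(lambda: defaultdict(int))
    emit_counts = defaultdict(lambda: defaultdict(int))
    for sentence in corpus:
        prev = "<s>"                       # start-of-sentence state
        for word, tag in sentence:
            trans_counts[prev][tag] += 1
            emit_counts[tag][word] += 1
            prev = tag
        trans_counts[prev]["</s>"] += 1    # end-of-sentence state
    # normalise the counts into probabilities
    trans = {p: {t: c / sum(nxt.values()) for t, c in nxt.items()}
             for p, nxt in trans_counts.items()}
    emit = {t: {w: c / sum(ws.values()) for w, c in ws.items()}
            for t, ws in emit_counts.items()}
    return trans, emit

A real tagger would smooth these estimates to cope with unseen words and unseen tag bigrams.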

11
Q

What are the components of an HMM model for POS-tagging?

A

-States (the POS tags)
-State transition probabilities between tags
-An output (emission) alphabet: the words
-Emission probabilities of words given tags
-Start and end states
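
A toy illustration of these components as Python data structures (all tags, words and probabilities are invented, in the same dict-of-dicts format assumed by the earlier sketches):

# Toy HMM for POS tagging; every number here is made up for illustration.
tags = ["DT", "NN", "VB"]                        # states: the POS tags

trans = {                                        # transition probabilities p(tag | prev_tag)
    "<s>": {"DT": 0.8, "NN": 0.1, "VB": 0.1},    # "<s>" = start state
    "DT":  {"NN": 0.9, "VB": 0.1},
    "NN":  {"VB": 0.6, "NN": 0.2, "</s>": 0.2},  # "</s>" = end state
    "VB":  {"DT": 0.5, "NN": 0.3, "</s>": 0.2},
}

emit = {                                         # emission probabilities p(word | tag)
    "DT": {"the": 0.7, "a": 0.3},
    "NN": {"dog": 0.4, "walk": 0.3, "park": 0.3},
    "VB": {"walk": 0.6, "runs": 0.4},
}

With the Viterbi sketch from card 7, viterbi("the dog runs".split(), tags, trans, emit) would tag this toy sentence as DT NN VB.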

12
Q

What is the EM-algorithm guaranteed to do?

A

It is guaranteed to converge to a local maximum of the likelihood, not necessarily the global maximum, so in practice it usually needs to be run several times from different starting points.

13
Q

Name a major assumption of HMMs that makes them unsuitable for some contexts

A

The strong independence assumptions: each tag is assumed to depend only on the previous tag, and each word only on its own tag, so longer-range context is ignored. In addition, performance is sensitive to how good the model is, and EM training only finds local maxima.

14
Q

How can we overcome the EM algorithm's local maxima?

A

Early stopping, better initialisation of parameters, random restarts.
