Final Test Flashcards

1
Q

Fundamental HMM Assumptions (3)

A
  • Observation independence assumption
  • First-order Markov assumption
  • Transitions are time-independent
2
Q

Observation independence assumption

A

The likelihood of the t’th feature vector depends only on the current state; it is therefore unaffected by previous states and feature vectors.
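In symbols (a standard formulation, writing q_t for the hidden state and o_t for the feature vector at time t):

P(o_t \mid q_1, \ldots, q_t,\; o_1, \ldots, o_{t-1}) = P(o_t \mid q_t)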

3
Q

First-order Markov Assumption

A

Apart from the immediately preceding state, no other previously observed states or features affect the probability of occurrence of the next state.
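In symbols (same notation as above, with q_t the hidden state at time t):

P(q_t \mid q_1, \ldots, q_{t-1}) = P(q_t \mid q_{t-1})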

4
Q

Time-independent transition

A

We assume the transition probability between two states is constant, irrespective of the time at which the transition actually takes place.
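In symbols (standard notation, with a_{ij} the probability of moving from state i to state j):

a_{ij} = P(q_t = j \mid q_{t-1} = i) \quad \text{for all } t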

5
Q

3 Steps of Viterbi Re-estimation

A
  • Initialisation
  • EM Re-estimation
  • Termination
6
Q

Viterbi Re-estimation: Initialization: 2 Steps

A

a) For every training observation sequence, assign, in a sensible manner, a corresponding state sequence, and extend it by adding the initial and termination states.
b) From these initial state sequences, generate an initial model.

7
Q

Viterbi Re-estimation: EM Re-estimation: 2 Steps

A

Expectation Step: For the current model estimate, apply the Viterbi algorithm to every training sequence to calculate its log-likelihood given S*, the optimal state sequence for that observation sequence. Accumulate the “score” to be used later to test for convergence.

Maximization Step: Use all the S*’s obtained in the E-step to update the parameters of the HMM.

8
Q

Viterbi Re-estimation: Termination

A

Compare the total score obtained in the E-step to that from the previous E-step. If it is within an acceptable tolerance, terminate; otherwise continue with re-estimation (step 2).
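A rough Python sketch of the whole procedure (initialisation, EM re-estimation, termination), assuming hypothetical helpers: viterbi_decode() returning the best state path and its log-likelihood, initial_alignment() assigning a sensible starting state sequence, and estimate_model() estimating HMM parameters from state-aligned data:

def viterbi_reestimation(train_seqs, tol=1e-4):
    # 1. Initialisation: a sensible state sequence per training sequence, then an initial model
    state_seqs = [initial_alignment(obs) for obs in train_seqs]   # hypothetical helper
    model = estimate_model(state_seqs, train_seqs)                # hypothetical helper
    prev_score = float("-inf")
    while True:
        # 2. EM re-estimation
        score, state_seqs = 0.0, []
        for obs in train_seqs:                            # E-step: best path S* and its log-likelihood
            path, loglik = viterbi_decode(model, obs)     # hypothetical helper
            state_seqs.append(path)
            score += loglik
        model = estimate_model(state_seqs, train_seqs)    # M-step: update parameters from the S*'s
        # 3. Termination: stop once the total score stops improving
        if abs(score - prev_score) < tol:
            return model
        prev_score = score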

9
Q

Goal of dimensionality reduction

A

To project the data onto a lower dimensional space, while retaining some of the essential characteristics of the data.

10
Q

PCA approach to dimensionality reduction

A

PCA finds lower dimensional subspaces that describe the essential properties of the data, by finding the directions of maximum variation in the data.
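A minimal NumPy sketch of this idea, assuming the data is held in an (n_samples, n_features) array X and that we keep the k directions of largest variance (names chosen here for illustration):

import numpy as np

def pca_project(X, k):
    Xc = X - X.mean(axis=0)                  # centre the data
    cov = np.cov(Xc, rowvar=False)           # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: the covariance matrix is symmetric
    order = np.argsort(eigvals)[::-1]        # sort directions by decreasing variance
    W = eigvecs[:, order[:k]]                # top-k principal directions
    return Xc @ W                            # projection onto the lower-dimensional space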

11
Q

LDA approach to dimensionality reduction

A

LDA reduces the dimension of the data values in such a way that maximum class separation is obtained in the lower dimensional space.
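A minimal two-class sketch of the Fisher criterion that underlies LDA, assuming NumPy arrays X0 and X1 holding the samples of each class (names chosen here for illustration):

import numpy as np

def fisher_direction(X0, X1):
    # Direction w that maximises between-class separation relative to within-class scatter
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = np.cov(X0, rowvar=False) * (len(X0) - 1) + np.cov(X1, rowvar=False) * (len(X1) - 1)
    w = np.linalg.solve(Sw, m1 - m0)         # w is proportional to Sw^{-1} (m1 - m0)
    return w / np.linalg.norm(w)             # project samples onto it with X @ w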

12
Q

Dynamic programming algorithm

A

An algorithm that uses a table to store intermediate values as it builds up the probability of the observation sequence.

13
Q

What does the forward algorithm do?

A

Computes the observation probability by summing over the probabilities of all possible hidden state paths that could generate the observation sequence, but it does so efficiently by implicitly folding each of these paths into a single forward trellis.
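A minimal NumPy sketch of the forward trellis, assuming a discrete-observation HMM with transition matrix A (A[i, j] = P(state j | state i)), emission matrix B (B[j, k] = P(symbol k | state j)) and initial distribution pi (names chosen here for illustration):

import numpy as np

def forward(A, B, pi, obs):
    # obs is a list of observed symbol indices
    N, T = A.shape[0], len(obs)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                   # initialisation
    for t in range(1, T):                          # recursion: all paths folded into one trellis
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha[-1].sum()                         # termination: sum over all final states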

14
Q

HMMs: Decoding task

A

Given as input an HMM, and a sequence of observations, find the most probable sequence of states.

15
Q

HMMs: Learning task

A

Given an observation sequence O and the set of possible states in the HMM, learn the HMM parameters.

16
Q

Viterbi algorithm use

A

To find the optimal sequence of hidden states.
Given an observation sequence and an HMM, the algorithm returns the state path through the HMM that assigns maximum likelihood to the observation sequence.
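A minimal NumPy sketch, using the same A, B, pi conventions as the forward-algorithm sketch above (names chosen here for illustration):

import numpy as np

def viterbi(A, B, pi, obs):
    # Returns the most likely state path for obs and that path's probability
    N, T = A.shape[0], len(obs)
    delta = np.zeros((T, N))            # best path probability ending in each state at time t
    psi = np.zeros((T, N), dtype=int)   # back-pointers
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A       # scores[i, j]: come from state i into state j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):                # follow back-pointers to recover the path
        path.append(int(psi[t, path[-1]]))
    return list(reversed(path)), float(delta[-1].max())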

17
Q

Forward-backward algorithm use

A

To train the parameters of an HMM, namely the transition probability matrix and the observation likelihood matrix.

18
Q

Sequence Model (or Classifier)

A

A model whose job is to assign a label or class to each unit in a sequence, thus mapping a sequence of observations to a sequence of labels.

19
Q

HMM

A

A probabilistic sequence model: given a sequence of units, it computes a probability distribution over possible sequences of labels and chooses the best label sequence.

20
Q

Markov chain

A

A special case of a weighted automaton in which weights are probabilities and in which the input sequence uniquely determines which states the automaton will go through.

21
Q

3 fundamental problems that characterize hidden Markov models

A
  • Likelihood: What is the likelihood of an observation sequence given an HMM?
  • Decoding: Given an observation sequence and an HMM, what is the best hidden state sequence?
  • Learning: Given an observation sequence and the set of states in the HMM, learn the HMM parameters.
22
Q

Algorithms solving the 3 problems of HMMs

A
  • Likelihood computation: The Forward Algorithm
  • Decoding: The Viterbi Algorithm
  • HMM Training: The Forward-Backward algorithm
23
Q

Backward probability

A

The probability of seeing the observations from time t+1 to end, given that we are in state i at time t.
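In symbols (standard notation, with a_{ij} the transition probability, b_j(o_t) the observation likelihood and N the number of states):

\beta_t(i) = P(o_{t+1}, o_{t+2}, \ldots, o_T \mid q_t = i, \lambda)

with \beta_T(i) = 1 and the recursion \beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j).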

24
Q

Contrast discriminative and generative models (TEXTBOOK)

A

In the case of generative models, a model is trained for each class, totally ignoring the properties of the other classes.
Discriminative models use all the training data simultaneously to generate the model, which can be used to good effect to maximise the differences between classes.
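In symbols (standard Bayes-rule formulation, with x an observation and c a class): a generative model estimates p(x \mid c) and P(c) for each class and classifies via

P(c \mid x) \propto p(x \mid c)\, P(c),

whereas a discriminative model estimates the posterior P(c \mid x) directly.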

25
Q

Generative Approach (TEXTBOOK)

A

A model is developed for every class from the observations known to belong to that class.

Once this model is known, it is in principle possible to generate observations for each class.

26
Q

Discriminative Approach (TEXTBOOK)

A

Directly estimates the posterior from the data. This has the advantage that the data is used to best effect in order to discriminate between the classes.

Thus training data is used more effectively in distinguishing between classes.