Final Test Flashcards
Fundamental HMM Assumptions (3)
- Observation independence assumption
- First-order Markov assumption
- Transitions are time-independent
Observation independence assumption
The likelihood of the t-th feature vector depends only on the current state; it is therefore unaffected by previous states and feature vectors.
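In standard HMM notation (assumed here, not taken from the notes), with hidden states q and observations o:

```latex
P(o_t \mid q_1, \dots, q_t,\; o_1, \dots, o_{t-1}) = P(o_t \mid q_t)
```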
First-order Markov Assumption
Apart from the immediately preceding state, no other previously observed states or features affect the probability of occurrence of the next state.
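In the same assumed notation:

```latex
P(q_t \mid q_1, \dots, q_{t-1},\; o_1, \dots, o_{t-1}) = P(q_t \mid q_{t-1})
```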
Time-independent transition
We assume that the transition probability between two states is constant, irrespective of the time at which the transition actually takes place.
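In the same assumed notation, with a_ij the transition probability from state i to state j:

```latex
P(q_t = j \mid q_{t-1} = i) = a_{ij} \quad \text{for every time step } t
```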
3 Steps of Viterbi Re-estimation
- Initialisation
- EM Re-estimation
- Termination
Viterbi Re-estimation: Initialisation: 2 Steps
a) For every training observation sequence, assign, in a sensible manner, a corresponding state sequence and extend it by adding the initial and termination states.
b) From this initial state sequence, generate an initial model.
Viterbi Re-estimation: EM Re-estimation: 2 Steps
Expectation step: For the current model estimate, apply the Viterbi algorithm to every training sequence to calculate its log-likelihood given S*, the optimal state sequence for that observation sequence. Accumulate the “score” to be used later to test for convergence.
Maximisation step: Use all the S*’s obtained in the E-step to update the parameters of the HMM.
Viterbi Re-estimation: Termination
Compare the total score obtained in the E-step to that from the previous E-step. If it is within an acceptable tolerance, terminate; otherwise continue with re-estimation (step 2).
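The sketch below pulls the three steps together for a discrete-observation HMM. It is an illustrative NumPy implementation, not the course’s reference code; the function names, the uniform initial-state probability, and the even initial alignment are assumptions made here.

```python
import numpy as np

def viterbi(log_A, log_B, obs):
    """Best-path log-likelihood and state path for one observation sequence."""
    N, T = log_A.shape[0], len(obs)
    delta = np.full((T, N), -np.inf)
    psi = np.zeros((T, N), dtype=int)
    delta[0] = -np.log(N) + log_B[:, obs[0]]          # uniform initial state (assumption)
    for t in range(1, T):
        for j in range(N):
            scores = delta[t - 1] + log_A[:, j]
            psi[t, j] = int(np.argmax(scores))
            delta[t, j] = scores[psi[t, j]] + log_B[j, obs[t]]
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):                     # backtrack along stored predecessors
        path.append(int(psi[t, path[-1]]))
    return float(np.max(delta[-1])), path[::-1]

def reestimate(paths, sequences, N, M, eps=1e-6):
    """M-step: re-estimate transition and emission matrices from aligned paths."""
    A = np.full((N, N), eps)                          # small floor avoids log(0)
    B = np.full((N, M), eps)
    for path, obs in zip(paths, sequences):
        for t in range(len(obs)):
            B[path[t], obs[t]] += 1
            if t > 0:
                A[path[t - 1], path[t]] += 1
    return (np.log(A / A.sum(1, keepdims=True)),
            np.log(B / B.sum(1, keepdims=True)))

def viterbi_training(sequences, N, M, tol=1e-4, max_iter=50):
    # Initialisation: spread each sequence evenly over the N states and
    # build an initial model from those alignments.
    init_paths = [np.minimum(np.arange(len(obs)) * N // len(obs), N - 1)
                  for obs in sequences]
    log_A, log_B = reestimate(init_paths, sequences, N, M)
    prev_score = -np.inf
    for _ in range(max_iter):
        # E-step: Viterbi-align every training sequence and accumulate the score.
        results = [viterbi(log_A, log_B, obs) for obs in sequences]
        score = sum(ll for ll, _ in results)
        # M-step: update the model parameters from the new alignments.
        log_A, log_B = reestimate([p for _, p in results], sequences, N, M)
        # Termination: stop once the score improvement is within tolerance.
        if abs(score - prev_score) < tol:
            break
        prev_score = score
    return log_A, log_B
```

For example, `viterbi_training([[0, 1, 2, 2], [0, 2, 1, 2]], N=2, M=3)` returns log-domain estimates of the transition and emission matrices.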
Goal of dimensionality reduction
To project the data onto a lower dimensional space, while retaining some of the essential characteristics of the data.
PCA approach to dimensionality reduction
PCA finds lower dimensional subspaces that describe the essential properties of the data, by finding the directions of maximum variation in the data.
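A minimal sketch of this idea, assuming NumPy and an eigendecomposition of the sample covariance (illustrative, not the prescribed implementation):

```python
import numpy as np

def pca_project(X, k):
    """X: (n_samples, n_features) array. Returns the k-dimensional projection."""
    Xc = X - X.mean(axis=0)                           # centre the data
    cov = np.cov(Xc, rowvar=False)                    # sample covariance of the features
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]   # directions of maximum variance
    return Xc @ top
```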
LDA approach to dimensionality reduction
LDA reduces the dimension of the data values in such a way that maximum class separation is obtained in the lower dimensional space.
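A minimal Fisher-LDA sketch under the same assumptions (NumPy, illustrative names); for C classes, at most C − 1 useful directions exist:

```python
import numpy as np

def lda_project(X, y, k):
    """X: (n, d) data, y: (n,) integer class labels. Returns an (n, k) projection."""
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))       # within- and between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all).reshape(-1, 1)
        Sb += len(Xc) * (diff @ diff.T)
    # Directions maximising between-class relative to within-class scatter:
    # eigenvectors of pinv(Sw) @ Sb (pinv keeps the sketch robust if Sw is singular).
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(eigvals.real)[::-1][:k]
    return X @ eigvecs[:, order].real
```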
Dynamic programming algorithm
An algorithm that uses a table to store intermediate values as it builds up the probability of the observation sequence.
What does the forward algorithm do?
Computes the observation probability by summing over the probabilities of all possible hidden state paths that could generate the observation sequence, but it does so efficiently by implicitly folding each of these paths into a single forward trellis.
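A minimal sketch of the forward recursion (NumPy, illustrative names; no scaling, so only suitable for short sequences):

```python
import numpy as np

def forward_likelihood(pi, A, B, obs):
    """pi: (N,) initial probs, A: (N, N) transitions, B: (N, M) emission probs."""
    alpha = pi * B[:, obs[0]]                 # initialise the trellis column
    for o_t in obs[1:]:
        alpha = (alpha @ A) * B[:, o_t]       # fold all paths into a single column
    return alpha.sum()                        # P(O | model)
```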
HMMs: Decoding task
Given an HMM and a sequence of observations as input, find the most probable sequence of hidden states.
HMMs: Learning task
Given an observation sequence O and the set of possible states in the HMM, learn the HMM parameters.
Viterbi algorithm use
To find the optimal sequence of hidden states.
Given an observation sequence and an HMM, the algorithm returns the state path through the HMM that assigns maximum likelihood to the observation sequence.
Forward-backward algorithm use
To train the parameters of an HMM, namely the transition probability matrix and the observation likelihood matrix.
Sequence Model (or Classifier)
A model whose job is to assign a label or class to each unit in a sequence, thus mapping a sequence of observations to a sequence of labels.
HMM
A probabilistic sequence model: given a sequence of units, it computes a probability distribution over possible sequences of labels and chooses the best label sequence.
Markov chain
A special case of a weighted automaton in which weights are probabilities and in which the input sequence uniquely determines which states the automaton will go through.
3 fundamental problems that characterize hidden Markov models
- Likelihood: What is the likelihood of an observation sequence, given an HMM?
- Decoding: Given an observation sequence and an HMM, what is the best hidden state sequence?
- Learning: Given an observation sequence and the set of states in the HMM, learn the HMM parameters.
Algorithms solving the 3 problems of HMMs
- Likelihood computation: The Forward Algorithm
- Decoding: The Viterbi Algorithm
- HMM Training: The Forward-Backward algorithm
Backward probability
The probability of seeing the observations from time t+1 to end, given that we are in state i at time t.
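A minimal sketch in the same style as the forward example, where beta[t, i] = P(o_{t+1}, …, o_T | state i at time t):

```python
import numpy as np

def backward_probs(A, B, obs):
    """Returns beta with beta[t, i] = P(o_{t+1}, ..., o_T | state i at time t)."""
    N, T = A.shape[0], len(obs)
    beta = np.ones((T, N))                    # beta at the final time is 1 by definition
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta
```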
Contrast discriminative and generative models (TEXTBOOK)
In the case of generative models, a model is trained for each class, totally ignoring the properties of the other classes.
Discriminative models use all the training data simultaneously to build the model, which can be used to good effect to maximise the differences between classes.
Generative Approach (TEXTBOOK)
A model is developed for every class from the observations known to belong to that class.
Once this model is known, it is in principle possible to generate observations for each class.
Discriminative Approach (TEXTBOOK)
Directly estimates the posterior from the data. This has the advantage that the data is used to best effect in order to discriminate between the classes.
The training data is thus used more effectively in distinguishing between the classes.
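An illustrative contrast, assuming scikit-learn is available (GaussianNB as the generative model, LogisticRegression as the discriminative one); the data here is synthetic:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB            # generative: one Gaussian model per class
from sklearn.linear_model import LogisticRegression   # discriminative: models P(class | x) directly

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)),         # class 0 samples
               rng.normal(2.0, 1.0, (50, 2))])        # class 1 samples
y = np.array([0] * 50 + [1] * 50)

print("generative:    ", GaussianNB().fit(X, y).score(X, y))
print("discriminative:", LogisticRegression().fit(X, y).score(X, y))
```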