Final Test Flashcards
Fundamental HMM Assumptions (3)
- Observation independence assumption
- First-order Markov assumption
- Transitions are time-independent
Observation independence assumption
The likelihood of the t-th feature vector depends only on the current state; it is therefore unaffected by previous states and feature vectors.
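In symbols (standard HMM notation, used here for illustration): P(o_t | q_1, …, q_t, o_1, …, o_{t-1}) = P(o_t | q_t).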
First-order Markov Assumption
The probability of the next state depends only on the immediately preceding state; no earlier states or feature vectors affect it.
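In symbols (standard HMM notation, used here for illustration): P(q_t | q_1, …, q_{t-1}, o_1, …, o_{t-1}) = P(q_t | q_{t-1}).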
Time-independent transition
We assume the transition probability between two states to be constant irrespective of the time when the transition actually takes place.
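In symbols (standard HMM notation, used here for illustration): a_ij = P(q_{t+1} = j | q_t = i) is assumed to be the same for every time step t, i.e. the transition probabilities are time-homogeneous.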
3 Steps of Viterbi Re-estimation
- Initialisation
- EM Re-estimation
- Termination
Viterbi Re-estimation: Initialization: 2 Steps
a) For every training observation sequence, assign a corresponding state sequence in a sensible manner, and extend it by adding the initial and termination states.
b) From this initial state sequence, generate an initial model.
Viterbi Re-estimation: EM Re-estimation: 2 Steps
Expectation step: For the current model estimate, apply the Viterbi algorithm to every training sequence to find S*, the most likely state sequence for that observation sequence, and calculate its log-likelihood along S*. Accumulate these “scores” to be used later to test for convergence.
Maximization step: Use all the S*’s obtained in the E-step to update the parameters of the HMM.
Viterbi Re-estimation: Termination
Compare the total score obtained in the E-step with that from the previous E-step. If the difference is within an acceptable tolerance, terminate; otherwise continue with re-estimation (step 2).
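A minimal Python/NumPy sketch of the full loop for a discrete-observation HMM. The integer-symbol observations, the simple linear initial alignment, the smoothing constant, and the use of an initial-state vector pi in place of explicit initial/termination states are all illustrative assumptions, not part of the original cards.

```python
# Minimal sketch of Viterbi re-estimation for a discrete-observation HMM.
import numpy as np

def viterbi(obs, pi, A, B):
    """Return the best state path and its log-likelihood for one sequence."""
    n_states, T = A.shape[0], len(obs)
    logA, logB, logpi = np.log(A), np.log(B), np.log(pi)
    delta = np.full((T, n_states), -np.inf)
    psi = np.zeros((T, n_states), dtype=int)
    delta[0] = logpi + logB[:, obs[0]]
    for t in range(1, T):
        for j in range(n_states):
            scores = delta[t - 1] + logA[:, j]
            psi[t, j] = np.argmax(scores)
            delta[t, j] = scores[psi[t, j]] + logB[j, obs[t]]
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1], float(np.max(delta[-1]))

def reestimate(sequences, paths, n_states, n_symbols, eps=1e-6):
    """M-step: count transitions and emissions along the aligned paths."""
    pi = np.full(n_states, eps)
    A = np.full((n_states, n_states), eps)
    B = np.full((n_states, n_symbols), eps)
    for obs, path in zip(sequences, paths):
        pi[path[0]] += 1
        for t in range(len(obs) - 1):
            A[path[t], path[t + 1]] += 1
        for t, o in enumerate(obs):
            B[path[t], o] += 1
    return (pi / pi.sum(),
            A / A.sum(axis=1, keepdims=True),
            B / B.sum(axis=1, keepdims=True))

def viterbi_training(sequences, n_states, n_symbols, tol=1e-4, max_iter=50):
    # 1) Initialisation: assign a linear (evenly spread) state alignment to
    #    each sequence and build an initial model from those alignments.
    paths = [list(np.linspace(0, n_states, len(obs), endpoint=False).astype(int))
             for obs in sequences]
    pi, A, B = reestimate(sequences, paths, n_states, n_symbols)
    prev_score = -np.inf
    for _ in range(max_iter):
        # 2a) E-step: Viterbi-align every sequence and accumulate the score.
        results = [viterbi(obs, pi, A, B) for obs in sequences]
        paths = [p for p, _ in results]
        score = sum(ll for _, ll in results)
        # 2b) M-step: update the parameters from the new alignments.
        pi, A, B = reestimate(sequences, paths, n_states, n_symbols)
        # 3) Termination: stop once the score improvement is within tolerance.
        if abs(score - prev_score) < tol:
            break
        prev_score = score
    return pi, A, B
```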
Goal of dimensionality reduction
To project the data onto a lower dimensional space, while retaining some of the essential characteristics of the data.
PCA approach to dimensionality reduction
PCA finds lower dimensional subspaces that describe the essential properties of the data, by finding the directions of maximum variation in the data.
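A minimal sketch of this idea in Python/NumPy; the data matrix X (rows are samples) and the number of components k are illustrative assumptions.

```python
# Minimal PCA sketch via eigendecomposition of the covariance matrix.
import numpy as np

def pca(X, k):
    X_centred = X - X.mean(axis=0)                 # remove the mean
    cov = np.cov(X_centred, rowvar=False)          # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigh: symmetric matrix
    order = np.argsort(eigvals)[::-1]              # sort by decreasing variance
    W = eigvecs[:, order[:k]]                      # top-k directions of max variation
    return X_centred @ W                           # project onto the k-dim subspace

# Example: project 3-D points onto their 2 directions of maximum variance.
X = np.random.default_rng(0).normal(size=(100, 3))
Y = pca(X, k=2)   # Y has shape (100, 2)
```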
LDA approach to dimensionality reduction
LDA reduces the dimension of the data values in such a way that maximum class separation is obtained in the lower dimensional space.
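A minimal sketch of Fisher LDA in Python/NumPy; the scatter-matrix formulation and the two-class toy data are illustrative assumptions.

```python
# Minimal LDA sketch: maximise between-class scatter relative to
# within-class scatter, then project onto the resulting directions.
import numpy as np

def lda(X, y, k):
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))   # within-class scatter
    Sb = np.zeros((d, d))   # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - overall_mean).reshape(-1, 1)
        Sb += len(Xc) * (diff @ diff.T)
    # Solve the generalised eigenproblem Sw^{-1} Sb w = lambda w.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(eigvals.real)[::-1]
    W = eigvecs[:, order[:k]].real
    return X @ W            # projection with maximum class separation

# Example: two 3-D Gaussian classes projected onto one discriminant direction.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(3, 1, (50, 3))])
y = np.array([0] * 50 + [1] * 50)
Z = lda(X, y, k=1)   # Z has shape (100, 1)
```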
Dynamic programming algorithm
An algorithm that uses a table to store intermediate values as it builds up the probability of the observation sequence.
What does the forward algorithm do?
Computes the observation probability by summing over the probabilities of all possible hidden state paths that could generate the observation sequence, but it does so efficiently by implicitly folding each of these paths into a single forward trellis.
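A minimal sketch of the forward trellis in Python/NumPy; the 2-state, 3-symbol toy model is an illustrative assumption.

```python
# Minimal forward-algorithm sketch: fill a T x N trellis of forward
# probabilities so every hidden state path is summed over implicitly,
# without enumerating the paths one by one.
import numpy as np

def forward(obs, pi, A, B):
    n_states, T = A.shape[0], len(obs)
    alpha = np.zeros((T, n_states))                 # the forward trellis
    alpha[0] = pi * B[:, obs[0]]                    # initialisation
    for t in range(1, T):
        # Each cell sums over all predecessor states (the "folded" paths).
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha[-1].sum()                          # P(O | model)

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
print(forward([0, 2, 1], pi, A, B))                 # observation probability
```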
HMMs: Decoding task
Given as input an HMM, and a sequence of observations, find the most probable sequence of states.
HMMs: Learning task
Given an observation sequence O and the set of possible states in the HMM, learn the HMM parameters.