Hidden Markov Models Flashcards
What are the two sequences in a Hidden Markov Model (HMM)?
State sequence (hidden), also called path
Symbol sequence (observed)
What are the two matrices required to represent a Hidden Markov Model?
The emission and transition matrices
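As a concrete illustration, here is a minimal sketch of these two matrices in Python for a toy two-state DNA model; the state names, symbols, and probabilities are illustrative assumptions, not taken from the flashcards.

```python
import numpy as np

# Toy two-state DNA model ("AT-rich" vs "GC-rich"); names and numbers are illustrative only.
states = ["AT-rich", "GC-rich"]
symbols = ["A", "C", "G", "T"]

# Transition matrix: T[k, l] = P(next state = l | current state = k)
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Emission matrix: E[k, b] = P(emit symbol b | state = k)
E = np.array([[0.35, 0.15, 0.15, 0.35],   # AT-rich favours A and T
              [0.15, 0.35, 0.35, 0.15]])  # GC-rich favours C and G

# Every row of both matrices is a probability distribution.
assert np.allclose(T.sum(axis=1), 1.0) and np.allclose(E.sum(axis=1), 1.0)
```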
What are the advantages of pair HMMs over the Needleman-Wunsch algorithm?
Pair HMM transition and emission probabilities can be trained from data, whereas Needleman-Wunsch substitution scores are fixed.
What are the five states in a pair HMM?
Start
M (Match or Mismatch)
X (nucleotide in sequence X aligned against gap in Y)
Y (nucleotide in sequence Y aligned against gap in sequence X)
End
How would you calculate the probability of a sequence s, given a Hidden Markov Model?
Using the forward algorithm. The probability of observing s is the sum of the last column in the forward table, which corresponds to the sum of the probabilities of all possible paths that can generate s.
Give the formula for an element in the Forward table F_l(i), where l is the state and i is the position in the sequence s
F_l(i) = E_l(s_i) sum_k F_k(i-1) T_{kl}
Where sum_k signifies summing over all states in the previous column.
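A minimal sketch of the forward fill in Python; the toy two-state model and the initial state distribution `start` are assumptions for illustration (the flashcards only mention the transition and emission matrices).

```python
import numpy as np

def forward(seq, T, E, start):
    """Fill the forward table F[l, i] = P(s_1..s_i, state at i = l).
    `start` is an assumed initial state distribution."""
    n_states, n = T.shape[0], len(seq)
    F = np.zeros((n_states, n))
    F[:, 0] = start * E[:, seq[0]]                  # first column
    for i in range(1, n):
        for l in range(n_states):
            # F_l(i) = E_l(s_i) * sum_k F_k(i-1) * T_{kl}
            F[l, i] = E[l, seq[i]] * np.sum(F[:, i - 1] * T[:, l])
    return F

# Toy two-state model; the sequence is given as symbol indices (A=0, C=1, G=2, T=3).
T = np.array([[0.9, 0.1], [0.2, 0.8]])
E = np.array([[0.35, 0.15, 0.15, 0.35], [0.15, 0.35, 0.35, 0.15]])
F = forward([0, 2, 2, 1, 3], T, E, start=np.array([0.5, 0.5]))
print(F[:, -1].sum())   # P(s): the sum of the last column
```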
What algorithms can be used to compute the probability of state k for position i given sequence s?
The Forward and Backward algorithms can be used for this type of posterior decoding.
Give the formula for an element in the Backward table B_l(i), where l is the state and i is the position in the sequence s
B_l(i) = sum_k E_k(s_{i+1}) T_{lk}B_k(i+1)
Where sum_k signifies summing over all states in the next column (position i+1).
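A matching sketch of the backward fill, under the same toy-model assumptions as above; it also assumes the path may finish in any state (no explicit End state), so the last column is initialised to 1.

```python
import numpy as np

def backward(seq, T, E):
    """Fill the backward table B[l, i] = P(s_{i+1}..s_n | state at i = l).
    Assumes no explicit End state, so the last column is all ones."""
    n_states, n = T.shape[0], len(seq)
    B = np.ones((n_states, n))                      # B_l(n) = 1 for every state l
    for i in range(n - 2, -1, -1):
        for l in range(n_states):
            # B_l(i) = sum_k E_k(s_{i+1}) * T_{lk} * B_k(i+1)
            B[l, i] = np.sum(E[:, seq[i + 1]] * T[l, :] * B[:, i + 1])
    return B
```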
Give the formula for the probability of state k at position i given sequence s
P(\pi_i = k | s) = F_k(i) B_k(i) / P(s)
where F is the forward table
B is the backward table
and P(s) is the probability of s given the HMM
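A self-contained sketch of posterior decoding that fills both tables with vectorised versions of the recurrences above and combines them; the toy model and the initial state distribution `start` are illustrative assumptions.

```python
import numpy as np

def posterior(seq, T, E, start):
    """Posterior state probabilities P(pi_i = k | s) for every state k and position i.
    `start` is an assumed initial state distribution."""
    n_states, n = T.shape[0], len(seq)
    F = np.zeros((n_states, n))
    B = np.ones((n_states, n))
    F[:, 0] = start * E[:, seq[0]]
    for i in range(1, n):                           # forward fill
        F[:, i] = E[:, seq[i]] * (F[:, i - 1] @ T)
    for i in range(n - 2, -1, -1):                  # backward fill
        B[:, i] = T @ (E[:, seq[i + 1]] * B[:, i + 1])
    p_s = F[:, -1].sum()                            # P(s) from the forward table
    return F * B / p_s                              # column i holds P(pi_i = k | s)

T = np.array([[0.9, 0.1], [0.2, 0.8]])
E = np.array([[0.35, 0.15, 0.15, 0.35], [0.15, 0.35, 0.35, 0.15]])
post = posterior([0, 2, 2, 1, 3], T, E, start=np.array([0.5, 0.5]))
print(post.sum(axis=0))   # each position's probabilities sum to ~1
```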
What are the differences between the Viterbi algorithm and posterior decoding using the forward and backward algorithms?
Posterior decoding gives the complete probability distribution of the state at each position. Viterbi only gives the most probable path.
Why is it possible, in principle, to predict protein function from the corresponding DNA sequence?
In principle,
- DNA sequence determines the amino acid sequence
- The amino acid sequence determines the protein folding pattern and 3D structure
- The 3D structure determines protein function
Explain the idea behind Profile HMM
Proteins with similar functions (families) share similar patterns in their amino acid sequences.
We can train an HMM to match a family.
It is then possible to test new protein sequences with this HMM: a high probability score from the forward algorithm signifies that the protein might belong to the family.
List the three types of states used in Profile HMMs
Match states
Insertion states
Deletion states
What is the purpose of the Viterbi algorithm?
To find the most probable state path through a given HMM for an observed sequence.
Give the formula for an element in the Viterbi table V_l(i), where l is the state and i is the position in the sequence s
V_l(i) = E_l(s_i) max_k V_k(i-1) T_{kl}
Where max_k signifies taking the maximum over all states in the previous column.
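A minimal Viterbi sketch with traceback, under the same illustrative assumptions as the earlier sketches (toy two-state model, assumed initial state distribution `start`, no explicit End state).

```python
import numpy as np

def viterbi(seq, T, E, start):
    """Return the most probable state path for `seq` through the model.
    `start` is an assumed initial state distribution; no End state is modelled."""
    n_states, n = T.shape[0], len(seq)
    V = np.zeros((n_states, n))
    ptr = np.zeros((n_states, n), dtype=int)        # back-pointers for the traceback
    V[:, 0] = start * E[:, seq[0]]
    for i in range(1, n):
        for l in range(n_states):
            # V_l(i) = E_l(s_i) * max_k V_k(i-1) * T_{kl}
            scores = V[:, i - 1] * T[:, l]
            ptr[l, i] = int(np.argmax(scores))
            V[l, i] = E[l, seq[i]] * scores[ptr[l, i]]
    path = [int(np.argmax(V[:, -1]))]               # most probable final state
    for i in range(n - 1, 0, -1):                   # follow the back-pointers
        path.append(int(ptr[path[-1], i]))
    return list(reversed(path))

T = np.array([[0.9, 0.1], [0.2, 0.8]])
E = np.array([[0.35, 0.15, 0.15, 0.35], [0.15, 0.35, 0.35, 0.15]])
print(viterbi([0, 2, 2, 1, 3], T, E, start=np.array([0.5, 0.5])))
```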