Hidden Markov Models Flashcards
What types of probability are involved in creating an HMM?
- Conditional probability
- Joint probability
- Marginal probability
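
As a hedged illustration of how the three kinds of probability show up in an HMM, assume an observed sequence x = x_1 ... x_L and a hidden state path pi = pi_1 ... pi_L. The symbols a (transition probability), e (emission probability), and pi_0 (a start state) are notation chosen for this note, not taken from the cards.

```latex
\begin{align*}
\text{Joint:}       \quad & P(x, \pi) = \prod_{i=1}^{L} a_{\pi_{i-1}\pi_i}\, e_{\pi_i}(x_i) \\
\text{Marginal:}    \quad & P(x) = \sum_{\pi} P(x, \pi) \\
\text{Conditional:} \quad & P(\pi \mid x) = \frac{P(x, \pi)}{P(x)}
\end{align*}
```

Each factor a and e is itself a conditional probability: the next state given the current state, and the observed symbol given the state.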
What do we need to calculate the probability of observing a sequence?
- The model
- Model parameters (transition and emission probabilities)
- The hidden state at each step (in the coin-toss example, which coin was used for each toss)
How probable are the observations under a specified model?
Forward algorithm
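
A minimal sketch of the forward algorithm, assuming a two-coin model. The state names, the probabilities, and the function name `forward` are illustrative assumptions, not values from the cards. It sums over all hidden state paths without enumerating them.

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """Return P(obs | model), summed over all hidden state paths."""
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical model: a fair coin and a coin biased towards heads.
states = ("fair", "biased")
start_p = {"fair": 0.5, "biased": 0.5}
trans_p = {
    "fair": {"fair": 0.9, "biased": 0.1},
    "biased": {"fair": 0.1, "biased": 0.9},
}
emit_p = {
    "fair": {"H": 0.5, "T": 0.5},
    "biased": {"H": 0.8, "T": 0.2},
}

likelihood = forward("HHT", states, start_p, trans_p, emit_p)
print(likelihood)
```

For long sequences the raw products underflow, so practical implementations usually work in log space or rescale alpha at each step.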
What are the most probable hidden states of a model for the observations?
Viterbi algorithm (dynamic programming over all possible state paths; it returns the single most probable path)
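
A minimal sketch of Viterbi decoding, assuming the same kind of two-coin model. All names and numbers here are illustrative assumptions. It has the same shape as the forward recursion, but takes the maximum over predecessor states instead of the sum and keeps the winning path.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (probability, path) for the most probable hidden state path."""
    # best[s] = (probability of the best path ending in state s, that path)
    best = {s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}
    for symbol in obs[1:]:
        new_best = {}
        for s in states:
            # Pick the predecessor state that maximizes the path probability.
            prob, path = max(
                ((best[r][0] * trans_p[r][s], best[r][1]) for r in states),
                key=lambda item: item[0],
            )
            new_best[s] = (prob * emit_p[s][symbol], path + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])


# Hypothetical model: a fair coin and a coin biased towards heads.
states = ("fair", "biased")
start_p = {"fair": 0.5, "biased": 0.5}
trans_p = {
    "fair": {"fair": 0.9, "biased": 0.1},
    "biased": {"fair": 0.1, "biased": 0.9},
}
emit_p = {
    "fair": {"H": 0.5, "T": 0.5},
    "biased": {"H": 0.8, "T": 0.2},
}

prob, path = viterbi("HHTT", states, start_p, trans_p, emit_p)
print(path, prob)
```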
How can we learn the HMM parameters given a set of sequences?
- Training with the forward-backward algorithm
- Baum-Welch expectation maximization
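
A rough sketch of one Baum-Welch iteration on a single sequence, assuming unscaled probabilities and a small two-coin model. The function names, starting parameters, and toss sequence are illustrative assumptions. The E-step uses forward-backward to get expected state and transition usage; the M-step renormalizes those expectations into new parameters.

```python
def forward_backward(obs, states, start_p, trans_p, emit_p):
    """Return forward table, backward table, and P(obs | model)."""
    n = len(obs)
    alpha = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    for t in range(1, n):
        alpha.append({
            s: emit_p[s][obs[t]] * sum(alpha[t - 1][r] * trans_p[r][s] for r in states)
            for s in states
        })
    beta = [dict.fromkeys(states, 1.0) for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = {
            s: sum(trans_p[s][r] * emit_p[r][obs[t + 1]] * beta[t + 1][r] for r in states)
            for s in states
        }
    return alpha, beta, sum(alpha[-1].values())


def baum_welch_step(obs, states, symbols, start_p, trans_p, emit_p):
    """One EM update; returns new parameters and the likelihood before the update."""
    alpha, beta, lik = forward_backward(obs, states, start_p, trans_p, emit_p)
    n = len(obs)
    # gamma[t][s] = P(hidden state at step t is s | obs)
    gamma = [{s: alpha[t][s] * beta[t][s] / lik for s in states} for t in range(n)]
    new_start = dict(gamma[0])
    new_trans, new_emit = {}, {}
    for r in states:
        visits = sum(gamma[t][r] for t in range(n - 1))
        # Expected number of r -> s transitions, divided by expected visits to r.
        new_trans[r] = {
            s: sum(
                alpha[t][r] * trans_p[r][s] * emit_p[s][obs[t + 1]] * beta[t + 1][s]
                for t in range(n - 1)
            ) / lik / visits
            for s in states
        }
        total = sum(gamma[t][r] for t in range(n))
        new_emit[r] = {
            x: sum(gamma[t][r] for t in range(n) if obs[t] == x) / total
            for x in symbols
        }
    return new_start, new_trans, new_emit, lik


# Hypothetical starting guess and toss sequence.
states, symbols = ("fair", "biased"), ("H", "T")
start_p = {"fair": 0.5, "biased": 0.5}
trans_p = {"fair": {"fair": 0.8, "biased": 0.2}, "biased": {"fair": 0.2, "biased": 0.8}}
emit_p = {"fair": {"H": 0.5, "T": 0.5}, "biased": {"H": 0.7, "T": 0.3}}
obs = "HHHHTTHTHHHH"

new_start, new_trans, new_emit, lik_before = baum_welch_step(
    obs, states, symbols, start_p, trans_p, emit_p
)
```

Repeating the step until the likelihood stops improving gives a local optimum, not necessarily the global one. Real implementations also rescale or use log space to avoid underflow.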
Why are CpG dinucleotides underrepresented in the genome?
Because the cytosine in a CpG is often methylated, and methylated C readily mutates (deaminates) into T
Where is methylation suppressed?
Around promoters and start regions of genes. CpG dinucleotides are more frequent in these regions, which are called CpG islands.
How do we build an HMM for sequence profiles?
- Use an MSA to find conserved regions associated with signalling, structure, or activity
- Use the MSA to train a profile HMM
- Search for similar sequences that have a good fit to the HMM profile
How do we find similar sequences?
- Search proteome databases
- For an unknown sequence, compute the probability that it was generated by the sequence profile and compare it with a threshold to decide whether it belongs to the family
- If likelihood > threshold, add to the protein family and update/train the sequence profile with the new sequence
- Iterate
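
The search loop above can be sketched as follows. This is a deliberately simplified stand-in: it uses match states only (per-column emission probabilities with pseudocounts, no insert or delete states), so it is closer to a position weight matrix than to a full profile HMM. The function names, the DNA alphabet, the uniform background, and the threshold are all assumptions made for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def train_profile(aligned, pseudocount=1.0):
    """Per-column emission probabilities from equal-length, gap-free sequences."""
    profile = []
    for column in zip(*aligned):
        counts = Counter(column)
        total = len(column) + pseudocount * len(ALPHABET)
        profile.append({a: (counts[a] + pseudocount) / total for a in ALPHABET})
    return profile


def log_odds(seq, profile, background=0.25):
    """Log-likelihood ratio of the profile versus a uniform background."""
    return sum(math.log(col[ch] / background) for ch, col in zip(seq, profile))


def iterative_search(seed, database, threshold, max_rounds=10):
    """Add database hits above the threshold, retrain, and repeat until no new hits."""
    family = list(seed)
    for _ in range(max_rounds):
        profile = train_profile(family)
        hits = [
            s for s in database
            if s not in family and log_odds(s, profile) > threshold
        ]
        if not hits:
            break
        family.extend(hits)
    return family


# Hypothetical seed alignment and database.
seed = ["ACGT", "ACGT", "ACGA"]
database = ["ACGG", "TTTT", "ACGC"]
family = iterative_search(seed, database, threshold=1.0)
print(family)
```

Scoring against a background model rather than using the raw likelihood keeps the threshold comparable across sequences; a full profile HMM would score with the forward or Viterbi algorithm instead.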