Multiple Alignments Flashcards
Hidden Markov Models
An observed sequence is produced by a number of distinct hidden states; each state emits symbols with a different probability distribution.
Transition matrix T: probability of moving between states. Row is the current state, column is the next state.
T(1,2): probability of moving from state 1 to state 2
Emission matrix E: probability of each symbol being emitted while the model is in a given state. Row is the current state, column is the symbol.
E(1,2): probability of emitting symbol 2 while in state 1
Probability that the sequence follows a given state path: for each position, multiply by the entry of T for the transition from the current state to the next state.
Total probability of the sequence is (probability of each outcome)^k × (probability of staying in or changing state)^l,
k and l being how many times each outcome or transition occurred.
Also need to multiply by the initial probability: the probability of the starting state.
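As a concrete instance of the product above, a minimal sketch in Python; the two-state AT-rich/GC-rich model and all its numbers are made-up assumptions, not from the source:

```python
import numpy as np

# Hypothetical 2-state HMM over DNA symbols; all parameters are invented.
states = ["AT-rich", "GC-rich"]
symbols = {"A": 0, "C": 1, "G": 2, "T": 3}

pi = np.array([0.5, 0.5])                  # initial state probabilities
T = np.array([[0.9, 0.1],                  # T[i, j] = P(next = j | current = i)
              [0.2, 0.8]])
E = np.array([[0.35, 0.15, 0.15, 0.35],    # E[i, k] = P(symbol k | state i)
              [0.15, 0.35, 0.35, 0.15]])

def joint_probability(seq, path):
    """P(sequence, state path) = initial * product of emissions * product of transitions."""
    p = pi[path[0]] * E[path[0], symbols[seq[0]]]
    for t in range(1, len(seq)):
        p *= T[path[t - 1], path[t]] * E[path[t], symbols[seq[t]]]
    return p

print(joint_probability("ATGC", [0, 0, 1, 1]))
```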
Evaluation
given the HMM model and the sequence, evaluate the probability of the sequence
FORWARD ALGORITHM
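A sketch of the forward algorithm, summing over all state paths to get the total probability of the sequence; the two-state toy model and its parameters are assumptions for illustration:

```python
import numpy as np

# Toy 2-state HMM; all numbers are made up for illustration.
pi = np.array([0.5, 0.5])
T = np.array([[0.9, 0.1], [0.2, 0.8]])
E = np.array([[0.35, 0.15, 0.15, 0.35],
              [0.15, 0.35, 0.35, 0.15]])
idx = {"A": 0, "C": 1, "G": 2, "T": 3}

def forward(seq):
    f = pi * E[:, idx[seq[0]]]        # f[i] = P(prefix so far, current state = i)
    for sym in seq[1:]:
        f = (f @ T) * E[:, idx[sym]]  # propagate one position, summing over paths
    return f.sum()                    # P(sequence) = sum over possible end states

print(forward("ATGC"))
```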
Decoding
given the HMM model and the sequence, find the sequence of states that maximizes the probability
VITERBI ALGORITHM
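A log-space Viterbi sketch for the same kind of toy model (working in logs avoids numerical underflow on long sequences); the parameters are assumptions, not from the source:

```python
import numpy as np

# Toy 2-state HMM; parameters are invented for illustration.
pi = np.array([0.5, 0.5])
T = np.array([[0.9, 0.1], [0.2, 0.8]])
E = np.array([[0.35, 0.15, 0.15, 0.35],
              [0.15, 0.35, 0.35, 0.15]])
idx = {"A": 0, "C": 1, "G": 2, "T": 3}

def viterbi(seq):
    v = np.log(pi) + np.log(E[:, idx[seq[0]]])   # best log-prob ending in each state
    back = []
    for sym in seq[1:]:
        scores = v[:, None] + np.log(T)          # scores[i, j]: best path via i -> j
        back.append(scores.argmax(axis=0))       # remember the best predecessor
        v = scores.max(axis=0) + np.log(E[:, idx[sym]])
    path = [int(v.argmax())]                     # backtrack from the best end state
    for ptr in reversed(back):
        path.append(int(ptr[path[-1]]))
    return path[::-1]

print(viterbi("AATTGGCC"))
```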
Learning
given an HMM model structure and a sequence,
but without the transition and emission matrices,
find the parameters of those matrices that maximize the probability of the sequence
EXPECTATION MAXIMISATION ALGORITHM (EM)
HMM Sequence Analysis
Segment a sequence into a set of states that show different patterns. Produce a model that generates these segments.
Different states can be spotted on density plots (changes in nucleotide composition, e.g. AT vs GC content)
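The density-plot idea can be sketched as a sliding-window GC fraction; the window size and the example sequence are arbitrary choices:

```python
# GC fraction in consecutive windows: jumps in the curve suggest segments
# generated by different states (AT-rich vs GC-rich). Window size is arbitrary.
def gc_windows(seq, size=10):
    return [sum(base in "GC" for base in seq[i:i + size]) / size
            for i in range(0, len(seq) - size + 1, size)]

print(gc_windows("ATATATATAT" + "GCGCGCGCGC"))   # one AT-rich, one GC-rich window
```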
DNA segmentation
Can find genes: better than sequence alignment. Can detect introns and exons. However, it is more computationally intensive.
Example
Different distribution patterns can show that some parts of a protein lie in a different layer
Probability of outcome u when in state s:
P = P(no transition) × P(u in s) + P(transition) × P(u in the new state)
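A worked instance of this mixture formula, with made-up numbers:

```python
# All four probabilities below are invented for illustration.
p_stay, p_switch = 0.9, 0.1          # P(no transition), P(transition)
p_u_current, p_u_new = 0.35, 0.15    # P(u | current state), P(u | new state)

# P(u) = P(stay) * P(u | current) + P(switch) * P(u | new)
p_u = p_stay * p_u_current + p_switch * p_u_new   # 0.9*0.35 + 0.1*0.15 = 0.33
```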
Genes with HMM
sub-HMMs for different parts of the sequence: introns/exons, non-coding DNA, etc.
use biological knowledge
VEIL
Viterbi Exon Intron Locator
HMM with upstream/downstream regions, start/stop codons, and exon states linked to splice sites and introns
Submodels
Three columns, one per nucleotide position in a codon, with transitions between them that lead to further codons or to stop codons
Profile Hidden Markov Models
pHMM
MULTIPLE ALIGNMENT
model the alignment of a set of sequences using an HMM
diamonds for insertion states
squares for match/mismatch states
circles for deletion states
arrows link the states, plus a self-loop arrow on each insertion state
Match columns are those where more than half of the sequences have a letter.
Insertion: your sequence has a letter in a column where more than half have '-'.
Deletion: in a match column, the other sequences have a letter but yours has '-'.
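The column rules above can be sketched as a simple classifier over alignment columns; the toy alignment is invented:

```python
# Classify each column of a multiple alignment: a column is a match column
# when more than half of the sequences have a letter there; otherwise the
# letters in it are insertions. (A '-' inside a match column is a deletion.)
def column_types(alignment):
    n = len(alignment)
    types = []
    for col in zip(*alignment):                 # iterate over columns
        letters = sum(c != "-" for c in col)    # sequences with a letter here
        types.append("match" if letters > n / 2 else "insert")
    return types

alignment = ["ACG-T",
             "AC-AT",
             "A-G-T"]
print(column_types(alignment))   # ['match', 'match', 'match', 'insert', 'match']
```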
Compute transition probabilities by running each given sequence through the pHMM and counting how often each transition out of a state was used relative to the alternatives:
P(M1->M2), P(M1->I2), P(M1->D2), with LAPLACE ESTIMATION to avoid zeros
Laplace Estimation
P(outcome) = (number of times the outcome happened + 1) / (number of samples + number of possible outcomes)
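A one-line sketch of this estimator; the counts in the usage example are invented:

```python
# Laplace (add-one) estimate: avoids zero probabilities for transitions
# that were never observed in the training alignment.
def laplace(count, n_samples, n_outcomes):
    return (count + 1) / (n_samples + n_outcomes)

# e.g. M1 -> M2 seen 4 times across 5 sequences, 3 possible moves (M/I/D):
print(laplace(4, 5, 3))   # (4 + 1) / (5 + 3) = 0.625
```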
Expectation Maximization Algorithm
For learning the Transition and Emission matrix of a HMM
initialize the model h: initial guesses for E and T
given the sequence s and h, estimate E and T by counting symbols and transitions
then re-estimate E and T from the state path found with the Viterbi algorithm (as in VEIL)
Repeat until a stopping criterion is met
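The loop above can be sketched as Viterbi training, a hard-assignment variant of EM: decode the best path with the current parameters, then re-count transitions and emissions along it. The two-state model, training sequence, and iteration count are all assumptions:

```python
import numpy as np

idx = {"A": 0, "C": 1, "G": 2, "T": 3}

def viterbi_path(seq, pi, T, E):
    """Most probable state path under the current parameters (log space)."""
    v = np.log(pi) + np.log(E[:, idx[seq[0]]])
    back = []
    for sym in seq[1:]:
        scores = v[:, None] + np.log(T)
        back.append(scores.argmax(axis=0))
        v = scores.max(axis=0) + np.log(E[:, idx[sym]])
    path = [int(v.argmax())]
    for ptr in reversed(back):
        path.append(int(ptr[path[-1]]))
    return path[::-1]

def viterbi_train(seq, pi, T, E, iterations=10):
    for _ in range(iterations):
        path = viterbi_path(seq, pi, T, E)
        # Re-estimate with add-one (Laplace) counts so no entry hits zero.
        t_counts = np.ones_like(T)
        e_counts = np.ones_like(E)
        for a, b in zip(path, path[1:]):
            t_counts[a, b] += 1
        for state, sym in zip(path, seq):
            e_counts[state, idx[sym]] += 1
        T = t_counts / t_counts.sum(axis=1, keepdims=True)
        E = e_counts / e_counts.sum(axis=1, keepdims=True)
    return T, E

# Made-up starting guesses and training sequence.
pi = np.array([0.5, 0.5])
T0 = np.array([[0.9, 0.1], [0.2, 0.8]])
E0 = np.array([[0.4, 0.1, 0.1, 0.4], [0.1, 0.4, 0.4, 0.1]])
T1, E1 = viterbi_train("ATATATGCGCGC", pi, T0, E0)
```

Full EM (Baum-Welch) would instead weight the counts by posterior probabilities from the forward and backward passes; Viterbi training is the simpler hard-count version hinted at by the VEIL reference.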