Multiple Alignments Flashcards
Hidden Markov Models
An observed sequence is produced by a number of distinct hidden states; each state emits symbols with a different probability distribution.
Transition matrix T: probability of moving between states. Row is the current state, column is the next state.
T(1,2): probability of moving from state 1 to state 2
Emission matrix E: probability of each symbol being emitted while the model is in a given state. Row is the current state, column is the symbol.
E(1,2): probability of emitting symbol 2 while in state 1
Probability that the sequence follows a given state path: for each position, multiply by the entry of T for the transition from the current state to the next state.
Total probability of the sequence is (probability of each outcome)^k × (probability of staying in or changing state)^l,
k and l being how many times each outcome or transition occurred.
Also need to multiply by the initial probability: the probability of the starting state.
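As a concrete instance of the product above, a minimal sketch in Python; the two-state AT-rich/GC-rich model and all its numbers are made-up assumptions, not from the source:

```python
import numpy as np

# Hypothetical 2-state HMM over DNA symbols; all parameters are invented.
states = ["AT-rich", "GC-rich"]
symbols = {"A": 0, "C": 1, "G": 2, "T": 3}

pi = np.array([0.5, 0.5])                  # initial state probabilities
T = np.array([[0.9, 0.1],                  # T[i, j] = P(next = j | current = i)
              [0.2, 0.8]])
E = np.array([[0.35, 0.15, 0.15, 0.35],    # E[i, k] = P(symbol k | state i)
              [0.15, 0.35, 0.35, 0.15]])

def joint_probability(seq, path):
    """P(sequence, state path) = initial * product of emissions * product of transitions."""
    p = pi[path[0]] * E[path[0], symbols[seq[0]]]
    for t in range(1, len(seq)):
        p *= T[path[t - 1], path[t]] * E[path[t], symbols[seq[t]]]
    return p

print(joint_probability("ATGC", [0, 0, 1, 1]))
```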
Evaluation
given the HMM model and the sequence, evaluate the probability of the sequence
FORWARD ALGORITHM
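A sketch of the forward algorithm, summing over all state paths to get the total probability of the sequence; the two-state toy model and its parameters are assumptions for illustration:

```python
import numpy as np

# Toy 2-state HMM; all numbers are made up for illustration.
pi = np.array([0.5, 0.5])
T = np.array([[0.9, 0.1], [0.2, 0.8]])
E = np.array([[0.35, 0.15, 0.15, 0.35],
              [0.15, 0.35, 0.35, 0.15]])
idx = {"A": 0, "C": 1, "G": 2, "T": 3}

def forward(seq):
    f = pi * E[:, idx[seq[0]]]        # f[i] = P(prefix so far, current state = i)
    for sym in seq[1:]:
        f = (f @ T) * E[:, idx[sym]]  # propagate one position, summing over paths
    return f.sum()                    # P(sequence) = sum over possible end states

print(forward("ATGC"))
```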
Decoding
given the HMM model and the sequence, find the sequence of states that maximizes the probability
VITERBI ALGORITHM
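A log-space Viterbi sketch for the same kind of toy model (working in logs avoids numerical underflow on long sequences); the parameters are assumptions, not from the source:

```python
import numpy as np

# Toy 2-state HMM; parameters are invented for illustration.
pi = np.array([0.5, 0.5])
T = np.array([[0.9, 0.1], [0.2, 0.8]])
E = np.array([[0.35, 0.15, 0.15, 0.35],
              [0.15, 0.35, 0.35, 0.15]])
idx = {"A": 0, "C": 1, "G": 2, "T": 3}

def viterbi(seq):
    v = np.log(pi) + np.log(E[:, idx[seq[0]]])   # best log-prob ending in each state
    back = []
    for sym in seq[1:]:
        scores = v[:, None] + np.log(T)          # scores[i, j]: best path via i -> j
        back.append(scores.argmax(axis=0))       # remember the best predecessor
        v = scores.max(axis=0) + np.log(E[:, idx[sym]])
    path = [int(v.argmax())]                     # backtrack from the best end state
    for ptr in reversed(back):
        path.append(int(ptr[path[-1]]))
    return path[::-1]

print(viterbi("AATTGGCC"))
```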
Learning
given an HMM model structure and a sequence,
but without the transition and emission matrices,
find the parameters of those matrices that maximize the probability of the sequence
EXPECTATION MAXIMISATION ALGORITHM (EM)
HMM Sequence Analysis
Segment a sequence into a set of states that show different patterns. Produce a model that generates these segments.
Different states can be spotted on density plots (changes in nucleotide composition, e.g. AT vs GC content)
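The density-plot idea can be sketched as a sliding-window GC fraction; the window size and the example sequence are arbitrary choices:

```python
# GC fraction in consecutive windows: jumps in the curve suggest segments
# generated by different states (AT-rich vs GC-rich). Window size is arbitrary.
def gc_windows(seq, size=10):
    return [sum(base in "GC" for base in seq[i:i + size]) / size
            for i in range(0, len(seq) - size + 1, size)]

print(gc_windows("ATATATATAT" + "GCGCGCGCGC"))   # one AT-rich, one GC-rich window
```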
DNA segmentation
Can find genes: better than sequence alignment. Can detect introns and exons. However, it is more computationally intensive.
Example
Different distribution patterns can show that some parts of a protein lie in a different layer
Probability of outcome u when in state s:
P = P(no transition) × P(u in s) + P(transition) × P(u in the new state)
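A worked instance of this mixture formula, with made-up numbers:

```python
# All four probabilities below are invented for illustration.
p_stay, p_switch = 0.9, 0.1          # P(no transition), P(transition)
p_u_current, p_u_new = 0.35, 0.15    # P(u | current state), P(u | new state)

# P(u) = P(stay) * P(u | current) + P(switch) * P(u | new)
p_u = p_stay * p_u_current + p_switch * p_u_new   # 0.9*0.35 + 0.1*0.15 = 0.33
```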
Genes with HMM
sub-HMMs for different parts of the sequence: introns/exons, non-coding DNA, etc.
use biological knowledge
VEIL
Viterbi Exon Intron Locator
HMM with upstream/downstream regions, start/stop codons, and exon states linked to splice sites and introns
Submodels
Three columns, one per nucleotide position in a codon, with transitions between them that lead to further codons or to stop codons
Profile Hidden Markov Models
pHMM
MULTIPLE ALIGNMENT
model the alignment of a set of sequences using an HMM
diamonds for insertion states
squares for match/mismatch states
circles for deletion states
arrows link the states, plus a self-loop arrow on each insertion state
Match columns are those where more than half of the sequences have a letter.
Insertion: your sequence has a letter in a column where more than half have '-'.
Deletion: in a match column, the other sequences have a letter but yours has '-'.
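The column rules above can be sketched as a simple classifier over alignment columns; the toy alignment is invented:

```python
# Classify each column of a multiple alignment: a column is a match column
# when more than half of the sequences have a letter there; otherwise the
# letters in it are insertions. (A '-' inside a match column is a deletion.)
def column_types(alignment):
    n = len(alignment)
    types = []
    for col in zip(*alignment):                 # iterate over columns
        letters = sum(c != "-" for c in col)    # sequences with a letter here
        types.append("match" if letters > n / 2 else "insert")
    return types

alignment = ["ACG-T",
             "AC-AT",
             "A-G-T"]
print(column_types(alignment))   # ['match', 'match', 'match', 'insert', 'match']
```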
Compute transition probabilities by running each given sequence through the pHMM and counting how often each transition out of a state was used relative to the alternatives:
P(M1->M2), P(M1->I2), P(M1->D2), with LAPLACE ESTIMATION to avoid zeros
Laplace Estimation
P(outcome) = (number of times the outcome happened + 1) / (number of samples + number of possible outcomes)
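A one-line sketch of this estimator; the counts in the usage example are invented:

```python
# Laplace (add-one) estimate: avoids zero probabilities for transitions
# that were never observed in the training alignment.
def laplace(count, n_samples, n_outcomes):
    return (count + 1) / (n_samples + n_outcomes)

# e.g. M1 -> M2 seen 4 times across 5 sequences, 3 possible moves (M/I/D):
print(laplace(4, 5, 3))   # (4 + 1) / (5 + 3) = 0.625
```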
Expectation Maximization Algorithm
For learning the Transition and Emission matrix of a HMM
initialize the model h: initial guesses for E and T
given the sequence s and h, estimate E and T by counting symbols and transitions
then re-estimate E and T from the state path found with the Viterbi algorithm (as in VEIL)
Repeat until a stopping criterion is met
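The loop above can be sketched as Viterbi training, a hard-assignment variant of EM: decode the best path with the current parameters, then re-count transitions and emissions along it. The two-state model, training sequence, and iteration count are all assumptions:

```python
import numpy as np

idx = {"A": 0, "C": 1, "G": 2, "T": 3}

def viterbi_path(seq, pi, T, E):
    """Most probable state path under the current parameters (log space)."""
    v = np.log(pi) + np.log(E[:, idx[seq[0]]])
    back = []
    for sym in seq[1:]:
        scores = v[:, None] + np.log(T)
        back.append(scores.argmax(axis=0))
        v = scores.max(axis=0) + np.log(E[:, idx[sym]])
    path = [int(v.argmax())]
    for ptr in reversed(back):
        path.append(int(ptr[path[-1]]))
    return path[::-1]

def viterbi_train(seq, pi, T, E, iterations=10):
    for _ in range(iterations):
        path = viterbi_path(seq, pi, T, E)
        # Re-estimate with add-one (Laplace) counts so no entry hits zero.
        t_counts = np.ones_like(T)
        e_counts = np.ones_like(E)
        for a, b in zip(path, path[1:]):
            t_counts[a, b] += 1
        for state, sym in zip(path, seq):
            e_counts[state, idx[sym]] += 1
        T = t_counts / t_counts.sum(axis=1, keepdims=True)
        E = e_counts / e_counts.sum(axis=1, keepdims=True)
    return T, E

# Made-up starting guesses and training sequence.
pi = np.array([0.5, 0.5])
T0 = np.array([[0.9, 0.1], [0.2, 0.8]])
E0 = np.array([[0.4, 0.1, 0.1, 0.4], [0.1, 0.4, 0.4, 0.1]])
T1, E1 = viterbi_train("ATATATGCGCGC", pi, T0, E0)
```

Full EM (Baum-Welch) would instead weight the counts by posterior probabilities from the forward and backward passes; Viterbi training is the simpler hard-count version hinted at by the VEIL reference.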