7. pw, DP Flashcards
What is the difference between similarity/identity and homology?
homology: all-or-nothing condition (homologous or not homologous)
similarity / identity: quantitative measure, e.g. 20%
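To make "quantitative measure" concrete, here is a minimal sketch of percent identity for two already-aligned sequences. The function name `percent_identity` and the choice to ignore gap columns in the denominator are assumptions for illustration; other definitions divide by the full alignment length or by the shorter sequence.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of gap-free alignment columns whose characters are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue (no gap).
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


print(percent_identity("ACGT", "ACGA"))  # 3 of 4 columns match
```

A value like this says nothing by itself about homology; homology remains a yes/no inference drawn from such evidence.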
Can homology be observed?
cannot be observed or known, just inferred
Comparative sequence analysis: starting with seq A and seq B, what kind of analysis can we do?
similarity / homology?
compute (optimal) alignment
What does syntenic mean?
(of genes) occurring on the same chromosome.
What is a dotplot?
What signals do they give?
In bioinformatics, a dot plot is a graphical method for comparing two biological sequences and identifying regions of close similarity.
It is a type of recurrence plot.
signals:
- identity, similarity
- length of consecutive signals (diagonal runs)
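A dot plot can be sketched in a few lines: place one sequence along the rows, the other along the columns, and mark every cell where the characters (or short windows) match. The function name `dot_plot` and the `window` parameter are illustrative assumptions, not part of the original notes.

```python
from typing import List


def dot_plot(seq_a: str, seq_b: str, window: int = 1) -> List[str]:
    """Return text rows; '*' marks positions where the windows are identical."""
    rows = []
    for i in range(len(seq_a) - window + 1):
        row = ""
        for j in range(len(seq_b) - window + 1):
            same = seq_a[i:i + window] == seq_b[j:j + window]
            row += "*" if same else "."
        rows.append(row)
    return rows


for line in dot_plot("GATTACA", "GATCACA"):
    print(line)
```

Long diagonal runs of `*` correspond to the "length of consecutive signals"; a larger `window` suppresses isolated chance matches.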
Define pairwise sequence alignment (genes)
the comparison & arranging of two sequences by
* searching for pairwise matches and “good mismatches” between their characters
* possibly inserting gaps in each sequence
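The definition above can be sketched as a global alignment computed by dynamic programming (the Needleman-Wunsch scheme). The scores used here (match +1, mismatch -1, gap -2) and the name `global_align` are assumptions chosen for illustration; a real analysis would use a substitution matrix and a tuned gap penalty.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for an optimal global alignment."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for pairing a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),  # pair characters
                              score[i - 1][j] + gap,            # gap in b
                              score[i][j - 1] + gap)            # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(global_align("ACGT", "AGT"))
```

When several alignments share the optimal score, this traceback returns just one of them, preferring character pairs over gaps.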
What do we need to consider when obtaining a scoring matrix?
- observe trusted alignments of related proteins
- which residues are paired? (i.e., which substitutions have occurred?)
- different values for sequences of different evolutionary divergence!
- different scoring matrices for further diverged sequences!
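One common way to turn "which residues are paired?" into scores is a log-odds ratio: how often a pair is observed in trusted alignments versus how often it would be expected by chance. The sketch below assumes that approach; the function name `log_odds_scores` and the two-letter toy alphabet are hypothetical, and the counts are made up purely to show the arithmetic.

```python
import math
from collections import Counter


def log_odds_scores(aligned_pairs):
    """Map each unordered residue pair to log2(observed / expected frequency)."""
    pair_counts = Counter()
    residue_counts = Counter()
    for a, b in aligned_pairs:
        pair_counts[tuple(sorted((a, b)))] += 1  # treat (a, b) and (b, a) alike
        residue_counts[a] += 1
        residue_counts[b] += 1
    total_pairs = sum(pair_counts.values())
    total_residues = sum(residue_counts.values())

    scores = {}
    for (a, b), count in pair_counts.items():
        observed = count / total_pairs
        p_a = residue_counts[a] / total_residues
        p_b = residue_counts[b] / total_residues
        # An unordered mixed pair can arise in two orders, hence the factor 2.
        expected = p_a * p_b if a == b else 2 * p_a * p_b
        scores[(a, b)] = math.log2(observed / expected)
    return scores


# Toy data: columns taken from hypothetical trusted alignments.
toy_pairs = [("A", "A")] * 4 + [("B", "B")] * 4 + [("A", "B")] * 2
print(log_odds_scores(toy_pairs))
```

Pairs seen more often than chance get positive scores, rarer ones negative; counting from more diverged alignments yields different values, which is why different matrices exist.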
Name two approaches for amino acid scoring matrices
What are their origins?
PAM (compiled by Margaret Dayhoff and her colleagues in the 1970s - very little data)
BLOSUM (Steven and Jorja Henikoff in 1992)
PAM matrices
what are they based on?
what does PAM1 imply?
Point Accepted Mutation
- based on observed amino acid substitutions in families of evolutionarily related proteins
- PAM1 implies 1 substitution per 100 amino acids, accepted by the processes of natural selection
PAM matrices
how do we get PAM250?
extrapolation of values for more distantly related proteins:
PAM250 = (PAM1)^250
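The extrapolation is a matrix power: the PAM1 mutation probability matrix multiplied by itself 250 times. The sketch below uses a made-up 2-state matrix (`TOY_PAM1`, hypothetical values) in place of the real 20 x 20 amino acid matrix, just to show the mechanics in plain Python.

```python
def mat_mul(x, y):
    """Multiply two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(m, exponent):
    """Raise a square matrix to a non-negative integer power."""
    n = len(m)
    # Start from the identity matrix and multiply by m, exponent times.
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(exponent):
        result = mat_mul(result, m)
    return result


# Hypothetical 2-residue "PAM1": each residue changes with probability 1%.
TOY_PAM1 = [[0.99, 0.01],
            [0.01, 0.99]]

toy_pam250 = mat_power(TOY_PAM1, 250)
for row in toy_pam250:
    print([round(p, 4) for p in row])
```

Each row still sums to 1, but after 250 steps the probabilities are close to evenly spread, which reflects how little signal remains at large distances. The published PAM250 scoring matrix is then obtained by converting such probabilities to log-odds scores.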
PAM matrices
What are the guidelines for which PAM matrix to choose?
PAM250 for proteins of 20% identity
PAM120 for proteins of 40% identity
PAM60 for proteins of 60% identity
BLOSUM matrices
what does BLOSUM stand for
What is it based on
BLOcks amino acid SUbstitution Matrices
based on local alignments of divergent sequences
BLOSUM matrices
How do we get different BLOSUM matrices?
eg BLOSUM50?
different BLOSUM matrices are not extrapolated but based on observed alignments
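e.g. the BLOSUM50 matrix is derived from alignments in which sequences more than 50% identical are clustered together, so the counted substitutions come from sequences that are at most about 50% identical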
You want to compare two sequences that you believe may be distantly related.
How would you choose a BLOSUM matrix? a PAM matrix?
Choose a BLOSUM with a lower number
Choose a PAM with a higher number
(Maybe start with BLOSUM62 and then adjust)
BLOSUM matrices
What are the guidelines for how to choose a BLOSUM matrix?
eg when would you choose BLOSUM50?
guideline: a BLOSUM matrix index should approximately match the percent identity of the sequences to be aligned
-> the BLOSUM50 matrix is best used for sequences that are about 50% identical
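The guideline can be sketched as a nearest-index lookup. The set of available matrices (`AVAILABLE_BLOSUM`) and the function name `pick_blosum` are assumptions for illustration; which matrices exist depends on the tool being used.

```python
# Assumed set of BLOSUM indices offered by the alignment tool.
AVAILABLE_BLOSUM = (45, 50, 62, 80, 90)


def pick_blosum(percent_identity: float) -> str:
    """Pick the matrix whose index is closest to the expected percent identity."""
    index = min(AVAILABLE_BLOSUM, key=lambda n: abs(n - percent_identity))
    return f"BLOSUM{index}"


print(pick_blosum(50))
print(pick_blosum(30))
```

This is only a rule of thumb: the expected identity is usually a rough guess, so trying a neighbouring matrix is reasonable.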