Sequence Alignment Flashcards
what is sequence alignment?
bioinformatic method to arrange sequences in order to identify regions of similarity and evolutionary relationships
what is a query sequence
unknown uncharacterized sequence that we want to align to a known sequence or database
what is a reference sequence
the known sequence the query is aligned to
why do we align sequences
to search for close matching sequences
to assign function to genes, proteins, genomes (annotation)
to infer evolutionary relationships
to determine the residue-residue correspondences
what are the two alignment types
global and local
what is global alignment
sequences are aligned over their entire length, from beginning to end
what is local alignment
only local regions with highest level of similarity are aligned
what are homologous sequences
evolved from a common ancestor, have similar 3D structure and function
what is the threshold for homologous sequences
more than 70% identity
what is pairwise alignment
aligns two sequences at a time, gaps introduced to find the best match
what is dotplot alignment
a graphical representation of the sequence similarity between two sequences, rows correspond to residues of one sequence, columns of the other
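A minimal text-only sketch of a dotplot, assuming the simplest rule (a dot is drawn only where two residues are identical, with no window or threshold filtering). The function name `dotplot` and the example sequences are made up for illustration.

```python
def dotplot(seq_rows: str, seq_cols: str) -> list[str]:
    """Return one text line per residue of seq_rows.

    '*' marks a position where the row residue equals the column residue,
    '.' marks a mismatch. Diagonal runs of '*' indicate similar regions.
    """
    return [
        "".join("*" if r == c else "." for c in seq_cols)
        for r in seq_rows
    ]


# Rows are residues of the first sequence, columns residues of the second.
for line in dotplot("GATTACA", "GATCACA"):
    print(line)
```

Real dotplot tools usually compare short windows rather than single residues to reduce noise.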
what are some measures of sequence similarity
Hamming distance and Levenshtein distance
what is hamming distance
for two strings of equal length, the number of positions at which the characters differ; insertions and deletions are not allowed
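The definition above can be sketched in a few lines. The function name `hamming_distance` and the example strings are illustrative only.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


# Positions 3 (T vs C) and 6 (C vs T) differ.
print(hamming_distance("GATTACA", "GACTATA"))  # 2
```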
what is levenshtein distance
minimum number of edit operations required to change one string into another, including insertions, deletions and substitutions
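A sketch of one common way to compute this, using dynamic programming over two rows, assuming every insertion, deletion and substitution costs 1. The function name `levenshtein_distance` is illustrative.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning i characters into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or free match
            ))
        previous = current
    return previous[-1]


print(levenshtein_distance("GATTACA", "GATACA"))  # 1 (one deletion)
```

Unlike the Hamming distance, this works for strings of different lengths.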
what controls the cost of insertions and deletions when computing alignments
gap penalties
what are the two parameters used for gap penalties
gap opening and gap extension
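A sketch of how the two parameters combine into one penalty for a gap of a given length. It assumes the convention that the opening cost covers the first gap position and each further position adds the extension cost; some tools instead charge the extension for every position. The default values 10 and 0.5 are illustrative, not taken from the cards.

```python
def affine_gap_penalty(length: int,
                       gap_open: float = 10.0,
                       gap_extend: float = 0.5) -> float:
    """Penalty for one contiguous gap of `length` positions.

    Assumed convention: gap_open pays for the first position,
    gap_extend for each additional one.
    """
    if length <= 0:
        return 0.0
    return gap_open + gap_extend * (length - 1)


# One long gap is cheaper than several short ones of the same total length.
print(affine_gap_penalty(3))      # 11.0
print(3 * affine_gap_penalty(1))  # 30.0
```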
what scores does a simple scoring system give to a match, mismatch and gap penalty
match = +1
mismatch = 0
gap penalty = -1
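To see the simple scoring system in use, here is a sketch that computes the best global alignment score with match = +1, mismatch = 0 and gap = -1. It uses the standard Needleman-Wunsch dynamic programming recurrence, which the cards do not name, and a single per-position gap penalty rather than separate opening and extension costs. The function name is illustrative.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = 0,
                           gap: int = -1) -> int:
    """Best global alignment score under a simple scoring system."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]


# ACGT vs A-GT: three matches and one gap.
print(global_alignment_score("ACGT", "AGT"))  # 2
```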
what does a simple scoring system not take into account
the influence of molecular evolution
the probability of replacing an amino acid with another, similar one
transversions (purine-pyrimidine) are less frequent than transitions (purine-purine and pyrimidine-pyrimidine)
what is a transition-transversion matrix used for
aligning nucleotide sequences while accounting for the higher probability of transitions than transversions
what is a transition
a purine replaced by a purine or a pyrimidine by a pyrimidine (A <-> G, C <-> T)
what is a transversion
a purine replaced by a pyrimidine, or vice versa
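A small sketch that classifies a DNA substitution using the purine (A, G) and pyrimidine (C, T) classes. The function name `classify_substitution` and the returned labels are illustrative.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Label a single-base DNA change as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases:
        raise ValueError("expected one of A, C, G, T")
    if ref == alt:
        return "identical"
    # Same chemical class on both sides means a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


print(classify_substitution("A", "G"))  # transition
print(classify_substitution("A", "C"))  # transversion
```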
what is PAM ineffective at identifying?
distant relationships
how identical are two sequences 1 PAM apart
99% identical
what is the most divergent PAM level that still produces a correct alignment
PAM250 with ~20% sequence identity
what matrix is used to identify distant relationships
BLOSUM matrix
how identical should sequences aligned using BLOSUM62 be
62%
what is the main difference between PAM and BLOSUM
in BLOSUM, every matrix is calculated directly from observed alignments; PAM matrices are extrapolated from PAM1
is BLOSUM45 meant for more or less divergent sequences than BLOSUM80
more divergent
are BLOSUM matrices based on global or local alignments
local
are PAM matrices based on global or local alignments
global
what are heuristic alignments?
fast but approximate methods for finding the alignment with the highest score, not guaranteed to find the best alignment
which is more sensitive for nucleic acid sequences, FASTA or BLAST?
FASTA
what is the fastest and most widely used heuristic tool for pairwise sequence comparison
BLAST
what is % Identity in BLAST
number of identical residues divided by the number of matched residues ignoring gaps
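A sketch of the calculation as defined on this card, assuming the two sequences are already aligned and gaps are written as `-`. Columns containing a gap are left out of the denominator; conventions for the denominator differ between tools, so treat this as one possible definition. The function name is illustrative.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identical residues as a percentage of gap-free aligned columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue.
    matched = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]
    if not matched:
        return 0.0
    identical = sum(x == y for x, y in matched)
    return 100.0 * identical / len(matched)


# 5 gap-free columns, 4 of them identical.
print(percent_identity("ACGT-A", "ACCTGA"))  # 80.0
```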
what are the Positives reported by BLAST
fraction of residues that are identical or similar
what is the bit-score (Max score)
highest alignment score between the query sequence and the database segments
what is the E value
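the expected number of alignments with an equal or better score that would occur by chance in a database of that size; it indicates how likely the similarity is due to chance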
is a low E value good or bad?
good - more significant
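A sketch of why a higher bit-score gives a lower E value. It assumes the usual Karlin-Altschul relation E = m * n * 2^(-S'), where S' is the bit-score, m the query length and n the database length. That formula is not on the cards, and BLAST itself uses corrected (effective) lengths, so the numbers here are only illustrative.

```python
def expected_hits(bit_score: float,
                  query_length: int,
                  database_length: int) -> float:
    """Approximate E value from a bit-score: E = m * n * 2**(-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Same search space, increasing bit-score: E drops by half per extra bit.
for s in (20, 40, 60):
    print(s, expected_hits(s, query_length=300, database_length=10**9))
```

The same bit-score gives a larger E value in a larger database, which is why E values from different searches are not directly comparable.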
what is PSI-BLAST?
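Position-Specific Iterated BLAST; it searches a database iteratively, building a position-specific scoring profile from the hits of each round, to find distantly related sequences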