Sequence Alignment (short ver) Flashcards
aligns two or more sequences to highlight their similarity, inserting a small number of gaps (usually denoted by dashes) into each sequence so that identical or similar characters line up wherever possible.
Basic Sequence Alignment Algorithm
Why compare biological sequence?
to obtain functional or mechanistic insight about a sequence by inference from another potentially better characterized sequence
to find whether two (or more) genes or proteins are evolutionarily related
to find structurally or functionally similar regions within sequences
practical applications of sequence alignment include…
similarity searching of databases
assembly of sequence reads
mapping sequencing reads to a known genome
Protein structure prediction, annotation, etc…
similarity searching of databases
reads into a longer construct such as a genomic sequence
assembly of sequence reads
looking for differences from reference genome - SNPs, indels
resequencing
mapping transcription factor binding sites via
ChIP-seq (chromatin immunoprecipitation sequencing)
arguably the most fundamental operation of bioinformatics
Pairwise sequence alignment
sequence comparison is most informative when it detects _
homologs
sequences that have common origin; they share a common ancestor
homolog
homologous sequences may either be:
orthologs or paralogs
True or False: orthologs and paralogs are an example of all or nothing relationships
True
Any pair of sequences may share a certain level of:
identity and/or similarity
Orthologs tend to have ____ while Paralogs tend to have ______
Orthologs - similar function
paralogs - slightly different function
homologs produced by speciation that have diverged along with the organisms they are associated with
orthologs
homologs produced by gene duplication
paralogs
ortho is a Greek word that means
straight; implies direct descent
they represent genes derived from a common ancestral gene that duplicated within an organism and then subsequently diverged by accumulated mutation
paralogs
Greek word meaning alongside
para
Why can distinguishing orthologs from paralogs be a complex problem?
gene loss after duplication
lack of knowledge of evolutionary history
weak similarity because of evolutionary distance
True or False: Homology implies exact same functions
False;
homology does not imply exact same functions
may have similar function at very crude level but play a different physiological role
homology
three types of sequence change that can occur during evolution
mutations/substitutions
deletions
insertions
great tools to visualize sequence similarity and evolutionary changes in homologous sequences
alignment
represent mutations/substitutions
mismatches
represent insertions and deletions
gaps
one way to judge alignment is to:
compare their number of:
matches
insertions
deletions
mutations
reflect biological or statistical observations about the known sequences, and are frequently represented by scoring matrices
alignment scoring schemes
if x1 and y1 are a match
reward
if x1 and y1 are a mismatch
penalty
if either x1 or y1 is a gap
penalty
the gap penalty is denoted as
g
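The scoring scheme in the cards above (reward for a match, penalty for a mismatch, gap penalty g) can be sketched in a few lines. The function name and the numeric values (+1, -1, -2) are illustrative assumptions, not values given in these cards; real scoring schemes often use a substitution matrix instead.

```python
def score_alignment(x, y, match=1, mismatch=-1, g=-2):
    """Score two gapped rows of equal length, column by column.

    match, mismatch and g are assumed example values; "-" marks a gap.
    """
    if len(x) != len(y):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for a, b in zip(x, y):
        if a == "-" or b == "-":
            total += g          # gap penalty
        elif a == b:
            total += match      # reward
        else:
            total += mismatch   # mismatch penalty
    return total


# 5 matches, 1 mismatch, 1 gap: 5 - 1 - 2
print(score_alignment("GATTACA", "GA-TGCA"))  # 2
```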
true or false: biologists often prefer a parsimonious alignment
true
number of postulated sequence changes is minimized
parsimonious alignment
True or false: there may be more than one optimal alignment and these may not reflect the true evolutionary history of our sequences
True
two commonly quoted metrics for pairs of aligned sequences
Sequence identity
sequence similarity
typically quotes the percent of identical characters in the aligned region of two sequences
sequence identity
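A minimal sketch of percent identity for an already aligned pair. It assumes the denominator is the number of alignment columns, gaps included; other conventions exist (for example, dividing by the length of the shorter sequence), so the function name and that choice are assumptions.

```python
def percent_identity(x, y):
    """Percent of alignment columns holding identical, non-gap characters.

    Assumption: the denominator is the total number of columns.
    """
    if len(x) != len(y):
        raise ValueError("aligned rows must have the same length")
    if not x:
        return 0.0
    identical = sum(1 for a, b in zip(x, y) if a == b and a != "-")
    return 100.0 * identical / len(x)


# 5 identical columns out of 7
print(round(percent_identity("GATTACA", "GA-TGCA"), 1))  # 71.4
```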
typically, the score resulting from an optimal pairwise alignment (it depends on the parameters used, such as the scoring scheme)
sequence similarity
True or False: homology is not an all-or-nothing relationship, you cannot have a percent homology
False;
homology is an all-or-nothing relationship
True or False: homology is an all-or-nothing relationship, you can have a percent homology
False
you cannot have a percent homology
frequently used as an indicator of homology
high sequence similarity
used to find genes and/or proteins with potentially similar or identical function
high sequence similarity
can query a database of sequences by performing a series of pairwise alignments
high sequence similarity
way of arranging the primary sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences
sequence alignment
aligned sequences of nucleotide or amino acid residues are typically represented as ____ within a matrix
rows
inserted between the residues so that residues with identical or similar characters are aligned in successive columns
gaps
True or False: an alignment does not have to be an exact match
True
True or False: Alignment can be based on edit distance
True
True or False: Alignment is usually based on a similarity measure
True
the number of changes required to change one sequence into another is called the __
edit distance
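One common form of edit distance is the Levenshtein distance, where each substitution, insertion, or deletion costs 1. The sketch below assumes that unit-cost model, which the cards do not specify.

```python
def edit_distance(s, t):
    """Levenshtein distance with unit costs, computed row by row."""
    prev = list(range(len(t) + 1))      # distance from "" to each prefix of t
    for i, a in enumerate(s, start=1):
        curr = [i]                      # distance from s[:i] to ""
        for j, b in enumerate(t, start=1):
            cost = 0 if a == b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
```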
every character in the query (source) string lines up with a character in the target string
Global alignment
attempts to align every residue in every sequence; most useful when the sequences in the query set are similar and of roughly equal size
global alignment
may require gap (space) insertion to make strings the same length
global alignment
a general global alignment based on dynamic programming
Needleman-Wunsch algorithm
Needleman-Wunsch algorithm is based on
dynamic programming
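A minimal sketch of Needleman-Wunsch global alignment, assuming a linear gap penalty g and simple match/mismatch scores (the values below are illustrative, not from these cards). It fills the dynamic programming table and then traces back one optimal alignment; other equally optimal alignments may exist.

```python
def needleman_wunsch(x, y, match=1, mismatch=-1, g=-2):
    """Return (score, aligned_x, aligned_y) for one optimal global alignment."""
    n, m = len(x), len(y)
    # F[i][j] = best score for aligning x[:i] with y[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * g
    for j in range(1, m + 1):
        F[0][j] = j * g
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,  # align x[i-1] with y[j-1]
                          F[i - 1][j] + g,      # gap in y
                          F[i][j - 1] + g)      # gap in x
    # Traceback from the bottom-right corner, preferring the diagonal move.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if x[i - 1] == y[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:
                ax.append(x[i - 1]); ay.append(y[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] + g:
            ax.append(x[i - 1]); ay.append("-")
            i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


score, ax, ay = needleman_wunsch("ACGT", "AGT")
print(score)
print(ax)
print(ay)
```

Every character of both inputs appears in the output rows, which is what makes the alignment global.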
an internal alignment or embedding of a substring into a target string
local alignment
general local alignment method based on dynamic programming
Smith-Waterman algorithm
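A minimal sketch of Smith-Waterman local alignment under the same assumptions (linear gap penalty, illustrative scores). The differences from the global case are that cell scores are never allowed to drop below 0, and the traceback starts at the highest-scoring cell and stops at the first 0.

```python
def smith_waterman(x, y, match=2, mismatch=-1, g=-2):
    """Return (score, aligned_x, aligned_y) for one best local alignment."""
    n, m = len(x), len(y)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            H[i][j] = max(0,                    # start a new local alignment
                          H[i - 1][j - 1] + s,
                          H[i - 1][j] + g,
                          H[i][j - 1] + g)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j
    # Traceback from the best cell until a zero-score cell is reached.
    ax, ay = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if x[i - 1] == y[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            ax.append(x[i - 1]); ay.append(y[j - 1])
            i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + g:
            ax.append(x[i - 1]); ay.append("-")
            i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


# The shared substring "ACG" is embedded in otherwise unrelated flanks.
print(smith_waterman("TTTACGTTT", "GGACGGG"))  # (6, 'ACG', 'ACG')
```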
Pairwise sequence alignment methods
brute force alignment
dot matrices
dynamic programming
its objective is to arrange two sequences in such a fashion that pairs of matching characters between the two sequences are maximized
pairwise sequence alignment
function that ranks or scores the characters being compared
substitution matrix
simplest case of pairwise sequence alignment
brute force alignments
make the brute force method unusable for all but the shortest sequences
gaps
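To illustrate the gap-free case, the sketch below assumes that "brute force" means trying every possible offset of one sequence against the other and counting matching positions; the cards do not define the method in detail, so this reading and the function name are assumptions. Without gaps there are only len(x) + len(y) - 1 offsets to try; allowing gaps makes the number of candidate alignments grow explosively, which is why the approach breaks down.

```python
def best_ungapped_offset(x, y):
    """Slide y along x and return (matches, offset) for the best placement.

    offset is the index in x where y[0] is placed; it can be negative.
    """
    best = (-1, 0)
    for offset in range(-(len(y) - 1), len(x)):
        matches = sum(
            1
            for i, b in enumerate(y)
            if 0 <= i + offset < len(x) and x[i + offset] == b
        )
        if matches > best[0]:
            best = (matches, offset)
    return best


print(best_ungapped_offset("GGACGT", "ACG"))  # (3, 2)
```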
pairs of related sequences often have insertions or deletions relative to one another; we therefore require
gapped pairwise alignment
simple graphical method for pairwise alignment
dot plot/dot matrix approach
no scoring, so difficult to compare alternative alignments
dot plot/dot matrix approach
can give visual sequence structure but requires human interaction
dot plot/dot matrix approach
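A minimal text-mode sketch of a dot matrix: one sequence runs down the side, the other across the top, and a mark is placed wherever the two characters are identical. The function name and the "*" and "." symbols are assumptions; practical dot plot tools usually also filter noise with a sliding window, which is omitted here.

```python
def dot_plot(x, y):
    """Return a text dot matrix with x as rows and y as columns."""
    rows = ["  " + " ".join(y)]                      # header row
    for a in x:
        cells = ("*" if a == b else "." for b in y)  # "*" marks identical characters
        rows.append(a + " " + " ".join(cells))
    return "\n".join(rows)


print(dot_plot("GATTACA", "GATGCA"))
```

Runs of marks along a diagonal correspond to matching stretches; a shift from one diagonal to a neighbouring one suggests an insertion or deletion.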
provides optimal solutions (but not necessarily unique solutions)
dynamic programming algorithm
much faster, widely used for database searches, may miss some pairs with low similarity
heuristic word or k-tuple approaches
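A minimal sketch of the seeding idea behind word or k-tuple heuristics: index every length-k word of the target, then look up each word of the query to find exact shared words ("seeds"). This shows only the seed-finding step under assumed names and an assumed k; real tools go on to extend and score the seeds, and pairs that share no exact word are missed, which is the loss of sensitivity mentioned in the card.

```python
from collections import defaultdict


def find_seeds(query, target, k=3):
    """Return (query_pos, target_pos) pairs where a length-k word is shared."""
    index = defaultdict(list)
    for j in range(len(target) - k + 1):
        index[target[j:j + k]].append(j)     # word -> positions in target
    seeds = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            seeds.append((i, j))
    return seeds


print(find_seeds("ACGT", "TTACGA", k=3))  # [(0, 2)]
```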