Sequence alignment (long ver p2) Flashcards
Examples of Pairwise alignment software
EMBL - EBI Pairwise Sequence Alignment
BLAST
What are the different applications of Pairwise Alignment?
measuring sequence similarity
studying the evolution of sequences
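One common way to measure similarity between two aligned sequences is percent identity. Below is a minimal sketch, assuming identity is counted over aligned columns where neither sequence has a gap. Other conventions exist, such as dividing by the full alignment length or by the shorter sequence's length, so values from different tools may differ.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two already-aligned sequences of equal length.

    Assumption: columns with a gap ("-") in either sequence are excluded
    from the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


print(percent_identity("ACGT-A", "ACCTTA"))
```

Here the gapped column is skipped, leaving five compared positions with four identities.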
share a common evolutionary ancestor
Homologous sequences
True or False: Homologous sequences do not share a significantly related 3D structure but share the same evolutionary ancestor
False
they share a similar 3D structure
usually share significant amino acid/nucleotide identity
homologous sequences
sequence regions that are homologous are also called
conserved regions
sequences that share a common evolutionary ancestry
homologs
derived from a single ancestral gene in the last common ancestor
orthologs
homologous genes in different organisms, often with the same function, that are separated only by speciation
orthologs
two or more homologous genes found within a single species
paralogs
separated by a gene duplication event
paralogs
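The speciation-versus-duplication distinction can be made concrete with a gene tree whose internal nodes are labelled by the event that created them: two genes are orthologs if their last common ancestor node is a speciation, and paralogs if it is a duplication. This is a minimal sketch; the tree, gene names (`human_A`, `mouse_B`, ...) and node labels (`S1`, `D0`, ...) are hypothetical.

```python
# Hypothetical gene tree: child -> parent, with internal nodes labelled
# by the event that produced them.
PARENT = {
    "human_A": "S1", "mouse_A": "S1",
    "human_B": "S2", "mouse_B": "S2",
    "S1": "D0", "S2": "D0",
}
EVENT = {"S1": "speciation", "S2": "speciation", "D0": "duplication"}


def ancestors(node: str) -> list:
    """Path from a node up to the root, including the node itself."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def last_common_ancestor(a: str, b: str) -> str:
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    raise ValueError("genes are not in the same tree")


def relationship(a: str, b: str) -> str:
    """Orthologs if the LCA is a speciation node, paralogs if it is a duplication."""
    event = EVENT[last_common_ancestor(a, b)]
    return "orthologs" if event == "speciation" else "paralogs"


print(relationship("human_A", "mouse_A"))
print(relationship("human_A", "human_B"))
```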
If a gene in an organism is duplicated and transposed so that two copies occupy two different positions in the same genome, then the two copies are _
paralogous
create gene families
paralogs
consists of two or more copies of paralogous genes within the genome of a single organism
gene families
True or False: Biological sequences do not occur in families
False
they often occur in families
related genes within an organism
paralogs
sequences within a population
polymorphic variants
genes in other species
orthologs
True or False: Homologous sequences often retain similar structures and functions
True
collection of three or more protein (or nucleic acid) sequences that are partially or completely aligned
multiple sequence alignment
Homologous residues are aligned in _ across the length of the sequences
columns
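Because homologous residues sit in the same column, an MSA is naturally read column by column. This minimal sketch, using a hypothetical three-sequence alignment, extracts the columns and reports the fully conserved ones (identical residue in every row, no gaps).

```python
# Hypothetical aligned protein fragments; "-" marks a gap.
msa = [
    "MKV-LA",
    "MRV-LA",
    "MKVGLS",
]


def columns(alignment: list):
    """Yield each column (one residue per sequence) of an equal-length MSA."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    for i in range(length):
        yield "".join(row[i] for row in alignment)


def fully_conserved(alignment: list) -> list:
    """Indices of columns where every sequence has the same residue and no gap."""
    return [
        i for i, col in enumerate(columns(alignment))
        if "-" not in col and len(set(col)) == 1
    ]


print(list(columns(msa)))
print(fully_conserved(msa))
```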
In multiple sequence alignment, the residues are presumed to be homologous in an:
evolutionary and structural sense
residues are homologous as they are presumably derived from a common ancestor
evolutionary sense
aligned residues tend to occupy corresponding positions in the three-dimensional structure of each aligned protein
structural sense
What are the 5 main approaches to multiple sequence alignment?
exact methods
progressive alignment
iterative approaches
consistency-based methods
structure-based methods
employs dynamic programming (similar to Needleman-Wunsch but the matrix is multidimensional)
exact methods
goal is to maximize the summed alignment score of each pair of sequences
exact methods
generate optimal alignments but are not feasible in time or space for more than a few sequences
exact methods
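For three sequences the exact approach fills a 3-D matrix, where each cell has seven possible predecessors and each column is scored by the sum of its three pairwise scores (sum-of-pairs). This is a minimal sketch under an assumed toy scoring scheme (match +1, mismatch -1, residue vs gap -2, gap vs gap 0). It also shows why exact methods do not scale: for k sequences of length n the work grows roughly as 2^k n^k.

```python
from itertools import product

MATCH, MISMATCH, GAP = 1, -1, -2  # assumed toy scores; gap-gap pairs score 0


def pair_score(x: str, y: str) -> int:
    """Score one pair of symbols within a column."""
    if x == "-" and y == "-":
        return 0
    if x == "-" or y == "-":
        return GAP
    return MATCH if x == y else MISMATCH


def sp_column(col: tuple) -> int:
    """Sum-of-pairs score of a three-row column."""
    a, b, c = col
    return pair_score(a, b) + pair_score(a, c) + pair_score(b, c)


def align3(s1: str, s2: str, s3: str):
    """Optimal sum-of-pairs alignment of three sequences by 3-D dynamic programming."""
    n1, n2, n3 = len(s1), len(s2), len(s3)
    F = [[[float("-inf")] * (n3 + 1) for _ in range(n2 + 1)] for _ in range(n1 + 1)]
    F[0][0][0] = 0
    back = {}
    moves = [m for m in product((0, 1), repeat=3) if any(m)]  # 7 predecessor cells
    # Lexicographic order guarantees every predecessor is filled before it is used.
    for i, j, k in product(range(n1 + 1), range(n2 + 1), range(n3 + 1)):
        for di, dj, dk in moves:
            pi, pj, pk = i - di, j - dj, k - dk
            if min(pi, pj, pk) < 0:
                continue
            col = (s1[pi] if di else "-", s2[pj] if dj else "-", s3[pk] if dk else "-")
            cand = F[pi][pj][pk] + sp_column(col)
            if cand > F[i][j][k]:
                F[i][j][k] = cand
                back[(i, j, k)] = (di, dj, dk, col)
    # Trace back from the far corner to rebuild the aligned rows.
    rows = ["", "", ""]
    i, j, k = n1, n2, n3
    while (i, j, k) != (0, 0, 0):
        di, dj, dk, col = back[(i, j, k)]
        rows = [col[r] + rows[r] for r in range(3)]
        i, j, k = i - di, j - dj, k - dk
    return F[n1][n2][n3], rows


print(align3("AC", "A", "AC"))
```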
strategy entails calculating pairwise sequence alignment scores between all the proteins (or nucleic acid sequences) being aligned
Progressive Sequence Alignment
beginning the alignment with the two most closely related sequences and progressively adding more sequences to the alignment
Progressive Sequence Alignment
What is the pro of Progressive Sequence Alignment?
permits rapid alignment of hundreds or thousands of sequences
What is the con of Progressive Sequence Alignment?
final alignment depends on the order in which sequences are joined; not guaranteed to provide most accurate alignments
What are the examples of Progressive Sequence Alignment?
ClustalW
What are the 3 stages of the ClustalW algorithm?
STAGE 1: create a pairwise alignment of every pair of proteins included in the MSA
STAGE 2: guide tree is calculated from the distance (similarity) matrix
STAGE 3: MSA is created based on guide tree
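Stage 1 can be sketched as "globally align every pair, then turn each alignment into a distance". This minimal sketch assumes a simple Needleman-Wunsch with linear gap penalties and toy scores (not ClustalW's real substitution matrices or gap model), and takes distance = 1 minus the fraction of identical residues over gap-free aligned positions. The resulting matrix is what stage 2 would cluster into a guide tree.

```python
from itertools import combinations

MATCH, MISMATCH, GAP = 1, -1, -2  # assumed toy scores, not ClustalW's defaults


def needleman_wunsch(a: str, b: str) -> tuple:
    """Global alignment with a linear gap penalty; returns the two aligned strings."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * GAP
    for j in range(1, m + 1):
        F[0][j] = j * GAP

    def sub(i, j):
        return MATCH if a[i - 1] == b[j - 1] else MISMATCH

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(i, j), F[i - 1][j] + GAP, F[i][j - 1] + GAP)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + GAP:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def distance(a: str, b: str) -> float:
    """1 minus fractional identity over aligned, gap-free positions."""
    x, y = needleman_wunsch(a, b)
    compared = [(p, q) for p, q in zip(x, y) if p != "-" and q != "-"]
    if not compared:
        return 1.0
    return 1.0 - sum(p == q for p, q in compared) / len(compared)


def distance_matrix(seqs: dict) -> dict:
    """All-pairs distances keyed by (name, name); the input to guide-tree building."""
    D = {(name, name): 0.0 for name in seqs}
    for p, q in combinations(seqs, 2):
        D[(p, q)] = D[(q, p)] = distance(seqs[p], seqs[q])
    return D


print(distance_matrix({"s1": "ACGT", "s2": "ACGT", "s3": "ACGA"}))
```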
two ways to construct the guide tree
Unweighted Pair Group Method of Arithmetic Averages (UPGMA)
Neighbor-Joining Method
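UPGMA repeatedly merges the two closest clusters and sets the distance from the new cluster to every other cluster as the size-weighted average of the old distances. This is a minimal sketch that returns a Newick-style string with branch lengths omitted; the input format (a dict keyed by `frozenset` pairs) and the example distances are assumptions. Neighbor-joining is not shown.

```python
def upgma(names: list, dist: dict) -> str:
    """UPGMA guide tree as a Newick string (branch lengths omitted).

    dist maps frozenset({x, y}) -> distance for every pair of names.
    """
    sizes = {name: 1 for name in names}  # cluster label -> number of sequences in it
    d = dict(dist)
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(
            ((x, y) for x in sizes for y in sizes if x < y),
            key=lambda pair: d[frozenset(pair)],
        )
        merged = f"({a},{b})"
        na, nb = sizes.pop(a), sizes.pop(b)
        # New distance = size-weighted arithmetic mean of the two old distances.
        for c in sizes:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        sizes[merged] = na + nb
    return next(iter(sizes)) + ";"


pairs = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
         ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 4}
print(upgma(["A", "B", "C", "D"], {frozenset(p): v for p, v in pairs.items()}))
```

With these distances A and B merge first, then C and D, giving `((A,B),(C,D));`.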
compute a suboptimal solution using a progressive alignment strategy, and then modify the alignment using dynamic programming or other methods until a solution converges
Iterative Approaches
What is the advantage of Iterative Approach over Progressive Sequence Alignment?
overcome alignment errors by iterative refinement
What is an example of Iterative Approach?
MAFFT
What does MAFFT mean?
Multiple Alignment using Fast Fourier Transform
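The FFT in MAFFT is used to find likely homologous segments quickly: the correlation between two sequences is computed at every relative offset, and peaks suggest shifted homologous regions. MAFFT encodes amino acids by physicochemical values (volume and polarity); this minimal sketch instead assumes one 0/1 indicator vector per nucleotide, so the correlation simply counts matching residues at each offset. It uses a hand-written radix-2 FFT and illustrates only the offset-scoring idea, not MAFFT's full pipeline.

```python
import cmath


def fft(x: list, invert: bool = False) -> list:
    """Recursive radix-2 Cooley-Tukey FFT (unnormalised); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if invert else -1
    even, odd = fft(x[0::2], invert), fft(x[1::2], invert)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def match_counts_by_offset(a: str, b: str, alphabet: str = "ACGT") -> dict:
    """Offset k -> number of positions n with a[n] == b[n + k], computed via FFT."""
    size = 1
    while size < len(a) + len(b) - 1:  # zero-pad so circular wrap-around cannot alias
        size *= 2
    total = [0j] * size
    for letter in alphabet:
        va = [1.0 if ch == letter else 0.0 for ch in a] + [0.0] * (size - len(a))
        vb = [1.0 if ch == letter else 0.0 for ch in b] + [0.0] * (size - len(b))
        A, B = fft(va), fft(vb)
        # Cross-correlation theorem: corr = IDFT(conj(A) * B).
        for k in range(size):
            total[k] += A[k].conjugate() * B[k]
    corr = [round((v / size).real) for v in fft(total, invert=True)]
    # Negative offsets live at the end of the circular result (index size + k).
    return {k: corr[k % size] for k in range(-(len(a) - 1), len(b))}


counts = match_counts_by_offset("ACGT", "TTACGTTT")
print(max(counts, key=counts.get), counts[2])
```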
example of multiple alignment package that is considered to be highly accurate based on recent benchmarking studies
MAFFT
use information about the multiple sequence alignment as it is being generated to guide the pairwise alignments
consistency-based methods
example of Consistency-based approach
T-Coffee
What does T-Coffee stand for?
Tree-based Consistency Objective Function For alignment Evaluation
its primary library includes all possible pairwise global alignments of the input sequences and the 10 highest-scoring local alignments
T-Coffee
every pair of aligned residues in the library is assigned a weight
T-Coffee
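The consistency idea can be sketched as library extension: a residue pair (x, y) gains extra weight for every residue z in a third sequence that is paired with both of them, adding min(w(x, z), w(z, y)). This also gives weight to pairs that are supported only indirectly. This is a minimal sketch: residues are hypothetical `(sequence, position)` tuples, and the primary weights are assumed to be given (for example, percent identity of the pairwise alignment each pair came from).

```python
from collections import defaultdict


def extend_library(primary: dict) -> dict:
    """T-Coffee-style library extension (sketch).

    primary maps frozenset({(seq, pos), (seq, pos)}) -> weight.
    """
    neighbours = defaultdict(dict)  # residue -> {partner residue: weight}
    for pair, w in primary.items():
        x, y = tuple(pair)
        neighbours[x][y] = w
        neighbours[y][x] = w
    extended = defaultdict(float, primary)  # start from the direct weights
    for z, partners in neighbours.items():
        items = list(partners.items())
        for i, (x, wx) in enumerate(items):
            for y, wy in items[i + 1:]:
                if len({x[0], y[0], z[0]}) == 3:  # x, y, z from three different sequences
                    extended[frozenset((x, y))] += min(wx, wy)
    return dict(extended)


lib = {
    frozenset({("A", 0), ("B", 0)}): 80,
    frozenset({("A", 0), ("C", 0)}): 60,
    frozenset({("B", 0), ("C", 0)}): 70,
}
print(extend_library(lib)[frozenset({("A", 0), ("B", 0)})])
```

In the example, the A-B pair keeps its direct weight of 80 and gains min(60, 70) = 60 through residue C0.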
based on the idea that the tertiary structures evolve more slowly than primary sequences
structure-based approaches
accuracy of MSA is improved by including information about the 3-dimensional structure of one or more members of the group of proteins being aligned
structure-based approaches
a compilation of both multiple sequence alignments and profile HMMs of protein families
Pfam
What does Pfam mean?
Protein Family Database of Profile HMMs
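A profile HMM built from a family alignment has, for each match column, emission probabilities estimated from the residues observed in that column. This minimal sketch shows only that estimation step, using additive pseudocounts on a hypothetical alignment. It is a simplification: real profile HMMs (such as those built with HMMER for Pfam) also model insert and delete states, choose which columns become match states, and use more sophisticated priors such as Dirichlet mixtures.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_profile(alignment: list, pseudocount: float = 1.0,
                   alphabet: str = AMINO_ACIDS) -> list:
    """Per-column residue probabilities with additive pseudocounts.

    Gaps are ignored when counting; returns one dict (residue -> probability) per column.
    """
    profile = []
    for i in range(len(alignment[0])):
        counts = Counter(row[i] for row in alignment if row[i] != "-")
        total = sum(counts.values()) + pseudocount * len(alphabet)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in alphabet})
    return profile


# Hypothetical three-sequence protein family fragment.
profile = column_profile(["MKV", "MRV", "MKI"])
print(round(profile[0]["M"], 3), round(profile[2]["V"], 3))
```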