Sequence Alignment Flashcards
what is sequence alignment?
a bioinformatics method for arranging DNA, RNA, or protein sequences to identify regions of similarity that may reflect functional, structural, or evolutionary relationships
what is a query sequence?
an unknown, uncharacterized sequence that we want to align to a known sequence or a database
what is a reference sequence?
the known sequence that the query is aligned to
why do we align sequences?
to search for closely matching sequences
to assign function to genes, proteins, and genomes (annotation)
to infer evolutionary relationships
to determine residue-residue correspondences
what are the two alignment types?
global and local
what is global alignment?
the sequences are aligned over their entire length, from beginning to end
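A minimal sketch of global alignment, assuming the Needleman-Wunsch dynamic-programming algorithm with a linear gap penalty. The function name `needleman_wunsch` and the scores (match +1, mismatch -1, gap -2) are illustrative choices, not values from these notes. The traceback starts in the bottom-right cell of the table, which is what forces both sequences to be aligned end to end.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    def sub(x: str, y: str) -> int:
        # Substitution score (hypothetical values)
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # pair residues
                              score[i - 1][j] + gap,   # a[i-1] opposite a gap
                              score[i][j - 1] + gap)   # b[j-1] opposite a gap

    # Trace back from the bottom-right cell: a global alignment spans both full sequences
    out_a: list[str] = []
    out_b: list[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `needleman_wunsch("ACGT", "ACT")` returns `(1, 'ACGT', 'AC-T')`: three matches and one gap. When several alignments tie for the best score, this traceback reports only one of them.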
what is local alignment?
only the regions with the highest similarity are aligned; the rest of each sequence is left out
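For contrast, a sketch of a local alignment score, assuming the Smith-Waterman algorithm with a linear gap penalty; `local_alignment_score` and its scores (match +2, mismatch -1, gap -2) are hypothetical. It differs from global alignment in two places: no cell may drop below zero, so an alignment can start anywhere, and the answer is the highest cell in the table rather than the bottom-right one.

```python
def local_alignment_score(a: str, b: str, match: int = 2, mismatch: int = -1,
                          gap: int = -2) -> int:
    """Best local alignment score of a and b (Smith-Waterman, linear gaps)."""
    n, m = len(a), len(b)
    # First row and column stay 0: a local alignment may begin at any position
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap     # a[i-1] opposite a gap
            left = score[i][j - 1] + gap   # b[j-1] opposite a gap
            # Clamping at 0 discards any prefix that would lower the score
            score[i][j] = max(0, diag, up, left)
            # The alignment may also end anywhere, so track the table maximum
            best = max(best, score[i][j])
    return best
```

For example, `local_alignment_score("TTACGTT", "GGACGGG")` returns 6, the score of the shared `ACG` core; the dissimilar flanks are simply not aligned.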
what are homologous sequences?
sequences that evolved from a common ancestor; they usually have similar 3D structures and often similar functions
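what identity threshold is used to suggest that sequences are homologous?
more than 70% identity (a rule of thumb for nucleotide sequences; protein sequences can be homologous at much lower identity, roughly 25-30% over a long alignment)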
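Percent identity is computed from an existing alignment before it is compared with a threshold such as 70%. A minimal sketch, assuming `-` marks a gap and that identical columns are divided by the full alignment length; tools differ on the denominator (alignment length, shorter sequence, or non-gap columns), so the same alignment can report different values. The name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns with identical, non-gap residues."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("expected two non-empty aligned sequences of equal length")
    # A column counts as identical only if both residues match and neither is a gap ('-')
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    # Assumption: the denominator is the full alignment length, including gap columns
    return 100.0 * identical / len(aligned_a)
```

For example, `ACGT-A` aligned with `ACCTGA` has 4 identical columns out of 6, about 66.7%.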
what is pairwise alignment?
aligns two sequences at a time, introducing gaps where needed to find the best match
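A sketch of why gap placement matters: the hypothetical function `score_alignment` scores an already-aligned pair column by column, assuming match +1, mismatch -1, and a linear penalty of -2 for each residue placed opposite a gap.

```python
def score_alignment(aligned_a: str, aligned_b: str, match: int = 1,
                    mismatch: int = -1, gap: int = -2) -> int:
    """Score an existing pairwise alignment column by column ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            total += gap       # residue opposite a gap (linear penalty)
        elif x == y:
            total += match     # identical residues
        else:
            total += mismatch  # different residues
    return total
```

For `ACGTACGT` versus `ACGACGT`, putting the gap at the end (`ACGACGT-`) scores -3 because the remaining residues fall out of register, while putting it after `ACG` (`ACG-ACGT`) scores 5 with every other column matching.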
what is a dot plot?
a graphical comparison of two sequences: rows correspond to residues of one sequence, columns to residues of the other, and a dot marks each cell where the two residues match; diagonal runs of dots reveal regions of similarity
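A minimal text-mode dot plot, assuming `*` marks a matching pair and `.` a mismatch; `dot_plot` is a hypothetical helper. Practical dot plots often add a sliding window and a match threshold to suppress noise, which this sketch omits.

```python
def dot_plot(seq_rows: str, seq_cols: str) -> list[str]:
    """One string per residue of seq_rows: '*' where it matches seq_cols, '.' elsewhere."""
    return ["".join("*" if r == c else "." for c in seq_cols) for r in seq_rows]


if __name__ == "__main__":
    # The shared ACGT shows up as a diagonal shifted one column to the right:
    # .*...
    # ..*..
    # ...*.
    # *...*
    for line in dot_plot("ACGT", "TACGT"):
        print(line)
```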
what are some measures of sequence similarity?
Hamming distance and Levenshtein (edit) distance
what is Hamming distance?
for two strings of equal length, the number of positions at which the characters differ (no insertions or deletions allowed)
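A direct sketch of the definition; `hamming_distance` is a hypothetical name. It rejects strings of different lengths because, with no insertions or deletions allowed, positions can only be compared one to one.

```python
def hamming_distance(s: str, t: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance is only defined for strings of equal length")
    # Compare position by position; no shifting is allowed
    return sum(1 for x, y in zip(s, t) if x != y)
```

For example, `hamming_distance("GATTACA", "GACTATA")` returns 2 (T/C at the third position, C/T at the sixth).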
what is Levenshtein distance?
the minimum number of single-character edits (insertions, deletions, and substitutions) required to change one string into another
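A sketch of the standard dynamic-programming computation of Levenshtein distance, with every insertion, deletion, and substitution costing 1; `levenshtein` is a hypothetical name. Only the previous row of the table is kept, so memory grows with the length of one string rather than the product of both lengths.

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of unit-cost insertions, deletions, and substitutions turning s into t."""
    # prev[j] = distance between the previous prefix of s and t[:j]; for "" it is j insertions
    prev = list(range(len(t) + 1))
    for i, x in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to "" is i deletions
        for j, y in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute (free if x == y)
        prev = curr
    return prev[-1]
```

For example, `levenshtein("ACGT", "CGTA")` is 2 (delete the leading A, append an A), whereas the Hamming distance of the same pair is 4 because every position mismatches.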
what controls the cost of insertions and deletions when computing alignments?
gap penalties
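A sketch comparing linear and affine gap penalties, assuming the affine form `gap_open + (k - 1) * gap_extend` for a gap of length k (some tools write it as `gap_open + k * gap_extend` instead). The function `gap_cost` and the values 10 and 1 are illustrative, not the defaults of any particular program. Setting `gap_open == gap_extend` reduces the affine model to a linear one.

```python
import re


def gap_cost(aligned: str, gap_open: float, gap_extend: float) -> float:
    """Total penalty for the gaps in one aligned sequence ('-' marks a gap).

    Assumed affine form: a gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    total = 0.0
    # Each maximal run of '-' is treated as one gap
    for run in re.findall(r"-+", aligned):
        total += gap_open + (len(run) - 1) * gap_extend
    return total
```

Under the linear setting (1, 1), `AC---GT` and `A-C-G-T` both cost 3. Under the affine setting (10, 1), the single three-residue gap costs 12 while three separate one-residue gaps cost 30, so affine scoring favors fewer, longer gaps, which often better reflects a single insertion or deletion event.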