Sequence alignment (long ver p1) Flashcards
DNA Alphabet aside from ATGC
B
C, G, T
D
A, G, T
H
A, C, T
K
G, T
M
A, C
N
A, C, G, T
R
A, G
S
C, G
V
A, C, G
W
A, T
Y
C, T
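The ambiguity codes above can be kept in a small lookup table. This is a minimal sketch using the standard IUPAC nucleotide codes; the names `IUPAC`, `matches`, and `expand` are illustrative, not from the original cards.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they can stand for.
# The dict and function names are illustrative assumptions.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

def matches(code, base):
    """Return True if a concrete base is allowed by an IUPAC code."""
    return base.upper() in IUPAC[code.upper()]

def expand(seq):
    """List every concrete sequence an ambiguous sequence can stand for."""
    return ["".join(p) for p in product(*(IUPAC[c] for c in seq.upper()))]
```

For example, `expand("AR")` lists the two sequences that `AR` can represent, `AA` and `AG`.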
What are the different biological sequences?
Nucleic Acid
Protein / Amino Acid
amino acid code is a _
one letter abbreviation
Enumeration
What is the significance of Sequence Alignment?
learn about a gene or protein through homology
discover functional, structural, and evolutionary information about sequences
obtain the best possible ("optimal") alignment using algorithms
process of lining up two sequences to achieve maximal levels of identity and show conservation of residues
pairwise alignment
way to represent the relationship between two biological sequences (proteins or nucleotides)
pairwise alignment
used to assess degree of similarity and possibly homology
pairwise alignment
same residues between two proteins; may be in global or local alignment
identical
residues that are structurally or functionally related
similar
can only be described as a 'higher' or 'lower' degree of similarity
similar
sum of both identical and similar residues
percent similarity
goal of pairwise alignment
percent similarity
refers to an exact match between two nucleotides or amino acids
identity
refers to a resemblance between two residues that is greater than one would expect at random
similarity
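Identity and similarity can be counted directly from an aligned pair. The sketch below is only illustrative: the similarity groups in `SIMILAR_GROUPS` are a hypothetical grouping by side-chain chemistry, and dividing by the full alignment length (gap columns included) is one convention among several.

```python
# Hypothetical similarity groups (side-chain chemistry); real tools usually
# derive similarity from a substitution matrix instead.
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "ST", "NQ"]

def percent_identity_similarity(a, b):
    """Return (percent identity, percent similarity) for two aligned strings.

    Assumes both strings come from the same alignment, with '-' for gaps.
    Percent similarity counts identical plus similar residues.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    identical = similar = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # gap columns count toward length but never match
        if x == y:
            identical += 1
        elif any(x in g and y in g for g in SIMILAR_GROUPS):
            similar += 1
    length = len(a)
    return 100 * identical / length, 100 * (identical + similar) / length
```

With the aligned pair `KLV-E` and `RLVAD`, two columns are identical and two more are similar under these groups, so the function returns 40% identity and 80% similarity.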
simple picture that gives an overview of the similarities between two sequences
dot plot
What corresponds to the residues in a dot plot?
rows and columns
positions in the dot plot are _ if the residues are different, and _ if they match
left blank - different
filled - match
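A dot plot can be sketched in a few lines. The function name `dot_plot` and the choice of `*` for a filled position are assumptions made for illustration.

```python
def dot_plot(rows_seq, cols_seq, mark="*", blank=" "):
    """Return the dot plot as a list of strings, one per residue of rows_seq.

    A position is filled when the row residue equals the column residue
    and left blank otherwise.
    """
    return [
        "".join(mark if r == c else blank for c in cols_seq)
        for r in rows_seq
    ]

if __name__ == "__main__":
    for residue, line in zip("GAT", dot_plot("GAT", "GATTA")):
        print(residue, line)
```

Runs of filled positions along a diagonal indicate stretches where the two sequences match.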
compare sequences as a whole
global alignment
no amino acid or nucleotide is discarded
global alignment
It is used when sequences are quite similar and approximately the same length
global alignment
analyze polymorphisms between closely related sequences
global alignment
What kind of algorithm does Global alignment use?
Needleman-Wunsch algorithm
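A minimal sketch of global alignment by dynamic programming, following the set up / score / trace back steps. The scoring values (match +1, mismatch -1, linear gap -1) are assumptions chosen for illustration; real analyses use tuned scores or a substitution matrix.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # Set up the matrix: first row and column hold cumulative gap penalties.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    # Score the matrix: each cell takes the best of diagonal, up, and left.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Identify the optimal alignment: trace back from the bottom-right cell.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + s:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

Because the traceback always starts at the bottom-right cell and ends at the top-left, every residue of both sequences appears in the result.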
stretches of sequence with high density of matches are aligned (generate islands of matches)
local alignment
more suitable for aligning sequences that are similar in some regions (maybe a conserved region or domain) but different in others
local alignment
detect similar subsequences in two sequences
local alignment
What is the algorithm used in local alignment?
Smith-Waterman algorithm
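A minimal sketch of local alignment. It differs from the global method in two ways: cell scores are never allowed to fall below zero, and the traceback starts at the highest-scoring cell and stops at the first zero. The scoring values (match +2, mismatch -1, linear gap -1) are illustrative assumptions.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Locally align a and b; return (best_score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Floor at 0 so a poor region cannot drag down a later match.
            score[i][j] = max(0,
                              score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until a zero-scoring cell is reached.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + s:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))
```

For `TTTACGTTT` and `GGGACGGGG`, only the shared `ACG` island is reported; the flanking residues are left out of the alignment.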
What are the steps in Pairwise Comparison?
Set up the matrix
Score the matrix
Identifying the optimal alignment
How to set up the matrix?
one sequence is written left to right and the other top to bottom, one residue at a time
Scoring the matrix
protein
Choose PAM or BLOSUM
Choose number
Scoring the matrix
DNA
score for identities or mismatches
What does PAM mean?
Point Accepted Mutation (also expanded as Percent Accepted Mutation)
estimates the rate at which each possible residue in a sequence changes to each other residue over time
PAM
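What does BLOSUM-X mean?
BLOcks SUbstitution Matrix (an amino acid substitution matrix); X is the percent identity threshold used to cluster sequences when building the matrix, e.g. BLOSUM62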
Amino acids are grouped according to the chemistry of the side group
Substitution Matrix: PAM
Log-odds values
+10
means that the probability of common ancestry is greater than chance
Log-odds values
0
means that the probabilities are equal
Log-odds values
-4
means that the change is more likely random
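How a log-odds score gets its sign can be sketched as follows: the observed frequency of an aligned residue pair is compared with the frequency expected if the two residues were paired by chance. The function name, the frequencies in the usage note, and the scaling of 10 x log10 are assumptions for illustration, not values from a published matrix.

```python
import math

def log_odds_score(observed_pair_freq, freq_a, freq_b, scale=10.0):
    """Return a rounded log-odds score for one residue pair.

    observed_pair_freq: how often the pair is seen aligned in related sequences
    freq_a, freq_b:     background frequencies of the two residues
    scale:              multiplier applied to log10 (assumed here to be 10)
    """
    expected = freq_a * freq_b  # chance of pairing the residues at random
    return round(scale * math.log10(observed_pair_freq / expected))
```

A pair seen ten times more often than chance scores +10 under this scaling, a pair seen exactly as often as chance scores 0, and a pair seen less often than chance (for example 0.001 observed against 0.0025 expected) scores about -4.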
True or False: Substitution Matrix: BLOSUM is not based on an explicit evolutionary model
True
derived from all amino acid changes observed in aligned regions of a protein family, regardless of overall sequence similarity, as long as the proteins are biochemically related (taken as the basis for a common ancestor)
Substitution matrix: BLOSUM
Matrix is based on substitutions and conserved positions in BLOCKS
Substitution matrix: BLOSUM
common regions in related sequences
blocks
used to find conserved domains
Substitution matrix: BLOSUM