Bio Background Flashcards

Question

Assumption of sequence alignment

Answer 1

Life is monophyletic: all biological entities share a common ancestry.

Answer 2

Homology: similarity due to shared evolutionary origin (common ancestry). Analogy: similarity due to similar function, without common ancestry.

Answer 3

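A is homologous to B if both derive, by divergence, from a common ancestor.
- Paralogues: homologues within one species (arising from gene duplication) with different but related functions
- Orthologues: homologues in different species (arising from speciation) with the same function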

Answer 4

A is analogous to B if they have a similar function but different evolutionary origins.
Convergent evolution: different species living in similar environments independently adapt to the same function.

Answer 5

Point mutations:
- Transition: purine to purine (A <-> G) or pyrimidine to pyrimidine (C <-> T)
- Transversion: purine to pyrimidine or vice versa (A <-> C, A <-> T, G <-> C, G <-> T), e.g. adenine to thymine (A -> T), guanine to cytosine (G -> C)
Other mutations: deletions, insertions, inversions
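To make the purine/pyrimidine rule concrete, here is a minimal sketch that counts transitions and transversions between two gap-free aligned DNA sequences. The function names `classify_substitution` and `count_substitutions` are hypothetical, and only the four standard bases are assumed.

```python
# Chemical classes of the four standard DNA bases (assumption: no ambiguity codes)
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(x: str, y: str) -> str:
    """Return 'identical', 'transition' or 'transversion' for two bases."""
    x, y = x.upper(), y.upper()
    if x == y:
        return "identical"
    # Transition: both bases belong to the same class
    if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
        return "transition"
    return "transversion"


def count_substitutions(a: str, b: str) -> tuple[int, int]:
    """Count (transitions, transversions) between two aligned, gap-free sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned and of equal length")
    ts = tv = 0
    for x, y in zip(a, b):
        kind = classify_substitution(x, y)
        if kind == "transition":
            ts += 1
        elif kind == "transversion":
            tv += 1
    return ts, tv


print(count_substitutions("AGCTAC", "GGTTCC"))  # (2, 1)
```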

Answer 6

Parsimony: among all explanations, the simplest one is preferred. Explain the presence/absence of nucleotides with the minimum number of evolutionary changes. For example, one deletion of n bases at a single site (one event) is preferred over single-base deletions at n different sites (n events).

Answer 7

Global alignment: aligns the sequences over their full length; used for comparative and evolutionary studies.
Local alignment: aligns the most similar subsequences; used for database searching and retrieval. It ignores divergent (poorly conserved) regions and focuses on evolutionarily conserved signals of similarity.

Answer 8

Pairwise alignment: two sequences; exact solution.
Multiple sequence alignment: three or more sequences; approximate (heuristic) solutions, since an exact solution is computationally too expensive.

Answer 9

Gaps at the ends: missing data. Internal gaps: a deletion or an insertion (indel).

Answer 10

Manual alignment; dot matrix (dot plot) analysis.

Answer 11

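Dot matrix: one sequence along the top, the other along the left; a dot marks a pair of identical characters. Reading a path:
- match: diagonal step through a dot
- mismatch: diagonal step through an empty cell
- gap in the top sequence: vertical step
- gap in the left sequence: horizontal step
Parameters: window size, stringency, alphabet size. Stringency: a dot is placed only if at least h characters in the window are identical.
Advantages: simple; trial and error to explore the parameters.
Disadvantages: expensive for large sequences; may not find the best alignment; only a qualitative analysis.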
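A small sketch of a windowed dot plot, assuming the window is anchored at the cell where the dot is drawn (some tools centre it instead). `dot_matrix` and `render` are hypothetical names; `window` and `stringency` correspond to the window size and the threshold h on the card.

```python
def dot_matrix(top: str, left: str, window: int = 1, stringency: int = 1) -> list[list[bool]]:
    """Rows follow `left`, columns follow `top`.

    A dot is placed at (i, j) if at least `stringency` of the `window`
    character pairs on the diagonal starting at (i, j) are identical.
    """
    rows = max(len(left) - window + 1, 0)
    cols = max(len(top) - window + 1, 0)
    plot = []
    for i in range(rows):
        row = []
        for j in range(cols):
            matches = sum(left[i + k] == top[j + k] for k in range(window))
            row.append(matches >= stringency)
        plot.append(row)
    return plot


def render(plot: list[list[bool]]) -> str:
    """Draw dots as '*' and empty cells as '.'."""
    return "\n".join("".join("*" if dot else "." for dot in row) for row in plot)


print(render(dot_matrix("GATTACA", "GATTACA")))
```

With window 1 and stringency 1 this is the plain identity dot plot; larger windows with a high stringency suppress isolated random matches.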

Answer 12

Scoring system:
- gap penalty: gap-opening penalty and gap-extension penalty
- scoring matrix M(a,b): the alignment score is the sum of the column scores (additive property), which implies position independence; M(a,b) gives a match score when a = b and a mismatch score when a != b
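The additive property means an alignment can be scored column by column. Below is a minimal sketch, assuming '-' marks a gap, affine gap costs (opening cost for the first gap position, extension cost for each further one) and a toy match/mismatch function standing in for M(a,b). `score_alignment` and `simple` are hypothetical names.

```python
def score_alignment(a: str, b: str, subst, gap_open: float, gap_extend: float) -> float:
    """Sum of independent column scores; gap penalties are given as positive costs."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    total = 0.0
    in_gap_a = in_gap_b = False  # currently inside a gap run in row a / row b?
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            raise ValueError("column with two gaps")
        if x == "-":
            total -= gap_extend if in_gap_a else gap_open
            in_gap_a, in_gap_b = True, False
        elif y == "-":
            total -= gap_extend if in_gap_b else gap_open
            in_gap_a, in_gap_b = False, True
        else:
            total += subst(x, y)  # M(a, b) for this column
            in_gap_a = in_gap_b = False
    return total


def simple(x: str, y: str) -> int:
    """Toy M(a, b): match +1, mismatch -1."""
    return 1 if x == y else -1


print(score_alignment("AC--GT", "ACTTGA", simple, gap_open=3, gap_extend=1))  # -2.0
```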

Answer 13

- Fixed gap-penalty system: the gap penalty is a fixed value (0, 1, or a constant)
- Linear gap-penalty system: gamma(g) = -g*d, the gap length g multiplied by a constant d
- Affine gap-penalty system: opening cost d, extension cost e, gamma(g) = -d - (g-1)*e, with e < d
- Logarithmic gap-penalty system: the gap cost grows with the logarithm of the gap length, e.g. gamma(g) = -d - e*log(g)
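The formulas above can be compared directly. This sketch uses hypothetical values d = 5 and e = 1 (d = 2 for the linear case) and shows the parsimony argument in numbers: under an affine or logarithmic penalty one gap of length 5 is much cheaper than five gaps of length 1, while a linear penalty cannot tell them apart.

```python
import math

# gamma(g) for a gap of length g >= 1, returned as a (negative) score.
# d = opening cost, e = extension cost; the default values are hypothetical.


def linear_gap(g: int, d: float = 2.0) -> float:
    return -g * d


def affine_gap(g: int, d: float = 5.0, e: float = 1.0) -> float:
    return -d - (g - 1) * e


def log_gap(g: int, d: float = 5.0, e: float = 1.0) -> float:
    return -d - e * math.log(g)


for name, gamma in [("linear", linear_gap), ("affine", affine_gap), ("log", log_gap)]:
    one_long = gamma(5)       # a single gap of length 5
    five_short = 5 * gamma(1)  # five separate gaps of length 1
    print(f"{name}: one gap {one_long:.2f}, five gaps {five_short:.2f}")
```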

Answer 14

- Identity scoring: match 1, mismatch 0
- DNA scoring: match 3, transition 2, transversion 0
- Chemical similarity scoring: higher scores for pairs of amino acids that are chemically similar (size, charge, hydrophobicity)
- Observed matrices: derived from observed substitution frequencies
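As an illustration, the DNA scoring scheme from the card (match 3, transition 2, transversion 0) can be built as a 4x4 matrix. `dna_score` is a hypothetical helper name.

```python
def dna_score(x: str, y: str) -> int:
    """DNA scoring: match 3, transition 2, transversion 0."""
    purines = {"A", "G"}
    pyrimidines = {"C", "T"}
    if x == y:
        return 3
    same_class = {x, y} <= purines or {x, y} <= pyrimidines
    return 2 if same_class else 0


BASES = "ACGT"
matrix = {(x, y): dna_score(x, y) for x in BASES for y in BASES}
for x in BASES:
    print(x, [matrix[(x, y)] for y in BASES])
```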

Answer 15

DNA: M(a,b) > 0 for a match, M(a,b) <= 0 for a mismatch.
Amino acids:
- PAM (Point Accepted Mutation, also read as Percent Accepted Mutation): each score reflects how likely the pair is aligned because of homology rather than by chance
- BLOSUM (BLOcks SUbstitution Matrix): substitution matrices for amino acids, built by direct observation of blocks of proteins with similar functions

Answer 16

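PAM is derived from closely related proteins (at least 85% identical).
PAM1: 1 accepted substitution per 100 amino acid residues (1% change). Applying PAM1 N times (N percent accepted mutations) gives the PAM-N matrix; PAM250 = 250 evolutionary steps.
Positive score: common replacement. Negative score: unlikely replacement.
Low PAM numbers (short evolutionary distance): short sequences with strong local similarity. High PAM numbers (long distance): long sequences with weak similarity.
- PAM60: close relations (~60% identity)
- PAM120: general use (~40% identity)
- PAM250: distant relations (~20% identity)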
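"Applying PAM1 N times" means taking the N-th power of the PAM1 mutation probability matrix. The sketch below uses a toy 4-letter alphabet in which every letter changes with 1% probability per step, spread evenly over the other letters; this is a hypothetical model, not Dayhoff's amino acid data, and `mat_mult`/`mat_power` are hypothetical helpers. The diagonal of PAM-N gives the expected fraction of unchanged positions after N steps.

```python
def mat_mult(p: list[list[float]], q: list[list[float]]) -> list[list[float]]:
    """Plain square matrix product."""
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_power(m: list[list[float]], n: int) -> list[list[float]]:
    """m**n by repeated squaring (n >= 1)."""
    result = None
    base = m
    while n > 0:
        if n & 1:
            result = base if result is None else mat_mult(result, base)
        base = mat_mult(base, base)
        n >>= 1
    return result


# Toy "PAM1": 1% chance of change per step, spread evenly (hypothetical data)
SIZE = 4
pam1 = [[0.99 if i == j else 0.01 / 3 for j in range(SIZE)] for i in range(SIZE)]

for n in (1, 60, 120, 250):
    pam_n = mat_power(pam1, n)
    print(f"PAM{n}: expected identity {pam_n[0][0]:.2f}")
```

In this toy model identity levels off at 25% (chance for 4 letters) instead of the lower chance level of 20 amino acids, so only the trend, not the numbers, carries over to real PAM matrices.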

Answer 17

BLOSUM is based on the Blocks database: local alignments (blocks) of highly conserved protein domains, observed in families of proteins with the same function. Motifs are identified as blocks of local alignments.
BLOSUMn is built from sequences that are at most n percent identical (more similar sequences are clustered), so a higher n means more closely related sequences. BLOSUM62 is the default matrix in BLAST 2.0.
- BLOSUM62: general use
- BLOSUM80: close relations
- BLOSUM45: distant relations
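The core of the BLOSUM construction is counting residue pairs in the columns of a block and turning observed pair frequencies into log-odds scores in half-bit units. This is a simplified sketch: it skips the clustering and weighting of sequences above n% identity that real BLOSUMn matrices use, and `blosum_like` plus the toy blocks are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def blosum_like(block: list[str]) -> dict[tuple[str, str], int]:
    """Log-odds scores (half-bits) from one ungapped block, without clustering."""
    # Observed pairs: every unordered pair of residues within each column
    pair_counts = Counter()
    for column in zip(*block):
        for x, y in combinations(column, 2):
            pair_counts[tuple(sorted((x, y)))] += 1
    total_pairs = sum(pair_counts.values())
    q = {pair: count / total_pairs for pair, count in pair_counts.items()}

    # Background frequency of each residue: p_i = q_ii + sum over j != i of q_ij / 2
    p = Counter()
    for (x, y), freq in q.items():
        p[x] += freq / 2
        p[y] += freq / 2

    scores = {}
    for (x, y), freq in q.items():
        expected = p[x] * p[y] if x == y else 2 * p[x] * p[y]
        scores[(x, y)] = round(2 * math.log2(freq / expected))
    return scores


print(blosum_like(["AAS", "AAT", "ASS", "AAS"]))
```

Positive scores mark pairs seen more often than chance predicts, negative scores pairs seen less often, matching the sign convention on the PAM card.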

Answer 18

Approximate equivalences, from closely related sequences (top) to distantly related proteins (bottom):
PAM100 ~= BLOSUM90
PAM120 ~= BLOSUM80
PAM160 ~= BLOSUM60
PAM200 ~= BLOSUM52
PAM250 ~= BLOSUM45

Answer 19

- BLOSUM: best for local alignments
- BLOSUM62: the majority of weak protein similarities; moderately distant proteins
- BLOSUM45: long and weak alignments
- PAM250: sequences with 17-27% identity
- BLOSUM50: FASTA searches

Answer 20

Hamming distance: permits only substitutions (positive cost); defined only if |A| = |B|, with 0 <= d(A,B) <= |A|.
Example: X = aaaccd, Y = abcccd, d(X,Y) = 2 (two substitutions necessary).
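With unit substitution cost, this distance is just the number of differing positions. A one-function sketch (the name `hamming` is hypothetical), checked against the card's example:

```python
def hamming(a: str, b: str) -> int:
    """Number of positions where two equal-length strings differ (substitutions only)."""
    if len(a) != len(b):
        raise ValueError("only defined for |A| = |B|")
    return sum(x != y for x, y in zip(a, b))


print(hamming("aaaccd", "abcccd"))  # 2
```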

Answer 21

Edit (Levenshtein) distance: permits insertions, deletions and substitutions (positive costs); with unit costs, 0 <= d(A,B) <= max(|A|, |B|).
Example: X = aaaccd, Y = abccd, d(X,Y) = 2 (one substitution and one deletion).

Answer 22

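Permits only insertions (positive cost): d(A,B) = |B| - |A| if B can be obtained from A by insertions alone (A is a subsequence of B), otherwise infinity.
Example: X = aaccd, Y = abbaccd, d(X,Y) = 2 (two insertions).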
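B can be reached from A by insertions only exactly when A is a subsequence of B, so the distance reduces to a subsequence test. A minimal sketch with unit insertion cost; `insertion_distance` is a hypothetical name.

```python
import math


def insertion_distance(a: str, b: str) -> float:
    """|B| - |A| if A is a subsequence of B (insertions only), else infinity."""
    it = iter(b)
    # `ch in it` consumes the iterator up to the first match, so this
    # checks that the characters of a appear in b in the same order
    if all(ch in it for ch in a):
        return len(b) - len(a)
    return math.inf


print(insertion_distance("aaccd", "abbaccd"))  # 2
print(insertion_distance("abc", "acb"))        # inf
```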

Answer 23

D[i,0] = i
D[0,j] = j
D[i,j] = min( D[i-1,j-1] + f(i,j), D[i-1,j] + 1, D[i,j-1] + 1 )
f(i,j) = 0 if A[i] = B[j] (match), else 1
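The recurrence translates directly into a table fill. A minimal sketch (`edit_distance` is a hypothetical name); Python strings are 0-indexed, so A[i] on the card is `a[i - 1]` here.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance by dynamic programming."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i  # delete the first i symbols of a
    for j in range(m + 1):
        D[0][j] = j  # insert the first j symbols of b
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f = 0 if a[i - 1] == b[j - 1] else 1
            D[i][j] = min(D[i - 1][j - 1] + f, D[i - 1][j] + 1, D[i][j - 1] + 1)
    return D[n][m]


print(edit_distance("aaaccd", "abccd"))  # 2
```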

Answer 24

Complexity: space O(nm); time O(nm) to build the matrix, O(n+m) to backtrack.
D[i,0] = i
D[0,j] = j
D[i,j] = min( D[i-1,j-1] + f(i,j), D[i-1,j] + 1, D[i,j-1] + 1 )
f(i,j) = 0 if match, else 1
Follow the path back from D[n,m] to D[0,0]:
- vertical step: align a symbol of A with a gap in B
- horizontal step: align a symbol of B with a gap in A
- diagonal step: match or mismatch
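A sketch of the backtrack, assuming A runs down the rows and B across the columns, with '-' as the gap symbol. When several moves are optimal, this version prefers diagonal, then vertical, then horizontal; that tie-breaking order is an arbitrary choice, so other optimal alignments exist. `align_edit` is a hypothetical name.

```python
def align_edit(a: str, b: str) -> tuple[str, str, int]:
    """Edit distance plus one optimal alignment recovered by backtracking."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f = 0 if a[i - 1] == b[j - 1] else 1
            D[i][j] = min(D[i - 1][j - 1] + f, D[i - 1][j] + 1, D[i][j - 1] + 1)

    # Backtrack from D[n][m] to D[0][0]
    i, j = n, m
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1  # diagonal
        elif i > 0 and D[i][j] == D[i - 1][j] + 1:
            top.append(a[i - 1]); bottom.append("-"); i -= 1  # vertical: gap in B
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1  # horizontal: gap in A
    return "".join(reversed(top)), "".join(reversed(bottom)), D[n][m]


top, bottom, d = align_edit("aaaccd", "abccd")
print(top)
print(bottom)
print(d)
```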

Answer 25

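Needleman-Wunsch, with y = gap penalty and ro(i,j) = scoring matrix; B along the top, A along the left:
D[i,0] = i*y
D[0,j] = j*y
D[i,j] = max( D[i-1,j-1] + ro(i,j), D[i-1,j] + y, D[i,j-1] + y )
Free end gaps:
- D[0,j] = 0 (first row): gaps at the beginning of A are free
- D[i,0] = 0 (first column): gaps at the beginning of B are free
- best score taken over the last row D[n,j]: gaps at the tail of A are free
- best score taken over the last column D[i,m]: gaps at the tail of B are free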
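A minimal sketch of the global recurrence with a linear gap penalty, plus an optional end-gap-free mode following the rules above. The scoring function `simple` (match +1, mismatch -1) is an assumption standing in for ro(i,j); `needleman_wunsch` and the `free_end_gaps` flag are hypothetical names.

```python
def simple(x: str, y: str) -> int:
    """Toy ro(i, j): match +1, mismatch -1."""
    return 1 if x == y else -1


def needleman_wunsch(a: str, b: str, subst, gap: int, free_end_gaps: bool = False) -> int:
    """Global alignment score; a runs down the rows, b across the columns.

    gap is the (negative) gap penalty y. With free_end_gaps=True the first
    row and column stay 0 and the best score is taken over the last row and column.
    """
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    if not free_end_gaps:
        for i in range(1, n + 1):
            D[i][0] = i * gap
        for j in range(1, m + 1):
            D[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = max(
                D[i - 1][j - 1] + subst(a[i - 1], b[j - 1]),  # match / mismatch
                D[i - 1][j] + gap,  # vertical: gap in b
                D[i][j - 1] + gap,  # horizontal: gap in a
            )
    if free_end_gaps:
        return max(max(D[n]), max(row[m] for row in D))
    return D[n][m]


print(needleman_wunsch("ACGTAA", "CGT", simple, -1))                      # 0
print(needleman_wunsch("ACGTAA", "CGT", simple, -1, free_end_gaps=True))  # 3
```

The end-gap-free variant suits cases like overlapping fragments, where unaligned ends are missing data rather than real indels.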

Answer 26

D[i,0] = 0
D[0,j] = 0
D[i,j] = max( D[i-1,j-1] + ro(i,j), D[i-1,j] + y, D[i,j-1] + y, 0 )
ro(i,j) = scoring matrix, e.g. match 1, mismatch -1
y = gap penalty, e.g. -1
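The only changes from the global recurrence are the zero borders and the extra 0 in the max, which lets a local alignment restart anywhere. A score-only sketch using the example values from the card (match 1, mismatch -1, gap -1); `smith_waterman_score` and `simple` are hypothetical names.

```python
def simple(x: str, y: str) -> int:
    """Toy ro(i, j): match +1, mismatch -1."""
    return 1 if x == y else -1


def smith_waterman_score(a: str, b: str, subst, gap: int) -> int:
    """Best local alignment score; first row and column are 0."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = max(
                0,  # negative scores are reset to 0: start a new local alignment
                D[i - 1][j - 1] + subst(a[i - 1], b[j - 1]),
                D[i - 1][j] + gap,
                D[i][j - 1] + gap,
            )
            best = max(best, D[i][j])
    return best


print(smith_waterman_score("XXXABCYYY", "ZZABCZZ", simple, -1))  # 3
```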

Answer 27

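SW (Smith-Waterman) finds similar segments in two sequences:
- first row and first column = 0
- negative scores are set to 0
- traceback begins at the cell with the highest score and ends at a cell with score 0 (towards the top left)
NW (Needleman-Wunsch) aligns two complete sequences:
- first row and column = gap penalties (can be negative)
- traceback begins at cell (n,m) and ends at (0,0)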
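The SW traceback rule (start at the highest cell, stop at the first 0) can be sketched as follows. The toy scoring (match 1, mismatch -1, gap -1) and the tie-breaking order (diagonal, then vertical, then horizontal; first maximum found in row-major order) are assumptions, and `smith_waterman_align` is a hypothetical name.

```python
def simple(x: str, y: str) -> int:
    """Toy ro(i, j): match +1, mismatch -1."""
    return 1 if x == y else -1


def smith_waterman_align(a: str, b: str, subst, gap: int) -> tuple[str, str, int]:
    """Best local alignment: traceback from the highest cell until a 0 is reached."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = max(0, D[i - 1][j - 1] + subst(a[i - 1], b[j - 1]),
                          D[i - 1][j] + gap, D[i][j - 1] + gap)
            if D[i][j] > best:
                best, best_pos = D[i][j], (i, j)

    i, j = best_pos
    top, bottom = [], []
    while D[i][j] > 0:  # border cells are 0, so i, j > 0 inside the loop
        if D[i][j] == D[i - 1][j - 1] + subst(a[i - 1], b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif D[i][j] == D[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), best


print(smith_waterman_align("XXXABCYYY", "ZZABCZZ", simple, -1))  # ('ABC', 'ABC', 3)
```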
