2 - Amino acid scoring matrices and optimal pairwise alignment methods Flashcards

Question 1

Q

What can you do when a dotplot is cluttered?

Answer

A

Use a sliding window (e.g. 3-nt words)
Use a threshold (e.g. at least 2 of the 3 positions in a window must be identical before a dot is drawn)
Use amino acid residues instead of nucleotides!
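
A minimal sketch of the window-plus-threshold idea, assuming a dot is placed at (i, j) when the windows starting at those positions share at least `threshold` identical residues. The function name `dotplot` and its default values are illustrative choices, not taken from the lecture.

```python
def dotplot(seq_a, seq_b, window=3, threshold=2):
    """Return the set of (i, j) start positions whose windows of length
    `window` share at least `threshold` identical residues."""
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            # Count identical positions inside the two windows.
            matches = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            if matches >= threshold:
                dots.add((i, j))
    return dots


# Stricter threshold (3 of 3) leaves only the exact 3-letter word matches.
strict = dotplot("ACGT", "ACGT", window=3, threshold=3)
# Looser threshold (2 of 3) tolerates one mismatch per window.
loose = dotplot("ACGT", "ACTT", window=3, threshold=2)
```

Raising `window` or `threshold` removes more background dots, at the cost of possibly hiding weak but real similarity.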

Question 2

Q

Which type of amino acids experience the most substitutions?

Answer

A

Non-polar neutral amino acids, as they are chemically similar.

Question 3

Q

Give four scoring matrices for protein similarity

Answer

A

Unit cost: 1 match = 1 point
Genetic similarity matrices (codon similarity)
Chemical similarity (amino acid chemistry)
Empirical matrices (Dayhoff/PAM or BLOSUM): Real data on relative propensity of interchange between amino acids

Question 4

Q

What is 1 PAM? And therefore, what are two sequences 5 PAM apart?

Answer

A

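1 PAM = 1 percent accepted mutation (on average 1 accepted point mutation per 100 residues)

Two sequences 5 PAM apart are roughly 95% identical (repeated changes at the same site mean the observed identity can be slightly higher). The bigger the PAM number, the more divergent the sequences.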

Question 5

Q

What is the BLOSUM matrix?

Answer

A

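It’s similar to PAM, but much newer and built from more observed substitutions, taken from the BLOCKS database (conserved regions of multiple alignments of proteins)

BLOSUM30: Built from blocks in which sequences more than 30% identical are clustered together, so the counted substitutions come from sequences that are below 30% identical, etc. A lower BLOSUM number therefore suits more divergent sequences (opposite of the PAM naming scheme!)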

Question 6

Q

Review probability, starting on page 12 of lecture 2.

How do you denote joint probability? Conditional probability?

Answer

A

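Joint: P(A, B) = (number of cases with both A and B) / total
P(A, B) = P(A) x P(B) only when A and B are independent

Conditional: P(A | B) = (number of cases with both A and B) / (number of cases with B) = P(A, B) / P(B)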

Question 7

Q

What is the multiplication theorem for conditional probability?

Answer

A

P(D,G) = P(D | G) x P(G)
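
A small worked check of the joint, conditional and multiplication rules. The counts below are invented purely for illustration; D and G are treated as two arbitrary yes/no events.

```python
from fractions import Fraction

# Hypothetical counts for 100 observations (made up for this example).
counts = {
    (True, True): 12,    # D and G
    (True, False): 8,    # D but not G
    (False, True): 18,   # G but not D
    (False, False): 62,  # neither
}
total = sum(counts.values())
n_g = counts[(True, True)] + counts[(False, True)]  # cases with G
n_d = counts[(True, True)] + counts[(True, False)]  # cases with D

p_joint = Fraction(counts[(True, True)], total)      # P(D, G)
p_g = Fraction(n_g, total)                           # P(G)
p_d = Fraction(n_d, total)                           # P(D)
p_d_given_g = Fraction(counts[(True, True)], n_g)    # P(D | G)

# Multiplication theorem: P(D, G) = P(D | G) x P(G)
assert p_joint == p_d_given_g * p_g

# P(D) x P(G) equals P(D, G) only for independent events; not the case here.
independent = (p_d * p_g == p_joint)
```

With these counts P(D, G) = 3/25, P(G) = 3/10 and P(D | G) = 2/5, so the product rule holds while the independence shortcut does not.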

Question 8

Q

Matrices (eg. PAM and BLOSUM) of the interchange between various amino acids are calculated from REAL data.

How is this data usually displayed? How are these calculated?

Answer

A

log-odds matrices

= log( P(A,B) / (P(A) x P(B)) )

Where A and B are residues at a position and P(A,B) can be thought of as the joint probability of the two residues appearing at a given site having evolved from a common ancestor over time t.

The probability of seeing A and B together by chance alone (i.e. if they were independent) is given by the frequency of A (P(A)) times the frequency of B (P(B)).

If a value is positive, the pair is seen together more often than expected by chance, which suggests common ancestry. If it is negative, the pair is seen less often than expected by chance.
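
A minimal sketch of a single log-odds score, assuming base-2 logarithms and made-up frequencies. Published matrices also scale and round their values, which is not shown here.

```python
import math


def log_odds(p_ab, p_a, p_b):
    """Log-odds (base 2) of seeing residues A and B aligned, relative to
    the chance expectation P(A) x P(B)."""
    return math.log2(p_ab / (p_a * p_b))


# Hypothetical frequencies, chosen only to show the sign of the score.
favoured = log_odds(0.02, 0.1, 0.1)      # observed twice as often as chance
disfavoured = log_odds(0.005, 0.1, 0.1)  # observed half as often as chance
```

Here `favoured` comes out close to +1 and `disfavoured` close to -1, matching the rule that positive scores mean "more often than chance".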

Question 9

Q

What are two advantages and three disadvantages to the dot matrix method?

Answer

A

Pros

Good way of visualizing alignments (eg. can see repeat structures)
Programs can do this

Cons

Needs visual inspection
Subjective (devil in the clouds)
Need something to tell you what the OPTIMAL alignment is

Question 10

Q

What are two methods for finding the optimal alignment of two sequences?

Give the two used types of the most common method!

Answer

A

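Exhaustive: Evaluate all possible alignments and choose the best scoring one. Practically impossible even for two sequences of realistic length, because the number of possible alignments grows exponentially with sequence length.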

Dynamic programming algorithm: Time is proportional to N*M where N and M are the lengths of the target (N) and query (M) sequences (MUCH FASTER)

Needleman-Wunsch (global)
Smith-Waterman (local)
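
To get a feel for why the exhaustive approach is hopeless, the sketch below counts global alignments under one common convention: each alignment column is either a residue pair, a gap in the first sequence, or a gap in the second. That counting convention is an assumption, and other conventions give different but similarly explosive numbers.

```python
def count_alignments(n, m):
    """Number of gapped global alignments of sequences of length n and m,
    counting every column as pair / gap-in-first / gap-in-second."""
    # table[i][j] = number of ways to align prefixes of length i and j.
    table = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            table[i][j] = (
                table[i - 1][j - 1]  # last column pairs two residues
                + table[i - 1][j]    # last column: gap in second sequence
                + table[i][j - 1]    # last column: gap in first sequence
            )
    return table[n][m]


for length in (3, 10, 50):
    # Compare the alignment count with the N*M cells that dynamic
    # programming has to fill for the same pair of sequences.
    print(length, count_alignments(length, length), length * length)
```

Two sequences of length 3 already have 63 possible alignments, while dynamic programming only fills 9 cells.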

Question 11

Q

List the pairwise alignment methods

Answer

A

Dot plots
Global alignment (NW), used when sequences are co-linear.
Local alignment (SW); FASTA and BLAST are faster heuristic local-alignment methods. Can use for mosaic or repetitive proteins (where co-linearity is not necessarily expected).

Question 12

Q

What is the Needleman-Wunsch algorithm?

Answer

A

Global
Makes a 2D matrix of similarity values
Builds new matrix by adding up elements in a systematic manner
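Traces back through the summed matrix from one corner to the opposite corner along the highest-scoring path (in the usual formulation, from the bottom-right cell back to the top-left), so the alignment spans both sequences end to end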
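
The steps above can be sketched as follows. This is a minimal illustration rather than the lecture's exact scheme: it assumes match = +1, mismatch = -1 and a linear gap penalty of -1, fills the matrix from the top-left, and traces back from the bottom-right cell.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # First row and column: a prefix aligned entirely against gaps.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    # Fill: each cell is the best of diagonal, up (gap in b), left (gap in a).
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    # Traceback from the bottom-right corner to the top-left corner.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

When several paths tie, this sketch prefers the diagonal move, so other equally good alignments may exist.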

Question 13

Q

What is the Smith-Waterman algorithm?

Answer

A

Local
Needs to give a negative penalty to mismatches and gaps (which NW doesn’t).
Stop extending when score = 0 or less
The entire matrix (that is created) must be searched for regions with high local similarity
Keeps cumulative total and no elements are allowed a score less than zero
Tracing the optimal path starts at the highest score in the matrix
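
A minimal sketch of these rules, assuming match = +2, mismatch = -1 and a linear gap penalty of -2 (illustrative values, not from the lecture). Scores are floored at zero, the traceback starts at the highest cell, and it stops when a zero is reached.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # No cell is allowed to drop below zero.
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            # Track the highest-scoring cell in the whole matrix.
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Traceback starts at the best cell and stops at the first zero.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTTGATTACATTT", "CCGATTACACC"))
```

In the example call, only the shared GATTACA core is reported and the unrelated flanking residues are left out of the alignment.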