2 - Amino acid scoring matrices and optimal pairwise alignment methods Flashcards

1
Q

What can you do when a dotplot is cluttered?

A
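  • Use a sliding window (e.g. compare 3-nt words instead of single nucleotides)
  • Use a threshold (e.g. plot a dot only if at least 2 of the 3 positions in the window are identical)
  • Use amino acid residues instead of nucleotides: with a 20-letter alphabet, random matches are much rarer than with 4 letters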
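The window and threshold filters can be sketched as below. This is a minimal illustration, assuming a window of 3 and a threshold of 2 identical positions; the function names `dot_plot` and `render` are hypothetical. A dot is kept only when two whole windows agree well enough, so isolated chance matches drop out. The same code works unchanged on protein strings.

```python
from __future__ import annotations


def dot_plot(seq1: str, seq2: str, window: int = 3, threshold: int = 2) -> list[list[bool]]:
    """Filtered dot plot: cell (i, j) is True when the window starting at
    seq1[i] and the window starting at seq2[j] share at least `threshold`
    identical positions."""
    rows = max(len(seq1) - window + 1, 0)
    cols = max(len(seq2) - window + 1, 0)
    plot = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Count identical residues position by position inside the two windows
            matches = sum(seq1[i + k] == seq2[j + k] for k in range(window))
            row.append(matches >= threshold)
        plot.append(row)
    return plot


def render(plot: list[list[bool]]) -> str:
    """Draw the plot as text: '*' for a dot, '.' for an empty cell."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in plot)


print(render(dot_plot("ACGTACGT", "ACGTTCGT")))
```

Raising `threshold` toward `window` makes the plot stricter and less cluttered; lowering it shows more distant similarity along with more noise.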
2
Q

Which type of amino acids experience the most substitutions?

A

Non-polar neutral amino acids, as they are chemically similar.

3
Q

Give four types of scoring matrix for protein similarity

A
  • Unit cost (identity): a match scores 1 point, a mismatch scores 0
  • Genetic similarity matrices (codon similarity)
  • Chemical similarity (amino acid chemistry)
  • Empirical matrices (Dayhoff/PAM or BLOSUM): based on real data on the relative propensity of amino acids to interchange
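The difference between the first and third kinds of matrix can be seen by scoring the same ungapped pair both ways. This is a sketch only: the chemical groups below are a hypothetical, simplified grouping by side-chain chemistry, not a published matrix, and the scores (2 for identity, 1 for same group) are assumptions chosen for illustration.

```python
# Hypothetical, simplified grouping by side-chain chemistry (illustration only,
# not a published matrix).
GROUPS = [set("GAVLIMFWP"), set("STCYNQ"), set("DE"), set("KRH")]


def unit_cost(a: str, b: str) -> int:
    """Unit-cost (identity) scoring: 1 for identical residues, 0 otherwise."""
    return 1 if a == b else 0


def chemical(a: str, b: str) -> int:
    """Toy chemical-similarity scoring: 2 if identical, 1 if in the same group, else 0."""
    if a == b:
        return 2
    return 1 if any(a in g and b in g for g in GROUPS) else 0


def score_ungapped(s1: str, s2: str, scorer) -> int:
    """Sum the pairwise scores of two equal-length, ungapped sequences."""
    return sum(scorer(a, b) for a, b in zip(s1, s2))


print(score_ungapped("LIVE", "VLVD", unit_cost))  # 1: only V/V is identical
print(score_ungapped("LIVE", "VLVD", chemical))   # 5: L/V, I/L and E/D are similar
```

Unit cost sees almost no similarity here, while the chemistry-aware scheme rewards the conservative substitutions.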
4
Q

What is 1 PAM? And therefore, what are two sequences 5 PAM apart?

A

1 PAM = 1 percent accepted mutation: one accepted point mutation per 100 residues

Two sequences 5 PAM apart are approximately 95% identical. The bigger the PAM number, the more divergent the sequences.
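The 95% figure is simple percent-identity arithmetic, assuming a short PAM distance where each position has changed at most once: 5 changes per 100 residues leave about 95 of 100 positions identical. At larger distances, repeated substitutions at the same site mean identity falls more slowly than "100 minus the PAM number" (PAM250 still corresponds to roughly 20% identity). A minimal sketch of the identity calculation, with `percent_identity` as a hypothetical helper:

```python
def percent_identity(aligned1: str, aligned2: str) -> float:
    """Percent of aligned positions with identical residues (equal-length, ungapped input)."""
    if len(aligned1) != len(aligned2) or not aligned1:
        raise ValueError("expected two non-empty sequences of equal length")
    same = sum(a == b for a, b in zip(aligned1, aligned2))
    return 100.0 * same / len(aligned1)


# Hypothetical 20-residue pair differing at one position (1 change per 20 = 5 per 100)
print(percent_identity("ACDEFGHIKLMNPQRSTVWY", "ACDEFGHIKLMNPQRSTVWA"))  # 95.0
```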

5
Q

What is the BLOSUM matrix?

A

BLOSUM (BLOcks SUbstitution Matrix) is an empirical matrix like PAM, but much newer and based on many more recorded substitutions, taken from the BLOCKS database (ungapped conserved regions of multiple alignments of related proteins).

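BLOSUM30: created from sequences that are below 30% identical; likewise for BLOSUM62, BLOSUM80, etc. This is the opposite of the PAM naming scheme: a higher BLOSUM number means more similar sequences, whereas a higher PAM number means more divergent ones.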

6
Q

Review probability, starting on page 12 of lecture 2.

How do you denote joint probability? Conditional probability?

A


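Joint: P(A, B) = N(A and B) / N(total), the probability that A and B both occur
If A and B are independent: P(A, B) = P(A) x P(B)

Conditional: P(A | B) = N(A and B) / N(B) = P(A, B) / P(B), the probability of A given that B has occurred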
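Both definitions can be checked by counting. This sketch uses a small set of hypothetical observations (each records whether A and B occurred) and exact fractions, so the results can be compared without rounding error. In this toy data A and B are not independent, so P(A) x P(B) does not equal P(A, B).

```python
from fractions import Fraction

# Hypothetical observations: each tuple records (did A occur?, did B occur?)
observations = [
    (True, True), (True, False), (False, True), (True, True),
    (False, False), (False, True), (True, True), (False, False),
]

total = len(observations)
n_a_and_b = sum(a and b for a, b in observations)
n_a = sum(a for a, _ in observations)
n_b = sum(b for _, b in observations)

p_joint = Fraction(n_a_and_b, total)       # P(A, B) = N(A and B) / N(total)
p_a = Fraction(n_a, total)                 # P(A)
p_b = Fraction(n_b, total)                 # P(B)
p_a_given_b = Fraction(n_a_and_b, n_b)     # P(A | B) = N(A and B) / N(B)

print(p_joint, p_a_given_b, p_joint == p_a * p_b)  # 3/8 3/5 False
```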

7
Q

What is the multiplication theorem for conditional probability?

A

P(D,G) = P(D | G) x P(G)
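A worked instance of the rule, using made-up numbers purely for illustration (P(G) = 0.2 and P(D | G) = 0.5 are assumptions, not values from the lecture). Rearranging the rule recovers the definition of conditional probability:

```latex
\begin{aligned}
P(D, G) &= P(D \mid G)\,P(G) = 0.5 \times 0.2 = 0.1 \\
P(D \mid G) &= \frac{P(D, G)}{P(G)} = \frac{0.1}{0.2} = 0.5
\end{aligned}
```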

8
Q

Matrices of the interchange between various amino acids (e.g. PAM and BLOSUM) are calculated from REAL data.

How is this data usually displayed? How are these calculated?

A

As log-odds matrices. Each entry is calculated as:

score(A, B) = log( P(A,B) / (P(A) x P(B)) )

Where A and B are the residues at an aligned position, and P(A,B) can be thought of as the joint probability of the two residues appearing at the same site because they evolved from a common ancestor over time t.

P(A) x P(B) is the probability of seeing A and B together purely by chance, i.e. if the two residues were unrelated and occurred independently at their background frequencies P(A) and P(B).

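A positive score means the pair is seen in related sequences more often than expected by chance (evidence of common ancestry); a negative score means less often than chance; zero means no difference.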
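A minimal sketch of turning frequencies into a matrix entry. The frequencies below are hypothetical, and the choice of log base 2 with a scale of 2 (half-bit units, as BLOSUM62 is commonly described) is an assumption; real matrix construction also has to handle details such as the ordering of unequal pairs, which this sketch ignores.

```python
import math


def log_odds(p_ab: float, p_a: float, p_b: float, scale: float = 2.0) -> int:
    """Scaled, rounded log-odds score: scale * log2(P(A,B) / (P(A) * P(B)))."""
    return round(scale * math.log2(p_ab / (p_a * p_b)))


# Hypothetical frequencies: both residues have background frequency 0.05,
# so the chance expectation is 0.05 * 0.05 = 0.0025
print(log_odds(0.01, 0.05, 0.05))     # 4: seen 4x more often than chance
print(log_odds(0.00125, 0.05, 0.05))  # -2: seen half as often as chance
print(log_odds(0.0025, 0.05, 0.05))   # 0: exactly as often as chance
```

Rounding to integers is what makes published matrices look like small whole-number tables.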

9
Q

What are two advantages and three disadvantages of the dot matrix method?

A

Pros

  • Good way of visualizing alignments (e.g. repeat structures are easy to see)
  • Easily generated automatically by programs

Cons

  • Needs visual inspection
  • Subjective: patterns are open to interpretation (devil in the clouds)
  • Does not give the OPTIMAL alignment; another method is needed to find it
10
Q

What are two methods for finding the optimal alignment of two sequences?

Give the two main variants of the most common method.

A

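Exhaustive: evaluate all possible alignments and choose the best-scoring one. In practice this is infeasible even for just two sequences, because the number of possible alignments grows exponentially with sequence length.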

Dynamic programming algorithm: Time is proportional to N*M where N and M are the lengths of the target (N) and query (M) sequences (MUCH FASTER)

  • Needleman-Wunsch (global)
  • Smith-Waterman (local)
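The gap between the two approaches can be made concrete. This sketch compares the number of possible global alignments with the number of cells a dynamic-programming matrix fills. It counts alignments as distinct paths through the matrix (diagonal, down or right steps), which gives the Delannoy number D(m, n); other counting conventions give different totals that still grow exponentially. The function names are hypothetical.

```python
from math import comb


def count_alignments(m: int, n: int) -> int:
    """Number of distinct paths through the DP grid (diagonal, down or right steps)
    for sequences of lengths m and n: the Delannoy number D(m, n)."""
    return sum(comb(m, k) * comb(n, k) * 2 ** k for k in range(min(m, n) + 1))


def dp_cells(m: int, n: int) -> int:
    """Cells filled by dynamic programming, including the initial row and column."""
    return (m + 1) * (n + 1)


for length in (5, 10, 50):
    print(length, count_alignments(length, length), dp_cells(length, length))
```

The alignment count explodes while the cell count grows only with N x M, which is why dynamic programming is the practical choice.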
11
Q

List the pairwise alignment methods

A
  • Dot plots
  • Global alignment (NW), used when sequences are co-linear.
  • Local alignment (SW); heuristic tools such as FASTA and BLAST also find local alignments. Can be used for mosaic or repetitive proteins (where co-linearity is not necessarily expected).
12
Q

What is the Needleman-Wunsch algorithm?

A
  • Global
  • Makes a 2D matrix of similarity values
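  • Builds a new score matrix cell by cell, from top left to bottom right: each cell takes the highest of (diagonal cell + match/mismatch score), (cell above + gap penalty) and (cell to the left + gap penalty)
  • Traces back from the bottom-right cell to the top-left cell along the path that produced the highest score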
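A minimal Needleman-Wunsch sketch, assuming a linear gap penalty and simple hypothetical scores (match +1, mismatch -1, gap -2) in place of a PAM or BLOSUM lookup. It folds the similarity values and the cumulative sums into a single matrix `F`, fills it from top left to bottom right, and traces back from the bottom-right cell.

```python
def needleman_wunsch(s1: str, s2: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment with a linear gap penalty; returns (score, aligned1, aligned2)."""
    n, m = len(s1), len(s2)

    def sub(i: int, j: int) -> int:
        return match if s1[i - 1] == s2[j - 1] else mismatch

    # F[i][j] = best score for aligning s1[:i] with s2[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap  # s1 prefix aligned entirely to gaps
    for j in range(1, m + 1):
        F[0][j] = j * gap  # s2 prefix aligned entirely to gaps

    # Fill from top left to bottom right
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(i, j),  # diagonal: match/mismatch
                          F[i - 1][j] + gap,            # up: gap in s2
                          F[i][j - 1] + gap)            # left: gap in s1

    # Trace back from the bottom-right cell to the top-left cell
    a1, a2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(i, j):
            a1.append(s1[i - 1])
            a2.append(s2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            a1.append(s1[i - 1])
            a2.append("-")
            i -= 1
        else:
            a1.append("-")
            a2.append(s2[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(a1)), "".join(reversed(a2))


score, top, bottom = needleman_wunsch("GATTACA", "GATACA")
print(score)  # 4: six matches and one gap
print(top)
print(bottom)
```

Swapping `sub` for a lookup into a substitution matrix would give a protein version; when several paths tie, this traceback simply prefers diagonal, then up, then left.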
13
Q

What is the Smith-Waterman algorithm?

A
  • Local
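  • Needs to give negative scores to mismatches and gaps (a simple NW scheme does not require this); otherwise scores never fall and the "local" alignment would just extend across the whole sequences
  • Traceback stops when a cell with a score of 0 (or less) is reached
  • The entire matrix (once filled) must be searched for regions with high local similarity
  • Keeps a cumulative total, but no cell is allowed a score below zero (negative values are reset to 0)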
  • Tracing the optimal path starts at the highest score in the matrix
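A minimal Smith-Waterman sketch along the same lines, again assuming a linear gap penalty and hypothetical scores (match +2, mismatch -1, gap -2). It differs from a global alignment in three places: cells are floored at 0, the whole matrix is searched for the highest score, and the traceback starts there and stops at the first cell with score 0.

```python
def smith_waterman(s1: str, s2: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Local alignment with a linear gap penalty; returns (score, aligned1, aligned2)."""
    n, m = len(s1), len(s2)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # first row and column stay 0
    best, best_pos = 0, (0, 0)

    def sub(i: int, j: int) -> int:
        return match if s1[i - 1] == s2[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Cumulative total, but no cell may drop below zero
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub(i, j),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:  # search the whole matrix for the highest score
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a cell with score 0 is reached
    a1, a2 = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(i, j):
            a1.append(s1[i - 1])
            a2.append(s2[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            a1.append(s1[i - 1])
            a2.append("-")
            i -= 1
        else:
            a1.append("-")
            a2.append(s2[j - 1])
            j -= 1
    return best, "".join(reversed(a1)), "".join(reversed(a2))


print(smith_waterman("TTACGTT", "GGACGGG"))  # (6, 'ACG', 'ACG')
```

Only the shared core is reported; the unrelated flanks never contribute because their scores are reset to 0.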