Substitution matrices Flashcards

1
Q

Why are not all missense mutations equal?

A

Amino acids differ in their properties (e.g. polar vs non-polar), so some substitutions are better tolerated than others

2
Q

Alignment of sequences

A

Gaps are inserted to maximise the number of matching residues in the alignment

3
Q

Haemoglobin vs Myoglobin

A

Similar but not the same

Slide one protein sequence along the other to identify similarities

Plot the matches
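
The "slide and plot matches" idea can be sketched as a dot plot. This is a minimal Python sketch, assuming plain string sequences; the two short peptides are made-up placeholders (not the real haemoglobin or myoglobin sequences), and `dot_plot` and `render` are hypothetical helper names.

```python
def dot_plot(seq_a, seq_b):
    """Return a grid with True wherever seq_a[i] equals seq_b[j]."""
    return [[a == b for b in seq_b] for a in seq_a]


def render(grid, seq_a, seq_b):
    """Draw the grid as text: '*' marks a match, '.' a mismatch."""
    lines = ["  " + seq_b]
    for residue, row in zip(seq_a, grid):
        lines.append(residue + " " + "".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


# Made-up toy peptides (placeholders, not real globin sequences).
seq_a = "VLSPADK"
seq_b = "VLSEGEK"

grid = dot_plot(seq_a, seq_b)
print(render(grid, seq_a, seq_b))
```

Runs of matches along a diagonal mark regions where the two proteins line up; a jump to a neighbouring diagonal suggests that a gap is needed.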

4
Q

Insertion of gaps

A

Inserting gaps into a sequence allows more amino acids to match, giving a greater overall identity
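
A small illustration of how one gap can raise the number of matches. This is a sketch only: the sequences are made up, `count_matches` is a hypothetical helper, and the gap is placed by hand rather than found by an alignment algorithm.

```python
def count_matches(row_a, row_b):
    """Count columns where both rows hold the same residue (gaps never match)."""
    return sum(1 for a, b in zip(row_a, row_b) if a == b and a != "-")


# Made-up toy sequences: the second one lacks the 'E' of the first.
full = "ACDEFGHIK"
short_ungapped = "ACDFGHIK"
short_gapped = "ACD-FGHIK"  # gap inserted by hand where 'E' is missing

print("without gap:", count_matches(full, short_ungapped))
print("with gap:   ", count_matches(full, short_gapped))
```

Without the gap, everything after the missing residue is out of register; with it, the downstream residues line up again.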

5
Q

Forming an empirical substitution matrix from an alignment

A

V: G, A, A, L, L, K, I, P, K, Q, T, A, F, D
= 3A, 2L, 2K, 1G, 1I, 1P, 1Q, 1T, 1F, 1D
- total 14 substitutions from 20 occurrences

Valine 70% substituted
Of the substitutions, 21% are A; 14% each are L, K; 7% each are the others

L: N, K, K, K, E, V, A, V, I, S, H, F, I, I, I, H, M, S, M
= 4I, 3K, 2H, 2V, 2M, 2S, 1F, 1N, 1E, 1A
- total 19 substitutions from 31 occurrences

Leucine 61% substituted
Of the substitutions, 21% are I; 16% are K; 11% each are H, V, M, S; 5% each are the others
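
The tallies above can be reproduced with a short Python sketch. The substitution lists and occurrence counts are copied from the card; `summarise` is a hypothetical helper name.

```python
from collections import Counter


def summarise(substitutions, occurrences):
    """Return (counts, fraction of occurrences substituted, share of each replacement)."""
    counts = Counter(substitutions)
    total = len(substitutions)
    shares = {aa: n / total for aa, n in counts.items()}
    return counts, total / occurrences, shares


# Residues seen in place of valine and leucine, as listed on the card.
valine_subs = "G A A L L K I P K Q T A F D".split()
leucine_subs = "N K K K E V A V I S H F I I I H M S M".split()

for name, subs, occurrences in [("V", valine_subs, 20), ("L", leucine_subs, 31)]:
    counts, substituted, shares = summarise(subs, occurrences)
    print(name, f"{substituted:.0%} substituted")
    for aa, n in counts.most_common():
        print(f"  {aa}: {n} ({shares[aa]:.0%} of substitutions)")
```

Repeating this for every amino acid gives one row of counts per residue, which is the raw material for an empirical substitution matrix.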

6
Q

Point accepted mutations (PAM) matrices

A
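  • Take pairs of closely related orthologous sequences from different species
  • Repeat what was just shown for all amino acids, to compile an empirical substitution matrix
  • PAM1 = the matrix for sequences that have diverged by 1 accepted point mutation per 100 amino acids (about 1% change); the number measures amount of change, not millions of years
  • Likewise PAM50, PAM500 (in practice extrapolated from PAM1 by repeated matrix multiplication)
  • Choose the appropriate matrix depending on how divergent the sequences you are aligning are: low PAM numbers for close relatives, high numbers for distant ones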
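
PAM matrices for larger distances are conventionally obtained by multiplying the PAM1 mutation probability matrix by itself. The sketch below illustrates only that extrapolation step, on a toy 3-letter alphabet with hypothetical probabilities; real PAM matrices cover 20 amino acids and are then converted to log-odds scores. `mat_mul` and `mat_power` are hypothetical helper names.

```python
def mat_mul(x, y):
    """Multiply two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(m, n):
    """Return m multiplied by itself so that it appears n times (n >= 1)."""
    result = m
    for _ in range(n - 1):
        result = mat_mul(result, m)
    return result


# Hypothetical one-step mutation probabilities for a toy 3-letter alphabet.
# Row = original residue, column = replacement; each row sums to 1.
step1 = [
    [0.990, 0.007, 0.003],
    [0.007, 0.990, 0.003],
    [0.003, 0.003, 0.994],
]

step50 = mat_power(step1, 50)
for row in step50:
    print(" ".join(f"{p:.3f}" for p in row))
```

After many steps the probability of a residue staying unchanged falls and the off-diagonal probabilities rise, which is why higher-numbered PAM matrices are more tolerant of mismatches.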
7
Q

Blocks substitution (BLOSUM) matrices

A

Blocks substitution (BLOSUM) matrices
- Based on the now defunct Blocks aligner and its curated database Blocks+

  • From the Blocks+ database, select all the alignments
  • Choose an identity threshold to thin them
  • For instance, with a 62% threshold, sequences within a block that are at least 62% identical to one another are grouped into a cluster
  • Each cluster then counts as a single sequence, so near-duplicate sequences do not dominate the counts
  • Then empirically assemble the substitution matrix from the substitutions observed between clusters
  • This gives BLOSUM-62
  • Likewise BLOSUM-90 (using a 90% threshold)
  • Note that the numbers go in the opposite direction to PAM numbers: a higher BLOSUM number suits more similar sequences, a higher PAM number suits more divergent ones
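
The identity threshold can be pictured as grouping the sequences of an aligned block by pairwise identity. This is a minimal sketch, assuming single-linkage grouping and a simple per-column identity measure; the sequences are made up, and `percent_identity` and `cluster_by_identity` are hypothetical helpers, not the Blocks pipeline itself.

```python
def percent_identity(row_a, row_b):
    """Percentage of columns with identical residues in two equal-length rows."""
    matches = sum(1 for a, b in zip(row_a, row_b) if a == b)
    return 100.0 * matches / len(row_a)


def cluster_by_identity(rows, threshold):
    """Group rows so that each row is at least `threshold` percent identical
    to some other member of its group (single linkage)."""
    clusters = []
    for row in rows:
        linked, unlinked = [], []
        for cluster in clusters:
            if any(percent_identity(row, member) >= threshold for member in cluster):
                linked.append(cluster)
            else:
                unlinked.append(cluster)
        # Merge the new row with every cluster it links to.
        merged = [row] + [member for cluster in linked for member in cluster]
        clusters = unlinked + [merged]
    return clusters


# Made-up aligned block of four sequences, ten columns each.
block = ["ACDEFGHIKL", "ACDEFGHIKV", "ACDEFGHLKV", "MNPQRSTVWY"]

for threshold in (62, 95):
    print(threshold, cluster_by_identity(block, threshold))
```

With a lower threshold more sequences merge into the same group, so the remaining comparisons are between more divergent sequences; a higher threshold keeps more of the closely related sequences separate.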
8
Q

BLOSUM-62 substitution matrix for proteins

A
  • Common substitutions score highly
  • Rare substitutions score low (negative values)
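
Scores of this kind are conventionally log-odds values: how often a pair is observed in the alignments, relative to how often it would be expected by chance. BLOSUM-62 is usually distributed in half-bit units, i.e. 2 × log2(observed / expected), rounded to an integer. The sketch below shows only that arithmetic; the frequencies are hypothetical, `half_bit_score` is a hypothetical helper, and how the expected frequency is computed depends on the counting convention used.

```python
import math


def half_bit_score(observed, expected):
    """Log-odds score in half-bit units, rounded to the nearest integer."""
    return round(2 * math.log2(observed / expected))


# Hypothetical pair frequencies, chosen only to show the sign of the score.
print(half_bit_score(0.02, 0.01))    # seen twice as often as chance: positive
print(half_bit_score(0.01, 0.01))    # seen exactly as often as chance: zero
print(half_bit_score(0.0025, 0.01))  # seen a quarter as often as chance: negative
```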
9
Q

How are mismatches scored?

A
  • On a variable scale - depends on likelihood of mismatch

Likelihood of mismatch is determined by scoring curated datasets, e.g.
Peptide sequences from closely related proteins, extrapolated to a chosen amount of divergence: the Point Accepted Mutation series (PAM1, PAM50, PAM500), where the number is the count of accepted point mutations per 100 residues

Peptide sequences aligned with Blocks and clustered at a given identity threshold: the BLOSUM series. BLOSUM-62 is derived from blocks in which sequences at least 62% identical are clustered and counted as one, BLOSUM-90 uses a 90% threshold, etc.

Others: Gonnet, Jones-Taylor-Thornton (JTT), Whelan & Goldman (WAG)
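
Once a matrix is chosen, an ungapped alignment is scored by summing the matrix entry for each aligned pair. This is a minimal sketch using only a handful of entries from the standard BLOSUM-62 table (check them against an authoritative copy before relying on them); `pair_score` and `alignment_score` are hypothetical helpers, and gaps are not handled.

```python
# A few BLOSUM-62 entries; the full matrix covers all 20 amino acids.
SCORES = {
    ("L", "L"): 4,
    ("I", "I"): 4,
    ("K", "K"): 5,
    ("L", "I"): 2,   # similar hydrophobic residues: mild positive score
    ("L", "K"): -2,  # hydrophobic vs charged: negative score
    ("I", "K"): -3,
}


def pair_score(a, b):
    """Look up a symmetric score regardless of argument order."""
    return SCORES[(a, b)] if (a, b) in SCORES else SCORES[(b, a)]


def alignment_score(row_a, row_b):
    """Sum the substitution scores over the columns of an ungapped alignment."""
    return sum(pair_score(a, b) for a, b in zip(row_a, row_b))


print(alignment_score("LIK", "LLK"))  # 4 + 2 + 5
print(alignment_score("LIK", "KIL"))  # -2 + 4 - 2
```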
