Structure Comparison and Classification Flashcards

Week 4 Lecture 1

1
Q

ClustalW

A
  • Position-specific gap opening and extension penalties
  • Uses different amino acid substitution matrices depending on how divergent the sequences are: one suited to close relatives, another to distant relatives
  • When the structure is known, we increase the gap penalty within helices and strands and decrease it between them, forcing gaps to occur more frequently in loops
  • If no structure is known, we can use simple rules based on the residues present (e.g. runs of hydrophilic residues, which suggest a loop) and on where gaps already occur in the alignment
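A minimal Python sketch of the two gap-penalty rules above: structure-based when a secondary-structure string is available, residue-based otherwise. The base penalty, the multipliers, the minimum run length of 5 and the hydrophilic residue set are illustrative assumptions, not ClustalW's actual parameters or code, and the function name `gap_open_penalties` is hypothetical.

```python
# Illustrative per-position gap-opening penalties in the spirit of ClustalW.
# All numeric values and the residue set are hypothetical choices.
BASE_GAP_OPEN = 10.0
HYDROPHILIC = set("GPSNDQEKR")  # assumed hydrophilic residue set

def gap_open_penalties(seq, ss=None, core_factor=3.0, loop_factor=0.5,
                       hydrophilic_run=5, hydrophilic_factor=0.5):
    """Return one gap-opening penalty per position of `seq`.

    With a secondary-structure string `ss` ('H' helix, 'E' strand, anything
    else treated as loop), penalties are raised in helices/strands and
    lowered in loops. Without structure, positions inside a run of at least
    `hydrophilic_run` hydrophilic residues (a likely loop) are lowered.
    """
    penalties = [BASE_GAP_OPEN] * len(seq)
    if ss is not None:
        if len(ss) != len(seq):
            raise ValueError("ss must be the same length as seq")
        for i, state in enumerate(ss):
            penalties[i] *= core_factor if state in "HE" else loop_factor
        return penalties
    # No structure known: scan for runs of hydrophilic residues.
    i = 0
    while i < len(seq):
        if seq[i] in HYDROPHILIC:
            j = i
            while j < len(seq) and seq[j] in HYDROPHILIC:
                j += 1
            if j - i >= hydrophilic_run:
                for k in range(i, j):
                    penalties[k] *= hydrophilic_factor
            i = j  # continue after the run
        else:
            i += 1
    return penalties
```

For example, `gap_open_penalties("AAAA", ss="HHCC")` gives `[30.0, 30.0, 5.0, 5.0]`, so an aligner using these values would prefer to open gaps in the loop positions.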
2
Q

Profile-based sequence search methods

A
  • We can identify patterns of conserved residues by comparing related sequences within a protein family
  • Even distant members of the family tend to retain these patterns of conserved residues
  • We can make a profile which encapsulates these patterns and use it to detect more distantly related sequences
  • Highly conserved positions usually correspond to residues important for folding or packing in the buried core, or to functional residues in the active site
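One common way to locate the conserved positions described above is per-column Shannon entropy: a column dominated by a single residue has entropy near 0 bits. This sketch assumes equal-length aligned sequences with '-' for gaps; the 0.5-bit cutoff and the function names are illustrative choices, not taken from the lecture.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column, ignoring gaps."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return math.inf  # an all-gap column is treated as not conserved
    n = len(residues)
    return -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())

def conserved_positions(alignment, max_entropy=0.5):
    """Indices of alignment columns whose entropy is at or below `max_entropy`."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [i for i in range(length)
            if column_entropy([s[i] for s in alignment]) <= max_entropy]
```

For example, `conserved_positions(["ACDKG", "ACEKG", "ACDRG", "SCDKG"])` returns `[1, 4]`: only the all-C and all-G columns pass the cutoff.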
3
Q

PSI-BLAST

A
  • First, it constructs a multiple alignment of the related sequences identified by an initial BLAST search
  • Then it estimates the residue frequencies at each position to construct a position-specific score matrix (PSSM), also known as a weight matrix or 1D profile
  • Then it uses the 1D profile to scan the database, repeating the cycle iteratively so that newly detected sequences are added to the profile
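A simplified sketch of the PSSM step: count residues per column with a pseudocount, convert the frequencies to log-odds scores against a background (uniform 1/20 assumed here), then slide the matrix along a query without gaps. Real PSI-BLAST additionally weights sequences, derives pseudocounts from a substitution matrix, performs gapped alignment and iterates; `build_pssm` and `best_hit` are hypothetical names.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(alignment, pseudocount=1.0, background=None):
    """Log-odds PSSM (one dict per column) from equal-length aligned sequences.

    score(column, aa) = log2(observed frequency / background frequency).
    Gaps and non-standard residues are ignored when counting.
    """
    if background is None:
        background = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    pssm = []
    for i in range(len(alignment[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in alignment:
            if seq[i] in counts:
                counts[seq[i]] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log2((counts[aa] / total) / background[aa])
                     for aa in AMINO_ACIDS})
    return pssm

def best_hit(pssm, sequence):
    """Ungapped scan of `sequence`; returns (best score, start index)."""
    width = len(pssm)
    best = (-math.inf, -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Residues not in the matrix (e.g. 'X') contribute 0
        score = sum(pssm[i].get(aa, 0.0) for i, aa in enumerate(window))
        best = max(best, (score, start))
    return best
```

For example, a PSSM built from `["ACDK", "ACDR", "ACEK"]` scores highest on the `ACDK` window of `"GGGACDKGGG"` (start index 3). In a PSI-BLAST-style loop, the hits above a threshold would be added to the alignment and the PSSM rebuilt.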
4
Q

HMMs for protein family recognition

A
  • Sequences are aligned to a probabilistic model of interconnected match, insert and delete states
  • The model contains statistical information on the observed and expected variation at each position
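A compact Viterbi sketch for a profile HMM with match (M), insert (I) and delete (D) states, returning the best log-odds score of a sequence against the model. To stay short it assumes transition probabilities shared by every position, insert states that emit with background frequencies, and no I-to-D or D-to-I transitions; tools such as HMMER use richer, position-specific models. The function and parameter names are hypothetical.

```python
import math

def viterbi_profile_hmm(seq, match_emissions, background, trans):
    """Best log-odds (natural log) score of `seq` against a simple profile HMM.

    match_emissions: one dict per match state M1..ML, residue -> probability.
    background: residue -> probability; insert states are assumed to emit with
        these frequencies, so insert emissions contribute a log-odds of 0.
    trans: transition probabilities shared by all positions, with keys
        'MM', 'MI', 'MD', 'IM', 'II', 'DM', 'DD' (no I->D or D->I moves).
    """
    NEG = -math.inf
    L, n = len(match_emissions), len(seq)
    t = {k: (math.log(p) if p > 0 else NEG) for k, p in trans.items()}
    # V?[j][i] = best score of a path ending in state j after emitting seq[:i]
    VM = [[NEG] * (n + 1) for _ in range(L + 1)]
    VI = [[NEG] * (n + 1) for _ in range(L + 1)]
    VD = [[NEG] * (n + 1) for _ in range(L + 1)]
    VM[0][0] = 0.0  # M0 acts as the silent begin state
    for j in range(L + 1):
        for i in range(n + 1):
            if j > 0 and i > 0:
                # Match state Mj emits seq[i-1]
                p = match_emissions[j - 1].get(seq[i - 1], 0.0)
                if p > 0:
                    VM[j][i] = math.log(p / background[seq[i - 1]]) + max(
                        VM[j - 1][i - 1] + t["MM"],
                        VI[j - 1][i - 1] + t["IM"],
                        VD[j - 1][i - 1] + t["DM"],
                    )
            if i > 0:
                # Insert state Ij emits seq[i-1] (log-odds 0) and stays at position j
                VI[j][i] = max(VM[j][i - 1] + t["MI"], VI[j][i - 1] + t["II"])
            if j > 0:
                # Delete state Dj is silent: it skips match position j
                VD[j][i] = max(VM[j - 1][i] + t["MD"], VD[j - 1][i] + t["DD"])
    # Silent end state, entered from ML, IL or DL
    return max(VM[L][n] + t["MM"], VI[L][n] + t["IM"], VD[L][n] + t["DM"])
```

A sequence matching the consensus of the match states scores well above an unrelated one, and a sequence missing a consensus residue can still be aligned by passing through a delete state, at the cost of the M-to-D and D-to-M transitions.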
5
Q

Protein Language Models (PLMs)

A
  • Adapted from language models for text: the model learns which features best predict a residue (masked or next) from the surrounding sequence
  • These features can be represented as a vector (embedding) for each residue and used to train classifiers for specific tasks
    e.g. ProtBERT, ProtTrans
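The sketch below illustrates only the second bullet: per-residue vectors feeding a simple classifier. `toy_embed` is a hypothetical stand-in returning hand-made features; a real PLM such as ProtBERT would supply learned, context-dependent embeddings instead. The labels (1 for hydrophobic residues) are toy values chosen so the result is easy to check, and logistic regression is just one possible classifier.

```python
import math

# Hypothetical toy features; a real PLM would learn its own representation.
HYDROPHOBIC = set("AVILMFWC")
CHARGED = set("DEKR")

def toy_embed(seq):
    """Hypothetical PLM stand-in: one vector per residue, [bias, hydrophobic, charged].

    A real PLM (e.g. ProtBERT) would return a learned, context-dependent
    vector for each residue instead of these hand-made features.
    """
    return [[1.0, float(r in HYDROPHOBIC), float(r in CHARGED)] for r in seq]

def train_logistic(vectors, labels, lr=0.5, epochs=200):
    """Fit per-residue logistic-regression weights by batch gradient descent."""
    w = [0.0] * len(vectors[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(vectors, labels):
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of label 1
            for k, xk in enumerate(x):
                grad[k] += (p - y) * xk
        w = [wi - lr * g / len(vectors) for wi, g in zip(w, grad)]
    return w

def predict(w, vectors):
    """Label 1 where the linear score is positive (probability > 0.5)."""
    return [int(sum(wi * xi for wi, xi in zip(w, x)) > 0) for x in vectors]
```

For example, training on `toy_embed("AVDEKLIR")` with labels `[1, 1, 0, 0, 0, 1, 1, 0]` yields weights that reproduce those labels. Swapping `toy_embed` for real PLM embeddings leaves the classifier code unchanged, which is the practical appeal of per-residue vectors.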