Structure Comparison and Classification Flashcards
Week 4 Lecture 1
1
Q
ClustalW
A
- Position-specific gap opening and extension penalties
- Uses different amino acid substitution matrices depending on divergence: one for close relatives, another for distant relatives
- When the structure is known, gap penalties are increased within helices and strands and decreased between them, so that gaps occur more frequently in loops
- If no structure is known, simple rules are used instead, based on the residues present at each position and the frequency of gaps there
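The structure-based rule above can be sketched as a small function that turns a secondary structure string into one gap opening penalty per position. This is an illustration only: the function name, the base penalty of 10, and the scaling factors are assumptions chosen for the example, not ClustalW's actual parameters.

```python
def gap_open_penalties(ss, base=10.0, core_factor=2.0, loop_factor=0.5):
    """Return one gap opening penalty per alignment position.

    ss is a secondary structure string: 'H' = helix, 'E' = strand,
    any other character is treated as loop. All numbers are illustrative.
    """
    penalties = []
    for state in ss:
        if state in "HE":
            # Raise the penalty inside helices and strands to discourage gaps.
            penalties.append(base * core_factor)
        else:
            # Lower the penalty in loops so gaps are placed there instead.
            penalties.append(base * loop_factor)
    return penalties


if __name__ == "__main__":
    print(gap_open_penalties("CCHHHHCCEEEC"))
```

An aligner would then look up the penalty for the position where a gap is about to be opened instead of using a single global value.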
2
Q
Profile-based sequence search methods
A
- We can identify patterns of conserved residues by comparing related sequences within a protein family
- Even the most distant members of the family usually retain these patterns of conserved residues
- We can make a profile which encapsulates these patterns and use it to detect more distantly related sequences
- Highly conserved positions usually correspond either to residues important for folding or packing in the buried core, or to functional residues within the active site
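One simple way to find the conserved positions described above is to score each alignment column by its Shannon entropy (0 bits means fully conserved). The sketch below assumes a gap-free toy alignment and an arbitrary entropy cutoff of 0.5 bits; the function names and the example sequences are made up for illustration.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conserved_positions(alignment, max_entropy=0.5):
    """Return indices of columns whose entropy is at or below the cutoff.

    alignment is a list of equal-length aligned sequences. Gap characters
    are not treated specially here (an assumption of this sketch).
    """
    columns = zip(*alignment)
    return [i for i, col in enumerate(columns)
            if column_entropy(col) <= max_entropy]


if __name__ == "__main__":
    toy_alignment = ["GHSCA", "GHTCL", "GHACV", "GHSCI"]  # invented sequences
    print(conserved_positions(toy_alignment))
```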
3
Q
PSI-BLAST
A
- First, it constructs a multiple alignment of all the related sequences identified by BLAST
- Then it estimates the residue frequencies at each position to construct a position-specific score matrix (PSSM), also known as a weight matrix or 1D profile
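- Then it uses the 1D profile to scan the database
- The search is iterated: newly found sequences are added to the alignment and the profile is rebuilt, until no new sequences are found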
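The PSSM step can be sketched as log-odds scores computed from column frequencies, followed by a scan of a target sequence with the resulting matrix. This is a simplified illustration, not PSI-BLAST's actual procedure: it assumes a uniform background frequency, a fixed pseudocount of 1, no sequence weighting, and an ungapped scan. All function names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Build a list of per-column log-odds score dictionaries."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    pssm = []
    for column in zip(*alignment):
        residues = [r for r in column if r in AMINO_ACIDS]  # ignore gaps
        total = len(residues) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (residues.count(aa) + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


def best_window(pssm, sequence):
    """Slide the PSSM along the sequence; return (start, score) of the best
    ungapped window, or None if the sequence is shorter than the PSSM.
    The sequence must contain only the 20 standard amino acids."""
    width = len(pssm)
    best = None
    for start in range(len(sequence) - width + 1):
        score = sum(pssm[i][sequence[start + i]] for i in range(width))
        if best is None or score > best[1]:
            best = (start, score)
    return best


if __name__ == "__main__":
    pssm = build_pssm(["GHC", "GHC", "GHC"])  # invented toy alignment
    print(best_window(pssm, "AAGHCAA"))
```

Residues seen often in a column get positive scores and unseen residues get negative scores, which is what lets the profile favour family members during the database scan.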
4
Q
HMMs for protein family recognition
A
- The sequence is aligned to a probabilistic model of interconnected match, delete and insert states
- The model contains statistical information on observed and expected positional variation
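A minimal sketch of how a sequence could be scored against such a model is the Viterbi recursion over match, insert and delete states. Several simplifications are assumed here and are not taken from the lecture: the same transition probabilities are used at every position, insert states emit residues uniformly, the begin state is treated as a silent match state 0, and no end transition is scored. The emission and transition values are invented for the example.

```python
import math

NEG_INF = float("-inf")
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def peaked_emissions(consensus, p_consensus=0.81):
    """Toy match emissions: most probability on the consensus residue."""
    rest = (1.0 - p_consensus) / (len(AMINO_ACIDS) - 1)
    return [{aa: (p_consensus if aa == c else rest) for aa in AMINO_ACIDS}
            for c in consensus]


def viterbi_score(sequence, match_emissions, trans):
    """Log probability of the best path of `sequence` through the model."""
    n, k_max = len(sequence), len(match_emissions)
    log_t = {name: math.log(p) for name, p in trans.items()}
    log_insert = math.log(1.0 / len(AMINO_ACIDS))  # uniform insert emission
    m = [[NEG_INF] * (k_max + 1) for _ in range(n + 1)]  # match states
    ins = [[NEG_INF] * (k_max + 1) for _ in range(n + 1)]  # insert states
    d = [[NEG_INF] * (k_max + 1) for _ in range(n + 1)]  # delete states
    m[0][0] = 0.0  # begin state
    for i in range(n + 1):          # i = residues consumed so far
        for k in range(k_max + 1):  # k = model position reached
            if i > 0 and k > 0:
                emit = math.log(match_emissions[k - 1][sequence[i - 1]])
                m[i][k] = emit + max(m[i - 1][k - 1] + log_t["MM"],
                                     ins[i - 1][k - 1] + log_t["IM"],
                                     d[i - 1][k - 1] + log_t["DM"])
            if i > 0:
                ins[i][k] = log_insert + max(m[i - 1][k] + log_t["MI"],
                                             ins[i - 1][k] + log_t["II"])
            if k > 0:
                d[i][k] = max(m[i][k - 1] + log_t["MD"],
                              d[i][k - 1] + log_t["DD"])
    return max(m[n][k_max], ins[n][k_max], d[n][k_max])


if __name__ == "__main__":
    emissions = peaked_emissions("GHC")
    transitions = {"MM": 0.9, "MI": 0.05, "MD": 0.05,
                   "IM": 0.5, "II": 0.5, "DM": 0.5, "DD": 0.5}
    for seq in ("GHC", "GAC", "GC"):
        print(seq, viterbi_score(seq, emissions, transitions))
```

A sequence matching the consensus scores highest; a substitution is penalised through the match emission, and a missing residue is penalised through the delete transitions.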
5
Q
Protein Language Models (PLMs)
A
- Adapted from language models trained on text: they learn which features best predict the next (or a masked) residue in a protein sequence
- These features can be represented in a vector for each residue to train classifiers for specific tasks
- Examples: ProtBERT, ProtTrans
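The idea of one vector per residue feeding a downstream classifier can be sketched without a real model. In the example below, `embed_residues` is a hypothetical stand-in that returns one-hot vectors; a real PLM would return learned, context-dependent vectors instead. Mean pooling is one common way (assumed here) to turn per-residue vectors into a single fixed-length vector per protein.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def embed_residues(sequence):
    """Stand-in for a PLM: one one-hot vector per residue.

    This is NOT a learned embedding; it only mimics the output shape
    (one vector per residue) so the pooling step can be shown.
    """
    return [[1.0 if aa == residue else 0.0 for aa in AMINO_ACIDS]
            for residue in sequence]


def mean_pool(residue_vectors):
    """Average per-residue vectors into one vector for the whole protein."""
    n = len(residue_vectors)
    return [sum(values) / n for values in zip(*residue_vectors)]


if __name__ == "__main__":
    per_residue = embed_residues("MKVAA")  # invented example sequence
    per_protein = mean_pool(per_residue)
    # Per-residue vectors suit tasks such as secondary structure prediction;
    # the pooled vector suits per-protein tasks such as family classification.
    print(len(per_residue), len(per_protein))
```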