Lecture 8: Protein and Antibody Structure Prediction Flashcards
Why is protein structure prediction so difficult?
Because the search space of possible conformations is enormous and the number of local energy minima grows exponentially with chain length.
What are some examples of ab initio folding (force-field and simulation based)?
Long simulations of small proteins like the work by Duan and Kollman in 1998 on a 36-residue protein.
The Folding@home project
What are some examples of ab initio folding (knowledge-based scoring functions)?
Rosetta (Baker laboratory)
I-TASSER (Zhang laboratory)
What are the types of template based methods?
Homology modeling (sequence-sequence alignment)
Protein threading (sequence-structure alignment)
What are some tools to create structural models of sequences?
SWISS-MODEL
MODELLER
What are some tools to evaluate the generated model?
SWISS-MODEL
Methods that use Ramachandran plots
MolProbity
ProSA
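Ramachandran-based checks rest on the backbone dihedral angles phi (C of residue i-1, N, Cα, C of residue i) and psi (N, Cα, C of residue i, N of residue i+1). The following is a minimal sketch of how one such angle could be computed from four atom positions. The function name `dihedral` and the toy coordinates are illustrative assumptions and are not taken from any of the tools listed above.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points (hypothetical helper)."""
    b0 = _sub(p0, p1)          # from the central bond back to the first atom
    b1 = _sub(p2, p1)          # central bond
    b2 = _sub(p3, p2)          # from the central bond on to the last atom
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Remove the components of b0 and b2 that lie along the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Toy geometry: the first three points are fixed, the fourth is rotated about the z axis.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
quarter = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

A Ramachandran plot is then the scatter of (phi, psi) pairs for every residue, and residues falling in sparsely populated regions are flagged for inspection.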
What are the steps in comparative modelling?
1. Template search using BLAST.
2. Template alignment.
3. Model building based on the template structure
4. Model evaluation
What are some typical comparative model errors?
Distortions or shifts in aligned regions
Regions without a template
Incorrect sidechain packing
Incorrect template choice
Misalignment
How is quality assessed in SWISS-MODEL?
It uses the QMEAN scoring function.
QMEAN provides both local (per-residue) and global (QMEAN4 and QMEAN6) scores.
What are the principles of AlphaFold?
- Utilizes multiple sequence alignments (MSA) which contain information about the relative positioning of residues.
- Employs a deep learning model to iteratively improve structure prediction
On what data is AlphaFold trained?
AlphaFold uses PDB data as a ground truth for training and optimization
How is MSA used in AlphaFold?
Covariation between columns of an MSA can be used to predict relative positions between residues.
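As a classical illustration of column covariation, the sketch below computes the mutual information between two MSA columns. This is a hand-written statistic for intuition only: AlphaFold learns such signals with a neural network rather than computing mutual information explicitly, and the function name and toy alignment are assumptions.

```python
import math
from collections import Counter

def mutual_information(msa, i, j):
    """Mutual information in bits between columns i and j of an aligned MSA."""
    n = len(msa)
    count_i = Counter(seq[i] for seq in msa)
    count_j = Counter(seq[j] for seq in msa)
    count_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in count_ij.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts
        ratio = count * n / (count_i[a] * count_j[b])
        mi += p_ab * math.log2(ratio)
    return mi

# Toy alignment: column 0 is conserved, columns 1 and 3 change together.
toy_msa = ["AKLE", "AKLE", "ARLD", "ARLD"]
covarying = mutual_information(toy_msa, 1, 3)
conserved = mutual_information(toy_msa, 0, 1)
```

In the toy alignment, columns 1 and 3 carry one bit of shared information, while the conserved column 0 carries none, which is why fully conserved positions say nothing about contacts.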
How is the iterative modelling procedure used in AlphaFold?
Involves a pair representation of residue interactions and a structure module to map this to a 3D structure.
The predicted structure is then recycled to iteratively improve its quality.
What is LDDT?
Local Distance Difference Test - a superposition-free metric that measures how different the local spatial arrangement of atoms is between two structures.
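A minimal sketch of a Cα-only, whole-chain LDDT is shown below, assuming the commonly used 15 Å inclusion radius and the four tolerance thresholds 0.5, 1, 2 and 4 Å. The function name `lddt_ca` and the toy coordinates are illustrative, and the full metric works on all atoms and reports per-residue values.

```python
import math

def lddt_ca(ref, model, cutoff=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Fraction of reference Cα-Cα distances preserved in the model (0 to 1)."""
    if len(ref) != len(model):
        raise ValueError("structures must have the same number of residues")
    preserved = 0
    total = 0
    for i in range(len(ref)):
        for j in range(i + 1, len(ref)):
            d_ref = math.dist(ref[i], ref[j])
            if d_ref >= cutoff:
                continue  # only local pairs in the reference are scored
            diff = abs(d_ref - math.dist(model[i], model[j]))
            total += len(thresholds)
            preserved += sum(diff < t for t in thresholds)
    return preserved / total if total else 0.0

ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
# Same shape, only translated: no superposition is needed to score it.
shifted = [(x + 10.0, y + 5.0, z - 2.0) for x, y, z in ref]
# Last residue pulled 3 Å away from its reference position.
distorted = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (10.6, 0.0, 0.0)]

score_shifted = lddt_ca(ref, shifted)
score_distorted = lddt_ca(ref, distorted)
```

Because only internal distances are compared, the translated copy scores 1.0, whereas the distorted copy loses credit at the tighter thresholds.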
What is pLDDT?
Per-residue LDDT-Cα - for predicted structures, it represents the expected value of Cα LDDT as a confidence score for each residue.
What is PAE?
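Predicted Aligned Error - the expected position error (in Å) at one residue when the predicted and true structures are aligned on another residue; it is used to judge confidence in the relative arrangement of domains in a predicted structure.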
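One simple way to read a PAE matrix is to average it over residue pairs that span two domains. The sketch below assumes the matrix is already loaded as a list of lists in Å; the function name, the domain boundaries and the toy values are all hypothetical.

```python
def mean_interdomain_pae(pae, domain_a, domain_b):
    """Average PAE over residue pairs spanning two domains, in both directions."""
    values = []
    for i in domain_a:
        for j in domain_b:
            # PAE is not symmetric, so both orientations are included.
            values.append(pae[i][j])
            values.append(pae[j][i])
    return sum(values) / len(values)

# Toy 4-residue matrix: residues 0-1 form one domain, residues 2-3 another.
toy_pae = [
    [0, 1, 20, 22],
    [1, 0, 21, 19],
    [18, 20, 0, 1],
    [22, 18, 1, 0],
]
between = mean_interdomain_pae(toy_pae, [0, 1], [2, 3])
within = mean_interdomain_pae(toy_pae, [0], [1])
```

Low values within a domain combined with high values between domains suggest that each domain is modelled confidently but their relative placement is not.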
What are some AlphaFold-related tools?
ColabFold (lightweight modelling)
AlphaFold-Multimer (protein-protein complexes)
AlphaMissense (effects of amino acid substitutions)
Foldseek (searching for structure matches)
What are some drawbacks of AlphaFold?
Limited to proteins with a sequence length of less than 1400 amino acids.
Longer proteins are modelled as overlapping fragments (“contigs”) rather than as a single chain.
Provides no information on protein dynamics.
Does not tell you the process with which a protein is folded!
What is the general structure of antibodies?
They are composed of two identical heavy chains and two identical light chains.
The antigen-binding site is located in the variable region and is formed by three complementarity-determining region (CDR) loops per chain.
What are challenges in antibody modelling?
Large range of linker length
Linker flexibility
Modeling CDR loop conformations
What are tools for antibody modelling?
VCAb web application
abYsis
SAbDab and SAbPred
What are steps in antibody modelling?
Front: Steps in Antibody Modelling - Template-based approach (Step 0)
Back: Retrieve VH and VL sequences. The template structure is chosen for the target antibody, either for the VH and VL domains separately, or for both domains combined.
Front: Alternative method for assembling VH and VL domains in antibody modeling (Step 1-2)
Back: A fragment-based method can be used to assemble the VH and VL domains.
Front: Steps after assembling VH and VL in antibody modeling (Step 3 onwards)
Back: After the framework template is chosen, the VH-VL orientation is modeled. The ‘canonical’ CDR loops (CDRH1, CDRH2, CDRL1, CDRL2, CDRL3) are modeled next, followed by CDRH3. Models may also be refined with a force field.
Front: Modelling ‘canonical’ CDR loops
Back: Once a template framework structure is selected, a database method is generally used with a CDR-specific database for each CDR loop. If a suitable decoy is not found, the most sequence-similar, length-matched CDR loop (based on its BLOSUM62 score) is used as the template.
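The fallback described on this card can be sketched as a length filter followed by a similarity ranking. In the sketch below the scoring function is pluggable, and the default `identity_score` is a toy stand-in that counts identical positions; it is not BLOSUM62. The function names and loop sequences are hypothetical.

```python
def identity_score(a, b):
    """Toy stand-in for a substitution-matrix score: +1 per identical position."""
    return sum(x == y for x, y in zip(a, b))

def pick_loop_template(target, candidates, score=identity_score):
    """Return the best-scoring candidate with the same length as target, or None."""
    same_length = [c for c in candidates if len(c) == len(target)]
    if not same_length:
        return None  # no length-matched loop is available
    return max(same_length, key=lambda c: score(target, c))

# Hypothetical loop sequences; the last candidate is one residue too short.
target_loop = "GFTFSSYA"
database = ["GFTFSDYA", "GYTFTSYW", "GFTFSSY"]
best = pick_loop_template(target_loop, database)
missing = pick_loop_template("ABC", ["ABCD"])
```

Passing a real substitution-matrix scorer as `score` would bring the ranking closer to the BLOSUM62-based choice named on the card.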
Front: Modelling CDR loops when no length-matched templates are found
Back: The most sequence-similar loop is then used as the template for ab initio modeling by programs like MODELLER.
Front: Challenges in CDRH3 loop modelling
Back: CDRH3 varies widely in sequence and length, which makes template search difficult. The loop is also flexible, and its conformation can be antigen-dependent.