Proteins and Antibody Modelling Flashcards
Week 8 Lecture 1
Steps in comparative modelling
- Start (retrieve the sequence)
- Template search
- Target/template alignment
- Model building
- Model evaluation
What are antibodies?
- Immunoglobulin proteins produced by B-cells as part of an immune response
- Composed of two identical heavy chains and two identical light chains
- The antigen binding site, composed of 3 CDR loops per chain, is located in the variable region
Antibody variable regions
- Made up of 4 framework regions (FWRs) and 3 complementarity-determining regions (CDRs)
- The FWRs remain relatively constant between antibody chains of the same class
- The CDR loops vary greatly and all 6 (CDRH1, CDRH2, CDRH3, CDRL1, CDRL2, CDRL3) together constitute an antigen-binding paratope
- CDRH3 is particularly important for antigen binding
Challenges in antibody modelling
- Modelling the full-length antibody
- Large range of linker lengths
- Linker flexibility
- Modelling CDR loop conformations
- The model’s accuracy is only known post-hoc
- No current publicly available pipeline comments on the in vitro ‘developability’ of the antibody
What is homology modelling?
Constructing a model of the “target” protein from its amino acid sequence and an experimental 3D structure of a related homologous protein.
What do we need to perform homology modelling?
- A tool to search databases of known structures and sequences (e.g. BLAST searches against the PDB)
- A tool to align sequences (a multiple sequence alignment (MSA) program, e.g. T-Coffee)
- A way to compare structures (RMSD)
- A tool to model sequences based on homology to an available structure in the PDB (SwissModel)
- A tool to assess the modelled structure (SwissModel, MolProbity, ProSA)
How do we transfer 3D information?
Piece together individual fragments from the template protein that match your sequence and map them to the model
Root Mean Square Deviation (RMSD)
A measure of the average distance between equivalent atoms of superimposed molecules, calculated as the square root of the mean squared distance
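The calculation can be sketched in a few lines of Python. This is a minimal illustration, not a full structure-comparison tool: it assumes the two structures are already superimposed and that both coordinate lists hold the same atoms in the same order. The function name `rmsd` is chosen here for illustration only.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) tuples.

    Assumes the structures are already superimposed and that atom i in
    coords_a is equivalent to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and the same length")
    # Sum of squared distances between equivalent atoms
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    # Square root of the mean squared distance
    return math.sqrt(squared / len(coords_a))
```

Because the distances are squared before averaging, a few badly placed atoms (for example in a flexible loop) raise the RMSD more than many small deviations do.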
Template search (Homology modelling)
Step 1
- Query the new sequence in a database to find sequences related to the query
- BLAST is a fast tool for rapidly comparing sequences with every sequence in the database and reporting similar sequences
- Sequence identity between the query and the template should be above 30%
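The 30% identity check can be illustrated with a short Python sketch. It assumes the query and template have already been aligned (equal-length strings with `-` for gaps) and divides by the number of ungapped columns. That normalisation is an assumption made here, and search tools may normalise differently. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_query, aligned_template, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == gap or t == gap:
            continue  # skip columns containing a gap
        compared += 1
        if q == t:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# Toy alignment: 4 identical residues out of 5 ungapped columns
identity = percent_identity("ACDE-G", "ACDFHG")
usable_template = identity > 30.0
```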
Template alignment (Homology modelling)
Step 2
- A multiple sequence alignment (MSA) can be used to see which positions are highly conserved
- T-Coffee is a commonly used program
Model building (Homology modelling)
Step 3
- SwissModel or MODELLER
- Extract and satisfy spatial restraints
Model evaluation (Homology modelling)
Step 4
- Ramachandran plot
- Typical errors include regions without a template, distortions/shifts in aligned regions, and sidechain packing
- The resolution/quality of template structure is important to model quality
- Tools: ProSA-web, MolProbity, SwissModel QA; CASP (a community experiment that benchmarks prediction methods)
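A Ramachandran plot shows the backbone torsion angles phi and psi of each residue. Phi is the dihedral defined by C(i-1), N(i), CA(i), C(i) and psi by N(i), CA(i), C(i), N(i+1). The sketch below shows how one such dihedral could be computed from four atom positions in plain Python. The helper names are illustrative, and real evaluation tools do considerably more than this.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points (x, y, z)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

Calling `dihedral` once with the phi atoms and once with the psi atoms of a residue gives one point on the plot.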
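Confidence I: pLDDT
Predicted Local Distance Difference Test: a per-residue score from 0 to 100 that estimates how well the predicted local structure would agree with the “true” (ground truth) structure. Higher values mean higher confidence.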
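As an illustration, pLDDT scores are often read in bands. The sketch below uses the bands commonly shown for AlphaFold predictions (above 90 very high, 70 to 90 confident, 50 to 70 low, below 50 very low). These thresholds and the handling of values exactly on a boundary are assumptions added here and are not taken from the lecture.

```python
def plddt_band(score):
    """Map a per-residue pLDDT score (0-100) to a confidence band.

    Thresholds follow the commonly used AlphaFold colour bands (assumed here).
    """
    if not 0 <= score <= 100:
        raise ValueError("pLDDT must be between 0 and 100")
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"

# Hypothetical per-residue scores for a short stretch of a model
bands = [plddt_band(s) for s in (95.2, 81.0, 63.5, 42.1)]
```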
Confidence II: PAE
Predicted Aligned Error
- It is a pairwise estimate of positional error
- Expected positional error at residue X when the predicted and true structures are aligned on residue Y
- Allows us to assess the confidence of global arrangement
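One way to use PAE for the global arrangement is to average it over residue pairs that sit in different domains, for example VH against VL. The sketch below assumes the PAE values are available as a square nested list indexed by residue and that the domain boundaries are known. The function name and the toy matrix are hypothetical.

```python
def mean_interdomain_pae(pae, domain_a, domain_b):
    """Average PAE over all residue pairs with one residue in each domain.

    pae is a square nested list; domain_a and domain_b are iterables of
    residue indices. Both directions are included because PAE is not symmetric.
    """
    values = []
    for i in domain_a:
        for j in domain_b:
            values.append(pae[i][j])
            values.append(pae[j][i])
    if not values:
        raise ValueError("both domains must contain at least one residue")
    return sum(values) / len(values)

# Toy 4-residue example: residues 0-1 form one domain, 2-3 the other
toy_pae = [
    [0, 1, 20, 20],
    [1, 0, 20, 20],
    [10, 10, 0, 1],
    [10, 10, 1, 0],
]
inter = mean_interdomain_pae(toy_pae, range(0, 2), range(2, 4))
```

A low value within each domain combined with a high inter-domain value suggests that the domains are individually well predicted but their relative orientation is uncertain.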
Steps in antibody modelling
- Template search
- Target-template alignment
- Model building
- Model evaluation
Template search (antibody modelling)
Step 1
- A template structure is chosen for the target antibody, either for the VH and VL domains separately, or for both domains combined.
- Alternatively, a fragment-based method can be used to assemble the VH and VL domains.
Target-template alignment (antibody modelling)
Step 2
The VH–VL orientation is then modelled after choosing the framework template.
Model building (antibody modelling)
The ‘canonical’ CDR loops (standard and easier to model: CDRH1, CDRH2, CDRL1, CDRL2, CDRL3) are modelled first, followed by CDRH3. The models may also be refined.
CDR loop modelling
- Occurs after choosing the framework template
- A CDR-specific database is used for each CDR loop
- If a suitable decoy isn’t found in that database, an Fv-specific database is used.
- If a decoy is still not found, the most sequence–similar, length–matched CDR loop (based on its BLOSUM62 score) is used as the template.
- If no length–matched templates are found, MODELLER uses the most sequence–similar loop as the template for ab initio modelling.
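The fallback order above can be sketched as a simple cascade. Everything in this example is an assumption made for illustration: the databases are plain lists of loop sequences, `is_suitable` is a hypothetical predicate standing in for whatever criterion a real pipeline uses, and `identity_score` is a stand-in for a BLOSUM62 score, not BLOSUM62 itself.

```python
def identity_score(a, b):
    """Stand-in similarity score: count of identical positions (NOT BLOSUM62)."""
    return sum(x == y for x, y in zip(a, b))

def choose_loop_template(target, cdr_db, fv_db, all_loops, is_suitable,
                         score=identity_score):
    """Return (source, template) following the fallback order in the notes."""
    # 1 and 2: CDR-specific database first, then the Fv-specific database
    for name, db in (("cdr_db", cdr_db), ("fv_db", fv_db)):
        hits = [loop for loop in db if is_suitable(target, loop)]
        if hits:
            return name, max(hits, key=lambda loop: score(target, loop))
    # 3: most sequence-similar loop of the same length
    same_length = [loop for loop in all_loops if len(loop) == len(target)]
    if same_length:
        return "length_matched", max(same_length, key=lambda loop: score(target, loop))
    # 4: most sequence-similar loop of any length, used to seed ab initio modelling
    return "ab_initio_seed", max(all_loops, key=lambda loop: score(target, loop))

def suitable(target, loop):
    # Hypothetical criterion: same length and at least 4 identical positions
    return len(target) == len(loop) and identity_score(target, loop) >= 4

choice = choose_loop_template("ARDYW", ["GGGGG"], ["ARDFW"], ["AAAAA"], suitable)
```

Passing the scoring function as a parameter means a real substitution-matrix score could be swapped in without changing the cascade.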
CDR loop modelling challenges
- The CDRs exhibit the largest variability in sequence, length, and composition of amino acids, resulting in conformational variability.
- Each modelled CDR may influence the conformations of the next loop.
- Modelled in order of predicted accuracy: CDRL2, CDRH2, CDRL1, CDRH1, CDRL3, and CDRH3
- CDRH3 is the most variable region because it spans the part of the rearranged gene where the three gene segments (VH-DH-JH) are joined.
Side chain modelling
Two methods:
1. Complete prediction: every side chain is predicted.
2. Partial prediction: side chains of identical residues from the template are retained, and the remaining side chains are predicted (usually more accurate).
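The partial-prediction rule can be sketched as a per-residue decision over a target/template alignment. This is a minimal illustration that assumes the two sequences are already aligned with `-` for gaps. The function name `side_chain_plan` is hypothetical, and the sketch only decides which side chains to keep, not how to build the rest.

```python
def side_chain_plan(aligned_target, aligned_template, gap="-"):
    """For each target residue, decide whether to retain or predict its side chain.

    A side chain is retained only when the aligned template residue is identical.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    plan = []
    for target_res, template_res in zip(aligned_target, aligned_template):
        if target_res == gap:
            continue  # no target residue in this column
        action = "retain" if target_res == template_res else "predict"
        plan.append((target_res, action))
    return plan

plan = side_chain_plan("AC-DE", "ACGFE")
```

Complete prediction corresponds to marking every residue as "predict" regardless of the template.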
Assessing the confidence of the antibody model
- The confidence of the modelled antibody structure is the probability that a region (e.g., framework, CDRL3) will be modelled within x Å given the sequence identity or loop length.
- Confidence calculations give you the expected RMSD for a specified probability.
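Both directions of this calculation can be sketched empirically. The example assumes a set of benchmark RMSD values is available for previously modelled regions with comparable sequence identity or loop length. The numbers below are invented for illustration, and the function names are hypothetical.

```python
import math

def prob_within(rmsds, x):
    """Fraction of benchmark models whose RMSD is at most x angstroms."""
    return sum(r <= x for r in rmsds) / len(rmsds)

def rmsd_at_probability(rmsds, p):
    """Smallest benchmark RMSD such that at least a fraction p of models fall within it."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    ordered = sorted(rmsds)
    k = max(1, math.ceil(p * len(ordered)))  # number of models that must be covered
    return ordered[k - 1]

# Hypothetical benchmark RMSDs (angstroms) for one region, e.g. CDRL3
benchmark = [0.5, 0.8, 1.0, 1.5, 3.0]
p_within_1 = prob_within(benchmark, 1.0)
rmsd_75 = rmsd_at_probability(benchmark, 0.75)
```

With these made-up values, three of the five benchmark models fall within 1.0 Å, and 1.5 Å is the threshold that covers at least 75% of them.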