Structure Validation, Terminology, Protein Data Bank (PDB) Flashcards
What are the quality criteria for structure validation
- Resolution
- R-factor & R-free
- Geometry
- B-factors
- Other experimental data: does the model agree with biochemical and other data (mutagenesis, kinetics, spectroscopy, etc.)?
Who checks the X-ray crystallographer?
- The reviewers and researchers
- The Protein Data Bank (PDB)
- Competing groups working on similar structures
What is the R-factor
- One of the key statistics for judging a structure’s quality
- Does the model reflect the actual experimental data?
- The residual (or fraction) of the data that the model does not explain
- For low-resolution structures it can be as high as 30%
- For exceptional sub-atomic resolution structures it can be as low as 10%
- R-factor usually around 20-25%
- The main reasons R-factors differ between structures are crystal quality (purity of the sample and conformational flexibility of the molecule) and accuracy of the model (phasing quality and resolution)
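Worked example: the residual is conventionally computed as R = Σ| |Fobs| − |Fcalc| | / Σ|Fobs| over the measured reflections. The sketch below is a minimal illustration, assuming the observed and calculated structure-factor amplitudes are already on the same scale; the function name `r_factor` and the amplitudes are made up for this card.

```python
def r_factor(f_obs, f_calc):
    """Return R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|).

    f_obs and f_calc are equal-length sequences of structure-factor
    amplitudes, assumed to be on the same scale already.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    return numerator / denominator


# Made-up amplitudes, for illustration only
f_obs = [100.0, 80.0, 60.0, 40.0]
f_calc = [90.0, 85.0, 50.0, 45.0]
print(f"R-factor = {r_factor(f_obs, f_calc):.1%}")  # R-factor = 10.7%
```

A perfect model would give R = 0; the larger the disagreement between observed and calculated amplitudes, the higher the value.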
What is R-free
- Calculated the same way as R-factor but only looks at a fraction of the data that has never been used to refine the structure
- 5-10% of reflections removed randomly from the data set prior to refinement
- The reflections kept for refinement are called the work (or used) set
- The removed reflections are called the test (or free) set
- Unbiased measure of the success of structural refinement
- The refined model has never seen the omitted data so the comparisons report an unbiased evaluation of the accuracy of the model
- Indicator of incorrect modelling when R-free >> R-factor
- For good models usually no more than 5% higher than R-factor
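Worked example: a sketch of the work/test bookkeeping, assuming reflections are stored as (Fobs, Fcalc) amplitude pairs. The helper names `split_work_and_free` and `r_value`, the 5% fraction, and the synthetic data are illustrative assumptions, not taken from any refinement program.

```python
import random


def r_value(reflections):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over (f_obs, f_calc) pairs."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in reflections)
    denominator = sum(abs(o) for o, _ in reflections)
    return numerator / denominator


def split_work_and_free(reflections, free_fraction=0.05, seed=0):
    """Randomly set aside a test (free) set; the rest is the work set."""
    rng = random.Random(seed)
    indices = list(range(len(reflections)))
    rng.shuffle(indices)
    n_free = max(1, round(free_fraction * len(reflections)))
    free_idx = set(indices[:n_free])
    work = [r for i, r in enumerate(reflections) if i not in free_idx]
    free = [r for i, r in enumerate(reflections) if i in free_idx]
    return work, free


# Synthetic (f_obs, f_calc) pairs, for illustration only
data_rng = random.Random(1)
reflections = []
for _ in range(1000):
    f_obs = data_rng.uniform(10.0, 200.0)
    f_calc = f_obs * data_rng.uniform(0.8, 1.2)
    reflections.append((f_obs, f_calc))

work, free = split_work_and_free(reflections, free_fraction=0.05)
print(f"work set: {len(work)} reflections, R-work = {r_value(work):.3f}")
print(f"free set: {len(free)} reflections, R-free = {r_value(free):.3f}")
```

No refinement happens in this sketch, so the two values come out similar. In a real refinement the split is made once, before refinement starts, and only the work set is used to adjust the model, which is what lets R-free expose overfitting.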
Why does R-free give a more objective measure of the quality of the model
- The model is not biased towards the test reflections, because they were never used in refinement
- Avoids model bias and overfitting of the data
What geometry does the model need to agree with
- Model must have reasonable bond lengths, bond angles and overall geometric agreement compared to other well-defined structures
- Bond-length deviations <0.01 Å and bond-angle deviations <2° compared to ideal values
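Worked example: one common way to summarise geometric agreement is the root-mean-square deviation from ideal values. The sketch below assumes that summary; the ideal backbone bond lengths are approximate textbook-style values and the observed lengths are invented for a hypothetical model.

```python
import math

# Approximate ideal backbone bond lengths in Å (illustrative values)
IDEAL_BOND_LENGTHS = {"N-CA": 1.458, "CA-C": 1.525, "C-O": 1.231, "C-N": 1.329}


def rms_deviation(observed, ideal):
    """Root-mean-square deviation between observed and ideal values.

    observed: list of (bond_type, length) pairs; ideal: dict keyed by bond_type.
    """
    squared = [(length - ideal[bond]) ** 2 for bond, length in observed]
    return math.sqrt(sum(squared) / len(squared))


# Invented bond lengths measured from a hypothetical model
observed = [("N-CA", 1.462), ("CA-C", 1.519), ("C-O", 1.236), ("C-N", 1.325)]
rmsd = rms_deviation(observed, IDEAL_BOND_LENGTHS)
print(f"bond-length RMSD = {rmsd:.4f} Å")
print("within 0.01 Å" if rmsd < 0.01 else "exceeds 0.01 Å")
```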
Describe a Ramachandran plot
- Shows whether the main-chain dihedral angles (phi, psi) fall into sterically allowed conformational regions
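Worked example: each point on a Ramachandran plot is a pair of main-chain dihedral angles, and a dihedral is computed from four atom positions. This is a minimal pure-Python sketch; the function name `dihedral` and the toy coordinates are assumptions for illustration, not values from a real structure.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# phi uses C(i-1), N(i), CA(i), C(i); psi uses N(i), CA(i), C(i), N(i+1).
# Toy coordinates (not from a real structure) just to exercise the function:
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(f"dihedral = {angle:.1f} degrees")  # dihedral = 90.0 degrees
```

Computing phi and psi for every residue and plotting one against the other gives the Ramachandran plot.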
Why do you need an average of atom position
- If we could hold an atom rigidly fixed in one place, we could observe its distribution of electrons in an ideal situation
- Image would be dense towards the centre with the density falling off further from the nucleus
- But electrons usually have a wider distribution
- Due to vibration of the atoms, and/or differences between the many different molecules in the crystal lattice
- The observed electron density will include an average of these small motions
- Slightly smeared image of the molecule
What is the amount of smearing proportional to
- Describes the degree to which the electron density is spread out for each atom
- The amount of ‘smearing’ is proportional to the magnitude of the B-factor
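Worked example: for an isotropic B-factor the standard relationship is B = 8π²⟨u²⟩, where ⟨u²⟩ is the mean-square displacement of the atom. The cards do not state this formula, so treat the sketch below as supplementary; the function name `rms_displacement` is made up.

```python
import math


def rms_displacement(b_factor):
    """RMS atomic displacement in Å from an isotropic B-factor in Å^2.

    Uses B = 8 * pi^2 * <u^2>, so sqrt(<u^2>) = sqrt(B / (8 * pi^2)).
    """
    if b_factor < 0:
        raise ValueError("B-factor must be non-negative")
    return math.sqrt(b_factor / (8 * math.pi ** 2))


for b in (10, 20, 50, 100):
    print(f"B = {b:3d} Å^2  ->  RMS displacement = {rms_displacement(b):.2f} Å")
```

Because the displacement grows with the square root of B, the smearing increases steadily as B rises.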
What is the B-factor an indicator of
- An indicator of thermal vibration of atoms
- Indicates the true static or dynamic mobility of an atom, and also errors in model building
How is the electron density of an atom broadened by disorder in the crystal
- Local static disorder - Atom positions change from one unit cell to another
- Local dynamic disorder - Atom positions change over time during the measurement
What are B-factors used for
- B-factors are introduced to account for disorder in the atomic model
- Confidence measure for location of each atom
- On a scale from 1-100 Å²
- If an atom on the surface of a protein has a high temperature factor, it is probably moving a lot and you are only observing one possible snapshot of its location
What do different B-factor values tell you
- Values <10 will create a model of the atom that is very sharp
- Atom is not moving much and is in the same position in all the molecules of the crystal
- Values >50 indicate that the atom is moving so much that it can barely be seen
- Atoms coloured by temperature factors
What colours are used for different B-factors, and where are they found
- High values (lots of motion) in red and yellow
- Low values in blue
- The protein interior (core) has low B factors but the surface residues have higher values
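Worked example: the thresholds and colours from these cards can be turned into a small lookup. The function name `describe_b_factor`, the wording of the labels, and the example atoms are assumptions for illustration; molecular viewers use continuous colour ramps rather than three bins.

```python
def describe_b_factor(b):
    """Rough interpretation of an isotropic B-factor in Å^2 (thresholds from the cards)."""
    if b < 10:
        return "sharp density, well ordered (shown towards blue)"
    if b > 50:
        return "very mobile, barely visible (shown towards red/yellow)"
    return "intermediate mobility"


# Invented example atoms: one in the core, one in a loop, one on the surface
for name, b in [("core CA", 8.2), ("loop CA", 34.0), ("surface Lys NZ", 61.5)]:
    print(f"{name:15s} B = {b:5.1f}  ->  {describe_b_factor(b)}")
```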
What is the PDB (Protein Data Bank)
- The PDB archive contains information about experimentally-determined structures of proteins, nucleic acids, and complex assemblies
- Structural biologists determine the location of each atom relative to each other in the molecule then deposit this information, which is then annotated and publicly released into the archive
How many structures determined by XRC are in the protein data bank
- ~150,000 structures determined by XRC
How do you identify a protein in the PDB
- Each entry in the PDB given a unique identification code e.g. 1ATP, 1TOX, 3LCB
- PDB files
a) Header, summary of the protein, citation information, details of structure solution, sequence
b) List the atoms in each protein (and solvent, water, ligands), and their 3D location in space (coordinates)
c) Typically contains coordinates of just one asymmetric unit which may or may not be the same as the biological assembly
- PDB offers tools for browsing, searching and analyzing structural data
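Worked example: in the legacy fixed-width PDB file format, each ATOM or HETATM record stores the atom name, residue, chain, coordinates and B-factor in fixed columns. The sketch below reads those columns from a tiny hand-written sample; the coordinates and B-factors are invented and the function names are hypothetical.

```python
def parse_atom_line(line):
    """Parse one fixed-width ATOM/HETATM record into a dict."""
    return {
        "record": line[0:6].strip(),
        "atom": line[12:16].strip(),
        "residue": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "b_factor": float(line[60:66]),
    }


def read_atoms(pdb_text):
    """Return parsed atoms from the coordinate records in a PDB-format string."""
    return [parse_atom_line(line)
            for line in pdb_text.splitlines()
            if line.startswith(("ATOM  ", "HETATM"))]


# Hand-written sample with invented values (not a real PDB entry)
SAMPLE = """\
HEADER    HYPOTHETICAL EXAMPLE
ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00 20.00           N
ATOM      2  CA  ALA A   1      11.639   6.071  -5.147  1.00 12.50           C
HETATM    3  O   HOH A 101      15.000   2.500   0.750  1.00 55.30           O
END
"""

for atom in read_atoms(SAMPLE):
    print(atom["record"], atom["atom"], atom["residue"], atom["chain"],
          atom["res_seq"], atom["xyz"], atom["b_factor"])
```

Slicing by column rather than splitting on whitespace matters because adjacent fields can run together without a space in real files. For real work an established parser is a safer choice than hand-rolled slicing.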
What are limitations to X-ray structures
- Need lots of highly pure protein (~5-10 mg), so may be limited to using recombinant proteins
- Sometimes it is challenging to find a condition where the protein crystallizes
- Proteins with floppy loops or moving domains can be problematic and might not crystallize at all
- X-ray structures are static – no information about dynamics
- Hydrogen atoms scatter poorly and are only visible at very high resolution
What are advantages to X-ray structures
- Protein crystals are typically half water so the protein’s environment is actually pretty physiological
- Structure also shows more than just protein (H2Os, metals, ions, ligands etc.)
- At atomic resolution (<1.0 Å) bond lengths can be measured directly instead of assumed, and deviations from canonical geometry can be seen
- No lower limit on protein size as long as it is well folded
- No upper limit on molecule size: intact viruses and ribosomes have been solved (5 MDa)
- Many steps can be automated: high-resolution structures can be solved quickly after collecting data