protein modeling Flashcards
Levels of Protein Organization
Primary, secondary, tertiary, quaternary structure
Protein intramolecular interactions
Hydrogen bonds, electrostatic attractions, van der Waals attractions, hydrophobic effect
Experimental methods for structural biology?
X-ray crystallography
Nuclear magnetic resonance
Cryo-Electron Microscopy
All these methods require a large quantity of pure protein
Protein expression methods
Extraction directly from conventional cells
Bacterial expression
Cell-free expression
Yeast cell expression
Insect and human embryonic kidney (HEK) cell expression
Protein purification?
- Cell lysis
- Centrifugation to get the fraction of interest
- For membrane proteins, more steps are required
- Multiple purification steps (purification 1, 2, … or more)
- Assess protein purity with an SDS-PAGE gel
Protein purification methods?
Size exclusion chromatography
Affinity chromatography
Nuclear magnetic resonance (NMR)
Multidimensional NMR
Nuclear magnetic resonance (NMR)
advantages and disadvantages ?
Advantages:
-Can probe dynamic and intrinsically disordered proteins
-Non-destructive and non-invasive
-Three-dimensional structures in their natural state can be measured directly in solution
- Requires less protein than X-ray crystallography
Disadvantages:
-Limited to small proteins, because the spectra of high-molecular-weight proteins are difficult to interpret
How it works
. In the presence of a magnetic field, a nuclear spin has two states: a low-energy and a high-energy state
. The state of the nucleus can be changed by applying a second, oscillating (radio-frequency) magnetic field
. The signal emitted during relaxation (return to the initial state) is measured to obtain NMR spectra
The chemical shift in NMR is extremely important, as it gives information about the local structure surrounding the nucleus of interest.
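A minimal sketch of the two quantities behind this card, assuming a 1H nucleus: the resonance frequency is ν0 = (γ/2π)·B0, and the chemical shift is the offset from a reference frequency expressed in ppm. The function names and the 4800 Hz offset are hypothetical, chosen only for illustration.

```python
# Gyromagnetic ratio of 1H divided by 2*pi, in MHz per tesla.
GAMMA_BAR_1H_MHZ_PER_T = 42.577


def larmor_frequency_mhz(b0_tesla, gamma_bar=GAMMA_BAR_1H_MHZ_PER_T):
    """Resonance frequency (MHz) of a nucleus in a static field B0."""
    return gamma_bar * b0_tesla


def chemical_shift_ppm(nu_sample_hz, nu_reference_hz):
    """Chemical shift: offset from the reference frequency, in ppm."""
    return (nu_sample_hz - nu_reference_hz) / nu_reference_hz * 1e6


# A 14.1 T magnet corresponds to a "600 MHz" spectrometer for protons.
nu0_mhz = larmor_frequency_mhz(14.1)

# Hypothetical proton resonating 4800 Hz above the reference signal.
nu_ref_hz = nu0_mhz * 1e6
delta_ppm = chemical_shift_ppm(nu_ref_hz + 4800.0, nu_ref_hz)
```

Because the shift is a ratio, it does not depend on the magnet strength, which is why spectra from different spectrometers can be compared.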
X-ray crystallography
Advantages:
-Useful for large structures: not limited by size or molecular weight
-Provides higher average resolution than cryo-EM
-Relatively simple to use when conditions are optimized
-Well suited to small-molecule characterization
Disadvantages:
-Requires a large quantity of protein
-Challenging for membrane proteins
-The crystal must diffract to high resolution
-Conformation can be altered by the crystal packing
-Cannot probe protein dynamics
-Phase problem
Crystallization is an empirical process
The Fourier transform is a mathematical operation that converts a signal between two domains; in this case, from reciprocal space (spatial frequency, 1/d), where the diffraction data are recorded, to real space (distance, d), where the electron density is described.
X-ray crystallography Phase problem
Molecular replacement uses phases calculated from the known structure of a similar protein
Isomorphous replacement uses heavy-atom derivatives of the crystal to deduce the phases
Direct methods are based on relationships between the structure factor magnitudes of different Fourier components; limited to molecules with ∼2000 non-hydrogen atoms and data at high resolution (<1.2 Å)
Phase information is necessary for the Fourier transform calculation and thus to solve the protein structure
in detail:
X-ray Diffraction: In X-ray crystallography, a crystal of the sample of interest is irradiated with a beam of X-rays. The X-rays interact with the electron density in the crystal, resulting in constructive and destructive interference patterns, known as diffraction spots, that are captured on a detector.
Diffraction Pattern: The diffraction pattern consists of spots whose positions and intensities are related to the structure of the crystal. The positions of the spots reflect the crystal lattice (unit cell dimensions and symmetry), while the intensities of the spots depend on how the electron density, and hence the atoms, is distributed within the unit cell.
Phases: Each scattered wave has both an amplitude and a phase. The amplitude is obtained from the intensity of the diffraction spot (the intensity is proportional to the amplitude squared), while the phase describes how far the wave is shifted relative to the other scattered waves.
Phase Problem: While the amplitudes of the scattered X-rays can be derived directly from the diffraction pattern, the phases cannot be measured. This is because the detector records only intensities, i.e. the squared amplitude of the complex structure factor, which does not contain phase information.
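The phase problem can be illustrated with a toy one-dimensional "crystal". This is only a sketch with made-up density values and a plain discrete Fourier transform, not a crystallographic calculation: the same amplitudes give the correct map with the true phases and a wrong map when the phases are discarded.

```python
import cmath


def dft(values):
    """Discrete Fourier transform: real-space samples -> structure factors."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * cmath.pi * h * x / n)
                for x, v in enumerate(values))
            for h in range(n)]


def inverse_dft(factors):
    """Inverse transform: structure factors -> real-space density."""
    n = len(factors)
    return [sum(f * cmath.exp(2j * cmath.pi * h * x / n)
                for h, f in enumerate(factors)).real / n
            for x in range(n)]


# Toy 1D "electron density" with two atoms of different weight (made-up values).
density = [0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0]

factors = dft(density)
amplitudes = [abs(f) for f in factors]      # recoverable: sqrt of the intensity
phases = [cmath.phase(f) for f in factors]  # not recorded by the detector

# Amplitudes combined with the true phases give back the density.
with_phases = inverse_dft(
    [a * cmath.exp(1j * p) for a, p in zip(amplitudes, phases)])
# The same amplitudes with all phases set to zero give a wrong map.
without_phases = inverse_dft(amplitudes)

error_with = max(abs(a - b) for a, b in zip(with_phases, density))
error_without = max(abs(a - b) for a, b in zip(without_phases, density))
```

`error_with` is at the level of floating-point noise, while `error_without` is several density units, even though both maps were built from identical amplitudes.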
Solving the Phase Problem: Solving the phase problem involves determining the phases of the scattered waves in order to reconstruct the electron density map of the crystal. Several methods have been developed to solve the phase problem, including:
Direct methods, which use mathematical relationships between the phases and amplitudes of the diffraction data to calculate the phases directly.
Molecular replacement, which involves using a known model or structure similar to the unknown structure as a starting point to solve for the phases.
Experimental phasing techniques, such as multiple isomorphous replacement (MIR) or single-wavelength anomalous dispersion (SAD), which involve introducing heavy atoms or exploiting anomalous scattering to obtain phase information.
Iterative Refinement: Once initial phases have been determined, the electron density map is calculated and refined iteratively to improve the accuracy of the structure. This process involves comparing the calculated diffraction pattern with the observed diffraction data and adjusting the phases and atomic coordinates until a model consistent with the experimental data is obtained.
Cryo-Electron Microscopy advantages and disadvantages?
Advantages:
-Useful for large structures: no upper limit on protein size
-Proteins are imaged in a native-like state
-Can deal with a limited amount of flexibility
-Requires a smaller quantity of protein than X-ray crystallography
Disadvantages:
-Can only deal with a limited amount of flexibility
-Size limitation: challenging for proteins smaller than 60 kDa
-The protein needs to behave well on the cryo-EM support (avoid aggregation or preferential orientation)
Cryo-Electron Microscopy how does it work?
Purified protein
Freezing / Negative staining
EM data collection
Particle picking
Particle alignment and classification
3D model reconstruction
Model refinement
Global refinement with constraints ?
Global Optimization: Global refinement with constraints involves optimizing the atomic model while simultaneously satisfying all imposed constraints. This global optimization process ensures that the refined model is physically realistic, chemically plausible, and consistent with experimental data.
Done with software such as Rosetta (RosettaCommons), CCP4 or Phenix
Manual inspection is also advised
Model quality is limited by the resolution of the map. What are the possible resolutions?
4.8 Å resolution map (no side chain information)
3.5 Å resolution map (partial side chain information)
2.5 Å resolution map (side chain information)
Also check the local resolution: maps are not homogeneous!
PDB and mmCIF atomic coordinates?
Both the PDB (Protein Data Bank) and mmCIF (macromolecular Crystallographic Information File) formats are widely used formats for storing structural data of biological macromolecules, such as proteins, nucleic acids, and complexes. These formats contain atomic coordinates that define the positions of atoms within the molecule’s three-dimensional structure. Here’s how atomic coordinates are represented in each format:
PDB Format:
In the PDB format, atomic coordinates are typically represented as a list of lines, each corresponding to a single atom in the molecule.
Each line begins with the keyword “ATOM” or “HETATM” (for atoms of standard amino acid or nucleotide residues, or for atoms of hetero groups such as ligands, ions and water, respectively), followed by fixed-width fields specifying the atom’s serial number, atom name, residue name, chain identifier, residue number, coordinates (x, y, z), occupancy, temperature factor (B-factor), and optional additional fields such as the element symbol.
The atomic coordinates are typically stored in Angstrom units (1 Å = 0.1 nm) and represent the positions of atoms in a Cartesian coordinate system.
Example of an ATOM line in PDB format:
ATOM      1  N   ASP A   1       1.876   5.512  -9.364  1.00  0.00           N
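Because the PDB fields sit in fixed columns, a record is read by slicing rather than by splitting on spaces. The sketch below assumes the standard ATOM/HETATM column layout; `parse_atom_line` is a hypothetical helper, and the example record is assembled field by field so that the column widths are explicit.

```python
def parse_atom_line(line):
    """Parse one fixed-column ATOM/HETATM record into a dict."""
    return {
        "record": line[0:6].strip(),      # columns 1-6
        "serial": int(line[6:11]),        # columns 7-11
        "atom_name": line[12:16].strip(), # columns 13-16
        "res_name": line[17:20].strip(),  # columns 18-20
        "chain_id": line[21],             # column 22
        "res_seq": int(line[22:26]),      # columns 23-26
        "x": float(line[30:38]),          # columns 31-38
        "y": float(line[38:46]),          # columns 39-46
        "z": float(line[46:54]),          # columns 47-54
        "occupancy": float(line[54:60]),  # columns 55-60
        "b_factor": float(line[60:66]),   # columns 61-66
        "element": line[76:78].strip(),   # columns 77-78
    }


# Example record built column by column (same values as the card above).
example = (
    "ATOM  "                                      # record name
    + f"{1:>5}"                                   # serial number
    + " "
    + " N  "                                      # atom name
    + " "                                         # alternate location
    + "ASP"                                       # residue name
    + " "
    + "A"                                         # chain identifier
    + f"{1:>4}"                                   # residue number
    + "    "                                      # insertion code + padding
    + f"{1.876:8.3f}{5.512:8.3f}{-9.364:8.3f}"    # x, y, z in angstroms
    + f"{1.00:6.2f}{0.00:6.2f}"                   # occupancy, B-factor
    + " " * 10
    + " N"                                        # element symbol
)

atom = parse_atom_line(example)
```

Splitting on whitespace would fail on real files, where adjacent fields can touch (for example large coordinates or four-character atom names).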
mmCIF Format:
In the mmCIF format, atomic coordinates are represented using a structured data format based on key-value pairs and data blocks.
Atomic coordinates are stored within the data blocks corresponding to individual structural models or entries.
The mmCIF format includes specific data categories and fields for representing atomic coordinates, such as “_atom_site.label_atom_id”, “_atom_site.Cartn_x”, “_atom_site.Cartn_y”, and “_atom_site.Cartn_z”, which specify the atom names and Cartesian coordinates of atoms, respectively.
Similar to the PDB format, the coordinates are typically stored in Angstrom units.
Example of atomic coordinates in mmCIF format:
loop_
_atom_site.label_atom_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
N 1.876 5.512 -9.364
In both formats, the atomic coordinates provide essential information about the spatial arrangement of atoms within a molecule’s structure, enabling visualization, analysis, and manipulation of the molecular model using computational tools and software.
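In mmCIF the column names are declared once and each following row supplies the values in the same order. A minimal sketch of reading such a table, assuming whitespace-separated values without quoted strings (real files need a full mmCIF parser); `parse_loop` is a hypothetical helper.

```python
def parse_loop(text):
    """Parse one simple mmCIF loop_ table into a list of row dicts.

    Simplification: values are whitespace-separated, with no quoted strings.
    """
    names, rows = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "loop_" or line.startswith("#"):
            continue
        if line.startswith("_"):
            names.append(line)            # a column name such as _atom_site.Cartn_x
        else:
            rows.append(dict(zip(names, line.split())))
    return rows


cif_fragment = """
loop_
_atom_site.label_atom_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
N 1.876 5.512 -9.364
"""

rows = parse_loop(cif_fragment)
x = float(rows[0]["_atom_site.Cartn_x"])
```

Values are looked up by name rather than by column position, which is what lets mmCIF files carry extra or reordered columns without breaking readers.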
Small molecules can be displayed in different ways; how do we make the computer understand them?
CNC[C@H](O)c1ccc(O)c(O)c1
Adrenaline
How does the information content increase with “dimensions”?
* “1D” → Elements and their occurrence
→ Can contain some information about atom connectivity
→ Example: SMILES
* “2D” → 1D + some information on spatial orientation
→ Example: SDF
* “3D” → 2D + more detailed information on spatial orientation
→ Example: PDB, MOL2
SMILES strings, what are they?
* Simplified molecular-input line-entry system
* Line representation of the chemical structure of a molecule
* Basic information on atom connectivity
* Usually there are different ways to correctly write the same molecule
→ E.g. ethanol: CCO, OCC and C(O)C
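To see why the three ethanol strings describe the same molecule, the sketch below reads a toy subset of SMILES (only the atoms C, N and O, single bonds and branches; no rings, charges or stereochemistry) and compares the resulting connectivity. Both functions are hypothetical helpers; the comparison is a simple neighbour signature, not a true canonicalisation as performed by cheminformatics toolkits.

```python
def parse_simple_smiles(smiles):
    """Return (elements, bonds) for a toy SMILES subset: C, N, O and branches."""
    elements, bonds = [], set()
    stack, previous = [], None
    for ch in smiles:
        if ch == "(":
            stack.append(previous)        # remember the branch point
        elif ch == ")":
            previous = stack.pop()        # return to the branch point
        elif ch in "CNO":
            elements.append(ch)
            index = len(elements) - 1
            if previous is not None:
                bonds.add((previous, index))
            previous = index
        else:
            raise ValueError(f"unsupported SMILES character: {ch!r}")
    return elements, bonds


def fingerprint(smiles):
    """Order-independent signature: each atom with its sorted neighbour elements."""
    elements, bonds = parse_simple_smiles(smiles)
    neighbours = [[] for _ in elements]
    for a, b in bonds:
        neighbours[a].append(elements[b])
        neighbours[b].append(elements[a])
    return sorted((el, "".join(sorted(nb)))
                  for el, nb in zip(elements, neighbours))


ethanol_forms = ["CCO", "OCC", "C(O)C"]
signatures = [fingerprint(s) for s in ethanol_forms]
```

All three signatures are identical, whereas dimethyl ether (`COC`), which has the same atoms but different connectivity, gives a different one.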
SD file?
* SDF = structure-data file
* Some information on spatial orientation
* Associates data with structure
* Molecules are displayed similarly to a MOL file
SD File (or SDF: Structure Data File):
SD files are more versatile and can store both 2D and 3D chemical structure information for small molecules.
SD files are often used to store additional data beyond basic structure information, such as compound names, identifiers, properties, and experimental data.
SD files allow multiple compounds and associated data to be stored in a single file: each compound is a record consisting of a MOL block followed by its data items, and each record ends with a “$$$$” line.
SD files are plain text, and their data items can hold a wide range of chemical and biological annotations.
SD files are commonly used in cheminformatics applications, chemical databases, and virtual screening studies.
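In an SD file each record ends with a line containing only `$$$$`, and data items appear as `> <FIELD>` headers followed by their value and a blank line. A minimal sketch of reading that layout; the two single-atom records are hand-written for illustration, and both functions are hypothetical helpers rather than a full SDF reader.

```python
def split_sd_records(text):
    """Split SD file text into records; each record ends with a '$$$$' line."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append(current)
            current = []
        else:
            current.append(line)
    return records


def data_items(record):
    """Collect the '> <FIELD>' data items of one record into a dict."""
    items, field = {}, None
    for line in record:
        if line.startswith(">") and "<" in line:
            start = line.index("<")
            field = line[start + 1:line.index(">", start)]
            items[field] = []
        elif field is not None:
            if line.strip():
                items[field].append(line)
            else:
                field = None              # a blank line closes the data item
    return {name: "\n".join(values) for name, values in items.items()}


# Hand-written illustrative records (one heavy atom each, no bonds).
sd_text = """water

hand-written example
  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
M  END
> <NAME>
water

$$$$
ammonia

hand-written example
  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
M  END
> <NAME>
ammonia

$$$$
"""

records = split_sd_records(sd_text)
names = [data_items(record)["NAME"] for record in records]
```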
Mol2 file?
* Spatial information of molecule
* Information on atom types
* Information on connectivity
* Sometimes information on partial charges
MOL File (or Molfile), a different format from MOL2:
MOL files are simple and widely used text-based file formats for representing the 2D or 3D chemical structure of small molecules.
A MOL file typically contains information such as atom connectivity, bond orders, atom coordinates, and element symbols.
MOL files are commonly used for storing and exchanging chemical structure data between software applications and databases.
A MOL file represents a single molecule; collections of molecules are stored in SD files, with each record terminated by a “$$$$” delimiter.
MOL files are human-readable and can be easily edited with a text editor.
Why search in molecule/ligand databases?
* You want to know…
→ …characteristics of a specific molecule
→ …whether a molecule binds to a protein
→ …whether ligands for a protein are known
→ …whether a novel ligand you discovered is similar to known ligands
→ …whether your docking setup works
There are several databases with different information
* What information are you searching for?
* What information is available in the specific database?
What to do if there is no experimental structure?
* Only a few residues differ?
→ In silico point mutations might be sufficient
* Structure of a protein with a similar sequence available?
→ Homology modelling
* No good templates? No specific details required?
→ Ab initio folding
What is the basis of homology modelling?
* “Classical” approach to create structural models
* Comparative approach
* “Similar sequences have a similar 3D structure”
Homology modelling: why is choosing a good template essential?
Choose the right template
→ An experimental protein structure of a similar protein
→ The more similar the sequence, the better the prediction!
→ Sequence identity: at least 25-30% for decent results
→ Sequence similarity
→ How good is the resolution of the template?
→ Is there more than one template?
→ Is the entire target sequence covered? Loop modelling required?
→ Any additional considerations? E.g. specific conformations?
What is the difference between sequence identity and sequence similarity?
Sequence identity refers to the exact matching of residues (nucleotides or amino acids) at the same position in aligned sequences.
Sequence similarity refers to the degree of resemblance between two sequences, taking into account not only identical residues but also similar residues, i.e. residues with comparable physicochemical properties (for example leucine and isoleucine), usually scored with a substitution matrix.
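A small sketch of the difference on an existing pairwise alignment. The residue groups below are an assumed, simplified classification used only for illustration (real tools score similarity with substitution matrices such as BLOSUM62), and the percentages are taken over the aligned, non-gap columns, which is one of several conventions in use. The two sequences are made up.

```python
# Assumed grouping of amino acids with comparable side-chain chemistry.
SIMILAR_GROUPS = ["AVLIM", "FYW", "KRH", "DE", "ST", "NQ"]


def identity_and_similarity(seq_a, seq_b):
    """Percent identity and similarity over the non-gap columns of an alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue (no gap).
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    identical = sum(a == b for a, b in columns)
    similar = sum(
        a == b or any(a in group and b in group for group in SIMILAR_GROUPS)
        for a, b in columns
    )
    return 100.0 * identical / len(columns), 100.0 * similar / len(columns)


# Made-up aligned sequences; "-" marks a gap.
target = "MKVLDE-S"
template = "MRILEEAG"
identity, similarity = identity_and_similarity(target, template)
```

Here three of the seven aligned columns are identical, and three more pair residues from the same group (K/R, V/I, D/E), so similarity is higher than identity.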
What to consider? Homology modelling: carefully align similar regions
2. Align the sequence of target and template(s)
→ Which parts of the sequence are similar?
→ Are there parts you don’t want to align?
What should you consider? Homology modelling: extracting information from the template
Extracting spatial restraints
→ What is the spatial environment of a residue in the template?
→ Transfer the information to the target
Homology modelling: what should you consider while modelling?
→ Transfer the spatial orientation from the template to the target structure
→ Keep as many restraints as possible
Homology modelling: what should you consider during refinement to improve the model?
Refine the model
→ Side chain orientations
→ Removing clashes
→ Energy minimisation
→ …
Homology modelling: what should you consider while evaluating the model?
Don’t just use what you get
Evaluation
→ Checking for clashes
→ Weird side chain orientations
→ Ramachandran plot
→ Kinked backbones
→ …
What is the Ramachandran plot?
* A plot of the backbone dihedral angles φ and ψ of each residue: does the backbone adopt angles and conformations that are theoretically allowed?
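The angles on a Ramachandran plot are dihedral (torsion) angles between four consecutive backbone atoms: φ uses C(i-1), N(i), CA(i), C(i) and ψ uses N(i), CA(i), C(i), N(i+1). A minimal sketch of the dihedral calculation, using the usual atan2 formula; the function name is hypothetical and the test points are simple geometric examples, not real protein coordinates.

```python
import math


def _sub(a, b):
    return [a[0] - b[0], a[1] - b[1], a[2] - b[2]]


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral_degrees(p1, p2, p3, p4):
    """Signed dihedral angle (degrees) defined by four points; cis = 0."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n2 = _cross(b2, b3)
    # atan2 form: y = |b2| * b1.(b2 x b3), x = (b1 x b2).(b2 x b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(_cross(b1, b2), n2)
    return math.degrees(math.atan2(y, x))


# Simple geometric checks: the fourth point is rotated about the p2-p3 axis.
cis = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
quarter_turn = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
```

For a model, φ and ψ would be computed for every residue with this function and compared against the allowed regions of the plot.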
Homology modelling: all the steps?
1. Choose the right template
2. Align the sequence
3. Extract spatial restraints
4. Modelling
5. Refine the model
6. Evaluation
If you don’t like what you see, start again with different input
Homology modelling
When does it work well?
- Template and Target have a high sequence similarity/identity
- Known structural motifs
→ e.g. α-helices
Homology modelling
When is it less accurate?
- Loops (structurally more complex)
- No templates with similar sequence
- Reproducing different protein conformations (choice of template is important)
- Details, e.g. binding site conformation and side chain orientations (you can model with a ligand in the binding site)
Homology modelling summary: what to watch out for…
* Choose templates carefully
→ Resolution
→ Sequence similarity
→ Conformation
→ Bound ligand?
* Pay attention to how the sequences are aligned
* Evaluate a model before using it further
* Is an additional optimization required?