Computational Structural Biology Exam Flashcards
GFP
- green fluorescent protein
- its β-barrel keeps the chromophore planar and facilitates the excited-state proton transfer that produces the fluorescent coloring
2 types of atomistic interactions
covalent (the framework of biomolecules)
non-covalent (dynamic glue)
covalent
- the framework of biomolecules
- forms when atoms share pairs of electrons, holding molecules together
ex/ peptide, phosphodiester, glycosidic bonds
peptide bonds
covalently link amino acids into polypeptide chains
phosphodiester bonds
form the sugar-phosphate backbone of DNA and RNA
covalent
glycosidic bonds
join monosaccharides to form complex sugars
covalent
characteristics of covalent bonds
- strength/stability for complex structures
- directionality: covalent bonds limit the specific angles and orientations leading to the 3D shapes of biomolecules
– single bonds allow rotation
– double/triple bonds restrict rotation
directionality of covalent bonds
covalent bonds limit the specific angles and orientations leading to the 3D shapes of biomolecules
– Single bonds: allow rotation, contributing to molecular flexibility
– Double/triple bonds: restrict rotation, affecting the rigidity and function of molecules
non-covalent bonds
- the dynamic glue
- weaker than covalent bonds and electrostatic in origin (charges, dipoles, van der Waals)
- drive most of biology
— molecular recognition
— macromolecular structure
types of non-covalent electrostatic interactions
- charge-charge
- charge-dipole
- dipole-dipole
- charge-induced dipole
- dipole-induced dipole
- dispersion (van der Waals)
molecular recognition
Enzyme-substrate binding
Antigen-Antibody interactions
macromolecular structure
Membrane formation
Protein-protein interactions
Base pairing in DNA and RNA
Protein folding
structural biology
- determines the 3D shapes of biological macromolecules and how these shapes relate to functions
why study structural biology?
- Proteins and nucleic acids adopt specific shapes crucial for their biological roles
- Primary Goal: to understand how molecular machines in cells work by deciphering their atomic arrangements
primary structure of a protein
- The linear sequence of amino acids, held together by covalent peptide bonds
- dictates how the protein will fold into higher-order structures
- does not reveal protein’s functional form/activity
- its folding process may depend on cellular factors/chaperones
secondary structure of a protein
- local conformations of the polypeptide chain, stabilized primarily by hydrogen bonds
- structural motifs are critical for certain functions
— pleated sheet, alpha helix, 3₁₀ helices
— undergo local fluctuations (alpha helices can unwind, and beta-sheets can twist), adding to functional flexibility
tertiary structure of a protein
- complete 3D shape of a single polypeptide chain
- reveals active sites or binding pockets where catalysis or molecular interactions occur
- predicting how a sequence folds into its tertiary structure is complex, even with knowledge of secondary structures
particle behaviors
- determined by quantum numbers (principal, orbital, magnetic)
— based on electrons’ specific energy levels and characteristics
- electrons mix into molecular orbitals based on their specific energy level
*** molecular orbitals are what determine behavior as particles interact with orbitals
*** changing positions changes orbitals
RESULTS in e- density distribution unique to that structure
what causes different e- density distributions?
particles interacting with molecular orbitals and energy levels differently based on positions of e- within structure
3 types of experimental techniques based on probes interacting with molecule’s e- density
- x-ray crystallography
- NMR spectroscopy
- cryo-electron microscopy
x-ray crystallography
- uses how a crystal of molecules diffracts X-rays
Basic Principle: photons scatter when they interact with atoms
Probe: photon (carrier of electromagnetic radiation)
The scattered X-rays form a diffraction pattern unique to the crystal (elastic scattering by e-)
elastic scattering for x-ray crystallography
- Incident photon induces an oscillating dipole by distorting the electron density (Rayleigh)
- An oscillating dipole acts as an electromagnetic source and re-emits photons at the same wavelength in all directions
constructive interference
- needed to amplify the signal scattered by the e- enough for the detector to record the diffraction pattern
- waves with the same wavelength that are in phase –> constructively interfere
- waves that are out of phase –> destructively interfere
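A minimal sketch of the in-phase vs out-of-phase idea, assuming each scattered wave has unit amplitude and is represented by a complex phase factor. The function name and the choice of four waves are illustrative, not from the cards.

```python
import cmath
import math

def summed_amplitude(phases):
    """Amplitude of the sum of unit-amplitude waves with the given phases (radians)."""
    return abs(sum(cmath.exp(1j * p) for p in phases))

# Four waves all in phase: amplitudes add up.
in_phase = summed_amplitude([0.0, 0.0, 0.0, 0.0])

# Two pairs shifted by half a wavelength (phase pi): they cancel.
out_of_phase = summed_amplitude([0.0, math.pi, 0.0, math.pi])

print(in_phase, out_of_phase)
```

The same logic is why many identical, aligned unit cells give a measurable spot while a single molecule would not.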
diffraction pattern
- spots on the detector represent the reflections of the scattered X-rays
– Intensity of the spots reflects the electron density in the crystal
– Position and angle of the spots corresponds to the geometry
*** does NOT directly show the atomic positions but provides the data needed to infer e- density
building an e- density map
- reveals the distribution of electrons in the crystal, indicating where atoms are located
- interpreted by fitting atomic models (e.g. amino acids for proteins) into density
- Low-resolution data make it difficult to assign atomic positions precisely, leading to uncertainty in the model
Why do we need crystals?
- Crystals have the same repeating unit cell, which amplifies our signals
If in solution, particles would be:
– Too sparse to diffract
– Moving, so the diffraction pattern would constantly change
NMR spectroscopy
How atomic nuclei interact with magnetic fields and radiofrequency pulses
Cryo-Electron Microscopy
- how molecules scatter electron beams
- beam of high-energy electrons used instead of photons
- no crystals used: the sample is rapidly frozen in vitreous ice to preserve its native structure
— By freezing sample, the biological molecules are imaged in their native hydrated state.
UniProt
- protein information database
- Comprehensive database to access curated data about protein structures, functions, sequences, and annotation
- Reviewed (Swiss-Prot): experts manually curated and verified these entries, ensuring high accuracy
- Unreviewed (TrEMBL): these entries are automatically generated and have not been manually reviewed
- entry IDs are unique identifiers for the proteins
- Protein Data Bank (PDB) contains structures
why are electrons used for Cryo-EM?
- Have a much shorter wavelength (~0.02 Å at 300 keV) than X-ray photons
- Light elements scatter electrons more effectively than they scatter X-rays
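The ~0.02 Å figure can be checked with the relativistic de Broglie relation. This is a sketch under the assumption that the electron is accelerated through the stated voltage; the constants are standard SI values and the function name is illustrative.

```python
import math

H = 6.62607015e-34           # Planck constant, J*s
M0 = 9.1093837015e-31        # electron rest mass, kg
E_CHARGE = 1.602176634e-19   # elementary charge, C
C = 299792458.0              # speed of light, m/s

def electron_wavelength_angstrom(kilovolts):
    """Relativistic de Broglie wavelength of an electron accelerated through `kilovolts` kV."""
    energy = kilovolts * 1e3 * E_CHARGE                      # kinetic energy in joules
    momentum = math.sqrt(2 * M0 * energy * (1 + energy / (2 * M0 * C**2)))
    return H / momentum * 1e10                               # metres -> angstroms

print(round(electron_wavelength_angstrom(300), 4))
```

At 300 kV this comes out just under 0.02 Å, far below typical interatomic distances.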
Single Particle Analysis (SPA)
- main Cryo-EM technique used to determine the 3d structures of individual macromolecules
- Millions of images of individual particles are collected from a thin layer
- Particles are computationally aligned and classified into different orientations
5 Challenges of disorder in molecules
- flexibility and disorder
- x-ray crystallography
- Cryo-EM and conformational flexibility
- Intrinsically Disordered Proteins (IDPs)
- Conformational Heterogeneity and Biological Function
Challenge of flexibility and disorder in biomolecules
- Molecules are not static
- Proteins often exhibit flexibility, disordered regions, and multiple conformations
Why it matters: structural techniques often require ordered/stable configurations
Challenges in X-ray Crystallography
- Flexible or disordered regions do not pack into crystals well, often leading to failure in obtaining high-quality crystals
- In cases where crystallization is successful, flexible or disordered regions do not show up clearly in e- density map
- Crystals capture a single conformation of the molecule, often ignoring the flexibility or dynamic range
Challenges in Cryo-EM and Conformational Flexibility
- strength of Cryo-EM is its ability to capture multiple conformational states of a molecule, providing insights into flexibility and structural heterogeneity
- Challenge: highly flexible or disordered molecules may appear as fuzzy or low-resolution regions in the final structure
- Advanced computational techniques are required to sort out different conformations present in Cryo-EM data
Intrinsically Disordered Proteins (IDPs)
lack a stable 3D structure under physiological conditions but are still functional, often gaining structure upon binding to partners
Challenge of Conformational Heterogeneity and Biological Function
Many proteins function by switching between different conformations, which is essential for their activity (e.g. enzymes, transporters, and receptors)
ex/ G-protein coupled receptors that adopt different conformations when bound to different ligands, triggering different cellular responses.
G-protein coupled receptors (GPCRs)
adopt different conformations when bound to different ligands, triggering different cellular responses.
Challenges in Experimental Structural Biology
Technical Limitations:
– Difficulty in capturing dynamic and flexible regions.
– Incomplete structures due to unresolved disordered regions.
Biological Complexity:
– Dynamic conformational ensembles not represented in static snapshots
Resource Constraints:
– Time-consuming and costly experiments
Why predict protein structure?
- Protein structure dictates interactions, signaling, and biochemical roles.
- Experimental methods (x-ray, Cryo-EM) provide high-resolution structures but are resource-intensive and time-consuming
Structural insights can accelerate…
- Drug discovery: designing small-molecule inhibitors or antibodies that target specific protein conformations.
- Biotechnology: engineering proteins for applications ranging from industrial to therapeutic
- Disease research: mutations causing structural defects linked to diseases like Alzheimer’s and cystic fibrosis.
why is prediction critical for the future of biology?
- Advances in predictive accuracy are opening new frontiers in biology
- integrating predictive models with experimental data is the way forward
- Structure prediction complements genomics/transcriptomics to create a holistic understanding of biological function
6 things make structure prediction hard
- conformational space
- complex energy landscapes
- flexibility and dynamics
- environmental effects
- post-translational modifications (PTMs)
- methods are data-driven
Conformational space
- Proteins can adopt a large number of possible conformations.
- Levinthal’s Paradox: a protein can’t sample all conformations in a biologically reasonable time, yet it folds quickly.
– Ex/ A protein with 100 amino acids, each able to adopt about 3 backbone (torsion-angle) conformations, has ~3^100 (≈ 5 × 10^47) possible conformations.
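The arithmetic behind Levinthal's paradox can be checked directly. The sampling rate below (10^13 conformations per second) is an assumed, illustrative number, not a value from the cards.

```python
states_per_residue = 3
n_residues = 100
conformations = states_per_residue ** n_residues   # exact integer, about 5 x 10^47

sampling_rate = 1e13        # ASSUMPTION: conformations sampled per second
seconds_per_year = 3.15e7
years_to_enumerate = conformations / sampling_rate / seconds_per_year

print(f"{conformations:.2e} conformations")
print(f"about {years_to_enumerate:.1e} years to try them all")
```

Even at this generous rate the exhaustive search would take vastly longer than the age of the universe, which is why folding cannot be a random search.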
complex energy landscape
- A potential energy surface (PES) represents the energy of a system as a function of the positions of its atoms.
– Used to understand how the system’s energy changes during reactions or movements
– Proteins fold to the lowest free-energy state, but this landscape is highly rugged.
- Energy calculations are computationally intensive and depend on accurate force fields.
flexibility and dynamics
- Proteins are not static; they adopt multiple conformations (flexibility) based on their environment and interactions with other molecules
- Some proteins/regions do not adopt a fixed 3D structure but remain disordered or flexible under physiological conditions.
environmental effects
- Proteins fold differently in different environments
- Predictions need to capture interactions with solvent molecules, ions, and cofactors
Post-translational modifications (PTMs)
PTMs such as phosphorylation, glycosylation, and methylation can alter protein folding and function
Ex/
– eIF4E is a eukaryotic translation initiation factor involved in directing ribosomes to the cap structure of mRNAs
– Ser209 is phosphorylated by MNK1
– AlphaFold3 accurately predicts changes when they’re already known.
methods are data driven
Our predictions rely on similarity to known structures, but novel sequences or folds (for which no homologous structures exist) are difficult to predict accurately.
– Ex/ AlphaFold has made strides, but predicting de novo structures remains challenging, especially for proteins with no templates.
homology modeling
- predicts protein structures based on evolutionary relationships
*** The main principle is that proteins with similar sequences tend to fold into similar structures.
Common tools for homology modeling: MODELLER, SWISS-MODEL, Phyre2
– most accurate when sequence identity to other proteins is high (>30%)
Hidden Markov Models (HMMs)
HMMs: statistical models representing sequences using probabilities for matches/indels (probabilistic states)
- capture evolutionary patterns in proteins
- predict outcomes based on transition probabilities
- produce more robust alignments
- include info on hidden states
HMM stepwise
- start with a multiple sequence alignment
- indels can be modeled
- occupancy and amino acid frequency at each position in the alignment are encoded
- profile created
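A simplified sketch of the "occupancy and amino acid frequency at each position" step. It only tabulates columns of a toy alignment; a real profile HMM also carries transition probabilities and pseudocounts, which are left out here. The alignment and function name are made up for illustration.

```python
from collections import Counter

def column_profile(msa):
    """Per-column (occupancy, residue frequencies) for equal-length aligned sequences."""
    profile = []
    for column in zip(*msa):
        residues = [aa for aa in column if aa != "-"]      # '-' marks a gap (indel)
        occupancy = len(residues) / len(column)
        counts = Counter(residues)
        freqs = {aa: n / len(residues) for aa, n in counts.items()}
        profile.append((occupancy, freqs))
    return profile

toy_msa = ["MKV-L", "MRVAL", "MKI-L", "MKV-L"]   # hypothetical alignment
profile = column_profile(toy_msa)
for position, (occupancy, freqs) in enumerate(profile, start=1):
    print(position, occupancy, freqs)
```

Fully occupied, conserved columns (like position 1) behave like match states; the sparsely occupied column 4 is the kind of position an insertion state would model.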
HMMs model protein sequences as a series of probabilistic states (4)
- hidden states
- match states
- insertion states
- deletion states
hidden states
represent the underlying biological events that are not directly observable
match states
conserved positions in the sequence
insertion and deletion states
- Insertion states: positions where extra residues are added
- Deletion states: positions where residues are missing
HMMER
a tool that uses HMMs to search databases for sequences that match a given profile HMM (homology)
– Used to find homologous sequences, identifying evolutionary relationships across protein families
SWISS-MODEL
automated protein structure homology-modelling platform for generating 3D models of a protein using a comparative approach.
*** novel proteins are very challenging
when to use threading instead of homology modeling
- In cases where sequence similarity to known structures is low (<30%), homology modeling becomes unreliable.
- Threading matches sequences to known structural folds based on structural rather than sequence similarity
*** Phyre2, RaptorX, MUSTER, and I-TASSER are commonly used for threading, which takes much longer than homology modeling.
identifying the right fold stepwise
- sequence
- LOMETS threading
— templates → template fragments for structure assembly
- clustering
— cluster centroid → structure re-assembly
- lowest-energy structure
— final model → TM-align search
- PDB library
- structural analogs
— function prediction
contact maps
- A contact map is a 2D representation of which residues are in close proximity
- allow for visualization of residue interactions in proteins
contact maps and spatial proximity
- determined by spatial proximity, not sequence order, typically within a certain distance threshold
- Residues on the diagonal are adjacent in sequence (and spatially)
- residues far apart in the sequence can still be close in the 3D structure, reflected in contact map
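A minimal sketch of building a contact map from one representative coordinate per residue. The 8 Å cutoff and the toy coordinates are assumptions chosen for illustration; the cards only say "a certain distance threshold".

```python
import math

def contact_map(coords, cutoff=8.0):
    """Boolean matrix: True where two residues lie within `cutoff` angstroms."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) <= cutoff for j in range(n)]
            for i in range(n)]

# Hypothetical chain: four residues in a line, then a fifth that folds back near the start.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
              (11.4, 0.0, 0.0), (3.8, 3.8, 0.0)]
cmap = contact_map(toy_coords)

for row in cmap:
    print("".join("#" if contact else "." for contact in row))
```

Residues 1 and 5 are far apart in sequence but show up as a contact, while residues 1 and 4 do not.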
The Rise of Machine Learning in Structural Biology
- Traditional methods like homology modeling and threading rely on templates and known structures
- ML predicts 3D structures from sequence data alone
- AlphaFold (DeepMind) and RoseTTAFold (Baker Lab) lead the charge in this area.
AlphaFold
- Developed by DeepMind
*** predicts protein structures with atomic accuracy by using deep learning models trained on large structural datasets
Breakthroughs:
- AlphaFold 2 achieved near-experimental level accuracy in the 2020 CASP14 competition (critical assessment of protein structure prediction)
- AlphaFold 3 (2024) predicts proteins, DNA, RNA, ligands, and post-translational modifications.
Coevolving residues mutate in a correlated manner
- Mutations in one residue often result in compensatory mutations in its interacting partner
- This is observed across species through analysis of homologous protein sequences
- Correlated mutations indicate functionally significant residue pairs
coevolution analysis
- helps predict which residues are close in the 3D structure
- Residues showing correlated mutations are likely to be spatially close in the folded protein
- This is particularly useful when no experimental structure is available.
coevolution detection
- using large multiple sequence alignments (MSAs) from homologous proteins.
- The more diverse the sequences in the MSA, the better the resolution of coevolving residues.
- Evolutionary info from MSAs guides predictions for residue-residue contacts.
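One simple way to score correlated mutations between two MSA columns is mutual information. The cards do not name a scoring method, so this is a swapped-in illustration on a made-up alignment, not the score used by any particular tool.

```python
import math
from collections import Counter

def mutual_information(msa, i, j):
    """Mutual information (bits) between alignment columns i and j."""
    n = len(msa)
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    count_i, count_j = Counter(col_i), Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in count_ij.items():
        joint = count / n
        mi += joint * math.log2(joint / ((count_i[a] / n) * (count_j[b] / n)))
    return mi

toy_msa = ["AKLE", "AKLD", "GELE", "GELD"]   # hypothetical homologs
print(mutual_information(toy_msa, 0, 1))     # columns that always change together
print(mutual_information(toy_msa, 0, 3))     # columns that vary independently
```

Columns 1 and 2 change together and score high; columns 1 and 4 vary independently and score zero. With only four sequences the estimate is very noisy, which is why large, diverse MSAs matter.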
Coevolution example: DHFR
- Residues with a high score (i.e. coevolve) are near each other in the protein’s structure (i.e. small distance)
Coevolutionary signals can be noisy.
- Not all correlated mutations are due to direct physical interactions; some may be indirect.
- Noise from data can come from random mutations or insufficient evolutionary diversity.
- Large and diverse sequence data sets are needed for reliable coevolution predictions.
Machine learning leverages coevolution for high-accuracy predictions.
- AlphaFold and RoseTTAFold utilize coevolutionary data from MSAs to predict residue interactions.
- They incorporate evolutionary info along with structural features, leading to highly accurate predictions.
alphafold pipeline (evoformer)
input sequence and MSA –> ML models ==> prediction of atomistic structure
- Using MSAs and contact maps, DeepMind trained a model to predict protein structures
– Contact maps are converted into dihedral angles
What is new in AlphaFold 3?
Biggest change is the use of a diffusion model
Diffusion models essentially learn to unscramble atoms into a structure.
- supercharged for any biomolecule
** breakthrough but not a final solution
– caveat is that proteins are dynamic
alphafold and disordered proteins
- At least 40% of proteins have disordered regions
- AlphaFold, like all other methods, struggles with disordered regions.
LARP1
protein movements
- proteins undergo movements like folding, unfolding, and domain motions.
– essential for binding, catalysis, and signal transduction.
– Understanding dynamics is crucial for drug design, protein design, biotech, etc.
Protein structure determination and prediction provide fixed snapshots
***DO NOT capture the full range of functional conformations
molecular dynamics (MD)
- provide time-resolved insights into protein behavior
- more realistic analysis of proteins
- atoms are treated as classical particles (hard spheres)
– involves:
1. simulation of atomic movement
2. visualization and analysis
Simulation of Atomic Movement
- MD computes trajectories of atoms over time scales of femtoseconds to microseconds.
- It can capture both small-scale vibrations and large-scale conformational changes.
Visualization and Analysis
- Provides detailed information on atomic interactions and energy changes.
- Enables the study of mechanisms at an atomic level
MD simulations provide more realistic analysis of proteins through..
- refinement of predicted structures
- Studying Intrinsically Disordered Proteins
- Folding and Misfolding Pathways
Refinement of Predicted Structures (MD)
- MD helps minimize energy and relax structures obtained from modeling.
- Improves accuracy by accounting for environmental effects
Studying Intrinsically Disordered Proteins (MD)
- MD captures the flexible nature of disordered regions.
- Aids in understanding functions that depend on disorder
Folding and Misfolding Pathways (MD)
- Simulates the folding process to identify intermediates.
- Investigates misfolding mechanisms relevant to diseases.
classical mechanics
- Describes the motion of macroscopic objects
- Assumes particles have well-defined positions and velocities
- Governed by Newton’s Laws of Motion
** atoms are treated as hard spheres
Quantum Mechanics
- Necessary for describing behavior at atomic and subatomic scales
- Accounts for wave-particle duality, uncertainty principle, proton tunneling
- Electrons exhibit quantum behavior that cannot be captured classically
Classical approximation impacts…
Nuclei →
- Nuclei (protons and neutrons) are much heavier than electrons.
- Their de Broglie wavelengths are very small, making quantum effects less significant
- At RT, thermal energies dominate over quantum zero-point energies.
Electrons →
- not explicitly simulated in classical MD.
- Their effects are included implicitly through potential energy functions (force fields).
- The electronic structure is assumed to remain in the ground state during simulation.
suitable systems vs limitations of classical approximations
Suitable Systems:
- Biological macromolecules (proteins, nucleic acids, lipids)
- Materials where electronic excitations are not critical.
- Processes where bond breaking/forming does not occur.
Limitations:
- cannot accurately simulate chemical reactions involving electronic transitions.
- Quantum phenomena like tunneling and zero-point energy are not captured.
Newton’s Second Law
The acceleration of an object is directly proportional to the net force acting on it and inversely proportional to its mass
– given atomic forces, we can calculate atomic movements
(F = ma)
forces (NSLoM)
- the negative gradients of potential energy
– potential energy is dependent on positions of all atoms
– determines accelerations and thus motion of atoms
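A one-dimensional sketch of "force is the negative gradient of potential energy", assuming a harmonic potential with the 1/2 k x^2 convention. The parameter values and the mass are illustrative only.

```python
def harmonic_energy(x, k=100.0, x0=1.0):
    """Potential energy of a spring with constant k and rest position x0."""
    return 0.5 * k * (x - x0) ** 2

def force(potential, x, h=1e-6):
    """Force = -dU/dx, estimated with a central finite difference."""
    return -(potential(x + h) - potential(x - h)) / (2 * h)

f_stretched = force(harmonic_energy, 1.2)   # negative: pulls back toward x0
f_compressed = force(harmonic_energy, 0.8)  # positive: pushes back toward x0
acceleration = f_stretched / 12.0           # a = F / m, with an assumed mass of 12

print(f_stretched, f_compressed, acceleration)
```

In a real force field the gradient is taken with respect to every atomic coordinate, and analytic derivatives are used instead of finite differences.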
Time evolution of the system
- computed by integrating equations of motion
- Continuous motion approximated using discrete time steps
– Determine forces
– Move a small amount forward in time
– Repeat
- Time step length determines how “smooth” the animation/trajectory is
stepwise of molecular simulations computing an atomistic trajectory
- 3d coordinates of atoms in the system
- atoms exert forces on each other
- using Newton’s equation of motion, we can predict their movement
integration algorithms
- Numerical Solution:
– Approximate the continuous equations of motion using discrete time steps
- Update Positions and Velocities:
– Calculate the new positions and velocities of particles based on current forces.
Challenges Addressed by Integration Algorithms
- Stability: prevent numerical errors from accumulating over many time steps
- Accuracy: ensure that the trajectories closely follow the true physical behavior.
- Efficiency: balance computational speed with the precision of the simulation.
Common Integration Algorithms
- Verlet: uses current and previous positions to calculate the next position.
- Velocity Verlet: an extension of the Verlet algorithm that explicitly calculates velocities.
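A minimal velocity Verlet sketch for a single particle on a spring in one dimension, in reduced units. The system, time step, and step count are assumptions for illustration; an MD engine applies the same update to every atom in 3D.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate 1D motion; returns lists of positions and velocities."""
    xs, vs = [x], [v]
    a = force(x) / mass
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # advance position
        a_new = force(x) / mass              # forces at the new position
        v = v + 0.5 * (a + a_new) * dt       # advance velocity with the averaged acceleration
        a = a_new
        xs.append(x)
        vs.append(v)
    return xs, vs

k, mass = 1.0, 1.0

def spring_force(x):
    return -k * x

xs, vs = velocity_verlet(x=1.0, v=0.0, force=spring_force, mass=mass, dt=0.01, n_steps=1000)
energies = [0.5 * mass * v * v + 0.5 * k * x * x for x, v in zip(xs, vs)]
print(min(energies), max(energies))
```

The total energy stays very close to its starting value of 0.5, which is the stability property that makes this family of integrators popular. Increasing dt makes the drift visible.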
time step length
- determines how smooth the trajectory is
- smaller time steps lead to more calculations to simulate same amount of time
force fields
- used to compute energies and atomic forces
- sets of equations that describe the potential energy of a molecule based on atomic positions
- based on dynamics of bond lengths, bond angles, and dihedral angles
chemical bonds
- behave like springs
- Two spheres (atoms) connected by a single spring
- The spring resists changes in the distance between the two atoms
- bond vibrations are seen as harmonic oscillators
Spring constants
- are determined by bond order and atom types
- the spring constant k (in kcal/mol/Å²) increases as bond length decreases
— bond length: single > double > triple
Bond angles behave like…harmonic oscillators
- Three balls connected by 2 springs forming an angle, with a “hinge” at the central atom.
— We also have separate spring constants for bond angles.
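A sketch of the harmonic bond and angle terms, assuming the 1/2 k (x - x0)^2 convention (some force fields fold the 1/2 into k). All parameter values are made up to show the trend, not taken from a real force field.

```python
import math

def harmonic_bond_energy(r, r0, k_bond):
    """Energy (kcal/mol) of a bond stretched or compressed from its rest length r0 (angstroms)."""
    return 0.5 * k_bond * (r - r0) ** 2

def harmonic_angle_energy(theta_deg, theta0_deg, k_angle):
    """Energy (kcal/mol) of an angle bent away from its rest value; k_angle in kcal/mol/rad^2."""
    delta = math.radians(theta_deg - theta0_deg)
    return 0.5 * k_angle * delta ** 2

# Same 0.1 angstrom stretch on a softer (single-like) and a stiffer (triple-like) spring.
soft = harmonic_bond_energy(1.64, 1.54, k_bond=300.0)    # illustrative values
stiff = harmonic_bond_energy(1.30, 1.20, k_bond=900.0)   # illustrative values
bent = harmonic_angle_energy(114.5, 109.5, k_angle=100.0)

print(soft, stiff, bent)
```

The stiffer spring costs three times as much energy for the same displacement, which is the sense in which shorter, higher-order bonds are "harder" springs.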
dihedral angle
- the angle between the two planes defined by four sequentially bonded atoms (A-B-C-D): the plane through A, B, C and the plane through B, C, D
- describes the rotation around the bond between atoms B and C.
*** do not behave like springs
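A sketch of computing the dihedral angle from four atomic positions using the normals of the two planes. The helper names and test coordinates are illustrative assumptions.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_degrees(a, b, c, d):
    """Signed dihedral angle (degrees) for atoms A-B-C-D."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1 = _cross(b1, b2)                       # normal to the plane through A, B, C
    n2 = _cross(b2, b3)                       # normal to the plane through B, C, D
    x = _dot(n1, n2)
    y = _dot(b2, _cross(n1, n2)) / math.sqrt(_dot(b2, b2))
    return math.degrees(math.atan2(y, x))

A, B, C = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
print(dihedral_degrees(A, B, C, (1.0, 1.0, 0.0)))   # A and D on the same side (cis)
print(dihedral_degrees(A, B, C, (1.0, 0.0, 1.0)))   # D rotated a quarter turn about B-C
```

Moving atom D around the B-C axis changes only this angle, leaving bond lengths and bond angles untouched.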
Dihedrals VS Bonds and Angles
Bonds and Angles:
- govern local geometry (bond lengths/angles) using quadratic (harmonic) potentials that favor specific distances and angles
Dihedrals:
- govern torsional or rotational flexibility around bonds, typically using periodic and multi-well potentials to allow for multiple stable conformations.
dihedral potentials
- capture arbitrary functions with rotational symmetry.
ex/ periodic energy functions with varying minima
- can be modeled using custom Fourier series
fourier series
- approximate functions as a sum of sine and cosine waves
- approximate (any) symmetrical rotational energy function.
adding more sine and cosine terms for fourier series
improves the approximation
- allows the series to closely match the original complex function
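A sketch of a dihedral potential written as a short cosine series, U(phi) = sum of k_n * (1 + cos(n * phi - delta_n)). This functional form is a common convention assumed here; the amplitudes below are illustrative, not fitted parameters.

```python
import math

def torsion_energy(phi_deg, terms):
    """terms: list of (k, n, delta_deg) = amplitude, periodicity, phase shift."""
    phi = math.radians(phi_deg)
    return sum(k * (1 + math.cos(n * phi - math.radians(delta)))
               for k, n, delta in terms)

threefold = [(1.4, 3, 0.0)]                    # three equal wells per full rotation
mixed = [(1.4, 3, 0.0), (0.5, 1, 0.0)]         # extra term makes the wells unequal

for angle in (0, 60, 120, 180):
    print(angle, round(torsion_energy(angle, threefold), 3),
          round(torsion_energy(angle, mixed), 3))
```

Adding the second term changes the relative depth of the minima, which is how extra sine and cosine terms let the series follow a more complex rotational profile.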
Noncovalent Interactions Role in Molecular Assembly
- Facilitate the organization of molecules into complex structures
- Determine the macroscopic properties of materials (e.g. solubility, melting points)
Noncovalent Interactions Importance in Biological Systems
- Govern essential processes like enzyme-substrate binding, protein folding, and membrane formation
- Critical for understanding biochemical pathways and drug design
Why are Noncovalent Interactions crucial for MD?
While covalent bonds define the primary structure of molecules
— noncovalent interactions are pivotal for dictating how molecules interact.
Dispersion Forces
Nature:
- weak, attractive forces arising from instantaneous dipoles in molecules
Role:
- stabilize molecular assemblies by promoting close packing
C6 = dispersion coefficient
Repulsion Forces
Nature:
- Strong, short-range forces due to overlapping electron clouds.
Role:
- Prevent atoms from collapsing into each other, maintaining molecular integrity
C12 = repulsion coefficient
Combined van der Waals Potential
Van der Waals forces are modeled using the Lennard-Jones potential
— captures both the attractive and repulsive aspects of noncovalent interactions.
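A sketch of the Lennard-Jones potential in the C12/C6 form used on the cards above, U(r) = C12 / r^12 - C6 / r^6. The coefficient values are in arbitrary illustrative units.

```python
def lennard_jones(r, c6, c12):
    """Repulsion (C12 / r^12) minus dispersion attraction (C6 / r^6)."""
    return c12 / r**12 - c6 / r**6

c6, c12 = 1.0, 1.0                       # illustrative coefficients
r_min = (2 * c12 / c6) ** (1 / 6)        # distance where the energy is lowest
well_depth = lennard_jones(r_min, c6, c12)

print(r_min, well_depth)
print(lennard_jones(0.8, c6, c12))       # too close: strongly repulsive
print(lennard_jones(3.0, c6, c12))       # far apart: weakly attractive
```

Setting the derivative to zero gives r_min = (2 * C12 / C6)^(1/6) and a well depth of -C6^2 / (4 * C12), which the code reproduces numerically.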
Electrostatic forces decay
- the interaction energy decays as 1/r (the force as 1/r²), making electrostatics significant over longer distances than van der Waals forces
*** Electrostatic Interactions Drive Charged and Polar Molecule Behavior
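A sketch comparing how the Coulomb energy (1/r) and the dispersion term (1/r^6) fall off with distance. The conversion constant is the approximate value commonly used for kcal/mol with charges in units of e and distances in Å; the C6 value is illustrative.

```python
COULOMB_CONSTANT = 332.06   # approximate, kcal*angstrom/(mol*e^2)

def coulomb_energy(q1, q2, r, dielectric=1.0):
    """Electrostatic energy (kcal/mol) of two point charges separated by r angstroms."""
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * r)

def dispersion_energy(r, c6=1000.0):
    """Attractive dispersion term, -C6 / r^6 (illustrative C6)."""
    return -c6 / r**6

coulomb_ratio = coulomb_energy(1.0, -1.0, 10.0) / coulomb_energy(1.0, -1.0, 5.0)
dispersion_ratio = dispersion_energy(10.0) / dispersion_energy(5.0)

print(coulomb_ratio, dispersion_ratio)
```

Doubling the distance halves the electrostatic energy but cuts the dispersion energy by a factor of 64.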
what makes up the complete force field?
bonded and non-bonded interactions
parameterizing force fields starts with
Begins with Quantum Mechanical Data for Small Molecules
- QM calculations
- data utilization
- small molecule focus for simplicity and accuracy
Role of Quantum Mechanics in Parameterizing Force Fields
QM Calculations:
- provides high-accuracy data on molecular geometries, energetics, and electronic distributions
Data Utilization:
- QM data inform the selection and tuning of force field parameters to ensure they reflect true molecular behavior.
Small Molecule Focus for Parameterizing Force Fields
Simplicity:
- Smaller molecules have fewer atoms and simpler interactions, making QM calculations more manageable.
Accuracy:
- QM methods (e.g. Density Functional Theory, Hartree-Fock) yield precise information essential for initial parameterization.
Complexity of Proteins in Force Field Parameterization
Size & Structure:
- proteins consist of hundreds to thousands of atoms with intricate 3D structures.
Diverse Interactions:
- include a variety of noncovalent interactions, such as hydrogen bonds, ionic bonds, hydrophobic interactions, and van der Waals forces.
Limitations of QM for Large Systems for Force Field Parameterization
Computational Cost:
- QM calculations become computationally prohibitive for large biomolecules like proteins.
Alternative Strategies:
- Utilize QM data from representative small segments or use empirical and semiempirical methods.
Types of Experimental Data –> (Experimental data is crucial for refining force field parameters)
- Spectroscopic Data:
– Infrared (IR), Nuclear Magnetic Resonance (NMR), and Raman spectroscopy provide insights into bond vibrations and molecular geometries.
- Crystallography:
– X-ray crystallography offers precise information on atomic positions and molecular conformations.
- Thermodynamic Measurements:
– Data on melting points, boiling points, and solvation energies inform interaction strengths.
Parameters Optimization
Fitting Process:
- adjusts force field parameters to minimize discrepancies between simulation results and experimental observations.
Validation Metrics:
- use root-mean-square deviations (RMSD), binding affinities, and structural stability as benchmarks.
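A minimal RMSD sketch for two structures with matching atom order. It assumes the structures are already superimposed; in practice an optimal alignment is usually applied first, which is not shown here. The coordinates are made up.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of atoms")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]   # hypothetical atoms
shifted = [(x + 1.0, y, z) for x, y, z in reference]              # every atom moved 1 angstrom

print(rmsd(reference, reference))
print(rmsd(reference, shifted))
```

A rigid shift inflates the value even though the shape is unchanged, which is why superposition matters before comparing a simulated structure to an experimental one.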
Fitting Force Field Parameters to Experimental Data…
Ensures Realistic Simulations
– uses parameter adjustment
Parameter Adjustment for Fitting Force Field Parameters
Process:
- fine-tune force field parameters to minimize discrepancies between simulation outcomes and experimental observations.
Techniques:
- use of optimization algorithms and statistical methods to achieve best-fit parameters
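A toy version of the fitting step: choosing the spring constant k that best reproduces reference energies for a stretched bond, by least squares. The reference energies are invented for illustration, the rest length is held fixed, and real parameterization fits many coupled parameters at once.

```python
def fit_spring_constant(distances, energies, r0):
    """Least-squares k for E = 0.5 * k * (r - r0)**2, with r0 held fixed."""
    basis = [0.5 * (r - r0) ** 2 for r in distances]
    return sum(b * e for b, e in zip(basis, energies)) / sum(b * b for b in basis)

# Hypothetical reference data (e.g. from QM scans), in angstroms and kcal/mol.
distances = [1.3, 1.4, 1.5, 1.6, 1.7]
reference_energies = [6.1, 1.4, 0.0, 1.6, 5.9]

k_fit = fit_spring_constant(distances, reference_energies, r0=1.5)
residuals = [e - 0.5 * k_fit * (r - 1.5) ** 2
             for r, e in zip(distances, reference_energies)]

print(k_fit)
print(residuals)
```

The leftover residuals are the discrepancies the fitting process tries to minimize; if they stay large, the functional form itself may need to change.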
Challenges in Parameterizing Force Fields for Proteins
- High Dimensionality
- Diverse Chemical Environments
- Dynamic Conformational Changes
- Long-Range Electrostatic Interactions