Computational Structural Biology Exam Flashcards

1
Q

GFP

A
  • green fluorescent protein
  • its β-barrel keeps the chromophore planar and facilitates the excited-state proton transfer behind its green fluorescence
2
Q

2 types of atomistic interactions

A

covalent (the framework of biomolecules)
non-covalent (dynamic glue)

3
Q

covalent

A
  • the framework of biomolecules
  • forms when atoms share pairs of electrons, holding molecules together

ex/ peptide, phosphodiester, glycosidic bonds

4
Q

peptide bonds

A

covalently link amino acids into polypeptide chains

5
Q

phosphodiester bonds

A

form the sugar-phosphate backbone of DNA and RNA

covalent

6
Q

glycosidic bonds

A

join monosaccharides to form complex sugars

covalent

7
Q

characteristics of covalent bonds

A
  • strength/stability for complex structures
  • directionality: covalent bonds limit the specific angles and orientations leading to the 3D shapes of biomolecules
    – single bonds allow rotation
    – double/triple bonds restrict rotation
8
Q

directionality of covalent bonds

A

covalent bonds limit the specific angles and orientations leading to the 3D shapes of biomolecules

– Single bonds: allow rotation, contributing to molecular flexibility
– Double/triple bonds: restrict rotation, affecting the rigidity and function of molecules

9
Q

non-covalent bonds

A
  • the dynamic glue
  • weaker than covalent bonds; involve electrostatics (charges, dipoles, van der Waals)
  • drive most of biology
    — molecular recognition
    — macromolecular structure
10
Q

types of non-covalent electrostatic interactions

A
  • charge-charge
  • charge-dipole
  • dipole-dipole
  • charge-induced dipole
  • dipole-induced dipole
  • dispersion (van der Waals)
11
Q

molecular recognition

A

Enzyme-substrate binding
Antigen-Antibody interactions

12
Q

macromolecular structure

A

Membrane formation
Protein-protein interactions
Base pairing in DNA and RNA
Protein folding

13
Q

structural biology

A
  • determines the 3D shapes of biological macromolecules and how these shapes relate to functions
14
Q

why study structural biology?

A
  • Proteins and nucleic acids adopt specific shapes crucial for their biological roles
  • Primary Goal: to understand how molecular machines in cells work by deciphering their atomic arrangements
15
Q

primary structure of a protein

A
  • The linear sequence of amino acids, held together by covalent peptide bonds
  • dictates how the protein will fold into higher-order structures
  • does not reveal protein’s functional form/activity
  • its folding process may depend on cellular factors/chaperones
16
Q

secondary structure of a protein

A
  • local conformations of the polypeptide chain, stabilized primarily by hydrogen bonds
  • structural motifs are critical for certain functions
    – beta-pleated sheets, alpha helices, 3₁₀ helices
  • undergo local fluctuations – alpha helices can unwind, and beta-sheets can twist – adding to functional flexibility
17
Q

tertiary structure of a protein

A
  • complete 3D shape of a single polypeptide chain
  • reveals active sites or binding pockets where catalysis or molecular interactions occur
  • predicting how a sequence folds into its tertiary structure is complex even with knowledge of secondary structures
18
Q

particle behaviors

A
  • determined by quantum numbers (principal, orbital, magnetic)
    – based on electrons' specific energy levels and characteristics
  • electrons mix into molecular orbitals based on their specific energy level

*** molecular orbitals are what determine behavior as particles interact with orbitals

*** changing positions changes orbitals

RESULTS in e- density distribution unique to that structure

19
Q

what causes different e- density distributions?

A

particles interacting with molecular orbitals and energy levels differently based on positions of e- within structure

20
Q

3 types of experimental techniques based on probes interacting with molecule’s e- density

A
  1. x-ray crystallography
  2. NMR spectroscopy
  3. cryo-electron microscopy
21
Q

x-ray crystallography

A
  • uses how a crystal of molecules diffracts X-rays

Basic Principle: photons scatter when they interact with atoms

Probe: photon (carrier of electromagnetic radiation)

The scattered X-rays form a diffraction pattern unique to the crystal (elastic scattering by e-)

22
Q

elastic scattering for x-ray crystallography

A
  1. Incident photon induces an oscillating dipole by distorting the electron density (Rayleigh)
  2. An oscillating dipole acts as an electromagnetic source and re-emits photons at the same wavelength in all directions
23
Q

constructive interference

A
  • needed to amplify the signal scattered by the e- so that detectors can record the diffraction pattern
  • waves with the same wavelength and in phase –> constructively interfere
  • waves are out of phase –> destructively interfere
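A minimal sketch of the in-phase versus out-of-phase point, assuming two waves of equal amplitude and wavelength that differ only by a phase offset. The function name is illustrative and not taken from the course material.

```python
import math

def summed_amplitude(phase_shift: float, amplitude: float = 1.0) -> float:
    """Amplitude of the sum of two equal waves offset by phase_shift (radians)."""
    # A*sin(x) + A*sin(x + d) = 2A*cos(d/2) * sin(x + d/2)
    return abs(2.0 * amplitude * math.cos(phase_shift / 2.0))

in_phase = summed_amplitude(0.0)          # constructive: amplitudes add
out_of_phase = summed_amplitude(math.pi)  # destructive: waves cancel
```

In a crystal, many unit cells scattering in phase play the role of the first case, which is why the reflections become strong enough to measure.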
24
Q

diffraction pattern

A
  • spots on the detector represent the reflections of the scattered X-rays

– Intensity of the spots reflects the electron density in the crystal

– Positions and angles of the spots correspond to the geometry of the crystal lattice

*** does NOT directly show the atomic positions but provides the data needed to infer e- density

25
Q

building an e- density map

A
  • reveals the distribution of electrons in the crystal, indicating where atoms are located
  • interpreted by fitting atomic models (e.g. amino acids for proteins) into density
  • Low-resolution data make it difficult to assign atomic positions precisely, leading to uncertainty in the model
26
Q

Why do we need crystals?

A
  • Crystals have the same repeating unit cell, which amplifies our signals

If in solution, particles would be:
– Too sparse to diffract
– Moving and diffraction pattern would constantly change

27
Q

NMR spectroscopy

A

How atomic nuclei interact with magnetic fields and radiofrequency pulses

28
Q

Cryo-Electron Microscopy

A
  • how molecules scatter electron beams
  • beam of high-energy electrons used instead of photons
  • no crystals used: the sample is rapidly frozen in vitreous ice to preserve its native structure
    – By freezing the sample, the biological molecules are imaged in their native hydrated state.
29
Q

UniProt

A
  • protein information database
  • Comprehensive database to access curated data about protein structures, functions, sequences, and annotation
  • Reviewed (Swiss-Prot): experts manually curated and verified these entries, ensuring high accuracy
  • Unreviewed (TrEMBL): these entries are automatically generated and have not been manually reviewed
  • entry IDs are unique identifiers for the proteins
  • Protein Data Bank contains structures (PDB)
30
Q

why are electrons used for Cryo-EM?

A
  • Have a much shorter wavelength (~0.02 Å at 300 keV) than X-ray photons (~1 Å)
  • Light elements scatter electrons more effectively than they scatter X-rays
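The quoted wavelength can be checked with a short calculation. This is a sketch using the relativistic de Broglie relation and standard physical constants; the function name is illustrative.

```python
import math

H = 6.62607015e-34         # Planck constant, J*s
M_E = 9.1093837015e-31     # electron rest mass, kg
Q_E = 1.602176634e-19      # elementary charge, C
C = 299792458.0            # speed of light, m/s

def electron_wavelength_angstrom(voltage: float) -> float:
    """Relativistic de Broglie wavelength (Å) of an electron accelerated through `voltage` volts."""
    kinetic = Q_E * voltage  # kinetic energy in joules
    # p = sqrt(2*m*T * (1 + T / (2*m*c^2))) includes the relativistic correction
    momentum = math.sqrt(2.0 * M_E * kinetic * (1.0 + kinetic / (2.0 * M_E * C ** 2)))
    return H / momentum * 1e10  # metres -> angstroms

wavelength_300kv = electron_wavelength_angstrom(300e3)
```

The result is close to 0.02 Å, in line with the value on the card.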
31
Q

Single Particle Analysis (SPA)

A
  • main Cryo-EM technique used to determine the 3D structures of individual macromolecules
  • Millions of images of individual particles are collected from a thin layer of frozen sample
  • Particles are computationally aligned and classified into different orientations
32
Q

5 Challenges of disorder in molecules

A
  1. flexibility and disorder
  2. x-ray crystallography
  3. Cryo-EM and conformational flexibility
  4. Intrinsically Disordered Proteins (IDPs)
  5. Conformational Heterogeneity and Biological Function
33
Q

Challenge of flexibility and disorder in biomolecules

A
  • Molecules are not static
  • Proteins often exhibit flexibility, disordered regions, and multiple conformations

Why it matters: structural techniques often require ordered/stable configurations

34
Q

Challenges in X-ray Crystallography

A
  • Flexible or disordered regions do not pack into crystals well, often leading to failure in obtaining high-quality crystals
  • In cases where crystallization is successful, flexible or disordered regions do not show up clearly in e- density map
  • Crystals capture a single conformation of the molecule, often ignoring the flexibility or dynamic range
35
Q

Challenges in Cryo-EM and Conformational Flexibility

A
  • Strength: Cryo-EM can capture multiple conformational states of a molecule, providing insights into flexibility and structural heterogeneity
  • Challenge: highly flexible or disordered molecules may appear as fuzzy or low-resolution regions in the final structure
  • Advanced computational techniques are required to sort out different conformations present in Cryo-EM data
36
Q

Intrinsically Disordered Proteins (IDPs)

A

lack a stable 3D structure under physiological conditions but are still functional, often gaining structure upon binding to partners

37
Q

Challenge of Conformational Heterogeneity and Biological Function

A

Many proteins function by switching between different conformations, which is essential for their activity (e.g. enzymes, transporters, and receptors)

ex/ G-protein coupled receptors that adopt different conformations when bound to different ligands, triggering different cellular responses.

38
Q

G-protein coupled receptors (GPCRs)

A

adopt different conformations when bound to different ligands, triggering different cellular responses.

39
Q

Challenges in Experimental Structural Biology

A

Technical Limitations:
– Difficulty in capturing dynamic and flexible regions.
– Incomplete structures due to unresolved disordered regions.

Biological Complexity:
– Dynamic conformational ensembles not represented in static snapshots

Resource Constraints:
– Time-consuming and costly experiments

40
Q

Why predict protein structure?

A
  • Protein structure dictates interactions, signaling, and biochemical roles.
  • Experimental methods (x-ray, Cryo-EM) provide high-resolution structures but are resource-intensive and time-consuming
41
Q

Structural insights can accelerate…

A
  • Drug discovery: designing small-molecule inhibitors or antibodies that target specific protein conformations.
  • Biotechnology: engineering proteins for industrial and therapeutic applications
  • Disease research: mutations causing structural defects linked to diseases like Alzheimer’s and cystic fibrosis.
42
Q

why is prediction critical for the future of biology?

A
  • Advances in predictive accuracy are opening new frontiers in biology
  • integrating predictive models with experimental data is the way forward
  • Structure prediction complements genomics/transcriptomics to create a holistic understanding of biological function
43
Q

6 things make structure prediction hard

A
  1. conformational space
  2. complex energy landscapes
  3. flexibility and dynamics
  4. environmental effects
  5. post-translational modifications (PTMs)
  6. methods are data-driven
44
Q

Conformational space

A
  • Proteins can adopt a large number of possible conformations.
  • Levinthal’s Paradox: a protein can’t sample all conformations in a biologically reasonable time, yet it folds quickly.
    – Ex/ A protein with 100 amino acids, each able to adopt about 3 torsion-angle states, has ~3^100 (about 5 × 10^47) possible conformations.
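The arithmetic behind the paradox can be sketched as follows. The sampling rate is an assumed, deliberately generous figure chosen only for illustration; the other numbers come from the example on the card.

```python
STATES_PER_RESIDUE = 3        # from the example on the card
N_RESIDUES = 100
SAMPLING_RATE = 1e13          # conformations per second (assumed, generous)
SECONDS_PER_YEAR = 3.156e7

# Exact integer count of conformations, roughly 5.15e47
n_conformations = STATES_PER_RESIDUE ** N_RESIDUES

# Time needed to visit every conformation once at the assumed rate
years_to_sample = n_conformations / SAMPLING_RATE / SECONDS_PER_YEAR
```

Even at this rate, exhaustive sampling would take vastly longer than the age of the universe, so folding cannot be a random search.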
45
Q

complex energy landscape

A
  • A potential energy surface (PES) represents the energy of a system as a function of the positions of its atoms.
    – Shows how the system’s energy changes upon reactions or movements
    – Proteins fold to the lowest free-energy state, but this landscape is highly rugged.
  • Energy calculations are computationally intensive and depend on accurate force fields.
46
Q

flexibility and dynamics

A
  • Proteins are not static; they adopt multiple conformations (flexibility) based on their environment and interactions with other molecules
  • Some proteins/regions do not adopt a fixed 3D structure but remain disordered or flexible under physiological conditions.
47
Q

environmental effects

A
  • Proteins fold differently in different environments
  • Predictions need to capture interactions with solvent molecules, ions, and cofactors
47
Q

Post-translational modifications (PTMs)

A

PTMs such as phosphorylation, glycosylation, and methylation can alter protein folding and function

Ex/
– eIF4E is a eukaryotic translation initiation factor involved in directing ribosomes to the cap structure of mRNAs
– Ser209 is phosphorylated by MNK1
– AlphaFold3 accurately predicts changes when they’re already known.

48
Q

methods are data driven

A

Our predictions rely on similarity to known structures, but novel sequences or folds (for which no homologous structures exist) are difficult to predict accurately.
– Ex/ AlphaFold has made strides, but predicting de novo structures remains challenging, especially for proteins with no templates.

49
Q

homology modeling

A
  • predicts protein structures based on evolutionary relationships

*** The main principle is that proteins with similar sequences tend to fold into similar structures.

Common tools for homology modeling: MODELLER, SWISS-MODEL, Phyre2

– most accurate when sequence identity to a template of known structure is high (>30%)

50
Q

Hidden Markov Models (HMMs)

A

HMMs: statistical models representing sequences using probabilities for matches/indels (probabilistic states)

  • capture evolutionary patterns in proteins
  • predict outcomes based on transition probabilities
  • produce more robust alignments
  • include info on hidden states
51
Q

HMM stepwise

A
  1. start with a multiple sequence alignment
  2. indels can be modeled
  3. occupancy and amino acid frequency at each position in the alignment are encoded
  4. profile created
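Step 3 can be sketched for a toy alignment. This is only the counting part: a real profile HMM also estimates transition probabilities between match, insertion, and deletion states and usually adds pseudocounts. The alignment and function name are made up for illustration.

```python
from collections import Counter

def column_profile(msa: list[str]) -> list[tuple[float, dict[str, float]]]:
    """For each alignment column, return (occupancy, residue frequencies)."""
    profile = []
    n_seqs = len(msa)
    for column in zip(*msa):
        residues = [aa for aa in column if aa != "-"]   # '-' marks a gap
        occupancy = len(residues) / n_seqs              # fraction of sequences with a residue here
        counts = Counter(residues)
        freqs = {aa: n / len(residues) for aa, n in counts.items()} if residues else {}
        profile.append((occupancy, freqs))
    return profile

toy_msa = ["MKV-L", "MRV-L", "MKIAL", "MKV-L"]   # made-up aligned sequences
profile = column_profile(toy_msa)
```

Here the fourth column has low occupancy (one residue in four sequences), the kind of position a profile HMM would typically treat as an insertion rather than a match state.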
52
Q

HMMs model protein sequences as a series of probabilistic states (4)

A
  1. hidden states
  2. match states
  3. insertion states
  4. deletion states
53
Q

hidden states

A

represent the underlying biological events that are not directly observable

54
Q

match states

A

conserved positions in the sequence

55
Q

insertion and deletion states

A
  • Insertion states: positions where extra residues are added
  • Deletion states: positions where residues are missing
56
Q

HMMER

A

a tool that uses HMMs to search databases for sequences that match a given profile HMM (homology)

– Used to find homologous sequences, identifying evolutionary relationships across protein families

57
Q

SWISS-MODEL

A

automated protein structure homology-modelling platform for generating 3D models of a protein using a comparative approach.

*** novel proteins are very challenging

58
Q

when to use threading instead of homology modeling

A
  • In cases where sequence similarity to known structures is low (<30%), homology modeling becomes unreliable.
  • Threading matches sequences to known structural folds based on structural rather than sequence similarity

*** Phyre2, RaptorX, MUSTER, and I-TASSER are commonly used for threading, which takes much longer than homology modeling.

59
Q

identifying the right fold stepwise

A
  • sequences
  • LOMETS threading
    — template
  • template fragments for structure assembly
  • clustering
    — cluster centroid
  • structure re-assembly
  • lowest-energy structure
    — final model
  • TM-align search
  • PDB library
  • structural analogy
    — function prediction
60
Q

contact maps

A
  • A contact map is a 2D representation of which residues are in close proximity
  • allow for visualization of residue interactions in proteins
61
Q

contact maps and spatial proximity

A
  • determined by spatial proximity, not sequence order, typically within a certain distance threshold
  • Entries near the diagonal correspond to residues adjacent in sequence (and therefore close in space)
  • residues far apart in the sequence can still be close in the 3D structure, reflected in contact map
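A minimal sketch of building a contact map from coordinates, assuming one representative point per residue and an 8 Å cutoff. Both are common choices but are assumptions here; the coordinates are made up.

```python
import math

def contact_map(coords: list[tuple[float, float, float]], cutoff: float = 8.0) -> list[list[bool]]:
    """Boolean matrix: True where two residues lie within `cutoff` Å of each other."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) <= cutoff for j in range(n)] for i in range(n)]

# Made-up coordinates (Å) for a hairpin-like chain: residues 0 and 5 are far
# apart in sequence but close in space.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
              (7.6, 5.0, 0.0), (3.8, 5.0, 0.0), (0.0, 5.0, 0.0)]
cmap = contact_map(toy_coords)
```

In this toy chain `cmap[0][5]` is a contact even though the residues are five positions apart in sequence, which is the off-diagonal signal described on the card.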
62
Q

The Rise of Machine Learning in Structural Biology

A
  • Traditional methods like homology modeling and threading rely on templates and known structures
  • ML predicts 3D structures from sequence data alone
  • AlphaFold (DeepMind) and RoseTTAFold (Baker Lab) lead the charge in this area.
63
Q

AlphaFold

A
  • Developed by DeepMind

*** predicts protein structures with atomic accuracy by using deep learning models trained on large structural datasets

Breakthroughs:
- AlphaFold 2 achieved near-experimental accuracy in the 2020 CASP14 competition (Critical Assessment of protein Structure Prediction)
- AlphaFold 3 (2024) predicts structures of complexes that include proteins, DNA, RNA, ligands, and post-translational modifications.

64
Q

Coevolving residues mutate in a correlated manner

A
  • Mutations in one residue often result in compensatory mutations in its interacting partner
  • This is observed across species through analysis of homologous protein sequences
  • Correlated mutations indicate functionally significant residue pairs
65
Q

coevolution analysis

A
  • helps predict which residues are close in the 3D structure
  • Residues showing correlated mutations are likely to be spatially close in the folded protein
  • This is particularly useful when no experimental structure is available.
66
Q

coevolution detection

A
  • using large multiple sequence alignments (MSAs) from homologous proteins.
  • The more diverse the sequences in the MSA, the better the resolution of coevolving residues.
  • Evolutionary info from MSAs guides predictions for residue-residue contacts.
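One simple way to score correlated mutations between two MSA columns is mutual information. This is a sketch of that single measure, not the method used by any particular tool; practical approaches add corrections for indirect correlations and limited sequence diversity. The alignment is made up.

```python
import math
from collections import Counter

def mutual_information(msa: list[str], i: int, j: int) -> float:
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    pairs = Counter((seq[i], seq[j]) for seq in msa)   # joint counts
    col_i = Counter(seq[i] for seq in msa)             # marginal counts, column i
    col_j = Counter(seq[j] for seq in msa)             # marginal counts, column j
    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((col_i[a] / n) * (col_j[b] / n)))
    return mi

# Made-up alignment: columns 0 and 2 change together, column 1 varies independently.
toy_msa = ["DAK", "DGK", "EAR", "EGR"]
mi_coupled = mutual_information(toy_msa, 0, 2)
mi_uncoupled = mutual_information(toy_msa, 0, 1)
```

The coupled pair scores higher than the independent pair, which is the signal that suggests two residues may be in contact.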
67
Q

Coevolution example: DHFR

A
  • Residue pairs with a high score (i.e. that coevolve) are near each other in the protein’s structure (i.e. small distance)
68
Q

Coevolutionary signals can be noisy.

A
  • Not all correlated mutations are due to direct physical interactions; some may be indirect.
  • Noise from data can come from random mutations or insufficient evolutionary diversity.
  • Large and diverse sequence data sets are needed for reliable coevolution predictions.
69
Q

Machine learning leverages coevolution for high-accuracy predictions.

A
  • AlphaFold and RoseTTAFold utilize coevolutionary data from MSAs to predict residue interactions.
  • incorporate evolutionary info along with structural features, leading to highly accurate predictions.
70
Q

alphafold pipeline (evoformer)

A

input sequence and MSA –> ML models ==> prediction of atomistic structure

  • Using MSAs and contact maps, DeepMind trained a model to predict protein structures

– Contact maps are converted into dihedral angles

71
Q

What is new in AlphaFold 3?

A

Biggest change is the use of a diffusion model
Diffusion models essentially learn to unscramble atoms into a structure.

  • supercharged for any biomolecule

** breakthrough but not a final solution
– caveat is that proteins are dynamic

72
Q

alphafold and disordered proteins

A
  • At least 40% of proteins have disordered regions
  • AlphaFold (and all other methods) struggles with disordered regions.
    – ex/ LARP1
73
Q

protein movements

A
  • proteins undergo movements like folding, unfolding, and domain motions.
    – essential for binding, catalysis, and signal transduction.
    – Understanding dynamics is crucial for drug design, protein design, biotech, etc.

Protein structure determination and prediction provide fixed snapshots
***DO NOT capture the full range of functional conformations

74
Q

molecular dynamics (MD)

A
  • provide time-resolved insights into protein behavior
  • more realistic analysis of proteins
  • atoms are treated as classical particles (hard spheres)

– involves:
1. simulation of atomic movement
2. visualization and analysis

75
Q

Simulation of Atomic Movement

A
  • MD computes trajectories of atoms over time scales of femtoseconds to microseconds.
  • It can capture both small-scale vibrations and large-scale conformational changes.
76
Q

Visualization and Analysis

A
  • Provides detailed information on atomic interactions and energy changes.
  • Enables the study of mechanisms at an atomic level
77
Q

MD simulations provide more realistic analysis of proteins through..

A
  1. refinement of predicted structures
  2. Studying Intrinsically Disordered Proteins
  3. Folding and Misfolding Pathways
78
Q

Refinement of Predicted Structures (MD)

A
  • MD helps minimize energy and relax structures obtained from modeling.
  • Improves accuracy by accounting for environmental effects
79
Q

Studying Intrinsically Disordered Proteins (MD)

A
  • MD captures the flexible nature of disordered regions.
  • Aids in understanding functions that depend on disorder
80
Q

Folding and Misfolding Pathways (MD)

A
  • Simulates the folding process to identify intermediates.
  • Investigates misfolding mechanisms relevant to diseases.
81
Q

classical mechanics

A
  • Describes the motion of macroscopic objects
  • Assumes particles have well-defined positions and velocities
  • Governed by Newton’s Laws of Motion
    ** atoms are treated as hard spheres
82
Q

Quantum Mechanics

A
  • Necessary for describing behavior at atomic and subatomic scales
  • Accounts for wave-particle duality, uncertainty principle, proton tunneling
  • Electrons exhibit quantum behavior that cannot be captured classically
83
Q

Classical approximation impacts…

A

Nuclei →
- Nuclei (protons and neutrons) are much heavier than electrons.
- Their de Broglie wavelengths are very small, making quantum effects less significant
- At RT, thermal energies dominate over quantum zero-point energies.

Electrons →
- not explicitly simulated in classical MD.
- Their effects are included implicitly through potential energy functions (force fields).
- The electronic structure is assumed to remain in the ground state during simulation.

84
Q

suitable systems vs limitations of classical approximations

A

Suitable Systems:
- Biological macromolecules (proteins, nucleic acids, lipids)
- Materials where electronic excitations are not critical.
- Processes where bond breaking/forming does not occur.

Limitations:
- cannot accurately simulate chemical reactions involving electronic transitions.
- Quantum phenomena like tunneling and zero-point energy are not captured.

85
Q

Newton’s Second Law

A

The acceleration of an object is directly proportional to the net force acting on it and inversely proportional to its mass
– given atomic forces, we can calculate atomic movements
(F = ma)

86
Q

forces (NSLoM)

A
  • the negative gradients of potential energy
    – potential energy is dependent on positions of all atoms

– determines accelerations and thus motion of atoms
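A small sketch of "force = negative gradient of potential energy" in one dimension, using a toy quadratic potential and a central finite difference. The potential, its parameters, and the mass are arbitrary illustrative values.

```python
def toy_energy(x: float, k: float = 100.0, x0: float = 1.0) -> float:
    """Toy one-dimensional potential energy U(x) = 0.5 * k * (x - x0)^2."""
    return 0.5 * k * (x - x0) ** 2

def force(potential, x: float, h: float = 1e-6) -> float:
    """Force as the negative gradient of the potential: F = -dU/dx (central difference)."""
    return -(potential(x + h) - potential(x - h)) / (2.0 * h)

MASS = 12.0                              # arbitrary illustrative mass
f_displaced = force(toy_energy, 1.2)     # displaced past x0: force points back toward x0
acceleration = f_displaced / MASS        # Newton's second law, a = F / m
```

The sign of the force always points downhill on the energy surface, so atoms are pushed toward lower-energy positions.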

87
Q

Time evolution of the system

A
  • computed by integrating equations of motion
  • Continuous motion approximated using discrete time steps
    – Determine forces
    – Move a small amount forward in time
    – Repeat
  • Time step length determines how “smooth” the animation/trajectory is
88
Q

stepwise of molecular simulations computing an atomistic trajectory

A
  1. start from the 3D coordinates of atoms in the system
  2. atoms exert forces on each other
  3. using Newton’s equation of motion, we can predict their movement
89
Q

integration algorithms

A
  1. Numerical Solution:
    - Approximate the continuous equations of motion using discrete time steps
  2. Update Position and Velocities:
    - Calculate the new positions and velocities of particles based on current forces.
90
Q

Challenges Addressed by Integration Algorithms

A
  1. Stability: prevent numerical errors from accumulating over many time steps
  2. Accuracy: ensure that the trajectories closely follow the true physical behavior.
  3. Efficiency: balance computational speed with the precision of the simulation.
91
Q

Common Integration Algorithms

A
  1. Verlet: uses current and previous positions to calculate the next position.
  2. Velocity Verlet: an extension of the Verlet algorithm that explicitly calculates velocities.
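A minimal sketch of velocity Verlet for a single particle in one dimension, tested on a toy harmonic spring in arbitrary units. Production MD codes apply the same update to every atom in three dimensions; the names and parameters here are illustrative.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate one particle in 1D; returns lists of positions and velocities."""
    xs, vs = [x], [v]
    f = force(x)
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * (f / mass) * dt ** 2   # update position
        f_new = force(x)                               # forces at the new position
        v = v + 0.5 * (f + f_new) / mass * dt          # update velocity with averaged force
        f = f_new
        xs.append(x)
        vs.append(v)
    return xs, vs

K, MASS = 1.0, 1.0                         # toy harmonic spring, arbitrary units

def spring_force(x):
    return -K * x

xs, vs = velocity_verlet(1.0, 0.0, spring_force, MASS, dt=0.01, n_steps=1000)
energies = [0.5 * MASS * v ** 2 + 0.5 * K * x ** 2 for x, v in zip(xs, vs)]
```

Checking that the total energy stays close to its starting value is a quick test of the stability property listed among the integrator challenges.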
92
Q

time step length

A
  • determines how smooth the trajectory is
  • smaller time steps lead to more calculations to simulate same amount of time
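The cost trade-off is simple arithmetic. The sketch below assumes a 2 fs step as a typical value and compares it with 1 fs; both step sizes are illustrative choices, not values from the card.

```python
def steps_required(simulated_time_fs: float, dt_fs: float) -> int:
    """Number of integration steps needed to cover simulated_time_fs with step dt_fs."""
    return round(simulated_time_fs / dt_fs)

ONE_MICROSECOND_FS = 1e9                                # 1 µs = 10^9 fs
steps_2fs = steps_required(ONE_MICROSECOND_FS, 2.0)     # assumed typical step size
steps_1fs = steps_required(ONE_MICROSECOND_FS, 1.0)     # half the step, twice the work
```

Each step requires a full force evaluation, so halving the time step roughly doubles the computational cost for the same simulated time.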
93
Q

force fields

A
  • used to compute energies and atomic forces
  • sets of equations that describe the potential energy of a molecule based on atomic positions
  • based on dynamics of bond lengths, bond angles, and dihedral angles
94
Q

chemical bonds

A
  • behave like springs
  • Two spheres (atoms) connected by a single spring
  • The spring resists changes in the distance between the two atoms
  • bond vibrations are seen as harmonic oscillators
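The spring picture can be written as a harmonic energy term. This sketch uses the 0.5 * k * (r - r0)^2 convention (some force fields fold the 0.5 into k), and the parameter values are illustrative only, not taken from any published force field.

```python
import math

def bond_energy(atom_a, atom_b, k: float, r0: float) -> float:
    """Harmonic bond energy 0.5 * k * (r - r0)^2 for two atoms given as (x, y, z) in Å."""
    r = math.dist(atom_a, atom_b)       # current bond length
    return 0.5 * k * (r - r0) ** 2

K_BOND = 600.0   # kcal/mol/Å^2, illustrative
R0 = 1.53        # Å, illustrative equilibrium length

e_relaxed = bond_energy((0.0, 0.0, 0.0), (1.53, 0.0, 0.0), K_BOND, R0)
e_stretched = bond_energy((0.0, 0.0, 0.0), (1.63, 0.0, 0.0), K_BOND, R0)
e_compressed = bond_energy((0.0, 0.0, 0.0), (1.43, 0.0, 0.0), K_BOND, R0)
```

Stretching and compressing by the same amount cost the same energy, which is what makes the oscillator harmonic.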
95
Q

Spring constants

A
  • are determined by bond order and atom types
  • the spring constant (k, in kcal/mol/Å²) increases as bond length decreases
    – bond length: single > double > triple, so stiffness: triple > double > single
96
Q

Bond angles behave like…harmonic oscillators

A
  • Three balls connected by 2 springs forming an angle, with a “hinge” at the central atom.
    — We also have separate spring constants for bond angles.
97
Q

dihedral angle

A
  • defined by four sequentially bonded atoms (A-B-C-D)
  • the angle between the plane containing A-B-C and the plane containing B-C-D
  • describes the rotation around the bond between atoms B and C.

*** do not behave like springs
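A sketch of computing a dihedral angle from four atom positions using the two plane normals. The helper names are illustrative, and the sign convention of the result depends on the formula chosen, so only magnitudes are discussed here.

```python
import math

def _sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])

def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dihedral_degrees(a, b, c, d) -> float:
    """Dihedral angle (degrees) for atoms A-B-C-D, i.e. rotation about the B-C bond."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1 = _cross(b1, b2)   # normal of the plane containing A, B, C
    n2 = _cross(b2, b3)   # normal of the plane containing B, C, D
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Made-up geometries sharing the same A, B, C atoms
A, B, C = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
cis = dihedral_degrees(A, B, C, (1.0, 1.0, 0.0))     # D on the same side as A
trans = dihedral_degrees(A, B, C, (1.0, -1.0, 0.0))  # D on the opposite side
perp = dihedral_degrees(A, B, C, (1.0, 0.0, 1.0))    # D rotated a quarter turn
```

The three test geometries give magnitudes of 0, 180, and 90 degrees respectively.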

98
Q

Dihedrals VS Bonds and Angles

A

Bonds and Angles:
- govern local geometry (bond lengths/angles) using quadratic (harmonic) potentials that favor specific distances and angles

Dihedrals:
- govern torsional or rotational flexibility around bonds, typically using periodic and multi-well potentials to allow for multiple stable conformations.

99
Q

dihedral potentials

A
  • capture arbitrary functions with rotational symmetry.
    ex/ periodic energy functions with varying minima
  • can be modeled using custom fourier series
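A sketch of a periodic dihedral term written as a short cosine series, using the widely used form k * (1 + cos(n*phi - delta)). The coefficients are made up for illustration and do not come from any published parameter set.

```python
import math

def torsion_energy(phi_deg: float, terms: list[tuple[float, int, float]]) -> float:
    """Sum of periodic terms k * (1 + cos(n*phi - delta)); each term is (k, n, delta_deg)."""
    phi = math.radians(phi_deg)
    return sum(k * (1.0 + math.cos(n * phi - math.radians(delta))) for k, n, delta in terms)

# Illustrative three-fold term: minima repeat every 120 degrees.
threefold = [(1.4, 3, 0.0)]
e_staggered = torsion_energy(60.0, threefold)   # cos(180°) = -1, energy minimum
e_eclipsed = torsion_energy(0.0, threefold)     # cos(0°) = 1, energy maximum
```

Adding further `(k, n, delta)` terms to the list reshapes the profile, for example making some minima deeper than others.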
100
Q

fourier series

A
  • approximate functions as a sum of sine and cosine waves
  • can approximate any periodic rotational energy function
101
Q

adding more sine and cosine terms for fourier series

A

improves the approximation

  • allows the series to closely match the original complex function
102
Q

Noncovalent Interactions Role in Molecular Assembly

A
  • Facilitate the organization of molecules into complex structures
  • Determine the macroscopic properties of materials (e.g. solubility, melting points)
103
Q

Noncovalent Interactions Importance in Biological Systems

A
  • Govern essential processes like enzyme-substrate binding, protein folding, and membrane formation
  • Critical for understanding biochemical pathways and drug design
104
Q

Why are Noncovalent Interactions crucial for MD?

A

Covalent bonds define the primary structure of molecules, but noncovalent interactions dictate how molecules interact with each other.

105
Q

Dispersion Forces

A

Nature:
- weak, attractive forces arising from instantaneous dipoles in molecules
Role:
- stabilize molecular assemblies by promoting close packing

C6 = dispersion coefficient

106
Q

Repulsion Forces

A

Nature:
- Strong, short-range forces due to overlapping electron clouds.
Role:
- Prevent atoms from collapsing into each other, maintaining molecular integrity

C12 = repulsion coefficient

107
Q

Combined van der Waals Potential

A

Van der Waals forces are modeled using the Lennard-Jones potential
— captures both the attractive and repulsive aspects of noncovalent interactions.
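
A small sketch of the Lennard-Jones form using the C6 (dispersion) and C12 (repulsion) coefficients named in the earlier cards, assuming V(r) = C12/r^12 - C6/r^6. The well depth and contact distance below are hypothetical values chosen only for illustration.

```python
def lennard_jones(r, c6, c12):
    """Lennard-Jones energy: repulsion C12/r^12 minus dispersion C6/r^6."""
    return c12 / r**12 - c6 / r**6

# Hypothetical well depth (kcal/mol) and contact distance (angstrom).
epsilon, sigma = 0.2, 3.4
c6 = 4.0 * epsilon * sigma**6
c12 = 4.0 * epsilon * sigma**12

# Distance of the energy minimum, where attraction and repulsion balance.
r_min = (2.0 * c12 / c6) ** (1.0 / 6.0)
print(r_min, lennard_jones(r_min, c6, c12))
```

Below sigma the repulsive term dominates and the energy is positive; at r_min the energy equals the negative of the well depth.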

108
Q

Electrostatic forces decay

A
  • electrostatic interaction energy decays as 1/r, making it significant over longer distances than van der Waals interactions (dispersion decays as 1/r^6)

*** Electrostatic Interactions Drive Charged and Polar Molecule Behavior

109
Q

what makes up the complete force field?

A

bonded and non-bonded interactions

110
Q

parameterizing force fields starts with

A

Begins with Quantum Mechanical Data for Small Molecules

  1. QM calculations
  2. data utilization
  3. small molecule focus for simplicity and accuracy
111
Q

Role of Quantum Mechanics in Parameterizing Force Fields

A

QM Calculations:
- provide high-accuracy data on molecular geometries, energetics, and electronic distributions

Data Utilization:
- QM data inform the selection and tuning of force field parameters to ensure they reflect true molecular behavior.

112
Q

Small Molecule Focus for Parameterizing Force Fields

A

Simplicity:
- Smaller molecules have fewer atoms and simpler interactions, making QM calculations more manageable.

Accuracy:
- QM methods (e.g. Density Functional Theory, Hartree-Fock) yield precise information essential for initial parameterization.

113
Q

Complexity of Proteins in Force Field Parameterization

A

Size & Structure:
- proteins consist of hundreds to thousands of atoms with intricate 3D structures.

Diverse Interactions:
- include a variety of noncovalent interactions, such as hydrogen bonds, ionic bonds, hydrophobic interactions, and van der Waals forces.

114
Q

Limitations of QM for Large Systems for Force Field Parameterization

A

Computational Cost:
- QM calculations become computationally prohibitive for large biomolecules like proteins.

Alternative Strategies:
- Utilize QM data from representative small segments or use empirical and semiempirical methods.

115
Q

Types of Experimental Data (experimental data is crucial for refining force field parameters)

A
  1. Spectroscopic Data:
    - Infrared (IR), Nuclear Magnetic Resonance (NMR), and Raman spectroscopy provide insights into bond vibrations and molecular geometries.
  2. Crystallography:
    - X-ray crystallography offers precise information on atomic positions and molecular conformations.
  3. Thermodynamic Measurements:
    - Data on melting points, boiling points, and solvation energies inform interaction strengths.
116
Q

Parameter Optimization

A

Fitting Process:
- adjusts force field parameters to minimize discrepancies between simulation results and experimental observations.

Validation Metrics:
- use root-mean-square deviations (RMSD), binding affinities, and structural stability as benchmarks.

117
Q

Fitting Force Field Parameters to Experimental Data…

A

Ensures Realistic Simulations
– uses parameter adjustment

118
Q

Parameter Adjustment for Fitting Force Field Parameters

A

Process:
- fine-tune force field parameters to minimize discrepancies between simulation outcomes and experimental observations.

Techniques:
- use optimization algorithms and statistical methods to achieve best-fit parameters
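
As a toy illustration of best-fit parameters, the sketch below fits a single harmonic force constant to reference energies by linear least squares. It assumes the form E = 0.5 * k * (r - r0)^2 (some force fields omit the factor 0.5), and the "reference" energies are synthetic values generated from a known k, not real quantum mechanical or experimental data.

```python
def harmonic_energy(r, k, r0):
    """Harmonic bond energy, assuming the 0.5 * k * (r - r0)^2 convention."""
    return 0.5 * k * (r - r0) ** 2

def fit_force_constant(lengths, energies, r0):
    """Closed-form least-squares estimate of k for a fixed equilibrium length r0."""
    xs = [0.5 * (r - r0) ** 2 for r in lengths]
    return sum(x * e for x, e in zip(xs, energies)) / sum(x * x for x in xs)

# Synthetic reference data (hypothetical bond scan around r0 = 1.53 angstrom).
r0 = 1.53
lengths = [1.43, 1.48, 1.53, 1.58, 1.63]
reference = [harmonic_energy(r, 600.0, r0) for r in lengths]

k_fit = fit_force_constant(lengths, reference, r0)
print(k_fit)
```

Real parameterization fits many coupled parameters at once, which is why iterative optimizers are used instead of a one-line closed form.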

119
Q

Challenges in Parameterizing Force Fields for Proteins

A
  1. High Dimensionality
  2. Diverse Chemical Environments
  3. Dynamic Conformational Changes
  4. Long-Range Electrostatic Interactions
120
Q

high dimensionality challenge

A

Issue:
- proteins possess numerous degrees of freedom, making comprehensive parameterization computationally intensive.

Solution:
- utilize advanced optimization techniques and high-performance computing resources.

121
Q

Diverse Chemical Environments Challenge

A

Issue:
- Different regions of a protein (e.g. active sites, hydrophobic cores) experience varied chemical environments

Solution:
- Develop region-specific parameters or use adaptive force fields that can account for environmental variations.

122
Q

Dynamic Conformational Changes challenge

A

Issue:
- proteins frequently undergo conformational shifts that must be accurately captured by the force field.

Solution:
- Incorporate flexible dihedral terms and ensure that parameters support a wide range of conformational states.

123
Q

Long-Range Electrostatic Interactions Challenge

A

Issue:
- Accurate modeling of electrostatics in large, charged systems is computationally demanding

Solution:
- Implement efficient algorithms like Particle Mesh Ewald (PME) and use approximations where appropriate.

124
Q

Summary of Force Field Parameterization Process →
Step-by-Step Process

A
  1. Quantum Mechanical Calculations:
    - obtain high-accuracy data for small molecules and representative fragments
  2. Empirical Data Integration:
    - Incorporate experimental measurements to validate and refine parameters
  3. Parameter Optimization:
    - adjust force field parameters through iterative simulations and comparisons
  4. Advanced Techniques:
    - utilize machine learning, multi-scale modeling, and automated pipelines to enhance parameter accuracy and efficiency.
125
Q

Common Force Fields

A

*** Different force fields are tailored for specific types of molecules and applications

AMBER, CHARMM, OPLS

126
Q

AMBER

A

optimized for proteins and nucleic acids
– optimized for biomolecular interactions

127
Q

CHARMM

A
  • versatile, used for a wide range of biomolecules
  • known for its extensive parameter set, suitable for complex systems including proteins, lipids, and membranes
128
Q

OPLS

A
  • focuses on liquids and organic molecules
  • optimized for small molecules, organic compounds, and polymers, with emphasis on accurate non-bonded interactions
129
Q

Selection Criteria for Force Fields

A
  • Compatibility with the system being studied
  • Availability of parameters for the molecules of interest
130
Q

5,6,7,8-tetrahydrofolate (THF)

A
  • crucial for cell growth
  • Producing red blood cells
  • Synthesizing purines
  • Interconverting amino acids
  • Methylating tRNA
  • Generating and using formate
131
Q

Disrupting THF production

A
  • has a cascading effect on essential cellular processes, primarily affecting DNA and RNA synthesis and amino acid metabolism

***This is a useful process for drug design.

132
Q

DHFR

A
  • Dihydrofolate reductase (DHFR) is a crucial enzyme that produces THF from dihydrofolate (DHF)

DHF + NADPH → THF + NADP(+)

133
Q

DHFR uses

A

studied as an antibiotic (e.g. trimethoprim) and cancer (e.g. methotrexate) target

134
Q

DHFR conservation

A
  • complicates drug design
  • if a patient with a bacterial infection is prescribed a drug that only loosely targets DHFR, it may also inhibit human DHFR
    – deleterious side effects

***Bacterial and human DHFR have high structural similarity, even around the active site

  • Bacterial and human DHFR have similar structures, but their dynamics are different
    – must ensure drugs bind only to the bacterial protein by exploiting insights from dynamics
135
Q

Simulating DHFR

A
  • provides insight into druggable conformations
  • explore various low-energy conformations that are, hopefully, similar to reality
  • Knowing conformations unique to bacteria allows us to design a small molecule that competitively inhibits DHFR
136
Q

before starting any molecular simulation…

A
  • need a starting structure
  • If our starting structure is very far away from our desired equilibrium, our simulations will take longer
  • NO static structure for experiment
137
Q

Things that could go wrong with using static structure from experiment

A
  • Low-quality experimental structures
  • Inaccurate computational predictions
  • High-energy conformations
  • Missing or incorrect cofactors

** wait for the protein to fold to study its dynamics

138
Q

experimental structures

A
  • Experimental structures offer the best option for their accuracy
  • PDB contains experimentally determined structures for thousands of proteins (not all equally suitable for simulations)

– General resolution preference: X-ray, then cryo-EM, then NMR

138
Q

factors for choosing the best experimental structures

A
  • resolution
  • completeness
  • functional state
  • B-factors
139
Q

Resolution

A
  • refers to how well the atomic positions are determined
    – Resolution below 2.0 Å is generally preferred for high-quality simulations
    – high R-factors indicate lower structural accuracy
140
Q

Completeness

A

Flexible loops or disordered regions are often missing from the structure

141
Q

Functional State

A

Proteins can exist in different functional conformations: active vs inactive state, bound to ligands or unbound

142
Q

B-factors

A

Higher B-factors suggest more uncertainty in atom positions, which might make that part of the structure less reliable

143
Q

Simulations cannot have missing…

A

…residues (specific amino acids in the protein)

  • It’s essential to fix chain breaks and missing loops before simulation

— dashed lines indicate unknown and missing info

144
Q

how to add missing residues

A

Missing atoms or residues can be added using modeling software like MODELLER
(protein model prediction programs)

145
Q

removing some components from PDB structures

A
  • components like ligands or non-essential ions should be removed
  • ligands, ions, or crystallization agents that are not physiologically relevant

***Distorts protein’s behavior in a simulated biological environment if not removed

146
Q

Correct protonation states

A
  • are essential for accurate simulations
  • Experimental structures often cannot resolve hydrogens, so we need to add them ourselves
147
Q

pH-sensitive residues

A

Protonation states of amino acids affect the charge distribution, which influences electrostatic interactions during the simulation

148
Q

Histidine (His, H)

A
  • pKa ~6.0
  • Protonation switching around pH 6-7
149
Q

Cysteine (Cys, C)

A
  • pKa ~8.3
  • Could form disulfide bonds in oxidizing environments
150
Q

Aspartic Acid (Asp, D)

A
  • pKa ~3.9
  • Affects interactions like salt bridges and hydrogen bonds
151
Q

Lysine (Lys, K)

A
  • pKa ~10.5
  • Can form ionic bonds with negatively charged residues
152
Q

Glutamic Acid (Glu, E)

A
  • pKa ~4.2
  • Glu’s protonation state affects electrostatic interactions
153
Q

Tyrosine (Tyr, Y)

A
  • pKa ~10.1
  • Involved in hydrogen bonding and in enzyme active sites
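
A rough sketch of how the pKa values on these cards translate into protonation states, assuming the Henderson-Hasselbalch relation for an isolated side chain. The pH of 7.4 is an assumed example value, and real pKa values can shift substantially inside a folded protein, so this is only a first estimate.

```python
def fraction_protonated(pH, pKa):
    """Henderson-Hasselbalch: fraction of a titratable group that carries its proton."""
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))

# Approximate side-chain pKa values as listed on the cards above.
pka_values = {"His": 6.0, "Cys": 8.3, "Asp": 3.9, "Lys": 10.5, "Glu": 4.2, "Tyr": 10.1}

# Assumed pH of 7.4 for illustration.
for name, pka in pka_values.items():
    print(name, round(fraction_protonated(7.4, pka), 3))
```

Histidine sits closest to neutral pH, which is why its protonation state usually needs the most attention during structure preparation.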
154
Q

DHFR is localized in the cytoplasm, which contains a multitude of chemical species

A

ions, molecules, proteins, organelles, cytoskeleton, membranes

155
Q

how to balance computational feasibility with biological realism

A
  • Protein of interest (already prepared)
  • Water molecules at the appropriate temperature (310 K) and pressure (1 atm)
  • Cations (Na+ and K+) and anions (Cl-) at an ionic strength of 150 millimolar
  • Any cofactors (e.g. NADPH and folate for DHFR)
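
To make the 150 millimolar figure concrete, the sketch below estimates how many cation/anion pairs a cubic box would need. The 80 angstrom box edge is a hypothetical size, the estimate uses the full box volume rather than the solvent volume, and any extra counterions needed to neutralize the protein's net charge are not included.

```python
AVOGADRO = 6.02214076e23          # particles per mole
LITERS_PER_CUBIC_ANGSTROM = 1e-27  # 1 angstrom^3 = 1e-30 m^3 = 1e-27 L

def ion_pairs(box_edge_angstrom, molar_concentration):
    """Number of cation/anion pairs for a cubic box at a given salt concentration."""
    volume_liters = box_edge_angstrom ** 3 * LITERS_PER_CUBIC_ANGSTROM
    return round(molar_concentration * volume_liters * AVOGADRO)

# Hypothetical 80 angstrom cubic box at 150 mM salt.
print(ion_pairs(80.0, 0.150))
```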
156
Q

realistic systems do not have…

A

walls
- solved with periodic boundary conditions (PBC)

157
Q

why do we have PBCs for MD simulations?

A
  • a protein in vivo will have lots of room to move around
    — could make box very large, but that is very costly
  • with a walled box, we would have to apply forces to keep molecules inside
  • water molecules and proteins would bounce off these walls in an unphysical manner (edge effects)
  • PBC simulate infinite systems from a finite box
158
Q

periodic boundary conditions (PBC)

A
  • PBC simulate infinite systems from a finite box
    – We (virtually) place exact copies of our system in all directions

Atoms that cross the box edge reappear on the other side, so there are no edge effects
— think PacMan game

159
Q

why are force fields parameterized?

A

to reproduce quantum chemical and experimental data

160
Q

minimum image convention (MIC)

A
  • ensures that an atom in the primary box only interacts with the closest image of another atom
  • Image atoms in adjacent boxes are used to calculate interactions across the boundaries
    (ensures correct interactions)
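
A minimal sketch of both ideas for a cubic box: wrapping a coordinate back into the primary box (the PacMan behavior) and applying the minimum image convention to a separation. The function names and the 10 angstrom box are illustrative; production MD codes also handle non-cubic boxes.

```python
import math

def wrap(x, box):
    """Map a coordinate back into the primary box [0, box)."""
    return x % box

def minimum_image(dx, box):
    """Shift a separation by whole box lengths so it points to the nearest image."""
    return dx - box * round(dx / box)

def mic_distance(a, b, box):
    """Distance between two atoms using the closest periodic image."""
    return math.sqrt(sum(minimum_image(ai - bi, box) ** 2 for ai, bi in zip(a, b)))

box = 10.0
print(wrap(10.5, box))  # an atom leaving one face re-enters at the opposite face
print(mic_distance((1.0, 0.0, 0.0), (9.0, 0.0, 0.0), box))
```

The two atoms in the example are 8 angstroms apart inside the box but only 2 angstroms apart across the boundary, and the shorter separation is the one used.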
161
Q

Force field parameterization steps overall

A
  1. Generate structures and use quantum chemistry to compute energy and forces
  2. Optimize force field parameters until they reproduce the quantum chemistry data set
  3. Run MD simulations and predict experimental data (e.g. NMR, Raman spec, solvation energies, etc)
  4. Continue to optimize force field parameters to minimize quantum chemistry and simulation prediction errors
162
Q

Force fields are dependent on…

A
  • Force fields are dependent on their fitting data and simulation setup

– Force fields are not inherently compatible with each other (mixing them can make simulations unreliable)

  • Ex/ protein force fields and DNA force fields are fit to different molecule types (proteins vs. DNA/RNA)

** therefore, combine only force fields that are compatible by design or validated against experimental data

163
Q

Key factors for selecting a force field

A
  • System type: different force fields are optimized for specific systems
  • Accuracy VS speed: high accuracy force fields may require more computational resources
  • Compatibility: choose a force field based on compatibility with available topology generators and the type of molecules in your simulations
164
Q

Topology files

A
  • define the molecular structure and interactions in the simulations
  • contains info on atom types, bonds, angles, dihedrals, and non-bonded interactions based on the chosen force field

*** essentially tells the program which force field parameters to use and where

165
Q

when is additional parameterization required?

A
  • Complex molecules and ligands require parameterization and careful integration
  • Non-standard residues or ligands are not always included in standard force field parameter sets
    – require additional parameterization to ensure proper interactions in the simulation
166
Q

Energy minimization

A
  • necessary before running molecular dynamics simulations
  • adjust the initial structure to remove unfavorable atom positions and steric clashes that could cause instability during simulations

** Without minimization, high-energy configurations may lead to unrealistic results or early failures in the molecular dynamics simulations

167
Q

energy minimization and steric clashes

A
  • removes steric clashes and optimizes the initial geometry

— Steric clashes occur when atoms are too close, resulting in excessively high energy

– Energy minimization gently adjusts the structure to lower the system’s energy
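
A toy sketch of the idea: steepest descent on a single pair of atoms that start too close together, using a Lennard-Jones energy in reduced units (epsilon = sigma = 1). The step size, move cap, and starting distance are assumed values; real minimizers work on all coordinates at once and often use more sophisticated algorithms.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy in reduced units."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)

def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative of the Lennard-Jones energy with respect to distance."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * s6 * s6 + 6.0 * s6) / r

def minimize(r, step=1e-3, max_move=0.01, tol=1e-8, max_steps=100000):
    """Steepest descent with a capped move so the clash is relaxed gently."""
    for _ in range(max_steps):
        grad = lj_gradient(r)
        if abs(grad) < tol:
            break
        move = max(-max_move, min(max_move, -step * grad))
        r += move
    return r

r_start = 0.9  # clashing distance: the energy here is large and positive
r_final = minimize(r_start)
print(lj_energy(r_start), lj_energy(r_final))
```

Capping each move is one simple way to keep the large forces from a steric clash from throwing atoms far apart in a single step.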

167
Q

Statistical physics at the molecular level involves 3 concepts

A
  1. Number of particles:
    - biological systems contain billions of atoms interacting simultaneously
  2. Thermal motion:
    - atoms and molecules are in constant motion due to thermal energy
  3. Uncertainty and variability:
    - exact positions and velocities of particles are inherently uncertain
168
Q

Observable properties

A

averages of atomistic behaviors on macroscopic and microscopic levels

169
Q

Microscopic VS Macroscopic levels

A

Microscopic level:
- individual atoms and molecules

Macroscopic level:
- bulk properties from collective behavior

170
Q

Atomistic system

A

stochastic (randomly determined), measurable properties are computed as averages.

171
Q

Statistical mechanics

A

uses statistical methods to relate microscopic properties to macroscopic observables

172
Q

Macrostate

A
  • specifies the temp, pressure, volume, and number of particles of molecular systems
    — Large scale system that defines properties of molecular system
  • changing values of temp, pressure, volume, etc changes the macrostate
    *** essentially infinite number of macrostates
173
Q

ensemble

A

the collection of all possible microstates of a single macrostate

174
Q

microstate

A

a unique configuration defined by the positions and velocities of all particles
— a specific configuration of a system by knowing positions and velocities of all particles

174
Q

Accurate ensemble averages require…

A
  • sampling every possible configuration
  • Longer simulations provide better sampling of microstates and their probabilities
    – e.g. a more accurate hydrogen bond distance estimate
175
Q

multiple microstates (i.e. configurations) can have the same…

A

distance
– measure the weighted mean of the microstates
— used to compute expected value of ensemble
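
A small sketch of a weighted mean over microstates, assuming each microstate is weighted by its Boltzmann factor exp(-E / kB T). The distances and energies are made-up numbers. For frames taken from a properly thermostatted MD trajectory, the sampling already reflects these probabilities, so a plain average over frames plays the same role.

```python
import math

KB = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def boltzmann_average(values, energies, temperature):
    """Expected value of an observable over microstates with Boltzmann weights."""
    beta = 1.0 / (KB * temperature)
    e_min = min(energies)  # shift energies for numerical stability
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Hypothetical hydrogen bond distances (angstrom) and microstate energies (kcal/mol).
distances = [1.8, 2.0, 2.0, 2.6]
energies = [0.0, 0.3, 0.3, 1.5]
print(boltzmann_average(distances, energies, 310.0))
```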

176
Q

Microcanonical Ensemble (NVE) →

A
  • Fixed number of particles (N)
  • Volume (V)
  • Energy (E)
177
Q

Canonical Ensemble (NVT) →

A
  • Fixed number of particles (N)
  • Volume (V)
  • Temperature (T)
178
Q

Isothermal-Isobaric Ensemble (NPT) →

A
  • Fixed number of particles (N)
  • Pressure (P)
  • Temperature (T)

*** most common

179
Q

What does constant temperature mean?

A

Remember: macrostate observables are ensemble averages

— The instantaneous temperature of microstates will fluctuate, but the ensemble average should be constant

*** There should be no net flow of energy!!!

*** Kinetic energy determines temperature

180
Q

Kinetic energy

A
  • determines temperature
  • Particle velocities determine kinetic energy
    – not every particle has the same velocity; they generally follow the Maxwell-Boltzmann distribution
181
Q

Most Probable Velocity

A

the velocity at which the peak of the distribution occurs

182
Q

Average Velocity

A

the mean velocity of all particles

183
Q

Temperature Dependence

A

higher temperatures shift the distribution toward higher velocities
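
The most probable and average speeds of the Maxwell-Boltzmann distribution have closed forms, sketched below under the ideal-gas assumptions behind that distribution. The molar mass is that of water and the temperatures are example values chosen for illustration.

```python
import math

R = 8.314462618  # gas constant in J/(mol*K)

def most_probable_speed(temperature, molar_mass_kg):
    """Speed at the peak of the Maxwell-Boltzmann speed distribution (m/s)."""
    return math.sqrt(2.0 * R * temperature / molar_mass_kg)

def mean_speed(temperature, molar_mass_kg):
    """Mean speed of the Maxwell-Boltzmann speed distribution (m/s)."""
    return math.sqrt(8.0 * R * temperature / (math.pi * molar_mass_kg))

water = 0.018015  # molar mass of water in kg/mol
for temperature in (280.0, 310.0, 340.0):
    print(temperature, most_probable_speed(temperature, water), mean_speed(temperature, water))
```

The mean speed is always slightly larger than the most probable speed because the distribution has a long tail toward high speeds.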

184
Q

Thermostats

A

adjust the velocities of particles to increase or decrease the system’s kinetic energy → thereby controlling the temperature

185
Q

Berendsen thermostat

A

adjusts the velocities of all particles uniformly based on the current temperature and target temperature

– indicated by velocity scaling factor
—– Velocity scaling factor is computed by slowly/carefully scaling the current velocity based on the temperature deviation

186
Q

velocity scaling factor

A
  • computed by slowly/carefully scaling the current velocity based on the temperature deviation
  • prevents abrupt changes that could destabilize the simulation

— Simple velocity scaling does not generate a true canonical (NVT) ensemble; it cannot reproduce realistic temperature fluctuations
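
A sketch of the scaling factor, assuming the standard Berendsen weak-coupling form lambda = sqrt(1 + (dt / tau) * (T_target / T_current - 1)). The time step, coupling time, temperatures, and velocities are hypothetical example values.

```python
import math

def berendsen_scale(t_current, t_target, dt, tau):
    """Velocity scaling factor for Berendsen weak coupling."""
    return math.sqrt(1.0 + (dt / tau) * (t_target / t_current - 1.0))

# System is slightly too cold, so velocities are scaled up a little.
lam = berendsen_scale(t_current=290.0, t_target=310.0, dt=0.002, tau=0.1)
velocities = [1.0, -2.0, 0.5]
scaled = [lam * v for v in velocities]
print(lam, scaled)
```

A larger coupling time tau gives a factor closer to 1, which is what makes the correction slow and gentle.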

187
Q

particle collisions are…

A

mass dependent

188
Q

Berendsen thermostats VS Nose-Hoover Thermostat

A
  • The Berendsen thermostat inaccurately models thermal energy transfer via particle collisions
  • The Nose-Hoover thermostat uses momenta scaling, which provides realistic kinetic energy and thus temperature control
189
Q

Nose-Hoover thermostat

A
  • connect particle momenta to fictitious heat bath
  • Heat bath allows thermal energy to flow in and out of our simulation
  • Momenta scaling provides realistic kinetic energy and thus temperature control
  • dependent on Q ⇒ a “mass” coupling parameter that controls thermostat responsiveness
190
Q

Barostats and pressure

A
  • Barostats maintain desired pressure during simulations
  • Adjusts the volume of the simulation box to achieve and maintain target pressure
191
Q

pressure

A

directly proportional to density and temp

192
Q

NkBT

A
  • represents thermal energy of ideal gas
  • assumes non-interacting particles and elastic collisions
193
Q

{W}

A
  • virial corrections to real gas
  • corrects for intermolecular forces in pressure equation
194
Q

Berendsen Barostat

A
  • Gentle Pressure Stabilization
  • Same concept as Berendsen thermostat: Scale box volume based on pressure difference to target
  • atomic positions get scaled with box size
  • velocities do not get affected
  • using barostats, we can keep a consistent macrostate!!!
195
Q

With thermostats and barostats, we can…

A

keep a consistent macrostate!!!

196
Q

Initial configurations

A
  • are not in true thermodynamic equilibrium
  • starting structures often come from experiments not relevant for our simulations
  • After minimization, we run a short simulation to let the system adjust to the desired macrostate
197
Q

Why discard the initial relaxation/configurations?

A
  • We discard the initial relaxation as it is not our desired macrostate
  • Once macrostate variable(s) reach steady state, we are now sampling valid microstates
198
Q

Production simulation sampling

A
  • sample microstates from our desired macrostate
  • Ensemble averages improve with more simulation time by sampling more microstates

*** “Replicates” do not exist as they do in experimental biology and chemistry

199
Q

Production simulation sampling timeline

A

NVT
- short simulation to relax to temperature of interest

NPT
- short simulation to relax to density of interest

NPT
- long simulation process

200
Q

Multiple shorter simulations or one long one?

A

multiple short simulations provide better sampling of microstates

201
Q

Random initial velocities

A

*** provide a better chance of sampling different microstates

  1. The simulation starts at one point on the potential energy surface (PES)
  2. Initial velocities send it in a particular direction
  3. There is a chance that it never samples a given minimum
  4. Multiple simulations with random velocities reduce this chance
202
Q

Root Mean Square Deviation (RMSD)

A
  • measures the overall change in the structure during a simulation, tracking deviations from the starting conformation

— monitors global conformational changes

  • The difference between the coordinates represents the displacement of atom i from its reference position at time t
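
A minimal RMSD sketch, assuming the frame has already been aligned (superimposed) onto the reference and that every atom has equal weight. The coordinates are made-up values for a three-atom toy system.

```python
import math

def rmsd(coords, reference):
    """Root mean square deviation between two equally sized lists of (x, y, z)."""
    squared = sum((a - b) ** 2
                  for atom, ref_atom in zip(coords, reference)
                  for a, b in zip(atom, ref_atom))
    return math.sqrt(squared / len(reference))

reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
frame = [(0.0, 0.2, 0.0), (1.5, -0.1, 0.0), (3.0, 0.3, 0.0)]
print(rmsd(frame, reference))
```

Without the alignment step, a rigid translation or rotation of the whole protein would inflate the RMSD even though the conformation did not change.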
203
Q

Low VS High RMSD

A
  • Low RMSD → the structure is very similar to the reference structure (e.g., stable conformation)
  • High RMSD → indicates significant deviation, suggesting large structural changes or flexibility over time
204
Q

Root Mean Square Fluctuation (RMSF)

A
  • identifies regions of flexibility in the protein by calculating the fluctuation of each atom or residue

– Tracking Local Flexibility

  • This measures how much the atom is fluctuating around its mean, not relative to a reference structure
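
A matching RMSF sketch: for each atom, the fluctuation is measured around that atom's own time-averaged position. It assumes the trajectory is a list of aligned frames, each a list of (x, y, z) tuples; the two-atom trajectory is a made-up example.

```python
import math

def rmsf(trajectory):
    """Per-atom root mean square fluctuation around each atom's mean position."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        mean = [sum(frame[i][d] for frame in trajectory) / n_frames for d in range(3)]
        msf = sum(sum((frame[i][d] - mean[d]) ** 2 for d in range(3))
                  for frame in trajectory) / n_frames
        result.append(math.sqrt(msf))
    return result

# Atom 0 stays fixed; atom 1 swings back and forth along x.
trajectory = [
    [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0)],
]
print(rmsf(trajectory))
```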
205
Q

High VS Low RMSF

A

High RMSF → the atom fluctuates a lot, indicating flexibility (often seen in loops or solvent-exposed regions)

Low RMSF → atom remains relatively fixed in place, suggesting rigidity (common in well-ordered regions like helices or beta-sheets)

206
Q

Potential of Mean Force (PMF)

A
  • effective potential that governs the behavior of a system along a collective variable
  • A collective variable defines the progress of an interaction or molecular reaction

— common collective variables include distances between atoms, bond angles, or dihedral angles.

207
Q

1D potential energy surface

A
  • This shows you the average energy with respect to h
  • Bond length is a particular angstroms apart
  • Important: This is not a covalent bond, so it will not look like our spring model

*** Nature prefers to spend time in low-energy conformations

208
Q

Probability and energy

A
  • Probability and energy are intricately linked [ W(x) vs P(x) ]
    – plotted together, the curves mirror each other: energy minima correspond to probability maxima
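
The link between W(x) and P(x) can be sketched with the standard relation W(x) = -kB T ln P(x), shifted so the lowest value is zero. The histogram probabilities below are invented for illustration and do not come from a real simulation.

```python
import math

KB = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def pmf_from_probabilities(probabilities, temperature):
    """Convert bin probabilities along a collective variable into a PMF (kcal/mol)."""
    kt = KB * temperature
    w = [-kt * math.log(p) for p in probabilities]
    w_min = min(w)
    return [value - w_min for value in w]

# Hypothetical probabilities for four bins along a collective variable.
probs = [0.05, 0.60, 0.25, 0.10]
pmf = pmf_from_probabilities(probs, 310.0)
print(pmf)
```

The most populated bin ends up at zero energy and the least populated bin has the highest energy.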
209
Q

drug development

A

a complex, multi-stage process requiring significant time and resources
Many years and millions of dollars

210
Q

drug discovery pipeline

A
  1. Discovery and Preclinical Research
    – Potential drugs are identified and tested in non-human studies
    ***Computation is most helpful with the drug discovery stage
  2. Clinical Trials
    – Testing in human subjects to assess safety and efficacy
  3. Regulatory Approval
    – Evaluation by agencies like the FDA before the drug can be marketed
  4. Post-Marketing Surveillance
    – Ongoing monitoring after the drug is available to the public
211
Q

why identifying the right protein target is crucial for drug development?

A
  • crucial for developing effective and safe drugs
  • Proteins regulate nearly all cellular processes, and drugs inhibit or activate proteins to correct disease states

*** Target identification is accelerated with bioinformatics

212
Q

Criteria for selecting a protein target:

A
  • Disease Relevance: the protein plays a critical role in the disease mechanism
  • Druggability: target has a structure that allows it to bind with drug-like molecules
  • Specificity: Targeting the protein minimizes effects on healthy cells, reducing side effects
213
Q

importance of chemical space in drug discovery

A
  • Chemical space contains an astronomical number of possible compounds to explore
  • Effective drugs must bind to the target protein with sufficient affinity and specificity

***Estimated to be between 10^60 and 10^200 possible small organic molecules

We need methods to navigate chemical space and identify promising leads accurately and efficiently

214
Q

High-throughput screening (HTS)

A

allows testing of thousands of compounds against the target protein

215
Q

High-throughput screening (HTS) stepwise

A
  1. Library Preparation:
    - Collection of diverse compounds
  2. Assay Development:
    - Design of biological assays to measure compound activity against the target
  3. Screening:
    - Compounds are tested in miniaturized assays
  4. Data Analysis:
    - Identification of “hits” that show desired activity
216
Q

Virtual screening

A
  • evaluates vast libraries to identify potential leads efficiently
  • Experimental assays are still expensive, and limited to commercially available compounds

*** Instead, we can use computational methods to predict which compounds we should experimentally validate
— virtual screening allows for screening of millions/billions of compounds allowing for expansion of the search space

217
Q

selective binding

A
  • binding to a protein is governed by thermodynamics (and kinetics)
  • Binding occurs when a compound/ligand interacts specifically with a protein

** reversible

218
Q

binding affinity and energy

A
  • determined by the Gibbs free energy change
  • the change in free energy when a ligand binds to a protein determines the binding process spontaneity
219
Q

gibbs free energy

A
  • Gibbs free energy combines enthalpy and entropy

Enthalpy (delta H) ⇒ accounts for energetic interactions
Entropy (delta S) ⇒ how much conformational flexibility changes

***Simulations capture free energy directly instead of treating enthalpy and entropy separately
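
A short worked sketch of the relation delta G = delta H - T * delta S, together with the standard conversion from a binding free energy to a dissociation constant, Kd = exp(delta G / RT) at a 1 M standard state. The enthalpy and entropy values are hypothetical and the function names are illustrative.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)

def gibbs_free_energy(delta_h, delta_s, temperature):
    """Delta G = Delta H - T * Delta S (kcal/mol, with Delta S in kcal/(mol*K))."""
    return delta_h - temperature * delta_s

def dissociation_constant(delta_g, temperature):
    """Kd in molar units from a standard binding free energy."""
    return math.exp(delta_g / (R_KCAL * temperature))

# Hypothetical binding: favorable enthalpy, unfavorable entropy.
dg = gibbs_free_energy(delta_h=-12.0, delta_s=-0.010, temperature=300.0)
print(dg, dissociation_constant(dg, 300.0))
```

A negative delta G means binding is spontaneous; the more negative it is, the smaller Kd and the tighter the binding.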

220
Q

enthalpy

A

Enthalpy accounts for non covalent interactions
—- electrostatics, h-bonds, dipoles, pi-pi stacking

  • Ensemble differences in non covalent interactions provide binding enthalpy
221
Q

chemical interactions and e- densities

A
  • Molecular interactions are governed by their electron densities (Hohenberg-Kohn theorem)

** For a quantum system, if you know electron densities, then you know everything about that system

This is rather difficult, so we often use conceptual frameworks to explain trends (e.g., hybridization and resonance)

222
Q

Every noncovalent interaction can be described with this framework → (4)

A
  1. Coulomb’s law describes the interactions between charges
  2. Molecular geometry uniquely specifies an e- density
  3. Regions of increased electron density are associated with higher partial negative charges
  4. Electrons are mobile and can be perturbed by external interactions/other electrons
223
Q

electrostatic forces

A
  • govern interactions between charged and polar regions
  • Charged molecules have a net imbalance between
    (+) charges in nuclei & (-) charges from electrons

*** leads to net electrostatic attractions or repulsions between different atoms

224
Q

electrostatic forces role in binding

A
  • Long-range interaction: can attract ligands to the binding site from a distance
  • Anchor points:
    – often serve as key anchoring interactions in the binding site

~5 to 20 kcal/mol per interaction

225
Q

hydrogen bonds

A

Attraction between a (donor) hydrogen atom covalently bonded to an electronegative atom and another (acceptor) electronegative atom with a lone pair

226
Q

h-bonding role in binding

A
  • Specificity: Precise orientation of the ligand
  • Stabilization: Moderately strong interactions
  • Dynamic: Allows for adaptability of ligands

*** strongest when the hydrogen, donor, and acceptor atoms are collinear

~2 to 7 kcal/mol per hydrogen bond

227
Q

Uneven electron distribution

A
  • creates partial charges and dipoles
  • lead to unequal distribution of electron density
  • results in regions of partial positive or partial negative charges
  • Consistent electron density spatial variation results in permanent dipoles
228
Q

uneven electron distribution role in binding

A
  • Directional binding: Highly directional, ensuring that the ligand aligns correctly
  • Flexibility: Can accommodate slight conformational changes

~0.01-1 kcal/mol per interaction

229
Q

Van der Waals forces

A
  • weak, non-directional interactions
  • Dispersion: Electrons in molecules are constantly moving, leading to temporary uneven distributions that induce dipoles in neighboring molecules
  • Induction: The electric field of a polar molecule distorts the electron cloud of a nonpolar molecule, creating a temporary dipole
230
Q

Van der Waals forces role in binding

A
  • Complementary fit: Maximizes surface contact
  • Flexibility: Allows small conformational changes

~0.4 - 4 kcal/mol per interaction

231
Q

pi-pi interactions

A
  • involve stacking of aromatic rings
  • Noncovalent interactions between aromatic rings due to overlap of pi-electron clouds
232
Q

pi-pi interactions role in binding

A

Orientation: proper positioning of aromatics

Selectivity: recognition of ligands

~1 to 15 kcal/mol per interaction

233
Q

summing up all enthalpic contributions during a simulation…

A

provides our ensemble average

234
Q

Entropy

A
  • accounts for microstate diversity of a single macrostate
  • defined as S = kB ln Ω
    – where Ω = total # of microstates available to the system without changing the system state

***Entropy is “energy dispersion”
– Higher entropy implies greater microstate diversity for a given macrostate

235
Q

system state

A

can be arbitrarily defined and compared as
– Unbound ligand vs. bound ligand
– Unfolded protein vs. folded protein
– Liquid water at 300 K vs. 500 K

236
Q

Grid-based protein-ligand binding →

A
  • My macrostate (number of particles, temp, and pressure) remains constant
    — rearranges the ligands without binding to the receptor
  • N choose L grid sites
  • Number of ways to choose L grid sites out of N is the binomial coefficient
    *** A smaller grid (with the same site size) has lower entropy
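
A minimal sketch of the grid picture, assuming Ω is the binomial coefficient C(N, L) and entropy is reported in units of k_B. The function name `grid_entropy` and the grid sizes are illustrative, not from the course material.

```python
import math

def grid_entropy(n_sites: int, n_ligands: int) -> float:
    """Return S / k_B = ln(Omega), with Omega = C(n_sites, n_ligands)."""
    omega = math.comb(n_sites, n_ligands)  # ways to place the ligands on the grid
    return math.log(omega)

# Same number of ligands on a large grid vs. a smaller grid (made-up sizes).
s_large = grid_entropy(100, 10)
s_small = grid_entropy(50, 10)
```

With the same number of ligands, the smaller grid has fewer arrangements, so `s_small` comes out below `s_large`.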
237
Q

How does entropy change?

A
  • Depends (increase, no change, decrease) on ligand concentration!!!
  • How to interpret this: Pick a number of ligands and move to the right (L - 1), does entropy go up or down?
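
A rough sketch of why the answer depends on concentration. It assumes the same grid model (Ω = C(N, L)) and that binding removes one free ligand from the grid (L goes to L - 1). The name `entropy_change_on_binding` and the numbers are hypothetical.

```python
import math

def entropy_change_on_binding(n_sites: int, n_free: int) -> float:
    """Change in S / k_B when the free-ligand count goes from L to L - 1."""
    before = math.log(math.comb(n_sites, n_free))
    after = math.log(math.comb(n_sites, n_free - 1))
    return after - before

dilute = entropy_change_on_binding(100, 10)     # few ligands: entropy goes down
crowded = entropy_change_on_binding(100, 90)    # many ligands: entropy goes up
balanced = entropy_change_on_binding(101, 51)   # C(101, 51) = C(101, 50): no change
```

In this model the ratio C(N, L - 1) / C(N, L) equals L / (N - L + 1), so the sign flips once the grid is more than about half full.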
238
Q

for protein ligand binding, we must account for…

A

*** For protein-ligand binding, we need to account for how the number of accessible microstates/configurations changes for the protein and ligand

  • after that point, can run molecular simulations of different states
239
Q

partition function (Z)

A
  • Partition functions of protein, ligand, and complex are vastly different
  • Z is related to the number of all accessible microstates

*** many practical limitations to sampling all microstates

240
Q

What if we slowly disappear the ligand? (for sampling all microstates)

A
  • This has several advantages:
    – More relevant conformational sampling
    – Can run independent simulations in parallel
    – Focuses on taking differences with smaller numbers

***This technique is generally called alchemical simulations

241
Q

alchemical parameter

A
  • controls our protein-ligand interactions
  • 1 = interactions are normal
  • 0 = no intermolecular interactions are on
    – Intramolecular interactions are left alone
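
One way to picture the alchemical parameter is as a scaling factor on the intermolecular energy only. The sketch below assumes simple linear scaling, which is an assumption made here for illustration (production codes often use other schemes, such as soft-core potentials). The function name and energy values are hypothetical.

```python
def alchemical_energy(u_intra: float, u_inter: float, lam: float) -> float:
    """Total energy with only the intermolecular term scaled by lam in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be between 0 and 1")
    # Intramolecular energy is left alone; protein-ligand interactions are scaled.
    return u_intra + lam * u_inter

# Made-up energies in kcal/mol, for illustration only.
u_intra, u_inter = -120.0, -8.5
schedule = [0.0, 0.25, 0.5, 0.75, 1.0]
energies = [alchemical_energy(u_intra, u_inter, lam) for lam in schedule]
```

Each value in `schedule` would correspond to one independent simulation window.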
242
Q

Alchemical simulations limitation

A
  • VERY expensive

*** Use “docking” to more efficiently screen molecules before (if ever) doing alchemical simulations

243
Q

Alchemical simulations precision

A
  • Compute energy changes by gradually transforming one molecule into another
    – highly precise, offering detailed insights into binding affinities for drug design
244
Q

Why are alchemical simulations computationally expensive?

A
  • Atomistic forces:
    — computes forces for all atoms in proteins, ligands, cofactors, ions, solvents for millions of structures
  • Detailed sampling:
    — captures a wide range of conformations, which adds more dimensions to the calculation
  • Alchemical parameters:
    — simulations must be performed at various alchemical parameters

*** ~10,000 CPU hours
(about 417 days on 1 core)

245
Q

docking

A
  • Avoid sampling all microstates and determine one “optimal” protein-ligand structure ⇒ using this bound structure, predict a “score” that is correlated to binding affinity
  • simplifies the binding free energy prediction problem to enhance speed
  • efficient by avoiding sampling all microstates and determining one “optimal” protein-ligand structure
246
Q

Significance of Protein Conformation in Docking

A
  • Protein-ligand interactions are highly dependent on the protein’s 3D structure
  • Using an inappropriate protein conformation can lead to inaccurate docking results
247
Q

challenges of docking

A
  1. Conformational Flexibility:
    - Proteins are not rigid structures; they exhibit movements ranging from side-chain rotations to large domain motions
  2. Impact on Binding Sites:
    - The shape and properties of the binding site can change, affecting ligand binding affinity and specificity.
  3. Limited Experimental Structures:
    - Crystallography and NMR provide snapshots of protein conformations but may not capture all relevant states.
248
Q

Sources of Protein Conformational Data

A

Experimental Methods:
- X-ray Crystallography:
Provides high-resolution structures but may miss dynamic conformations.
- NMR Spectroscopy: Captures ensembles of conformations but is limited to smaller proteins.

Computational Techniques:
- Molecular Dynamics (MD) Simulations: Explore the conformational space over time.
- Normal Mode Analysis (NMA): Identifies collective motions in proteins.
- Ensemble Generation Methods: Generate multiple protein conformations for docking.

249
Q

Experimental Structure Selection Criteria →

A

Resolution and Quality
– Prefer structures with higher resolution (e.g., <2.5 Å).
– Assess reliability using R-factors and validation reports.

Ligand-Bound vs. Apo Structures
– Ligand-Bound (Holo) Structures: Provide direct insight into binding site conformation.
– Apo Structures: May reveal binding site flexibility in the absence of ligands.

Relevance to Target Ligand
– Choose structures co-crystallized with ligands similar to those of interest.

250
Q

Molecular Dynamic Simulations for Conformational Sampling

A
  • Extract representative structures using clustering algorithms
  • Identify conformations with open or closed binding sites
251
Q

Importance of Water Molecules

A
  • Role in binding: structured water molecules can mediate interactions between the protein and ligand
  • Inclusion Criteria: retain water molecules that are conserved across multiple crystal structures
252
Q

handling water in docking

A
  • Some docking programs allow explicit water molecules in the binding site
  • Alternatively, consider their effect implicitly in scoring functions
253
Q

binding pocket detection for docking

A
  • The binding pocket is the specific region where a ligand interacts with a protein

** Accurate identification of binding pockets is essential for successful docking and virtual screening.

254
Q

Binding pocket

A

a cavity that can accommodate a ligand

255
Q

Protein Surface Characteristics →

A
  • Convex Regions: Typically inaccessible to ligands.
  • Concave Regions (Cavities): Potential binding sites.
255
Q

Classification of Binding Pockets →

A
  1. Orthosteric Sites
  2. Allosteric Sites
  3. Cryptic Sites
256
Q

Orthosteric Sites

A

The primary active site where endogenous ligands bind.

257
Q

Allosteric Sites

A

Secondary sites that modulate protein function upon ligand binding.

258
Q

Cryptic Sites

A

Binding pockets not apparent in the unbound protein structure but form upon ligand binding or conformational change.

259
Q

Geometry-Based Pocket Detection Technique

A

alpha shape theory
– uses Delaunay triangulation and alpha complexes to define cavities

260
Q

alpha shape theory

A

Each alpha sphere touches a fixed number of atoms on its boundary (3 in a 2D picture, 4 in 3D) and contains no atom inside; spheres cannot be placed outside the protein.
Pockets are found by grouping the alpha spheres that fill open spaces; each cluster of spheres is flagged as a pocket.

261
Q

Grid-Based Pocket Detection

A

Methodology
1. Overlay a 3D grid on the protein structure.
2. Classify grid points as inside, outside, or on the surface.

Pocket Identification
– Clusters of surface grid points forming concave regions indicate potential pockets.
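
A toy 2D sketch of the grid idea. It swaps in a simple buriedness test (count how many axis directions run into protein, in the spirit of LIGSITE-style methods) rather than a full inside/outside/surface classification. The grid, the U-shaped "protein", and the `min_blocked` threshold are all made up for illustration.

```python
def classify_grid(protein_cells, width, height, min_blocked=3):
    """Label each grid cell as 'protein', 'pocket', or 'solvent'.

    A free cell counts as pocket-like when at least `min_blocked` of the four
    axis directions run into a protein cell before leaving the grid.
    """
    directions = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    labels = {}
    for x in range(width):
        for y in range(height):
            if (x, y) in protein_cells:
                labels[(x, y)] = "protein"
                continue
            blocked = 0
            for dx, dy in directions:
                cx, cy = x + dx, y + dy
                while 0 <= cx < width and 0 <= cy < height:
                    if (cx, cy) in protein_cells:
                        blocked += 1
                        break
                    cx, cy = cx + dx, cy + dy
            labels[(x, y)] = "pocket" if blocked >= min_blocked else "solvent"
    return labels

# Toy U-shaped "protein" on a 7 x 7 grid: two walls and a floor.
protein = {(2, y) for y in range(1, 5)} | {(4, y) for y in range(1, 5)} | {(3, 1)}
labels = classify_grid(protein, 7, 7)
pocket = sorted(cell for cell, lab in labels.items() if lab == "pocket")
```

Only the three cells enclosed by the U are buried on three sides, so they form the single concave cluster.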

262
Q

Detecting Cryptic Binding Sites

A
  • Cryptic sites are hidden in the unbound structure and require conformational changes to become apparent

Strategies →
– Use enhanced sampling MD methods like metadynamics
– Apply pocket detection to multiple conformations

263
Q

ligand poses importance

A
  • Precise ligand poses are crucial for reliable predictions of binding affinity and activity.
  • Incorrect poses can lead to false negatives or positives, misguiding drug development efforts.

*** aka accurate docking

264
Q

ligand pose

A

The specific orientation and conformation of a ligand within the binding site of a target protein.

265
Q

Ligand Pose Optimization

A

Optimization Goal →
– Identify the energetically most favorable pose that closely represents the true binding mode.

Key Components →
1. Orientation: Position and alignment within the binding pocket.
2. Conformation: Internal geometry, including bond angles, lengths, and torsions.

266
Q

4 types of search strategies for docking

A

Systematic, stochastic, empirical, machine learning

267
Q

systematic searches

A
  • numerically iterate over all possible conformations

– Identify important degrees of freedom
– Scan along each angle with a step size of N degrees
– Remove structures with high strain

*** only possible for very small molecules = not used often!
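
A short sketch of a systematic torsion scan, assuming the step size divides 360 evenly. The strain function is a made-up stand-in (it simply penalizes torsions at 0 degrees), not a real force field.

```python
import itertools

def count_conformers(n_torsions: int, step_deg: int) -> int:
    """Number of grid points when every torsion is scanned in step_deg steps."""
    return (360 // step_deg) ** n_torsions

def systematic_search(n_torsions, step_deg, strain, max_strain):
    """Enumerate all torsion combinations and drop the high-strain ones."""
    angles = range(0, 360, step_deg)
    return [combo for combo in itertools.product(angles, repeat=n_torsions)
            if strain(combo) <= max_strain]

def toy_strain(combo):
    """Hypothetical penalty: 1.0 for every torsion sitting at 0 degrees."""
    return sum(1.0 for angle in combo if angle == 0)

kept = systematic_search(2, 120, toy_strain, max_strain=0.0)
n_large = count_conformers(10, 30)  # 12 ** 10 combinations for 10 torsions
```

The count grows as (360 / N)^k for k torsions, which is why this only works for very small molecules.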

268
Q

Stochastic searches

A
  • random sampling (Monte Carlo)
  • provide better balance of sampling and cost
  • can utilize conformer libraries (pre-generated)

Steps:
1. Generate conformation
2. Compute energy change
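3. If the energy goes down, make the move; if it goes up, make the move only if the Boltzmann factor exp(-ΔE/kT) is greater than a random number between 0 and 1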
4. Repeat

***Allows us to sample efficiently!
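
The steps can be sketched as a Metropolis-style Monte Carlo loop. This assumes the standard Metropolis acceptance rule (downhill moves always accepted, uphill moves accepted with probability exp(-ΔE/kT)). The one-torsion energy function, the kT value, and the step size are hypothetical.

```python
import math
import random

def toy_energy(angle_deg: float) -> float:
    """Hypothetical torsion-like score with its minimum at 180 degrees."""
    return 1.0 + math.cos(math.radians(angle_deg))

def metropolis_search(n_steps: int, kT: float = 0.2, seed: int = 0):
    rng = random.Random(seed)
    angle = 0.0
    energy = toy_energy(angle)
    best_angle, best_energy = angle, energy
    for _ in range(n_steps):
        trial = (angle + rng.uniform(-30.0, 30.0)) % 360.0  # 1. generate conformation
        trial_energy = toy_energy(trial)
        delta = trial_energy - energy                        # 2. compute energy change
        # 3. accept downhill moves always, uphill moves with Boltzmann probability
        if delta <= 0.0 or rng.random() < math.exp(-delta / kT):
            angle, energy = trial, trial_energy
            if energy < best_energy:
                best_angle, best_energy = angle, energy
    return best_angle, best_energy                           # 4. repeat until done

best_angle, best_energy = metropolis_search(2000)
```

Occasionally accepting uphill moves is what lets the search escape local minima instead of getting stuck.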

269
Q

scoring functions

A
  • parameterized models to estimate binding affinity after docking
  • Physics-based methods using force-field like methods
  • Machine learning methods (graph neural networks) have been gaining traction recently
270
Q

Phenotypic drug screening

A
  • involves testing compounds on an organism level to identify potential leads

ex/ drug screening on an antibiotic-resistant bacterial strain to identify potential new leads

271
Q

Ligand-based drug design (LBDD)

A
  • relies on the properties of known bioactive compounds to guide drug discovery
  • Does not require the structure of the target protein, making it useful when this is unknown
272
Q

motivations and assumptions of LBDD

A
  • Motivation: If we find compounds with little bioactivity, we can use LBDD to find compounds with similar chemical features to improve specific outcomes
  • Assumption: Similar structures can lead to similar—hopefully improved—biological effects
273
Q

structure-based VS ligand-based drug design

A

Structure-Based Drug Design:
1. Requires 3D structure of the target protein.
2. Uses the binding site structure to model potential interactions.
3. Often employs docking and molecular simulations.

Ligand-Based Drug Design:
1. Requires no structural information of the target.
2. Uses the chemical structure and activity of known ligands as guides.
3. Relies on molecular similarity rather than direct binding predictions.

274
Q

molecular descriptors

A
  • used to numerically encode chemical properties
  1. molecular weight
  2. LogP
  3. molar refractivity
  4. TPSA
  5. # of rotatable bonds
275
Q

molecular weight

A
  • indicates the overall size of the molecule
  • Impacts drug distribution and elimination rates in the body
276
Q

LogP

A
  • measures lipophilicity (chemical compound’s ability to dissolve in lipids, fats, oils, and non-polar solvents)
  • Influences a molecule’s ability to cross cell membranes and affects absorption and bioavailability
277
Q

Molar Refractivity

A
  • relates to polarizability and electron cloud distribution

*Affecting intermolecular interactions and binding affinity

278
Q

TPSA

A
  • topological polar surface area; estimates the molecule’s ability to form hydrogen bonds

*impacting solubility and permeability across biological membranes

279
Q

Number of rotatable bonds

A
  • reflects molecular flexibility

*influences binding affinity and oral bioavailability

280
Q

Phenylephrine

A

a synthetic compound that acts as a vasoconstrictor by stimulating alpha-adrenergic receptors

**Molecules can have similar properties, with slight structural differences causing widely different functions

281
Q

Dopamine

A

a naturally occurring neurotransmitter in the brain that interacts with dopamine receptors

**Molecules can have similar properties, with slight structural differences causing widely different functions

282
Q

Extended connectivity fingerprints (ECFPs)

A
  • encode structural features into numerical representations
  • utilize hash functions to encode chemical information (transform info into a numerical format for computers)
283
Q

hashing for molecular fingerprints stepwise

A
  • Hash functions are used to encode chemical information
  1. Encode, for each atom, the atom IDs that are exactly one bond away
  2. Repeat for all atoms, hashing the IDs from iteration n-1
  3. For each additional iteration n, incorporate the hashes of connected atoms that are n bonds away
  4. Each iteration encodes local chemical info into each atom’s ID
    – repeat the process for larger n, which captures more chemical info at a (small) computational cost
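
A simplified sketch of the iterative hashing idea. This is not the actual ECFP algorithm or any library's implementation: the starting ID uses only element and number of neighbors, and `zlib.crc32` is a stand-in hash chosen so the output is reproducible. Molecules are given as hypothetical element lists plus neighbor lists.

```python
import zlib

def stable_hash(items) -> int:
    """Deterministic 32-bit hash of a tuple (stand-in for the real hash function)."""
    return zlib.crc32(repr(items).encode("utf-8"))

def iterate_atom_ids(elements, neighbors, n_iterations):
    """Return a list of atom-ID lists, one per iteration (iteration 0 first)."""
    # Iteration 0: ID from the atom's own properties (element, neighbor count).
    ids = [stable_hash((el, len(neighbors[i]))) for i, el in enumerate(elements)]
    history = [ids]
    for _ in range(n_iterations):
        new_ids = []
        for i in range(len(elements)):
            # Combine this atom's previous ID with its neighbors' previous IDs.
            neighbor_ids = tuple(sorted(ids[j] for j in neighbors[i]))
            new_ids.append(stable_hash((ids[i], neighbor_ids)))
        ids = new_ids
        history.append(ids)
    return history

# Heavy atoms only: propane (C-C-C) and ethanol (C-C-O).
propane = iterate_atom_ids(["C", "C", "C"], [[1], [0, 2], [1]], n_iterations=2)
ethanol = iterate_atom_ids(["C", "C", "O"], [[1], [0, 2], [1]], n_iterations=2)
```

The first carbon has the same ID in both molecules at iterations 0 and 1; only at iteration 2 does its environment reach far enough to see the oxygen, at which point the IDs would be expected to diverge.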
284
Q

atom ids for molecular fingerprints

A
  • We keep track of atom IDs at each iteration to encode multiple “levels” of chemical information

*** Similar structural features will share atom IDs until our iteration starts incorporating different structural features

285
Q

bit arrays

A
  • fixed-length collections of ones and zeros

** allow for efficient operations

  • Atoms are encoded into a bit array to store a collection of atom IDs
286
Q

Converting atom IDs to bit arrays →

A
  1. Decide on length of bit array, for example, 1024 and fill with zeros
  2. Divide each atom ID by the length of the array and determine the remainder
  3. Set the value of the bit array at that index to 1
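
The three steps map directly onto a few lines of code. A minimal sketch, with made-up atom IDs and a hypothetical function name:

```python
def fold_ids_to_bits(atom_ids, n_bits=1024):
    bits = [0] * n_bits              # 1. fixed-length array filled with zeros
    for atom_id in atom_ids:
        index = atom_id % n_bits     # 2. remainder after dividing by the length
        bits[index] = 1              # 3. set that position to 1
    return bits

# Made-up atom IDs: 5 and 1029 leave the same remainder and share a bit.
bits = fold_ids_to_bits([5, 1029, 2048], n_bits=1024)
```

Two different IDs can land on the same index (a collision), which is the trade-off for a fixed-length array.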
287
Q

Tanimoto similarity

A
  • compares the ECFPs between two molecules
  • formula measures the ratio of the shared features to the total number of unique features between two molecules.

TS = c / (a + b - c)
(a = number of bits set in molecule A, b = number of bits set in molecule B, c = number of bits set in both)
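
A minimal sketch of the Tanimoto calculation, assuming a and b are the numbers of bits set in each fingerprint and c is the number of bits set in both, so that TS = c / (a + b - c). The two short fingerprints are invented for illustration.

```python
def tanimoto(bits_a, bits_b) -> float:
    a = sum(bits_a)                                        # bits set in molecule A
    b = sum(bits_b)                                        # bits set in molecule B
    c = sum(1 for x, y in zip(bits_a, bits_b) if x and y)  # bits set in both
    if a + b - c == 0:
        return 0.0  # assumption: two empty fingerprints are treated as dissimilar
    return c / (a + b - c)

fp1 = [1, 1, 0, 1, 0, 0, 1, 0]
fp2 = [1, 0, 0, 1, 0, 1, 1, 0]
similarity = tanimoto(fp1, fp2)  # a = 4, b = 4, c = 3, so 3 / 5
```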

288
Q

Molecular similarity

A

The concept that similar molecules often show similar biological effects.
(Tanimoto)

289
Q

QSAR models

A
  • link chemical structure with biological activity

Purpose: To predict the biological activity of molecules based on their structure.

Motivation:
- Reduces the need for experimental screening.
- Helps identify potential drugs quickly and cost-effectively.

2 types:
- linear and nonlinear

290
Q

Types of QSAR Models

A
  1. Linear Models: Simple, interpretable, e.g., linear regression.
  2. Nonlinear Models: Capture complex relationships, e.g., neural networks.
291
Q

QSAR model systematic steps

A
  1. Data Collection: Gather biological activity and molecular data.
  2. Descriptor Calculation: Calculate numerical descriptors for each molecule.
  3. Model Selection and Training: Use machine learning to correlate descriptors with activity.
  4. Model Validation: Test model accuracy with independent datasets.
  5. Interpretation and Application: Use the model for predicting new molecules.
292
Q

Linear regression models

A
  • Linear regression models are simple but effective for QSAR analysis
    – Fits a linear relationship between descriptors and output
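
A bare-bones sketch of a one-descriptor linear QSAR fit using ordinary least squares. The descriptor and activity values are invented so the fit is exact; real datasets would have many descriptors and noise.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: one descriptor (e.g. LogP) vs. a made-up activity value.
logp = [1.0, 2.0, 3.0, 4.0]
activity = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(logp, activity)

# Predict the activity of a new molecule from its descriptor.
predicted = slope * 5.0 + intercept
```

The fitted slope is what makes the model easy to interpret: it states how much the predicted activity changes per unit of the descriptor.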
293
Q

pros/cons of linear regression models

A

Advantages: Easy to interpret.
Limitations: Limited to linear relationships; struggles with complex datasets

294
Q

Nonlinear models

A
  • capture complex relationships in QSAR data

Examples =
1. Neural Networks: Capture complex, nonlinear patterns in large datasets.
2. Random Forests: Effective for high-dimensional data, robust against overfitting.

295
Q

pharmacophore

A
  • the 3D arrangement of molecular features required for biological activity
  • defines the essential molecular features needed for biological activity

– Looks at H-bond acceptors/donors, cationic, anionic, hydrophobic, aromatic

296
Q

Building a pharmacophore model

A
  • requires multiple active compounds

Step 1:
- Align active molecules
- Identify common structural features
- Determine spatial relationships
- Consider multiple conformations

Step 2:
- Define feature locations
- Mark shared pharmacophoric points
- Establish distance constraints
- Set tolerance spheres
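
Once distance constraints are set, screening a candidate can be sketched as a check on pairwise feature distances. This is a simplification assumed here: the tolerance is applied to distances rather than to spheres around feature positions, and the feature names, coordinates, and distances are all invented.

```python
import math

def matches_pharmacophore(feature_coords, constraints) -> bool:
    """Check every pairwise distance constraint.

    feature_coords: dict of feature name -> (x, y, z) in angstroms
    constraints: list of (feature_1, feature_2, target_distance, tolerance)
    """
    for f1, f2, target, tol in constraints:
        if f1 not in feature_coords or f2 not in feature_coords:
            return False  # a required feature is missing
        d = math.dist(feature_coords[f1], feature_coords[f2])
        if abs(d - target) > tol:
            return False
    return True

# Hypothetical model: donor-aromatic 5.0 +/- 1.0 A, donor-acceptor 3.0 +/- 0.5 A.
model = [("donor", "aromatic", 5.0, 1.0), ("donor", "acceptor", 3.0, 0.5)]
hit = {"donor": (0.0, 0.0, 0.0), "aromatic": (3.0, 4.0, 0.0), "acceptor": (0.0, 0.0, 3.2)}
miss = {"donor": (0.0, 0.0, 0.0), "aromatic": (3.0, 4.0, 0.0), "acceptor": (0.0, 0.0, 6.0)}
```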