Computational Structural Biology Exam Flashcards

Question

building an e- density map

Answer 1

- reveals the distribution of electrons in the crystal, indicating where atoms are located - interpreted by fitting atomic models (e.g. amino acids for proteins) into density * Low-resolution data make it difficult to assign atomic positions precisely, leading to uncertainty in the model

Answer 2

- Crystals have the same repeating unit cell, which amplifies our signal. If in solution, particles would be: -- Too sparse to diffract -- Moving, so the diffraction pattern would constantly change

Answer 3

How atomic nuclei interact with magnetic fields and radiofrequency pulses

Answer 4

- how molecules scatter electron beams - beam of high-energy electrons used instead of photons - no crystals used: the sample is rapidly frozen in vitreous ice to preserve its native structure --- By freezing the sample, the biological molecules are imaged in their native hydrated state.

Answer 5

- protein information database - Comprehensive database to access curated data about protein structures, functions, sequences, and annotation - Reviewed (Swiss-Prot): experts manually curated and verified these entries, ensuring high accuracy - Unreviewed (TrEMBL): these entries are automatically generated and have not been manually reviewed - entry IDs are unique identifiers for the proteins - Protein Data Bank (PDB) contains structures

Answer 6

- Electrons have a much shorter wavelength (~0.02 Å at 300 keV) than X-ray photons - Light elements scatter electrons more effectively than they scatter X-rays
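
Worked example (a sketch, not part of the original card): the ~0.02 Å figure can be checked with the relativistic de Broglie relation. The function name below is made up for illustration; the constants are standard SI values.

```python
import math

# Physical constants in SI units
H = 6.62607015e-34        # Planck constant, J*s
M_E = 9.1093837015e-31    # electron rest mass, kg
Q_E = 1.602176634e-19     # elementary charge, C
C = 299792458.0           # speed of light, m/s


def electron_wavelength_angstrom(voltage_volts):
    """Relativistic de Broglie wavelength of an electron accelerated through voltage_volts."""
    kinetic = Q_E * voltage_volts  # kinetic energy in joules
    # p^2 = 2*m*E*(1 + E / (2*m*c^2)) follows from the energy-momentum relation
    momentum = math.sqrt(2.0 * M_E * kinetic * (1.0 + kinetic / (2.0 * M_E * C**2)))
    return H / momentum * 1e10     # metres to angstroms


wavelength_300kev = electron_wavelength_angstrom(300e3)
print(f"{wavelength_300kev:.4f} A")
```

The relativistic correction matters at 300 keV because the kinetic energy is a sizeable fraction of the electron rest energy (about 511 keV).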

Answer 7

- main Cryo-EM technique used to determine the 3D structures of individual macromolecules - Millions of images of individual particles are collected from a thin layer - Particles are computationally aligned and classified into different orientations

Answer 8

1. flexibility and disorder 2. x-ray crystallography 3. Cryo-EM and conformational flexibility 4. Intrinsically Disordered Proteins (IDPs) 5. Conformational Heterogeneity and Biological Function

Answer 9

- Molecules are not static - Proteins often exhibit flexibility, disordered regions, and multiple conformations. Why it matters: structural techniques often require ordered/stable configurations

Answer 10

- Flexible or disordered regions do not pack into crystals well, often leading to failure in obtaining high-quality crystals - In cases where crystallization is successful, flexible or disordered regions do not show up clearly in e- density map - Crystals capture a single conformation of the molecule, often ignoring the flexibility or dynamic range

Answer 11

- strength of Cryo-EM is its ability to capture multiple conformational states of a molecule, providing insights into flexibility and structural heterogeneity - Challenge: highly flexible or disordered molecules may appear as fuzzy or low-resolution regions in the final structure - Advanced computational techniques are required to sort out different conformations present in Cryo-EM data

Answer 12

lack a stable 3D structure under physiological conditions but are still functional, often gaining structure upon binding to partners

Answer 13

Many proteins function by switching between different conformations, which is essential for their activity (e.g. enzymes, transporters, and receptors) ex/ G-protein coupled receptors that adopt different conformations when bound to different ligands, triggering different cellular responses.

Answer 14

adopt different conformations when bound to different ligands, triggering different cellular responses.

Answer 15

Technical Limitations: -- Difficulty in capturing dynamic and flexible regions. Incomplete structures due to unresolved disordered regions. Biological Complexity: -- Dynamic conformational ensembles not represented in static snapshots Resource Constraints: -- Time-consuming and costly experiments

Answer 16

- Protein structure dictates interactions, signaling, and biochemical roles. - Experimental methods (X-ray, Cryo-EM) provide high-resolution structures but are resource-intensive and time-consuming

Answer 17

- Drug discovery: designing small-molecule inhibitors or antibodies that target specific protein conformations - Biotechnology: engineering proteins for industrial or therapeutic applications - Disease research: mutations causing structural defects linked to diseases like Alzheimer’s and cystic fibrosis.

Answer 18

- Advances in predictive accuracy are opening new frontiers in biology - integrating predictive models with experimental data is the way forward - Structure prediction complements genomics/transcriptomics to create a holistic understanding of biological function

Answer 19

1. conformational space 2. complex energy landscapes 3. flexibility and dynamics 4. environmental effects 5. post-translational modifications (PTMs) 6. methods are data-driven

Answer 20

- Proteins can adopt a large number of possible conformations. - Levinthal’s Paradox: a protein can’t sample all conformations in a biologically reasonable time, yet it folds quickly. -- Ex/ A protein with 100 amino acids, each capable of adopting about 3 conformations, results in ~3^100 possible conformations.
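
Worked example (a sketch): the size of the search space in Levinthal's argument. The sampling rate of one conformation per 0.1 ps is an assumption chosen only to make the point; it is not from the card.

```python
STATES_PER_RESIDUE = 3
N_RESIDUES = 100
SAMPLING_RATE_PER_S = 1e13   # assumption: one conformation tried every 0.1 ps
SECONDS_PER_YEAR = 3.156e7

# Exhaustive search space: every residue independently picks one of 3 states
n_conformations = STATES_PER_RESIDUE ** N_RESIDUES
search_time_years = n_conformations / SAMPLING_RATE_PER_S / SECONDS_PER_YEAR

print(f"{n_conformations:.2e} conformations, about {search_time_years:.1e} years to enumerate")
```

Even at this optimistic rate the enumeration time dwarfs the age of the universe, which is why folding cannot be a random search.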

Answer 21

- A potential energy surface (PES) represents the energy of a system as a function of the positions of its atoms. -- Shows how the system’s energy changes upon reactions or movements -- Proteins fold to the lowest free-energy state, but this landscape is highly rugged. - Energy calculations are computationally intensive and depend on accurate force fields.

Answer 22

- Proteins are not static; they adopt multiple conformations (flexibility) based on their environment and interactions with other molecules - Some proteins/regions do not adopt a fixed 3D structure but remain disordered or flexible under physiological conditions.

Answer 23

- Proteins fold differently in different environments - Predictions need to capture interactions with solvent molecules, ions, and cofactors

Answer 24

PTMs such as phosphorylation, glycosylation, and methylation can alter protein folding and function. Ex/ -- eIF4E is a eukaryotic translation initiation factor involved in directing ribosomes to the cap structure of mRNAs -- Ser209 is phosphorylated by MNK1 -- AlphaFold3 accurately predicts changes when they’re already known.

Answer 25

Our predictions rely on similarity to known structures, but novel sequences or folds (for which no homologous structures exist) are difficult to predict accurately. -- Ex/ AlphaFold has made strides, but predicting de novo structures remains challenging, especially for proteins with no templates.

Answer 26

- predicts protein structures based on evolutionary relationships *** The main principle is that proteins with similar sequences tend to fold into similar structures. Common tools for homology modeling: MODELLER, SWISS-MODEL, Phyre2 -- most accurate when sequence identity to other proteins is high (>30%)

Answer 27

HMMs: statistical models representing sequences using probabilities for matches/indels (probabilistic states) - capture evolutionary patterns in proteins - predict outcomes based on transition probabilities - capture more robust alignments - include info on hidden states

Answer 28

1. start with a multiple sequence alignment 2. indels can be modeled 3. occupancy and amino acid frequency at each position in the alignment are encoded 4. profile created
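
Worked example (a simplified sketch): the occupancy and amino acid frequency encoding can be illustrated on a toy alignment. The function name and the four-sequence MSA are hypothetical. A real profile HMM builder also adds pseudocounts and estimates transition probabilities between match, insert, and delete states, which this sketch omits.

```python
from collections import Counter


def build_profile(msa, gap="-"):
    """Return per-column occupancy and residue frequencies for an aligned MSA."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        residues = [seq[col] for seq in msa if seq[col] != gap]
        occupancy = len(residues) / len(msa)   # fraction of sequences without a gap
        counts = Counter(residues)
        freqs = {aa: n / len(residues) for aa, n in counts.items()} if residues else {}
        profile.append({"occupancy": occupancy, "frequencies": freqs})
    return profile


toy_msa = ["MKV-A", "MRVLA", "MKI-A", "MKVLG"]   # hypothetical alignment
profile = build_profile(toy_msa)
for position, column in enumerate(profile):
    print(position, column)
```

Columns with high occupancy and one dominant residue behave like conserved match positions; low-occupancy columns are candidates for insertion states.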

Answer 29

1. hidden states 2. match states 3. insertion states 4. deletion states

Answer 30

represent the underlying biological events that are not directly observable

Answer 31

conserved positions in the sequence

Answer 32

- Insertion states: positions where extra residues are added - Deletion states: positions where residues are missing

Answer 33

a tool that uses HMMs to search databases for sequences that match a given profile HMM (homology) -- Used to find homologous sequences, identifying evolutionary relationships across protein families

Answer 34

automated protein structure homology-modelling platform for generating 3D models of a protein using a comparative approach. *** novel proteins are very challenging

Answer 35

- In cases where sequence similarity to known structures is low (<30%), homology modeling becomes unreliable. - Threading matches sequences to known structural folds based on structural rather than sequence similarity *** Phyre2, RaptorX, MUSTER, and I-TASSER are commonly used for threading, which takes much longer than homology modeling.

Answer 36

- sequences - LOMETS threading --- template - template fragments for structure assembly - clustering --- cluster centroid - structure re-assembly - lowest E structure --- final model - TM align search - PDB library - structural analogy --- function prediction

Answer 37

- A contact map is a 2D representation of which residues are in close proximity - allow for visualization of residue interactions in proteins

Answer 38

- determined by spatial proximity, not sequence order, typically within a certain distance threshold - Residues on the diagonal are adjacent in sequence (and spatially) - residues far apart in the sequence can still be close in the 3D structure, reflected in contact map
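
Worked example (a sketch): building a binary contact map from coordinates. The 8 Å cutoff and the toy hairpin-like coordinates are assumptions for illustration; cutoffs and the choice of atom (C-alpha vs C-beta) vary between methods.

```python
import math


def contact_map(coords, threshold=8.0):
    """Binary contact map: 1 if two residues are within threshold (in angstroms)."""
    n = len(coords)
    return [
        [1 if math.dist(coords[i], coords[j]) <= threshold else 0 for j in range(n)]
        for i in range(n)
    ]


# Hypothetical one-point-per-residue coordinates: the chain runs out, then folds back
toy_coords = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0),
    (11.4, 5.0, 0.0), (7.6, 5.0, 0.0), (3.8, 5.0, 0.0), (0.0, 5.0, 0.0),
]
cmap = contact_map(toy_coords)
for row in cmap:
    print("".join(str(v) for v in row))
```

Residues 0 and 7 are far apart in sequence but only 5 Å apart in space, so they appear as an off-diagonal contact, while residues 0 and 3 are closer in sequence yet not in contact.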

Answer 39

- Traditional methods like homology modeling and threading rely on templates and known structures - ML predicts 3D structures from sequence data alone - AlphaFold (DeepMind) and RoseTTAFold (Baker Lab) lead the charge in this area.

Answer 40

- Developed by DeepMind *** predicts protein structures with atomic accuracy by using deep learning models trained on large structural datasets Breakthroughs: - AlphaFold 2 achieved near-experimental level accuracy in the 2020 CASP14 competition (critical assessment of protein structure prediction) - AlphaFold 3 (2024) predicts proteins, DNA, RNA, ligands, and post-translational modifications.

Answer 41

- Mutations in one residue often result in compensatory mutations in its interacting partner - This is observed across species through analysis of homologous protein sequences - Correlated mutations indicate functionally significant residue pairs

Answer 42

- helps predict which residues are close in the 3D structure - Residues showing correlated mutations are likely to be spatially close in the folded protein - This is particularly useful when no experimental structure is available.

Answer 43

- using large multiple sequence alignments (MSAs) from homologous proteins. - The more diverse the sequences in the MSA, the better the resolution of coevolving residues. - Evolutionary info from MSAs guides predictions for residue-residue contacts.

Answer 44

- Residues with a high score (i.e. coevolve) are near each other in the protein’s structure (i.e. small distance)
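
Worked example (a sketch): one simple coevolution score is the mutual information between two alignment columns. This is a swapped-in illustrative score, not necessarily the one used in the course; it cannot separate direct from indirect couplings, which is why methods such as direct coupling analysis exist. The toy MSA is hypothetical.

```python
import math
from collections import Counter


def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    col_i = Counter(seq[i] for seq in msa)
    col_j = Counter(seq[j] for seq in msa)
    pairs = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), n_ab in pairs.items():
        p_ab = n_ab / n
        mi += p_ab * math.log2(p_ab / ((col_i[a] / n) * (col_j[b] / n)))
    return mi


# Hypothetical MSA: columns 1 and 3 always change together (K with E, R with D)
toy_msa = ["AKLE", "AKLE", "ARLD", "ARLD", "AKVE", "ARVD"]
mi_coupled = mutual_information(toy_msa, 1, 3)
mi_independent = mutual_information(toy_msa, 1, 2)
print(mi_coupled, mi_independent)
```

The coupled pair scores high and the independent pair scores near zero, so ranking pairs by score gives candidate contacts.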

Answer 45

- Not all correlated mutations are due to direct physical interactions; some may be indirect. - Noise from data can come from random mutations or insufficient evolutionary diversity. - Large and diverse sequence data sets are needed for reliable coevolution predictions.

Answer 46

- AlphaFold and RoseTTAFold utilize coevolutionary data from MSAs to predict residue interactions. - They incorporate evolutionary info along with structural features, leading to highly accurate predictions.

Answer 47

input sequence and MSA --> ML models ==> prediction of atomistic structure - Using MSAs and contact maps, DeepMind trained a model to predict protein structures -- Contact maps are converted into dihedral angles

Answer 48

Biggest change is the use of a diffusion model. Diffusion models essentially learn to unscramble atoms into a structure. - supercharged for any biomolecule ** breakthrough but not a final solution -- caveat is that proteins are dynamic

Answer 49

- At least 40% of proteins have disordered regions - AlphaFold (and all other methods) struggle with disordered regions. LARP1

Answer 50

- proteins undergo movements like folding, unfolding, and domain motions. -- essential for binding, catalysis, and signal transduction. -- Understanding dynamics is crucial for drug design, protein design, biotech, etc. Protein structure determination and prediction provide fixed snapshots ***DO NOT capture the full range of functional conformations

Answer 51

- provide time-resolved insights into protein behavior - more realistic analysis of proteins - atoms are treated as classical particles (atoms treated as hard spheres) -- involves: 1. simulation of atomic movement 2. visualization and analysis

Answer 52

- MD computes trajectories of atoms over time scales of femtoseconds to microseconds. - It can capture both small-scale vibrations and large-scale conformational changes.

Answer 53

- Provides detailed information on atomic interactions and energy changes. - Enables the study of mechanisms at an atomic level

Answer 54

1. refinement of predicted structures 2. Studying Intrinsically Disordered Proteins 3. Folding and Misfolding Pathways

Answer 55

- MD helps minimize energy and relax structures obtained from modeling. - Improves accuracy by accounting for environmental effects

Answer 56

- MD captures the flexible nature of disordered regions. - Aids in understanding functions that depend on disorder

Answer 57

- Simulates the folding process to identify intermediates. - Investigates misfolding mechanisms relevant to diseases.

Answer 58

- Describes the motion of macroscopic objects - Assumes particles have well-defined positions and velocities - Governed by Newton’s Laws of Motion ** atoms are treated as hard spheres

Answer 59

- Necessary for describing behavior at atomic and subatomic scales - Accounts for wave-particle duality, uncertainty principle, proton tunneling - Electrons exhibit quantum behavior that cannot be captured classically

Answer 60

Nuclei → - Nuclei (protons and neutrons) are much heavier than electrons. - Their de Broglie wavelengths are very small, making quantum effects less significant - At RT, thermal energies dominate over quantum zero-point energies. Electrons → - not explicitly simulated in classical MD. - Their effects are included implicitly through potential energy functions (force fields). - The electronic structure is assumed to remain in the ground state during simulation.

Answer 61

Suitable Systems: - Biological macromolecules (protein, nucleic acids, lipids) - Materials where electronic excitations are not critical. - Processes where bond breaking/forming does not occur. Limitations: - cannot accurately simulate chemical reactions involving electronic transitions. - Quantum phenomena like tunneling and zero-point energy are not captured.

Answer 62

The acceleration of an object is directly proportional to the net force acting on it and inversely proportional to its mass -- given atomic forces, we can calculate atomic movements (F = ma)

Answer 63

- the negative gradients of potential energy -- potential energy is dependent on positions of all atoms -- determines accelerations and thus motion of atoms
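
Worked example (a sketch): force as the negative gradient of the potential energy, F = -dU/dr, checked numerically on a one-dimensional harmonic potential. The spring constant and equilibrium distance are hypothetical values, not taken from any force field.

```python
def potential(r, k=300.0, r0=1.5):
    """Harmonic potential U(r) = 0.5 * k * (r - r0)^2 (hypothetical k and r0)."""
    return 0.5 * k * (r - r0) ** 2


def numerical_force(u, r, h=1e-5):
    """F = -dU/dr estimated with a central finite difference."""
    return -(u(r + h) - u(r - h)) / (2.0 * h)


def analytical_force(r, k=300.0, r0=1.5):
    """Exact derivative of the harmonic potential: F = -k * (r - r0)."""
    return -k * (r - r0)


r = 1.6  # stretched beyond r0, so the force should pull back (negative sign)
f_num = numerical_force(potential, r)
f_exact = analytical_force(r)
print(f_num, f_exact)
```

Dividing the force by the mass gives the acceleration, which is what the integrator needs at every time step.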

Answer 64

- computed by integrating equations of motion - Continuous motion approximated using discrete time steps -- Determine forces -- Move a small amount forward in time -- Repeat - Time step length determines how “smooth” the animation/trajectory is

Answer 65

1. 3d coordinates of atoms in the system 2. atoms exert forces on each other 3. using Newton's equation of motion, we can predict their movement

Answer 66

1. Numerical Solution: - Approximate the continuous equations of motion using discrete time steps 2. Update Position and Velocities: - Calculate the new positions and velocities of particles based on current forces.

Answer 67

1. Stability: prevent numerical errors from accumulating over many time steps 2. Accuracy: ensure that the trajectories closely follow the true physical behavior. 3. Efficiency: balance computational speed with the precision of the simulation.

Answer 68

1. Verlet: uses current and previous positions to calculate the next position. 2. Velocity Verlet: an extension of the Verlet algorithm that explicitly calculates velocities.
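
Worked example (a sketch): velocity Verlet applied to a single particle on a spring. The unit mass, unit spring constant, and time step are arbitrary reduced units chosen for illustration.

```python
def harmonic_force(x, k):
    """Restoring force of a spring centred at x = 0."""
    return -k * x


def velocity_verlet(x, v, mass, k, dt, n_steps):
    """Integrate Newton's equations with the velocity Verlet scheme."""
    f = harmonic_force(x, k)
    trajectory = [(x, v)]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass       # half-step velocity update
        x = x + dt * v_half                    # full-step position update
        f = harmonic_force(x, k)               # forces at the new position
        v = v_half + 0.5 * dt * f / mass       # complete the velocity update
        trajectory.append((x, v))
    return trajectory


def total_energy(x, v, mass, k):
    return 0.5 * mass * v * v + 0.5 * k * x * x


MASS, K, DT = 1.0, 1.0, 0.01   # reduced units (assumed)
traj = velocity_verlet(x=1.0, v=0.0, mass=MASS, k=K, dt=DT, n_steps=10_000)
e_start = total_energy(*traj[0], MASS, K)
e_end = total_energy(*traj[-1], MASS, K)
print(e_start, e_end)
```

The total energy stays close to its starting value over many steps, which is the stability property that makes this family of integrators popular in MD.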

Answer 69

- determines how smooth the trajectory is - smaller time steps lead to more calculations to simulate the same amount of time

Answer 70

- used to compute energies and atomic forces - sets of equations that describe the potential energy of a molecule based on atomic positions - based on dynamics of bond lengths, bond angles, and dihedral angles

Answer 71

- behave like springs - Two spheres (atoms) connected by a single spring - The spring resists changes in the distance between the two atoms - bond vibrations are seen as harmonic oscillators

Answer 72

- are determined by bond order and atom types - the spring constant (k, in kcal/mol/Å^2) increases as bond length decreases --- bond length: single > double > triple

Answer 73

- Three balls connected by 2 springs forming an angle, with a “hinge” at the central atom. --- We also have separate spring constants for bond angles.

Answer 74

- the angle between two planes formed by four sequentially bonded atoms (A-B-C-D): the plane through A-B-C and the plane through B-C-D - describes the rotation around the bond between atoms B and C. *** do not behave like springs
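
Worked example (a sketch): computing a dihedral angle from four atom positions using the normals of the two planes. The helper names and coordinates are hypothetical; the sign convention here gives 0 degrees for cis and 180 degrees for trans.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Dihedral angle A-B-C-D in degrees, in the range (-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)   # normal to the A-B-C plane
    n2 = _cross(b2, b3)   # normal to the B-C-D plane
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


A, B, C = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
angle_cis = dihedral_degrees(A, B, C, (1.0, 1.0, 0.0))
angle_trans = dihedral_degrees(A, B, C, (1.0, -1.0, 0.0))
angle_perp = dihedral_degrees(A, B, C, (1.0, 0.0, 1.0))
print(angle_cis, angle_trans, angle_perp)
```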

Answer 75

Bonds and Angles: - govern local geometry (bond lengths/angles) using quadratic (harmonic) potentials that favor specific distances and angles Dihedrals: - govern torsional or rotational flexibility around bonds, typically using periodic and multi-well potentials to allow for multiple stable conformations.

Answer 76

- capture arbitrary functions with rotational symmetry. ex/ periodic energy functions with varying minima - can be modeled using custom Fourier series

Answer 77

- approximate functions as a sum of sine and cosine waves - approximate (any) symmetrical rotational energy function.
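
Worked example (a sketch): a torsional energy written as a short cosine series, V(phi) = sum over n of k_n * (1 + cos(n*phi - delta_n)). This functional form is common in biomolecular force fields, but the single threefold term and its amplitude below are hypothetical values chosen for illustration.

```python
import math


def dihedral_energy(phi_deg, terms):
    """Sum of periodic terms; each term is (k_n, multiplicity n, phase delta_n in degrees)."""
    phi = math.radians(phi_deg)
    return sum(k * (1.0 + math.cos(n * phi - math.radians(delta)))
               for k, n, delta in terms)


# Hypothetical threefold torsion: maxima at 0/120/240 degrees, minima at 60/180/300
THREEFOLD = [(1.45, 3, 0.0)]

for angle in (0, 60, 120, 180):
    print(angle, round(dihedral_energy(angle, THREEFOLD), 3))
```

Adding more terms with other multiplicities reshapes the wells, which is how a few parameters can reproduce a multi-well rotational profile.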

Answer 78

improves the approximation - allows the series to closely match the original complex function

Answer 79

- Facilitate the organization of molecules into complex structures - Determine the macroscopic properties of materials (e.g. solubility, melting points)

Answer 80

- Govern essential processes like enzyme-substrate binding, protein folding, and membrane formation - Critical for understanding biochemical pathways and drug design

Answer 81

While covalent bonds define the primary structure of molecules --- noncovalent interactions are pivotal for dictating how molecules interact.

Answer 82

Nature: - weak, attractive forces arising from instantaneous dipoles in molecules Role: - stabilize molecular assemblies by promoting close packing C6 = dispersion coefficient

Answer 83

Nature: - Strong, short-range forces due to overlapping electron clouds. Role: - Prevent atoms from collapsing into each other, maintaining molecular integrity C12 = repulsion coefficient

Answer 84

Van der Waals forces are modeled using the Lennard-Jones potential --- captures both the attractive and repulsive aspects of noncovalent interactions.
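
Worked example (a sketch): the Lennard-Jones potential in its C12/C6 form, U(r) = C12/r^12 - C6/r^6, with C12 = 4*epsilon*sigma^12 and C6 = 4*epsilon*sigma^6. The epsilon and sigma values are hypothetical.

```python
def lennard_jones(r, c6, c12):
    """Repulsion (C12 / r^12) minus dispersion attraction (C6 / r^6)."""
    return c12 / r**12 - c6 / r**6


EPSILON = 0.2   # well depth in kcal/mol (hypothetical)
SIGMA = 3.4     # distance where U = 0, in angstroms (hypothetical)
C6 = 4.0 * EPSILON * SIGMA**6
C12 = 4.0 * EPSILON * SIGMA**12

r_min = 2.0 ** (1.0 / 6.0) * SIGMA   # location of the energy minimum
print(lennard_jones(SIGMA, C6, C12), lennard_jones(r_min, C6, C12))
```

Below sigma the r^-12 term dominates and the energy climbs steeply; beyond r_min the r^-6 attraction fades quickly.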

Answer 85

- decay as 1/r, making them significant over longer distances compared to van der Waals forces *** Electrostatic Interactions Drive Charged and Polar Molecule Behavior

Answer 86

bonded and non-bonded interactions

Answer 87

Begins with Quantum Mechanical Data for Small Molecules 1. QM calculations 2. data utilization 3. small molecule focus for simplicity and accuracy

Answer 88

QM Calculations: - provides high-accuracy data on molecular geometries, energetics, and electronic distributions Data Utilization: - QM data inform the selection and tuning of force field parameters to ensure they reflect true molecular behavior.

Answer 89

Simplicity: - Smaller molecules have fewer atoms and simpler interactions, making QM calculations more manageable. Accuracy: - QM methods (e.g. Density Functional Theory, Hartree-Fock) yield precise information essential for initial parameterization.

Answer 90

Size & Structure: - proteins consist of hundreds to thousands of atoms with intricate 3D structures. Diverse Interactions: - include a variety of noncovalent interactions, such as hydrogen bonds, ionic bonds, hydrophobic interactions, and van der Waals forces.

Answer 91

Computational Cost: - QM calculations become computationally prohibitive for large biomolecules like proteins. Alternative Strategies: - Utilize QM data from representative small segments or use empirical and semiempirical methods.

Answer 92

1. Spectroscopic Data: - Infrared (IR), Nuclear Magnetic Resonance (NMR), and Raman spectroscopy provide insights into bond vibrations and molecular geometries. 2. Crystallography: - X-ray crystallography offers precise information on atomic positions and molecular conformations. 3. Thermodynamic Measurements: - Data on melting points, boiling points, and solvation energies inform interaction strengths.

Answer 93

Fitting Process: - adjusts force field parameters to minimize discrepancies between simulation results and experimental observations. Validation Metrics: - use root-mean-square deviations (RMSD), binding affinities, and structural stability as benchmarks.

Answer 94

Ensures Realistic Simulations -- uses parameter adjustment

Answer 95

Process: - fine-tune force field parameters to minimize discrepancies between simulation outcomes and experimental observations. Techniques: - use of optimization algorithms and statistical methods to achieve best-fit parameters

Answer 96

1. High Dimensionality 2. Diverse Chemical Environments 3. Dynamic Conformational Changes 4. Long-Range Electrostatic Interactions

Answer 97

Issue: - proteins possess numerous degrees of freedom, making comprehensive parameterization computationally intensive. Solution: - utilize advanced optimization techniques and high-performance computing resources.

Answer 98

Issue: - Different regions of a protein (e.g. active sites, hydrophobic cores) experience varied chemical environments. Solution: - Develop region-specific parameters or use adaptive force fields that can account for environmental variations.

Answer 99

Issue: - proteins frequently undergo conformational shifts that must be accurately captured by the force field. Solution: - Incorporate flexible dihedral terms and ensure that parameters support a wide range of conformational states.

Answer 100

Issue: - Accurate modeling of electrostatics in large, charged systems is computationally demanding. Solution: - Implement efficient algorithms like Particle Mesh Ewald (PME) and use approximations where appropriate.

Answer 101

1. Quantum Mechanical Calculations: - obtain high-accuracy data for small molecules and representative fragments 2. Empirical Data Integration: - Incorporate experimental measurements to validate and refine parameters 3. Parameter Optimization: - adjust force field parameters through iterative simulations and comparisons 4. Advanced Techniques: - utilize machine learning, multi-scale modeling, and automated pipelines to enhance parameter accuracy and efficiency.

Answer 102

*** Different force fields are tailored for specific types of molecules and applications AMBER, CHARMM, OPLS

Answer 103

optimized for proteins and nucleic acids -- optimized for biomolecular interactions

Answer 104

- versatile, used for a wide range of biomolecules - known for its extensive parameter set, suitable for complex systems including proteins, lipids, and membranes

Answer 105

- focuses on liquids and organic molecules - optimized for small molecules, organic compounds, and polymers, with emphasis on accurate non-bonded interactions

Answer 106

- Compatibility with the system being studied - Availability of parameters for the molecules of interest

Answer 107

- crucial for cell growth - Producing red blood cells - Synthesizing purines - Interconverting amino acids - Methylating tRNA - Generating and using formate

Answer 108

- has a cascading effect on essential cellular processes, primarily affecting DNA and RNA synthesis and amino acid metabolism ***This is a useful process for drug design.

Answer 109

- Dihydrofolate reductase (DHFR) is a crucial enzyme that produces THF from dihydrofolate (DHF) DHF + NADPH → THF + NADP(+)

Answer 110

studied as an antibiotic (e.g. trimethoprim) and cancer (e.g. methotrexate) target

Answer 111

- complicates drug design - patient with a bacterial infection is prescribed a drug loosely targeting DHFR ---- deleterious side effects ***Both proteins have high structural similarity, even around the active site - Bacteria and humans have similar structures, but their dynamics are different --- must ensure drugs only bind to bacterial proteins by exploiting dynamics insights

Answer 112

- provides insight into druggable conformations - explore various low-energy conformations that are, hopefully, similar to reality - Knowing conformations unique to bacteria allows us to design a small molecule that competitively inhibits DHFR

Answer 113

- need a starting structure - If our starting structure is very far away from our desired equilibrium, our simulations will take longer - NO static structure for experiment

Answer 114

- Low-quality experimental structures - Inaccurate computational predictions - High-energy conformations - Missing or incorrect cofactors ** wait for the protein to fold to study its dynamics

Answer 115

- Experimental structures offer the best option for their accuracy - PDB contains experimentally determined structures for thousands of proteins (not all equally suitable for simulations) --- General resolution preference: X-ray, Cryo-EM, NMR

Answer 116

- resolution - completeness - functional state - B-factors

Answer 117

- refers to how well the atomic positions are determined -- Resolution below 2.0 Å is generally preferred for high-quality simulation -- high R-factors indicate lower structural accuracy

Answer 118

Flexible loops or disordered regions are often missing from the structure

Answer 119

Proteins can exist in different functional conformations: active vs inactive state, bound to ligands or unbound

Answer 120

Higher B-factors suggest more uncertainty in atom positions, which might make that part of the structure less reliable

Answer 121

...residues (specific amino acid in protein) - It's essential to fix chain breaks and missing loops before simulation --- dashed lines indicate unknown and missing info

Answer 122

Missing atoms or residues can be added using modeling software like MODELLER (protein model prediction programs)

Answer 123

- components like ligands or non-essential ions should be removed - ligands, ions, or crystallization agents that are not physiologically relevant ***Distorts protein’s behavior in a simulated biological environment if not removed

Answer 124

- are essential for accurate simulations - Experimental structures often cannot resolve hydrogens, so we need to add them ourselves

Answer 125

Protonation states of amino acids affect the charge distribution, which influences electrostatic interactions during the simulation

Answer 126

- pKa ~6.0 - Protonation switching around pH 6-7

Answer 127

- pKa ~8.3 - Could form disulfide bonds in oxidizing environments

Answer 128

- pKa ~3.9 - Affects interactions like salt bridges and hydrogen bonds

Answer 129

- pKa ~10.5 - Can form ionic bonds with negatively charged residues

Answer 130

- pKa ~4.2 - Glu’s protonation state affects electrostatic interactions

Answer 131

- pKa ~10.1 - Hydrogen bonding and in enzyme active sites
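
Worked example (a sketch): estimating how much of a titratable side chain is protonated at a given pH with the Henderson-Hasselbalch relation. The residue assignments below are an assumption based on the usual textbook values (His 6.0, Cys 8.3, Asp 3.9, Lys 10.5, Glu 4.2, Tyr 10.1); pKa values inside a folded protein can shift substantially from these.

```python
def fraction_protonated(pka, ph):
    """Henderson-Hasselbalch: fraction of the group carrying its titratable proton."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Assumed model-compound side-chain pKa values
SIDE_CHAIN_PKA = {"His": 6.0, "Cys": 8.3, "Asp": 3.9, "Lys": 10.5, "Glu": 4.2, "Tyr": 10.1}

PH = 7.0
for residue, pka in SIDE_CHAIN_PKA.items():
    print(f"{residue}: {fraction_protonated(pka, PH):.4f} protonated at pH {PH}")
```

At pH 7 the acidic residues are almost fully deprotonated and Lys is almost fully protonated, while His sits close enough to its pKa that its state has to be chosen per residue.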

Answer 132

ions, molecules, proteins, organelles, cytoskeleton, membranes

Answer 133

- Protein of interest (already prepared) - Water molecules at the appropriate temperature (310 K) and pressure (1 atm) - Cations (Na+ and K+) and anions (Cl-) at an ionic strength of 150 millimolar - Any cofactors (e.g. NADPH and folate for DHFR)
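
Worked example (a rough sketch): how many ion pairs correspond to 150 mM in a simulation box. The 8 nm cubic box is a hypothetical size, and the estimate ignores the volume taken up by the protein and any extra counterions needed to neutralize its charge.

```python
AVOGADRO = 6.02214076e23   # particles per mole
LITRES_PER_NM3 = 1e-24     # 1 nm^3 = 1e-27 m^3 = 1e-24 L


def ion_pairs_for_concentration(conc_molar, box_nm):
    """Number of ion pairs needed to reach conc_molar in a rectangular box."""
    volume_nm3 = box_nm[0] * box_nm[1] * box_nm[2]
    moles = conc_molar * volume_nm3 * LITRES_PER_NM3
    return round(moles * AVOGADRO)


n_pairs = ion_pairs_for_concentration(0.150, (8.0, 8.0, 8.0))
print(n_pairs)
```

Simulation setup tools usually do this bookkeeping automatically, but the arithmetic is a useful sanity check on their output.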

Answer 134

walls - solved with periodic boundary conditions (PBC)

Answer 135

- a protein in vivo will have lots of room to move around --- could make box very large, but that is very costly - for this simulation, we have to apply force to keep molecules in the box - water molecules and proteins would bounce off these walls in an unphysical manner (edge effects) - PBC simulate infinite systems from a finite box

Answer 136

- PBC simulate infinite systems from a finite box --- We (virtually) place exact copies of our system in all directions. Atoms that cross the box edge reappear on the other side; thus, there are no edge effects --- think of the Pac-Man game

Answer 137

to reproduce quantum chemical and experimental data

Answer 138

- ensures that an atom in the primary box only interacts with the closest image of another atom - Image atoms in adjacent boxes are used to calculate interactions across the boundaries (ensures correct interactions)
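
Worked example (a sketch): wrapping a position back into the box and measuring a distance to the closest periodic image. It assumes a rectangular (orthorhombic) box; triclinic boxes need a more general treatment. Function names and the 10 Å box are hypothetical.

```python
import math


def wrap_position(pos, box):
    """Map a position back into the primary box [0, L) along each axis."""
    return tuple(x % length for x, length in zip(pos, box))


def minimum_image_distance(a, b, box):
    """Distance between a and the closest periodic image of b."""
    sq = 0.0
    for xa, xb, length in zip(a, b, box):
        d = xa - xb
        d -= length * round(d / length)   # shift by whole box lengths to the nearest image
        sq += d * d
    return math.sqrt(sq)


BOX = (10.0, 10.0, 10.0)
wrapped = wrap_position((10.5, -0.5, 3.0), BOX)
dist = minimum_image_distance((0.5, 5.0, 5.0), (9.5, 5.0, 5.0), BOX)
print(wrapped, dist)
```

The two atoms sit at opposite faces of the box, 9 Å apart inside it, yet they interact at 1 Å through the boundary.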

Answer 139

1. Generate structures and use quantum chemistry to compute energy and forces 2. Optimize force field parameters until they reproduce the quantum chemistry data set 3. Run MD simulations and predict experimental data (e.g. NMR, Raman spec, solvation energies, etc) 4. Continue to optimize force field parameters to minimize quantum chemistry and simulation prediction errors

Answer 140

- Force fields are dependent on fitting data and simulation setup -- Force fields are not inherently compatible with each other (mixing them can make simulations unreliable) - Ex/ protein force fields and DNA force fields are parameterized for different molecule types (proteins vs DNA/RNA) ** therefore, combine only force fields that are compatible by design or validated against experimental data

Answer 141

- System type: different force fields are optimized for specific systems - Accuracy vs speed: high-accuracy force fields may require more computational resources - Compatibility: choose a force field based on compatibility with available topology generators and the type of molecules in your simulations

Answer 142

- define the molecular structure and interactions in the simulations - contains info on atom types, bonds, angles, dihedrals, and non-bonded interactions based on the chosen force field *** essentially tells the program which force field parameters to use and where

Answer 143

- Complex molecules and ligands require parameterization and careful integration - Non-standard residues or ligands are not always included in standard force field parameter sets ---- require additional parameterization to ensure proper interactions in the simulation

Answer 144

- necessary before running molecular dynamics simulations - adjust the initial structure to remove unfavorable atom positions and steric clashes that could cause instability during simulations **** Without minimization, high-energy configurations may lead to unrealistic results or early failures in the molecular dynamics simulations

Answer 145

- removes steric clashes and optimizes the initial geometry --- Steric clashes occur when atoms are too close, resulting in excessively high energy --- Energy minimization gently adjusts the structure to lower the system’s energy
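
Worked example (a toy sketch): relieving a steric clash between two atoms by steepest descent on a Lennard-Jones pair potential. The parameters, step-size rule, and starting distance are hypothetical; production MD packages minimize over all atomic coordinates with more robust algorithms.

```python
def lj_energy(r, eps=0.2, sigma=3.4):
    """Lennard-Jones pair energy (hypothetical eps in kcal/mol, sigma in angstroms)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


def lj_force(r, eps=0.2, sigma=3.4):
    """Force along r, F = -dU/dr; positive means the atoms push apart."""
    sr6 = (sigma / r) ** 6
    return 24.0 * eps * (2.0 * sr6 * sr6 - sr6) / r


def minimize(r, rate=0.1, max_step=0.05, n_steps=1000, tol=1e-6):
    """Steepest descent on the pair distance with a capped displacement per step."""
    for _ in range(n_steps):
        f = lj_force(r)
        if abs(f) < tol:
            break
        step = max(-max_step, min(max_step, rate * f))   # cap keeps huge clash forces tame
        r += step
    return r


R_START = 2.5   # clashing distance, well inside sigma
r_final = minimize(R_START)
print(lj_energy(R_START), r_final, lj_energy(r_final))
```

Capping the displacement is what makes the adjustment "gentle": the enormous force at the clash moves the atoms only a small distance per step.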

Answer 146

1. Number of particles: - biological systems contain billions of atoms interacting simultaneously 2. Thermal motion: - atoms and molecules are in constant motion due to thermal energy 3. Uncertainty and variability: - exact positions and velocities of particles are inherently uncertain

Answer 147

averages of atomistic behaviors on macroscopic and microscopic levels

Answer 148

Microscopic level: - individual atoms and molecules Macroscopic level: - bulk properties from collective behavior

Answer 149

stochastic (randomly determined), measurable properties are computed as averages.

Answer 150

uses statistical methods to relate microscopic properties to macroscopic observables

Answer 151

- specifies the temp, pressure, volume, and number of particles of molecular systems --- Large scale system that defines properties of molecular system - changing values of temp, pressure, volume, etc changes the macrostate *** essentially infinite number of macrostates

Answer 152

the collection of all possible microstates of a single macrostate

Answer 153

a unique configuration defined by the positions and velocities of all particles --- a specific configuration of a system by knowing positions and velocities of all particles

Answer 154

- require sampling every possible configuration - Longer simulations provide better sampling of microstates and their probabilities (e.g., a more accurate hydrogen bond distance estimate)

Answer 155

distance -- measure the weighted mean of the microstates --- used to compute expected value of ensemble
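
A minimal sketch of a weighted mean over microstates, assuming each microstate is weighted by its Boltzmann factor exp(-E/kT). The function name, the toy distances, and kT = 0.593 kcal/mol (roughly 298 K) are illustrative assumptions.

```python
import math


def ensemble_average(values, energies, kT=0.593):
    """Boltzmann-weighted mean of an observable over microstates.

    values   : observable in each microstate (e.g. an H-bond distance in Å)
    energies : energy of each microstate, same units as kT
    """
    weights = [math.exp(-e / kT) for e in energies]
    partition = sum(weights)  # normalizes the weights into probabilities
    return sum(v * w for v, w in zip(values, weights)) / partition


# Two microstates of equal energy contribute equally.
print(ensemble_average([2.0, 3.0], [0.0, 0.0]))
# A high-energy microstate barely shifts the average.
print(ensemble_average([2.0, 3.0], [0.0, 5.0]))
```

In an MD trajectory the frames are already visited in proportion to their probability, so a plain average over frames plays the same role.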

Answer 156

- Fixed number of particles (N) - Volume (V) - Energy (E)

Answer 157

- Fixed number of particles (N) - Volume (V) - Temperature (T)

Answer 158

- Fixed number of particles (N) - Pressure (P) - Temperature (T) *** most common

Answer 159

Remember: macrostate observables are ensemble averages --- The instantaneous temperature of microstates will fluctuate, but the ensemble average should be constant *** There should be no net flow of energy!!! *** Kinetic energy determines temperature

Answer 160

- determines temperature - Particle velocities determine kinetic energy --- every particle does not have same velocity; they generally follow the Maxwell-Boltzmann distribution
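
A minimal sketch of how particle velocities give a temperature, assuming the equipartition relation T = 2 KE / (N_dof k_B) with 3 degrees of freedom per particle and no constraints. Function names and SI units are illustrative assumptions.

```python
KB = 1.380649e-23  # Boltzmann constant in J/K


def kinetic_energy(masses, velocities):
    """Total kinetic energy in J from masses (kg) and velocities (m/s)."""
    return sum(
        0.5 * m * (vx * vx + vy * vy + vz * vz)
        for m, (vx, vy, vz) in zip(masses, velocities)
    )


def instantaneous_temperature(masses, velocities):
    """Temperature of one microstate, assuming 3 degrees of freedom per atom."""
    n_dof = 3 * len(masses)
    return 2.0 * kinetic_energy(masses, velocities) / (n_dof * KB)


# One argon-like atom whose velocity components each carry kT/2 at 300 K.
mass = 6.63e-26
v = (KB * 300.0 / mass) ** 0.5
print(instantaneous_temperature([mass], [(v, v, v)]))
```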

Answer 161

the velocity at which the peak of the distribution occurs

Answer 162

the mean velocity of all particles
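
The most probable and mean speeds of the Maxwell-Boltzmann distribution have closed forms, v_p = sqrt(2 k_B T / m) and v_mean = sqrt(8 k_B T / (pi m)). A minimal sketch, with function names and the N2 example chosen here for illustration:

```python
import math

KB = 1.380649e-23        # Boltzmann constant in J/K
AMU = 1.66053906660e-27  # atomic mass unit in kg


def most_probable_speed(mass_kg, temperature):
    """Speed at the peak of the Maxwell-Boltzmann speed distribution."""
    return math.sqrt(2.0 * KB * temperature / mass_kg)


def mean_speed(mass_kg, temperature):
    """Average speed over the Maxwell-Boltzmann speed distribution."""
    return math.sqrt(8.0 * KB * temperature / (math.pi * mass_kg))


m_n2 = 28.0134 * AMU
for temperature in (300.0, 600.0):
    print(temperature, most_probable_speed(m_n2, temperature), mean_speed(m_n2, temperature))
```

The mean speed is always larger than the most probable speed by the factor sqrt(4/pi), and both grow with the square root of temperature.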

Answer 163

higher temperatures shift the distribution toward higher velocities

Answer 164

adjust the velocities of particles to increase or decrease the system's kinetic energy → thereby controlling the temperature

Answer 165

adjusts the velocities of all particles uniformly based on the current temperature and target temperature -- indicated by velocity scaling factor ----- Velocity scaling factor is computed by slowly/carefully scaling the current velocity based on the temperature deviation
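
A minimal sketch of a velocity scaling factor, assuming the Berendsen-style weak-coupling form lambda = sqrt(1 + (dt / tau) (T_target / T_current - 1)). The card does not state the formula, so this form and the names dt and tau (time step and coupling time) are assumptions.

```python
import math


def velocity_scaling_factor(t_current, t_target, dt, tau):
    """Weak-coupling scaling factor; larger tau means gentler correction."""
    return math.sqrt(1.0 + (dt / tau) * (t_target / t_current - 1.0))


def rescale(velocities, factor):
    """Scale every velocity component by the same factor."""
    return [tuple(factor * c for c in v) for v in velocities]


# System is too cold (280 K vs. 300 K target): factor is slightly above 1.
lam = velocity_scaling_factor(280.0, 300.0, dt=0.002, tau=0.1)
print(lam, rescale([(1.0, 0.0, -2.0)], lam))
```

With tau equal to dt the factor becomes sqrt(T_target / T_current), which forces the target temperature in one step; larger tau spreads the correction over many steps.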

Answer 166

- computed by slowly/carefully scaling the current velocity based on the temperature deviation - prevents abrupt changes that could destabilize the simulation --- Simple velocity scaling does not generate a true canonical (NVT) ensemble; it cannot reproduce realistic temperature fluctuations

Answer 167

mass dependent

Answer 168

- Berendsen thermostats inaccurately model thermal energy transfer via particle collisions - Nose-Hoover thermostat uses momenta scaling, which provides realistic kinetic energy and thus temperature control

Answer 169

- connect particle momenta to fictitious heat bath - Heat bath allows thermal energy to flow in and out of our simulation - Momenta scaling provides realistic kinetic energy and thus temperature control - dependent on Q ⇒ a “mass” coupling parameter that controls thermostat responsiveness

Answer 170

- Barostats maintain desired pressure during simulations - Adjusts the volume of the simulation box to achieve and maintain target pressure

Answer 171

directly proportional to density and temp

Answer 172

- represents thermal energy of ideal gas - assumes non-interacting particles and elastic collisions

Answer 173

- virial corrections to real gas - corrects for intermolecular forces in pressure equation
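
A minimal sketch of the two pressure terms, assuming the common form P = N k_B T / V + (1 / 3V) * sum over pairs of r_ij · f_ij, where f_ij is the force on particle i from particle j. Function names and SI units are illustrative assumptions.

```python
KB = 1.380649e-23  # Boltzmann constant in J/K


def ideal_gas_pressure(n_particles, volume, temperature):
    """Kinetic (ideal gas) term: proportional to density and temperature."""
    return n_particles * KB * temperature / volume


def virial_pressure(n_particles, volume, temperature, pair_terms):
    """Ideal term plus the virial correction for intermolecular forces.

    pair_terms: list of (r_ij, f_ij) vector pairs, one per interacting pair.
    """
    virial = sum(
        rx * fx + ry * fy + rz * fz
        for (rx, ry, rz), (fx, fy, fz) in pair_terms
    )
    return ideal_gas_pressure(n_particles, volume, temperature) + virial / (3.0 * volume)


# One mole at 273.15 K in 22.414 L is close to 1 atm (about 101325 Pa).
print(ideal_gas_pressure(6.02214076e23, 0.022414, 273.15))
```

A repulsive pair (force pointing along the separation vector) raises the pressure above the ideal value, and an attractive pair lowers it.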

Answer 174

- Gentle Pressure Stabilization - Same concept as Berendsen thermostat: Scale box volume based on pressure difference to target - atomic positions get scaled with box size - velocities do not get affected - using barostats, we can keep a consistent macrostate!!!
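
A minimal sketch of the box scaling, assuming the Berendsen-style factor mu = (1 - beta (dt / tau) (P_target - P_current))^(1/3), where beta is the isothermal compressibility. The formula and all names are assumptions, since the card only describes the idea.

```python
def box_scaling_factor(p_current, p_target, dt, tau, compressibility):
    """Length scaling factor for one step of weak pressure coupling."""
    return (1.0 - compressibility * (dt / tau) * (p_target - p_current)) ** (1.0 / 3.0)


def scale_system(box_lengths, positions, mu):
    """Scale box edges and atomic positions by mu; velocities are untouched."""
    new_box = tuple(mu * length for length in box_lengths)
    new_positions = [tuple(mu * c for c in p) for p in positions]
    return new_box, new_positions


# Pressure is too high (2 bar vs. 1 bar target), so the box grows slightly.
mu = box_scaling_factor(2.0, 1.0, dt=0.002, tau=1.0, compressibility=4.5e-5)
print(mu, scale_system((3.0, 3.0, 3.0), [(1.0, 1.0, 1.0)], mu))
```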

Answer 175

keep a consistent macrostate!!!

Answer 176

- are not in true thermodynamic equilibrium - starting structures often come from experiments not relevant for our simulations - After minimization, we run a short simulation to let the system adjust to the desired macrostate

Answer 177

- We discard the initial relaxation as it is not our desired macrostate - Once macrostate variable(s) reach steady state, we are now sampling valid microstates

Answer 178

- sample microstates from our desired macrostate - Ensemble averages improve with more simulation time by sampling more microstates *** “Replicates” do not exist as they do in experimental biology and chemistry

Answer 179

NVT - short simulation to relax to temperature of interest NPT - short simulation to relax to density of interest NPT - long simulation process

Answer 180

multiple short simulations provide better sampling of microstates

Answer 181

*** provide better chance of sampling different microstates 1. Simulation starts here on my potential energy surface (PES) 2. Initial velocities send it in this direction 3. There is a chance that it never samples this minimum 4. Multiple simulations with random velocities reduce this chance

Answer 182

- measures the overall change in the structure during a simulation, tracking deviations from the starting conformation --- monitors global conformational changes - The difference between the coordinates represents the displacement of atom i from its reference position at time t
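
A minimal sketch of the RMSD calculation, RMSD(t) = sqrt((1/N) * sum_i |r_i(t) - r_i_ref|^2), written in plain Python. It assumes the frame has already been superposed on the reference; the function name and toy coordinates are illustrative.

```python
import math


def rmsd(coords, reference):
    """Root-mean-square deviation between a frame and a reference structure.

    coords, reference: lists of (x, y, z) tuples in the same atom order.
    """
    n_atoms = len(coords)
    total = sum(
        (x - xr) ** 2 + (y - yr) ** 2 + (z - zr) ** 2
        for (x, y, z), (xr, yr, zr) in zip(coords, reference)
    )
    return math.sqrt(total / n_atoms)


reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
frame = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0)]  # every atom displaced by 1 Å
print(rmsd(frame, reference))
```

In practice the frame is usually aligned to the reference first, so that overall translation and rotation do not count as structural change.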

Answer 183

- Low RMSD → the structure is very similar to the reference structure (e.g., stable conformation) - High RMSD → indicates significant deviation, suggesting large structural changes or flexibility over time

Answer 184

- identifies regions of flexibility in the protein by calculating the fluctuation of each atom or residue -- Tracking Local Flexibility - This measures how much the atom is fluctuating around its mean, not relative to a reference structure
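
A minimal sketch of a per-atom RMSF, RMSF_i = sqrt(mean over frames of |r_i(t) - <r_i>|^2), in plain Python. The trajectory layout (a list of frames, each a list of (x, y, z) tuples) and the function name are assumptions for illustration.

```python
import math


def rmsf(trajectory):
    """Per-atom fluctuation around each atom's own mean position."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        # Mean position of atom i over all frames.
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        # Mean squared displacement from that mean.
        msf = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(msf))
    return result


# Atom 0 never moves (rigid); atom 1 swings between x = -1 and x = +1 (flexible).
trajectory = [
    [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
]
print(rmsf(trajectory))
```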

Answer 185

High RMSF → value for an atom means that it fluctuates a lot, indicating flexibility (often seen in loops or solvent-exposed regions) Low RMSF → atom remains relatively fixed in place, suggesting rigidity (common in well-ordered regions like helices or beta-sheets)

Answer 186

- effective potential that governs the behavior of a system along a collective variable - A collective variable defines the progress of an interaction or molecular reaction --- common collective variables include distances between atoms, bond angles, or dihedral angles.

Answer 187

- This shows you the average energy with respect to h - Bond length is a particular angstroms apart - Important: This is not a covalent bond, so it will not look like our spring model *** Nature prefers to spend time in low-energy conformations

Answer 188

- Probability and energy are intricately linked [ W(x) vs P(x) ] --- display as opposite curve plots
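
The link between the two curves can be sketched with the relation W(x) = -kT ln P(x), shifted so the most populated bin sits at zero. The histogram counts, kT value (kcal/mol near 298 K), and function name below are illustrative assumptions.

```python
import math


def pmf_from_counts(counts, kT=0.593):
    """Convert a histogram along a collective variable into a free energy profile."""
    total = sum(counts)
    profile = []
    for c in counts:
        if c == 0:
            profile.append(float("inf"))  # never sampled: energy unknown/very high
        else:
            profile.append(-kT * math.log(c / total))
    lowest = min(profile)
    return [w - lowest for w in profile]


# Most frames fall in the middle bin, so that bin is the free energy minimum.
print(pmf_from_counts([10, 80, 10]))
```

High-probability bins map to low W(x) and low-probability bins to high W(x), which is why the two plots look like mirror images.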

Answer 189

a complex, multi-stage process requiring significant time and resources Many years and millions of dollars

Answer 190

1. Discovery and Preclinical Research -- Potential drugs are identified and tested in non-human studies ***Computation is most helpful with the drug discovery stage 2. Clinical Trials -- Testing in human subjects to assess safety and efficacy 3. Regulatory Approval -- Evaluation by agencies like the FDA before the drug can be marketed 4. Post-Marketing Surveillance -- Ongoing monitoring after the drug is available to the public

Answer 191

- crucial for developing effective and safe drugs - Proteins regulate nearly all cellular processes, and drugs inhibit or activate proteins to correct disease states *** Target identification is accelerated with bioinformatics

Answer 192

- Disease Relevance: the protein plays a critical role in the disease mechanism - Druggability: target has a structure that allows it to bind with drug-like molecules - Specificity: Targeting the protein minimizes effects on healthy cells, reducing side effects

Answer 193

- Chemical space contains an astronomical number of possible compounds to explore - Effective drugs must bind to the target protein with sufficient affinity and specificity ***Estimated to be between 10^60 and 10^200 possible small organic molecules We need methods to navigate chemical space and identify promising leads accurately and efficiently

Answer 194

allows testing of thousands of compounds against the target protein

Answer 195

1. Library Preparation: - Collection of diverse compounds 2. Assay Development: - Design of biological assays to measure compound activity against the target 3. Screening: - Compounds are tested in miniaturized assays 4. Data Analysis: - Identification of “hits” that show desired activity

Answer 196

- evaluates vast libraries to identify potential leads efficiently - Experimental assays are still expensive, and limited to commercially available compounds *** Instead, we can use computational methods to predict which compounds we should experimentally validate --- virtual screening allows for screening of millions/billions of compounds allowing for expansion of the search space

Answer 197

- binding to a protein is governed by thermodynamics (and kinetics) - Binding occurs when a compound/ligand interacts specifically with a protein ** reversible

Answer 198

- determined by the Gibbs free energy change - the change in free energy when a ligand binds to a protein determines the binding process spontaneity

Answer 199

- Gibbs free energy combines enthalpy and entropy Enthalpy (delta H) ⇒ accounts for energetic interactions Entropy (delta S) ⇒ how much conformational flexibility changes ***Simulations capture free energy directly instead of treating enthalpy and entropy separately
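
A minimal worked sketch of ΔG = ΔH - TΔS, plus the standard relation ΔG = RT ln K_d (relative to a 1 M standard state) to turn a binding free energy into a dissociation constant. The numbers and function names are made up for illustration.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol K)


def gibbs_free_energy(delta_h, delta_s, temperature):
    """Delta G in kcal/mol from delta H (kcal/mol) and delta S (kcal/(mol K))."""
    return delta_h - temperature * delta_s


def dissociation_constant(delta_g, temperature):
    """K_d in mol/L from a binding free energy, assuming a 1 M standard state."""
    return math.exp(delta_g / (R_KCAL * temperature))


# Favorable enthalpy (-10) partly offset by an entropy loss on binding.
dg = gibbs_free_energy(delta_h=-10.0, delta_s=-0.02, temperature=300.0)
print(dg, dissociation_constant(dg, 300.0))
```

A negative ΔG means binding is spontaneous; here the entropy penalty of 6 kcal/mol leaves a net ΔG of about -4 kcal/mol.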

Answer 200

Enthalpy accounts for non covalent interactions ---- electrostatics, h-bonds, dipoles, pi-pi stacking - Ensemble differences in non covalent interactions provide binding enthalpy

Answer 201

- Molecular interactions are governed by their electron densities (Hohenberg-Kohn theorem) **** For a quantum system, if you know electron densities, then you know everything about that system This is rather difficult, so we often use conceptual frameworks to explain trends (e.g., hybridization and resonance)

Answer 202

1. Coulomb’s law describes the interactions between charges 2. Molecular geometry uniquely specifies an e- density 3. Regions of increased electron density are associated with higher partial negative charges 4. Electrons are mobile and can be perturbed by external interactions/other electrons
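
Point 1 can be sketched as a pairwise Coulomb energy, E = k q1 q2 / (ε r), using the conversion constant of about 332.06 kcal·Å/(mol·e²) that is common in force-field units. The function name and example charges are illustrative assumptions.

```python
COULOMB_KCAL = 332.0637  # kcal * Å / (mol * e^2)


def coulomb_energy(q1, q2, distance, dielectric=1.0):
    """Electrostatic energy in kcal/mol for charges in e and distance in Å."""
    return COULOMB_KCAL * q1 * q2 / (dielectric * distance)


# Opposite partial charges attract (negative energy); like charges repel.
print(coulomb_energy(+0.4, -0.4, 3.0))
print(coulomb_energy(+0.4, +0.4, 3.0))
# The 1/r decay is slow, which is why electrostatics count as long-range.
print(coulomb_energy(+1.0, -1.0, 10.0))
```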

Answer 203

- govern interactions between charged and polar regions - Charged molecules have a net imbalance between (+) charges in nuclei & (-) charges from electrons *** leads to net electrostatic attractions or repulsions between different atoms

Answer 204

- Long-range interaction: can attract ligands to the binding site from a distance - Anchor points: --- often serve as key anchoring interactions in the binding site ~5 to 20 kcal/mol per interaction

Answer 205

Attraction between a (donor) hydrogen atom covalently bonded to an electronegative atom and another (acceptor) electronegative atom with a lone pair

Answer 206

- Specificity: Precise orientation of the ligand - Stabilization: Moderately strong interactions - Dynamic: Allows for adaptability of ligands *** strongest when the hydrogen, donor, and acceptor atoms are collinear ~2 to 7 kcal/mol per hydrogen bond

Answer 207

- creates partial charges and dipoles - lead to unequal distribution of electron density - results in regions of partial positive or partial negative charges - Consistent electron density spatial variation results in permanent dipoles

Answer 208

- Directional binding: Highly directional, ensuring that the ligand aligns correctly - Flexibility: Can accommodate slight conformational changes ~0.01-1 kcal/mol per interaction

Answer 209

- weak, non-directional interactions - Dispersion: Electrons in molecules are constantly moving, leading to temporary uneven distributions that induce dipoles in neighboring molecules - Induction: The electric field of a polar molecule distorts the electron cloud of a nonpolar molecule, creating a temporary dipole

Answer 210

- Complementary fit: Maximizes surface contact - Flexibility: Allows small conformational changes ~0.4 - 4 kcal/mol per interaction

Answer 211

- involve stacking of aromatic rings - Noncovalent interactions between aromatic rings due to overlap of pi-electron clouds

Answer 212

Orientation: proper positioning of aromatics Selectivity: recognition of ligands ~1 to 15 kcal/mol per interaction

Answer 213

provides our ensemble average

Answer 214

- accounts for microstate diversity of a single macrostate - defined as S = k_B ln Ω -- where Ω = total # of microstates available to the system without changing the system state ***Entropy is “energy dispersion” -- Higher entropy implies greater microstate diversity for a given macrostate

Answer 215

can be arbitrarily defined and compared as -- Unbound ligand vs. bound ligand -- Unfolded protein vs. folded protein -- Liquid water at 300 K vs. 500 K

Answer 216

- My macrostate (number of particles, temp, and pressure) remain constant --- rearranges the ligands without binding to the receptor - N choose L grid sites - Number of ways to choose L grid sites out of N is the binomial coefficient *** Smaller grid (with same size site) is decreased entropy
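
The grid picture can be sketched by counting arrangements with the binomial coefficient and applying S = k_B ln Ω. Entropy is reported in units of k_B; the function name and grid sizes are illustrative assumptions.

```python
import math


def lattice_entropy(n_sites, n_ligands):
    """Entropy (in units of k_B) of placing n_ligands on n_sites grid sites."""
    n_arrangements = math.comb(n_sites, n_ligands)  # "N choose L"
    return math.log(n_arrangements)


# Same number of ligands, smaller grid: fewer arrangements, lower entropy.
print(lattice_entropy(100, 10), lattice_entropy(50, 10))
# Removing one ligand lowers entropy at low occupancy but raises it at high occupancy.
print(lattice_entropy(100, 10), lattice_entropy(100, 9))
print(lattice_entropy(100, 90), lattice_entropy(100, 89))
```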

Answer 217

- Depends (increase, no change, decrease) on ligand concentration!!! - How to interpret this: Pick a number of ligands and move to the right (L - 1), does entropy go up or down?

Answer 218

*** For protein-ligand binding, we need to account for the number of accessible microstates/configurations for protein and ligand - after that point, can run molecular simulations of different states

Answer 219

- Partition functions of protein, ligand, and complex are vastly different - Z is related to the number of all accessible microstates *** many practical limitations to sampling all microstates

Answer 220

- This has several advantages: -- More relevant conformational sampling -- Can run independent simulations in parallel -- Focuses on taking differences with smaller numbers ***This technique is generally called alchemical simulations

Answer 221

- controls our protein-ligand interactions - 1 = interactions are normal - 0 = no intermolecular interactions are on -- Intramolecular interactions are left alone
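
A minimal sketch of the coupling parameter, assuming simple linear scaling of the intermolecular energy, U(λ) = U_intra + λ U_inter. Production alchemical codes typically use more careful (e.g. soft-core) scaling, so this form and the function names are assumptions.

```python
def coupled_energy(u_intra, u_inter, lam):
    """Total energy with protein-ligand (intermolecular) terms scaled by lam."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must lie between 0 and 1")
    # Intramolecular terms are left alone; only the interaction is switched.
    return u_intra + lam * u_inter


def lambda_schedule(n_windows):
    """Evenly spaced lambda values from 0 (decoupled) to 1 (fully interacting)."""
    return [i / (n_windows - 1) for i in range(n_windows)]


for lam in lambda_schedule(5):
    print(lam, coupled_energy(u_intra=5.0, u_inter=-12.0, lam=lam))
```

Each λ value corresponds to a separate simulation window, which is one reason these calculations are expensive.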

Answer 222

- VERY expensive *** Use “docking” to more efficiently screen molecules before (if ever) doing alchemical simulations

Answer 223

- Compute energy changes by gradually transforming one molecule into another -- highly precise, offering detailed insights into binding affinities for drug design

Answer 224

- Atomistic forces: --- computes forces for all atoms in proteins, ligands, cofactors, ions, solvents for millions of structures - Detailed sampling: --- captures a wide range of conformations, which adds more dimensions to the calculation - Alchemical parameters: --- simulations must be performed at various alchemical parameters *** ~ 10,000 CPU hours (417 days on 1 core)

Answer 225

- Avoid sampling all microstates and determine one “optimal” protein-ligand structure ⇒ using this bound structure, predict a “score” that is correlated to binding affinity - simplifies the binding free energy prediction problem to enhance speed - efficient by avoiding sampling all microstates and determining one “optimal” protein-ligand structure

Answer 226

- Protein-ligand interactions are highly-dependent on the protein’s 3D structure - Using an inappropriate protein conformation can lead to inaccurate docking results

Answer 227

1. Conformational Flexibility: - Proteins are not rigid structures; they exhibit movements ranging from side-chain rotations to large domain motions 2. Impact on Binding Sites: - The shape and properties of the binding site can change, affecting ligand binding affinity and specificity. 3. Limited Experimental Structures: - Crystallography and NMR provide snapshots of protein conformations but may not capture all relevant states.

Answer 228

Experimental Methods: - X-ray Crystallography: Provides high-resolution structures but may miss dynamic conformations. - NMR Spectroscopy: Captures ensembles of conformations but is limited to smaller proteins. Computational Techniques: - Molecular Dynamics (MD) Simulations: Explore the conformational space over time. - Normal Mode Analysis (NMA): Identifies collective motions in proteins. - Ensemble Generation Methods: Generate multiple protein conformations for docking.

Answer 229

Resolution and Quality -- Prefer structures with higher resolution (e.g., <2.5 Å). -- Assess reliability using R-factors and validation reports. Ligand-Bound vs. Apo Structures -- Ligand-Bound (Holo) Structures: Provide direct insight into binding site conformation. -- Apo Structures: May reveal binding site flexibility in the absence of ligands. Relevance to Target Ligand -- Choose structures co-crystallized with ligands similar to those of interest.

Answer 230

- Extract representative structures using clustering algorithms - Identify conformations with open or closed binding sites

Answer 231

- Role in binding: structured water molecules can mediate interactions between the protein and ligand - Inclusion Criteria: retain water molecules that are conserved across multiple crystal structures

Answer 232

- Some docking programs allow explicit water molecules in the binding site - Alternatively, consider their effect implicitly in scoring functions

Answer 233

- The binding pocket is the specific region where a ligand interacts with a protein ** Accurate identification of binding pockets is essential for successful docking and virtual screening.

Answer 234

a cavity that can accommodate a ligand

Answer 235

- Convex Regions: Typically inaccessible to ligands. - Concave Regions (Cavities): Potential binding sites.

Answer 236

1. Orthosteric Sites 2. Allosteric Sites 3. Cryptic Sites

Answer 237

The primary active site where endogenous ligands bind.

Answer 238

Secondary sites that modulate protein function upon ligand binding.

Answer 239

Binding pockets not apparent in the unbound protein structure but form upon ligand binding or conformational change.

Answer 240

alpha shape theory -- uses Delaunay triangulation and alpha complexes to define cavities

Answer 241

- Alpha spheres touch a fixed number of atoms on their surface (3 atoms in a 2D picture; 4 atoms in 3D) - Spheres cannot be placed on the outside of the protein - Pockets are identified by grouping the spheres that fill open spaces in the protein

Answer 242

Methodology 1. Overlay a 3D grid on the protein structure. 2. Classify grid points as inside, outside, or on the surface. Pocket Identification -- Clusters of surface grid points forming concave regions indicate potential pockets.

Answer 243

- Cryptic sites are hidden in the unbound structure and require conformational changes to become apparent Strategies → -- Use enhanced sampling MD methods like metadynamics -- Apply pocket detection to multiple conformations

Answer 244

- Precise ligand poses are crucial for reliable predictions of binding affinity and activity. - Incorrect poses can lead to false negatives or positives, misguiding drug development efforts. *** aka accurate docking

Answer 245

The specific orientation and conformation of a ligand within the binding site of a target protein.

Answer 246

Optimization Goal → -- Identify the energetically most favorable pose that closely represents the true binding mode. Key Components → 1. Orientation: Position and alignment within the binding pocket. 2. Conformation: Internal geometry, including bond angles, lengths, and torsions.

Answer 247

Systematic, stochastic, empirical, machine learning

Answer 248

- numerically iterate over all possible conformations -- Identify important degrees of freedom -- Scan along each angle with a step size of N degrees -- Remove structures with high strain *** only possible for very small molecules = not used often!
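
A minimal sketch of a systematic scan, assuming each rotatable bond is stepped on a regular grid and a caller-supplied strain function filters out bad conformers. The function names and the toy strain rule are hypothetical.

```python
from itertools import product


def scan_torsions(n_torsions, step_degrees, strain_fn=None, max_strain=None):
    """Enumerate every combination of torsion angles on a regular grid."""
    angles = range(0, 360, step_degrees)
    kept = []
    for conformer in product(angles, repeat=n_torsions):
        # Drop conformers whose (user-defined) strain exceeds the cutoff.
        if strain_fn is not None and strain_fn(conformer) > max_strain:
            continue
        kept.append(conformer)
    return kept


def count_conformers(n_torsions, step_degrees):
    """Number of grid points without building them: (360 / step) ** n_torsions."""
    return len(range(0, 360, step_degrees)) ** n_torsions


print(len(scan_torsions(2, 120)))  # 3 angles per torsion, 2 torsions
print(count_conformers(10, 30))    # 12 angles per torsion, 10 torsions
```

The count grows exponentially with the number of rotatable bonds, which is why exhaustive scans are only practical for very small molecules.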

Answer 249

- random sampling (Monte Carlo) - provide better balance of sampling and cost - can utilize conformer libraries (pre-generated) Steps: 1. Generate conformation 2. Compute energy change 3. Accept the move if the energy decreases; otherwise accept it only if exp(-ΔE/kT) is greater than a uniform random number (Metropolis criterion) 4. Repeat ***Allows us to sample efficiently!
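
The steps above can be sketched as a Metropolis Monte Carlo loop, assuming the standard acceptance rule (always accept downhill moves; accept uphill moves with probability exp(-ΔE/kT)). The one-dimensional "conformation", the harmonic toy energy, and all names are illustrative assumptions.

```python
import math
import random


def metropolis_accept(delta_e, kT, rng):
    """Accept downhill moves always, uphill moves with Boltzmann probability."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kT)


def sample(energy_fn, x0, n_steps, step_size=0.5, kT=0.593, seed=0):
    """Random-walk Monte Carlo over a single coordinate x."""
    rng = random.Random(seed)
    x, e = x0, energy_fn(x0)
    samples = []
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step_size, step_size)  # 1. generate a move
        e_new = energy_fn(x_new)                        # 2. energy change
        if metropolis_accept(e_new - e, kT, rng):       # 3. accept or reject
            x, e = x_new, e_new
        samples.append(x)                               # 4. repeat
    return samples


samples = sample(lambda x: x * x, x0=0.0, n_steps=20000)
print(sum(samples) / len(samples))
```

For this harmonic toy energy the samples should cluster around the minimum at x = 0, with a spread set by kT.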

Answer 250

- parameterized models to estimate binding affinity after docking - Physics-based methods using force-field like methods - Machine learning (graph neural networks) have been gaining traction recently

Answer 251

- involves testing compounds on an organism level to identify potential leads ex/ drug screening on an antibiotic-resistant bacterial strain to identify potential new leads

Answer 252

- relies on the properties of known bioactive compounds to guide drug discovery - Does not require the structure of the target protein, making it useful when this is unknown

Answer 253

- Motivation: If we find compounds with little bioactivity, we can use LBDD to find compounds with similar chemical features to improve specific outcomes - Assumption: Similar structures can lead to similar—hopefully improved—biological effects

Answer 254

Structure-Based Drug Design: 1. Requires 3D structure of the target protein. 2. Uses the binding site structure to model potential interactions. 3. Often employs docking and molecular simulations. Ligand-Based Drug Design: 1. Requires no structural information of the target. 2. Uses the chemical structure and activity of known ligands as guides. 3. Relies on molecular similarity rather than direct binding predictions.

Answer 255

- used to numerically encode chemical properties 1. molecular weight 2. LogP 3. molar refractivity 4. TPSA 5. # of rotatable bonds

Answer 256

- indicates the overall size of the molecule * Impacts drug distribution and elimination rates in the body

Answer 257

- measures lipophilicity (chemical compound's ability to dissolve in lipids, fats, oils, and non-polar solvents) * Influences a molecule's ability to cross cell membranes and affects absorption and bioavailability

Answer 258

- relates to polarizability and electron cloud distribution *Affecting intermolecular interactions and binding affinity

Answer 259

- estimates the molecule’s ability to form hydrogen bonds *impacting solubility and permeability across biological membranes

Answer 260

- reflects molecular flexibility *influences binding affinity and oral bioavailability

Answer 261

a synthetic compound that acts as a vasoconstrictor by stimulating alpha-adrenergic receptors **Molecules can have similar properties, with slight structural differences causing widely different functions

Answer 262

a naturally occurring neurotransmitter in the brain and interacts with dopamine receptors **Molecules can have similar properties, with slight structural differences causing widely different functions

Answer 263

- encode structural features into numerical representations - utilize hash functions to encode chemical information (transform info into a numerical format for computers)

Answer 264

- Hash functions are used to encode chemical information 1. For each additional iteration of n, incorporate the hashes of connected atoms that are n bonds away. 2. Then encode the atom IDs that are exactly one bond away 3. Repeat for all atoms while hashing n-1 IDs 4. Each iteration encodes local chemical info into each atom’s ID --- repeat the process for large n, which captures more chemical info at a (small) computational cost

Answer 265

- We keep track of atom IDs at each iteration to encode multiple "levels" of chemical information *** Similar structural features will share atom IDs until our iteration starts incorporating different structural features
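
A toy sketch of iterative atom-ID updating in the spirit of ECFP, assuming atoms start with their atomic number as ID and each iteration hashes an atom's ID together with its neighbors' IDs. Python's built-in `hash` on integer tuples stands in for the real fingerprint hash, so the actual ID values are not those of any cheminformatics toolkit.

```python
def iterate_atom_ids(initial_ids, neighbors, n_iterations):
    """Return the atom IDs at every level, from iteration 0 to n_iterations.

    initial_ids: one integer per atom (here: atomic numbers)
    neighbors:   dict mapping atom index -> list of bonded atom indices
    """
    ids = list(initial_ids)
    levels = [ids]
    for _ in range(n_iterations):
        new_ids = []
        for atom, atom_id in enumerate(ids):
            # Environment = own ID plus sorted neighbor IDs from the last level.
            environment = (atom_id,) + tuple(sorted(ids[j] for j in neighbors[atom]))
            new_ids.append(hash(environment) & 0xFFFFFFFF)
        ids = new_ids
        levels.append(ids)
    return levels


chain = {0: [1], 1: [0, 2], 2: [1]}
propane = iterate_atom_ids([6, 6, 6], chain, 2)  # C-C-C heavy atoms
ethanol = iterate_atom_ids([6, 6, 8], chain, 2)  # C-C-O heavy atoms
print(propane[1][0] == ethanol[1][0])  # the methyl carbon sees the same 1-bond environment
```

The terminal carbon has the same ID in both molecules after one iteration, because its one-bond environment is identical; differences can only appear once the iteration radius reaches the oxygen.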

Answer 266

- fixed-length collections of ones and zeros ** allow for efficient operations - Atoms are encoded into a bit array to store a collection of atom IDs

Answer 267

1. Decide on length of bit array, for example, 1024 and fill with zeros 2. Divide each atom ID by the length of the array and determine the remainder 3. Set the value of the bit array at that index to 1
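
The three steps can be sketched directly; the function name and example atom IDs are illustrative.

```python
def fold_into_bits(atom_ids, n_bits=1024):
    """Fold a collection of integer atom IDs into a fixed-length bit array."""
    bits = [0] * n_bits            # 1. array of zeros
    for atom_id in atom_ids:
        index = atom_id % n_bits   # 2. remainder after dividing by the length
        bits[index] = 1            # 3. switch that position on
    return bits


fingerprint = fold_into_bits([5, 1029, 7])
print([i for i, bit in enumerate(fingerprint) if bit])
```

IDs 5 and 1029 land on the same index (1029 mod 1024 = 5), which illustrates a bit collision: a fixed-length array trades some information for efficient comparison.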

Answer 268

- compares the ECFPs between two molecules - formula measures the ratio of the shared features to the total number of unique features between two molecules. TS = c / (a + b - c), where a and b are the numbers of bits set in each molecule's fingerprint and c is the number of bits set in both
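
A minimal sketch of the Tanimoto similarity on two bit arrays, reading the formula as c / (a + b - c) with a and b the bits set in each fingerprint and c the bits set in both. The function name and toy fingerprints are illustrative.

```python
def tanimoto(fp1, fp2):
    """Tanimoto similarity between two equal-length lists of 0/1 bits."""
    a = sum(fp1)                                 # bits set in molecule 1
    b = sum(fp2)                                 # bits set in molecule 2
    c = sum(x & y for x, y in zip(fp1, fp2))     # bits set in both
    if a + b - c == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return c / (a + b - c)


print(tanimoto([1, 1, 0, 0], [1, 0, 1, 0]))  # one shared bit out of three unique
print(tanimoto([1, 1, 0, 0], [1, 1, 0, 0]))  # identical fingerprints
```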

Answer 269

The concept that similar molecules often show similar biological effects. (Tanimoto)

Answer 270

- link chemical structure with biological activity Purpose: To predict the biological activity of molecules based on their structure. Motivation: - Reduces the need for experimental screening. - Helps identify potential drugs quickly and cost-effectively. 2 types: - linear and nonlinear

Answer 271

1. Linear Models: Simple, interpretable, e.g., linear regression. 2. Nonlinear Models: Capture complex relationships, e.g., neural networks.

Answer 272

1. Data Collection: Gather biological activity and molecular data. 2. Descriptor Calculation: Calculate numerical descriptors for each molecule. 3. Model Selection and Training: Use machine learning to correlate descriptors with activity. 4. Model Validation: Test model accuracy with independent datasets. 5. Interpretation and Application: Use the model for predicting new molecules.

Answer 273

- Linear regression models are simple but effective for QSAR analysis -- Fits a linear relationship between descriptors and output
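
A minimal sketch of a one-descriptor linear QSAR model fitted by ordinary least squares in plain Python. The descriptor values and activities are made-up numbers for illustration, and the function names are hypothetical.

```python
def fit_line(descriptor, activity):
    """Least-squares slope and intercept for activity = slope * descriptor + intercept."""
    n = len(descriptor)
    mean_x = sum(descriptor) / n
    mean_y = sum(activity) / n
    sxx = sum((x - mean_x) ** 2 for x in descriptor)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(descriptor, activity))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, descriptor_value):
    """Predicted activity for a new molecule's descriptor value."""
    slope, intercept = model
    return slope * descriptor_value + intercept


# Toy training set: one descriptor (e.g. LogP) against a measured activity.
logp = [1.0, 2.0, 3.0, 4.0]
activity = [3.0, 5.0, 7.0, 9.0]
model = fit_line(logp, activity)
print(model, predict(model, 2.5))
```

The fitted slope is directly interpretable as the change in predicted activity per unit of the descriptor, which is the main appeal of linear QSAR models.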

Answer 274

Advantages: Easy to interpret. Limitations: Limited to linear relationships; struggles with complex datasets

Answer 275

- capture complex relationships in QSAR data Examples = 1. Neural Networks: Capture complex, nonlinear patterns in large datasets. 2. Random Forests: Effective for high-dimensional data, robust against overfitting.

Answer 276

- the 3D arrangement of molecular features required for biological activity - defines the essential molecular features needed for biological activity -- Looks at H-bond acceptors/donors, cationic, anionic, hydrophobic, aromatic

Answer 277

- requires multiple active compounds Step 1: - Align active molecules - Identify common structural features - Determine spatial relationships - Consider multiple conformations Step 2: - Define feature locations - Mark shared pharmacophoric points - Establish distance constraints - Set tolerance spheres

Computational Structural Biology Exam Flashcards

(301 cards)