Exam III Flashcards

1
Q

How do we determine protein structures at high resolution?

A

X-ray crystallography
2D NMR spectroscopy

2
Q

Advantages/Disadvantages of X-ray crystallography

A

Advantage: high-resolution structures

Disadvantage: static picture of the protein in a crystal (no dynamics)

  • Only possible if you can crystallize the protein
3
Q

Advantages/Disadvantages of 2D NMR spectroscopy

A

Advantage: dynamic picture of the protein in solution

Disadvantage: often not as high resolution

  • Useful if you cannot crystallize the protein
4
Q

X-ray crystallography

A
  • depends on finding precise conditions under which the protein crystallizes into a repeating lattice
  • the diffraction pattern reveals the atomic structure
    – math (a Fourier transform) converts the pattern into an e- density map
    – atoms are fit into the density to get an atomic model
5
Q

how do we get an adequate signal for x-ray crystallography?

A
  • grow crystals to enable many copies of the molecule to be viewed at once
6
Q

Crystallizing protein difficulties

A
  1. Proteins are large and flexible, so they are often difficult to crystallize.
  2. Very flexible proteins are especially difficult → entropic penalty for locking them into a crystal
  3. Impurities can prevent crystallization
  4. Finding the right conditions to promote crystallization can be very challenging
7
Q

crystallizing proteins

A
  • relies on nucleation and growth balance
  1. Proteins first nucleate (~100 molecules).
    – Solution conditions should not favor nucleation.
    – A whole bunch of nucleation sites → many crystals, not just one big one.
  2. After nucleation, the crystals grow.
    – Solution conditions should favor growth.
    – You want a single, big crystal.
8
Q

robots and crystallization

A

Robots accelerate the optimization of crystallization conditions.

Very difficult to predict which conditions will encourage growth of a single, large protein crystal.

Robots can test hundreds or thousands of condition combinations

Factors: pH, temperature, salts/ions, large polymers (polyethylene glycol)

9
Q

shooting a crystal with X-rays

A
  1. Grab protein crystal with a loop
  2. Flash freeze it in liquid nitrogen
  3. Mount the loop on a goniometer (precision rotating machine)
  4. Shoot it with an x-ray beam while rotating
10
Q

diffraction patterns

A

The protein lattice scatters the x-rays

This sets up a diffraction pattern on the screen behind the crystal

By observing how the pattern changes as the protein crystal rotates, you can figure out what parts of the crystal unit have the most (electron) density.

But the x-rays damage the crystal!

*** x-rays achieve atomic-level resolution!

11
Q

atomic fitting

A
  • Places atoms within the electron density (fits to e- density map)
  • PDB visualization tools simplify this analysis
  • accuracy of atomic fitting depends on resolution and the R factors (R and Rfree)
12
Q

Resolution

A
  • higher resolution (a smaller number in Å) gives more atomic detail
  • High resolution = ≤ 2.5 Å (≤ 1.5 Å ideal)

~1 Å = individual atoms
~2 Å = amino acids
~3 Å = trace the polypeptide

13
Q

R factor

A
  • measures the disagreement between the experimental diffraction data and the data predicted from the fitted structure (lower is better)

Good R factor: ≤ 20%

14
Q

Rfree factor

A

Rfree → refine the structure against a training (working) set of the data; calculate Rfree with a held-out testing set

Rfree (cross-validated R factor) = ≤ 40%

15
Q

flipping residues for atomic fitting

A
  • Histidine, asparagine, and glutamine residues often need flipping

** Hard to distinguish between C, N, and O atoms on e- density map

  • Cannot use x-ray crystallography to differentiate between the orientations
  • Computationally flip the side chains to optimize the h-bonding network (use WHATIF)
16
Q

Nuclear Magnetic Resonance (NMR)

A
  • Reveals atomic environments by measuring spin properties
  • Stick your protein in a ridiculously strong magnetic field.
  • Shoot a pulse of radio-wave radiation at it.
  • Some nuclei (e.g., 1H or 13C) absorb the energy.
  • Wait for atoms to relax/release energy.
  • Time depends on the element and environment (shielding by surrounding electrons).
  • Combine this data with math to figure out structure(s).
17
Q

NMR stepwise

A
  1. Dissolve labeled protein at very high concentration (purified protein)
  2. Spin-active nuclei enable NMR analysis
  3. Radio waves excite nuclei in the magnetic field, “flipping” them
  4. Electron shielding affects magnetic resonance
  5. Nuclei flip back, emit radiation, and so produce a spectrum
  6. Isotopic labeling enriches 2D NMR spectra
18
Q

electron shielding

A

** affects magnetic resonance

Electrons near a nucleus shield it from the external magnetic field

  1. Surrounded by lots of electrons
    – Effective magnetic field is lower.
    – Less energy required to flip it.
    – Lower-frequency radio waves do the job.
  2. Surrounded by few electrons
    – Effective magnetic field is higher.
    – More energy required to flip it.
    – Higher-frequency radio waves do the job.
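
A minimal sketch of the shielding idea above, assuming the standard Larmor relation (resonance frequency proportional to the effective field) and a shielding constant sigma that reduces the field the nucleus feels. The magnet strength and the two sigma values are hypothetical numbers chosen only for illustration.

```python
# Larmor relation: resonance frequency scales with the effective magnetic field.
GAMMA_1H_MHZ_PER_T = 42.577  # 1H gyromagnetic ratio divided by 2*pi, in MHz per tesla


def resonance_mhz(b0_tesla, sigma):
    """Resonance frequency (MHz) of a 1H nucleus with shielding constant sigma."""
    b_eff = b0_tesla * (1.0 - sigma)  # nearby electrons lower the field the nucleus feels
    return GAMMA_1H_MHZ_PER_T * b_eff


b0 = 14.1  # hypothetical magnet strength in tesla
well_shielded = resonance_mhz(b0, sigma=30e-6)    # surrounded by lots of electrons
poorly_shielded = resonance_mhz(b0, sigma=20e-6)  # surrounded by few electrons
print(well_shielded < poorly_shielded)  # the shielded nucleus flips at lower frequency
```

The difference between the two frequencies is tiny (parts per million), which is why chemical shifts are reported in ppm.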
19
Q

coupling nmr

A

NMR can also detect spatial and bond-mediated coupling

  • coupling provides info on structure
  1. dipolar coupling
  2. scalar coupling
20
Q

Dipolar coupling

A
  • Provides spatial restraints between nuclei
    – Two nuclei close to each other in space influence each other’s magnetic fields, slightly altering the effective magnetic field each experiences.
    – Effect is distance dependent.

So the strength of dipolar coupling gives ranges (restraints) on the distances between pairs of nuclei.

21
Q

scalar coupling

A
  • When 2 atoms are chemically bonded, their two nuclei affect each other
  • Provides useful information about torsion angles
22
Q

cryo electron tomography

A
  • Shoot frozen molecules with electron beams
  • Construct 3D images from 2D shadows
    *visualizes molecules in natural states
23
Q

cryo em tomography advantages/disadvantages

A

Advantages:
Specimens do not need to be stained or crystallized (natural environment)

Disadvantages:
Used to be more difficult to get a high-res model, but some structures now reach resolutions better than 3 Å

24
Q

flash freezing

A

Flash freezing preserves biological structures

– Flash freeze the sample (in a slab) so the particles in it no longer move.
– Liquid ethane is cold (-150 °C).
– Freezes so fast that ice crystals can’t form. “Vitreous ice.”

25
Electron Beams
- Electron beams cast molecular shadows - Electron microscope shoots beams of e- through the sample - Different shadows, depending on the orientation of the molecule
26
Algorithms help to ID similar molecule shadows
- Images of the shadows are noisy - Reduce image noise by AVERAGING -- Align similar images -- Average multiple aligned images
27
Construct 3D densities by considering multiple shadows
Then construct 3D images from all the averaged (smooth) 2D shadows. Make the final model by positioning atoms within the 3D density.
28
visual representations of proteins
1. line 2. licorice style 3. van der waals spheres 4. ribbon/cartoon 5. surface models
29
line representations
- Bonded atoms are connected by simple lines - Not easy to understand protein structure; pile of lines
30
licorice style
- Highlights molecular bonds - Sticks (cylinders) connect bonded atoms instead of lines
31
van der Waals spheres
Model atomic sizes; atoms are represented by solid spheres
32
ribbon/cartoon
- Reveal protein backbone (alpha helices/beta sheets) - Interpolates the positions of the alpha carbons
33
surface models
Reveal protein accessibility
SAS: solvent accessible surface
SES: solvent excluded surface
34
Ex/ approach for targeting a protein using visual molecular tools
1. Use UniProt to get protein sequence 2. Search PDB for similar proteins 3. Limit search to ones with non-polymer ligands 4. Now you can see if small-molecules (drugs) bind to related proteins
35
sequence and structural alignment
- Sequence alignment matches similar sequence regions -- Adds gaps to amino-acid (or nucleotide) sequences so that similar regions line up - Aligning sequences can show structural, functional, and evolutionary relationships between proteins -- Not all AA mismatches are equally bad (same charge/type)
36
Substitution matrices (BLOSUM matrix)
Improve alignment accuracy
- Penalize different substitutions based on how unlikely they are
- The penalty depends on how damaging the substitution would be to the protein
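
A minimal sketch of how substitution scores feed into an alignment, assuming a global (Needleman-Wunsch style) alignment and a toy scoring rule. The residue groups, the scores (+4, +1, -2), and the gap penalty are hypothetical stand-ins, not real BLOSUM values.

```python
# Toy chemical classes (an assumption for illustration, not a real BLOSUM matrix).
GROUPS = ["DE", "KRH", "ILVM", "ST", "FYW", "NQ"]


def sub_score(a, b):
    """Score a substitution: identical > same chemical class > unrelated."""
    if a == b:
        return 4
    if any(a in group and b in group for group in GROUPS):
        return 1
    return -2


def global_alignment_score(s, t, gap=-3):
    """Best global alignment score of s and t via dynamic programming."""
    prev = [j * gap for j in range(len(t) + 1)]  # aligning "" against prefixes of t
    for i in range(1, len(s) + 1):
        cur = [i * gap]  # aligning a prefix of s against ""
        for j in range(1, len(t) + 1):
            cur.append(max(
                prev[j - 1] + sub_score(s[i - 1], t[j - 1]),  # align two residues
                prev[j] + gap,                                # gap in t
                cur[j - 1] + gap,                             # gap in s
            ))
        prev = cur
    return prev[-1]


print(global_alignment_score("KD", "RE"))  # same-class mismatches are penalized lightly
print(global_alignment_score("KD", "LL"))  # unrelated mismatches are penalized heavily
```

Swapping `sub_score` for a lookup into a real substitution matrix would leave the dynamic-programming loop unchanged.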
37
Clustal Omega
Performs reliable MSA → gets percent identity between 2 protein sequences
* = identical
: = similar
. = kinda similar
38
structural alignments purpose
Reveal 3D similarities and align based on shape
1. Start with a sequence alignment to identify equivalent amino acids
2. Rotate/translate one protein until it's superimposed on the other
3. Measure the distance between the 2 proteins to judge how similar they are
39
RMSD
- Calculates 3D distances between equivalent atoms (mismatched positions are not counted)
- Looks at flexibility between proteins and deviations in atom positions from a REFERENCE structure
- To overlap proteins: minimize the RMSD as much as possible
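
A minimal sketch of the RMSD calculation, assuming the two structures are already superimposed and that atom i in one list is equivalent to atom i in the other. The function name and the coordinates are illustrative only.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z)."""
    if len(coords_a) != len(coords_b):
        raise ValueError("both structures need the same number of equivalent atoms")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2  # squared distance
    return math.sqrt(total / len(coords_a))  # average over atoms, then square root


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]  # every atom moved 1 Å along z
print(rmsd(reference, shifted))
```

In practice one structure is first rotated and translated to minimize this value; that superposition step is not shown here.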
40
FoldSeek: Searching by structure
- Proteins with similar sequences often have similar structures and functions → but not 100% of the time
- Helpful to search by structural similarity alone (low RMSD)
- Can rapidly search for experimental and predicted structures similar to the query protein
41
FoldSeek: High-level overview
- Represent local protein shapes as 3Di letters.
  -- Each letter encodes how a part of the protein interacts with its neighbors in 3D space (geometry barcode)
- Quickly discard proteins with no small matching 3Di chunks in common (no chance of them being similar)
- For the remaining proteins, perform a more expensive alignment of 3Di letters (similar to traditional sequence alignment)
- Rank the remaining “hits” using more traditional/expensive structural metrics (not RMSD, but conceptually similar; overall fold similarity and local structural similarity)
42
Homology models
Predict protein structures
Goal: make a reliable protein model from any amino-acid sequence
  -- Human genome only encodes ~75,000+ proteins
  -- There are at most only several thousand unique protein folds
Modeling builds on solved protein structures
  -- Solve enough structures so we can model the rest
  -- The number that needs to be solved depends on our ability to model
Highly similar sequences have similar structures
  -- Proteins with homologous sequences (>30% sequence identity) tend to have similar structures
  -- 25% of known protein sequences are homologous to other sequences
43
Traditional homology modeling stepwise
1. Get the protein sequence from UniProt
2. Identify homologous sequences in the PDB
3. Align the query sequence with the homologues
4. Find structurally conserved regions
5. Identify structurally variable regions
6. Generate coordinates for conserved regions
  -- Identical AA: transfer all atom coordinates (XYZ) to the query protein
  -- Similar AA: transfer the backbone coordinates and replace the side-chain atoms
  -- Different AA: transfer only the backbone coordinates (XYZ) to the query sequence
7. Generate coordinates for variable regions
8. Add side chains
9. Refine the structure
10. Validate the structure
44
conservation and structural roles
Conservation suggests structural roles 1. High sequence conservation -- Tend to be stable, at protein’s core -- Secondary structures (helices, sheets, etc.) 2. Low sequence conservation -- Tend to be least stable, most flexible, on protein’s surface -- Often loops and turns
45
alphafold
Revolutionized protein structure prediction
- Very accurate at predicting a protein's structure, even without a good template protein
46
alphafold stepwise
1. MSA of related proteins to identify amino acids that tend to evolve together 2. Co-evolving residues probably interacting…try to guess at those interactions 3. Also tries to predict residue-residue distance, trained on known structures 4. Final model subjected to minimization (not ML, but force field)
47
SWISS-MODEL
- Automated, online server for homology modeling (traditional method)
***Not necessarily the best, but probably the easiest
48
good drug targets
Must be disease-related and specific 1. Related to a disease 2. Essential 3. Specific pocket – not a pocket that binds a common metabolite (many side effects)
49
Discovering Drug Targets
- Disease-associated mutations reveal druggable proteins
50
TDR Targets database
- Helps identify drug targets for neglected disease research
- Focuses on neglected diseases (bacterial and eukaryotic pathogens)
51
“Guilt by Association”
- Strategy for assessing druggability
- The protein is a member of a protein family that contains other druggable proteins
  -- protein family: proteins with similar sequences, structures, and functions
FLAWED →
  -- Biased towards proteins that have been previously drugged
  -- Not always true that all members of the same family are equally druggable
52
SCOP2
- Organizes proteins by structural relationships - Structural classification of proteins database to see evolutionary relationships 1. Class: fold type 2. fold 3. superfamily 4. family 5. protein domain 6. species
53
CATH
- Groups proteins by structural features
1. Class: Fold type (e.g., beta sheets; same as SCOP)
2. Architecture: Structurally similar, but no evidence of homology (same as SCOP fold)
3. Topology/fold: Group by structural features.
4. Homologous superfamily: Distant common ancestor (same as SCOP superfamily)
54
structural analysis reveals druggable hotspots
- Reveal protein sites amenable to high affinity binding - Binding fragments tend to cluster there - Good for identifying protein where ligands/drugs/chemical probes might bind
55
druggable hotspots
- Usually cavities ⇒ best for small-molecule binding
Protein-protein interactions: two large, flat surfaces
  -- Hard to find a reliable way to design small molecules that disrupt those interactions
Binding pockets need specific features for druggability
  -- Cavities with features…
  -- Hydrogen-bonding opportunities
  -- Electrostatic interactions
Greasy pocket → hydrophobic interactions are often important but non-specific
56
MSCS: Multiple solvent crystal structures
Experimental method that identifies druggable hot spots
1. Soak a protein in an aqueous solution of ~6 organic probes
2. Align the structures resolved via X-ray crystallography
3. Identify regions where probes tend to congregate
57
nmr detection for binding
- Detects ligand (fragment) binding through spectral shifts
- When a ligand binds a protein, the protein atoms it touches find themselves in a different environment
- This causes a detectable shift in the NMR spectra of those atoms
- The chemical shift changes with increasing ligand concentration
58
FTMAP
- Identifies potential binding hotspots computationally
- A computational method that is faster and easier than experimental mapping
- FTMap virtually floods protein models with chemically diverse, small organic probes (a kind of docking)
- Protein regions where organic probes consistently congregate are often druggable
- The FTMap web server provides easy access to druggability tools
59
MolModa
- Provides a tool for detecting druggable protein pockets
- Based on the fpocket program for druggable-pocket detection
  -- fpocket is a command-line program
60
molecular motions
- occur on distinct time scales
61
NMR results
- Reveals information about protein flexibility - NMR is powerful, but it is also: -- Time and resource intensive -- Sometimes difficult to perform -- Limited in applicability (e.g. small proteins)
62
Crystallography results
- Ligand binding can alter dynamics - Proteins resolved with different bound ligands (or no ligand) can have different shapes - Crystallography provides only limited info about dynamics
63
Simulations results
- Reveal dynamic molecular details
- Simulations: very detailed
  -- Down to femtosecond time resolution (one quadrillionth, or 1e-15, of a second)
  -- Down to angstrom spatial resolution (1e-10 of a meter, or a tenth of a nanometer)
  -- “Single molecule experiment”
64
molecular interaction energies
- determine simulated motions
- forces that act on the atoms
- force field: energy functions and parameters
65
force fields
forces that act on the atoms
- bond stretching
- angle bending
- van der Waals
- bond rotations (torsions)
- electrostatics
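
A minimal sketch of the five energy terms listed above, assuming common textbook functional forms (harmonic bonds and angles, a cosine torsion, Lennard-Jones, and Coulomb). Real force fields differ in details such as prefactors and units, and every parameter value used here is hypothetical.

```python
import math

COULOMB_KCAL = 332.06  # Coulomb constant in kcal*Å/(mol*e^2)


def bond_energy(r, r0, k):
    """Bond stretching: harmonic penalty for leaving the ideal length r0."""
    return 0.5 * k * (r - r0) ** 2


def angle_energy(theta, theta0, k):
    """Angle bending: harmonic penalty for leaving the ideal angle theta0."""
    return 0.5 * k * (theta - theta0) ** 2


def torsion_energy(phi, barrier, periodicity, phase):
    """Bond rotation: periodic energy as a function of the dihedral angle phi."""
    return 0.5 * barrier * (1.0 + math.cos(periodicity * phi - phase))


def lennard_jones(r, epsilon, sigma):
    """van der Waals: short-range repulsion plus weak attraction."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 ** 2 - sr6)


def coulomb(r, q1, q2):
    """Electrostatics between two partial charges separated by r (Å)."""
    return COULOMB_KCAL * q1 * q2 / r


# Hypothetical parameters, chosen only to exercise the functions.
print(bond_energy(1.6, r0=1.5, k=300.0))
print(lennard_jones(4.0, epsilon=0.2, sigma=3.4))
print(coulomb(4.0, q1=0.4, q2=-0.4))
```

The total potential energy is the sum of these terms over all bonds, angles, torsions, and non-bonded atom pairs; the forces are its negative gradient.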
66
Classical MD cannot simulate chemical reactions
- There is no bond breaking or formation.
- So can’t model catalysis, for example
- Would require quantum-mechanics calculations
67
Accurate parameters enable reliable simulations
- parameters based on small molecules - Spectroscopy data -- Bond stretching parameters -- Angle bending parameters - High-level QM calculations -- Atomic charges -- Dihedral parameters
68
Hydrogen placement is critical for accurate simulations
- Crystal structures: usually no hydrogen atoms - Where to add hydrogen atoms depends on the pH - need to optimize h-bond network
69
Proper protonation states must be assigned
- At any pH in the body, arginine and lysine are protonated
- Histidines are the wild card (at pH 7.4, both protonated and neutral forms are present)
70
Counter ions maintain electroneutrality
- Add counter ions (e.g., Na+ and Cl-)
  -- To neutralize the system electrically
  -- To simulate physiological concentrations (150 mM)
71
Proteins are immersed in a virtual “water box”
Immerse the protein in a box of (explicit) water molecules -- use periodic boundary conditions
72
What are the advantages or disadvantages of explicit vs implicit solvent simulations?
1. Simulations with explicit waters are more computationally intensive, but generally more accurate 2. Simulations with implicit waters are faster, but less accurate
73
Accurate parameterization ensures realistic simulations
- PDB files just include coordinates, atom names, etc - Nothing about the stiffness of the bonds, the partial atomic charges, etc. - You must parameterize the structure according to the selected force field.
74
molecular dynamics
Molecular dynamics simulates atomic motion over time 1. initial atomic model 2. calculate molecular forces acting on each atom 3. move each atom according to those forces 4. advance simulation time by 1 or 2 fs
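
A minimal sketch of the loop above, assuming one particle in one dimension, a toy harmonic "force field", reduced units, and the velocity-Verlet integrator (one common choice; the card does not name an integrator). A real simulation would use a time step of 1 or 2 fs and thousands of atoms.

```python
def force(x, k=1.0):
    """Force from a toy harmonic energy E = 0.5 * k * x**2."""
    return -k * x


def run_md(x, v, mass=1.0, dt=0.01, n_steps=1000):
    """Velocity-Verlet integration of one particle in one dimension."""
    positions = [x]                   # step 1: initial atomic model
    f = force(x)                      # step 2: force acting on the atom
    for _ in range(n_steps):
        v += 0.5 * dt * f / mass      # half-step velocity update
        x += dt * v                   # step 3: move the atom
        f = force(x)                  # recompute the force at the new position
        v += 0.5 * dt * f / mass      # finish the velocity update
        positions.append(x)           # step 4: time has advanced by dt
    return positions, v


positions, v_final = run_md(x=1.0, v=0.0)
energy = 0.5 * v_final ** 2 + 0.5 * positions[-1] ** 2
print(energy)  # stays close to the starting energy of 0.5 if the time step is small
```

Checking that the total energy stays nearly constant is a quick sanity test that the time step is not too large.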
75
Minimization vs simulation
- E minimization refines molecular structures - Simulations explore diverse energy landscapes - MD simulations not only produce energy minima, but various higher energy conformations
76
brownian dynamics
Simplifies simulations
Assumption: because of high friction, the average acceleration of a molecule is very small
  -- Constantly crashing into water molecules, so reasonable in many cases
  -- Note: does not mean velocity is 0
Total acceleration (≈ 0) is assumed to be a function of:
  -- Forces acting on atoms (Fi)
  -- Drag of water (γi vi mi; no explicit water molecules)
  -- A random force (Ri) caused by Brownian motion
* Can solve for the velocity of molecules to predict molecular motions
  -- rigid body physics
  -- excluded volumes
  -- electrostatics
77
Trajectory Alignment and RMSD
- Molecules (including proteins) bounce around a lot in solution. -- MD captures these movements. -- It’s useful to align each frame (conformation) of the simulation to a single standard (usually the first frame) - Translate and rotate each frame so as to minimize the RMSD
78
distance/angle measurements
- useful for monitoring protein dynamics - Distance measurements to monitor pocket opening and closing - Using distance to monitor electrostatic interactions.
79
Root Mean Square Fluctuation (RMSF)
“Floppiness” of the atoms you’re analyzing.
  -- flexibility of specific atom positions
  -- deviation from a reference, averaged over time
  -- highlights dynamic regions
The higher the RMSF ⇒ the more flexible or floppy the particle
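
A minimal sketch of a per-atom RMSF calculation, assuming the trajectory frames have already been aligned to a common reference and that the reference position of each atom is its mean position over the trajectory. The data layout `trajectory[frame][atom] = (x, y, z)` is an assumption for illustration.

```python
import math


def rmsf(trajectory):
    """Return one RMSF value per atom for trajectory[frame][atom] = (x, y, z)."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    values = []
    for atom in range(n_atoms):
        # Mean position of this atom over all frames.
        mean = [sum(frame[atom][d] for frame in trajectory) / n_frames for d in range(3)]
        # Mean squared deviation from that mean position, averaged over time.
        msd = sum(
            sum((frame[atom][d] - mean[d]) ** 2 for d in range(3))
            for frame in trajectory
        ) / n_frames
        values.append(math.sqrt(msd))
    return values


frames = [
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
]
print(rmsf(frames))  # the first atom never moves; the second one is "floppy"
```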
80
RMSD vs RMSF
RMSD is the deviation from a reference, “averaged” over the atoms. (distance differences of structure) RMSF is the deviation from a reference, averaged over time. (flexibility of atoms)
81
To calculate the average position of points in more than one dimension →
average each of the coordinates, so that the average location for the points (1,3) and (3,5) ⇒ (2,4)
82
Principal Component Analysis (PCA)
“A statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.” - Simplifies molecular-motion data - describes major variances - visualizes structural variations - captures atomic positional variance
83
RMSD and RMSF
Measure structural changes and fluctuations RMSD: how different is a protein’s shape (conformation) relative to some reference shape? RMSF: On average, how much does a given atom move relative to its mean position over the course of a simulation?
83
PCA stepwise
Imagine calculating PCA in 3N-dimensional space (N = number of atoms in the protein).
For each conformation (frame) in the simulation, calculate its projection onto the first two principal components.
Plot all those, and bin them into a 2D histogram…
  -- During simulations, regions of conformational space near energy minima are more heavily sampled. So this histogram says something about the energetic landscape of our protein.
84
Principal components
- Simplify multidimensional data - reduces complexity while retaining key information (dimensionality reduction) --- lower order components explain most of the data 1. First principal component: the line that passes through your data points in the direction they are most spread out, so you can see the overall pattern clearly. 2. Second component: a line that best captures the remaining pattern, rotated 90 degrees from the first component.
85
clustering MD simulations
- Molecular dynamics captures multiple protein conformations
- Experimental methods for resolving protein structures reveal only limited (discrete) conformations.
- Molecular dynamics simulations can sample conformational space more continuously.
*** Clustering identifies representative conformations
- From all the many conformations sampled over the course of an MD trajectory, which ones are most representative? (The “ensemble.”)
86
clustering methods using Butina algorithm
1. For each protein conformation in your simulation trajectory, calculate the set of associated “nearest neighbors.” Near here means within a user-specified RMSD of each other.
2. Are there any conformations that have no near neighbors? These are “singletons.” Remove them from the pool of conformations.
3. Which conformation has the largest set of nearest neighbors? Remove those from the pool, but remember that particularly popular conformation (a “centroid”).
4. Repeat step 3 until there are no remaining conformations in the pool.
***The set of centroids is your representative conformational ensemble.
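
A minimal sketch of the four clustering steps above, assuming a user-supplied distance function in place of a real RMSD calculation. Neighbor counts are recomputed against the shrinking pool on each pass and ties go to the lowest index; both are implementation choices made here, not details given on the card.

```python
def butina_cluster(items, distance, cutoff):
    """Return (centroid indices, singleton indices) for the given items."""
    n = len(items)
    # Step 1: neighbors of each item = everything within the cutoff distance.
    neighbors = {
        i: {j for j in range(n) if j != i and distance(items[i], items[j]) <= cutoff}
        for i in range(n)
    }
    pool = set(range(n))
    # Step 2: remove singletons (items with no near neighbors).
    singletons = {i for i in pool if not neighbors[i]}
    pool -= singletons
    centroids = []
    # Steps 3 and 4: repeatedly pick the most "popular" remaining item.
    while pool:
        best = max(sorted(pool), key=lambda i: len(neighbors[i] & pool))
        centroids.append(best)
        pool -= neighbors[best] | {best}  # remove the centroid and its neighbors
    return centroids, sorted(singletons)


# Toy "conformations": single numbers, with absolute difference standing in for RMSD.
conformations = [0.0, 0.1, 0.2, 5.0, 5.1, 9.0]
print(butina_cluster(conformations, lambda a, b: abs(a - b), cutoff=0.15))
```

For a real trajectory, `items` would be aligned frames and `distance` a pairwise RMSD function.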
87
Why would someone want a clustered conformational ensemble? Why not just consider all the conformations of the simulation?
Some structural analyses are too computationally intense to apply to every conformation. Computer docking (future lecture) is a good example.
88
cryptic pockets
- Require dynamic identification methods
**Only apparent when a drug binds
- Cannot see them in crystal structures otherwise
- Find cryptic binding pockets using molecular dynamics simulations
- Can identify cryptic pockets without having to resolve crystal structures with pocket-opening ligands.
- Druggability simulations can identify cryptic allosteric pockets
89
DruGUI
Perform simulations of proteins in the presence of many, many small organic probes
Identify “interaction spots”
90
The Relaxed Complex Scheme
- Molecular docking predicts ligand binding poses and affinities (preview) - Predicting small-molecule/target binding in silico. 1. Ligand pose prediction (“docking”) 2. Affinity prediction (“scoring”): maps binding geometry to a score that is correlated with affinity
91
Clustering
Clustering: Extract diverse receptor conformations from the simulation. Accounting for receptor flexibility can improve docking accuracy
92
The relaxed complex scheme integrates receptor flexibility
1. Dock each of the library molecules into each of the receptor conformations.
2. Each small molecule maps to a whole spectrum (ensemble) of docking scores.
3. Library compounds are ranked by some ensemble-based metric.
93
workflow for identifying potential inhibitors
Integrated MD → clustering → docking workflow identifies potent inhibitors *** Key interactions are predicted to stabilize ligand binding
94
binding free energy
Binding energy: how strong (or how tightly) the ligand binds the protein - Drugs that bind tightly can better compete with natural molecules in the cell that might bind at the same location
95
2 states for binding free energy
State 1: A protein and ligand are floating in solution far apart from each other. Don’t even “feel” each other’s presence. State 2: The ligand is bound to the protein and so forms many molecular interactions with the binding pocket. Binding energy = the difference between these states.
96
scoring functions
- evaluate molecular binding computationally 1. Ligand pose prediction (“docking”) 2. Binding-energy prediction (“scoring”): binding geometry → score that correlates with energy -- Main advantage here is speed, but at the expense of accuracy
97
scoring function differentiations
-- differ in accuracy and speed - Force-field scoring functions. - Non-bonded interactions between the protein and ligand - Pose-strain energies between the bonded atoms of ligand (some methods). - Implicit solvent (some methods).
98
empirical scoring
- predicts binding with weights 1. Count the number of predicted interactions between the protein and ligand. 2. Combine those counts into a score, weighting each of the counts to give the predictions that best match experiment (regression, training)
99
knowledge based scoring functions
- databases reveal patterns in molecular interactions 1. Look at large databases of protein-ligand complexes. 2. How often do certain atoms on the ligand come within certain atoms on the receptor? 3. If atoms are close to each other more than you’d expect given random chance, they probably participate in energetically favorable interactions.
100
machine learning
- identify hidden patterns
- In trying to find patterns in data, the creator imposes no preconceived assumptions
- The program itself finds these patterns.
*** improves binding predictions
101
neural networks
- mimic biological processing - relies on connections and weights of neurons (determining the strength of connections) - Artificial neural networks process data through layered structures
102
training neural networks
- starts with encoding data in the input layer 1. Encode info about the protein-ligand pose 2. Systematically adjust the strength of the connections in the hidden layer (learns from systematic adjustments) 3. The output layer encodes the correct binding energy *** most neural networks are much more complicated than this
103
goal of neural networks
The goal is to predict experimental binding affinities from 3D ligand poses Need a vast database of protein-ligand structures, with thousands of associated experimentally determined binding affinities
104
simulations and binding energies
Simulations can increase the accuracy of binding energy estimates Much more computationally intensive than scoring functions, BUT can be more accurate *** binding free energies can be directly simulated
105
energy differences and molecular probabilities
- Energy differences depend on molecular probabilities
- Molecules spend more time in energetically favorable states.
***Boltzmann equation
106
Boltzmann equation
Given the probabilities that your system is in one state or another (e.g., ligand-bound and unbound states), you can calculate the energy difference between the two states.
*** The energy difference depends on the state probabilities, the temperature, and Boltzmann's constant; two equally probable states have zero energy difference.
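
A minimal sketch of that calculation, assuming the standard Boltzmann relation p_a / p_b = exp(-(E_a - E_b) / (k_B T)) and energies in kcal/mol. The function name, the default temperature, and the example probabilities are illustrative choices.

```python
import math

K_B = 0.0019872  # Boltzmann constant per mole (the gas constant), in kcal/(mol*K)


def energy_difference(p_a, p_b, temperature=298.15):
    """Return E_a - E_b (kcal/mol) from the probabilities of states a and b."""
    return -K_B * temperature * math.log(p_a / p_b)


print(energy_difference(0.5, 0.5))  # equally probable states: no energy difference
print(energy_difference(0.9, 0.1))  # the more probable state a is lower in energy
```

A 9:1 population ratio at room temperature corresponds to a difference of only about 1.3 kcal/mol, so small energy changes shift populations substantially.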
107
state functions
State function: the path taken from point A to point B does not impact the “state” at those 2 points - simplify energy challenges - dependent only on end points *** (Binding) free energy is a state function
108
path functions
- depend on route taken
109
alchemy
- inspired early advancements in chemistry
Middle ages: Transmuting “base metals” such as lead into gold.
***In some ways, the precursor to early chemistry and medicine.
Since binding energy is a state function, it doesn’t matter how the ligand gets into the pocket
110
computational alchemy
- accelerates energy calculations
- Instead of one (very long) binding simulation…
  -- Two disappearing (alchemical) simulations.
  -- “Disappearing” means slowly turning down the electrostatic and van der Waals forces.
** relative free energy calculations: even small changes impact/improve affinity
*** ghost part of the ligand to guide chemical optimization
111
limitations of computational alchemy
- Computationally intensive
- Molecular dynamics force fields are not perfect
- Did you simulate the different states for long enough to sample all the major conformations?
- Did you disappear your molecules slowly enough?
112
docking
- predicts molecular interactions - predicts molecular recognition in silico -- ligand pose prediction -- affinity prediction by mapping binding geometry to a score that is correlated with affinity *** known binding sites make docking easier
113
local vs global docking
1. local: known pocket -- find position of ligand in binding site 2. global: no known pocket -- more difficult bc need to search for the binding site as well as the position of ligand in the binding site
114
Ways to evaluate virtual-screen performance
1. Receiver operating characteristic (ROC) curves 2. Enrichment factors
115
Receiver Operating Characteristic (ROC) Curve
ROC AUC (area under curve) meaning ⇒ The area under this curve is the probability that a randomly picked active will rank better than a randomly picked inactive (or decoy) molecule.
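
A minimal sketch that computes the ROC AUC directly from the definition above, by comparing every active with every decoy. It assumes higher scores mean "predicted to bind better" (flip the sign first if more negative docking scores are better) and counts ties as half a win, which is a common convention rather than something stated on the card.

```python
def roc_auc(active_scores, decoy_scores):
    """Probability that a random active outranks a random decoy."""
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a > d:
                wins += 1.0   # the active ranks better
            elif a == d:
                wins += 0.5   # tie: count as half
    return wins / (len(active_scores) * len(decoy_scores))


print(roc_auc([0.9, 0.8], [0.2, 0.1]))  # every active beats every decoy
print(roc_auc([0.9, 0.3], [0.5, 0.1]))  # one active ranks below one decoy
```

A value near 0.5 means the screen ranks actives no better than random.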
116
ROC cutoffs
For each cut off there are: False positive (FP) True positive (TP) False negative (FN) True negative (TN) *** ROC curves graph FPR vs TPR for every possible cutoff
117
False positive rate (FPR)
FPR = FP / (FP + TN)
Together, FP + TN is the number of “ground truth negatives”
118
True positive rate (TPR)
TPR = TP / (TP + FN)
Together, TP + FN is the number of “ground truth positives”
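
A minimal sketch of how one cutoff yields one (TPR, FPR) point on a ROC curve, assuming the standard definitions TPR = TP / (TP + FN) and FPR = FP / (FP + TN), that compounds scoring at or above the cutoff are predicted positives, and that the set contains at least one active and one inactive. The function name and the data are illustrative.

```python
def rates_at_cutoff(scores, is_active, cutoff):
    """Return (TPR, FPR) when compounds scoring >= cutoff are called positive."""
    tp = fp = fn = tn = 0
    for score, active in zip(scores, is_active):
        predicted_positive = score >= cutoff
        if predicted_positive and active:
            tp += 1   # true positive
        elif predicted_positive and not active:
            fp += 1   # false positive
        elif active:
            fn += 1   # false negative
        else:
            tn += 1   # true negative
    tpr = tp / (tp + fn)  # denominator: all ground-truth positives
    fpr = fp / (fp + tn)  # denominator: all ground-truth negatives
    return tpr, fpr


scores = [0.9, 0.8, 0.4, 0.2]
labels = [True, False, True, False]
print(rates_at_cutoff(scores, labels, cutoff=0.5))
```

Sweeping the cutoff over every score and plotting FPR against TPR traces out the full curve.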
119
ROC Advantages and Disadvantages
- Assesses the entire screen, from the best-predicted ligand to the worst
  -- useful for benchmarking and comparing new VS methods
- BUT in practice you only care about the top-scoring compounds (the ones you’ll recommend for experimental testing)
120
enrichment factor
Calculate the percentage of all compounds in your screen that are true ligands.
For every possible cutoff:
1. Calculate the percentage of compounds above the cutoff that are true ligands.
2. Calculate how many times higher (or lower) that percentage is than the all-compound percentage.
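
A minimal sketch of the enrichment-factor calculation at a single cutoff, assuming the compounds are already sorted from best to worst predicted score and labeled as true ligand or not. The function name and the ten-compound example are hypothetical.

```python
def enrichment_factor(ranked_is_ligand, top_n):
    """How many times more ligands the top_n contains than expected overall."""
    overall_fraction = sum(ranked_is_ligand) / len(ranked_is_ligand)  # all compounds
    top_fraction = sum(ranked_is_ligand[:top_n]) / top_n              # above the cutoff
    return top_fraction / overall_fraction


# Ten compounds ranked best to worst; three are true ligands.
ranked = [True, True, False, False, False, False, False, False, True, False]
print(enrichment_factor(ranked, top_n=2))   # the top 2 are both ligands
print(enrichment_factor(ranked, top_n=10))  # the whole list: no enrichment
```

An enrichment factor of 1 means the screen did no better than picking compounds at random.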
121
Enrichment factor Advantages and Disadvantages
- Doesn’t evaluate best-predicted and worst-predicted binders equally.
- Shows how well your virtual screen performed among those compounds you’ll likely recommend for testing.
122
transformative drugs
Docking has contributed to the development of transformative drugs, ex/ drugs targeting COX-2 and treatments for Alzheimer's and HIV
123
LLMs
Large Language Model
Ex/ ChatGPT, Claude, Gemini, Llama, Grok
- Understands/generates human-like text.
-- Answer questions
-- Provide explanations
-- Generate creative content
- Learns from a vast (internet-scale) dataset of text to predict the next word in a sentence.
One model → many different applications.
124
Pros and Cons of Teaching about ChatGPT
Pros:
- It’s super fun
- Most professionals will use it in the future, and I want to prepare you for your future careers
Cons:
- Inappropriate use can 100% wreck any chances of success in a future career before you even get started
- Inappropriate use can 100% wreck your otherwise successful career after it’s started
- There’s a tiny chance robots will one day end humanity
Goal: find a middle ground
125
lead optimization purpose
Lead optimization improves efficacy and safety
A crucial stage in the drug discovery process
Improves the initial ligand found in a high-throughput or virtual screen
126
Early drug discovery follows a structured pipeline -->
1. Target ID and validation
2. Hit ID and optimization
3. Lead optimization
4. Candidate selection
127
what are we trying to optimize with lead optimization?
1. Potency (affinity): Increase the drug's ability to bind and modulate its target
2. Selectivity: Minimize off-target interactions and reduce side effects
3. Pharmacokinetics: Optimize absorption, distribution, metabolism, and excretion (ADME) properties
4. Solubility: Enhance drug solubility to improve bioavailability
128
Potency/efficacy is typically expressed as →
Dissociation constant (Kd): measure of affinity
IC50: concentration of drug required to reduce activity to 50% of the max possible, i.e., half-maximal inhibition (dependent on experimental setup, e.g., temperature, substrate concentration, etc.)
EC50: often used to measure impact on phenotype; concentration needed to have half the maximal impact on phenotype
*** lower values == stronger binding/higher potency
-- critical for effective therapeutic action
129
Bioisosteres
- Chemical moieties that can replace a part of a molecule while retaining target binding and activity
- Similar physical/chemical/electronic/size properties
** Improve potency, selectivity, etc., and reduce toxicity
Benefits: exploration of chemical space, intellectual property protection, improved chemical properties, overcoming drug resistance, etc.
130
fragment swapping
Adding or swapping fragments can enhance the potency of the drug (a lead-optimization strategy)
131
strategies for lead optimization
1. Fragment addition/swapping
2. Merging
3. Linking (preserves poses when not linked)
132
DeepFrag
- Input structures give out fingerprints to create the label set
- Recommends fragment additions
- The receptor and parent are voxelized
- Uses ML for this type of drug discovery
1. Takes a bunch of protein and ligand structures
2. Parent ligand and protein become voxel grids (grids of 24x24x24 points of 3D space)
-- For each atom in the receptor, we see how it contributes to protein-ligand interactions
-- Basically mapping atom positions onto a grid
- The fragment is vectorized
-- Fragments are converted to fingerprint vectors (0s and 1s) using the RDKFingerprint algorithm
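"Mapping atom positions onto a grid" can be pictured with a deliberately simplified sketch. This is not DeepFrag's actual code: the function name, the 0.75 Å spacing, and the single occupancy channel are assumptions made for illustration (a real model would typically use several atom-type channels and smear each atom over neighboring voxels). Only the 24x24x24 grid size comes from the card.

```python
import math

GRID_SIZE = 24  # points per axis, as stated on the card


def voxelize(atom_coords, center, spacing=0.75):
    """Count atoms per voxel on a GRID_SIZE^3 grid centered on `center`.

    Simplification (assumption): each atom lands in exactly one voxel and
    there is one channel. Atoms that fall outside the grid are skipped.
    """
    grid = [[[0 for _ in range(GRID_SIZE)] for _ in range(GRID_SIZE)]
            for _ in range(GRID_SIZE)]
    half = GRID_SIZE / 2
    for atom in atom_coords:
        index = [math.floor((a - c) / spacing + half) for a, c in zip(atom, center)]
        if all(0 <= i < GRID_SIZE for i in index):
            i, j, k = index
            grid[i][j][k] += 1
    return grid


# Made-up coordinates (in Angstroms); the last atom lies outside the grid.
atoms = [(0.0, 0.0, 0.0), (0.8, 0.0, 0.0), (100.0, 0.0, 0.0)]
grid = voxelize(atoms, center=(0.0, 0.0, 0.0))
print(grid[12][12][12], grid[13][12][12])
```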
133
Training time for DeepFrag
- Final model: five days to converge (GPU)
- Prospectively evaluating a single receptor/parent complex (at inference time) can easily run on a CPU (~30 seconds)
134
Humans can’t read RDKFingerprints
- A separate look-up table (label set) maps known fragments to their associated RDKFingerprints
- Cosine similarity is used to find the fragment most like the prediction
- The label set is independent of the TRAIN/VAL/TEST sets
-- One can use the trained DeepFrag model with different label sets
-- It can be general or customized (fragments of interest)
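The look-up step can be sketched as a nearest-neighbor search by cosine similarity. Everything named here is hypothetical: the 8-bit vectors stand in for real fingerprints (which are much longer), the fragment names are made up, and the "predicted" vector is invented rather than produced by a trained model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def closest_fragment(predicted, label_set):
    """Return the name of the label-set fragment whose fingerprint best matches `predicted`."""
    return max(label_set, key=lambda name: cosine_similarity(predicted, label_set[name]))


# Hypothetical label set: fragment name -> toy 8-bit fingerprint.
label_set = {
    "hydroxyl": [1, 0, 0, 1, 0, 0, 0, 0],
    "methyl": [0, 1, 0, 0, 0, 0, 1, 0],
    "phenyl": [0, 0, 1, 0, 1, 1, 0, 1],
}
# Invented model output: a continuous vector in fingerprint space.
predicted = [0.1, 0.0, 0.9, 0.2, 0.8, 0.7, 0.1, 0.6]
print(closest_fragment(predicted, label_set))
```

Because the label set is only consulted at this step, swapping in a different dictionary changes which fragments can be suggested without retraining anything.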
135
Which of the following best describes the main assumption behind QSAR (Quantitative Structure-Activity Relationship) modeling?
QSAR assumes that molecules with similar structures often have similar biological activities. This is a key principle in QSAR modeling, as it relies on identifying structural similarities among molecules to predict their biological activities.
136
Why is it important to carefully select molecular descriptors when building a QSAR model?
Using too many descriptors can reduce model interpretability and lead to overfitting, making predictions less generalizable.
137
Why is it important to separate data into a training set and a testing set when building a QSAR model?
Separating data ensures that the model is evaluated on unseen data, preventing overfitting and improving generalizability. If a model is trained and tested on the same data, it may simply memorize patterns rather than learning general relationships, leading to poor performance on new compounds.
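A minimal sketch of a random train/test split, assuming each record is a (molecule, activity) pair. The function name, the 80/20 ratio, the fixed seed, and the placeholder records are all illustrative choices, not part of the course material.

```python
import random


def train_test_split(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records and hold out a fraction for testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder data: (molecule identifier, made-up activity value).
records = [("mol%d" % i, 5.0 + 0.1 * i) for i in range(10)]
train, test = train_test_split(records)
print(len(train), len(test))
```

The model is fit on `train` only; `test` is touched once, at the end, to estimate performance on unseen compounds.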
138
What is a key challenge in QSAR modeling when working with a congeneric series of molecules?
Small structural changes can sometimes lead to large, unpredictable changes in biological activity, making modeling difficult. While QSAR assumes that structurally similar molecules have similar activities, real-world data often show "activity cliffs," where minor modifications drastically alter binding affinity.
139
What is one reason why advanced machine learning methods, such as neural networks and random forests, are often preferred over traditional linear regression in QSAR modeling?
Advanced machine learning methods can capture complex, non-linear relationships between molecular descriptors and biological activity. Many QSAR relationships are not purely linear, and methods like neural networks and random forests can model intricate dependencies that linear regression cannot.
140
Which of the following statements best describes a key limitation of X-ray crystallography in determining protein structures?
X-ray crystallography provides high-resolution structures but typically captures proteins in a static crystalline state, which may not reflect their natural flexibility.
141
Which of the following best describes a key challenge in using NMR spectroscopy to determine protein structures?
NMR spectroscopy requires high concentrations of soluble, non-aggregated protein, which can be difficult to achieve for many proteins.
142
What is the primary reason cryoelectron tomography (Cryo-ET) is useful for studying biological molecules in their native state?
Cryo-ET does not require crystallization or staining, allowing researchers to study molecules in their natural environment. Unlike X-ray crystallography, which requires crystallization, or certain electron microscopy methods that require staining, Cryo-ET allows molecules to be observed in a near-native frozen state, preserving their biological structure.
143
Why is it challenging to crystallize certain proteins for X-ray crystallography?
Proteins are large and flexible, and their flexibility can make crystallization difficult due to entropic penalties.
144
What is the purpose of averaging multiple molecular shadows in cryoelectron tomography (Cryo-ET)?
Averaging multiple similar molecular images reduces noise and improves the clarity of the final 3D structure. The raw images obtained from Cryo-ET are often noisy due to the low electron dose used to prevent sample damage. By aligning and averaging multiple similar molecular projections, researchers can enhance signal quality and improve resolution.
145
What is the purpose of flash freezing in cryoelectron tomography (Cryo-ET)?
Flash freezing prevents the formation of ice crystals, preserving biological structures in a near-native state.
146
Why is it necessary to computationally flip histidine, asparagine, and glutamine side chains during protein structure determination?
Computational flipping optimizes the hydrogen-bonding network, ensuring correct side-chain orientation in the electron density map. In X-ray crystallography, electron density maps do not always clearly distinguish between nitrogen, oxygen, and carbon atoms in certain amino acid side chains. Computationally flipping these residues helps optimize hydrogen bonding and improves model accuracy.
147
Which of the following best describes the purpose of different molecular visualization models in protein structure analysis?
Different molecular visualization models emphasize various aspects of proteins, such as backbone organization, atomic interactions, or solvent accessibility.
148
Which of the following best describes a key advantage of using UniProt for protein searching?
UniProt provides protein sequence and functional information so researchers can explore protein properties, structures, and interactions.
149
What is a primary advantage of using the Protein Data Bank (PDB) for structural biology research?
The PDB is a repository of protein structures that allows researchers to analyze molecular conformations, interactions, and binding sites.
150
What is a primary advantage of using advanced PDB searches for protein exploration?
Advanced PDB searches allow users to refine queries based on chemical, sequence, and structural attributes. Unlike simple keyword searches, advanced PDB searches support filtering by chemical descriptors, sequence motifs, and ligand interactions, making it easier to find proteins with specific structural or functional properties.
151
What is the primary reason why amino-acid sequence alignment is useful in bioinformatics?
Sequence alignment reveals structural, functional, and evolutionary relationships between proteins.
152
What is the role of substitution matrices, such as BLOSUM, in sequence alignment?
Substitution matrices assign scores to amino acid substitutions based on their likelihood, improving alignment accuracy.
153
Why is root-mean-square deviation (RMSD) commonly used in structural alignments?
RMSD quantifies a kind of average distance between equivalent atoms in two aligned protein structures, providing a measure of structural similarity.
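The "kind of average" is a root-mean-square. A short sketch, assuming the two structures are already superimposed and that atom i in one list is equivalent to atom i in the other (the function name and coordinates are made up):

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must contain the same number of atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Made-up coordinates: structure_b is structure_a shifted 2 Angstroms along z.
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print(rmsd(structure_a, structure_b))
```

Squaring before averaging means a few badly misplaced atoms raise the RMSD more than many slightly misplaced ones.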
154
What is the main advantage of homology modeling in protein structure prediction?
Homology modeling allows researchers to predict protein structures by leveraging known structures of homologous proteins.
155
What is the main challenge in modeling flexible and variable parts of a protein, such as loops, during homology modeling?
These regions are difficult to model accurately because they are flexible and often lack similar regions in known protein structures.
156
Why is relying solely on the “Guilt by Association” strategy potentially flawed when assessing protein druggability?
It assumes all members of a protein family are equally druggable, which is not necessarily true
157
Which of the following structural features most strongly suggests that a region on a protein could be a druggable hotspot?
A cavity with hydrogen-bonding and electrostatic interaction potential
158
Why can the presence of disease-associated mutations in a protein support the idea that the protein is druggable?
Because such mutations suggest that the protein plays a functional role in the disease and may have sites important for regulation or binding
159
What is a key feature of the TDR Targets database that makes it particularly useful for drug discovery in neglected diseases?
Its search feature can identify essential proteins in bacterial and eukaryotic pathogens with available crystal structures
160
How can X-ray crystallography provide insights into protein dynamics, despite being a method that captures static structures?
By revealing different conformations of the same protein in complexes with various ligands, suggesting the protein samples multiple dynamic states.
160
Which experimental method can be used to verify that a small molecule binds to a specific site on a protein by detecting spectral shifts?
NMR
161
Which statement best describes how molecular dynamics force fields approximate the forces between atoms?
Bonded and non-bonded interactions are represented using spring and potential energy functions that model chemical bonding and physical forces, respectively.
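As a rough illustration of the "spring and potential energy functions", here are three common textbook forms: a harmonic bond, a Lennard-Jones term for van der Waals interactions, and Coulomb's law for electrostatics. The function names and parameter values are made up, and real force fields differ in details (some omit the factor of 1/2 in the bond term, for example).

```python
COULOMB_CONSTANT = 332.06371  # kcal*Angstrom/(mol*e^2), for charges in units of e


def bond_energy(r, r0, k):
    """Harmonic spring: zero at the equilibrium length r0, rising quadratically."""
    return 0.5 * k * (r - r0) ** 2


def lennard_jones(r, epsilon, sigma):
    """Van der Waals term: steep repulsion at short range, weak attraction farther out."""
    ratio6 = (sigma / r) ** 6
    return 4.0 * epsilon * (ratio6 ** 2 - ratio6)


def coulomb(r, q1, q2):
    """Electrostatics between two point charges separated by r Angstroms."""
    return COULOMB_CONSTANT * q1 * q2 / r


# Illustrative parameters only (not taken from any particular force field).
print(bond_energy(1.6, r0=1.5, k=300.0))
print(lennard_jones(3.8, epsilon=0.2, sigma=3.4))
print(coulomb(3.0, q1=1.0, q2=-1.0))
```

The bond stiffness k, the Lennard-Jones parameters, and the partial charges are exactly the kind of force field parameters that must be assigned during simulation setup.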
162
In hybrid quantum mechanics/molecular mechanics (QM/MM) simulations, why is it essential to model only a subset of the system quantum-mechanically, and what specific limitations of classical MD does this hybrid approach address?
QM is applied to chemically active regions where classical force fields fail to model effects like bond formation and proton transfer.
163
Which statement best distinguishes the “lock-and-key” model from modern models of ligand binding, and what are the implications for identifying druggable receptor conformations?
The lock-and-key model assumes a rigid receptor with a preformed binding site, whereas modern models recognize that proteins exist in multiple conformations and that ligands can selectively bind or stabilize these dynamic states, expanding the set of druggable targets.
164
Why are cryptic binding sites often missed in X-ray crystallographic structures, and how can molecular dynamics simulations help reveal them?
Cryptic sites are typically absent from static crystallographic structures because they exist in low-population, transient conformations that may only be exposed during protein motion—motions that can be captured by molecular dynamics simulations.
165
What fundamental problem in traditional virtual screening does the relaxed complex scheme (RCS) address, and what is a key limitation that still affects its accuracy?
RCS better accounts for receptor flexibility by docking into an ensemble of protein conformations, but it still suffers from limited conformational sampling and scoring inaccuracies.
166
What is the primary reason alchemical methods can be used to accurately estimate binding free energies, and what major challenge limits their broader adoption in pharmaceutical research?
Alchemical methods rely on the fact that free energy is a state function, so even a non-physical transformation (like “disappearing” a ligand) can yield an accurate ΔG, but the approach is highly sensitive to insufficient conformational sampling during simulations.
167
Why are fast docking-based scoring functions still commonly used in virtual screening pipelines, even though more accurate free-energy methods like alchemical calculations exist?
Docking-based scoring functions are computationally inexpensive and allow screening of large compound libraries, making them practical for early-stage filtering, despite their lower accuracy compared to alchemical methods.
168
What is the key tradeoff introduced by accelerated molecular dynamics (aMD), and how does this affect the interpretation of results in structure-based drug discovery?
aMD improves sampling by artificially lowering energy barriers between conformational states, allowing transitions that would otherwise be rare, but this introduces artifacts that can affect the physical accuracy of structural interpretations.
169
Preparing a protein structure for a classical molecular dynamics (MD) simulation involves several essential setup steps. Which of the following accurately describes one of these steps?
Force field parameters, such as bond stiffness values and partial atomic charges, must be assigned to the coordinates because this information is not typically included in standard PDB files.
170
Classical molecular dynamics simulations depend on a force field to calculate the potential energy and forces governing atomic motion. Based on the components typically defined in these force fields, which statement correctly identifies the interactions being modeled?
The force field calculates the forces between bonded atoms (e.g., bond stretching, angle bending, dihedral torsions) and non-bonded atoms (van der Waals forces and electrostatics).
171
An important step in preparing a protein structure for MD simulation is to correctly assign protonation states and charges, especially for ionizable residues. Considering the role of pH, which statement accurately describes this aspect of simulation setup?
Assigning correct protonation states to residues like aspartic acid, glutamic acid, lysine, and arginine is crucial; histidine, especially, often needs careful evaluation near neutral pH.
172
Molecular dynamics simulations of proteins generally include representation of the aqueous solvent environment. Regarding how water is modeled, which statement correctly differentiates between explicit and implicit solvent approaches?
Explicit solvent models involve simulating many individual water molecules, whereas implicit models represent the solvent as a continuous medium, trading atomic detail for speed.
173
Which of the following best describes the algorithmic goal of trajectory alignment in MD simulations?
To minimize the root-mean-square deviation (RMSD) by aligning each frame to a reference frame
174
What does Root Mean Square Fluctuation (RMSF) primarily measure in MD simulations?
The average deviation of atomic positions from their mean position over time
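A per-atom RMSF sketch, assuming the trajectory frames have already been aligned to a common reference and are stored as trajectory[frame][atom] = (x, y, z). The function name and the two-atom toy trajectory are made up.

```python
import math


def rmsf(trajectory):
    """Per-atom root-mean-square fluctuation about each atom's mean position."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        mean_sq = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(mean_sq))
    return result


# Toy trajectory: atom 0 never moves, atom 1 hops between x = -1 and x = +1.
trajectory = [
    [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
]
print(rmsf(trajectory))
```

RMSD gives one number per frame (how far the whole structure is from a reference), whereas RMSF gives one number per atom (how mobile that atom is over the run).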
175
What is the primary purpose of Principal Component Analysis (PCA) in molecular dynamics simulations?
To reduce the dimensionality of molecular motion data by identifying the principal axes of variance
176
What is the primary assumption in Brownian dynamics simulations that simplifies the calculation of molecular motion?
The average acceleration of the molecule is very small due to high friction
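With the acceleration term dropped, each update is just a drift along the force plus a random kick. Below is a one-dimensional sketch of that update (the standard overdamped form, not necessarily the exact scheme used in the course); the function name, the harmonic force, and every parameter value are assumptions for illustration.

```python
import math
import random


def brownian_step(position, force, diffusion, kT, dt, rng):
    """One overdamped step: drift proportional to the force, plus Gaussian noise.

    No velocity or acceleration is tracked; friction is assumed to damp
    inertia on a timescale much shorter than dt.
    """
    drift = diffusion / kT * force * dt
    noise = math.sqrt(2.0 * diffusion * dt) * rng.gauss(0.0, 1.0)
    return position + drift + noise


# Illustrative run: a particle in a harmonic well, force = -2 * x.
rng = random.Random(0)
x = 1.0
for _ in range(1000):
    x = brownian_step(x, force=-2.0 * x, diffusion=1.0, kT=0.6, dt=0.001, rng=rng)
print(x)
```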
177
What is the primary use of distance and angle measurements in molecular dynamics (MD) simulations?
To monitor conformational changes and interactions between residues over time
178
Which of the following best explains why clustering is used when analyzing molecular dynamics (MD) simulations of proteins?
To identify a small set of representative protein conformations from a simulation that samples many structures.
179
What best characterizes cryptic binding pockets in proteins?
They are pockets that appear in some protein conformations, often only upon ligand binding.
180
Which statement best describes the role of DruGUI in identifying potential drug-binding sites?
DruGUI uses molecular dynamics with small probe molecules to reveal dynamic interaction hotspots.
181
What is the main advantage of the relaxed complex scheme in structure-based drug discovery?
It accounts for protein flexibility by docking each compound into a set of diverse receptor conformations.
182
What is the primary purpose of molecular docking in structure-based drug discovery?
To predict a binding pose and generate a score that hopefully correlates with binding affinity.
183
Which statement best explains the difference between local and global docking in structure-based drug design?
Local docking is used when the binding site is known, while global docking is generally necessary when the binding site is unknown.
184
What does an enrichment factor (EF) of 4.0 at a given cutoff indicate in a virtual screen?
The proportion of known ligands in the top-ranked compounds is four times higher than in the overall screening library.
185
What does the area under the ROC curve (AUC) represent in the context of virtual screening?
The probability that a randomly chosen active compound will rank higher than a randomly chosen other (probably inactive) compound.
186
Which of the following statements best describes a key advantage and disadvantage of using ROC curves to evaluate virtual screening results?
ROC curves provide a full performance overview across all thresholds but may be less informative when the top-scoring compounds are the primary concern.