Exam II Flashcards

1
Q

proteins purpose

A
  • bind to other molecules to mediate critical biological processes
  • catalyze chemical reactions
  • control signaling pathways
  • structural proteins maintain cellular architecture and function
2
Q

4 levels of protein structure

A

Primary: amino acid sequence; no folding yet, but the sequence determines how the protein folds

Secondary: local folding of the backbone into helices/sheets, stabilized by hydrogen bonding

Tertiary: secondary-structure elements fold into the overall 3D structure of a single chain

Quaternary: multiple polypeptide chains associate into a multi-subunit complex

3
Q

primary

A

Defined by the amino acid sequence

Proteins are polymers built from 20 amino acids

Common backbone, variable side chains

Side chains give each amino acid its identity and charge

Folding brings side chains (and their charges) close enough to interact

4
Q

protein stabilization/folding

A

Specific interactions stabilize protein shapes and determine how they fold

Covalent (disulfide) bonds, salt bridges, hydrogen bonds, long-range electrostatic interactions, van der Waals (VDW) interactions

Hydrophobic effect: buries nonpolar side chains in the protein core
Other interactions: allow side chains to interact as the protein folds

5
Q

secondary

A

Regularly repeating backbone conformations, held in place by hydrogen bonds

Backbone groups of amino acids (AAs) interact with one another through H-bonds

Interacting AAs can be far apart in sequence (e.g., in beta sheets) but close together in structure

BETA SHEETS
ALPHA HELICES
SURFACE LOOPS

6
Q

beta sheets

A

Provide stability through H-bonding

Strands run parallel or antiparallel to each other, with H-bonds between backbone groups

H-bonds form between backbone N-H groups (donors) and C=O groups (acceptors) on neighboring strands

Hydrogen-bonding amino acids can be distant from each other in sequence

7
Q

Alpha Helices

A

Most common secondary structure in proteins

Hydrogen bonds form between the C=O of residue i and the N-H of residue i+4 (four residues apart)

H-bonding AAs are close together in sequence

Can be amphipathic: having hydrophilic and hydrophobic sides

Helps proteins associate with membranes and stabilize structures

8
Q

Surface Loops

A

Provide flexibility and binding specificity

Not structured; no regular backbone conformation

Often form binding sites

Unlike alpha helices and beta sheets (which make up much of the protein's structural core), loops sit on the surface, where they can interact with small molecules

9
Q

Tertiary

A

Structure arises from folded secondary elements

Secondary-structure elements pack together

Very complex, with many degrees of freedom (rotatable bonds), yet a protein typically folds into one well-defined 3D conformation

10
Q

oligomers

A

Proteins made of more than one polypeptide chain

Homo-oligomer: identical chains (one kind of sequence)
Hetero-oligomer: different chains (different sequences)

11
Q

Quaternary

A

Involves multiple chains (monomers assemble into larger protein oligomer)

dimer, trimer, tetramer…

12
Q

monomers

A

the individual chains (subunits) that make up an oligomer

13
Q

drug target

A

biological entity (proteins/nucleic acids) to which a drug/ligand can bind

Binding alters drug target activity

classified by interactions/role: protein, nucleic acid, lipid

14
Q

pockets VS surfaces

A

Surfaces BAD; Pockets GOOD (a well-defined pocket is good for fitting a small molecule)

Protein-protein interactions are much harder to inhibit than enzymes with small molecule substrates

15
Q

Good Drug Target Qualities

A
  1. has a pocket (not just a flat surface)
  2. essential to the disease
  3. specific pocket
    – drugs that fit commonly shaped pockets can bind many off-target proteins and cause many side effects
16
Q

ligands

A

anything that binds to a protein/drug target → proteins, nucleic acids, small molecules

17
Q

drugs / drug types

A

Substance that causes physiological change in the body (often a ligand)

Biologics: drugs produced by or derived from living organisms
(Antibodies, vaccines, gene therapies, stem-cell therapies)

Small-molecule drugs: chemically synthesized molecules

18
Q

Drug classifications

A
  1. Pharmacological effect
  2. Target system
  3. Site of action
  4. Structure/family of molecules
19
Q

agonist VS antagonist

A

Agonist: binds to and activates the receptor

Antagonist: binds but does not activate the receptor; instead it blocks (competitive) or dampens (allosteric) the agonist's action, reducing activity by preventing or weakening agonist binding

Effect: Agonists activate receptors, while antagonists block them.
Action: Agonists mimic natural substances, while antagonists oppose them.
Response: Agonists produce a response, while antagonists prevent a response.

20
Q

metabolite analogs

A

Mimic natural molecules for therapeutic effects

  1. Start with a bioactive molecule
  2. Modify it so it still binds to the target, BUT doesn't have the same biological effect

Disadvantage: binding promiscuity (e.g., an ATP analog may bind to many ATP-binding pockets)

21
Q

high throughput screening (HTS)

A

Tests thousands of compounds for activity

Targets cells, tissues, and proteins

Uses robotics, data-processing and control software, liquid-handling devices, and sensitive detectors

Allows screening in 384-, 1536-, or 3456-well plate formats

Ultra-HTS (uHTS) enables testing of 100,000+ compounds per day!

22
Q

2 types of HTS

A
  1. phenotypic screening
  2. receptor-centric screens
23
Q

Phenotypic screens

A

Testing 100,000s of compounds for effects on cells/tissues

Very expensive

UNKNOWN DRUG TARGET

Shows how a compound affects the whole cell

Phenotypic screening does not reveal:
- Why a compound is active
- If a compound is specific for a given target

24
Q

receptor-centric screening

A

Testing 100,000s of compounds for effects on specific proteins

Often miniaturized enzymatic assays

Virtual screening: doing this in silico (computationally)

KNOWN DRUG TARGET

25
drug hit VS drug lead
Drug hit: a molecule found in an HTS that shows activity
Optimized hits → leads
Drug lead: a hit that has been chemically “tweaked” (optimized) to improve selectivity, potency, or pharmacokinetics (how the body absorbs, distributes, metabolizes, and excretes the compound)
26
Computer-Aided Drug Discovery (CADD)
Any computational technique that helps identify potential drugs without having to make and test them in reality
Accelerates drug discovery without the time and expense of HTS
Goal is to prioritize compounds for subsequent testing, so fewer need to be tested (testing is costly and time-intensive)
27
CADD costs
Cost invested: $350 million per single drug; $5 billion per successful new medicine
Time invested: 11-14 years
28
CADD success increasing
CADD success is driven by advances in data, technology, and algorithms
Increasingly successful over the last 20 years due to:
1. Availability of structural information
2. Increases in computer hardware performance
3. New theories, algorithms, and software
4. Advances in computing power enabling larger molecular simulations
29
Entropic Interactions
- Critical for protein-ligand binding
- Entropy measures disorder (and so affects the probability of binding): how many ways can the system be arranged with the same overall energy?
-- e.g., molecule flexibility about single bonds (many possible poses)
*** Systems tend to move toward higher entropy, so binding that locks a ligand into one pose costs entropy (an entropic penalty), which is bad for binding
30
chain vs ring entropy
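Chains: more flexible, so more entropically favorable when free, but they lose more entropy when binding locks them into one pose (larger entropic penalty)
Rings: more rigid, so binding costs less entropy (smaller entropic penalty)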
31
2 Main Theories of Ligand Binding
1. Rigid “lock-and-key”
-- Receptor and ligand fit perfectly together and are rigid
2. Induced fit
-- The ligand induces a change in the shape of the binding pocket, so that it fits when bound
*** More accurate theory (accounts for protein flexibility)
32
selective fit theory of binding (conformational selection, an alternative to induced fit)
1. Receptor exists as a population (ensemble) of energetically low-lying sub-states
2. Ligand binds to one of these sub-states and shifts the population toward the favored bound conformation
--- Binding stabilizes the sub-state whose fit is compatible
** Population/conformational shift between ensembles
33
ways to measure drug efficacy
1. Kd (Dissociation Constant)
2. Residence time
3. IC50 & EC50
34
Kd (Dissociation Constant)
Measures drug-receptor binding affinity
Lower Kd → higher affinity
Kd = Koff / Kon
Fast Kon + slow Koff → good affinity
Slow Kon + fast Koff → bad affinity
Units: M (molar); Kon has units of M-1s-1 and Koff has units of s-1
35
Koff & Kon
Koff (Dissociation Rate Constant) = how quickly the ligand leaves the pocket/unbinds
Kon (Association Rate Constant) = how quickly the ligand enters the pocket/binds
Fast Kon + slow Koff → good affinity
Slow Kon + fast Koff → bad affinity
36
Kon and Diffusion Limitation
Physical limits on how fast Kon can be
At max speed, Kon is constrained by the rate at which molecules diffuse through water to the target
Many binding events involve induced fit, which can slow Kon below its theoretical diffusion limit
37
residence time
Residence time (τ) = 1 / Koff
Longer residence time → prolonged drug action
Drugs with low Koff often have long-lasting effects even if plasma levels drop
Covalent inhibitors effectively have near-infinite residence times
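The two formulas on the Kd and residence-time cards can be checked with a few lines of Python. This is a minimal sketch; the rate constants are hypothetical values chosen only to show the units (Kon in M-1s-1, Koff in s-1, Kd in M, τ in s).

```python
def dissociation_constant(k_on: float, k_off: float) -> float:
    """Kd = Koff / Kon. k_on in M^-1 s^-1, k_off in s^-1, result in M."""
    return k_off / k_on


def residence_time(k_off: float) -> float:
    """Residence time tau = 1 / Koff, in seconds."""
    return 1.0 / k_off


# Hypothetical rate constants (illustration only, not measured data)
k_on = 1e6    # M^-1 s^-1
k_off = 1e-3  # s^-1

kd = dissociation_constant(k_on, k_off)
tau = residence_time(k_off)
print(f"Kd = {kd:.1e} M, residence time = {tau:.0f} s")
```

With these numbers Kd is 1 nM and τ is 1000 s. Slowing Koff tenfold (to 1e-4 s-1) would lower Kd tenfold and lengthen τ tenfold, which is why slow Koff shows up on both cards.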
38
IC50 & EC50
IC50 (Half-Maximal Inhibitory Concentration): drug concentration needed to inhibit activity by 50%
EC50 (Half-Maximal Effective Concentration): drug concentration needed for 50% of the maximal biological effect
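One way to see what "half-maximal" means is to plug concentrations into a dose-response curve. The sketch below assumes the common Hill (logistic) form with a Hill slope of 1 by default; the 50 nM IC50 is a hypothetical value, not data from these notes.

```python
def fraction_inhibited(conc: float, ic50: float, hill: float = 1.0) -> float:
    """Fraction of activity inhibited at inhibitor concentration `conc`.

    Hill (logistic) dose-response form; hill = 1.0 is an assumption.
    Returns exactly 0.5 when conc == ic50, which is the definition of IC50.
    """
    return conc ** hill / (ic50 ** hill + conc ** hill)


ic50 = 50e-9  # hypothetical IC50 of 50 nM
for conc in (5e-9, 50e-9, 500e-9):
    print(f"{conc * 1e9:>5.0f} nM -> {fraction_inhibited(conc, ic50):.2f} inhibited")
```

The same function describes EC50 if "fraction inhibited" is read as "fraction of maximal effect".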
39
Key Differences between Kd, IC50, and EC50
Kd: true thermodynamic constant (intrinsic property of a ligand-protein interaction)
*** Independent of receptor concentration
IC50/EC50: depend on the experimental setup
*** Dependent on receptor concentration
40
2 Modes of Receptor Modulation/Pocket types
orthosteric vs allosteric pockets
41
Orthosteric Pocket
Where the natural ligand binds and triggers protein function (primary or active site)
Easier to design for because the binding pocket is well defined
42
Allosteric Pocket
Another pocket, separate from the orthosteric site
Ligands regulate activity (enhance or inhibit) without directly competing with the natural ligand
Modulate activity along a range (minimize rather than eliminate it → helps reduce side effects)
43
Advantages & Disadvantages of Orthosteric Drugs
Advantages:
- direct competition with endogenous ligands
- more likely to fully block activity
- known pockets, easy to ID
Disadvantages:
- lacks selectivity
- dose-limiting toxicity: more likely to fully inhibit drug targets, so could cause excessive responses
- competition with endogenous ligands: drugs must be in high concentrations to outcompete natural ligands
44
Advantages & Disadvantages of Allosteric Drugs
Advantages:
- more selective targeting
- fine-tuned control; can partially inhibit/activate
- encourages new drug designs; can target proteins that are difficult to affect with traditional (orthosteric) drugs
- more diverse
- easier to create allosteric agonists
Disadvantages:
- harder to discover (sites are often hidden/cryptic)
- may have weaker effects; need the protein's natural ligand to be present to work effectively
45
SMILES
SMILES: Simplified Molecular Input Line Entry System
Simplifies chemical structure representation into a short text string
SMILES strings are NOT unique; many valid strings exist for a given molecule (e.g., CCO and OCC are both ethanol)
46
SMILES rules
Atoms: B, C, N, O, P, S, F, Cl, Br, or I (other elements and charged atoms go in square brackets, e.g., [Na+])
- Aromatic atoms are LOWERCASE (e.g., benzene: c1ccccc1)
Bonds: assumed to be single unless otherwise specified
- Double: =
- Triple: #
Branches: parentheses
- CCC(C)CC
Rings: atoms labeled with the same number are bonded to each other, closing the ring
- C1CCCCC1 (cyclohexane)
47
Canonical SMILES
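Ensures consistency in molecule representation: one standardized string per molecule (canonical forms can differ between software toolkits)
Needed because molecules can have multiple valid SMILES strings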
48
pros/cons of SMILES
Advantages: compact, easier to store in databases
Disadvantages: SMILES only includes info about atoms/bonds, NOT atomic positions in 3D space
49
2D SDF files
Encode both connectivity and limited spatial data (makes up for the disadvantage of SMILES)
Atoms are projected onto 2 coordinates (x, y)
Used for visualizing molecules more easily than in 3D
Contain:
- # of atoms
- # of bonds
- bond types
50
3D SDF file
Provide full molecular coordinates
Used for storing small molecules
51
PDB files
Describe biomolecular (e.g., protein) structures
Do not typically need to specify bonds for proteins, because every amino acid has the same bonds every time
Easier to manipulate than a 3D SDF file
Each atom line lists:
- atom index
- atom name
- residue name
- chain
- residue index
- XYZ coordinates
Hierarchy: chain > residue > atom
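The per-atom fields listed on the PDB card sit in fixed-width columns, so they can be read with plain string slicing. This is a minimal sketch assuming the standard PDB column layout (1-based columns: atom name 13-16, residue name 18-20, chain 22, residue number 23-26, x/y/z 31-54); the `Atom` class and the sample line are illustrative, not taken from a real entry.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Atom:
    index: int
    name: str
    residue_name: str
    chain: str
    residue_index: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM/HETATM record using PDB fixed-width columns (0-based slices)."""
    return Atom(
        index=int(line[6:11]),
        name=line[12:16].strip(),
        residue_name=line[17:20].strip(),
        chain=line[21],
        residue_index=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
    )


def read_atoms(pdb_text: str) -> list[Atom]:
    """Keep only coordinate records; bonds are implied by the residue names."""
    return [parse_atom_line(line) for line in pdb_text.splitlines()
            if line.startswith(("ATOM", "HETATM"))]


sample = "ATOM      1  N   MET A   1      38.198  19.582  28.343"
atom = read_atoms(sample)[0]
print(atom.chain, atom.residue_name, atom.residue_index, atom.name, atom.x)
```

Grouping the parsed atoms by `chain`, then `residue_index`, reproduces the chain > residue > atom hierarchy. For real work a structure library is more robust than hand slicing.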
52
differentiating between 2D VS 3D sdf file
The 1 (fifth field) suggests 3D coordinates. If this field is 0, it typically means 2D.
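A complementary check that does not rely on the header: in a 2D SDF/molfile, every atom's z-coordinate is written as 0.0000. The sketch below assumes the V2000 layout (counts line on line 4, then one atom line per atom with 10-character x, y, z fields) and flags a record as 3D if any z is nonzero. `make_molblock` is a hypothetical helper that builds a tiny, bond-free record for the demo; it is not a real compound file.

```python
from __future__ import annotations


def atom_z_coords(molblock: str) -> list[float]:
    """Return the z-coordinates from a V2000 molfile/SDF record."""
    lines = molblock.splitlines()
    n_atoms = int(lines[3][0:3])           # counts line: first 3 columns = atom count
    atom_lines = lines[4:4 + n_atoms]      # atom block follows the counts line
    return [float(line[20:30]) for line in atom_lines]  # x: 0-10, y: 10-20, z: 20-30


def looks_3d(molblock: str, tol: float = 1e-4) -> bool:
    """True if any atom sits off the z = 0 plane."""
    return any(abs(z) > tol for z in atom_z_coords(molblock))


def make_molblock(atoms: list[tuple[str, float, float, float]]) -> str:
    """Hypothetical helper: build a minimal, bond-free V2000 record for the demo."""
    header = ["demo", "  hypothetical", ""]
    counts = f"{len(atoms):3d}  0  0  0  0  0  0  0  0  0999 V2000"
    atom_lines = [f"{x:10.4f}{y:10.4f}{z:10.4f} {sym:<3} 0  0  0  0  0  0  0  0  0  0  0  0"
                  for sym, x, y, z in atoms]
    return "\n".join(header + [counts] + atom_lines + ["M  END"])


flat = make_molblock([("C", 0.0, 0.0, 0.0), ("O", 1.43, 0.0, 0.0)])
bent = make_molblock([("C", 0.0, 0.0, 0.0), ("O", 1.2, 0.5, 0.6)])
print(looks_3d(flat), looks_3d(bent))  # False True
```

A planar molecule stored with real 3D coordinates could also have all z = 0, so treat this as a heuristic alongside the header field.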
53
RDKit
Powerful cheminformatics library for Python Useful for loading/viewing molecules
54
PubChem
Database of chemical molecules and their activities against biological assays (NIH)
Integrates chemical and biological data
Lists the assays in which a compound was active and those in which it was inactive
Supports versatile chemical information searches
55
types of PubChem Searches
Compound searching:
- By name
- By substructure (finds molecules that contain the query substructure)
- By similarity (Tanimoto)
- By identity
- Information available (SMILES, chemical properties)
Bioassay searching:
- Filtering
- Information available
56
MolModa supports many PubChem functions
Loading a compound by name (e.g., aspirin)
Retrieving:
- Name of an existing compound
- Properties of an existing compound
- Bioassay activities of an existing compound
Building a library of similar compounds in terms of:
- Substructure
- Superstructure
- Tanimoto similarity
57
DrugBank
Bridges chemical and biological drug data
“The DrugBank database is a unique bioinformatics and cheminformatics resource that combines detailed drug data with comprehensive drug target information.”
Common DrugBank tasks:
1. Chemical structure searching
2. Drug name searching
3. Target searching
58
common DrugBank tasks
Chemical structure searching
Drug name searching
Target searching
59
3 kinds of molecular searches
Substructure – Finds molecules that contain the query structure as part of their larger structure.
Superstructure – Finds molecules that are themselves contained within the query structure (the query is a superstructure of each hit).
Similarity – Finds molecules that resemble the query structure based on overall shape or properties.
60
similarity searching
1. Convert molecular substructures into sets
- “Binary fingerprint” descriptor
- 1 = substructure present, 0 = not present
2. Calculate the Tanimoto (or Jaccard) index between the sets (finds the similarity between 2 molecules)
- Allows us to compare two sets of things
-- Size of intersection divided by size of union
--- Intersection: number of bits in common
--- Union: number of bits in one or the other
61
calculating tanimoto similarity
Finds similarity between 2 molecules
Size of intersection divided by size of union
-- Intersection: number of bits in common
-- Union: number of bits in one or the other
Tanimoto = c / (a + b - c)
a = # of 1 bits in set A
b = # of 1 bits in set B
c = # of 1 bits shared by both set A and set B
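The c / (a + b - c) formula translates directly to Python if a fingerprint is stored as the set of positions whose bit is 1. This is a minimal sketch; the two fingerprints are hypothetical, and real ones would come from a cheminformatics toolkit.

```python
from __future__ import annotations


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto = c / (a + b - c) for fingerprints stored as sets of 'on' bit positions."""
    a, b = len(fp_a), len(fp_b)
    c = len(fp_a & fp_b)       # bits set to 1 in both fingerprints
    denom = a + b - c          # equals len(fp_a | fp_b), the union
    return c / denom if denom else 0.0   # convention: two empty fingerprints -> 0.0


# Hypothetical fingerprints: each number is the position of a bit set to 1
mol_a = {1, 2, 3, 4}
mol_b = {3, 4, 5}
print(tanimoto(mol_a, mol_b))  # 2 / (4 + 3 - 2) = 0.4
```

Identical fingerprints score 1.0 and fingerprints with no bits in common score 0.0.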
62
molecular libraries
Purpose: high-throughput or virtual screening to find potential drug candidates Well designed molecular libraries balance diversity and similarity
63
molecular library goals
Goal is:
1. Structurally diverse molecules
2. Molecules that are similar to known hits
3. Molecules that are easy to synthesize
64
diversity sets/butina algorithm
- Use an algorithm to get a set of unique molecular representatives
1. For each molecule in your library, calculate the set of associated “nearest neighbors.”
- “Near” here means sufficiently similar (Tanimoto coefficient).
- You must pick the cutoff that defines “sufficiently similar.”
2. Are there any molecules that have no near neighbors?
- These are “singletons.”
- Remove them from the pool of molecules, but remember them as centroids.
3. Which compound has the largest set of nearest neighbors?
- Remove it and its neighbors from the pool, but remember that particularly popular compound (a “centroid”).
4. Repeat step 3 until there are no remaining compounds in the pool.
*** The set of centroids is your diversity set.
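The Butina steps above can be sketched in plain Python. Assumptions: fingerprints are sets of "on" bit positions, the library is hypothetical, and neighbor counts are computed once on the full library (ties are broken alphabetically). RDKit ships its own implementation in `rdkit.ML.Cluster.Butina` for real libraries.

```python
from __future__ import annotations


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def butina_centroids(fps: dict[str, set[int]], cutoff: float) -> list[str]:
    """Return cluster centroids (the diversity set) following the steps above."""
    names = sorted(fps)
    # Step 1: neighbors = other molecules with Tanimoto >= cutoff
    neighbors = {
        n: {m for m in names if m != n and tanimoto(fps[n], fps[m]) >= cutoff}
        for n in names
    }
    pool = set(names)
    centroids = []
    # Step 2: singletons become centroids immediately
    for n in names:
        if not neighbors[n]:
            centroids.append(n)
            pool.discard(n)
    # Steps 3-4: take the molecule with the most neighbors, remove it and its neighbors
    while pool:
        best = max(sorted(pool), key=lambda n: len(neighbors[n]))
        centroids.append(best)
        pool -= neighbors[best] | {best}
    return centroids


# Hypothetical fingerprints
library = {"A": {1, 2, 3, 4}, "B": {1, 2, 3, 5}, "C": {1, 2, 3, 4, 5}, "D": {10, 11, 12}}
print(butina_centroids(library, cutoff=0.55))  # ['D', 'A']
```

Raising the cutoff makes "sufficiently similar" stricter, which produces more, smaller clusters and therefore a larger diversity set.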
65
molecular similarity
Measure similarity with Tanimoto similarity
Congeneric series aid in systematic drug optimization by using substructures or Tanimoto similarity
66
Molecules that are easy to synthesize
Combinatorial libraries expand chemical diversity for drug discovery
Look at synthetic pathways (how each molecule would actually be made)
67
Diversity VS Similarity libraries when choosing drug targets
Diverse molecular library: when aiming to explore a broad chemical space and identify novel leads for a wide range of targets
Similarity library: preferred for focused screening and structure-activity relationship (SAR) studies
68
Calculating Chemical Properties: Lipinski
Filtering by chemical properties improves drug-discovery efficiency
Criteria for orally available drugs
*** Allowed to violate 1 rule
Lipinski's Rule of Fives:
<= 5 nitrogens/oxygens with attached hydrogens (“hydrogen-bond donors”)
<= 10 nitrogens/oxygens total (“hydrogen-bond acceptors”)
Molecular mass < 500 daltons
Octanol-water partition coefficient (logP) <= 5
69
lipinski's rules of 5s
Lipinski's Rule of Fives:
<= 5 nitrogens/oxygens with attached hydrogens (“hydrogen-bond donors”)
<= 10 nitrogens/oxygens total (“hydrogen-bond acceptors”)
Molecular mass < 500 daltons
Octanol-water partition coefficient (logP) <= 5
- logP = log10([concentration in octanol] / [concentration in water])
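The Rule of Fives, including the one allowed violation, can be expressed as a small filter. This sketch assumes the four descriptor values were already computed (e.g., looked up in PubChem); the `Descriptors` class and both example compounds are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Descriptors:
    h_bond_donors: int      # N/O atoms bearing hydrogens
    h_bond_acceptors: int   # N/O atoms total
    mol_weight: float       # daltons
    log_p: float            # log10 octanol-water partition coefficient


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four Rule-of-Fives criteria are broken."""
    rules = [
        d.h_bond_donors <= 5,
        d.h_bond_acceptors <= 10,
        d.mol_weight < 500,
        d.log_p <= 5,
    ]
    return sum(not ok for ok in rules)


def passes_lipinski(d: Descriptors, max_violations: int = 1) -> bool:
    """Rule of Fives filter; one violation is allowed by default."""
    return lipinski_violations(d) <= max_violations


# Hypothetical descriptor values, not real compounds
small = Descriptors(h_bond_donors=2, h_bond_acceptors=5, mol_weight=320.4, log_p=2.1)
greasy = Descriptors(h_bond_donors=1, h_bond_acceptors=4, mol_weight=540.0, log_p=6.3)
print(passes_lipinski(small), passes_lipinski(greasy))  # True False
```

`greasy` fails because it breaks two rules (mass and logP); breaking only one would still pass.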
70
why use lipinski?
Need to filter because there are SOOO many possible drug-like molecules out there (~10^60), and we need to figure out which ones in the library are more likely to work
Resources (time & money) are limited
71
PubChem for Lipinski
Provides comprehensive chemical property data
1. Search for the compound
2. Chemical properties: molecular weight, LogP (XLogP3-AA), etc.
3. Other useful information (SMILES, known protein targets)
72
Pan Assay Interference Compounds (PAINS)
Filters improve screening-library reliability
Molecules with certain substructures often appear as hits in high-throughput screens against many proteins
(false positives that are not worth optimizing)
73
Bubonic Plague
Caused by Yersinia pestis
Evidence of human infection for at least the past 6,000 years
First plague (6th century): 50 million deaths
- About 200 million people on earth at the time
Second plague (14th century): ⅓ of Europe
Third plague (19th century): 12 million deaths in India and China
Since 1965… death rates from the disease have dropped thanks to pharmaceuticals (antibiotics)
74
Dr. Barry Marshall
Bad scientist
Everyone thought ulcers were caused by stress and spicy foods
Dr. Marshall and Dr. John Warren thought it was H. pylori
Bad experimental design…
-- drank H. pylori, developed vomiting, then treated himself with antibiotics
75
Martin Shkreli: the “pharma bro”
Controversy: drugs cost a lot, but pricing power can also be abused
CEO/founder of Turing Pharmaceuticals
Got rights to the antiparasitic drug Daraprim and increased the price from $13.50 to $750 per pill overnight
Convicted of securities fraud and conspiracy → prison
Asked the court for release in 2020 so he could develop a cure for COVID-19. Denied!!
76
why does drug process take so long?
1. Identify diseases
2. Isolate protein involved in disease -- 2-5 years
3. Find drug effective against disease protein -- 2-5 years
4. Preclinical testing -- 1-3 years
5. Formulation and scale-up
6. Human clinical trials -- 2-10 years
7. FDA approval -- 2-3 years
77
3 ways for drug discovery
From luck
From nature
From medicinal chemistry
78
from luck
Penicillin (1928): accidental contamination of a bacterial culture by mold (Alexander Fleming)
Librium (1957, the first benzodiazepine):
- Chemical synthesis mistake (accidentally used the wrong reactant, made the wrong compound)
- Tested all but one anyway, just to see. No effect.
- Decided to go back and test the last one a year and a half later, just to see…
79
From Nature…
Secondary metabolites: defenses against predators, parasites, diseases, and interspecies competition, and to facilitate reproduction
- Natural sources provide templates for drug innovation (different climates → different ecological relationships)
- Natural products are often potent and have very specific biological activities (the chemistry of these molecules had eons to evolve good affinity)
- Isolating natural products requires extensive effort and expertise
** Top hits usually come from plants
*** 34% of drugs approved from 1981-2010 were natural-product based
80
Challenges of Natural Products:
Difficult to ID
Limited supply
Difficult chemistry → manufacturing nightmare
Expensive
81
From Medicinal Chemistry…
Many drugs are entirely artificial, with no inspiration from natural compounds
Chemical libraries → combinatorial chemistry
Drug discovery begins by identifying and optimizing lead compounds
82
3 Alternatives to Phenotypic Screening
Bioassays: biological tests used to screen for bioactivity/potency
-- Key tools for screening compound libraries
Activity: type of biological effect (antifungal, antibacterial…)
Potency: magnitude of that effect
83
Bioassays
Bioassays: biological tests used to screen for bioactivity/potency
Key tools for screening compound libraries
In vitro and in vivo bioassays offer complementary screening insights
84
in vitro vs in vivo bioassays
In vitro assay: carried out in a test tube
- Cheap and quick
In vivo assay: carried out in organisms (animals, plants)
- Expensive, but more reliable for testing activity and potency
85
lead optimization
Refines drug properties for efficacy and safety
Synthesize and test hundreds (or thousands) of structurally related compounds
Computational techniques can accelerate this process: “SAR by catalogue” (SAR = structure-activity relationship)
86
small molecule models require considering molecular variants
1. Desalting
2. Adding hydrogens
3. Chirality
4. Cis-trans isomerism
5. Tautomers
87
Desalting
Removing salts improves molecular database accuracy
Salts: counterions (charged molecules such as Na+ or Cl-) stored alongside the compound of interest
They don't have biological activity, so remove them
-- Typically keep only the fragment with the greatest # of atoms
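The "keep the fragment with the most atoms" rule can be sketched on SMILES strings, where separate fragments are joined by a dot. Assumption: atoms are counted with a rough regular expression (bracket atoms, Cl/Br, then single-letter organic-subset atoms), which is a teaching heuristic rather than a full SMILES parser; for real data, a toolkit such as RDKit (its MolStandardize largest-fragment chooser) is the safer choice.

```python
import re

# Rough heavy-atom pattern: bracket atoms first, then two-letter halogens,
# then single-letter organic-subset atoms (aromatic atoms are lowercase).
ATOM_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def count_atoms(smiles_fragment: str) -> int:
    """Approximate atom count for one SMILES fragment."""
    return len(ATOM_PATTERN.findall(smiles_fragment))


def desalt(smiles: str) -> str:
    """Keep only the '.'-separated fragment with the most atoms."""
    return max(smiles.split("."), key=count_atoms)


# Sodium salt of aspirin: the aspirin anion plus a Na+ counterion
print(desalt("CC(=O)Oc1ccccc1C(=O)[O-].[Na+]"))  # CC(=O)Oc1ccccc1C(=O)[O-]
```

Note that desalting can leave a charged parent (here the carboxylate), which is one reason protonation states are handled as a separate step.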
88
adding hydrogens
Yields models that better reflect biological reality
-- SMILES strings usually leave hydrogen atoms implicit
Oxygen and nitrogen atoms are more complicated because their protonation depends on the pH of the solution and the local protein environment
*** Use the Henderson-Hasselbalch equation
89
pKa (Acid Dissociation Constant)
Determines protonation states (predicts whether a proton is present)
The more positive the pKa → the weaker the acid
Henderson-Hasselbalch equation: determines protonation given pH
90
Henderson-Hasselbalch equation
Determines protonation given a specific pH
pH = pKa + log10([A-]/[HA])
→ [A-] : [HA] = 10^(pH - pKa) : 1 (deprotonated : protonated)
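The Henderson-Hasselbalch ratio converts directly into the fraction of molecules that still carry the proton. This is a minimal sketch; the pKa of 4.0 is a hypothetical carboxylic acid, and for a base the same formula applies with BH+ as the protonated form and B as the deprotonated form.

```python
def deprotonated_to_protonated_ratio(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch: [A-]/[HA] = 10 ** (pH - pKa)."""
    return 10 ** (ph - pka)


def fraction_protonated(ph: float, pka: float) -> float:
    """Fraction of molecules in the protonated (HA) form."""
    return 1.0 / (1.0 + deprotonated_to_protonated_ratio(ph, pka))


# Hypothetical carboxylic acid with pKa 4.0 at physiological pH 7.4
ratio = deprotonated_to_protonated_ratio(7.4, 4.0)
print(f"[A-]:[HA] = {ratio:.0f}:1, protonated fraction = {fraction_protonated(7.4, 4.0):.4f}")
```

At pH 7.4 the ratio is roughly 2500:1, so a model of this acid should normally be built in its deprotonated form; when pH equals pKa the two forms are present 50:50.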
91
Chirality
*** Mirror-image, non-superimposable molecules
Impacts molecular recognition
SMILES can include information about chirality… but not always
--- "@" means that, viewed from the first neighbor, the remaining neighbors are listed anticlockwise; "@@" means clockwise
92
Cis-trans Isomerism
Cis: identical groups on the same side of the double bond; trans: on opposite sides
Affects biological activity
93
tautomers
Structural isomers that rapidly interconvert through the movement of a proton
Interconversion depends on environmental conditions
Ex/ enol vs keto form
94
Molecular Force Fields
Force fields enable conversion from 2D to 3D structures
Approximate bonded and nonbonded forces (VDW and electrostatic forces between atomic partial charges)
Set parameters for the ideal lengths/angles between atoms
Bonds ⇒ springs (ideal length/tension with some flexibility)
95
Converting from 2D to 3D Stepwise
1. Build an initial atomic model
2. Calculate the molecular forces acting on each atom
3. Move each atom according to those forces
4. Advance simulation time by 1-2 fs
5. Repeat steps 2-4 to optimize
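The loop above (compute forces, move atoms, repeat) can be seen on the smallest possible system: two atoms joined by a "spring" bond. Assumptions: a harmonic bond E = k(r - r0)^2 (some force fields write k/2 instead), hypothetical parameters k = 300 and r0 = 1.53 (roughly a C-C bond in Å), and plain steepest descent in place of step 4, so there are no velocities or femtosecond time steps as in true molecular dynamics.

```python
import math


def bond_energy_and_forces(p1, p2, k, r0):
    """Harmonic 'spring' bond: E = k (r - r0)^2. Returns (energy, force_on_1, force_on_2)."""
    d = [b - a for a, b in zip(p1, p2)]
    r = math.sqrt(sum(c * c for c in d))
    energy = k * (r - r0) ** 2
    magnitude = -2.0 * k * (r - r0)        # -dE/dr
    unit = [c / r for c in d]              # direction from atom 1 to atom 2
    f2 = [magnitude * u for u in unit]     # force on atom 2
    f1 = [-f for f in f2]                  # equal and opposite force on atom 1
    return energy, f1, f2


def relax_bond(p1, p2, k=300.0, r0=1.53, step=5e-4, n_steps=200):
    """Toy steepest-descent loop: repeat 'compute forces, move atoms'."""
    for _ in range(n_steps):
        _, f1, f2 = bond_energy_and_forces(p1, p2, k, r0)   # step 2: forces
        p1 = [x + step * f for x, f in zip(p1, f1)]         # step 3: move atoms
        p2 = [x + step * f for x, f in zip(p2, f2)]
    return p1, p2


start_1, start_2 = [0.0, 0.0, 0.0], [2.0, 0.0, 0.0]   # hypothetical stretched bond
end_1, end_2 = relax_bond(start_1, start_2)
print(round(math.dist(end_1, end_2), 3))  # 1.53
```

A real force field sums many such terms (angles, torsions, VDW, electrostatics) over all atoms, but each iteration has the same shape.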
96
Conformer Libraries
Capture the 3D shapes that molecules can adopt, considering:
1. Rotation about single bonds
2. Energy barriers
3. Time spent in a given rotamer/interconversion between conformers
Energy barriers define the stability/interconversion of conformers
*** Some docking programs require a pre-calculated conformational library
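A conformer library starts by rotating about single bonds and keeping low-energy angles. The sketch below scans one bond with a simplified 3-fold torsion term, E = (V/2)(1 + cos 3φ), which is an assumption (ethane-like), with a hypothetical barrier V = 3.0. It keeps the angles that are local energy minima, i.e., the rotamers a library would store.

```python
from __future__ import annotations

import math


def torsion_energy(phi_deg: float, barrier: float = 3.0) -> float:
    """Simple 3-fold torsion term E = (V/2)(1 + cos 3*phi); minima at staggered angles."""
    return 0.5 * barrier * (1.0 + math.cos(math.radians(3 * phi_deg)))


def low_energy_rotamers(step_deg: int = 10, barrier: float = 3.0) -> list[int]:
    """Scan a full single-bond rotation and keep angles that are local energy minima."""
    angles = list(range(0, 360, step_deg))
    energies = [torsion_energy(a, barrier) for a in angles]
    n = len(angles)
    return [
        angles[i]
        for i in range(n)
        # compare with both neighbors; index wraps around 360 degrees
        if energies[i] < energies[i - 1] and energies[i] < energies[(i + 1) % n]
    ]


print(low_energy_rotamers())                     # [60, 180, 300]
print(torsion_energy(0) - torsion_energy(60))    # barrier between rotamers: 3.0
```

The barrier height controls how quickly rotamers interconvert; a high barrier means each stored conformer is long-lived.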
97
Alternating Ring Conformations
Ring conformations depend on steric and electronic factors
Some ring conformations are more energetically favorable, so the ring spends more time in them
Ex/ chair is lower in energy than boat (cyclohexane)
98
CADD
computer aided drug design uses QSAR models
99
QSAR models
Quantitative Structure-Activity Relationships
Predicts activity without structural data on the target
Uses ML to predict whether a molecule will bind to a target (without info on the target's structure)
Looks for common features among drugs that all produce the same biological response (then design a drug that combines these features)
Based on properties of the small molecules only!!!
100
Congeneric series
Same basic structure with varying substituents → provides insights for QSAR modeling
Structure-activity relationships (SAR)
Making minor changes in a lead compound to produce analogs
101
congeneric series dream vs reality
Dream: small changes always have small impacts on affinity
Reality: sometimes simple changes drastically alter binding (sometimes small changes prevent binding entirely)
103
types of minor changes in a lead compound to produce analogs
Change the size/shape of the carbon skeleton
Change the nature/degree of substitution
Change the stereochemistry of the lead
104
Use regression or classification for QSAR modifications
Regression: continuous values of binding affinity
Classification: does it bind? Yes or no?
105
Molecular Descriptors
Capture structural/chemical information (physicochemical properties)
*** Molecular weight, LogP (lipophilicity), QM (quantum-mechanical) descriptors
Careful selection of descriptors enhances QSAR accuracy
Links properties to biological activity
Goal is to map numerical properties of a ligand to known measures of bioactivity
106
Calculated (theoretical) molecular descriptors scales
0D: molecular weight, number of bonds, tallies of atoms, etc.
1D: derived from the molecular formula (atoms); molecular fragment counts; number of hydrogen-bond donors and acceptors
2D: consider bonds between atoms/fragments (“topological representation”)
3D: consider the shape of a small molecule in 3D space
107
2D QSAR Descriptors
Advantages: simple info about molecular structure
-- Don't have to calculate the 3D conformation(s)
Topostructural descriptors: include information about the distance between atoms “in bond space”
Topochemical descriptors: topology as above, plus information about the chemical properties of the atoms
--- chemical identity, hybridization
108
Databases for training QSAR models require:
1. Chemical structures
2. Molecular properties (or you can calculate/predict them from the structures)
3. Experimentally measured biological outcomes (e.g., binding affinities)
109
Training & Linear Regression
Training QSAR models minimizes prediction errors
Linear regression is the “machine learning” method traditionally applied to QSAR
*** y ≈ a0 + a1x1 + a2x2 + … + anxn: find the coefficients (a0 … an) that minimize the error between predicted and measured biological activity y, given descriptors x1 … xn
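With a single descriptor, the least-squares coefficients have a closed form, which makes the "find the coefficients that minimize the error" idea concrete. This sketch uses one descriptor for brevity (with several, the same idea is multiple linear regression, e.g., via `numpy.linalg.lstsq`); the logP and activity values are hypothetical and chosen to lie exactly on a line.

```python
from __future__ import annotations


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Least-squares fit of y = a0 + a1 * x for a single descriptor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    a1 = sxy / sxx               # slope minimizing the squared error
    a0 = mean_y - a1 * mean_x    # intercept: line passes through the means
    return a0, a1


# Hypothetical training set: descriptor (logP) vs measured activity (e.g., pIC50)
log_p = [1.0, 2.0, 3.0, 4.0]
activity = [4.5, 5.0, 5.5, 6.0]
a0, a1 = fit_line(log_p, activity)
print(f"activity = {a0:.2f} + {a1:.2f} * logP")            # activity = 4.00 + 0.50 * logP
print(f"predicted activity at logP 5: {a0 + a1 * 5.0:.2f}")  # 6.50
```

The prediction at logP 5 is an extrapolation beyond the training data, which is exactly the weakness listed on the next card.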
110
Weaknesses of a linear regression QSAR method:
1. The relationship between descriptors might not be linear.
2. You have to have a lot of known ligands.
3. You can only predict new ligands if they are similar to ones you trained on.
111
Model Validation
Ensures predictive accuracy
-- Separate data into training and testing sets
N-fold cross-validation improves model reliability
-- Rotate which sections of the data are used for training vs testing
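N-fold cross-validation only needs a way to rotate which samples are held out, so every sample is tested exactly once. This is a minimal sketch using sample indices; the optional seeded shuffle is an assumption (common practice, since data files are often sorted), and libraries such as scikit-learn provide equivalent splitters.

```python
from __future__ import annotations

import random


def k_fold_splits(n_samples: int, k: int, seed: int | None = None):
    """Yield (train_indices, test_indices) for k-fold cross-validation."""
    indices = list(range(n_samples))
    if seed is not None:
        random.Random(seed).shuffle(indices)   # optional shuffle before splitting
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)   # spread the remainder over early folds
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


for train, test in k_fold_splits(10, k=5):
    print("test:", test, "train size:", len(train))
```

A QSAR model would be trained on each `train` set and scored on the matching `test` set; averaging the k scores gives a more reliable estimate than a single split.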
112
QSPR
Quantitative Structure-Property Relationships
Predicts chemical properties
*** BP (boiling point), solubility, viscosity, carcinogenicity, drug metabolism and clearance
113
PoseView vs BINANA
PoseView represents interactions in a 2D schematic
BINANA provides a 3D perspective, helping you grasp the geometry of interactions between ligands and protein residues