8. Drug Discovery Flashcards
give examples of chemical & biological data
- drug info
- drug target
- drug side effects
- drug chemical interaction
where can this info be obtained
PubChem = chemical structures & biological activities of compounds, PDB (Protein Data Bank) = 3D structures of proteins
how long do drug discoveries usually take
10+ years, which is why data analysis & algorithms can help expedite the process
how can ml be used in drug discovery & development
predicting drug-target interactions between chemical compounds and biological targets (proteins)
how can ml be used to predict effects
predict adverse side effects (unacceptable toxicities) or unintended therapeutic effects, e.g. arising from:
- drug/drug interaction
- multi-target interaction
how can ml be used post-marketing
finding patterns in drug-related adverse events, because clinical trials run for a limited duration and study only a limited range of patient characteristics. models can represent a multidimensional space and determine how drug variables relate to adverse events
what is the pharmacological space
integration of chemical space & genomic space to infer unknown drug-target interactions
how are unknown drug-target interactions found by ml
integration of chemical & genomic space to create pharmacological space
- embed known interactions between compounds & proteins
- regression models are learned to map chemical space & genomic space onto the pharmacological space
- interacting compound-protein pairs are predicted by connecting compounds & proteins that are closer than a threshold in the pharmacological space (similarity scores are computed)
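The final thresholding step can be sketched as below. This is a toy illustration, not the full method: it assumes the regression step has already placed compounds and proteins into one shared 2-D pharmacological space, and the names, coordinates and threshold are hypothetical.

```python
import math

# Hypothetical coordinates of compounds and proteins already mapped into a
# shared 2-D pharmacological space (toy values, not real data).
compounds = {"c1": (0.0, 0.0), "c2": (3.0, 4.0)}
proteins = {"p1": (0.5, 0.5), "p2": (3.0, 3.5), "p3": (10.0, 10.0)}


def predict_interactions(compounds, proteins, threshold):
    """Return (compound, protein) pairs whose Euclidean distance in the
    shared space is below `threshold`, i.e. pairs predicted to interact."""
    pairs = []
    for c_name, c_vec in compounds.items():
        for p_name, p_vec in proteins.items():
            if math.dist(c_vec, p_vec) < threshold:
                pairs.append((c_name, p_name))
    return pairs


print(predict_interactions(compounds, proteins, threshold=1.0))
# [('c1', 'p1'), ('c2', 'p2')]
```

Lowering the threshold gives fewer, more confident predictions. Raising it gives more candidate pairs.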
how are feature-based similarity scores computed
inner product of the chemical and genomic feature vectors
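A minimal sketch of the score. It assumes the chemical and genomic vectors have already been brought to the same length, because an inner product needs matching dimensions. The feature values are hypothetical.

```python
def similarity_score(chem_vec, genomic_vec):
    """Feature-based similarity: inner product of a compound's chemical
    feature vector and a protein's genomic feature vector."""
    if len(chem_vec) != len(genomic_vec):
        raise ValueError("vectors must have the same length")
    return sum(c * g for c, g in zip(chem_vec, genomic_vec))


# Toy feature vectors (hypothetical), assumed to share a common length.
compound = [1.0, 0.0, 2.0]
protein = [0.5, 1.0, 1.5]
print(similarity_score(compound, protein))  # 1*0.5 + 0*1 + 2*1.5 = 3.5
```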
what is GNN
graph neural network
a graph is a set of nodes connected by edges; it can be represented as an adjacency matrix whose entry (i, j) encodes the relationship between nodes i and j
a GNN processes this by passing node features as messages along its edges. each node aggregates the messages from its neighbours via these edge connections
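One round of message passing with sum aggregation is the matrix product A·H. The sketch below assumes a tiny 3-node path graph with one scalar feature per node. It uses plain lists to stay dependency-free, and all values are hypothetical.

```python
# Toy undirected graph with 3 nodes: edges 0-1 and 1-2 (no self-loops).
adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
# One feature per node, kept 1-D for readability (hypothetical values).
features = [[1.0], [2.0], [3.0]]


def message_passing(adjacency, features):
    """One round of sum aggregation: node i's new feature is the sum of the
    features its neighbours send along their edges, i.e. A @ H."""
    n, d = len(features), len(features[0])
    return [
        [sum(adjacency[i][j] * features[j][k] for j in range(n)) for k in range(d)]
        for i in range(n)
    ]


print(message_passing(adjacency, features))  # [[2.0], [4.0], [2.0]]
```

Node 1 receives 1.0 + 3.0 from its two neighbours. The end nodes each receive only node 1's value.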
what are the limitations of a basic GNN (which GCN addresses)
every node sums features of the neighbouring nodes, but not itself unless there’s a self-loop
the adjacency matrix is not normalised, so multiplying by it changes the scale of the feature vectors (nodes with many neighbours get much larger values)
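Both limitations can be seen with unnormalised sum aggregation on a hypothetical star graph. Hub node 0 is linked to four leaves, there are no self-loops, and node features are scalars.

```python
def aggregate(adjacency, features):
    """Unnormalised sum aggregation (A @ H) with scalar node features."""
    n = len(features)
    return [sum(adjacency[i][j] * features[j] for j in range(n)) for i in range(n)]


# Star graph: hub node 0 linked to leaves 1-4, no self-loops (toy example).
star = [[0, 1, 1, 1, 1]] + [[1, 0, 0, 0, 0] for _ in range(4)]

# Limitation 1: the hub's own feature (10.0) is lost after aggregation.
print(aggregate(star, [10.0, 1.0, 1.0, 1.0, 1.0]))  # [4.0, 10.0, 10.0, 10.0, 10.0]

# Limitation 2: with identical inputs, the hub's output grows with its degree.
print(aggregate(star, [1.0] * 5))  # [4.0, 1.0, 1.0, 1.0, 1.0]
```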
what is GCN
graph convolutional network: a GNN that addresses these limitations by normalising the adjacency matrix & adding a self-loop so each node includes its own features in the sum of the neighbouring node features
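A sketch of one propagation step on the same toy star graph. It assumes the symmetric normalisation from the commonly cited GCN formulation, H' = D^-1/2 (A + I) D^-1/2 H, where D is the degree matrix of A + I. The card itself does not say which normalisation is used. The learnable weight matrix and the activation of a full GCN layer are left out to keep the effect visible.

```python
import math


def gcn_propagate(adjacency, features):
    """One GCN propagation step without weights/activation:
    H' = D^-1/2 (A + I) D^-1/2 H, for scalar node features."""
    n = len(adjacency)
    # Add self-loops so each node includes its own features in the sum.
    a_hat = [[adjacency[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    # Degree of each node in A + I.
    deg = [sum(row) for row in a_hat]
    # Symmetric normalisation stops high-degree nodes from dominating the scale.
    return [
        sum(a_hat[i][j] * features[j] / math.sqrt(deg[i] * deg[j]) for j in range(n))
        for i in range(n)
    ]


star = [[0, 1, 1, 1, 1]] + [[1, 0, 0, 0, 0] for _ in range(4)]
print(gcn_propagate(star, [10.0, 1.0, 1.0, 1.0, 1.0]))  # hub keeps part of its own 10.0
print(gcn_propagate(star, [1.0] * 5))  # hub ~1.46 vs leaves ~0.82, instead of 4.0 vs 1.0
```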
what is vgae
variational graph auto-encoder
- doesn’t use a fixed latent representation for inputs. rather, it learns the mean & sd of the latent distribution, from which latent vectors are sampled, so new, unseen outputs can be generated
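A minimal sketch of the sampling idea, not a trainable model. It assumes hard-coded encoder outputs (in a real VGAE they come from GCN layers) and stores the sd as a log-variance, a common convention where sd = exp(0.5 · log_var). Edge probabilities are decoded with the inner-product decoder sigmoid(z_i · z_j) from the original VGAE formulation. All values are hypothetical.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Reparameterisation trick: z = mu + sd * eps, eps ~ N(0, 1).
    Returns one sampled latent vector per node."""
    return [
        [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu_i, lv_i)]
        for mu_i, lv_i in zip(mu, log_var)
    ]


def edge_probability(z_i, z_j):
    """Inner-product decoder: sigmoid(z_i . z_j) as the probability of an edge."""
    dot = sum(a * b for a, b in zip(z_i, z_j))
    return 1.0 / (1.0 + math.exp(-dot))


# Hypothetical encoder outputs for 3 nodes with 2-D latents.
mu = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]]
log_var = [[-2.0, -2.0]] * 3

rng = random.Random(0)
z = sample_latent(mu, log_var, rng)
print(edge_probability(z[0], z[1]), edge_probability(z[0], z[2]))
```

Each call to sample_latent draws a different z, so repeated decoding can propose links that were not in the input graph.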