AI Flashcards

1
Q

Network Medicine is based on…

A

network science, physics, applied mathematics and statistics, computer science, biology, and medicine

*
Patients are unique
*
Patients with the same clinical picture do not necessarily share the same disease pathophenotype
*
Networks of molecular interactions (interactome) are used to identify unknown disease phenotypes and pathogenic events
*
Network of Networks

2
Q

Network Medicine is …?

A

*
Different biological networks capture the complex interactions between genes, proteins, RNA molecules, metabolites and genetic variants in the cells of organisms
*
These networks, also interchangeably known as graphs, are representations in which the complex system components are simplified as nodes that are connected by links (edges)
*
Network medicine is largely discovery-driven rather than hypothesis-driven, uncovering previously unknown relationships and leading to the identification of new biomarkers

3
Q

Network-based studies primarily have to identify which two things?

A

*
what are the critical entities in the system under investigation (nodes)
*
what is the nature of the interactions between these entities (edges)

4
Q

What kinds of graph are used in network medicine?

A

Binary vs Weighted
Directed vs undirected
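To make the two distinctions concrete, here is a minimal sketch that stores three small hypothetical gene networks as adjacency matrices (plain Python lists, no graph library assumed). An undirected graph gives a symmetric matrix; a binary graph contains only 0/1 entries, whereas a weighted graph stores an interaction strength.

```python
# Hypothetical 3-gene network; rows/columns follow this order. Values are illustrative only.
nodes = ["GENE_A", "GENE_B", "GENE_C"]

# Binary, undirected: 1 = interaction present; the matrix is symmetric.
binary_undirected = [
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]

# Weighted, undirected: the value is the interaction strength (e.g. a correlation).
weighted_undirected = [
    [0.0, 0.8, 0.3],
    [0.8, 0.0, 0.0],
    [0.3, 0.0, 0.0],
]

# Binary, directed: row i -> column j (e.g. GENE_A regulates GENE_B); not symmetric.
binary_directed = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]


def is_symmetric(matrix):
    """Undirected graphs have symmetric adjacency matrices."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))


def is_binary(matrix):
    """Binary graphs only record presence (1) or absence (0) of an edge."""
    return all(value in (0, 1) for row in matrix for value in row)


for name, matrix in [
    ("binary_undirected", binary_undirected),
    ("weighted_undirected", weighted_undirected),
    ("binary_directed", binary_directed),
]:
    kind = "undirected" if is_symmetric(matrix) else "directed"
    values = "binary" if is_binary(matrix) else "weighted"
    print(f"{name}: {values}, {kind}")
```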

5
Q

How are disease-associated network components identified within the interactome?

A

*
Considering the topological properties of the nodes and assessing the functional role of their hubness, i.e. the property of having a higher number of connections than other nodes
*
Identifying new disease genes in the network through “guilt-by-association”: a property based not on direct evidence but on association with other (known) disease genes
*
Prioritizing candidate disease genes: molecular interaction networks assist in identifying sub-networks that are mechanistically linked to disease phenotypes
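The first two strategies can be illustrated on a toy network. This sketch uses hypothetical genes (G1 to G7) and a deliberately simple guilt-by-association score, the fraction of a gene's neighbours that are already known disease genes; published methods (e.g. network propagation) are more elaborate, so treat this only as an illustration of the idea.

```python
from collections import defaultdict

# Hypothetical undirected interactome edges; gene names are illustrative only.
edges = [
    ("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G1", "G5"),
    ("G2", "G3"), ("G4", "G6"), ("G5", "G6"), ("G6", "G7"),
]
known_disease_genes = {"G2", "G3"}

# Adjacency list: each interaction is stored in both directions (undirected).
neighbours = defaultdict(set)
for a, b in edges:
    neighbours[a].add(b)
    neighbours[b].add(a)

# Hubness: the degree (number of connections) of every node.
degree = {gene: len(nbrs) for gene, nbrs in neighbours.items()}


def gba_score(gene):
    """Guilt-by-association: fraction of a gene's neighbours that are known disease genes."""
    nbrs = neighbours[gene]
    return sum(n in known_disease_genes for n in nbrs) / len(nbrs)


# Rank the genes not yet known to be disease genes as candidates.
candidates = sorted(
    (g for g in neighbours if g not in known_disease_genes),
    key=gba_score,
    reverse=True,
)
print("hub:", max(degree, key=degree.get), "degree:", max(degree.values()))
print("top candidate:", candidates[0], "score:", gba_score(candidates[0]))
```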

6
Q

How does co-expression-based network modeling for identifying disease biomarkers work?

A

*
Patterns of transcript abundance are studied in the context of the disease after construction of Gene Co-expression Networks (GCNs)
*
Combination of important seed genes with an organic network of co-expression patterns derived from the gene expression data from the same system
*
GCNs identify the functionally coordinated participation of genes in response to an external stimulus or condition
*
GCNs can be signed or unsigned, weighted or unweighted, and may either be constructed using microarray or RNA-Seq data
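A minimal sketch of how a weighted co-expression network can be built from expression data: a Pearson correlation for every gene pair, turned into an edge weight by soft thresholding. The signed and unsigned formulas follow the convention popularised by WGCNA as far as we know; the power beta = 6, the gene names and the values are assumptions for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpression_network(expr, beta=6, signed=False):
    """Weighted co-expression edges for every gene pair (soft-thresholded correlation)."""
    genes = list(expr)
    edges = {}
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            r = pearson(expr[g1], expr[g2])
            if signed:
                # Signed: a strong negative correlation gives a weak edge.
                edges[(g1, g2)] = ((1 + r) / 2) ** beta
            else:
                # Unsigned: positive and negative correlations count equally.
                edges[(g1, g2)] = abs(r) ** beta
    return edges


# Hypothetical expression values for three genes across five samples.
expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises with GENE_A
    "GENE_C": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls as GENE_A rises
}
unsigned = coexpression_network(expr)
signed = coexpression_network(expr, signed=True)
for pair in unsigned:
    print(pair, round(unsigned[pair], 3), round(signed[pair], 3))
```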

7
Q

How do we infer (i.e. estimate from data) phenotype-specific gene regulatory networks?

A

*
Separate networks can be built for each phenotype which may be case-control, disease-specific, tissue or cell-specific, sex-specific, or for different disease subtypes
*
Network comparison model stems from the axiom of “differential networking” over “differential expression”
*
The comparison of networks helps to uncover the specific rewiring of pathways, such as those induced by disease, pharmacological treatment, or environmental stimuli and more
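The idea of “differential networking” can be sketched by building one correlation network per phenotype and comparing their edge sets. Everything here is hypothetical: the expression values, the gene names and the correlation threshold of 0.8. Note that GENE_A has identical expression in both phenotypes (no differential expression), yet its connections are rewired.

```python
import math


def pearson(x, y):
    """Pearson correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def network_edges(expr, threshold=0.8):
    """Edges = gene pairs whose absolute correlation reaches the threshold."""
    genes = sorted(expr)
    return {
        (g1, g2)
        for i, g1 in enumerate(genes)
        for g2 in genes[i + 1:]
        if abs(pearson(expr[g1], expr[g2])) >= threshold
    }


# Hypothetical expression of the same genes in two phenotypes.
control = {
    "GENE_A": [1, 2, 3, 4, 5],
    "GENE_B": [2, 4, 6, 8, 10],
    "GENE_C": [3, 1, 4, 1, 5],
}
disease = {
    "GENE_A": [1, 2, 3, 4, 5],
    "GENE_B": [4, 1, 5, 2, 3],
    "GENE_C": [2, 3, 4, 5, 6],
}

control_edges = network_edges(control)
disease_edges = network_edges(disease)
gained = disease_edges - control_edges  # edges that appear only in disease
lost = control_edges - disease_edges    # edges that disappear in disease
print("gained:", gained)
print("lost:", lost)
```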

8
Q

What are the future needs in network medicine (NM)?

A

* Define the biological heterogeneity as much as possible, to increase the precision of risk prediction and the personalization of prevention and intervention strategies
* Help researchers better understand the human physiological and clinical relevance (to avoid reverse technological processes) and focus on what is relevant to patients' needs
* Integrate data of different nature in a way that rapidly reduces dimensionality, in order to distill implementable results for drug discovery/healthcare management

9
Q

What are the uses of network medicine (NM)?

A

Disease Understanding: Network medicine enables researchers to characterize diseases as perturbations in complex biological networks rather than isolated anomalies. By mapping out the interactions among genes, proteins, and other molecular entities, network medicine provides insights into disease mechanisms, progression, and heterogeneity. This holistic approach aids in identifying novel biomarkers and therapeutic targets.

Personalized Medicine: By integrating patient-specific data, such as genomics, transcriptomics, and clinical information, with network-based models, personalized treatment strategies can be devised. Network analysis helps in identifying patient subgroups with similar molecular profiles and predicting individual responses to drugs, allowing for tailored therapeutic interventions.

Drug Discovery and Repurposing: Network medicine facilitates the identification of drug targets and the repurposing of existing drugs for new indications. By analyzing drug-protein interaction networks and their effects on disease-associated pathways, researchers can identify candidate compounds with therapeutic potential and optimize drug combinations for synergistic effects.

Systems Pharmacology: Network medicine provides a systems-level understanding of drug actions and their effects on biological pathways. By integrating pharmacological data with molecular networks, researchers can predict drug efficacy, side effects, and interactions, aiding in the design of safer and more effective treatments.

Biomarker Discovery: Network-based approaches help in the identification of molecular signatures and biomarkers associated with disease diagnosis, prognosis, and treatment response. By analyzing the connectivity and dynamics of biomolecular networks, researchers can uncover diagnostic markers for early disease detection and monitor disease progression.

Biological Network Visualization and Interpretation: Network visualization tools and software platforms allow researchers to visually explore and interpret complex biological networks. By representing molecular interactions as graphical networks, researchers can identify key nodes (e.g., hubs, bottlenecks) and pathways implicated in disease pathogenesis, facilitating hypothesis generation and experimental validation.

10
Q

What are artificial intelligence (AI) and machine learning (ML)?

A

*
AI : the theory and development of computer systems able to perform
tasks that normally require human intelligence, such as visual
perception, speech recognition, decision making, and translation
between languages.
*
ML : The use and development of computer systems that are able to
learn and adapt without following explicit instructions, by using
algorithms and statistical models to analyze and draw inferences from
patterns in data.
-> Artificial intelligence is the simulation of intellectual tasks. Machine learning is algorithms
trained on data to learn patterns and make predictions.

11
Q

Machine learning use cases in life science?

A

Genomics
* Variant calling
* Genetic sequence of a cancer, e.g. druggable targets
* Functional predictions

OMICS & life science
* Risk factors (e.g., hypertension)
* Integration of multiomics
* Protein structure predictions
* DDI (drug-drug interaction) networks
* Drug discovery

Diagnostics
* Images of patients, e.g. eye, skin, hair
* CT pictures, e.g. of the head, cancer
* X-ray films
* Real-time video of a colonoscopy

Healthcare diagnostics
* Alerts & diagnostics from real-time EHR data
* Predictive health management
* Healthcare provider sentiment analysis

12
Q

What is the big difference between deep learning and (classical) machine learning?

A

In machine learning, feature extraction is done manually, whereas in deep learning we don’t give the model the features: it learns the features and how to classify by itself

13
Q

Can we have both accuracy and interpretability in ML?

A

Not fully: there is a trade-off between accuracy and interpretability for ML models (the most accurate models are often the hardest to interpret)

14
Q

How does ChatGPT work?

A

ChatGPT splits the text into tokens
and predicts which token comes next, one token after the other

Possible
token levels
*
Sentence
*
Words
*
Subword
*
Character
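A toy sketch of the two ideas on this card: splitting text at different token levels, and predicting the next token. It is only an analogy: ChatGPT uses a large transformer neural network over subword tokens, not the simple word-pair (bigram) counts used here, and the fixed 3-character “subword” split is a hypothetical stand-in for a learned subword vocabulary.

```python
from collections import Counter, defaultdict

text = "the drug binds the receptor and the drug blocks the enzyme"

# The same text at different token levels.
sentence_tokens = [text]
word_tokens = text.split()
char_tokens = list(text.replace(" ", ""))
# Hypothetical subword split: real tokenisers learn their subword vocabulary from data.
subword_tokens = [w[i:i + 3] for w in word_tokens for i in range(0, len(w), 3)]

# Toy next-word predictor: count which word follows which (bigram counts).
following = defaultdict(Counter)
for current, nxt in zip(word_tokens, word_tokens[1:]):
    following[current][nxt] += 1


def predict_next(word):
    """Return the word seen most often after `word`, or None if never seen."""
    if word not in following:
        return None
    return following[word].most_common(1)[0][0]


print(subword_tokens)
print("after 'the':", predict_next("the"))
```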

15
Q

How does supervised learning work?

A

In supervised learning, we give the algorithm training data that is already labelled (categorised),
so that it can then say, for example, whether a new case is good or bad (binary classification)

What if we have more than one input?
With two inputs, it can draw a line in two dimensions and categorise the elements on either side of it
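One classic way to learn such a line is the perceptron; the card does not name an algorithm, so this is an assumed example. Labels are +1 (“good”) and -1 (“bad”), the two inputs are hypothetical measurements, and the learned line is w1*x1 + w2*x2 + b = 0.

```python
def train_perceptron(points, labels, epochs=20, lr=1.0):
    """Learn w1, w2, b so that sign(w1*x1 + w2*x2 + b) matches the +1/-1 labels."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), y in zip(points, labels):
            prediction = 1 if w1 * x1 + w2 * x2 + b > 0 else -1
            if prediction != y:
                # Nudge the line towards classifying this point correctly.
                w1 += lr * y * x1
                w2 += lr * y * x2
                b += lr * y
                errors += 1
        if errors == 0:  # every training point is on the correct side of the line
            break
    return w1, w2, b


def classify(model, x1, x2):
    w1, w2, b = model
    return 1 if w1 * x1 + w2 * x2 + b > 0 else -1


# Hypothetical labelled training data with two inputs each.
points = [(2.0, 3.0), (3.0, 3.0), (3.0, 2.0), (-1.0, -1.0), (-2.0, -1.0), (-1.0, -2.0)]
labels = [1, 1, 1, -1, -1, -1]

model = train_perceptron(points, labels)
print("line: w1, w2, b =", model)
print("new point (4, 4):", classify(model, 4.0, 4.0))
```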

16
Q

How does unsupervised learning work?

A

“the data comes only with inputs
x but not output labels y,
and the algorithm has to find some structure or some
pattern or something interesting in the data.”

17
Q

Questions: would you apply a supervised or an unsupervised learning algorithm?

*
Given emails labelled as spam/not spam, learn a spam filter

*
Given a set of published papers found on PubMed, group them
into sets of articles about the same research topic

*
Given a database of expression data of patients, automatically
discover signals and group patients into different response
groups

*
Given a dataset of patients diagnosed as either having
diabetes or not, learn to classify new patients as having
diabetes or not

A

Supervised

Unsupervised

Unsupervised

Supervised

18
Q

What is the basic principle of supervised regression learning?

A

Training set → learning algorithm → model f; feature x → model f → prediction ŷ (estimated y)

What
is f?
𝑓(𝑥)=𝑤𝑥+𝑏
Linear
regression with one
variable/ feature
=Univariate linear regression

Needed:

Matrix of features

Matrix of coefficients
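With several features, the feature matrix and the coefficient vector combine into one prediction per sample, ŷ = x·w + b. A minimal sketch with hypothetical numbers, using plain Python lists (in practice a library such as NumPy would do the matrix product):

```python
# Hypothetical feature matrix X: one row per sample, one column per feature.
X = [
    [1.0, 2.0],
    [2.0, 0.0],
    [3.0, 1.0],
]
# Hypothetical coefficient vector w (one weight per feature) and bias b.
w = [0.5, 2.0]
b = 1.0


def predict(X, w, b):
    """Estimated y for every row of X: y_hat = row . w + b."""
    return [sum(xj * wj for xj, wj in zip(row, w)) + b for row in X]


y_hat = predict(X, w, b)
print(y_hat)  # [5.5, 2.0, 4.5]
```

With a single feature column this reduces to the univariate case 𝑓(𝑥) = 𝑤𝑥 + 𝑏.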

18
Q

Principle of machine learning algorithms

A

3-step process

1. Infer / Predict

2. Error / Loss

3. Train / Learn

- Predict: MOVE

- Error: BAD or GOOD

- Learn: “Oh, this was a terrible idea”

- Reinforcement: “Well done, do it again”
Model:
Decreasing or increasing the weights
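The three steps map onto a gradient-descent training loop for 𝑓(𝑥) = 𝑤𝑥 + 𝑏. This is a minimal sketch of supervised learning (not reinforcement learning), assuming hypothetical data on the line y = 2x + 1, a learning rate of 0.05 and 2000 iterations.

```python
# Hypothetical training data that follow y = 2x + 1 exactly.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = 0.0, 0.0       # start with arbitrary weights
learning_rate = 0.05  # assumed step size

for _ in range(2000):
    # 1. Infer / predict with the current weights.
    preds = [w * x + b for x in xs]
    # 2. Error / loss: mean squared error of the predictions.
    errors = [p - y for p, y in zip(preds, ys)]
    loss = sum(e * e for e in errors) / len(xs)
    # 3. Train / learn: move w and b against the gradient of the loss,
    #    i.e. increase or decrease each weight so that the error shrinks.
    grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / len(xs)
    grad_b = 2 * sum(errors) / len(xs)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"w = {w:.3f}, b = {b:.3f}, loss = {loss:.2e}")
```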

19
Q

What does the cost function do?

A

The squared error cost function
calculates the distance (mean squared error) between the predictions and the correct values, and then:

𝑓(𝑥)=𝑤𝑥+𝑏
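Optimize w and b to get the lowest mean squared error (for complex models the optimization can end in a local minimum, which is a problem; for linear regression the squared error cost is convex, so its only minimum is the global one)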
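A minimal sketch of the squared error cost for hypothetical data lying exactly on y = 2x: evaluating the cost for a few slopes (b fixed at 0) shows it is lowest, zero, at w = 2 and grows symmetrically on either side. Some texts divide by 2m instead of m; that rescales the cost but does not move the minimum.

```python
def mse_cost(w, b, xs, ys):
    """Mean squared error between the predictions f(x) = w*x + b and the true y."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

# Evaluate the cost for a few candidate slopes, keeping b fixed at 0.
candidates = [0.0, 1.0, 2.0, 3.0]
costs = {w: mse_cost(w, 0.0, xs, ys) for w in candidates}
for w, cost in costs.items():
    print(f"w = {w}: cost = {cost:.3f}")

best_w = min(costs, key=costs.get)
print("lowest cost at w =", best_w)
```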

20
Q

What is overfitting?

A

Overfitting is an undesirable machine learning behavior that occurs when the machine learning model gives accurate predictions for training data but not for new data. When data scientists use machine learning models for making predictions, they first train the model on a known data set; an overfitted model is fitted too closely to that training data, e.g. a high-order polynomial f(x) = w₁x + w₂x² + w₃x³ + … + b that also learns the noise.
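A small deterministic illustration with hypothetical data generated as y = 2x + 1 plus fixed “noise”: a straight line is compared with a degree-5 polynomial that passes exactly through all six training points. The polynomial has zero training error but predicts the unseen point x = 6 very badly, which is overfitting.

```python
def fit_line(xs, ys):
    """Least-squares straight line: returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def interpolating_poly(xs, ys):
    """Degree n-1 polynomial (Lagrange form) passing exactly through every training point."""
    def f(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return f


def mse(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


# Hypothetical training data: y = 2x + 1 plus small fixed "noise".
x_train = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noise = [0.3, -0.2, 0.25, -0.3, 0.2, -0.25]
y_train = [2 * x + 1 + e for x, e in zip(x_train, noise)]
# New data the models have not seen (noise-free for clarity).
x_test, y_test = [6.0], [13.0]

slope, intercept = fit_line(x_train, y_train)
poly = interpolating_poly(x_train, y_train)

train_mse_line = mse([slope * x + intercept for x in x_train], y_train)
test_mse_line = mse([slope * x + intercept for x in x_test], y_test)
train_mse_poly = mse([poly(x) for x in x_train], y_train)
test_mse_poly = mse([poly(x) for x in x_test], y_test)
print(f"line: train {train_mse_line:.3f}, test {test_mse_line:.3f}")
print(f"poly: train {train_mse_poly:.3f}, test {test_mse_poly:.3f}")
```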

21
Q

Why is AlphaFold not perfect?

A

Functional predictions of variants:
“AlphaFold has not been validated for
predicting the effect of mutations. In particular,
AlphaFold is not expected to produce an unfolded
protein structure given a sequence containing a
destabilising point mutation.”

The best
assessment of whether a variant has structural or
functional impact also requires contextual knowledge

but

You can predict the functional impact of (missense) variants with AlphaMissense

22
Q

Can we predict CYP2D6 phenotype with machine learning?

A

Yes.

We can also do functional assessment of
pharmacogenomic variants.

When predicting CYP2D6 phenotype with machine learning, we can skip annotating variants as star alleles and allocating numeric (activity) values such as 1 and 0, and still get good results.

There are also other ways to predict phenotype that do use star alleles.

23
Q

What is an ensemble or metalearner?

A

In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better
predictive performance than could be obtained by any of the constituent algorithms.

You can combine multiple machine learning algorithms and let the metalearner decide which ones are better (and how much weight each one gets), e.g. with the SuperLearner package in R.
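A simplified sketch of the metalearner idea: given validation-set predictions from three base models, search for the non-negative weights (summing to 1) whose weighted average has the lowest mean squared error. The predictions are hypothetical, and this brute-force grid is only a stand-in for what the SuperLearner package does (as we understand it, it works with cross-validated predictions and a proper optimiser).

```python
from itertools import product

# Hypothetical validation-set outcomes and predictions from three base learners.
y_val = [1.0, 0.0, 1.0, 1.0, 0.0]
base_preds = {
    "model_a": [0.9, 0.2, 0.6, 0.7, 0.4],
    "model_b": [0.6, 0.1, 0.9, 0.5, 0.3],
    "model_c": [0.5, 0.5, 0.5, 0.5, 0.5],
}


def mse(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


def combine(weights, preds_list):
    """Weighted average of the base models' predictions, sample by sample."""
    return [sum(w * p for w, p in zip(weights, col)) for col in zip(*preds_list)]


def fit_metalearner(base_preds, y, step=0.1):
    """Grid-search non-negative weights summing to 1 that minimise validation MSE."""
    names = list(base_preds)
    preds_list = [base_preds[n] for n in names]
    n_steps = round(1 / step)
    best = None
    for counts in product(range(n_steps + 1), repeat=len(names)):
        if sum(counts) != n_steps:
            continue
        weights = [c / n_steps for c in counts]
        score = mse(combine(weights, preds_list), y)
        if best is None or score < best[0]:
            best = (score, dict(zip(names, weights)))
    return best


best_mse, weights = fit_metalearner(base_preds, y_val)
print("weights:", weights, "ensemble MSE:", round(best_mse, 4))
```

Because every single model is also on the grid (weight 1 for it, 0 for the rest), the ensemble can never do worse on this validation set than the best individual model.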

24
Q

Applications?

A

The hemoglobin levels and the amount of blood transfused can be estimated with less error than before because of ML

Supervised machine
learning methods trained using SNPs
and total baseline depression scores predicted remission
and response at 8 weeks with an area under the receiver
operating characteristic curve (AUC) of 0.7
and 70% prediction accuracy

Assessment
of drug-drug interactions in polypharmacy using graph
convolutional networks

AI performs as well as doctors in university tests
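The AUC quoted above can be computed as the probability that a randomly chosen patient with the outcome gets a higher predicted score than a randomly chosen patient without it (0.5 = chance, 1.0 = perfect ranking); it is a different measure from accuracy. A minimal sketch with hypothetical scores:

```python
def roc_auc(scores, labels):
    """AUC = probability that a random positive case gets a higher score than
    a random negative case (ties count as one half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities of remission and observed outcomes (1 = remission).
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
observed = [1, 1, 1, 0, 0, 0]
print(round(roc_auc(scores, observed), 3))  # 8 of 9 positive/negative pairs ranked correctly
```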

25
Q

What is the difference between unsupervised machine learning and deep learning?

A

Unsupervised Machine Learning:

In unsupervised learning, the algorithm is given a dataset without explicit instructions on what to do with it. The algorithm must find patterns, structure, or relationships within the data on its own.
Unsupervised learning techniques include clustering, dimensionality reduction, and association rule learning.
Clustering algorithms like K-means or hierarchical clustering group similar data points together based on their inherent patterns or similarities.
Dimensionality reduction techniques such as Principal Component Analysis (PCA) or t-distributed Stochastic Neighbor Embedding (t-SNE) aim to reduce the number of features in a dataset while preserving its important characteristics.
Unsupervised learning is often used for tasks such as anomaly detection, data compression, and exploratory data analysis.
Deep Learning:

Deep learning is a subset of machine learning that involves neural networks with multiple layers (deep neural networks). These networks can automatically learn hierarchical representations of data.
Deep learning models are typically trained using large amounts of labeled data, and they learn to extract features directly from the raw data without the need for manual feature engineering.
Deep learning has shown remarkable success in various tasks such as image recognition, natural language processing, speech recognition, and recommendation systems.
Common architectures in deep learning include Convolutional Neural Networks (CNNs) for image-related tasks, Recurrent Neural Networks (RNNs) for sequence data, and Transformers for natural language processing tasks.
While deep learning models can be used for unsupervised learning tasks (e.g., autoencoders for dimensionality reduction or generative adversarial networks for generating synthetic data), they are more commonly associated with supervised learning where they learn to map inputs to outputs.

26
Q

Key points of machine learning and AI in pharmacogenomics?

A

Personalized Medicine: Machine learning and AI techniques enable the development of personalized medicine approaches in pharmacogenomics. By analyzing an individual’s genetic information, along with other relevant clinical data, these techniques can predict a patient’s response to a particular drug or dosage regimen.

Predictive Modeling: Machine learning algorithms can build predictive models that identify genetic markers or signatures associated with drug response or adverse reactions. These models can be used to stratify patient populations and guide treatment decisions, ultimately leading to more effective and safer drug therapies.

Drug Discovery and Development: AI algorithms can accelerate drug discovery and development processes by analyzing vast amounts of genomic and chemical data. These techniques help identify potential drug targets, predict drug-drug interactions, optimize drug candidates, and design more effective clinical trials.

Genomic Data Analysis: Machine learning methods are instrumental in analyzing large-scale genomic datasets, including genome-wide association studies (GWAS) and next-generation sequencing data. These techniques can uncover genetic variants associated with drug metabolism, pharmacokinetics, and pharmacodynamics.

Drug Repurposing: AI-driven approaches facilitate drug repurposing efforts by identifying new therapeutic indications for existing drugs based on their genomic and pharmacological profiles. This approach can expedite the development of novel treatments for various diseases.

Adverse Drug Reaction Prediction: Machine learning models can predict the likelihood of adverse drug reactions based on genetic factors, enabling proactive measures to mitigate risks and improve patient safety.

Clinical Decision Support Systems: AI-powered clinical decision support systems integrate genomic data with electronic health records (EHRs) to provide healthcare professionals with personalized treatment recommendations and dosage adjustments tailored to individual patients.

27
Q

Which supervised machine learning (SML) methods did we learn?

A

Linear regression, K-nearest neighbours (KNN), Random Forest

28
Q
Which unsupervised machine learning (UML) methods did we learn?
A

K-means clustering, hierarchical clustering, principal component analysis (PCA)

29
Q

AI, Machine Learning, Neural Network and
Deep Learning. What’s the difference?

A

Machine learning (ML) is a subfield of AI, or
a path to AI
Algorithms to learn insights and recognise
patterns from data
Deep Learning and Neural Networks are
methods of ML
Deep Learning structures algorithms in
Neural Networks, with the aim of teaching
them to take decisions

30
Q

Supervised machine learning (SML): how does it work?

A

In SML, algorithms learn from labelled data
* Regression is used to understand the relationship between dependent and
independent variables
* Classification assigns test data to categories based on specific variables

31
Q

Simple linear (and logistic) regression: when can we apply it?

A

Used to predict (forecast) the value of
the dependent variable based on the
independent variable
* Linear regression is applied to
continuous outcome variables, whilst logistic
regression is applied to discrete (e.g. binary) outcomes
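A minimal sketch of logistic regression for a discrete (binary) outcome, fitted by plain gradient descent on the log-loss. The dose/response data, learning rate and number of epochs are assumptions for illustration; statistical packages use faster solvers.

```python
import math


def sigmoid(z):
    """Squash any real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y = 1 | x) = sigmoid(a + b*x) by gradient descent on the log-loss."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(a + b * x) - y  # predicted probability minus the 0/1 label
            grad_a += error / n
            grad_b += error * x / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


# Hypothetical data: a continuous predictor (dose) and a discrete outcome (responded: 0/1).
doses = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
responded = [0, 0, 1, 0, 1, 1]

a, b = train_logistic(doses, responded)
for dose in doses:
    print(dose, round(sigmoid(a + b * dose), 2))
```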

32
Q

Simple linear regression: how does it work?

A
  • Residuals can be used to validate the model by making sure that they are
    independent and normally distributed
  • As the number of independent variables increases, multiple linear regression is applied

𝑦 = 𝑎 + 𝑏𝑥 + ε
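A minimal sketch of fitting y = a + bx by least squares and computing the residuals that the model checks are based on. The BMI/cholesterol numbers are hypothetical. With an intercept in the model, least-squares residuals always sum to (about) zero, so that alone says nothing about independence or normality; those need residual plots or tests.

```python
def simple_linear_regression(xs, ys):
    """Least-squares estimates of a (intercept) and b (slope) in y = a + b*x + error."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data, e.g. x = BMI and y = total cholesterol.
xs = [20.0, 22.0, 25.0, 28.0, 31.0]
ys = [4.1, 4.6, 5.0, 5.6, 6.2]

a, b = simple_linear_regression(xs, ys)
# Residuals = observed - predicted; these are what you would inspect
# (e.g. with a residual plot or a normality test) to validate the model.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(f"a = {a:.3f}, b = {b:.3f}")
print("residuals:", [round(r, 3) for r in residuals])
```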

33
Q

Multiple linear regression, how does it work?

A
  • Builds a model to describe Y in the
    best way using X1…Xn
  • Uses independent variables to predict
    the dependent variable. Example:
    → Total Cholesterol = a + b1*BMI +
    b2*Time exercising +
    b3*Shoe size + … + ε
  • But is shoe size relevant?

𝑦 = 𝑎 + 𝑏₁𝑥₁ + 𝑏₂𝑥₂ + 𝑏₃𝑥₃ + … + ε
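A minimal sketch of multiple linear regression via the normal equations (XᵀX)β = Xᵀy, solved with Gaussian elimination so that no libraries are needed. The data are hypothetical and were generated without any shoe-size effect, so the fitted shoe-size coefficient should come out close to 0; in practice you would use a statistics library and look at standard errors or p-values to judge relevance.

```python
def solve(A, v):
    """Solve the linear system A z = v by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [val] for row, val in zip(A, v)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def multiple_linear_regression(X, y):
    """Return [a, b1, b2, ...] minimising the squared residuals (normal equations)."""
    rows = [[1.0] + list(x) for x in X]  # leading 1 for the intercept a
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(XtX, Xty)


# Hypothetical data: columns = BMI-like x1, exercise-like x2, shoe size x3.
# y was generated as 1 + 2*x1 + 3*x2, so shoe size has no real effect.
X = [[0, 0, 40], [1, 0, 42], [0, 1, 38], [1, 1, 44], [2, 1, 41], [3, 2, 39]]
y = [1, 3, 4, 6, 8, 13]

a, b1, b2, b3 = multiple_linear_regression(X, y)
print(f"a={a:.2f}, b1={b1:.2f}, b2={b2:.2f}, b3 (shoe size)={b3:.2f}")
```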

34
Q

Multiple linear regression assumptions ?

A
  • Parametric test based on assumptions:
    - Linear relationship between Y and X
    - The Xi are not highly correlated with each other
    - The variance of the residuals is constant
    - Independence of observations
    - Residuals are normally distributed
35
Q

How can we test a multiple linear regression model?

A
  • The model can be tested with the Root Mean
    Square Error (RMSE), the standard
    deviation of the residuals: square all the residuals, add them up, divide by the sample size, and take the square root
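For reference, the RMSE described above written as a formula (n = sample size, yᵢ = observed value, ŷᵢ = model prediction), followed by a small worked example with three made-up residuals. Some texts divide by n − p (the number of fitted parameters) when estimating the residual standard deviation; for large n the difference is small.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
\qquad \text{e.g. residuals } 1,\,-1,\,2:\quad
\sqrt{\frac{1^2 + (-1)^2 + 2^2}{3}} = \sqrt{2} \approx 1.41
```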
36
Q

How can we use multiple linear regression for prediction?

A
  1. Create a random 80/20 split of the data, generating
    training data (80%) and test data (20%)
  2. Train a regression model on the training data
  3. Apply the model on the test data
  4. Calculate the RMSE of the training data (in-sample RMSE)
    and of the test data (out-of-sample RMSE)
    * Compare the RMSEs: this indicates how well the model
    performs on new data.
    * More complex model → decreasing in-sample RMSE → risk of overfitting
    (the out-of-sample RMSE stops improving or rises)
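The four steps can be sketched as follows for a simple linear model. The data, the seeded shuffle used for the 80/20 split and the helper names are assumptions for illustration; libraries such as scikit-learn provide ready-made splitting and error-metric helpers.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def rmse(a, b, xs, ys):
    return math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs))


def train_test_rmse(xs, ys, test_fraction=0.2, seed=0):
    # 1. Random split of the sample indices into training (80%) and test (20%) data.
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(test_fraction * len(xs)))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    tx, ty = [xs[i] for i in train_idx], [ys[i] for i in train_idx]
    vx, vy = [xs[i] for i in test_idx], [ys[i] for i in test_idx]
    # 2. Train on the training data only.
    a, b = fit_line(tx, ty)
    # 3-4. Apply the model to both sets and return in-sample and out-of-sample RMSE.
    return rmse(a, b, tx, ty), rmse(a, b, vx, vy)


# Hypothetical data: y = 2x + 1 plus fixed noise.
xs = [float(i) for i in range(10)]
noise = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2, 0.4, -0.1, -0.5]
ys = [2 * x + 1 + e for x, e in zip(xs, noise)]

in_rmse, out_rmse = train_test_rmse(xs, ys)
print(f"in-sample RMSE: {in_rmse:.3f}, out-of-sample RMSE: {out_rmse:.3f}")
```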
37
Q

Linear regression models pros and cons ?

A

Pros:
* Can be used on continuous
(linear) and discrete (logistic)
data
* Determine influence of
independent variables on the
dependent
* Identifying outliers
Cons:
* No mixed data (continuous &
discrete)
* Many assumptions
* Requires complete data and no
missing data

38
Q

K-nearest neighbors (KNN): how does it work?

A
  • Non-parametric algorithm i.e. no
    strong assumptions
  • Often used for classification,
    predicting the group of a data point
  • Applies majority voting based on:
    Distance metrics
    Number of K’s
  1. Calculate the distances, usually with Euclidean distance
  2. Find the nearest neighbours by ranking the distances
  3. Majority vote on the predicted class label based on the
    K nearest neighbours

K is the number of nearest neighbors taken into account
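The three steps above as a minimal Python sketch (Euclidean distance via `math.dist`, Python 3.8+). The patient measurements and labels are hypothetical and assumed to be on the same scale already.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by a majority vote among its k nearest training points."""
    # 1. Euclidean distance from the query to every training point.
    distances = [
        (math.dist(query, point), label)
        for point, label in zip(train_points, train_labels)
    ]
    # 2. Rank the distances and keep the k nearest neighbours.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # 3. Majority vote on the class labels of those neighbours.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical, already-scaled measurements for six labelled patients.
train_points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (6.0, 6.0), (6.0, 7.0), (7.0, 6.0)]
train_labels = ["responder"] * 3 + ["non-responder"] * 3

print(knn_predict(train_points, train_labels, (1.5, 1.5)))  # responder
print(knn_predict(train_points, train_labels, (6.5, 6.5)))  # non-responder
```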

39
Q

KNN pros and cons ?

A

Pros:
* It is easy to implement
* No need to train a model
* Versatile, distance algorithms can handle different types of data
Cons:
* Data should be of the same scale which can be difficult with large datasets
* Setting the K can be challenging
Tips:
* Test different K’s
* K should be an odd number to avoid ties (draws) in two-class problems

40
Q

Decision trees and random forests: how do they work?

A

Random forest is based on
decision trees
* Generates many decision trees, which together
form the random forest used to
classify unlabelled data → a single tree is not accurate enough
* Can use both categorical and
continuous variables

Random forest
1. Create a bootstrapped dataset that is the
same size as the original
→ randomly selected data, where duplicates
are allowed
2. Create a decision tree from the
bootstrapped data, using a random subset
of variables at each split
3. Repeat 1 and 2 multiple times
4. Input your unlabelled data and let the
random forest's many classifiers label it
5. Majority vote classifies the unlabelled data

Random forest validation with Out-of-Bag

  • The Random forest model can be
    validated using the Out-of-bag
    error
  • The Random forest is used to
    predict labels of data not
    selected for the bootstrapped
    data (test set)
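A compact sketch of the five steps plus the out-of-bag error. To stay short it makes one big simplification, labelled in the code: each “tree” is a one-split decision stump rather than a full decision tree. The data, the number of trees (25), one random feature per tree and the seed are all assumptions.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature_ids):
    """Simplification: a one-split 'tree'. Returns (feature, threshold, left_label,
    right_label) for the split with the fewest training errors."""
    best = None
    for f in feature_ids:
        values = sorted(set(r[f] for r in rows))
        # Candidate thresholds: midpoints between consecutive distinct values.
        thresholds = [(lo + hi) / 2 for lo, hi in zip(values, values[1:])] or values
        for t in thresholds:
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):  # 3. Repeat steps 1 and 2 n_trees times.
        # 1. Bootstrapped dataset: n rows drawn with replacement (duplicates allowed).
        sample = [rng.randrange(n) for _ in range(n)]
        # 2. Fit a "tree" on the bootstrap using a random subset of the variables.
        features = rng.sample(range(len(rows[0])), n_features)
        stump = fit_stump([rows[i] for i in sample], [labels[i] for i in sample], features)
        forest.append((stump, set(sample)))  # remember used rows for the OOB error
    return forest


def predict_forest(forest, row):
    """4-5. Every tree labels the row; the majority vote wins."""
    votes = Counter(predict_stump(stump, row) for stump, _ in forest)
    return votes.most_common(1)[0][0]


def oob_error(forest, rows, labels):
    """Predict each row only with trees whose bootstrap sample left it out."""
    wrong = counted = 0
    for i, (row, label) in enumerate(zip(rows, labels)):
        votes = Counter(predict_stump(s, row) for s, used in forest if i not in used)
        if votes:
            counted += 1
            wrong += votes.most_common(1)[0][0] != label
    return wrong / counted if counted else float("nan")


# Hypothetical labelled data: two features per sample, two classes.
rows = [(1, 2), (2, 1), (2, 3), (3, 2), (7, 8), (8, 7), (8, 9), (9, 8)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(rows, labels)
print("prediction for (1, 1):", predict_forest(forest, (1, 1)))
print("prediction for (9, 9):", predict_forest(forest, (9, 9)))
print("out-of-bag error:", oob_error(forest, rows, labels))
```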
41
Q

Random forest pros and cons ?

A

Pros:
* Can be used on many types and
mixes of data
* Can be applied on both
classification and regression
problems
* Can be applied on data with
missing values
* Less prone to overfitting and to the
curse of dimensionality
Cons:
* Very complex, and you can’t follow
the decisions of the trees
* Training the model takes time and
computing power

42
Q

Types of, and reasons to use, unsupervised machine learning (UML), and how does it work?

A
  • In UML, algorithms are used to analyze and cluster unlabelled data
    → data grouping based on patterns
    → similarities and differences in the data
  • Clustering is applied on raw data and groups it based on similarities and
    differences between the structure and/or patterns of the data
  • Dimensionality reduction can be applied to reduce the complexity of data
    whilst preserving its structure, to reduce “noise” and overfitting in ML
    algorithms.
43
Q

K-means clustering?

A
  • Not to be confused with KNN
  • Groups similar data points in
    clusters
  • K is the number of clusters (and of
    means) generated
  1. Set the number of K's
    (with an elbow plot)
  2. Generate K random centroids
  3. Create K clusters by assigning each
    data point to the closest centroid
  4. Calculate new centroids for each
    cluster
  5. Reassign points using the new centroids
    If there are new assignments, repeat 4
    If there are no new assignments, terminate
    the algorithm

Elbow plot determines the number of K's
    * The first step of K-means clustering is
    to set K
    * The elbow method is common
    * Distortion is the sum of squared
    distances of data points from their
    cluster centres
    - Decreases as K increases
    - 0 when K = number of points
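A minimal sketch of the algorithm and of the distortion used for the elbow plot. One deliberate change from the steps above: instead of random starting centroids it uses a deterministic farthest-point start, so the result is reproducible. The six points are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first_centroids(points, k):
    # Assumption: deterministic farthest-point start instead of random centroids.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    centroids = farthest_first_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        # Assign each data point to its closest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:  # no new assignments -> terminate
            break
        labels = new_labels
        # Calculate the new centroid (mean) of each cluster.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    # Distortion: sum of squared distances of points from their cluster centres.
    distortion = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, distortion


# Hypothetical 2-D data with two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]

# Values for an elbow plot: distortion for K = 1..4.
for k in range(1, 5):
    print(k, round(kmeans(points, k)[2], 3))
```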
44
Q

Hierarchical clustering: how does it work?

A
  • Groups similar data points to clusters
  • Defines clusters that are distinct from
    each other and datapoints within are
    similar
  • Creates clusters by ordering them:
    - Bottom-up (agglomerative)
    - Top-down (divisive)
  • The length of the branches in the dendrogram shows
    how similar the data points are
    → long branch = dissimilar, short branch = similar
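A minimal sketch of bottom-up (agglomerative) clustering on hypothetical 1-D values. It uses single linkage (the distance between the closest members of two clusters), which is only one of several linkage choices (complete, average, Ward, ...); the recorded merge heights are where a dendrogram joins the branches, so long branches mean dissimilar clusters.

```python
def single_linkage(points):
    """Agglomerative clustering of 1-D values with single linkage.
    Returns the merges as (cluster_a, cluster_b, merge_height)."""
    clusters = [[p] for p in points]  # start: every point is its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members of the two clusters.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return merges


merges = single_linkage([1, 2, 6, 7, 20])
for left, right, height in merges:
    print(left, "+", right, "at height", height)
```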
45
Q

Hierarchical clustering pros and cons ?

A

Pros:
* Easy to use
* The dendrogram gives information
about the data structure
* Can be used to set number of
clusters
Cons:
* Sensitive to outliers
* Does not work well with missing
data or mixed data
* In complex data, difficult to
determine number of relevant
clusters

46
Q

Principal component analysis (PCA)?

A

Common and versatile method used for:
* Analysing the structure of data
features
* Pre-processing for other ML
algorithms
* Visualisation
Summarises large multi-dimensional
datasets to smaller number of dimensions
(ideally 2) that can be visualised

  1. Plot the data. Genes 1
    & 2 are higher in samples
    1 & 2…
  2. Calculate the average of
    gene 1 and 2 (and n) to find the
    center of the data.
  3. Center the data at
    the origin (0,0)

Find the line through the origin with the best fit. PCA defines the best fit by
projecting each point onto the line and minimizing the distances between the points and their projections
(equivalently, maximizing the spread of the projected points).
The line is called Principal Component 1 (PC1)

The eigenvectors are
calculated.
A higher loading indicates
more influence on the PC,
i.e. Gene 1 (0.82) influences it
more than Gene 2 (0.57).

Multi-dimensions and PC n
* PC2 is perpendicular to PC1, PC3 is
perpendicular to PC1 and PC2, etc.
* There are (at most) as many PCs as genes
* PC1 explains most of the variance in the
data, PC2 the second most, etc.
* For a 2D projection, two PCs are plotted

  • The datapoints are projected onto PC.
  • Hopefully, we see some clustering…
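For two genes, the whole procedure fits in a few lines: centre the data, compute the 2×2 covariance matrix, take its leading eigenvector (the PC1 loadings) and project the samples onto it. A minimal sketch with hypothetical expression values; for many genes you would use a library implementation such as scikit-learn's `PCA`.

```python
import math


def pca_two_genes(gene1, gene2):
    """PCA for two genes measured across samples, via the 2x2 covariance matrix."""
    n = len(gene1)
    # Centre the data at the origin (subtract each gene's average).
    m1, m2 = sum(gene1) / n, sum(gene2) / n
    c1 = [v - m1 for v in gene1]
    c2 = [v - m2 for v in gene2]
    # Sample covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x in c1) / (n - 1)
    c = sum(y * y for y in c2) / (n - 1)
    b = sum(x * y for x, y in zip(c1, c2)) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix: lam1 >= lam2 (variance along PC1, PC2).
    half_trace = (a + c) / 2
    spread = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + spread, half_trace - spread
    # Unit eigenvector for lam1: the PC1 loadings of gene 1 and gene 2.
    if b != 0:
        vx, vy = lam1 - c, b
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    pc1 = (vx / norm, vy / norm)
    # Project each centred sample onto PC1 to get its PC1 score.
    scores = [x * pc1[0] + y * pc1[1] for x, y in zip(c1, c2)]
    explained = lam1 / (lam1 + lam2)  # share of the total variance captured by PC1
    return pc1, scores, explained


# Hypothetical expression of two genes in five samples.
gene1 = [10.0, 11.0, 8.0, 3.0, 2.0]
gene2 = [6.0, 4.0, 5.0, 3.0, 2.8]
pc1, scores, explained = pca_two_genes(gene1, gene2)
print("PC1 loadings:", [round(v, 2) for v in pc1])
print("PC1 explains", round(100 * explained), "% of the variance")
```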
47
Q

PCA pros and cons ?

A

Pros:
* Can remove noise (correlated
features)
* Improves ML algorithms by
removing noise
→ reduces overfitting
* Visualisation
Cons:
* PCA turns independent variables
into PCs, which can be hard to
interpret
* Requires standardised data and
therefore does not work well on
mixed data
Karolinska Institutet 03/02/2023 35
t-SNE and UMAP are newer, non-linear dimensionality reduction methods
that can project the data better than PCA for visualisation,
making clusters easier to see

48
Q

How is ML actually used in medicine and
pharmacology?

A
  • ML algorithms are used together
  • Nested in networks or parts of
    pipelines
  • Used as tools, from a ML toolbox
  • Important to know when and
    why to use it
49
Q

GestaltMatcher and Face2Gene are examples of the use of which ML type?

A

Supervised classifiers are
often used in image
analysis, for example when
diagnosing rare diseases.
Here, KNN is nested into
a deep neural network.
The data points in the KNN are
other patients with known phenotypes

50
Q

What does DESeq2 do?

A
  • Most used method for analysing bulk RNA-sequencing
    data
  • Other methods are limma and edgeR. The common
    aim is to find differentially expressed genes
    (proteins, lipids etc.)
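DESeq2 itself is an R/Bioconductor package; as an illustration of one of its ingredients, here is a minimal sketch of the “median-of-ratios” size-factor normalisation it uses to correct for sequencing depth. The counts are hypothetical, and the later steps (negative binomial model, testing for differential expression) are not shown.

```python
import math
import statistics


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    `counts` maps gene -> list of raw read counts (one entry per sample)."""
    n_samples = len(next(iter(counts.values())))
    # Only genes with a non-zero count in every sample can enter the geometric mean.
    usable = [g for g, row in counts.items() if all(c > 0 for c in row)]
    # Pseudo-reference: geometric mean of each gene across all samples.
    geo_mean = {
        g: math.exp(sum(math.log(c) for c in counts[g]) / n_samples) for g in usable
    }
    # Size factor = median over genes of (count / pseudo-reference).
    return [
        statistics.median([counts[g][j] / geo_mean[g] for g in usable])
        for j in range(n_samples)
    ]


# Hypothetical counts: sample 2 was simply sequenced about twice as deeply.
counts = {
    "GENE_A": [10, 20],
    "GENE_B": [100, 200],
    "GENE_C": [5, 10],
    "GENE_D": [0, 3],  # contains a zero, so it is skipped when computing size factors
}
factors = size_factors(counts)
normalised = {g: [c / f for c, f in zip(row, factors)] for g, row in counts.items()}
print("size factors:", [round(f, 3) for f in factors])
print("normalised GENE_A:", [round(v, 2) for v in normalised["GENE_A"]])
```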