HC 8 - Multi-omics: Integration & Interaction Networks Flashcards
Lecture 8
Why multi-omics studies?
Because omics levels interact
Challenge of multi-omics
Integration of multiple omics
Why integration of multiple omics datasets?
> Describe parts of unknown system and effect after perturbation (alteration in function/ disruption)
find a good marker of system state or change
pinpoint a certain mechanism: which molecules are important for the development or treatment of a disease
Systems biomedicine: holistic measures for preventive medicine
Which kinds of measurements are important in systems biomedicine?
-Omics measurements
-diet, cardiovascular data
-Coaching sessions
P4 participatory medicine
Systems medicine, big data and patient involvement lead to predictive, preventive, personalized and participatory medicine.
Characteristics of systems medicine
-Pro-active patient involvement
-Earlier diagnosis, earlier treatment
-Effective stratification: personal treatment (more effective)
-reduction of time, costs, and error margins
Integration of datasets in systems biomedicine
Integration of personalized omics and clinical phenotyping
Synergy
Wet-lab experiments, bioinformatics and computational modeling
> communication between life scientists, medical doctors and computational scientists
Challenges multi-omics integration
- There are differences in the data
> Different technical limitations
> Different dynamic ranges
> Different number of analytes
- There are differences in time scales
-Incongruent analytes
-Unknown analytes
-Missing links
Differences in dynamic range between transcriptomics and metabolomics in a volcano plot
Many more significant hits for RNA, because sequencing has a higher dynamic range than mass spectrometry
Differences in time scales: metabolomics, proteomics, transcriptomics and phosphoproteomics life spans
-Phosphoproteomics: 15-200 s
-Metabolomics: <1 min
-Transcriptomics: 10 hr
-Proteomics: 1 day
Differences in time scale: order of pace/resp. time for metabolomics, proteomics, transcriptomics and phosphoproteomics
Phosphoproteomics: 15-200 s
Metabolomics: 0.1 µs - 10 s
Transcriptomics: 10-100 nt / s
Proteomics: 10 aa/s
Problem with different time scales
When do you perform the measurement? If you choose the moment that matters for metabolomics (because a difference is predicted there), is there interesting information in the other omics levels at that moment?
> if you measure at different time points, can you still compare them?
Problem of different data
If you measure more transcripts than metabolites (due to different technical limitations, dynamic ranges and numbers of analytes) > problems with isoforms and difficulties in statistical tests > which data are comparable?
Incongruent analytes
If you have a protein, can it be linked to a specific isoform of a transcript?
Missing links
Which metabolite belongs to which enzyme?
Approaches systems biology
-Bottom up
-Top down
Bottom up systems biology
-You use prior knowledge about the system and build a mathematical model from it to learn something about the whole system
> e.g. the enzymatic reactions of glycolysis: what happens to metabolites X, Y and Z when the input (glucose) is lower
> first X drops, then Y, then Z, until a new steady state is reached: formulate differential equations
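A minimal sketch of such a bottom-up model, assuming a hypothetical linear chain glucose -> X -> Y -> Z with first-order kinetics. The rate constants, step size and function name are illustrative assumptions, not values from the lecture. The differential equations are integrated with a simple forward Euler scheme until the concentrations settle.

```python
def simulate_chain(glucose, start=(0.0, 0.0, 0.0),
                   k=(1.0, 0.5, 0.25, 0.125), dt=0.01, steps=20000):
    """Forward-Euler integration of a hypothetical chain glucose -> X -> Y -> Z -> out.

    Returns the concentrations (x, y, z) after steps * dt time units.
    """
    k_in, k_xy, k_yz, k_out = k
    x, y, z = start
    for _ in range(steps):
        dx = k_in * glucose - k_xy * x   # X is made from glucose and converted to Y
        dy = k_xy * x - k_yz * y         # Y is made from X and converted to Z
        dz = k_yz * y - k_out * z        # Z is made from Y and leaves the system
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
    return x, y, z


high = simulate_chain(1.0)              # steady state at high glucose
low = simulate_chain(0.5, start=high)   # lower the input and let the system settle again
```

In this linear sketch the new steady state simply scales with the input. Real glycolysis contains non-linear steps (feedback, saturation), so its response is harder to predict by intuition.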
Bottom up with non-linear systems: the problem
> even simple systems may not allow us to make reliable predictions regarding their responses to stimuli > no obvious responses to changes in input
we can no longer rely on intuition to predict the response of such systems
Emergent properties
Systems properties that differ from the properties of the systems parts
> result from interactions
> self-organisation
Top-down systems biology
-Measuring omics
-Statistical methods to find out which of the measured parts matter for the process of interest
Can top-down systems biology detect the non-linear properties and model them?
Yes, but that is easier to see based on bottom-up approach. In top-down, it is more complicated.
Approaches multi-omics integration
-Using sequence information (top-down)
-Using purely (semi-) quantitative omics data (top-down)
-Using knowledge and omics data (bottom-up)
Multi-omics integration: using sequence information
-Based on steps of central dogma
-e.g. measure mRNA ratios and group gene functions into clusters by protein and transcript stability > how well protein and mRNA amounts correlate depends on the function in the cell
Integration based on quantitative data: what is quantitative data?
Something like an amount of molecules in a certain volume or per cell
Semiquantitative
You know that there is more/less phosphorylation but not exact for example
A small proportion of the dataset from the quantitative omics approach is categorical data like:
-Conditions
-Experimental treatments
-State of health
Integration based on quantitative data: common dimensionality reduction
Reduce the features of multiple omics levels, measured across the same samples, to a single matrix of samples against factors. Factors can relate to conditions, experimental treatments, health, age, organisation, tissue or time points; each feature gets a weight for its contribution to the factors.
> factors: the reduced dimensions from different dimensions
> integrate as samples against factors (like with PCA)
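A minimal sketch of the idea, assuming plain PCA on the concatenated, z-scored omics blocks as a stand-in for dedicated factor-analysis tools (which the lecture may have used instead). The function name and the power-iteration approach are illustrative assumptions. It returns one score per sample and one weight per feature for the first shared factor.

```python
import math


def first_factor(blocks, iters=200):
    """First principal component of several omics blocks measured on the same samples.

    blocks: list of matrices, each a list of samples, each sample a list of features.
    Returns (scores, weights): one score per sample, one weight per feature.
    """
    n = len(blocks[0])
    # Concatenate the features of all omics levels for every sample.
    samples = [[v for block in blocks for v in block[i]] for i in range(n)]
    p = len(samples[0])
    # z-score every feature so omics levels with different ranges are comparable.
    cols = list(zip(*samples))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1)) or 1.0
           for c, m in zip(cols, means)]
    z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in samples]
    # Power iteration on Z^T Z converges to the leading weight vector.
    weights = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        scores = [sum(r[j] * weights[j] for j in range(p)) for r in z]
        new = [sum(z[i][j] * scores[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in new)) or 1.0
        weights = [v / norm for v in new]
    scores = [sum(r[j] * weights[j] for j in range(p)) for r in z]
    return scores, weights


# Made-up data: 4 samples, 2 transcripts and 1 metabolite.
rna = [[1.0, 2.0], [2.0, 4.1], [3.0, 6.2], [4.0, 7.9]]
met = [[10.0], [8.0], [6.5], [4.0]]
scores, weights = first_factor([rna, met])
```

The scores place each sample on the factor axis; the weights show which transcripts and metabolites drive that placement.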
The position of every dot in a factor dotplot of the samples is determined by
the interaction/connection of multiple omics levels
> 1 set value per sample
> which genes were important for placing dots: dependent on the weights on different genes
Integration based on quantitative data: predictability
How well does one omics level predict another?
> e.g. effect microbiome on metabolome of host
> can the microbiome predict metabolites in the blood?
> microbiome knockout mice in a germ-free environment
> measure all microbes and analyse every metabolite > fit a separate regression for every metabolite with the microbiome as predictor
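A minimal sketch of "one regression per metabolite", under the simplifying assumption that each metabolite is predicted from a single microbial taxon at a time and scored by leave-one-out cross-validation. All names and numbers are hypothetical; a real analysis would use many taxa per model and a regularised regression.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx if sxx else 0.0
    return slope, my - slope * mx


def loo_q2(x, y):
    """Leave-one-out predictive R^2: 1 is perfect, 0 or lower is no predictive value."""
    my = sum(y) / len(y)
    sse = 0.0
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        sse += (y[i] - (slope * x[i] + intercept)) ** 2
    sst = sum((b - my) ** 2 for b in y)
    return 1.0 - sse / sst


def predictability(microbes, metabolites):
    """For every metabolite, the best leave-one-out Q2 over all single-taxon models."""
    return {name: max(loo_q2(x, y) for x in microbes.values())
            for name, y in metabolites.items()}


# Made-up abundances for 6 mice.
microbes = {"taxon_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}
metabolites = {
    "metabolite_1": [2.1, 3.9, 6.2, 8.0, 9.9, 12.1],  # tracks taxon_a
    "metabolite_2": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],   # unrelated
}
q2 = predictability(microbes, metabolites)
```

Metabolites with a high score are the ones worth following up; a high score is still an association, not proof of a causal link.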
Why integration based on quantitative data?
-Pinpoint mechanism (what is the reason for disease) > not possible, a causal relationship cannot be measured
-Describe parts of unknown system (what happens after perturbation)
-Find a good marker for system state or change (what is specific for disease)
What do you do with metabolites in blood which are well predictable by microbiome?
Check the function since they might be related to microbiome > bile acid metabolism
What is a biological model?
A (mathematical) description or representation of a biological system
> falsifiable simplification, usually of some predictive value
> is not a statistical model
Biological systems characteristics
-Non-linear
-Dynamic
-Have emergent properties
-Span scales (size, space, complexity, time)
Bottom up systems biology uses
mechanistic models
Integration based on quantitative data: association networks: what is association?
-Observed together
-Similar change (correlation)
-Other dependence/predictability (anti-correlation, mutual information)
How to make an association network with different omics levels
-Make nodes of different omics variables (metabolites, genes etc)
-Make edges between nodes with association
-Label edges red or blue for negative / positive correlation
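The three steps above can be sketched as follows, assuming Pearson correlation as the association measure and a hypothetical cut-off of |r| >= 0.8. The variable names and data are made up.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def association_network(variables, threshold=0.8):
    """Edge list (node_a, node_b, r, colour) for all strongly correlated pairs.

    Blue marks a positive correlation, red a negative one.
    """
    edges = []
    for (a, xa), (b, xb) in combinations(variables.items(), 2):
        r = pearson(xa, xb)
        if abs(r) >= threshold:
            edges.append((a, b, round(r, 3), "blue" if r > 0 else "red"))
    return edges


# Nodes: variables from different omics levels, measured in the same 5 samples.
variables = {
    "gene_1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "metab_1": [2.0, 4.0, 6.0, 8.0, 10.5],
    "metab_2": [5.0, 4.0, 3.0, 2.0, 1.0],
    "gene_2": [3.0, 1.0, 4.0, 1.0, 5.0],
}
edges = association_network(variables)
```

With these made-up values, gene_2 stays unconnected because none of its correlations reaches the threshold.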
What is mutual information
How much information about variable B is contained in variable A
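A minimal sketch of the calculation for discrete variables, assuming the measurements have already been binned (for example into "hi" and "lo"); the binning itself is a modelling choice and not shown. The result is in bits.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Mutual information (in bits) between two discrete, equal-length sequences."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * math.log2((c / n) / ((count_a[x] / n) * (count_b[y] / n)))
        for (x, y), c in count_ab.items()
    )


gene = ["hi", "hi", "lo", "lo"]
metabolite = ["lo", "lo", "hi", "hi"]   # perfectly anti-correlated with gene
other = ["hi", "lo", "hi", "lo"]        # independent of gene

mi_dependent = mutual_information(gene, metabolite)   # 1 bit
mi_independent = mutual_information(gene, other)      # 0 bits
```

Unlike correlation, mutual information has no sign: a perfect anti-correlation carries as much information as a perfect positive correlation.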
Correlation graph and associations
Positive correlation: positive slope
Negative correlation: negative slope
In association networks, different molecules from different omics levels are compared …
separately
A correlation can be spurious. How do you test for this?
Permutation test
> watch out for false positives
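A minimal sketch of a permutation test for one correlation, assuming Pearson correlation and a two-sided test; the number of permutations and the seed are arbitrary choices. One variable is shuffled many times to see how often chance alone gives a correlation at least as strong as the observed one.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the correlation between x and y."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)                       # break any real x-y pairing
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    # Add 1 to both counts so the p-value can never be exactly zero.
    return (hits + 1) / (n_perm + 1)


x = [float(v) for v in range(10)]
strong = [2.0 * v + 1.0 for v in x]
weak = [3.0, 9.0, 1.0, 7.0, 5.0, 0.0, 8.0, 2.0, 6.0, 4.0]
p_strong = permutation_p_value(x, strong)
p_weak = permutation_p_value(x, weak)
```

In a network with thousands of tested pairs, some small p-values still arise by chance, so a multiple-testing correction is needed on top of this.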
What is a network?
-Consisting of objects with pairwise relationship
-Objects: nodes (NL: knopen)
-Connections: edges (NL: kanten)
-Nodes and edges can have attributes
-Mathematicians call a network a graph
Why use networks
-Show interaction and test hypotheses
-Find out which objects show same patterns of behavior (delineate components of same interaction patterns)
-identify components with special interaction patterns
-maths needed, but less knowledge needed than for differential equations
-Visual
-intuition of a connected world (with hubs and centrality)
Sorts of networks
-Static vs dynamic
> which nodes are close to each other and which distances change?
-Directed vs undirected
> do the edges have a direction, like a kinase acting on a protein
> could be both ways
-Uni-, bi- or multipartite
> which types of nodes are in network (levels) and how are they connected?
-Weighted vs unweighted
> weights, labels
Bipartite
For example TFs and DNA
> DNA can interact with TFs but not with other DNA
> TFs can interact with DNA but not with other TFs
-Only from other levels
Unipartite
All levels can interact > all from the same group
Weights/labels in networks
-Thicker edges for stronger interactions
-Labels: positive or negative interactions
> mathematically needed, otherwise all edges are treated as equally important
Which network to choose?
Dependent on question: is direction important, or is that not possible (when measuring correlation: no direction)
Ways of presenting a network?
-Edge list: table which shows one edge per row
> two columns of source node and sink node
-Adjacency matrix: table which shows pairwise interactions as binary values (1 = edge, 0 = no edge).
> if directed, the rows are source nodes and the columns are sink nodes
> can be weighted: numbers other than 1 give the strength of the interaction
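A minimal sketch of converting one representation into the other, assuming edges are given as (source, sink) or (source, sink, weight) tuples. The node names are made up.

```python
def adjacency_matrix(edge_list, directed=True):
    """Build an adjacency matrix from an edge list.

    Rows are source nodes, columns are sink nodes. An edge without an explicit
    weight gets the value 1; absent edges are 0.
    """
    nodes = sorted({n for src, dst, *_ in edge_list for n in (src, dst)})
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for src, dst, *rest in edge_list:
        weight = rest[0] if rest else 1
        matrix[index[src]][index[dst]] = weight
        if not directed:
            matrix[index[dst]][index[src]] = weight   # undirected: symmetric matrix
    return nodes, matrix


edges = [("kinase_A", "protein_B"), ("protein_B", "protein_C", 0.5)]
nodes, matrix = adjacency_matrix(edges)
# nodes  -> ['kinase_A', 'protein_B', 'protein_C']
# matrix -> [[0, 1, 0], [0, 0, 0.5], [0, 0, 0]]
```

An edge list is compact for sparse networks; the matrix form is convenient for calculations but grows with the square of the number of nodes.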
What does a network tell us?
-Topological characteristics
> of network: connectedness, path length, how large is the number compared to whole number
> of node: centrality, degree (number of edges), clustering
> of edge: centrality, load
> of sets of nodes/edges: participation in paths, circles, modules, motifs: number of nodes in a structure
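A minimal sketch of three of these characteristics for an undirected, unweighted network stored as a dictionary of neighbour sets: degree, local clustering coefficient and shortest path length. The example network is made up.

```python
from collections import deque


def degree(adj, node):
    """Number of edges attached to a node."""
    return len(adj[node])


def clustering(adj, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    neighbours = list(adj[node])
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if neighbours[j] in adj[neighbours[i]])
    return 2.0 * links / (k * (k - 1))


def path_length(adj, source, target):
    """Shortest path length by breadth-first search; None if the nodes are unconnected."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A", "E"},
    "E": {"D"},
}
hub_degree = degree(adj, "A")            # 3
hub_clustering = clustering(adj, "A")    # 1 of 3 neighbour pairs is linked
distance = path_length(adj, "B", "E")    # B - A - D - E
```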
Networks to search subnetworks containing …
-Many nodes of interest
-Relevant paths
Correlation network can be used to compare ..
groups with disease or healthy for example
Integration of multi-omics: use prior knowledge and omics data (bottom-up)
Knowledge-based integration
> show the interaction between two omics levels using a model (e.g. of metabolism)
> you know that the mechanism of interaction exists but not how it changes
How is the knowledge for knowledge-based interaction derived?
E.g. protein-protein interactions from small studies focused on a specific TF and the TFs it interacts with
Detecting protein interactions with interactomics: what is the method called and how does it work?
Yeast two-hybrid screen
> test if protein A interacts with protein B
> split the TF of a reporter gene into two parts, the DNA-binding domain (DB) and the activation domain (AD), and fuse one to A and the other to B (DB-A, B-AD)
> if A and B interact > DB and AD come together and function as a normal transcription factor > expression of the reporter gene
-Performed in yeast
-Measure for 17,500 proteins as A and 17,500 as B
How is the knowledge of transcription-factor targets derived?
-ChIP-seq
-promoter sequence
-small scale studies
Where is the knowledge about phosphorylation and metabolism derived from?
Small-scale studies > databases
+ text-mining + real reading
Integration approach (bottom-up) first step
Analyse each dataset on its own and look for similar patterns
> identify problems: difference in technical limitations, dynamic ranges and amounts of analytes
Integration approach bottom-up second step
Enrich the analysis with prior information, e.g. transcription factor targets from ChIP-seq
> kinase and phosphatase targets
-Then: activity calculation
Activity calculation based on connecting transcripts and protein phosphorylation
Which TFs are connected to which transcripts, and which kinases or phosphatases to which phosphorylation sites
> activity difference between tumor and healthy > e.g. a kinase is more active in the tumor sample
-Then build multi-omics networks of transcripts, protein phosphorylation and metabolites based on model of the signalling and metabolism.
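A minimal sketch of an activity calculation, assuming the activity of a regulator (TF, kinase or phosphatase) is estimated as the mean signed change of its known targets between tumor and healthy. This is a simplification for illustration and not necessarily the scoring used by COSMOS; all names and numbers are hypothetical.

```python
def regulator_activity(targets, changes):
    """Estimate regulator activity from the measured changes of its targets.

    targets: {regulator: {target: +1 (activates) or -1 (represses)}} from prior knowledge.
    changes: {target: log2 fold change, tumor vs healthy} from the omics data.
    Regulators without any measured target are left out.
    """
    activity = {}
    for regulator, known_targets in targets.items():
        measured = [sign * changes[t] for t, sign in known_targets.items() if t in changes]
        if measured:
            activity[regulator] = sum(measured) / len(measured)
    return activity


targets = {
    "TF_1": {"gene_a": 1, "gene_b": 1, "gene_c": -1},   # transcripts
    "kinase_1": {"site_x": 1, "site_y": 1},             # phosphorylation sites
    "TF_2": {"gene_z": 1},                              # target not measured
}
changes = {"gene_a": 2.0, "gene_b": 1.0, "gene_c": -3.0, "site_x": -1.0, "site_y": -0.5}
activity = regulator_activity(targets, changes)
# TF_1 comes out as more active in tumor, kinase_1 as less active.
```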
COSMOS network
multi-omics integrated network by COSMOS software
> some nodes are measured and some are from databases
Analysing a network
-Smart algorithm: which possible paths have the largest possible effect or agree best with measurements
-Highest scoring subnetwork: subnetwork with best agreement
> like tumor suppressors, or purine metabolism
> forming a new hypothesis: e.g. are these nodes good markers for tumors?
Validation of hypothesis based on network
Different multi-omics dataset needed
> do you find the same subnetworks?
When the follow-up experiment is the investigation of a specific response, what needs to happen with the amount of replicates in the experimental design?
-Possibly less
> no need to adjust FDR (FDR is lower)
> less needed for statistical power
> targeted assay is more precise
-Possibly more
> because measurement is simpler, cheaper, quicker
> we may want to reach more certainty or be more general
How can a study of microbiome as predictors for metabolites be validated? Microbe to metabolite
By growing microbes in a medium with heavy isotopes of carbon
> metabolomics with TOF
> identify labeled metabolites
> calculate ratio labelled/unlabelled
> compare colonized and control
- vice versa: inject mouse with isotope labelled threonine and analyse gut content
> you know the direction, the metabolites are derived from microbes
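A minimal sketch of the ratio comparison, assuming each metabolite has a labelled and an unlabelled peak intensity per group and that a hypothetical two-fold higher ratio in colonized animals counts as a hit. The threshold and all values are made up.

```python
def labelled_ratio(labelled, unlabelled):
    """Ratio of labelled to unlabelled signal for one metabolite."""
    return labelled / unlabelled if unlabelled else float("inf")


def microbe_derived(colonized, control, min_fold=2.0):
    """Metabolites whose labelled/unlabelled ratio is clearly higher in colonized animals.

    colonized, control: {metabolite: (labelled intensity, unlabelled intensity)}.
    """
    hits = []
    for metabolite, (lab, unlab) in colonized.items():
        if metabolite not in control:
            continue
        ratio_colonized = labelled_ratio(lab, unlab)
        ratio_control = labelled_ratio(*control[metabolite])
        if ratio_colonized > 0 and ratio_colonized >= min_fold * ratio_control:
            hits.append(metabolite)
    return hits


colonized = {"met_1": (80.0, 20.0), "met_2": (1.0, 99.0)}
control = {"met_1": (2.0, 98.0), "met_2": (1.0, 99.0)}
hits = microbe_derived(colonized, control)   # only met_1 carries extra label
```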
Validation: microbial metabolite to effect
> find candidate gene in metagenome
express enzyme for microbial metabolite biosynthesis in lab-bacteria
extract metabolites from bacteria
check for presence of metabolite
check for effect
Validation: screening for signalling
-Extract metabolites from microbiome
> fractionate
> determine metabolites
> incubate cell lines expressing receptor-reporters with metabolite fractions
> one cell line per receptor
> retrieve receptor-metabolite pairs
Challenges for multi-omics integration: which approaches address them?
-Differences in timing
> Sequence information
> Quantitative data
-Incongruent analytes
> Sequence information
> Knowledge-based
-Different technical limitations and biases, dynamic range, number of analytes
> Quantitative data
-Unknown analytes or missing links
> Knowledge-based