Data Integration Flashcards

1
Q

Data integration

A

Take multiple datasets and bring them together for a joint analysis. Three approaches:

Conceptual integration
Statistical integration
Model-based integration

2
Q

Source-matched

A

Different kinds of samples or information are collected from the same source (the same individuals). For example, urine and blood samples are taken from the same people, and the resulting datasets are then integrated.

3
Q

Split sample study

A

Each collected sample is split in half, and a different measurement is made on each half.
Problem: each sample must be large enough to split.

4
Q

Conceptual integration

A

Multiple omics datasets are analysed separately, and the resulting conclusions are then matched without any further analysis of the data as a whole.

5
Q

Statistical integration

A

Statistical associations are sought between elements of the different datasets. Main types:

Correlation-based integration
Concatenation-based integration
Multivariate-based integration
Pathway-based integration

6
Q

Model-based integration

A

A computational model built from prior biological knowledge is used to generate data that is not yet available.

7
Q

Repeated study

A

The entire experiment is redone: not the same samples, and not done at the same time.
Problem: the two runs are not exactly reproducible.
Batch effects are introduced and cannot be corrected.
BUT the measurements are independent, which the analysis can take advantage of.

8
Q

Replicate-matched study

A

Every sample is taken twice, one replicate for each omics measurement.
Differs critically from a repeated study: the samples for both omics are produced/obtained at the same time, so the introduction of batch effects is avoided.
The two sets of measurements are NOT statistically independent.
Use when the sample cannot be split.

9
Q

Concatenation-based integration

A

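The datasets are placed one after the other (concatenated) into a single table, and the analysis is performed on the result.
Problems: different omics have different distributions and background noise. The datasets can also have different numbers of variables, which can give the larger one too much weight (normalization can correct this). Entities from the same dataset tend to cluster together.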
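The weighting problem can be sketched as follows. This is a minimal illustration, assuming samples in rows and two made-up blocks (`transcripts` with four variables, `metabolites` with two). Each variable is autoscaled, then each block is divided by the square root of its number of variables so that both blocks carry the same total variance before concatenation. The data, the function names, and the choice of block scaling are assumptions for illustration; other normalizations are possible.

```python
import math

def autoscale(block):
    """Centre each column to mean 0 and scale it to unit (population) variance."""
    n = len(block)
    scaled_cols = []
    for col in zip(*block):
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        scaled_cols.append([(v - mean) / sd for v in col])
    return [list(row) for row in zip(*scaled_cols)]

def block_scale(block):
    """Autoscale, then divide by sqrt(number of variables): total block variance becomes 1."""
    p = len(block[0])
    return [[v / math.sqrt(p) for v in row] for row in autoscale(block)]

def total_variance(block):
    """Sum of the (population) variances of all columns."""
    n = len(block)
    total = 0.0
    for col in zip(*block):
        mean = sum(col) / n
        total += sum((v - mean) ** 2 for v in col) / n
    return total

def concatenate(block_a, block_b):
    """Join two blocks column-wise; rows must be the same samples in the same order."""
    return [row_a + row_b for row_a, row_b in zip(block_a, block_b)]

# Hypothetical data: 4 samples, 4 transcript variables and 2 metabolite variables.
transcripts = [
    [120.0, 30.0, 5.0, 800.0],
    [150.0, 28.0, 7.0, 760.0],
    [90.0, 35.0, 4.0, 900.0],
    [110.0, 31.0, 6.0, 820.0],
]
metabolites = [
    [0.8, 2.1],
    [1.1, 1.9],
    [0.5, 2.6],
    [0.9, 2.2],
]

scaled_t = block_scale(transcripts)
scaled_m = block_scale(metabolites)
joint = concatenate(scaled_t, scaled_m)

print(len(joint), "samples x", len(joint[0]), "variables")
print(round(total_variance(scaled_t), 6), round(total_variance(scaled_m), 6))
```

Without the block scaling step, the transcript block would contribute twice as much variance as the metabolite block simply because it has twice as many variables.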

10
Q

Correlation-based integration

A

Similarity measures (correlations) are computed between elements of the two datasets.
Metabolites can be positively or negatively correlated under one set of conditions and uncorrelated under another. These relationships can cancel out, so a correlation can be hidden when the two sets are put together.
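A small sketch of how pooling can hide a correlation. It assumes the extreme case in which the sign of the relationship flips between two conditions; the numbers and variable names are invented for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical transcript and metabolite levels under two conditions.
transcript_a = [1.0, 2.0, 3.0, 4.0]
metabolite_a = [2.0, 4.0, 6.0, 8.0]   # rises with the transcript
transcript_b = [1.0, 2.0, 3.0, 4.0]
metabolite_b = [8.0, 6.0, 4.0, 2.0]   # falls with the transcript

r_a = pearson(transcript_a, metabolite_a)
r_b = pearson(transcript_b, metabolite_b)
r_pooled = pearson(transcript_a + transcript_b, metabolite_a + metabolite_b)

print(round(r_a, 3), round(r_b, 3), round(r_pooled, 3))
```

Computing the correlation per condition recovers both relationships, whereas the pooled coefficient suggests no association at all.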

Correlation can also be used to compare profiles over time. Related changes do not occur simultaneously, so the time series must first be aligned, for example with dynamic time warping.
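A minimal sketch of dynamic time warping, assuming two short, made-up time series in which the metabolite follows the transcript with a delay of one time point. The absolute difference is used as the local cost; this is the textbook dynamic-programming form, not a specific library's implementation.

```python
def dtw_distance(a, b):
    """Cost of the cheapest monotone alignment between series a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays on the same point
                cost[i][j - 1],      # b advances, a stays on the same point
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]

# Hypothetical profiles: same shape, metabolite delayed by one time point.
transcript = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0]
metabolite = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0]

pointwise = sum(abs(x - y) for x, y in zip(transcript, metabolite))
warped = dtw_distance(transcript, metabolite)

print(pointwise, warped)
```

Comparing the series time point by time point reports a large difference, while the warped alignment shows that the two profiles have the same shape.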

11
Q

Multivariate Data integration

A

PCA (principal component analysis): each direction (principal component) captures the most remaining variance. Copes well with collinearity. The score plot is typically drawn with an ellipse.
PLS (partial least squares): related to PCA; a mathematical model links the two data matrices. Principal components and latent variables represent the covariance.
PLS-DA: a PLS regression model used to predict classes.
O-PLS: allows the model to be rotated so that directions are either orthogonal or parallel to the class separation. What is not important for separating the classes can be shown vertically and what is important horizontally, which makes the classes easier to distinguish.
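The idea that the first principal component is the direction of greatest variance, even when variables are collinear, can be sketched as below. This is an illustration only: the data are invented, and power iteration on the covariance matrix stands in for the eigendecomposition or SVD that a statistics library would normally use.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix (n - 1 denominator) of a samples-by-variables table."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    centred = [[v - m for v in c] for c, m in zip(cols, means)]
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in centred]
        for ci in centred
    ]

def first_principal_component(cov, iterations=100):
    """Leading eigenvector (loadings) and eigenvalue of cov, by power iteration."""
    v = [1.0] * len(cov)
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance along v is the quadratic form v' * cov * v.
    variance = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(len(v)))
        for i in range(len(v))
    )
    return v, variance

# Hypothetical data: two nearly collinear variables measured on 5 samples.
data = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.0]]

cov = covariance_matrix(data)
loadings, variance = first_principal_component(cov)
explained = variance / sum(cov[i][i] for i in range(len(cov)))

print([round(x, 3) for x in loadings], round(explained, 4))
```

Because the two variables move together, a single component accounts for almost all of the variance, which is why collinear variables are not a problem for this family of methods.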

12
Q

Pathway-based integration

A

Pathways contain genes/proteins and metabolites, so the different omics datasets can be mapped onto the same pathways.

The results are easy to interpret.
