HC 7 - Metabolomics Data Analysis 2: Biomarkers & Validation Flashcards

Lecture 7

1
Q

PCA scores and loadings

A

Scores: describe the samples in X
Loadings: show which variables are important for the differences between samples

2
Q

DPLS scores and regression coefficients

A

PLS scores: describe X and predict Y
PLS regression coefficients: indicate the important variables for discrimination

3
Q

After a biomarker pattern is found, what is done next?

A

> Statistical validation: repeat the steps from clean data to biomarker pattern
> Biological and/or external validation

4
Q

Biological validation

A

New experiments
> PLS calculates scores and regression coefficients, but is the analysis correct?

5
Q

PLS model: multivariate model

A

Use the coherence between variables to build a model that discriminates between classes
> you need a PLS method that describes the coherence between the variables
> you want a situation with fewer variables than groups
> the variables are combined into scores

6
Q

Optimistic bias in statistical models

A

All statistical models are too optimistic because their parameters are estimated from the same data that was used to build the model

7
Q

Validation of model principle

A

Can the classification model predict the health status of new individuals?

8
Q

Statistical multivariate model validation approaches

A

-Average prediction error
-Statistical significance (p-value) of multivariate model

9
Q

Main points Average prediction error

A

-Confidence intervals for new samples
-Separate blinded test set
-For a small number of samples: cross-validation (leave one individual out, then use the model to predict it)

10
Q

Main points statistical significance of multivariate model

A

-H0 distribution: is the difference larger than expected when the groups are equal?
-Permutations: assign each individual to a class at random; does the prediction model differ from this situation?

11
Q

Average prediction error: test set validation

A

Use the bPLS coefficients to predict the class of a test set whose true classes are known. Measure the prediction error on the test set.
(e.g. y-pred is 0.3 for a healthy individual whose true value is 1; with a threshold of 0.5 this sample is misclassified)
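
The thresholding step in this card can be sketched in a few lines of Python. This is only an illustration: the function names, the example values and the coding (1 = healthy, 0 = patient, threshold 0.5) are assumptions chosen to match the example above, and the y-pred values are made up rather than produced by a real PLS model.

```python
def classify(y_pred, threshold=0.5):
    """Turn continuous predictions into class labels (1 if >= threshold, else 0)."""
    return [1 if value >= threshold else 0 for value in y_pred]


def prediction_error(y_true, y_pred, threshold=0.5):
    """Fraction of test samples whose predicted class differs from the known class."""
    labels = classify(y_pred, threshold)
    errors = sum(1 for truth, label in zip(y_true, labels) if truth != label)
    return errors / len(y_true)


# Hypothetical test set: known classes and model predictions (illustrative values).
y_true = [1, 1, 0, 0, 1]
y_pred = [0.9, 0.3, 0.2, 0.6, 0.7]

error_rate = prediction_error(y_true, y_pred)
print(error_rate)  # 2 of 5 samples are misclassified -> 0.4
```

The second sample is the case from the card: a healthy individual (true value 1) with y-pred 0.3 falls below the threshold and counts as an error.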

12
Q

Summarize prediction errors in a confusion table. How?

A

The columns give the true condition (positive and negative, e.g. patient and healthy) and the rows give the prediction (positive and negative).
> So the cell with a negative true condition and a positive prediction holds the false positives, and so on
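
A small sketch of how such a table could be counted from known and predicted classes. The function name `confusion_table`, the label coding (1 = positive) and the example data are assumptions for illustration only.

```python
def confusion_table(y_true, y_pred, positive=1):
    """Count TP, FP, FN and TN from true and predicted class labels."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            counts["TP"] += 1
        elif pred == positive:
            counts["FP"] += 1  # predicted positive, truly negative
        elif truth == positive:
            counts["FN"] += 1  # predicted negative, truly positive
        else:
            counts["TN"] += 1
    return counts


# Hypothetical labels (1 = positive, 0 = negative).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
table = confusion_table(y_true, y_pred)

# Rows = prediction, columns = true condition, as described in the card.
print("            true +  true -")
print(f"predicted +   {table['TP']}       {table['FP']}")
print(f"predicted -   {table['FN']}       {table['TN']}")
```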

13
Q

Sensitivity

A

True Positives / (True Positives + False Negatives)

14
Q

Specificity

A

True Negatives / (True Negatives + False Positives)
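
As a worked example, sensitivity and specificity can be computed directly from the four confusion-table counts. The counts below are invented for illustration.

```python
def sensitivity(tp, fn):
    """True Positives / (True Positives + False Negatives)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True Negatives / (True Negatives + False Positives)."""
    return tn / (tn + fp)


# Hypothetical counts: 50 truly positive and 50 truly negative individuals.
tp, fn = 40, 10
tn, fp = 45, 5

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(sens, spec)  # 40/50 and 45/50
```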

15
Q

Cross validation

A

For a small sample size, divide the samples into Xtest and Xtrain
> Build an optimized model M with Xtrain (M = bPLS coefficients)
> Use M to predict Y for Xtest
> Repeat with different Xtrain and Xtest until each sample has been in a test set
> Count the number of misclassifications
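
The loop described above can be sketched as leave-one-out cross-validation. To keep the example self-contained, the PLS model is replaced by a deliberately simple stand-in (assign a sample to the nearest class mean of a single variable); this is an assumption for illustration, not the PLS-DA method from the lecture, and all names and data are hypothetical.

```python
def fit_class_means(x_train, y_train):
    """Stand-in 'model': the mean of the variable per class."""
    means = {}
    for label in set(y_train):
        values = [x for x, y in zip(x_train, y_train) if y == label]
        means[label] = sum(values) / len(values)
    return means


def predict(means, x):
    """Predict the class whose mean is closest to x."""
    return min(means, key=lambda label: abs(x - means[label]))


def leave_one_out_errors(x, y):
    """Leave each sample out once, refit on the rest, and count misclassifications."""
    errors = 0
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]          # training set without sample i
        y_train = y[:i] + y[i + 1:]
        model = fit_class_means(x_train, y_train)  # new model every round
        if predict(model, x[i]) != y[i]:     # sample i is the test set
            errors += 1
    return errors


# Hypothetical data: one variable, two classes, two atypical samples.
x = [1.0, 1.2, 0.9, 2.6, 3.0, 3.1, 2.9, 1.1]
y = [0, 0, 0, 0, 1, 1, 1, 1]
n_errors = leave_one_out_errors(x, y)
print(n_errors)  # the two atypical samples (2.6 and 1.1) are misclassified -> 2
```

Note that the model is refitted in every round, so the left-out sample never contributes to the model that predicts it.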

16
Q

When using your training set as a test set, then …

A

there is a biased prediction (too optimistic for new data)

17
Q

Statistical significance validation

A

-Make permutations of the class labels (or y-values)
-Build new models between X and the permuted y; these represent situations where there is no link between X and y
-Compare the original prediction error with the prediction errors of many models built on permuted data
-Calculate the p-value
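
A minimal sketch of the permutation step, assuming a very simple stand-in model (nearest class mean on one variable) instead of PLS and, for brevity, counting errors on the same data the stand-in was fitted on. In practice a cross-validated prediction error would be the more appropriate quantity to compare. All names, the seed and the data are hypothetical.

```python
import random


def misclassifications(x, y):
    """Stand-in model: assign each sample to the nearest class mean, count errors."""
    means = {}
    for label in set(y):
        values = [xi for xi, yi in zip(x, y) if yi == label]
        means[label] = sum(values) / len(values)
    return sum(
        1
        for xi, yi in zip(x, y)
        if min(means, key=lambda c: abs(xi - means[c])) != yi
    )


def permutation_errors(x, y, n_permutations=200, seed=0):
    """Shuffle the class labels many times and record the error of each model."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_permutations):
        y_perm = y[:]            # copy so the original labels stay intact
        rng.shuffle(y_perm)      # breaks any link between X and y
        errors.append(misclassifications(x, y_perm))
    return errors


# Hypothetical, well-separated data.
x = [1.0, 1.2, 0.9, 1.1, 3.0, 3.1, 2.9, 3.2]
y = [0, 0, 0, 0, 1, 1, 1, 1]

original_error = misclassifications(x, y)
null_errors = permutation_errors(x, y)
print(original_error, min(null_errors), max(null_errors))
```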

18
Q

How to calculate p-value for statistical test for validation of model

A

p < (1 + number of permutation models that perform better than the original) / total number of permutations
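
The formula on this card can be written out as a short function. Here "better" is interpreted as strictly fewer misclassifications than the original model, which is an assumption; the error counts are invented for illustration.

```python
def permutation_p_value(original_error, permuted_errors):
    """Upper bound on p: (1 + number of better permutation models) / number of permutations."""
    n_better = sum(1 for error in permuted_errors if error < original_error)
    return (1 + n_better) / len(permuted_errors)


# Hypothetical misclassification counts: original model and 10 permutation models.
original_error = 2
permuted_errors = [5, 4, 6, 3, 1, 5, 4, 7, 5, 4]

p_bound = permutation_p_value(original_error, permuted_errors)
print(p_bound)  # one permutation model (1 error) is better -> (1 + 1) / 10 = 0.2
```

With only 10 permutations the smallest attainable bound is 0.1, which is why many permutations are needed to reach a small p-value.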

19
Q

PLS-DA validation with cross model validation and permutation tests

A

> Measure the misclassifications with cross-validation
> Calculate the p-value with permutation models

20
Q

The p-value for permutation validation is significant for metabolites which …

A

are the best indicators for the coefficients and the classes

21
Q

What is the H0 for permutations?

A

During permutation, bPLS coefficients are estimated for the meaningless permutation models. The H0 distribution is built from the numbers of misclassifications of these models

22
Q

What are the expected bPLS values for the permutation models?

A

In the permutation models there is no relationship between X and y, so no variable has an effect and the coefficients are expected to vary around 0

23
Q

If the variable (metabolite/biomarker) is important, then the bPLS coefficients from the model should be …. than the bPLS coefficients from permutation

A

Larger or smaller.

24
Q

Visualize a plot that shows the variables ranked by permuted coefficients on the x-axis and the coefficients on the y-axis. Black lines form a sideways parabola around 0, with a spiky red curve in between and red circles where the red line crosses the black lines. What could this mean?

A

-The black lines indicate the 95% confidence interval under H0 from the permutation tests (H0: PLS regression coefficient = 0)
-The red line: the PLS regression coefficient of each variable in the original model
-Red circles: points where the original model falls outside the 95% CI of the permutation-model coefficients, so the coefficient differs significantly from the bPLS coefficients of the permutation test. These variables are good biomarkers
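
The comparison behind the red circles can be sketched as follows: for each variable, take the 2.5% and 97.5% quantiles of its permutation coefficients and flag the variable when the original coefficient lies outside that range. The data are synthetic and the function names are hypothetical; a real analysis would use the coefficients of the permuted PLS models.

```python
from statistics import quantiles


def permutation_interval(perm_coeffs):
    """Empirical 95% interval: the 2.5% and 97.5% quantiles of the permutation coefficients."""
    cuts = quantiles(perm_coeffs, n=40)  # 39 cut points at 2.5% steps
    return cuts[0], cuts[-1]


def significant_variables(original, permuted):
    """Flag each variable whose original coefficient falls outside its permutation interval."""
    flags = []
    for coeff, perm in zip(original, permuted):
        low, high = permutation_interval(perm)
        flags.append(coeff < low or coeff > high)
    return flags


# Synthetic permutation coefficients: 100 values spread evenly between -1 and 1.
null_values = [-1.0 + 2.0 * i / 99 for i in range(100)]
permuted = [null_values, null_values, null_values]   # same H0 spread for 3 variables
original = [0.1, 1.5, -1.2]                          # coefficients of the original model

flags = significant_variables(original, permuted)
print(flags)  # [False, True, True]
```

Both a clearly larger (1.5) and a clearly smaller (-1.2) coefficient are flagged, since the interval is two-sided.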

25
Q

Explain the following concepts in one sentence: Cross validation, training set, test set, permutation test

A

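-Cross-validation: used with a small number of samples; each sample is left out in turn and predicted by a model built on the remaining samples
-Training set: the data used to create the model (estimate loadings/coefficients)
-Test set: new data used to test the PLS regression coefficients by predicting the class of new samples
-Permutation test: random reordering of the class labels to model the situation without a link between X and y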

26
Q

Do you make a new model every time when you take individuals from the data for prediction in cross validation?

A

Yes, with cross-validation you create a new model every time; the training set is also slightly different each time because different samples are left out.
- You take out one or more samples as the test set and use the rest as the training set. Rather than cross-validation, it is preferable to use a good test cohort, so that the model is validated on individuals other than those in the training set.

27
Q

PCA loadings (P) give information about

A

The most important variables to describe variation between samples in the data

28
Q

DPLS: bPLS coefficients give most information on

A

Important variables for discrimination

29
Q

Data interpretation, what does it mean if a certain group of metabolites is more abundant in healthy or disease condition?

A

Then they may be part of a network, or pathway
> pathway analysis
> network visualization
> Comprehensive metabolite databases

30
Q

Let's say you have a cumulative predictive accuracy list, and the top metabolite is maltose. What happens if we remove maltose from the prediction model?

A

The predictive accuracy becomes a lot smaller

31
Q

To identify groups of metabolites which are very different between the groups and therefore predictive, what can be done with the cumulative prediction accuracy list?

A

Give metabolites color codes for different metabolite groups, like glucose metabolism for maltose

32
Q

How do you recognize a specific over-represented group of metabolites in the cumulative accuracy list?

A

Lots of metabolites of same group in the top

33
Q

For biological interpretation of the group of biomarkers, it is important to change interpretation from … to …

A

single metabolites to groups or pathways like biochemical pathways

34
Q

Over-representation analysis input and output

A

Use a list of significant metabolites (not ordered)
> metabolite set enrichment
> over-representation analysis
> result: biological processes which are different between the groups

35
Q

Metabolite set enrichment

A

Give weights to metabolites based on relevance (fold change or importance)

36
Q

Single Sample Profiling

A

-Check whether the metabolite concentrations fall in the normal range; if not, compare the deviation pattern to known causes

37
Q

Over-representation analysis uses a …. distribution

A

hypergeometric

38
Q

Binomial coefficient

A

n over k > in how many ways can I select k objects from a group of n?
(n|k) = n! / (k! * (n-k)!)
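
A quick check of the factorial formula in Python. `math.comb` from the standard library (Python 3.8+) computes the same quantity; the helper name `binomial` is just for illustration.

```python
from math import comb, factorial


def binomial(n, k):
    """Number of ways to select k objects from n: n! / (k! * (n-k)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))


print(binomial(5, 2))    # 10 ways to pick 2 objects out of 5
print(binomial(13, 10))  # 286
print(binomial(52, 12) == comb(52, 12))  # the formula agrees with math.comb
```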

39
Q

Over-representation analysis using the hypergeometric distribution. Take 52 metabolites divided into 4 pathway groups (A-D) of 13 metabolites each. If you select 12 metabolites, what is the probability that exactly 10 are from pathway A (because this is what was actually measured)?

A

Probability using binomial coefficients = ( (13|10) * (39|2) ) / (52|12)
-(13|10): 10 metabolites from 13 possible in A
-(39|2): 2 metabolites from 39 of B-D.
-(52|12): all possibilities of selecting 12 from 52.
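
The numbers from this card can be filled in with `math.comb`. The function name and parameter names are hypothetical; the second function, which sums the probabilities for 10 or more metabolites from pathway A, is an addition for comparison and is not part of the card.

```python
from math import comb


def hypergeom_pmf(k, group_size, total, n_selected):
    """P(exactly k of the selected items come from a group of the given size)."""
    return (
        comb(group_size, k) * comb(total - group_size, n_selected - k)
    ) / comb(total, n_selected)


def hypergeom_at_least(k, group_size, total, n_selected):
    """P(k or more of the selected items come from the group)."""
    upper = min(group_size, n_selected)
    return sum(hypergeom_pmf(i, group_size, total, n_selected) for i in range(k, upper + 1))


# 52 metabolites, pathway A has 13, we select 12 and find 10 from pathway A.
p_exact = hypergeom_pmf(10, group_size=13, total=52, n_selected=12)
p_tail = hypergeom_at_least(10, group_size=13, total=52, n_selected=12)

print(comb(13, 10), comb(39, 2), comb(52, 12))  # 286 741 206379406870
print(p_exact)  # on the order of 1e-6
print(p_tail)   # slightly larger than p_exact
```

A probability this small means that finding 10 of 12 selected metabolites in pathway A is very unlikely to be a chance result.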

40
Q

If the probability from the over-representation is very small, then …

A

there is a systematic and important effect between the two groups. The probability is too low to attribute the result to chance, so the disease (or class) is related to this metabolite group > pathway A differs between the two groups