Chapter 11 + 12 Flashcards

Corresponding to the lectures (HCs) of week 1 of the second part of the course

1
Q

Similarities between the data analysis pipelines of different omics approaches

A

-All technologies yield many measurements for each sample: hundreds or thousands of variables per sample, such as genes, proteins or metabolites
-High dimensionality is handled in the same way

2
Q

Samples organised in a data matrix

A

Rows: the samples
Columns: the variables (like genes)
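A minimal sketch of this layout, assuming NumPy; the sample names, variable names and values are hypothetical placeholders chosen only to show that a row holds one sample and a column holds one variable.

```python
import numpy as np

# Hypothetical example: 4 samples (rows) x 3 variables (columns).
samples = ["control_1", "control_2", "treated_1", "treated_2"]
variables = ["gene_A", "gene_B", "gene_C"]

# Each row holds all measurements of one sample,
# each column holds one variable measured across all samples.
X = np.array([
    [5.1, 0.0, 12.3],
    [4.8, 0.2, 11.9],
    [7.9, 1.1, 12.0],
    [8.3, 0.9, 12.4],
])

n_samples, n_variables = X.shape
print(n_samples, n_variables)  # 4 3

# All variables of one sample: a row
print(X[samples.index("treated_1"), :])
# One variable across all samples: a column
print(X[:, variables.index("gene_B")])
```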

3
Q

Four components of the generalized data analysis pipeline

A
  1. Experimental design and data collection
  2. Data preprocessing and quality control
  3. Data analysis
  4. Biological interpretation

4
Q

The experimental design can have a large impact on the statistical power and therefore on the …

A

Conclusions that are reached

5
Q

First step in experimental design

A

Frame a biological question

6
Q

What is the aim of the biological question?

A

Determine the hypothesis that will be tested and the statistical test that will be executed.

7
Q

What does the biological question determine that is needed for an interpretable and successful outcome?

A

The experimental preconditions

8
Q

Three main types of objectives, each requiring a different type of experimental design

A
  1. Detection of responsive features under controlled experimental conditions (perturbation study)
  2. Detection of biomarkers
  3. Identification of regulatory or mechanistic relationships between variables

9
Q

Experimental design steps after framing the biological question

A

Identify noise factors and design the experiment

10
Q

Noise factors

A

Factors that can disturb a proper measurement (from the biological experiment up to and including the measurement)

11
Q

Noise factors can lead to …

A

Bias

12
Q

Three basic principles to deal with noise factors.

A
  1. Replication
  2. Randomization
  3. Blocking

13
Q

What is the aim of the experimental design?

A

Ensure reliable measurements free from bias

14
Q

Replication

A

Duplicate, repeat or perform the same measurement more than once
> to obtain an estimate of the experimental error
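How replicates yield an estimate of the experimental error can be sketched with the Python standard library. The replicate values below are hypothetical, and reporting the error as a standard deviation, coefficient of variation (CV) and standard error of the mean is a common choice assumed here, not one prescribed by the course.

```python
import statistics

# Hypothetical replicate measurements of one variable (e.g. a metabolite
# intensity) in the same sample; the values are illustrative only.
replicates = [102.0, 98.0, 101.0, 99.0]

mean = statistics.mean(replicates)
sd = statistics.stdev(replicates)        # sample standard deviation (n - 1)
cv_percent = 100 * sd / mean             # relative error: coefficient of variation
sem = sd / len(replicates) ** 0.5        # standard error of the mean

print(f"mean={mean:.1f} sd={sd:.2f} CV={cv_percent:.1f}% SEM={sem:.2f}")
# mean=100.0 sd=1.83 CV=1.8% SEM=0.91
```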

15
Q

On what does the type of error estimated with replication depend?

A

On how the replication is done
> For estimating and controlling biological variability, different organisms or batches of cell samples should be processed in the same manner.

16
Q

Types of replication errors

A

-Repeatability: error based on repeated measurements of the same sample
-Reproducibility: error based on sample workup, sampling or the whole experiment (larger errors)

17
Q

Types of replicates

A

-Biological replicates: capture the error of the whole experiment, including differences between organisms > the interest is not in one individual
-Technical replicates: repeated measurements of the same sample, to gain statistical power

18
Q

Randomization

A

The experimenter uses random choices for every factor that is not of interest but might influence the outcome of the experiment
> random selection of individuals for the groups
> hybridization of mRNA samples from the treatment and control group is sensitive to external factors: do not measure all controls first and then all treated samples, otherwise a time effect (not of interest) cannot be distinguished from the treatment effect (of interest)
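Random selection of individuals for the groups can be sketched as a shuffle followed by a split. The individual IDs (`mouse_01` and so on) are hypothetical, and the fixed seed is only there to make the example reproducible.

```python
import random

# Hypothetical pool of individuals (IDs are placeholders).
individuals = [f"mouse_{i:02d}" for i in range(1, 13)]

rng = random.Random(42)   # fixed seed only to make the example reproducible
shuffled = individuals[:]
rng.shuffle(shuffled)

# Split the shuffled list in half: group membership no longer depends on any
# ordering of the individuals (e.g. cage, arrival time, weight).
half = len(shuffled) // 2
control = sorted(shuffled[:half])
treated = sorted(shuffled[half:])

print("control:", control)
print("treated:", treated)
```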

19
Q

Confounder

A

A confounder is a variable whose presence affects the variables being studied, so that the results do not reflect the actual relationship (e.g. time: randomize over time to eliminate the bias)

20
Q

Blocking

A

Arranging experimental samples in groups (blocks) that are similar to one another
(e.g. by gender, or by LC column)
> within each block, treated and control samples should be similarly represented
> blocks are also needed when not all measurements can be done on one day
> this eliminates the confounding effect of e.g. gender or LC column
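A randomized block design for the "not all measurements on one day" case can be sketched as follows, assuming 6 control and 6 treated samples spread over 3 days (all names and numbers are hypothetical). The day is blocked, so each day gets the same number of treated and control samples, while the allocation to days and the order within a day are randomized.

```python
import random

# Hypothetical study: 6 control and 6 treated samples that cannot all be
# measured on one day, so each day is a block of 4 samples.
control = [f"C{i}" for i in range(1, 7)]
treated = [f"T{i}" for i in range(1, 7)]
n_days = 3

rng = random.Random(1)   # fixed seed only for reproducibility
rng.shuffle(control)
rng.shuffle(treated)

per_day = len(control) // n_days   # controls (and treated) per block
blocks = []
for day in range(n_days):
    # Each block gets the same number of control and treated samples ...
    block = (control[day * per_day:(day + 1) * per_day]
             + treated[day * per_day:(day + 1) * per_day])
    # ... and the measurement order within the block is randomized.
    rng.shuffle(block)
    blocks.append(block)

for day, block in enumerate(blocks, start=1):
    print(f"day {day}: {block}")
```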

21
Q

General rule for blocking

A

Block what you can, randomize what you cannot (treated/control is not blockable)

22
Q

Which instruments show drift in time?

A

GCMS and LCMS (for metabolomics and proteomics)

23
Q

Where is the order in which the samples are measured defined, and why is it important?

A

In the measurement design. It is important because in LCMS or GCMS, when the number of samples is large and several batches are needed, instrumental drift causes samples measured at the beginning of the series to differ slightly from samples measured at the end.
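A measurement design can be sketched as a function that randomizes the injection order and interleaves QC samples, following the scheme of one QC sample after every 8 samples mentioned for QC samples. The function name `build_run_sequence` is hypothetical, and also placing a QC at the very start and end of the run is an assumption made so that every study sample lies between two QC measurements.

```python
import random

def build_run_sequence(samples, qc_every=8, seed=None):
    """Return a randomized injection order with a QC sample inserted
    at the start, after every `qc_every` study samples, and at the end."""
    order = list(samples)
    random.Random(seed).shuffle(order)
    sequence = ["QC"]
    for i, sample in enumerate(order, start=1):
        sequence.append(sample)
        if i % qc_every == 0 or i == len(order):
            sequence.append("QC")
    return sequence

samples = [f"S{i:02d}" for i in range(1, 21)]
seq = build_run_sequence(samples, qc_every=8, seed=7)
print(seq)
```

With 20 samples this gives QC measurements at the start, after sample 8, after sample 16 and at the end.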

24
Q

Why is randomization across batches crucial in LCMS or GCMS?

A

Because of instrumental drift: without randomization, an observed difference could be due only to instrumental drift, i.e. bias. The actual results are then indistinguishable from the bias.
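The effect can be illustrated with a small simulation, assuming a purely linear drift and no true treatment effect at all (both are simplifying assumptions; the numbers are hypothetical). Measuring all controls first produces a systematic "treatment" difference that is entirely drift, whereas averaged over many random run orders the drift no longer produces a systematic difference.

```python
import random
import statistics

# Simulated linear instrumental drift: the signal rises by `drift` per
# injection. There is NO true treatment effect in this simulation.
n_per_group = 10
drift = 1.0
base = 100.0

def signal(position):
    # measured intensity of an identical sample at run position `position`
    return base + drift * position

def group_difference(order):
    """Mean(treated) - mean(control) given a run order of 'C'/'T' labels."""
    c = [signal(p) for p, g in enumerate(order) if g == "C"]
    t = [signal(p) for p, g in enumerate(order) if g == "T"]
    return statistics.mean(t) - statistics.mean(c)

# Design 1: all controls first, then all treated samples.
sequential = ["C"] * n_per_group + ["T"] * n_per_group
print("sequential:", group_difference(sequential))  # sequential: 10.0

# Design 2: randomized run order, repeated many times.
rng = random.Random(0)
diffs = []
for _ in range(10_000):
    order = sequential[:]
    rng.shuffle(order)
    diffs.append(group_difference(order))
print("randomized, average:", statistics.mean(diffs))
```

Note that a single randomized run still shows some difference; randomization turns the drift from a bias into random error rather than removing it.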

25
Q

In data preprocessing, disturbances that enter the data during sampling, sample workup or measurement need to be removed. Which two types of disturbances are there?

A

-Disturbances of a whole sample
> different amount of sample measured
> different dilution of samples
> unequal sample workup
> effect of the measuring order (beginning/end of day)
-Disturbances of a single variable within a sample (e.g. a single metabolite)

26
Q

Which methods of preprocessing are used for correction of whole sample disturbances?

A

Normalization methods

27
Q

Normalization methods

A

-Internal standard
-QC samples

28
Q

Internal standard

A

Compound that does not occur naturally in the samples, added to each sample in an equal amount
> the intensity of the standard should therefore be the same in all samples
> if it differs across samples: all variables of that sample are corrected with the same factor
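The internal-standard correction can be sketched with NumPy: one factor per sample, applied to all variables of that sample. The data matrix and internal-standard intensities are hypothetical, and dividing by the raw internal-standard intensity (so values become ratios to the standard) is one assumed way of applying the factor.

```python
import numpy as np

# Hypothetical data matrix: rows = samples, columns = metabolites.
X = np.array([
    [200.0, 50.0, 10.0],
    [100.0, 25.0,  5.0],   # e.g. half the amount of sample was measured
    [300.0, 75.0, 15.0],
])
# Measured intensity of the internal standard in each sample (hypothetical).
istd = np.array([1000.0, 500.0, 1500.0])

# Every variable of a sample is divided by that sample's internal-standard
# intensity: one correction factor per sample, applied to all variables.
X_corrected = X / istd[:, np.newaxis]
print(X_corrected)
```

After correction all three rows are identical, because the differences in this example came only from the amount of sample measured.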

29
Q

Quality control samples

A

For correction of instrumental drift
> pooled samples are used as QC samples
> a QC sample is measured after every 8 samples
> many QC measurements are spread over the whole day
> the intensity of each metabolite should be the same in every QC measurement, but due to instrumental drift it may differ between time points
> these differences are used to correct the study samples measured in between the QC samples
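The QC-based drift correction for one metabolite can be sketched as below. All positions and intensities are hypothetical; estimating the drift by linear interpolation between neighbouring QC measurements and using the first QC as reference level are simplifying assumptions (smoother fits over the QC points are also possible).

```python
import numpy as np

# Hypothetical run: injection positions of QC samples and study samples,
# and the intensity of ONE metabolite. Repeat per metabolite in practice.
qc_positions = np.array([0, 9, 18])
qc_intensity = np.array([100.0, 90.0, 80.0])      # drifting downwards
sample_positions = np.array([3, 12])
sample_intensity = np.array([65.0, 85.0])

# Estimate the drift at each study sample by linear interpolation between
# the neighbouring QC measurements ...
drift_at_samples = np.interp(sample_positions, qc_positions, qc_intensity)

# ... and rescale so that every sample is expressed relative to the QC level
# (here: the first QC measurement as reference).
reference = qc_intensity[0]
corrected = sample_intensity * reference / drift_at_samples
print(drift_at_samples, corrected)
```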

30
Q

Normalization

A

Correct for different dilutions, e.g. urine is less diluted in the morning: correction by a certain ‘concentration measure’ > the values are no longer true concentrations, but the samples become better comparable.
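A dilution correction can be sketched with NumPy. The card does not specify which concentration measure is used; the total signal of each sample (sum over all variables) is assumed here purely for illustration, and the data are hypothetical.

```python
import numpy as np

# Hypothetical urine data: rows = samples, columns = metabolites.
# Sample 2 has the same composition as sample 1 but is twice as concentrated.
X = np.array([
    [10.0, 20.0, 30.0, 40.0],
    [20.0, 40.0, 60.0, 80.0],
])

# Total-signal normalization (assumed concentration measure): divide every
# variable by the sum over all variables of that sample.
totals = X.sum(axis=1, keepdims=True)
X_norm = X / totals
print(X_norm)
```

Both rows become identical fractions of their total, which makes the samples comparable even though the values are no longer concentrations.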

31
Q

Which correction methods are used for single variable disturbances due to column aging in LCMS/GCMS?

A

Alignment methods: peaks that appear at slightly different retention times in different samples are aligned, so that it is clear they belong to the same variable (metabolite/protein)
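The idea of alignment can be sketched with a single global shift: try integer shifts of a sample's chromatogram and keep the one that overlaps best with a reference. The Gaussian peaks and the helpers `gaussian_peak` and `best_shift` are hypothetical; real alignment methods typically allow the shift to vary along the retention-time axis rather than using one shift for the whole chromatogram.

```python
import numpy as np

def gaussian_peak(n, center, width=2.0, height=1.0):
    x = np.arange(n)
    return height * np.exp(-0.5 * ((x - center) / width) ** 2)

def best_shift(reference, signal, max_shift=10):
    """Integer shift (in scan points) that best overlays `signal` on `reference`."""
    shifts = list(range(-max_shift, max_shift + 1))
    scores = [np.dot(reference, np.roll(signal, s)) for s in shifts]
    return shifts[int(np.argmax(scores))]

n = 100
reference = gaussian_peak(n, center=50)   # peak in the reference sample
sample = gaussian_peak(n, center=53)      # same compound, eluting 3 points later

shift = best_shift(reference, sample)
aligned = np.roll(sample, shift)
print(shift)  # -3
```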

32
Q

Which correction methods are used when the baseline (background signal) is not equal to zero?

A

Background correction methods
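A background correction can be sketched with NumPy on a simulated signal. The sloping baseline and peak are hypothetical, and the method shown (a straight line fitted through the first and last points, which are assumed to contain no peaks) is only one simple option chosen for illustration.

```python
import numpy as np

n = 100
x = np.arange(n, dtype=float)
peak = 8.0 * np.exp(-0.5 * ((x - 50) / 3.0) ** 2)
baseline_true = 5.0 + 0.1 * x          # sloping background, not zero
signal = baseline_true + peak

# Assumption: the first and last k points contain no peaks, so a straight
# line fitted through them estimates the baseline under the whole signal.
k = 10
edge_idx = np.r_[0:k, n - k:n]
slope, intercept = np.polyfit(x[edge_idx], signal[edge_idx], deg=1)
baseline_est = slope * x + intercept

corrected = signal - baseline_est
print(round(float(corrected.max()), 3))  # 8.0
```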

33
Q

How is clean data stored after data preprocessing?

A

In a data matrix
> for metabolomics data, preprocessing and normalization yield a normalized data matrix: the starting point for data analysis and biological interpretation

34
Q

What can a zero mean in the data matrix?

A

Not present, or below detection limit.

35
Q

Are different variables always measured in the same units?

A

No, but often they are.

36
Q

11.4 data analysis

A