Chapter 11 + 12 Flashcards
Corresponding to the HCs (lectures) of week 1 of the course's second part
Similarities between the data analysis pipelines of different omics approaches
-All technologies yield many measurements per sample
-Dimensionality is handled in the same way
-Each yields hundreds or thousands of variables per sample, such as different genes, proteins or metabolites
Samples are organised in a matrix
Rows: the samples
Columns: the variables (like genes)
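This layout can be sketched as a numpy array; the sample count, variable count and values below are made-up for illustration:

```python
import numpy as np

# Hypothetical omics data matrix: rows = samples, columns = variables
# (e.g. genes); the intensities are randomly generated placeholders.
rng = np.random.default_rng(0)
n_samples, n_variables = 6, 1000
X = rng.lognormal(mean=2.0, sigma=1.0, size=(n_samples, n_variables))

print(X.shape)   # one row per sample, one column per variable
print(X[0, :5])  # measurements of the first 5 variables in sample 0
```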
Four components of the generalized data analysis pipeline
- Experimental design and data collection
- Data preprocessing and quality control
- Data analysis
- Biological interpretation
The experimental design can have a large impact on the statistical power and therefore on the …
Conclusions that are reached
First step in experimental design
Frame a biological question
What is the aim of the biological question?
Determine the hypothesis that will be tested and the statistical test that will be executed.
What does the biological question determine that is needed for an interpretable and successful outcome?
The experimental preconditions
Three main types of objectives, each requiring a different type of experimental design
- Detection of responsive features under controlled experimental conditions (perturbation study)
- Detection of biomarkers
- Identification of regulatory or mechanistic relationships between variables
Experimental design steps after framing the biological question
Identify noise factors and design the experiment
Noise factors
Factors that can disturb a proper measurement (from the biological experiment up to and including the measurement)
Noise factors can lead to …
bias
Three basic principles to deal with noise factors.
- Replication
- Randomization
- Blocking
What is the aim of the experimental design?
Ensure reliable measurements free from bias
Replication
Duplicate, repeat or perform the same measurement more than once
> obtain an estimate of the experimental error
On what does the type of error estimated with replication depend?
On how the replication is done
> For estimating and controlling biological variability, different organisms or batches of cell samples should be processed in the same manner.
Types of replication errors
-Repeatability: error based on repeats of sample measurement (same sample)
-Reproducibility: error based on sample workup or sampling/the whole experiment (larger errors)
Types of replicates
-Biological replicates: error based on the whole experiment (including the organisms) > we are not interested in one individual
-Technical replicates: to gain statistical power
Randomization
Requiring the experimenter to use random choices for every factor that is not of interest but might influence the outcome of the experiment
> random selection of individuals for groups
> hybridization of mRNA samples from the treatment and control groups is sensitive to external factors: it is important not to measure all controls first and then all treated samples, because otherwise the time effect (not interesting) cannot be distinguished from the treatment effect (interesting)
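A minimal sketch of randomizing a measurement order, with a hypothetical plan of 4 control and 4 treated samples:

```python
import numpy as np

# Hypothetical measurement plan: 4 control and 4 treated samples.
samples = ["ctrl"] * 4 + ["trt"] * 4

# Bad order: all controls first, then all treated; a time (drift) effect
# would then be confounded with the treatment effect.
bad_order = list(samples)

# Randomized order: time effects average out over both groups.
rng = np.random.default_rng(42)
good_order = list(rng.permutation(samples))

print(good_order)
```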
Confounder
A confounder is a variable whose presence affects the variables being studied, so that the results do not reflect the actual relationship (e.g. time: randomize over time to eliminate the bias)
Blocking
Arranging experimental samples in groups (blocks) that are similar to one another
(e.g. gender, or different columns)
> but within the groups the variation of treated/control needs to be similar
> or: blocks because not all measurements can be done on one day
> eliminating confounding effect of gender or LC column
General rule for blocking
Block what you can, randomize what you cannot (treated/control is not blockable)
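The rule above can be sketched in code; the two-day design and sample names are hypothetical. Day is the block; treatment cannot be blocked, so it is balanced and randomized within each block:

```python
import numpy as np

# Hypothetical blocked design: 4 control and 4 treated samples, measured
# on 2 days (the blocks). Each day gets 2 controls and 2 treated samples;
# within each day the measurement order is randomized.
rng = np.random.default_rng(7)
controls = [f"ctrl{i}" for i in range(4)]
treated = [f"trt{i}" for i in range(4)]
rng.shuffle(controls)
rng.shuffle(treated)

blocks = {}
for day in (1, 2):
    block = controls[2 * (day - 1):2 * day] + treated[2 * (day - 1):2 * day]
    rng.shuffle(block)  # randomize order within the block
    blocks[day] = block

print(blocks)
```

Because every block contains the same mix of treated and control samples, a day effect can no longer be confounded with the treatment effect.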
Which instruments show drift in time?
GCMS and LCMS (for metabolomics and proteomics)
Where is the order in which the samples are measured defined, and why is it important?
In the measurement design. It is important because in LCMS or GCMS, when the number of samples is large and several batches are needed, instrumental drift causes samples measured at the beginning to be slightly different from those measured at the end of the series.
Why is randomization across batches crucial in LCMS or GCMS?
Because of instrumental drift: when no randomization is performed, an observed difference could be due solely to instrumental drift, and there is bias. The actual results are then not distinguishable from the bias.
In data preprocessing, disturbances that can enter during sampling, sample workup and measurement need to be removed from the data. Which two types of disturbances do we know?
-Disturbances of a whole sample
> different amount of sample measured
> different dilution of samples
> sample workup unequal
> effect of order of measuring (begin/end of day)
-Disturbances of a single variable within a sample (e.g. a single metabolite)
Which methods of preprocessing are used for correction of whole sample disturbances?
Normalization methods
Normalization methods
-Internal standard
-QC samples
Internal standard
A compound that does not occur naturally in the samples, added to each sample in an equal amount
> intensity of the standard has to be the same in all samples
> difference across samples: correction of all variables with same factor
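A minimal sketch of this correction with made-up intensities; the assumption is that column 0 holds the internal standard:

```python
import numpy as np

# Hypothetical intensities: rows = samples, columns = variables.
# Column 0 is the internal standard (IS), added in equal amount everywhere.
X = np.array([[100.0, 50.0, 200.0],
              [ 80.0, 44.0, 150.0],
              [120.0, 55.0, 260.0]])

is_intensity = X[:, 0]
# Per-sample correction factor: scale so the IS has the same intensity
# in every sample; apply that one factor to all variables of the sample.
factor = is_intensity.mean() / is_intensity
X_norm = X * factor[:, None]

print(X_norm[:, 0])  # internal standard now equal across samples
```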
Quality control samples
For correction of instrumental drift
> pooled samples are used
> after every 8 samples a QC sample is measured
> many QC sample measurements over whole day
> intensity of each metabolite should be the same but due to instrumental drift it may not be the same at different time points
> use the differences to correct the studied samples in between the QC samples
Normalization
Correct for different dilutions, e.g. urine is less diluted in the morning: correct using a certain ‘concentration measure’ > values are no longer true concentrations, but the samples are better comparable.
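One common concentration measure is the total intensity of a sample; a minimal sketch with made-up values, where sample 1 is the same as sample 0 but twice as diluted:

```python
import numpy as np

# Hypothetical samples with different dilutions: rows = samples.
X = np.array([[10.0, 20.0, 70.0],
              [ 5.0, 10.0, 35.0],   # same sample as row 0, twice as diluted
              [20.0, 30.0, 50.0]])

# Total-sum normalization: divide each sample by its row sum.
# Values are no longer true concentrations, but samples become comparable.
X_norm = X / X.sum(axis=1, keepdims=True)

print(X_norm)
```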
Which correction methods are used for single variable disturbances due to column aging in LCMS/GCMS?
Alignment methods, for aligning peaks at different retention times in different samples such that it is clear they belong to the same variable (metabolite/protein)
Which correction methods are used when the baseline (background signal) is unequal to zero?
Background correction methods
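A minimal sketch of one such method on a simulated trace (real background correction is usually more sophisticated): fit the baseline on the assumed peak-free edges of the signal and subtract it everywhere:

```python
import numpy as np

# Hypothetical chromatogram: a peak on top of a nonzero, rising baseline.
t = np.linspace(0, 10, 201)
baseline = 5.0 + 0.3 * t                        # background signal != 0
peak = 40.0 * np.exp(-((t - 5.0) ** 2) / 0.5)   # Gaussian peak at t = 5
signal = baseline + peak

# Simple correction: fit a line to the edges of the trace (assumed to be
# peak-free) and subtract the fitted background from the whole signal.
edges = np.r_[0:30, 171:201]
coef = np.polyfit(t[edges], signal[edges], deg=1)
corrected = signal - np.polyval(coef, t)

print(round(float(corrected[0]), 3))  # edge now close to zero
```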
How is clean data stored after data preprocessing?
In a data matrix
> for metabolomics data, after preprocessing and normalization: normalized data matrix: starting point for data analysis and biological interpretation
What can a zero mean in the data matrix?
Not present, or below detection limit.
Are different variables always measured in the same units?
No, although they often are
11.4 data analysis