HC 2 - Data Analysis and Experimental Design (+Preprocessing) Flashcards

Lecture 2

1
Q

Data analysis pipeline

A

Biological question
> Experimental design
-Power analysis
-Treatment design
> Data acquisition
-QC strategy
-Measurement design
> Data pre-processing
-Normalisation
-Quantification
> Metabolite identification
> Statistical Data analysis
-Explorative
-Predictive
-Hypothetical biomarkers
> Biological interpretation
-MSEA
-Pathway analysis

2
Q

Parts of the experimental design and data collection

A

-Frame a biological question
> testable with statistical analysis
-Design factor (controlled, e.g. drug vs. placebo) / observed factor (not controlled, e.g. healthy vs. ill)
> assign different treatments (levels) / select from predefined groups
-Identify noise factors (confounding)

3
Q

Confounding noise factors

A

Other sources of variation that could affect the outcome of the study, such as sex, medication and BMI

4
Q

How are individuals assigned to e.g. placebo or drug for a design factor?

A

At random

5
Q

Design: randomize, block and replicate. Randomization: blind vs. double blind vs. triple blind

A

-Blind: it is selected at random who gets placebo or drug, and the individuals do not know what they got
-Double blind: neither the patients nor the researchers know who got placebo or drug
-Triple blind: the data analysts also do not know who got placebo or drug

6
Q

Type of questions

A
  1. Designed: detection of responsive features (genes, proteins, metabolites) under controlled experimental conditions (perturbation study, causal relationships) > H0: gene unperturbed = gene perturbed > which genes are affected by the treatment?
  2. Biomarkers: detection of biomarkers (observational, difference between patient and control) > H0: gene patient = gene control > we do not know whether the difference is caused by the disease
  3. Regulation: identification of regulatory or mechanistic relationships between features > no relationship vs. relationship (linear, exponential) > associations, correlations, more explorative analysis > measure whether correlations between metabolites or genes have changed

7
Q

Noise factors

A

Factors that disturb the correct estimation of the effect of the experimental factor, such as time, temperature, gender and age.

8
Q

Controlling noise factors

A

Keep them constant, e.g. include only one gender or keep the temperature constant

9
Q

Ways to take uncontrollable noise factors into account:

A

Randomization, blocking and replication

10
Q

Randomization

A

-Random assignment of treatments to different individuals
-Randomize the order of experiments over time (so that time has no systematic effect)
-Randomize samples over batches/slides: do not measure all controls in one batch and all treated samples in a separate batch.
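
A minimal sketch of randomizing samples over batches. The helper name `randomize_run_order` and the sample names are hypothetical, chosen only for illustration. All samples are shuffled together before being dealt into batches, so controls and treated samples end up mixed instead of being measured in separate batches.

```python
import random

def randomize_run_order(sample_ids, n_batches, seed=0):
    """Shuffle all samples, then deal them round-robin into batches."""
    rng = random.Random(seed)  # fixed seed so the layout can be reproduced
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    return [shuffled[i::n_batches] for i in range(n_batches)]

# Hypothetical study: 6 controls and 6 treated samples, 3 batches
samples = [f"control_{i}" for i in range(6)] + [f"treated_{i}" for i in range(6)]
batches = randomize_run_order(samples, n_batches=3)
for number, batch in enumerate(batches, start=1):
    print(number, batch)
```

Plain shuffling mixes the groups but does not guarantee that every batch contains the same number of controls and treated samples.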

11
Q

When are blocks of experiments made?

A

If:
-Not all experiments can be done in one day
-Measured levels could differ between specific groups, e.g. men and women (group is then a confounding factor)

12
Q

Blocking over days

A

Fix the samples over days: the same number of cases and controls each day
> Randomize samples within days
> The effect of day is treated as irrelevant
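
A minimal sketch of blocking over days, assuming a hypothetical helper `block_over_days` and made-up sample names. The number of cases and controls is fixed per day, and only the measurement order within each day is randomized.

```python
import random

def block_over_days(cases, controls, n_days, seed=0):
    """Give every day the same number of cases and controls, in random order."""
    rng = random.Random(seed)
    cases, controls = list(cases), list(controls)
    rng.shuffle(cases)     # random choice of which case goes to which day
    rng.shuffle(controls)  # same for the controls
    schedule = []
    for day in range(n_days):
        todays = cases[day::n_days] + controls[day::n_days]
        rng.shuffle(todays)  # randomize the measurement order within the day
        schedule.append(todays)
    return schedule

cases = [f"case_{i}" for i in range(6)]
controls = [f"control_{i}" for i in range(6)]
schedule = block_over_days(cases, controls, n_days=3)
for day, todays in enumerate(schedule, start=1):
    print(day, todays)
```

The sketch assumes that the group sizes are divisible by the number of days; otherwise the days cannot be perfectly balanced.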

13
Q

Blocking over groups

A

Fix an equal number of treated individuals and controls in each group (men/women)
> Randomize treatment / control within each group
> The effect of group is treated as irrelevant

14
Q

Which effect is considered irrelevant when blocking and has to be removed?

A

The block effect (the difference between blocks)
> Correction: subtract the average block effect from the data
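
One possible way to remove the average block effect is to center each block on the overall mean. This is a sketch under that assumption; the helper `remove_block_effect` and the example values are hypothetical.

```python
from statistics import mean

def remove_block_effect(values, blocks):
    """Subtract each block's mean and add back the overall mean."""
    overall = mean(values)
    block_means = {
        block: mean([v for v, b in zip(values, blocks) if b == block])
        for block in set(blocks)
    }
    # Adding back the overall mean keeps the data on the original scale
    return [v - block_means[b] + overall for v, b in zip(values, blocks)]

# Hypothetical data: day 2 reads about 10 units higher than day 1
values = [10.0, 12.0, 20.0, 22.0]
blocks = ["day1", "day1", "day2", "day2"]
corrected = remove_block_effect(values, blocks)
print(corrected)
```

This only works as intended when the blocks are balanced (same number of cases and controls per block); otherwise part of the treatment effect is removed together with the block effect.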

15
Q

The rule for blocking: fix over the blocks, but … within the blocks

A

randomize

16
Q

Replication

A

-Replicate measurements
-Repeat the analysis to reduce the influence of biological and/or analytical variation on the estimate

17
Q

Types of replication

A

-Measure more individuals per group
-Repeat treatment for an individual
-Measure sample multiple times

18
Q

The mean is better estimated when more measurements are performed. Why?

A

Because the influence of outliers and coincidence becomes smaller

19
Q

Repeatability

A

The degree of agreement between measurements conducted on the same sample in the same location by the same people
> which value to expect when repeating the data collection starting from the analysis step

20
Q

Reproducibility

A

The degree of agreement between measurements conducted on replicate samples in different locations by different people.
> which value to expect when repeating the data collection starting from sample collection from the same patients

21
Q

Biological variability

A

Variation between individuals in the same group: differences in offset and in effect size within individuals between biological experiments
> which value to expect when repeating the data collection starting from patient selection

22
Q

Analytical variation includes:

A

-Bias: mean value is not equal to actual value
-Repeatability and reproducibility

23
Q

What is the number of individuals selected per group based on?

A

On the required statistical power: if there is a real difference between groups, how often can we detect it?

24
Q

How to increase power of a test?

A

-Increase effect size
-Decrease SDx (standard deviation of the measurements) > improve the measurement
-Decrease SEMx (standard error of the mean) > increase n (more replicates)
-SEM = SD / sqrt(n)
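
The three levers above can be illustrated with an approximate power calculation. This sketch assumes a two-sided comparison of two equally sized groups with the same SD and uses a normal approximation instead of the exact t-distribution; the function name `approx_power` and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means."""
    standard_normal = NormalDist()
    se_difference = sd * sqrt(2 / n_per_group)  # standard error of the mean difference
    z_critical = standard_normal.inv_cdf(1 - alpha / 2)
    # Probability that the test statistic exceeds the critical value
    # (the negligible chance of rejecting in the wrong direction is ignored)
    return standard_normal.cdf(abs(effect) / se_difference - z_critical)

baseline = approx_power(effect=1.0, sd=1.0, n_per_group=16)
print("baseline        ", round(baseline, 3))
print("larger effect   ", round(approx_power(2.0, 1.0, 16), 3))
print("smaller SD      ", round(approx_power(1.0, 0.5, 16), 3))
print("more replicates ", round(approx_power(1.0, 1.0, 32), 3))
```

Each of the last three calls changes one lever relative to the baseline, and each gives a higher power than the baseline.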

25
Q

Standard error of the mean formula

A

SEM = SDx / sqrt(n)
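
A small sketch of the formula, with hypothetical helper names and made-up measurements. It also shows that, for a fixed SD, the SEM halves each time n is multiplied by four.

```python
from math import sqrt
from statistics import stdev

def sem(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return stdev(values) / sqrt(len(values))

def sem_from_sd(sd, n):
    """Same formula when the SD is already known."""
    return sd / sqrt(n)

measurements = [4.0, 6.0, 8.0, 10.0]
print(sem(measurements))
for n in (4, 16, 64):
    print(n, sem_from_sd(2.0, n))
```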

26
Q

Power cutoff value

A

> 80% (power > 0.8)

27
Q

What are alpha, beta and power?

A

Alpha: chance that H0 is rejected although it is true, i.e. there is no real difference (with a 5% cutoff, this chance is considered low enough to call a difference)
Beta: chance that H0 is not rejected although it should be rejected
Power = 1 - Beta

28
Q

Types of design

A

-Parallel design
-Repeated measures design

29
Q

Parallel design

A

-Measure both groups at the same time point
-Each individual receives only one treatment
-Used when the ‘between individual’ variation (variation between individuals of the same group) is small
-Compare individuals of different groups measured at the same time point with a t-test or ANOVA (comparison of means)
-Within-group variation should be much smaller than between-group variation

30
Q

The reliability of a parallel design is dependent on the …

A

variation within a group

31
Q

Parallel designs are used for … studies

A

animal

32
Q

Repeated measures design

A

-Every individual gets both treatments with a time interval between the two treatments
-Use the same individuals for multiple treatments
> determine before- and after-treatment values for both treatment 1 and treatment 2

33
Q

When is a repeated measures design used?

A

When the ‘between individual’ variation is large
> variation between individuals expected large compared to variation due to treatment (within individual)
> each individual serves as its own control, so the between-individual variation is corrected for > more significant results

34
Q

Which design corrects for the time effect of repeated measures?

A

Cross-over design: random treatment order assignment

35
Q

Which values are taken for comparison in repeated measures?

A

Means of the difference values
> the correction leads to less noise and more reliability

36
Q

For which kinds of studies is repeated measures used?

A

Human studies

37
Q

Parallel vs repeated measures

A

-The estimated effect is equal
-The noise (standard deviation) is different, because the between-individual variation is not corrected for in a parallel design
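
To illustrate the difference, the sketch below analyses the same made-up numbers once as two independent groups (parallel) and once as before/after pairs (repeated measures). The function names and the data are hypothetical; the unpaired statistic is computed in the Welch form.

```python
from math import sqrt
from statistics import mean, stdev

def unpaired_t(group_a, group_b):
    """t statistic treating the two sets as independent groups."""
    se = sqrt(stdev(group_a) ** 2 / len(group_a) + stdev(group_b) ** 2 / len(group_b))
    return (mean(group_b) - mean(group_a)) / se

def paired_t(before, after):
    """t statistic on the per-individual difference values."""
    diffs = [y - x for x, y in zip(before, after)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Individuals differ a lot from each other, the treatment effect is small
before = [10.0, 20.0, 30.0, 40.0]
after = [11.0, 22.0, 31.0, 42.0]

print("effect    ", mean(after) - mean(before))
print("unpaired t", unpaired_t(before, after))
print("paired t  ", paired_t(before, after))
```

The effect is the same in both analyses, but the paired statistic is much larger because the large spread between individuals drops out of the difference values.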

38
Q

In a multivariate data matrix: what are the rows and what are the columns?

A

Rows: individuals, samples or countries
Columns (variables/features): metabolites, genes; qualitative (e.g. male/female) or quantitative

39
Q

What do these values mean in the multivariate data matrix: NaN, unexpected negative values, 0 values, outliers

A

-NaN: not a number (missing or undefined value)
-Unexpected negative value: for example a negative value for an intensity
-0 values: possibly below the detection limit
-Outliers: value differs a lot from the range of the other values of the column
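
A minimal sketch of flagging these four cases in one column of the data matrix. The outlier rule (more than 5 median absolute deviations from the median) and the helper `flag_column` are assumptions made for illustration, not a rule from the lecture.

```python
from math import isnan
from statistics import median

def flag_column(values, outlier_factor=5.0):
    """Label each value as nan, negative, zero, outlier or ok."""
    finite = [v for v in values if not isnan(v)]
    center = median(finite)
    mad = median([abs(v - center) for v in finite])  # median absolute deviation
    flags = []
    for v in values:
        if isnan(v):
            flags.append("nan")
        elif v < 0:
            flags.append("negative")
        elif v == 0:
            flags.append("zero")
        elif mad > 0 and abs(v - center) > outlier_factor * mad:
            flags.append("outlier")  # assumed threshold, tune per data set
        else:
            flags.append("ok")
    return flags

# Hypothetical intensities for one metabolite across eight samples
column = [5.1, 4.9, float("nan"), 0.0, -1.2, 5.3, 50.0, 5.0]
flags = flag_column(column)
print(flags)
```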

40
Q

Disturbances of a whole sample

A

-Amount of sample is different
-Some samples are more diluted than others
-Order of measuring affects measurement

41
Q

The dilution of samples can differ, e.g. for urine: why is correction needed?

A

to remove systematic variation between experimental conditions unrelated to the biological differences (dilutions, mass)
> due to drugs, disease, day/night rhythm the urine amount can change (and therefore dilution)

42
Q

Metabolite levels are considered … and gene expression values are …

A

Metabolite levels: quantitative
- Normal distribution assumed
- T-test / ANOVA
Gene expression: counts
- Poisson or negative binomial distribution
- not symmetric
- special tests that account for the distribution

43
Q

Sample normalization

A

Differences between individuals are due to metabolic differences and to dilution differences
> the dilution differences are corrected with a correction value

44
Q

Sample normalization corrections

A

By
-A reference compound originally present in the sample (e.g. creatinine in urine)
-Total sum or total peak area
> normalized value = peak area (or height) / total sum
-Dry mass, volume etc
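
A sketch of two of these corrections on made-up peak areas. The function names, compound names and values are hypothetical. The second sample is the same urine diluted twofold, so after normalization both samples should look identical.

```python
def normalize_total_sum(sample):
    """Divide every peak area by the total peak area of the sample."""
    total = sum(sample.values())
    return {name: area / total for name, area in sample.items()}

def normalize_to_reference(sample, reference="creatinine"):
    """Divide every peak area by the area of a reference compound."""
    reference_area = sample[reference]
    return {name: area / reference_area for name, area in sample.items()}

concentrated = {"creatinine": 200.0, "citrate": 100.0, "hippurate": 100.0}
diluted = {name: area / 2 for name, area in concentrated.items()}

print(normalize_total_sum(concentrated))
print(normalize_total_sum(diluted))
print(normalize_to_reference(diluted))
```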

45
Q

Disadvantages with the correction values

A

-Sum of total peak area: problem with changing profiles of large peaks that differ between individuals (could be relevant)
-Creatinine: breakdown product from muscle metabolism, so its level depends strongly on muscle mass (male/female, child/adult)
-Volume: total amount of compounds (=concentration x volume) is used (not concentration) > problem with women not emptying their bladder completely.

46
Q

Disturbances of single feature of a sample

A

-Alignment problems due to aging of chromatographic column
-Wrong baseline measurement: unequal to 0
-Not the whole array has the same quality

47
Q

What is an internal standard?

A

A compound added in fixed amounts to the sample before sample workup

48
Q

Internal standards correction

A

-The peak height of the internal standard should be equal in all samples (otherwise something is wrong with the sample and correction is needed)
-Used to correct for variation in sample workup/measurement.
-The internal standard is expected to behave similarly to the other features in the sample
-Ratio feature / internal standard is expected to stay constant
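
A minimal sketch of internal standard correction on made-up peak heights. The helper `correct_with_internal_standard`, the key `"IS"` and the metabolite names are hypothetical. In sample B every signal, including the IS, is twice as high, so the ratios are the same as in sample A.

```python
def correct_with_internal_standard(peaks, is_name="IS"):
    """Express every feature as a ratio to the internal standard peak."""
    is_height = peaks[is_name]
    return {name: height / is_height
            for name, height in peaks.items() if name != is_name}

sample_a = {"IS": 100.0, "glucose": 50.0, "lactate": 20.0}
sample_b = {"IS": 200.0, "glucose": 100.0, "lactate": 40.0}  # doubled response

ratios_a = correct_with_internal_standard(sample_a)
ratios_b = correct_with_internal_standard(sample_b)
print(ratios_a)
print(ratios_b)
```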

49
Q

Quality Control (QC) samples

A

-To check variation in instrument over time
-Because QC sample is always the same, the measured signal is expected equal for ALL compounds in the sample
-Something is wrong with the analysis instrument if the peak value for the QC samples is not constant

50
Q

When are QC samples measured?

A

After every 8-10 study samples

51
Q

How are study samples corrected with QCs?

A

Use the trends in the QC signals to correct the study samples in between the QCs

52
Q

What are QC samples?

A

Pooled samples (combination of all study samples)

53
Q

Which compound is always added to the QC sample?

A

The internal standard

54
Q

What does the QC check?

A

Whether the ratio (compound peak / IS peak) is constant

55
Q

QC correction corrects

A

Between-batch differences and within-batch differences due to drift > correction factors derived from the QCs are applied to the study samples for each metabolite
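
One simple way to derive such correction factors is linear interpolation between the two QC injections that bracket a study sample. This is a sketch under that assumption, for a single metabolite; the function names, the choice of the first QC as reference level, and the run data are hypothetical.

```python
def expected_qc_signal(position, qc_points):
    """Linearly interpolate the QC signal at an injection position."""
    for (p0, s0), (p1, s1) in zip(qc_points, qc_points[1:]):
        if p0 <= position <= p1:
            fraction = (position - p0) / (p1 - p0)
            return s0 + fraction * (s1 - s0)
    raise ValueError("injection is not bracketed by two QC samples")

def qc_drift_correct(run):
    """Correct the study samples of one metabolite for instrument drift."""
    qc_points = [(i, signal) for i, (kind, signal) in enumerate(run) if kind == "qc"]
    target = qc_points[0][1]  # assumption: the first QC defines the reference level
    corrected = []
    for i, (kind, signal) in enumerate(run):
        if kind == "sample":
            corrected.append(signal / expected_qc_signal(i, qc_points) * target)
    return corrected

# Injection order for one metabolite; the QC signal drifts from 100 down to 60
run = [("qc", 100.0), ("sample", 90.0), ("sample", 80.0),
       ("sample", 70.0), ("qc", 60.0)]
corrected = qc_drift_correct(run)
print(corrected)
```

In this constructed example the study samples follow the same drift as the QC, so all three are corrected back to the reference level. The correction is done per metabolite, so a second metabolite would get its own QC trend.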

56
Q

Name three differences between IS and QC

A

IS: same correction for every other peak
QC: study samples in between two QC samples are corrected
IS: every sample has an internal standard added
QC: all peaks are present in the QC sample
IS: assumes that if the IS peak is 2 times higher, all other peaks are 2 times higher
QC: peak A in the QC sample can go up while peak B stays the same (correction per variable)

57
Q

In large studies IS and QC samples are combined. Why?

A

For optimal correction of instrumental drift

58
Q

Which correction should be used where: when samples already differ beforehand, for variation from sample workup and instrument on the same samples, and for the instrument?

A

Samples differ beforehand: normalization with creatinine, total amount or volume
Sample workup and instrument: IS
Instrument: QC

59
Q

What is sample workup?

A

All steps from the raw sample up to the measurement in the instrument