HC 2 - Data Analysis and Experimental Design (+Preprocessing) Flashcards

Question 1

Q

Data analysis pipeline

Answer

A

Biological question
> Experimental design
-Power analysis
-Treatment design
> Data acquisition
-QC strategy
-Measurement design
> Data pre-processing
-Normalisation
-Quantification
> Metabolite identification
> Statistical data analysis
-Explorative
-Predictive
-Hypothetical biomarkers
> Biological interpretation
-MSEA (metabolite set enrichment analysis)
-Pathway analysis

Question 2

Q

Parts of the experimental design and data collection

Answer

A

-Frame a biological question
> testable with statistical analysis
-Design factor (controlled, e.g. drug vs placebo) / observed factor (not controlled, e.g. healthy vs ill)
> different treatments (levels) / selection from predefined groups
-Identify noise factors (confounding)

Question 3

Q

Confounding noise factors

Answer

A

Other sources of variation that could affect the outcome of the study, such as sex, medication and BMI

Question 4

Q

How are individuals assigned to a level of a design factor, e.g. placebo or drug?

Answer

A

At random

Question 5

Q

Design: randomization, blocking and replication. Randomization: blind vs double-blind vs triple-blind

Answer

A

-Blind (single-blind): placebo or drug is assigned at random and the participants do not know which one they receive
-Double-blind: neither the participants (patients) nor the researchers know who received what
-Triple-blind: in addition, the data analysts do not know who received placebo or drug

Question 6

Q

Types of questions

Answer

A

Designed: detection of responsive features (genes, proteins, metabolites) under controlled experimental conditions (perturbation study, causal relationships) > H0: gene unperturbed = gene perturbed > which genes are affected by the treatment?
Biomarkers: detection of biomarkers (observational study, difference between patients and controls) > H0: gene in patients = gene in controls > we do not know whether a difference is caused by the disease
Regulation: identification of regulatory or mechanistic relationships between features > no relationship (H0) vs relationship (linear, exponential) > associations, correlations, more explorative analysis > measure whether correlations between metabolites or genes change

Question 7

Q

Noise factors

Answer

A

Factors that disturb the correct estimation of the effect of the experimental factor, such as time, temperature, gender and age

Question 8

Q

Controlling noise factors

Answer

A

Keep them constant, e.g. include only one gender or keep the temperature constant

Question 9

Q

Ways to take uncontrollable noise factors into account:

Answer

A

Randomization, blocking and replication

Question 10

Q

Randomization

Answer

A

-Random assignment of treatments to the individuals
-Random order of experiments over time (so that time has no systematic effect)
-Randomize samples over batches/slides: do not measure all controls in one batch and the treated samples in a separate batch
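A minimal sketch of the randomization steps above, in Python. The function name `randomize_design` and the balanced 1:1 allocation are illustrative assumptions, not a prescribed procedure: treatments are shuffled over the individuals, and the measurement (run) order is shuffled separately so that time or batch effects do not line up with treatment.

```python
import random


def randomize_design(subject_ids, treatments, seed=None):
    """Randomly assign treatments to subjects and randomize the measurement order."""
    rng = random.Random(seed)  # seeded generator so the design can be reproduced
    # Balanced allocation: cycle through the treatments, e.g. drug, placebo, drug, ...
    labels = [treatments[i % len(treatments)] for i in range(len(subject_ids))]
    rng.shuffle(labels)  # random assignment of treatments to individuals
    assignment = dict(zip(subject_ids, labels))
    run_order = list(subject_ids)
    rng.shuffle(run_order)  # random order of measurement over time/batches
    return assignment, run_order


subjects = [f"S{i}" for i in range(1, 9)]
assignment, run_order = randomize_design(subjects, ["drug", "placebo"], seed=1)
print(assignment)
print(run_order)
```

Consecutive chunks of `run_order` could then serve as batches; pure randomization does not guarantee equal numbers of cases and controls per batch, which is what blocking adds.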

Question 11

Q

When are blocks of experiments made?

Answer

A

If:
-Not all experiments can be done in one day
-Measured levels could differ between specific groups > e.g. men and women, because sex is a confounding factor

Question 12

Q

Blocking over days

Answer

A

Fix the samples over days: the same number of cases and controls on each day
> randomize the samples within each day
> the effect of day can then be ignored
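The blocking-over-days rule can be sketched as follows (Python; `block_over_days` is a hypothetical helper, and it assumes the numbers of cases and controls divide evenly over the days): each day gets the same number of cases and controls, and the order within a day is shuffled.

```python
import random


def block_over_days(cases, controls, n_days, seed=None):
    """Put the same number of cases and controls on each day; randomize within each day."""
    if len(cases) % n_days or len(controls) % n_days:
        raise ValueError("cases and controls must divide evenly over the days")
    rng = random.Random(seed)
    cases, controls = list(cases), list(controls)
    rng.shuffle(cases)     # which cases go to which day is random
    rng.shuffle(controls)  # idem for the controls
    k_case, k_ctrl = len(cases) // n_days, len(controls) // n_days
    schedule = []
    for day in range(n_days):
        block = (cases[day * k_case:(day + 1) * k_case]
                 + controls[day * k_ctrl:(day + 1) * k_ctrl])  # fixed over the blocks
        rng.shuffle(block)  # randomized within the block
        schedule.append(block)
    return schedule


cases = [f"case{i}" for i in range(6)]
controls = [f"ctrl{i}" for i in range(6)]
for day, block in enumerate(block_over_days(cases, controls, n_days=3, seed=0), start=1):
    print(f"day {day}: {block}")
```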

Question 13

Q

Blocking over groups

Answer

A

Keep the numbers of treated and control individuals equal over the groups (men/women)
> randomize treatment/control within each group
> the effect of group can then be ignored

Question 14

Q

Which effect is considered irrelevant when blocking and has to be removed?

Answer

A

The block effect (the difference between blocks)
> correction: subtract the average block effect from the data
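A minimal sketch of this block-effect correction, assuming a balanced design (equal numbers of treated and control samples per block), so that the difference between block means reflects only the block effect. `remove_block_effect` is a hypothetical name; it subtracts each block's deviation from the overall mean.

```python
from statistics import mean


def remove_block_effect(values, blocks):
    """Subtract each block's deviation from the grand mean (balanced designs only)."""
    grand_mean = mean(values)
    block_mean = {
        b: mean(v for v, vb in zip(values, blocks) if vb == b) for b in set(blocks)
    }
    # block effect = block mean - grand mean; remove it from every value in that block
    return [v - (block_mean[b] - grand_mean) for v, b in zip(values, blocks)]


print(remove_block_effect([10, 12, 14, 16], ["d1", "d1", "d2", "d2"]))
```

Here the values measured on days `d1, d1, d2, d2` become `[12, 14, 12, 14]`: the day effect of 2 units is removed while the difference within each day is kept.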

Question 15

Q

The rule for blocking: fix over the blocks, but … within the blocks

Answer

A

randomize

Question 16

Q

Replication

Answer

A

-Replicate measurements
-Repeat the analysis to reduce the influence of biological and/or analytical variation

Question 17

Q

Types of replication

Answer

A

-Measure more individuals per group
-Repeat treatment for an individual
-Measure sample multiple times

Question 18

Q

The mean is better estimated when more measurements are performed. Why?

Answer

A

Because the influence of outliers and chance (random error) becomes smaller
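A small simulation illustrates this (Python; hypothetical normally distributed measurements with mean 10 and SD 2): the spread of the estimated mean shrinks as n grows and follows SD / sqrt(n).

```python
import random
from statistics import mean, stdev


def spread_of_means(n, trials=2000, mu=10.0, sd=2.0, seed=42):
    """SD of `trials` sample means, each based on n simulated measurements."""
    rng = random.Random(seed)
    means = [mean(rng.gauss(mu, sd) for _ in range(n)) for _ in range(trials)]
    return stdev(means)


for n in (5, 20, 80):
    # simulated spread of the mean vs. the theoretical SEM = SD / sqrt(n)
    print(n, round(spread_of_means(n), 3), round(2.0 / n ** 0.5, 3))
```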

Question 19

Q

Repeatability

Answer

A

The degree of agreement between measurements conducted on the same sample, in the same location, by the same people
> which value to expect when repeating the data collection from the analysis step onwards

Question 20

Q

Reproducibility

Answer

A

The degree of agreement between measurements conducted on replicate samples, in different locations, by different people
> which value to expect when repeating the data collection from the sample collection onwards, in the same patients

Question 21

Q

Biological variability

Answer

A

Variation between individuals in the same group: differences in offset (baseline level) and effect size, also within individuals between biological experiments
> which value to expect when repeating the data collection from patient selection onwards

Question 22

Q

Analytical variation includes:

Answer

A

-Bias: the mean measured value is not equal to the true value
-Repeatability and reproducibility
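Bias and repeatability can be estimated from replicate measurements of a sample whose true value is known (for example a reference material; the numbers below are made up). A minimal Python sketch; `analytical_performance` is a hypothetical name. Applied to replicates measured on the same sample by the same people, the SD expresses repeatability; applied to replicates from different locations and people, the same calculation expresses reproducibility.

```python
from statistics import mean, stdev


def analytical_performance(replicates, true_value):
    """Return bias, SD and CV (%) of replicate measurements of one sample."""
    m = mean(replicates)
    bias = m - true_value       # systematic error: mean differs from the true value
    sd = stdev(replicates)      # random error between replicates
    cv = 100 * sd / m           # SD relative to the mean, in percent
    return bias, sd, cv


bias, sd, cv = analytical_performance([10.2, 9.8, 10.4, 10.0, 10.1], true_value=9.5)
print(f"bias={bias:.2f}, SD={sd:.3f}, CV={cv:.1f}%")
```

With these made-up replicates the bias is about 0.6 and the SD about 0.22.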

Question 23

Q

What is the number of individuals selected per group based on?

Answer

A

On the statistical power: if there is a real difference between the groups, how often can we detect it?
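The number of individuals per group can be approximated with the usual normal-approximation formula for comparing two means, n = 2 (z_{1-alpha/2} + z_{1-beta})^2 SD^2 / delta^2. The sketch below (Python standard library) assumes a two-sided test, equal SD in both groups, and the common defaults alpha = 0.05 and power = 0.80; the function name is hypothetical. Exact t-test-based calculations give a slightly larger n.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate number of individuals per group to detect a mean difference delta."""
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # two-sided significance threshold
    z_beta = z.inv_cdf(power)             # power = 1 - beta
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


print(n_per_group(delta=1.0, sd=1.0))   # effect size of one SD
print(n_per_group(delta=0.5, sd=1.0))   # half the effect size
```

An effect of one SD needs about 16 individuals per group; halving the effect size roughly quadruples n (63).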

Question 24

Q

How to increase power of a test?

Answer

A

-Increase effect size
-Decrease SDx (standard deviation of the measurements) > improve the measurement
-Decrease SEMx (standard error of the mean) > increase n (more replicates)
-SEM = SD / sqrt(n)
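The same normal approximation can be turned around to show how each lever changes the power (Python sketch; two-sided comparison of two group means with n per group; `approx_power` is a hypothetical name and ignores the negligible opposite tail). The standard error of the difference between two group means is SD * sqrt(2/n).

```python
from statistics import NormalDist


def approx_power(delta, sd, n, alpha=0.05):
    """Approximate power of a two-sided comparison of two group means (n per group)."""
    z = NormalDist()
    se_diff = sd * (2 / n) ** 0.5   # SEM of each group mean is sd / sqrt(n)
    return z.cdf(abs(delta) / se_diff - z.inv_cdf(1 - alpha / 2))


base = approx_power(delta=1.0, sd=1.0, n=16)
print(round(base, 2))
print(round(approx_power(delta=2.0, sd=1.0, n=16), 2))  # larger effect size
print(round(approx_power(delta=1.0, sd=0.5, n=16), 2))  # better measurement (smaller SD)
print(round(approx_power(delta=1.0, sd=1.0, n=32), 2))  # more replicates
```

With delta = 1, SD = 1 and n = 16 the power is about 0.81; doubling the effect size, halving the SD or doubling n each increase it.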

Question 25

Q

Standard error of the mean formula

Answer

A

SEM = SDx / sqrt(n)

Question 26

Q

Power cutoff value

Question 27

Q

What is alpha, beta and the power?

Answer

A

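Alpha: probability that H0 is rejected while it is true, i.e. there is no real difference (type I error; 5% is the usual cutoff, below which we consider the chance low enough to call it a difference)
Beta: probability that H0 is not rejected while it should be rejected (type II error)
Power = 1 - beta: probability of detecting a difference that really exists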

Question 28

Q

Types of design

Answer

A

-Parallel design
-Repeated measures design

Question 29

Q

Parallel design

Answer

A

-Both groups are measured at the same time point
-Each individual receives only one treatment
-Used when the ‘between individual’ variation (variation between individuals of the same group) is small
-Individuals of different groups, measured at the same time point, are compared with a t-test or ANOVA (comparison of means)
-Within-group variation must be much smaller than between-group variation
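For a parallel design the two groups are independent, so the t statistic compares the group means relative to their combined standard error. A minimal Python sketch using Welch's version of the t-test (which does not assume equal variances; this is a swapped-in choice, the card only says t-test or ANOVA); the data are made up. For a p-value, SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same test.

```python
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for two independent groups (parallel design)."""
    se = (variance(group_a) / len(group_a) + variance(group_b) / len(group_b)) ** 0.5
    return (mean(group_a) - mean(group_b)) / se


treated = [5.1, 4.8, 5.6, 5.0, 5.3]   # made-up values, one time point
placebo = [4.2, 4.5, 3.9, 4.4, 4.0]
print(round(welch_t(treated, placebo), 2))
```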

Question 30

Q

The reliability of a parallel design is dependent on the …

Answer

A

variation within a group

Question 31

Q

Parallel designs are used for … studies

Question 32

Q

Repeated measures design

Answer

A

-Every individual gets both treatments, with a time interval between the two treatments
-The same individuals are used for multiple treatments
> values are determined before and after both treatment 1 and treatment 2

Question 33

Q

When is a repeated measures design used?

Answer

A

When the ‘between individual’ variation is large
> the variation between individuals is expected to be large compared to the variation due to the treatment (within individuals)
> each individual serves as its own control, so the between-individual variation is corrected for > less noise and more significant results

Question 34

Q

Which design corrects for the time effect of repeated measures?

Answer

A

Cross-over design: the order of the treatments is assigned at random

Question 35

Q

Which values are compared in a repeated measures design?

Answer

A

The mean of the within-individual differences (e.g. after minus before treatment)
> the correction leads to less noise and more reliability
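The effect of analysing differences can be shown with made-up data in which individuals differ strongly in baseline level (10 to 50) but each increases by 1 or 2 after treatment. A minimal Python sketch: the paired t statistic uses the within-individual differences, while the unpaired (Welch) statistic treats the same numbers as two independent groups, as in a parallel design. Both function names are hypothetical.

```python
from statistics import mean, stdev, variance


def paired_t(before, after):
    """t statistic on the within-individual differences (repeated measures)."""
    diffs = [a - b for a, b in zip(after, before)]
    return mean(diffs) / (stdev(diffs) / len(diffs) ** 0.5)


def unpaired_t(x, y):
    """Welch t statistic treating x and y as independent groups (parallel design)."""
    se = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
    return (mean(y) - mean(x)) / se


before = [10, 20, 30, 40, 50]   # large between-individual differences
after = [11, 22, 31, 42, 51]    # each individual goes up by 1 or 2
print(round(paired_t(before, after), 2), round(unpaired_t(before, after), 2))
```

Both analyses estimate the same mean effect (1.4), but the paired statistic is about 5.7 whereas the unpaired one is about 0.14, because the large between-individual variation stays in the noise of the unpaired comparison.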

Question 36

Q

For which kinds of studies is a repeated measures design used?

Answer

A

Human studies

Question 37

Q

Parallel vs repeated measures

Answer

A

-The estimated effect is equal
-The noise (standard deviation) differs: in a parallel design the between-individual variation remains part of the noise, whereas a repeated measures design removes it

Question 38

Q

In a multivariate data matrix: what are the rows and what are the columns?

Answer

A

Rows: individuals, samples or countries
Columns (variables/features): metabolites, genes; qualitative (e.g. male/female) or quantitative

Question 39

Q

What do these values mean in the multivariate data matrix: NaN, unexpected negative values, 0 values, outliers

Answer

A

-NaN: not a number (missing or undefined value)
-Unexpected negative value: e.g. a negative value for an intensity
-0 values: possibly below the detection limit
-Outliers: values that differ a lot from the range of the other values in the column
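A minimal Python sketch for screening one column (feature) of the data matrix for these values. The outlier rule (more than 5 median absolute deviations from the median) and the function name are illustrative assumptions; the card does not prescribe a specific rule.

```python
import math
from statistics import median


def screen_column(values, outlier_factor=5.0):
    """Flag NaN, negative, zero and outlying values in one feature (column)."""
    finite = [v for v in values if not math.isnan(v)]
    med = median(finite)
    mad = median(abs(v - med) for v in finite)  # median absolute deviation
    flags = {}
    for i, v in enumerate(values):
        if math.isnan(v):
            flags[i] = "NaN (missing)"
        elif v < 0:
            flags[i] = "negative (impossible for an intensity)"
        elif v == 0:
            flags[i] = "zero (below detection limit?)"
        elif mad > 0 and abs(v - med) > outlier_factor * mad:
            flags[i] = "outlier"
    return flags


column = [1.2, 1.0, float("nan"), -0.3, 0.0, 1.1, 9.8, 1.3]  # made-up intensities
print(screen_column(column))
```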

Question 40

Q

Disturbances of a whole sample

Answer

A

-The amount of sample differs
-Some samples are more diluted than others
-The measurement order affects the measured values

Question 41

Q

Dilution can differ between samples, e.g. urine: why is correction needed?

Answer

A

To remove systematic variation between experimental conditions that is unrelated to the biological differences (dilution, mass)
> due to drugs, disease or the day/night rhythm, the amount of urine (and therefore the dilution) can change

Question 42

Q

Metabolite levels are considered … and gene expression values are …

Answer

A

Metabolite levels: quantitative
- Normal distribution assumed
- T-test / ANOVA
Gene expression: counts
- Poisson / negative binomial distribution
- Not symmetric
- Special tests that account for this distribution

Question 43

Q

Sample normalization

Answer

A

Corrects differences between samples (individuals) caused by metabolic differences and dilution differences
> corrected with one correction value per sample

Question 44

Q

Sample normalization corrections

Answer

A

By
-A reference compound naturally present in the sample (e.g. creatinine in urine)
-Total sum or total peak area
> peak area (or height) / total sum
-Dry mass, volume, etc.
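Total-sum and reference-compound normalization can be sketched as follows (Python; hypothetical function names and made-up peak areas). The second sample is the first one diluted twice, and both normalizations remove that dilution. Normalization to dry mass or volume works the same way, dividing by one factor per sample.

```python
def total_sum_normalize(sample):
    """Divide every peak area by the total peak area of the sample."""
    total = sum(sample.values())
    return {feature: area / total for feature, area in sample.items()}


def reference_normalize(sample, reference="creatinine"):
    """Divide every feature by a reference compound present in the sample."""
    ref = sample[reference]
    return {f: area / ref for f, area in sample.items() if f != reference}


urine_1 = {"creatinine": 4.0, "citrate": 2.0, "hippurate": 6.0}   # made-up peak areas
urine_2 = {f: area / 2 for f, area in urine_1.items()}            # same urine, 2x diluted
print(total_sum_normalize(urine_1) == total_sum_normalize(urine_2))   # True
print(reference_normalize(urine_1) == reference_normalize(urine_2))   # True
```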

Question 45

Q

Disadvantages of the correction values

Answer

A

-Total sum of peak areas: problematic when large peaks differ between individuals (which could be biologically relevant), because they change the total and thus all normalized values
-Creatinine: a breakdown product of creatine in muscle, so it depends strongly on muscle mass (male/female, child/adult)
-Volume: the total amount of a compound (= concentration x volume) is used instead of the concentration > problematic when, e.g., women do not empty their bladder completely
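The total-sum problem can be illustrated with made-up numbers (Python sketch): only the large peak increases in the patient, yet after dividing by the total, the unchanged feature `a` appears to drop by half.

```python
def total_sum_normalize(sample):
    """Divide every peak area by the total peak area of the sample."""
    total = sum(sample.values())
    return {f: area / total for f, area in sample.items()}


control = {"large_peak": 80.0, "a": 10.0, "b": 10.0}   # made-up peak areas
patient = {"large_peak": 180.0, "a": 10.0, "b": 10.0}  # only the large peak changed
print(total_sum_normalize(control)["a"])  # 0.1
print(total_sum_normalize(patient)["a"])  # 0.05: 'a' seems halved, but it did not change
```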

Question 46

Q

Disturbances of a single feature of a sample

Answer

A

-Alignment problems due to ageing of the chromatographic column
-Incorrect baseline: not equal to 0
-Not the whole array has the same quality

Question 47

Q

What is an internal standard?

Answer

A

A compound added in fixed amounts to the sample before sample workup

Question 48

Q

Internal standard correction

Answer

A

-The peak height of the internal standard should be equal in all samples (if not, something went wrong with that sample and correction is needed)
-Corrects for variation in sample workup/measurement
-The internal standard is expected to behave similarly to the other features in the sample
-The ratio feature / internal standard is expected to stay constant
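A minimal sketch of an internal-standard correction (Python). It assumes that `is_expected` is a reference IS peak height (for example the mean IS peak over all samples); the function name and numbers are hypothetical. Every feature is scaled by the same factor, which keeps the ratio feature / IS constant.

```python
def is_correct(sample, is_name, is_expected):
    """Scale all peaks so the internal standard (IS) reaches its expected height."""
    factor = is_expected / sample[is_name]   # > 1 if part of the sample was lost
    return {f: height * factor for f, height in sample.items()}


sample = {"IS": 80.0, "glucose": 40.0, "lactate": 16.0}   # made-up peak heights
print(is_correct(sample, "IS", is_expected=100.0))
```

A sample that lost 20% during workup (IS peak 80 instead of 100) is scaled up by 1.25, so glucose goes from 40 to 50 and lactate from 16 to 20.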

Question 49

Q

Quality Control (QC) samples

Answer

A

-To check the variation of the instrument over time
-Because the QC sample is always the same, the measured signal of ALL compounds in it is expected to stay the same
-If the peak values of the QC samples are not constant, something is wrong with the instrument

Question 50

Q

When are QC samples measured?

Answer

A

After every 8-10 study samples

Question 51

Q

How are study samples corrected with QCs?

Answer

A

Trends in the QC signals are used to correct the study samples measured in between the QCs
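A minimal sketch of QC-based drift correction for one metabolite (Python). It assumes simple linear interpolation between neighbouring QC injections and scales everything to the median QC level; published methods often fit a smoother curve (e.g. LOESS) instead. The function name and data are hypothetical, and the function would be applied to each metabolite separately.

```python
from statistics import median


def qc_drift_correct(order, intensities, is_qc):
    """Correct one metabolite for instrument drift using the QC injections.

    order: injection positions in ascending order; intensities: measured values;
    is_qc: True where the injection is a QC (pooled) sample.
    """
    qc_pos = [p for p, q in zip(order, is_qc) if q]
    qc_val = [v for v, q in zip(intensities, is_qc) if q]
    target = median(qc_val)  # level to which all injections are scaled
    corrected = []
    for p, v in zip(order, intensities):
        if p <= qc_pos[0]:          # before the first QC: use the first QC value
            trend = qc_val[0]
        elif p >= qc_pos[-1]:       # after the last QC: use the last QC value
            trend = qc_val[-1]
        else:                       # linear interpolation between the two bracketing QCs
            j = next(k for k in range(1, len(qc_pos)) if qc_pos[k] >= p)
            p0, p1, v0, v1 = qc_pos[j - 1], qc_pos[j], qc_val[j - 1], qc_val[j]
            trend = v0 + (v1 - v0) * (p - p0) / (p1 - p0)
        corrected.append(v * target / trend)
    return corrected


order = list(range(1, 8))
is_qc = [p in (1, 4, 7) for p in order]      # QC at injections 1, 4 and 7
measured = [100, 50, 50, 90, 50, 50, 80]     # made-up values; QC signal drifts down
print([round(x, 1) for x in qc_drift_correct(order, measured, is_qc)])
```

After correction all QC injections sit at the median QC level (90), and study samples measured later in the run, where the QC signal had dropped more, receive a larger upward correction.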

Question 52

Q

What are QC samples?

Answer

A

Pooled samples (combination of all study samples)

Question 53

Q

Which compound is always added to the QC sample?

Answer

A

The internal standard

Question 54

Q

What does the QC check?

Answer

A

Whether the ratio (compound peak / IS peak) is constant

Question 55

Q

QC correction corrects

Answer

A

Between-batch differences and within-batch differences due to drift > correction factors derived from the QCs are applied to the study samples, per metabolite

Question 56

Q

Name three differences between IS and QC

Answer

A

IS: assumed to have the same effect on every other peak
QC: study samples in between two QC samples are corrected
IS: an internal standard is added to every sample
QC: all peaks are present in the QC sample
IS: assumes that if the IS is 2 times higher, all other peaks are 2 times higher too
QC: peak A in the QC sample can go up while peak B stays the same (correction per variable)

Question 57

Q

In large studies IS and QC samples are combined. Why?

Answer

A

For optimal correction of instrumental drift

Question 58

Q

Which corrections should be used when: samples already differ beforehand; for sample workup and instrument variation of the same samples; and for the instrument?

Answer

A

Samples already differ beforehand: normalization with creatinine or total volume
Sample workup and instrument: IS
Instrument: QC

Question 59

Q

What is sample workup?

Answer

A

All steps from the sample to the measurement in the instrument
