HC 2 - Data Analysis and Experimental Design (+Preprocessing) Flashcards
Lecture 2
Data analysis pipeline
Biological question
> Experimental design
-Power analysis
-Treatment design
> Data acquisition
-QC strategy
-Measurement design
> Data pre-processing
-Normalisation
-Quantification
> Metabolite identification
> Statistical Data analysis
-Explorative
-Predictive
-Hypothetical biomarkers
> Biological interpretation
-MSEA
-Pathway analysis
Parts of the experimental design and data collection
-Frame a biological question
> testable with statistical analysis
-Design factor (controlled, drug vs placebo) / observed factor (not controlled, like health vs ill)
> different treatments (levels) / select from predefined groups
-Identify noise factors (confounding)
Confounding noise factors
Other sources of variation that could affect the study, such as sex, medication, or BMI
For a design factor, how are individuals assigned to e.g. placebo or drug?
At random
Design: random, blocks and replicate. Random: Blind vs Double Blind vs Triple Blind
-Blind: assignment to placebo or drug is random, and the individuals do not know which one they get
-Double blind: the researchers also do not know who gets what (neither patients nor researchers know)
-Triple blind: the data analysts also do not know who gets placebo or drug
Type of questions
- Designed: Detection of responsive features (genes, proteins, metabolites) under controlled experimental conditions (perturbation study, causal relationships) > H0: gene unperturbed = gene perturbed > which genes are affected by treatment
- Biomarkers: Detection of biomarkers (observational, difference between patient and control) > H0: gene patient = gene control > we don't know whether the difference is caused by the disease
- Regulation: Identification of regulatory or mechanistic relationships between features > no relationship vs relationship (linear, exponential) > associations, correlations, more explorative analysis > measure whether the correlation between metabolites or genes has changed
Noise factors
Factors that disturb correct estimation of the effect of the experimental factor, such as time, temperature, gender and age.
Controlling noise factors
Taking only one gender or constant temperature
Ways to take not controllable noise factors into account:
Randomization, blocking and replication
Randomization
-Random assignment of treatments to different individuals
-Randomize the order of experiments over time (so that time has no systematic effect)
-Randomize sample over batches/ slides: do not measure all controls in one batch and then the treated samples in a separate batch.
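The randomization steps above can be sketched in a few lines of Python. This is only an illustration: the function names, the subject labels and the fixed seeds are assumptions made for the example, not part of the lecture.

```python
import random

def assign_treatments(individuals, treatments=("placebo", "drug"), seed=0):
    """Randomly assign treatments so that the groups are (nearly) equal in size."""
    rng = random.Random(seed)          # fixed seed only to make the sketch repeatable
    shuffled = list(individuals)
    rng.shuffle(shuffled)
    # Deal the shuffled individuals over the treatments in turn.
    return {ind: treatments[i % len(treatments)] for i, ind in enumerate(shuffled)}

def randomize_run_order(samples, seed=0):
    """Return the samples in a random measurement order (over time or batches)."""
    rng = random.Random(seed)
    order = list(samples)
    rng.shuffle(order)
    return order

# Hypothetical subjects, used only for illustration.
individuals = [f"subject_{i}" for i in range(1, 9)]
assignment = assign_treatments(individuals)
run_order = randomize_run_order(individuals, seed=1)
```

Because the run order is shuffled independently of the treatment, controls and treated samples end up mixed over batches instead of being measured in separate batches.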
When are blocks of experiments made?
If:
-Not all experiments can be done in one day
-Measured levels could be different for specific groups > e.g. men and women because that is a confounding factor
Blocking over days
Fix the samples over days: same amount of cases and controls each day
> Randomize samples within days
> ignore the effect of day
Blocking over groups
Fix treated/controls equal over groups (men/women)
> Randomize treatment / control within group
> ignore the effect of groups
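A minimal sketch of blocked randomization, assuming two treatments and blocks whose size can be split equally. The function name, the block labels and the member labels are hypothetical; a block can be a day or a group such as men/women.

```python
import random

def blocked_assignment(blocks, treatments=("control", "treated"), seed=0):
    """Within every block, assign each treatment equally often, in random order.

    `blocks` maps a block label (a day, or a group such as men/women)
    to the list of individuals in that block.
    """
    rng = random.Random(seed)
    assignment = {}
    for block, members in blocks.items():
        if len(members) % len(treatments) != 0:
            raise ValueError(f"block {block!r} cannot be split equally")
        # Fixed over the blocks: the same number of each treatment per block.
        labels = list(treatments) * (len(members) // len(treatments))
        # Randomized within the block.
        rng.shuffle(labels)
        for member, label in zip(members, labels):
            assignment[member] = (block, label)
    return assignment

# Hypothetical example data.
blocks = {"men": ["m1", "m2", "m3", "m4"], "women": ["w1", "w2", "w3", "w4"]}
design = blocked_assignment(blocks)
```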
Which effect is stated irrelevant when blocking, and has to be removed?
The block effect (the difference between blocks)
> correction > remove average block effect from data
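One simple way to remove the average block effect is to centre every block on the overall mean. The sketch below assumes a balanced design (each block contains the same mix of cases and controls); otherwise part of the treatment effect would be removed as well. The function name and the numbers are made up for illustration.

```python
from statistics import mean

def remove_block_effect(values, block_labels):
    """Subtract each block's mean and add back the overall mean."""
    grand_mean = mean(values)
    block_means = {
        b: mean(v for v, lab in zip(values, block_labels) if lab == b)
        for b in set(block_labels)
    }
    return [v - block_means[b] + grand_mean for v, b in zip(values, block_labels)]

# Hypothetical measurements: day2 reads about 10 units higher than day1.
values = [10.0, 12.0, 20.0, 22.0]
days = ["day1", "day1", "day2", "day2"]
corrected = remove_block_effect(values, days)
```

After the correction both days have the same mean, while the differences within each day are unchanged.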
The rule for blocking: fix over the blocks, but … within the blocks
randomize
Replication
-Replicate measurements
-Repeat analysis to reduce the influence of biological and/or analytical variation on the estimate
Types of replication
-Measure more individuals per group
-Repeat treatment for an individual
-Measure sample multiple times
The mean is better estimated when more measurements are performed. Why?
Because the influence of outliers and coincidence becomes smaller
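A small simulation can make this concrete: the spread of the estimated mean shrinks roughly with the square root of the number of measurements. The sketch assumes normally distributed measurements with standard deviation 1; the function name and all parameter values are illustrative choices.

```python
import random
from statistics import mean, stdev

def spread_of_sample_means(n, repeats=2000, mu=0.0, sigma=1.0, seed=0):
    """Standard deviation of the sample mean over many simulated experiments."""
    rng = random.Random(seed)
    means = [mean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(repeats)]
    return stdev(means)

spread_small = spread_of_sample_means(n=4)    # theory: sigma / sqrt(4)  = 0.5
spread_large = spread_of_sample_means(n=64)   # theory: sigma / sqrt(64) = 0.125
```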
Repeatability
The degree of agreement between measurements conducted on the same sample in the same location by the same people
> which value to expect when repeating data collection from the analysis step
Reproducibility
The degree of agreement between measurements conducted on replicate samples in different locations by different people.
> which value to expect when repeating data collection from the sample collection from the same patients
Biological variability
Variation between individuals in the same group: offset and effect-size within individuals between biological experiments
> which value to expect when repeating the data collection from patient selection
Analytical variation includes:
-Bias: mean value is not equal to actual value
-Repeatability and reproducibility
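As a rough illustration, bias and spread can be estimated from replicate measurements of a sample with a known value. The replicate values and the true value below are invented for the example, and the standard deviation is used here as a simple stand-in for repeatability.

```python
from statistics import mean, stdev

def bias_and_spread(measurements, true_value):
    """Return (bias, spread) for replicate measurements of one sample."""
    bias = mean(measurements) - true_value   # systematic offset from the actual value
    spread = stdev(measurements)             # random variation between replicates
    return bias, spread

# Hypothetical replicates of a reference sample with actual value 10.0.
replicates = [10.4, 10.6, 10.5, 10.3, 10.7]
bias, spread = bias_and_spread(replicates, true_value=10.0)
```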
On what is the number of individuals selected per group based?
On the statistical power cutoff: if there is a difference between groups, how often can we detect it?
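A minimal power-analysis sketch, assuming a two-sided comparison of two equally sized groups and a normal approximation. The target power of 0.8, alpha of 0.05 and the effect size of 0.5 are common but arbitrary choices, not values from the lecture, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(n_per_group, effect_size, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison.

    effect_size is the difference in group means divided by the common SD.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability that the test statistic lands in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

def individuals_needed(effect_size, target_power=0.8, alpha=0.05, max_n=100_000):
    """Smallest number of individuals per group that reaches the target power."""
    for n in range(2, max_n + 1):
        if approx_power(n, effect_size, alpha) >= target_power:
            return n
    raise ValueError("target power not reached within max_n")

n_needed = individuals_needed(effect_size=0.5)
```

Smaller expected effects or noisier measurements lower the effect size, which raises the number of individuals needed per group.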