V7 Flashcards
statistical tests
the principles of microarrays (general info behind it)
spotted microarray: can measure the expression levels of more than 20,000 genes in a single experiment
hybridisation by forming H-bonds between complementary nucleotide base pairs -> DNA spots attached to a solid surface
- samples are labeled with fluorescent dyes
how to set up a microarray plate
- columns are the different samples; each column is given the same sample all the way down
- every row is tested for a different gene
- gene expression levels can then be read off the plate
common microarray workflow
- create oligo-arrays
- acquire samples, extract RNA
- RNA to DNA reverse transcription
- PCR (optional), Cy3 and Cy5 labelling
- hybridisation and scanning
- data storage
- extract expression levels
- data normalisation
- gene expression analysis
- data interpretation
why normalise?
- remove technical variation from noisy data
- assumption: global changes across samples are due to unwanted technical variability
- caveat: removing these differences also has the potential to remove interesting, biologically driven variation
different options for normalisation/standardization
mean centering: new = old - mean(of group)
standard score (z-score)/Student's t-statistic: new = (old - mean) / SD
quantile normalisation: rank based method
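The first two options can be sketched in a few lines of NumPy (the notes reference R, but this Python equivalent shows the same arithmetic; the values are made up for illustration):

```python
import numpy as np

# Hypothetical expression values for one group of samples (made-up numbers)
values = np.array([2.0, 4.0, 6.0, 8.0])

# Mean centering: new = old - mean(group)
centered = values - values.mean()

# Standard score (z-score): new = (old - mean) / SD
zscores = (values - values.mean()) / values.std(ddof=1)

print(centered)  # centered values have mean 0
print(zscores)   # z-scores have mean 0 and SD 1
```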
quantile normalization
- normalises between the samples -> makes them very homogeneous, even with samples from different tissues
- > might remove differences between the samples that naturally occur
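A minimal sketch of rank-based quantile normalisation, assuming a genes x samples matrix. Note the tie handling here is simplified (ties get arbitrary distinct ranks; production implementations such as limma's normalizeQuantiles average tied ranks):

```python
import numpy as np

def quantile_normalize(x):
    """Quantile-normalise the columns (samples) of a genes x samples matrix.

    Each value is replaced by the mean of the values sharing its rank
    across all samples, so every column ends up with an identical
    distribution.
    """
    # Rank of each value within its own column (ties broken arbitrarily)
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)
    # Mean across samples at each rank position
    mean_per_rank = np.sort(x, axis=0).mean(axis=1)
    return mean_per_rank[ranks]

# Made-up 4 genes x 3 samples matrix
m = np.array([[5., 4., 3.],
              [2., 1., 4.],
              [3., 4., 6.],
              [4., 2., 8.]])
print(quantile_normalize(m))
```

After normalisation every sample (column) contains exactly the same set of values, only in a different gene order, which is why naturally occurring between-sample differences can be wiped out.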
ratios
- simplest way to look at differences
- ratio = mean(tissue1)/mean(tissue2) -> the plain ratio is very biased (fold-decreases are squeezed into (0, 1) while fold-increases are unbounded), not advisable
- log2 ratio = log2(mean(tissue1)/mean(tissue2)) -> so one 'tissue' doesn't overtake the other; it is more unbiased (a 2-fold increase and a 2-fold decrease are symmetric: +1 and -1)
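A tiny numeric check of the symmetry argument above (values are made up):

```python
import math

# A 2-fold increase vs. a 2-fold decrease, e.g. mean(tissue1)/mean(tissue2)
up = 4.0 / 2.0    # plain ratio: 2.0
down = 2.0 / 4.0  # plain ratio: 0.5 -- asymmetric around 1

# log2 makes the two changes symmetric around 0
print(math.log2(up), math.log2(down))  # 1.0 -1.0
```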
what defines statistical significance
- if it has been predicted as unlikely to have occurred by chance alone
- measured by probability value (p-value)
- the null hypothesis is rejected if p < 0.05
- the smaller the p value, the larger the significance
student’s t-test
- compare the difference between two groups
- assumes normal distribution of data
- t.test() function in R
one-sided t-test
- alternative hypothesis: group A < group B
- > is more powerful
two-sided t-test
- alternative hypothesis: group A != group B
- > always use this unless you have a specific hypothesis/previous info
one-sample t-test
- tests the hypothesis (H0) that the population mean is equal to a specific value µ0
two sample t-tests
- independent (unpaired) samples
- > e.g. 100 people - 50 control, 50 treatment
- > preferably two groups of equal size and variance
paired sample (“intervention” in the middle)
- > generally preferred
- > “repeated measures” t-test
- > before and after treatment measurements
- > reduces (or eliminates) the effects of confounding factors
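The t-test variants above can be sketched with scipy.stats (the notes use R's t.test(); these are the Python equivalents, run on made-up simulated data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Made-up data: 50 control vs 50 treatment subjects
control = rng.normal(loc=10.0, scale=2.0, size=50)
treatment = rng.normal(loc=12.0, scale=2.0, size=50)

# One-sample: is the population mean equal to mu0 = 10?
t1, p1 = stats.ttest_1samp(control, popmean=10.0)

# Two-sample, independent (unpaired)
t2, p2 = stats.ttest_ind(control, treatment)

# Paired ("repeated measures"): before vs after on the same subjects
before = rng.normal(loc=10.0, scale=2.0, size=30)
after = before + rng.normal(loc=1.0, scale=0.5, size=30)  # intervention effect
t3, p3 = stats.ttest_rel(before, after)

# One-sided alternative (group A < group B)
t4, p4 = stats.ttest_ind(control, treatment, alternative="less")
```

Because the one-sided test spends all of its alpha in one direction, its p-value here is half the two-sided one, which is what "more powerful" means in practice.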
normality assumptions - parametric tests
- assume the input data follows a known distribution
- each of the two populations being compared should follow a normal distribution
- variance of the two populations are also assumed to be equal
- samples should be random and independent
how to test for normality
Shapiro-Wilk test: shapiro.test()
how to handle unequal variance between populations
if variances are not equal: Welch's t-test (R's default)
if equal: use t.test(, var.equal = TRUE)
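The same checks in Python with scipy.stats (the notes use R's shapiro.test() and t.test(); data is made up for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(5.0, 1.0, size=40)
b = rng.normal(5.5, 3.0, size=40)  # deliberately larger variance

# Shapiro-Wilk normality test (R: shapiro.test())
w, p_norm = stats.shapiro(a)  # p > 0.05 -> no evidence against normality

# Variances unequal -> Welch's t-test (equal_var=False; the default in R's t.test)
t_w, p_w = stats.ttest_ind(a, b, equal_var=False)

# Variances assumed equal -> classic Student's t-test
# (R: t.test(..., var.equal = TRUE))
t_s, p_s = stats.ttest_ind(a, a + 0.5, equal_var=True)
```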
normality assumptions - non-parametric tests
- does not assume data follows a certain distribution
- > often rely on rank methods
Tests:
Mann-Whitney U test
wilcox.test()
Kruskal-Wallis test
Friedman test
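All four rank-based tests are available in scipy.stats as well (the R function for the first two is wilcox.test(); the simulated groups below are made up):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
g1 = rng.normal(0.0, 1.0, size=20)
g2 = rng.normal(1.0, 1.0, size=20)
g3 = rng.normal(2.0, 1.0, size=20)

# Mann-Whitney U test: two independent groups (R: wilcox.test(x, y))
u, p_mw = stats.mannwhitneyu(g1, g2)

# Wilcoxon signed-rank test: paired samples (R: wilcox.test(x, y, paired = TRUE))
w, p_wx = stats.wilcoxon(g1, g2)

# Kruskal-Wallis test: 3+ independent groups
h, p_kw = stats.kruskal(g1, g2, g3)

# Friedman test: 3+ repeated measures on the same subjects
chi2, p_f = stats.friedmanchisquare(g1, g2, g3)
```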