V7 Flashcards

statistical tests

1
Q

the principles of microarrays (general background)

A

spotted microarray: can measure the expression levels of more than 20,000 genes in a single experiment

hybridisation: complementary nucleotide base pairs form hydrogen bonds -> the labelled sample binds to the DNA spots attached to a solid surface

  • samples are labeled with fluorescent dyes
2
Q

how to set up a microarray plate

A

columns are the different samples; every spot in a column is given the same sample

  • every row tests a different gene
  • gene expression levels are then read from the plate
3
Q

common microarray workflow

A
  1. create oligo-arrays
  2. acquire samples, extract RNA
  3. reverse transcription of RNA to cDNA
  4. PCR (optional), Cy3 and Cy5 labelling
  5. hybridisation and scanning
  6. data storage
  7. extract expression levels
  8. data normalisation
  9. gene expression analysis
  10. data interpretation
4
Q

why normalise?

A
  • remove technical variation from noisy data
  • assumption: global changes across samples are due to unwanted technical variability
  • removing these differences can also remove interesting, biologically driven variation
5
Q

different options for normalisation/standardization

A

mean centering: new = old - mean(group)
standard score (z-score; related to Student's t-statistic): new = (old - mean) / SD
quantile normalisation: rank-based method
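A minimal Python sketch of the first two options, assuming they are applied to one gene's values across samples. The deck itself uses R; the helper names and expression values here are hypothetical.

```python
# Illustrative Python sketch; helper names and data are hypothetical.
from statistics import mean, stdev

def mean_center(values):
    """Mean centering: subtract the group mean from every value."""
    m = mean(values)
    return [v - m for v in values]

def z_score(values):
    """Standard score: (value - mean) / SD, using the sample SD (n - 1 denominator)."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical expression values of one gene in five samples
expr = [2.0, 4.0, 6.0, 8.0, 10.0]
print(mean_center(expr))  # [-4.0, -2.0, 0.0, 2.0, 4.0]
print(z_score(expr))      # centred on 0 with SD 1
```

Both results are centred on 0. The z-scores additionally have SD 1, so genes with very different expression levels become comparable. In R, scale() performs both steps by default.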

6
Q

quantile normalization

A
  • normalises between the samples -> makes them very homogeneous, even with samples from different tissues
  • > might remove differences between the samples that occur naturally
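To show why quantile normalisation makes samples so homogeneous, here is a Python sketch of the usual procedure: sort each sample, average the values at each rank across samples, then give every gene the averaged value for its rank. It assumes no tied values within a sample. The function name and the 2 x 3 data set are hypothetical.

```python
# Illustrative Python sketch; helper name and data are hypothetical.
def quantile_normalise(samples):
    """Quantile-normalise a list of samples (columns), each a list of gene values.

    Assumes all samples have the same length and no tied values within a sample.
    """
    n_genes = len(samples[0])
    # Mean of the i-th smallest value across all samples
    sorted_cols = [sorted(col) for col in samples]
    rank_means = [sum(col[i] for col in sorted_cols) / len(samples) for i in range(n_genes)]

    normalised = []
    for col in samples:
        # Gene indices ordered from smallest to largest value in this sample
        order = sorted(range(n_genes), key=lambda g: col[g])
        new_col = [0.0] * n_genes
        for rank, gene in enumerate(order):
            new_col[gene] = rank_means[rank]  # gene keeps its rank, gets the shared value
        normalised.append(new_col)
    return normalised

# Hypothetical data: 2 samples x 3 genes
samples = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalise(samples))  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalisation both samples contain exactly the same values (1.5, 3.5, 5.5); only the order of the genes differs. This is how genuine between-sample differences can be erased.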
7
Q

ratios

A
  • simplest way to look at differences
  • ratio = mean(tissue1)/mean(tissue2) -> asymmetric (biased): a 2-fold increase gives 2, but a 2-fold decrease gives 0.5, so not advisable
  • log2 ratio = log2(mean(tissue1)/mean(tissue2)) -> symmetric around 0 (2-fold up = +1, 2-fold down = -1), so neither tissue dominates; less biased
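A short Python check of the symmetry argument, using hypothetical expression values for a 2-fold increase and a 2-fold decrease. The helper names are illustrative only.

```python
# Illustrative Python sketch; helper names and data are hypothetical.
from math import log2
from statistics import mean

def ratio(tissue1, tissue2):
    return mean(tissue1) / mean(tissue2)

def log2_ratio(tissue1, tissue2):
    return log2(mean(tissue1) / mean(tissue2))

# Hypothetical expression values: a 2-fold increase and a 2-fold decrease
up = ([8.0, 8.0], [4.0, 4.0])
down = ([4.0, 4.0], [8.0, 8.0])
print(ratio(*up), ratio(*down))            # 2.0 0.5
print(log2_ratio(*up), log2_ratio(*down))  # 1.0 -1.0
```

The plain ratios sit at unequal distances from 1 (1 and 0.5). The log2 ratios are equally far from 0 in both directions.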
8
Q

what defines statistical significance

A
  • a result is significant if it is unlikely to have occurred by chance alone
  • measured by the probability value (p-value)
  • the null hypothesis (H0) is rejected if p < 0.05
  • the smaller the p-value, the stronger the evidence against H0
9
Q

student’s t-test

A
  • compare the difference between two groups
  • assumes normal distribution of data
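  • t.test() function in R

one-sided (single-sided) t-test

  • hypothesis: group A < group B
  • > is more powerful (if the true difference lies in the hypothesised direction)

two-sided t-test

  • hypothesis: group A != group B
  • > use this by default unless you have a specific directional hypothesis/previous information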
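To see why a one-sided test is more powerful, this Python sketch turns one test statistic into a one-sided and a two-sided p-value. As a simplifying assumption it uses the standard normal distribution (statistics.NormalDist) rather than the t distribution that t.test() uses, which is only a good approximation for large samples. The statistic value -1.8 and the helper name are hypothetical.

```python
# Illustrative Python sketch; helper name and statistic value are hypothetical.
from statistics import NormalDist

def p_values(stat):
    """One- and two-sided p-values for a test statistic.

    Normal approximation (assumption): R's t.test() uses the t distribution instead.
    """
    std_normal = NormalDist()  # mean 0, SD 1
    one_sided = std_normal.cdf(stat)             # H1: group A < group B
    two_sided = 2 * std_normal.cdf(-abs(stat))   # H1: group A != group B
    return one_sided, two_sided

one, two = p_values(-1.8)
print(round(one, 4), round(two, 4))  # 0.0359 0.0719
```

For the same data the one-sided p-value is half the two-sided one. Here, only the one-sided test is significant at 0.05. In R the direction is set with t.test(x, y, alternative = "less"); the default is alternative = "two.sided".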
10
Q

one-sample t-test

A
  • tests the null hypothesis (H0) that the population mean is equal to a specific value µ0
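The statistic behind the one-sample test is t = (mean(x) - µ0) / (s / sqrt(n)), with n - 1 degrees of freedom. Here s is the sample SD. A Python sketch that computes only the statistic (no p-value), using a hypothetical helper and data. In R the corresponding call would be t.test(x, mu = µ0).

```python
# Illustrative Python sketch; helper name and data are hypothetical.
from math import sqrt
from statistics import mean, stdev

def one_sample_t(x, mu0):
    """t statistic and degrees of freedom for H0: population mean == mu0."""
    n = len(x)
    standard_error = stdev(x) / sqrt(n)
    return (mean(x) - mu0) / standard_error, n - 1

# Hypothetical measurements, tested against mu0 = 2
t, df = one_sample_t([1.0, 2.0, 3.0, 4.0, 5.0], mu0=2.0)
print(round(t, 3), df)  # 1.414 4
```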
11
Q

two-sample t-tests

A
  • independent (unpaired) samples
  • > e.g. 100 people: 50 control, 50 treatment
  • > preferably two groups of equal size and variance

paired samples (“intervention” in the middle)

  • > generally preferred
  • > “repeated measures” t-test
  • > before and after treatment measurements
  • > reduces (or eliminates) the effects of confounding factors
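A paired t-test is a one-sample t-test of the per-subject differences against 0. Each subject acts as its own control, so subject-level confounders cancel in the subtraction. A Python sketch with hypothetical before/after values and a hypothetical helper name. In R this should correspond to t.test(after, before, paired = TRUE).

```python
# Illustrative Python sketch; helper name and data are hypothetical.
from math import sqrt
from statistics import mean, stdev

def paired_t(after, before):
    """Paired t statistic: one-sample t-test of the differences against 0."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Hypothetical before/after measurements of the same four subjects
before = [10.0, 12.0, 9.0, 11.0]
after = [12.0, 13.0, 11.0, 14.0]
t, df = paired_t(after, before)
print(round(t, 3), df)  # 4.899 3
```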
12
Q

normality assumptions - parametric tests

A
  • assume the input data follows a known distribution
  • each of the two populations being compared should follow a normal distribution
  • the variances of the two populations are also assumed to be equal
  • samples should be random and independent
13
Q

how to test for normality

A

Shapiro-Wilk test: shapiro.test()

14
Q

how to handle (un)equal variance between populations

A

if variances are not equal: Welch’s t-test (the default in t.test())

if variances are equal: use t.test(x, y, var.equal = TRUE) (classic Student’s t-test)
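The two variants differ in how they estimate the standard error and the degrees of freedom. Welch uses the separate variances s²/n of each group and the Welch-Satterthwaite degrees of freedom. Student pools the variances and uses nx + ny - 2. A Python sketch of both statistics (no p-values), with hypothetical helper names and groups.

```python
# Illustrative Python sketch; helper names and data are hypothetical.
from math import sqrt
from statistics import mean, variance

def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom (unequal variances)."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

def student_t(x, y):
    """Classic Student's t statistic with pooled variance (equal variances assumed)."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / sqrt(pooled * (1 / nx + 1 / ny))
    return t, nx + ny - 2

# Hypothetical groups with clearly different variances (2.5 vs 10)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
print(welch_t(x, y))
print(student_t(x, y))
```

With equal group sizes the two t statistics coincide; only the degrees of freedom differ (about 5.88 vs 8). Fewer degrees of freedom give a larger p-value, so Welch's test is the more cautious choice when variances differ.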

15
Q

normality assumptions - non-parametric tests

A
  • does not assume data follows a certain distribution
  • > often rely on rank methods

Tests:

Mann-Whitney U test
Wilcoxon test: wilcox.test()
Kruskal-Wallis test
Friedman test

16
Q

Mann-Whitney U test

A

2-group Mann-Whitney U test
- wilcox.test(y ~ A): y = numeric measurements, A = binary factor (GroupA/GroupB)
- wilcox.test(x, y): x = numeric group A measurements, y = numeric group B measurements
2-group Wilcoxon signed-rank test
- wilcox.test(x, y, paired = TRUE): where x and y are numeric “repeated measurements”
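The U statistic for the 2-group test comes from ranks. Pool both groups, rank all values (ties get the average rank), sum the ranks of group A and subtract n_A(n_A + 1)/2. A Python sketch with hypothetical helper names and data; as far as I know this matches the W statistic reported by R's wilcox.test(x, y), but the p-value is not computed here.

```python
# Illustrative Python sketch; helper names and data are hypothetical.
def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """U for group x: number of (x, y) pairs where x is larger (ties count 0.5)."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    return rank_sum_x - len(x) * (len(x) + 1) / 2

# Hypothetical measurements for two groups
group_a = [1.1, 3.4, 2.2]
group_b = [5.0, 4.1, 6.3, 2.9]
print(mann_whitney_u(group_a, group_b))  # 1.0
print(mann_whitney_u(group_b, group_a))  # 11.0
```

The two U values always add up to n_A × n_B (here 3 × 4 = 12). A U near 0 or near n_A × n_B means the groups barely overlap.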

17
Q

Kruskal-Wallis test

A
  • one-way ANOVA on ranks
    kruskal.test(y ~ A): y is numeric, A is a factor (many levels)

H0: all groups come from the same population
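The Kruskal-Wallis statistic ranks all values together and compares the rank sums R_i of the k groups: H = 12 / (N(N + 1)) × Σ R_i² / n_i - 3(N + 1). H is compared with a chi-squared distribution with k - 1 degrees of freedom. A Python sketch assuming no tied values; R's kruskal.test() also applies a tie correction. Helper name and data are hypothetical.

```python
# Illustrative Python sketch; helper name and data are hypothetical.
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic; assumes no tied values (no tie correction)."""
    pooled = sorted(v for g in groups for v in g)
    rank_of = {v: r for r, v in enumerate(pooled, start=1)}  # valid only without ties
    n_total = len(pooled)
    sum_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    return 12 / (n_total * (n_total + 1)) * sum_term - 3 * (n_total + 1)

# Hypothetical, perfectly separated groups (factor A with 3 levels)
groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
h = kruskal_wallis_h(groups)
print(round(h, 2), "df =", len(groups) - 1)  # 7.2 df = 2
```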

18
Q

Friedman test

A
  • randomised block design
    friedman.test(y ~ A | B): y = numeric data values, A = grouping factor, B = blocking factor
  • e.g. potato yield (y) of different types of potato plants (A), measured across different fields (B)
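In the Friedman test, ranks are assigned within each block (field), never across blocks. This removes field-to-field differences. With n blocks, k groups and rank sums R_j, the statistic is Q = 12 / (n k (k + 1)) × Σ R_j² - 3n(k + 1). A Python sketch assuming no ties within a block. The helper name and the yields are hypothetical. In R, friedman.test() also accepts a matrix with blocks as rows and groups as columns.

```python
# Illustrative Python sketch; helper name and data are hypothetical.
def friedman_q(blocks):
    """Friedman statistic; rows = blocks (fields), columns = groups (potato types).

    Ranks are assigned within each block; assumes no ties within a block.
    """
    n = len(blocks)      # number of blocks
    k = len(blocks[0])   # number of groups
    rank_sums = [0] * k
    for row in blocks:
        order = sorted(range(k), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    return 12 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * n * (k + 1)

# Hypothetical potato yields: 3 fields (blocks) x 3 potato types (groups)
yields = [
    [20.0, 25.0, 30.0],  # field 1
    [18.0, 22.0, 27.0],  # field 2
    [21.0, 24.0, 29.0],  # field 3
]
print(round(friedman_q(yields), 2))  # 6.0
```

Every field ranks the three types in the same order. The statistic therefore reaches its maximum n(k - 1) = 6, even though absolute yields differ between fields.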
19
Q

Correlation

A
  • is a measure of dependence between two variables
  • R provides the cor() function
  • correlations are useful because they can indicate a predictive relationship that can be exploited in practice

correlation != causation

20
Q

types of correlation

A

Pearson: no transformation; fast, but sensitive to outliers
Spearman: rank-based transformation; slower, but more robust
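The difference in robustness shows up with a single outlier. Spearman is simply Pearson applied to the ranks, so an extreme value only counts as "the largest". A Python sketch assuming no ties; helper names and data are hypothetical. In R the equivalents would be cor(x, y) and cor(x, y, method = "spearman").

```python
# Illustrative Python sketch; helper names and data are hypothetical.
from math import sqrt

def pearson(x, y):
    """Pearson correlation on the raw values."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def ranks(values):
    """1-based ranks; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data with one outlier in y
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 100.0]
print(round(pearson(x, y), 3))   # 0.743
print(round(spearman(x, y), 3))  # 1.0
```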

21
Q

multiple testing and what is accepted as significant

A
  • we test gene expression data for significant differences: does gene A differ significantly between conditions?
  • we perform many of these tests (commonly ~20,000 genes)
  • as the number of comparisons increases, it becomes more likely that the groups being compared will appear to differ in terms of at least one attribute by chance alone
  • to preserve the 1 in 20 threshold (α = 0.05) across all tests, we need to compensate for the number of tests performed
  • simplest correction (Bonferroni)
    p-value < 0.05 / number of tests -> for 20,000 tests: p < 2.5 × 10^-6
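The arithmetic behind this card, as a short Python sketch. It assumes 20,000 tests, none of which reflects a real change. The variable names are illustrative.

```python
# Illustrative Python sketch; variable names are hypothetical.
n_tests = 20_000  # genes tested
alpha = 0.05      # per-test significance threshold (1 in 20)

# If no gene truly changed, about this many would still pass p < 0.05 by chance
expected_false_positives = n_tests * alpha
# Bonferroni: divide alpha by the number of tests
bonferroni_threshold = alpha / n_tests

print(f"{expected_false_positives:.0f}")  # 1000
print(f"{bonferroni_threshold:.1e}")      # 2.5e-06
```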
22
Q

different errors in multiple testing

A

type I error (false positive): calling a gene significantly changed when the difference is due to chance alone; controlled by the Bonferroni correction

type II error (false negative): missing a truly changed gene; reduced by the less conservative Benjamini-Hochberg false discovery rate (FDR) procedure
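To show why Benjamini-Hochberg misses fewer genes, this Python sketch adjusts four hypothetical p-values with both methods. BH sorts the p-values, scales each by n / rank and takes a running minimum from the largest down. Bonferroni multiplies each p-value by n, capped at 1. The helper name is hypothetical. In R these should correspond to p.adjust(pvals, method = "BH") and p.adjust(pvals, method = "bonferroni").

```python
# Illustrative Python sketch; helper name and data are hypothetical.
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up procedure controlling the FDR)."""
    n = len(pvals)
    # Walk from the largest p-value to the smallest
    order = sorted(range(n), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for pos, i in enumerate(order):
        rank = n - pos  # rank in ascending order (1 = smallest p-value)
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four genes
pvals = [0.01, 0.04, 0.03, 0.005]
bonferroni = [min(1.0, p * len(pvals)) for p in pvals]
print([round(p, 3) for p in bh_adjust(pvals)])  # [0.02, 0.04, 0.04, 0.02]
print([round(p, 3) for p in bonferroni])        # [0.04, 0.16, 0.12, 0.02]
```

With BH all four genes stay below 0.05. Bonferroni keeps only two. BH accepts a controlled proportion of false discoveries in exchange for fewer type II errors.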

23
Q

how to adjust the p-value

A

using the p.adjust() function in R

p.adjust(0.0015, "bonferroni", 10)
- p-value = 0.0015, number of tests = 10 -> adjusted p-value = 0.015
adjusted p-values < 0.05 are considered significant

24
Q

how to get free microarray data

A
  • Gene Expression Omnibus (GEO, NCBI)
  • > only storage and retrieval
  • ArrayExpress (EBI)
  • > has the Gene Expression Atlas: curated, re-annotated archive data
  • > storage, retrieval and analysis
  • > different biological conditions across experiments