V7 Flashcards
statistical tests
the principles of microarrays (general info behind them)
spotted microarray: can measure the expression levels of more than 20,000 genes in a single experiment
hybridisation by forming H-bonds between complementary nucleotide base pairs -> DNA spots attached to a solid surface
- samples are labeled with fluorescent dyes
how to set up a microarray plate
columns are the different samples; every column is given the same sample all the way down
- every row is tested for a different gene
- gene expression levels can then be read off the plate
common microarray workflow
- create oligo-arrays
- acquire samples, extract RNA
- RNA to cDNA reverse transcription
- PCR (optional), Cy3 and Cy5 labelling
- hybridisation and scanning
- data storage
- extract expression levels
- data normalisation
- gene expression analysis
- data interpretation
why normalise?
- remove technical variation from noisy data
- assumption: global changes across samples are due to unwanted technical variability
- removing these differences has the potential to also remove interesting biologically driven variation
different options for normalisation/standardization
mean centering: new = old - mean(of group)
standard score (z-score)/Student's t-statistic: new = (old - mean) / SD
quantile normalisation: rank based method
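a minimal R sketch of mean centering and z-scoring (toy values assumed for illustration):
x <- c(5.1, 6.3, 4.8, 7.0, 5.5)     # expression values of one gene across samples
centered <- x - mean(x)             # mean centering: new = old - mean(of group)
zscore <- (x - mean(x)) / sd(x)     # standard score: new = (old - mean) / SD
# scale(x, center = TRUE, scale = TRUE) does the same in one call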
quantile normalization
- normalises between the samples -> makes them very homogeneous, even with samples from different tissues
- > might remove differences between the samples that naturally occur
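a minimal R sketch of the rank-based idea (toy matrix assumed; in practice Bioconductor's preprocessCore::normalize.quantiles does this):
m <- matrix(rexp(12), nrow = 4)                     # toy 4-gene x 3-sample matrix
ranks <- apply(m, 2, rank, ties.method = "first")   # rank values within each sample
sorted <- apply(m, 2, sort)                         # sort each sample
means <- rowMeans(sorted)                           # reference distribution
normalized <- apply(ranks, 2, function(r) means[r]) # assign row means back by rank
# afterwards every sample (column) has exactly the same distribution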
ratios
- simplest way to look at differences
- ratio = mean(tissue1)/mean(tissue2) -> the plain ratio is very biased, not advisable
- log2 ratio = log2(mean(tissue1)/mean(tissue2)) -> so the one 'tissue' doesn't overtake the other - it's less biased
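a quick R illustration with made-up expression values:
t1 <- c(10, 12, 11); t2 <- c(5, 6, 4)   # assumed tissue measurements
mean(t1) / mean(t2)                     # plain ratio: 2.2
log2(mean(t1) / mean(t2))               # log2 ratio: ~1.14
# a 2-fold change up gives +1, a 2-fold change down gives -1: symmetric around 0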
what defines statistical significance
- if it has been predicted as unlikely to have occurred by chance alone
- measured by probability value (p-value)
- the null hypothesis is rejected if p < 0.05
- the smaller the p value, the larger the significance
student’s t-test
- compares the means of two groups
- assumes normal distribution of data
- t.test() function in R
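a minimal sketch of t.test() in R (simulated data assumed):
set.seed(1)
groupA <- rnorm(50, mean = 10, sd = 2)
groupB <- rnorm(50, mean = 11, sd = 2)
t.test(groupA, groupB)   # Welch two-sample t-test by default; reports t, df and p-value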
one-sided t-test
- hypothesis: group A < group B
- > is more powerful
two-sided t-test
- hypothesis: group A != group B
- > always use this unless you have a directional hypothesis/previous info
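how to switch between the two in R, using made-up vectors:
a <- c(9.1, 10.2, 9.8, 10.5); b <- c(10.9, 11.4, 10.8, 12.0)
t.test(a, b, alternative = "less")   # one-sided: H1 is mean(a) < mean(b)
t.test(a, b)                         # two-sided (the default): H1 is mean(a) != mean(b)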
one-sample t-test
- tests the hypothesis (H0) that the population mean is equal to a specific value µ0
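a one-line sketch in R (values assumed):
x <- c(4.8, 5.2, 5.1, 4.9, 5.4)
t.test(x, mu = 5)   # H0: the population mean equals µ0 = 5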
two sample t-tests
- independent (unpaired) samples
- > 100 people - 50 control, 50 treatment
- > preferably two groups of equal size and variance
paired sample (“intervention” in the middle)
- > generally preferred
- > “repeated measures” t-test
- > before and after treatment measurements
- > reduces (or eliminates) the effects of confounding factors
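a minimal paired-vs-unpaired sketch in R (made-up before/after measurements):
before <- c(120, 135, 128, 140, 132)
after <- c(114, 130, 126, 133, 127)    # same subjects after treatment
t.test(before, after, paired = TRUE)   # repeated-measures t-test
# t.test(before, after) without paired = TRUE would wrongly treat them as independent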
normality assumptions - parametric tests
- assume the input data follows a known distribution
- each of the two populations being compared should follow a normal distribution
- variances of the two populations are also assumed to be equal
- samples should be random and independent
how to test for normality
Shapiro-Wilk test: shapiro.test()
how to handle differences in variance between populations
if variances are unequal: Welch's t-test (R's default)
if variances are equal: use t.test(..., var.equal = TRUE)
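a sketch of the full check in R (simulated samples assumed):
set.seed(1)
x <- rnorm(30); y <- rnorm(30, sd = 2)
shapiro.test(x)                  # H0: data are normally distributed
var.test(x, y)                   # F test, H0: the two variances are equal
t.test(x, y)                     # Welch's t-test (R's default)
t.test(x, y, var.equal = TRUE)   # classic Student's t-test, if variances are equal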
normality assumptions - non-parametric tests
- does not assume data follows a certain distribution
- > often rely on rank methods
Tests:
Mann-Whitney U test
wilcox.test()
Kruskal-Wallis test
Friedman test
Mann-Whitney U test
2-group Mann-Whitney U test
- wilcox.test(y~A): y = numeric measurements, A = two-level factor (GroupA/GroupB)
- wilcox.test(x,y): x = numeric group A measurements, y = numeric group B measurements
2-group Wilcoxon signed rank test
- wilcox.test(x, y, paired=TRUE): where x and y are numeric "repeated measurements"
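both calls side by side in R (made-up measurements):
a <- c(1.2, 3.4, 2.2, 5.1); b <- c(2.8, 4.4, 6.0, 3.9)
wilcox.test(a, b)                  # unpaired: Mann-Whitney U test
wilcox.test(a, b, paired = TRUE)   # paired: Wilcoxon signed rank test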
Kruskal-Wallis test
- one-way ANOVA by ranks
kruskal.test(y~A): y is numeric, A is a factor (many levels)
H0: all groups come from the same population
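a minimal sketch in R (toy data assumed):
y <- c(2.1, 2.5, 3.9, 4.2, 1.8, 2.0)
A <- factor(c("a", "a", "b", "b", "c", "c"))
kruskal.test(y ~ A)   # H0: all groups come from the same population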
Friedman test
- randomised block design
friedman.test(y~A|B): y = numeric data values, A is a grouping factor, B is a blocking factor - e.g. potato yield (y) of types of potato plants (A), measured across different fields (B)
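the potato example as a runnable R sketch (yields are made up):
yield <- c(4.2, 4.8, 3.9, 5.1, 5.5, 4.6, 3.8, 4.1, 3.5)
plant <- factor(rep(c("typeA", "typeB", "typeC"), each = 3))   # grouping factor A
field <- factor(rep(c("f1", "f2", "f3"), times = 3))           # blocking factor B
friedman.test(yield ~ plant | field)   # each plant type measured once per field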
Correlation
- is a measure of dependence between two variables
- R provides the cor() function
- correlations are useful because they can indicate a predictive relationship that can be exploited in practice
correlation != causation
types of correlation
pearson - no transformation : fast, but sensitive to outliers
spearman - rank based transformation: slower, but more robust
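a sketch in R showing the outlier sensitivity (simulated data assumed):
set.seed(1)
x <- 1:10
y <- x + rnorm(10, sd = 0.5)     # roughly linear relation
cor(x, y, method = "pearson")    # close to 1
cor(x, y, method = "spearman")   # rank based, also close to 1
y[1] <- 100                      # one outlier against the trend...
cor(x, y)                        # ...wrecks Pearson (can even flip its sign)
cor(x, y, method = "spearman")   # ...while Spearman stays moderate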
multiple testing and what is accepted as significant
- we test gene expression data for significant differences: does gene A significantly differ between conditions?
- we perform many of these tests (commonly 20,000 genes)
- as the number of comparisons increases, it becomes more likely that the groups being compared will appear to differ in terms of at least one attribute
- to preserve the 1-in-20 threshold, compensation for the number of tests performed is needed
- simplest correction: Bonferroni
p-value < 0.05/#tests -> with 20,000 genes: p < 2.5x10^-6
different errors in multiple testing
type I error: calling a gene significantly changed even though it differs just by chance - controlled by the Bonferroni correction
type II error: missing a significantly changed gene - mitigated by the Benjamini-Hochberg false discovery rate procedure
how to adjust the p-value
using the p.adjust function
p.adjust(0.0015, “bonferroni”, 10)
- p-value = 0.0015, # tests = 10
adjusted p-values below 0.05 are considered significant
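the card's example plus a vector of p-values, in R:
p.adjust(0.0015, "bonferroni", 10)        # 0.0015 * 10 = 0.015
p <- c(0.0001, 0.0015, 0.02, 0.04, 0.3)   # assumed raw p-values from 5 tests
p.adjust(p, method = "bonferroni")        # strict: multiplies by the number of tests
p.adjust(p, method = "BH")                # Benjamini-Hochberg false discovery rate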
how to get free microarray data
- Gene Expression Omnibus (NCBI)
- > only storage and retrieval
- ArrayExpress (EBI)
- > has the Gene Expression Atlas: curated, re-annotated archive data
- > storage, retrieval and analysis
- > different biological conditions across experiments