Lecture #9 Flashcards
Case control study
convert the genotype counts into allele counts
(# of T/T * 2) + # of T/C = observed # of T alleles
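The genotype-to-allele conversion above can be sketched in a few lines. This is only an illustration: the function name `allele_counts` and the genotype counts are made up for the example, and a biallelic T/C SNP is assumed.

```python
def allele_counts(n_tt, n_tc, n_cc):
    """Convert genotype counts at a biallelic T/C SNP into allele counts."""
    n_t = 2 * n_tt + n_tc  # each T/T carries two T alleles, each T/C carries one
    n_c = 2 * n_cc + n_tc  # same logic for the C allele
    return n_t, n_c


# Hypothetical genotype counts for 100 cases: 30 T/T, 50 T/C, 20 C/C
case_t, case_c = allele_counts(n_tt=30, n_tc=50, n_cc=20)
print(case_t, case_c)  # 110 T alleles and 90 C alleles (200 alleles in total)
```

The same conversion is applied to the controls, which gives the 2x2 allele table (T vs C, cases vs controls) used for the association test.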
P>0.1
no presumption against the null hypothesis (no significant association)
0.05<P<0.1
low presumption against null hypothesis (marginal association)
P<0.05
strong presumption against null hypothesis (significant association)
P<0.01
very strong presumption against null hypothesis (very significant association)
P value does not measure
the strength of an association
it can be affected by sample size: the bigger the sample, the lower the P value for the same effect
it can be affected by allele frequency
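A small sketch of the sample-size point: two made-up 2x2 allele tables with the same odds ratio (1.5), one ten times larger than the other. It assumes a Pearson chi-square test with 1 degree of freedom and no continuity correction; the helper names `chi2_2x2` and `p_value_1df` and all counts are illustrative.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(chi2):
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# (T in cases, C in cases, T in controls, C in controls), hypothetical counts
small = (60, 40, 50, 50)
large = tuple(10 * x for x in small)  # same proportions, 10x the sample size

for name, table in (("small", small), ("large", large)):
    chi2 = chi2_2x2(*table)
    print(name, "chi2 =", round(chi2, 2), "P =", p_value_1df(chi2))
```

The effect size is identical in both tables, yet only the larger study crosses P<0.05, which is why the P value alone does not tell you how strong the association is.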
Measure of the strength
odds ratio (OR): the increased risk of a phenotype for carriers of a specific genotype/allele compared with non-carriers
OR = (odds of the phenotype in individuals with the genotype/allele) / (odds of the phenotype in individuals without the genotype/allele)
hazard ratio is a similar concept, but is mainly used for survival data
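As a minimal sketch of the OR definition above, using a made-up 2x2 allele table (the function name `odds_ratio` and the counts are assumptions for illustration):

```python
def odds_ratio(case_with, case_without, control_with, control_without):
    """Odds of the phenotype with the allele divided by the odds without it.

    Odds with the allele    = case_with / control_with
    Odds without the allele = case_without / control_without
    Their ratio simplifies to the cross-product below.
    """
    return (case_with * control_without) / (case_without * control_with)


# Hypothetical counts: 60 T and 40 C alleles in cases, 50 T and 50 C in controls
print(odds_ratio(60, 40, 50, 50))  # 1.5
```

Here the T allele has OR = 1.5, i.e. the odds of the phenotype are 1.5 times higher with T than without it.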
OR = 1
no association
OR > 1
potentially increases the risk
OR < 1
potentially decreases the risk
95% confidence interval
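the range of OR values that is expected to contain the true OR with 95% confidence
like the P value for the chi-square (x^2) test, the 95% CI is a statistical measure of the uncertainty in the OR (it is derived from the standard error of the OR estimate)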
if the entire 95% CI is greater than 1
there is a significant risk effect
if 95% CI contains 1
no statistical significance
if the entire 95% CI is less than 1
there is a significant protective effect
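The three CI rules above can be sketched as follows. The lecture does not say how the interval is computed; this example assumes the common log-OR (Woolf) approximation with z = 1.96, and the function names and counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """OR and approximate 95% CI for the table [[a, b], [c, d]].

    Assumption: Woolf's method, where the standard error of ln(OR)
    is sqrt(1/a + 1/b + 1/c + 1/d).
    """
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper


def interpret(lower, upper):
    """Apply the flashcard rules to a 95% CI."""
    if lower > 1:
        return "significant risk effect"
    if upper < 1:
        return "significant protective effect"
    return "no statistical significance"


# Hypothetical allele counts: same OR, two sample sizes
for table in ((60, 40, 50, 50), (600, 400, 500, 500)):
    or_, lower, upper = odds_ratio_ci(*table)
    print(round(or_, 2), round(lower, 2), round(upper, 2), interpret(lower, upper))
```

With the smaller table the interval spans 1, so the OR of 1.5 is not significant; with ten times the data the whole interval sits above 1.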
Correction for P values
multiple testing: because a large # of SNPs are each tested against a single phenotype, there is a much higher probability that some SNPs appear associated with the phenotype just by chance (false positives)
the more SNPs tested, the higher the probability of false positives
Bonferroni correction
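corrected significance threshold: P = 0.05/N (N = total # of SNPs tested)
5x10^-8 is the commonly used corrected significance threshold for GWAS P values (0.05 divided by roughly 1 million independent tests)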
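A quick sketch of the Bonferroni arithmetic. The function name, SNP labels, and P values are invented for illustration; N = 1,000,000 is the round number that reproduces the 5x10^-8 threshold.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Significance threshold after Bonferroni correction: alpha / N."""
    return alpha / n_tests


threshold = bonferroni_threshold(1_000_000)
print(threshold)  # about 5e-08

# Hypothetical P values for three SNPs (labels are placeholders)
p_values = {"snp_A": 3e-9, "snp_B": 2e-6, "snp_C": 0.04}
significant = [snp for snp, p in p_values.items() if p < threshold]
print(significant)  # only snp_A passes the corrected threshold
```

Note that snp_C would count as significant at the uncorrected P<0.05 cutoff but not after correction.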
Doing experiments to test your hypothesis
control is the key (positive control, negative control)
human clinical trials often don't have a negative control for ethical reasons (the comparison is to standard of care instead)
know the dynamic range of your data set (upper limit and lower limit)
to have reliable results, a large sample size is essential (especially for human clinical trials)
Know the distribution of your data
the majority of the datasets we deal with follow a normal (Gaussian) distribution
clinical studies often report the median rather than the mean, for two reasons: it is faster, and pt data may not be normally distributed
testing one concept in multiple biological systems (molecular, cellular, rodents, human iPSCs) will make your conclusion more reliable
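To illustrate the median-versus-mean point for data that are not normally distributed, here is a tiny sketch with made-up survival times (the values are invented, with one long-surviving outlier):

```python
import statistics

# Hypothetical survival times in months; one patient is a strong outlier
survival_months = [3, 4, 5, 6, 7, 8, 60]

mean_survival = statistics.mean(survival_months)
median_survival = statistics.median(survival_months)

print("mean:", round(mean_survival, 1))  # pulled upward by the outlier
print("median:", median_survival)        # 6, the middle patient
```

The single outlier roughly doubles the mean, while the median still describes the typical patient.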