Lecture 9: Drug Discovery I Flashcards
Correlation
Testing genotype-phenotype correlation
Causal
Establishing the cause-effect relationship between the genetic variation and phenotypic variation
Null hypothesis
no association between any allele and drug resistance
Interpretation
The probability of observing an association of this magnitude or larger when there is no true association is < 0.00000000000000039
The association is therefore unlikely to have occurred by chance
Conclusion: the finding is unlikely to occur if there is no association between the T allele and virus persistence, so we reject the null hypothesis and accept the alternative hypothesis. We conclude that there is an association between the T allele and virus persistence/drug resistance
Interpretation of P value
P > 0.1 - no significant association
0.05 < P < 0.1 - marginal (low) association
P < 0.05 - significant association
P < 0.01 - very significant association
However, the P value does not measure the strength of an association
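The P value cut-offs above can be written as a small lookup. This is a minimal sketch: the function name interpret_p is hypothetical, and because the card gives open intervals, assigning P exactly 0.05 or 0.1 to the weaker category is an assumption. As the card notes, the label says nothing about how strong the association is.

```python
def interpret_p(p: float) -> str:
    """Map a P value to the verbal categories on the card (hypothetical helper).

    Assumption: boundary values (P = 0.05, P = 0.1) fall into the weaker category.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("P value must be between 0 and 1")
    if p < 0.01:
        return "very significant association"
    if p < 0.05:
        return "significant association"
    if p < 0.1:
        return "marginal association"
    return "no significant association"


print(interpret_p(0.03))  # significant association
```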
Measure of strength: odds ratio
Increased risk of a phenotype from carrying a specific genotype/allele, compared with patients who do not carry it
Odds ratio
OR:
Odds of phenotype in an individual with the genotype/allele OVER odds of phenotype in an individual without the genotype/allele
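The OR definition above can be computed from a 2x2 table of counts. This sketch assumes a hypothetical layout (a = carriers with the phenotype, b = carriers without, c = non-carriers with, d = non-carriers without) and hypothetical counts; it computes each odds explicitly so it mirrors the card's "odds OVER odds" wording, which simplifies to (a*d)/(b*c).

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio from a 2x2 table (hypothetical layout):

                     phenotype   no phenotype
        carrier          a             b
        non-carrier      c             d
    """
    if 0 in (b, c, d):
        raise ValueError("b, c and d must be non-zero to compute the OR")
    odds_with_allele = a / b      # odds of phenotype in carriers
    odds_without_allele = c / d   # odds of phenotype in non-carriers
    return odds_with_allele / odds_without_allele


# Hypothetical counts: 30/100 carriers vs 10/100 non-carriers show the phenotype
print(odds_ratio(30, 70, 10, 90))  # about 3.86, i.e. OR > 1
```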
Hazard ratio
Similar concept to the odds ratio, but used mainly for survival (time-to-event) data
Odds ratio interpretation
OR = 1 - no association
OR > 1 - potentially increases the risk (“risk allele”)
OR < 1 - potentially decreases the risk (“protective allele”)
95% Confidence interval
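95% CI: the range of OR values compatible with the data, calculated from the standard error of the OR (usually on the log scale); if the study were repeated many times, about 95% of such intervals would contain the true OR
95% CI interpretation
If the whole 95% CI is > 1 (lower limit above 1) - significant risk effect
If the 95% CI contains 1 - no statistical significance
If the whole 95% CI is < 1 (upper limit below 1) - significant protective effect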
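One common way to get a 95% CI for an OR is the Wald (Woolf) interval on the log scale: ln(OR) ± 1.96 × SE, with SE = sqrt(1/a + 1/b + 1/c + 1/d), then exponentiate. The sketch below uses that method as an assumption (other CI methods exist), the same hypothetical 2x2 layout as before, and hypothetical function names, and then applies the three interpretation rules above.

```python
import math


def or_with_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """OR and Wald 95% CI from a 2x2 table (a, b = carriers; c, d = non-carriers)."""
    if 0 in (a, b, c, d):
        raise ValueError("all cells must be non-zero for the Wald interval")
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper


def interpret_ci(lower: float, upper: float) -> str:
    """Apply the card's rules to a CI for the OR."""
    if lower > 1:
        return "significant risk effect"
    if upper < 1:
        return "significant protective effect"
    return "no statistical significance"  # the CI contains 1


or_, lo, hi = or_with_ci(30, 70, 10, 90)  # hypothetical counts
print(round(or_, 2), round(lo, 2), round(hi, 2), interpret_ci(lo, hi))
```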
How to approach candidate gene/SNP
Hypothesis
your best guess
Correction for P values
Because many SNPs are tested against a single phenotype, the probability that some SNPs appear associated with the phenotype purely by chance (false positives) is much higher
The more SNPs tested, the higher the probability of a false positive
Bonferroni correction: corrected significance threshold P = 0.05/N (N = total number of SNPs tested)
In general, 5 × 10^-8 is used as the corrected significance threshold for GWAS P values (equivalent to 0.05/1,000,000 tests)
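The multiple-testing problem and the Bonferroni fix can be checked numerically. This sketch assumes the tests are independent (real SNPs are often correlated, so this is a simplification), and the helper names are hypothetical: with an uncorrected 0.05 threshold, the chance of at least one false positive is 1 - (1 - 0.05)^N, which approaches 1 as N grows, while dividing the threshold by N keeps it at or below 0.05.

```python
def family_wise_error(alpha: float, n_tests: int) -> float:
    """P(at least one false positive) over n independent tests at threshold alpha."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Bonferroni-corrected per-test threshold: alpha / N."""
    return alpha / n_tests


print(family_wise_error(0.05, 100))               # about 0.994 without correction
print(bonferroni_threshold(0.05, 1_000_000))      # 5e-08, the usual GWAS threshold
print(family_wise_error(bonferroni_threshold(0.05, 100), 100))  # just under 0.05
```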
Doing experiments to test hypothesis
Controls are key (positive control, negative control)
Sham - negative control
Standard of care drug - positive control
Human clinical trials often lack a negative control for ethical reasons; the new treatment is compared with the standard of care instead
Know the dynamic range of your data set (upper and lower limits).
A large sample size is essential for reliable results, especially in human clinical trials
Distribution of experiment data
Know the distribution of your data: most of the datasets we deal with follow a “normal distribution”
Clinical studies often report the median rather than the mean
Two reasons: it is faster, and patient data may not be normally distributed
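Why the median is preferred for non-normal patient data can be seen with a skewed sample. The values below are hypothetical (for example, days to recovery with one extreme patient); the single outlier pulls the mean well above the typical patient, while the median stays at the middle value.

```python
import statistics


def summarize(values):
    """Return (mean, median) for a list of numbers."""
    return statistics.mean(values), statistics.median(values)


# Hypothetical right-skewed data: one patient far from the rest
mean, median = summarize([2, 3, 3, 4, 4, 5, 40])
print(round(mean, 2), median)  # 8.71 4
```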
Testing one concept in multiple biological systems (molecular, cellular, rodents, human, iPSCs) makes the conclusion more reliable
Replication is important
A study in one population may not represent other patient populations
Even if the P value is very low, the finding may still be due to chance in a specific sample set
Association studies usually require independent replication in other sample sets, which also increases the total sample size (n); sample size is very important
A genotype-phenotype association does NOT mean
cause-effect relationship
correlation is NOT causal