Finding Disease Genes Flashcards
Candidate Gene Association Study
Hypothesis driven approach
Uses markers to test gene/causal variant indirectly
Depends on a priori biological or positional hypothesis (almost always wrong!)
Fatal flaws lead to false positives
Genome Wide Association Study
Hypothesis Free approach
Rather than look gene by gene (candidate gene association study) we could do whole genome at one time!
Search for SNPs with significantly different allele frequencies in cases versus controls
Genetic linkage study
Hypothesis free approach
Search the genome for segments disproportionately co-inherited along with disease in multiplex families
Assumes affected relatives within a family share disease susceptibility genes “identical by descent”
Exome/Genome Sequencing Study
Sequence
Compare to reference
Look at common anomalies
Single gene sequencing
Sequence hypothesized gene
Most hypotheses wrong
What does genome mapping require?
Polymorphic DNA markers
Do we sequence the entire genome?
No… still too expensive
What do polymorphic DNA markers do for us?
They provide “Sign posts” where we can look at differences
Polymorphic DNA markers are surrogates for what?
Potential disease mutations
What are three commonly used marker types?
Microsatellites
SNPs
CNVs
Gene Mapping: what are physical maps?
Maps that tell us absolute positions - this is here and this is here
Gene Mapping: what are genetic maps?
Relative maps based on recombination - across a whole population roughly how far apart are these two things/sequences from each other
Microsatellites
Simply a repeat sequence in the genome for which the copy number varies
Simple sequence repeats
Used in forensics
Multi-allelic
Single nucleotide polymorphisms
Bi-allelic
Used for association studies
Occurrence/allele frequencies differ in ethnic groups/populations
SNPs occur in local context (haplotype) of surrounding SNPs
How frequent are SNPs?
About 1 per 50-300 bp
SNP haplotypes
Recombination breaks the macro-pattern of polymorphic genotypes on the same chromosome into blocks in which SNP alleles are in linkage disequilibrium (markers within blocks tend to be co-inherited because recombination within blocks is uncommon)
If you genotype enough SNPs to identify a haplotype you can impute other variation that wasn’t genotyped and use this to infer ?
causal variation took place in this haplotype, even though SNP may not be causal variant
Copy Number variants
Common genomic deletions
Bi-allelic
Multi-allelic
Unique
Most not causal for human disease
If we have a common disease allele that has a small effect, what studies are best suited to hunt for the disease gene?
Association
Candidate gene or GWAS
If we have a rare disease allele that has a large effect, which studies are best suited to hunt for the disease gene?
Linkage
Sequencing
(track genes through families using linkage)
To track things that are common but have relatively little effect we use which type of studies?
Association
To track big effect genes that are relatively rare we use which kind of study?
Linkage in families
Hypothesis Driven Studies
Candidate DNA Sequencing
Candidate Gene association
Candidate gene DNA sequencing
Where do we come up with our candidate?
biological or positional
“hit” from GWAS or other mapping method
When do Candidate DNA studies work?
Single gene Mendelian diseases
Candidate DNA sequencing, are most hypotheses correct?
NO! most are wrong!
Which type of study uses markers to test gene/causal variant indirectly?
Candidate gene association studies
Which genetic study is the most common?
Candidate gene association
What do candidate gene association studies depend on?
A prior hypothesis
What are fatal flaws of candidate gene association studies and what do they lead to?
- True multiple-testing correction is impossible
- True ethnic matching of cases and controls is impossible
False positives!
Concept behind Candidate Gene Association study?
Causal disease variation in candidate gene is tagged by local haplotype of polymorphic DNA markers in Linkage Disequilibrium
Depends on Linkage disequilibrium
Candidate gene association studies depend on linkage disequilibrium - in that?
DNA sequence variations close together on the same piece of DNA will tend to not be separated by recombination over long periods and so will be non-randomly co-inherited
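The non-random co-inheritance described above can be put into numbers. The following is a minimal sketch, assuming two bi-allelic markers whose haplotype and allele frequencies are already known. The function name `ld_stats` and the example frequencies are hypothetical, and D and r² are common textbook measures of linkage disequilibrium that these cards do not name explicitly.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, r_squared) for two bi-allelic loci.

    p_ab: frequency of haplotypes carrying allele A at locus 1 and B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    # D is the excess of the AB haplotype over what independence would predict.
    d = p_ab - p_a * p_b
    # r^2 rescales D so that 1 means the two markers are perfect proxies.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2


# Illustrative (made-up) frequencies.
d_linked, r2_linked = ld_stats(0.5, 0.5, 0.5)    # A and B always travel together
d_indep, r2_indep = ld_stats(0.25, 0.5, 0.5)     # A and B are independent
```

An r² near 1 means a genotyped marker is a good stand-in for an untyped variant on the same haplotype; an r² near 0 means the marker says little about it.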
Candidate gene association studies - approach
What kind of study design?
Case control
Candidate gene association studies - (2)
1. genotype marker in candidate gene in cases and controls
2. compare allele frequencies in cases and controls
G.A.S.
Study size?
Hundreds
G.A.S. stats?
Uses simple stats (chi-square, Fisher exact) p
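A chi-square comparison of allele counts in cases and controls might look like the sketch below. It assumes a 2x2 table of allele counts and uses the standard Pearson chi-square with one degree of freedom. The function name `allele_chi_square` and the counts are hypothetical, and a Fisher exact test would be the usual choice when counts are small.

```python
import math


def allele_chi_square(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts.

    Returns (chi2, p_value).
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    row_case = case_a + case_b
    row_ctrl = ctrl_a + ctrl_b
    col_a = case_a + ctrl_a
    col_b = case_b + ctrl_b
    if 0 in (row_case, row_ctrl, col_a, col_b):
        raise ValueError("every row and column needs at least one observation")
    # Shortcut formula for a 2x2 table: n * (ad - bc)^2 / (product of margins).
    chi2 = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / (
        row_case * row_ctrl * col_a * col_b
    )
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up counts: allele A seen 120 times in cases, 90 times in controls.
chi2, p_value = allele_chi_square(120, 80, 90, 110)
```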
Genetic association studies - because we test multiple variants - what must we do?
We must apply multiple-testing correction
G.A.S.
What does an association imply?
Not causation but does imply at least linkage disequilibrium with causal mutation
What is the issue with multiple testing correction in G.A.S.?
Have to take into account every variant of every study ever done - unrealistic
Take that number and divide your significance threshold (p-value cutoff) by the number of variants to get the new significance threshold, which will be much, much lower.
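The division described above is the Bonferroni correction. Here is a minimal sketch, assuming a starting cutoff of 0.05, which is a conventional choice and not a value stated in these cards. The function name `bonferroni_threshold` and the test counts are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance cutoff after Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Assumed starting cutoff of 0.05 (not taken from the cards).
one_variant = bonferroni_threshold(0.05, 1)
hundred_variants = bonferroni_threshold(0.05, 100)
ten_thousand_variants = bonferroni_threshold(0.05, 10_000)
```

The cutoff shrinks in direct proportion to the number of tests, which is why an unknown number of tests (including unpublished ones) makes a true correction unachievable.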
G.A.S.
2 Fatal Flaws…
- True multiple testing correction must include all tests, even those done by others and perhaps never published
- Must ethnically match cases and controls; otherwise, observed differences in allele frequencies may reflect different genetic backgrounds of cases and controls, not true disease association - not possible to achieve
Why can’t we ethnically match cases and controls in G.A.S.?
Because even in a homogeneous population, occult population differences (stratification) can lead to false positives
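To see how stratification alone can create an apparent association, consider the sketch below. It assumes two hidden subpopulations with different allele frequencies, and cases and controls that happen to be sampled from them in different proportions. All numbers and the function name `pooled_allele_freq` are hypothetical.

```python
def pooled_allele_freq(mix, freqs):
    """Allele frequency in a sample drawn from several subpopulations.

    mix:   fraction of the sample coming from each subpopulation (sums to 1)
    freqs: allele frequency within each subpopulation
    """
    if len(mix) != len(freqs):
        raise ValueError("mix and freqs must have the same length")
    if abs(sum(mix) - 1.0) > 1e-9:
        raise ValueError("mix fractions must sum to 1")
    return sum(m * f for m, f in zip(mix, freqs))


# Within each subpopulation the allele frequency is identical in cases and
# controls, so the allele has no real effect on disease.
subpop_freqs = [0.8, 0.2]

# Cases are drawn mostly from subpopulation 1, controls mostly from subpopulation 2.
case_freq = pooled_allele_freq([0.8, 0.2], subpop_freqs)
ctrl_freq = pooled_allele_freq([0.2, 0.8], subpop_freqs)
```

Here the pooled frequencies differ sharply between cases and controls even though the allele is unrelated to disease within either subpopulation, which is exactly the false-positive pattern the card describes.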
What percent of published (3x confirmed) genetic association studies ultimately appear to be false positives due to stratification and publication bias?
96%
Is genetic linkage analysis hypothesis free?
Yes!
Search genome for segments disproportionately co-inherited along with disease in “multiplex” families
What is the underlying assumption in genetic linkage analyses?
Affected relatives within a family share disease susceptibility genes
“identical by descent”
What traits are best suited for genetic linkage analysis?
Mendelian (uncommon alleles with strong effects)
Genetic linkage analysis for complex traits ?
Less powerful
What can be said to be a search across the genome for marker(s) that co-segregate with disease in families?
Genetic linkage analysis
What does genetic linkage depend on?
Principle depends on recombination - Loci close to each other (marker and gene) on a chromosome tend not to be separated by recombination vs. loci far apart
What is the unit of genetic linkage/recombination?
centiMorgan (cM)
1 cM = 1% recombination between two loci per meiosis
What is the statistical measure of linkage in genetic linkage analysis?
Log of odds score
LOD =
Log10 (likelihood of data if loci linked at _cM / likelihood of data if loci unlinked)
What is the significance level for genetic linkage analysis for LOD score?
A LOD score of 3 or more is considered proof of linkage/gene localization
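The LOD formula can be worked through for the simplest case. This is a sketch only, assuming phase-known, fully informative meioses in which each meiosis can be scored as recombinant or non-recombinant; real linkage software handles much more complex pedigrees. The function name `lod_score` and the counts are hypothetical.

```python
import math


def lod_score(recombinants, meioses, theta):
    """Two-point LOD score at recombination fraction theta.

    Likelihood if linked:   theta^R * (1 - theta)^(N - R)
    Likelihood if unlinked: 0.5^N
    """
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    non_recombinants = meioses - recombinants
    # Work in log10 space to avoid underflow with many meioses.
    log_linked = recombinants * math.log10(theta) + non_recombinants * math.log10(
        1 - theta
    )
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


# Made-up family data: 1 recombinant in 20 meioses, tested at theta = 0.05
# (about 5 cM, using 1 cM = 1% recombination).
lod = lod_score(1, 20, 0.05)
```

With these illustrative counts the score comes out a little above 4, which clears the threshold of 3. At theta = 0.5 the two likelihoods are identical and the LOD is 0.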
How do we localize a gene using genetic linkage analysis?
We can follow ancestral haplotypes of linked marker alleles in each family
Through the generations, recombination events prune the haplotype -> localizing the gene
What are you looking for in genetic linkage analysis?
You are looking for a region of the genome where there is something that seems to be shared among affected relatives that you assume has been inherited from a common ancestor given the family structure
What kind of studies are GWAS?
Case-control
What do GWAS do?
Test hundreds of thousands / millions of markers (SNPs) across the entire genome
What are we looking for in GWAS?
SNPs with significantly different allele frequencies in cases vs. controls
Do we still need to match cases and controls ethnically?
Yes, and we face the same stratification problem; however, with whole-genome data we can now accurately measure and correct for population stratification
In GWAS, are we still faced with the multiple-testing correction issue?
No! We know the number of tests performed genome-wide, so we can perform appropriate multiple-testing corrections (usually assume one million tests, so p
Because we have a huge multiple-testing correction in GWAS, how big must our study be?
Usually at least 1000 cases and controls
What happens if we find a significant association in a GWAS study?
We require confirmation by independent replication by follow up association study of specific SNPs
When are GWAS most effective?
Common alleles with moderate effect sizes (ORs of 1.15 to 1.5)
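An odds ratio (OR) of this size can be computed directly from allele counts. The sketch below assumes a simple 2x2 table of risk-allele and other-allele counts in cases and controls. The function name `allelic_odds_ratio` and the counts are hypothetical.

```python
def allelic_odds_ratio(case_risk, case_other, ctrl_risk, ctrl_other):
    """Odds ratio for carrying the risk allele, cases relative to controls."""
    if case_other == 0 or ctrl_risk == 0:
        raise ValueError("odds ratio is undefined when a denominator count is zero")
    # Odds in cases divided by odds in controls, rearranged as a cross-product.
    return (case_risk * ctrl_other) / (case_other * ctrl_risk)


# Made-up counts: risk allele frequency 0.34 in cases versus 0.30 in controls.
odds_ratio = allelic_odds_ratio(680, 1320, 600, 1400)
```

A four-point difference in allele frequency gives an OR of roughly 1.2 here, which illustrates why large samples are needed to detect effects of this size.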
What limits GWAS?
Sample size
What is the hope of GWAS?
That we will be able to determine genetic architecture of disease - infer the biological pathway - and then separate/recategorize disease based on pathway it follows - which could significantly increase our odds ratio - because it would no longer be diluted by irrelevant pathways that cause same disease
We are trying to tease out genes specific to pathways
Are most GWAS findings coding?
No, most are regulatory in nature, which is good because it may be easier to target and treat
Which investigation combines hypothesis based and hypothesis free approaches?
Deep re-sequencing
What is Deep re-sequencing?
High throughput DNA sequencing
- of Biological candidate genes
- from GWAS signals
- Full genome or exome
What is a problem of Deep Sequencing?
Difficult to distinguish potentially causal variants from non-pathologic
- Prioritize for follow-up functional analysis
Variants of unknown significance
Exome Genome Sequencing
How it works?
Pull down genome
Sequence
Reference
What’s different
Is noise an issue with Exome/Genome Sequencing?
Yes! There is a lot of noise
Exome/Genome Sequencing Filtering Schemes
Make the assumption that affected individuals share similar variants
Sequence affected patients / affected family
Look for something rare
Find something? Look at catalogs
Do LOD score
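The filtering scheme above can be sketched as set operations. This is an illustration under stated assumptions, not a real pipeline: variants are represented by made-up string IDs, the catalog is a plain dictionary of population allele frequencies, the rarity cutoff of 0.01 is an assumed value, and variants absent from the catalog are treated as rare. The function name `filter_candidates` is hypothetical.

```python
def filter_candidates(affected_variants, catalog_freq, max_freq=0.01):
    """Keep variants shared by every affected person and rare in the catalog.

    affected_variants: list of sets of variant IDs, one set per affected person
    catalog_freq:      dict mapping variant ID -> population allele frequency
    max_freq:          assumed rarity cutoff (not taken from the cards)
    """
    if not affected_variants:
        return set()
    # Step 1: assume affected individuals share the causal variant.
    shared = set.intersection(*affected_variants)
    # Step 2: look for something rare; unknown variants count as frequency 0.
    return {v for v in shared if catalog_freq.get(v, 0.0) <= max_freq}


# Made-up data for three affected relatives.
affected = [
    {"v1", "v2", "v3"},
    {"v1", "v2", "v4"},
    {"v1", "v2", "v5"},
]
catalog = {"v1": 0.25, "v3": 0.001}
candidates = filter_candidates(affected, catalog)
```

In this toy data `v1` is shared but common, `v3` is rare but not shared, and only `v2` survives both filters. Surviving candidates would then be checked for co-segregation with disease in the family, for example with a LOD score.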