WEEK 8: COMPLEX DISEASES I Flashcards
What are the 3 main categories for non-mendelian characters/inheritance?
- Extranuclear/maternal inheritance (mitochondrial diseases)
- Parent-of-origin inheritance (IMPRINTING)
- Complex (polygenic) diseases –> the focus of most of the 3 lectures
How many proteins does mitochondrial DNA code for?
13
Does mitochondrial DNA also code for tRNA?
- YES!
What are three features of mitochondrial maternal inheritance?
- Extranuclear
- Cytoplasmic
- Uni-parental (mitochondria from sperm discarded)
What are two examples of mitochondria diseases and what would a pedigree look like?
- LHON–> Leber’s Hereditary Optic Neuropathy
- Leigh Syndrome
- Pedigree would have males and females affected BUT only offspring from affected mother will be affected (i.e. father can inherit it but can’t pass it on)
What is Heteroplasmy?
- Mixed population of normal and mutant mtDNA in the same cell or individual
Can heteroplasmy be inherited from mother to child?
- YES
Can heteroplasmy be tissue specific and evolve with time?
- YES
What effect does heteroplasmy have on the genotype/phenotype relationship?
- It complicates it!
- Extremely variable (complex) presentation
If an affected woman is heteroplasmic, what would occur in the offspring in terms of what they inherit and their phenotype?
- Offspring will inherit BOTH the mutant and normal mtDNA
- This results in the phenotype of the offspring showing HETEROGENEITY, ranging from SEVERE to WT (wild-type) manifestations
What is the difference between Crohn’s Disease and Ulcerative Colitis?
- Crohn’s can affect ANY portion of the GI tract (from mouth –> anus)
- Ulcerative Colitis is ONLY IN THE COLON and is PROGRESSIVE into the whole colon
What are monogenic diseases?
- There is a DIRECT relationship b/w disease gene and disease status
- The genotype and phenotype CLOSELY CORRELATE–> High penetrance
- Mutations CAUSE the disease –> 1 disease, 1 gene
What are polygenic diseases?
- Show STRONG GENETIC PREDISPOSITION but individual genes only MARGINALLY affect disease status
- Genotype and phenotype POORLY CORRELATE–> Low penetrance
- Polymorphisms PREDISPOSE to the disease –> 1 disease, many genes
Which database is mainly used for monogenic diseases, and approximately how many diseases does it contain?
- OMIM –> Approx. 7000 diseases
Which database is mainly used for polygenic diseases, and approximately how many associations does it have?
- GWAS (Genome-Wide Association Studies) Catalog
- Approx. >150 000 associations
What plays the most important role in the threshold zone for complex diseases?
- The environment
What is the phenotype a result of in complex diseases?
- The result of the SUM of all genes/alleles each with a TINY contribution to risk
What is the strongest risk factor in complex diseases?
- THE ENVIRONMENT, e.g. a graph shows that the incidence of IBD (inflammatory bowel disease) tripled –> genes DO NOT evolve that quickly
- The rise can’t be explained by genetics, so it must come from the environment
WHICH TWO METHODS WERE COMPLEX DISEASES CLASSICALLY FIRST DEMONSTRATED BY?
- Family Studies (Clustering)
- Twin Studies (Concordance)
What is the formula for the risk ratio (lambda R)?
- Disease prevalence in relatives R of probands/disease prevalence in population
What does a risk ratio of more than 1 indicate?
- There is an increased risk in the family compared to the population
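Worked example (a minimal sketch of the lambda R formula above; the function name and the prevalences are made-up illustrations, not lecture data):

```python
def risk_ratio(prevalence_in_relatives: float, prevalence_in_population: float) -> float:
    """lambda_R = disease prevalence in relatives R of probands / population prevalence."""
    if prevalence_in_population <= 0:
        raise ValueError("population prevalence must be positive")
    return prevalence_in_relatives / prevalence_in_population


# Hypothetical numbers: 5% of siblings of probands are affected,
# compared against two different (invented) population prevalences.
lambda_low_prev = risk_ratio(0.05, 0.005)   # population prevalence 0.5%
lambda_high_prev = risk_ratio(0.05, 0.02)   # population prevalence 2%

print(f"lambda_R, population prevalence 0.5%: {lambda_low_prev:.1f}")
print(f"lambda_R, population prevalence 2%:   {lambda_high_prev:.1f}")
```

Both values are above 1, so both would suggest familial clustering; with the same prevalence in relatives, the lower population prevalence gives the larger ratio.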
What is the risk ratio involved in?
- Familial Clustering
What does lambda R increase with? (2 things)
- Increasing genetic contribution
- DECREASING population prevalence
Does a rare disease have a higher or lower lambda R?
- HIGHER (lambda R increases as population prevalence decreases)
Does a stronger gene have a higher or lower lambdaR?
- HIGHER
If familial aggregation is detected, does it always mean genetics is the explanation?
- NO
- E.g. women who had hypertension had a 2-fold INCREASED RISK of hypertension –> this had NOTHING to do with genetics and more to do with the environment
What is familial clustering confounded by?
- SHARED environment
WHAT IS AN ALTERNATIVE APPROACH TO FAMILIAL CLUSTERING TO MINIMISE CONFOUNDING?
- Adoption studies –> Prevalence, RR, etc.
- In adoptees vs relatives vs population
What are dizygotic twins and what % of genes do they share ?
= fraternal twins
- From INDEPENDENT fertilizations
- On average 50% of genes in common
What are monozygotic twins and what % of genes do they share?
= identical twins
- From SINGLE fertilization
- 100% of genes in common
What are concordant twins?
- BOTH twins affected (+/+)
What are discordant twins?
- 1 affected and 1 unaffected (+/-)
What does a concordance ratio of >1 mean in terms of genetics?
- That genetics plays a role
What is the concordance ratio?
- Concordance in MZ/Concordance in DZ
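Worked example (a sketch only: it assumes pairwise concordance, i.e. concordant pairs divided by all pairs with at least one affected twin, and the twin counts are invented):

```python
def pairwise_concordance(concordant_pairs: int, discordant_pairs: int) -> float:
    """Share of twin pairs (+/+ or +/-) in which both twins are affected."""
    total_pairs = concordant_pairs + discordant_pairs
    if total_pairs == 0:
        raise ValueError("need at least one twin pair")
    return concordant_pairs / total_pairs


def concordance_ratio(mz_concordance: float, dz_concordance: float) -> float:
    """Concordance in MZ / concordance in DZ."""
    if dz_concordance <= 0:
        raise ValueError("DZ concordance must be positive")
    return mz_concordance / dz_concordance


# Hypothetical twin registry counts (not from the lecture).
mz = pairwise_concordance(concordant_pairs=30, discordant_pairs=70)  # 0.30
dz = pairwise_concordance(concordant_pairs=5, discordant_pairs=95)   # 0.05
ratio = concordance_ratio(mz, dz)

print(f"MZ concordance {mz:.2f}, DZ concordance {dz:.2f}, ratio {ratio:.1f}")
```

A ratio above 1, as with these invented counts, is what the cards read as genetics playing a role.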
What are the two main approaches to GENE MAPPING in genetic diseases?
- Linkage studies (families)
- Association studies (cases and controls, population)
What are linkage and association properties of, respectively?
- LOCI (linkage) and ALLELES (association)
What do linkage and association identify, respectively?
- LINKAGE: biological mechanism for transmission of a trait
- ASSOCIATION: To identify association b/w an allelic variant and a disease
What type of mapping is involved in linkage and association studies, respectively?
- LINKAGE: Coarse mapping (>1cM)
- ASSOCIATION: Fine mapping (<1cM) –> small region
What do linkage and association studies require in terms of resources?
- Family pedigrees (linkage)
- Case-control (more common) or family approaches (association)
What type of markers do linkage studies and association studies usually use, respectively?
- Highly polymorphic markers (linkage)
- Bi-allelic markers (association)
What does linkage analysis in complex diseases look for?
- Looks for genomic segments shared by affected family members
What allele frequency and penetrance do variants that cause mendelian disease have in the population?
- 0.001 allele frequency with HIGH penetrance
What allele frequency and penetrance do variants that are identifiable by resequencing have in the population?
- 0.01 allele frequency with INTERMEDIATE penetrance
What allele frequency and penetrance do variants identified by GWA studies have in the population?
- 0.1 allele frequency with LOW PENETRANCE
What is the formula for relative risk ratio?
- Disease prevalence in relatives R of probands/disease prevalence in the population
What is the standard lod score analysis also known as?
- Parametric analysis
What is the affected sib pair analysis AKA?
- Non-parametric analysis
What is standard lod score (parametric) analysis?
- A model-based analysis that requires the mode of inheritance, penetrance, etc., which are normally unknown!
Is parametric analysis adequate?
- NO
- It is inadequate
Does non-parametric analysis require a model, does it need fine mapping, does it need many pairs, and what is its applicability?
- No model required (don’t have to have any prior knowledge of inheritance)
- NO fine mapping needed
- Needs many pairs
- Limited applicability
When affected individuals share a chromosome region MORE or LESS often than expected by chance, then is that region likely involved in causing the disease?
- YES
What is Identity By State (IBS)?
- Two or more individuals share an identical allele (DNA fragment), e.g. genotypes 2,4 and 2,5 (allele 2 is the shared allele)
What is Identity By Descent (IBD)?
- An identical allele that can be traced back to a common ancestor or relative (parent)
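Worked example (a minimal sketch of counting alleles shared identical by state; the helper name is hypothetical, and telling IBS from IBD would additionally need pedigree information that this code does not use):

```python
from collections import Counter


def ibs_shared(genotype_a: tuple, genotype_b: tuple) -> int:
    """Number of alleles (0, 1 or 2) two diploid genotypes share identical by state."""
    # Counter intersection keeps, per allele, the smaller of the two counts.
    shared = Counter(genotype_a) & Counter(genotype_b)
    return sum(shared.values())


# The example from the card: genotypes 2,4 and 2,5 share allele 2.
one_shared = ibs_shared((2, 4), (2, 5))
two_shared = ibs_shared((2, 4), (4, 2))
none_shared = ibs_shared((1, 3), (2, 5))

print(one_shared, two_shared, none_shared)
```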
What are 3 ways in which IBD (Identity by Descent) analysis can be used and what data are all of these based on?
- To quantify relatedness
- To characterize population structure
- Gene mapping
- (All of these are based on linkage disequilibrium data!)
What does association analysis in complex diseases look for?
- Co-occurrence (association) of alleles and phenotypes, comparing allele frequencies in CASES and CONTROLS
- E.g. allele C is associated with disease
How do we measure association analysis in complex disease and why?
- Via an OR (odds ratio) –> because this does NOT require data on incidence/prevalence in the population (unlike the RR)
What can the OR be calculated from in association analysis (complex disease)?
- From observations/results
What is the risk ratio in association studies and what does it require?
- Pr of event in exposed / Pr of event in unexposed
- requires population data
What is the odds ratio (OR) in association studies?
- Ratio of odds, i.e. the odds (Pr of event / Pr of NO event) in CASES divided by the same odds in CONTROLS
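Worked example (a sketch of the two ratios; function names and all counts are invented for illustration. The OR uses only the study's own case/control counts, while the RR needs the proportion with the event in exposed and unexposed groups):

```python
def odds_ratio(cases_with: int, cases_without: int,
               controls_with: int, controls_without: int) -> float:
    """OR = odds of carrying the allele in cases / odds in controls."""
    odds_cases = cases_with / cases_without
    odds_controls = controls_with / controls_without
    return odds_cases / odds_controls


def exposure_risk_ratio(events_exposed: int, total_exposed: int,
                        events_unexposed: int, total_unexposed: int) -> float:
    """RR = Pr(event | exposed) / Pr(event | unexposed)."""
    return (events_exposed / total_exposed) / (events_unexposed / total_unexposed)


# Hypothetical allele counts: allele C seen 300 times in 1000 case chromosomes
# and 200 times in 1000 control chromosomes.
or_allele_c = odds_ratio(300, 700, 200, 800)

# Hypothetical cohort: 30 of 1000 exposed and 10 of 1000 unexposed develop disease.
rr_cohort = exposure_risk_ratio(30, 1000, 10, 1000)

print(f"OR = {or_allele_c:.2f}, RR = {rr_cohort:.2f}")
```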
What are 3 main tests/studies that look for association (analysis) in complex diseases?
- Candidate gene studies (individual genes, require biological insight/hypothesis)
- Transmission disequilibrium tests (TDT)–>not used that much now (association in the presence of linkage, heterozygous parents needed)
- Genome Wide Association Studies (GWAS)–>Hypothesis FREE, whole genome is tested, require VERY large numbers
What is the most important type of DNA change in our genome?
- SNPs
What is ‘1000 genomes’?
- A catalog of human genetic variation
How many SNPs are there per human genome?
- 4-5 million SNPs
How much of human DNA variation do SNPs account for?
- >99.9%
What is a main method for calculating LD (linkage disequilibrium) & haplotypes?
- Haplotype blocks (contiguous markers in high LD)
In terms of GWAS, what is performing an association scan involving 1 million variants in the genome and a sample of unrelated individuals MORE powerful than?
- More powerful than performing a linkage analysis with a few hundred markers
What was needed for GWAS to become a reality?
- discovery of hundreds of thousands of single nucleotide variants
- Quantification of the correlation (LD) structure of those markers in the human genome
- Ability to accurately genotype hundreds of thousands of markers in an automated and affordable manner
In terms of technology and population, what facilitated the ability to conduct GWAS?
- Production of dense SNP arrays (Illumina) that could genotype many markers in a SINGLE ASSAY
- Biobanks of EITHER population cohorts or case-control samples
What are different risk genes and variants often found to be involved in ?
- the same/shared biological pathways
What can risk pathways be targeted for?
- Therapeutic exploitation (e.g. IL23)
What are SNPs?
- 1bp change in the DNA sequence –> high frequency (every 300-1000bp)
- Can be a nucleotide change (C/T) or deletions and insertions (indels)–> (C/-), (-/C)
Are SNPs bi-allelic and if so, what does that mean?
- YES–> two different forms at each SNP site
- There are minor/major (less/more frequent) alleles
What does MAF stand for in terms of alleles?
- Minor allele frequency
What can the MAF not be more than?
- 0.5
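Worked example (a minimal sketch of computing the MAF from genotype counts at one bi-allelic SNP; the counts are invented). Taking the smaller of the two allele frequencies is why the MAF can never exceed 0.5:

```python
def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """MAF from genotype counts (AA, AB, BB) at a bi-allelic SNP."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)  # each person carries two alleles
    if total_alleles == 0:
        raise ValueError("need at least one genotyped individual")
    freq_a = (2 * n_aa + n_ab) / total_alleles
    # The minor allele is whichever of A or B is less frequent.
    return min(freq_a, 1.0 - freq_a)


# Hypothetical sample of 1000 people: 640 AA, 320 AB, 40 BB.
maf = minor_allele_frequency(640, 320, 40)
print(f"MAF = {maf:.2f}")
```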
Are SNPs in the population common or rare?
- Most in population are RARE
Are SNPs in individuals common or rare?
- Most are COMMON
What is linkage disequilibrium?
- The non-random association of combinations of genetic variants
- Particular combinations of alleles at closely linked loci (SNPs) that occur MORE/LESS often than EXPECTED based on individual allele frequencies
What are haplotypes?
- “Phased” combination of alleles at loci (SNPs) in LINKAGE DISEQUILIBRIUM (the alleles go together more often than expected)
Are there different ways of measuring the strength of linkage disequilibrium b/w two alleles?
- YES!
- D’ and r2
What is D’ not good for?
- Not good with RARE alleles
If D’=1 then what does that mean in terms of haplotypes?
- Then there is at least 1 haplotype missing
What is r2 GOOD for?
- Better for COMPLEX diseases
If r2=0 what does it mean?
- That the two alleles are statistically independent (no LD), e.g. they could be on different chromosomes
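Worked example (a sketch using the standard definitions D = p(AB) - p(A)p(B), D' = |D| / Dmax and r2 = D^2 / (p(A)(1-p(A))p(B)(1-p(B))); the function name and haplotype frequencies are invented for illustration):

```python
def ld_measures(p_ab: float, p_a: float, p_b: float) -> tuple:
    """Return (D, D', r2) for alleles A and B at two bi-allelic loci.

    p_ab is the frequency of the A-B haplotype; p_a and p_b are the
    frequencies of allele A and allele B.
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b
    # Dmax is the largest |D| possible given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical haplotype frequencies: AB=0.50, Ab=0.00, aB=0.25, ab=0.25.
# The Ab haplotype is missing, so D' comes out as 1 while r2 stays below 1.
d, d_prime, r2 = ld_measures(p_ab=0.50, p_a=0.50, p_b=0.75)
print(f"D = {d:.3f}, D' = {d_prime:.2f}, r2 = {r2:.2f}")

# Independent alleles: p(AB) equals p(A) * p(B), so D and r2 are both 0.
d0, d_prime0, r2_0 = ld_measures(p_ab=0.375, p_a=0.50, p_b=0.75)
```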
What is a ‘tag’ SNP?
- A SNP representative of other SNPs in linkage disequilibrium with it –> can be used to infer the genotype at several other SNPs (imputation) –> this reduces the complexity of large-scale genetic studies (GWAS)
What is ‘tag SNP’ applicable to?
- Common variants
What is the HapMap project?
- A project to produce a genome-wide map of common variation –> genotyping 6 million SNPs in four populations in two phases
What are the three types of association analysis in human complex disease?
- Candidate gene studies
- Transmission Disequilibrium Tests (TDTs)
- GWAS
What is a TDT (transmission disequilibrium test)?
- A test of association in the presence of linkage; it needs heterozygous parents, with both parents and probands (siblings) genotyped
- It is the COUNTING of alleles TRANSMITTED versus NOT TRANSMITTED
What can TDTs be used to scan in family studies ?
- The whole genome in family studies
What type of statistics is TDT based on?
- Chi squared
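Worked example (a sketch of the usual McNemar-type TDT statistic, (b - c)^2 / (b + c) with 1 degree of freedom, where b and c count how often heterozygous parents transmit or do not transmit the allele; the counts are invented):

```python
import math


def tdt_chi_square(transmitted: int, not_transmitted: int) -> float:
    """TDT statistic from transmissions by heterozygous parents (1 df)."""
    total = transmitted + not_transmitted
    if total == 0:
        raise ValueError("need at least one informative transmission")
    return (transmitted - not_transmitted) ** 2 / total


def chi_square_p_value_1df(statistic: float) -> float:
    """Upper-tail p-value of a chi-squared distribution with 1 df."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical counts: the allele is transmitted 60 times and not transmitted 40 times.
stat = tdt_chi_square(60, 40)
p_value = chi_square_p_value_1df(stat)
print(f"chi-squared = {stat:.2f}, p = {p_value:.4f}")
```

Under no association, transmitted and non-transmitted counts are expected to be equal, so the statistic measures how far the data depart from a 50:50 split.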
What is a more modern approach that has sort of replaced TDT?
- GWAS
What are GWAS?
- Genome-Wide Association Studies
- Compares the allele frequencies at hundreds of thousands of SNP markers at once –> picks SNPs THROUGHOUT the genome (like a candidate gene test but for 1000s of markers)
What is the FORMAL definition of a GWAS?
- “Hypothesis-free method used to identify regions of the human genome that are associated with a disease or trait of interest, through the analysis of allele frequencies at hundreds of thousands of SNP markers at once, in LARGE population samples or in two groups of cases and controls”
Are GWAS hypothesis free?
- YES!
Once detected, what is association usually confirmed through in GWAS?
- Through replication in INDEPENDENT data sets and/or GWAS meta-analyses
What type of mapping does GWAS require?
- Requires FINE MAPPING and functional characterization for unequivocal identification of risk genes/variants
What are the two main companies using microarray chips in terms of GWAS?
- Illumina
- Affymetrix
What important factors are GWAS dependent on? (5 things)
- (Un)relatedness of individuals (but complete unrelatedness is almost impossible!)
- Genetic architecture–> one population might be different to another population
- Population stratification–> major issue–> If you have 50% Asian cases and 50% European controls there will be a bias of alleles!
- Ascertainment of samples
- Genetic model
What three things are involved in the power of GWAS?
- Sample size
- Effect size (beta/OR)
- Allele frequency
What is the most important factor in GWAS?
- The sample size –> you can’t use only 100 people when you must do 1 million tests
When you do 1 million tests in GWAS what correction do you need to do?
- The Bonferroni Correction to account for type 1 error
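Worked example (a minimal sketch of the Bonferroni correction, assuming the conventional overall type 1 error rate of 0.05, which is an assumption here rather than a value stated on the card):

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the overall type 1 error at alpha."""
    if n_tests < 1:
        raise ValueError("need at least one test")
    return alpha / n_tests


# 1 million tests at an assumed overall alpha of 0.05.
threshold = bonferroni_threshold(alpha=0.05, n_tests=1_000_000)
print(f"per-test p-value threshold: {threshold:.0e}")
```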
Why is sample size the most important factor in GWAS?
- 80% is the minimal level of statistical power
- P<0.00000005 (5E-8)
- OR=1.2
- CTRL prevalence= 0% (every control has been screened for the disease)
- Control:case ratio= 1 (do NOT want to go lower)
- Risk allele frequency= 0.07 –> Means 7% of chromosomes in population will carry allele
- Additive model–> two copies of the allele are worse than one copy of the allele which is worse than 0 copies of the diseased allele.
Are the genes that make you sick the same as those that make you MORE or LESS sick in terms of GWAS?
- NO
- Not necessarily the same
- One set of genes can be responsible for the RISK of disease and OTHER genes can be responsible for the COURSE/PRESENTATION
What is involved in the interpretation of GWAS (3 things)?
- Fine mapping (statistics)
- Coding versus regulatory variants
- Functional annotation (bioinformatics)
- E.g. NOD2 was the first gene identified in IBD (inflammatory bowel disease)