Disease Gene Discovery (Complex Disorders) Flashcards
What are genetic association studies?
Genetic association studies are used to find candidate genes or genome regions that contribute to a specific disease by testing for a correlation between disease status and genetic variation.
This is generally achieved by comparing cohorts of affected and unaffected individuals.

What is the key difference between linkage and association studies?
- Linkage analysis: Based on a ‘within-family’ design, analysing sibling pairs or large pedigrees.
- Association studies: Generally based on a ‘case-control’ design that compares allele frequencies between groups of unrelated cases and unrelated controls.
What types of disease genes are identified in linkage vs association studies?
- Linkage: Individual genes of major effect.
- Association: Genes with weaker effects, so multiple genes of lesser effect can be detected.
What is a ‘complex’ disorder?
- Conditions caused by many contributing factors are called complex or multifactorial disorders.
- Many common medical problems such as heart disease, diabetes, and obesity do not have a single genetic cause—they are likely associated with the effects of multiple genes of low impact in combination with lifestyle and environmental factors.
- Although complex disorders often cluster in families, they do not have a clear-cut pattern of inheritance.
What is the key concept that underpins association studies?
Linkage disequilibrium (LD)
LD is the non-random association of alleles at two or more loci: the alleles occur together at a frequency that differs from the frequency expected by chance.
Explain LD using a mathematical example
If the alleles at locus:
- A are a1 and a2 with frequencies of 0.7 and 0.3
- B are b1 and b2 with frequencies of 0.6 and 0.4
The expected frequencies of the four possible haplotypes would be
- a1b1, 0.42
- a1b2, 0.28
- a2b1, 0.18
- a2b2, 0.12
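If a2b2 was found in a population at a frequency of 0.25 (well above the expected 0.12), this would indicate linkage disequilibrium between a2 and b2. Note that the observed haplotype frequency cannot exceed 0.3, the frequency of a2.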
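The arithmetic in this example can be sketched in a few lines of Python. The expected haplotype frequencies are simply products of the allele frequencies. The observed a2b2 frequency of 0.25 used here is an illustrative assumption (a haplotype cannot be more frequent than either of its alleles, so it must be at most 0.3), and the function name is hypothetical.

```python
from itertools import product

# Allele frequencies from the example above.
locus_a = {"a1": 0.7, "a2": 0.3}
locus_b = {"b1": 0.6, "b2": 0.4}


def expected_haplotypes(freqs_a, freqs_b):
    """Expected haplotype frequencies if alleles at the two loci are independent."""
    return {
        a + b: freqs_a[a] * freqs_b[b]
        for a, b in product(freqs_a, freqs_b)
    }


expected = expected_haplotypes(locus_a, locus_b)
for haplotype, freq in expected.items():
    print(f"{haplotype}: {freq:.2f}")

# Assumed observed frequency of a2b2 (illustrative only, must be <= 0.3).
observed_a2b2 = 0.25
d = observed_a2b2 - expected["a2b2"]
print(f"Observed minus expected for a2b2: {d:.2f}")  # non-zero indicates LD
```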
What factors might cause LD?
- Linkage disequilibrium may result from selective forces (natural, reproductive, etc.) or arise by chance.
- When a new variant arises on a founder chromosome and not much time has elapsed since the mutational event, the new variant will be in linkage disequilibrium with alleles at loci close to the gene.
- If the new variant is disease-causing, then linkage disequilibrium can be a powerful tool for genetic mapping.
How can Recombination affect LD?
- Recombination – Over time, recombination between loci gradually reduces LD as alleles that were shared on an ancestral chromosome are separated. It can therefore be harder to find LD in older populations. Areas of the genome with a lower recombination rate can maintain LD for longer.
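A minimal sketch of this decay, assuming the standard simplified model (random mating, a large population, no selection) in which a fraction r of the remaining LD is lost each generation, where r is the recombination fraction between the two loci. The starting value and the function name are assumptions made for illustration.

```python
def ld_after_generations(d0, r, generations):
    """LD remaining after a number of generations of random mating.

    Each generation, a fraction r (the recombination fraction between
    the two loci) of the remaining disequilibrium is lost.
    """
    return d0 * (1 - r) ** generations


d0 = 0.13  # assumed starting disequilibrium
for r in (0.001, 0.01, 0.5):  # tightly linked, loosely linked, unlinked
    remaining = ld_after_generations(d0, r, generations=100)
    print(f"r = {r}: D after 100 generations = {remaining:.4f}")
```

Tightly linked loci (small r) keep most of their LD after 100 generations, whereas for unlinked loci (r = 0.5) it is effectively gone.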
How can Gene conversion affect LD?
Gene conversion – Regions of the genome with a low recombination rate or markers that are tightly linked can still lose LD via gene conversion. Markers either side of a gene conversion event may still show LD.
How can Selection affect LD?
Selection – If there is a selective advantage to two alleles coexisting, LD between the alleles is more likely to be maintained. A negative selection pressure can remove LD. Selection enables loci on different chromosomes to be in LD if loss of one gives a selective disadvantage when the other is still present. Selective sweeps can also cause alleles to occur together more often than expected.
How can population structure affect LD?
Population structure – Population subdivision can create LD and also maintain it because of the smaller effective population size. Inbreeding and non-random mating are also likely to alter the expected allele distributions.
How can New mutations affect LD?
New mutation – A high new mutation rate at a locus will make it hard to detect LD. Mutations will arise on different ancestral backgrounds: the phenotypic effect may be the same, but the underlying haplotypes will not.
How can genetic drift, gene flow and population history affect LD?
Genetic drift – Random genetic drift can both create and remove LD.
Gene flow – The greater the allele frequency differences between populations, the greater the LD created when the populations join.
Population history – The older the population, the shorter the segments of LD.
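The gene flow point can be illustrated with a small calculation. This sketch assumes two source populations that are each in linkage equilibrium and are mixed in proportions m and 1 - m; the function name and the allele frequencies are hypothetical.

```python
def admixture_ld(m, pop1, pop2):
    """D in a mixture of two populations that are each in linkage equilibrium.

    m     -- proportion of the mixed population drawn from population 1
    pop1  -- (frequency of allele A, frequency of allele B) in population 1
    pop2  -- the same pair of frequencies in population 2
    """
    (pa1, pb1), (pa2, pb2) = pop1, pop2
    # Within each source population the AB haplotype frequency is pA * pB.
    p_ab = m * pa1 * pb1 + (1 - m) * pa2 * pb2
    p_a = m * pa1 + (1 - m) * pa2
    p_b = m * pb1 + (1 - m) * pb2
    return p_ab - p_a * p_b


large_diff = admixture_ld(0.5, (0.9, 0.8), (0.1, 0.2))
small_diff = admixture_ld(0.5, (0.5, 0.5), (0.4, 0.4))
print(f"Large frequency differences: D = {large_diff:.4f}")
print(f"Small frequency differences: D = {small_diff:.4f}")
```

Neither source population has any LD of its own, yet the mixture does, and it is larger when the allele frequencies differ more between the populations.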
What is the difference between ‘genetic linkage’ and ‘linkage disequilibrium’?
- Loci in LD will often be genetically linked (on the same chromosome)
- But LD can occur even if loci are on different chromosomes (because for some reason the alleles on different chromosomes have become non-randomly associated).
- It is also possible for loci to be linked but not in LD (because recombination between the loci has been unrestricted, so the distribution of alleles in the population is as expected).
- Linkage and LD are separate phenomena.
What metric is used to represent LD?
- LD is often expressed as D.
- If D = 0 there is no association between the alleles: haplotype frequencies are as expected from the allele frequencies alone.
- If D does not equal zero there is an association between the alleles.
- D is usually normalised (for example as D′) so that alleles in complete LD have a value of 1.
How is D calculated?
- D = PAB - (PA × PB)
- D is the difference between:
- PAB, the frequency of gametes carrying the pair of alleles A and B at the two loci, and
- the product of the allele frequencies PA and PB.
- In this definition AB is a haplotype and PAB is the haplotype frequency (obtained by phasing genotypes in a population, e.g. via trio analysis).
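The formula for D can be sketched as a short Python function. D′ and r² are two commonly used normalised forms of D; they are not spelled out in these cards and are included here as an assumption about what "complete LD has a value of 1" refers to. The function name and the example frequencies are hypothetical.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return D, D' and r^2 for alleles A and B at two biallelic loci."""
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("allele frequencies must be strictly between 0 and 1")

    d = p_ab - p_a * p_b

    # D' rescales D by the largest magnitude it could have given the
    # allele frequencies (the bound depends on the sign of D).
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max

    r_squared = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared


# Hypothetical haplotype frequency 0.25 with allele frequencies 0.3 and 0.4.
d, d_prime, r_squared = ld_measures(p_ab=0.25, p_a=0.3, p_b=0.4)
print(f"D = {d:.3f}, D' = {d_prime:.3f}, r^2 = {r_squared:.3f}")
```

With this sign convention D′ is negative when D is negative; many tools report its absolute value instead.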
What are the limitations of using LD to identify disease alleles?
- Detecting LD relies on marker allele frequencies deviating from those expected if the marker were independent of the disease allele.
- When an allele frequency approaches 1 or 0 it is statistically very difficult to detect LD with a marker.
- Hence, association studies will often be unable to detect rare disease-causing variants, even if the effect on the phenotype is large.
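One way to see this limitation is to look at the ceiling on r² (a normalised LD measure, assumed here as the relevant quantity) between a disease allele and a common marker allele. The sketch below uses only the bounds that allele frequencies place on D; the function name and frequencies are hypothetical.

```python
def max_r_squared(p_disease, p_marker):
    """Largest possible r^2 between two alleles with the given frequencies."""
    # Largest |D| allowed by the allele frequencies, for either sign of D.
    d_max_positive = min(p_disease * (1 - p_marker), (1 - p_disease) * p_marker)
    d_max_negative = min(p_disease * p_marker, (1 - p_disease) * (1 - p_marker))
    d_max = max(d_max_positive, d_max_negative)
    return d_max ** 2 / (
        p_disease * (1 - p_disease) * p_marker * (1 - p_marker)
    )


marker_freq = 0.30  # assumed frequency of a common array SNP allele
for disease_freq in (0.30, 0.05, 0.01, 0.001):
    ceiling = max_r_squared(disease_freq, marker_freq)
    print(f"disease allele {disease_freq}: max r^2 = {ceiling:.4f}")
```

As the disease allele becomes rarer, the best achievable r² with a common marker shrinks towards zero, so the marker carries very little information about it.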
What type of association study can be used to assess complex disorders?
- When several genes each make a small contribution to disease etiology, linkage analysis within families is no longer useful.
- Genome-wide association studies (GWAS) use technology that assesses markers across the entire genome, e.g. SNP arrays.
- This is a “hypothesis-free” approach that can identify locations anywhere in the genome that are associated with disease (provided the study has sufficient power).
Why are GWAS referred to as hypothesis free?
Prior to the study you don’t need to know anything about:
- Genetics: mode of inheritance (MOI), penetrance, etc.
- Pedigree information
- Approximate genomic location, as the markers are genome-wide.
What are the main stages of performing a GWAS?
- Two groups of participants are recruited to the study: people with the disease (cases) and similar people without it (controls).
- Each participant is genotyped on a genome-wide SNP array.
- If a variant is significantly more frequent in people with the disease, the SNP is said to be “associated” with the disease.
- The associated SNPs are then considered to mark a region of the human genome which influences the risk of disease.
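The per-SNP comparison in these steps can be sketched as a chi-square test on a 2x2 table of allele counts. This is one simple choice of test, not necessarily the one used in any particular study, and the counts and function name below are hypothetical.

```python
import math


def allelic_association(case_alt, case_ref, control_alt, control_ref):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Returns the odds ratio, the chi-square statistic and its p-value.
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = case_alt + case_ref + control_alt + control_ref
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + control_alt, case_ref + control_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    odds_ratio = (case_alt * control_ref) / (case_ref * control_alt)
    # For 1 degree of freedom the chi-square tail probability has a closed form.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


# Hypothetical counts: 1,000 cases and 1,000 controls, two alleles each.
odds_ratio, chi2, p_value = allelic_association(
    case_alt=700, case_ref=1300, control_alt=600, control_ref=1400
)
print(f"OR = {odds_ratio:.2f}, chi2 = {chi2:.2f}, p = {p_value:.2e}")
```

In a real GWAS the same test would be repeated for every SNP on the array, so a much stricter significance threshold is needed than for a single test.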
Why are GWAS referred to as phenotype-first studies?
The participants are classified first by their clinical manifestation(s), as opposed to a genotype-first approach.
Why are GWAS referred to as non-candidate-driven studies?
GWA studies investigate the entire genome, as opposed to methods that specifically test one or a few genetic regions.
When a SNP marker is found to be significantly associated with disease in a GWAS, what four explanations exist for the apparent association?
- The genetic variant measured in the study is indeed important in disease causation
- An association has been found by chance and there is no link at the level of disease causation
- Confounding bias due to population stratification caused by cases and controls being selected from genetically different subsets of a population
- The genetic variant measured in the study is not the true disease-causing variant but is instead in LD with the disease allele.
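The population stratification explanation can be made concrete with a small worked example. The counts below are hypothetical: within each subpopulation the allele is unrelated to disease (odds ratio of 1), but the two subpopulations differ in allele frequency and are sampled unevenly as cases and controls.

```python
def odds_ratio(case_carriers, case_non, control_carriers, control_non):
    """Odds ratio for carrying the allele in cases versus controls."""
    return (case_carriers * control_non) / (case_non * control_carriers)


# Hypothetical counts: (case carriers, case non-carriers,
#                       control carriers, control non-carriers).
# Subpopulation 1: allele common (80% carriers), mostly sampled as cases.
# Subpopulation 2: allele rare (20% carriers), mostly sampled as controls.
subpop_1 = (640, 160, 160, 40)
subpop_2 = (40, 160, 160, 640)

# Pooling ignores which subpopulation each participant came from.
pooled = tuple(x + y for x, y in zip(subpop_1, subpop_2))

print("Subpopulation 1 OR:", odds_ratio(*subpop_1))
print("Subpopulation 2 OR:", odds_ratio(*subpop_2))
print("Pooled OR:", round(odds_ratio(*pooled), 2))
```

The pooled analysis shows a strong apparent association even though there is none within either subpopulation, which is why cases and controls need to be matched for ancestry or the analysis adjusted for it.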
What was the role of the HapMap project in enabling GWAS?
- The International HapMap Project aimed to develop a haplotype map (HapMap) of the human genome (2005, 2007, 2009).
- The goal was to identify millions of new SNP loci across the genome by ‘resequencing’ dozens of trios from populations across the globe.
- This gave array manufacturers the knowledge needed to build affordable platforms for genotyping these SNPs in GWAS.
- It also provided high-resolution information on the common haplotypes in humans.

