Disease Gene Discovery (Complex Disorders) Flashcards

1
Q

What are genetic association studies?

A

Genetic association studies are used to find candidate genes or genome regions that contribute to a specific disease.

They work by testing for a correlation between disease status and genetic variation.

This is generally achieved by comparing cohorts of affected and unaffected individuals.

2
Q

What is the key difference between linkage and association studies?

A
  • Linkage analysis: Based on a ‘within-family’ design that analyses sibling pairs or large pedigrees.
  • Association studies: Generally based on a ‘case-control’ design that compares allele frequencies between groups of unrelated cases and unrelated controls.
3
Q

What types of disease genes are identified in linkage vs association studies?

A
  • Linkage: Individual genes of major effect.
  • Association: Genes with weaker effects, so multiple genes of lesser effect can be detected.
4
Q

What is a ‘complex’ disorder?

A
  • Conditions caused by many contributing factors are called complex or multifactorial disorders.
  • Many common medical problems such as heart disease, diabetes, and obesity do not have a single genetic cause—they are likely associated with the effects of multiple genes of low impact in combination with lifestyle and environmental factors.
  • Although complex disorders often cluster in families, they do not have a clear-cut pattern of inheritance.
5
Q

What is the key concept that underpins association studies?

A

Linkage disequilibrium (LD)

LD is the non-random association of alleles at two or more loci: particular allele combinations occur together more (or less) often than expected by chance.

6
Q

Explain LD using a mathematical example

A

If the alleles at locus:

  • A are a1 and a2 with frequencies of 0.7 and 0.3
  • B are b1 and b2 with frequencies of 0.6 and 0.4

The expected frequencies of the four possible haplotypes would be

  • a1b1, 0.42
  • a1b2, 0.28
  • a2b1, 0.18
  • a2b2, 0.12

If a2b2 was found in a population at a frequency well above the expected 0.12 (for example 0.25; it cannot exceed 0.3, the frequency of a2), this is called linkage disequilibrium between a2 and b2.
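The arithmetic in this card can be checked with a short sketch. The allele frequencies are those given above; the observed a2b2 frequency of 0.25 is a hypothetical value chosen for illustration (a haplotype cannot be more frequent than either of its alleles, so a2b2 can never exceed 0.3 here).

```python
from itertools import product

# Allele frequencies from the card above.
freq_a = {"a1": 0.7, "a2": 0.3}
freq_b = {"b1": 0.6, "b2": 0.4}

# With no LD, each haplotype frequency is simply the product
# of the frequencies of its two alleles.
expected = {
    a + b: round(freq_a[a] * freq_b[b], 4)
    for a, b in product(freq_a, freq_b)
}

# Hypothetical observed frequency (an assumption, not from the card).
observed_a2b2 = 0.25

# Excess over expectation; a non-zero value indicates LD.
excess = round(observed_a2b2 - expected["a2b2"], 4)

print(expected)
print("excess of a2b2 over expectation:", excess)
```

The four expected values reproduce the list above (0.42, 0.28, 0.18, 0.12), and the excess is the quantity usually written as D.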

7
Q

What factors might cause LD?

A
  • Linkage disequilibrium may result from selective forces (natural, reproductive, etc.) or arise by chance.
  • When a new variant arises on a founder chromosome and not much time has elapsed since the mutational event, the new variant will be in linkage disequilibrium with alleles from loci close to the gene.
  • If the new variant is disease causing then linkage disequilibrium can be a powerful tool for genetic mapping.
8
Q

How can Recombination affect LD?

A
  • Recombination – Over time, recombination between loci gradually reduces LD as alleles that were shared on an ancestral chromosome are separated. It can therefore be harder to find LD in older populations. Areas of the genome with a lower recombination rate can maintain LD for longer.
9
Q

How can Gene conversion affect LD?

A

Gene conversion – Regions of the genome with a low recombination rate or markers that are tightly linked can still lose LD via gene conversion. Markers either side of a gene conversion event may still show LD.

10
Q

How can Selection affect LD?

A

Selection – If there is a selective advantage to two alleles coexisting, LD between them is more likely to be maintained. A negative selection pressure can remove LD. Selection enables loci on different chromosomes to be in LD if loss of one gives a selective disadvantage when the other is still present. Selective sweeps can also make particular allele combinations more frequent than expected.

11
Q

How can population structure affect LD?

A

Population structure – Population subdivision can both create LD and maintain it, because of the smaller effective population size. Inbreeding and non-random mating are also likely to alter the expected allele distributions.

12
Q

How can New mutations affect LD?

A

New mutation – A high new mutation rate at a locus makes LD hard to detect. Mutations arise on different ancestral backgrounds, so the phenotypic effect may be the same but the underlying haplotypes will not be.

13
Q

How can genetic drift, gene flow and population history affect LD?

A

Genetic drift – Random genetic drift can both create and remove LD.

Gene flow – The greater the allele frequency differences between populations, the greater the LD created when the populations join.

Population history – The older the population, the shorter the segments of LD.

14
Q

What is the difference between ‘genetic linkage’ and ‘linkage disequilibrium’?

A
  • Loci in LD will often be genetically linked (on the same chromosome)
  • But LD can occur even if loci are on different chromosomes (because for some reason the alleles on different chromosomes have become non-randomly associated).
  • It is also possible for loci to be linked but not in LD (because recombination between the loci has been unrestricted, so the distribution of alleles in the population is as expected).
  • Linkage and LD are separate phenomena.
15
Q

What metric is used to represent LD?

A
  • LD is often expressed as D.
  • If D = 0 there is no association between alleles: haplotypes occur in the population at the frequencies expected from the allele frequencies alone.
  • If D does not equal zero there is an association between alleles.
  • D is usually normalised (reported as D’) so that alleles in complete LD have D’ = 1.
16
Q

How is D calculated?

A
  • D = P_AB - (P_A × P_B)
  • D is the difference between:
  • P_AB, the frequency of gametes carrying the pair of alleles A and B at two loci, and
  • the product of the allele frequencies P_A and P_B.
  • This definition treats AB as a haplotype, with P_AB being the haplotype frequency (obtained by phasing genotypes in a population, for example via trio analysis).
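As a rough sketch, D can be computed directly from the haplotype and allele frequencies defined above. The function name `ld_stats` and the input values are hypothetical. The normalised D’ and r² are standard LD measures that this card does not define, so they are included here as an assumption about what a reader may want alongside D.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for alleles A and B at two biallelic loci."""
    # Raw disequilibrium: observed haplotype frequency minus expectation.
    d = p_ab - p_a * p_b

    # Largest |D| possible given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0

    # Squared correlation between the two loci (0 if either is monomorphic).
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical inputs: allele frequencies 0.3 and 0.4, haplotype frequency 0.25.
d, d_prime, r2 = ld_stats(0.25, 0.3, 0.4)
print(round(d, 3), round(d_prime, 3), round(r2, 3))
```

When the haplotype frequency equals the product of the allele frequencies, D is 0. When the rarer allele is only ever found on the AB haplotype, D’ reaches 1.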
17
Q

What are the limitations of using LD to identify disease alleles?

A
  • Detecting LD relies on marker alleles occurring together with the disease allele more (or less) often than their allele frequencies predict.
  • When the allele frequency approaches 1 or 0 it is statistically very difficult to find LD with a marker.
  • Hence, association studies will often not be able to detect rare disease-causing variants, even if the effect on the phenotype is large.
18
Q

What type of association study can be used to assess complex disorders?

A
  • When several genes make small contributions to disease etiology, linkage within families is no longer useful.
  • Genome-wide association studies (GWAS) use technology that allows markers across the entire genome to be assessed, e.g. a SNP array.
  • This method is a “hypothesis free” approach that enables the identification of all locations in the genome that are associated with disease (given sufficient power).
19
Q

Why are GWAS referred to as hypothesis free?

A

Prior to the study you don’t have to know anything about:

  1. Genetics: mode of inheritance (MOI), penetrance, etc.
  2. Pedigree information
  3. Approximate genomic location, as the markers are genome-wide.
20
Q

What are the main stages of performing a GWAS?

A
  1. Two groups of participants are recruited to the study: people with the disease (cases) and similar people without it (controls).
  2. Each participant is genotyped on a genome-wide SNP array.
  3. If a variant is more frequent in people with the disease, the SNP is said to be “associated” with the disease.
  4. The associated SNPs are then considered to mark a region of the human genome which influences the risk of disease.
21
Q

Why are GWAS referred to as phenotype-first studies?

A

The participants are classified first by their clinical manifestation(s), as opposed to a genotype-first approach.

22
Q

Why are GWAS referred to as non-candidate-driven studies?

A

GWA studies investigate the entire genome, as opposed to methods that specifically test one or a few genetic regions.

23
Q

When a SNP marker is found to be significantly associated with disease in a GWAS, what four explanations exist for the apparent association?

A
  1. The genetic variant measured in the study is indeed important in disease causation
  2. An association has been found by chance and there is no link at the level of disease causation
  3. Confounding bias due to population stratification caused by cases and controls being selected from genetically different subsets of a population
  4. The genetic variant measured in the study is not the true disease-causing variant but is instead in LD with the disease allele.
24
Q

What was the role of the HapMap project in enabling GWAS?

A
  • The International HapMap Project was an organization that aimed to develop a haplotype map (HapMap) of the human genome (phases released in 2005, 2007 and 2009).
  • The goal was to identify millions of new SNP loci across the genome by ‘resequencing’ dozens of trios from populations across the globe.
  • This provided the knowledge array manufacturers needed to build affordable platforms for genotyping these SNPs in GWASs.
  • It also provided high-resolution information on the common haplotypes in humans.
25
Q

What is an LD map and where did LD maps come from?

A
  • Data from the HapMap project enabled the production of LD maps.
  • An LD map is a diagram displaying the haplotype diversity of a chromosomal segment.
  • It can be used to visualise the D’ metric between any two SNPs on a given stretch of chromosome.
  • Contiguous runs of SNPs with high LD are often referred to as LD blocks, where there is evidence of limited haplotype diversity in the population.
26
Q

What is a tag SNP and what are they used for?

A
  • A tag SNP is a representative single nucleotide polymorphism (SNP) in a region of the genome with high linkage disequilibrium that represents a group of SNPs called a haplotype.
  • i.e. a SNP which has very high LD to many other SNPs is known as a tag SNP.
  • Tag SNPs can be used as a proxy for the SNPs they are in high LD with.
  • If the other SNPs aren’t genotyped, then as long as the genotype of the tag SNP is known, the genotypes of the other SNPs can be predicted (imputed) with high confidence.
  • Thus the HapMap project enabled researchers to impute many more genotypes from their data to take forward into GWAS.
27
Q

How is the P-value in a GWAS calculated?

A
  • P is determined statistically from the number of samples tested and the size of the difference between the case and control data sets. It represents the probability of seeing a difference at least this large by chance alone, if there were no true association.
  • The fundamental unit for reporting effect sizes is the odds ratio.
  • When the allele frequency is higher in the case group than in the control group, the odds ratio is greater than 1, and vice versa.
  • A P-value for the significance of the odds ratio is typically calculated using a simple chi-squared test.
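A minimal sketch of the odds ratio and chi-squared calculation for a single SNP follows, assuming a simple 2×2 table of allele counts (risk allele vs. other allele, in cases vs. controls). The function name `allelic_test` and the counts are hypothetical. The P-value uses the fact that a chi-squared statistic with one degree of freedom is the square of a standard normal variable.

```python
import math


def allelic_test(case_risk, case_other, ctrl_risk, ctrl_other):
    """Odds ratio and 1-df chi-squared test on a 2x2 allele-count table."""
    # Odds of carrying the risk allele in cases divided by odds in controls.
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)

    # Pearson chi-squared for a 2x2 table (no continuity correction).
    n = case_risk + case_other + ctrl_risk + ctrl_other
    num = n * (case_risk * ctrl_other - case_other * ctrl_risk) ** 2
    den = ((case_risk + case_other) * (ctrl_risk + ctrl_other)
           * (case_risk + ctrl_risk) * (case_other + ctrl_other))
    chi2 = num / den

    # Upper-tail probability of a chi-squared variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


# Hypothetical counts: 1000 cases and 1000 controls (2000 alleles each).
odds_ratio, chi2, p_value = allelic_test(600, 1400, 500, 1500)
print(round(odds_ratio, 3), round(chi2, 2), p_value)
```

With these made-up counts the risk allele is more frequent in cases, so the odds ratio is above 1. Real analyses typically also adjust for covariates, which this sketch does not attempt.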
28
Q

How is the P-value interpreted and adjusted?

A
  • In a typical significance test, P = 0.05 means there is a 5% chance of observing a result at least this extreme by chance alone, if there were no true association.
  • Results from a GWAS are only considered significant if P is very low, because of the large number of SNPs tested at once.
  • For example, if 1 million SNPs are genotyped and the cut-off for significance is P = 0.05, you might expect 50,000 false-positive results.
  • For loci to be significantly associated with disease, P usually has to be less than 5 × 10^-8.
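The numbers in this card can be reproduced in a few lines. The sketch assumes a Bonferroni-style correction (dividing the per-test cut-off by the number of tests). The card does not name the method, but with 1 million tests it gives the 5 × 10^-8 threshold quoted above. The function names are hypothetical.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of chance 'hits' if no SNP is truly associated."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, alpha):
    """Per-test P-value cut-off that keeps the overall error rate near alpha."""
    return alpha / n_tests


n_snps = 1_000_000
print(expected_false_positives(n_snps, 0.05))
print(bonferroni_threshold(n_snps, 0.05))
```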
29
Q

What is a Manhattan plot and how is it used in GWAS?

A

A Manhattan plot represents the P-values of all SNPs across the genome.

Each dot represents a SNP, with the X-axis showing genomic location and the Y-axis showing the strength of association (usually plotted as -log10 of the P-value, so stronger associations sit higher).
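A small sketch of how the Y-axis values could be derived follows, assuming the common convention of plotting -log10(P) and drawing a line at the genome-wide threshold of 5 × 10^-8. The SNP results are hypothetical, and the drawing itself is left out because it depends on the plotting library used.

```python
import math

GENOME_WIDE_P = 5e-8

# Hypothetical (chromosome, position, P-value) results.
results = [
    ("1", 1_000_000, 0.2),
    ("2", 5_000_000, 3e-9),
    ("6", 32_000_000, 1e-5),
]


def manhattan_points(rows):
    """Return (chromosome, position, -log10(P)) for each SNP."""
    return [(chrom, pos, -math.log10(p)) for chrom, pos, p in rows]


# Height of the significance line on the Y-axis.
threshold_line = -math.log10(GENOME_WIDE_P)

points = manhattan_points(results)
# SNPs whose dots rise above the line are genome-wide significant.
hits = [(chrom, pos) for chrom, pos, y in points if y > threshold_line]
print(hits)
```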

30
Q

What is meant by the ‘power’ of a GWAS?

A
  • The power of a study is the probability that its design will detect a true association.
  • Power is often expressed as the percentage of associations that should be detected when risk alleles are above a certain minor allele frequency (MAF) and odds ratio (OR).
31
Q

What factors can affect the power of a GWAS?

A

The power is affected by:

  • the frequency of the risk allele in the population,
  • the relative risk conferred by the disease-associated allele,
  • LD between the genotyped marker and the true risk allele,
  • sample size,
  • genetic heterogeneity of the sample population.

Most of these variables are not under the control of experimental design.
32
Q

How can the power of a GWAS be improved?

A

Most of the variables that affect power are not under the control of experimental design, but power can be improved by:

  1. Increasing the sample size
  2. Using carefully matched cases and controls
33
Q

What is the common-disease common-variant hypothesis and why is this important?

A
  • The common disease-common variant (often abbreviated CD-CV) hypothesis predicts that common disease-causing alleles, or variants, will be found in all human populations which manifest a given disease.
  • The fundamental role of GWAS is to detect common disease-causing variation.
34
Q

What types of disease alleles are missed with GWAS?

A

Most associations found by GWAS involve common variants that confer only a small increase in disease risk and have little predictive value.

  • In general, common variants do not explain much of the heritable variation in diseases.
  • GWAS will generally not detect rare variants, whether the associated risk is large or small.
35
Q

What are the technical criticisms of GWAS?

A
  • A major technical criticism of GWAS is that the massive number of statistical tests performed creates an unprecedented potential for false-positive results.
  • Lack of well-defined case and control groups
  • Insufficient sample size
  • No control for multiple testing
  • No control for population stratification
  • All of these are common problems leading to false-positive (FP) results in GWAS.
36
Q

What are the more fundamental criticisms of GWAS and how might these issues be solved in the future?

A
  • GWA studies have attracted fundamental criticism because of their assumption that common genetic variation plays a large role in explaining the heritable variation of common disease.
  • Critics argue that, although it could not have been known prospectively, GWA studies were ultimately not worth the expenditure since they only identify common low-risk alleles.
  • Alternative strategies that use whole-genome sequencing (WGS) to detect rare variants may be more effective.