GWAS studies Flashcards

1
Q

What question does a genome-wide association study (GWAS) ask?

A

For each genetic variant tested: is it more common in people with a disease or trait than in people without it?

2
Q

What do GWAS look for?

A

Genome-wide association studies (GWAS) involve testing genetic variants across the genomes of many individuals to identify genotype–phenotype associations.

3
Q

Describe the basic outline of a GWAS.

A

First, assemble a large set of study participants.

Genotype them at all the genomic locations you're interested in.

Calculate the frequency of each genotype in cases and in controls, and from these the odds ratio.

Then calculate the statistical significance of the odds ratio for each variant to decide whether the variant is associated with the trait.
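The last two steps can be sketched for a single variant as below. This is a minimal illustration that assumes a per-allele (allelic) test on a 2x2 table of allele counts with a Pearson chi-square statistic. It is one common choice, not the only GWAS test, and the counts are hypothetical.

```python
import math

def allelic_test(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Per-allele association test on a 2x2 table of allele counts.

    Returns (odds_ratio, chi2, p_value). Assumes all four counts are > 0.
    """
    # Odds of the alternative allele in cases divided by its odds in controls
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    # Pearson chi-square statistic for a 2x2 table (no continuity correction)
    numerator = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2
    denominator = ((case_alt + case_ref) * (ctrl_alt + ctrl_ref)
                   * (case_alt + ctrl_alt) * (case_ref + ctrl_ref))
    chi2 = numerator / denominator
    # Upper-tail probability of a chi-square with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value

# Hypothetical counts: 2,000 cases and 2,000 controls (4,000 alleles in each group)
or_, chi2, p = allelic_test(case_alt=1400, case_ref=2600, ctrl_alt=1200, ctrl_ref=2800)
print(f"OR = {or_:.2f}, chi2 = {chi2:.1f}, p = {p:.1e}")
```

With these made-up counts the odds ratio is about 1.26 and the p-value is around 10^-6. In a GWAS this calculation is repeated for every variant.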

4
Q

What's another term for common variants?

A

Single nucleotide polymorphisms (SNPs)

5
Q

What are the 1000 Genomes Project, gnomAD and TOPMed?

A

Large population studies that have surveyed large numbers of individuals to identify all the positions at which their genomes vary.

Together they suggest there are around 15 million common SNPs in the genome.

6
Q

Why is it a problem that there are about 15 million common SNPs?

What is the solution?

A

Problem: how do you genotype 15 million SNPs in every participant?
Whole-genome sequencing is too expensive.
Microarray technology is cheap, but common commercially available arrays can only type about 1 million variants.

Solution: linkage disequilibrium.

7
Q

What is linkage disequilibrium?

A

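The phenomenon where alleles at different (usually nearby) loci do not assort randomly between generations, so particular combinations of alleles occur together more often than expected by chance.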
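Linkage disequilibrium between two SNPs is usually quantified with the statistics D, D' and r². The sketch below computes them from a list of phased two-SNP haplotypes. It assumes both SNPs are biallelic, have alleles coded 0/1, and are polymorphic in the sample. The example haplotype counts are invented.

```python
def ld_stats(haplotypes):
    """D, D' and r^2 between two biallelic SNPs.

    haplotypes: list of (allele_at_snp1, allele_at_snp2) pairs, alleles coded 0/1.
    Assumes both SNPs are polymorphic in the sample.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n  # frequency of allele 1 at SNP 1
    p_b = sum(h[1] for h in haplotypes) / n  # frequency of allele 1 at SNP 2
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    # D: observed 1-1 haplotype frequency minus the frequency expected under independence
    d = p_ab - p_a * p_b
    # D' rescales D by the largest magnitude it could take given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max
    # r^2: squared correlation between the alleles at the two SNPs
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2

# Hypothetical samples of 100 haplotypes
linked = [(1, 1)] * 30 + [(0, 0)] * 70            # alleles always travel together
unlinked = [(1, 1)] * 9 + [(1, 0)] * 21 + [(0, 1)] * 21 + [(0, 0)] * 49
print(ld_stats(linked))
print(ld_stats(unlinked))
```

The first sample is in complete LD (D' = r² = 1), so one SNP perfectly predicts the other. In the second, the haplotype frequencies equal the products of the allele frequencies, so D and r² are zero.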

8
Q

What is a haplotype?

A

A haplotype is a group of alleles in an organism that are inherited together from a single parent.

Alleles at variants close together on the same chromosome tend to occur together more often than is expected by chance. These blocks of alleles are called haplotypes.

9
Q

What are haplotype blocks and what are they used for?

A

A haplotype block is a region of an organism's genome in which there is little evidence of a history of genetic recombination.

Haplotype blocks have been used to increase the power of QTL (quantitative trait loci) detection in genome-wide association studies (GWAS) and the prediction accuracy of genomic selection.

10
Q

What are tag SNPs?

A

A single nucleotide polymorphism, or SNP, that is used to “tag” a particular haplotype in a region of the genome.
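Tag SNPs are often chosen greedily, by repeatedly picking the SNP that captures the most not-yet-captured SNPs at some r² cutoff. The sketch below follows that approach and assumes a pairwise r² table is already available. The 0.8 threshold, the SNP ids `s1` to `s5` and their r² values are all hypothetical. Real array-design tools use more elaborate methods.

```python
def pick_tag_snps(snps, r2, threshold=0.8):
    """Greedy tag-SNP selection (a simplified sketch).

    snps: list of SNP ids in a region.
    r2: dict mapping frozenset({snp_a, snp_b}) -> r^2 between the two SNPs.
    A SNP counts as tagged by a chosen SNP if it is that SNP or has r^2 >= threshold with it.
    """
    def tags(a, b):
        return a == b or r2.get(frozenset((a, b)), 0.0) >= threshold

    untagged = set(snps)
    chosen = []
    while untagged:
        # Pick the SNP tagging the most still-untagged SNPs (earliest in list order on ties)
        best = max(snps, key=lambda s: sum(tags(s, t) for t in untagged))
        chosen.append(best)
        untagged = {t for t in untagged if not tags(best, t)}
    return chosen

# Hypothetical region: s1-s3 form one LD block, s4-s5 another
snps = ["s1", "s2", "s3", "s4", "s5"]
r2 = {
    frozenset(("s1", "s2")): 0.95,
    frozenset(("s1", "s3")): 0.85,
    frozenset(("s2", "s3")): 0.90,
    frozenset(("s4", "s5")): 0.92,
    frozenset(("s3", "s4")): 0.10,
}
print(pick_tag_snps(snps, r2))
```

Here two tag SNPs (`s1` and `s4`) are enough to stand in for all five SNPs, which is the saving that lets an array of about a million SNPs represent many more.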

11
Q

What are recombination hotspots and what does this mean?

A

Recombination in humans isn't random but happens mostly at recombination hotspots, so as new alleles arise, they are rarely separated from the alleles near them on the same chromosome.

12
Q

What did the HapMap project do?
Why was this important?

A

Mapped these haplotype blocks across a range of human populations.

This made it possible to genotype about a million SNPs that could tag most of the SNPs in the human genome.

Commercial arrays were created that allowed researchers to quickly and cheaply genotype these SNPs, effectively genotyping all SNPs present in more than 1% of the population.

13
Q

What is genotype imputation?

A

Genotype imputation is the process of estimating missing (untyped) genotypes using a haplotype or genotype reference panel. It can effectively boost the power to detect associated single nucleotide polymorphisms (SNPs) in genome-wide association studies, help combine multiple studies for meta-analysis, and be applied in fine-mapping studies.
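The core idea can be shown with a toy version: find the reference haplotypes that best match a person's haplotype at the typed SNPs, then read off their alleles at the untyped SNPs. This is a deliberate simplification. Real imputation software uses hidden Markov models over many reference haplotypes rather than exact best matches. The reference panel and target haplotype below are hypothetical.

```python
def impute_haplotype(target, reference):
    """Fill untyped sites on one haplotype from a reference panel (toy version).

    target: list of 0/1 at genotyped sites and None at untyped sites.
    reference: list of fully typed reference haplotypes (lists of 0/1, same length).
    Returns allele-1 dosages (floats between 0 and 1) for every site.
    """
    typed = [i for i, allele in enumerate(target) if allele is not None]

    def mismatches(ref):
        return sum(ref[i] != target[i] for i in typed)

    # Keep the reference haplotypes that agree best with the typed sites
    fewest = min(mismatches(ref) for ref in reference)
    best = [ref for ref in reference if mismatches(ref) == fewest]

    dosages = []
    for i, allele in enumerate(target):
        if allele is not None:
            dosages.append(float(allele))
        else:
            # Fraction of best-matching reference haplotypes carrying allele 1
            dosages.append(sum(ref[i] for ref in best) / len(best))
    return dosages

# Hypothetical reference panel of four haplotypes at five SNPs
reference = [
    [0, 0, 1, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 1, 1, 0],
]
print(impute_haplotype([1, None, 0, None, 1], reference))
```

The second SNP is imputed confidently (dosage 1.0) because both matching reference haplotypes agree there. The fourth gets 0.5 because they disagree. This is why imputation output is usually a dosage or probability rather than a hard call.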

14
Q

What problem does this reveal with tag-SNPs?

A

If a tag SNP is found to be associated with a disease, but everyone with the risk allele at the tag SNP also carries the corresponding alleles at all the other positions in LD with it, it can't be known which of those SNPs is actually associated with the disease.

This is a big problem, as haplotype blocks may cover many kb and can contain multiple genes and potential regulatory regions.

15
Q

Once we have genotyped individuals at all common SNPs - what is the next problem?

A

We must decide whether each SNP is statistically associated with a trait or disease.

16
Q

What was a problem in early human genetic studies?

A
  • A statistical test was used to analyse counts of genotypes in cases and controls.
  • Normally a 0.05 significance threshold was used.
  • That is, a 5% chance of getting a result at least this extreme by chance when there is no real association.
  • If you test 1 million SNPs, you expect 5% of them to pass by chance.
  • This means about 50,000 false positives.
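The 50,000 figure can be checked with a quick simulation. It assumes no SNP is truly associated and that the test is well calibrated, so each p-value is uniform between 0 and 1. Under those assumptions each test falls below 0.05 with probability 0.05. The function name and seed are illustrative.

```python
import random

def null_hits(n_tests=1_000_000, alpha=0.05, seed=1):
    """Count tests with p < alpha when no SNP is truly associated.

    Under the null hypothesis a well-calibrated test gives p-values that are
    uniformly distributed on [0, 1), so each test passes with probability alpha.
    """
    rng = random.Random(seed)
    return sum(1 for _ in range(n_tests) if rng.random() < alpha)

print(null_hits())  # close to 0.05 * 1,000,000 = 50,000
```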
17
Q

What is used now to reduce false positives in statistical studies?

A

A much stricter threshold is used:
p < 5 × 10^-8.
Any SNP with a p-value below this has achieved ‘genome-wide significance’.
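One common way to motivate the conventional genome-wide threshold of 5 × 10^-8 is a Bonferroni correction. You divide the usual α = 0.05 by the number of independent tests m, here assumed to be about one million. The second line shows the expected number of false positives at that threshold, assuming no SNP is truly associated.

```latex
\begin{aligned}
\alpha_{\text{genome-wide}} &= \frac{\alpha}{m} = \frac{0.05}{10^{6}} = 5 \times 10^{-8} \\
\mathbb{E}[\text{false positives}] &= m \, \alpha_{\text{genome-wide}} = 10^{6} \times 5 \times 10^{-8} = 0.05
\end{aligned}
```

Compared with the 50,000 expected false positives at p < 0.05, this reduces the expected number of false positives across the whole scan to well below one.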

18
Q

What is the problem with this stricter threshold?

A

Although we are confident that any SNP that passes the ‘genome-wide significance’ threshold is associated with the disease, that doesn't mean we think all the SNPs that miss it are not associated.

Among SNPs with small p-values that fall short of the threshold, many, perhaps even most, are real associations; we just don't know how many there are or which ones are real.

19
Q

What do we know about what makes a p-value small?

A

The larger the effect, the smaller the p-value. The larger the sample size, the smaller the p-value. This means GWAS require very large sample sizes to detect variants with small effects.
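Both relationships can be seen in the usual log odds ratio test. Here a and b are the counts of risk and other alleles in cases, and c and d are the same counts in controls. The standard error is the common Woolf approximation. If all four counts grow in proportion to the sample size N, the standard error shrinks like 1/√N.

```latex
\begin{aligned}
\ln\widehat{\mathrm{OR}} &= \ln\frac{a\,d}{b\,c}, \qquad
\mathrm{SE} \approx \sqrt{\tfrac{1}{a} + \tfrac{1}{b} + \tfrac{1}{c} + \tfrac{1}{d}}, \qquad
z = \frac{\ln\widehat{\mathrm{OR}}}{\mathrm{SE}} \\
a, b, c, d \propto N \;&\Rightarrow\; \mathrm{SE} \propto \frac{1}{\sqrt{N}}
\;\Rightarrow\; |z| \propto \lvert\ln \mathrm{OR}\rvert\,\sqrt{N}
\end{aligned}
```

A larger |z| means a smaller p-value. So doubling the effect on the log-odds scale has the same effect on z as quadrupling the sample size.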

20
Q

What are the two ways you can design a GWAS?

A

Case-control

Cohort design

21
Q

Describe a case-control study

A
  • Collect people with the trait or disease, and select an equal number of people without it, matched by age, gender, ethnicity and genetic ancestry.
  • Make participants as homogeneous as possible, so that the only difference between the groups is whether they have the disease or not.
22
Q

Describe the cohort study

A
  • Recruit a very large, general-purpose study population and collect many different phenotypes from them as well as their genotypes.
  • Such populations can be prospective, that is, you recruit participants before you know whether they will develop the disease.
  • Such a population can be used to study many different diseases and traits, but the frequency of any given trait or disease is probably low.
23
Q

Give an example of a cohort study.

A

UK Biobank: has half a million participants, all genotyped and with detailed phenotypic information recorded.

24
Q

What's a problem with cohort studies?

A

How do you know that those with the disease are well matched in other ways to those without it?

25
Q

Example to illustrate the problem with cohort studies

A
  • Use of chopsticks.
  • Two replicated studies found a strong, reproducible variant that affects a student's ability to use chopsticks.
  • What was not accounted for was that some students were from an East Asian background, where using chopsticks from a young age is culturally normal, while others were from backgrounds where people do not use chopsticks regularly.
  • The variant found distinguished East Asian genetic heritage from other backgrounds, but had nothing mechanistically to do with chopstick use, which is a culturally determined trait.
26
Q

What is population stratification?

A

Where the average genetic background of cases differs from that of controls and/or correlates with the trait being studied.
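The chopsticks effect can be reproduced with made-up expected counts. Assume two hypothetical populations in which the allele has no effect on the trait, so the odds ratio within each population is 1. The allele and the trait are both more common in the first population. Pooling the two populations then produces a strong spurious association. The frequencies and prevalences are invented for illustration.

```python
def odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref):
    return (case_alt * ctrl_ref) / (case_ref * ctrl_alt)

def expected_allele_counts(n_people, allele_freq, prevalence):
    """Expected allele counts (case_alt, case_ref, ctrl_alt, ctrl_ref) in one population
    where the allele has no effect on the trait (allele and trait are independent)."""
    alleles = 2 * n_people
    case_alleles = alleles * prevalence
    ctrl_alleles = alleles - case_alleles
    return (case_alleles * allele_freq, case_alleles * (1 - allele_freq),
            ctrl_alleles * allele_freq, ctrl_alleles * (1 - allele_freq))

# Hypothetical populations: the allele is common where the trait is common
pop1 = expected_allele_counts(1000, allele_freq=0.8, prevalence=0.6)
pop2 = expected_allele_counts(1000, allele_freq=0.2, prevalence=0.1)
pooled = tuple(x + y for x, y in zip(pop1, pop2))

print(odds_ratio(*pop1), odds_ratio(*pop2))  # no association within either population
print(odds_ratio(*pooled))                   # spurious association when pooled
```

In this example the pooled odds ratio is 4, even though the allele does nothing within either population. Matching on ancestry, or adjusting for it, removes this kind of confounding.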

27
Q

What is PCA (principal component analysis)?

A

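It tries to reduce genotypes at about 1 million loci to a small number of important dimensions. In GWAS the top dimensions typically capture genetic ancestry, so they can be used to detect and correct for population stratification.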
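The sketch below computes individuals' scores on the first principal component of a genotype matrix in pure Python, using power iteration on the individual-by-individual similarity matrix. It is a toy illustration only. In practice dedicated software (for example PLINK or EIGENSOFT) is typically used. The six-person, six-SNP genotype matrix and the two "populations" are hypothetical.

```python
import math
import random

def top_pc_scores(genotypes, iterations=200, seed=0):
    """Each individual's score on the first principal component, up to a scale factor.

    genotypes: list of rows, one per individual, each a list of 0/1/2 allele counts.
    """
    n, m = len(genotypes), len(genotypes[0])
    # Centre each SNP (column) on its mean allele count
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    x = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    # n x n matrix of genetic similarity between individuals, X X^T
    g = [[sum(a * b for a, b in zip(x[i], x[k])) for k in range(n)] for i in range(n)]
    # Power iteration converges to the leading eigenvector of X X^T
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]
    for _ in range(iterations):
        w = [sum(g[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(val * val for val in w))
        v = [val / norm for val in w]
    return v

# Hypothetical data: first three people from population A, last three from population B
genotypes = [
    [2, 2, 1, 0, 0, 0],
    [2, 1, 2, 0, 1, 0],
    [1, 2, 2, 0, 0, 1],
    [0, 0, 1, 2, 2, 2],
    [0, 1, 0, 2, 1, 2],
    [1, 0, 0, 2, 2, 1],
]
print([round(s, 2) for s in top_pc_scores(genotypes)])
```

On this data, the first component's scores have opposite signs in the two groups, so PC1 acts as an ancestry axis. Including such components as covariates in each SNP's association test is one common way to adjust for stratification.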

28
Q

What does the actual sample size required depend on?

A

How big an effect the SNPs you are looking for have.
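A rough sample-size calculation makes this concrete. The sketch assumes a per-allele test with equal numbers of cases and controls, two independent alleles per person, a control risk-allele frequency of 0.3, genome-wide α = 5 × 10^-8 and 80% power. It applies the standard two-proportion sample-size formula to allele frequencies. It is an approximation, not a replacement for dedicated power-calculation tools.

```python
import math
from statistics import NormalDist

def cases_needed(odds_ratio, control_freq, alpha=5e-8, power=0.8):
    """Approximate number of cases (with an equal number of controls) for a per-allele test.

    Uses the standard two-proportion sample-size formula on allele counts,
    treating each person as contributing two independent alleles.
    """
    p0 = control_freq
    # Risk-allele frequency in cases implied by the odds ratio
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    alleles_per_group = ((z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                          + z_beta * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
                         / (p1 - p0) ** 2)
    return math.ceil(alleles_per_group / 2)

for or_ in (1.1, 1.3, 2.0):
    print(or_, cases_needed(or_, control_freq=0.3))
```

Under these assumptions, an odds ratio of 1.1 needs roughly 20,000 cases (plus as many controls), while larger effects need far fewer. This is why consortia pool studies to find small-effect variants.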

29
Q

What is meta-analysis?

A

A statistical process that combines the data of multiple studies to find common results and identify overall trends.

Meta-analysis is a quantitative, formal, epidemiological study design used to systematically assess the results of previous research to derive conclusions about that body of research.
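For GWAS, a common form is a fixed-effect, inverse-variance-weighted meta-analysis of each SNP's log odds ratio across studies. The sketch below uses that method. It assumes each study reports an odds ratio and the standard error of its log odds ratio. The three studies are hypothetical.

```python
import math

def fixed_effect_meta(studies):
    """Inverse-variance fixed-effect meta-analysis of log odds ratios.

    studies: list of (odds_ratio, standard_error_of_log_odds_ratio).
    Returns (pooled_odds_ratio, pooled_standard_error, two_sided_p_value).
    """
    # More precise studies (smaller SE) get more weight
    weights = [1 / se ** 2 for _, se in studies]
    log_ors = [math.log(or_) for or_, _ in studies]
    pooled_log = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    z = pooled_log / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return math.exp(pooled_log), pooled_se, p_value

# Three hypothetical studies, each with OR 1.10 and SE(log OR) 0.04
studies = [(1.10, 0.04)] * 3
print(fixed_effect_meta(studies[:1]))  # one study alone
print(fixed_effect_meta(studies))      # all three combined
```

Each study on its own gives p ≈ 0.02. Combined, the standard error shrinks by √3 and the p-value falls to around 10^-5. A consistent small effect becomes much clearer in a meta-analysis.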

30
Q

What is the DIAGRAM consortium?

A

A consortium of groups studying the genetic basis of type 2 diabetes.

31
Q

What are haplotype blocks used for?

A

Haplotype blocks have been used to increase the power of QTL (quantitative trait loci) detection in genome-wide association studies (GWAS) and the prediction accuracy with genomic selection

32
Q

type 2 diabetes

A

Associated with resistance to the action of insulin, and with poor diet, high BMI and obesity. Heritability estimates for diabetes range from about 30% to about 70% and may differ between age groups and countries.

While we know that there are very strong environmental contributions to diabetes, it is also clear that it runs in families.

33
Q

What did DIAGRAM find?

A

Identified 13 loci that reached genome-wide significance.
These SNPs had odds ratios between 1.06 and 1.14 (that is, risk-allele carriers had between 6% and 14% higher odds of having T2D than non-carriers).

Combined with other SNPs known to affect T2D at the time, they accounted for only about 10% of the estimated heritability of diabetes.

34
Q

What are eQTLs?

A

Expression quantitative trait loci

SNPs that change the expression of a gene

35
Q

Where are lead SNPs found?

A

They are found more often than expected by chance in enhancer regions that are active in the cell types you might expect to be involved in the disease; these are statistical patterns rather than hard rules.