Association Analysis Flashcards

1
Q

Genetic Association

A

The presence of a variant allele at a higher frequency in unrelated subjects with a particular disease or trait of interest (cases) than in those without the disease or trait (controls)

2
Q

with disease =

A

cases

3
Q

without disease =

A

controls

4
Q

Controls vs cases in association analysis

A

Controls must be identical to the cases APART from not having the disease, i.e. well matched

e.g. same age, sex, ethnicity, location etc.

5
Q

Difference in variant frequency between cases and controls

A

If the gene variant is at a higher frequency in cases than in controls, it is associated with the disease
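To make the comparison concrete, here is a minimal Python sketch that turns genotype counts into variant-allele frequencies for cases and controls. It assumes a biallelic SNP where each person carries two alleles; the genotype counts are invented for illustration and `allele_frequency` is a hypothetical helper name.

```python
def allele_frequency(hom_ref: int, het: int, hom_alt: int) -> float:
    """Frequency of the variant allele, given counts of the three genotypes.

    Each person carries two alleles, so homozygous-variant people contribute
    two copies of the variant and heterozygous people contribute one.
    """
    total_alleles = 2 * (hom_ref + het + hom_alt)
    return (het + 2 * hom_alt) / total_alleles


# Hypothetical genotype counts (AA, Aa, aa), where "a" is the variant allele.
cases = (300, 500, 200)
controls = (500, 400, 100)

case_freq = allele_frequency(*cases)
control_freq = allele_frequency(*controls)
print(f"variant allele frequency in cases:    {case_freq:.2f}")  # 0.45
print(f"variant allele frequency in controls: {control_freq:.2f}")  # 0.30
```

Here the variant is more common in cases (0.45) than in controls (0.30); whether that difference is convincing is a question for statistics, not for the frequencies alone.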

6
Q

What confirms the strength of association between a gene variant and the disease?

A

Statistical tests, e.g. the p-value from a chi-square test
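A minimal sketch of one common test: a Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts in cases versus controls. The counts are hypothetical and `allelic_chi_square` is an illustrative name, not a library function; the p-value uses the identity that, for 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).

```python
import math


def allelic_chi_square(case_alt: int, case_ref: int,
                       control_alt: int, control_ref: int) -> tuple[float, float]:
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table."""
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + control_alt, case_ref + control_ref]
    n = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if allele and disease status were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts: cases carry 900 variant / 1100 other alleles,
# controls carry 600 variant / 1400 other alleles.
chi2, p = allelic_chi_square(900, 1100, 600, 1400)
print(f"chi-square = {chi2:.1f}, p = {p:.1e}")
```

With these counts the expected value in each cell is 750 or 1250, giving a chi-square of 96 and a p-value far below 0.05; identical case and control counts would give a chi-square of 0 and p = 1.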

7
Q

Quality case-control genetic studies must have:

A
  • large numbers of well defined cases (1000s)
  • equal numbers of matched controls
  • reliable genotyping technology (SNP array)
  • standard statistical analysis (PLINK)
  • positive associations should be replicated
8
Q

What allows us to capture genetic diversity?

A

The use of genetic markers

9
Q

Features of an ideal genetic marker (e.g. SNP)

A
  • polymorphic (more than 1 form)
  • randomly distributed across the genome
  • fixed location in genome
  • frequent in the genome
  • frequent in the population
  • stable with time
  • easy to assay (genotype)
10
Q

How are SNPs generated?

A

through base mismatches (copying errors) during DNA replication that are not corrected by mismatch repair

11
Q

Possible locations of an SNP

A

Gene (coding region)

  • no amino acid change (synonymous)
  • amino acid change (non-synonymous)
  • new stop codon (nonsense)

Gene (non-coding region)

  • promoter: amount of mRNA and protein changed
  • terminator: amount of mRNA and protein changed
  • splice site: altered mRNA, altered protein

Intergenic Region
  • 98% of SNPs lie in this region
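The coding-region categories above can be illustrated by translating a codon before and after a single-base change. This is a sketch assuming the standard genetic code and a SNP that falls inside one codon; `translate` and `classify_coding_snp` are hypothetical helper names, and rarer cases such as loss of a stop codon are not labelled separately.

```python
BASES = "TCAG"
# Standard genetic code: one-letter amino acids for codons in the order
# TTT, TTC, TTA, TTG, TCT, ... GGG ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate(codon: str) -> str:
    """Translate one DNA codon (e.g. "ATG") into a one-letter amino acid."""
    i, j, k = (BASES.index(base) for base in codon.upper())
    return AMINO_ACIDS[16 * i + 4 * j + k]


def classify_coding_snp(codon: str, position: int, new_base: str) -> str:
    """Classify a single-base change at `position` (0, 1 or 2) within a codon."""
    mutant = codon[:position] + new_base + codon[position + 1:]
    before, after = translate(codon), translate(mutant)
    if before == after:
        return "synonymous"
    if after == "*":
        return "nonsense"
    return "non-synonymous"


print(classify_coding_snp("CTT", 2, "C"))  # CTT (Leu) -> CTC (Leu): synonymous
print(classify_coding_snp("GAG", 1, "T"))  # GAG (Glu) -> GTG (Val): non-synonymous
print(classify_coding_snp("TGG", 2, "A"))  # TGG (Trp) -> TGA (stop): nonsense
```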

12
Q

dbSNP

A

online database of SNPs and multiple small-scale variations that include insertions/deletions, microsatellites and non-polymorphic variants

13
Q

Minor allele frequency (MAF)

A

the frequency of the less common variant in a population

SNP will have two alleles:

  • major allele
  • minor allele

the major and minor allele frequencies must add up to 1
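A minimal sketch that computes the major and minor allele frequencies of one biallelic SNP from a list of genotypes, and shows that the two add up to 1. The genotypes are made up, and `minor_allele_frequency` is a hypothetical helper.

```python
from collections import Counter


def minor_allele_frequency(genotypes: list[str]) -> tuple[str, float, str, float]:
    """Return (major allele, its frequency, minor allele, MAF) for one biallelic SNP.

    Each genotype is a two-letter string such as "AG" (one allele per chromosome).
    """
    allele_counts = Counter(allele for genotype in genotypes for allele in genotype)
    if len(allele_counts) != 2:
        raise ValueError("expected a biallelic SNP with exactly two observed alleles")
    total = sum(allele_counts.values())
    (major, major_n), (minor, minor_n) = allele_counts.most_common(2)
    return major, major_n / total, minor, minor_n / total


# Hypothetical genotypes for 10 people at one SNP.
genotypes = ["AA", "AG", "AA", "GG", "AG", "AA", "AA", "AG", "AA", "AA"]
major, p, minor, q = minor_allele_frequency(genotypes)
print(f"major {major}: {p:.2f}, minor {minor}: {q:.2f}, sum = {p + q:.2f}")
# major A: 0.75, minor G: 0.25, sum = 1.00
```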

14
Q

Genome Wide Association Studies (GWAS)

A

a genome-wide search for alleles present at a higher frequency in unrelated subjects with a particular trait, compared to those that do not have the trait

  • recruit large numbers of cases and controls
  • SNP markers are used across the whole genome, and these are genotyped using SNP microarrays.
  • We look for association between disease and each marker by doing a chi-square test
15
Q

Steps of GWAS

A

Obtain DNA from people with disease of interest (cases) and unaffected people (controls)

Run each DNA sample on a SNP chip to measure genotypes at 300,000-1,000,000 SNPs in cases and controls

Identify SNPs where one allele is significantly more common in cases than controls
-the SNP is associated with disease
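The steps above can be sketched as a loop that tests every SNP and keeps those passing a multiple-testing threshold. This assumes allele counts are already available for each SNP (with both alleles observed), uses an allelic chi-square test, and applies a Bonferroni threshold of 0.05 divided by the number of SNPs tested; that threshold is an assumption here (a fixed genome-wide threshold of 5 × 10^-8 is also widely used). SNP names, counts and function names are hypothetical.

```python
import math


def allelic_p_value(case_alt: int, case_ref: int,
                    control_alt: int, control_ref: int) -> float:
    """1-df chi-square p-value for a 2x2 allele-count table (both alleles observed)."""
    n = case_alt + case_ref + control_alt + control_ref
    # Shortcut for a 2x2 table [[a, b], [c, d]]: n(ad - bc)^2 / (product of margins)
    diff = case_alt * control_ref - case_ref * control_alt
    margins = ((case_alt + case_ref) * (control_alt + control_ref)
               * (case_alt + control_alt) * (case_ref + control_ref))
    chi2 = n * diff ** 2 / margins
    return math.erfc(math.sqrt(chi2 / 2))


def gwas_scan(snp_counts: dict[str, tuple[int, int, int, int]],
              alpha: float = 0.05) -> list[tuple[str, float]]:
    """Test every SNP; return (SNP id, p-value) for those passing a Bonferroni threshold."""
    threshold = alpha / len(snp_counts)  # Bonferroni correction for the number of tests
    hits = []
    for snp_id, counts in snp_counts.items():
        p = allelic_p_value(*counts)
        if p < threshold:
            hits.append((snp_id, p))
    return sorted(hits, key=lambda hit: hit[1])  # most significant first


# Hypothetical allele counts: (case variant, case other, control variant, control other)
snp_counts = {
    "SNP_A": (900, 1100, 600, 1400),   # variant clearly more common in cases
    "SNP_B": (510, 1490, 500, 1500),   # similar frequency in both groups
    "SNP_C": (300, 1700, 310, 1690),
}
for snp_id, p in gwas_scan(snp_counts):
    print(f"{snp_id} is associated with disease (p = {p:.1e})")
```

A real chip has hundreds of thousands of SNPs rather than three, which is why the correction for multiple testing matters so much in GWAS.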

16
Q

How is GWAS data presented?

A

on a Manhattan plot

17
Q

What is a Manhattan plot in relation to GWAS results?

A

The end result of a GWAS: it shows the score for each SNP marker tested in the study, across each chromosome in the genome

  • a peak indicates that the SNP at that position may be very close to a disease-related locus
  • low p-value = high peak → reject the null hypothesis for that SNP
  • shows the evidence of association between each marker tested and the disease phenotype across the genome

*the highest point is the SNP MOST significantly associated with the disease state

X-axis is position of SNP on chromosome

Y-axis is -log10(p-value) of the association between each marker and the disease, calculated using a chi-squared test
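A small sketch of how the y-axis values are obtained: each p-value is transformed with -log10, so smaller p-values become taller points and the highest point is the most significant SNP. The results table is hypothetical, and drawing the actual plot (for example with a plotting library) is left out.

```python
import math

# Hypothetical GWAS results: (SNP id, chromosome, position in bp, p-value)
results = [
    ("SNP_A", 1, 1_200_000, 0.43),
    ("SNP_B", 1, 5_300_000, 2e-9),
    ("SNP_C", 2, 800_000, 0.01),
    ("SNP_D", 3, 4_100_000, 1e-3),
]

# Sort by chromosome, then position, to match left-to-right order on the x-axis.
results.sort(key=lambda r: (r[1], r[2]))

# Y-axis value: -log10(p), so smaller p-values give taller points.
points = [(snp, chrom, pos, -math.log10(p)) for snp, chrom, pos, p in results]
for snp, chrom, pos, height in points:
    print(f"chr{chrom}:{pos:>9}  {snp}  -log10(p) = {height:.2f}")

top_snp = max(points, key=lambda point: point[3])
print("highest peak:", top_snp[0])
```

For example, p = 0.001 plots at height 3, while p = 2 × 10^-9 plots at about 8.7.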

18
Q

Why are chromosomes more solid at the bottom of the Manhattan plot?

A

because the vast majority of SNPs are not associated with disease

*SNPs associated with the disease appear further up the graph

19
Q

What does the peak on a Manhattan plot show?

A

The peak does not identify the gene causing the disease; it only identifies the genomic REGION associated with the disease, and this region is usually very small (<100kb)

-if you zoom in on an SNP with a strong disease association (high up in the plot), you will find lots of other associated SNPs in the surrounding region, because neighbouring SNPs tend to be inherited together; more work therefore has to be done to work out which SNP is responsible and where the actual disease-causing variant is
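One way to picture this follow-up is to take the lead SNP (lowest p-value) and list the other associated SNPs within a fixed window around it. The 50 kb window, the 1e-5 p-value cut-off, the SNP data and the function name are all hypothetical choices for illustration; real fine-mapping uses more information than distance and p-value alone.

```python
# Hypothetical associations on one chromosome: (SNP id, position in bp, p-value)
results = [
    ("SNP_1", 10_000_000, 3e-4),
    ("SNP_2", 10_020_000, 4e-9),
    ("SNP_3", 10_035_000, 1e-10),
    ("SNP_4", 10_060_000, 2e-8),
    ("SNP_5", 12_500_000, 0.2),
]


def candidate_region(results, window_bp=50_000, p_threshold=1e-5):
    """Return the lead SNP, nearby associated SNPs, and the (start, end) they span."""
    lead_id, lead_pos, _ = min(results, key=lambda r: r[2])  # lowest p-value = peak
    nearby = [
        (snp, pos, p) for snp, pos, p in results
        if abs(pos - lead_pos) <= window_bp and (snp == lead_id or p < p_threshold)
    ]
    positions = [pos for _, pos, _ in nearby]
    return lead_id, nearby, (min(positions), max(positions))


lead, nearby, (start, end) = candidate_region(results)
print(f"lead SNP: {lead}")
print("other associated SNPs nearby:", [snp for snp, _, _ in nearby if snp != lead])
print(f"candidate region: {start:,}-{end:,} bp ({(end - start) / 1000:.0f} kb)")
```

Here three strongly associated SNPs sit within 40 kb of each other, so the peak points to a region rather than to a single causal variant.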

20
Q

Disadvantage of GWAS + what is meta analysis

A

It is difficult to do very large association studies (>1K cases), so meta-analysis of GWAS is used to combine the statistical results from multiple smaller studies

Pre-experiment: form a consortium
Post-experiment: meta-analysis
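Meta-analysis can be done in several ways; one common approach is a fixed-effect, inverse-variance weighted average of each study's log odds ratio for the same SNP. The sketch below assumes each study reports an odds ratio and the standard error of its log odds ratio; the study values and the function name `fixed_effect_meta` are hypothetical.

```python
import math


def fixed_effect_meta(studies: list[tuple[float, float]]) -> tuple[float, float, float]:
    """Fixed-effect, inverse-variance meta-analysis of one SNP across studies.

    Each study is (odds_ratio, standard error of the log odds ratio).
    Returns (pooled odds ratio, pooled SE of log OR, two-sided p-value).
    """
    weights = [1 / se ** 2 for _, se in studies]           # precise studies weigh more
    log_ors = [math.log(odds_ratio) for odds_ratio, _ in studies]
    pooled_log_or = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    z = pooled_log_or / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))              # two-sided normal p-value
    return math.exp(pooled_log_or), pooled_se, p_value


# Hypothetical results for the same SNP from three smaller studies.
studies = [(1.20, 0.10), (1.15, 0.08), (1.30, 0.12)]
pooled_or, pooled_se, p = fixed_effect_meta(studies)
print(f"pooled OR = {pooled_or:.2f}, SE(log OR) = {pooled_se:.3f}, p = {p:.1e}")
```

Studies with smaller standard errors (usually the larger ones) get more weight, and the pooled standard error is smaller than that of any single study, which is why combining studies increases power.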

21
Q

Problems with GWAS

A

Expensive

Requires a large sample size

Associated SNPs only explain a small part of the disease phenotype: their contribution to the genetic component of the disease is estimated to be low (<5%). This could be because of:

  • many common SNPs of small effect
  • rare SNPs
  • don’t look at copy number variation or epigenetic variation
22
Q

Common Obesity GWAS

A

Obesity is strongly genetic

Through GWAS we can clearly see genes associated with obesity