association analysis Flashcards

Question 1

Q

Genetic Association

Answer

A

presence of a variant allele in the genome at a higher frequency in unrelated subjects with a particular disease of interest/trait (cases) compared to those that do not have the trait/disease (controls)

Question 2

Q

with disease =

Answer

A

cases

Question 3

Q

without disease =

Answer

A

controls

Question 4

Q

Controls vs case in association analysis

Answer

A

must be identical to the cases APART from not having the disease = well matched

e.g. same age, sex, ethnicity, location etc.

Question 5

Q

Difference in variant frequency between cases and controls

Answer

A

in cases, the gene variant is at a higher frequency than in the controls and is associated with the disease

Question 6

Q

What confirms the strength of association between a gene variant and the disease?

Answer

A

Statistics e.g. p-value

Question 7

Q

Quality case-control genetic studies must have:

Answer

A

large numbers of well defined cases (1000s)
equal numbers of matched controls
reliable genotyping technology (SNP array)
standard statistical analysis (PLINK)
positive associations should be replicated

Question 8

Q

What allows us to capture genetic diversity?

Answer

A

The use of genetic markers

Question 9

Q

Features of an ideal genetic marker (e.g. SNP)

Answer

A

polymorphic (more than 1 form)
randomly distributed across the genome
fixed location in genome
frequent in the genome
frequent in the population
stable with time
easy to assay (genotype)

Question 10

Q

How are SNPs generated?

Answer

A

through errors in DNA replication that escape mismatch repair

Question 11

Q

Possible location of an SNP

Answer

A

Gene (coding region)

no amino acid change (synonymous)
amino acid change (non-synonymous)
new stop codon (nonsense)

Gene (non-coding region)

promoter: mRNA and protein changed
terminator: mRNA and protein changed
splice site: altered mRNA, altered protein

Intergenic Region
-98% of SNPs in this region
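
The three coding-region outcomes above can be told apart by translating the codon before and after the change. The following is a minimal sketch, assuming the standard genetic code and a single-base change within one codon; the function name `classify_coding_snp` is hypothetical and not taken from any library.

```python
# Standard genetic code, built from the usual TCAG ordering of bases.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_coding_snp(ref_codon, alt_codon):
    """Classify a single-base change in a codon (hypothetical helper)."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    differences = sum(1 for r, a in zip(ref_codon, alt_codon) if r != a)
    if len(ref_codon) != 3 or len(alt_codon) != 3 or differences != 1:
        raise ValueError("expected two codons differing at exactly one base")
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if alt_aa == ref_aa:
        return "synonymous"      # no amino acid change
    if alt_aa == "*":
        return "nonsense"        # new stop codon
    return "non-synonymous"      # amino acid change


print(classify_coding_snp("GAA", "GAG"))  # Glu -> Glu
print(classify_coding_snp("GAG", "GTG"))  # Glu -> Val
print(classify_coding_snp("TGG", "TGA"))  # Trp -> stop
```

The sketch only covers SNPs inside coding sequence; promoter, terminator, splice-site and intergenic SNPs need positional annotation rather than codon translation.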

Question 12

Q

dbSNP

Answer

A

online database of SNPs and multiple small-scale variations that include insertions/deletions, microsatellites and non-polymorphic variants

Question 13

Q

Minor allele frequency (MAF)

Answer

A

the frequency of the less common variant in a population

SNP will have two alleles:

major allele
minor allele

their frequencies must add up to 1
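
As a worked illustration of the definition above, allele frequencies can be counted from genotype counts at one SNP. This is a minimal sketch; the genotype counts and the function name `allele_frequencies` are made up for the example.

```python
def allele_frequencies(n_aa, n_ab, n_bb):
    """Return (freq_a, freq_b, maf) for a two-allele SNP.

    n_aa, n_ab, n_bb are the numbers of people with genotype AA, AB and BB.
    Each person carries two alleles, so the allele total is 2 * people.
    """
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    if total_alleles == 0:
        raise ValueError("no genotypes supplied")
    freq_a = (2 * n_aa + n_ab) / total_alleles
    freq_b = (2 * n_bb + n_ab) / total_alleles
    # The minor allele is simply the less common of the two.
    return freq_a, freq_b, min(freq_a, freq_b)


# Hypothetical population of 1000 people.
freq_a, freq_b, maf = allele_frequencies(n_aa=640, n_ab=320, n_bb=40)
print(round(freq_a, 3), round(freq_b, 3), round(maf, 3))
```

With these invented counts, allele A is the major allele and allele B is the minor allele, and the two frequencies sum to 1.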

Question 14

Q

Genome Wide Association Studies (GWAS)

Answer

A

a study that tests markers across the whole genome for the presence of an allele at a higher frequency in unrelated subjects with a particular trait, compared to those that do not have the trait

recruit large numbers of cases and controls
SNP markers are used across the whole genome, and these are genotyped using SNP microarrays.
We look for association between disease and each marker by doing a chi-square test
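
The chi-square test for a single marker can be sketched as follows. This is an illustrative example only: it assumes a 2x2 table of allele counts (minor/major allele in cases/controls), uses no continuity correction, and the counts are invented. The function name `allelic_chi_square` is hypothetical.

```python
import math


def allelic_chi_square(case_minor, case_major, control_minor, control_major):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele table."""
    table = [[case_minor, case_major], [control_minor, control_major]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one allele")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical SNP: 1000 cases and 1000 controls, so 2000 alleles per group.
chi2, p_value = allelic_chi_square(600, 1400, 400, 1600)
print(round(chi2, 2), p_value)
```

In a real GWAS this test is repeated for every SNP on the array, which is why dedicated software such as PLINK is normally used instead of hand-written code.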

Question 15

Q

Steps of GWAS

Answer

A

Obtain DNA from people with disease of interest (cases) and unaffected people (controls)

Run each DNA sample on a SNP chip to measure genotypes at 300,000-1,000,000 SNPs in cases and controls

Identify SNPs where one allele is significantly more common in cases than controls
-the SNP is associated with disease

Question 16

Q

How is GWAS data presented?

Answer

A

on a Manhattan plot

Question 17

Q

What is a Manhattan plot in relation to the GWAS results?

Answer

A

The end result of a GWAS: a plot showing the score for each SNP marker tested in the study, across each chromosome in the genome
-a peak indicates that the location of that particular SNP may be very close to a disease-related locus
-low p-value = high peak -> reject the null hypothesis for that SNP
-shows the evidence of association of each marker tested with a disease phenotype across the genome

*highest point is the SNP MOST significantly associated with the disease state

X-axis is position of SNP on chromosome

Y-axis is -log10(p-value) of the association between each marker and the disease, calculated using a chi-squared test
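
To see why a low p-value gives a high peak, the y-axis transformation can be sketched directly. The SNP identifiers, positions and p-values below are invented for illustration, and `to_manhattan_points` is a hypothetical helper, not part of any plotting library.

```python
import math

# Hypothetical results: (snp_id, chromosome, position_bp, p_value)
gwas_results = [
    ("snp_a", 1, 1_050_000, 0.42),
    ("snp_b", 1, 2_300_000, 0.003),
    ("snp_c", 2, 5_700_000, 1e-8),
    ("snp_d", 2, 5_710_000, 2e-6),
]


def to_manhattan_points(results):
    """Convert each p-value to the -log10(p) height used on the y-axis."""
    return [
        (snp_id, chrom, pos, -math.log10(p_value))
        for snp_id, chrom, pos, p_value in results
    ]


points = to_manhattan_points(gwas_results)
# The highest point is the SNP with the smallest p-value.
top_snp = max(points, key=lambda point: point[3])
print(top_snp[0], round(top_snp[3], 2))
```

Plotting position against these heights, chromosome by chromosome, gives the Manhattan plot; the plotting step itself is left out of this sketch.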

Question 18

Q

Why are chromosomes more solid at the bottom of the Manhattan plot?

Answer

A

because the vast majority of SNPs are not associated with disease

*associations are further up the graph

Question 19

Q

What does the peak on a Manhattan plot show?

Answer

A

The peak does not identify the gene causing the disease, instead it only identifies the genomic REGION associated with the disease and this is usually very small (<100kb)

-if you zoom in on a SNP with strong disease association (high up in the plot), you will find lots of other associated SNPs in the adjacent region, because nearby SNPs tend to be inherited together; more work therefore has to be done to work out which SNP is causal and where the actual disease-causing variant is

Question 20

Q

Disadvantage of GWAS + what is meta analysis

Answer

A

Difficult to do very large association studies (>1K cases), and therefore meta-analysis of GWAS is done to combine statistical results from multiple smaller studies

Pre-experiment: consortium
Post-experiment: meta analysis
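
One common way of combining statistical results from several studies is a fixed-effect, inverse-variance weighted meta-analysis. The card does not say which method is meant, so the sketch below is an assumption, and the per-study effect sizes and standard errors are invented. The function name `fixed_effect_meta` is hypothetical.

```python
import math


def fixed_effect_meta(studies):
    """Combine (effect, standard_error) pairs by inverse-variance weighting.

    Returns the pooled effect, its standard error and a two-sided p-value
    from a normal approximation.
    """
    weights = [1.0 / se ** 2 for _, se in studies]
    pooled = sum(w * effect for w, (effect, _) in zip(weights, studies)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return pooled, pooled_se, p_value


# Hypothetical log odds ratios for one SNP from three smaller studies.
studies = [(0.20, 0.10), (0.25, 0.10), (0.15, 0.10)]
pooled, pooled_se, p_value = fixed_effect_meta(studies)
print(round(pooled, 3), round(pooled_se, 4), p_value)
```

Because the pooled standard error is smaller than that of any single study, the combined result can reach significance even when the individual studies are too small to do so.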

Question 21

Q

Problems with GWAS

Answer

A

Expensive

requires large sample size

associated SNPs only explain a small part of the disease phenotype: their contribution to the genetic component of the disease is estimated to be low (<5%), which could be because:

many common SNPs each have a small effect
rare SNPs are missed
SNP arrays don't look at copy number variation or epigenetic variation

Question 22

Q

Common Obesity GWAS

Answer

A

Obesity is strongly genetic

Through GWAS we can clearly see genes associated with obesity

(22 cards)