W3 - Association Analysis Flashcards

1
Q

What is Genetic Association?

A

Genetic association is the presence of an allele at a higher frequency in unrelated subjects with a particular trait, compared to those who do not have the trait.

Substituting the word “disease” for “trait” gives the basis for determining whether variants in the genome are associated with a disease.

With disease = cases
Without disease = controls

2
Q

What is needed to conduct a case-control genetic study?

A

• Large numbers of well-defined cases (tens of thousands)

• Equal numbers of matched controls

• Reliable genotyping technology (SNP microarray)

• Standard statistical analysis (e.g. PLINK)

• Positive associations should be replicated

3
Q

What features does an ideal genetic marker have?

A

• Polymorphic
• Randomly distributed across the genome
• Fixed location in genome
• Frequent in genome
• Frequent in population
• Stable with time
• Easy to assay (genotype)

4
Q

What is a Single Nucleotide Polymorphism (SNP)?

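A

• A single-base position in the genome where different individuals carry different bases (alleles)
• Common in the genome: ~1 in every 300 nucleotides
• ~12 million common SNPs identified in the human genome
• Generated when a replication mismatch is repaired to the wrong base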

5
Q

How are SNPs formed?

A

CAAGTG is a normal sequence of bases.
It can be replicated to create another CAAGTG.

However, during replication a mismatch can occur at one base, shown here as CA-A-GTG. Mismatch repair systems are active all the time, so most mismatches are corrected to match the original.

However, these repair systems do not necessarily know which of the two mismatched bases is the wrong one. This means the original base can be replaced instead of the error, and the sequence would now read CAGGTG.

The A and the G would therefore be the two alleles of a SNP at that position.
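
The two possible repair outcomes in the CAAGTG example can be sketched in Python. This is only an illustration of the card's reasoning: the function name `repair_outcomes` and its outcome labels are made up for this example, and the sequence is treated as a plain string.

```python
def repair_outcomes(original: str, position: int, mismatched_base: str) -> dict[str, str]:
    """Return the sequence left behind by each of the two possible repairs.

    position is 0-based; mismatched_base is the wrong base introduced there.
    """
    mutated = original[:position] + mismatched_base + original[position + 1:]
    return {
        "mismatched base removed": original,  # repair restores the original sequence
        "original base removed": mutated,     # repair keeps the error, creating a new allele
    }


# Hypothetical call matching the card: the third base (index 2) is mismatched with G.
outcomes = repair_outcomes("CAAGTG", 2, "G")
for label, sequence in outcomes.items():
    print(f"{label}: {sequence}")
```

Only the second outcome leaves a new variant behind, which is how the A/G SNP in the card arises.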

6
Q

Where might you find a SNP?

A

Gene (coding region)
• No amino acid change (synonymous)
• Amino acid change (non-synonymous)
• New stop codon (nonsense)

Gene (non-coding region)
• Promoter: mRNA and protein levels changed
• Terminator: mRNA and protein levels changed
• Splice site: altered mRNA, altered protein

Intergenic region (non-coding DNA as a whole makes up about 98% of the genome)
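
As a rough illustration of the three coding-region categories, the sketch below classifies a single-codon change using the standard genetic code. The function name `classify_coding_snp` is hypothetical, and the example assumes the reading frame starts at the first base of the codon supplied.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change as synonymous, nonsense, or non-synonymous."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":  # "*" marks a stop codon
        return "nonsense"
    return "non-synonymous"


print(classify_coding_snp("CAA", "CAG"))  # Gln -> Gln
print(classify_coding_snp("CAA", "CAT"))  # Gln -> His
print(classify_coding_snp("CAA", "TAA"))  # Gln -> stop
```

If CAA were read as a codon, an A-to-G change at its third base (CAA to CAG) would still encode glutamine, so it would count as synonymous.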

7
Q

How do you find out what SNPs are?

A

Online database at NCBI: dbSNP
The rs number is a unique identifier given to each SNP.
Each SNP is shown with the unique flanking sequence on either side.

8
Q

What is minor allele frequency?

A

Most SNPs have two forms (alleles), e.g. [C/G]
C 0.557
G 0.433
The less common allele is called the “minor allele”.

Major allele frequency + minor allele frequency = 1

These frequencies can be used to work out how common each allele is in a population.
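
A minimal sketch of how allele frequencies and the minor allele could be worked out from allele counts. The counts are hypothetical (2,000 chromosomes from 1,000 people), and the function names are made up for this example.

```python
def allele_frequencies(allele_counts: dict[str, int]) -> dict[str, float]:
    """Convert raw allele counts into frequencies that sum to 1."""
    total = sum(allele_counts.values())
    return {allele: count / total for allele, count in allele_counts.items()}


def minor_allele(allele_counts: dict[str, int]) -> tuple[str, float]:
    """Return the less common allele and its frequency (the MAF)."""
    freqs = allele_frequencies(allele_counts)
    allele = min(freqs, key=freqs.get)
    return allele, freqs[allele]


# Hypothetical counts for a [C/G] SNP.
counts = {"C": 1120, "G": 880}
freqs = allele_frequencies(counts)
allele, maf = minor_allele(counts)
print(allele, maf)
```

With these counts G is the minor allele, and the two frequencies add up to 1 as the card states.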

9
Q

What is the significance of SNP minor allele frequency (MAF)?

A

• SNPs are chosen for genetic association studies on the basis of their MAF
• Common diseases are likely to be caused by common variants
• SNPs with MAF > 0.05 (5%) are usually used in association studies (GWAS)
• Exceptions are known monogenic disease SNPs
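
The MAF cut-off above can be applied as a simple filter. This is a sketch only: the SNP names and MAF values are hypothetical placeholders, not real rs numbers or measured frequencies.

```python
MAF_THRESHOLD = 0.05  # the 5% cut-off mentioned in the card


def passes_maf_filter(maf: float, threshold: float = MAF_THRESHOLD) -> bool:
    """Keep a SNP only if its minor allele frequency is above the threshold."""
    return maf > threshold


# Hypothetical SNPs and MAFs, for illustration only.
snp_mafs = {"snp_a": 0.32, "snp_b": 0.04, "snp_c": 0.051}
kept = [name for name, maf in snp_mafs.items() if passes_maf_filter(maf)]
print(kept)
```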

10
Q

What is a Genome-Wide Association Study (GWAS)?

A

• Recruit large numbers of cases and controls
• Genotype markers across the whole genome (SNP microarrays)
• Look for association between disease and the alleles of each marker (chi-squared test)
• Positive association is at p < 5 × 10^-8 (multiple testing correction)
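
The per-SNP test can be sketched as a 2×2 chi-squared test on allele counts (1 degree of freedom, no continuity correction). The counts below are hypothetical, the function name `allelic_chi_squared` is made up for this example, and reading 5 × 10^-8 as 0.05 divided by roughly one million independent tests is a common interpretation assumed here rather than something stated in the card.

```python
import math


def allelic_chi_squared(case_counts: tuple[int, int],
                        control_counts: tuple[int, int]) -> tuple[float, float]:
    """Pearson chi-squared test on a 2x2 table of allele counts.

    Each tuple is (count of allele 1, count of allele 2).
    Returns (chi-squared statistic, p-value) for 1 degree of freedom.
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


GENOME_WIDE_THRESHOLD = 5e-8
# Assumed origin of the threshold: 0.05 corrected for ~1,000,000 tests.
bonferroni = 0.05 / 1_000_000

# Hypothetical allele counts: 1,000 cases and 1,000 controls (2,000 alleles each).
chi2, p = allelic_chi_squared((600, 1400), (450, 1550))
print(f"chi2 = {chi2:.2f}, p = {p:.2e}, significant = {p < GENOME_WIDE_THRESHOLD}")
```

In this made-up example the allele is clearly more frequent in cases (30% vs 22.5%), yet the p-value still falls just short of the genome-wide threshold, which shows how strict the multiple testing correction is.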

11
Q

What is a Manhattan Plot?

A

• GWAS results are presented as a single graph called a Manhattan plot
• All results are plotted, typically for >1M SNPs
• X-axis is the position of the SNP on the chromosome
• Y-axis is -log10(p-value) of the association

Green peaks indicate significant p-values. A peak of association does not identify the gene causing the disease; it identifies the genomic region associated with the disease, which is usually smaller than 100 kb.
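
The two axes of a Manhattan plot can be computed as below. This is a sketch under stated assumptions: the chromosome lengths and SNP positions are toy values, and laying chromosomes end to end along the x-axis is a common plotting convention assumed here, not something the card specifies.

```python
import math


def manhattan_y(p_value: float) -> float:
    """Y-axis value: smaller p-values give taller points."""
    return -math.log10(p_value)


def cumulative_positions(snps: list[tuple[str, int]],
                         chrom_lengths: dict[str, int]) -> list[int]:
    """X-axis value: position within the chromosome plus the lengths of all earlier ones."""
    offsets, running = {}, 0
    for chrom, length in chrom_lengths.items():
        offsets[chrom] = running
        running += length
    return [offsets[chrom] + pos for chrom, pos in snps]


# Toy chromosome lengths and SNP positions, for illustration only.
toy_lengths = {"1": 1000, "2": 800}
toy_snps = [("1", 250), ("2", 100)]
x_values = cumulative_positions(toy_snps, toy_lengths)

# Height of the genome-wide significance line at p = 5e-8.
threshold_line = manhattan_y(5e-8)
print(x_values, round(threshold_line, 2))
```

Points plotted above the threshold line (about 7.3 on the y-axis) are the ones that reach genome-wide significance.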

12
Q

What is a regional association plot?

A

Position along a chromosome (x-axis; chromosome 1 in the example) plotted against -log10(p) on one y-axis and recombination rate on the other y-axis.

Diamond-shaped points narrow down the region where the problematic gene is.

13
Q

What is Meta Analysis?

A

• Difficult to do large studies (>1K cases/controls)
• Easier to combine smaller studies
• Pre-experiment: consortium
• Post-experiment: meta-analysis
• Meta-analysis allows the statistical combination of results from multiple studies

Used to pool data and knowledge from research groups across the world.
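
One common way to combine results statistically is a fixed-effect, inverse-variance-weighted meta-analysis. The card does not say which method is used, so this is an assumed example; the effect sizes and standard errors are hypothetical, and `fixed_effect_meta` is a made-up name.

```python
import math


def fixed_effect_meta(effects: list[float],
                      std_errors: list[float]) -> tuple[float, float, float]:
    """Combine per-study effect estimates, weighting each by 1 / SE^2.

    Returns (pooled effect, pooled standard error, two-sided p-value).
    """
    weights = [1 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return pooled, pooled_se, p_value


# Two hypothetical studies of the same SNP (e.g. log odds ratios).
pooled, pooled_se, p = fixed_effect_meta([0.10, 0.20], [0.10, 0.10])
print(f"pooled = {pooled:.3f}, SE = {pooled_se:.4f}, p = {p:.4f}")
```

The pooled standard error is smaller than either study's own standard error, which is why combining small studies increases power.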

14
Q

What are some problems with GWAS?

A

• GWAS has identified associations that are statistically strong and reproducible
• However, their contribution to the genetic component of disease is estimated to be low (<5%)
• Possible explanations:
  • Many common SNPs of very small effect
  • Rare SNPs
  • Copy number variation
  • Epigenetic variation

15
Q

What is linkage disequilibrium?

A

• LD is when two alleles are inherited together more often than expected by chance
• This is usually because they are close together in the genome
• Alleles that are physically close together are rarely separated by recombination
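
"More often than expected by chance" can be quantified with the standard LD measures D and r², sketched below. The card does not name these measures, so they are brought in here as an assumption; the frequencies are hypothetical and `linkage_disequilibrium` is a made-up name.

```python
def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float) -> tuple[float, float]:
    """Return (D, r^2) for two biallelic loci.

    p_ab: frequency of haplotypes carrying allele A and allele B together
    p_a, p_b: frequencies of allele A and allele B on their own
    """
    d = p_ab - p_a * p_b  # observed minus expected-by-chance haplotype frequency
    r_squared = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r_squared


# Hypothetical case 1: A and B always travel together (complete LD).
d_linked, r2_linked = linkage_disequilibrium(p_ab=0.5, p_a=0.5, p_b=0.5)

# Hypothetical case 2: A and B occur together only as often as chance predicts.
d_free, r2_free = linkage_disequilibrium(p_ab=0.25, p_a=0.5, p_b=0.5)

print(d_linked, r2_linked, d_free, r2_free)
```

An r² of 1 means one SNP perfectly predicts the other, while 0 means the alleles are inherited independently.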
