Genetics of common disease Flashcards
Can linkage analysis only be used to track the inheritance of rare genetic variants that cause disease? Why or why not?
- No. Linkage analysis can also be used to track the inheritance of common genetic variants that cause disease.
- This is because genetic variants that cause common disease also co-segregate with nearby variants that are tightly linked to them in the same region of the chromosome.
What are the differences between common disease and Mendelian disease in terms of their causes and their inheritance?
- Common diseases are caused by variants in multiple genes, while Mendelian diseases are caused by a mutation in a single gene
- Common diseases are also influenced by multiple environmental factors, while Mendelian diseases are largely unaffected by environmental factors
- The inheritance patterns of common diseases are not as clear as those of Mendelian diseases
What does the common disease common variant hypothesis state?
- It states that common diseases are likely to be influenced by genetic variation that is also common in the population.
- It also states that each common variant that causes a common disease must have only a small effect on the phenotype; otherwise a larger proportion of the population would have these common diseases.
What does the fact that common disease is caused by common variants mean for the penetrance of those variants?
It means that the penetrance (effect size) of any single common variant is expected to be smaller than that of a single rare disease-causing variant.
What is heritability?
- A measure of how well differences in people’s genes (genotype) account for differences in their traits (phenotype).
- Heritability is expressed as a value between 0 and 1
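One standard way to make this precise is broad-sense heritability, written here as a sketch that assumes genetic and environmental effects simply add, with no gene-environment interaction or correlation. V_P, V_G and V_E denote the phenotypic, genetic and environmental variance of the trait in the population studied.

```latex
V_P = V_G + V_E, \qquad H^2 = \frac{V_G}{V_P} = \frac{V_G}{V_G + V_E}
```

Under these assumptions H² lies between 0 and 1, and H² = 1 exactly when V_E = 0.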
What does a heritability score of one mean?
A heritability score of one indicates that all of the variability in a trait comes from genetic differences, with no contribution from environmental factors.
What heritability score is generally considered high enough for a particular trait to be worth studying? Why is this?
- A heritability score above 0.4
- This is because a substantial proportion of the variation in that trait is then due to genetic differences, so genetic effects are more likely to be detected
Why are twin studies used to calculate heritability scores?
Because twins grow up in similar environments; in theory, any differences you see between twins in a trait will therefore be due to differences in their genes.
Explain how to carry out a twin study.
- Measure the concordance of a trait in monozygotic (identical) twin pairs, then measure the concordance of the same trait in dizygotic (non-identical) twin pairs
- You then plot these concordance measurements onto a graph
- A heritability score can then be estimated from the difference between the concordance of monozygotic and dizygotic twins; Falconer's formula estimates heritability as twice this difference, H² = 2 × (r_MZ - r_DZ), where r is the twin correlation for the trait
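The calculation in the last step can be sketched in Python. This is a minimal illustration, not a standard tool: the function names and the input format (one pair of booleans per twin pair) are hypothetical. `concordance` computes pairwise concordance, one common convention, and `falconer_heritability` applies Falconer's formula, H² = 2 × (r_MZ - r_DZ). Strictly, Falconer's formula uses twin correlations rather than raw concordance rates, so plugging concordances straight in is only a rough approximation.

```python
def concordance(pairs):
    """Pairwise concordance for a yes/no trait.

    `pairs` is a hypothetical list of (twin1_has_trait, twin2_has_trait)
    booleans. Only pairs with at least one affected twin are counted, and
    the result is the fraction of those pairs in which both twins are affected.
    """
    affected_pairs = [p for p in pairs if p[0] or p[1]]
    both_affected = sum(1 for a, b in affected_pairs if a and b)
    return both_affected / len(affected_pairs)


def falconer_heritability(r_mz, r_dz):
    """Falconer's estimate of heritability: H^2 = 2 * (r_MZ - r_DZ).

    The result is not clipped, so noisy data can give values outside 0 to 1.
    """
    return 2 * (r_mz - r_dz)


mz_pairs = [(True, True), (True, True), (True, False), (False, False)]
dz_pairs = [(True, False), (True, True), (False, True), (False, False)]
print(concordance(mz_pairs), concordance(dz_pairs))
print(falconer_heritability(0.8, 0.5))
```

With r_MZ = 0.8 and r_DZ = 0.5 the estimate is about 0.6. Leaving the estimate unclipped makes it visible when the twin data are inconsistent with the simple additive model.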

In a twin study what does a trait having a high concordance mean?
It means that the trait is more often shared by both twins in a pair (if one twin has the trait, the other is likely to have it too).
In a twin study, if there’s a large difference between the concordance of a trait between monozygotic and dizygotic twins what does this mean?
- The bigger the difference in concordance between monozygotic and dizygotic twins (with monozygotic twins more concordant), the more that trait is determined by differences in genetics

Why does a large difference in concordance of a trait between monozygotic and dizygotic twins mean that there are differences in the genes associated with that trait between the 2 types of twins?
- This is because monozygotic twins share 100% of their genes with each other, while dizygotic twins share on average only 50% of their genetic variants
- Also, both the monozygotic and dizygotic twins will have been exposed to very similar environmental factors
- This means that the difference in concordance of that trait can mainly be attributed to the difference in the proportion of genes shared by the two types of twins
What is the most common type of genetic variation within the human genome?
Single nucleotide polymorphisms/single nucleotide variants
What is a Genome-wide association study?
A study of common variants across the genomes of many individuals, both with and without a particular trait (e.g. a disease), to see which common variants are associated with that trait
Apart from performing whole genome sequencing, what other technique can be used to perform a genome wide association study?
Genome wide SNP microarrays
How is SNP microarray data analysed initially?
- The SNP microarray allows the genotype of every SNP on the array to be identified
- This produces data that places each person at every SNP into one of 3 distinct genotypes: homozygous for one SNP allele, homozygous for the other SNP allele, or heterozygous, for every person that is genotyped

Once microarray data has been used to place individuals into one of 3 genotypes based on the alleles they carry at each SNP, how is the data analysed further?
SNP microarray data is then converted into a numerical code (allele dosage) of 0, 1 or 2 for each SNP

Below is an example of the numerical code that SNP microarray data is converted into; what does each number mean?

- 0 means that a person does not carry any copies of the less common (minor) allele at that position
- 1 means that a person carries one copy of the minor allele
- 2 means that a person carries 2 copies of the minor allele
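The encoding above can be sketched in Python, assuming genotypes are written as two-letter strings (one letter per chromosome copy, e.g. "AG") and that the minor allele for the SNP is already known; the function name and input format are hypothetical.

```python
def allele_dosage(genotype, minor_allele):
    """Count copies of the minor (less common) allele in one genotype.

    `genotype` is a two-character string such as "AG" (a hypothetical
    format: one allele per chromosome copy). Returns 0, 1 or 2.
    """
    if len(genotype) != 2:
        raise ValueError(f"expected two alleles, got {genotype!r}")
    # Case-insensitive so that "ag" and "AG" are treated the same.
    return genotype.upper().count(minor_allele.upper())


# One SNP with major allele A and minor allele G, across four people.
genotypes = ["AA", "AG", "GG", "GA"]
dosages = [allele_dosage(g, "G") for g in genotypes]
print(dosages)  # [0, 1, 2, 1]
```

Heterozygotes get a dosage of 1 whichever order the two letters are written in, which is why "AG" and "GA" both map to 1.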
For SNP microarray data is there any significance to the way the particular alleles for a SNP are presented?
The more common (major) allele at a SNP position is stated first, and the less common (minor) allele is stated afterwards (e.g. A/G means that A is the major allele and G is the minor allele)
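Which allele is listed first can be decided by counting alleles across everyone genotyped. This sketch, using the same kind of hypothetical two-letter genotype strings, returns the major allele, the minor allele and the minor allele frequency for one biallelic SNP; the function name is illustrative.

```python
from collections import Counter


def major_minor(genotypes):
    """Return (major_allele, minor_allele, minor_allele_frequency).

    `genotypes` is a hypothetical list of two-character genotype strings
    for one biallelic SNP. Alleles are counted across all chromosome
    copies; if both alleles are equally common, the one seen first is
    treated as major (Counter.most_common keeps insertion order on ties).
    """
    counts = Counter(allele for g in genotypes for allele in g)
    if len(counts) != 2:
        raise ValueError("expected exactly two alleles at a biallelic SNP")
    (major, _), (minor, minor_count) = counts.most_common()
    maf = minor_count / sum(counts.values())
    return major, minor, maf


major, minor, maf = major_minor(["AA", "AA", "AG", "GG", "AA"])
print(f"{major}/{minor}", maf)  # A/G 0.3
```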
Why can SNP microarray data be used to infer the genotypes of SNPs throughout a person’s genome even though not every SNP is assayed?
- Because particular SNPs tend to be inherited together, if a particular SNP allele is identified on the microarray you can infer which alleles the person is likely to carry at other SNPs associated with it
- This is known as SNP-SNP association, or linkage disequilibrium (LD)
Define linkage disequilibrium
The difference between the observed frequency of a particular combination of alleles at two loci and the frequency expected if alleles at the two loci were associated at random (i.e. inherited independently of each other).
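The definition above can be written as D = p_AB - p_A × p_B, where p_AB is the observed frequency of haplotypes carrying allele A at the first SNP and allele B at the second, and p_A × p_B is the frequency expected under random association. A widely used normalised measure is r² = D² / (p_A(1 - p_A) × p_B(1 - p_B)). The sketch below computes both from a hypothetical list of phased haplotypes written as two-letter strings (upper case for the tracked alleles A and B, lower case for the alternative alleles a and b); this input format and the function name are assumptions for illustration.

```python
def linkage_disequilibrium(haplotypes):
    """Compute D and r^2 between two biallelic SNPs.

    Each haplotype is a two-character string: the allele at SNP 1, then
    the allele at SNP 2 (e.g. "AB", "Ab", "aB", "ab").
    """
    n = len(haplotypes)
    p_a = sum(h[0] == "A" for h in haplotypes) / n   # frequency of A at SNP 1
    p_b = sum(h[1] == "B" for h in haplotypes) / n   # frequency of B at SNP 2
    p_ab = sum(h == "AB" for h in haplotypes) / n    # observed AB haplotype frequency
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined when either SNP has only one allele")
    d = p_ab - p_a * p_b  # observed minus expected under random association
    r2 = d ** 2 / denominator
    return d, r2


print(linkage_disequilibrium(["AB"] * 4 + ["ab"] * 4))
print(linkage_disequilibrium(["AB", "Ab", "aB", "ab"]))
```

Four "AB" and four "ab" haplotypes give D = 0.25 and r² = 1 (complete LD), while one each of "AB", "Ab", "aB" and "ab" gives D = 0 and r² = 0 (no LD).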
How is the linkage disequilibrium between two SNPs affected by their physical distance on a chromosome?
Linkage disequilibrium between two SNPs decreases with physical distance on a chromosome, because the further apart they are, the more likely it is that a recombination event will occur between them and separate them, so that they are inherited separately.
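A standard population-genetics result makes this quantitative, assuming an idealised large, randomly mating population with no selection, mutation or drift: each generation, D is multiplied by (1 - c), where c is the recombination fraction between the two SNPs (the probability of a recombination between them per meiosis, which rises with physical distance towards a maximum of 0.5). After t generations, D_t = D_0 × (1 - c)^t. The function below is a minimal sketch of that formula; its name and parameters are illustrative.

```python
def ld_after_generations(d0, c, generations):
    """Expected linkage disequilibrium D after a number of generations.

    d0: starting value of D
    c: recombination fraction between the two SNPs (0 to 0.5)
    generations: number of generations of random mating
    """
    if not 0 <= c <= 0.5:
        raise ValueError("recombination fraction must be between 0 and 0.5")
    return d0 * (1 - c) ** generations


tight = ld_after_generations(0.25, 0.01, 50)  # closely spaced SNPs
loose = ld_after_generations(0.25, 0.1, 50)   # more distant SNPs
print(round(tight, 3), round(loose, 4))
```

Starting from D = 0.25, after 50 generations a tightly linked pair with c = 0.01 keeps roughly 60% of its LD (about 0.15), whereas a pair with c = 0.1 retains almost none (about 0.001).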

Apart from physical distance between 2 SNPs what else can affect the linkage disequilibrium between 2 SNPs?
- The region of the genome in which the SNPs are located
- E.g. recombination hotspots, regions where a large amount of recombination occurs, so LD breaks down over shorter distances
Where are most SNPs/SNVs located within the genome? Why is this?
- Most SNPs/SNVs are located within non-coding regions of the genome
- This is because variants in the exome (coding regions) are under stronger selection pressure than variants in non-coding regions: harmful mutations that disrupt protein function tend to be removed from the population by natural selection, so fewer variants accumulate in coding regions (non-coding DNA also makes up the large majority of the genome)
Why is there a higher selection pressure on variants/mutations within the coding regions compared to the non-coding regions of the genome?
Because mutations in coding regions are more likely to disrupt protein function and so be harmful; natural selection therefore tends to remove them from the population
Since most SNPs/SNVs are found in non-coding regions what does this mean for the way they can affect the expression of a particular gene?
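It means that most SNPs are unlikely to change the protein a gene encodes or to switch a gene on or off entirely; instead, they can affect how much of a particular gene is expressed (e.g. by altering regulatory regions such as promoters or enhancers).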
How can GWAS data be analysed?
- Once you have identified the genotypes at a SNP being tested for association with a trait for all the individuals being studied, you convert each genotype into an allele dosage (0, 1 or 2 copies of the less common allele).
- You then plot the measurement of the observed trait against the allele dosage (genotype).

When analysing GWAS data a graph of the genotype for a particular SNP against the measurement of an observed trait can be produced. What can this graph be used to calculate?
- It can be used to calculate the effect size for that variant/allele - this tells you the effect that each copy of the less common allele of a SNP has on a particular trait (the slope of the line)
- It can also be used to calculate the P-value for that SNP - this indicates the significance of the association between the trait and the SNP (how likely an association at least this strong would be to arise by chance if there were no true association)
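Both quantities can be obtained by fitting a straight line of trait value against allele dosage. The sketch below is a minimal, assumption-laden version for a quantitative trait: it uses ordinary least squares, so the slope (beta) is the effect size per copy of the less common allele, and it derives a two-sided p-value from a normal approximation to the usual t-test on the slope, which is only reasonable for large samples. The function name and data are hypothetical; real GWAS software typically also adjusts for covariates, and case/control traits are usually analysed with logistic regression instead.

```python
import math


def snp_association(dosages, trait):
    """Fit trait = intercept + beta * dosage by least squares.

    Returns (beta, p_value). beta is the change in the trait per copy of
    the less common allele; p_value is two-sided and uses a normal
    approximation to the t distribution (large-sample assumption).
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, trait))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    ssr = sum((y - (intercept + beta * x)) ** 2 for x, y in zip(dosages, trait))
    se_beta = math.sqrt(ssr / (n - 2) / sxx)
    z = beta / se_beta
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return beta, p_value


dosage = [0, 0, 1, 1, 2, 2]
height = [1.1, 0.9, 1.6, 1.4, 2.1, 1.9]  # hypothetical trait measurements
beta, p = snp_association(dosage, height)
print(round(beta, 3), p)
```

In this toy data each copy of the minor allele raises the trait by about 0.5 units, and the p-value is very small because the points lie close to the fitted line.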
What is a Manhattan plot?
It’s a graph used to present GWAS results by plotting the -log10(P-value) of every SNP against its position in the genome, so that SNPs significantly associated with a particular trait stand out as tall peaks.
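The transformation behind a Manhattan plot can be sketched as follows, assuming hypothetical input of (chromosome, position, p-value) tuples with chromosomes numbered as integers. Sorting by chromosome and then position gives the x-axis order, and -log10(p) gives the height, so a p-value of 1e-8 is plotted at a height of 8. The resulting points could be passed to any plotting library's scatter function; the function name is illustrative.

```python
import math


def manhattan_points(results):
    """Return (chromosome, position, -log10 p) tuples in genome order.

    `results` is a hypothetical list of (chromosome, position, p_value)
    tuples with integer chromosome numbers.
    """
    points = []
    for chrom, pos, p in sorted(results):  # chromosome first, then position
        points.append((chrom, pos, -math.log10(p)))
    return points


example = [(2, 500, 1e-3), (1, 1000, 1e-8), (1, 200, 0.5)]
for chrom, pos, height in manhattan_points(example):
    print(f"chr{chrom}:{pos}\t{height:.2f}")
```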

What is the Bonferroni correction?
- Used to set a significance threshold in a GWAS that accounts for the large number of SNPs tested: a SNP is considered significantly associated with the trait if its P-value is below the threshold
- If the number of tests (SNPs genotyped) is n, we set the threshold to be 0.05/n
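The Bonferroni correction is simple arithmetic; a minimal sketch with hypothetical function names and SNP labels follows. With 1,000,000 tests the threshold is 0.05 / 1,000,000 = 5 × 10⁻⁸, which matches the conventional genome-wide significance threshold.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-SNP significance threshold: alpha divided by the number of tests."""
    return alpha / n_tests


def significant_snps(p_values, alpha=0.05):
    """Return the SNP labels whose p-value is below the Bonferroni threshold.

    `p_values` maps a hypothetical SNP label to its association p-value;
    the number of tests is taken to be the number of SNPs in the mapping.
    """
    threshold = bonferroni_threshold(len(p_values), alpha)
    return [snp for snp, p in p_values.items() if p < threshold]


print(bonferroni_threshold(1_000_000))  # about 5e-08

# Four SNPs tested, so the threshold is 0.05 / 4 = 0.0125.
results = {"snp_a": 1e-9, "snp_b": 0.01, "snp_c": 0.2, "snp_d": 0.04}
print(significant_snps(results))  # ['snp_a', 'snp_b']
```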
If you identify a SNP that is significantly associated with disease, what are the three possibilities for why that result was produced?
- There is a causal relationship between SNP and disease
- The marker is in linkage disequilibrium with a causal locus (i.e. it is linked to a nearby variant that causes the disease)
- False positive
What method could you use to verify the results of a GWAS?
Repeat the study in an independent population of the same or larger size and see whether the result is replicated.