Molecular Genetics - L4, L5, L6 Flashcards
Describe DNA
The DNA molecule consists of two strands that are held together by pairs of four bases: A, C, G, T (adenine, cytosine, guanine, thymine). A always pairs with T, and G always pairs with C. This specific pairing of bases in the double-stranded molecule allows DNA to replicate itself and to direct the synthesis of proteins.
If using TEDS, who should you cite?
Oliver and Plomin, 2007
describe SNPs
SNPs are by far the most abundant form of genetic variation in the human genome. As their name suggests, they involve a mutation in a single nucleotide.
SNPs typically have two alleles, meaning within a population there are two commonly occurring base-pair possibilities for a SNP location.
SNPs are used as markers of a genomic region, and the large majority of them have minimal impact on biological systems. This is because most SNPs in coding regions are synonymous: they do not change the amino acid sequence, because the SNP swaps one DNA code (codon) for an alternative code for the same amino acid. SNPs that do change the amino acid sequence are called nonsynonymous, and these are the ones most likely to have functional consequences. However, some synonymous SNPs might still have an effect by changing the rate at which mRNA is translated into protein.
why do synonymous SNPs not change amino acid sequence?
Because the genetic code is redundant: the SNP swaps one codon for an alternative codon that codes for the same amino acid.
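The redundancy of the genetic code can be sketched in a few lines of Python. This is an illustrative sketch, not a variant-annotation tool: it assumes the SNP lies in a known reading frame and uses the standard genetic code; `classify_snp` is a hypothetical helper name.

```python
# Standard genetic code, DNA codons in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def classify_snp(codon: str, position: int, alt_base: str) -> str:
    """Hypothetical helper: swap the base at `position` (0-2) and compare amino acids."""
    codon = codon.upper()
    mutated = codon[:position] + alt_base.upper() + codon[position + 1:]
    if CODON_TABLE[codon] == CODON_TABLE[mutated]:
        return "synonymous"
    return "nonsynonymous"


# GAA and GAG both code for glutamic acid (E).
print(classify_snp("GAA", 2, "G"))  # synonymous
# GAG -> GTG swaps glutamic acid for valine (V), the sickle-cell change.
print(classify_snp("GAG", 1, "T"))  # nonsynonymous
```

Third-position changes are the ones most often synonymous, which is why the same amino acid can be reached by several codons.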
what is Single nucleotide polymorphisms (SNP) heritability ?
Single nucleotide polymorphisms (SNP) heritability compares chance genetic similarity across hundreds of thousands of SNPs for each pair of individuals in a matrix of thousands of unrelated individuals. This chance similarity is then used to predict phenotypic similarity for each pair of individuals. This has been achieved thanks to SNP arrays that can genotype hundreds of thousands of SNPs quickly and inexpensively.
“SNP heritability” doesn’t tell us which SNPs are important; it is a ‘whole-genome’ measure.
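The idea of comparing chance genetic similarity with phenotypic similarity can be sketched in pure Python. GCTA itself estimates SNP heritability with REML; this sketch swaps in Haseman-Elston regression, a simpler method-of-moments estimator that regresses each pair's phenotypic cross-product on their genetic similarity. The function names and simulation settings are hypothetical, and the tiny simulated sample makes the estimate noisy.

```python
import math
import random
import statistics
from itertools import combinations


def standardize(genotypes):
    """Standardize 0/1/2 allele counts per SNP using mean 2p and SD sqrt(2p(1-p))."""
    n, m = len(genotypes), len(genotypes[0])
    z = [[0.0] * m for _ in range(n)]
    for j in range(m):
        p = sum(person[j] for person in genotypes) / (2 * n)  # sample allele frequency
        sd = math.sqrt(2 * p * (1 - p))
        for i in range(n):
            z[i][j] = (genotypes[i][j] - 2 * p) / sd if sd > 0 else 0.0
    return z


def grm(z):
    """Genetic relationship matrix: average standardized genotype product over all SNPs."""
    n, m = len(z), len(z[0])
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i, n):
            a[i][k] = a[k][i] = sum(x * y for x, y in zip(z[i], z[k])) / m
    return a


def he_regression(a, phenotypes):
    """Haseman-Elston: slope of pairwise y_i*y_k on genetic similarity A_ik (i < k)."""
    mu, sd = statistics.fmean(phenotypes), statistics.pstdev(phenotypes)
    y = [(v - mu) / sd for v in phenotypes]
    pairs = list(combinations(range(len(y)), 2))
    xs = [a[i][k] for i, k in pairs]
    ys = [y[i] * y[k] for i, k in pairs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (v - my) for x, v in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var  # estimates SNP heritability for a standardized phenotype


if __name__ == "__main__":
    random.seed(1)
    n_people, n_snps, h2 = 150, 200, 0.5  # hypothetical simulation settings
    freqs = [random.uniform(0.1, 0.5) for _ in range(n_snps)]
    geno = [[(random.random() < p) + (random.random() < p) for p in freqs]
            for _ in range(n_people)]
    z = standardize(geno)
    betas = [random.gauss(0, math.sqrt(h2 / n_snps)) for _ in range(n_snps)]
    pheno = [sum(b * g for b, g in zip(betas, row)) + random.gauss(0, math.sqrt(1 - h2))
             for row in z]
    print(round(he_regression(grm(z), pheno), 2))  # noisy at this tiny sample size
```

Because the individuals are unrelated, the off-diagonal similarities vary only slightly around zero, which is why real SNP-heritability studies need thousands of people.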
evaluations of GWAS
- Because only a small amount of DNA is required, microarrays make the method fast and inexpensive. This is an advantage in the interim, while we wait for whole-genome sequencing to become widely available.
- GWAS are unconstrained by prior hypotheses regarding genetic associations with disease and traits, and can identify new candidate loci by screening the whole human genome. However, the GWA approach can also be problematic because the massive number of statistical tests performed presents an unprecedented potential for false-positive results. This has led to new stringency in acceptable levels of statistical significance and to requirements for replication of findings. The stricter threshold keeps false positives low, but at the cost of more false negatives (a negative test result when the result should have been positive).
- For most of the traits studied, known SNP variants explain only a small proportion of heritability (Manolio et al., 2009), limiting the potential for early application to determine individual disease risk. This should come as no surprise, because current technology surveys only a limited subset of potentially relevant sequence variation (McCarthy et al., 2008). Because arrays measure common genetic variants, variants that are not well tagged by these SNPs are missed. Indeed, GWAS have failed to identify individual SNPs that account for the variance in intelligence (Benyamin, et al., 2013).
- The inability to account for the remaining heritability is known as the ‘missing heritability problem’ (Maher, 2008): the gap between heritability estimates and the variance explained by identified SNPs. It is a prominent issue in genome-wide association (GWA) research and has led researchers on to other methods. One of many possible reasons for missing heritability is that the common SNPs incorporated in commercially available DNA arrays miss the contribution of rare DNA variants. Another possibility is that heritability has been overestimated by twin and adoption studies.
- GWA studies rely on the “common disease, common variant” hypothesis, which suggests that genetic influences on many common diseases will be at least partly attributable to a limited number of allelic variants present in more than 1% to 5% of the population. Many important disease-causing variants may be rarer than this and are unlikely to be detected with this approach.
- Moreover, the confirmed signals emerging from GWA scans and subsequent replication efforts are just that: association signals, from which causal variants cannot be inferred. Thus, many of the greatest challenges ahead lie not so much in identifying the association signals themselves, but in defining the molecular mechanisms through which they influence disease risk and/or phenotypic expression.
- The limited information available about environmental exposures and other non-genetic risk factors in GWA studies will make it difficult to identify gene-environment interactions, or modification of gene-disease associations in the presence of environmental factors (Pearson and Manolio, 2008).
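The false-positive problem is easy to quantify. This sketch assumes every test is independent and that no SNP is truly associated; the function names are hypothetical. Because SNPs in LD are correlated, the conventional genome-wide threshold of 5 × 10^-8 is a Bonferroni correction for roughly one million independent common variants rather than for the raw number of SNPs on a chip.

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Under the null, each test is 'significant' with probability alpha."""
    return n_tests * alpha


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold keeping the family-wise error rate near alpha."""
    return alpha / n_tests


# ~550,000 SNPs tested at the usual p < .05 (the chip size in Davies et al., 2011)
print(expected_false_positives(550_000, 0.05))  # roughly 27,500 chance "hits"
# Bonferroni for ~1 million independent common variants
print(bonferroni_threshold(0.05, 1_000_000))   # about 5e-08
```

The trade-off is visible in the numbers: a threshold this strict removes nearly all chance hits, but a true SNP with a small effect will also fail to reach it unless the sample is very large.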
What have GWAS told us: Intelligence
Davies et al., (2011) - conducted a genome-wide analysis of over 3,500 unrelated adults with data on nearly 550,000 single nucleotide polymorphisms (SNPs) and detailed phenotypes on cognitive traits, namely crystallized and fluid intelligence.
=They show for the first time that a substantial proportion (approximately 40 to 50%) of variation in human intelligence is accounted for by linkage disequilibrium between genotyped common SNP markers and unknown causal variants: 40% of the variation in crystallized-type intelligence and 51% of the variation in fluid-type intelligence between individuals.
The method used here does not attempt to test the effects of single SNPs; rather, it tests their accumulated effects. It estimates the joint effect of genotyped SNPs, and that effect reflects their linkage disequilibrium (LD) with unknown causal variants.
=In a subsample, a prediction analysis explained only about 1% of the variance, because individual SNP effects are very small. The authors argue that this is fully consistent with their finding that 40-50% of phenotypic variation is explained by all SNPs: a predictor built from a discovery sample of only ~3,500 individuals estimates each tiny SNP effect with low precision.
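Why a small discovery sample caps prediction can be sketched with an approximation from the genomic-prediction literature (Daetwyler et al., 2008): expected R² = h² · N·h² / (N·h² + M_e), where M_e is the number of effectively independent genomic segments. The value M_e = 60,000 below is an assumption for illustration, not a figure from Davies et al.; with it, N = 3,500 and h² = 0.5 give an expected R² of roughly 1.4%.

```python
def expected_prediction_r2(n_discovery: int, h2_snp: float, m_eff: float) -> float:
    """Approximate expected R^2 of a polygenic predictor in an independent sample.

    The predictor can at best reach h2_snp, and falls far short when the
    discovery sample is small relative to m_eff.
    """
    return h2_snp * (n_discovery * h2_snp) / (n_discovery * h2_snp + m_eff)


M_E = 60_000  # ASSUMED number of effectively independent segments (illustrative)
for n in (3_500, 100_000, 1_000_000):
    print(n, round(expected_prediction_r2(n, 0.5, M_E), 3))
```

The same h² that GCTA-style methods detect in 3,500 people only becomes usable for individual prediction once discovery samples reach hundreds of thousands.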
What is crystallized vs fluid intelligence?
Crystallized-type intelligence is typically assessed using tests of acquired knowledge, and most often through tests of vocabulary. Fluid-type intelligence tends to involve unfamiliar, sometimes abstract, materials, to involve on-the-spot thinking, to be completed under time pressure, and to rely relatively little on prior knowledge.
Meta-analysis of intelligence - what are the benefits/cons? then describe study
Joint (meta) analysis of data from comparable GWAS provides a low-cost approach to enhance power for detecting effects. One problem with meta-analysis is that different studies use different SNP chips, so they might not have all the same SNPs. To overcome this, knowledge of the correlation structure (LD) between SNPs can be used to predict the missing genotypes. This is known as imputation and is crucial for enabling meta-analysis in GWAS.
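How pooling cohorts raises power can be sketched with inverse-variance weighting, one common fixed-effect scheme (not necessarily the exact method of any study cited here). It assumes every cohort reports an effect and standard error for the same SNP, which is what imputation makes possible; the cohort numbers are invented.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted (fixed-effect) pooling of one SNP across cohorts."""
    weights = [1 / se ** 2 for se in ses]  # more precise cohorts get more weight
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1 / total)  # always smaller than any single cohort's SE
    return beta, se, beta / se  # pooled effect, its standard error, z-score


# Hypothetical per-cohort estimates for one SNP (already imputed to a common SNP set)
betas = [0.021, 0.034, 0.018]
ses = [0.012, 0.015, 0.010]
beta, se, z = fixed_effect_meta(betas, ses)
print(f"pooled beta={beta:.4f}, se={se:.4f}, z={z:.2f}")
```

No single cohort here would reach significance on its own, but the pooled standard error shrinks as cohorts are added, which is the power gain the meta-analytic approach relies on.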
Sniekers et al., 2017 - report a meta-analysis for intelligence of 78,308 individuals from 13 cohorts.
8 out of the 13 cohorts consisted of children and 5 of adults. They first meta-analysed the child and adult cohorts separately and then calculated the genetic correlation (rg) between the two sets of results using LD Score regression. The estimated rg was 0.89, indicating substantial overlap between the genetic variants influencing intelligence in childhood and adulthood, and warranting a combined meta-analysis.
=They identify 336 single nucleotide polymorphisms (SNPs) in 18 genomic loci, of which 15 are novel. Roughly half are located inside a gene, implicating 22 genes, of which 11 are novel findings.
=they calculated the variance explained in intelligence by the GWAS results in four independent samples and found that up to 4.8% of the variance in intelligence could be explained. These GWAS results finally broke the 1% barrier of previous GWAS of intelligence.
multi-trait analysis of genome-wide association (MTAG) - who? cite study too
Turley et al., 2017
MTAG allows the meta-analysis of summary statistics from genetically related traits and was used by Hill et al., (2018):
Hill et al., (2018) investigated intelligence together with a genetically correlated trait, education, by combining two large GWASs of education and intelligence. In doing so, the study produced the largest GWAS of intelligence to date, on nearly 250,000 participants.
=187 independent loci and 538 genes were associated with intelligence, and the variance explained was 2% higher than in the meta-analysis by Sniekers, et al., (2017).
Explanations of the difference between Hill and Sniekers, and what they tell us
- One explanation for these differences could be that the smaller sample sizes of the GWAS within the meta-analysis by Sniekers et al., (2017) hampered the overall power to detect genetic effects.
- Alternatively, using education as a proxy phenotype for intelligence may have increased the chances of detecting genetic effects on intelligence (Rietveld, et al., 2014). This assumption is supported by the genetic correlation of 1 that was derived when the findings from Hill et al., (2018) were compared with the intelligence dataset from Sniekers et al., (2017). This correlation suggests that the results reflect intelligence findings reported elsewhere, and do not just reflect an average of education and intelligence combined.
- The GWAS for education used by Hill et al., (2018) was based on the number of years of schooling completed (Okbay et al., 2016). This is a useful proxy because it is measured in very large samples.
- The overlap of genetic influences on education and intelligence suggests that the genetic effects contributing to the variance in intelligence are not specific to intelligence but are pleiotropic, meaning that the same genes affect several different phenotypes. These pleiotropic effects, in combination with the polygenic nature of intelligence, have led researchers to predict the operation of generalist genes (Haworth, Kovas, Dale & Plomin, 2008). This generalist hypothesis is supported by findings from GCTA, which have revealed high genetic correlations between g and other learning abilities (Trzaskowski, et al., 2013). Similarly, twin research has revealed that the genetic effects underlying cognitive abilities can be accounted for by a higher-level general intelligence construct (Panizzon, et al., 2014). These findings demonstrate that different measures can be used to investigate intelligence (Johnson et al., 2008). This is significant as it allows GWAS to combine studies with different measures to accumulate larger sample sizes.
Once these generalist genes are discovered, they will provide an opportunity for research into the pathways between the brain, genes and behaviour (Kovas & Plomin, 2006). Intelligence therefore stands as a good target phenotype for future gene-hunting (Plomin & Deary, 2015). The data derived from GWAS have allowed researchers to combine methods like GCTA and polygenic scores to explain more of the variance in intelligence.
what are polygenic scores? cite
Because the effects of individual common SNPs are minuscule, a single SNP is of little use for prediction. For this reason, the future of genetic prediction lies with polygenic scores that aggregate the effects of thousands of SNPs discovered by GWAS, including variants that do not achieve genome-wide significance (Dudbridge, 2013).
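A polygenic score is, at its core, a weighted sum. The sketch below uses invented SNP IDs and summary statistics, and it omits steps real pipelines add (such as LD clumping and allele alignment); the `p_threshold` parameter shows how lowering the bar beyond genome-wide significance lets sub-threshold variants contribute.

```python
def polygenic_score(genotypes, weights, p_values, p_threshold=1.0):
    """Sum of GWAS effect sizes x effect-allele counts (0, 1 or 2) for SNPs passing the threshold."""
    score = 0.0
    for snp, count in genotypes.items():
        if snp in weights and p_values[snp] <= p_threshold:
            score += weights[snp] * count
    return score


# Hypothetical GWAS summary statistics and one person's effect-allele counts
weights = {"rs_a": 0.020, "rs_b": -0.010, "rs_c": 0.005}
p_values = {"rs_a": 3e-9, "rs_b": 0.004, "rs_c": 0.20}
person = {"rs_a": 2, "rs_b": 1, "rs_c": 1}

print(polygenic_score(person, weights, p_values, p_threshold=5e-8))  # genome-wide hits only
print(polygenic_score(person, weights, p_values, p_threshold=0.05))  # adds sub-threshold SNPs
print(polygenic_score(person, weights, p_values))                    # all SNPs
```

In practice the threshold is chosen by checking which setting best predicts the trait in an independent sample.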
what is the Common Disease Common Variant Hypothesis ? cite
The Common Disease Common Variant Hypothesis is the idea that common disorders are likely influenced by genetic variation that is also common in the population and have a different underlying genetic architecture than rare disorders. For most common diseases, the CD/CV hypothesis is true, though it should not be assumed that the entire genetic component of any common disease is due to common alleles only (Bush & Moore, 2012).
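Why GWAS built on the CD/CV idea struggle with rare variants can be sketched with the standard additive result that one SNP explains 2p(1 − p)β² of phenotypic variance (phenotype in SD units, β per allele). The sample-size function is a rough normal approximation for a 1-df test at the genome-wide threshold; the effect size and frequencies are invented for illustration.

```python
from statistics import NormalDist


def variance_explained(maf: float, beta: float) -> float:
    """Share of phenotypic variance explained by one additive SNP (phenotype in SD units)."""
    return 2 * maf * (1 - maf) * beta ** 2


def approx_sample_size(q2: float, alpha: float = 5e-8, power: float = 0.8) -> float:
    """Rough N needed to detect a SNP explaining q2 of the variance (1-df test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided threshold
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) ** 2 / q2


for maf in (0.30, 0.005):  # a common vs a rare variant with the same hypothetical effect
    q2 = variance_explained(maf, beta=0.05)
    print(f"MAF {maf}: variance explained {q2:.6f}, N needed ~{approx_sample_size(q2):,.0f}")
```

With the same per-allele effect, the rare variant explains roughly 40 times less variance and so needs roughly 40 times the sample to detect, which is why array-based GWAS are weighted towards common variants.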
linkage disequilibrium - what is it, what are the two possibilities, why could this be a limitation of GWAS?
Linkage disequilibrium (LD) is a property of SNPs on a contiguous stretch of genomic sequence that describes the degree to which an allele of one SNP is inherited together with, or correlated with, an allele of another SNP within a population. The presence of LD creates two possible positive outcomes from a genetic association study: 1) The SNP influencing a biological system that ultimately leads to the phenotype is directly genotyped in the study and found to be statistically associated with the trait. This is referred to as a direct association, and the genotyped SNP is sometimes referred to as the functional SNP. 2) The second possibility is that the influential SNP is not directly typed, but instead a tag SNP in high LD with the influential SNP is typed and statistically associated with the phenotype. This is referred to as an indirect association.
Because of these two possibilities, a significant SNP association from a GWAS should not be assumed as the causal variant and may require additional studies to map the precise location of the influential SNP.
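The strength of LD between two SNPs is commonly summarized as r², computed from haplotype frequencies as D² / (p_A(1 − p_A)p_B(1 − p_B)) with D = p_AB − p_A·p_B. This sketch assumes phased haplotypes coded as (allele at SNP 1, allele at SNP 2) with 1 marking the allele of interest; the toy haplotypes are invented.

```python
def ld_stats(haplotypes):
    """Return D and r^2 between two biallelic SNPs from phased haplotypes."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n       # allele frequency at SNP 1
    p_b = sum(b for _, b in haplotypes) / n       # allele frequency at SNP 2
    p_ab = sum(a * b for a, b in haplotypes) / n  # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, r2


perfect = [(1, 1), (1, 1), (0, 0), (0, 0)]      # tag SNP always travels with the causal SNP
independent = [(1, 1), (1, 0), (0, 1), (0, 0)]  # no correlation between the two SNPs
print(ld_stats(perfect))      # (0.25, 1.0)
print(ld_stats(independent))  # (0.0, 0.0)
```

When r² is close to 1, the tag SNP and the causal SNP give the same association signal, so the statistics alone cannot say which one is functional.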
what is a microarray?
Microarrays can genotype millions of SNPs in parallel quickly and inexpensively. A microarray is a tiny slide that is dotted with short single‐stranded DNA sequences called probes.
Microarrays can be customized to genotype DNA variants relevant to particular diseases.
More than 80% of associations found in GWA studies fall outside coding regions (CITE)
Manolio et al., 2009
give 2 reasons why twin estimates may be higher than GCTA
The higher estimates of twin studies suggest the possibility that twin heritability estimates are inflated. One argument against this possibility is that twin-based heritability estimates for cognitive abilities are in line with estimates from adoption and family studies, even though the adoption and family designs have different assumptions than the twin design does (Plomin et al., 2013).
A specific reason why GCTA heritability estimates might be lower than twin-based estimates is that GCTA estimates only additive genetic effects, whereas twin estimates include nonadditive as well as additive effects of genes. Additive genetic effects are caused by the independent effects of alleles, which add up in their effect on a trait; nonadditive genetic effects arise when alleles interact, either at the same locus (dominance) or across loci (epistasis). Because GCTA adds up the effect of each SNP, it does not include gene-gene interaction effects. The twin method captures nonadditive as well as additive genetic effects because identical twins share virtually all of their DNA sequence and thus share all genetic effects, including nonadditive ones.
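The effect of nonadditive variance on twin estimates can be sketched with Falconer's formula, h² = 2(r_MZ − r_DZ). Under a simple ACDE-style model (assumed here: no shared environment, equal environments for MZ and DZ pairs), MZ twins share all additive (a²) and dominance (d²) variance while DZ twins share half of a² and a quarter of d², so the twin estimate works out to a² + 1.5·d². The variance components are invented for illustration.

```python
def twin_correlations(a2: float, d2: float, c2: float = 0.0):
    """Expected MZ and DZ twin correlations under a simple ACDE-style model."""
    r_mz = a2 + d2 + c2                 # MZ twins share all genetic effects
    r_dz = 0.5 * a2 + 0.25 * d2 + c2    # DZ twins share half of a2, a quarter of d2
    return r_mz, r_dz


def falconer_h2(r_mz: float, r_dz: float) -> float:
    """Falconer's twin heritability estimate."""
    return 2 * (r_mz - r_dz)


# Hypothetical trait: additive variance 0.4 plus dominance variance 0.1
r_mz, r_dz = twin_correlations(a2=0.4, d2=0.1)
print(round(falconer_h2(r_mz, r_dz), 2))  # 0.55, above the additive-only 0.4
```

A method that, like GCTA, sums additive SNP effects would at most target the 0.4 here, so part of the twin-GCTA gap can come from nonadditive variance rather than from inflated twin estimates.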
describe and cite study that compared twin and GCTA directly
Plomin et al., 2013 – directly compared GCTA estimates of the heritability of cognitive abilities with heritability estimates obtained with the classical twin design, using the same sample assessed at the same age with the same measures of diverse cognitive abilities including language, verbal, nonverbal, and general.
The sample included 12-year-old twins from TEDS (Oliver and Plomin, 2007); one member of each pair had been genotyped.
Composite scores were created for each ability (verbal and non-verbal cognitive ability and general cognitive ability), all of which were assessed via Web-based testing. Heritability was estimated from twin data using standard model fitting.
= the DNA array yielded GCTA estimates that accounted on average for .66 of the twin heritability estimates for language, verbal, nonverbal, and general cognitive abilities.
= If valid, this finding suggests that general cognitive ability is a good candidate for narrowing the missing-heritability gap using the common SNPs on current DNA arrays with much larger samples. This is fortunate because far more GWA data are available for general cognitive ability than for other cognitive abilities.