Molecular Genetics - L4, L5, L6 Flashcards

1
Q

Describe DNA

A

The DNA molecule consists of two strands that are held together by pairs of four bases: A, C, G, T (adenine, cytosine, guanine, thymine). A always pairs with T, and G always pairs with C. This specific base pairing in the two-stranded molecule allows DNA to replicate itself and to direct the synthesis of proteins.
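Each base has exactly one partner, so one strand fully determines the other. Below is a minimal Python sketch of the pairing rule. The function names are illustrative. It also assumes the standard fact that the two strands run antiparallel, so the partner strand read in the conventional 5' to 3' direction is the reversed complement.

```python
# Base-pairing rule from the card: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Partner strand written base-by-base underneath the input."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Partner strand read in the conventional 5' to 3' direction (strands are antiparallel)."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    print(complement("ATGCGT"))          # TACGCA
    print(reverse_complement("ATGCGT"))  # ACGCAT
```

Applying the rule twice returns the original strand, which is why one strand can serve as a template for copying the other during replication.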

2
Q

if using TEDS (Twins Early Development Study), who should you cite?

A

Oliver and Plomin, 2007

3
Q

describe SNPs

A

SNPs are by far the most abundant form of genetic variation in the human genome. As their name suggests, they involve a mutation in a single nucleotide.

SNPs typically have two alleles, meaning within a population there are two commonly occurring base-pair possibilities for a SNP location.

SNPs are used as markers of a genomic region, and the large majority of them have minimal impact on biological systems. One reason is that most SNPs in coding regions are synonymous: they do not change the amino acid sequence, because the new base gives an alternative DNA code (codon) for the same amino acid. Coding SNPs that do change the amino acid sequence are called nonsynonymous, and these are the ones more likely to have functional consequences. However, some synonymous SNPs might still have an effect by changing the rate at which mRNA is translated into protein.
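The synonymous/nonsynonymous distinction can be made concrete with a small Python sketch. It looks up the standard genetic code before and after a single-base change in a codon. It only models the amino acid outcome, so the possible effect of synonymous SNPs on translation rate is not captured. The function name and example codons are illustrative.

```python
from itertools import product

# Standard genetic code for coding-strand DNA codons, bases in TCAG order.
# One-letter amino acid codes; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(codon: str, position: int, alt_base: str) -> str:
    """Classify a single-base change at position 0, 1 or 2 of a codon.

    Changes to a stop codon are reported as nonsynonymous here, although they
    are usually singled out as 'nonsense' mutations.
    """
    codon = codon.upper()
    mutant = codon[:position] + alt_base.upper() + codon[position + 1:]
    same_amino_acid = CODON_TABLE[codon] == CODON_TABLE[mutant]
    return "synonymous" if same_amino_acid else "nonsynonymous"


if __name__ == "__main__":
    print(classify_coding_snp("GAA", 2, "G"))  # GAA -> GAG, Glu -> Glu: synonymous
    print(classify_coding_snp("GAA", 1, "T"))  # GAA -> GTA, Glu -> Val: nonsynonymous
```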

4
Q

why do synonymous SNPs not change amino acid sequence?

A

because the SNP involves one of the alternative DNA codes for the same amino acid.

5
Q

what is single nucleotide polymorphism (SNP) heritability?

A

Single nucleotide polymorphisms (SNP) heritability compares chance genetic similarity across hundreds of thousands of SNPs for each pair of individuals in a matrix of thousands of unrelated individuals. This chance similarity is then used to predict phenotypic similarity for each pair of individuals. This has been achieved thanks to SNP arrays that can genotype hundreds of thousands of SNPs quickly and inexpensively.
“SNP heritability” does not tell us which SNPs are important; it is a ‘whole-genome’ measure.
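The "chance genetic similarity" for a pair of people can be computed from their SNP genotypes. Below is a minimal Python sketch. It assumes genotypes are coded as 0, 1 or 2 copies of one allele. It uses the standardized-genotype formula for off-diagonal entries of the genetic relationship matrix described by Yang et al. (2011). The data and function names are hypothetical.

```python
def allele_frequencies(genotypes: list[list[int]]) -> list[float]:
    """Frequency of the counted allele at each SNP (genotypes coded 0, 1, 2)."""
    n = len(genotypes)
    return [sum(person[i] for person in genotypes) / (2 * n) for i in range(len(genotypes[0]))]


def pairwise_relatedness(genotypes: list[list[int]], j: int, k: int) -> float:
    """Off-diagonal genetic relationship entry for individuals j and k (j != k).

    Averages (x_j - 2p)(x_k - 2p) / (2p(1 - p)) over all polymorphic SNPs.
    """
    freqs = allele_frequencies(genotypes)
    terms = [
        (genotypes[j][i] - 2 * p) * (genotypes[k][i] - 2 * p) / (2 * p * (1 - p))
        for i, p in enumerate(freqs)
        if 0 < p < 1  # monomorphic SNPs carry no information and would divide by zero
    ]
    if not terms:
        raise ValueError("no polymorphic SNPs in the sample")
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # Hypothetical genotypes for four people at three SNPs.
    sample = [[2, 0, 1], [2, 0, 2], [0, 2, 1], [0, 2, 0]]
    print(pairwise_relatedness(sample, 0, 1))  # positive: genotypes alike
    print(pairwise_relatedness(sample, 0, 2))  # negative: genotypes unlike
```

With only three SNPs the values are extreme. Across hundreds of thousands of SNPs, entries for unrelated people cluster close to zero, which is why such large samples are needed.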

6
Q

evaluations of GWAS

A
  • Because only a small amount of DNA is required, microarrays make the method fast and relatively inexpensive. This is an advantage in the interim as we wait for whole-genome sequencing to become widely available.
  • GWAS are unconstrained by prior hypotheses regarding genetic associations with disease and traits, and they can identify a number of candidate loci by screening the whole human genome. However, the massive number of statistical tests performed creates an unprecedented potential for false-positive results. This has led to new stringency in acceptable levels of statistical significance and to requirements for replication of findings. The stringency keeps false positives low, but at the cost of more false negatives (true associations that fail to reach significance).
  • For most of the traits studied, known SNP variants explain only a small proportion of heritability (Manolio et al., 2009), limiting the potential for early application to determine individual disease risk. Because current technology surveys only a limited subset of potentially relevant sequence variation, this should come as no surprise (McCarthy et al., 2008). Arrays measure common genetic variants, so variants that are not well tagged by SNPs are missed. Indeed, GWAS have failed to identify SNPs accounting for the variance in intelligence (Benyamin et al., 2013).
  • The inability to account for the remaining heritability is known as the ‘missing heritability problem’ (Maher, 2008): the gap between heritability estimates and the variance explained by identified variants. It is a prominent issue in genome-wide association (GWA) research and has led on to other methods. One of many possible reasons for the missing heritability is that the common SNPs incorporated in commercially available DNA arrays miss the contribution of rare DNA variants. Another possibility is that heritability has been overestimated by twin and adoption studies.
  • GWA studies rely on the “common disease, common variant” hypothesis, which suggests that genetic influences on many common diseases will be at least partly attributable to a limited number of allelic variants present in more than 1% to 5% of the population. Many important disease-causing variants may be rarer than this and are unlikely to be detected with this approach.
  • Moreover, the confirmed signals emerging from GWA scans and subsequent replication efforts are just that: association signals. On their own they cannot identify causal variants. Thus, many of the greatest challenges in the years ahead lie not so much in identifying the association signals themselves, but in defining the molecular mechanisms through which they influence disease risk and/or phenotypic expression.

The limited information available about environmental exposures and other non-genetic risk factors in GWA studies will make it difficult to identify gene-environment interactions or modification of gene-disease associations in the presence of environmental factors – Pearson and Manolio, 2008
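The multiple-testing problem behind GWAS stringency can be shown with simple arithmetic. The widely used genome-wide significance threshold of p < 5 × 10^-8 corresponds to a Bonferroni correction of alpha = 0.05 for roughly one million independent tests. Below is a small Python sketch. The SNP identifiers in the usage example are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the chance of any false positive at or below alpha."""
    return alpha / n_tests


def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Number of null SNPs expected to pass an uncorrected cutoff of alpha."""
    return alpha * n_tests


def genome_wide_hits(p_values: dict[str, float], threshold: float = 5e-8) -> list[str]:
    """SNP identifiers whose association p-value falls below the threshold."""
    return sorted(snp for snp, p in p_values.items() if p < threshold)


if __name__ == "__main__":
    print(expected_false_positives(0.05, 1_000_000))  # about 50,000 chance 'hits'
    print(bonferroni_threshold(0.05, 1_000_000))      # about 5e-08
```

A SNP with p = 10^-6 would look highly significant in a single-hypothesis study. It still fails the genome-wide threshold, which illustrates the trade-off of fewer false positives but more false negatives.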

7
Q

What have GWAS told us: Intelligence

A

Davies et al., (2011) conducted a genome-wide analysis of over 3,500 unrelated adults with data on nearly 550,000 single nucleotide polymorphisms (SNPs) and detailed phenotypes on cognitive traits, namely crystallized and fluid intelligence.
=They show for the first time that a substantial proportion (approximately 40 to 50%) of variation in human intelligence is accounted for by linkage disequilibrium between genotyped common SNP markers and unknown causal variants. 40% of the variation for crystallized-type intelligence and 51% of the variation in fluid-type intelligence between individuals
The method used here does not attempt to test the effects of single SNPs; rather, it tests their accumulated effects. It estimates the joint effect of genotyped SNPs and that effect reflects their linkage disequilibrium (LD) with unknown causal variants

=In a subsample, they found that only about 1% of the variance was explained in the prediction analysis, because individual SNP effects are very small. As they note, the finding that 40-50% of phenotypic variation is explained by all SNPs is fully consistent with the low precision of a predictor based on a discovery sample of ~3,500 individuals.

8
Q

What is crystallised vs fluid intelligence?

A

Crystallized-type intelligence is typically assessed using tests of acquired knowledge, and most often through tests of vocabulary. Fluid-type intelligence tends to involve unfamiliar, sometimes abstract, materials, to involve on-the-spot thinking, to be completed under time pressure, and to rely relatively little on prior knowledge.

9
Q

Meta-analysis of intelligence - what are the benefits/cons? then describe study

A

Joint (meta-)analysis of data from comparable GWAS provides a low-cost way to enhance power for detecting effects. One problem with meta-analysis is that different studies use different SNP chips, so they might not all include the same SNPs. To overcome this, we can use knowledge of the correlation structure (LD) between SNPs to predict the missing genotypes. This is known as imputation and is crucial for enabling meta-analysis in GWAS.
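One common way to pool a SNP's effect across cohorts is a fixed-effect, inverse-variance-weighted meta-analysis. The card does not say which method Sniekers et al. used, so treat this Python sketch as a generic illustration. It shows why pooling helps: the combined standard error is smaller than any single cohort's, which increases power. The cohort numbers are hypothetical.

```python
import math


def inverse_variance_meta(estimates: list[tuple[float, float]]) -> tuple[float, float, float]:
    """Fixed-effect meta-analysis of one SNP's effect across cohorts.

    `estimates` holds (beta, standard_error) pairs, one per cohort.
    Returns the pooled beta, its standard error, and the z statistic.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]  # precise cohorts count more
    total_weight = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    return beta, se, beta / se


if __name__ == "__main__":
    # Hypothetical per-cohort results for a single SNP.
    cohorts = [(0.02, 0.01), (0.04, 0.02), (0.01, 0.015)]
    beta, se, z = inverse_variance_meta(cohorts)
    print(f"pooled beta={beta:.4f}, se={se:.4f}, z={z:.2f}")
```

This only works for SNPs measured (or imputed) in every cohort, which is where imputation comes in.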

Sniekers et al., 2017 - report a meta-analysis for intelligence of 78,308 individuals from 13 cohorts.
8 of the 13 cohorts consisted of children and 5 of adults. They first meta-analysed the child- and adult-based cohorts separately and then calculated the genetic correlation (rg) between them using LD Score regression. The estimated rg was 0.89, indicating substantial overlap between the genetic variants influencing intelligence in childhood and adulthood, and warranting a combined meta-analysis.
=They identify 336 single nucleotide polymorphisms (SNPs) in 18 genomic loci, of which 15 are novel. Roughly half are located inside a gene, implicating 22 genes, of which 11 are novel findings.
=they calculated the variance explained in intelligence by the GWAS results in four independent samples and found that up to 4.8% of the variance in intelligence could be explained. These GWAS results finally broke the 1% barrier of previous GWAS of intelligence.

10
Q

multi-trait analysis of genome-wide association (MTAG) - who? cite study too

A

Turley et al., 2017

MTAG allows the meta-analysis of summary statistics from genetically-related traits and was used by Hill et al., (2018):
Hill et al., (2018) investigated intelligence in relation to its inter-correlated variable, education, by combining two large GWASs of education and intelligence. In doing so, the study produced the largest GWAS of intelligence to date on nearly 250,000 participants.
=187 independent loci and 538 genes were associated with intelligence, and the estimated variance explained was 2% higher than in the meta-analysis by Sniekers et al., (2017).

11
Q

Explanations of the difference between Hill and Sniekers and what they tell us

A
  • One explanation for these differences could be that the smaller sample sizes of the GWAS within the meta-analysis by Sniekers et al., (2017) hampered the overall power to detect genetic effects.
  • Alternatively, using education as a proxy phenotype for intelligence may have increased the chances of detecting genetic effects on intelligence (Rietveld et al., 2014). This is supported by the genetic correlation of 1 obtained when the findings from Hill et al., (2018) were compared with the intelligence dataset from Sniekers et al., (2017). This correlation suggests that the results reflect the intelligence findings reported elsewhere, rather than just an average of education and intelligence combined.
  • The GWAS for education used by Hill et al., (2018) was based on the number of completed years of schooling (Okbay et al., 2016). This is a useful proxy because it is recorded in very large samples.
  • The overlap of genetic influences on education and intelligence suggests that the genetic effects contributing to the variance in intelligence are not specific to intelligence but are pleiotropic, meaning that the same genes affect several different phenotypes. These pleiotropic effects, in combination with the polygenic nature of intelligence, have led researchers to predict the operation of generalist genes (Haworth, Kovas, Dale & Plomin, 2008). This generalist hypothesis is supported by findings from GCTA, which have revealed high genetic correlations between g and other learning abilities (Trzaskowski et al., 2013). Similarly, twin research has revealed that the genetic effects underlying cognitive abilities can be accounted for by a higher-level general intelligence construct (Panizzon et al., 2014). These findings demonstrate that different measures can be used to investigate intelligence (Johnson et al., 2008). This is significant because it allows GWAS to combine studies with different measures to accumulate larger sample sizes.

Once these generalist genes are discovered, they will provide an opportunity for research into the pathways between genes, brain and behaviour (Kovas & Plomin, 2006). Intelligence therefore stands as a good target phenotype for future gene-hunting (Plomin & Deary, 2015). The data derived from GWAS have allowed researchers to combine methods such as GCTA and polygenic scores to explain more of the variance in intelligence.

12
Q

what are polygenic scores? cite

A

Because their effects are minuscule, a single common SNP is of little use for prediction. For this reason, the future of genetic prediction lies with polygenic scores that aggregate the effects of thousands of SNPs discovered by GWAS, including variants that do not reach genome-wide significance (Dudbridge, 2013).
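At its simplest, a polygenic score is a weighted sum: for each SNP, the number of effect alleles a person carries is multiplied by that SNP's GWAS effect size. Below is a minimal Python sketch. The `p_threshold` parameter shows how variants below genome-wide significance can be let in. Real pipelines also prune correlated (LD) SNPs, which is omitted here. SNP names and numbers are hypothetical.

```python
def polygenic_score(genotypes: dict[str, int],
                    gwas: dict[str, tuple[float, float]],
                    p_threshold: float = 1.0) -> float:
    """Weighted allele count: sum of (effect size x allele count) over included SNPs.

    genotypes: SNP id -> number of effect alleles carried (0, 1 or 2).
    gwas: SNP id -> (effect size beta, p-value) from a discovery GWAS.
    SNPs with p > p_threshold, or missing from `genotypes`, are skipped.
    """
    return sum(
        beta * genotypes[snp]
        for snp, (beta, p) in gwas.items()
        if p <= p_threshold and snp in genotypes
    )


if __name__ == "__main__":
    gwas = {"snp_a": (0.02, 1e-9), "snp_b": (-0.01, 0.003), "snp_c": (0.005, 0.4)}
    person = {"snp_a": 2, "snp_b": 1, "snp_c": 1}
    print(polygenic_score(person, gwas, p_threshold=5e-8))  # genome-wide significant SNPs only
    print(polygenic_score(person, gwas))                    # all SNPs
```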

13
Q

what is the Common Disease Common Variant Hypothesis ? cite

A

The Common Disease Common Variant Hypothesis is the idea that common disorders are likely influenced by genetic variation that is also common in the population and have a different underlying genetic architecture than rare disorders. For most common diseases, the CD/CV hypothesis is true, though it should not be assumed that the entire genetic component of any common disease is due to common alleles only (Bush & Moore, 2012).

14
Q

linkage disequilibrium - what is it, what are the two possibilities, why could this be a limitation of GWAS?

A
Linkage disequilibrium (LD) is a property of SNPs on a contiguous stretch of genomic sequence that describes the degree to which an allele of one SNP is inherited or correlated with an allele of another SNP within a population.
The presence of LD creates two possible positive outcomes from a genetic association study:
1)	 The SNP influencing a biological system that ultimately leads to the phenotype is directly genotyped in the study and found to be statistically associated with the trait. This is referred to as a direct association, and the genotyped SNP is sometimes referred to as the functional SNP. 
2)	The second possibility is that the influential SNP is not directly typed, but instead a tag SNP in high LD with the influential SNP is typed and statistically associated with the phenotype. This is referred to as an indirect association.

Because of these two possibilities, a significant SNP from a GWAS should not be assumed to be the causal variant, and additional studies may be required to map the precise location of the influential SNP.
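The strength of LD between two SNPs is commonly summarised by r², computed from haplotype frequencies via the disequilibrium coefficient D. Below is a minimal Python sketch. It assumes phased haplotypes with each SNP's alleles coded 0/1. An r² near 1 means a tag SNP is an almost perfect stand-in for its neighbour (indirect association). An r² near 0 means it carries little information about it.

```python
def ld_r_squared(haplotypes: list[tuple[int, int]]) -> float:
    """r^2 between two biallelic SNPs from phased haplotypes.

    Each haplotype is (allele at SNP 1, allele at SNP 2), coded 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n               # frequency of allele 1 at SNP 1
    p_b = sum(h[1] for h in haplotypes) / n               # frequency of allele 1 at SNP 2
    p_ab = sum(1 for h in haplotypes if h == (1, 1)) / n  # both allele 1 on one haplotype
    d = p_ab - p_a * p_b                                   # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


if __name__ == "__main__":
    print(ld_r_squared([(1, 1), (1, 1), (0, 0), (0, 0)]))  # perfect LD: 1.0
    print(ld_r_squared([(1, 1), (1, 0), (0, 1), (0, 0)]))  # no LD: 0.0
```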

15
Q

what is a microarray?

A

Microarrays can genotype millions of SNPs in parallel quickly and inexpensively. A microarray is a tiny slide that is dotted with short single‐stranded DNA sequences called probes.
Microarrays can be customized to genotype DNA variants relevant to disease.

16
Q

More than 80% of associations found in GWA studies fall outside coding regions (CITE)

A

Manolio et al., 2009

17
Q

give 2 reasons why twin heritability estimates may be higher than GCTA estimates

A

The higher estimates of twin studies suggest the possibility that twin heritability estimates are inflated. One argument against this possibility is that twin-based heritability estimates for cognitive abilities are in line with estimates from adoption and family studies, even though the adoption and family designs have different assumptions than the twin design does (Plomin et al., 2013).
A specific reason why GCTA heritability estimates might be lower than twin-based estimates is that GCTA estimates only additive genetic effects, whereas twin estimates include nonadditive as well as additive effects of genes. Additive genetic effects are caused by the independent effects of alleles, which add up in their effect on a trait; nonadditive genetic effects are those that interact. Because GCTA adds up the effect of each SNP, it does not include gene-gene interaction effects; the twin method captures nonadditive as well as additive genetic effects because the DNA sequence of identical twins is virtually identical and thus they share all genetic effects, including nonadditive ones.
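The additive/nonadditive distinction can be illustrated with a toy two-locus model. The effect sizes are arbitrary values chosen for illustration. Under the additive model, adding an allele at locus 1 always has the same effect. With an interaction term, its effect depends on the genotype at locus 2. A method that simply sums per-SNP effects has no term for that dependency.

```python
def additive_value(g1: int, g2: int, a1: float = 1.0, a2: float = 1.0) -> float:
    """Each allele adds its own fixed amount, whatever the other locus carries."""
    return a1 * g1 + a2 * g2


def epistatic_value(g1: int, g2: int, a1: float = 1.0, a2: float = 1.0,
                    interaction: float = 1.0) -> float:
    """Same additive terms plus a gene-gene interaction term."""
    return a1 * g1 + a2 * g2 + interaction * g1 * g2


def effect_of_one_allele_at_locus1(model, g2: int) -> float:
    """Change in trait value when locus 1 goes from 0 to 1 copies, holding locus 2 fixed."""
    return model(1, g2) - model(0, g2)


if __name__ == "__main__":
    for g2 in (0, 1, 2):
        print(g2,
              effect_of_one_allele_at_locus1(additive_value, g2),   # always 1.0
              effect_of_one_allele_at_locus1(epistatic_value, g2))  # grows with g2
```

Identical twins share both loci, so they share the interaction as well as the additive parts. This is why the twin design can capture nonadditive effects that an additive SNP-by-SNP sum misses.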

18
Q

describe and cite study that compared twin and GCTA directly

A

Plomin et al., 2013 – directly compared GCTA estimates of the heritability of cognitive abilities with heritability estimates obtained with the classical twin design—using the same sample assessed at the same age with the same measures of diverse cognitive abilities including language, verbal, nonverbal, and general.
The sample included 12-year-old twins from TEDS (Oliver and Plomin, 2007), in which one member of each pair had been genotyped.
Composite scores were created for each ability (verbal and non-verbal cognitive ability and general cognitive ability), all of which were assessed via Web-based testing. Heritability was estimated from twin data using standard model fitting.
= the DNA array yielded GCTA estimates that accounted on average for .66 of the twin heritability estimates for language, verbal, nonverbal, and general cognitive abilities.
= If valid, this finding suggests that general cognitive ability is a good candidate for narrowing the missing-heritability gap using the common SNPs on current DNA arrays with much larger samples. This is fortunate because far more GWA data are available for general cognitive ability than for other cognitive abilities.

19
Q

what is GCTA? cite

A

Genome-wide complex trait analysis (GCTA) can be used to estimate the genetic variance accounted for by all the SNPs genome-wide. The objective is to estimate the genetic variation captured by all the SNPs together, just as GWAS does for single SNPs (Yang et al., 2011).

Using genome-wide SNPs, you can estimate the overall degree of genetic relatedness between every pair of participants in a sample of genetically unrelated individuals. This genetic relatedness can then be compared to their phenotypic similarity.

The logic is the same as in the twin design, except that the twin design infers genetic relatedness from zygosity. The twin method compares identical and fraternal twin pairs, whose genetic similarity is roughly 1.00 and .50 respectively. In contrast, GCTA relies on comparisons of pairs of individuals whose genetic similarity varies from .00 to .02.
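GCTA itself fits a mixed model by REML. A simpler method-of-moments estimator that follows the same logic is Haseman-Elston regression, swapped in here purely for illustration. For every pair of people, their phenotypic similarity (the product of standardized scores) is regressed on their genetic relatedness. The slope estimates SNP heritability. This Python sketch assumes a precomputed genetic relationship matrix `grm` (for example, one produced by GCTA). Function names are hypothetical.

```python
def standardize(values: list[float]) -> list[float]:
    """Convert scores to z-scores (mean 0, variance 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd for v in values]


def ols_slope(xs: list[float], ys: list[float]) -> float:
    """Least-squares slope of ys on xs (with an intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def he_regression_h2(grm: list[list[float]], phenotypes: list[float]) -> float:
    """Haseman-Elston estimate: slope of phenotype cross-products on relatedness."""
    z = standardize(phenotypes)
    n = len(z)
    relatedness, similarity = [], []
    for j in range(n):
        for k in range(j + 1, n):            # each unordered pair once, self-pairs excluded
            relatedness.append(grm[j][k])
            similarity.append(z[j] * z[k])   # positive when both fall on the same side of the mean
    return ols_slope(relatedness, similarity)
```

Because relatedness among unrelated people spans only about .00 to .02, the slope is very noisy unless many thousands of pairs are available. On a handful of people the number is meaningless.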

20
Q

phenotypic and genetic correlations between education and intelligence - cite

A

Years of education is highly correlated phenotypically (0.50) and genetically (0.65) with intelligence (Rietveld et al., 2014).

21
Q

what is the missing heritability problem?

A

The missing heritability problem relates to the difference between heritability estimates and the variance explained.

22
Q

Describe the process from DNA to protein

A

Genetic information in cells flows from DNA to messenger RNA (mRNA) to protein. DNA contains a linear message consisting of four bases (A, T, G, C). The message is decoded in two steps: first, transcription of DNA into a different type of nucleic acid, ribonucleic acid (RNA); then, translation of RNA into protein. In transcription, the sequence of bases in one strand of the DNA double helix is copied into RNA. This RNA is called messenger RNA (mRNA) because it relays the DNA code on to translation. In RNA, uracil (U) replaces thymine (T).
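The two steps can be sketched in Python under simplifying assumptions. The input is taken to be the coding (non-template) strand, whose sequence the mRNA matches apart from U replacing T. Real-world details such as introns and splicing are ignored. Translation here starts at the first AUG and stops at the first stop codon. Function names are illustrative.

```python
from itertools import product

# Standard genetic code for mRNA codons, bases in UCAG order ("*" = stop).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
RNA_CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """mRNA has the coding strand's sequence, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons from the first AUG until a stop codon (one-letter amino acid codes)."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = RNA_CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("TTATGGCTGAAGTTTAAGC")
    print(mrna)             # UUAUGGCUGAAGUUUAAGC
    print(translate(mrna))  # MAEV (Met-Ala-Glu-Val, then a UAA stop)
```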

23
Q

what is a gene, allele and genotype?

A

Gene: A sequence of DNA bases that codes for a particular product.

Allele: alternative form of a gene

Genotype: an individual’s combination of alleles at a particular locus

24
Q

candidate gene studies of general cognitive ability have proven hard to replicate (CITE)

A

Chabris et al., (2012)

25
Q

The first wave of GWA studies improved our understanding of many complex traits (CITE)

A

Wellcome Trust Case Control Consortium, 2007

26
Q

Discuss population stratification

A

• There are often known differences in phenotype prevalence due to ethnicity, and allele frequencies are highly variable across human subpopulations. This means that in a sample with multiple ethnicities, ethnic-specific SNPs will likely be associated with the trait because of population stratification rather than any causal effect. To control for population stratification, the ancestry of each individual in the dataset is estimated using EIGENSTRAT, which applies principal component analysis (PCA) to SNP genotypes to identify population structure.
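EIGENSTRAT is a dedicated program, but the core PCA idea can be sketched in plain Python. This sketch assumes 0/1/2 genotype coding. It uses power iteration to find only the first principal component, rather than reproducing EIGENSTRAT's own implementation. Individuals from groups with different allele frequencies end up with first-PC scores of opposite sign. In an association analysis, such scores are typically included as covariates so that ancestry differences are not mistaken for trait associations. Data and names are hypothetical.

```python
def center_columns(genotypes: list[list[float]]) -> list[list[float]]:
    """Subtract each SNP's mean genotype so PCA reflects variation between people."""
    n = len(genotypes)
    means = [sum(row[i] for row in genotypes) / n for i in range(len(genotypes[0]))]
    return [[x - m for x, m in zip(row, means)] for row in genotypes]


def first_pc_scores(genotypes: list[list[float]], iterations: int = 100) -> list[float]:
    """Unit-length scores of each individual on the first principal component."""
    x = center_columns(genotypes)
    n = len(x)
    # Individual-by-individual similarity matrix X X^T; its top eigenvector gives PC1 scores.
    k = [[sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)] for i in range(n)]
    v = [float(i + 1) for i in range(n)]  # arbitrary non-constant starting vector
    for _ in range(iterations):
        w = [sum(k[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(c * c for c in w) ** 0.5
        if norm == 0:
            raise ValueError("no genotype variation to summarize")
        v = [c / norm for c in w]
    return v


if __name__ == "__main__":
    # Hypothetical genotypes: rows 0-2 from one ancestral group, rows 3-5 from another.
    sample = [[2, 2, 0, 0], [2, 1, 0, 0], [1, 2, 0, 1],
              [0, 0, 2, 2], [0, 1, 2, 1], [0, 0, 1, 2]]
    print([round(s, 2) for s in first_pc_scores(sample)])
```

The overall sign of the scores is arbitrary. What matters is that the two groups fall on opposite sides.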

27
Q

what is the underlying rationale for GWAS? cite

A

The underlying rationale for GWAS is the ‘common disease, common variant’ hypothesis, positing that common diseases are attributable in part to allelic variants present in more than 1–5% of the population (Manolio et al., 2009)
They have been facilitated by the development of commercial ‘SNP chips’ or arrays that capture most, although not all, common variation in the genome.

28
Q

why should we conduct GWAS on African ancestry populations?

A

Most GWAS have been conducted in European ancestry populations. African ancestry populations show greater genetic variation and shorter stretches of LD, because their genomes have had more time to recombine, leaving less LD between alleles at different SNPs.
Thus, studies of African ancestry populations may increase the yield of rare variants and help narrow the large chromosomal regions of association found in ‘younger’ populations with extended LD.
Manolio et al., 2009

29
Q

what is a tag SNP?

A

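A genotyped SNP in high LD with a nearby (possibly causal) variant, so that it can stand in for that variant and detect it indirectly.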

30
Q

who to cite for missing heritability problem?

A

Maher 2008

31
Q

Benyamin et al., 2014

A

first GWAS of childhood intelligence (age 8 - 16)
6 cohorts
=no SNP associated at the genome-wide level
=aggregated effects of common SNPs in the 3 largest cohorts explained 22-46% of the variance, suggesting that intelligence is heritable and polygenic

However, the sample was only 17,900
Leads on to Sniekers et al., 2017

32
Q

what is the pheno and genetic correlation between intelligence and years of education? (CITE)

A

years of education is highly correlated phenotypically (0.50) and genetically (0.65) with intelligence (Rietveld et al., 2014)

33
Q

what are the sample sizes of
Benyamin et al., 2014
Sniekers et al., 2017
Hill et al., 2018

A

17,900 - children
78,000 - children and adults
nearly 250,000

34
Q

who to cite for years of education?

A

Okbay et al., 2016

35
Q

who to cite for GCTA?

A

Yang et al 2011

36
Q

GCTA on cog ability to confirm generalist genes hypothesis was conducted by..?

A

Trzaskowski et al., 2013

37
Q

Trzaskowski et al., 2013 study

A

used GCTA to test the generalist genes hypothesis of cognitive ability. They compared genetic correlation estimates using the same people, at the same age, with the same measures (TEDS).
Twin analyses used both twins; GCTA used just one twin from each pair (unrelated individual analysis)
=correlations between g and language = 0.81, g and maths = 0.74, g and reading = 0.89.
=these were highly similar to twin estimates
This provides support for the generalist genes hypothesis

38
Q

what is the difference between additive and non-additive genetic effects?

A

Additive genetic effects are caused by the independent effects of alleles, which add up in their effect on a trait; nonadditive genetic effects are those that interact.

39
Q

high intelligence represents the quantitative extreme of the same genetic influences that operate across the normal distribution - cite the study that found this and one to support it

A

Zabaneh et al., 2017

Plomin, Haworth & Davis, 2009

40
Q

Describe the study that combined polygenic scores and GCTA

A

Zabaneh et al., 2017
Using GCTA to study individuals of high intelligence was guided by the assumption that these individuals possess more ability-enhancing alleles, and thus would increase the power of GCTA to detect genetic effects.
They used GWA findings from individuals in the top 0.03% of the intelligence distribution, from the Talent Identification Program (TIP), to generate polygenic scores. These scores were then used to predict variance in normal-range intelligence in a sample from the Twins Early Development Study (TEDS).
=polygenic scores accounted for 1.6% of the variance in intelligence in TEDS. This polygenic prediction from TIP is stronger than from all current IQ GWAS, only being exceeded by very large studies of the partially correlated phenotype of educational attainment
= The proportion of variance explained by all common variants using GCTA was 0.33; using LD score regression (with unconstrained intercept), heritability was 0.42. This is close to estimates from twin research on normal-range cognitive abilities (Haworth et al., 2010) and is the highest thus far for GWAS of cognitive abilities (Benyamin et al., 2013), providing support for combining methods and for studying those at the high end of the spectrum.

41
Q

who to cite when first discussing polygenic scores?

A

Dudbridge, 2013

42
Q

Evaluations of polygenic scores

A

polygenic scores are normally distributed, meaning there is a positive tail as well as a negative. This opens avenues for considering positive genetics, focusing on how individuals flourish and focusing on resilience rather than vulnerability (Plomin, Haworth & Davis, 2009). The normality also merits emphasis because it illustrates that common disorders can be considered as extremes of the common polygenic liability spectrum, which has far-reaching implications for diagnosis, treatment and prevention.

43
Q

who predicted that soon polygenic scores will explain more than 10% of variability in intelligence?

A

Plomin and von Stumm, 2018

44
Q

a bottom-up approach to intelligence focusing on specific genes will be difficult for 3 reasons:

A

1) genetic effects are pleiotropic
2) many hits are intergenic regions - stretch of DNA sequences between genes - they are a subset of noncoding DNA meaning no ‘genes’ to trace
3) the biggest effects are minuscule

This therefore supports aggregating the effects of common SNPs and using polygenic scores

45
Q

who suggests that to close the missing heritability gap we should combine methods with whole-genome sequencing data?

A

Plomin, 2012