Genomics and Evolution Flashcards

1
Q

Chromosome number changes via which two mechanisms?

A

Fusion (reduction): the Indian muntjac, for example, has a diploid number of just 6 (females) or 7 (males) due to repeated fusions,
Fission (increase).

2
Q

Chromosome structure changes via which mechanisms?

A

Inversions,
Translocations,
Segmental duplications.

3
Q

What is a ‘pseudoautosomal region’?

A

A small region of homology between the sex chromosomes. Humans have two (PAR1 and PAR2), one at each end of the X and Y chromosomes.

4
Q

The platypus has 5 pairs of X and Y chromosomes. How does such an arrangement emerge?

A

Translocation of regions between sex chromosomes and autosomes. This creates a small region of the sex chromosome that is homologous to an autosome, and vice versa. During meiosis, these regions pair up so that the chromosomes form a chain and segregate as a group.

5
Q

How do sex chromosomes typically evolve?

A

From a pair of autosomes that acquire a sex-determining gene. Recombination is suppressed in the region surrounding the gene in the heterogametic sex. The non-recombining region can expand through inversions, resulting in nearly the entire Y (or W) chromosome becoming non-recombining.

6
Q

What is the impact of non-recombination on the Y chromosome?

A

Rapid (almost instantaneous in an evolutionary sense) degeneration and gene loss, with only a few indispensable genes remaining functional.

7
Q

What do the loss of the ability to synthesise vitamin C in primates and the loss of teeth in birds have in common?

A

Both represent the general tendency for genes which become unnecessary to be lost.

8
Q

Unnecessary genes become lost or defunctionalised all the time, but where do new genes come from?

A

Exon shuffling: exons are recombined into the genome at new positions,
Gene duplication,
Retroposition: mRNA transcripts of genes are reverse transcribed and inserted at new positions,
Gene fusion/fission,
De novo origination.

9
Q

What is alternative splicing?

A

When the same pre-mRNA is spliced in different ways to produce different mature mRNAs, and therefore different protein isoforms.

10
Q

How did alternative splicing emerge?

A

Alternative splicing is thought to be a by-product of splicing noise – imperfect or incorrect splicing that occasionally occurs.

11
Q

What are the two main theories of intron evolution?

A

Introns early: introns evolved in the RNA world and were gradually lost in prokaryotes.

Introns late: introns evolved in the ancestor of eukaryotes.

12
Q

What is the 2R hypothesis?

A

Vertebrates originated following two rounds of whole genome duplication

13
Q

How does colour vision in primates illustrate evolution by gene duplication and sub-functionalisation?

A

Trichromatic colour vision in primates evolved through gene duplication: the L gene (for long wavelength) was duplicated and the two copies diverged slightly, resulting in the L and M genes (M for medium wavelength).

14
Q

What is the C-value paradox?

A

The observation that, in eukaryotes, genome size (the C-value) does not correlate with organismal complexity, as it does in viruses and prokaryotes.

15
Q

How is the C-value paradox resolved?

A

The high abundance of non-coding DNA in eukaryotic genomes.

16
Q

How can genome size be determined for extinct animals such as dinosaurs?

A

From the size of the pores (osteocyte lacunae) inside fossil bones: the larger the genome, the larger the cell and therefore the larger the pore.

17
Q

How does rate of DNA loss impact genome size?

A

The frequency and size of deletions occurring in the genome determine how efficiently the genome is downsized. For example, one study demonstrated that the half-life of a piece of junk DNA (e.g. a pseudogene, i.e. a broken gene) in Drosophila is only 14 million years, while in the cricket Laupala it is over half a billion years; effectively, junk DNA is never removed from the Laupala genome. This results in hugely different genome sizes.

18
Q

Why is mtDNA useful for phylogenetic reconstructions in humans?

A

mtDNA mutates more frequently than nuclear DNA and does not recombine.

19
Q

When was the mitochondrial “Eve” likely to have lived?

A

Molecular clock suggests mitochondrial Eve lived somewhere in Africa ~170,000 years ago.

20
Q

When are humans likely to have first migrated out of Africa?

A

75kya

21
Q

How can we distinguish between migration and selective sweeps when using mtDNA to understand human origins?

A

It is impossible using only mtDNA. So it is important to look at other parts of the genome unlinked to mtDNA to reconstruct an unbiased picture of human pre-history.

22
Q

Why is it challenging to use nuclear autosomal genes when reconstructing the history of human populations?

A

They recombine, which makes them poorly suited to phylogeny reconstruction. Instead, principal component analysis (PCA) is used to analyse polymorphism in autosomal DNA sequences.

23
Q

Why does analysis of autosomal genes provide a more robust history of human populations?

A

Recombination makes the evolutionary histories of different genes independent. As a result, analysis of recombining nuclear autosomal genes provides a more complete picture than non-recombining markers.

24
Q

How are traditional land inheritance practices reflected in human evolutionary genetics?

A

Comparison of mtDNA (mother-to-daughter) and Y-linked markers (father-to-son) reveals much more isolation by distance for the Y-linked markers. This indicates much lower mobility of men than of women: sons tend to stay on inherited land, while daughters move to marry into other families.

25
Q

What point serves as the limit on how far back in time a phylogeny can look?

A

In any phylogeny one can go back in time until the most recent common ancestor (MRCA) is reached, but no deeper than that. The MRCA is effectively a ‘horizon’ for evolutionary genetic inference: one cannot look beyond it, as no information about older lineages remains.

26
Q

What evidence is there of human-Neanderthal interbreeding?

A

There is no evidence for interbreeding in mtDNA. However, once the nuclear Neanderthal genome was sequenced, it was estimated that ~4% of our genome is of Neanderthal origin.

27
Q

Where did human-neanderthal interbreeding take place?

A

The signal of hybridisation between humans and Neanderthals was found in non-African populations such as Europeans, but not in Africans. This makes sense given that Neanderthals lived in Europe and western Asia and were absent from Africa.

28
Q

How did differences in human skin pigmentation provide adaptive advantages to humans in different environmental conditions?

A

It is still not clear exactly what is advantageous about having lighter or darker skin. It is thought that darker skin reduces photolysis of folic acid in high-UV environments, while lighter skin helps production of vitamin D3 in low-UV environments.

29
Q

What are two key signatures of selection?

A
  1. Loss of genetic variation (polymorphism) around the target of selection.
  2. An excess of low-frequency polymorphisms. New mutations that accumulate in the region of low diversity all start at very low frequency (1/population size), so after a sweep the genetic diversity in the region is likely to consist of polymorphisms at unusually low frequency. This can be detected by several statistics, the most common of which is Tajima’s D.
30
Q

What determines the size of the low variation region around the locus of a selective sweep?

A

The size of the region affected by the sweep depends on:

  1. Local recombination rate: greater recombination results in a shorter region of low variation.
  2. Speed of the sweep: slow sweeps allow more mutation and recombination during the course of the sweep, resulting in a shorter region of low variation.
31
Q

How can population differentiation (the degree to which populations are subdivided) be quantified?

A

Population differentiation can be quantified using the Fst statistic: Fst = (Ht - Hs)/Ht, where Ht is the total heterozygosity across all populations and Hs is the average heterozygosity within populations.
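
As a worked illustration of the Fst formula, here is a minimal Python sketch for a single biallelic locus. It assumes equally sized subpopulations at Hardy-Weinberg proportions, so that heterozygosity is 2pq; the function name `fst_biallelic` and the example frequencies are made up for illustration and are not part of the original card.

```python
def fst_biallelic(freqs):
    """Fst = (Ht - Hs) / Ht for one biallelic locus.

    freqs: frequency of one allele in each subpopulation.
    Assumes equal subpopulation sizes and Hardy-Weinberg proportions.
    """
    if not freqs:
        raise ValueError("need at least one subpopulation")
    # Hs: average expected heterozygosity within subpopulations
    hs = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    # Ht: expected heterozygosity of the pooled (total) population
    p_bar = sum(freqs) / len(freqs)
    ht = 2 * p_bar * (1 - p_bar)
    if ht == 0:
        raise ValueError("locus is monomorphic overall; Fst is undefined")
    return (ht - hs) / ht


if __name__ == "__main__":
    print(fst_biallelic([0.5, 0.5]))  # identical subpopulations
    print(fst_biallelic([1.0, 0.0]))  # fixed for different alleles
    print(fst_biallelic([0.2, 0.8]))  # intermediate differentiation
```

Identical subpopulations give Fst = 0, and subpopulations fixed for different alleles give Fst = 1.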

32
Q

What are genetic markers?

A

Genome regions (from single nucleotides to whole chromosomes) that are useful for measuring and investigating genetic variation in populations.

33
Q

How have genetic markers used by population geneticists changed over the course of the field’s history?

A

The quantity and resolution of genetic markers have improved (exponentially) with the development of genetics:

  1. Proteins:
    i. blood groups (1900),
    ii. allozymes (electrophoretically-distinct proteins; 1966)
  2. DNA (from 1970):
    i. Sequence variations (SNPs, insertions/deletions of nucleotides)
    ii. Structural variations (gene duplications/losses, chromosomal arrangements)
    iii. Ever-increasing array of techniques to analyse DNA: PCR, gel electrophoresis, sequencing technologies
34
Q

When is a population considered polymorphic at a specific genetic locus?

A

If more than one allele is commonly found at that locus (typically each at a frequency above a 1-5% threshold).

35
Q

How is the proportion of variable or “segregating” sites defined in population genetics?

A
36
Q

What is ‘h’ in this equation?

A

The equation is h = 1 - Σ p_i^2, i.e. 1 minus the sum of the squared frequencies (p_i) of all alleles

Heterozygosity (h) is the fraction of individuals in a population that are expected to be heterozygous

h is equivalent to the probability that any two alleles randomly sampled from the population are different. It is greatest when there are many alleles, all at equal frequency
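
A short Python sketch of this definition (h = 1 minus the sum of squared allele frequencies). The function name `heterozygosity` is chosen here for illustration only.

```python
def heterozygosity(allele_freqs):
    """Expected heterozygosity: h = 1 - sum of squared allele frequencies."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


if __name__ == "__main__":
    print(heterozygosity([1.0]))        # one allele: no heterozygotes
    print(heterozygosity([0.9, 0.1]))   # two alleles, unequal frequency
    print(heterozygosity([0.5, 0.5]))   # two alleles, equal frequency
    print(heterozygosity([0.25] * 4))   # four alleles, equal frequency
```

Comparing the last three calls illustrates the point on the card: h grows with the number of alleles and is highest when they are equally frequent.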

37
Q

The Hardy-Weinberg principle almost never reflects the reality of a population. Why is it still useful?

A
  1. It predicts genotype frequencies from allele frequencies when these are stable across generations in a stable population.
  2. The H-W Principle is an example of a null model. It describes the state of population when nothing interesting is happening.
38
Q

What are the assumptions of the Hardy-Weinberg principle?

A
  1. Diploid organism with sexual reproduction (random and independent chromosome transmission to offspring)
  2. Non-overlapping generations
  3. Infinite population size (no random genetic drift)
  4. Random mating (no inbreeding)
  5. Males and females have equal allele frequencies
  6. A closed population (no migration)
  7. No mutation
  8. No selection
39
Q

What does the Hardy-Weinberg principle teach us about genetics in the absence of evolutionary forces?

A
  1. Genotype frequencies are in equilibrium, i.e. they remain unchanged indefinitely
  2. This equilibrium is reached after only one generation of random mating
  3. If genotype frequencies differ from those predicted, then at least one assumption is violated, i.e. at least one evolutionary force (or non-random mating) is acting
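
Point 3 can be sketched as a comparison of observed and expected genotype counts at a biallelic locus. This is a minimal Python illustration, not a full statistical test: the function names and the genotype counts are hypothetical, and the chi-square statistic would normally be compared with a critical value (about 3.84 at the 5% level with 1 degree of freedom).

```python
def hardy_weinberg_expected(n_aa, n_ab, n_bb):
    """Return (p, expected counts) for genotypes AA, AB, BB under H-W."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no individuals sampled")
    # Allele frequency of A from genotype counts
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    # Expected genotype proportions are p^2, 2pq, q^2
    return p, (p * p * n, 2 * p * q * n, q * q * n)


def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over genotype classes with E > 0."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


if __name__ == "__main__":
    observed = (50, 0, 50)  # hypothetical sample with no heterozygotes
    p, expected = hardy_weinberg_expected(*observed)
    print(p, expected, chi_square(observed, expected))
```

A large statistic, as in the example with no heterozygotes, signals a departure from the null model but does not say which assumption is violated.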
40
Q

What is linkage disequilibrium?

A

Linkage disequilibrium (LD) is the non-random association of alleles at different loci. It typically arises between genes on the same chromosome, because their transmission is not independent

41
Q

What are two forms of non-random mating?

A

Inbreeding- individuals mate with relatives more often than would occur by chance

Positive assortative mating- individuals breed preferentially based on a similar phenotype.

42
Q

How does effective population size differ from census population size?

A

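The effective population size (Ne) is the size of an idealised population that would experience the same amount of genetic drift as the real one. In practice it reflects only those individuals who contribute to reproduction in a given generation, so it is usually smaller than the census size.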

43
Q

How can effective population size be calculated based on sex ratio?

A

If a population has an unequal sex ratio, the rarer sex will contribute more offspring per capita
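
The card does not give a formula. A commonly used textbook expression, stated here as an assumption rather than something taken from the card, is Ne = 4 x Nm x Nf / (Nm + Nf), where Nm and Nf are the numbers of breeding males and females. A minimal Python sketch with made-up numbers:

```python
def effective_size_sex_ratio(n_males, n_females):
    """Ne = 4 * Nm * Nf / (Nm + Nf) for breeding males and females.

    Assumed textbook formula; not stated on the original card.
    """
    if n_males <= 0 or n_females <= 0:
        raise ValueError("need at least one breeding male and one female")
    return 4 * n_males * n_females / (n_males + n_females)


if __name__ == "__main__":
    print(effective_size_sex_ratio(50, 50))  # equal sex ratio
    print(effective_size_sex_ratio(1, 99))   # one male, many females
```

With an equal sex ratio Ne equals the census size of 100, but with a single breeding male Ne falls to about 4, because every offspring inherits half of its genes from that one male.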

44
Q

Why can migration between subpopulations result in lower heterozygosity than would occur in a single population under H-W equilibrium?

A

If subpopulations A and B are each at Hardy-Weinberg equilibrium but have different allele frequencies, their average heterozygosity will always be lower than the heterozygosity expected in a single mixed population (the Wahlund effect)

45
Q

What is the fixation index and how is it calculated?

A

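The fixation index (Fst) is the fraction of total genetic diversity that is due to differences between subpopulations (demes). It is calculated as Fst = (Ht - Hs)/Ht, where Ht is the total heterozygosity and Hs is the average heterozygosity within subpopulations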

46
Q

What is the selection coefficient of an allele?

A

The increase or decrease in fitness conferred by that allele compared to another

47
Q

How can the change in the frequency of an allele between two generations be expressed for haploid organisms?

A

∆q ≈ spq

p, q are the frequencies of the alleles P and Q, respectively
The Q allele has a fitness of 1 + s (s = selection coefficient), relative to a fitness of 1 for the P allele
∆q is the change in q from one generation to the next.

q increases when s is positive, and decreases when s is negative

The rapidity of the allele frequency change is proportional to the absolute value of s
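
To see where the approximation comes from, the sketch below compares ∆q ≈ spq with the exact one-generation change, assuming fitnesses of 1 for P and 1 + s for Q. The exact change works out to spq/(1 + sq), which is close to spq when s is small. Function names and parameter values are illustrative only.

```python
def delta_q_exact(q, s):
    """Exact change in q after one generation of haploid selection."""
    p = 1 - q
    mean_fitness = p * 1 + q * (1 + s)
    return q * (1 + s) / mean_fitness - q


def delta_q_approx(q, s):
    """The card's approximation: delta q = s * p * q."""
    return s * (1 - q) * q


def trajectory(q0, s, generations):
    """Allele frequency over time using the exact recursion."""
    qs = [q0]
    for _ in range(generations):
        qs.append(qs[-1] + delta_q_exact(qs[-1], s))
    return qs


if __name__ == "__main__":
    print(delta_q_exact(0.1, 0.01), delta_q_approx(0.1, 0.01))
    print(trajectory(0.01, 0.1, 200)[-1])  # favoured allele nears fixation
```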

48
Q

How can the change in the frequency of an allele between two generations be expressed for diploid organisms?

A

∆q ≈ spq [ph + q(1-h)]

In diploids, fitness is influenced by the degree of dominance (h) of an allele, as follows:

PP = 1 ; PQ = 1 + hs ; QQ = 1 + s

h ranges from 0 to 1

(s= selection coefficient)

49
Q

How does the degree of dominance (dominant, additive, recessive) of an allele affect the rate of allele frequency change in diploid organisms?

A

Dominance

  1. If the selected allele is dominant (orange line), change is initially rapid but very slow as it nears fixation
    - A new rare allele initially creates mostly heterozygotes. Selection can only favour these if the allele is dominant
    - Near fixation, dominance allows the less-fit allele to hide in heterozygotes, making it difficult to remove
  2. If the selected allele is recessive (amber line), change is very slow initially but accelerates near fixation

Additivity

Change is initially rapid and reaches fixation very rapidly (green line). This is because less-fit alleles are more effectively selected against (they cannot hide)

50
Q

What forms of selection help to maintain genetic variation in a population?

A
  1. Balancing selection: Both alleles stably coexist with frequency that is proportional to the relative fitnesses of the two homozygotes. Typified by the case of heterozygote advantage.
  2. Frequency dependent selection: allele fitness is high when the allele is rare, low when common
  3. Fluctuating selection: allele fitness depends on an aspect of the environment that is rapidly and constantly changing
51
Q

In molecular phylogenetics, how are orthology, homology and paralogy defined?

A
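  1. Homologous sequences share a common ancestor (orthologues and paralogues are both kinds of homologue)
  2. Orthologous sequences are homologues that diverged through speciation, i.e. the same gene in different species
  3. Paralogous sequences are homologues that diverged through gene duplication, i.e. different genes, typically in the same genome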
52
Q

What is the difference between a transition mutation and a transversion mutation?

A

Transition: purine-to-purine or pyrimidine-to-pyrimidine (A -> G, C -> T etc.)

Transversion: purine-to-pyrimidine or vice versa (A -> C, G -> T etc.)
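
The rule can be written as a small classifier. This is a sketch for DNA bases only; the function name `classify_substitution` is made up for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(base_from, base_to):
    """Return 'transition', 'transversion' or 'none' for two DNA bases."""
    a, b = base_from.upper(), base_to.upper()
    valid = PURINES | PYRIMIDINES
    if a not in valid or b not in valid:
        raise ValueError("expected DNA bases A, C, G or T")
    if a == b:
        return "none"
    # Same chemical class (both purines or both pyrimidines) = transition
    same_class = (a in PURINES) == (b in PURINES)
    return "transition" if same_class else "transversion"


if __name__ == "__main__":
    for pair in (("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")):
        print(pair, classify_substitution(*pair))
```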

53
Q
A
54
Q

Outline the most common method of choosing the best alignment between two sequences in molecular phylogenetics.

A
  1. Assign differing costs to each type of sequence difference (i.e. insertions, deletions, transitions, transversions)
  2. Add up these costs for each possible alignment, and identify the alignment with the lowest cost. (Applications such as Clustal and Muscle do this)
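
In practice the lowest-cost alignment is found without listing every possible alignment. The sketch below uses dynamic programming in the style of the Needleman-Wunsch algorithm to find the minimum total cost of a pairwise alignment. It is an illustration only: the cost values are arbitrary assumptions, it returns the cost rather than the alignment itself, and real tools such as Clustal and Muscle use more elaborate schemes for multiple sequences.

```python
PURINES = {"A", "G"}


def substitution_cost(a, b, transition=1, transversion=2):
    """Cost of aligning base a with base b (illustrative values)."""
    if a == b:
        return 0
    same_class = (a in PURINES) == (b in PURINES)
    return transition if same_class else transversion


def alignment_cost(seq1, seq2, gap=3):
    """Minimum total cost of a global alignment of two DNA sequences."""
    n, m = len(seq1), len(seq2)
    # cost[i][j]: cheapest alignment of seq1[:i] with seq2[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap
    for j in range(1, m + 1):
        cost[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                # align the two bases (match or substitution)
                cost[i - 1][j - 1] + substitution_cost(seq1[i - 1], seq2[j - 1]),
                # gap in seq2 (insertion/deletion)
                cost[i - 1][j] + gap,
                # gap in seq1 (insertion/deletion)
                cost[i][j - 1] + gap,
            )
    return cost[n][m]


if __name__ == "__main__":
    print(alignment_cost("ACGT", "ACGC"))  # one transition
    print(alignment_cost("ACGT", "ACGA"))  # one transversion
    print(alignment_cost("ACGT", "ACT"))   # one gap
```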
55
Q

What is the issue with simply using the proportion of sites that are mismatched (p-distance) when measuring the genetic distance between two genetic sequences?

A

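The p-distance counts only the differences that can be observed, so scenarios with different numbers of underlying substitutions (e.g. a single change at a site versus several successive changes at the same site) would appear identical.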

56
Q

What is the multiple hits problem?

A

When the divergence between sequences is high, the number of observed differences underestimates the true distance, because multiple substitutions at the same site (including back-mutations and convergent changes) are counted at most once.

57
Q

How do nucleotide substitution models aim to solve the issues of the multiple hits problem?

A

Estimate the true genetic distance by mathematically representing the stochastic process of sequence evolution over time.

58
Q

What is the Jukes-Cantor nucleotide substitution model, and how does it differ from the HKY and GTR models?

A

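The simplest nucleotide substitution model: it assumes that all types of substitution occur at the same rate (and that all four bases are equally frequent). The HKY model allows transitions and transversions to have different rates and base frequencies to be unequal; the GTR model allows each of the six substitution types its own rate.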
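
As a worked example of a substitution model correcting for multiple hits, the standard Jukes-Cantor distance is d = -(3/4) ln(1 - 4p/3), where p is the observed proportion of mismatched sites. The formula is the usual textbook one and is not given on the card; the function names and example sequences below are illustrative.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned and non-empty")
    mismatches = sum(a != b for a, b in zip(seq1, seq2))
    return mismatches / len(seq1)


def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance: d = -(3/4) * ln(1 - 4p/3)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the correction to exist")
    return -0.75 * math.log(1 - 4 * p / 3)


if __name__ == "__main__":
    p = p_distance("ACGTACGTAC", "ACGAACGTTC")
    print(p, jukes_cantor_distance(p))
```

The corrected distance is always at least as large as p, and the gap widens as sequences become more divergent.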

59
Q

What do the letters a-f represent in this visualization of a nucleotide substitution model?

A

The relative rates of different types of mutation.

60
Q

How does an amino acid substitution model differ from a nucleotide substitution model?

A

It models sequence variation at the level of amino acids rather than individual nucleotides. Each site can move between 20 possible states rather than 4, so a 20x20 rate matrix is used. The rates of change between amino acids are obtained from large surveys of protein variation.

61
Q

What assumptions are made by nucleotide substitution models?

A
  1. Evolution at each site occurs at the same rate.
  2. Nucleotide base frequencies are the same for all sequences.
  3. Evolution is independent at each site.
62
Q

The assumption that evolution occurs at the same rate at all sites is a major inaccurate assumption of nucleotide substitution models. How can this be corrected?

A

Using models of among-site rate heterogeneity such as the gamma-distribution model

63
Q

What is the difference between a rooted and unrooted phylogenetic tree?

A
64
Q

What three groups of methods are used in constructing phylogenetic trees?

A

ALGORITHMIC METHODS: These methods begin with a genetic distance for each pair of sequences. A ‘clustering algorithm’ then transforms the genetic distances into a tree.

OPTIMALITY METHODS: These methods define some kind of score for each possible tree. An optimisation algorithm is then used to find the tree with the highest score.

STATISTICAL METHODS: These methods calculate a probability for each possible tree. They frame phylogeny estimation as a formal statistical problem.

65
Q

How does an UPGMA algorithmic method of phylogeny construction work?

A
66
Q

How does an UPGMA algorithmic method of phylogeny construction work?

A
67
Q

What are the three optimality methods of phylogeny construction?

A
  1. Maximum Parsimony: The tree which requires the fewest evolutionary changes to explain the observed sequences is the best tree. Fast, but inapplicable to fast-evolving or highly-divergent sequences.
  2. Maximum Likelihood: The tree which is probabilistically most likely to have given rise to the observed sequences is the best tree. Slower. The probabilities are given by a nucleotide substitution model. Most common approach for sequence data.
  3. Bayesian Inference: Each tree has a probability given the data. We should consider the whole probability distribution, not just focus on the single most probable tree. Slowest. Closely related to Maximum Likelihood. Most useful for testing evolutionary hypotheses.
68
Q

How is a phylogeny constructed using maximum parsimony?

A
  1. For any given tree and set of characters, the parsimony score is the minimum number of evolutionary changes required to explain the observed characters.
  2. The most parsimonious tree is that with the lowest parsimony score. However, there may be very many trees that share this distinction.
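
One common way to compute the parsimony score of a given tree is Fitch's algorithm, which the card does not name; it is used here as an assumption. The sketch below scores rooted binary trees written as nested tuples of taxon names against a small made-up alignment.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, changes) for one alignment site.

    tree: a taxon name (leaf) or a tuple (left_subtree, right_subtree).
    states: dict mapping taxon name -> character at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_site(left, states)
    right_set, right_changes = fitch_site(right, states)
    changes = left_changes + right_changes
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed
        return shared, changes
    # Children disagree: one change is required on this part of the tree
    return left_set | right_set, changes + 1


def parsimony_score(tree, alignment):
    """Minimum number of changes over all sites for a given tree."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


if __name__ == "__main__":
    aln = {"a": "AAG", "b": "AAG", "c": "CAT", "d": "CAT"}
    print(parsimony_score((("a", "b"), ("c", "d")), aln))
    print(parsimony_score((("a", "c"), ("b", "d")), aln))
```

The tree that groups a with b and c with d needs fewer changes than the alternative, so it is the more parsimonious of the two.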
69
Q

How is a phylogeny constructed using maximum likelihood?

A

Nucleotide (or amino acid) substitution models enable us to calculate P(seqs|T,B,Q), that is, the probability of the observed sequences given:

  • a tree topology (T)
  • a set of branch lengths (B), each of which represents a genetic distance
  • rate parameters of the substitution model (Q)

The tree likelihood is proportional to this probability. Calculating this requires some fairly heavy-duty maths.

70
Q

When constructing a maximum likelihood phylogeny, a tree search is used to determine the tree with the highest likelihood. If the tree has many taxa, how can a search be conducted without searching through every possible tree?

A

Hill climbing: searches through trees via iterative trial and error. Does not search through all possible trees, and isn’t guaranteed to find the most likely tree.

71
Q

How is the uncertainty of a phylogenetic tree most commonly measured?

A

Bootstrapping:

  1. The alignment is split into columns (one per nucleotide position), which are added to a ‘pot’
  2. A column is drawn at random from the pot for each nucleotide position, with replacement, meaning the same column can be drawn multiple times
  3. This is repeated hundreds or thousands of times to produce many pseudo-replicates
  4. Generate a tree (usually NJ or ML) from each bootstrap replicate. The frequency with which a cluster occurs in these replicates is a measure of its reliability.
  5. A cluster which appears in >70% of replicates is considered robust.
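
Steps 1 to 3 (resampling alignment columns with replacement) can be sketched as follows. The tree-building step is deliberately left out, and the function names, toy alignment and seed are made up for illustration.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    # Draw one column index per position; the same column can recur
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(alignment[name][c] for c in columns) for name in names
    }


def bootstrap_replicates(alignment, n_replicates=100, seed=0):
    """Return a list of pseudo-replicate alignments."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


if __name__ == "__main__":
    aln = {"a": "ACGT", "b": "ACGA", "c": "TCGA"}
    for replicate in bootstrap_replicates(aln, n_replicates=3, seed=1):
        print(replicate)
```

Each replicate keeps whole columns intact, so the relationships between taxa at a site are preserved while the weighting of sites varies between replicates.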
72
Q

On what observation did Zuckerkandl and Pauling base their original molecular clock model?

A

Number of amino acid differences between animal hemoglobins was proportional to species divergence time, as defined by the fossil record.

73
Q

What are the Neutralist and Selectionist models of molecular evolution? How are they reconciled?

A

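Neutralist: most molecular changes that become fixed are selectively neutral and spread by genetic drift. Selectionist: most fixed changes are advantageous and spread by positive selection.

It is now understood that these two are not mutually exclusive: molecular evolution is driven by different forces in different regions of the genome and under certain conditions.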

74
Q

How does substitution rate differ from mutation rate?

A

The substitution rate is the rate at which sequences in different populations diverge through time, i.e. the rate at which mutations become fixed. The mutation rate is the rate at which new mutations arise in individuals, e.g. through errors during replication.

75
Q

How does the fitness impact of a mutation alter its substitution rate and the overall substitution rate of the gene it is part of?

A

The overall substitution rate of a gene will depend on the proportion of sites in each of these categories.

76
Q

How does population size alter substitution rates?

A

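Through Ns, the product of population size (N) and selection coefficient (s): selection is effective only when |Ns| is well above 1; otherwise the fate of a mutation is governed mainly by drift.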

77
Q

How does generation time (time between germ line replications) impact substitution rate?

A

For neutral mutations, substitution rate is significantly impacted by generation times, as faster generation times provide more opportunities for mutations to accumulate.

78
Q

How can different generations of the same species impact substitution rate differently?

A

Sometimes there is variation in the number of cell divisions (and thus opportunities for mutation) per organismal generation. In some species there may be more cell division events in the male germ line than in the female, leading to faster evolution of the Y chromosome than of the X chromosome.

79
Q

What is a commonly proposed cause of the higher substitution rates observed in small animals, and what makes proving it difficult?

A

Proposed explanation is the higher basal metabolic rate observed in small animals: increased oxygen free radicals produced by aerobic respiration, which can generate mutations.

It is difficult to prove the association due to the large number of confounding variables (e.g. small animals also have shorter generation times).

80
Q

What techniques can be used to calibrate the timescale of a phylogeny?

A

If we know at least one divergence time (T) then we can “calibrate” the timescale of the phylogeny using that time.

There are four ways to calibrate a phylogeny:

  1. Fossils
  2. Biogeography- for example using the age of landmasses to estimate divergence time.
  3. Co-evolution- using known timescale from one species to calibrate for a coevolving species.
  4. Measurably evolving populations

Maximum likelihood methods are used to estimate the phylogeny, with the addition that calibrated nodes are fixed to a timepoint. Branch lengths are then in units of time, not genetic distance.

81
Q

What are the two techniques by which fossils can be used to calibrate the timescale of a phylogeny?

A
82
Q

What are two methods of using measurably evolving populations to calibrate the timescale of a phylogeny?

A
83
Q

What are the three approaches to depicting variable evolutionary rates using a phylogenetic molecular clock?

A
84
Q

What are two benefits of coalescent theory as a way of looking at population genetics?

A
  1. Focuses on the properties of small samples from large populations - so is easy to apply to empirical data
  2. Explains patterns of shared ancestry, not mutations. This makes it applicable to any kind of sequence data.
85
Q

What is a coalescent event in coalescent theory?

A

When two selected lineages converge on a single common ancestor.

86
Q

How is the rate of coalescence (probability that two lineages will coalesce in the previous generation) calculated? How does this differ between diploid and haploid populations?

A

r is inversely proportional to population size:

r = rate of coalescence
N = population size
n = number of sampled genes
i = number of sampled lineages in a specific generation

r = (1/N) x i(i-1)/2 = i(i-1)/(2N)

In a diploid population, 2N becomes 4N, because each individual carries two copies of each chromosome (the lineages are gene copies, not individuals)
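
A small Python sketch of this rate and of the expected waiting time between coalescent events, taken here as 1/r generations (a standard approximation that holds when r is small). The function names and the population size are illustrative assumptions.

```python
def coalescence_rate(i, n_pop, diploid=False):
    """Per-generation coalescence rate among i lineages.

    Haploid: i(i-1)/(2N). Diploid: i(i-1)/(4N).
    """
    if i < 2:
        return 0.0
    denominator = 4 * n_pop if diploid else 2 * n_pop
    return i * (i - 1) / denominator


def expected_waiting_times(n_samples, n_pop, diploid=False):
    """Expected generations spent with i lineages, for i = n down to 2."""
    return {
        i: 1 / coalescence_rate(i, n_pop, diploid)
        for i in range(n_samples, 1, -1)
    }


if __name__ == "__main__":
    times = expected_waiting_times(5, 1000)
    for i, t in times.items():
        print(i, "lineages:", t, "generations")
    print("expected time to MRCA:", sum(times.values()))
```

Because the rate grows with i(i-1), coalescent events are expected to come quickly while many lineages remain, and the final two lineages account for the longest wait.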

87
Q

What is a “serially sampled” coalescent?

A

A version of the coalescent model in which samples from the past are incorporated. This effectively replenishes the number of lineages after coalescent events, resulting in an increase in r (because i is larger)

r= i(i-1)/2N

88
Q

How can population size be incorporated into coalescent model?

A

r(t) = i(i-1)/(2N(t)), where N(t) is the population size at time t

In a growing population, the population size decreases moving back in time, which increases the rate of coalescence.

89
Q

What is θ (theta) and how does it relate to coalescent theory?

A

θ denotes sequence diversity- it is related to the number of mutations (dots) in the history of the sample.

Under neutral theory θ = 2Nμ

(μ is rate of mutation per generation)

If mutations occur randomly on branches: θ equals average pairwise genetic distance between sampled sequences.

Coalescent events can be linked to mutations: mutations present in one lineage and not another must have occurred after the point at which those lineages coalesced.
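
Taking the statement that θ equals the average pairwise genetic distance, θ can be estimated from a sample of aligned sequences, and N follows from θ = 2Nμ if μ is known. The sketch below is illustrative only: the sequences and mutation rate are made up, and θ and μ are both treated as per-site quantities.

```python
from itertools import combinations


def pairwise_diversity(sequences):
    """Mean proportion of differing sites over all pairs of sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if length == 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned and non-empty")
    pairs = list(combinations(sequences, 2))
    differences = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return differences / (len(pairs) * length)


def population_size_from_theta(theta, mu):
    """Rearranges theta = 2 * N * mu to give N = theta / (2 * mu)."""
    return theta / (2 * mu)


if __name__ == "__main__":
    sample = ["AAAA", "AAAT", "AATT"]  # hypothetical aligned sequences
    theta = pairwise_diversity(sample)
    print(theta, population_size_from_theta(theta, mu=1e-4))
```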

90
Q

If the population size in a coalescent model is assumed to be constant, how will different population sizes affect the genealogy of the population over time?

A
91
Q

How will the rate of population growth in a coalescent model affect the genealogy of the population over time?

A
92
Q

How can past population size be estimated based on sequence data and phylogenies?

A
93
Q

How can Coalescent Theory be applied to subdivided populations?

A
94
Q

What is incomplete lineage sorting in coalescent theory?

A

When the species tree and the coalescence (gene) tree are incongruent because coalescences predate speciation events.
More likely when the ancestral effective population size is large, due to the low coalescence rate.

95
Q

How can coalescent theory be used in genome-wide association studies searching for disease-linked genotypes?

A

Allows population structure to be reconstructed, allowing researchers to distinguish between genes which are disease associated, and genes which are simply associated with a population which is disease associated.

E.g. genotype 0 appears incorrectly to be associated with the disease, when in fact it is associated with Population 2, and Population 2 is associated with the disease.

96
Q

How can we detect selection from gene sequences?

A
  1. Look for differences in genetic diversity, tree shape, or mutation frequency among genes or along chromosomes. i.e. suppressed genetic variation around site of selective sweep.
  2. Compare silent and replacement changes within a gene
  • dn/ds methods (typically used for sequences from different populations or species)
  • McDonald-Kreitman test
97
Q

What are the three types of selective sweep?

A

Hard sweep: One beneficial mutation sweeps through population rapidly, so almost all individuals in population carry mutation physically linked with advantageous one.

Multiple mutation soft sweep: Multiple beneficial mutations are present. Individuals generally carry mutations physically linked with whichever beneficial mutation they possess.

Soft sweep: Slow sweep means chromosome carrying advantageous mutation has undergone subsequent mutations meaning population contains multiple distinct sets of linked mutations.

98
Q

What is dN/dS?

A

The ratio of replacement (non-synonymous) fixations per non-synonymous site to silent (synonymous) fixations per synonymous site.

Replacement fixations are assumed to be neutral or advantageous (probability of disadvantageous mutation going to fixation ≈ 0).

Allows us to determine whether observed fixations are the result of positive selection or drift, as positively selected mutations will get fixed much faster than neutral ones.
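
As a rough sketch of the calculation, assuming the substitutions and sites have already been counted: each class is normalised by its own number of sites before taking the ratio. This version is uncorrected (no adjustment for multiple hits), and the counts in the example are hypothetical.

```python
def dn_ds(nonsyn_subs, nonsyn_sites, syn_subs, syn_sites):
    """Uncorrected dN/dS from pre-computed counts.

    dN = replacement substitutions per non-synonymous site
    dS = silent substitutions per synonymous site
    """
    dn = nonsyn_subs / nonsyn_sites
    ds = syn_subs / syn_sites
    if ds == 0:
        raise ValueError("dS is zero, so the ratio is undefined")
    return dn / ds

# Hypothetical gene: 6 replacement changes over 700 non-synonymous sites,
# 12 silent changes over 300 synonymous sites
omega = dn_ds(6, 700, 12, 300)
print(round(omega, 3))
```

A value well below 1 suggests constraint, around 1 suggests neutrality, and above 1 suggests positive selection.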

99
Q

Why is dN/dS usually much less than one when applied to whole genes, but much higher when applied to specific parts or individual codons?

A

Only a few codons in a gene are positively selected, whilst the rest are selectively constrained, so have a dN/dS ≈ 0

100
Q

When are silent mutations not neutral?

A
101
Q

How can the presence of positive selection be detected using a McDonald-Kreitman Test?

A
102
Q

What are two forms of viral genome organization, and how do they evolve in different ways?

A
  1. Single genome- new diversity evolves gradually through mutation and recombination
  2. Segmented genome- new diversity evolves through mutation and reassortment (i.e. when a cell is infected by two viruses, the viruses produced by that cell can contain segments from both.)

104
Q

How does Baltimore Classification divide viruses?

A
105
Q

Why do smaller genomes have higher mutation rates than large ones?

A
  1. Large genomes with high mutation rates would accrue so many mutations they would be non-viable.
  2. Small genomes have less scope for regulation of mutation than large genomes, due to possessing a limited number of genes.
106
Q

What are the two scales at which viral evolution occurs?

A
107
Q

How is viral evolution tracked at the within-host and between-host scales?

A
  1. Within-host- Multiple sequences from one individual, at different times
  2. Between-host- Consensus sequences from multiple individuals at different times, with different individuals sampled at each time point.
108
Q

What are the two next-gen sequencing approaches typically used in viral genomics? What is the key problem with both, and how can this be resolved?

A
  1. Targeted sequencing: target portions of the genome are sequenced and aligned.
  2. Whole-genome sequencing: whole viral genomes are fragmented, sequenced and aligned.

Problem: only small portion of genomes are sequenced, making it difficult to establish linkage between mutations at different ends of the genome.

Solutions: Use limited ‘window’ for phylogenetic analysis, or use population genetics approach (less focus on linkage)

109
Q

What are the three classifications of viral infection, and what are the characteristics of viral evolution associated with each?

A
110
Q

What patterns of mutation are typically observed during chronic viral infections (e.g. HIV)?

A

  1. Selected mutations usually involve evasion from host immunity.
  2. Specific to the individual- mutations which are selected for in some individuals are selected against in others.
111
Q

What is ‘toggling’ in viral genomes?

A

Rapid mutation and subsequent reversion as a result of purifying selection removing mutations once they become non-adaptive.

112
Q

How can ‘toggling’ or ‘adapt and revert’ explain why viral evolution is more rapid at the within-individual scale than at the between-individual scale?

A
  1. Mutations are not adaptive in all individuals, so as viruses infect new hosts, mutations frequently revert to the wild-type through purifying selection.
  2. These reversions slow the overall rate of evolution at a between-individual scale.
113
Q

Acute viral infections can become chronic in immunocompromised individuals. How can this impact viral evolution?

A

Provides viruses which usually have very little opportunity for adaptation with ample time to mutate

This is the leading hypothesis as to the source of the high divergence between the ‘variants of concern’ which emerged during the COVID-19 pandemic.

114
Q

The resolution of the C-value paradox relies on the abundance of non-coding DNA in eukaryotic genomes. What are the possible reasons that non-coding regions are retained?

A
  1. Non-coding DNA performs essential functions, such as the global regulation of gene expression.
  2. Non-coding DNA is useless “junk”, carried passively by the chromosome simply because it is linked to functional genes.
  3. Non-coding DNA has a structural or nucleoskeletal function (i.e. related to cell volume, but not to carrying information).
  4. Non-coding DNA is a functionless “parasite” that is in a selective battle with the host (“selfish DNA”).
115
Q

What potential structural role has been suggested as an adaptive function of non-coding DNA by the “skeletal DNA” hypothesis? What is the problem with this hypothesis for multicellular organisms?

A
  1. More genomic DNA is required to make bigger cells. DNA mass (and its folding pattern) directly determines nuclear volume and there must be a constant ratio of nucleus to cell volume in order to maintain a balance between rates of synthesis of RNA in the nucleus and proteins in the cytoplasm.
  2. Evidence for the theory is that the DNA content in cryptomonad algae is scaled with nuclear volume, but that there is no such relationship in symbiont form of the same species (which wouldn’t need to maintain cell volume in this way)
  3. Issue is with multicellular organisms, in which cell volume can be highly variable between cell types.
117
Q

What explanation did Lynch and Conery propose for the large amounts of non-coding DNA in eukaryotic genomes?

A

  1. Non-coding DNA is non-adaptive
  2. Effective population sizes are too small to allow natural selection to effectively remove non-coding DNA from eukaryotic genomes (i.e. Effective population size (Ne) x selection coefficient (S) < 1 so that genetic drift dominates evolutionary dynamics).
  3. Conversely, the huge effective population sizes in bacteria may be a significant barrier to their evolution of genomic complexity.
  4. It is unclear whether this paper will stand up to future scrutiny. The paper has already been contested.
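
The Ne x S threshold can be illustrated with a short sketch. It applies the card's rule of thumb literally (other treatments use 2NeS or 4NeS), and the population sizes and selection coefficient are hypothetical round numbers chosen only to show the contrast.

```python
def dominant_force(ne: float, s: float) -> str:
    """Apply the rule of thumb: drift dominates when Ne * |s| < 1."""
    return "drift" if ne * abs(s) < 1 else "selection"

# The same tiny cost of carrying extra DNA (hypothetical s = 1e-6)
# in a small eukaryote-like population and a huge bacteria-like population
small_ne_result = dominant_force(1e4, 1e-6)
large_ne_result = dominant_force(1e8, 1e-6)
print(small_ne_result, large_ne_result)
```

Under these assumed values the cost is invisible to selection in the small population but efficiently purged in the large one.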

118
Q

What are the 3 main classes of Tandemly Repeated DNA?

A
  1. Satellite DNA: 2bp-40Kbp long. Mainly located in heterochromatin
  2. Minisatellites: Also known as variable number of tandem repeats (VNTR) loci. Consist of G-rich core of 11-60bp
  3. Microsatellites: Also known as short tandem repeat polymorphisms (STRs). 2-5 bp, with many CA repeats.
119
Q

What are the two types of Repetitive Non-Coding DNA in Eukaryotes?

A
  1. Tandemly repeated DNA
  2. Transposable elements (inc. endogenous retroviruses)
120
Q

Minisatellites and microsatellites have very high mutation rates, and so serve as powerful molecular markers for population genetics and disease studies. How is genetic variation created in these loci?

A
  1. Point mutations
  2. Unequal crossing over and DNA slippage (when DNA strands mispair during replication and recombination so that short stretches of sequence slip against each other creating loops of DNA which can be either lost or gained during DNA repair).
121
Q

How do the ways satellite DNA and transposable elements replicate differ?

A

Satellite DNA is passively replicated due to errors in DNA replication. Transposable elements actively jump around the genome and can replicate themselves whilst doing so.

122
Q

What are the three types of transposable elements?

A

Class I: Retroelements

-Transposes via an RNA intermediate (DNA -> RNA -> DNA) using reverse transcriptase (“retrotransposition”)
-2 forms: LTR and non-LTR

Class II: DNA elements

Class III: Miniature inverted-repeat transposable elements (MITEs)

123
Q

What are the three possible consequences of endogenous retrovirus activity?

A
  1. Disease (e.g. Mouse Mammary Tumor Virus)
  2. Co-option (e.g. placental morphogenesis)
  3. Recombination (e.g. 16% of HERV-K(HML2) family elements may have been involved in large scale human genome reorganization)
124
Q

What is ectopic exchange?

A

When transposable elements lead to chromosomal rearrangement through homologous recombination between distant loci.

125
Q

What is the ectopic exchange model? How can the model be tested, and what are the results of these tests?

A
  1. Ectopic exchange is likely to be highly deleterious
  2. Selection against transposable elements that cause ectopic exchange is the major force limiting TE copy numbers in genomes.
  3. Prediction: TEs likely to be found in regions that undergo lower levels of meiotic recombination, and hence lower levels of ectopic exchange.
  4. Test: Do TE element densities correlate negatively to rates of meiotic recombination within genomes, in different species?
  5. Results: Very variable pattern- inconclusive
126
Q

What are the possible explanations for the variable patterns observed when testing the ectopic exchange model?

A
  1. C. elegans and A. thaliana are self-fertilising; perhaps ectopic exchange is less important in such species.
  2. Humans and flies are outcrossing; but ectopic exchange does not apply to HERVs!
  3. LINEs are more numerous than HERVs; perhaps ectopic exchange more important for larger TE groups as these are more likely to lead to genomic rearrangements.
  4. Conclusion: the persistence of TEs is likely to depend on a complex interplay of factors specific to TE biology and the biology of the host. More genomes & studies needed to understand this.
127
Q

What are the two alternative (but complementary) approaches to analyzing whole genome data?

A
  1. Mapping (reference assembly): Reads are aligned with a reference genome, with variants between the reference genome and the reads being called at each base
  2. Assembly (de novo assembly): genome is reconstructed from raw data.
128
Q

What are the advantages and disadvantages of Mapping (reference assembly) when analyzing genomic data?

A
130
Q

What is sequence coverage in genome sequencing?

A

The number of reads which cover a given nucleotide position, usually quoted as an average across the genome (e.g. 30x).
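
Coverage can be sketched in two ways: the expected average from the standard relation (number of reads x read length / genome size), and the observed depth at each position from read placements. The function names and the toy read positions below are illustrative assumptions.

```python
def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Expected average coverage: total sequenced bases / genome size."""
    return n_reads * read_length / genome_size

def depth_per_position(genome_size, reads):
    """Count how many reads overlap each position.

    `reads` is a list of (start, length) tuples with 0-based starts.
    """
    depth = [0] * genome_size
    for start, length in reads:
        for pos in range(start, min(start + length, genome_size)):
            depth[pos] += 1
    return depth

# Toy example: a 10 bp genome covered by four 4 bp reads
reads = [(0, 4), (2, 4), (3, 4), (6, 4)]
depth = depth_per_position(10, reads)
print(depth)
print(mean_coverage(len(reads), 4, 10), sum(depth) / len(depth))
```

The per-position depths vary even though the average is fixed, which is why low-coverage regions can persist in a well-sequenced genome.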

131
Q

What are the two main types of genome assembly method?

A
132
Q

How are de Bruijn graphs used in modern de novo genome assembly?

A
  1. de Bruijn graphs are made up of nodes and directed edges.
  2. An Eulerian path (or cycle) traverses every edge exactly once; nodes may be visited more than once.
  3. Short reads are divided into even smaller, overlapping k-mers, each with a prefix and suffix of length k-1 (e.g. ATG would have prefix AT and suffix TG)
  4. Each prefix and suffix is a node, meaning that each edge corresponds to a k-mer.
  5. Following the Eulerian path through the graph spells out the sequence
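
The steps above can be sketched as a toy assembler. This is a minimal illustration rather than how production assemblers work: it assumes error-free k-mers, each genomic k-mer supplied once per occurrence, and a linear sequence, so it looks for an Eulerian path (every edge used exactly once) using Hierholzer's algorithm. All function names are my own.

```python
from collections import defaultdict

def build_de_bruijn(kmers):
    """Each k-mer becomes an edge from its (k-1) prefix to its (k-1) suffix."""
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph

def eulerian_path(graph):
    """Hierholzer's algorithm: a walk that uses every edge exactly once."""
    out_deg = {node: len(targets) for node, targets in graph.items()}
    in_deg = defaultdict(int)
    for targets in graph.values():
        for target in targets:
            in_deg[target] += 1
    nodes = set(out_deg) | set(in_deg)
    # Start at the node with one more outgoing than incoming edge, if any
    start = next(
        (n for n in nodes if out_deg.get(n, 0) - in_deg.get(n, 0) == 1),
        next(iter(graph)),
    )
    remaining = {node: list(targets) for node, targets in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())             # dead end: emit the node
    return path[::-1]

def assemble(kmers):
    """Spell the sequence: first node, then the last base of each later node."""
    path = eulerian_path(build_de_bruijn(kmers))
    return path[0] + "".join(node[-1] for node in path[1:])

kmers = ["ATG", "TGG", "GGC", "GCG", "CGT", "GTG", "TGC", "GCA"]
print(assemble(kmers))
```

Because the prefix TG and suffix GC each occur twice, more than one Eulerian path exists for this input; every such path reproduces the same k-mers, but the assembler cannot tell which ordering is the true genome. That ambiguity is the repeat problem that longer reads help to resolve.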
133
Q

What are the limitations of short read sequencing?

A
133
Q

How are long reads typically used in genome sequencing?

A
  1. Long reads such as those produced by Oxford Nanopore are more error prone than short reads.
  2. This makes them most useful as a scaffold onto which more accurate short reads can be mapped.
  3. This allows ambiguities, such as those created by large low complexity regions, to be resolved.
134
Q

What three pieces of information about a gene are incorporated into genome annotation?

A
135
Q

What is a pangenome?

A

In bacteria, a pangenome is the core genome and all possible accessory genes.

Core genome: vital genes shared by all strains of a species. In E. coli this is just 2000 genes or ~50%

Accessory genome: Genes found in just some strains of a species, encompassing alternative metabolic pathways, antibiotic resistance etc.
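
These definitions map directly onto set operations. The sketch below uses three hypothetical strains with a handful of gene names chosen purely for illustration: the pangenome is the union of all gene sets, the core genome is their intersection, and the accessory genome is the difference.

```python
def pangenome_summary(strains):
    """Return (pangenome, core genome, accessory genome) as sets of gene names."""
    gene_sets = list(strains.values())
    pan = set().union(*gene_sets)           # every gene seen in any strain
    core = set.intersection(*gene_sets)     # genes present in all strains
    accessory = pan - core                  # genes present in only some strains
    return pan, core, accessory

# Hypothetical strains and gene content
strains = {
    "strain_A": {"dnaA", "gyrB", "rpoB", "blaTEM"},
    "strain_B": {"dnaA", "gyrB", "rpoB", "lacZ"},
    "strain_C": {"dnaA", "gyrB", "rpoB", "blaTEM", "stx1"},
}

pan, core, accessory = pangenome_summary(strains)
print(sorted(core))
print(sorted(accessory))
```

Adding more strains can only shrink the core and grow the pangenome, which is why the core fraction falls as sampling increases.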

136
Q

What is a potential explanation for the higher GC content in the genomes of free-living bacteria?

A

May be because G-C base pairs are stronger (3 hydrogen bonds, versus 2 in A-T), so free-living species possess more G-C as this makes the genome more resilient

137
Q

What are the two main theoretical approaches for explaining patterns of molecular evolution in bacteria?

A
  1. Neutral diversification: model emphasizes that most of the genetic variation can be explained by genetic drift (neutral diversification).
  2. Ecotypes: highlights selection for adapted lineages in a given environment (ecotypes).
138
Q

Which two processes dominate bacterial evolution?

A
  1. DNA replication errors (or DNA damage), which generate point mutations, rearrangements or deletions of various sizes
  2. horizontal gene transfer (HGT), through which genetic material is acquired from an external source (that is, a distinct bacterial strain) and incorporated into the chromosome by recombination.
139
Q

When can hyper-mutation become beneficial in bacterial populations?

A

Under strong selective pressures, such as antibiotics or a change in host niche.

140
Q

Bacterial populations exist on a spectrum between which two extremes of population structure?

A
  1. Strictly clonal: (Almost) no recombination e.g. M. tuberculosis
  2. Fully non-clonal/panmictic: constant recombination at almost all loci. e.g. Helicobacter pylori
141
Q

How do homologous and non-homologous recombination differ in the context of bacterial genetics?

A

Homologous recombination: replacement of alleles

Non-homologous recombination: addition or loss of a gene via recombination

142
Q

When examining bacterial populations, neutrality is the most parsimonious explanation; as such, adaptation must be evidenced when we invoke it. How can adaptation be identified?

A
  1. dN/dS
143
Q

What are the limitations of dN/dS for detecting positive selection in bacterial populations?

A
  1. Selection operates not only to maintain protein-coding sequence but also on features such as gene order, distribution of coding sequences on leading and lagging strands, GC skew and codon usage, none of which would necessarily affect dN/dS.
  2. Complex traits such as host adaptation will involve multiple genes and, most likely, interactions at multiple levels of genome arrangement, which are not likely detectable by analysing dN/dS ratios. E.g. polymorphisms can be in strong LD with other loci owing to common ancestry; all variants linked within a haploblock would be associated with host adaptation and may have similar dN/dS whether they conferred a direct functional advantage or not.
  3. Frameshifts and incorrect interpretation of start codons can lead to non-synonymous single nucleotide polymorphisms (SNPs) being interpreted as synonymous, leading to inaccurate dN/dS estimates and non-detection of positive selection.
  4. dN/dS estimates are not accurate if polymorphisms are not fixed between independent lineages, and segregating variation in the population is likely weakly deleterious and destined to be purged in the future. For example, isolates from different physically separated populations may have segregating polymorphisms that were inherited from the respective founding strains, and the effect of selection on dN/dS will not necessarily be identical in each, making population-wide estimates error prone.
144
Q

What is the quantitative model of variation (used to explain phenotypes controlled multigenically and subject to additive variation)?

A
145
Q

What is heritability, and what is the simplest way to express it mathematically?

A

Heritability is the proportion of the phenotypic variation that is due to genetic causes. In its simplest expression Heritability = VG / VP

146
Q

What are the three general approaches to estimating heritability?

A
  1. Parent offspring correlations: Use correlation between average parent (mid-parent) phenotypic values and offspring phenotypic values. (Difficult to use because requires comparable data, so same conditions, age etc.)
  2. Sib analysis: Uses relationship among individuals as the basis to calculate genetic variances. The genotypic variation is modeled based on variance among these groups.
  3. Twin pairs: Uses phenotypic variance among pairs of twins. (Makes assumptions which have frequently been challenged.)
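
The parent-offspring approach can be sketched as a regression: the slope of offspring values on mid-parent values is a standard estimate of narrow-sense heritability. The trait values below are hypothetical and the function name is my own; the sketch also assumes parents and offspring were measured under comparable conditions.

```python
def midparent_regression_slope(midparent, offspring):
    """Least-squares slope of offspring phenotype on mid-parent phenotype."""
    n = len(midparent)
    mean_x = sum(midparent) / n
    mean_y = sum(offspring) / n
    covariance = sum((x - mean_x) * (y - mean_y)
                     for x, y in zip(midparent, offspring))
    variance = sum((x - mean_x) ** 2 for x in midparent)
    return covariance / variance

# Hypothetical trait values for five families
midparent = [10.0, 12.0, 14.0, 16.0, 18.0]
offspring = [11.0, 12.5, 13.5, 15.5, 16.0]

h2_estimate = midparent_regression_slope(midparent, offspring)
print(round(h2_estimate, 2))
```

A slope near 1 would mean offspring closely track their parents; a slope near 0 would mean parental values predict little about offspring.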
147
Q

Describe a progeny test using sib-analysis to measure trait variation amongst saplings from several full-sib families.

A
148
Q

How does trait variation in the broad sense differ from trait variation in the narrow sense?

A
149
Q

What are some of the caveats and limitations associated with heritability estimates?

A
150
Q

What is Genetic Gain in a selectively bred population? How is it calculated?

A

The population’s response to selection.
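
The usual way to calculate this response is the breeder's equation, R = h² x S, where h² is narrow-sense heritability and S is the selection differential (mean of the selected parents minus the population mean). A minimal sketch with hypothetical numbers:

```python
def selection_differential(population_mean, selected_parents_mean):
    """S: how far the selected parents sit from the population mean."""
    return selected_parents_mean - population_mean

def response_to_selection(h2, s):
    """Breeder's equation: R = h^2 * S."""
    return h2 * s

# Hypothetical breeding programme
population_mean = 100.0
selected_parents_mean = 120.0
h2 = 0.4

s = selection_differential(population_mean, selected_parents_mean)
gain = response_to_selection(h2, s)
print(s, gain, population_mean + gain)
```

Only the heritable fraction of the parents' advantage is passed on, so the expected offspring mean moves part of the way towards the selected parents.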

151
Q

What are some of the challenges and potential solutions associated

A
152
Q

What is the selection differential?

A
  1. The selection differential (S) measures total selection acting directly and indirectly on a trait including via selection on genetically correlated characters
  2. In the figure, S is the difference between the population mean (the thin continuous line) and the distribution weighted with the differential fitness of phenotypes (the thin dashed line), resulting in the product shown by the thick line.
153
Q

What are the two stages in a whole genome duplication?

A
  1. Polyploidization: multiple chromosome copies are received from one or both parents.
  2. Chromosome reshaping: Similar chromosomes undergo recombination. Reshuffling reduces chromosome number in order to produce a new stable genome. Without this process polyploids become non-viable and go extinct.
154
Q

Is whole genome duplication sufficient to explain plant genome sizes?

A

No

155
Q

What role do long terminal repeat retrotransposons play in plant genome size?

A

Long terminal repeat retrotransposons are the most abundant transposable element in nearly all plants and can significantly increase genome sizes:

Conifer genomes are 60-85% LTR retrotransposons.
Monocot genomes are 30-70% LTR retrotransposons.

156
Q

How can the age of a long terminal repeat retrotransposon be determined?

A

Measure the divergence between the long terminal repeats- these are necessarily identical when the transposon is copied and inserted, but diverge over time
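
A common way to turn this divergence into an age is T = K / (2r), where K is the divergence between the two LTRs and r is the substitution rate per site per year; the factor of 2 reflects both LTRs accumulating changes independently. The sketch below uses uncorrected divergence, toy sequences, and a placeholder rate, all of which are assumptions for illustration.

```python
def ltr_insertion_age(ltr_5prime, ltr_3prime, subs_per_site_per_year):
    """Estimate insertion age in years from two aligned LTR sequences.

    Gap columns ('-') are skipped. Divergence is the uncorrected
    proportion of mismatched sites.
    """
    if len(ltr_5prime) != len(ltr_3prime):
        raise ValueError("LTR sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(ltr_5prime, ltr_3prime)
             if a != "-" and b != "-"]
    mismatches = sum(a != b for a, b in pairs)
    divergence = mismatches / len(pairs)
    return divergence / (2 * subs_per_site_per_year)

# Toy 20 bp LTRs differing at one site; the rate is a hypothetical placeholder
ltr_a = "ACGTACGTACGTACGTACGT"
ltr_b = "ACGTACGTACTTACGTACGT"
age_years = ltr_insertion_age(ltr_a, ltr_b, 1e-8)
print(f"{age_years:.2e}")
```

Identical LTRs give an age of zero, consistent with a very recent insertion.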

157
Q

Why are conifers vulnerable to colonization by gypsy and copia LTR retrotransposons?

A

They are deficient in key repair mechanisms to remove LTR-RT copies.

158
Q

What are the two main types of polyploidization?

A
159
Q

How do the timescales of polyploidization and whole genome duplication (chromosome reshaping) differ?

A

Polyploidization is much shorter term (0.1-10 MYA) than WGD (5-200 MYA)

160
Q

What role has polyploidization played in the evolution of bread wheat?

A
  1. Allohexaploid from ancestral Triticum and Aegilops species.
  2. Three “homeologous” genomes are maintained (3 diploid genomes descended from the same ancestor)
  3. Minimal chromosome reshaping has occurred
161
Q

What are the factors which influence plant genome sizes?

A
162
Q

What are the potential fates of duplicated genes after whole genome duplication?

A
  1. Most often, one copy is lost due to redundancy.
  2. Sub and neo-functionalisation- Changes in gene function may occur in one or both copies.
163
Q

How do the homeologous gene triads in bread wheat affect gene expression?

A

Different homeologous variants can be transcribed at different levels in different tissues, underpinning the hexaploid’s phenotype.

164
Q

What form of non-mendelian inheritance is characteristic of mitochondrial and chloroplast genetics?

A

Uniparental/cytoplasmic inheritance: character / gene is inherited from one parent only. This is because the egg contributes the bulk of cytoplasm to the zygote, hence the term “cytoplasmic inheritance”

165
Q

Why are mitochondria and chloroplasts considered to be genetically semi-autonomous?

A
  1. Endosymbiotic organelles contain double-stranded DNA molecules, called:
    - mtDNA
    - cpDNA (or ptDNA)
  2. These DNAs do encode proteins (they are functional genomes)
  3. However, most (>90%) of the 1000s of proteins present in modern organelles are encoded by nuclear genes, hence, semi-autonomous.
166
Q

How can mtDNA and cpDNA be visualised? Why can this be misleading?

A

mtDNA and cpDNA can be represented by circular DNA maps. Although presented like this, they do not necessarily exist in this state in vivo.

167
Q

Why are mitochondrial and chloroplast genomes much smaller than in their ancestral free-living form?

A

Genes required for free-living were lost, and many others were transferred to the nuclear genome

168
Q

What are some of the features of organelle genomes which distinguish them from nuclear genomes?

A
  1. They are small, gene-dense, and typically represented by circular DNA maps
  2. They lack nuclear chromosome features (centromere, telomeres, histones, etc.), and instead exist as nucleoids
  3. There are multiple DNA copies per organelle, and (often) multiple organelles per cell; DNA replication is not tightly coupled to cell division
  4. The transcription and translation machineries are prokaryotic in character
  5. Some genes are transcribed together to form polycistronic RNAs
  6. Introns may exist, but they are of a different type (group I or II, instead of spliceosomal)
  7. The genetic code (codon usage) may deviate from the standard code
  8. Organelle transcripts can be subject to RNA editing
169
Q

How is mtDNA transcription initiated in mammals?

A
  1. TFAM (mitochondrial TF A) interacts with a high-affinity binding site just upstream of the transcription start site and introduces a 180° bend in the DNA
  2. POLRMT (mtDNA-directed RNA polymerase) is recruited by both TFAM and sequence-specific interactions with DNA
  3. POLRMT undergoes a conformational change, enabling binding of TFB2M (mitochondrial TF B2) and formation of the initiation complex
  4. Elongation requires mitochondrial transcription elongation factor, TEFM
170
Q

Describe the transcription of mammalian mtDNA

A

  1. Transcription is initiated in the non-coding control region, NCR. Transcription proceeds in both directions, from two promoters:
    - light-strand promoter, LSP
    - heavy-strand promoter, HSP
  2. Two transcripts spanning almost the entire genome are formed
  3. These polycistronic primary transcripts are processed to yield mRNAs, tRNAs and rRNAs- “tRNA punctuation model”
  4. All mammalian mtDNA genes lack introns (plant organelle DNAs do have introns)

171
Q

How is mammalian mtDNA packaged into nucleoids?

A
  1. TFAM molecules (green) bind to mtDNA in short patches
  2. TFAM bends the mtDNA, and bridges neighbouring mtDNA stretches (arrows) by cross-strand binding
  3. In combination, mtDNA bending and cross-strand binding compact the mtDNA to form the nucleoid
  4. The final, tightly packaged mtDNA in the nucleoid (f) is inaccessible to the transcription and replication machineries.
172
Q

How can TFAM exert nuanced control over mammalian mtDNA expression

A

TFAM is necessary for transcription initiation, but at high concentrations it also compacts mtDNA and makes it inaccessible.

173
Q

Why do organellar DNA encoded RNAs often require C-to-U editing after transcription?

A

To produce translatable mRNA (e.g., by creating an AUG start codon, or eliminating a premature stop codon)

174
Q

What are some of the features of plant mtDNA which make it distinct from the mtDNA of animals?

A
  1. Plant mtDNA is much larger than that of animals
  2. The number of mtDNA genes varies little between species.
  3. Plant mtDNAs do have some extra genes, and several genes have introns, but most of the genome in large mtDNAs is non-coding DNA that is not conserved across species.
  4. Some of the DNA in plant mtDNAs can be recognized as being derived from chloroplastic, nuclear or viral DNA
  5. Some of the DNA seems to have been acquired by horizontal gene transfer from other plants
  6. Most non-coding DNA in plant mtDNA is of unknown origin
175
Q

What is the role of the many repeated sequences in plant mtDNAs?

A

These enable homologous recombination, leading to highly variable structural organization:

176
Q

Describe the structure of plant mtDNAs, why do they rarely occur as circles (as presented in textbooks) in vivo?

A
  1. Subgenomic circles derived from the master circle may form (a)
  2. More recently, deep sequencing analysis suggests an even more complex organization of plant mtDNAs:
  • Overlapping linear sub-genomic fragments, and linear head-to-tail concatemers (b)
177
Q

Describe the structure of cpDNAs

A
  1. cpDNAs typically encode ~100 proteins, and the identity and sequence of these are highly conserved between species
  2. Most cpDNAs have the following organization:
    - A long single-copy region, LSC
    - A short single-copy region, SSC
    - Two inverted repeats (IRs) of ~20-25 kb
  3. In some species, one or both of the IRs may be lost
  4. Other species have radically different cpDNA arrangements
    (e.g., multiple single-gene minicircles in dinoflagellates)
178
Q

How does maternal spindle transfer differ from pronuclear transfer (both are forms of mitochondrial replacement used to treat inherited mitochondrial diseases)?

A

Key distinction is whether the treatment takes place before or after fertilisation.

179
Q

Describe Sanger sequencing

A
  1. Input DNA is fragmented and cloned into bacterial vectors for in vivo amplification.
  2. Reverse strand synthesis is performed on the obtained copies starting from a known priming sequence and using a mixture of deoxy‐nucleotides (dNTPs) and dideoxy‐nucleotides (ddNTPs). The dNTP/ddNTP mixture randomly causes the extension to be non‐reversibly terminated, creating differently extended molecules.
  3. Subsequently, after denaturation, clean up of free nucleotides, primers, and the enzyme, the resulting molecules are sorted using capillary electrophoresis by their molecular weight (corresponding to the point of termination) and the fluorescent label attached to the terminating ddNTPs is read out sequentially.
180
Q

Describe Illumina Sequencing

A
  1. DNA binds to complementary primer on slide. Other end also binds, creating a bridge with a fixed physical location. PCR is conducted to create local amplification of sequences.
  2. Sequencing reaction is run across the slide, using fluorescently labelled dNTPs which can only extend the sequence by one base, due to the presence of a terminating group. Each site of local amplification appears as a dot, with a colour corresponding to the last dNTP which bound.
  3. Dye and terminating group are cleaved and the slide is washed.
  4. Cycle is repeated with a new set of dNTPs
181
Q

When Illumina sequencing was first developed, the four bases were denoted by 4 separate dyes, how are the four bases denoted now, using just 2?

A

Bases are denoted by the following combinations of dyes:

  1. Absence of dye
  2. Presence of dye 1
  3. Presence of dye 2
  4. Presence of both dyes
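
The four combinations amount to a two-bit code, which can be sketched as a lookup table. The particular base assigned to each combination below (no dye = G, dye 1 only = C, dye 2 only = T, both = A) follows the commonly described two-channel scheme, but treat that assignment, and all names in the code, as assumptions rather than instrument documentation.

```python
# (dye 1 detected, dye 2 detected) -> base call; assignment is an assumption
TWO_DYE_CODE = {
    (False, False): "G",  # no signal in either channel
    (True, False): "C",   # dye 1 only
    (False, True): "T",   # dye 2 only
    (True, True): "A",    # both dyes
}

def call_bases(signals):
    """Convert per-cycle (dye1, dye2) detections into a base string."""
    return "".join(TWO_DYE_CODE[(bool(d1), bool(d2))] for d1, d2 in signals)

# One cluster imaged over four cycles
cycles = [(1, 1), (1, 0), (0, 0), (0, 1)]
read = call_bases(cycles)
print(read)
```

Two images per cycle instead of four is where the speed gain comes from, at the cost of one base being inferred from the absence of signal.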
182
Q

What are the benefits of Illumina sequencing over Sanger sequencing?

A
  1. In vitro library preparation and clonal amplification (No in vivo steps).
  2. Highly parallel as limited only by size of sequencing features and imaging limitations.
  3. Low reagent volume ratios per sequencing feature.

183
Q

What is the highest throughput next generation sequencer currently produced by Illumina?

A

The NovaSeq 6000 is currently the highest throughput sequencer: Can sequence up to 6 Tbp (40 human genomes) in six days, equating to many billions of reads.

184
Q

What are the three most common applications of next generation sequencing data?

A
  1. Genome resequencing: Purpose is to detect variation and inform on mechanism underpinning phenotype (disease/effects of selection etc)
  2. De novo genome assembly: Assemble entirely new genome using sequence data (NGS produces short reads, long read technologies are needed to resolve ambiguities.)
  3. Targeted genome sequencing: the goal is to re-sequence just a region of the genome of interest (at great depth)
  4. Querying different levels of Genome regulation
185
Q

What is array/solution-based enrichment?

A

Oligos are attached to array or in solution- DNA fragments of interest bind, then fragments are eluted and sequenced. Used in Targeted genome sequencing to produce very high sequencing depth.

186
Q

How can next generation short read sequencing be used to detect changes in genome structure?

A
  1. Chromatin is fragmented using primary restriction enzymes.
  2. Cross linked regions of the genome are ligated together producing mate pairs.
  3. Mate pairs: circularized fragments of >1 kb pieces join distant parts of the genome together, with the ends appearing in the same sequenced fragment
  4. When the regions joined in mate pairs are not those expected from the reference genome, this indicates a structural change
187
Q

How can short read sequences from S-phase cells be used to locate their replication origins?
What was discovered in the extremophile Haloferax volcanii using this technique?

A

  1. Add Bromodeoxyuridine (BrdU) DNA tag to cell, this is incorporated into DNA produced during replication
  2. Rescue of BrdU labelled DNA during DNA replication (S-phase), followed by sequencing.
  3. Map to genome after high coverage sequencing, see peaks of reads at replication origins.

Paper using this technique on Haloferax volcanii, demonstrated that origins are not essential for replication, without them, replication starts roughly evenly across genome. Could origins be selfish elements, rather than essential components of the genome?

188
Q

What is ATAC-seq and how is it used to assess the structure of chromatin?

A
  1. Assay for Transposase-Accessible Chromatin using sequencing
  2. The Tn5 transposase is used to insert sequencing-compatible adapter sequences into the genome
  3. “Transposome” dissociates, leaving the inserted sequences
  4. Use PCR to amplify between inserted sequences to generate fragments for sequencing
  5. Tn5 inserts more frequently into “open chromatin”
189
Q

How can the degree of methylation of a region of DNA be assessed using sequencing of bisulfite-treated DNA? (this method has been supplanted by less damaging techniques)

A
  1. DNA is treated with bisulfite, which converts unmethylated cytosine to uracil.
  2. The only remaining cytosines will be methyl-cytosines, which are protected from deamination.
  3. Sequencing DNA will reveal which cytosines are methylated.
  4. NGS technology allows this to be done at a high throughput.
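
The read-out logic can be sketched by comparing a reference sequence with a bisulfite-treated read. This simplified version assumes the read comes from the same strand as the reference, is aligned without gaps, and that converted uracils appear as T after amplification; the sequences and function name are illustrative.

```python
def call_methylation(reference, bisulfite_read):
    """Classify each reference cytosine by what the bisulfite read shows."""
    calls = {}
    for pos, (ref_base, read_base) in enumerate(zip(reference, bisulfite_read)):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[pos] = "methylated"      # protected from conversion
        elif read_base == "T":
            calls[pos] = "unmethylated"    # converted C -> U, read as T
        else:
            calls[pos] = "ambiguous"       # mismatch not explained by conversion
    return calls

reference = "ACGTCCGA"
read = "ACGTTTGA"
calls = call_methylation(reference, read)
print(calls)
```

With many reads per site, the fraction of reads showing C at a position gives the methylation level there rather than a yes/no call.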