Genomics and Evolution Flashcards
What are the main 2 types of genome?
The nuclear genome and the mitochondrial genome.
What is high level and low level in genome organisation?
High level is the chromosomes and low level is all the junk DNA.
What is chromosome fusion?
It is the fusing of two chromosomes into one.
What are segmental duplications?
They occur when a section of a chromosome is duplicated.
What are inversions?
They are where a segment of a chromosome is flipped, reversing the order of the genes in it.
What are translocations?
They are where there is movement of genes across chromosomes.
What do pseudoautosomal regions do?
They make sure the sex chromosomes pair and are separated correctly.
Sex chromosomes are ____ and what happens after they pair?
Sex chromosomes are homologous and pair, then recombine in male meiosis.
What species has an interesting case of chromosomal fusion?
Muntjac deer
What happens in male meiosis?
Sex chromosomes form chromosomal chains and then split after meiosis. This chain is formed through translocation of autosomal regions leading to pairing.
What did sex chromosomes originate as?
They originated as a pair of autosomes.
How did sex chromosomes arise?
Recombination stopped in a pair of autosomes around where the sex-determining gene arose. From there, the non-recombining regions expanded, creating evolutionary strata on the sex chromosomes.
What happens once a region becomes non-recombining?
There is an accumulation of deleterious mutations which leads to it becoming degenerate.
Sex chromosomes evolved _________?
They evolved multiple times independently in different groups.
The process of degradation isn’t what, and what happens after degradation?
The process isn’t linear and after degradation, it stays at the “base level”.
There is a general tendency to lose what genes?
Genes that become unnecessary.
What are 3 examples of gene loss?
The gene loss in the Y chromosome.
The loss of the Vitamin C producing gene multiple times over multiple lineages.
The loss of teeth in birds and turtles.
How often are genes gained and lost?
All of the time.
There is a what between genes being lost and gained?
There is a dynamic equilibrium between genes being lost and gained.
What are some mechanisms by which new genes arise?
Exon shuffling, gene duplication, retroposition, gene fusion, gene fission, and de novo origination.
Many proteins contain what, and how are new proteins with new functions made?
Many proteins contain “borrowed” domains, and old and new domains combine to create new proteins with new functions.
What is an example of where exon shuffling was used?
It was used in the origin of the jingwei gene in Drosophila.
What happens if a minor splice form does something useful?
It will be selected to increase its abundance in the cell, resulting in the evolution of major alternative gene isoforms.
What is the introns early theory?
The theory that introns are ancient and are gradually lost.
What is the introns late theory?
The theory that introns evolved in early eukaryotes and keep spreading.
What is a common process behind the evolution of new genes?
Evolution by duplication, ranging from single genes to the whole genome.
What is the 2R hypothesis?
It’s that there were two rounds of genome duplication in vertebrates.
What are whole genome duplications more common in?
Plants.
What happens when a gene is duplicated?
Some functional redundancy is created, which reduces purifying selection and allows the copies to accumulate mutations and diverge in function.
What is an example of gene duplication and sub-functionalisation?
The evolution of colour vision in primates, where the ancestral state is dichromatic, the S-gene and L-gene, and the L-gene duplicated and diverged. There was sub-functionalisation where the copies diverged to have different light sensitivities.
The size of the genome has little to do with what?
The size of a genome has little to do with the organism’s complexity.
What is the C-value paradox?
The idea that larger genomes don’t lead to higher complexity in eukaryotes.
In what does the number of genes and genome size show a pretty good correlation?
Viruses and prokaryotes.
How can genome sizes for extinct animals be measured?
The genome sizes are measured from the size of the cells inside the bones. It is well known that a larger genome correlates with a bigger nucleus, and so with a bigger cell. To measure cell size, the size of the pores in the bones is measured.
What do transposable elements play a major role in?
Increasing genome size.
What is the only way to downsize a genome and what determines its efficiency?
Deletions are the only way to decrease the genome size, and the efficiency of downsizing depends on the frequency and size of the deletions.
What is an example of extreme genome reduction?
Buchnera is a mutualistic intracellular symbiont of aphids. Since the evolutionary process began, there has been a massive reduction in genome size, so that only essential genes remain. The genome is now essentially in stasis.
What was major in answering many questions regarding human evolution and why was it used?
mtDNA was used as it mutates more frequently than nuclear DNA and has no recombination.
Where is human genetic diversity highest?
In Africa.
Why might looking at a single locus to explore the migration of humans be misleading?
It may tell a story of spread of an advantageous mutation and not the story of the migrations of humans, meaning it’s important to look at other parts of the genome.
Why is the human Y chromosomal tree used for phylogenetic reconstructions?
Because it is paternally inherited and the phylogeny is aligned with mtDNA.
Why is using gene trees problematic for autosomal markers?
It is problematic due to the recombination.
On what basis are principal component analysis plots created?
They are created on the basis of individual genotypes for autosomal markers.
Why could human DNA show a lower global mobility in men?
This could reflect the fact that males inherit the land from their father and stay whereas women are married off to other families.
What genes are good for going really far back in human history?
Nuclear genes.
What is one way to learn about the ancestors of humans?
To use remnants of DNA in ancient skeletal tissues.
Where was there hybridisation between Neanderthals and humans?
Only in Europe.
What has ancient DNA been used to study?
Ancient DNA has been used to study whether the ancestors of modern humans interbred with Neanderthals and other archaic hominids.
How much of our genome is believed to be Neanderthal?
4%.
What is the oldest genome that’s been sequenced?
A Denisovan genome, sequenced from a finger bone and a tooth, which showed that Denisovans are separate from Neanderthals and humans.
Closely related species were doing what when meeting?
Hybridising.
Different species on different islands in early Asia have suggested what?
That Homo erectus had the ability and skill to travel across open ocean.
What does the adaptation of human skin pigmentation do?
It has a strong positive correlation with UV intensity, so dark skin has more protection against UV, which reduces the photolysis of folic acid. Light skin leads to more production of Vitamin D.
Selection leaves what in DNA polymorphisms?
Distinct footprints.
What is a result of the spread and fixation of an adaptive allele?
The loss of genetic variation around the target of selection.
What is an example of a footprint of recent adaptive evolution?
The reduced genetic diversity around the gene involved in adaptation to milk digestion in adult humans (lactase persistence). The distribution of lactase persistence correlates with historical centres of dairy farming.
What happens when there is adaptation to contrasting conditions?
There is spread and fixation of different locally adaptive alleles in the population, which creates the signal of population differentiation at the genes under selection.
What is a good example of adaptation to local conditions?
An example is adaptation to life at high altitudes, where interspecies hybridisation was advantageous: Tibetans can breathe more easily at high altitudes due to introgression of Denisovan-like DNA.
What is population genetics?
The study of genetic diversity in biological populations and of the processes that cause genetic diversity to change.
Genetic diversity is synonymous with what?
Intra-specific diversity.
What is the major process that differentiates intra- and inter-specific diversity?
Gene flow.
When did population genetics arise and from where?
Population genetics arose in the 1930s/1940s from the Modern synthesis of Mendelian Genetics and Darwinian Natural Selection.
Population genetics ultimately underpins what?
It underpins all phenomena in evolutionary biology.
What is a phenotype?
It is any observable or quantifiable characteristic of organisms that varies within or among populations.
What are genetic markers?
Genome regions that are useful for measuring and investigating genetic variation in populations.
What makes a population polymorphic at a specific genetic locus?
If more than one allele is commonly found.
The quality and resolution of genetic markers has improved with what?
The development of genetics, from proteins to DNA.
What is a genotype?
The allelic make-up of an individual.
What is the most common type of genetic marker used today?
DNA sequence variation.
What do we look at when looking at DNA sequence variation?
You can count the number of distinct sequences and the proportion of variable sites. You can also measure the average pairwise difference.
What are pairwise differences?
The number of differences between each pair of sequences.
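A minimal sketch of how pairwise differences and their average could be counted, assuming the sequences are already aligned and of equal length. The function names and the toy sequences are illustrative assumptions, not real data.

```python
from itertools import combinations


def pairwise_differences(seq_a, seq_b):
    """Count the sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def average_pairwise_difference(sequences):
    """Mean number of differences over every pair of sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_differences(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignment (made up for illustration).
toy_alignment = ["ACGT", "ACGA", "TCGA"]
print(average_pairwise_difference(toy_alignment))
```

Dividing the result by the sequence length would give the per-site version of the same measure.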
What is heterozygosity?
The fraction of individuals in a population that are expected to be heterozygous.
What is heterozygosity equivalent to?
It is equivalent to the probability that any two alleles randomly sampled from the population are different.
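As a rough illustration of that probability: if the allele frequencies at a locus are known, the chance that two randomly sampled alleles differ is one minus the chance that they are the same. The sketch below assumes sampling with replacement from a large population; the function name is an assumption for illustration.

```python
def expected_heterozygosity(allele_freqs):
    """Probability that two randomly sampled alleles are different.

    Assumes sampling with replacement, so the chance of drawing the
    same allele twice is the sum of the squared allele frequencies.
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


# Two equally common alleles give the two-allele maximum of 0.5.
print(expected_heterozygosity([0.5, 0.5]))
```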
What is average heterozygosity?
The proportion of loci observed to be heterozygous in an average individual, and it is obtained by averaging h across many loci.
What is the Hardy-Weinberg equilibrium?
It is the state in which genotype frequencies can be predicted from allele frequencies, and both remain stable across generations in a stable population.
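A minimal sketch of the prediction for one locus with two alleles, A at frequency p and a at frequency q = 1 - p. The function name and the dictionary keys are illustrative assumptions.

```python
def hardy_weinberg(p):
    """Expected genotype frequencies for a locus with two alleles.

    p is the frequency of allele A; allele a has frequency q = 1 - p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


# The three expected genotype frequencies always sum to 1.
print(hardy_weinberg(0.5))
```

Comparing these expected frequencies with observed genotype counts is one way to check whether a population departs from the null model.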
What assumptions are made regarding the Hardy-Weinberg principle?
- it’s a diploid organism with sexual reproduction
- there are non-overlapping generations
- there’s an infinite population size
- there’s random mating
- males and females have equal allele frequencies
- it’s a closed population
- there’s no mutation
- there’s no selection
The Hardy-Weinberg principle is an example of what model and why?
It is an example of a null model, as it describes the state of a population when nothing interesting is happening.
The Hardy-Weinberg theorem extends to what?
It extends to more than 2 alleles and to multiple loci that segregate independently.
What is the ultimate driving force of diversity and natural selection?
Mutations.
What is linkage disequilibrium?
The non-random association of alleles at different loci, which arises because the inheritance of genes on the same chromosome is not independent.
How can linkage disequilibrium be decreased?
It is decreased by recombination and random assortment.
What does random mating mean?
It means that individuals mate at random with respect to a particular genotype; it doesn’t mean that there’s absolutely no choice.
What are exceptions to random mating?
Exceptions are inbreeding (mating with relatives more often than expected by chance) and positive assortative mating (mating with individuals that have similar phenotypes).
What is identity by descent?
It is where two alleles are copies of the same ancestral allele; with inbreeding, offspring are more likely to inherit the same allele from both parents.
What is the inbreeding coefficient used for?
To measure the level of recent inbreeding.
What does inbreeding depression result in?
It results in reduced fitness, and it often arises from homozygosity in recessive deleterious alleles.
What is genetic drift?
The idea that chance alone can result in changes in genetic variation over time.
What is fixation?
When an allele’s frequency reaches 100% in the population.
When does genetic drift typically occur and what does it cause?
It usually occurs when populations go through bottlenecks and it causes substantive changes in allele frequencies.
Why is a founder effect observed?
It is observed due to genetic drift and inbreeding in a subpopulation.
What are examples of the founder effect?
Human diseases, some wild felid populations, and in captive breeding.
What can migration result in?
A reduction in overall heterozygosity.
What is the Fixation Index?
The fraction of total genetic diversity that is due to differences among populations.
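One common way to put a number on this fraction is F_ST = (H_T - H_S) / H_T, where H_S is the average expected heterozygosity within subpopulations and H_T is the expected heterozygosity of the pooled population. The sketch below assumes a single locus with two alleles and equally sized subpopulations; the function name is an illustrative assumption.

```python
def fixation_index(subpop_freqs):
    """F_ST for one two-allele locus.

    subpop_freqs holds the frequency of one allele in each
    subpopulation; subpopulations are assumed to be equally sized.
    """
    if not subpop_freqs:
        raise ValueError("need at least one subpopulation")
    n = len(subpop_freqs)
    # H_S: mean expected heterozygosity within subpopulations.
    h_s = sum(2 * p * (1 - p) for p in subpop_freqs) / n
    # H_T: expected heterozygosity of the pooled population.
    p_bar = sum(subpop_freqs) / n
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # no diversity at all, so nothing to partition
    return (h_t - h_s) / h_t


print(fixation_index([0.8, 0.2]))
```

Identical subpopulations give 0, and subpopulations fixed for different alleles give 1.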
What is normally used instead of census population size in population genetics and why?
Effective population size is used instead of census population size as it takes into account that not all individuals in all generations have an equal propensity to reproduce.
What is effective population size?
The size of an idealised population that would experience the same rate of genetic drift as the real population, due partly to the limited proportion of breeding individuals.
What can cluster algorithms identify, and using what?
They can identify subgroups within a species using genetic marker data from multiple loci.
What does repetition of natural selection lead to?
It leads to the positive selection of beneficial alleles, which eventually results in their fixation.
Selection acts on what?
The whole organism.
What are examples of selection acting at a single locus?
Positive, negative, and balancing selection.
What are examples of selection acting at multiple loci?
Directional, disruptive and stabilising selection.
What is relative fitness?
The average number of offspring produced by individuals with a particular genotype compared to the number produced by individuals with another genotype.
What is fitness of a new allele expressed as in population genetics?
It is expressed as a selection coefficient.
What does the selection coefficient represent in population genetics?
The increase or decrease in fitness conferred by that allele compared to another.
Changes in allele frequency occur more rapidly in what, and why?
They occur more rapidly in haploids than in diploids, because the relationship between genotype and phenotype is simpler (there is no dominance to mask an allele).
What shape are the plots of allele frequency against time?
Sigmoidal.
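The sigmoidal shape can be reproduced with a simple haploid model. This is a sketch under stated assumptions: the favoured allele has relative fitness 1 + s, the other allele has fitness 1, and the population is large enough to ignore drift. The function name and parameter values are illustrative.

```python
def haploid_trajectory(p0, s, generations):
    """Frequency of a favoured allele over time in a haploid population.

    Each generation the allele is weighted by its fitness (1 + s)
    and the frequencies are renormalised by the mean fitness.
    """
    freqs = [p0]
    p = p0
    for _ in range(generations):
        p = p * (1 + s) / (1 + p * s)
        freqs.append(p)
    return freqs


trajectory = haploid_trajectory(p0=0.01, s=0.1, generations=200)
# Change is slow while the allele is rare, fastest at intermediate
# frequency, and slow again as the allele approaches fixation.
print(trajectory[0], trajectory[50], trajectory[-1])
```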
What is fitness influenced by in diploids?
Allele interactions.
What is the range for the degree of dominance?
0-1.
What does the degree of dominance have a large effect on?
The rate of allele frequency change.
A new rare allele that is created is normally what, and when does selection act on it?
A new rare allele is initially found mostly in heterozygotes, so selection can only favour it if the allele is dominant.
What does dominance allow when an allele is near fixation?
Dominance allows the less-fit allele to hide in heterozygotes, which makes it difficult to remove.
What case is balancing selection often typified by, and what is the case?
Balancing selection is often typified by the case of heterozygote advantage, where both alleles will stably coexist with a frequency that is proportional to the relative fitness of the two homozygotes.
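A sketch of heterozygote advantage under one standard parameterisation, which is an assumption here rather than something stated in the cards: genotype fitnesses are AA = 1 - s, Aa = 1, and aa = 1 - t. The equilibrium frequency of A is then t / (s + t), so it depends only on how unfit the two homozygotes are.

```python
def next_frequency(p, s, t):
    """One generation of selection with fitnesses AA=1-s, Aa=1, aa=1-t."""
    q = 1.0 - p
    mean_fitness = p * p * (1 - s) + 2 * p * q + q * q * (1 - t)
    # Allele A is carried by AA individuals and by half of Aa individuals.
    return (p * p * (1 - s) + p * q) / mean_fitness


def equilibrium_frequency(s, t):
    """Stable frequency of allele A under heterozygote advantage."""
    return t / (s + t)


p = 0.1
for _ in range(2000):
    p = next_frequency(p, s=0.1, t=0.3)
print(p, equilibrium_frequency(0.1, 0.3))
```

Starting from any intermediate frequency, the iteration should settle on the same equilibrium value.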
What are two other types of selection that can maintain genetic variation in a population?
Frequency-dependent selection and fluctuating selection.
What is frequency-dependent selection?
It is where an allele’s fitness depends on its frequency: fitness is high when the allele is rare and low when the allele is common.
What is fluctuating selection?
It is where allele fitness depends on an aspect of the environment that is rapidly and constantly changing.
What happens when you go from a single locus to multiple loci?
You get a continuous phenotypic distribution (a phenotypic curve).
What is taxonomy to do with?
It is to do with describing, naming, identifying, and classifying species.
What is phylogenetics to do with?
It is to do with reconstructing patterns of shared ancestry among organisms.
Where was phylogeny first depicted?
In On the Origin of Species.
Where can phylogeny be seen?
It can be seen in hierarchical tables, the ladder of nature, and representing a process.
When are characteristics of organisms homologous?
If they are similar and have descended from a common ancestor.
When are characteristics of organisms analogous?
When they are similar but have descended from different ancestors.
What information do molecular sequences contain, and what is the problem?
Molecular sequences contain information about the evolutionary processes that produce them, but they are often scrambled, fragmentary, hidden, or lost.
How do modern methods recover and interpret challenging molecular sequences?
They use mathematical, statistical and computational methods.
What are orthologous molecular sequences?
They are homologous sequences from different species that diverged through a speciation event.
What are homologous molecular sequences?
They are sequences that share a common ancestor.
What are paralogous molecular sequences?
They are homologous sequences from different genes in the same genome that arose through gene duplication.
When did molecular characters appear in science?
They arrived during the molecular biology revolution of the mid-20th century.
What are the advantages of using molecular characters over morphological ones?
- they are very common.
- they are objective.
- they are easy to quantify.
- they are available when morphology is uninformative.
- they are cheap and fast to obtain.
- they can be obtained without specialist training.
What is the only significant disadvantage of molecular characters?
They are unavailable for extinct species.
What is a transition mutation?
It is a mutation from a purine to a purine (between A and G), or from a pyrimidine to a pyrimidine (between C and T).
What is a transversion mutation?
It is a mutation from a purine to a pyrimidine, or vice versa.
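The two definitions can be written as a small classifier. This is a sketch for DNA bases only; the function name and return strings are illustrative assumptions.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(old_base, new_base):
    """Label a DNA base change as a transition or a transversion."""
    old_base, new_base = old_base.upper(), new_base.upper()
    valid = PURINES | PYRIMIDINES
    if old_base not in valid or new_base not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if old_base == new_base:
        return "none"
    # Same chemical class on both sides means a transition.
    same_class = (old_base in PURINES) == (new_base in PURINES)
    return "transition" if same_class else "transversion"


print(classify_substitution("A", "G"))  # purine to purine
print(classify_substitution("A", "C"))  # purine to pyrimidine
```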
What is another way to refer to a silent mutation?
As a synonymous mutation.
What is another way to refer to a replacement mutation?
As a non-synonymous mutation.
What principle is phylogenetics based on?
The principle of parsimony.
What concept is molecular sequence alignment based on?
The concept of positional homology.
When do nucleotides exhibit positional homology?
If they exist at equivalent positions in their respective sequences.
Good alignment is essential for what?
It is essential for good phylogenies.
How do alignment methods often work?
Most alignment methods start by assigning a different “cost” to each type of sequence difference. Each possible alignment, therefore, has a total cost. Algorithms then identify the alignment with the lowest cost.
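A minimal sketch of the lowest-cost idea for two sequences, using dynamic programming. The cost values (0 for a match, 1 for a mismatch, 2 per gap position) are arbitrary assumptions chosen for illustration, and real alignment programs use more elaborate cost schemes.

```python
def alignment_cost(seq_a, seq_b, mismatch=1, gap=2):
    """Lowest total cost of globally aligning two sequences."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    # cost[i][j] = cheapest alignment of seq_a[:i] with seq_b[:j].
    cost = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        cost[i][0] = i * gap
    for j in range(1, cols):
        cost[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = cost[i - 1][j - 1]
            if seq_a[i - 1] != seq_b[j - 1]:
                diagonal += mismatch
            cost[i][j] = min(
                diagonal,               # align the two characters
                cost[i - 1][j] + gap,   # gap in seq_b
                cost[i][j - 1] + gap,   # gap in seq_a
            )
    return cost[-1][-1]


print(alignment_cost("ACGT", "ACT"))
```

Changing the relative cost of gaps and mismatches changes which alignment is judged best, which is why the choice of costs matters.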
When are alignment programs more prone to mistakes?
When the sequences are diverse or contain long insertions or deletions.
What is the multiple hits problem?
It is a problem seen when looking at how different two sequences are. When divergence is low, the observed number of changes is similar to the true number, but when divergence is high, the observed number underestimates the true genetic distance.
What are nucleotide substitution models used for?
They are used to estimate the true genetic distance from the observed changes.
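As an illustration, the Jukes-Cantor model (one of the simplest substitution models, chosen here as an example rather than taken from the cards) corrects the observed proportion of differing sites p to an estimated distance d = -(3/4) ln(1 - 4p/3).

```python
import math


def jukes_cantor_distance(p_observed):
    """Estimated substitutions per site from the observed proportion
    of differing sites, under the Jukes-Cantor model."""
    if not 0.0 <= p_observed < 0.75:
        raise ValueError("observed proportion must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p_observed)


# The correction is small at low divergence and grows as sites
# become saturated with multiple hits.
for p in (0.01, 0.1, 0.5):
    print(p, jukes_cantor_distance(p))
```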
What do nucleotide substitution models mathematically represent?
They represent the stochastic process of sequence evolution through time.
What matrix do protein models need?
A 20x20 matrix.
What important biological assumptions do nucleotide substitution models make?
- evolution at each site occurs at the same rate.
- nucleotide base frequencies are the same for all sequences.
- evolution at each site is independent.
What are statistical models used to do?
They are used to capture the variation in evolutionary rates among sites.
What distribution is most commonly used in statistical models?
The gamma-distribution.
What do all the lines represent in an unrooted tree?
All lines represent genetic distance.
In a rooted tree, what direction does it have and what do the lines represent?
A rooted tree has evolutionary direction, and only horizontal lines represent genetic distance.
What does a clustering algorithm do?
It transforms genetic distances into a tree.
What do optimality methods define?
They define some kind of score for each possible tree.
What are statistical methods?
They are methods that calculate a probability for each possible tree and frame phylogeny estimation as a formal statistical problem.
What is maximum parsimony?
The tree which requires the fewest evolutionary changes to explain the observed sequences is the best tree.
When is maximum parsimony most useful?
It is most useful when it applies to morphological character data.
When is maximum parsimony inapplicable?
When there are fast-evolving sequences.
What is maximum likelihood?
The tree which is probabilistically most likely to have given rise to the observed sequences is the best tree. It is computationally slower and the probabilities are given by nucleotide substitution models.
What is Bayesian inference?
Where each tree has a probability given the data, and the whole probability distribution is considered, not just the one most likely.
What is a parsimony score?
It is the minimum number of evolutionary changes required to explain the observed characteristics.
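For a single character on a fixed rooted binary tree, the score can be computed with Fitch's algorithm. The sketch below assumes the tree is written as nested two-element tuples with the observed state at each tip; that representation is an illustrative choice.

```python
def fitch(tree):
    """Return (possible ancestral states, parsimony score) for one character.

    A tip is a string holding its observed state; an internal node is
    a tuple of two subtrees.
    """
    if isinstance(tree, str):
        return {tree}, 0
    left, right = tree
    left_states, left_score = fitch(left)
    right_states, right_score = fitch(right)
    shared = left_states & right_states
    if shared:
        # The children can agree, so no extra change is needed here.
        return shared, left_score + right_score
    # The children disagree, so at least one change is required.
    return left_states | right_states, left_score + right_score + 1


states, score = fitch((("A", "A"), ("G", "G")))
print(score)
```

Summing this score over every character, and comparing the totals across candidate trees, gives the maximum parsimony criterion.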
What is a tree search used for?
To find the topology with the highest likelihood.
What are the best ways to do tree searching?
- To use an exhaustive search which tries every possible tree and is only feasible with small numbers of taxa.
- To do hill climbing which searches through trees by iterative trial and error, and it doesn’t check all possible trees and isn’t guaranteed to find the optimal one.
What is the most common technique to test phylogenetic uncertainty and what does it involve?
The most common technique is bootstrapping, which involves resampling the original data with replacement to create a large number of pseudoreplicates.
What do most phylogenetic methods provide?
They provide a single estimate of a ‘true’ tree.
How is the reliability of bootstrapping measured?
The generated trees from each replicate have clusters and it’s the frequency of these clusters that is a measure of its reliability.
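A sketch of the two halves of the procedure, assuming the data are an alignment whose columns (sites) are resampled with replacement. Tree building is left out; `clade_support` simply assumes each replicate tree has already been summarised as a set of clusters. All names are illustrative.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one pseudoreplicate by resampling alignment columns."""
    n_sites = len(alignment[0])
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[c] for c in columns) for seq in alignment]


def clade_support(cluster, replicate_clusters):
    """Fraction of replicate trees that contain the given cluster."""
    hits = sum(1 for clusters in replicate_clusters if cluster in clusters)
    return hits / len(replicate_clusters)


rng = random.Random(1)  # fixed seed so the sketch is repeatable
toy_alignment = ["ACGT", "ACGA", "TCGA"]  # hypothetical sequences
print(bootstrap_alignment(toy_alignment, rng))

# Hypothetical clusters recovered from three replicate trees.
replicates = [
    {frozenset({"s1", "s2"})},
    {frozenset({"s1", "s2"})},
    {frozenset({"s2", "s3"})},
]
print(clade_support(frozenset({"s1", "s2"}), replicates))
```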
How was the idea of the molecular clock formed?
Zuckerkandl and Pauling in 1962 compared the number of amino acid differences in haemoglobin between species against divergence times taken from the fossil record. The two were correlated, which led to the formation of the idea of the molecular clock.
What can molecules estimate that’s related to the molecular clock?
They can estimate the date of a common ancestor for which no fossils are known, and the divergence dates when there is no obvious morphological change.
What happened as amino acid and gene sequence data accumulated?
It became obvious that there was much sequence variation at the molecular level, and that the amount of molecular diversity varied within genes, among genes, and among species.
What are the two competing perspectives on the process of molecular evolution?
The neutralist approach and the selectionist approach.
In what 2 different ways are molecular clocks used?
To understand why some genes/species/genomic regions evolve at different rates, and to estimate a timescale for phylogenies and evolutionary history.
What is the substitution/fixation rate?
It is the rate at which sequences in different populations diverge through time.
What is the mutation rate?
It is the rate at which individuals incorporate errors during replication.
What does the probability of fixation determine?
The difference between the substitution/fixation rate, and the mutation rate.
When is the fate of mutations controlled by drift?
When Ns is between 1 and -1.
What happens with mutations when N is small?
When N is small, slightly deleterious mutations are controlled by drift and can occasionally become fixed.
What happens with mutations when N is large?
When N is large, the slightly deleterious mutations are controlled by negative selection and never get fixed.
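The rule of thumb in the last few cards can be sketched as a small function, assuming the boundary is the product Ns lying between -1 and 1. The function name, return strings, and example values are illustrative assumptions.

```python
def dominant_force(population_size, selection_coefficient):
    """Say whether drift or selection mainly decides a mutation's fate."""
    ns = population_size * selection_coefficient
    if abs(ns) < 1:
        return "drift"
    return "positive selection" if ns > 0 else "negative selection"


# The same slightly deleterious mutation in a small and a large population.
print(dominant_force(100, -0.001))
print(dominant_force(100_000, -0.001))
```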
What happens to substitution rates in small populations, and what may cancel out the effect?
Substitution rates can increase in smaller populations, but organisms in small populations tend to have longer generation times, which may cancel out this effect.
What is generation time?
It is the time between germ line replications.
What is generation time a particularly important factor for?
It is a particularly important factor for selectively neutral polymorphisms.
What might explain why mtDNA genomes tend to evolve faster than nuclear genomes?
Higher concentration of oxygen radicals.
What is a good example for non-equal generation times?
The X and Y chromosomes are a good example, as in some species there may be more cell division events in the male germ line than in the female germ line, which leads to faster Y chromosome evolution.
Why do smaller-bodied vertebrates tend to have higher substitution rates than larger-bodied ones?
It is thought to be due to a higher basal metabolic rate, which increases the oxygen free radicals produced by aerobic respiration, which can generate mutations. However, no clear association has been found, as there are too many confounding variables.
How do you calculate genetic distance?
Genetic distance = evolutionary rate x (2 x divergence time)
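The formula can be rearranged to estimate a divergence time from a measured distance. The factor of 2 is there because changes accumulate along both lineages since the split. The rate and time below are made-up numbers for illustration only.

```python
def genetic_distance(rate, divergence_time):
    """Expected distance between two lineages that split
    divergence_time ago, each evolving at the given rate."""
    return rate * (2 * divergence_time)


def divergence_time(distance, rate):
    """Rearranged form: time since the split, given distance and rate."""
    return distance / (2 * rate)


# Hypothetical values: 1e-9 substitutions per site per year, 5 million years.
d = genetic_distance(1e-9, 5e6)
print(d, divergence_time(d, 1e-9))
```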
What is an example of different mutation rates due to different replication?
RNA viruses and retroviruses have mutation rates many times higher than those of eukaryotes as they replicate using different polymerases.
How were phylogenetic timescales previously calculated, and what is it known as?
They were calibrated by assuming that all lineages/species evolve at the same rate, and this is known as a strict clock.
When can phylogenies be calculated using the tips of the trees?
When the sequences are sampled at evolutionarily different points in time.
What is an example of where co-evolution was used to calibrate a phylogeny?
An example is where the phylogeny of cats was used to date the evolution of feline papillomaviruses.
What is phylodynamics?
It is a study of how population processes shape phylogenies, and it includes changes in population size, migration, speciation and extinction.
Who coined the term phylodynamics, when and what was it used to describe?
It was first coined by Grenfell et al. in Science in 2004 where it was used to describe how epidemiological, immunological and evolutionary processes can shape viral phylogenies.
How does coalescent theory work?
It works backwards in time and traces ancestry given a set of sampled sequences. It typically considers intra-specific processes.
How does the birth-death model work?
It works forwards in time: given a population process, it determines what the resultant phylogeny would look like. It considers inter- and intra-specific processes.
Where has coalescent theory gained importance and where is it used?
Coalescent theory has gained importance in population-level sequencing and has become widespread in anthropology, association mapping, conservation biology, epidemiology, global warming, and cancer biology.
What is the ‘Wright-Fisher’ model and what does it assume?
It is an ideal population, and it assumes that individuals have equal propensity to reproduce, that generations are non-overlapping, and that there is a constant population size.
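A small simulation sketch of such a population, assuming diploid individuals (2N gene copies), two alleles, and no mutation or selection. Each generation is built by sampling gene copies at random from the previous one, which is what makes allele frequencies drift. The function name and parameters are illustrative.

```python
import random


def wright_fisher(n_individuals, p0, generations, rng):
    """Simulate allele frequency change by drift alone.

    Population size is constant, generations do not overlap, and every
    gene copy is equally likely to be the parent of each new copy.
    """
    n_copies = 2 * n_individuals
    count = round(p0 * n_copies)
    trajectory = [count / n_copies]
    for _ in range(generations):
        p = count / n_copies
        # Each new gene copy picks its parent at random.
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        trajectory.append(count / n_copies)
    return trajectory


rng = random.Random(0)  # fixed seed so the run is repeatable
print(wright_fisher(n_individuals=10, p0=0.5, generations=20, rng=rng))
```

Running it with a smaller population should show larger jumps per generation and faster loss or fixation of the allele.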
What is coalescent theory in reverse?
Coalescent theory is genetic drift in reverse, and vice-versa.
What is r in terms of coalescence theory?
It is the probability that two lineages coalesce in the previous generation; moving back in time, it is the rate of coalescence.
How is r calculated in terms of coalescence theory?
r = (probability that a pair of sampled lineages share the same parent) x (the number of possible pairs of sampled lineages)
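A worked sketch of the two factors, assuming a diploid population of N individuals so that a given pair of lineages shares a parent with probability 1 / (2N), and k sampled lineages giving k(k - 1) / 2 possible pairs. The function name is an illustrative assumption.

```python
from math import comb


def coalescence_rate(n_lineages, n_individuals):
    """Per-generation rate at which any pair of lineages coalesces."""
    pair_probability = 1 / (2 * n_individuals)  # one pair shares a parent
    n_pairs = comb(n_lineages, 2)               # number of possible pairs
    return pair_probability * n_pairs


# More sampled lineages, or a smaller population, means faster coalescence.
print(coalescence_rate(2, 50))
print(coalescence_rate(4, 50))
```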
What does a “serially sampled” coalescent include?
It includes sequences from the past.
What happens in coalescent theory when population changes are taken into account?
If the population has been growing, then moving back in time, the population size decreases and the rate of lineage joining increases.
What does theta denote in terms of coalescent theory and what is it related to?
Theta denotes sequence diversity. It is related to the number of mutations in the history of the sample.
What does theta equal when mutations occur randomly on the branches?
Theta equals the average pairwise genetic distance between sampled sequences.
What happens to gene trees where there is a large population size?
There are often long internal/near root branches, which means there are many mid-frequency polymorphisms.
What happens to gene trees when there is slow population growth?
There are long terminal branches, which means there are many low frequency polymorphisms.
What information do sequences contain?
Sequences contain information about demographic history.
What is Tajima’s D?
A statistic that measures whether mutations are mostly high/medium/low frequency.
What methods can be used to study the demographic history through sequences?
Methods used include: Tajima’s D, skyline plots, and the sequentially markovian coalescent model.
What are skyline plots?
They are plots that use the mathematical relationship between r(t) and 2N(t) to estimate past population size.
What is the sequentially markovian coalescent model?
It is a complex approach used for human genomes.
What lineages can coalesce?
Only lineages in the same deme/subpopulation can coalesce.
When does incomplete lineage sorting occur and when is it more likely to occur?
Incomplete lineage sorting occurs when coalescences predate multiple speciation times, and this is more likely to occur when ancestral effective population sizes are large.
What are examples of where skyline plots have been used?
When looking at the population sizes of Beringian bison over time, and the origins of HIV.
What do Genome Wide Association Studies try to find?
They try to find genotypes associated with human diseases like diabetes.
What is coalescent theory used to interpret?
Coalescent theory is used to interpret large-scale human genomics data.
What does a phylogenetic tree using the birth-death model show?
A complete population tree displays the full population dynamics and displays the dynamics giving rise to individuals at time T.
What has the birth-death model been used to study?
It has been used to study the diversification of mammals after the extinction of dinosaurs, and to study the spread of Ebola in Sierra Leone in 2014.
What is one of the most important tasks of evolutionary genetics?
One of the most important tasks is to understand the selective forces acting on individual genes, gene regions, and codons.
What processes can generate similar trees?
Demographic and selective processes can generate similar trees.
How can you detect selection from gene sequences?
You can look for differences in genetic diversity, tree shape, or mutation frequencies among genes or along chromosomes, compare silent and replacement changes within a gene, and look for parallel/convergent evolutionary changes.
What is dN/dS?
It is the ratio of the number of replacement fixations to the number of silent fixations, and it is not a differential.
What does it mean if dN/dS = 1?
It means that all replacement mutations are neutral.
What does it mean if dN/dS = 0?
It means that all replacement mutations are deleterious.
What does it mean if dN/dS > 1?
It means that at least some of the replacement fixations are beneficial.
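A sketch of the ratio and the three readings above. It assumes the counts of changes are normalised by the number of replacement and silent sites available, which is the usual convention; the function names, messages, and counts are made up for illustration.

```python
def dn_ds(replacement_changes, replacement_sites, silent_changes, silent_sites):
    """Ratio of replacement changes per replacement site to
    silent changes per silent site."""
    if silent_changes == 0:
        raise ValueError("ratio is undefined with no silent changes")
    dn = replacement_changes / replacement_sites
    ds = silent_changes / silent_sites
    return dn / ds


def interpret(ratio, tolerance=1e-9):
    """Map a dN/dS value onto the readings given in the cards."""
    if abs(ratio - 1.0) <= tolerance:
        return "replacement mutations behave as neutral"
    if ratio > 1.0:
        return "at least some replacement fixations are beneficial"
    return "replacement mutations are mostly deleterious"


# Hypothetical counts for a short gene region.
ratio = dn_ds(30, 100, 10, 100)
print(ratio, interpret(ratio))
```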
Why does dN/dS usually equal much less than 1 when applied to whole genes?
Because only a few codons are positively selected, while most codons are selectively constrained and therefore have dN/dS close to 0.
When does the power of the dN/dS ratio greatly increase?
When the ratio is applied to parts of genes or individual codons.
When are silent changes not neutral?
When they are in: overlapping genes, alternate reading frames, regulatory sequence elements (they affect the stability of RNA/mRNA/DNA structure), and where codons for the same amino acids differ in fitness.
Where is there a high dN/dS found in codons?
In the codons that form the active site of the protein, such as the antigen recognition site.
What can the McDonald-Kreitman Test be adapted to study and how?
It can be adapted to study adaptation in measurably evolving populations, and rather than using an outgroup to
What is the McDonald-Kreitman Test?
It is a simple method to contrast the patterns of within-species polymorphism and between-species divergence at synonymous and nonsynonymous sites in the coding region of a gene.
What should you expect to see in the McDonald-Kreitman test if polymorphism and divergence at both types of sites are due to neutral mutations?
You would see that the ratio of replacement to synonymous differences between species should be the same as the ratio of replacement to synonymous polymorphisms within species.
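The neutral expectation above can be checked with a 2x2 table of counts. The sketch below is an illustration only: the function name and the example counts are hypothetical, and the neutrality index and alpha are standard summary statistics that are not mentioned in these cards.

```python
def mk_summary(dn, ds, pn, ps):
    """Summarise a McDonald-Kreitman table.

    dn, ds: replacement / synonymous fixed differences between species
    pn, ps: replacement / synonymous polymorphisms within species
    """
    divergence_ratio = dn / ds
    polymorphism_ratio = pn / ps
    # Under neutrality the two ratios are expected to be equal, giving an index of 1.
    neutrality_index = polymorphism_ratio / divergence_ratio
    # alpha estimates the fraction of replacement fixations driven by positive selection.
    alpha = 1 - neutrality_index
    return divergence_ratio, polymorphism_ratio, neutrality_index, alpha


# Illustrative counts only.
div_ratio, poly_ratio, ni, alpha = mk_summary(dn=7, ds=17, pn=2, ps=42)
print(round(div_ratio, 3), round(poly_ratio, 3), round(ni, 3), round(alpha, 3))
```

An excess of replacement differences between species (index below 1) is usually read as evidence of adaptive fixations, while an excess of replacement polymorphism (index above 1) suggests weakly deleterious variants segregating within the species. A significance test on the table, such as Fisher's exact test, would normally accompany these numbers.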
What are viruses?
They are very small infectious agents that replicate inside living cells.
What is the key property of viruses?
The high mutation rates.
What are the 2 scales at which viral evolution occurs?
The within-host scale and the between-host scale.
What are the most studied viruses in terms of molecular evolution?
Human pathogenic viruses.
Can you describe the HIV-1 genome?
It is a single genome where new diversity is generated by mutation and recombination, and there is gradual evolution.
Can you describe the Influenza genome?
It comprises 8 genome segments, each encoding 1 or more genes. New diversity is generated by mutation, and reassortment between segments can also occur.
When and who created the classification of viruses?
David Baltimore created the classification of viruses in 1971.
What is the Baltimore classification of viruses based on?
It is based on the route of information transmission from the genome to mRNA, from which virus proteins are translated.
What do organisms with smaller genomes have?
Higher mutation rates.
What do organisms with higher mutation rates have?
Higher substitution rates.
What means that viruses evolve on an ecological timescale?
The high substitution rates.
What are acute infections usually caused by?
Acute infections are usually caused by RNA viruses which have a high mutation rate.
How do evolution and selection act on acute infections?
There is limited opportunity for within-host evolution and it is expected that selection for transmission plays a relatively large role.
What are latent persistent infections usually caused by?
Latent persistent infections are usually caused by DNA viruses, where there is a short burst of replication followed by long periods of latency.
How do evolution and selection act on latent persistent infections?
One expects to see little within-host evolution and to see selection for transmission play a relatively large role.
What are chronic persistent infections usually caused by?
Chronic persistent infections are usually caused by RNA or DNA-RT viruses.
How do evolution and selection act on chronic persistent infections?
There is ongoing rapid evolution and one expects to see within-host selection playing a relatively large role in determining adaptive evolution at the host-population scale.
What is the selection pressure at the within-host scale?
There is selection pressure to maximise within-host fitness.
What is the selection pressure at the population scale?
There is selection pressure to maximise between-host fitness, normally seen as transmission.
How do you research viruses at the within-host scale?
You take multiple sequences from the same individual at different times.
How do you research viruses at the between-host scale?
You take consensus sequences from different individuals at different times.
What has been observed with regards to selection in chronic HIV-1 infections?
Data showed that selected mutations typically involve evasion from host immunity and mutations that are selected for in some individuals are selected against in others.
Where is adapt and revert commonly seen and what could it explain?
Adapt and revert is commonly seen in viruses and it could explain why we see high mutation rates within the individual.
Where can acute infections become chronic?
In immunocompromised individuals.
Why isn’t there an arms race between the virus and the host immune system in acute infections?
Because there is little opportunity for adaptation, so an arms race is unlikely to be seen, and selection will be driven by intrinsic transmissibility and immune escape.
How can viral origins be understood?
They can be understood by using phylogenetic data.
Why does the phylogeny of SARS-CoV-2 have long branches and what does this lead to?
The leading hypothesis is that the long branches are a consequence of evolution during chronic infection. They lead to ‘variants of concern’, which are characterised by many nonsynonymous mutations in Spike.
How are antigenic maps constructed and what are they used for?
They are constructed from immunological assay data and are used to choose vaccine strains.
What happens in changes in Influenza antigens?
In Influenza, genetic divergence is continuous, but antigenic change is punctuated, with switches among discrete antigenic types being observed.
Where is there common cross-species transmission of Influenza?
It is commonly seen between humans, birds and pigs.
Where was HIV-1 establishment found to be and what molecular methods were used?
Reconstructing the phylogeny found that HIV-1 is most diverse in Central Africa, and phylogeographic and molecular clock methods place the common ancestor in the capital of the DRC in the 1920s. The virus is thought to have spread to humans from chimps in Cameroon, but the origins before that were unknown.
What are the zoonotic origins of HIV?
Phylogenies show that there was direct transmission from chimps to humans. However, there wasn’t just one transmission event: the virus jumped between lots of different species before jumping from chimps to humans.
What are the zoonotic origins of Swine flu and what techniques were used to work it out?
Scientists took 8 segments from the genome and each genome segment was telling a different evolutionary story due to the reassortment. The best evidence shows it emerged in Mexico from pigs.
What is cluster busting?
It is where networks are generated of similar consensus sequences from different individuals.
When did Watson and Crick discover the structure of DNA?
1953
Bacteria are one of what?
Bacteria are one of the earliest forms of life.
When was Sanger sequencing first done?
1977
What was the first free-living organism to be whole-genome sequenced?
The bacterium Haemophilus influenzae in 1995.
When was commercialised pyrosequencing first done?
1998
When was Illumina sequencing first done?
In 2009.
What are the main steps in Illumina sequencing?
Sample prep, cluster generation, sequencing, and data analysis.
What did Illumina sequencing lead to?
The cost of genome sequencing plummeting.
What was the outcome of first generation sequencing?
Complete, assembled genomes with annotation.
What was the outcome of second generation sequencing?
Archival short-sequence data.
What are the two approaches for short read analysis?
First method is mapping where reads are aligned to a reference genome.
Second method is assembly, where genomes are reconstructed from raw read data using de novo assembly.
What is the k-mer approach?
It is a reference-free assembly and comparison that is independent of biological information.
What are the two main types of genome assemblers?
The overlap-layout-consensus method, and the De Bruijn method.
What are the steps in assembly done with De Bruijn graphs?
- Start with sequences.
- Divide the sequences into all possible k-mers (for example, 4-mers) and look for all possible overlaps between them.
- SPAdes is the most common assembler.
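The steps above can be sketched on a toy example. This is a minimal illustration, not how SPAdes or any real assembler is implemented: the function names and reads are hypothetical, each distinct k-mer is taken as a vertex, an edge joins two k-mers that overlap by k-1 bases, and the walk assumes a single path with no branches or cycles.

```python
def kmers(seq, k):
    """Return every k-mer in seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def de_bruijn(reads, k):
    """Vertices are k-mers; an edge a -> b means a[1:] == b[:-1]."""
    nodes = set()
    for read in reads:
        nodes.update(kmers(read, k))
    by_prefix = {}
    for node in nodes:
        by_prefix.setdefault(node[:-1], []).append(node)
    return {node: sorted(by_prefix.get(node[1:], [])) for node in sorted(nodes)}


def walk_unbranched(graph):
    """Spell out the contig along a simple path (assumes no branches or cycles)."""
    targets = {t for successors in graph.values() for t in successors}
    node = next(n for n in graph if n not in targets)  # vertex with no incoming edge
    contig = node
    while graph[node]:
        node = graph[node][0]
        contig += node[-1]  # each step adds one new base
    return contig


reads = ["ATGGCGT", "GGCGTGC"]  # two overlapping toy reads
graph = de_bruijn(reads, k=4)
print(walk_unbranched(graph))
```

Repeats longer than k-1 bases create branches in the graph, which is where this naive walk stops being valid and real assemblers need extra information such as paired ends or long reads.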
What are paired end sequences?
Two sequences that have a defined, known gap between them.
How long are the DNA sequences that short read sequencing technologies produce?
100-300 bp.
What is SNP calling?
It compares short reads to a high-quality reference, particularly used in comparing very closely related isolates.
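A toy version of SNP calling is sketched below. It assumes reads have already been mapped without gaps and are supplied as (start, sequence) pairs; the function name, the minimum depth, and the majority-vote rule are illustrative choices, not the behaviour of any particular variant caller.

```python
from collections import Counter


def call_snps(reference, mapped_reads, min_depth=2):
    """Return (position, reference base, called base, depth) for each SNP.

    mapped_reads: list of (start, sequence) pairs, 0-based, gap-free alignments.
    """
    pileup = [Counter() for _ in reference]
    for start, seq in mapped_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1
    snps = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to make a call at this position
        base, _ = counts.most_common(1)[0]  # majority base among the reads
        if base != reference[pos]:
            snps.append((pos, reference[pos], base, depth))
    return snps


reference = "ACGTACGTAC"
reads = [(0, "ACGTAC"), (2, "GTATGT"), (4, "ATGTAC")]
print(call_snps(reference, reads))
```

Real pipelines also weigh base and mapping quality, which this sketch ignores.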
What are the advantages of mapping?
- rapid
- accurate, even with ‘low coverage’ samples
- comparable
- reproducible
- results are easy to visualise, which helps with identifying problems and errors.
What are the disadvantages of mapping?
- requires high-quality reference genome
- can only identify variants relative to the reference genome
- repeat-rich regions are problematic and can lead to induced errors or under-reporting of variants
- can’t be reliably used to report large genomic events.
What is the overlap-layout-consensus method?
All of the overlaps between reads are determined, the reads and overlaps are laid out on a graph, and consensus sequences are identified. The ‘String Graph Assembler’ (SGA) is one assembler that does this.
What is the De Bruijn graph method?
A graph that is constructed from a set of k-mers, where the vertices represent the k-mers and the edges represent the overlaps between them.
What are the advantages of assembly?
- reference-free, so novel sequences can be constructed and identified.
- can be used to identify large genomic sequence variants.
What are the disadvantages of assembly?
- struggles to solve repetitive or very similar regions.
- computationally expensive
- time consuming
- no clear ‘ground truth’ as the output can be variable based upon input parameters.
What are the limitations of Illumina sequencing?
- short reads do not contain enough information to resolve low complexity regions that are larger than the length of the short read, leading to gaps in the assembly.
- the assembled genome is fragmented into multiple contiguous sequences.
- some regions will not be assembled.
How do long reads solve assembly problems?
By spanning the entire length of low complexity regions, or resolving intermittent identical repeats.
What does long read sequencing include?
- Pacific Biosciences (PacBio; not used as much)
- Oxford Nanopore (portable and used in Ebola outbreak and COVID).
How is the problem of long read sequencing methods being error prone overcome?
They are combined with second generation sequencing reads for an accurate hybrid assembly.
What are hybrid assemblies?
They are assemblies that combine the base-calling accuracy of short read sequencing with the scaffolding power of long reads to solve genomic features that are unresolvable by short reads alone.
What sort of things are included in bacterial genome annotation?
- Location, e.g. which sequence, where on the sequence, and which strand it’s on
- Feature type, e.g. protein coding, or tRNA, or repeat region
- Attributes, e.g. products, enzyme code, cellular location.
What is Prokka?
A gene-by-gene annotation tool.
What is EggNOG?
A database of orthology relationships, functional annotation, and gene evolutionary histories.
What do the size and features of bacterial genomes depend on?
Their biology, so where they are and what they do, e.g. if they are free-living or obligate or facultative.
What is the core genome of bacteria?
It is the genes that are the same in all bacterial individuals of a species.
How much of the bacterial genome is different between individuals?
They tend to have about 1/4 of their genomes different to each other.
What is the accessory genome?
All the different genes, so the variable genome content.
What is the pangenome?
The core and accessory genome added together.
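The three definitions above map directly onto set operations. The sketch below assumes each isolate has already been reduced to a set of gene names; the isolate labels and gene content are hypothetical.

```python
def pangenome(genomes):
    """Split gene content into core, accessory and pangenome.

    genomes: dict mapping isolate name -> set of gene names (at least one isolate).
    """
    gene_sets = list(genomes.values())
    pan = set().union(*gene_sets)            # every gene seen in any isolate
    core = set.intersection(*gene_sets)      # genes present in all isolates
    accessory = pan - core                   # variable gene content
    return core, accessory, pan


# Hypothetical gene content for three isolates.
isolates = {
    "isolate_1": {"gyrA", "recA", "mecA"},
    "isolate_2": {"gyrA", "recA", "vanA"},
    "isolate_3": {"gyrA", "recA"},
}
core, accessory, pan = pangenome(isolates)
print(sorted(core), sorted(accessory), len(pan))
```

In practice, deciding which genes count as "the same" across isolates requires orthologue clustering, which this sketch takes as given.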
What are large genomes of soil-inhabiting bacteria rich in and why?
They tend to be GC-rich. It is unknown why this is, but it is potentially related to temperature, as a high GC content increases stability under high temperatures.
What is large-scale genomic rearrangement and where is it seen?
It is where the genomes and order of genes are all shuffled, and this is seen in prokaryotes.
Why is there large variation in the genotypes and phenotypes of bacteria?
There is large variation due to bacteria having been around for a very long time.
What is the typical way to analyse population genetic structure?
It is to construct a phylogenetic tree from DNA sequences of bacterial strains with different phenotypes.
What is neutral diversification?
A model that emphasises that most of the genetic variation can be explained by genetic drift.
What are ecotypes?
They highlight selection for adapted lineages in a given environment.
Where are adaptive explanations for variation seen?
Where there is a genetic mutation which affects the fitness/survival of an individual.
Bacterial evolution is dominated by the relative rates of which processes?
- DNA replication errors
- horizontal gene transfer
What are DNA replication errors?
Where there is generation of point mutations, rearrangements, or deletions of various sizes.
What is horizontal gene transfer?
Genetic material that is acquired from an external source and incorporated into the chromosome by recombination.
What is the only thing that can properly lead to innovation?
Mutation.
When may greatly elevated mutation rates occur?
Under strong selective pressure.
Why are different levels of clonal signals observed in different bacterial populations?
It is thought that it is a consequence of differing relative rates of recombination to mutation, although other forces may play a role.
What is a genetic bottleneck not the same as?
It is not the same as a selective sweep.
What is one method for quantifying selective pressure from sequence data?
One method is to compare the frequency of substitutions at non-synonymous and synonymous sites (dN/dS).
What does it mean if dN/dS is less than 1?
It is associated with negative or purifying selection, which suppresses protein changes.
What does it mean if dN/dS is more than 1?
It is associated with positive selection, promoting protein sequence changes.
What is positive diversifying selection associated with?
Host immune evasion or antimicrobial resistance.
Where may purifying selection be weaker?
Within host populations, where isolation from the ancestral population results in greater genetic drift and less time to purge deleterious mutations.
What are the limits to the utility of dN/dS estimates?
- Selection operates on features other than protein-coding sequences, which don’t necessarily affect dN/dS.
- dN/dS ratios do not detect complex traits such as interactions between genes.
- Frameshifts and incorrect interpretation of start codons can lead to non-synonymous single nucleotide polymorphisms being interpreted as synonymous.
- the estimates aren’t accurate if polymorphisms are not fixed between independent lineages, and segregating variation in the population is likely weakly deleterious and destined to be purged in the future.
What type of organisms are pathogens principally?
Most pathogens are principally commensal organisms.
How can scientists identify genomic changes resulting in pathogen emergence?
One can compare the genomes of pathogens with the genomes of their ancestors and of related non-pathogens.
What is the strongest evidence of adaptation?
Convergent evolution, also known as homoplasy.
What does it mean to say that bacterial genomes are interactive?
It is where the effect of one allele depends on another, which is also known as epistasis.
What happens as genes get closer together?
There is a higher linkage disequilibrium.
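Linkage disequilibrium between two loci can be quantified from haplotype counts. The sketch below uses the standard coefficient D = p(AB) - p(A)p(B) and the squared correlation r^2; the allele coding ('A'/'a' at the first locus, 'B'/'b' at the second) and the example haplotypes are hypothetical.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r^2) for two biallelic loci.

    haplotypes: list of 2-character strings such as "AB" or "ab",
    one character per locus.
    """
    n = len(haplotypes)
    p_a = sum(h[0] == "A" for h in haplotypes) / n   # frequency of allele A
    p_b = sum(h[1] == "B" for h in haplotypes) / n   # frequency of allele B
    p_ab = sum(h == "AB" for h in haplotypes) / n    # frequency of the AB haplotype
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")  # undefined if a locus is fixed
    return d, r2


tightly_linked = ["AB"] * 5 + ["ab"] * 5       # alleles always travel together
shuffled = ["AB", "Ab", "aB", "ab"]            # alleles associate at random
print(linkage_disequilibrium(tightly_linked))
print(linkage_disequilibrium(shuffled))
```

Loci that are close together are separated by recombination less often, so their r^2 tends to stay nearer the first case than the second.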
What does recombination promote and harm?
Recombination promotes adaptation by introducing novel functionality; on the other hand, it risks creating disharmonious gene combinations that are likely to be selected against.
How can introduced genes be accommodated?
- potential variation, which can set the stage for subsequent genetic changes that can result in beneficial adaptations.
- compensatory change, which adjusts the recipient genome to minimise potential disruptions, facilitating transition between fitness peaks.
While the number of genes varies greatly among species, it is not sufficient to account for what?
The differences in genome size.
What does the vast majority of genomic DNA code for in prokaryotes?
The vast majority of genomic DNA codes for protein in prokaryotes.
What are the ideas around why we carry so much non-coding DNA?
- Non-coding DNA performs essential functions.
- Non-coding DNA is useless “junk”, carried passively by the chromosome simply because it is linked to functional genes.
- Non-coding DNA has a structural or nucleoskeletal function.
- Non-coding DNA is a functionless “parasite” that is in a selective battle with the host.
What is the best evidence that genome sizes are correlated with a variety of phenotypic traits?
- size of cell nucleus
- duration of mitosis and meiosis
- metabolic rate in birds and mammals
- minimum generation time
- seed size
- response of annual plants to CO2
- embryonic development time in Salamanders
- morphological complexity in the brains of frogs and salamanders.
What is the “skeletal DNA” hypothesis?
The hypothesis claims that cell size is adaptively important so that more genomic DNA is required to make bigger cells. So, DNA mass directly determines nuclear volume and there must be a constant ratio of nucleus to cell volume to maintain a balance between RNA synthesis and protein synthesis in the cytoplasm.
What is the evidence for the “skeletal DNA” hypothesis?
The evidence for the theory is in cryptomonad algae, where DNA in the nucleus performs a skeletal function.
What is one limitation in the “skeletal DNA” hypothesis?
While it is seen in unicellular eukaryotes, scaling it up to multicellular eukaryotes is challenging.
What effect did a study in 2003 determine that effective population size has on natural selection of non-coding DNA?
It suggested that effective population sizes are too small to allow for natural selection to effectively remove non-coding DNA from eukaryotic genomes.
Why do bacteria have very little non-coding DNA?
Probably because they have a single origin of replication and need to replicate quickly.
What is tandemly repeated DNA?
It is non-coding repetitive DNA consisting of short sequence motifs repeated 100s to 1000s of times in tandem.
What are the 3 major classes of tandemly repeated DNA?
- Satellite DNA (2-40Kb)
- Minisatellites (11-60bp)
- Microsatellites (2-5bp)
Why are minisatellites and microsatellites a powerful set of molecular markers for population genetics and disease studies?
They have very high mutation rates, meaning that their loci are extremely variable.
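Microsatellite loci can be located in a sequence with a simple pattern search. The sketch below is an illustration only: it takes the 2-5 bp motif size from the cards, while the function name, the minimum of 4 tandem copies, and the example sequence are arbitrary assumptions.

```python
import re


def find_microsatellites(seq, min_repeats=4):
    """Return (start, motif, copies) for each perfect tandem repeat of a 2-5 bp motif."""
    # The lazy {2,5}? tries the shortest motif first, so CACACACA is reported as CA x 4.
    pattern = re.compile(r"([ACGT]{2,5}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAA
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


# Toy sequence: a (CA)5 repeat and a (GATA)4 repeat separated by unique bases.
seq = "GGT" + "CA" * 5 + "TT" + "GATA" * 4 + "CC"
print(find_microsatellites(seq))
```

Because the number of copies changes so often, genotyping usually means recording the copy number at each locus per individual. Imperfect repeats would need a more tolerant search than this exact-match pattern.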
What are transposable elements?
“Selfish” DNA sequences which are able to increase their copy number by jumping around the genome and making additional copies of themselves as they do so.
What are transposable elements known as in bacteria?
They are called insertion sequences.
What are the 3 groups of transposable elements?
- Class I elements (retroelements)
- Class II elements (DNA elements)
- Miniature Inverted-Repeat Transposable Elements (MITEs).
How do retroelements transpose?
They transpose via an RNA intermediate using the enzyme reverse transcriptase.
What are the 2 major groups of retroelements?
- LTR retrotransposons
- non-LTR retrotransposons
What is one major group of the non-LTR retrotransposons?
Long Interspersed Nuclear Elements (LINEs), which are very common in eukaryotes.
What does the insertion of retroelements into genes cause?
It can cause deleterious mutations.
What are SINEs?
Short Interspersed Nuclear Elements, which, unlike LINEs, do not encode their own reverse transcriptase. They are very common in eukaryotic genomes.
What is the possible high rate that transposable elements can accumulate?
Copy number could increase by 20-100 copies in a single generation.
What is an example of Class II transposable elements?
Some Drosophila species have P elements, which are Class II elements. Wild flies carry them while lab flies don’t. The insertion of P elements can lead to hybrid dysgenesis.
What is hybrid dysgenesis?
It is an increased infertility due to chromosome breakage.
What are the stages in the endogenous lifestyle?
- Retroviral infection of the germline
- fixation
- amplification
- inactivation through mutations
- loss through recombinational deletion
- decay into junk
- co-option.
What are the 3 groups of endogenous retroviruses?
Class I, II, and III.
What can be the consequence of endogenous retroviral activity?
Endogenous retroviruses cause diseases in a range of mammals, but there is no definitive link with disease that has been seen in humans.
What is an example of co-option of endogenous retroviruses?
Evidence has shown that a captured protein from an ancient endogenous retroviral insertion is involved in placental morphogenesis.
What does the ectopic exchange hypothesis predict?
It predicts that transposable elements will be preferentially found in regions with low recombination.
How can endogenous retroviruses and other transposable elements lead to chromosomal rearrangement?
It can happen through homologous recombination between distant loci.
What is the major force limiting transposable element copy numbers in genomes?
Selection against transposable elements that cause ectopic exchange.
What is the persistence of transposable elements likely to depend upon?
A complex interplay of factors specific to transposable element biology and the biology of the host.
How much of the human genome codes for genes?
1.5% codes for genes.
What is a major goal of comparative genomics?
To identify gene coding regions and determine their biological function.
What could be indicative of rapid adaptive evolution?
Regions of “dark matter” that show accelerated evolution in one species but not others.
What is the human genome near identical to in terms of gene coding sequences?
It is near identical to the chimpanzee.
What is the ENCODE project?
It is a project which aims to delineate all functional elements encoded in the human genome.
What are functional elements?
They are discrete genome segments that encode a defined product or display a reproducible biochemical signature.
How much of the human genome is under purifying selection?
3-8%.
How much of the human genome is functional in at least one cell type?
80.4%.
How much of the human genome is transcribed?
74.4%.
How much of the human genome is associated with modified histones?
56.1%.
How much of the human genome is found in open chromatin?
15.2%.
How much of the human genome binds transcription factors?
8.5%.
How much of the human genome consists of methylated CpGs?
4.6%.
In which plants are large genomes found?
- pterophytes
- gymnosperms
- angiosperms (mainly the monocots).
How does whole genome duplication arise?
It arises from polyploidization events followed by chromosome reshaping.
Where is whole genome duplication best known, and how long for?
It is best known in angiosperms (flowering plants). Events have been seen up to 400 million years ago in seed plants, then in ancestral angiosperms, and then in specific clades.
What does whole genome duplication underpin?
Innovations and adaptation in angiosperms.
What is whole genome duplication not sufficient to account for?
Very large genome sizes.
What is the outcome of whole genome duplication?
It doubles the genome size and gene number.
What is most stable to return to when whole genome duplication has occurred?
A return to a diploid state is most stable and has profound effects on the evolution of genome architecture.
What is a major class of transposable elements?
Retrotransposons.
What are the 2 super-families of plant LTR-retrotransposons?
Ty1/copia and Ty3/gypsy.
When are LTR-retrotransposons activated in plants?
While most LTR-retrotransposons are degenerate and inactive, stress tends to activate the movement of intact copies.
How much of monocot genomes are LTR-retrotransposons?
30-70%.
Why do LTR-retrotransposons make genome sequence analysis very challenging?
Because they tend to be highly nested in the genome.
What is the relationship between repeated sequences and genome size in plants?
The repeated sequences tend to drive up genome size, but the relationship is dynamic and changes in larger plant genomes.
How much of conifer genomes are LTR-retrotransposons?
60-85%.
How can retrotransposons be used as a molecular clock for plants and why?
Sequence divergence between the terminal repeats of a single retrotransposon can be used as a molecular clock. LTRs are initially identical and then their sequences decay due to random mutation.
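The clock calculation can be sketched as T = K / (2r), where K is the divergence between the two LTRs and r is the substitution rate per site per year; the factor of 2 reflects that both LTRs accumulate mutations independently. In the sketch below the function names are hypothetical, the Jukes-Cantor correction is one common choice for K, and the rate is a placeholder assumption that would need to be chosen for the lineage in question.

```python
import math


def p_distance(ltr5, ltr3):
    """Proportion of differing sites between two aligned, gap-free LTRs."""
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to the same length")
    return sum(a != b for a, b in zip(ltr5, ltr3)) / len(ltr5)


def jukes_cantor(p):
    """Correct an observed proportion of differences for multiple hits (needs p < 0.75)."""
    return -0.75 * math.log(1 - 4 * p / 3)


def insertion_age(ltr5, ltr3, rate_per_site_per_year):
    """Estimate years since insertion as K / (2r)."""
    k = jukes_cantor(p_distance(ltr5, ltr3))
    return k / (2 * rate_per_site_per_year)


# Toy 100 bp LTRs that differ at 2 sites; the rate below is an assumed placeholder.
ltr5 = "ACGT" * 25
ltr3 = "TCGA" + "ACGT" * 24
age = insertion_age(ltr5, ltr3, rate_per_site_per_year=1e-8)
print(round(age))
```

The estimate is only as good as the assumed rate, and processes such as gene conversion between the two LTRs can make an element look younger than it is.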
What are conifers efficient at doing to LTR-retrotransposon copies?
They are efficient in key repair mechanisms to remove LTR-retrotransposon copies.
What is autopolyploidy?
It is where multiple chromosome sets are derived from a single taxon. It arises from non-disjunction of chromosomes during meiosis or from spontaneous, somatic genome doubling.
What is allopolyploidy?
It is where multiple chromosome sets are derived from two or more diverged taxa.
What are polyploids abundant in?
They are abundant in crop plants.
What are examples of crop plant polyploids?
- Triploids include bananas, citrus, and some apples.
- Tetraploids include wheat, cotton, potato, canola and rapeseed.
- Hexaploids include chrysanthemum, bread wheat, oat, and kiwi.
- Octaploids include strawberry and sugar cane.
What do changes in gene function result in after whole genome duplication?
It results in sub- and neo-functionalisation, which facilitates evolutionary change including adaptation.
What happens to duplicated genes after whole genome duplication?
Duplicated genes are initially redundant and most often, one copy is lost.
What can gene duplication be amplified by?
Natural selection.
What is the certainty of large reference genomes highly influenced by?
The sequencing method and assembly method used.
How did mitochondria and chloroplasts evolve?
They both evolved through endosymbiosis from prokaryotic organisms: mitochondria from an alpha-proteobacterium and chloroplasts from a cyanobacterium.
What is non-Mendelian inheritance?
It is where a character/gene is inherited from one parent only. This often takes the form of maternal inheritance as the egg contributes the bulk of the cytoplasm to the zygote.
What did Carl Correns and Erwin Baur independently study and find?
They independently studied inheritance of leaf colour in variegated plants and found that inheritance of the trait could not be explained according to Mendel’s laws of heredity.
What was important in the identification of organellar DNA?
- Genetic analysis
- Biochemical analysis
- Imaging.
What was the study on cytoplasmic inheritance in yeast?
The group of Boris Ephrussi studied yeast petite mutants, which were unable to grow properly on sugar-poor medium due to defective oxidative phosphorylation and so formed small colonies. Sometimes this character was not inherited in Mendelian fashion, and it was later correlated with defective mitochondria.
How do you visualise mitochondrial DNA?
You stain DNA with ethidium bromide, and mitochondria with DiOC6. Where yellow is seen, it is mitochondrial DNA.
What do endosymbiotic organelles contain?
They contain double-stranded DNA molecules called mtDNA and cpDNA/ptDNA, meaning they are semi-autonomous.
What are mtDNA/cpDNAs?
They are small DNAs that encode only a small number of proteins and RNAs, and they can be represented by circular DNA maps.
How do mtDNAs show reductive evolution?
They are much smaller than the ancestral genome, which is due to the genes needed for free-living being lost and many others being transferred to the nuclear genome.
What is the correlation between organelle and nuclear genome sizes?
There is no correlation.
What are the differences between organelle and nuclear genomes?
- organelle genomes lack features typical of nuclear chromosomes and exist as nucleoids.
- DNA replication is not tightly coupled to cell division.
- Organelle genome transcription and translation machineries are prokaryotic in character.
- some genes are transcribed together to form polycistronic RNAs.
- Introns exist but are of a different type.
- the genetic code can deviate from the standard.
- organelle transcripts can be subject to RNA editing.
What is mtDNA transcribed using?
mtDNA is transcribed using machinery that is related to T7 bacteriophage RNA polymerase.
What is mtDNA-directed RNA polymerase?
It is a single-subunit RNA polymerase that requires the assistance of 2 transcription factors: mitochondrial transcription factor A (TFAM) and mitochondrial transcription factor B.
How is mtDNA expressed in mammals?
- Transcription is initiated in the non-coding region.
- Transcription proceeds in both directions, from 2 promoters: light-strand promoter and heavy-strand promoter.
- Two transcripts spanning almost the entire genome are formed.
- These polycistronic primary transcripts are processed to yield mRNAs, tRNAs, and rRNAs.
How are mitoribosomes different to bacterial ribosomes?
- They are 55S instead of 70S.
- They have evolved unique features reflecting the special requirements of highly hydrophobic OxPhos proteins in the organelle.
What happens when mammalian mtDNA is packaged into nucleoids?
- TFAM molecules bind to mtDNA in short patches.
- TFAM bends the mtDNA, and bridges neighbouring mtDNA stretches by cross-strand binding. This compacts the mtDNA to form the nucleoid.
- The mtDNA in the nucleoid is inaccessible to the transcription and replication machineries.
Why does the standard genetic code deviate in organellar genomes?
The reasons are unclear but likely reflect the unique evolutionary and operational circumstances of the organelles.
What are mtDNAs and cpDNAs rich in?
They tend to be AT-rich.
Where is the role of TFAM in mtDNA conserved and lost?
It is conserved in animals and fungi, but the protein is absent in plants.
Organelle DNA-encoded RNAs often undergo what, and what does it alter?
They often undergo C-to-U editing which often alters the coding sequences of a transcript to produce translatable mRNA.
What has the largest mtDNAs?
Plants have the largest mtDNAs, and the number of genes varies little between species.
What are some sources of “extra” DNA in plant mtDNAs?
- some derived from chloroplast, nuclear or viral DNA.
- some has been acquired by horizontal transfer from other plants.
What is the origin of most non-coding mtDNA?
Most is of unknown origin.
What is the ratio of synonymous substitution rates in mtDNA, cpDNA, and nuclear genes in angiosperms?
1:3:6.
What are the mutation rates in mtDNA in animals?
They are 1-2 orders of magnitude higher than in plant mtDNAs, and higher than animal nuclear genes.
What is MSH1 responsible for?
It is responsible for the unusually low mutation rates in plant organelle genomes. It is dual-targeted to chloroplasts and mitochondria, and it mediates efficient recognition and correction of DNA sequence errors.
What is the structure of plant mtDNAs?
Electrophoresis and microscopy studies suggest that genome-size mtDNA circles are rare or absent. Many repeated sequences are present, which enables homologous recombination and leads to highly variable structural organisation.
What is the organisation of most cpDNAs?
A long single-copy region (LSC), a short single-copy region (SSC), and two inverted repeats (IR).
What are losses of the mitochondrial genome documented in?
Anaerobic microbes, resulting in hydrogenosomes and mitosomes.
Where has loss of the plastid genome but retention of the plastid compartment been documented?
In a very small number of plants and algae.
How is human mitochondrial disease inherited?
It is inherited maternally.
What are the 2 different approaches to mitochondrial replacement therapy and what is the difference?
- maternal spindle transfer
- pronuclear transfer.
The difference is whether it is carried out before or after fertilisation.
What is the size of cpDNAs?
They are larger than mammalian mtDNAs, but smaller than plant mtDNAs.
What is Sanger sequencing?
It was developed by Fred Sanger in 1977 and it used radioactively labelled ddNTPs in four independent reactions, one for each of the radioactive base analogues.
What does current Sanger sequencing use as a marker?
It uses fluorescent tags instead of radioactive labelling.
What is the read length for Sanger sequencing?
Up to 1000 bp per read.
How did Illumina sequencing originate?
In the mid 90s, Shankar Balasubramanian and David Klenerman realised that their work imaging the action of single polymerase molecules could be the basis for a new sequencing reaction, by imaging the energy of the fluorescence emitted by the chemistry of the extension reaction.
What are the advantages of next generation sequencing?
- In vitro library preparation
- In vitro clonal amplification
- highly parallel, limited only by the size of sequencing features and imaging limitations
- low reagent volume ratios per sequencing feature.
What are the impacts of genome sequencing?
- Medical/personal/human population genomics
- Metagenomics
- Environmental genomics
- Evolutionary/population genomics
- Understanding gene regulation mechanisms and the genome at new depths.
What is the purpose of genome re-sequencing?
To detect variation and inform on mechanisms underpinning phenotype.
What are “mate pairs”?
Circularised fragments of >1 kb pieces or “conformation capture” bring more distant parts of the genome together, so that their ends appear in the same sequenced fragment.
What indicates structural change?
When the expected distance rules between paired reads are broken consistently across the reads covering a region.
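As a rough illustration of the distance rule in the answer above, the sketch below flags read pairs whose mapped separation falls outside an expected range. The function name, the tuple layout of `pairs`, and the cutoff of 3 standard deviations are assumptions made for this example, not values from the course material.

```python
def flag_discordant_pairs(pairs, expected_mean, expected_sd, n_sd=3.0):
    """Return (read_id, distance) for pairs that break the expected distance rule.

    pairs: list of (read_id, left_pos, right_pos) tuples (hypothetical layout).
    """
    low = expected_mean - n_sd * expected_sd
    high = expected_mean + n_sd * expected_sd
    discordant = []
    for read_id, left_pos, right_pos in pairs:
        distance = abs(right_pos - left_pos)
        if distance < low or distance > high:
            discordant.append((read_id, distance))
    return discordant


# Toy data: three pairs map about 500 bp apart, one maps 6000 bp apart.
pairs = [
    ("pair1", 1000, 1500),
    ("pair2", 2000, 2480),
    ("pair3", 3000, 9000),  # ends map much further apart than expected
    ("pair4", 4000, 4510),
]
flagged = flag_discordant_pairs(pairs, expected_mean=500, expected_sd=50)
print(flagged)
```

A single flagged pair could simply be a mapping error, so in practice one would look for many discordant pairs supporting the same region before calling a structural change.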
What are the disadvantages of de-novo genome assembly?
- highly complicated.
- technical problems: biases in library preparation, biases in sequencing profile after amplification, sequencing error rates.
- assembly/information problems: polymorphisms and repetitive regions.
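To see why repetitive regions are an information problem for assembly, the sketch below counts k-mers that occur more than once in a toy sequence. Any overlap built from such a k-mer is ambiguous, because the assembler cannot tell which copy a read came from. The sequence, the function name and the choice of k = 6 are made up for illustration.

```python
from collections import Counter


def repeated_kmers(sequence, k):
    """Return a dict of k-mers that occur more than once, with their counts."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n > 1}


# Toy genome containing the 8 bp repeat GCGTACGT at two positions.
genome = "ATGCGTACGTTAGCGTACGTCCA"
repeats = repeated_kmers(genome, k=6)
print(repeats)
```

Reads shorter than the repeat cannot resolve which flanking sequence belongs to which copy, which is one reason long reads give better assemblies.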
What is the result of whole genome sequencing of cells in early S-phase?
It results in over-representation of sequence at origins of replication.
What is seen when reads are mapped to the genome after high-coverage sequencing?
One sees peaks of reads at replication origins.
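A minimal sketch of how such peaks might be picked out from mapped reads: bin the read start positions and report bins whose count is well above the typical bin. The bin size, the 3-fold threshold and the function name are assumptions for this example, and bins with no reads at all are ignored when computing the median, which is a simplification.

```python
from collections import Counter
from statistics import median


def find_coverage_peaks(read_starts, bin_size=1000, fold=3.0):
    """Return start coordinates of bins whose read count exceeds fold x the median bin."""
    counts = Counter(pos // bin_size for pos in read_starts)
    typical = median(counts.values())
    return sorted(b * bin_size for b, n in counts.items() if n > fold * typical)


# Toy data: one read in most 1 kb bins, plus a pile-up of reads at 2000-2999.
read_starts = [100, 1200, 2300, 3400, 4500] + [2000 + i for i in range(0, 900, 100)]
peaks = find_coverage_peaks(read_starts)
print(peaks)
```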
What is Tn5 transposase used for?
It is used to insert sequencing-compatible sequences into the genome; it then dissociates, leaving the inserted sequences behind. It often inserts into open chromatin.
How can cytosine be methylated and why is it essential?
It is methylated at its 5th carbon, and this is essential to development, as loss of any of the mammalian cytosine methyltransferases is lethal.
What roles of cytosine methylation have been established through pre-NGS studies of individual loci?
- Imprinting
- Retrotransposon silencing
- X chromosome inactivation.
What are the steps in Chromatin Immuno-Precipitation-sequencing?
- crosslink proteins to DNA
- fragment DNA
- use antibodies to pull down DNA bound to nucleosomes carrying the histone mark of interest
- de-crosslink DNA from proteins
- make sequencing library and sequence.
How are genomes now being sequenced?
They are being sequenced using technologies that produce long reads.
What are the benefits of long read sequencing?
- overcomes problems from repetitive regions
- allows for structural variation to be detected more directly
- much better assemblies
- easier to get high quality assemblies of new complex genomes
- epigenetic base modification can be read directly
- simpler library preps
- ability to sequence impure environmental samples more directly.
What does Pacific Biosciences allow for and what are the pros and cons?
It allows for Single Molecule Real Time sequencing. The reads are longer, 5 kb on average and up to 15 kb. The error rate is higher, but the method allows detection of structural variation and of base modifications.
What are the advantages of Oxford Nanopore?
- it can read lots of different lengths
- it is portable
- it can convey structure information
- the pores are addressable and programmable on the array
- longest read is over 2 million bp.
What are the disadvantages of Oxford Nanopore?
- it is limited by the size of the molecules going into the machine
- can’t sequence genomes at very high accuracy yet
- some technical issues.
What is the Tree of Life program?
It is a project trying to sequence 60,000 eukaryotes in Britain and Ireland.
What can single-celled genomes tell us about the origin of animals?
The unicellular ancestor of animals had a complex repertoire of genes linked to multicellular processes, suggesting that changes in the regulatory genome were key to the origin of animals.