genomes and evolution Flashcards
Brief history of genome discoveries (19th to mid-twentieth century)
Gregor Mendel (1822-1884)
Showed units of inheritance are discrete, don’t blend, and persist over time
Used this concept to predict the phenotype of pea plants
Law of segregation: the two alleles at a (diploid) locus separate during the formation of gametes (meiosis), and unite at random at fertilisation (in the zygote)
Law of independent assortment: alleles at different loci separate independently of one another during the formation of gametes, so that traits are transmitted to offspring independently of each other
Reginald C. Punnett (1905)
Introduced the Punnett square in 1905
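A minimal sketch of how a Punnett square applies the law of segregation to a single locus. The function name `punnett_square` and the two-letter genotype strings are assumptions made for illustration; they are not taken from the notes.

```python
from collections import Counter
from itertools import product


def punnett_square(parent1: str, parent2: str) -> Counter:
    """Count offspring genotypes for a single-locus cross.

    Each parent is a two-letter genotype such as "Aa". Each letter is one
    allele, and each gamete carries one of the two with equal chance.
    """
    offspring = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Sort so that "aA" and "Aa" are counted as the same genotype
        # (uppercase letters sort before lowercase ones).
        genotype = "".join(sorted(allele1 + allele2))
        offspring[genotype] += 1
    return offspring


cross = punnett_square("Aa", "Aa")
for genotype, count in sorted(cross.items()):
    print(genotype, f"{count}/4")
```

For an Aa x Aa cross this gives the familiar 1 AA : 2 Aa : 1 aa genotype ratio.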
Thomas Hunt Morgan (1866 –1945)
Nobel Prize in Physiology or Medicine 1933 for discoveries concerning the role the chromosome plays in heredity
Through his studies of mutations in fruit flies he demonstrated that genes are carried on chromosomes, which are the mechanical basis of heredity
In particular, showed in 1911 that the white-eye mutant in Drosophila was sex-linked, and inferred the link to chromosomes from that.
Frederick Griffith (1928)
Non-virulent bacteria injection – mouse survives
virulent bacteria injection – mouse dies
virulent bacteria killed by heat – mouse survives
Non-virulent bacteria + heat-treated virulent bacteria injection – mouse dies – as material from the dead virulent bacteria transforms the live non-virulent ones (later shown to be DNA transferred from one to the other)
Oswald Avery (1877-1955)
Built on the work by Frederick Griffith
Showed that DNA is the ‘transforming principle’, and that the polysaccharide coat of a bacterium was not – by testing isolated components of the cell in transformation experiments (Avery et al. 1944).
James Watson & Francis Crick (1953)
Building on the X-ray crystallography work of Rosalind Franklin and Maurice Wilkins, determined the double-helix structure of DNA in 1953
The composition and nature of nuclear genomes
Nuclear genomes vary a great deal in size, and are divided into chromosomes in eukaryotes.
Weight of genome corresponds to number of chromosome pairs and hence to genome size
There are likely many reasons for variation in genome size, but one correlation within taxa is with body size.
->See: Gregory et al. (2000). Evolutionary implications of the relationship between genome size and body size in flatworms and copepods. Heredity 84: 201-208.
However, beyond specific taxa, this relationship breaks down
Non-coding DNA
Much of non-coding DNA is repetitive
Protein-coding genes make up just ~1.5% of the human genome and this is similar throughout eukaryotes
Genome sequencing
In 2001 the first draft of a full human genome was compiled, although it actually covered only ~90% of the genome. This took years, a whole building full of machines and billions of dollars
In 2010 the 1000 Genomes Project was carried out
In 2018 the 100,000 Genomes Project was completed in the UK and 1 million genomes had been sequenced worldwide – the technology had improved rapidly
However, these genomes tended to have gaps, e.g. where repetitive regions were missed.
Now it is possible to sequence whole genomes end to end using the telomere-to-telomere (T2T) approach
The first human genome sequenced telomere to telomere was published in 2022
-> See: Nurk et al. (2022) Science 376: 44–53: the first gapless, complete human genome
Currently the Earth BioGenome Project is trying to sequence the genomes of the ~1.5 million known eukaryote species; nearly 2000 species have been sequenced so far.
What is the rest of the genome doing if it is not coding?
Scientists looked at RNA transcription, observing which parts of the genome produce RNA, to see whether non-coding regions were governing transcription
Discovered that ~75% of the genome is transcribed into RNA
The ENCODE Project came together to look at the different classes of RNA product being produced
See: www.pnas.org/cgi/doi/10.1073/pnas.1318948111
Also see the ENCODE Project website
According to the ENCODE Project, a lot of the RNA being produced seems to be micro RNA (miRNA)
Micro RNA: ~22 nucleotides long, regulates gene expression at post-transcriptional level (development, apoptosis, metabolism)
Dan Graur disagreed with this 75% functional theory
Mutational load: every mutation that affects a functional sequence has a potential negative effect, so the more of the genome that is functional, the greater the burden of harmful mutations
Once mutational load is considered, the conclusion is that the functional fraction of the human genome cannot exceed 25% and is likely much lower than this
So yes, some proportion of the genome functions alongside the genes to regulate them
However, there is still a lot of so-called ‘junk’ whose purpose we currently do not know
Some of it is probably architectural, whereas other regions are likely to be repetitive sequence
The composition and structure of organelle genomes
Mitochondrial DNA is consistently ~17 kb (~17,000 bp) in animals, including a ~1 kb non-coding ‘control region’
Mitochondrial and chloroplast DNA in plants is far more complicated and variable than mitochondrial DNA in animals
How do genomes change over time? (1)
Triplet codons code for different amino acids, though some amino acids are coded for by more than one codon
Insertions and deletions cause frame shifts – changing all the amino acids coded after them
Genes also contain non-coding areas (‘residues’) that can change without affecting the amino acid produced
Mutation rate tends to be inversely related to the overall population size of the species
Lower in larger populations
Many potential causes of mutation e.g. UV light
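The codon redundancy and frame-shift points above can be sketched in a few lines. This is an illustration only: the short sequences are made up, `translate` is a hypothetical helper, and the table is the standard genetic code written in the compact TCAG ordering.

```python
from itertools import product

# Standard genetic code in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, ignoring any trailing 1-2 bases."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


original = "ATGGCCATTGGTTAA"      # made-up example sequence
synonymous = "ATGGCTATTGGTTAA"    # GCC -> GCT, both code for alanine
shifted = original[:3] + original[4:]  # delete one base after the start codon

print(translate(original))    # MAIG*
print(translate(synonymous))  # MAIG*
print(translate(shifted))     # MPLV
```

The single-base deletion changes every amino acid after the deletion point and also removes the in-frame stop codon, whereas the third-position substitution leaves the protein unchanged.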
How do genomes change over time? (2)
Across taxa and within genomes many different mutations can occur
Areas of high levels of mutation are known as ‘hot spots’
Synonymous change (equivalent substitution) = no change in the amino acid coded
Silent = no visible phenotypic effect
Nonsense = creates a stop codon
Missense = changes the amino acid, and is either:
conservative – doesn’t cause much change in the protein shape
or nonconservative – does cause a change in protein shape
Transition – change from one pyrimidine to another or one purine to another – usually occurs at a higher rate
Transversion – change from a pyrimidine to a purine or vice versa
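A rough sketch of how the categories above could be assigned to a single-base substitution. The function names are hypothetical, and conservative versus nonconservative missense changes are not distinguished here because that would need a table of amino acid properties.

```python
from itertools import product

# Standard genetic code in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
PURINES = {"A", "G"}  # C and T are the pyrimidines


def substitution_type(ref_base: str, alt_base: str) -> str:
    """Transition if both bases are in the same chemical class, else transversion."""
    same_class = (ref_base in PURINES) == (alt_base in PURINES)
    return "transition" if same_class else "transversion"


def codon_effect(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change as synonymous, nonsense or missense."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    # Simplification: the loss of a stop codon also ends up here.
    return "missense"


print(substitution_type("A", "G"), codon_effect("GCC", "GCT"))
print(substitution_type("A", "T"), codon_effect("GAG", "GTG"))
print(substitution_type("G", "A"), codon_effect("TGG", "TGA"))
```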
Repetitive DNA
Simple repeats make up about twice as much of vertebrate nuclear genomes as single-copy coding genes.
Makes up a lot of the genome
Often referred to as satellite DNA because it collected at a separate peak on a centrifuge gradient as a ‘satellite’ to the distribution
Tandem arrays of hundreds or thousands of copies of motifs up to ~500 bp long
Satellite DNA
named for the segregation of a lot of similar sized DNA on a centrifuge gradient as a ‘satellite’ to the distribution
Mini satellite DNA
named this because the length of the repeat arrays was shorter than for ‘satellite DNA’ arrays
Alec Jeffreys developed the technique of ‘DNA fingerprinting’ after discovering these highly variable, mini-satellite DNA arrays – a way to identify individuals
Micro-satellite DNA
Repeat arrays that are even shorter and simpler than mini-satellites
Diethard Tautz first published on this type of locus, which later became the marker of choice for forensic work previously done by mini-satellite ‘DNA fingerprinting’
Identifiable on a gel and vary greatly
Repetitive DNA aka satellite DNA evolves very quickly via DNA turnover mechanisms that generate variation in the length of repeats.
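As an illustration of what a micro-satellite locus looks like in sequence data, here is a small sketch that finds short tandem repeats with a regular expression. The function name, the thresholds (motifs of 1 to 6 bp, at least 4 copies) and the two example alleles are all assumptions chosen for the example.

```python
import re


def find_microsatellites(sequence, min_motif=1, max_motif=6, min_repeats=4):
    """Return (start, motif, copies) for each short tandem repeat found.

    The motif group is matched lazily so the shortest repeating unit is
    reported, e.g. "CA" rather than "CACA".
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    return [
        (m.start(), m.group(1), len(m.group(0)) // len(m.group(1)))
        for m in pattern.finditer(sequence)
    ]


# Two made-up alleles of the same locus that differ only in repeat number.
allele_short = "GGTT" + "CA" * 5 + "TTG"
allele_long = "GGTT" + "CA" * 8 + "TTG"

print(find_microsatellites(allele_short))  # [(4, 'CA', 5)]
print(find_microsatellites(allele_long))   # [(4, 'CA', 8)]
```

The two alleles share the same flanking sequence but differ in length by three CA units, which is the kind of length difference that shows up as separate bands on a gel.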
Gene families
These are repetitive arrays of genes, e.g. ribosomal DNA (rDNA), which is repeated many times because large quantities of ribosomal RNA are needed
DNA turnover mechanisms
DNA Slippage – within chromatids or during replication
Unequal Crossing-over – within repetitive non-coding arrays, or within gene families
Gene Conversion – promoting greater or less diversity
Transposition – ‘Jumping genes’
DNA slippage
within a single chromatid:
a combination of strand breakage, looping and repair could lead to expansion or contraction of the repeat array
slippage during replication:
misalignment within the array during replication, together with excision or repair, leads to expansion or contraction
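The expansion and contraction described above can be caricatured with a toy stepwise model, in which each generation a repeat array may gain or lose one repeat unit. This is only a sketch: the function name, the slip rate and the equal chance of gain and loss are assumptions, not measured values.

```python
import random


def simulate_slippage(start_repeats, generations, slip_rate, seed=None):
    """Track the repeat count of one array through successive replications.

    Each generation a slippage event occurs with probability `slip_rate`
    and adds or removes one repeat unit with equal chance (an assumption).
    The array is never allowed to shrink below one unit.
    """
    rng = random.Random(seed)
    repeats = start_repeats
    history = [repeats]
    for _ in range(generations):
        if rng.random() < slip_rate:
            repeats = max(1, repeats + rng.choice((1, -1)))
        history.append(repeats)
    return history


history = simulate_slippage(start_repeats=10, generations=1000, slip_rate=0.01, seed=1)
print("start:", history[0], "end:", history[-1])
print("range seen:", min(history), "to", max(history))
```

Running it with different seeds gives lineages that drift to different repeat counts, which is one way to picture how length variation builds up at a locus.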