Eukaryotic genomes and their evolution Flashcards
How big is the human genome?
Genome = set of genetic material (DNA) present in a cell or organism
Also includes non-coding sections
About 3 billion base pairs per haploid genome; the two copies in a diploid cell add up to roughly 2 metres of DNA
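The 2-metre figure can be checked with quick arithmetic. This is a minimal sketch, assuming a rise of 0.34 nm per base pair (the usual value for B-form DNA, which the cards do not state) and that the 2 metres refers to both genome copies in a diploid cell.

```python
# Assumption (not from the cards): 0.34 nm of length per base pair (B-form DNA).
RISE_PER_BP_NM = 0.34
HAPLOID_GENOME_BP = 3_000_000_000  # 3 billion base pairs, as on the card


def dna_length_m(base_pairs, rise_nm=RISE_PER_BP_NM):
    """Return the stretched-out length of a DNA molecule in metres."""
    return base_pairs * rise_nm * 1e-9  # convert nanometres to metres


haploid_m = dna_length_m(HAPLOID_GENOME_BP)
diploid_m = dna_length_m(2 * HAPLOID_GENOME_BP)
print(f"haploid: {haploid_m:.2f} m, diploid: {diploid_m:.2f} m")
# prints "haploid: 1.02 m, diploid: 2.04 m"
```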
Explain variability in genomes between humans and bacteria
There is a lot of variability in the composition and size of genomes between species
The human genome is mostly non-coding DNA
Bacterial genomes are mostly coding DNA
What is the C-value paradox?
Genome size doesn’t correlate with organismal complexity
Example: an amoeba has about 600 billion base pairs, but a human has only about 3 billion
The amoeba is less complex but has a much larger genome than humans
Gene number is also highly variable and does not track complexity
A worm (C. elegans) has roughly the same number of genes as a human but is much simpler
What is the composition of the human genome?
About 1% of the human genome consists of exons (coding DNA that is translated into protein)
About 24% is introns
Exons make up only about 4-5% of each gene, so genes (exons + introns) make up about 25% of the genome
The human genome has about 20,000 protein-coding genes
Repetitive DNA (mostly transposable elements): just under 50%
Regulatory elements (in introns and other intergenic DNA): switches that activate/deactivate genes
The non-coding genome contains about 1 million enhancers
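The percentages on this card can be tied together with a short calculation. This is a rough sketch using only the card's rounded figures (1% exons, 24% introns, 20,000 genes, 3 billion base pairs); the "average gene span" it derives is an illustration from those numbers, not a measured value.

```python
GENOME_BP = 3_000_000_000   # haploid genome size from the cards
EXON_PCT = 1                # % of genome that is exon
INTRON_PCT = 24             # % of genome that is intron
GENE_COUNT = 20_000         # number of genes from the cards

# Genes are exons plus introns.
gene_pct = EXON_PCT + INTRON_PCT

# Share of a typical gene's length that is exon.
exon_share_of_gene_pct = 100 * EXON_PCT / gene_pct

# Average genomic span per gene implied by the rounded numbers above.
avg_gene_span_bp = GENOME_BP * gene_pct // 100 // GENE_COUNT

print(gene_pct, exon_share_of_gene_pct, avg_gene_span_bp)
# prints "25 4.0 37500"
```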
What is gene duplication and what does it lead to?
Gene duplication is how new genes evolve
38% of human genes are derived from gene duplication
Gene duplication leads to gene families (paralogous genes)
Paralogues are sister genes that share a common ancestor
Very similar sequence
Found in the same genome
Can be found on different chromosomes or on the same chromosome
May be clustered together or dispersed through the genome, with diverse functions
Some copies degenerate into pseudogenes: they come from the same ancestor but have lost their function
Paralogous vs Orthologous genes
Paralogous: 2 sister genes or gene clusters in the same organism; they arise from gene duplication, are structurally similar, and come from a common ancestor but have diverged since
Orthologous: the same gene found in 2 different genomes, separated by speciation and usually keeping the same function. Example: humans and chimpanzees both have a specific gene (usually with the same name)
Why is gene duplication rewarded by evolution?
More protein production
If one copy stops working, its sister gene can still produce a similar protein
What is synteny?
Pieces of genome/chromosomal regions of different species where homologous genes occur in the same order
Syntenic regions come from the same ancestral region
Example: between the mouse and human genomes, most functional genes lie in syntenic regions
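Conserved gene order can be illustrated with a toy search for the longest run of genes that appears in the same order in two genomes. This is only a sketch: the gene names and orders below are hypothetical, and real synteny tools also handle inversions, gaps and duplicated genes.

```python
def longest_syntenic_block(order_a, order_b):
    """Return the longest run of genes that is contiguous and in the same
    order in both gene lists (a toy definition of a syntenic block)."""
    best = []
    for i in range(len(order_a)):
        for j in range(len(order_b)):
            k = 0
            # Extend the match while both lists agree gene by gene.
            while (i + k < len(order_a) and j + k < len(order_b)
                   and order_a[i + k] == order_b[j + k]):
                k += 1
            if k > len(best):
                best = order_a[i:i + k]
    return best


# Hypothetical gene orders along one chromosome region in each species.
human_order = ["geneA", "geneB", "geneC", "geneD", "geneE"]
mouse_order = ["geneX", "geneB", "geneC", "geneD", "geneA"]

print(longest_syntenic_block(human_order, mouse_order))
# prints ['geneB', 'geneC', 'geneD']
```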
Explain the two different approaches to the human genome project
Public consortium (Watson/Collins) approach: planned to sequence the genome in 15 years at a cost of 3 billion dollars
Celera Genomics (private) aimed to sequence it in 3 years for about 300 million dollars, using shotgun sequencing
The HGP was declared complete in 2003
But about 8% of the genome was still unsequenced at that point, mostly heterochromatin
Now next-generation sequencing techniques (e.g. Illumina) sequence genomes quickly and cheaply
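The idea behind shotgun sequencing (break the DNA into short overlapping reads, then reassemble them by their overlaps) can be sketched with a toy greedy assembler. This is an illustration only, not how Celera's assembler worked: the reads are invented, error-free and repeat-free, and `min_len` is an assumed minimum overlap.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining overlaps: leave the contigs separate
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical reads sampled from the sequence ATGCGTACGTTAGC.
reads = ["ATGCGTAC", "GTACGTTA", "GTTAGC"]
print(greedy_assemble(reads))
# prints ['ATGCGTACGTTAGC']
```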
How are genomic elements conserved among species? How can we use bioinformatics?
Conservation between species varies depending on what we are looking at: coding genes, enhancers/promoters, transcription factor binding sites
Bioinformatics: Uses sequence alignment tools to study conservation of the genome
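A basic conservation measure that alignment tools report is percent identity. The sketch below assumes the two sequences are already aligned to the same length (gaps written as "-"); the sequences are made up for illustration, and real tools first compute the alignment itself.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions where both sequences carry the same base.
    Gap characters ("-") never count as a match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    return 100.0 * matches / len(seq_a)


# Hypothetical aligned fragments of the same gene in two species.
species_1 = "ATGCCGTA"
species_2 = "ATGACGTA"
print(percent_identity(species_1, species_2))
# prints 87.5
```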
How are coding genes conserved between species?
Sequence conservation predicts conservation of function
Orthologues are the most likely to retain the common ancestral function
80% of human genes are found in mice
So a specific disease gene can be expressed in mice to study its effect
Mice are used as model organisms
How are regulatory elements conserved between species (transcription factors)?
The rule that sequence conservation predicts conserved function does not apply to cis-regulatory elements
Transcription factor binding preferences and binding-site motifs are conserved
But only a small amount of actual transcription factor binding is conserved among species
How are enhancers conserved across species?
Enhancers with conserved sequences across species are NOT equally functional
Most enhancers are not functional across species
80% of human genes are conserved in mice
But humans and mice have different enhancers that regulate which genes are expressed
So the two species function differently even though the genes are the same
This also applies to primates
Humans and chimpanzees are 98% genetically similar but have different enhancers
Compare genome similarity of Humans vs Chimps
About 1% sequence divergence within shared genes (roughly 98-99% identical)
6% of genes are not shared between humans and chimps
Large amount of loss and gain of genes since evolutionary split
Human chromosome 2 is a result of the fusion of the chimp chromosomes 2A and 2B
Humans have lost many olfactory genes (humans don’t need to smell as much)
What are molecular clocks?
LUCA (the last universal common ancestor of all living organisms) lived about 3.8 billion years ago
We know this from molecular clocks
A molecular clock uses fossils and the rate of mutation to deduce when species diverged
Nucleotide or amino acid sequences are compared among species to date when they last shared a common ancestor
The rate of mutation is assumed to be constant over time
Rate may differ from gene to gene
Genes that are responsible for basic functions mutate more slowly
Mitochondria originated by endosymbiosis: a bacterium was incorporated into the cell and retained because of its essential function
The mitochondrial genome is used to measure mutation rate, as its mutation rate is assumed to be constant
Effects of these mutations are mostly neutral
Circular DNA with only a few genes
Inherit mitochondrial DNA from the mother without recombination
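The molecular clock reasoning reduces to one formula: if two lineages differ by d substitutions per site and each lineage mutates at a constant rate r per site per year, the time since they split is T = d / (2r), because both lineages accumulate changes. The sketch below uses invented numbers for d and r purely to show the arithmetic.

```python
def p_distance(seq_a, seq_b):
    """Fraction of aligned sites that differ between two sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def divergence_time_years(distance, rate_per_site_per_year):
    """T = d / (2r): both lineages have been accumulating mutations since the split."""
    return distance / (2 * rate_per_site_per_year)


# Hypothetical values: 2% sequence difference, 1e-8 substitutions/site/year.
t = divergence_time_years(0.02, 1e-8)
print(f"{t:,.0f} years")
# prints "1,000,000 years"
```

In practice the rate is calibrated against a split dated from fossils, which is where the fossil evidence on this card comes in.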
What are mitochondrial haplogroups?
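Haplogroup: a group defined by specific mutations present in mitochondrial DNA, inherited from a shared maternal ancestor
The most recent common maternal ancestor of all living humans (Mitochondrial Eve) lived about 200,000 years ago in Africa
Supports the out-of-Africa hypothesis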
How does a genome acquire new genes?
Horizontal gene transfer
Exon shuffling
Duplication and divergence - this is very rare (1% chance for 1 gene in 1 million years)
What are the 3 different outcomes of gene duplication?
Duplication of one gene leads to 2 similar genes
Selective pressure on both genes: genes stay similar (More genes = more proteins)
Selective pressure on just one of the genes: one copy degrades (Accumulates mutations and generates pseudogenes)
Selective pressure on just one of the genes: one copy acquires a new function (the gene is important, but the spare copy can tolerate change. Sub-functionalization: the new copy of the gene is slightly different = specialization)
How does gene duplication occur during DNA replication/meiosis?
Gene duplication can occur during chromosomal recombination (crossing over)
Crossing over occurs during meiosis and leads to new combinations of alleles
Misaligned chromatid pairing (unequal crossing over) leads to duplication of regions
Duplication can also occur during DNA replication through DNA polymerase slippage
DNA replication occurs via DNA polymerase
Ex. 15 CA repeats originally
Polymerase pauses in CA repeat domain
Newly formed strand melts and reanneals incorrectly (slipping)
Mutation is repaired incorrectly = duplication
Ex. now 17 CA repeats
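The CA-repeat example can be made concrete by counting repeat units before and after slippage. This is a small sketch: the flanking bases are made up, and only the 15 and 17 repeat counts come from the card.

```python
import re


def longest_ca_run(seq):
    """Number of CA units in the longest uninterrupted (CA)n run."""
    runs = re.findall(r"(?:CA)+", seq)
    return max((len(run) // 2 for run in runs), default=0)


# Hypothetical flanking sequence around the repeat tract.
template = "GGT" + "CA" * 15 + "TTG"   # original strand: 15 CA repeats
slipped = "GGT" + "CA" * 17 + "TTG"    # after slippage: 17 CA repeats

gained_units = longest_ca_run(slipped) - longest_ca_run(template)
print(longest_ca_run(template), longest_ca_run(slipped), gained_units)
# prints "15 17 2"
```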
What is Neo (sub) functionalization? Give an example
After gene duplication, two genes with identical function are unlikely to be maintained in the genome
Each daughter gene adopts a part of the function of the parental gene
Changes occur in expression pattern of two genes
Gains mutations
Leads to genes having similar but not identical functions (specialization)
Genes are expressed at different times and in different cell types
Example: trypsin vs chymotrypsin
Duplicated 1500 million years ago
Both are proteases
Trypsin: cuts after arginine and lysine
Chymotrypsin: cuts after phenylalanine, tryptophan and tyrosine
Example: transcription factor families (SOX genes)
Many paralogues of SOX with similar functions
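The trypsin vs chymotrypsin example shows how two paralogues keep the same job (cutting proteins) but specialise in where they cut. A minimal sketch, assuming each enzyme cuts on the C-terminal side of its target residues and ignoring real-world exceptions (such as a following proline); the peptide is invented.

```python
# One-letter amino acid codes: K = lysine, R = arginine,
# F = phenylalanine, W = tryptophan, Y = tyrosine.
TRYPSIN_SITES = set("KR")
CHYMOTRYPSIN_SITES = set("FWY")


def digest(protein, cut_after):
    """Split a protein after every residue found in cut_after."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        if residue in cut_after:
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # keep the uncut C-terminal piece
    return peptides


peptide = "MAKWGRFLYE"  # hypothetical substrate
print(digest(peptide, TRYPSIN_SITES))       # prints ['MAK', 'WGR', 'FLYE']
print(digest(peptide, CHYMOTRYPSIN_SITES))  # prints ['MAKW', 'GRF', 'LY', 'E']
```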
What are pseudogenes?
Pseudogenes: gene duplicates and one copy completely degrades
Occurs in the first million years after duplication if the gene is not under selection
Gene duplication generates functional redundancy
It is not advantageous to keep identical copies of the same gene
Mutations disrupting structure and function are not deleterious, because the other copy still works
They accumulate until the gene becomes a non-functional pseudogene
Time frame = 4 million years
Pseudogenes can still be transcribed to mRNA but will not produce a functional protein
What are the non-processed pseudogenes?
Tandem duplication of genomic region (from a normal duplication event)
1 copy faces lack of selection
Inactivating mutations or incomplete duplication
Missing regulatory regions
What are processed pseudogenes?
Reverse transcriptase activity (LINE, retrovirus, transposons): parasitic elements with a copy paste mechanism
Gene is transcribed to RNA
RNA is reverse transcribed to cDNA and re-integrated into the genome
Lack of regulatory regions/introns (mRNA source) = non functional
Contain polyA tail/flanking repeats (responsible for transcription termination)
Can integrate into the same or different chromosome
What are ribosomal protein pseudogenes in humans and how are they conserved across primates?
20,000 human pseudogenes in genome
Many are ribosomal protein pseudogenes
Large family (2000 copies)
Processed pseudogenes
Formed through the activity of specific L1 retrotransposons
Highly transcribed / high expression rate
Highly conserved across primates
2/3 of human RP pseudogenes are also in the chimpanzee genome
<12 are shared with rodents
Implies recent origin
What are multigene families?
Multigene family: a group of similar genes, formed when duplication is beneficial
Genes in a family can have slightly different functions, so they become specialized
Example: rRNA genes (Mycoplasma genitalium: 2 copies; Xenopus laevis: > 500 copies)
Tandem gene family: members of multigene family are on the same chromosome
Dispersed gene family: members of the multigene family are on different chromosomes
What are HOX genes and what is their function?
HOX genes are a multigene family
They encode homeotic proteins
These are transcription factors that bind DNA and can regulate activation or inactivation of genes during embryonic development
Important for development and patterning of limbs / appendages
Control pattern of body formation during early embryonic development
Control compartmentalization / regionalization of body parts in animals along head to tail (anterior-posterior) axis
What is the homeodomain in HOX genes?
Homeodomain / homeobox / HOX
Domain: functional unit of a protein
60 amino acid protein domain that forms a helix-turn-helix; highly conserved in animals
It is the DNA-binding domain of HOX proteins
Zinc finger domains are another type of domain that is also important for DNA binding
How was the role of Antennapedia discovered?
The normal Antennapedia gene is expressed in the second segment of a fly’s thorax and helps in the development of the second pair of legs
A mutation changes where the gene is expressed and causes legs to grow from the fly’s head in place of the antennae
What matters is not how much a gene is expressed but where it is expressed
If HOX TFs are expressed in the wrong location, appendages / limbs grow in the wrong place
Homeotic = something has changed to resemble something else
Explain the composition and function of HOX genes in insects
Insects have one cluster of HOX genes consisting of 8 genes
8 genes are expressed in a specific region of the body
Cluster is split into 2 complexes
Antennapedia complex: 5 genes responsible for the head and the first and second thoracic segments
Bithorax complex: 3 genes responsible for the third thoracic segment and the 8 abdominal segments
Homeotic transformations in insects: mutations in insect HOX genes result in one body segment taking on the identity of another
Explain human HOX genes
Humans have 4 clusters of HOX genes
The clusters cover 13 paralogue groups, but no cluster keeps all 13, so humans have 39 HOX genes in total
Each cluster is on a different chromosome (4 in total)
HOXA, HOXB, HOXC, HOXD (HOXA1, HOXA2, etc. for gene number)
Gene duplication and neo-functionalization led to 39 HOX TFs in humans = specialization = more complex structure / function in humans than in insects
How are HOX genes conserved between species?
HOX genes are conserved between Drosophila and humans
Many human HOX genes were already present in Drosophila (ancestral versions)
But neo-functionalization allows humans to be more complex than a fly (8 vs 39 genes)
Give some examples of mutation of HOX genes and their impact
HOXD13: patterning of fingers is impaired
HOXA2: impacts ear development
HOXB1: eye and face development
What is the HOX vertebrate common ancestor?
Branchiostoma lanceolatum (amphioxus): an invertebrate chordate with 1 cluster of HOX genes, used as a stand-in for the ancestor of vertebrates, including humans
Marine fish-like chordate (not itself a vertebrate)
Displays features of last common ancestor
1 cluster (15 genes) of HOX genes: barely has an appendage/mouth
What HOX genes does a Sea lamprey display?
Sea lamprey: oldest vertebrate that has 4 clusters of HOX genes like humans
Before increase in body plan complexity
More HOX genes = more complexity
What is genome duplication? Why is it more tolerated than single chromosome duplication?
Duplications larger than genes and segments are possible
Genome duplication: duplicating the entire genome (incl. transposons, regulatory elements)
One singular chromosome duplication is not tolerated well
Examples: Down syndrome (trisomy 21), Edwards syndrome (trisomy 18), Patau syndrome (trisomy 13)
Leads to gene product imbalance and reduced life expectancy
Whole genome duplications (WGD) could be a source of speciation
Duplicating the entire genome is more tolerated
Diploid eukaryotes contain 2 haploid chromosome sets
Polyploidy: having more than two complete sets of chromosomes (the entire genome is duplicated, not only 1 chromosome)
Where is polyploidy common and what are the 2 types?
Polyploidy is widespread in plants
80% of flowering plant species originated via polyploidy
Ex. oats, cotton, potatoes, banana, coffee
Polyploidy is common in invertebrates, fish and amphibians but rare in mammals
2 main types of polyploidy
Autopolyploidy: happens within the same species. A mistake during meiosis makes diploid gametes instead of haploid gametes, so the offspring have 4 copies of each chromosome instead of 2
Allopolyploidy: occurs between different species, hybrid reproduction
What is autopolyploidy and what issues does it produce?
Multiplication of identical chromosome sets within a single (sub)species
Fertilisation by unreduced gametes
Error in meiosis accidentally produces diploid gametes
1-40% frequency of formation
Very common in plants
Can reproduce successfully with each other but can’t breed with the parent species: a 2n gamete plus an n gamete gives 3n offspring, which are usually sterile
Allows speciation
Autopolyploids are more viable than allopolyploids (especially in plants) because each chromosome has a homologous partner and can form a bivalent in meiosis
Issues = can induce disease symptoms
Genomic shock: widespread changes in transposon activation, gene expression and recombination
Elements that are meant to stay repressed become activated, and vice versa
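The gamete arithmetic behind the breeding barrier (2n + n = 3n) can be written out directly. This is a deliberately simplified sketch: it treats any odd number of chromosome sets as unable to pair evenly in meiosis, which is a rule of thumb rather than a full model of fertility.

```python
def offspring_ploidy(gamete_sets_a, gamete_sets_b):
    """Chromosome sets in the offspring = sum of the sets in the two gametes."""
    return gamete_sets_a + gamete_sets_b


def pairs_evenly_in_meiosis(ploidy):
    """Simplifying assumption: only an even number of sets can pair up evenly."""
    return ploidy % 2 == 0


# A tetraploid (4 sets) makes gametes with 2 sets; a diploid makes gametes with 1 set.
tetraploid_x_tetraploid = offspring_ploidy(2, 2)
tetraploid_x_diploid = offspring_ploidy(2, 1)

print(tetraploid_x_tetraploid, pairs_evenly_in_meiosis(tetraploid_x_tetraploid))
# prints "4 True"
print(tetraploid_x_diploid, pairs_evenly_in_meiosis(tetraploid_x_diploid))
# prints "3 False"
```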
What is allopolyploidy?
Hybridization between 2 reproductively compatible species that are very similar, e.g. species that only recently split in evolution
One step model (most common route): one or both parents have unreduced (diploid) gametes due to an error in meiosis = polyploid offspring (diploid + diploid = tetraploid)
Two step model: hybridization between haploid gametes followed by somatic doubling (a duplication event after mating)
What are the benefits of whole genome duplication?
Raw material for evolutionary diversification
Functional gene divergence
Defence against mutation (if one gene loses its function, another gene can take over that function)
Buffer against environment (and extinction)
Colonise new environments
Fitness consequences (Increases cell size, Organ size, Faster growth, Dosage regulated gene expression)
Locus
Each gene has a locus, which is its specific position on a pair of homologous chromosomes
Allele
Alternative form of a gene. Each parent donates one allele for every gene
Homozygous
Alleles are identical: the same genetic variant is present in both alleles at a gene locus
Heterozygous
Alleles are different: different genetic variants are present in the two alleles at a gene locus
Genotype
Combination of two alleles (maternal and paternal) for each gene
Dominant alleles
Always written in upper case. The allele that is expressed if the two alleles are different
Recessive alleles
Always written in lower case. Masked if the two alleles are different
Phenotype
Physical manifestation of genotype
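These definitions fit together in a few lines of code. The sketch below assumes a single gene with complete dominance and the upper/lower case convention from the cards; the gene letter "A" is arbitrary.

```python
from itertools import product


def zygosity(genotype):
    """'AA' or 'aa' is homozygous; 'Aa' is heterozygous."""
    first, second = genotype
    return "homozygous" if first == second else "heterozygous"


def phenotype(genotype):
    """With complete dominance, one upper-case allele is enough to show the dominant trait."""
    return "dominant" if any(allele.isupper() for allele in genotype) else "recessive"


def cross(parent_1, parent_2):
    """All equally likely offspring genotypes: one allele from each parent."""
    return ["".join(sorted(pair)) for pair in product(parent_1, parent_2)]


offspring = cross("Aa", "Aa")
print(offspring)
# prints ['AA', 'Aa', 'Aa', 'aa']
print([phenotype(g) for g in offspring].count("dominant"), "of", len(offspring), "show the dominant phenotype")
# prints "3 of 4 show the dominant phenotype"
```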
What is an SNP and how often do they occur?
SNPs: DNA sequence variations that occur when a single nucleotide (A, T, C, G) in the genome sequence is altered
Example: AATCGAC –> AAGCGAC
For a variation to be considered an SNP, it must occur in at least 1% of the population
SNPs make up 90% of all human genetic variation
SNPs occur approximately every 1000 bases
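The example substitution and the "one SNP per 1000 bases" figure can both be checked in code. A minimal sketch: it only finds single-base differences between two equal-length sequences, and a difference found this way counts as an SNP only if it is also present in at least 1% of the population, which a two-sequence comparison cannot show.

```python
def find_single_base_differences(reference, sample):
    """Return (1-based position, reference base, sample base) for each mismatch."""
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length")
    return [(i + 1, r, s)
            for i, (r, s) in enumerate(zip(reference, sample))
            if r != s]


print(find_single_base_differences("AATCGAC", "AAGCGAC"))
# prints [(3, 'T', 'G')]

# Rough genome-wide estimate from the card's figures.
GENOME_BP = 3_000_000_000
BASES_PER_SNP = 1000
expected_snps = GENOME_BP // BASES_PER_SNP
print(expected_snps)
# prints 3000000
```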
Why are SNPs important?
Can affect how humans develop diseases
Can affect how an individual responds to pathogens
Can affect how an individual responds to drugs, etc
In biomedical research for comparing regions of the genome between cohorts
Where do SNPs occur in the genome?
Intergenic region: in a transcription factor binding site or an enhancer/regulatory sequence
In a promoter or transcription factor binding region
In an exon: affects the amino acid sequence = affects the protein (example: a premature stop codon truncates the protein)
In an intron: can hit a regulatory region (example: a mutation in a splice site affects splicing)
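The exon case (a single-base change creating a premature stop codon) can be shown with a tiny translation routine. This is a sketch under stated assumptions: the coding sequence is invented, and the codon table lists only the handful of codons the example needs (they follow the standard genetic code).

```python
# Partial codon table (standard genetic code); "*" marks a stop codon.
CODON_TABLE = {
    "ATG": "M", "GCT": "A", "TGG": "W", "AAA": "K",
    "TAA": "*", "TAG": "*", "TGA": "*",
}


def translate(coding_dna):
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        amino_acid = CODON_TABLE[coding_dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


normal = "ATGGCTTGGAAATAA"   # ATG GCT TGG AAA TAA
mutant = "ATGGCTTGAAAATAA"   # one base changed: TGG (Trp) becomes TGA (stop)

print(translate(normal))  # prints MAWK
print(translate(mutant))  # prints MA
```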