Structure of Genomes I & II (lectures 2-3) Flashcards
Number of genomes sequenced
(February 2024; https://www.ncbi.nlm.nih.gov/genome)
- Eukaryotes: 30,530
- Prokaryotes: 567,228
Why is understanding genome structure important? = 3
1 * Medicine – predisposition to certain diseases, response to drugs
2 * Explanations for evolutionary change
3 * Production of “better” food (plants and animals), fodder, and fuel
Gene Ontology (GO) Annotations (from the website: https://geneontology.org/):
“Associations of gene products to GO terms are statements that describe”: 3
- “Molecular Function: the molecular activities of individual gene products”
- “Cellular Component: where the gene products are active”
- “Biological Process: the pathways and larger processes to which that gene product’s activities contributes”.
Prokaryotic genomes: What are they?
differences?
- Archaea
- Bacteria
Differences:
- different MOLECULAR and GENETIC characteristics
Prokaryotic genomes:
Bacteria and Archaea
Similarities?
1 * no extensive internal compartments
2 * chromosome / nucleoid
3 * plasmid(s)
4 * plasmids are integrative or independent
Prokaryotic genomes: How is DNA packaged? = 6
1 * DNA packaged with DNA-binding proteins
2 * typically circular genomes
3 * negative supercoiling
4 * protein core limits loss of supercoiling if a break occurs
5 * domains
6 * loops ~10 – 100 kb
Escherichia coli nucleoid:
Most is known about the ‘Escherichia coli’ nucleoid, but DNA-binding proteins are found in other species, including archaea.
Prokaryotic genomes cont’d = 5
shape and coiling
- Circular, double-stranded DNA
- Remove a few turns of the double helix
- Molecule forms a negative supercoil
Diagram (see slide): protein core, supercoiled DNA loops, broken loop (no supercoiling)
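The step from "remove a few turns" to "negative supercoil" can be put into numbers with the linking number. The sketch below is only an illustration: it assumes relaxed B-form DNA has about 10.5 bp per helical turn (a standard textbook value, not stated on the card), and the function name and the 4,200 bp example molecule are hypothetical.

```python
def superhelical_density(n_bp, turns_removed, bp_per_turn=10.5):
    """Return sigma = (Lk - Lk0) / Lk0 for a closed circular DNA molecule.

    Lk0 is the linking number of the relaxed molecule (assumed n_bp / 10.5).
    Removing turns lowers Lk below Lk0, so sigma comes out negative.
    """
    lk0 = n_bp / bp_per_turn      # relaxed linking number
    delta_lk = -turns_removed     # turns removed before the circle is closed
    return delta_lk / lk0

# Hypothetical 4,200 bp circle with 24 turns removed
sigma = superhelical_density(4200, 24)
print(round(sigma, 3))  # -0.06
```

A negative result corresponds to the negative supercoiling described on the card; a value of zero would be a relaxed circle.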
Prokaryotic genome size:
Prokaryotic genome size is variable
- most < 5 Mb
- range: 112 kb – 14.8 Mb
- average gene density: ~950 genes / 1Mb
Prokaryotic genome size: Largest genome and smallest genome:
1 * largest genomes found in free-living soil bacteria – ability to respond to changing environment
2 * smallest genomes in endosymbiotic bacteria – more consistent environment
Prokaryotic genome size - commonly arranged …?
- genes commonly arranged in OPERONS (not universal)
- operon = group of genes involved in the same biochemical pathway and expressed as a single unit
Prokaryotic genome size…is proportional to?
Prokaryotic GENOME size is proportional to GENE NUMBER
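Because genome size is roughly proportional to gene number, the average density on the earlier card (~950 genes / 1 Mb) gives a quick estimate. This is a rough sketch only: the function name is hypothetical, and individual genomes will deviate from the average.

```python
AVERAGE_GENE_DENSITY = 950  # genes per Mb, the average quoted in these cards

def estimate_gene_number(genome_size_mb, genes_per_mb=AVERAGE_GENE_DENSITY):
    """Estimate gene number assuming it scales linearly with genome size."""
    return round(genome_size_mb * genes_per_mb)

# Smallest, typical upper bound, and largest sizes quoted in the cards
for size_mb in (0.112, 5.0, 14.8):
    print(size_mb, "Mb ->", estimate_gene_number(size_mb), "genes (estimate)")
```

The same linear estimate does not work for eukaryotes, where genome size is not proportional to gene number.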
Prokaryotic genome organisation and content: 4
- Majority of BACTERIAL and ARCHAEAL genomes are CIRCULAR
  - SOME are LINEAR, e.g. Borrelia burgdorferi (causative agent of Lyme disease)
- Many PROKARYOTIC genomes are MULTIPARTITE
  - two or more molecules
Prokaryotic genome organisation and content :
PLASMIDS = 7
- typically CIRCULAR
- REPLICATION INDEPENDENT of nucleoid
- up to 1000s of COPIES / cell
- PARTITIONED TO NEW CELLS INDEPENDENT OF NUCLEOID
- contain GENES NOT ESSENTIAL FOR SURVIVAL IN PERMISSIVE HABITATS/CONDITIONS
- TRANSFERRED to / TAKEN UP BY VARIOUS SPECIES
- argument: plasmids should not be included in the definition of prokaryotic genomes
BUT….
Essential genes are found on ‘D. radiodurans’ R1 plasmids….
EXPLAIN = 4
‘B. burgdorferi’ (causative agent of Lyme disease)
- 1 linear chromosome
- 19 linear and circular plasmids
- indispensable genes, e.g. encoding some membrane proteins
Chromosome vs chromid vs plasmid
- CHROMOSOME (s) – located in nucleoid, carries essential genes
- CHROMID – uses plasmid partitioning system, carries essential genes
- PLASMID– uses plasmid partitioning system, carries nonessential genes
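The three definitions above reduce to two yes/no questions, which can be sketched as a small decision function. The function and parameter names are hypothetical, and the "unclassified" branch is an assumption for a combination the card does not cover.

```python
def classify_replicon(uses_plasmid_partitioning, carries_essential_genes):
    """Classify a replicon using only the two criteria listed on the card."""
    if not uses_plasmid_partitioning:
        # Located in the nucleoid: a chromosome if it carries essential genes.
        # The card does not describe the other case (assumption: unclassified).
        return "chromosome" if carries_essential_genes else "unclassified"
    # Uses a plasmid partitioning system: essential genes make it a chromid.
    return "chromid" if carries_essential_genes else "plasmid"

print(classify_replicon(False, True))   # chromosome
print(classify_replicon(True, True))    # chromid
print(classify_replicon(True, False))   # plasmid
```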
‘V. cholerae’ vs ‘D. radiodurans’
‘V. cholerae’ : one chromosome, one chromid
‘D. radiodurans’ : two chromosomes and two chromids
‘E. coli’ genome… space? separation?
other parts
- 11% of genome = non-coding DNA
- little space between genes
  - some genes separated by only a single nucleotide (thrA and thrB) or none (thrB and thrC)
  - thrA-C = operon; encodes proteins for threonine biosynthesis
- Some archaeal genes have introns
- Some prokaryotes contain nested genes = genes encoded within other genes (aka overlapping genes)
- Bacterial genes = slightly longer than archaeal genes
What is in ‘E. coli’ genome SEQUENCES? = 8
- repeat sequences
- few high-copy-number interspersed repeat families (compared with eukaryotes)
- insertion sequences (IS)
  - mobile elements (transposons) repeated in the genome
- nontransposable repeat elements
  - repetitive extragenic palindromic (REP) sequences
  - gene regulation?
  - clustered regularly interspaced short palindromic repeats (CRISPRs)
‘Prokaryotic genome organisation and content’
Lateral (aka horizontal) gene transfer; where, who? = 5
- gene flow between prokaryotic species
  - frequent
- most prokaryotic genomes contain hundreds of kb of DNA from different prokaryotic species
  - transfers occur between bacteria and archaea
- DNA originates from the environment, exchange of plasmids and viral vectors
‘Prokaryotic genome organisation and content’
Lateral (aka horizontal) gene transfer; how? = 7
- multiple genes in a single transfer
- mechanisms of transfer
  - transformation
  - conjugation
  - transduction
- confuses species relationships
  - laterally transferred gene will have relatively similar sequences in two species, due to little time for sequence divergence
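The last point (a recently transferred gene looks unusually similar in two species) is usually judged from sequence identity. A minimal sketch, assuming the two sequences are already aligned to the same length; the function name and the example sequences are hypothetical, and real analyses use proper alignment tools.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of identical positions between two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)

# Hypothetical aligned fragments of the same gene from two species
print(percent_identity("ACGTACGT", "ACGTACGA"))  # 87.5
```

A gene whose identity between two distantly related species is much higher than that of their other shared genes is a candidate for lateral transfer, which is why such genes confuse species relationships.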
‘Prokaryotic genome organisation and content’
Lateral (aka horizontal) gene transfer; EXAMPLES? = 4
- antibiotic resistance
- ability to tolerate hot environments
- anaerobic to aerobic GROWTH HABITS
- METABOLIC PATHWAYS
Prokaryotic genome organisation and content cont’d DIAGRAMS
SLIDE 18
Prokaryotic gene function catalogue (GO Terms):
‘E. coli’ genome = 5
FUNCTION
- function of many genes still not known
  - no significant similarity to any known genes in other bacteria
GENE FAMILIES
- gene families
  - genes having arisen from gene duplication events
  - e.g. rRNA genes in bacteria and archaea
Eukaryotic genomes:
Heterochromatin vs Euchromatin = 16
- LINEAR CHROMOSOMES with NEGATIVE SUPERCOILING in MEMBRANE-BOUND NUCLEUS
‘Heterochromatin’
- densely staining regions in interphase nucleus
- chromatin densely packed
- constitutive
  - permanently condensed chromatin
  - DNA is gene poor - but does contain some genes
  - centromeric, telomeric
    - many repeat regions
  - other chromosome regions
    - most of the human Y chromosome
- facultative
  - not permanently condensed
    - exists in some cells at some times
  - DNA encodes genes that are inactive at particular times or in particular cells
‘Euchromatin’
- cannot be seen in interphase nucleus
- DNA is less condensed and gene rich
Eukaryotic genome organisation;
‘Nucleosomes’ = 7
- protein core = histone octamer
  - 2 copies each of histones H2A, H2B, H3, and H4
- DNA ~147 bp
- “slide”
  - expose chromatin regions for transcription
  - also involves chromatin-remodelling proteins
- removed and replaced during DNA replication
Eukaryotic genome organization cont’d:
‘Higher orders of chromatin structure’ = 4
UNDERSTANDING EUCHROMATIN LOOPS
- Euchromatin loops ARE
- dynamic
- extension / merging – allow access to transcriptional machinery
- condensation – repress transcription
Eukaryotic genome organization cont’d:
‘Higher orders of chromatin structure’ = DIAGRAM
SLIDE 22
Eukaryotic genome organization: ‘Organisation of chromosomes in interphase nucleus’ = 13
- Organisation of chromosomes in interphase nucleus
- CT – chromosome territory
  - specific for each chromosome
- LAD – lamin-associated domains
  - heterochromatin interacting with nuclear lamins at nuclear periphery
- TAD – topologically associating domain
  - Compartment A
    - transcriptionally active
    - enhancer-promoter interactions
  - Compartment B
    - transcriptionally repressed (facultative heterochromatin)
  - chromatin switches between compartments, depending on gene expression demands
Eukaryotic genome organization: ‘INSULATORS’ = 7
- define TAD boundaries
- 1-2 kb long
- in many (all?) eukaryotes
- DNase I insensitive
- interact with specific DNA-binding proteins
- establish functional domains, e.g. loop
- prevent cross-talk of regulatory domains between functional domains
Eukaryotic genome organization: ‘INSULATORS’ DIAGRAM
SLIDE 24
Eukaryotic genome size: 3
- Great VARIABILITY in eukaryotic genome size
- 10 Mb – 100,000 Mb
- EUKARYOTIC GENOME SIZE is NOT PROPORTIONAL TO GENE NUMBER
Eukaryotic genome size: correlation? = 4
- Overall correlation of genome size with morphological complexity of organisms
- HOWEVER, no precise correlation between genome size and complexity
  - especially evident when looking within eukaryotic groups
- C-value paradox/enigma (C-value = haploid genome size)
Eukaryotic genome size:
‘Factors contributing to the C-value paradox/enigma’ = 3
- non-protein coding DNA
- gene density
- “split genes” – # introns / gene
Non-protein coding DNA scales with morphological complexity DIAGRAM
SLIDE 27
PROKARYOTES AT THE BOTTOM
EUKARYOTES = MOST
Repeat sequences in eukaryotic genomes: ‘INTERSPERSED REPEATS’ = 10
- GENOMES of most MULTICELLULAR EUKARYOTES have substantial amounts of moderately and HIGHLY REPETITIVE SEQUENCES
- Interspersed repeats
  - repeat units distributed (seemingly) randomly around the genome
    - in intergenic regions and introns
  - DNA transposons
  - retrotransposons
    - LTR – long terminal repeat retrotransposons
    - Non-LTR retrotransposons
      - SINE – short interspersed nuclear element
      - LINE – long interspersed nuclear element
Repeat sequences in eukaryotic genomes: TANDEM REPEATS = 16
Tandem repeats
- repeat units located next to each other
- satellite DNA (satellite bands after fractionation and density gradient centrifugation of genomic DNA)
  - repeat unit < 5 bp to > 200 bp
  - clusters 100s of kb in length
  - e.g. centromeric DNA
- minisatellites
  - not part of satellite bands on gradients
  - repeat unit up to 25 bp
  - clusters up to 20 kb
  - e.g. telomeric DNA
- microsatellites
  - not part of satellite bands on gradients
  - repeat unit < 13 bp
  - clusters < 150 bp
  - used to establish kinship
  - an individual’s genetic profile
DIAGRAM SLIDE 29
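The size limits on the tandem repeats card can be turned into a rough classifier. This is only a study aid, not a standard definition: the card's ranges overlap (a 4 bp repeat unit fits all three classes), so the order of the checks and the use of ">= 100 kb" for "100s of kb" are assumptions, and the function name is hypothetical.

```python
def classify_tandem_repeat(unit_bp, cluster_bp):
    """Classify a tandem repeat from repeat-unit and cluster length (in bp)."""
    # Microsatellites: repeat unit < 13 bp, clusters < 150 bp
    if unit_bp < 13 and cluster_bp < 150:
        return "microsatellite"
    # Minisatellites: repeat unit up to 25 bp, clusters up to 20 kb
    if unit_bp <= 25 and cluster_bp <= 20_000:
        return "minisatellite"
    # Satellite DNA: clusters hundreds of kb long (assumed here as >= 100 kb)
    if cluster_bp >= 100_000:
        return "satellite DNA"
    return "unclassified"

print(classify_tandem_repeat(2, 40))          # microsatellite
print(classify_tandem_repeat(15, 5_000))      # minisatellite
print(classify_tandem_repeat(171, 500_000))   # satellite DNA
```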
Eukaryotic genomes contain pseudogenes (2 TYPES)
AND gene relics = 9
- Conventional pseudogene
  - inactivated due to mutation
- Processed pseudogene
  - derived from an mRNA that is converted to cDNA and reinserts into genome
  - no introns or regulatory regions that ancestral gene had
    - inactivated
- Gene relics
  - truncated gene – from 5’ or 3’ end
  - gene fragments
Eukaryotic genomes contain pseudogenes and gene relics = DIAGRAM
SLIDE 30
Eukaryotic gene density
Genes are more closely packed along the chromosomes of less complex organisms
Less complex organisms contain fewer split genes
Genome of yeast compared to genomes of more complex eukaryotes
* few genes with introns – yeast genome has 239 introns; human genome > 300,000
G-value paradox
- Gene number does not scale with morphological complexity
- Alternative splicing leads to multiple mRNAs and proteins from a single gene
* explains part of the C- and G-value paradoxes
G-value paradox = DIAGRAM
SLIDE 33
Eukaryotic genomes contain gene deserts: 9
WHAT, SIGNIFICANCE? IN HUMAN GENOME?
- Large regions of chromosomes (10^5-10^6 bp) devoid of known genes or other functional genetic elements
- human genome
  - 25% consists of gene deserts
  - chromosomes 4, 5 and 13 (30-40% of the chromosomes)
- significance of gene deserts
  - not known
  - some contain regulatory sequences that act over large distances to control gene expression
  - others show no clear function
  - superfluous regions of genomes??
Eukaryotic genomes contain gene families:
SIMPLE VS COMPLEX: SIMPLE…= 9
- Simple (aka classical) gene families
  - all members have identical or nearly identical sequences
  - arose from gene duplication events
  - rRNA genes
    - humans:
      - 2000 genes for 5S rRNA
        - single cluster on chromosome 1
      - 280 copies of 28S, 5.8S, 18S repeat unit
        - 50-70 repeats clustered on multiple chromosomes
Eukaryotic genomes contain gene families:
SIMPLE VS COMPLEX: COMPLEX…= 7
- Complex gene families
  - members have similar sequences
  - different enough to code for gene products with different properties
  - arose from gene duplication events
  - mammalian globin genes
    - expressed at different developmental stages
    - biochemical properties correlate to physiological needs during development
Eukaryotic genomes contain nested genes? How many Categories? =7
- Overlapping genes found in the genomes of yeast, protists and metazoans
- Two major categories
  - genes nested within intron of another gene (= external host gene)
    - relatively common in eukaryotes
  - non-intronic genes nested opposite coding sequence of external host gene
    - no clear evidence of these in metazoan genomes
    - present in yeast and protistan genomes (and prokaryotic genomes)
Eukaryotic genomes contain nested genes diagram
slide 36
Eukaryotic gene function catalogues (GO Terms):
= 8
- Human genome
- greatest number of genes in all categories except metabolism
- many more genes involved in defence and immunity
- ‘Caenorhabditis elegans’ (nematode worm) genome
- high number of genes in cell-cell communication category
- *1000 genes vs 1250 in humans
- *BUT only 959 cells vs 10^13 cells in humans
The Hidden Genome:
Non-coding (nc)RNAs
* tRNAs, rRNAs, circRNA (circular RNA), eRNA (enhancer RNA), lincRNA (long intergenic non-coding RNA), microRNA (miRNA), NAT (natural antisense transcript), piRNA (PIWI-interacting RNA), scaRNA (small Cajal body-specific RNA), siRNA (small interfering RNA), snRNA (small nuclear RNA), snoRNA (small nucleolar RNA)
The Hidden Genome =
RNAs encoding microproteins and peptides = 8
- smORF = small open reading frame
  - shorter than 100 amino acids
- dORF = downstream open reading frame
  - located in the 3’-UTR of known protein-coding genes
- uORF = upstream-encoded smORF
  - located in the 5’-UTR of known protein-coding genes
- nuORF = novel unannotated open reading frame
- SEP = small peptide
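The smORF definition (an open reading frame shorter than 100 amino acids) can be illustrated with a very simple scanner. This is a sketch under stated assumptions: it looks only at the forward strand, treats ATG as the only start codon, and uses the standard stop codons TAA, TAG and TGA; the function name and the test sequence are hypothetical, and real smORF annotation relies on far more evidence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_smorfs(seq, max_aa=100):
    """Return (start, end, aa_length) for forward-strand ORFs under max_aa codons."""
    seq = seq.upper()
    hits = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # Walk codon by codon until an in-frame stop or the sequence end
                j = i + 3
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 <= len(seq):              # an in-frame stop was found
                    aa_len = (j - i) // 3          # codons before the stop
                    if aa_len < max_aa:
                        hits.append((i, j + 3, aa_len))
                    i = j + 3                      # continue after this ORF
                    continue
            i += 3
    return hits

# Hypothetical sequence: ATG AAA TGA encodes a 2-residue peptide (M, K)
print(find_smorfs("ATGAAATGA"))  # [(0, 9, 2)]
```

Whether such an ORF counts as a uORF or dORF then depends on where it sits relative to the main coding sequence (5’-UTR or 3’-UTR), which this sketch does not check.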
The Hidden Genome DIAGRAM
SLIDE 38
The Forbidden Genome: 4
- Short DNA sequences not compatible with life
- minimal absent words (MAWs)
- not found in a particular genome (nullomers)
- not found in any genome (primes)
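The idea of a nullomer (a short sequence not found in a particular genome) can be shown by listing every k-mer that is missing from a sequence. A toy sketch only: it checks one strand of a single short string, lists all absent k-mers rather than strictly minimal absent words, and the function name and input are hypothetical.

```python
from itertools import product

def absent_kmers(sequence, k):
    """Return all DNA k-mers (alphabet ACGT) that do not occur in sequence."""
    sequence = sequence.upper()
    # Every k-mer that actually occurs, collected with a sliding window
    present = {sequence[i:i + k] for i in range(len(sequence) - k + 1)}
    # All 4**k possible k-mers
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return sorted(kmer for kmer in all_kmers if kmer not in present)

missing = absent_kmers("ACGT", 2)
print(len(missing))   # 13 (16 possible 2-mers minus AC, CG, GT)
print(missing[:3])    # ['AA', 'AG', 'AT']
```

Running the same search across every sequenced genome, rather than one, is what would separate primes from nullomers in the card's terminology.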
The Forbidden Genome: USES = 4
- tags to distinguish samples (e.g. control or reference samples vs forensic samples)
- suicide genes that could be encoded by genetically modified organism and activated to destroy them if they prove dangerous
- anticancer peptides (NulloPs)
- biomarkers for cancers