Organisation of the Genome Flashcards
What forms the human genome
3 billion base pairs spread over 23 pairs of linear chromosomes (51Mbp-245Mbp)
Size of mitochondrial genome
16,569bp, circular DNA
What percentage of the human genome encodes proteins
~1%
Size of E. coli genome
4.6Mbp, 4288 protein-encoding genes
Size of mouse genome
2,800Mbp, ~23,000 protein encoding genes
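The E. coli and mouse figures above can be turned into a gene density (protein-coding genes per Mbp), which makes the difference in genome organisation easier to see. This is a minimal sketch: the names `GENOMES` and `genes_per_mbp` are illustrative, and the mouse gene count is the approximate value from the card.

```python
# Genome size (Mbp) and protein-coding gene count, taken from the cards above.
GENOMES = {
    "E. coli": (4.6, 4288),
    "mouse": (2800, 23000),  # gene count is approximate (~23,000)
}


def genes_per_mbp(size_mbp: float, gene_count: int) -> float:
    """Return the number of protein-coding genes per megabase pair."""
    return gene_count / size_mbp


densities = {
    name: genes_per_mbp(size, genes) for name, (size, genes) in GENOMES.items()
}

for name, density in densities.items():
    print(f"{name}: {density:.1f} protein-coding genes per Mbp")

# How many times more gene-dense E. coli is than mouse.
fold_difference = densities["E. coli"] / densities["mouse"]
print(f"E. coli is about {fold_difference:.0f}x more gene-dense than mouse")
```

On these numbers E. coli carries roughly 930 genes per Mbp against about 8 per Mbp in mouse, a difference of about 100-fold.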
C-value paradox
Genome size does not track organismal complexity: the human genome is smaller than the mudpuppy genome, but a greater percentage of the human genome encodes proteins
Explain DNA Melt-Reassociation
Denatured ssDNA fragments -> rapid reassociation -> highly repeated reannealed dsDNA fragments
Denatured ssDNA fragments -> intermediate reassociation -> moderately repeated reannealed dsDNA fragments
Denatured ssDNA fragments -> slow reassociation -> Unique reannealed dsDNA fragments
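The three reassociation classes above are usually described with ideal second-order kinetics, C/C0 = 1 / (1 + k * C0t), where C0t is the starting DNA concentration multiplied by time. The cards do not state this equation, so treat the sketch below as an assumption about the model. The rate constants are hypothetical values chosen only to separate the three classes.

```python
def fraction_single_stranded(cot: float, k: float) -> float:
    """Ideal second-order reassociation: C/C0 = 1 / (1 + k * C0t)."""
    return 1.0 / (1.0 + k * cot)


def cot_half(k: float) -> float:
    """C0t value at which half of the DNA has reannealed (C/C0 = 0.5)."""
    return 1.0 / k


# Hypothetical rate constants (L mol^-1 s^-1), for illustration only.
RATE_CONSTANTS = {
    "highly repeated": 1e2,
    "moderately repeated": 1e0,
    "unique": 1e-3,
}

for name, k in RATE_CONSTANTS.items():
    remaining = fraction_single_stranded(1.0, k)
    print(f"{name}: Cot1/2 = {cot_half(k):g}, single-stranded at C0t=1: {remaining:.3f}")
```

A larger rate constant gives a smaller Cot1/2, which is why highly repeated sequences reanneal first and unique sequences last.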
Eukaryotic DNA sequence organisation
Single copy
Gene families
Tandem gene arrays
Intermediate repeat (transposable elements)
Simple sequence repetitive DNA
Single copy DNA in genome
- Forms ~25% of genome, but exons make up only ~1%
Size of average gene
27kb with 9 exons
Smallest gene
SRY on Y chromosome
0.9kb formed from 1 exon which is 850bp
Larger genes
DMD
Encodes dystrophin
2400kb formed from 79 exons averaging 180bp - introns average 30,770bp
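The DMD figures can be checked with a little arithmetic. The sketch below assumes that 180bp and 30,770bp are average exon and intron sizes, and that 79 exons are separated by 78 introns. The variable names are illustrative.

```python
# Figures from the DMD card; exon and intron sizes are assumed to be averages.
EXON_COUNT = 79
MEAN_EXON_BP = 180
MEAN_INTRON_BP = 30_770

# Introns sit between consecutive exons, so there is one fewer intron than exons.
INTRON_COUNT = EXON_COUNT - 1

total_exon_bp = EXON_COUNT * MEAN_EXON_BP        # 14,220 bp
total_intron_bp = INTRON_COUNT * MEAN_INTRON_BP  # 2,400,060 bp
gene_bp = total_exon_bp + total_intron_bp        # 2,414,280 bp, i.e. ~2400kb

exon_fraction = total_exon_bp / gene_bp

print(f"Exons:   {total_exon_bp:,} bp")
print(f"Introns: {total_intron_bp:,} bp")
print(f"Exons make up {exon_fraction:.2%} of the gene")
```

The totals agree with the ~2400kb gene size, and exons account for well under 1% of the gene.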
Non-protein-coding single copy DNA
24% of genome is intron
15% of genome is single copy but not a part of a protein-coding gene
Function of single-copy non-coding DNA
- Most of this part is functional - over 80% has ≥1 biochemical activity
- Majority can be transcribed
22,219 non-coding genes
rRNAs, tRNAs, snRNAs
miRNAs - involved in gene regulation (2,588 identified)
long non-coding (lnc)RNAs (14,727) - some known to be functional, e.g. Xist
Target regulatory proteins
Disease markers e.g. DD3/PCA3 (prostate cancer)
Possible causative agents in disease (BACE1)
Human gene families
α-globins - 4 genes
β-globins - 5 genes
Actin - 15 genes
Keratin type 1 - 19 genes
β-tubulin - 19 genes
α-tubulin - 10 genes
What are tandemly arrayed genes (TAGs)
- Gene clusters created by tandem duplications
- One gene duplicated, copy next to original
- Can encode large numbers of genes at a time (from 2 to hundreds)
- 14-17% of human, mouse and rat coding genomes
Tandem clusters of rRNA encoding genes
Human embryo has 5-10 million ribosomes
Cell number doubles within 24 hours
A single rRNA gene may not be able to provide enough rRNA, but tandem repeats allow sufficient rRNA production
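The demand behind this card can be estimated from the two numbers given. The sketch assumes the ribosome count is per cell, that each cell must make a full new complement of ribosomes before it divides, and that each ribosome needs one set of rRNA molecules. The per-gene output rate passed to `gene_copies_needed` is a hypothetical parameter, not a measured value.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # cell number doubles within 24 hours


def rrna_sets_per_second(ribosomes: int, doubling_time_s: int = SECONDS_PER_DAY) -> float:
    """rRNA sets needed per second to rebuild all ribosomes in one doubling."""
    return ribosomes / doubling_time_s


def gene_copies_needed(demand_per_s: float, output_per_gene_per_s: float) -> int:
    """Minimum number of gene copies for a given (hypothetical) per-gene output."""
    return math.ceil(demand_per_s / output_per_gene_per_s)


low = rrna_sets_per_second(5_000_000)
high = rrna_sets_per_second(10_000_000)
print(f"Demand: {low:.0f}-{high:.0f} rRNA sets per second")

# Hypothetical output of 1 transcript per gene per second, for illustration only.
print(gene_copies_needed(low, output_per_gene_per_s=1.0), "copies at the low estimate")
```

Whatever the true per-gene rate is, a demand of tens to over a hundred rRNA sets per second shows why many tandem copies are needed.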
What are the different eukaryotic transposable elements
Retrotransposons (retroposons):
Transpose via an RNA intermediate
Viral: retrovirus-like, e.g. endogenous retroviruses, or LINE-like, e.g. LINE1 and LINE2
Non-viral: SINEs or processed pseudogenes
DNA-DNA transposable elements:
Transpose directly from DNA to DNA
Similar to bacterial transposons - Not active in human genome
What are eukaryotic transposable elements important for
Genome evolution - Source of regulatory elements, site of recombination. Insertions can cause disease
Viral retrotransposons
Gag - Group antigens (viral core structure, RNA binding)
Pol - reverse transcriptase
Env - Envelope protein
LINE-1 element
> 500,000 copies in human genome
1-6kb in length
Only 40-50 active
Open reading frames:
ORF1: 1137bp - homology to gag
ORF2: 3900bp - homology to pol
No LTRs
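The ORF lengths on the LINE-1 card can be converted to codon counts and compared with the maximum element length of 6kb. This is a rough sketch: the card does not say whether the stop codon is included in each length, so the codon counts are upper bounds on protein length. The names used are illustrative.

```python
# ORF lengths (bp) and maximum element length from the LINE-1 card.
ORFS_BP = {"ORF1": 1137, "ORF2": 3900}
MAX_ELEMENT_BP = 6000


def codon_count(length_bp: int) -> int:
    """Number of codons in an ORF; the length must be a multiple of 3."""
    if length_bp % 3 != 0:
        raise ValueError(f"{length_bp} bp is not a whole number of codons")
    return length_bp // 3


codons = {name: codon_count(length) for name, length in ORFS_BP.items()}
coding_bp = sum(ORFS_BP.values())
non_orf_bp = MAX_ELEMENT_BP - coding_bp

for name, count in codons.items():
    print(f"{name}: {count} codons")
print(f"ORFs cover {coding_bp} bp; {non_orf_bp} bp of a 6kb element lies outside them")
```

Both lengths divide evenly by 3, and together the two ORFs fill most of a full-length element.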
Timing and tissue specificity of L1 transposition
- Mostly repressed (methylation)
- Demethylation and increased transposition in tumours
Germ cells (many unique new insertions)
Early embryos (somatic cells)
Neural progenitor cells during childhood
Each human is a unique mosaic
What percentage of the human genome has transposable element composition
~45% (LINEs, SINEs, LTR elements and DNA transposons combined)
Non-viral elements
SINEs (13% of genome)
- Genomic copies of small RNAs
- Mostly belong to Alu family (derived from 7SL RNA)
- also copies of snRNAs and tRNAs
Processed pseudogenes (genomic copies of mRNAs)