Lecture 1 revision Flashcards
How many base pairs in human genome
3 billion over 23 chromosomes
How many base pairs in mitochondrial genome
16,569bp, circular DNA
C-value paradox
The observation that an organism's genome size does not correlate with its physical size or complexity
Explain DNA Melt-Reassociation
Denatured ssDNA ->
- Highly repeated DNA reanneals to dsDNA rapidly
- Moderately repeated DNA reanneals to dsDNA at an intermediate rate
- Unique (single-copy) DNA reanneals to dsDNA slowly
What types of sequence make up eukaryotic genome organisation
Single copy
Gene families
Tandem gene arrays
Intermediate repeats (transposable elements)
Simple sequence repetitive DNA
Overview of human genome groups (pie chart)
26% introns
20% LINEs
13% SINEs
12% misc. unique sequences
8% misc. heterochromatin
8% LTR retrotransposons
5% segmental duplications
3% DNA transposons
3% simple sequence repeats
2% protein coding genes (20,441)
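The percentages above can be turned into approximate sizes. This is a minimal sketch that assumes the 3 billion bp genome size quoted earlier in these cards; the variable names are illustrative only.

```python
GENOME_BP = 3_000_000_000  # assumed: genome size quoted earlier in these cards

# Percent shares exactly as listed on the pie-chart card.
shares_percent = {
    "introns": 26,
    "LINEs": 20,
    "SINEs": 13,
    "misc. unique sequences": 12,
    "misc. heterochromatin": 8,
    "LTR retrotransposons": 8,
    "segmental duplications": 5,
    "DNA transposons": 3,
    "simple sequence repeats": 3,
    "protein-coding genes": 2,
}

# Integer arithmetic keeps every value exact (result in megabases).
megabases = {
    name: GENOME_BP * pct // 100 // 1_000_000
    for name, pct in shares_percent.items()
}

total_percent = sum(shares_percent.values())

# Print the groups from largest to smallest.
for name, mb in sorted(megabases.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:26s} {mb:5d} Mb")
print("total percent:", total_percent)
```

On these figures, introns come to about 780 Mb and protein-coding sequence to about 60 Mb.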
What percentage of the genome is made up of protein-coding genes
Around 25% lies within protein-coding genes, but exons make up only about 1% of the genome
Smallest vs largest protein coding gene
Smallest: SRY (0.9kb with one 850bp exon)
Largest: DMD (2400kb with 79 exons averaging 180bp, and an average intron size of 30,770bp)
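The DMD figures can be cross-checked with simple arithmetic. This is a rough sketch that assumes 180bp and 30,770bp are average exon and intron sizes, and that a gene with 79 exons has 78 introns; the variable names are illustrative.

```python
exon_count = 79
mean_exon_bp = 180        # assumed to be an average exon size
mean_intron_bp = 30_770   # assumed to be an average intron size

# Introns sit between adjacent exons, so there is one fewer intron than exons.
intron_count = exon_count - 1

exonic_bp = exon_count * mean_exon_bp        # total exon sequence
intronic_bp = intron_count * mean_intron_bp  # total intron sequence
gene_bp = exonic_bp + intronic_bp            # approximate gene length

exon_fraction = exonic_bp / gene_bp

print(f"exons:   {exonic_bp:,} bp")
print(f"introns: {intronic_bp:,} bp")
print(f"gene:    {gene_bp:,} bp")
print(f"exon share of gene: {exon_fraction:.2%}")
```

Under these assumptions the exons total roughly 14kb and the introns roughly 2400kb, so well under 1% of the gene is exon sequence.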
Explain the DMD gene in ENSEMBL
> 2 million base pairs on the X chromosome
Main mRNA is 14kb containing 79 exons
SRY gene in ENSEMBL
~0.7kb on Y chromosome
727bp single exon mRNA
What percentage is non-protein-coding single-copy DNA
26% of genome is introns
15% is single copy but not part of a protein-coding gene
Functions of single-copy non-coding DNA
- Most is ‘functional’ - over 80% has at least one biochemical activity
- Majority can be transcribed
- 22,219 non-coding genes
- Encodes rRNAs, tRNAs and snRNAs
- miRNAs involved in gene regulation (2588 identified)
- Long non-coding RNAs (14,727) - some are known to be functional: they target regulatory proteins, act as disease markers, or are causative agents in disease
Human gene families
Alpha-globins - 4
Beta-globins - 5
Actin - 15
Keratin type 1 - 19
Beta-tubulins - 19
Alpha-tubulins - 10
Beta cluster
-> epsilon
-> G-gamma
-> A-gamma
-> (pseudo-beta)
-> delta
-> Beta
alpha cluster
-> zeta
-> Pseudozeta
-> Pseudoalpha
-> Pseudoalpha
-> alpha2
-> alpha1
-> theta
Explain tandemly arrayed genes
- Gene clusters created by tandem duplications
- Gene duplication = copies next to original
- Arrays contain many copies of a gene (from 2 to hundreds)
- Make up 14-17% of the rat, mouse and human coding genome, which allows faster transcription
Tandem clusters of rRNA encoding genes
Each human embryonic cell has 5-10 million ribosomes
Cell number doubles every 24 hours
A single rRNA gene cannot supply enough rRNA, but tandem repeats allow sufficient rRNA production
RNA polymerases are required to transcribe them
Retrotransposons
Transpose via RNA intermediate:
Viral: retrovirus-like e.g. endogenous retroviruses
Non-viral: SINEs and processed pseudogenes
DNA-DNA transposable elements
Transpose directly from DNA to DNA
Similar to bacterial transposons - none active in human genome
Why are eukaryotic transposable elements important
Important in genome evolution
source of regulatory elements
Recombination sites
Insertions can cause disease
Retrovirus/retrotransposon life cycle
Retrovirus enters cell
-> RNA
-> Provirus
-> RNA
-> Retrovirus
Viral retrotransposons
LTR-gag-pol-int-env-LTR
gag - Group antigens
pol - Reverse transcriptase
int - Integrase
env - Envelope protein
LINE-1 element
> 500,000 copies in genome
- 1-6kb in length
- 40-50 are active
- ORF1 - 1137bp - homology to gag
- ORF2 - 3900bp - homology to pol
- NO LTRs
Target-site direct repeat -> Multiple stop codons (1kb) -> Coding region (1kb for ORF1, 4kb for ORF2) -> A/T rich region
Timing and tissue specificity of L1 transposition
- Mostly repressed (methylation)
- Demethylation and increased transposition in tumours
- Germ cells (many unique new insertions)
- Early embryos (somatic cells)
- Neural progenitor cells during childhood - each human is a unique mosaic
Transposable element composition in human genome is…
30%
Of the 30 percent:
Non-ME sequence - 33%
ME and repeat remnants - 21%
LINE1 - 17.6%
Alu - 10.7%
ERV - 8.9%
LINE2 - 3.5%
DNA - 3.4%
Non-ME repeats - 2%
Other MEs - 0.7%
SVA - 0.1%
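A quick tally of the values listed on this card, as a sketch. Which rows count as mobile element (ME) classes is an assumption made here for illustration. The listed values add up to roughly 100%, which suggests they may describe shares of the whole genome; that reading is an inference from the numbers, not something the card states.

```python
# Percentages exactly as listed on the card.
composition = {
    "Non-ME sequence": 33.0,
    "ME and repeat remnants": 21.0,
    "LINE1": 17.6,
    "Alu": 10.7,
    "ERV": 8.9,
    "LINE2": 3.5,
    "DNA": 3.4,
    "Non-ME repeats": 2.0,
    "Other MEs": 0.7,
    "SVA": 0.1,
}

# Assumed grouping: rows that name a specific mobile element class.
me_classes = ["LINE1", "Alu", "ERV", "LINE2", "DNA", "Other MEs", "SVA"]

# Round to one decimal place to hide floating-point noise.
total = round(sum(composition.values()), 1)
me_total = round(sum(composition[name] for name in me_classes), 1)

print("sum of all rows:", total)
print("sum of named ME classes:", me_total)
```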
Non-viral elements
SINEs (13% of genome):
- Genomic copies of small RNAs
- Most belong to Alu family (7SL RNA)
- Also copies of snRNAs and tRNAs
Processed pseudogenes (genomic copies of mRNAs)
Alu sequences
150-300bp
1 million copies (10% human genome)
Found in other vertebrates
occur every ~6kb
transcribed to RNA
transpose using LINE reverse transcriptase
Sites of recombination
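The copy number and length figures for Alu can be checked against the quoted 10% share. This is a minimal sketch assuming the 3 billion bp genome size from earlier in these cards; the function name is hypothetical.

```python
GENOME_BP = 3_000_000_000  # assumed: genome size quoted earlier in these cards
ALU_COPIES = 1_000_000


def percent_of_genome(copies, element_bp, genome_bp=GENOME_BP):
    """Percent of the genome occupied by `copies` elements of `element_bp` each."""
    return 100 * copies * element_bp / genome_bp


lower = percent_of_genome(ALU_COPIES, 150)  # every copy at the short end
upper = percent_of_genome(ALU_COPIES, 300)  # every copy at the long end

print(f"Alu share of genome: {lower:.1f}% to {upper:.1f}%")
```

The 10% figure corresponds to the upper end of the range, where every copy is about 300bp long.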
Explain SINE-VNTR-Alu
SVA
non-autonomous hominid specific retrotransposons
Don’t exist in old world monkeys
Several subtypes
Can be transcribed
Mobilised by LINE-1 (L1) reverse transcriptase
Associated with human disease
SVA associated diseases (deletions)
Leukaemia (2kb insertion in HLA-A, SVA/+, deletion)
Neurofibromatosis 2 (1.7kb insertion into NF2, SVA/+, deletion)
X-linked agammaglobulinemia (XLA) (0.25kb insertion into BTK, SVA/Y, exon skipping)
Which transposon types are capable of jumping
LINEs, viral retrotransposons (except LTR-gag-LTR), DNA transposon repeats that contain transposase
Tandem repeat DNA (simple sequence DNA)
~8% of genome
Repeat unit 2-200bp
- Array length up to 5,000,000bp (alphoid DNA)
- Short tandem repeats
- Mini/microsatellite DNAs
Short tandem repeats
5% genome
repeat length 1-6bp
Total array length - 100bp
Length variation can affect gene expression in hereditary conditions like schizophrenia, possibly by directly binding transcription factors
STRs enriched in TFs
Mini and microsatellite DNA
Mini:
15-100bp repeats
Total array length is 0.5-30kbp
Micro:
2-5bp repeats
60-200bp array length
Array length is variable - VNTRs or STRs
Used in gene mapping and paternity tests
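To make the idea of a repeat unit and array length concrete, here is a small sketch that finds short tandem repeats in a DNA string. It is a simple regular-expression approach written for illustration, not the method used by any particular genotyping tool; the function name, the thresholds, and the test sequence are all made up.

```python
import re


def find_tandem_repeats(sequence, min_unit=1, max_unit=6, min_copies=3):
    """Return (start, unit, copies) for each tandem array found.

    The lazy quantifier makes the regex prefer the shortest repeat unit,
    and the backreference \\1 requires further exact copies of that unit.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    results = []
    for match in pattern.finditer(sequence):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        results.append((match.start(), unit, copies))
    return results


# Made-up example: four copies of CAG flanked by unrelated bases.
example = "TTGA" + "CAG" * 4 + "TT"
for start, unit, copies in find_tandem_repeats(example):
    print(f"unit {unit} x{copies} at position {start}, "
          f"array length {copies * len(unit)} bp")
```

Individuals who differ in the number of copies at a locus would give different `copies` values here, which is the kind of length variation that gene mapping and paternity tests rely on.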