L6: Transposable Elements Flashcards
What was the 1,000 genomes project?
An effort to sequence more than 1,000 human genomes from across the globe; the final dataset comprised 2,504 individuals from 26 distinct populations
What were the major findings from the 1,000 genomes study? (6)
- 88 million variants in 2,504 genomes
- 84.7 million single nucleotide polymorphisms (SNPs)
- 3.6 million short insertions/deletions (indels)
- 60,000 structural variants (including big copy number variations)
- Many segregating SNPs (between populations)
- Rare alleles are often restricted to sub-populations
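As a quick arithmetic check, the three variant classes listed above can be summed to confirm that they add up to the reported total of roughly 88 million. This is a minimal Python sketch; the dictionary name and keys are illustrative and not taken from the study.

```python
# Variant counts in millions, as listed on the card above.
variant_counts_millions = {
    "SNPs": 84.7,
    "short indels": 3.6,
    "structural variants": 0.06,  # 60,000 structural variants
}

# Sum of all classes, to compare with the reported ~88 million.
total_millions = sum(variant_counts_millions.values())

# Share of the total contributed by SNPs alone.
snp_share = variant_counts_millions["SNPs"] / total_millions

print(f"Total: {total_millions:.2f} million variants")
print(f"SNP share of all variants: {snp_share:.1%}")
```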
How much does a typical genome differ from the reference human genome?
A typical genome differs from the reference human genome at 4.1 million to 5.0 million sites.
What accounts for the vast majority of these variants found between a given genome and the reference genome?
SNPs and short indels account for >99.9% of variants
Aside from these SNPs and short indels, describe the remaining sources of variation.
2,100 to 2,500 structural variants affecting ∼20 million bases of sequence:
* ∼1,000 ‘large’ deletions,
* ∼160 copy-number variants,
* ∼915 Alu insertions,
* ∼128 L1 insertions,
* ∼51 SVA insertions,
* ∼10 inversions.
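The per-genome figures above can be cross-checked with a little arithmetic: the listed categories should sum to a value within the quoted 2,100 to 2,500 structural variants, and structural variants should make up well under 0.1% of the 4.1 to 5.0 million variant sites, which is consistent with SNPs and short indels accounting for >99.9%. A minimal Python sketch with illustrative variable names, using the approximate counts from the cards:

```python
# Approximate structural-variant counts per typical genome (from the list above).
sv_counts = {
    "large deletions": 1000,
    "copy-number variants": 160,
    "Alu insertions": 915,
    "L1 insertions": 128,
    "SVA insertions": 51,
    "inversions": 10,
}

total_svs = sum(sv_counts.values())

# Worst case for the ">99.9%" claim: most structural variants, fewest sites.
max_svs = 2500
min_variant_sites = 4.1e6
max_sv_fraction = max_svs / min_variant_sites

print(f"Listed structural variants: {total_svs}")
print(f"Upper bound on structural-variant share: {max_sv_fraction:.3%}")
```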
Which of these listed sources of variation are transposable elements?
Alu, L1, SVA
What in our genome consists mostly of transposable elements?
Most of our genome’s ‘junk DNA’ consists of transposable elements. Intriguingly, regulatory sequences often originate from TEs. TEs mimic the gene sequences surrounding them; for example, in a blood cell they can act as promoters for red blood cell genes.
When do transposable elements remain in our genome?
They remain in the genome, and are passed on, only if they insert in a germ cell
What are transposable elements (TEs)?
TEs are repetitive genetic sequences that once had or still have the ability to transpose, that is, to mobilise and insert elsewhere in the genome. In contrast to genes, TEs are enormously diverse across species and often species-specific
How much of our genome are transposable elements?
45%
How can TEs be classified?
Broadly, TEs can be divided into two classes: DNA transposons excise and insert a DNA intermediate when they transpose (‘cut and paste’), whereas retrotransposons reverse transcribe RNA intermediates prior to integration (‘copy and paste’). DNA transposons are few and inactive in most mammals whereas retrotransposons are abundant
How can retrotransposons be further classified?
Retrotransposons are further classified by whether they contain long terminal repeats (LTRs). Most LTR-containing retrotransposons in mammals are endogenous retroviruses (ERVs). Frequent recombination between LTRs leaves behind many solo LTRs in the genome.
Retrotransposons lacking LTRs include autonomous long interspersed elements (LINEs) and non-autonomous short interspersed elements (SINEs), which require LINE-derived proteins for their mobilization. In addition to LINEs and SINEs, humans encode additional primate-specific composite elements called SINE variable-number tandem-repeat Alu (SVA) elements
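The classification described in the two cards above can be written down as a small lookup structure. This is only a sketch of the hierarchy as stated on the cards; the names `TE_CLASSES` and `classify` are hypothetical and chosen for illustration.

```python
# TE hierarchy as described on the cards: class -> LTR status -> element types.
TE_CLASSES = {
    "DNA transposon": {
        "mechanism": "cut and paste (DNA intermediate)",
        "groups": {},
    },
    "retrotransposon": {
        "mechanism": "copy and paste (RNA intermediate)",
        "groups": {
            "LTR": ["ERV"],
            "non-LTR": ["LINE", "SINE", "SVA"],
        },
    },
}


def classify(element):
    """Return (class, group, mechanism) for an element type, or None if unknown."""
    for te_class, info in TE_CLASSES.items():
        for group, members in info["groups"].items():
            if element in members:
                return te_class, group, info["mechanism"]
    return None


print(classify("SINE"))
print(classify("ERV"))
```

Autonomy is deliberately left out of the structure: the cards state it only for LINEs (autonomous) and SINEs (non-autonomous, dependent on LINE-derived proteins).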
Where do TEs originate?
The origin of most TEs is uncertain. Whereas ERVs are likely to have arisen from ancient viral infections, some non-LTR retrotransposons may have evolved from self-splicing group II introns in bacteria. These group II intron TE predecessors are still mobile in eukaryote organelles and gave rise to the spliceosome, thereby contributing to eukaryote evolution.
Many SINEs arose from cellular RNAs such as tRNAs. Whereas SINEs evolved several times independently, LINEs are all related to each other and can be traced back to eukaryotes. RNA-binding domains of diverse TE-encoded proteins even occur in all cellular life
Do mammalian TEs still jump?
Although in most mammals DNA transposons and the majority of retrotransposons are inactive, some copies of distinct retrotransposon families can still mobilise including LINEs (L1; also known as LINE-1), SINEs (B1, B2) and ERVs (intracisternal A-type particles (IAPs) and early transposons (ETns)) in mice and L1 and Alu in humans.
To what extent do the different TEs make up our genome?
L1: 16.9%
Alu: 10.6%
SVA: 0.2%
Other non-LTR retrotransposons: 6%
LTR retrotransposons: 8.3%
DNA transposons: 2.8%
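Summing these percentages is a useful sanity check against the overall figure of about 45% of the genome. A minimal Python sketch, with an illustrative dictionary name:

```python
# Share of the human genome (in percent) for each TE category listed above.
genome_share_percent = {
    "L1": 16.9,
    "Alu": 10.6,
    "SVA": 0.2,
    "Other non-LTR retrotransposons": 6.0,
    "LTR retrotransposons": 8.3,
    "DNA transposons": 2.8,
}

# Total TE content, to compare with the ~45% quoted earlier.
total_te_percent = sum(genome_share_percent.values())

# Everything except DNA transposons is a retrotransposon.
retrotransposon_percent = total_te_percent - genome_share_percent["DNA transposons"]

print(f"All TEs: {total_te_percent:.1f}% of the genome")
print(f"Retrotransposons: {retrotransposon_percent:.1f}% of the genome")
```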
Comment on the length and activity of LINE1
- Full-length LINE1 elements are 6 kb long
- 99.9% of LINE1 insertions are truncated at the 5′ end (= inactive)
- Active throughout primate evolution
- Active in humans
Describe the evolution of LINE1
L1 elements replicate via an RNA intermediate that is copied into genomic DNA at the site of insertion. This mechanism of replication is not very efficient and generates mostly defective copies that are truncated at their 5′ end
These copies can be classified into families of hundreds to thousands of elements based on the shared nucleotide differences they inherit from their common progenitor (or group of closely related progenitors). Because the vast majority of L1 inserts are pseudogenes, they accumulate mutations at the neutral rate
Consequently, older families are more divergent than younger ones. Phylogenetic analyses of L1 families in murine rodents and in primates have shown that, over the long-term, a single lineage of L1 families amplifies and evolves, one family replacing its predecessor as the dominant family. Families of closely related variants can occasionally coexist for short periods of time until one family prevails and dominates the replicative process.
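Because inactive copies accumulate mutations at the neutral rate, the divergence of a family's copies from their consensus sequence (a proxy for the progenitor) can serve as a rough molecular clock: age ≈ divergence / neutral rate. The Python sketch below illustrates this relationship only. The function name and the numbers are hypothetical placeholders, not values from the text, and a constant neutral rate is assumed.

```python
def estimate_family_age_myr(divergence_from_consensus, neutral_rate_per_site_per_myr):
    """Rough age of an L1 family in million years (Myr).

    divergence_from_consensus: average fraction of sites at which copies
        differ from the family consensus.
    neutral_rate_per_site_per_myr: substitutions per site per million years
        (an assumed input, not a value given in the text).
    """
    if neutral_rate_per_site_per_myr <= 0:
        raise ValueError("neutral rate must be positive")
    return divergence_from_consensus / neutral_rate_per_site_per_myr


# Placeholder numbers for illustration only; they are not measured values.
example_rate = 0.002
younger = estimate_family_age_myr(0.01, example_rate)
older = estimate_family_age_myr(0.10, example_rate)

print(f"Less divergent family: ~{younger:.0f} Myr")
print(f"More divergent family: ~{older:.0f} Myr")
```

Under these assumptions, a more divergent family always comes out older, which is the point made on the card.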
How conserved are LINEs?
They are not conserved between species: human LINE insertions are human-specific, and other species have their own. Newborn children can even carry LINE insertions that no one else has.
What accounts for the ‘bulk’ of retrotransposition in the human population?
There are 80–100 ‘hot’ retrotransposition-competent L1s in an average human being.
What LINE is currently active?
- Currently L1PA1 (= L1Hs) is active
- ~1,500 L1Hs insertions in humans
- ~128 non-reference L1 insertions (not in the reference genome)
What is the composition of LINE1 elements?
A full-length element is 6 kb long and contains a 5′ untranslated region (5′UTR), two open reading frames (ORF1 and ORF2), and a 3′UTR. The 5′UTR has a regulatory function, ORF1 has an uncertain function (possibly RNA binding), and ORF2 encodes an endonuclease and a reverse transcriptase. L1 has its own promoter in the 5′UTR, which drives transcription of the whole element, and another regulator whose function is unknown.
Where did LINE1 stem from?
It is thought to come from a virus that infected vertebrates a long time ago and no longer exists
Are LINE1s autonomous? What does this mean?
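Yes. They are autonomous because they encode the proteins needed for their own retrotransposition, unlike non-autonomous elements such as SINEs, which rely on LINE-derived proteins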