L4: Segmental Duplications and Fusion Transcripts Flashcards

1
Q

Colette’s current research line is regarding the role of human-specific duplicated genes in neurological disorders. Regarding evolution, in what two ways can this be studied?

A

Humans and prior: What genetic changes have occurred in the human genome during evolution? How might these affect human brain development?

Humans: Is there genomic variation in modern humans that exists as part of ongoing evolution? Could these changes have given our species an increased susceptibility to certain diseases?

2
Q

On what four levels does genomic evolution roughly take place?

A

1) Changes in the coding part of genes
2) Changes in how genes are regulated
3) Changes in (epi)genomic environment?
4) Creation of new genes / deletion of existing genes

3
Q

Name four ways in which the creation of new genes or deletion of existing genes could occur

A
  • Retrotransposition of processed pseudogenes
  • De novo genesis from non-coding DNA
  • Gene duplication
  • Gene fusion
4
Q

How may de novo genesis from non-coding DNA occur?

A

A non-coding sequence that previously could not encode a protein undergoes a sequence mutation that allows translation to occur

5
Q

How can gene duplication occur?

A

Error during replication

6
Q

How may gene fusion occur?

A

A deletion fuses two genes, which then create a new transcript

7
Q

To what extent do non-orthologous regions between human and chimp account for our differences?

A

Non-orthologous regions between human and chimp account for 3% of genetic differences

8
Q

Do segmental duplications account for any differences in our genome between humans and chimps?

A

Yes, there is no overlap whatsoever between the human-specific SDs and chimp SDs

9
Q

What is one major way that duplications have functioned as a mechanism of genome evolution?

A

There have been several whole-genome duplication (WGD) events in the genome of the vertebrate ancestor

The ancestral chordate had seventeen chromosomes. Subsequent to one WGD resulting in 34 chromosomes, a second WGD plus chromosomal fusions resulted in 54 chromosomes at the base of the vertebrates. Additional chromosome fusions occurred in the lineages leading to euteleostomes (bony vertebrates) and amniotes (including birds and mammals). A third WGD occurred in the teleost lineage.

10
Q

What is the relevance of these WGDs for evolution?

A

Opens the floodgates for a whole new wave of evolutionary changes

11
Q

What is a much more common duplication event than WGD?

A

Much more frequent are smaller duplications of subsections of the genome: segmental duplications:
Genomic regions greater than 1 kilobase (but usually 100s of kbp to mega bp), with greater than 90% identical sequence.
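
The two thresholds in this definition can be written as a simple check. This is only a minimal sketch: the function name and its inputs (an alignment length and a count of matching bases between two genomic regions) are illustrative assumptions, not part of any particular annotation pipeline.

```python
def is_segmental_duplication(aligned_length_bp, matching_bases,
                             min_length_bp=1000, min_identity=0.90):
    """Apply the card's definition: longer than 1 kb and more than 90% identical.

    aligned_length_bp and matching_bases are hypothetical inputs describing
    a pairwise alignment between two regions of the same genome.
    """
    if aligned_length_bp <= 0:
        return False
    identity = matching_bases / aligned_length_bp
    return aligned_length_bp > min_length_bp and identity > min_identity


# A 250 kbp alignment at 98% identity passes both thresholds.
print(is_segmental_duplication(250_000, 245_000))
# An 800 bp alignment is too short, whatever its identity.
print(is_segmental_duplication(800, 790))
```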

12
Q

What is the relevance of these segmental duplications for our genome evolution?

A

These events have been the origin of many new protein-coding genes during evolution (NOTCH2NL, ARHGAP11B etc. arise from this)

13
Q

What event during mitosis can incur duplications or deletions in the genome?

A

Non-homologous crossover can cause duplications/deletions in the genome: Unequal crossing over leads to a reciprocal duplication and deletion
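
A toy model can make the reciprocal outcome concrete. The sketch below assumes a chromosome can be represented as a list of labelled blocks, with a gene "G" flanked by two similar repeats "R1" and "R2" (all labels are hypothetical). It also simplifies the exchange to a clean cut between blocks rather than a crossover inside the repeats.

```python
def unequal_crossover(chrom_a, chrom_b, break_a, break_b):
    """Exchange the right-hand arms of two chromatids cut at different positions."""
    recombinant_1 = chrom_a[:break_a] + chrom_b[break_b:]
    recombinant_2 = chrom_b[:break_b] + chrom_a[break_a:]
    return recombinant_1, recombinant_2


# Two copies of the same chromosome: gene G flanked by similar repeats.
chromatid = ["left", "R1", "G", "R2", "right"]

# Misalignment: R2 on one copy pairs with R1 on the other, so the first
# copy is cut after R2 (index 4) and the second after R1 (index 2).
duplication, deletion = unequal_crossover(chromatid, chromatid, 4, 2)
print(duplication)  # carries G twice
print(deletion)     # has lost G
```

One product gains exactly what the other loses, which is why the duplication and the deletion are described as reciprocal.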

14
Q

Where in the genome does non-homologous crossover often occur?

A

It often occurs in regions with high levels of sequence similarity or repetitive DNA. Similar sequences are lined up by mistake and genetic material is exchanged, which leads to the swap and deletion of large portions of DNA. These events are therefore not randomly distributed: they occur at hotspots, in certain clusters or loci, where some homologous sequence triggers the exchange. Because the exchange often happens around the same sequences in different people, it can lead to disorders.

15
Q

Why is non-homologous crossover relevant to evolution?

A

An important source of novel genetic material and new genes! (It often leads to negative effects, but rarely it can lead to something evolutionarily beneficial)

There was a ‘burst’ of duplications in the human-great ape lineage; Duplications were highest in the ancestral branch linking humans and great apes

16
Q

How frequent are deletions as opposed to duplications?

A

3-fold excess of duplications over deletions

17
Q

What are duplications in NHC linked to?

A

Duplications are linked to segments of DNA known as core duplicons

18
Q

What can these segments of DNA called core duplicons lead to?

A

Segments of DNA called core duplicons result in the serial accumulation of segmental duplications. Results in increasingly larger duplication blocks (100s of kbp)

19
Q

What pattern is seen in evolutionarily younger duplicated segments?

A

Evolutionarily younger duplicated segments are located at increasing distances from the core

20
Q

What is the relevance of core duplicons to segmental duplications?

A

Most human-specific segmental duplications are located around core duplicons

21
Q

Give an example of a human-specific family of genes arising from segmental duplication

A

The morpheus gene family, also named the nuclear pore interacting protein (NPIP) family, is one of the best studied human core duplicon gene families. It originated from one gene present on chromosome 20 in macaques and expanded in the great ape–human lineage along chromosome 16 through segmental duplications. It can be subdivided into two distinct subfamilies, NPIPA and NPIPB, which mostly differ with respect to exon 5 and the structure of amino acid repeats in the C-terminus

22
Q

How is the NPIP gene relevant to evolution?

A

It has not been possible to identify paralogs outside of primates, i.e. it appears to be a newly evolved or rapidly evolving gene. In fact, the Morpheus gene family was shown to be one of the most rapidly evolving gene families during hominoid evolution

23
Q

Describe how segmental duplications can be evolutionarily beneficial

A

Segmental duplications can include genes or parts of genes. There are many examples in which a complete or partial gene duplication has created an entirely new human-specific gene over the course of evolution.

Genes arising from duplication events are not subject to strong selection, so can accumulate mutations at a greater pace. Segmental duplications also accelerate evolution by providing homologous sequences that encourage further rounds of duplication. Many genes located in highly duplicated regions serve critical neurodevelopmental functions.

24
Q

How may an entirely new gene arise from a duplication? What else could happen regarding the gene following a duplication? (4)

A

Mutations could accumulate on the duplicated copy so that it takes on a novel function OR degenerates and becomes a pseudogene OR the duplicated and parental copy undergo subfunctionalisation.

If only part of the gene is duplicated, it might already have a different function to its parental gene.

If it is placed in a new genetic context, it may be expressed in different spatial or temporal context.

It could be adjacent to other genes so that fusion transcripts are formed from transcriptional readthrough (for example, if it is placed upstream of an existing gene, transcription that starts in the first gene can continue into the second and produce a fusion transcript)

25
Q

If a segmental duplication is not evolutionarily beneficial, what else can arise from it?

A

Segmental duplications can lead to serious disorders: Highly repetitive and unstable regions are much more prone to deletions and duplications. Unequal crossover events can lead to the loss of genes that have acquired a critical new function.

26
Q

Give seven examples of disorders arising from segmental duplications from one family of disorders

A

Copy number variations in multiple human-specific genes have been linked to neurodevelopmental disorders:
* Developmental delay
* Autism
* Schizophrenia
* Epilepsy
* Spinal muscular atrophy
* Microcephaly
* Macrocephaly
* Others…

27
Q

Describe the figure shown in the slides depicting how evolutionary innovation can lead to genomic instability

A

Increasing genomic complexity can lead to segmental duplications which can further this complexity and later give rise to new genes and functionality. This can also give rise to genetic deletions and duplications which can then give rise to neurological disorders.

28
Q

Why do we stand to benefit from knowing the evolutionary history of these duplicated regions?

A

The human genome has many regions that have undergone a high amount of segmental duplications or other genomic rearrangements over evolution. These regions are also prone to continued rearrangements and these can cause genetic disease
Knowing the evolutionary history of these regions can help us understand what goes wrong in these diseases

29
Q

How common are fusion transcripts?

A

While structural rearrangements can place whole genes or parts of genes in new contexts so new “fusion transcripts” arise, “fusions” between two adjacent genes are very common

Basically all pairs of adjacent genes show some level of transcriptional readthrough, but usually at very low levels (~100 times less than the two full-length genes). But in some cases, such fusion genes could become evolutionarily beneficial and take on a novel function

30
Q

Where has there been a lot of work done on fusion transcripts (field)?

A

There has been much focus on the role of fusion transcripts in cancer, but they are also abundant in normal tissues (e.g. brain).

31
Q

In a karyoplot showing fusion transcripts, what do long loops typically depict?

A

(The long-range loops shown on the karyoplot likely come from DNA sequences that have undergone structural rearrangements, and their new location is not properly marked in the reference genome.)

32
Q

In what two ways can fusion transcripts vary?

A

Fusion transcripts vary in expression level and frequency in the population

Some are widespread, others are found in very few individuals

33
Q

What clinical relevance do fusion transcripts then have?

A

Could differences in fusion transcript expression underlie disease susceptibility in some cases?

34
Q

What is the relevance of segmental duplications to fusion transcripts?

A

There is a tendency for fusion transcripts to cluster in regions enriched with segmental duplications, although they are also found in non-duplicated regions.

35
Q

What region of the genome was selected for the lecture due to its richness in segmental duplications?

A

17q21.31 is a region enriched with segmental duplications

36
Q

What is significant about the variation of this region in the population? (what was originally thought)?

A

The region can occur in direct orientation (H1) or inverted orientation (H2) in the human population (25%). Contains multiple genes implicated in neuronal functioning including CRHR1, MAPT and KANSL1.

E.g.
H1: CRHR1 (=>) MAPT (=>) KANSL1
H2: KANSL1 (=>) CRHR1 (<=) MAPT (<=)

37
Q

Why is this H1 and H2 description of 17q21.31 not completely accurate?

A

There are 8 structural haplotypes (sets of DNA variations, or polymorphisms, that tend to be inherited together) within this region. This reflects that there is a lot more variation hidden when considering one reference human genome.

38
Q

Describe the 8 haplotypes present in 17q21.31

A

Region containing CRHR1 and MAPT: [g]
Region containing KANSL1: [y]
Region adjacent to that containing KANSL1 in original models: [b]

H1’: Three haplotypes:
[g >] [< y] [b] [b]
[g >] [< y] [b]
[g >] [< y] [b] [b] [b]

H1D: Two haplotypes:
[g >] [< y] [y] [b] [b]
[g >] [< y] [y] [b] [b] [b]

H2’: Two haplotypes:
[y >] [< g] [b]
[b] [y >] [< g] [b]

H2D: One haplotype:
[b] [y >] [< g] [y] [b]

39
Q

What syndrome is associated with 17q21.31?

A

17q21.31 microdeletion syndrome (Koolen de Vries syndrome):
* Developmental delay, intellectual disability
* Cheerful, social disposition
* Distinctive facial features
* Epilepsy
* Hypotonia
* Cardiac, kidney and skeletal abnormalities

40
Q

Where did the difficulty lie in detecting segmental duplications in people?

A

When sequencing for the reference genome, fragments of DNA of ~300,000 bp were Sanger sequenced and assembled together (“contigs”). The consensus sequence was taken, and transposable elements or SNPs not aligned with the consensus were not integrated into the reference genome.

This method of assembly does not allow for correct mapping of loci containing recent duplications. The reference genome is nevertheless what is used when examining a given person’s genome: their DNA is sequenced and mapped to the reference genome.

41
Q

Describe how we can detect variation in segmental duplications between people (but not orientation)

A

Whole genome sequencing can be used and coverage/ density plots can show the number of reads that were mapped in that location in the reference genome. Regions with increased / reduced coverage indicate duplications / deletions

In areas of duplication you should see a rise in the density plot. This is meant to represent how many sequences have mapped to that region of DNA, can see how many copies someone has in a given location. Doesn’t tell us where they are on the genome. Analysing coverage from sequencing data therefore tells us about copy number, but not about orientation and arrangement of segmental duplications
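
The coverage reasoning above can be sketched as a small calculation. This is a simplified illustration, assuming per-position read depths are already available as plain lists and that most of the genome is present in two copies; the function name and all depth values are hypothetical.

```python
from statistics import median


def estimate_copy_number(region_depths, genome_depths, ploidy=2):
    """Scale a region's median read depth by the genome-wide median depth."""
    baseline = median(genome_depths)
    if baseline == 0:
        raise ValueError("genome-wide median depth is zero")
    return ploidy * median(region_depths) / baseline


genome_depths = [30, 31, 29, 30, 32, 28, 30]  # typical two-copy coverage
print(estimate_copy_number([44, 46, 45], genome_depths))  # raised coverage: 3.0 copies
print(estimate_copy_number([15, 14, 16], genome_depths))  # reduced coverage: 1.0 copy
```

As on the card, the result is a copy number only: nothing in the depth values says where the extra copy sits or which way round it is.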

42
Q

How can copy number and orientation be detected?

A

Copy number and orientation can be detected with fluorescence in situ hybridisation (FISH). FISH uses fluorescent DNA probes to target specific chromosomal locations within the nucleus, resulting in colored signals that can be detected using a fluorescent microscope.

43
Q

How was the evolutionary history of the 17q21.31 locus reconstructed and what was found? (3)

A

Aligning sequences between different haplotypes allows us to estimate the time to most recent common ancestor and reconstruct the evolutionary trajectory of the region.

The H2 (“inverted”) orientation is the ancestral haplotype. The inversion occurred independently in humans and chimpanzees. This region is prone to recurrent inversions.

H2 only: New World & Old World monkeys

H1/H2 polymorphic: Orangutans, gorillas, chimpanzees, humans

44
Q

How are there patterns in the evolutionary history of the 17q21.31 locus within humans?

A

From our H2 ancestor in western Africa, there was an H1-H2 divergence 2.3 million years ago, followed by migration south within Africa. H1’ carriers also migrated to East Africa.

There was also an African-European H2 divergence with the out-of-Africa migration. This included an H2’-H2D divergence 1.3 million years ago and an H1’-H1D divergence 250,000 years ago. H1’ carriers migrated to Asia and the others migrated to Europe (?)

45
Q

Has there been selection of particular 17q21.31 haplotypes during evolution?

A

When comparing genetic sequences across the locus, we see that H2D individuals are extremely similar to each other (extremely low genetic diversity). This is suggestive of a recent bottleneck followed by a population expansion or selective sweep.
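
One common way to put a number on "genetic diversity" is the average proportion of sites that differ between pairs of sequences. The sketch below assumes that measure and uses made-up aligned sequences of equal length; it is not necessarily the statistic used in the lecture.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average per-site difference over all pairs of aligned sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(sequences, 2))
    differences = sum(
        sum(a != b for a, b in zip(first, second)) for first, second in pairs
    )
    return differences / (len(pairs) * length)


# Hypothetical haplotype samples: one varied group, one nearly identical group.
varied = ["ACGTACGT", "ACGTTCGT", "ACCTACGA", "TCGTACGT"]
similar = ["ACGTACGT", "ACGTACGT", "ACGTACGT", "ACGTACGA"]
print(nucleotide_diversity(varied))   # 0.25
print(nucleotide_diversity(similar))  # 0.0625
```

A group whose members are nearly identical, as described for H2D, gives a value close to zero.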

46
Q

Describe specifically what changes in 17q21.31 are attributed to Koolen de Vries syndrome and what is observed in neurons of patients with this disease

A

Microdeletion syndrome is caused by loss of the gene KANSL1. Most of the symptoms of Koolen de Vries syndrome have been attributed to loss of KANSL1. KANSL1 is part of an epigenetic modifying complex, which influences gene expression via H4K16 acetylation. Koolen de Vries syndrome neurons show synaptic defects due to oxidative stress and proliferation defects.

47
Q

Is this region of the genome linked to any other diseases apart from 17q21.31 microdeletion/ Koolen de Vries syndrome? Explain

A

17q21.31 is also linked to risk of neurodegenerative diseases: Genome-wide association studies (GWAS) look at single nucleotide polymorphisms (SNPs) throughout the genome, and assess which of these are statistically linked to disease. There are certain SNPs found more often on H1 haplotypes, and others found more often on H2.

H2-associated SNPs have been linked to:
o Reduced incidence of Parkinson’s disease, Alzheimer’s disease, and progressive supranuclear palsy
o Increased fecundity (reproductive potential)
o Larger intracranial volume

Thus it could be said that a forward orientation has an increased risk compared to reverse orientation.

48
Q

What is more likely than these being SNPs according to Colette?

A

If someone has a forward orientation and someone else has a reverse orientation, the two haplotypes don’t recombine, suggesting they have been evolving separately.

A more likely underlying factor than the individual SNPs is that these SNPs are inherited together with something else in the locus.

49
Q

How could this alternative manner of variation in 17q21.31 have come about?

A

It could have been due to segmental duplication:

Most people with the protective H2 haplotype are H2D, i.e. have an extra segmental duplication (giving a partial KANSL1 duplication) not found in the reference genome. Perhaps this duplication contains some genetic material that affects neurodegenerative disease risk.

50
Q

Why don’t people with H1 have the same risk then, since H1D also has a partial KANSL1 duplication arising from segmental duplication?

A

The proportion of H1 people that are H1D is much lower, which could explain why the H1 group has higher risk overall.

30% of H1 individuals have a partial KANSL1 duplication (H1D), while over 95% of H2 individuals have a partial KANSL1 duplication (H2D).

SNPs in H1 individuals are associated with increased risk, while SNPs in H2 individuals are associated with reduced risk. Perhaps something within this duplication influences neurodegenerative disease risk.

51
Q

What differences are there in the transcription of H1D and H2D? (2)

A
  1. There is one full-length KANSL1 transcript in all haplotypes. The partial duplication in H1D covers KANSL1 exons 1, 2 and 3 before being truncated, while the one in H2D covers only exons 1 and 2.

No duplication:
[g >] [< y] [b] [b]

H1D:
[g >] [< y] [y] [b] [b]

H2D:
[b] [y >] [< g] [y] [b]

  2. KANSL1 genetic duplications also create different fusion transcripts in H1D and H2D
52
Q

Describe how the KANSL1 genetic duplications create fusion transcripts

A

They looked in RNA sequencing data for all sequencing reads containing 3’ end of KANSL1 exon 3 and found that KANSL1 exon 3 fused to the ARL17A/B exon but out of frame.

They also looked in RNA sequencing data for all sequencing reads containing 3’ end of KANSL1 exon 2 and found that exon 2 fused to the KANSL1 alternative exon but out of frame BUT it also fused to the LRRC37A3 in frame.

They found that a small proportion of H1’-H1’ individuals could possess any of these fusions.

All H1D-H1D individuals possessed the exon 3 ARL17A/B fusion and none of the others.

H2D-H2D individuals all possessed the alternate out-of-frame fusion, a large proportion possessed the in-frame LRRC37A3 fusion, and none possessed the ARL17A/B fusion.

Therefore the KANSL1 fusion transcripts result from the fusion of KANSL1 exon 2 or 3 with exons of other nearby genes, or novel exons and they are strongly linked to specific 17q21.31 haplotypes (H1D and H2D).
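
The read-searching step described above can be sketched as a string search: find reads that contain the 3’ end of an exon, then check which sequence follows it. All sequences and names below are made-up placeholders, not real KANSL1, ARL17A/B or LRRC37A3 sequence, and a real analysis would also need to handle mismatches and reverse-complemented reads.

```python
def classify_fusion_reads(reads, exon_end, partner_starts):
    """Count reads by the sequence that follows a given exon's 3' end.

    exon_end: last bases of the exon of interest (hypothetical sequence).
    partner_starts: maps a label to the first bases of a candidate next exon.
    """
    counts = {name: 0 for name in partner_starts}
    counts["unassigned"] = 0
    for read in reads:
        position = read.find(exon_end)
        if position == -1:
            continue  # read does not cover the exon's 3' end
        downstream = read[position + len(exon_end):]
        for name, start in partner_starts.items():
            if downstream.startswith(start):
                counts[name] += 1
                break
        else:
            counts["unassigned"] += 1
    return counts


reads = [
    "AAGGATCCAGTTCAGGCC",  # exon end followed by the canonical next exon
    "CCGGATCCAGACGGTATT",  # exon end followed by the partner gene exon
    "TTGGATCCAGACGGTACG",  # same fusion junction again
    "GGGGATCCAGCCCCCC",    # exon end followed by an unknown sequence
    "ACGTACGTACGT",        # unrelated read
]
partners = {"canonical_next_exon": "TTCAGG", "partner_gene_exon": "ACGGTA"}
print(classify_fusion_reads(reads, "GGATCCAG", partners))
```

Comparing such counts between individuals of known haplotype is what would link a given junction to H1D or H2D.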

53
Q

How strongly expressed are these fusion transcripts?

A

These fusion transcripts are highly expressed, sometimes as highly as the normal full-length KANSL1 mRNA (the out-of-frame ARL17A/B fusion in H1D-H1D individuals). However, in H2D-H2D individuals the out-of-frame novel exon fusion was at around 50% of full-length KANSL1 expression and the LRRC37A3 fusion at around 10%.

The potential consequences of these transcripts for protein expression depend on whether the fusion is in-frame or out-of-frame.
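
Whether a fusion is in-frame comes down to arithmetic modulo 3. The sketch below assumes the junction joins coding sequence to coding sequence and that two numbers are known: the coding bases contributed by the upstream gene, and the coding bases that normally precede the fused exon in the downstream gene. Both parameter names and the example values are hypothetical.

```python
def fusion_is_in_frame(upstream_coding_bases, downstream_cds_offset):
    """Return True if the downstream exon keeps its original reading frame.

    upstream_coding_bases: coding bases from the upstream gene's start codon
        to the fusion junction.
    downstream_cds_offset: coding bases that precede the fused exon in the
        downstream gene's normal transcript.
    """
    return upstream_coding_bases % 3 == downstream_cds_offset % 3


print(fusion_is_in_frame(300, 150))  # True: both are multiples of 3
print(fusion_is_in_frame(301, 150))  # False: the frame is shifted by one base
```

An out-of-frame junction changes every downstream codon, so it will usually hit a premature stop codon, whereas an in-frame junction could in principle yield a chimeric protein.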

54
Q

How were these findings regarding the KANSL1 fusion transcripts validated?

A

These KANSL1 fusion transcripts were validated by PCR amplification from RNA extracted from people with the segmental duplication. This was carried out via reverse transcriptase (RT)-PCR followed by Sanger sequencing.

55
Q

How does the genomic assembly of H2D lend insight to these findings?

A

Long-read sequencing shows us the full exon structure of the fusion transcripts. In the H2D alternate genome assembly, KANSL1 is upstream of LRRC37A3: segmental variation has placed part of KANSL1 upstream of this other gene, and transcriptional readthrough generates the fusion transcript.

56
Q

How would we explore whether these KANSL1 fusion transcripts have a function?

A

The next step is expressing these fusion transcripts in neurons and/or organoids, to establish their potential function. From there we can do RNA-seq, ChIP-seq, electrophysiology etc.

57
Q

Relate the earlier figure regarding increased genomic complexity to these new findings

A

Increasing genomic complexity can lead to segmental duplications such as those which occurred repeatedly in the 17q21.31 locus; originating from a core duplicon.

This can further this complexity and later give rise to new genes and functionality: Partial KANSL1 duplication and fusion transcripts
* Protection against neurodegenerative diseases?
* Another role in neurodevelopment?

This can also give rise to genetic deletions and duplications:
Due to the high level of sequence similarity in nearby segmental duplications

These can then give rise to neurological disorders:
17q21.31 microdeletion / Koolen de Vries syndrome
