Gene structure Flashcards
Splicing (revision)
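- 2’-OH of the branch-point adenosine (within the intron) attacks the phosphodiester bond between the last base of the exon and the first base of the intron (5’ splice site)
- 3’-OH of the last base of the exon performs a nucleophilic attack on the phosphodiester bond between the last base of the intron and the first base of the next exon (3’ splice site)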
- Now have 2 exons ligated together; left w/intron in lariat structure
* The two transesterification steps themselves do not require free energy (ATP); ATP is used for spliceosome assembly and rearrangement
* Can splice over large distances (10s of kilobases)
* Exon skipping → ligate exon 1 and 3; skip exon 2
* Cryptic splice site → splicing of exon 1 into middle of exon 2 → can cause frameshift in protein
* Co-ordinated by spliceosome and co-factors
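The exon skipping and cryptic splice site cases above can be sketched with toy sequences. This is only an illustration: the exon sequences are invented, and `splice` and `frame_is_preserved` are hypothetical helper names, not part of any library.

```python
# Toy pre-mRNA: exon sequences are invented for illustration only.
EXONS = {
    1: "ATGGCC",      # 6 nt
    2: "GATTACAGG",   # 9 nt
    3: "TTTGGGTAA",   # 9 nt
}

def splice(exon_order, cryptic_offsets=None):
    """Join exons in the given order.

    cryptic_offsets maps an exon number to the number of bases dropped
    from its 5' end, mimicking a cryptic acceptor site inside that exon.
    """
    cryptic_offsets = cryptic_offsets or {}
    return "".join(EXONS[n][cryptic_offsets.get(n, 0):] for n in exon_order)

def frame_is_preserved(reference, variant):
    """True when the length difference is a multiple of 3 (no frameshift)."""
    return (len(reference) - len(variant)) % 3 == 0

normal = splice([1, 2, 3])
skipped = splice([1, 3])               # exon skipping: exon 1 joined to exon 3
cryptic = splice([1, 2, 3], {2: 4})    # splice into the middle of exon 2

print(normal, skipped, cryptic)
print(frame_is_preserved(normal, skipped))  # 9 nt removed: reading frame kept
print(frame_is_preserved(normal, cryptic))  # 4 nt removed: frameshift
```

Whether skipping an exon shifts the frame depends only on whether the removed length is a multiple of three, which is why the same check works for both cases.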
Spliceosome
- 5 snRNAs (small nuclear) and ~50 proteins make up the spliceosome
- Core components = 5 snRNPs (small nuclear ribonucleoprotein particles): U1, U2, U4, U5, U6
Assembly and action of the spliceosome
- U1 and U2 assemble onto pre-mRNA in a co-transcriptional manner
- U1 binds at 5’ of intron (donor site)
- U2 binds at 3’ of intron (acceptor site)
- U1 and U2 snRNPs form the pre-spliceosome (complex A)
- Pre-assembled tri-snRNP U4-U5-U6 is recruited to form complex B
- Complex B undergoes a series of rearrangements to form catalytic B’
- U1 and U4 are kicked out
- Complex B’ catalyses first step of splicing, generating complex C (contains free exon 1 and the intron-exon 2 lariat intermediate)
- Complex C undergoes rearrangements catalysing 2nd step ligating 5’ exon to 3’ exon → post-spliceosomal complex contains the lariat intron and spliced exons
- Release of spliced mRNA and lariat w/help of RNA helicases
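As a memory aid, the snRNP content of each complex in the assembly steps above can be tracked with set operations. This is a bookkeeping sketch only; the variable names are made up, and the ~50 associated proteins are ignored.

```python
# snRNP composition of each complex, following the assembly steps in the notes.
complex_a = {"U1", "U2"}                      # pre-spliceosome (complex A)
complex_b = complex_a | {"U4", "U5", "U6"}    # pre-assembled U4-U5-U6 joins
complex_b_active = complex_b - {"U1", "U4"}   # U1 and U4 are kicked out

for name, members in [("A", complex_a), ("B", complex_b), ("B'", complex_b_active)]:
    print(name, sorted(members))
```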
Different organisms have different levels of splicing
- Number of genes in an organism’s genome is not a good assessment of protein diversity
- E. coli: only 0.1% of genes undergo AS (alternative splicing); humans: 95%
Multiple forms of mRNA transcript variation → diversifying the proteome
- Exons retained or skipped
- Introns excised or retained
- 5’ & 3’ splice site positions moved: exons longer or shorter
- E.g. DSCAM gene in drosophila
o Has multiple exons; forms v. specific immunoglobulin regions within the protein
o DSCAM = membrane-anchored cell surface protein; role in neural development for axon and dendrite self-avoidance; 24 exons permit over 38,000 variants
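The "over 38,000 variants" figure for DSCAM can be reproduced by multiplying the number of alternatives at each variable exon position. The cluster sizes below (12, 48, 33 and 2 alternatives for exons 4, 6, 9 and 17) are not stated in these notes; they are the commonly quoted values for Drosophila Dscam and should be treated as an assumption here.

```python
from math import prod

# Assumed number of mutually exclusive alternatives per variable exon position.
ALTERNATIVES = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}

# One alternative is chosen independently at each position, so the counts multiply.
total_isoforms = prod(ALTERNATIVES.values())
print(total_isoforms)
```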
Alternative splicing (AS)
- Allows related but different protein forms in different tissues
- Supplements transcriptional control to control expression/function of gene etc.
- Insertion/deletion of specific domains
- E.g. it can regulate antibody and neuropeptide production
- Can regulate expression by including an exon w/a premature stop codon
o Triggers nonsense-mediated RNA degradation
o Regulates balance of functional to non-functional RNAs
- Unprocessed RNA is not transported to the cytoplasm; if it is transported and translated, the protein is truncated due to a stop codon in the intron
Effect of AS on mRNA and protein
- Rate of translation of mRNA
- mRNA degradation susceptibility
- Insertion/deletion of amino acids
- Insertion/deletion of functional domains
- Polypeptide truncation due to stop codon
- Protein properties and functions:
o Soluble or membrane bound
o Subcellular location changes
o Affinity changes for substrate
- Consider:
o Selection of alternative splice sites can be tissue and developmental-stage specific
o Splice site selection must be tightly regulated → many genetic diseases can be caused by point mutations that activate cryptic splice sites or delete splice sites
3 groups of alternatively spliced transcripts
- 5’ transcript ends differ from one another
- Due to different transcriptional start sites
- Then pre-mRNA processed differently
- 3’ ends differ from one another
- When different poly(A) sites are utilised for transcriptional termination
- Different use in different tissues
- Middle portions differ
- Can’t be explained by using different transcription start or termination sites
- Example: troponin T gene (involved in function of skeletal muscle)
o With different internal exons there are 64 different ways they can be combined to express different mRNAs (found in different muscle types)
o Regulated by tissue-specific splicing factors acting on the pre-mRNA
o Used in heart attack: take blood sample; do PCR to detect presence of heart version of troponin T → heart cells ruptured releasing that mRNA into bloodstream
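One way to arrive at 64 troponin T combinations is to assume five internal exons that are each independently included or skipped, plus one pair of mutually exclusive exons (2^5 × 2 = 64). That decomposition is an assumption for illustration and is not stated in these notes; the exon labels below are hypothetical.

```python
from itertools import product

CASSETTE_EXONS = ["c1", "c2", "c3", "c4", "c5"]  # hypothetical labels: each in or out
EXCLUSIVE_PAIR = ["a", "b"]                       # hypothetical labels: exactly one used

# Enumerate every include/skip pattern combined with each exclusive choice.
isoforms = [
    (tuple(e for e, keep in zip(CASSETTE_EXONS, choice) if keep), alt)
    for choice in product([False, True], repeat=len(CASSETTE_EXONS))
    for alt in EXCLUSIVE_PAIR
]
print(len(isoforms))
```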
Alternative splicing regulation
- Through splicing factors = proteins that recognise cis-acting sequences within the RNA transcript
o They “Promote” or “inhibit” splice sites in different cases
Example: determination of sex in fruit flies (Drosophila)
- Different protein products in males and females
- Splicing cascade
- Top of hierarchy = Sxl (sex lethal) gene → can autoregulate its own splicing
o Sxl is master sex determination gene in somatic cells → inhibits male-specific splicing
o Sxl binds intron and inhibits binding of U2AF factors at splice acceptor sites so stop codon at exon 3 is excluded; splicing goes to next exon producing fully functional Sxl protein
o Have positive feedback to make sure that whenever Sxl is expressed in females it does not include stop codon
o Only expressed in females
- Next: tra (transformer) gene
o In males (no Sxl) → get normal splicing of exon 1 to the beginning of exon 2; within that region there is a stop codon so get small, truncated protein product w/no function
o In females, Sxl binds at the proximal splice site (intron 1) and prevents U2AF binding there; U2AF instead binds a cryptic site in exon 2 (known as the ‘distal splice site’); this allows the stop codon to be spliced out → produces fully functional transformer protein
- Then regulates dsx (doublesex) transcription factor gene
o Males: tra2 has a binding site in exon 4 (ESE = exonic splicing enhancer), but exon 4 (contains sequence that promotes transcriptional termination) is skipped, so exon 3 is spliced to exon 5; get longer protein product; the transcription factor has slightly different properties because of inclusion of exon 5 → it then binds to different targets and recruits different complexes to regulate gene expression
o Females: tra binds to tra2 and recruits members of the U2AF family to promote splicing to exon 4; results in transcriptional termination; shorter protein
- Tra then regulates the splicing of the transcription factor gene fruitless
o Fruitless encodes a transcriptional regulator that determines development; it has isoforms expressed in males and females
o Has male specific transcription start site at P1
o When Tra is present in females, it promotes splicing from end of exon 2 onto exon 3 resulting in inclusion of stop codon
o In males, splicing occurs before stop codon onto exon 3 → get a fruitless isoform w/male-specific region at N-terminus → this can regulate male-specific phenotypes incl. behaviour
- Sxl can also directly inhibit MSL2 involved in dosage compensation
o Males only have one X chromosome, so genes on the X would be expressed at only ½ the female level; to compensate, MSL2 increases expression across the whole chromosome
o Because males do not have Sxl, translation of MSL2 is not inhibited, so it can upregulate expression across the X chromosome
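The cascade above can be summarised as a chain of dependencies on a single input, whether functional Sxl is present. The function below is a simplified sketch of the notes, not a model of the real molecular details; the function name and the output strings are made up for illustration.

```python
def sex_determination_cascade(sxl_functional: bool) -> dict:
    """Follow the hierarchy in the notes: Sxl -> tra -> dsx / fru, and Sxl -| MSL2."""
    # Functional Tra is only made when Sxl removes the stop codon from tra.
    tra_functional = sxl_functional
    return {
        "Sxl": "functional" if sxl_functional else "non-functional",
        "Tra": "functional" if tra_functional else "truncated",
        "dsx": ("shorter form (ends at exon 4)" if tra_functional
                else "longer form (exon 3 joined to exon 5)"),
        "fru": ("stop codon included" if tra_functional
                else "male-specific isoform"),
        "MSL2": ("translation inhibited" if sxl_functional
                 else "active: X-linked expression upregulated"),
    }

female = sex_determination_cascade(True)
male = sex_determination_cascade(False)
for gene in female:
    print(f"{gene:5} female: {female[gene]:32} male: {male[gene]}")
```

Because every downstream state is derived from the one input, the sketch also shows why a single Sxl decision is enough to set the whole pattern.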
Splicing factors
- Act positively (e.g. Tra) to promote the use of a splice site
- Act negatively (e.g. Sxl) to inhibit the use of a site
- Some are constitutively expressed, others tissue/cell-type-specific
Gene hierarchy:
* Each gene product controls splicing of the next gene in hierarchy
Fruitless gene in males
- Controls many aspects of male courtship behaviour
- Orienting → tapping → singing → licking → attempting copulation
Regulating alternative splicing: response to signals
- Levels of intracellular Ca2+ can impact splicing of a gene
- When neurons depolarize → get high Ca2+ → get phosphorylation of specific factor which binds CaRRE site within pre-mRNA → STREX domain is left out; get splicing and formation of gene products less sensitive to calcium
- Activation of neurons results in the alternative splicing of the K+ channel gene SLO
Regulating factors
- SR proteins (serine-arginine rich domain)
- Usually at C-terminal
- Constitutively expressed SR proteins can interact with specifically expressed factors
- Can influence splicing in 2 ways:
a) Bind 5’ splice site & promote U1 snRNP binding
b) Bind ESEs (exonic splicing enhancers) within downstream exon and promote U2AF binding
- Heterogeneous nuclear ribonucleoproteins (hnRNPs)
- Inhibit splicing in general by preventing binding of U1 or U2AF factors
ESE = exonic splicing enhancer
ISE = intronic splicing enhancer
ESS = exonic splicing silencer
ISS = intronic splicing silencer
SR proteins (serine-arginine rich) = stimulate splicing
hnRNPs (heterogeneous nuclear ribonucleoproteins) = hinder splicing
Alternative Splicing dictated by
- RNA sequences,
- Constitutive or tissue-specific trans-acting factors
- Splice site strength:
o Ability to bind general factors (e.g. U1 snRNP)
o Presence or absence of ESEs/ISEs (presence of ESE = stronger)
Link to transcription
- Splicing occurs as transcription is still going on
- Rate of elongation can affect splicing pattern
o If polymerase is moving slowly = more likely to include exons with weak acceptor sites
o Faster = more likely to skip exons with a weak acceptor site
o E.g. shown in vivo in Drosophila, using a mutant line w/slower RNA Pol II
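The link between elongation rate and exon inclusion can be pictured with a toy "window of opportunity" model: a weak acceptor site has a head start until the polymerase has transcribed a competing downstream site, and that head start is longer when the polymerase is slower. The functional form and every parameter value below are assumptions chosen for illustration, not measured quantities.

```python
import math

def inclusion_probability(rate_kb_per_min, gap_kb=2.0, commit_per_min=0.5):
    """Toy model: chance that a weak exon is committed before a competitor appears.

    gap_kb         -- hypothetical distance to the competing downstream site
    commit_per_min -- hypothetical rate at which the weak site gets recognised
    """
    window_min = gap_kb / rate_kb_per_min   # head start of the weak site
    return 1.0 - math.exp(-commit_per_min * window_min)

slow = inclusion_probability(1.0)   # slow polymerase
fast = inclusion_probability(4.0)   # fast polymerase
print(f"slow Pol II: {slow:.2f}  fast Pol II: {fast:.2f}")
```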
Identification of alternative splicing
- Microarrays, now next-generation sequencing
o Can sequence all mRNA produced in cell → analyse to find which exons are being included/excluded → can do this in tissues, different cell types, etc
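Exon inclusion from sequencing data is often summarised as "percent spliced in" (PSI): reads supporting inclusion of the exon divided by all reads informative for that exon. The metric is not named in these notes, and the read counts below are invented for illustration.

```python
def percent_spliced_in(inclusion_reads, exclusion_reads):
    """PSI = inclusion / (inclusion + exclusion) * 100; None if no reads cover the exon."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None
    return 100.0 * inclusion_reads / total

# Hypothetical junction read counts for one exon in two tissues.
counts = {"muscle": (30, 10), "brain": (5, 45)}
psi = {tissue: percent_spliced_in(inc, exc) for tissue, (inc, exc) in counts.items()}
print(psi)
```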
Future research
- Questions:
o Alternative Splicing in complex tissues – how do you know which cells have a specific splice pattern?
o How can you rapidly identify the target genes of certain splice factors in specific cell types?
- What they did → Targeted DamID
o Tagged U2AF50 w/Dam, a protein from E. coli (DNA adenine methyltransferase; methylates the A within GATC sequences)
o Allows profiling of protein-DNA interactions in a cell-specific manner within Drosophila
o Adapted to splicing factors; may see where they are directly interacting w/DNA as genes are transcribed and spliced
o Looked promising; observed strong peaks at 3’ ends of introns; unfortunately, could not repeat it
- They also:
o Cloned other splicing factors and performed Targeted DamID on them
o Most did not show any association w/DNA but Sxl strongly associates w/transcriptional start sites (surprise as not expected)
o Think about whether Sxl could also bring sexual dimorphism in the nervous system through transcriptional regulation, as well as alternative splicing
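Since Dam only methylates adenine within GATC, the resolution of DamID depends on where GATC sites fall in the sequence. A minimal sketch for locating them is shown below; the example sequence is invented and `gatc_sites` is a hypothetical helper name.

```python
import re

def gatc_sites(dna):
    """Return 0-based start positions of every GATC motif in the sequence."""
    return [m.start() for m in re.finditer("GATC", dna.upper())]

example = "ttGATCaaGATCGATCcc"   # made-up sequence
print(gatc_sites(example))
```

GATC is its own reverse complement, so scanning one strand is enough to find the sites on both.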
Intron early theory
- Introns found in all eukaryotic genomes
o Except ‘nucleomorph’ in a species of free-swimming, biflagellate monads
- Walter Gilbert
- Introns originated in prokaryotes, then lost by ‘genome streamlining’ (compressing genome)
- Early intron gains thought to be invasive and deleterious
- Could help promote the ‘Exon theory of gene evolution’
o If you have intronic sequences between protein domains, allows greater rates of recombination
o Can shuffle protein domains in genome to create proteins w/new functions
o Shuffling permitted by introns
- Allowed the creation of complex genes (and a large protein collection!)
Intron late theory
- Introns only evolved once eukaryote formed
- Archaea and bacteria never had introns or spliceosome
- Thought that introns would have jumped into random placement in genes
o Not necessarily corresponding to protein structural elements
Possible ‘in-between’ model
- Prokaryote with group II introns (self-splicing retroelements) → invaded archaea-like cell → ended up forming mitochondria; the resulting cell is known as the last common eukaryotic ancestor (has intron-rich genes)
o Introns formed after endosymbiotic event (ie. formation of mitochondria)
Roles of introns during life phases
- Introns can be a host burden
o Spliceosome complex is huge
o Energy & time cost
o Vulnerability e.g. need recognition of cis-regulatory sequences
- Roles can be classified as:
o ‘Sequence-dependent functions’
o ‘Length-dependent functions’
o ‘Splicing-dependent functions’
Life phases
- Genomic intron
o In DNA form sitting between exons
- Transcribed intron
o From DNA to RNA
- Intron being spliced
- Excised intron
o In lariat form
- Exon junction complex-harbouring transcript
- Genomic intron
Introns within genome harbour transcription initiation sites (e.g. cis-regulatory elements)
Elements to which transcription factors can bind and regulate transcription of the gene the intron is located in
Can be enhancers (promoting transcription), silencers (repressing expression), TF binding sites
Often found in 5’
~40% of binding sites → introns
Example: AFP (α-fetoprotein)
Plasma protein made in the liver and yolk sac in the foetus
Regulates osmotic pressure
Has tissue specific expression
Can have P1 promoter before exon 1 or P2 promoter in first intron → leads to formation of different proteins
Introns harbour transcription termination sites
Intron sequences can regulate Polyadenylation + cleavage
eg. in Flt-1 gene
Soluble version inhibits angiogenesis by binding extracellularly to vascular endothelial growth factor
Has 2 transcriptional termination sites
Full length form (terminates after exon 14) → membrane-bound form
If termination is between exon 13/14 → form soluble form inhibiting angiogenesis
Different termination sites identified using PCR
Use primer binding upstream at poly(A) site; extract RNA; size of PCR bands indicates which form is being made
Can harbour nested genes (whole genes)
~800 in Drosophila melanogaster
May have their own promoter & different expression profile
Are often non-coding RNA & protein-coding genes
- Transcribed introns
- RNA polymerase II: elongation rate up to 50 kb min⁻¹
o Intron transcription may take hours
o Time delay between gene activation and translation of the protein
- HES7 gene (helix-loop-helix transcription factor expressed in mice)
o Oscillation of HES7 protein levels is important for directing mesoderm cells to form somites during embryonic development
o Forms negative feedback loop
o HES7 can repress its own transcription
o Unstable protein
o Delays arise from the time needed for transcription, splicing and translation: HES7 is initially expressed; as levels increase, it feeds back on its own promoter to inhibit its own expression; mRNA levels go down, so protein levels go down; HES7 is no longer repressed, so the cycle can start again
o Introns are v. important for ensuring correct period of oscillation; length of intron can affect timing between peaks
o Mutant where introns removed from HES7 → disrupted body plan → somites formed in tightly clustered manner → lethal to developing organism
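Why the intron-dependent delay matters for oscillation can be seen in a generic one-variable delayed negative-feedback model, dx/dt = beta / (1 + x(t - delay)^n) - x. This is a sketch under stated assumptions, not a model of measured HES7 kinetics: the equation, the parameter values, and the function names are all chosen for illustration, with the delay standing in for the time spent transcribing and splicing introns.

```python
def simulate(delay, t_end=200.0, dt=0.01, beta=2.0, hill=4):
    """Euler integration of dx/dt = beta / (1 + x(t - delay)**hill) - x."""
    lag = int(round(delay / dt))
    steps = int(round(t_end / dt))
    xs = [0.1] * (lag + 1)                 # constant history before t = 0
    for _ in range(steps):
        delayed = xs[-1 - lag]             # protein level one delay ago
        xs.append(xs[-1] + dt * (beta / (1.0 + delayed ** hill) - xs[-1]))
    return xs[lag:]                        # values from t = 0 onwards

def late_amplitude(xs, fraction=0.25):
    """Peak-to-trough range over the final part of the run."""
    tail = xs[int(len(xs) * (1 - fraction)):]
    return max(tail) - min(tail)

def period(xs, dt=0.01, level=1.0):
    """Mean spacing of upward crossings of `level` in the second half of the run."""
    half = len(xs) // 2
    ups = [i for i in range(half, len(xs) - 1) if xs[i] < level <= xs[i + 1]]
    if len(ups) < 2:
        return None                        # no sustained oscillation detected
    return (ups[-1] - ups[0]) * dt / (len(ups) - 1)

for delay in (0.3, 3.0, 5.0):
    run = simulate(delay)
    print(delay, round(late_amplitude(run), 3), period(run))
```

In this toy model a short delay lets the level settle to a steady state, while a longer delay gives sustained oscillations whose period grows with the delay. That mirrors the notes' points that removing introns disrupts oscillation and that intron length affects the timing between peaks.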
- Spliced introns
- Splicing linked to transcription
- Linked via RNAPII C-terminal domain
- Splicing can affect: Initiation, Elongation and Termination
- Initiation → U1 binds at donor site of intron → helps recruit general transcription factors (TFIID and TFIIH)
o TFIID = first protein to bind to DNA during formation of pre-initiation transcription complex of RNA Pol II
o U1 can help promote initiation → exploited to increase expression of transgenes in cell culture/transgenic organisms
- Elongation → U1 interacts w/the RNA Pol II-associated elongation factor TAT-SF1 to increase efficiency of elongation
- Termination → Endonucleolytic cleavage and poly(A) tail addition to mRNA
o U2 binds at 3’ end and promotes action of CPSF protein to cause termination → CPSF can also help recruit and improve efficiency of U2 acting at specific acceptor sites
o U1 inhibits action of CPSF