Paul Gardner Flashcards
outline the importance of ncRNAs?
ncRNAs are central to life’s major processes; some studies suggest >80% of the human genome is transcribed, much of it into ncRNAs
ncRNAs involved in central dogma reactions:
Y RNA important for replication initiation
spliceosome packed with ncRNAs; potentially involved in catalysis of splicing reaction
RNase MRP is an RNP whose RNA component cleaves long pre-rRNA transcripts to help produce mature rRNAs for the ribosome
snoRNP guides modifications of rRNAs
RNase P processes precursor tRNAs
a lot of these are probably ribozymes
what is a vault organelle?
example of ncRNA
eukaryotic organelle (a large ribonucleoprotein particle containing vault RNAs); function not fully understood yet
larger than the ribosome and highly conserved across many eukaryotes
knockouts are viable
what is a PrfA thermoswitch?
example of an ncRNA-based mechanism for thermoregulation of translation
PrfA protein is a master coordinator of a major virulence regulon in Listeria; translation of PrfA mRNA is regulated by a thermoswitch upstream of the start codon
at low temps the 5’ UTR forms a secondary structure that masks the ribosome binding site = no translation; when we ingest Listeria our body heats the RNA to ~37°C, the structure melts and the ribosome can bind
ncRNAs are some of the most ___ molecules of life?
highly conserved
what are self-splicing RNAs?
RNAs (e.g. group I and group II introns) that, once transcribed, fold into a secondary structure and cut themselves out of the surrounding RNA
how was it identified that transcription is pervasive?
early evidence from EST and microarrays suggested pervasive transcription
later confirmed by ENCODE and RNA-seq experiments
how do we define ncRNAs?
can be defined in many ways; two extremes:
All RNAs other than mRNA i.e. transcription alone sufficient to define a ncRNA
RNA genes which produce transcripts that function as structural, catalytic or regulatory RNAs rather than encoding a protein i.e. requires evidence of function; "function" itself is also often used ambiguously (e.g. causal role vs selected effect)
what is the evidence suggesting a lot of transcription is noise?
some studies have shown a lot of transcription occurs from random DNA sequence
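TF binding sites are short and degenerate so they occur all over the genome by chance; binding at these can drive spurious transcription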
we have a big genome which can be broken down into functional and junk bits; both can end up getting transcribed
discuss the proportions of cellular mammalian RNAs by mass and by molecules?
RNA by mass; rRNA makes up 80-90% due to large size
RNA by number of molecules; tRNA makes up most
so cellular RNA pool very much dominated by rRNAs and tRNAs
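A quick worked example of why mass and molecule counts tell different stories: the mass of an RNA class scales with (number of molecules) × (length). The rRNA and tRNA lengths below are approximate; the molecule counts and the mRNA length are hypothetical, picked only to illustrate the arithmetic, and mass per nucleotide is assumed constant.

```python
# Mass of an RNA class ~ (number of molecules) x (length), assuming a constant mass per nucleotide.
LENGTHS_NT = {
    "rRNA": 7200,  # approx. 28S + 18S + 5.8S + 5S per ribosome
    "tRNA": 76,
    "mRNA": 2000,  # HYPOTHETICAL typical length
}
COUNTS = {  # HYPOTHETICAL relative molecule counts
    "rRNA": 1_000,
    "tRNA": 15_000,
    "mRNA": 100,
}

def fractions(weights):
    """Normalise a dict of non-negative weights so they sum to 1."""
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}

by_mass = fractions({k: COUNTS[k] * LENGTHS_NT[k] for k in COUNTS})
by_count = fractions(COUNTS)

for k in COUNTS:
    print(f"{k}: {by_mass[k]:.0%} of mass, {by_count[k]:.0%} of molecules")
```

With these made-up counts rRNA is ~84% of the mass but only ~6% of the molecules, while tRNA is ~93% of the molecules.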
why is it important to consider the makeup of cellular RNA pool when using RNA-seq methods and what strategies are used as a result?
RNA-seq can be used to sequence pool of transcripts in cell BUT if cellular RNA pool dominated by rRNAs and tRNAs the sequences will be dominated by these
can size select into long and short RNA fractions; can deplete rRNA to better find unique transcripts like ncRNAs; can enrich mRNAs using polyA enrichment to better quantify protein-coding transcripts
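A rough sketch of why depletion and size selection matter, assuming reads are sampled in proportion to the RNA mass left after library prep (ignores fragmentation and length biases). The pool composition and removal efficiencies are hypothetical.

```python
def read_shares(mass_fractions, removal=None):
    """Share of reads per RNA class, assuming reads are proportional to RNA mass kept after prep."""
    removal = removal or {}
    kept = {k: f * (1.0 - removal.get(k, 0.0)) for k, f in mass_fractions.items()}
    total = sum(kept.values())
    return {k: v / total for k, v in kept.items()}

# HYPOTHETICAL pool composition by mass
pool = {"rRNA": 0.85, "tRNA": 0.10, "mRNA": 0.03, "lncRNA": 0.02}

total_rna = read_shares(pool)
rrna_depleted = read_shares(pool, {"rRNA": 0.99})                # hypothetical 99% rRNA removal
depleted_long = read_shares(pool, {"rRNA": 0.99, "tRNA": 0.95})  # plus size selection against short RNAs

for name, shares in [("total", total_rna), ("rRNA-depleted", rrna_depleted), ("+ long fraction", depleted_long)]:
    print(name, {k: round(v, 3) for k, v in shares.items()})
```

With these numbers rRNA depletion alone raises the mRNA share from 3% to ~19% but leaves tRNA at ~63% of reads, which is why depletion is usually combined with size selection.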
what are catalytic RNAs?
RNAs that catalyse reactions aka ribozymes
outline how RNA can act as an enzyme?
RNA can both store genetic information (e.g. RNA virus genomes) and catalyse reactions
in the early 1980s it was discovered that RNA can act as an enzyme (ribozymes) e.g. self-splicing introns
self-splicing was shown to occur via two transesterification reactions (cleavage then ligation)
what is transesterification?
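exchanging the organic group of an ester with the organic group of an alcohol; in RNA, a hydroxyl group attacks a phosphodiester bond, swapping one phosphoester linkage for another
the reaction type ribozymes (e.g. self-splicing introns) use, in two successive steps, to catalyse splicing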
outline the RNA world hypothesis?
proposed solution to the question of which came first: genetic information encoded in DNA or enzymatic activity of proteins
possibly neither; RNA can do both, so RNA may have been the progenitor of modern DNA/RNA/protein-based life
what is the ribozyme RNase P?
a ribonucleoprotein (RNP) found in all branches of life
required for maturing precursor tRNAs; reaction catalysed by ncRNA component
how is the ribosome a protein synthesis ribozyme?
eukaryotic and bacterial ribosomes are both large complexes composed of RNA and protein, with a large subunit (50S in bacteria, 60S in eukaryotes) and a small subunit (30S in bacteria, 40S in eukaryotes); a lot of similarity when aligned
used to be thought that ribosomal proteins drive translation, but they are not that conserved and sit at the periphery of the complex; the catalytic core of the ribosome is highly conserved across all life and entirely RNA
discuss the ribosome catalytic core?
aka peptidyl transferase centre; where peptide bonds form during translation i.e. the growing peptide chain on the P-site tRNA is transferred onto the amino acid carried by the incoming (charged) A-site tRNA, extending the chain
key interactions include with mRNA (RBS w small subunit) and tRNA interactions w large subunit via CCA tail
major target for antibiotic development (and just the ribosome in general)
what is the spliceosome?
takes pre-mRNA and chops out introns between all the exons to produce mature mRNA w no introns
does this by assembling diff RNAs and proteins on mRNA; RNAs do most of the key interactions on complementary regions of mRNA
so spliceosome forms complex on mRNA and catalyses transesterification (cleavage + ligation)
what evidence is there to suggest the spliceosome might be a ribozyme?
catalytic core of the spliceosome is made of RNA and shares a lot of similarities (structure, reactions) with self-splicing group II introns
outline spliceosome assembly and splicing reactions?
involves multiple stages, transient interactions between several ncRNAs, mRNA and proteins
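two key chemical steps:
- cleavage at the 5’ exon-intron boundary (5’ splice site), with the branch site A forming an intron lariat
- ligation of the 5’ exon to the 3’ exon at the 3’ splice site, releasing the lariat
i.e. two transesterification reactions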
what are the key interactions of the spliceosome?
between splice sites, branch site and spliceosomal RNAs
discuss the catalytic core of the spliceosome?
U2 and U6 snRNAs form an active centre similar to that of group II self-splicing introns
catalytic core is highly conserved across all eukaryotes (spliceosomes are only found in eukaryotes) and composed entirely of RNA
what is SELEX?
systematic evolution of ligands by exponential enrichment (evolution in a test tube)
take a pool of random RNA sequences and apply selection pressure via affinity for a target molecule, e.g. one immobilised on a column
reverse transcribe the RNAs that bound, PCR amplify (a low-fidelity DNA polymerase induces some errors so you get variation), transcribe back into RNA and run through again
repeat over and over and eventually you get population of RNAs which bind your ligand of choice w high affinity
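A toy simulation of the SELEX loop (select, amplify with errors, repeat). The "affinity" here is a hypothetical stand-in: the number of positions matching an invented ideal binder `TARGET`. Real selection is a stochastic partitioning step rather than a clean top-10% cut, and RT-PCR plus transcription are collapsed into one error-prone copy step.

```python
import random

BASES = "ACGU"
TARGET = "GGACUUCGGUCC"  # HYPOTHETICAL "ideal binder" standing in for the real ligand

def affinity(seq):
    """Toy affinity score: positions matching the hypothetical ideal binder."""
    return sum(a == b for a, b in zip(seq, TARGET))

def error_prone_copy(seq, error_rate, rng):
    """Each position is redrawn at random (possibly to the same base) with probability error_rate."""
    return "".join(rng.choice(BASES) if rng.random() < error_rate else b for b in seq)

def selex(pool_size=1000, rounds=10, keep=0.1, error_rate=0.02, seed=0):
    rng = random.Random(seed)
    length = len(TARGET)
    pool = ["".join(rng.choice(BASES) for _ in range(length)) for _ in range(pool_size)]
    history = []
    for _ in range(rounds):
        # selection: only the best binders survive the "column wash"
        survivors = sorted(pool, key=affinity, reverse=True)[: max(1, int(keep * pool_size))]
        history.append(sum(map(affinity, survivors)) / len(survivors))
        # RT-PCR with errors + transcription back to RNA, collapsed into one step
        pool = [error_prone_copy(rng.choice(survivors), error_rate, rng) for _ in range(pool_size)]
    return pool, history

pool, history = selex()
print([round(h, 1) for h in history])  # mean affinity of survivors per round
```

Setting `error_rate=0` shows why errors matter: the pool can then only be enriched for variants already present in the starting library.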
outline polymerase error rates?
DNA polymerases used in the lab (e.g. Taq) have varying fidelity; pick whichever one suits your SELEX to induce errors
can also increase error rates with modified nucleotides; these cause misreading of the template = more errors following PCR
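Back-of-envelope arithmetic for how polymerase error rate translates into variation, assuming errors are independent and uniform along the sequence. The error rates, length and number of doublings are illustrative, not measured values for any particular polymerase.

```python
def expected_mutations(error_rate, length, doublings):
    """Expected substitutions per final molecule: rate per nt per duplication x length x doublings."""
    return error_rate * length * doublings

def fraction_error_free(error_rate, length, doublings):
    """Probability a final molecule carries no errors, assuming independent errors."""
    return (1.0 - error_rate) ** (length * doublings)

# HYPOTHETICAL values: 100-nt random region, 20 PCR doublings
for rate in (1e-5, 1e-4, 1e-3):
    print(rate, expected_mutations(rate, 100, 20), round(fraction_error_free(rate, 100, 20), 3))
```

At 1e-4 errors/nt/duplication about 82% of molecules stay error-free; raising the rate tenfold leaves only ~14% unchanged, i.e. much more variation per round.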
why do we want to induce error during SELEX?
you want a lot of variants within that population not just original copy
idea is using SELEX to generate RNAs with novel properties
how has SELEX been used to develop fluorescent RNAs?
use a SELEX approach to select for RNAs that bind fluorogens: small-molecule dyes that fluoresce only when bound by an aptamer (e.g. RNA)
has allowed discovery of lots of fluorescent RNAs that fluoresce at different wavelengths and fold into specific structures with lots of non-canonical base-pairing to bind specific fluorogens
can sequence population throughout SELEX to see evolutionary changes e.g. what is conserved as it goes on
how was a SELEX-like approach used to develop novel proteins?
one SELEX-like approach called directed evolution uses the same principles; induce mutations in a protein-coding gene, insert into bacteria, select for those with desirable properties, repeat for more rounds to evolve novel proteins
why is RNA structure important?
can be essential for function
outline RNA structural components?
RNA structures are modular meaning they can be decomposed into subcomponents e.g. loops, stems, bulges - these are what we use for computational modelling
how can we represent RNA secondary structure with dot-bracket notation?
if a nt is unpaired you put a dot; if it is paired you put a bracket, '(' for the 5’ partner and a matching ')' for the 3’ partner (G-U pairs included, not just Watson-Crick)
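A minimal parser for dot-bracket notation: a stack matches each ')' with the most recent unmatched '('. The example sequence and structure are hypothetical.

```python
def pairs_from_dot_bracket(structure):
    """Return sorted 0-based (i, j) base pairs from a dot-bracket string."""
    open_positions, pairs = [], []
    for j, ch in enumerate(structure):
        if ch == "(":
            open_positions.append(j)                  # 5' partner: remember it
        elif ch == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((open_positions.pop(), j))   # close the most recent '('
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {j}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(pairs)

seq = "GGCGAAAGCC"
structure = "(((....)))"
print([(seq[i], seq[j]) for i, j in pairs_from_dot_bracket(structure)])
# [('G', 'C'), ('G', 'C'), ('C', 'G')]
```

Because each ')' closes the most recent '(', pairs are always nested; plain dot-bracket can't express pseudoknots (extended notations add other bracket types for that).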
what is the difference between canonical (watson-crick) and non-canonical (wobble) base pairs?
canonical - C-G, A-U (3/4 of RNA bp are canonical)
non-canonical - G-U
discuss how all base-pairs are possible?
under certain conditions every base pair can form
i.e. G can pair with A, C with C etc.
how can RNA base-pairing differ based on geometry?
each nucleotide has three edges through which it can interact with another nt:
- watson-crick edge (most common)
- sugar edge
- hoogsteen edge
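these interactions can also be cis or trans, depending on whether the two glycosidic (base-sugar) bonds lie on the same side of the pair (cis, as in canonical Watson-Crick pairs) or on opposite sides (trans)
this gives 12 basic geometric families (6 edge-edge combinations × cis/trans), or 18 if edge combinations are counted as ordered (which edge belongs to which base), i.e. lots of diverse nt interactions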
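Counting the pairing geometries explicitly (three edges per base, cis or trans, as in the Leontis-Westhof scheme) shows where both 12 and 18 come from.

```python
from itertools import combinations_with_replacement, product

EDGES = ("Watson-Crick", "Hoogsteen", "Sugar")
ORIENTATIONS = ("cis", "trans")

# Unordered edge-edge combinations: WC/WC, WC/H, WC/S, H/H, H/S, S/S -> 6
edge_combos = list(combinations_with_replacement(EDGES, 2))
families = [(o, a, b) for o, (a, b) in product(ORIENTATIONS, edge_combos)]

# Counting (edge of base 1, edge of base 2) as ordered gives 3 x 3 x 2 instead
ordered = list(product(ORIENTATIONS, EDGES, EDGES))

print(len(edge_combos), len(families), len(ordered))  # 6 12 18
```

The canonical Watson-Crick pair is the ("cis", "Watson-Crick", "Watson-Crick") family.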
what is the driving force of RNA folding?
base stacking - the flat, hydrophobic bases stack like coins in a roll, while the negatively charged sugar-phosphate backbone faces outward into solvent
non-canonical interactions are underrated in their contribution to RNA structure
how can we predict RNA structure?
methods range in effort required and accuracy of result; more effort generally means more accuracy
can predict secondary structure from:
1 - computational prediction e.g. free energy minimisation
2 - indirect experimental evidence e.g. chemical/ enzymatic probing
3 - direct evolutionary evidence e.g. comparative analysis
4 - direct structural evidence e.g. X-ray crystallography, NMR, cryo-EM
1 and 2 more accessible, faster and generate more models
3 and 4 more effort but give more information and more accurate models
outline how you predict RNA structure from free-energy minimisation (single-seq)?
dynamic programming algorithm that finds the minimum free energy (MFE) structure (early versions simply maximised base pairs); total energy is computed by summing energies for each structural component (e.g. stacks, loops) - the algorithm decomposes candidate structures into these components, so all you have to enter is the sequence
calculates the most stable secondary structure for that sequence; the more negative the Gibbs free energy, the more stable
energies can be looked up in tables derived from melting experiments; reference state (0) assumed to be completely unfolded i.e. no base-pairing
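A minimal sketch of the additive idea: the free energy of a structure is the sum of its components. The parameters here are TOY values (stack energy from a made-up per-pair strength, a flat hairpin penalty), not the real nearest-neighbour tables, and bulges/internal/multi loops are ignored.

```python
# TOY parameters (kcal/mol), NOT real nearest-neighbour (Turner) values
PAIR_STRENGTH = {frozenset("GC"): 3, frozenset("AU"): 2, frozenset("GU"): 1}
HAIRPIN_PENALTY = 4.0

def parse_pairs(structure):
    """Map each 5' partner index to its 3' partner from a dot-bracket string."""
    open_positions, pairs = [], {}
    for j, ch in enumerate(structure):
        if ch == "(":
            open_positions.append(j)
        elif ch == ")":
            pairs[open_positions.pop()] = j
    return pairs

def toy_free_energy(seq, structure):
    """Sum toy energies over stacks and hairpin loops; other loop types are ignored."""
    pairs = parse_pairs(structure)
    energy = 0.0  # reference state: fully unfolded = 0
    for i, j in pairs.items():
        outer = PAIR_STRENGTH[frozenset((seq[i], seq[j]))]  # KeyError if not GC/AU/GU
        if pairs.get(i + 1) == j - 1:
            # stack: pair (i, j) sits directly on pair (i+1, j-1); favourable (negative)
            inner = PAIR_STRENGTH[frozenset((seq[i + 1], seq[j - 1]))]
            energy -= (outer + inner) / 2
        elif set(structure[i + 1:j]) <= {"."}:
            # pair closes a hairpin loop: entropic cost (positive)
            energy += HAIRPIN_PENALTY
    return energy

print(toy_free_energy("GGCGAAAGCC", "(((....)))"))  # -2.0 (G-C rich stem)
print(toy_free_energy("AAUGAAAAUU", "(((....)))"))  # 0.0 (A-U stem, same shape)
```

Same shape, different sequence, different total: the G-C stem is more negative and so predicted more stable.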
discuss the accuracy of minimum free energy (MFE) as a method to predict RNA secondary structure using computational modelling?
accuracy is low and this method sucks because:
energy parameters estimated from non-biological conditions and models extrapolated from limited experiments
fails to account for a variety of things influencing RNA folding e.g. crowding of cellular environment, PTMs, folding kinetics, co-transcriptional folding, transcriptional pausing
also you end up getting a lot of very different structures with similar energy values
but this method is easy
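Why near-ties matter: at equilibrium each structure's weight is proportional to exp(-ΔG/RT). This sketch normalises only over a short hypothetical list of structures; real tools sum over the whole ensemble (McCaskill's partition function algorithm). The energies are made up.

```python
import math

R_KCAL = 0.0019872  # gas constant, kcal/(mol*K)

def boltzmann_probabilities(delta_gs, temp_c=37.0):
    """Relative probability of each listed structure: exp(-dG/RT), normalised over the list."""
    rt = R_KCAL * (temp_c + 273.15)
    g_min = min(delta_gs)
    weights = [math.exp(-(g - g_min) / rt) for g in delta_gs]  # shift by g_min for numerical stability
    z = sum(weights)
    return [w / z for w in weights]

# HYPOTHETICAL energies (kcal/mol): MFE, a very different near-tie, and a clearly worse fold
probs = boltzmann_probabilities([-20.1, -19.8, -15.0])
print([round(p, 2) for p in probs])  # roughly [0.62, 0.38, 0.0]
```

A 0.3 kcal/mol gap is only about half of RT at 37°C, so the MFE structure gets ~62% of the population and the "alternative" still ~38%.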
why do we use comparative sequence analysis to predict RNA secondary structure from direct evolutionary evidence?
often RNA structure conserved better than RNA sequence
conserved RNA structure indicated by covarying base-pairs (negative selection removes mutations that break a base-pair, so the changes that persist are compensatory ones that maintain pairing, e.g. G-C -> A-U) - identify these with deep alignments
alignments can be fed into RNAalifold, which converts covariation measures to pseudoenergies; gives bonuses to stacks supported by covariation and penalises variation that is inconsistent with pairing; combines this with MFEs for each sequence and gives a consensus secondary structure prediction
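A sketch of detecting covariation between alignment columns. Mutual information (MI) is swapped in here as a simple classic measure; RNAalifold uses its own covariation score. MI ignores whether the bases are complementary, so a pairing check is added alongside. The alignment is hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

CAN_PAIR = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}

def mutual_information(alignment, i, j):
    """Mutual information (bits) between alignment columns i and j."""
    n = len(alignment)
    fi = Counter(s[i] for s in alignment)
    fj = Counter(s[j] for s in alignment)
    fij = Counter((s[i], s[j]) for s in alignment)
    return sum(
        (c / n) * math.log2((c / n) / ((fi[a] / n) * (fj[b] / n)))
        for (a, b), c in fij.items()
    )

def pairing_fraction(alignment, i, j):
    """Fraction of sequences in which columns i and j could base-pair (incl. G-U)."""
    return sum((s[i], s[j]) in CAN_PAIR for s in alignment) / len(alignment)

# HYPOTHETICAL alignment: columns 0 and 5 covary (G-C, A-U, C-G, U-A); the rest are conserved
alignment = ["GAAAAC", "AAAAAU", "CAAAAG", "UAAAAA"]

best = max(combinations(range(len(alignment[0])), 2), key=lambda ij: mutual_information(alignment, *ij))
print(best, mutual_information(alignment, *best), pairing_fraction(alignment, *best))  # (0, 5) 2.0 1.0
```

Perfectly conserved columns score 0 bits: sequence conservation alone gives no evidence of pairing, only compensatory changes do.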
compare RNAalifold with AlphaFold2?
alphafold is an AI model that uses similar approach for protein predictions
- build deep sequence alignments
- find covarying sites
- predict global structure
there is currently no AlphaFold-level equivalent for RNA because there are few solved RNA structures and RNA folding is more complex: six backbone torsion angles per nt (protein has 2)
what is RNA structure probing?
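structure-dependent modification of RNAs can stall (or cause errors in) reverse transcriptase
use a reagent that covalently modifies either paired or unpaired bases (e.g. DMS and SHAPE reagents mostly modify unpaired/flexible nts)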
map fragments to full-length RNAs to infer features of RNA structure; can tell if the nt is paired or unpaired based on what reagent used
info can then be used to improve or constrain MFE structure predictions
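A sketch of two ways probing data can feed into MFE prediction: a soft pseudo-energy term in the style of SHAPE-directed folding, m·ln(reactivity + 1) + b, and a crude hard constraint. The slope/intercept are assumed defaults, and the threshold and reactivities are hypothetical.

```python
import math

# Slope and intercept (kcal/mol) in the style of SHAPE-directed folding; values are ASSUMED defaults
SLOPE_M = 2.6
INTERCEPT_B = -0.8

def shape_pseudo_energy(reactivity, m=SLOPE_M, b=INTERCEPT_B):
    """Pseudo-free-energy term for a nucleotide placed in a base-pair stack.

    Low reactivity (likely paired) gives a bonus (negative); high reactivity a penalty.
    """
    return m * math.log(max(reactivity, 0.0) + 1.0) + b

def call_unpaired(reactivities, threshold=0.7):
    """Crude hard constraint: nts above a HYPOTHETICAL threshold are forced unpaired."""
    return [r > threshold for r in reactivities]

reactivities = [0.05, 1.6, 0.9, 0.1]  # HYPOTHETICAL normalised SHAPE reactivities
print([round(shape_pseudo_energy(r), 2) for r in reactivities])
print(call_unpaired(reactivities))
```

The soft version nudges the energy model rather than overriding it, which is gentler on noisy reactivities than a hard cut-off.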
outline the benefit of using high-resolution methods for predicting RNA structure?
e.g. X-ray crystallography, NMR, cryo-EM
these are ideal because RNA structure is challenging to predict: flexible ribose+phosphate backbone, weak long-range tertiary interactions, alternative conformations and multiple functional states
outline how despite many non-coding variants being associated w disease this area is largely understudied?
up to 90% significant GWAS results lie in non-coding regions; roughly half of these map to introns
large-scale screens for disease association often don’t study non-coding SNPs as coding variants are enriched for functional effects and are much easier to test
outline how redundancy makes it difficult to link ncRNAs to disease?
proteins are generally single copy but many ncRNAs are multicopy with multiple paralogs or pseudogenes
for most nuclear genes both parental copies expressed
because of this ncRNAs are robust to variation due to redundancy; a single disrupting mutation rarely knocks out function because another copy covers for it
exception is the 24 mitochondrial RNA genes (22 tRNAs + 2 rRNAs): mtDNA is maternally inherited so these genes are effectively single copy
why are mitochondrial variants associated with disease?
mitochondrial transfer RNA genes especially susceptible as single copy and maternally inherited i.e. no redundant copies
22 mt-tRNAs and >350 mutations in these reported; phenotype can be complex; the same mutation often results in very different diseases and vice versa
diseases associated with variants in mitochondrial genes tend to affect energy-intensive processes e.g. muscle and brain function
why tf would you get mitochondrial donation treatment? What even is it?
if you are a carrier of a mitochondrial syndrome and want to have kids you can do this
take a donor egg, remove its nucleus leaving the donor’s healthy mitochondria, put the mother’s nucleus in and use IVF to produce an embryo
what are snoRNAs?
required for maturing rRNAs; guide covalent modifications of target rRNAs and snRNAs; some may regulate splicing events
two main classes: H/ACA box and C/D box snoRNAs; named after the conserved sequence motifs they carry
motifs: H - ANANNA; C - RUGAUGA (R = A or G); D - CUGA
i.e. H/ACA snoRNAs have a hinge region carrying the H box (ANANNA) and an ACA tail
evolutionarily conserved sites imply important interactions
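A minimal motif scan for the snoRNA boxes using regular expressions (IUPAC codes expanded by hand). The test sequence is hypothetical, and a motif hit alone is weak evidence: ANANNA in particular is so degenerate that it matches often by chance, so real snoRNA searches also use structure and target complementarity.

```python
import re

# Consensus box motifs as regexes (IUPAC: R = A/G, N = any base)
MOTIFS = {
    "C box": "[AG]UGAUGA",          # RUGAUGA
    "D box": "CUGA",
    "H box": "A[ACGU]A[ACGU]{2}A",  # ANANNA
}

def find_motifs(seq, motifs=MOTIFS):
    """0-based start positions of every (possibly overlapping) motif match."""
    return {
        name: [m.start() for m in re.finditer(f"(?={pattern})", seq)]
        for name, pattern in motifs.items()
    }

print(find_motifs("GCAUGAUGAAAGCCUUCGGGCCUGAGC"))  # HYPOTHETICAL sequence
# {'C box': [2], 'D box': [21], 'H box': []}
```

The zero-width lookahead `(?=...)` lets overlapping hits be reported, which matters for short degenerate motifs.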
discuss our current understanding of snoRNAs?
some have been well characterised in terms of function (people have figured out their targets)
some are orphans i.e. no known targets
about 17% C/D box and 16% H/ACA box snoRNAs are orphans including SNORD116
discuss the snoRNA SNORD116?
C/D box orphan strongly linked with Prader-Willi syndrome, which results from a deletion on the paternal chromosome 15 i.e. an imprinted locus; characterised by weak muscles and developmental issues in newborns; constant hunger in adults and physical deformities
SNORD116 has 29 tandemly repeated copies i.e. multicopy, but all located in the same part of the genome, hence why one deletion takes them all out -> PWS
function and targets of SNORD116 unknown; recent research may have found a SNORD116 target; potentially influences expression of 200 mRNAs
what is the difference between major and minor spliceosome?
target different splice sites and comprised of slightly different RNAs (major: U1, U2, U4, U6; minor: U11, U12, U4atac, U6atac; both share U5); there is a lot of homology between them tho
different stages of formation are referred to as different complexes; key one is the B complex, formed when the tri-snRNP (U4/U6.U5, or U4atac/U6atac.U5 in the minor spliceosome) joins the pre-mRNA at the intron being removed; critical for function
what is MOPDI aka Taybi-Linder syndrome?
autosomal recessive disorder characterised by developmental issues
caused by mutations in the single-copy RNU4ATAC gene encoding U4atac snRNA (which pairs with U6atac), leading to decreased formation of the minor tri-snRNP complex, resulting in defective minor splicing and retention of introns usually removed by the minor spliceosome
U4atac and U6atac have long region of complementarity and bind each other to form part of tri-snRNP complex - most variants linked to MOPDI affect formation of stem loop
severe effect; death by age 3
why are there so few ncRNAs linked with disease?
non-synonymous changes in proteins are enriched and easier to test
can be difficult to discover mechanisms of ncRNA function
large numbers of paralogs and pseudogenised copies of ncRNAs make identifying variants difficult
exome sequencing (protein-coding exons) has dominated large-scale efforts to connect genetic variation w disease