Understanding our genome Flashcards
What is the difference between a genomic and a cDNA library?
A genomic library contains DNA fragments that represent the entire genome of an organism, whereas a cDNA library includes clones that correspond to the mRNA sequences from an organism or from specific cells of an organism.
What is the function of oligo dT chromatography?
to separate mRNA from the other RNA in the cell
What percentage of RNA in a eukaryotic cell is rRNA?
~ 90%
What percentage of RNA in a eukaryotic cell is tRNA?
~ 6%
What percentage of RNA in a eukaryotic cell is mRNA?
~ 2-4%
What is found at the 3’ end of eukaryotic mRNAs?
a poly A tail (added after the mRNA is formed)
How does oligo dT chromatography work?
- oligo dT affinity column
- mRNA A tail hydrogen bonds to oligo dT
- rRNA and tRNA cannot bind to the column
- mRNA binds in high salt and is eluted using a low-salt buffer, which destabilises the A-T hydrogen bonds
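The selection principle can be sketched in a few lines of Python. This is only an illustration: the sequences, the names and the 10-base tail threshold are made-up assumptions, and a real column selects by hybridisation rather than string matching.

```python
def binds_oligo_dt(rna: str, min_tail: int = 10) -> bool:
    """Return True if the RNA ends in a run of at least min_tail adenines."""
    tail_len = len(rna) - len(rna.rstrip("A"))
    return tail_len >= min_tail

# Hypothetical toy sequences (not real transcripts)
rnas = {
    "mRNA_1": "AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG" + "A" * 25,
    "rRNA_fragment": "GGCUGGUCCGAUGGUAGUGGGUUAUCAGAACU",
    "tRNA_like": "GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUGAAGAUCUGGAGGUCCUGUGUUCGAUCCACAGAAUUCGCACCA",
}

# Only RNAs with a poly A tail are retained on the "column"
retained = [name for name, seq in rnas.items() if binds_oligo_dt(seq)]
print(retained)
```

The rRNA and tRNA-like sequences lack a poly A run at the 3' end, so they flow through.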
How is double-stranded cDNA formed from mRNA?
- mRNA is copied to cDNA using reverse transcriptase, dNTPs, oligo dT primer
- RNA phosphodiester bond cleavage by ribonuclease H
- RNA is replaced by DNA by DNA polymerase
- DNA ligase is used to repair the phosphodiester backbone
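At the sequence level, the steps above amount to taking a reverse complement twice. The sketch below assumes a short made-up mRNA and ignores the enzymology (priming, RNase H nicking, ligation); the function names are hypothetical.

```python
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def first_strand_cdna(mrna: str) -> str:
    """First-strand cDNA written 5'->3': the reverse complement of the mRNA."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna))

def second_strand_cdna(first_strand: str) -> str:
    """Second strand written 5'->3': the reverse complement of the first strand."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(first_strand))

mrna = "AUGGCCUAA" + "A" * 8          # toy mRNA with a short poly A tail
first = first_strand_cdna(mrna)       # begins with the oligo dT primer region
second = second_strand_cdna(first)    # same sequence as the mRNA, with T for U
print(first)
print(second)
```

The first strand starts with a run of T because synthesis is primed from the oligo dT annealed to the poly A tail.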
What does the ‘H’ in ribonuclease H stand for?
hybrid
In sequencing of the human genome, why is only a small amount of enzyme used?
so that the genome is not cut at every restriction site
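A small simulation can make the effect of limiting enzyme visible. This is a sketch under stated assumptions: the toy genome, the GAATTC site (the EcoRI site, chosen only as an example) and the 20% cut probability are all illustrative, not values from the sequencing project.

```python
import random

def partial_digest(seq, site, cut_offset, p_cut, rng):
    """Cut seq at each occurrence of site with probability p_cut."""
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        if rng.random() < p_cut:
            cut = pos + cut_offset
            fragments.append(seq[start:cut])
            start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments

rng = random.Random(0)                        # fixed seed for repeatability
genome = ("ACGT" * 5 + "GAATTC") * 20         # toy genome with 20 sites
complete = partial_digest(genome, "GAATTC", 1, 1.0, rng)   # excess enzyme
partial = partial_digest(genome, "GAATTC", 1, 0.2, rng)    # limiting enzyme

print(len(complete), max(len(f) for f in complete))
print(len(partial), max(len(f) for f in partial))
```

With limiting enzyme, fewer sites are cut, so the fragments are fewer and larger, which is what a large-insert library needs.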
What is a BAC?
Bacterial artificial chromosome
What are BACs used for?
to create libraries with large fragments (insert size ~ 100 000 bp)
Outline the method of using BACs
- white blood cells are mixed with agarose and placed in a mould
- cell membranes are ruptured (the cells are lysed) in the agarose
- restriction enzyme is added to digest DNA in the agarose mould
- each agarose plug is placed in a well of an agarose gel
- gel is run and viewed under UV light and DNA of 100 000 bp is excised from the gel
- DNA is eluted from the excised agarose
- DNA is mixed with BAC vector that has been cut with the same restriction enzyme
- the mixture is treated with DNA ligase to join insert and vector
- bacteria are transformed
- transformed bacteria are picked into 384-well plates
- BAC DNA is isolated from the bacteria for sequencing reactions
Why are white blood cells used?
- easy to take a blood sample (non-invasive)
- no associated moral concerns
What is the function of agarose?
to protect the large genomic DNA fragments from mechanical shear
What is the size of the mitochondrial genome?
16.6 kb
How many genes does the mitochondrial genome encode?
37
What percentage of the cell’s DNA is made up of mitochondrial DNA?
up to 0.5% due to the hundreds of mitochondrial genomes found in the cell
What genes does the mitochondrial genome encode?
2 rRNA genes
22 tRNA genes
13 polypeptide-encoding genes for oxidative phosphorylation
What is the H strand?
the heavy strand; G-rich
What is the L strand?
the light strand; C-rich
How many genes does the H strand encode?
28
How many genes does the L strand encode?
9
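As a quick consistency check, the counts by gene type and by strand on the cards above should both add up to the 37 genes of the mitochondrial genome. A minimal sketch using only the numbers from these cards:

```python
genes_by_type = {"rRNA": 2, "tRNA": 22, "polypeptide": 13}
genes_by_strand = {"H": 28, "L": 9}

total_by_type = sum(genes_by_type.values())
total_by_strand = sum(genes_by_strand.values())

print(total_by_type, total_by_strand)
```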
Describe some features of the mitochondrial genome
- circular genome
- genes contain no introns
- genes do not overlap
- the whole strand is transcribed and then cleaved
How much of the genome encodes RNA?
10%
What types of RNA are encoded?
mRNA, rRNA, tRNA, snRNA, snoRNA, other RNAs eg. telomere RNA, micro RNA
What is snRNA?
small nuclear RNA
What is snoRNA?
small nucleolar RNA
Which genes have no introns?
tRNA, histones, α-interferons
What is the advantage of histones having no introns?
During the S-phase of the cell cycle, a vast quantity of histones is needed for the formation of the newly synthesised chromatin. The intronless organisation of histone genes may facilitate highly efficient histone synthesis.
What is the longest human gene?
dystrophin (~2.4 Mb)
How long does it take to transcribe dystrophin?
16 hours
What is the implication of the long transcription time of dystrophin?
There is a large amount of time in which mutation can occur
What percentage of dystrophin DNA is protein-coding?
0.6%
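The dystrophin figures can be cross-checked with simple arithmetic. The sketch assumes a gene length of about 2.4 Mb (a commonly quoted value, treated here as an assumption) together with the 16 hours and 0.6% from the cards.

```python
GENE_LENGTH_BP = 2_400_000     # assumption: ~2.4 Mb gene
TRANSCRIPTION_HOURS = 16       # from the card above
CODING_FRACTION = 0.006        # 0.6% protein-coding

# Implied average speed of transcription
rate_nt_per_s = GENE_LENGTH_BP / (TRANSCRIPTION_HOURS * 3600)

# Implied amount of protein-coding sequence
coding_bp = GENE_LENGTH_BP * CODING_FRACTION

print(f"{rate_nt_per_s:.1f} nt/s")
print(f"{coding_bp:.0f} bp")
```

Under these assumptions the polymerase moves at roughly 40 nucleotides per second, and only about 14 kb of the gene is protein-coding.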
Which genes is the globin family comprised of?
alpha globin gene cluster on chromosome 16
(embryo, fetus and adult)
beta globin gene cluster on chromosome 11
(embryo, fetus, adult)
How do gene families arise?
due to gene duplication
Why is the number of genes important?
in order that the right amount of protein is synthesised
Give an example of the importance of gene number
- α-thalassemia is caused by a deficiency of α-globin genes
- the alpha globin cluster is found on chromosome 16 and encodes two alpha globins for fetus and adult
- this makes a total of four alpha globins for fetus and adult in each cell, since an individual inherits one copy from each parent
3α = α-thalassemia trait
2α = mild anaemia
0α = hydrops fetalis
Why does the baby die at birth?
The intact genes for embryonic alpha globin mean that the embryo can survive in the womb, but the baby dies soon after birth
How many histone clusters are there in humans?
11 clusters
How many histone genes are there in humans?
60 genes
Over how many chromosomes are the histone genes spread in humans?
over 7 chromosomes
What is the relationship between members of a gene family?
each member encodes the identical protein (highly conserved)
What would happen without histones?
Without histones, DNA could not be compacted into the cell nucleus and would not fit into the cell. The compacted molecule is 40 000 times shorter than the unpacked molecule.
What percentage of the genome is made up of non-coding DNA?
90%
What does ‘non-coding’ mean?
does not code for RNA (except micro RNA)
What types of non-coding DNA exist in the genome?
- introns, regulatory regions
- pseudogenes - redundant, produced as part of the evolution of genes
- gene fragments - also produced as part of the evolutionary process
- micro RNAs - regulate gene expression
Give an example of a pseudogene
two Ψ genes on the alpha globin cluster on chromosome 16
one Ψ gene on the beta globin cluster on chromosome 11
Why are pseudogenes non-coding?
Pseudogenes are genes that have picked up mutations and lost their function, for example because they can no longer bind RNA polymerase and so are not transcribed.
What are processed pseudogenes?
- DNA underwent transcription, splicing and polyadenylation to form mRNA
- mRNA was converted back into DNA by reverse transcription
- the DNA was re-integrated into the host genome to form a pseudogene
- due to viral infection during evolutionary history
Where are processed pseudogenes found?
often found on a different chromosome to the functional gene
What is a transposable element?
also known as a transposon; a DNA sequence that can change its position within the genome
What are Alu elements?
- the most abundant transposable elements in the human genome
- primate-specific
- do not occur in exon sequence
How many Alu elements are estimated to be interspersed throughout the human genome?
over one million
What percentage of the human genome is estimated to consist of Alu elements?
10.7%
How big are Alu elements?
~ 280-300 bp in length
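The copy number, element length and genome percentage on these cards can be checked against each other. The sketch assumes a haploid genome of about 3.1 billion bp, which is not stated on the cards.

```python
ALU_COPIES = 1_000_000        # "over one million" (lower bound)
ALU_LENGTH_BP = 300           # upper end of the 280-300 bp range
GENOME_BP = 3_100_000_000     # assumption: ~3.1 Gb haploid genome

# Fraction of the genome covered by exactly one million 300 bp elements
fraction = ALU_COPIES * ALU_LENGTH_BP / GENOME_BP

# Copies needed to reach the quoted 10.7%
copies_needed = 0.107 * GENOME_BP / ALU_LENGTH_BP

print(f"{fraction * 100:.1f}% of the genome")
print(f"{copies_needed:,.0f} copies for 10.7%")
```

One million copies already account for close to 10% of the genome, so "over one million" is consistent with the 10.7% estimate.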
How can Alu elements be detected?
by digestion with restriction endonuclease AluI
one main resulting band = one prevalent sequence
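A digest with AluI can be sketched as a string operation. AluI recognises AGCT and cuts between G and C, leaving blunt ends; the DNA sequence and the function name below are made up for illustration.

```python
ALUI_SITE = "AGCT"   # AluI recognition sequence
CUT_OFFSET = 2       # blunt cut between G and C (AG^CT)

def alui_fragments(dna: str) -> list[str]:
    """Split dna at every AluI site and return the fragments in order."""
    fragments, start = [], 0
    pos = dna.find(ALUI_SITE)
    while pos != -1:
        cut = pos + CUT_OFFSET
        fragments.append(dna[start:cut])
        start = cut
        pos = dna.find(ALUI_SITE, pos + 1)
    fragments.append(dna[start:])
    return fragments

dna = "GGCCAGCTTTGGAGCTCCAA"   # toy sequence with two AluI sites
fragments = alui_fragments(dna)
print(fragments)
```

If many copies of a repeat carry the sites at the same positions, the digest releases many fragments of the same length, which would appear as one main band on a gel.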
What is the relationship between Alu elements and mutation?
Alu elements are a common source of mutation in humans, but these are often confined to non-coding regions
What is the structure of an Alu element?
- identical target site duplication (TSD) sites on either side of the Alu dimer
- dimer comprised of two similar but distinct monomers (left and right arms) joined by an A-rich linker
How is the structure of the Alu element believed to have come about?
dimer emerged from the fusion of two distinct monomers over 100 million years ago
How long is the polyA tail of an Alu element?
the length of the polyA tail varies between Alu families
Where are Alu elements thought to be derived from?
Alu elements are thought to be derived from the small cytoplasmic 7SL RNA, the RNA component of the signal recognition particle, a universally conserved ribonucleoprotein that directs the traffic of proteins within the cell and allows them to be secreted.
Is the Alu repeat a SINE or a LINE?
SINE
What does SINE stand for?
short interspersed nuclear element
What does LINE stand for?
long interspersed nuclear element
What does TSD stand for?
target site duplication
Why are the TSD sequences identical?
- the endonuclease cuts the two strands of the target DNA at staggered positions, leaving short single-stranded overhangs on either side of the insertion site
- after the element inserts, the overhangs are filled in by the addition of free nucleotides to form two identical sequences, one on each end
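One way to picture how the flanking sequences end up identical is a staggered cut followed by fill-in of the overhangs. The sketch below works on one strand only; the target sequence, cut position, stagger length and element sequence are all hypothetical placeholders.

```python
def insert_with_tsd(target: str, nick: int, stagger: int, element: str):
    """Insert element at a staggered cut; fill-in copies the overhang on both sides.

    nick    - position of the cut on the top strand
    stagger - offset of the cut on the bottom strand (length of the overhang)
    """
    tsd = target[nick:nick + stagger]       # sequence between the two cuts
    left = target[:nick + stagger]          # upstream flank, ends with the TSD
    right = target[nick:]                   # downstream flank, starts with the TSD
    return left + element + right, tsd

target = "CCGGTTAAAAGCATGC"                 # toy target site
element = "GGCCGGGCGCGG"                    # placeholder for the inserted element
product, tsd = insert_with_tsd(target, nick=4, stagger=6, element=element)

print(tsd)
print(product)
```

The product is longer than target plus element by exactly the stagger length, because the overhang sequence is now present twice.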
What percentage of our genome is made up of LINEs?
17-20%
What is the structure of a typical LINE?
consists of two non-overlapping open reading frames (ORF), which are flanked by UTR and target site duplications
What is encoded by the first open reading frame?
an RNA-binding protein of 500 amino acids that functions as a chaperone
What is encoded by the second open reading frame?
a protein that has endonuclease and reverse transcriptase activity
What features allow LINEs to promote their own transposition?
- promoter for transcription
- reverse transcriptase to copy RNA into DNA
- endonuclease to cleave target DNA
- RNase H for RNA removal (can digest RNA hydrogen bonded to DNA)
Outline the insertion of LINEs into the genome
- transcription of LINE mRNA
- translation of ORF1 and ORF2 proteins, which bind to LINE mRNA at the polyA tail
- mRNA binds to target DNA at AT-rich sequences to form RNA-DNA hybrid
- LINE endonuclease cleaves target DNA
- LINE reverse transcriptase copies LINE mRNA into cDNA
- LINE-encoded RNase H degrades the RNA strand
- second-strand synthesis and repair occurs
What happens to LINEs over time?
- LINEs shorten as they ‘age’
- most are truncated at the 5’ end to remove the promoter
- RNA polymerase cannot bind and truncated LINEs cannot be transcribed
What are some consequences of LINEs being inserted into the human genome?
- disruption of gene transcription
- insertion into a promoter can silence a gene
- insertion into an intron can slow down transcription
What is the consequence of LINEs being inserted into an intron?
- the slowed rate of transcription enables a higher frequency of mutation
- proteins may be synthesised too slowly to meet the demands of the cell
How do LINE insertions differ between individual humans?
in number and position in the genome
What is the frequency of LINEs in somatic cells and in the germline?
LINEs are rare in somatic cells and more abundant in the germline
How does the cell protect against LINEs?
heavy methylation of LINEs to silence them by preventing the binding of RNA polymerase and other proteins required for transcription
What percentage of cells in our body are microbes?
90%
What percentage of functional genes in our body are microbial?
99%
How do commensal microbes help the body?
- form us eg. help immune system development
- feed us eg. process food, provide vitamins
- protect us eg. fight undesirable pathogenic bacteria
What genetic information is targeted in sequencing the microbiome?
16S rRNA
- must specifically target bacterial RNA
- small subunit of the ribosome
- common to all bacteria
- present in one or more copies
What regions do the bacterial 16S rRNA genes have?
variable (V) regions and conserved regions (common between species)
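The distinction between conserved and variable regions can be illustrated by comparing aligned sequences column by column. The three short sequences below are invented for the example and are not real 16S rRNA gene sequences.

```python
def variable_positions(aligned_seqs: list[str]) -> list[int]:
    """Return the alignment columns where the sequences are not all identical."""
    return [
        i for i, column in enumerate(zip(*aligned_seqs))
        if len(set(column)) > 1
    ]

# Hypothetical aligned fragments from three species
aligned = [
    "GGATCCTTAG",
    "GGATGATTAG",
    "GGATTGTTAG",
]

variable = variable_positions(aligned)
print(variable)
```

Columns that are identical in every species behave like conserved regions (useful as primer sites), while the differing columns behave like variable regions (useful for telling species apart).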
Outline one method of bacteria identification
- isolate DNA from faecal sample
- amplify bacterial 16S rRNA gene with primers that encompass variable regions
- sequence the amplified 16S rRNA gene product covering the variable regions
- data analysis and processing of gene sequence
- taxonomic classification using reference databases
- calculate the relative abundance of species within the sample
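The last step, relative abundance, is a simple proportion once each read has been classified. The sketch assumes the classification has already been done; the read assignments below are hypothetical.

```python
from collections import Counter

# Hypothetical genus assignments for ten classified 16S reads
assignments = [
    "Bacteroides", "Bacteroides", "Faecalibacterium", "Escherichia",
    "Bacteroides", "Faecalibacterium", "Bacteroides", "Escherichia",
    "Faecalibacterium", "Bacteroides",
]

counts = Counter(assignments)
total = sum(counts.values())

# Relative abundance = reads assigned to a taxon / all classified reads
relative = {taxon: n / total for taxon, n in counts.items()}

for taxon, share in sorted(relative.items(), key=lambda kv: -kv[1]):
    print(f"{taxon}: {share:.0%}")
```

Because 16S genes can be present in one or more copies per genome, read proportions are only an approximation of cell proportions.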
Outline another method of bacteria identification
- isolate DNA from faecal sample
- sequence all DNA using a next generation sequencing method
- data analysis and processing of bacterial rRNA gene sequence