Genome Structure Flashcards
Describe the basic structure of DNA
• DNA is deoxyribonucleic acid
• It is a macromolecule consisting of a linear strand of nucleotides
• Single linear strands bind to complementary strands to form double-stranded DNA
• Two antiparallel strands of DNA
• Bases “stacked” on one another inside the helix
• Two grooves
o Major
o Minor
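Because the two strands are antiparallel and pair complementarily, the partner of a strand, read 5’ to 3’, is its reverse complement. A minimal Python sketch, assuming standard Watson-Crick pairing (A with T, G with C); `reverse_complement` is a hypothetical helper name, not a library function:

```python
# Watson-Crick pairing (assumed): A pairs with T, G pairs with C
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, written 5' to 3'.

    The partner strand runs antiparallel, so each base is complemented
    and the order is reversed to read the partner 5' to 3'.
    """
    return "".join(PAIRS[base] for base in reversed(strand.upper()))


top = "ATGCGT"                     # 5'-ATGCGT-3'
bottom = reverse_complement(top)   # partner strand, 5' to 3'
print(bottom)                      # ACGCAT
```

Applying the function twice returns the original strand, reflecting that each strand fully determines the other.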
Describe the charge of a DNA molecule
Negatively charged because of the phosphate groups in its sugar-phosphate backbone
What type of molecule is a single strand of DNA
linear macromolecule
How many base pairs is the human genome and how many genes are there in a human?
- The human genome is 3 × 10⁹ base pairs (3 Gbp)
* It contains ~20,000 protein-coding genes.
What is the main problem with packaging DNA?
DNA is very lengthy:
There is around 2m of DNA in a nucleated cell
• 37.2 trillion cells in your body
• That is 7.44 × 10¹³ metres of DNA
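The figures above can be checked with simple arithmetic. A sketch assuming a rise of 0.34 nm per base pair (typical of B-DNA) and two genome copies per nucleated cell; like the card, it treats every one of the 37.2 trillion cells as holding ~2 m of DNA:

```python
HAPLOID_BP = 3e9          # human genome size in base pairs (from the card)
RISE_PER_BP_M = 0.34e-9   # assumed B-DNA rise: 0.34 nm per base pair
CELLS = 37.2e12           # number of cells in the body (from the card)


def dna_length_per_cell(ploidy: int = 2) -> float:
    """Length of DNA (metres) in one cell carrying `ploidy` genome copies."""
    return ploidy * HAPLOID_BP * RISE_PER_BP_M


per_cell = dna_length_per_cell()   # 6e9 bp * 0.34 nm, about 2 m
total = 2.0 * CELLS                # card's rounded 2 m per cell, all cells
print(f"{per_cell:.2f} m per cell, {total:.2e} m in total")
# 2.04 m per cell, 7.44e+13 m in total
```

The per-cell estimate shows where the "around 2 m" comes from: a diploid cell holds two copies of the 3 Gbp genome.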
So how do we fit DNA into nucleated cells?
The solution is histones: proteins around which DNA is wrapped
Describe the charge of histones
How many histones form a nucleosome?
What binds the linker DNA?
Basic, positively charged proteins that bind the negatively charged DNA
Eight histones, 2 × (H2A + H2B + H3 + H4), form the nucleosome core
Histone H1 (the linker histone)
Have a look at the image of X-ray crystallography of DNA around histones
On image
What are the levels of DNA packing?
- DNA double helix
- Nucleosomes
- Chromatin fibre
- Extended section of chromosome
- Loops of chromatin fibre
- Metaphase chromosome
Describe the structure of a chromosome
On image
What is a human karyotype and what does it show?
- Stained chromosomes from a nucleus in metaphase
* Shows the banding pattern of each chromosome
Define genome (3)
- The primary DNA sequence encodes all the gene products necessary for a human
- The primary DNA sequence also includes a large number of regulatory signals
- Much of the DNA sequence does not have an assigned function as yet
Define exon and exome (the exome has two definitions)
Exons: coding regions of DNA
• The exome is made up of the exonic gene sequences
• Some definitions use all of the coding sequences (~37 Mbp, 1.2% of genome)
• Some definitions use all of the gene sequences (~60 Mbp, 2% of genome)
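The percentages follow from dividing each region's size by the 3 Gbp genome. A quick check in Python; the helper name `percent_of_genome` is illustrative only:

```python
GENOME_MBP = 3000  # 3 Gbp genome expressed in Mbp


def percent_of_genome(region_mbp: float) -> float:
    """Share of the genome (in %) occupied by a region of `region_mbp` Mbp."""
    return 100 * region_mbp / GENOME_MBP


for label, size in [("coding sequences", 37), ("gene sequences", 60)]:
    print(f"{label}: {size} Mbp = {percent_of_genome(size):.1f}% of genome")
# coding sequences: 37 Mbp = 1.2% of genome
# gene sequences: 60 Mbp = 2.0% of genome
```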
Define gene and describe the structure of a gene
- All of the DNA that is transcribed into RNA plus all of the cis-linked (local) control regions that are required to ensure quantitatively appropriate tissue-specific expression of the final protein
- It is NOT just the parts that encode the final protein; regulation of the gene is very important.
On image
What are the sizes of the globin and dystrophin genes?
globin = 1.8 kb, dystrophin = 2.4 Mb
What are intergenic regions?
Intergenic regions contain sequences of no known function, such as repetitive DNA, endogenous retroviruses, pseudogenes. They may contain many regulatory elements.
How are genes often arranged, and what might this allow?
Genes often cluster in families – e.g. globin clusters
- allows for co-ordinated gene regulation
- may just reflect evolutionary history
What is an intron?
How many are there per gene, and how large are they?
Non-coding regions of DNA within genes; they are transcribed but then spliced out of the pre-mRNA
- Vary in number – from 0 to at least 311
- Vary in size - 30bp to 1Mbp
Describe the structure of a gene
On image
What are the two major regions that the promoter region contains, and what are their functions?
- Regulatory region - needed to regulate the recruitment of RNA polymerase II
- TATA box - needed to recruit general transcription factors and RNA polymerase
What are 3 general functions of the promoter region?
- Promoters recruit RNA polymerase to a DNA template
- RNA polymerase binds asymmetrically, so it transcribes in one direction only, synthesising RNA 5’ to 3’
- Regulation occurs via transcription factors
What is the purpose of enhancers?
• Enhancers upregulate gene expression – they are short sequences that can be in the gene or many kilobases distant. They are targets for transcription factors (activators).
What is the purpose of silencers?
• Silencers downregulate gene expression. They are also position-independent and are also targets for transcription factors (repressors).
What are insulators?
• Insulators are short sequences that act to prevent enhancers/silencers influencing other genes
What is transcription catalysed by?
In what direction does it catalyse the reaction?
Does it transcribe coding AND non-coding regions?
- Messenger RNA synthesis (transcription) is catalysed by RNA Polymerase II
- Transcribes in 5’ to 3’ direction
- Transcribes everything after the transcription start site (exons and introns)
- mRNA is post-transcriptionally modified
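A toy sketch of the direction rule: the polymerase reads the template strand 3’ to 5’ and builds RNA 5’ to 3’, placing U (not T) opposite A. The function name and the choice to pass the template written 5’ to 3’ are assumptions for illustration; promoters, the start site and termination are not modelled, and every base (exon or intron) is copied:

```python
# Base each template DNA nucleotide pairs with in the new RNA
RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5to3: str) -> str:
    """Return the RNA copy of a template strand, written 5' to 3'."""
    # The polymerase reads the template 3'->5', so walk it backwards;
    # each new ribonucleotide extends the RNA at its 3' end (5'->3' growth).
    return "".join(RNA_PARTNER[base] for base in reversed(template_5to3.upper()))


coding = "ATGCCGTAA"     # coding (sense) strand, 5' to 3'
template = "TTACGGCAT"   # its template strand, written 5' to 3'
rna = transcribe(template)
print(rna)                                # AUGCCGUAA
print(rna == coding.replace("T", "U"))    # True
```

The result matches the coding strand with U in place of T, which is why the coding strand is also called the sense strand.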
What is the purpose of RNA polymerase II? (Look at its structure.)
RNA polymerase II recognises promoters efficiently with the assistance of many other transcription factors
What are the 7 stages of transcription?
- DNA
- RNA polymerase recruited (closed complex)
- DNA helix is locally unwound (open complex)
- RNA synthesis begins
- Elongation
- Termination
- RNA polymerase dissociates
What does transcription produce?
What does post-transcriptional modification produce?
pre-mRNA
mRNA
Which 3 processes make up post-transcriptional modification?
- Capped at 5’ end
- Spliced - introns removed
- Polyadenylated at 3’ end
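The three modifications can be sketched as string operations on a pre-mRNA. This is a toy model: the exon coordinates are hypothetical inputs, the cap is shown as a text marker (`m7G-`, for the 7-methylguanosine cap), and the default tail of 250 As follows the "around 250 As" figure for the poly(A) tail. The order of operations in the code is not meant to reflect the order of events in the cell:

```python
from typing import List, Tuple


def process_pre_mrna(pre_mrna: str,
                     exons: List[Tuple[int, int]],
                     tail_length: int = 250) -> str:
    """Toy model of post-transcriptional modification.

    `exons` lists hypothetical (start, end) coordinates, end exclusive,
    of each exon in `pre_mrna`; anything outside them is treated as intron.
    """
    # Splicing: keep the exons in order, dropping the introns between them
    spliced = "".join(pre_mrna[start:end] for start, end in exons)
    # 5' capping: a text marker standing in for the methylated G cap
    capped = "m7G-" + spliced
    # 3' polyadenylation: append the poly(A) tail
    return capped + "A" * tail_length


pre = "AUGGCC" + "GUAAGUCAG" + "UGGUAA"   # exon 1 + intron + exon 2
print(process_pre_mrna(pre, [(0, 6), (15, 21)], tail_length=5))
# m7G-AUGGCCUGGUAAAAAAA
```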
Describe how the 5’ cap is formed.
What is the purpose of the 5’ cap?
After 25-30 nucleotides have been synthesised, a methylated cap is added to the 5’ end by three enzyme activities:
• RNA 5’-triphosphatase
• Guanylyltransferase
• N7G-methyltransferase
The first two activities are carried out by a bifunctional capping enzyme (CE)
RNA Pol II is also required
The cap makes the mRNA resistant to digestion by exonucleases within the cell
How is the 3’ Poly A tail formed?
What’s the purpose?
• CPSF (Cleavage and Polyadenylation Specificity Factor) recognises the PAS (polyadenylation signal) and acts on the cleavage site
• CstF (Cleavage Stimulation Factor) recognises GU-rich downstream elements (DSE)
• PAP (poly(A) polymerase) is recruited and adds multiple A bases after the cleavage site
• PAB is poly(A)-binding protein. Other proteins appear to be required for this process: CFIm (Cleavage Factor Im), CFIIm and symplekin
- The 3’ poly A tail
o Around 250 As are added
o Protects the end from degradation
o Also helps target messages out of the nucleus
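A toy sketch of 3’ end formation: find the polyadenylation signal, cut downstream of it, discard the rest and add the tail. It assumes the canonical PAS hexamer AAUAAA and a hypothetical fixed cleavage offset (real cleavage sites vary, typically falling some 10-30 nucleotides downstream of the PAS); the individual proteins are not modelled:

```python
from typing import Optional

PAS = "AAUAAA"  # assumed canonical polyadenylation signal (PAS) hexamer


def cleave_and_polyadenylate(rna: str, offset: int = 20,
                             tail_length: int = 250) -> Optional[str]:
    """Toy model of 3' end formation; returns None if no PAS is found.

    `offset` is a hypothetical fixed distance (in nucleotides) from the
    end of the PAS to the cleavage site.
    """
    pas_start = rna.find(PAS)          # PAS recognition (CPSF's role)
    if pas_start == -1:
        return None
    cut = pas_start + len(PAS) + offset
    # Cleave: RNA downstream of the cut (where the DSE lies) is discarded,
    # then the poly(A) tail is added (PAP's role)
    return rna[:cut] + "A" * tail_length


transcript = "GGC" + "AAUAAA" + "C" * 20 + "GUGUGU"
print(cleave_and_polyadenylate(transcript, tail_length=5))
```

Here the GU-rich stretch after the cut is lost, and the returned RNA ends in five As.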
Describe the process of splicing
What is it?
What catalyses it?
What happens in the process?
Splicing of introns (removal of introns)
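• The spliceosome (~150 proteins) catalyses this reaction
• The 2’-OH group of a branch-point adenosine (A) in the intron attacks the phosphate (P) group at the 5’ splice site
• This frees the 3’-OH group at the end of the upstream exon, while the 5’ end of the intron becomes joined to the branch-point A by a 2’ to 5’ linkage
• The free 3’-OH then attacks the 3’ splice site, joining the two exons; the intron is spliced out as a loop known as the lariat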
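The two steps of splicing can be followed symbolically. This sketch does no chemistry; it only tracks which sequence ends up where. The function name and the `branch_point` input (the index of the branch-point A within the intron) are hypothetical:

```python
from typing import Tuple


def splice_intron(exon1: str, intron: str, exon2: str,
                  branch_point: int) -> Tuple[str, Tuple[str, str]]:
    """Symbolic sketch of the two splicing steps (no real chemistry).

    Returns the joined exons and the released lariat as (loop, tail):
    the loop runs from the intron's 5' end to the branch-point A
    (closed by the 2'-5' bond); the tail runs on to the intron's 3' end.
    """
    if intron[branch_point] != "A":
        raise ValueError("the branch point must be an adenosine (A)")

    # Step 1: the branch-point A's 2'-OH attacks the 5' splice site.
    # The intron's first nucleotide is now joined 2'-5' to that A,
    # and exon1 is left with a free 3'-OH.
    loop = intron[: branch_point + 1]
    tail = intron[branch_point + 1:]

    # Step 2: exon1's free 3'-OH attacks the 3' splice site,
    # joining the exons and releasing the intron as a lariat.
    mrna = exon1 + exon2
    return mrna, (loop, tail)


mrna, lariat = splice_intron("CAG", "GUAAGUCUAACAG", "GCC", branch_point=9)
print(mrna)    # CAGGCC
print(lariat)  # ('GUAAGUCUAA', 'CAG')
```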
What happens after splicing?
Splicing targets mRNAs for nuclear export:
Once spliced, the mRNA is bound by the TREX (TRanscription-EXport) complex, recruited by the Exon Junction Complex, which helps to export the mRNA out of the nucleus. This allows targeting of the mRNA to the ER.
Give an outline of the whole process
On image
What is alternative splicing?
- Different exons, as well as introns, can be spliced out, so one gene can produce several different proteins.
- Exons can be skipped or included to produce variants of a protein from the same gene (isoforms)
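Exon skipping multiplies the number of possible products. A sketch under a deliberately simple, hypothetical rule: the first and last exons are always kept, any subset of the internal exons may be included, and exon order never changes. Real alternative splicing is far more constrained by regulatory signals:

```python
from itertools import combinations
from typing import List


def exon_skipping_isoforms(exons: List[str]) -> List[str]:
    """List transcripts produced by including or skipping internal exons.

    Hypothetical rule: needs at least two exons; the first and last are
    always kept, any subset of internal exons may be included, and exon
    order never changes.
    """
    first, *middle, last = exons
    isoforms = []
    # Longest isoform first: include k internal exons, from all down to none
    for k in range(len(middle), -1, -1):
        for chosen in combinations(middle, k):  # combinations keep input order
            isoforms.append(first + "".join(chosen) + last)
    return isoforms


print(exon_skipping_isoforms(["E1", "E2", "E3", "E4"]))
# ['E1E2E3E4', 'E1E2E4', 'E1E3E4', 'E1E4']
```

Under this rule, a gene with n internal exons allows 2ⁿ isoforms, e.g. 4 for the two internal exons above.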
How do we get protein variation?
On image
How is DNA organised in somatic cells, and how was this identified?
What does this organisation involve?
- In somatic cells the nuclear DNA is arranged non-randomly
- Organisation has been identified using Hi-C (detects genomic DNA sequences in close proximity) and high-throughput microscopy
- Involves the CTCF protein and the cohesin protein complex, as well as the transcription machinery
What are the 2 compartments into which the genome is organised?
- Compartment A – transcriptionally active with active histone modifications
- Compartment B – transcriptionally repressed with repressive histone modifications
These alternate along the linear (1D) sequence, but compartments of the same type are brought close together in the 3D genome
What are Topologically-Associated Domains (TADs)?
What are they separated by?
- Individual compartments are made up of several sub-domains that interact mostly within themselves and little with each other
- These are Topologically-Associated Domains (TADs)
- They are usually separated by boundaries bound by the transcriptional repressor CTCF
Have a look at transcriptional control and genome structure (3D)
Last page of lecture