Genome structure Flashcards
Describe the structure of DNA.
Nitrogenous Base
>ring (single or double) structure composed of carbon and nitrogen
Pentose Sugar
>5 carbons form cyclical structure with O bridge between C1 and C4
>nitrogenous base joined to C1
>phosphate group joined to C5
>hydroxyl group joined to C3
Phosphate Group
>attached to C5 of pentose sugar via an ester bond; a second ester bond to the C3 hydroxyl of the next nucleotide completes the phosphodiester bond
Which direction is the DNA sequence read?
From the 5’ to the 3’ end
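The 5’ to 3’ convention can be made concrete with a minimal Python sketch. The function name and the example sequence are illustrative assumptions, not part of the card. Because the two strands of the helix run antiparallel, writing the partner strand 5’ to 3’ means reversing the sequence as well as complementing it.

```python
# Illustrative sketch: the names and the example sequence are assumptions.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq_5_to_3: str) -> str:
    """Return the partner strand, also written 5' to 3'."""
    # Walk the input backwards (antiparallel strand) and swap each base.
    return "".join(COMPLEMENT[base] for base in reversed(seq_5_to_3.upper()))

example = "ATGCCG"                       # read 5' -> 3'
partner = reverse_complement(example)    # partner strand, also 5' -> 3'
print(example, partner)
```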
How are the bases arranged?
bases are stacked, and there are 2 grooves, major and minor grooves
How do 2 metres of DNA fit into an average cell only 50 µm in diameter?
- DNA is bound to histone proteins to form nucleosomes (each contains 8 histones wrapped by 146 bp of DNA)
- Nucleosomes fold up to form chromatin fibre
- Chromatin fibre then forms loops
- Chromatin fibres are compressed and folded further to form the most condensed form of DNA, the chromosome
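A rough back-of-the-envelope calculation shows the scale of the packing problem. The sketch below uses the figures from the card (2 m of DNA, a 50 µm cell, 146 bp per nucleosome); the 0.34 nm rise per base pair is an assumption added here (the usual textbook value for B-DNA), and the variable names are illustrative.

```python
# Figures taken from the card.
dna_length_m = 2.0          # total DNA length: 2 metres
cell_diameter_m = 50e-6     # cell diameter: 50 micrometres
bp_per_nucleosome = 146     # DNA wrapped around one histone octamer

# Assumption (not on the card): B-DNA rises about 0.34 nm per base pair.
RISE_NM_PER_BP = 0.34

# The DNA is this many times longer than the cell is wide.
length_to_diameter = dna_length_m / cell_diameter_m

# Length of DNA wound onto a single nucleosome, in nanometres.
wrapped_nm = bp_per_nucleosome * RISE_NM_PER_BP

print(f"DNA is about {length_to_diameter:,.0f} times longer than the cell diameter")
print(f"Each nucleosome wraps about {wrapped_nm:.1f} nm of DNA")
```

The ratio comes out at roughly 40,000-fold, which is why several levels of folding are needed.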
What is a chromatosome?
Consists of a nucleosome and an H1 histone protein
*H1 binds to linker DNA between nucleosomes to join them
Describe the relationship between Tightness of DNA packing and expression.
DNA tightly packed = less accessible = less gene expression
DNA more loosely packed = more accessible = more gene expression
What are the different types of chromosomes?
Metacentric: centromere in middle and chromosome arms almost equal
Submetacentric: centromere located sub-median resulting in slightly unequal lengths of chromosomal arms
Acrocentric: centromere located quite near the end of the chromosome
What is the relationship between genome size and the complexity of an organism?
There is a trend for simpler organisms to have fewer genes, HOWEVER genome size is not strongly related to the complexity of an organism
What is the primary DNA sequence?
encodes all gene products necessary for an organism and a large number of regulatory signals (located in the non-coding parts)
much of the DNA sequence does not have an assigned function
What is a gene?
sequence of DNA that codes for a protein and thus determines a trait
all of the DNA that is transcribed into RNA plus all of the cis-linked (local) control regions that are required to ensure quantitatively appropriate tissue-specific expression of the final protein
a gene is not just the bits that encode the final protein, regulation of the gene is very important
What do intergenic regions contain?
sequences of no known function, such as repetitive DNA, endogenous retroviruses, pseudogenes
may contain regulatory elements
What is a gene cluster?
Two or more genes that code for the same or similar products and are found close to each other in the genome
When do genes cluster?
Genes often cluster in families e.g. globin clusters:
- allows for coordinated gene regulation
- may just reflect evolutionary history
Describe the gene structure.
- not all exons are coding sequences
- never an intron after the last exon
- promoter end contains signals which promote gene expression (CAAT and TATA boxes), giving the signal to start transcription
- untranslated regions are part of exons, but don’t code for anything
What are introns?
- vary in number (0-311); histones don’t have any introns
- vary in size (30bp-1Mbp)
- can contain other genes
- purpose not fully known, but splicing is an important part of it
What is the function of the promoter region?
Recruits RNA polymerase to the DNA template; the polymerase binds asymmetrically so that it moves in the 5’ to 3’ direction
Binds transcription factors to regulate gene expression
Describe the process of transcription.
1) Promoter recruits RNA polymerase to DNA template
2) DNA helix locally unwinds
3) RNA synthesis begins in 5’ to 3’ direction
4) RNA polymerase moves along (elongation)
5) RNA polymerase reaches termination signal and stops
6) RNA polymerase dissociates from the DNA, releasing the pre-mRNA strand
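The elongation step can be sketched in a few lines of Python. This is a simplified illustration, not a model of the real enzyme: the function name and the example template are assumptions, and promoter recognition and termination signals are left out. The template strand is supplied 3’ to 5’ so that the RNA comes out 5’ to 3’.

```python
# Illustrative sketch: names and the example template are assumptions.
# Base pairing used during transcription (DNA template base -> RNA base).
PAIRING = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5: str) -> str:
    """Build the RNA 5' to 3' from a DNA template strand given 3' to 5'."""
    rna = []
    for base in template_3_to_5.upper():   # polymerase moves along the template
        rna.append(PAIRING[base])          # elongation: add the pairing ribonucleotide
    return "".join(rna)

pre_mrna = transcribe("TACGGT")
print(pre_mrna)
```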
Which regulatory regions outside the promoter affect gene expression?
These regulatory regions are position independent:
Enhancer sequences
> short sequences in gene or many kilobases distant which are targets for transcription factors (activators) to up-regulate gene expression
Silencer sequences
> targets for transcription factors (repressors) to down-regulate gene expression
Insulator sequences
> short sequences that act to prevent enhancers/silencers influencing other genes outside of the genes they are meant to target
Describe the process of modification of pre-mRNA to mature mRNA.
Eukaryotic mRNA is extensively modified before it reaches the cytoplasm:
Capping
Polyadenylation
Splicing
What is capping?
Methylated guanosine cap added to the 5’ end of eukaryotic RNAs.
The 5’ cap protects the mRNA from degradation and is required for translation in the cytoplasm.
What is splicing?
The spliceosome recognises the 3’ end of one exon and the 5’ end of the next; it first breaks the link between the intron and the first exon, then the link between the intron and the second exon
Spliced exons are joined together and intron becomes lariat structure
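The net effect of splicing on the sequence can be sketched as follows. This only shows exons being joined and the intron dropped; it does not model the spliceosome or the lariat. The function name, the toy sequence, and the exon coordinates are all illustrative assumptions (the intron is written starting with GU and ending with AG, as most introns do).

```python
from typing import List, Tuple

def splice(pre_mrna: str, exon_spans: List[Tuple[int, int]]) -> str:
    """Join exons given as (start, end) half-open index pairs; introns are dropped."""
    return "".join(pre_mrna[start:end] for start, end in exon_spans)

# Toy pre-mRNA: exon 1 + intron + exon 2 (hypothetical sequence).
exon1, intron, exon2 = "AUGGCC", "GUAAGUUUUCAG", "GAUUAA"
pre = exon1 + intron + exon2

# Exon coordinates within the pre-mRNA.
exons = [(0, len(exon1)), (len(exon1) + len(intron), len(pre))]

mature = splice(pre, exons)
print(mature)
```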
Why is splicing needed?
so that other proteins (exon junction complex proteins and the TREX complex) can target the spliced mRNA for export from the nucleus towards the endoplasmic reticulum/ribosomes for translation into protein
What is polyadenylation?
the addition of multiple adenine nucleotides to the 3’ end of a newly synthesised mRNA molecule
What proteins are involved in polyadenylation?
- CPSF (Cleavage and Polyadenylation Specificity Factor) recognises the polyadenylation signal (PAS), which is AAUAAA near the 3’ end of the RNA molecule.
- CSTF (Cleavage Stimulation Factor) recognises GU-rich downstream elements (DSE)
- PolyA Polymerase (PAP) is recruited and adds multiple A bases after the cleavage site
- PolyA Binding Protein (PAB) binds the polyA tail and further protects it from degradation
- Other proteins are required, such as CFIm (Cleavage Factor Im), CFIIm and Symplekin
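Finding the polyadenylation signal is essentially a motif search, which can be sketched as below. The function name and the example RNA are illustrative assumptions; the sketch only locates AAUAAA and does not model the GU-rich downstream element, cleavage, or tail addition.

```python
from typing import List

def find_pas(rna: str, signal: str = "AAUAAA") -> List[int]:
    """Return the 0-based start positions of every occurrence of the signal."""
    positions = []
    start = rna.find(signal)
    while start != -1:
        positions.append(start)
        start = rna.find(signal, start + 1)   # keep searching past this hit
    return positions

rna = "GGCAAUAAACUGACGUGUUGU"   # hypothetical 3' region of a transcript
hits = find_pas(rna)
print(hits)
```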
What are the different splicing patterns?
exons can be skipped/added so variations of a protein (called isoforms) can be produced from the same gene
a single gene can therefore encode many different proteins through splicing alone, increasing protein diversity without increasing the number of genes
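How quickly exon skipping multiplies the possible isoforms can be sketched with a simple enumeration. This is a deliberately simplified model under stated assumptions: the first and last exons are always kept, every internal exon is independently included or skipped, and exon order never changes. The function name and exon labels are hypothetical.

```python
from itertools import combinations
from typing import List

def exon_skipping_isoforms(exons: List[str]) -> List[List[str]]:
    """List every isoform obtained by skipping any subset of internal exons."""
    first, *internal, last = exons          # needs at least two exons
    isoforms = []
    for k in range(len(internal) + 1):      # keep 0, 1, ... internal exons
        for kept in combinations(internal, k):
            isoforms.append([first, *kept, last])
    return isoforms

isoforms = exon_skipping_isoforms(["E1", "E2", "E3", "E4"])
for isoform in isoforms:
    print("-".join(isoform))
```

Under this model a gene with n internal exons has 2^n possible isoforms, although real genes use only a subset of them.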
How is DNA organised most of the time?
not organised into condensed chromosomes
in somatic cells, nuclear DNA is arranged in domains identified by using Hi-C (detects sequences in close proximity)
involves CTCF protein and Cohesin protein complex, as well as transcription machinery