Understanding our genome Flashcards
How much of the genome is taken up by protein-coding DNA, non-protein coding DNA, and repetitive DNA respectively?
1.5% of the genome is taken up by exons (protein-coding DNA).
25% of the genome is introns (non-protein-coding DNA).
43% of the genome is made up of repetitive DNA.
Why is the mRNA that enters the cytoplasm shorter than the nuclear transcript?
When a gene is transcribed, both exons and introns are copied into the nuclear transcript. Before the transcript leaves the nucleus, it is spliced to remove the introns, and its 3’ end is polyadenylated to form a poly-A tail. The result is known as a processed mRNA and can enter the cytoplasm.
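As a minimal sketch of the processing described above, the toy Python below treats a nuclear transcript as an ordered list of labelled segments, drops the introns and appends a poly-A tail. The sequences, the `process_transcript` name and the tail length of 10 are illustrative assumptions, not real gene data.

```python
# Toy model of mRNA processing: splice out introns, then polyadenylate.
# All sequences and the tail length are made-up illustrative values.

def process_transcript(segments, tail_length=10):
    """Return the processed mRNA for a list of (kind, sequence) segments."""
    # Splicing: keep only exon sequence, in the original order.
    spliced = "".join(seq for kind, seq in segments if kind == "exon")
    # Polyadenylation: add a run of A's to the 3' end.
    return spliced + "A" * tail_length


pre_mrna = [
    ("exon", "AUGGCC"),
    ("intron", "GUAAGUCCUUAG"),
    ("exon", "UUCGAA"),
    ("intron", "GUAUGUUUACAG"),
    ("exon", "GGAUAA"),
]

nuclear_transcript = "".join(seq for _, seq in pre_mrna)
processed = process_transcript(pre_mrna)

print(len(nuclear_transcript))  # length before processing
print(len(processed))           # length after splicing and tailing
```

Even with the added tail, the processed mRNA is shorter than the nuclear transcript because the intron sequence has been removed.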
What are the key features of a protein-coding gene?
Protein-coding genes always include exons, which provide the sequence that encodes the protein’s amino acids. The exons and introns are flanked by a 5’ flanking sequence, which includes a promoter and an enhancer, and a 3’ flanking sequence, which includes a terminator. These sequences assist RNA Pol II during transcription.
What are the key features of a gene family?
- A set of genes that are similar in sequence and function
- Thought to have arisen by duplication events (paralogs)
- Products from each gene are similar but have different properties due to mutations acquired in their sequences
Describe the organisation of the human alpha and beta globin families
Each globin family consists of several globin genes that are expressed sequentially throughout development. For example, the alpha globin family starts with the zeta gene, followed by three pseudogenes (written ψ, psi), then two alpha genes. The zeta gene is only transcribed in the embryo, whereas the alpha genes are transcribed in the foetus and throughout the rest of life. Similarly, the beta globin family starts with epsilon (transcribed in the embryo), then two gamma genes (foetus), a pseudogene (ψ), and then the delta and beta genes (adult).
The pseudogenes are thought to have been functional in the past but acquired too many mutations. They are transcribed into mRNA, but no protein is translated.
The globin genes are also transcribed at different levels in different tissues during development.
Why are so many histones needed and why are they important for DNA?
The histone gene family consists of five histone types. Two each of H2A, H2B, H3 and H4 make up the nucleosome octamer, which the DNA wraps around. The H1 protein sits on top to secure the DNA.
This family is encoded many times throughout the genome (~60 copies). The copies are identical and highly conserved due to selective pressure, because histones are essential for forming the nucleosome and for DNA packaging. Having many gene copies increases output, as many copies of mRNA are produced at once. Histone mRNA does not contain any introns and is not polyadenylated, as the proteins need to be made rapidly, especially during DNA replication.
What are the 3 possible outcomes from a duplication event?
1: The duplicate is inserted into the chromosome and functions the same as the original gene, most likely due to selective pressure.
2: The duplicate acquires a mutation that leads it to now have a similar but distinct structure and function.
3: The duplicate acquires a mutation and can no longer function, producing a pseudogene.
Why are the globin family pseudogenes no longer functional genes?
They would have appeared in the genome through duplication. Over time, however, they acquired so many mutations in their base sequence that, although the gene can still be transcribed, the mRNA does not produce any functional protein.
What is a unitary pseudogene, and how does it arise? Give an example.
A unitary pseudogene is one that is not part of a gene family. It is a single gene that has acquired mutations and is no longer functional. It is allowed to persist because the product it encoded is not essential to the organism.
For example, humans have a pseudogene for L-gulono-gamma-lactone oxidase, which means that humans cannot synthesise ascorbic acid (vitamin C). This mutation is tolerated because we can obtain vitamin C from our diet.
What is a processed pseudogene, and how does it arise? Give an example.
A processed pseudogene is one in which a cDNA copy of a processed mRNA has been inserted into the genome. It is thought that this occurred through viral infections that introduced viral reverse transcriptase into the organism, which reverse-transcribed mRNA into cDNA that was then re-integrated into a chromosome. Because insertion is a random event, these pseudogenes are found at any chromosomal location, irrespective of where the original gene lies.
They are easy to spot because they have an A-rich stretch downstream of the pseudogene. They also lack a promoter and so cannot be transcribed.
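The A-rich hallmark mentioned above can be checked with a simple heuristic. This is only a sketch: the `has_a_rich_tail` name, the 12-base window, the 0.8 threshold and both example sequences are assumptions chosen for illustration, not an established annotation method.

```python
# Heuristic sketch: flag an A-rich stretch immediately downstream of a
# candidate pseudogene. Window size and threshold are arbitrary choices.

def has_a_rich_tail(downstream, window=12, min_fraction=0.8):
    """True if the first `window` downstream bases are mostly adenine."""
    region = downstream[:window].upper()
    if len(region) < window:
        return False  # not enough sequence to judge
    return region.count("A") / window >= min_fraction


processed_like = "AAAAAAAAGAAAAAATTGCC"  # toy flank left by a poly-A tail
ordinary_flank = "GCTTAGCCATGGACTTAGCA"  # toy flank without an A-rich run

print(has_a_rich_tail(processed_like))
print(has_a_rich_tail(ordinary_flank))
```

A real search would also look for the absence of introns and of an upstream promoter, which this sketch does not attempt.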
There are several types of genome repeats. What are they called, and how much of the genome does each occupy?
LINEs - long interspersed nuclear elements, which together make up about 650 Mbp
SINEs - short interspersed nuclear elements, which together make up about 400 Mbp
LTR elements - long terminal repeat elements, which together make up about 250 Mbp
DNA transposons - about 100 Mbp in total
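To relate these figures to the percentages quoted elsewhere in the cards, the sketch below converts each one into a share of the genome. It reads the Mbp values as the total amount of each repeat class, and it assumes a round genome size of 3,200 Mbp; neither the genome size nor the `percent_of_genome` helper comes from the flashcards.

```python
# Convert the approximate repeat-class totals into genome fractions.
# GENOME_MBP is an assumed round figure, not a value from the flashcards.
GENOME_MBP = 3200

repeat_mbp = {
    "LINEs": 650,
    "SINEs": 400,
    "LTR elements": 250,
    "DNA transposons": 100,
}


def percent_of_genome(mbp, genome_mbp=GENOME_MBP):
    """Share of the genome, in percent, occupied by `mbp` megabase pairs."""
    return 100 * mbp / genome_mbp


for name, mbp in repeat_mbp.items():
    print(f"{name}: {percent_of_genome(mbp):.1f}%")

total_mbp = sum(repeat_mbp.values())
print(f"Total: {total_mbp} Mbp = {percent_of_genome(total_mbp):.1f}%")
```

Under that assumption the four classes sum to 1,400 Mbp, just under 44% of the genome, and LINEs alone come to about 20%, in line with the 43% and 20% figures given in the other cards.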
Describe the characteristic features of the L1 element present in the human genome.
L1 (LINE-1) elements make up about 20% of our genome. They are unique in that they are the only autonomously active transposable elements in the whole genome, which gives them the ability to replicate and re-insert themselves. L1 is a dynamic force that gives the genome plasticity. Because of this ongoing insertion activity, an individual’s L1 pattern is distinctive.
They consist of a promoter at the 5’ end, ORF1, ORF2 and a poly-A tail. ORF1 encodes a chaperone that is able to bind both RNA and DNA. ORF2 encodes a protein with reverse transcriptase and endonuclease activity.
What is target-site-primed reverse transcription and how does it occur?
This is the process by which LINEs replicate and re-insert themselves into the nuclear genome. The LINE is transcribed into an mRNA, which travels to the cytoplasm to be translated into two protein products, the ORF1 and ORF2 proteins. The ORF1 protein is a chaperone, so it binds the ORF2 protein and the LINE mRNA and escorts them back into the nucleus. Once in the nucleus, the mRNA finds a target site that is T-rich and therefore complementary to its poly-A tail. The ORF2 protein uses its endonuclease activity to cut the DNA so that the mRNA can anneal to it. The 3’ end of the cut DNA then acts as a primer, and the ORF2 protein uses its reverse transcriptase activity to synthesise a new cDNA strand with the mRNA as the template. Human RNase H then degrades the mRNA by cleaving its phosphodiester bonds. Finally, a complementary DNA strand is made using the cDNA as a template, and DNA ligase seals the resulting dsDNA into the genome.
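The nucleic-acid steps above can be sketched as simple string operations. This toy model ignores the proteins entirely and uses an invented 18-base sequence; the function names, the `line_mrna` sequence and the four-base `target_primer` are all illustrative assumptions.

```python
# Toy model of the nucleic-acid steps in target-primed reverse transcription.
# Sequences are invented and far shorter than a real LINE transcript.

RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_transcribe(mrna):
    """First-strand cDNA, written 5'->3', complementary to the mRNA."""
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna))


def second_strand(cdna):
    """DNA strand complementary to the cDNA, written 5'->3'."""
    return "".join(DNA_TO_DNA[base] for base in reversed(cdna))


line_mrna = "GGCAUGCCGUAAAAAAAA"  # toy LINE mRNA ending in a poly-A tail
target_primer = "TTTT"            # T-rich 3' end exposed by the ORF2 cut

# The poly-A tail is complementary to the T-rich target, so they can anneal.
tail = line_mrna[-len(target_primer):]
tail_pairs_with_target = reverse_transcribe(tail) == target_primer

cdna = reverse_transcribe(line_mrna)  # the mRNA is the template
new_copy = second_strand(cdna)        # made after RNase H removes the mRNA

print(tail_pairs_with_target)
print(cdna)
print(new_copy)
```

The second strand carries the same sequence as the original mRNA with T in place of U, which is why the inserted copy is a duplicate of the LINE.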
Can LINEs be harmful to the individual?
They can be harmful when they insert into functionally important DNA. For instance, a LINE that inserts into and disrupts a promoter sequence can stop a gene from being transcribed, or it may insert into an exon itself. Such insertions can cause diseases such as haemophilia.
What are some reasons that LINEs may not be able to autonomously replicate?
LINEs may be truncated because reverse transcription did not reach the 5’ end of the mRNA before insertion into the chromosome. A copy that is missing its 5’ end cannot replicate, as it has lost its promoter.
LINE transposition can also be inhibited, especially in somatic cells, by heavy methylation, which silences the element.
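The conditions from the last few cards can be combined into one small check. This is a deliberately simplified sketch: the `L1Element` class, its boolean fields and the `can_retrotranspose` function are hypothetical names, and real elements are not reducible to four yes/no flags.

```python
from dataclasses import dataclass


@dataclass
class L1Element:
    """Hypothetical summary of one L1 copy in the genome."""
    has_promoter: bool  # lost when the 5' end is truncated
    orf1_intact: bool   # encodes the chaperone
    orf2_intact: bool   # encodes reverse transcriptase and endonuclease
    methylated: bool    # heavy methylation silences the element


def can_retrotranspose(element):
    """True only if the copy can be transcribed and encodes both proteins."""
    return (
        element.has_promoter
        and element.orf1_intact
        and element.orf2_intact
        and not element.methylated
    )


full_length = L1Element(True, True, True, methylated=False)
truncated = L1Element(False, False, True, methylated=False)
silenced = L1Element(True, True, True, methylated=True)

for name, element in [("full length", full_length),
                      ("5' truncated", truncated),
                      ("methylated", silenced)]:
    print(name, can_retrotranspose(element))
```

Only the full-length, unmethylated copy passes the check, matching the two reasons given above for why a LINE may fail to replicate.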