Understanding our genome Flashcards

1
Q

How much of the genome is taken up by protein-coding DNA, non-protein coding DNA, and repetitive DNA respectively?

A

About 1.5% of the genome is exons (protein-coding DNA).
About 25% of the genome is introns (non-protein-coding DNA).
About 43% of the genome is repetitive DNA.

2
Q

Why is the mRNA that enters the cytoplasm shorter than the nuclear transcript?

A

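When a gene is transcribed, both exons and introns are copied into the nuclear transcript (pre-mRNA). Before the transcript leaves the nucleus, it is spliced to remove the introns and its 3’ end is polyadenylated to form a poly-A tail. It is then known as a processed mRNA and can enter the cytoplasm. Because the introns have been removed, the processed mRNA is shorter than the original transcript.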
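The length difference can be illustrated with a minimal Python sketch. The sequence, the intron coordinates and the function name `process_transcript` are made up for illustration, and real processing also involves steps not modelled here.

```python
def process_transcript(pre_mrna, introns, poly_a_length=20):
    """Remove introns (half-open (start, end) index pairs) and add a poly-A tail."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # keep the exon before this intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # keep the final exon
    return "".join(exons) + "A" * poly_a_length


# Toy transcript: exon, intron, exon, intron, exon (hypothetical sequence)
pre_mrna = "AUGGCC" + "GUAAGUCCUAG" + "UUUGAA" + "GUCCGAAG" + "CCGUAA"
introns = [(6, 17), (23, 31)]

mature = process_transcript(pre_mrna, introns, poly_a_length=10)
print(len(pre_mrna), len(mature))
```

Even with the tail added, the processed molecule is shorter here because the two introns account for more bases than the tail does.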

3
Q

What are the key features of a protein-coding gene?

A

Protein-coding genes always include exons, which provide the sequence that is translated into amino acids. The exons and introns are flanked by a 5’ flanking sequence, which includes a promoter and an enhancer, and a 3’ flanking sequence, which includes a terminator. These sequences assist RNA Pol II during transcription.

4
Q

What are the key features of a gene family?

A
  • A set of genes that are similar in sequence and function
  • Thought to have arisen by duplication events (paralogs)
  • Products from each gene are similar but have different properties due to the mutations acquired in their sequences
5
Q

Describe the organisation of the human alpha and beta globin families

A

Each globin family consists of several globin genes that are transcribed sequentially during development. For example, the alpha globin family starts with the zeta gene, followed by three pseudogenes (written with a ψ prefix), then two alpha genes. The zeta gene is transcribed only in the embryo, whereas the alpha genes are transcribed in the foetus and throughout the rest of life. Similarly, the beta globin family starts with epsilon (transcribed in the embryo), followed by two gamma genes (foetus), one pseudogene (ψ), and then the delta and beta genes (adult).

The pseudogenes are thought to have been functional once but have since acquired too many mutations. They are transcribed into mRNA, but no functional protein is translated.

The globin genes are also transcribed at different levels throughout different tissues during development.

6
Q

Why are so many histones needed and why are they important for DNA?

A

The histone gene family encodes five types of histone protein. Two each of H2A, H2B, H3 and H4 make up the nucleosome octamer, which the DNA wraps around. The H1 protein sits on top to secure the DNA.

The family is encoded many times throughout the genome (~60 copies). The copies are identical and highly conserved due to selective pressure. They are essential for the formation of the nucleosome and for DNA packaging. Having many copies increases output, as many copies of the mRNA are produced at once. Histone mRNA does not contain any introns and is not polyadenylated, as the proteins need to be made rapidly, especially during DNA replication.

7
Q

What are the 3 possible outcomes from a duplication event?

A

1: The duplicate is inserted into the chromosome and functions the same as the original gene, most likely due to selective pressure.
2: The duplicate acquires a mutation that leads it to now have a similar but distinct structure and function.
3: The duplicate acquires a mutation and can no longer function, producing a pseudogene.

8
Q

Why are the globin family pseudogenes no longer functional genes?

A

They would have appeared in the genome through duplication. Over time, however, they may have acquired so many mutations in their base sequence that the gene can still be transcribed but the mRNA no longer produces any functional protein.

9
Q

What is a unitary pseudogene, and how does it arise? Give an example.

A

A unitary pseudogene is one that is not part of a gene family. It is a single gene that has acquired mutations and is no longer functional. It is tolerated because whatever the gene encoded is not essential to the organism.

For example, humans have a pseudogene for L-gulono-gamma-lactone oxidase, which means that humans cannot synthesise ascorbic acid (vitamin C). This mutation is tolerated because we can obtain vitamin C from our diet.

10
Q

What is a processed pseudogene, and how does it arise? Give an example.

A

A processed pseudogene is one in which a cDNA copy of a processed mRNA has been inserted into the genome. It is thought that this occurred when viral infections introduced reverse transcriptase into the organism, which reverse transcribed an mRNA into cDNA that was then re-integrated into a chromosome. Processed pseudogenes can be found at any location in the genome, irrespective of where the original gene is, because insertion is a random event.

They are easy to spot because they have an A-rich region downstream of the pseudogene (derived from the poly-A tail) and, being copied from processed mRNA, lack introns. They also lack a promoter and so usually cannot be transcribed.
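The "A-rich region downstream" clue can be sketched as a simple sliding-window check. The window size, the 80% threshold and the function name `has_a_rich_tail` are illustrative assumptions, not established cut-offs.

```python
def has_a_rich_tail(downstream, window=20, min_fraction=0.8):
    """Return True if any window of the downstream DNA is at least min_fraction A."""
    downstream = downstream.upper()
    if len(downstream) < window:
        return False
    for i in range(len(downstream) - window + 1):
        a_count = downstream[i:i + window].count("A")
        if a_count / window >= min_fraction:
            return True
    return False


# Hypothetical flanking sequences for two candidate loci
candidate = "GC" + "A" * 18 + "GATTACAGGC"
ordinary = "GATC" * 10

print(has_a_rich_tail(candidate), has_a_rich_tail(ordinary))
```

A real annotation pipeline would combine this with other evidence, such as the absence of introns relative to the parent gene.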

11
Q

There are several types of genome repeats, classified by their size. What are they called?

A

LINES - long interspersed nuclear elements, which total about 650 Mbp of the genome

SINES - short interspersed nuclear elements, which total about 400 Mbp

LTR elements - long terminal repeat elements, which total about 250 Mbp

DNA transposons - about 100 Mbp in total
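As a rough cross-check, these Mbp figures can be converted into percentages of the genome. The sketch below assumes a genome size of about 3,100 Mbp, which is not stated on the card.

```python
GENOME_MBP = 3100  # assumed genome size in Mbp (not from the card)

repeat_mbp = {
    "LINES": 650,
    "SINES": 400,
    "LTR elements": 250,
    "DNA transposons": 100,
}


def genome_fraction(mbp, genome_mbp=GENOME_MBP):
    """Percentage of the genome occupied by mbp megabase pairs."""
    return 100 * mbp / genome_mbp


percentages = {name: round(genome_fraction(mbp), 1) for name, mbp in repeat_mbp.items()}
total_percent = round(genome_fraction(sum(repeat_mbp.values())), 1)

for name, pct in percentages.items():
    print(f"{name}: {pct}%")
print(f"All four classes: {total_percent}%")
```

Under that assumption, LINES come out at roughly a fifth of the genome, and the four classes together land close to the repetitive-DNA share quoted in card 1.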

12
Q

Describe the characteristic features of the L1 element present in the human genome.

A

L1 (LINES) elements make up about 20% of our genome. They are unique in that they are the only autonomously active elements in the whole genome - they can replicate and re-insert themselves into the genome. L1 gives the genome plasticity, as it is a dynamic force. This high level of activity means that an individual’s L1 pattern is distinctive.

They consist of a promoter at the 5’ end, ORF1, ORF2 and a poly-A tail. ORF1 encodes a chaperone that can bind both RNA and DNA. ORF2 encodes a protein with reverse transcriptase and endonuclease activity.

13
Q

What is target-site-primed reverse transcription and how does it occur?

A

This is the process by which LINES replicate and re-insert themselves into the nuclear genome. The LINE is transcribed into an mRNA, which travels to the cytoplasm and is translated into two protein products, the ORF1 and ORF2 proteins. The ORF1 protein is a chaperone, so it binds the ORF2 protein and the LINE mRNA and escorts them back into the nucleus. Once in the nucleus, the mRNA finds a target site that is T-rich and therefore complementary to its poly-A tail. ORF2 uses its endonuclease activity to cut the DNA so that the mRNA can anneal to it. The 3’ end of the cut DNA acts as a primer for the reverse transcriptase activity of ORF2, which synthesises a new strand of cDNA using the mRNA as a template. Human RNase H then degrades the mRNA by cleaving the phosphodiester bonds in the RNA strand. Finally, a complementary DNA strand is made using the cDNA as a template, and the resulting dsDNA is joined into the genome by DNA ligase.

14
Q

Can LINES be harmful to the individual?

A

They can be harmful when they insert into or near genes. For instance, a LINE that inserts into and disrupts a promoter sequence can prevent the gene from being transcribed, and one that inserts into an exon disrupts the coding sequence itself. Such insertions can cause diseases such as haemophilia.

15
Q

What are some reasons that LINES may not be able to autonomously replicate?

A

LINES may be truncated because not all of the element was reverse transcribed before insertion into the chromosome. If a LINE is missing the sequence at its 5’ end, it cannot replicate, as it will have lost its promoter.

LINES transposition can also be inhibited, especially in somatic cells, by heavy methylation, which silences the element.

16
Q

Describe the characteristic features of an Alu1 sequence.

A

The Alu family is the most abundant of the SINES. An Alu1 sequence consists of a transcription start site, two promoters for RNA Pol III, and a poly-A tail at its end. Both promoters are internal promoters, meaning they lie downstream of the transcription start site. RNA Pol III is the polymerase that transcribes genes with internal promoters.

17
Q

Why is Alu1 not capable of being autonomously active?

A

It does not encode a reverse transcriptase and must use the one produced by L1 elements. Because of this, the process of integration is thought to be similar to that of L1.

18
Q

Why might mutations in the CYP2D6 gene cause adverse drug reactions?

A

The CYP family is one of the biggest gene families in the genome. It generally codes for enzymes involved in drug metabolism, drug activation or excretion. A wide range of different CYP2D6 mutations has been found in the population, and some people have no functional copy of the gene while others have multiple copies.

CYP2D6 is responsible for converting codeine into morphine. Those without CYP2D6 may not be able to convert codeine into morphine and therefore may not feel any pain-killing effect. On the other hand, people with multiple copies of the CYP2D6 gene (ultra-rapid metabolisers) could metabolise codeine too quickly and produce an exaggerated morphine effect, which may result in overdose.

19
Q

How might knowledge of an individual’s genomic information be used to improve drug regimes and health outcomes?

A

Depending on their genome, individuals metabolise and excrete drugs at different rates. Knowing this, a drug regime can be adapted to the dose that is going to be most effective for that individual. If there is a serious mutation, an alternative drug or treatment could be used instead to improve health outcomes.

20
Q

What are the key features of the mitochondrial genome?

A

The mtDNA is about 16.6 kb long and encodes 37 genes. It is small, but there are many copies of the genome within one mitochondrion. mtDNA is inherited only through the maternal line. It encodes 2 rRNA genes, 22 tRNA genes and 13 polypeptide-coding genes.

The genome is circular, with genes encoded on both strands. The inner strand (L strand) is C rich and encodes 9 genes (the L-genes). The outer strand (H strand) is G rich and encodes 28 genes (the H-genes). The genes contain no introns, and very few of them overlap.

There are a few differences in the genetic code used for translation. For example, mitochondria read UGA as the amino acid Trp, whereas in nuclear-encoded mRNA it is a stop codon.

mtDNA is more prone to mutation because it is not protected by histones. The oxidative phosphorylation that occurs in the mitochondria also exposes the DNA to more reactive oxygen species.
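The UGA difference noted in this card can be shown with a small Python sketch. The codon tables below are deliberately tiny (only the codons used in the toy message), and the mRNA sequence is hypothetical.

```python
# Minimal codon tables: only the codons needed for this example
STANDARD = {"AUG": "Met", "UUU": "Phe", "UGG": "Trp", "UGA": "Stop", "UAA": "Stop"}
MITOCHONDRIAL = dict(STANDARD, UGA="Trp")  # same table, but UGA codes for Trp


def translate(mrna, table):
    """Translate codon by codon until a stop codon or the end of the message."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = table[mrna[i:i + 3]]
        if amino_acid == "Stop":
            break
        peptide.append(amino_acid)
    return peptide


mrna = "AUGUUUUGAUGGUAA"  # AUG UUU UGA UGG UAA

print(translate(mrna, STANDARD))       # translation halts at UGA
print(translate(mrna, MITOCHONDRIAL))  # UGA is read through as Trp
```

The same message therefore gives a truncated product under the nuclear code and a longer one under the mitochondrial code.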

21
Q

What are the roles that commensal microbes play within our bodies?

A

Commensal microbes (‘normal’ microbes) are important within our bodies: they help our immune system to develop, they help process ingested food and supply us with some vitamins that they produce, and they compete against undesirable pathogenic bacteria.

Unbalanced microbiomes have been associated with obesity, altered immunity, mental health issues and altered responses to drug treatments.

22
Q

What are the two methods used to analyse an individual’s microbiome?

A

One method employs PCR using a fecal sample from the individual. Different species of bacteria can be identified by amplifying the 16S rRNA gene (ribosomal subunit), because all bacteria have this gene and it contains conserved regions alongside variable regions that differ between species and can be used to identify them. Primers are designed against the highly conserved sequences around a variable region such as V4, so that the region is amplified from every species present. This can be a problem if the sample contains an unknown species, and microbes whose conserved region differs from the primer sequence may be missed.

The other method uses a metagenomic approach, again using fecal samples. This will contain both bacterial and human DNA. All the DNA in the sample is sequenced and bioinformatics is used to analyse the sequences of the fragments. This way no DNA is missed, but it does require more time and money.

23
Q

Why does studying the transcriptome of a cell or tissue help us in research?

A

The transcriptome is the collection of mRNA that is present in cells. By detecting mRNA in the cells, it is possible to see what is being transcribed and expressed as proteins at different times/conditions/in different tissues.

24
Q

Briefly describe the process of in situ hybridisation

A

In situ hybridisation is where an antisense RNA that is complementary to the target RNA is exposed to the cell or tissue. The antisense RNA then hydrogen bonds to the target RNA inside the cells.

To make the antisense RNA probe, a plasmid containing the gene that encodes the target RNA is needed. The plasmid can be adjusted, by moving the promoter, so that antisense RNA is produced rather than the usual sense RNA, and it is then used to transcribe the probe. Bacteriophage RNA polymerase is usually used to transcribe the probes. This is then amplified by PCR. Radioactively labelled UTP is used, which allows the probe to be visualised when it is in situ.

A sense RNA control can be used to check that the results are reliable. Because the sense RNA has the same sequence as the target RNA, it will not hybridise and should not produce a signal.
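Why the antisense probe binds and the sense control does not can be sketched in Python. The target sequence is invented, and `hybridises` is a simplification that only accepts perfect complementarity.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the antiparallel complementary RNA strand."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def hybridises(probe, target):
    """True if the probe is perfectly complementary to a stretch of the target."""
    return reverse_complement(probe) in target


target = "GGAUGCCUUAGCAUCGA"                  # hypothetical target mRNA
sense_probe = target[2:12]                    # same sequence as the target
antisense_probe = reverse_complement(sense_probe)

print(hybridises(antisense_probe, target))    # antisense probe pairs with the target
print(hybridises(sense_probe, target))        # sense control does not
```

Real probes tolerate some mismatches and are much longer, so this is only a model of the base-pairing logic.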

25
Q

What are the variety of methods that can be used to look at the transcriptome and how do they differ in the information that they can give about it?

A

In situ hybridisation uses a labelled antisense RNA probe that hybridises with the RNA in the cell and can be visualised. It looks at a single RNA at a time, showing its expression at either the cellular or the tissue level.

Northern blot, similar to Southern blot, uses gel electrophoresis to separate RNA by size and then uses a labelled probe to identify the RNA. It also looks at a single RNA, usually within a tissue, but it can additionally determine the size of the mRNA because the sample is run through gel electrophoresis.

Quantitative real-time PCR (qRT-PCR) also looks at a single RNA, but it is quantitative: the amount of product can be analysed to work out the original quantity of the RNA.

Microarrays allow simultaneous analysis of multiple RNAs present in a cell, but they are not quantitative.

RNA sequencing allows for simultaneous analysis of multiple RNA and is quantitative.
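The idea behind working back to the original quantity in quantitative PCR can be sketched as follows. This assumes idealised amplification in which the product doubles every cycle (efficiency of 1.0); the function name and numbers are illustrative, and real analyses work from fluorescence thresholds rather than absolute copy counts.

```python
def starting_copies(final_copies, cycles, efficiency=1.0):
    """Estimate the starting template, assuming each cycle multiplies it by (1 + efficiency)."""
    return final_copies / (1 + efficiency) ** cycles


# Hypothetical measurement: 1,024,000 copies detected after 10 cycles
estimate = starting_copies(1_024_000, cycles=10)
print(estimate)
```

Lower efficiencies can be explored by passing a smaller `efficiency` value, which gives a larger estimate for the same measurement.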

26
Q

How can a microarray be used to compare similarities and differences between two mRNA populations?

A

Microarrays analyse the complete transcriptome of each cell population because each spot on the chip contains a different known oligonucleotide sequence, usually ssDNA of around 25 nucleotides. These can hybridise with any complementary sequence found in the transcriptome. Before the microarray is used, all of the mRNA from the cells is converted into fluorescently labelled cDNA by reverse transcriptase. The two populations of mRNA are given different colours, e.g. green and red. When the labelled cDNA is added to the microarray, the fluorescence given off by each spot can be detected. If a spot is completely one colour, e.g. green, that mRNA is present only in the green-labelled population. If the fluorescence detected is a mixture of the two colours, e.g. yellow, then both populations express that mRNA.
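The colour readout described in this card can be sketched as a simple classifier. The two-fold cut-off, the background threshold and the intensity values are illustrative assumptions, not values from any particular platform.

```python
def classify_spot(green, red, fold=2.0, min_signal=50):
    """Classify a spot from its green and red fluorescence intensities."""
    if green < min_signal and red < min_signal:
        return "none"    # neither population expresses this mRNA detectably
    if green >= fold * red:
        return "green"   # mainly the green-labelled population
    if red >= fold * green:
        return "red"     # mainly the red-labelled population
    return "yellow"      # both populations express it


# Hypothetical (green, red) intensities for four spots
spots = {"spot_1": (900, 40), "spot_2": (30, 700), "spot_3": (500, 450), "spot_4": (10, 5)}
calls = {name: classify_spot(g, r) for name, (g, r) in spots.items()}
print(calls)
```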

27
Q

How might microarrays be useful for therapeutic and diagnostic purposes?

A

Microarrays are used for cancer-typing, which is where the gene expression profiles can be determined to identify the cancer stage/type and predict the clinical outcome of patients. It may help improve outcomes by informing choice of appropriate therapeutic strategies.

29
Q

How does oligo dT chromatography help to separate mRNAs from rRNAs and tRNAs?

A

The matrix used in this chromatography technique is made up of poly-T oligonucleotides, which hybridise with the poly-A tails of all the processed mRNA present in the sample. This allows all of the mRNA present in a cell or tissue to be captured for analysis. The mRNA can then be eluted using low salt concentrations, which break the A-T hydrogen bonds.

This does not work for rRNA or tRNA because they do not get processed to contain polyA tails.
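The selection principle can be sketched as a filter that keeps only RNAs ending in a run of A residues. The sequences, the names and the minimum tail length are invented for illustration.

```python
def binds_oligo_dt(rna, min_tail=10):
    """True if the RNA ends in at least min_tail consecutive A residues."""
    tail_length = len(rna) - len(rna.rstrip("A"))
    return tail_length >= min_tail


# Hypothetical sample: one polyadenylated mRNA and two RNAs without a poly-A tail
sample = {
    "mRNA_1": "AUGGCCUAA" + "A" * 25,
    "rRNA": "GGUUAGCCGAUCC",
    "tRNA": "GCGGAUUUAGCUCAGUUGGCCA",
}

bound = [name for name, seq in sample.items() if binds_oligo_dt(seq)]
print(bound)
```

Only the RNA with a long 3’ run of A residues is retained, mirroring what stays on the column before elution.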

30
Q

Describe the key steps in RNA seq that are needed to prepare the DNA for sequencing.

A

The processed mRNA is purified from the cell or tissue sample by oligo dT chromatography. The RNA is then fragmented into smaller pieces (~100-200 nucleotides), which helps with amplification later. A variety of primers are used to convert the RNA fragments into ssDNA, including oligo dT, which binds to the poly-A tails, and random hexamer sequences for the RNA fragments that do not have poly-A tails.

From the resulting RNA-DNA hybrids, double-stranded cDNA is produced. Ribonuclease H cleaves the phosphodiester bonds in the RNA strand, and DNA Pol can then remove the ribonucleotide residues and replace them with DNA to form double-stranded cDNA.

Lastly, adaptor sequences are added to both ends of each fragment to aid amplification and sequencing. The primers used for PCR can then be designed to be complementary to the adaptors, and the same adaptors are used for priming in sequencing as well.

31
Q

Describe how this cDNA, synthesised from the mRNA sample, can be used to quantitatively analyse the expression of mRNA in a cell/tissue by RNA sequencing.

A

The cDNA is sequenced and the resulting sequences are analysed using bioinformatics, which aligns them to a reference genome. The more fragments of a gene that appear, the more abundant the mRNA for that gene is within the cell.

This is a useful method because it can be used to compare wild-type and knock-out samples. It also looks at the entire transcriptome, so as well as getting the answer about the specific target you were looking for, you can also find effects that may have been unknown or unpredicted.
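The counting step can be sketched in Python, assuming the reads have already been aligned and each one has been assigned to a gene. The gene names and read lists are hypothetical, and counts-per-million is used here as one simple way to make samples of different sizes comparable.

```python
from collections import Counter


def counts_per_million(aligned_genes):
    """Count reads per gene and scale to reads per million in the sample."""
    counts = Counter(aligned_genes)
    total = sum(counts.values())
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


# Hypothetical gene assignments for the reads in each sample
wild_type = ["geneA"] * 6 + ["geneB"] * 3 + ["geneC"] * 1
knock_out = ["geneA"] * 6 + ["geneC"] * 4

cpm_wt = counts_per_million(wild_type)
cpm_ko = counts_per_million(knock_out)

for gene in sorted(set(cpm_wt) | set(cpm_ko)):
    print(gene, cpm_wt.get(gene, 0), cpm_ko.get(gene, 0))
```

In this toy comparison the knocked-out gene disappears from the counts, while a change in a different gene also shows up, which is the kind of unexpected effect the card mentions.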