Human Genome Flashcards
What is the C-value paradox?
Genome size is not always proportional to the complexity of the organism.
To a certain extent, species complexity is linked to genome size, but the relationship is loose, so:
- genome size is not always proportional to complexity of organism
- similar organisms may have greatly differing genome sizes (due to differences in amount of repetitive DNA)
- there is a correlation between minimum genome size for a class of organism and complexity
What are retroelements?
Retroelements are DNA elements that arise by reverse transcription of an RNA intermediate
What has the smallest known genome of vertebrates?
Fugu (pufferfish) has the smallest known genome of the vertebrates, at about 390 million base pairs.
What are/were the aims of the Human Genome Project and its successors?
- generation of the complete sequence of the human genome
- ENCODE project: identification of all functional human DNA sequences, e.g. genes, splice junctions, promoter and enhancer sequences
- 1000 Genomes Project: mapping of sites where the genome sequence varies between individuals
- The Cancer Genome Atlas and Cancer Genome Project: identification of genomic alterations associated with cancer
What are the differences between heterochromatin and euchromatin?
Heterochromatin is very tightly condensed DNA and is therefore assumed to be non-functional, non-coding, largely repetitive DNA - enzymes can’t get to it
Euchromatin is less condensed DNA that is potentially transcribable.
Where can repetitive sequences occur?
Repetitive sequences can occur within genes as well as in intergenic DNA - it is very common to see repetitive sequences in the introns of genes
What makes up intergenic DNA?
Intergenic DNA is made up of pseudogenes, structural DNA sequences and repetitive DNA
Pseudogenes - genes which were once active in our evolutionary past but have gone out of use
Structural DNA sequences - sequences needed to maintain chromosome integrity
What percentage of the genome do exons, introns and repetitive DNA make up?
Exons make up 2.9% of the genome: 1.2% is coding sequence and the other 1.7% is non-coding
Introns make up 36.6% of the genome
Repetitive DNA makes up 45% of the genome and is split into interspersed repetitive DNA and tandem repetitive DNA
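As a rough worked example, the percentages above can be converted into approximate numbers of base pairs. This is only a sketch: the genome size of 3.1 billion bp is an assumption for illustration (it is not given on the card), and the function and variable names are hypothetical.

```python
# Assumed haploid human genome size in base pairs (illustrative, not from the card).
GENOME_SIZE_BP = 3_100_000_000

# Percentages of the genome as stated on the card.
FRACTIONS_PERCENT = {
    "exons (coding)": 1.2,
    "exons (non-coding)": 1.7,
    "introns": 36.6,
    "repetitive DNA": 45.0,
}


def approx_base_pairs(percent, genome_size=GENOME_SIZE_BP):
    """Convert a percentage of the genome into an approximate base-pair count."""
    return round(genome_size * percent / 100)


# Coding plus non-coding exon sequence should add up to the 2.9% exon total.
exon_total = (
    FRACTIONS_PERCENT["exons (coding)"] + FRACTIONS_PERCENT["exons (non-coding)"]
)

for name, percent in FRACTIONS_PERCENT.items():
    print(f"{name}: {percent}% is roughly {approx_base_pairs(percent):,} bp")
print(f"exon total: {exon_total:.1f}%")
```

With a different assumed genome size the base-pair counts change, but the percentages and the 1.2% + 1.7% = 2.9% check do not.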
What is interspersed repetitive DNA?
Interspersed repetitive DNA is made up of:
- DNA transposons
- LINEs - long interspersed nuclear elements
- SINEs - short interspersed nuclear elements
- Endogenous retroviruses
Interspersed repetitive DNA is derived from transposons, which are mobile genetic elements that can move to new locations within the genome of the cell
What are the two types of transposition?
Copy and paste - e.g. a transposon on chromosome 1 makes a copy of itself, and the copy inserts into chromosome 2. Either the old or the newly inserted transposon may acquire mutations that make it no longer able to transpose.
Cut and paste - e.g. a transposon on chromosome 1 uproots and inserts itself into chromosome 2
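The difference between the two types can be sketched with a toy model in which each chromosome is a list of labelled elements. The genome layout, the label "TE" and the function names are all hypothetical, and the model ignores the molecular detail entirely.

```python
def cut_and_paste(genome, element, source, target, position):
    """Move `element` from the source chromosome into the target chromosome."""
    new_genome = {name: list(seq) for name, seq in genome.items()}
    new_genome[source].remove(element)            # element leaves its old site
    new_genome[target].insert(position, element)  # and inserts at the new site
    return new_genome


def copy_and_paste(genome, element, source, target, position):
    """Insert a copy of `element` into the target; the original stays in place."""
    if element not in genome[source]:
        raise ValueError(f"{element} not found on {source}")
    new_genome = {name: list(seq) for name, seq in genome.items()}
    new_genome[target].insert(position, element)
    return new_genome


def count_copies(genome, element):
    """Count how many copies of `element` the whole genome carries."""
    return sum(seq.count(element) for seq in genome.values())


# Toy genome: one transposable element ("TE") on chromosome 1.
genome = {
    "chr1": ["geneA", "TE", "geneB"],
    "chr2": ["geneC", "geneD"],
}

moved = cut_and_paste(genome, "TE", "chr1", "chr2", 1)
copied = copy_and_paste(genome, "TE", "chr1", "chr2", 1)

print("cut and paste copies:", count_copies(moved, "TE"))
print("copy and paste copies:", count_copies(copied, "TE"))
```

Cut and paste leaves the copy number unchanged, while copy and paste increases it, which is why copy-and-paste elements can build up to large numbers in a genome.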
Which of the LINEs is active?
LINEs represent 21% of the genome (870,000 LINEs in the genome, split into 3 families)
LINE-1 elements are potentially active, with about 500,000 of these in our genomes
LINE-2 and LINE-3 elements are less abundant and are inactive
What do LINEs encode?
LINE-1 encodes two separate proteins, as it has two open reading frames.
- One open reading frame encodes an RNA-binding protein involved in transposition.
- The other open reading frame encodes a large protein which has both endonuclease activity and reverse transcriptase activity.
What is the difference between direct repeat and inverted repeat flanking sequences?
Direct repeat - same sequence repeating itself on the same strand
Inverted repeat - a sequence read 5’ to 3’ on one strand that is also present, read 5’ to 3’, on the opposite strand.
SINEs and LINEs are flanked by direct repeats
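The two definitions can be checked with a short sketch: a direct repeat is the same sequence again on the same strand, while an inverted repeat is the reverse complement. The example sequences and function names are hypothetical, and both flanks are assumed to be written 5' to 3' on the top strand.

```python
# Translation table for the complementary DNA base.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand of `seq`, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def classify_flanks(left, right):
    """Classify two flanking sequences, both read 5' to 3' on the top strand.

    A palindromic sequence satisfies both definitions; this sketch reports
    it as "direct" because that check runs first.
    """
    if left == right:
        return "direct"
    if right == reverse_complement(left):
        return "inverted"
    return "neither"


print(classify_flanks("GATTC", "GATTC"))  # same sequence on the same strand
print(classify_flanks("GATTC", "GAATC"))  # GAATC is the reverse complement of GATTC
print(classify_flanks("GATTC", "AAAAA"))
```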
How are LINEs transposed?
- The LINE-1 DNA sequence is transcribed into LINE-1 mRNA
- LINE-1 mRNA is translated to give the RNA-binding protein and the endonuclease/reverse transcriptase protein
- The RNA-binding protein and the endonuclease/reverse transcriptase protein bind to the LINE-1 mRNA
- The LINE-1 RNA-protein complex enters the nucleus
- The LINE-1 endonuclease cuts the target site (a few Ts followed by a few As)
- LINE-1 mRNA has a stretch of A residues at its 3’ end, which allows it to base-pair with the Ts at the target site
- The reverse transcriptase associated with the mRNA reverse transcribes it, making a DNA copy
- This gives a DNA-RNA duplex; the RNA is then replaced with DNA
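The base-pairing and reverse transcription steps above can be sketched as string operations. This is a toy illustration under stated assumptions: the RNA sequence is made up, the function names are hypothetical, and the minimum of 3 paired bases is an arbitrary threshold rather than a measured value.

```python
# DNA base that pairs with each RNA base.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna):
    """Return the DNA copy (written 5' to 3') of an RNA template (5' to 3')."""
    return "".join(RNA_TO_DNA[base] for base in reversed(rna))


def can_prime(rna, target_t_run, min_pairs=3):
    """Check whether the RNA's 3' run of As can pair with Ts at the target site.

    `min_pairs` is an arbitrary illustrative threshold.
    """
    tail_length = len(rna) - len(rna.rstrip("A"))
    target_is_t_run = len(target_t_run) >= min_pairs and set(target_t_run) == {"T"}
    return tail_length >= min_pairs and target_is_t_run


# Made-up LINE-1-like RNA with a stretch of A residues at its 3' end.
toy_rna = "GGCAUCAAAA"

if can_prime(toy_rna, "TTTT"):
    # The DNA copy begins with Ts because copying starts at the RNA's 3' A-tail.
    print(reverse_transcribe(toy_rna))
```

The DNA copy paired with its RNA template corresponds to the DNA-RNA duplex in the last step; replacing the RNA with DNA is not modelled here.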
What are SINEs?
SINEs are short interspersed nuclear elements which constitute approximately 13% of the human genome and are split into 3 families:
- Alu - only SINE active in transposition
- MIR
- MIR3
SINEs transpose by the same mechanism as LINEs, but they have no capacity to make protein themselves, so they use the LINE-1 proteins.
By what mechanisms can SINEs and LINEs lead to mutations?
1) a transposition that inserts a LINE-1 or Alu sequence into a gene - transposition interrupts the gene
2) deletions can be caused by recombination between two nearby Alu or LINE elements. They may arise from unequal crossing over between repeats
- the Alu sequence before the gene becomes aligned with the Alu sequence after the gene. The two are slightly out of register, but recombination can occur because the sequences are virtually identical. Unequal crossing over then means loss of the gene in a gamete
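The deletion by unequal crossing over can be sketched with a toy model in which a chromosome is a list of labelled elements. The layout, labels and function name are hypothetical; the point is only that pairing the Alu before the gene with the Alu after it gives one product without the gene and one with the gene duplicated.

```python
def unequal_crossover(chrom_a, chrom_b, repeat, index_a, index_b):
    """Recombine two homologous chromosomes at different copies of `repeat`.

    `index_a` and `index_b` (0-based) say which copy of the repeat on each
    chromosome has paired. Returns the two recombinant products.
    """
    pos_a = [i for i, x in enumerate(chrom_a) if x == repeat][index_a]
    pos_b = [i for i, x in enumerate(chrom_b) if x == repeat][index_b]
    # Each product is one chromosome up to its repeat, then the other one after.
    product_1 = chrom_a[: pos_a + 1] + chrom_b[pos_b + 1 :]
    product_2 = chrom_b[: pos_b + 1] + chrom_a[pos_a + 1 :]
    return product_1, product_2


# Toy chromosome: a gene flanked by two virtually identical Alu sequences.
chromosome = ["upstream", "Alu", "gene", "Alu", "downstream"]

# The Alu before the gene (copy 0) pairs with the Alu after the gene (copy 1).
deleted, duplicated = unequal_crossover(chromosome, chromosome, "Alu", 0, 1)

print(deleted)
print(duplicated)
```

A gamete that receives the first product has lost the gene, which matches the deletion described on the card.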
What is a gene?
A gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products - ENCODE project definition
The set of DNA sequences required to encode a particular functional product or a set of products which overlap in sequence - Glenn’s definition