Lecture 1 Organisation of the Human Genome Flashcards
What are the different types of DNA?
Nuclear, mitochondrial, bacterial and viral
What are the dimensions of nuclear DNA?
10 bp per turn, 3.4 nm per turn and 2.37 nm in diameter
How many genomes are present in a cell?
2 genomes (nuclear and mitochondrial)
How many base pairs are there in the nuclear genome and corresponding genes?
3 x 10^9 bp (3-3.3 billion), corresponding to 20,000-30,000 genes
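As a worked check on the two cards above, the helix dimensions (10 bp and 3.4 nm per turn) can be combined with the genome size to estimate how long the nuclear DNA would be if stretched out. This is only a back-of-the-envelope sketch; the function name `dna_length_m` is made up for illustration.

```python
# Figures taken from the cards: 10 bp per helical turn, 3.4 nm per turn,
# and roughly 3 x 10^9 bp in the nuclear genome.
BP_PER_TURN = 10
NM_PER_TURN = 3.4
GENOME_BP = 3e9

def dna_length_m(base_pairs: float) -> float:
    """Stretched-out length in metres of a double helix with this many base pairs."""
    nm_per_bp = NM_PER_TURN / BP_PER_TURN   # 0.34 nm rise per base pair
    return base_pairs * nm_per_bp * 1e-9    # convert nm to m

genome_length_m = dna_length_m(GENOME_BP)
print(f"{genome_length_m:.2f} m")  # 1.02 m
```

So one copy of the 3 x 10^9 bp genome corresponds to roughly a metre of DNA.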
How much does mitochondrial DNA account for in the genome?
0.001%
How many base pairs are there in the mitochondrial genome? How many genes and how many proteins?
16,569 bp
37 genes
13 protein-coding genes involved in the respiratory chain
24 non-coding RNAs (2 rRNAs, 22 tRNAs)
Does the mitochondrial genome contain introns?
No
What is a karyotype?
The characteristic number and size of a cell's chromosomes, examined by staining them during metaphase.
What is the composition of the human genome?
3200 Mb in total: 2000 Mb is intergenic and 1200 Mb is gene-related sequences.
What are LINES?
Long INterspersed repeat Elements.
6-8 kb in length; they can copy themselves into the genome and encode the proteins required for their integration. There are 3 LINE families (LINE1/2/3); however, only LINE1 is transcriptionally active.
What are SINES?
Short INterspersed Elements: repeat elements shorter than 6 kb that have lost the ability to retrotranspose on their own.
What are the SINE families?
ALUs and MIRs; only ALUs are active.
Can ALUs be found throughout the animal kingdom?
No, they are present only in primates, with approximately one every 3 kb. They have their own subfamilies and are over 80 million years old, so they can be used as molecular clocks.
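The "one every 3 kb" figure can be turned into a rough copy number using the 3200 Mb genome size from the composition card. This is a quick estimate that assumes even spacing, which is a simplification; `estimated_copies` is a made-up helper name.

```python
GENOME_MB = 3200      # genome size from the composition card
ALU_SPACING_KB = 3    # "approximately one every 3 kb"

def estimated_copies(genome_mb: float, spacing_kb: float) -> int:
    """Rough copy number, assuming the elements are evenly spaced."""
    genome_kb = genome_mb * 1000
    return round(genome_kb / spacing_kb)

alu_copies = estimated_copies(GENOME_MB, ALU_SPACING_KB)
print(alu_copies)  # 1066667, i.e. on the order of a million copies
```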
What percentage of the human genome is transposable?
Approx. 50%; however, most of these elements no longer move.
What are LTRs?
Long Terminal Repeat elements: retrotransposons that are related to retroviruses.
What percentage of the genome is composed of retroviral insertions?
8.3%
What are microsatellites?
Short sequences (1-15 bp) repeated in tandem many times (2-50). Dinucleotide repeats are the most frequent (mononucleotide repeats are also found); 3-4 bp repeats are also common.
How do microsatellites affect genomic complexity?
They reduce the complexity, for example ACACACACAC or GGGGGGG.
They are prone to expansion and contraction during replication due to polymerase slippage.
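One way to picture a microsatellite is as a short unit followed immediately by copies of itself, which a regular expression with a backreference can detect. The sketch below is an illustration only: `find_microsatellites` is a made-up name, the 1-15 bp unit size comes from the card, and the minimum of 3 repeats is an arbitrary threshold chosen for the example.

```python
import re

def find_microsatellites(seq: str, max_unit: int = 15, min_repeats: int = 3):
    """Return (start, unit, repeat_count) for each tandem repeat found in seq.

    max_unit follows the 1-15 bp unit size on the card; min_repeats is an
    illustrative threshold, not a figure from the lecture.
    """
    # ([ACGT]{1,N}?) captures the shortest possible unit; \1{M,} then
    # requires at least M further back-to-back copies of that same unit.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits

# Invented test sequence containing the two examples from the card.
hits = find_microsatellites("CTACACACACACTAGGGGGGGT")
print(hits)  # [(2, 'AC', 5), (14, 'G', 7)]
```

Expansion or contraction by polymerase slippage would show up here as a change in the repeat count while the unit stays the same.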
Can microsatellite sequences be found in coding sequences?
Yes, but not very often; when they are found, the repeat unit is 3 bp long, so the reading frame is preserved.
What are the major classes of RNA sequences that do not contain open reading frames?
tRNA (25% in a cluster on Chr6)
rRNA (28S, 18S, 5.8S; 150-200 copies)
snoRNA (RNA processing/base modification; 97 different snoRNAs)
snRNA (22 snRNAs, multiple copies of some)
RNAs of unknown function (mostly single copy)
What are pseudogenes?
Sequences related to coding or non-coding sequences that have become mutationally saturated, so that their function is lost. They are evolutionary relics, derived from genes (both coding and non-coding) by duplication or retrotransposition.
What are the different types of pseudogenes?
Gene fragments: single or multiple exons.
Whole genes that include introns but carry defects such as mutated splice sites.
Processed pseudogenes: mature mRNA from an expressed gene is reverse-transcribed and integrated into the genome (4/5 exons inserted into the genome).
What are the features of coding genes?
They make up 1-5% of the genome and produce the proteins that perform cellular activities. They can be single copy or multiple copy, and can be grouped into families based on sequence similarity; families have often evolved by duplication and divergence. Superfamilies also exist.
What is the significance of gene overlap?
Numerous genes can be located on the same section of DNA, for example BRCA1 (5’-3’) and Rho7 (3’-5’).
Opposite strands can also encode different genes.
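To see why opposite strands can carry different genes, it helps to read the same stretch of DNA from the other strand, which means taking the reverse complement. A minimal sketch, using an invented sequence rather than real BRCA1 or Rho7 sequence:

```python
# Translation table for Watson-Crick pairing: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

top_strand = "ATGGCCTAA"                 # invented example, read 5'->3'
bottom_strand = reverse_complement(top_strand)
print(bottom_strand)  # TTAGGCCAT
```

The two strands spell different sequences over the same coordinates, so each can in principle be transcribed into a different RNA.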
Describe the general structure of coding genes.
Promoter: RNA pol binding site, TF sites, TATA vs TATA-less
Introns and exons (coding genes must have exons; mitochondrial genes do not have introns)
5’-UTR (required): drives translation, contains tissue-specific elements and has regulatory functions
Start codon (ATG); multiple ATGs may be present
Polyadenylation sequences to end transcription
Splice sites (GT/AG vs AT/AC)
Splice enhancers; exonic and intronic
Stop codon (TAA, TAG, TGA)
Once transcribed, the mRNA also has a poly(A) tail added to it.
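Two of the features listed above, the ATG start codon and the TAA/TAG/TGA stop codons, are enough to sketch a toy open-reading-frame scan. This is an illustration under simplifying assumptions (one strand only, no introns, no promoter or UTR logic); `find_orfs` and the `min_codons` threshold are made up for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq: str, min_codons: int = 2):
    """Return (start, end, orf) for every ATG that has an in-frame stop codon.

    min_codons (start codon included, stop excluded) is an illustrative filter.
    """
    seq = seq.upper()
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon from this ATG until the first stop codon.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, seq[start:pos + 3]))
                break
    return orfs

# Invented sequence with two ATGs; only the second has an in-frame stop.
orfs = find_orfs("CCATGGCATGTTTTAGGG")
print(orfs)  # [(7, 16, 'ATGTTTTAG')]
```

The example also shows why "multiple ATGs" matters: the first ATG in the sequence does not lead to a usable reading frame, while the second does.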
Describe the structure of the beta-globin gene.
It has three exons separated by two introns. Each intron has GT at its beginning and AG at its end; this splice-site pattern is found throughout the genome.
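The GT...AG rule can be sketched as a small splicing function that removes introns only if they start with GT and end with AG. The sequence below is invented (it is not the real beta-globin sequence) and simply mimics the three-exon, two-intron layout; `splice` is a made-up helper name.

```python
def splice(pre_mrna: str, introns):
    """Join the exons of pre_mrna, removing introns given as (start, end) slices.

    Raises ValueError if an intron does not begin with GT and end with AG.
    """
    seq = pre_mrna.upper()
    exons, cursor = [], 0
    for start, end in sorted(introns):
        intron = seq[start:end]
        if not (intron.startswith("GT") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} does not follow the GT...AG pattern")
        exons.append(seq[cursor:start])   # exon sequence before this intron
        cursor = end
    exons.append(seq[cursor:])            # final exon
    return "".join(exons)

# Toy gene: exons in upper case, introns in lower case (invented sequence).
gene = "ATGGCC" + "gtaagtcag" + "TTTCCC" + "gtccctag" + "GGGTAA"
mrna = splice(gene, [(6, 15), (21, 29)])
print(mrna)  # ATGGCCTTTCCCGGGTAA
```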
Describe the alternative splicing methods.
Exon skipping
Intron retention
Alternative 3’ splice site
Alternative initiation
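Exon skipping, the first method in the list, is the easiest to sketch: internal exons are either kept or left out. The example below enumerates the possible transcripts under the simplifying assumption that the first and last exons are always retained; it does not model intron retention, alternative 3’ sites or alternative initiation. `exon_skipping_isoforms` is a made-up name.

```python
from itertools import combinations

def exon_skipping_isoforms(exons):
    """List the transcripts obtainable by skipping any subset of internal exons.

    Assumes at least two exons and that the first and last are always kept.
    """
    first, *internal, last = exons
    isoforms = []
    # Go from "keep every internal exon" down to "skip them all".
    for kept_count in range(len(internal), -1, -1):
        for kept in combinations(internal, kept_count):
            isoforms.append([first, *kept, last])
    return isoforms

isoforms = exon_skipping_isoforms(["E1", "E2", "E3"])
print(isoforms)  # [['E1', 'E2', 'E3'], ['E1', 'E3']]
```

With n internal exons this simple model gives 2^n possible transcripts from a single gene.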
Describe the Average Human Gene.
It covers 10-15 kb of genomic DNA
It encodes a cDNA of 1500-2000 bp; the majority of the gene (>85%) is intronic sequence
There is a large variation in gene size, dystrophin is 2Mb whereas some are
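The gene and cDNA sizes on this card can be checked against the ">85% intronic" figure with a quick calculation. The sketch assumes that everything in the gene that is not in the cDNA is intronic, which is a simplification; `intronic_fraction` is a made-up helper name.

```python
def intronic_fraction(gene_bp: int, cdna_bp: int) -> float:
    """Share of the gene not represented in the cDNA (assumed to be intronic)."""
    return 1 - cdna_bp / gene_bp

# Lower and upper ends of the ranges given on the card.
for gene_bp, cdna_bp in [(10_000, 1_500), (15_000, 2_000)]:
    fraction = intronic_fraction(gene_bp, cdna_bp)
    print(f"{gene_bp} bp gene, {cdna_bp} bp cDNA: {fraction:.1%} intronic")
# 10000 bp gene, 1500 bp cDNA: 85.0% intronic
# 15000 bp gene, 2000 bp cDNA: 86.7% intronic
```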
Describe the genome’s content.
50,000-100,000 genes were predicted in the genome, but there are 30,000-35,000 genes; alternative splicing, however, can account for some of the difference in number.
Describe the common features that can be identified in genes with a common function.
The leader sequence (which directs the protein to the cell surface), approx. 20 amino acids in length
Extracellular domains
Stalk region
Membrane anchoring/ transmembrane sequence
Intracellular domains (of different families) to deliver different signals
Which processes account for the largest number of genes?
Metabolism, reproduction and gene expression. Gene number reflects this (metabolism has the greatest number of genes).