Lecture 1: Organisation of the Human Genome Flashcards
DNA - What does it do?
Hereditary/Genetic Information carried by DNA
Shape of DNA - describe it
- calculations, who
1.double helical structure
- described by Watson and Crick (and Rosalind Franklin) in 1953
3. 10 bp/turn, 3.4 nm/turn, 2.37 nm diameter
Which types of DNA exist as a double-helical structure:
4
- Nuclear
- Mitochondrial
- Bacterial
- Viral
Every (Nucleated) human cell has HOW MANY GENOMES
- WHAT IS THE CONTENT?
2 Genomes
- Mitochondrial DNA
- Nuclear DNA
Explain Mitochondrial DNA: 8
- (<0.001% of DNA)
** 16 569bp,
**37 genes,
*** 13 involved in respiratory chain,
***24 non-coding RNAs
– Closed,
– circular DNA,
— densely packed
Explain Nuclear DNA
- > 99.999% of DNA
**(~3 × 10^9 bp, >20,000 genes)
*** 23 pairs of chromosomes, varying sizes
*** Genes spaced irregularly, contain introns and exons
*** >2m of linear DNA per cell, requires dense folding
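A rough check on the ">2 m of linear DNA per cell" figure: the helix geometry quoted earlier (10 bp and 3.4 nm per turn) gives 0.34 nm per base pair. The sketch below assumes a haploid genome of about 3 × 10^9 bp and a diploid cell; the constant and function names are illustrative only.

```python
NM_PER_TURN = 3.4    # helix pitch, from the "Shape of DNA" card
BP_PER_TURN = 10     # base pairs per helical turn
HAPLOID_BP = 3e9     # assumed approximate haploid genome size


def dna_length_m(base_pairs: float) -> float:
    """Contour length in metres of double-helical DNA with this many bp."""
    rise_nm = NM_PER_TURN / BP_PER_TURN   # 0.34 nm of length per base pair
    return base_pairs * rise_nm * 1e-9    # nm -> m


# A diploid nucleus holds two copies of the nuclear genome.
diploid_length = dna_length_m(2 * HAPLOID_BP)
print(f"{diploid_length:.2f} m")  # 2.04 m
```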
Understanding the Mitochondrial Genome:
1 * Membrane-enclosed organelles
2 * 1000s per cell (depending on cell type)
3 * Converts energy from food to usable ATP
4 * Genome is ~17kb, closed circular loop
5 * Mitochondrial genome encodes :
* 2 ribosomal RNAs (rRNA)
* 22 transfer RNAs (tRNA)
* 13 polypeptides (mostly resp. chain)
6 * Genes do not have introns (cf. prokaryotes)
- Mitochondrial genome encodes : 3
- 2 ribosomal RNAs (rRNA)
- 22 transfer RNAs (tRNA)
- 13 polypeptides (mostly resp. chain)
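The three counts on this card can be reconciled with the "37 genes" and "24 non-coding RNAs" quoted on the "Explain Mitochondrial DNA" card. A minimal arithmetic sketch (the dictionary name is just for illustration):

```python
# Gene products encoded by the mitochondrial genome, as listed on the card.
MT_GENES = {"rRNA": 2, "tRNA": 22, "polypeptide": 13}

total_genes = sum(MT_GENES.values())                   # all genes
noncoding_rnas = MT_GENES["rRNA"] + MT_GENES["tRNA"]   # RNA-only products

print(total_genes, noncoding_rnas)  # 37 24
```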
Human Karyotype: Male vs Female
Male: 46 Chr, XY
Female: 46 Chr, XX
Paired 1-22 (Autosomes)
23rd pair = Sex chromosome XX, OR XY
Chromosomal DNA packaging: Histones?
- Histones are proteins ENCODED BY NUCLEAR GENES
- “Beads on a string” appearance (DNA wrapped around histones)
- ~11nm
- A type of protein found in chromosomes.
- Histones bind to DNA, help give chromosomes their shape, and help control the activity of genes.
Chromosomal DNA packaging: Chromosomes
- 46 Chr
- Chromosomes must be UNFOLDED and REFOLDED DURING REPLICATION AND WHEN GENES ARE EXPRESSED.
Chromosomal DNA packaging:
CODES?
SIZES?
Many repeated “codes” within DNA sequence
1.Metaphase Chromosome
~1400nm
2.Condensed chromatin
~300-700nm
3.Packed chromatin fiber ~30nm
- DNA double helix approx 2nm
Structure of DNA
- Sugar: deoxyribose (ribose in RNA)
- Phosphate group: PO4^3-
- Nitrogenous Base: Cytosine and Thymine (pyrimidines), Adenine and Guanine (purines)
1-3 = Nucleotide
- complementary strands of nucleotides held together by hydrogen bonds between G-C and A-T base pairs.
6. Purines (adenine and guanine) are double-ring nitrogenous bases
- Pyrimidines (cytosine and thymine) are single-ring nitrogenous bases
- In the DNA segment shown, the 5′ to 3′ directions are down the left strand and up the right strand.
— The 5′-end (pronounced “five prime end”) designates the end of the DNA or RNA strand that has the fifth carbon in the sugar-ring of the deoxyribose or ribose at its terminus.
9.A codon is a DNA or RNA sequence of three nucleotides (a trinucleotide) that forms a unit of genomic information encoding a particular amino acid or signaling the termination of protein synthesis (stop signals).
- Genes are short pieces of DNA that carry specific genetic information.
- Genes are made up of a sequence of nucleotides.
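The base-pairing and 5′ to 3′ points above can be illustrated by computing the partner strand of a short sequence. This is a minimal sketch; the function name and the example sequence are made up for illustration.

```python
# A pairs with T, G pairs with C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand_5to3: str) -> str:
    """Return the partner strand, also written 5' to 3'.

    The two strands run antiparallel, so after swapping each base for
    its partner the sequence is reversed to read 5' to 3' again.
    """
    return strand_5to3.upper().translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))  # GCAT
```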
What is TRANSCRIPTION?
WHAT ARE THE STEPS
In biology, the process by which a cell makes an RNA copy of a piece of DNA. This RNA copy, called messenger RNA (mRNA), carries the genetic information needed to make proteins in a cell. It carries the information from the DNA in the nucleus of the cell to the cytoplasm, where proteins are made.
- Transcription is the first step in gene expression. It involves copying a gene’s DNA sequence to make an RNA molecule.
- Transcription is performed by enzymes called RNA polymerases, which link nucleotides to form an RNA strand (using a DNA strand as a template).
- Transcription has three stages: initiation, elongation, and termination.
- In eukaryotes, RNA molecules must be processed after transcription: they are spliced and have a 5’ cap and poly-A tail put on their ends.
- Transcription is controlled separately for each gene in your genome.
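As a toy illustration of "using a DNA strand as a template", the sketch below builds the RNA that is complementary to a template strand. It ignores promoters, processing and everything else about real transcription; the function name and the example sequence are hypothetical.

```python
# Template base -> RNA base added opposite it (U replaces T in RNA).
DNA_TO_RNA = str.maketrans("ACGT", "UGCA")


def transcribe(template_5to3: str) -> str:
    """Return the RNA (written 5' to 3') made from a DNA template strand.

    The polymerase reads the template 3' to 5', so the complemented
    sequence is reversed to give the RNA in its own 5' to 3' order.
    """
    return template_5to3.upper().translate(DNA_TO_RNA)[::-1]


print(transcribe("TTAGGCCAT"))  # AUGGCCUAA
```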
What is TRANSLATION?
STEPS OF TRANSLATION?
In biology, the process by which a cell makes proteins using the genetic information carried in messenger RNA (mRNA). The mRNA is made by copying DNA, and the information it carries tells the cell how to link amino acids together to form proteins.
Translation proceeds in three phases:
- Initiation: The ribosome assembles around the target mRNA. The first tRNA is attached at the start codon.
- Elongation: The last tRNA validated by the small ribosomal subunit (accommodation) transfers the amino acid it carries to the large ribosomal subunit, which binds it to the amino acid of the previously admitted tRNA (transpeptidation). The ribosome then moves to the next mRNA codon to continue the process (translocation), creating an amino acid chain.
- Termination: When a stop codon is reached, the ribosome releases the polypeptide. The ribosomal complex remains
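The three phases can be mimicked on a string: find the first start codon (initiation), read one codon at a time (elongation), and stop at a stop codon (termination). This is only a sketch of the reading logic, not of ribosome mechanics. It uses the standard genetic code in its usual compact UCAG ordering and assumes the input contains only A, C, G, U (or T).

```python
from itertools import product

# Standard genetic code; '*' marks the three stop codons.
BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon."""
    mrna = mrna.upper().replace("T", "U")
    start = mrna.find("AUG")                  # initiation at the start codon
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):  # elongation, codon by codon
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":                 # termination at a stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGGCCUAAGG"))  # MA
```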
What is the HUMAN GENOME PROJECT?
- Sequenced between 1990 and 2001 by a public international consortium (IHGSC) and by a private company (Celera Genomics)
- Doesn’t represent a single individual; made of a PATCHWORK OF SEQUENCES FROM DIFFERENT INDIVIDUALS.
- Early analyses done “by hand”. More recently, large-scale computer-based analyses have been required
- Freely available online programs to compare sequences
– (e.g. BLAST : http://www.ncbi.nlm.nih.gov/BLAST/)
- SEQUENCES are “ANNOTATED” WITH ALL KNOWN INFORMATION REGARDING GENES, REPETITIVE REGIONS, OTHER INFORMATION.
- Questions :
– Is the genome sequence complete?
– How do we look at the genome content?
– What is the content of the genome?
– What are the functions of individual components of the genome?
– How does the genome vary from individual to individual?
With the NUCLEAR GENOME …HUMAN GENOME PROJECT WE CAN LOOK FOR: 2
- Repetitive sequences
- Specific sequences
* Predicted sequences (similarity with other known genes)
* mRNA/cDNA sequences
* ESTs (expressed sequence tags) = fragments of mRNA sequences derived through single sequencing reactions performed on randomly selected clones from cDNA libraries
COMPOSITION OF HUMAN GENOME…
Human genome = 3200 Mb
1. Gene-related sequences = 1200 Mb
- Genes = 48 Mb
- Related sequences = 1152 Mb
* Pseudogenes
* Gene fragments
* Introns and UTRs
2. Intergenic DNA = 2000 Mb
- Interspersed repeats = 1400 Mb
* LINEs = 640 Mb
* SINEs = 420 Mb
* LTR elements = 250 Mb
* DNA transposons = 90 Mb
- Other intergenic = 600 Mb
* Microsatellites = 90 Mb
* Various = 510 Mb
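A quick way to learn these numbers is to check that the leaf categories add back up to 3200 Mb and to convert them to percentages. The sketch below uses only the figures on this card; the variable names are illustrative.

```python
GENOME_MB = 3200

# Leaf categories from the card, in Mb.
composition_mb = {
    "genes": 48,
    "related sequences": 1152,   # pseudogenes, gene fragments, introns, UTRs
    "LINEs": 640,
    "SINEs": 420,
    "LTR elements": 250,
    "DNA transposons": 90,
    "microsatellites": 90,
    "various": 510,
}

total_mb = sum(composition_mb.values())
percent = {name: round(100 * mb / GENOME_MB, 1)
           for name, mb in composition_mb.items()}

print(total_mb)          # 3200
print(percent["genes"])  # 1.5
print(percent["LINEs"])  # 20.0
```

The 1.5% for genes matches the figure quoted on the "CODING" GENES card.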
LOOK AND UNDERSTAND DIAGRAM SLIDE 11
What are Retrotransposons? = 6
1 * Sequences related to retroviruses
2 * gag, pol and env genes
3 * LTRs (long terminal repeats)
4 * MANY TRUNCATED SEQUENCES IN THE GENOME (LACK ‘env’, OR ARE JUST ‘LTRs’)
5 * Pol produces a reverse transcriptase, which copies the element’s RNA into DNA that CAN THEN BE INTEGRATED INTO THE GENOME.
6 * Unlike retroviruses, retrotransposons can’t move between cells
LINEs, SINEs and ALUs…
What are LINEs? 4
- LINE : Long INterspersed Nuclear Element (a long interspersed repeat)
- Long (6-8kb) and can copy themselves to other parts of the genome
- ENCODE PROTEINS WHICH ARE REQUIRED FOR THEIR INTEGRATION INTO THE GENOME.
- 3 distinct LINE families, LINE1, LINE2 and LINE3. Only LINE1 is still transpositionally active
What are SINEs and ALUs? =7
- SINE : Short INterspersed Nuclear Element (a short interspersed repeat)
2 * Shorter than LINEs. Often aren’t able to integrate themselves
3 * Use LINE proteins to integrate
4 * SINEs are also divided into families (ALUs and MIRs). The only remaining active family is the ALUs
5 * ALUs occur only in primates. ~1 every 3kb
6 * ALUs classified into numerous sub-families based on their sequence
7 * ALUs are >80M years old. Can be used as molecular clocks
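The "~1 every 3kb" density can be turned into a rough copy number. The sketch assumes a genome of 3200 Mb (the figure used on the genome composition card) and an even spacing, which is a simplification.

```python
GENOME_BP = 3200 * 10**6   # 3200 Mb, as on the composition card
ALU_SPACING_BP = 3000      # "~1 every 3kb"

# Integer division: whole Alu copies that fit at this average spacing.
estimated_alu_copies = GENOME_BP // ALU_SPACING_BP

print(f"{estimated_alu_copies:,}")  # 1,066,666
```

So the card's density corresponds to roughly a million ALU copies per genome.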
LINES, SINES and ALUs ..percentages..5
1 * About 50% of genomic DNA is transposable elements
2 * Can damage the host genome through insertional mutagenesis or unequal crossover.
3 * They don’t move much anymore
4 * Abundance by class (slide figure: transposable elements in the human genome), from most to least:
- SINEs (mostly Alu)
- LINEs
- LTR elements
- DNA elements (e.g. mariner)
- Unclassified (least)
5 * Total of all types = 44.7% of the genome
What are Microsatellites? 4
1 * Short sequences (1-15bp) repeated in tandem many times (2-50). Dinucleotide repeats are the most frequent
2 * Result in “low complexity” sequence, e.g. ACACACACACACACACACACACA or GCGCGCGCGCGCGCGC
3 * Prone to expansion and contraction during replication due to polymerase “slippage”
4 * Microsatellite sequences can be found in coding sequences, but not very often
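Tandem repeats like the AC example are easy to spot with a regular-expression backreference. The sketch below looks for one fixed unit length at a time; the function name, the default thresholds and the test sequences are assumptions for illustration, not a standard tool.

```python
import re


def find_microsatellites(seq: str, unit_len: int = 2, min_copies: int = 3):
    """Return (start, unit, copies) for each tandem repeat of unit_len bases.

    The pattern captures one unit and then requires the same unit to
    follow at least (min_copies - 1) more times, back to back.
    """
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        copies = len(match.group(0)) // unit_len
        hits.append((match.start(), match.group(1), copies))
    return hits


print(find_microsatellites("AC" * 11 + "A"))   # [(0, 'AC', 11)]
print(find_microsatellites("TTGACACACACACGG")) # [(3, 'AC', 5)]
```

Note that a homopolymer run such as AAAAAA would also be reported as a dinucleotide repeat of "AA"; a fuller tool would reduce each unit to its shortest period.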
What are Non-coding RNA genes (ncRNA)?
a functional RNA molecule that is transcribed from DNA but not translated into proteins
Non-coding RNA genes (ncRNA) MAJOR CLASSES…8
1 * tRNA (Translational machinery; gene cluster on Chr 6 – almost complete set)
2 * rRNA (Translational machinery; 150-200 copies. )
3 * Short Regulatory ncRNA
—- 4* snoRNA (RNA processing/base modification. 97 snoRNA, >85% single copy)
—- 5. * snRNA (RNA processing/splicing, multiple copies of some)
—– 6. * miRNA/piRNA/tiRNA (gene expression)
7 * lncRNA (epigenetic control of chromatin, promoter-specific gene regulation, mRNA stability, X-chromosome inactivation and imprinting)
8* Others? (very current field of research)
Non-coding RNA genes (ncRNA)
LOOK AT SLIDE 17
What are Pseudogenes? 6
1 * Sequences related to coding or non-coding sequences that have mutated such that expression/function is lost (e.g. stop codons introduced, frameshifts etc)
2 * Derived from genes (coding and non-coding) by duplication or retrotransposition
3 * Different types include :
4 * Gene fragments
* single exons, multiple exons. Very common
5 * Whole genes
* Includes introns. Splice sites often mutated
6 * Processed pseudogenes
* Mature mRNA from expressed gene reverse-transcribed and integrated into the genome
Pseudogenes
Different types include : 3
1 * Gene fragments
* single exons, multiple exons. Very common
2 * Whole genes
* Includes introns. Splice sites often mutated
3 * Processed pseudogenes
* Mature mRNA from expressed gene reverse-transcribed and integrated into the genome
Pseudogenes
- Missing promoter
- missing start codon
- frameshift
- premature stop codon
- missing intron
- partial deletion
look at gene segment drawing SLIDE 18
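Some of the defects listed on this card can be screened for in a candidate coding sequence. The sketch below checks only three of them (missing start codon, frameshift, premature stop codon). It treats "length not a multiple of 3" as a crude stand-in for a frameshift, and the function name and test sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def coding_defects(cds: str):
    """List simple pseudogene-like defects found in a DNA coding sequence."""
    cds = cds.upper()
    defects = []
    if not cds.startswith("ATG"):
        defects.append("missing start codon")
    if len(cds) % 3 != 0:
        defects.append("frameshift")  # crude proxy: incomplete final codon
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    # A stop codon anywhere before the final codon truncates the protein.
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        defects.append("premature stop codon")
    return defects


print(coding_defects("ATGGCCTAA"))     # []
print(coding_defects("ATGTAAGCCTAA"))  # ['premature stop codon']
```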
What are “CODING” GENES? 9
1 * Make up 1.5% of the genome, but they are the most studied
2 * Produce proteins which perform activities required by the cell (metabolism, transcription, translation, etc etc)
3 * Can be single copy (e.g. Beta globin) or multiple copy (eg HLA class I genes)
4 * Genes can be grouped into families based on sequence similarity
— 5– Often evolved by duplication and divergence and found in clusters
6 * Some families group into superfamilies based on common protein domains (eg Ig-SF)
7 * Coding genes can be identified by comparing mRNAs (i.e. spliced sequences) with genomic sequences
8 * GenBank is a public store of nucleotide sequences (including mRNA) generated by laboratories worldwide
9 * Gene/mutation naming conventions important for communication of findings
Genes and repeat elements overlap = 4
1 * Numerous genes, different orientations (forward and reverse, opposite strands)
2 * Pseudogenes and gene fragments often intermingled (repeat content very dense)
3 * Coding genes can overlap in opposite orientations
4 * Some genes may contain complete genes within introns
Genes and repeat elements overlap IMAGE
SLIDE 20.. DRAW AND LABEL
General structure of coding genes
All genes have; 9
1) Promoter – TF & RNA pol binding site
(TATA vs TATA-less)
2) Introns and exons (coding)
3) 5’ UTR (drives translation)
4) Start codon (ATG)
5) Splice sites (AG/GT vs AT/AC)
6) Splice enhancers (exonic, intronic)
7) Stop codon (TAA, TAG, TGA)
8) 3’UTR (mRNA stability & localisation)
9) Polyadenylation signal (sequence)
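To see how the start codon, stop codon and UTRs relate, a spliced mRNA (written as DNA letters) can be cut into 5' UTR, coding sequence and 3' UTR. This sketch assumes translation starts at the first ATG, which is a simplification. The toy 3' UTR is the common AATAAA polyadenylation hexamer, which the card itself does not spell out.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_mrna(mrna: str):
    """Return (5' UTR, CDS, 3' UTR), or None if no complete ORF is found.

    The CDS runs from the first ATG to the first in-frame stop codon,
    inclusive; everything before is 5' UTR, everything after is 3' UTR.
    """
    mrna = mrna.upper()
    start = mrna.find("ATG")
    if start == -1:
        return None
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3
            return mrna[:start], mrna[start:end], mrna[end:]
    return None


print(split_mrna("GGCATGGCCTGAAATAAA"))  # ('GGC', 'ATGGCCTGA', 'AATAAA')
```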
General structure of coding genes
Fig 3.6
All genes have; IMAGE DIAGRAM
LABEL AND DRAW THE DIAGRAM ON SLIDE 21
Understanding Splicing and splice sites: 7
1 * DNA is transcribed to RNA in the nucleus
2 * Pre-mRNA is processed by the spliceosome (in the nucleus), where introns are spliced out to yield a mature mRNA
3 * Specific sequences affect splicing
4 * Splice acceptor/donor sites occur at intron/exon boundaries
5 * Enhancers/Silencers occur within introns and exons and can affect splicing in specific tissues (SR proteins)
6 * SR proteins can direct the inclusion and exclusion of specific exons
7 * The mixture of SR proteins differs from tissue to tissue
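Splicing can be pictured as keeping the exon intervals of a transcript and discarding what lies between them. In the sketch below the sequences are invented, coordinates are 0-based and end-exclusive, and the boundary check covers only the major GU...AG intron class.

```python
def splice(pre_mrna: str, exons):
    """Join the exon intervals in order, dropping the introns between them."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def is_canonical_intron(intron: str) -> bool:
    """True if the intron starts with GU (donor) and ends with AG (acceptor)."""
    return intron.startswith("GU") and intron.endswith("AG")


# Toy transcript: exon 1, one intron, exon 2.
exon1, intron, exon2 = "AUGGCC", "GUAAGUUUCAG", "GAUUAA"
pre_mrna = exon1 + intron + exon2
exons = [(0, len(exon1)), (len(exon1) + len(intron), len(pre_mrna))]

mature = splice(pre_mrna, exons)
print(mature)                       # AUGGCCGAUUAA
print(is_canonical_intron(intron))  # True
```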
Splicing and splice sites image
draw and label diagram on slide 22
Alternative splicing can produce multiple protein isoforms… 6
- exon skipping
- intron retention
- alternative 5’ donor or 3’ acceptor
- mutually exclusive exons
- alternative promoters
- alternative splicing and polyadenylation
understand all forms and draw the diagrams on slide 23
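Of the forms listed, exon skipping is the simplest to model: each skippable ("cassette") exon is either included or left out, so n cassette exons give up to 2^n transcripts. The sketch below covers only that one mechanism, with made-up exon sequences and a hypothetical function name.

```python
from itertools import product


def isoforms(exons):
    """Enumerate transcripts from a list of (sequence, is_cassette) exons.

    Constitutive exons are always kept; each cassette exon contributes
    two options, included or skipped (the empty string).
    """
    choices = [(seq, "") if is_cassette else (seq,)
               for seq, is_cassette in exons]
    return ["".join(combo) for combo in product(*choices)]


gene = [("AAA", False), ("CCC", True), ("GGG", False)]
print(isoforms(gene))  # ['AAACCCGGG', 'AAAGGG']
```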
The “Average” human gene = 4
1 * Large variation in gene size (2kb – 2Mb)
2 * Large variation in protein sizes
3 * Large variation in UTR lengths (3’ generally longer than 5’)
4 * Many genes have alternative first exons with different 5’UTRs
LOOK AT TABLE ON SLIDE 24
Understanding the features of Human genes..
1 * Approximately 50000 – 100000 genes were predicted in the genome
2 * The completion of the genome sequence in 2001 showed only 20,000-25,000 genes
3 * Alternative splicing explains the difference (some genes can produce >10 different proteins)
4 * Common features can be identified in genes with related functions
5 * Cell Surface Receptors for example;
* Leader sequence (to direct proteins to the cell surface) ~20 amino acids
* Extracellular domains (of different families. One example is Ig, SH-linked)
* Number of EC domains can vary
* A stalk region
* A membrane anchoring sequence and/or transmembrane sequence
* An intracellular domain (of different families). Delivers signals
Human Genes….Cell Surface Receptors for example; 6
- Leader sequence (to direct proteins to the cell surface) ~20 amino acids
2 * Extracellular domains (of different families. One example is Ig, SH-linked)
3 * Number of EC domains can vary
4 * A stalk region
5 * A membrane anchoring sequence and/or transmembrane sequence
6 * An intracellular domain (of different families). Delivers signals
NKp44 – a receptor on NK cells
draw image on slide — gene segment and features…
SLIDE 25
Protein families =12
- Cellular processes
- metabolism
- DNA replication/modification
- intracellular signalling
- cell-cell communication
- protein folding and degradation
- transport
- multifunctional proteins
- cytoskeletal/structural
- defence and immunity
- miscellaneous function
- transcription/translation
Look at slide 26 graph