genes, genomes and genomics Flashcards
What is a genome?
The entire complement of hereditary genetic information, i.e. the organism's DNA (RNA in some viruses)
The region of the cell containing this genetic material is the nucleus (eukaryotes) or the nucleoid (prokaryotes)
It includes genes, regulatory sequences, structural DNA and “junk” (non-coding) DNA
Genome functions
Genome size and complexity
DNA is divided into “GENES”, which are usually discrete units of heredity
One gene encodes one protein (most of the time)
Genomes vary in size
DNA is measured in base pairs (bp)
1 bp = 1 letter in the genetic code (A, T, C or G)
Therefore: 1,000 bp = 1,000 As, Ts, Cs and Gs (1,000 bp = 1 kb)
Genome size is therefore measured in base pairs
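The unit arithmetic above can be sketched in Python. This is a minimal illustration, not a bioinformatics tool: the helper names `count_bp` and `convert_bp` are hypothetical, and the Mb and Gb units simply extend the same factor-of-1,000 pattern as kb. Counting letters on one strand gives the length in base pairs because each letter pairs with exactly one partner on the other strand.

```python
# Unit factors: 1 kb = 1,000 bp; Mb and Gb continue the same factor-of-1,000 pattern.
BP_PER_UNIT = {"bp": 1, "kb": 1_000, "Mb": 1_000_000, "Gb": 1_000_000_000}


def count_bp(seq: str) -> int:
    """Count A, T, C and G letters on one strand (= base pairs in double-stranded DNA)."""
    return sum(1 for base in seq.upper() if base in "ATCG")


def convert_bp(n_bp: int, unit: str) -> float:
    """Express a length given in base pairs in another unit (kb, Mb or Gb)."""
    return n_bp / BP_PER_UNIT[unit]


print(count_bp("ATGC" * 250))            # 1000
print(convert_bp(1_000, "kb"))           # 1.0
print(convert_bp(3_164_700_000, "Gb"))   # 3.1647
```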
- C-value: the total amount of DNA in the (haploid) genome
- Does not always equate to the number of genes contained within the genome
- We would expect that the more complex the organism, the more DNA is needed to sustain it
- Therefore, we might expect a linear relationship between genome size and organism complexity
The C-value paradox: in reality there is a massive disparity between genome size and complexity
Sequencing
Background
The human genome contains 3,164,700,000 bp
The average gene consists of 3,000 bp
Sizes vary greatly
The largest known human gene is dystrophin, at 2.4 million bp
The total number of genes is estimated at 20,000 to 25,000; a more recent estimate suggests about 21,000
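A quick back-of-envelope check, using only the figures quoted above, shows why genes are described later in these notes as a very small part of the genome. It assumes every gene has the quoted average length of 3,000 bp and that genes do not overlap; the names are illustrative.

```python
# Figures as quoted in these notes (assumption: every gene has the average length).
GENOME_BP = 3_164_700_000
AVG_GENE_BP = 3_000
N_GENES = 21_000


def gene_fraction(n_genes: int, avg_gene_bp: int, genome_bp: int) -> float:
    """Fraction of the genome covered by genes, assuming equal lengths and no overlap."""
    return n_genes * avg_gene_bp / genome_bp


frac = gene_fraction(N_GENES, AVG_GENE_BP, GENOME_BP)
print(f"{frac:.1%}")   # 2.0%
for n in (20_000, 25_000):
    print(n, f"{gene_fraction(n, AVG_GENE_BP, GENOME_BP):.1%}")   # 1.9%, then 2.4%
```

The result depends entirely on the 3,000 bp average, so treat it as an order-of-magnitude figure rather than a measurement.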
TWO approaches to finding genes in a sequence
EXTRINSIC
Compare the sequence to known sequences
Other genes already identified in other species
AB INITIO
Search the sequence for key motifs (signals) found in genes, e.g. start and stop codons, promoters and splice sites
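As a toy illustration of the ab initio idea, the sketch below scans a DNA string for one of the simplest gene signals: an open reading frame (ORF), i.e. an ATG start codon followed in the same frame by a stop codon (TAA, TAG or TGA). The function name and the `min_codons` threshold are hypothetical. Real ab initio gene finders typically combine many signals (promoters, splice sites, codon usage) in statistical models and also scan the reverse strand, which this sketch does not.

```python
# Toy ab initio signal search: forward-strand open reading frames (ORFs).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) pairs for ORFs that begin with ATG and end at the first
    in-frame stop codon. Coordinates are 0-based; end is exclusive and includes
    the stop codon. Only the forward strand is scanned and introns are ignored."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                for j in range(i + 3, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        if (j - i) // 3 >= min_codons:   # codons before the stop
                            orfs.append((i, j + 3))
                        i = j          # resume scanning after this stop codon
                        break
                else:
                    break              # no in-frame stop: nothing more in this frame
            i += 3
    return sorted(orfs)


print(find_orfs("CCATGAAATTTGGGTAACC"))   # [(2, 17)]
```

Here the single hit covers ATG AAA TTT GGG TAA; a very short ORF such as `"ATGTAA"` is only reported if `min_codons` is lowered to 1.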
COMBINED APPROACH: uses both extrinsic and ab initio evidence
Human genome resources publicly available
Why study genomes?
L2 Viral genomes
DNA or RNA
Single or double stranded or both
Linear or circular or both
Viruses are not prokaryotes
Type of genome depends on life cycle
Prokaryotic vs eukaryotic genes
eukaryotic:
Untranslated regions (UTRs) at the 5’ and 3’ ends
A TATA box indicates where the gene can be read (transcribed)
Promoter sequence: specifies to other molecules where transcription begins
Transcription is a process that produces an RNA molecule from a DNA sequence
prokaryotic:
Prokaryotic vs eukaryotic genomes
differences:
EUKARYOTIC GENE: SIMILAR STRUCTURE TO PROKARYOTIC GENE (BUT DIFFERENT)
CODING SEQUENCE INTERRUPTED BY INTRONS
INTRONS ARE SPLICED OUT OF THE PRE-MRNA
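Splicing can be pictured as keeping the exon segments of the pre-mRNA and joining them in order. A minimal sketch, assuming the exon coordinates are already known (finding them is the hard part in practice); the function name and the example sequence are made up. The toy intron begins with GU and ends with AG, the usual splice-site dinucleotides.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments (0-based, end-exclusive) in order, discarding the introns between them."""
    ordered = sorted(exons)
    for (_, end1), (start2, _) in zip(ordered, ordered[1:]):
        if start2 < end1:
            raise ValueError("exons overlap")
    return "".join(pre_mrna[start:end] for start, end in ordered)


# Hypothetical pre-mRNA: exon 1 (0-6) + intron (6-18) + exon 2 (18-24)
pre = "AUGGCC" + "GUAAGUUUUCAG" + "GAUUAA"
print(splice(pre, [(0, 6), (18, 24)]))   # AUGGCCGAUUAA
```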
Prokaryotic vs eukaryotic genome structure
eukaryotic:
DNA
Double stranded
Linear segmented
Many chromosomes
Location: nucleus
Mitochondria: many per cell
Except mature mammalian red blood cells, which lack mitochondria
Own genome: circular, multiple copies per cell
DNA is wrapped around histones
Each histone complex, with the DNA wrapped around it, forms a nucleosome
Several nucleosomes wrapped together form a “solenoid” structure
Chromatin fibre
Nucleosomes wind into helix
Six nucleosomes per complete turn
prokaryotic:
DNA
double stranded
Circular
Non-segmented
One chromosome
Replication begins at “Ori” and ends at “Ter”
Two “replichores”
Left and right
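On a circular chromosome the two replication forks leave Ori in opposite directions and meet near Ter, so each replichore is one of the two arcs between Ori and Ter. A sketch, assuming 0-based coordinates and calling the clockwise arc the right replichore (an arbitrary convention here); the chromosome lengths and positions are hypothetical.

```python
def replichore_lengths(genome_bp: int, ori: int, ter: int) -> tuple[int, int]:
    """Return (right, left) replichore lengths on a circular chromosome.

    Convention for this sketch: the right replichore runs clockwise from ori
    to ter, the left replichore runs anticlockwise from ori to ter."""
    right = (ter - ori) % genome_bp   # wrap around the circle if ter < ori
    left = genome_bp - right
    return right, left


# Hypothetical 1,000,000 bp chromosome with ter exactly opposite ori
print(replichore_lengths(1_000_000, 0, 500_000))         # (500000, 500000)
# ter off-centre: the two replichores differ in length
print(replichore_lengths(1_000_000, 900_000, 300_000))   # (400000, 600000)
```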
DNA forms supercoils
Supercoils form DNA loops
Supercoils relaxed by topoisomerase
HETEROCHROMATIN and TRANSPOSABLE ELEMENTS
HETEROCHROMATIN: TIGHTLY PACKED (CONDENSED) FORM OF DNA; COMES IN MULTIPLE VARIETIES
TRANSPOSABLE ELEMENTS (TRANSPOSONS OR JUMPING GENES): DNA SEQUENCES THAT CAN CHANGE POSITION IN THE GENOME; THEY CAN CREATE OR REVERSE MUTATIONS AND CAUSE DUPLICATION OF GENETIC MATERIAL
Is the human genome unique?
Homology is the existence of shared ancestry between a pair of structures or genes in different taxa
Derived from the same ancestral tetrapod structure
Most human genes are homologous to genes in other species
DNA sequence that can be compared between two genomes is almost 99% identical
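Only positions that can actually be lined up between the two sequences are compared. A minimal sketch of percent identity over an existing alignment, assuming gaps are written as `-` and are skipped; producing the alignment itself needs a separate alignment tool, and the function name is illustrative.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions with the same base, skipping gap columns ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must already be aligned to equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue   # a gap means this position cannot be compared
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


print(percent_identity("ACGTACGTAC", "ACGTACGTTC"))   # 90.0
print(percent_identity("ACGT-CGTAC", "ACGTACGTAC"))   # 100.0
```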
DNA categories across genome
SINEs are short interspersed nuclear elements: non-coding transposable elements
LINEs are long interspersed nuclear elements: transcribed into RNA, then converted back into DNA by reverse transcriptase (RT) and inserted into the genome
Functional elements of the human genome
Alu sequences are the most common SINE
About 300 bp long
1,000,000 + copies in the human genome
Comprises approximately 10% of genome
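The three Alu figures are consistent with each other, as this arithmetic check shows. It uses the 3,164,700,000 bp genome size quoted earlier and treats "1,000,000 +" as exactly one million copies of 300 bp, so it gives a lower bound.

```python
ALU_LENGTH_BP = 300
ALU_COPIES = 1_000_000          # lower bound from "1,000,000 + copies"
GENOME_BP = 3_164_700_000

alu_bp = ALU_LENGTH_BP * ALU_COPIES
print(alu_bp)                        # 300000000
print(f"{alu_bp / GENOME_BP:.1%}")   # 9.5%
```

About 9.5% from the lower-bound copy number, which rounds to the quoted figure of approximately 10%.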
The eukaryotic gene: regulatory sequences
The TATA box is a DNA sequence found in the core promoter region of genes in archaea and eukaryotes
Non-coding DNA-sequence
TATA box is the binding site of the TATA-binding protein (TBP) and other transcription factors
Transcription factors (TFs) recruit the enzyme RNA polymerase
Consensus sequence: TATAWAW (W = A or T)
THE TATA BOX DEFINES THE DIRECTION OF TRANSCRIPTION AND THE STRAND OF DNA TO BE READ
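Because W stands for A or T, the TATAWAW consensus can be written as a regular expression and searched on both strands. A sketch, assuming exact consensus matches only (real TATA boxes often deviate from the consensus); the helper names are hypothetical. A hit on the minus strand is reported as a position on the reverse complement, reflecting the point above that the TATA box fixes which strand is read and in which direction. The lookahead `(?=...)` lets overlapping matches be found.

```python
import re

# TATAWAW with W = A or T; the lookahead also reports overlapping matches.
TATA_RE = re.compile(r"(?=TATA[AT]A[AT])")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the other strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_tata(seq: str) -> list[tuple[str, int]]:
    """Return (strand, 0-based start on that strand) for each TATAWAW match."""
    plus = [("+", m.start()) for m in TATA_RE.finditer(seq.upper())]
    minus = [("-", m.start()) for m in TATA_RE.finditer(reverse_complement(seq))]
    return plus + minus


print(find_tata("GGCTATAAAAGGC"))   # [('+', 3)]
print(find_tata("GCCTTTTATAGCC"))   # [('-', 3)]
```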
The eukaryotic and prokaryotic gene: enhancers
An enhancer is a short (50–1,500 bp) region of DNA
Bound by proteins (activators) to increase the likelihood that transcription of a particular gene will occur
These proteins are usually referred to as transcription factors
Transcription factors bind to the enhancer and increase the activity of the promoter
Found in both prokaryotes and eukaryotes
DNA is folded and coiled in the nucleus
An enhancer can lie far from its gene in the linear sequence, yet sit close to the transcription start site once the DNA is folded
The eukaryotic gene: regulatory sequences
Kozak sequence: the eukaryotic equivalent of the SHINE-DALGARNO sequence in prokaryotes
Consensus ACCATGG, which contains the ATG start codon
Functions as the protein translation initiation site
Site where ribosomes bind
Additional ribosomal binding site
5’ methylated cap of the messenger RNA
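A sketch of locating start codons that sit in an exact copy of the ACCATGG consensus. It accepts RNA or DNA letters by converting U to T, and it only finds perfect matches; real start sites often deviate from the consensus. The function name is hypothetical.

```python
KOZAK = "ACCATGG"   # consensus quoted in these notes; the ATG starts at offset 3


def kozak_start_codons(seq: str) -> list[int]:
    """Return 0-based positions of the ATG inside each exact ACCATGG match."""
    dna = seq.upper().replace("U", "T")   # accept mRNA (U) or DNA (T) letters
    positions = []
    i = dna.find(KOZAK)
    while i != -1:
        positions.append(i + 3)           # skip "ACC" to land on the A of ATG
        i = dna.find(KOZAK, i + 1)
    return positions


print(kozak_start_codons("GCCGCCACCAUGGCGUAA"))   # [9]
```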
the human genome
Human genome = 3.17 x 10^9 bp
Different types of protein
Metabolism, Transcription/Translation
Genes cluster together into families
e.g. the Rab GTPase family and the trp operon
GENE FAMILY RAB GTPASE (intracellular vesicle trafficking)
Family A-H
Represented gene families across taxa
Red: Mammal
Green: Plant
Orange: Fungus
No yeast Rab genes in families B and C
Human and Arabidopsis have at least one copy from each gene family
Some family clades are diversified and others show low diversity
Gene duplication and deletions
Tryptophan operon in prokaryotes and eukaryotes
prokaryotes:
Operon is continuous segment on one chromosome
Transcription = one start
Translation = five start sites
eukaryotes:
Five genes carried on different chromosomes
Transcription = five starts
Translation = five start sites
gene duplication
Gene duplication is a major mechanism through which new genetic material is generated during molecular evolution
Can be defined as any duplication of a region of DNA that contains a gene
Gene duplications can arise as products of several types of errors in DNA replication and repair machinery
Sources of duplication include recombination, retrotransposition events and replication slippage
Pseudogenes
Pseudogenes are segments of DNA that are related to real genes
Pseudogenes have lost at least some functionality relative to the complete gene
Genes are only a VERY small part of the eukaryotic genome
Pseudogenes often result from the accumulation of multiple mutations within a gene whose product is not required for survival
Also caused by duplication and deletion
“junk” DNA
Besides up to thousands of genes, a chromosome contains:
* Centromere
  * Telocentric (centromere at the telomere)
  * Acrocentric (short arm very short)
  * Metacentric (two arms of roughly equal length)
* Telomeres
* Chromatin
* Repetitive DNA
Telomere
Protects the ends of chromosomes by forming a protective loop
Telomere DNA loops back on itself to form a loop structure (the T-loop)
Plays a critical role in chromosome replication and maintenance
Consists of repeats of a simple DNA sequence
Contains clusters of G residues on one strand (TTAGGG repeats in humans)
Important factor in determining lifespan and reproductive capacity
Insight into aging and cancer
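Because a telomere is a run of one short repeat, its length can be measured by counting uninterrupted copies from the chromosome end. A sketch, assuming the input is the G-rich strand written 5' to 3' so that the chromosome end is at the right-hand side of the string; the function name is illustrative.

```python
TELOMERE_REPEAT = "TTAGGG"   # human telomeric repeat on the G-rich strand


def terminal_repeat_count(seq: str, repeat: str = TELOMERE_REPEAT) -> int:
    """Count uninterrupted copies of `repeat` at the 3' (right-hand) end of seq."""
    seq = seq.upper()
    count = 0
    end = len(seq)
    while end >= len(repeat) and seq[end - len(repeat):end] == repeat:
        count += 1
        end -= len(repeat)   # step back one repeat unit
    return count


print(terminal_repeat_count("ACGTAC" + "TTAGGG" * 4))   # 4
```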
Chromatin and satellite DNA
Euchromatin is genetically active and generally not condensed during interphase
Involved in transcription of RNA to produce proteins used in cell function and growth
It is the predominant, more diffuse type of chromatin found in cells during interphase
Heterochromatin is inactive and concentrated in/around the centromeres and telomeres
Heterochromatin has repeat sequences called satellite DNA
Minisatellite has small repeats (TTAGGG at telomere)
Microsatellite has tiny repeats (2-6 bp, e.g. CA)
CA repeat = 0.25% of human genome, AA repeat = 0.15% of human genome
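Microsatellites such as CA repeats are simple enough to find with a regular expression. A minimal sketch; the `min_units` threshold of 5 is an arbitrary assumption, not a standard cutoff, and the function name is hypothetical. The last line converts the quoted 0.25% into base pairs using the 3.17 x 10^9 bp genome size.

```python
import re


def find_ca_repeats(seq: str, min_units: int = 5) -> list[tuple[int, int]]:
    """Return (start, number of CA units) for runs of at least min_units tandem CA copies."""
    pattern = re.compile(f"(?:CA){{{min_units},}}")   # e.g. (?:CA){5,}
    return [(m.start(), len(m.group()) // 2) for m in pattern.finditer(seq.upper())]


print(find_ca_repeats("GGT" + "CA" * 6 + "TTCACAG"))   # [(3, 6)]

# 0.25% of a 3.17 x 10^9 bp genome, in base pairs (roughly 7.9 Mb)
print(round(0.0025 * 3.17e9))   # 7925000
```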
Transposable elements
Transposable elements can change their positions in the genome
Also known as jumping genes
Make up much of the mass of DNA in eukaryotic cells
Class I - function via reverse transcription (retrotransposons)
Class II - encode the protein transposase (DNA transposons)
Most common TEs are LINEs and SINEs
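The two classes differ in what happens to the original copy. In a greatly simplified string model (positions and sequences are made up; the function names are illustrative), a DNA transposon is typically excised and reinserted elsewhere, while a retrotransposon leaves the original in place and inserts a new copy made via an RNA intermediate and reverse transcriptase, so its copy number grows. That copy-and-paste behaviour is one reason LINEs and SINEs such as Alu reach such high copy numbers.

```python
def cut_and_paste(genome: str, start: int, end: int, target: int) -> str:
    """Class II style: excise genome[start:end] and reinsert it at `target`.

    `target` is an index into the genome *after* the element has been removed."""
    element = genome[start:end]
    remaining = genome[:start] + genome[end:]
    return remaining[:target] + element + remaining[target:]


def copy_and_paste(genome: str, start: int, end: int, target: int) -> str:
    """Class I style: insert a copy of genome[start:end] at `target`; the original stays."""
    element = genome[start:end]   # stands in for the RNA -> DNA copy made by reverse transcriptase
    return genome[:target] + element + genome[target:]


g = "aaaaTEbbbb"                     # "TE" marks the transposable element
print(cut_and_paste(g, 4, 6, 6))     # aaaabbTEbb
print(copy_and_paste(g, 4, 6, 8))    # aaaaTEbbTEbb
```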