Structure Of The Human Genome Flashcards
General distribution of genome
45% single-copy sequence DNA (unique/contains gene coding regions)
45% intermediate repeat
10% highly repetitive
Gene
DNA sequence that contributes to phenotype of organism
Code for proteins or functional RNA molecules
Trans acting
Factors encoded by another gene, translated in cytoplasm and brought back into nucleus
Gene layout
Promoter (200 bp upstream) > TATA box (30 bp upstream) > Transcription start site > Coding region > Transcription stop site
Enhancers, may be located upstream, downstream or mid gene
Silencers: opposite of enhancer
Post transcriptional modifications
5’ cap added immediately
Cleavage ~ 30 bp downstream of AAUAAA > addition of 100-200 As
Splicing by spliceosome
Intron = 5’ GT————AG 3’
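The two sequence rules on this card (cleavage about 30 nt past AAUAAA, and introns bounded by GT...AG) can be sketched in a few lines of Python. This is a minimal illustration, not a gene-prediction method: the function names are hypothetical, and measuring the 30 nt offset from the end of the AAUAAA signal is an assumption made for the example.

```python
def follows_gt_ag_rule(intron: str) -> bool:
    """Return True if a DNA intron sequence starts with GT and ends with AG."""
    seq = intron.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


def predicted_cleavage_site(pre_mrna: str, offset: int = 30):
    """Return the 0-based index `offset` nt past the first AAUAAA signal.

    Assumption for this sketch: the offset is counted from the END of the
    signal. Returns None if there is no signal or the transcript is too short.
    """
    seq = pre_mrna.upper().replace("T", "U")  # accept DNA or RNA spelling
    signal = "AAUAAA"
    pos = seq.find(signal)
    if pos == -1:
        return None
    site = pos + len(signal) + offset
    return site if site <= len(seq) else None


if __name__ == "__main__":
    print(follows_gt_ag_rule("GTAAGTCCCTTTAG"))
    print(predicted_cleavage_site("CC" + "AAUAAA" + "G" * 40))
```

Real splice sites and poly(A) sites depend on more context than these two motifs, so treat the checks as memory aids for the card.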
Development of gene families
By duplication and divergence events
Pseudogenes
Arise from loss of function after duplication
Spacers in between genes may have a ____
Sequence independent function
Intermediate repeated sequence likely formed by ___
Transposition
Types of repeated DNA elements
LINEs long interspersed elements
SINEs short interspersed elements
LTR retrotransposons (long terminal repeat)
DNA transposons
LINEs
6-8 kb
Contain promoter for RNA Pol II, an ORF for protein similar to reverse transcriptase, and an ORF for an endonuclease for re-insertion
Because reverse transcription starts at the 3’ end of the mRNA, the enzyme often doesn’t reach the 5’ end, so the copy is truncated and a ‘functional’ LINE isn’t formed
SINEs
300-400 bp
Most were originally tRNA transcripts
Do not encode any proteins
Have promoter for Pol III
Have a 3’ sequence similar to LINEs, so can be retrotransposed by LINE machinery
3 SINE families in human genome
Alu (10% of genome)
MIR
MIR3
LTRs
Very similar to retroviruses, just lacking the envelope sequence
Encode: reverse transcriptase, protease, RNase H, and integrase
Occupy 8% of genome
DNA transposons
Encode transposase
Cut and paste
Copy number doesn’t increase
Highly repetitive DNA
Different buoyant density from bulk DNA > forms separate bands in density-gradient centrifugation, aka satellites
Minisatellites
10-100 bp repeats in tandem arrays = 0.5-40 kb
Often occur near telomeres - limits use in mapping
Loci can be hypervariable - used in forensics
Microsatellites
2-4 bp repeats
Number of repeats varies - valuable genetic marker (more uniform distribution)
Likely arise by slippage during replication
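Because microsatellite alleles differ in how many times the short unit repeats, scoring a marker comes down to counting tandem copies. The sketch below uses a regular expression for that; the function name and the example sequences are made up for illustration.

```python
import re


def longest_tandem_run(seq: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `seq`."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    pattern = re.compile(f"(?:{re.escape(unit.upper())})+")
    # Each match is one uninterrupted run; divide its length by the unit length.
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(seq.upper())]
    return max(runs, default=0)


if __name__ == "__main__":
    # Two hypothetical alleles of a (CA)n marker with the same flanking sequence.
    allele_a = "TT" + "CA" * 4 + "GG"
    allele_b = "TT" + "CA" * 7 + "GG"
    print(longest_tandem_run(allele_a, "CA"), longest_tandem_run(allele_b, "CA"))
```

In practice repeat number is usually read from PCR fragment length rather than from raw sequence, so this is only a way to picture what the marker measures.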
Telomeres
Tandem repeats
Overhanging 3’ end (may fold back on itself)
Telomere is eroded after ~50 cell divisions in eukaryotes - active telomerase can extend this
G-bands
Associate with low GC content
Visualized with Giemsa stain
FISH
Fluorescent in situ hybridization
Can map chromosomal origin of a clone - important in mapping
Useful in karyotypes - painting
Histones
Octamer
2 × (H2A + H2B + H3 + H4)
With H1 between ‘beads’
Giemsa stain protocol
Capture cells in prometaphase
Fix cells
Gentle digestion
Giemsa stain binds to AT rich regions
Why isn’t sequencing a good method for determining genome size?
Repeat regions are collapsed (shortened) when reads are incorrectly overlapped during assembly
Method for genome size determination
Feulgen stain:
Isolate nuclei and fix slide
Stain DNA
Image analysis quantifies stain density
Convert density to pg (1 pg ≈ 1 Gbp)
Or flow cytometry
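The density-to-size conversion on this card is simple arithmetic, sketched below. Assumptions to note: the function names are hypothetical, the sample is compared against a reference standard of known DNA mass stained on the same slide, and the conversion uses the commonly cited figure of about 0.978 Gbp per pg (which the card rounds to 1 Gbp). The numbers in the usage lines are invented.

```python
BP_PER_PG = 0.978e9  # commonly cited conversion: 1 pg of DNA is about 978 Mbp


def estimate_dna_mass_pg(sample_density: float,
                         standard_density: float,
                         standard_pg: float) -> float:
    """Scale the standard's known DNA mass by the ratio of stain densities."""
    if standard_density <= 0:
        raise ValueError("standard density must be positive")
    return standard_pg * sample_density / standard_density


def pg_to_bp(mass_pg: float) -> float:
    """Convert a DNA mass in picograms to base pairs."""
    return mass_pg * BP_PER_PG


if __name__ == "__main__":
    # Invented readings: the sample stains twice as densely as a 3.5 pg standard.
    mass = estimate_dna_mass_pg(sample_density=200.0,
                                standard_density=100.0,
                                standard_pg=3.5)
    print(mass, pg_to_bp(mass))
```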
Reassociation test protocols
Extract DNA
Shear DNA to 400 bp
Boil in salty buffer to dissociate
Monitor over time
More copies of a sequence = higher concentration = faster reassociation
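The link between concentration and reassociation speed is usually written as second-order kinetics, where the fraction of DNA still single-stranded is 1 / (1 + k * C0 * t). The sketch below evaluates that standard Cot relation; the rate constants are invented, with the larger one standing in for a repeated sequence and the smaller one for a single-copy sequence.

```python
def fraction_single_stranded(c0: float, t: float, k: float) -> float:
    """Second-order reassociation: C/C0 = 1 / (1 + k * C0 * t)."""
    return 1.0 / (1.0 + k * c0 * t)


def cot_half(k: float) -> float:
    """Cot value at which half of the DNA has reassociated (1 / k)."""
    return 1.0 / k


if __name__ == "__main__":
    # Invented rate constants: repeats behave as if k were much larger.
    k_repeat, k_single = 10.0, 0.01
    for cot in (0.01, 1.0, 100.0):
        print(cot,
              fraction_single_stranded(1.0, cot, k_repeat),
              fraction_single_stranded(1.0, cot, k_single))
```

At any given Cot the repeated fraction is further along, which is how a Cot curve separates highly repetitive, intermediate and single-copy DNA.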
Human genome is CpG ____
Poor
Overall GC content only ~40%
Higher in gene-rich areas (~50%)
Reason for CpG islands
Selection maintains CpG in gene-associated regions, where the cytosines are kept unmethylated
Outside of genes, methylated C undergoes deamination > changes to thymine > T:G mismatch has a ~50/50 chance of being ‘fixed’ correctly
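CpG depletion is usually measured with an observed/expected ratio: the number of CG dinucleotides, times the sequence length, divided by (number of C × number of G). The sketch below computes that ratio alongside GC content; the function names are hypothetical and the test sequences are toy strings, not real genomic data.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s) if s else 0.0


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0  # no CpG is possible without both bases
    return s.count("CG") * len(s) / (c * g)


if __name__ == "__main__":
    island_like = "CGCGCGCG"  # toy sequence, CpG-rich
    depleted = "CATGCATG"     # same bases present, but no CG dinucleotide
    print(gc_content(island_like), cpg_obs_exp(island_like))
    print(gc_content(depleted), cpg_obs_exp(depleted))
```

A ratio near 1 means CpG occurs about as often as base composition predicts; bulk human DNA falls well below that, while CpG islands sit much closer to it.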
Ensembl stats
~ 20,000 coding genes
14,000 pseudogenes
200,000 gene transcripts
Duplication leads to _____
Gene families
Unequal crossing over leads to ____
Clustered gene duplication
Ex: histone 1 cluster on chromosome 6
Interchromosomal crossovers can lead to ___
Segmental duplications
More common in high repeat areas (subtelomeric and pericentromeric)
Non-processed pseudogenes
Usually found side by side
Usually duplicated in tandem with original gene (may contain promoter)
Copied at genomic level, therefore contains introns and is called non-processed
Processed pseudogenes
Have no promoters or introns
Could be located anywhere
Formed by retrotransposition
Intermediate repeats
Interspersed throughout genome by jumping
Common human retrogenes
Escape from the X chromosome to an autosome so genes can continue to be transcribed during replication (escaping prolonged condensed state)
Retrotransposons or RNA transposons
Copy and paste
Increase in number
Alpha satellites
Bind CENP-A (centromeric histone); function in centromere kinetochore attachment
Beta satellites
Mostly near telomeres
Qualities of genetic markers
Sequence is known and location is known
May be polymorphic
Can be used for determining parentage, identifying individuals, quantifying diversity in a population, linkage mapping
Diseases caused by microsatellite repeat expansion
Huntington’s CAG
Myotonic dystrophy CTG
Fragile X CGG
Repeated RNA seems to trap machinery, preventing mRNA processing