Gene Structure Flashcards
Bacterial genomes
- organised into operons, with functionally related genes grouped and co-transcribed
- distinct boundaries and regulation
- highly evolved
- economical/efficient
Eukaryotic genomes
- low numbers of protein coding genes relative to genome size
- non-coding repeats make up about 50% of the genome
- small genes hard to locate
- gene density surprisingly low (introns, repeat regions)
- functional RNAs common
- uneven gene distribution with gene rich and gene desert regions
Chromosome Structure
- two sister chromatids held by a centromere
Other DNA
- B chromosomes: extra chromosomes in addition to the standard complement
- holocentric chromosomes: entire chromosome is a centromere
- extrachromosomal DNA: plasmids/organelle DNA
Organelle Genomes
- mitochondrial DNA: covalently closed circular DNA
- chloroplast DNA: single closed circular DNA
- photosynthesis genes
Genome Sections
- unique and repeat regions
- 62% of the genome is intergenic (between genes), including transposons, LINEs, SINEs and LTR elements
Repeated Intergenic Regions
- differs in GC content
- may have structural role (like telomeres)
- tandemly repeated DNA
- interspersed repeats (genome-wide repeats)
Satellite DNA
Satellite DNA (satDNA) is highly repetitive DNA consisting of short sequences repeated a large number of times. It carries a variable, often AT-rich repeat unit (e.g. the α- and β-satellite families).
Tandemly Repeated DNA
- much shorter than satellite DNA
- minisatellites
- associated with structural features
- 10-100 bp
- microsatellites
- simple tandem repeats
- telomeres or dinucleotide repeats
- less than 13 bp
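The size classes above can be sketched as a small classifier plus a naive scan for perfect tandem repeats. This is a minimal illustration, assuming uninterrupted repeats and the thresholds on these cards (microsatellite unit under 13 bp, minisatellite unit 10-100 bp); the two ranges overlap at 10-12 bp, and the sketch arbitrarily calls those microsatellites. The function names are hypothetical, and real repeat finders also tolerate mismatches.

```python
# Illustrative sketch only: hypothetical helper names, perfect repeats assumed.

def classify_repeat_unit(unit_len: int) -> str:
    """Assign a repeat unit length to the size classes on these cards.

    The card ranges overlap at 10-12 bp; this sketch labels them microsatellites.
    """
    if unit_len < 1:
        raise ValueError("repeat unit must be at least 1 bp")
    if unit_len < 13:
        return "microsatellite"
    if unit_len <= 100:
        return "minisatellite"
    return "longer repeat unit"


def find_tandem_repeats(seq: str, unit_len: int, min_copies: int = 3):
    """Naive scan for perfect tandem repeats; returns (start, unit, copies)."""
    if unit_len < 1:
        raise ValueError("repeat unit must be at least 1 bp")
    hits = []
    i = 0
    while i + unit_len <= len(seq):
        unit = seq[i:i + unit_len]
        j, copies = i, 0
        # Count consecutive exact copies of `unit` starting at position i.
        while seq[j:j + unit_len] == unit:
            copies += 1
            j += unit_len
        if copies >= min_copies:
            hits.append((i, unit, copies))
            i = j  # jump past the whole repeat array
        else:
            i += 1
    return hits


print(find_tandem_repeats("GGCACACACACATT", 2))  # [(2, 'CA', 5)]
print(find_tandem_repeats("TTAGGG" * 4, 6))       # [(0, 'TTAGGG', 4)]
print(classify_repeat_unit(6))                    # microsatellite
```

The second call uses the human telomeric repeat TTAGGG, whose 6 bp unit falls in the microsatellite size class on these cards.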
Satellite Formation
Replication slippage: daughter strand slips back 1 repeating unit
Tandemly repeated DNA: DNA recombination and unequal crossing over
Variable Number Tandem Repeats: used in identification (DNA fingerprinting)
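A toy model of slippage and of VNTR-based identification follows. It assumes one perfect repeat array per locus and that each backward slip of the daughter strand adds exactly one unit; the locus names and repeat counts are invented for illustration, and real profiling compares many loci with statistical match probabilities.

```python
# Toy model: perfect repeat arrays, one unit gained per backward slip.
# Locus names and counts below are hypothetical.

def count_units(array: str, unit: str) -> int:
    """Number of consecutive copies of `unit` at the start of `array`."""
    if not unit:
        raise ValueError("unit must be non-empty")
    n = 0
    while array.startswith(unit, n * len(unit)):
        n += 1
    return n


def replicate_with_slippage(array: str, unit: str, slips: int = 1) -> str:
    """Copy a perfect repeat array; each backward slip re-copies one unit."""
    if count_units(array, unit) * len(unit) != len(array):
        raise ValueError("array must be a perfect run of `unit`")
    return array + unit * slips


def profiles_match(a: dict, b: dict) -> bool:
    """Match if every locus has the same two repeat counts (allele order ignored)."""
    return a.keys() == b.keys() and all(sorted(a[k]) == sorted(b[k]) for k in a)


parent = "CA" * 5
daughter = replicate_with_slippage(parent, "CA")
print(count_units(daughter, "CA"))  # 6

crime_scene = {"locus_1": (5, 6), "locus_2": (12, 9)}  # hypothetical counts
suspect = {"locus_1": (6, 5), "locus_2": (9, 12)}
print(profiles_match(crime_scene, suspect))  # True
```

Each locus stores two counts because a diploid individual carries one allele per homologous chromosome, so the pair is compared without regard to order.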
Interspersed Repeats
- genome wide repeats (more than 100 bp)
- moderately repetitive
- transposon derived repeats: SINE, LINE, DNA transposons, LTR retrotransposons
DNA Transposon
- target DNA cut by transposases
- excised transposon DNA inserted and ligated into the target site
- flanked by direct repeats
Maize Transposon
- excision of the TE from a pigment gene in somatic cells restores pigment gene expression, giving coloured spots
- small spots: late excision
- large spot: excision early in development
- no excision (colourless kernel): autonomous element not in the genome
- revertant: element excised and full expression restored
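Why spot size reports excision timing can be shown with a simplified clonal model. It assumes every cell divides synchronously a fixed number of times and that an excision in one cell restores pigment in all of its descendants; the total of 20 divisions is an arbitrary illustrative number, not a measured value for maize kernels.

```python
# Simplified clonal model: synchronous divisions, excision heritable in the clone.

def spot_size(total_divisions: int, excision_division: int) -> int:
    """Cells in a pigmented sector when excision occurs after
    `excision_division` of `total_divisions` synchronous divisions.

    The revertant cell has (total - excision) divisions left, so it founds
    a clone of 2 ** (total - excision) cells.
    """
    if not 0 <= excision_division <= total_divisions:
        raise ValueError("excision must occur within development")
    return 2 ** (total_divisions - excision_division)


TOTAL = 20  # hypothetical number of divisions
for when in (2, 10, 18):
    print(when, spot_size(TOTAL, when))
# 2 262144   (early excision: large spot)
# 10 1024
# 18 4       (late excision: small spot)
```

Each division of delay halves the sector, which is why spot size varies over orders of magnitude rather than gradually.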
LINEs
long interspersed nuclear elements
- RNA transcribed and inserted back into genome at new site
- ORF1: RNA binding protein
- ORF2: reverse transcriptase
SINEs
short interspersed nuclear elements
- high copy number
- no genes
- transcribed by RNAPIII
- borrows the reverse transcriptase synthesised by LINEs
eg. Alu family
Retrovirus Related Sequences
LTR Retrotransposons
Human endogenous retrovirus
Purpose of Repeats
- metabolic burden
- rate of propagation must counteract rate of elimination
- maintenance suggests value
- nearly all of the genome is transcribed
- could cause evolutionary change
- could just be yet to be eliminated
How a genome acquires new genes
- horizontal gene transfer
- exon shuffling
- duplication
Benefits of Duplication
- new copy can acquire a new function due to selective pressure advantage
Recombination Duplication
- unequal crossing over (misalignment)
- unequal sister chromatid exchange
- DNA amplification during replication
- replication slippage (adds extra unit to short repeat)
- retrotransposons
Successful Gene Duplication
- 1 copy retains original sequence
- 2 copies may increase protein synthesis
- 1 copy may be neutral and become a nonfunctional pseudogene
- 1 copy may acquire a new function (neofunctionalisation)
Neofunctionalisation
- gene duplicates gaining mutations
- new gene function
- expressed at different times or in different cell types
eg. chymotrypsin and trypsin are both proteases but recognise slightly different residues: both arose by duplication of a common ancestor
Pseudogenes
- copies of functional genes with altered regions
- may contain frameshift mutations and have regulatory roles
- increase genome size
- if transcribed may form RNA homologous to functional RNA and regulate via interaction
Non-processed Pseudogene
- tandem duplication of genomic region
- 1 copy released from selection
- inactivating mutations/incomplete duplication
- missing regulatory regions
Processed pseudogene
- reverse transcriptase activity
- mRNA to cDNA and genome integration
- lacks regulatory regions or introns
- derived from RNA
eg ribosomal pseudogenes: highly conserved large family
Multigene families
- multigene families can arise from beneficial duplications
eg rRNA genes - tandem gene family on same chromosome
- dispersed gene families on different chromosomes
Globin Superfamily
- example of gene duplication and divergence
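- genes organised in clusters dispersed across different chromosomes (in humans, the α-globin cluster on chromosome 16 and the β-globin cluster on chromosome 11)
- natural selection co-opts existing genes to innovate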
- can create new useful properties
- having separate proteins for oxygen storage and transport with different binding affinities is useful
Whole Genome Duplication
- could be a source of speciation, as polyploids often cannot reproduce successfully with their parent species (reproductive isolation)
Polyploidy
- multiple sets of chromosomes
- widespread in plant groups; the extra DNA is often associated with larger fruits or flowers
- can induce genomic shock
Autopolyploidy
- multiplication of chromosome sets within a single species
- fertilisation by unreduced gametes
- a meiotic error produces diploid (unreduced) gametes; fusion with a normal haploid gamete gives triploid offspring
- more viable than allopolyploids
- can reproduce with each other permitting speciation
Allopolyploidy
- hybridisation between 2 reproductively compatible species
- via unreduced gametes from 2 diploid species (one step model)
- via hybridisation between haploid gametes and somatic doubling (two step model)
Triploid Wheat-Rye Hybrid
- yield of wheat with disease tolerance of rye
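- tetraploid wheat (AABB) crossed with diploid rye (RR) gives a triploid hybrid (ABR) that is sterile, as its chromosome sets have no pairing partners
- chromosomes of the sterile hybrid can be artificially doubled (e.g. with colchicine) to give fertile hexaploid triticale (AABBRR), which produces balanced gametes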
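The ploidy bookkeeping of this cross can be traced by treating each genome as a labelled chromosome set. This sketch uses the conventional genome letters A and B for tetraploid wheat and R for rye; it tracks whole sets only and assumes, as a simplification, that a set without an identical partner makes meiosis fail.

```python
from collections import Counter

# Simplified model: whole chromosome sets only; an unpaired set => sterile.


def gamete(genome: Counter) -> Counter:
    """Normal meiosis passes on one copy of each paired set."""
    if any(n % 2 for n in genome.values()):
        raise ValueError("unpaired chromosome sets: meiosis fails (sterile)")
    return Counter({s: n // 2 for s, n in genome.items()})


def cross(g1: Counter, g2: Counter) -> Counter:
    """Fertilisation combines the sets carried by two gametes."""
    return g1 + g2


def double(genome: Counter) -> Counter:
    """Artificial chromosome doubling (e.g. colchicine treatment)."""
    return Counter({s: 2 * n for s, n in genome.items()})


wheat = Counter({"A": 2, "B": 2})  # tetraploid wheat, AABB
rye = Counter({"R": 2})            # diploid rye, RR

hybrid = cross(gamete(wheat), gamete(rye))  # ABR
print(sum(hybrid.values()))                 # 3 (triploid)
triticale = double(hybrid)                  # AABBRR
print(sum(triticale.values()))              # 6 (hexaploid)
print(sorted(gamete(triticale).elements())) # ['A', 'B', 'R']
```

Calling `gamete(hybrid)` raises the sterility error, which is the point of the doubling step: every set gains an identical partner.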
Hox Gene Family
- WGD in metazoans is thought to have enabled the evolution of more complex body plans
- Hox genes are clustered homeobox genes encoding transcription factors whose homeodomain (the DNA-binding domain) regulates development
- different organisms have different repeats and clusters
Hox Gene Expression
- expression in different embryo regions
- gene order reflects expression order
- spatial and temporal colinearity
- each specifies body segment structures
2R/3R Theory
- According to this model, two rounds of whole-genome duplication occurred early in vertebrate evolution [2, 3] (but see also [4, 5]). An ancestral genome was duplicated to two copies by the first genome duplication (1R), and then to four copies by the second (2R) [6, 7]. Recent data suggest that an additional whole-genome duplication occurred in the fish lineage (3R, or the fish-specific genome duplication).
- Humans therefore carry up to 4 copies of many ancestral genes (e.g. 4 Hox clusters, HOXA to HOXD)
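The copy-number arithmetic behind 1R/2R/3R is a doubling per round. The sketch below assumes a single ancestral copy and that every duplicate is retained, so its numbers are upper bounds; in reality many duplicates are lost after each round.

```python
# Upper-bound arithmetic: assumes no gene loss after duplication.

def copies_after(rounds: int, ancestral_copies: int = 1) -> int:
    """Copies of an ancestral gene after `rounds` of whole-genome duplication."""
    return ancestral_copies * 2 ** rounds


for label, rounds in (("1R", 1), ("2R (e.g. human)", 2), ("3R (fish)", 3)):
    print(label, copies_after(rounds))
# 1R 2
# 2R (e.g. human) 4
# 3R (fish) 8
```

Comparing the bound with observed counts (for example, the number of Hox clusters actually retained in a fish genome) gives a rough measure of how many duplicates were lost.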
Benefits of WGD
- evolutionary diversification
- defence against mutation
- colonisation of new environments
- improved fitness
- duplication of regulatory regions generates biological complexity like body plans
Exon Shuffling Theory
- eukaryotic proteins are mosaics of motifs
- each domain has a specific function
- primordial exons correspond to domains
- duplication/insertion/rearrangements generate novel genes and proteins
Illegitimate Nonhomologous Recombination
- process by which two unrelated double stranded segments of DNA are joined.
Duplication - unequal crossing over
- replication slippage
- increases gene length
Shuffling - domains from different genes joined together
- retrotransposition or illegitimate non-homologous recombination (INHR)
eg. αA-crystallin gene transfected into mouse - heat shock protein
- domain duplication by illegitimate recombination and unequal crossing over
- topoisomerase I nicks DNA and ligates non-homologous ends
LINEs and Gene Shuffling
- jump around using RNA intermediate and retrotransposition into genome
- transposition can carry a piece of flanking DNA with it (3′ transduction)
- transcription reads through weak terminating signal
- domain retrotransposed
Transposons and Gene Shuffling
- Mutator-like transposable elements (MULEs)
- inverted terminal repeats flanking exons/introns
- encode transposase moving themselves and other MULEs
Intron Phases
- insertions and duplications affect reading frame
- an inserted, duplicated or deleted exon must be an exact multiple of 3 nt to prevent this
- its flanking introns must have the same phase
- the split codon is divided 3+0, 1+2 or 2+1 nucleotides across the intron
- Phase 0: introns lie between 2 codons (3,0)
- Phase 1: introns located after first nucleotide (1,2)
- Phase 2: introns located after second nucleotide (2,1)
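Intron phase follows directly from where the intron falls in the coding sequence: it is the number of coding nucleotides upstream of the intron, taken modulo 3. A minimal sketch, assuming the coding exon lengths are known and the first length starts at the start codon (UTRs excluded); the exon lengths in the example are invented.

```python
# Sketch: exon lengths must cover coding sequence only (no UTRs).

def intron_phases(cds_exon_lengths):
    """Phase of each intron from the coding lengths of the exons.

    0: intron between codons; 1: after the 1st nucleotide of a codon;
    2: after the 2nd nucleotide.
    """
    phases, cumulative = [], 0
    for length in cds_exon_lengths[:-1]:  # no intron after the last exon
        cumulative += length
        phases.append(cumulative % 3)
    return phases


print(intron_phases([90, 61, 46, 120]))  # [0, 1, 2]
```

Here the first intron follows 90 nt (30 whole codons, phase 0), the second follows 151 nt (phase 1) and the third follows 197 nt (phase 2).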
Splice Frame Rule
Following a successful shuffling event, a newly acquired exon must be flanked by 2 introns of the same phase; otherwise it will produce a frameshift in the resulting coding sequence.
- preserving the reading frame makes the new gene less of a target for purifying selection (no frameshift-induced loss of function)
Evidence of Shuffling
multicellular structure
- adhesion/receptor molecules benefit from swapping domains to tune interactions
Franca et al.
- Phase 0 most common
- excess of symmetric exons compared to asymmetric
Exon 1-1 Shuffling
- phase 0 introns more ancient and prevalent in non-Metazoan lineages
- 1-1 shuffling associated with emergence of necessary features of multicellularity
- 1-1 shuffling compatible with glycine codons (GGN) interrupted after their first nucleotide
- easily split
- increases efficiency of creating mosaic proteins
- acquisition of domain must overcome structural limitations
- small and flexible domains fold independently
- linker sequences have glycine more frequently
Collagen Gene Shuffling
- α2 type I collagen gene
- highly repetitive Gly-X-Y sequence
- exons are very glycine-rich and can be split with 1-1 exon phases, making them compatible with the large, repetitive collagen protein
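The Gly-X-Y periodicity can be checked with a short scan of a protein sequence in one-letter code. The test sequences below are made up to follow (or break) the pattern, not taken from a real collagen, and the function name is hypothetical.

```python
# Hypothetical helper; sequences are invented, not real collagen.

def is_gly_x_y(protein: str) -> bool:
    """True if the sequence is whole Gly-X-Y triplets, i.e. every third
    residue starting from the first is glycine (G)."""
    return (
        len(protein) > 0
        and len(protein) % 3 == 0
        and all(protein[i] == "G" for i in range(0, len(protein), 3))
    )


print(is_gly_x_y("GPPGAPGERGPPGLP"))  # True
print(is_gly_x_y("GPPAGPGER"))        # False (4th residue is A, not G)
```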
Tissue Plasminogen Activator Shuffling
- involved in dissolving blood clots (fibrinolysis) in vertebrates
- 4 exons
- growth factor domain and kringle domain related to the plasminogen gene
- upstream exon encodes finger module
- ancestors from 3 domains formed the TPA structure
- common ancestor formed by duplication and adding exons to genes
Exon Phases
These include the symmetric exons 0-0, 1-1, and 2-2 and the asymmetric exons 0-1, 0-2, 1-0, 1-2, 2-0, and 2-1, which can be joined into symmetric combinations. The symmetric exons (or exon sets) are the only ones that can be inserted into introns (of the same phase), undergo tandem duplication, or be deleted without disturbing the reading frame.
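The consequences of these phase classes can be checked mechanically. An exon flanked by introns of phases (a, b) first completes the codon split by the upstream intron, then has whole codons, then a partial codon of b nucleotides, so its length is congruent to b - a (mod 3); only symmetric exons are a whole number of codons. A sketch, assuming the phase convention on the Intron Phases card; the helper names are hypothetical.

```python
from itertools import product

# Hypothetical helpers; an exon is written as (5' intron phase, 3' intron phase).


def length_mod3(phase_in: int, phase_out: int) -> int:
    """Exon length modulo 3: (3 - phase_in) % 3 + whole codons + phase_out."""
    return (phase_out - phase_in) % 3


def can_insert(exon: tuple, intron_phase: int) -> bool:
    """Insertion into an intron of phase q keeps the frame only if a == b == q."""
    a, b = exon
    return a == b == intron_phase


def can_duplicate_or_delete(exon: tuple) -> bool:
    """Tandem duplication or deletion keeps the frame only if a == b."""
    a, b = exon
    return a == b


def join(*exons: tuple) -> tuple:
    """Phase class of a run of consecutive exons (adjacent phases must agree)."""
    for left, right in zip(exons, exons[1:]):
        if left[1] != right[0]:
            raise ValueError("adjacent exons must share the intron between them")
    return (exons[0][0], exons[-1][1])


for exon in product(range(3), repeat=2):
    kind = "symmetric" if can_duplicate_or_delete(exon) else "asymmetric"
    print(f"{exon[0]}-{exon[1]}: {kind}, length mod 3 = {length_mod3(*exon)}")

print(join((1, 2), (2, 1)))  # (1, 1): two asymmetric exons form a symmetric set
```

The last line shows the "joined into symmetric combinations" case: a 1-2 exon followed by a 2-1 exon behaves as a single 1-1 unit.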
Protein Protein Interaction
- mosaic proteins can interact with several proteins to become hubs of PPI networks
- shuffling promotes self interaction capacity
- formation of homodimeric proteins with self interacting domains
- new connection networks
Amyloid Precursor Protein
- alternative splicing generates isoforms
- APP undergoes protease processing by α-secretase to give soluble APP
- processing by β-secretase and γ-secretase also gives β-amyloid protein (amyloid plaques in Alzheimer's)
- processing determined by presence of the KPI domain, which inhibits α-secretase
- KPI domain gained by shuffling
- domain is flanked by introns
- similarity to metazoan protein