Eukaryotic Genome Structure Flashcards
What is the C-value paradox
• The genomes of eukaryotes are orders of magnitude larger than those of archaea and bacteria
• Scientists suggested that this size range correlated with organismal ‘complexity’
• This would make sense as more genes would be required for this complexity
• However
• Genome size varies widely within a single class of organisms
• E.g. approx 100-fold difference between the smallest and largest amphibian genomes
• Even though they have the same body plan and metabolism
• Gene numbers don’t scale with genome size
• E.g. yeast has many more genes than would be expected from scaling the human gene count down to the size of the yeast genome
• The confusion was called the C-value paradox
• C-value = haploid DNA amount in the genome
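A rough illustration of the paradox. The genome sizes and gene counts below are rounded textbook values assumed for illustration; only the approx 25000 human gene figure appears in these notes.

```python
# Approximate haploid genome sizes (Mbp) and gene counts.
# Rounded textbook values, assumed for illustration only.
GENOMES = {
    "E. coli": (4.6, 4300),
    "S. cerevisiae": (12.0, 6000),
    "H. sapiens": (3000.0, 25000),
}

def genes_per_mbp(size_mbp, gene_count):
    """Gene density: genes per million base pairs."""
    return gene_count / size_mbp

def fold_difference(a, b):
    """How many times larger a is than b."""
    return a / b

for name, (size, genes) in GENOMES.items():
    print(f"{name}: {genes_per_mbp(size, genes):.0f} genes per Mbp")

yeast_size, yeast_genes = GENOMES["S. cerevisiae"]
human_size, human_genes = GENOMES["H. sapiens"]
print("genome size ratio:", fold_difference(human_size, yeast_size))
print("gene number ratio:", round(fold_difference(human_genes, yeast_genes), 1))
```

With these assumed values the human genome is about 250 times larger than the yeast genome but holds only about 4 times as many genes, so gene number clearly does not scale with genome size.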
What explains the C-value paradox
• Some organisms have large organelle genomes
• Some organisms have duplicated genomes (polyploidy)
• Non-coding DNA
• Less than 5% of human DNA contains the approx 25000 genes
• The amount of non-coding DNA does increase dramatically with organism complexity (and can explain differences between classes)
What is non-coding DNA
• Eukaryotes have more DNA that does not code for protein, or for any other functional product molecule, than prokaryotes do
• Approx 98% of the human genome is non-coding, as opposed to 11% of the E. coli genome
• “Junk DNA”
What do we think is the function of non-coding DNA
• Eukaryotes have evolved sophisticated gene regulation
• The share of genes devoted to regulation correlates with complexity, for example:
• Approx 9% of Homo sapiens genes encode transcription factors
• Approx 5% of Drosophila genes and 3% of S. cerevisiae genes
• Other mechanisms, including splicing and alternative splicing, also help explain increased complexity
Complexity of DNA (C0t analysis)
• C0 is the initial DNA concentration, t is the time taken to renature
• Based on renaturation of ssDNA
• Indicates the proportions of unique and repetitive DNA
• As DNA cools, complementary sequences find each other and base pair
• Since a sequence of ssDNA needs to find its complementary strand to reform a double helix, common and repetitive sequences renature more rapidly than rare sequences
• The rate at which DNA reanneals is a function of the species' genome characteristics (size and complexity)
• The bigger the genome the longer it takes for two complementary sequences to meet
• Repetitive DNA will renature at low C0t values
• Unique DNA renatures at high C0t values
• Eukaryotic genomes have a range of sequences with different repetition levels
• 1) Single-copy DNA (includes some functional genes)
• 2) Middle repetitive DNA (repeat units of 100-5000 bp, up to about 10^6 copies), e.g. transposons
• 3) Highly repetitive DNA (repeat units of up to 10 bp, > 10^6 copies), e.g. tandem repeats
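Renaturation of this kind is commonly modelled as ideal second-order kinetics, where the fraction of DNA still single-stranded is 1 / (1 + k x C0t) and half of the DNA has reannealed at C0t1/2 = 1/k. A minimal sketch assuming that idealised model; the rate constants are hypothetical values chosen only to contrast a repetitive fraction with a unique one.

```python
def fraction_single_stranded(c0t, k):
    """Fraction of DNA still single-stranded at a given C0t,
    assuming ideal second-order renaturation with rate constant k."""
    return 1.0 / (1.0 + k * c0t)

def c0t_half(k):
    """C0t value at which half of the DNA has renatured."""
    return 1.0 / k

# Hypothetical rate constants: a repetitive sequence meets a partner
# far more often, so its effective rate constant is much larger.
K_REPETITIVE = 1e3
K_UNIQUE = 1e-3

for label, k in [("repetitive", K_REPETITIVE), ("unique", K_UNIQUE)]:
    print(label, "C0t1/2 =", c0t_half(k))
```

A larger rate constant gives a smaller C0t1/2, which is why repetitive DNA appears at the low end of a C0t curve and unique DNA at the high end.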
What’s in a genome
• About 2% of eukaryotic DNA is coding (encodes proteins)
• 25-50% of the protein-coding genes in eukaryotes are represented only once in the haploid genome
• But even they have non-coding DNA associated with them
Pseudogenes
• Pseudogenes: sequences that were once functional genes (and can become functional again)
• Evolutionary relic
• Under some definitions, genes for non-coding RNAs (e.g. tRNAs) are considered non-coding functional sequences (a very protein-centric definition)
• Transcription factor binding sites, such as enhancers and other regulatory sequences, are also non-coding but functional
Multigene families
• Groups of identical or very similar sequences
• Can be tandemly arrayed (head-to-tail fashion)
• Examples include the tRNA genes (at approx 50 sites, each containing 10-100 genes) and the histone genes in some species
Dispersed multigene family
• Some genes have not been tandemly repeated but have become dispersed to several locations in the genome through chromosomal rearrangements
• They may have different functions
• The aldolase gene family has 5 members
• They are located on chromosomes 3, 9, 10, 16 and 17
Transposons
• Some repeat sequences are transposable elements, which presumably have increased in copy number through transposition
• TEs are found in all organisms and are transposed via a DNA or RNA intermediate
• Some retrotransposons resemble retroviruses but only move within a cell rather than between cells
• Genome-wide repeats: several thousand copies of each element
• LTR retroelements are important in some genomes (maize)
• They are degraded in others: endogenous retroviruses make up 4.7% of the human genome
• Not all retrotransposons have LTRs
• In mammals the most important are LINEs (long interspersed nuclear elements) and SINEs (short interspersed nuclear elements)
• SINEs: highest copy number in the human genome
• 1.7 million copies (14% of genome)
• LINEs: less frequent but longer
• Approx 1 million copies (>20% of genome)
• DNA transposons are less common than retrotransposons
• The human genome contains 350000 copies of DNA transposons but most are inactive
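The copy numbers and genome fractions above imply an average length per copy, which shows why LINEs are described as less frequent but longer. A rough sketch, assuming a haploid human genome of about 3 x 10^9 bp (half of the 6 x 10^9 bp quoted for a human cell) and treating the ">20%" for LINEs as exactly 20%, so the LINE value is a lower bound.

```python
HAPLOID_GENOME_BP = 3e9  # assumption: half of the diploid 6 x 10^9 bp

def mean_copy_length_bp(genome_fraction, copies, genome_bp=HAPLOID_GENOME_BP):
    """Average length of one repeat copy, in bp."""
    return genome_fraction * genome_bp / copies

sine_bp = mean_copy_length_bp(0.14, 1.7e6)  # roughly 250 bp
line_bp = mean_copy_length_bp(0.20, 1.0e6)  # roughly 600 bp (lower bound)
print(f"SINE: ~{sine_bp:.0f} bp per copy, LINE: at least ~{line_bp:.0f} bp per copy")
```

These are averages over every copy, including truncated and degraded ones, so they should not be read as the length of an intact element.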
Why do we need DNA condensation
• All life on earth uses DNA to store genetic information
• The DNA that stores this information is often much longer than the cell that contains it
• Contour length: the length of the DNA assuming a B-form double helix
• Result: the genome needs to be divided up (e.g. into linear double-helical molecules: chromosomes)
• All eukaryotes have at least 2 chromosomes
• Variability in chromosome number is unrelated to an organism's biological features and genome size
Why do we need DNA condensation in humans
• The 23 human chromosomes contain from 50 to 250 Mbp
• DNA molecules of this size are 1.7 to 8.4 cm long when uncoiled
• A typical human cell contains 46 chromosomes, equal to 6 x 10^9 bp
• The cell nucleus has a diameter of 10-20 µm
• If chromosomes were not condensed it would be impossible to replicate and transcribe them correctly or segregate them to daughter cells
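The lengths quoted above follow from the contour length of B-form DNA. A minimal sketch, assuming the standard rise of about 0.34 nm per base pair (a value not stated in these notes).

```python
RISE_NM_PER_BP = 0.34  # assumption: axial rise per bp in B-form DNA

def contour_length_m(base_pairs):
    """Contour length in metres of B-form DNA with the given number of bp."""
    return base_pairs * RISE_NM_PER_BP * 1e-9

smallest_cm = contour_length_m(50e6) * 100   # about 1.7 cm
largest_cm = contour_length_m(250e6) * 100   # about 8.5 cm
total_m = contour_length_m(6e9)              # about 2 m per diploid cell
nucleus_m = 10e-6                            # 10 micrometre nuclear diameter

print(f"{smallest_cm:.1f} cm, {largest_cm:.1f} cm, {total_m:.2f} m")
print(f"DNA is ~{total_m / nucleus_m:.0f} times longer than the nucleus is wide")
```

The upper value comes out at about 8.5 cm here; the 8.4 cm quoted above presumably reflects a largest chromosome slightly smaller than 250 Mbp.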
What is the nucleosome
• DNA is wrapped around histone octamers to form nucleosomes
• Each nucleosome consists of a little less than 2 turns of DNA wrapped around a set of 8 proteins called a histone octamer
• The core of the nucleosome is an octamer of histones
• Two copies each of H2A, H2B, H3 and H4 (102-135 aa)
• Histones are highly conserved
• H4 from pea and cow only differ by 2 aa
• Histone proteins form a barrel-shaped core octamer
• H3 and H4 form a dimer (H3.H4)
• Two H3.H4 dimers form a tetramer
• The tetramer interacts with two H2A.H2B dimers
• The octamer interacts with 146 bp of DNA
• The nucleosome core particle (NCP) is 11 nm in diameter and 6 nm in height
• Histones contact the minor groove, leaving the major groove available to proteins that regulate gene expression
• Histone H1 (the linker histone) locks the complex in place, along with 20-90 bp of linker DNA
• The chromatosome is the DNA + octamer + H1 complex
• Nucleosome distribution varies between organisms and chromosomal locus
• DNA binding by the octamer is sequence-dependent
• Octamer can migrate (aids polymerase access etc.)
• 30 nm fibre/solenoid:
• The extended polynucleosome is thought to be an infrequent structure
• The chromatin condenses by zig-zag folding (the solenoid)
• Histone H1 stabilises this structure
• The histone H2A-H2B dimer and H4 are also involved
• Sequential NCPs rotated approx 71 degrees
• Extent of compaction depends on coupling DNA around the fibre
• Responds to cell environment (pH, DNA binding proteins, etc.)
• Scaffold association:
• A 30 nm fibre of a typical human chromosome would be 1 mm long, so more compaction is needed
• The 30 nm fibre is organised as looped domains
• Protein scaffold made of histone H1 and other proteins (Sc1 and Sc2)
• Scaffold attachment points (AT-rich regions)
• Radial arrangement of loops
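The compaction levels above can be put into rough numbers. A sketch assuming 0.34 nm per bp for B-form DNA, an average human chromosome of about 130 Mbp (6 x 10^9 bp divided by 46) and an illustrative 50 bp linker from the 20-90 bp range; the 1 mm fibre length is the figure quoted above.

```python
RISE_NM_PER_BP = 0.34         # assumption: B-form rise per base pair
AVG_CHROMOSOME_BP = 6e9 / 46  # assumption: average human chromosome

def packing_ratio(extended_length_m, compacted_length_m):
    """How many times shorter the compacted form is than extended DNA."""
    return extended_length_m / compacted_length_m

def nucleosome_count(base_pairs, core_bp=146, linker_bp=50):
    """Approximate number of nucleosomes on a DNA molecule.
    linker_bp is an illustrative value within the 20-90 bp range."""
    return base_pairs / (core_bp + linker_bp)

extended_m = AVG_CHROMOSOME_BP * RISE_NM_PER_BP * 1e-9  # about 4.4 cm
fibre_m = 1e-3                                          # 1 mm as a 30 nm fibre

print(f"30 nm fibre packing ratio: ~{packing_ratio(extended_m, fibre_m):.0f}")
print(f"nucleosomes per chromosome: ~{nucleosome_count(AVG_CHROMOSOME_BP):.2e}")
```

Under these assumptions the 30 nm fibre shortens the DNA by a factor of about 40, which still leaves a 1 mm fibre in a nucleus only 10-20 µm across, hence the need for looped domains on a scaffold.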