Week 6 Flashcards
What are the key features of prokaryotic DNA?
Very economical
Compact Organisation
Operon structure
Lack of repeat elements
Strong correlation between genome size and gene number
What are the key features of eukaryotic DNA?
Large
Large number of non-coding introns (25% of DNA in humans)
Small number of exons (1.5% in humans)
Large number of repetitive elements
Poor correlation between genome size and protein coding gene numbers
Why is there a strong correlation between genome size and number of genes in prokaryotes?
There are very few empty regions, and those are quite small. Nearly all space is occupied by genes, so there is a linear relationship between genome size and number of genes
What is contained within eukaryotic genomes?
Large proportion of transposable elements
Large proportion of other repetitive elements
Exons in genes represent a small proportion of the genome
What are the 3 main regions of a eukaryotic chromosome?
Telomeres
Long/Short arm
Centromere
What ploidy are most eukaryotes?
They are diploid - two sets of chromosomes
What are polyploid organisms?
They have more than 2 sets of chromosomes (can be specified further, e.g. tetraploid = 4 sets)
How many angiosperms are polyploid?
70%
Where does polyploidy come from?
From a process of whole genome duplication
What are examples of animal polyploidy?
Salmon 4x
Frogs 4x
What are examples of crop polyploidy?
Potato 4x
Strawberry 8x
Rice 12x
Wheat either 6x or 4x
What is autopolyploidy?
Polyploids with multiple chromosome sets derived from within a single species, often as a result of meiotic error where gametes fail to reduce
What is the process of autopolyploidy?
A parent species undergoes meiotic error, so the gametes produced carry the full chromosome number rather than half. If these unreduced gametes self-fertilise, they form a 4n zygote.
What is allopolyploidy?
Polyploids that result from a hybridisation event
These end up with two sub-genomes, one from each of the progenitor species
Potential way for new species to arise
What is the process of allopolyploidy?
One species undergoes meiotic error to form an unreduced gamete. The unreduced gamete then takes part in a hybridisation event, in which the other species usually contributes only a single copy of its chromosomes. This odd-numbered hybrid then reproduces with a progenitor species so that all chromosomes from the original species are 2n. This produces viable offspring that are a hybrid of the two.
What is the 2R hypothesis?
It proposes that the early vertebrate lineage underwent two complete genome duplications
1R- When jawed fish broke off from jawless vertebrates
2R- When bony jawed vertebrates broke off from cartilaginous fish
What is FSGD?
Fish Specific Genome Duplication otherwise called the 3R hypothesis in teleosts (type of bony fish)
What is the benefit of genome duplication events?
Both vertebrates and teleosts have demonstrated huge adaptive radiations which may be linked to genome duplication
What happens when an animal undergoes genome duplication?
There is huge selective pressure to rediploidise. Genome duplication leaves scars in the form of pseudogenes, which mirror genes but don't have a promoter.
What is duplication chromosomal rearrangement?
Increase in copy number of a chromosomal region (segmental) or single gene (local)
What is inversion chromosomal rearrangement?
Chromosomal segment is inverted due to breakage and rejoining
What is translocational chromosomal rearrangement?
A mutation causing one portion of a chromosome to move to a different part of the same chromosome or onto a different chromosome (reciprocal, non-reciprocal, or Robertsonian, where two whole chromosomes join together)
What is transposition chromosomal rearrangement?
Movement of a short DNA segment around the genome
What are the uses of comparative genomics?
Help identify:
Conserved protein coding regions
Conserved transcription control sequences
Mechanisms of chromosomal evolution
What are the similarities between mouse and human genomes?
Almost all genes found in one species are found in the other
Protein coding regions of the mouse and human genomes are 85% identical
Around 217 conserved syntenic blocks have been found between human and mouse genomes
What is the difference between humans and great apes in chromosome count?
Humans have 2n of 46
Great apes have 2n of 48
Why do humans have 2 fewer chromosomes than apes?
Chromosome 2 in humans formed by the fusion of two smaller chromosomes, equivalent to chimp chromosomes 12 and 13
What is a difference on chromosome 3 between humans and Orangutan homologue?
There is chromosomal inversion in the Orangutan homologue of chromosome 3
How long is Human DNA?
Each diploid human cell contains around 2m of DNA
The human body contains roughly 50 trillion cells, so its DNA laid end to end would stretch from the Earth to the Sun and back about 300 times
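The arithmetic behind this card can be checked with a short sketch. The 2 m per cell and 50 trillion cells come from the card; the Earth-Sun distance (about 1.496e11 m) is an outside assumption, and the function names are illustrative only.

```python
# Figures taken from the card above.
DNA_PER_CELL_M = 2.0      # metres of DNA per diploid cell
CELLS_IN_BODY = 50e12     # roughly 50 trillion cells

# Assumption (not from the card): mean Earth-Sun distance in metres.
EARTH_SUN_DISTANCE_M = 1.496e11


def total_dna_length_m(dna_per_cell_m, cells):
    """Total DNA length in metres if every cell's DNA were laid end to end."""
    return dna_per_cell_m * cells


def sun_trips(total_m, distance_m=EARTH_SUN_DISTANCE_M):
    """Return (one-way trips, round trips) between the Earth and the Sun."""
    one_way = total_m / distance_m
    return one_way, one_way / 2


total = total_dna_length_m(DNA_PER_CELL_M, CELLS_IN_BODY)
one_way, round_trips = sun_trips(total)
print(f"Total DNA length: {total:.1e} m")
print(f"One-way trips: {one_way:.0f}, round trips: {round_trips:.0f}")
```

With these inputs the total comes to about 1e14 m, which is several hundred Earth-Sun distances; the exact multiple depends on the cell count and distance assumed.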
How is DNA packaged?
DNA is complexed with positively charged histone proteins to generate chromatin
Histone sequences are highly conserved in eukaryote genomes
What are the two types of chromatin?
Heterochromatin is tightly packed - often where noncoding regions are
Euchromatin is more loosely packed - This allows for transcription to happen
What marks heterochromatin?
They are marked by histone-modifying enzymes
What is the structure of human centromere?
Highly repetitive
AT-rich alpha satellite monomers (171 bp)
Satellites are tandemly repeated into higher-order repeats
Kinetochores assemble during cell division to link centromere to spindle fibres
Pericentric heterochromatin forms around the centromere, creating silent compartments of the genome
What is the role of telomeres?
Essential for maintenance of linear chromosomes
In the absence of telomeres chromosomes would shorten each replication cycle
Prevent DNA repair systems from mistaking end of chromosome for a double stranded break
What is the length of telomeres?
They are comprised of 250-1500 TTAGGG repeats
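As a rough worked example, the repeat counts on this card can be converted into base pairs, since each TTAGGG unit is 6 bp long. The helper names below are hypothetical, and the repeat counter is only a sketch that looks at perfect back-to-back copies at the start of a sequence.

```python
REPEAT_UNIT = "TTAGGG"  # telomere repeat unit from the card


def telomere_length_bp(n_repeats, unit=REPEAT_UNIT):
    """Length in base pairs of a telomere made of n perfect repeats."""
    return n_repeats * len(unit)


def count_leading_repeats(seq, unit=REPEAT_UNIT):
    """Count how many back-to-back copies of `unit` start the sequence."""
    count = 0
    while seq.startswith(unit, count * len(unit)):
        count += 1
    return count


# The card's range of 250-1500 repeats, expressed in base pairs.
print(telomere_length_bp(250), telomere_length_bp(1500))

# A toy sequence: three repeats followed by non-telomeric DNA.
print(count_leading_repeats("TTAGGG" * 3 + "ACGT"))
```

So 250-1500 repeats corresponds to roughly 1.5-9 kb of telomeric DNA.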
Why would human chromosome shorten each time without telomeres?
DNA polymerase cannot copy the very 3’ end of the template strand, so the new lagging strand is left incomplete at its 5’ end (the end-replication problem)
What is included as repetitive elements?
Structural repeats (centromeres, telomeres etc)
Pseudogenes
Simple sequence repeats/microsatellites (2-5 bp in length)
Transposable elements (c. 45% of the human genome, though only a small proportion, less than 0.05%, are active)
Who identified Jumping genes?
Barbara McClintock
What was the work of Barbara McClintock?
Identified two dominant genetic loci named Dissociation and Activator
She noticed that Dissociation caused chromosomes to break and had effects on neighboring genes when in the presence of Activator
She noticed that both loci could change position on chromosomes
She found that Activator controlled the transposition of Dissociation, and that when Dissociation moved the chromosome broke
How did Barbara McClintock identify Jumping genes?
She observed the effects of their movement through changing colour patterns in maize kernels over generations and controlled crosses
What was the name of Jumping genes that McClintock observed?
McClintock observed Type II transposons
What are type 2 transposons?
These use a “cut and paste” mechanism to get around
Produce mutations and target sequence duplications when inserted into genes
These are able to replicate during S phase of the cell cycle when a donor site has been replicated but the target has not
What are type 1 transposons?
They use a “copy and paste” mechanism of replication
These share characteristics with retroviruses such as HIV
Produce mutations when they insert into genes
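The difference between the two mechanisms can be sketched with a toy model in which a genome is just an ordered list of labels. This is an illustration only: the labels and function names are hypothetical, and the RNA intermediate used by type 1 elements is not modelled.

```python
def cut_and_paste(genome, element, target_index):
    """Type 2 style: excise the element from its donor site and reinsert it.

    `target_index` refers to the position in the genome after excision.
    """
    moved = list(genome)
    moved.remove(element)                 # element leaves the donor site
    moved.insert(target_index, element)   # and lands at the target site
    return moved


def copy_and_paste(genome, element, target_index):
    """Type 1 style: the donor copy stays put and a new copy is inserted."""
    copied = list(genome)
    copied.insert(target_index, element)
    return copied


genome = ["geneA", "TE", "geneB", "geneC"]
print(cut_and_paste(genome, "TE", 2))    # copy number stays at 1
print(copy_and_paste(genome, "TE", 3))   # copy number rises to 2
```

In this sketch copy number only increases under copy and paste, which is one way to see why such elements can come to occupy a large share of a genome.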
What caused humans and apes to lose their tail?
They lost their tail because the insertion of an Alu element (a transposable element) into an intron of the TBXT gene led to a hominoid-specific alternative splicing event
How many protein coding genes are there in humans?
20,000 to 25,000
Around 100,000 predicted
These genes generate the proteome which is more complex than lower eukaryotes
What proportion of the human proteome has homologues in other model organisms?
61% for D. melanogaster
43% for C. elegans
46% for Yeast
What do non-protein-coding genes include?
tRNAs and rRNAs for translation
snRNAs for intron removal
microRNAs which have a role in control of gene expression
How much of the human genome is made of introns?
Around 25%
What are the 3 important motifs within the intron?
Donor site
Branch site
Acceptor site (3’ splice site)
What catalyses the removal of introns?
The spliceosome makes incisions at the donor and acceptor (3’) splice sites. The intron is then looped together as a lariat and removed, so it can't interfere with the mature mRNA
What can be seen with RNA splicing with the Human dystrophin gene?
Human dystrophin is spread over 2.5 Mb
Primary transcript contains over 80 introns
Mature RNA is only 14,000 bases long
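A quick calculation shows how much of the dystrophin primary transcript survives splicing, using only the two lengths on this card (2.5 Mb gene, 14,000-base mature RNA). The function name is illustrative.

```python
GENE_LENGTH_BP = 2_500_000    # dystrophin gene span from the card (2.5 Mb)
MATURE_RNA_LENGTH = 14_000    # mature RNA length from the card


def retained_fraction(gene_length_bp, mature_rna_length):
    """Fraction of the primary transcript that remains after splicing."""
    return mature_rna_length / gene_length_bp


fraction = retained_fraction(GENE_LENGTH_BP, MATURE_RNA_LENGTH)
removed = GENE_LENGTH_BP - MATURE_RNA_LENGTH
print(f"Retained: {fraction:.2%} of the transcript")
print(f"Removed as introns: {removed:,} bases")
```

On these figures well under 1% of the transcribed sequence ends up in the mature RNA.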
What is alternative splicing?
Alternative splicing joins exons in various combinations; this is an economical way to create protein diversity
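To see why this is economical, here is a minimal sketch that enumerates isoforms for a hypothetical gene. It assumes the simplest case, where some exons are optional "cassette" exons that are either included or skipped independently; real genes have further splicing modes that are not modelled here, and the exon names are made up.

```python
from itertools import combinations


def cassette_isoforms(exons, optional):
    """List every isoform obtained by skipping any subset of optional exons.

    Exon order is always preserved; only inclusion or skipping varies.
    """
    isoforms = []
    for k in range(len(optional) + 1):
        for skipped in combinations(optional, k):
            isoforms.append([e for e in exons if e not in skipped])
    return isoforms


# Hypothetical gene: E1 and E4 are always included, E2 and E3 are optional.
for isoform in cassette_isoforms(["E1", "E2", "E3", "E4"], ["E2", "E3"]):
    print("-".join(isoform))
```

Under this assumption n optional exons give 2^n possible transcripts from a single gene.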
How many human genes can be alternatively spliced?
65-70% of human genes
What is an example of alternative splicing in humans?
Multiple promoters for each of the three human Neurexin genes
There are five exons for which alternative splicing can occur
It isn't known how many of the variants are functional
Why is genome sequencing useful?
Helps us understand variation between individuals (eg SNPs and INDELs)
Important for understanding genetic diseases
Characterise differences in strains/varieties/populations
Develop diagnostic assays for pathogens
Marker assisted selection in plants and animals
What can be determined by genome sequencing?
Identify protein coding and non-protein coding genes
Map gene regulatory elements
Study genome organisation and function
Understand mechanisms of genomic function
How might DNA size negatively impact genome sequencing?
DNA molecules are large
Bacterial genomes c. 4 million bases
Human genome 3000 million bases (3Gb)
Wheat genome 17 Gb
How can DNA sequence negatively impact genome sequencing?
Bias in GC/AT content (GC base pairs are more stable than AT base pairs)
Repeat element content (some genomes can have >80% repeats)
Paralogs
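GC/AT bias can be quantified as the fraction of G and C bases in a sequence. The sketch below is one simple way to compute it overall and in fixed, non-overlapping windows; the function names and the toy sequence are illustrative, not taken from the cards.

```python
def gc_content(seq):
    """Fraction of bases in `seq` that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq, window):
    """GC content of consecutive non-overlapping windows of a given size."""
    return [
        gc_content(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, window)
    ]


toy = "GGCCAATT"
print(gc_content(toy))       # overall GC fraction
print(windowed_gc(toy, 4))   # GC-rich window followed by an AT-rich window
```

A sequence can look balanced overall while containing strongly GC-rich or AT-rich stretches, which is where the windowed view helps.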
How can DNA sampling impact genome sequencing?
Pure samples are best- not too hard with fresh samples
Ancient DNA- much harder
What does Sanger sequencing require?
Reactions contain: template DNA, a primer, deoxynucleotide triphosphates (dATP, dTTP, dGTP, dCTP) and DNA polymerase, like normal PCR
Also di-deoxynucleotide triphosphates (ddNTPs), which lack a 3’ OH group so can't form a phosphodiester bond; when one is incorporated, the reaction stops
How would you run a sanger sequencing?
Split your reaction between four tubes, each with a low concentration of one of ddATP, ddTTP, ddGTP or ddCTP. Then run them on an electrophoresis gel; reading the bands in order of size identifies the nucleotide at each position and therefore the sequence
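The four-tube logic can be sketched as a toy simulation. It is idealised: it assumes every possible termination product is present, ignores primer length, and works directly on the newly synthesised strand. The function names are hypothetical.

```python
def sanger_fragments(new_strand):
    """Map each ddNTP tube to the fragment lengths it would produce.

    In the ddA tube, synthesis can stop wherever an A is added, and so on.
    """
    tubes = {base: [] for base in "ATGC"}
    for position, base in enumerate(new_strand, start=1):
        tubes[base].append(position)  # chain terminated right after this base
    return tubes


def read_gel(tubes):
    """Read bands from shortest to longest fragment to recover the sequence."""
    bands = [
        (length, base)
        for base, lengths in tubes.items()
        for length in lengths
    ]
    return "".join(base for _, base in sorted(bands))


tubes = sanger_fragments("GATTACA")
print(tubes)            # fragment lengths seen in each lane
print(read_gel(tubes))  # sequence read off the gel
```

Each fragment length appears in exactly one lane, so sorting by size and noting the lane gives the base at every position.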
What is the problem with Sanger sequencing?
It is inefficient: if you are lucky, with both forward and reverse reactions you may get 300 bp
It needs to be run 4 times, so it takes a day's work for 300 bp and requires lots of DNA
How did they advance Sanger sequencing?
Fluorescent chain terminators meant the whole reaction could happen in a single tube
Each of the 4 ddNTPs is labelled with a different fluorescent dye which emits light at a different wavelength
Massively increasing efficiency
How does Capillary electrophoresis work?
Samples are separated by size using a long thin capillary instead of an electrophoresis gel
A sample is injected and forced through a capillary
Then lasers shoot through the capillary fiber, causing the colour tags to fluoresce, which is detected by a camera
What is the disadvantage of Capillary electrophoresis?
It is only good for 300 bp while using both forward and reverse reactions
What is the overview of Sanger sequencing?
Basic dideoxynucleotide chain-termination chemistry was developed in 1977
First automated in 1990
If you run both directions you can get a 300 bp sequence
Samples must be clean and you can't multiplex samples
Error rate of sequencing is <0.1% but will incorporate PCR error which is ~4% over 30 cycles
When did genome sequencing begin?
1990 when Sanger sequencing became automated
When was Drosophila melanogaster genome sequenced?
2000
When was E. coli first sequenced?
1997
When was the first draft of human genome sequenced?
2001
What are the two strategies for organising DNA sequences?
Hierarchical
Shotgun
What is Hierarchical strategy?
Start with the whole genome and break it into large fragments. These are cloned into bacteria as bacterial artificial chromosomes. You can make a linkage map of the genome to understand where the large fragments sit relative to each other. You can then shotgun sequence each large fragment and reassemble the reads to obtain its sequence
What is the shotgun strategy?
You take the DNA you are investigating and break it into a large number of overlapping fragments. You look at where they overlap and, after matching large numbers of them up, you can recreate the overall finished DNA sequence.
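The overlap-matching idea can be illustrated with a small greedy assembler. This is a simplified sketch, not how production assemblers work: it assumes error-free reads from one strand, repeatedly merges the pair with the longest suffix-prefix overlap, and uses a made-up minimum overlap of 3 bases.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair of reads until no overlaps remain."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            break  # nothing overlaps any more; leave separate contigs
        merged = reads[i] + reads[j][size:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads


reads = ["ATGGCGTA", "GCGTACGT", "ACGTTAGC"]
print(greedy_assemble(reads))
```

Every pair of reads is compared on every round, which hints at why the computing cost grows so quickly with the number of reads.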
What is the disadvantage of using the shotgun strategy?
It requires a large amount of computing power, and in the early days it wasn't viable.
What are linkage maps?
Maps that define the order of DNA markers along the chromosome
Where does the DNA from the hierarchical structure come from to form the linkage map?
Amplified genomes are sheared into large chunks (50-200 kb) and cloned into a bacterial host as bacterial artificial chromosomes (BACs)
The genomic chunks are sheared randomly so will have overlapping ends
How can the orientation of BACs be determined?
They can be worked out by looking at overlapping sequence tag sites (STSs) or restriction sites
What are STSs?
STSs are short regions whose exact sequence is unique in the genome
What are restriction sites?
They are short sequences that are recognised and cut by a given restriction enzyme
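As an illustration, the sketch below finds restriction sites in a toy sequence and cuts it into fragments. It uses EcoRI's recognition sequence GAATTC, which is cut after the first G, as the example. The code only looks at one strand, and the function names and test sequence are made up.

```python
def find_sites(seq, site):
    """Return the start index of every occurrence of `site` in `seq`."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def digest(seq, site, cut_offset):
    """Cut `seq` at every site, `cut_offset` bases into the recognition site."""
    fragments, previous = [], 0
    for pos in find_sites(seq, site):
        cut = pos + cut_offset
        fragments.append(seq[previous:cut])
        previous = cut
    fragments.append(seq[previous:])
    return fragments


toy = "AAGAATTCGGGAATTCTT"
print(find_sites(toy, "GAATTC"))            # where the enzyme would bind
print(digest(toy, "GAATTC", cut_offset=1))  # resulting fragments
```

The pattern of fragment lengths acts as a fingerprint, which is how shared restriction sites can reveal overlapping clones.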
What happens when a physical map of the chromosome is established?
The BAC libraries can be prepared for shotgun sequencing to get the final sequence
What was the impact of Next Gen sequencing?
Increasing scale of data output per run
Introduction of new sequencing platforms
When did Illumina start?
It started as blue skies research in the Department of Chemistry at the University of Cambridge
What sparked the idea which led to major productivity increases?
Discussions in 1997 sparked ideas of using clonal arrays and massively parallel sequencing of short reads with solid phase sequencing by reversible terminators
When was Solexa formed?
Solexa was formed in 1998 for R&D
How did they increase fidelity and accuracy of gene calling?
In 2004 they bought into molecular clustering technology and found that amplifying single DNA molecules into clusters enhanced fidelity and accuracy
In 2005 how many bases could they sequence in a single run?
In 2005 they sequenced the complete genome of a bacteriophage, delivering 3 million bases in a single run
When was the first Solexa sequencer launched?
In 2006 the first Solexa sequencer was launched - 1 Gb in a single run
When was Solexa bought out by Illumina?
In 2007 Illumina bought out Solexa. Sequencing technology had outpaced Moore's law, more than doubling in output each year
What is the process for library prep?
Target DNA is randomly sheared into fragments of c. 500 bp with no overhangs (Illumina uses transposases)
An A tail is added onto the 3’ end
A T-tailed adapter is then ligated on, pairing with the A tail
This is repeated on the 5’ end
You can add two different indices on the adapters; this allows you to multiplex, as the reads can later be separated by their indices
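Separating multiplexed reads by index (demultiplexing) can be sketched as a simple lookup. The index sequences, sample names and function name below are all hypothetical, and the sketch assumes exact index matches with no sequencing errors.

```python
from collections import defaultdict


def demultiplex(reads, index_to_sample):
    """Group (index, sequence) reads by sample.

    Reads whose index is not in the lookup go to an 'undetermined' bin.
    """
    bins = defaultdict(list)
    for index, sequence in reads:
        sample = index_to_sample.get(index, "undetermined")
        bins[sample].append(sequence)
    return dict(bins)


# Hypothetical index-to-sample sheet and a handful of toy reads.
sample_sheet = {"ACGT": "sample_1", "TTAG": "sample_2"}
reads = [
    ("ACGT", "GGATCCA"),
    ("TTAG", "CCATGGA"),
    ("ACGT", "TTGACCA"),
    ("GGGG", "AAAAAAA"),  # index not in the sheet
]
print(demultiplex(reads, sample_sheet))
```

Using two indices per fragment works the same way, with the pair of indices acting as the lookup key.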
What happens to DNA created in the library prep?
You put it onto a flow cell, which copies each piece of DNA many times; the more copies you have, the more likely you are to spot an error later on.
What happens to the DNA bound to the flow cell?
Polymerases duplicate this piece of DNA so that the copy is bound to the flow cell. It then forms a bridge structure: the forward strand curls over, latching onto the reverse strand primers. Polymerase works its way down, giving you a forward and a reverse copy of the target DNA. This process happens many times, creating a clustering effect from the large amount of DNA formed.
What happens to the forward and reverse strand on the flow cell?
The reverse strand is washed away. A blocker is added, preventing bridges from forming between these molecules
How does the forward sequence on the flow cell get sequenced?
A sequencing primer anneals onto the loose end of the target DNA.
Rather than PCR there are lots of free-floating nucleotides that have fluorescent tags attached to them.
When they bind to the target DNA they fluoresce which is read by a tiny computer in real time
Each different nucleotide has a different colour.
All the clonal strands are read by the computer at the same time requiring 24 to 48 hours
The index is read back, allowing you to identify which sample each strand came from
What happens when the forward strand is sequenced?
The reverse strand is sequenced
How is the reverse strand sequenced?
The bridge-like structure is formed again and Index 2 is read
Polymerases create the reverse strand and the forward strand is washed away
The same process occurs again, allowing the reverse strand's sequence to be read
What happens when both strands are fully sequenced?
You get a series of DNA fragment sequences. Using analytical methods, similar fragments can be grouped together and aligned correctly to reconstruct a fairly decent portion of the genome
What is the overview for Illumina NovaSeq?
Runs over 13-44 hours
Produces up to 20 billion reads per run
Max read length is 2x 250bps
Illumina chemistry is variable but around 0.1% error rate
You can multiplex samples (run several samples at the same time)
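The headline figures can be combined into a rough upper bound on output per run. This is a back-of-envelope sketch: it treats each of the 20 billion reads as a single 250 bp read, which is an assumption (the maximum read count and maximum read length may not be available in the same run configuration), and it borrows the 3 Gb human genome size and 0.1% error rate from other cards. Function names are illustrative.

```python
def run_yield_bases(n_reads, read_length_bp):
    """Total bases produced if every read reaches the given length."""
    return n_reads * read_length_bp


def mean_coverage(total_bases, genome_size_bp):
    """Average number of times each genome position is sequenced."""
    return total_bases / genome_size_bp


def expected_errors_per_read(read_length_bp, error_rate):
    """Expected number of miscalled bases in one read."""
    return read_length_bp * error_rate


total = run_yield_bases(20e9, 250)           # upper-bound assumption
print(f"Yield: {total:.1e} bases")
print(f"Human coverage: {mean_coverage(total, 3e9):.0f}x")
print(f"Errors per read: {expected_errors_per_read(250, 0.001)}")
```

Output on this scale is why many samples are usually multiplexed onto one run rather than sequencing a single genome to extreme depth.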
What is third generation sequencing?
Long range sequencing
Two main providers are Pacific Biosciences (PacBio) and Oxford Nanopore
PacBio was founded in 2004- first sequencing products released in 2010
Oxford Nanopore founded in 2005- first product MinION released in 2015
How does PacBio work?
Target DNA strand is circularised by placing adapter on each end and a polymerase is added to one end.
The Sequel instrument uses a SMRT Cell, which is covered with lots of tiny wells, each holding a single DNA molecule and its polymerase
The polymerase builds the opposing strand, and a fluorescent signal is emitted every time a nucleotide is added
What is the advantage of using PacBio?
The circularisation of the DNA means you can sequence much longer fragments, as the DNA is more stable. This means long range sequencing is possible
How does MinION work?
They created a pore-like structure. Around the pore there is a small electric charge, and every time a molecule enters it slightly disrupts the electrochemical charge. By reading the changes to that charge they are able to deduce which nucleotide has passed through the pore at that time
How was MinION revolutionary?
It is the first sequencer that can be taken into the field. It can be plugged into your laptop with the USB connector allowing you to sequence anywhere
What is the overview of PacBio?
Typically 5-60Kb length reads (can be over 100Kb)
13-15% error in normal Sequel cells
However with HiFi more like 0.1%
What is the overview of Nanopore?
Typically 10-30Kb in length
Record of 2.3Mb
1% error rate but can be smaller
What is the advantage of long reads?
Easier to understand repetitive elements (transposable elements, tandem elements)
Centromeres are easier to understand (they are highly repetitive)
Paralogs (from genome duplication)
All of these make genome assembly very hard when there is no reference genome, but long reads push through repeat regions and give evidence for the placement of duplicate genes
What are the applications of long reads?
Specific loci in the genome
Looking at environmental DNA
Metagenomics
Resequencing (population wide)
Large genomic chunks
Whole genome sequencing
What is needed when deciding a sequencing strategy?
Application
Sample size
Depth/ breadth of coverage needed
Is there already a reference? (Are you working with a model species)
What strategy can be used when looking at small single loci in a few samples?
Sanger is more than likely fine
What strategy is best when looking at metagenomics, eDNA, or a single or multi-loci sample?
Massively parallel short-range sequencing (Illumina)
What strategy can be used if you want to resequence a genome across a population if there is a reference?
Illumina can be fine
What strategy is used with whole genome sequencing without a reference?
It gets messy so long range techniques are best
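The last few cards amount to a small decision rule, which can be written out as a sketch. The application labels, function name and the threshold for "a few samples" (10 here) are assumptions made for illustration; the cards themselves give no numeric cut-off.

```python
FEW_SAMPLES = 10  # assumption: the cards only say "a few samples"


def suggest_strategy(application, n_samples=1, has_reference=False):
    """Suggest a sequencing approach following the rules on the cards."""
    if application == "single_locus" and n_samples <= FEW_SAMPLES:
        return "Sanger"
    if application in {"single_locus", "multi_locus", "metagenomics", "edna"}:
        return "Illumina short reads"
    if application in {"resequencing", "whole_genome"}:
        if has_reference:
            return "Illumina short reads"
        return "long reads (PacBio / Nanopore)"
    raise ValueError(f"unknown application: {application!r}")


print(suggest_strategy("single_locus", n_samples=3))
print(suggest_strategy("metagenomics", n_samples=96))
print(suggest_strategy("resequencing", n_samples=200, has_reference=True))
print(suggest_strategy("whole_genome", has_reference=False))
```

In practice depth and breadth of coverage would also feed into the choice; they are left out here to keep the rule readable.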
When was the first telomere to telomere human genome published?
2021
When scientists sequenced a bird of paradise, what did they use?
5 sequencing techniques with 4 DNA assemblies