Sorefan Flashcards
What is whole genome sequencing?
- complete genome seq of organism at single time
- inc seq of chromosomal DNA and mito/chloro etc DNA
What are the challenges for genome sequencing?
- NA extraction from cells –> needs high quality and conc
- fragmentation
- sub-fractionation size selection –> to isolate fragments of correct size
- separating indiv molecules
- amplification of signal
- reading signal
- data analysis
What were the 3 phases of human genome project?
- genetic and physical maps of human and mouse, seq yeast and worm
- -> technology dev
- draft seq –> inc many gaps and errors
- finished seq –> fill in gaps and correcting errors
How are genetic maps made?
- analyse genetic distance between genes by measuring recombination freq
- markers rely on variation of seq between parents and individuals
- distance measured in centimorgans
- mostly PCR based, eg. polymorphisms in genes and DNA markers
- linkage map by looking at relative distances of 2 or more polymorphic genes and measuring RFs
- DNA markers superseded phenotypic markers
- DNA based mol markers could be RFLPs
- -> methods to analyse are slow so moved onto using SSLPs as easy to analyse w/ PCR
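Worked sketch of the RF to map distance step above. Assumes the standard convention that 1% recombination is about 1 centimorgan, which only holds for closely linked markers. Function name and the offspring counts are made up for illustration.

```python
def map_distance_cm(recombinant: int, total: int) -> float:
    """Approximate genetic distance in centimorgans from offspring counts.

    Assumes 1% recombination frequency ~ 1 cM (reasonable for small RF only).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= recombinant <= total:
        raise ValueError("recombinant must be between 0 and total")
    return 100 * recombinant / total


# Hypothetical cross: 12 recombinant offspring out of 200 scored
print(map_distance_cm(12, 200))  # 6.0
```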
What are SSLPs?
- simple seq length polymorphisms
- repeat regions in genome that vary in length between pops
- usually mini and microsatellite seqs
What are minisatellites?
- repeat units up to 25bp
- not spread evenly around genome, mostly at telomeric regions
- several kb long
- difficult to PCR
What are microsatellites?
- usually di or trinucleotide repeats
- few 100 bases long
- easy to PCR
- 650,000 in genome
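Rough sketch of how di/trinucleotide repeats could be spotted in a sequence with a regular expression. The threshold of 5 tandem copies and the example sequence are assumptions for illustration, not a standard definition.

```python
import re

# Assumed pattern: a 2 or 3 base unit followed by at least 4 more copies of itself
MICROSAT = re.compile(r"([ACGT]{2,3})\1{4,}")


def find_microsatellites(seq):
    """Return (start, repeat_unit, copies) for each tandem repeat found.

    Note: a run of one base (e.g. AAAAAAAAAA) also matches as a unit "AA".
    """
    hits = []
    for m in MICROSAT.finditer(seq):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


print(find_microsatellites("GGTTCACACACACACATTGC"))  # [(4, 'CA', 6)]
```

Length variation between individuals then shows up as a different copy number at the same locus, which is what PCR across the repeat measures.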
Why are genetic maps in humans limited?
- large pops of siblings don’t exist, so limited no. recombination events to study
- recombination events not at random genome positions –> recombination hotspots
How are physical maps created?
- restriction mapping locates relative positions on DNA molecule of recognition seqs for REs
- FISH = map marker locations by hybridising probe containing marker to intact chromosomes
- STS = map positions of short seqs by PCR
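Minimal sketch of the restriction mapping idea: find recognition seqs in a known seq and predict fragment sizes. Uses EcoRI (GAATTC, cuts after the G) as the example enzyme; the toy sequence and function names are made up, and only one strand of a linear molecule is considered.

```python
def site_positions(seq, site="GAATTC"):
    """0-based start positions of every occurrence of the recognition seq."""
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i)
        i = seq.find(site, i + 1)
    return positions


def fragment_lengths(seq, site="GAATTC", cut_offset=1):
    """Fragment sizes after complete digestion of a linear molecule.

    cut_offset=1 models EcoRI cutting between G and AATTC.
    """
    cuts = [p + cut_offset for p in site_positions(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


toy = "AAGAATTCTTTTGAATTCGG"
print(site_positions(toy))    # [2, 12]
print(fragment_lengths(toy))  # [3, 10, 7]
```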
What are the advantages of creating BAC libraries from indiv chromosomes?
- BAC clone library can be used to seq genome
- BACs w/ inserts from each chromosome could be shared across consortium
How is genome sequencing carried out clone by clone?
- extract DNA
- fragment DNA
- -> ideally completely random so no parts missed out
- -> by physical methods = sonication, hydrodynamic shearing
- -> by enzymatic methods = restriction enzymes, transposase
- -> by chemical methods (mostly used to fragment RNA) = heat and divalent cation (Zn and Mg)
- size selection –> gel electrophoresis
- clone 100-200kbp fragments into BAC plasmids to create library
- transformation of bacteria for BACs
- pick indiv colonies and extract vector (each tube has many copies of indiv DNA insert)
How are clones positioned on genetic and physical maps?
- test clones for PCR markers w/ known locations
- BAC end sequencing using Sanger
- -> known seq so can design primer
- -> denature vector and Sanger seq
- -> design primer to reverse strand to seq other direction
- -> end seqs from same insert, so are paired end reads
Why are paired end reads useful?
- can physically link 1 end of seq w/ another, so can be used to resolve seq gaps
How is it decided which BAC has insert next to insert of interest?
- gen contiguous set of clones
- if any of BACs inc end seq, then insert they contain must be next to it
- test BAC library for end seq from desired vector by PCR
- repeated over and over again until all BACs placed in order on each chromosome
- created contig
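Toy sketch of the walking logic above: take the end seq of the current clone, look for another clone whose insert contains it, and repeat. A string search stands in for the PCR screen of the library; clone names, inserts and the 4 base end length are all invented.

```python
def walk_contig(clones, start, end_len=4):
    """Order clones left to right, starting from `start`.

    clones: dict of clone name -> insert seq (toy data).
    A clone is taken as the next neighbour if its insert contains the
    current clone's right-hand end seq and extends beyond it.
    """
    order = [start]
    while True:
        end_seq = clones[order[-1]][-end_len:]
        next_clone = None
        for name, insert in clones.items():
            pos = insert.find(end_seq)
            extends = pos != -1 and pos + end_len < len(insert)
            if name not in order and extends:
                next_clone = name
                break
        if next_clone is None:
            return order  # no further overlap found: end of this contig
        order.append(next_clone)


# Three overlapping toy inserts cut from AACCGGTTACGTAGCTTGCA
clones = {"C": "AGCTTGCA", "A": "AACCGGTTAC", "B": "TTACGTAGCT"}
print(walk_contig(clones, "A"))  # ['A', 'B', 'C']
```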
Why was shotgun seq of BAC clones needed?
- as BAC end seq leaves most of middle of genome insert to seq
How was shotgun seq BAC clones carried out?
- each BAC clone broken up into 5-10kb fragments
- cloned into diff vector that accepts smaller inserts
- if seq lots of paired end seqs can assemble large fragment (=consensus seq)
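Minimal sketch of how overlapping shotgun reads can be merged into a consensus seq. This is a naive greedy overlap merge on error-free toy reads, not the assembler actually used by either project; real assemblers also handle seq errors, both strands and paired end constraints.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # nothing overlaps: remaining reads stay as separate contigs
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads


print(greedy_assemble(["ATGGCGTA", "CGTACGTT", "CGTTAGC"]))
# ['ATGGCGTACGTTAGC']
```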
How did Celera seq human genome?
- fragmented genome into 2-50kbp fragments
- cloned 2, 10 and 50kbp fragments into plasmids to create library
- assemble reads to create consensus seq and seq contigs
- draft genome had 98% bases
Why did the IHGP use clone by clone instead of whole genome shotgun seq?
- to prove feasible for complex repeat rich genome
- assembly easier and could be performed confidently
- could target gaps for finishing
- better suited to diverse international consortium
What needed to be done to finish the human genome?
- fill in sequencing gaps and physical gaps
Why were gaps present in human genome, and how could these problems be solved?
- cloning bias
- no restriction sites –> use diff RE, use physical or chem fragmentation method
- insert unstable –> use diff vector
How were seq gaps closed?
- paired end seqs align to either side of gap
- if gap < 1kbp = PCR across gap
- if gap > 1kbp = sequential seq along insert
How were physical gaps closed if know order of scaffolds?
- if gap region absent from all gene libraries
- PCR used to amplify genomic DNA spanning gaps and amplified DNA seq directly w/ or w/o cloning into vector
- PCR products over 3kbp hard to amplify
How were physical gaps closed if don’t know order of scaffolds?
How do we know which pairs of primers are adj and will give product?
- try every poss and look for PCR reaction products using gDNA as template
- in singleplex PCR reaction each combo of primers tested w/ genomic DNA as template
- process sped up w/ multiplex PCR, as multiple pairs of primers tested in single PCR tube, so fewer reactions need to be performed
- use algorithm to decide min no. primer combos
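Sketch of why trying every poss gets expensive and why pooling helps. Assumes one outward-facing primer at each end of every scaffold, so any two primers from different scaffolds are a candidate pair. The batching is a naive assumption (fixed number of pairs per tube, no primer interference), not the optimising algorithm the notes refer to.

```python
from itertools import combinations
from math import ceil


def candidate_primer_pairs(scaffolds):
    """All primer pairs that could span a gap between two different scaffolds."""
    primers = [(s, end) for s in scaffolds for end in ("L", "R")]
    return [(p, q) for p, q in combinations(primers, 2) if p[0] != q[0]]


def reactions_needed(n_pairs, pairs_per_tube=1):
    """Singleplex when pairs_per_tube=1; naive multiplex batching otherwise."""
    return ceil(n_pairs / pairs_per_tube)


pairs = candidate_primer_pairs(["s1", "s2", "s3", "s4"])
print(len(pairs))                        # 24 singleplex reactions
print(reactions_needed(len(pairs), 6))   # 4 tubes if 6 pairs are pooled per tube
```

A positive pooled tube would still need follow-up reactions to find which pair gave the product.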
Where are repetitive seqs found in genome?
- approx 45% of genome
- mini and microsatellites
- centromeres
- telomeres
- transposons
- duplicated genes
What are the problems w/ repetitive seqs?
- hard to assemble and usually resolved last
- many poor quality genomes never get these regions resolved
- better if had tech to seq v long fragments that span repetitive seq
How can repetitive seqs cause rearrangements?
- if break up seq and reassemble can get shorter as don’t know original order
How can repetitive seqs confuse computer, and how can this be solved?
- tandem repeats can cause truncations –> computer can’t be sure where to map reads to
- worse if repetitive seqs on diff chromosomes
- seq across whole repetitive region to identify flanking seqs that are unique (IHGP couldn’t do this as not poss at time)
- anchor flanking seq regions to known positions of genome –> need genetic and physical maps
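Small sketch of the mapping ambiguity: a read that lies entirely inside a tandem repeat matches several positions, while a read that includes unique flanking seq matches only one. Reference and reads are toy data, and exact matching stands in for a real aligner.

```python
def mapping_positions(read, reference):
    """All 0-based positions where the read matches the reference exactly."""
    last_start = len(reference) - len(read)
    return [i for i in range(last_start + 1) if reference.startswith(read, i)]


reference = "TTGACACACACACAGGCT"  # unique flank + (CA) repeat + unique flank

print(mapping_positions("CACACA", reference))  # [4, 6, 8] -> ambiguous
print(mapping_positions("GACACA", reference))  # [2] -> anchored by the flank
```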
What factors influence the species chosen for genome projects?
- genetic model
- commercial/medical relevance
- genome size –> small easy, so bacteria most common
Why was NGS needed?
- Sanger and Celera slow, expensive, cloning bias, low coverage and 1 seq at time
How was NGS initially dev?
- 1st was 454 sequencing, dev by 454 Life Sciences (later acquired by Roche)
- Applied biosystems dev competing tech called SOLiD
- both superseded by Solexa (now Illumina)
How does NGS compare to Sanger seq?
- NGS involves fragmenting genome and seq all fragments in parallel
- no plasmid cloning req and cheap
- NGS still has similar challenges –> 1 key challenge is how to separate indiv seqs and amplify insert so machine can measure signal
How is Illumina library prep carried out?
- need to create library of gDNA inserts flanked by adaptors of known seq
- fragment gDNA using physical/enzymic/chemical methods
- size selected on gel
- end repair DNA so has blunt ends
- -> DNA pol I fills in 5’ overhang
- -> exonuclease removes 3’ overhang
- add A tail so DNA ends not compatible and DNA can’t concatemerise
- adaptors ligated to ends of fragments using DNA ligase (adaptors are ds oligonucleotides w/ T overhang)
- adaptor seqs elongated by PCR using extended oligonucleotide primers that inc unique 6 base index seq
- allows amplification and quantification of library
What are the functions of adaptor seqs?
- provide priming sites for PCR amplification
- allow index seqs to be added
- priming sites for bridge amplification
- priming sites for seq
What are indexes in Illumina indexed libraries, and why are these included?
- unique 6 base codes that identify each sample
- can distinguish diff samples, so multiple samples can run at same time
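Sketch of how index seqs let pooled reads be sorted back into samples (demultiplexing). The 6 base indexes, sample names and the (index, insert) read format are invented for illustration; real software also tolerates seq errors in the index.

```python
from collections import defaultdict


def demultiplex(reads, sample_by_index):
    """Group insert seqs by sample using each read's index seq.

    reads: iterable of (index_seq, insert_seq) tuples (toy format).
    Reads with an unknown index go to 'undetermined'.
    """
    bins = defaultdict(list)
    for index_seq, insert in reads:
        sample = sample_by_index.get(index_seq, "undetermined")
        bins[sample].append(insert)
    return dict(bins)


# Hypothetical indexes and reads
sample_by_index = {"ATCACG": "patient_1", "CGATGT": "patient_2"}
reads = [("ATCACG", "GGCTA"), ("CGATGT", "TTAGC"), ("ATCACG", "CCGAT"), ("NNNNNN", "ACGTA")]
print(demultiplex(reads, sample_by_index))
```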
Why is Illumina bridge amplification necessary?
- Illumina sequencer not sensitive enough to measure signal from 1 DNA molecule
- so bridge amplification used to prod localised clusters of ≈1000 identical molecules on glass slide
How is Illumina bridge amplification carried out?
- after library prep, library diluted and denatured, ss library washed onto flow cell w/ lawn of oligonucleotides complementary to adaptor seqs
- indiv molecules hybridise to oligos on flow cell
- fragment strand then bends over, hybridising to complementary oligo on flow cell forming bridge
- pol used to create complementary copy of fragment strand
- original strand then washed away
- repeated to create cluster of identical molecules at discrete location
- next stage = seq fragments using fluorescently labelled nucleotides
Why is there an optimum length of fragments for bridge amplification?
- want 2 initial molecules to be far apart so don’t coalesce, so can be spread effectively
- done by washing over at low conc
- long arms mean could reach quicker and coalesce
How is Illumina sequencing by synthesis carried out?
- sequencing primer hybridised
- pol and 4 nts added
- fluorophores at each cluster read by lasers
- cleave fluorophore and unblock nt
- wash
- repeat (until achieve desired read length)
- index seq primer hybridised
Why are Illumina sequencers so expensive?
- optics, lasers and cameras req
What are the adv and disadv of Illumina seq by synthesis?
- enormous output
- accurate but not as accurate as Sanger
- all nts can be added simultaneously
- relatively slow
- short read lengths
- sample needs to be amplified –> PCR bias as doesn’t amplify GC rich seq efficiently
- ligation of adaptors biased –> ligases prefer certain seqs
What would the features of a better sequencing machine be?
- single molecule seq w/o amplification
- continuous reads (no stop starting)
- v long reads
- solid state electronics
- cheap
- small
What are some examples of 3rd gen seq?
- Helicos (now obsolete)
- Pacific Biosciences (PacBio)
- Oxford Nanopore
- NABsys
What are the advantages of PacBio?
- v long reads
- 1 mil reads
- v high accuracy
- shortest run time
- least GC bias (can easily seq through v high/low GC content)
- no amplification bias
- discover broad spectrum of DNA base mods
How is PacBio library prep carried out?
- prod genomic library of fragments
- fragment DNA
- repair DNA damage and ends A tailed
- ligate SMRTBell adaptors
- anneal seq primer to SMRTBell templates
- -> complementary, so adaptor partially ds and partially ss
- -> T overhang, complementary to A tail
- streptavidin tagged pol and primer bound to SMRTBell adaptor cloned insert
- sequence
Is PacBio library prep diff to other library preps?
- similar
- but genomic fragments ≈10kb and SMRTBell adaptors used
How is SMRT (single molecule real time) sequencing carried out?
- add diff fluorescent label to each type of nucleobase but attach it to terminal phosphate released during polymerisation
- measure fluorescence each time new base added, decays away when fluorescent tag released
- use zero mode waveguide chambers to improve detection from tiny signals
- read lengths of ≈500-3200 bases prod
How is Nanopore library gen?
- rapid
- DNA isolated using beads that bind DNA
- transposase complex used to simultaneously cut and ligate 1st adaptor
- then 2nd 1D adaptors and motor proteins added
- amplification not req as device can read single molecules
- 2 diff types adaptor can be added
- 1D adaptors allow 1 strand to be seq
- 2D adaptors allow both strands to be seq
What does nanopore tech involve?
- protein nanopore
- -> heptameric protein α-hemolysin
- -> separated from bacteria allowing low cost and robust nanopores
- -> pore embedded into synthetic membrane w/ high electrical resistance
- -> 512 pores per MinION cell
- synthetic polymer membrane
What occurs during 1D sequencing?
- DNA attached to pore and motor protein controls DNA translocation speed through pore
- 1 strand seq and other strand discarded
What occurs during 2D sequencing w/ hairpin adaptor?
- hairpin adaptor allows seq of both strands
- 1st strand seq then adaptor unwound and seq
- opp strand seq
- opp strand seq complementary to 1st strand, used to correct errors in seq to create ‘2 direction read’
- not same as paired end seq
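Sketch of the idea behind using the opp strand to check the 1st: reverse complement the opp strand read and compare it base by base with the 1st strand read. Toy reads of equal length with substitutions only; real 2D consensus calling works on the raw signal and handles insertions and deletions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA seq written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def strand_disagreements(first_read, opp_read):
    """Positions where the 1st strand read and the opp strand read disagree."""
    opp_as_first = reverse_complement(opp_read)
    if len(opp_as_first) != len(first_read):
        raise ValueError("toy comparison needs reads of equal length")
    return [i for i, (a, b) in enumerate(zip(first_read, opp_as_first)) if a != b]


# True 1st strand is ATGCGTTA; the 1st strand read has an error at position 5
print(strand_disagreements("ATGCGATA", "TAACGCAT"))  # [5]
```

A disagreement only flags a suspect base; deciding which strand is right needs extra information such as base quality.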
What are the adv of Nanopore seq?
- no amplification –> decreases artifacts from PCR
- rapid
- long reads –> simplifies assembly
- solid state electronics –> cheaper and more reliable machines
- portable
- versatile –> can be changed to measure RNA, proteins or other compounds
What are the applications of NGS?
Research tools:
- de novo genome seq
- re seq genome and comparing to reference genome
- seq transcript
- methylation of DNA
- seq small RNAs
- protein binding sites
Clinical apps:
- diagnosis
- biomarkers
- prenatal testing
How can NGS be used as a research tool?
- seq genome of species
- cataloguing variation between individuals in species
- characterising differences between cells w/in individuals
- describing underlying cellular mechanisms
What is NGS now used in clinic for?
- personalised medicine