Sorefan Flashcards

1
Q

What is whole genome sequencing?

A
  • complete genome seq of organism at single time
  • inc seq of chromosomal DNA and mito/chloro etc DNA

2
Q

What are the challenges for genome sequencing?

A
  • NA extraction from cells –> needs high quality and conc
  • fragmentation
  • sub-fractionation size selection –> to isolate fragments of correct size
  • separating indiv molecules
  • amplification of signal
  • reading signal
  • data analysis
3
Q

What were the 3 phases of human genome project?

A
  • genetic and physical maps of human and mouse, seq yeast and worm
  • -> technology dev
  • draft seq –> inc many gaps and errors
  • finished seq –> fill in gaps and correcting errors
4
Q

How are genetic maps made?

A
  • analyse genetic distance between genes by measuring recombination freq
  • markers rely on variation of seq between parents and individuals
  • distance measured in centimorgans
  • mostly PCR based, eg. polymorphisms in genes and DNA markers
  • linkage map by looking at relative distances of 2 or more polymorphic genes and measuring RFs
  • DNA markers superseded phenotypic markers
  • DNA based mol markers could be RFLPs
  • -> methods to analyse are slow so moved onto using SSLPs as easy to analyse w/ PCR
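
Rough sketch of the RF to cM conversion used for linkage maps. The offspring counts are made up for illustration, and the 1% RF ≈ 1 cM rule is only a reasonable approximation for closely linked markers.

```python
def recombination_frequency(recombinant: int, total: int) -> float:
    """Fraction of offspring that are recombinant for a pair of markers."""
    if total <= 0:
        raise ValueError("total offspring must be positive")
    if not 0 <= recombinant <= total:
        raise ValueError("recombinant count must be between 0 and total")
    return recombinant / total


def map_distance_cm(rf: float) -> float:
    """Approximate genetic distance in centimorgans (assumes 1% RF = 1 cM)."""
    return rf * 100


# Hypothetical cross: 18 recombinant offspring out of 200 scored.
rf = recombination_frequency(recombinant=18, total=200)
print(f"RF = {rf:.2f}, distance = {map_distance_cm(rf):.1f} cM")
```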
5
Q

What are SSLPs?

A
  • simple seq length polymorphisms
  • repeat regions in genome that vary in length between pops
  • usually mini and microsatellite seqs
6
Q

What are minisatellites?

A
  • repeat units up to 25bp
  • not spread evenly around genome, mostly at telomeric regions
  • several kb long
  • difficult to PCR
7
Q

What are microsatellites?

A
  • usually di or trinucleotide repeats
  • few 100 bases long
  • easy to PCR
  • 650,000 in genome
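
Minimal sketch of spotting di/trinucleotide repeats in a seq with a regex. The function name, the min_repeats cut-off and the test seq are assumptions for illustration, not a standard tool.

```python
import re


def find_microsatellites(seq: str, min_repeats: int = 5) -> list[tuple[int, str, int]]:
    """Return (start, repeat unit, copy number) for di/trinucleotide tandem repeats."""
    # A 2-3 base unit followed by at least (min_repeats - 1) further copies of itself.
    pattern = re.compile(r"([ACGT]{2,3})\1{%d,}" % (min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        if len(set(unit)) > 1:  # skip homopolymer runs such as AAAAAA
            hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


# Made-up seq with a (CA)6 repeat flanked by unique bases.
print(find_microsatellites("TTG" + "CA" * 6 + "GGT"))
```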
8
Q

Why are genetic maps in humans limited?

A
  • large pops of siblings don’t exist, so limited no. recombination events to study
  • recombination events not at random genome positions –> recombination hotspots
9
Q

How are physical maps created?

A
  • restriction mapping locates relative positions on DNA molecule of recognition seqs for REs
  • FISH = map marker locations by hybridising probe containing marker to intact chromosomes
  • STS = map positions of short seqs by PCR
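
Sketch of the idea behind restriction mapping: find where a recognition seq occurs and what fragment sizes a digest would give. Assumes a linear molecule and a single strand search. The test seq is invented, and EcoRI (GAATTC, cutting after the G) is used as the example enzyme.

```python
def restriction_sites(seq: str, site: str) -> list[int]:
    """Start positions (0-based) of every occurrence of the recognition seq."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def fragment_lengths(seq: str, site: str, cut_offset: int) -> list[int]:
    """Fragment sizes after complete digestion of a linear molecule."""
    cuts = [p + cut_offset for p in restriction_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


seq = "AAAGAATTCCCCCGAATTCTT"  # made-up seq with two EcoRI sites
print(restriction_sites(seq, "GAATTC"))
print(fragment_lengths(seq, "GAATTC", cut_offset=1))
```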
10
Q

What are the advantages of creating BAC libraries from indiv chromosomes?

A
  • BAC clone library can be used to seq genome
  • BACs w/ inserts from each chromosome could be shared across consortium

11
Q

How is genome sequencing carried out clone by clone?

A
  • extract DNA
  • fragment DNA
  • -> ideally completely random so no parts missed out
  • -> by physical methods = sonication, hydrodynamic shearing
  • -> by enzymic methods = restriction enzymes and transposase
  • -> by chemical methods (mostly used to fragment RNA) = heat and divalent cation (Zn and Mg)
  • size selection –> gel electrophoresis
  • clone 100-200kbp fragments into BAC plasmids to create library
  • transformation of bacteria for BACs
  • pick indiv colonies and extract vector (each tube has many copies of indiv DNA insert)
12
Q

How are clones positioned on genetic and physical maps?

A
  • test clones for PCR markers w/ known locations
  • BAC end sequencing using Sanger
  • -> known seq so can design primer
  • -> denature vector and Sanger seq
  • -> design primer to reverse strand to seq other direction
  • -> end seqs from same insert, so are paired end read
13
Q

Why are paired end reads useful?

A
  • can physically link 1 end of seq w/ another, so can be used to resolve seq gaps
14
Q

How is it decided which BAC has insert next to insert of interest?

A
  • gen contiguous set of clones
  • if any of BACs inc end seq, then insert they contain must be next to it
  • test BAC library for end seq from desired vector by PCR
  • repeated over and over again until all BACs placed in order on each chromosome
  • created contig
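
Toy sketch of the walking idea: take the end seq of the current BAC, find another BAC in the library that contains it, repeat. In the lab the test is a PCR screen; here it is a substring check. The genome string, BAC names and end_len are all invented, and the walk only goes in one direction.

```python
def walk_contig(library: dict[str, str], start: str, end_len: int = 6) -> list[str]:
    """Order BACs by repeatedly finding a clone that contains the current end seq."""
    order = [start]
    current = start
    while True:
        end_seq = library[current][-end_len:]
        # Stand-in for the PCR screen: which unplaced BACs contain this end seq?
        hits = [name for name, insert in library.items()
                if name not in order and end_seq in insert]
        if not hits:
            return order
        current = hits[0]
        order.append(current)


genome = "ACGTTGCAAGCTTAGGCATCCGATAGTCAGGTTCAACGGA"  # made-up seq
library = {
    "bacC": genome[20:40],
    "bacA": genome[0:16],
    "bacB": genome[10:26],
}
print(walk_contig(library, start="bacA"))
```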
15
Q

Why was shotgun seq of BAC clones needed?

A
  • as BAC end seq leaves most of middle of genome insert to seq
16
Q

How was shotgun seq of BAC clones carried out?

A
  • each BAC clone broken up into 5-10kb fragments
  • cloned into diff vector that accepts smaller inserts
  • if seq lots of paired end seqs can assemble large fragment (=consensus seq)
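
Simplified sketch of how overlapping reads can be merged into a consensus seq. This is a greedy overlap merge on error-free toy reads, not the assembler actually used on the project. Reads and min_len are assumptions for illustration.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no overlaps left: remaining reads are separate contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


print(greedy_assemble(["GTACCTGA", "ATGCGTAC", "CTGAAGT"]))
```

Greedy merging goes wrong when repeats are longer than the reads, which is one reason repeat-rich regions are hard to assemble.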
17
Q

How did Celera seq human genome?

A
  • fragmented genome into 2-50kbp fragments
  • cloned 2, 10 and 50kbp fragments into plasmids to create library
  • assemble reads to create consensus seq and seq contigs
  • draft genome had 98% bases
18
Q

Why did the IHGP use clone by clone instead of whole genome shotgun seq?

A
  • to prove feasible for complex repeat rich genome
  • assembly easier and could be performed confidently
  • could target gaps for finishing
  • better suited to diverse international consortium
19
Q

What needed to be done to finish the human genome?

A
  • fill in sequencing gaps and physical gaps
20
Q

Why were gaps present in human genome, and how could these problems be solved?

A
  • cloning bias
  • no restriction sites –> use diff RE, use physical or chem fragmentation method
  • insert unstable –> use diff vector
21
Q

How were seq gaps closed?

A
  • paired end seqs align to either side of gap
  • if gap < 1kbp = PCR across gap
  • if gap > 1kbp = sequential seq along insert
22
Q

How were physical gaps closed if know order of scaffolds?

A
  • if gap region absent from all gene libraries
  • PCR used to amplify genomic DNA spanning gaps and amplified DNA seq directly w/ or w/o cloning into vector
  • PCR products over 3kbp hard to amplify
23
Q

How were physical gaps closed if don’t know order of scaffolds?

A

How do we know which pairs of primers are adj and will give product?

  • try every poss and look for PCR reaction products using gDNA as template
  • in singleplex PCR reaction each combo of primers tested w/ genomic DNA as template
  • process sped up w/ multiplex PCR, as multiple pairs of primers tested in single PCR tube, so fewer reactions need to be performed
  • use algorithm to decide min no. primer combos
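
Sketch of why the singleplex approach scales badly: with one primer at each end of every scaffold, every pair of primers from different scaffolds is a candidate reaction. Scaffold names and the one-primer-per-end layout are assumptions for illustration.

```python
from itertools import combinations


def candidate_primer_pairs(scaffolds: list[str]) -> list[tuple[tuple[str, str], tuple[str, str]]]:
    """All primer pairs that could span a gap between two different scaffolds."""
    primers = [(s, end) for s in scaffolds for end in ("left", "right")]
    # Two primers on the same scaffold cannot bridge a gap between scaffolds.
    return [(p, q) for p, q in combinations(primers, 2) if p[0] != q[0]]


pairs = candidate_primer_pairs(["scaf1", "scaf2", "scaf3", "scaf4"])
print(len(pairs))  # 24 singleplex reactions for only 4 scaffolds
```

The count grows roughly with the square of the no. scaffolds, which is the motivation for pooling primers in multiplex reactions.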
24
Q

Where are repetitive seqs found in genome?

A
  • approx 45% of genome
  • mini and microsatellites
  • centromeres
  • telomeres
  • transposons
  • duplicated genes
25
Q

What are the problems w/ repetitive seqs?

A
  • hard to assemble and usually resolved last
  • many poor quality genomes never get these regions resolved
  • better if had tech to seq v long fragments that span repetitive seq
26
Q

How can repetitive seqs cause rearrangements?

A
  • if break up seq and reassemble can get shorter as don’t know original order
27
Q

How can repetitive seqs confuse computer, and how can this be solved?

A
  • tandem repeats can cause truncations –> computer can’t be sure where to map reads to
  • worse if repetitive seqs on diff chromosomes
  • seq across whole repetitive region to identify flanking seqs that are unique (IHGP couldn’t do this as not poss at time)
  • anchor flanking seq regions to known positions of genome –> need genetic and physical maps
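
Toy sketch of the mapping problem: a read lying entirely within a tandem repeat matches several positions, while a read that includes unique flanking seq matches once. Reference and reads are invented, and exact matching is assumed.

```python
def mapping_positions(reference: str, read: str) -> list[int]:
    """All 0-based positions where the read matches the reference exactly."""
    return [i for i in range(len(reference) - len(read) + 1)
            if reference.startswith(read, i)]


reference = "GGATCC" + "CAG" * 6 + "TTAGCA"  # unique flank, (CAG)6, unique flank

repeat_read = "CAGCAGCAG"   # lies entirely inside the repeat
anchored_read = "TCCCAGCAG"  # starts in the unique left flank

print(mapping_positions(reference, repeat_read))    # several equally good positions
print(mapping_positions(reference, anchored_read))  # a single position
```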
28
Q

What factors influence the species chosen for genome projects?

A
  • genetic model
  • commercial/medical relevance
  • genome size –> small easy, so bacteria most common
29
Q

Why was NGS needed?

A
  • Sanger and Celera slow, expensive, cloning bias, low coverage and 1 seq at time
30
Q

How was NGS initially dev?

A
  • 1st was 454 sequencing (dev by 454 Life Sciences, later acquired by Roche)
  • Applied Biosystems dev competing tech called SOLiD
  • both largely replaced by Solexa (now Illumina)
31
Q

How does NGS compare to Sanger seq?

A
  • NGS involves fragmenting genome and seq all fragments in parallel
  • no plasmid cloning req and cheap
  • NGS still has similar challenges –> 1 key challenge is how to separate indiv seqs and amplify insert so machine can measure signal
32
Q

How is Illumina library prep carried out?

A
  • need to create library of gDNA inserts flanked by adaptors of known seq
  • fragment gDNA using physical/enzymic/chemical methods
  • size selected on gel
  • end repair DNA so has blunt ends
  • -> DNA pol I fills in 5’ overhang
  • -> exonuclease removes 3’ overhang
  • add A tail so DNA ends not compatible and DNA can’t concatemerise
  • adaptors ligated to ends of fragments using DNA ligase (adaptors are ds oligonucleotides w/ T overhang)
  • adaptor seqs elongated by PCR using extended oligonucleotide primers that inc unique 6 base index seq
  • allows amplification and quantification of library
33
Q

What are the functions of adaptor seqs?

A
  • provide priming sites for PCR amplification
  • allow index seqs to be added
  • priming sites for bridge amplification
  • priming sites for seq
34
Q

What are indexes in Illumina indexed libraries, and why are these included?

A
  • unique 6 base codes that identify each sample
  • can distinguish diff samples, so multiple samples can run at same time
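
Minimal sketch of how index seqs let pooled reads be sorted back into samples (demultiplexing). The index seqs, sample names and exact-match rule are assumptions for illustration, not taken from any real kit.

```python
def demultiplex(reads: list[tuple[str, str]],
                sample_indexes: dict[str, str]) -> dict[str, list[str]]:
    """Group (index, insert seq) reads by the sample their index belongs to."""
    by_sample: dict[str, list[str]] = {name: [] for name in sample_indexes.values()}
    by_sample["undetermined"] = []  # reads whose index matches no sample
    for index, seq in reads:
        by_sample[sample_indexes.get(index, "undetermined")].append(seq)
    return by_sample


# Hypothetical 6 base indexes for two samples run together.
sample_indexes = {"ATCACG": "sample_A", "CGATGT": "sample_B"}
reads = [
    ("ATCACG", "GGCTTAAC"),
    ("CGATGT", "TTAGGCAT"),
    ("ATCACG", "CCGATAGT"),
    ("NNNNNN", "ACGTACGT"),
]
print(demultiplex(reads, sample_indexes))
```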

35
Q

Why is Illumina bridge amplification necessary?

A
  • Illumina sequencer not sensitive enough to measure signal from 1 DNA molecule
  • so bridge amplification used to prod localised clusters of ≈1000 identical molecules on glass slide
36
Q

How is Illumina bridge amplification carried out?

A
  • after library prep, library diluted and denatured, ss library washed onto flow cell w/ lawn of oligonucleotides complementary to adaptor seqs
  • indiv molecules hybridise to oligos on flow cell
  • fragment strand then bends over, hybridising to complementary oligo on flow cell forming bridge
  • pol used to create complementary copy of fragment strand
  • original strand then washed away
  • repeated to create cluster of identical molecules at discrete location
  • next stage = seq fragments using fluorescently labelled nucleotides
37
Q

Why is there an optimum length of fragments for bridge amplification?

A
  • want 2 initial molecules to be far apart so don’t coalesce, so can be spread effectively
  • done by washing over at low conc
  • long fragments reach further across flow cell, so neighbouring clusters more likely to coalesce
38
Q

How is Illumina sequencing by synthesis carried out?

A
  • sequencing primer hybridised
  • pol and 4 nts added
  • fluorophores at each cluster read by lasers
  • cleave fluorophore and unblock nt
  • wash
  • repeat (until achieve desired read length)
  • index seq primer hybridised
39
Q

Why are Illumina sequencers so expensive?

A
  • optics, lasers and cameras req
40
Q

What are the adv and disadv of Illumina seq by synthesis?

A
  • enormous output
  • accurate but not as accurate as Sanger
  • all nts can be added simultaneously
  • relatively slow
  • short read lengths
  • sample needs to be amplified –> PCR bias as doesn’t amplify GC rich seq efficiently
  • ligation of adaptors biased –> ligases prefer certain seqs
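
Small sketch for checking the GC content of a fragment, the property behind the PCR bias noted above. The function and example seqs are for illustration only. No threshold for "GC rich" is implied.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in the seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


for fragment in ("GGCCAATT", "GCGCGGCC", "ATATTAAT"):
    print(fragment, gc_content(fragment))
```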
41
Q

What would the features of a better sequencing machine be?

A
  • single molecule seq w/o amplification
  • continuous reads (no stop starting)
  • v long reads
  • solid state electronics
  • cheap
  • small
42
Q

What are some examples of 3rd gen seq?

A
  • Helicos (now obsolete)
  • Pacific Biosciences (PacBio)
  • Oxford Nanopore
  • NABsys
43
Q

What are the advantages of PacBio?

A
  • v long reads
  • 1 mil reads
  • v high accuracy
  • shortest run time
  • least GC bias (can easily seq through v high/low GC content)
  • no amplification bias
  • discover broad spectrum of DNA base mods
44
Q

How is PacBio library prep carried out?

A
  • prod genomic library of fragments
  • fragment DNA
  • repair DNA damage and ends A tailed
  • ligate SMRTBell adaptors
  • anneal seq primer to SMRTBell templates
  • -> complementary, so adaptor partially ds and partially ss
  • -> T overhang, complementary to A tail
  • streptavidin tagged pol and primer bound to SMRTBell adaptor cloned insert
  • sequence
45
Q

Is PacBio library prep diff to other library preps?

A
  • similar
  • but genomic fragments ≈10kb and SMRTBell adaptors used

46
Q

How is SMRT (single molecule real time) sequencing carried out?

A
  • add diff fluorescent label to each type of nucleobase but attach it to terminal phosphate released during polymerisation
  • measure fluorescence each time new base added, decays away when fluorescent tag released
  • use zero mode waveguide chambers to improve detection from tiny signals
  • read lengths prod of ≈500-3200 bases
47
Q

How is Nanopore library gen?

A
  • rapid
  • DNA isolated using beads that bind DNA
  • transposase complex used to simultaneously cut and ligate 1st adaptor
  • then 2nd 1D adaptors and motor proteins added
  • amplification not req as device can read single molecules
  • 2 diff types adaptor can be added
  • 1D adaptors allow 1 strand to be seq
  • 2D adaptors allow both strands to be seq
48
Q

What does nanopore tech involve?

A
  • protein nanopore
  • -> heptameric protein α-hemolysin
  • -> separated from bacteria allowing low cost and robust nanopores
  • -> pore embedded into synthetic membrane w/ high electrical resistance
  • -> 512 pores per MinION flow cell
  • synthetic polymer membrane
49
Q

What occurs during 1D sequencing?

A
  • DNA attached to pore and motor protein controls DNA translocation speed through pore
  • 1 strand seq and other strand discarded
50
Q

What occurs during 2D sequencing w/ hairpin adaptor?

A
  • hairpin adaptor allows seq of both strands
  • 1st strand seq then adaptor unwound and seq
  • opp strand seq
  • opp strand seq complementary to 1st strand, used to correct errors in seq to create ‘2 direction read’
  • not same as paired end seq
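
Simplified sketch of why the 2 strands can check each other: the reverse complement of the opp strand read should equal the 1st strand read, so mismatches flag likely errors. Assumes equal length reads and substitution errors only, which is a simplification of real nanopore data. Example reads are invented.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA seq."""
    return seq.translate(COMPLEMENT)[::-1]


def disagreements(template_read: str, complement_read: str) -> list[int]:
    """Positions in the 1st strand read that the opp strand read does not support."""
    expected = reverse_complement(complement_read)
    return [i for i, (a, b) in enumerate(zip(template_read, expected)) if a != b]


template_read = "ATGCCGTA"
print(disagreements(template_read, "TACGGCAT"))  # strands agree everywhere
print(disagreements(template_read, "TACGGAAT"))  # one position to re-examine
```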
51
Q

What are the adv of Nanopore seq?

A
  • no amplification –> decreases artifacts from PCR
  • rapid
  • long reads –> simplifies assembly
  • solid state electronics –> cheaper and more reliable machines
  • portable
  • versatile –> can be changed to measure RNA, proteins or other compounds
52
Q

What are the applications of NGS?

A

Research tools:

  • de novo genome seq
  • re seq genome and comparing to reference genome
  • seq transcript
  • methylation of DNA
  • seq small RNAs
  • protein binding sites

Clinical apps:

  • diagnosis
  • biomarkers
  • prenatal testing
53
Q

How can NGS be used as a research tool?

A
  • seq genome of species
  • cataloguing variation between individuals in species
  • characterising differences between cells w/in individuals
  • describing underlying cellular mechanisms
54
Q

What is NGS now used in clinic for?

A
  • personalised medicine