Sequencing DNA Flashcards by Rory B

primer

provides a free 3’ OH for synthesis to begin from (synthesis runs 5’ to 3’ and requires a free 3’ OH for addition of the next nucleotide)

sanger sequencing primer

need to know primer sequence to begin sanger sequencing

many organisms share primer sequences

can then infer the sequence after the primer

sometimes more difficult than this

deoxynucleotides

added to growing chain during in vivo DNA synthesis

a pyrophosphate (PPi) is released and the nucleotide is covalently added, leaving a 3’ hydroxyl (OH) available for the next base to be added (further synthesis)

Terminating ddNT

lack 3’ OH
stop synthesis when they are added to growing chain

sanger dideoxy sequencing

supply mix of 3 deoxy nucleotides and one dideoxynucleotide
(only one of bases is dideoxy)

synthesis reaction will terminate when a dideoxynucleotide is incorporated
measure length of terminated fragment and this tells us where the termination happened
this is not so useful as only gives the first time this base appears

instead supply all 4 dNT and one type of ddNT
radiolabel the NT

run fragments on gel

fluorescent label dideoxy sanger sequencing

radiolabelled ones not so good as radioactive substance

instead use 99% dNT and 1% differently fluorescently tagged (depending on base) ddNT

can then run on gel
measure length and colour of the terminated fragments
this tells us:
-where the terminations occurred in the sequence
-what base was involved

can infer sequence from this
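The readout step can be sketched in a few lines of Python. This is a toy model, not a simulation of the chemistry: the dye colours in `DYE` and the function names are made up for illustration, and it assumes every possible terminated fragment is present. Sorting fragments by length and reading the colour of each terminal ddNT gives back the newly synthesised strand.

```python
import random

# Hypothetical dye colours, one per ddNT base (illustrative only).
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FOR_DYE = {dye: base for base, dye in DYE.items()}

def terminated_fragments(new_strand):
    """One fragment per position: (length, colour of the terminating ddNT)."""
    return [(length, DYE[new_strand[length - 1]])
            for length in range(1, len(new_strand) + 1)]

def read_fragments(fragments):
    """Order fragments by length (as a gel or capillary does) and decode colours."""
    return "".join(BASE_FOR_DYE[dye] for _, dye in sorted(fragments))

new_strand = "ATGCCGTA"
fragments = terminated_fragments(new_strand)
random.Random(0).shuffle(fragments)  # fragments come out of the reaction unordered
print(read_fragments(fragments))     # sequence of the synthesised strand
```

The strand that is read is the newly synthesised one, so it is complementary to the template.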

automating sanger dideoxy sequencing

run many reactions in parallel capillaries
automatically recording the fluorescence signal
parallel decoding of fluorescence signals

gives us a sequencing chromatogram
with the fluorescence signal at each location and its corresponding base in sequence

sanger dideoxy error rates

use PCR to amplify DNA samples for sequencing (need millions of copies to produce detectable signal)
PCR can introduce errors (about 1 in 10^4)
-meaning a base is misincorporated about 1 time in 10^4

base call quality - reported as a Phred quality score (Q), where Q = -10 x log10(probability of the call being an error)

10 - 1 in 10 - 90% base call accuracy
20 - 1 in 100 - 99%
30 - 1 in 1000 - 99.9%
40 - 1 in 10,000 - 99.99%
50 - 1 in 100,000 - 99.999%
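The table follows the standard Phred relationship Q = -10 x log10(P), where P is the probability that the call is wrong. A minimal sketch of the conversion in both directions (the function names are just for illustration):

```python
import math

def phred_to_error_prob(q):
    """Probability that a base call with quality Q is wrong."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred quality score for an error probability p."""
    return -10 * math.log10(p)

for q in (10, 20, 30, 40, 50):
    p = phred_to_error_prob(q)
    print(f"Q{q}: 1 in {round(1 / p):,} calls wrong, {100 * (1 - p):.3f}% accuracy")
```

The same formula explains the long-read figures later in the deck: Q10 is an error rate of 1 in 10, and Q12 is roughly 1 in 16.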

sanger sequencing reads are usually 300-1000 bases in length of >Q30 base calls
-so there is still a probability of a base call being an error
-and it is slow, serial, expensive

illumina sequencing

short read sequencing

is NGS
invention of NGS tech (mainly illumina) caused insane drop in cost per megabase of sequencing

similar to sanger - as in it uses terminators
BUT it is REVERSIBLE terminator sequencing

uses fluorescently labelled “reversibly-blocked” nucleotides
allows the sequence to be read one base at a time
-incorporate fluorescently labelled base
-read fluorescence signal (different depending on base)
-remove block on 3’OH
-remove fluorophore
-then incorporate the next fluorescently labelled nucleotide

commercialised by Illumina
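The incorporate / image / unblock cycle can be written as a simple loop. This is a toy sketch under loud assumptions: the colour table is hypothetical, strand direction is ignored, and every cycle is assumed to work perfectly. It only shows why exactly one base is read per cycle.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Hypothetical colour per labelled nucleotide (illustrative only).
COLOUR = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FOR_COLOUR = {colour: base for base, colour in COLOUR.items()}

def sequence_by_synthesis(template, n_cycles):
    """Read one base per cycle from a template (direction ignored in this toy)."""
    read = []
    for cycle in range(min(n_cycles, len(template))):
        base = COMPLEMENT[template[cycle]]    # incorporate one blocked, labelled base
        signal = COLOUR[base]                 # image the fluorescence signal
        read.append(BASE_FOR_COLOUR[signal])  # call the base from its colour
        # removing the 3' block and the fluorophore is modelled by moving on
    return "".join(read)

print(sequence_by_synthesis("TACGGA", n_cycles=4))
```

The read length is set by the number of cycles, which is why Illumina reads are short and of fixed length.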

Illumina set up

sequencing takes place in many flow cells
doesn't require knowledge of a primer sequence
instead adapters are attached to the fragments, and primers matching those adapters are used

lawn of adapters stuck to surface of slide act as primers to amplify the DNA fragments
BRIDGE AMPLIFICATION
clusters grow clonally from same individual fragment
need to do this as need to make many copies of sequence so signal can be seen by illumina machine detectors

each cluster identified by physical location on the slide
sequence is read from the order of fluorescence colours emitted by the cluster
gain a sequence for each cluster

illumina benefits

don't need to know primer sequences

Illumina NovaSeq can generate up to 3 terabases (3x 10^12) per run
up to 20 billion reads (2x 10^10) per run
150 bases per read each way - 300 total max length

average Q>=30
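The run figures above are consistent with each other, which a quick calculation shows. The sketch assumes the 20 billion figure counts individual 150-base reads, and it uses an approximate human genome size of 3.1 x 10^9 bases (an assumption added here, not from the card) to give a feel for the scale.

```python
reads_per_run = 20 * 10**9       # up to 20 billion reads (from the card)
bases_per_read = 150             # read length in one direction (from the card)
human_genome_bases = 3.1e9       # assumption: approximate haploid human genome size

total_bases = reads_per_run * bases_per_read
print(f"total bases per run: {total_bases:.1e}")
print(f"approximate human genome coverage: {total_bases / human_genome_bases:.0f}x")
```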

Illumina drawbacks

other machines can produce longer reads - which are more useful for genome sequencing
so Illumina is not as good for that as they are

errors occur
sometimes systematic - due to underlying properties of sample sequence

long read technologies

2 main players:
-Pacific Biosciences single molecules long read - PacBio SMRT/ SEQUEL
-Oxford nanopore technologies (ONT). Synthetic nanopores and minION, PromethION instruments

promethion = multiple minION put together

difference - no terminators (PacBio uses fluorescently labelled dNT; ONT reads an electrical signal)

PacBio SMRT / SEQUEL

single molecule sequencing using
FLUORESCENTLY LABELLED DEOXYNUCLEOTIDES

fluorescent label is on the PPi which is removed from dNT when incorporated into DNA chain

ssDNA input
dsDNA out

process of incorporating this fluorescently labelled dNT releases light which can be detected by the machine
-zero mode waveguide illumination of the polymerase
-real time monitoring of nucleotide incorporation

DNA pol is fixed to bottom of the well
light of diff wavelength released for each base
can detect this peak and infer sequence from the order

outputs of the three types of sequencing so far

sanger - labelled bars
illumina - pictures
PacBio - movie - lots of data to analyse - requires powerful computer

PacBio SMRT data

long - 1kb to 30kb - mean of ~15kb

up to 30 Gbase per run

low quality - Q scores of 10-12 - error rate of 1/10 - 1/15
most errors are insertions and deletions (indels)
but these can be corrected with HiFi
errors are mostly random with reference to underlying sequence so reads tend to correct for each other

PacBio best uses

genome assembly
identifying duplications
identifying splice isoforms in mRNAs

PacBio HiFi libraries

add adapters - no requirement of knowing sample sequence
adapters are in a loop at each end so give a circular molecule when added to end of dsFragment

Add polymerase
Fixed polymerase runs around the circle many times to produce many copies
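Because the polymerase reads the same circular molecule many times and the errors are mostly random, the passes can be combined into a consensus. The sketch below is a heavy simplification: it assumes the passes are already aligned and contain only substitution errors, whereas real errors are mostly indels and need an alignment step first. The function name and example passes are made up for illustration.

```python
from collections import Counter

def consensus(passes):
    """Majority vote at each position across equal-length, pre-aligned passes."""
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*passes))

true_sequence = "ACGTTAGC"
# Five passes over the same molecule, four of them with one random error.
passes = ["ACGTTAGC", "ACCTTAGC", "ACGTTAGA", "TCGTTAGC", "ACGTAAGC"]
print(consensus(passes))
```

Each individual pass is error-prone, but since the errors fall at different positions they are outvoted in the consensus.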

Oxford nanopore technology ONT

use real cells as starting point
cells have pores allowing trafficking of molecules in and out of cells

can use pores to traffic DNA molecules through
electrical signal will change depending on sequence of the DNA molecule

run current through pore
current changes depending on properties of the molecule passing through pore (size, charge)
different bases have diff size/charge

single membrane protein nanopore is embedded in a synthetic membrane
current passed through pore
use deviations in current to infer sequence
-single DNA molecule passes through pore occluding current flow
-current is affected differently depending on the sequence of about 6-7 DNA bases in the ssDNA
-the sequence is read by modelling the current using neural network computing
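The idea that the current reflects a short window of bases rather than a single base can be shown with a toy model. Everything here is hypothetical: the window is 3 bases instead of 6-7 to keep the table small, the current levels are invented and noise-free, and decoding is a plain table lookup. Real signals are noisy and overlapping, which is why neural networks are used for base calling.

```python
from itertools import product

K = 3  # toy window size; the real pore senses about 6-7 bases
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
# Hypothetical, noise-free current level for every k-mer (illustrative only).
LEVEL = {kmer: 50.0 + i for i, kmer in enumerate(KMERS)}
KMER_FOR_LEVEL = {level: kmer for kmer, level in LEVEL.items()}

def squiggle(sequence):
    """Current trace as the strand moves through the pore one base at a time."""
    return [LEVEL[sequence[i:i + K]] for i in range(len(sequence) - K + 1)]

def basecall(levels):
    """Decode k-mers from current levels and stitch the overlapping windows."""
    kmers = [KMER_FOR_LEVEL[level] for level in levels]
    return kmers[0] + "".join(kmer[-1] for kmer in kmers[1:])

sequence = "GATTACAGGC"
print(basecall(squiggle(sequence)))
```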

ONT nanopore sequencing types

minION
-one cell
-500 pores
handheld sequencer

promethion
-up to 48 cells
-3000 pores per cell
production sequencer

ONT data quality

low quality raw data
BUT errors also random so multiple copies correct for each other

long depending on input
mean of about 50kb
longest read so far >2Mb

up to 50 Gbase (minION) and 200 Gbase (promethion) per run

low quality raw data - Q11-12 (1/10 - 1/15 error rate)(most errors are insertions and deletions, indels)

ONT nanopore sequencing best uses

same as PacBio
Genome assembly
Identifying duplications
identifying splice isoforms in mRNAs (can directly sequence RNA different to PacBio)

Genome assembly

telomere to telomere
sequence starts at one telomere and goes through to the other telomere at the other end of the chromosome
this is the ideal genome assembly

problems in getting T to T

rDNA
centromeric satellites
Censat and SDs
SDs (segmental duplications)
RepMask

these are all repetitive units
repeats tend to collapse into a single region in the assembly

long read sequencing (pacbio, ont) helped to get T to T of all autosomes and x chromosome
~2022

trouble with sequencing Y chromosome

it is very repetitive
the Y chromosome was only completed around mid 2023
it is so full of repeats that they can only be resolved by very long reads (shorter sequences can't be lined up to assemble the genome order from fragments)

shotgun sequencing

genome shattered into many, many pieces
each piece is sequenced from both ends
the sequence reads are overlapped and aligned to each other

e.g. Illumina sequencing
only get the shorter reads, ~150bp in each direction
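Overlapping reads to rebuild the original sequence can be sketched with a greedy merge: repeatedly join the two reads with the longest suffix-to-prefix overlap. This is only a toy illustration on a tiny repeat-free example, not how production assemblers work; the function names and the `min_len` parameter are made up here.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair of reads until no overlaps remain."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # nothing overlaps: the assembly stays fragmented
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads

genome = "ATGGCCTAAGTTCGA"
reads = ["CCTAAGTT", "AGTTCGA", "ATGGCCTA"]  # shotgun reads in arbitrary order
print(greedy_assemble(reads))
```

If two reads do not overlap by at least `min_len` bases, they stay as separate contigs, which is the fragmented assembly that short reads often give.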

problem with short reads

since they are short, they can be present multiple times within the sequence
don't know the whole sequence, so can't know where they go OR how many times each read comes up
can be assembled in the wrong order

long read sequencing allows resolving of repeats
can know where fragments lie - line them up
overlaps are also more likely to be found

short reads can end up not overlapping with other ones - end up with a fragmented assembly
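The placement problem can be seen by counting where a read fits in a sequence containing a repeat. The genome below is invented for illustration: a "CAG" unit repeated five times between two unique flanks. A short read that sits entirely inside the repeat fits in several places, while a long read that spans the repeat and reaches both flanks fits in exactly one.

```python
def placements(read, genome):
    """All start positions where the read matches the genome exactly."""
    return [i for i in range(len(genome) - len(read) + 1)
            if genome.startswith(read, i)]

# Made-up genome: unique flank + 5 copies of a repeat unit + unique flank.
genome = "ATTG" + "CAG" * 5 + "TCCA"

short_read = "CAGCAG"                   # lies entirely inside the repeat
long_read = "TTG" + "CAG" * 5 + "TC"    # spans the repeat into both flanks

print(placements(short_read, genome))   # several possible positions
print(placements(long_read, genome))    # a single position
```

The short read also cannot say how many copies of the repeat exist, which is why repeats tend to collapse in short-read assemblies.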

unsolved problems - genome assembly

challenging due to biological issues:
-repeats
-heterozygosity
-large genomes
and technical issues:
-large datasets

small genomes (e.g. many bacteria) are easy to assemble, but many plants and animals are not (particularly polyploids)

Sequencing DNA Flashcards

(29 cards)