Sequencing DNA Flashcards

1
Q

primer

A

provides the free 3’ OH that synthesis begins from (synthesis runs 5’-3’ and requires a free 3’ OH for addition of the next nucleotide)

2
Q

sanger sequencing primer

A

need to know primer sequence to begin sanger sequencing

many organisms share primer sequences

can then infer the sequence after the primer

sometimes more difficult than this

3
Q

deoxynucleotides

A

added to growing chain during in vivo DNA synthesis

pyrophosphate (PPi) is released and the nucleotide is covalently added, leaving a 3’ OH available for the next base to be added (further synthesis)

4
Q

Terminating ddNT

A

lack 3’ OH
stop synthesis when they are added to growing chain

5
Q

sanger dideoxy sequencing

A

supply mix of 3 deoxy nucleotides and one dideoxynucleotide
(only one of bases is dideoxy)

synthesis reaction will terminate when a dideoxynucleotide is incorporated
measure length of terminated fragment and this tells us where the termination happened
this is not so useful as only gives the first time this base appears

instead supply all 4 dNT and one type of ddNT
radiolabel the NT

run fragments on gel

6
Q

fluorescent label dideoxy sanger sequencing

A

radiolabelled nucleotides are not ideal because they are radioactive

instead use 99% dNT and 1% differently fluorescently tagged (depending on base) ddNT

can then run on gel
measure length and colour of the terminated fragments
this tells us:
-where the terminations occurred in the sequence
-what base was involved

can infer sequence from this
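
A minimal sketch of the read-out described above, assuming a toy newly synthesised strand and a 1% chance of ddNT incorporation at each position. The function names and the example sequence are hypothetical, chosen only for illustration. Each terminated fragment is recorded as (length, colour-coded terminal base), and sorting by length recovers the sequence.

```python
import random

def simulate_terminated_fragments(new_strand, n_molecules=5000,
                                  dd_fraction=0.01, seed=0):
    """Return (length, terminal_base) pairs for chains stopped by a ddNT.

    new_strand is the sequence being synthesised after the primer.
    At every position there is a dd_fraction chance that the incoming
    nucleotide is a dideoxy terminator, which ends that molecule.
    """
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_molecules):
        for position, base in enumerate(new_strand, start=1):
            if rng.random() < dd_fraction:
                fragments.append((position, base))  # length + colour
                break
    return fragments

def read_sequence(fragments):
    """Order fragments by length and read the terminal base of each length."""
    base_at_length = {}
    for length, base in fragments:
        base_at_length[length] = base
    # A length with no fragment would simply be missing from the read.
    return "".join(base_at_length[n] for n in sorted(base_at_length))

strand = "ACGTTGCAGTCA"  # toy sequence (assumption)
fragments = simulate_terminated_fragments(strand)
print(read_sequence(fragments))
```

With many template copies, every position ends up represented by at least one terminated fragment, which is why the reaction needs a large number of molecules.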

7
Q

automating sanger dideoxy sequencing

A

run many reactions in parallel capillaries
automatically recording the fluorescence signal
parallel decoding of fluorescence signals

gives us a sequencing chromatogram
with the fluorescence signal at each location and its corresponding base in sequence

8
Q

sanger dideoxy error rates

A

use PCR to amplify DNA samples for sequencing (need millions of copies to produce detectable signal)
PCR can introduce errors (about 1 in 10^4)
-meaning a base is misincorporated about 1 time in 10^4

base call quality (Q) - a score calculated from the probability (P) of the call being an error: Q = -10 log10(P)

10 - 1 in 10 - 90% base call accuracy
20 - 1 in 100 - 99%
30 - 1 in 1000 - 99.9%
40 - 1 in 10,000 - 99.99%
50 - 1 in 100,000 - 99.999%
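
The table above follows the standard Phred relationship Q = -10 log10(P), where P is the probability that a base call is wrong. A short sketch (the function names are hypothetical) that reproduces the rows:

```python
import math

def phred_to_error_prob(q):
    """Probability that a base call with quality q is an error."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Quality score for an error probability p."""
    return -10 * math.log10(p)

for q in (10, 20, 30, 40, 50):
    p = phred_to_error_prob(q)
    print(f"Q{q}: 1 error in {round(1 / p):,} calls, "
          f"accuracy {100 * (1 - p):.3f}%")
```

The same conversion applies to the long-read figures later in the deck: a raw error rate of about 1 in 10 corresponds to roughly Q10.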

sanger sequencing reads are usually 300-1000 bases in length of >Q30 base calls
-so there is still a probability of a base call being an error
-and it is slow, serial, expensive

9
Q

illumina sequencing

A

short read sequencing

is NGS
invention of NGS tech (mainly illumina) caused insane drop in cost per megabase of sequencing

similar to sanger - as in it uses terminators
BUT it is REVERSIBLE terminator sequencing

uses fluorescently labelled “reversibly-blocked” nucleotides
allows the sequence to be read one base at a time
-incorporate fluorescently labelled base
-read fluorescence signal (different depending on base)
-remove block on 3’OH
-remove fluorophore
-then incorporate the next fluorescently labelled nucleotide

commercialised by Illumina
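
The cycle above can be sketched as a loop in which exactly one base is added and read per cycle. This is a toy model, not Illumina's actual chemistry or software: the dye colours and names are hypothetical, and the template is given in the order the polymerase meets it.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Hypothetical colour assigned to each labelled base.
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
COLOUR_TO_BASE = {colour: base for base, colour in DYE.items()}

def sequence_by_synthesis(template, cycles):
    """Read one base per cycle using reversibly blocked nucleotides."""
    read = []
    for cycle in range(min(cycles, len(template))):
        # 1. incorporate one blocked, labelled base (block stops a second)
        incoming = COMPLEMENT[template[cycle]]
        # 2. image the cluster and record the colour
        colour = DYE[incoming]
        read.append(COLOUR_TO_BASE[colour])
        # 3. remove the 3' block and the fluorophore; the next loop
        #    iteration stands in for the next incorporation
    return "".join(read)

print(sequence_by_synthesis("TACGGA", cycles=4))  # prints ATGC
```

The number of cycles sets the read length, which is why the reads are short.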

10
Q

Illumina set up

A

sequencing takes place in many flow cells
doesn't require knowledge of a primer sequence in the sample
instead adapters are added to the fragments and the primers match those adapters

lawn of adapters stuck to surface of slide act as primers to amplify the DNA fragments
BRIDGE AMPLIFICATION
clusters grow clonally from same individual fragment
need to do this as need to make many copies of sequence so signal can be seen by illumina machine detectors

each cluster identified by physical location on the slide
sequence is read from the order of the fluorescence colours emitted by the cluster
gain a sequence for each cluster

11
Q

illumina benefits

A

don't need to know primer sequences

Illumina NovaSeq can generate up to 3 terabases (3x 10^12) per run
up to 20 billion reads (2x 10^10) per run
150 bases per read each way - 300 total max length

average Q>=30
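
As a quick arithmetic check, the per-run figures above agree with each other if each direction of a fragment is counted as a separate 150-base read (that counting convention is an assumption here):

```python
reads_per_run = 20_000_000_000   # 2 x 10^10 reads
bases_per_read = 150             # one direction

total_bases = reads_per_run * bases_per_read
print(total_bases)               # 3000000000000
print(total_bases / 1e12)        # 3.0 terabases
```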

12
Q

Illumina drawbacks

A

other machines can produce longer reads - which are more useful for genome sequencing
so illumina is not as good for that as they are

errors occur
sometimes systematic - due to underlying properties of sample sequence

13
Q

long read technologies

A

2 main players:
-Pacific Biosciences single molecules long read - PacBio SMRT/ SEQUEL
-Oxford nanopore technologies (ONT). Synthetic nanopores and minION, PromethION instruments

promethion = multiple minION put together

difference - no terminators (PacBio uses fluorescently labelled dNTs; ONT reads an electrical signal instead)

14
Q

PacBio SMRT / SEQUEL

A

single molecule sequencing using
FLUORESCENTLY LABELLED DEOXYNUCLEOTIDES

fluorescent label is on the PPi which is removed from dNT when incorporated into DNA chain

ssDNA input
dsDNA out

process of incorporating this fluorescently labelled dNT releases light which can be detected by the machine
-zero mode waveguide illumination of the polymerase
-real time monitoring of nucleotide incorporation

DNA pol is fixed to bottom of the well
light of diff wavelength released for each base
can detect this peak and infer sequence from the order

15
Q

outputs of the three types of sequencing so far

A

sanger - labelled bars
illumina - pictures
PacBio - movie - lots of data to analyse - requires powerful computer

16
Q

PacBio SMRT data

A

long - 1kb to 30kb - mean of ~15kb

up to 30 Gbase per run

low quality - Q scores of 10-12 - error rate of 1/10 - 1/15
most errors are insertions and deletions (indels)
but these can be corrected with HiFi
errors are mostly random with reference to underlying sequence so reads tend to correct for each other

17
Q

PacBio best uses

A

genome assembly
identifying duplications
identifying splice isoforms in mRNAs

18
Q

PacBio HiFi libraries

A

add adapters - no requirement of knowing sample sequence
adapters are in a loop at each end so give a circular molecule when added to end of dsFragment

Add polymerase
Fixed polymerase runs around the circle many times to produce many copies
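
Because the polymerase reads the same circular molecule many times, the passes can be combined into a consensus. The sketch below is a deliberately simplified illustration: it assumes the passes are already aligned and equal in length, and it only handles substitution-type errors, whereas real consensus calling must also align around indels. The function name and toy reads are hypothetical.

```python
from collections import Counter

def consensus(passes):
    """Majority vote at each position across repeated passes."""
    if len({len(p) for p in passes}) != 1:
        raise ValueError("passes must be pre-aligned to the same length")
    # zip(*passes) yields one column of bases per position
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*passes))

# Five noisy passes over the same molecule; errors fall at random positions.
passes = ["ACGTAC", "ACCTAC", "ACGTAC", "ACGTTC", "ACGTAC"]
print(consensus(passes))  # prints ACGTAC
```

This works because the errors are mostly random with respect to the underlying sequence, so the same wrong base rarely wins the vote at a given position.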

19
Q

Oxford nanopore technology ONT

A

idea is based on pores found in real cells
cells have pores allowing trafficking of molecules in and out of cells

can use pores to traffic DNA molecules through
electrical signal will change depending on sequence of the DNA molecule

run current through pore
current changes depending on properties of the molecule passing through pore (size, charge)
different bases have diff size/charge

single membrane protein nanopore is embedded in a synthetic membrane
current passed through pore
use deviations in current to infer sequence
-single DNA molecule passes through pore occluding current flow
-current is affected differently depending on the sequence of about 6-7 DNA bases in the ssDNA
-the sequence is read by modelling the current using neural network computing

20
Q

ONT nanopore sequencing types

A

minION
-one cell
-500 pores
handheld sequencer

promethion
-up to 48 cells
-3000 pores per cell
production sequencer

21
Q

ONT data quality

A

low quality raw data
BUT errors also random so multiple copies correct for each other

long depending on input
mean of about 50kb
longest read so far >2Mb

up to 50 Gbase (minION) and 200 Gbase (promethion) per run

raw quality is about Q11-12 (roughly 1/10 - 1/15 error rate); most errors are insertions and deletions (indels)

22
Q

ONT nanopore sequencing best uses

A

same as PacBio
Genome assembly
Identifying duplications
identifying splice isoforms in mRNAs (can directly sequence RNA different to PacBio)

23
Q

Genome assembly

A

telomere to telomere
sequence starts at one telomere and goes through to the other telomere at the other end of the chromosome
^^ideal genome assembly

24
Q

problems in getting T to T

A

rDNA
centromeric satellites
Censat and SDs
SDs (segmental duplications)
RepMask

these are all repetitive regions
repeated copies can collapse into a single region in the assembly

long read sequencing (pacbio, ont) helped to get T to T of all autosomes and x chromosome
~2022

25
Q

trouble with sequencing Y chromosome

A

it is very repetitive
complete sequence only obtained around mid 2023
Y chromosome is so full of repeats that they can only be resolved by very long reads (otherwise can't line up sequences to assemble genome order from fragments)

26
Q

shotgun sequencing

A

genome shattered into many many pieces
each piece is sequenced from both ends

the sequence reads are then overlapped and aligned to each other

eg illumina sequencing
only get the shorter reads ~150bp in each direction
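
A minimal sketch of the overlap step, assuming error-free toy reads from one strand. Real assemblers use much more sophisticated graph methods; the greedy merge and the names below are only an illustration of "overlap and align".

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of contigs with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]

reads = ["ATGGCGTA", "GCGTACGT", "TACGTTAGC"]
print(greedy_assemble(reads))  # prints ['ATGGCGTACGTTAGC']
```

If some reads share no overlap with any other read, the function returns several contigs instead of one, which is the fragmented assembly case.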

27
Q

problem with short reads

A

since they are short, the same read sequence can be present multiple times within the genome
don't know the whole sequence so can't know where they go
OR how many times each read comes up

so reads can be assembled in the wrong order

long read sequencing allows resolving of repeats
can know where fragments lie - line them up
overlaps between reads are also more likely and easier to find

short reads can end up not overlapping with other ones - end up with fragmented assembly
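
To illustrate why read length matters, the sketch below builds a toy genome (an assumption, not real data) containing the same 12-base repeat twice and asks what fraction of reads of a given length match more than one place. The function names are hypothetical.

```python
def count_occurrences(genome, read):
    """Count (possibly overlapping) occurrences of read in genome."""
    count, start = 0, genome.find(read)
    while start != -1:
        count += 1
        start = genome.find(read, start + 1)
    return count

def ambiguous_fraction(genome, read_len):
    """Fraction of all reads of read_len that occur more than once."""
    reads = [genome[i:i + read_len]
             for i in range(len(genome) - read_len + 1)]
    ambiguous = sum(1 for r in reads if count_occurrences(genome, r) > 1)
    return ambiguous / len(reads)

REPEAT = "GATTACAGCATG"  # toy 12-base repeat
genome = "TTCCT" + REPEAT + "CCTTC" + REPEAT + "TCTCC"

print(count_occurrences(genome, REPEAT))   # prints 2
print(ambiguous_fraction(genome, 6) > 0)   # prints True
print(ambiguous_fraction(genome, 14))      # prints 0.0
```

Reads shorter than the repeat can sit entirely inside it and so cannot be placed, while reads longer than the repeat always include some unique flanking sequence.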

28
Q

unsolved problems - genome assembly

A

challenging due to biological issues
repeats
heterozygosity
large genomes

and technical issues
large datasets

small genomes (eg many bacteria) are easy to assemble
but many plants and animals are not (particularly polyploids)
