3.1 1st + 2nd Gen sequencing Flashcards

1
Q

Describe the main steps to Sanger Sequencing

A
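  1. Amplify the template in a polymerase reaction containing dNTPs and ddNTPs; a ddNTP terminates the strand when it is added
  2. Run the products on a gel in 4 lanes, 1 per ddNTP
  3. Read the gel to determine the identity of the terminating ddNTP at each fragment length and thus the sequence (bases read from the bottom up)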
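
The bottom-up gel read in step 3 can be sketched in Python. This is only an illustration: the fragment lengths are made up, and `read_gel` is a hypothetical helper name.

```python
def read_gel(lanes):
    """Reconstruct the synthesized strand from four ddNTP lanes.

    lanes maps each base to the fragment lengths seen in that lane.
    Shorter fragments migrate further, so reading the gel from the
    bottom up is the same as sorting bands by increasing length.
    """
    bands = [(length, base) for base, lengths in lanes.items() for length in lengths]
    bands.sort()
    return "".join(base for _, base in bands)


# Hypothetical gel: fragment lengths (in nt) observed in each lane.
lanes = {"A": [1, 5], "C": [2], "G": [3, 6], "T": [4]}
print(read_gel(lanes))  # ACGTAG
```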
2
Q

Describe how Sanger sequencing was scaled

A
  1. Each ddNTP had a different fluorescent dye incorporated, so the 4 reactions could be pooled and run in a single lane
  2. Switch to a capillary gel, which is scanned with a laser to detect the fluorescent nucleotides
  3. Development of more advanced instruments to accomplish this
3
Q

What are some of the limits of Sanger sequencing?

A
  1. Heterogeneity in the DNA sequences is hard to resolve because it causes the signal/confidence to degrade
  2. Indels are hard to resolve
4
Q

What is a Phred score?

A

A quality measure in which each base is assigned an accuracy probability.
It is equal to Q = -10 log10(P)
(negative log10-transformed probability)
P = probability that the base call is incorrect

Determined empirically by sequencing DNA of known sequence 1000s of times
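
A minimal Python sketch of the formula above, converting between the error probability P and the Phred score. The function names are illustrative assumptions.

```python
import math


def phred_from_error(p):
    """Phred score Q = -10 * log10(P), where P is the error probability."""
    return -10 * math.log10(p)


def error_from_phred(q):
    """Inverse of the formula: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


# Each 10-point step in Q is a 10-fold drop in the error probability.
for p in (0.1, 0.01, 0.001):
    print(p, round(phred_from_error(p)))
```

So a Phred score of 20 corresponds to a 1 in 100 chance that the base is wrong, and 30 to 1 in 1000.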

5
Q

Why are reference genomes important/what do they allow?

A
  1. enable studies of genetic variation and genomic function
  2. give a framework for the development of tools for genome analysis
    ex: functional genomics, oligonucleotide arrays
6
Q

What is the minimum base quality in the human reference genome?

A

Phred score of 20

7
Q

What are some limitations of capillary analog sequence reads?

A

Heterozygous positions need to be manually reviewed because the technology sequences pools of fragments containing both alleles of a heterozygous variant

8
Q

What are the big changes from 1st gen to 2nd gen?

A
  1. move from analogue to digital sequencing
  2. individual sequences are clonally amplified
  3. Sequence by synthesis approach from single strands (~1000)
9
Q

What are the 3 methods developed for clonal amplification? Name an example for each

A
  1. Oil/aqueous emulsion
    ex: 454, Ion Torrent
  2. Solid surface - microfluidic slides
    ex: Illumina
  3. Rolling circle amplification (RCA)
    ex: Complete Genomics
10
Q

What has the decrease in price in genome sequencing allowed?

A

emergence of new fields like epigenomics, metagenomics

11
Q

What is an application of genomics to covid?

A

Can monitor the genomic evolution of SARS-CoV-2 and predict regions that are more mutation prone/selected for

12
Q

What are the steps in 2nd gen sequencer library construction?

A
  1. genome fragments are generated by shearing
  2. adaptors are ligated to fragment ends
  3. additional adaptors of known sequence are added to generate clonal copies
13
Q

Describe the steps in oil/aqueous emulsion clonal amplification

A
  1. Emulsion contains aqueous phase micelles that have beads inside and act as reaction chambers
  2. Bead surface has oligos that hybridize to adaptors connected to DNA sequences (priming event)
  3. Copy of sequence made by PCR
  4. Solution is heated to release the amplified strand, and the new strand binds to another oligo (repeating amplification)
  5. All fragments remain in the micelle. End up with a bead covered in identical fragments
  6. Beads are purified, made single stranded, and put into a flow cell (1 bead/well)
  7. Sequencing by synthesis is done: nucleotides are flowed over the wells one base type at a time and read off from the light signal given off on incorporation (pyrosequencing)
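
The one-base-at-a-time flow in step 7 can be sketched as a toy flowgram. This assumes an idealized, noise-free signal proportional to the number of bases incorporated; the flow order `TACG` and the function name are illustrative assumptions.

```python
def flowgram(read, flow_order="TACG", cycles=1):
    """Simulate idealized flow signals for the strand being synthesized.

    For each flowed base, every consecutive matching position is
    incorporated at once, so the signal equals the homopolymer length.
    """
    pos = 0
    signals = []
    for _ in range(cycles):
        for base in flow_order:
            run = 0
            while pos < len(read) and read[pos] == base:
                run += 1
                pos += 1
            signals.append((base, run))
    return signals


# The TT and GG runs each give one signal of height 2, not two separate signals.
print(flowgram("TTAGG"))  # [('T', 2), ('A', 1), ('C', 0), ('G', 2)]
```

Because a run of identical bases is read as a single signal whose height must be judged, long homopolymers are where this chemistry tends to miscount.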
14
Q

T/F: In emulsion, a whole pool of fragments is sequenced

A

F. Only 1 DNA fragment taken from the library pool is amplified and sequenced per bead

15
Q

Why are clonal copies needed in 2nd gen?

A

Clonal copies needed or else the light signal is too dim

16
Q

How does the Ion Torrent differ from the GS-FLX 454?

A

Also based on oil/aqueous emulsion

Instead of detecting light, it detects changes in pH (H+ release) when nucleotides are added during sequencing by synthesis

Non-imaging based

17
Q

What are the limitations of emulsion based platforms?

A
  1. relies on the Poisson distribution to achieve a 1:1 bead : sequence ratio
  2. single detection signal (light/H+) significantly increases indel errors (can’t resolve homopolymeric repeats)
  3. requires more DNA than other clonal strategies, as lots of material is lost during the PCR reaction
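
Point 1 can be made concrete with a short Python sketch. It assumes the number of template molecules per micelle follows a Poisson distribution with mean `lam`; the function names are illustrative.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k templates in a micelle with mean loading lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def loading_fractions(lam):
    """Fractions of micelles that are empty, clonal (1 template), or mixed (2+)."""
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    return empty, single, 1.0 - empty - single


# Even at the best case (mean of 1 template per micelle), only about 37%
# of micelles hold exactly one template; the rest are empty or mixed.
for lam in (0.1, 0.5, 1.0):
    print(lam, [round(x, 3) for x in loading_fractions(lam)])
```

Diluting the library lowers the mixed fraction but leaves most beads empty, which is one way to see why so much input material is needed.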
18
Q

How does RCA work?

A
  1. DNA molecules with adaptors are ligated into a circular template
  2. Phi29 polymerase amplifies around the circle, creating connected clonal copies
  3. The resulting DNA balls are hybridized to a slide
  4. Primers are added and the sequence is read by sequencing by synthesis based on the light signal
19
Q

Describe the general steps in bridge amplification

A
  1. DNA fragments are created, adaptors are ligated, and the fragments are made single stranded
  2. These hybridize to flow cells covered in oligos that recognize the adaptors (priming event)
  3. Primers are added and copies made
  4. Cycle is repeated
  5. Complementary strand is removed and the reverse strand sequenced
  6. Pattern of nucleotide incorporation is read
20
Q

What are some advantages of the glass slide microfluidic approach?

A
  • generate clonal copies right on the slide
  • no issues with poisson distribution
  • can deal with homopolymeric repeats because of a reversible terminating group
21
Q

How does Sequence by Synthesis work?

A
  1. All 4 bases are added at the same time
  2. After the 1st base is incorporated, the rest are flushed away
  3. The base is excited and a picture is taken; the detected signal identifies which base was added
22
Q

Describe the steps in paired end read chem

A
  1. 2 groups of cleavable oligos (blue & purple) are grafted to the flow cell surface
  2. 1st cluster of strands created on flow cell via bridge amplification
  3. Uracil added allows 1 end (blue) to be cleaved. Reverse strands are cleaved & washed off, leaving the forward strand
  4. 3’ ends blocked using phosphorylation to prevent unwanted priming
  5. Primers added & strand sequenced from 1 end.

(Read products washed away, index 1 read primer hybridized to the template, and the index read generated + washed away)

  6. 3’ ends of template dephosphorylated

(Index 2 read)

  7. Double stranded bridge regenerated and then linearized + 3’ end blocked
  8. Forward strand washed away & reverse strand sequenced
23
Q

What is sequence indexing (“barcoding”)?

A
  • a barcode is included in the adaptors and allows libraries to be pooled together, since each sample has a unique barcode
  • these barcodes are sequenced to separate the samples
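
Separating pooled reads by barcode can be sketched in Python as below. The barcodes, sample names, and `demultiplex` helper are made up for illustration, and exact matching is assumed (real demultiplexers usually tolerate mismatches).

```python
def demultiplex(reads, barcode_to_sample):
    """Group reads by sample using the sequenced index (barcode).

    reads is a list of (index_sequence, insert_sequence) pairs.
    Reads whose index matches no known barcode go to 'undetermined'.
    """
    bins = {sample: [] for sample in barcode_to_sample.values()}
    bins["undetermined"] = []
    for index_seq, insert_seq in reads:
        sample = barcode_to_sample.get(index_seq, "undetermined")
        bins[sample].append(insert_seq)
    return bins


# Hypothetical barcodes and pooled reads.
barcodes = {"ACGT": "sample_A", "TTAG": "sample_B"}
pooled = [("ACGT", "GATTACA"), ("TTAG", "CCCGGG"), ("NNNN", "ATATAT"), ("ACGT", "TTTT")]
print(demultiplex(pooled, barcodes))
```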
24
Q

What are the 3 main analysis steps for Illumina?

A
  1. Base calling
  2. Reference Alignment/Assembly
  3. Application specific analysis
25
Q

What are the steps in base calling?

A

images –> .bcl files –> fastq files

26
Q

How is 2D position used in base calling?

A

XY coordinates determine unique sequence read naming structure

27
Q

What is (pre)phasing?

A

A small number of molecules in each cluster run ahead of (prephasing) or fall behind (phasing) the current incorporation cycle

It is compensated for using base call corrections

28
Q

What is Chastity?

A
  • way of controlling for polyclonal clusters
  • chastity = brightest intensity / (brightest + 2nd brightest intensity); must be ≥ 0.6
  • chastity score of 1 = signal from a single base only = pass
  • if lower than 0.6, the cluster is considered polyclonal & chastity fails
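
A minimal Python sketch of the chastity ratio for one cluster in one cycle. The intensity values are made up, and checking a single cycle is a simplifying assumption; it is not how a production filter is necessarily applied.

```python
def chastity(intensities):
    """Brightest intensity divided by the sum of the two brightest."""
    top, second = sorted(intensities, reverse=True)[:2]
    return top / (top + second)


def passes_chastity(intensities, threshold=0.6):
    """True when the brightest base clearly dominates the signal."""
    return chastity(intensities) >= threshold


# Hypothetical intensities for the 4 base channels.
clean_cluster = [900, 100, 50, 20]   # one dominant base
mixed_cluster = [500, 450, 30, 20]   # two competing bases, likely polyclonal
print(chastity(clean_cluster), passes_chastity(clean_cluster))
print(round(chastity(mixed_cluster), 3), passes_chastity(mixed_cluster))
```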
29
Q

How are Phred scores stored?

A

2-digit Phred scores are encoded as ASCII characters, which compresses the 2 digits into a single character

ex: 9 is the glyph for ASCII code 57
57 - 33 = base quality 24
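
The worked example above can be checked with a short Python sketch; `decode_quality` is an illustrative name, and the offset of 33 is the one used in the example.

```python
def decode_quality(glyphs, offset=33):
    """Turn a string of quality glyphs into a list of Phred scores."""
    return [ord(ch) - offset for ch in glyphs]


print(ord("9"))              # 57
print(decode_quality("9"))   # [24]
print(decode_quality("!I"))  # [0, 40]
```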

30
Q

What offset ("base") are FASTQ quality scores encoded with, and why?

A
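  • base 33 (Phred + 33)
  • chosen because ASCII codes from 33 upward are printable keyboard characters (32 is the space character, and lower codes are non-printing control characters)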

31
Q

What is a FASTA file?

A

Contains sequence data only (no base qualities)

32
Q

What is a FASTQ file?

A

reports sequences and their base quality

33
Q

What are the 4 lines per sequence in a fastq file?

A
  1. '@' character followed by the sequence identifier + optional description
  2. Raw sequence letters
  3. Starts with '+', followed by an optional identifier
  4. Glyphs encoding the Phred score of each base
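
The 4-line layout can be sketched with a small Python parser. The record is made up, `parse_fastq` is a hypothetical helper, and it assumes unwrapped records (exactly 4 lines each) with Phred+33 qualities.

```python
def parse_fastq(text):
    """Parse FASTQ text into a list of records, 4 lines per record."""
    lines = text.strip().splitlines()
    records = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record at line %d" % (i + 1))
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        records.append({
            "name": header[1:].split()[0],           # identifier after '@'
            "sequence": seq,                          # raw sequence letters
            "quality": [ord(c) - 33 for c in qual],   # glyphs -> Phred scores
        })
    return records


example = "@read1 optional description\nACGT\n+\nII9!\n"
print(parse_fastq(example))
```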
34
Q

T/F: Read 1 and 2 (or 3 if indexing) are generated in the same FASTQ file

A

F

generated on independent fastq files

35
Q

How are paired sequences associated?

A

by read name
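
A minimal sketch of pairing by read name, assuming each file has already been loaded into a dict of name to sequence. The names and sequences are made up, and `pair_reads` is a hypothetical helper.

```python
def pair_reads(read1, read2):
    """Match Read 1 and Read 2 entries that share the same read name."""
    shared = read1.keys() & read2.keys()
    return {name: (read1[name], read2[name]) for name in sorted(shared)}


# Hypothetical contents of the two independent FASTQ files.
r1 = {"cluster_1": "ACGT", "cluster_2": "GGCC", "cluster_3": "TTAA"}
r2 = {"cluster_2": "AATT", "cluster_1": "CCGG"}
print(pair_reads(r1, r2))
```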

36
Q

What is FastQC?

A
  • program that returns variety of quality metrics
  • assumes and bases metrics off whole genome experiment

ex 1: per base sequence quality: how Phred scores change along the read length

ex 2: per sequence GC content: plots a normal distribution of expected GC content against the actual distribution

37
Q

Why might GC content of reads be skewed?

A
  • reads are skewed toward 1 enriched region
  • genes are enriched for GC content
  • if genomic GC is skewed, it might indicate that a problem occurred
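
A rough Python sketch of the per-read GC fraction that a GC content check starts from. The reads are made up, and `gc_fraction` is an illustrative helper, not part of FastQC.

```python
def gc_fraction(seq):
    """Fraction of bases in a read that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical reads: a GC-rich read pulls the observed distribution upward.
reads = ["ATGC", "GGCC", "AATT", "GCGCGCAT"]
print([gc_fraction(r) for r in reads])  # [0.5, 1.0, 0.0, 0.75]
```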
38
Q

What are 4 characteristics of 3rd gen sequencing platforms?

A
  1. true single molecule sequencing - no clonal amplification ***
  2. very long read lengths possible (can decouple library construction from sequencing)
  3. can detect base modifications
  4. high error rates (in bases & modifications)
39
Q

Name 2 3rd gen platforms

A
  1. Pacific Biosciences
  2. Oxford Nanopore Technologies

40
Q

What are some limitations to base detection with the Pacific Biosciences approach?

A
  1. background noise from unincorporated bases
  2. fluorescent dyes impact synthesis, thus lower fidelity
  3. no consensus call
  4. quality degrades at modified bases because the pause during the read differs from that of an unmodified base
41
Q

What is one approach to get around low quality in the Pacific Biosciences systems?

A

Circular consensus sequencing: the same base is called over and over again, which increases the number of measurements for that position and hence the quality
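
The idea of repeated measurements can be illustrated with a toy majority vote in Python. This is a deliberate simplification: it assumes the passes are already aligned and equal in length, whereas a real consensus caller would not rely on a plain vote. The passes are made up.

```python
from collections import Counter


def majority_consensus(passes):
    """Call each position by the most common base across repeated passes."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))


# Hypothetical repeated reads of the same circular template, each with
# at most one random error; the vote recovers the underlying sequence.
passes = ["ACGTA", "ACCTA", "ACGTA", "AGGTA"]
print(majority_consensus(passes))  # ACGTA
```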

42
Q

How are bases called in the Nanopore system?

A
  • Uses machine learning strategies and large training sets (of known sequence).
  • The software is taught the patterns of current change that determine the sequence, since the same series of bases gives the same current change