Week 5 (Long Read and Element) Flashcards

1
Q

ability to resolve a _________ structure is dependent on the length of the molecules in your library

A

repetitive

2
Q

segmental duplications

A

“low copy repeats”: blocks of DNA that range from 1 to 400 kb in length, occur at more than one site within the genome, and typically share a high level (>90%) of sequence identity.

3
Q

segmental duplications make up about ____% of the human genome

A

5

4
Q

long read technology

A
  • Oxford Nanopore (ONT) (protein nanopores)
  • Pacific Biosciences - PacBio (SMRT)
  • proximity ligation (assembly)
5
Q

ONT

A

Oxford Nanopore (protein nanopores)

6
Q

SMRT

A

single molecule real time sequencing

7
Q

__________ is a heptameric protein pore with an inner diameter of a few nanometers

A

α-hemolysin

8
Q

the diameter of α-hemolysin is on the same scale as many single molecules, including DNA. Why?

A

so that a single strand of DNA can be threaded (extruded) through the pore in the membrane

9
Q

ONT (protein nanopore) can be used in real time in the field. 10-20 Gb are read in less than _______

A

24 hours (standard is 72 hours)

10
Q

where is α-hemolysin derived from?

A

it was discovered in Staphylococcus aureus (staph); the pathogen uses this protein pore to penetrate cells in the body

11
Q

how is DNA moved through the membrane using a protein nanopore?

A

the pore sits in the membrane; a tether holds the DNA at the pore, and a motor protein moves the DNA through the pore

12
Q

How do we read the bases as they exit the protein pore?

A

as the DNA goes through the pore, each base has its own structure and disrupts the ion current through the pore in a base-specific way, so we can estimate which base is passing through from the change in current

13
Q

using protein pores, _____ bases are read per second

A

400

14
Q

long read sequencers are important for resolving ________ sequences

A

repeat

15
Q

____ Mb is the longest read that has been reported (in principle, the longest possible read is the largest chromosome)

A

4.2

16
Q

selective sequencing

A

the protein nanopore is able to choose only the sequences that we are interested in; it will reject and eject a molecule if it has seen it already and then restart with a new sequence

17
Q

why is selective sequencing a really great tool?

A

it will save time and resources

18
Q

what can protein nanopores read in one read?

A
  • bases sequenced
  • bases inserted
  • bases deleted
  • SNVs
  • CpG methylations
19
Q

centromeres are found in the ________ of the chromosome

A

middle

20
Q

telomeres are found at the _____ of the chromosome

A

ends

21
Q

what is the difference between the average read lengths of Illumina sequencing and ONT (protein pore)?

A
  • Illumina = 150 bp
  • ONT = 33-35 kb
22
Q

what is a major benefit of ONT (protein nanopore)?

A

read length

23
Q

what is the average read length of ONT (protein nanopore)?

A

33-35 kb

24
Q

in ONT, about _____ bp/sec

A

400
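
To put the ~400 bases/sec figure in context, the sketch below estimates how long a single pore would need to read one molecule. The function name and constant are illustrative only; the rate and read lengths are the ones quoted in these cards, and the sketch assumes a constant speed, which real runs will not hold exactly.

```python
# Rough arithmetic only: assumes a constant 400 bases per second per pore.
BASES_PER_SECOND = 400

def seconds_to_read(read_length_bp, rate=BASES_PER_SECOND):
    """Time (in seconds) for one pore to read one molecule of the given length."""
    return read_length_bp / rate

# An average-length ONT read (33 kb), in seconds.
print(seconds_to_read(33_000))
# The 4.2 Mb record read, in hours.
print(seconds_to_read(4_200_000) / 3600)
```

At this rate an average 33 kb read takes on the order of a minute and a half, while a multi-megabase read occupies a pore for hours.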

25
Q

what type of error occurs in ONT (protein nanopore)? what is the accuracy?

A
  • homopolymers
  • accuracy: >99%
26
Q

homopolymers

A

a type of repeat: a run of the same base repeated consecutively (e.g., AAAAAA)

27
Q

SINEs: _____ bp
LINEs: _____ bp

A
  • SINEs: 500 bp
  • LINEs: 5,000 bp
28
Q

What type of technology is needed to read LINEs accurately?

A

LINEs need long reads; short reads would be mostly inaccurate
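
Why read length matters can be sketched with a simple rule of thumb: a read resolves a repeat only if it covers the whole repeat plus some unique sequence on both sides. The function name and the 50 bp flank are assumptions made for illustration; the repeat and read lengths are the ones quoted in these cards.

```python
def spans_repeat(read_length_bp, repeat_length_bp, flank_bp=50):
    """True if one read can cover the whole repeat plus unique flanks on both sides.

    flank_bp is a hypothetical amount of unique sequence needed on each side
    to anchor the read; it is not a value from the course material.
    """
    return read_length_bp >= repeat_length_bp + 2 * flank_bp

LINE_BP = 5_000   # approximate LINE length from the cards
SINE_BP = 500     # approximate SINE length from the cards

print(spans_repeat(150, LINE_BP))     # Illumina-length read vs. a LINE
print(spans_repeat(150, SINE_BP))     # Illumina-length read vs. a SINE
print(spans_repeat(33_000, LINE_BP))  # average ONT read vs. a LINE
```

Under this rule a 150 bp read cannot span either repeat type, while an average ONT read spans a LINE with room to spare.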

29
Q

ONT resolves the problem of ___________ __________ while Illumina does not because of its short reads

A

segmental duplications

30
Q

nano-wells called zero-mode wave guides (ZMWs)

A

nano-wells that physically confine the laser excitation light to a tiny volume at the bottom of the well, so the fluorescence from a single molecule can be detected

31
Q

where is the fluor in PacBio to read the sequence? what is its formal name?

A

on the phosphate of the nucleotide (phospholinked nucleotide)

32
Q

the mean read lengths of pacbio are >_____ kb

33
Q

what makes pacbio really accurate?

A

CCS (circular consensus sequencing) / high-fidelity (HiFi) system: it is able to go over the same strand multiple times and build a consensus from the passes

34
Q

would you rather have systematic error or random error?

A

random error is better than systematic error because random error can be overcome with depth of sequencing (reading the same region more times), whereas systematic error will not be overcome
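
A small simulation can illustrate the point. This is a toy model, not how any real base caller or consensus tool works: the function names, the 10% error rate, and the read depth are all assumptions chosen for the example.

```python
import random
from collections import Counter

BASES = "ACGT"

def noisy_read(truth, error_rate, rng, systematic_pos=None):
    """Copy `truth` with random substitutions; optionally always miscall one position."""
    read = []
    for i, base in enumerate(truth):
        if systematic_pos is not None and i == systematic_pos:
            # Systematic error: every read makes the same wrong call here.
            read.append("A" if base != "A" else "C")
        elif rng.random() < error_rate:
            # Random error: a different wrong base at an unpredictable position.
            read.append(rng.choice([b for b in BASES if b != base]))
        else:
            read.append(base)
    return "".join(read)

def consensus(reads):
    """Majority vote at each position across equal-length reads."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))

truth = "ACGTACGTAC" * 5
rng = random.Random(0)

random_reads = [noisy_read(truth, 0.1, rng) for _ in range(31)]
print(consensus(random_reads) == truth)       # random errors average out with depth

biased_reads = [noisy_read(truth, 0.0, rng, systematic_pos=3) for _ in range(31)]
print(consensus(biased_reads)[3], truth[3])   # the systematic error survives the vote
```

Because random errors land in different places in each read, the majority vote recovers the true base; the systematic error appears in every read, so no amount of depth removes it.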

35
Q

what is a Q value?

A

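a Phred-scaled quality score that encodes the probability of error per base: Q = -10 × log10(P), so Q30 means a 1 in 1,000 chance that the base call is wrong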
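
The standard Phred relationship between a quality value and the per-base error probability is Q = -10 × log10(P). A minimal sketch of the conversion in both directions (the function names are illustrative):

```python
import math

def phred_to_error_prob(q):
    """Per-base error probability for a Phred quality score Q."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred quality score for a per-base error probability P (0 < P <= 1)."""
    return -10 * math.log10(p)

for q in (10, 20, 30, 40):
    print(q, phred_to_error_prob(q))
```

Each step of 10 in Q is a tenfold drop in error probability, so Q20 corresponds to 99% accuracy and Q30 to 99.9%.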

36
Q

PacBio reads ~____bp/sec

37
Q

what are common errors in PacBio?

A

homopolymers and indels

38
Q

what is a homopolymer?

A

the same base repeated over and over again in a row (e.g., TTTTTT)
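
As an illustration, homopolymer runs can be located in a sequence with a run-length scan. The function name and the minimum run length of 3 are assumptions for the example, not a standard threshold.

```python
from itertools import groupby

def homopolymer_runs(seq, min_len=3):
    """Return (start, base, length) for each run of one base at least min_len long."""
    runs = []
    pos = 0
    for base, group in groupby(seq):
        length = len(list(group))
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs

print(homopolymer_runs("ACGTTTTTGAAAC"))
# → [(3, 'T', 5), (9, 'A', 3)]
```

Runs like these are where signal-based long-read platforms tend to miscount bases, which shows up as insertions or deletions.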

39
Q

what is an indel?

A

(insertions and deletions) - a genetic mutation that occurs when one or more DNA bases are inserted or deleted from a genome

40
Q

what is PacBio’s accuracy?

41
Q

what is the common error type in Illumina? Why?

A

base substitution because it sequences one nucleotide at a time

42
Q

what is the common error type in ONT? Why?

A

homopolymers and indels because it is working so fast that it isn’t catching all of the sequence as it records the charge

43
Q

what is the common error type in PacBio? Why?

A

homopolymers and indels because it is working so fast that it isn’t catching all of them

44
Q

what is “proximity ligation: Hi-C”?

A

it is a library preparation step, genome scaffolding

45
Q

when a linear molecule is coiled up, is a given position more likely to be close to something a couple of kb away or several Mb away?

A

it will be closer to something a couple of kb away (the closer two sites are in linear space, the closer they should be in 3D space)

46
Q

when DNA is compacted into chromatin (a 3D structure), the DNA that is close together is ______________, trapping sequence interactions across the entire genome and between different chromosomes

A

cross linked

47
Q

what does cross linking do?

A

cross linking traps sequence interaction across the entire genome and between different chromosomes

48
Q

crosslinked DNA is fragmented with ____________

A

endonucleases

49
Q

what is proximity ligation?

A

after the crosslinked DNA is fragmented, the ends are biotinylated and ligated, creating chimeric junctions between adjacent sequences

50
Q

what is proximity ligation used for?

A

to assign contigs to chromosomes and to order and orient them along chromosome-scale scaffolds

51
Q

_______ is currently the long read technology of choice

52
Q

PacBio increases read lengths and increases throughput = ___________ cost per genome

A

decreasing

53
Q

ONT has extremely long reads but relatively few making it ___________

54
Q

which technology has no upper limit on the size of template that can be sequenced giving it huge potential?

A

Oxford Nanopore (ONT)

55
Q

what is the purpose of HiC?

A

order and orient contigs
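
The idea that contact frequency reflects linear distance can be sketched as a toy ordering problem: contigs that share more Hi-C contacts are placed next to each other. This is a simplified greedy illustration, not the algorithm of any real scaffolding tool; the contig names and contact counts are made up.

```python
def greedy_order(contacts, start):
    """Order contigs by repeatedly appending the one with the most contacts to the last placed.

    contacts: dict mapping frozenset({a, b}) -> number of Hi-C contacts (hypothetical data).
    """
    contigs = set()
    for pair in contacts:
        contigs |= pair
    order = [start]
    remaining = contigs - {start}
    while remaining:
        last = order[-1]
        # sorted() makes tie-breaking deterministic (alphabetical).
        nxt = max(sorted(remaining), key=lambda c: contacts.get(frozenset((last, c)), 0))
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Made-up contact counts: neighbours in the true order A-B-C-D share the most contacts.
toy_contacts = {
    frozenset(("A", "B")): 90,
    frozenset(("B", "C")): 80,
    frozenset(("C", "D")): 85,
    frozenset(("A", "C")): 20,
    frozenset(("B", "D")): 15,
    frozenset(("A", "D")): 5,
}
print(greedy_order(toy_contacts, "A"))
# → ['A', 'B', 'C', 'D']
```

This sketch only orders contigs; orienting them would also need to track which end of each contig the contacts come from.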

56
Q

Element is a competing ________ read sequencing technology

57
Q

in Element, an ______ is a dye-labeled polymer with multiple nucleotide arms carrying the same nucleotide base

A

avidite

58
Q

how are the bases detected in Element sequencing?

A

fluorescent signals in 4 channels correlate with A, T, C, or G avidites.

59
Q

steps of Element sequencing:

A
  1. bind avidite
  2. wash away unbound avidites
  3. bases are detected
  4. remove avidite
  5. step (extend by one base) and block
  6. remove blocks
  7. repeat
60
Q

Element has extremely good per-base accuracy and does well with ____________. This is a great method for reducing error.

A

homopolymers