Lecture 17 - Assembly Flashcards

1
Q

what does whole genome shotgun sequencing use

A

short reads sampled from all chromosomes of a given genome

2
Q

how long are the short reads used for whole genome shotgun sequencing

A

100-250 bp

3
Q

what makes assembly possible

A

over-sampling: the genome is sequenced many times over, so reads overlap with other reads

4
Q

what is sequencing coverage

A

the number of times a specific region in a genome is sequenced
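
Average coverage is commonly estimated as (number of reads × read length) / genome length. A minimal sketch of that arithmetic; the function name and the numbers are illustrative assumptions, not values from the lecture:

```python
def mean_coverage(num_reads: int, read_length: int, genome_length: int) -> float:
    """Average number of times each base is expected to be sequenced."""
    return num_reads * read_length / genome_length


# Hypothetical example: 1,000,000 reads of 150 bp from a 5 Mbp genome.
print(mean_coverage(1_000_000, 150, 5_000_000))  # 30.0
```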

5
Q

what is de novo sequence assembly

A

the reconstruction of a sequence up to chromosome length, without reference to an existing genome

6
Q

why do errors in assembly occur

A

genome sequences typically contain repeat regions

7
Q

what are paired-end reads used for

A

to help resolve local repeats

8
Q

what are paired-end reads

A

reads sequenced from both ends of a longer fragment of approximately known length
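
A rough sketch of how a read pair relates to its fragment, assuming the second read is reported as the reverse complement of the fragment's far end (a common convention, not something stated in the lecture). The fragment and function names are hypothetical:

```python
def reverse_complement(seq: str) -> str:
    """Reverse the sequence and swap each base for its complement."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def paired_end_reads(fragment: str, read_length: int):
    """Return (forward read, reverse read) taken from the two ends of a fragment."""
    forward = fragment[:read_length]
    reverse = reverse_complement(fragment[-read_length:])
    return forward, reverse


fragment = "ACGTTGCAAGGCTTAACCGGATCGATCCTA"  # hypothetical 30 bp fragment
r1, r2 = paired_end_reads(fragment, 5)
print(r1, r2)  # ACGTT TAGGA
```

The middle of the fragment is never read, but its length constrains how far apart the two reads can sit in the assembly.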

9
Q

why do paired ends help map reads over repetitive regions more precisely

A

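the approximate distance between the two reads (the fragment length) is known, so a read that falls in a repeat can be placed using its mate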

10
Q

why are paired ends used over longer reads

A

longer reads are more expensive and less accurate at the ends

11
Q

how do greedy assembly methods work

A

by repeatedly joining the best-overlapping reads, provided the join is consistent with the existing assembly

12
Q

what is the disadvantage of greedy assembly methods

A

the final result is not guaranteed to be optimal
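
A toy sketch of the greedy idea: repeatedly merge the pair of reads with the longest suffix-prefix overlap. It is a simplification under stated assumptions: exact overlaps only, no reverse complements, and no check for consistency with the existing assembly. All names are illustrative:

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Merge the best-overlapping pair until no overlaps of at least min_len remain."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)  # (overlap length, index of left read, index of right read)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # nothing left to join: the remaining reads stay as separate contigs
        merged = reads[i] + reads[j][n:]
        reads = [r for idx, r in enumerate(reads) if idx not in (i, j)] + [merged]
    return reads


print(greedy_assemble(["ATGGCG", "GCGTGC", "TGCAAT"]))  # ['ATGGCGTGCAAT']
```

Because each merge is chosen locally, a repeat can lure the algorithm into joining reads from different parts of the genome, which is why the result is not guaranteed to be optimal.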

13
Q

what do nodes represent in the context of graphs for assembly

A

sequences

14
Q

what do edges represent in the context of graphs for assembly

A

directional overlap: the 3' end of one sequence overlaps the 5' end of the next

15
Q

what is the goal in sequence overlap when represented as a graph

A

to find a single, non-overlapping path that connects all the nodes

16
Q

what does overlap-layout consensus create

A

a graph with a node for each read and edges connecting overlapping reads

17
Q

what do edges represent in overlap-layout-consensus

A

pairs of reads that overlap sufficiently well (e.g. at least 20 bp overlap)
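
A minimal sketch of the overlap step, assuming exact suffix-prefix matches and a small minimum overlap so the toy reads connect (real data would use a threshold such as the 20 bp above and tolerate mismatches). Function names are illustrative:

```python
from itertools import permutations


def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Longest suffix of a matching a prefix of b, or 0 if shorter than min_len (min_len >= 1)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def overlap_graph(reads, min_len: int):
    """Map each directed edge (a, b) to its overlap length; reads are the nodes."""
    graph = {}
    for a, b in permutations(reads, 2):  # every ordered pair: n * (n - 1) comparisons
        n = suffix_prefix_overlap(a, b, min_len)
        if n:
            graph[(a, b)] = n
    return graph


print(overlap_graph(["ATGGCG", "GCGTGC", "TGCAAT"], min_len=3))
```

The all-against-all loop is the pairwise comparison cost that makes this approach expensive for large read sets.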

18
Q

what are the disadvantages of overlap-layout-consensus

A

high computational overhead for the pairwise overlap calculations, and lots of memory is required

19
Q

what do De Bruijn Graphs (DBG) use

A

exact substrings of length k

20
Q

what does DBG create

A

a graph of k-mers that overlap by k-1 letters
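
A small sketch of this construction, following the card's formulation (k-mers as nodes, an edge between k-mers that are adjacent in a read and so overlap by k-1 letters). Other formulations put (k-1)-mers on the nodes; the names here are illustrative:

```python
from collections import defaultdict


def kmers(seq: str, k: int):
    """All substrings of length k, in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def de_bruijn_graph(reads, k: int):
    """Adjacency sets: each k-mer points to the k-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        ks = kmers(read, k)
        for left, right in zip(ks, ks[1:]):
            graph[left].add(right)
    return graph


graph = de_bruijn_graph(["ATGGCGT"], k=3)
for node in kmers("ATGGCGT", 3):
    print(node, "->", sorted(graph.get(node, ())))
```

With one error-free read and no repeated k-mers, every node has at most one successor, so the graph is a single chain.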

21
Q

what are the differences between overlap-layout-consensus and DBG

A
  • overlap-layout-consensus works on whole reads, while DBG breaks reads into k-mers
  • the rules for overlap are different
  • finding a path for overlap-layout-consensus is more difficult than finding a path for DBG
22
Q

what should the sequencing depth be for DBG

23
Q

if k-mers are of length 31, how many possible k-mers are there

A

4^31 (about 4.6 × 10^18), because each of the 31 positions can be any of 4 bases

24
Q

what does the DBG graph look like if there are no repeat k-mers and no errors

A

a single long chain

25
Q

what happens if there's an error in DBG

A

two separate paths are created

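A sketch of why one error splits the path: a single substituted base changes every k-mer that spans it (up to k of them), so the erroneous read leaves the true path and rejoins it later. The reads below are made up for illustration:

```python
def kmer_set(seq: str, k: int):
    """Set of all substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def error_only_kmers(true_read: str, error_read: str, k: int):
    """k-mers that appear only because of the sequencing error."""
    return kmer_set(error_read, k) - kmer_set(true_read, k)


true_read = "ATGGCGTACCTA"
error_read = "ATGGCGAACCTA"  # hypothetical read with one substitution (T -> A)
extra = error_only_kmers(true_read, error_read, k=4)
print(sorted(extra))  # ['AACC', 'CGAA', 'GAAC', 'GCGA']
```

Both reads share the k-mers before and after the error, so the graph forks after GGCG and merges again at ACCT.
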
26
Q

what happens if there are repeat k-mers in DBG

A

a "join" will be created: two chains representing different parts of the genome join together

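A sketch of how a repeated k-mer shows up in the graph: it is the node with more than one predecessor or successor. The sequence is a made-up example in which only GCT occurs twice:

```python
from collections import defaultdict


def branching_kmers(seq: str, k: int):
    """k-mers with more than one distinct predecessor or successor."""
    succ, pred = defaultdict(set), defaultdict(set)
    ks = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    for left, right in zip(ks, ks[1:]):
        succ[left].add(right)
        pred[right].add(left)
    return sorted(km for km in set(ks) if len(succ[km]) > 1 or len(pred[km]) > 1)


print(branching_kmers("TAGCTCCGCTAA", 3))  # ['GCT']
```

Here GCT is entered from AGC and CGC and left via CTC and CTA, so the k-mers alone cannot say which exit follows which entry.
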
27
Q

what is required to resolve repeats in DBG to two separate paths

A

information from sequences longer than the k-mer length (reads or paired-end reads)

28
Q

what are the three methods to resolve graph complexity for DBG

A
  • read threading
  • mate threading
  • path following
29
Q

how does read threading work

A

joins paths across repeats that are shorter than the read length

30
Q

how does mate threading work

A

joins paths across repeats that are shorter than the paired-end distance

31
Q

how does path following work

A

picks, among alternative paths, the one whose length is compatible with the paired-end distance; if the correct path cannot be selected, the graph is cut into multiple contigs

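A minimal sketch of the selection rule, assuming candidate paths are given as sequences and the paired-end distance comes with a tolerance. The function name, the tolerance parameter, and the numbers are hypothetical:

```python
def choose_path(candidate_paths, expected_distance: int, tolerance: int):
    """Return the one path whose length fits the paired-end distance, else None."""
    fitting = [p for p in candidate_paths
               if abs(len(p) - expected_distance) <= tolerance]
    # None means the choice is ambiguous or impossible: cut into separate contigs.
    return fitting[0] if len(fitting) == 1 else None


short_path = "ACGT" * 75   # 300 bp
long_path = "ACGT" * 130   # 520 bp
chosen = choose_path([short_path, long_path], expected_distance=500, tolerance=50)
print(len(chosen))  # 520
```
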
32
Q

what is the runtime of OLC

A

exponential time in the worst case, because the layout step amounts to finding a Hamiltonian path

33
Q

what is the runtime of DBG

A

linear in the number of k-mers, which is roughly (length of genome - k)

34
Q

what is SPAdes

A

a single-cell assembler that builds multiple de Bruijn graphs (one per k-mer length)

35
Q

how long do the reads have to be for SPAdes

A

at least the length of the longest k-mer