Lecture 17 - Assembly Flashcards by Lina Zhuge

what does whole genome shotgun sequencing use

short reads sampled from all chromosomes of a given genome

how long are the short reads used for whole genome shotgun sequencing

100-250 bp

what makes assembly possible

over-sampling: the genome is sequenced many times over, so reads overlap with other reads

what is sequencing coverage

the number of times a specific region in a genome is sequenced
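
As a rough sketch of the definition above, average coverage is commonly estimated as (number of reads x read length) / genome length. The function name and the example numbers are illustrative assumptions, not values from the lecture.

```python
def average_coverage(num_reads: int, read_length: int, genome_length: int) -> float:
    """Estimate mean coverage: total sequenced bases divided by genome size."""
    return num_reads * read_length / genome_length


# Hypothetical example: 2 million reads of 150 bp over a 5 Mbp genome
print(average_coverage(2_000_000, 150, 5_000_000))  # 60.0
```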

what is de novo sequence assembly

the reconstruction of a sequence up to chromosome length, without reference to an existing genome

why do errors in sequence assembly occur

genome sequences typically contain repeat regions

what are paired-end reads used for

to help resolve local repeats

what are paired-end reads

sequences from either end of a longer sequence of known length

why do paired ends help map reads over repetitive regions more precisely

the length of the fragment that both reads come from is known, so the distance between the two reads is constrained

why are paired-end reads used instead of longer reads

longer reads are more expensive and less accurate at the ends

how do greedy assembly methods work

joining best overlapping reads if consistent with existing assembly
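
A minimal sketch of the greedy idea, assuming error-free reads and exact suffix-prefix overlaps. The function names, the `min_len` threshold, and the toy reads are hypothetical, and the "consistent with existing assembly" check is simplified to merging the best-overlapping pair at each step.

```python
def overlap_len(a: str, b: str, min_len: int = 1) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)  # (overlap length, index of left, index of right)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap_len(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            return contigs  # no remaining overlaps: these are the final contigs
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))  # ['ATGGCGTACCTGA']
```

Because each step commits to the locally best merge, a repeat can cause an early merge that blocks the correct layout later.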

what is the disadvantage of greedy assembly methods

the final result is not guaranteed to be optimal

what do nodes represent in the context of graphs for assembly

sequences

what do edges represent in the context of graphs for assembly

directional (5’ -> 3’) overlap: the end of one sequence matches the start of the next

what is the goal in sequence overlap when represented as a graph

to find a single, non-overlapping path that connects the nodes

what does overlap-layout consensus create

a graph with a node for each read and edges connecting overlapping reads

what do edges represent in overlap-layout-consensus

pairs of reads that overlap sufficiently well (e.g. at least 20 bp overlap)
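
The overlap step can be sketched as follows, assuming exact suffix-prefix matches. The function names and toy reads are hypothetical, and the example uses a 3 bp threshold only so that short reads can illustrate the idea (the card's example threshold is 20 bp).

```python
from itertools import permutations


def suffix_prefix_overlap(a: str, b: str, min_overlap: int) -> int:
    """Longest suffix of a matching a prefix of b, or 0 if below min_overlap."""
    for n in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def build_overlap_graph(reads: list[str], min_overlap: int = 20) -> dict:
    """One node per read; a directed edge a -> b for each sufficient overlap."""
    graph = {read: [] for read in reads}
    for a, b in permutations(reads, 2):  # every ordered pair of reads
        n = suffix_prefix_overlap(a, b, min_overlap)
        if n:
            graph[a].append((b, n))
    return graph


graph = build_overlap_graph(["ATGGCGT", "GCGTACC", "TACCTGA"], min_overlap=3)
print(graph)
```

The loop over every ordered pair of reads is what makes the number of overlap comparisons grow quadratically with the number of reads.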

what are the disadvantages of overlap-layout-consensus

pairwise overlap calculations may have high computational overhead, and a lot of memory is required

what do De Bruijn Graphs (DBG) use

exact substrings of length k

what does DBG create

a graph of k-mers that overlap by k-1 letters
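
A minimal sketch of this construction, with k-mers as nodes and an edge between consecutive k-mers of a read (which overlap by k-1 letters). The function names and toy reads are hypothetical; real assemblers also handle reverse complements and k-mer counts, which are omitted here.

```python
def kmers(seq: str, k: int) -> list[str]:
    """All exact substrings of length k, in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_dbg(reads: list[str], k: int) -> dict[str, set[str]]:
    """Map each k-mer to the set of k-mers that follow it in some read."""
    graph: dict[str, set[str]] = {}
    for read in reads:
        ks = kmers(read, k)
        for kmer in ks:
            graph.setdefault(kmer, set())  # every k-mer becomes a node
        for left, right in zip(ks, ks[1:]):
            graph[left].add(right)  # left[1:] == right[:-1], a k-1 overlap
    return graph


dbg = build_dbg(["ATGGC", "GGCGT"], k=3)
for node, successors in dbg.items():
    print(node, "->", sorted(successors))
```

In this toy case the two reads share the k-mer GGC, so the graph is one chain: ATG -> TGG -> GGC -> GCG -> CGT.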

what are the differences between overlap-layout-consensus and DBG

overlap-layout-consensus uses long reads while DBG uses k-mers
the rules for overlap are different
finding a path for overlap-layout-consensus is more difficult than finding a path for DBG

what should the sequencing depth be for DBG

~40x

if k-mers are of length 31 how many possible k-mers are there

4^31
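
The count follows from 4 possible letters (A, C, G, T) at each of the 31 positions. A quick check of the arithmetic, with variable names chosen only for illustration:

```python
k = 31
possible_kmers = 4 ** k  # 4 choices of base at each of k positions

print(possible_kmers)             # 4611686018427387904
print(possible_kmers == 2 ** 62)  # True: 2 bits per base, 62 bits in total
```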

what does the DBG graph look like if there are no repeat k-mers and no errors

a single long chain

what happens if there's an error in DBG

two separate paths are created

what happens if there are repeat k-mers in DBG

a "join" is created: two chains representing different parts of the genome join together
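
Both effects can be seen in a toy graph. This is a sketch under simplifying assumptions (k = 3, made-up sequences, no reverse complements); the helper names are hypothetical.

```python
def build_dbg(reads: list[str], k: int) -> dict[str, set[str]]:
    """Map each k-mer to the set of k-mers that follow it in some read."""
    graph: dict[str, set[str]] = {}
    for read in reads:
        ks = [read[i:i + k] for i in range(len(read) - k + 1)]
        for kmer in ks:
            graph.setdefault(kmer, set())
        for left, right in zip(ks, ks[1:]):
            graph[left].add(right)
    return graph


def branch_nodes(graph: dict[str, set[str]]) -> list[str]:
    """k-mers with more than one successor (paths split here)."""
    return sorted(n for n, succ in graph.items() if len(succ) > 1)


def join_nodes(graph: dict[str, set[str]]) -> list[str]:
    """k-mers with more than one predecessor (paths merge here)."""
    indegree: dict[str, int] = {}
    for successors in graph.values():
        for s in successors:
            indegree[s] = indegree.get(s, 0) + 1
    return sorted(n for n, d in indegree.items() if d > 1)


# A correct read plus a copy with one substituted base (C -> A)
error_graph = build_dbg(["ATGGCGTAC", "ATGGAGTAC"], k=3)
print(branch_nodes(error_graph), join_nodes(error_graph))  # ['TGG'] ['GTA']

# A sequence in which the k-mer CGT occurs twice in different contexts
repeat_graph = build_dbg(["AACGTCCCGTGG"], k=3)
print(branch_nodes(repeat_graph), join_nodes(repeat_graph))  # ['CGT'] ['CGT']
```

In the first graph the error opens a second path at TGG that rejoins the correct one at GTA. In the second, the repeated k-mer CGT is shared by two parts of the sequence, so two chains run into it and two run out.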

what is required to resolve repeats in DBG to two separate paths

information from sequences longer than the k-mer length (reads or paired-end reads)

what are the three methods to resolve graph complexity for DBG

- read threading
- mate threading
- path following

how does read threading work

join paths across repeats that are shorter than read lengths

how does mate threading work

join paths across repeats shorter than paired-end distances

how does path following work

choose the path whose length is compatible with the paired-end distance; if it is not possible to select the correct path, the graph can be cut into multiple contigs
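
A simplified sketch of that selection rule, assuming each candidate path is already spelled out as a sequence and the paired-end distance is known with some tolerance. The function name, parameters, and numbers are hypothetical.

```python
from typing import Optional


def pick_path(candidate_paths: list[str], expected_distance: int,
              tolerance: int) -> Optional[str]:
    """Return the one path whose length fits the paired-end distance.

    Returns None when zero or several paths fit, which is the case where
    the graph would be cut into separate contigs instead.
    """
    fits = [p for p in candidate_paths
            if abs(len(p) - expected_distance) <= tolerance]
    return fits[0] if len(fits) == 1 else None


short_path = "ACGT" * 5   # 20 bp
long_path = "ACGT" * 50   # 200 bp
chosen = pick_path([short_path, long_path], expected_distance=200, tolerance=20)
print(len(chosen))  # 200
```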

what is the runtime of OLC

exponential time

what is the runtime of DBG

linear in the number of k-mers, which is roughly the length of the genome minus k

what is SPAdes

a single-cell assembler that builds multiple DBG graphs

how long do the reads have to be for SPAdes

at least the length of the longest k-mer

Lecture 17 - Assembly Flashcards

(35 cards)