Algos Problem 3: Assembly of Genomes Flashcards

1
Q

Hamiltonian path / cycle ?

A

Path: visits each vertex exactly once.

A Hamiltonian cycle (or Hamiltonian circuit) is a cycle that visits each vertex exactly once.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

How do we represent graphs in computer
code / data structures?

A

Adjacency matrix [aka array]
(also called transition matrix for directed graphs)
(Q: anything special about the adjacency matrix for
undirected graphs?)

Adjacency list (queue)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

When is a graph eulerian?

A

A Graph is Eulerian if It Contains an Eulerian Cycle.

Every Eulerian graph is balanced

A graph is balanced if indegree = outdegree for each node

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Eulerian trail?

A

For the existence of Eulerian trails it is necessary that zero or two vertices have an odd degree.

If such a walk exists, the graph is called traversable or semi-eulerian.[4]

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is a heuristic?

A

In computer science, a heuristic is a problem-solving strategy or method that is not guaranteed to find the optimal solution, but is designed to find a satisfactory solution in a reasonable amount of time.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Sometimes there is more than one solution to a de bruijn graph for a genome assembly. What can you do to avoid the repeat problem?

A

DNA Sequencing with Read-pairs
- Multiple identical copies of genome
- Randomly cut genomes into large equally sized fragments of size InsertLength
- Generate read-pairs: two reads from the ends of each fragment (separated by a fixed distance)

Paired de Bruijn graph for a collection of paired k-mers:
– Represent every paired k-mer as an edge between its paired prefix and paired suffix.
– Glue ALL nodes with identical labels.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Contig?

A

A contig (from contiguous) is a set of overlapping DNA segments that together represent a consensus region of DNA.[1] In bottom-up sequencing projects, a contig refers to overlapping sequence data

Contig: maximal non-branching paths (i.e. fragments) for which in(v) = out(v) = 1 for all nodes except first and last

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Four Ridiculously Unrealistic Assumptions in genome assembly

A
  • Perfect coverage of genome by reads (every k-merfrom the genome is represented by a read)
  • Reads are error-free.
  • Multiplicities of k-mers are known
  • Distances between reads within read-pairs are exact.

also: We don’t know the orientation of reads (forward/reverse) Etc., etc., etc.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Errors in reads –> what in the graph?

A

Errors in Reads Lead to Bubbles in the
De Bruijn Graph

How well did you know this?
1
Not at all
2
3
4
5
Perfectly