Genome Assembly Flashcards

Question 1

Q

How are reads, contigs and scaffolds related?

Answer

A

Reads are assembled into contigs, and contigs are then ordered and linked into scaffolds.

Question 2

Q

What do N50 and L50 tell us about a dataset?

Answer

A

Broadly, the quality of the assembly, measured by its contiguity: a higher N50 and a lower L50 indicate fewer, longer contigs.

Question 3

Q

What is the definition of N50?

Answer

A

The length of the shortest contig among the longest contigs that together cover at least 50% of the total assembly length.

Put another way, if contigs are added from longest to shortest, N50 is the length of the contig at which the running total reaches or passes 50% of the total length.
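
A minimal sketch of that calculation, assuming the contig lengths are already known. The function name `n50` and the example lengths are made up for illustration.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths."""
    ordered = sorted(contig_lengths, reverse=True)  # longest first
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:  # reached or passed 50% of the total
            return length
    raise ValueError("no contigs given")


lengths = [80, 70, 50, 40, 30, 20, 10]  # made-up lengths, total 300
print(n50(lengths))  # 70, because 80 + 70 = 150 reaches half of 300
```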

Question 4

Q

What is the definition of L50?

Answer

A

The smallest number of contigs whose lengths sum to at least half of the total assembly length.

L50 is the rank of the N50 contig when contigs are sorted from longest to shortest.
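
A minimal sketch of the count, using the same longest-first running total as N50. The function name `l50` and the example lengths are made up for illustration.

```python
def l50(contig_lengths):
    """Return how many of the longest contigs are needed to reach half the total length."""
    ordered = sorted(contig_lengths, reverse=True)  # longest first
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return count
    raise ValueError("no contigs given")


lengths = [80, 70, 50, 40, 30, 20, 10]  # made-up lengths, total 300
print(l50(lengths))  # 2, the two longest contigs already cover 150 of 300
```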

Question 5

Q

How do reads get transformed into contigs?

Answer

A

They are overlapped: where the end of one read matches the start of the next, the two are merged on their longest shared substring:

ATTGGC
 TTGGCT
  TGGCTC
ATTGGCTC - Output
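
The merge above can be sketched as follows. This is a simplified illustration that assumes the reads are error-free and already in left-to-right order. Real assemblers have to work out the order and tolerate mismatches. The names `overlap` and `merge_reads` are made up for the example.

```python
def overlap(left, right, min_length=1):
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_length - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(reads):
    """Merge reads, given in order, into a single contig."""
    contig = reads[0]
    for read in reads[1:]:
        shared = overlap(contig, read)
        contig += read[shared:]  # append only the part that is new
    return contig


print(merge_reads(["ATTGGC", "TTGGCT", "TGGCTC"]))  # ATTGGCTC
```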

Question 6

Q

What can we determine from overlaps in reads when constructing a contig?

Answer

A

The coverage (depth) of each base, i.e. how many reads overlap it. This gives us a measure of how confident we should be that the base is correct.
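
A small sketch of per-base coverage, assuming each read's start position on the contig is already known. The function name `coverage` and the `(offset, read)` layout are assumptions made for the example.

```python
def coverage(placed_reads, contig_length):
    """Count how many reads cover each position of the contig."""
    depth = [0] * contig_length
    for offset, read in placed_reads:
        for position in range(offset, offset + len(read)):
            depth[position] += 1
    return depth


# (start position on the contig, read sequence)
placed = [(0, "ATTGGC"), (1, "TTGGCT"), (2, "TGGCTC")]
print(coverage(placed, 8))  # [1, 2, 3, 3, 3, 3, 2, 1]
```

The bases in the middle are supported by three reads, while the first and last are supported by only one.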

Question 7

Q

What is k-mer analysis?

Answer

A

Analysing a DNA sequence by counting its substrings of length k (its k-mers).

For 1-mer analysis we count the single bases A, T, G and C; for 2-mer analysis we count the pairs AA, AT, AG, AC and so on.
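
A minimal sketch of k-mer counting with a sliding window. The function name `count_kmers` and the example sequence are made up for illustration.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every substring of length k in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


ones = count_kmers("ATTGGCTC", 1)
print(ones["A"], ones["T"], ones["G"], ones["C"])  # 1 3 2 2

twos = count_kmers("ATTGGCTC", 2)
print(sorted(twos))  # ['AT', 'CT', 'GC', 'GG', 'TC', 'TG', 'TT']
```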

Question 8

Q

How do we develop a Hamiltonian graph of k-mers?

Answer

A

Order k-mers alphabetically
Only consider distinct k-mers
Each k-mer is a node
Add a directed edge from one node to another where the (k-1)-base suffix of the first equals the (k-1)-base prefix of the second.

e.g. TAA -> AAT (suffix AA matches prefix AA), NOT AAT -> TAA
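
The steps above can be sketched as follows. The function name `overlap_graph` is made up, and skipping self-loops (a k-mer such as AAA pointing at itself) is an assumption of this sketch.

```python
def overlap_graph(kmers):
    """Build the k-mer graph: one node per distinct k-mer, edges on suffix/prefix matches."""
    nodes = sorted(set(kmers))  # distinct k-mers in alphabetical order
    edges = [
        (a, b)
        for a in nodes
        for b in nodes
        if a != b and a[1:] == b[:-1]  # (k-1) suffix of a equals (k-1) prefix of b
    ]
    return nodes, edges


nodes, edges = overlap_graph(["TAA", "AAT", "ATG"])
print(nodes)  # ['AAT', 'ATG', 'TAA']
print(edges)  # [('AAT', 'ATG'), ('TAA', 'AAT')]
```

A Hamiltonian path then visits every node exactly once, here TAA -> AAT -> ATG.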

Question 9

Q

What are the steps to develop a De Bruijn graph?

Answer

A

Decompose every sequence into k-mers
Divide each k-mer into its prefix and suffix, each of length (k-1)
Construct the graph
- Each (k-1)-mer is a node
- Each k-mer becomes a directed edge from its prefix node to its suffix node
Eulerian walk through the graph
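
A minimal sketch of the construction steps, storing the graph as an adjacency list. The function name `de_bruijn` is made up, and keeping repeated k-mers as repeated edges is an assumption of this sketch.

```python
from collections import defaultdict


def de_bruijn(sequences, k):
    """Map each (k-1)-mer prefix to the list of (k-1)-mer suffixes it points to."""
    graph = defaultdict(list)
    for sequence in sequences:
        for i in range(len(sequence) - k + 1):
            kmer = sequence[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])  # edge: prefix -> suffix
    return dict(graph)


print(de_bruijn(["TAATG"], 3))
# {'TA': ['AA'], 'AA': ['AT'], 'AT': ['TG']}
```

Nodes with no outgoing edge, such as TG here, appear only as targets.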

Question 10

Q

What is an Eulerian graph?

Answer

A

A graph that contains a closed walk (a cycle) that traverses each edge exactly once.
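
For a directed graph, a closed walk that uses every edge exactly once can exist only if every node has as many incoming as outgoing edges, and the edges must also be connected. A small sketch of the degree check alone, with connectivity deliberately left out. The function name `is_balanced` is made up for the example.

```python
from collections import Counter


def is_balanced(graph):
    """Check that every node has equal in-degree and out-degree."""
    out_degree = Counter({node: len(targets) for node, targets in graph.items()})
    in_degree = Counter(target for targets in graph.values() for target in targets)
    nodes = set(out_degree) | set(in_degree)
    return all(out_degree[node] == in_degree[node] for node in nodes)


print(is_balanced({"A": ["B"], "B": ["C"], "C": ["A"]}))        # True, a simple cycle
print(is_balanced({"TA": ["AA"], "AA": ["AT"], "AT": ["TG"]}))  # False, an open path
```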

Question 11

Q

What is an Eulerian walk?

Answer

A

A route through a graph that traverses every edge exactly once; nodes may be revisited.
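
A sketch of finding such a walk with Hierholzer's algorithm, which is one common approach and not necessarily the one intended here. It assumes the graph is an adjacency list of (k-1)-mers and that a walk using every edge exists from the chosen start node. The function name `eulerian_walk` is made up for the example.

```python
def eulerian_walk(graph, start):
    """Return the nodes of a walk that uses every edge once (Hierholzer's algorithm)."""
    remaining = {node: list(targets) for node, targets in graph.items()}  # unused edges
    stack, walk = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())  # follow an unused edge
        else:
            walk.append(stack.pop())  # dead end: node is finished
    return walk[::-1]


graph = {"TA": ["AA"], "AA": ["AT"], "AT": ["TG"]}
walk = eulerian_walk(graph, "TA")
print(walk)  # ['TA', 'AA', 'AT', 'TG']

# Spell out the sequence: first node, then the last base of each later node.
print(walk[0] + "".join(node[-1] for node in walk[1:]))  # TAATG
```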

Question 12

Q