Genome Assembly Flashcards

1
Q

How are reads, contigs and scaffolds related?

A

Reads are assembled into contigs; contigs are then ordered and oriented into scaffolds, with gaps between them.

2
Q

What do N50 and L50 tell us about a dataset?

A

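Broadly, the quality of the assembly in terms of contiguity: a higher N50 and a lower L50 mean the assembly is made up of fewer, longer contigs. Neither value says whether the contigs themselves are correct.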

3
Q

What is the definition of N50?

A

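The length of the shortest contig in the smallest set of longest contigs that together cover at least 50% of the total assembly length.

Equivalently: sort the contigs from longest to shortest and add up their lengths; N50 is the length of the contig at which the running total first matches or passes 50% of the total length.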
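
The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the function name `n50` and the example contig lengths are made up for the card.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)  # longest first
    half_total = sum(lengths) / 2
    running_total = 0
    for length in lengths:
        running_total += length
        # the first contig that takes us to (or past) 50% defines N50
        if running_total >= half_total:
            return length
    raise ValueError("no contigs given")


# Hypothetical assembly: total length 300, so the 50% mark is 150.
# 80 + 70 = 150 reaches it, so N50 is 70.
print(n50([80, 70, 50, 40, 30, 20, 10]))  # 70
```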

4
Q

What is the definition of L50?

A

The smallest number of contigs whose combined length makes up at least half of the total assembly length.

With the contigs sorted from longest to shortest, L50 is the rank (position) of the contig whose length is the N50.
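
As a rough sketch, L50 can be computed by counting contigs, longest first, until half of the total length is reached. The function name `l50` and the example lengths are hypothetical.

```python
def l50(contig_lengths):
    """Return the L50 (a count of contigs) of a list of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)  # longest first
    half_total = sum(lengths) / 2
    running_total = 0
    for count, length in enumerate(lengths, start=1):
        running_total += length
        # stop at the first contig that reaches 50% of the total length
        if running_total >= half_total:
            return count
    raise ValueError("no contigs given")


# Hypothetical assembly: total length 300; the two longest contigs
# (80 + 70 = 150) already cover half of it, so L50 is 2.
print(l50([80, 70, 50, 40, 30, 20, 10]))  # 2
```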

5
Q

How do reads get transformed into contigs?

A

They are overlapped by finding the longest suffix of one read that matches a prefix of the next, then merging the reads across the shared bases:

ATTGGC
 TTGGCT
  TGGCTC
ATTGGCTC  (output)
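
A minimal sketch of this merge, assuming the reads are error-free and already given in left-to-right order (a real assembler has to work out the order and tolerate sequencing errors). The helper names `overlap_length` and `merge_reads` are hypothetical.

```python
def overlap_length(left, right, min_overlap=1):
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(reads, min_overlap=1):
    """Merge reads that are already in left-to-right order into one contig."""
    contig = reads[0]
    for read in reads[1:]:
        size = overlap_length(contig, read, min_overlap)
        contig += read[size:]  # append only the bases not already covered
    return contig


print(merge_reads(["ATTGGC", "TTGGCT", "TGGCTC"]))  # ATTGGCTC
```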
6
Q

What can we determine from overlaps in reads when constructing a contig?

A

The coverage (depth) of each base, i.e. how many reads overlap that position. This gives us a measure of how confident we should be that the base is correct.
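
To make the idea concrete, here is a small sketch that counts how many reads cover each position of a contig. It assumes each read's start offset within the contig is already known; the function name `per_base_coverage` and the offsets are illustrative only.

```python
def per_base_coverage(placed_reads, contig_length):
    """Count the reads covering each contig position.

    `placed_reads` is a list of (start_offset, read) pairs.
    """
    coverage = [0] * contig_length
    for start, read in placed_reads:
        for position in range(start, start + len(read)):
            coverage[position] += 1
    return coverage


# Three 6-base reads tiled over an 8-base contig, each shifted by one base.
placed = [(0, "ATTGGC"), (1, "TTGGCT"), (2, "TGGCTC")]
print(per_base_coverage(placed, 8))  # [1, 2, 3, 3, 3, 3, 2, 1]
```

The bases in the middle are supported by three reads, while the two end bases are supported by only one each.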

7
Q

What is k-mer analysis?

A

Analysing a DNA sequence in terms of its substrings of length k (its k-mers), typically by counting how often each one occurs.

For 1-mer analysis we count the individual bases A, T, G and C; for 2-mer analysis we count the 16 possible pairs AA, AT, AG, AC, and so on.
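
A short sketch of k-mer counting using a sliding window of width k. The function name `count_kmers` and the example sequence are chosen for illustration.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping substring of length k in `sequence`."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


sequence = "ATTGGCTC"
print(count_kmers(sequence, 1))  # base counts: T=3, G=2, C=2, A=1
print(count_kmers(sequence, 2))  # the 7 overlapping 2-mers, each seen once
```

A sequence of length n contains n - k + 1 overlapping k-mers, which is why the 8-base example yields seven 2-mers.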

8
Q

How do we develop a Hamiltonian graph of k-mers?

A
  • Order k-mers alphabetically
  • Only consider distinct k-mers
  • Each k-mer is a node
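  • Add a directed edge from one node to another where the (k-1)-base suffix of the first equals the (k-1)-base prefix of the second.

e.g. TAA -> AAT is an edge (suffix AA equals prefix AA), but AAT -> TAA is not (suffix AT does not equal prefix TA).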
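
The construction above could be sketched as follows. The function name `overlap_graph` is hypothetical, and skipping self-loops (a k-mer such as AAA pointing to itself) is an assumption made here, since they cannot help a walk that visits each node once.

```python
def overlap_graph(kmers):
    """Build a directed graph with one node per distinct k-mer.

    An edge source -> target exists when the last k-1 bases of `source`
    equal the first k-1 bases of `target`.
    """
    nodes = sorted(set(kmers))  # distinct k-mers, in alphabetical order
    edges = {node: [] for node in nodes}
    for source in nodes:
        for target in nodes:
            # assumption: self-loops are ignored
            if source != target and source[1:] == target[:-1]:
                edges[source].append(target)
    return edges


graph = overlap_graph(["TAA", "AAT", "ATG", "TGC"])
for node, targets in graph.items():
    print(node, "->", targets)
```

Here TAA points to AAT, AAT to ATG and ATG to TGC, so visiting every node once in that order spells TAATGC.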

9
Q

What are the steps to develop a De Bruijn graph?

A
  1. Decompose every sequence into k-mers
  2. Divide each k-mer into its prefix and suffix, each of length (k-1)
  3. Construct the graph
    • Each (k-1)-mer is a node
    • Each k-mer becomes a directed edge from its prefix to its suffix
  4. Find an Eulerian walk through the graph
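
The steps above can be sketched as follows. This is a simplified illustration: it assumes error-free reads, keeps only distinct k-mers (so repeats in the genome are collapsed), and assumes that a walk using every edge once exists. The walk is found with Hierholzer's algorithm; all function names are hypothetical.

```python
def de_bruijn_graph(sequences, k):
    """Map each (k-1)-mer prefix to the list of suffixes it points to."""
    kmers = set()
    for sequence in sequences:
        for i in range(len(sequence) - k + 1):
            kmers.add(sequence[i:i + k])  # assumption: distinct k-mers only
    graph = {}
    for kmer in sorted(kmers):
        # one directed edge per k-mer: prefix -> suffix
        graph.setdefault(kmer[:-1], []).append(kmer[1:])
    return graph


def eulerian_walk(graph):
    """Hierholzer's algorithm; assumes a walk over every edge exists."""
    out_edges = {node: list(targets) for node, targets in graph.items()}
    in_degree = {}
    for targets in out_edges.values():
        for target in targets:
            in_degree[target] = in_degree.get(target, 0) + 1
    # start at the node with one more outgoing than incoming edge, if any
    start = next(iter(out_edges))
    for node, targets in out_edges.items():
        if len(targets) - in_degree.get(node, 0) == 1:
            start = node
            break
    stack, walk = [start], []
    while stack:
        node = stack[-1]
        if out_edges.get(node):
            stack.append(out_edges[node].pop())  # follow an unused edge
        else:
            walk.append(stack.pop())  # dead end: node is finished
    return walk[::-1]


def spell(walk):
    """Rebuild the sequence: first node plus the last base of each later node."""
    return walk[0] + "".join(node[-1] for node in walk[1:])


graph = de_bruijn_graph(["ATTGGC", "TTGGCT", "TGGCTC"], k=3)
print(spell(eulerian_walk(graph)))  # ATTGGCTC
```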
10
Q

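What is an Eulerian graph?

A

A graph in which you can traverse each edge exactly once in a single walk. (Strictly, the term is reserved for graphs where that walk can end back where it started.)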
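
For a directed graph, whether such a traversal exists can be checked without searching for it. The sketch below uses the standard degree rule: ignoring direction, all edges must be connected, and either every node has equal in-degree and out-degree, or exactly one node has one extra outgoing edge (the start) and exactly one has one extra incoming edge (the end). The function name `has_eulerian_walk` is hypothetical.

```python
from collections import defaultdict


def has_eulerian_walk(edges):
    """Check a directed graph, given as (source, target) pairs."""
    if not edges:
        return True
    balance = defaultdict(int)     # out-degree minus in-degree
    neighbours = defaultdict(set)  # undirected view, for connectivity
    for source, target in edges:
        balance[source] += 1
        balance[target] -= 1
        neighbours[source].add(target)
        neighbours[target].add(source)
    # every node must be reachable from the first one, ignoring direction
    seen, stack = set(), [edges[0][0]]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(neighbours[node] - seen)
    if seen != set(neighbours):
        return False
    unbalanced = sorted(b for b in balance.values() if b != 0)
    return unbalanced == [] or unbalanced == [-1, 1]


print(has_eulerian_walk([("AT", "TT"), ("TT", "TG")]))  # True: a simple path
print(has_eulerian_walk([("A", "B"), ("A", "C")]))      # False: a fork
```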

11
Q

What is an Eulerian walk?

A

A route through a graph that traverses each edge exactly once.
