L6 Flashcards

1
Q

Fastq format

A

A FASTQ file normally uses four lines per sequence.

2
Q

Line 1 of fastq format

A

begins with an ‘@’ character, followed by a sequence identifier and an optional description.

3
Q

Line 2 of fastq format

A

raw sequence letters.

4
Q

Line 3 of the fastq format

A

begins with a ‘+’ character and is optionally followed by the same sequence identifier (and any description) again.

5
Q

Line 4 of the fastq format

A

encodes the quality values for the sequence in Line 2, and must contain the same number of symbols as letters in the sequence.
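
The four-line layout can be checked with a short Python sketch. This is only an illustration, assuming every record uses exactly four lines (no wrapped sequences); the names `FastqRecord` and `parse_fastq` are made up for this example and do not come from any library.

```python
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    identifier: str  # text after '@' on line 1 (identifier plus optional description)
    sequence: str    # line 2: raw sequence letters
    quality: str     # line 4: one quality symbol per sequence letter


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from FASTQ text, assuming exactly four lines per record."""
    # Drop trailing newlines and skip completely empty lines.
    stripped = [line.rstrip("\n") for line in lines if line.strip()]
    if len(stripped) % 4 != 0:
        raise ValueError("expected a multiple of four non-empty lines")
    for i in range(0, len(stripped), 4):
        header, sequence, separator, quality = stripped[i:i + 4]
        if not header.startswith("@"):
            raise ValueError(f"line 1 must start with '@': {header!r}")
        if not separator.startswith("+"):
            raise ValueError(f"line 3 must start with '+': {separator!r}")
        if len(quality) != len(sequence):
            raise ValueError("line 4 must be as long as line 2")
        yield FastqRecord(header[1:], sequence, quality)


if __name__ == "__main__":
    example = ["@read1 sample description\n", "ACGT\n", "+\n", "IIII\n"]
    for record in parse_fastq(example):
        print(record.identifier, record.sequence, record.quality)
```

The length check mirrors the rule on this card: Line 4 must contain as many symbols as Line 2 has letters.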

6
Q

What do genome assembly programs do

A

stitch together an organism’s chromosomes from fragmented reads of DNA

7
Q

Read

A

a DNA “word” that comes out of a sequencer

8
Q

Contig

A

a contiguous sequence, formed by several overlapping reads with no gaps, that represents a consensus region of DNA

9
Q

Supercontig

A

an ordered and oriented set of contigs, usually linked by mate pairs

10
Q

N50

A

An N50 contig size of N means that at least 50% of the assembled bases are contained in contigs of length N or larger.
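
A worked example may make the definition easier to remember. The sketch below is one common way to compute N50, assuming the contig lengths are already known; the function name `n50` and the example lengths are invented for illustration.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)  # longest contig first
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Stop at the first contig that brings the running sum to half the total.
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")


if __name__ == "__main__":
    # Total is 290 bases; 80 + 70 = 150 is the first running sum >= 145.
    print(n50([80, 70, 50, 40, 30, 20]))  # prints 70
```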

11
Q

Coverage

A

The number of times a genome has been sequenced (the depth of sequencing). C = LN / G, where L is the read length, N is the number of reads, and G is the genome length.
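
As a quick numerical check of the formula, here is a small sketch that takes L as the read length, N as the number of reads, and G as the genome length (the usual reading of C = LN / G). The numbers are invented for illustration.

```python
def coverage(read_length, num_reads, genome_length):
    """Average sequencing depth C = L * N / G."""
    return read_length * num_reads / genome_length


if __name__ == "__main__":
    # 3 million reads of 100 bases over a 5 Mb genome.
    print(coverage(100, 3_000_000, 5_000_000))  # prints 60.0
```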

12
Q

Assembly size

A

Number of nucleotides successfully assembled

13
Q

A scaffold is made of

A

contigs and gaps

14
Q

How can gap length be guessed correctly

A

by incorporating information from paired ends or mate pairs of different insert sizes

15
Q

Resequencing

A

  • Allows us to investigate potential SNPs associated with disease.
  • Allows us to investigate potential SNPs associated with individual populations.
  • Allows us to investigate potential SNPs associated with niche specification.

16
Q

De novo sequence assembly

A

assembling reads together, without a reference genome, so that they form a new, previously unknown sequence. It is orders of magnitude slower and more memory-intensive than mapping assembly.

17
Q

Comparative sequence assembly

A

assembling reads against an existing backbone or reference sequence, building a sequence that is similar but not necessarily identical to the backbone sequence.

18
Q

In absence of reference genome, what do we rely on

A

de novo assemblers

19
Q

What do de novo assemblers rely on

A

the fact that two reads that overlap significantly in their sequence are likely to represent neighboring segments of a genome. (k-mer value)

20
Q

When do problems arise with de novo assemblers

A

when the overlapping sequences fall within repetitive regions.

21
Q

What is the popular sequencing choice for De novo assembly

A

PacBio sequencing, owing to its long reads and falling costs

22
Q

Greedy assembly algorithm

A

It is used for organisms such as bacteria and single-celled eukaryotes, as they have simple genomes that are not very repetitive. It has some efficiency limitations.

23
Q

What has the greedy algorithm been superseded by

A

Graph methods

24
Q

Steps of greedy algorithm

A

(1) Calculate pairwise alignments of all fragments.
(2) Choose the two fragments with the largest overlap.
(3) Merge the chosen fragments.
(4) Repeat steps 2 and 3 until only one fragment is left.
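
The steps above can be sketched in Python. This is a simplified illustration, not a production assembler: it replaces pairwise alignment with exact suffix-prefix matching, ignores reverse complements and sequencing errors, and does not handle a fragment contained entirely inside another. The names `overlap` and `greedy_assemble` are chosen for this example.

```python
from itertools import permutations


def overlap(a, b, min_length=1):
    """Length of the longest suffix of a that equals a prefix of b."""
    for size in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(fragments, min_length=1):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        # Steps 1 and 2: score every ordered pair and keep the best one.
        best_size, best_a, best_b = 0, None, None
        for a, b in permutations(frags, 2):
            size = overlap(a, b, min_length)
            if size > best_size:
                best_size, best_a, best_b = size, a, b
        if best_size == 0:
            break  # nothing overlaps any more, so several fragments remain
        # Step 3: merge the chosen pair, writing the shared part only once.
        frags.remove(best_a)
        frags.remove(best_b)
        frags.append(best_a + best_b[best_size:])
    return frags


if __name__ == "__main__":
    print(greedy_assemble(["ATTAGACC", "GACCTGCC", "TGCCGGAA"]))
    # prints ['ATTAGACCTGCCGGAA']
```

Scoring every ordered pair on each pass is where the efficiency limitation shows: the work grows roughly with the square of the number of fragments per merge.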

25
Q

Alternative to greedy algorithm

A

de Bruijn graphs: ask whether the graph is Eulerian
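
For reference, a de Bruijn graph can be built from reads in a few lines. The sketch below assumes the common construction in which each k-mer becomes an edge from its first k-1 letters to its last k-1 letters; the function name `de_bruijn_edges` and the toy read are made up for illustration.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer to the list of (k-1)-mers it points to."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix node to its suffix node.
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    # 3-mers of AAABB are AAA, AAB, ABB.
    print(de_bruijn_edges(["AAABB"], 3))
    # prints {'AA': ['AA', 'AB'], 'AB': ['BB']}
```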

26
Q

What is an eulerian graph

A

A graph is considered Eulerian if the graph is both connected and has a closed trail (a walk with no repeated edges) containing all edges of the graph.

27
Q

What does it mean when a graph is connected

A

if each node can be reached from some other node.
* A node is balanced if its indegree equals its outdegree.
* A node is semi-balanced if its indegree differs from its outdegree by 1.
* A directed, connected graph is Eulerian if and only if it has at most 2 semi-balanced nodes and all other nodes are balanced.
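
The degree rule can be tested with a short sketch. It assumes the graph is given as a list of directed (source, target) edges and that connectivity has already been established separately; the names `classify_nodes` and `is_eulerian` are invented for this example.

```python
from collections import Counter


def classify_nodes(edges):
    """Return (balanced, semi_balanced) node sets for a list of directed edges."""
    indegree, outdegree = Counter(), Counter()
    for source, target in edges:
        outdegree[source] += 1
        indegree[target] += 1
    nodes = set(indegree) | set(outdegree)
    balanced = {n for n in nodes if indegree[n] == outdegree[n]}
    semi_balanced = {n for n in nodes if abs(indegree[n] - outdegree[n]) == 1}
    return balanced, semi_balanced


def is_eulerian(edges):
    """Apply the degree rule only; the caller must check connectivity."""
    balanced, semi_balanced = classify_nodes(edges)
    nodes = {node for edge in edges for node in edge}
    # At most 2 semi-balanced nodes, and every other node balanced.
    return len(semi_balanced) <= 2 and (balanced | semi_balanced) == nodes


if __name__ == "__main__":
    # AA has outdegree 2 and indegree 1, BB has indegree 1 and outdegree 0.
    print(is_eulerian([("AA", "AA"), ("AA", "AB"), ("AB", "BB")]))  # prints True
    # A has outdegree 3 and indegree 0, so it is neither balanced nor semi-balanced.
    print(is_eulerian([("A", "B"), ("A", "C"), ("A", "D")]))  # prints False
```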