Bioinformatics methods for analysis of bacterial genomes Flashcards

1
Q

What is bioinformatics?

A

The use of computational techniques to solve biological problems. It draws on, among other things, programming, maths, statistics, biology, machine learning, multi-omics, and DNA sequencing.

2
Q

Why are bacterial genomes dynamic?

A

Because mobile genetic elements can be gained through horizontal gene transfer, or lost again.

3
Q

What is de novo assembly?

A

De novo assembly is like doing a jigsaw puzzle without the picture on the box.

Reads -> contigs -> scaffolds -> chromosome

4
Q

What should be considered when assessing the quality of an assembly?

A

1) Size of the assembly: does it match estimates from other means?
2) Size of the contigs/scaffolds: are they reasonably long?
3) Are the expected “core genes” present in the assembly?
4) What fraction of reads map to the assembly?
5) Does the assembly contain sequences of contaminating organisms?
6) Is the assembly consistent with independently derived data?

5
Q

What tool is used to assess assembly quality?

A

QUAST

6
Q

What are the values considered for assembly quality?

A
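  • N50: A length-weighted measure of contig/scaffold size: the length L such that contigs (or scaffolds) of length L or longer contain at least half of the total assembly length. It is not a plain average.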
  • Maximum/median/average contig size after removal of the smallest contigs.
  • Number of Ns.
  • Total length of all contigs.
  • Genome coverage: The number of bases in the reference covered by the assembled contigs.
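
To make the N50 definition concrete, here is a minimal sketch in Python. The function name `n50` and the contig lengths are made up for illustration; this is not how QUAST itself is implemented.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths.

    Sort contigs from longest to shortest and add up their lengths;
    N50 is the length of the contig at which the running total first
    reaches half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs given")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (in bp), for illustration only.
example_lengths = [100, 80, 50, 30, 20, 10]
print("N50: ", n50(example_lengths))                         # 80
print("mean:", sum(example_lengths) / len(example_lengths))  # about 48.3
```

The N50 (80) is well above the mean (about 48.3) here, which shows why N50 should not be read as an average contig size.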
7
Q

What is coverage?

A

The number of reads that cover (support) a given position, also called sequencing depth.
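
As a rough illustration of per-position coverage, the sketch below counts how many reads overlap each position. The function name, the interval convention (0-based start, end-exclusive), and the read positions are assumptions made for this example.

```python
def depth_per_position(reference_length, alignments):
    """Count how many aligned reads cover each reference position.

    `alignments` is a list of (start, end) pairs, 0-based and
    end-exclusive (an assumption of this sketch).
    """
    depth = [0] * reference_length
    for start, end in alignments:
        for pos in range(max(start, 0), min(end, reference_length)):
            depth[pos] += 1
    return depth


# Three hypothetical reads aligned to a 10 bp reference.
reads = [(0, 5), (2, 8), (4, 10)]
depth = depth_per_position(10, reads)
print(depth)                    # [1, 1, 2, 2, 3, 2, 2, 2, 1, 1]
print(sum(depth) / len(depth))  # mean depth: 1.7
```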

8
Q

What is reference-guided assembly?

A

Reads are mapped to a reference genome. It is a slightly different, easier problem than de novo assembly, analogous to knowing roughly what the finished puzzle should look like.

Output: BAM/SAM file (alignment) or FASTA file (consensus)

9
Q

What are BAM/SAM files?

A

They contain reads together with their mapping information.
SAM = Sequence Alignment/Map
BAM = binary SAM
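
A SAM alignment line is tab-separated text with 11 mandatory fields. The sketch below splits one line into those fields; the example read and contig name are made up, and real workflows would normally use a dedicated library rather than hand parsing.

```python
# The 11 mandatory fields of a SAM alignment line, in order.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]


def parse_sam_line(line):
    """Parse one SAM alignment line (not a '@' header line) into a dict."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) < len(SAM_FIELDS):
        raise ValueError("expected at least 11 tab-separated fields")
    record = dict(zip(SAM_FIELDS, parts[:11]))
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        record[key] = int(record[key])
    # Bit 0x4 of FLAG marks a read that did not map.
    record["is_unmapped"] = bool(record["FLAG"] & 0x4)
    return record


# Hypothetical alignment line, for illustration only.
example = "read1\t0\tcontig_1\t101\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII"
rec = parse_sam_line(example)
print(rec["RNAME"], rec["POS"], rec["CIGAR"], rec["is_unmapped"])
```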

10
Q

What is BLAST?

A

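The Basic Local Alignment Search Tool: a tool, commonly used through its web interface, for sequence similarity searching. For each hit it reports, among other things, the query cover (how much of the query sequence is covered by the alignment) and the % identity.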

1) The query sequence is broken into “words” that will act as seeds in alignment
2) BLAST searches for matches (or synonyms) in target entries in the database
3) If a target entry has two or more matches to “words” from the query, the alignment is extended in both directions looking for additional similarity
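
The three steps above can be sketched in a few lines of Python. This is a simplified illustration, not BLAST itself: it uses exact word matches only (no "synonyms" or scoring matrix), a made-up word size of 3, and a plain match-only extension. All function names and sequences are hypothetical.

```python
def words(seq, w):
    """Step 1: break a sequence into overlapping words of length w."""
    return [(i, seq[i:i + w]) for i in range(len(seq) - w + 1)]


def find_seed_hits(query, target, w=3):
    """Step 2: find exact word matches between query and target.

    Returns (query_position, target_position) pairs.
    """
    index = {}
    for j, word in words(target, w):
        index.setdefault(word, []).append(j)
    return [(i, j) for i, word in words(query, w) for j in index.get(word, [])]


def extend(query, target, i, j, w=3):
    """Step 3: extend a seed in both directions while bases keep matching."""
    left_i, left_j = i, j
    while left_i > 0 and left_j > 0 and query[left_i - 1] == target[left_j - 1]:
        left_i -= 1
        left_j -= 1
    right_i, right_j = i + w, j + w
    while (right_i < len(query) and right_j < len(target)
           and query[right_i] == target[right_j]):
        right_i += 1
        right_j += 1
    return query[left_i:right_i], left_j


query, target = "GATTACA", "CCGATTACATT"  # made-up sequences
hits = find_seed_hits(query, target)
if len(hits) >= 2:  # two or more word matches trigger extension
    aligned, target_start = extend(query, target, *hits[0])
    print(aligned, "found at target position", target_start)
```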

11
Q

What are some limitations of databases?

A

1) Databases differ in structure, content, and level of curation
2) Tools only detect what is present in the particular database used
3) Interpretation requires knowledge of both the tools and the bacteria
4) The annotation software and database used may affect the results

12
Q

What is multilocus sequence typing (MLST)?

A

It is used to define groups (sequence types) within a species.
It is useful for surveillance of which types of strains are present in a population.
General MLST analysis: typically 7 loci in housekeeping genes
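
Conceptually, MLST assigns an allele number to each of the loci and looks up the resulting profile in a table of known sequence types. The sketch below shows only that lookup; the locus names, allele numbers, and sequence types are entirely made up and do not correspond to any real scheme.

```python
# Hypothetical 7-locus scheme; real schemes use named housekeeping genes.
LOCI = ["geneA", "geneB", "geneC", "geneD", "geneE", "geneF", "geneG"]

# Hypothetical profile table: allele numbers per locus -> sequence type (ST).
PROFILE_TABLE = {
    (1, 1, 1, 1, 1, 1, 1): 1,
    (1, 2, 1, 1, 3, 1, 1): 2,
}


def sequence_type(allele_calls):
    """Return the ST for a dict of {locus: allele number}.

    Returns None when the allele combination is not in the table.
    """
    profile = tuple(allele_calls[locus] for locus in LOCI)
    return PROFILE_TABLE.get(profile)


isolate = {"geneA": 1, "geneB": 2, "geneC": 1, "geneD": 1,
           "geneE": 3, "geneF": 1, "geneG": 1}
print("ST:", sequence_type(isolate))
```

An isolate whose profile is missing from the table gets no sequence type, which mirrors the database limitation that tools only detect what the database contains.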

13
Q

In Galaxy, how is filtering of low-quality reads and bases performed?

A

By adding fastp to the pipeline

14
Q

Why do we add FastQC in Galaxy?

A

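For quality control of the reads, for example to check them before and after filtering. FastQC only reports quality metrics; it does not filter reads itself.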

15
Q

What Galaxy tool is used for taxonomy/contamination and reporting of taxonomy/contamination?

A

Kraken2 and Kraken

16
Q

What Galaxy tool is used for assembly quality?

A

QUAST

17
Q

What Galaxy tool is used to get sequence type?

A

MLST

18
Q

What Galaxy tool is used for antibiotic resistance?

A

ABRicate with the ResFinder database

19
Q

What Galaxy tool is used for virulence genes?

A

ABRicate with the VFDB database

20
Q

What Galaxy tool is used for plasmids?

A

ABRicate with the PlasmidFinder database

21
Q

What do the Shovill assemblies contain?

A

Contigs in FASTA format
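
In a FASTA file, each contig is a header line starting with `>` followed by one or more sequence lines. The sketch below reads contig lengths from FASTA text; the contig names and sequences are made up, and the header layout is an assumption rather than a statement about Shovill's exact output.

```python
def contig_lengths(fasta_text):
    """Return {contig name: length} for FASTA-formatted text."""
    lengths = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the contig name.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


# Hypothetical two-contig assembly, for illustration only.
example_fasta = """>contig00001 len=12
ACGTACGTACGT
>contig00002 len=6
ACG
TTA
"""
print(contig_lengths(example_fasta))
```

The lengths obtained this way are the input for summary statistics such as total assembly length and N50.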

22
Q

What Galaxy tool is used for annotation?

A

Prokka

23
Q

What Galaxy tool is used for pangenome analysis?

A

Roary

24
Q

What Galaxy tool is used for phylogeny?

A

FastTree