Exam IDB Flashcards
What is GWAS and how is it used?
Genome-wide association studies
A hypothesis-free method that tests variants across the genome to identify alleles associated with a phenotype, such as a specific resistance
Run with Scoary (on gene presence/absence data from the pangenome)
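To make the association step concrete, here is a minimal sketch of a one-sided Fisher's exact test on a 2x2 table of gene presence versus phenotype. The function name and the toy counts are illustrative assumptions; a real tool such as Scoary applies further steps that are not shown here.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test on a 2x2 table (hypothetical helper).

                      resistant  susceptible
    gene present          a          b
    gene absent           c          d

    Returns the probability of seeing at least `a` resistant gene carriers
    if gene presence and phenotype were unrelated.
    """
    carriers = a + b                 # isolates carrying the gene
    resistant = a + c                # isolates with the phenotype
    n = a + b + c + d
    total = comb(n, resistant)       # all ways to pick the resistant isolates
    p = 0.0
    for k in range(a, min(carriers, resistant) + 1):
        p += comb(carriers, k) * comb(n - carriers, resistant - k) / total
    return p

# Toy example: the gene is present in all 4 resistant isolates
# and absent from all 4 susceptible ones.
p_value = fisher_one_sided(4, 0, 0, 4)
```

A small p-value suggests that the gene and the phenotype co-occur more often than expected by chance; with many genes tested, a multiple-testing correction would also be needed.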
What is Prokka in pangenome analysis?
Prokka is a software tool that rapidly annotates genomes and identifies coding sequences; its annotation can be used by Roary in the next step
What is Roary in pangenome analysis?
Roary is a software tool that constructs a pangenome from the Prokka annotations. It identifies orthologous groups of genes using a fast clustering algorithm and produces a pangenome matrix representing the presence or absence of each group in each genome, for downstream analysis of gene function and evolution
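The presence/absence matrix can be pictured with a small sketch. The gene group names, genome names, and helper functions below are made up for illustration and do not reflect Roary's actual file formats.

```python
def presence_absence(clusters, genomes):
    """Build a presence/absence row (1/0 per genome) for every gene group.

    clusters: {group_name: set of genomes that contain a member of the group}
    """
    return {
        group: [1 if genome in members else 0 for genome in genomes]
        for group, members in clusters.items()
    }

def core_groups(matrix):
    """Gene groups present in every genome (the core genome)."""
    return sorted(group for group, row in matrix.items() if all(row))

# Hypothetical clustering result for three genomes.
genomes = ["iso1", "iso2", "iso3"]
clusters = {
    "gyrA": {"iso1", "iso2", "iso3"},
    "recA": {"iso1", "iso2", "iso3"},
    "blaX": {"iso2"},
}
matrix = presence_absence(clusters, genomes)
core = core_groups(matrix)
```

Groups that are not in `core` form the accessory genome, which is the part usually tested for association with a phenotype.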
What is OLC and how does it work?
Overlap-layout-consensus: identify overlaps between reads and join them together to form a consensus sequence
Reads are aligned against each other to identify regions of similarity (overlaps), which are used to build an overlap graph. Each read is represented as a node, and overlaps are represented as edges connecting the nodes. The graph is used to identify clusters of reads that might represent the same DNA fragment and to assemble those clusters into contigs
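The overlap graph described above can be sketched in a few lines. This is a toy version that assumes error-free reads and exact suffix/prefix overlaps, and a greedy merge stands in for the layout and consensus steps; real assemblers are far more involved.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def overlap_graph(reads, min_len=3):
    """Edges (a, b) -> overlap length; the reads themselves are the nodes."""
    edges = {}
    for a in reads:
        for b in reads:
            if a != b:
                n = overlap(a, b, min_len)
                if n:
                    edges[(a, b)] = n
    return edges

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(reads)
    while True:
        edges = overlap_graph(reads, min_len)
        if not edges:
            return reads  # whatever is left are the contigs
        (a, b), n = max(edges.items(), key=lambda edge: edge[1])
        reads.remove(a)
        reads.remove(b)
        reads.append(a + b[n:])

contigs = greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"])
```

Here the three toy reads overlap by four bases each and collapse into a single contig.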
What is the ConClave algorithm used for?
Resolving multimapping sequences that arise from redundant sequence patterns
The ConClave scheme can be used when reads map to multiple locations. Each read is then assigned to the location/template with the highest-scoring alignment.
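A simplified sketch of this kind of assignment is shown below. It is not KMA's implementation: the input format is hypothetical, and the tie-break rule (prefer the template that collected the most score from uniquely mapping reads) is an assumption made for the example.

```python
from collections import defaultdict

def assign_reads(read_scores):
    """Assign every read to one template.

    read_scores: {read_id: {template: alignment_score}}
    """
    totals = defaultdict(int)   # score collected from uniquely mapping reads
    candidates = {}
    # First pass: find the best-scoring template(s) for each read.
    for read, scores in read_scores.items():
        top = max(scores.values())
        best = sorted(t for t, s in scores.items() if s == top)
        candidates[read] = best
        if len(best) == 1:
            totals[best[0]] += top
    # Second pass: break ties using the totals (assumed rule, see above).
    return {
        read: max(best, key=lambda t: totals[t])
        for read, best in candidates.items()
    }

assignment = assign_reads({
    "r1": {"A": 50, "B": 30},
    "r2": {"A": 40},
    "r3": {"A": 45, "B": 45},   # tie between A and B
    "r4": {"A": 20, "B": 60},
})
```

Read `r3` scores equally well on both templates, so it follows the template that the uniquely mapping reads support most strongly.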
What do modern sequence alignment methods include?
Mapping of sequences to reduce the search space.
Chaining of maximum exact matches.
Adjust the expectations of outcome between two groups.
Mapping is the first step of alignment, so that the MEMs (maximal exact matches) can be chained together
Which identification method has the highest discriminatory power?
Whole genome sequencing
WGS analyzes the whole genome, so it has the highest discriminatory power: it can detect differences anywhere in the genome rather than in a few selected loci.
If you cannot find the expected resistance genes, what could explain a resistance phenotype without resistance genes?
You may have forgotten to include point mutations in the analysis
The strain could be intrinsically resistant to the antibiotic.
When a strain of bacteria is said to be intrinsically resistant to an antibiotic, it means that the bacteria naturally possess mechanisms that prevent the antibiotic from working against them, without needing to acquire additional resistance genes or mutations.
It is a new gene or a new mechanism
What is a replicon?
Molecules of DNA or RNA that are capable of survival and replication
How can a resistance gene be linked to a plasmid?
It is only possible if ResFinder identifies the resistance gene on the same contig as the one on which PlasmidFinder identifies the plasmid replicon. (This requires that the analyses are run on assembled sequences.)
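The contig check can be sketched as a simple lookup. The hit lists below are hypothetical (name, contig) pairs and do not mirror the real output formats of ResFinder or PlasmidFinder.

```python
def plasmid_linked_genes(resistance_hits, replicon_hits):
    """Map each resistance gene to the replicons found on the same contig.

    Both inputs are lists of (name, contig) tuples.
    Genes on contigs without any replicon hit are left out.
    """
    replicons_by_contig = {}
    for replicon, contig in replicon_hits:
        replicons_by_contig.setdefault(contig, []).append(replicon)
    return {
        gene: replicons_by_contig[contig]
        for gene, contig in resistance_hits
        if contig in replicons_by_contig
    }

# Hypothetical hits on an assembled genome.
resistance_hits = [("geneA", "contig_7"), ("geneB", "contig_2")]
replicon_hits = [("repliconX", "contig_7")]
linked = plasmid_linked_genes(resistance_hits, replicon_hits)
```

A gene that is missing from `linked` is not necessarily chromosomal; its contig simply carries no replicon hit, which can also happen when the assembly breaks the plasmid into several contigs.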
What can you use as input in BEAST program?
Aligned aa sequences
Aligned nucleotide sequences
Concatenated SNPs
After running BEAST, the log file shows a very low ESS (effective sample size) for most of the specified parameters. What can you do?
Increase number of BEAST runs
Change prior model
Change clock model
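To see why autocorrelated MCMC samples give a low ESS, the sketch below estimates ESS as N / (1 + 2 * sum of positive-lag autocorrelations), stopping at the first non-positive autocorrelation. This is a common textbook estimator and an assumption here; it is not necessarily the exact estimator used by BEAST's companion tools.

```python
import random

def effective_sample_size(samples):
    """Rough ESS estimate from the autocorrelation of a parameter trace."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    var = sum(c * c for c in centered) / n
    if var == 0:
        return float(n)
    tau = 1.0  # integrated autocorrelation time
    for lag in range(1, n):
        rho = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n * var)
        if rho <= 0:  # truncate once the correlation has died out
            break
        tau += 2 * rho
    return n / tau

# Two toy traces of equal length: independent draws versus a "sticky" chain
# in which every sample depends strongly on the previous one.
rng = random.Random(1)
independent = [rng.gauss(0, 1) for _ in range(2000)]
sticky = [0.0]
for _ in range(1999):
    sticky.append(0.95 * sticky[-1] + rng.gauss(0, 1))

ess_independent = effective_sample_size(independent)
ess_sticky = effective_sample_size(sticky)
```

Both traces contain 2000 samples, but the sticky chain carries far less independent information, which is what a low ESS in the log file signals.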
What can BEAST not do?
Identify recombinations, because BEAST models evolution as a tree-like process, where sequences evolve along a single branching path without any exchange of genetic material between lineages.
Recombination, on the other hand, involves the exchange of genetic material between different lineages, which can result in sequences that do not conform to a simple tree-like pattern of evolution. Therefore, detecting recombination requires methods that can identify and model patterns of genetic exchange between lineages, such as methods based on genetic linkage or network analyses
What program do you use for summarising the information from a sample of trees produced by BEAST onto a single “target” tree?
TreeAnnotator
What is the difference between maximum likelihood and maximum parsimony in phylogeny?
Maximum likelihood is generally the most accurate but does not provide SNP distances; it is used for finding differences within the same species. Maximum parsimony looks for the tree with the minimum amount of evolution and provides SNP distances, but it underestimates the actual amount of evolution; it can be used for different species
How do you infer phylogeny from SNPs?
We call SNPs for each isolate using the same reference
Concatenate SNPs into ‘SNP sequences’ one per isolate
Create a tree using a chosen algorithm
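The first two steps can be sketched as follows, together with the pairwise SNP distances that a distance-based tree method could start from. The reference string, isolate names, and 0-based positions are toy assumptions.

```python
def snp_sequences(reference, calls):
    """Concatenate SNPs into one 'SNP sequence' per isolate.

    reference: reference sequence as a string
    calls: {isolate: {position: base}} with only the positions where
           that isolate differs from the reference (0-based)
    """
    # Every position that is variable in at least one isolate.
    positions = sorted({pos for snps in calls.values() for pos in snps})
    sequences = {
        isolate: "".join(snps.get(pos, reference[pos]) for pos in positions)
        for isolate, snps in calls.items()
    }
    return positions, sequences

def snp_distance(a, b):
    """Number of positions at which two SNP sequences differ."""
    return sum(x != y for x, y in zip(a, b))

reference = "ACGTACGTAC"
calls = {
    "iso1": {2: "T", 7: "A"},
    "iso2": {2: "T"},
    "iso3": {5: "G"},
}
positions, sequences = snp_sequences(reference, calls)
```

Calling all isolates against the same reference is what makes the positions comparable, so each column of the SNP sequences refers to the same site.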
What does the CSI Phylogeny pipeline for raw reads look like?
Map reads to reference (BWA is used)
Call all possible SNPs (using samtools)
Filter positions and SNPs using: coverage, quality and z-score.
Prune SNPs. This removes SNPs that are in close proximity to remove mobile elements and repeat sequences.
Output is a VCF file (Variant Call Format)
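The filtering and pruning steps can be sketched like this. The record fields, the threshold values, and the rule that every SNP with a close neighbour is dropped are assumptions for illustration, not a description of CSI Phylogeny's actual code or defaults.

```python
def filter_snps(snps, min_depth=10, min_qual=30, min_z=1.96):
    """Keep SNP records that pass coverage, quality and z-score thresholds.

    snps: list of dicts with keys "pos", "depth", "qual" and "z"
    (hypothetical record layout; thresholds are illustrative).
    """
    return [
        s for s in snps
        if s["depth"] >= min_depth and s["qual"] >= min_qual and s["z"] >= min_z
    ]

def prune_snps(positions, min_distance=10):
    """Drop every SNP that has a neighbouring SNP closer than min_distance."""
    positions = sorted(positions)
    kept = []
    for i, pos in enumerate(positions):
        near_prev = i > 0 and pos - positions[i - 1] < min_distance
        near_next = i + 1 < len(positions) and positions[i + 1] - pos < min_distance
        if not (near_prev or near_next):
            kept.append(pos)
    return kept

records = [
    {"pos": 100, "depth": 40, "qual": 60, "z": 3.0},
    {"pos": 105, "depth": 35, "qual": 55, "z": 2.5},
    {"pos": 300, "depth": 50, "qual": 70, "z": 4.1},
    {"pos": 450, "depth": 4, "qual": 70, "z": 4.1},   # too little coverage
]
passed = [s["pos"] for s in filter_snps(records)]
pruned = prune_snps(passed)
```

SNPs at positions 100 and 105 pass the quality filters but sit too close together, so only position 300 survives pruning.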
What does the CSI Phylogeny pipeline for assemblies look like?
NUCmer (part of MUMmer) aligns all contigs to the reference to find SNPs
Pruning
It is preferred to use raw reads because then we can validate the SNPs that we are calling.
What is pMLST (plasmid multilocus sequence typing)?
A method for typing plasmids in order to characterize bacterial isolates. The database holds the plasmid alleles and can give you the sequence type (ST) of the plasmid. For this analysis we need to know the plasmid type, so PlasmidFinder should be run first
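The lookup from allele calls to a sequence type can be sketched as a table lookup. The locus names, allele numbers, and ST labels below are invented for the example and do not come from a real pMLST scheme.

```python
def plasmid_sequence_type(allele_calls, profiles, loci):
    """Look up the ST for a combination of allele numbers.

    allele_calls: {locus: allele_number} found in the isolate
    profiles: {(allele numbers in locus order): ST label}
    loci: loci of the scheme, in the order used by `profiles`
    """
    key = tuple(allele_calls.get(locus) for locus in loci)
    return profiles.get(key, "unknown")

# Hypothetical scheme with three loci and two known profiles.
loci = ["locus1", "locus2", "locus3"]
profiles = {
    (1, 1, 2): "ST1",
    (1, 3, 2): "ST2",
}
st_known = plasmid_sequence_type({"locus1": 1, "locus2": 3, "locus3": 2}, profiles, loci)
st_novel = plasmid_sequence_type({"locus1": 4, "locus2": 3, "locus3": 2}, profiles, loci)
```

Each plasmid type has its own scheme and loci, which is why the plasmid type has to be known before the right scheme can be chosen.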
What is the purpose of the PlasmidFinder?
PlasmidFinder does not search for the entire plasmid but for plasmid replicons. If we know the replicons of a plasmid, we know which type of plasmid we are working with and whether isolates carry the same plasmid. (Input is a FASTA file together with a database of plasmid replicon sequences for comparison; output is a TSV file.)
What is the purpose of the ResFinder?
Resistance gene detection, showing which bacteria are predicted to be antibiotic resistant and which exact resistance phenotype is expected. The program can give you the gene, the class of the gene, or the genome. It can detect whole resistance genes as well as chromosomal point mutations causing resistance in the whole genome sequence
What is KMA?
KMA (k-mer alignment) is a mapping method designed to map raw reads directly against redundant databases in an ultra-fast manner, using seed and extend. KMA is particularly good at aligning high-quality reads against highly redundant databases, where unique matches often do not exist.
What is chaining?
MEMs (maximal exact matches) that are likely to belong together produce high-quality alignments; they can therefore be chained together.
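Chaining can be sketched as a small dynamic program that picks the set of MEMs that are colinear in both reference and query and cover the most bases. The tuple layout and the scoring (total matched length, no gap penalty) are simplifying assumptions for this example.

```python
def chain_mems(mems):
    """Return the highest-scoring colinear chain of MEMs.

    mems: list of (ref_start, query_start, length) tuples.
    A MEM may follow another only if it starts after the other one ends
    in both the reference and the query.
    """
    mems = sorted(mems)
    if not mems:
        return []
    score = [length for _, _, length in mems]   # best chain ending at each MEM
    prev = [-1] * len(mems)
    for j, (ref_j, query_j, len_j) in enumerate(mems):
        for i in range(j):
            ref_i, query_i, len_i = mems[i]
            colinear = ref_i + len_i <= ref_j and query_i + len_i <= query_j
            if colinear and score[i] + len_j > score[j]:
                score[j] = score[i] + len_j
                prev[j] = i
    # Trace back from the best-scoring end point.
    j = max(range(len(mems)), key=lambda k: score[k])
    chain = []
    while j != -1:
        chain.append(mems[j])
        j = prev[j]
    return chain[::-1]

# Three MEMs lie on one diagonal; (30, 2, 8) is an off-diagonal repeat hit.
best_chain = chain_mems([(0, 0, 5), (7, 6, 4), (30, 2, 8), (15, 14, 6)])
```

The off-diagonal MEM is longer than any single MEM in the chain, but it cannot be combined with the others, so the chain of three consistent matches wins.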