Bioinformatics Tools Flashcards
Which bioinformatics tool for predicting pathogenic variants uses the Human Phenotype Ontology (HPO)?
Exomiser
Which tool can be used to identify phenotype ontology terms using semantic relationships between clinical features?
Phenomizer
What is FastQC?
FastQC is a bioinformatics quality control (QC) tool that produces a report for spotting problems that originate either in the sequencer or in the starting library material.
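One check of this kind is per-position base quality. The sketch below is not FastQC's implementation; it only illustrates the idea, assuming Phred+33 quality strings. The function name and the example strings are hypothetical.

```python
def mean_quality_per_position(quality_strings, offset=33):
    """Return the mean Phred score at each read position.

    Assumes Phred+33 encoding (an assumption, not stated in the card).
    Reads may have different lengths.
    """
    totals, counts = [], []
    for qual in quality_strings:
        for i, ch in enumerate(qual):
            if i == len(totals):
                # First time this position is seen: extend the accumulators.
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


# Two made-up quality strings: quality drops towards the end of the second read.
example_quals = ["IIII", "II5#"]
means = mean_quality_per_position(example_quals)
print(means)
```

A falling mean towards the end of the reads is the sort of pattern such a report makes visible.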
What does bcl2fastq do?
BCL files hold the per-cycle base calls for Illumina sequencing. They contain the base call and quality for each tile in each cycle. Illumina's bcl2fastq software converts BCL to FASTQ and demultiplexes the reads. Other platforms have different types of raw data.
What does demultiplexing do and what is needed as input?
Multiplexing allows multiple samples to be run simultaneously on the same lane of a flowcell.
Each sample has a unique tag.
The sample sheet (.csv file) contains details of the run, including samples and tags.
This tag is then used to sort FASTQ data into files for each sample (demultiplexing).
The bcl2fastq software does the demultiplexing.
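The sorting step can be sketched as follows. This is a simplified illustration, not how bcl2fastq is implemented: the sample sheet is reduced to a hypothetical tag-to-sample dictionary, reads are (tag, sequence) pairs, and tags must match exactly.

```python
from collections import defaultdict


def demultiplex(reads, sample_sheet):
    """Group read sequences by sample using each read's index tag.

    reads: iterable of (index_tag, sequence) pairs (simplified assumption).
    sample_sheet: dict mapping index tag -> sample name (hypothetical format).
    Reads whose tag is not listed are collected under "Undetermined".
    """
    by_sample = defaultdict(list)
    for tag, sequence in reads:
        sample = sample_sheet.get(tag, "Undetermined")
        by_sample[sample].append(sequence)
    return dict(by_sample)


# Made-up tags and reads for illustration only.
sample_sheet = {"ACGT": "sample_A", "TTAG": "sample_B"}
reads = [
    ("ACGT", "GATTACA"),
    ("TTAG", "CCGGA"),
    ("ACGT", "TTTT"),
    ("NNNN", "GGGG"),  # tag not in the sample sheet
]
result = demultiplex(reads, sample_sheet)
print(result)
```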
What are the features an effective alignment algorithm should have?
- Is highly accurate
- Can deal with problems in the data such as mismatches, errors and gaps
- Runs fast enough to be useful
- Has reasonable memory requirements
What do alignment algorithms do?
Alignment algorithms construct indices for the read sequences, the reference sequence, or both.
Based on the type of index, alignment algorithms are divided into three categories:
Based on hash tables
Based on suffix trees
Based on merge sorting (Slider/SliderII)
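As an illustration of the hash table category, the sketch below indexes every k-mer of a reference and looks up the first k bases of a read as a seed. It is a minimal sketch under assumed names and a made-up reference, not the method of any particular aligner; real tools extend and verify each candidate position.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_positions(read, index, k):
    """Look up the read's first k bases (the seed) in the hash table."""
    return index.get(read[:k], [])


reference = "ACGTACGTGA"  # made-up reference sequence
index = build_kmer_index(reference, 4)
hits = candidate_positions("ACGTGA", index, 4)
print(hits)  # the seed "ACGT" occurs twice; only one position fits the whole read
```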
What is the difference between global and local alignment?
Global alignment (Needleman-Wunsch) uses the entire lengths of the sequences involved.
Local alignment (Smith-Waterman) only uses parts of the sequences.
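The difference shows up in the dynamic programming table. The sketch below computes only the alignment score, with an assumed scoring scheme (match +1, mismatch -1, linear gap -1). In global mode the borders carry gap penalties and the score is read from the final cell; in local mode cells are floored at zero and the best cell anywhere is taken.

```python
def alignment_score(a, b, local, match=1, mismatch=-1, gap=-1):
    """Score of the best global (Needleman-Wunsch) or local (Smith-Waterman) alignment.

    The scoring values are illustrative assumptions, not fixed by either algorithm.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: leading gaps are penalised, so fill the borders.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                # Local: an alignment may restart anywhere, so never go below zero.
                cell = max(cell, 0)
            score[i][j] = cell
            best = max(best, cell)
    # Global reads the final cell; local takes the best cell anywhere.
    return best if local else score[-1][-1]


global_score = alignment_score("TTTACGTTT", "ACG", local=False)
local_score = alignment_score("TTTACGTTT", "ACG", local=True)
print(global_score, local_score)
```

With these made-up sequences the global score is pulled down by the six gaps needed to span the longer sequence, while the local score reflects only the matching "ACG" part.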
What are the two methods of variant calling?
Probabilistic – e.g. freebayes
Heuristic – e.g. VarScan
What tool can be used to assess mapping quality?
Picard
What is a probabilistic variant calling method based on?
Bayes' theorem
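A toy example of the idea: posterior genotype probability is proportional to prior times likelihood of the observed reads. The sketch below is a deliberately simplified diploid model and is not the model used by freebayes or any other caller; the error rate, the priors and the binomial read model are all assumptions made for illustration.

```python
from math import comb


def genotype_posteriors(ref_count, alt_count, error=0.01, priors=None):
    """Posterior probability of each diploid genotype given read counts.

    Assumed model: each read independently shows the alt allele with
    probability `error` (ref/ref), 0.5 (ref/alt) or 1 - error (alt/alt).
    """
    if priors is None:
        # Hypothetical priors: most sites are expected to be homozygous reference.
        priors = {"ref/ref": 0.998, "ref/alt": 0.001, "alt/alt": 0.001}
    p_alt = {"ref/ref": error, "ref/alt": 0.5, "alt/alt": 1 - error}
    n = ref_count + alt_count
    joint = {}
    for genotype, prior in priors.items():
        p = p_alt[genotype]
        likelihood = comb(n, alt_count) * p ** alt_count * (1 - p) ** ref_count
        joint[genotype] = prior * likelihood  # numerator of Bayes' theorem
    evidence = sum(joint.values())  # denominator: total probability of the data
    return {genotype: value / evidence for genotype, value in joint.items()}


posteriors = genotype_posteriors(ref_count=10, alt_count=10)
most_likely = max(posteriors, key=posteriors.get)
print(most_likely, posteriors)
```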
What is a heuristic variant calling method based on?
Instead of modelling the distribution of the observed data and using Bayesian statistics to calculate genotype probabilities, variant calls are made based on a variety of heuristic factors, such as minimum allele counts, read quality cut-offs, bounds on read depth, etc. Although they have been relatively unpopular in comparison to probabilistic methods, in practice their use of bounds and cut-offs can make them robust to outlying data that violate the assumptions of probabilistic models.
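A cut-off based call can be sketched in a few lines. The thresholds and the function name below are hypothetical and are not VarScan's defaults; the counts are assumed to include only reads that already passed a quality cut-off.

```python
def heuristic_call(ref_count, alt_count, min_depth=8, min_alt_reads=2, min_alt_fraction=0.2):
    """Call a variant only if every cut-off is met (all thresholds are illustrative)."""
    depth = ref_count + alt_count
    if depth < min_depth:
        return False  # too little coverage to make a call
    if alt_count < min_alt_reads:
        return False  # too few reads support the alternate allele
    return alt_count / depth >= min_alt_fraction


print(heuristic_call(15, 5))   # enough depth, alt reads and alt fraction
print(heuristic_call(3, 3))    # fails the depth bound
print(heuristic_call(19, 1))   # fails the minimum alt read count
```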