BPG Flashcards

1
Q

According to the ACGS guidelines, what should be carried out for an initial pipeline validation?

A

The initial validation should be a ‘dry’ validation which assesses the pipeline's output against a truth set, for example the Genome in a Bottle truth set.

Users should be aware that there are potential biases in this data towards regions that are easier to sequence with current NGS technologies, and also a potential bias towards current variant callers (e.g. the GATK suite [13]). However, the data is useful as a baseline measure of sensitivity and specificity. A “wet” validation using Genome in a Bottle reference material 8398 [14] should also be considered.
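At its core, a dry validation is a comparison of the pipeline's calls against the truth set. A minimal sketch of the counting involved, assuming variants have already been normalised to a single canonical representation (the function and tuple layout here are illustrative, not part of the guidance):

```python
def benchmark(truth, called):
    """Compare pipeline calls against a truth set.

    Both arguments are sets of (chrom, pos, ref, alt) tuples,
    assumed to be normalised to one canonical representation.
    Returns (sensitivity, precision).
    """
    tp = len(truth & called)   # true positives: present in both sets
    fn = len(truth - called)   # false negatives: missed by the pipeline
    fp = len(called - truth)   # false positives: called but not in the truth set
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    return sensitivity, precision
```

In practice dedicated tools handle the comparison, since matching indels and complex variants is far harder than set intersection.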

2
Q

Following baseline sensitivity calculation with GIAB (dry or wet), what should be done to further validate the pipeline?

A

The sensitivity of the pipeline shall be determined using clinical data, i.e. variants detected by Sanger sequencing as part of a diagnostic service which are then also identified with NGS. The resulting sensitivity should have a 95% confidence interval >0.95. This can be achieved if the NGS pipeline detects all 60 of 60 Sanger variants, with no false negatives [3]. In addition, detecting 300 of 300 will achieve a 95% confidence interval >0.99. These variants shall be derived from at least 10 individuals.

3
Q

How do you calculate the 95% CI?

A

The 95% CI rule can only be used when all of the variants are successfully detected, i.e. with no false negatives.

The lower bound of the 95% CI can then be calculated as 1 − (3 / number of true positives).
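Under that assumption (every variant detected), the bound can be checked numerically; the 60-variant and 300-variant figures quoted in the guidance fall out directly. A minimal sketch:

```python
def ci95_lower_bound(true_positives):
    """Lower bound of the 95% confidence interval for sensitivity,
    using the 'rule of three'. Only valid when every variant was
    detected (zero false negatives)."""
    return 1 - 3 / true_positives

# 60 of 60 Sanger variants detected -> lower bound 0.95
# 300 of 300 detected               -> lower bound 0.99
```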

4
Q

What tools can be used to compare the pipeline results with the truth set, and why are they useful?

A

Tools such as hap.py [15] and vcfeval [16] may be useful for normalisation and comparison of VCF files. There is a benchmarking task team within the Global Alliance for Genomics and Health that is developing further tools and standards.

The comparison of called variants against a truth set is a non-trivial task in cases of indels and complex variation, due to alternative representations of variants in VCF files.
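To see why representation matters: the same insertion can appear in a VCF as a padded record (`REF=CT, ALT=CAT`) or a minimal one (`REF=C, ALT=CA`) at the same position. A minimal trimming sketch of one part of normalisation (illustrative only; real tools such as hap.py and vcfeval also perform left-alignment against the reference and haplotype-aware comparison):

```python
def normalize(pos, ref, alt):
    """Trim shared bases so that padded and minimal representations
    of the same variant at the same anchor position compare equal."""
    # Trim common trailing bases, keeping at least one base each
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Trim common leading bases, advancing the position
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt
```

For example, `normalize(100, "CT", "CAT")` and `normalize(100, "C", "CA")` reduce to the same record, so a naive exact-match comparison would only agree after this step.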

5
Q

How should pipelines be re-validated after a new minor version release?

A

Prior to any changes being merged into production code, a round of validation shall be performed as per the initial validation detailed above. Therefore, a validation dataset should be maintained to standardise and simplify this process.

6
Q

How should pipelines be re-validated after a new major version release?

A

If substantive changes are made to a pipeline, e.g. implementing a new variant caller to improve detection of indels, the existing dataset for validation shall be assessed for relevance and addition of further data shall be considered.

Laboratories may want to consider using data derived from a haploid cell line, e.g. CHM1 [18–20], in some cases.

7
Q

What steps should be carried out when fixing a pipeline issue, before the new code is released?

A
  1. Raise issue
  2. Commit changes to the dev branch
  3. Issue pull request
  4. Code review
  5. Merge to master branch
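The branching steps above can be sketched with git. A minimal, self-contained illustration driven from Python in a throwaway directory (the branch name, file name, and issue number are made up; raising the issue and the pull request/review themselves happen on the hosting platform, not on the command line):

```python
import subprocess
import tempfile

def git(*args, repo):
    """Run a git command in the given repository (illustrative helper)."""
    subprocess.run(["git", *args], cwd=repo, check=True, capture_output=True)

repo = tempfile.mkdtemp()                 # throwaway repo for illustration
git("init", "-q", repo=repo)
git("symbolic-ref", "HEAD", "refs/heads/master", repo=repo)  # name the default branch
git("config", "user.email", "dev@example.com", repo=repo)
git("config", "user.name", "Dev", repo=repo)

with open(f"{repo}/pipeline.cfg", "w") as f:
    f.write("v1.0.0\n")
git("add", "pipeline.cfg", repo=repo)
git("commit", "-qm", "Initial pipeline config", repo=repo)

# 1. Raise issue (on the tracker); 2. commit the fix on a dev branch
git("checkout", "-qb", "dev", repo=repo)
with open(f"{repo}/pipeline.cfg", "w") as f:
    f.write("v1.0.1\n")
git("add", "pipeline.cfg", repo=repo)
git("commit", "-qm", "Fix issue #42: bump pipeline config", repo=repo)

# 3./4. In practice: push the branch and open a pull request for code review
#       (platform-specific; not shown here)

# 5. After review approval, merge into the master branch
git("checkout", "-q", "master", repo=repo)
git("merge", "-q", "--no-ff", "-m", "Merge fix for issue #42", "dev", repo=repo)
```

The `--no-ff` merge keeps an explicit merge commit in the history, which makes it easy to see which production release each reviewed fix entered.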