Informatics In Sequencing Flashcards

Question 1

Q

NGS bioinformatics workflow

Answer

A

Step 1: analysis of raw data

Step 2: read alignment and variant calling

Step 3: annotation and variant prioritisation

Question 2

Q

Base calling

Answer

A

Conversion of fluorescence signals into actual sequence data with quality scores

A Q-score of 30 (Q30) corresponds to a 0.1 percent error rate in base calling, and is widely considered a benchmark for high quality data.
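The Q30 figure follows from the standard Phred relationship Q = -10 * log10(P), where P is the probability that a base call is wrong. A minimal sketch of the conversion in both directions (the function names are illustrative, not from any particular library):

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Inverse conversion: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Print the error rate implied by a few commonly quoted Q-scores.
    for q in (10, 20, 30, 40):
        print(f"Q{q}: error probability {phred_to_error_prob(q):.4%}")
```

Each increase of 10 in the Q-score corresponds to a tenfold drop in the expected error rate.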

Question 3

Q

FASTQ files

Answer

A

FASTQ files are central to the first quality control step, as they contain all the raw sequencing reads together with the read identifiers and per-base quality values, with higher numbers indicating higher quality

Sequences and Phred-based quality scores are stored together in one file
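As a rough illustration of how sequence and quality sit side by side, each FASTQ record normally spans four lines: an `@` header, the sequence, a `+` separator, and a quality string with one character per base. The sketch below assumes the common Phred+33 encoding and unwrapped four-line records; `FastqRecord` and `parse_fastq` are hypothetical names, not a library API.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]  # one Phred score per base


def parse_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-record FASTQ stream (Phred+33 assumed)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@"):
            raise ValueError(f"Malformed FASTQ header: {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"Sequence/quality length mismatch in {header!r}")
        # Each quality character encodes a Phred score as ASCII code minus the offset.
        scores = [ord(ch) - offset for ch in quality]
        yield FastqRecord(header[1:], sequence, scores)


if __name__ == "__main__":
    example = io.StringIO("@read1\nGATTACA\n+\nIIIIII#\n")
    for record in parse_fastq(example):
        print(record.name, record.sequence, record.qualities)
```

For real data, an established parser (for example Biopython's `SeqIO`) is the safer choice, since it handles edge cases this sketch ignores.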

Question 4

Q

Read alignment

Answer

A

A) Align reads and create Binary Alignment Map (BAM) files

B) Post-alignment processing

Its objective is to increase variant call accuracy and the quality of downstream processing by reducing base-calling and alignment artifacts.
It consists of filtering duplicate reads, intensive local realignment and base quality score recalibration.
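The duplicate-filtering part of post-alignment processing can be sketched as follows. This is a toy model, not how any specific tool is implemented: it assumes reads that share chromosome, start position and strand are duplicates, and keeps the one with the highest summed base quality. `AlignedRead` and `remove_duplicates` are hypothetical names.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class AlignedRead(NamedTuple):
    name: str
    chrom: str
    start: int
    strand: str
    base_quality_sum: int


def remove_duplicates(reads: Iterable[AlignedRead]) -> List[AlignedRead]:
    """Keep one read per (chrom, start, strand), preferring the highest quality sum."""
    best: Dict[Tuple[str, int, str], AlignedRead] = {}
    for read in reads:
        key = (read.chrom, read.start, read.strand)
        current = best.get(key)
        if current is None or read.base_quality_sum > current.base_quality_sum:
            best[key] = read
    return list(best.values())


if __name__ == "__main__":
    reads = [
        AlignedRead("r1", "chr1", 100, "+", 900),
        AlignedRead("r2", "chr1", 100, "+", 950),  # duplicate of r1, higher quality
        AlignedRead("r3", "chr1", 100, "-", 800),  # same position, other strand
        AlignedRead("r4", "chr2", 100, "+", 700),
    ]
    for read in remove_duplicates(reads):
        print(read.name)
```

Production pipelines usually mark duplicates in the BAM file rather than deleting them, so the decision stays reversible.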

C) Variant calling

Aims to identify variants using the post-processed BAM file.
Approaches range from a basic comparison of the sample to the reference, to advanced algorithms that include machine learning and statistical methods.
Joint cohort variant calling allows identification of systematic errors and biases, and also gives better signal-to-noise separation, which in turn allows rare variants to be called with more certainty.
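The "basic comparison to the reference" end of that spectrum can be sketched as a naive pileup caller. It assumes the aligned bases at each position are already collected into a string, uses 0-based positions, and applies arbitrary illustrative thresholds. Real callers model base quality, mapping quality and genotype likelihoods, which this sketch does not.

```python
from collections import Counter
from typing import Dict, List, Tuple

# (position, reference base, alternate base, alternate count, depth)
VariantCall = Tuple[int, str, str, int, int]


def call_variants(
    reference: str,
    pileup: Dict[int, str],
    min_depth: int = 4,
    min_alt_fraction: float = 0.2,
) -> List[VariantCall]:
    """Report non-reference bases that pass simple depth and allele-fraction cut-offs."""
    calls: List[VariantCall] = []
    for pos in sorted(pileup):
        bases = pileup[pos]
        depth = len(bases)
        if depth < min_depth:
            continue  # too little evidence at this position
        ref_base = reference[pos]
        for base, count in Counter(bases).items():
            if base != ref_base and count / depth >= min_alt_fraction:
                calls.append((pos, ref_base, base, count, depth))
    return calls


if __name__ == "__main__":
    reference = "ACGT"
    pileup = {0: "AAAAAA", 1: "CCTTCT", 2: "GG", 3: "TTTTTA"}
    print(call_variants(reference, pileup))
```

In this example only position 1 is reported: position 2 is below the depth cut-off, and the single mismatching read at position 3 falls below the allele-fraction cut-off, as a sequencing error typically would.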

Question 5

Q

Variant annotation

Answer

A

An integral part of data interpretation
Adds biological context in an automated way
Examples: gene overlap, functionality, conservation, overlap with disease databases

Common tools:
- VEP
- ANNOVAR
- SnpEff

Common databases:
- Ensembl
(Provides variant consequence prediction, protein function prediction, linkage disequilibrium data and variant conservation across species)
- RefSeq
- UCSC Genome Browser

One basic step in annotation is to provide the variant's context: the gene in which the variant is located, its position within the gene, and the impact of the variation
(missense, nonsense, synonymous, stop-loss, etc.)
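The impact categories above can be illustrated for a single-codon substitution using the standard genetic code. This is a simplified sketch that assumes the reference and alternate codons are already known and in frame; annotation tools such as VEP work from transcript models and handle many more consequence types. `classify_codon_change` is a hypothetical helper name.

```python
# Standard genetic code, built with bases in the conventional T, C, A, G order.
_BASES = "TCAG"
_AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: _AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon substitution by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"  # a stop codon is gained
    if ref_aa == "*":
        return "stop-loss"  # the original stop codon is lost
    return "missense"


if __name__ == "__main__":
    for ref, alt in [("GAA", "GAG"), ("GAA", "GTA"), ("GAA", "TAA"), ("TAA", "CAA")]:
        print(ref, "->", alt, classify_codon_change(ref, alt))
```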

Question 6

Q

Other databases and functions

Answer

A

Population frequency databases:
- gnomAD

Disease databases:
- OMIM
- PanelApp
- ClinVar

Question 7

Q

Variant filtering and prioritisation

Answer

A

Phenotype and mode of inheritance (MOI)
Population frequency
Pathogenicity prediction scores and conservation

Variant of interest
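The filtering steps above act as a funnel that narrows many annotated variants down to a few candidates. A minimal sketch, assuming each variant is a dictionary with hypothetical keys `gene`, `population_af` and `pathogenicity_score`; the thresholds are illustrative placeholders rather than recommended values, and mode of inheritance is not modelled here.

```python
from typing import Dict, Iterable, List, Set, Union

Variant = Dict[str, Union[str, float]]


def prioritise_variants(
    variants: Iterable[Variant],
    phenotype_genes: Set[str],
    max_population_af: float = 0.001,
    min_pathogenicity_score: float = 20.0,
) -> List[Variant]:
    """Apply phenotype, population frequency and prediction score filters in turn."""
    kept: List[Variant] = []
    for variant in variants:
        if variant["gene"] not in phenotype_genes:
            continue  # gene not linked to the patient's phenotype
        if variant["population_af"] > max_population_af:
            continue  # too common in the general population
        if variant["pathogenicity_score"] < min_pathogenicity_score:
            continue  # predicted to have little functional impact
        kept.append(variant)
    return kept


if __name__ == "__main__":
    variants = [
        {"id": "v1", "gene": "GENE_A", "population_af": 0.00001, "pathogenicity_score": 28.0},
        {"id": "v2", "gene": "GENE_A", "population_af": 0.05, "pathogenicity_score": 30.0},
        {"id": "v3", "gene": "GENE_B", "population_af": 0.0, "pathogenicity_score": 35.0},
        {"id": "v4", "gene": "GENE_A", "population_af": 0.0001, "pathogenicity_score": 5.0},
    ]
    for variant in prioritise_variants(variants, phenotype_genes={"GENE_A"}):
        print(variant["id"])
```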


Question 8

Q

Clinical interpretation

Answer

A

Allele frequency
Computational data
Functional data
Segregation data
Genotype phenotype correlation

Scale, from good to bad:
Benign
Likely benign
Uncertain significance
Likely pathogenic
Pathogenic