Informatics In Sequencing Flashcards

1
Q

NGS bioinformatics workflow

A

Step 1: analysis of raw data

Step 2: read alignment and variant calling

Step 3: annotation and variant prioritisation

2
Q

Base calling

A

Conversion of fluorescence signals into actual sequence data with quality scores

A Q-score of 30 (Q30) corresponds to a 0.1 percent error rate in base calling, and is widely considered a benchmark for high quality data.
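
The Q30 figure follows from the Phred definition, Q = -10 × log10(P), where P is the probability that a base call is wrong. A minimal Python sketch of the conversion (the function names are illustrative and do not come from any sequencing library):

```python
import math

def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to a base-calling error probability."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p: float) -> float:
    """Convert a base-calling error probability back to a Phred score."""
    return -10 * math.log10(p)

# Print the error probability for a few common Q-scores.
for q in (10, 20, 30, 40):
    print(f"Q{q}: error probability {phred_to_error_prob(q):.4%}")
```

Each step of 10 in Q-score reduces the error probability tenfold, so Q20 is 1 percent and Q30 is 0.1 percent.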

3
Q

Fastq files

A

FASTQ files are central to the first quality control step, as they contain all the raw sequencing reads, the read identifiers and the quality values, with higher numbers indicating higher quality

Sequences and Phred-based quality scores in one file
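
To make the "sequence plus quality in one file" idea concrete, here is a small Python sketch that reads FASTQ records and decodes the quality string. It assumes the common four-line record layout and Phred+33 encoding; the names `FastqRecord` and `parse_fastq` are made up for this example.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO

class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    qualities: List[int]  # one Phred score per base

def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-read FASTQ stream (Phred+33 assumed)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality_line = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality_line):
            raise ValueError(f"Malformed FASTQ record near {header!r}")
        # Phred+33: the quality score is the ASCII code minus 33.
        qualities = [ord(char) - 33 for char in quality_line]
        yield FastqRecord(header[1:], sequence, qualities)

# A single toy read held in memory instead of a file on disk.
example = io.StringIO("@read1\nACGT\n+\nII5!\n")
records = list(parse_fastq(example))
print(records[0])
```

In the toy read, `I` decodes to Q40, `5` to Q20 and `!` to Q0, so the last base would be the first candidate for trimming during quality control.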

4
Q

Read alignment

A

A) Align reads to the reference and create Binary Alignment Map (BAM) files

B) Post-alignment processing

  • its objective is to increase variant call accuracy and the quality of downstream analysis by reducing base call and alignment artifacts
  • it consists of filtering of duplicate reads, intensive local realignment and base quality score recalibration

C) Variant calling

  • aims to identify variants using the post-processed BAM file
  • methods range from a basic comparison of the sample against the reference to advanced algorithms that use statistical and machine learning methods
  • joint cohort variant calling allows identification of systematic errors/biases, and also better separation of signal from noise, which in turn allows rare variants to be called with more certainty
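
The "basic comparison of sample to reference" end of the variant calling spectrum can be sketched in a few lines of Python. This is a toy illustration only: it assumes reads are already aligned without gaps, uses 0-based positions, and the function name and thresholds (`min_depth`, `min_alt_fraction`) are hypothetical. Real callers work on BAM files and model base and mapping quality.

```python
from collections import Counter, defaultdict
from typing import List, Tuple

def call_variants(
    reference: str,
    aligned_reads: List[Tuple[int, str]],  # (0-based start position, read sequence)
    min_depth: int = 3,             # illustrative threshold
    min_alt_fraction: float = 0.3,  # illustrative threshold
) -> List[Tuple[int, str, str, int, int]]:
    """Return (position, ref base, alt base, alt count, depth) for simple mismatches."""
    # Build a pileup: for every reference position, count the bases observed.
    pileup = defaultdict(Counter)
    for start, sequence in aligned_reads:
        for offset, base in enumerate(sequence):
            pileup[start + offset][base] += 1

    calls = []
    for position in sorted(pileup):
        counts = pileup[position]
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to make a call
        ref_base = reference[position]
        for alt_base, count in counts.most_common():
            if alt_base != ref_base and count / depth >= min_alt_fraction:
                calls.append((position, ref_base, alt_base, count, depth))
    return calls

reference = "ACGTACGTAC"
reads = [(0, "ACGTA"), (2, "GTTCG"), (3, "TTCGT"), (4, "TCGTA")]
calls = call_variants(reference, reads)
print(calls)
```

Here three of the four reads covering position 4 carry a T where the reference has an A, so that position is reported; positions covered by fewer than three reads are skipped.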
5
Q

Variant annotation

A
  • integral part of data interpretation
  • adds biological context in an automated way
  • gene overlap, functionality, conservation, overlap with disease databases

Common tools:
- VEP
- ANNOVAR
- snpEff

Common databases:
- Ensembl
(Provides variant consequence and protein function predictions, linkage disequilibrium data and variant conservation across species)
- RefSeq
- UCSC genome browser

One basic step in annotation is to give each variant its context: the gene in which the variant is located, its position within the gene and the impact of the variation
(missense, nonsense, synonymous, stop-loss, etc.)
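
As a rough illustration of how the impact of a coding variant is assigned, the Python sketch below compares the reference and altered codon using the standard genetic code. The function name is made up for this example, and it handles only single-codon substitutions; tools such as VEP, ANNOVAR and snpEff also deal with transcripts, splice sites, indels and much more.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}

def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon substitution by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"  # same amino acid (or stop codon retained)
    if alt_aa == "*":
        return "nonsense"    # new premature stop codon
    if ref_aa == "*":
        return "stop-loss"   # stop codon replaced by an amino acid
    return "missense"        # different amino acid

for ref, alt in [("GAA", "GAG"), ("GAG", "GTG"), ("TGG", "TGA"), ("TAA", "CAA")]:
    print(ref, "->", alt, classify_codon_change(ref, alt))
```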

6
Q

Other databases and functions

A

Population frequency databases:
- gnomAD

Disease databases:
- OMIM
- PanelApp
- ClinVar

7
Q

Variant filtering and prioritisation

A

Filters:
- Phenotype and mode of inheritance (MOI)
- Population frequency
- Pathogenicity prediction scores and conservation

Result: variant(s) of interest
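
The filters on this card can be sketched as a simple Python funnel. Everything here is an assumption made for illustration: the `Variant` fields, the 0 to 1 pathogenicity score, the cut-offs (`max_af`, `min_score`) and the simplified zygosity rule, which ignores compound heterozygotes. Real pipelines draw these values from annotation tools and frequency databases such as gnomAD.

```python
from dataclasses import dataclass
from typing import List, Set

@dataclass
class Variant:
    gene: str
    zygosity: str               # "het" or "hom"
    population_af: float        # allele frequency in a reference population
    pathogenicity_score: float  # hypothetical 0..1 prediction score

def prioritise(
    variants: List[Variant],
    panel_genes: Set[str],   # genes linked to the patient's phenotype
    moi: str,                # "dominant" or "recessive"
    max_af: float = 0.01,    # illustrative cut-off
    min_score: float = 0.5,  # illustrative cut-off
) -> List[Variant]:
    """Apply phenotype, MOI, frequency and score filters, then rank by score."""
    # Simplification: recessive disease requires a homozygous call here.
    allowed_zygosity = {"dominant": {"het", "hom"}, "recessive": {"hom"}}[moi]
    kept = [
        v for v in variants
        if v.gene in panel_genes
        and v.zygosity in allowed_zygosity
        and v.population_af <= max_af
        and v.pathogenicity_score >= min_score
    ]
    return sorted(kept, key=lambda v: v.pathogenicity_score, reverse=True)

candidates = [
    Variant("GENE_A", "het", 0.0001, 0.9),
    Variant("GENE_A", "het", 0.2, 0.8),     # too common in the population
    Variant("GENE_B", "het", 0.0001, 0.95), # gene not linked to the phenotype
    Variant("GENE_A", "hom", 0.001, 0.2),   # low predicted pathogenicity
]
shortlist = prioritise(candidates, panel_genes={"GENE_A"}, moi="dominant")
print(shortlist)
```

Only the first candidate passes every filter, which mirrors how successive filters narrow a long variant list down to a few variants of interest.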

8
Q

Clinical interpretation

A

Allele frequency
Computational data
Functional data
Segregation data
Genotype phenotype correlation

Scale, from good to bad:
Benign
Likely benign
Uncertain significance
Likely pathogenic
Pathogenic
