QC of NGS data Flashcards

1
Q

What is a FASTQ file?

A

A FASTQ file is a text-based format used to store nucleotide sequences along with their corresponding quality scores.

2
Q

What are the four lines in a FASTQ file sequence entry?

A
  1. Sequence Identifier: A header line starting with ‘@’ followed by a unique identifier.
  2. Raw Sequence: The actual nucleotide sequence (A, T, C, G, or N for a base that could not be called).
  3. Quality Identifier: A line starting with ‘+’ that may or may not repeat the sequence identifier.
  4. Quality Scores: A string of ASCII characters encoding the quality score of each nucleotide, one character per base, so it has the same length as the sequence line.
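
The four-line layout above can be sketched in Python. This is a minimal illustration, not a full FASTQ parser: the names `FastqRecord`, `parse_fastq_record` and `decode_quality` are made up for this card, and the quality offset of 33 (the Sanger / Illumina 1.8+ "Phred+33" encoding) is an assumption, since the card does not say which encoding is in use.

```python
from typing import NamedTuple


class FastqRecord(NamedTuple):
    identifier: str  # line 1, without the leading '@'
    sequence: str    # line 2, the raw bases
    separator: str   # line 3, without the leading '+' (often empty)
    quality: str     # line 4, one ASCII character per base


def parse_fastq_record(lines):
    """Parse exactly four lines of text into a FastqRecord (hypothetical helper)."""
    if len(lines) != 4:
        raise ValueError("a FASTQ entry has exactly four lines")
    header, sequence, plus, quality = (line.rstrip("\n") for line in lines)
    if not header.startswith("@"):
        raise ValueError("line 1 must start with '@'")
    if not plus.startswith("+"):
        raise ValueError("line 3 must start with '+'")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lines must have equal length")
    return FastqRecord(header[1:], sequence, plus[1:], quality)


def decode_quality(quality, offset=33):
    """Turn the quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality]


record = parse_fastq_record(["@read1\n", "GATTACA\n", "+\n", "IIIIII5\n"])
scores = decode_quality(record.quality)
```

Here `'I'` (ASCII 73) decodes to Q40 and `'5'` (ASCII 53) to Q20 under the assumed offset.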
3
Q

What do Phred quality scores indicate?

A

Phred quality scores indicate the confidence of each base call made by the sequencer: the higher the score, the lower the probability that the base was called incorrectly.

4
Q

How is a Phred quality score calculated?

A

The Phred quality score Q is calculated using the formula: Q = -10 * log10(P), where P is the probability that a given base call is incorrect.
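
As a quick check of the formula, the conversion can be written in both directions. This is a minimal sketch; the function names are chosen for illustration only.

```python
import math


def phred_to_error_probability(q):
    """Invert Q = -10 * log10(P) to get P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Apply Q = -10 * log10(P) for an error probability 0 < P <= 1."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)
```

Every 10-point increase in Q divides the error probability by 10.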

5
Q

What does a Phred score of 30 indicate?

A

A Phred score of 30 corresponds to a 1 in 1,000 (0.1%) chance that the base call is wrong, i.e. 99.9% accuracy.

6
Q

What are the two main phases of the NGS data analysis workflow?

A

The two main phases are Primary Analysis and Secondary Analysis.

7
Q

What does the Primary Analysis phase involve?

A
  1. Raw Data Processing: Converting raw instrument signals into sequence data (FASTQ files).
  2. Quality Control (QC): Assessing and filtering data based on quality scores to ensure high-quality reads.
  3. Genome Mapping: Aligning sequences to a reference genome to identify their locations.
8
Q

What are key steps in Primary Analysis?

A
  1. Quality Check: Evaluating the quality of sequences and removing low-quality reads.
  2. Adapter Removal: Trimming adapter sequence from reads, and discarding reads that consist mostly of adapter.
  3. High-Quality Filtered Data: Producing cleaned data with statistics on quality metrics.
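
The adapter removal step can be sketched as follows. This is a simplified illustration, not how any particular trimming tool works: it only finds exact matches (real tools tolerate mismatches), the function name `clip_adapter` and the `min_overlap` parameter are made up for this card, and the adapter string is an assumed example (the start of the common Illumina adapter) that should be replaced with the sequence for the library kit in use.

```python
ADAPTER = "AGATCGGAAGAGC"  # assumed example adapter; replace with your kit's sequence


def clip_adapter(sequence, quality, adapter=ADAPTER, min_overlap=5):
    """Cut the adapter, and everything after it, from the 3' end of a read.

    Exact matching only: either the full adapter anywhere in the read, or
    the first `min_overlap` or more adapter bases at the very end of the read.
    """
    cut = sequence.find(adapter)
    if cut == -1:
        # No full adapter: look for a read suffix that is an adapter prefix,
        # trying the longest possible overlap first.
        longest = min(len(adapter), len(sequence)) - 1
        for overlap in range(longest, min_overlap - 1, -1):
            if sequence.endswith(adapter[:overlap]):
                cut = len(sequence) - overlap
                break
    if cut == -1:
        return sequence, quality  # no adapter found, read is unchanged
    # Trim the quality string at the same position to keep lengths equal.
    return sequence[:cut], quality[:cut]
```

Reads that become very short after clipping would then be dropped by a minimum-length filter.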
9
Q

What applications are focused on in the Secondary Analysis phase?

A
  1. ChIP-Seq: Analyzing transcription factor binding sites and identifying motifs.
  2. RNA-Seq: Investigating differential gene expression and transcript analysis.
  3. Whole Genome DNA Sequencing: Performing variant calling and genomic feature identification.
10
Q

What are good characteristics of sequencing data?

A
  1. High quality scores (indicating low error rates).
  2. Low adapter contamination.
  3. Low duplication rates (to avoid biases in data interpretation).
  4. No GC bias (ensuring even representation across GC content).
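
Some of these characteristics can be measured directly from the reads. The sketch below uses deliberately simple definitions that are assumptions for illustration: duplicates are exact sequence copies, GC content is computed per read, and quality characters use the Phred+33 offset. QC tools report more refined versions of these metrics.

```python
from collections import Counter


def mean_phred(quality, offset=33):
    """Average Phred score of one read (Phred+33 encoding assumed)."""
    if not quality:
        return 0.0
    return sum(ord(char) - offset for char in quality) / len(quality)


def gc_fraction(sequence):
    """Fraction of bases in a read that are G or C."""
    if not sequence:
        return 0.0
    gc = sum(1 for base in sequence.upper() if base in "GC")
    return gc / len(sequence)


def duplication_rate(sequences):
    """Fraction of reads that are exact copies of another read already seen."""
    if not sequences:
        return 0.0
    unique = len(Counter(sequences))
    return (len(sequences) - unique) / len(sequences)


reads = ["GGCC", "ATAT", "GGCC", "ACGT"]
rate = duplication_rate(reads)
```

A per-read GC fraction whose distribution is shifted away from the expected genome-wide value is one sign of GC bias.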
11
Q

What are some trimming techniques?

A
  1. Filtering: Removing all reads below a certain quality threshold.
  2. Cropping: Trimming bases from both ends of reads to remove low-quality regions.
  3. Removing Short Reads: Discarding reads that are too short after cropping.
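
The three techniques above can be chained in a few lines. This is a sketch under stated assumptions: the thresholds (`min_mean_q`, `min_q`, `min_length`) are arbitrary illustrative values, the quality encoding is assumed to be Phred+33, and the function names are hypothetical rather than taken from any trimming tool.

```python
def phred_scores(quality, offset=33):
    """Decode a quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality]


def crop_ends(sequence, quality, min_q=20, offset=33):
    """Cropping: strip bases scoring below min_q from both ends of the read."""
    scores = phred_scores(quality, offset)
    start, end = 0, len(scores)
    while start < end and scores[start] < min_q:
        start += 1
    while end > start and scores[end - 1] < min_q:
        end -= 1
    return sequence[start:end], quality[start:end]


def trim_reads(reads, min_mean_q=20, min_q=20, min_length=4, offset=33):
    """Apply filtering, cropping, and short-read removal to (sequence, quality) pairs."""
    kept = []
    for sequence, quality in reads:
        scores = phred_scores(quality, offset)
        # Filtering: drop reads whose mean quality is below the threshold.
        if not scores or sum(scores) / len(scores) < min_mean_q:
            continue
        sequence, quality = crop_ends(sequence, quality, min_q, offset)
        # Removing short reads: drop reads that are too short after cropping.
        if len(sequence) < min_length:
            continue
        kept.append((sequence, quality))
    return kept
```

For example, a read with qualities `#IIIIII#` keeps its six high-quality middle bases, while a read with qualities `#III#` passes the mean-quality filter but is discarded for being too short after cropping.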
12
Q

What are common tools for trimming?

A
  1. Trimmomatic
  2. Trim Galore
  3. FASTX Toolkit
  4. Galaxy Trimming Tools
13
Q

Why is proper handling of primers and adapters important?

A

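Primers and adapters are synthetic sequences added during library preparation. If they are not handled properly during library preparation and sequencing, they end up in the reads as contamination, where they can cause reads to misalign or fail to map and so make downstream results less accurate.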
