Genomics: Reference Mapping Flashcards

Quality of HGS Reads and SNP Analyses

1
Q

What are the different sequencing read formats?

A

Sequencing read formats are file types used to store DNA sequence data generated by sequencing technologies.

2
Q

Sequence Reads: What are the 2 formats?

A

Standard Flowgram Format (SFF): a binary format used for sequencing data from older platforms such as Roche 454.

Includes sequence data, quality scores, and flowgram information.

FASTQ Format: a text-based format widely used in modern sequencing technologies such as Illumina.

Contains four lines per read (identifier, sequence, separator, and quality string). It is human-readable, widely compatible with bioinformatics tools, and contains both sequence and quality information.
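
To make the four-line FASTQ layout concrete, here is a minimal sketch of a parser. The names `FastqRecord` and `parse_fastq` are illustrative, not part of any library, and the sketch assumes each record occupies exactly four lines with no wrapped sequences.

```python
from typing import Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str       # text after the leading "@"
    sequence: str   # base calls
    quality: str    # one encoded quality character per base


def parse_fastq(text: str) -> Iterator[FastqRecord]:
    """Yield records from FASTQ text, assuming exactly four lines per read."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain four lines per read")
    # Step through in groups of four; quality lines may themselves start
    # with "@", so records are located by position, not by prefix.
    for i in range(0, len(lines), 4):
        header, seq, separator, qual = lines[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"Malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"Sequence/quality length mismatch in {header}")
        yield FastqRecord(header[1:], seq, qual)


example = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!5I\n"
records = list(parse_fastq(example))
```

Each record keeps the sequence and its quality string together, which is what lets downstream tools judge every base call individually.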

3
Q

Why is it important to have quality scores in sequencing reads?

A

Quality scores indicate the accuracy of each base call in sequencing reads. They help in detecting errors, filtering low-quality data, improving confidence in variant calling, optimizing data usage, and ensuring consistency across experiments. A higher quality score means greater confidence in the sequencing data.

4
Q

What does a Q30 score mean in sequencing?

A

A Q30 score means there is a 1 in 1,000 chance of an incorrect base call, representing 99.9% accuracy for that base.
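
The Q30 figure follows from the Phred definition Q = -10 * log10(P), where P is the probability that the base call is wrong. The sketch below works through that arithmetic; the function names are illustrative, and the decoding step assumes the Phred+33 ASCII encoding used by current Illumina FASTQ files.

```python
import math
from typing import List


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score to the probability of a wrong base call."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert an error probability back to a Phred quality score."""
    return -10 * math.log10(p)


def decode_phred33(quality_string: str) -> List[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(char) - 33 for char in quality_string]


p_q30 = phred_to_error_probability(30)   # 0.001, i.e. 1 error in 1,000
accuracy_q30 = 1 - p_q30                 # 0.999, i.e. 99.9% accuracy
scores = decode_phred33("?I!")           # "?" is Q30, "I" is Q40, "!" is Q0
```

By the same formula, Q20 corresponds to 1 error in 100 and Q40 to 1 error in 10,000.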

5
Q

How are quality scores used in filtering sequencing data?

A

Quality scores allow researchers to remove low-quality bases or reads from data to ensure only high-confidence sequences are used in analysis, reducing errors in downstream processes.
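
As a rough illustration of read-level filtering, the sketch below drops reads whose mean Phred score falls under a cutoff. The cutoff of 20, the Phred+33 offset, and the function names are assumptions made for the example; real filtering tools offer more refined criteria.

```python
def mean_quality(quality_string, offset=33):
    """Mean Phred score of one read, assuming Phred+33 encoding."""
    scores = [ord(char) - offset for char in quality_string]
    return sum(scores) / len(scores)


def filter_reads(reads, min_mean_quality=20.0):
    """Keep (name, sequence, quality) tuples whose mean quality meets the cutoff."""
    return [
        (name, seq, qual)
        for name, seq, qual in reads
        if qual and mean_quality(qual) >= min_mean_quality
    ]


example_reads = [
    ("good", "ACGT", "IIII"),   # every base Q40
    ("poor", "ACGT", "!!#$"),   # Q0, Q0, Q2, Q3
    ("mixed", "ACGT", "II!!"),  # mean of exactly Q20
]
kept = filter_reads(example_reads)
kept_names = [name for name, _, _ in kept]
```

Note that averaging Phred scores is a simplification: because the scale is logarithmic, the mean score is not the same as the score of the mean error probability.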

6
Q

Why are quality scores important for variant calling?

A

Quality scores provide confidence in the base calls, helping distinguish between true genetic variants and sequencing errors, which is critical for accurate mutation detection.

7
Q

How do quality scores improve comparability across experiments?

A

By providing a standard measure of sequencing accuracy, quality scores ensure that data from different experiments or platforms can be compared and assessed consistently.

8
Q

QC: What is the per base quality distribution?

A

The per-base quality distribution shows the range of quality scores for all bases at each position in the sequencing reads. It helps evaluate the reliability of the sequencing data by visualizing how quality varies across the length of the read.
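
A per-base quality summary can be sketched by collecting the scores at each read position and summarizing them. The function name and the choice of statistics below are illustrative (QC tools typically draw a box plot per position), and Phred+33 encoding is assumed.

```python
from statistics import mean, median


def per_base_quality(quality_strings, offset=33):
    """Summarize Phred scores at each read position across all reads."""
    max_len = max(len(q) for q in quality_strings)
    summary = []
    for pos in range(max_len):
        # Shorter reads simply do not contribute to later positions.
        scores = [ord(q[pos]) - offset for q in quality_strings if pos < len(q)]
        summary.append({
            "position": pos + 1,
            "mean": mean(scores),
            "median": median(scores),
            "min": min(scores),
            "max": max(scores),
        })
    return summary


example_quals = ["IIII5", "II?5!", "III?"]
quality_summary = per_base_quality(example_quals)
```

In this toy input the mean drops from Q40 at position 1 to Q10 at position 5, the kind of decline toward the read end that the plot is meant to reveal.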

10
Q

When would you see poor quality distribution?

A

Poor quality distribution is typically observed:

  1. At the ends of long reads: Quality often decreases due to instrument or chemistry limitations.
  2. In degraded samples: DNA degradation or poor library preparation can reduce overall quality.
  3. After long sequencing runs: General quality degradation over time during sequencing.
  4. In reads with high error rates: Caused by factors like incorrect adapter ligation or contamination.
11
Q

How can poor quality distribution be addressed?

A

Poor quality can be mitigated by applying quality trimming, which removes low-quality bases (e.g., from the ends of reads) before downstream analysis.
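
One simple form of quality trimming removes bases from the 3' end until a base at or above a threshold is reached. The sketch below shows that idea only; it is not the algorithm of any particular trimming tool, and the threshold of 20 and Phred+33 offset are assumptions.

```python
def trim_3prime(seq, qual, min_quality=20, offset=33):
    """Strip trailing bases whose Phred score is below min_quality."""
    end = len(seq)
    # Walk backwards from the 3' end while the last kept base is low quality.
    while end > 0 and ord(qual[end - 1]) - offset < min_quality:
        end -= 1
    return seq[:end], qual[:end]


# Qualities: five bases at Q40, then Q20, Q2, Q0.
trimmed_seq, trimmed_qual = trim_3prime("ACGTACGT", "IIIII5#!")
```

Here the last two bases (Q2 and Q0) are removed and the Q20 base is kept. Widely used trimmers often apply sliding-window or running-sum methods instead, which tolerate an isolated good base inside a poor-quality tail.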

12
Q

QC: Why is per tile sequence quality useful?

A

It helps detect localized problems on the flow cell, such as:

Uneven illumination.
Contamination or damage.

This allows problematic tiles to be identified and excluded from downstream analysis.

13
Q

How is per tile quality assessed?

A

It is visualized using heat maps or graphs that show quality distribution. Uniform colour indicates consistent quality, while deviations highlight problem areas.

14
Q

What is the role of quality scores in quality control?

A

Quality scores help identify low-quality reads caused by poor imaging or technical issues during sequencing. These scores ensure only high-quality reads are used for downstream analysis.

15
Q

Why must low-quality reads be removed from downstream analysis?

A

Low-quality reads can introduce errors (e.g., wrongly called bases) that bias the results of downstream processes, such as alignment, assembly, or variant calling.

16
Q

How are low-quality reads identified?

A

Low-quality reads are identified using quality-control software tools (e.g., FastQC) that evaluate quality scores and flag sequences with poor overall quality values.

17
Q

What could cause poor-quality reads?

A

Poor-quality reads may result from:

Poor imaging during sequencing runs.

Technical issues with the sequencing machine.

Degradation of the DNA sample.

17
Q

How are quality scores typically visualized?

A

Quality scores are often displayed in a graph, such as a histogram, where sequences with high quality scores cluster toward the higher end of the graph, and low-quality sequences are flagged.

18
Q

What does a sharp peak in GC content indicate?

A

A sharp peak in the GC content distribution suggests:

Possible contamination, such as adapter dimers or non-target sequences.

Overrepresentation of specific sequences in the library.

19
Q

What could overrepresentation of a single sequence in the GC content plot indicate?

A

It may indicate the library is:

Contaminated with specific sequences (e.g., from adapters).

Imbalanced, such as from amplification biases during library preparation.

20
Q

How is per base GC content different from per sequence GC content?

A

Per base GC content measures GC content at each position across all reads, helping detect biases at specific regions.

Per sequence GC content evaluates the overall GC composition of each read, and the distribution across all reads is compared to a normal distribution.
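
The difference between the two measures is easiest to see as a matter of which direction the reads are summarized in: across positions (one value per column) or across reads (one value per read). The sketch below uses illustrative function names and assumes non-empty reads.

```python
def per_sequence_gc(reads):
    """GC percentage of each read: one value per read."""
    return [
        100.0 * sum(base in "GC" for base in read.upper()) / len(read)
        for read in reads
    ]


def per_base_gc(reads):
    """GC percentage at each position across reads: one value per position."""
    max_len = max(len(read) for read in reads)
    result = []
    for pos in range(max_len):
        column = [read[pos].upper() for read in reads if pos < len(read)]
        result.append(100.0 * sum(base in "GC" for base in column) / len(column))
    return result


gc_reads = ["GCAT", "GGAT", "ACAT", "GCGC"]
gc_by_read = per_sequence_gc(gc_reads)      # [50.0, 50.0, 25.0, 100.0]
gc_by_position = per_base_gc(gc_reads)      # [75.0, 100.0, 25.0, 25.0]
```

A histogram of `gc_by_read` is what gets compared against the expected bell-shaped curve, while `gc_by_position` reveals position-specific bias, such as the GC-rich first two positions in this toy input.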

21
Q

What does a normal GC content distribution look like?

A

In a normal library, GC content follows a smooth, bell-shaped curve centered around the average GC percentage of the species being sequenced.

22
Q

Why is it necessary to remove adapter sequences?

A

Adapter sequences need to be removed because:

  • They contaminate sequencing data.
  • They interfere with alignment and downstream analyses.
  • They introduce bias, affecting the accuracy of results.
23
Q

What is the role of FastQC in adapter sequence removal?

A

FastQC flags overrepresented sequences (including adapters) and provides details about the type of adapters present (e.g., Illumina Universal Adapter). Users can then use trimming software to remove them.
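
Once an adapter has been reported, the trimming step can be sketched as cutting each read at the first occurrence of the adapter. This is a deliberately simplified illustration: the adapter string below is an assumed example that should be checked against the library kit documentation, the function name is hypothetical, and dedicated trimming software also handles mismatches and partial adapters at the read end.

```python
# Assumed example adapter prefix; confirm the real sequence for your library.
EXAMPLE_ADAPTER = "AGATCGGAAGAGC"


def trim_adapter(seq, qual, adapter):
    """Cut the read (and its quality string) at the first exact adapter match."""
    index = seq.find(adapter)
    if index == -1:
        return seq, qual  # no adapter found; read is unchanged
    return seq[:index], qual[:index]


raw_seq = "ACGTACGT" + EXAMPLE_ADAPTER + "TTTT"
raw_qual = "I" * len(raw_seq)
clean_seq, clean_qual = trim_adapter(raw_seq, raw_qual, EXAMPLE_ADAPTER)
```

The quality string is trimmed together with the sequence so that the two stay the same length, as FASTQ requires.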

24
Q

QC: What is per base sequence content?

A
25
Q

QC: What is sequence duplication?

A
26
Q

QC: Why do we look at per base N content?

A
27
Q

Examples of bad Illumina runs: why would they occur?

A
28
Q

What is de novo assembly?

A
29
Q

De novo assembly: what can help with filling the gaps?

A
30
Q

Explain the principle behind de novo assembly.

A
31
Q

Why is repetitive sequence the most common source of assembly errors?

A
32
Q

What is reference mapping?

A
33
Q

What is mapping output?

A
34
Q

What is a single-nucleotide polymorphism (SNP)?

A
35
Q

How can SNPs present themselves in coding sequences?

A
36
Q

Why can’t the order of genes be predicted in the mapped sequence?

A
37
Q

What are the different types of SNP?

A
38
Q

What is TRAMS?

A
39
Q

What is the workflow for TRAMS?

A