Trimming Flashcards

1
Q

Question 1

Define the trimming process.

A

Trimming is the removal of low-quality or artefactual sequences, or portions of sequences, prior to downstream analysis, ideally by surgically eliminating only the low-quality regions.

2
Q

Question 2

Trimmomatic

What is the advantage of trimming?

A

Trimming has been shown to improve overall data quality and to enable better results in downstream analysis.


3
Q

Question 3

Trimmomatic

What is the drawback of excessive trimming?

A

Excessive trimming may reduce the quality of downstream results.

4
Q

Question 4

What is quality trimming?

A

A typical approach towards quality filtering is to assess the quality of bases and determine where to truncate the read, retaining the 5’ portion, and discarding the lower quality 3’ portion.

5
Q

Question 5

Trimmomatic

How does Trimmomatic perform quality trimming?

A

Trimmomatic applies a sliding window approach that examines the AVERAGE quality of a set of contiguous bases by sliding a window over the read starting at the 5’ end and trimming if the (average) quality falls below a threshold.
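The sliding-window idea can be sketched in Python as follows. This is a simplified illustration, not Trimmomatic's own code: the function name `sliding_window_keep_length` is hypothetical, the qualities are assumed to be already-decoded Phred scores, and the cut point follows the rule stated later in these cards (trimming starts at the 3’ base of the first below-threshold window).

```python
def sliding_window_keep_length(quals, window_size, min_mean_quality):
    """Return how many 5' bases to keep under a sliding-window rule.

    quals: list of Phred scores (integers), 5' end first.
    """
    if len(quals) < window_size:
        # Assumption for this sketch: a read shorter than one window is dropped.
        return 0
    # Slide the window from the 5' end towards the 3' end.
    for start in range(len(quals) - window_size + 1):
        window = quals[start:start + window_size]
        if sum(window) / window_size < min_mean_quality:
            # Cut at the 3' base of the first below-threshold window:
            # every base before that one is kept.
            return start + window_size - 1
    # No window fell below the threshold: keep the whole read.
    return len(quals)


# Window size 4, minimum mean quality 15 (as in SLIDINGWINDOW:4:15).
quals = [20, 25, 19, 7, 23, 15, 7]
print(sliding_window_keep_length(quals, 4, 15))  # prints 6
```

In this example the first three windows average 17.75, 18.5 and 16; the fourth window (7, 23, 15, 7) averages 13, so only its last base is removed.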

6
Q

Question 6

Trimmomatic

What is Trimmomatic's simple mode for adapter trimming?

A

In the simple mode (which is most useful for single-end reads), each read is scanned from the 5’ end to the 3’ end to determine if any of the user-provided adapter sequences are present. If the adapter overlaps with the 5’ end of the read, then the entire read is discarded. Otherwise, the 3’ terminus of the read is discarded starting from the first overlapping nucleotide.
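A minimal sketch of the decision logic described above, under strong simplifying assumptions: it uses exact string matching only (no mismatches and no alignment score), and both the function name `simple_clip` and the `min_overlap` parameter are hypothetical and are not Trimmomatic options.

```python
def simple_clip(read, adapters, min_overlap=5):
    """Return the clipped read, or None if the whole read is discarded.

    Exact matching only; an adapter may run off the 3' end of the read,
    provided at least `min_overlap` bases overlap.
    """
    cut = len(read)  # earliest adapter start found so far
    for adapter in adapters:
        # Scan the read from the 5' end to the 3' end.
        for pos in range(len(read)):
            overlap = read[pos:pos + len(adapter)]
            if len(overlap) >= min_overlap and adapter.startswith(overlap):
                cut = min(cut, pos)
                break
    if cut == 0:
        # The adapter overlaps the 5' end: discard the entire read.
        return None
    # Otherwise discard the 3' part, starting at the first overlapping base.
    return read[:cut]


adapter = "AGATCGGAAGAGC"  # illustrative adapter sequence
print(simple_clip("ACGTACGT" + adapter, [adapter]))  # prints ACGTACGT
print(simple_clip(adapter + "ACGT", [adapter]))      # prints None
```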

7
Q

Question 7

When should trimming be performed?

A

Quality trimming should be applied especially when the overall quality is poor towards the 3’ end of the reads.

8
Q

Question 8

What is the palindrome mode of Trimmomatic?

A

Trimmomatic has a “palindrome mode” that is optimized for the detection of “adapter read-through”. When “read-through” occurs, both reads in a pair will comprise the same sequence (in reverse complementary orientation) followed by contaminating sequence from the “opposite” adapter.
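To make the read-through situation concrete, the following toy simulation builds a read pair from a fragment that is shorter than the read length. It only illustrates the definition above and is not how Trimmomatic detects read-through; the function names and the two adapter strings are made up for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def simulate_pair(insert, adapter_fwd, adapter_rev, read_length):
    """Simulate a read pair; adapters are read when the insert is short."""
    forward = (insert + adapter_fwd)[:read_length]
    reverse = (reverse_complement(insert) + adapter_rev)[:read_length]
    return forward, reverse


insert = "AAACCCGGGT"  # 10-base fragment, shorter than the 15-base reads
fwd, rev = simulate_pair(insert, "TTTTGGGG", "CCCCAAAA", 15)

# Both reads start with the same insert, in reverse-complementary orientation ...
print(fwd[:10] == reverse_complement(rev[:10]))  # prints True
# ... followed by adapter sequence.
print(fwd[10:], rev[10:])  # prints TTTTG CCCCA
```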

9
Q

Question 9

Trimmomatic

What does the seed mismatches parameter control?

ILLUMINACLIP:<fastaWithAdaptersEtc>:<seed mismatches>:<palindrome clip threshold>:<simple clip threshold>

A

The parameter seed mismatches controls the maximum number of mismatches allowed between the adapter sequence and a subsequence of the read to still be considered a match.

10
Q

Question 10

Trimmomatic

When does trimming occur in the single-end case?

ILLUMINACLIP:<fastaWithAdaptersEtc>:<seed mismatches>:<palindrome clip threshold>:<simple clip threshold>

A

The score of the full alignment (between the adapter and a read subsequence) must exceed the simple clip threshold for trimming to be performed.

11
Q

Question 11

How is the alignment score between the adapter and a subsequence of the read calculated?

ILLUMINACLIP:<fastaWithAdaptersEtc>:<seed mismatches>:<palindrome clip threshold>:<simple clip threshold>

A

The full alignment score is calculated by increasing the score by 0.6 for each matching base and by reducing it by Q/10 for each mismatched base (where Q is the Phred-encoded quality score of the mismatched base). The score of a perfect match of n bases is thus n × 0.6, which is about 7 for a 12-base perfect match and 15 for a 25-base perfect match.

Therefore, values of between 7–15 are recommended for this parameter.
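The scoring rule above can be checked with a few lines of Python. This is a sketch of the rule as stated on this card (0.6 per match, Q/10 per mismatch) for an ungapped, position-by-position comparison; the function name `alignment_score` is hypothetical.

```python
def alignment_score(adapter, read_segment, quals):
    """Score an ungapped alignment: +0.6 per match, -Q/10 per mismatch.

    quals holds the Phred score of each base in read_segment.
    """
    score = 0.0
    for adapter_base, read_base, q in zip(adapter, read_segment, quals):
        if adapter_base == read_base:
            score += 0.6
        else:
            score -= q / 10
    return score


# Perfect matches of 12 and 25 bases.
print(round(alignment_score("A" * 12, "A" * 12, [30] * 12), 1))  # prints 7.2
print(round(alignment_score("A" * 25, "A" * 25, [30] * 25), 1))  # prints 15.0
```

A single mismatch at a high-quality base is costly: with Q = 30 it subtracts 3.0, the gain from five matching bases.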

12
Q

Question 12

What does each parameter mean?

java -jar trimmomatic-0.36.jar SE \
-phred64 \
-threads 2 \
-trimlog son.log \
Sons_exome_fastq_file_1.fq \
trimmed_output.fq \
ILLUMINACLIP:./adapters/TruSeq2-SE.fa:2:30:10 \
LEADING:3 \
TRAILING:3 \
SLIDINGWINDOW:4:15 \
MINLEN:36 \
TOPHRED33

A

SE

Single-end mode: one input file and one output file.

-phred64

Quality scores in the FASTQ file were encoded with Phred+64.

-threads 2

The number of threads to be used by Trimmomatic.

-trimlog son.log

Write a log to the indicated file.

Sons_exome_fastq_file_1.fq and trimmed_output.fq

The input and output files are indicated at this point in the command line. These files are required to be in FASTQ format and may be compressed (gz).

ILLUMINACLIP:./adapters/TruSeq2-SE.fa:2:30:10

The location of the file with the Illumina adapters is given, followed by the seed mismatches, the palindrome clip threshold, and the simple clip threshold. Here we allow up to 2 mismatches to the adapter sequence and require a score of at least 10 for the alignment between any adapter sequence and a read. The value of 30 is the palindrome clip threshold, which is not used in SE mode.

LEADING:3 and TRAILING:3

Specifies the minimum quality required to keep a leading (5’) or trailing (3’) base (here, a minimum Phred score of 3 is indicated).

SLIDINGWINDOW:4:15

Window size of 4, minimum mean quality in the window of 15.

MINLEN:36

Discard all sequences that are shorter than 36 bases after the other trimming operations.

TOPHRED33

Convert quality scores to Phred+33 in the output file.
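A few of these steps are simple enough to sketch in Python. The functions below are illustrative stand-ins with hypothetical names, not Trimmomatic code: `to_phred33` re-encodes a quality string as TOPHRED33 is described to do, `trim_ends` mimics LEADING/TRAILING on decoded Phred scores, and `passes_minlen` mimics MINLEN.

```python
def to_phred33(qual_phred64):
    """Re-encode a Phred+64 quality string as Phred+33."""
    # Each character encodes Q as chr(Q + 64); re-encode it as chr(Q + 33).
    return "".join(chr(ord(c) - 64 + 33) for c in qual_phred64)


def trim_ends(quals, min_quality):
    """Return (start, end) of the slice kept after removing leading and
    trailing bases whose quality is below min_quality."""
    start, end = 0, len(quals)
    while start < end and quals[start] < min_quality:
        start += 1
    while end > start and quals[end - 1] < min_quality:
        end -= 1
    return start, end


def passes_minlen(length, min_len):
    """A read is kept only if it is at least min_len bases long."""
    return length >= min_len


print(to_phred33("hB"))                # prints I#
print(trim_ends([2, 2, 30, 30, 2], 3)) # prints (2, 4)
print(passes_minlen(35, 36))           # prints False
```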

13
Q

Question 13

Trimmomatic

Consider the average scores of two windows of quality characters. The character codes can be obtained with the Unix command echo -n "5:4(" | od -A n -t d1, which gives 53, 58, 52, 40 for 5:4(.

Subtracting 33 from each number and taking the average gives an average score of 17.75.

The average score of (80( is 13.

Which window triggers trimming (the one averaging 13 or the one averaging 17.75)?

SLIDINGWINDOW:4:15

A

The window (80( triggers trimming. The character codes of 5:4( are 53, 58, 52, 40 (obtainable with the Unix command echo -n "5:4(" | od -A n -t d1). Subtracting 33 from each number and taking the average gives an average score of 17.75, which is above the threshold of 15. However, the average score of (80( is only 13, which is below the threshold and triggers the trimming of the read, as shown in the illustration.
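The same arithmetic can be reproduced in Python instead of od. This is a small sketch with hypothetical helper names, assuming the Phred+33 offset used in the calculation above.

```python
def phred33_scores(quality_string):
    """Decode a quality string by subtracting 33 from each character code."""
    return [ord(c) - 33 for c in quality_string]


def mean_quality(quality_string):
    """Average Phred score of a window of quality characters."""
    scores = phred33_scores(quality_string)
    return sum(scores) / len(scores)


threshold = 15  # from SLIDINGWINDOW:4:15
for window in ("5:4(", "(80("):
    mean = mean_quality(window)
    print(window, phred33_scores(window), mean, mean < threshold)
# prints:
# 5:4( [20, 25, 19, 7] 17.75 False
# (80( [7, 23, 15, 7] 13.0 True
```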

14
Q

Question 14

In this case, where does trimming start?

A

Trimming starts at the 3’ base of the first below-threshold window, which in our example corresponds to the “T” of the ACCT with quality string (80(. Note that the middle portion of the sequence and the corresponding quality string have been omitted for better legibility.
