Lecture 3 Flashcards
Define “massive parallel sequencing” in NGS:
Massive: several regions at once; parallel: several samples at a time
What are the 2 main NGS platforms currently used?
Ion Torrent and Illumina
What is Ion Torrent also referred to as?
“Label-free sequencing” - fluorescence or spikes of light are not used
What does Ion Torrent measure?
Changes in pH (hydrogen ions are released as nucleotides are incorporated)
What is Ion Torrent characterized by?
High accuracy and good coverage
How long are the reads sequenced by Ion Torrent?
About 200 bp
How long are the average Sanger sequence reads?
600-800 bp, up to about 1000 bp
What is Illumina characterized by?
Bridge amplification
How is Illumina sequencing visualized?
By fluorescence → each nucleotide is linked to a different fluorophore, which emits a unique signal
Describe the library structure of an Illumina sample:
A DNA insert flanked by “read 1” and “read 2” (primer sites similar to the forward and reverse primers in Sanger), two indexes, one on either side, that serve as an 8 bp “barcode” exclusive to each sample, and 2 adaptors, called P5 and P7, complementary to the oligos linked to the flow cell
What is the process called in which the pooled reads from an Illumina sequencing run are sorted so that each obtained sequence is associated with its sample?
Demultiplexing → each read is assigned to a unique sample
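A minimal sketch of the idea behind demultiplexing, assuming each read already comes paired with its 8 bp index sequence. The function name `demultiplex`, the sample names, and the barcodes are hypothetical, and exact matching is assumed here for simplicity (real tools usually tolerate a mismatch or two).

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group reads by sample using an exact match on the index (barcode).

    reads: iterable of (index_sequence, insert_sequence) tuples.
    barcode_to_sample: dict mapping index sequence -> sample name.
    Reads whose index is not in the sample sheet go to 'undetermined'.
    """
    by_sample = defaultdict(list)
    for index_seq, insert_seq in reads:
        sample = barcode_to_sample.get(index_seq, "undetermined")
        by_sample[sample].append(insert_seq)
    return dict(by_sample)


# Hypothetical sample sheet and pooled reads, for illustration only
sample_sheet = {"ACGTACGT": "sample_A", "TTGCAAGC": "sample_B"}
pooled_reads = [
    ("ACGTACGT", "GATTACA"),
    ("TTGCAAGC", "CCGGTTA"),
    ("ACGTACGT", "TTTAGGC"),
    ("NNNNNNNN", "ACACACA"),  # unreadable index, cannot be assigned
]
demultiplexed = demultiplex(pooled_reads, sample_sheet)
```

Each key of `demultiplexed` is a sample name, and its value is the list of reads carrying that sample's barcode.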
What is cluster generation?
Amplification of the library fragments on the flow cell, producing clusters of identical copies
What is a quality score?
A prediction of the probability of an error in base calling
When measuring Phred quality score, what probability corresponds to high accuracy?
A low probability of error (which corresponds to a high Phred score)
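The link between the score and the probability can be made concrete with the standard Phred definition, Q = -10 · log10(P), where P is the probability that the base call is wrong. The sketch below is only illustrative; the function names are hypothetical.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score Q to the base-calling error probability P."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability P (0 < P <= 1) to a Phred quality score Q."""
    return -10 * math.log10(p)


# Higher Q means lower error probability, hence higher accuracy
for q in (10, 20, 30, 40):
    p = phred_to_error_prob(q)
    print(f"Q{q}: error probability {p:g}")
```

For example, Q30 corresponds to an error probability of 1 in 1000, and Q20 to 1 in 100.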
What is read depth?
The total number of bases sequenced and aligned at a given reference base position
What is coverage defined as?
The average number of sequenced bases that align to each base of the reference DNA → e.g., a whole genome sequenced at 30x coverage means that each base in that genome was sequenced 30 times on average
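A small sketch of how read depth and coverage relate, assuming the per-position depths are already known. It also shows the common estimate coverage = (number of reads × read length) / genome size. The function names and all numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def mean_coverage(depths):
    """Average coverage: the mean of the read depth over all reference positions."""
    return sum(depths) / len(depths)


def expected_coverage(num_reads, read_length, genome_size):
    """Estimate average coverage from the total sequenced bases and the genome size."""
    return num_reads * read_length / genome_size


# Hypothetical read depth at five reference positions
depths = [28, 31, 30, 33, 28]
observed = mean_coverage(depths)

# Hypothetical run: 1000 reads of 150 bp over a 5000 bp reference
estimated = expected_coverage(num_reads=1000, read_length=150, genome_size=5000)
```

Read depth is a per-position value, while coverage summarizes it across the whole reference, so both toy inputs here work out to 30x.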
What does NGS accuracy depend on?
Coverage
How can using paired-end sequencing reduce the chance of introducing an error in the base calling?
We can check whether a detected genetic variant is called in a balanced way
In Illumina paired-end sequencing, between what should the variant call be balanced?
Between read 1 and read 2 → otherwise the variant might be an error
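One way to picture this balance check is to count how many variant-supporting reads come from read 1 versus read 2. This is a rough sketch only: the function names and the tolerance of 0.2 are hypothetical choices for illustration, not values from the lecture or from any specific variant caller.

```python
def read1_fraction(alt_in_read1, alt_in_read2):
    """Fraction of variant-supporting reads that come from read 1."""
    total = alt_in_read1 + alt_in_read2
    if total == 0:
        return None  # no read supports the variant
    return alt_in_read1 / total


def is_balanced(alt_in_read1, alt_in_read2, tolerance=0.2):
    """True if variant support is split roughly evenly between read 1 and read 2.

    tolerance is a hypothetical cutoff: the read 1 fraction must lie
    within 0.5 +/- tolerance for the call to count as balanced.
    """
    fraction = read1_fraction(alt_in_read1, alt_in_read2)
    if fraction is None:
        return False
    return abs(fraction - 0.5) <= tolerance


balanced_call = is_balanced(12, 10)   # support seen in both reads
suspicious_call = is_balanced(20, 0)  # support seen only in read 1
```

A variant seen only in read 1 or only in read 2 fails the check and would be treated as a possible error.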
What are some pros of paired-end sequencing?
Millions of sequencing reactions are performed in parallel, which can lead to the identification of changes in copy number
What are some cons of paired-end sequencing?
A very large amount of data is collected and both false positives and false negatives are possible
Currently, what is required at the end of data analysis?
The use of Sanger sequencing because the results obtained by NGS require validation