Core lecture Flashcards
Which format usually stores sequencing data?
FASTQ for raw reads (sequence plus per-base quality scores); FASTA stores the sequence only
What does “>” mean in FASTA files?
Indicates the start of a new sequence entry
What do the ASCII character and the Phred score do?
Together they encode base-call quality: each ASCII character represents a Phred score, which indicates how reliable each nucleotide sequenced in a run is.
What does Q20 mean?
Q20 means a quality value of 20. The error probability is P = 10^(-Q/10) = 10^(-20/10) = 1/100 = 0.01, i.e. a 1 in 100 chance that the base call is wrong.
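The Q20 arithmetic above can be checked with a few lines of Python. This is only a minimal sketch; the function names are made up for illustration.

```python
import math

def phred_to_error_prob(q: float) -> float:
    """Error probability for a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p: float) -> float:
    """Inverse conversion: Q = -10 * log10(P)."""
    return -10 * math.log10(p)

# Print the error probability for a few common quality values.
for q in (10, 20, 30, 40):
    print(f"Q{q}: P(error) = {phred_to_error_prob(q):g}")
```

Every 10 points of Q lowers the error probability by a factor of 10.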
What is the typical purpose of base calling in NGS analysis?
To identify the nucleotide sequence from the raw data.
What are paired end reads?
Sequences read from both ends of a DNA fragment
What is the significance of multiplexing?
It allows multiple samples to be sequenced together
What is the main advantage of storing quality scores as ASCII characters?
It reduces file size by using only 1 byte per score
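A short sketch of how one character per base maps back to a Phred score. It assumes the Phred+33 offset (the Sanger / Illumina 1.8+ convention); the quality string is invented for illustration.

```python
def decode_quality(quality_string: str, offset: int = 33) -> list[int]:
    """Turn each ASCII character into its Phred score (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]

def encode_quality(scores: list[int], offset: int = 33) -> str:
    """Turn Phred scores back into a one-character-per-base string."""
    return "".join(chr(q + offset) for q in scores)

# Made-up quality string: 'I' is ASCII 73, '5' is 53, '+' is 43, '!' is 33.
scores = decode_quality("II5+!")
print(scores)
```

Older Illumina pipelines used an offset of 64, so the offset is kept as a parameter here.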
What is the implication of having a “mate-pair”?
The two reads of the pair are far apart and face away from each other
What is the “Seed and Extend algorithm”?
The seed and extend algorithm is used to align sequenced DNA reads to a reference genome.
Seed: the algorithm starts by finding a small piece of the sequencing read (the seed) that matches the reference genome exactly. This is usually done quickly using a hash table.
Extend: once a seed match is found, the algorithm tries to extend the match in both directions to align the rest of the read to the genome. It keeps extending until too many mismatches are found.
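The two steps can be sketched in Python as below. This is a simplified illustration, not how any particular aligner works: it assumes ungapped extension only (no insertions or deletions), and the function names, the toy reference and the mismatch limit are hypothetical.

```python
from collections import defaultdict

def build_kmer_index(reference, k):
    """Hash table mapping every k-mer in the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def seed_and_extend(read, reference, index, k, max_mismatches=2):
    """Return (reference_start, mismatches) for the best ungapped hit, or None."""
    best = None
    for offset in range(len(read) - k + 1):
        seed = read[offset:offset + k]
        for pos in index.get(seed, []):        # seed step: exact hash lookup
            start = pos - offset               # where the whole read would begin
            if start < 0 or start + len(read) > len(reference):
                continue
            window = reference[start:start + len(read)]
            mismatches = 0
            for read_base, ref_base in zip(read, window):  # extend step
                if read_base != ref_base:
                    mismatches += 1
                    if mismatches > max_mismatches:
                        break                  # too many mismatches: give up
            if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
                best = (start, mismatches)
    return best

# Toy example: the read carries one mismatch relative to the reference.
reference = "ACGTACGTTTGACCA"
index = build_kmer_index(reference, k=3)
hit = seed_and_extend("GTTAGAC", reference, index, k=3)
print(hit)
```

Production aligners typically replace the simple base-by-base comparison with a gapped alignment step, but the seed lookup followed by extension follows the same pattern.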
What does NGS stand for?
Next-generation sequencing
Which method is commonly used for sequencing DNA in NGS?
Sequencing by synthesis, performed on fragments that have first been PCR-amplified
What is a primary application of NGS technology?
Genetic disorder diagnosis
How does NGS differ from traditional Sanger sequencing?
NGS allows parallel sequencing of multiple fragments
What is a significant advantage of NGS over previous sequencing technologies?
Higher throughput
In NGS, what is the purpose of using barcodes in sequencing libraries?
To track the sample origin
What type of biological molecules can be sequenced using NGS?
DNA and RNA
How has NGS impacted the field of personalized medicine?
It has enabled tailored treatments based on genetic makeup
What is one of the challenges in handling NGS data?
The high volume of data
What does the term “read length” refer to in NGS?
The number of bases sequenced in a single read (this can be shorter than the DNA fragment the read comes from)
How has NGS technology influenced cancer research?
By identifying genetic mutations associated with cancers
What is the role of bioinformatics in NGS?
To analyze and interpret the vast amount of sequencing data
What does de novo sequencing mean in the context of NGS?
Sequencing without a reference genome
How does NGS contribute to the study of rare genetic disorders?
By enabling the identification of genetic mutations responsible
In NGS, what does the term “coverage” refer to?
The number of times a nucleotide is sequenced
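Coverage can be looked at per base or as an average over the genome. The sketch below assumes the common estimate mean coverage = number of reads x read length / genome size; the function names and the numbers are illustrative only.

```python
def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Expected average coverage: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size

def per_base_depth(read_starts: list[int], read_length: int, genome_size: int) -> list[int]:
    """Count how many reads overlap each position (0-based read start positions)."""
    depth = [0] * genome_size
    for start in read_starts:
        for pos in range(start, min(start + read_length, genome_size)):
            depth[pos] += 1
    return depth

# Illustrative numbers: 1 million reads of 150 bases on a 5 Mb genome.
print(mean_coverage(1_000_000, 150, 5_000_000))
# Three toy reads of length 4 on an 8-base "genome".
print(per_base_depth([0, 2, 4], read_length=4, genome_size=8))
```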
What is the significance of multiplexing in NGS?
Allows sequencing of multiple samples simultaneously
Which is an important consideration in NGS data analysis?
The accuracy and integrity of the data
How does NGS facilitate the study of microbial communities?
Through metagenomics, sequencing DNA from environmental samples
What role does NGS play in agricultural genetics?
Assisting in the development of genetically modified crops
What is one of the future directions or potentials of NGS technology?
Enhanced understanding and application in clinical settings
What is the main technique used in 1st generation sequencing?
Sanger sequencing
Which generation of sequencing is known for introducing massively parallel sequencing?
2nd generation
What is a key characteristic of 3rd generation sequencing technologies?
Single-molecule sequencing
Which technology is typically associated with 2nd generation sequencing?
Illumina HiSeq
What was a major limitation of 1st generation sequencing technologies?
Low throughput
Which is a benefit of 3rd generation sequencing?
Real-time sequencing
2nd generation sequencing is also known as:
Next-Generation Sequencing
Which generation of sequencing first introduced the concept of ‘reads’?
1st generation, although the concept was developed further in the 2nd generation
In which generation does sequencing occur without the need to stop and start the process?
3rd generation (technologies such as PacBio and Oxford Nanopore sequence in real time, without the need to stop and start)
Which sequencing technology typically generates the longest read lengths?
Oxford Nanopore
What is the first step in assessing NGS data quality?
Generating FastQC reports
Why is trimming performed on NGS data?
To remove low-quality bases from the ends of reads
When is k-mer correction typically performed in the NGS data preprocessing pipeline?
Before de novo assembly
What does a sliding window in quality control processing do?
Identifies and trims low-quality regions in reads
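A minimal sketch of sliding-window trimming, assuming the read is scanned from the 5' end and cut at the first window whose mean quality falls below a threshold. The window size and threshold are example values, not recommendations, and real trimming tools differ in the details.

```python
def sliding_window_trim(seq: str, quals: list[int], window: int = 4, min_mean_q: float = 20):
    """Cut the read where the mean quality of a window first drops below min_mean_q."""
    if not quals:
        return seq, quals
    # Reads shorter than the window are evaluated as a single window.
    for start in range(max(len(quals) - window + 1, 1)):
        win = quals[start:start + window]
        if sum(win) / len(win) < min_mean_q:
            return seq[:start], quals[:start]  # keep everything before the bad window
    return seq, quals                          # no low-quality window found

# Toy read whose quality collapses towards the 3' end.
trimmed_seq, trimmed_quals = sliding_window_trim(
    "ACGTACGTAC", [30, 30, 30, 30, 30, 30, 10, 10, 5, 5]
)
print(trimmed_seq, trimmed_quals)
```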
Why is it important to remove adapters in NGS data preprocessing?
They interfere with alignment and de novo assembly
In the context of NGS, why might sequences be overrepresented?
Several reasons: library preparation bias, sequencing bias, biological factors, or contamination
Which file format is often used for storing compressed NGS data?
gzip (for example .fastq.gz files)
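Compressed files can usually be read directly, without decompressing them to disk first. The sketch below uses Python's standard gzip module and assumes simple 4-line FASTQ records (no wrapped sequence lines); the function name is hypothetical.

```python
import gzip
from typing import Iterator

def read_fastq_gz(path: str) -> Iterator[tuple[str, str, str]]:
    """Yield (read_id, sequence, quality_string) from a gzip-compressed FASTQ file."""
    with gzip.open(path, "rt") as handle:       # "rt" = decompress and read as text
        while True:
            header = handle.readline().rstrip("\n")
            if not header:                       # end of file
                break
            seq = handle.readline().rstrip("\n")
            handle.readline()                    # skip the '+' separator line
            qual = handle.readline().rstrip("\n")
            yield header[1:], seq, qual          # drop the leading '@'
```

Usage would look like `for read_id, seq, qual in read_fastq_gz("sample.fastq.gz"): ...`, where the file name is a placeholder.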
What is an outcome of merging paired-end reads in NGS data preprocessing?
Error correction for overlapping regions
What does it mean if the FastQC report indicates poor quality at the start of reads?
The first few bases may need to be trimmed
What is a common method to handle large amounts of NGS data?
Keep data compressed whenever possible and use workflow managers like Snakemake
What is a Single Nucleotide Variation (SNV)?
A change in a single nucleotide
What does a polymorphism (SNP) imply?
The variant is present in more than 1% of the population
What is the significance of transition/transversion ratio in human genetics?
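It indicates the relative frequency of the two mutation types: transitions (purine to purine or pyrimidine to pyrimidine, e.g. A<->G, C<->T) versus transversions (purine to pyrimidine or the reverse). It is also used as a sanity check on variant calls.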
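A small sketch of how the ratio can be computed from a list of SNVs. It assumes each variant is given as a (ref, alt) pair of single bases with ref different from alt; the example variants are made up.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def is_transition(ref: str, alt: str) -> bool:
    """Transition: purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T)."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES

def ti_tv_ratio(snvs: list[tuple[str, str]]) -> float:
    """Number of transitions divided by number of transversions."""
    ti = sum(1 for ref, alt in snvs if is_transition(ref, alt))
    tv = len(snvs) - ti
    return ti / tv if tv else float("inf")

# Made-up variants: three transitions and one transversion (A>C).
example = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "A")]
print(ti_tv_ratio(example))
```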
What are the consequences of coding mutations?
Can change amino acid sequences
What differentiates germline mutations from somatic mutations?
Germline mutations can be passed to offspring
What is the role of untranslated regions (UTRs) in gene expression?
Regulating gene expression
How do pathogenic mutations affect organisms?
Can lead to diseases or disorders
What is a consequence of non-coding mutations?
Can lead to changes in gene expression
What is a polygenic risk score used for?
Predicting the risk of complex diseases
What is the focus of personalized medicine?
Tailoring medical treatment to individual genetic profiles
What is the primary goal of variant calling in NGS data analysis?
Identifying differences from a reference genome
Which format is commonly used to store variant calling data?
VCF
What does ‘hard filtering’ in variant calling refer to?
Applying strict, fixed thresholds on variant annotations (such as quality or depth) to separate true variants from likely false positives
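Hard filtering amounts to a fixed pass/fail rule per variant. In the sketch below the thresholds (QUAL of at least 30, depth of at least 10) and the dictionary layout are assumptions chosen for illustration, not recommended values.

```python
def passes_hard_filter(variant: dict, min_qual: float = 30.0, min_depth: int = 10) -> bool:
    """Keep a variant only if it clears every fixed threshold (illustrative values)."""
    return variant["QUAL"] >= min_qual and variant["DP"] >= min_depth

# Made-up variant records.
variants = [
    {"pos": 101, "QUAL": 55.0, "DP": 32},
    {"pos": 202, "QUAL": 12.0, "DP": 40},   # fails on quality
    {"pos": 303, "QUAL": 60.0, "DP": 4},    # fails on depth
]
kept = [v["pos"] for v in variants if passes_hard_filter(v)]
print(kept)
```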
What is a ‘genotype’ in the context of variant calling?
The genetic makeup at a particular locus
How does ‘base quality score recalibration’ enhance variant calling?
By recalibrating the probability of base call errors
In VCF files, what does the ‘INFO’ field provide?
Metadata about the variants
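The INFO column is a semicolon-separated list of key=value pairs and flags. The sketch below splits it into a dictionary; the VCF record is invented for illustration and all values are kept as strings.

```python
def parse_info(info_field: str) -> dict:
    """Split a VCF INFO string into a dict; keys without '=' are treated as flags."""
    info = {}
    if info_field == ".":                 # '.' means the field is empty
        return info
    for entry in info_field.split(";"):
        if "=" in entry:
            key, value = entry.split("=", 1)
            # Comma-separated values (e.g. one per ALT allele) become lists.
            info[key] = value.split(",") if "," in value else value
        else:
            info[entry] = True
    return info

# Made-up record; columns are CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO.
line = "chr1\t12345\trs1\tA\tG\t50\tPASS\tDP=42;AF=0.5;DB"
fields = line.split("\t")
info = parse_info(fields[7])
print(info)
```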
What role do known polymorphic sites play in variant filtration?
Help distinguish between true and false variants
What is indicated by a high Phred quality score in variant calling?
High confidence in the variant call
Why is ‘depth of coverage’ important in variant calling?
It indicates how many times a base is sequenced; the more reads that cover a position, the more confident the variant call
What is the significance of ‘allele frequency’ in variant analysis?
It indicates how common or rare the allele is in the population
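A minimal sketch of estimating allele frequency from called genotypes. It assumes VCF-style genotype strings such as "0/1" or "1|1", counts every non-reference allele as alternate, and skips missing calls; the sample genotypes are made up.

```python
def alt_allele_frequency(genotypes: list[str]) -> float:
    """Fraction of called alleles that are non-reference (missing '.' alleles skipped)."""
    alt = total = 0
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue
            total += 1
            if allele != "0":
                alt += 1
    return alt / total if total else float("nan")

# Four made-up samples: hom-ref, het, hom-alt, and one missing call.
print(alt_allele_frequency(["0/0", "0/1", "1/1", "./."]))
```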