Bioinformatics Exam Questions Flashcards
During Sanger sequencing, it is commonly observed that base call quality deteriorates toward the terminal regions of sequencing reads. What is the main technical factor contributing to this decline in data quality at the read termini?
A) Low concentration of chain-terminating nucleotides, leading to incomplete termination events.
B) Increased likelihood of dNTP misincorporation near the end of the sequence.
C) Variability in fragment mass and electrophoretic mobility affecting resolution.
D) Reduced signal intensity due to the lower quantity of fragments.
D) Reduced signal intensity due to the lower quantity of fragments.
Select all advantages of Sanger sequencing.
A) Enhanced scalability and throughput suitable for high-volume sequencing projects.
B) Superior accuracy in homopolymeric tracts due to lower indel error rates.
C) Ability to generate the longest read lengths among sequencing technologies.
D) Greater cost efficiency when performing large-scale sequencing.
B) Superior accuracy in homopolymeric tracts due to lower indel error rates.
What is the primary functional role of Illumina adapters within the sequencing process?
A) They incorporate fluorescent markers essential for the detection of base incorporations during sequencing.
B) They facilitate the binding of DNA fragments to complementary oligonucleotides on the flow cell.
C) They protect DNA fragments from degradation during the sequencing process by providing a stable binding interface.
D) They allow for the simultaneous sequencing of multiple DNA fragments by providing unique barcode sequences.
B) They facilitate the binding of DNA fragments to complementary oligonucleotides on the flow cell.
Which issue is generally not associated with standard base call accuracy concerns for Illumina sequencing?
A) Cross-talk between adjacent detection channels.
B) Delayed or incomplete polymerase activity.
C) Variability in the removal of fluorescent terminator molecules.
D) Temperature fluctuations affecting sequencing cycle rates.
D) Temperature fluctuations affecting sequencing cycle rates.
What is sequencing coverage, and how does it affect downstream genomic analyses?
A) The proportion of the total genome size that has at least one read aligned, affecting assembly completeness.
B) The mean number of sequencing reads encompassing each nucleotide position, influencing confidence in base accuracy.
C) The evenness of read distribution across the genome, impacting variant detection reliability.
D) The overall fidelity of nucleotide identification during sequencing, affecting error rates.
E) The maximum read length achieved during sequencing runs, affecting the ability to span large genomic features.
B) The mean number of sequencing reads encompassing each nucleotide position, influencing confidence in base accuracy.
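As a rough illustration of the definition above, mean coverage is often estimated as (number of reads × read length) / genome size. The sketch below assumes uniform read length; the function name and the example numbers are hypothetical and not part of the flashcard set.

```python
def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Estimate mean coverage as total sequenced bases divided by genome size.

    Assumes every read has the same length (an illustrative simplification).
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


# Hypothetical run: 1,000,000 reads of 150 bp over a 5 Mb genome.
print(mean_coverage(1_000_000, 150, 5_000_000))
```

Higher mean coverage means more independent reads support each base call, which is why it raises confidence in base accuracy.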
In high-throughput sequencing data, which of the following provides essential per-base error probability metrics for quality control?
A) The N50 contiguity statistic.
B) The sequence in FASTQ file format.
C) Phred scores embedded in FASTQ file format.
D) Coverage depth information within assembled contigs.
E) Per-sequence GC content from the sequencing summary file.
C) Phred scores embedded in FASTQ file format.
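A Phred score Q relates to the per-base error probability P by Q = -10 · log10(P). The sketch below decodes a FASTQ quality string and converts scores to error probabilities. It assumes the common Phred+33 ASCII offset; the function names are hypothetical.

```python
def decode_quality_line(quality: str, offset: int = 33) -> list[int]:
    """Convert a FASTQ quality string into Phred scores.

    Assumes Phred+33 encoding by default (ASCII code minus 33).
    """
    return [ord(char) - offset for char in quality]


def phred_to_error_prob(q: float) -> float:
    """Invert Q = -10 * log10(P) to get the error probability P."""
    return 10 ** (-q / 10)


scores = decode_quality_line("I5!")
for q in scores:
    print(q, phred_to_error_prob(q))
```

Under this encoding, a score of 20 corresponds to an estimated 1 in 100 chance that the base call is wrong, and a score of 40 to 1 in 10,000.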
What is the immediate biochemical consequence when a ddNTP is incorporated by DNA polymerase?
A) It terminates DNA strand synthesis, preventing further elongation.
B) It allows the DNA strand to continue synthesizing until the next dNTP is encountered.
C) It enhances sequencing accuracy by preventing incorporation of incorrect nucleotides.
D) It labels the DNA strand with a fluorescent marker that signals successful base pairing.
A) It terminates DNA strand synthesis, preventing further elongation.
What is the main purpose of performing bridge amplification during high-throughput sequencing?
A) To minimize the occurrence of sequencing errors by proofreading base incorporation through amplification cycles.
B) To generate localized clonal clusters of DNA fragments for robust signal detection.
C) To enable high-throughput sequencing by amplifying template DNA fragments, ensuring sufficient quantities for downstream enzymatic reactions.
D) To selectively amplify longer DNA fragments, allowing improved sequencing coverage.
E) To enhance base calling accuracy by reducing background noise during sequencing.
B) To generate localized clonal clusters of DNA fragments for robust signal detection.
Which of the following best defines a contig in the context of genome assembly?
A) A contiguous region within the genome that exhibits high sequencing coverage.
B) A sequence of nucleotides constructed by overlapping sequencing reads.
C) A segment of the genome prone to errors due to consistently low base quality scores.
D) A repetitive genomic region that complicates the assembly process.
B) A sequence of nucleotides constructed by overlapping sequencing reads.
What is the most accurate definition of a “k-mer”, and how does it contribute to the construction of de Bruijn graphs in genome assembly algorithms?
A) A substring of fixed length derived from reads; k-mers form the nodes of the de Bruijn graph.
B) A short sequence of nucleotides from a reference genome; k-mers represent the edges of the de Bruijn graph.
C) A DNA fragment of variable length used to construct nodes and edges between reads in de Bruijn graphs.
D) A sequence shorter than five nucleotides derived from reads to build a de Bruijn graph.
A) A substring of fixed length derived from reads; k-mers form the nodes of the de Bruijn graph.
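To make the definition concrete, the sketch below lists every k-mer of a read by sliding a window of fixed length k one base at a time. The function name and the example read are hypothetical.

```python
def kmers(read: str, k: int) -> list[str]:
    """Return all overlapping substrings of length k, in read order."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A read of length n yields n - k + 1 k-mers (none if the read is shorter than k).
    return [read[i:i + k] for i in range(len(read) - k + 1)]


print(kmers("ATGCG", 3))
```

Consecutive k-mers from the same read overlap by k - 1 bases, which is the property a de Bruijn graph uses to connect them.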
How is the N50 value defined, and what does it convey about the quality of an assembly?
A) The sum of all contig lengths divided by the number of contigs, reflecting the average contig size across the assembly.
B) The length of the shortest contig, in a list of contigs sorted from longest to shortest, at which the cumulative length reaches 50% of the total assembly length.
C) The contig length at which 50% of the total reads have been mapped during the assembly process, reflecting sequencing coverage uniformity.
D) The number of contigs whose cumulative length adds up to 50% of the total genome size, reflecting contig distribution in the assembly.
B) The length of the shortest contig, in a list of contigs sorted from longest to shortest, at which the cumulative length reaches 50% of the total assembly length.
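The N50 definition can be sketched in a few lines: sort contigs from longest to shortest, accumulate their lengths, and report the length of the contig at which the running total first reaches half of the assembly length. The function name and the contig lengths below are hypothetical.

```python
def n50(contig_lengths: list[int]) -> int:
    """Return the N50 of an assembly given its contig lengths."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: running total always reaches the sum")


# Total is 290, half is 145; 80 + 70 = 150 crosses it, so N50 is 70.
print(n50([80, 70, 50, 40, 30, 20]))
```

A larger N50 indicates that more of the assembly sits in long contigs, which is why it is used as a contiguity measure rather than an accuracy measure.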
Which of the following genomic characteristics is most likely to increase the complexity of de novo genome assembly efforts?
A) Genomic regions with high GC content.
B) Genomes with extensive repetitive elements and high repeat content.
C) Sequencing data with minimal depth of coverage across the genome.
D) Use of sequencing platforms that generate only short read lengths.
B) Genomes with extensive repetitive elements and high repeat content.
What fundamental principle is the basis for building a de Bruijn graph from sequencing reads?
A) Aligning sequencing reads to an existing reference genome to identify overlaps and reconstruct the target sequence.
B) Identifying and utilizing overlapping k-mers to construct nodes and edges within the graph.
C) Aggregating short sequencing reads to form longer contigs by prioritizing overlaps between reads.
D) Incorporating base quality scores into the assembly algorithm to weigh k-mers based on read accuracy.
B) Identifying and utilizing overlapping k-mers to construct nodes and edges within the graph.
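The overlapping k-mer principle can be sketched as follows. This minimal example follows the convention used in these flashcards, where k-mers are nodes and an edge links two k-mers that appear consecutively in a read (overlapping by k - 1 bases). Other formulations use (k-1)-mers as nodes and k-mers as edges. The function name and the reads are hypothetical, and real assemblers also handle reverse complements, errors, and coverage.

```python
def build_de_bruijn(reads: list[str], k: int) -> dict[str, list[str]]:
    """Build an adjacency list: k-mer node -> sorted successor k-mers."""
    graph: dict[str, set[str]] = {}
    for read in reads:
        for i in range(len(read) - k + 1):
            node = read[i:i + k]
            graph.setdefault(node, set())
            # Link to the next k-mer in the same read, if there is one.
            if i + k < len(read):
                successor = read[i + 1:i + k + 1]
                graph[node].add(successor)
    return {node: sorted(successors) for node, successors in graph.items()}


graph = build_de_bruijn(["ATGCG", "GCGTA"], 3)
for node, successors in graph.items():
    print(node, "->", successors)
```

Here the two reads share the k-mer GCG, so the graph joins them into a single path ATG, TGC, GCG, CGT, GTA without any pairwise read alignment.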
In graph-based genome assembly approaches, which of the following actions is typically not incorporated into the graph traversal algorithms?
A) Initiating traversal from nodes with optimal coverage levels to avoid erroneous low-coverage paths.
B) Extending the current path in the graph until a termination node, such as a dead-end or circular path, is encountered.
C) Employing a linear search algorithm to systematically examine all nodes in islands for potential paths.
D) Implementing backtracking strategies to resolve ambiguous branches or repetitive regions in the sequencing graph.
C) Employing a linear search algorithm to systematically examine all nodes in islands for potential paths.
Within de Bruijn graph-based genome assembly frameworks, selecting shorter k-mer lengths facilitates which of the following aspects?
A) Enhanced resolution in repetitive genomic regions by reducing the number of ambiguous nodes.
B) Increased number of overlaps due to the higher frequency of shorter k-mers, which can complicate the identification of unique sequences.
C) Improved management of regions with sparse sequencing coverage, as shorter k-mers provide more reliable read alignment.
D) Increased connectivity in the assembly graph, allowing for better handling of sequencing errors and structural variations.
B) Increased number of overlaps due to the higher frequency of shorter k-mers, which can complicate the identification of unique sequences.
OR
C) Improved management of regions with sparse sequencing coverage, as shorter k-mers provide more reliable read alignment.
When determining the most favorable pathway for contig extraction in genome assembly, which of the following factors is considered least critical?
A) Length of the traversed sequences.
B) Uniformity of read coverage across the path.
C) Frequency of branching points within the graph.
D) Existence of unique, linear sequencing paths.
C) Frequency of branching points within the graph.
What is the primary rationale for employing paired-end sequencing reads to assemble genomes?
A) To decrease computational processing time during assembly.
B) To supply information regarding inter-contig spacing.
C) To rectify low-quality nucleotide bases at read termini.
D) To prevent misassembly of repetitive elements.
B) To supply information regarding inter-contig spacing.
What is the main repercussion of retaining untrimmed adapter sequences?
A) Causing misalignment of reads, resulting in the generation of shorter contigs.
B) Leading to erroneous gene annotations and the omission of open reading frames.
C) Inflating estimates of the total genome size.
D) Introducing inaccuracies in assembly due to incorrect indels.
D) Introducing inaccuracies in assembly due to incorrect indels.