Exam 1 Practice Questions Flashcards

1
Q

Above are two figures representing the base quality score statistics from two Illumina sequencing runs. Left is A, and right is B. Relative to each other, which run would you consider to be “good data”? Why?

A

A. Because the quality scores are higher, each base call is more reliable and more likely to be correct.
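
Note: Illumina base quality scores are Phred-scaled, so a score Q corresponds to an error probability of 10^(-Q/10). The sketch below is only an illustration of that relationship; the function names are made up for this card.

```python
import math

def phred_to_error_prob(q: float) -> float:
    """Probability that a base call is wrong, given its Phred score Q."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p: float) -> float:
    """Phred score Q for a given probability p that the base call is wrong."""
    return -10 * math.log10(p)

# Higher Q means a smaller chance that the call is an error.
for q in (10, 20, 30, 40):
    print(f"Q{q}: error probability {phred_to_error_prob(q):.4f}")
```

For example, Q30 corresponds to roughly 1 wrong call in 1,000, while Q10 corresponds to roughly 1 in 10.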

2
Q

Above are two figures representing the base quality score statistics from two Illumina sequencing runs. Left is A, and right is B. Relative to each other, which run would you consider to be “good data”? Why?

A

A. Because the sequence could be read for longer before reaching the adapter.

3
Q

If the objective of your research was to sequence the mitochondrial genome of many individuals, you would likely choose to use PCR to amplify the mtDNA genome from each individual and then sequence it using Sanger sequencing. WHY would this approach work for mtDNA but not work for the nuclear genome?

A

the mitochondrial genome is small (~16 kb), so the whole genome can be amplified by PCR and then Sanger sequenced. this would not work for the nuclear genome because it is far larger (billions of base pairs), making it impractical to amplify by PCR.

4
Q

nucleic acids are composed of three basic parts, what are the specific names of those parts?

A
  1. phosphate group
  2. nitrogenous base
  3. pentose sugar
5
Q

approximately how many protein coding genes are in the mammalian genome?

a) 5,000
b) 10,000
c) 20,000
d) 50,000

A

c) 20,000

6
Q

the Sequence Read Archive (SRA) at the National Center for Biotechnology Information (NCBI) contains approximately how many bases of sequence data?

a) 91 trillion
b) 910 trillion
c) 9 quadrillion
d) 91 peta

A

d) 91 peta (91 quadrillion)

7
Q

during DNA replication in the cell, which class of enzymes is responsible for adding a complementary base?

a) primase
b) helicase
c) topoisomerase
d) polymerase

A

d) polymerase

8
Q

nucleic acids are polymerized in the 5’ to 3’ direction. what is the name of the covalent bonds that are formed between the 5’ and 3’ carbons in a nucleic acid chain? are these considered strong or weak bonds?

A

phosphodiester bonds; strong

9
Q

during transcription in eukaryotes, a pre-mRNA transcript in the nucleus generally has several modifications made to it. we discussed two of the modifications that can be used to target nucleic acid molecules for sequencing. what are the names of the two specific modifications that we discussed? (bonus: what are the names of the sequencing applications that use each of those modifications, “something-seq”)

A
  1. 5’ cap (CAGE-seq)
  2. 3’ poly(A) tail (RNA-seq)
10
Q

sequencing errors can be random or systematic. choose the correct letter(s) that represent systematic sequencing error.

A

A, B, C

11
Q

While we try to minimize errors during experimental design or data collection, errors always occur. Which type of error would be more impactful on your analysis, systematic or random, and why?

A

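systematic errors are more impactful on analysis. random errors differ from read to read, so they tend to average out as coverage increases. systematic errors recur in the same way across reads, so they remain a problem even at high coverage and can bias downstream analysis.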
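
Note: a toy model of why coverage helps with random errors but not systematic ones. The assumptions are loud ones: each read miscalls a position independently with probability p, all miscalls agree on the same wrong base (a worst case), the consensus is a strict majority vote, and a systematic error is treated as hitting every read. The function names are hypothetical.

```python
from math import comb

def consensus_error_random(coverage: int, p: float) -> float:
    """Chance that more than half of the reads are wrong at a position,
    assuming independent random errors with per-read probability p."""
    needed = coverage // 2 + 1  # reads required for a wrong majority
    return sum(
        comb(coverage, k) * p**k * (1 - p) ** (coverage - k)
        for k in range(needed, coverage + 1)
    )

def consensus_error_systematic(coverage: int) -> float:
    """Toy assumption: the systematic error appears in every read,
    so a majority vote never removes it, whatever the coverage."""
    return 1.0

for cov in (1, 5, 31):
    print(cov, consensus_error_random(cov, 0.01), consensus_error_systematic(cov))
```

Under these assumptions the random-error column shrinks rapidly as coverage grows, while the systematic column does not change.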

12
Q

the polymerase chain reaction produces exponential amplification of DNA. If you started with one copy of a particular DNA molecule, how many copies would you have after 5 cycles of PCR?

a) 10
b) 20
c) 8
d) 16
e) 32
f) 64

A

e) 32 (starting copies × 2^n = 1 × 2^5)
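
Note: the same formula as a small function, assuming ideal PCR in which every molecule doubles in every cycle (real reactions are less than 100% efficient). The function name is made up for this card.

```python
def pcr_copies(starting_copies: int, cycles: int) -> int:
    """Copies after `cycles` rounds of ideal PCR: starting copies x 2^n."""
    return starting_copies * 2 ** cycles

print(pcr_copies(1, 5))  # 32
print(pcr_copies(1, 4))  # 16
print(pcr_copies(3, 5))  # 96
print(pcr_copies(2, 3))  # 16
```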

13
Q

You start with 1 DNA molecule and run 4 cycles of PCR. How many copies do you have?

A

16

14
Q

You start with 3 DNA molecules and run 5 cycles of PCR. How many copies do you have?

A

96

15
Q

You start with 2 DNA molecules and run 3 cycles of PCR. How many copies do you have?

A

16

16
Q

the PCR process has three stages, denaturation, annealing and extension that occur at different temperatures. What is the normal temperature for the denaturation step? The normal temperature range for the annealing step? the optimal temperature for the extension step?

A
  • denaturation: 94 degrees C
  • annealing: 55 degrees C
  • extension: 72 degrees C
17
Q

for PCR, what is the denaturation temperature and why?

A
  • 94 degrees C
  • to melt the two strands apart
18
Q

for PCR, what is the annealing temperature and why?

A
  • 55 degrees C
  • a lower temperature: warm enough to keep the template strands from coming back together but cool enough for the primers to bind to the DNA
19
Q

for PCR, what is the extension temperature and why?

A
  • 72 degrees C
  • optimal temperature for DNA polymerase
20
Q

one of the keys to a successful PCR is designing primers that are specific for that region of interest to be amplified. for a genome of approximately 3 Gb such as a mammalian genome, what is the minimum length of a PCR primer that would be necessary to have a reasonable expectation that it will be specific enough to occur only once within the genome?

a) 10 bases
b) 15 bases
c) 17 bases
d) 20 bases
e) 25 bases

A

c) 17 bases

21
Q

one of the keys to a successful PCR is designing primers that are specific for that region of interest to be amplified. for a genome of approximately 3 Gb such as a mammalian genome, what is the minimum length of a PCR primer that would be necessary to have a reasonable expectation that it will be specific enough to occur only once within the genome? WHY??

A

17 bases; there are 4^17 (about 1.7 × 10^10) possible 17-base sequences, more than the number of positions in a 3 Gb genome, so a given 17-base primer is expected to occur only once by chance, thus reducing the risk of non-specific amplification
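
Note: a back-of-the-envelope check of the 17-base answer. It assumes a random genome with equal base frequencies and counts both strands as possible binding sites; real genomes contain repeats, so this is only a rough lower bound. The function names are hypothetical.

```python
def expected_hits(primer_length: int, genome_size: int, both_strands: bool = True) -> float:
    """Expected number of chance matches to one primer in a random genome."""
    positions = genome_size * (2 if both_strands else 1)
    return positions / 4 ** primer_length

def shortest_specific_length(genome_size: int, both_strands: bool = True) -> int:
    """Shortest primer length whose expected number of chance matches is below 1."""
    n = 1
    while expected_hits(n, genome_size, both_strands) >= 1:
        n += 1
    return n

GENOME_SIZE = 3_000_000_000  # ~3 Gb mammalian genome
for n in (10, 15, 17, 20, 25):
    print(n, expected_hits(n, GENOME_SIZE))
print(shortest_specific_length(GENOME_SIZE))  # 17
```

Of the answer choices, 15 bases still gives several expected chance matches under this model, while 17 is the first that drops below one.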

22
Q

what is the term used for a unique DNA sequence ligated to DNA fragments within an Illumina sequencing library that allows downstream in silico sorting and identification of individual samples from a pool of many samples?

a) library
b) adapter
c) barcodes/index
d) cluster

A

c) barcodes/index
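
Note: a minimal sketch of what in silico sorting by index looks like. The index sequences, sample names, and reads are invented for illustration, and only exact index matches are accepted here (real demultiplexing tools usually tolerate mismatches).

```python
from collections import defaultdict

def demultiplex(reads, index_to_sample):
    """Group insert sequences by sample using each read's index sequence.

    `reads` is an iterable of (index_sequence, insert_sequence) pairs.
    Reads with an unknown index are collected under "undetermined".
    """
    bins = defaultdict(list)
    for index_seq, insert_seq in reads:
        sample = index_to_sample.get(index_seq, "undetermined")
        bins[sample].append(insert_seq)
    return dict(bins)

# Hypothetical indexes and reads from a pooled run.
index_to_sample = {"ACGTAC": "sample_1", "TGCATG": "sample_2"}
reads = [
    ("ACGTAC", "GGATTC"),
    ("TGCATG", "CCTAAG"),
    ("ACGTAC", "TTGACA"),
    ("NNNNNN", "AAAAAA"),
]
by_sample = demultiplex(reads, index_to_sample)
print(by_sample)
```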

23
Q

_______________ is the process where a single molecule from an Illumina sequencing library is amplified on a flow cell to form a cluster that can be imaged by the sequencer.

A

cluster generation

24
Q

what are the 4 steps of Illumina sequencing? which step is no longer done and why?

A
  1. sample prep
  2. cluster generation
  3. sequencing
  4. data analysis
    - cluster generation is no longer done because we are able to use patterned flow cells
25
Q

which is better, patterned flow cells or random flow cells? why?

A

patterned flow cells because they have faster scan times due to ordered cluster positions, less cluster overlap, and more clusters

26
Q

Illumina sequencing requires that the library contain DNA fragments less than 600 bp. Why?

A

because of bridge amplification. a library fragment bound to the flow cell has to bend over and reach a nearby oligo to form the bridge and be amplified, so it must be shorter than 600 base pairs to stay in close proximity to that oligo.

27
Q

this image represents a typical Illumina library construct. what is the function of each of these portions of the construct? stated differently, what is their purpose, why are they important?

A

A. flow cell attachment sites
B. index, used for sample identification and in silico sorting
C. sequencing primer binding sites

28
Q

if you were somehow able to construct an Illumina sequencing library such that every molecule in the library was the same length of 150 bases, and you sequenced from both ends of the library (paired end), what would be the optimum number of bases you would want to sequence from end to end? explain your answer.

A

75 bases from each end (2 × 75 = 150), because the two reads together cover the entire 150-base fragment without overlapping. reading further from either end would only re-sequence bases that are already covered.
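
Note: the arithmetic behind the 75-base answer, sketched for a single fragment. It ignores adapter read-through by capping each read at the fragment length; the function name is made up for this card.

```python
def paired_end_coverage(fragment_length: int, read_length: int):
    """Return (bases covered, bases read twice) for one paired-end fragment."""
    r = min(read_length, fragment_length)       # a read cannot extend past the insert
    covered = min(fragment_length, 2 * r)       # bases seen by at least one read
    overlap = max(0, 2 * r - fragment_length)   # bases seen by both reads
    return covered, overlap

print(paired_end_coverage(150, 50))   # (100, 0): the middle 50 bases are missed
print(paired_end_coverage(150, 75))   # (150, 0): full span, no overlap
print(paired_end_coverage(150, 100))  # (150, 50): 50 bases read twice
```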

29
Q

in 2015 Illumina released a new technology with flow cells that contain billions of nanowells at fixed locations across both surfaces of the flow cell. the technology delivered significant advantages such as increased number of clusters and reduced scan (run) times. what is the name of this technology?

A

patterned flow cell

30
Q

the Bionano technology does not actually read DNA sequence. the technology produces a __________ by labeling specific sequence motifs across the genome which can be used to aid in the ordering and orientation of sequence data produced by other platforms, for example to evaluate de novo assembly of genomes. we described a different technology that could also be used to aid in the ordering and orientation of contigs. What is the name of this second technology? __________________

A
  • genome map
  • Hi-C
31
Q

the Illumina sequencing by synthesis (SBS) process relies on two key technologies associated with the nucleotides, A) the cleavable fluor on the nucleotide bases and B) the reversible blocking group on the 3’ hydroxyl group. WHY are these two technologies important to the SBS process? in other words what is their function?

A) cleavable fluor

A

it is a fluorescent chemical group attached to the nitrogenous base that emits light at a certain wavelength when excited by a laser. imaging that light identifies the base that was just incorporated. the fluor is then cleaved off so that the fluor on the next nucleotide can be read.

32
Q

the Illumina sequencing by synthesis (SBS) process relies on two key technologies associated with the nucleotides, A) the cleavable fluor on the nucleotide bases and B) the reversible blocking group on the 3’ hydroxyl group. WHY are these two technologies important to the SBS process? in other words what is their function?

B) the reversible blocking group

A

it temporarily blocks extension of the growing strand so that only one nucleotide can be added at a time. the block is then reversed, restoring the free 3’ hydroxyl and allowing the next nucleotide to be incorporated.

33
Q

PacBio long read sequencing is called single molecule real time (SMRT) sequencing because the DNA polymerase processes the DNA strand and the instrument “reads” the sequence in real time by creating a “movie” that represents the DNA sequence. This process is enabled by two key technologies, what are they?

A
  • zero-mode waveguides (ZMWs)
  • phospholinked nucleotides
34
Q

the average read length produced by the PacBio sequencer for HiFi reads is:

a) 75-300 bp
b) 800-1,000 bp
c) 1,000-5,000 bp
d) 10-20 kb+
e) 100 kb+

A

d) 10-20 kb+

35
Q

all the sequencing technologies we have discussed in this class such as Sanger, Illumina, and PacBio, use some form of fluor attached to a nucleotide in order to detect which base was added by the polymerase and thus “read” the DNA sequence. Oxford Nanopore uses a technology that does not include the use of a fluor. HOW does the Oxford Nanopore sequencer detect (read) the DNA sequence?

A

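a motor protein feeds the DNA strand through a nanopore embedded in a membrane that has an ionic current running across it. each nucleotide has a unique chemical structure, so it disrupts the current in a characteristic way as it passes through the pore, and that change in electrical signal is read as the base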

36
Q

this figure was discussed repeatedly in relation to the different sequencing platforms. What is the significance of this figure as it relates to the different sequencing technologies that we discussed?

A

the ability to resolve a repetitive structure is dependent on the length of the molecules in your library
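
Note: one simple way to picture the answer. A read can only place a repeat unambiguously if it spans the whole repeat plus some unique sequence on each side. The 20-base anchor and the example lengths are assumptions chosen for illustration, and the function name is hypothetical.

```python
def can_span_repeat(read_length: int, repeat_length: int, anchor: int = 20) -> bool:
    """True if a single read can cover the repeat plus `anchor` unique
    bases on both flanks (anchor size is an illustrative assumption)."""
    return read_length >= repeat_length + 2 * anchor

REPEAT = 5_000  # hypothetical 5 kb repetitive element
print(can_span_repeat(150, REPEAT))     # False: a short read falls entirely inside the repeat
print(can_span_repeat(15_000, REPEAT))  # True: a long read reaches unique sequence on both sides
```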

37
Q

all variant callers produce errors. these errors can be classified as false positives and false negatives. when performing a genomic analysis, or any similar analysis for that matter, one has to balance sensitivity and specificity. what do the terms sensitivity and specificity mean in the context of variant calling?

A
  • sensitivity: discovering all the real variants, i.e. keeping false negatives low
  • specificity: limiting the false positives that creep in when filters get too lenient
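
Note: the standard formulas behind the two terms, with invented counts for illustration (the numbers below are not from any real call set, and the function names are made up for this card).

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of real variants that were called: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of non-variant sites correctly left uncalled: TN / (TN + FP)."""
    return true_neg / (true_neg + false_pos)

# Hypothetical call set: 100 real variants and 1,000 non-variant sites.
print(sensitivity(true_pos=90, false_neg=10))   # 90 of 100 real variants found
print(specificity(true_neg=980, false_pos=20))  # 20 false calls among 1,000 non-variant sites
```

Loosening filters tends to raise sensitivity at the cost of specificity, and tightening them does the reverse.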
38
Q

your favorite species!
- common name:
- genus:
- species:
- haploid number of chromosomes (N):
- haploid genome size:
- Taxon ID:

A
  • domestic dog
  • Canis
  • Canis familiaris
  • 39
  • 2.4 billion bp (2.4 Gb)
  • 9615