Exam 1 Quizzes Flashcards
The Human Genome Project was launched in ______, produced the first draft assembly in ________ and was “finished” in ________.
1990; 2001; 2003
Put these abbreviations in order from smallest to largest: Mb, Tb, Pb, bp, Gb, Kb
bp < Kb < Mb < Gb < Tb < Pb (smallest to largest)
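The ordering above follows the decimal convention used for sequence lengths, where each prefix is 1,000 times the previous one (1 Kb = 1,000 bp, 1 Mb = 1,000 Kb, and so on). A minimal Python sketch of the conversion; the `UNITS` list and `to_bp` helper are illustrative names, not from any library:

```python
# Units in order from smallest to largest; each step is a factor of 1,000.
UNITS = ["bp", "Kb", "Mb", "Gb", "Tb", "Pb"]

def to_bp(value: float, unit: str) -> float:
    """Convert a length expressed in `unit` into base pairs."""
    exponent = UNITS.index(unit)  # bp -> 0, Kb -> 1, ..., Pb -> 5
    return value * 1000 ** exponent

# Example: the dog haploid genome (~2.4 Gb) expressed in base pairs.
print(to_bp(2.4, "Gb"))
```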
your favorite species information: common name
domestic dog
your favorite species information: genus
Canis
your favorite species information: species
Canis familiaris
your favorite species information: haploid number of chromosomes (N)
39
your favorite species information: haploid genome size
~2.4 Gb (about 2.4 billion bp)
your favorite species information: taxon ID
9615
The polymerase chain reaction has four “key” ingredients necessary to replicate DNA in a tube. List these four key elements.
- DNA template strand
- nucleotides (dNTPs)
- primers
- DNA polymerase
What are the three stages of the polymerase chain reaction?
- denaturation
- annealing
- extension
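The three stages repeat as one cycle, and under an idealized assumption of perfect efficiency each cycle doubles the number of target copies. The sketch below uses a hypothetical `pcr_copies` helper with an assumed `efficiency` parameter to illustrate that exponential growth; real reactions slow down and plateau as reagents are used up, which this toy model ignores:

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Expected number of target copies after `cycles` rounds of PCR.

    Each cycle (denaturation -> annealing -> extension) copies each template
    with probability `efficiency`, so the count is multiplied by (1 + efficiency).
    efficiency = 1.0 means perfect doubling every cycle.
    """
    return initial_copies * (1 + efficiency) ** cycles

print(pcr_copies(1, 30))       # ideal doubling from a single molecule
print(pcr_copies(1, 30, 0.9))  # assumed 90% efficiency per cycle
```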
Sanger sequencing differs from PCR in one key element. What is that key element?
ddNTP (dideoxynucleotide triphosphate)
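A ddNTP works as a chain terminator because it lacks the 3'-OH group needed to attach the next nucleotide. The toy model below is a sketch, not a real base caller: it assumes the `template` string is written 3'→5', so complementing it base by base gives the new strand 5'→3', and it assumes that every position ends up terminated in some molecule. Sorting the terminated fragments by length, as electrophoresis does, reads the new strand back. `sanger_fragments` and `read_sequence` are hypothetical names:

```python
def sanger_fragments(template: str) -> dict[str, list[int]]:
    """Map each ddNTP base to the lengths of fragments that end in it."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    new_strand = "".join(complement[base] for base in template)
    fragments: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        # A ddNTP of this base incorporated here stops the chain at this length.
        fragments[base].append(length)
    return fragments

def read_sequence(fragments: dict[str, list[int]]) -> str:
    """Order all fragments by length and read off each terminal base."""
    by_length = sorted(
        (length, base) for base, lengths in fragments.items() for length in lengths
    )
    return "".join(base for _, base in by_length)

# Hypothetical template, written 3'->5'.
print(read_sequence(sanger_fragments("TACGGA")))
```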
Illumina sequencing requires that the library contain fragments in a certain size range (bp). What size range is typical for whole-genome sequencing libraries?
<600 bp
PacBio and Oxford Nanopore sequencing are very different sequencing technologies, but they share one characteristic, and this characteristic differentiates them from Illumina technology. What is this characteristic?
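They are long-read technologies: they sequence very long, high-molecular-weight DNA molecules (thousands of bases or more), whereas Illumina produces short reads.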
Given two files containing the same number of bases of sequence information, one being in FASTA format and the other in FASTQ format, which file would be larger and why?
FASTQ, because in addition to each base it stores a quality score for that base (one character per base), plus a separator line for each record.
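A quick way to see the difference is to write the same hypothetical read in both formats. This sketch assumes single-line FASTA and an Illumina-style Phred+33 quality string; the read name, sequence, and qualities are made up:

```python
seq = "ACGTACGTAC"
qual = "IIIIIIIIII"  # hypothetical quality string: one character per base

fasta_record = f">read1\n{seq}\n"            # header + sequence
fastq_record = f"@read1\n{seq}\n+\n{qual}\n"  # header + sequence + separator + qualities

print(len(fasta_record), len(fastq_record))
```

The FASTQ record is longer by the full quality string plus the `+` separator line, so for the same number of bases the FASTQ file is always the larger one.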
We discussed several metrics that could be examined for Illumina sequence data as part of quality control. What are the two most important that we discussed?
- Base Quality (Phred quality scores)
- Adapter Content
You are interested in impressing your friends at a social gathering and told them that you can convert between Phred quality values and estimated sequencing error rates without using a calculator. What is the per-base error rate for a Phred value of 20?
0.01 (1 error in 100 bases)
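The conversion behind the party trick is P = 10^(-Q/10), so every 10 Phred units is a factor of 10 in error rate (Q10 = 1 in 10, Q20 = 1 in 100, Q30 = 1 in 1,000). A minimal Python sketch; the function names are hypothetical, and `decode_phred33` assumes the Phred+33 ASCII offset used by Illumina 1.8+ FASTQ files:

```python
import math

def phred_to_error(q: float) -> float:
    """Per-base error probability for a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def error_to_phred(p: float) -> float:
    """Inverse conversion: Q = -10 * log10(P)."""
    return -10 * math.log10(p)

def decode_phred33(quality_string: str) -> list[int]:
    """Turn a FASTQ quality string into Phred scores (assumes Phred+33 offset)."""
    return [ord(ch) - 33 for ch in quality_string]

for q in (10, 20, 30, 40):
    print(q, phred_to_error(q))
```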
You want to identify all of the DNA variants in your pet. There is already a reference genome available for your pet species. You determined that the most cost-effective way to do this is to use Illumina sequencing technology. You create a library and sequence it with paired-end 150-base reads. This analysis revealed that exactly 50% of your reads contained adapter sequences. Based on this information, what can you say about your sequencing library?
a) 50% of the DNA molecules are <300 bases in length
b) 50% of the DNA molecules are <150 bases in length
b) 50% of the DNA molecules are <150 bases in length
You want to identify all of the DNA variants in your pet. There is already a reference genome available for your pet species. You determined that the most cost-effective way to do this is to use Illumina sequencing technology. You create a library and sequence it with paired-end 150-base reads. This analysis revealed that exactly 50% of your reads contained adapter sequences. Based on this information, what can you say about your sequencing library? The answer is: 50% of the DNA molecules are <150 bases in length. Why?
Think of your DNA pieces as little strings that you’re reading from both ends. Each time you read, you can go up to 150 letters in one direction.
Now, if a string is 150 letters or longer, you’ll only read the DNA and never reach the extra stuff (adapters) at the ends.
But if a string is shorter than 150 letters, you’ll finish reading the DNA before hitting 150, and then you’ll start reading into the extra stuff (adapters).
Since half of your reads contain adapter sequences, that means half of your DNA pieces must be shorter than 150 letters, because those are the ones that didn’t have enough DNA to read the full 150 bases before hitting the adapter.
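The same reasoning can be checked with a tiny simulation. This sketch assumes that "fragment length" means the insert length between the adapters, and that a read contains adapter sequence exactly when the insert is shorter than the 150-base read length. Because both reads of a pair come from the same insert, the fraction of reads with adapters equals the fraction of short inserts. The helper names and the library below are made up for illustration:

```python
def read_has_adapter(fragment_length: int, read_length: int = 150) -> bool:
    """A read runs into the adapter when the insert is shorter than the read."""
    return fragment_length < read_length

def fraction_with_adapter(fragment_lengths: list[int], read_length: int = 150) -> float:
    """Fraction of fragments (and therefore of reads) that contain adapter sequence."""
    hits = sum(read_has_adapter(length, read_length) for length in fragment_lengths)
    return hits / len(fragment_lengths)

# Hypothetical library: half of the inserts are shorter than 150 bp.
library = [100, 120, 140, 149, 200, 250, 300, 400]
print(fraction_with_adapter(library))
```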