Exam 1 Homework and Quizzes Flashcards
The Human Genome Project was launched in ____, produced the first draft assembly in _____, and was finished in _____
1990
2001
2003
Abbreviations in order from smallest to largest
bp Kb Mb Gb Tb Pb
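As a small illustration of how these abbreviations scale, here is a minimal sketch that assumes each step up is a factor of 1,000. The function name format_bases is hypothetical and only meant for study purposes.

```python
# Unit abbreviations from the card above, smallest to largest.
UNITS = ["bp", "Kb", "Mb", "Gb", "Tb", "Pb"]


def format_bases(n_bases: int) -> str:
    """Express a base count in the largest unit that keeps the value below 1000."""
    value = float(n_bases)
    for unit in UNITS:
        # Stop at the first unit where the value fits, or at the largest unit.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000  # assumption: each unit is 1000x the previous one
    raise AssertionError("unreachable")
```

For example, format_bases(16000) gives a value in Kb, which matches the scale of a mitochondrial genome.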
Approximately how many bases are in a typical diploid mammalian genome
6 billion
Approximately how many bases are in a typical mammalian mitochondrial genome
16,000 bp
Approximately what proportion of a mammalian genome codes for proteins
2%
Approximately 50% of a mammalian genome is comprised of what type of DNA element
repetitive DNA
The polymerase chain reaction has four key ‘ingredients’ necessary to replicate DNA in a tube
DNA template
DNA polymerase
Nucleotides
Primers
What are the three stages of the polymerase chain reaction
denaturing
annealing
extending
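Each pass through the three stages above can at most double the number of template copies, so amplification is exponential in the number of cycles. The sketch below is an idealized model, not a description of any real instrument; the function name pcr_copies and the efficiency parameter are assumptions added for illustration.

```python
def pcr_copies(initial_templates: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after repeated denature/anneal/extend cycles.

    efficiency is the assumed fraction of templates copied per cycle
    (1.0 means perfect doubling, which real reactions do not sustain).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    # Each cycle multiplies the copy number by (1 + efficiency).
    return initial_templates * (1.0 + efficiency) ** cycles
```

With perfect doubling, 30 cycles starting from a single template yields 2^30 copies, roughly one billion.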
Sanger sequencing differs from PCR in one key element, what is that key element
Sanger uses dideoxynucleotides (chain terminators) along with the deoxynucleotides
Illumina sequencing requires that the library contain fragments in a certain size range. What size range is typical of whole genome sequencing libraries
300-350 bp
PacBio and Oxford Nanopore sequencing are very different but share one characteristic in common, and this characteristic differentiates them from Illumina technology. What is that characteristic?
they need a long DNA template strand to start sequencing
Five domains of genome research
-understanding the structure of genomes
-understanding the biology of genomes
-understanding the biology of disease
-advancing the science of medicine
-improving the effectiveness of health care
What is the largest current bottleneck in genomics
analyzing the stream of data from technological advances
In the Illumina process, the nucleotides are very specialized. They have two key attributes, what are they
fluorophore specific to the identity of the nucleotide
3’ hydroxyl group is blocked with a reversible chemical blocker so only one base is added per cycle and each incorporation can be accurately detected
____ of reads to the reference sequence is the first step to identify variation of all types
alignment
Long-read sequencers such as the PacBio instrument are a departure from short-read sequencers such as Illumina. What is the first major requirement for these long-read technologies that is different from short-read technologies
high molecular weight genomic DNA
must be of sufficient quality to allow shearing to fragments >30 Kb to produce PacBio continuous reads
A typical workflow of whole exome sequencing analysis consists of the following steps
-raw data QC
-Pre-processing
-mapping
-post-alignment processing
-variant calling
-annotation
-prioritization
standard preprocessing procedure includes
-3’ end adapter removal
-trimming of low quality bases at the ends of the reads
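The two preprocessing steps above can be sketched as follows. This is a simplified illustration with exact matching only, not how any particular trimming tool works; the function names, the min_overlap and min_quality defaults, and the example adapter sequence in the usage note are all assumptions.

```python
from typing import List, Tuple


def remove_3prime_adapter(read: str, qualities: List[int], adapter: str,
                          min_overlap: int = 3) -> Tuple[str, List[int]]:
    """Clip an exact adapter match, full or partial, from the 3' end of a read."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    # Case 1: the whole adapter occurs inside the read; drop it and everything after.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx], qualities[:idx]
    # Case 2: the read ends with a prefix of the adapter (read ran into the adapter).
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k], qualities[:-k]
    return read, qualities


def trim_low_quality_ends(read: str, qualities: List[int],
                          min_quality: int = 20) -> Tuple[str, List[int]]:
    """Strip bases with Phred quality below min_quality from both ends."""
    start, end = 0, len(read)
    while start < end and qualities[start] < min_quality:
        start += 1
    while end > start and qualities[end - 1] < min_quality:
        end -= 1
    return read[start:end], qualities[start:end]
```

A typical use would call remove_3prime_adapter first (for example with an adapter string such as "AGATCGGAAGAGC", used here only as an example value) and then pass the result to trim_low_quality_ends.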
Many different tools have been developed for short-read mapping. In general, they use two algorithms for aligning sequences
-Burrows-Wheeler transformation: a compression technique
-Smith-Waterman: a dynamic programming algorithm
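To make the two algorithm names concrete, here is a toy sketch of each. The Burrows-Wheeler function sorts all rotations directly, which is fine for short strings but is not how production aligners build their indexes. The Smith-Waterman function returns only the best local alignment score, using a linear gap penalty. Function names and scoring values are assumptions chosen for illustration.

```python
def bwt(text: str, terminator: str = "$") -> str:
    """Burrows-Wheeler transform via sorted rotations (naive, for small inputs)."""
    if terminator in text:
        raise ValueError("terminator must not occur in the text")
    s = text + terminator
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best
```

The transform groups identical characters together, which is why it compresses well and supports compact indexes, while the dynamic programming approach is exhaustive and therefore slower on long sequences.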
Of the sequence aligners they evaluated, which two were the fastest
Bowtie 2
BWA
After mapping reads to the reference genome, a three-step post-alignment processing procedure is recommended to minimize the artifacts that may affect the quality of downstream variant calling. It consists of
-read duplicate removal
-indel realignment
-base quality score recalibration (BQSR)
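The first of these steps, read duplicate removal, can be sketched in a few lines. This is a deliberately simplified model that treats reads with the same chromosome, start position, and strand as duplicates and keeps the one with the highest mean quality. Real tools such as Picard MarkDuplicates or samtools markdup consider more information (for example mate positions), and the AlignedRead type and its fields are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class AlignedRead(NamedTuple):
    # Hypothetical minimal record; real alignments carry many more fields.
    name: str
    chrom: str
    start: int
    strand: str
    mean_quality: float


def remove_duplicates(reads: List[AlignedRead]) -> List[AlignedRead]:
    """Keep one read per (chrom, start, strand), preferring higher mean quality."""
    best: Dict[Tuple[str, int, str], AlignedRead] = {}
    for read in reads:
        key = (read.chrom, read.start, read.strand)
        # Replace the stored read only if this one has strictly better quality.
        if key not in best or read.mean_quality > best[key].mean_quality:
            best[key] = read
    return list(best.values())
```

Removing such duplicates matters because PCR copies of the same fragment would otherwise be counted as independent evidence for a variant.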
Variant analysis consists of
genotyping
variant calling
annotation
prioritization