Exam 1 Homework Flashcards

1
Q

The Human Genome Project started in the year ______________.

A

1990

2
Q

Dr. Green presented five domains of genome research:

A
  1. understanding the structure of genomes
  2. understanding the biology of genomes
  3. understanding the biology of disease
  4. advancing the science of medicine
  5. improving the effectiveness of healthcare
3
Q

What is the largest current bottleneck in genomics? Three were mentioned, but all three fall under one broad category.

A

the data analysis bottleneck

4
Q

Dr. Green talks about “Why the world has changed” in the last 10 years. What are the five areas that
he described?

A
  1. genomics
  2. electronic health records
  3. technologies
  4. data science
  5. participant partnerships
5
Q

Massively parallel DNA sequencing instruments all have the following steps/characteristics:

A
  • a library obtained by either amplification or ligation with custom linkers
  • library fragments amplified on a solid surface with adapters
  • direct step-by-step detection of each nucleotide base
  • lots of reactions detected per instrument run
  • digital read type that enables direct quantitative comparisons
  • shorter read lengths than capillary sequencers
6
Q

In the Illumina process, the nucleotides are very specialized. They have two key attributes:

A
  1. a fluor that is specific to the identity of the nucleotide
  2. the 3’ hydroxyl group is blocked with a chemical blocker
7
Q

__________________ of reads to the reference sequence is the first step to identify variation of all types.

A

alignment

8
Q

Long read sequencers such as the PacBio instrument are a departure from short read sequencers such
as Illumina. What is the first major requirement for these long read technologies that is different from the short read technologies?

A

high-molecular-weight genomic DNA as input, which is needed to obtain very long read lengths

9
Q

A typical workflow of whole exome sequencing analysis consists of the following steps:

A
  1. raw data QC
  2. preprocessing
  3. mapping
  4. post-alignment processing
  5. variant calling
  6. annotation
  7. prioritization
10
Q

Standard preprocessing procedure includes:

A
  1. 3’ end adapter removal
  2. trimming of low quality bases at the ends of the reads
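The two preprocessing steps can be sketched in a few lines of Python. This is a minimal illustration, not how any particular trimming tool works: the function names, the exact-match adapter search, the `min_overlap` and `min_q` defaults, and the example adapter string are all assumptions made for the example.

```python
def trim_adapter_3prime(seq, qual, adapter, min_overlap=3):
    """Cut a 3' adapter from a read (exact matching only, for illustration).

    Looks for the full adapter anywhere in the read first, then for an
    adapter prefix of at least min_overlap bases hanging off the 3' end.
    """
    idx = seq.find(adapter)
    if idx == -1:
        longest = min(len(adapter), len(seq))
        for k in range(longest, min_overlap - 1, -1):
            if seq.endswith(adapter[:k]):
                idx = len(seq) - k
                break
    if idx == -1:
        return seq, qual  # no adapter found, read unchanged
    return seq[:idx], qual[:idx]


def trim_low_quality_ends(seq, qual, min_q=20):
    """Drop bases with quality below min_q from both ends of the read."""
    start, end = 0, len(seq)
    while start < end and qual[start] < min_q:
        start += 1
    while end > start and qual[end - 1] < min_q:
        end -= 1
    return seq[start:end], qual[start:end]


# Hypothetical read that runs 7 bases into an example adapter sequence.
read, quals = trim_adapter_3prime("ACGTACGTAGATCGG", [30] * 15, "AGATCGGAAGAGC")
print(read)  # ACGTACGT
```

Real trimmers also tolerate mismatches in the adapter and often use sliding-window quality rules, which this sketch leaves out.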
11
Q

Many different tools have been developed for short-read mapping. In general, they use two
algorithms for aligning sequences:

A
  1. Burrows-Wheeler Transform (BWT)
  2. Smith-Waterman (SW) dynamic programming
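To make the two algorithm families concrete, here is a small Python sketch of each. Both are textbook toy versions written for this card, not what production aligners run: the BWT is built by sorting all rotations (real tools use suffix arrays and an FM-index), and the Smith-Waterman function returns only the best local score with assumed scoring values (match 2, mismatch -1, linear gap -2).

```python
def bwt(text):
    """Burrows-Wheeler Transform via sorted rotations, with '$' as end marker."""
    text = text + "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never go below 0.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


print(bwt("banana"))                         # annb$aa
print(smith_waterman_score("ACGT", "ACGT"))  # 8
```

The BWT groups identical characters together, which is what makes compressed index lookups fast; the dynamic programming approach is slower but handles mismatches and gaps directly.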
12
Q

Of the sequence aligners they evaluated, which two were the fastest?

A
  1. Bowtie 2
  2. BWA
13
Q

After mapping reads to the reference genome, a three-step post-alignment processing procedure is
recommended to minimize the artifacts that may affect the quality of downstream variant calling. It
consists of:

A
  1. read duplicate removal
  2. indel realignment
  3. base quality score recalibration (BQSR)
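The first step, duplicate removal, can be illustrated with a short Python sketch. It assumes a simplified model in which reads that share chromosome, start position, and strand are duplicates and the one with the highest mean base quality is kept. The `Read` record and its fields are hypothetical; production tools use more information, such as mate positions for paired-end reads.

```python
from collections import namedtuple

# Hypothetical minimal read record used only for this example.
Read = namedtuple("Read", "name chrom pos strand mean_qual")


def remove_duplicates(reads):
    """Keep one read per (chrom, pos, strand), preferring higher mean quality."""
    best = {}
    for read in reads:
        key = (read.chrom, read.pos, read.strand)
        if key not in best or read.mean_qual > best[key].mean_qual:
            best[key] = read
    return list(best.values())


reads = [
    Read("r1", "chr1", 100, "+", 31.0),
    Read("r2", "chr1", 100, "+", 35.5),  # duplicate of r1 with better quality
    Read("r3", "chr1", 250, "-", 28.0),
]
print([r.name for r in remove_duplicates(reads)])  # ['r2', 'r3']
```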

14
Q

Variant analysis consists of:

A
  1. genotyping
  2. variant calling
  3. annotation
  4. prioritization
15
Q

The authors mention three sequencing coverage levels: high, medium, and low. What are the
coverage ranges for these three levels?

A
  • low: <5x coverage
  • medium: 5-20x coverage
  • high: >20x coverage
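As a quick worked example, the Python sketch below estimates mean coverage and maps it onto the three levels. The estimate (number of reads times read length, divided by target size) is the usual back-of-the-envelope formula and is not taken from the card; the function names and the example numbers are assumptions.

```python
def mean_coverage(n_reads, read_length, target_size):
    """Approximate mean depth: total sequenced bases divided by target size."""
    return n_reads * read_length / target_size


def coverage_level(depth):
    """Classify depth using the card's thresholds (<5x, 5-20x, >20x)."""
    if depth < 5:
        return "low"
    if depth <= 20:
        return "medium"
    return "high"


# Hypothetical run: 10 million 100 bp reads over a 50 Mb exome target.
depth = mean_coverage(10_000_000, 100, 50_000_000)
print(depth, coverage_level(depth))  # 20.0 medium
```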
16
Q

What is the formula for a Phred score: Qphred =

A

-10 log10(error), where error is the probability that the base call is wrong
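The formula is easy to check numerically. The short Python sketch below converts between error probability and Phred score in both directions, assuming a base-10 logarithm; the function names are made up for this example.

```python
import math


def phred_from_error(p_error):
    """Phred quality score for an error probability in (0, 1]."""
    return -10.0 * math.log10(p_error)


def error_from_phred(q):
    """Error probability implied by a Phred quality score."""
    return 10.0 ** (-q / 10.0)


for p in (0.1, 0.01, 0.001):
    print(p, round(phred_from_error(p), 1))
```

Each increase of 10 in the score corresponds to a tenfold drop in the error probability.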

17
Q

What is the Phred quality value corresponding to a 1% error rate?

A

20, since -10 log10(0.01) = 20

18
Q

Alignment is more difficult in which regions of the genome?

A

regions with higher levels of diversity between the reference genome and the sequenced genome

19
Q

Why should per-base quality scores be recalibrated?

A

the raw Phred-scaled quality scores produced by base-calling algorithms may not accurately reflect the true base-calling error rates. The raw quality scores therefore need to be recalibrated so that a Phred score of Q more accurately corresponds to an error rate of 10^(-Q/10)
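The idea behind recalibration can be sketched in Python: group base calls by their reported quality, count how often they disagree with the reference, and convert the observed error rate back to a Phred score. This is a deliberately reduced illustration. Actual BQSR implementations also condition on covariates such as sequencing cycle and sequence context and skip known variant sites; the function name, the input format, and the add-one smoothing are assumptions.

```python
import math
from collections import defaultdict


def empirical_quality(observations):
    """Map each reported Q to an empirical Q from observed mismatches.

    observations: iterable of (reported_q, is_mismatch) pairs, one per base.
    """
    counts = defaultdict(lambda: [0, 0])  # reported_q -> [mismatches, total]
    for reported_q, is_mismatch in observations:
        counts[reported_q][0] += int(is_mismatch)
        counts[reported_q][1] += 1
    table = {}
    for reported_q, (mismatches, total) in counts.items():
        # Add-one smoothing avoids log10(0) when no mismatches were seen.
        error_rate = (mismatches + 1) / (total + 2)
        table[reported_q] = -10.0 * math.log10(error_rate)
    return table


# Hypothetical data: bases reported as Q30 that mismatch about 1% of the time.
obs = [(30, True)] * 9 + [(30, False)] * 991
print(round(empirical_quality(obs)[30], 1))  # 20.0
```

In this made-up case the bases labeled Q30 behave like Q20 bases, which is the kind of overconfidence recalibration is meant to correct.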

20
Q

Several probabilistic methods have been developed that use the quality score to provide a posterior
probability for each genotype. What is the name of this value that is estimated for a genotype call?

A

genotype likelihood
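A genotype likelihood can be computed directly from the observed bases and their quality scores. The Python sketch below uses a simple diploid model as an assumption: each read base comes from one of the two alleles with probability 0.5, and a miscalled base is equally likely to be any of the other three nucleotides. Turning these likelihoods into posterior probabilities would additionally require a prior over genotypes, which is not shown.

```python
import math
from itertools import combinations_with_replacement


def genotype_likelihoods(bases, quals, alleles="ACGT"):
    """log10 P(observed bases | genotype) for every diploid genotype."""
    result = {}
    for a1, a2 in combinations_with_replacement(alleles, 2):
        log_likelihood = 0.0
        for base, q in zip(bases, quals):
            e = 10.0 ** (-q / 10.0)  # error probability from the Phred score
            p1 = 1.0 - e if base == a1 else e / 3.0
            p2 = 1.0 - e if base == a2 else e / 3.0
            log_likelihood += math.log10(0.5 * p1 + 0.5 * p2)
        result[a1 + a2] = log_likelihood
    return result


# Hypothetical pileup: four A and four G bases, all at Q30.
likelihoods = genotype_likelihoods("AAAAGGGG", [30] * 8)
print(max(likelihoods, key=likelihoods.get))  # AG
```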