01: DNA Sequencing Flashcards
true/false The “health and ancestry” commercial DNA analyses available to the public use whole-genome sequencing rather than genotyping
- false
- it is the other way around: they use genotyping, not whole-genome sequencing
true/false Most current methods of manipulating DNA, RNA, and proteins rely on prior knowledge of the nucleotide sequence of the genome of interest
true
what is the most widely used method to determine nucleotide sequences in a genome of interest
- dideoxy sequencing
- aka sanger sequencing
what is used in sanger sequencing
- DNA polymerase
- dideoxyribonucleoside triphosphates (special chain-terminating nucleotides)
how does sanger sequencing work
- the reaction produces a collection of different DNA copies that terminate at every position in the original DNA sequence
- these copies are then separated and visualized to see which nucleotide is at each position
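The idea above can be sketched in a few lines of Python. This is a toy model under loud assumptions: it skips the chemistry, treats the strand being synthesized as a plain string, and the function names are made up for illustration. Each fragment is a prefix of the strand that ends in a labeled chain terminator, and sorting by length stands in for separation on a gel.

```python
import random

def sanger_fragments(new_strand):
    """One chain-terminated copy for every position: all prefixes of the strand."""
    return [new_strand[:i] for i in range(1, len(new_strand) + 1)]

def read_sequence(fragments):
    """Order fragments by length (as a gel does) and read the last base of each."""
    ordered = sorted(fragments, key=len)
    return "".join(fragment[-1] for fragment in ordered)

strand = "GATTACA"
fragments = sanger_fragments(strand)
random.shuffle(fragments)  # fragments sit in the reaction mix in no particular order
recovered = read_sequence(fragments)
print(recovered)
```

Because every fragment length occurs exactly once, reading the terminal base from shortest to longest gives back the original sequence.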
what is the key difference between how sanger sequencing used to work, and how it does now
- originally, 4 different sequencing reactions were performed, each with a different dideoxyribonucleotide
- the DNA copies were labeled with radioactivity and separated on polyacrylamide gels
- the gels were then exposed to film to produce four ladders of bands that were read manually to reveal the sequence
- now, robotic devices mix the reagents, including the four different chain-terminating dideoxyribonucleotides
- each one is tagged with a different-coloured fluorescent dye
- the reaction products are loaded onto capillary gels, which separate them into a series of distinct bands
- a detector then records the colour of each band, and a computer translates the information into a nucleotide sequence
Automated dideoxy sequencing was used to determine the nucleotide sequences of which genomes
- E. coli
- fruit flies
- nematode worms
- humans
- many others
Due to _______ the cost of sequencing DNA has decreased dramatically, and the number of sequenced genomes has increased enormously
“second-generation sequencing technologies”
what do second-generation sequencing technologies allow us to do
- sequence multiple genomes in a matter of weeks
- catalog the variation in nucleotide sequences from people around the world
- uncover the mutations that increase the risk of various diseases, from cancer to autism
- determine the genome sequences of extinct species
- understand the molecular basis of key evolutionary events in the tree of life
what is the most common second-generation sequencing method
illumina sequencing
how does illumina sequencing work
- begins with the construction of libraries of small DNA fragments that represent the entire genome
- the fragments are then amplified by PCR
- this is done in a way that keeps all of the DNA copies clustered close to the original fragment
- sequencing is done with chain-terminating nucleotides carrying uniquely coloured fluorescent tags
- DNA polymerase adds the fluorescent nucleotide
- a photo of the reaction records the colour to reveal the identity of the nucleotide that was added
- coloured label and chain-terminating group are removed, allowing the polymerase to add the next nucleotide
- this cycle is repeated hundreds of times
- the computer stitches together all the fragments, using the overlaps between them as guides, to reconstruct the full genome sequence
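The last step, stitching reads together by their overlaps, can be illustrated with a toy greedy merge. This is only a sketch: it assumes error-free reads from one strand, the function names are hypothetical, and real assembly software uses far more elaborate methods than this.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b (0 if below min_len)."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps any more; leave the pieces as separate contigs
        merged = reads[best_i] + reads[best_j][best_size:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)] + [merged]
    return reads

contigs = greedy_assemble(["ATGGCGT", "GCGTACG", "TACGTTA"])
print(contigs)
```

With these three made-up reads, each neighbouring pair shares four bases, so the merge ends with a single contig.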
true/false similar to conventional dideoxy sequencing, the fluorescent tag and the chemical group that blocks elongation are both removable in illumina sequencing
- False
- this is true for illumina, but not for dideoxy
what is special about third-generation sequencing methods
capable of sequencing much longer DNA molecules
what are the 2 promising third-generation sequencing methods
- single-molecule real-time (SMRT) sequencing
- Nanopore sequencing
describe single-molecule real-time (SMRT) sequencing
- carried out in an array of tiny wells, each containing a single DNA polymerase anchored to the bottom
- it uses deoxyribonucleoside triphosphates in which the fluorescent dye is attached to the terminal phosphate
- as the polymerase copies the template DNA, the binding of a fluorescent nucleotide generates a colour signal that identifies it
- the signal disappears when the terminal phosphate is released during its incorporation
true/false it is possible to use circular DNA templates that are sequenced repeatedly on both strands with single-molecule real-time (SMRT) sequencing
- true
- this greatly improves the accuracy of the resulting sequence
describe nanopore sequencing
- involves the transport of a single-stranded DNA molecule through a tiny protein pore in a membrane
- voltage is applied across the membrane, resulting in current through the pore
- the passage of the nucleotides through the pore results in tiny shifts in electrical current across the membrane
- measurement of these tiny current changes reveals the identity of each nucleotide
which form of sequencing does not require DNA synthesis
nanopore sequencing
using which sequencing methods can very long DNAs be sequenced
- SMRT
- nanopore
what are unique advantages to nanopore sequencing, that do not exist with SMRT
- can identify modified nucleotides
- their effect on the current differs slightly from that of the unmodified nucleotides
- can be performed with portable, handheld instruments that can be taken into the field
in SMRT sequencing, how are circular DNA templates used
- by attaching hairpin adaptor DNAs to each end of the DNA to be sequenced
- a primer is used that matches the adaptor
- a strand-displacing polymerase separates the double-stranded DNA as it moves along the template, allowing it to continue around the entire molecule many times
in SMRT sequencing, what allows the experimenter to eliminate sequence errors that arise from random mistakes made by the polymerase
the fact that both strands of the DNA are sequenced repeatedly
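Why repeated passes remove random errors can be shown with a simple majority vote. This is a minimal sketch that assumes equal-length passes with substitution errors only; the read strings are invented for illustration. Passes that read the complementary strand are reverse-complemented first so that all passes line up.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Convert a read of the opposite strand into the forward-strand orientation."""
    return seq.translate(COMPLEMENT)[::-1]

def consensus(passes):
    """Most common base at each position across equal-length passes."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))

forward_passes = ["ACGTTGAC", "ACGATGAC", "ACGTTGAC"]  # the second pass has one error
reverse_passes = ["GTCAACGT", "GTCAACGA"]              # the second pass has one error
aligned = forward_passes + [reverse_complement(p) for p in reverse_passes]
result = consensus(aligned)
print(result)
```

Each random error shows up in only one pass, so it is outvoted at its position by the other four.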
true/false sequencing genomes has gotten more expensive with these new methods
- false
- it's gotten cheaper
how is RNA sequencing done as of right now
- by converting the RNA to cDNA (via reverse transcriptase)
- and then sequencing the cDNA with one of the DNA sequencing methods we’ve learnt about
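As a rough picture of the conversion step, the first cDNA strand is the DNA complement of the RNA, written 5' to 3'. The sketch below is a string-level simplification only (the function name is made up, and it ignores priming and second-strand synthesis).

```python
RNA_TO_CDNA = str.maketrans("ACGU", "TGCA")

def first_strand_cdna(rna):
    """DNA strand complementary to the RNA, reported 5' to 3'."""
    return rna.translate(RNA_TO_CDNA)[::-1]

cdna = first_strand_cdna("AUGGCC")
print(cdna)
```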
what is a valuable tool for annotating genomes
RNA-seq
true/false long strings of nucleotides, at first glance, reveal nothing about how this genetic information directs the development of a living organism
true
what does the process of genome annotation attempt to do
- attempts to mark out all the genes (both protein-coding and noncoding) in a genome and ascribe a role to each
- also tries to understand the more subtle types of genome information
what is an example of the more subtle types of genome information
- the cis-regulatory sequences that specify the time and place that a given gene is expressed
- whether its mRNA undergoes alternative splicing to produce diff protein isoforms
what is the first step in trying to make sense of a genome sequence
to translate in silico the entire genome into protein
how many different reading frames are there for any piece of double-stranded DNA
6
how many different reading frames are there for any piece of single-stranded DNA
3
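The six frames come from three starting offsets on each of the two strands. A minimal in silico translation sketch is shown below; it assumes the standard genetic code, uses `*` for stop codons, and the function names and example sequence are chosen here for illustration.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def translate(seq):
    """Translate complete codons from the start of seq; leftover bases are ignored."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))

def six_frame_translation(seq):
    """Three offsets on the given strand (+) and three on its reverse complement (-)."""
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames

frames = six_frame_translation("ATGGCCTAA")
for name, protein in frames.items():
    print(name, protein)
```

For single-stranded DNA only the three `+` frames apply, which is where the answer 3 comes from.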
what are open reading frames (ORFs)
stretches of sequence that run much longer than about 20 amino acids (AA) without a stop codon; in random sequence a stop codon turns up about once every 20 AA in any given frame
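The 20 AA figure follows from the genetic code: 3 of the 64 codons are stops, so random sequence hits one roughly every 64/3 codons. The sketch below checks that number and measures the longest stop-free stretch in one frame. It is a simplification with a made-up function name and test sequence, and it ignores start codons and the other strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# 3 stop codons out of 64 possible codons: about one stop every 21 codons by chance.
expected_spacing = 64 / 3

def longest_open_stretch(seq, frame=0):
    """Longest run of consecutive non-stop codons in one reading frame."""
    longest = current = 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            current = 0  # a stop codon closes the current stretch
        else:
            current += 1
            longest = max(longest, current)
    return longest

seq = "ATG" + "GCC" * 30 + "TAA" + "GCC" * 2
print(round(expected_spacing, 1), longest_open_stretch(seq))
```

A stretch that runs well past the chance spacing, like the one in this example, is what flags a candidate ORF.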
open reading frames (ORFs) often signify what
bona fide protein coding genes
how is the identification of an ORF typically double-checked
- by comparing the ORF AA sequence to the many databases of documented proteins from other species
- if a match is found (even an imperfect one) then it's very likely that the ORF codes for a functional protein
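An imperfect match can be scored as percent identity. The sketch below is a toy: it assumes the sequences are already aligned and of equal length, and the database entries are hypothetical, not real proteins. Real searches rely on dedicated alignment tools rather than a position-by-position comparison like this.

```python
def percent_identity(a, b):
    """Percent of positions that match between two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length (already aligned)")
    matches = sum(x == y for x, y in zip(a, b))
    return 100 * matches / len(a)

def best_match(query, database):
    """Name of the database entry with the highest identity to the query."""
    return max(database, key=lambda name: percent_identity(query, database[name]))

# Hypothetical database entries, invented for illustration only.
database = {
    "hypothetical_protein_A": "MKTAYIAKQR",
    "hypothetical_protein_B": "MSLLTEVETP",
}
query = "MKTAYLAKQR"  # differs from protein A at one position
hit = best_match(query, database)
print(hit, percent_identity(query, database[hit]))
```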
when does the “double-checking” strategy work best
for compact genomes (where introns are rare and ORFs extend for many hundreds of AA)
when does the “double-checking” strategy not work too well
- since it works best with compact genomes, it is less effective when the genome is not compact (many introns, short exons)
- the average exon size is 150–200 nucleotide pairs for many animals and plants, and additional information is usually required to unambiguously locate all the exons of a gene
what do we do when the genome is not compact, and we want to locate its genes
- can search genomes for splicing signals and other features to help identify exons
- the most powerful method, though, is to sequence all the RNA produced (RNA-seq)
what can RNA-seq information be used to accurately locate
all introns and exons of even complex genes
true/false RNA-seq identifies noncoding RNAs produced by a genome
true
what is the main reason for why we only know the approx. number of genes in the human genome
The existence of the many noncoding RNAs and our relative ignorance of their function
we know from __________ that many organisms share the same basic set of proteins
comparative genomics
true/false the functions of few identified proteins remain unknown
- False
- a very large number are unknown
approximately how many proteins encoded by a sequenced genome do not clearly resemble any protein that has been studied biochemically
approx. one third
what is a key limitation regarding the emerging field of genomics
- comparative analysis of genomes reveals a great deal of information about the relationships between genes and organisms
- BUT it often does not provide immediate information about how these genes function or what roles they have in the physiology of an organism