Exam 3 Flashcards
454 GS20 sequencing. How many reads and how long?
The first next generation DNA sequencing platform. Produces 200,000 sequence reads of 100 bp each.
What is Next Generation Sequencing?
Sequences many DNA molecules in parallel. Bypasses the need to individually clone and grow each molecule prior to sequencing.
Good for sequencing complex mixtures of DNA molecules, not for individual plasmids or PCR products. Produces up to billions of sequence reads in one run.
compare Sanger vs Next Gen sequencing
Sanger: each template must be cloned in vivo and amplified, then cycle sequenced with labeled ddNTPs. Next Gen: adaptors are ligated to the fragments, then a polony array is generated and read with cyclic array sequencing
common features of all next generation sequencing
library preparation and clonal amplification (except PacBio or nanopore)
describe library preparation for next gen sequencing
RNA or genomic DNA is fragmented and sequencing adaptors are added to the ends of fragments. These adaptors are used for clonal amplification and sequencing
describe clonal amplification for next gen sequencing
Two ways:
- Emulsion PCR: takes place in small droplets in an emulsion with a small bead and one template molecule. result is a bead coated with copies of that one molecule (slide 15)
- Bridge Amplification: individual molecules are amplified on a substrate coated with oligonucleotides. result is many small spots of DNA, all DNA in one spot is identical. (slide 16)
Very sensitive to the amount of input DNA! Too much gives mixed sequences, too little gives little data
describe 454
Older tech, support ended 2016. Yields up to 1 million reads per run of up to 1000 bp length. Used mostly for sequencing amplicons (PCR products) from complex samples. Runs take 1 day. Cost per base is high compared to other technologies. Has problems with homopolymer stretches
- Emulsion PCR, beads deposited in plate with small wells
- polymerase is bound to each molecule and dNTPs flow across one type at a time.
- if the polymerase can add the nucleotide, light is produced. A camera records the light, which determines the sequence of the DNA in each well
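The flow-by-flow logic above can be sketched as a toy model. This is only an illustration: the flow order TACG, the function names, and the noise-free signal are assumptions, not details of the real instrument. It shows why homopolymers are hard: a run of identical bases is incorporated in a single flow, so its length has to be inferred from signal strength alone.

```python
def flowgram(template, flow_order="TACG"):
    """Toy flow-based sequencing: return (base, signal) for each nucleotide flow.

    `template` is written as the strand being synthesized, so a flow of
    base X extends through every consecutive X at the current position.
    The signal is simply the number of bases incorporated in that flow.
    """
    unknown = set(template) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not in flow order: {sorted(unknown)}")
    signals = []
    pos = 0
    while pos < len(template):
        for base in flow_order:
            run = 0
            # Incorporate the whole homopolymer run in this one flow.
            while pos < len(template) and template[pos] == base:
                run += 1
                pos += 1
            signals.append((base, run))
            if pos == len(template):
                break
    return signals


def call_bases(signals):
    """Turn (base, signal) pairs back into a sequence."""
    return "".join(base * n for base, n in signals)


flows = flowgram("TTAG")
print(flows)              # [('T', 2), ('A', 1), ('C', 0), ('G', 1)]
print(call_bases(flows))  # TTAG
```

In a real run the signal for a run of 7 versus 8 identical bases is hard to tell apart, which is where the homopolymer errors come from. The same sketch applies to Ion Torrent if "signal" is read as a pH change.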
describe ion torrent
Clonal amplification is the same as 454 (emulsion PCR). Sequencing is the same as 454 except that it detects a pH change instead of light production. Yields up to 80 million reads of 200-400 bp length. Cost is much less than 454. Runs very fast (2 per day). Has problems with homopolymer stretches
- Emulsion PCR, beads deposited in plate with small wells
- polymerase is bound to each molecule and dNTPs flow across one type at a time.
- if the polymerase can add the nucleotide, the pH changes. A meter records the pH, which determines the sequence of the DNA in each well
describe Illumina sequencing
Dominant technology right now. Yields up to 20 billion reads of 150-300 bp (depends on instrument). Takes up to 2 weeks. Cost is very low
- Bridge Amplification
- Addition of fluorescent nucleotides (one added per molecule)
- slide is scanned and color of each cluster recorded
- nucleotides are unblocked
- repeat addition of nucleotides and detection for up to 300 cycles
describe Pacific Biosciences sequencing
Single-molecule real time sequencing (SMRT). No clonal amplification, detects incorporation of fluorescent nucleotides on a single strand. DNA library molecules are circular. Reads are dozens of kb in length. Used when very long reads are needed, but accuracy is lower. Produces up to 4 million reads of 20,000 bp length. Fast, only 4 hours
describe Oxford Nanopore
Super new technology. DNA strand is pulled through a pore in a membrane while an electric current is passed through the pore. Changes in current depend on the base passing through the pore and are translated into sequence. No cloning, no background noise, length is theoretically unlimited. Number of reads is variable. May be able to read RNA directly (unique!) and can distinguish between modified and regular nucleotides. High error rate!
Overview comparison of Next Gen sequencing technologies
454: Emulsion PCR, max 1 million reads, max 1000 bp length
Ion Torrent: Emulsion PCR, max 80 million reads, max 400 bp length
Illumina: Bridge Amplification, max 10 billion reads, max 600 bp length
PacBio: No amplification, max 500,000 reads, max 30,000 bp length
Nanopore: No amplification, max ?? reads, max 200,000+ bp length
How are next gen sequence data sets analyzed
The output is a large file (FastQ is common). Reads are not analyzed individually like Sanger reads, but may be aligned to one another to create a consensus sequence or aligned to a reference for comparison
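A minimal sketch of reading a FastQ file, assuming the simple four-lines-per-record layout (header starting with @, sequence, + line, quality string) and Phred+33 quality encoding. The function names are made up for illustration; real pipelines normally use an existing parser.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality_string) from a 4-line-per-record FastQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@"):
            raise ValueError(f"bad FastQ header: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


def phred_scores(qual, offset=33):
    """Convert a quality string to integer Phred scores (assumes Phred+33)."""
    return [ord(c) - offset for c in qual]


example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!II\n")
for name, seq, qual in read_fastq(example):
    print(name, seq, phred_scores(qual))
```

Each Phred score Q corresponds to an error probability of 10^(-Q/10), so Q40 means about 1 error in 10,000 base calls.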
What are barcodes (multiplexing)?
Most next gen sequencing technologies produce more data in one run than is needed for many applications. Barcodes or Indexes allow for multiplexing of samples (running many samples at once). Barcodes are short nucleotide stretches that are added to each sample so that many samples can be sequenced together and then separated by barcode.
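Separating pooled reads by barcode (demultiplexing) can be sketched as below. This is a simplified assumption: the barcode is taken to be the first bases of each read and must match exactly, whereas on many platforms the index is read separately and a mismatch is often tolerated. The barcodes and sample names are made up.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Sort reads into per-sample bins by an exact inline barcode at the read start.

    reads    -- iterable of read sequences
    barcodes -- dict mapping barcode sequence to sample name
    The barcode is trimmed off; unmatched reads go to 'undetermined'.
    """
    bins = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                bins[sample].append(read[len(barcode):])
                break
        else:
            bins["undetermined"].append(read)
    return dict(bins)


barcodes = {"ACGT": "sampleA", "TGCA": "sampleB"}  # hypothetical barcodes
pooled = ["ACGTAAAA", "TGCACCCC", "ACGTTTTT", "GGGGTTTT"]
print(demultiplex(pooled, barcodes))
```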
draw the structure of a typical completed sequencing template (dual indexed)
P5 tail (for bridge amplification), Index 2, PE adapter, DNA fragment, PE adapter, Index 1, P7 tail (for bridge amplification)
Slide 5
how was the human genome project accomplished?
Genomic DNA cloned into many thousand BAC clones. Each clone was sequenced individually via primer walking. Sequenced BACs are assembled using matching sequences at ends.
Took 13 years, $3 billion, and 20 universities in several countries
what is shotgun sequencing? How was the human genome sequenced differently with this method
Whole genome is broken into small pieces and the pieces are sequenced individually. Then the pieces are put together computationally. This is much faster and took about 5 years (finished around the same time as the BAC clone approach was finishing)
Nowadays, when whole genome sequencing is needed, we use shotgun approach
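"Put together computationally" can be illustrated with a toy greedy overlap assembler. This is a sketch under strong assumptions (error-free reads, all from the same strand, no repeats, no read fully contained inside another); real assemblers use much more sophisticated graph-based methods. The function names and the minimum overlap of 3 are illustrative choices.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b, or 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing overlaps: the remaining pieces are separate contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


print(greedy_assemble(["ATGCGT", "CGTACC", "ACCGGA"]))  # ['ATGCGTACCGGA']
```

Reads that share no overlap stay as separate pieces, which is the toy equivalent of ending up with several contigs.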
De novo vs resequencing
de novo: “from the beginning”. No prior genomic information. Assembly is difficult: data is assembled into contigs (the longest possible stretches of assembled sequence) and assembly is then aided by mate-pair or long reads.
resequencing: you already have a genomic sequence for the species and data is mapped back to the reference. Used to look for differences from the reference. No assembly required, short reads with enough data to reliably place them in the genome are sufficient
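For resequencing, the idea of placing a short read on the reference and reporting differences can be sketched with a naive scan. This is an assumption-heavy toy: it only handles substitutions (no indels), checks every position, and the names and the mismatch limit are illustrative. Real aligners use indexed search.

```python
def best_hit(read, reference, max_mismatches=1):
    """Return (start, mismatches) for the best placement of read, or None.

    mismatches is a list of (reference_position, reference_base, read_base).
    Ties are resolved in favour of the leftmost placement.
    """
    best = None
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = [
            (start + i, ref_base, read_base)
            for i, (ref_base, read_base) in enumerate(zip(window, read))
            if ref_base != read_base
        ]
        if len(mismatches) <= max_mismatches and (
            best is None or len(mismatches) < len(best[1])
        ):
            best = (start, mismatches)
    return best


reference = "ACGTTGCAAGGC"
print(best_hit("TGCTAG", reference))  # (4, [(7, 'A', 'T')])
```

A difference seen in one read may be a sequencing error; it is the agreement of many overlapping reads at the same position that suggests a real variant.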
What is target capture
sequence a small portion of the genome but the same portion of each sample. Reduces the amount of sequencing required.
- block fragments with oligos
- hybridize targets to capture probes
- incubate with magnetic beads which bind the hybridized fragment
describe RNA-Seq
Sequencing of transcribed genes. Can be mRNA or can include small, non-coding (untranslated) RNA. Gives quantitative information about gene expression in the sample
- Fragment RNA before reverse transcription (most common) or cDNA may be fragmented
- sequencing adaptors are ligated to ends of fragments
- library is size selected to get fragments the right size for sequencing
- library is sequenced
- Reads are mapped to genome and number of reads for each gene are counted. counts measure expression levels
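The final counting step can be sketched as follows. The gene coordinates and read positions are made up, each read is represented only by its mapped start position, and counts per million (CPM) is used here as one simple way to normalize for sequencing depth; it is an assumption of this sketch, not necessarily the normalization used in the course.

```python
from collections import Counter


def count_reads(read_positions, genes):
    """Count reads per gene.

    read_positions -- mapped start coordinate of each read
    genes          -- dict of gene name -> (start, end), end exclusive
    Reads outside every gene are ignored.
    """
    counts = Counter({name: 0 for name in genes})
    for pos in read_positions:
        for name, (start, end) in genes.items():
            if start <= pos < end:
                counts[name] += 1
                break
    return counts


def counts_per_million(counts):
    """Scale raw counts so that they sum to one million."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}


genes = {"geneA": (0, 100), "geneB": (200, 300)}  # hypothetical annotation
counts = count_reads([5, 50, 99, 250, 150], genes)
print(dict(counts))                # {'geneA': 3, 'geneB': 1}
print(counts_per_million(counts))  # {'geneA': 750000.0, 'geneB': 250000.0}
```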
heat maps vs volcano plots
Heat maps group samples by similarities. Good for finding patterns of similarly regulated genes
Volcano plots show fold change/difference of group means (x axis) vs p-value (y axis). Good for identifying significant genes
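The two volcano plot axes are usually drawn on log scales: log2 fold change on x and -log10(p-value) on y. A small sketch, assuming the p-value has already been computed elsewhere and that both group means are positive. The cutoffs (at least a 2-fold change and p below 0.05) are common choices used here only as illustrative defaults.

```python
import math


def volcano_point(mean_control, mean_treated, p_value):
    """Return (x, y) = (log2 fold change, -log10 p-value) for one gene."""
    x = math.log2(mean_treated / mean_control)
    y = -math.log10(p_value)
    return x, y


def is_hit(x, y, min_abs_log2fc=1.0, alpha=0.05):
    """True if the gene lies in an upper outer corner of the plot."""
    return abs(x) >= min_abs_log2fc and y >= -math.log10(alpha)


x, y = volcano_point(mean_control=10, mean_treated=40, p_value=0.001)
print(x, y, is_hit(x, y))
```

Genes with a large change and a small p-value end up in the upper left (down-regulated) or upper right (up-regulated) of the plot.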
define metagenomics
Study of genetic material from environmental samples, an entire microbial community can be studied without culturing individual species.
All DNA from sample is isolated and sequenced together, sequences are assembled into genomes to study the species present. May sequence RNA instead for metabolic activity.
what is 16S and how is it used
A ribosomal RNA gene with several hypervariable regions. It can be used for species identification. Instead of sequencing whole genomes, we can focus just on the 16S gene.
Primers are designed to match conserved regions on either side of hyper variable region. PCR products from samples are sequenced so species present and relative abundance can be learned.
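A toy version of the idea: find the conserved flanks, pull out the variable region between them, and tally how often each variant appears. The 3-base "primer" sites and the reads are made up to keep the example short; real primers are much longer, and real variants are classified against a reference database instead of being compared as raw strings.

```python
from collections import Counter


def extract_variable_region(seq, fwd_site, rev_site):
    """Return the sequence between the two conserved sites, or None."""
    start = seq.find(fwd_site)
    if start == -1:
        return None
    region_start = start + len(fwd_site)
    end = seq.find(rev_site, region_start)
    if end == -1:
        return None
    return seq[region_start:end]


def relative_abundance(reads, fwd_site, rev_site):
    """Fraction of amplifiable reads carrying each variable-region variant."""
    regions = (extract_variable_region(r, fwd_site, rev_site) for r in reads)
    counts = Counter(r for r in regions if r is not None)
    total = sum(counts.values())
    return {variant: n / total for variant, n in counts.items()}


reads = ["GGAAACCCTTTGG", "AAACCCTTT", "AAACCCTTTA", "AAAGGGTTT", "CCCCCC"]
print(relative_abundance(reads, fwd_site="AAA", rev_site="TTT"))
```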
What is genotyping and how is it done
A genotype is the genetic makeup of a sample/individual usually with reference to a specific trait/gene. Samples are genotyped at specific loci (genes) for research or clinical purpose.
Most genotyping is done by PCR and sequencing. Simple genes with a small number of alleles can be Sanger sequenced, but more polymorphic genes cannot (e.g. the Major Histocompatibility Complex has thousands of alleles). These are amplified and sequenced like 16S so alleles can be determined
describe genotyping via GBS or “reduced representation sequencing”
Without PCR. Focuses on restriction enzyme (RE) sites. Genome is digested with an RE, then adaptors are ligated and the library is size selected and sequenced. Selects pieces of the genome where the distance between two RE sites is in the range selected for. Focuses on the part of the genome around RE sites instead of the entire genome; the simplified data reduces cost and complexity
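The digest-then-size-select idea can be tried in silico. The sketch below uses EcoRI (recognizes GAATTC and cuts after the G) on a made-up linear sequence; the function names and the size window are illustrative, and the toy ignores the fact that the two end pieces of a linear molecule have only one RE-cut end.

```python
def digest(genome, site, cut_offset):
    """Cut a linear sequence at every occurrence of a recognition site.

    cut_offset is how many bases into the site the cut falls
    (1 for EcoRI, which cuts G^AATTC).
    """
    cuts = []
    i = genome.find(site)
    while i != -1:
        cuts.append(i + cut_offset)
        i = genome.find(site, i + 1)
    bounds = [0] + cuts + [len(genome)]
    return [genome[a:b] for a, b in zip(bounds, bounds[1:])]


def size_select(fragments, min_len, max_len):
    """Keep only fragments whose length falls inside the selected window."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


genome = "AAGAATTCCCCCGAATTCTT"  # toy sequence with two EcoRI sites
fragments = digest(genome, "GAATTC", cut_offset=1)
print(fragments)                      # ['AAG', 'AATTCCCCCG', 'AATTCTT']
print(size_select(fragments, 8, 12))  # ['AATTCCCCCG']
```

Because the same enzyme cuts at the same sites in every individual, the same subset of fragments is recovered from each sample.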
describe ATAC-Seq
Assay for Transposase Accessible Chromatin using Sequencing. Transposase inserts sequencing adapters at regions of open chromatin. Good for nucleosome mapping and studying transcriptional activity
describe DNAse I Hypersensitive Site Sequencing
Identifies regions of DNA that have fewer nucleosomes (e.g. active regulatory regions). DNA is digested with DNAse I (only cuts where there are no nucleosomes), then fragment ends are isolated and sequenced. After sequencing, fragments are mapped to the genome.
what is epigenetics
factors that are beyond the genetic code. Modifications that do not change the DNA sequence, but change the function. E.g. methylation, histone modification, regulatory RNA.
Epigenetic marks may be heritable and are affected by environment/chemical exposure. Abnormal epigenetic marks may lead to disease
what is chromatin? type of chromatin
Chromatin: DNA wound with histones for packing
heterochromatin: tightly wound. less accessible to polymerase
euchromatin: more loose and open. transcriptionally active
What affects chromatin packing?
Histone modifications:
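Methylation often causes transcriptional silencing (the effect depends on which residue is methylated). Acetylation neutralizes the positive charge on histone lysines, which loosens their grip on DNA and opens chromatin structure, increasing expression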
what is ChIP
Chromatin Immunoprecipitation. Uses an antibody specific for each histone modification to isolate and study the DNA that is associated with histones carrying that modification. Use qPCR to look at changes at a specific site, or sequence (ChIP-Seq) to look at genome-wide changes.
describe CpG methylation
Cytosine may be methylated. CpG (CG in DNA sequence) islands are regions that have more CpG dinucleotides than expected and are often found in or near promoter regions. Methylation can occur at these islands and leads to gene silencing
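"More CpG dinucleotides than expected" can be made concrete with the observed/expected ratio: (number of CG x sequence length) / (number of C x number of G). A small sketch with illustrative function names. One commonly cited set of island criteria is a length of at least 200 bp, GC content above 50%, and an observed/expected ratio above 0.6, though cutoffs vary between tools.

```python
def cpg_stats(seq):
    """Return (GC content, CpG observed/expected ratio) for a sequence."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # CpG dinucleotides on this strand
    gc_content = (c + g) / n if n else 0.0
    # Expected CpG count if C and G were placed independently: c * g / n
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp


print(cpg_stats("CGCGCGCG"))  # (1.0, 2.0)
print(cpg_stats("AAAATTTT"))  # (0.0, 0.0)
```

Across most of a vertebrate genome the ratio is well below 1, which is why stretches with a high ratio stand out as islands.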
what are methyltransferases? Name a few key ones
Methyltransferases methylate unneeded genes as cells are specified and differentiate, which is crucial for cell identity/differentiation.
DNMT: DNA methyltransferase
DNMT3a/b: lays down initial methylation pattern for cell identity/differentiation
DNMT1: methylates the other C when it finds a hemimethylated CpG. Maintains the methylation pattern after cell division (heritable!)
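Why maintenance methylation makes the pattern heritable can be shown with a toy model. Each CpG site is represented as a pair of booleans (top strand methylated, bottom strand methylated). The representation and function names are assumptions made for illustration, and the model ignores errors and de novo methylation.

```python
def replicate(duplex):
    """Semi-conservative replication: each daughter keeps one old strand.

    The newly synthesized strand starts out unmethylated, so previously
    methylated sites become hemimethylated.
    """
    daughter1 = [(top, False) for top, _ in duplex]
    daughter2 = [(False, bottom) for _, bottom in duplex]
    return daughter1, daughter2


def dnmt1_maintain(duplex):
    """Methylate the partner C wherever a site is hemimethylated."""
    return [
        (True, True) if (top or bottom) else (False, False)
        for top, bottom in duplex
    ]


parent = [(True, True), (False, False), (True, True)]
d1, d2 = replicate(parent)
print(d1)                            # [(True, False), (False, False), (True, False)]
print(dnmt1_maintain(d1) == parent)  # True
print(dnmt1_maintain(d2) == parent)  # True
```

Unmethylated sites stay unmethylated because this step only acts on hemimethylated sites, so both daughter cells inherit the parental pattern.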
what are imprinted genes
Genes which are differently methylated depending on which parent they come from. Maternal and paternal alleles will be differently expressed. Only a small number of genes are imprinted. Imprinted genes are often found in clusters