LEC48: Applications of Next Gen Sequencing Flashcards
how does sanger sequencing work
take DNA, compartmentalize/target regions of interest in genome
generate sequence-specific primers
these modified DNA bases lack 3’-OH on ribose moiety, so any replicating DNA chain into which they’re added by DNA pol will beu nable to be further extended
these terminator bases are added into individual elongating DNA molecules
produces ladder of DNA chains of specific lengths
each chain is tagged by terminator molecule w/ unique flourescent molecule tag that’s detected by reading device
how does next gen DNA sequencing process work
physically shear DNA, produce fragments
size select fragments for enzymatic attachment of adaptor primers
denature the DNA into single strands, adaptor primers used to capture individual fragments onto a sequencing matrix
amplify these molecules to make library, by PCR
add DNA polymerase + modified bases that aren’t extendable but are tagged w/ a flourescent color, 1 for each base
generates the series of individual rxns that’re located at a specific location on matrix and emit a specific color, allow ID of base added
what is massively parallel sequencing
once have done 1 round of next gen sequencing,
wash, removing all reagents
chemically modify, remove flourescent tags, DNA can be elongated again
add second reagents cycle
generates a 2nd piece of seuqnece info for each position
repeat this cycle multiple times, generate sequence data in parallel at massive scale
how are clusters visualized w/ next gen
measure dNTP-specific flourescence at individual rxn centers b/c each piece of light represnts an individual sequencing rxn
this is advanced microscopy
tihs requires lots of computational power
once use next gen sequencing to sequence, how do you apply that info
massively parallel sequence data analysis: probabilistic approach, align fragments against reference genome and find variance w/in that fragment of DNA
how can computer faster analyze next gen genome
1) split genome by chromosome, create many jobs
2) run jobs concurrently on diff cluster nodes
3) combine results into single output for further analysis
what is the computational challenege re: input of next gen sequencing
input n bp long sequences from sample, as short reads
map those back to reference genome to align them, map out genome
can map your reference back to specific range
but **repetitive regions are a problem for this **b/c hard to unambiguously assign to reference genome
what are limitations to computational abitilies of next gen
1) space needed is massive for storing image files and subsequent data
2) processing power needed for aligning huge number of relatively short sequence fragments (reads) thatre generated in order to ID positions w/ sequence variatns (polymorphisms, mutations)
need **high performance computer assays, sophisticated computational algorithms **to minimize the processing time needed to accomplish these tasks
sequence alignment vs databse mapping?
sequence alignment: comparing 2 sequences of DNA
database mapping: comparing many small sequences to one really big sequence
what happens w/ DNA fragment sequence once generated in next gen seq
must align sequence unambiguously to a specific chromosomal position
use coputer to generate best fits of each fragment to a genome region
highly repetitive regions are difficult to align well, arent sequenced this way
also can not align regions of genome which arent efficienctly amplified by PCR b/c of sequence identity (i.e. high GC content)
what is the significance of variants in next gen sequencing?
if sample fragments have variation from reference genome, could be incorrect seq assignment, poor alignment, experimental noise if in region w/ few seq reads
HOWEVER if true variation, hard to interpret b/c of:
1) incomplete knowledge of fxn of all genes
2) incomplete knowledge of range of tolerated variation in human populations
3) incomplete knowledge of effect of individual amino acid changes on protein fxn
4) incorrect assignments of pathogenicity in current mutation databases
what is the exome
protein coding portion of the genome
better understood than regulatory regions of genome that’re noncoding
why might do exome sequnecing?
to reduct amount of variation to be interpreted in clinical seq sample for next gen seq
can capture **specific DNA fragments representing the coding part of the genome, using specially designed primers that incorporate tag molecules **
how does exome sequencing work
what can it be used on?
primers confer specificity to your target
primers incorporate tag molecule and use its physical characteristics to yield **enrichment in DNA fragmnets of interest **
cna capsure: whole exome, subset of known disease-associated genes (medical exome), or panel of genes for a specific condition (i.e. epilepsy)
efficacy of exome sequencing?
v helpful for clinical test especially in undiagnosed, mendelian disease - good for rare disease detection
has been used at Baylor CoM, 25% hit rate on first 250, exomes done