Final Exam Flashcards
DNA sequencing set-up
- Start with bacterial culture for product of interest
- Separate cells from media via centrifuge
- Keep DNA by breaking open cells via lysing
- Isolate and purify DNA using liquid-liquid extraction (aq layer has DNA)
chemical lysis
destabilizes the lipid bilayer and denatures proteins
surfactants
one hydrophobic tail, which allows them to further penetrate molecular structures as compared to phospholipids with 2 tails
Similar to phospholipids, but break through barrier and destabilize proteins better
Main problem of determining the order of nucleotides
DNA elongation happens rapidly and continually
Uses DNA polymerase and excess of nucleotides to make copies of DNA
3’ OH is required for DNA elongation
Dideoxynucleotides (ddNTPs) stop replication bc they lack the 3’ OH, so polymerase cannot add another nucleotide after one is incorporated
Sanger sequencing
- accurate, long reads, but resource consuming
- use one beaker and fluorescence to distinguish between the ddNTPs
– Fragment separation can be automated via capillary gel electrophoresis
– Separates fragments by size: DNA has a roughly constant charge-to-mass ratio, so the gel matrix sorts fragments by length
Smaller molecules move more freely through the gel and migrate faster than larger molecules
molecules must be charged through tagging
– Unique fluorescent signal per ddNTP produces the chromatogram
Building strand from fragments
Sort DNA fragments by length to see what the last nucleotide was
Line up the last 5’ nucleotide; gradually builds the 3’ end up to get strand
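A minimal sketch of the idea above, assuming each chain-terminated fragment is written 5’ → 3’ as a plain string (the function name and sample strand are made up for illustration):

```python
def read_sequence_from_fragments(fragments):
    """Sort chain-terminated fragments by length and read each one's last base."""
    ordered = sorted(fragments, key=len)           # shortest fragment = base nearest the 5' end
    return "".join(frag[-1] for frag in ordered)   # last base = the ddNTP that stopped it

# Hypothetical data: every prefix of a newly synthesized strand, in scrambled order
strand = "ATGCCGTA"
fragments = [strand[:i] for i in range(1, len(strand) + 1)]
fragments.reverse()
print(read_sequence_from_fragments(fragments))
```

The result is the newly synthesized strand, which is complementary to the template.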
Original Set up →
- Split sample into 4 beakers
- Add all 4 dNTPs into each beaker & one radioactive ddNTP (a different one per beaker)
Need separate beakers bc the radioactive labels cannot be told apart
- Add Taq polymerase
- Separate by length using gel electro.
Shortest lengths travel the farthest; associate them with a beaker
Good vs Bad chromatogram
Good:
- Variation in peak height is less than 3-fold.
- Peaks are evenly distributed and one color
- Baseline noise is absent
Interpreted nucleotide sequence is 5’ → 3’
Bad:
- Significant noise up to ~20 bps in (unreliable transport properties)
- Dye blobs occur from unused ddNTPs
- Fewer longer fragments so signal is weaker
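The "less than 3-fold" peak-height criterion can be checked with a few lines. This is only a sketch: the function name and the peak heights are hypothetical, and real base-calling software uses more than this one test.

```python
def peak_heights_ok(heights, max_fold=3.0):
    """True if the tallest peak is less than max_fold times the shortest peak."""
    if not heights or min(heights) <= 0:
        return False                      # no peaks, or a missing peak, counts as bad
    return max(heights) / min(heights) < max_fold

print(peak_heights_ok([100, 150, 220]))   # heights vary 2.2-fold
print(peak_heights_ok([100, 150, 400]))   # heights vary 4-fold
```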
Illumina
short reads, but high throughput
- Adapter ligations attach P5 and P7 oligos to facilitate binding to flow cell
- Primers are not complementary, so they do not base pair
- Fragments become bound somewhere in the flow cell
- locally amplify bound DNA fragments to get clusters of the same sequence
- Bridge amplification creates double-stranded bridges
- Double-stranded clonal bridges are denatured with cleaved reverse strands
- uses paired-end sequencing
***clusters will give off a stronger signal compared to a single fragment
Illumina stepwise
- Add labeled dNTPs into flow cells
- Incorporate a complementary nucleotide
- Remove unincorporated fluorescent nucleotides
- Capture fluorescent signal & image clusters
- Cleave the fluorophores and the protecting group
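A toy model of the cycle above for a single cluster. Assumptions: the template string is listed in the order the polymerase reads it, and the wash and cleavage steps are only comments because nothing needs to happen in the simulation. The function name is made up.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def sequence_by_synthesis(template, cycles):
    """One labeled, reversibly blocked nucleotide is added and imaged per cycle."""
    read = []
    for cycle in range(min(cycles, len(template))):
        base = COMPLEMENT[template[cycle]]  # incorporate the complementary labeled nucleotide
        # wash: remove unincorporated fluorescent nucleotides (no-op in this toy model)
        read.append(base)                   # image: the fluorophore color identifies the base
        # cleave the fluorophore and protecting group so the next cycle can extend
    return "".join(read)

print(sequence_by_synthesis("TACG", 3))
```

The read length equals the number of cycles, which is one reason the reads are short.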
Paired-end sequencing
generated from both ends of a DNA fragment with known insert size
enables both ends of the DNA fragment to be sequenced
Because the distance between the two reads of a pair is known, alignment algorithms can use this info to map the reads over repetitive regions more precisely.
Results in much better alignment of the reads, especially across difficult-to-sequence, repetitive regions of the genome
** more expensive but ideal for genome assembly
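A sketch of how a known insert size can settle a repeat. Assumptions: one read of the pair maps uniquely (the anchor), its mate matches several repeat copies, and positions are plain integers on one reference. The function name, tolerance, and coordinates are hypothetical; real aligners score this differently.

```python
def pick_mate_position(anchor_pos, candidate_positions, insert_size, tolerance=50):
    """Pick the mate position whose distance from the anchor best matches the insert size."""
    if not candidate_positions:
        return None
    def error(pos):
        return abs(abs(pos - anchor_pos) - insert_size)
    best = min(candidate_positions, key=error)
    return best if error(best) <= tolerance else None   # reject pairs that fit nowhere

# The mate matches three repeat copies; only one is ~300 bp from the anchor
print(pick_mate_position(1000, [1300, 5300, 9300], insert_size=300))
```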
Nanopore
Longer reads, more accurate for assembling reads into genome
Very expensive, low throughput
single-end reads
- generated from only one end of a DNA fragment
- Simpler, fast, more cost-effective
- Limited context for structural variations or duplications
- Used for small genomes and RNA seq where contiguity is less critical
Genome assembly
- process of combining our sequencing reads into a continuous DNA sequence
(Sequencing provides short, overlapping reads of DNA)
Having multiple fragments that contain the same portion of the sequence improves our coverage
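Average coverage is commonly estimated as (number of reads × read length) / genome length. A quick sketch with made-up numbers:

```python
def mean_coverage(num_reads, read_length, genome_length):
    """Average number of reads covering each base: C = N * L / G."""
    return num_reads * read_length / genome_length

# Hypothetical run: 1 million reads of 150 bp over a 5 Mb genome
print(mean_coverage(1_000_000, 150, 5_000_000))
```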
reads
raw sequences coming from the experiments
Contigs
continuous stretches of DNA seq from overlapping seq reads
Ambiguous assembly
contigs put together in an unknown order
Accounts for differences in scaffolds; Assemble using reference genome
Scaffold
contigs put together overlapping with estimated gaps in a known order
main challenges for de novo genome reconstruction
Repeats: create ambiguity and can cause misassemblies; inflate genome size
High Coverage: sequencing the genome multiple times, resulting in a greater number of reads that overlap any given region of the genome
greedy overlap
de novo genome reconstruction
Goal is to assemble the strings (reads) into a continuous, single string (contig)
Want the shortest possible superstring
- Overlap maximization
– Reduces redundancy, maximizes confidence with highest overlap
- Repeat resolution
– Resolves repeats by favoring collapsed arrangements
- Evolutionary pressure
– Most genomes have selective pressure to be efficient
how to do a greedy assembly?
merge by highest overlap!!
Repeats ruin assembly ⇒ can cause missing reads
Increase K to overcome repeats
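A minimal sketch of "merge by highest overlap", assuming error-free reads from one strand. The function names and sample reads are made up, and ties are broken by whichever pair is found first.

```python
def overlap(a, b, min_len=1):
    """Length of the longest suffix of a that equals a prefix of b (0 if none)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=1):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))        # drop exact duplicate reads
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break                             # nothing overlaps: contigs stay separate
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads

print(greedy_assemble(["ATGCC", "GCCGT", "CGTAA"]))
```

Because the merge always takes the biggest overlap, copies of a repeat tend to be collapsed into one, which is how parts of the genome can go missing.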
de Bruijn graphs
- help to visualize relationships/overlaps between the strings
- Node = single entity [k-1]
- Edge = represents a connection between entities (can have direction) [k]
- uses directed edges to specify overlap and concatenation
- Each unique (k-1)-mer is a node; each k-mer is an edge from its prefix to its suffix. (K-mer = substring of length k)
- A node is balanced if indegree equals outdegree
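A small sketch of building the graph, following the [k-1] and [k] labels above: nodes are (k-1)-mers and every k-mer adds one directed edge. The function names and the example read are hypothetical.

```python
from collections import defaultdict

def de_bruijn(reads, k):
    """Adjacency list: each k-mer adds an edge from its prefix to its suffix."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)

def degrees(graph):
    """Map each node to (indegree, outdegree)."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for src, targets in graph.items():
        outdeg[src] += len(targets)
        for dst in targets:
            indeg[dst] += 1
    return {n: (indeg[n], outdeg[n]) for n in set(indeg) | set(outdeg)}

graph = de_bruijn(["ATGGCG"], k=3)
for node, (i, o) in sorted(degrees(graph).items()):
    print(node, "balanced" if i == o else "not balanced")
```

In this example the first and last nodes are the two semi-balanced ones (indegree and outdegree differ by 1).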
multiple reads for DB graphs
not Eulerian bc cannot walk along each edge once; 2 semi-balanced nodes
edges on walk extend the contig in multiple directions
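For contrast, when a graph does have an Eulerian walk, it can be found with Hierholzer's algorithm. This is a sketch that assumes a non-empty graph in which such a walk exists; it does not check that. Function names are made up.

```python
def eulerian_walk(graph):
    """Visit every edge exactly once (Hierholzer's algorithm)."""
    adj = {u: list(vs) for u, vs in graph.items()}   # copy so the input is not modified
    indeg = {}
    for u, vs in adj.items():
        for v in vs:
            indeg[v] = indeg.get(v, 0) + 1
    # Start at the semi-balanced node with one extra outgoing edge, if there is one
    start = next((u for u in adj if len(adj[u]) - indeg.get(u, 0) == 1), next(iter(adj)))
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if adj.get(u):
            stack.append(adj[u].pop())    # follow an unused edge
        else:
            path.append(stack.pop())      # dead end: node is finished
    path.reverse()
    return path

def spell(path):
    """Turn a walk over (k-1)-mer nodes back into a sequence."""
    return path[0] + "".join(node[-1] for node in path[1:])

graph = {"AT": ["TG"], "TG": ["GG"], "GG": ["GC"], "GC": ["CG"]}
print(spell(eulerian_walk(graph)))
```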
effect of sequencing errors on DB graphs
Errors affect:
1) k-mer counts, 2) increase # of edges and unconnected graphs
- No overlap would lead to unconnected graphs; weights can be added to arrows (#)
Error correction should remove most tips, islands, bulges (splits and reconnects)
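One simple way to see the effect on k-mer counts: with high coverage, true k-mers show up many times, while a k-mer created by a sequencing error is usually rare. The sketch below keeps only k-mers seen at least `min_count` times; the threshold, function names, and reads are assumptions for illustration, not a specific tool's method.

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count every k-mer across all reads."""
    return Counter(read[i:i + k] for read in reads for i in range(len(read) - k + 1))

def solid_kmers(reads, k, min_count=2):
    """Keep k-mers seen at least min_count times; rarer ones are likely errors."""
    return {kmer for kmer, c in kmer_counts(reads, k).items() if c >= min_count}

# Three correct reads plus one read with a G -> C error in the middle
reads = ["ATGGC", "ATGGC", "ATGGC", "ATCGC"]
print(sorted(solid_kmers(reads, k=3)))
```

Dropping the low-count k-mers before building the graph removes the extra edges that would otherwise form tips or bulges.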