Bioinformatics Exam Flashcards
AlphaFold 3
Predict the joint structure of complexes including proteins, nucleic acids (DNA/RNA), small molecules, ions, etc.
HOMER2
Show that the effect of transcription factor binding on transcription initiation is position dependent
How do we acquire our DNA sample for DNA sequencing?
- Start with a bacterial culture that produces the product of interest
– Biotechnology frequently uses massive E. coli cultures for production
- Separate cells from media
– Centrifuge to separate the cells from the media
– Keep the fraction containing the component of interest (for DNA, the cell pellet)
– Break open the cells by lysing them (chemical lysis destabilizes the lipid bilayer and denatures proteins)
- Isolate and purify our DNA
– Phenol-chloroform extraction (liquid-liquid separation)
– Aqueous phase with DNA/RNA on top
– Organic phase with lipids on the bottom; denatured proteins collect at the interface
Surfactants VS Phospholipids
- both contain a hydrophilic head and hydrophobic tail(s)
– surfactants have only one hydrophobic tail, which lets them penetrate membranes and proteins further than phospholipids, which have 2 tails
– so they disrupt the phospholipid bilayer more and destabilize proteins (used for chemical lysis)
260 nm DNA sample absorbance
- absorbance at 260 nm is proportional to the DNA concentration of the sample
– also used to check for impurities (e.g., an A260/A280 ratio near 1.8 indicates little protein contamination)
– if the readings look clean, we can assume we have a purified DNA sample after this step
– based on the absorbance of UV light (Beer-Lambert law)
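A minimal sketch of the 260 nm calculation, assuming the commonly used conversion of 1.0 A260 unit (1 cm path) ≈ 50 µg/mL of double-stranded DNA, and using the A260/A280 ≈ 1.8 rule of thumb as a purity check. The function names and the 1.7 to 2.0 acceptance window are illustrative choices, not a standard.

```python
# Beer-Lambert law: A = epsilon * c * l, so concentration scales linearly with A260.
# Assumption: 1.0 A260 unit with a 1 cm path ~ 50 ug/mL of double-stranded DNA.
DSDNA_UG_PER_ML_PER_A260 = 50.0


def dna_concentration_ug_per_ml(a260: float, dilution_factor: float = 1.0,
                                path_length_cm: float = 1.0) -> float:
    """Estimate dsDNA concentration from absorbance at 260 nm."""
    # Normalize to a 1 cm path, apply the conversion factor, then undo any dilution.
    return a260 / path_length_cm * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


def looks_pure(a260: float, a280: float, low: float = 1.7, high: float = 2.0) -> bool:
    """Rough purity check: protein contamination pulls A260/A280 below ~1.8."""
    ratio = a260 / a280
    return low <= ratio <= high


print(dna_concentration_ug_per_ml(0.5, dilution_factor=10))  # 250.0
print(looks_pure(0.5, 0.27))  # ratio ~1.85 -> True
print(looks_pure(0.5, 0.40))  # ratio 1.25 -> False
```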
Main purpose of Sanger Sequencing
— determine the precise ordering of nucleotides
DNA elongation
- occurs rapidly and continuously
- use DNA polymerase and excess nucleotides to make copies of DNA
- requires 3’ OH to add another nucleotide to the chain
Di-deoxynucleotides (ddNTP)
- ddNTPs stop replication
- do not have a 3’ OH for continued elongation
- usually about a 1:100 ddNTP:dNTP ratio
*** left with DNA strands of variable length
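A toy simulation of chain termination, assuming that at every position the polymerase picks a ddNTP with one fixed probability (about 0.01 for a 1:100 ddNTP:dNTP ratio) and ignoring base-specific incorporation efficiency. It reproduces the variable-length strands and shows that more ddNTP gives shorter strands; `sanger_fragments` is a hypothetical helper.

```python
import random


def sanger_fragments(template_len: int, ddntp_fraction: float, n_molecules: int,
                     seed: int = 0) -> list[int]:
    """Return the length of each synthesized strand.

    At each position a ddNTP is incorporated with probability `ddntp_fraction`
    (no 3' OH, so the strand stops); otherwise a normal dNTP extends it.
    Strands that never terminate run to the end of the template.
    """
    rng = random.Random(seed)
    lengths = []
    for _ in range(n_molecules):
        length = template_len
        for pos in range(1, template_len + 1):
            if rng.random() < ddntp_fraction:  # ddNTP added: chain terminates here
                length = pos
                break
        lengths.append(length)
    return lengths


low = sanger_fragments(500, 0.01, 2000)   # roughly 1:100 ddNTP:dNTP
high = sanger_fragments(500, 0.05, 2000)  # more ddNTP
# Mean strand length drops as the ddNTP fraction rises.
print(sum(low) / len(low), sum(high) / len(high))
```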
Sanger sequencing process
- sort DNA fragments by length to see what the last nucleotide is
– less ddNTP results in longer strands
– higher concentration of ddNTP results in shorter strands
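*** by sorting fragments by length, we can see what the last nucleotide was (all fragments share the same 5’ start, so each length marks the position of its terminating 3’ ddNTP)
– the read is complementary to the template strand, so take its reverse complement to get the template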
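A sketch of the read-out logic, assuming each fragment has already been reduced to a hypothetical (length, terminal base) pair and that no two fragments have the same length. Sorting by length orders the terminal bases along the newly synthesized strand, and reverse-complementing that read recovers the template.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def read_sanger(fragments: list[tuple[int, str]]) -> str:
    """All fragments start at the primer's 5' end, so sorting by length
    lists the terminating bases 5' -> 3' along the synthesized strand."""
    return "".join(base for _, base in sorted(fragments))


def reverse_complement(seq: str) -> str:
    """The synthesized strand is complementary and antiparallel to the template."""
    return seq.translate(COMPLEMENT)[::-1]


fragments = [(3, "G"), (1, "A"), (4, "T"), (2, "A"), (5, "C")]
synthesized = read_sanger(fragments)
print(synthesized)                      # AAGTC
print(reverse_complement(synthesized))  # GACTT
```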
Original Sanger Sequencing Setup
- split DNA sample into 4 beakers
- Add a ddNTP into each beaker (A,T,C,G)
- Add some radioactive ddNTP into a single beaker
- Add Taq and run PCR
** separate by length in gel electrophoresis
(larger fragments do not travel as far)
– order from farthest traveled (shortest) to least traveled (longest)
***** need SEPARATE beakers because the radioactive label looks the same for every nucleotide, so the signal alone cannot tell you which ddNTP ended a fragment
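A sketch of reading the original four-lane gel, assuming the band lengths in each lane are already known (a hypothetical input format). The lane a band appears in is the only thing that identifies its terminal base; reading from the shortest (farthest-traveled) band upward gives the synthesized strand.

```python
def read_four_lane_gel(lanes: dict[str, list[int]]) -> str:
    """`lanes` maps the ddNTP used in each beaker (A, C, G, T) to the
    lengths of the bands seen in that lane."""
    # Tag each band with its lane's base, since the radioactive signal can't.
    bands = [(length, base) for base, lengths in lanes.items() for length in lengths]
    # Shortest fragments travel farthest, so read from the bottom of the gel up.
    bands.sort()
    return "".join(base for _, base in bands)


lanes = {"A": [1, 4], "C": [2], "G": [3, 6], "T": [5]}
print(read_four_lane_gel(lanes))  # ACGATG
```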
Sanger Sequencing Now
- now use fluorescent tags to distinguish ddNTPs
- only need one beaker for PCR
- also automate fragment separation
capillary gel electrophoresis
- can accelerate fragment length sorting and detection
- separates molecules by size; because DNA has a nearly constant charge-to-mass ratio, the sieving gel (rather than charge) sets how fast each fragment moves
- Smaller molecules move more freely/faster through the gel than larger molecules
- molecules must be charged through tagging with a charged molecule
- DNA and RNA are already charged because each nucleotide carries a negatively charged phosphate group
SanSeq Chromatogram
- unique fluorescence signal per ddNTP produces a chromatogram
ideal SanSeq Chromatogram
- variation in peak height is less than 3-fold
- peaks are evenly distributed
- peaks contain only 1 color
- no baseline noise
- interpreted nucleotide sequence is 5’ to 3’
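A rough check of the ideal-chromatogram criteria, assuming peaks have already been called and each is described by a hypothetical `Peak` record (apex position, height, number of dye colors present). The ±25% spacing tolerance is an assumed value, and baseline noise is not modeled; this is not the output format of any sequencer software.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    position: float  # scan index of the peak apex
    height: float    # fluorescence intensity of the called base
    colors: int      # number of dye channels with a peak at this position


def is_ideal(peaks: list[Peak], max_height_ratio: float = 3.0,
             spacing_tolerance: float = 0.25) -> bool:
    """Less than 3-fold height variation, even spacing, one color per peak."""
    heights = [p.height for p in peaks]
    if max(heights) / min(heights) >= max_height_ratio:
        return False
    gaps = [b.position - a.position for a, b in zip(peaks, peaks[1:])]
    mean_gap = sum(gaps) / len(gaps)
    # Evenly distributed: every gap within the assumed tolerance of the mean gap.
    if any(abs(g - mean_gap) > spacing_tolerance * mean_gap for g in gaps):
        return False
    return all(p.colors == 1 for p in peaks)


good = [Peak(10 * i, 100 + 20 * (i % 3), 1) for i in range(10)]
mixed = good[:5] + [Peak(50, 120, 2)] + good[6:]  # one two-color peak
print(is_ideal(good), is_ideal(mixed))  # True False
```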
Nonideal SanSeq Chromatogram
- significant noise in the first ~20 bp makes those base calls unreliable
- dye blobs from unincorporated dye-labeled ddNTPs
- fewer long fragments are produced, so the signal weakens toward the end of the read
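A minimal trimming sketch for these non-ideal regions, assuming one peak height per base call is available. It drops the first ~20 calls and then cuts the read where the signal falls below a fraction of the median height; the 20% threshold and the `trim_read` name are hypothetical.

```python
from statistics import median


def trim_read(bases: str, heights: list[float], head: int = 20,
              min_fraction: float = 0.2) -> str:
    """Drop the noisy first `head` calls, then stop at the first base whose
    peak height falls below `min_fraction` of the median height."""
    if len(bases) != len(heights):
        raise ValueError("need one peak height per base call")
    cutoff = min_fraction * median(heights)
    end = len(bases)
    for i in range(head, len(bases)):
        if heights[i] < cutoff:  # long fragments are scarce, so the signal fades
            end = i
            break
    return bases[head:end]


bases = "N" * 20 + "ACGTACGTAC" + "GGGGG"
heights = [50.0] * 20 + [100.0] * 10 + [5.0] * 5
print(trim_read(bases, heights))  # ACGTACGTAC
```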
SanSeq VS Illumina Sequencing
- Sanger sequencing is very accurate but slow compared to Illumina
Illumina Sequencing
- sequencing by synthesis
- uses DNA polymerase to incorporate nucleotides with a fluorescent tag (fluorescently labeled reversible terminators)
- tags are then identified to determine the DNA sequence
Illumina Sequencing Process
- Adapter ligation attaches P5 and P7 adapter sequences so fragments can bind the complementary oligos on the flow cell
- fragments hybridize at random positions on the flow cell
- locally amplify bound DNA fragments to get clusters of the same sequence
– bridge amplification creates double-stranded bridges
– double-stranded clonal bridges are denatured and the reverse strands are cleaved off, leaving forward strands
***clusters give off a much stronger signal than a single fragment would
We repeatedly →
1. Add nucleotide
2. Capture signal
3. Cleave fluorophore
5-step Illumina sequencing process
1. Add labeled dNTPs into the flow cell
2. Incorporate a complementary nucleotide
3. Remove unincorporated fluorescent nucleotides
4. Capture the fluorescent signal & image clusters
5. Remove the fluorophores and the protecting group
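A toy model of the sequencing cycle for one cluster, assuming every strand in the cluster stays perfectly in sync (real runs accumulate phasing errors) and a hypothetical four-dye color scheme; the `sequence_by_synthesis` name and the dye table are illustrative only.

```python
# Complementary base pairing used by the polymerase.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Hypothetical four-dye scheme: one color per reversible-terminator nucleotide.
DYE_FOR_BASE = {"A": "red", "C": "blue", "G": "green", "T": "yellow"}
BASE_FOR_DYE = {dye: base for base, dye in DYE_FOR_BASE.items()}


def sequence_by_synthesis(template: str, cycles: int) -> str:
    """Simulate one synchronized cluster for `cycles` cycles.

    `template` is written in the order the polymerase reads it (3' to 5'),
    so the returned read is the new strand 5' to 3'.
    """
    read = []
    for pos in range(min(cycles, len(template))):
        # Labeled reversible terminators are added; the blocked 3' end means
        # exactly one complementary nucleotide is incorporated per cycle.
        incorporated = COMPLEMENT[template[pos]]
        # Unincorporated nucleotides are washed away (nothing to model here).
        # Imaging: the emitted color identifies the incorporated base.
        color = DYE_FOR_BASE[incorporated]
        read.append(BASE_FOR_DYE[color])
        # Fluorophore and blocking group are cleaved so the next cycle can extend.
    return "".join(read)


print(sequence_by_synthesis("TACGGT", 4))  # ATGC
```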
paired-end sequencing
- enables both ends of the DNA fragment to be sequenced
– Because the distance between each paired read is known, alignment algorithms can use this information to map the reads over repetitive regions more precisely.
***Results in much better alignment of the reads, especially across difficult-to-sequence, repetitive regions of the genome
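A sketch of how a known insert size resolves a read that falls in a repeat, assuming exact matching, an exact insert size, and a mate already reverse-complemented onto the forward strand (real aligners tolerate mismatches and model an insert-size distribution). The helper names are hypothetical.

```python
def find_all(genome: str, read: str) -> list[int]:
    """Every start position where `read` matches `genome` exactly."""
    return [i for i in range(len(genome) - len(read) + 1)
            if genome[i:i + len(read)] == read]


def place_pair(genome: str, read1: str, mate: str, insert_size: int) -> list[int]:
    """Keep only read1 positions whose mate sits where the insert size predicts."""
    placements = []
    for start in find_all(genome, read1):
        # The fragment spans start .. start + insert_size; the mate ends it.
        mate_start = start + insert_size - len(mate)
        if mate_start >= 0 and genome[mate_start:mate_start + len(mate)] == mate:
            placements.append(start)
    return placements


# Toy genome: the repeat ACGTACGT occurs twice, with unique sequence after each copy.
genome = "ACGTACGT" + "TTTTGGGG" + "ACGTACGT" + "CCCCAAAA"
print(find_all(genome, "ACGTACGT"))                    # [0, 16] (ambiguous alone)
print(place_pair(genome, "ACGTACGT", "CCCCAAAA", 16))  # [16] (resolved by the mate)
```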
Nanopore Sequencing Technology
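- a voltage drives an ionic current through a nanopore set in an electrically resistant polymer membrane; as the DNA strand passes through, the bases in the pore disrupt the current in characteristic ways that are decoded into sequence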
*** gives us much longer reads which is important for assembling reads into a genome
** type of third-generation sequencing (TGS)
- can give long reads with no amplification
- Direct detection of epigenetic modifications on native DNA.
- sequencing through regions of the genome inaccessible or difficult to analyze by short-read platforms.
- Uniform coverage of the genome; not as sensitive to GC content as short-read platforms.
genome assembly
- process of combining short, overlapping sequencing reads into a continuous DNA sequence
– having multiple reads that cover the same portion of the sequence (higher coverage) makes the assembly more reliable
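A quick way to put a number on coverage, using the standard mean-depth formula C = N·L/G (reads × read length / genome size) and the Lander-Waterman estimate that roughly e^(-C) of bases receive no reads at all. This assumes reads land uniformly at random, which real libraries only approximate; the read counts and genome size in the example are made up.

```python
import math


def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of reads covering each base: C = N * L / G."""
    return n_reads * read_length / genome_size


def expected_uncovered_fraction(coverage: float) -> float:
    """Lander-Waterman approximation: with random read placement, the chance
    that a given base is covered by zero reads is about exp(-C)."""
    return math.exp(-coverage)


c = mean_coverage(n_reads=1_000_000, read_length=150, genome_size=5_000_000)
print(c)  # 30.0
print(expected_uncovered_fraction(c))
```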
reads
raw sequences produced by the sequencing experiment
contigs
continuous stretches of DNA sequence from overlapping sequencing reads
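Greedy overlap merging is one simple way to turn reads into contigs; it is used here purely as an illustration, and real assemblers typically rely on overlap or de Bruijn graphs and tolerate sequencing errors. The sketch assumes error-free reads from one strand and a hypothetical minimum overlap of 3 bases, and it does not handle a read contained in the middle of another.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (>= min_len), else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair with the largest overlap; whatever remains
    when no pair overlaps by at least `min_len` are the contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)] + [merged]


reads = ["ATGGCGT", "GCGTGCA", "GTGCAATT", "CAATTG", "TTTTTT"]
print(greedy_assemble(reads))  # ['TTTTTT', 'ATGGCGTGCAATTG']
```

The last read shares no sufficient overlap with the others, so it stays a separate contig.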