Dostie Lectures Flashcards
First Generation Sanger DNA Sequencing
- Fred Sanger first developed the dideoxy method of DNA sequencing in 1977. It is DNA sequencing with chain-terminating inhibitors.
Mechanism:
1) Chain Termination:
- Uses “dideoxy” nucleotides (ddNTPs) to stop DNA synthesis. These special nucleotides (ddA, ddT, ddG, and ddC) lack the 3’-OH group needed to attach the next nucleotide, so DNA cannot extend further once one is added to the growing strand.
- Each ddNTP is labeled with a different fluorescent dye, so each nucleotide type (A, T, C, G) has its own colour. When a ddNTP is added to a DNA chain, that chain ends, and its last nucleotide is colour-coded by the dye on the ddNTP.
- Sanger sequencing can be done in the small reaction wells of multi-well plates.
- Each reaction contains all four ddNTPs (at low levels) along with regular dNTPs, so DNA synthesis proceeds but stops at random positions wherever a ddNTP is incorporated.
2) Capillary Electrophoresis:
- After each reaction, the different DNA fragments (each ending in a fluorescent ddNTP) are passed through tiny capillary tubes containing a polymer matrix. Shorter DNA fragments move faster and longer fragments move slower.
- As each DNA fragment exits the tube, a laser detects its fluorescent dye, identifying the colour (and thus the nucleotide) at the fragment’s end. The machine reconstructs the original DNA sequence from the order in which the colours appear.
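A toy model of the two Sanger steps above, as a sketch: each chain-terminated fragment is reduced to its length plus the dye of its final ddNTP, and the capillary is modeled as sorting by length. The dye-to-base mapping and the function names are hypothetical, and the model assumes every stopping position is sampled at least once (real runs also have uneven peak heights and dye-specific mobility effects). The read that comes out is the newly synthesized strand, i.e., the complement of the template.

```python
import random

# Hypothetical dye assignment for illustration; real kits use their own dye sets.
BASE_TO_DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
DYE_TO_BASE = {dye: base for base, dye in BASE_TO_DYE.items()}


def chain_terminated_fragments(new_strand: str) -> list[tuple[int, str]]:
    """One fragment per stopping point: (length, dye of the terminating ddNTP).

    Idealization: every possible termination position is sampled at least once.
    """
    return [(n, BASE_TO_DYE[new_strand[n - 1]]) for n in range(1, len(new_strand) + 1)]


def capillary_readout(fragments: list[tuple[int, str]]) -> str:
    """Shortest fragments pass the laser first; read the dye colours in that order."""
    return "".join(DYE_TO_BASE[dye] for _, dye in sorted(fragments))


fragments = chain_terminated_fragments("ATGCGTAC")
random.shuffle(fragments)  # the reaction produces fragments in no particular order
print(capillary_readout(fragments))  # ATGCGTAC
```

Sorting by length is the whole trick: the order in which the colours reach the detector is the sequence.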
Top-down DNA sequencing
- “Chromosome walking” or “primer walking” (clone-based, sequential linear assembly)
- Works on one piece of DNA at a time, in a linear, step-by-step manner.
- Starting point: begin sequencing from one end of a DNA fragment to get 100 bp to 1 kb of sequence.
- Design new primer: order a new primer designed to bind at the end of the newly sequenced portion.
- Continue sequencing: use this primer to extend the sequence by another overlapping ~1 kb of DNA, creating a continuous sequence.
- Repeat as many times as needed (cycles of sequencing and designing new primers until you reach the end of the DNA region or hit an obstacle).
- If you reach a GC-rich stretch that you can’t sequence, you may need to sub-clone (isolate and sequence smaller portions of this challenging section) to move forward.
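The primer-walking loop can be sketched as a small simulation, assuming (hypothetically) that each sequencing reaction reads a fixed number of bases right after the primer, and that each new primer is simply the last few bases of the sequence obtained so far. READ_LEN, PRIMER_LEN and the example region are made-up toy values. A primer that binds nowhere or in more than one place stops the walk, which loosely stands in for the "obstacle" case; sub-cloning is not modeled.

```python
# Toy primer walking; READ_LEN and PRIMER_LEN are hypothetical, not real values.
READ_LEN = 10    # bases obtained per sequencing reaction
PRIMER_LEN = 6   # length of each newly designed primer


def sequencing_reaction(region: str, primer: str) -> str:
    """Anneal the primer at its single binding site and read READ_LEN bases beyond it."""
    site = region.find(primer)
    # A primer that binds nowhere, or in more than one place, stalls the walk.
    if site == -1 or site != region.rfind(primer):
        raise ValueError(f"primer {primer!r} has no unique binding site")
    start = site + len(primer)
    return region[start:start + READ_LEN]


def primer_walk(region: str, first_primer: str) -> str:
    """Sequence, design a primer from the end of the new sequence, repeat."""
    contig = first_primer
    while True:
        read = sequencing_reaction(region, contig[-PRIMER_LEN:])
        if not read:  # nothing left downstream: end of the region reached
            return contig
        contig += read  # each read starts right after its primer, extending the contig


region = "GATCCTAGGCATTACGGTACCAATGTCGAGCTTAAGCGTA"
print(primer_walk(region, region[:PRIMER_LEN]) == region)  # True
```

The loop is strictly sequential: each reaction depends on the primer designed from the previous read, which is why this approach is slow for large genomes.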
Shotgun DNA Sequencing
- Named after the randomness of shotgun firing patterns (a random sampling method).
- Random fragment sequencing: DNA is randomly broken into many small fragments, which are individually sequenced.
- Special computer programs analyze these sequences, identifying overlapping ends between many different fragments.
- Creating contigs: fragments with overlapping sequences are joined together to form continuous sequences called “contigs”.
- Forming genome assemblies: contigs are further pieced together, often using homology with known sequences, to build larger “assemblies” (genome assemblies) that represent the whole genome.
- Efficient for large-scale sequencing but relies heavily on computational tools.
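To make the contig-building step concrete, here is a minimal greedy overlap assembler: it repeatedly merges the two sequences whose suffix/prefix overlap is longest. This is a simplified stand-in chosen for illustration, not the algorithm used by any real genome assembler; it assumes error-free reads, all from the same strand, and the reads are hypothetical. Any sequences that cannot be merged are returned as separate contigs (i.e., gaps remain).

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Longest suffix of `a` that equals a prefix of `b` (at least min_len), else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap into a contig."""
    seqs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    # Drop reads that lie entirely inside another read; they add no new sequence.
    contigs = [r for r in seqs if not any(r != o and r in o for o in seqs)]
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:  # no remaining overlaps: what is left are the contigs
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


reads = ["CCTGCCGGAA", "ATTAGACCTG", "GCCGGAATAC", "AGACCTGCCG"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

Even this toy version compares every pair of sequences on each pass, which hints at why shotgun sequencing "relies heavily on computational tools" at genome scale.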
Explain how sequencing of the whole human genome began.
- Francis Collins: in 1985, a project aimed at sequencing the entire human genome was proposed. It was to be sequenced using the “top-down” (clone-based) approach. By 1998, less than 5% of the human genome had been sequenced, so it was clear that an alternative method was needed to meet the timeframe. A proposal to use shotgun sequencing was submitted to the NIH, but it was rejected. Craig Venter founded a company and said that within 3 years he would sequence the human genome using shotgun DNA sequencing and software designed by TIGR. In 1999-2000 it became clear that he would have a draft, so the NIH switched to shotgun sequencing.
- Craig Venter generated a draft of the human genome by whole-genome random shotgun sequencing and computational assembly of the sequenced fragments; the map is from 5 individuals and has 5-fold coverage.
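“5-fold coverage” means that, on average, each base of the genome was sequenced about five times. The arithmetic is sketched below under stated assumptions: a genome of roughly 3 × 10^9 bp and a hypothetical average read length of 500 bases. The last function uses the idealized Lander-Waterman (Poisson) estimate e^-C for the fraction of bases never covered, which assumes reads land uniformly at random.

```python
import math


def fold_coverage(num_reads: int, read_len: int, genome_size: int) -> float:
    """Average number of times each base is sequenced: total bases read / genome size."""
    return num_reads * read_len / genome_size


def reads_needed(target_coverage: float, read_len: int, genome_size: int) -> int:
    """Number of reads required to reach the target average coverage."""
    return math.ceil(target_coverage * genome_size / read_len)


def expected_fraction_missed(coverage: float) -> float:
    """Idealized Lander-Waterman (Poisson) estimate of bases never sequenced: e^-C."""
    return math.exp(-coverage)


GENOME_SIZE = 3_000_000_000  # roughly the size of the human genome (approximation)
READ_LEN = 500               # hypothetical average read length

print(reads_needed(5, READ_LEN, GENOME_SIZE))  # 30000000
print(round(expected_fraction_missed(5), 4))   # 0.0067
```

Under these assumptions, 5-fold redundancy leaves well under 1% of bases unsequenced, which is why a random method can still yield a usable draft.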
From the human genome sequence, what did we learn about ourselves?
- More than 1.4 million SNPs were identified.
- Numerous other types of genome variation were detected (SNPs, indels, structural variants [SVs], and copy-number variants [CNVs]); these can involve protein-coding genes, ncRNAs, and intergenic DNA segments.
- The human genome varies from one individual to another, even in a healthy population. The genomic landscape is also uneven along the genome, varying in genes, transposable elements, GC content, CpG islands, and recombination rates. This variation can inform on function (e.g., the lack of repetitive DNA sequences in the HOX gene clusters points to their coordinated regulation).
- The human genome encodes approximately 25,000 to 30,000 protein-coding genes. Our genes are more complex, owing to alternative splicing, alternative transcription termination, and alternative transcription start sites.
How is the proteome of humans more complex than invertebrates?
- We have more versions of each type of protein domain and motif (e.g., DNA-binding, protein-protein interaction, and phosphorylation motifs).
Nearly half of the human genome derives from transposable elements; these fall into 4 main categories, and most are inactive. What are they?
1) LINEs:
- Long interspersed nuclear elements: non-LTR retrotransposons. Most are inactive and have diverged from the original sequence. Active LINEs encode a reverse transcriptase that makes DNA copies of the LINE mRNA, which can reinsert into the genome.
2) SINEs:
- Short interspersed nuclear elements: short repetitive non-coding sequences with roles in genome organization, evolution, and regulation of gene expression.
3) Retrovirus-like elements:
- Human endogenous retroviruses (HERVs) are remnants of ancestral infections that became fixed in the germline. Most are inactive, but some have been linked to autoimmune diseases and cancer. Some may exert immunosuppressive activities and confer antiviral resistance.
4) DNA transposon fossils:
- The human genome does not contain any active DNA transposons (autonomous or non-autonomous). Therefore, they are called “fossils.”
What were some benefits of pyrosequencing that led to its development?
1) Minimized the amount of DNA required for sequencing.
2) Minimized the level of purity (e.g., freedom from protein contaminants) required to get good sequences.
3) Minimized the amount of time required for a sequencing reaction.
4) Maximized the read length.
- Allowed a large amount of DNA to be sequenced in one run.
What are some characteristics of Pyrosequencing?
- It measures the release of pyrophosphate as a proxy for nucleotide incorporation.
- Reads were originally up to 400 bases per reaction (now up to 1 kb).
- Originally up to 200,000 reactions per run (now 1 million).
- Compared with the ~100 bases per reaction of original manual sequencing on gels, this is up to 10 million times more sequence per run.
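The “10 million times” comparison is plain arithmetic: read length × reactions per run, divided by the ~100 bases from one manual gel reaction. The sketch below assumes that this is how the comparison is meant; with the numbers listed, the current figures give exactly 10 million-fold, while the original figures give about 800,000-fold.

```python
MANUAL_GEL_BASES = 100  # bases from one manual gel-based sequencing reaction


def bases_per_run(read_len: int, reactions_per_run: int) -> int:
    """Total bases sequenced in one run."""
    return read_len * reactions_per_run


original = bases_per_run(400, 200_000)     # original pyrosequencing figures
current = bases_per_run(1_000, 1_000_000)  # current figures (1 kb, 1 million reactions)

print(original // MANUAL_GEL_BASES)  # 800000
print(current // MANUAL_GEL_BASES)   # 10000000
```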
What are the steps (mechanism) of pyrosequencing?
1) Prepare Sequencing Library:
- Short DNA sequences called adapters are added to both ends of the DNA fragments during library preparation; these attach the DNA fragments to the sequencing platform and serve as primer-binding sites for the amplification and sequencing reactions.
- Different adapters are added at either end of the DNA molecules: DNA ligase joins the adapters to the fragments, and the adapters are designed with complementary sequences so that the DNA ends bind correctly.
2) Clonally Amplify (emPCR, so each fragment is amplified individually):
- Each DNA molecule is encapsulated in a separate water-in-oil droplet, creating isolated environments; PCR amplification occurs simultaneously across many droplets.
- Emulsion PCR (emPCR): allows individual amplification of DNA strands in a single tube. DNA fragments in the aqueous PCR mix are mixed with oil to create tiny droplets (an emulsion) in which PCR occurs. Each droplet contains a single DNA molecule, allowing separate amplification.
- The water phase in the emulsion contains all the reagents needed for PCR (DNA polymerase, primers, nucleotides), enabling individual amplification of each DNA strand without the need for separate reactions.
3) Detect Nucleotide Incorporation:
- Loading sequencing beads.
- Sequencing reagents.
- Placement in machine.
4) Computational analysis:
- After sequencing, the generated signals (light bursts from nucleotide incorporation) are processed by software to determine the DNA sequence and quality scores.
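The detection step can be pictured as a flowgram. In pyrosequencing, one nucleotide type is dispensed at a time in a repeating order, and the light from released pyrophosphate scales with how many bases are incorporated in that flow (so a run like "AA" gives about twice the signal of a single "A"). The sketch below assumes a noise-free signal and a TACG dispensation order chosen for illustration; the function name is hypothetical.

```python
FLOW_ORDER = "TACG"  # assumed dispensation order, for illustration


def flowgram(new_strand: str, n_cycles: int) -> list[tuple[str, int]]:
    """Simulate pyrosequencing flows: one nucleotide dispensed at a time, in FLOW_ORDER.

    For each flow, the signal (light from released pyrophosphate) is proportional to how
    many identical bases are incorporated in a row; 0 means no incorporation, no light.
    """
    signals = []
    pos = 0
    for _ in range(n_cycles):
        for nt in FLOW_ORDER:
            count = 0
            # A homopolymer run (e.g. "AA") is extended completely in a single flow.
            while pos < len(new_strand) and new_strand[pos] == nt:
                count += 1
                pos += 1
            signals.append((nt, count))
    return signals


print(flowgram("TTACG", 1))  # [('T', 2), ('A', 1), ('C', 1), ('G', 1)]
```

The software in step 4 essentially works backwards from such signals to the sequence; long homopolymers are the hard case, because the light must be measured precisely enough to tell, say, 5 bases from 6.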
Illumina (currently the most popular approach)
1) Prepare sequencing library with different adapters at either end.
- The adapters facilitate binding of DNA fragments to the flow cell, serve as primer-binding sites for subsequent amplification and sequencing, and enable multiplexing, allowing multiple samples to be sequenced in one run.
2) Clonally amplify: Bridge PCR amplification on glass flow cells.
- The prepared library is loaded onto a glass flow cell where the DNA fragments are clonally amplified.
- Bridge PCR: each DNA fragment binds to a complementary oligonucleotide on the flow cell surface; its free end bends over and anneals to a nearby oligonucleotide, forming a bridge.
- DNA polymerase amplifies the bound fragments, creating clusters of identical DNA molecules. Each cluster represents a single original DNA fragment.
3) Detect nucleotide incorporation.
- Reagent application: sequencing reagents containing fluorescently labeled dNTPs are passed over the flow cell.
- All 4 fluorescently labeled dNTPs (A, T, C, G) are presented simultaneously.
- Illumina uses fluorescently labeled reversible-terminator dNTPs, so only one dNTP can be added per cycle. Each incorporated nucleotide emits a specific fluorescent signal, which is recorded for each cluster.
- TCEP is used to remove both the fluorophore and the 3’-O-azidomethyl group, which regenerates the 3’-OH and allows the cycle to be repeated.
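The repeated Illumina cycle (add one reversible terminator, image, cleave) can be sketched as a loop over clusters. This is a toy model: the four-dye colour scheme is hypothetical (real instruments use their own dye and channel schemes), clusters are represented only by the strand they will synthesize, and errors such as a cluster falling out of sync are ignored. The cleavage step appears only as a comment because it has no effect on the simulated read.

```python
# Hypothetical four-dye scheme, for illustration only.
BASE_TO_DYE = {"A": "red", "C": "blue", "G": "yellow", "T": "green"}
DYE_TO_BASE = {dye: base for base, dye in BASE_TO_DYE.items()}


def sequencing_by_synthesis(clusters: dict[str, str], n_cycles: int) -> dict[str, str]:
    """clusters maps a cluster ID to the strand its copies will synthesize."""
    reads = {cid: "" for cid in clusters}
    for cycle in range(n_cycles):
        # 1) All four labeled reversible-terminator dNTPs are flowed in together; the
        #    3' block means each strand in a cluster adds exactly one nucleotide.
        image = {cid: BASE_TO_DYE[seq[cycle]]
                 for cid, seq in clusters.items() if cycle < len(seq)}
        # 2) Imaging: record the colour emitted by each cluster in this cycle.
        for cid, dye in image.items():
            reads[cid] += DYE_TO_BASE[dye]
        # 3) Cleavage (e.g. with TCEP) removes the dye and the 3'-O-azidomethyl block,
        #    restoring the 3'-OH so the next cycle can add one more base (not modeled).
    return reads


print(sequencing_by_synthesis({"c1": "ACGTT", "c2": "GGCAT"}, 4))
# {'c1': 'ACGT', 'c2': 'GGCA'}
```

Because exactly one base is added per cycle, read length equals the number of cycles, and homopolymers are read one base at a time rather than from signal intensity.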
Ion Torrent
- A derivative of pyrosequencing.
- The Ion Torrent NGS method uses semiconductors to detect base incorporation.
- Developed by the same people as pyrosequencing.
→ But instead of detecting light emission, it measures the pH change in each well of the sequencing chip to detect DNA base incorporation.
→ This works because every time a nucleotide is incorporated and pyrophosphate is released, a proton (H+) is also released.
→ The pH change is proportional to the number of bases incorporated.
→ Otherwise the steps are similar to pyrosequencing (e.g., sequencing library preparation).
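Because the pH change is proportional to the number of bases incorporated in each nucleotide flow, a naive base caller can simply round each normalized signal to a whole number of bases. This sketch assumes signals are already normalized so that one incorporated base gives about 1.0, and it uses an illustrative TACG flow order; real base callers model signal noise far more carefully, and the example signal values are made up.

```python
FLOW_ORDER = "TACG"  # assumed flow order, for illustration


def call_bases(signals: list[float], flow_order: str = FLOW_ORDER) -> str:
    """Convert per-flow pH signals into a read.

    Signals are assumed normalized so that one incorporated base gives about 1.0;
    because the pH change is proportional to the number of bases added, rounding
    each signal gives the homopolymer length for that flow.
    """
    read = []
    for i, signal in enumerate(signals):
        nt = flow_order[i % len(flow_order)]
        read.append(nt * max(0, round(signal)))
    return "".join(read)


print(call_bases([0.1, 2.1, 0.05, 0.9, 1.2, 0.0, 0.1, 0.0]))  # AAGT
```

The rounding step is where homopolymer errors come from: a noisy signal of 2.6 for a true run of 2 bases would be called as 3.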
Central Dogma of Molecular Biology
DNA (sequencing: first-generation and NGS) → Transcription → mRNA (RNA sequencing) → Translation → Protein (mass spectrometry).
What is the importance of RNA sequencing?
1) Phenotypic Understanding: Not all phenotypes arise from genetic changes; RNA-seq helps investigate gene expression.
2) Detection of Isoforms: Allows detection of different RNA isoforms and measurement of RNA expression levels.
Describe the RNA Sequencing Workflow
- Isolate RNA:
o Extract RNA from the biological sample, ensuring high quality and purity.
- Fragment RNA:
o Break down the RNA into shorter segments to facilitate sequencing.
- Reverse Transcription to cDNA:
o Convert RNA into complementary DNA (cDNA) using reverse transcriptase.
- Library Preparation and PCR Amplification:
o Add adapters to the cDNA and amplify the library through PCR.
- Next-Generation Sequencing (NGS):
o Sequence the cDNA library using NGS technology.
- Decode cDNA Sequences:
o Analyze the sequenced data to obtain the corresponding nucleotide sequences.
- Map Sequencing Reads:
o Align the sequencing reads to the transcriptome or genome to identify where they originated.
- Normalization:
o Adjust the data to account for biases and variations in sequencing depth across samples.
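A minimal sketch of depth normalization, using counts per million (CPM): each gene’s read count is divided by the sample’s total reads and scaled to one million. CPM is chosen here as the simplest example; real pipelines often use other methods (e.g., TPM, which also corrects for transcript length, or the size factors of dedicated differential-expression tools). The gene names and counts are hypothetical.

```python
def counts_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw read counts by library size (total reads) so samples are comparable."""
    total = sum(counts.values())
    return {gene: count / total * 1_000_000 for gene, count in counts.items()}


# Hypothetical samples: sample_b was sequenced twice as deeply as sample_a.
sample_a = {"geneX": 500, "geneY": 1500}
sample_b = {"geneX": 1000, "geneY": 3000}
print(counts_per_million(sample_a))  # {'geneX': 250000.0, 'geneY': 750000.0}
print(counts_per_million(sample_b))  # {'geneX': 250000.0, 'geneY': 750000.0}
```

Raw counts for geneX differ twofold between the samples, but only because of sequencing depth; after normalization they are identical.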
Describe gene expression data analysis and validation (typically performed after high-throughput techniques such as RNA sequencing [RNA-seq]).
1) Data Analysis & Investigations:
- Comparative Analysis:
o Compare overall gene expression between different groups (e.g., treated vs. control) using Principal Component Analysis (PCA).
- Differentially Expressed Genes (DEGs):
o Identify DEGs by comparing normalized counts across samples, visualized using expression heatmaps or volcano plots.
- Pathway Enrichment:
o Analyze enrichment in biological pathways related to DEGs.
- Regulatory Analyses:
o Study regulatory mechanisms affecting gene expression.
- Alternative Splicing:
o Investigate different splicing events affecting mRNA diversity.
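A sketch of the core of DEG calling: compute a log2 fold change (treated vs. control) from normalized counts and keep genes that pass both a fold-change cutoff and a p-value cutoff. The p-values here are taken as given; in practice they come from a statistical test in a dedicated tool such as DESeq2 or edgeR. The cutoffs (|log2FC| ≥ 1, p < 0.05), the pseudocount, and all gene values are illustrative assumptions. A volcano plot shows the same two quantities: log2 fold change on the x-axis and -log10(p) on the y-axis.

```python
import math


def log2_fold_change(treated: float, control: float, pseudocount: float = 1.0) -> float:
    """log2(treated / control); the pseudocount avoids dividing by or taking log of zero."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def call_degs(treated: dict[str, float], control: dict[str, float],
              p_values: dict[str, float], lfc_cutoff: float = 1.0,
              p_cutoff: float = 0.05) -> dict[str, str]:
    """Label genes passing both a fold-change and a p-value cutoff as 'up' or 'down'."""
    degs = {}
    for gene in treated:
        lfc = log2_fold_change(treated[gene], control[gene])
        if abs(lfc) >= lfc_cutoff and p_values[gene] < p_cutoff:
            degs[gene] = "up" if lfc > 0 else "down"
    return degs


# Hypothetical mean normalized counts and p-values.
treated = {"g1": 399, "g2": 50, "g3": 9}
control = {"g1": 99, "g2": 49, "g3": 39}
p_values = {"g1": 0.001, "g2": 0.8, "g3": 0.01}
print(call_degs(treated, control, p_values))  # {'g1': 'up', 'g3': 'down'}
```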
2) Validation Steps:
- mRNA Expression Validation:
o Confirm mRNA levels using quantitative PCR (qPCR).
- Protein Expression Validation:
o Validate protein levels through Western blot analysis.
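For the qPCR validation step, relative mRNA levels are commonly computed with the comparative Ct (2^-ΔΔCt) method; the sketch below assumes that method, which the notes do not specify. It also assumes ~100% PCR efficiency (the template doubles every cycle) and a stably expressed reference gene. A lower Ct means the target crossed the detection threshold earlier, i.e., there was more starting template. All Ct values are hypothetical.

```python
def relative_expression(ct_target_treated: float, ct_ref_treated: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """2^-ddCt fold change of the target gene, treated vs. control.

    Assumes ~100% PCR efficiency (the template doubles every cycle).
    """
    d_ct_treated = ct_target_treated - ct_ref_treated  # normalize to the reference gene
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_treated - d_ct_control
    return 2 ** (-dd_ct)


# Hypothetical Ct values: the target crosses threshold 2 cycles earlier after treatment.
print(relative_expression(22.0, 18.0, 24.0, 18.0))  # 4.0
```

If the RNA-seq result for a gene is, say, a ~4-fold increase, a qPCR fold change of similar size and direction would support it.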