Dostie Lectures Flashcards

1
Q

First Generation Sanger DNA Sequencing

A
  • Fred Sanger first developed the dideoxy method of DNA sequencing in 1977. It is DNA sequencing with chain-terminating inhibitors.

Mechanism:

1)
- Uses “dideoxy” nucleotides (ddNTPs) to stop DNA synthesis. These special nucleotides (ddA, ddT, ddG, and ddC) prevent DNA from extending further when they are added to the growing DNA strand.
- Each ddNTP is labeled with a different fluorescent dye, so each nucleotide type (A, T, C, G) has its own colour. When a ddNTP is added to a DNA chain, that chain ends, and its last nucleotide is colour-coded by the dye on the ddNTP.
- Sanger sequencing can be done in small reaction wells of plates.
- Each reaction contains all four ddNTPs and regular dNTPs, so DNA synthesis can occur but will randomly stop wherever a ddNTP is added.

2)
Capillary Electrophoresis:
- After each reaction, the different DNA fragments (each ending with a fluorescent ddNTP) are passed through tiny capillary tubes containing polymers. Smaller DNA fragments move faster and longer ones move more slowly.
- As each DNA fragment exits the tube, a laser detects the fluorescent dye, identifying the colour (and thus the nucleotide) at the fragment's end. The machine reconstructs the original DNA sequence based on the order in which the colours appear.
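
The read-out logic of the two steps above can be sketched in a few lines of Python. This is a toy simulation, not instrument software: the function names, the 10% ddNTP fraction, and the example sequence are all assumptions made for illustration, and each fragment is represented as the newly synthesized strand up to its terminating ddNTP.

```python
import random

def chain_terminated_fragments(synthesized, n_molecules=2000, dd_fraction=0.1, seed=0):
    """Simulate synthesis in which every added base has a small chance of being a ddNTP.

    Returns the set of distinct terminated fragments (prefixes of the new strand).
    """
    rng = random.Random(seed)
    fragments = set()
    for _ in range(n_molecules):
        for i in range(len(synthesized)):
            if rng.random() < dd_fraction:
                # A ddNTP was incorporated at position i: the chain stops here.
                fragments.add(synthesized[: i + 1])
                break
    return fragments

def read_sequence(fragments):
    """Mimic capillary electrophoresis: shortest fragments exit first,
    and the dye on the last base of each fragment gives one letter."""
    return "".join(fragment[-1] for fragment in sorted(fragments, key=len))

synthesized = "ATGCGTACCTGA"  # hypothetical newly synthesized strand
frags = chain_terminated_fragments(synthesized)
print(read_sequence(frags))
```

With enough molecules, every position is terminated at least once, so sorting by length and reading the terminal colours recovers the synthesized strand.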

2
Q

Top-down DNA sequencing

A
  • “Chromosome walking” or “primer walking” (clone-based, sequential linear assembly)
  • Involves one piece of DNA at a time in linear, step-by-step manner.
  • Starting point: begin sequencing from one end of a DNA fragment to get 100 bp to 1 kb of sequence.
  • Design new primer: Order a new primer specifically designed to bind at the end of the newly sequenced portion.
  • Continue sequencing: Use this primer to extend the sequencing by another overlapping 1 kb of DNA, creating a continuous sequence.
  • Repeat as many times as possible (cycle of sequencing and designing new primers until you reach the end of the DNA region or hit an obstacle).
  • If you reach a GC-rich stretch that you can’t sequence, you may need to sub-clone (isolate and sequence smaller portions of this challenging section) to move forward.
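
The walking cycle above can be illustrated with a small Python sketch. It is a simplification under stated assumptions: `primer_walk` and `stitch` are hypothetical helper names, the 1 kb read length and 20-base primer are example values, and each read is taken to start at its primer-binding site so that consecutive reads overlap by the primer length.

```python
import random

def primer_walk(region, read_len=1000, primer_len=20):
    """Return the successive reads obtained by walking along `region`."""
    reads = []
    start = 0
    while True:
        reads.append(region[start:start + read_len])
        if start + read_len >= len(region):
            break  # reached the end of the DNA region
        # The next primer binds the last `primer_len` bases of the new read,
        # so the next read overlaps the previous one by that much.
        start = start + read_len - primer_len
    return reads

def stitch(reads, primer_len=20):
    """Join the overlapping reads back into one continuous sequence."""
    sequence = reads[0]
    for read in reads[1:]:
        sequence += read[primer_len:]
    return sequence

rng = random.Random(1)
region = "".join(rng.choice("ACGT") for _ in range(3000))  # made-up 3 kb insert
reads = primer_walk(region)
print(len(reads), stitch(reads) == region)
```

Each round depends on the previous read, which is why this approach is linear and slow compared with sequencing many random fragments in parallel.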
3
Q

Shotgun DNA Sequencing

A
  • Named after the randomness of shotgun firing patterns. (random sampling method)
  • Random Fragment sequencing: DNA is randomly broken into many small fragments, which are individually sequenced.
  • Special computer programs analyze these sequences, identifying overlapping ends between many different fragments.
  • Creating Contigs: Fragments with overlapping sequences are joined together to form continuous sequences called “contigs”
  • Forming genome assemblies: contigs are further pieced together, often using homology with known sequences, to build larger ‘assemblies’ or ‘genome assemblies’ which represent the whole genome.
  • Efficient for large-scale sequencing but relies heavily on computational tools.
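
As a rough illustration of how overlapping fragments are joined into contigs, here is a minimal greedy overlap-merge sketch in Python. It is an assumption-laden toy (exact matches only, no sequencing errors, no repeats, hypothetical function names); real assemblers use overlap or de Bruijn graphs and are far more involved.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`
    (0 if shorter than `min_len`)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the largest overlap.

    Returns the list of contigs left when no overlaps remain.
    """
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates
    # Drop fragments fully contained in another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # the remaining fragments are separate contigs
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags

print(greedy_assemble(["CCGGGTTT", "AAACCCGG", "GGTTTACGT"]))
# ['AAACCCGGGTTTACGT']
```

The fragments arrive in random order, and only the computational overlap search puts them back in sequence, which is why the approach depends so heavily on software.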
4
Q

Explain the process of the whole genome beginning to get sequenced.

A
  • In 1985, a project to sequence the entire human genome was proposed (the public effort was later led by Francis Collins). It was to be sequenced using the “top-down” or clonal approach. By 1998, less than 5% of the human genome had been sequenced, so it was clear that an alternative method was needed to meet the timeframe. A proposal was submitted to the NIH to use shotgun sequencing, but it was rejected. Craig Venter then founded a company and said that within 3 years he would sequence the human genome using shotgun DNA sequencing and software designed by TIGR. In 1999-2000 it became clear that he would have a draft, so the NIH switched to shotgun sequencing.
  • Craig Venter generated a draft of the human genome by whole-genome random shotgun sequencing and computational assembly of the sequenced fragments. The map is from 5 individuals and has 5-fold coverage.
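
To make "5-fold coverage" concrete: coverage is commonly computed as (number of reads × read length) / genome size. The Python sketch below uses assumed example values (a genome of roughly 3 billion bp and 500 bp reads); these numbers are illustrative and are not taken from the lecture.

```python
import math

def coverage(n_reads, read_len, genome_size):
    """Average number of times each base is sequenced."""
    return n_reads * read_len / genome_size

def reads_needed(target_coverage, read_len, genome_size):
    """Number of reads required to reach a target average coverage."""
    return math.ceil(target_coverage * genome_size / read_len)

GENOME_SIZE = 3_000_000_000  # assumed approximate human genome size in bp
READ_LEN = 500               # assumed read length in bp

print(reads_needed(5, READ_LEN, GENOME_SIZE))     # 30000000
print(coverage(30_000_000, READ_LEN, GENOME_SIZE))  # 5.0
```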
5
Q

From the human genome sequence, what did we learn about ourselves?

A
  • More than 1.4 million SNPs were identified.
  • Numerous types of genome variation were detected (SNPs, indels, SVs, and CNVs); these can involve protein-coding genes, ncRNAs, and intergenic DNA segments.
  • The human genome differs considerably from one individual to another, even in a healthy population. The human genomic landscape shows variation in genes, transposable elements, GC content, CpG islands, and recombination rates. This variation can inform on function (e.g., the lack of repetitive DNA sequences at the HOX gene clusters points to their coordinated regulation).
  • The human genome encodes approximately 25,000 to 30,000 protein-coding genes. Our genes are more complex: alternative splicing, alternative transcription termination, alternative transcriptional start sites.
6
Q

How is the proteome of humans more complex than invertebrates?

A
  • We have more versions of each type of protein domain and motif (DNA-binding motifs, protein-protein interaction, phosphorylation)
7
Q

Nearly 1/2 of the human genome derives from transposable elements. These fall into 4 main categories, and most are inactive. What are they?

A

1) LINEs:
- Long interspersed nuclear elements: Non-LTR retrotransposons. Most are inactive and have evolved away from the original sequence. The active LINEs encode a reverse transcriptase that makes DNA copies of the LINE mRNA, which can reinsert into the genome.

2) SINEs:
- Short interspersed nuclear elements: Short repetitive non-coding sequences with roles in genome organization, evolution, and regulation of gene expression.

3) Retrovirus-like elements:
- Human endogenous retroviruses (HERVs) are remnants of ancestral infections that were fixed in germlines. Most are inactive but some have been linked to autoimmune diseases and cancer. Some may exert immunosuppressive activities and confer antiviral resistance.

4) DNA transposon fossils:
- The human genome does not contain any active DNA transposons (autonomous or non-autonomous). Therefore, they are called “fossils.”

8
Q

What were some benefits of pyrosequencing that led to its development?

A

1) Minimized the amount of DNA required for sequencing.

2) Minimized the level of purity (e.g., freedom from protein contaminants) required to get good sequences.

3) Minimized the amount of time required for a sequencing reaction.

4) Maximized the read length.

  • Allowed a large amount of DNA to be sequenced in one run.
9
Q

What are some characteristics of Pyrosequencing?

A
  • It measures the release of pyrophosphate as proxy to nucleotide incorporation.
  • Reads originally up to 400 bases per reaction (now up to 1 kb).
  • Originally up to 200,000 reactions per run (now 1 million).
  • Compared to the original 100 bases by manual sequencing on gels, this is 10 million times more powerful.
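
The "10 million times" figure follows from the numbers in the bullets above, as this short Python check shows (the variable and function names are just for illustration):

```python
def bases_per_run(read_length, reactions):
    """Total bases produced in one run."""
    return read_length * reactions

MANUAL_BASES = 100  # bases from one manual gel-based reaction

original_run = bases_per_run(400, 200_000)      # early pyrosequencing
current_run = bases_per_run(1_000, 1_000_000)   # 1 kb reads x 1 million reactions

print(original_run // MANUAL_BASES)  # 800000
print(current_run // MANUAL_BASES)   # 10000000
```

So the 10-million-fold gain corresponds to the current figures (1 kb reads, 1 million reactions); with the original figures the gain is 800,000-fold.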
10
Q

What are the steps (mechanism) of pyrosequencing?

A

1) Prepare Sequencing Library:
- Specific short DNA sequences called adapters are added to both ends of the DNA fragments during library preparation. These are important for attaching DNA fragments to the sequencing platform and serve as primer sites for the amplification and sequencing reactions.
- Different adapters are added at either end of the DNA molecules: DNA ligase joins the adapters to the fragments, and the adapters are designed with complementary sequences so the DNA ends bind correctly.

2) Clonally Amplify (emPCR, so each product is amplified individually):
- Each DNA molecule is encapsulated in a separate water droplet in oil, creating isolated environments. PCR amplification occurs simultaneously across many droplets.
- Emulsion PCR (emPCR): allows for individual amplification of DNA strands in a single tube. DNA fragments are mixed with oil to create tiny droplets (an emulsion) where PCR occurs. Each droplet contains a single DNA molecule, allowing for separate amplification.
- The water phase in the emulsion contains all necessary reagents (DNA polymerase, primers, nucleotides) for PCR, enabling the individual amplification of each DNA strand without the need for separate reactions.

3) Detect Nucleotide Incorporation:
- Loading sequencing beads.
- Sequencing reagents.
- Placement in machine.

4) Computational analysis:
- After sequencing, the generated signals (light bursts from nucleotide incorporation) are processed by software to determine the DNA sequence and quality scores.
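
A minimal sketch of how light signals could be turned into a sequence, in Python. It assumes that nucleotides are flowed one at a time in a fixed repeating order (taken here to be T, A, C, G) and that the light intensity of each flow is roughly proportional to the number of identical bases incorporated. The function name and the example intensities are made up, and real base callers also model noise and assign quality scores.

```python
def decode_flowgram(signals, flow_order="TACG"):
    """Convert per-flow light intensities into a base sequence.

    Each intensity is rounded to the nearest whole number of incorporated
    bases: 0 means no incorporation, 2 means a two-base homopolymer, etc.
    """
    bases = []
    for i, intensity in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        bases.append(nucleotide * round(intensity))
    return "".join(bases)

# Hypothetical intensities for two rounds of T, A, C, G flows.
signals = [1.02, 0.05, 1.90, 0.10, 0.00, 1.10, 0.97, 3.05]
print(decode_flowgram(signals))  # TCCACGGG
```

Rounding becomes less reliable as homopolymer runs get longer, which is one commonly cited source of errors in this kind of chemistry.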

11
Q

Illumina (currently the most popular approach)

A

1) Prepare sequencing library with different adapters at either end.
- This is to facilitate binding of DNA fragments to the flow cell, serve as primer sites for subsequent amplification and sequencing, and enable multiplexing, which allows multiple samples to be sequenced in one run.

2) Clonally amplify: Bridge PCR amplification on glass flow cells.
- The prepared library is loaded onto a glass flow cell where the DNA fragments are clonally amplified.
- Bridge PCR: each DNA fragment binds to a complementary oligonucleotide on the flow cell surface, forming a bridge.
- DNA polymerase amplifies the bound fragments, creating clusters of identical DNA molecules. Each cluster represents a single original DNA fragment.

3) Detect nucleotide incorporation.
- Reagent application: sequencing reagents containing fluorescently labeled dNTPs are passed over the cell.
- All 4 fluorescently labeled dNTPs are presented simultaneously. (A, T, C, G)
- Illumina uses fluorescently labeled reversible terminator dNTPs (only one dNTP can be added). The incorporated nucleotides emit a specific fluorescent signal when added to the cluster.
- TCEP is used to remove both the fluorophore and the 3’-O-azidomethyl group, which regenerates the 3’-OH and allows the cycle to be repeated.

12
Q

Ion Torrent

A
  • A derivative of pyrosequencing
  • The Ion Torrent NGS method uses semiconductors to detect base incorporation.
  • Developed by the same people as pyrosequencing.
    —> But it instead looks at the pH change in each well of the sequencing chip and detects DNA base incorporation this way, rather than by light emission.
    —> Because every time pyrophosphate is released, a proton is also released.
    —> pH change is proportional to base incorporation.
  • Similar steps
    —> Seq library preparation and clonal amplification (emPCR) as in pyrosequencing; only the detection of base incorporation differs.
13
Q

Central Dogma of Molecular Biology

A

DNA (sequencing: first-generation and NGS) → Transcription → mRNA (RNA sequencing) → Translation → Protein (mass spectrometry).

14
Q

What is the importance of RNA sequencing?

A

1) Phenotypic Understanding: Not all phenotypes arise from genetic changes; RNA-seq helps investigate gene expression.

2) Detection of Isoforms: Allows detection of different RNA isoforms and measurement of RNA expression levels.

15
Q

Describe the RNA Sequencing Workflow

A
  1. Isolate RNA:
    - Extract RNA from the biological sample, ensuring high quality and purity.
  2. Fragment RNA:
    - Break down the RNA into shorter segments to facilitate sequencing.
  3. Reverse Transcription to cDNA:
    - Convert RNA into complementary DNA (cDNA) using reverse transcriptase.
  4. Library Preparation and PCR Amplification:
    - Add adapters to the cDNA and amplify the library through PCR.
  5. Next-Generation Sequencing (NGS):
    - Sequence the cDNA library using NGS technology.
  6. Decode cDNA Sequences:
    - Analyze the sequenced data to obtain the corresponding nucleotide sequences.
  7. Map Sequencing Reads:
    - Align the sequencing reads to the transcriptome or genome to identify where they originated.
  8. Normalization:
    - Adjust the data to account for biases and variations in sequencing depth across samples.
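
One simple way to adjust for sequencing depth (step 8) is counts per million (CPM). The Python sketch below is only an illustration with made-up gene counts; the lecture does not say which normalization is used, and methods such as TPM or those built into differential-expression packages also correct for other biases.

```python
def counts_per_million(counts):
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}

# Hypothetical samples: sample_b was sequenced twice as deeply as sample_a.
sample_a = {"GAPDH": 500, "TP53": 100, "MYC": 400}
sample_b = {"GAPDH": 1000, "TP53": 200, "MYC": 800}

print(counts_per_million(sample_a))
print(counts_per_million(sample_b))
```

Both samples give the same CPM values, showing that the raw two-fold difference came from sequencing depth rather than from expression.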
16
Q

Describe: gene expression data analysis and validation (typically performed after high-throughput techniques like RNA sequencing (RNA-seq))

A

1) Data Analysis & Investigations:

  • Comparative Analysis:
    o Compare overall gene expression between different groups (e.g., treated vs. control) using Principal Component Analysis (PCA).
  • Differentially Expressed Genes (DEGs):
    o Identify DEGs by comparing normalized counts across samples, visualized using expression heatmaps or volcano plots.
  • Pathway Enrichment:
    o Analyze enrichment in biological pathways related to DEGs.
  • Regulatory Analyses:
    o Study regulatory mechanisms affecting gene expression.
  • Alternative Splicing:
    o Investigate different splicing events affecting mRNA diversity.

2) Validation Steps:

  • mRNA Expression Validation:
    o Confirm mRNA levels using quantitative PCR (qPCR).
  • Protein Expression Validation:
    o Validate protein levels through Western blot analysis.
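
A volcano plot places each gene at (log2 fold change, -log10 p-value). The following Python sketch computes those coordinates and applies a simple DEG cut-off. The thresholds, the pseudocount, and the function names are assumptions for illustration; in practice the p-values come from a statistical test and are corrected for multiple testing.

```python
import math

def volcano_point(mean_treated, mean_control, p_value, pseudocount=1.0):
    """Return (log2 fold change, -log10 p) for one gene.

    The pseudocount avoids division by zero for unexpressed genes.
    """
    log2_fc = math.log2((mean_treated + pseudocount) / (mean_control + pseudocount))
    return log2_fc, -math.log10(p_value)

def is_deg(mean_treated, mean_control, p_value, min_abs_log2_fc=1.0, max_p=0.05):
    """Call a gene differentially expressed if it passes both thresholds."""
    log2_fc, _ = volcano_point(mean_treated, mean_control, p_value)
    return abs(log2_fc) >= min_abs_log2_fc and p_value <= max_p

print(volcano_point(399, 99, 0.001))
print(is_deg(399, 99, 0.001), is_deg(110, 100, 0.001), is_deg(399, 99, 0.2))
```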
17
Q

What is single-cell RNA sequencing? (scRNA-seq)

A
  • Purpose: Reveals heterogeneity of gene expression within a sample, allowing for the study of:
    o Different cell types within a sample.
    o Subtypes within a cell type.
    o Rare subsets of cells.
18
Q

What are the steps in Single-Cell RNA Sequencing?

A
  1. Separating into Single Cells:
    o Create an emulsion with one cell and one bead per droplet.
    o Each bead contains:
    —-> Cell Barcode: Labels all mRNAs from that specific cell.
    —-> Unique Molecular Identifier (UMI): Labels individual mRNAs.
    —-> Capture Sequence: Typically, a poly(T) sequence to bind mRNAs.
  2. Emulsion Reverse Transcription:
    o Each droplet has enzymes for:
    —> Cell lysis.
    —> RNA binding.
    —> Reverse transcription and oligo priming.
    —-> Template switching and extension.
  3. cDNA Amplification:
    o Break the emulsion, purify newly synthesized cDNA, and amplify by PCR.
  4. Library Preparation for NGS:
    o Add P5 and P7 sequences (for bridge amplification) and a sample index to the cDNA.
  5. Cluster Amplification and Sequencing:
    o Perform cluster amplification and sequencing, following standard NGS protocols.
19
Q

How would you process and analyze scRNA-seq data?

A

–> Data Processing:
- Like bulk RNA-seq but includes three identifiers:
  - Sample index.
  - 10X barcode (cell barcode).
  - UMI (mRNA barcode).

o Align sequences to the genome and count reads for each mRNA in each cell of each sample.

o Quality Control: Crucial for ensuring reliable results.
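
The counting step can be sketched as follows in Python. It assumes reads have already been aligned and reduced to (sample index, cell barcode, UMI, gene) tuples; the tuple layout, function name, and example barcodes are hypothetical. Reads sharing all four values are treated as PCR duplicates of one mRNA molecule and counted once.

```python
from collections import defaultdict

def count_umis(aligned_reads):
    """Count distinct UMIs per (sample, cell, gene).

    aligned_reads: iterable of (sample_index, cell_barcode, umi, gene).
    """
    umis_seen = defaultdict(set)
    for sample, cell, umi, gene in aligned_reads:
        umis_seen[(sample, cell, gene)].add(umi)
    return {key: len(umis) for key, umis in umis_seen.items()}

reads = [
    ("S1", "AAAC", "UMI1", "CD3E"),
    ("S1", "AAAC", "UMI1", "CD3E"),  # PCR duplicate of the read above
    ("S1", "AAAC", "UMI2", "CD3E"),  # a second CD3E molecule in the same cell
    ("S1", "TTTG", "UMI1", "CD3E"),  # same UMI, but a different cell
    ("S2", "AAAC", "UMI9", "LYZ"),   # same cell barcode, different sample
]
print(count_umis(reads))
```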

20
Q

Downstream Analysis in Single-Cell RNA Sequencing (scRNA-seq)

A

1) Dimensionality Reduction Plots:
- PCA (principal component analysis)
- t-SNE (t-distributed stochastic neighbour embedding)
- UMAP (uniform manifold approximation and projection)
All help visualize high dimensional data.

2) Cell-Type Clustering (based on similarities)

3) Differentially Expressed Genes (DEGs):
- Identify within specific cell populations to understand gene expression differences.

4) Differentiation Trajectories:
- analyze the progression of cell types over time or stages.

5) Pseudobulk analysis

6) Intra and inter-sample analyses.
- Intra: examines heterogeneity within a single sample.
- Inter: compares expression profiles across multiple samples.
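
Pseudobulk analysis (item 5) is usually understood as summing the counts of all cells that belong to the same sample and cluster, so that the result can be analyzed like bulk RNA-seq. A minimal Python sketch, assuming each cell is a dictionary with hypothetical `sample`, `cluster`, and `counts` keys:

```python
from collections import Counter, defaultdict

def pseudobulk(cells):
    """Sum per-gene counts over all cells sharing a (sample, cluster) pair."""
    totals = defaultdict(Counter)
    for cell in cells:
        totals[(cell["sample"], cell["cluster"])].update(cell["counts"])
    return {key: dict(counter) for key, counter in totals.items()}

cells = [
    {"sample": "control", "cluster": "macrophage", "counts": {"LYZ": 5, "CD68": 2}},
    {"sample": "control", "cluster": "macrophage", "counts": {"LYZ": 3}},
    {"sample": "covid", "cluster": "macrophage", "counts": {"LYZ": 9, "CD68": 4}},
]
print(pseudobulk(cells))
```

The resulting profiles, one per sample and cell type, can then be compared across samples (inter-sample analysis).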

21
Q

What are challenges of scRNA-seq?

A

o Produces noisier data compared to bulk RNA sequencing.

o Requires additional control steps for data quality and interpretation.

o The high resolution of scRNA-seq may not always be necessary, depending on the specific scientific question being addressed.

22
Q

What is an example of scRNAseq in the lab?

A
  • Goal: understand changes in lungs caused by COVID-19 compared to control patients.
  • Single-cell RNA sequencing allows the identification of the cell subtypes that differ and how these change post-infection.
  • Identified major changes in the epithelial and myeloid components of the lungs.
  • Identified changes in the types of macrophages present in the lungs.
  • Validated this difference by immunofluorescence in tissue sections.
  • Understanding these differences allows us to understand the immune response differences between fatal COVID-19 patients and control patients.
23
Q

Spatial RNA sequencing

A
  • Reveals heterogeneity of gene expression within a sample AND spatial organization of these populations.
  • Each tissue section is placed in a capture area covered in probes as in step 1 of scRNAseq, but instead of a cell barcode, a spatial barcode is attributed.
24
Q

Summarize each technique (RNA seq, scRNA, spatial sequencing)

A
  • RNA seq
    o Cheaper
    o Less labour intensive
    o Average expression
    o Fast analysis
  • scRNA sequencing
    o more expensive
    o multi-step
    o single-cell expression
    o multiple QC steps
    o complex analysis
  • spatial sequencing
    o expensive
    o requires tissue cuts.
    o spatial resolution
    o can be single-cell
    o complex analysis
25
Q

How is NGS data stored?

A

o Most genomic data will be generated and funded by health care and private organizations, so how can researchers gain access to this data?

o The Global Alliance for Genomics and Health (GA4GH) is an organization that sets policies and technical standards for the responsible sharing of genomic data.

o In research labs, genomic data must be made available to the scientific community.

1) GEO (gene expression omnibus):
* International public repository that archives and freely distributes microarray and next-generation sequencing data submitted by the research community.
* It is mandatory to submit genomics data to a public repository before publishing in a scientific journal.

2) SRA (sequence read archive):
* The largest publicly available repository of raw high throughput sequencing data, maintained by NCBI.

3) Other archiving sites:
* Genome Structure Database (GSDB)
o First comprehensive repository of Hi-C datasets and 3D genome structures.
* The 4D Nucleome Web Portal
o Contains many different types of experiments including 96 sequencing datasets.

o Raw sequencing files fresh off sequencer are in BCL (binary) form and converted to FASTQ.

o A sequencing file from a NovaSeq 6000 (Illumina) can have up to 20 billion entries. Processing and analyzing these files requires computational experience, and the data need to be transformed, so interdisciplinary collaboration is key in these types of studies.
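
For orientation, a FASTQ entry is four lines of text: a header starting with `@`, the base sequence, a `+` separator, and one quality character per base. The Python sketch below parses such records. It assumes strict four-line records and the Phred+33 quality encoding; the function name and the example read are made up, and established libraries are normally used for real files of this size.

```python
def parse_fastq(lines):
    """Yield (read_id, sequence, qualities) from an iterable of FASTQ lines.

    Assumes 4 lines per record and Phred+33 encoding (quality = ASCII code - 33).
    """
    iterator = iter(lines)
    for header in iterator:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        sequence = next(iterator).rstrip("\n")
        next(iterator)  # the '+' separator line
        quality_line = next(iterator).rstrip("\n")
        qualities = [ord(char) - 33 for char in quality_line]
        yield header[1:], sequence, qualities

example = ["@read1\n", "GATTACA\n", "+\n", "IIIIII#\n"]
for read_id, sequence, qualities in parse_fastq(example):
    print(read_id, sequence, qualities)
# read1 GATTACA [40, 40, 40, 40, 40, 40, 2]
```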

26
Q

How can you display genomic data for analysis?

A

o On a genome browser: a graphical interface to display genomic data and information.

o The UCSC browser.
–> Select the genome you want to look at and enter the gene of interest; you can then see genes and gene predictions, gene names, different datasets, and enhancers (peaks).

27
Q

What can you learn from genomic data analysis?

A

1) Use marks to locate enhancers:
* H3K4me1 is a mark of enhancers of actively transcribed genes.
* H3K27Ac is a mark of enhancers.
* DNase hypersensitivity is a sign that the chromatin is open.

2) Location of TSS
o Pol II associates preferentially with highly expressed genes.
o H4K20me1 identifies the TSSs of actively transcribed genes even better than Pol II.

3) Level of transcription of the gene:
o H3K4me1 at enhancers marks actively transcribed genes.
o H3K36me3 on exons marks the body of actively transcribed genes.

4) Can look at given histone marks over your favourite gene in many different cell types or tissues.

5) Can correlate the presence of marks or their distribution with the expression of your favourite gene.

6) Can look at the distribution of a given mark across several genes by scrolling up and down chromosomes.

28
Q

What is a summary of things that you learn when analyzing the genome?

A
  • H3K27Ac is a mark of enhancers.
  • DNase hypersensitivity is a sign that the chromatin is open.
  • H3K4me1 at enhancers marks actively transcribed genes.
  • H4K20me1 identifies the TSSs of actively transcribed genes even better than Pol II.
  • H3K36me3 marks the body of actively transcribed genes.
  • Pol II associates preferentially with highly expressed genes.
29
Q

What are the key-features of SMRT DNA Technology (3rd generation sequencing technology)?

A

1) No PCR Amplification Needed: Directly sequences single DNA molecules without the need for amplification.

2) Long Reads: Produces very long reads, which can provide more informative data compared to shorter reads.

3) Speed: Capable of sequencing entire human genomes in under 1 hour and at a cost of less than $100.

30
Q

What are the drawbacks of SMRT DNA Technology (3rd generation sequencing technology)?

A
  • High Error Rate: Currently has a high error rate, particularly for single nucleotide polymorphisms (SNPs).
31
Q

What are the technology details of SMRT DNA Technology?

A

1) Technology details
- SMRT Chips: Utilizes specialized SMRT chips for sequencing.
- Zero Mode Waveguides (ZMWs):
- Contains thousands of tiny wells (ZMWs) with a diameter of tens of nanometers.
- Each ZMW has a single DNA polymerase molecule at the bottom, allowing for the sequencing of one DNA molecule per ZMW.
- Detection Volume:
- The small diameter of the ZMWs prevents light from passing through the waveguide’s aperture but allows it to penetrate about 30 nm inside, creating a very small detection volume (30 zeptoliters).
- This setup minimizes background fluorescence noise from unincorporated dNTPs, enhancing detection accuracy.

2) Excitation and Emission:
- Both processes are performed from the same side of the chip, allowing for efficient detection.

3) Incorporation Time:
- The enzyme holds the incorporated nucleotide in the ZMW’s detection volume for tens of milliseconds, a significantly longer time than the average diffusion of free nucleotides.
- This results in the detection of a flash of bright light, indicating successful incorporation, due to the low background noise.

  • Until the development of third-generation sequencing, read lengths remained under 1 kb (454 pyrosequencing); most technologies yielded read lengths under 100 bp (Illumina).
  • It is hard (impossible) to map short sequence reads in repetitive regions such that contigs cannot be constructed.
32
Q

Describe Next-generation sequencing in the Lab.

A

1) Identify all human genome variants, identify cancer-specific molecular variants, sequencing the genome of other living organisms.

2) Genome-wide analyses:
—> Gene expression:
* RNA-seq
* CAGE
* GRO-seq
* Primer extension (identify splice variants)

3) Epigenomics:
* DNaseI-seq
* Bisulfite-seq, MeDIP-seq
* MRE-seq (DNA methylation)
* ChIP-seq (histone modifications, bound proteins)

4) 3D genome organization
* 4C-seq, 5C, Hi-C, Hi-C capture, GAM

5) Two important genome-wide projects:
* ENCODE
* 4D Nucleome project

o Can be done in cell populations or single cells.
o Most common way to conduct genome-wide analysis is with cell populations.
o Sequencing in single cells can provide more information, you can identify new cell types (e.g. stem cells) or driver mutations in tumors.

33
Q

Describe Next Generation Sequencing & Covid.

A
  • Shotgun metagenomics.
  • Sequencing of nasal swab samples gives information on the sequence of the virus causing the infection, the types of other micro-organisms also present in the nasal cavity, and the transcriptome of the inflamed epithelial cells and activated immune cells (i.e., it tells you what the patient’s immune system is doing in response to the virus).
  • Helps identify and track the virus as it spreads.
  • This could help understand how SARS-CoV-2 causes disease and why it affects some people more than others.
34
Q

Describe next generation sequencing and cancer diagnosis and treatments.

A

1) Cancer Mutations:

  • Solid tumors and blood cancers often have multiple mutations.
  • Mutations arise from cell division errors, faulty DNA repair, or external mutagens.
  • Types of mutations:
    Passenger: No effect on cancer cells.
    Driver: Cause clonal expansion of cancer cells.

2) Tumor Heterogeneity:

  • Cancers are heterogeneous with multiple clones, each having distinct mutation loads.
  • Mutations evolve over time through clonal evolution.
  • Metastatic tumors may have different mutation profiles from primary tumors.

3) Importance of Frequent Testing:
- Clonal evolution requires regular testing to guide treatment.

4) Next-Generation Sequencing (NGS):
- Allows faster and more accurate diagnosis, even with small biopsies.
- Personalized Precision Medicine: Tailors treatment based on patient or cancer cell genes.
- NGS Panels: Target known mutations/translocations in specific cancer types.
- Can use fresh tissue, formalin-fixed tissue, or liquid biopsies (DNA from tumors in the bloodstream).

5) Precision Oncology:

  • NGS helps identify more molecular targets for therapy.
  • Molecular therapies (e.g., immunotherapy) are replacing chemotherapy.

6) Immune Checkpoint Inhibitors:

  • Approved antibodies target PD-1, PD-L1, and CTLA-4.
  • Ipilimumab: First approved checkpoint inhibitor (anti-CTLA-4), for metastatic melanoma in 2011.
  • PD-1 and PD-L1 inhibitors treat cancers like non-small cell lung cancer, Hodgkin’s lymphoma, renal cell cancer, and more.
35
Q

Describe Next Generation Sequencing and Prenatal Testing

A

—-> NGS in Non-Invasive Prenatal Testing (NIPT)

Purpose: Determines the risk of genetic abnormalities in the fetus.

Method:

  • Uses cell-free DNA (cfDNA) from fetal DNA fragments circulating in the mother’s blood.
  • DNA fragments are shed by the placenta and are typically shorter than 200 bp.
  • No need for long sequence reads.
  • Detectable as early as 9-10 weeks of gestation.

Common Uses:

  • Screens for aneuploidies (missing/extra chromosomes) such as:
    Trisomy 21 (Down syndrome), Trisomy 18, Trisomy 13.
  • Can also detect chromosome deletions, duplications, and single gene variants in some cases.

Maternal DNA:

  • Since both fetal and maternal cfDNA are analyzed, the test may reveal a genetic condition in the mother.
36
Q

What is metagenomics?

A
  • The study of genetic material taken directly from the environment without first amplifying in culture (e.g., soil, sea water, human gut).
37
Q

What is the origin of metagenomics?

A

Metagenomics originated from Carl Woese’s discovery of rRNA as a marker of taxonomy, which led to the identification of the archaebacteria kingdom. Since rRNA sequences are highly conserved within species, even one base difference can indicate a distinct organism. Metagenomics, defined as the application of modern genomics without isolating or cultivating individual species, initially focused on studying the diversity of environmental microbial communities. In a notable study, Craig Venter and Hamilton Smith identified 148 new bacterial phylotypes and 1.2 million new genes in ocean samples near Bermuda, revealing bacterial diversity based on sampling location. This method, which doesn’t require DNA cloning or large amounts of DNA, has also uncovered the functional connections between microbial communities, including how viruses contain sequences related to microbial physiology and may act as gene reservoirs that drive genetic diversity in marine environments. The development of next-generation sequencing (NGS) has further enhanced metagenomics by enabling faster genome assembly and reducing sequence gaps, thanks to longer sequence reads.

38
Q

Human Microbiome Project (HMP)

A
  • HMP1: Characterize the microbiomes of 300 healthy individuals across 5 sites on the human body:
    o Nasal passages
    o Oral cavity
    o Skin
    o Gastrointestinal tract
    o Urogenital tract
  • Establish a resource to determine if there is a core healthy microbiome.
  • We can be uniquely identified by our microbiome. Our microbiome is stable. Our gut microbiome is the most stable and can be used to uniquely identify people even after one year.
39
Q

iHMP (Integrative Human Microbiome Project)

A
  • Characterize the human host for 3 microbiome-linked conditions:
    o Pregnancy & preterm birth
    o Onset of IBD (inflammatory bowel disease)
    o Onset of Type 2 diabetes.
40
Q

Gut Microbiota and Metabolic Diseases (T2D)

A

1) Role in T2D and Obesity:

Gut microbiota influences type 2 diabetes (T2D) and other metabolic conditions.
Contains trillions of microbes, forming a complex ecosystem.

2) Metabolites and Metabolism:

Microbiota secretes metabolites that affect metabolism.
LPS (from Gram-negative bacteria) is linked to increased risk of T2D.

3) Glucose Metabolism and Insulin Resistance:

Gut metabolites influence enteroendocrine cells, which secrete hormones regulating insulin sensitivity, glucose tolerance, fat storage, and appetite.

4) Microbiome and Blood Metabolites:

15% of human blood metabolites interact with the gut microbiome.
Goal: Treat T2D with probiotics instead of drugs or insulin.

5) Human Microbiome as a Life-Saving Ecosystem:

Fecal Microbiota Transplantation (FMT) treats recurrent Clostridium difficile infections.
Example: A mother developed obesity after receiving FMT from her daughter.

6) Metagenomic Analysis:

Not limited to bacteria, uses NGS sequencers and complex pipelines.
Applied in advanced studies like sequencing nasal swabs for COVID-19.

41
Q

Origin of synthetic genomics

A

1) Synthetic Biology: Creates new biological parts or systems based on those found in nature.

2) Synthetic Genomics:
- Involves genetically modifying existing organisms or creating completely artificial genetic material.
- Generates DNA sequences not found in nature and introduces them into organisms.

3) Key Developments:
- Artificial bacterial genome and life forms with artificial genomes (e.g., created by Craig Venter’s company, Viridos).

4) Unnatural Base Pairs (UBP):
- 2012 Romesberg group: Designed UBP using two artificial nucleotides (“d5SICS” and “dNaM”).
- UBPs are amplified by PCR and function like natural base pairs, expanding the genetic alphabet to six letters.

5) Synthetic Life Forms:
- 2014: E. coli can replicate and propagate a plasmid containing UBPs.
- Expands the potential number of amino acids from 20 to 172 with a six-letter genetic code.
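
One way to arrive at the figure of 172 is to count triplet codons. This Python sketch assumes that codons stay three letters long and that every codon gained from the two extra letters could, in principle, be assigned to a new amino acid; that is a theoretical upper bound, not something demonstrated in cells.

```python
CODON_LENGTH = 3

natural_codons = 4 ** CODON_LENGTH   # four-letter alphabet: 64 codons
expanded_codons = 6 ** CODON_LENGTH  # six-letter alphabet: 216 codons
new_codons = expanded_codons - natural_codons  # 152 codons with no current meaning

natural_amino_acids = 20
print(natural_amino_acids + new_codons)  # 172
```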
