week 2: mRNA seq Flashcards

Question

calculating read depth

Answer 1

150 bp reads:
- PE150 = paired-end sequencing, 150 bp reads
- read pair = 2 x 150 bp (forward and reverse) = 300 bp per read pair (counted as one read)
- 1 Gb (gigabase) = 1,000,000,000 bp (1 billion base pairs)
- Gb to reads: 1,000,000,000 bp / (150 bp x 2) = 3,333,333 (3.33 million)
- 1 Gb = 3.33M PE150 reads
- 3 Gb (the size of the human genome) = 10M PE150 reads
- 6 Gb = 20M PE150 reads
250 bp reads:
- PE250 = paired-end sequencing, 250 bp reads
- read pair = 2 x 250 bp = 500 bp per read pair (counted as one read)
- 1 Gb (gigabase) = 1,000,000,000 bp (1 billion base pairs)
- Gb to reads: 1,000,000,000 bp / (250 bp x 2) = 2,000,000 (2 million)
- 1 Gb = 2M PE250 reads
- 200 Gb = 400M PE250 reads (SP lane)
- 400 Gb = 800M PE250 reads (SP flow cell)

Answer 2

For PE150:
- Gb x 3.333 = M reads
- M reads / 3.333 = Gb
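
The Gb-to-reads conversions on these cards can be sketched in a few lines of Python. This is only an illustration: the function names and the `read_length_bp` parameter are made up for the example, and a "read" is counted as one read pair, as on the cards.

```python
BP_PER_GB = 1_000_000_000  # 1 Gb = 1 billion base pairs


def gb_to_million_reads(gb: float, read_length_bp: int = 150) -> float:
    """Convert data volume (Gb) to millions of paired-end reads (read pairs)."""
    bp_per_read_pair = 2 * read_length_bp  # e.g. 2 x 150 bp = 300 bp
    return gb * BP_PER_GB / bp_per_read_pair / 1_000_000


def million_reads_to_gb(million_reads: float, read_length_bp: int = 150) -> float:
    """Convert millions of paired-end reads (read pairs) to data volume (Gb)."""
    bp_per_read_pair = 2 * read_length_bp
    return million_reads * 1_000_000 * bp_per_read_pair / BP_PER_GB


if __name__ == "__main__":
    for gb in (1, 3, 6):
        print(f"{gb} Gb = {gb_to_million_reads(gb):.2f}M PE150 reads")
    print(f"200 Gb = {gb_to_million_reads(200, read_length_bp=250):.0f}M PE250 reads")
    print(f"40M PE150 reads = {million_reads_to_gb(40):.1f} Gb")
```

Passing `read_length_bp=250` switches the same arithmetic from PE150 to PE250, so the 3.333 factor does not need to be memorized separately for each read length.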

Answer 3

- Eukaryotic, non-directional: 20M reads/sample (6 Gb)
- Eukaryotic, directional: 30M reads/sample (9 Gb) to 40M reads/sample (12 Gb)
- Detection of less abundant transcripts: 50M reads (15 Gb) to 100M reads (30 Gb)
- Prokaryotic: 2 Gb/sample (~6.7M reads)

Answer 4

Offered to all clients for free!
Content & significance:
- Data volume: whether it meets the requirements
- Error rate distribution, Q20/Q30: the quality of each base
- GC content distribution: whether GC and AT content are equal and stable
- Data filtering: whether the data contain low-quality reads or reads with adapters
- Mapping status: whether there is contamination

Answer 5

QC
- original data
- data assessment
- mapping to reference genome
Gene count
- expression quantification
Quantitative analysis
- differential expression analysis
- GO enrichment
- KEGG enrichment analysis
- protein-protein interaction analysis
Standard analysis
- new transcript prediction
- alternative splicing analysis
- SNP and indel
- transcription factor analysis

Answer 6

Mapping reads to reference genome
- Files provided in BAM format
Gene Expression Quantification & Distribution of Gene Expression Levels
- In RNA-seq experiments, gene expression level is estimated from the abundance of transcripts
Correlation analysis (for biological replicates only)
- Correlation of gene expression levels between biological replicates. The closer the correlation coefficient is to 1, the more similar the samples are
- Principal Component Analysis (PCA)
  - Used to evaluate intergroup differences and intragroup replicate consistency
  - Can help identify and correct for batch effects or other technical variation unrelated to biological differences
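
As a rough illustration of the replicate correlation check, the sketch below computes a Pearson correlation coefficient on log2(x + 1) transformed expression values. Both choices are assumptions made for the example (the card does not say which coefficient or transformation the pipeline uses), and the expression values are invented.

```python
import math


def log2_plus_one(values):
    """Log-transform expression values; the +1 avoids log(0) for unexpressed genes."""
    return [math.log2(v + 1) for v in values]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Hypothetical expression values for the same five genes in two replicates
    replicate_1 = [0.0, 12.5, 150.0, 3.2, 48.0]
    replicate_2 = [0.1, 11.0, 162.0, 2.9, 51.5]
    r = pearson(log2_plus_one(replicate_1), log2_plus_one(replicate_2))
    print(f"Pearson r between replicates: {r:.3f}")
```

A value close to 1 indicates that the two replicates have very similar expression profiles.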

Answer 7

Differential Expression Analysis & Statistics (two or more groups of samples)
- Number of differentially expressed genes (up-regulated and down-regulated) for each comparison group at the set expression threshold (log fold change)
- Volcano plots, heatmaps, Venn diagrams
Functional Enrichment / Pathway Analysis
- GO (Gene Ontology): annotates the cellular component, molecular function, and biological process of DEGs
- KEGG: focuses on metabolic pathways and signal transduction pathways associated with DEGs
- Reactome: curated database of human molecular pathways, used to annotate reactions, pathways, and biological processes of DEGs
- DO (Human Disease Ontology) enrichment: investigates the human diseases and gene functions related to DEGs (human only)
Protein-Protein Interaction Analysis
- mRNA analysis can identify genes that are up- or down-regulated in certain conditions, which might affect protein levels and, consequently, protein interactions
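
To make the "number of up- and down-regulated genes at a set threshold" idea concrete, here is a minimal sketch. The cutoffs (|log2 fold change| >= 1, adjusted p-value <= 0.05), the function names, and the gene values are all assumptions for illustration; the card does not state which thresholds the analysis uses.

```python
def classify_gene(log2_fold_change, padj, lfc_threshold=1.0, padj_threshold=0.05):
    """Label a gene as 'up', 'down', or 'not significant' (hypothetical cutoffs)."""
    if padj > padj_threshold or abs(log2_fold_change) < lfc_threshold:
        return "not significant"
    return "up" if log2_fold_change > 0 else "down"


def count_degs(results, lfc_threshold=1.0, padj_threshold=0.05):
    """Count genes per category; results maps gene -> (log2 fold change, padj)."""
    counts = {"up": 0, "down": 0, "not significant": 0}
    for log2_fold_change, padj in results.values():
        label = classify_gene(log2_fold_change, padj, lfc_threshold, padj_threshold)
        counts[label] += 1
    return counts


if __name__ == "__main__":
    # Invented results for one comparison group
    results = {
        "geneA": (2.5, 0.001),
        "geneB": (-1.8, 0.010),
        "geneC": (0.4, 0.200),
        "geneD": (3.0, 0.300),
    }
    print(count_degs(results))
```

Tightening or relaxing `lfc_threshold` changes how many genes are reported as differentially expressed, which is the kind of adjustment the set threshold controls.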

Answer 8

Novel Gene Prediction
Alternative Splicing
- Alternative splicing (AS) is a regulated process during gene expression that results in a single gene coding for multiple proteins
- Detection of differentially expressed isoforms
SNP/InDel Analysis
- Sequence variants found when comparing to the reference genome
Fusion Gene Analysis (for tumor samples and cancer cell lines)
- A fusion gene is a hybrid gene formed from two previously separate genes. Fusion proteins produced by this change may lead to the development of some types of cancer
Recommendation: opt for directional library prep to make full use of the analysis package (start and stop sites, strand specificity, novel gene prediction)

Answer 9

What is NovoMagic?
- It is an add-on to the analysis our BI team performs that allows additional manipulation of the existing data (altering fold changes, targeting specific genes, regrouping samples, and re-visualizing charts and figures)
- NovoMagic lets you select specific groups of genes, analyze gene expression, identify differentially expressed genes, and perform gene function analysis. Overall, 17 small toolkits are offered in the Toolkit item. In the future, Novogene will gradually launch more toolkits on NovoMagic
Do you need to purchase analysis to access NovoMagic?
- Yes! You must purchase Quantification or Standard analysis to have full access to NovoMagic
How long is project data available?
- The data on NovoMagic is preserved for 1 year

Answer 10

lncRNA + circRNA + mRNA + smallRNA
At Novogene we can do any of these parts individually, or all of them together (WTS pipeline)

Answer 11

Definition: Transcripts longer than 200 nucleotides that are not translated into protein
Characteristics:
- Polyadenylated (mRNA-like) or non-polyadenylated at the 3′ end
- Can fold into a variety of specific secondary structures that contribute to their regulatory functions
- No capacity to be translated into protein
Biological Function/Significance:
- Regulation of gene transcription
- Post-transcriptional regulation
- Epigenetic regulation
- Regulation of DNA replication timing and chromosome stability

Answer 12

Location(s): Davis-US Lab (Standard); San Jose-US Lab (Low Input/HMR Blood); or Beijing Lab (Standard/Globin Depletion)
Strategies
- RNA QC: Gel Electrophoresis & Agilent 2100 Bioanalyzer
- Library Prep: NEB directional with rRNA depletion by Ribo-Zero
- Sequencing: PE150 on Illumina NovaSeq 6000 or NovaSeq X Plus
- Recommended: 12 Gb per sample (~40M reads) on Illumina PE150
Quote Checklist:
- Sample number
- Species
- Sample origin (blood, tissue type, etc.)
- Any BSL concerns?
- Sequencing depth
- Material sent (RNA/cell pellets, etc.)
- Analysis (lncRNA only / circRNA only / lncRNA + circRNA)
- Timeline

Answer 13

Definition: Circular RNAs (circRNAs) are a type of non-coding RNA that form a covalently closed loop structure, making them distinct from linear RNAs.
Characteristics:
- circRNAs are highly stable compared to linear RNAs due to their resistance to exonuclease degradation.
- They are derived from back-splicing events where a downstream splice donor is joined to an upstream splice acceptor.
- circRNAs are often tissue-specific and exhibit conserved sequences across species.
Biological Function/Significance:
- circRNAs can act as microRNA sponges, sequestering miRNAs and preventing them from binding to their target mRNAs.
- They are involved in the regulation of gene expression and have been implicated in various diseases, including cancer and neurological disorders.
- Due to their stability and specific expression patterns, circRNAs are being explored as potential biomarkers for disease diagnosis and therapy.

Answer 14

Location(s): Tianjin Lab
Strategies
- RNA QC: Nanodrop (preliminary concentration check) --> AATI Fragment Analyzer + Gel Electrophoresis
- Library Prep: Abclonal Directional Library Prep with linear rRNA depletion by Ribo-Zero
- Sequencing: PE150 on Illumina NovaSeq 6000 S4 flow cell
- Recommended: 8 Gb per sample (~26.7M reads)
Quote Checklist:
- Sample number
- Species
- Sample origin
- Any BSL concerns?
- Sequencing depth
- Material sent (RNA/cell pellets, etc.)
- Analysis (yes/no)
- Timeline

Answer 15

Definition: Transcripts 18-40 nt long that are not translated into protein
Characteristics:
- 5' phosphate group and 3' hydroxyl group
- Small RNAs include microRNAs (miRNAs), small interfering RNAs (siRNAs), and Piwi-interacting RNAs (piRNAs)
- Known for high specificity in binding target messenger RNAs (mRNAs) to regulate gene expression
Function:
- Small RNAs play an important regulatory role in almost all events at the cellular level, including individual development, cell proliferation and differentiation, and tumor occurrence and development
- Gene silencing (via RNA interference) and post-transcriptional regulation
- Regulation of mRNA degradation and translation

Answer 16

Location(s): Tianjin Lab
Strategies
- RNA QC: Nanodrop (preliminary concentration check) --> AATI Fragment Analyzer + Gel Electrophoresis
- Library Prep: Abclonal small RNA Library Prep for Illumina
- Sequencing: SE50 on Illumina NovaSeq 6000
- Recommended: 10M reads via Illumina SE50

Answer 17

Location(s): Tianjin Lab
Strategies
- RNA QC: Nanodrop (preliminary concentration check) --> AATI Fragment Analyzer + Gel Electrophoresis
Small RNA
- Library Prep: NEBNext Small RNA Library Prep
- Sequencing: SE50 on Illumina NovaSeq 6000 SP flow cell
- Recommended: 20M reads on Illumina SE50
lncRNA, mRNA, circRNA
- Library Prep: Abclonal directional with rRNA depletion by Ribo-Zero
- Sequencing: PE150 on Illumina NovaSeq 6000 S4 flow cell
- Recommended: 12 Gb per sample (~40M reads) on Illumina PE150
WTS = lncRNA pipeline + smallRNA pipeline packaged into one

Answer 18

Because rRNA depletion is used, all mRNA and lncRNA are captured every time
What about circRNA?
- circRNAs do exist in prokaryotes, but their prevalence and functional significance are not well understood
What about smallRNA?
- This isn’t a pipeline we have built out at Novogene. Prokaryotic RNA library prep size selection targets cDNA of 250-300 bp; cDNAs outside that range (including smallRNA) will also be included, but should make up an insignificant percentage

Answer 19

Location(s): Beijing Lab
Strategies
- RNA QC: Nanodrop (preliminary concentration check) --> AATI Fragment Analyzer + Gel Electrophoresis
- Library Prep: Abclonal directional Library Prep for Illumina with rRNA depletion
- Sequencing: PE150 on Illumina NovaSeq 6000 S4 flow cell
- Recommended: 2 Gb per sample (~6.7M reads) on Illumina PE150

Answer 20

- All Novogene labs are considered BSL1, but BSL restrictions are stricter in China than in the US
- Prokaryotic RNA is only processed in China, so be sure to confirm that the bacterial RNA is considered BSL1 and can be shipped to China (consult the biohazard form)
- If there ARE biohazard concerns, consult with TS on outsourcing

Answer 21

Dual RNA-seq
- Shows how microbes or viruses sustain themselves within host organisms at the molecular, cellular, organismal, or population level
- Simultaneously captures all classes of coding and noncoding transcripts in both the pathogen and the host
- Two species are present; both identities are known
- Library Prep: rRNA depletion by 'Proprietary rRNA depletion kit' & AB Clonal® Fast RNA-seq Lib Prep Kit V2 for Illumina (non-directional (default) & directional)
- Recommended Seq Depth: 12 Gb
- Goal: to see host/pathogen interaction
Metatranscriptomics
- A metatranscriptome refers to multiple transcriptomes across populations or communities from natural environment samples, such as sea water, soil, stool, ferment, and more. It mainly studies the gene expression profile of all species as a whole in each environmental sample
- Multiple species are present; identities are unknown
- Library Prep: rRNA depletion by 'Proprietary rRNA depletion kit' & AB Clonal® Fast RNA-seq Lib Prep Kit V2 for Illumina (non-directional (default) & directional)
- Recommended Seq Depth: 6 Gb
- Goal: to see environmental changes and microbial community interactions

Answer 22

IsoSeq technology by PacBio offers full-length transcript sequencing without the need for assembly. It captures complete isoforms and accurately identifies splice variants, which is essential for understanding complex transcriptomes.
Applications: Isoform discovery, gene annotation, and alternative splicing studies.
- QC & Library Prep: $549/sample
- Sequencing: $2,899/SMRT Cell (300 Gb of raw data, NOT CCS data, per SMRT Cell)
- Up to 10 libraries can be pooled into one SMRT Cell --> yields ~30 Gb raw data per sample
- Or $15/Gb (minimum of 30 Gb/sample)
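
The pooling arithmetic on this card can be sketched as below. This is an illustration only: the function name is made up, and it assumes the raw data and the SMRT Cell price are split evenly across the pooled libraries, which the card does not state.

```python
def pooled_isoseq_estimate(n_libraries, cell_yield_gb=300.0,
                           cell_price_usd=2899.0, max_libraries=10):
    """Return (raw Gb per sample, SMRT Cell cost share per sample in USD).

    Assumes an even split of yield and cell price across pooled libraries.
    Library prep/QC cost is not included.
    """
    if not 1 <= n_libraries <= max_libraries:
        raise ValueError(f"n_libraries must be between 1 and {max_libraries}")
    gb_per_sample = cell_yield_gb / n_libraries
    cost_share_usd = cell_price_usd / n_libraries
    return gb_per_sample, cost_share_usd


if __name__ == "__main__":
    for n in (1, 5, 10):
        gb, cost = pooled_isoseq_estimate(n)
        print(f"{n} libraries per cell: ~{gb:.0f} Gb raw data and ${cost:,.2f} of cell cost per sample")
```

With the maximum of 10 libraries per cell, the even-split assumption reproduces the ~30 Gb raw data per sample quoted on the card.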

Answer 23

The Kinnex full-length RNA kit uses the MAS-Seq method to enhance throughput on PacBio platforms by concatenating cDNA molecules into longer HiFi libraries. This approach allows for high-throughput, cost-effective isoform sequencing, making it suitable for large-scale transcriptomic studies.
Packaged Pricing (sold by M reads):
- 5M reads/sample: $900/sample
- 10M reads/sample: $1500/sample

Answer 24

RNA-seq on Nanopore (with cDNA conversion)
Nanopore sequencing with cDNA conversion enables the sequencing of RNA molecules by converting them into cDNA before sequencing. This method provides long reads that cover entire transcripts, offering insights into isoform structure and expression.
Applications: Long-read transcriptomics, alternative splicing analysis, and comprehensive gene expression profiling.
- Up to 24 samples can be pooled per cell
- Average data output per cell: 75-90 Gb raw data, depending on species, genome size, and sample quality

Answer 25

Direct RNA sequencing on Nanopore sequences RNA molecules directly, without the need for cDNA conversion. This approach preserves the native RNA structure, including modifications, and provides real-time data.
Applications: RNA modification studies, real-time transcriptomics, and understanding RNA biology in its native state.
Because the direct RNA library kit has no barcodes, only one sample can be loaded per cell. Data output per cell normally ranges from 5 Gb to 8 Gb for a QC-pass sample.

Answer 26

- Transcriptome sequencing (RNA-seq) at the single-cell or single-nucleus level
- Differentiates RNA expressed by each individual cell rather than the whole tissue
- Very expensive: $2.5k-$4k vs. $120-$250 per sample
- Coverage is based on cells captured and reads per cell

Answer 27

Bulk RNA-seq: measures the average gene expression levels in a group of cells, tissues, or biopsies

Answer 28

Coverage (reads) = # of captured cells x # of reads per cell
Gb = (M reads / 10) x 3
Recommendations:
- Maximum capture is 10k cells
- 10X recommends a minimum of 20k reads per cell
- NVG recommends 30-50k reads per cell
Example:
- 10,000 cells x 50,000 reads/cell = 500,000,000 (500M) reads
- 500M reads = 150 Gb
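
The single-cell coverage formula can be sketched as a small helper. The function name is made up for this example; the Gb conversion uses the card's rule of thumb, Gb = (M reads / 10) x 3, which corresponds to 300 bp per PE150 read pair.

```python
def single_cell_depth(captured_cells: int, reads_per_cell: int):
    """Return (total reads in millions, data volume in Gb) for a single-cell run."""
    total_reads = captured_cells * reads_per_cell
    million_reads = total_reads / 1_000_000
    gb = million_reads / 10 * 3  # card's rule of thumb: 10M PE150 reads = 3 Gb
    return million_reads, gb


if __name__ == "__main__":
    # 10k captured cells at the minimum and at the upper recommended depth
    for reads_per_cell in (20_000, 50_000):
        m_reads, gb = single_cell_depth(10_000, reads_per_cell)
        print(f"{reads_per_cell:,} reads/cell: {m_reads:.0f}M reads, {gb:.0f} Gb")
```

Because both inputs multiply directly, halving either the captured cells or the reads per cell halves the required data volume.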


(54 cards)