Transcriptomic approaches in biomedical research Flashcards
What is bioinformatics?
“Bioinformatics is a subdiscipline of biology and computer science concerned with the acquisition, storage, analysis, and dissemination of biological data, most often DNA and amino acid sequences.”
As an interdisciplinary field of science, bioinformatics combines biology, computer science, information engineering, mathematics and statistics to analyze and interpret biological data. It is used for in silico analyses of biological questions using mathematical and statistical techniques.
What can we use bioinformatics for?
- Big data analysis
- Homology modeling
- Phylogenetics
- Omics studies and systems biology (RNAseq)
- Functional annotation
- Protein structure prediction
- Sequence alignment (WGS)
Provide a brief overview of NGS
- Genomics
  - Whole-Genome Sequencing (WGS)
  - Exome Sequencing
  - Targeted Sequencing
  - De novo Sequencing
- Transcriptomics
  - Total RNA Sequencing
  - mRNA Sequencing
  - Small RNA, Noncoding RNA Sequencing
- Epigenomics
  - ChIP Sequencing
  - Methylation Sequencing
How do you prepare a DNA library?
- Extract nucleic acids from blood, tissue, saliva, etc.
- Shear dsDNA into fragments (300bp)
- Attach adapters to fragments
Start in the lab by taking a sample (blood, tissue, cells). Purify the DNA, then fragment it into smaller pieces of about 300 bp. Then add adapter sequences to the ends of these fragments; the adapters facilitate sequencing. This is essentially small-fragment sequencing.
How do you then sequence the DNA library? (PART 1)
This is done via a sequencing machine.
- DNA libraries deposited on flowcell
- bridge amplification
- amplified to form clusters
- Sequencing machine processes a flowcell containing lanes
- Each lane may contain multiple samples (indexed with a DNA barcode contained in adapters)
Once you have the fragment library, load it onto the flowcell and flood it with DNA fragments, which attach to the flowcell surface. Then perform bridge amplification: a modified version of PCR in which single fragments are amplified into clusters, which are clonal copies of the individual attached fragments. Then load the flowcell onto the sequencing machine and perform the sequencing-by-synthesis reaction, synthesising new DNA and recording each nucleotide base in a cyclical fashion. This works because each base carries a different coloured dye, so the sequence of bases can be read from the dyes. You end up with a massive pile of short read sequences.
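Because each lane can hold several samples tagged with a DNA barcode, the reads have to be sorted back to their sample afterwards. The following is a minimal sketch of that idea only; the barcode sequences, sample names and the `demultiplex` helper are hypothetical, and exact matching is assumed (real tools usually tolerate mismatches).

```python
from collections import defaultdict

# Hypothetical barcodes (index sequences); real library kits define their own.
BARCODES = {"ACGT": "sample_A", "TGCA": "sample_B"}

def demultiplex(reads, barcodes=BARCODES):
    """Group (index, sequence) pairs by the sample whose barcode matches exactly."""
    by_sample = defaultdict(list)
    for index, sequence in reads:
        # Reads whose index matches no known barcode are kept separately.
        sample = barcodes.get(index, "undetermined")
        by_sample[sample].append(sequence)
    return dict(by_sample)

# Toy lane: each read is (index read, insert read).
reads = [
    ("ACGT", "GGATTC"),
    ("TGCA", "CCTAGA"),
    ("ACGT", "TTGACC"),
    ("NNNN", "AAAAAA"),
]
lane = demultiplex(reads)
for sample, sequences in lane.items():
    print(sample, sequences)
```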
What is sequencing by synthesis? (SBS) (PART 2)
- Annealing of sequencing primer
- Sequence each nucleotide 1 cycle at a time in a controlled manner
- Modified 4 bases (A, T, C, G), each with a reversible terminator AND a different fluorescent dye tag
Anneal the sequencing primer, which has a binding site in the adapters of the fragments. With each cycle, flood the flowcell with dye-labelled nucleotides. Because each nucleotide carries a reversible terminator group, the polymerase can incorporate only one base and then stops. The terminator can then be cleaved off and the next cycle run, which keeps the reaction controlled.
Sequencing the library (PART 3)
- Single nucleotide incorporation (DNA polymerase)
- Flowcell wash
- Image the 4 bases (digital photograph)
- Cleave chain terminator chemical group and dye with enzyme
Repeat (n) times for full length sequence
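The cycle above (incorporate one base, wash, image, cleave, repeat) can be mimicked with a toy loop. This is only an illustrative sketch: `sequence_by_synthesis` is a hypothetical helper, the template is assumed to be given in the order the polymerase reads it, and no chemistry or errors are modelled.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def sequence_by_synthesis(template, n_cycles):
    """Build a read by adding exactly one complementary base per cycle."""
    read = []
    for cycle in range(min(n_cycles, len(template))):
        # 1) Single incorporation: the terminator stops the polymerase after one base.
        base = COMPLEMENT[template[cycle]]
        # 2) Wash and image: the dye colour tells us which base was added.
        read.append(base)
        # 3) Cleave terminator and dye (nothing to model here), then loop again.
    return "".join(read)

read = sequence_by_synthesis("ATCGGA", n_cycles=4)
print(read)
```

The read length equals the number of cycles run, which is why the cycle count sets the sequence length.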
Sequencing the library (PART 4)
- Camera sequentially images all 4 bases on the surface of the flowcell each cycle
- Each cycle image is converted to a nucleotide base call (A or C or G or T)
- The number of cycles is typically between 50 and 250, depending on the desired read length (one base is read per cycle).
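Converting each cycle's image into a base call can be pictured as choosing the brightest of the four dye channels for a cluster. The sketch below assumes one intensity value per base per cycle; the numbers and the `call_bases` helper are made up for illustration, and real base callers also correct for signal cross-talk and estimate a quality score.

```python
CHANNELS = "ACGT"  # assumed order of the four intensity values

def call_bases(intensities):
    """intensities: one (A, C, G, T) signal tuple per cycle for a single cluster."""
    calls = []
    for cycle_signal in intensities:
        # Pick the channel with the strongest signal in this cycle.
        brightest = max(range(4), key=lambda i: cycle_signal[i])
        calls.append(CHANNELS[brightest])
    return "".join(calls)

# Three cycles of toy intensities for one cluster.
cluster = [
    (0.9, 0.1, 0.0, 0.1),
    (0.1, 0.2, 0.8, 0.0),
    (0.0, 0.1, 0.1, 0.7),
]
read = call_bases(cluster)
print(read)
```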
What is RNAseq?
- RNA-seq experiments use the total RNA (or mRNA) from a collection of cells or tissue
- RNA is first converted to cDNA prior to library construction
- Next-generation sequencing of RNA samples determines which genes are actively expressed.
- A single experiment can capture the expression levels of thousands of genes.
What can be used as a measure of gene abundance?
In RNA-seq, the number of sequencing reads produced from each gene can be used as a measure of gene abundance.
What can RNAseq be used for?
- Quantification of the expression levels
- Calculation of the differences in gene expression of all genes in the experimental conditions
- With appropriate analysis, RNA-seq can be used to discover distinct isoforms of genes that are differentially regulated and expressed.
Short-read RNA-seq is not ideal for this, because transcripts have to be assembled from very small pieces, which is tricky. Longer reads are better because a single read can essentially cover the entire transcript. Isoform discovery is possible with short reads, but it is less accurate than with long-read technologies.
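The first two uses above, quantifying expression and comparing it between conditions, can be sketched with counts per million (CPM) and a log2 fold change. The gene names and counts are invented, the pseudocount of 1 is an assumption to avoid dividing by zero, and a real analysis would use replicates and a dedicated statistical package rather than this simple ratio.

```python
import math

def counts_per_million(counts):
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}

def log2_fold_change(control, treated, pseudocount=1.0):
    """log2(treated / control) per gene, computed on CPM-normalised values."""
    cpm_control = counts_per_million(control)
    cpm_treated = counts_per_million(treated)
    return {
        gene: math.log2((cpm_treated[gene] + pseudocount) / (cpm_control[gene] + pseudocount))
        for gene in control
    }

# Toy read counts for two conditions (hypothetical genes).
control = {"geneA": 100, "geneB": 300, "geneC": 600}
treated = {"geneA": 400, "geneB": 300, "geneC": 300}

lfc = log2_fold_change(control, treated)
for gene, value in lfc.items():
    print(gene, round(value, 2))
```

A positive value means the gene has relatively more reads in the treated sample, a negative value means fewer, and a value near zero means no change.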
What are the steps involved in the RNAseq design and workflow?
Experimental design - what is the question you want to address?
Preparation of RNA - RNA extraction and QC
Library preparation - cDNA synthesis, fragmentation, adapters, amplification and QC
Sequencing - sequencing platforms (e.g. Illumina)
Data analysis
What is the info in an RNAseq data file?
Data come back in a standard text file format (FASTQ).
1) Sequence ID - identifies each read that was sequenced and includes its location on the flow cell, so every sequence can be tracked to where it originated.
2) Nucleotide sequence
3) Separator line (a "+" in FASTQ files; sometimes loosely described as the strand)
4) Per base quality score
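The four lines per read can be read with a few lines of code. This is a minimal sketch that assumes a well-formed file with exactly four lines per record and the common Phred+33 quality encoding (quality = ASCII code minus 33); `parse_fastq` and the example record are hypothetical.

```python
def parse_fastq(text):
    """Split FASTQ text into records of four lines each."""
    lines = text.strip().splitlines()
    records = []
    for i in range(0, len(lines), 4):
        header, sequence, third_line, quality = lines[i:i + 4]
        records.append({
            "id": header[1:],          # drop the leading "@"
            "sequence": sequence,
            "third_line": third_line,  # the "+" line
            # Assumed Phred+33 encoding: one quality score per base.
            "phred": [ord(ch) - 33 for ch in quality],
        })
    return records

example = "@read1\nACGT\n+\nII5!\n"
records = parse_fastq(example)
print(records[0])
```

A Phred score Q corresponds to an error probability of 10^(-Q/10), so a score of 40 means roughly a 1 in 10,000 chance that the base call is wrong.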
What is involved in RNAseq data analysis?
- Align the short sequence reads (the FASTQ files) to the reference genome
- This is done with specialist bioinformatic alignment programs
- The output is an alignment file, in which we count the number of mapped reads in sets of defined genomic intervals (e.g. genes)
- The read counts are quantified and are proportional to gene expression level
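Counting mapped reads in defined genomic intervals can be sketched as an overlap test between each alignment and each gene. The coordinates, gene names and `count_reads` helper are hypothetical, intervals are assumed to be half-open (start included, end excluded), and dedicated counting tools handle strand, multi-mapping reads and exon structure, which this toy version ignores.

```python
def count_reads(alignments, genes):
    """Count alignments overlapping each gene.

    alignments: iterable of (chrom, start, end)
    genes: dict of name -> (chrom, start, end)
    """
    counts = {name: 0 for name in genes}
    for chrom, start, end in alignments:
        for name, (g_chrom, g_start, g_end) in genes.items():
            # Two half-open intervals overlap if each starts before the other ends.
            if chrom == g_chrom and start < g_end and end > g_start:
                counts[name] += 1
    return counts

genes = {"geneA": ("chr1", 100, 500), "geneB": ("chr1", 800, 1200)}
alignments = [
    ("chr1", 120, 170),    # inside geneA
    ("chr1", 450, 500),    # inside geneA
    ("chr1", 600, 650),    # between genes, not counted
    ("chr1", 1190, 1240),  # overlaps the end of geneB
    ("chr2", 120, 170),    # different chromosome, not counted
]
counts = count_reads(alignments, genes)
print(counts)
```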