Transcriptomic approaches in biomedical research Flashcards

1
Q

What is bioinformatics?

A

“Bioinformatics is a subdiscipline of biology and computer science concerned with the acquisition, storage, analysis, and dissemination of biological data, most often DNA and amino acid sequences.”

“As an interdisciplinary field of science, bioinformatics combines biology, computer science, information engineering, mathematics and statistics to analyze and interpret the biological data. Bioinformatics has been used for in silico analyses of biological queries using mathematical and statistical techniques.”

2
Q

What can we use bioinformatics for?

A
Big data analysis
Homology modeling
Phylogenetics
Omics studies and systems biology (e.g. RNA-seq)
Functional annotation
Protein structure prediction
Sequence alignment (e.g. WGS)
3
Q

Provide a brief overview of NGS

A
  1. Genomics
    - Whole-Genome Sequencing (WGS)
    - Exome Sequencing
    - Targeted Sequencing
    - De novo Sequencing
  2. Transcriptomics
    - Total RNA Sequencing
    - mRNA Sequencing
    - Small RNA, Noncoding RNA Sequencing
  3. Epigenomics
    - ChIP Sequencing
    - Methylation Sequencing
4
Q

How do you prepare a DNA library?

A
  1. Extract nucleic acids from blood, tissue, saliva, etc.
  2. Shear dsDNA into fragments (~300 bp)
  3. Attach adapters to fragments

Start in the lab by taking a sample (blood, tissue, cells). Purify the sample to extract the DNA, then fragment it into smaller pieces (around 300 bp). Adapter sequences are then added to the ends of these fragments to facilitate their sequencing. This is essentially small-fragment sequencing.

5
Q

How do you then sequence the DNA library? (PART 1)

A

This is done via a sequencing machine.

  • DNA libraries are deposited on a flowcell
  • Bridge amplification
  • Fragments are amplified to form clusters
  • The sequencing machine processes a flowcell containing lanes
  • Each lane may contain multiple samples (indexed with a DNA barcode contained in the adapters)

Once you have the fragment library, load it onto the flowcell, flooding it with DNA fragments, which attach to the flowcell surface. Then perform bridge amplification: a modified version of PCR in which single fragments are amplified into clusters, each a set of clonal copies of one attached fragment. Then load the flowcell onto the sequencing machine and perform the sequencing-by-synthesis reaction: synthesising new DNA and recording each nucleotide base in a cyclical fashion. This works because each base carries a different coloured dye, so the sequence of bases can be read from the dye colours. You end up with a massive pile of short-read sequences.
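
The idea of several barcoded samples sharing one lane can be sketched in Python. This is a toy illustration only: the barcodes, sample names and reads are made up, and it assumes the barcode sits at the start of each read, which is a simplification (on real instruments the index is typically read separately).

```python
from collections import defaultdict

# Hypothetical barcode-to-sample sheet (illustrative values only).
BARCODES = {"ACGT": "sample_A", "TGCA": "sample_B"}

def demultiplex(reads, barcodes, barcode_len=4):
    """Group reads by sample, assuming the barcode is the first `barcode_len` bases."""
    by_sample = defaultdict(list)
    for read in reads:
        index, insert = read[:barcode_len], read[barcode_len:]
        # Reads with an unrecognised barcode go into an "undetermined" bin.
        sample = barcodes.get(index, "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)

reads = ["ACGTTTGACC", "TGCAGGATCA", "ACGTCCATGA", "NNNNACGTAC"]
demuxed = demultiplex(reads, BARCODES)
```

Here `demuxed` maps each sample name to the reads that carried its barcode, with the barcode itself trimmed off.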

6
Q

What is sequencing by synthesis? (SBS) (PART 2)

A
  • Annealing of the sequencing primer
  • Sequence each nucleotide one cycle at a time in a controlled manner
  • The 4 bases (A, T, C, G) are modified with reversible terminators AND a different fluorescent dye tag each

Anneal the sequencing primer, which has a binding site in the adapters of the fragments. With each cycle, flood the flowcell with the dye-labelled nucleotides. Because each nucleotide has a reversible terminator group, the polymerase can incorporate only one base and then stops. The terminator is then cleaved off and the next cycle is run, which keeps the process controlled.

7
Q

Sequencing the library (PART 3)

A
  • Single nucleotide incorporation (DNA polymerase)
  • Flowcell wash
  • Image the 4 bases (digital photograph)
  • Cleave the chain-terminator chemical group and the dye with an enzyme

Repeat n times for the full-length sequence.

8
Q

Sequencing the library (PART 4)

A
  • A camera sequentially images all 4 bases on the surface of the flowcell each cycle
  • Each cycle image is converted to a nucleotide base call (A, C, G or T)
  • The number of cycles is anywhere between 50 and 250 (one base is read per cycle), depending on the desired sequence length.
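
A minimal Python sketch of the base-calling idea: for one cluster, each cycle yields four signal values and the brightest one is taken as the base. The channel order (A, C, G, T) and the intensity values are assumptions for illustration; real base callers also correct for effects this sketch ignores.

```python
BASES = "ACGT"

def call_bases(intensities):
    """Return one base per cycle by picking the brightest of four channels.

    `intensities` is a list of cycles; each cycle is a tuple of four
    signal values in the assumed order A, C, G, T.
    """
    read = []
    for cycle in intensities:
        brightest = max(range(4), key=lambda i: cycle[i])
        read.append(BASES[brightest])
    return "".join(read)

# Made-up signals for one cluster over four cycles.
cluster_signal = [
    (0.9, 0.1, 0.0, 0.1),
    (0.1, 0.0, 0.8, 0.2),
    (0.0, 0.1, 0.1, 0.7),
    (0.2, 0.9, 0.1, 0.0),
]
read = call_bases(cluster_signal)
```

The read length equals the number of cycles, which is why running more cycles gives longer reads.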
9
Q

What is RNAseq?

A
  • RNA-seq experiments use the total RNA (or mRNA) from a collection of cells or tissue
  • RNA is first converted to cDNA prior to library construction
  • Next-generation sequencing of RNA samples determines which genes are actively expressed
  • A single experiment can capture the expression levels of thousands of genes
10
Q

What can be used as a measure of gene abundance?

A

During RNA-seq, the number of sequencing reads produced from each gene can be used as a measure of gene abundance.

11
Q

What can RNAseq be used for?

A
  • Quantification of expression levels
  • Calculation of the differences in expression of all genes between the experimental conditions
  • With appropriate analysis, RNA-seq can be used to discover which distinct isoforms of genes are differentially regulated and expressed

Short-read RNA-seq is not the best choice for isoform discovery, because the transcripts have to be assembled from very small pieces, which is tricky. Longer reads are better because a single read can essentially cover the entire transcript. It can be done with short reads, but not as accurately as with long-read technologies.

12
Q

What is RNAseq?

A

RNA-seq experiments use the total RNA (or mRNA) from a collection of cells or tissue.
RNA is first converted to cDNA prior to library construction.
Next-generation sequencing of RNA samples determines which genes are actively expressed.
A single experiment can capture the expression levels of thousands of genes.

13
Q

What are the steps involved in the RNAseq design and workflow?

A

Experimental design - what is the question you want to address?
Preparation of RNA - RNA extraction and QC
Library preparation - cDNA synthesis, fragmentation, adaptors, amplification and QC
Sequencing - sequencing platforms (e.g. Illumina)
Data analysis

14
Q

What is the info in an RNAseq data file?

A

Data comes back in a standard text file format (FASTQ), with four lines per read:

1) Sequence ID - identifies each fragment (read) that was sequenced, including its location on the flow cell, so every sequence can be tracked back to where it originated.
2) Nucleotide sequence
3) Separator line ('+'), optionally followed by the sequence ID again
4) Per-base quality score
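
A small Python sketch of reading such a file, assuming the usual four-lines-per-read FASTQ layout and Phred+33 quality encoding (both assumptions about the data, not something stated on this card). The example record is invented.

```python
import io

def parse_fastq(handle):
    """Yield (read_id, sequence, quality_scores) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # third line: separator starting with '+'
        quality = handle.readline().rstrip("\n")
        # Assumed Phred+33 encoding: score = ASCII code of the character minus 33.
        scores = [ord(ch) - 33 for ch in quality]
        yield header[1:], sequence, scores  # drop the leading '@' from the ID

example = io.StringIO("@read1\nACGT\n+\nIII#\n")
records = list(parse_fastq(example))
```

Each record pairs a base with a quality score, so low-confidence bases (like the last one in this made-up read) can be spotted or trimmed.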

15
Q

What is involved in RNAseq data analysis?

A

- Align the short sequence reads (the FASTQ files) to the reference genome
- This is done with specialist bioinformatic alignment programs
- The result is an alignment file, in which we count the number of mapped reads in sets of defined genomic intervals
- The quantified read counts are proportional to the gene expression level
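
The counting step can be sketched in Python as follows. The gene names, coordinates and alignments are hypothetical, and a read is assigned by its start position only, which is a simplification of what dedicated counting tools do.

```python
from collections import Counter

# Hypothetical gene intervals: name -> (chromosome, start, end), 1-based inclusive.
GENES = {
    "geneA": ("chr1", 100, 500),
    "geneB": ("chr1", 800, 1200),
}

def count_reads(alignments, genes):
    """Count aligned reads whose start position falls inside each gene interval."""
    counts = Counter({name: 0 for name in genes})
    for chrom, pos in alignments:
        for name, (g_chrom, start, end) in genes.items():
            if chrom == g_chrom and start <= pos <= end:
                counts[name] += 1
    return dict(counts)

# Made-up alignments as (chromosome, start position) pairs.
aligned = [("chr1", 150), ("chr1", 450), ("chr1", 900), ("chr2", 150), ("chr1", 600)]
counts = count_reads(aligned, GENES)
```

Reads that fall outside every interval are simply not counted, and the resulting per-gene totals are the raw input for differential expression analysis.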

16
Q

What are genomic intervals?

A

Protein-coding gene exons or other genomic intervals (e.g. lncRNA or microRNA) of interest related to the experimental question.

17
Q

What is the difference between WGS and RNAseq?

A
WGS = looking to find variants in DNA
RNAseq = read counting; counting the number of reads in a particular gene.
18
Q

Outline the steps in RNASeq data analysis

A
Raw reads (FASTQ)
Alignment
Quantification 
Differential Expression
Functional interpretation
19
Q

How can you display RNAseq results?

A

A volcano plot is a typical way to present RNAseq results.

It plots the log2 fold change (x-axis) against significance, the -log10 P-value (y-axis).

It shows the genes that are upregulated (green) and downregulated.
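
The two volcano plot coordinates for a single gene can be worked out in Python as below. The expression values, the P-value and the cutoffs (a two-fold change and P < 0.05) are illustrative assumptions, not values from the card.

```python
import math

def volcano_point(mean_control, mean_treated, p_value):
    """Return (log2 fold change, -log10 p-value) for one gene."""
    log2_fc = math.log2(mean_treated / mean_control)
    neg_log10_p = -math.log10(p_value)
    return log2_fc, neg_log10_p

def classify(log2_fc, p_value, fc_cutoff=1.0, p_cutoff=0.05):
    """Label a gene as 'up', 'down' or 'not significant' using assumed cutoffs."""
    if p_value >= p_cutoff or abs(log2_fc) < fc_cutoff:
        return "not significant"
    return "up" if log2_fc > 0 else "down"

# Made-up gene: expression rises from 10 to 40 with P = 0.001.
x, y = volcano_point(mean_control=10.0, mean_treated=40.0, p_value=0.001)
label = classify(x, 0.001)
```

A four-fold increase gives a log2 fold change of 2, so this gene would sit to the right of the plot, and its small P-value puts it high on the y-axis.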

20
Q

What is GSEA?

A

Gene Set Enrichment Analysis (GSEA) is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states.

Instead of focusing on individual genes in a long list, the focus is put on a gene set.

Gene set enrichment analysis uses a priori gene sets that have been grouped together by their involvement in the same biological pathway.

21
Q

What is GSEA?

A

You have sets of genes that belong to a biological pathway, e.g. apoptosis or cell motility. So rather than focusing on individual genes, you can look at entire gene sets in this type of analysis.

There are lots of different ways to do this, but essentially you feed in your gene set pathways and your RNA-seq data and generate tables showing the significant upregulation or downregulation of biological pathways.
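
One way to picture the gene-set idea is a running-sum enrichment score over a ranked gene list, sketched here in Python. This is a simplified, unweighted version for intuition only: it is not the full GSEA procedure, which also weights genes by their ranking statistic and tests significance by permutation. The ranked list and the gene set are made up for illustration.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum score: step up at set members, down otherwise.

    Returns the running-sum value with the largest absolute deviation from zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("need genes both inside and outside the gene set")
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Hypothetical ranking, most upregulated gene first.
ranked = ["CASP3", "BAX", "TP53", "ACTB", "GAPDH", "MYC"]
apoptosis = {"CASP3", "BAX", "TP53"}
es = enrichment_score(ranked, apoptosis)
```

A score near +1 means the set's genes cluster at the top of the ranking (pathway upregulated), and a score near -1 means they cluster at the bottom.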

22
Q

What is cryptococcal meningitis (CM)?

A

A fungal infection and inflammation of the membranes covering the spinal cord and brain.

Commonest cause of meningitis in Africa.

Caused by the yeast Cryptococcus neoformans (or C. gattii).

180,000 deaths worldwide per year.

Most common meningitis in HIV+ patients.

Incidence is not decreasing despite antiretroviral therapy (ART).

23
Q

What is PBMC?

A

A peripheral blood mononuclear cell (PBMC) is any peripheral white blood cell that has a round nucleus. These cells consist of lymphocytes (T cells, B cells, NK cells) and monocytes.

The cells are easily taken from a donor and purified for study.
This is a minimally invasive way to get biological material from a healthy donor.
RNA-seq is performed on these cells to give their transcriptome profile.