Lecture 28 - Big Data Flashcards
Outline the biological data science pyramid
- Big data
- Information
- Knowledge
- Insight
What populations is big data gathered from?
- DNA, RNA, proteins
- Cells
- Tissue samples
- Organisms
What does big data involve?
- Gathering large volumes of data
- Substantial variation within the data
- Integrative analysis of different types of big data reveals interactions between variables
What are the 4 main types of big data?
- Transcriptomic
- Genomic
- Proteomic
- Epigenomic
What sort of knowledge is generated using big data in biology?
- Developmental
- Physiological
- Drug safety and efficacy
- Epidemiology
- Understanding past events and predicting future risks
What is the aim of transcriptomic analyses of developmental processes, drug treatments and environmental factors?
To define the functional consequences of a specific mutation, drug treatment or other environmental change on the expression of every gene
How can transcriptomic data be generated and compared?
- Generated by sequencing complementary DNA copies of every mRNA
- Compare the mRNA populations in 2 or more biological samples to identify genes whose expression differs, where the difference is likely to be caused by an actual biological difference between the samples
What was the experimental strategy used for transcriptomic analysis (see notes)?
- mRNA is extracted
- mRNA is converted to cDNA and each cDNA molecule is sequenced on an Illumina next-generation sequencer
- Numbers of independent molecules corresponding to each specific mRNA are counted in silico
- cDNA counts = mRNA expression level
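The counting step above can be sketched as follows. This is a minimal illustration, not the lecture's actual pipeline: the gene names and reads are hypothetical, and the counts-per-million scaling is an added assumption (a common way to make samples of different sequencing depth comparable).

```python
from collections import Counter

def count_reads(read_gene_ids):
    """Count how many sequenced cDNA molecules were assigned to each gene."""
    return Counter(read_gene_ids)

def counts_per_million(counts):
    """Scale raw counts by the total number of reads in the sample."""
    total = sum(counts.values())
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}

# Hypothetical gene assignments for eight sequenced cDNA molecules.
reads = ["GENE_A", "GENE_B", "GENE_A", "GENE_C",
         "GENE_A", "GENE_B", "GENE_A", "GENE_A"]

counts = count_reads(reads)       # raw count per gene
cpm = counts_per_million(counts)  # depth-normalised expression estimate
```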
What data analysis is performed for the transcriptomic analysis (see notes)?
- Genes exhibiting differential expression (DE) in the compared sample types are identified, ranked and presented in a data frame
- Volcano plot: gene expression levels are plotted as log2 fold changes vs p-values
- Gene ontology and biological pathway algorithms are used to identify and visualise biological pathways involving the genes on the DE list
- Functional biological consequences of the DE genes are inferred
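The volcano plot coordinates can be sketched as below. This is an illustrative sketch only: the pseudocount, the cutoffs (2-fold change, p < 0.05) and the example numbers are assumptions, not values from the lecture. Real analyses usually use p-values adjusted for multiple testing.

```python
import math

def volcano_point(mean_control, mean_treated, p_value, pseudocount=1.0):
    """Return (log2 fold change, -log10 p-value) for one gene.

    The pseudocount (an assumption) avoids division by zero for unexpressed genes.
    """
    log2_fc = math.log2((mean_treated + pseudocount) / (mean_control + pseudocount))
    neg_log10_p = -math.log10(p_value)
    return log2_fc, neg_log10_p

def classify(log2_fc, p_value, fc_cutoff=1.0, p_cutoff=0.05):
    """Label a gene as up, down or not significant using hypothetical cutoffs."""
    if p_value < p_cutoff and log2_fc >= fc_cutoff:
        return "up"
    if p_value < p_cutoff and log2_fc <= -fc_cutoff:
        return "down"
    return "not significant"

# Hypothetical gene: mean count 7 in control, 31 in treated cells, p = 0.001.
x, y = volcano_point(7, 31, 0.001)
label = classify(x, 0.001)
```

Genes far to the left or right (large fold change) and high on the plot (small p-value) are the strongest DE candidates.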
What were the aim, procedure and analysis of the RNAseq experiment comparing human skin cells treated with and without the glucocorticoid clobetasol propionate?
Aim: to identify genes that are regulated by the glucocorticoid in human skin cells
- RNA count data were collected for each sample and a differential expression gene list was compiled
- A volcano plot identified genes exhibiting statistically significant changes in transcript abundance caused by exposure to glucocorticoid
- Statistical significance tends to rise as the fold change increases
- The more robust the gene expression, the more statistically significant the data
What does more robust gene expression mean for the significance of the data?
The more robust the change in gene expression, the more statistically significant the data
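One way to see why this holds is a test statistic that grows with the size of the change and shrinks with replicate noise. The sketch below uses Welch's t statistic as an illustrative choice (the lecture does not name the test used), reads "robust" as large and consistent across replicates (an assumption), and uses hypothetical expression values.

```python
import math
from statistics import mean, variance

def welch_t(control, treated):
    """Welch's t statistic: difference in means divided by its standard error."""
    se = math.sqrt(variance(control) / len(control) + variance(treated) / len(treated))
    return (mean(treated) - mean(control)) / se

# Hypothetical expression values for three replicates per condition.
control = [10.0, 11.0, 9.0]
small_change = [12.0, 13.0, 11.0]   # small, consistent increase
large_change = [20.0, 21.0, 19.0]   # large, consistent increase
noisy_change = [14.0, 26.0, 20.0]   # same mean as large_change, noisier replicates

t_small = welch_t(control, small_change)
t_large = welch_t(control, large_change)
t_noisy = welch_t(control, noisy_change)
```

A larger t statistic corresponds to a smaller p-value, so the large, consistent change is the most significant of the three.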
Explain the RNAseq experiment that investigated the effect of hypoxia
- Compare cells cultured in normoxic and hypoxic conditions
- RNAseq identified genes that were upregulated or downregulated in response to hypoxic conditions
- Hypoxia modulates the transcription of many hundreds of genes in human cells
- A volcano plot identified the most robustly upregulated and downregulated genes that exhibited a significant change in transcript abundance caused by hypoxia
- The resulting gene lists were analysed using gene ontology analysis
What biological processes modulated by hypoxia-regulated genes did PANTHER gene ontology analysis identify?
Hypoxia-induced genes: main roles in metabolism, development and transcription
Hypoxia-repressed genes: main roles in metabolism, development, transcription and mRNA processing/splicing
What sort of analysis can identify biological processes modulated by hypoxia-regulated genes?
PANTHER gene ontology analysis
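Gene ontology tools ask whether a biological process appears in the DE list more often than expected by chance. The sketch below uses a hypergeometric test as a generic illustration of that idea. It is not PANTHER's implementation, and all gene numbers are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, selected, overlap):
    """Probability of seeing at least `overlap` annotated genes by chance.

    population: all genes measured
    annotated:  genes in the population carrying the ontology term
    selected:   genes on the DE list
    overlap:    DE genes carrying the ontology term
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical numbers: 20000 genes measured, 500 annotated with a term,
# 200 DE genes, 20 of which carry the term (about 5 expected by chance).
p_enriched = hypergeom_enrichment_p(20000, 500, 200, 20)
```

A small p-value suggests the term is over-represented among the DE genes.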
What gene expression is modulated by hypoxia and how was this identified?
- After RNAseq and gene ontology analysis, RNAi can further identify biological processes
- RNAi can inactivate the function of a transcription factor (TF) that plays a role in responding to hypoxia
- REST is a transcriptional repressor that has been identified
- Hypoxia-repressed genes require the function of REST
- REST represses the expression of some hypoxia-responsive genes
Which 3 tools can be used to identify the genes and biological processes affected by hypoxia/used in transcriptomic analysis?
- RNAseq
- Gene ontology analysis
- RNAi