Big Data Flashcards
what is big data?
refers to data sets too large or complex to process using traditional data processing methods
- large volumes of data, often comprising multiple data types
- there is substantial variation within the data which is complex to analyse
- integrative analysis of different types of big data reveals interactions between variables
who analyses big data?
- computational methods and advanced statistics are used by bioinformaticians to analyse data
are big data experiments hypothesis-based or hypothesis-generating?
they are unbiased and hypothesis-generating
- they have huge power for discovery
- no need to choose and exclude markers in advance
where can big data be generated from?
- DNA, RNA, protein molecules
- cells, tissues
- organisms
what are OMICs in big data?
- genomics (DNA) and transcriptomics (RNA) - rely on sequencing of nucleic acids
  - short read (Illumina) and long read (PacBio, Nanopore) sequencing
  - RNA-seq
- proteomics and metabolomics
  - mass spectrometry
- epigenomics
  - ChIP-Seq, chromatin conformation
how can microscopy be used to generate big data?
- high throughput imaging
- fluorescent tagging in live cells
- fixed cell staining
- automated image analysis (machine learning/AI)
what big data can microscopy generate?
- cell shape/cell type
- subcellular protein localisation
- cell differentiation
- cell contractility and migration -> wound healing, sclerosis, metastasis
- infection status
- response to drugs
how can big data on human physiology/health be generated?
- activity tracking
- questionnaires
- blood samples
- whole body imaging
- electronic health records
what knowledge does big data contribute to biology?
- Development
- Physiology
- Drug safety and efficacy
- Epidemiology – identifies relationships between environmental exposures / genetic predispositions and disease risk -> reduce exposures
- Disease pathobiology – understand how interactions between exposures and predispositions affect health -> more effective diagnosis and treatment
- Understanding of past events and prediction of future risks
what is transcriptomics?
studies gene expression and mRNA
- to determine the functional consequences of a change on the expression of every gene in the tissue, organ or cell type of interest, or at a particular developmental stage
comparisons may be:
- wildtype vs mutant
- treated vs untreated
- untreated vs environmental change
what is an experimental strategy in transcriptomics? what steps does it involve?
- Extract mRNA from whole tissue or cell population, convert to cDNA
- Prepare a sequencing ‘library’ containing all cDNA molecules in each biological sample
- Sequence on an Illumina Next Generation Sequencing (NGS) machine.
- Run a series of computational steps (a ‘pipeline’: quality control and normalising/standardising the data) and make statistical comparisons; cDNA counts reflect mRNA expression levels
- Identify genes exhibiting differential expression in the compared cell types
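The normalising and comparison steps above can be sketched in a few lines of Python. This is a minimal illustration, not a real pipeline: the gene names and counts are made up, counts-per-million scaling is only one possible normalisation, and the pseudocount of 1 is an assumption. Real analyses use replicate samples and proper statistical tests.

```python
import math

# Hypothetical raw cDNA read counts per gene (made-up numbers, not real data).
counts = {
    "wildtype": {"geneA": 500, "geneB": 100, "geneC": 400},
    "mutant": {"geneA": 1000, "geneB": 800, "geneC": 200},
}

def counts_per_million(sample):
    """Scale counts so samples with different sequencing depth are comparable."""
    total = sum(sample.values())
    return {gene: n / total * 1_000_000 for gene, n in sample.items()}

def log2_fold_change(control, treated, pseudocount=1.0):
    """log2(treated / control) per gene; the pseudocount avoids division by zero."""
    return {
        gene: math.log2((treated[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in control
    }

cpm = {name: counts_per_million(sample) for name, sample in counts.items()}
lfc = log2_fold_change(cpm["wildtype"], cpm["mutant"])
for gene, value in sorted(lfc.items()):
    print(f"{gene}: log2 fold change = {value:+.2f}")
```

Note that geneA has twice the raw count in the mutant sample but no change after normalisation, because the mutant sample was simply sequenced twice as deeply.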
what plot can be used to display big data on transcriptomics?
volcano plot
- each dot represents a gene
- x-axis: fold change, i.e. how much gene expression increases or decreases
- y-axis: statistical significance of the difference in gene expression
- red dots = downregulated genes
- green dots = upregulated genes
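As a rough sketch of how each dot on a volcano plot is placed and coloured, the Python below converts a fold change and p-value into plot coordinates and a label. The gene names and values are made up, and the cutoffs (log2 fold change of 1, p-value of 0.05) are assumptions chosen for illustration, not values from these notes.

```python
import math

# Hypothetical per-gene results: (log2 fold change, p-value). Made-up values.
results = {
    "geneA": (2.5, 0.0001),
    "geneB": (-3.0, 0.002),
    "geneC": (0.2, 0.60),
    "geneD": (1.8, 0.20),
}

def volcano_point(log2_fc, p_value, fc_cutoff=1.0, p_cutoff=0.05):
    """Return (x, y, label) for one gene on a volcano plot."""
    x = log2_fc               # x-axis: size and direction of the change
    y = -math.log10(p_value)  # y-axis: higher means more significant
    if p_value < p_cutoff and log2_fc >= fc_cutoff:
        label = "upregulated"
    elif p_value < p_cutoff and log2_fc <= -fc_cutoff:
        label = "downregulated"
    else:
        label = "not significant"
    return x, y, label

points = {gene: volcano_point(fc, p) for gene, (fc, p) in results.items()}
for gene, (x, y, label) in points.items():
    print(f"{gene}: x={x:+.1f}, y={y:.2f}, {label}")
```

geneD shows why both axes matter: its fold change is large, but the difference is not statistically significant, so it would not be coloured as upregulated.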
what methods can help to interpret the consequences of gene expression changes?
gene ontology and biological pathway algorithms:
- These algorithms can be run on the data to interpret consequences of gene expression changes
- Differentially expressed genes are fed into algorithms which extract information from databases about the functions of those genes and summarise it
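These notes do not say which statistic the algorithms use. One common approach is an over-representation test: it asks whether more differentially expressed genes carry a given annotation than expected by chance. The sketch below uses a hypergeometric test; the function name and all numbers are hypothetical.

```python
from math import comb

def enrichment_p_value(total_genes, genes_in_term, de_genes, de_in_term):
    """Chance of seeing at least `de_in_term` annotated genes among
    `de_genes` genes drawn at random from `total_genes`."""
    upper = min(genes_in_term, de_genes)
    favourable = sum(
        comb(genes_in_term, k) * comb(total_genes - genes_in_term, de_genes - k)
        for k in range(de_in_term, upper + 1)
    )
    return favourable / comb(total_genes, de_genes)

# Hypothetical numbers: 20,000 genes measured, 200 annotated to one term,
# 500 differentially expressed, 25 of which carry the annotation.
p = enrichment_p_value(20_000, 200, 500, 25)
print(f"enrichment p-value = {p:.3g}")
```

By chance, only about 5 of the 500 differentially expressed genes would be expected to carry the annotation (500 x 200 / 20,000), so observing 25 gives a very small p-value.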
how can the transcriptome of 100s to 10,000s of individual cells be collected?
single cell RNA-seq:
1. Dissect tissue, treat with enzymes
2. Single cell suspension – contains a mixture of cell types from tissue
3. Prepare libraries and sequence the transcriptome of every cell
what plot can be used to display the transcriptome of thousands of individual cells? what do these plots give insights into?
UMAP plots:
- Each dot is a cell
- Close = similar, far away = more different
- Each colour marks ‘clusters’ of similar cells
Potential insights into:
- Which genes are expressed by particular cells
- Cell type-specific gene expression changes
- Cell lineage/differentiation trajectories
- Tissue composition changes
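UMAP itself needs a dedicated library and is not reproduced here. As a small sketch of the "close = similar" idea, the Python below finds each cell's nearest neighbour in expression space; UMAP builds its layout from neighbour relationships of this kind. The cell names and the two-gene expression values are made up.

```python
import math

# Hypothetical expression profiles (two genes per cell, made-up values).
cells = {
    "cell1": (9.0, 1.0),
    "cell2": (8.5, 1.5),
    "cell3": (1.0, 9.0),
    "cell4": (1.5, 8.0),
}

def nearest_neighbour(name):
    """Return the most similar other cell by Euclidean distance."""
    others = (other for other in cells if other != name)
    return min(others, key=lambda other: math.dist(cells[name], cells[other]))

for name in cells:
    print(f"{name} is closest to {nearest_neighbour(name)}")
```

Here cell1 and cell2 pair up, as do cell3 and cell4, which is the kind of grouping that appears as separate clusters on a UMAP plot.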