big data Flashcards
what is big data
datasets too large or complex to process using traditional data processing methods
characteristics of big data
- large volume, many types with variation
- interactions between variables
- advanced stats to analyse data
- unbiased experiments
types of big data
OMICs = genomics/epigenomics etc
Microscopy = cell morphology, migration etc
Human physiology = activity, health etc
purpose of big data
Development of science
Physiology
Drug safety
Epidemiology
Disease
Past and future events
how could you interpret consequences of gene expression changes
gene ontology and bio pathway algorithms
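A minimal sketch of one way such algorithms can work, assuming an over-representation test: it asks whether the changed genes contain more members of a gene ontology term than chance would predict. The hypergeometric test and all numbers below are illustrative assumptions, not taken from the cards.

```python
from math import comb

def hypergeom_enrichment_p(total_genes, genes_in_term, changed_genes, overlap):
    """P(X >= overlap) when changed_genes are drawn at random from total_genes."""
    max_overlap = min(genes_in_term, changed_genes)
    denominator = comb(total_genes, changed_genes)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(genes_in_term, k) * comb(total_genes - genes_in_term, changed_genes - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / denominator

# Hypothetical numbers: 20,000 genes measured, 200 annotated to the term,
# 100 genes changed expression, 10 of which carry the annotation.
p_value = hypergeom_enrichment_p(
    total_genes=20000, genes_in_term=200, changed_genes=100, overlap=10
)
```

A small p-value suggests the term is over-represented among the changed genes, which hints at the biological process affected.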
what does single cell RNA sequencing do
captures and sequences the transcriptomes (RNA, not genomes) of 100s-10,000s of individual cells
- tells which genes are expressed by particular cells
- cell type-specific gene expression changes
- cell lineage/differentiation trajectories
- tissue composition
what is a UMAP plot
- Each dot is a cell
- Close = similar, far away = more different
- Each colour marks ‘clusters’ of similar cells
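The "close = similar" reading can be sketched as a nearest-neighbour lookup on the 2D plot coordinates. The cell names and coordinates below are hypothetical, and this does not compute a UMAP itself. It only reads distances off an embedding that already exists.

```python
from math import dist

# Hypothetical 2D embedding coordinates, one point per cell.
embedding = {
    "cell_1": (0.0, 0.0),
    "cell_2": (0.3, 0.4),
    "cell_3": (6.0, 8.0),
}

def nearest_neighbour(cell, coords):
    """Return the name of the cell plotted closest to `cell`."""
    others = (name for name in coords if name != cell)
    return min(others, key=lambda name: dist(coords[cell], coords[name]))

closest_to_cell_1 = nearest_neighbour("cell_1", embedding)
```

Short distances are generally more trustworthy than long ones on a UMAP, so treat large gaps between clusters as a rough guide only.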
what is a GWAS
- genome-wide association study: identifies genetic variants (e.g. SNPs) associated with disease risk
- Manhattan plots map DNA sequence variants associated with a disease at genome scale
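A minimal sketch of the values behind a Manhattan plot: each variant is placed by chromosome and position, with height -log10(p). The variant IDs and p-values are hypothetical, and the 5e-8 cut-off is the conventional genome-wide significance threshold, assumed here rather than stated in the cards.

```python
from math import log10

GENOME_WIDE_SIGNIFICANCE = 5e-8  # conventional threshold (assumption)

# Hypothetical association results: (variant id, chromosome, position, p-value)
associations = [
    ("rs_a", 1, 1_200_000, 0.03),
    ("rs_b", 2, 5_400_000, 2e-9),
    ("rs_c", 2, 5_450_000, 4e-12),
    ("rs_d", 7, 900_000, 0.4),
]

def manhattan_points(results):
    """Return (chromosome, position, -log10 p) tuples in genome order."""
    return sorted((chrom, pos, -log10(p)) for _, chrom, pos, p in results)

def significant_variants(results, threshold=GENOME_WIDE_SIGNIFICANCE):
    """Return the ids of variants whose p-value passes the threshold."""
    return [variant for variant, _, _, p in results if p < threshold]

points = manhattan_points(associations)
hits = significant_variants(associations)
```

Plotting `points` with any charting tool gives the familiar skyline, where the tallest peaks mark the strongest associations.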
Combining disease risk-associated genetic variant data with gene expression data can:
- Identify the gene(s) whose expression levels are linked to the SNP allele
- Identify the cell type(s) in which the genetic variant(s) have functional consequences
- Reveal how those variants might regulate gene expression
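The points above can be sketched as an expression quantitative trait locus (eQTL) style check: regress a gene's expression on the number of risk alleles each person carries, separately for each cell type. The term eQTL, the cell types, and all values are illustrative assumptions rather than content from the cards.

```python
from statistics import mean

def allele_dosage_slope(dosages, expression):
    """Least-squares slope of expression against allele dosage (0, 1 or 2)."""
    mean_d = mean(dosages)
    mean_e = mean(expression)
    covariance = sum((d - mean_d) * (e - mean_e) for d, e in zip(dosages, expression))
    variance = sum((d - mean_d) ** 2 for d in dosages)
    if variance == 0:
        raise ValueError("all samples share the same genotype")
    return covariance / variance

# Hypothetical data: risk-allele count per person and that person's
# expression of one gene in two cell types.
dosages = [0, 0, 1, 1, 2, 2]
expression_by_cell_type = {
    "T cell": [5.0, 5.2, 6.1, 5.9, 7.0, 7.2],
    "hepatocyte": [3.0, 3.1, 2.9, 3.0, 3.1, 2.9],
}

slopes = {
    cell_type: allele_dosage_slope(dosages, expression)
    for cell_type, expression in expression_by_cell_type.items()
}
```

A clearly non-zero slope in one cell type and a flat slope in another would suggest the variant acts on that gene mainly in the first cell type.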