big data Flashcards
what are the four different kinds of omics data and how are they acquired?
genomics - DNA
transcriptomics - RNA
epigenomics - chromatin/where proteins bind to DNA
^^^ these three are acquired through next-generation sequencing, e.g. on Illumina sequencing machines. RNA is usually converted to cDNA first
proteomics - acquired by mass spectrometry
how is microscopy used for big data?
high throughput imaging = a moving stage imaging loads of samples
AI can be used to analyse the images
covers fluorescent tagging in live cells, staining of fixed cells, and automated image analysis
Can tell us about cell type, differentiation, pathological processes, migration
how would one investigate factor X's effects on gene expression in sample Y in four steps?
Sequence mRNA from both the control group and the group that received factor X, because we want to look at gene expression
Convert to cDNA and prepare a sequencing library (sequences to be analysed)
Amount of cDNA represents amount of mRNA
Run data through pipeline (computational steps to analyse it)
make a plot e.g. volcano plot to look at fold change
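A rough sketch of one early pipeline step, assuming made-up read counts and a hypothetical helper named counts_per_million: raw counts are scaled by sequencing depth so the control and factor X samples can be compared. Real pipelines use dedicated tools; this only illustrates the idea.

```python
# Hypothetical raw read counts per gene (made-up numbers for illustration).
control = {"geneA": 100, "geneB": 300, "geneC": 600}
treated = {"geneA": 400, "geneB": 300, "geneC": 1300}


def counts_per_million(counts):
    """Scale raw counts so samples sequenced to different depths are comparable."""
    total = sum(counts.values())
    return {gene: n / total * 1_000_000 for gene, n in counts.items()}


control_cpm = counts_per_million(control)
treated_cpm = counts_per_million(treated)

# Print normalised expression side by side: gene, control, treated.
for gene in control:
    print(gene, round(control_cpm[gene]), round(treated_cpm[gene]))
```

Note that geneB has the same raw count in both samples but a lower normalised value in the treated sample, because that sample was sequenced twice as deeply.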
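when looking at gene expression and transcriptomics, what is meant by fold change?
the ratio of a gene's expression in the tested group to its expression in the control group, usually reported on a log2 scale (positive = up-regulated, negative = down-regulated)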
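A minimal sketch of the calculation, assuming mean expression values per group are already available. The function name and the pseudocount of 1 are illustrative choices, not part of any particular tool.

```python
import math


def log2_fold_change(treated_mean, control_mean, pseudocount=1.0):
    """log2 of treated/control; the pseudocount (an assumption) avoids dividing by zero."""
    return math.log2((treated_mean + pseudocount) / (control_mean + pseudocount))


# (399 + 1) / (99 + 1) = 4, so the gene is 4-fold up: log2 fold change of 2.
print(log2_fold_change(399, 99))
# The reverse comparison gives the same size of change with a negative sign.
print(log2_fold_change(99, 399))
```

The log scale makes a 4-fold increase and a 4-fold decrease the same distance from zero, which is why plots of fold change are symmetric.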
what plot might be used to look at changes in gene expression (fold change)?
volcano plot, a type of scatter-plot that is used to quickly identify changes in large data sets composed of replicate data. It plots significance versus fold-change on the y and x axes, respectively.
So a volcano plot shows which genes are significantly down- or up-regulated
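A sketch of how the points on a volcano plot could be computed and labelled, assuming hypothetical per-gene results and assumed cutoffs (2-fold change, p < 0.05). In practice p-values are usually corrected for multiple testing before plotting.

```python
import math

# Hypothetical per-gene results: (log2 fold change, p-value).
results = {
    "geneA": (2.5, 1e-6),
    "geneB": (-1.8, 1e-4),
    "geneC": (0.2, 0.6),
    "geneD": (3.0, 0.2),
}

LFC_CUTOFF = 1.0  # assumed cutoff: at least a 2-fold change
P_CUTOFF = 0.05   # assumed significance cutoff


def classify(log2_fc, p_value):
    """Label a gene as up, down, or not significant."""
    if p_value < P_CUTOFF and log2_fc >= LFC_CUTOFF:
        return "up"
    if p_value < P_CUTOFF and log2_fc <= -LFC_CUTOFF:
        return "down"
    return "not significant"


# x = log2 fold change, y = -log10(p), so significant genes sit high on the plot.
points = {}
for gene, (lfc, p) in results.items():
    points[gene] = (lfc, -math.log10(p), classify(lfc, p))
    print(gene, points[gene])
```

geneD has a large fold change but a weak p-value, so it sits low on the plot and is not called significant.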
single cell RNA sequencing - what 3 things is it often used to investigate?
Great for telling which genes are expressed by which cells
How a cell’s gene expression changes over time and differentiation
Tissue composition changes - e.g. the proportion of immune cells in a healthy sample compared to a diseased one
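A sketch of the tissue composition comparison, assuming each sequenced cell has already been given a cell-type label. The labels and counts below are made up.

```python
from collections import Counter

# Hypothetical cell-type labels, one per sequenced cell.
healthy = ["T cell"] * 10 + ["fibroblast"] * 70 + ["macrophage"] * 20
diseased = ["T cell"] * 40 + ["fibroblast"] * 40 + ["macrophage"] * 20


def proportions(labels):
    """Fraction of cells belonging to each cell type."""
    counts = Counter(labels)
    total = len(labels)
    return {cell_type: n / total for cell_type, n in counts.items()}


healthy_prop = proportions(healthy)
diseased_prop = proportions(diseased)

# Report each cell type's share in both samples and the change between them.
for cell_type in sorted(healthy_prop):
    before = healthy_prop[cell_type]
    after = diseased_prop.get(cell_type, 0.0)
    print(f"{cell_type}: {before:.0%} -> {after:.0%} ({after - before:+.0%})")
```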
what plot is used to show results of single cell RNA sequencing?
UMAP - where each individual dot is a cell
Each colour marks ‘clusters’ of similar cells with similar transcriptional profiles
Can see which genes are expressed by particular cells and if cell-type specific gene expression changes
Can also see how cells change over time
Trajectories can show how change has occurred from certain cells
what is a GWAS (genome-wide association study)?
used for finding risk alleles - versions of genes that contribute to causing a disease. They're good if you don't know what to look for…
examine a panel of SNPs (single nucleotide polymorphisms) across the genome for an association with the disease of interest
It looks for differences in allelic frequency between disease and control groups
Studies must be very large to be statistically sound
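A sketch of the allele frequency comparison for a single SNP, using a Pearson chi-square test on a 2x2 table of allele counts. The counts are made up and the function name is illustrative. Real analyses typically use regression models with covariates.

```python
import math


def allelic_chi_square(case_risk, case_other, control_risk, control_other):
    """Pearson chi-square (1 degree of freedom) on a 2x2 table of allele counts."""
    a, b, c, d = case_risk, case_other, control_risk, control_other
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical study: 1,000 cases and 1,000 controls (two alleles per person).
chi2_small, p_small = allelic_chi_square(600, 1400, 500, 1500)

# Same allele frequencies (30% vs 25%) in a study ten times larger.
chi2_large, p_large = allelic_chi_square(6000, 14000, 5000, 15000)

print(chi2_small, p_small)
print(chi2_large, p_large)
```

With identical allele frequencies, only the larger study gives a p-value below 5 x 10^-8, which illustrates why GWAS cohorts need to be very large.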
how are results of GWAS study represented?
Manhattan plots - the X axis is genomic position, and each dot is a SNP plotted where it sits in the genome. The Y axis is the strength of association, shown as -log10(p-value), so the higher the dot, the stronger the evidence of association
remember - the SNP itself may not be the variant causing the problem; it might just be linked to (close to) the gene involved
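A sketch of how Manhattan plot coordinates could be computed, assuming toy chromosome lengths and made-up SNP names, positions and p-values. Chromosomes are laid end to end along the x axis.

```python
import math

# Hypothetical chromosome lengths (base pairs) and SNP results.
chrom_lengths = {"1": 1000, "2": 800, "3": 600}
snps = [
    # (made-up SNP name, chromosome, position, p-value)
    ("rs_a", "1", 250, 1e-3),
    ("rs_b", "2", 100, 1e-9),
    ("rs_c", "3", 500, 0.5),
]

# Offset of each chromosome so that they sit end to end on the x axis.
offsets = {}
running = 0
for chrom, length in chrom_lengths.items():
    offsets[chrom] = running
    running += length

# x = cumulative genomic position, y = -log10(p-value).
points = [(name, offsets[chrom] + pos, -math.log10(p)) for name, chrom, pos, p in snps]

for point in points:
    print(point)
```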
what might you combine a GWAS study with?
Can combine GWAS results with other data like RNA sequencing to identify the cell types in which genetic variants cause a functional difference
what is the significance threshold for a Manhattan plot?
a p-value of less than 5 x 10^-8 (genome-wide significance)
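A sketch of where this number is commonly said to come from: a Bonferroni correction of 0.05 across roughly one million independent tests. The test count is a conventional approximation, and the variable and function names are illustrative.

```python
import math

ALPHA = 0.05
N_INDEPENDENT_TESTS = 1_000_000  # conventional approximation, an assumption here

# Bonferroni correction: divide the usual cutoff by the number of tests.
threshold = ALPHA / N_INDEPENDENT_TESTS

# Height of the threshold line on the -log10(p) axis of a Manhattan plot.
line_height = -math.log10(threshold)


def genome_wide_significant(p_value):
    """True if a SNP's p-value clears the genome-wide threshold."""
    return p_value < threshold


print(threshold, round(line_height, 2))
print(genome_wide_significant(1e-9), genome_wide_significant(1e-6))
```

On the plot, the threshold appears as a horizontal line at about 7.3 on the y axis, and only SNPs above it count as genome-wide significant.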
example of big data - what is the UK Biobank?
500,000 adults
Lots of measurements taken and tests such as anatomical, physiological, biochemical
Followed over time where some develop diseases
Baseline data is then studied and compared with follow up data to discover new disease associations
There is a social gradient with life expectancy, poor neighbourhoods have a greater burden of ill-health than wealthy ones