Biomarker Discovery Flashcards
Biomarkers
Can be used for diagnosis, prognosis (predicting the progression of a disease), prediction of the best treatment, and monitoring of clinical response.
Can produce false positives and false negatives, but can still be useful, e.g. to narrow down a group of patients.
A biomarker can be univariate (a single marker is indicative) or multivariate (multiple markers form a signature).
Univariate markers can be found with statistical tests; multivariate markers are usually found with multivariate methods.
Also need to consider whether the way we found the biomarker is interpretable or not.
=> proteomics and metabolomics
These reflect more closely what is actually happening in the body than the other types of data.
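To illustrate the earlier point that a marker with false positives and false negatives can still reduce the group, here is a minimal sketch. The cohort size, prevalence, sensitivity and specificity are made-up numbers for illustration only, not values from these notes.

```python
def screen(n_people, prevalence, sensitivity, specificity):
    """Expected outcome of screening a cohort with an imperfect marker."""
    diseased = n_people * prevalence
    healthy = n_people - diseased
    true_pos = diseased * sensitivity        # diseased and flagged
    false_neg = diseased - true_pos          # diseased but missed
    false_pos = healthy * (1 - specificity)  # healthy but flagged
    flagged = true_pos + false_pos           # group sent for follow-up
    return {
        "flagged": flagged,
        "missed": false_neg,
        "ppv": true_pos / flagged,  # fraction of the flagged group that is diseased
    }

# Hypothetical numbers: 1000 people, 10% prevalence,
# 90% sensitivity, 80% specificity
result = screen(1000, 0.10, 0.90, 0.80)
print(result)
```

With these assumed numbers the follow-up group shrinks from 1000 people to about 270, at the cost of missing about 10 cases.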
How to measure proteins
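Mass spectrometry (coupled to a separation step such as chromatography): can separate on size, polarity, etc.
In the separation tube (column), some molecules will pass through and others will stick to the walls.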
Gel electrophoresis: a gel with electrodes; the molecules travel more or less far through the gel, and the resulting patterns can be compared.
How to measure metabolites
NMR (nuclear magnetic resonance): gives a graph with peaks; each peak is for one compound, which can contain more or less hydrogen. Can also be a 2-dimensional array that includes retention time.
Pick the peaks that are found in all samples and align them.
The peaks then need to be assigned, i.e. we don't know which peak corresponds to which compound!
Assignment comes after preprocessing and normalization (run order affects the stickiness and there are dilution effects; median fold change is often used).
Assignment is done manually, trying to find the best overlap. Some software can help, but it is not completely automated. Example: iKnife
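Mass spectrometry (coupled to a separation step such as chromatography): can separate on size, polarity, etc.
In the separation tube (column), some molecules will pass through and others will stick to the walls.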
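The median fold change normalization mentioned above can be sketched roughly as follows. This assumes one common formulation: the reference profile is the per-peak median across samples, and each sample is divided by the median of its fold changes relative to that reference. The function name and toy intensities are invented for illustration.

```python
from statistics import median

def median_fold_change_normalize(samples):
    """samples: list of samples, each a list of positive peak intensities
    (same peaks, same order). Returns dilution-corrected samples."""
    n_peaks = len(samples[0])
    # Reference profile: median intensity of each peak across all samples
    reference = [median(s[j] for s in samples) for j in range(n_peaks)]
    normalized = []
    for s in samples:
        # Fold change of every peak relative to the reference
        fold_changes = [s[j] / reference[j] for j in range(n_peaks)]
        factor = median(fold_changes)  # estimated dilution factor
        normalized.append([x / factor for x in s])
    return normalized

# Toy data (invented): sample 2 is sample 1 diluted two-fold,
# sample 3 has one peak that is genuinely higher
samples = [
    [10.0, 20.0, 30.0],
    [5.0, 10.0, 15.0],
    [10.0, 22.0, 30.0],
]
normalized = median_fold_change_normalize(samples)
for row in normalized:
    print(row)
```

Using the median of the fold changes rather than the total intensity means that a few peaks that really differ between samples should not distort the dilution estimate.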
Supervised vs unsupervised methods
for multivariate biomarkers
Supervised methods use the class of each sample to help produce the results. Classification is usually supervised.
Clustering with all the variables is unsupervised, but if the variables were selected beforehand (using the classes) it becomes supervised.
Supervised methods will often give good-looking results even on random data: compare against models built on random data, and have a validation set!
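A small sketch of why a validation set matters: on purely random data with random labels, a supervised search over many variables can still find a rule that looks good on the training samples. The single-threshold classifier, the sample sizes and the seed are all assumptions chosen for illustration.

```python
import random

def accuracy(values, labels, threshold, high_class):
    """Accuracy of the rule: predict high_class if value > threshold."""
    correct = 0
    for v, y in zip(values, labels):
        pred = high_class if v > threshold else 1 - high_class
        correct += (pred == y)
    return correct / len(labels)

def fit_best_variable(X, y):
    """Pick the variable and threshold rule with the best training accuracy.
    X: list of samples (lists of floats); y: list of 0/1 labels."""
    best = None
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        for threshold in col:
            for high_class in (0, 1):
                acc = accuracy(col, y, threshold, high_class)
                if best is None or acc > best[0]:
                    best = (acc, j, threshold, high_class)
    return best

rng = random.Random(0)  # fixed seed so the run is repeatable
n_train, n_valid, n_vars = 20, 20, 500

def random_set(n):
    """Random data AND random labels: there is nothing real to learn."""
    X = [[rng.gauss(0, 1) for _ in range(n_vars)] for _ in range(n)]
    y = [rng.randint(0, 1) for _ in range(n)]
    return X, y

X_train, y_train = random_set(n_train)
X_valid, y_valid = random_set(n_valid)

train_acc, j, threshold, high_class = fit_best_variable(X_train, y_train)
valid_acc = accuracy([row[j] for row in X_valid], y_valid, threshold, high_class)
print(f"training accuracy:   {train_acc:.2f}")
print(f"validation accuracy: {valid_acc:.2f}")
```

With many more variables than samples, one would expect the training accuracy to be well above chance while the validation accuracy stays near 0.5, since the selected rule only fits noise.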
Selecting variables for classification
Fixed criteria: p-value, fold change. Useful for biological interpretation but not for predictive models -> the selected variables tend to carry the same kind of information.
Forward selection: select the variable that classifies best on its own, then add one variable (gene) at a time that improves the classification.
Backward selection: classify with all variables, then remove one at a time those that have little effect on the classification.
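As a sketch of selecting on fixed criteria, the following computes a p-value and a log2 fold change for one candidate variable. The notes do not say which test is used; an exact permutation test is swapped in here so that only the standard library is needed. The intensities and the cut-offs (p < 0.05, at least two-fold change) are hypothetical.

```python
from itertools import combinations
from math import log2
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in means."""
    pooled = group_a + group_b
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    count = 0
    n_splits = 0
    # Enumerate every way of relabelling n_a of the samples as "group A"
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            count += 1
        n_splits += 1
    return count / n_splits

def log2_fold_change(group_a, group_b):
    """log2 of mean(group_b) / mean(group_a); assumes positive intensities."""
    return log2(mean(group_b) / mean(group_a))

# Invented intensities for one candidate marker
control = [1.0, 2.0, 3.0, 4.0]
disease = [5.0, 6.0, 7.0, 8.0]

p = permutation_p_value(control, disease)
lfc = log2_fold_change(control, disease)
# Hypothetical fixed criteria: p < 0.05 and at least a two-fold change
selected = p < 0.05 and abs(lfc) >= 1.0
print(p, lfc, selected)
```

Exhaustive enumeration is only practical for small groups; with many variables the p-values would also need correcting for multiple testing.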
The problem is that there can be many variables and the results are poorly reproducible and not biologically interpretable
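Forward selection as described above can be sketched like this. The nearest-centroid classifier and the use of training accuracy as the score are assumptions made to keep the example short; in practice a cross-validated score would be a safer choice. The toy data are invented.

```python
from statistics import mean

def centroid_accuracy(X, y, variables):
    """Training accuracy of a nearest-centroid classifier restricted to
    the given variable indices. X: list of samples; y: 0/1 labels."""
    centroids = {}
    for label in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(r[j] for r in rows) for j in variables]
    correct = 0
    for row, lab in zip(X, y):
        dist = {
            label: sum((row[j] - c) ** 2
                       for j, c in zip(variables, centroids[label]))
            for label in (0, 1)
        }
        pred = 0 if dist[0] <= dist[1] else 1
        correct += (pred == lab)
    return correct / len(y)

def forward_selection(X, y, max_vars):
    """Greedily add the variable that most improves the score; stop when
    no remaining variable improves it."""
    selected, best_score = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_vars:
        scores = [(centroid_accuracy(X, y, selected + [j]), j)
                  for j in remaining]
        score, j = max(scores, key=lambda t: t[0])
        if score <= best_score:
            break
        selected.append(j)
        remaining.remove(j)
        best_score = score
    return selected, best_score

# Toy data (invented): variable 0 separates most samples,
# variable 2 rescues the third sample, variable 1 is uninformative
X = [
    [0.0, 5.0, 0.0],
    [1.0, 3.0, 0.0],
    [4.0, 4.0, -6.0],
    [4.0, 4.0, 0.0],
    [5.0, 5.0, 0.0],
    [6.0, 3.0, 0.0],
]
y = [0, 0, 0, 1, 1, 1]

chosen, score = forward_selection(X, y, max_vars=3)
print(chosen, score)
```

Backward selection would run the same loop in reverse: start from all variables and repeatedly drop the one whose removal hurts the score least.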
Clustering: cluster similar variables, then choose only 1 variable per cluster.
Could also use network/pathway analysis or PCA
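The clustering idea (one variable per cluster of similar variables) can be sketched as below. The notes do not name a clustering method, so this uses a simple greedy grouping on absolute Pearson correlation as a stand-in; the 0.9 threshold, the function names and the toy columns are assumptions.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation between two equal-length lists
    (assumes neither list is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def pick_representatives(columns, threshold=0.9):
    """Greedy grouping of variables: a variable joins the first cluster
    whose representative it correlates with (|r| >= threshold); otherwise
    it starts a new cluster and becomes that cluster's representative."""
    clusters = {}  # representative index -> list of member indices
    for j, col in enumerate(columns):
        for rep in clusters:
            if abs(pearson(columns[rep], col)) >= threshold:
                clusters[rep].append(j)
                break
        else:
            clusters[j] = [j]
    return clusters

# Toy data (invented): each inner list is one variable across 4 samples
columns = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],  # nearly proportional to variable 0
    [4.0, 1.0, 3.0, 2.0],
    [8.0, 2.0, 6.0, 4.0],  # exactly twice variable 2
]
clusters = pick_representatives(columns)
print(clusters)
print("keep variables:", list(clusters))
```

Here variables 0 and 2 are kept as representatives, and the variables that carry nearly the same information are dropped.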