Chemometrics Flashcards
what is chemometrics?
A computationally intensive, multivariate statistical analysis applied to chemical systems or processes
why is chemometrics good?
reduces large data sets
identifies sample groupings
allows for better visualisation of data
isolates important variables and identifies covariance
EDA (exploratory data analysis)
a pattern recognition technique (identifies groupings and visualises trends)
unsupervised technique
- provides the basis on which supervised techniques then build.
CA (cluster analysis)
groups samples into clusters based on similarity
- Agglomerative (bottom-up) - individual samples -> clusters
- Divisive (top-down) - one cluster -> individual samples
- both are forms of hierarchical clustering
what is the output from cluster analysis?
a dendrogram - simplifies complex data, but shows only that the groupings exist, not why they exist.
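A minimal sketch of the agglomerative route, assuming single linkage (cluster distance = closest pair of members) and Euclidean distance; the card does not say which linkage or metric is intended. The function name `agglomerative` and the sample data are hypothetical. The ordered list of merges and merge distances is the information a dendrogram draws.

```python
from itertools import combinations
from math import dist


def agglomerative(points):
    """Single-linkage agglomerative clustering; returns merges in order."""
    def linkage(pair):
        # Single linkage (an assumption): distance between closest members.
        a, b = pair
        return min(dist(points[i], points[j]) for i in a for j in b)

    # Start with every sample in its own cluster (tuples of sample indices).
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Merge the two most similar clusters, then repeat.
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((a, b, linkage((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical data: two measured variables for five samples.
samples = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.1), (5.2, 4.9), (9.0, 1.0)]
merges = agglomerative(samples)
for a, b, d in merges:
    print(a, "+", b, "merged at distance", round(d, 2))
```

The merge distances say how similar the joined clusters are, but nothing about which variables cause the similarity.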
PCA (principal component analysis)
- identifies which features of a data set are relevant
- largest variation given highest priority (PC1)
- if PC1 is not sufficient to describe the data, repeat on the remaining variation for PC2
- repeated until the data is adequately modelled (ideal = as much of the data in as few PCs as possible)
data = structure + noise
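A minimal sketch of "data = structure + noise", assuming PCA is computed by singular value decomposition of the mean-centred data (one common route, not necessarily the one your software uses). The data are simulated, and keeping `k = 1` PC is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 20 samples x 4 variables, driven by one underlying
# factor plus a small amount of random noise.
factor = rng.normal(size=(20, 1))
X = factor @ np.array([[1.0, 0.8, -0.5, 0.3]]) + 0.05 * rng.normal(size=(20, 4))

Xc = X - X.mean(axis=0)                      # mean-centre each variable
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 1                                        # number of PCs kept (assumed)
structure = (U[:, :k] * s[:k]) @ Vt[:k]      # part modelled by the first k PCs
noise = Xc - structure                       # residual left unmodelled

print("variance along each PC:", np.round(s**2 / (len(X) - 1), 4))
print("residual sum of squares:", round(float((noise**2).sum()), 4))
```

PC1 takes the largest variation, PC2 the largest of what remains, and so on; whatever is left after `k` PCs is treated as noise.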
what is a principal component?
a linear combination of the original variables
Explained variance
usually shown as a scree plot
the structure is the explained variance, and it is described by the PCs
the EV plot reveals the optimum number of PCs
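A minimal sketch of the numbers behind an explained variance plot, on simulated data. The 95 % cut-off used to pick the number of PCs is an assumed rule of thumb, not something the card specifies; in practice the "elbow" of the plot is often judged by eye.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: 30 samples x 5 variables built from two underlying
# factors plus noise.
factors = rng.normal(size=(30, 2)) * np.array([3.0, 1.5])
mixing = rng.normal(size=(2, 5))
X = factors @ mixing + 0.1 * rng.normal(size=(30, 5))

Xc = X - X.mean(axis=0)
s = np.linalg.svd(Xc, compute_uv=False)      # singular values, largest first

explained = s**2 / np.sum(s**2)              # fraction of variance per PC
cumulative = np.cumsum(explained)

# Assumed rule of thumb: fewest PCs whose cumulative variance reaches 95 %.
n_pcs = int(np.searchsorted(cumulative, 0.95) + 1)

for i, (ev, cum) in enumerate(zip(explained, cumulative), start=1):
    print(f"PC{i}: {ev:6.1%} explained, {cum:6.1%} cumulative")
print("PCs kept:", n_pcs)
```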
what are scores?
the distance of a sample from the mean along each PC; the samples are mapped onto a SCORES PLOT
(can be 3D for better visualisation)
PCA loadings
the coefficients/weights attached to the variables, so they map the VARIABLES
they are the link between the scores plot and the chemistry of the samples
they show how much each original variable contributes to each PC.
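A minimal sketch of how scores and loadings relate, assuming SVD-based PCA on mean-centred data. The data and the variable names `A`, `B`, `C` are placeholders. Loadings here are unit-length vectors; some software scales them differently.

```python
import numpy as np

# Hypothetical data: 6 samples x 3 measured variables.
X = np.array([
    [2.0,  4.1, 1.0],
    [3.0,  6.2, 0.9],
    [4.0,  7.9, 1.1],
    [5.0, 10.1, 1.0],
    [6.0, 12.0, 0.8],
    [7.0, 13.8, 1.2],
])
variables = ["A", "B", "C"]                  # placeholder variable names

Xc = X - X.mean(axis=0)                      # distances from the mean
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

loadings = Vt.T          # rows = variables, columns = PCs (the weights)
scores = Xc @ loadings   # rows = samples, columns = PCs (the positions)

for name, weight in zip(variables, loadings[:, 0]):
    print(f"loading of {name} on PC1: {weight:+.3f}")
print("PC1 scores:", np.round(scores[:, 0], 2))
```

Each score is the weighted sum of a sample's centred variables, with the loadings as weights, so `scores @ loadings.T` rebuilds the centred data. A variable with a loading near zero on a PC (here `C` on PC1) contributes little to that PC.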
LDA (linear discriminant analysis)
supervised technique
uses PCs to create classification rules
PCs = ‘discriminant functions’
scores = ‘discriminant values’
tries to minimise variation within groups and maximise separation between groups by prioritising loadings
LDA discriminant values
samples in the same group have similar discriminant values (scores) and vice versa
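A minimal sketch of a two-group discriminant, assuming Fisher's formulation applied directly to the original variables rather than to PC scores. The training data, the midpoint threshold (which assumes equal group sizes and priors) and the helper `classify` are all hypothetical.

```python
import numpy as np

# Hypothetical training data: two known groups, two measured variables.
group_a = np.array([[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]])
group_b = np.array([[4.0, 4.5], [4.5, 5.0], [5.0, 4.2], [4.2, 4.8]])

mean_a, mean_b = group_a.mean(axis=0), group_b.mean(axis=0)

# Pooled within-group scatter: the variation LDA tries to keep small.
Sw = ((group_a - mean_a).T @ (group_a - mean_a)
      + (group_b - mean_b).T @ (group_b - mean_b))

# Fisher's direction: maximises between-group separation relative to
# within-group spread. Its entries act as the weights on the variables.
w = np.linalg.solve(Sw, mean_b - mean_a)

values_a = group_a @ w                       # discriminant values, group A
values_b = group_b @ w                       # discriminant values, group B
threshold = 0.5 * (mean_a @ w + mean_b @ w)  # midpoint rule (assumed)


def classify(sample):
    """Classification rule: compare the discriminant value to the threshold."""
    return "B" if np.asarray(sample) @ w > threshold else "A"


print("group A values:", np.round(values_a, 1))
print("group B values:", np.round(values_b, 1))
print("new sample [1.3, 2.0] ->", classify([1.3, 2.0]))
```

Samples from the same group land close together on the discriminant axis, and the two groups fall on opposite sides of the threshold.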
caveats
- pre-processing is important
- doesn’t compensate for bad data
- potential for cognitive bias
- never a substitute for human interpretation
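On the pre-processing caveat, a minimal sketch of why scaling matters, assuming autoscaling (mean-centre, then divide by the standard deviation) as the pre-processing step. The data and the helper `pc1_loadings` are hypothetical.

```python
import numpy as np

# Hypothetical data: variable 1 in large units, variable 2 in small units.
X = np.array([
    [1000.0, 0.10],
    [1200.0, 0.50],
    [ 900.0, 0.90],
    [1100.0, 0.30],
    [1050.0, 0.70],
])


def pc1_loadings(data):
    """Loadings of the first PC from SVD of the mean-centred data."""
    centred = data - data.mean(axis=0)
    _, _, Vt = np.linalg.svd(centred, full_matrices=False)
    return Vt[0]


raw = pc1_loadings(X)

# Autoscaling: mean-centre, then divide by each variable's standard deviation.
X_auto = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
scaled = pc1_loadings(X_auto)

print("PC1 loadings, raw data:   ", np.round(raw, 4))
print("PC1 loadings, autoscaled: ", np.round(scaled, 4))
```

Without scaling, PC1 is dominated by the variable with the largest numerical range; after autoscaling both variables carry equal weight.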