Chemometrics Flashcards
What is chemometrics?
- Computationally intensive
- Multivariate (many variables) statistical analysis
- Applied to chemical systems or processes
What can chemometrics do?
- Reduce complex datasets
- Identify and quantify sample groupings
- Optimise experimental parameters
- Isolate important variables and identify covariance
- Provide reproducible measures of data
- Allow for better visualisation of data
What is univariate?
Analysis of a single variable at a time
What is the disadvantage of univariate?
- Too simplistic an approach for complex data
- May miss outliers that only stand out when several variables are considered together
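A minimal sketch of the outlier point above, assuming NumPy and made-up data for two correlated variables. The suspect sample and the use of Mahalanobis distance as the multivariate check are illustrative choices, not something the cards prescribe.

```python
import numpy as np

# Hypothetical data: two strongly correlated variables (e.g. two peak areas).
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 200)
y = x + rng.normal(0.0, 0.1, 200)
data = np.column_stack([x, y])

# Suspect sample: each value alone is unremarkable,
# but the pair breaks the correlation seen in the rest of the data.
suspect = np.array([1.5, -1.5])

# Univariate check: z-score of each variable on its own.
z = (suspect - data.mean(axis=0)) / data.std(axis=0, ddof=1)

# Multivariate check: Mahalanobis distance takes the covariance into account.
cov = np.cov(data, rowvar=False)
diff = suspect - data.mean(axis=0)
d_mahal = float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

print("z-scores:", np.round(z, 2))
print("Mahalanobis distance:", round(d_mahal, 1))
```

Both z-scores stay well inside the usual cut-off of 3, while the Mahalanobis distance is very large, so only the multivariate view flags the sample.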
What is covariance analysis used for?
Used to explore relationships between different variables to look for patterns
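As a rough illustration, relationships between variables can be summarised in a covariance or correlation matrix. The sketch below assumes NumPy and uses invented measurements for three hypothetical variables A, B and C.

```python
import numpy as np

# Hypothetical measurements: rows are samples, columns are variables A, B, C.
X = np.array([
    [1.0, 2.1, 9.0],
    [2.0, 3.9, 7.5],
    [3.0, 6.2, 6.1],
    [4.0, 7.8, 4.4],
    [5.0, 10.1, 3.0],
])

cov = np.cov(X, rowvar=False)        # covariance matrix (variables x variables)
corr = np.corrcoef(X, rowvar=False)  # same relationships scaled to -1..+1

print(np.round(corr, 2))
```

Here A and B rise together (correlation close to +1), while C falls as A rises (correlation close to -1).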
What are the four chemometric categories?
- Design of Experiments
- Exploratory Data Analysis
- Classification
- Regression
What is Design of Experiments (DOE)?
- Used to work out which collection method might be best
- Relates to experimental setup
- Will affect evidence collection, storage, instrument selection, parameter optimisation
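One common way to set out an experimental design is a full factorial design, where every combination of factor levels is run. The cards do not name a specific design, so this is only one possible sketch; the factor names and levels are hypothetical.

```python
from itertools import product

# Hypothetical factors and levels for an instrument method (illustrative only).
factors = {
    "extraction_solvent": ["methanol", "acetonitrile"],
    "column_temp_C": [30, 40, 50],
    "flow_rate_mL_min": [0.8, 1.0],
}

# Full factorial design: every combination of every level is run once.
runs = [dict(zip(factors, levels)) for levels in product(*factors.values())]

for i, run in enumerate(runs, start=1):
    print(i, run)
```

With 2 x 3 x 2 levels this gives 12 runs.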
What is regression analysis?
- Based on the linear relationship y = mx + c
- Maps the effect of multiple independent variables (predictors) upon dependent variables (response)
- Allows prediction of quantitative sample properties
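A minimal sketch of the idea above, assuming NumPy and ordinary least squares with two predictors. The data are invented, and the response is built from known coefficients so the fit can be checked.

```python
import numpy as np

# Hypothetical calibration data: two predictors (columns) and one response.
X = np.array([
    [1.0, 0.5],
    [2.0, 1.5],
    [3.0, 1.0],
    [4.0, 3.0],
    [5.0, 2.5],
])
# Response built as y = 2*x1 + 3*x2 + 1, so the true coefficients are known.
y = 2.0 * X[:, 0] + 3.0 * X[:, 1] + 1.0

# Add a column of ones so the intercept (c) is fitted alongside the slopes (m).
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
m1, m2, c = coef

# Predict the response for a new, unseen sample.
new_sample = np.array([6.0, 2.0])
y_pred = m1 * new_sample[0] + m2 * new_sample[1] + c

print(np.round(coef, 3), round(float(y_pred), 3))
```

The fitted slopes and intercept recover 2, 3 and 1, and the prediction for the new sample is 19.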
What is Exploratory Data Analysis (EDA)?
- Dimensionality reduction
- Pattern recognition technique - identify grouping
- Visualise trends that may otherwise have gone unnoticed
- Determination of sample similarity in complex data
What does an unsupervised technique mean?
Exploring the data without any prior assumptions or knowledge of the samples
What is a supervised technique?
Building classification rules for known sample grouping (from EDA)
What are the two most commonly used EDA techniques?
- Cluster Analysis (CA)
- Principal Component Analysis (PCA)
What is Cluster Analysis (CA)?
- Unsupervised technique
- Samples grouped into clusters based on calculated distance (measure of their similarity)
- Hierarchical cluster analysis (HCA) is either agglomerative (bottom-up) or divisive (top-down)
- Output is a dendrogram
- Good initial technique - simplifies complex data
- Not limited to quantitative data
- Visualisation of relationships
- Can tell you that there are groupings but not why
What is agglomerative?
Taking individual samples and grouping them together to form clusters
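The agglomerative idea can be sketched in a few lines. This assumes NumPy, Euclidean distance and single linkage (distance between the closest members of two clusters); these are illustrative choices, and the function name `agglomerate` and the data are hypothetical.

```python
import numpy as np

def agglomerate(X, n_clusters):
    """Single-linkage agglomerative clustering (illustrative, not optimised)."""
    # Start with every sample in its own cluster.
    clusters = [[i] for i in range(len(X))]
    # Pairwise Euclidean distances between all samples.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    merges = []  # (cluster_a, cluster_b, distance) recorded for each merge
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = dist[np.ix_(clusters[a], clusters[b])].min()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], float(d)))
        # Merge the two closest clusters into one.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, merges

# Hypothetical 2-variable data with two obvious groups.
X = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
              [5.0, 5.0], [5.1, 4.8], [4.9, 5.2]])
clusters, merges = agglomerate(X, n_clusters=2)
print(clusters)
```

The recorded merges and their distances are the information a dendrogram draws: each merge is a join, and the distance sets the height of the join.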
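What is divisive clustering?
Opposite of agglomerative: starting with all samples in one cluster and splitting it down into individual samples. Both agglomerative and divisive are forms of hierarchical cluster analysis (HCA)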
What is PCA?
- Principal Component Analysis
- Unsupervised technique
- Assesses all variables, decides which are relevant, then determines which variables are correlated
What is a Principal Component (PC)?
- A new axis through the data, formed as a weighted combination of the original variables that the algorithm finds to be correlated
- The PC that describes the largest variation between the samples will be given the highest priority (PC1)
When are supervised techniques run?
Usually after EDA
Why is more than one PC made?
- If the first PC is not sufficient to describe the spread of data, the calculation is repeated to find PC2
- The process is continued until all the variability within the dataset has been accounted for and modelled
How do you get the score of a PC?
- The distance along each PC from the mean gives a sample its ‘score’
- Each sample will have a different score for each PC
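A minimal sketch of how scores can be computed, assuming NumPy and PCA carried out by singular value decomposition (SVD) of the mean-centred data. The data are invented.

```python
import numpy as np

# Hypothetical data: 6 samples (rows) x 3 variables (columns).
X = np.array([
    [2.0, 4.1, 1.0],
    [3.0, 6.0, 1.2],
    [4.0, 7.9, 0.9],
    [5.0, 10.2, 1.1],
    [6.0, 11.8, 1.0],
    [7.0, 14.1, 0.8],
])

# Mean-centre so that every score is measured from the mean of the data.
Xc = X - X.mean(axis=0)

# SVD of the centred data: rows of Vt are the PC directions (PC1 first).
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Scores: the position of each sample along each PC.
scores = Xc @ Vt.T

print(np.round(scores, 2))  # one row per sample, one column per PC
```

Plotting the first column against the second would give a PC1 vs PC2 scores plot.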
What does an explained variance show?
How much of the total variation each PC accounts for, which indicates the optimum number of PCs to keep; the majority of the information is in the first few PCs
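Explained variance can be sketched as the squared singular values of the mean-centred data divided by their total. This assumes NumPy; the data are simulated from two hidden factors plus a little noise, purely for illustration.

```python
import numpy as np

# Hypothetical data: 50 samples x 5 variables driven mostly by two hidden factors.
rng = np.random.default_rng(1)
latent = rng.normal(size=(50, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(50, 5))

Xc = X - X.mean(axis=0)
s = np.linalg.svd(Xc, compute_uv=False)  # singular values, largest first

# Variance carried by each PC, as a fraction of the total variance.
explained = s**2 / np.sum(s**2)
cumulative = np.cumsum(explained)

for k, (e, c) in enumerate(zip(explained, cumulative), start=1):
    print(f"PC{k}: {e:.3f} explained, {c:.3f} cumulative")
```

Because the simulated data have only two underlying factors, the first two PCs carry almost all of the variance and the remaining PCs mostly describe noise.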
What is a scores plot?
- Is a map of the samples
- Each point is one sample
- Similar samples cluster
What are 3D scores plots used for?
- Increased visualisation
- Plot differing combinations of PCs for enhanced discrimination
What are PCA loadings?
- The link between the scores plot and the chemistry of the samples
- Whereas a scores plot is a map of samples, a loadings plot is a map of variables; this is why PCA is more informative than CA
- Tells you why samples are grouping
- Can either be +ve or -ve
- Values indicate weightings for a particular variable
- Quicker to identify variables using PCA
- Objective
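A minimal sketch of reading loadings, assuming NumPy and PCA by SVD of mean-centred data. The variable names and values are hypothetical.

```python
import numpy as np

variables = ["var_A", "var_B", "var_C"]  # hypothetical variable names

# Hypothetical data: var_A and var_B rise together, var_C is nearly constant.
X = np.array([
    [1.0, 2.0, 5.0],
    [2.0, 4.1, 5.1],
    [3.0, 5.9, 4.9],
    [4.0, 8.0, 5.0],
    [5.0, 10.1, 5.1],
])

Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

# Each row of Vt holds the loadings of one PC: one signed weight per variable.
pc1_loadings = Vt[0]

for name, w in zip(variables, pc1_loadings):
    print(f"{name}: {w:+.3f}")
```

var_A and var_B receive large loadings of the same sign on PC1, while var_C is close to zero, so PC1 separates samples mainly on var_A and var_B. The overall sign of a PC is arbitrary, so only relative signs and magnitudes should be interpreted.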
What is LDA?
- Linear Discriminant Analysis
- Supervised chemometric technique (used when we have knowledge of the samples)
- Uses PCs to create classification rules
- Loadings are calculated to maximise separation between known groups
- PCs are termed discriminant functions
- Scores are called discriminant values
What does LDA predict?
- Looks for the most variation between samples and applies loading weight wherever variation is found
- Tries to minimise variation within groups and maximise separation between groups by prioritising loadings
- Similar samples are given similar discriminant values and those from different groups, different discriminant values
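The within-group versus between-group idea can be sketched with Fisher's two-group linear discriminant, assuming NumPy. This sketch works directly on the original variables rather than on PCs, and the data, the midpoint threshold and the `classify` helper are hypothetical.

```python
import numpy as np

# Hypothetical training data: two known groups, two variables each.
group_a = np.array([[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.1]])
group_b = np.array([[4.0, 4.5], [4.5, 4.2], [5.0, 4.8], [4.2, 4.4]])

mean_a, mean_b = group_a.mean(axis=0), group_b.mean(axis=0)

# Within-group scatter: how much samples vary around their own group mean.
Sw = ((group_a - mean_a).T @ (group_a - mean_a)
      + (group_b - mean_b).T @ (group_b - mean_b))

# Fisher's discriminant direction: maximises between-group separation
# relative to within-group scatter.
w = np.linalg.solve(Sw, mean_b - mean_a)

# Discriminant values: each sample projected onto the discriminant function.
scores_a = group_a @ w
scores_b = group_b @ w

# Simple classification rule: threshold halfway between the projected group means.
threshold = 0.5 * (mean_a @ w + mean_b @ w)

def classify(sample):
    """Assign a new sample to group A or B from its discriminant value."""
    return "B" if sample @ w > threshold else "A"

print(classify(np.array([1.4, 2.0])), classify(np.array([4.6, 4.4])))
```

Samples from the same group end up with similar discriminant values, and the two groups fall on opposite sides of the threshold.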
What are the disadvantages of chemometrics?
- Cannot compensate for bad data
- Sample size
- Can only reduce subjectivity not remove it
- Not a substitute for human interpretation
- Preprocessing is important but less is more
- Success in research doesn't mean success in casework