Chemometrics Flashcards

1
Q

What is chemometrics?

A
  • Computationally intensive
  • Multivariate (many variables) statistical analysis
  • Applied to chemical systems or processes
2
Q

What can chemometrics do?

A
  • Reduce complex datasets
  • Identify and quantify sample groupings
  • Optimise experimental parameters
  • Isolate important variables and identify covariance
  • Provide reproducible measures of data
  • Allow for better visualisation of data
3
Q

What is univariate?

A

A single variable: each variable is analysed on its own

4
Q

What is the disadvantage of univariate?

A

Too simplistic an approach for complex data
* May not reveal an outlier when each variable is examined on its own
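A minimal sketch of this point, assuming NumPy and made-up numbers: the suspect sample's values each look normal on their own, but together they break the correlation between the two variables. Mahalanobis distance is used here as one common multivariate measure; it is not named in the cards.

```python
import numpy as np

# Hypothetical reference samples: two strongly correlated variables
# (illustrative numbers only, not real measurements)
reference = np.array([
    [1.0, 1.1], [2.0, 2.0], [3.0, 2.9], [4.0, 4.1],
    [5.0, 5.0], [6.0, 6.1], [7.0, 6.9], [8.0, 8.0],
])
# Suspect sample: each value lies inside the normal range on its own,
# but the pair breaks the "variable 2 tracks variable 1" relationship
suspect = np.array([3.0, 7.0])

mean = reference.mean(axis=0)
std = reference.std(axis=0, ddof=1)

# Univariate view: z-score of each variable considered separately
z_scores = (suspect - mean) / std

# Multivariate view: Mahalanobis distance also uses the covariance
# between the variables
cov = np.cov(reference, rowvar=False)
diff = suspect - mean
mahalanobis = float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

print("univariate z-scores:", np.round(z_scores, 2))
print("Mahalanobis distance:", round(mahalanobis, 1))
```

Both z-scores stay small, while the Mahalanobis distance is very large, so the outlier only shows up when the variables are considered together.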

5
Q

What is covariance analysis used for?

A

Used to explore relationships between different variables to look for patterns
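A small NumPy sketch, assuming hypothetical data, of how relationships between variables can be measured: the covariance matrix, and the correlation matrix (covariance rescaled to the range -1 to +1).

```python
import numpy as np

# Hypothetical measurements: 5 samples (rows) x 3 variables (columns)
X = np.array([
    [1.0, 2.1, 9.0],
    [2.0, 3.9, 7.5],
    [3.0, 6.2, 6.1],
    [4.0, 8.1, 4.4],
    [5.0, 9.8, 3.0],
])

cov = np.cov(X, rowvar=False)        # covariance between each pair of variables
corr = np.corrcoef(X, rowvar=False)  # the same relationships scaled to -1..+1

print(np.round(corr, 2))
```

Here variable 1 and variable 2 rise together (correlation close to +1), while variable 3 falls as variable 1 rises (close to -1).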

6
Q

What are the four chemometric categories?

A
  • Design of Experiments
  • Exploratory Data Analysis
  • Classification
  • Regression
7
Q

What is Design of Experiments (DOE)?

A
  • Used to work out which collection method might be best
  • Relates to experimental setup
  • Will affect evidence collection, storage, instrument selection, parameter optimisation
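One common DOE layout is a full factorial design, where every combination of parameter levels is run once. The sketch below assumes that design and uses hypothetical instrument parameters; the cards do not specify a design type.

```python
from itertools import product

# Hypothetical instrument parameters and the levels to be tested
factors = {
    "temperature_C": [150, 200, 250],
    "flow_mL_min": [1.0, 1.5],
    "split_ratio": [10, 50],
}

# Full factorial design: every combination of levels is one experimental run
runs = [dict(zip(factors, levels)) for levels in product(*factors.values())]

print(len(runs), "runs")
for run in runs[:3]:
    print(run)
```

Three levels x two levels x two levels gives 12 runs; the number of runs grows quickly as factors are added.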
8
Q

What is regression analysis?

A
  • Based on y = mx + c linear relationship
  • Maps the effect of multiple independent variables (predictors) upon dependent variables (response)
  • Allows prediction of quantitative sample properties
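A minimal least-squares sketch of multiple linear regression with NumPy, extending y = mx + c to two predictors. The data are hypothetical and generated without noise as y = 2*x1 + 3*x2 + 1, so the fitted coefficients can be checked.

```python
import numpy as np

# Hypothetical predictors (e.g. two instrument responses) for five samples
X = np.array([[1.0, 0.0], [2.0, 1.0], [3.0, 1.0], [4.0, 3.0], [5.0, 2.0]])
# Hypothetical response: y = 2*x1 + 3*x2 + 1 (no noise)
y = 2 * X[:, 0] + 3 * X[:, 1] + 1

# Add a column of ones so the model can fit the intercept (the "c" in y = mx + c)
A = np.column_stack([X, np.ones(len(X))])
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)

# Predict the response of a new sample (x1 = 6, x2 = 2, plus the intercept term)
new_sample = np.array([6.0, 2.0, 1.0])
prediction = new_sample @ coeffs

print(np.round(coeffs, 3), round(float(prediction), 3))
```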
9
Q

What is Exploratory Data Analysis (EDA)?

A
  • Dimensionality reduction
  • Pattern recognition technique - identify grouping
  • Visualise trends that may otherwise have gone unnoticed
  • Determination of sample similarity in complex data
10
Q

What does an unsupervised technique mean?

A

Exploring the data without any prior assumptions or knowledge of the samples

11
Q

What is a supervised technique?

A

Building classification rules for known sample groupings (from EDA)

12
Q

What are the two most commonly used EDA techniques?

A
  • Cluster Analysis (CA)
  • Principal Component Analysis (PCA)
13
Q

What is Cluster Analysis (CA)?

A
  • Unsupervised technique
  • Samples grouped into clusters based on calculated distance (measure of their similarity)
  • Either agglomerative or hierarchical
  • Output is a dendrogram
  • Good initial technique - simplifies complex data
  • Not limited to quantitative data
  • Visualisation of relationships
  • Can tell you that there are groupings but not why
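The "calculated distance" can be sketched as a pairwise distance matrix. This assumes Euclidean distance, one common choice (the cards do not name a metric), and hypothetical data.

```python
import numpy as np

# Hypothetical samples (rows) measured on three variables (columns)
samples = np.array([
    [1.0, 2.0, 1.5],
    [1.1, 2.1, 1.4],
    [5.0, 6.0, 5.5],
    [5.2, 5.9, 5.6],
])

# Euclidean distance between every pair of samples:
# a small distance means the two samples are similar
diff = samples[:, None, :] - samples[None, :, :]
distances = np.sqrt((diff ** 2).sum(axis=-1))

print(np.round(distances, 2))
```

Samples 1 and 2 are close together, as are samples 3 and 4, so a clustering step would group them that way.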
14
Q

What is agglomerative?

A

Taking individual samples and grouping them together to form clusters
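A minimal pure-Python sketch of agglomerative clustering, assuming single linkage (cluster distance = closest pair of members), which is only one of several linkage rules. The merge list it returns is the information a dendrogram draws; the data are hypothetical.

```python
import math

def agglomerate(points):
    """Single-linkage agglomerative clustering (a minimal sketch).

    Starts with every sample in its own cluster and repeatedly merges the
    two closest clusters, recording each merge and its distance.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = clusters[a] + clusters[b]
        merges.append((sorted(merged), d))
        # Delete b first (b > a), so index a still points at the right cluster
        del clusters[b]
        clusters[a] = merged
    return merges

# Hypothetical 2-variable measurements for four samples
samples = [(1.0, 2.0), (1.1, 2.1), (5.0, 6.0), (5.2, 5.9)]
for members, height in agglomerate(samples):
    print(members, round(height, 2))
```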

15
Q

What is hierarchical (HCA)?

A

Opposite of agglomerative: taking a single cluster and splitting it down into individual samples (a top-down approach)
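A top-down sketch in pure Python. The splitting rule here (seed two sub-clusters with the two most distant members, then assign each sample to the nearer seed) is a simplified scheme assumed for illustration, not a standard published algorithm; the data are hypothetical.

```python
import math

def split(points, members):
    """One top-down step: split a cluster into two smaller clusters.

    Seeds the two new clusters with the pair of members that are farthest
    apart, then assigns every member to whichever seed is nearer.
    """
    a, b = max(
        ((i, j) for i in members for j in members if i < j),
        key=lambda pair: math.dist(points[pair[0]], points[pair[1]]),
    )
    left, right = [], []
    for k in members:
        if math.dist(points[k], points[a]) <= math.dist(points[k], points[b]):
            left.append(k)
        else:
            right.append(k)
    return left, right

def divide(points):
    """Start from one cluster of all samples and keep splitting."""
    splits = []
    stack = [list(range(len(points)))]
    while stack:
        members = stack.pop()
        if len(members) < 2:
            continue  # a single sample cannot be split further
        left, right = split(points, members)
        if not left or not right:
            continue  # identical samples: stop rather than loop forever
        splits.append((members, left, right))
        stack.extend([left, right])
    return splits

# Hypothetical 2-variable measurements for four samples
samples = [(1.0, 2.0), (1.1, 2.1), (5.0, 6.0), (5.2, 5.9)]
for parent, left, right in divide(samples):
    print(parent, "->", left, right)
```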

16
Q

What is PCA?

A
  • Principal Component Analysis
  • Unsupervised technique
  • Assesses all variables, decides which are relevant and then determines which variables are correlated
17
Q

What is a Principal Component (PC)?

A
  • A new axis, made from a weighted combination of the original variables, along which the algorithm finds correlated variation in the data
  • The PC that describes the largest variation between the samples will be given the highest priority (PC1)
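A minimal NumPy sketch, assuming hypothetical data: the PCs can be computed as the eigenvectors of the covariance matrix of the mean-centred data, with each eigenvalue giving the variance that PC describes. Sorting by eigenvalue puts the largest-variance direction first as PC1.

```python
import numpy as np

# Hypothetical data: 6 samples (rows) x 3 variables (columns)
X = np.array([
    [2.5, 2.4, 0.5],
    [0.5, 0.7, 0.4],
    [2.2, 2.9, 0.6],
    [1.9, 2.2, 0.5],
    [3.1, 3.0, 0.7],
    [2.3, 2.7, 0.4],
])

# Mean-centre each variable so the PCs describe variation around the mean
Xc = X - X.mean(axis=0)

# Eigenvectors of the covariance matrix are the PC directions;
# eigenvalues are the variance each PC describes
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# eigh returns ascending order: reverse it so PC1 has the largest variance
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
pcs = eigvecs[:, order]  # column k is PC(k+1)

print("variance per PC:", np.round(eigvals, 3))
print("PC1 direction:", np.round(pcs[:, 0], 3))
```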
18
Q

When are supervised techniques run?

A

Usually after EDA

19
Q

Why is more than one PC made?

A
  • If the first PC is not sufficient to describe the spread of data, the calculation is repeated to find PC2 (at right angles to PC1)
  • The process is continued until all the variability within the dataset has been accounted for and modelled
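The "repeat the calculation" idea can be sketched with power iteration plus deflation: find the direction of largest variance, subtract the variation it describes, then search the remainder for the next PC. This is one textbook way to compute PCs one at a time, assumed here for illustration; libraries usually use SVD instead. The data are randomly generated with a fixed seed.

```python
import numpy as np

def leading_direction(M, iterations=500, seed=0):
    """Power iteration: unit vector along which the rows of M vary most."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=M.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(iterations):
        v = M.T @ (M @ v)
        v /= np.linalg.norm(v)
    return v

def sequential_pcs(X, n_pcs):
    """Find PCs one at a time, removing each PC's contribution before the next."""
    residual = X - X.mean(axis=0)  # mean-centre
    pcs = []
    for _ in range(n_pcs):
        v = leading_direction(residual)
        pcs.append(v)
        # Deflate: subtract the variation already described by this PC
        residual = residual - np.outer(residual @ v, v)
    return np.array(pcs)

# Hypothetical data: 20 samples, 3 variables with very different spreads
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3)) * np.array([5.0, 2.0, 0.5])

pcs = sequential_pcs(X, n_pcs=2)
print(np.round(pcs, 3))
```

Because each new search runs on what is left after removing the earlier PCs, PC2 comes out at right angles to PC1.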
20
Q

How do you get the score of a PC?

A
  • The distance along each PC from the mean gives a sample its ‘score’
  • Each sample will have a different score for each PC
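A minimal NumPy sketch of scores, assuming hypothetical data: each score is the projection of a mean-centred sample onto a PC, i.e. its signed distance from the mean along that PC. The PC directions are taken from the SVD of the centred data; their signs are arbitrary, so a score's sign can flip between software packages.

```python
import numpy as np

# Hypothetical data: 5 samples x 2 variables
X = np.array([[2.0, 1.0], [4.0, 3.5], [6.0, 5.0], [8.0, 7.5], [5.0, 3.0]])
Xc = X - X.mean(axis=0)  # scores are measured from the mean

# PC directions are the rows of Vt
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

# Column k holds every sample's score on PC(k+1)
scores = Xc @ Vt.T
print(np.round(scores, 3))
```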
21
Q

What does an explained variance plot show?

A

How much of the total variance each PC describes, which indicates the optimum number of PCs; the majority of the information is usually in the first few PCs
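A NumPy sketch of explained variance, assuming hypothetical data built from two underlying factors plus a little noise. Each PC's share of the variance is proportional to its squared singular value. The 95% cumulative cut-off is an assumed rule of thumb, not one stated in the cards.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 30 samples x 5 variables, built so most variation
# comes from two underlying factors plus a little noise
factors = rng.normal(size=(30, 2))
mixing = rng.normal(size=(2, 5))
X = factors @ mixing + 0.05 * rng.normal(size=(30, 5))

Xc = X - X.mean(axis=0)
singular_values = np.linalg.svd(Xc, compute_uv=False)

# Fraction of the total variance described by each PC
explained = singular_values**2 / np.sum(singular_values**2)
cumulative = np.cumsum(explained)

for k, (e, c) in enumerate(zip(explained, cumulative), start=1):
    print(f"PC{k}: {e:.1%} (cumulative {c:.1%})")

# Assumed rule of thumb: keep enough PCs to reach 95% of the variance
n_keep = int(np.searchsorted(cumulative, 0.95) + 1)
print("PCs needed for 95%:", n_keep)
```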

22
Q

What is a scores plot?

A
  • A map of the samples
  • Each point is one sample
  • Similar samples cluster
23
Q

What are 3D scores plots used for?

A
  • Increased visualisation
  • Plot differing combinations of PCs for enhanced discrimination
24
Q

What are PCA loadings?

A
  • The link between the scores plot and the chemistry of the samples
  • Whereas a scores plot is a map of the samples, a loadings plot is a map of the variables; this is why PCA is more informative than CA
  • Tells you why samples are grouping
  • Can either be +ve or -ve
  • Values indicate weightings for a particular variable
  • Quicker to identify variables using PCA
  • Objective
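A NumPy sketch of reading loadings, assuming hypothetical variables named peak_A, peak_B and peak_C. The loadings for PC1 are the weights of each variable in that PC; the largest absolute weight marks the variable that drives the grouping, and opposite signs mark variables that move in opposite directions.

```python
import numpy as np

# Hypothetical data: 6 samples x 3 named variables
variables = ["peak_A", "peak_B", "peak_C"]
X = np.array([
    [1.0, 10.0, 5.0],
    [1.1, 20.0, 4.0],
    [0.9, 30.0, 3.0],
    [1.0, 40.0, 2.0],
    [1.2, 50.0, 1.0],
    [0.8, 60.0, 0.0],
])

Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
loadings_pc1 = Vt[0]  # weight of each variable in PC1

# Rank variables by the size of their PC1 loading (sign shows direction)
order = np.argsort(np.abs(loadings_pc1))[::-1]
for i in order:
    print(f"{variables[i]}: {loadings_pc1[i]:+.3f}")
```

In this made-up example peak_B dominates PC1, and peak_C has the opposite sign because it decreases as peak_B increases.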
25
Q

What is LDA?

A
  • Linear Discriminant Analysis
  • Supervised chemometric technique (used when the sample groups are already known)
  • Uses PCs to create classification rules
  • Loadings are calculated to maximise separation between known groups
  • The equivalents of PCs are termed discriminant functions
  • The equivalents of scores are called discriminant values
26
Q

What does LDA predict?

A
  • Looks for the greatest variation between samples and applies loading weights to the variables where that variation is found
  • Minimises variation within groups while maximising separation between groups, by prioritising the loadings accordingly
  • Similar samples are given similar discriminant values; samples from different groups are given different discriminant values
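A minimal sketch of two-group LDA (Fisher's discriminant) in NumPy, with hypothetical data. The cards describe LDA built on PCs; for brevity this sketch applies it directly to two raw variables, but PC scores could be passed in the same way. The halfway threshold is one simple classification rule, assumed here.

```python
import numpy as np

# Hypothetical training data: two known groups, 2 variables each
group_a = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.8, 2.1]])
group_b = np.array([[3.0, 3.5], [3.4, 3.9], [2.9, 3.2], [3.3, 3.6]])

mean_a, mean_b = group_a.mean(axis=0), group_b.mean(axis=0)

# Within-group scatter: how much samples vary inside their own group
s_within = np.cov(group_a, rowvar=False) + np.cov(group_b, rowvar=False)

# Discriminant direction: maximises between-group separation
# relative to within-group variation
w = np.linalg.solve(s_within, mean_b - mean_a)

# Discriminant values: projection of each sample onto w
dv_a = group_a @ w
dv_b = group_b @ w

# Simple classification rule: threshold halfway between the group means
threshold = (mean_a @ w + mean_b @ w) / 2

def classify(sample):
    """Assign a new sample to group A or B by its discriminant value."""
    return "B" if sample @ w > threshold else "A"

print(classify(np.array([1.1, 2.0])), classify(np.array([3.2, 3.5])))
```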
27
Q

What are the disadvantages of chemometrics?

A
  • Cannot compensate for bad data
  • Sample size (too few samples limits how reliable the model is)
  • Can only reduce subjectivity, not remove it
  • Not a substitute for human interpretation
  • Preprocessing is important, but less is more
  • Success in research doesn't mean success in casework