Lecture 6: Chemometrics Flashcards

1
Q

What is chemometrics?

A

Computationally intensive, multivariate (many variables) statistical analysis, applied to chemical systems or processes.

2
Q

What can chemometrics do?

A
  • Reduce complex datasets
  • Identify and quantify sample groupings
  • Optimise experimental parameters
  • Isolate important variables and identify covariance
  • Provide reproducible measures of data
  • Allow better visualisation of data
  • Reduce subjectivity, although it is not wholly objective, because humans still need to interpret the data
3
Q

When did chemometrics first get used in journals?

A

It did not appear in journals until the 1980s, owing to scepticism.

4
Q

Where is chemometrics routinely used?

A

Routinely used in industry (e.g. the food and pharmaceutical industries) for process optimisation and quality control, to maximise output and quality at minimal cost.

5
Q

When did chemometrics start getting used in forensic science?

A

2009

6
Q

What highlighted the need for chemometrics?

A

The 2009 NAS (US National Academy of Sciences) report highlighted the need for a ‘statistical framework’ in forensic science.

7
Q

What are the categories of chemometrics?

A
  • Design of experiments
  • Exploratory data analysis
  • Regression
  • Classification
8
Q

What is the purpose of design of experiments (DOE)?

A
  • Make experiments more effective
  • Obtain the maximum information from each experiment
  • Improve efficiency, quality and reproducibility
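A full factorial design is one common DOE layout: every combination of every factor level is run once, so the effect of each factor (and their interactions) can be estimated. The sketch below is an illustration only; the factor names and levels are hypothetical, and real DOE work often uses fractional or response-surface designs to cut the number of runs.

```python
from itertools import product

# Hypothetical factors, each tested at a low and a high level
factors = {
    "temperature_C": (25, 40),
    "pH": (5, 7),
    "time_min": (10, 30),
}

def full_factorial(levels: dict) -> list[dict]:
    """Return every combination of factor levels, one dict per experimental run."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*levels.values())]

design = full_factorial(factors)
for run, settings in enumerate(design, start=1):
    print(run, settings)
```

With 3 factors at 2 levels each, this gives 2 × 2 × 2 = 8 runs.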
9
Q

What is regression analysis?

A
  • Chemometric version of a calibration curve
  • Extends the linear relationship y = mx + c to multiple variables (y = m1x1 + m2x2 + … + c)
  • Maps the effect of multiple independent variables (predictors) upon a dependent variable (response)
  • Allows prediction of quantitative sample properties
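A minimal sketch of multiple linear regression as a "multivariate calibration curve", assuming ordinary least squares with NumPy. The data are hypothetical and noise-free, generated from y = 2x1 + 3x2 + 1, so the fit should recover those coefficients exactly; chemometric practice often uses related methods such as PLS instead.

```python
import numpy as np

# Hypothetical calibration set: 6 samples, 2 predictors (e.g. two peak intensities)
X = np.array([
    [1.0, 0.5],
    [2.0, 1.0],
    [3.0, 0.5],
    [4.0, 2.0],
    [5.0, 1.5],
    [6.0, 3.0],
])
# Response generated from y = 2*x1 + 3*x2 + 1 (no noise, for illustration)
y = 2 * X[:, 0] + 3 * X[:, 1] + 1

# Append a column of ones so the intercept c is fitted alongside the slopes m1, m2
A = np.column_stack([X, np.ones(len(X))])
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
m1, m2, c = coeffs
print(f"m1={m1:.2f}, m2={m2:.2f}, c={c:.2f}")

# Predict the response of a new, unseen sample (trailing 1.0 multiplies the intercept)
new_sample = np.array([3.5, 1.2, 1.0])
y_pred = new_sample @ coeffs
print(f"predicted y = {y_pred:.2f}")
```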
10
Q

What is dimensionality reduction?

A

Reducing data that has many variables to just a few new variables, called principal components.
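A sketch of dimensionality reduction using PCA computed via singular value decomposition (SVD) in NumPy. The dataset is hypothetical: 20 samples × 5 variables built from only 2 underlying factors plus a little noise, so 2 principal components should capture almost all of the variation.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project mean-centred data onto its first n principal components (via SVD)."""
    Xc = X - X.mean(axis=0)                    # centre each variable
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T          # sample coordinates on the PCs
    explained = S**2 / np.sum(S**2)            # fraction of variance per PC
    return scores, explained

rng = np.random.default_rng(0)
# Hypothetical data: 5 measured variables driven by 2 hidden factors
latent = rng.normal(size=(20, 2))
mixing = rng.normal(size=(2, 5))
data = latent @ mixing + 0.01 * rng.normal(size=(20, 5))

scores, explained = pca_reduce(data, 2)
print(data.shape, "->", scores.shape)
print("variance explained by PC1 + PC2:", round(explained[:2].sum(), 4))
```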

11
Q

Which form of chemometrics uses dimensionality reduction?

A

Exploratory Data Analysis (EDA)

12
Q

What are the main features of EDA?

A
  • Dimensionality reduction
  • Pattern recognition technique
  • Visualisation of trends that might otherwise go unnoticed
  • Determination of sample similarity in complex data
  • Unsupervised
13
Q

What does a pattern recognition technique do?

A

It identifies groupings within data.

14
Q

What is an unsupervised technique?

A

Exploring the data without any prior assumptions or knowledge of the samples

15
Q

What is a supervised technique?

A

One that builds classification rules from samples whose groupings (and how they compare) are already known; the rules can then be used to classify new samples.
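To illustrate the supervised idea, here is a nearest-centroid classifier: a deliberately simple stand-in, not a method named in the lecture (common chemometric choices include LDA). The training samples and their group labels are hypothetical; the "rule" is one mean vector per known group, and a new sample is assigned to the closest one.

```python
import numpy as np

# Hypothetical training data: samples whose group membership is already known
X_train = np.array([[1.0, 1.2], [0.8, 1.0], [1.1, 0.9],   # group "A"
                    [4.0, 4.2], [4.3, 3.9], [3.8, 4.1]])  # group "B"
y_train = np.array(["A", "A", "A", "B", "B", "B"])

def fit_centroids(X, labels):
    """Build the classification rule: one mean vector (centroid) per known group."""
    return {str(g): X[labels == g].mean(axis=0) for g in np.unique(labels)}

def predict(centroids, X_new):
    """Assign each new sample to the group whose centroid is nearest (Euclidean)."""
    groups = list(centroids)
    dists = np.array([[np.linalg.norm(x - centroids[g]) for g in groups] for x in X_new])
    return [groups[i] for i in dists.argmin(axis=1)]

centroids = fit_centroids(X_train, y_train)
print(predict(centroids, np.array([[1.0, 1.0], [4.1, 4.0]])))
```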

16
Q

What are the most commonly used EDA techniques?

A
  • Cluster Analysis (CA)
  • Principal Component Analysis (PCA)
17
Q

What type of technique is cluster analysis?

A
  • Unsupervised
  • Form of EDA
18
Q

What does cluster analysis do?

A

Samples grouped into clusters based on calculated distance (similarity)
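Euclidean distance is one common choice for the "calculated distance"; other metrics are also used. A small sketch with hypothetical data (4 samples × 3 variables), computing every pairwise distance with NumPy broadcasting. Smaller distance means more similar samples.

```python
import numpy as np

# Hypothetical data: 4 samples x 3 variables (e.g. peak areas)
samples = np.array([
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 2.9],
    [5.0, 6.0, 7.0],
    [5.2, 5.9, 7.1],
])

# Pairwise Euclidean distance matrix: D[i, j] = ||x_i - x_j||
diff = samples[:, None, :] - samples[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))
print(np.round(D, 2))
```

Samples 1 and 2 (and samples 3 and 4) sit close together, so a clustering algorithm would be expected to group them.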

19
Q

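What are the two types of hierarchical cluster analysis?

A

Agglomerative (bottom-up)
Divisive (top-down)

20
Q

What is the agglomerative technique?

A
  • This is when you go from individual samples to clusters: each sample starts as its own cluster and the most similar clusters are merged step by step
  • As merging continues, the clusters being joined become more dissimilar (the merge distance increases)
  • On the dendrogram, samples joined close to 0 are the most similar
21
Q

What is the divisive technique?

A

This is when you go from clusters to individual samples: all samples start in one cluster, which is split step by step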
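The agglomerative (bottom-up) procedure can be sketched from scratch. This assumes single linkage (cluster distance = distance between their closest members) and Euclidean distance; other linkage rules exist. The 1-D data are hypothetical. Note how each merge happens at a larger distance than the last, matching "the samples become more dissimilar".

```python
import numpy as np

def agglomerate(X):
    """Naive single-linkage agglomerative clustering.

    Starts with every sample in its own cluster and repeatedly merges the two
    closest clusters, recording each merged cluster and its merge distance.
    """
    clusters = [[i] for i in range(len(X))]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: distance between the closest pair of members
                d = min(D[i, j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a] + clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return merges

X = np.array([[0.0], [0.1], [1.0], [1.2], [5.0]])  # hypothetical samples
for members, dist in agglomerate(X):
    print(members, round(float(dist), 2))
```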

22
Q

How is data from cluster analysis presented?

A

On a dendrogram
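A sketch of building the dendrogram data with SciPy's `scipy.cluster.hierarchy`, using single linkage and the same hypothetical samples as above. `no_plot=True` returns the tree layout without drawing it; omitting it (with matplotlib installed) draws the dendrogram, where the height of each join is its merge distance.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.array([[0.0], [0.1], [1.0], [1.2], [5.0]])  # hypothetical samples
Z = linkage(X, method="single", metric="euclidean")

# Each row of Z: [cluster i, cluster j, merge distance, samples in new cluster]
print(Z)

tree = dendrogram(Z, no_plot=True, labels=["S1", "S2", "S3", "S4", "S5"])
print(tree["ivl"])  # leaf order along the dendrogram's horizontal axis
```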

23
Q

What do analysts need to decide for cluster analysis?

A

They must decide on stopping rules (e.g. where to cut the dendrogram), which determine the number of clusters; this choice is somewhat arbitrary.
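One way to apply a stopping rule is to cut the dendrogram at a chosen distance, e.g. with SciPy's `fcluster(..., criterion="distance")`. The cut heights below are hypothetical analyst choices; the point is that the same data give a different number of clusters depending on where the line is drawn.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[0.0], [0.1], [1.0], [1.2], [5.0]])  # hypothetical samples
Z = linkage(X, method="single")

# Cut the tree at several heights (the analyst's "stopping point")
for cut in (0.15, 0.5, 2.0, 4.0):
    labels = fcluster(Z, t=cut, criterion="distance")
    print(f"cut at {cut}: {labels.max()} clusters, labels = {labels}")
```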

23
Q

What is the problem with cluster analysis?

A

It can only tell you that groupings exist, not why.

24
Q

Why isn’t cluster analysis entirely objective?

A

Because the analyst decides the stopping point, and therefore the number of clusters.

25
Q

What type of technique is principal component analysis?

A
  • Unsupervised technique
  • Form of EDA
  • Dimensionality reduction technique
26
Q

What does PCA do?

A

Assesses all the variables within a dataset (e.g. a spectrum), decides which are relevant, and then determines which variables are correlated.

27
Q

What does the algorithm do in PCA when it finds a correlation?

A

Wherever the algorithm finds correlation, it defines this as a Principal Component (PC): a new variable that combines the correlated original variables.

28
Q

Which PC is given the highest priority?

A

The PC that describes the largest variation between samples is given the highest priority, i.e. PC1.
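A sketch showing both ideas (correlated variables combining into one PC, and PCs ranked by variance), assuming PCA via NumPy's SVD, which returns singular values largest first. The data are hypothetical: variables 1 and 2 are strongly correlated, while variable 3 is independent, smaller-scale noise.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: x2 is almost a multiple of x1; x3 is unrelated noise
x1 = rng.normal(size=50)
x2 = 2 * x1 + 0.1 * rng.normal(size=50)
x3 = 0.5 * rng.normal(size=50)
X = np.column_stack([x1, x2, x3])

Xc = X - X.mean(axis=0)                 # centre each variable
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / np.sum(S**2)         # variance fraction per PC, largest first
pc1_loadings = Vt[0]                    # weight of each original variable in PC1

print("variance explained:", np.round(explained, 3))
print("PC1 loadings:", np.round(pc1_loadings, 3))
```

PC1 is dominated by the correlated pair x1 and x2 and carries most of the variance; the independent variable x3 contributes almost nothing to it.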

29
Q
A