Chemometrics Flashcards

1
Q

What is chemometrics?

A

The application of computationally intensive, multivariate statistical analysis to chemical systems or processes

2
Q

Why is chemometrics useful?

A

reduces large data sets
identifies sample groupings
allows for better visualisation of data
isolates important variables and identifies covariance

3
Q

EDA (exploratory data analysis)

A

a pattern recognition technique (identifies groupings and visualises trends)
unsupervised technique
- provides the basis on which supervised techniques are then built.

4
Q

CA (cluster analysis)

A

groups samples into clusters based on similarity
- Agglomerative (bottom-up) - individual samples -> clusters
- Divisive (top-down) - one cluster -> individual samples
Both routes are hierarchical: they build a nested tree of clusters.
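
The agglomerative route can be sketched in a few lines of plain Python. This is only an illustration, assuming single-linkage (nearest-neighbour) distance and Euclidean distance between samples; the function name `agglomerate` and the four sample points are hypothetical, not taken from the cards.

```python
from itertools import combinations
import math

def agglomerate(points):
    """Single-linkage agglomerative clustering; returns the merge history."""
    clusters = [[i] for i in range(len(points))]  # every sample starts alone
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the two clusters whose closest members are nearest to each other
        for a, b in combinations(range(len(clusters)), 2):
            d = min(math.dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    return merges

# Hypothetical samples: two tight pairs that sit far apart
samples = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5)]
history = agglomerate(samples)
```

Each entry in `history` records which two clusters were joined and at what distance, which is the information a dendrogram draws as branch heights.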

5
Q

What is the output from cluster analysis?

A

a dendrogram - simplifies complex data, but shows only that the groupings exist, not why they exist.

6
Q

PCA (principal component analysis)

A
  1. identifies the features of a data set that are relevant
  2. the direction of largest variation is given the highest priority (PC1)
  3. if PC1 is not sufficient to describe the data, the process is repeated on the remaining variation to give PC2
  4. this continues until all the data is modelled (ideal = as much variation in as few PCs as possible)
    data = structure + noise
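
The steps above can be sketched with NumPy. This is a minimal illustration, assuming PCA is computed by singular value decomposition of the mean-centred data; the simulated data set (one underlying factor plus small random noise) and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 50 samples x 4 variables driven by one latent factor plus noise
latent = rng.normal(size=(50, 1))
X = latent @ np.array([[1.0, 0.8, -0.6, 0.1]]) + 0.05 * rng.normal(size=(50, 4))

Xc = X - X.mean(axis=0)                      # mean-centre each variable
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * s                               # sample coordinates along each PC
loadings = Vt.T                              # variable weights, one column per PC
explained = s**2 / np.sum(s**2)              # fraction of variance per PC, PC1 first

k = 1                                        # number of PCs kept
structure = scores[:, :k] @ loadings[:, :k].T
noise = Xc - structure                       # whatever the kept PCs do not model
```

Here `structure + noise` reproduces the centred data exactly, and raising `k` moves more of the variation from `noise` into `structure`.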
7
Q

What is a principal component?

A

a linear combination of the original variables

8
Q

Explained variance

A

the explained variance (EV) plot is also known as a scree plot
the structure = the explained variance and is described by the PCs
the EV plot reveals the optimum number of PCs
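
As a rough sketch of how the number of PCs might be read from the explained variance, the snippet below picks the smallest number of PCs whose cumulative explained variance reaches a chosen cut-off. The cut-off rule, the function names, and the simulated two-factor data are assumptions for illustration; in practice the choice is often made by eye from the plot.

```python
import numpy as np

def explained_variance(X):
    """Fraction of total variance captured by each PC of the mean-centred data."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values, largest first
    return s**2 / np.sum(s**2)

def n_components_for(ev, threshold=0.95):
    """Smallest number of PCs whose cumulative explained variance reaches threshold."""
    return int(np.searchsorted(np.cumsum(ev), threshold) + 1)

# Hypothetical data: two strong directions of variation plus weak noise
rng = np.random.default_rng(1)
t = rng.normal(size=(100, 2)) * np.array([3.0, 1.5])
mixing = np.array([[1.0, 0.5, 0.0, 0.2, 0.0],
                   [0.0, 0.3, 1.0, 0.0, 0.4]])
X = t @ mixing + 0.01 * rng.normal(size=(100, 5))

ev = explained_variance(X)
n_pcs = n_components_for(ev, threshold=0.99)
```

Plotting `ev` against PC number gives the scree plot itself; the point where the curve flattens marks where the PCs start describing noise rather than structure.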

9
Q

What are scores?

A

the distance along each PC that a sample lies from the mean
each sample is mapped onto a SCORES PLOT
(can be 3D = increased visualisation)

10
Q

PCA loadings

A

the coefficients/weights attached to the variables, so they map the VARIABLES
they are the link between the scores plot and the chemistry of the samples

they show how much each original variable contributes to each PC.
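
The relationship between scores and loadings can be illustrated with a short NumPy sketch. It assumes PCA by singular value decomposition of mean-centred data; the small data matrix (six samples, three variables) is made up for the example.

```python
import numpy as np

# Hypothetical data: 6 samples x 3 variables (e.g. three peak intensities)
X = np.array([[2.0, 4.1, 1.0],
              [3.0, 6.2, 1.1],
              [4.0, 7.9, 0.9],
              [5.0, 10.1, 1.0],
              [6.0, 12.0, 1.2],
              [7.0, 13.9, 0.8]])

Xc = X - X.mean(axis=0)                 # distances from the mean
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
loadings = Vt.T                         # rows = variables, columns = PCs
scores = Xc @ loadings                  # rows = samples, columns = PCs

# A PC1 score is the weighted sum of one sample's centred variables
pc1_score_sample0 = np.dot(Xc[0], loadings[:, 0])

# Variable contributing most to PC1 (largest absolute loading)
top_variable = int(np.argmax(np.abs(loadings[:, 0])))
```

The sign of a loading column is arbitrary, which is why the absolute value is used when ranking variables.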

11
Q

LDA (linear discriminant analysis)

A

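supervised technique
uses PC-like linear combinations of the variables to create classification rules
the equivalents of PCs = ‘discriminant functions’
the equivalents of scores = ‘discriminant values’
tries to minimise variation within groups and maximise separation between groups by prioritising the loadings that discriminate best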
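
A minimal sketch of the idea, assuming the simplest case of two groups and Fisher's linear discriminant computed directly on the original variables with NumPy. The function name `fisher_lda`, the midpoint threshold rule, and the simulated groups are hypothetical; this is not necessarily the variant the cards have in mind.

```python
import numpy as np

def fisher_lda(X_a, X_b):
    """Two-class Fisher discriminant: returns the direction w and a midpoint threshold."""
    m_a, m_b = X_a.mean(axis=0), X_b.mean(axis=0)
    # Pooled within-group scatter: small scatter along w means tight groups
    S_w = (X_a - m_a).T @ (X_a - m_a) + (X_b - m_b).T @ (X_b - m_b)
    w = np.linalg.solve(S_w, m_a - m_b)      # discriminant function weights
    threshold = 0.5 * (m_a + m_b) @ w        # halfway between the projected means
    return w, threshold

# Hypothetical training data: two labelled groups measured on two variables
rng = np.random.default_rng(2)
group_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(30, 2))
group_b = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(30, 2))

w, threshold = fisher_lda(group_a, group_b)
values_a = group_a @ w                       # discriminant values for group A
values_b = group_b @ w                       # discriminant values for group B
predicted_a = values_a > threshold           # classification rule: above threshold -> A
```

Samples from the same group end up with similar discriminant values, so a single threshold on those values acts as the classification rule.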

12
Q

LDA discriminant values

A

samples in the same group have similar discriminant values (scores), and vice versa

13
Q

caveats

A
  • pre-processing is important
  • doesn’t compensate for bad data
  • potential for cognitive bias
  • never a substitute for human interpretation