Lecture 6&7 - Chemometrics Flashcards

1
Q

what is chemometrics

A

a multivariate statistical analysis that is computationally intensive and is applied to chemical systems or processes to find patterns and trends in the data

2
Q

what does multivariate mean

A

analyses multiple variables

3
Q

what does computationally intensive mean

A

the need for a computer and algorithms

4
Q

name 6 things chemometrics can help us do

A
  • reduce complex datasets
  • identify and quantify sample groups
  • optimise our experimental parameters
  • identify covariance and pick out important variables
  • give reproducible measures of data
  • visualise the data (a picture is better)
5
Q

what is meant by covariance

A

which parts of the data are associated/not independent
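As a rough illustration of covariance, the sketch below computes the sample covariance between pairs of variables. The peak intensities are made-up numbers and the function name is an assumption for this example only.

```python
def covariance(xs, ys):
    """Sample covariance of two equally long lists of measurements."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical peak intensities for five samples (invented for illustration).
peak_a = [1.0, 2.0, 3.0, 4.0, 5.0]
peak_b = [2.0, 4.0, 6.0, 8.0, 10.0]  # rises together with peak_a
peak_c = [3.0, 1.0, 2.0, 1.0, 3.0]   # no linear association with peak_a

cov_ab = covariance(peak_a, peak_b)  # large and positive: associated
cov_ac = covariance(peak_a, peak_c)  # zero: no linear association
```

A covariance far from zero suggests the two variables are not independent; a value near zero suggests no linear association.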

6
Q

what effect does chemometrics have on subjectivity

A

reduces it in the analysis of data but does not eliminate it completely, as humans still interpret the data

7
Q

what can chemometrics reveal that may not have been noticed

A

underlying/non-obvious trends between variables

8
Q

what is the main aim of chemometrics

A

to maximise output and quality with minimal cost

9
Q

chemometrics has been used in forensic science since 2009 - what benefits has this provided

A

improved efficiency in forensic workflow
better-quality use of resources for forensic purposes

10
Q

how is chemometrics applicable to forensic science (6)

A

statistical framework

replaces the use of terms such as ‘unique’ and ‘match’ without statistics to support them (subjectivity)

can counteract bias

quicker than manual data interpretation

don’t need an expert but someone does need to be able to interpret the output data

can predict trace behaviour

11
Q

what can be identified using multivariate analysis that may not be seen in univariate analysis

A

outliers

12
Q

what is univariate analysis

A

analysis that only considers one variable

13
Q

give an example of where multivariate analysis may be beneficial in forensic analysis and the variables that could be considered

A

pollen dispersion
considering time of year and weather

fingermarks
considering sweatiness of someone and weather conditions

14
Q

what are the 4 broad categories of chemometrics

A

experiment design (DOE)
exploratory data analysis (EDA - what is the data showing me)
classification
regression

15
Q

what does the DOE (design of experiments) affect in forensic science

A

evidence collection, storage, analysis instrument selection and optimisation

16
Q

what can the DOE part of chemometrics be used to streamline in forensics in the future

A

efficiency, quality and reproducibility by establishing optimised workflows

17
Q

what does the regression part of chemometrics involve

A

a version of calibration curves based on a linear y = mx + c relationship, which maps the effect of multiple independent variables

we can make educated predictions based on the curve
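A minimal sketch of the calibration-curve idea, assuming the simplest case of a single independent variable (the multivariate version adds further x terms). The concentrations, signals and function name are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + c; returns (m, c)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


# Hypothetical calibration standards: known concentration vs instrument signal.
conc = [0.0, 1.0, 2.0, 3.0, 4.0]
signal = [0.5, 2.5, 4.5, 6.5, 8.5]

m, c = fit_line(conc, signal)

# Educated prediction: read an unknown sample's concentration off the line.
unknown_signal = 5.5
predicted_conc = (unknown_signal - c) / m
```

The prediction is only trustworthy inside the range covered by the calibration standards.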

18
Q

What happens in the EDA part of chemometrics

A

data is reduced by an algorithm that looks at how variables correspond

identifies groupings of samples in complex datasets

visualises trends

making the data more manageable

19
Q

EDA is an unsupervised technique - what does this mean

A

it explores the data without any prior assumptions or knowledge of the samples (reducing human bias and subjectivity)

20
Q

what is a supervised technique

A

the building of classification rules for grouping samples together - done from EDA analysis

21
Q

name the two most commonly used EDA techniques

A

CA = cluster analysis
PCA = principal component analysis

22
Q

name 3 features of cluster analysis (CA)

A

unsupervised technique

samples are grouped into clusters based on a measure of similarity (a calculated distance)

the output is a dendrogram

23
Q

what are the two types of cluster analysis

A

ACA - agglomerative cluster analysis
HCA - hierarchical cluster analysis

24
Q

what is agglomerative cluster analysis (ACA)

A

individual samples are grouped into clusters
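The bottom-up grouping can be sketched as below. This is an illustrative single-linkage version (one of several possible similarity measures, chosen here as an assumption), with made-up sample coordinates. The recorded merge distances correspond to the heights at which branches would join on a dendrogram.

```python
from math import dist


def agglomerate(points):
    """Single-linkage agglomerative clustering.

    Returns the merges in order as (distance, cluster_a, cluster_b),
    where each cluster is a tuple of sample labels.
    """
    clusters = {(label,): [xy] for label, xy in points.items()}
    merges = []
    while len(clusters) > 1:
        best = None
        keys = sorted(clusters)
        # Find the two clusters whose closest members are nearest together.
        for i, a in enumerate(keys):
            for b in keys[i + 1:]:
                d = min(dist(p, q) for p in clusters[a] for q in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[a + b] = clusters.pop(a) + clusters.pop(b)
        merges.append((d, a, b))
    return merges


# Hypothetical samples described by two measured variables each.
samples = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (5.0, 0.0), "D": (5.0, 2.0)}
merges = agglomerate(samples)
```

Here A and B merge first (smallest distance, so most similar), then C and D, and the two pairs only join at a much larger distance. Where to stop merging is the analyst's choice.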

25
Q

what is hierarchical cluster analysis (HCA)

A

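clusters are split into individual samples (this top-down approach is more commonly called divisive clustering; the term hierarchical cluster analysis usually covers both agglomerative and divisive methods)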

26
Q

in cluster analysis which samples are the most similar

A

the ones joined closest to zero on the distance axis of the dendrogram (the smallest calculated distance)

27
Q

how are the number of clusters decided in cluster analysis

A

the analyst decides on the stopping rules - where the subjectivity is introduced

28
Q

what is a limitation of cluster analysis

A

the visualisation of relationships is made clear but it cannot tell WHY these groupings occur

29
Q

what happens in PCA analysis

A

an algorithm

an unsupervised technique

assesses all variables in a dataset and decides which are relevant and correlated

30
Q

in PCA when the algorithm finds a correlation what is this defined as

A

a Principal Component (PC)

31
Q

what does the PC describe in PCA analysis

A

the largest variation between samples with PC1 being given the highest priority

32
Q

For ‘good’ data how many principal components would you expect to see in PCA

A

more than 1

multiple PCs are identified by the algorithm until all the variability within the dataset has been modelled

33
Q

what is data comprised of

A

structure and noise or model and error

where data = spectra and chromatograms

structure/model = useful info (explained variance)

noise/error = not useful info to data interpretation (instrument noise, lab temp fluctuation) (residual variance)

34
Q

how are principal components represented in PCA

A

a straight line = a linear combination of the original variables

35
Q

in PCA what assigns a sample a ‘score’

A

the distance along the PC line from the mean (can be positive or negative)

each sample has a different score for each principal component
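A minimal sketch of how scores arise, assuming mean-centred data and using power iteration to find PC1 only (a simple stand-in for the full decomposition a chemometrics package would perform). The data and function names are invented for illustration.

```python
def mean_centre(rows):
    """Subtract each variable's mean so scores are measured from the mean."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def covariance_matrix(centred):
    n = len(centred)
    p = len(centred[0])
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_pc(cov, iterations=200):
    """Power iteration: converges to the direction of largest variance."""
    v = [1.0] * len(cov)
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


# Hypothetical measurements: five samples, two variables.
data = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2], [5.0, 9.8]]

centred = mean_centre(data)
loadings = first_pc(covariance_matrix(centred))  # weights of each variable in PC1
# Score = signed distance of each sample along the PC1 line from the mean.
scores = [sum(v * w for v, w in zip(row, loadings)) for row in centred]
```

Samples below the mean along PC1 get negative scores and those above get positive scores, so the scores sum to zero.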

36
Q

how many PCs can a model have in PCA

A

as many as the number of original variables (in practice the number of samples also limits how many PCs can be extracted)

37
Q

what type of plot reveals the optimum number of PCs for a given number of variables in PCA

A

an explained variance or scree plot

the % on this plot (y axis) is the variance in the data set

keep on adding PCs until you are adding less than 1% variance as here you are likely to be looking at noise

this is a poor method of deciding how many PCs to use
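The 1% rule of thumb can be sketched as follows, bearing in mind that the card itself flags it as a crude method. The PC variances are hypothetical and the function names are assumptions for this example.

```python
def explained_variance_percent(pc_variances):
    """Convert each PC's variance into a percentage of the total variance."""
    total = sum(pc_variances)
    return [100.0 * v / total for v in pc_variances]


def pcs_to_keep(percentages, threshold=1.0):
    """Count leading PCs that each add at least `threshold` % variance."""
    kept = 0
    for pct in percentages:
        if pct < threshold:
            break  # from here on the PCs are likely modelling noise
        kept += 1
    return kept


# Hypothetical variances captured by PC1..PC5, largest first.
pc_variances = [6.0, 3.0, 0.95, 0.04, 0.01]

percentages = explained_variance_percent(pc_variances)
n_keep = pcs_to_keep(percentages)
```

With these invented numbers the first three PCs each explain more than 1% of the variance, so three would be kept under this rule.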

38
Q

what is a better method to use than scree/explained variance plots

A

scores plots

39
Q

what is a scores plot

A

a map of the samples where each data point is a sample and similar samples form clusters

samples closer to each other are more similar and vice versa

the x and y axes correspond to two PCs; these can be any pair

plotting different PCs against each other gives different clusters

40
Q

how can you tell by looking at scores plots that a PC is modelling noise and not useful information

A

if plotting PC1 vs PC4 and then PC1 vs PC5 gives very similar sample clusters then it is likely that PC5 is just modelling noise and isn’t adding any value to the data

41
Q

what is a benefit of PCA over CA

A

PCA can tell you why samples cluster but CA can’t

42
Q

scores plots however do not always reveal immediate trends so what type of plot can be used instead

A

skittles plots

43
Q

what is a skittles plot

A

again this is plotting PCs against each other e.g. PC1 vs PC2

samples are then grouped according to predefined categories e.g. creams, liquids, loose powders, pressed powders and mousses

44
Q

what is the benefit of 3D scores plots

A

for even better data visualisation

different combinations of PCs can be plotted for better groupings

45
Q

what is a PCA loading plot

A

the link between a scores plot and the chemistry of the samples - tells us why certain samples group together (why PCA is better than CA)

(scores plots = map of samples
loadings = map of variables)

y axis = loading
x axis = raman shift (example)

the loading value can be +ve or -ve

46
Q

what do loading values represent

A

The weight of a particular variable in a PC - e.g. in slide 19 of lecture 7 the blue line tells you that samples scoring highly on PC1 will have the long peak at the start of the spectrum

47
Q

name one type of supervised chemometric technique

A

LDA = linear discriminant analysis

48
Q

what happens in LDA = linear discriminant analysis

A

PCs are used to create classification rules

classifying unknown samples

49
Q

LDA mathematically works similarly to PCA but what are PCs and scores called instead

why are loadings also calculated in LDA

A

PCs = discriminant functions
scores = discriminant values

loadings are also calculated like in PCA to maximise separation between known sample groupings

50
Q

briefly explain how PCA and LDA differ

A

PCA
- looks for most variation between samples
- applies loading weights where variation is found
- plots PC1 vs PC2 (or any PC number)

LDA
- tries to minimise variation within sample groups to maximise the separation of groupings by prioritising the loading values
- plots class 1 vs class 2
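The LDA idea of keeping within-group variation small while pushing groups apart can be sketched with a two-group Fisher discriminant. This is a simplified illustration under stated assumptions: it works on two raw variables rather than PC scores, and the data, group labels and function names are all invented.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def scatter(rows, mean):
    """2x2 within-group scatter: summed outer products of deviations."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for row in rows:
        d = [row[0] - mean[0], row[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(group_a, group_b):
    """Weights of the discriminant function: inverse(Sw) * (mean_a - mean_b)."""
    ma, mb = mean_vector(group_a), mean_vector(group_b)
    sa, sb = scatter(group_a, ma), scatter(group_b, mb)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    return [(sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
            (sw[0][0] * diff[1] - sw[1][0] * diff[0]) / det]


def discriminant_value(sample, w):
    return sample[0] * w[0] + sample[1] * w[1]


def classify(sample, w, group_a, group_b):
    """Assign an unknown to the group whose mean discriminant value is nearer."""
    za = discriminant_value(mean_vector(group_a), w)
    zb = discriminant_value(mean_vector(group_b), w)
    z = discriminant_value(sample, w)
    return "A" if abs(z - za) < abs(z - zb) else "B"


# Hypothetical known samples from two groups, two variables each.
group_a = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [2.0, 1.0]]
group_b = [[6.0, 5.0], [7.0, 8.0], [8.0, 7.0], [7.0, 6.0]]

w = fisher_direction(group_a, group_b)
values_a = [discriminant_value(s, w) for s in group_a]
values_b = [discriminant_value(s, w) for s in group_b]
unknown_label = classify([2.5, 2.5], w, group_a, group_b)
```

Samples from the same group end up with similar discriminant values, and the two groups do not overlap along the discriminant function, which is what allows an unknown sample to be classified.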

51
Q

in LDA what are samples in the same group given

A

similar discriminant values (the LDA equivalent of PCA scores)

those in different groups are given different discriminant values

52
Q

name 4 limitations to using chemometrics for trace evidence analysis

A
  • can’t make up for poor data
  • data preprocessing can be important but less is more (overprocessing data can cause issues if you are trying to force the data to show what it doesn’t)
  • a bigger sample size is better, but this is hard to achieve in trace analysis
  • the samples used may not accurately reflect the population
53
Q

what are things that can happen during sample collection and analysis that chemometrics does not take into account

A

contamination
degradation
replicates taken
controlled sample collection
reproducibility

54
Q

what is chemometrics not a substitute for

A

human data interpretation

the user must be able to understand and explain the methods to the judge and jury
