Lecture 6&7 - Chemometrics Flashcards

1
Q

what is chemometrics

A

a multivariate statistical analysis that is computationally intensive and is applied to chemical systems or processes to find patterns and trends in the data

2
Q

what does multivariate mean

A

analyses multiple variables

3
Q

what does computationally intensive mean

A

the need for a computer and algorithms

4
Q

name 6 things chemometrics can help us do

A
  • reduce complex datasets
  • identify and quantify sample groups
  • optimise our experimental parameters
  • identify covariance and pick out important variables
  • give reproducible measures of data
  • visualise the data (a picture is better)
5
Q

what is meant by covariance

A

which parts of the data are associated/not independent
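As a minimal sketch of what "associated" means numerically, the sample covariance of two variables can be computed by hand. The variable names and values below are hypothetical and only illustrate the idea: a clearly non-zero value means the two variables move together, and a value near zero means no linear association.

```python
def covariance(x, y):
    """Sample covariance of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


# Hypothetical measurements of three variables across five samples.
peak_a = [1.0, 2.0, 3.0, 4.0, 5.0]
peak_b = [2.0, 4.0, 6.0, 8.0, 10.0]  # rises whenever peak_a rises
peak_c = [3.0, 1.0, 3.0, 1.0, 3.0]   # no linear association with peak_a

cov_ab = covariance(peak_a, peak_b)  # large and positive: associated
cov_ac = covariance(peak_a, peak_c)  # close to zero: not linearly associated
print(cov_ab, cov_ac)
```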

6
Q

what effect does chemometrics have on subjectivity

A

reduces it in the analysis of data but does not eliminate it completely, as humans still interpret the data

7
Q

what can chemometrics reveal that may not have been noticed

A

underlying / non-obvious trends between variables

8
Q

what is the main aim of chemometrics

A

to maximise output and quality with minimal cost

9
Q

chemometrics has been used in forensic science since 2009 - what benefits has this provided

A

improved efficiency in forensic workflow
better-quality use of resources for forensic purposes

10
Q

how is chemometrics applicable to forensic science (6)

A

statistical framework

replaces the use of terms such as "unique" and "match" without statistics to support them (subjectivity)

can counteract bias

quicker than manual data interpretation

don’t need an expert but someone does need to be able to interpret the output data

can predict trace behaviour

11
Q

what can be identified using multivariate analysis that may not be seen in univariate analysis

A

outliers
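A rough sketch of why this happens, assuming two strongly correlated variables and a hypothetical suspect sample: each variable looks unremarkable on its own (small z-scores), but the combination breaks the correlation, which a multivariate distance (here the Mahalanobis distance, used purely as an illustration) picks up. All names and values are made up.

```python
import math


def mean(values):
    return sum(values) / len(values)


def cov(x, y):
    """Sample covariance; cov(x, x) is the variance of x."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


# Hypothetical reference samples: variable y closely tracks variable x.
ref_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
ref_y = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9, 7.2, 7.8, 9.0]
suspect = (3.0, 7.0)  # both values lie inside the normal range of each variable

dx = suspect[0] - mean(ref_x)
dy = suspect[1] - mean(ref_y)
sxx, syy, sxy = cov(ref_x, ref_x), cov(ref_y, ref_y), cov(ref_x, ref_y)

# Univariate view: z-score of each variable on its own.
z_x = abs(dx) / math.sqrt(sxx)
z_y = abs(dy) / math.sqrt(syy)

# Multivariate view: Mahalanobis distance using the 2x2 covariance matrix.
det = sxx * syy - sxy ** 2
mahalanobis = math.sqrt((syy * dx ** 2 - 2 * sxy * dx * dy + sxx * dy ** 2) / det)

print(z_x, z_y, mahalanobis)
```

Here both z-scores are below 1, while the multivariate distance is far larger, so the sample only stands out when the variables are considered together.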

12
Q

what is univariate analysis

A

analysis that only considers one variable

13
Q

give an example of where multivariate analysis may be beneficial in forensic analysis and the variables that could be considered

A

pollen dispersion
considering time of year and weather

fingermarks
considering sweatiness of someone and weather conditions

14
Q

what are the 4 broad categories of chemometrics

A

experiment design (DOE)
exploratory data analysis (EDA - what is the data showing me)
classification
regression

15
Q

what does the DOE (design of experiment) affect in forensic science

A

evidence collection, storage, analysis instrument selection and optimisation

16
Q

what can the DOE part of chemometrics be used to streamline in forensics in the future

A

efficiency, quality and reproducibility by establishing optimised workflows

17
Q

what does the regression part of chemometrics involve

A

a version of calibration curves based on a linear y = mx + c relationship, which maps the effect of multiple independent variables

we can make educated predictions based on the curve
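The simplest case of this idea is a single-variable calibration line fitted by least squares; multivariate regression extends the same principle to several x variables at once. The sketch below uses hypothetical concentrations and instrument responses, and the function name is illustrative only.

```python
def fit_line(x, y):
    """Least-squares fit of y = m*x + c; returns (m, c)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    m = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
         / sum((a - mean_x) ** 2 for a in x))
    c = mean_y - m * mean_x
    return m, c


# Hypothetical calibration standards: known concentration vs measured signal.
concentration = [0.0, 1.0, 2.0, 3.0, 4.0]
signal = [0.1, 2.1, 4.1, 6.1, 8.1]

m, c = fit_line(concentration, signal)

# Educated prediction: read the concentration of an unknown off the line.
unknown_signal = 5.1
predicted_concentration = (unknown_signal - c) / m
print(m, c, predicted_concentration)
```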

18
Q

What happens in the EDA part of chemometrics

A

data is reduced by an algorithm that looks at how variables correspond

identifies groupings of samples in complex datasets

visualises trends

making the data more manageable

19
Q

EDA is an unsupervised technique - what does this mean

A

it explores the data without any prior assumptions or knowledge of the samples (reducing human bias and subjectivity)

20
Q

what is a supervised technique

A

the building of classification rules for grouping samples together - built on the results of EDA

21
Q

name the two most commonly used EDA techniques

A

CA = cluster analysis
PCA = principal component analysis

22
Q

name 3 features of cluster analysis (CA)

A

unsupervised technique

samples are grouped into clusters based on a measure of similarity (a calculated distance)

the output is a dendrogram

23
Q

what are the two types of cluster analysis

A

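ACA - agglomerative cluster analysis
HCA - hierarchical cluster analysis (in these notes, the top-down approach, more commonly called divisive clustering; strictly, agglomerative clustering is also a hierarchical method)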

24
Q

what is agglomerative cluster analysis (ACA)

A

individual samples are grouped into clusters

25
what is hierarchical cluster analysis (HCA)
clusters are split into individual samples (a top-down approach, more commonly called divisive clustering)
26
in cluster analysis which samples are the most similar
the ones that are grouped at a distance closest to zero on the dendrogram
27
how is the number of clusters decided in cluster analysis
the analyst decides on the stopping rules - where the subjectivity is introduced
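The bottom-up grouping and the analyst's stopping rule can be sketched in a few lines. This is a minimal illustration that assumes single linkage (distance between the two closest members) and Euclidean distance; neither choice is stated in the notes, and the sample coordinates are hypothetical. The recorded merge distances are the heights a dendrogram would show.

```python
import math


def agglomerate(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain.

    Returns the final clusters (lists of sample indices) and the merge history
    as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, merges


# Hypothetical samples described by two variables each.
samples = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]

# The stopping rule (3 clusters) is the analyst's choice, hence subjective.
final_clusters, history = agglomerate(samples, n_clusters=3)
print(final_clusters)
print(history)
```

Changing `n_clusters` changes the answer without changing the data, which is where the subjectivity enters.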
28
what is a limitation of cluster analysis
the visualisation of relationships is made clear but it cannot tell WHY these groupings occur
29
what happens in PCA analysis
an algorithm (an unsupervised technique) assesses all variables in a dataset and decides which are relevant and correlated
30
in PCA when the algorithm finds a correlation what is this defined as
a Principal Component (PC)
31
what does the PC describe in PCA analysis
the largest variation between samples with PC1 being given the highest priority
32
For 'good' data how many principal components would you expect to see in PCA
more than 1
multiple PCs are identified by the algorithm until all the variability within the dataset has been modelled
33
what is data comprised of
structure and noise, or model and error, where:
data = spectra and chromatograms
structure/model = useful info (explained variance)
noise/error = info not useful to data interpretation, e.g. instrument noise, lab temperature fluctuation (residual variance)
34
how are principal components represented in PCA
a straight line = a linear combination of the original variables
35
in PCA what assigns a sample a 'score'
the distance along the PC line from the mean (can be positive or negative)
each sample has a different score for each principal component
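To make "a linear combination of the original variables" and "score" concrete, here is a minimal sketch of PCA for just two variables, using the closed-form eigen-decomposition of the 2x2 covariance matrix. Real datasets have many more variables and would use a numerical library; the data and function name here are hypothetical.

```python
import math


def pca_2d(samples):
    """PCA for two variables. Returns (components, scores, explained_fraction)."""
    n = len(samples)
    mean_x = sum(s[0] for s in samples) / n
    mean_y = sum(s[1] for s in samples) / n
    centred = [(s[0] - mean_x, s[1] - mean_y) for s in samples]

    # Sample covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centred) / (n - 1)
    c = sum(y * y for _, y in centred) / (n - 1)
    b = sum(x * y for x, y in centred) / (n - 1)

    # Eigenvalues of a symmetric 2x2 matrix: the variance captured by each PC.
    mid = (a + c) / 2
    spread = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    eigenvalues = [mid + spread, mid - spread]  # PC1 first (largest variance)

    components = []
    for lam in eigenvalues:
        # Eigenvector (b, lam - a); assumes b != 0, i.e. the variables co-vary.
        vx, vy = b, lam - a
        norm = math.hypot(vx, vy)
        components.append((vx / norm, vy / norm))

    # Score = signed distance of each mean-centred sample along each PC line.
    scores = [[x * px + y * py for px, py in components] for x, y in centred]
    explained = [lam / sum(eigenvalues) for lam in eigenvalues]
    return components, scores, explained


# Hypothetical samples measured on two correlated variables.
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1),
        (6.0, 5.9), (7.0, 7.2), (8.0, 7.8), (9.0, 9.0)]

components, scores, explained = pca_2d(data)
print(components[0])  # weights of the two original variables in PC1
print(explained)      # fraction of the total variance described by PC1 and PC2
```

Each component is a pair of weights on the original variables, and every sample gets one score per component, positive or negative relative to the mean.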
36
how many PCs can a model have in PCA
up to as many as the number of original variables
37
what type of plot reveals the optimum number of PCs for a given number of variables in PCA
an explained variance or scree plot
the % on this plot (y axis) is the variance in the dataset
keep on adding PCs until you are adding less than 1% variance, as here you are likely to be looking at noise
this is a poor method of deciding how many PCs to use
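The 1% rule of thumb can be written out as a short calculation. This is only a sketch of that rule with hypothetical per-PC variances; as the card notes, it is a poor method to rely on by itself.

```python
def explained_variance_percent(variances):
    """Convert the variance captured by each PC into a percentage of the total."""
    total = sum(variances)
    return [100 * v / total for v in variances]


def count_pcs_above(percentages, threshold=1.0):
    """Count leading PCs that each add at least `threshold` percent variance."""
    count = 0
    for p in percentages:
        if p < threshold:
            break
        count += 1
    return count


# Hypothetical variances for PC1..PC5, already sorted from largest to smallest.
pc_variances = [6.0, 3.0, 0.9, 0.06, 0.04]

percentages = explained_variance_percent(pc_variances)
n_keep = count_pcs_above(percentages, threshold=1.0)
print(percentages, n_keep)
```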
38
what is a better method to use than scree/explained variance plots
scores plots
39
what is a scores plot
a map of the samples where each data point is a sample and similar samples form clusters
samples closer to each other are more similar and vice versa
the x and y axes correspond to two PCs; these can be any two
plotting different PCs against each other gives different clusters
40
how can you tell by looking at scores plots that a PC is modelling noise and not useful information
if plotting PC1 vs PC4 and then PC1 vs PC5 gives very similar sample clusters then it is likely that PC5 is just modelling noise and isn't adding any value to the data
41
what is a benefit of PCA over CA
PCA can tell you why samples cluster but CA can't
42
scores plots however do not always reveal immediate trends so what type of plot can be used instead
skittles plots
43
what is a skittles plot
again this is plotting PCs against each other, e.g. PC1 vs PC2
samples are then grouped according to predefined categories, e.g. creams, liquids, loose powders, pressed powders and mousses
44
what is the benefit of 3D scores plots
for even better data visualisation
different combinations of PCs can be plotted for better groupings
45
what is a PCA loading plot
the link between a scores plot and the chemistry of the samples - tells us why certain samples group together (why PCA is better than CA)
scores plots = map of samples, loadings = map of variables
y axis = loading, x axis = Raman shift (example)
the loading value can be +ve or -ve
46
what do loading values represent
the weight of a particular variable - e.g. in slide 19 of lecture 7 the blue line tells you that anything grouped into PC1 will have the long peak at the start of the spectrum
47
name one type of supervised chemometric technique
LDA = linear discriminant analysis
48
what happens in LDA = linear discriminant analysis
PCs are used to create classification rules for classifying unknown samples
49
LDA mathematically works similarly to PCA, but what are PCs and scores called instead, and why are loadings also calculated in LDA
PCs = discriminant functions
scores = discriminant values
loadings are also calculated like in PCA to maximise separation between known sample groupings
50
briefly explain how PCA and LDA differ
PCA - looks for the most variation between samples - applies loading weight where variation is found
plots PC1 vs PC2 (or any PC number)
LDA - tries to minimise variation within sample groups to maximise the separation of groupings by prioritising the loading values
plots class 1 vs class 2
51
in LDA what are samples in the same group given
similar discriminant values (scores)
those in different groups are given different discriminant values
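A minimal sketch of this behaviour, assuming two known groups measured on two variables and using Fisher's two-class discriminant (one common formulation of LDA, chosen here for illustration and applied to the raw variables rather than to PCs). The group data, labels and function names are hypothetical.

```python
def class_mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points, centre):
    """Within-group scatter terms (sxx, syy, sxy) around the group mean."""
    sxx = sum((p[0] - centre[0]) ** 2 for p in points)
    syy = sum((p[1] - centre[1]) ** 2 for p in points)
    sxy = sum((p[0] - centre[0]) * (p[1] - centre[1]) for p in points)
    return sxx, syy, sxy


def fit_lda(group_a, group_b):
    """Return the discriminant weights and the two group means."""
    mean_a, mean_b = class_mean(group_a), class_mean(group_b)
    axx, ayy, axy = scatter(group_a, mean_a)
    bxx, byy, bxy = scatter(group_b, mean_b)
    sxx, syy, sxy = axx + bxx, ayy + byy, axy + bxy  # pooled within-group scatter
    det = sxx * syy - sxy ** 2
    dx, dy = mean_b[0] - mean_a[0], mean_b[1] - mean_a[1]
    # w = inverse(within-group scatter) * (difference between group means)
    w = ((syy * dx - sxy * dy) / det, (sxx * dy - sxy * dx) / det)
    return w, mean_a, mean_b


def discriminant_value(w, point):
    return w[0] * point[0] + w[1] * point[1]


def classify(w, mean_a, mean_b, point):
    """Assign the group whose mean discriminant value is closest."""
    value = discriminant_value(w, point)
    dist_a = abs(value - discriminant_value(w, mean_a))
    dist_b = abs(value - discriminant_value(w, mean_b))
    return "A" if dist_a <= dist_b else "B"


# Hypothetical known (training) samples for two groups.
group_a = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.0)]
group_b = [(6.0, 5.0), (7.0, 8.0), (8.0, 6.0), (7.0, 7.0)]

w, mean_a, mean_b = fit_lda(group_a, group_b)
values_a = [discriminant_value(w, p) for p in group_a]
values_b = [discriminant_value(w, p) for p in group_b]
print(values_a, values_b)
print(classify(w, mean_a, mean_b, (2.0, 2.0)), classify(w, mean_a, mean_b, (7.0, 6.0)))
```

Samples from the same group end up with similar discriminant values, and an unknown sample is assigned to whichever group its value sits closest to.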
52
name 4 limitations to using chemometrics for trace evidence analysis
  • can't make up for poor data
  • data preprocessing can be important but less is more (overprocessing data can cause issues if you are trying to force the data to show what it doesn't)
  • sample size is better if bigger but this is hard in trace analysis
  • do the samples used accurately reflect the population
53
what are things that can happen during sample collection and analysis that chemometrics does not take into account
contamination
degradation
replicates taken
controlled sample collection
reproducibility
54
what is chemometrics not a substitute for
human data interpretation
the user must be able to understand and explain the methods to the judge and jury
55