Lecture 6&7 - Chemometrics Flashcards

Question

What is hierarchical cluster analysis (HCA)?

Answer 1

clusters are split into individual samples

Answer 2

the ones grouped closest to zero

Answer 3

the analyst decides on the stopping rules; this is where the subjectivity is introduced

Answer 4

the visualisation of relationships is made clear, but it cannot tell WHY these groupings occur
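
The HCA cards above (starting from individual samples, merging by distance, analyst-chosen stopping rule) can be sketched roughly as below. This is a minimal illustration only: it assumes single linkage and Euclidean distance, and the function name `single_linkage`, the `stop_distance` parameter and the sample values are all made up for the example, not taken from the lectures.

```python
from itertools import combinations
from math import dist

def single_linkage(points, stop_distance):
    """Merge the two closest clusters until the smallest gap between
    clusters exceeds stop_distance (the analyst-chosen stopping rule)."""
    clusters = [[i] for i in range(len(points))]  # start: one cluster per sample
    while len(clusters) > 1:
        # Single linkage (assumed): gap = distance between the closest members.
        gap, a, b = min(
            (min(dist(points[i], points[j])
                 for i in clusters[a] for j in clusters[b]), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        if gap > stop_distance:
            break  # stopping rule reached
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical two-variable measurements for five samples.
samples = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.1), (5.2, 4.9), (9.0, 1.0)]
tight = single_linkage(samples, stop_distance=1.0)
loose = single_linkage(samples, stop_distance=5.5)
```

With these made-up points, a stop distance of 1.0 leaves three clusters and 5.5 leaves two, which shows how the choice of stopping rule changes the grouping.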

Answer 5

an algorithm (an unsupervised technique) that assesses all variables in a dataset and decides which are relevant and correlated

Answer 6

a Principal Component (PC)

Answer 7

the largest variation between samples with PC1 being given the highest priority

Answer 8

more than one: multiple PCs are identified by the algorithm until all the variability within the dataset has been modelled

Answer 9

structure and noise, or model and error, where:
- data = spectra and chromatograms
- structure/model = useful info (explained variance)
- noise/error = info not useful for data interpretation, e.g. instrument noise, lab temperature fluctuation (residual variance)

Answer 10

a straight line = a linear combination of the original variables

Answer 11

the distance along the PC line from the mean (can be positive or negative); each sample has a different score for each principal component
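
As a rough illustration of what a score is, the sketch below mean-centres a small dataset, finds the PC1 direction, and projects each sample onto it. It assumes power iteration on the covariance matrix as the way to find PC1 (real software typically uses SVD), and the function name `first_pc_scores` and the data values are hypothetical.

```python
from statistics import mean

def first_pc_scores(data, iterations=200):
    """Return (direction, scores) for PC1 of a list of equal-length rows."""
    n_vars = len(data[0])
    col_means = [mean(row[k] for row in data) for k in range(n_vars)]
    centred = [[row[k] - col_means[k] for k in range(n_vars)] for row in data]
    # Covariance matrix of the mean-centred variables.
    cov = [[sum(r[i] * r[j] for r in centred) / (len(data) - 1)
            for j in range(n_vars)] for i in range(n_vars)]
    # Power iteration (assumed method): converges to the direction of largest
    # variance, provided the starting vector is not orthogonal to it.
    direction = [1.0] * n_vars
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * direction[j] for j in range(n_vars))
               for i in range(n_vars)]
        norm = sum(v * v for v in nxt) ** 0.5
        direction = [v / norm for v in nxt]
    # Score = signed distance of each sample from the mean along the PC line.
    scores = [sum(r[k] * direction[k] for k in range(n_vars)) for r in centred]
    return direction, scores

# Hypothetical intensities: 4 samples x 3 variables.
data = [(2.0, 1.9, 0.5), (4.0, 4.2, 0.4), (6.0, 5.8, 0.6), (8.0, 8.1, 0.5)]
direction, scores = first_pc_scores(data)
```

Here `direction` is the linear combination of the original variables that defines the PC line, and `scores` holds one value per sample, negative on one side of the mean and positive on the other.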

Answer 12

as many as the number of original variables

Answer 13

an explained variance (scree) plot; the % on the y axis is the variance in the dataset explained by each PC. Keep adding PCs until the next one adds less than 1% variance, as at that point you are likely to be looking at noise. This is a poor method of deciding how many PCs to use.
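
The 1% rule of thumb from this card could be written out as below. This is only a sketch: the eigenvalues are invented, the helper names `explained_variance_percent` and `pcs_to_keep` are hypothetical, and the card itself flags this rule as a poor guide, so treat the result as a starting point rather than a decision.

```python
def explained_variance_percent(eigenvalues):
    """Percentage of the total variance carried by each PC."""
    total = sum(eigenvalues)
    return [100.0 * ev / total for ev in eigenvalues]

def pcs_to_keep(percentages, threshold=1.0):
    """Count leading PCs that each add at least `threshold` % variance."""
    kept = 0
    for pct in percentages:
        if pct < threshold:
            break  # from here on, likely modelling noise
        kept += 1
    return kept

# Hypothetical eigenvalues (variance per PC), sorted largest first.
eigenvalues = [6.0, 2.5, 1.0, 0.42, 0.05, 0.03]
percentages = explained_variance_percent(eigenvalues)
n_keep = pcs_to_keep(percentages)
```

The values in `percentages` are what would be plotted on the y axis of the scree plot, one bar or point per PC.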

Answer 14

scores plots

Answer 15

a map of the samples where each data point is a sample and similar samples form clusters; samples closer to each other are more similar and vice versa. The x and y axes correspond to two PCs; these can be any two, and plotting different PCs against each other gives different clusters.

Answer 16

if plotting PC1 vs PC4 and then PC1 vs PC5 gives very similar sample clusters then it is likely that PC5 is just modelling noise and isn't adding any value to the data

Answer 17

PCA can tell you why samples cluster but CA can't

Answer 18

skittles plots

Answer 19

again, this is plotting PCs against each other, e.g. PC1 vs PC2; samples are then grouped according to predefined categories, e.g. creams, liquids, loose powders, pressed powders and mousses

Answer 20

for even better data visualisation, different combinations of PCs can be plotted to find clearer groupings

Answer 21

the link between a scores plot and the chemistry of the samples: tells us why certain samples group together (why PCA is better than CA). Scores plots = map of samples; loadings = map of variables. y axis = loading, x axis = Raman shift (example). The loading value can be +ve or -ve.

Answer 22

The weight of a particular variable, e.g. in slide 19 of lecture 7 the blue line tells you that anything grouped into PC1 will have the long peak at the start of the spectrum

Answer 23

LDA = linear discriminant analysis

Answer 24

PCs are used to create classification rules for classifying unknown samples

Answer 25

PCs = discriminant functions; scores = discriminant values; loadings are also calculated, as in PCA, to maximise separation between known sample groupings

Answer 26

PCA - looks for the most variation between samples - applies loading weight where variation is found - plots PC1 vs PC2 (or any PC number)
LDA - tries to minimise variation within sample groups to maximise the separation of groupings by prioritising the loading values - plots class 1 vs class 2
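
To make the PCA/LDA contrast concrete, here is a minimal sketch of a two-class, two-variable Fisher discriminant, which is one common formulation of LDA and not necessarily the one used in the lectures. The function names `fisher_direction` and `discriminant_values` and the data are hypothetical. The data are built so that most of the variation lies along the first variable, while the classes differ only in the second.

```python
from statistics import mean

def fisher_direction(class_a, class_b):
    """Fisher direction w = Sw^-1 (mean_b - mean_a) for two variables."""
    def centre(rows):
        m = [mean(r[0] for r in rows), mean(r[1] for r in rows)]
        return m, [(r[0] - m[0], r[1] - m[1]) for r in rows]
    mean_a, dev_a = centre(class_a)
    mean_b, dev_b = centre(class_b)
    devs = dev_a + dev_b
    # Pooled within-class scatter matrix (2 x 2): variation inside the groups.
    s00 = sum(d[0] * d[0] for d in devs)
    s01 = sum(d[0] * d[1] for d in devs)
    s11 = sum(d[1] * d[1] for d in devs)
    det = s00 * s11 - s01 * s01
    diff = (mean_b[0] - mean_a[0], mean_b[1] - mean_a[1])
    # Inverse scatter matrix times the difference of the class means.
    return ((s11 * diff[0] - s01 * diff[1]) / det,
            (-s01 * diff[0] + s00 * diff[1]) / det)

def discriminant_values(rows, w):
    """Project each sample onto the discriminant function."""
    return [r[0] * w[0] + r[1] * w[1] for r in rows]

# Hypothetical known groupings: big spread in variable 1, classes differ in variable 2.
class_a = [(0.0, 0.0), (2.0, 0.2), (4.0, 0.1), (6.0, 0.3)]
class_b = [(0.0, 1.0), (2.0, 1.2), (4.0, 1.1), (6.0, 1.3)]
w = fisher_direction(class_a, class_b)
values_a = discriminant_values(class_a, w)
values_b = discriminant_values(class_b, w)
```

Samples in the same group end up with similar discriminant values and the two groups do not overlap, even though the largest variation in the data (which PCA would prioritise) runs along the variable that does not separate them.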

Answer 27

similar discriminant values (PCs); those in different groups are given different discriminant values

Answer 28

- can't make up for poor data
- data preprocessing can be important but less is more (overprocessing can cause issues if you are trying to force the data to show what it doesn't)
- a bigger sample size is better, but this is hard in trace analysis
- do the samples used accurately reflect the population?

Answer 29

contamination, degradation, replicates taken, controlled sample collection, reproducibility

Answer 30

human data interpretation; the user must be able to understand and explain the methods to the judge and jury

(55 cards)