Chemometrics Flashcards

1
Q

Define the term principal component regression

A
  • finds relationships between two matrices (X and Y) by regressing Y on the principal component scores of X
2
Q

What are latent variables

A
  • underlying factors that are not directly measured but that define the important variation / information in the data
3
Q

Define the term variance

A
  • a measure of the spread of the data

- the square of the standard deviation

4
Q

What is covariance and correlation

A
  • measures of how strongly two variables vary together

- formally correlation is normalised covariance
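
The definitions of variance, covariance and correlation can be checked numerically. The sketch below is only an illustration: it assumes the sample (n - 1) form of the formulas, and the function names and data are made up for the example.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(x, y):
    # Sample covariance: average product of deviations, with n - 1 in the denominator.
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def variance(x):
    # Variance is the covariance of a variable with itself (standard deviation squared).
    return covariance(x, x)

def correlation(x, y):
    # Correlation is covariance normalised by both standard deviations.
    return covariance(x, y) / math.sqrt(variance(x) * variance(y))

# Illustrative data: y is exactly 2 * x, so the correlation is at its maximum.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]

print(variance(x), covariance(x, y), correlation(x, y))
```

Because correlation is normalised, it does not change if x or y is rescaled, whereas covariance does.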

5
Q

What is the matrix equation and what do the various components represent

A
  • X = TP^T + E
  • T – scores (samples)
  • P^T – loadings (variables)
  • E – residuals of unfitted data or noise
  • TP^T – contains the structure or information in the data
  • X – the raw data table
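
The decomposition X = TP^T + E can be reproduced with a singular value decomposition. This is a minimal sketch, not the only way to compute it: it assumes NumPy, random illustrative data, mean-centring of X before the decomposition, and an arbitrary choice of k = 2 components.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))      # illustrative data table: 8 samples, 4 variables
Xc = X - X.mean(axis=0)          # assumption: X is mean-centred first

# SVD of the centred data: Xc = U * diag(s) * Vt
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2                            # number of principal components kept (arbitrary here)
T = U[:, :k] * s[:k]             # scores, one row per sample
P = Vt[:k].T                     # loadings, one row per variable
E = Xc - T @ P.T                 # residuals: what the k components do not fit

explained = s[:k] ** 2 / np.sum(s ** 2)   # fraction of variance per component
print(T.shape, P.shape, E.shape, explained)
```

Increasing k moves more of the variation from E into TP^T; with all components kept, E is zero up to rounding.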
6
Q

What are PCs

A
  • latent variables that each describe a new direction in n-dimensional space that explains a large amount of the variation in the data
  • usually a smaller number of LVs are needed to capture the relevant information in the data
7
Q

What are outliers

A
  • samples that are numerically distant from the rest of the data
8
Q

What are score plots

A
  • the scores for one PC plotted against those for another, e.g. PC1 vs. PC3

- provides information about the objects/samples

9
Q

What are loading plots

A
  • the loadings for one PC plotted against those for another, e.g. PC1 vs. PC3

- provides information about the variables

10
Q

What is a biplot

A
  • an overlay of scores and loadings

- information about the relationship between variables and objects

11
Q

What are some of the reasons that outliers occur

A
  • measurement error
  • labelling error
  • noise
  • unique/extreme sample
    • interesting
    • unwanted
12
Q

What are some ways of spotting and detecting outliers

A
  • score plot
  • residuals
  • Hotelling's T^2
  • validation
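
Two of the diagnostics listed above, Hotelling's T^2 and the residuals, can be computed directly from a PCA model. The sketch below is illustrative only: it assumes NumPy, synthetic data with one underlying factor, a one-component model, and two planted outliers (one extreme along the model direction, one pointing away from it). The function name is made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_vars = 30, 5

# Synthetic data with a single latent variable plus a little noise.
p_true = np.ones(n_vars) / np.sqrt(n_vars)
t_true = rng.normal(size=n_samples)
X = np.outer(t_true, p_true) + 0.01 * rng.normal(size=(n_samples, n_vars))

# Planted outliers (assumptions made for this illustration):
X[0] = 10.0 * p_true                              # extreme, but along the model direction
q_dir = np.array([1.0, -1.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
X[1] = X[1] + 3.0 * q_dir                         # deviates in a direction the model lacks

def pca_diagnostics(X, k):
    """Return Hotelling's T^2 and the squared residual (Q) for every sample."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :k] * s[:k]                          # scores
    P = Vt[:k].T                                  # loadings
    E = Xc - T @ P.T                              # residual matrix
    t2 = np.sum(T ** 2 / T.var(axis=0, ddof=1), axis=1)   # distance inside the model
    q = np.sum(E ** 2, axis=1)                             # distance to the model
    return t2, q

t2, q_res = pca_diagnostics(X, k=1)
print("largest T^2 at sample", int(np.argmax(t2)))
print("largest residual at sample", int(np.argmax(q_res)))
```

A high T^2 flags a sample that is far from the centre within the model, while a high residual flags a sample the model does not describe.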
13
Q

What are the two kinds of outliers?

A

1) extreme but in model

2) outside the model

14
Q

How are outliers found from scores

A
  • by inspecting score plots

- useful because peculiar behaviour can be easily spotted

15
Q

How are outliers spotted using residuals

A
  • samples or variables that are not consistent with the model, i.e. that follow a different pattern from the one the model describes, get high residuals
16
Q

What is a model

A
  • a summary of your best knowledge of a system at the time of investigation
  • a good model can be used to predict future events with confidence
17
Q

What is validation

A
  • means testing whether a model is general or only descriptive of the calibration samples
  • can be tested by building the model from only part of the samples and predicting the rest
18
Q

Describe the 3 types of validation

A
  • statistical - fit
    • confidence intervals
    • not used in chemometrics
  • empirical I - fit and prediction
    • cross validation or test set validation
    • often used in chemometrics
  • empirical II - fit to real life
    • external information, interview validation
      (Primarily PCA)
19
Q

How is cross validation performed

A
  • by systematically leaving some of the original samples out of the model and seeing how well they are predicted
  • the simplest method is leave-one-out: with n samples, n models are built, each leaving out a different sample, so every sample is left out exactly once
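
Leave-one-out cross validation can be sketched in a few lines. The example below is an assumption-laden illustration, not a reference implementation: it uses NumPy, picks principal component regression as the model being validated, and the helper names (`pcr_fit`, `pcr_predict`, `loo_rmsecv`) and the noise-free synthetic data are invented for the example.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Fit a k-component principal component regression on calibration data."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                                   # loadings
    T = Xc @ P                                     # scores
    b, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    return x_mean, y_mean, P, b

def pcr_predict(model, X_new):
    x_mean, y_mean, P, b = model
    return (X_new - x_mean) @ P @ b + y_mean

def loo_rmsecv(X, y, k):
    """Build n models, each with one sample left out, and predict that sample."""
    n = X.shape[0]
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                   # every sample except number i
        model = pcr_fit(X[keep], y[keep], k)
        errors[i] = y[i] - pcr_predict(model, X[i:i + 1])[0]
    return np.sqrt(np.mean(errors ** 2))           # root mean square error of CV

# Synthetic, noise-free data driven by two latent variables.
rng = np.random.default_rng(1)
T_true = rng.normal(size=(12, 2))
X = T_true @ rng.normal(size=(4, 2)).T
y = T_true @ np.array([1.0, -2.0])

rmsecv_1 = loo_rmsecv(X, y, k=1)
rmsecv_2 = loo_rmsecv(X, y, k=2)
print(rmsecv_1, rmsecv_2)
```

Comparing the cross-validated error for different numbers of components is one common way to decide how many components to keep; here the data were built from two latent variables, so two components should predict the left-out samples almost perfectly.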
20
Q

What are the benefits of preprocessing

A
  • models with lower prediction error

- simpler models that are more robust and/or easier to interpret

21
Q

What is mean-centring

A
  • the mean value of each variable is subtracted from the data
22
Q

What is scaling and what two categories does it fall under

A
  • multiplying (some) variables by a factor so that their variances get similar numerical values
  • autoscaling and block-scaling are the two types of scaling
23
Q

What is autoscaling

A
  • each column is divided by its standard deviation so that every variable gets the same (unit) variance
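
Mean-centring and autoscaling are both one-line column operations. The sketch below assumes NumPy, the sample (n - 1) standard deviation, and that autoscaling is applied together with mean-centring, which is a common convention but an assumption here. The function names and data are illustrative.

```python
import numpy as np

def mean_centre(X):
    # Subtract each column's mean so every variable has mean zero.
    return X - X.mean(axis=0)

def autoscale(X):
    # Centre, then divide each column by its standard deviation (unit variance).
    return mean_centre(X) / X.std(axis=0, ddof=1)

# Two variables on very different numerical scales.
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 200.0],
              [4.0, 400.0]])

Xa = autoscale(X)
print(Xa.mean(axis=0))
print(Xa.var(axis=0, ddof=1))
```

After autoscaling, the second column no longer dominates simply because its numbers are larger.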
24
Q

What is block-scaling

A
  • giving blocks of data equal or modified importance
25
Q

What are consequences of mean centring

A
  • allows the PCs to capture the relevant variation in the data
  • without mean centring it is common that PC1 only describes the deviation from zero
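
The effect of skipping mean-centring can be seen on a small synthetic data set. This is a hedged sketch that assumes NumPy and made-up data: points spread along one direction but sitting far from the origin. The helper name `first_loading` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
offset = np.array([10.0, 20.0])                    # data sit far from the origin
direction = np.array([1.0, -1.0]) / np.sqrt(2.0)   # true direction of variation
t = rng.normal(size=50)
X = offset + np.outer(t, direction) + 0.05 * rng.normal(size=(50, 2))

def first_loading(A):
    # First right singular vector = loading vector of PC1 for the matrix A.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt[0]

p_raw = first_loading(X)                           # no centring
p_centred = first_loading(X - X.mean(axis=0))      # with mean-centring

# Absolute cosine between each PC1 loading and the direction it should match.
cos_raw_offset = abs(p_raw @ offset) / np.linalg.norm(offset)
cos_centred_direction = abs(p_centred @ direction)
print(cos_raw_offset, cos_centred_direction)
```

Without centring, PC1 points from the origin towards the mean of the data; with centring, PC1 follows the direction in which the samples actually differ from one another.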
26
Q

What are the consequences of correctly performed scaling

A
  • gives each variable equal or comparable importance

- if scaling is not done, variables with high numerical values may be given inappropriate importance

27
Q

Define the term principal component analysis

A
  • describes the correlation structure of X
  • uncovers the underlying variation in the data (X)
  • provides a summary showing how observations are related and whether there are any deviating observations or groups of observations in the data