Chemometrics Flashcards
Define the term principal component regression
- a regression method that finds relationships between two matrices (X and Y) by regressing Y on the principal component scores of X
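A minimal sketch of the idea, assuming mean-centred data and NumPy: PCA is run on X, then y is regressed on the resulting scores. The function names `pcr_fit` and `pcr_predict` and the synthetic data are illustrative assumptions, not part of any particular library.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """PCA on centred X, then least squares of centred y on the scores."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # SVD of the centred data: rows of Vt are the loading vectors
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T            # loadings (variables x components)
    T = Xc @ P                         # scores (samples x components)
    # Regress centred y on the scores
    q, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    b = P @ q                          # coefficients in the original variable space
    b0 = y_mean - x_mean @ b           # intercept
    return b, b0

def pcr_predict(X, b, b0):
    return X @ b + b0

# Synthetic, noise-free data (made up for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 4.0

b, b0 = pcr_fit(X, y, n_components=5)
y_hat = pcr_predict(X, b, b0)
```

With all components kept, PCR reduces to ordinary least squares; in practice fewer components are kept so that noisy directions in X are discarded.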
What are latent variables
- underlying factors that are not directly measured but that define the important variation / information in the data
Define the term variance
- a measure of the spread of the data
- the square of the standard deviation
What are covariance and correlation
- measures of how two variables vary together
- formally, correlation is covariance normalised by the two standard deviations, so it lies between -1 and 1
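The definitions of variance, covariance and correlation can be checked with a few lines of plain Python. This is a sketch that assumes the sample (n - 1) denominator; the helper names and the toy data are made up for illustration.

```python
import math

def mean(v):
    return sum(v) / len(v)

def covariance(x, y):
    # Sample covariance with the (n - 1) denominator (an assumption)
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def variance(x):
    # Variance is the covariance of a variable with itself
    return covariance(x, x)

def correlation(x, y):
    # Covariance normalised by the two standard deviations
    return covariance(x, y) / math.sqrt(variance(x) * variance(y))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]   # y = 2x, a perfectly linear relationship

var_x = variance(x)
std_x = math.sqrt(var_x)
cov_xy = covariance(x, y)
corr_xy = correlation(x, y)
```

Covariance depends on the units of x and y, whereas correlation does not, which is why correlation is easier to compare across variable pairs.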
What is the PCA matrix equation and what do the various components represent
- X = TP^T + E
- T – scores (samples)
- P^T – loadings (variables)
- E – residuals: the part of the data the model does not fit, or noise
- TP^T – contains the structure or information in the data
- X – the raw data table
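As a rough illustration, the decomposition can be computed with a singular value decomposition in NumPy. This sketch assumes the data are mean-centred first (a common preprocessing step that the card does not state) and keeps `k = 2` components; the random data are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 6))       # 20 samples, 6 variables (made-up data)
Xc = X - X.mean(axis=0)            # mean-centring (assumed preprocessing)

k = 2                              # number of components kept
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

T = U[:, :k] * s[:k]               # scores: one row per sample
P = Vt[:k].T                       # loadings: one row per variable
E = Xc - T @ P.T                   # residuals: what the k components do not fit

# Fraction of the total variation captured by the structure part TP^T
explained = (s[:k] ** 2).sum() / (s ** 2).sum()
```

Here `T @ P.T + E` reproduces the centred data exactly, and the columns of `P` are orthonormal.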
What are PCs
- latent variables, each describing a new direction in n-dimensional space that explains a large amount of the variation in the data
- usually fewer LVs than original variables are needed to capture the relevant information in the data
What are outliers
- samples that are numerically distant from the rest of the data
What are score plots
- scores for one PC plotted against those for another, e.g. PC1 vs. PC3
- provide information about objects/samples
What are loading plots
- loadings for one PC plotted against those for another, e.g. PC1 vs. PC3
- provide information about the variables
What is a biplot
- an overlay of scores and loadings
- information about the relationship between variables and objects
What are some of the reasons that outliers occur
- measurement error
- labelling error
- noise
- unique/extreme sample, which may be interesting or unwanted
What are some ways of spotting and detecting outliers
- score plots
- residuals
- Hotelling's T^2
- validation
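Hotelling's T^2 can be sketched as each sample's squared scores, scaled by the variance of each component and summed. The example below assumes mean-centred data and a two-component PCA model; sample 0 is made extreme on purpose, and all data are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(25, 4))
X[0] = 10.0                        # sample 0 is deliberately extreme (illustrative)
Xc = X - X.mean(axis=0)            # mean-centring (assumed preprocessing)

k = 2                              # number of components in the model
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
T = U[:, :k] * s[:k]               # scores

# T^2 per sample: squared scores scaled by each component's variance
score_var = T.var(axis=0, ddof=1)
t2 = (T ** 2 / score_var).sum(axis=1)

suspect = int(np.argmax(t2))       # index of the sample with the largest T^2
```

A large T^2 flags a sample that is extreme within the model plane; in practice the values are compared against a statistical limit rather than just ranked.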
What are the two kinds of outliers?
1) extreme but in model
2) outside the model
How are outliers found from scores
- by inspecting score plots
- useful because peculiar behaviour can be easily spotted
How are outliers spotted using residuals
- samples or variables that are not consistent with the model, i.e. that follow a different pattern than the model describes, get high residuals
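A minimal sketch of residual-based detection, assuming mean-centred data and a two-component PCA model: the squared residuals in E are summed per sample (and per variable). The data are synthetic, and sample 7 is given a pattern the model cannot describe.

```python
import numpy as np

rng = np.random.default_rng(3)
# 30 samples built from two underlying patterns plus a little noise (made-up data)
true_scores = rng.normal(size=(30, 2))
true_loadings = np.array([[1.0, 1.0, 1.0, 1.0, 1.0],
                          [1.0, -1.0, 1.0, -1.0, 1.0]])
X = true_scores @ true_loadings + 0.01 * rng.normal(size=(30, 5))

# Sample 7 gets an extra pattern that is orthogonal to both model patterns
X[7] += 3.0 * np.array([1.0, 0.0, -1.0, 0.0, 0.0])

Xc = X - X.mean(axis=0)            # mean-centring (assumed preprocessing)
k = 2
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt[:k].T                       # loadings
T = Xc @ P                         # scores
E = Xc - T @ P.T                   # residual matrix

q_sample = (E ** 2).sum(axis=1)    # residual sum of squares per sample
q_variable = (E ** 2).sum(axis=0)  # residual sum of squares per variable
worst_sample = int(np.argmax(q_sample))
```

The sample with the off-model pattern stands out in `q_sample` even though its scores may look unremarkable, which is the "outside the model" kind of outlier.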