Lecture 5: Principal Component Analysis Flashcards

1
Q

What does a PCA attempt to do?

A

Attempts to explain variables in as few new ‘components’ as possible

2
Q

How does a PCA attempt to explain variables in as few new ‘components’ as possible? What are these components?

A

It transforms the variables by rotating the axes; the rotated axes are the new components.
• Axes are chosen to maximise explained variance.
• The total variance of the new components is the same as the total variance of the original variables.

3
Q

Explain the process of axis rotation

A

First we draw a line through the data points in p-dimensional space that accounts for the most variance; you could describe this as the line that describes the data best, i.e. lies closest to the data points. The axes are rotated so that this line becomes the first component. For the second component we find a line that is perpendicular to the first and explains the most of the remaining variance (if the data are 2D this is simply the perpendicular line). If there are more than two dimensions we repeat this step until there are p components, each time finding a line through the data that is perpendicular to all previous components and explains as much of the remaining variance as possible.

In other words:
1. Maximises amount of variance accounted for by the 1st (principal) component.
2. For each subsequent component: maximises variance accounted for as long as:
• component is orthogonal (perpendicular, uncorrelated) to all previous components
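
A minimal sketch of this in R (assuming, as in the later cards, that iris4D holds the four numeric iris measurements):

# assumption: iris4D is the four numeric columns of the iris data
iris4D <- iris[, 1:4]
pca <- prcomp(iris4D, center = TRUE, scale. = TRUE)
round(crossprod(pca$rotation), 3)   # identity matrix: the components are orthogonal
pca$sdev^2                          # component variances decrease: PC1 explains the most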

4
Q

How is the maximum number of components decided?

A

The number of dimensions/ variables

5
Q

How are the different principal components related to each other?

A

They’re not related: the components are orthogonal (uncorrelated), and each principal component loading vector is unique (up to a sign flip).

6
Q

How does total variance differ following a PCA?

A

It stays the same
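
A quick sketch in R (assuming iris4D is the four numeric iris columns) that the total variance is unchanged:

iris4D <- iris[, 1:4]
pca <- prcomp(iris4D, center = TRUE)   # no scaling, so variances stay comparable
sum(apply(iris4D, 2, var))             # total variance of the original variables
sum(pca$sdev^2)                        # the same total variance across the components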

7
Q

What is meant by the eigenvectors in terms of PCA?

A

The orientation of the components (lines through the data)

8
Q

Finish this from the first lecture:

All p x p matrices A have _______ for which __ = __

A

All p x p matrices A have associated scalars λ_j and vectors x_j for which A x_j = λ_j x_j

9
Q

In regards to PCA, what do the variables represent in the following equation?

A x_j = λ_j x_j

A

A represents an R (correlation) or S (covariance) matrix. λ_j is the eigenvalue of component j: how much variance that component explains. x_j is the eigenvector: how much each variable in the dataset loads onto the component (e.g. 40% of petal width and 90% of petal length).
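
A small sketch of this eigendecomposition in R (assuming iris4D is the four numeric iris columns):

iris4D <- iris[, 1:4]
e <- eigen(cor(iris4D))
e$values    # eigenvalues: variance explained by each component
e$vectors   # eigenvectors: loadings of each variable on each component
# check A x_j = lambda_j x_j for the first component
all.equal(as.vector(cor(iris4D) %*% e$vectors[, 1]), e$values[1] * e$vectors[, 1])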

10
Q

How can we project data onto the new component axis?

A

We can project the data onto the new component axes using a weighted sum of the original variables, with the weights being the eigenvectors.
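
As a sketch in R (assuming iris4D is the four numeric iris columns), the projection is the centred data multiplied by the eigenvectors:

iris4D <- iris[, 1:4]
V <- eigen(cov(iris4D))$vectors                        # eigenvectors of S
scores <- scale(iris4D, center = TRUE, scale = FALSE) %*% V
head(scores)                                           # matches prcomp(iris4D)$x up to sign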

11
Q

Describe two important constraints when determining these component vectors

A
1. We choose the eigenvector that maximises the variance accounted for.
2. The squared elements of an eigenvector add up to one.
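
The second constraint is easy to check in R (a sketch, assuming iris4D is the four numeric iris columns):

iris4D <- iris[, 1:4]
V <- eigen(cor(iris4D))$vectors
colSums(V^2)   # each eigenvector's squared elements sum to one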
12
Q

Should you centre your variables in a PCA? What happens if you do or don’t?

A

You should always centre your variables, and if the variances differ greatly you can also scale them by their standard deviations. If you don’t centre your variables you will have to re-centre the data after the PCA so that the origin of the new components lies at the new zero point.

13
Q

The eigenvectors are also known as the _____ and tell us how _____

A

The eigenvectors are also known as the rotation matrix and tell us how much to rotate the original axes.

14
Q

What does this matrix look like and how is it used?

A
The matrix of eigenvectors may look like this in R:
(0.38772  -0.92178)
(0.92178    0.38772)
The rotation matrix template looks like:
(cosθ  -sinθ)
(sinθ    cosθ)

Since 0.388 represents the cosine of an angle, we take the arccosine to solve for the angle θ. arccos(0.388) is about 67 degrees, so we rotate the axes counterclockwise by about 67 degrees.
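
In R the angle can be recovered directly from the value quoted above:

acos(0.38772) * 180 / pi   # about 67 degrees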

15
Q

What is the next step after we have rotated the axis?

A

Plug the values into the components to get the formulas for the new axes z1 and z2. Each component is built from one column of the rotation matrix.
Component 1: z1 = 0.388y1 + 0.922y2
Component 2: z2 = -0.922y1 + 0.388y2

16
Q

If you have a point in the original axis with coordinates 1 for petal width and 1 for petal length, what would its coordinates be on the new axis?

A

You take each eigenvector’s elements times the point’s coordinates and sum them. Component 1: 0.388(1) + 0.922(1) = 1.31. Component 2: -0.922(1) + 0.388(1) = -0.534.
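
The same calculation as a matrix product in R (a sketch using the rotation matrix from the earlier card):

rot <- matrix(c(0.38772, 0.92178, -0.92178, 0.38772), nrow = 2)   # columns are the loadings
c(1, 1) %*% rot   # roughly (1.31, -0.534): the point's coordinates on the new axes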

17
Q

What assumptions should we check before carrying out a PCA to determine whether it is useful or not?

A
1. Check that R or S is not an identity matrix, using functions from the psych package.
2. Check whether the means and variances are similar. If not, it is usually best to scale the variables or analyse the correlation matrix.
18
Q

How do we check that R or S is not an identity matrix using functions from the psych package?

A

Using the KMO measure of sampling adequacy (>.7):
psych::KMO(cor(iris4D))

Or Bartlett’s test of sphericity (p < .05):
psych::cortest.bartlett(cor(iris4D), n = nrow(iris4D))

19
Q

Why use the psych package rather than the ones specified in the book?

A

Not all packages are checked for their accuracy; we know the psych package is thoroughly checked.

20
Q

How do we check if means and variances are similar?

A

apply(iris4D, 2, mean)
apply(iris4D, 2, var)

If one variable has a much higher variance it will inevitably dominate the 1st component, so you should choose to conduct the analysis with scaling.

21
Q

How do you carry out scaling?

A

In the line of code for the PCA, specify scale. = TRUE

22
Q

How do you centre your data in R?

A

Specify center = TRUE

23
Q

Write out a line of code to carry out a pca on the iris data

A

iris4Dpca <- prcomp(iris4D, center = TRUE, scale. = TRUE)   # scale. = TRUE if the variables should be scaled

24
Q

How does the prcomp function present the eigendecomposition differently?

A

It gives the square roots of the eigenvalues, which are the standard deviations of the components.
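
A sketch in R (assuming iris4D is the four numeric iris columns and the PCA is run on the correlation matrix, i.e. scale. = TRUE):

iris4D <- iris[, 1:4]
iris4Dpca <- prcomp(iris4D, center = TRUE, scale. = TRUE)
iris4Dpca$sdev^2            # squaring the standard deviations gives the eigenvalues
eigen(cor(iris4D))$values   # the same values from the correlation matrix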

25
Q

Give 4 commands to get 4 different types of data from the prcomp function

A

iris4Dpca$center #gets the original means
iris4Dpca$scale #equals original standard deviations
iris4Dpca$rotation #rotation matrix
iris4Dpca$x #transformed data

26
Q

How do you decide how many components you’re going to use?

A

Create a scree plot and look for the ‘elbow’. Proportion of variance explained (pve) = component variance explained / total variance.

27
Q

How would you plot this in R?

A

pve <- iris4Dpca$sdev^2 / sum(iris4Dpca$sdev^2)
plot(pve, type = "b")   # scree plot of the proportion of variance explained per component

28
Q

What other plot is often used with a PCA? How do you plot this in R?

A

Biplot:
biplot(iris4Dpca,
       xlabs = rep("·", nrow(iris4D)),
       scale = 0)

29
Q

How should you interpret a biplot?

A

You should look to see which dimensions (variables) load most strongly on which components by inspecting how far their arrows extend along each component axis.

30
Q

Give four reasons on why we would use PCA

A
• Dimension reduction: reduces data down to its essential components, dropping unnecessary ones. Essential when # predictors > # observations.
• Solve multicollinearity: transform correlated variables into uncorrelated components (Principal Components Regression).
• Clustering alternative: discover groups of items or people.
• Principal Axis Factoring (PAF): a PCA way of conducting Factor Analysis, conducted on a correlation matrix where the 1’s along the diagonal are replaced by communality estimates (the % of variance explained by the underlying factors).

31
Q

What is PCA’s main application in machine learning? Why is this helpful?

A

Dimension reduction

This is helpful for:
•Data visualisation
•Natural Language Processing (NLP)
•Recommender systems

32
Q

Contrast PCA and LDA/ MANOVA on transformations

A

PCA uses orthogonal linear transformations to rotate the axes, while LDA/MANOVA uses oblique linear transformations to rotate the axes.

33
Q

Contrast PCA and LDA/ MANOVA on their learning methods

A

PCA involves unsupervised learning, LDA involves supervised learning.

34
Q

Contrast PCA and LDA in regards to their relationship with variance

A

PCA maximises the explained variance for all observations, while LDA/MANOVA maximises the explained variance between group means.

35
Q

Contrast PCA and LDA in regards to their assumptions

A

PCA has no assumptions about distributions, as it’s a data transformation, not a statistical model. LDA assumes homoscedasticity and multivariate normality (although these are ignored for LDA in ML applications).

36
Q

Contrast PCA with an exploratory factor analysis in regards to their transformations, learning, assumptions, relationship with variance and the type of model

A

Both PCA and EFA use orthogonal linear transformations to rotate the axes, both use unsupervised learning, and in both the variables are assumed not to contain errors. PCA maximises the explained variance for all observations, while EFA maximises the explained variance common to all variables. PCA is not a statistical model, while EFA is a statistical model.