Lecture 5: Principal Component Analysis Flashcards

1
Q

What does a PCA attempt to do?

A

Attempts to explain variables in as few new ‘components’ as possible

2
Q

How does a PCA attempt to explain variables in as few new ‘components’ as possible? What are these components?

A

It transforms the variables by rotating the axes; the rotated axes are the new components.
• Axes are chosen to maximise explained variance.
• The total variance of the new components is the same as the total variance of the original variables.

3
Q

Explain the process of axis rotation

A

First we draw a line through the data points in p-dimensional space that accounts for the most variance; you could describe this as the line that describes the data best, i.e. lies closest to the data points. The axes are rotated so that this line becomes the first component. For the second component we find a line that is perpendicular to the first and explains the most of the remaining variance (if the data are 2D this is simply the perpendicular line). If there are more than two dimensions we repeat this step until there are p components, each time finding a line through the data that is perpendicular to all previous components and explains as much of the remaining variance as possible.

In other words:
1. Maximises amount of variance accounted for by the 1st (principal) component.
2. For each subsequent component: maximises variance accounted for as long as:
• component is orthogonal (perpendicular, uncorrelated) to all previous components
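
A minimal sketch of this in R (assuming, as in the later cards, that iris4D holds the four numeric iris measurements):

# assumption: iris4D is the four numeric columns of the iris data
iris4D <- iris[, 1:4]
pca <- prcomp(iris4D, center = TRUE, scale. = TRUE)
round(crossprod(pca$rotation), 3)   # identity matrix: the components are orthogonal
pca$sdev^2                          # component variances decrease: PC1 explains the most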

4
Q

How is the maximum number of components decided?

A

The number of dimensions/ variables

5
Q

How are the different principal components related to each other?

A

They’re not related: the components are orthogonal (uncorrelated), and each principal component loading vector is unique (up to a sign flip).

6
Q

How does total variance differ following a PCA?

A

It stays the same
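
A quick sketch in R (assuming iris4D is the four numeric iris columns) that the total variance is unchanged:

iris4D <- iris[, 1:4]
pca <- prcomp(iris4D, center = TRUE)   # no scaling, so variances stay comparable
sum(apply(iris4D, 2, var))             # total variance of the original variables
sum(pca$sdev^2)                        # the same total variance across the components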

7
Q

What is meant by the eigenvectors in terms of PCA?

A

The orientation of the components (lines through the data)

8
Q

Finish this from the first lecture:

All p x p matrices A have _______ for which __ = __

A

All p x p matrices A have associated scalars λ_j and vectors x_j for which A x_j = λ_j x_j

9
Q

In regards to PCA, what do the variables represent in the following equation?

A x_j = λ_j x_j

A

A represents an R (correlation) or S (covariance) matrix. λ_j is the eigenvalue of component j: how much variance that component explains. x_j is the eigenvector: how much each variable in the dataset loads onto the component (e.g. 40% of petal width and 90% of petal length).
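
A small sketch of this eigendecomposition in R (assuming iris4D is the four numeric iris columns):

iris4D <- iris[, 1:4]
e <- eigen(cor(iris4D))
e$values    # eigenvalues: variance explained by each component
e$vectors   # eigenvectors: loadings of each variable on each component
# check A x_j = lambda_j x_j for the first component
all.equal(as.vector(cor(iris4D) %*% e$vectors[, 1]), e$values[1] * e$vectors[, 1])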

10
Q

How can we project data onto the new component axis?

A

We can project the data onto the new component axes using a weighted sum of the original variables, with the weights being the eigenvectors.
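
As a sketch in R (assuming iris4D is the four numeric iris columns), the projection is the centred data multiplied by the eigenvectors:

iris4D <- iris[, 1:4]
V <- eigen(cov(iris4D))$vectors                        # eigenvectors of S
scores <- scale(iris4D, center = TRUE, scale = FALSE) %*% V
head(scores)                                           # matches prcomp(iris4D)$x up to sign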

11
Q

Describe two important constraints when determining these component vectors

A
1. We choose the eigenvector that maximises the variance accounted for.
2. The squared elements of an eigenvector add up to one.
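
The second constraint is easy to check in R (a sketch, assuming iris4D is the four numeric iris columns):

iris4D <- iris[, 1:4]
V <- eigen(cor(iris4D))$vectors
colSums(V^2)   # each eigenvector's squared elements sum to one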
12
Q

Should you centre your variables in a PCA? What happens if you do or don’t?

A

You should always centre your variables, and if the variances differ greatly you can also scale them by their standard deviations. If you don’t centre your variables you will have to re-centre the data after the PCA so that the origin of the new components lies at the new zero point.

13
Q

The eigenvectors are also known as the _____ and tell us how _____

A

The eigenvectors are also known as the rotation matrix and tell us how much to rotate the original axes.

14
Q

What does this matrix look like and how is it used?

A
The matrix of eigenvectors may look like this in R:
(0.38772  -0.92178)
(0.92178    0.38772)
The rotation matrix template looks like:
(cosθ  -sinθ)
(sinθ    cosθ)

Since 0.388 represents the cosine of an angle, we take the arccosine to solve for the angle θ. arccos(0.388) is about 67 degrees, so we rotate the axes counterclockwise by about 67 degrees.
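
In R the angle can be recovered directly from the value quoted above:

acos(0.38772) * 180 / pi   # about 67 degrees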

15
Q

What is the next step after we have rotated the axis?

A

Plug the values into the components to get the formulas for the new axes z1 and z2. Each component is built from one column of the rotation matrix.
Component 1: z1 = 0.388y1 + 0.922y2
Component 2: z2 = -0.922y1 + 0.388y2

16
Q

If you have a point in the original axis with coordinates 1 for petal width and 1 for petal length, what would its coordinates be on the new axis?

A

You take each eigenvector’s elements times the point’s coordinates and sum them. Component 1: 0.388(1) + 0.922(1) = 1.31. Component 2: -0.922(1) + 0.388(1) = -0.534.
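
The same calculation as a matrix product in R (a sketch using the rotation matrix from the earlier card):

rot <- matrix(c(0.38772, 0.92178, -0.92178, 0.38772), nrow = 2)   # columns are the loadings
c(1, 1) %*% rot   # roughly (1.31, -0.534): the point's coordinates on the new axes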

17
Q

What assumptions should we check before carrying out a PCA to determine whether it is useful or not?

A
1. Check that R or S is not an identity matrix, using functions from the psych package.
2. Check whether the means and variances are similar. If not, it is usually best to scale the variables or analyse the correlation matrix.
18
Q

How do we check that R or S is not an identity matrix using functions from the psych package?

A

Using the KMO measure of sampling adequacy (>.7):
psych::KMO(cor(iris4D))

Or Bartlett’s test of sphericity (p < .05):
psych::cortest.bartlett(cor(iris4D), n = nrow(iris4D))

19
Q

Why use the psych package rather than the ones specified in the book?

A

Not all packages are checked for their accuracy; we know the psych package is thoroughly checked.

20
Q

How do we check if means and variances are similar?

A

apply(iris4D, 2, mean)
apply(iris4D, 2, var)

If one variable has a much higher variance it will inevitably dominate the 1st component, so you should choose to conduct the analysis with scaling.

21
Q

How do you carry out scaling?

A

In the line of code for the PCA, specify scale. = TRUE

22
Q

How do you centre your data in R?

A

Specify center = TRUE

23
Q

Write out a line of code to carry out a pca on the iris data

A

iris4Dpca <- prcomp(iris4D, center = TRUE, scale. = TRUE)   # scale. = TRUE if the variables should be scaled

24
Q

How does the prcomp function present the eigendecomposition differently?

A

It gives the square roots of the eigenvalues, which are the standard deviations of the components.
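
A sketch in R (assuming iris4D is the four numeric iris columns and the PCA is run on the correlation matrix, i.e. scale. = TRUE):

iris4D <- iris[, 1:4]
iris4Dpca <- prcomp(iris4D, center = TRUE, scale. = TRUE)
iris4Dpca$sdev^2            # squaring the standard deviations gives the eigenvalues
eigen(cor(iris4D))$values   # the same values from the correlation matrix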

25
Q

Give 4 commands to get 4 different types of data from the prcomp function

A

iris4Dpca$center #gets the original means
iris4Dpca$scale #equals original standard deviations
iris4Dpca$rotation #rotation matrix
iris4Dpca$x #transformed data

26
Q

How do you decide how many components you’re going to use?

A

Create a scree plot and look for the ‘elbow’. Proportion of variance explained (pve) = component variance explained / total variance.

27
Q

How would you plot this in R?

A

pve <- iris4Dpca$sdev^2 / sum(iris4Dpca$sdev^2)
plot(pve, type = "b")   # scree plot of the proportion of variance explained per component

28
Q

What other plot is often used with a PCA? How do you plot this in R?

A

Biplot:
biplot(iris4Dpca,
       xlabs = rep("·", nrow(iris4D)),
       scale = 0)

29
Q

How should you interpret a biplot?

A

You should look to see which dimensions (variables) load most strongly on which components by inspecting how far their arrows extend along each component axis.

30
Q

Give four reasons on why we would use PCA

A
• Dimension reduction: reduces data down to its essential components, dropping unnecessary ones. Essential when # predictors > # observations.
• Solve multicollinearity: transform correlated variables into uncorrelated components (Principal Components Regression).
• Clustering alternative: discover groups of items or people.
• Principal Axis Factoring (PAF): a PCA way of conducting Factor Analysis, conducted on a correlation matrix where the 1’s along the diagonal are replaced by communality estimates (the % of variance explained by the underlying factors).

31
Q

What is PCA’s main application in machine learning? Why is this helpful?

A

Dimension reduction

This is helpful for:
•Data visualisation
•Natural Language Processing (NLP)
•Recommender systems

32
Q

Contrast PCA and LDA/ MANOVA on transformations

A

PCA uses orthogonal linear transformations to rotate the axes, while LDA/MANOVA uses oblique linear transformations to rotate the axes.

33
Q

Contrast PCA and LDA/ MANOVA on their learning methods

A

PCA involves unsupervised learning, LDA involves supervised learning.

34
Q

Contrast PCA and LDA in regards to their relationship with variance

A

PCA maximises the explained variance for all observations, while LDA/MANOVA maximises the explained variance between group means.

35
Q

Contrast PCA and LDA in regards to their assumptions

A

PCA has no assumptions about distributions, as it’s a data transformation, not a statistical model. LDA assumes homoscedasticity and multivariate normality (although these are ignored for LDA in ML applications).

36
Q

Contrast PCA with an exploratory factor analysis in regards to their transformations, learning, assumptions, relationship with variance and the type of model

A

Both PCA and EFA use orthogonal linear transformations to rotate the axes, both use unsupervised learning, and in both the variables are assumed not to contain errors. PCA maximises the explained variance for all observations, while EFA maximises the explained variance common to all variables. PCA is not a statistical model, while EFA is a statistical model.