Principal Component Analysis Flashcards

1
Q

What is PCA?

A

An unsupervised method for reducing the dimension of a data set, often used to visualize it.

Principal components are linear combinations of variables.

2
Q

In PCA, we subtract the mean from each xij so that the mean of each variable is 0. Does this affect the variance of each Xi?

A

No.
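
A quick check of this on made-up numbers, using only the standard library. The values in `x` are an assumption chosen for illustration; any data would behave the same way.

```python
import statistics

x = [2.0, 4.0, 4.0, 7.0, 9.0]        # made-up values of one variable
mean_x = statistics.fmean(x)
centered = [v - mean_x for v in x]   # subtract the mean from each value

var_before = statistics.variance(x)          # sample variance of the raw values
var_after = statistics.variance(centered)    # sample variance after centering
mean_after = statistics.fmean(centered)      # should be (numerically) zero
```

Centering shifts every value by the same constant, so the spread around the mean is unchanged while the mean itself moves to 0.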

3
Q

For the first principal component (Z1), are the loadings selected to maximize or minimize the variance of Z1?

A

Maximize.

4
Q

How are the loadings for the first principal component selected?

A

Maximize the variance of Z1 subject to the constraint that the sum of the squares of the loadings is 1.
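
One standard way to solve this constrained maximization is to take the eigenvector of the sample covariance matrix with the largest eigenvalue. The sketch below assumes NumPy and simulated data; the names `phi1`, `z1` and `trial_vars` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated data: 200 observations of 3 correlated variables (made up)
X = rng.normal(size=(200, 3)) @ np.array([[2.0, 0.5, 0.0],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.2]])
Xc = X - X.mean(axis=0)                  # center each variable
cov = np.cov(Xc, rowvar=False)           # sample covariance matrix

eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
phi1 = eigvecs[:, -1]                    # loadings of the first component
z1 = Xc @ phi1                           # scores on the first component
var_z1 = z1.var(ddof=1)

# Compare against many random unit-length loading vectors
trials = rng.normal(size=(1000, 3))
trials /= np.linalg.norm(trials, axis=1, keepdims=True)
trial_vars = (Xc @ trials.T).var(axis=0, ddof=1)
```

Here `phi1` has unit length, `var_z1` equals the largest eigenvalue, and no random unit vector in `trials` yields a larger variance.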

5
Q

What is the score of an observation?

A
  1. Project the observation perpendicularly onto the principal component line.
    The score of the observation is the signed distance from that projection point to the (0,0) coordinate.
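
The projection can be sketched in two dimensions as follows. The loading vector `phi` and the centered observation `x` are hypothetical values picked so the arithmetic is easy to follow.

```python
import numpy as np

phi = np.array([0.6, 0.8])     # hypothetical unit-length loading vector
x = np.array([2.0, 1.0])       # one centered observation (made up)

score = x @ phi                # position of the projection along the line
projection = score * phi       # foot of the perpendicular on the line
residual = x - projection      # part of x perpendicular to the line
```

The length of `projection` equals the absolute value of `score`, and `residual` is perpendicular to `phi`. An observation on the other side of the origin, such as `-x`, gets a negative score.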
6
Q

How are all other principal components defined?

A

Each one maximizes the variance of the component subject to two constraints: the sum of the squares of its loadings is 1, and it is uncorrelated with all previous components.
(Equivalently, each loading vector is orthogonal to the loading vectors of all previous principal components.)
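
A minimal sketch of these two properties, assuming NumPy and simulated data: taking all eigenvectors of the covariance matrix, sorted by decreasing eigenvalue, gives loading vectors that are mutually orthogonal and scores that are mutually uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulated data: 300 observations of 4 correlated variables (made up)
X = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4))
Xc = X - X.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]        # largest variance first
loadings = eigvecs[:, order]             # column m holds the loadings of Z(m+1)

Z = Xc @ loadings                        # scores on every component
score_cov = np.cov(Z, rowvar=False)      # off-diagonal entries should be ~0
```

The diagonal of `score_cov` holds the component variances in decreasing order, and `loadings.T @ loadings` is the identity matrix up to rounding.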

7
Q

What is a biplot and what is it used for?

A

A biplot plots two things on the same figure, one read against the bottom and left axes and the other against the top and right axes.
For PCA, it shows the principal component scores of the observations together with the loading vectors of the variables.

8
Q

True or false: principal components are the best linear approximation of the observations.

A

True

9
Q

Does the scale of the variables matter in linear regression, in principal components analysis, or in both?

A

Principal components only. If a variable is multiplied by a constant greater than one, its variance increases and PCA puts a higher loading on that variable in order to maximize variance. In linear regression the coefficient rescales to compensate, so the fit is unchanged.
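
The contrast can be checked on simulated data. In this sketch (NumPy assumed; `first_loading` and `fitted` are hypothetical helper names), the second variable is multiplied by 100: the first loading vector swings toward that variable, while the least-squares fitted values stay the same.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)     # correlated with x1, similar variance
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.1, size=n)

def first_loading(X):
    """Loadings of the first principal component of the centered data."""
    Xc = X - X.mean(axis=0)
    _, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    return vecs[:, -1]

def fitted(X, y):
    """Least-squares fitted values with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ beta

X = np.column_stack([x1, x2])
X_scaled = np.column_stack([x1, 100.0 * x2])  # rescale the second variable

load_orig = first_loading(X)         # roughly equal weight on both variables
load_scaled = first_loading(X_scaled)  # almost all weight on the rescaled one
fit_orig = fitted(X, y)
fit_scaled = fitted(X_scaled, y)     # same fitted values as fit_orig
```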

10
Q

Why are variables usually scaled?

A

To avoid giving some variables spurious importance.

11
Q

How does the variance of a variable affect its approximation using PCA?

A

The higher the variance of the variable is, the better the approximation.
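
A small illustration on simulated, unscaled data (NumPy assumed). Two independent variables are generated with standard deviations 5 and 1, then each is approximated using only the first principal component; `rel_error` is a hypothetical name for the share of each variable's sum of squares that the approximation misses.

```python
import numpy as np

rng = np.random.default_rng(5)
# Two independent variables with very different variances (made up)
X = rng.normal(size=(400, 2)) * np.array([5.0, 1.0])
Xc = X - X.mean(axis=0)

_, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
phi1 = vecs[:, -1]                       # first loading vector
approx = np.outer(Xc @ phi1, phi1)       # rank-one approximation of Xc

# Per-variable share of the sum of squares left unexplained
rel_error = ((Xc - approx) ** 2).sum(axis=0) / (Xc ** 2).sum(axis=0)
```

The high-variance variable is reproduced almost exactly, while the low-variance one is barely captured by the first component.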

12
Q

The vector of loadings is unique up to sign. What does this mean?

A

One may obtain an equivalent solution by flipping the sign on all of the loadings. Flipping the sign of the loading vector results in flipping the sign on all of the scores.
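
This can be verified directly on simulated data (NumPy assumed): negating the loading vector negates every score and leaves the variance of the component unchanged.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))            # made-up data
Xc = X - X.mean(axis=0)

_, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
phi = vecs[:, -1]                        # first loading vector

z = Xc @ phi                             # scores with the original sign
z_flipped = Xc @ (-phi)                  # scores with the sign flipped
```

Which sign a given routine returns is an implementation detail, so two programs can report loading vectors of opposite sign for the same data.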

13
Q

What are the loadings?

A

The loadings indicate the direction of the principal component - direction is not affected by reversing the sign.

14
Q

Why can’t cross-validation be used to determine the number of principal components that should be used?

A

PCA is an unsupervised method: there is no response variable, so there is no prediction error for cross-validation to estimate.

15
Q

What methods can be used to determine the number of principal components that should be used?

A
  1. Cumulative proportion of variance explained: compute the proportion of variance explained by the first M principal components and select M so that a specified proportion is reached.
  2. Scree plot: plot the proportion of variance explained by each principal component, i.e. the points (m, PVEm), and look for the “elbow” point.
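
Both quantities come from the eigenvalues of the covariance matrix. The sketch below assumes NumPy, simulated data, and an arbitrary 90% target (`threshold`); none of these come from the cards themselves.

```python
import numpy as np

rng = np.random.default_rng(4)
# Simulated data: 5 independent variables with decreasing spread (made up)
X = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.2])
Xc = X - X.mean(axis=0)

# Component variances, largest first
eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]

pve = eigvals / eigvals.sum()      # PVE of each component (scree plot heights)
cum_pve = np.cumsum(pve)           # cumulative PVE of the first M components

threshold = 0.90                   # hypothetical target proportion
# Smallest M whose cumulative PVE reaches the threshold
M = int(np.searchsorted(cum_pve, threshold) + 1)
```

Plotting `pve` against the component index 1, 2, ... gives the scree plot; `M` is the answer from the cumulative rule.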
16
Q

What is the elbow point in a scree plot?

A

Point at which the plot bends so much that very little of the variance is explained by further principal components.