PCA Flashcards

1
Q

What does PCA mean?

A

Principal components analysis (PCA) is concerned with explaining the covariance structure of a set of variables through a few linear combinations of the original variables. PCA re-expresses a large data set in a smaller number of new variables that account for most of the information in the data.

2
Q

What is PCA used for?

A

It is a dimension reduction technique, and it is also used as a method for identifying associations among variables.

3
Q

Explain the construction and structure of the new principal components

A

The aim of PCA is to describe the variation in a set of correlated variables xi in terms of a new set of uncorrelated principal components yi, where the number of yi’s retained is substantially smaller than the number of xi’s. Each yi is a linear combination of the xi variables.

4
Q

What is meant by the principal components being in decreasing order of importance?

A

The first principal component y1 accounts for more of the variation in the original data than any other linear combination of the xi’s, and each later PC accounts for as much of the remaining variation as possible. Usually we aim to explain 80% to 90% of the variation in the data using principal components, and PC1 will explain a large part of that.

5
Q

How do you find the eigenvalues and eigenvectors of a matrix A?

A

Solve det(A - λI) = 0 for the eigenvalues λ.
For each eigenvalue λ, solve (A - λI)v = 0 for the eigenvector v.
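
As a quick check of the two equations above, here is a minimal R sketch. The 2 x 2 matrix A is a made-up example (not from the course notes); eigen() is base R and returns the eigenvalues and unit-length eigenvectors.

```r
# Hypothetical symmetric matrix, chosen only for illustration
A <- matrix(c(2, 1,
              1, 2), nrow = 2, byrow = TRUE)

e <- eigen(A)
lambda <- e$values    # roots of det(A - lambda * I) = 0
V <- e$vectors        # column j solves (A - lambda[j] * I) v = 0

# Both defining equations should hold (up to rounding) for the first pair
det_check <- det(A - lambda[1] * diag(2))
resid <- (A - lambda[1] * diag(2)) %*% V[, 1]
```

Both det_check and every entry of resid should be zero up to floating-point error.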

6
Q

What is the meaning of the eigenvalues and eigenvectors of the covariance matrix S of a data set in terms of PCs?

A

Eigenvalue λj quantifies how much of the variance is accounted for by PCj: the eigenvalue is the variance of that PC.
Eigenvector j gives the coefficients of the linear combination of the xi’s that forms PCj.
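
A small sketch of this link in R, assuming the built-in iris data used elsewhere in these cards. It compares eigen() applied to the covariance matrix with the output of prcomp(); the variable names are only for illustration.

```r
X <- iris[, 1:4]
S <- cov(X)                # sample covariance matrix
e <- eigen(S)
fit <- prcomp(X)

# Eigenvalues of S are the variances of the PCs
var_match <- all.equal(e$values, fit$sdev^2)

# Eigenvectors of S are the PC coefficients (signs may differ, so compare magnitudes)
vec_match <- all.equal(abs(e$vectors), abs(unname(fit$rotation)))
```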

7
Q

How much of the total variance does PC1 explain?

A

λ1 / (sum of all λi) = proportion of total variation explained by PC1 (multiply by 100 for a percentage)
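
For instance, this ratio could be computed in R as follows. This is a sketch on the iris measurements, not part of the original card.

```r
# Eigenvalues of the covariance matrix of the four iris measurements
lambda <- eigen(cov(iris[, 1:4]))$values

# Share of the total variance carried by PC1
prop_pc1 <- lambda[1] / sum(lambda)
```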

8
Q

Define the first principal component

A

The first PC of a data set is the linear combination of the variables, with a coefficient vector of unit length, that has the greatest variance.
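
One way to see this numerically is to compare the variance of PC1 with the variance of many random unit-length combinations. The sketch below assumes the iris data; the seed and the 1000 random directions are arbitrary choices.

```r
X <- as.matrix(iris[, 1:4])
var_pc1 <- prcomp(X)$sdev[1]^2      # variance of the first PC

set.seed(1)                         # arbitrary seed, for reproducibility only
rand_var <- replicate(1000, {
  a <- rnorm(4)
  a <- a / sqrt(sum(a^2))           # random coefficients scaled to unit length
  var(as.vector(X %*% a))           # variance of that linear combination
})

none_beats_pc1 <- all(rand_var <= var_pc1 + 1e-12)
```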

9
Q

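What is the total variance of the data?

A

The sum of the eigenvalues λi, which equals the sum of the variances of the original variables (the trace of S).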
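
A short R check of this identity, sketched on the iris measurements:

```r
S <- cov(iris[, 1:4])

total_from_eigen <- sum(eigen(S)$values)   # sum of the eigenvalues
total_from_vars  <- sum(diag(S))           # sum of the original variances (trace of S)
```

The two totals should agree up to rounding error.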

10
Q

What is a key property of the set of PCs?

A

They are uncorrelated with each other.
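
This can be checked on the PC scores themselves. A minimal sketch, assuming the iris data and the x component that prcomp() returns:

```r
fit <- prcomp(iris[, 1:4])
scores <- fit$x                  # values of the PCs for each observation

C <- cor(scores)                 # correlation matrix of the PCs
off_diag <- C[upper.tri(C)]      # correlations between different PCs
```

Every entry of off_diag should be zero up to floating-point error.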

11
Q

In words: what is the proportion of variation explained by each PC, and how do we use this to decide how many PCs to use to describe the data?

A

Each eigenvalue divided by the sum of all eigenvalues gives the proportion of the variation explained by the associated principal component. Adding these proportions up in order gives the cumulative proportion of variation, which helps to decide how many PCs to use.
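
A sketch of that calculation in R, assuming the iris data and an 80% target (the lower end of the range quoted in an earlier card):

```r
lambda <- prcomp(iris[, 1:4])$sdev^2     # eigenvalues = PC variances

prop <- lambda / sum(lambda)             # proportion explained by each PC
cum_prop <- cumsum(prop)                 # cumulative proportion

# Smallest number of PCs whose cumulative proportion reaches the target
k <- which(cum_prop >= 0.80)[1]
```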

12
Q

What are the disadvantages of PCA?

A

Interpretation of the new PCs can be difficult.
It gives large weight to variables that have a large range of values.

13
Q

In the coefficients of the linear combinations of the variables that construct the PCs, what matters in terms of signs, comparison and magnitude?

A

The overall sign of the coefficients is arbitrary, but it matters whether two coefficients within the same PC have opposite signs, since the PC then contrasts those variables. The magnitudes also matter.
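
To illustrate why the overall sign is arbitrary, the sketch below flips the sign of the PC1 coefficients for the iris data and shows that the variance of the resulting component is unchanged. The variable names are only for illustration.

```r
fit <- prcomp(iris[, 1:4])
v <- fit$rotation[, 1]                                   # PC1 coefficients
Xc <- scale(iris[, 1:4], center = TRUE, scale = FALSE)   # centred data

var_plus  <- var(as.vector(Xc %*% v))      # variance using v
var_minus <- var(as.vector(Xc %*% (-v)))   # variance using -v
```

Flipping every sign only reverses the direction of the scores, so both versions describe the same component.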

14
Q

Why might standardisation be needed in PCA?

A

To prevent variables with larger variances, perhaps only because they are measured in smaller units, from being weighted more heavily than other, more important variables.
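
A sketch of the effect, assuming the iris data. Multiplying Sepal.Width by 100 stands in for a hypothetical change of units; it is not something done in the original notes.

```r
X <- iris[, 1:4]
X_rescaled <- X
X_rescaled$Sepal.Width <- X$Sepal.Width * 100   # hypothetical change of units

# Without standardisation, PC1 is taken over by the rescaled variable
raw_pc1      <- prcomp(X)$rotation[, 1]
rescaled_pc1 <- prcomp(X_rescaled)$rotation[, 1]

# With standardisation (scale. = TRUE), the change of units has no effect
std_pc1          <- prcomp(X, scale. = TRUE)$rotation[, 1]
std_rescaled_pc1 <- prcomp(X_rescaled, scale. = TRUE)$rotation[, 1]
```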

15
Q

What does standardisation mean and how does one do it?

A

Standardisation means expressing the data in comparable units. We divide each variable by its sample standard deviation, which forces all variances to be 1. Hence we are now working with a correlation matrix instead of a covariance matrix.
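
In R this could look like the sketch below, assuming the iris data. Note that scale() also subtracts the mean of each variable, which does not affect the variances.

```r
X <- iris[, 1:4]

# Centre each variable and divide by its sample standard deviation
Z <- scale(X, center = TRUE, scale = TRUE)

vars_after <- apply(Z, 2, var)             # should all be 1
same_matrix <- all.equal(cov(Z), cor(X))   # covariance of Z = correlation of X
```

Calling prcomp(X, scale. = TRUE) applies the same standardisation before extracting the PCs.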

16
Q

What is the relationship between the correlation matrix and the covariance matrix?

A

The correlation matrix is the covariance matrix of the standardised variables.
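
The conversion can also be done directly on the matrix: each covariance s_ij is divided by the product of the two standard deviations. A sketch on the iris data, using the base R helper cov2cor() for comparison:

```r
S <- cov(iris[, 1:4])
d <- sqrt(diag(S))               # standard deviations of the variables

R_manual  <- S / outer(d, d)     # r_ij = s_ij / (s_i * s_j)
R_builtin <- cov2cor(S)          # base R helper doing the same conversion
```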

17
Q

Why might you want to avoid standardisation?

A

If the variance of a variable is an accurate representation of its importance relative to the other variables’ variances, then PCA should be performed on the unstandardised data.

18
Q

What is something to be cautious of when clustering?

A

Clustering on PCs is a very common analysis, but it is not theoretically correct: even though we lose only a small amount of the variability in the data by using PCs, some is still lost. Just something to be aware of.

19
Q

What is the R function for principal components analysis?

A

prcomp(), for example prcomp(iris[,1:4])

20
Q

data(iris)
fit<-prcomp(iris[,1:4])
fit

What does this R code mean?

A

It loads the iris data set, performs principal components analysis on its first four columns (the numeric measurements) and prints the result.

21
Q

summary(fit)
round(fit$rotation,2)

What does this code mean?

A

It can be easier to examine a summary of the output of prcomp (stored in the fit variable).
The ‘summary’ function provides a summary of the PCA output and the ‘round’ function simply rounds the eigenvector values to 2 decimal places.

22
Q

plot(fit)

What would this R code do? fit is defined as:
fit<-prcomp(iris[,1:4])

A

It would draw a scree plot: a bar chart of the variance (eigenvalue) of each PC, rather than the proportion of variance.

23
Q

newiris<-predict(fit)
newiris

What does this code do? fit is defined as:
fit<-prcomp(iris[,1:4])

A

The ‘predict’ function is a generic function that produces predictions from the results of various model-fitting functions. In this case it recognises ‘fit’ as the result of a principal components analysis and calculates the values of the new PCs for each observation.
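
To make "calculates the values of the new PCs" concrete, the sketch below reproduces the result by hand: centre each variable with the means stored in the fit, then multiply by the coefficient matrix. It assumes the iris data; the name manual is only for illustration.

```r
fit <- prcomp(iris[, 1:4])

scores <- predict(fit)           # PC values for the original observations

# By hand: centre the data, then apply the PC coefficients
Xc <- scale(iris[, 1:4], center = fit$center, scale = FALSE)
manual <- Xc %*% fit$rotation
```

Passing a newdata argument to predict() applies the same centring and coefficients to new observations.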

24
Q

What would you expect the following code to output?
data(iris)
iris[1:10,]

A

It loads the iris data and prints the first 10 rows of the data set.

25
Q

What would you expect the output to be of this code:
summary(iris)

A

It provides a summary of the data set. For each field it gives:
the min, max, quartiles, mean and median, or a count of each level if the field is not numeric.

26
Q

What does princomp() do?

A

princomp() obtains the principal components via an eigen-decomposition of the covariance matrix of the data.

27
Q

What does prcomp() do?

A

prcomp() obtains the principal components via a singular value decomposition of the (centred) data matrix.
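
The two functions give the same components but slightly different standard deviations, because princomp() uses divisor n for the covariance matrix while prcomp() uses n - 1. A sketch on the iris data:

```r
X <- iris[, 1:4]
n <- nrow(X)

sd_prcomp   <- prcomp(X)$sdev             # based on divisor n - 1
sd_princomp <- unname(princomp(X)$sdev)   # based on divisor n

ratio <- sd_prcomp / sd_princomp          # should equal sqrt(n / (n - 1))
```

With 150 observations the difference is small, but it is worth knowing when comparing output from the two functions.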

28
Q

What would you expect the output to be of
fit<-prcomp(iris[,1:4])
fit

A

It gives the standard deviations of all 4 components and the coefficients of the linear combination of the variables that makes up each PC.

29
Q

What would you expect the output to be of
fit<-prcomp(iris[,1:4])
summary(fit)

A

It gives the importance of the components: the standard deviation, proportion of variance and cumulative proportion for each PC.
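
The same table can be pulled out as a matrix for further use. A minimal sketch, assuming the iris fit from the question; importance is the component of the summary object that holds the table.

```r
fit <- prcomp(iris[, 1:4])

imp <- summary(fit)$importance   # 3 x 4 matrix: one row per quantity, one column per PC
row_labels <- rownames(imp)

# Proportion of variance explained by PC1, read off the table
pc1_prop <- imp["Proportion of Variance", "PC1"]
```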