PCA Flashcards by Sarah Oliver

What is the meaning of PCA?

Principal components analysis is concerned with explaining the covariance structure of a set of variables through a few linear combinations of the original variables. PCA re-expresses large amounts of data so that a few components account for most of the information in the data.

What is PCA used for?

It is a dimension reduction technique, and is also used as a method for identifying associations among variables.

Explain the construction and structure of the new principal components

The aim of PCA is to describe the variation in a set of correlated variables xi in terms of a new set of uncorrelated principal components yi, where the number of yi's is substantially smaller than the number of xi's. Each yi is a linear combination of the xi variables.
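
As a rough illustration of "each yi is a linear combination of the xi variables", the sketch below rebuilds the PC scores by hand from the centred data and the coefficient matrix. It assumes the built-in iris data and prcomp(), both of which appear later in this deck; the names Xc and scores are made up for the example.

```r
data(iris)
fit <- prcomp(iris[, 1:4])

# Centre each variable using the means that prcomp stored.
Xc <- scale(iris[, 1:4], center = fit$center, scale = FALSE)

# Each column of fit$rotation holds the coefficients of one PC,
# so the scores are the centred data times that matrix.
scores <- Xc %*% fit$rotation

# The hand-built scores should agree with the ones prcomp returns.
all.equal(unname(scores), unname(fit$x))
```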

What is meant by the principal components being in decreasing order of importance?

The first principal component y1 accounts for the most variation in the original data out of all of the linear combinations of the xi's, the second accounts for the most of what remains, and so on. Usually we would aim to explain 80% to 90% of the variation in the data using the principal components, and PC1 will explain a large part of that.

How do you find the eigenvalues and eigenvectors of a matrix?

Solve det(A - lambda I) = 0 for the eigenvalues lambda
Solve (A - lambda I)v = 0 for the eigenvector v belonging to each lambda
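
In practice R solves both equations numerically with eigen(). A minimal sketch, using a small symmetric matrix A invented purely for illustration:

```r
# Hypothetical 2x2 symmetric matrix chosen for illustration.
A <- matrix(c(2, 1,
              1, 2), nrow = 2, byrow = TRUE)

e <- eigen(A)
e$values    # eigenvalues, largest first (3 and 1 for this matrix)
e$vectors   # one unit-length eigenvector per column

# Check the defining relation A v = lambda v for the first pair.
v1 <- e$vectors[, 1]
all.equal(as.vector(A %*% v1), e$values[1] * v1)
```

By hand, det(A - lambda I) = (2 - lambda)^2 - 1 = 0 gives lambda = 3 or lambda = 1, which matches e$values.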

What is the meaning of the eigenvalues and eigenvectors of the covariance matrix S for a set of data in terms of PCs?

Eigenvalue j quantifies how much of the variance is accounted for by PCj: the eigenvalue is the variance of that new PC.
Eigenvector j gives the coefficients of the linear combination of the xi's which forms PCj
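
One way to check this correspondence is to compare an eigen-decomposition of S with the output of prcomp(). The sketch assumes the iris data used later in the deck; S and e are illustrative names.

```r
data(iris)
X <- iris[, 1:4]

S <- cov(X)        # sample covariance matrix
e <- eigen(S)
fit <- prcomp(X)

# Eigenvalues of S should match the variances of the PCs.
all.equal(e$values, fit$sdev^2)

# Eigenvectors should match the PC coefficients; signs may differ,
# so compare absolute values.
all.equal(abs(e$vectors), abs(unname(fit$rotation)))
```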

How much of the total variance does PC1 explain?

lambda1 / (sum of all lambdas) = proportion of total variation explained by PC1 (multiply by 100 for a percentage)
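
A short sketch of this ratio, assuming the unstandardised iris measurements as the data; lambda and prop_pc1 are illustrative names.

```r
data(iris)

# Eigenvalues of the sample covariance matrix, largest first.
lambda <- eigen(cov(iris[, 1:4]))$values

# Share of the total variation carried by PC1.
prop_pc1 <- lambda[1] / sum(lambda)
prop_pc1
```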

Define the first principal component

The first PC of a data set is the linear combination of the variables (with coefficients scaled to unit length) which has the greatest variance

What is the total variance of the data?

The sum of the lambda i's, which equals the sum of the variances of the original variables
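
The total can be computed either from the eigenvalues or from the diagonal of the covariance matrix. A small check, again assuming the iris data:

```r
data(iris)
S <- cov(iris[, 1:4])

total_from_eigen <- sum(eigen(S)$values)  # sum of the lambdas
total_from_diag  <- sum(diag(S))          # sum of the variable variances

all.equal(total_from_eigen, total_from_diag)
```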

What is a key assumption of the set of PCs

They are uncorrelated with each other
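
This can be seen in the scores that prcomp() returns. A quick sketch, assuming the iris data; score_cor is an illustrative name.

```r
data(iris)
fit <- prcomp(iris[, 1:4])

# Correlation matrix of the PC scores: the off-diagonal entries
# should be zero up to rounding error.
score_cor <- cor(fit$x)
round(score_cor, 10)
```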

In words: what is the proportion of variation explained by each PC, and how do we use this to decide how many PCs to use to describe the data?

Each eigenvalue divided by the sum of all eigenvalues gives the proportion of the variation explained by the associated principal component. The cumulative proportion of variation across the leading PCs helps to decide how many PCs to use.
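
A sketch of how the cumulative proportion might be used to pick the number of PCs. The 90% target is an assumption taken from the 80% to 90% guideline in an earlier card, and k is an illustrative name.

```r
data(iris)
fit <- prcomp(iris[, 1:4])

prop     <- fit$sdev^2 / sum(fit$sdev^2)  # proportion per PC
cum_prop <- cumsum(prop)                  # running total

# Smallest number of PCs whose cumulative proportion reaches the
# assumed 90% target.
k <- which(cum_prop >= 0.90)[1]
k
```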

What is a disadvantage of PCA?

Interpretation of the new PCs can be difficult
It gives large weight to variables that have a large range of values

In the coefficients of the linear combinations of the variables that construct the PCs, what matters in terms of signs, comparison and magnitude?

Signs on the coefficients are arbitrary, but it matters if a coefficient has the opposite sign to another element of the same PC. The magnitudes also matter.
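
The arbitrariness of the overall sign can be illustrated by flipping every coefficient of PC1: the resulting component has the same variance, so neither orientation is preferred. A sketch assuming the iris data, with illustrative names v, Xc, var_plus and var_minus:

```r
data(iris)
fit <- prcomp(iris[, 1:4])
Xc  <- scale(iris[, 1:4], center = TRUE, scale = FALSE)

v <- fit$rotation[, 1]   # coefficients of PC1

# Variance of the component built from v and from -v.
var_plus  <- as.numeric(var(Xc %*% v))
var_minus <- as.numeric(var(Xc %*% (-v)))

all.equal(var_plus, var_minus)
```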

Why might standardisation be needed in PCA?

To prevent variables with bigger variances, perhaps only because they are measured in smaller units, from being weighted more heavily than other, more important variables
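
To see the effect of units, the sketch below re-expresses one iris variable in millimetres instead of centimetres and compares the PC1 coefficients. The rescaling is a hypothetical change made only for illustration, and X_mm, before and after are made-up names.

```r
data(iris)
X <- iris[, 1:4]

# Hypothetical change of units: petal width in mm rather than cm.
X_mm <- X
X_mm$Petal.Width <- X_mm$Petal.Width * 10

before <- prcomp(X)$rotation[, 1]
after  <- prcomp(X_mm)$rotation[, 1]
round(rbind(before, after), 2)   # PC1 coefficients, cm vs mm

# With standardisation the coefficients should not depend on the units
# (compared in absolute value because signs are arbitrary).
all.equal(abs(prcomp(X, scale. = TRUE)$rotation),
          abs(prcomp(X_mm, scale. = TRUE)$rotation))
```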

What does standardisation mean and how does one do it?

Standardisation means ensuring the data are expressed in comparable units. We divide each variable by the sample standard deviation for that variable, which forces all variances to be 1. Hence we are now working with a correlation matrix instead of a covariance matrix.
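
A minimal sketch of standardising in R, assuming the iris data. Note that scale() also subtracts the mean, which does not change the variances; prcomp()'s scale. argument does the division by the standard deviation internally.

```r
data(iris)
X <- iris[, 1:4]

# Divide each (centred) variable by its sample standard deviation.
Z <- scale(X, center = TRUE, scale = TRUE)
apply(Z, 2, var)   # every variance is now 1

# PCA on standardised data corresponds to an eigen-decomposition
# of the correlation matrix.
fit_std <- prcomp(X, scale. = TRUE)
all.equal(fit_std$sdev^2, eigen(cor(X))$values)
```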

Relationship between correlation and covariance matrix

Correlation matrix = standardised covariance matrix.
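
The relationship can be checked directly with base R's cov2cor(), which rescales a covariance matrix into a correlation matrix. A sketch assuming the iris data; S and R_from_S are illustrative names.

```r
data(iris)
X <- iris[, 1:4]

S <- cov(X)
R_from_S <- cov2cor(S)   # divide each entry by the two standard deviations

all.equal(R_from_S, cor(X))
```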

Why might you want to avoid standardisation?

If the variance of a variable is an accurate representation of its importance relative to the other variables' variances, then PCA should be performed on the unstandardised data.

What is something to be cautious of when clustering

Clustering on PCs is a very common analysis, but it is not strictly correct: even though we only lose a small amount of the variability in the data by using PCs, it is not theoretically correct. Just something to be aware of.

Which R function performs principal components analysis?

prcomp(iris[,1:4])

data(iris)
fit <- prcomp(iris[,1:4])
fit

What does this R code mean?

It reads in the iris data set, performs principal components analysis on its four measurement columns, and prints the result.

summary(fit)
round(fit$rotation, 2)

What does this code mean?

It can be easier to examine a summary of the output of prcomp (stored in the fit variable).
The ‘summary’ function provides a summary of the PCA output and the ‘round’ function simply rounds the eigenvector values to 2 decimal places.

plot(fit)

What would this R code do? fit is as such:
fit <- prcomp(iris[,1:4])

It would draw a scree plot: a bar chart of the variance explained by each PC

newiris <- predict(fit)
newiris

What does this code do? fit is as such:
fit <- prcomp(iris[,1:4])

The ‘predict’ function is a generic function which produces predictions from the results of various model fitting functions. In this case it recognizes ‘fit’ as the result of a principal components analysis and calculates the values of the new PCs for each observation.

What would you expect the following code to output?
data(iris)
iris[1:10,]

data(iris) reads in the data and iris[1:10,] prints the first 10 rows of the data set

What would you expect the output of this code to be: summary(iris)

Provides a summary of the data set: for each numeric field it gives the min, max, quartiles, mean and median, and for each non-numeric field it gives a count per category

What does princomp() do?

princomp() obtains the principal components via an eigen-decomposition of the covariance matrix of the data

What does prcomp() do?

prcomp() obtains the principal components via a singular value decomposition of the data matrix
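
The two functions can be compared side by side. One practical difference worth knowing is that princomp() computes variances with divisor n while prcomp() uses n - 1, so their standard deviations differ by a constant factor. A sketch assuming the iris data; fit_eig and fit_svd are illustrative names.

```r
data(iris)
X <- iris[, 1:4]
n <- nrow(X)

fit_eig <- princomp(X)   # eigen-decomposition route
fit_svd <- prcomp(X)     # singular value decomposition route

# After allowing for the n versus n - 1 divisor, the standard
# deviations of the components should agree.
all.equal(unname(fit_eig$sdev), fit_svd$sdev * sqrt((n - 1) / n))
```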

What would you expect the output of fit <- prcomp(iris[,1:4]); fit to be?

Gives the standard deviations of all 4 components and the coefficients of each variable in the linear combinations that make up the PCs

What would you expect the output of fit <- prcomp(iris[,1:4]); summary(fit) to be?

Gives the importance of the components, detailing the standard deviation, proportion of variance and cumulative proportion for each PCi

PCA Flashcards

(29 cards)