6. Unsupervised Learning Flashcards

1
Q

What is PCA?

A

A statistical tool that finds a low-dimensional representation of a dataset that retains as much of the information (variance) in the data as possible.
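A minimal sketch of this idea, assuming NumPy and scikit-learn are available (the random dataset and names below are purely illustrative):

  import numpy as np
  from sklearn.decomposition import PCA

  rng = np.random.default_rng(0)
  X = rng.normal(size=(100, 5))          # 100 observations, 5 features (illustrative)

  pca = PCA(n_components=2)              # keep the first two PCs
  scores = pca.fit_transform(X)          # low-dimensional representation, shape (100, 2)
  print(pca.explained_variance_ratio_)   # proportion of variance retained by each PC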

2
Q

Can PCA be used in both supervised and unsupervised settings?

A

Yes. PCA itself is unsupervised; principal components regression (PCR) uses the components in a supervised setting.

3
Q

Formula for the mth PC

A

Z_m = φ_{1m} X_1 + φ_{2m} X_2 + … + φ_{pm} X_p, where φ_{1m}, …, φ_{pm} are the loadings of the mth principal component.

4
Q

The principal component loadings are constrained to what?

A

Each loading vector is normalized: Σ_{j=1}^{p} φ_{jm}² = 1, i.e. the loadings have unit length (otherwise the variance of the component could be made arbitrarily large).

5
Q

Mth PC score formula

A

z_{im} = φ_{1m} x_{i1} + φ_{2m} x_{i2} + … + φ_{pm} x_{ip} = Σ_{j=1}^{p} φ_{jm} x_{ij}, where the x_{ij} are the centred feature values.
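A small NumPy sketch of this score formula, using the SVD of the centred data to obtain loadings (all data and names are illustrative):

  import numpy as np

  rng = np.random.default_rng(0)
  X = rng.normal(size=(10, 3))
  Xc = X - X.mean(axis=0)                      # PCA works with centred variables
  _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
  phi = Vt.T                                   # columns are the loading vectors phi_m
  z = Xc @ phi                                 # z[i, m] = sum_j phi[j, m] * x_c[i, j]
  # The first observation's score on PC1 matches the explicit sum in the formula:
  print(np.isclose(z[0, 0], sum(phi[j, 0] * Xc[0, j] for j in range(3))))   # True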

6
Q

For PCA, do the variables need to be scaled or centred?

A

Centred; the variables must have mean zero. Scaling to unit variance is optional, though often recommended when the variables are measured on different scales.

7
Q

What is the maximum number of PCs?

A

min(n - 1, p)

8
Q

What is the formula for PVE of the mth PC?

A

PVE_m = ( Σ_{i=1}^{n} z_{im}² ) / ( Σ_{j=1}^{p} Σ_{i=1}^{n} x_{ij}² ), with the variables centred: the variance explained by the mth PC divided by the total variance.
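An illustrative NumPy check of this formula (random data; all names are illustrative):

  import numpy as np

  rng = np.random.default_rng(1)
  X = rng.normal(size=(50, 4))
  Xc = X - X.mean(axis=0)                             # centre each variable
  _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
  scores = Xc @ Vt.T                                  # columns are the PC score vectors z_m
  pve = (scores ** 2).sum(axis=0) / (Xc ** 2).sum()   # PVE of each component
  print(pve, pve.sum())                               # the PVEs sum to 1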

9
Q

What is the model equation for PCR? What happens when k=p?

A

y_i = θ_0 + θ_1 z_{i1} + … + θ_k z_{ik} + ε_i, where z_{i1}, …, z_{ik} are the first k PC scores. When k = p, PCR uses all the principal components and is equivalent to ordinary least squares on the original predictors.
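A sketch of PCR as a scikit-learn pipeline, assuming scikit-learn is available (the response y and all names are illustrative):

  import numpy as np
  from sklearn.decomposition import PCA
  from sklearn.linear_model import LinearRegression
  from sklearn.pipeline import make_pipeline

  rng = np.random.default_rng(0)
  X = rng.normal(size=(100, 6))
  y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=100)               # illustrative response

  pcr = make_pipeline(PCA(n_components=3), LinearRegression())   # regress y on the first k = 3 PC scores
  pcr.fit(X, y)
  print(pcr.score(X, y))   # R^2 of the PCR fit on the training data
  # With n_components=6 (k = p), this pipeline reproduces ordinary least squares on X.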

10
Q

What are the two methods of calculating within-cluster variation?

A

  1. Pairwise distances: W(C_k) = (1/|C_k|) Σ_{i,i′ ∈ C_k} Σ_{j=1}^{p} (x_{ij} - x_{i′j})²
  2. Distances to the centroid: W(C_k) = 2 Σ_{i ∈ C_k} Σ_{j=1}^{p} (x_{ij} - x̄_{kj})², where x̄_{kj} is the mean of feature j in cluster C_k.
The two expressions are equal.
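An illustrative NumPy check that the two expressions agree (random data):

  import numpy as np

  rng = np.random.default_rng(2)
  cluster = rng.normal(size=(8, 2))                              # observations in one cluster
  diffs = cluster[:, None, :] - cluster[None, :, :]              # all pairwise differences
  pairwise = (diffs ** 2).sum() / len(cluster)                   # (1/|C_k|) * sum of squared distances
  centroid = 2 * ((cluster - cluster.mean(axis=0)) ** 2).sum()   # 2 * squared distances to the centroid
  print(np.isclose(pairwise, centroid))                          # True: the two formulas agree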

11
Q

What is the algorithm for k-means clustering?

A
  1. Randomly assign each observation to one of K clusters (K is chosen in advance); these are the initial cluster assignments.
  2. Compute the centroid of each cluster.
  3. For each observation, identify the closest centroid and reassign the observation to that cluster.
  4. Repeat steps 2 and 3 until the cluster assignments stop changing. (A minimal sketch of these steps is given below.)
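A minimal NumPy sketch of the steps above, for illustration only (it assumes no cluster becomes empty during the iterations; the example data are random):

  import numpy as np

  def kmeans(X, K, seed=0):
      rng = np.random.default_rng(seed)
      labels = rng.integers(K, size=len(X))            # step 1: random initial assignments
      while True:
          # step 2: centroid of each cluster (assumes no cluster becomes empty)
          centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])
          # step 3: reassign each observation to its closest centroid
          dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
          new_labels = dists.argmin(axis=1)
          if np.array_equal(new_labels, labels):       # step 4: stop when assignments settle
              return labels, centroids
          labels = new_labels

  rng = np.random.default_rng(1)
  X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])
  labels, centroids = kmeans(X, K=2)
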
12
Q

What are some drawbacks of k-means clustering?

A
  1. The random initial cluster assignments affect the final assignments (the algorithm finds a local optimum).
  2. The number of clusters K must be chosen in advance, which is a somewhat arbitrary process.
  3. The results are not robust to perturbations of the data.
13
Q

Does k-means need to have its variables standardized?

A

No; whether to standardize depends heavily on the problem at hand.

14
Q

Are k-means and hierarchical clustering robust?

A

No

15
Q

Is k-means clustering greedy?

A

Yes

16
Q

Centroid linkage is subject to _____, and single linkage produces a dendrogram that is _____.

A

Inversions; skewed (single observations tend to fuse one at a time, giving trailing, unbalanced clusters).

17
Q

True or false: when performing PCR, it is recommended to standardize the predictors prior to generating the principal components.

A

True. This prevents high-variance variables from dominating the principal components.

18
Q

Can PCR reduce overfitting?

A

Yes. Instead of using all of the original variables, PCR uses only the first k PCs to predict the response, which reduces overfitting.

19
Q

Is PCR useful for performing feature selection?

A

No. Each principal component is a linear combination of all p original variables, so no features are dropped.

20
Q

Are all PCA loadings unique?

A

No. Each PC loading vector is unique only up to a sign flip, so two different software packages can find the same loading vectors but with opposite signs.

They are NOT unique because each loading vector can be replaced by its negative.

21
Q

Together, do all the principal components explain 100% of the variance?

A

Yes

22
Q

Which is more restrictive in its clustering nature, k-means or hierarchical?

A

K-means is less restrictive, because hierarchical clustering must produce nested clusters as the number of clusters varies.

K-means simply uses Euclidean distances and imposes no such nested structure on the results.

23
Q

If only 3 of the 4 principal components are used in a model, will the cumulative PVE ever be 100%?

A

No; the cumulative PVE reaches 100% only if all PCs are used.

24
Q

In cluster analysis, could we cluster the observations on the basis of the features or cluster the features based on the observations?

A

Both

25
Q

Which type of clustering has fewer areas of consideration?

A

K-means; we only have to choose K.

In hierarchical clustering, we have to choose the linkage, the dissimilarity measure, and the number of clusters (the height at which to cut the dendrogram).

26
Q

True or false: using all possible principal components provides the best understanding of the data.

A

False. Usually only the first few are needed to capture most of the signal in the data.

27
Q

What is a scree plot used for? What does the graph show?

A

It provides a method for choosing the number of PCs to use. It plots the PVE of each component against the component number; we typically look for an "elbow" where the PVE drops off.
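An illustrative scree-plot sketch, assuming scikit-learn and matplotlib are available (random data):

  import numpy as np
  import matplotlib.pyplot as plt
  from sklearn.decomposition import PCA

  rng = np.random.default_rng(0)
  X = rng.normal(size=(200, 6))
  pve = PCA().fit(X).explained_variance_ratio_          # PVE of each component

  plt.plot(range(1, len(pve) + 1), pve, marker="o")
  plt.xlabel("Principal component")
  plt.ylabel("Proportion of variance explained")
  plt.show()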

28
Q

How many iterations of hierarchical clustering are done?

A

n - 1 (each step fuses two clusters, so n observations require n - 1 fusions).

29
Q

If two different PCA fits produce the same loadings up to their signs, will they give the same PC scores on the same X variables?

A

Because the loadings have the same magnitude but opposite signs, the PC scores will also have the same magnitude but opposite signs.

30
Q

Can we use categorical variables in clustering?

A

Only in hierarchical.

31
Q

True or false: k-means can identify outliers and hierarchical cannot.

A

False. Both force every observation into a cluster; outliers can greatly affect the clustering result, but neither method can identify the outliers themselves.

32
Q

Is PCA a dimension reduction tool?

A

Yes

33
Q

Which of the following are true for hierarchical clustering?
A. Performing hierarchical clustering results in n-1 fusions
B. Categorical variables can be used in the analysis.
C. Hierarchical clustering is robust.

A

A and B.

Categorical variables can be used in hierarchical clustering because it can use correlation-based distance instead of Euclidean distance; k-means cannot use categorical variables.

34
Q

Can either k means or hierarchical clustering identify outliers?

A

No, neither can.

35
Q

True or False: A hierarchical agglomerative clustering begins with a cluster of all observations, after which splits are made at each step.

A

False. That describes divisive (top-down) hierarchical clustering; agglomerative clustering is bottom-up, starting with each observation in its own cluster.

36
Q

True or false: hierarchical clustering yields better results than k-means clustering because the algorithm for hierarchical only needs to be run once.

A

False. That is an advantage of hierarchical over k-means, but it doesn't mean that hierarchical yields better results.

37
Q

If a correlation matrix shows perfect correlation between two features, what can we conclude?

A

If there are p features, the data effectively carry only p - 1 independent features, so p - 1 principal components can explain 100% of the variance in the data set.

38
Q

From a correlation matrix, can we conclude the values (positive or negative) of the principal component loadings?

A

No

39
Q

If we look at a correlation matrix and see that none of the features are highly or moderately correlated, what can we assume about the number of principal components needed?

A

More than one PC will be needed to explain a substantial amount of the variability in the data.

40
Q

Can principal components be used for supervised learning?

A

Yes, via principal components regression (PCR).

41
Q

Are the following two statements true or false?
A. The principal component score vectors have length p, while the principal component loading vectors have length n.
B. The eigenvectors of the matrix XtX are the PC directions, while the eigenvalues are the variances of the components.

A

A. False.
The PC loading vectors have length p, while the PC score vectors have length n.

B. True.
PCA involves finding the eigenvectors and eigenvalues of the matrix XᵀX (for centred X).

42
Q

True or false:
A. Absolute correlation should not be used when performing hierarchical clustering on datasets with two features
B. Euclidean distance focuses on the magnitude of observation profiles rather than their shape
C. Two observations are said to be similar if they have a large correlation-based distance

A

A. True. The absolute correlation between any two observations measured on only two features is always 1, so absolute correlation requires at least three features.

B. True.

C. False. A large correlation-based distance means the observations are not similar.

43
Q

In a biplot, the coordinates represent _______ and the lines represent _____.

A

PC scores; their values are read from the left vertical and bottom horizontal axes.
PC loading vectors; their values are read from the right vertical and top horizontal axes.

44
Q

From a dataset's correlation matrix, we see that the predictors are highly correlated. How can we determine whether the first PC's loadings are all positive, all negative, or a mix of both?

A

Since the predictors are highly positively correlated, the loadings of the first PC will be either all positive or all negative.

45
Q

True/false. The first k PC scores and the first k PC loadings provide the best approximation to the original dataset. If true, what is the formula?

A

True. x_{ij} ≈ Σ_{m=1}^{k} z_{im} φ_{jm}: the first k score vectors and loading vectors give the best k-dimensional (rank-k) approximation to the centred data matrix.
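An illustrative NumPy check of this approximation (random data; the scores and loadings are taken from the SVD of the centred data):

  import numpy as np

  rng = np.random.default_rng(3)
  X = rng.normal(size=(20, 5))
  Xc = X - X.mean(axis=0)
  _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
  k = 2
  scores = Xc @ Vt.T[:, :k]                 # first k PC score vectors (n x k)
  loadings = Vt.T[:, :k]                    # first k PC loading vectors (p x k)
  approx = scores @ loadings.T              # x_ij ~ sum_m z_im * phi_jm
  print(np.linalg.norm(Xc - approx))        # residual of the best rank-k approximation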

46
Q

True/False. PCA is still useful if the variables in the data set are uncorrelated.

A

False. If the variables are uncorrelated, each PC essentially aligns with a single original variable, so PCA achieves little or no dimension reduction.