6. Unsupervised Learning Flashcards

1
Q

What is PCA?

A

A statistical tool that finds a low-dimensional representation of a dataset that retains as much of the information (variance) in the data as possible.
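A minimal sketch of this idea, assuming NumPy and scikit-learn are available (the random dataset and names below are purely illustrative):

  import numpy as np
  from sklearn.decomposition import PCA

  rng = np.random.default_rng(0)
  X = rng.normal(size=(100, 5))          # 100 observations, 5 features (illustrative)

  pca = PCA(n_components=2)              # keep the first two PCs
  scores = pca.fit_transform(X)          # low-dimensional representation, shape (100, 2)
  print(pca.explained_variance_ratio_)   # proportion of variance retained by each PC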

2
Q

Can PCA be used in both supervised and unsupervised settings?

A

Yes. PCA itself is unsupervised; principal components regression (PCR) uses the components in a supervised setting.

3
Q

Formula for the mth PC

A

Z_m = φ_{1m} X_1 + φ_{2m} X_2 + … + φ_{pm} X_p, where φ_{1m}, …, φ_{pm} are the loadings of the mth principal component.

4
Q

The principal component loadings are constrained to what?

A

Each loading vector is normalized: Σ_{j=1}^{p} φ_{jm}² = 1, i.e. the loadings have unit length (otherwise the variance of the component could be made arbitrarily large).

5
Q

Mth PC score formula

A

z_{im} = φ_{1m} x_{i1} + φ_{2m} x_{i2} + … + φ_{pm} x_{ip} = Σ_{j=1}^{p} φ_{jm} x_{ij}, where the x_{ij} are the centred feature values.
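A small NumPy sketch of this score formula, using the SVD of the centred data to obtain loadings (all data and names are illustrative):

  import numpy as np

  rng = np.random.default_rng(0)
  X = rng.normal(size=(10, 3))
  Xc = X - X.mean(axis=0)                      # PCA works with centred variables
  _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
  phi = Vt.T                                   # columns are the loading vectors phi_m
  z = Xc @ phi                                 # z[i, m] = sum_j phi[j, m] * x_c[i, j]
  # The first observation's score on PC1 matches the explicit sum in the formula:
  print(np.isclose(z[0, 0], sum(phi[j, 0] * Xc[0, j] for j in range(3))))   # True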

6
Q

For PCA, do the variables need to be scaled or centred?

A

Centred; the variables must have mean zero. Scaling to unit variance is optional, though often recommended when the variables are measured on different scales.

7
Q

What is the maximum number of PCs?

A

min(n - 1, p)

8
Q

What is the formula for PVE of the mth PC?

A

PVE_m = ( Σ_{i=1}^{n} z_{im}² ) / ( Σ_{j=1}^{p} Σ_{i=1}^{n} x_{ij}² ), with the variables centred: the variance explained by the mth PC divided by the total variance.
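An illustrative NumPy check of this formula (random data; all names are illustrative):

  import numpy as np

  rng = np.random.default_rng(1)
  X = rng.normal(size=(50, 4))
  Xc = X - X.mean(axis=0)                             # centre each variable
  _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
  scores = Xc @ Vt.T                                  # columns are the PC score vectors z_m
  pve = (scores ** 2).sum(axis=0) / (Xc ** 2).sum()   # PVE of each component
  print(pve, pve.sum())                               # the PVEs sum to 1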

9
Q

What is the model equation for PCR? What happens when k=p?

A

y_i = θ_0 + θ_1 z_{i1} + … + θ_k z_{ik} + ε_i, where z_{i1}, …, z_{ik} are the first k PC scores. When k = p, PCR uses all the principal components and is equivalent to ordinary least squares on the original predictors.
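A sketch of PCR as a scikit-learn pipeline, assuming scikit-learn is available (the response y and all names are illustrative):

  import numpy as np
  from sklearn.decomposition import PCA
  from sklearn.linear_model import LinearRegression
  from sklearn.pipeline import make_pipeline

  rng = np.random.default_rng(0)
  X = rng.normal(size=(100, 6))
  y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=100)               # illustrative response

  pcr = make_pipeline(PCA(n_components=3), LinearRegression())   # regress y on the first k = 3 PC scores
  pcr.fit(X, y)
  print(pcr.score(X, y))   # R^2 of the PCR fit on the training data
  # With n_components=6 (k = p), this pipeline reproduces ordinary least squares on X.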

10
Q

What are the two methods of calculating within-cluster variation?

A

  1. Pairwise distances: W(C_k) = (1/|C_k|) Σ_{i,i′ ∈ C_k} Σ_{j=1}^{p} (x_{ij} - x_{i′j})²
  2. Distances to the centroid: W(C_k) = 2 Σ_{i ∈ C_k} Σ_{j=1}^{p} (x_{ij} - x̄_{kj})², where x̄_{kj} is the mean of feature j in cluster C_k.
The two expressions are equal.
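An illustrative NumPy check that the two expressions agree (random data):

  import numpy as np

  rng = np.random.default_rng(2)
  cluster = rng.normal(size=(8, 2))                              # observations in one cluster
  diffs = cluster[:, None, :] - cluster[None, :, :]              # all pairwise differences
  pairwise = (diffs ** 2).sum() / len(cluster)                   # (1/|C_k|) * sum of squared distances
  centroid = 2 * ((cluster - cluster.mean(axis=0)) ** 2).sum()   # 2 * squared distances to the centroid
  print(np.isclose(pairwise, centroid))                          # True: the two formulas agree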

11
Q

What is the algorithm for k-means clustering?

A
  1. Randomly assign each observation to one of K clusters (K is chosen in advance); these are the initial cluster assignments.
  2. Compute the centroid of each cluster.
  3. For each observation, identify the closest centroid and reassign the observation to that cluster.
  4. Repeat steps 2 and 3 until the cluster assignments stop changing. (A minimal sketch of these steps is given below.)
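A minimal NumPy sketch of the steps above, for illustration only (it assumes no cluster becomes empty during the iterations; the example data are random):

  import numpy as np

  def kmeans(X, K, seed=0):
      rng = np.random.default_rng(seed)
      labels = rng.integers(K, size=len(X))            # step 1: random initial assignments
      while True:
          # step 2: centroid of each cluster (assumes no cluster becomes empty)
          centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])
          # step 3: reassign each observation to its closest centroid
          dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
          new_labels = dists.argmin(axis=1)
          if np.array_equal(new_labels, labels):       # step 4: stop when assignments settle
              return labels, centroids
          labels = new_labels

  rng = np.random.default_rng(1)
  X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])
  labels, centroids = kmeans(X, K=2)
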
12
Q

What are some drawbacks of k-means clustering?

A
  1. The random initial cluster assignments affect the final assignments (the algorithm finds a local optimum).
  2. The number of clusters K must be chosen in advance, which is a somewhat arbitrary process.
  3. The results are not robust to perturbations of the data.
13
Q

Does k-means need to have its variables standardized?

A

No; whether to standardize depends heavily on the problem at hand.

14
Q

Are k-means and hierarchical clustering robust?

A

No

15
Q

Is k-means clustering greedy?

A

Yes

16
Q

Centroid linkage is subject to _____, and single linkage produces a dendrogram that is _____.

A

Inversions; skewed (single observations tend to fuse one at a time, giving trailing, unbalanced clusters).

17
Q

True or false: when performing PCR, it is recommended to standardize the predictors prior to generating the principal components.

A

True. This prevents high-variance variables from dominating the principal components.

18
Q

Can PCR reduce overfitting?

A

Yes. Instead of using all of the original variables, PCR uses only the first k PCs to predict the response, which reduces overfitting.

19
Q

Is PCR useful for performing feature selection?

A

No. Each principal component is a linear combination of all p original variables, so no features are dropped.

20
Q

Are all PCA loadings unique?

A

No. Each PC loading vector is unique only up to a sign flip, so two different software packages can find the same loading vectors but with opposite signs.

They are NOT unique because each loading vector can be replaced by its negative.

21
Q

Together, do all the principal components explain 100% of the variance?

A

Yes

22
Q

Which is more restrictive in its clustering nature, k-means or hierarchical?

A

K-means is less restrictive, because hierarchical clustering must produce nested clusters as the number of clusters varies.

K-means simply uses Euclidean distances and imposes no such nested structure on the results.

23
Q

If only 3 of the 4 principal components are used in a model, will the cumulative PVE ever be 100%?

A

No; the cumulative PVE reaches 100% only if all PCs are used.

24
Q

In cluster analysis, could we cluster the observations on the basis of the features or cluster the features based on the observations?

A

Both

25
Q

Which type of clustering has fewer areas of consideration?

A

K-means; we only have to choose K.

In hierarchical clustering, we have to choose the linkage, the dissimilarity measure, and the number of clusters (the height at which to cut the dendrogram).

26
Q

True or false: using all possible principal components provides the best understanding of the data.

A

False. Usually only the first few are needed to capture most of the signal in the data.

27
Q

What is a scree plot used for? What does the graph show?

A

It provides a method for choosing the number of PCs to use. It plots the PVE of each component against the component number; we typically look for an "elbow" where the PVE drops off.
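An illustrative scree-plot sketch, assuming scikit-learn and matplotlib are available (random data):

  import numpy as np
  import matplotlib.pyplot as plt
  from sklearn.decomposition import PCA

  rng = np.random.default_rng(0)
  X = rng.normal(size=(200, 6))
  pve = PCA().fit(X).explained_variance_ratio_          # PVE of each component

  plt.plot(range(1, len(pve) + 1), pve, marker="o")
  plt.xlabel("Principal component")
  plt.ylabel("Proportion of variance explained")
  plt.show()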

28
Q

How many iterations of hierarchical clustering are done?

A

n - 1 (each step fuses two clusters, so n observations require n - 1 fusions).

29
Q

If two different PCA fits produce the same loadings up to their signs, will they give the same PC scores on the same X variables?

A

Because the loadings have the same magnitude but opposite signs, the PC scores will also have the same magnitude but opposite signs.

30
Q

Can we use categorical variables in clustering?

A

Only in hierarchical.

31
Q

True or false: k-means can identify outliers and hierarchical cannot.

A

False. Both force every observation into a cluster; outliers can greatly affect the clustering result, but neither method can identify the outliers themselves.

32
Q

Is PCA a dimension reduction tool?

A

Yes

33
Q

Which of the following are true for hierarchical clustering?
A. Performing hierarchical clustering results in n-1 fusions
B. Categorical variables can be used in the analysis.
C. Hierarchical clustering is robust.

A

A and B.

Categorical variables can be used in hierarchical clustering because it can use correlation-based distance instead of Euclidean distance; k-means cannot use categorical variables.

34
Q

Can either k means or hierarchical clustering identify outliers?

A

No, neither can.

35
Q

True or False: A hierarchical agglomerative clustering begins with a cluster of all observations, after which splits are made at each step.

A

False. That describes divisive (top-down) hierarchical clustering; agglomerative clustering is bottom-up, starting with each observation in its own cluster.

36
Q

True or false: hierarchical clustering yields better results than k-means clustering because the algorithm for hierarchical only needs to be run once.

A

False. That is an advantage of hierarchical over k-means, but it doesn't mean that hierarchical yields better results.

37
Q

If a correlation matrix shows perfect correlation between two features, what can we conclude?

A

If there are p features, the data effectively carry only p - 1 independent features, so p - 1 principal components can explain 100% of the variance in the data set.

38
Q

From a correlation matrix, can we conclude the values (positive or negative) of the principal component loadings?

A

No

39
Q

If we look at a correlation matrix and see that none of the features are highly or moderately correlated, what can we assume about the number of principal components needed?

A

More than one PC will be needed to explain a substantial amount of the variability in the data.

40
Q

Can principal components be used for supervised learning?

A

Yes, via principal components regression (PCR).

41
Q

Are the following two statements true or false?
A. The principal component score vectors have length p, while the principal component loading vectors have length n.
B. The eigenvectors of the matrix XtX are the PC directions, while the eigenvalues are the variances of the components.

A

A. False.
The PC loading vectors have length p, while the PC score vectors have length n.

B. True.
PCA involves finding the eigenvectors and eigenvalues of the matrix XᵀX (for centred X).

42
Q

True or false:
A. Absolute correlation should not be used when performing hierarchical clustering on datasets with two features
B. Euclidean distance focuses on the magnitude of observation profiles rather than their shape
C. Two observations are said to be similar if they have a large correlation-based distance

A

A. True. The absolute correlation between any two observations measured on only two features is always 1, so absolute correlation requires at least three features.

B. True.

C. False. A large correlation-based distance means the observations are not similar.

43
Q

In a biplot, the coordinates represent _______ and the lines represent _____.

A

PC scores; their values are read from the left vertical and bottom horizontal axes.
PC loading vectors; their values are read from the right vertical and top horizontal axes.

44
Q

From a dataset's correlation matrix, we see that the predictors are highly correlated. How can we determine whether the first PC's loadings are all positive, all negative, or a mix of both?

A

Since the predictors are highly positively correlated, the loadings of the first PC will be either all positive or all negative.

45
Q

True/false. The first k PC scores and the first k PC loadings provide the best approximation to the original dataset. If true, what is the formula?

A

True. x_{ij} ≈ Σ_{m=1}^{k} z_{im} φ_{jm}: the first k score vectors and loading vectors give the best k-dimensional (rank-k) approximation to the centred data matrix.
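An illustrative NumPy check of this approximation (random data; the scores and loadings are taken from the SVD of the centred data):

  import numpy as np

  rng = np.random.default_rng(3)
  X = rng.normal(size=(20, 5))
  Xc = X - X.mean(axis=0)
  _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
  k = 2
  scores = Xc @ Vt.T[:, :k]                 # first k PC score vectors (n x k)
  loadings = Vt.T[:, :k]                    # first k PC loading vectors (p x k)
  approx = scores @ loadings.T              # x_ij ~ sum_m z_im * phi_jm
  print(np.linalg.norm(Xc - approx))        # residual of the best rank-k approximation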

46
Q

True/False. PCA is still useful if the variables in the data set are uncorrelated.

A

False. If the variables are uncorrelated, each PC essentially aligns with a single original variable, so PCA achieves little or no dimension reduction.