week 4 - unsupervised learning Flashcards
How does unsupervised k-means clustering work?
It finds clusters in a dataset that has no labels.
First, initialise the cluster centres randomly.
Then assign each data point to the nearest centre, recalculate each centre as the mean of its cluster, and reassign the data points to the nearest centre.
Repeat until a convergence criterion is met (e.g. the centre shifts become very small).
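A minimal NumPy sketch of these steps (the function name, parameters, and defaults are my own; this is illustrative, not an optimised implementation):

```python
import numpy as np

def kmeans(X, k, n_iters=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: initialise cluster centres randomly (here: k distinct data points).
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign each data point to the nearest centre.
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recalculate each centre as the mean of its cluster.
        new_centres = centres.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centre if a cluster empties out
                new_centres[j] = members.mean(axis=0)
        # Step 4: stop once the centre shifts become very small.
        if np.linalg.norm(new_centres - centres) < tol:
            break
        centres = new_centres
    return labels, centres

# Example usage on toy data:
labels, centres = kmeans(np.random.default_rng(1).normal(size=(200, 2)), k=3)
```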
How can we figure out whether a k-means clustering solution is good?
We can use the average silhouette coefficient.
This coefficient is based on intra- and inter-cluster distances.
It measures whether the data points within a cluster are close to the other points in that cluster but far away from points outside the cluster.
Each point's silhouette coefficient can indicate whether it has been assigned to the wrong cluster: a value near -1 means probably misassigned, near 0 means on a cluster boundary, and near 1 means well clustered.
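A quick sketch of checking this with scikit-learn's silhouette_score and silhouette_samples (the toy data here is just random noise standing in for a real dataset):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, silhouette_samples

X = np.random.default_rng(0).normal(size=(200, 2))  # placeholder data

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Average silhouette coefficient for the whole clustering solution.
print(silhouette_score(X, labels))

# Per-point coefficients: near -1 = probably misassigned,
# near 0 = on a cluster boundary, near 1 = well clustered.
per_point = silhouette_samples(X, labels)
print((per_point < 0).sum(), "points look misassigned")
```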
Why might we want to decompose highly correlated variables?
Because we might be able to capture a similar amount of information with just one dimension.
So dimensionality reduction can mitigate the curse of dimensionality whilst retaining most of the information.
What is the difference between regression and PCA?
Regression minimises the vertical errors (residuals) when predicting y from x.
PCA fits a line that minimises the perpendicular distances between the points and the line. Rather than predicting y from x, this line captures the variance in the data.
Also note that in higher dimensions, PCA still fits a line (the first principal component) through the data, whereas multiple regression fits a plane (or hyperplane).
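A small sketch contrasting the two fits on the same synthetic data (the variable names and noise level are my own): the regression slope minimises vertical residuals, while the first principal component minimises perpendicular distances, so the two lines generally differ.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 0.8 * x + rng.normal(scale=0.5, size=500)
X = np.column_stack([x, y])

# Regression: minimise vertical errors when predicting y from x.
reg_slope = np.polyfit(x, y, deg=1)[0]

# PCA: the first principal component is the top eigenvector of the
# covariance matrix, which minimises perpendicular distances to the line.
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
pc1 = eigvecs[:, eigvals.argmax()]
pca_slope = pc1[1] / pc1[0]

print(reg_slope, pca_slope)  # two different slopes for the same data
```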
How does PCA capture sources of variance?
PCA fits a line that minimises the sum of squared perpendicular distances between the points and the line. The direction of this line captures the biggest source of variance in the data, which can be compressed into one dimension by projecting each data point onto the line.
Then, for a second component, we fit a line orthogonal to the first that captures the most remaining variance. This is because all the remaining variance lies orthogonal to the first line.
The number of orthogonal lines, i.e. components, equals the number of dimensions in the data.
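A short sketch of these ideas using scikit-learn's PCA on synthetic correlated data (the toy data and names are my own): the fitted components come out orthogonal, and projecting onto the first one gives the 1-D compression.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Two correlated variables: most variance lies along one direction.
x = rng.normal(size=300)
X = np.column_stack([x, 0.9 * x + rng.normal(scale=0.3, size=300)])

pca = PCA()                 # as many components as dimensions (here, 2)
scores = pca.fit_transform(X)

print(pca.components_ @ pca.components_.T)  # ~identity: components are orthogonal
print(pca.explained_variance_ratio_)        # first component captures most variance
one_d = scores[:, 0]                        # projection onto the first component
```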
How do we determine the best number of PCA components?
We can create a scree plot, which plots the explained variance and cumulative explained variance against the number of components. You can then keep the number of components corresponding to the elbow (inflection point) of the scree plot.
We can also optimise the number of components as a hyperparameter (when PCA is combined with supervised learning).
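A minimal scree-plot sketch, assuming scikit-learn and matplotlib (the random data is a placeholder for a real dataset):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

X = np.random.default_rng(0).normal(size=(200, 10))  # placeholder data

pca = PCA().fit(X)
var = pca.explained_variance_ratio_

plt.plot(range(1, len(var) + 1), var, "o-", label="explained variance")
plt.plot(range(1, len(var) + 1), np.cumsum(var), "s--", label="cumulative")
plt.xlabel("number of components")
plt.ylabel("proportion of variance explained")
plt.legend()
plt.show()  # keep the components up to the elbow of the curve
```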
Is PCA supervised or unsupervised? And can it be combined with the other type?
PCA is unsupervised but it is often combined with supervised learning
What is the main goal of PCA?
To reduce the dimensionality of the data to address the curse of dimensionality.
This is particularly helpful when we have correlated variables, as we don't lose much important information by keeping only the leading principal components.
When combined with supervised learning, it can also reduce overfitting to noise, thus making the supervised model more powerful.
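One common way to combine the two, sketched with a scikit-learn Pipeline on the built-in digits dataset: treat the number of components as a hyperparameter and tune it by cross-validation (the candidate values here are arbitrary):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = load_digits(return_X_y=True)

pipe = Pipeline([("pca", PCA()), ("clf", LogisticRegression(max_iter=1000))])
# Treat the number of components as a hyperparameter, chosen by cross-validation.
search = GridSearchCV(pipe, {"pca__n_components": [5, 10, 20, 40]}, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```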
What can combining PCA with clustering do?
You can cluster the principal component scores of functional neuroimaging data to obtain particular brain 'states'.
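A hedged sketch of that pipeline shape (the data here is random noise with a hypothetical timepoints-by-voxels shape; a real analysis would load preprocessed fMRI data and choose the numbers of components and states more carefully):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Hypothetical shape: timepoints x voxels, with noise standing in for real data.
data = np.random.default_rng(0).normal(size=(500, 2000))

components = PCA(n_components=10).fit_transform(data)  # reduce voxels to 10 components
states = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(components)
# `states` assigns each timepoint to one of 4 recurring patterns ("brain states").
```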
What are alternatives to PCA?
ICA
Hierarchical clustering
Gaussian mixture modelling
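For reference, all three alternatives have scikit-learn implementations; a minimal sketch on toy data (the parameter choices are arbitrary):

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.cluster import AgglomerativeClustering
from sklearn.mixture import GaussianMixture

X = np.random.default_rng(0).normal(size=(300, 5))  # placeholder data

sources = FastICA(n_components=3, random_state=0).fit_transform(X)            # ICA
tree_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)            # hierarchical
soft_labels = GaussianMixture(n_components=3, random_state=0).fit_predict(X)  # GMM
```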