wk56 Flashcards

1
Q

What are some common distance measures

A

Squared Euclidean Distance
Euclidean Distance
Manhattan Distance
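For reference, the standard definitions of these measures for vectors x, y ∈ R^D (the formulas are not written out on the card itself):

\[
d_{\text{sq-Euc}}(x,y)=\sum_{i=1}^{D}(x_i-y_i)^2,\qquad
d_{\text{Euc}}(x,y)=\sqrt{\sum_{i=1}^{D}(x_i-y_i)^2},\qquad
d_{\text{Man}}(x,y)=\sum_{i=1}^{D}\lvert x_i-y_i\rvert
\]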

2
Q

Describe the K-means algorithm

A

-Assign random cluster centroids in the vector space
Repeat:
-Assign each data point to its nearest cluster centroid; these assignments form the new clusters
-Compute the mean of each newly formed cluster (the new centroid)
-Update a point's assignment if it is now closer to a different cluster's centroid
-Stop after a maximum number of iterations or when assignments no longer change
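A minimal NumPy sketch of this loop (illustrative only; the (N, D) data array, the value of k and the random initialisation scheme are assumptions, not from the card):

```python
import numpy as np

def kmeans(Y, k, max_iters=100, seed=0):
    """Basic K-means: Y is an (N, D) NumPy array, k the number of clusters."""
    rng = np.random.default_rng(seed)
    # Initialise centroids by picking k distinct data points at random
    centroids = Y[rng.choice(len(Y), size=k, replace=False)].astype(float)
    assignments = np.full(len(Y), -1)
    for _ in range(max_iters):
        # Assign each point to its nearest centroid (squared Euclidean distance)
        dists = ((Y[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_assignments = dists.argmin(axis=1)
        if np.array_equal(new_assignments, assignments):
            break  # no change in assignments -> converged
        assignments = new_assignments
        # Recompute each centroid as the mean of the points assigned to it
        for j in range(k):
            if np.any(assignments == j):
                centroids[j] = Y[assignments == j].mean(axis=0)
    return assignments, centroids

labels, centres = kmeans(np.random.default_rng(1).normal(size=(200, 2)), k=3)
```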

3
Q

What are some limitations of k-means

A

-it will often converge to a local optimum, so multiple runs with different initialisations are needed to find a good solution
-there is no empirical way to choose the number of clusters k
-it struggles with very high-dimensional data and will instead require kernelisation

4
Q

Which distance measure is used in kernelised k-means

A

Squared Euclidean distance

5
Q

How does the kernel trick work

A

-It is used when calculating the distance of a data point to a cluster mean where the data is generated by some higher-dimensional structure that ordinary k-means cannot account for.
-Expanding the squared Euclidean distance measure gives an expression made up of dot products; expanding the mean into its defining equation also removes that variable in the process.
-Every dot product is then replaced with a kernel (e.g. a polynomial or Gaussian kernel). This implicitly projects the data into a higher-dimensional space.
-So the data itself is never projected into the higher-dimensional space, but the distance measures and updates are calculated as if it were.
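Concretely, writing μ_k for the mean of cluster C_k in feature space and replacing every dot product φ(a)·φ(b) with a kernel κ(a, b), the squared distance becomes (the standard kernel k-means expansion, written out here for reference):

\[
\lVert\phi(y)-\mu_k\rVert^2=\kappa(y,y)-\frac{2}{|C_k|}\sum_{n\in C_k}\kappa(y,y_n)+\frac{1}{|C_k|^2}\sum_{m\in C_k}\sum_{n\in C_k}\kappa(y_m,y_n)
\]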

6
Q

What are the advantages and limitations of kernelised k-means

A

Advantages:
-brings flexibility to k-means
-can cluster data with higher-dimensional (non-linear) structure quickly
Limitations:
-computationally expensive
-high variance (sensitive to initialisation)
-suffers from local minima, may need repeated re-runs
-makes hard assignments: each point is 100% in one cluster, with no nuanced probabilities

7
Q

What is hierarchical clustering

A

-each object starts as a singleton cluster
-the two clusters with the shortest distance between them are found and merged into a single cluster
-repeat until all objects are in one cluster, then end
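A minimal sketch of this agglomerative procedure using SciPy (the toy data and the single-link choice are assumptions for illustration):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy data: 10 points in 2-D (illustrative only)
rng = np.random.default_rng(0)
Y = rng.normal(size=(10, 2))

# Agglomerative clustering: each point starts as its own cluster and the two
# closest clusters are repeatedly merged ('single' = single-link distance)
Z = linkage(Y, method='single')

# Cut the resulting tree into, e.g., 3 flat clusters
labels = fcluster(Z, t=3, criterion='maxclust')
print(labels)
```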

8
Q

What are the 3 different ways to measure the distance between two clusters

A

Min / Single Link:
-select the two closest points, one from each cluster, and compute the distance between them
Max / Complete Link:
-find the two points furthest from one another, one from each cluster, and compute the distance between them
Average Link:
-compute the mean point (centroid) of each cluster and find the distance between the two centroids
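Written out, with d(a, b) a point-to-point distance (e.g. Euclidean), the three criteria for clusters A and B are:

\[
d_{\text{single}}(A,B)=\min_{a\in A,\,b\in B}d(a,b),\qquad
d_{\text{complete}}(A,B)=\max_{a\in A,\,b\in B}d(a,b),\qquad
d_{\text{centroid}}(A,B)=d\Bigl(\tfrac{1}{|A|}\sum_{a\in A}a,\;\tfrac{1}{|B|}\sum_{b\in B}b\Bigr)
\]

(Note: many texts use "average link" for the mean of all pairwise distances between the two clusters; the centroid form above follows the card's description.)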

9
Q

How to decide which clusters to use?

A

Usually by eye or domain knowledge: cut the dendrogram just below the first very long branch.

10
Q

Why do dimensionality reduction?

A

-Very high-dimensional data cannot be plotted or visualised directly (try imagining a 6-dimensional graph)
-Feature extraction: find which features are actually relevant to a given task
-Subspace projection (linked to the first point): projecting high-dimensional data into a lower-dimensional space

11
Q

How do we project data from a high-dimensional space into a lower-dimensional one

A

Find some eigenvectors W such that the original data Y can be projected as WY = X, where W and X are of significantly lower dimension than Y
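With shapes made explicit this reads, under one common convention (the exact layout is an assumption, since the card does not state it):

\[
Y\in\mathbb{R}^{D\times N},\qquad W\in\mathbb{R}^{d\times D},\qquad X=WY\in\mathbb{R}^{d\times N},\qquad d\ll D
\]

where each row of W is one of the chosen eigenvectors.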

12
Q

How do we choose the projection (intuitively, not formally)

A

By eye (see the diagram): choose the directions with the highest variance in the data (calculated as the spread of the points from the mean)

13
Q

How do we compute the variance of each variable in PCA

A

Var(y_i) = 1/N * Σ_{n=1..N} ( y_i^(n) − μ_i )²
where n indexes the sample and i the attribute; variance is calculated per attribute, with μ_i the mean of attribute i

14
Q

define covariance matrix

A

a matrix whose diagonal entries are the variance of each variable and whose off-diagonal entries are the covariance of each variable with respect to the others (i.e. how they vary together)
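As a formula, for data points y^(n) with mean μ (standard definition):

\[
\Sigma=\frac{1}{N}\sum_{n=1}^{N}\bigl(y^{(n)}-\mu\bigr)\bigl(y^{(n)}-\mu\bigr)^{\top}
\]

so Σ_ii is the variance of variable i and Σ_ij (i ≠ j) is the covariance between variables i and j.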

15
Q

What does each column of W represent?

A

Each column w_i represents a new dimension. W looks like W = [w_1, w_2, …, w_d] (each w_i is a column vector)

16
Q

What is the PCA Process (detail)

A

-First, we centre the data on the mean/origin: (y_i − μ_i) for each data point
-Constrain w^T w to equal 1
-We want to find Var(Yw) and argmax it over w
-Expanding out Var(Yw) we get w^T Y^T Y w, which simplifies to w^T Σ w, where Σ is the covariance matrix
-Adding the constraint with a Lagrange multiplier gives:
argmax_w ( w^T Σ w − λ (w^T w − 1) )
-Differentiating and setting to zero gives:
Σ w = λ w
-Multiplying both sides by w^T gives:
w_i^T Σ w_i = λ_i, so λ_i is both the eigenvalue and the variance along w_i
-Select the direction with the highest variance / λ / eigenvalue as PC1, the second highest as PC2, and so on
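A minimal NumPy sketch of this process (the data shape and number of components are assumptions; np.linalg.eigh is used since the covariance matrix is symmetric):

```python
import numpy as np

def pca(Y, d):
    """Project (N, D) data Y onto its top-d principal components."""
    # Centre the data on the mean
    Yc = Y - Y.mean(axis=0)
    # Covariance matrix (D x D)
    cov = (Yc.T @ Yc) / len(Yc)
    # Eigendecomposition; eigh returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Sort from largest to smallest eigenvalue (= variance along each direction)
    order = np.argsort(eigvals)[::-1][:d]
    W = eigvecs[:, order]   # columns w_1 ... w_d
    X = Yc @ W              # projected data (N x d)
    return X, W, eigvals[order]
```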

17
Q

What is PCA in simple terms

A

We project the data onto a lower-dimensional subspace, choosing the subspace so that the variance of the newly projected data is maximised
-each principal component's direction is orthogonal to the others, in order to get maximum uniqueness in the features and data

18
Q

How do you choose how many dimensions to use for PCA

A

Similar to k-means: by looking at the results and using domain knowledge; there is no real empirical rule.

19
Q

What are some limitations of principal component analysis

A

-Data must be real-valued
-Cannot handle any gaps (missing values) in the data

20
Q

Explain the process of kernelising PCA

A

Assume the data is mean-centred. Recall the equation:
-Σ w = λ w
-Expanding the covariance shows that each eigenvector can be written as a linear combination of the (mapped) data points y_n
-After further expansion, all the dot products can be kernelised, and solving for the variance reduces to an eigenproblem on the kernel matrix (as in the image)
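The standard form of that argument, shown here for reference since the card's image is not included (with C the feature-space covariance matrix): writing each eigenvector as a linear combination of the mapped data points turns the eigenproblem on C into an eigenproblem on the kernel matrix K:

\[
w=\sum_{n=1}^{N}\alpha_n\,\phi(y_n),\qquad C\,w=\lambda\,w\;\Longrightarrow\;K\,\alpha=N\lambda\,\alpha
\]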

21
Q

We have previously assumed that the data projected into feature space is mean-centred. What is the equation to centre the kernel matrix?

A

K_centred = K − 1_N K − K 1_N + 1_N K 1_N (see image)
where K is the kernel matrix and 1_N is the N×N matrix with every entry equal to 1/N

22
Q

What are the full steps for kernelised PCA

A

1) Select a kernel function
2) Compute the kernel matrix K
3) Centre the kernel matrix
4) Solve the eigenproblem for the centred kernel matrix to obtain the eigenvectors and eigenvalues
5) Sort the eigenvalues and eigenvectors from largest to smallest by eigenvalue
6) Compute the projections (aka scores) for each point (see image)
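A minimal NumPy sketch of these steps (the RBF kernel, its width gamma, and the unit-norm normalisation of the eigenvectors are assumptions for illustration):

```python
import numpy as np

def kernel_pca(Y, d, gamma=1.0):
    """Kernel PCA with an RBF kernel: Y is (N, D); returns (N, d) projections."""
    N = len(Y)
    # 1-2) Kernel matrix: K[m, n] = exp(-gamma * ||y_m - y_n||^2)
    sq_dists = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-gamma * sq_dists)
    # 3) Centre the kernel matrix: K_c = K - 1_N K - K 1_N + 1_N K 1_N
    one_n = np.full((N, N), 1.0 / N)
    K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # 4-5) Eigendecomposition, sorted from largest to smallest eigenvalue
    eigvals, eigvecs = np.linalg.eigh(K_c)
    order = np.argsort(eigvals)[::-1][:d]
    alphas, lambdas = eigvecs[:, order], eigvals[order]
    # Normalise the coefficients so each component has unit norm in feature space
    alphas = alphas / np.sqrt(np.maximum(lambdas, 1e-12))
    # 6) Projections (scores) of the training points onto the components
    return K_c @ alphas
```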