Block 2: Distance-based methods Flashcards

1
Q

Steps from Euclidean distances E to configurations X

A
  • From E to B: put the centroid X̄ = 1^T X / n of the data at the origin and set B = -1/2 (I - 11^T/n) E (I - 11^T/n)
  • From B to X: construct Y with ith column f(i) = sqrt(λi) e(i), where λi and e(i) are the eigenvalues and eigenvectors of B = sum(from i=1 to n) λi e(i) e(i)^T (see the sketch below)
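A minimal R sketch of these two steps, assuming E holds squared Euclidean distances between n points (the helper name recover_config is hypothetical):

  recover_config <- function(E, k = 2) {
    n <- nrow(E)
    H <- diag(n) - matrix(1/n, n, n)  # centring matrix I - 11^T/n
    B <- -0.5 * H %*% E %*% H         # double-centred inner-product matrix
    eg <- eigen(B, symmetric = TRUE)
    # ith column f(i) = sqrt(lambda_i) e(i), keeping the k largest eigenvalues
    eg$vectors[, 1:k, drop = FALSE] %*% diag(sqrt(pmax(eg$values[1:k], 0)), k)
  }
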
2
Q

Steps from configurations X to distances E and what information do we lose

A
  • let em,l = sum (from v=1 to p) (Xm,v - Xl,v)^2 = X(m)^T X(m) + X(l)^T X(l) - 2 X(m)^T X(l) = bm,m + bl,l - 2bm,l, where X(m) is the mth row of X and bm,l = X(m)^T X(l)
  • from X to B we lose orientation information: for any orthogonal rotation matrix P, Y = XP gives YY^T = XPP^T X^T = XX^T = B
  • from B to E we lose position information: replacing X by W = X - μ gives the same E, with W(m)^T W(l) = X(m)^T X(l) - X(m)^T μ - μ^T X(l) + μ^T μ (both invariances are checked numerically below)
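A quick numerical check of both invariances on hypothetical data:

  set.seed(1)
  X <- matrix(rnorm(20), ncol = 2)
  theta <- pi / 6
  P <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)  # rotation
  max(abs(dist(X) - dist(X %*% P)))                     # ~0: E unchanged by rotation
  max(abs(dist(X) - dist(sweep(X, 2, c(3, -1), "+"))))  # ~0: E unchanged by translation
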
3
Q

What is classical multidimensional scaling? What is the impact of the eigenvalues in classical MDS?

A
  • The method to obtain Y from E
  • A test for Euclidean-ness (negative eigenvalues reflect the non-Euclidean nature of the data)
  • A method for estimating dimensionality (illustrated below):
    n’’ ≈ the number of eigenvalues clearly bigger than the rest
    or n’’ such that sum (from i=1 to n’’) λi ≈ tr(B)
    or reject every λi with |λi| <= |λn|
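A sketch of these criteria using the eigenvalues that cmdscale returns (here on R's built-in eurodist road distances, which are not exactly Euclidean):

  fit <- cmdscale(eurodist, k = 10, eig = TRUE)
  lambda <- fit$eig
  any(lambda < 0)                         # TRUE signals non-Euclidean data
  cumsum(lambda) / sum(lambda)            # choose n'' where the partial sum nears tr(B)
  lambda[abs(lambda) > abs(min(lambda))]  # keep only the λi with |λi| > |λn|
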
4
Q

How to run multidimensional scaling?

A

can use the R function cmdscale(distance object, k = max dimension, eig = TRUE), where the distance object E needs to be symmetrised first; see the example below
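For example, on the built-in eurodist object (already a symmetric distance object):

  fit <- cmdscale(eurodist, k = 2, eig = TRUE)
  fit$points                   # the n x 2 configuration Y
  fit$eig                      # eigenvalues of B
  plot(fit$points, type = "n")
  text(fit$points, labels = labels(eurodist))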

5
Q

Give properties of a metric d(x,y)

A
  • d(x,y) >= 0 and d(x,y) = 0 if x = y (the converse, that d(x,y) = 0 implies x = y, does not always hold for dissimilarities)
  • symmetric: d(x,y) = d(y,x)
  • triangle inequality: d(x,y) + d(y,z) >= d(x,z)

keep in mind: if {dα} is a family of metrics, then sum_α dα is also a metric
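A toy numerical check of these properties for the Euclidean metric on three hypothetical points:

  d <- function(a, b) sqrt(sum((a - b)^2))
  x <- c(0, 0); y <- c(3, 4); z <- c(6, 0)
  d(x, y) == d(y, x)            # symmetry
  d(x, y) + d(y, z) >= d(x, z)  # triangle inequality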

6
Q

Define Hamming distance

A

The number of mismatches: d(x,y) = sum(from i=1 to n) di(x,y) = b + c, where di(x,y) = 1 if xi ≠ yi and 0 otherwise
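In R, for two binary vectors, this is one line:

  x <- c(1, 0, 1, 1, 0)
  y <- c(1, 1, 0, 1, 0)
  sum(x != y)  # number of mismatches = b + c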

7
Q

Define Jaccard distance

A
dJ(x,y) = (b+c) / (a+b+c)
b = sum 1{xi=1, yi=0} and c = sum 1{xi=0, yi=1}
a = sum 1{xi=yi=1} and d = sum 1{xi=yi=0}
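R computes exactly this with dist(..., method = "binary"), which ignores the 0-0 matches counted by d:

  x <- c(1, 0, 1, 1, 0)
  y <- c(1, 1, 0, 1, 0)
  dist(rbind(x, y), method = "binary")  # (b + c) / (a + b + c) = 2/4
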
8
Q

Give 5 examples of other dissimilarity measures

A
  • simple matching coefficient: (b+c) / (a+b+c+d)
  • Manhattan distance: sum_i |xi - yi| (L1 norm)
  • Canberra distance: sum_i |xi - yi| / (|xi| + |yi|)
  • Maximum: max_i |xi - yi| (L∞ norm)
  • Gower’s similarity coefficient (combines variables of mixed types)
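Several of these are built into R's dist(); Gower's coefficient is provided by daisy() in the cluster package (assumed installed):

  X <- matrix(rnorm(10), ncol = 2)
  dist(X, method = "manhattan")  # sum_i |x_i - y_i|
  dist(X, method = "canberra")   # sum_i |x_i - y_i| / (|x_i| + |y_i|)
  dist(X, method = "maximum")    # max_i |x_i - y_i|
  # cluster::daisy(data, metric = "gower") for mixed-type data frames
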
9
Q

What is the stress function?

A

The stress function measures the degree of agreement between the dissimilarities {δm,l} and the Euclidean distances {dm,l} of a created configuration:
  • monotone regression gives fitted values {d̂m,l} in the same rank order as {δm,l}
  • S(X) = sqrt(S*/T*), where S* = sum (over m<l) (dm,l - d̂m,l)^2 and T* = sum (over m<l) dm,l^2
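A sketch of this computation, where delta is a dist object of observed dissimilarities and isoreg() performs the monotone regression (the helper name stress is hypothetical):

  stress <- function(delta, X) {
    d <- dist(X)                 # configuration distances d_{m,l}
    dhat <- isoreg(delta, d)$yf  # fitted values, ordered by increasing delta
    sqrt(sum((d[order(delta)] - dhat)^2) / sum(d^2))
  }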

10
Q

What are Miles’ algorithm and Young’s boundary search algorithm?

A

Algorithms for the monotone regression; both yield a non-decreasing step function in which each step value is the mean of the values in that “block”
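Base R's isoreg() implements the same block-amalgamation idea:

  y <- c(1, 3, 2, 4, 6, 5)
  isoreg(y)$yf  # 1.0 2.5 2.5 4.0 5.5 5.5 (violators pooled into block means)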

11
Q

How do we choose a configuration X from non-Euclidean dissimilarities?

A
  • compute the stress function
  • minimise it over all possible configurations

12
Q

How to find optimal configuration from stress function?

A
  • Initial configuration: random, L-shaped, ‘higher dimension K to lower’, or the classical scaling solution
  • Optimisation using a gradient method with ∇S:
    ∂S/∂xi,k = (S/2) ((1/S*) ∂S*/∂xi,k - (1/T*) ∂T*/∂xi,k)
    with ∂S*/∂xi,k and ∂T*/∂xi,k given in the lecture notes, using ∂xm,k/∂xi,k = 1 if m = i and 0 otherwise
  • travel in the direction of -∇S to find the optimal X (see the sketch below)
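In practice, this whole loop is implemented by isoMDS() in the MASS package (assumed installed), which by default starts from the classical scaling solution:

  library(MASS)
  fit <- isoMDS(eurodist, k = 2)  # Kruskal's non-metric MDS
  fit$points                      # optimised configuration X
  fit$stress                      # final stress, in percent
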
13
Q

Mention an advantage of ordinal scaling

A

It can cope with missing data

14
Q

Steps of K-means clustering

A
  • initialise with k centroids mj
  • assign each observation Xi to the closest centroid (by Euclidean distance) and update that centroid as the mean of its assigned observations
  • stop when the cluster assignments stay stable (see the sketch below)
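Base R's kmeans() runs this procedure; a toy example on hypothetical two-cluster data:

  set.seed(1)
  X <- rbind(matrix(rnorm(50), ncol = 2), matrix(rnorm(50, mean = 3), ncol = 2))
  fit <- kmeans(X, centers = 2)
  fit$centers  # final centroids m_j
  fit$cluster  # cluster assignment of each observation
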
15
Q

What are self-organising maps (SOM)?

A

Similar to k-means, but the prototypes sit on a grid: each observation is assigned to its closest prototype, and both that prototype and its grid neighbours are updated
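A sketch, assuming the kohonen package is installed:

  library(kohonen)
  X <- scale(matrix(rnorm(200), ncol = 2))
  fit <- som(X, grid = somgrid(4, 4, "rectangular"))  # 4 x 4 grid of prototypes
  fit$unit.classif  # winning prototype for each observation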

16
Q

What is Procrustes Analysis?

A

Find the best configuration Y such that G(X,Y) = sum_k sum_i (Xi,k - Yi,k)^2 is minimised under translation, rotation and scale change.

17
Q

Steps of Procrustes

A
  • Translation: G(X,Y) is minimised when ȳk = x̄k (matching centroids)
  • Rotation: G(X,YP) is minimised when P* = UV^T, for the SVD Y^T X = UΣV^T (writing S = V^T P^T U, each diagonal entry of the orthogonal matrix S is at most 1, so tr(SΣ) is maximised at S = I)
  • Scale change: G(X,αY) is minimised when α* = sum_i,k Xi,k Yi,k / sum_i,k Yi,k^2 (see the sketch below)
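A compact R sketch of the three steps (the helper name procrustes_fit is hypothetical; rows of X and Y are observations):

  procrustes_fit <- function(X, Y) {
    X <- scale(X, scale = FALSE)      # translation: centre both configurations
    Y <- scale(Y, scale = FALSE)
    s <- svd(t(Y) %*% X)              # SVD of Y^T X = U Sigma V^T
    YP <- Y %*% s$u %*% t(s$v)        # rotation by P* = U V^T
    alpha <- sum(X * YP) / sum(YP^2)  # optimal scale alpha*
    alpha * YP                        # Y matched to X
  }
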
18
Q

What is the General Procrustes Analysis algorithm?

A

  • match each configuration Yi to the current mean M using Procrustes, then recompute M as the mean of the modified Yi
  • repeat until convergence (see the sketch below)
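A sketch reusing the procrustes_fit() helper above, for a hypothetical list of configurations Ys:

  gpa <- function(Ys, tol = 1e-8, maxit = 100) {
    M <- Ys[[1]]                                          # initial mean configuration
    for (it in 1:maxit) {
      Ys <- lapply(Ys, function(Y) procrustes_fit(M, Y))  # match each Yi to M
      M_new <- Reduce(`+`, Ys) / length(Ys)               # recompute the mean M
      if (sum((M_new - M)^2) < tol) break                 # converged
      M <- M_new
    }
    list(mean = M, fitted = Ys)
  }

(end of Block 2)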