Representative Based Flashcards

Question 1

Q

Representative Based

Answer

A

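Given N instances and k clusters, find a partition C = {C_1, ..., C_k} of the N instances into k clusters
Each cluster C_i is represented by its centroid:
centroid_i = (1/n_i) * sum_{x_j in C_i} x_j
where n_i = |C_i|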
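
A minimal sketch of the centroid formula in plain Python. It assumes each point is a tuple of numbers and that the partition C is given as a list of clusters (each a list of points). The function names are illustrative.

```python
def centroid(cluster):
    """Mean of one cluster C_i: (1/n_i) * sum of the points x_j in C_i."""
    n_i = len(cluster)                      # n_i = |C_i|
    dims = len(cluster[0])
    return tuple(sum(x[d] for x in cluster) / n_i for d in range(dims))


def centroids(partition):
    """One centroid per cluster of the partition C."""
    return [centroid(c) for c in partition]


C = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 10.0), (12.0, 14.0)]]
print(centroids(C))  # [(1.0, 0.0), (11.0, 12.0)]
```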

Question 2

Q

Brute Force

Answer

A

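Enumerate all possible clusterings (partitions of the N points into k clusters)
Select the best one according to the clustering objective (e.g., SSE)
The number of partitions is O(k^N / k!), so this is infeasible for all but tiny N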
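
To make the brute-force idea concrete, the sketch below enumerates every labeling of a handful of 1-D points with k labels, skips labelings that leave a cluster empty, and keeps the partition with the lowest SSE (using SSE as the score is an assumption here). The `valid` counter shows where k^N/k! comes from: each partition appears once for every relabeling of its k clusters.

```python
from itertools import product


def sse(clusters):
    """Sum of squared distances of 1-D points to their cluster mean."""
    total = 0.0
    for c in clusters:
        mu = sum(c) / len(c)
        total += sum((x - mu) ** 2 for x in c)
    return total


def brute_force(points, k):
    """Try every labeling of the points with k labels and keep the best one.

    Returns (best_clusters, best_sse, number_of_valid_labelings).
    """
    best, best_sse, valid = None, float("inf"), 0
    for labels in product(range(k), repeat=len(points)):   # k^N labelings
        if len(set(labels)) < k:
            continue  # some cluster would be empty: not a valid partition
        valid += 1
        clusters = [[p for p, lab in zip(points, labels) if lab == i] for i in range(k)]
        score = sse(clusters)
        if score < best_sse:
            best, best_sse = clusters, score
    return best, best_sse, valid


best, score, valid = brute_force([1.0, 2.0, 10.0, 11.0], 2)
print(best, score)  # [[1.0, 2.0], [10.0, 11.0]] 1.0
```

For 4 points and k = 2 there are 2^4 - 2 = 14 valid labelings, i.e. 14 / 2! = 7 distinct partitions; the count grows exponentially with N.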

Question 3

Q

K-means

Answer

A

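Greedy iterative approach that minimizes the sum of squared errors (SSE):
	SSE(C) = sum_{i=1..k} sum_{x_j in C_i} ||x_j - centroid_i||^2
	C* = arg min_C SSE(C)
Converges to a local optimum, not necessarily the global one
Most widely known representative-based algorithm
Complexity: O(nkd) per iteration to assign n points in d dimensions to k centroids; recomputing the centroids costs O(nd)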
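
A sketch of the greedy iteration (Lloyd's algorithm) in pure Python, assuming points are numeric tuples and that the caller supplies the initial centroids. Keeping a centroid unchanged when its cluster empties is an assumption of this sketch, not part of the card.

```python
def sq_dist(a, b):
    """Squared Euclidean distance ||a - b||^2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, init_centroids, max_iter=100):
    """Alternate assignment and update steps until assignments stop changing.

    Returns (labels, centroids, sse); the result is a local optimum of SSE.
    """
    centroids = list(init_centroids)
    k, d = len(centroids), len(points[0])
    labels = None
    for _ in range(max_iter):
        # Assignment step, O(n*k*d): nearest centroid for every point.
        new_labels = [min(range(k), key=lambda i: sq_dist(p, centroids[i]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step, O(n*d): one pass accumulating per-cluster sums and counts.
        sums = [[0.0] * d for _ in range(k)]
        counts = [0] * k
        for p, lab in zip(points, labels):
            counts[lab] += 1
            for j in range(d):
                sums[lab][j] += p[j]
        for i in range(k):
            if counts[i]:  # assumption: keep the old centroid if a cluster empties
                centroids[i] = tuple(s / counts[i] for s in sums[i])
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, sse


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, cents, sse = kmeans(pts, init_centroids=[(0, 0), (1, 0)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```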

Question 4

Q

K-means Centroid init

Answer

A

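1) Pick points that are far away from each other
2) Pick the first point at random, then repeatedly add the point with the largest distance from the centroids chosen so far
3) Cluster a sample hierarchically into k clusters, then pick a point (e.g., the centroid) from each cluster
4) Pick more than k initial centroids, then merge them down to k

With purely random initialization and k equal-sized true clusters, the probability of picking exactly one initial centroid from each cluster is P = k! / k^k (about 0.00036 for k = 10)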
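
The sketch below implements option 2 (farthest-first selection) and evaluates P = k!/k^k. It assumes points are numeric tuples and uses a seeded `random.Random` so runs are repeatable; the function names are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k, seed=0):
    """First centroid at random; each next one is the point whose distance to
    its nearest already-chosen centroid is largest."""
    rng = random.Random(seed)
    chosen = [rng.choice(points)]
    while len(chosen) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in chosen))
        chosen.append(nxt)
    return chosen


def prob_one_per_cluster(k):
    """P = k! / k^k for k equal-sized clusters and k random initial centroids."""
    return math.factorial(k) / k ** k


pts = [(0, 0), (0, 1), (10, 0), (10, 1), (0, 10), (1, 10)]
print(farthest_first(pts, 3))        # one point from each of the three groups
print(prob_one_per_cluster(10))      # about 0.00036
```

Farthest-first tends to pick outliers as centroids, which is one reason option 2 is usually applied to a sample rather than to all the data.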

Question 5

Q

Bisecting

Answer

A

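Start with all points in one cluster and split it in two (e.g., with 2-means); keep splitting a chosen cluster (e.g., the one with the largest SSE) until there are k clusters.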
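
A sketch of bisecting k-means on 1-D data. Two details are assumptions of this sketch rather than part of the card: the cluster with the largest SSE is split next, and each split is a plain 2-means seeded at the cluster's minimum and maximum values.

```python
def sse(cluster):
    """Sum of squared deviations of 1-D values from their mean."""
    mu = sum(cluster) / len(cluster)
    return sum((x - mu) ** 2 for x in cluster)


def two_means(cluster, iters=20):
    """Split a 1-D cluster in two with plain 2-means seeded at min and max."""
    a, b = min(cluster), max(cluster)
    if a == b:
        raise ValueError("cannot split a cluster whose values are all equal")
    for _ in range(iters):
        left = [x for x in cluster if abs(x - a) <= abs(x - b)]
        right = [x for x in cluster if abs(x - a) > abs(x - b)]
        a, b = sum(left) / len(left), sum(right) / len(right)
    return left, right


def bisecting_kmeans(points, k):
    """Start with one cluster and keep bisecting until there are k clusters."""
    clusters = [list(points)]
    while len(clusters) < k:
        # Replace the cluster with the largest SSE by its two halves.
        worst = max(range(len(clusters)), key=lambda i: sse(clusters[i]))
        clusters.extend(two_means(clusters.pop(worst)))
    return clusters


print(bisecting_kmeans([1, 2, 3, 10, 11, 12, 30, 31], 3))
# [[30, 31], [1, 2, 3], [10, 11, 12]]
```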

Question 6

Q

Updating Centers Incrementally

Answer

A

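Centroids are updated after each individual point assignment, instead of once after all points have been assigned

More expensive, since a centroid update happens for every reassignment rather than once per iteration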
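
Incremental updating relies on the running-mean identity: adding x to a cluster of n points with centroid c gives c' = c + (x - c)/(n + 1), and removing x from a cluster of n points gives c' = c + (c - x)/(n - 1). The sketch applies both updates inside one pass over the data; refusing to move a cluster's last point is an assumption made here to avoid empty clusters.

```python
def add_point(c, n, x):
    """Centroid and size after x joins a cluster of n points with centroid c."""
    return tuple(ci + (xi - ci) / (n + 1) for ci, xi in zip(c, x)), n + 1


def remove_point(c, n, x):
    """Centroid and size after x leaves a cluster of n > 1 points with centroid c."""
    return tuple(ci + (ci - xi) / (n - 1) for ci, xi in zip(c, x)), n - 1


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def incremental_pass(points, labels, centroids, counts):
    """One pass in which both affected centroids move right after each reassignment.

    Mutates labels, centroids and counts in place; returns True if any point moved.
    """
    moved = False
    for j, x in enumerate(points):
        old = labels[j]
        new = min(range(len(centroids)), key=lambda i: sq_dist(x, centroids[i]))
        if new != old and counts[old] > 1:  # assumption: never empty a cluster
            centroids[old], counts[old] = remove_point(centroids[old], counts[old], x)
            centroids[new], counts[new] = add_point(centroids[new], counts[new], x)
            labels[j] = new
            moved = True
    return moved


pts = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = [0, 0, 0, 1]               # 10.0 starts in the wrong cluster
centroids = [(11.0 / 3,), (11.0,)]  # means of the initial clusters
counts = [3, 1]
incremental_pass(pts, labels, centroids, counts)
print(labels)  # [0, 0, 1, 1]
```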

Question 7

Q

Preprocessing

Answer

A

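Normalize the data so that no single attribute dominates the distance computations

Eliminate outliers, which can pull centroids away from the true cluster centers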
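
A small preprocessing sketch, assuming rows of numeric attributes: z-score normalization puts every attribute on the same scale, and a simple filter drops rows with any |z| above a threshold. The threshold values and function names are illustrative assumptions, not part of the card.

```python
import statistics


def zscore_columns(rows):
    """Standardize each attribute to mean 0 and population std 1.

    Assumption: a constant attribute (std 0) is mapped to 0.0.
    """
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return [tuple((v - m) / s if s > 0 else 0.0 for v, m, s in zip(r, means, stds))
            for r in rows]


def drop_outliers(rows, z_max=3.0):
    """Keep only rows whose every attribute lies within z_max standard deviations."""
    z = zscore_columns(rows)
    return [r for r, zr in zip(rows, z) if all(abs(v) <= z_max for v in zr)]


rows = [(v,) for v in [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]]
print(len(drop_outliers(rows, z_max=2.5)))  # 9
```

With few rows the largest possible |z| is bounded (about sqrt(n - 1) for the population std), so a threshold of 3 would never trigger on these ten values; that is why the usage passes 2.5.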

Question 8

Q

Postprocessing

Answer

A

Eliminate small clusters (they may be outliers)
Split loose clusters (clusters with relatively high SSE)
Merge clusters that are close to each other

Question 9

Q

Problems of K-means

Answer

A

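Struggles with clusters of differing densities
Struggles with non-globular shapes
Struggles with clusters of differing sizes
Need to specify k in advance
Sensitive to outliers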

Question 10

Q

BFR Algorithm

Answer

A

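Designed for very large data sets (too large to fit in main memory)
Assumes each cluster is normally distributed around its centroid
Clusters are axis-aligned ellipses (dimensions are independent)
Memory is O(clusters), not O(data)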

Question 11

Q

BFR Steps

Answer

A

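Choose k initial centroids using one of:
1) k random points
2) A small random sample, clustered optimally
3) A sample: pick a random point, then k-1 more points, each as far as possible from the previously chosen points
Then maintain three sets of points:
Discard Set (DS): points close enough to a centroid to be summarized and discarded
Compression Set (CS): groups of points close to each other but not to any centroid, summarized but not assigned to a cluster
Retained Set (RS): isolated points kept in memory until they can join a compression set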
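
BFR keeps memory proportional to the number of clusters because each DS (or CS) cluster is stored only as N, the per-dimension SUM, and the per-dimension SUMSQ, i.e. 2d + 1 numbers. The sketch below assumes that representation (taken from the usual description of BFR, not from this card) and shows only the discard-set test: a point joins the closest cluster if its Mahalanobis distance is below a threshold, otherwise it is left for the CS/RS. The threshold value is an arbitrary example, and every dimension is assumed to have nonzero variance.

```python
import math


class ClusterSummary:
    """BFR-style summary: N, per-dimension SUM and SUMSQ (2d + 1 numbers)."""

    def __init__(self, d):
        self.n = 0
        self.sum = [0.0] * d
        self.sumsq = [0.0] * d

    def add(self, x):
        self.n += 1
        for i, v in enumerate(x):
            self.sum[i] += v
            self.sumsq[i] += v * v

    def centroid(self):
        return [s / self.n for s in self.sum]

    def variance(self):
        # Axis-aligned model: per-dimension variance SUMSQ/N - (SUM/N)^2.
        return [sq / self.n - (s / self.n) ** 2 for s, sq in zip(self.sum, self.sumsq)]

    def mahalanobis(self, x):
        # Distance to the centroid, each axis scaled by the cluster's std there.
        c, var = self.centroid(), self.variance()
        return math.sqrt(sum((xi - ci) ** 2 / vi for xi, ci, vi in zip(x, c, var)))


def assign(point, summaries, threshold):
    """Absorb the point into the closest summary (DS) if it is close enough.

    Returns that summary, or None if the point should go to the CS/RS instead.
    """
    best = min(summaries, key=lambda s: s.mahalanobis(point))
    if best.mahalanobis(point) < threshold:
        best.add(point)
        return best
    return None


s = ClusterSummary(2)
for p in [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]:
    s.add(p)
print(s.centroid(), s.variance())                   # [1.0, 1.0] [1.0, 1.0]
print(assign((1.5, 1.5), [s], threshold=3.0) is s)  # True: absorbed into the DS
print(assign((10.0, 10.0), [s], threshold=3.0))     # None: left for the CS/RS
```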

Question 12

Q

Mean Shift Clustering

Answer

A

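Iterative, non-parametric algorithm
Searches for the modes (local maxima) of the data density
Each search window moves toward the densest region
The number of modes gives the number of clusters
Can handle arbitrarily shaped clusters
Robust to initialization
Cost O(T * n^2) for T iterations over n points, which is expensive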

Question 13

Q

Mean Shift Pseudocode

Answer

A

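1) Choose the bandwidth (search window size)
2) Choose the initial location of the search window
3) Compute the mean location of the points inside the window
4) Center the search window at the mean location from step 3
5) Repeat steps 3 and 4 until convergence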
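
The pseudocode above can be sketched for 1-D data with a flat kernel (every point inside the window counts equally). The kernel choice, the convergence tolerance, and the rule that merges modes closer than half the bandwidth are assumptions of this sketch. Running one window per point, with each window update scanning all n points, is where the O(T * n^2) cost comes from.

```python
def shift_to_mode(x, data, bandwidth, tol=1e-6, max_iter=100):
    """Steps 3-5: move a window of radius `bandwidth` to the mean of the points
    inside it until it stops moving; returns the mode it reaches."""
    for _ in range(max_iter):
        window = [p for p in data if abs(p - x) <= bandwidth]
        new_x = sum(window) / len(window)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x


def mean_shift(data, bandwidth):
    """Start a window at every point (step 2); windows that reach the same mode
    share a cluster. Returns (modes, labels)."""
    modes, labels = [], []
    for p in data:
        m = shift_to_mode(p, data, bandwidth)
        for i, existing in enumerate(modes):
            if abs(m - existing) < bandwidth / 2:  # assumed merge rule
                labels.append(i)
                break
        else:
            modes.append(m)
            labels.append(len(modes) - 1)
    return modes, labels


modes, labels = mean_shift([1.0, 1.2, 0.8, 5.0, 5.2, 4.8], bandwidth=1.0)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Because a window always starts on a data point and moves to a mean of nearby points, it never becomes empty in 1-D, so the division is safe here.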

Question 14

Q

Expectation Maximization (EM) Clustering

Answer

A

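Soft assignment: each point has a probability of belonging to each cluster
Each cluster C_i is a Gaussian with mean vector μ_i, covariance matrix Σ_i, and prior probability P(C_i)
Parameters: phi = {μ_i, Σ_i, P(C_i)} for i = 1..k
Goal: find the phi that maximizes the likelihood of the data, P(D | phi)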
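
A sketch of EM for a 1-D Gaussian mixture, where each covariance matrix Σ_i reduces to a single variance. The E-step computes the soft assignments P(C_i | x_j); the M-step re-estimates μ_i, the variance, and P(C_i) from those weights. The initial parameters, the fixed iteration count, and the small variance floor are assumptions of this sketch.

```python
import math


def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, means, variances, priors, iters=50):
    """Fit a 1-D Gaussian mixture with EM.

    Returns (means, variances, priors, w) where w[j][i] = P(C_i | x_j).
    """
    means, variances, priors = list(means), list(variances), list(priors)
    k, n = len(means), len(data)
    for _ in range(iters):
        # E-step: posterior probability of each cluster for each point.
        w = []
        for x in data:
            dens = [priors[i] * normal_pdf(x, means[i], variances[i]) for i in range(k)]
            total = sum(dens)
            w.append([d / total for d in dens])
        # M-step: weighted re-estimation of mean, variance and prior.
        for i in range(k):
            wi = [row[i] for row in w]
            n_i = sum(wi)
            means[i] = sum(a * x for a, x in zip(wi, data)) / n_i
            var = sum(a * (x - means[i]) ** 2 for a, x in zip(wi, data)) / n_i
            variances[i] = max(var, 1e-6)  # assumed floor to avoid collapse
            priors[i] = n_i / n
    return means, variances, priors, w


data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
means, variances, priors, w = em_gmm_1d(data, [0.0, 6.0], [1.0, 1.0], [0.5, 0.5])
print([round(m, 3) for m in means])  # [1.0, 5.0]
```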

Question 15

Q

Density Based

Answer

A

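Discovers clusters of arbitrary shape
Handles noise
Needs only one scan of the data

ε-neighborhood: all points within radius ε of a point
Core object: a point whose ε-neighborhood contains at least MinPts points
x is directly density-reachable from y if y is a core object and x is within y's ε-neighborhood
Density-reachable: x can be reached from y through a chain of directly density-reachable points
Density-connected: x and y are both density-reachable from some common point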
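
A compact DBSCAN-style sketch of these definitions: points whose ε-neighborhood holds at least MinPts points are core objects, clusters grow through directly density-reachable points, and anything never reached from a core object is labeled noise (-1). The brute-force neighborhood query (O(n) per point) and the function names are assumptions of this sketch.

```python
def region_query(data, j, eps):
    """Indices of the points in the eps-neighborhood of point j (including j)."""
    return [i for i, p in enumerate(data)
            if sum((a - b) ** 2 for a, b in zip(p, data[j])) <= eps ** 2]


def dbscan(data, eps, min_pts):
    """Label each point with a cluster id, or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(data)
    cluster = 0
    for j in range(len(data)):
        if labels[j] is not UNVISITED:
            continue
        neighbors = region_query(data, j, eps)
        if len(neighbors) < min_pts:
            labels[j] = NOISE       # may later become a border point of a cluster
            continue
        labels[j] = cluster         # j is a core object: start a new cluster
        queue = list(neighbors)
        while queue:
            i = queue.pop()
            if labels[i] == NOISE:
                labels[i] = cluster  # border point reached from a core object
            if labels[i] is not UNVISITED:
                continue
            labels[i] = cluster
            nbrs = region_query(data, i, eps)
            if len(nbrs) >= min_pts:  # i is also a core object: keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```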

(15 cards)