Clustering Flashcards

Question 1

Q

What is clustering?

Answer

A

Taking a set of data points and putting them into groups so that the points in each group are close to, or similar to, each other.

Question 2

Q

Euclidean distance (straight line)

Answer

A

sqrt((x1-y1)^2 + (x2-y2)^2)

Question 3

Q

Rectilinear distance (Manhattan distance)

Answer

A

abs(x1-y1) + abs(x2-y2)

Question 4

Q

p-norm distance (Minkowski distance)

Answer

A

pth root(abs(x1-y1)^p + abs(x2-y2)^p)
or, generalized to any number of dimensions n (summing over all n dimensions):
pth root(sum from i = 1 to n of abs(xi-yi)^p)
where p = 2 gives the straight-line (Euclidean) distance
and p = 1 gives the rectilinear distance
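A minimal Python sketch of the p-norm formula, assuming points are given as equal-length tuples of numbers; the function name minkowski_distance is hypothetical. Setting p = 2 and p = 1 reproduces the Euclidean and rectilinear cards above.

```python
def minkowski_distance(x, y, p):
    """p-norm (Minkowski) distance between two points with the same number of dimensions."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of dimensions")
    # Sum abs(xi - yi)^p over all n dimensions, then take the pth root.
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y)) ** (1 / p)

x, y = (1, 2), (4, 6)
print(minkowski_distance(x, y, 2))  # Euclidean: sqrt(3^2 + 4^2) -> 5.0
print(minkowski_distance(x, y, 1))  # rectilinear: 3 + 4 -> 7.0
```

The same function works unchanged in three or more dimensions, since it sums over every coordinate pair.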

Question 5

Q

infinity norm distance

Answer

A

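The limit, as p goes to infinity, of pth root(sum from i = 1 to n of abs(xi-yi)^p).

As p grows, the largest term dominates the sum, so this is approximately pth root(max over i of abs(xi-yi)^p).

In the limit the pth root cancels the pth power, so the infinity norm distance equals max over i of abs(xi-yi).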
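To see the limit numerically, the sketch below (hypothetical function names and example points) evaluates the p-norm distance for increasing p and compares it with the largest absolute coordinate difference.

```python
def p_norm_distance(x, y, p):
    # pth root of the sum of abs(xi - yi)^p over all dimensions
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y)) ** (1 / p)

def infinity_norm_distance(x, y):
    # Limit as p -> infinity: the largest absolute coordinate difference
    return max(abs(xi - yi) for xi, yi in zip(x, y))

x, y = (1, 2, 3), (4, 6, 3)  # coordinate differences: 3, 4, 0
for p in (1, 2, 10, 50):
    # The distance shrinks toward the largest difference (4) as p grows
    print(p, p_norm_distance(x, y, p))
print("infinity", infinity_norm_distance(x, y))  # infinity 4
```

Very large p is avoided here because raising big integers to huge powers and converting the result to a float can overflow; computing the max directly sidesteps that.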

Question 6

Q

What is the infinity norm?

Answer

A

The largest absolute value in a set of numbers.

Question 7

Q

Why would you use the infinity norm?

Answer

A

Example: how long does a job made up of multiple simultaneous steps take? As long as its longest step, i.e., the maximum of the step times, which is exactly what the infinity norm measures.

Question 8

Q

clustering workflow

Answer

A

0. Pick k cluster centers within the range of the data.
1. Assign each data point to its nearest cluster center.
2. Recalculate each cluster center as the mean (centroid) of the points assigned to it.
3. Repeat steps 1 and 2 until no data point changes clusters, at which point the cluster centers no longer change.
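The workflow above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: it assumes Euclidean distance, starts from k distinct data points chosen at random (one way to stay within the range of the data), and the function names and toy points are hypothetical.

```python
import math
import random

def assign_points(points, centers):
    # Step 1: label each point with the index of its nearest center (Euclidean distance).
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]

def recompute_centers(points, labels, centers):
    # Step 2: move each center to the mean (centroid) of its assigned points.
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            new_centers.append(old)  # empty cluster: leave its center in place
        else:
            new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_centers

def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 0: start from k distinct data points as the initial centers.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = assign_points(points, centers)
        if new_labels == labels:  # Step 3: stop once no point changes cluster
            break
        labels = new_labels
        centers = recompute_centers(points, labels, centers)
    return centers, labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = k_means(points, k=2)
print(labels)
print(centers)
```

The max_iter cap is a safety net; on well-separated data like this toy set the loop stops as soon as an assignment pass changes nothing.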

Question 9

Q

Heuristic

Answer

A

An algorithm that is not guaranteed to find the best solution.

Question 10

Q

Expectation-maximization (e.g., k-means clustering)

Answer

A

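An iterative procedure that alternates between an expectation step (finding the cluster centers, each of which is the mean, i.e., expected value, of its cluster's points) and a maximization step (assigning each point to the cluster it fits best, i.e., its nearest center).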

Question 11

Q

k-means algorithm

Answer

A

- It is a heuristic.
- Run it several times with different initial cluster centers and choose the best solution found.
- Run it with different values of k (the number of clusters).

THEN
- Plot the total distance (from each point to its cluster center) against the number of clusters and look for the "elbow", where adding more clusters starts giving diminishing returns.
- It is also important to consider qualitative aspects, such as what we want to use the clusters for.
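A sketch of the restart-and-elbow procedure, assuming Euclidean distance and a hypothetical toy data set of three tight groups. Because the data set is tiny, it simply tries every set of k data points as starting centers and keeps the run with the lowest total distance; on real data you would instead try a handful of random starts. Function names are hypothetical.

```python
import math
from itertools import combinations

def k_means_from(points, init_centers, max_iter=100):
    """Run k-means from the given starting centers; return (total distance, labels)."""
    centers = list(init_centers)
    k = len(centers)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:  # converged: no point changed cluster
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # Total distance from every point to its assigned cluster center.
    return sum(math.dist(p, centers[lab]) for p, lab in zip(points, labels)), labels

def best_total_distance(points, k):
    # k-means is a heuristic, so restart from many initial centers and keep the best run.
    return min(k_means_from(points, init)[0] for init in combinations(points, k))

# Hypothetical toy data: three tight groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (0, 10), (1, 10), (0, 11)]
totals = {k: best_total_distance(points, k) for k in range(1, 6)}
for k, total in totals.items():
    print(k, round(total, 2))
```

For this data the total drops sharply up to k = 3 and only slightly after that, so the elbow is at k = 3: a fourth cluster merely splits one already tight group.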

Question 12

Q

Predictive clustering

Answer

A

If a new data point falls within a cluster, we can assign it to that cluster. If it falls outside every cluster, we can assign it to the cluster whose center is closest.
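The nearest-center rule for a new point fits in a few lines. This sketch assumes Euclidean distance and hypothetical centers (for example, the output of a finished k-means run); predict_cluster is a hypothetical name.

```python
import math

def predict_cluster(point, centers):
    """Return the index of the cluster center closest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))

# Hypothetical centers, e.g. from a finished k-means run.
centers = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
print(predict_cluster((1.0, 2.0), centers))   # 0
print(predict_cluster((8.0, 9.0), centers))   # 1
print(predict_cluster((50.0, 5.0), centers))  # 1 (far from every cluster, still gets the nearest)
```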

Question 13

Q

Voronoi diagram

Answer

A

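A division of the space into regions, one around each cluster center. Each region contains every location that is closer to that center than to any other, so a new point falling in a region would be predicted to belong to that center's cluster (based on distance).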
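To visualize these regions without a plotting library, the sketch below labels each cell of a small integer grid with the index of its nearest center and prints the result as text. The grid size, the three centers, and the name voronoi_map are hypothetical; Euclidean distance is assumed, and ties go to the lower-numbered center.

```python
import math

def voronoi_map(centers, width, height):
    """Label every integer grid cell with the index of its nearest center."""
    rows = []
    for y in range(height - 1, -1, -1):  # build the top row first
        row = ""
        for x in range(width):
            nearest = min(range(len(centers)), key=lambda j: math.dist((x, y), centers[j]))
            row += str(nearest)
        rows.append(row)
    return rows

# Hypothetical cluster centers on a 12 x 6 grid.
for line in voronoi_map([(1, 1), (10, 1), (5, 5)], width=12, height=6):
    print(line)
```

Each block of identical digits in the printout is one Voronoi region.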

Question 14

Q

Classification vs. clustering

Answer

A

Classification - we know the response variable, so this is supervised learning.
Clustering - we don't know the response, so this is unsupervised learning.

Question 15

Q

What is clustering useful for?

Answer

A

- Targeted marketing
- Personalized medicine
- Grouping by physical distance (e.g., where to put libraries, police stations, branches)
- Image analysis


(15 cards)