Lecture 6 (clustering) Flashcards

Question 1

Q

What is clustering?

Answer

A

Dividing objects/data into groups based on similarity

Question 2

Q

What type of machine learning is used for clustering?

Answer

A

Unsupervised learning (no labels)

Question 3

Q

What is the purpose of clustering?

Answer

A

Understand: gain insight into data, generate hypotheses and identify salient features
Structure: Identify taxonomies and relationships among data points
Compress: Organize data and summarize it through representatives (“prototypes”)

Question 4

Q

What are the steps in k-means clustering?

Answer

A

1. Place the centers at random (the number of centers, k, is decided beforehand).
2. Form the clusters: assign each point to its nearest center.
3. Move each center closer to the points in its cluster by placing it at their mean (the mean x-coordinate and the mean y-coordinate).
4. Form the clusters again.
5. Keep moving the centers and re-forming the clusters until the clusters stay the same when the centers are moved toward their points.
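The steps above can be sketched in Python as below. This is a minimal sketch, assuming 2-D points given as `(x, y)` tuples, squared Euclidean distance, and initial centers drawn as k random data points; the function name `kmeans` and the `seed`/`max_iter` parameters are illustrative, not from the lecture.

```python
import random


def nearest(p, centers):
    # Index of the center closest to p (squared Euclidean distance).
    return min(range(len(centers)),
               key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2)


def kmeans(points, k, seed=0, max_iter=100):
    """Cluster 2-D points with k-means; returns (centers, labels)."""
    rng = random.Random(seed)
    # Step 1: use k distinct random data points as the initial centers.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Steps 2 and 4: assign every point to its nearest center.
        new_labels = [nearest(p, centers) for p in points]
        # Step 5: stop once the clusters no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each center to the mean x and mean y of its cluster.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if no points were assigned to it
                centers[c] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return centers, labels
```

For example, `kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2)` puts the three points near the origin in one cluster and the other three in the second cluster.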

Question 5

Q

Give examples of what k-means clustering is used for

Answer

A

Marketing (suggesting products) and customer personas (identifying representatives of each cluster)

Question 6

Q

What does DBSCAN stand for, and when was it invented?

Answer

A

Density-Based Spatial Clustering of Applications with Noise, 1996

Question 7

Q

What are the hyperparameters of DBSCAN?

Answer

A

*a distance measure (metric)
*epsilon: the maximum distance between two points for them to count as neighbors
*minpts: the minimum number of neighbors (points within epsilon) a point needs to be a core point

Question 8

Q

What three types of data points are there in DBSCAN, and what characterizes each?

Answer

A

Core point: has at least minpts neighbors
Border point: a non-core point that has at least one core point as a neighbor
Noise point: an outlier. Neither core nor border (i.e. it has no core points as neighbors)
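The three point types can be computed directly from the hyperparameters. This is a minimal sketch, assuming 2-D points, Euclidean distance via `math.dist` (Python 3.8+) as the default metric, and that a point counts toward its own neighborhood (as in the original DBSCAN formulation); `classify_points` is a hypothetical name.

```python
import math


def classify_points(points, eps, minpts, metric=math.dist):
    """Label each point 'core', 'border' or 'noise'."""
    # Neighbors of point i: every point within eps of it (including i itself).
    neighbors = [[j for j, q in enumerate(points) if metric(p, q) <= eps]
                 for p in points]
    core = [len(nb) >= minpts for nb in neighbors]
    kinds = []
    for i in range(len(points)):
        if core[i]:
            kinds.append("core")
        elif any(core[j] for j in neighbors[i]):
            kinds.append("border")  # not core, but has a core point as a neighbor
        else:
            kinds.append("noise")   # neither core nor border: an outlier
    return kinds
```

For points on a line, `classify_points([(0, 0), (1, 0), (2, 0), (3, 0), (10, 0)], eps=1, minpts=3)` returns `['border', 'core', 'core', 'border', 'noise']`.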

Question 9

Q

What are the steps in DBSCAN clustering?

Answer

A

1. Pick a random core point (p) and color it, e.g. blue.
2. Color all of p's neighbors blue.
3. If any of p's neighbors are also core points, color their neighbors blue as well, and so on.
4. When there are no more points to color blue, pick another random uncolored core point and color it, e.g. green. Repeat. Points that never get a color are noise.
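The coloring procedure above can be sketched in Python as follows. This is a minimal sketch, assuming 2-D points, Euclidean distance by default, and that a point counts as its own neighbor; colors become cluster ids 0, 1, 2 and so on, the "and so on" step uses an explicit stack, and the name `dbscan` and the `seed` parameter are illustrative.

```python
import math
import random


def dbscan(points, eps, minpts, metric=math.dist, seed=0):
    """Return one cluster id per point (0, 1, 2, ...) or -1 for noise."""
    n = len(points)
    # Assumption: a point is counted as its own neighbor.
    neighbors = [[j for j in range(n) if metric(points[i], points[j]) <= eps]
                 for i in range(n)]
    is_core = [len(nb) >= minpts for nb in neighbors]

    cores = [i for i in range(n) if is_core[i]]
    random.Random(seed).shuffle(cores)  # visit core points in random order

    labels = [None] * n  # None means "not colored yet"
    color = 0
    for start in cores:
        if labels[start] is not None:
            continue  # already colored as part of an earlier cluster
        labels[start] = color
        stack = [start]
        while stack:
            p = stack.pop()
            if not is_core[p]:
                continue  # border points get the color but do not spread it
            for q in neighbors[p]:
                if labels[q] is None:
                    labels[q] = color
                    stack.append(q)
        color += 1
    # Points that were never colored are noise.
    return [-1 if label is None else label for label in labels]
```

Because the core points are visited in random order, which cluster gets id 0 can change with `seed`. The grouping itself does not, except that a border point next to two clusters joins whichever cluster reaches it first.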

Question 10

Q

How do you perform hierarchical clustering?

Answer

A

Start with every point as its own cluster. Find the two closest points that are not in the same cluster and merge their clusters. Continue until you have the desired number of clusters (given as a hyperparameter).
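Merging the clusters that contain the closest pair of points is known as single linkage. Below is a naive Python sketch of that rule, assuming 2-D points and Euclidean distance; `single_linkage` is a hypothetical name, and since it recomputes all point distances on every merge it is only meant for small examples.

```python
import math
from itertools import combinations


def single_linkage(points, k, metric=math.dist):
    """Agglomerative clustering: merge clusters until only k remain."""
    def cluster_distance(c1, c2):
        # Distance between the closest pair of points, one from each cluster.
        return min(metric(points[i], points[j]) for i in c1 for j in c2)

    clusters = [[i] for i in range(len(points))]  # every point starts alone
    while len(clusters) > k:
        # The two clusters whose closest points are nearest to each other.
        _, a, b = min((cluster_distance(clusters[a], clusters[b]), a, b)
                      for a, b in combinations(range(len(clusters)), 2))
        merged = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is still valid
        clusters[a] = merged
    return clusters
```

For example, `single_linkage([(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)], 2)` returns `[[0, 1, 2, 3], [4]]`: the two nearby pairs are merged first, then with each other, while the distant point stays alone.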

Question 11

Q

What is a dendrogram?

Answer

A

A “tree” that shows the entire hierarchical clustering process (without stopping at a given number of clusters)
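What a dendrogram draws is the full list of merges and the distance at which each one happened. The sketch below, assuming single linkage, Euclidean distance, and a hypothetical function name `merge_history`, runs the merging until one cluster remains and returns those merges from the bottom of the tree to the top.

```python
import math
from itertools import combinations


def merge_history(points, metric=math.dist):
    """Record every merge as (cluster_a, cluster_b, distance).

    The distance is the height at which a dendrogram joins the two branches.
    """
    def cluster_distance(c1, c2):
        # Single linkage: distance between the closest pair of points.
        return min(metric(points[i], points[j]) for i in c1 for j in c2)

    clusters = [(i,) for i in range(len(points))]  # leaves of the tree
    history = []
    while len(clusters) > 1:  # do not stop early: merge all the way to the root
        d, a, b = min((cluster_distance(clusters[a], clusters[b]), a, b)
                      for a, b in combinations(range(len(clusters)), 2))
        history.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is still valid
        clusters[a] = merged
    return history
```

For example, `merge_history([(0, 0), (1, 0), (3, 0), (7, 0)])` returns `[((0,), (1,), 1.0), ((0, 1), (2,), 2.0), ((0, 1, 2), (3,), 4.0)]`. Stopping after the first n − k merges leaves k clusters, which is how a dendrogram is "cut" to a chosen number of clusters.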

Question 12

Q

Which method was previously used to map the evolutionary distance between species from their
ribosomal DNA sequences, and which method is used in modern approaches?

Answer

A

Older: hierarchical clustering
Modern: hidden Markov models