Data Science MODULE 6 Flashcards

1
Q

So what is the main goal of k-means clustering?

A

To form clusters in which the points within each cluster are very similar, while the clusters themselves differ clearly from one another.

2
Q

What basic principle is used to measure similarity in k-means clustering?

A

Euclidean distance between points
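A minimal sketch of the Euclidean distance between two observations, written out by hand and checked against NumPy's `np.linalg.norm`. The point values and the helper name `euclidean_distance` are made up for illustration.

```python
import math

import numpy as np


def euclidean_distance(a, b):
    """Straight-line (L2) distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


p = [1.0, 2.0]
q = [4.0, 6.0]
d = euclidean_distance(p, q)
print(d)  # 5.0, i.e. sqrt(3**2 + 4**2)
print(np.linalg.norm(np.array(p) - np.array(q)))  # 5.0, same result via NumPy
```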

3
Q

As with neural networks, what do we do with the values we work with in k-means clustering?

A

They must be scaled so that they are comparable (otherwise the feature with the largest range dominates the Euclidean distance).
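A short sketch of scaling with scikit-learn's `StandardScaler` before clustering. The toy array (think of an income column and an age column) is invented, and standardisation is one common choice of scaling, not necessarily the one used in this module.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data: column 0 is in the tens of thousands, column 1 in the tens,
# so without scaling column 0 would dominate the Euclidean distance.
X = np.array([
    [50_000.0, 25.0],
    [52_000.0, 60.0],
    [90_000.0, 30.0],
    [95_000.0, 65.0],
])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # each column now has mean 0 and standard deviation 1

print(X_scaled.mean(axis=0).round(6))
print(X_scaled.std(axis=0).round(6))
```

The scaled array, not the raw one, is what would then be passed to `KMeans`.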

4
Q

In basic terms, how does the k-means algorithm work?

A

Choose as many random points as the defined number of clusters; these are the starting centroids. Assign each observation to its nearest centroid, then take the mean of the points assigned to each centroid. That mean is the new centroid. Repeat this until the centroids no longer really change.
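The steps above can be sketched from scratch in NumPy (Lloyd's algorithm). This is an illustrative sketch, not scikit-learn's implementation; the function names `simple_kmeans` and `assign_to_nearest`, the tolerance, and the toy data are hypothetical choices.

```python
import numpy as np


def assign_to_nearest(X, centroids):
    """Index of the nearest centroid (Euclidean distance) for every row of X."""
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)


def simple_kmeans(X, k, max_iter=100, tol=1e-6, seed=0):
    """Minimal k-means for illustration, not for production use."""
    rng = np.random.default_rng(seed)
    # 1. Pick k distinct observations at random as the starting centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # 2. Assign each observation to its nearest centroid.
        labels = assign_to_nearest(X, centroids)
        # 3. The mean of each cluster becomes its new centroid
        #    (an empty cluster keeps its old centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # 4. Stop once the centroids hardly move any more.
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break
    # Final assignment against the final centroids.
    return centroids, assign_to_nearest(X, centroids)


X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
              [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])
centroids, labels = simple_kmeans(X, k=2)
print(centroids)
print(labels)
```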

5
Q

Rule of thumb for determining the number of clusters?

A

k = sqrt(N/2), where N is the number of observations.
The number of clusters must always be smaller than N and at least 2.
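The rule of thumb as a one-line helper. Rounding to the nearest integer is an assumption (the card does not say how to round), and the name `rule_of_thumb_k` is hypothetical.

```python
import math


def rule_of_thumb_k(n_observations):
    """Rule-of-thumb number of clusters: sqrt(N / 2), rounded to the nearest integer."""
    return round(math.sqrt(n_observations / 2))


print(rule_of_thumb_k(200))  # 10, since sqrt(200 / 2) = sqrt(100) = 10
```

Treat the result only as a starting point; the elbow and silhouette methods are then used to check it.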

6
Q

Elbow method - what is distortion? What is inertia?

A

Distortion: the average of the squared distances from each observation to the centroid of its assigned cluster.
Inertia: simply the sum of those squared distances (instead of their average).
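A sketch computing both quantities by hand after fitting scikit-learn's `KMeans`, using the definitions on this card; the manual inertia should agree with the fitted model's `inertia_` attribute. The toy data is invented, and note that some sources define distortion with plain (not squared) distances.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
              [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Squared Euclidean distance from each observation to the centroid of its own cluster.
assigned_centroids = km.cluster_centers_[km.labels_]
squared_distances = ((X - assigned_centroids) ** 2).sum(axis=1)

distortion = squared_distances.mean()  # average of the squared distances
inertia = squared_distances.sum()      # sum of the squared distances

print(distortion, inertia, km.inertia_)
```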

7
Q

So how is the elbow method used?

A

Plot distortion/inertia against the number of clusters. The point where it stops decreasing substantially (the "elbow" of the curve) is where we draw the line for the number of clusters.
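A sketch of the numbers behind an elbow plot: fit `KMeans` for a range of k and record `inertia_` for each. The data comes from scikit-learn's synthetic `make_blobs` generator (an assumption for illustration), and the range 1 to 10 is arbitrary.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 4 blobs (illustration only).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)

ks = list(range(1, 11))
inertias = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)  # sum of squared distances to the nearest centroid

for k, inertia in zip(ks, inertias):
    print(f"k={k:2d}  inertia={inertia:10.1f}")
```

Plotting `inertias` against `ks` (for example with seaborn or matplotlib) gives the elbow curve; the bend should be read off the plot rather than assumed.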

8
Q

Silhouette method for k-means clustering

A

Same principle as the elbow method, except that we now look for the largest silhouette score as a function of the number of clusters.
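A sketch using `sklearn.metrics.silhouette_score` to score each k and keep the one with the highest score. The synthetic `make_blobs` data and the range of k are assumptions for illustration; the score is only defined for 2 <= k <= N - 1, which is why the loop starts at 2.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with 4 blobs (illustration only).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)

scores = {}
for k in range(2, 11):  # silhouette needs at least 2 clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # mean silhouette over all samples, in [-1, 1]

best_k = max(scores, key=scores.get)  # k with the largest silhouette score
print(scores)
print("best k:", best_k)
```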

9
Q

Three of the major drawbacks of k-means clustering

A

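The result depends on the initialisation and on the chosen number of clusters: k-means finds local minima, not necessarily the global minimum.
K-means tries to spread the data evenly across the clusters, so it copes poorly with clusters of very different sizes.
It works well only with spherical (round) clusters.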

10
Q

Two initialisation methods for k-means

A

Via the init parameter: "random", and "k-means++", which spreads the initial centroids out so that they start far away from each other.
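A sketch comparing the two `init` options of scikit-learn's `KMeans` on synthetic `make_blobs` data (an assumption for illustration). `n_init` reruns the algorithm from several starting points and keeps the run with the lowest inertia, which reduces, but does not remove, the local-minimum problem.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 4 blobs (illustration only).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)

results = {}
for init in ("random", "k-means++"):
    # Keep the best of 10 runs (lowest inertia) for each initialisation method.
    km = KMeans(n_clusters=4, init=init, n_init=10, random_state=0).fit(X)
    results[init] = km.inertia_
    print(f"{init:10s} inertia={km.inertia_:.2f}")
```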

11
Q

New imports for k-means clustering

A

import math
import seaborn as sns
from sklearn.metrics import silhouette_score
from sklearn.cluster import KMeans
