L10; CLUSTERING Flashcards

Question 1

Q

clustering

Answer

A

Clustering is used to group or classify data into subsets with similar attributes.
It works by calculating the similarity between different objects.
Similarity is often treated as the inverse of distance: the closer two objects are, the more similar they are.

Limitation: similarity is sometimes difficult to define, and different similarity criteria can lead to different clustering results.
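
To make the "similarity as the inverse of distance" idea concrete, here is a minimal sketch. It assumes Euclidean distance and the conversion 1 / (1 + distance); both are illustrative choices rather than something these notes prescribe, and the function names are hypothetical.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two points with the same number of dimensions."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similarity(a, b):
    """Assumed conversion: 1 / (1 + distance), so identical points score 1.0."""
    return 1.0 / (1.0 + euclidean_distance(a, b))

p, q, r = (1.0, 2.0), (1.5, 2.5), (8.0, 9.0)

# q is closer to p than r is, so it should be the more similar of the two.
print(similarity(p, q) > similarity(p, r))
```

Swapping in a different distance (for example Manhattan distance) can change which objects count as most similar, which is the limitation noted above.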

Question 2

Q

clustering types

Answer

A

1. Hierarchical clustering

2. Non-hierarchical clustering

Question 3

Q

hierarchical clustering (2)

Answer

A

Algorithms:

Agglomerative clustering (bottom up)
Divisive clustering (top down)
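
A minimal sketch of the bottom-up (agglomerative) approach on one-dimensional data. It assumes the distance between two clusters is the distance between their closest pair of points, and it stops at a chosen number of clusters; the function name and the sample values are hypothetical.

```python
def agglomerative(points, n_clusters):
    # Bottom up: start with every point in its own cluster.
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None  # (distance, index_i, index_j) of the closest pair of clusters
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Assumed cluster distance: closest pair of points across the two clusters.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge the two closest clusters into one.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

result = agglomerative([1.0, 1.2, 5.0, 5.1, 9.0], n_clusters=3)
print(result)
```

Divisive clustering runs in the opposite direction: it starts with all points in one cluster and splits repeatedly.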

Question 4

Q

three ways of measuring the distance between clusters

Answer

A

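Single linkage (nearest neighbour): the distance between the closest pair of points, one from each cluster.
Complete linkage (furthest neighbour): the distance between the furthest pair of points, one from each cluster.
Average linkage: calculate the distance between every pair of points, one from each cluster, and take the average.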
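
The three measures can be compared on the same pair of clusters. This is a minimal sketch assuming Euclidean distance between points; the function names and sample clusters are illustrative only.

```python
import math
from itertools import product

def pairwise_distances(cluster_a, cluster_b):
    """Distances between every pair of points, one from each cluster."""
    return [math.dist(a, b) for a, b in product(cluster_a, cluster_b)]

def single_linkage(cluster_a, cluster_b):
    return min(pairwise_distances(cluster_a, cluster_b))

def complete_linkage(cluster_a, cluster_b):
    return max(pairwise_distances(cluster_a, cluster_b))

def average_linkage(cluster_a, cluster_b):
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)

cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(3.0, 0.0), (5.0, 0.0)]

print(single_linkage(cluster_a, cluster_b))    # 2.0
print(complete_linkage(cluster_a, cluster_b))  # 5.0
print(average_linkage(cluster_a, cluster_b))   # 3.5
```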

Question 5

Q

K-means clustering

Answer

A

K-means is one of the most commonly used clustering algorithms.

K refers to the number of clusters you want to classify your data into.

Question 6

Q

procedure of k-means clustering

Answer

A

1. Choose a value for K, the number of clusters.
2. Randomly choose K points as the initial centroids.
3. Assign each item to the cluster with the nearest centroid (mean).
4. Recalculate each centroid as the average of all data points in its cluster.
5. Repeat steps 3 and 4 until there are no more reassignments or the maximum number of iterations is reached.
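
The procedure above can be sketched in plain Python. This is an illustrative implementation, not a reference one: it assumes Euclidean distance, picks the initial centroids from the data points themselves, and keeps the old centroid if a cluster ends up empty. The function name, the `seed` parameter and the sample data are hypothetical.

```python
import math
import random

def k_means(points, k, max_iterations=100, seed=0):
    rng = random.Random(seed)
    # Randomly choose K data points as the initial centroids.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iterations):
        # Assign each point to the cluster with the nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Stop when no point changes cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Recalculate each centroid as the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # assumption: keep the old centroid if the cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, assignments = k_means(points, k=2)
print(assignments)
```

The cluster labels (0 or 1) depend on which points were drawn as initial centroids, so only the grouping is meaningful, not the label values.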

Question 7

Q

k-means clustering limitations

Answer

A

Difficult to choose K: this needs human inspection or novel algorithms.
Dependent on the seeds (initial centre positions).
Sensitive to outliers.
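
A small illustration of the outlier problem, assuming the centroid is the plain mean of the points in a cluster. The sample points are made up for the example.

```python
def centroid(points):
    """Mean of the points, computed dimension by dimension."""
    return tuple(sum(dim) / len(points) for dim in zip(*points))

cluster = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
with_outlier = cluster + [(50.0, 50.0)]

print(centroid(cluster))       # (1.5, 1.5)
print(centroid(with_outlier))  # (11.2, 11.2)
```

A single distant point drags the centroid well away from the four points that actually form the group.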

Question 8

Q

variable reduction

Answer

A

Variable reduction techniques can be used to reduce the dimensions (variables/columns) of a dataset before applying clustering methods.

This allows clustering on multidimensional data to be visualised in two- or three-dimensional space.

Principal Components Analysis and Exploratory Factor Analysis will be covered in the Forecasting and Advanced Business Analytics module.

(8 cards)