B05 k-Means Clustering Flashcards

Question 1

Q

k-means clustering is a ________, ________ and __________ clustering approach that assigns all n items in a dataset to one of k clusters, such that the differences within a cluster are minimized while the differences between clusters are maximized.

Answer

A

partitional, exclusive, complete
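A minimal sketch of the standard k-means loop (Lloyd's algorithm) in plain Python may help show why the result is partitional, exclusive and complete: every one of the n points ends up with exactly one of k labels. The function name `kmeans`, its parameters and the sample points are illustrative assumptions, not part of the deck.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Return (centroids, labels) for a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial cluster centers
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point is given to its single nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
    return centroids, labels


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
```

Each point appears in `labels` exactly once, so no item is left unassigned and no item belongs to two clusters.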

Question 2

Q

Good clustering will produce clusters with:

______ intra-class similarity.
______ inter-class similarity.

Answer

A

High intra-class similarity.
Low inter-class similarity.

Question 3

Q

Other distance measures used in Clustering include:

Answer

A

Minkowski distance, Pearson correlation distance, Spearman correlation distance and Kendall correlation distance.
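Two of these measures can be sketched in a few lines of plain Python. The Minkowski distance generalizes Manhattan (p = 1) and Euclidean (p = 2) distance. For the Pearson correlation distance, the sketch assumes the common definition 1 - r, where r is the Pearson correlation coefficient; other definitions exist. The function names are illustrative.

```python
import math


def minkowski(a, b, p=2):
    """Minkowski distance of order p between two equal-length sequences."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def pearson_distance(a, b):
    """Assumed definition: 1 - r, so perfectly correlated vectors are at distance 0."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return 1 - cov / (sd_a * sd_b)


manhattan = minkowski((0, 0), (3, 4), p=1)  # 3 + 4 = 7
euclidean = minkowski((0, 0), (3, 4), p=2)  # sqrt(9 + 16) = 5
same_shape = pearson_distance([1, 2, 3], [2, 4, 6])  # close to 0
```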

Question 4

Q

Challenges with k-Means Clustering

k-Means is very sensitive to the initial randomly chosen cluster centers (this is known as the ________)

Answer

A

random initialization trap

Question 5

Q

The _______ initialization approach mitigates the effects of the random initialization trap.

Answer

A

K-means++
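The seeding idea behind k-means++ can be sketched as follows: the first center is picked uniformly at random, and each later center is picked with probability proportional to its squared distance from the nearest center already chosen, which tends to spread the initial centers apart. This is a simplified plain-Python sketch; the function name `kmeans_pp_init` and the sample points are assumptions for illustration.

```python
import math
import random


def kmeans_pp_init(points, k, seed=0):
    """Choose k initial centers from points using k-means++ style seeding."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]  # first center: uniform random pick
    while len(centers) < k:
        # Squared distance from every point to its nearest chosen center.
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        # Far-away points get proportionally more weight; points that are
        # already centers have weight 0. Assumes at least k distinct points.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


# Hypothetical toy data: two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers = kmeans_pp_init(points, k=2)
```

The returned centers would then replace the purely random starting centers in an ordinary k-means run.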

Question 6

Q

Methods for choosing the right K include:

Answer

A

Elbow Method
Information Criterion Approach
Silhouette method
Jump method
Gap statistic

Question 7

Q

WCSS stands for ________ and is associated with the _____ for choosing K

Answer

A

Within Cluster Sum of Squares

Elbow Method
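As a rough illustration, WCSS is the sum over all clusters of the squared distances between each point and its cluster centroid. The elbow method computes this value for a range of k and looks for the k after which it stops dropping sharply. The sketch below only computes WCSS for a given assignment; the function name `wcss`, the sample points and the hand-made labels are assumptions.

```python
def wcss(points, labels):
    """Within Cluster Sum of Squares for points with the given cluster labels."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        # Centroid is the per-dimension mean of the cluster's members.
        centroid = [sum(dim) / len(members) for dim in zip(*members)]
        total += sum(
            sum((x - c) ** 2 for x, c in zip(p, centroid)) for p in members
        )
    return total


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
wcss_k1 = wcss(points, [0, 0, 0, 0])  # one cluster, centroid (5, 1): 104.0
wcss_k2 = wcss(points, [0, 0, 1, 1])  # two clusters: 4.0
```

The large drop from k = 1 to k = 2 is the kind of bend the elbow method looks for.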

Question 8

Q

Strengths of k-Means Clustering?

Answer

A

-Uses simple non-statistical principles.
-Very flexible and malleable algorithm.
-Wide set of real-world applications.

Question 9

Q

Weaknesses of k-Means Clustering?

Answer

A

-Simplistic algorithm.
-Relies on chance.
-Sometimes requires some domain knowledge in determining the ideal number of clusters.
-Not ideal for non-spherical clusters.
-Works with numeric data only.

Question 10

Q

An individual independent example of the concept represented by the dataset. It is described by a set of attributes or features.

Answer

A

An instance (row)

Question 11

Q

Property or characteristic of an instance. These can either be discrete or continuous.

Answer

A

A feature (column)

Question 12

Q

The attribute or feature that is described by the other features within an instance.

Answer

A

The response (also called the class or label)

Question 13

Q

The ___________ of a dataset represents the number of features in the dataset.

Answer

A

dimensionality

Question 14

Q

Data _______ and _______ describe the degree to which data exists for each feature of all observations.

Answer

A

sparsity

density
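One simple way to put numbers on these two terms, assuming missing values are stored as `None`, is to take density as the fraction of cells that hold a value and sparsity as the remainder. The function name `density` and the sample rows are illustrative assumptions.

```python
def density(rows):
    """Fraction of cells (feature values across all observations) that are present."""
    cells = [value for row in rows for value in row]
    return sum(value is not None for value in cells) / len(cells)


# Hypothetical dataset: 2 observations x 2 features, one value missing.
rows = [(1, None), (2, 3)]
data_density = density(rows)      # 3 of 4 cells present: 0.75
data_sparsity = 1 - data_density  # 1 of 4 cells missing: 0.25
```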

Question 15

Q

__________ describes the grain or level of detail in the data.

Answer

A

Resolution

Question 16

Q

_________ is the process of reducing noise in the data.

Answer

A

Smoothing

Approaches include binning, clustering and regression.
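Smoothing by binning can be sketched as: sort the values, split them into equal-size bins, and replace every value with the mean of its bin. This is one of several binning variants (bin medians or bin boundaries are also used); the function name `smooth_by_bin_means` and the sample values are assumptions.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort values, group them into bins of bin_size, replace each by its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        current_bin = ordered[start:start + bin_size]
        mean = sum(current_bin) / len(current_bin)
        smoothed.extend([mean] * len(current_bin))
    return smoothed


values = [4, 8, 15, 21, 21, 24, 25, 28, 34]
smoothed = smooth_by_bin_means(values, bin_size=3)
# Bins (4, 8, 15), (21, 21, 24), (25, 28, 34) become 9.0, 22.0 and 29.0.
```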

(16 cards)