B05 k-Means Clustering Flashcards

1
Q

k-means clustering is a ________, ________ and
__________ clustering approach that assigns all n items in a dataset to one of k clusters, such that the
differences within a cluster are minimized while the
differences between clusters are maximized.

A

partitional, exclusive, complete
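
Note: the assign-then-update loop behind this definition can be sketched in plain Python. This is a minimal illustration, not a reference implementation. The names `k_means`, `squared_distance` and `mean_point` are hypothetical, squared Euclidean distance is assumed, and the initial centers are passed in explicitly so the toy run is deterministic (in practice they are usually chosen at random).

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean (centroid) of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, initial_centers, max_iter=100):
    """Assign every point to exactly one of k clusters (hypothetical helper)."""
    centers = list(initial_centers)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # no assignment changed, so we have converged
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = mean_point(members)
    return labels, centers


# Toy data (made up for illustration): two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centers = k_means(points, initial_centers=[points[0], points[3]])
print(labels)
```

Every point receives exactly one label (exclusive), no point is left unassigned (complete), and the clusters are not nested (partitional).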

2
Q

Good clustering will produce clusters with:

  • ______ intra-class similarity.
  • ______ inter-class similarity.
A

High

Low

3
Q

Other distance measures used in clustering (besides Euclidean distance) include:

A

Minkowski distance, Pearson correlation distance, Spearman correlation distance and Kendall correlation distance.
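
Note: two of these measures can be sketched in a few lines of plain Python. The function names are hypothetical, and the Pearson correlation distance is taken here as 1 minus the Pearson correlation coefficient, which is one common definition (an assumption, since the card does not define it).

```python
import math


def minkowski_distance(a, b, p=2):
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def pearson_distance(a, b):
    """1 - Pearson correlation; assumes neither vector is constant."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1 - cov / math.sqrt(var_a * var_b)


manhattan = minkowski_distance((0, 0), (3, 4), p=1)
euclidean = minkowski_distance((0, 0), (3, 4), p=2)
same_trend = pearson_distance([1, 2, 3], [2, 4, 6])
opposite_trend = pearson_distance([1, 2, 3], [3, 2, 1])
```

Under this definition, perfectly correlated vectors are at distance 0 and perfectly anti-correlated vectors are at distance 2, regardless of their magnitudes.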

4
Q

Challenges with k-Means Clustering

k-Means is very sensitive to the randomly chosen initial cluster centers (this is known as the ________).

A

random initialization trap

5
Q

The _______ initialization approach mitigates the effects of the random initialization trap.

A

K-means++
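
Note: a minimal sketch of k-means++ seeding, assuming squared Euclidean distance and at least k distinct points. The first center is picked uniformly at random; each later center is drawn with probability proportional to its squared distance from the nearest center chosen so far, which spreads the starting centers out. The name `kmeans_pp_init` is hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centers with k-means++ style weighting (hypothetical helper)."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]  # first center: uniform at random
    while len(centers) < k:
        # Weight each point by its squared distance to the nearest chosen center.
        # Points already chosen get weight 0, so they cannot be drawn again.
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


# Toy data (made up for illustration).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centers = kmeans_pp_init(points, k=2)
```

The seeding only replaces the random choice of starting centers; the usual assign-and-update iterations of k-means still run afterwards.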

6
Q

Methods for choosing the right k include:

A
Elbow Method
Information Criterion Approach
Silhouette method
Jump method
Gap statistic
7
Q

WCSS stands for ________ and is associated with the _____ for choosing k.

A

Within-Cluster Sum of Squares

Elbow Method
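
Note: a minimal sketch of how WCSS is computed for a given cluster assignment, assuming squared Euclidean distance to each cluster's centroid. The name `wcss` and the hand-written label lists are hypothetical; in practice the labels would come from running k-means for each candidate k.

```python
def wcss(points, labels):
    """Sum of squared distances from each point to its own cluster centroid."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        # Centroid: coordinate-wise mean of the cluster's members.
        centroid = [sum(coords) / len(members) for coords in zip(*members)]
        for point in members:
            total += sum((x - c) ** 2 for x, c in zip(point, centroid))
    return total


# Toy data (made up for illustration): two tight pairs, far apart.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
wcss_k1 = wcss(points, [0, 0, 0, 0])  # everything in one cluster
wcss_k2 = wcss(points, [0, 0, 1, 1])  # one cluster per pair
wcss_k4 = wcss(points, [0, 1, 2, 3])  # every point alone
print(wcss_k1, wcss_k2, wcss_k4)
```

WCSS always falls as k grows, so the Elbow Method looks for the k after which the drop levels off. In this toy data the large drop happens between k=1 and k=2.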

8
Q

Strengths of k-Means Clustering?

A
- Uses simple, non-statistical principles.
- Very flexible and malleable algorithm.
- Wide set of real-world applications.
9
Q

Weaknesses of k-Means Clustering?

A

- Simplistic algorithm.
- Relies on chance.
- Sometimes requires some domain knowledge in determining the ideal number of clusters.
- Not ideal for non-spherical clusters.
- Works with numeric data only.

10
Q

An individual, independent example of the concept
represented by the dataset. It is described by a set of
attributes or features.

A

An instance (row)

11
Q

A property or characteristic of an instance. It can be either discrete or continuous.

A

Feature

12
Q

The attribute or feature that is described by the other features within an instance.

A

Class

13
Q

The ___________ of a dataset represents the number of features in the dataset.

A

dimensionality

14
Q

Data _______ and _______ describe the degree to which data exists for each feature of all observations.

A

sparsity

density
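
Note: one simple way to quantify these two terms is as the fraction of missing and present values across all features of all observations. The sketch below assumes missing values are stored as `None`; the function names and the toy rows are hypothetical.

```python
def density(rows):
    """Fraction of cells that hold a value (missing cells are None)."""
    total_cells = sum(len(row) for row in rows)
    present_cells = sum(1 for row in rows for value in row if value is not None)
    return present_cells / total_cells


def sparsity(rows):
    """Fraction of cells that are missing; the complement of density."""
    return 1 - density(rows)


# Toy dataset (made up for illustration): 3 observations, 4 features, 3 missing cells.
rows = [
    [5.1, None, 1.4, 0.2],
    [4.9, 3.0, None, 0.2],
    [None, 3.2, 1.3, 0.2],
]
data_density = density(rows)
data_sparsity = sparsity(rows)
```

Here 9 of the 12 cells are populated, so the density is 0.75 and the sparsity is 0.25.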

15
Q

__________ describes the grain or level of detail in the data.

A

Resolution

16
Q

_________ is the process of reducing noise in the data. Common approaches include ________, ________ and ________.

A

Smoothing

Binning, clustering, regression
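
Note: a minimal sketch of one of these approaches, smoothing by bin means. It assumes equal-frequency bins over the sorted values, with each value replaced by the mean of its bin; the name `smooth_by_bin_means` and the toy values are hypothetical.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size, and replace each by its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]  # the last bin may be shorter
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed


# Toy values (made up for illustration).
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
smoothed = smooth_by_bin_means(prices, bin_size=3)
print(smoothed)
```

Each group of three neighbouring values collapses to its mean (9, 22 and 29 here), which removes small fluctuations inside a bin while keeping the overall trend.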