Cluster Analysis Flashcards

1
Q

Multivariate Statistics

A

Cluster analysis is a multivariate statistical technique
Two uses of multivariate statistics: dimensionality reduction and uncovering structure (unsupervised learning)

2
Q

EFA Summary

A

Finds structure in a set of variables by processing correlations between them
Extracts factors to best represent the variables’ interrelations
Measurements provide a dimensional model

3
Q

Cluster Analysis

A

Finds structure in data
Looks for categories
Uses information about similarity between objects rather than correlations between variables
Classifies objects (cases) into categories based on their similarities

4
Q

Grouping Data into Clusters

A

Given a set of points and a notion of distance between points, creates clusters
Members of the same cluster are similar to each other; members of different clusters are dissimilar
Points in high-dimensional space
Similarity defined using a distance measure

5
Q

Euclidean Distance

A

Length of the straight line between two points: the square root of the sum of squared differences on each dimension (e.g. x and y)
Sensitive to outliers, because squaring lets large differences dominate
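
To make the definition concrete, here is a minimal sketch of the calculation in Python. The function name `euclidean_distance` and the example points are illustrative assumptions, not part of the original card.

```python
import math


def euclidean_distance(p, q):
    """Square root of the summed squared differences across dimensions."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Two dimensions (x and y): the sides 3 and 4 give a distance of 5.
d_small = euclidean_distance((0, 0), (3, 4))

# Same x difference, but an outlying y value: the squared y term
# (40 ** 2 = 1600) swamps the x term (3 ** 2 = 9).
d_outlier = euclidean_distance((0, 0), (3, 40))
```

In the second call almost all of the distance comes from the single extreme coordinate, which is the outlier sensitivity the card describes.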

6
Q

Hierarchical Clustering - Agglomerative

A

Bottom up
Initially each datapoint is a cluster and clusters are recursively combined
Combines the two nearest clusters into a larger one, then repeats
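
The bottom-up procedure can be sketched as follows. This is only an illustration under stated assumptions: the card does not say how the distance between two clusters is measured, so the sketch assumes the distance between cluster centroids, and it stops at a target count `k` chosen upfront. The names `agglomerate` and `centroid` are hypothetical.

```python
import math
from itertools import combinations


def centroid(cluster):
    """Mean position of the points in a cluster."""
    dims = len(cluster[0])
    return tuple(sum(p[d] for p in cluster) / len(cluster) for d in range(dims))


def agglomerate(points, k):
    """Merge the two nearest clusters repeatedly until k clusters remain."""
    clusters = [[tuple(p)] for p in points]  # every point starts as its own cluster
    merges = []  # record of which clusters were combined, in order
    while len(clusters) > k:
        # Assumption: "nearest" means smallest distance between centroids.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: math.dist(
                centroid(clusters[ij[0]]), centroid(clusters[ij[1]])
            ),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters, merges


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
clusters, merges = agglomerate(points, k=2)
```

The `merges` list records the order in which clusters were combined, which is the information a dendrogram displays.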

7
Q

Hierarchical Clustering - Divisive

A

Top down
Initially all datapoints form a single cluster, which is recursively split into smaller ones
Assumes the points belong to one big cluster, then breaks it down into smaller clusters
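
A top-down split can be sketched in the same spirit. The card does not specify how a cluster is split, so this sketch makes two assumptions of its own: it always splits the cluster with the largest diameter, and it splits by taking the two farthest-apart points as seeds and assigning every member to the nearer seed. The names `divide`, `diameter` and `farthest_pair` are hypothetical.

```python
import math
from itertools import combinations


def farthest_pair(cluster):
    """The two points in the cluster that are farthest apart."""
    return max(combinations(cluster, 2), key=lambda pq: math.dist(*pq))


def diameter(cluster):
    """Largest distance between any two points (0 for a single point)."""
    if len(cluster) < 2:
        return 0.0
    return math.dist(*farthest_pair(cluster))


def divide(points, k):
    """Start with one cluster and keep splitting until k clusters exist."""
    clusters = [[tuple(p) for p in points]]  # everything starts in one cluster
    while len(clusters) < k:
        widest = max(clusters, key=diameter)
        if diameter(widest) == 0.0:
            break  # no cluster can be split any further
        a, b = farthest_pair(widest)
        # Assumption: each member joins whichever seed (a or b) is nearer.
        left = [p for p in widest if math.dist(p, a) <= math.dist(p, b)]
        right = [p for p in widest if math.dist(p, a) > math.dist(p, b)]
        clusters.remove(widest)
        clusters.extend([left, right])
    return clusters


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
clusters = divide(points, k=2)
```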

8
Q

Point Assignments

A

K-means
Maintain a set of clusters (k)
Points belong to ‘nearest’ cluster

9
Q

Dendrogram

A

Tree-like diagram showing how points relate: which clusters were merged (or split), and in what order

10
Q

Euclidean Space

A

Stopping combining clusters
- pick a number (k) upfront and stop when clusters = k
- stop when the next merge would create a cluster with low cohesion
Cohesion: measured by cluster diameter or by radius from the centroid
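
The two cohesion measures can be sketched as below. The definitions are assumptions: diameter is taken as the largest distance between any two points in the cluster, and radius as the largest distance from the centroid to any point. The threshold `max_diameter` and the helper `should_merge` are hypothetical, shown only to illustrate the second stopping rule.

```python
import math
from itertools import combinations


def centroid(cluster):
    """Mean position of the points in a cluster."""
    dims = len(cluster[0])
    return tuple(sum(p[d] for p in cluster) / len(cluster) for d in range(dims))


def radius(cluster):
    """Assumed definition: largest distance from the centroid to a member."""
    c = centroid(cluster)
    return max(math.dist(c, p) for p in cluster)


def diameter(cluster):
    """Assumed definition: largest distance between any two members."""
    return max((math.dist(p, q) for p, q in combinations(cluster, 2)), default=0.0)


def should_merge(c1, c2, max_diameter):
    """Allow a merge only if the combined cluster stays cohesive enough."""
    return diameter(c1 + c2) <= max_diameter


tight = [(0, 0), (2, 0)]
near = [(3, 0)]
far = [(10, 0)]

ok_near = should_merge(tight, near, max_diameter=5)
ok_far = should_merge(tight, far, max_diameter=5)
```

Merging `tight` with `near` keeps the diameter at 3, so the merge is allowed; merging with `far` would stretch it to 10, so merging would stop there.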

11
Q

K-Means Clustering

A

Point-assignment method
Preferable for very large datasets
Assumes Euclidean space
Keeps the computation manageable as the dataset grows

12
Q

Method of k-means Clustering

A

Chooses k initial centroids (e.g. by picking k of the points)
Creates clusters by assigning each point to the cluster whose centroid it is closest to
Each centroid is then recomputed as the average (mean) of the points assigned to it
Points are reassigned to the nearest updated centroid
Repeats until assignments stop changing or the maximum number of iterations is reached
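
The loop above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the initial centroids are passed in by hand rather than chosen automatically, an empty cluster simply keeps its old centroid, and the names `k_means` and `mean_point` are hypothetical.

```python
import math


def mean_point(points):
    """Average position of a list of points."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))


def k_means(points, initial_centroids, max_iterations=100):
    """Alternate between assigning points and recomputing centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # nothing moved, so the clusters are stable
        assignments = new_assignments
        # Update step: each centroid becomes the mean of its members.
        for i in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[i] = mean_point(members)
    return centroids, assignments


points = [(0, 0), (0, 2), (10, 10), (10, 12)]
centroids, assignments = k_means(points, initial_centroids=[(0, 0), (10, 10)])
```

With these points the centroids settle at the midpoints of the two pairs after one update, and the second pass finds no change and stops.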

13
Q

Limitations of Clustering

A

Number of clusters must be decided
Objects to cluster - representative and random
Only include variables with good reason
Require interpretation
Validation of classification
Define number of clusters after research
Choose objects carefully
Meaningless data produce meaningless clusters
Validate by testing whether the clusters differ significantly on other measures
