Week 5: Clustering 2 Flashcards

1
Q

What are the 2 ways we use to pick the k-value?

A

1) Elbow method
2) SSE (sum of squared errors within clusters)
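
The two ideas are usually combined: compute the SSE for several k-values and look for the "elbow" where the curve stops dropping sharply. A minimal sketch, assuming a plain k-means (Lloyd's algorithm) written from scratch; the function names, the toy data, and the fixed seed are illustrative assumptions, not part of the course material:

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans_sse(points, k, iters=20, seed=0):
    """Run a basic k-means and return the final sum of squared errors."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen data points
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [mean_point(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

# Toy data (assumed): two tight groups far apart.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]

sse_by_k = {k: kmeans_sse(points, k) for k in range(1, 5)}
for k, sse in sse_by_k.items():
    print(k, round(sse, 2))
```

On data like this the SSE should fall steeply from k = 1 to k = 2 and then flatten, which is the bend the elbow method looks for. In practice the curve is usually plotted rather than printed.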

2
Q

What are the 4 steps in data preparation for k-means clustering?

A

1) Select the variables of the dataset you want to use
2) Transform variables if needed
3) Standardize variables if needed
4) Weight important variables

3
Q

When and why do we perform variable transformations?

A

When the data are skewed. K-means groups points by their distance from cluster means, so a skewed variable with extreme values can distort the clusters and give poor results.

4
Q

What are the 2 types of data transformations we can use, and what’s the difference?

A

1) Log transformation
2) Square root transformation

Both aim to compress higher values so that the lower values are more spread out, but log is more aggressive than square root.
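
A small sketch of the difference, using made-up values that span several orders of magnitude. The base-10 log is an assumption chosen for readability; note that a log needs strictly positive inputs and a square root needs non-negative ones:

```python
import math

# Made-up, right-skewed values (assumed for illustration).
values = [1, 10, 100, 1000, 10000]

log_values = [math.log10(v) for v in values]   # requires v > 0
sqrt_values = [math.sqrt(v) for v in values]   # requires v >= 0

def spread(xs):
    """Range of a list: largest value minus smallest value."""
    return max(xs) - min(xs)

print("raw spread: ", spread(values))
print("sqrt spread:", spread(sqrt_values))
print("log spread: ", spread(log_values))
```

The log-transformed values end up in a much narrower range than the square-root values, which is what "more aggressive" means here.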

5
Q

Why do we standardize variables?

A

To prevent a scenario where variables with larger scales dominate variables with smaller scales
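
To see the effect, consider two hypothetical customers described by income (in dollars) and age (in years). The numbers below are invented for illustration. Because k-means works with Euclidean distance, the income difference swamps the age difference even though a 35-year age gap is large:

```python
import math

# Hypothetical customers: (income in dollars, age in years).
a = (30000, 25)
b = (30500, 60)

income_gap_sq = (a[0] - b[0]) ** 2   # contribution of income to squared distance
age_gap_sq = (a[1] - b[1]) ** 2      # contribution of age to squared distance

distance = math.dist(a, b)           # Euclidean distance (Python 3.8+)
income_share = income_gap_sq / (income_gap_sq + age_gap_sq)

print("distance:", round(distance, 1))
print("share of squared distance due to income:", round(income_share, 3))
```

Almost all of the distance comes from income, so without standardization the clusters would be driven by income alone.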

6
Q

What is the most common form of standardization and its formula?

A

Z-score: Z = (X - Mean) / Standard Deviation
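
A minimal sketch of the formula using Python's `statistics` module. The sample values are made up, and using the population standard deviation (`pstdev`) rather than the sample version (`stdev`) is an assumption, since the card does not say which one is meant:

```python
import statistics

# Made-up values (assumed for illustration).
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)
sd = statistics.pstdev(values)  # population standard deviation (assumption)

# Apply Z = (X - Mean) / SD to every value.
z_scores = [(x - mean) / sd for x in values]

print("mean:", mean, "sd:", sd)
print("z-scores:", z_scores)
```

After standardizing, the variable has mean 0 and standard deviation 1, so it sits on the same scale as any other standardized variable.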
