Week 5: Clustering 2 Flashcards

Question 1

Q

What are the 2 ways we use to pick the k-value?

Answer

A

1) Elbow method
2) SSE (sum of squared errors: the total squared distance from each point to its cluster centroid)
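The elbow method can be sketched in plain Python. This sketch assumes a hand-written k-means (Lloyd's algorithm) with a simplified deterministic initialisation rather than k-means++, and a made-up toy dataset `DATA`. It computes the SSE for several values of k. The "elbow" is the k after which SSE stops dropping sharply. In scikit-learn, the same quantity is exposed as `KMeans(...).inertia_`.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans_sse(points: List[Point], k: int, max_iter: int = 100) -> float:
    """Run a basic k-means (Lloyd's algorithm) and return the final SSE."""
    data = sorted(points)
    # Simplified deterministic start (an assumption, not k-means++):
    # pick k points spread evenly through the sorted data.
    centroids = [data[i * len(data) // k] for i in range(k)]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in data:
            nearest = min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster mean (keep it if the cluster is empty).
        new_centroids = [mean_point(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # SSE: sum of squared distances from each point to its nearest centroid.
    return sum(min(squared_distance(p, c) for c in centroids) for p in data)


# Hypothetical toy data with three obvious groups.
DATA: List[Point] = [
    (1.0, 1.0), (1.5, 1.0), (1.0, 1.5),
    (5.0, 5.0), (5.5, 5.0), (5.0, 5.5),
    (9.0, 1.0), (9.5, 1.0), (9.0, 1.5),
]

for k in range(1, 5):
    print(f"k={k}  SSE={kmeans_sse(DATA, k):.3f}")
```

On this toy data, SSE falls steeply up to k = 3 and only slightly afterwards, so k = 3 would be read off as the elbow.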

Question 2

Q

What are the 4 steps in data preparation for k-means clustering?

Answer

A

1) Select the variables of the dataset you want to use
2) Transform variables if needed
3) Standardize variables if needed
4) Weight important variables (give them more influence on the distance calculation)
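The four steps can be chained as a small pipeline over a column-oriented table. This is a minimal sketch that uses only the standard library. The function name `prepare_for_kmeans`, the column names, the choice of `log1p` as the transformation, and the weight of 2 on income are all hypothetical choices made for illustration.

```python
import math
import statistics
from typing import Dict, List, Sequence

Table = Dict[str, List[float]]


def _zscores(values: List[float]) -> List[float]:
    """Standardize a column to mean 0 and sample standard deviation 1."""
    mu, sd = statistics.mean(values), statistics.stdev(values)
    return [(v - mu) / sd for v in values]


def prepare_for_kmeans(
    table: Table,
    variables: Sequence[str],
    log_vars: Sequence[str],
    weights: Dict[str, float],
) -> Table:
    """Apply the four preparation steps, in order, to a column-oriented table."""
    # 1) Select only the variables to cluster on.
    data = {name: list(table[name]) for name in variables}
    # 2) Transform skewed variables; log1p(x) = log(1 + x) also tolerates zeros.
    data = {
        name: [math.log1p(v) for v in vals] if name in log_vars else vals
        for name, vals in data.items()
    }
    # 3) Standardize every selected variable to z-scores.
    data = {name: _zscores(vals) for name, vals in data.items()}
    # 4) Weight important variables (unlisted variables keep weight 1).
    return {name: [weights.get(name, 1.0) * v for v in vals] for name, vals in data.items()}


# Hypothetical customer table; "customer_id" is an identifier, not a clustering variable.
customers: Table = {
    "customer_id": [1, 2, 3, 4, 5],
    "income": [20_000, 25_000, 30_000, 40_000, 250_000],
    "age": [23, 35, 41, 52, 60],
}
prepared = prepare_for_kmeans(customers, ["income", "age"], log_vars=["income"], weights={"income": 2.0})
print(prepared)
```

Weighting is applied after standardization here. This ordering lets a weight of 2 mean "twice the spread of an unweighted variable".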

Question 3

Q

When and why do we perform variable transformations?

Answer

A

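When the data are skewed. We do this because k-means relies on means and squared Euclidean distances, so a long tail of extreme values pulls the centroids towards the outliers and the clustering gives poor results.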
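Deciding whether a variable is skewed enough to transform can be done with a skewness statistic. This sketch uses the moment-based coefficient g1 on hypothetical income data. The cut-off of |skew| > 1 is a common rule of thumb assumed here, not a fixed standard from the course.

```python
import math
from typing import List


def skewness(values: List[float]) -> float:
    """Moment-based sample skewness g1 = m3 / m2**1.5 (0 for symmetric data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed variable (e.g. income in thousands).
income = [20, 22, 25, 27, 30, 35, 40, 60, 120, 400]
SKEW_THRESHOLD = 1.0  # assumed rule of thumb

raw_skew = skewness(income)
log_skew = skewness([math.log(v) for v in income])
print(f"raw skew={raw_skew:.2f}, after log={log_skew:.2f}")
if abs(raw_skew) > SKEW_THRESHOLD:
    print("Strongly skewed: consider transforming before k-means.")
```

A positive skewness indicates a long right tail. On this data the log transformation reduces the skew, although it does not remove it entirely.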

Question 4

Q

What are the 2 types of data transformations we can use, and what’s the difference?

Answer

A

1) Log transformation
2) Square root transformation

Both compress large values more than small ones, which reduces right skew and spreads out the lower values relative to the higher ones. The log transformation is more aggressive than the square root transformation.
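The difference in aggressiveness can be seen by comparing how each transformation shrinks the gap between large values relative to the gap between small values. This sketch uses hypothetical powers of ten and a helper `gap_ratio` written only for illustration. It uses the natural log, which (like the square root here) requires positive inputs.

```python
import math
from typing import Callable, List


def gap_ratio(values: List[float], transform: Callable[[float], float]) -> float:
    """Ratio of the top gap to the bottom gap after transforming sorted values."""
    t = [transform(v) for v in sorted(values)]
    return (t[-1] - t[-2]) / (t[1] - t[0])


VALUES = [1.0, 10.0, 100.0, 1000.0]
TRANSFORMS = {"raw": lambda v: v, "sqrt": math.sqrt, "log": math.log}

for name, f in TRANSFORMS.items():
    print(name, [round(f(v), 2) for v in VALUES], "top/bottom gap ratio:", round(gap_ratio(VALUES, f), 2))
```

The top gap is 100 times the bottom gap in the raw data, 10 times after the square root, and equal to it after the log. The log transformation therefore compresses the high end much more strongly.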

Question 5

Q

Why do we standardize variables?

Answer

A

To prevent variables measured on larger scales from dominating the distance calculation over variables measured on smaller scales
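Scale dominance can be measured by looking at how much each variable contributes to the squared Euclidean distance between two points. This sketch uses two hypothetical customers described by (income, age). It divides each variable by an assumed standard deviation; the values of 10,000 and 15 are illustrative only.

```python
from typing import List, Sequence


def contribution_shares(a: Sequence[float], b: Sequence[float], scales: Sequence[float]) -> List[float]:
    """Fraction of the squared Euclidean distance between a and b contributed by each variable."""
    terms = [((x - y) / s) ** 2 for x, y, s in zip(a, b, scales)]
    total = sum(terms)
    return [t / total for t in terms]


# Hypothetical customers described by (income, age).
customer_a = (30_000.0, 25.0)
customer_b = (31_000.0, 60.0)

raw_shares = contribution_shares(customer_a, customer_b, scales=(1.0, 1.0))
# Assumed standard deviations for each variable (illustrative only).
scaled_shares = contribution_shares(customer_a, customer_b, scales=(10_000.0, 15.0))
print("unscaled income/age shares:", [round(s, 3) for s in raw_shares])
print("scaled income/age shares:  ", [round(s, 3) for s in scaled_shares])
```

Without scaling, the small income difference accounts for almost all of the distance, even though the ages differ by 35 years. After scaling, the age difference dominates instead.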

Question 6

Q

What is the most common form of standardization and its formula?

Answer

A

Z-score: z = (X - mean) / standard deviation
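The z-score formula translates directly into a short function. This sketch uses Python's `statistics` module. The name `z_scores` and the `sample` flag are hypothetical, and the flag switches between the sample standard deviation (divide by n - 1) and the population standard deviation (divide by n).

```python
import statistics
from typing import List


def z_scores(values: List[float], sample: bool = True) -> List[float]:
    """Standardize values: z = (x - mean) / standard deviation."""
    mean = statistics.mean(values)
    # Sample SD (n - 1) by default; population SD (n) if sample=False.
    sd = statistics.stdev(values) if sample else statistics.pstdev(values)
    return [(x - mean) / sd for x in values]


scores = z_scores([2, 4, 4, 4, 5, 5, 7, 9], sample=False)  # mean 5, population SD 2
print(scores)  # → [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

After standardization each variable has mean 0 and standard deviation 1, so no variable dominates purely because of its units.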
