Data Mining Flashcards
1
Q
What 4 major issues do you need to address when clustering?
A
1) Similarity / dissimilarity measure
2) Standardization/normalization of numeric variables
3) Re-coding categorical variables
4) Number of clusters
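A minimal sketch of points 1 to 3, using only the Python standard library. The toy columns (`incomes`, `colors`) and the helper names are hypothetical, and z-scores, one-hot encoding, and Euclidean distance are just one common choice for each step, not the only one.

```python
from math import sqrt
from statistics import mean, pstdev


def standardize(values):
    """Rescale a numeric column to zero mean and unit (population) std dev."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot(values):
    """Re-code a categorical column as 0/1 indicator vectors."""
    categories = sorted(set(values))
    return [[1.0 if v == c else 0.0 for c in categories] for v in values]


def euclidean(a, b):
    """Dissimilarity measure: straight-line distance between two rows."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical toy data: one numeric and one categorical variable.
incomes = [30000.0, 50000.0, 70000.0]
colors = ["red", "blue", "red"]

z = standardize(incomes)
encoded = one_hot(colors)

# Combine the standardized and re-coded columns into one row per record.
rows = [[zi] + enc for zi, enc in zip(z, encoded)]

d01 = euclidean(rows[0], rows[1])  # records differ in income and color
d02 = euclidean(rows[0], rows[2])  # records differ in income only
```

Without standardization, the raw income differences (tens of thousands) would swamp the 0/1 color indicators in the distance.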
2
Q
What are some ways you might improve your model?
A
1) Divide and conquer: multiple models for different areas of data
2) Derive new variables; delete or merge existing ones
3) Impute missing values
4) Normalize/standardize numeric variables; remove outliers
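Points 3 and 4 can be sketched as follows, again with the standard library only. The `ages` column is hypothetical, median imputation is one of several possible fill strategies, and the z-score cutoff of 2.0 is an arbitrary assumption chosen for this toy example.

```python
from statistics import mean, median, pstdev


def impute_median(values):
    """Replace missing entries (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def remove_outliers(values, z_max=2.0):
    """Drop values whose absolute z-score exceeds z_max (assumed cutoff)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return list(values)
    return [v for v in values if abs(v - mu) / sigma <= z_max]


# Hypothetical toy column with one missing value and one extreme value.
ages = [25.0, None, 30.0, 35.0, 28.0, 32.0, 30.0, 300.0]

filled = impute_median(ages)
cleaned = remove_outliers(filled)
```

The median is used for the fill because, unlike the mean, it is not pulled toward the extreme value that the second step removes.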