Jupyter Notebook 1.4 Multiclass classification Flashcards
1
Q
Why is it so important to normalize the data when using the SGDClassifier?
A
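Stochastic Gradient Descent (SGD) is sensitive to the scale of the features. When features have different ranges, the shape of the cost function (essentially a measure of how far the model's predictions are from the targets) becomes skewed: steep along large-scale features and shallow along small-scale ones, so a single learning rate overshoots in some directions and crawls in others.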
Normalizing the data puts all features on the same scale, resulting in a more efficient and stable optimization process.
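A minimal sketch of that normalization step, assuming scikit-learn's StandardScaler as the scaler and a small synthetic dataset (the feature ranges and labels are made up for illustration and are not from the notebook):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two hypothetical features on very different scales (roughly 0-1 and 0-10000).
X = np.column_stack([rng.random(200), rng.random(200) * 10_000])
y = (X[:, 0] + X[:, 1] / 10_000 > 1.0).astype(int)

# StandardScaler rescales each feature to zero mean and unit variance
# before the data reaches SGDClassifier.
model = make_pipeline(StandardScaler(), SGDClassifier(random_state=0))
model.fit(X, y)

# Inspect what the classifier actually sees after scaling.
X_scaled = model.named_steps["standardscaler"].transform(X)
print("means:", X_scaled.mean(axis=0).round(3))
print("stds: ", X_scaled.std(axis=0).round(3))
```

Putting the scaler inside the pipeline means the same scaling learned from the training data is reused whenever the model predicts on new data.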
2
Q
How does the sensitivity to feature scaling differ between SGDClassifier and RandomForestClassifier?
A
- SGDClassifier: Sensitive to the scale of features; scaling is important for performance.
- RandomForestClassifier: Not sensitive to the scale of features; normalization is generally unnecessary.
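One way to see the difference is to fit each classifier on raw and on standardized copies of the same data and count how many predictions change. This is only a sketch: the synthetic data and the helper name fraction_changed are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical data: the second feature is 1000 times larger than the first.
X = np.column_stack([
    rng.integers(0, 100, size=300),
    rng.integers(0, 100, size=300) * 1000,
]).astype(float)
y = (X[:, 0] + X[:, 1] / 1000 > 100).astype(int)
X_scaled = StandardScaler().fit_transform(X)


def fraction_changed(make_model):
    """Fit the same model on raw and scaled data; return the share of differing predictions."""
    raw_pred = make_model().fit(X, y).predict(X)
    scaled_pred = make_model().fit(X_scaled, y).predict(X_scaled)
    return float(np.mean(raw_pred != scaled_pred))


sgd_changed = fraction_changed(lambda: SGDClassifier(random_state=0))
rf_changed = fraction_changed(
    lambda: RandomForestClassifier(n_estimators=20, random_state=0)
)
print("SGDClassifier predictions changed by scaling:", sgd_changed)
print("RandomForestClassifier predictions changed by scaling:", rf_changed)
```

With a fixed random_state, the forest's predictions would be expected to stay the same, while the SGD result depends on the data and may differ between the raw and scaled runs.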
3
Q
Why is RandomForestClassifier insensitive to feature scaling?
A
- Random forests make decisions based on the ‘purity’ of labels at each split in the decision trees.
- Each split is a binary threshold test on a single feature (e.g., “Is Feature A greater than some value?”). Only the ordering of the values matters, so rescaling a feature moves the threshold but leaves the resulting split unchanged.
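The threshold behaviour can be checked on a single decision tree, the building block of a random forest. The sketch below assumes synthetic integer features and an arbitrary scale factor of 1000; neither comes from the notebook.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Hypothetical data: the label depends only on whether feature 0 exceeds 50.
X = rng.integers(0, 100, size=(200, 2)).astype(float)
y = (X[:, 0] > 50).astype(int)

scale = 1000.0  # arbitrary rescaling factor, chosen for illustration
tree_raw = DecisionTreeClassifier(random_state=0).fit(X, y)
tree_big = DecisionTreeClassifier(random_state=0).fit(X * scale, y)

# The root split uses the same feature; only the threshold value moves.
root_raw = tree_raw.tree_.threshold[0]
root_big = tree_big.tree_.threshold[0]
same_predictions = np.array_equal(
    tree_raw.predict(X), tree_big.predict(X * scale)
)
print("root threshold (raw):   ", root_raw)
print("root threshold (scaled):", root_big)
print("identical predictions:  ", same_predictions)
```

The scaled tree's threshold is the raw threshold multiplied by the same factor, so the samples fall on the same sides of the split.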