Jupyter Notebook 1.4 Multiclass classification Flashcards

1
Q

Why is it so important to normalize the data when using the SGDClassifier?

A

Stochastic Gradient Descent (SGD) is sensitive to the scale of the features. When features have very different ranges, the cost function (essentially a measure of how badly the model is doing) becomes skewed: its contours stretch into long, narrow valleys, so gradient steps overshoot along the large-scale features and crawl along the small-scale ones.
Normalizing the data puts all features on a comparable scale, which makes the optimization faster and more stable.
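To make the skewed cost function concrete, here is a minimal plain-Python sketch. It swaps in full-batch gradient descent on a least-squares loss (not SGD with a classification loss, and not scikit-learn's implementation). The toy data, the helpers `loss_and_grad` and `gradient_descent_steps`, and the max-abs scaling step are all illustrative assumptions. The safe step size is limited by the steepest direction of the cost, so when one feature is about a thousand times larger than the other, progress along the small feature nearly stops.

```python
def loss_and_grad(X, y, w):
    """Least-squares loss (1/2n) * sum(r_i^2) and its gradient, r_i = x_i . w - y_i."""
    n = len(X)
    loss = 0.0
    grad = [0.0] * len(w)
    for xi, yi in zip(X, y):
        r = sum(wj * xj for wj, xj in zip(w, xi)) - yi  # residual for one sample
        loss += r * r
        for j, xj in enumerate(xi):
            grad[j] += r * xj
    return loss / (2 * n), [g / n for g in grad]


def gradient_descent_steps(X, y, tol=1e-8, max_iter=10_000):
    """Full-batch gradient descent from w = 0; returns (steps taken, final loss).

    The step size is 1 / trace(H) with H = X^T X / n. Because trace(H) is at least
    the largest eigenvalue of H, this step size never diverges on this quadratic loss.
    """
    n = len(X)
    lr = n / sum(xj * xj for xi in X for xj in xi)  # equals 1 / trace(H)
    w = [0.0] * len(X[0])
    for step in range(max_iter):
        loss, grad = loss_and_grad(X, y, w)
        if loss < tol:
            return step, loss
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return max_iter, loss_and_grad(X, y, w)[0]


# Toy data on a 5 x 4 grid: feature a lies in [0, 1], feature b in [0, 1000].
X_raw = [[(i % 5) / 4, 1000 * (i // 5) / 3] for i in range(20)]
y = [3 * a + 0.002 * b for a, b in X_raw]  # exactly linear, so the minimum loss is 0

# Max-abs scaling: divide each column by its largest absolute value.
col_max = [max(abs(row[j]) for row in X_raw) for j in range(2)]
X_scaled = [[v / m for v, m in zip(row, col_max)] for row in X_raw]

print(gradient_descent_steps(X_raw, y))     # uses all 10,000 steps, loss still large
print(gradient_descent_steps(X_scaled, y))  # reaches tol within a few dozen steps
```

With the raw features the run exhausts its iteration budget while the loss is still far from zero; after scaling, the same procedure converges quickly. Stochastic updates face the same conditioning problem, just with noisier steps.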

2
Q

How does the sensitivity to feature scaling differ between SGDClassifier and RandomForestClassifier?

A
  • SGDClassifier: Sensitive to the scale of features; scaling them (e.g., with StandardScaler) usually speeds up training and can noticeably improve accuracy.
  • RandomForestClassifier: Not sensitive to the scale of features; normalization is generally unnecessary.
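One way to see both behaviours is to cross-validate the two classifiers on data whose features have very different ranges. This sketch assumes scikit-learn and uses its bundled wine dataset as a stand-in (the notebook's own data is not shown here); the model settings are illustrative defaults, and exact scores will vary with library version.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Wine features span very different ranges (e.g. proline is in the
# hundreds to thousands, hue is around 1), so scaling should matter for SGD.
X, y = load_wine(return_X_y=True)

models = {
    "SGD, raw features": SGDClassifier(random_state=42),
    # The scaler sits inside the pipeline so it is refit on each training fold.
    "SGD, standardized": make_pipeline(StandardScaler(), SGDClassifier(random_state=42)),
    # No scaler: tree splits do not depend on feature scale.
    "Random forest, raw features": RandomForestClassifier(random_state=42),
}

scores = {name: cross_val_score(model, X, y, cv=5).mean() for name, model in models.items()}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Keeping `StandardScaler` inside the pipeline means its mean and variance are estimated only on each training fold, so no statistics leak from the validation fold.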
3
Q

Why is RandomForestClassifier insensitive to feature scaling?

A
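  • Each decision tree in a random forest chooses its splits by how much they improve the ‘purity’ of the labels (e.g., Gini impurity or entropy) in the resulting child nodes.
  • Each split is a binary threshold test (e.g., “Is Feature A greater than some value?”), so it depends only on the ordering of the feature values, not on their magnitude or scale. Rescaling a feature moves the threshold but sends exactly the same samples to each side.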
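The ordering argument can be checked with a toy split search. This is a minimal plain-Python sketch, not scikit-learn's tree code: `gini` and `best_split` are hypothetical helpers that score midpoint thresholds on a single feature by weighted Gini impurity, and the six-sample feature is made-up data.

```python
def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (threshold, sorted indices sent left) for the lowest weighted Gini."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    best = None
    for k in range(1, len(order)):
        lo, hi = values[order[k - 1]], values[order[k]]
        if lo == hi:
            continue  # equal values cannot be separated by a threshold
        left, right = order[:k], order[k:]
        score = (len(left) * gini([labels[i] for i in left])
                 + len(right) * gini([labels[i] for i in right])) / len(values)
        if best is None or score < best[0]:
            best = (score, (lo + hi) / 2, sorted(left))  # midpoint threshold
    return best[1], best[2]


feature = [2.0, 7.5, 3.1, 9.0, 1.2, 8.4]
labels = ["a", "b", "a", "b", "a", "b"]

t_raw, left_raw = best_split(feature, labels)
t_big, left_big = best_split([v * 1000 for v in feature], labels)
# Standardization-style shift and rescale; 5.0 and 3.0 are arbitrary.
t_std, left_std = best_split([(v - 5.0) / 3.0 for v in feature], labels)

print(left_raw, left_big, left_std)  # [0, 2, 4] [0, 2, 4] [0, 2, 4]
print(t_raw, t_big)                  # only the threshold value changes
```

The chosen threshold scales along with the feature, but the partition of samples is identical, which is why a forest built from such splits is unaffected by scaling.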