Jupyter Notebook 1.4 Multiclass classification Flashcards

Question 1

Q

Why is it so important to normalize the data when using the SGDClassifier?

Answer

A

Stochastic Gradient Descent (SGD) is sensitive to the scale of the features. When features have different ranges, the shape of the cost function (essentially a measure of how well the model is doing) becomes skewed: it is much steeper along some weights than others, so a single learning rate cannot suit all of them.
Normalizing the data puts all features on the same scale, resulting in a more efficient and stable optimization process.
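
A rough way to see the skew described in this answer is sketched below. It assumes a squared-error loss on a linear model as a simpler stand-in (SGDClassifier itself uses other losses by default), where the curvature of the cost along each weight is proportional to the mean of the squared feature values. The two features and the helper names curvature and standardize are made up for illustration.

```python
import random
import statistics

random.seed(0)

# Two hypothetical features on very different scales.
small = [random.uniform(0, 1) for _ in range(200)]
large = [random.uniform(0, 1000) for _ in range(200)]


def curvature(feature):
    # For squared-error loss on a linear model, the curvature of the cost
    # along one weight is proportional to the mean of the squared values.
    return sum(v * v for v in feature) / len(feature)


def standardize(feature):
    # Shift to zero mean and rescale to unit (population) standard deviation.
    mean = statistics.fmean(feature)
    std = statistics.pstdev(feature)
    return [(v - mean) / std for v in feature]


# How much steeper the cost is along one weight than along the other.
ratio_raw = curvature(large) / curvature(small)
ratio_scaled = curvature(standardize(large)) / curvature(standardize(small))

print(f"curvature ratio before scaling: {ratio_raw:.0f}")
print(f"curvature ratio after scaling:  {ratio_scaled:.3f}")
```

Before scaling, the cost is far steeper along one weight than the other; after standardizing, the two curvatures match, which is the "same scale" the answer refers to.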

Question 2

Q

How does the sensitivity to feature scaling differ between SGDClassifier and RandomForestClassifier?

Answer

A

SGDClassifier: Sensitive to the scale of features; scaling is important for performance.
RandomForestClassifier: Not sensitive to the scale of features; normalization is generally unnecessary.
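
In practice, this difference usually shows up in how each model is set up. The sketch below assumes scikit-learn is installed and uses its bundled digits dataset purely as an illustrative multiclass problem (the notebook's own data is not shown here); the variable names are hypothetical.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative multiclass dataset (an assumption, not the notebook's data).
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# SGDClassifier: put a scaler in front so every feature has a similar scale.
sgd_clf = make_pipeline(StandardScaler(), SGDClassifier(random_state=42))
sgd_clf.fit(X_train, y_train)

# RandomForestClassifier: trained on the raw features, no scaling step.
forest_clf = RandomForestClassifier(random_state=42)
forest_clf.fit(X_train, y_train)

sgd_accuracy = sgd_clf.score(X_test, y_test)
forest_accuracy = forest_clf.score(X_test, y_test)
print(f"SGD (scaled) accuracy:   {sgd_accuracy:.3f}")
print(f"Random forest accuracy:  {forest_accuracy:.3f}")
```

Wrapping the scaler and the classifier in one pipeline means the scaling statistics are learned from the training split only and then reused on the test split.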

Question 3

Q

Why is RandomForestClassifier insensitive to feature scaling?

Answer

A

Random forests make decisions based on the ‘purity’ of labels at each split in the decision trees.
Decisions are binary (e.g., “Is Feature A greater than some value?”), so they depend only on how the feature values are ordered and are not influenced by their magnitude or scale.
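
The toy sketch below illustrates this with a single split chosen by Gini impurity, one common purity measure. It is a simplified stand-in written for this card, not scikit-learn's actual implementation, and the data and function names are hypothetical.

```python
def gini(labels):
    # Gini impurity: 0.0 means every label in the group is the same.
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    # Try midpoints between consecutive distinct values and keep the
    # threshold with the lowest weighted impurity for "value <= threshold".
    distinct = sorted(set(values))
    best = None
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, threshold)
    threshold = best[1]
    return threshold, [v <= threshold for v in values]


feature = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["a", "a", "a", "b", "b", "b"]
scaled = [v * 1000 for v in feature]  # same feature in different units

threshold_raw, groups_raw = best_split(feature, labels)
threshold_scaled, groups_scaled = best_split(scaled, labels)

print(threshold_raw, threshold_scaled)
print(groups_raw == groups_scaled)
```

The threshold moves with the units, but the samples end up in the same two groups, so the purity of the split is unchanged.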

(3 cards)