Jupyter Notebook 2.7 Extra Imputing Missing Values, 2.8 Extra Feature Scaling Flashcards

1
Q

Why do we use .fit_transform() on the training set, but only .transform() on the test set?

A

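.fit_transform() is used on the training set because it does two things in one call: it learns the transformation parameters from the data (e.g., the mean and standard deviation for scaling, or the fill value for imputing missing values) and then applies the transformation to that same data.

.transform() is used on the test set to apply the transformation already learned from the training set, without refitting. The test data is therefore processed with exactly the same parameters as the training data, and nothing is learned from the test set. Fitting on the test set would let information from it influence the preprocessing, which is “data leakage” and can make evaluation results look better than they really are.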
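The answer above can be illustrated with a minimal sketch using scikit-learn's SimpleImputer and StandardScaler. The toy arrays and variable names are made up for illustration and are not taken from the notebook.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Toy data (hypothetical): a single feature, with one missing value in each set.
X_train = np.array([[1.0], [2.0], [np.nan], [3.0]])
X_test = np.array([[10.0], [np.nan]])

imputer = SimpleImputer(strategy="mean")
scaler = StandardScaler()

# Training set: learn the statistics and apply them in one call.
X_train_imputed = imputer.fit_transform(X_train)
X_train_scaled = scaler.fit_transform(X_train_imputed)

# Test set: only apply what was learned from the training set.
X_test_imputed = imputer.transform(X_test)
X_test_scaled = scaler.transform(X_test_imputed)

# The fill value and the scaling parameters all come from the training data.
print("imputer fill value:", imputer.statistics_)
print("scaler mean and std:", scaler.mean_, scaler.scale_)
print("scaled test set:", X_test_scaled.ravel())
```

The missing test value is filled with the training mean, not the test mean, and the test set is scaled with the training mean and standard deviation, so the test value 10.0 ends up far from zero instead of being re-centred.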
