Preprocessing Flashcards
What is the first thing you do when creating an ML algorithm?
Preprocessing
What is Preprocessing?
Any manipulation of the dataset before running it through the model
What is an example of preprocessing?
Converting a .csv or .xlsx file to an .npz file
Logarithmic transformations
Standardization
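A minimal sketch of the first example, assuming NumPy is available. The CSV content, the column layout (last column is the target), and the array names inputs and targets are made up for illustration; an in-memory buffer stands in for a real file path.

```python
import io
import numpy as np

# Hypothetical CSV content; in practice this would be a file on disk.
csv_text = "x1,x2,target\n1.0,200.0,0\n2.0,400.0,1\n3.0,600.0,0\n"

# Parse the CSV, skipping the header row.
raw = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

inputs = raw[:, :-1]              # every column except the last
targets = raw[:, -1].astype(int)  # assumed: last column holds the labels

# Save both arrays into one .npz archive (an in-memory buffer here).
buffer = io.BytesIO()
np.savez(buffer, inputs=inputs, targets=targets)

# Load the archive back to confirm the round trip.
buffer.seek(0)
npz = np.load(buffer)
loaded_inputs = npz["inputs"]
loaded_targets = npz["targets"]
```

With a real dataset, a file name such as "data.npz" would replace the buffer in both np.savez and np.load.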
What are the main points of Preprocessing?
- Compatibility - TensorFlow works with tensors, not CSV files
- Orders of Magnitude - standardize inputs so they share a comparable scale
- Generalization -
Relative metrics are especially useful when we have time-series data
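One common relative metric for time series is the period-over-period change. A small sketch, assuming NumPy; the prices array is made-up data.

```python
import numpy as np

# Hypothetical price series (absolute values).
prices = np.array([100.0, 110.0, 99.0, 108.9])

# Relative change between consecutive observations:
# (current / previous) - 1
returns = prices[1:] / prices[:-1] - 1.0
```

The relative values stay comparable across series whose absolute levels differ widely.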
What are some advantages of Logarithms?
Faster computation
Lower order of magnitude
Clearer relationships
Homogeneous Variance
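A short illustration of the "lower order of magnitude" point, assuming NumPy and made-up values that span several powers of ten.

```python
import numpy as np

# Hypothetical values spanning five orders of magnitude.
values = np.array([1.0, 10.0, 100.0, 1000.0, 10000.0])

# Base-10 logarithm compresses the range to evenly spaced small numbers.
log_values = np.log10(values)

spread_before = values.max() - values.min()         # 9999.0
spread_after = log_values.max() - log_values.min()  # 4.0
```

Note that the logarithm is only defined for positive inputs.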
What is the most common problem when working with numerical data?
Differing orders of magnitude
How do we solve the numerical orders of magnitude challenge?
Standardization
Also called feature scaling or normalization
What is standardization or feature scaling?
The process of transforming data into a standard scale
Most common form: subtract the mean, then divide by the standard deviation
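The subtract-the-mean, divide-by-the-standard-deviation rule can be sketched as follows, assuming NumPy. The matrix X is made-up data with one row per sample and one column per feature.

```python
import numpy as np

# Hypothetical inputs: two features on very different scales.
X = np.array([[1.0, 1000.0],
              [2.0, 2000.0],
              [3.0, 3000.0]])

# Per-feature (column-wise) statistics.
mean = X.mean(axis=0)
std = X.std(axis=0)

# Standardize: each column ends up with mean 0 and standard deviation 1.
X_scaled = (X - mean) / std
```

In practice the mean and standard deviation are usually computed on the training data and reused for validation and test data; a constant column (standard deviation 0) would need special handling.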
What is PCA?
Principal component analysis
A dimension reduction technique that combines several variables into a single broader (latent) variable
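One way to sketch PCA is through the singular value decomposition of the centered data, assuming NumPy. The data is synthetic: three observed variables generated from one hidden variable plus a little noise, so a single component should capture most of the variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: three variables driven by one latent variable.
latent = rng.normal(size=200)
X = np.column_stack([
    latent + 0.1 * rng.normal(size=200),
    2.0 * latent + 0.1 * rng.normal(size=200),
    -latent + 0.1 * rng.normal(size=200),
])

# PCA works on centered data.
X_centered = X - X.mean(axis=0)

# Rows of Vt are the principal directions, ordered by variance explained.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
explained_variance_ratio = S**2 / np.sum(S**2)

# Project onto the first component: three variables become one.
n_components = 1
X_reduced = X_centered @ Vt[:n_components].T
```

Library implementations such as scikit-learn's PCA wrap the same idea; the hand-rolled version here is only meant to show the mechanics.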
What is Whitening?
A transformation often performed after PCA
Removes most of the underlying correlation between variables
Useful when the data should be uncorrelated
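A minimal sketch of one whitening variant (PCA whitening), assuming NumPy and synthetic correlated data. Other variants exist, such as ZCA whitening, so this is an illustration rather than the only approach.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: the second variable is strongly correlated with the first.
a = rng.normal(size=500)
X = np.column_stack([a, 0.8 * a + 0.6 * rng.normal(size=500)])

X_centered = X - X.mean(axis=0)

# Eigendecomposition of the covariance matrix (the PCA step).
cov = np.cov(X_centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# Rotate onto the principal axes, then rescale each axis to unit variance.
X_white = (X_centered @ eigvecs) / np.sqrt(eigvals)

# The covariance of the whitened data is the identity matrix.
cov_white = np.cov(X_white, rowvar=False)
```

When some eigenvalues are close to zero, a small constant is often added under the square root to avoid dividing by a tiny number.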
What are the two methods for encoding categorical data?
One-hot encoding
Binary encoding
What is the one big problem with one-hot encoding?
It requires a lot of new variables
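The size difference between the two encodings can be sketched as follows, assuming NumPy. The category names are hypothetical, and the binary codes are simply each category's index written in base 2.

```python
import numpy as np

# Hypothetical categories.
categories = ["bread", "yogurt", "muffin", "cheese", "milk"]
n = len(categories)

# One-hot: one new variable per category; row i encodes category i.
one_hot = np.eye(n, dtype=int)

# Binary: only enough bits to tell the category indices apart.
n_bits = max(1, (n - 1).bit_length())
binary = np.array([
    [(i >> b) & 1 for b in reversed(range(n_bits))]
    for i in range(n)
])
```

Here one-hot encoding needs 5 columns and binary encoding needs 3; the gap widens quickly as the number of categories grows.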