Feature Engineering Flashcards
What is a feature? Give three examples.
A feature is a measurable characteristic or property of the data that is used as an input to a model.
raw data inputs - e.g. runoff, rainfall
transformations of data - e.g. normalised or standardised values
combinations of data - e.g. interactions (such as the product of two inputs)
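What is the advantage of normalisation or standardisation? Aside from the z-score, what is one other type of standardisation?
It puts features with different orders of magnitude, scales and ranges onto a common scale, so that no feature dominates simply because of its units. This typically speeds up the training of gradient-based models and improves the performance of scale-sensitive models (e.g. distance-based methods). Min-max scaling (rescaling each feature to [0, 1]) is another common type; Poisson scaling is also used for count data.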
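A minimal sketch of the two rescalings mentioned above, using only the Python standard library. The function names and the rainfall/runoff values are hypothetical, chosen only to show two features on very different scales.

```python
import statistics

def z_score(values):
    """Standardise: subtract the mean, divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def min_max(values):
    """Normalise: rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical features with very different orders of magnitude
rainfall_mm = [0.0, 2.5, 10.0, 30.0, 7.5]
runoff_m3s = [120.0, 150.0, 400.0, 1800.0, 300.0]

runoff_z = z_score(runoff_m3s)       # mean 0, standard deviation 1
rainfall_01 = min_max(rainfall_mm)   # every value now lies in [0, 1]
```

After either transform, both features are on comparable scales, so neither dominates a distance or a gradient step just because of its units.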
What two choices must be made when creating a rolling window feature?
Window size and aggregation function (e.g. mean, sum, max)
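A small sketch showing how the two choices appear as parameters. `rolling_feature` is a hypothetical helper that uses a trailing window, and the rainfall values are made up for illustration.

```python
import statistics
from typing import Callable, List, Optional, Sequence

def rolling_feature(series: Sequence[float], window: int,
                    agg: Callable[[Sequence[float]], float]) -> List[Optional[float]]:
    """Trailing rolling-window feature: agg() over the last `window` values.

    The first window - 1 positions have no full window, so they are None.
    """
    out: List[Optional[float]] = []
    for t in range(len(series)):
        if t + 1 < window:
            out.append(None)
        else:
            out.append(agg(series[t + 1 - window : t + 1]))
    return out

rainfall = [0.0, 4.0, 2.0, 0.0, 6.0, 1.0]
rolling_mean_3 = rolling_feature(rainfall, 3, statistics.fmean)  # choice 1: size 3, choice 2: mean
rolling_max_3 = rolling_feature(rainfall, 3, max)                # same size, different aggregation
# rolling_max_3 == [None, None, 4.0, 4.0, 6.0, 6.0]
```

In pandas the same idea is usually written as `series.rolling(window=3).mean()`, which returns NaN where no full window exists.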
Describe how feature engineering relates to concepts of signal and noise.
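Feature engineering tries to amplify the signal that is relevant to the target while suppressing noise. For example, rolling window features smooth out short-term fluctuations (noise) that are unrelated to the aspect of interest, which helps reveal the first-order signal (e.g. the underlying trend).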
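A toy demonstration of the signal/noise idea, assuming a synthetic setup: a slow sine wave stands in for the first-order signal and Gaussian noise is added on top. A centred moving average (odd window) is compared with the raw series by measuring the error against the known signal. All names and parameters here are illustrative.

```python
import math
import random
import statistics

def centred_rolling_mean(series, window):
    """Centred moving average for an odd window; edges without a full window are None."""
    half = window // 2
    out = []
    for t in range(len(series)):
        if t - half < 0 or t + half >= len(series):
            out.append(None)
        else:
            out.append(statistics.fmean(series[t - half : t + half + 1]))
    return out

def rmse(estimate, truth):
    """Root-mean-square error, ignoring positions where the estimate is None."""
    pairs = [(e, s) for e, s in zip(estimate, truth) if e is not None]
    return math.sqrt(statistics.fmean((e - s) ** 2 for e, s in pairs))

random.seed(0)
n = 200
signal = [math.sin(2 * math.pi * t / 100) for t in range(n)]  # slow "first-order" signal
noisy = [s + random.gauss(0.0, 0.5) for s in signal]          # signal plus random noise
smoothed = centred_rolling_mean(noisy, window=7)

print(f"raw RMSE: {rmse(noisy, signal):.3f}, smoothed RMSE: {rmse(smoothed, signal):.3f}")
```

Averaging over 7 points shrinks the noise much more than it distorts a signal that changes slowly; a very large window would start to blur the signal itself, which is why window size is a real choice.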
What are some of the reasons to construct new features from data?
To expose relationships/correlations that are not obvious in the raw inputs, to make it easier and more efficient for the ML model to learn from the data, to capture covarying variables and the directions of largest variation (e.g. with PCA), and to reframe how we interpret the data.
Describe the process of one-hot encoding. If there are N categories, how many new features are created?
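Converts a categorical variable into binary (0/1) indicator features, one per category (e.g. residential / not residential). Each row has a 1 in exactly one of these features.
N new features are created (N - 1 if one category is dropped as a reference level to avoid redundancy).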
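A minimal one-hot encoder, sketched with a hypothetical `one_hot` helper and made-up land-use categories, showing that N categories produce N binary columns.

```python
from typing import Dict, List, Sequence

def one_hot(values: Sequence[str]) -> Dict[str, List[int]]:
    """One binary column per distinct category (N categories -> N columns)."""
    categories = sorted(set(values))
    return {f"is_{c}": [1 if v == c else 0 for v in values] for c in categories}

land_use = ["residential", "industrial", "residential", "agricultural"]
encoded = one_hot(land_use)
# encoded == {
#     "is_agricultural": [0, 0, 0, 1],
#     "is_industrial":   [0, 1, 0, 0],
#     "is_residential":  [1, 0, 1, 0],
# }
```

Here N = 3 categories give 3 columns. With pandas, `pd.get_dummies(df["land_use"], drop_first=True)` gives the N - 1 variant by dropping the first category as the reference level.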
What is combined to produce an interaction feature? How are PCA and interaction features related? (axis)
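Two or more existing features are combined (e.g. multiplied together) into a single new feature. PCA also builds new features as combinations of the originals: it rotates the axes onto the directions of greatest variance, and each principal component is a linear combination of the original features. The components are uncorrelated, so keeping only the leading components removes the redundancy caused by correlated features.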
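A sketch contrasting the two ideas for two features. The interaction feature is taken to be the element-wise product (one common choice). The PCA part uses the closed-form rotation angle for a 2x2 covariance matrix, so it only covers the two-feature case; `interaction`, `pca_2d` and the data are hypothetical.

```python
import math
import statistics

def interaction(x, y):
    """Interaction feature: element-wise product of two features."""
    return [a * b for a, b in zip(x, y)]

def pca_2d(x, y):
    """Two-feature PCA by rotating the axes onto the principal directions.

    Returns the component scores (pc1, pc2); pc1 lies along the direction of largest variance.
    """
    mx, my = statistics.fmean(x), statistics.fmean(y)
    xc = [a - mx for a in x]
    yc = [b - my for b in y]
    n = len(x)
    sxx = sum(a * a for a in xc) / (n - 1)
    syy = sum(b * b for b in yc) / (n - 1)
    sxy = sum(a * b for a, b in zip(xc, yc)) / (n - 1)
    # Rotation angle of the principal axis of the 2x2 covariance matrix
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    pc1 = [c * a + s * b for a, b in zip(xc, yc)]   # projection onto the rotated first axis
    pc2 = [-s * a + c * b for a, b in zip(xc, yc)]  # projection onto the perpendicular axis
    return pc1, pc2

# Hypothetical, strongly correlated features
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
xy = interaction(x, y)   # one new feature combining x and y
pc1, pc2 = pca_2d(x, y)  # two new, uncorrelated features
```

Because x and y are nearly collinear, almost all the variance ends up in `pc1`, and `pc2` could be dropped with little loss. In practice `sklearn.decomposition.PCA` handles any number of features.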
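What are some of the disadvantages of feature engineering?
Hard to select the best features. Features can look ‘good’ purely by chance (spurious correlations), especially when many candidates are tried. Curse of dimensionality: each extra feature increases the amount of data needed to learn reliably. Domain expertise is required.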
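A small simulation of the "good by chance" problem, under assumed settings (30 samples, 500 candidate features). Every candidate is pure noise and unrelated to the target by construction, yet the best one can still show a sizeable correlation. `pearson` is a hand-written helper, not a library call.

```python
import random
import statistics

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

random.seed(42)
n_samples, n_features = 30, 500
target = [random.gauss(0, 1) for _ in range(n_samples)]
# Every candidate feature is random noise with no real link to the target
candidates = [[random.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_features)]
best = max(abs(pearson(f, target)) for f in candidates)
print(f"best |r| among {n_features} random features: {best:.2f}")
```

The more features are tried on a small dataset, the higher the best chance correlation tends to be, which is why selected features should be checked on held-out data.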
Features can be created from lagged time series. Why might past data be useful in an ML prediction and how would this feature be created?
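Many phenomena exhibit temporal dependencies, where the current state depends on past states (e.g. today's runoff depends on recent rainfall), so past values can lead to more accurate predictions. A lagged feature is created by shifting the time series by k time steps, so that the value observed at time t - k becomes an input for predicting the target at time t.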
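A minimal sketch of building lagged features by shifting a series. `lag_feature` is a hypothetical helper and the rainfall/runoff values are made up; rows without a full set of lagged inputs are dropped before training.

```python
from typing import List, Optional, Sequence

def lag_feature(series: Sequence[float], k: int) -> List[Optional[float]]:
    """Shift a series by k steps: position t holds the value observed at t - k."""
    if k < 0:
        raise ValueError("lag k must be non-negative")
    n = len(series)
    k = min(k, n)
    return [None] * k + list(series[: n - k])

rainfall = [5.0, 0.0, 12.0, 3.0, 8.0]
runoff = [1.2, 2.8, 1.5, 4.9, 2.6]    # target to predict
rain_lag1 = lag_feature(rainfall, 1)  # [None, 5.0, 0.0, 12.0, 3.0]
rain_lag2 = lag_feature(rainfall, 2)  # [None, None, 5.0, 0.0, 12.0]

# Keep only rows where every lagged input is available
rows = [
    (r1, r2, y)
    for r1, r2, y in zip(rain_lag1, rain_lag2, runoff)
    if r1 is not None and r2 is not None
]
# rows == [(0.0, 5.0, 1.5), (12.0, 0.0, 4.9), (3.0, 12.0, 2.6)]
```

With pandas, `series.shift(k)` produces the same shifted column, with NaN in place of `None`.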