Feature Selection & Dimensionality Reduction Flashcards
Notes from Lecture 4 that may help in my exam
What are the main aims for applying Feature Selection and Dimensionality Reduction techniques?
- Reduce the impact of the Curse of Dimensionality
- Remove redundant features to improve performance
- Increase computational efficiency
- Reduce the cost of acquiring new data
What factors should be considered when using Feature Selection or Dimensionality Reduction methods?
- The target dimension, i.e. the number of features or dimensions you wish to reduce down to
- Interpretability (if interpretability is required, use Feature Selection; if not, either method can be used)
- Feature Correlations/dependency
- Feature reliability and repeatability
- Methods (different methods result in different features being used)
What are the three popular Feature Selection methods?
Wrapper Methods - Search for the optimal feature subset that maximises the decision-making performance.
Embedded Methods - Integrate Feature Selection into the model learning process.
Filter-based Methods - Selection is based on feature relationships and statistics, rather than on model performance.
What are some examples of Wrapper Methods, in regards to Feature Selection?
Recursive Feature Elimination
Sequential Feature Selection
What are some examples of Embedded Methods, in regards to Feature Selection?
Ridge (ElasticNet)
LASSO
Random Forest (feature ranking)
What are some examples of Filter-based Methods, in regards to Feature Selection?
Univariate (ANOVA)
Chi Square
Correlation/Variance
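A minimal sketch of two of these filter-based selectors using scikit-learn (the dataset, variance threshold, and `k` value are illustrative assumptions, not from the lecture):

```python
# Filter-based selection sketch: variance filtering followed by an ANOVA F-test.
from sklearn.datasets import load_iris
from sklearn.feature_selection import VarianceThreshold, SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Drop near-constant features (variance below an arbitrary threshold).
X_var = VarianceThreshold(threshold=0.1).fit_transform(X)

# Keep the k features with the highest ANOVA F-score against the class label.
X_filtered = SelectKBest(score_func=f_classif, k=2).fit_transform(X_var, y)
print(X_filtered.shape)  # (150, 2)
```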
How does the Forward Feature Selection method work?
Starts with an empty set of features and adds features one by one. The goal is to identify a subset of features that maximises the model's performance based on a chosen evaluation metric, such as Accuracy, F1 Score, or Mean Squared Error.
What is the step-by-step breakdown of Forward Feature Selection?
Start with an Empty Feature Set
Then, for each feature:
- If adding it improves the evaluation metric over the current best value, add it to the feature set and update the best value.
- If it doesn't, ignore it.
After iterating through all the features, return the selected subset as the features to be used (a minimal code sketch follows below).
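A minimal sketch of the greedy loop described above, assuming a scikit-learn style workflow; the dataset, model, and cross-validated accuracy metric are illustrative assumptions:

```python
# Forward Feature Selection sketch: greedily add features that improve CV accuracy.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

selected = []        # start with an empty feature set
best_score = 0.0     # best evaluation metric seen so far

for j in range(X.shape[1]):
    candidate = selected + [j]
    score = cross_val_score(model, X[:, candidate], y, cv=5, scoring="accuracy").mean()
    if score > best_score:      # keep the feature only if it improves the metric
        selected = candidate
        best_score = score
    # otherwise the feature is ignored

print(f"Selected features: {selected}, CV accuracy: {best_score:.3f}")
```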
What is the difference between Forward Feature Selection and Recursive Feature Elimination?
Forward Feature Selection starts with an empty set of features, and adds them one by one
Recursive Feature Elimination starts with the full set of features, and removes the least important ones one by one (see the sketch below).
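For comparison, a minimal Recursive Feature Elimination sketch using scikit-learn's RFE; the estimator and the target number of features are illustrative assumptions:

```python
# RFE sketch: start from all features and iteratively drop the least important
# ones until n_features_to_select remain.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

rfe = RFE(estimator=LogisticRegression(max_iter=5000), n_features_to_select=10, step=1)
rfe.fit(X, y)

print("Kept feature indices:", [i for i, keep in enumerate(rfe.support_) if keep])
```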
What is the LASSO method?
LASSO (Least Absolute Shrinkage and Selection Operator) is an embedded method: a regularisation technique used in regression analysis to enhance model performance. It introduces a penalty to the loss function to prevent overfitting and perform feature selection.
How does LASSO work?
Lasso regression works by adding an L1 regularisation term to reduce the number of effective features in the feature space. This penalty encourages sparsity in the coefficient vector, causing some coefficients to shrink to 0, which removes certain features from the model.
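In equation form, the standard Lasso objective (textbook notation, not necessarily the lecture's) is the squared error plus an L1 penalty on the coefficients, weighted by the regularisation parameter λ:

```latex
% Standard Lasso objective: squared error plus an L1 penalty on the weights beta.
% lambda (the regularisation parameter) trades accuracy against sparsity.
\min_{\beta}\; \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - x_i^{\top}\beta\bigr)^2
            \;+\; \lambda \sum_{j=1}^{p}\lvert\beta_j\rvert
```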
What does the regularisation parameter in Lasso control?
The regularisation parameter, often denoted λ (lambda) or gamma, controls the degree of regularisation, balancing model accuracy against feature selection/sparsity. It is an integral part of LASSO, allowing it to identify key features whilst also building a model.
What can happen if you set the regularisation parameter too high/at a higher value in LASSO?
A higher regularisation parameter forces more of the feature weights to become 0, reducing the number of dimensions by removing features that are less relevant/important. If it is set too high, however, almost all coefficients can be zeroed out and the model will underfit.
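A minimal sketch of this effect using scikit-learn's Lasso (`alpha` is scikit-learn's name for the regularisation parameter; the dataset and the alpha values are illustrative assumptions):

```python
# Effect of regularisation strength in Lasso: a larger alpha zeroes out more
# coefficients, i.e. removes more features from the model.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X = StandardScaler().fit_transform(X)     # Lasso is sensitive to feature scale

for alpha in (0.1, 1.0, 10.0):            # illustrative regularisation values
    coef = Lasso(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha:>5}: {np.sum(coef != 0)} non-zero coefficients out of {coef.size}")
```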
What does a Chi-Square test do?
A Chi-Square test assesses the independence of a predictor and the outcome.
It is suitable for categorical features with categorical outcomes.
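A minimal sketch using scikit-learn's chi2 scorer with SelectKBest (the dataset, which provides non-negative count-valued features, and the value of `k` are illustrative assumptions):

```python
# Chi-Square filter sketch: score each (non-negative) feature for independence
# from the categorical outcome; higher score = stronger dependence.
from sklearn.datasets import load_digits
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_digits(return_X_y=True)       # pixel intensities (non-negative), 10 classes

X_selected = SelectKBest(score_func=chi2, k=20).fit_transform(X, y)
print(X_selected.shape)                    # (1797, 20)
```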
What does a T-test do?
A T-test compares two groups (binary class) for a statistically significant difference and is used for continuous features.
It checks whether the means of the two groups differ from one another.
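A minimal sketch using scipy's independent two-sample t-test (the dataset and the choice of feature are illustrative assumptions):

```python
# T-test sketch: compare the mean of one continuous feature across two classes.
from scipy.stats import ttest_ind
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)     # binary outcome (malignant/benign)

feature = X[:, 0]                              # "mean radius", one continuous feature
t_stat, p_value = ttest_ind(feature[y == 0], feature[y == 1])
print(f"t = {t_stat:.2f}, p = {p_value:.3g}")  # small p => group means differ
```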