L12 - Dimensionality Reduction Flashcards
What is meant by dimensionality reduction?
The reduction of feature count within a data set.
What 3 ways does dimensionality reduction improve / enhance the creation and running of ML models?
It saves time, saves money, and removes irrelevant data.
What are the 2 methods of dimensionality reduction? Define each…
- Feature extraction -> Extract useful combinations of features from the data.
- Feature selection -> Analyse all features of the data to establish the relevant ones.
What are the 3 methods for feature selection?
- Filter Methods
- Wrapper Methods
- Embedded Methods
Explain the Filter Method…
- Method of feature selection for dimensionality reduction
1 -> Bring features to the same scale through normalisation or standardisation
2 -> Choose some variance threshold
3 -> Calculate the variance of each feature, dropping ones that are below the threshold
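The three steps above can be sketched in numpy. The data set, feature layout, and threshold value here are all made up for illustration; min-max normalisation is used for step 1, since standardisation would force every variance to 1 and defeat the comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Three illustrative features: two informative, one almost constant.
f_informative1 = rng.normal(0.0, 1.0, n)
f_informative2 = rng.uniform(10.0, 50.0, n)
f_near_constant = np.full(n, 5.0)
f_near_constant[0] = 5.1   # a tiny blip so min-max scaling is defined
X = np.column_stack([f_informative1, f_informative2, f_near_constant])

# Step 1: bring features to the same scale (min-max normalisation).
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Step 2: choose some variance threshold (assumed value).
threshold = 0.01

# Step 3: calculate each feature's variance, dropping ones below the threshold.
variances = X_scaled.var(axis=0)
keep = variances > threshold
X_reduced = X_scaled[:, keep]
print(keep)   # the near-constant feature should be dropped
```

scikit-learn wraps steps 2 and 3 as `sklearn.feature_selection.VarianceThreshold`, but the mechanics are exactly this comparison.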
Explain the Wrapper Method…
- Method of feature selection for dimensionality reduction
- Can be conducted via either Forward Search (FS) or Recursive Feature Elimination (RFE)
- Both FS and RFE run a battle royale process, training models on competing feature subsets to establish the best features.
- Both stop once a best model exists for each number of features, so the overall best can be chosen.
Explain how Forward Search works in Wrapper Method of Feature Selection…
- Create N models with 1 feature each
- Find the best feature, e.g. Feature 3
- Create N-1 models, each with the previous best feature (F3) plus one other feature, e.g. Model1(F3,F1), Model2(F3,F2) etc…
- Repeat until we have a model with all N features, then choose the best model overall
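A minimal sketch of Forward Search, using a least-squares R² as the model score. The synthetic data and the scoring function are assumptions for illustration; in practice any model and validation metric can play this role.

```python
import numpy as np

rng = np.random.default_rng(42)
n, n_features = 200, 4
X = rng.normal(size=(n, n_features))
# Synthetic target: y depends only on features 0 and 2.
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.1, size=n)

def score(feature_idx):
    """R^2 of a least-squares fit using just these feature columns."""
    A = np.column_stack([X[:, feature_idx], np.ones(n)])  # add intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid.var() / y.var()

selected, remaining = [], list(range(n_features))
while remaining:
    # Try adding each remaining feature to the current best subset,
    # and keep whichever single addition scores highest.
    best = max(remaining, key=lambda f: score(selected + [f]))
    selected.append(best)
    remaining.remove(best)

print(selected)   # order in which features were added
```

The truly informative features (0, then 2) should be picked up in the first two rounds, after which the noise features add almost nothing.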
Explain how Recursive Feature Elimination works in Wrapper Method of Feature Selection…
- Reverse of Forward Search
- Start with all N features and create N models, each with N-1 features (one feature left out)
- The feature whose removal hurts performance least is the worst one; repeatedly eliminate it
- Results in a final best model with M features (M < N)
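The same scoring idea run in reverse gives a sketch of Recursive Feature Elimination. The data, scorer, and target size M are assumptions for illustration (scikit-learn packages this loop as `sklearn.feature_selection.RFE`).

```python
import numpy as np

rng = np.random.default_rng(7)
n, n_features = 200, 4
X = rng.normal(size=(n, n_features))
# Synthetic target: y depends only on features 0 and 2.
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.1, size=n)

def score(feature_idx):
    """R^2 of a least-squares fit on the given feature columns."""
    A = np.column_stack([X[:, feature_idx], np.ones(n)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return 1.0 - (y - A @ coef).var() / y.var()

m = 2                               # target number of features M (assumed)
features = list(range(n_features))
while len(features) > m:
    # Score every model that drops one feature; the feature whose removal
    # hurts least is the worst one, so eliminate it.
    worst = max(features, key=lambda f: score([g for g in features if g != f]))
    features.remove(worst)

print(sorted(features))   # the surviving, informative features
```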
Explain the Embedded Method
- Method of feature selection for dimensionality reduction
- Use decision trees to establish the best features
- Then use random forests to aggregate the results of the decision trees
What is a Random Forest?
An aggregation of decision trees
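A short sketch of the embedded method with scikit-learn's random forest (assumes scikit-learn is installed; the data set is synthetic). Each tree scores features by how much they reduce impurity at its splits, and the forest aggregates those scores into one importance value per feature.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 4))
# Class label driven only by features 0 and 2 (illustrative).
y = (X[:, 0] + X[:, 2] > 0).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Aggregated impurity-based importances, one per feature; the best
# features are kept and the low-importance ones dropped.
importances = forest.feature_importances_
print(importances.round(3))   # features 0 and 2 should dominate
```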
What are the 2 types of methods for Feature Extraction?
- Linear
- Non-Linear
What is the main Linear method for feature extraction? Explain it…
- Principal Component Analysis
- Find an orthogonal coordinate transformation such that each new coordinate captures as much of the remaining variance as possible
- This creates N new variables, named Principal Components
- Principal Components are linear combinations of the original coordinates
- The orthogonal coordinate with the most variation is the most informative
What is the worst case scenario of PCA?
- When all variables are equally important but uncorrelated, so the variance is spread evenly across components.
- No component can be dropped without losing information, so PCA gives no reduction.
What are the steps of PCA?
- Generate the covariance matrix from the dataset
- Diagonalise the covariance matrix to obtain its eigenvector matrix V and its eigenvalues
- Multiply XV to express the data X in the new principal-component coordinates
- Take the first K principal components, i.e. those with the largest eigenvalues.
- This gives us a K-dimensional representation of the data, having extracted the K most informative directions.
- The dimensionality reduction comes from the removal of least important principal components
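The steps above can be sketched in numpy; the synthetic 3-feature data and the choice K = 2 are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
# Synthetic data: three features with very different variances (~9, ~1, ~0.01).
X = rng.normal(size=(n, 3)) @ np.array([[3.0, 0.0, 0.0],
                                        [0.0, 1.0, 0.0],
                                        [0.0, 0.0, 0.1]])

# Step 1: generate the covariance matrix of the mean-centred data.
Xc = X - X.mean(axis=0)
C = np.cov(Xc, rowvar=False)

# Step 2: diagonalise it (eigh handles symmetric matrices), then sort
# eigenpairs by descending eigenvalue, i.e. descending variance.
eigvals, V = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

# Step 3: take the first K principal components and project: X @ V_K.
K = 2
X_reduced = Xc @ V[:, :K]

print(X_reduced.shape)    # (300, 2): a K-dimensional representation
print(eigvals.round(2))   # eigenvalues = variance captured per component
```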
What do the Eigenvalues represent in PCA?
The variance captured by each principal component.