Dimension Reduction Flashcards
Reasons for dimension reduction
- Computational Cost
- Financial Cost
- Interpretability
Explain the Filter strategy for feature selection
It's a pre-processing step that ranks and filters features independently of the choice of classifier. It assigns a score to each candidate feature or feature subset using an evaluation function of choice (e.g. mutual information or a chi-squared statistic)
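A minimal sketch of a filter in code, assuming scikit-learn (the demo dataset and k = 10 are purely illustrative): each feature is scored with mutual information, with no classifier involved, and the top scorers are kept.

```python
# Filter: score every feature against the target with mutual information,
# independently of any classifier, and keep the 10 highest-scoring ones.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)
selector = SelectKBest(score_func=mutual_info_classif, k=10)
X_reduced = selector.fit_transform(X, y)
print(X.shape, "->", X_reduced.shape)  # (569, 30) -> (569, 10)
```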
What is a good strategy for selecting the top features from a Filter
Evaluate classifier performance on feature subsets of increasing size (the top 1, top 2, … features from the ranking) and keep the size at which performance stops improving
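A sketch of that strategy, assuming scikit-learn (the classifier and dataset are illustrative): rank the features once with the filter score, then cross-validate on the top-k features for growing k and keep the k where the curve levels off.

```python
# Rank features by mutual information, then evaluate the classifier on
# subsets of increasing size and watch where accuracy stops improving.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
ranking = np.argsort(mutual_info_classif(X, y))[::-1]  # best feature first
for k in range(1, X.shape[1] + 1):
    score = cross_val_score(LogisticRegression(max_iter=5000),
                            X[:, ranking[:k]], y, cv=5).mean()
    print(f"top {k:2d} features: CV accuracy = {score:.3f}")
```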
Disadvantages of Filters
- No model bias: the scores don't account for how suitable a feature is for the specific classifier that will actually be used
- No feature dependencies: features are scored individually, so interactions between features are missed
Explain the Wrapper strategy for feature selection
The classifier is “wrapped” in the feature selection mechanism. Feature subsets are evaluated directly based on their performance when used with that specific classifier
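The core of a wrapper in one sketch, assuming scikit-learn: a candidate subset is scored by the chosen classifier's own cross-validated performance rather than by a proxy statistic. The subset [0, 3, 7] is a hypothetical candidate.

```python
# Wrapper evaluation: the score of a subset IS the classifier's CV score.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
subset = [0, 3, 7]  # hypothetical candidate subset
score = cross_val_score(LogisticRegression(max_iter=5000),
                        X[:, subset], y, cv=5).mean()
print(f"wrapper score for {subset}: {score:.3f}")
```

A search procedure (see the sequential searches below) then proposes subsets and keeps whichever scores best.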
Advantages of Wrappers
- Accounts for the classifier's bias, since subsets are evaluated with the model that will actually be used
- Considers features in context, capturing feature dependencies
List the types of search used in feature subset search
- Exhaustive: evaluate all 2^d subsets (sketched after this list)
- Exponential with heuristic pruning (e.g. branch and bound)
- Sequential: add or remove one feature at a time
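The exhaustive case, sketched under the assumption of a scikit-learn classifier and only 6 features, since the 2^6 − 1 = 63 subsets are affordable but 2^d quickly is not:

```python
# Exhaustive search: evaluate every non-empty subset, keep the best.
from itertools import combinations

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
X = X[:, :6]  # restrict to 6 features so the search stays cheap

def score(subset):
    return cross_val_score(LogisticRegression(max_iter=5000),
                           X[:, list(subset)], y, cv=3).mean()

all_subsets = (s for r in range(1, X.shape[1] + 1)
               for s in combinations(range(X.shape[1]), r))
best = max(all_subsets, key=score)
print("best subset:", best, "score:", round(score(best), 3))
```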
Describe the steps of Forward Sequential search
- Start with an empty subset
- Find the most informative remaining feature and add it to the subset
- Repeat until there is no improvement from adding features (a sketch follows)
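A hand-rolled sketch of those steps, assuming scikit-learn (classifier and dataset are illustrative):

```python
# Forward Sequential Search: start empty, greedily add the feature that
# most improves CV accuracy, stop when no addition helps.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
clf = LogisticRegression(max_iter=5000)

def cv_score(features):
    return cross_val_score(clf, X[:, features], y, cv=5).mean()

selected, best_score = [], 0.0
while True:
    candidates = [f for f in range(X.shape[1]) if f not in selected]
    if not candidates:
        break
    top_score, top_f = max((cv_score(selected + [f]), f) for f in candidates)
    if top_score <= best_score:  # no improvement: stop
        break
    selected.append(top_f)
    best_score = top_score
print("selected:", selected, "CV accuracy:", round(best_score, 3))
```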
Describe the steps of Backward Elimination
- Start with the complete set of features
- Remove the least informative feature
- Repeat until there is no improvement from dropping features (a library sketch follows)
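Both directions are also available off the shelf: scikit-learn's SequentialFeatureSelector (0.24+) implements forward selection and backward elimination. Fixing n_features_to_select is a simplification here; in recent versions the "stop when no improvement" rule corresponds to n_features_to_select='auto' with a tol threshold.

```python
# Backward elimination via scikit-learn: start from all 30 features and
# drop them one at a time, keeping the 10 that survive.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
sfs = SequentialFeatureSelector(LogisticRegression(max_iter=5000),
                                n_features_to_select=10,
                                direction="backward", cv=5)
sfs.fit(X, y)
print("kept features:", sfs.get_support(indices=True))
```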
Compare Forward Sequential Search (FSS) to Backward Elimination (BE)
- FSS typically requires less running time, since it can stop early with a small subset
- BE tends to find better models and can find subsets with interacting features, but tends to be slower
Disadvantages of Wrappers
- Computational cost
- Risk of overfitting
What is the general idea of projection methods
They are used in feature transformation to map the original d-dimensional space to a new k-dimensional space (k < d) with the minimum loss of information
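The mechanics in a minimal numpy sketch: a d × k matrix W maps every sample from d to k dimensions. W is random here just to show the mapping; methods such as PCA choose W so that as little information as possible is lost.

```python
# Linear projection: each 10-dimensional sample becomes a 3-dimensional one.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))      # 100 samples, d = 10
W = rng.normal(size=(10, 3))        # projection matrix, k = 3
X_proj = X @ W
print(X.shape, "->", X_proj.shape)  # (100, 10) -> (100, 3)
```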
What is Principal Component Analysis (PCA)
An unsupervised projection method which aims to keep as much of the variance in the data as possible
What are principal components in PCA
New dimensions, constructed (from eigenvectors of the data's covariance matrix) as linear combinations of the original features, which are uncorrelated with one another.
The first PC accounts for the most variability in the data, the second for the next most, and so on…
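A short PCA sketch, assuming scikit-learn (dataset illustrative): explained_variance_ratio_ shows exactly that ordering, with the first PC accounting for the largest share of the variance.

```python
# Fit PCA and inspect how much variance each principal component explains.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)  # PCA is sensitive to feature scale
pca = PCA(n_components=3).fit(X)
print(pca.explained_variance_ratio_)   # decreasing shares of total variance
```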
Give context as to what eigenvectors and eigenvalues are
Given a square matrix X, an eigenvector of the matrix is a non-zero vector v that satisfies the equation Xv = λv, where the scalar λ is the corresponding eigenvalue.
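A quick numeric check of the definition with numpy (the 2 × 2 matrix is arbitrary; eigh is used because covariance matrices are symmetric). In PCA, the eigenvectors of the covariance matrix are the principal components and the eigenvalues are the variances along them.

```python
# Verify Xv = λv for the eigenvector with the largest eigenvalue.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])            # symmetric, like a covariance matrix
eigvals, eigvecs = np.linalg.eigh(A)  # eigh returns ascending eigenvalues
v, lam = eigvecs[:, -1], eigvals[-1]  # take the largest
print(A @ v)                          # [2.121..., 2.121...]
print(lam * v)                        # the same vector: Av = λv
```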