What is Dimensionality Reduction Flashcards
WHAT IS DIMENSIONALITY? P355
The number of input variables or features for a dataset is referred to as its dimensionality.
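Example (an illustrative sketch, not from the source): for tabular data held in a NumPy array, the dimensionality is simply the number of columns. The toy values below are made up for illustration.

```python
import numpy as np

# Hypothetical toy dataset: 4 rows (samples) and 3 columns (input features).
X = np.array([
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [6.2, 2.9, 4.3],
    [5.9, 3.0, 5.1],
])

# shape is (rows, columns); the column count is the dimensionality.
n_samples, n_features = X.shape
print("samples:", n_samples, "dimensionality:", n_features)
```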
WHAT IS THE “CURSE OF DIMENSIONALITY”? P355
Adding more input features often makes a predictive modeling task harder. This effect is generally referred to as the curse of dimensionality.
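Example (an illustrative sketch, not from the source): one commonly cited symptom of the curse of dimensionality is that distances between random points become nearly indistinguishable as features are added. The function name `relative_contrast` and the uniform random data are assumptions chosen for this demonstration.

```python
import numpy as np

def relative_contrast(n_features, n_points=500, seed=0):
    """Return (max distance - min distance) / min distance from one
    random query point to a cloud of uniformly random points."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, n_features))
    query = rng.random(n_features)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

# As the dimensionality grows, the nearest and farthest points end up
# at almost the same distance, so the contrast shrinks.
for d in (2, 10, 100, 1000):
    print(d, relative_contrast(d))
```

A shrinking contrast means that "near" and "far" neighbors carry less information, which is one reason distance-based models tend to struggle with many input features.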
EXTERNAL Q: WHAT IS DEGREE OF FREEDOM IN ML?
In machine learning, the degrees of freedom of a model are the number of parameters it has.
Parameters in machine learning and deep learning are the values that the learning algorithm can change independently as it learns. These values are affected by the hyperparameters you choose.
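Example (an illustrative sketch, not from the source): under the definition above, a linear regression on 4 input features has 5 degrees of freedom, one weight per feature plus an intercept. The data and the "true" weights are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 20 samples, 4 input features.
X = rng.random((20, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.7

# Add a column of ones so the intercept is learned as a parameter too.
X_design = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: every entry of `params` is a value the
# fitting procedure is free to adjust.
params, *_ = np.linalg.lstsq(X_design, y, rcond=None)

n_params = params.size
print("learned parameters (degrees of freedom):", n_params)
```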
AT WHICH STAGE OF THE PROJECT DO WE PERFORM DIMENSIONALITY REDUCTION? P356
Dimensionality reduction is a data preparation technique performed on data prior to modeling. It might be performed after data cleaning and data scaling and before training a predictive model.
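Example (an illustrative sketch, not from the source): the ordering described above (scaling, then dimensionality reduction, then the predictive model) can be expressed as a scikit-learn `Pipeline`. The synthetic dataset, the choice of PCA with 5 components, and the logistic regression model are all assumptions made for this illustration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic dataset with 20 input features.
X, y = make_classification(n_samples=200, n_features=20, n_informative=10,
                           n_redundant=5, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Data preparation steps come first, the predictive model comes last.
model = Pipeline(steps=[
    ("scale", StandardScaler()),
    ("reduce", PCA(n_components=5)),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

# model[:-1] is the pipeline without the final estimator, so this
# returns the reduced inputs that the classifier actually sees.
X_test_reduced = model[:-1].transform(X_test)
print(X_test.shape, "->", X_test_reduced.shape)
print("test accuracy:", model.score(X_test, y_test))
```

Wrapping the steps in one pipeline means the scaler and the reduction are fit on the training data only and then reused unchanged on new data.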
WHAT ARE THE MAIN TECHNIQUES FOR DIMENSIONALITY REDUCTION? P357
Feature Selection Methods
Matrix Factorization: The most common method is Principal Component Analysis (PCA)
Manifold Learning: Often for the purposes of data visualization
Autoencoder Methods
WHAT ARE SOME EXAMPLES OF MANIFOLD LEARNING TECHNIQUES FOR DIMENSIONALITY REDUCTION? P358
Kohonen Self-Organizing Map (SOM)
Sammon's Mapping
Multidimensional Scaling (MDS)
t-distributed Stochastic Neighbor Embedding (t-SNE)
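Example (an illustrative sketch, not from the source): t-SNE from the list above is available in scikit-learn as `sklearn.manifold.TSNE`. The synthetic blobs and the perplexity value are assumptions chosen for this demonstration.

```python
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

# Hypothetical data: 100 samples with 10 features, drawn from 3 clusters.
X, labels = make_blobs(n_samples=100, n_features=10, centers=3,
                       random_state=0)

# Project to 2 dimensions, typically for plotting.
# perplexity must be smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=20.0, random_state=0)
embedding = tsne.fit_transform(X)

print(X.shape, "->", embedding.shape)
```

Note that scikit-learn's `TSNE` has no separate `transform` method for new data, which fits its usual role as a visualization tool rather than a reusable preprocessing step.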
WHAT ARE AUTOENCODERS? P358
An auto-encoder is a kind of unsupervised neural network that is used for dimensionality reduction and feature discovery. More precisely, an auto-encoder is a feedforward neural network that is trained to predict the input itself.
WHAT ARE ENCODERS AND DECODERS IN AUTOENCODERS? P358
An autoencoder uses a network model that seeks to compress the data flow to a bottleneck layer with far fewer dimensions than the original input data. The part of the model prior to and including the bottleneck is referred to as the encoder, and the part of the model that reads the bottleneck output and reconstructs the input is called the decoder.
WHAT HAPPENS AFTER TRAINING AN AUTO-ENCODER?
After training, the decoder is discarded and the output of the bottleneck is used directly as the reduced-dimensionality representation of the input.
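Example (an illustrative sketch, not from the source): a minimal single-hidden-layer autoencoder, built here with scikit-learn's `MLPRegressor` as a stand-in for a deep learning framework. The network is trained to predict its own input, and afterwards only the encoder half is used. The helper `encode`, the synthetic data, and the bottleneck size of 2 are assumptions made for this illustration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical data: 8 observed features driven by 2 hidden factors.
latent = rng.normal(size=(300, 2))
X = latent @ rng.normal(size=(2, 8)) + 0.05 * rng.normal(size=(300, 8))
X = StandardScaler().fit_transform(X)

# The single hidden layer with 2 units is the bottleneck.
# Training target == input, so the network learns to reconstruct X.
autoencoder = MLPRegressor(hidden_layer_sizes=(2,), activation="tanh",
                           max_iter=2000, random_state=0)
autoencoder.fit(X, X)

def encode(model, data):
    """Apply only the encoder: input layer -> bottleneck activations."""
    return np.tanh(data @ model.coefs_[0] + model.intercepts_[0])

# The decoder weights (model.coefs_[1]) are simply not used from here on.
X_reduced = encode(autoencoder, X)
print(X.shape, "->", X_reduced.shape)
```

A deep autoencoder would stack several layers on each side of the bottleneck, but the idea of keeping the encoder and dropping the decoder is the same.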
WHAT IS PROJECTION? P358
In mathematics, a projection is a kind of function or mapping that transforms data in some way.
DEEP AUTO-ENCODERS ARE AN EFFECTIVE FRAMEWORK FOR ____ DIMENSIONALITY REDUCTION. P358
Non-linear
WHEN USING DEEP AUTO-ENCODERS, WHICH LAYER DO WE USE AS THE REDUCED INPUT FOR THE PROBLEM? P358
The top-most layer of the encoder, that is, the bottleneck layer.
WHY IS IT CHALLENGING TO INTERPRET OUTPUT OF THE BOTTLENECK? P358
The output of the encoder is a type of projection and like other projection methods, there is no direct relationship from the bottleneck output back to the original input variables, making them challenging to interpret.
WHICH METHODS OF DIMENSIONALITY REDUCTION ASSUME THE SAME SCALE OR DISTRIBUTION FOR ALL INPUT FEATURES, AND WHAT SHOULD WE DO PRIOR TO USING THEM? P359
Linear algebra and manifold learning (an approach to non-linear dimensionality reduction) methods; it is good practice to either normalize or standardize data prior to using these methods.
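Example (an illustrative sketch, not from the source): the effect of scale on a linear algebra method can be seen by running PCA, computed here through an SVD, on two independent features measured in very different units. The helper `explained_variance_ratio` and the synthetic data are assumptions made for this demonstration.

```python
import numpy as np

def explained_variance_ratio(X):
    """Share of total variance captured by each principal component,
    computed from the singular values of the centered data."""
    X_centered = X - X.mean(axis=0)
    singular_values = np.linalg.svd(X_centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()

rng = np.random.default_rng(0)

# Two independent features; the second has a scale 1000 times larger.
X = np.column_stack([rng.normal(0, 1, 500), rng.normal(0, 1000, 500)])

raw = explained_variance_ratio(X)

# Standardize: zero mean and unit variance for every feature.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
scaled = explained_variance_ratio(X_std)

print("raw data:         ", raw)
print("standardized data:", scaled)
```

On the raw data the first component is almost entirely the large-scale feature; after standardizing, the two features contribute roughly equally, which matches the fact that they were generated independently.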