Representation Learning Flashcards
What is representation learning? Why is it important?
- learning representations of input data
- generally transforms the input into a more useful form
- makes it easier to perform a task
- the performance of any machine learning model is critically dependent on the representations it uses or learns
What is Principal Component Analysis (PCA)? What does it do?
- method that aims at re-expressing a given dataset using a linear transformation
- center the data by subtracting off the mean of each measurement
- compute the eigenvectors of XX^T, i.e., the eigendecomposition XX^T = EDE^T, and set P = E^T
- select the k ≪ m most important components from P according to the eigenvalues in D (see the sketch below)
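A minimal NumPy sketch of these steps; the toy data, its shape (m = 5 features, n = 100 samples), and k = 2 are illustrative assumptions:

```python
# Minimal PCA sketch following the steps above.
# Shapes follow the card's convention: X is m features x n samples.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 100))          # toy data: m = 5 features, n = 100 samples

X = X - X.mean(axis=1, keepdims=True)  # 1. center each measurement (row)
S = (X @ X.T) / X.shape[1]             # 2. covariance matrix S_X = (1/n) X X^T
D, E = np.linalg.eigh(S)               # eigendecomposition S = E D E^T

order = np.argsort(D)[::-1]            # sort components by decreasing variance
k = 2                                  # 3. keep the k << m most important ones
P = E[:, order[:k]].T                  # P = E^T restricted to the top-k rows

X_new = P @ X                          # transformed data, k x n
```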
What transformations give PCA a problem?
- PCA searches only for a linear transformation, so non-linear structure is a problem
- it must also deal with contamination of the data, given by:
- noise (errors and interference that make the data deviate from the underlying signal)
- redundancy (multiple variables that can be reduced to a single one)
- we aim at finding a transformation that minimizes both
How can we estimate noise and redundancy in data?
- both can be estimated using measures related to the variance of the data
- the signal-to-noise ratio (SNR) measures noise (see below)
- the covariance matrix measures redundancy between features
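A small sketch of the SNR as a variance ratio; the clean signal and the noise component are assumed known here purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.linspace(0, 10, 500)
signal = np.sin(t)                      # assumed known "true" signal
noise = 0.1 * rng.normal(size=t.shape)  # assumed known noise component
measured = signal + noise

snr = signal.var() / noise.var()        # variance ratio: high SNR = clean data
print(snr)
```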
How does the covariance matrix work?
- the dataset must be centered (mean 0)
- the covariance matrix is computed as
- S_X = (1/n) X X^T
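A quick sanity check of this formula against NumPy's built-in covariance; the random data is an arbitrary example, and np.cov needs bias=True to match the 1/n normalization:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 1000))
X = X - X.mean(axis=1, keepdims=True)        # centering is required (mean 0 per row)

S = (X @ X.T) / X.shape[1]                   # S_X = (1/n) X X^T
print(np.allclose(S, np.cov(X, bias=True)))  # matches NumPy's biased covariance
```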
How does PCA reduce feature covariance?
- we are aiming at diagonalizing the covariance matrix
- find an orthonormal matrix P with X' = PX such that the covariance matrix of X' is diagonal
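A short NumPy check of this claim on arbitrary toy data: choosing P = E^T from the eigendecomposition makes the new covariance matrix diagonal:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(4, 300))
X = X - X.mean(axis=1, keepdims=True)

D, E = np.linalg.eigh((X @ X.T) / X.shape[1])
P = E.T                                    # orthonormal: P P^T = I
X_new = P @ X
S_new = (X_new @ X_new.T) / X_new.shape[1]
print(np.round(S_new, 10))                 # off-diagonal entries vanish
```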
How does PCA perform dimensionality reduction?
- the most important components are the first ones
- they are the directions along which the data presents the most variability
- ignoring the less important dimensions simplifies the data
- the discarded dimensions carry little variance, so little information is lost
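One hedged way to choose how many dimensions to keep is the cumulative explained-variance ratio; the 95% threshold below is an arbitrary assumption:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 200))
X = X - X.mean(axis=1, keepdims=True)

D, E = np.linalg.eigh((X @ X.T) / X.shape[1])
var = np.sort(D)[::-1]                      # variances, largest first
ratio = np.cumsum(var) / var.sum()          # cumulative explained variance
k = int(np.searchsorted(ratio, 0.95)) + 1   # smallest k explaining ~95%
print(k, ratio[k - 1])
```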
What is kernel PCA?
- extension of conventional PCA to deal with non-linear correlations using the kernel trick
1- data not used directly but mapped implicitly to some nonlinear feature space
2- center data in feature space
3- apply PCA in the feature space
4- the result is a non-linear transformation in the original data space
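A sketch using scikit-learn's KernelPCA on the classic two-circles dataset; note scikit-learn stores samples as rows, the opposite of the m × n convention above, and gamma=10.0 is an arbitrary choice:

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# The RBF kernel maps the data implicitly to a feature space where the two
# circles become linearly separable; centering in feature space is handled
# internally by KernelPCA.
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10.0)
X_kpca = kpca.fit_transform(X)
```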
What can PCA be used for?
- apply lossy compression
- visualize multi-dimensional data in 2D or 3D
- reduce the number of dimensions to discard noisy features
- perform a change of representation in order to make analysis of the data at hand easier
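For instance, a minimal visualization sketch with scikit-learn and matplotlib, projecting the 4-dimensional Iris data onto 2 components:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

X, y = load_iris(return_X_y=True)
X2 = PCA(n_components=2).fit_transform(X)   # project 4-D data onto 2 components

plt.scatter(X2[:, 0], X2[:, 1], c=y)
plt.xlabel("PC1"); plt.ylabel("PC2")
plt.show()
```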
What is an autoencoder?
- unsupervised learning technique based on feed forward neural networks
- learns a representation for a set of data
- generally for dimensionality reduction
- can be used for learning generative models of data
How is the autoencoder composed? What do the components do?
- encoder
- creates a new representation (the code) of the input
- decoder
- reconstructs the input starting from the code
- bottleneck layer
- compresses the data and makes the task harder (prevents the network from simply copying its input)
- encoder and decoder can have more hidden layers (simple vs deep autoencoder)
- non-linear activation functions
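A minimal PyTorch sketch of this layout; the layer sizes (784-dimensional input, 32-unit code) are illustrative assumptions:

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, n_in=784, n_code=32):
        super().__init__()
        # encoder: input -> bottleneck (the code)
        self.encoder = nn.Sequential(
            nn.Linear(n_in, 128), nn.ReLU(),
            nn.Linear(128, n_code),          # bottleneck layer compresses the data
        )
        # decoder: code -> reconstruction of the input
        self.decoder = nn.Sequential(
            nn.Linear(n_code, 128), nn.ReLU(),
            nn.Linear(128, n_in),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
```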
What loss is generally used in an autoencoder? What is the learning procedure?
- the standard loss function is a mean squared error loss
- called reconstruction loss
- backpropagation and SGD generally used for training
- no specialized algorithm
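A hedged training-loop sketch in PyTorch: the reconstruction target is the input itself, so no labels are needed (the tiny model and hyperparameters are arbitrary):

```python
import torch
import torch.nn as nn

model = nn.Sequential(                      # tiny autoencoder for illustration
    nn.Linear(20, 4), nn.ReLU(),            # encoder down to a 4-unit code
    nn.Linear(4, 20),                       # decoder back to 20 features
)
loss_fn = nn.MSELoss()                      # reconstruction loss
opt = torch.optim.SGD(model.parameters(), lr=0.01)

X = torch.randn(256, 20)                    # unlabeled data: input == target
for epoch in range(100):
    opt.zero_grad()
    loss = loss_fn(model(X), X)             # compare reconstruction to input
    loss.backward()                         # plain backpropagation
    opt.step()
```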
What types of autoencoders are there?
- regularized autoencoders
- loss function that encourages sparsity in the representation and robustness to noise
- sparse autoencoders
- sparsity in the activation of hidden units -> L1/L2 regularization
- denoising autoencoders
- trained to remove noise and corruption from the input
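A sketch of the denoising idea: corrupt the input but reconstruct the clean data; the Gaussian noise scale of 0.3 and the tiny model are assumptions for illustration:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 4), nn.ReLU(), nn.Linear(4, 20))
loss_fn = nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

X = torch.randn(256, 20)
for epoch in range(100):
    noisy = X + 0.3 * torch.randn_like(X)   # corrupt the input with Gaussian noise
    loss = loss_fn(model(noisy), X)         # but reconstruct the *clean* data
    opt.zero_grad()
    loss.backward()
    opt.step()
```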
What are CNN? How do they work?
- Convolutional Neural Networks
- specialized kind of NN for processing data that has a known grid-like topology (such as images)
- employs convolution, a specialized linear mathematical operation
- learns different levels of abstraction of the input
- the first hidden layers detect general patterns (e.g., edges)
- deeper hidden layers capture more specific abstractions (textures, patterns)
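A tiny PyTorch illustration of convolution detecting an edge, using a hand-crafted kernel rather than a learned one:

```python
import torch
import torch.nn.functional as F

img = torch.zeros(1, 1, 8, 8)
img[:, :, :, 4:] = 1.0                  # simple image: dark left, bright right
edge = torch.tensor([[[[-1., 1.]]]])    # hand-crafted vertical-edge kernel
out = F.conv2d(img, edge)               # convolution highlights the edge
print(out[0, 0])                        # nonzero only where intensity changes
```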
What is a typical CNN architecture?
- several convolutional layers + pooling (subsampling), followed by a fully connected network with ReLU (see the sketch below)
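A hedged PyTorch sketch of such an architecture for 28×28 grayscale images; the channel counts and layer sizes are illustrative assumptions:

```python
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                          # 28x28 -> 14x14
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                          # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 64), nn.ReLU(),     # fully connected part with ReLU
    nn.Linear(64, 10),                        # e.g., 10 output classes
)
```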