Unsupervised Learning Flashcards
unsupervised learning
a learning algorithm works out the patterns the data contains, for the purposes of compact representation (dimensionality reduction) and categorisation & analysis (clustering)
dimensionality reduction
transform data into a latent space representation
latent space
reduced-dimensional space in which the essential features of the input data are encoded
latent space in autoencoder
compressed representation of input data
encoding
process of transforming/compressing data into latent representation
decoding
process of recovering/reconstructing data from the compressed latent-space representation
auto encoder
MLP (fully connected network) whose first hidden layer(s) constitute the encoder, with the output of that layer being the latent representation; the layer(s) after the latent space form the decoder
how does dimensionality reduction happen with autoencoders
encoding input data into a lower-dimensional space by setting the number of neurons in the latent (bottleneck) layer to be smaller than the number of input neurons
how is network trained with autoencoder
the network is trained end to end to minimise the MSE (difference between the input and the reconstructed output)
usages of autoencoders
compressing file into a zip
noise reduction
auto encoder architecture picture
input and output layers have the same number of neurons
hourglass shape
more hidden layers allow the network to capture more features
neuron count is reduced in the encoder to find the compressed latent data (important patterns), then expanded in the decoder to reconstruct the input (see the sketch below)
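A minimal sketch of this architecture and training loop, assuming PyTorch; the layer sizes (784 input, 32 latent) and the dummy batch are illustrative, not taken from the cards above.

```python
# Minimal autoencoder sketch (assumes PyTorch; layer sizes and dummy data are illustrative).
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # Encoder: compresses the input down to the latent representation.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        # Decoder: reconstructs the input from the latent representation.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim),
        )

    def forward(self, x):
        z = self.encoder(x)      # compressed latent representation
        return self.decoder(z)   # reconstruction of the input

model = AutoEncoder()
criterion = nn.MSELoss()                                   # reconstruction error
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.rand(64, 784)                                    # dummy batch standing in for real data
for _ in range(10):                                        # trained end to end
    optimizer.zero_grad()
    loss = criterion(model(x), x)                          # compare output with the input itself
    loss.backward()
    optimizer.step()
```

Note the hourglass shape (encoder and decoder mirror each other) and that the loss compares the reconstruction with the input itself, so no labels are needed.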
PCA stands for
principal component analysis
what is PCA and what is it used for
used for dimensionality reduction while preserving as much variance as possible in the data
It transforms the original dataset into a new coordinate system, where the axes (called principal components) are ordered by the amount of information they convey about the data
how to do PCA to compute components
given an N x D data matrix X
compute the mean value of each attribute and subtract it (centre the data)
compute the covariance matrix C
compute the eigenvalues and eigenvectors of C, sorted by eigenvalue from largest to smallest; the sorted eigenvectors form the rows of a matrix E
the first K rows of E give the K most important components of X (see the sketch below)
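A minimal sketch of those steps in NumPy; the function name pca and the choice K=2 are illustrative.

```python
# PCA sketch following the steps above (NumPy only; the function name pca and K=2 are illustrative).
import numpy as np

def pca(X, K):
    # X is an N x D data matrix; K is the number of components to keep
    mean = X.mean(axis=0)                 # mean value of each attribute
    Xc = X - mean                         # centre the data
    C = np.cov(Xc, rowvar=False)          # D x D covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)  # eigh: symmetric matrix, eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]     # sort by eigenvalue, largest first
    E = eigvecs[:, order].T               # rows of E are the principal components
    return Xc @ E[:K].T, E[:K]            # projected data (N x K) and the top-K components

X = np.random.rand(100, 10)
X_reduced, components = pca(X, K=2)
print(X_reduced.shape)                    # (100, 2)
```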
applications of PCA
data visualisation (visualise high-dimensional data in 2D or 3D)
noise reduction (retains only the most significant principal components)
feature reduction
AE vs PCA
AE
- can handle non-linear data
- more flexible but longer to train
PCA
- can only handle linear projections
latent variable
variable that is not directly observed but is inferred from other variables that are observed
like intelligence, which is inferred from, say, responses on tests
K means algorithm
unsupervised learning algorithm designed to partition data into K distinct clusters
minimises within-cluster variance, aiming to group similar data points together
4 steps of the K-means algorithm
1) initialisation: choose the number of clusters K and randomly select K initial cluster centres
2) assignment: each data point is assigned to the nearest cluster centre by calculating the Euclidean distance
3) update: the algorithm recomputes each cluster centre as the mean of the points assigned to it
4) repeat 2 and 3 until the assignment of points to clusters no longer changes (see the sketch below)
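A minimal sketch of the four steps in NumPy; K, the dummy data and the convergence check are illustrative, and the sketch assumes no cluster ever becomes empty.

```python
# K-means sketch following the four steps (NumPy only; K and the data are illustrative,
# and the sketch assumes no cluster ever becomes empty).
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1) initialisation: randomly pick K data points as the initial cluster centres
    centres = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(max_iter):
        # 2) assignment: each point goes to the nearest centre (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3) update: each centre becomes the mean of the points assigned to it
        new_centres = np.array([X[labels == k].mean(axis=0) for k in range(K)])
        # 4) stop once the centres (and hence the assignments) no longer change
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels, centres

# two well-separated blobs as dummy data
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
labels, centres = kmeans(X, K=2)
```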
the K-means algorithm is prone to
finding local minima (the result depends on the random initial cluster centres)
how do you find correct # of clusters
run K-means for different choices of K and use a heuristic to judge the quality of each clustering (see the sketch below)
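One common heuristic (an assumption here, not named in the card above) is the elbow method: compare the within-cluster variance for several K and pick the point where it stops dropping sharply. A sketch assuming scikit-learn:

```python
# Run K-means for several K and compare the within-cluster variance (inertia);
# the "elbow" where it stops dropping sharply suggests a reasonable K.
# (scikit-learn assumed; the dummy data are illustrative)
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(200, 2)
for K in range(1, 8):
    km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(X)
    print(K, km.inertia_)   # inertia = within-cluster sum of squared distances
```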
have __ sets of data to mitigate under/over-fitting
training data to train the model
validation data to tune model hyperparameters
test data to test the performance of the model
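A minimal sketch of the three-way split, assuming scikit-learn; the 60/20/20 proportions and the dummy data are illustrative.

```python
# Three-way split sketch (scikit-learn assumed; proportions and dummy data are illustrative).
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.random.rand(100, 4), np.random.randint(0, 2, 100)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)      # 60% train
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)  # 20% / 20%
```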
instead of a single validation set you could have
k-fold cross-validation
The dataset is divided into
k subsets (or folds).
The model is trained on
k-1 folds and validated on the remaining fold, repeated so that each fold is used for validation once
this gives an estimate of how well the classifier will do on unseen data
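A minimal sketch of k-fold cross-validation, assuming scikit-learn; the classifier and dataset are illustrative.

```python
# k-fold cross-validation sketch (scikit-learn assumed; classifier and dataset are illustrative).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
# cv=5: the data are split into 5 folds; the model is trained on 4 folds and
# validated on the remaining one, repeated so each fold is used for validation once.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores, scores.mean())   # estimate of performance on unseen data
```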
evaluation metrics for both classification and regression
classification: accuracy, ROC curve (illustrates the trade-off between the true positive rate (sensitivity) and the false positive rate)
regression: MSE and MAE
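A minimal sketch of computing these metrics, assuming scikit-learn; all values are made up for illustration.

```python
# Computing the metrics above (scikit-learn assumed; all values are made up for illustration).
from sklearn.metrics import (accuracy_score, roc_auc_score,
                             mean_squared_error, mean_absolute_error)

# classification: accuracy and area under the ROC curve
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
y_score = [0.2, 0.9, 0.4, 0.1, 0.8]   # predicted probability of the positive class
print(accuracy_score(y_true, y_pred))
print(roc_auc_score(y_true, y_score))

# regression: MSE and MAE
y_true_r = [2.5, 0.0, 2.1, 7.8]
y_pred_r = [3.0, -0.5, 2.0, 8.0]
print(mean_squared_error(y_true_r, y_pred_r))
print(mean_absolute_error(y_true_r, y_pred_r))
```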