5/6 - Unsupervised Learning in the era of generative AI Flashcards
Unsupervised learning
Learning from data without labels
Unsupervised vs supervised
- Unsupervised: cheap; no definition of error; can discover new things
- Supervised: expensive labelling; requires a definition of error; can at best do as well as the labels
Clustering
Detecting that data points can be grouped into distinct clusters.
What metric is required for clustering?
A distance metric to compute distances between points,
for example Euclidean distance or Manhattan distance.
Euclidean distance is the length of the straight line between two points.
Manhattan distance is the distance if you had to go horizontally then vertically, for example.
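A minimal sketch of both metrics in Python with numpy; the two points are made-up examples:

```python
import numpy as np

a = np.array([0.0, 0.0])
b = np.array([3.0, 4.0])

# Euclidean: length of the straight line between the points
euclidean = np.sqrt(np.sum((a - b) ** 2))  # 5.0

# Manhattan: total distance going horizontally then vertically
manhattan = np.sum(np.abs(a - b))          # 7.0
```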
Hierarchical Clustering
1. Each point starts as its own cluster
2. Compute all pairwise distances
3. Find the shortest distance between any two clusters and merge those clusters
4. Recompute the distance matrix with all points and the new cluster
5. Repeat from 3
6. Until you have 1 cluster; then looking back at the merges you have a tree
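One way to sketch this in Python is with scipy's agglomerative implementation, which runs the same merge loop; the data points here are hypothetical:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Toy 2-D data points (made up for illustration)
X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0]])

# linkage() runs the loop above: every point starts as its own
# cluster, and the two closest clusters are merged repeatedly.
Z = linkage(X, method='single')

# Each row of Z records one merge: (cluster i, cluster j, distance, new size)
print(Z)

# Looking back over all merges gives the tree (dendrogram)
tree = dendrogram(Z, no_plot=True)
```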
Clustering methods
K means
Hierarchical
K means clustering
1. Start with k known centroids, manually chosen
2. “Place” each data point in the cluster of its nearest centroid
3. Compute the new barycentre of each cluster (new centroids)
4. Repeat from point 2
5. Stop when the centroids no longer move
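A minimal sketch of this loop with numpy, assuming the initial centroids are given (empty clusters are not handled):

```python
import numpy as np

def kmeans(X, centroids, max_iter=100):
    # X: (n, d) data points; centroids: (k, d) manually chosen start points
    for _ in range(max_iter):
        # 2. "Place" each data point with its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. The new barycentre (mean) of each cluster becomes its new centroid
        new_centroids = np.array([X[labels == k].mean(axis=0)
                                  for k in range(len(centroids))])
        # 5. Stop when the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```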
Principal component analysis / dimensionality reduction
Find the direction of maximum variation of the data; with this we can reduce the dimensions (dimensionality reduction can also be done e.g. with an autoencoder).
PCA finds the new axes and “rotates” the data to use this direction as the x axis.
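A minimal sketch with numpy, using the SVD of the centred data to find the direction of maximum variance (the data is synthetic):

```python
import numpy as np

# Synthetic correlated 2-D data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) @ np.array([[3.0, 1.0], [0.0, 0.5]])

# Centre the data; the rows of Vt are the principal directions,
# the first being the direction of maximum variation
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# "Rotate" the data onto the new axes; keeping only the first
# column reduces from 2 dimensions to 1
rotated = Xc @ Vt.T
reduced = rotated[:, :1]
```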
Autoencoders
Neural Net that learns to compress/effectively represent data without labels
This is a neural network with input and output of the same dimensions and a bottleneck in the middle.
x1   x1*
  \ /
   .
  / \
x2   x2*
. is a latent value (reduced dimensions/compressed layer)
How does an autoencoder learn weights?
Backprop: the loss is computed as the difference between the input and the reconstructed output, e.g. L2 = 1/2 (x* - x)^2
Forward pass:
- Feed the first data point to the input,
- compute the loss as above,
- then backpropagate the gradients with respect to the weights,
- update the weights by adding the negative of the gradient times a learning rate.
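A minimal training-loop sketch in PyTorch, assuming a tiny 2 → 1 → 2 autoencoder like the diagram above (sizes and data are illustrative):

```python
import torch
import torch.nn as nn

# 2 -> 1 -> 2 network: the 1-unit layer is the latent bottleneck
model = nn.Sequential(
    nn.Linear(2, 1),   # encoder: compress to the latent value
    nn.Linear(1, 2),   # decoder: reconstruct x* from the latent value
)
loss_fn = nn.MSELoss()  # reconstruction loss; MSE ~ the L2 = 1/2 (x* - x)^2 above
opt = torch.optim.SGD(model.parameters(), lr=0.01)

X = torch.randn(256, 2)            # unlabelled data (made up)
for epoch in range(100):
    x_star = model(X)              # forward: feed the data through the net
    loss = loss_fn(x_star, X)      # loss between input and reconstruction
    opt.zero_grad()
    loss.backward()                # backprop: gradients w.r.t. the weights
    opt.step()                     # w <- w - learning_rate * gradient
```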
Can you perfectly capture the data again after reducing the dimensionality to 1 and then back to 2?
No, there will be some variation.
You can produce new data points from the original that are not exactly the same.
Generative Adversarial Networks
Generator/Discriminator
Input random noise;
the generator network generates a fake image;
then the discriminator network is given real and fake images.
You train the generator to create images that produce a “real” response from the discriminator.
GANs: Discriminator outputs
The discriminator outputs a likelihood in the range 0 to 1 that the image is fake (0) or real (1).
The generator wants to produce images that are given a 1.
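A minimal adversarial training loop in PyTorch; the tiny fully-connected generator and discriminator, the data, and all sizes are made up for illustration:

```python
import torch
import torch.nn as nn

# Toy generator (noise -> "image") and discriminator ("image" -> 0..1)
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
D = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

bce = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

real = torch.randn(64, 4)          # stand-in for a batch of real images

for step in range(1000):
    # Train the discriminator: real images -> 1, fake images -> 0
    fake = G(torch.randn(64, 8)).detach()   # detach: only D updates here
    loss_d = (bce(D(real), torch.ones(64, 1))
              + bce(D(fake), torch.zeros(64, 1)))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Train the generator: it wants the discriminator to output 1
    fake = G(torch.randn(64, 8))
    loss_g = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
```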
Interpolating generator images
You can interpolate the input vector between two points, or add and subtract latent vectors, to produce all sorts of in-between images.
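For instance, a linear interpolation between two latent vectors, sketched in PyTorch (the generator G and all sizes are hypothetical):

```python
import torch

z1, z2 = torch.randn(8), torch.randn(8)   # two random input vectors
for t in torch.linspace(0, 1, 5):
    z = (1 - t) * z1 + t * z2             # point on the line between z1 and z2
    # image = G(z)  # feeding each z to the generator gives an interpolation
```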
Conditional GANs
Take an additional input to specify which class of objects you want to generate.
Along with the random vector, introduce a condition.
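One common way to do this is to append a one-hot class label to the noise vector; a sketch in PyTorch (sizes and the class index are made up):

```python
import torch
import torch.nn.functional as F

z = torch.randn(1, 8)                                        # random vector
cond = F.one_hot(torch.tensor([3]), num_classes=10).float()  # class 3 of 10
g_input = torch.cat([z, cond], dim=1)                        # conditioned input
```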
pix2pix: Adversarial Loss
Ensures the generated images are indistinguishable from real ones to the discriminator.
pix2pix: L1/L2 Loss
Ensures generated images are structurally similar to target images.
Penalises pixel-wise differences between generated and real images.
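The full pix2pix generator objective combines the two terms; a sketch of how they might be summed in PyTorch (all tensors and the weight lam are placeholders):

```python
import torch

bce = torch.nn.BCELoss()
generated = torch.rand(1, 3, 256, 256)   # generator output (placeholder)
target = torch.rand(1, 3, 256, 256)      # real target image (placeholder)
d_fake = torch.rand(1, 1)                # discriminator score on the fake
lam = 100.0                              # L1 weight (illustrative value)

loss_adv = bce(d_fake, torch.ones_like(d_fake))      # adversarial loss
loss_l1 = torch.mean(torch.abs(generated - target))  # pixel-wise L1 loss
loss_total = loss_adv + lam * loss_l1
```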
CycleGAN
Two generators, two discriminators.
Ga->b translates domain a to b
Gb->a translates domain b to a
Da differentiates between real domain-a images and translated images.
Db does the same for domain b.
Cycle:
A loss penalises the case where an image converted to the other domain and converted back does not equal the original image.
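A sketch of the cycle-consistency term in PyTorch; identity modules stand in for the two generators:

```python
import torch

G_ab = torch.nn.Identity()   # placeholder for the a -> b generator
G_ba = torch.nn.Identity()   # placeholder for the b -> a generator

a = torch.rand(1, 3, 256, 256)                   # image from domain a

# Convert a -> b -> back to a, then penalise any difference
a_back = G_ba(G_ab(a))
loss_cycle = torch.mean(torch.abs(a_back - a))   # L1 cycle loss
```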
Hierarchical Clustering: single linkage
The minimum distance between any two points, one from each of the two clusters.
Hierarchical clustering: Complete Linkage
The maximum distance between any two points, one from each of the two clusters.
Hierarchical Clustering: Average Linkage
The average distance over all pairs of points, one from each cluster.
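In scipy's agglomerative clustering these three criteria are selected via the method argument; a small comparison sketch on toy data:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

X = np.array([[0, 0], [0, 1], [5, 5], [5, 6]])   # toy points

Z_single = linkage(X, method='single')       # merge on min pairwise distance
Z_complete = linkage(X, method='complete')   # merge on max pairwise distance
Z_average = linkage(X, method='average')     # merge on mean pairwise distance
```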