week 2 - CNNs Flashcards
what is a CNN convolutional layer
This layer applies convolutional filters (kernels) to the input data (e.g., an image) to detect local patterns such as edges, textures, or more complex features as the network deepens.
Each filter slides (or “convolves”) over the input image, performing a dot product between the filter and a small local region of the input image. This results in a feature map.
The convolutional operation allows the network to focus on local dependencies (spatial hierarchies), making it more efficient than fully connected networks.
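The sliding dot product described above can be sketched in a few lines of numpy. This is a minimal "valid" 2D cross-correlation (what deep learning libraries call convolution), with an illustrative edge-detecting kernel; the sizes and values are assumptions for demonstration.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D cross-correlation: slide the kernel over the image,
    taking a dot product at each position to build a feature map."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# Illustrative input: left half dark, right half bright,
# with a simple vertical-edge detector kernel.
image = np.zeros((5, 5))
image[:, 2:] = 1.0
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)
fmap = conv2d(image, kernel)  # responds strongly at the dark-to-bright edge
```

Note the feature map is smaller than the input (5x5 in, 3x3 out) because the 3x3 kernel only fits at interior positions.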
explain local dependencies and spatial hierarchies in the convolutional layer
The convolutional operation focuses on local dependencies by looking at small, localized regions of the image and detecting basic features (e.g., edges, textures). This process builds up spatial hierarchies where simple features combine into more complex ones as the network goes deeper. This local, hierarchical approach makes CNNs more efficient than fully connected networks, as they require fewer parameters and computational resources while still being able to learn complex patterns from data.
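The "fewer parameters" claim can be made concrete with a quick back-of-the-envelope count, comparing a fully connected layer to a convolutional layer on the same input. The layer sizes here (224x224 RGB input, 1000 dense units, 64 3x3 filters) are assumed for illustration.

```python
# Fully connected: every pixel connects to every output unit.
h, w, c = 224, 224, 3
fc_units = 1000
fc_params = h * w * c * fc_units            # ~150 million weights

# Convolutional: each filter reuses the same 3x3x3 weights at every position.
conv_filters, k = 64, 3
conv_params = conv_filters * (k * k * c)    # ~1.7 thousand weights
```

Weight sharing is what makes the difference: the conv layer's parameter count is independent of the image size.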
what is the sparsity aspect of the curse of dimensionality?
as the number of dimensions increases, a fixed number of data points becomes increasingly sparse in the space
this makes it harder for ML algorithms to find meaningful patterns, because any given region of the space contains fewer examples
This means that models need more training data to find patterns
It also means that models are more likely to overfit to the data
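The sparsity effect can be seen numerically: with a fixed number of random points, the average distance to each point's nearest neighbour grows as the dimensionality increases. A minimal numpy sketch (point counts and dimensions are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_nn_distance(n_points, dim):
    """Average distance from each point to its nearest neighbour,
    for n_points drawn uniformly from the unit hypercube."""
    pts = rng.random((n_points, dim))
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # ignore each point's distance to itself
    return d.min(axis=1).mean()

low = mean_nn_distance(200, 2)     # 2 dimensions: neighbours are close
high = mean_nn_distance(200, 100)  # 100 dimensions: everything is far apart
```

The same 200 points that densely cover a 2D square are isolated in a 100-dimensional hypercube, which is why models need far more data as dimensionality grows.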
how to solve the curse of dimensionality in image classification by convolving
just take a small portion of the image - e.g. 9 pixels. however, this doesn't work if there's no single portion of the image that contains enough information for a classification
to get around this just do it for different portions of the image to get a feature map
then extract features from the first feature map, and repeat the process to produce a second feature map
this creates a stack of layers where each successive layer captures more abstract information about the original image
because each feature map is spatially smaller than the previous one, each layer condenses information about the original image. however, the smaller feature maps are also deeper (they have more channels)
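The "smaller but deeper" pattern can be traced with a quick shape calculation. This sketch assumes a stack of 3x3 stride-2 convolutions with padding 1 (a common but illustrative choice), which roughly halves the spatial size at each layer while the channel count doubles:

```python
def out_size(n, k=3, stride=2, pad=1):
    """Spatial output size of a convolution (standard formula)."""
    return (n + 2 * pad - k) // stride + 1

# Start from an assumed 64x64 RGB image and apply four conv layers.
shape = (64, 64, 3)
for _ in range(4):
    h, w, c = shape
    shape = (out_size(h), out_size(w), c * 2 if c > 3 else 16)
# spatial size: 64 -> 32 -> 16 -> 8 -> 4, channels: 3 -> 16 -> 32 -> 64 -> 128
```

The final 4x4x128 block is what a small classifier head would consume: far fewer values than the original image, but each one summarises a large region of it.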
why, in a CNN, is the classifier able to work with a significantly reduced quantity of information
because the feature maps contain compressed information, however this information is still reflective of the image as a whole
why do CNNs work with images?
because local neighbourhoods in the image contain similar information
if the pixels were scrambled, the accuracy of CNNs would decrease to the same as an MLP. This is because MLPs don't take into account the ordering of the pixels
what else can you use CNN’s for?
neural style transfer
how does neural style transfer work
you train an image classifier on natural images to recognise patterns in images
then you feed your content image and your style target into it
you have a content loss function and a style loss function
for the content, you extract the features at some point in the middle layers, before they reach the actual classifier. the content feature extraction ensures that the larger features of the image are present in the generated image. content loss = MSE between the feature maps
for the style, feature correlations are computed with a Gram matrix (a correlation matrix). this tells us the correlations between different features in the style image, which captures style/texture, because style is determined by global patterns that are spatially independent. the style loss is then calculated as the difference between the Gram matrices of the generated image and the style image
pixels are then adjusted to minimise a weighted balance of the content loss and the style loss
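The two losses can be sketched in numpy. This is a simplified version assuming the feature maps are already extracted (as `(channels, h, w)` arrays); the loss weights `alpha` and `beta` are arbitrary illustrative values:

```python
import numpy as np

def gram(features):
    """Gram matrix of a (channels, h, w) feature block: channel-by-channel
    correlations, discarding spatial position (hence it captures texture)."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (h * w)

def content_loss(gen_feats, content_feats):
    # Plain MSE: penalises differences at specific spatial locations.
    return np.mean((gen_feats - content_feats) ** 2)

def style_loss(gen_feats, style_feats):
    # MSE between Gram matrices: penalises differences in feature
    # correlations, regardless of where they occur in the image.
    return np.mean((gram(gen_feats) - gram(style_feats)) ** 2)

def total_loss(gen, content, style, alpha=1.0, beta=10.0):
    return alpha * content_loss(gen, content) + beta * style_loss(gen, style)
```

Tuning `alpha` against `beta` is the "balance" mentioned above: more weight on style loss gives a more textured, less faithful result.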
why use the gram matrix in the style loss function, and not MSE
The Gram matrix is used in the style loss function because it captures the correlations between features (texture patterns) across the image, which is essential for transferring the style of one image to another. It measures which features co-occur across the whole image, independent of where they occur, rather than pixel-wise differences.
MSE, on the other hand, is more suited for tasks where the exact values of features (such as pixel intensities or specific activations) are important, which is why it is more commonly used for content loss (where exact pixel similarity is desired).
in neural transfer, what are the parameters of the model?
the pixels of the generated image are themselves the parameters
so updating the parameters is equivalent to changing the pixels
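The idea of optimising pixels directly can be shown with a toy example. Here the "loss" is just a plain MSE to a target image (a stand-in for the real content/style losses, which would need a network and autograd), so its gradient with respect to the pixels is simply `2 * (x - target)`:

```python
import numpy as np

# Toy target image and an all-black starting image (assumed sizes).
target = np.full((8, 8), 0.7)
x = np.zeros((8, 8))

lr = 0.1
for _ in range(100):
    grad = 2 * (x - target)  # gradient of MSE loss w.r.t. the pixels
    x -= lr * grad           # gradient descent on the pixels themselves
```

No network weights change at all during this loop; only the image does, which is exactly how style transfer "training" works.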
what is an application of neural style transfer in neuroscience?
super-resolution MRI scanners are expensive
Hyperfine Swoop scanners are much cheaper; however, the images are worse
you could use deep learning U-net techniques to improve the hyperfine images
what is transposed convolution?
a way to increase the spatial resolution of feature maps (i.e. upsample them)
it is the reverse of standard convolution, which downsamples
you can go from a compressed representation to full-size images
if you add a style loss to the MSE loss when upsampling, you can end up with a much higher-resolution-looking image than MSE alone would give
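The upsampling operation itself can be sketched in numpy: a transposed convolution scatters a kernel-weighted copy of each input pixel into a larger output grid, the reverse of a strided convolution gathering a patch into one output pixel. The kernel and stride here are illustrative choices:

```python
import numpy as np

def transposed_conv2d(x, kernel, stride=2):
    """Upsample x by scattering kernel-weighted copies of each input
    pixel into a larger output grid (overlaps are summed)."""
    ih, iw = x.shape
    kh, kw = kernel.shape
    oh = (ih - 1) * stride + kh
    ow = (iw - 1) * stride + kw
    out = np.zeros((oh, ow))
    for i in range(ih):
        for j in range(iw):
            out[i*stride:i*stride+kh, j*stride:j*stride+kw] += x[i, j] * kernel
    return out

# A 2x2 input becomes a 4x4 output: each pixel expands into a 2x2 block.
x = np.array([[1.0, 2.0], [3.0, 4.0]])
up = transposed_conv2d(x, np.ones((2, 2)), stride=2)
```

With stride equal to the kernel size there is no overlap, so this reduces to nearest-neighbour-style block upsampling; in a real U-Net the kernel weights are learned, letting the network fill in plausible detail rather than just repeating pixels.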