09_convolutional neural networks and computer vision Flashcards
Why are some layers called “hidden layers”?
because it is difficult to tell exactly what happens in them: their activations are neither the input nor the output of the network
What happens in different stages in the layers of a fully connected neural network?
input layer
1) feature extraction (early layers)
2) classification (later layers)
output layer
–> end-to-end learning (no feature engineering needed)
What happens if we linearize/vectorize an input?
the network is unable to leverage the spatial information of a linearized image: neighboring pixels end up in unrelated positions of the vector
What are convolutions?
a mathematical operation that operates on dense data (such as images)
we consider an image I and a “Kernel” K. the “convolution” of I and K results
in an output feature map S
the size of S is constrained by the sizes of the input and the kernel
–> convolutional kernels act as feature filters and are sometimes called “filters” for this exact reason
(they produce high output signals where the input pattern resembles the kernel)
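A minimal sketch of this operation in NumPy (a “valid” convolution, implemented as cross-correlation, which is what deep-learning frameworks actually compute); the names `conv2d`, `I`, and `K` are illustrative, not from the lecture:

```python
import numpy as np

def conv2d(image, kernel):
    # slide the kernel over the image and sum the elementwise products
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1  # O = I - K + 1 per dimension
    out = np.zeros((oh, ow))
    for y in range(oh):
        for x in range(ow):
            out[y, x] = np.sum(image[y:y+kh, x:x+kw] * kernel)
    return out

I = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 "image"
K = np.ones((3, 3)) / 9.0                     # 3x3 averaging kernel
S = conv2d(I, K)
print(S.shape)  # (2, 2)
```

Note how the 4x4 input shrinks to a 2x2 feature map, matching O = I - K + 1.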
What is the sobel filter?
it is constructed to extract edges
(it responds only to the intensity changes that indicate edges)
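A small sketch of the horizontal Sobel kernel responding to a vertical edge; the toy image and the helper `conv2d` are illustrative assumptions:

```python
import numpy as np

# horizontal-gradient Sobel kernel (responds to vertical edges)
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

# toy image: dark left half, bright right half -> one vertical edge
img = np.zeros((5, 5))
img[:, 3:] = 1.0

def conv2d(image, kernel):
    # valid convolution via explicit sliding window
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    return np.array([[np.sum(image[y:y+kh, x:x+kw] * kernel)
                      for x in range(ow)] for y in range(oh)])

edges = conv2d(img, sobel_x)
print(edges)  # zero in the flat region, strong response at the edge
```

The output is zero wherever the intensity is constant and large where the brightness jumps, which is exactly the edge-filter behavior described above.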
On what parameters are convolutional layers based?
- number of input channels
- number of output channels
(affect output feature map size)
- kernel size
- stride
- padding
What is a channel?
a two-dimensional feature map
What is multi-channel data?
stacks of feature maps
(eg with different colors)
we use one kernel per channel (the kernels don’t have to be identical)
What does the kernel size define?
the size of the kernel operator - affects the size of the output of the convolution
How can we calculate the output feature map in regards to the kernel size?
O = I - K + 1
O = output feature map size
K = kernel size
I = input feature map size
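The formula above can be checked with a one-line helper (the function name is an illustrative choice):

```python
def conv_out_size(i, k):
    # valid convolution, stride 1, no padding: O = I - K + 1
    return i - k + 1

print(conv_out_size(28, 3))  # 26
print(conv_out_size(28, 5))  # 24
```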
What is the receptive field of a kernel?
the region of the input a kernel can see; it depends on the kernel size, which determines the size of the features it is able to detect
–> large kernels are able to identify larger and more complex features such as an eye: they have more CONTEXT
What is a stride in CNNs?
step size of a kernel
(default is 1)
If we choose a stride >1, it reduces the size of the output feature map
–> O = (I - K) / S + 1 (rounded down)
S = stride
Why do we need padding and what is it?
convolutions shrink feature maps; padding counteracts this
–> padding the input map with rows and columns of zeros
O = (I - K + 2P) / S + 1
P = padding
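The general output-size formula with stride and padding, as a small sketch (names are illustrative; the floor division matches how frameworks round the result down):

```python
def conv_out_size(i, k, s=1, p=0):
    # O = floor((I - K + 2P) / S) + 1
    return (i - k + 2 * p) // s + 1

# "same" padding for an odd kernel at stride 1: P = (K - 1) // 2
print(conv_out_size(32, 3, s=1, p=1))  # 32: size preserved
print(conv_out_size(32, 3, s=2, p=1))  # 16: stride 2 halves the map
```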
What is a huge advantage of convolutional layers regarding parameters?
because the kernel parameters are learned by the model itself and shared over the entire input feature map,
the number of parameters of the network is vastly reduced
How many parameters does a convolutional layer have?
K^2 * C(in) * C(out)
(plus C(out) bias terms if biases are used)
the number of parameters is therefore independent of the input feature map size, as opposed to linear layers
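A quick sketch of this parameter count (the function name and the 64-to-128-channel example are illustrative):

```python
def conv_params(k, c_in, c_out, bias=False):
    # each output channel has its own K x K kernel per input channel
    n = k * k * c_in * c_out
    return n + (c_out if bias else 0)

# a 3x3 convolution from 64 to 128 channels:
print(conv_params(3, 64, 128))  # 73728, regardless of input resolution
```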