Quiz 2 Flashcards
Receptive fields
Each node only receives input from a K₁ × K₂ window (image patch).
The region from which a node receives its input is called its receptive field.
Shared Weights
Nodes in different locations can share features.
Uses the same weights/parameters in the computation graph.
- Reduce parameters to (K₁ × K₂ + 1)
- Explicitly maintain spatial information
Learning Many Features
Weights are not shared across different feature extractors.
Reduce parameters to (K₁ × K₂ + 1) · M, where M is the number of features to be learned
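A quick sanity check of these counts, as a minimal PyTorch sketch (the 3×3 kernel size and M = 16 are arbitrary illustration values):

```python
import torch.nn as nn

# One 3x3 filter over one input channel: K1*K2 + 1 = 10 parameters (the +1 is the bias).
conv_single = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3)
print(sum(p.numel() for p in conv_single.parameters()))  # 10

# M = 16 filters: (K1*K2 + 1) * M = 160 parameters.
conv_many = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3)
print(sum(p.numel() for p in conv_many.parameters()))  # 160
```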
Convolution
In mathematics, a convolution is an operation on two functions f and g that produces a third function, typically viewed as a modified version of one of the originals. It gives the area of overlap between the two functions as a function of how far one of them is translated.
T or F: Convolutions are linear operations
True
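This can be checked numerically; a minimal sketch assuming a bias-free filter (with a bias the operation is affine, not strictly linear):

```python
import torch
import torch.nn.functional as F

x, y = torch.randn(1, 1, 8, 8), torch.randn(1, 1, 8, 8)
w = torch.randn(1, 1, 3, 3)
a, b = 2.0, -0.5

# Linearity: conv(a*x + b*y) == a*conv(x) + b*conv(y)
lhs = F.conv2d(a * x + b * y, w)
rhs = a * F.conv2d(x, w) + b * F.conv2d(y, w)
print(torch.allclose(lhs, rhs, atol=1e-5))  # True
```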
What are CNN hyperparameters
- in_channels (int): Number of channels in the input image
- out_channels (int): Number of channels produced by the convolution
- kernel_size (int; tuple): Size of the convolving kernel
- stride (int; tuple; optional): Size of the stride used by the convolution (default is 1)
- padding (int; tuple; optional): Zero padding added to both sides of the input (default is 0)
- padding_mode (string): 'zeros', 'reflect', 'replicate', 'circular' (default 'zeros')
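These are the arguments of PyTorch's nn.Conv2d; a minimal sketch with arbitrary example values:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(
    in_channels=3,         # e.g. an RGB image
    out_channels=16,       # number of filters / output feature maps
    kernel_size=3,         # 3x3 kernel; a tuple like (3, 5) also works
    stride=1,              # default
    padding=1,             # zero-pad one pixel on each side
    padding_mode='zeros',  # default
)
x = torch.randn(1, 3, 32, 32)
print(conv(x).shape)  # torch.Size([1, 16, 32, 32]) -- "same" spatial size here
```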
Output size formula for a vanilla convolution operation
Specifically, each output spatial dimension is:
output_dim = (N - F + 2P) / S + 1
N: Input dimension
F: Filter dimension
P: Padding
S: Stride
The formula is applied to width and height separately; the number of output channels equals the number of filters. The +1 accounts for the filter's starting position (it is not a bias term).
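A small helper that evaluates the formula, checked against an actual convolution (a sketch; conv_output_size and all sizes are illustrative choices):

```python
import torch
import torch.nn as nn

def conv_output_size(n, f, p=0, s=1):
    """One output spatial dimension of a vanilla convolution (floor division)."""
    return (n - f + 2 * p) // s + 1

print(conv_output_size(n=32, f=5))  # 28

conv = nn.Conv2d(1, 1, kernel_size=5)
print(conv(torch.randn(1, 1, 32, 32)).shape)  # torch.Size([1, 1, 28, 28])
```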
โvalidโ convolution
where the kernel fits inside the image
T or F: Larger the filter the smaller the shrinkage
False. Larger filter = larger shrinkage.
โSameโ convolution
zero-padding the image borders to produce an output the same size as the raw input
CNN: Max pooling
For each window, calculate its max.
Pros: No parameters to learn.
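A minimal nn.MaxPool2d sketch (window size and input shape are arbitrary); note the empty parameter list:

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)
x = torch.randn(1, 16, 32, 32)
print(pool(x).shape)  # torch.Size([1, 16, 16, 16]) -- depth (16 channels) unchanged
print(sum(p.numel() for p in pool.parameters()))  # 0 -- nothing to learn
```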
CNN: Stride
How far the filter moves at each step as it slides over the input.
CNN: Pooling layer
Make the representations smaller and more manageable through downsampling.
Only pools width and height, not depth
CNN: Cross-correlation
Takes the dot product of a small filter (also called a kernel or weights) and an overlapping region of the input image or feature map.
Unlike a true convolution, the kernel is not flipped.
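A from-scratch sketch of 2-D cross-correlation (cross_correlate2d is an illustrative name; note the kernel is used as-is, with no flip):

```python
import torch

def cross_correlate2d(image, kernel):
    """Slide the kernel over the image, taking a dot product at each position."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = torch.zeros(H - kh + 1, W - kw + 1)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (image[i:i + kh, j:j + kw] * kernel).sum()
    return out

print(cross_correlate2d(torch.randn(5, 5), torch.randn(3, 3)).shape)  # torch.Size([3, 3])
```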
CNN: T or F - Using a stride greater than 1 results in loss of information.
True. Stride > 1 implies jumping over some pixels.
CNN: Output size of vanilla/valid convolution vs full convolution
Vanilla: m-k+1
Full: m+k-1
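Both sizes are easy to verify in 1-D with numpy.convolve (a sketch; m = 7 and k = 3 are arbitrary; np.convolve flips the kernel, but only the output sizes matter here):

```python
import numpy as np

signal = np.arange(7.0)  # m = 7
kernel = np.ones(3)      # k = 3

print(np.convolve(signal, kernel, mode='valid').size)  # 5 = m - k + 1
print(np.convolve(signal, kernel, mode='full').size)   # 9 = m + k - 1
```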
CNN: Benefit of pooling
Makes the representation invariant to small changes in the input
CNN: Full convolution
Enough zeros are added to the borders so that every pixel is visited k times in each direction, producing an output of size m+k-1.
Full = Bigger size than original
Sigmoid
Min = 0; max = 1
Output is always positive
Saturates at both ends
Gradients vanish at both ends (as the output converges to 0 or 1, the gradient approaches zero); the gradient is always positive
Computational complexity is high due to the exponential term
tanh
Min = -1; max = 1; zero-centred
Saturates at both ends (-1, 1)
Gradients vanish at both ends; the gradient is always positive
Medium complexity: tanh is not as simple as, say, multiplication
ReLU
Min = 0; max = ∞; output is never negative
Not saturated on the positive side
Gradients: 0 when x <= 0 (the "dead ReLU" regime); constant otherwise (they don't vanish, which is good)
Cheap: computation doesn't come much easier than a max function
T or F: ReLU is differentiable
Technically no: it is not differentiable at zero (but is differentiable everywhere else); in practice a subgradient such as 0 is used at that point.
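The gradient behaviour of all three activations can be probed with autograd (a sketch; the probe points are arbitrary):

```python
import torch

x = torch.tensor([-10.0, -1.0, 0.0, 1.0, 10.0], requires_grad=True)

for name, fn in [("sigmoid", torch.sigmoid), ("tanh", torch.tanh), ("relu", torch.relu)]:
    grad, = torch.autograd.grad(fn(x).sum(), x)
    print(name, grad)

# sigmoid and tanh gradients are tiny at +/-10 (saturation);
# ReLU's gradient is 0 for x <= 0 and exactly 1 for x > 0.
```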
Initialization: What happens if you initialize close to a bad local minima
Poor gradient flow
Initialization: What happens if you initialize with large activations
Reach saturation quickly
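A sketch of that failure mode: over-large initial weights drive a tanh layer straight into saturation (std = 5.0 is a deliberately bad, illustrative choice):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(100, 100)
nn.init.normal_(layer.weight, std=5.0)  # deliberately far too large

x = torch.randn(32, 100)
a = torch.tanh(layer(x))
print((a.abs() > 0.99).float().mean())  # most activations are pinned near +/-1
```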