Deep Learning Flashcards
data for convolutional networks
grid-like topology (1D time series and 2D images)
distinguishing feature of convolutional networks
CNNs use convolution (and not matrix multiplication) in some layer
convolution function
integral of the product of two functions (after one is reversed and shifted)
(f * g)(t) = ∫ f(a)g(t-a) da
think of f as a measurement and g as a weighting function that values the most recent measurements
parts of convolution
main function: input, an n-dimensional array of data
weighting function: kernel, an n-dimensional array of parameters to adjust
output: feature map
computational features of convolutional networks
- sparse interactions - kernel usually much smaller than input
- tied weights - same set of weights applied throughout the input
- equivariant to translation - convolution will give same result if input is translated. An event detector on a time series will find same event if it’s moved.
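The equivariance property can be checked directly: translating the input translates the output by the same amount. A sketch with an illustrative edge-detector kernel:

```python
import numpy as np

# A simple "event detector" kernel and a signal containing one edge.
kernel = np.array([1.0, -1.0])
x = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 0.0])
x_shift = np.roll(x, 1)  # translate the event one step later in time

y = np.convolve(x, kernel, mode="valid")
y_shift = np.convolve(x_shift, kernel, mode="valid")
# y_shift is just y translated by the same one step:
# the detector finds the same event at the new location.
```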
stacked convolutional layers
receptive fields of deeper units are larger (but more indirect) than the receptive fields of shallower units
if layer 2 has a kernel width of 3, then each hidden unit receives input from 3 units.
if layer 3 also has a kernel width of 3, each of its units receives indirect input from 5 inputs with stride 1, since the width-3 receptive fields overlap (it would be 9 only if the fields did not overlap, e.g. with stride 3)
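The growth pattern is simple arithmetic: with stride 1, each layer of kernel width k widens the receptive field by (k − 1), and strides multiply the step between samples. A hypothetical helper (not from the cards) that computes this:

```python
# Receptive field of stacked convolutions.
def receptive_field(kernel_widths, strides=None):
    strides = strides or [1] * len(kernel_widths)
    rf, jump = 1, 1
    for k, s in zip(kernel_widths, strides):
        rf += (k - 1) * jump  # each layer widens the field by (k-1)*jump
        jump *= s             # stride multiplies the step between samples
    return rf
```

Two stride-1 layers of width 3 give a receptive field of 5; with non-overlapping stride-3 fields it would be 9.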
stages of a convolutional layer
- Convolution stage: convolutions produce a set of linear activations
- Detector stage: Nonlinear function on linear activations
- Pooling stage: Replace output at some location with a summary statistic of nearby units
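The three stages can be sketched in 1-D with NumPy (illustrative values, not a real framework layer):

```python
import numpy as np

x = np.array([0.0, 1.0, 3.0, 2.0, 0.0, 1.0, 4.0, 1.0, 0.0])
kernel = np.array([1.0, -1.0])

z = np.convolve(x, kernel, mode="valid")  # 1. convolution: linear activations
a = np.maximum(z, 0.0)                    # 2. detector: nonlinearity (ReLU)
pooled = a.reshape(-1, 2).max(axis=1)     # 3. pooling: max over width-2 regions
```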
pooling and translation
small changes in location won’t make big changes to the summary statistics in the regions that are pooled together
pooling makes network invariant to small translations
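A quick sketch of that invariance: shifting a peak by one position inside a max-pooling window leaves the pooled summary unchanged (signal and window width are illustrative):

```python
import numpy as np

x = np.array([0.0, 5.0, 0.0, 0.0, 0.0, 0.0])
x_shifted = np.roll(x, 1)  # the peak moves from index 1 to index 2

# Max pooling over non-overlapping width-3 windows.
pool = lambda v: v.reshape(-1, 3).max(axis=1)
# pool(x) and pool(x_shifted) are identical: the shift stays inside a window.
```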
what convolution hard codes
the concept of a topology
(non convolutional models would have to discover the topology during learning)
local connection (as opposed to convolution)
like a convolution with a kernel width (patch size) of n, except with no parameter sharing.
each unit has a receptive field of n, but the incoming weights don’t have to be the same in every receptive field.
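The difference is easiest to see in the parameter counts: convolution reuses one weight vector everywhere, while a locally connected layer learns a separate one per position. A sketch with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=8)
k = 3                      # kernel width / patch size
n_out = len(x) - k + 1

# Convolution: one shared weight vector applied at every position (k params).
w_shared = rng.normal(size=k)
conv_out = np.array([x[i:i + k] @ w_shared for i in range(n_out)])

# Locally connected: same receptive fields, but a separate weight vector
# per position (k * n_out params, no sharing).
w_local = rng.normal(size=(n_out, k))
local_out = np.array([x[i:i + k] @ w_local[i] for i in range(n_out)])
```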
iterated pixel labelling
suppose a convolution step provides a label for each pixel. repeatedly applying the convolution to the labels creates a recurrent convolutional network.
repeated convolutional layers with shared weights across layers is a kind of recurrent network.
why convolutional networks can handle different input sizes
the same kernel is applied at every position, so convolution works on any input size; the feature map simply scales with the input. pooling over regions whose size scales with the input can then produce a fixed-size output.
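A sketch of that idea: the feature map length varies with the input, but pooling over a fixed number of regions gives a fixed-size result (function name and region count are illustrative):

```python
import numpy as np

kernel = np.array([1.0, -1.0, 0.5])

def fixed_size_features(x, n_regions=4):
    fmap = np.convolve(x, kernel, mode="valid")  # length scales with input
    # Max-pool over n_regions chunks whose width scales with the input.
    return np.array([chunk.max() for chunk in np.array_split(fmap, n_regions)])

rng = np.random.default_rng(1)
short = fixed_size_features(rng.normal(size=20))
long_ = fixed_size_features(rng.normal(size=57))
# Both outputs have shape (4,) despite the different input lengths.
```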
convolutions for 2D audio
convolutions over time: invariant to shifts in time
convolutions over frequency: invariant to changes in frequency.
primary visual cortex
- V1 has a 2D structure matching the 2D structure of retinal image
- Simple cells inspired detectors in CNNs and respond to features in small localized receptive fields
- Complex cells inspired pooling units. They also respond to features but are invariant to small changes in input position.
- Inferotemporal cortex responds like last layer in CNN
differences between human vision and convolutional networks
- Human vision is low resolution outside of the fovea. CNNs have full resolution over the whole image
- Vision integrates with other senses
- Top down processing happens in human system
- Human neurons likely have different activation and pooling functions
regularization
- Modifications to training regime to prevent overfitting
- Increasing training error for reduced testing error
- Trading increased bias for reduced variance
dataset augmentation strategies
- Adding transformations to training input (e.g., translating images a few pixels)
- Adding random noise to input data
- Model needs to find regions insensitive to small perturbations
- Not just a local minimum but a local plateau
- Adversarial training
- Create inputs that the network will probably misclassify
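The first two strategies can be sketched in a few lines: random small translations plus additive noise applied to a batch of images (shift range and noise scale are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(images, max_shift=2, noise_std=0.05):
    out = []
    for img in images:
        # translate by a few pixels in each direction
        dx, dy = rng.integers(-max_shift, max_shift + 1, size=2)
        shifted = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
        # add random noise
        out.append(shifted + rng.normal(0.0, noise_std, img.shape))
    return np.stack(out)

batch = rng.normal(size=(8, 28, 28))
aug = augment(batch)  # same shape, perturbed content
```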