Quiz 3 - CNN Architecture, Visualization, Advanced CV Architecture Flashcards
T/F: Visualization makes assessing interpretability easy
False
- Visualization leads to some interpretable representations, but they may be misleading or uninformative
- Assessing interpretability is difficult
- Requires user studies to show usefulness
- Neural networks learn distributed representations
- no one node represents a particular feature
- makes interpretation difficult
Steps to obtaining Gradient of Activation with respect to input
- Pick a neuron
- Run the forward pass up to the layer we care about
- Find the gradient of its activation w.r.t. the input image
- Can first find the image patches that most strongly activate that neuron (based on its receptive field); see the sketch below
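A minimal PyTorch sketch of these steps, assuming a torchvision AlexNet and an arbitrary (illustrative) choice of layer, channel, and spatial position:

```python
import torch
from torchvision import models

# Minimal sketch: gradient of one neuron's activation w.r.t. the input image.
model = models.alexnet(weights=None).eval()
img = torch.randn(1, 3, 224, 224, requires_grad=True)  # stand-in for a real image

# Run the forward pass only up to the layer we care about.
layer_idx, channel, y, x = 3, 5, 10, 10  # arbitrary neuron choice for illustration
feats = img
for i, layer in enumerate(model.features):
    feats = layer(feats)
    if i == layer_idx:
        break

# Gradient of that single activation w.r.t. the input image.
feats[0, channel, y, x].backward()
grad = img.grad  # same shape as the input; large magnitudes = influential pixels
```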
T/F: A single-pixel change can make a NN wrong
True (single-pixel attacks)
Shape vs. Texture Bias
- Ex: take a picture of a cat and apply the texture of an elephant
- Humans are biased towards shape (will see a cat)
- Neural networks are biased towards texture (will likely classify the cat as an elephant)
Estimation Error
Even the weights that minimize training error may not generalize to the test set (e.g., due to overfitting or non-generalizable features in the training data)
Limitations to Transfer Learning
- If the source dataset you pre-train on is very different from the target dataset, the learned features transfer poorly
- If you already have enough data for the target domain, pre-training mostly just gives faster convergence
____ can be used to detect dataset bias
Gradient-based visualizations
Saliency Maps
- Shows us what the neural network may find important in the input
- Sensitivity of the loss to individual pixel changes
- Large sensitivity implies important pixels (see the sketch below)
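A minimal PyTorch sketch, using the gradient of a class score as the sensitivity measure (model choice and class index are illustrative):

```python
import torch
from torchvision import models

# Minimal saliency-map sketch: sensitivity of a class score to each input pixel.
model = models.resnet18(weights=None).eval()
img = torch.randn(1, 3, 224, 224, requires_grad=True)  # stand-in for a real image
target_class = 243  # illustrative class index

score = model(img)[0, target_class]
score.backward()

# Saliency: max absolute gradient over color channels, one value per pixel.
saliency = img.grad.abs().max(dim=1)[0]  # shape (1, 224, 224)
```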
What is non-semantic shift for label data?
Two images of the same category, but presented differently
Ex: Two pictures of a bird, but one is a photograph and one is a sketch
T/F: CNNs have scale invariance
True, but only partial scale invariance
Low-labeled setting: Domain generalization
- Source: multiple labeled
- Target: unknown
- Shift: non-semantic
T/F: For larger networks, estimation error can increase
True - with a small amount of data and a large number of parameters, we could overfit
Backward Pass: Deconvnet
- Pass back only the positive gradients
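A minimal sketch of the deconvnet ReLU backward rule as a custom PyTorch autograd Function (the class name is illustrative, not a standard API):

```python
import torch

# Deconvnet rule: on the backward pass, keep only the positive part of the
# incoming gradient, ignoring the forward-pass ReLU mask.
class DeconvReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out.clamp(min=0)  # pass back only the positive gradients

# usage: y = DeconvReLU.apply(x) in place of a normal ReLU when visualizing
```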
AlexNet - Key aspects
- ReLU instead of sigmoid/tanh
- Specialized normalization layers
- PCA-based data augmentation
- Dropout
- Ensembling
Gram Matrix
- Take a pair of channels in a layer's feature map
- Compute the correlation (dot product) between those two channels' features, summed over spatial positions
- Collect these values into a larger matrix (the Gram matrix) giving the correlations between all pairs of channels
- Compute a Gram-matrix (style) loss between the style image and the generated image
- Compute a content loss (feature-map difference) between the content image and the generated image
- Sum the losses with weights (alpha, beta) controlling the proportion of the total loss contributed by each term (see the sketch below)
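A minimal PyTorch sketch of a Gram matrix and style loss (shapes, normalization, and the MSE choice are illustrative):

```python
import torch
import torch.nn.functional as F

def gram_matrix(feat):
    # feat: (C, H, W) feature map from one layer -> (C, C) matrix of channel correlations
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return (f @ f.t()) / (h * w)  # dot product between every pair of channels, normalized

# Style loss: compare Gram matrices of the style image and the generated image.
style_feat = torch.randn(64, 32, 32)                    # stand-in style-image features
gen_feat = torch.randn(64, 32, 32, requires_grad=True)  # stand-in generated-image features
style_loss = F.mse_loss(gram_matrix(gen_feat), gram_matrix(style_feat))

# Total style-transfer loss mixes content and style terms with weights alpha and beta:
# total_loss = alpha * content_loss + beta * style_loss
```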
Low-labeled setting: Semi-supervised learning
- Source: single labeled (usually much less)
- Target: single unlabeled
- Shift: none
Low-labeled setting: Cross-category transfer
- Source: single labeled
- Target: single unlabeled
- Shift: semantic
T/F: We can generate images from scratch using gradients to obtain an image with maximized score for a given class?
True - Image optimization
Creating alternating layers in a CNN (convolution/non-linearity, pooling, and fully connected layers at the end) results in a ________ receptive field.
It results in an increasing receptive field: a pixel deep inside the network depends on a growing region of the input image.
What is the problem for visualization in modern Neural Networks?
Modern networks use small filters such as 3x3
Small convolution kernels and their outputs are hard to interpret
Increasing the depth of a NN leads to ___ error (higher/lower)
higher - deeper plain networks are harder to optimize (but this can be mitigated with residual blocks/skip connections)
Since the outputs of convolution and pooling layers are ______ we can __________ them
Since the outputs of convolution and pooling layers are (multi-channel) images, we can sequence them just like any other layer (see the sketch below)
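A minimal PyTorch illustration of that idea, chaining conv/pool layers like any other layers (channel sizes are arbitrary):

```python
import torch.nn as nn

# Because each conv/pool output is itself a multi-channel image,
# layers can simply be stacked one after another.
net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
)
```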
What is semantic shift for labeled images?
The two images show different things (different object categories)
Most parameters in the ___ layer of a CNN
Fully Connected Layer - parameters = (input dimensionality × output dimensionality) + biases
Normal backpropagation is not always the best choice for gradient-based visualizations because…?
- The gradient includes parts of the image that decrease the feature activation
- There are likely many such input pixels
Grad-CAM
- Feed the image through the CNN (only the convolutional part) to get the last convolutional feature map (the most abstract features, closest to the classification end of the network)
- Follow the CNN with any task-specific network (classification, question answering, etc.)
- Backpropagate from the task output back to that last convolutional layer
- This gives gradients with the same spatial size as the feature maps
- Obtain a per-channel weighting (global average pooling of each channel of the gradient) for neuron importance, then normalize
- Multiply the feature maps by their weightings and sum over channels
- Feed the result through a ReLU to keep only positive contributions
- In the final result, important regions have higher values (see the sketch below)
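A minimal PyTorch Grad-CAM sketch (the ResNet-18 backbone, hook target, and class index are illustrative assumptions):

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=None).eval()
img = torch.randn(1, 3, 224, 224)  # stand-in for a real image
feats, grads = {}, {}

def save_feats(module, inp, out):
    feats['a'] = out                                 # last conv feature map
    out.register_hook(lambda g: grads.update(g=g))   # its gradient on the backward pass

model.layer4.register_forward_hook(save_feats)       # last conv block of ResNet-18

score = model(img)[0, 243]  # illustrative target class
score.backward()

weights = grads['g'].mean(dim=(2, 3), keepdim=True)  # per-channel weights: global average pool
cam = F.relu((weights * feats['a']).sum(dim=1))      # weighted sum, keep only positive evidence
cam = cam / (cam.max() + 1e-8)                       # normalize so important regions are near 1
```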
VGG - Key Aspects
- Repeating particular blocks of layers
- 3x3 conv with small strides
- 2x2 max pooling stride 2
- Very large number of parameters
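A minimal PyTorch sketch of the repeating VGG pattern (the helper name and channel counts are illustrative):

```python
import torch.nn as nn

# Repeated VGG block: a stack of 3x3 convs followed by 2x2 max pooling with stride 2.
def vgg_block(in_ch, out_ch, n_convs):
    layers = []
    for _ in range(n_convs):
        layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                   nn.ReLU(inplace=True)]
        in_ch = out_ch
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

# e.g. the first two stages of a VGG-16-style network
stem = nn.Sequential(vgg_block(3, 64, 2), vgg_block(64, 128, 2))
```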
Convolution layers have the property of _____ and output has the property of _______
(choose translation equivariance or invariance for each)
Convolution layers have the property of translation equivariance and output has the property of invariance
Note: Some rotation invariance and scale invariance (only some)
Visualizing Neural Network Methods
- Weights (kernels)
- See what edges are detected in kernels
- Activations
- What the image representation looks like at a given activation layer
- Gradients
- Assess what is used for the optimization itself
- Robustness
- See what the weaknesses/biases of the NN are
The gradient of the Convolution layer Kernel is equivalent to the _________
Cross-correlation between the upstream gradient and the input (cropped to a K1 x K2 output)
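In equation form (assuming a single-channel, stride-1 layer computing the cross-correlation $y = x \star k$):

$$\frac{\partial L}{\partial k_{a,b}} = \sum_{i,j} \frac{\partial L}{\partial y_{i,j}}\, x_{i+a,\, j+b}$$

i.e. sliding the upstream gradient over the input, which yields exactly a $K_1 \times K_2$ output.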
Defenses for adversarial attacks
- training with adversarial examples
- perturbations, noise, or re-encoding of inputs
- there are no universal methods to prevent attacks
T/F: Computer vision segmentation algorithms can be applied directly to gradients to get image segments
True
Exploring the space of possible architecture (methods)
- Evolutionary Learning and Reinforcement Learning
- Prune over-parameterized networks
- Learning of repeated blocks is typical
The gradient of the loss with respect to the input image is equivalent to ____
Convolution between the upstream gradient and the kernel
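In equation form, with the same single-channel, stride-1 setup as above ($y = x \star k$):

$$\frac{\partial L}{\partial x_{i,j}} = \sum_{a,b} \frac{\partial L}{\partial y_{i-a,\, j-b}}\, k_{a,b}$$

i.e. a (zero-padded) convolution of the upstream gradient with the kernel, equivalent to cross-correlation with the 180°-flipped kernel.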
Backward Pass:
Guided Backpropagation
- Zero out gradient for negative values in forward pass
- Zero out negative gradients
- Only propagate positive influence
- Like a combination of backprop and deconvnet
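A minimal sketch of the guided-backprop ReLU rule as a custom PyTorch autograd Function (the class name is illustrative, not a standard API):

```python
import torch

# Guided backprop: zero gradients both where the forward input was negative
# (standard ReLU rule) and where the incoming gradient is negative (deconvnet rule).
class GuidedReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_out):
        x, = ctx.saved_tensors
        return grad_out.clamp(min=0) * (x > 0).float()  # only positive influence flows back

# usage: y = GuidedReLU.apply(x) in place of a normal ReLU when visualizing
```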
Gradient Ascent
- Compute the gradient of the score for a particular class with respect to the input image
- Add the learning rate times gradient to maximize score (not subtracting)
- Algorithm
- Start from random/zero image
- Compute forward pass
- Compute gradients
- Perform Ascent
- Iterate
- Note: Uses raw class scores (pre-softmax) so the optimization does not simply suppress the other classes' scores
- Need regularization as well
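A minimal PyTorch sketch of the gradient-ascent loop (model, class index, step size, and L2 regularization weight are illustrative):

```python
import torch
from torchvision import models

# Synthesize an input that maximizes one class score via gradient ascent.
model = models.resnet18(weights=None).eval()
img = torch.zeros(1, 3, 224, 224, requires_grad=True)  # start from a zero image
target_class, lr, weight_decay = 243, 1.0, 1e-4

for _ in range(100):
    score = model(img)[0, target_class]       # raw class score, not softmax
    reg = weight_decay * img.pow(2).sum()     # simple L2 regularization on the image
    (score - reg).backward()
    with torch.no_grad():
        img += lr * img.grad                  # ascend: add the gradient to maximize the score
        img.grad.zero_()
```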
How do we represent similarity in terms of textures?
- Should remove most spatial information
- Key idea revolves around summary statistics
- Gram Matrix
- feature correlations
We can take the activations of any layer (FC, conv, etc.) and perform _____________
- dimensionality reduction
- often to reduce to two dimensions for plotting
- PCA
- t-SNE (most common)
- non-linear mapping that preserves pair-wise distances
- good for visualizing decision boundaries (especially non-linear ones)
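A minimal scikit-learn sketch of reducing a layer's activations to two dimensions for plotting (the activation array is a random stand-in):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# 'acts' stands in for activations collected from any layer (FC, conv, ...),
# one row per input example.
acts = np.random.randn(500, 256)

acts_pca = PCA(n_components=2).fit_transform(acts)    # linear projection to 2-D
acts_tsne = TSNE(n_components=2).fit_transform(acts)  # non-linear, preserves pairwise structure
```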