Quiz 3 - CNN Architecture, Visualization, Advanced CV Architecture Flashcards

Question

Normal backpropagation is not always the best choice for gradient-based visualizations because...?

Answer 1

* You may get parts of the image that decrease the feature activation
* There are likely lots of these input pixels

Answer 2

1. Feed the image through the CNN (convolutional part only) to get the last convolutional feature map (the most abstract features, closest to classification in the network)
2. Follow the CNN with any task-specific network (classification, question answering)
3. Backprop until the convolution layer
   1. Obtain gradients the same size as the original feature maps
   2. Obtain a per-channel weighting (global average pooling for each channel of the gradient) for neuron importance, then normalize
4. Multiply the feature maps by their weighting
5. Feed through ReLU to keep only positive features
6. Final result: regions that are important will have higher values
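The weighting steps above can be sketched in NumPy. This is a minimal illustration, not a reference implementation: `grad_cam`, `A`, and `dA` are hypothetical names, the random arrays stand in for real activations and gradients, and scaling the final map to [0, 1] is an assumed display choice.

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """feature_maps, gradients: arrays of shape (C, H, W)."""
    # Per-channel importance: global average pooling of the gradients.
    weights = gradients.mean(axis=(1, 2))                 # shape (C,)
    # Weighted sum of the feature maps over channels.
    cam = np.tensordot(weights, feature_maps, axes=1)     # shape (H, W)
    # ReLU keeps only features with positive influence on the class.
    cam = np.maximum(cam, 0.0)
    # Scale to [0, 1] for display (guard against an all-zero map).
    peak = cam.max()
    return cam / peak if peak > 0 else cam

rng = np.random.default_rng(0)
A = rng.random((8, 7, 7))             # stand-in for last conv activations
dA = rng.standard_normal((8, 7, 7))   # stand-in for d(score)/dA
heatmap = grad_cam(A, dA)
```

In practice the heatmap is upsampled to the input resolution and overlaid on the image.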

Answer 3

* Repeating particular blocks of layers
* 3x3 conv with small strides
* 2x2 max pooling, stride 2
* Very large number of parameters

Answer 4

Convolution layers have the property of _translation equivariance_, and the output has the property of _invariance_. Note: there is only some rotation invariance and scale invariance.

Answer 5

* Weights (kernels)
  * See what edges are detected in the kernels
* Activations
  * What does the image look like in the activation layer
* Gradients
  * Assess what is used for the optimization itself
* Robustness
  * See what the weaknesses/biases of the NN are

Answer 6

Cross-Correlation between the upstream gradient and input (until K₁xK₂ output)

Answer 7

* Training with adversarial examples
* Perturbations, noise, or re-encoding of inputs
* There are *no* universal methods to prevent attacks

Answer 8

* Evolutionary learning and reinforcement learning
* Prune over-parameterized networks
* Learning of repeated blocks is typical

Answer 9

Convolution between the upstream gradient and the kernel
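Both backward-pass rules for a convolution layer (cross-correlation of the upstream gradient with the input for the kernel gradient, convolution of the upstream gradient with the kernel for the input gradient) can be sketched for a single channel. This assumes stride 1 and no padding in the forward pass; `cross_correlate` and `conv_backward` are illustrative names.

```python
import numpy as np

def cross_correlate(x, k):
    """'Valid' 2-D cross-correlation (the forward pass of a conv layer)."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def conv_backward(x, k, dy):
    """Gradients given dy = dL/dy, where y = cross_correlate(x, k)."""
    # dL/dk: cross-correlate the input with the upstream gradient (K1 x K2 output).
    dk = cross_correlate(x, dy)
    # dL/dx: full convolution of the upstream gradient with the kernel,
    # i.e. zero-pad dy and cross-correlate with the 180-degree-flipped kernel.
    kh, kw = k.shape
    dy_padded = np.pad(dy, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    dx = cross_correlate(dy_padded, k[::-1, ::-1])
    return dk, dx
```

A finite-difference check on small random arrays is a quick way to confirm that both gradients have the right shape and values.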

Answer 10

* Zero out the gradient for negative values in the forward pass
* Zero out negative gradients
* Only propagate positive influence
* Like a combination of backprop and deconvnet
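The difference between the three ReLU backward rules can be sketched side by side. This is a toy illustration on a single ReLU; `relu_backward_variants` is a hypothetical helper, not a library function.

```python
import numpy as np

def relu_backward_variants(x, grad_out):
    """x: input to the ReLU in the forward pass; grad_out: upstream gradient."""
    forward_mask = x > 0            # where the forward activation was positive
    positive_grad = grad_out > 0    # where the upstream gradient is positive
    backprop = grad_out * forward_mask                 # standard backprop
    deconvnet = grad_out * positive_grad               # deconvnet rule
    guided = grad_out * forward_mask * positive_grad   # guided backprop: both masks
    return backprop, deconvnet, guided

x = np.array([1.0, -2.0, 3.0, -4.0])
g = np.array([0.5, 0.5, -0.5, -0.5])
backprop, deconvnet, guided = relu_backward_variants(x, g)
```

Only the first element survives under guided backprop here, because it is the only position with both a positive forward value and a positive upstream gradient.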

Answer 11

* Compute the gradient of the score for a particular class with respect to the input image
* Add the learning rate times the gradient to maximize the score (not subtracting)
* Algorithm
  * Start from a random/zero image
  * Compute forward pass
  * Compute gradients
  * Perform ascent
  * Iterate
* Note: uses scores to avoid minimizing other class scores
* Needs regularization as well
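The ascent loop can be sketched with a toy linear scorer standing in for the network, so that the gradient of the class score is available in closed form. Everything here is an assumption for illustration: `class_visualization`, the matrix `W`, and the L2 penalty used as the regularizer. With a real CNN the gradient would come from a forward and backward pass instead.

```python
import numpy as np

def class_visualization(W, target, steps=200, lr=0.1, l2=0.05):
    """Gradient ascent on the input of a toy linear scorer: scores = W @ x."""
    x = np.zeros(W.shape[1])                 # start from a zero image (flattened)
    for _ in range(steps):
        # d/dx [score_target - l2 * ||x||^2]
        grad = W[target] - 2.0 * l2 * x
        x = x + lr * grad                    # ascent: add, do not subtract
    return x

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 16))   # hypothetical 3-class scorer on 16 "pixels"
x_star = class_visualization(W, target=1)
```

The regularization term keeps the image from growing without bound; without it the score of a linear model could be increased indefinitely.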

Answer 12

* Should remove most spatial information
* Key ideas revolve around summary statistics
* Gram matrix
  * feature correlations
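A Gram matrix over one layer's activations can be sketched as follows. Dividing by the number of spatial positions is an assumed normalization (conventions vary), and `gram_matrix` is an illustrative name.

```python
import numpy as np

def gram_matrix(features):
    """features: (C, H, W) activations from one layer."""
    C, H, W = features.shape
    F = features.reshape(C, H * W)     # flatten spatial positions
    return F @ F.T / (H * W)           # (C, C) channel co-activation statistics

rng = np.random.default_rng(0)
feats = rng.random((4, 5, 5))
G = gram_matrix(feats)

# Shuffling spatial positions leaves the Gram matrix unchanged,
# which is why it discards most spatial information.
perm = rng.permutation(25)
G_shuffled = gram_matrix(feats.reshape(4, 25)[:, perm].reshape(4, 5, 5))
```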

Answer 13

* Dimensionality reduction
  * often to reduce to two dimensions for plotting
* PCA
* t-SNE (most common)
  * non-linear mapping to preserve pair-wise distances
  * good for visualizing decision boundaries (esp. non-linear)
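The PCA option can be sketched with a plain SVD. The random `embeddings` array is a stand-in for real network features, and `pca_2d` is a hypothetical helper. t-SNE is usually taken from a library rather than written by hand, so it is not shown.

```python
import numpy as np

def pca_2d(X):
    """Project rows of X (n_samples, n_features) onto the top two principal components."""
    Xc = X - X.mean(axis=0)                              # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)    # rows of Vt are components
    return Xc @ Vt[:2].T                                 # (n_samples, 2)

rng = np.random.default_rng(0)
embeddings = rng.standard_normal((100, 64))   # stand-in for learned features
points = pca_2d(embeddings)
```

The two columns of `points` can be passed straight to a scatter plot, colored by class label.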

Answer 14

Region where generalization error (log-scale) decreases linearly with sufficient data

Answer 15

Given a NN architecture, the actual model that represents the real world may not be in that space. There may be no set of weights that models the real world, i.e. a simple architecture or function may not be able to model complex reality (potentially low capacity).

Answer 16

Transfer Learning
1. Train on a large-scale dataset and optimize parameters
2. Take a custom dataset and initialize the network with the weights trained in step 1
3. Replace the last layer with a new fully connected layer with output nodes per category
4. Continue to train on the new dataset (finetune: update all parameters; if there is not enough data, freeze the feature layers and update only the last layer's weights)

Answer 17

* Source: single labeled
* Target: single few-labeled
* Shift: semantic

Answer 18

convolution layers - large output

Answer 19

Allowing information from a layer to propagate to any future layer through an identity connection (i.e. no transform) can help with better gradient flow

Answer 20

* Source: single labeled
* Target: single unlabeled
* Shift: non-semantic

Answer 21

False
* In practice, saliency maps use the gradient of the classifier *scores* (pre-softmax)
* Softmax followed by the loss function adds some complexity (weird effects in the gradient)

Answer 22

* Match features at different layers
  * Use a loss for this
  * Optimize the image by minimizing the difference between the images (content and generated images)
* Multiple losses
  * Backward edges going to the same node are summed
  * Loss is the sum of the differences across the identified layers

Answer 23

Optimization algorithm may not be able to find the weights that 100% model the world

Answer 24

False - The 'Irreducible Error Region' has not been reached

Answer 25

Neighborhood around it (where part of the kernel touches it)

Answer 26

* Fully connected layers
  * Reshape the weights for a node back into the size of the image, then scale to 0-255
* Convolution layers
  * For each kernel, scale values to 0-255 and observe:
    * oriented edges
    * color
    * texture

Answer 27

Defines what set of input pixels in the original image affect the value of a particular node deep in the neural network.
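The size of a receptive field can be computed layer by layer with a standard recurrence: each layer grows the field by (kernel - 1) times the current spacing between adjacent outputs, measured in input pixels. The sketch below assumes no dilation; `receptive_field` and the example layer stack are illustrative.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) pairs, in order from the input."""
    rf = 1      # receptive field size, in input pixels
    jump = 1    # distance in input pixels between adjacent outputs of the current layer
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

# Two 3x3 convs (stride 1), a 2x2 max pool (stride 2), then another 3x3 conv.
rf = receptive_field([(3, 1), (3, 1), (2, 2), (3, 1)])   # rf == 10
```

Pooling and strided layers increase the spacing, which is why deeper nodes cover much larger regions of the input.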

Answer 28

Everywhere! The pixels in the kernel stride across the entire input image

Answer 29

* Source * single labeled * target * many labeled * shift * both/task

Answer 30

increase - dynamics of optimization could get more difficult with a deeper network

Answer 31

Horizontal split architecture (couldn't fit into one GPU)
* conv -> max pool -> norm (x2)
* conv x3 -> max pool
* fully connected x3

Answer 32

False - They have some

Answer 33

Gradient Ascent - optimization of an image to increase score for a particular class

Answer 34

Surprisingly effective. Features learned for 1000 object categories will work well for the 1001st! Generalizes even across tasks (classification to object detection).

Answer 35

likely increase in size.

Answer 36

Large-scale data benchmarking

Answer 37

* Repeated blocks composed of simple layers
* Parallel filters of different sizes
  * 1x1 convolution, 3x3 convolution, 5x5 convolution, 3x3 max pooling -> filter concatenation
* Increases computational complexity (4 times)

Answer 38

False - Gradient ascent perturbations can make model confidently wrong (adversarial noise)

Answer 39

* Find the gradient of the classifier scores (pre-softmax) instead of the loss
* Take the absolute value of the gradients
* Sum across channels
  * We don't care about RGB specifics
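Once the gradient of the class score with respect to the image is available, the last two steps are a one-liner. The sketch below assumes a channel-first `(3, H, W)` gradient array and follows the card in summing over channels; taking the maximum over channels is another common choice.

```python
import numpy as np

def saliency_map(grad_image):
    """grad_image: d(class score)/d(image), shape (3, H, W)."""
    # Magnitude only, collapsed over the RGB channels.
    return np.abs(grad_image).sum(axis=0)

grad = np.array([[[1.0, -2.0]], [[-3.0, 0.5]], [[0.0, 0.5]]])   # shape (3, 1, 2)
sal = saliency_map(grad)   # [[4.0, 3.0]]
```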

Answer 40

* Visualization of activation/filter
  * Larger early in the network
* Looking at activations across the input
  * Which images have the highest activation?

Answer 41

* Each kernel has the size of the entire input
* Equivalent to Wx+b
* Output is one scalar
* One kernel per output node
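The equivalence can be checked numerically: reshaping each row of the fully connected weight matrix into an input-sized kernel gives the same outputs as Wx+b. The shapes below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 4, 4))        # input volume (C, H, W)
W = rng.standard_normal((5, 3 * 4 * 4))   # FC weights: 5 output nodes
b = rng.standard_normal(5)

fc_out = W @ x.reshape(-1) + b            # the usual Wx + b

# Same weights viewed as 5 kernels, each the size of the entire input.
kernels = W.reshape(5, 3, 4, 4)
# Each kernel fits the input at exactly one position, so it yields one scalar.
conv_out = np.array([np.sum(k * x) for k in kernels]) + b
```

Because the layer is now a convolution, it can also be slid over a larger input, producing a grid of outputs instead of a single vector.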

Answer 42

Probability distribution over classes for each pixel.

Answer 43

Convolutions work on arbitrary input sizes (because of striding)

Answer 44

In max-unpooling, contributions from multiple windows are summed.

Answer 45

Take each input pixel, multiply by learnable kernel, "stamp" it on output
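The "stamping" description can be sketched directly for a single channel. This assumes no padding, and `transposed_conv2d` is an illustrative name. A fixed kernel of ones is used in place of a learned one so the overlaps are easy to see.

```python
import numpy as np

def transposed_conv2d(x, kernel, stride=2):
    """Single-channel transposed convolution with no padding."""
    H, W = x.shape
    kh, kw = kernel.shape
    out = np.zeros(((H - 1) * stride + kh, (W - 1) * stride + kw))
    for i in range(H):
        for j in range(W):
            # Scale the kernel by the input pixel and "stamp" it on the output;
            # overlapping stamps are summed.
            out[i * stride:i * stride + kh, j * stride:j * stride + kw] += x[i, j] * kernel
    return out

x = np.array([[1.0, 2.0], [3.0, 4.0]])
up = transposed_conv2d(x, np.ones((3, 3)), stride=2)   # shape (5, 5)
```

With a 3x3 kernel and stride 2, the center cell of the 5x5 output receives all four stamps and equals 1 + 2 + 3 + 4 = 10.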

Answer 46

Begin with a pre-trained trunk/backbone (e.g. network pretrained on ImageNet)

Answer 47

skip connections

Answer 48

Given an image, output a list of bounding boxes with probability distribution over classes per box

Answer 49

Variable number of boxes. Need to determine candidate regions (position and scale) first.

Answer 50

* Multi-headed
  * Classification: predicting a distribution over class labels
  * Regression: predicting a bounding box for each image region
* Both heads share features
* Jointly optimized (summing gradients)

Answer 51

Combining redundant boxes to find bounding box for object in image
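One common way to resolve redundant boxes is the greedy variant sketched below: keep the highest-scoring box, discard the remaining boxes that overlap it beyond an IoU threshold, and repeat. The `(x1, y1, x2, y2)` box format, the 0.5 threshold, and the function names are assumptions for illustration.

```python
import numpy as np

def iou(box, boxes):
    """IoU of one box against an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_boxes = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_box + area_boxes - inter)

def nms(boxes, scores, iou_threshold=0.5):
    order = np.argsort(scores)[::-1]      # highest score first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Drop remaining boxes that overlap the kept box too much.
        order = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)   # [0, 2]
```

The second box overlaps the first with IoU 81/119, about 0.68, so it is suppressed. The third box does not overlap either and is kept.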

Answer 52

* Uses the grid idea as anchors
  * different scales
  * different aspect ratios
* Tricks used to increase resolution (decrease subsampling ratio)

Answer 53

Single-scale faster for same size than SSD

Answer 54

large-scale object detection, segmentation, and captioning dataset

Answer 55

1. For each bounding box, calculate intersection over union (IoU)
   * use the IoU with the closest ground-truth box
2. Keep only those with IoU > threshold
3. Calculate the precision/recall curve across classification probability thresholds
4. Calculate average precision (AP) over recall of [0, 0.1, 0.2, ..., 1.0]
5. Average over all categories to get mean Average Precision (mAP)
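The IoU and 11-point AP steps can be sketched for a single class. This assumes the matching of detections to ground truth has already produced one true/false flag per detection; `box_iou` and `average_precision_11pt` are illustrative names, and taking the best precision at or beyond each recall level is the usual interpolation, assumed here.

```python
import numpy as np

def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def average_precision_11pt(is_true_positive, scores, num_ground_truth):
    """is_true_positive[i]: whether detection i matched a ground-truth box."""
    order = np.argsort(scores)[::-1]                  # sweep the score threshold downward
    flags = np.asarray(is_true_positive, dtype=bool)[order]
    tp = np.cumsum(flags)
    fp = np.cumsum(~flags)
    recall = tp / num_ground_truth
    precision = tp / (tp + fp)
    ap = 0.0
    for r in np.linspace(0.0, 1.0, 11):               # recall levels 0, 0.1, ..., 1.0
        reachable = precision[recall >= r]
        ap += (reachable.max() if reachable.size else 0.0) / 11
    return ap

ap = average_precision_11pt([True, False, True], [0.9, 0.8, 0.7], num_ground_truth=2)
```

mAP is then the mean of this value over all categories.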

Answer 56

* Find regions of interest (ROIs) with object-like things
* Classify those regions (refine their bounding boxes)

Answer 57

* Unsupervised (non-learned) algorithms
* Downsides
  * 1+ second per image
  * returns thousands of mostly background boxes
* Resize each candidate to full input size and classify

Answer 58

* Takes 1+ second per image
* Returns thousands of (mostly background) boxes

Answer 59

Computations for convolutions are re-done for each image patch, even if overlapping

Answer 60

* Reuse computation by finding regions in **feature maps**
* Feature extraction once per image

Answer 61

* Variable input size to FC layers due to different feature map sizes

Answer 62

* ROI Pooling
  * Given an arbitrarily-sized feature map, we can use pooling across a grid (ROI Pooling Layer) to convert it to a fixed-size representation
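Pooling across a grid can be sketched as max pooling over roughly equal bins of the region. This is a simplified illustration: it assumes integer ROI coordinates already expressed in feature-map cells, a non-empty ROI, and a 2x2 output grid. `roi_max_pool` is a hypothetical helper.

```python
import numpy as np

def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """feature_map: (C, H, W); roi: (x1, y1, x2, y2) in feature-map cells, end-exclusive."""
    x1, y1, x2, y2 = roi
    region = feature_map[:, y1:y2, x1:x2]
    out_h, out_w = output_size
    # Split the region into an out_h x out_w grid of roughly equal bins.
    ys = np.linspace(0, region.shape[1], out_h + 1).astype(int)
    xs = np.linspace(0, region.shape[2], out_w + 1).astype(int)
    out = np.zeros((feature_map.shape[0], out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each bin covers at least one cell, even for very small regions.
            cell = region[:, ys[i]:max(ys[i + 1], ys[i] + 1),
                             xs[j]:max(xs[j + 1], xs[j] + 1)]
            out[:, i, j] = cell.max(axis=(1, 2))
    return out

fmap = np.arange(2 * 6 * 8, dtype=float).reshape(2, 6, 8)
pooled_a = roi_max_pool(fmap, (0, 0, 4, 4))   # 4x4 region -> (2, 2, 2)
pooled_b = roi_max_pool(fmap, (1, 2, 8, 5))   # 3x7 region -> (2, 2, 2)
```

Regions of different sizes map to the same output shape, which is what lets the fully connected layers accept them.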

Answer 63

* Use neural networks for the region proposal
* Region Proposal Network (RPN)
  * output: objectness score
  * top k selected for classification
* Complexity in implementation due to some non-differentiable parts (gradient with respect to bounding box coordinates)

Answer 64

* Neural network model to find regions of objects
* Uses anchors in a grid
* *k* anchor boxes
  * various sizes and shapes
  * hyperparameters
* *2k* scores
  * object or not-object like
* *4k* coordinates
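The anchor layout and the resulting output counts can be sketched as follows. The scales, aspect ratios, stride, and the `(cx, cy, w, h)` encoding are assumed hyperparameters chosen for illustration, and `make_anchors` is a hypothetical helper.

```python
import numpy as np

def make_anchors(grid_h, grid_w, stride, scales=(64, 128), ratios=(0.5, 1.0, 2.0)):
    """Return anchors of shape (grid_h, grid_w, k, 4) as (cx, cy, w, h) in image pixels."""
    shapes = []
    for s in scales:
        for r in ratios:                      # r = width / height; area stays s * s
            shapes.append((s * np.sqrt(r), s / np.sqrt(r)))
    shapes = np.array(shapes)                 # (k, 2)
    k = len(shapes)
    # Anchor centers sit in the middle of each grid cell.
    cy, cx = np.meshgrid((np.arange(grid_h) + 0.5) * stride,
                         (np.arange(grid_w) + 0.5) * stride, indexing="ij")
    anchors = np.zeros((grid_h, grid_w, k, 4))
    anchors[..., 0] = cx[:, :, None]
    anchors[..., 1] = cy[:, :, None]
    anchors[..., 2:] = shapes
    return anchors

anchors = make_anchors(grid_h=4, grid_w=6, stride=16)
k = anchors.shape[2]                      # 2 scales x 3 ratios = 6 anchors per location
num_scores, num_coords = 2 * k, 4 * k     # per-location RPN head outputs
```

With these settings each grid location yields 12 objectness scores and 24 box coordinates.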

Answer 65

Two-stage object detection methods are slower but more accurate


(97 cards)