Definitions Flashcards

Question

Training on bottleneck features

Answer 1

Rather than building a complex model from scratch, a simpler model can be trained on top of an existing one: bottleneck features are extracted from a pre-trained network and a classifier is trained on them. Bottleneck features are the features produced by complex architectures trained on several million images. Each image is passed forward through the network and the pre-final layer features are stored. A simple logistic classifier is then trained on these features. This approach is useful when training data is scarce, and it is often a faster way to train a model, since only the final activations of the pre-trained model are used to adapt it to the new task. Example: load VGG without its top layer, run the images through it with a prediction (forward) pass, and feed the resulting pre-final layer features into a small sequential model for training.
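A minimal pure-Python sketch of the two steps, assuming a tiny hypothetical `extract_bottleneck_features` function as a stand-in for a frozen pre-trained network such as VGG without its top layer (its weights `FROZEN_WEIGHTS` and the toy images are made up). The features are computed once and stored; only the logistic classifier is trained.

```python
import math

# Hypothetical stand-in for a frozen, pre-trained network with its top layer removed.
# These weights are never updated during training.
FROZEN_WEIGHTS = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]


def extract_bottleneck_features(image):
    """Forward pass through the frozen extractor; returns pre-final layer features (ReLU)."""
    return [max(0.0, sum(w * x for w, x in zip(row, image))) for row in FROZEN_WEIGHTS]


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def train_logistic_classifier(features, labels, lr=0.5, epochs=500):
    """Plain gradient-descent logistic regression on the stored bottleneck features."""
    w, b = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for f, y in zip(features, labels):
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
            err = p - y  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            b -= lr * err
    return w, b


def predict(w, b, image):
    f = extract_bottleneck_features(image)
    return 1 if sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) >= 0.5 else 0


# Toy "images" (3 pixel values each) with binary labels.
images = [[1.0, 0.0, 0.2], [0.9, 0.1, 0.0], [0.0, 1.0, 0.3], [0.1, 0.9, 0.5]]
labels = [1, 1, 0, 0]

# Step 1: run every image through the frozen network once and store the features.
bottleneck = [extract_bottleneck_features(img) for img in images]
# Step 2: train only the small classifier on the stored features.
w, b = train_logistic_classifier(bottleneck, labels)
print([predict(w, b, img) for img in images])
```

Because the expensive forward pass happens only once per image, each training epoch touches just the small classifier.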

Answer 2

A pre-trained model can be loaded and only a few of its layers trained (fine-tuning). Use this when the dataset is small: training a deep network from scratch on a small dataset leads to overfitting, which fine-tuning avoids. The bigger dataset the model was originally trained on should be similar to the new one, in the hope that its activation features also suit the smaller dataset. Example: load VGG, set the initial layers to non-trainable, and replace the fully connected layers with new trainable layers. The number of layers to fine-tune depends on the data size: the less data, the fewer layers to fine-tune.
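The freezing mechanism can be sketched with a deliberately tiny toy: a hypothetical "network" that is a chain of scalar layers, `y = w_n * ... * w_1 * x`, with made-up pre-trained weights. All names (`build_for_new_task`, `n_trainable`, `new_head`) are illustrative assumptions, not a library API. Frozen layers are skipped by the update, and a fresh trainable head replaces the old top.

```python
# Toy model: a chain of scalar "layers", y = w_n * ... * w_2 * w_1 * x.
# The pre-trained weights below are hypothetical stand-ins for a network like VGG.
pretrained = [{"name": f"conv{i}", "w": w} for i, w in enumerate([1.2, 0.8, 1.1, 0.9], start=1)]


def build_for_new_task(pretrained_layers, n_trainable):
    """Copy the pre-trained layers, freeze all but the last `n_trainable` of them,
    and append a freshly initialised trainable head in place of the old FC layers."""
    layers = [dict(layer, trainable=False) for layer in pretrained_layers]
    for layer in layers[len(layers) - n_trainable:]:
        layer["trainable"] = True
    layers.append({"name": "new_head", "w": 0.1, "trainable": True})
    return layers


def forward(layers, x):
    out = x
    for layer in layers:
        out *= layer["w"]
    return out


def sgd_step(layers, x, target, lr=0.01):
    """One gradient step on the squared error; frozen layers are left untouched."""
    dloss_dy = 2.0 * (forward(layers, x) - target)
    grads = []
    for i in range(len(layers)):
        # dy/dw_i = x * (product of all the other weights)
        others = 1.0
        for j, layer in enumerate(layers):
            if j != i:
                others *= layer["w"]
        grads.append(dloss_dy * x * others)
    for layer, grad in zip(layers, grads):
        if layer["trainable"]:
            layer["w"] -= lr * grad


# Small dataset -> fine-tune few layers: only the last pre-trained layer plus the new head.
model = build_for_new_task(pretrained, n_trainable=1)
for _ in range(200):
    sgd_step(model, x=1.0, target=2.0)
print([(layer["name"], layer["trainable"], round(layer["w"], 3)) for layer in model])
```

With more data, a larger `n_trainable` would unfreeze more of the pre-trained layers.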

Answer 3

Underfitting happens when the model is too small to fit the data; it shows up as low training accuracy. Underfitting can be addressed by: (1) more data (2) a bigger model (3) if the data is small, transfer learning and/or data augmentation.

Answer 4

Overfitting happens when the model is too big for the data; it shows up as a large gap between training and testing accuracy. Remedies: (1) regularization with dropout and/or batch normalization (2) data augmentation.
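A minimal sketch of the dropout part of remedy (1), assuming the common "inverted dropout" formulation (survivors are rescaled during training so nothing changes at inference). Batch normalization is not shown; the function name and defaults are illustrative.

```python
import random


def dropout(activations, p=0.5, training=True, rng=random):
    """Inverted dropout.

    Training: each activation is zeroed with probability p (0 <= p < 1) and the
    survivors are scaled by 1 / (1 - p), so the expected activation is unchanged.
    Inference: the layer is the identity.
    """
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
out = dropout([1.0] * 10_000, p=0.5, rng=rng)
print(sum(out) / len(out))  # close to 1.0 on average
print(dropout([1.0, 2.0], training=False))  # [1.0, 2.0]
```

Randomly silencing units keeps the network from relying on any single co-adapted feature, which is why it narrows the train/test gap.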

Answer 5

Class imbalance can be dealt with by weighting the loss function, so that errors on rare classes cost more than errors on frequent ones.
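One way to weight the loss, sketched under an assumed weighting scheme: each class gets `n_samples / (n_classes * class_count)`, a common "balanced" heuristic (other weightings are possible). The weights then multiply the per-sample cross-entropy.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count), so rare classes count more."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * count) for c, count in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Mean of -weights[y] * log(p_y) over the batch; probs[i][c] is the predicted P(class c)."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += -weights[y] * math.log(p[y])
    return total / len(labels)


labels = [0, 0, 0, 1]  # class 1 is rare
weights = balanced_class_weights(labels)
print(weights)  # {0: 0.6666666666666666, 1: 2.0}
loss = weighted_cross_entropy([[0.5, 0.5]] * 4, labels, weights)
```

Here a mistake on the single class-1 example costs three times as much as a mistake on any class-0 example.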

Answer 6

Deep learning can also be called representation learning, because the features (representations) in the model are learned during training. The visual features produced in the hidden layers during training can be used to compute a distance metric. Depending on the classification task, these models learn to detect edges, patterns, etc. Those features can then be used to compute the similarity between a query image and a set of target images, and so speed up the retrieval system.
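A minimal retrieval sketch, assuming cosine similarity as the distance metric and made-up feature vectors standing in for hidden-layer activations of a trained CNN. Gallery names and values are hypothetical.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve(query_feat, gallery, top_k=2):
    """Rank gallery items (name -> feature vector) by cosine similarity to the query."""
    ranked = sorted(gallery.items(), key=lambda kv: cosine_similarity(query_feat, kv[1]), reverse=True)
    return [name for name, _ in ranked[:top_k]]


# Hypothetical hidden-layer features, e.g. taken from a pre-trained CNN.
gallery = {"cat_1": [0.9, 0.1, 0.0], "cat_2": [0.8, 0.2, 0.1], "car_1": [0.0, 0.1, 0.95]}
print(retrieve([0.85, 0.15, 0.05], gallery))
```

The target features can be computed once and cached, so each query costs only one forward pass plus the similarity comparisons.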

Answer 7

Deep learning models are often called a 'black box': activation functions make them non-linear, so they cannot be visualized easily. However, they can be visualized using the model's activations and gradients. The activations can be visualized using:
1. Nearest neighbor: take the activation of an image at some layer and view together the images whose activations are nearest to it.
2. Dimensionality reduction: reduce the dimension of the activations with PCA or t-SNE to visualize them in two or three dimensions. PCA reduces the dimension by projecting the values onto the directions of maximum variance. t-SNE maps the points to two or three dimensions so that points that are close in the original space stay close.
3. Maximal patches: pick a single neuron and capture the image patches that activate it most strongly.
4. Occlusion: occlude (obstruct) the images at various positions and show the resulting activations as heat maps, to understand which portions of the images are important.
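A minimal sketch of occlusion (method 4), assuming a hypothetical `toy_score` function in place of a real network's class score and recording the score drop at each patch position as the heat-map value. Patch size, fill value and stride 1 are illustrative choices.

```python
def occlusion_map(image, score_fn, patch=2, fill=0.0):
    """Slide a patch x patch square of `fill` over the image and record how much
    the model's score drops at each position. Large drops mark important regions."""
    h, w = len(image), len(image[0])
    base = score_fn(image)
    heat = []
    for top in range(h - patch + 1):
        row = []
        for left in range(w - patch + 1):
            occluded = [r[:] for r in image]  # copy, so the original image is untouched
            for i in range(top, top + patch):
                for j in range(left, left + patch):
                    occluded[i][j] = fill
            row.append(base - score_fn(occluded))
        heat.append(row)
    return heat


def toy_score(img):
    """Hypothetical 'model': its score is the brightness of the top-left 2x2 corner."""
    return img[0][0] + img[0][1] + img[1][0] + img[1][1]


image = [[1.0] * 4 for _ in range(4)]
heat = occlusion_map(image, toy_score)
for row in heat:
    print(row)
```

The heat map peaks where the occluder covers the top-left corner, the only region this toy model depends on.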

Answer 8

The neuron activations at some layer in the network can be amplified, rather than synthesizing the image (as in guided backprop). This concept of amplifying the original image to see the effect of features is called DeepDream. The steps are:
1. Take an image and pick a layer of a CNN.
2. Take the activations at that layer.
3. Modify the gradient so that the gradient equals the activations.
4. Backpropagate to compute the gradients with respect to the image.
5. Jitter the image and normalize it (a form of regularization).
6. Clip the pixel values.
7. Process the image at multiple scales to produce a fractal effect.
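A toy sketch of steps 3, 4 and 6, assuming a hypothetical single linear "layer" `W` over a 3-pixel image (weights made up). Setting the layer gradient equal to the activations is the gradient of `0.5 * sum(a^2)`, so gradient ascent on the image amplifies whatever the layer already responds to. Jitter (step 5) and multi-scale processing (step 7) are omitted.

```python
# Hypothetical "layer": 2 linear neurons looking at a 3-pixel image (weights made up).
W = [[0.6, -0.4, 0.2], [0.1, 0.5, -0.3]]


def layer_activations(x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def objective(x):
    """0.5 * sum(a^2): its gradient with respect to the activations a is a itself."""
    return 0.5 * sum(a * a for a in layer_activations(x))


def dream_step(x, step_size=0.05):
    a = layer_activations(x)
    # Step 3: make the gradient at the chosen layer equal to its activations.
    grad_a = a
    # Step 4: backpropagate to the image; for a linear layer dL/dx = W^T . grad_a.
    grad_x = [sum(W[k][j] * grad_a[k] for k in range(len(W))) for j in range(len(x))]
    # Normalise by the mean absolute gradient so the step size stays stable.
    scale = sum(abs(g) for g in grad_x) / len(grad_x) or 1.0
    # Gradient ascent: move the image so the activations grow.
    x = [xi + step_size * g / scale for xi, g in zip(x, grad_x)]
    # Step 6: clip pixel values to the valid range [0, 1].
    return [min(1.0, max(0.0, xi)) for xi in x]


image = [0.5, 0.5, 0.5]
start = objective(image)
for _ in range(20):
    image = dream_step(image)
print(start, objective(image), image)
```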

Answer 9

Any new data can be passed to the trained model to get results. This process of obtaining classification results or features from an image is termed inference.

Answer 10

One-shot learning is the technique of learning from just one example. Here, an image can be shown to the model and it can tell whether the image is similar to the example. Most similarity-learning tasks require both positive (similar) and negative (dissimilar) pairs for training.
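A minimal sketch of training on positive and negative pairs, assuming the contrastive loss (one common choice for similarity learning) and embeddings already produced by some hypothetical embedding network. The `margin` and the `threshold` in `matches_example` are illustrative values.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, is_similar, margin=1.0):
    """Loss for one training pair of embeddings.

    Positive pairs are pulled together (loss d^2); negative pairs are pushed
    apart until they are at least `margin` away (loss max(0, margin - d)^2).
    """
    d = euclidean(emb_a, emb_b)
    if is_similar:
        return d ** 2
    return max(0.0, margin - d) ** 2


def matches_example(query_emb, example_emb, threshold=0.5):
    """One-shot decision: compare the query with the single stored example."""
    return euclidean(query_emb, example_emb) < threshold


print(contrastive_loss([0.0, 0.0], [3.0, 4.0], is_similar=True))   # 25.0
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], is_similar=False))  # 0.0
```

Once the embedding is trained this way, a new class needs only one stored example: a query is accepted if its embedding lies close to that example.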

Answer 11

If the weight initialization of a neural network is sloppy, non-linearities such as the sigmoid can saturate and stop learning: the training loss will be flat and refuse to go down. For example, if the weight matrix W is initialized too large, the output of the matrix multiply could have a very large range, which in turn makes all the outputs in the vector z almost binary: 1 or 0 (using sigmoid). If this is the case, then z*(1-z), the local gradient of the sigmoid non-linearity, becomes 0 (vanishes) in both cases, which makes the gradients for both x and W zero as well. The rest of the backward pass will then be all zero from this point onward, because of the multiplication in the chain rule.
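A small numerical illustration, assuming a single weight `w` and input `x = 1.0` in place of the matrix multiply: as the initial weight grows, z is pushed to 1 and the local gradient z*(1-z), and with it dL/dw, collapses to zero.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def sigmoid_local_grad(t):
    z = sigmoid(t)
    return z * (1.0 - z)  # dz/dt for z = sigmoid(t)


x = 1.0
upstream = 1.0  # gradient arriving from the layer above
for w in (0.1, 1.0, 100.0):  # small, moderate and far-too-large initial weight
    t = w * x                # "matrix multiply" for a single weight and input
    grad_w = upstream * sigmoid_local_grad(t) * x  # chain rule: dL/dw
    print(f"w={w}: z={sigmoid(t):.6f}, dL/dw={grad_w:.2e}")
```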

Answer 12

The chain rule is used to compute the derivative of a composition of two or more functions: if y = f(g(x)), then dy/dx = f'(g(x)) · g'(x).
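A worked example, assuming the illustrative composition y = sin(x²): the chain-rule derivative cos(x²) · 2x is checked against a central finite difference.

```python
import math


def f(u):
    return math.sin(u)


def g(x):
    return x ** 2


def dy_dx_chain(x):
    # Chain rule for y = f(g(x)) = sin(x^2): dy/dx = f'(g(x)) * g'(x) = cos(x^2) * 2x
    return math.cos(g(x)) * 2 * x


def dy_dx_numeric(x, h=1e-6):
    # Central finite difference, used only to check the analytic result.
    return (f(g(x + h)) - f(g(x - h))) / (2 * h)


x = 1.3
print(dy_dx_chain(x), dy_dx_numeric(x))
```

Backpropagation applies exactly this rule layer by layer, multiplying local derivatives together.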

Answer 13

Elastic net regularization combines L1 regularization with L2 regularization: $\lambda_1 |w| + \lambda_2 w^2$.
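A minimal sketch of the penalty and its (sub)gradient summed over a weight vector, assuming sign(0) is taken as 0 for the L1 term. Function names and the λ values in the usage lines are illustrative.

```python
def _sign(w):
    return (w > 0) - (w < 0)


def elastic_net_penalty(weights, lambda1, lambda2):
    """lambda1 * sum(|w|) + lambda2 * sum(w^2) over all weights."""
    return lambda1 * sum(abs(w) for w in weights) + lambda2 * sum(w * w for w in weights)


def elastic_net_grad(weights, lambda1, lambda2):
    """Subgradient of the penalty: lambda1 * sign(w) + 2 * lambda2 * w (sign(0) taken as 0)."""
    return [lambda1 * _sign(w) + 2 * lambda2 * w for w in weights]


w = [1.0, -2.0, 0.0]
print(elastic_net_penalty(w, lambda1=0.1, lambda2=0.01))
print(elastic_net_grad(w, lambda1=0.1, lambda2=0.01))
```

The L1 term pushes every weight toward zero by a constant amount (encouraging sparsity), while the L2 term shrinks large weights proportionally.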

Answer 14

Another form of regularization is to enforce an absolute upper bound on the magnitude of the weight vector for every neuron and use projected gradient descent to enforce the constraint.
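A minimal sketch of the projection step, assuming the L2 norm as the magnitude and an illustrative bound `c = 3.0`: after an ordinary gradient update, any weight vector whose norm exceeds `c` is rescaled back onto the ball.

```python
import math


def project_max_norm(w, c=3.0):
    """Rescale a neuron's weight vector so that its L2 norm is at most c."""
    norm = math.sqrt(sum(x * x for x in w))
    if norm <= c:
        return list(w)
    return [x * c / norm for x in w]


def projected_sgd_step(w, grad, lr=0.1, c=3.0):
    """Ordinary gradient step followed by projection back onto the ball ||w|| <= c."""
    return project_max_norm([wi - lr * gi for wi, gi in zip(w, grad)], c)


print(project_max_norm([3.0, 4.0], c=1.0))
print(projected_sgd_step([3.0, 4.0], [-10.0, 0.0], lr=0.1, c=3.0))
```

Because the weights can never grow past `c`, updates cannot blow up even with a large learning rate.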

Answer 15

The number of data points needed to fill the available space grows exponentially with the number of dimensions (or plot axes). If a classifier is not fed with data points that span the entire feature space, the classifier will not know what to do once a new data point is presented that lies far away from all the previously encountered data points. In practice, the curse of dimensionality means that for a given sample size, there is a maximum number of features, above which the performance of our classifier will degrade rather than improve.
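A back-of-the-envelope illustration, assuming each feature axis is split into 10 bins and that "filling" the space means at least one sample per grid cell: the cell count grows as bins^dims, so a fixed sample of 1000 points covers a vanishing fraction of the space as dimensions are added.

```python
def cells_to_cover(bins_per_axis, dims):
    """Grid cells in a feature space split into `bins_per_axis` intervals along each axis."""
    return bins_per_axis ** dims


def coverage_fraction(n_samples, bins_per_axis, dims):
    """Best case (one sample per cell): fraction of cells a fixed sample can occupy."""
    return min(1.0, n_samples / cells_to_cover(bins_per_axis, dims))


for d in (1, 2, 3, 5, 10):
    print(d, cells_to_cover(10, d), coverage_fraction(1000, 10, d))
```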


(39 cards)