CNN Flashcards by Harsh Raj

IMAGENET dataset

A large-scale image dataset of everyday objects and animals
2272273
* 1,281,167 training images,
* 50,000 validation images, and
* 100,000 test images
* 1000 classes

CNN

A special kind of neural network for processing data that has a grid-like topology, such as time series data (1D) or image data (2D).

A CNN consists of:
1. Convolution layer
2. Pooling layer
3. Fully connected layer (ANN)

Why not use ANN on image data?

Computational cost (a large number of pixels means a very large number of weights)
Overfitting
Loss of spatial arrangement (the 2D image is flattened into a 1D vector)

How does CNN work on image data?

The initial convolutional layers extract primitive features (e.g. edges).
Deeper layers combine these into progressively more complex features.

Greyscale image

Shades of grey, from black to white
Single channel
Pixel values in the range [0, 255] (for 8-bit images)

Colored Image

RGB (red, green, blue)
Three channels
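
To make the single-channel versus three-channel difference concrete, here is a minimal sketch that collapses an RGB image into a greyscale one. The 2x2 image and the helper name `to_greyscale` are made up for illustration, and the weights are the common BT.601 luma coefficients, which is an assumption (other weightings exist).

```python
# Hypothetical 2x2 RGB image: each pixel is an (R, G, B) tuple in [0, 255].
rgb_image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

def to_greyscale(image):
    """Collapse three channels into one using BT.601 luma weights (assumed)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in image
    ]

# One value per pixel instead of three.
grey_image = to_greyscale(rgb_image)
print(grey_image)
```

Each RGB pixel holds three numbers, while each greyscale pixel holds one, so the same image needs a third of the values.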

What is convolution?

Convolution slides a kernel (filter) over the input, multiplies it element-wise with the pixels under it, and sums the products. The resulting values form the feature map.

The process of detecting features in an image is called convolution.
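
The operation can be sketched in plain Python. This is a minimal illustration, assuming no padding and stride 1. The image and the vertical-edge kernel are hand-picked for the example (in a real CNN the kernel weights are learned), and `conv2d` is a hypothetical helper, not a library function.

```python
def conv2d(image, kernel):
    """No-padding, stride-1 2D convolution as used in CNNs.
    (Strictly this is cross-correlation: the kernel is not flipped.)"""
    n_rows, n_cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    out = []
    for i in range(n_rows - k_rows + 1):
        row = []
        for j in range(n_cols - k_cols + 1):
            # Multiply the kernel element-wise with the window, then sum.
            total = 0
            for a in range(k_rows):
                for b in range(k_cols):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        out.append(row)
    return out

# Hypothetical 4x5 image: dark on the left (0), bright on the right (9).
image = [
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
]
# Hand-picked vertical-edge kernel.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
feature_map = conv2d(image, kernel)
print(feature_map)
```

The feature map is zero where the window covers a flat region and large where it covers the dark-to-bright edge.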

How are the values of a filter (kernel) decided?

Initialized with random values.
Learned during training: backpropagation computes the gradients used to update them.

What is a filter?

A small matrix of weights that slides over the input. At each position it is multiplied element-wise with the pixels under it, and the products are summed to give a single output value.

What is padding?

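Edge pixels are covered by fewer filter positions than central pixels, so they contribute less to the output. Padding adds a border of extra pixels (usually zeros) around the input, which gives edge pixels more influence and keeps the output from shrinking.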
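
The difference in contribution can be counted directly. The sketch below assumes a square input and stride 1, and `coverage` is a hypothetical helper. For each original pixel it counts how many filter positions include it, with and without padding.

```python
def coverage(n, f, p=0):
    """For an n x n input, count how many f x f filter positions (stride 1)
    cover each original pixel when p pixels of padding are added per side."""
    size = n + 2 * p
    counts = [[0] * size for _ in range(size)]
    for i in range(size - f + 1):
        for j in range(size - f + 1):
            for a in range(f):
                for b in range(f):
                    counts[i + a][j + b] += 1
    # Drop the padded border so only the original pixels remain.
    return [row[p:size - p] for row in counts[p:size - p]]

no_pad = coverage(4, 3)        # 4x4 input, 3x3 filter, no padding
padded = coverage(4, 3, p=1)   # same input with one pixel of padding

print(no_pad[0][0], no_pad[1][1])   # corner vs. central pixel, no padding
print(padded[0][0], padded[1][1])   # corner vs. central pixel, padded
```

In this example a corner pixel is used once and a central pixel four times without padding. With one pixel of padding the counts become four and nine, so the gap narrows but does not vanish.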

What is stride?

Stride decides how far the filter moves across the input at each step, e.g. one pixel at a time (stride 1) or two (stride 2).
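
As a small illustration, the positions a filter visits along one dimension can be listed for different strides. `window_starts` is a hypothetical helper, and the sizes are arbitrary.

```python
def window_starts(n, f, s):
    """Start indices of an f-wide filter moving over n inputs with stride s
    (no padding)."""
    return list(range(0, n - f + 1, s))

starts_stride_1 = window_starts(7, 3, 1)
starts_stride_2 = window_starts(7, 3, 2)
print(starts_stride_1)
print(starts_stride_2)
```

With stride 2 the filter visits fewer positions, so the output is smaller.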

Valid padding

No padding

Same padding

Padding is chosen automatically so that the feature map has the same size as the input image (at stride 1)
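
A minimal sketch of how much padding "same" needs, compared with "valid". It assumes stride 1 and an odd filter size, and `same_padding` is a hypothetical helper name.

```python
def same_padding(f):
    """Per-side padding that keeps output size equal to input size at stride 1.
    Assumes an odd filter size f (the common case)."""
    if f % 2 == 0:
        raise ValueError("this sketch only handles odd filter sizes")
    return (f - 1) // 2

n, f = 28, 5                      # arbitrary input and filter sizes
p = same_padding(f)
same_out = n + 2 * p - f + 1      # output size with same padding
valid_out = n - f + 1             # output size with valid (no) padding
print(p, same_out, valid_out)
```

With a 5-wide filter, same padding adds two pixels per side and keeps the size at 28. Valid padding shrinks it to 24.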

Formula for the output size after convolution

floor((n + 2p - f) / s) + 1

where n = input size, p = padding, f = filter size, s = stride.
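
A few worked cases of the formula, taking n as the input size, p as the padding, f as the filter size and s as the stride. The division is assumed to round down when it is not exact, and `conv_output_size` is a hypothetical helper.

```python
def conv_output_size(n, f, p=0, s=1):
    """floor((n + 2p - f) / s) + 1, per spatial dimension."""
    return (n + 2 * p - f) // s + 1

out_a = conv_output_size(32, 5)             # no padding, stride 1
out_b = conv_output_size(224, 7, p=3, s=2)  # padding 3, stride 2
out_c = conv_output_size(8, 3, s=2)         # (8 - 3) / 2 is not whole: round down
print(out_a, out_b, out_c)
```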

Why are strides required?

A larger stride skips fine detail and keeps only coarser, higher-level features.
It shrinks the feature map, which reduces computation and memory use.

Why is pooling required?

Convolution on its own has two problems:

Memory use (large feature maps)
Translation variance (the output changes when a feature shifts position)

Increasing the stride addresses the memory issue, but it does not solve the translation variance problem.

Pooling downsamples the feature map and addresses both.

Translation Invariance

The ability to ignore positional shifts, or translations, of the target in the image.
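
A tiny 1D illustration, assuming non-overlapping max pooling with window size 2. The feature values and `max_pool_1d` are made up for the example.

```python
def max_pool_1d(values, size=2):
    """Non-overlapping max pooling (stride equal to the window size)."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]

feature = [0, 9, 0, 0, 0, 0]
shifted = [9, 0, 0, 0, 0, 0]   # same feature, shifted one step left
moved   = [0, 0, 9, 0, 0, 0]   # shifted across a pooling-window boundary

pooled_feature = max_pool_1d(feature)
pooled_shifted = max_pool_1d(shifted)
pooled_moved = max_pool_1d(moved)
print(pooled_feature, pooled_shifted, pooled_moved)
```

The one-step shift inside a window leaves the pooled output unchanged, while the shift across a window boundary changes it. Pooling gives invariance to small shifts only.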

Types of pooling

Max pooling
Average pooling
Global pooling (global max and global average)
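
The pooling types can be compared on one small feature map. This is a sketch assuming non-overlapping 2x2 windows. The values, `pool2d` and `mean` are hypothetical and chosen only for illustration.

```python
def pool2d(fmap, size, reduce_fn):
    """Non-overlapping pooling: apply reduce_fn to each size x size window."""
    out = []
    for i in range(0, len(fmap) - size + 1, size):
        row = []
        for j in range(0, len(fmap[0]) - size + 1, size):
            window = [fmap[i + a][j + b]
                      for a in range(size) for b in range(size)]
            row.append(reduce_fn(window))
        out.append(row)
    return out

def mean(values):
    return sum(values) / len(values)

fmap = [
    [1, 3, 2, 0],
    [5, 7, 4, 6],
    [0, 2, 8, 4],
    [2, 4, 0, 4],
]
all_values = [v for row in fmap for v in row]

max_pooled = pool2d(fmap, 2, max)    # keeps the strongest value per window
avg_pooled = pool2d(fmap, 2, mean)   # keeps the average per window
global_max = max(all_values)         # one value for the whole map
global_avg = mean(all_values)
print(max_pooled, avg_pooled, global_max, global_avg)
```

Max and average pooling turn the 4x4 map into a 2x2 map. Global pooling reduces the whole map to a single number.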

Advantages of pooling

Reduced feature map size (due to downsampling)
Translation invariance
Enhanced features (max pooling only)

Disadvantages of pooling

Not recommended for image segmentation tasks (precise positions are lost)
Loss of information

ANN vs CNN

Similarity:
1. Both compute input * weights + bias. In a CNN, the weights are the filter weights.

Differences:
1. In a CNN, the number of learnable parameters in a convolutional layer does not depend on the input size.
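
The difference shows up when counting parameters. The sketch below assumes one dense layer with 64 units and one convolutional layer with 64 filters of size 3x3. The layer sizes and helper names are hypothetical.

```python
def dense_params(n_inputs, n_units):
    """One weight per (input, unit) pair plus one bias per unit."""
    return n_inputs * n_units + n_units

def conv_params(f, in_channels, n_filters):
    """Each filter has f*f*in_channels weights plus one bias."""
    return (f * f * in_channels + 1) * n_filters

dense_small = dense_params(32 * 32 * 3, 64)     # flattened 32x32 RGB image
dense_large = dense_params(224 * 224 * 3, 64)   # flattened 224x224 RGB image
conv_any_size = conv_params(3, 3, 64)           # same for either image size
print(dense_small, dense_large, conv_any_size)
```

The dense layer grows with the image size, while the convolutional layer has the same number of parameters for both images.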

How to reduce overfitting in a CNN model?

Add more data
Data Augmentation
L1/L2 Regularization
Batch Normalization
Dropout

Why do we need data augmentation?

To generate more training data
To reduce overfitting (improve the model's generalization)

It includes image rotation, scaling, flipping, and zooming.
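
Two of these transformations can be sketched on a tiny image stored as nested lists. The helper names are hypothetical, and real pipelines typically use an image library rather than hand-written functions.

```python
def horizontal_flip(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]

def rotate_90_clockwise(image):
    """Reverse the row order, then transpose."""
    return [list(col) for col in zip(*image[::-1])]

image = [
    [1, 2],
    [3, 4],
]
flipped = horizontal_flip(image)
rotated = rotate_90_clockwise(image)

# One original image now yields three training samples.
augmented = [image, flipped, rotated]
print(flipped, rotated, len(augmented))
```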

Why do we need pretrained models?

Lack of sufficient labeled data
CNNs are computationally expensive to train from scratch

What is pooling?

Also called downsampling, because it reduces dimensionality. It keeps only the most relevant features of the input, which reduces complexity, improves the network’s performance, and helps avoid overfitting. The pooling operation slides a two-dimensional filter over each channel of the feature map and summarizes the features within the region covered by the filter.

Feature map

The output of a convolutional layer: a numerical representation of the image that is used to identify patterns in it.

ResNet

ResNet stands for residual network, which refers to the residual blocks that make up the architecture of the network.

Residual connections

Skip connections, also known as residual connections, are implemented by adding the output of an earlier layer to the output of a later layer. They preserve information from earlier layers, which helps the network learn better representations of the input data and mitigates the vanishing gradient problem.
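
The addition can be sketched with plain lists. This is a minimal illustration, not ResNet's actual block: the "layers" here are stand-in functions with hypothetical names, whereas a real residual block uses convolutions, batch normalization and activations.

```python
def residual_block(x, layer_fn):
    """Return layer_fn(x) + x element-wise: the skip connection adds the
    block's input back onto its output."""
    fx = layer_fn(x)
    return [f + xi for f, xi in zip(fx, x)]

def zero_layer(x):
    """Stand-in layer that has learned nothing (outputs zeros)."""
    return [0.0 for _ in x]

def halve_layer(x):
    """Stand-in layer that outputs half of its input."""
    return [0.5 * v for v in x]

x = [1.0, -2.0, 3.0]
identity_out = residual_block(x, zero_layer)
residual_out = residual_block(x, halve_layer)
print(identity_out, residual_out)
```

When the inner layer outputs zeros, the block passes its input through unchanged, so information from earlier layers is preserved.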

Key Features of ResNet-50

* ILSVRC’15 classification winner (3.57% top-5 error, reported for a ResNet ensemble)
* Has other variants also (with 34, 101, 152 layers)
* Basic residual blocks (ResNet-18/34) have two 3×3 convolution layers; ResNet-50 and deeper use bottleneck blocks (1×1, 3×3, 1×1)
* No FC layer, except one last 1000-way FC softmax layer for classification
* Global average pooling layer after the last convolution
* Batch Normalization after every convolution layer
* Trained with SGD + momentum (0.9)
* No dropout used