CNN Flashcards
ImageNet dataset
Large labeled image dataset covering everyday objects and animals. The counts below are for the ILSVRC 2012 (ImageNet-1k) subset:
* 1,281,167 training images,
* 50,000 validation images, and
* 100,000 test images
* 1000 classes
CNN
A special kind of neural network for processing data with a grid-like topology, such as time series data (1D) or image data (2D).
A CNN consists of:
1. Convolution layer
2. Pooling layer
3. Fully connected layer (ANN)
Why not use an ANN on image data?
- Computational complexity (large number of pixels, so a large number of weights)
- Overfitting
- Loss of spatial arrangement (the 2D image is flattened into a 1D vector)
How does a CNN work on image data?
The initial convolutional layers extract primitive features (such as edges).
Deeper layers combine these into progressively more complex features.
Greyscale image
Shades of grey from black to white
Single channel
Pixel values in the range [0, 255] (8-bit)
Color image
RGB
Three channels (red, green, blue)
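A minimal sketch of how the two image types above are usually stored as arrays, assuming NumPy and 8-bit pixels. The 4x4 size and the pixel values are hypothetical, chosen only to show the shapes.

```python
import numpy as np

# Hypothetical 4x4 images, used only to illustrate the array shapes.
gray = np.zeros((4, 4), dtype=np.uint8)     # single channel: (height, width)
rgb = np.zeros((4, 4, 3), dtype=np.uint8)   # three channels: (height, width, channel)

gray[0, 0] = 255          # brightest 8-bit value (white)
rgb[0, 0] = [255, 0, 0]   # pure red pixel: R=255, G=0, B=0

print(gray.shape, rgb.shape)
```

Some libraries store the channel axis first instead of last; the layout here is just one common convention.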
What is convolution?
Convolution slides a kernel (filter) over the input. At each position, the kernel is multiplied element-wise with the pixels under it and the products are summed into one value. The grid of these values is the feature map.
The process of detecting features in an image is called convolution.
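The sliding multiply-and-sum can be sketched in NumPy as follows. This is an illustrative sketch, not a library implementation: `conv2d` is a hypothetical helper, the kernel is assumed square, no padding is applied, and the kernel is not flipped (strictly this is cross-correlation, which is what deep learning libraries compute under the name convolution). The toy image and the vertical-edge kernel are hand-picked for the example.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide a square kernel over a 2D image (no padding) and
    sum the element-wise products at each position."""
    f = kernel.shape[0]
    out_h = (image.shape[0] - f) // stride + 1
    out_w = (image.shape[1] - f) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            window = image[r:r + f, c:c + f]
            out[i, j] = np.sum(window * kernel)   # multiply element-wise, then sum
    return out

# Toy 6x6 image: dark left half (0), bright right half (1).
image = np.array([[0, 0, 0, 1, 1, 1]] * 6, dtype=float)

# Hand-picked kernel that responds to vertical edges.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

feature_map = conv2d(image, kernel)
print(feature_map[0])   # each row is [0, 3, 3, 0]: strong response at the edge
```

In a trained CNN the kernel values are learned rather than hand-picked.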
How are the filter (kernel) values decided?
Initialized with random values.
Learned during training through backpropagation.
What is a filter?
A matrix of weights that slides over the input pixels. At each position it is multiplied element-wise with the pixels under it, and the products are summed to give a single output pixel.
What is padding?
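Edge pixels fall under fewer filter positions than central pixels, so they contribute less to the output. Padding adds extra border pixels (usually zeros) around the input so that edge pixels contribute more evenly. It also keeps the feature map from shrinking.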
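To make the effect of padding on edge pixels concrete, the sketch below counts how many filter positions touch each original pixel, with and without padding. `coverage` is a hypothetical helper written for this card, assuming a square image, a square filter, and stride 1.

```python
import numpy as np

def coverage(n, f, p):
    """Count how many f x f windows (stride 1) touch each original
    pixel of an n x n image padded with p pixels on every side."""
    size = n + 2 * p
    counts = np.zeros((size, size), dtype=int)
    for i in range(size - f + 1):
        for j in range(size - f + 1):
            counts[i:i + f, j:j + f] += 1
    return counts[p:size - p, p:size - p]   # keep only the original pixels

no_pad = coverage(n=4, f=3, p=0)
padded = coverage(n=4, f=3, p=1)

print(no_pad[0, 0], no_pad[1, 1])   # corner vs central pixel: 1 vs 4
print(padded[0, 0], padded[1, 1])   # corner vs central pixel: 4 vs 9
```

In this example padding narrows the gap between corner and central pixels (from 1:4 to 4:9) without closing it completely.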
What is stride?
Stride decides how far the filter moves across the input at each step, e.g. one pixel at a time (stride 1) or two (stride 2).
Valid padding
No padding
Same padding
Padding chosen automatically so that the feature map has the same size as the input image (with stride 1)
Formula for the output size after convolution
floor((n + 2p - f) / s) + 1
where n = input size, p = padding, f = filter size, s = stride
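The formula above can be checked with a few worked cases. This is a small sketch: `conv_output_size` is a hypothetical helper, and it assumes the division is rounded down when the filter does not fit evenly.

```python
def conv_output_size(n, f, p=0, s=1):
    """Output size along one dimension: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

valid = conv_output_size(n=6, f=3)          # valid padding (p=0): 6 -> 4
same = conv_output_size(n=6, f=3, p=1)      # same padding for f=3: 6 -> 6
strided = conv_output_size(n=7, f=3, s=2)   # stride 2: 7 -> 3
uneven = conv_output_size(n=6, f=3, s=2)    # (6 - 3) / 2 is rounded down: 6 -> 2

print(valid, same, strided, uneven)
```

For stride 1 and an odd filter size, same padding corresponds to p = (f - 1) / 2.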
Why are strides required?
- Keep only coarser, higher-level features when fine detail is not needed
- Produce a smaller feature map, which reduces computation and memory
Why is pooling required?
Because convolution alone has two problems:
- Memory issue (feature maps are large)
- Translation variance (features are tied to their exact position)
Increasing the stride addresses the memory issue, but it does not solve the translation variance problem.
Pooling downsamples the feature map.
Translation Invariance
The ability to ignore positional shifts, or translations, of the target in the image.
Types of pooling
- Max pooling
- Average pooling
- Global pooling (Global max & Global avg)
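The pooling types listed above can be sketched in NumPy. `pool2d` is a hypothetical helper that assumes non-overlapping windows (stride equal to the window size); the feature map values are made up for illustration.

```python
import numpy as np

def pool2d(fmap, size=2, mode="max"):
    """Non-overlapping pooling (stride == size) over a 2D feature map."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    # Split the map into (h x w) blocks of shape (size x size).
    blocks = fmap[:h * size, :w * size].reshape(h, size, w, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 1],
                 [0, 0, 5, 6],
                 [1, 2, 7, 8]], dtype=float)

max_pooled = pool2d(fmap, mode="max")   # [[4, 2], [2, 8]]
avg_pooled = pool2d(fmap, mode="avg")   # [[2.5, 1.0], [0.75, 6.5]]
global_max = fmap.max()                 # one value for the whole map: 8.0
global_avg = fmap.mean()                # one value for the whole map: 2.6875
```

Each 2x2 block collapses to one value, so the 4x4 map becomes 2x2. Global pooling collapses the whole map to a single value.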
Advantages of pooling
- Reduced feature map size (due to downsampling)
- Translation invariance
- Enhanced features (max pooling only, since it keeps the strongest activation in each window)
Disadvantages of pooling
- Not recommended for image segmentation tasks, where exact pixel positions matter
- Loss of information
ANN vs CNN
Similarity:
1. Both compute input * weights + bias. In a CNN, the weights are the filter weights.
Differences:
1. In a CNN's convolutional layers, the number of learnable parameters does not depend on the input size. In a fully connected ANN it does.
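A rough parameter count makes the difference concrete. Both helpers below are hypothetical, and the layer sizes (64 units, 64 filters of size 3x3, RGB input) are assumptions picked only for the comparison.

```python
def dense_params(n_inputs, n_units):
    """Fully connected layer: one weight per input per unit, plus one bias per unit."""
    return n_inputs * n_units + n_units

def conv_params(f, c_in, n_filters):
    """Convolution layer: f*f*c_in weights plus one bias, per filter."""
    return (f * f * c_in + 1) * n_filters

# Fully connected layer on a flattened RGB image: grows with the image size.
dense_small = dense_params(32 * 32 * 3, 64)     # 196,672
dense_large = dense_params(224 * 224 * 3, 64)   # 9,633,856

# Convolution layer with 64 filters of size 3x3 on an RGB image: same for any image size.
conv_any = conv_params(f=3, c_in=3, n_filters=64)   # 1,792

print(dense_small, dense_large, conv_any)
```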
How to reduce overfitting in a CNN model?
- Add more data
- Data Augmentation
- L1/L2 Regularization
- Batch Normalization
- Dropout
Why do we need data augmentation?
- To generate more training data
- To reduce overfitting (improves generalization of the model)
It includes image rotation, scaling, flipping, and zooming.
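A few of these transformations can be sketched with plain NumPy. The small integer array is a stand-in for a grayscale image, and the 2x zoom by pixel repetition is just one simple way to scale; real pipelines typically use an image library with random parameters.

```python
import numpy as np

image = np.arange(12).reshape(3, 4)   # stand-in for a 3x4 grayscale image

flipped_lr = np.fliplr(image)   # horizontal flip
flipped_ud = np.flipud(image)   # vertical flip
rotated_90 = np.rot90(image)    # 90-degree counter-clockwise rotation

# Nearest-neighbour 2x zoom: repeat every pixel along both axes.
zoomed_2x = np.repeat(np.repeat(image, 2, axis=0), 2, axis=1)

print(flipped_lr.shape, rotated_90.shape, zoomed_2x.shape)   # (3, 4) (4, 3) (6, 8)
```

Each transformed copy keeps the original label, so one labeled image yields several training examples.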
Why do we need pretrained models?
- Limited labeled data
- Training a CNN from scratch is computationally expensive