Algorithms for semantic segmentation Flashcards

1
Q

The Fully Convolutional Network

A

FCN introduced the idea of an end-to-end convolutional network for segmentation. Any standard CNN can be turned into an FCN by removing the fully connected layers and replacing them with convolution layers. In a classification CNN the depth is higher in the final layers while the spatial size is smaller; for segmentation, the spatial dimensions have to be preserved, so the network is constructed without max pooling. The loss is computed by averaging the cross-entropy loss over every pixel and over the mini-batch. The final layer has a depth equal to the number of classes. The output produced by the architecture is coarse, so some pixels may be mispredicted, and the computational cost is high.
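The loss described above can be sketched in NumPy. The `pixelwise_cross_entropy` helper and the tensor layout (batch × classes × height × width) are illustrative assumptions, not part of the original FCN code:

```python
import numpy as np

def pixelwise_cross_entropy(logits, labels):
    """logits: (N, C, H, W) class scores; labels: (N, H, W) integer class ids."""
    # Softmax over the class axis, computed stably
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = e / e.sum(axis=1, keepdims=True)
    n, c, h, w = logits.shape
    n_idx = np.arange(n)[:, None, None]
    h_idx = np.arange(h)[None, :, None]
    w_idx = np.arange(w)[None, None, :]
    picked = probs[n_idx, labels, h_idx, w_idx]  # probability of the true class per pixel
    return -np.log(picked).mean()                # average over every pixel and the mini-batch
```

With uniform (all-zero) logits over C classes, the loss is log C, which is a quick sanity check for an implementation like this.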

2
Q

SegNet

A

SegNet uses an encoder and decoder approach. The encoder has various convolution layers and the decoder has deconvolution layers. SegNet improved on the coarse outputs produced by FCN while being less intensive on memory. When the features are reduced in dimension by the encoder, they are upsampled back to the image size by deconvolution, reversing the effect of the convolutions; the deconvolution learns the parameters for upsampling. Even so, the output remains somewhat coarse because of the information lost in the pooling layers.

3
Q

Upsampling the layers by pooling

A

Max pooling is a sampling strategy that picks the maximum value from a window. The operation can be approximately reversed for upsampling: each value is surrounded with zeros so that the layer grows back to its original size. Up-pooling can be improved by remembering the locations of the maxima during downsampling and placing the values back at exactly those locations during upsampling (max unpooling). This index-wise upsampling yields better results than simply appending zeros. Upsampling by pooling is not learned; it works as a fixed operation.
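A minimal NumPy sketch of 2×2 max pooling with remembered indices and the corresponding index-wise unpooling (the function names are illustrative; a real network applies this per channel):

```python
import numpy as np

def max_pool_with_indices(x):
    """2x2 max pool on an (H, W) array; also return each max's
    within-window index so the unpooling step can restore locations."""
    h, w = x.shape
    blocks = x.reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3).reshape(h // 2, w // 2, 4)
    idx = blocks.argmax(axis=-1)            # remembered downsampling locations
    return blocks.max(axis=-1), idx

def max_unpool(pooled, idx):
    """Place each pooled value back at its remembered location; zeros elsewhere."""
    h, w = pooled.shape
    blocks = np.zeros((h, w, 4))
    r, c = np.meshgrid(np.arange(h), np.arange(w), indexing='ij')
    blocks[r, c, idx] = pooled
    return blocks.reshape(h, w, 2, 2).transpose(0, 2, 1, 3).reshape(h * 2, w * 2)
```

Round-tripping a layer through these two functions keeps each surviving value at its original position, which is exactly the property that makes index-wise upsampling better than padding with zeros at fixed positions.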

4
Q

Sampling the layers by convolution

A

Layers can be downsampled or upsampled directly using convolution. Increasing the stride of a convolution causes downsampling; this is called strided convolution. The process can be reversed to upsample by learning a kernel: upsampling directly with a convolution is termed transposed convolution, deconvolution, fractionally strided convolution, or up-convolution. (Atrous or dilated convolution, covered in a later card, is different: it widens the receptive field without downsampling.)
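A 1-D NumPy sketch of both directions (function names are illustrative): a strided convolution that downsamples, and a transposed convolution that scatters scaled kernel copies to upsample:

```python
import numpy as np

def conv1d_strided(x, k, stride=2):
    """Valid 1-D convolution with stride > 1: the output is downsampled."""
    out = [np.dot(x[i:i + len(k)], k) for i in range(0, len(x) - len(k) + 1, stride)]
    return np.array(out)

def conv1d_transposed(y, k, stride=2):
    """Transposed (fractionally strided) convolution: every input value
    scatters a scaled copy of the kernel into the output, `stride` apart."""
    out = np.zeros(stride * (len(y) - 1) + len(k))
    for i, v in enumerate(y):
        out[i * stride:i * stride + len(k)] += v * k
    return out
```

In a network the kernel `k` is learned, which is what distinguishes these from the fixed pooling-based resampling of the previous card.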

5
Q

Skipping connections for better training

A

The coarseness of the segmentation output can be limited by a skip architecture: predictions from finer, shallower layers are combined with upsampled predictions from coarser, deeper layers, and higher resolutions can be obtained.
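A toy NumPy illustration of an FCN-style skip, assuming simple nearest-neighbour upsampling and additive fusion (the score values are made up):

```python
import numpy as np

# Deep layers give coarse, low-resolution score maps...
deep_scores = np.array([[1.0, 2.0],
                        [3.0, 4.0]])
# ...while shallower layers give finer, higher-resolution score maps
shallow_scores = np.full((4, 4), 0.25)

# Upsample the coarse map 2x (nearest neighbour) and fuse by addition
upsampled = np.repeat(np.repeat(deep_scores, 2, axis=0), 2, axis=1)
fused = upsampled + shallow_scores   # finer output than deep_scores alone
```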

6
Q

Dilated convolutions

A

Pixel-wise classification and image classification are structurally different. Pooling layers discard spatial information and hence produce coarse segmentation, yet pooling is essential for obtaining a wider view (a larger receptive field) and for keeping computation manageable. Dilated convolution was introduced to solve this problem: it samples less lossily while still having a wide view. A dilated convolution is essentially a convolution whose kernel taps are spaced a fixed number of pixels apart (the dilation rate) within the window, and the dilation rate can vary from layer to layer. The output of such a network is upscaled for a finer resolution, and a separate network is trained for multi-scale aggregation.
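A 1-D NumPy sketch of a dilated convolution (the function name is illustrative): the kernel taps are `rate` pixels apart, so the receptive field spans rate*(len(k)-1)+1 input pixels without discarding any resolution:

```python
import numpy as np

def dilated_conv1d(x, k, rate):
    """1-D convolution whose kernel taps are `rate` pixels apart: the
    receptive field grows with `rate` but the signal is not downsampled."""
    span = rate * (len(k) - 1) + 1
    return np.array([np.dot(x[i:i + span:rate], k)
                     for i in range(len(x) - span + 1)])
```

With `rate=1` this reduces to an ordinary convolution; stacking layers with increasing rates is how such networks grow the view exponentially.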

7
Q

DeepLab

A

DeepLab, proposed by Chen et al., performs convolutions on multiple scales and uses the features from the various scales to obtain a score map. The score map is then interpolated and passed through a conditional random field (CRF) for the final segmentation. The multi-scale processing can be performed either by processing images of various sizes, each with its own CNN, or by parallel convolutions with varying dilation rates.

DeepLab v3 adds batch normalization to improve performance, and the multi-scale features are encoded in a cascaded fashion.

8
Q

RefineNet

A

Dilated convolutions need bigger inputs and hence are memory intensive, which presents computational problems when using high-resolution pictures. RefineNet overcomes this problem by using an encoder followed by a decoder. The encoder consists of the outputs of a CNN; the decoder concatenates features of various sizes, upscaling the lower-resolution features before concatenation.

9
Q

PSPNet

A

Global context is utilized in PSPNet, introduced by Zhao et al., by increasing the kernel size of the pooling layers. The pooling is carried out in a pyramid fashion: the pyramid covers various portions and sizes of the image simultaneously. An auxiliary loss partway through the architecture provides intermediate supervision.
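A NumPy sketch of the pyramid pooling idea, assuming the feature map divides evenly into the bins and using nearest-neighbour upsampling (the real network pools learned feature channels and concatenates them):

```python
import numpy as np

def pyramid_pool(feat, bins=(1, 2, 4)):
    """Pool an (H, W) feature map into b x b bins for each pyramid level,
    upsample each level back to (H, W), and stack all levels: coarse global
    context alongside the original fine detail."""
    h, w = feat.shape
    levels = [feat]
    for b in bins:
        pooled = feat.reshape(b, h // b, b, w // b).mean(axis=(1, 3))  # b x b averages
        up = np.repeat(np.repeat(pooled, h // b, axis=0), w // b, axis=1)
        levels.append(up)
    return np.stack(levels)
```

The 1×1 level summarizes the whole image in one value, which is the "global context" the card refers to.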

10
Q

Large kernel matters

A

Peng et al. showcased the importance of large kernels. Large kernels have bigger receptive fields than small kernels. The computational cost of these large kernels can be overcome by approximating them with combinations of smaller kernels (such as a k×1 convolution followed by a 1×k one). A boundary refinement network at the end sharpens the object edges.
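The saving can be illustrated with a separable kernel, where the factorization is exact: convolving with a k×1 kernel and then a 1×k kernel matches the full k×k convolution while using 2k instead of k² weights (Peng et al. combine both orderings to approximate general kernels):

```python
import numpy as np
from scipy.signal import convolve2d

k = 7
rng = np.random.default_rng(0)
col = rng.standard_normal((k, 1))   # k x 1 kernel: k weights
row = rng.standard_normal((1, k))   # 1 x k kernel: k weights
full = col @ row                    # equivalent k x k kernel: k*k weights

img = rng.standard_normal((32, 32))
out_full = convolve2d(img, full, mode='valid')                               # one big conv
out_sep = convolve2d(convolve2d(img, col, mode='valid'), row, mode='valid')  # two small convs
```

Here `out_full` and `out_sep` agree to floating-point precision, at 14 weights instead of 49 for k = 7.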

11
Q

UNET

A

The UNET model proposed by Ronneberger et al. resembles an autoencoder but with convolutions instead of fully connected layers. There is an encoder part with convolutions of decreasing spatial dimensions and a decoder part with increasing dimensions. Feature maps from the similarly sized encoder and decoder stages are linked by skip connections. The output of the model is a mask with values ranging between 0 and 1.

12
Q

Contour detection

A

One of the simplest techniques of segmentation. Contours are the boundaries of objects in an image and always form closed loops. A contour detection algorithm tries to group edges together so that they result in a closed loop.
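A short scikit-image sketch on a synthetic binary image (the filled square is a stand-in for a real object):

```python
import numpy as np
from skimage import measure

# A binary image containing one filled square; its boundary is a closed loop
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0

# Each contour is an (N, 2) array of (row, col) points along an iso-valued
# curve; contours fully inside the image start and end at the same point
contours = measure.find_contours(img, level=0.5)
```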

13
Q

The Watershed algorithm

A

Find the local gradients of the image and identify all the local minima. These local minima give an approximate idea of where the objects could be located; we call the local minima markers. Assign each marker a unique colour and then start filling these colours outward until we reach the boundary of an adjacent marker. We essentially fill out the regions in the image, each with a unique colour.

Algorithm:

  1. Read the image
  2. Convert to grayscale
  3. Convert the image pixel values to unsigned int using the img_as_ubyte() function, because the gradient function expects the image in a certain format.
  4. Calculate the local gradients of the image.
  5. Apply the Watershed algorithm.
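The steps above can be sketched with scikit-image on a synthetic grayscale image (the blob image, the marker threshold, and the disk sizes are illustrative choices):

```python
import numpy as np
from scipy import ndimage as ndi
from skimage import filters, morphology, segmentation, util

# Steps 1-2 stand-in: a synthetic grayscale image with two bright blobs
img = np.zeros((80, 80))
rr, cc = np.ogrid[:80, :80]
img[(rr - 25) ** 2 + (cc - 25) ** 2 < 150] = 1.0
img[(rr - 55) ** 2 + (cc - 55) ** 2 < 150] = 1.0

# Step 3: rank filters expect an unsigned-int image
gray = util.img_as_ubyte(img)

# Step 4: local gradient; it is high on object boundaries
gradient = filters.rank.gradient(gray, morphology.disk(2))

# Markers: connected flat (low-gradient) regions approximate the local minima
markers, _ = ndi.label(filters.rank.gradient(gray, morphology.disk(5)) < 10)

# Step 5: flood the gradient image outward from the markers
labels = segmentation.watershed(gradient, markers)
```

Each value in `labels` is the "colour" of the marker whose flood reached that pixel first, so the background and the two blobs come out as distinct regions.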
14
Q

Superpixels

A

Images are usually dealt with at the granularity of a pixel, but this can be computationally expensive: you do not always want to iterate through all the pixels in the image. As an attempt to remove redundancy among the pixels of an image, we combine nearby pixels that have similar colour values into clusters called superpixels. Superpixels are used as the starting point for many image segmentation algorithms because they increase the efficiency of those algorithms.

from skimage import segmentation, color
from skimage.io import imread
from matplotlib import pyplot as plt

img = imread('')  # supply an image path here
img_segments = segmentation.slic(img, compactness=20, n_segments=500)
superpixels = color.label2rgb(img_segments, img, kind='avg')
plt.imshow(superpixels)
plt.show()

15
Q

Normalized graph cut

A

One of the most popular image segmentation techniques today. The simplest explanation of the graph cut technique is that each pixel in the image is treated as a node. Apart from these nodes, we have some extra nodes, each representing, say, an object in the image. All pixels are connected to their adjacent pixels, and each pixel is also connected to the object nodes.

After we have defined our graph, we iteratively cut edges in the graph to obtain subgraphs, until the graph cannot be cut into subgraphs any further. At that point each pixel in the image is connected to exactly one object node, so we can label each pixel with the object it belongs to.

  1. Read image
  2. Perform k-means clustering over colour values.
  3. Use clustered pixels and create a weight graph over these clusters. The weight of each edge is determined by how similar two regions are.
  4. Apply the normalized graph cut technique over the graph obtained in the last step.