4. Image Processing Flashcards
What is image processing / filtering? Why is it done?
It is converting one image into a different one according to some algorithm
- Reduce noise (smartphones always have noise)
- Fill in missing information (e.g. demosaicing the Bayer grid)
- Extract image features (edges, corners)
What are images mathematically?
It is a function f: [a, b] × [c, d] → [0, m]
where [a, b] and [c, d] are the dimensions of the image (start and end of the rows/columns) and m is the maximum intensity value (e.g. 0-255 in grayscale).
For color images the range is [0, m]^3 (one value each for r, g, b).
What are the properties of linear filters?
- Homogeneity: amplifying the image first and then applying the filter gives the same result as filtering first and then amplifying: F(a·I) = a·F(I)
- Additivity: applying the filter to the sum of two images is the same as filtering each image separately and then summing the results: F(I1 + I2) = F(I1) + F(I2)
- Superposition: the combination of the two: F(a·I1 + b·I2) = a·F(I1) + b·F(I2)
How can linear filters be done using vector-matrix operations?
We first flatten the image matrix into one long vector of all pixel values (e.g. 2,000,000 entries) and then multiply it with a 2,000,000 × 2,000,000 filter matrix.
This is very expensive for a simple linear filter.
Problem: a general matrix is not necessarily shift-invariant: shifting the image by a few pixels and then filtering can give a different result than filtering first and then shifting.
Convolution is better
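A minimal sketch (NumPy, using a short 1D signal for readability) of how the same linear filter can be written either as a convolution or as a multiplication with a large, mostly-zero matrix whose rows are shifted copies of the kernel; the signal and kernel values are arbitrary example choices:

```python
import numpy as np

signal = np.array([1., 4., 2., 7., 3.])
kernel = np.array([0.25, 0.5, 0.25])   # small symmetric smoothing filter

# Build the equivalent filter matrix: row i holds the kernel centered on sample i.
# (Boundary rows are simply truncated, which corresponds to zero padding.)
n, k = len(signal), len(kernel)
M = np.zeros((n, n))
for i in range(n):
    for j, w in enumerate(kernel):
        col = i + j - k // 2
        if 0 <= col < n:
            M[i, col] = w

# Matrix-vector product and direct convolution give the same result
# (the kernel is symmetric here, so flipping it would change nothing).
print(M @ signal)
print(np.convolve(signal, kernel, mode="same"))
```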
What is convolution? How is it performed?
It is a way to perform a linear filter on the image.
We have a filter (kernel). We slide a window over the whole image and multiply each image value with the kernel value at the OPPOSITE (mirrored) position, then sum the products.
f*g
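A minimal NumPy sketch of the sliding-window idea, assuming zero padding at the borders; the kernel flip is exactly what distinguishes convolution from correlation:

```python
import numpy as np

def convolve2d_naive(image, kernel):
    """Direct 2D convolution with zero padding (illustrative, not optimized)."""
    kh, kw = kernel.shape
    flipped = kernel[::-1, ::-1]                    # mirror the kernel in both axes
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)))    # zero padding by default
    out = np.zeros_like(image, dtype=float)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            window = padded[y:y + kh, x:x + kw]
            out[y, x] = np.sum(window * flipped)    # elementwise multiply, then sum
    return out

# Example: 3x3 box blur of a random image
image = np.random.rand(5, 5)
box = np.ones((3, 3)) / 9.0
print(convolve2d_naive(image, box))
```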
What are the properties of convolution?
- can be represented as matrix-vector product
- linear
- associative
- commutative
- shift-invariant: shifting the image and then convolving gives the same result as convolving and then shifting
What is correlation?
It is similar to convolution but we do not mirror the filter (we multiply the pixels with the kernel on the EXACT position)
What are the properties of noise-pixel? What could be the cause of those noise-pixels?
A noise pixel is an outlier compared to its neighbours (it has a much lower or higher intensity).
- Light fluctuations (more photons hit one sensor cell than another), sensor noise (fluctuations when reading out the voltage levels), quantization effects (a continuous scene is quantized onto a finite grid, with a finite (integer) intensity)…
How to deal with noise using linear filters?
- Average filter: we use a 3x3 filter with all values 1/9 (they sum to 1), so the resulting pixel intensity is the average of its 3x3 neighbourhood. This is also called a box filter.
- Gaussian filter: weighted average -> weights nearby pixels more than distant ones
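A short sketch of both filters, assuming scipy.ndimage is available; the kernel size and sigma are arbitrary example values:

```python
import numpy as np
from scipy import ndimage

noisy = np.random.rand(100, 100)

# Box (average) filter: every pixel becomes the mean of its 3x3 neighbourhood.
box_smoothed = ndimage.uniform_filter(noisy, size=3)

# Gaussian filter: weighted average, nearby pixels count more than distant ones.
gauss_smoothed = ndimage.gaussian_filter(noisy, sigma=1.0)
```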
What is a box filter?
It is a linear filter that calculates the pixel intensity as the average of its neighbourhood. It tends to produce box-shaped artifacts.
What is a gaussian filter?
Essentially it is a linear filter that weights nearby pixels more than distant ones, according to the gaussian distribution. The 2D kernel is built by combining one gaussian along the x-axis with one gaussian along the y-axis.
This removes the box artifacts of the box filter and gives smoother results.
How to make box filter and gaussian filter more efficient?
Since they are separable (can be represented as a convolution of two 1D filters): the gaussian as a horizontal followed by a vertical 1D gaussian, and the box filter as a horizontal followed by a vertical 1D average.
Think of it as passing over the image twice: once along all rows and once along all columns.
This way, instead of 9 multiplications per pixel (3x3 kernel), we only need 6 (3 + 3), as the sketch below demonstrates.
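A small NumPy/SciPy sketch, using a simple 3-tap gaussian approximation as the example kernel, showing that two 1D passes give the same result as one 2D pass with the outer-product kernel:

```python
import numpy as np
from scipy import ndimage

image = np.random.rand(64, 64)

# 1D gaussian (a crude 3-tap approximation) and its 2D outer product
g1 = np.array([0.25, 0.5, 0.25])
g2 = np.outer(g1, g1)                      # full 3x3 gaussian kernel (9 weights)

# One 2D pass ...
full_pass = ndimage.convolve(image, g2, mode="constant")

# ... versus two 1D passes: once along the rows, once along the columns
horiz = ndimage.convolve(image, g1[np.newaxis, :], mode="constant")
separable = ndimage.convolve(horiz, g1[:, np.newaxis], mode="constant")

print(np.allclose(full_pass, separable))   # True: 6 multiplications per pixel instead of 9
```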
How to deal with boundaries in convolution?
Since convolution reduces the image size (a 3x3 filter shrinks it by 2 pixels in each dimension), we need to add some padding to the original image before the convolution. How to choose the padding?
- Leave them as 0 (black pixels): this introduces dark edges in the blurred image
- Wrap: pretend the image is periodic and continues on the other side: pixel n+1 takes the value of pixel 1, n+2 that of pixel 2, …
- Clamp: just repeat the last pixel as far as needed
- Mirror: reflect the image at the border: pixel n+1 takes the value of pixel n-1, n+2 that of pixel n-2, …
The wrap method has a higher chance of introducing artifacts, but the choice depends on the use case and the images we have.
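As an illustration, NumPy's np.pad implements these strategies directly; a 1D row makes the behaviour easy to read:

```python
import numpy as np

row = np.array([1, 2, 3, 4])

print(np.pad(row, 2, mode="constant"))   # [0 0 1 2 3 4 0 0]  zeros -> dark borders
print(np.pad(row, 2, mode="wrap"))       # [3 4 1 2 3 4 1 2]  image repeats around
print(np.pad(row, 2, mode="edge"))       # [1 1 1 2 3 4 4 4]  clamp: repeat last pixel
print(np.pad(row, 2, mode="reflect"))    # [3 2 1 2 3 4 3 2]  mirror at the border
```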
How to remove noise but while preserving edges?
Using a non-linear filter called the median filter. It is similar to the average filter, but instead of the mean we take the median of the neighbourhood (sort the values and take the middle one). This preserves the sharp jump at an edge while still smoothing the image somewhat.
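A minimal sketch with SciPy's median filter, assuming a single bright outlier pixel as the noise:

```python
import numpy as np
from scipy import ndimage

image = np.random.rand(50, 50)
image[25, 25] = 10.0                     # a single "salt" outlier pixel

# Median of each 3x3 neighbourhood: the outlier is replaced, edges stay sharp
denoised = ndimage.median_filter(image, size=3)
```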
What are morphological filters?
They are filters that usually apply only to binary images. There are two types:
- Dilation: if there is at least one 1 in the image part under the filter, the result is 1. This expands the foreground.
- Erosion: Results in 1 only if all image pixels of the kernel are 1. If there is at least one 0, it results in 0.
This can be generalized for grayscale images
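A small sketch of both operations on a binary image, using SciPy's morphology functions with an assumed 3x3 structuring element:

```python
import numpy as np
from scipy import ndimage

binary = np.zeros((7, 7), dtype=bool)
binary[3, 3] = True                      # a single foreground pixel

# Dilation: 1 if ANY pixel under the 3x3 structuring element is 1 -> foreground grows
dilated = ndimage.binary_dilation(binary, structure=np.ones((3, 3)))

# Erosion: 1 only if ALL pixels under the structuring element are 1 -> foreground shrinks
eroded = ndimage.binary_erosion(dilated, structure=np.ones((3, 3)))
```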
What are image pyramids? What is the general process?
It is representing one image at multiple scales (resolutions). It creates several lower-resolution versions of the image so that there is a smaller search space in which to find an object. Once the object is found in a small image, this information is propagated to the larger images and the object is pinpointed in the original image.
What are gaussian pyramids?
It is a pyramid where first we apply the gaussian filter and then do downsampling (take every 2nd pixel both row-wise and column-wise)
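A minimal sketch of the blur-then-subsample loop, assuming SciPy's gaussian filter and an arbitrary sigma of 1.0:

```python
import numpy as np
from scipy import ndimage

def gaussian_pyramid(image, levels=4, sigma=1.0):
    """Blur, then keep every 2nd pixel in each direction, repeatedly."""
    pyramid = [image]
    for _ in range(levels - 1):
        blurred = ndimage.gaussian_filter(pyramid[-1], sigma=sigma)
        pyramid.append(blurred[::2, ::2])          # downsample by a factor of 2
    return pyramid

levels = gaussian_pyramid(np.random.rand(256, 256))
print([l.shape for l in levels])                   # (256,256), (128,128), (64,64), (32,32)
```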
Why do we have to apply gaussian filter before downsampling in gaussian pyramid?
Because high frequencies (sharp edges, sharp color transitions) cannot be represented at the lower resolution anymore.
Without the smoothing we experience aliasing: small structures might appear as larger ones in the downsampled image.
What is aliasing in the context of pyramids?
It appears when we do not apply any gaussian filtering before downsampling in the gaussian pyramid. It can cause small objects to appear larger in the low-resolution (downsampled) images, and it can also introduce patterns that misrepresent the image.
What is edge detection and why do we need it? What are edges?
It is finding the edges of objects in the image. Edges are basically the lines that outline an object. We need it because humans can recognize objects from their drawings/edges alone, so we can assume that finding objects via their edges is also easier in image processing.
Edges are fast changes of intensity of pixels in the image (big color contrast).
What are the goals of edge detection?
- Good detection: the result corresponds to the edge of the object, not some noise
- Good localization: Edge is near the true edge of the object
- Single response: One line per edge
How to detect an edge in 1D? A line of the image?
The idea is that the edges correspond to fast changes (derivative is large)
- Apply gaussian smoothing
- Calculate the derivative of this curve
- Find local optima of the derivative that are above some preset threshold (ignore small changes)
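A rough sketch of these three steps on a synthetic 1D step edge; sigma, noise level, and threshold are arbitrary example values:

```python
import numpy as np
from scipy import ndimage

line = np.concatenate([np.full(20, 10.0), np.full(20, 200.0)])    # a step edge at index 20
line += np.random.normal(0, 2, line.shape)                         # some noise

smoothed = ndimage.gaussian_filter1d(line, sigma=2)                # 1. gaussian smoothing
deriv = np.convolve(smoothed, [0.5, 0, -0.5], mode="same")         # 2. central-difference derivative
edges = np.flatnonzero(np.abs(deriv) > 20)                         # 3. keep only strong responses
print(edges)   # indices around the step; a full detector would also keep only local maxima
```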
How to compute derivative of the 1D image-lines for edge detection?
- We can compute the 1st derivative and find its local extrema. This can be implemented as the linear filter 1 -1, but the derivative is then computed in between the pixels (the kernel has an even number of cells)
- It can also be implemented using the filter 1/2 * 1 0 -1, which is better because we do not have to shift the image: the derivative is computed at the pixel centers
- We can calculate the 2nd derivative and find zero crossings (not just where the derivative is 0, but where it actually crosses the y = 0 line)
How can we simplify edge detection when having a gaussian smoothing and an edge-detection filtering?
By first convolving the edge-detection filter with the gaussian filter, and then applying the result to the image. This reduces the computation. It works because of the associativity of convolution.
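A small 1D sketch of this associativity: smoothing and then differentiating equals one convolution with a precomputed "derivative of gaussian" kernel (the kernels below are arbitrary example values):

```python
import numpy as np

signal = np.random.rand(100)
gauss = np.array([0.05, 0.25, 0.4, 0.25, 0.05])    # small smoothing kernel
deriv = np.array([0.5, 0.0, -0.5])                  # central-difference kernel

# (signal * gauss) * deriv  ==  signal * (gauss * deriv)  thanks to associativity
two_passes = np.convolve(np.convolve(signal, gauss), deriv)
one_pass = np.convolve(signal, np.convolve(gauss, deriv))   # "derivative of gaussian" filter
print(np.allclose(two_passes, one_pass))                    # True
```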
Why the 1st derivative might be problematic in the real-world images?
Because small variations in pixel intensity already lead to large derivatives. Problem: where do we set the threshold on the 1st derivative?
How to detect either vertical or horizontal edges in 2D images?
Similarly to 1D: we compute the partial derivatives in both the x and y directions. They are approximated using linear filters:
x-direction:
1 0 -1
1/6 * 1 0 -1
1 0 -1
y-direction:
1 1 1
1/6 * 0 0 0
-1 -1 -1
This combines edge detection with smoothing (a box filter perpendicular to the derivative direction), but the box filter has its usual problem: it introduces box artifacts (poor smoothing).
We can instead use the Sobel filter (gaussian-like smoothing combined with edge detection). The idea of the gaussian: closer pixels have more influence than ones further away (see the sketch after the kernels below).
x-direction:
1 0 -1
1/8 * 2 0 -2
1 0 -1
y-direction:
1 2 1
1/8 * 0 0 0
-1 -2 -1
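A minimal sketch applying both Sobel kernels and combining them into the gradient magnitude (scipy.ndimage assumed; this anticipates steps 2 and 3 of the Canny process below):

```python
import numpy as np
from scipy import ndimage

image = np.random.rand(128, 128)

sobel_x = np.array([[1, 0, -1],
                    [2, 0, -2],
                    [1, 0, -1]]) / 8.0
sobel_y = np.array([[ 1,  2,  1],
                    [ 0,  0,  0],
                    [-1, -2, -1]]) / 8.0

Ix = ndimage.convolve(image, sobel_x)      # responds to vertical edges
Iy = ndimage.convolve(image, sobel_y)      # responds to horizontal edges
magnitude = np.hypot(Ix, Iy)               # gradient magnitude = combined edge strength
```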
What filter was used in this image?
x-direction 2D edge detection, because it responds to differences in the x direction (it fires where white is on the left and black on the right, and vice versa). That means we get the vertical edges, while the horizontal ones are detected poorly.
How can we detect edges in 2D images? Explain the whole Canny process
- We have an image and we apply a gaussian filter to reduce the noise.
- Compute the partial derivatives of the image (x and y directions) using Sobel filters. The Sobel filters also include gaussian-like smoothing. This results in two images: horizontal and vertical edge responses (Ix and Iy).
- Calculate the gradient magnitude sqrt(Ix^2 + Iy^2), combining Ix and Iy from step 2 into one image of edge strength.
- Apply some thresholding to keep only high gradient magnitudes. Consider hysteresis so that edges stay connected.
- Thinning: perform non-maximum suppression. This improves the localization: only pixels that are local maxima along the gradient direction are kept, which makes the edges thin.
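As an end-to-end sketch: if scikit-image is available, skimage.feature.canny wraps all of these steps in one call; the sigma and thresholds below are arbitrary example values:

```python
import numpy as np
from skimage import feature

image = np.random.rand(128, 128)

# Gaussian smoothing, Sobel gradients, non-maximum suppression and hysteresis
# thresholding are all performed inside this single call.
edges = feature.canny(image, sigma=1.5, low_threshold=0.1, high_threshold=0.3)
```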
What are Sobel filters?
They are convolution filters that approximate the partial derivatives. They also perform gaussian-like smoothing perpendicular to the derivative direction. They are used in edge detection; the x-direction filter detects vertical edges.
What is hysteresis?
When we do edge detection with a single threshold, the detected edges often look like dashed lines: they stop and continue. This is due to a high threshold, but if we lower the threshold, we might introduce fake edges / noise. To fix this, we use hysteresis with two thresholds: 1. a high threshold that should detect true edges only; 2. a slightly lower one, and pixels that pass the second but not the first are kept only if a neighbouring pixel passed the 1st threshold.
Do this iteratively
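If scikit-image is available, this double-threshold rule is exposed directly; the input below is a stand-in for a real gradient-magnitude image and the thresholds are arbitrary example values:

```python
import numpy as np
from skimage import filters

gradient_magnitude = np.random.rand(128, 128)      # stand-in for the Sobel magnitude image

# Pixels above 0.6 count as edges; pixels above 0.2 are kept only if they are
# connected to a pixel that passed the high threshold.
edges = filters.apply_hysteresis_threshold(gradient_magnitude, low=0.2, high=0.6)
```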
What is non-maximum suppression?
When we have computed the magnitudes of the gradient and we performed some thresholding (optionally), multiple pixels along the edge can pass the threshold (thick edge). To make this edge thin (perform better localization), we do the non-maximum suppression.
It basically checks whether a pixel is a local maximum along the gradient direction. For each pixel, the gradient direction is determined. Two points p and r are taken along this direction where it intersects the next row or column, and we check if the selected pixel is larger than both of them. If it is, we keep it; if not, we remove it. p and r are approximated using linear interpolation (averaging the two pixels each of them falls between).
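A simplified sketch of this check: instead of interpolating p and r, the gradient direction is snapped to the nearest neighbouring pixel, which is a common shortcut:

```python
import numpy as np

def non_maximum_suppression(magnitude, angle):
    """Keep a pixel only if it is the largest along its gradient direction.
    Simplification: the direction is quantized to the nearest neighbour pixel
    instead of interpolating between two pixels as described above."""
    h, w = magnitude.shape
    out = np.zeros_like(magnitude)
    angle = np.rad2deg(angle) % 180                       # gradient orientation in [0, 180)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            a = angle[y, x]
            if a < 22.5 or a >= 157.5:                    # gradient points left/right
                p, r = magnitude[y, x - 1], magnitude[y, x + 1]
            elif a < 67.5:                                # one diagonal
                p, r = magnitude[y - 1, x + 1], magnitude[y + 1, x - 1]
            elif a < 112.5:                               # gradient points up/down
                p, r = magnitude[y - 1, x], magnitude[y + 1, x]
            else:                                         # the other diagonal
                p, r = magnitude[y - 1, x - 1], magnitude[y + 1, x + 1]
            if magnitude[y, x] >= p and magnitude[y, x] >= r:
                out[y, x] = magnitude[y, x]               # local maximum: keep it
    return out
```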
How does a 1st derivative linear filter look like and how does the 2nd in 1D?
1 -1
1 -2 1
How to do edge detection using second derivative? How about smoothing?
Idea is to find zero crossings of the second derivative. The second derivative can be approximated using linear filter 1 -2 1. In 2D, this is
0 1 0
1 -4 1
0 1 0
or
1 1 1
1 -8 1
1 1 1
To perform second derivative with gaussian smoothing, we use Laplacian of Gaussian (LoG) which is just a convolution of gaussian filter and laplacian filter.
The LoG in 2D has a Mexican-hat shape and can be approximated using the DoG (difference of gaussians).
How can LoG be approximated?
The Laplacian of Gaussian can be approximated using the DoG (difference of gaussians). Usually one sigma is 1.6 times the other (sigma1 = 1.6 · sigma2), and subtracting the two gaussian functions gives an approximation of the LoG.
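A small SciPy sketch comparing the direct LoG with its DoG approximation; the sigma values are example choices, and the two results agree only up to a sign/scale convention:

```python
import numpy as np
from scipy import ndimage

image = np.random.rand(128, 128)
sigma = 1.0

# Laplacian of Gaussian directly ...
log = ndimage.gaussian_laplace(image, sigma=sigma)

# ... and its difference-of-gaussians approximation with the second sigma = 1.6 * the first
dog = (ndimage.gaussian_filter(image, sigma=sigma)
       - ndimage.gaussian_filter(image, sigma=1.6 * sigma))
```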
Explain the Laplacian pyramid.
- We compute the Gaussian pyramid. Then we expand each layer Gi+1 of the Gaussian pyramid (each pixel is copied 4 times), so that we have a high-resolution version of the lower-resolution information. To get the Laplacian pyramid, we subtract the expanded Gi+1 layer from the Gaussian layer Gi; this gives the layer Li of the Laplacian pyramid.
This represents the information that is lost when going from Gi to Gi+1. At the lower levels of the pyramid, high-frequency information is lost (and captured in the Laplacian pyramid); as we move upwards, lower and lower frequencies are captured. We can reconstruct the original image from the Laplacian pyramid, since the top-most level is a copy: Gn = Ln, and the process can be reversed.
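A minimal sketch of building and inverting the pyramid, assuming the pixel-copy expansion described above and SciPy's gaussian filter:

```python
import numpy as np
from scipy import ndimage

def expand(image):
    """Pixel-copy upsampling: each pixel is repeated 4 times (2x along each axis)."""
    return np.repeat(np.repeat(image, 2, axis=0), 2, axis=1)

def laplacian_pyramid(image, levels=4, sigma=1.0):
    gaussians = [image]
    for _ in range(levels - 1):
        blurred = ndimage.gaussian_filter(gaussians[-1], sigma=sigma)
        gaussians.append(blurred[::2, ::2])
    laplacians = [gaussians[i] - expand(gaussians[i + 1]) for i in range(levels - 1)]
    laplacians.append(gaussians[-1])                  # top level: Ln = Gn
    return laplacians

def reconstruct(laplacians):
    image = laplacians[-1]
    for lap in reversed(laplacians[:-1]):
        image = lap + expand(image)                   # reverse the subtraction step
    return image

original = np.random.rand(64, 64)
print(np.allclose(reconstruct(laplacian_pyramid(original)), original))   # True
```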
What is the application of the Laplacian pyramid?
Image sharpening. For example, if we want to make the edges crisper, we can multiply some of the laplacian layers with some constant (like 1.1) and reconstruct the image. This should make those details more visible.