Computer Vision Flashcards
Roberts Cross Operator (Diagonal)
2x2 operator that responds to diagonal gradients.
Prewitt Operator (3x3) (Horizontal & Vertical)
Similar to Sobel in that it uses two 3 x 3 kernels.
One for changes in the horizontal direction, and one for changes in the vertical direction.
The two kernels are convolved with the original image to calculate the approximations of the derivatives.
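As a sketch of this convolution (the kernel sign convention and the naive "valid" convolution helper are illustrative assumptions, not from the source):

```python
import numpy as np

# Prewitt kernels (assumed sign convention: response grows left-to-right / top-to-bottom)
KX = np.array([[-1, 0, 1],
               [-1, 0, 1],
               [-1, 0, 1]], dtype=float)  # changes in the horizontal direction
KY = KX.T                                 # changes in the vertical direction

def convolve2d(img, k):
    """Naive 'valid' 2-D convolution (kernel flipped, as in true convolution)."""
    k = np.flipud(np.fliplr(k))
    h, w = img.shape
    kh, kw = k.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * k)
    return out

# A vertical step edge: the horizontal kernel responds, the vertical one does not.
img = np.zeros((5, 5)); img[:, 3:] = 1.0
gx = convolve2d(img, KX)
gy = convolve2d(img, KY)
```

Note that only the magnitude of the response matters for edge detection; flipping the kernel sign flips the response sign.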
Sobel Operator (Horizontal & Vertical)
Calculates the gradient of image intensity at each pixel within the image.
The result shows how abruptly or smoothly the image changes at each pixel, and therefore how likely it is that that pixel represents an edge.
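A minimal sketch of the per-pixel gradient magnitude with Sobel kernels (the "valid" cross-correlation and the test image are assumptions):

```python
import numpy as np

# Sobel kernels: like Prewitt, but the centre row/column is weighted by 2,
# which adds a little smoothing along the edge direction.
SX = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]], dtype=float)
SY = SX.T

def grad_magnitude(img):
    """Per-pixel gradient magnitude via 'valid' cross-correlation with Sobel kernels."""
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2)); gy = np.zeros_like(gx)
    for i in range(h - 2):
        for j in range(w - 2):
            win = img[i:i+3, j:j+3]
            gx[i, j] = np.sum(win * SX)
            gy[i, j] = np.sum(win * SY)
    return np.hypot(gx, gy)  # sqrt(gx^2 + gy^2)

img = np.zeros((6, 6)); img[:, 3:] = 1.0   # vertical step edge
m = grad_magnitude(img)
# the magnitude is large on columns straddling the edge and 0 on flat regions
```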
Prewitt Operator (4x4) (Horizontal & Vertical)
Similar to Prewitt (3x3)
Covers a larger area, where the focus is not necessarily on immediate pixels but others contributing to the edge change.
Canny Edge Detector Assumption
Linear filtering and additive Gaussian noise.
Canny Edge Detector Properties
Edge detector should have:
- Good detection: filter responds to edge not noise.
- Good localization: detected edge near true edge.
- Single response: one per edge
Canny Edge Detector Process - 1. Gradient Direction Identification
Uses a filter based on the first derivative of a Gaussian, because edge detection is susceptible to noise present in raw image data. Therefore:
- The raw image is convolved with a Gaussian filter. (The result is a slightly blurred version of the original image that is not affected by any single noisy pixel to a significant degree.)
- The Canny algorithm uses 4 filters to detect horizontal, vertical, and diagonal edges in the blurred image. The edge detection operator (e.g., Roberts, Prewitt, Sobel) returns a value for the first derivative in the horizontal and vertical directions.
- Now the edge gradient and direction can be determined. The edge direction angle is rounded to one of four angles representing vertical, horizontal, and the two diagonals (0, 45, 90, and 135 degrees).
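The rounding of the direction angle can be sketched as follows (pure Python; the function name is hypothetical):

```python
import math

def quantize_angle(gx, gy):
    """Round a gradient direction to 0, 45, 90 or 135 degrees."""
    angle = math.degrees(math.atan2(gy, gx)) % 180.0  # fold into [0, 180)
    # nearest of the four canonical directions (180 wraps back to 0)
    return min([0, 45, 90, 135, 180], key=lambda a: abs(a - angle)) % 180

quantize_angle(1.0, 0.0)   # 0   (pure horizontal gradient)
quantize_angle(1.0, 1.0)   # 45
quantize_angle(0.0, 1.0)   # 90
quantize_angle(-1.0, 1.0)  # 135
```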
Canny Edge Detector Process - 2. Edge Detection
Using the gradient directions, we can start to detect edges:
- Norm of gradient (i.e., along the direction of line/curve).
- Thresholding (to respond to edges, not noise).
- Thinning (for good localization and single response).
- Hysteresis (to improve localization, use a high threshold to start curves and a low threshold to continue them).
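The hysteresis step can be sketched as a flood from strong pixels through weak ones (the thresholds, 8-connectivity, and the toy magnitude array are assumptions):

```python
import numpy as np
from collections import deque

def hysteresis(mag, lo, hi):
    """Keep pixels above `hi`, plus pixels above `lo` that are 8-connected
    (directly or transitively) to a pixel above `hi`."""
    strong = mag >= hi
    weak = mag >= lo
    keep = strong.copy()
    q = deque(zip(*np.nonzero(strong)))
    h, w = mag.shape
    while q:
        i, j = q.popleft()
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w and weak[ni, nj] and not keep[ni, nj]:
                    keep[ni, nj] = True
                    q.append((ni, nj))
    return keep

mag = np.array([[0.0, 0.4, 0.9, 0.0, 0.0],
                [0.0, 0.0, 0.0, 0.0, 0.4]])
mask = hysteresis(mag, lo=0.3, hi=0.8)
# 0.9 is strong; the adjacent 0.4 joins the curve; the isolated 0.4 does not
```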
Effect of Gaussian kernel (σ) size on the Canny Detector algorithm
Choice of σ depends on desired behaviour where:
- Large σ detects large-scale edges (and better noise suppression).
- Small σ detects fine features.
Hough Transform (HT) Purpose
Detects a line using a “voting” scheme, where points vote for a set of parameters that describe a line. The more votes for a particular set, the more evidence that the corresponding line is present in the image. So, the HT can detect MULTIPLE lines in one shot.
Hough Transform (HT) Process
- Initialise H[d, θ] = 0.
- For each edge point I[x, y] in the image, for θ = 0 to 180:
i. d = x·cos(θ) + y·sin(θ) (where d is the perpendicular distance from the line to the origin)
ii. H[d, θ] += 1
- Find the value(s) of (d, θ) where H[d, θ] is a maximum.
- The detected line in the image is given by:
i. d = x·cos(θ) + y·sin(θ)
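The voting loop can be sketched in NumPy (the accumulator size and the index offset for negative d are assumptions):

```python
import numpy as np

def hough_lines(points, d_max, n_theta=180):
    """Accumulate votes H[d, theta] for edge points; d = x*cos(theta) + y*sin(theta)."""
    thetas = np.deg2rad(np.arange(n_theta))            # theta = 0..179 degrees
    H = np.zeros((2 * d_max + 1, n_theta), dtype=int)  # d can be negative
    for (x, y) in points:
        d = np.rint(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        H[d + d_max, np.arange(n_theta)] += 1          # offset so the index is >= 0
    return H

# points on the vertical line x = 5: every point votes for the bin (d=5, theta=0)
pts = [(5, y) for y in range(10)]
H = hough_lines(pts, d_max=20)
```

Note that (d, θ) and (-d, θ+180°) describe the same line, so a nearly-equal peak can appear at the mirrored parameters.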
What is the running time (measured in the number of votes)? Each of the N edge points casts one vote per θ value, so the total number of votes is N × (number of θ bins), i.e. N × 180 for this discretisation.
Finding Curved Line with Hough Transform
For circles:
Instead of the equation of a straight line (y = mx + b), use the equation of a circle, (x − h)² + (y − k)² = r². The parameters are now the centre (h, k) and radius r, so the accumulator H[h, k, r] is three-dimensional: each edge point votes for every (h, k, r) consistent with a circle passing through it, and the maxima of H identify circles. (If r is known, the vote space reduces to two dimensions.)
The Hough transform can then be generalized to detect any curve which can be expressed in parametric form, i.e., y = f(x, a1, a2, …, an). This essentially reduces the image into some space of parameters and asks how many points in the image conform to each parameter setting.
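A sketch of circle voting with a known radius (a 2-D slice of the 3-D (h, k, r) space; the grid size and angular sampling are assumptions):

```python
import math

def hough_circle(points, r, size):
    """Vote for circle centres (h, k) given a known radius r. Each edge point
    (x, y) could lie on a circle centred anywhere at distance r from it,
    so it votes along that locus of candidate centres."""
    H = [[0] * size for _ in range(size)]
    for (x, y) in points:
        for t in range(360):
            h = round(x - r * math.cos(math.radians(t)))
            k = round(y - r * math.sin(math.radians(t)))
            if 0 <= h < size and 0 <= k < size:
                H[h][k] += 1
    return H

# points sampled from a circle of radius 5 centred at (10, 10)
pts = [(10 + 5 * math.cos(math.radians(a)), 10 + 5 * math.sin(math.radians(a)))
       for a in range(0, 360, 30)]
H = hough_circle(pts, r=5, size=20)
# the accumulator peaks at the true centre (10, 10)
```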
Corner detection
• A corner is where two edges meet, so it carries more structure than a point on a line.
• A point on a line is hard to match unambiguously, while corners are easier.
• Edge detectors tend to fail at corners.
• Intuition at a corner:
1. Right at the corner, the gradient is ill defined.
2. Near the corner, the gradient takes two different directions.
To detect corners:
- Filter (smooth) the image.
- Compute the magnitude of the gradient everywhere.
- Construct the matrix C (the second-moment/auto-correlation matrix of the gradients) over a window.
- Use linear algebra to find its eigenvalues λ1 and λ2.
- If both are large, the point is a corner.
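The steps above can be sketched as follows (the window size, the simple central-difference gradients, and the test image are assumptions):

```python
import numpy as np

def corner_eigenvalues(img, i, j, win=3):
    """Eigenvalues of the 2x2 matrix C = [[sum gx^2, sum gx*gy],
                                          [sum gx*gy, sum gy^2]]
    over a window centred at (i, j). Two large eigenvalues => corner."""
    gy, gx = np.gradient(img.astype(float))  # central-difference gradients
    r = win // 2
    wx = gx[i - r:i + r + 1, j - r:j + r + 1]
    wy = gy[i - r:i + r + 1, j - r:j + r + 1]
    C = np.array([[np.sum(wx * wx), np.sum(wx * wy)],
                  [np.sum(wx * wy), np.sum(wy * wy)]])
    return np.sort(np.linalg.eigvalsh(C))    # ascending order

# a white square on black: the corner has two strong eigenvalues,
# a point mid-edge has only one, and a flat region has none
img = np.zeros((12, 12)); img[4:9, 4:9] = 1.0
flat = corner_eigenvalues(img, 1, 1)
edge = corner_eigenvalues(img, 6, 4)
corner = corner_eigenvalues(img, 4, 4)
```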
Local Features Definition
Matching points across images is important for recognition and pose estimation.
A good local image feature to track should:
- Satisfy brightness constancy.
- Have sufficient texture variation.
- Correspond to a “real” surface patch.
- Not deform too much over time.
Q: Depth values in a stereo pair of images
- One image is rectified (aligned) with respect to the other (using the essential matrix E).
- Points on a horizontal line in one image are matched with corresponding points on the same line in the other image.
- The distance x between a matching pair of points is called the disparity. The larger the disparity, the closer that point is to the camera, by triangulation.
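The triangulation can be sketched as follows (the function name and the rig parameters, focal length in pixels and baseline in metres, are hypothetical):

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Triangulation for a rectified stereo pair: Z = f * B / d.
    Larger disparity => smaller depth (the point is closer)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# hypothetical rig: f = 700 px, baseline = 0.12 m
near = depth_from_disparity(40, 700, 0.12)  # 2.1 m
far = depth_from_disparity(10, 700, 0.12)   # 8.4 m
```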
Q: Lucas-Kanade Algorithm: Optical flow points in two successive frames of video
- Lucas-Kanade Algorithm: Integrates gradients over an image patch to find features good enough to track using the Harris detector.
- A constant velocity is assumed for all pixels within an image patch.
- Optical flow is the measurement of movement that feature points undergo in successive frames.
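A minimal single-patch sketch under the constant-velocity assumption (the ramp test image is contrived; note it only constrains u, an instance of the aperture problem, so the minimum-norm least-squares solution returns v = 0):

```python
import numpy as np

def lk_flow(patch1, patch2):
    """Single-patch Lucas-Kanade: assume one (u, v) for the whole patch and
    solve the least-squares system built from brightness constancy
    Ix*u + Iy*v + It = 0."""
    iy, ix = np.gradient(patch1.astype(float))
    it = patch2.astype(float) - patch1.astype(float)
    A = np.stack([ix.ravel(), iy.ravel()], axis=1)
    b = -it.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

# an intensity ramp shifted right by one pixel between frames
x = np.arange(10, dtype=float)
f1 = np.tile(x, (10, 1))          # intensity increases with column
f2 = np.tile(x - 1.0, (10, 1))    # same ramp, shifted one pixel right
u, v = lk_flow(f1, f2)
# recovers u ~ 1, v ~ 0
```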
Q: Describe how depth can be calculated from optical flow
- Relative depth can be calculated from the velocity of optical flow points, which is larger when depth is smaller. Absolute depth could be determined if the camera velocity is known.
- Even for a camera moving forwards or backwards with no rotation, the flow velocity away from the "focus of expansion" increases as depth decreases (and vice versa).
Harris detector
Hint: Eigenvalues.
Captures structure of local neighbourhood using an auto-correlation matrix, where:
* 2 strong eigenvalues = good local feature.
* 1 strong eigenvalue = contour.
* 0 strong eigenvalues = uniform region.
Measures quality of a feature – because the best feature points can be thresholded on the eigenvalues.
SIFT (Scale-invariant Feature Transform)
- Image gradients are sampled over a 16x16 array of locations in scale space (with gradient magnitudes thresholded to limit illumination effects).
- An array of orientation histograms is created at each location.
- Because SIFT is based on a vector of angles, it is computationally efficient.
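A rough sketch of the orientation-histogram construction (the cell layout and magnitude weighting are simplified; full SIFT also applies Gaussian weighting, interpolation, and descriptor normalization, which are omitted here):

```python
import numpy as np

def orientation_histograms(patch, n_cells=4, n_bins=8):
    """SIFT-style descriptor for a 16x16 patch: split into a 4x4 grid of
    4x4 cells; per cell, histogram gradient orientations into 8 bins
    weighted by gradient magnitude => a 4*4*8 = 128-vector."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 360.0
    cell = patch.shape[0] // n_cells          # 4 for a 16x16 patch
    desc = []
    for ci in range(n_cells):
        for cj in range(n_cells):
            sl = np.s_[ci * cell:(ci + 1) * cell, cj * cell:(cj + 1) * cell]
            hist, _ = np.histogram(ang[sl], bins=n_bins, range=(0, 360),
                                   weights=mag[sl])
            desc.extend(hist)
    return np.array(desc)

patch = np.random.default_rng(0).random((16, 16))
d = orientation_histograms(patch)
# d has 128 entries, all non-negative
```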
Comparison of Harris Detector and SIFT algorithms
- Both algorithms are illumination- and rotation-invariant because they are based on gradient operators, but neither is deformation-invariant.
- SIFT is scale-invariant because it is sampled at different scales, but Harris is not scale-invariant.
- They are translation invariant for x and y motion perpendicular to the camera – but not for z.
Erosion
Removes pixels from the boundary of a region/blob (and from the boundaries of internal holes, which therefore grow), usually using a convolution kernel/mask and an AND operation (or subtracts the convolution of the kernel with the image).
Erosion Effect
Removes small details such as thin lines and noise points, and widens gaps. Successive erosions shrink a region to a skeleton.
Dilation
Adds pixels to the outside of a region/blob using a convolution kernel and an OR operation (or adds the convolution of the kernel with the image).
Dilation effect
Enlarges region/blob, thickens lines, and fills small holes/gaps.
Open
Erode then Dilate image.
Open Effect
Removes small details such as thin lines, spurs and noise. Smoothes jagged edges without changing the size of the original object.
Close
Dilate then Erode image.
Close Effect
Closes/fills small gaps/holes and preserves thin lines without changing the size of the original object.
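The four operations above can be sketched in pure NumPy for binary images (the square structuring element and the helper names are assumptions):

```python
import numpy as np

def erode(img, k=3):
    """Binary erosion with a k x k square element:
    keep a pixel only if every pixel under the element is 1 (an AND)."""
    r = k // 2
    p = np.pad(img, r, constant_values=0)
    out = np.ones_like(img)
    for di in range(k):
        for dj in range(k):
            out &= p[di:di + img.shape[0], dj:dj + img.shape[1]]
    return out

def dilate(img, k=3):
    """Binary dilation: set a pixel if any pixel under the element is 1 (an OR)."""
    r = k // 2
    p = np.pad(img, r, constant_values=0)
    out = np.zeros_like(img)
    for di in range(k):
        for dj in range(k):
            out |= p[di:di + img.shape[0], dj:dj + img.shape[1]]
    return out

def opening(img, k=3):  # erode then dilate: removes small specks
    return dilate(erode(img, k), k)

def closing(img, k=3):  # dilate then erode: fills small holes
    return erode(dilate(img, k), k)

blob = np.zeros((9, 9), dtype=int)
blob[2:7, 2:7] = 1                      # a solid 5x5 blob
noisy = blob.copy(); noisy[0, 8] = 1    # plus an isolated noise pixel
holey = blob.copy(); holey[4, 4] = 0    # with a one-pixel hole
# opening(noisy) removes the noise pixel; closing(holey) fills the hole;
# both restore the original blob without changing its size
```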
Homography H
Relates the relative pose of two cameras viewing a planar scene. It can be estimated from feature correspondences using RANSAC.
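A minimal Direct Linear Transform (DLT) sketch for estimating H from four correspondences (not the full pipeline; in practice RANSAC wraps a solver like this to reject outlier matches):

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate the 3x3 homography H mapping src -> dst from >= 4
    point correspondences, via the SVD null vector of the DLT system."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.array(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

# a pure translation by (2, 3) is a simple (but valid) homography
src = [(0, 0), (1, 0), (0, 1), (1, 1)]
dst = [(x + 2, y + 3) for (x, y) in src]
H = homography_dlt(src, dst)
# H ~ [[1, 0, 2], [0, 1, 3], [0, 0, 1]]
```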