Local Features - Week 5/6/7 Flashcards
What are the two types of object recognition?
Model-based object recognition - find the razors in a scene given a 3D model of the razor
Image-based object recognition - find the razors in an image given a picture of the razor
What is a feature in computer vision?
Local, meaningful, detectable parts of the image
Location of a sudden change
Why are features useful in computer vision?
They have a high information content
Invariant to change of view point, illumination
Reduces computational burden
What are some applications of image features?
Visual SLAM (Simultaneous Localisation and Mapping)
Image Matching (Are two images of the same thing?)
Image alignment
3D reconstruction
Motion Tracking
Indexing and database retrieval
Robot Navigation
Other…
What is the procedure for image stitching?
Detect feature points in both images
Find corresponding pairs
Use these pairs to align the images
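A minimal sketch of this pipeline in Python with OpenCV (the filenames, the choice of SIFT, the 0.75 ratio threshold and the RANSAC reprojection error are illustrative assumptions, not part of the cards above):
```python
import cv2
import numpy as np

img1 = cv2.imread("left.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical filenames
img2 = cv2.imread("right.jpg", cv2.IMREAD_GRAYSCALE)

# 1. Detect feature points (and descriptors) in both images
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# 2. Find corresponding pairs (ratio test drops ambiguous matches)
matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

# 3. Use the pairs to align the images (homography estimated with RANSAC)
src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
panorama = cv2.warpPerspective(img1, H, (img1.shape[1] + img2.shape[1], img2.shape[0]))
panorama[:img2.shape[0], :img2.shape[1]] = img2   # crude overlay of the second image
```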
What is the general approach to finding and describing features in images?
- Find a set of distinctive key points (interest points)
- Define a region around each key point (interest point)
- Extract and normalise the region content
- Compute a local descriptor from the normalised region
- Match local descriptors
What are the requirements for local features?
Region extraction needs to be repeatable and:
- Invariant to translation, rotation, scale changes
- Robust or covariant to out-of-plane (~affine) transformations
Robust to lighting variations, noise, blur, quantisation
Locality: Features are local, therefore robust to occlusion and clutter
Quantity: Need a sufficient number of regions to cover the object
Distinctiveness - The regions should contain “interesting” structure.
Efficiency - Close to real-time performance
Why are corners good for features?
Edges only localise in one direction
Corners provide repeatable points for matching, so are worth detecting.
What is the idea behind Harris corner detection?
In the region around a corner, image gradient has two or more dominant directions.
So shifting a window around a corner in any direction should give a large change in intensity.
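A worked form of this shifting-window idea (the standard Harris derivation, stated here for reference; w is the window function and M is the matrix defined in a later card):

E(u,v) \;=\; \sum_{x,y} w(x,y)\,\bigl[I(x+u,\,y+v) - I(x,y)\bigr]^2 \;\approx\; \begin{pmatrix} u & v \end{pmatrix} M \begin{pmatrix} u \\ v \end{pmatrix}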
What are the three distinctive region types distinguished when shifting a window?
“Flat” region: no intensity change in any direction
“Edge” region: no change along the edge direction
“Corner” region: significant change in all directions
How is the 2x2 matrix M computed for a region that is being checked for a corner? What actually is M?
M is the second-moment (structure) matrix: the sum over the window of the window function w(x,y) times the outer product of the image gradient, i.e. M = sum_{x,y} w(x,y) [Ix^2, Ix*Iy; Ix*Iy, Iy^2], where Ix and Iy are the image gradients with respect to x and y at each point.
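A minimal numpy/scipy sketch of computing the entries of M at every pixel (the Sobel gradients and the Gaussian window with sigma = 1.5 are illustrative assumptions):
```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def structure_tensor(img, sigma=1.5):
    """Per-pixel entries of M = sum_w w(x,y) [Ix^2, Ix*Iy; Ix*Iy, Iy^2]."""
    img = img.astype(float)
    Ix = sobel(img, axis=1)   # gradient with respect to x
    Iy = sobel(img, axis=0)   # gradient with respect to y
    # Gaussian window function applied as a weighted sum of gradient products
    Ixx = gaussian_filter(Ix * Ix, sigma)
    Iyy = gaussian_filter(Iy * Iy, sigma)
    Ixy = gaussian_filter(Ix * Iy, sigma)
    return Ixx, Iyy, Ixy
```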
What are the three types of covariance matrices?
Spherical, diagonal and full covariance matrices
Given eigenvalues lambda 1 and 2 of the M matrix of a point on the image, what identifies flat, edge and corner regions?
If both lambdas are small, then E is almost constant in all directions, so the region is flat
If one lambda is much greater than the other then it is an edge region
If both lambdas are large and lambda1 ~ lambda2 then the region is a corner
How is the R corner response in the Harris Corner Detector calculated?
R = det(M) - alpha * trace(M)^2, where alpha is an empirically chosen constant (typically around 0.04-0.06)
How does the value R in the Harris Corner Detector relate to the image regions?
For flat regions, |R| is small
For corner regions, R is large and positive
For edge regions, R is negative
What is the Harris corner detector workflow?
Compute the corner responses R
Find the points with large corner responses through thresholding.
Take only the local maxima of R
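A sketch of this workflow, reusing the structure_tensor helper sketched in an earlier card (alpha, the threshold and the 5x5 non-maximum-suppression window are illustrative choices):
```python
import numpy as np
from scipy.ndimage import maximum_filter

def harris_corners(img, sigma=1.5, alpha=0.05, thresh=1e6):
    Ixx, Iyy, Ixy = structure_tensor(img, sigma)      # helper sketched above
    # Corner response R = det(M) - alpha * trace(M)^2
    R = (Ixx * Iyy - Ixy ** 2) - alpha * (Ixx + Iyy) ** 2
    # Threshold R, then keep only the local maxima (non-maximum suppression)
    local_max = (R == maximum_filter(R, size=5))
    return np.argwhere((R > thresh) & local_max)      # (row, col) corner locations
```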
Why do we need to sum over a window for the uniform window function but not for the Gaussian window?
Applying the Gaussian blur already computes the weighted sum over the window, so no explicit summation is needed.
Do Harris detectors provide rotation invariance?
Yes, the corner response is invariant to image rotation: the eigenvalues of M (and hence R) are unchanged when the image rotates.
Do Harris detectors provide scale invariance?
No. A region that is a corner at a coarse scale can, when examined through a small window, be classified as a series of edges.
What are the advantages of the interest points provided by the Harris detector?
Precise localisation
High repeatability
However, in order to compare these points we need to compute a descriptor over a region, which motivates scale-invariant interest regions.
What is the naive approach to scale invariant region selection / description?
Multi-scale procedure, compare descriptors while varying the patch size.
Computationally inefficient, but still feasible for matching
Prohibitive for retrieval in large databases
Prohibitive for recognition
What is the solution to scale invariant region selection?
To design a function on the region which is “scale invariant” (the same for corresponding regions, even if they are at different scales) to use as the descriptor
What is the common approach to automatic scale selection?
Take a signature function f on the region (a good choice is the Laplacian of Gaussian filter, i.e. the sum of second derivatives of a 2D Gaussian).
Compute the signature function over a range of region sizes and find its maximum; the region size at which the maximum is achieved adapts with the image scale, so corresponding regions are selected at corresponding sizes.
The two plots of f against region size (one per image) are generated independently, then their local maxima are found and compared.
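A sketch of automatic scale selection for a single patch (the scale-normalised LoG magnitude averaged over the patch is one reasonable signature function; the scale range is an assumption):
```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def characteristic_scale(patch, sigmas=np.linspace(1.0, 10.0, 20)):
    """Return the scale at which the signature function f peaks for this patch."""
    patch = patch.astype(float)
    # Scale-normalised Laplacian of Gaussian response at each candidate scale
    f = [s ** 2 * np.abs(gaussian_laplace(patch, s)).mean() for s in sigmas]
    return sigmas[int(np.argmax(f))]
```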
Is the Laplacian a scalar?
Yes, it is a scalar and can be computed with a single convolution mask
Does the laplacian of an image retain orientation information?
No, orientation information is lost
What effect does taking the laplacian of an image have on noise
The Laplacian is the sum of second-order derivatives of the image.
Taking derivatives increases noise, so the laplacian is very noise sensitive.
What is the laplacian always paired with?
It is always paired with a smoothing operation, to deal with the fact that it amplifies noise.
How is the characteristic scale defined?
The scale at which the (scale-normalised) Laplacian response reaches its peak
What can be used as an approximation of Laplacian of Gaussian?
Difference of gaussians
Take a Gaussian with scale sigma and one with scale k*sigma and compute their difference; the result closely approximates the Laplacian of Gaussian.
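A minimal sketch of the Difference of Gaussians (sigma and k are illustrative values; up to a constant factor the result approximates the Laplacian of Gaussian):
```python
from scipy.ndimage import gaussian_filter

def difference_of_gaussians(img, sigma=1.6, k=1.6):
    """Blur with k*sigma and with sigma, then subtract: DoG ~ LoG."""
    img = img.astype(float)
    return gaussian_filter(img, k * sigma) - gaussian_filter(img, sigma)
```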
What is the state of the art for feature matching?
SIFT
How is feature invariance achieved?
- Make sure the feature detector is invariant to translation, rotation and scale
- Know how to find interest points (locations and their corresponding characteristic scales)
- Know how to remove the effects of difference in scale once we detect one of these interest points
- Design an invariant feature descriptor
What is the disadvantage of using patches with pixel intensities as descriptors?
Small changes (scale, rotation, 3D viewpoint change) can affect matching score a lot
What is a better method for descriptors than pixel intensities?
Histogram of gradient directions of the patch
How are rotation invariant descriptors created?
Find the local orientation
- Dominant direction of gradient for the image patch
Rotate the patch according to the above angle
- This puts the patches into a canonical orientation
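A rough sketch of putting a patch into canonical orientation (the 36-bin magnitude-weighted histogram is the usual choice; the rotation sign depends on the image coordinate convention, so treat it as illustrative):
```python
import numpy as np
from scipy.ndimage import rotate, sobel

def canonical_orientation(patch):
    """Rotate the patch so its dominant gradient direction is aligned to a fixed axis."""
    patch = patch.astype(float)
    Ix, Iy = sobel(patch, axis=1), sobel(patch, axis=0)
    angles = np.arctan2(Iy, Ix)
    mags = np.hypot(Ix, Iy)
    # Dominant direction = peak of a magnitude-weighted orientation histogram
    hist, edges = np.histogram(angles, bins=36, range=(-np.pi, np.pi), weights=mags)
    peak = np.argmax(hist)
    dominant = 0.5 * (edges[peak] + edges[peak + 1])
    # Undo the dominant rotation (sign convention depends on the y-axis direction)
    return rotate(patch, np.degrees(dominant), reshape=False)
```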
How does SIFT describe features?
It first detects interest points, e.g. with DoG (Difference of Gaussians, an approximation of the Laplacian), and normalises the surrounding region (typically to a 16x16 patch) for scale and orientation.
It then computes the gradient orientation at every pixel of the 16x16 region around the interest point, divides the region into a 4x4 grid of 4x4-pixel sub-patches, and builds an 8-bin histogram of gradient orientations for each sub-patch.
All of the histogram counts (16 sub-patches x 8 bins) are then concatenated to give a 128-dimensional descriptor vector for the feature.
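A simplified sketch of building the 128-dimensional descriptor from an already-normalised 16x16 patch (real SIFT additionally applies Gaussian weighting and interpolation between bins, omitted here):
```python
import numpy as np

def sift_like_descriptor(patch16):
    """4x4 grid of 4x4-pixel cells, 8 orientation bins each -> 128-d vector."""
    patch16 = patch16.astype(float)
    gy, gx = np.gradient(patch16)                       # gradients along rows / columns
    angles = np.mod(np.arctan2(gy, gx), 2 * np.pi)
    mags = np.hypot(gx, gy)
    cells = []
    for i in range(0, 16, 4):
        for j in range(0, 16, 4):
            hist, _ = np.histogram(angles[i:i+4, j:j+4], bins=8, range=(0, 2 * np.pi),
                                   weights=mags[i:i+4, j:j+4])
            cells.append(hist)
    desc = np.concatenate(cells)                        # 16 cells x 8 bins = 128
    return desc / (np.linalg.norm(desc) + 1e-12)        # normalise for illumination
```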
What do the SIFT descriptors of one image yield?
For each feature / patch:
- 128-dimensional descriptor: each is a histogram of the gradient orientations within a patch
- A scale parameter specifying the size of the patch
- An orientation parameter specifying the angle of the patch
- 2D points giving the position of each patch
How is the best match for each feature determined?
- Define a distance function that compares two descriptors
- Test all the features in one image to a feature in the other, find the one with minimum distance
What common distance functions exist to compare features?
SSD - the sum of squared differences between the two descriptors; can give good scores to ambiguous matches (think fence tops)
Ratio distance = SSD(f1, f2) / SSD(f1, f2')
where f2 is the best SSD match to f1 in the other image and f2' is the second-best match. Gives large values (~1) for ambiguous matches.
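A sketch of matching descriptors with SSD and the ratio test (the 0.8 ratio threshold is an illustrative value):
```python
import numpy as np

def match_features(desc1, desc2, ratio_thresh=0.8):
    """Match each descriptor in desc1 to its nearest neighbour in desc2,
    keeping the match only if best-SSD / second-best-SSD is below the threshold."""
    matches = []
    for i, d in enumerate(desc1):
        ssd = np.sum((desc2 - d) ** 2, axis=1)     # SSD to every descriptor in desc2
        order = np.argsort(ssd)
        best, second = order[0], order[1]
        if ssd[best] / (ssd[second] + 1e-12) < ratio_thresh:
            matches.append((i, int(best)))
    return matches
```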
How are bad feature matches removed?
Thresholding on the feature distance (e.g. matches with a distance greater than 100 are removed)
What are true positives for feature matches?
The number of detected matches which are correct
What are false positives for feature matches?
The number of detected matches that are incorrect
What curve is used to evaluate a feature matcher?
An ROC curve (“Receiver Operator Characteristic”)
What is the true positive rate for feature matches?
The number of correct matches found by the matcher / the number of potentially correct matches
What is the false positive rate for feature matches?
The number of matches found by the matcher that were incorrect / The number of features that don’t actually have a match
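A tiny helper to make the two rates concrete (the argument names are ad hoc; the counts come from comparing detected matches against ground truth):
```python
def tpr_fpr(correct_found, incorrect_found, n_possible_correct, n_without_match):
    """True positive rate and false positive rate as defined in the cards above."""
    tpr = correct_found / n_possible_correct    # fraction of correct matches recovered
    fpr = incorrect_found / n_without_match     # fraction of unmatchable features matched anyway
    return tpr, fpr

# Example: 70 correct of 100 possible, 5 wrong matches among 50 unmatchable features
# tpr_fpr(70, 5, 100, 50) -> (0.7, 0.1)
```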
What are some of the applications of features?
Image alignment (e.g. mosaics)
3D reconstruction
Motion Tracking
Object recognition
Indexing and database retrieval
Robot navigation
Panorama stitching
Recognition of specific objects