Week 6 and 7 - Local Features Flashcards
What is Exhaustive search in corner detection
Naive approach
Vary the patch size (scale) and compare the surrounding features at each size
What is wrong with exhaustive search
Computationally inefficient
Infeasible for large collections of data
Ineffective for recognition tasks
How can we use Automatic scale selection
We need scale invariant feature detection
We want a function of region size whose response is scale invariant
The function can be any that involves all pixels in the region (eg intensity, texture)
Plot the response against region size (x = region size) and take the local maximum
When the image is rescaled, the response curve squishes or expands along the x axis
Find the local maximum again: it corresponds to the same feature, though not the same x coordinate
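A minimal sketch of this idea in Python, using the scale-normalised Laplacian of Gaussian (introduced in the next cards) as the region function; the point location and the list of sigmas to try are assumptions:

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def characteristic_scale(image, row, col, sigmas):
    """Evaluate a scale-normalised response at one point over many
    region sizes (sigmas) and return the sigma giving the peak."""
    responses = [
        # sigma**2 normalisation keeps responses comparable across scales
        abs(sigma**2 * gaussian_laplace(image, sigma)[row, col])
        for sigma in sigmas
    ]
    return sigmas[int(np.argmax(responses))]
```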
What is “blob detection”
The Laplacian of Gaussian
What is the Laplacian of Gaussian for feature detection
it is scale invariant
Essentially the sum of the second partial derivatives in x and y: ∇²f = ∂²f/∂x² + ∂²f/∂y²
How do you carry out LoG on an image
Convolve the image with a Gaussian kernel followed by the Laplacian kernel, capturing regions of rapid intensity change
Combine the two into a single Laplacian of Gaussian kernel (one operation)
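A minimal sketch with SciPy, which also offers the combined operation directly (gaussian_laplace); the random array stands in for a grayscale image:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, laplace, gaussian_laplace

image = np.random.rand(256, 256)  # stand-in for a grayscale image

# Two-step version: Gaussian smoothing followed by the Laplacian
two_step = laplace(gaussian_filter(image, sigma=2.0))

# One-step version: convolve with a single LoG kernel
combined = gaussian_laplace(image, sigma=2.0)
```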
What are the characteristics of LoG
Found using a single mask
Orientation information is lost
very noise sensitive (taking derivatives increases noise)
What is the characteristic scale for LoG
The scale that produces the extreme value (peak) of the LoG response
This is the scale at which the filter best matches the blob
The size of the blob is proportional to the characteristic scale
How are blobs useful for local interest points
They give us the region (patch size) to use around the interest point
How do we detect interest points at different scales
Image is convolved with the LoG filter at various levels of blur, determined by the standard deviation (σ) of the Gaussian kernel
Allows for feature detection at different scales
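A sketch using scikit-image's ready-made multiscale LoG detector; the sigma range and threshold are assumptions:

```python
import numpy as np
from skimage.feature import blob_log

image = np.random.rand(256, 256)  # stand-in for a grayscale image

# Convolve with LoG at num_sigma blur levels and keep local maxima,
# returned as (row, col, sigma) triples
blobs = blob_log(image, min_sigma=1, max_sigma=10, num_sigma=10, threshold=0.1)
for row, col, sigma in blobs:
    radius = sigma * np.sqrt(2)  # blob size implied by the detected scale
    print(f"blob at ({row:.0f}, {col:.0f}), radius {radius:.1f}")
```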
What is the Harris-Laplace method
Multiscale harris corner detection: Identify points that are likely to be corners at different scales
Scale Selection based on Laplacian: Compute LoG at different scales
We do both because Harris detection only finds corners while blob detection finds other interest points (combine to get a variety of interest points)
How can we approximate LoG with DoG for efficient implementation
DoG(x, y, σ) = G(x, y, kσ) - G(x, y, σ)
(difference of Gaussians, from two different levels of blur)
This is more efficient as we do not need to compute 2nd derivative
Gaussians are already automatically being computed in the Gaussian pyramid
If we choose the correct values for sigma, we will get a very efficient computation
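A minimal sketch of the approximation, with k = √2 between adjacent blur levels as an assumed value:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

image = np.random.rand(256, 256)  # stand-in for a grayscale image

sigma = 1.6
k = np.sqrt(2)  # assumed ratio between adjacent blur levels

# DoG: subtract two Gaussian-blurred images; no 2nd derivative needed
dog = gaussian_filter(image, k * sigma) - gaussian_filter(image, sigma)
```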
How does a Gaussian pyramid work
Repeated Gaussian blurs accumulate: blurring with σ1 then σ2 equals one blur with σ = √(σ1² + σ2²)
large sigma -> more blurring
Increasing scale, lowers the resolution
So we subsample (we do not need as many pixels to represent a lower resolution image)
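A minimal sketch of the blur-then-subsample loop; the per-level sigma is an assumption:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_pyramid(image, levels=4, sigma=1.0):
    """Blur, then drop every other pixel to halve the resolution."""
    out = [image]
    for _ in range(levels - 1):
        blurred = gaussian_filter(out[-1], sigma)
        out.append(blurred[::2, ::2])  # subsample the blurred level
    return out

pyr = gaussian_pyramid(np.random.rand(256, 256))
print([level.shape for level in pyr])  # (256,256) (128,128) (64,64) (32,32)
```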
How does interest point localisation with DoG work
construct DoG pyramid
detect local maxima in scale space
Thresholding, reject points with low contrast
edge elimination (ratio of principal curvatures)
Note: at this point we are still at candidate points and candidate patches -> have not actually begun to use interest regions
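A sketch of the scale-space maxima and contrast-threshold steps, assuming the DoG pyramid is already stacked into a 3D array (the edge-elimination test is omitted; the threshold value is an assumption):

```python
import numpy as np
from scipy.ndimage import maximum_filter

def dog_candidates(dog_stack, contrast_threshold=0.03):
    """dog_stack: 3D array (scale, row, col) of DoG responses.
    Keep points that are maxima of their 3x3x3 neighbourhood
    (in scale and space) and exceed the contrast threshold."""
    local_max = maximum_filter(dog_stack, size=3) == dog_stack
    strong = np.abs(dog_stack) > contrast_threshold
    return np.argwhere(local_max & strong)  # candidate (scale, row, col)
```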
What are the two strategies for scale-invariant detection
LoG
DoG as fast approximation
These can be used on their own or in combination with single-scale key-point detectors (eg Harris corner detector)
General Summary: Scale Invariant Detection
● Given: Two images of the same scene with a large scale difference between them.
● Goal: Find the same interest points independently in each image
● Solution: Search for maxima of suitable functions in scale and in space (over the image)
What are the two main components of achieving feature invariance
1) Detector is invariant to translation, rotation and scale
Can find interest point and remove the effects of different scales
2) Design an invariant feature descriptor
This captures the information in a region around a detected interest point
What is the simplest local feature descriptor
A square window of pixels
We write region A as vector a and region B as vector b
Each vector is a list of intensities within a patch (or some characteristic)
Calculate vector similarities
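A minimal sketch, flattening two equal-sized patches into vectors and comparing them; normalised correlation is used here as the assumed similarity measure:

```python
import numpy as np

def patch_similarity(patch_a, patch_b):
    """Flatten each patch into a vector of intensities and compare."""
    a = patch_a.ravel().astype(float)
    b = patch_b.ravel().astype(float)
    a = (a - a.mean()) / (a.std() + 1e-7)  # normalise each vector
    b = (b - b.mean()) / (b.std() + 1e-7)
    return float(a @ b) / a.size  # normalised correlation in [-1, 1]
```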
What is the simplest local feature descriptor invariant to
Translation
but intensity changes, 3D viewpoint change and rotation will cause big differences
How do detectors remove the effect of different scales for feature matching
Ensure they are scale invariant
use σ which is proportional to scale
normalise scales
How do we use gradients as a feature descriptor
Take the intensity gradient of each pixel in the patch
Take the orientation (ignore magnitude as that is affected by illumination)
And build a histogram (after normalising using dominant direction)
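A sketch of one such histogram, keeping orientation and ignoring magnitude (the 8-bin quantisation matches the SIFT cards below; dominant-direction normalisation is assumed to happen beforehand):

```python
import numpy as np

def orientation_histogram(patch, n_bins=8):
    """Histogram of gradient orientations over a patch, ignoring magnitude."""
    gy, gx = np.gradient(patch.astype(float))
    angles = np.arctan2(gy, gx) % (2 * np.pi)  # orientations in [0, 2*pi)
    hist, _ = np.histogram(angles, bins=n_bins, range=(0, 2 * np.pi))
    return hist
```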
Why is it important to have rotation invariant descriptors
Want to find the dominant direction of gradient for the image patch
Then we rotate the patch according to this angle
This puts the patches into a canonical orientation (normalising)
What is SIFT
Scale Invariant Feature Transform
A descriptor computation
How does SIFT work
1) Localisation, uses DoG to find candidate interest points (takes note of scale)
2) Normalises to predefined size (16x16)
3) Checks if it is an edge and rejects it if so
4) Then finds the gradient orientation over a 16x16 pixel region around interest point
5) Computes histogram of image gradient orientations for all pixels within the patch
How does Orientation Normalisation work
The gradient orientations in the patch are quantised into 8 bins over 0-360 degrees
- compute the orientation histogram
- select the dominant orientation
- normalise: rotate the patch to a fixed orientation
How does SIFT form vectors
It divides the initial 16x16 window into a 4x4 grid of sub-patches: 16 cells, each 4x4 pixels
Compute a histogram of gradient orientations for the pixels in each sub-patch (8 reference angles/bins)
So for each sub-patch we have a vector of 8 dimensions (all invariant to scale and rotation)
The per-sub-patch computation captures finer spatial detail than one histogram over the whole window
Concatenating gives a 128-dimensional descriptor: 4x4x8
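A minimal sketch of the 4x4x8 assembly, assuming a 16x16 patch that is already scale- and rotation-normalised and reusing the orientation_histogram helper sketched above:

```python
import numpy as np

def sift_like_descriptor(patch):
    """16x16 normalised patch -> 128-d vector (4x4 cells x 8 bins)."""
    assert patch.shape == (16, 16)
    cells = []
    for i in range(0, 16, 4):
        for j in range(0, 16, 4):
            cells.append(orientation_histogram(patch[i:i + 4, j:j + 4]))
    desc = np.concatenate(cells).astype(float)  # 16 cells x 8 bins = 128
    return desc / (np.linalg.norm(desc) + 1e-7)  # robustness to illumination
```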
What are the advantages of SIFT
- can handle changes in viewpoint up to ~60 degrees
- can handle significant changes in illumination
- fast and efficient
What does one image computed with SIFT yield
n 128-dimensional descriptors
n scale parameters specifying size of each patch
n orientation parameters specifying angle of each patch
n 2D points giving positions of patches
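A sketch of obtaining exactly these outputs with OpenCV's SIFT implementation (available in opencv-python 4.4+); the image path is a placeholder:

```python
import cv2

image = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(image, None)

print(descriptors.shape)         # (n, 128): n 128-dimensional descriptors
kp = keypoints[0]
print(kp.size, kp.angle, kp.pt)  # scale, orientation, 2D position per patch
```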
What are the two main components of matching a feature in I1 to I2
- Define a distance function that compares the two descriptors
- Test all the features in I2, find the one with min distance
How do we define a distance between two features f1, f2 (in images I1, I2)
SSD(f1, f2) = Σ_i (f1_i - f2_i)²
the sum of squared differences between the entries of the two descriptors
Why is SSD not a good distance for feature matching
Can give a very good score for a bad match (ambiguous)
How do we improve the SSD feature matching distance calculation
Ratio distance = SSD(f1, f2) / SSD(f1, f2’)
f2 = best SSD match to f1
f2’ = 2nd best SSD match to f1
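A sketch of the ratio test with brute-force SSD over NumPy descriptor arrays; the 0.8 cutoff is a commonly used value, assumed here:

```python
import numpy as np

def match_ratio_test(desc1, desc2, max_ratio=0.8):
    """desc1: (n1, d), desc2: (n2, d). Return (i, j) pairs whose
    best/second-best SSD ratio is below max_ratio."""
    matches = []
    for i, f1 in enumerate(desc1):
        ssd = np.sum((desc2 - f1) ** 2, axis=1)  # SSD to every f2
        best, second = np.argsort(ssd)[:2]
        if ssd[best] / (ssd[second] + 1e-12) < max_ratio:
            matches.append((i, best))
    return matches
```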
What will the ratio distance be for ambiguous matches
Large values (~1)
How do we eliminate bad matches in feature matching
Thresholding
Throw out features with larger distance than threshold
high threshold -> more false positives
low threshold -> fewer true positives
What is a ROC Curve
plots the true positive rate (y axis, also called recall) against the false positive rate (x axis)
We want to maximise the area under the curve
useful for comparing different feature matching methods
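A sketch with scikit-learn, treating negated match distance as the score; the labels and distances are made-up illustration data:

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Made-up ground truth (1 = correct match) and match distances
labels = np.array([1, 1, 0, 1, 0, 0, 1, 0])
distances = np.array([0.2, 0.3, 0.8, 0.4, 0.6, 0.9, 0.5, 0.7])

# Smaller distance = better match, so negate to get a score
fpr, tpr, _ = roc_curve(labels, -distances)
print("area under ROC curve:", auc(fpr, tpr))
```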
Advantages of local features
- Critical to find distinctive and repeatable local regions for multi-view matching
- Complexity reduction via selection of distinctive points.
- Describe images, objects, parts without requiring segmentation; robustness to clutter & occlusion
- Robustness: similar descriptors in spite of moderate view changes, noise, blur, etc.