Exam Flashcards
Describe the steps of image classification
1) Feature extraction
2) Feature description
3) Classification
Describe the typical pre-processing of digit recognition in general images
- Detect the digits in the large image
- Normalize the size of the digit, for example, to 28x28 pixels
- Normalize the location: place the center of mass in the middle
- Deslant: make the orientation canonical
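A minimal sketch of the location-normalization step, assuming the digit has already been detected and resized so that it fits inside the canvas. The function name `center_by_mass` and the integer-pixel shift are illustrative choices, not part of the original notes.

```python
import numpy as np

def center_by_mass(digit, size=28):
    """Place a digit crop on a size x size canvas with its center of mass in the middle."""
    digit = np.asarray(digit, dtype=float)
    canvas = np.zeros((size, size))
    total = digit.sum()
    if total == 0:
        return canvas  # empty crop: nothing to center
    h, w = digit.shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy = (ys * digit).sum() / total  # intensity-weighted mean row
    cx = (xs * digit).sum() / total  # intensity-weighted mean column
    # Integer shift that moves the mass center to the canvas center.
    dy = int(round((size - 1) / 2 - cy))
    dx = int(round((size - 1) / 2 - cx))
    for y in range(h):
        for x in range(w):
            ty, tx = y + dy, x + dx
            if 0 <= ty < size and 0 <= tx < size:
                canvas[ty, tx] = digit[y, x]
    return canvas
```

Pixels shifted outside the canvas are dropped, so the crop should be smaller than the canvas.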
Name some advantages and disadvantages of K-Nearest Neighbour
Advantages:
- Works reasonably well
- No training required
- Nonlinear decision boundaries
- Handles multiple classes
Disadvantages:
- All training data must be stored in memory
- Long evaluation time
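A minimal sketch of K-Nearest Neighbour with Euclidean distance and majority voting, assuming the training data fits in a NumPy array. The name `knn_predict` and the default `k=3` are illustrative. Note how it reflects the points above: there is no training step, but every query scans the whole training set.

```python
import numpy as np

def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training points closest to query."""
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y)
    # Distance from the query to every stored training point.
    dists = np.linalg.norm(train_x - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```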
Name the three conditions Canny proposed for a good edge detector
1) Good detection; should detect all edges
2) Good localization; should detect edges where they are
3) A single response; should detect each edge only once
Describe Canny’s algorithm for edge detection
1) Gaussian filtering
2) Calculate gradient magnitude and direction
3) Perform non-maximum suppression
4) Perform hysteresis thresholding
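The last step is the least obvious one, so here is a minimal sketch of hysteresis thresholding only. It assumes `mag` is a gradient-magnitude image that has already been thinned. Pixels at or above `high` are kept, and pixels at or above `low` are kept only if they connect to a kept pixel through 8-neighbours. The function name and the 8-connectivity are assumptions.

```python
import numpy as np

def hysteresis(mag, low, high):
    """Keep strong pixels plus weak pixels connected to them (8-connectivity)."""
    mag = np.asarray(mag, dtype=float)
    h, w = mag.shape
    weak = mag >= low
    edges = mag >= high  # start from the strong pixels
    stack = list(zip(*np.nonzero(edges)))
    while stack:
        y, x = stack.pop()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and weak[ny, nx] and not edges[ny, nx]:
                    edges[ny, nx] = True  # weak pixel reached from an edge pixel
                    stack.append((ny, nx))
    return edges
```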
How can we approximate E(u,v) the error when shifting the neighborhood by u,v pixels?
E(u,v) approx= [u,v] M [u,v]^T where M is the matrix, summed over the neighborhood:
M = [Ix^2   Ix*Iy
     Ix*Iy  Iy^2]
and Ix, Iy are the image derivatives in x and y.
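A small sketch of the quadratic form above, assuming uniform weights over the neighborhood and derivatives from `np.gradient`. Both are simplifications, and the function names are illustrative.

```python
import numpy as np

def second_moment_matrix(patch):
    """Sum the derivative products over the patch to build the 2x2 matrix M."""
    patch = np.asarray(patch, dtype=float)
    Iy, Ix = np.gradient(patch)  # axis 0 is y (rows), axis 1 is x (columns)
    return np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                     [np.sum(Ix * Iy), np.sum(Iy * Iy)]])

def shift_error(M, u, v):
    """Approximate E(u,v) as the quadratic form [u,v] M [u,v]^T."""
    d = np.array([u, v], dtype=float)
    return float(d @ M @ d)
```

For a patch containing only a vertical edge, shifting along the edge (in y) gives zero error while shifting across it (in x) does not, which is why an edge alone is not a corner.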
Given the M matrix, what metrics can be used to determine if we have a corner?
1) R = min(lambda_1, lambda_2)
2) R = lambda_1*lambda_2 / (lambda_1 + lambda_2 + epsilon)
3) R = lambda_1*lambda_2 - k*(lambda_1 + lambda_2)^2
4) R = det(M) - k*trace(M)^2
where lambda_1 and lambda_2 are the eigenvalues of M
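A short sketch that evaluates the four metrics for a given symmetric 2x2 M. The dictionary keys, `k=0.04`, and the `eps` value are assumptions chosen for illustration. Metrics 3 and 4 agree because det(M) is the product and trace(M) the sum of the eigenvalues.

```python
import numpy as np

def corner_scores(M, k=0.04, eps=1e-12):
    """Evaluate the four corner metrics for a symmetric 2x2 matrix M."""
    M = np.asarray(M, dtype=float)
    l1, l2 = np.linalg.eigvalsh(M)  # eigenvalues of a symmetric matrix
    return {
        "min_eig": min(l1, l2),
        "ratio": l1 * l2 / (l1 + l2 + eps),
        "harris_eig": l1 * l2 - k * (l1 + l2) ** 2,
        "harris_det": np.linalg.det(M) - k * np.trace(M) ** 2,
    }
```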
Describe the Harris Corner Detector
1) Compute Ix and Iy
2) Create M
3) Calculate the eigenvalues lambda_1, lambda_2
4) Calculate the response R = lambda_1*lambda_2 - k*(lambda_1 + lambda_2)^2
5) Threshold and apply non-maximum suppression with respect to R
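The steps above can be sketched in plain NumPy. This sketch uses the equivalent form R = det(M) - k*trace(M)^2 instead of computing the eigenvalues explicitly. The 3x3 box window, `k=0.04`, the relative threshold, and the 3x3 suppression neighbourhood are all assumptions, and the helper names are hypothetical.

```python
import numpy as np

def box_sum(a, r=1):
    """Sum of a over a (2r+1) x (2r+1) window around each pixel (zero padded)."""
    p = np.pad(a, r)
    h, w = a.shape
    out = np.zeros((h, w))
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += p[dy:dy + h, dx:dx + w]
    return out

def harris_corners(img, k=0.04, rel_thresh=0.1, r=1):
    """Return the response map R and the (row, col) positions of detected corners."""
    img = np.asarray(img, dtype=float)
    Iy, Ix = np.gradient(img)                                  # step 1
    Sxx, Syy, Sxy = box_sum(Ix * Ix, r), box_sum(Iy * Iy, r), box_sum(Ix * Iy, r)  # step 2
    R = Sxx * Syy - Sxy ** 2 - k * (Sxx + Syy) ** 2            # steps 3-4 via det and trace
    # Step 5: keep pixels that are the maximum of their 3x3 neighbourhood and above threshold.
    p = np.pad(R, 1, constant_values=-np.inf)
    h, w = R.shape
    local_max = np.full(R.shape, -np.inf)
    for dy in range(3):
        for dx in range(3):
            local_max = np.maximum(local_max, p[dy:dy + h, dx:dx + w])
    keep = (R == local_max) & (R > rel_thresh * R.max())
    return R, np.argwhere(keep)
```

On an image that is zero except for a bright quadrant, the only surviving response is at the quadrant's corner; the straight edges give negative R.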
How are DoG and LoG filters related?
DoG(x,y,k,sigma) = I * G(k*sigma) - I * G(sigma) approx= (k-1)*sigma^2*LoG(x,y,sigma)
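A numerical check of this approximation on the filter kernels themselves (without the image I), using the analytic Laplacian of a 2D Gaussian. The grid size and the values of sigma and k are arbitrary choices for illustration. The approximation is first order in (k-1), so the error should shrink as k approaches 1.

```python
import numpy as np

def gaussian(x, y, sigma):
    """2D isotropic Gaussian G(x, y, sigma)."""
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)

def log_kernel(x, y, sigma):
    """Laplacian of the 2D Gaussian, computed analytically."""
    r2 = x**2 + y**2
    return (r2 - 2 * sigma**2) / sigma**4 * gaussian(x, y, sigma)

def dog_vs_log_error(sigma=2.0, k=1.01, half=10):
    """Max absolute difference between DoG and (k-1)*sigma^2*LoG, relative to max |DoG|."""
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    dog = gaussian(xs, ys, k * sigma) - gaussian(xs, ys, sigma)
    approx = (k - 1) * sigma**2 * log_kernel(xs, ys, sigma)
    return np.abs(dog - approx).max() / np.abs(dog).max()
```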
Describe the SIFT algorithm for keypoint detection
1) Find scale-space extrema using the DoG response
2) Fit a quadratic function to each extremum and estimate a refined keypoint location (which can lie between pixels)
3) Threshold keypoint responses
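Step 2 can be illustrated in one dimension: fit a parabola through the extremum sample and its two neighbours and take the vertex as the refined position. SIFT does the analogous fit jointly over x, y, and scale; this 1D version and the function name are only a simplified sketch.

```python
def refine_extremum_1d(prev, center, nxt):
    """Offset (in samples, relative to center) of the vertex of the parabola through three samples."""
    denom = prev - 2.0 * center + nxt  # discrete second derivative
    if denom == 0:
        return 0.0  # flat neighbourhood: no refinement possible
    return 0.5 * (prev - nxt) / denom
```

For samples of f(x) = -(x - 0.3)^2 taken at x = -1, 0, 1, the returned offset is 0.3, the true sub-sample peak.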
Describe how we can determine the canonical orientation of each patch in the SIFT algorithm
1) Create a histogram with 36 bins covering 0-360 degrees
2) Each pixel in a neighborhood votes for an orientation, weighted by its gradient magnitude.
3) The keypoint is assigned an orientation corresponding to the largest bin.
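A minimal sketch of the voting scheme above, assuming gradients from `np.gradient` and returning the center of the winning bin. Refinements such as distance-based weighting of the votes or interpolating the peak are left out, and the function name is illustrative.

```python
import numpy as np

def dominant_orientation(patch, n_bins=36):
    """Return the orientation histogram and the center angle (degrees) of its largest bin."""
    patch = np.asarray(patch, dtype=float)
    gy, gx = np.gradient(patch)  # axis 0 is y (rows), axis 1 is x (columns)
    mag = np.hypot(gx, gy)
    # Angle in [0, 360); with image rows pointing down this is clockwise on screen.
    ang = np.degrees(np.arctan2(gy, gx)) % 360.0
    bins = (ang // (360.0 / n_bins)).astype(int) % n_bins
    # Each pixel votes for its bin, weighted by gradient magnitude.
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    peak = int(np.argmax(hist))
    return hist, (peak + 0.5) * 360.0 / n_bins
```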
How is the SIFT descriptor created?
1) Take a small window around the keypoint
2) Weigh gradients near the center more.
3) Calculate the gradient orientation and magnitude after using a Gaussian filter.
4) Calculate the canonical orientation
5) Rotate all gradient directions relative to the canonical orientation
6) Create a histogram of gradient orientations for each subregion in the local window (originally 16 subregions, each with an 8-bin direction histogram).
7) These 16 histograms form a 128-dimensional feature vector (16 x 8 = 128) that is used as the descriptor.
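A simplified sketch of the descriptor layout, assuming a 16x16 patch, a known canonical orientation, and hard assignment of each pixel to one bin. The Gaussian width and the final unit-length normalization are assumptions not stated in the list above, and the function name is hypothetical. Only the gradient directions are rotated, as in step 5.

```python
import numpy as np

def sift_like_descriptor(patch, canonical_deg=0.0):
    """128-D descriptor from a 16x16 patch: 4x4 subregions x 8 orientation bins."""
    patch = np.asarray(patch, dtype=float)
    assert patch.shape == (16, 16)
    gy, gx = np.gradient(patch)
    mag = np.hypot(gx, gy)
    # Gradient directions expressed relative to the canonical orientation.
    ang = (np.degrees(np.arctan2(gy, gx)) - canonical_deg) % 360.0
    # Gaussian weight so gradients near the center count more (width is an assumption).
    ys, xs = np.mgrid[0:16, 0:16] - 7.5
    mag = mag * np.exp(-(xs**2 + ys**2) / (2 * 8.0**2))
    bins = (ang // 45.0).astype(int) % 8
    desc = []
    for cy in range(4):
        for cx in range(4):
            cell = (slice(4 * cy, 4 * cy + 4), slice(4 * cx, 4 * cx + 4))
            desc.append(np.bincount(bins[cell].ravel(), weights=mag[cell].ravel(), minlength=8))
    desc = np.concatenate(desc)  # 16 histograms x 8 bins = 128 values
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc  # normalization is an added assumption
```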
Describe the main advantages of SIFT
1) Robust to intensity changes
2) Invariant to scale
3) Invariant to rotation
What is the main difference between SURF and SIFT?
SURF is faster since it only considers horizontal and vertical gradients using the Haar wavelet. It stores 4 values for each subregion (sum dx, sum dy, sum |dx|, sum |dy|), so the total descriptor has length 16*4 = 64.
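A rough sketch of the 64-value layout described above. Plain finite differences from `np.gradient` stand in for the Haar wavelet responses, so this only illustrates how the descriptor is assembled, not how SURF computes its responses. The function name is hypothetical.

```python
import numpy as np

def surf_like_descriptor(patch):
    """64-D descriptor: 4x4 subregions x (sum dx, sum dy, sum |dx|, sum |dy|)."""
    patch = np.asarray(patch, dtype=float)
    h, w = patch.shape
    # Finite differences used here in place of Haar wavelet responses (assumption).
    dy, dx = np.gradient(patch)
    desc = []
    for rows in np.array_split(np.arange(h), 4):
        for cols in np.array_split(np.arange(w), 4):
            sub = np.ix_(rows, cols)  # index one of the 16 subregions
            desc += [dx[sub].sum(), dy[sub].sum(),
                     np.abs(dx[sub]).sum(), np.abs(dy[sub]).sum()]
    return np.array(desc)
```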
Name some improvements to neural networks over the past 30 years
1) Better hardware
2) Deeper networks
3) Larger datasets
4) Other changes: better activation functions, different layers…