Exam Flashcards
Describe the steps of image classification
1) Feature extraction
2) Feature description
3) Classification
Describe the typical pre-processing of digit recognition in general images
- Detect the digits in the large image
- Normalize the size of the digit, for example, to 28x28 pixels
- Normalize the location: place the center of mass in the middle of the image
- Deslant: make the orientation canonical
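The location-normalization step can be sketched as follows. This is a minimal illustration, assuming the digit has already been cropped into a small grayscale array; the function name `center_by_mass` is hypothetical, and `np.roll` wraps pixels around the border, which a real pipeline would replace with padding.

```python
import numpy as np

def center_by_mass(digit):
    """Shift the image so that its center of mass lands on the central pixel."""
    rows, cols = np.indices(digit.shape)
    total = digit.sum()
    com_r = (rows * digit).sum() / total  # intensity-weighted mean row
    com_c = (cols * digit).sum() / total  # intensity-weighted mean column
    shift_r = int(round((digit.shape[0] - 1) / 2 - com_r))
    shift_c = int(round((digit.shape[1] - 1) / 2 - com_c))
    # Simplification: np.roll wraps around instead of padding with zeros.
    return np.roll(digit, (shift_r, shift_c), axis=(0, 1))

digit = np.zeros((7, 7))
digit[1, 1] = 1.0  # a single "ink" pixel far from the center
centered = center_by_mass(digit)
print(np.argwhere(centered > 0))
```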
Name some advantages and disadvantages of K-Nearest Neighbour
+ It works reasonably well
+ No training required
+ Nonlinear decision boundaries
+ Multi-class
- All training data must be stored in memory
- Long evaluation time
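A minimal sketch of the classifier, assuming Euclidean distance and majority voting (the function name and toy data are made up for illustration). It shows why no training is needed and why evaluation is slow: every query is compared against all stored points.

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # "Training" is just storing the data; all work happens at query time.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(points, labels, (0.15, 0.15)))
print(knn_predict(points, labels, (0.95, 1.0)))
```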
Name the three conditions Canny proposed for a good edge detector
1) Good detection: should detect all edges
2) Good localization: detected edges should lie where the true edges are
3) Single response: should give only one response per edge
Describe Canny’s algorithm for edge detection
1) Gaussian filtering
2) Calculate gradient magnitude and direction
3) Perform non-maximum suppression
4) Perform hysteresis thresholding
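The last step, hysteresis thresholding, can be sketched like this. It is an illustrative version that assumes the gradient magnitude has already been thinned by the earlier steps and uses 8-connectivity; the function name and thresholds are hypothetical.

```python
import numpy as np

def hysteresis_threshold(magnitude, low, high):
    """Keep strong pixels (>= high) and weak pixels (>= low) connected to them."""
    strong = magnitude >= high
    candidate = magnitude >= low
    edges = np.zeros_like(strong)
    stack = list(zip(*np.nonzero(strong)))  # start from every strong pixel
    rows, cols = magnitude.shape
    while stack:
        r, c = stack.pop()
        if edges[r, c]:
            continue
        edges[r, c] = True
        # Grow the edge into neighboring weak pixels (8-connectivity).
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if (0 <= rr < rows and 0 <= cc < cols
                        and candidate[rr, cc] and not edges[rr, cc]):
                    stack.append((rr, cc))
    return edges

mag = np.array([
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.9, 0.5, 0.4, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0, 0.0],
])
edges = hysteresis_threshold(mag, low=0.3, high=0.8)
print(edges[1])
```

In this toy input the weak pixels next to the strong one are kept, while the isolated weak pixel at the end of the row is dropped.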
How can we approximate E(u,v), the error when shifting the neighborhood by (u,v) pixels?
E(u,v) ≈ [u, v] M [u, v]^T where M is the matrix, summed over the neighborhood:
M = sum [Ix^2   Ix*Iy
         Ix*Iy  Iy^2]
and Ix, Iy are the image derivatives in the x and y directions.
Given the M matrix, what metrics can be used to determine if we have a corner?
1) R = min(lambda_1, lambda_2)
2) R = lambda_1*lambda_2 / (lambda_1 + lambda_2 + epsilon)
3) R = lambda_1*lambda_2 - k(lambda_1 + lambda_2)^2
4) R = det(M) - k*trace(M)^2
where lambda_1 and lambda_2 are the eigenvalues of M
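The four metrics can be compared numerically with a small sketch. The two example matrices and the value k = 0.05 are made-up illustrations, not values from the notes. Metrics 3 and 4 agree because det(M) = lambda_1*lambda_2 and trace(M) = lambda_1 + lambda_2.

```python
import numpy as np

def corner_scores(M, k=0.05, eps=1e-12):
    """Evaluate the four corner metrics for a symmetric 2x2 matrix M."""
    lam1, lam2 = np.linalg.eigvalsh(M)
    return {
        "min_eig": min(lam1, lam2),
        "harmonic": lam1 * lam2 / (lam1 + lam2 + eps),
        "harris_eig": lam1 * lam2 - k * (lam1 + lam2) ** 2,
        "harris_det": np.linalg.det(M) - k * np.trace(M) ** 2,
    }

corner_like = np.array([[10.0, 1.0], [1.0, 8.0]])   # two large eigenvalues
edge_like = np.array([[10.0, 0.0], [0.0, 0.01]])    # one large, one tiny

corner = corner_scores(corner_like)
edge = corner_scores(edge_like)
print(corner)
print(edge)
```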
Describe the Harris Corner Detector
1) Compute Ix and Iy
2) Create M
3) Calculate the eigenvalues lambda_1, lambda_2
4) Calculate the response R = lambda_1*lambda_2 - k(lambda_1 + lambda_2)^2
5) Threshold and perform non-maximum suppression with respect to R
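A compact sketch of steps 1 to 4 on a synthetic image. It uses the equivalent det/trace form of R so no explicit eigenvalues are needed, a plain 3x3 box window instead of a Gaussian one, and it takes the global maximum as a stand-in for thresholding and non-maximum suppression. The helper names, window size and k are assumptions for illustration.

```python
import numpy as np

def box_sum(a, radius=1):
    """Sum each pixel's (2*radius+1)^2 neighborhood, zero-padded at the border."""
    padded = np.pad(a, radius)
    out = np.zeros_like(a)
    size = 2 * radius + 1
    for dr in range(size):
        for dc in range(size):
            out += padded[dr:dr + a.shape[0], dc:dc + a.shape[1]]
    return out

def harris_response(image, k=0.05, radius=1):
    Iy, Ix = np.gradient(image.astype(float))  # derivatives along rows, columns
    Sxx = box_sum(Ix * Ix, radius)             # entries of M, summed per window
    Syy = box_sum(Iy * Iy, radius)
    Sxy = box_sum(Ix * Iy, radius)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det - k * trace ** 2

image = np.zeros((12, 12))
image[6:, 6:] = 1.0  # bright square whose corner sits at row 6, column 6
R = harris_response(image)
r, c = np.unravel_index(np.argmax(R), R.shape)
print(int(r), int(c))
```

Along the straight sides of the square the response is negative, and in flat regions it is zero, so only the corner stands out.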
How are DoG and LoG filters related?
DoG(x,y,k,sigma) = I * G(k*sigma) - I * G(sigma) ≈ (k-1) * sigma^2 * LoG(x,y,sigma)
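The approximation can be checked numerically on the kernels themselves (convolution with I is linear, so the same relation carries over to the filtered image). This sketch assumes the normalized 2D Gaussian and its analytic Laplacian; sigma, k and the sample points are arbitrary choices.

```python
import math

def gaussian(x, y, sigma):
    return math.exp(-(x * x + y * y) / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)

def laplacian_of_gaussian(x, y, sigma):
    # Analytic Laplacian of the 2D Gaussian above.
    r2 = x * x + y * y
    return (r2 - 2 * sigma ** 2) / sigma ** 4 * gaussian(x, y, sigma)

def dog_kernel(x, y, k, sigma):
    return gaussian(x, y, k * sigma) - gaussian(x, y, sigma)

sigma, k = 1.5, 1.01
results = []
for x, y in [(0.0, 0.0), (1.0, 0.5)]:
    exact = dog_kernel(x, y, k, sigma)
    approx = (k - 1) * sigma ** 2 * laplacian_of_gaussian(x, y, sigma)
    results.append((exact, approx))
    print(exact, approx)
```

The match improves as k approaches 1, since the relation is a first-order approximation in (k-1).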
Describe the SIFT algorithm for keypoint detection
1) Find scale-space extrema using the DoG response
2) Fit a quadratic function over space around each extremum and estimate a refined keypoint location (which can lie in between pixels…)
3) Threshold keypoint responses
Describe how we can determine the canonical orientation of each patch in the SIFT algorithm
1) Create a histogram with 36 bins covering 0-360 degrees
2) Each pixel in a neighborhood votes for an orientation, weighted by its gradient magnitude.
3) The keypoint is assigned an orientation corresponding to the largest bin.
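The voting scheme above can be sketched as follows, assuming the gradients of the neighborhood are given as (gx, gy) pairs. The function name is hypothetical, and the sketch omits any spatial weighting of the votes.

```python
import math

def dominant_orientation(gradients, num_bins=36):
    """Return (center angle in degrees of the strongest bin, histogram)."""
    bin_width = 360.0 / num_bins
    histogram = [0.0] * num_bins
    for gx, gy in gradients:
        magnitude = math.hypot(gx, gy)
        angle = math.degrees(math.atan2(gy, gx)) % 360.0
        # Each pixel votes for its orientation bin, weighted by magnitude.
        histogram[int(angle // bin_width) % num_bins] += magnitude
    best = max(range(num_bins), key=histogram.__getitem__)
    return (best + 0.5) * bin_width, histogram

# Five gradients pointing at 45 degrees and one stronger one at 180 degrees.
gradients = [(1.0, 1.0)] * 5 + [(-3.0, 0.0)]
angle, histogram = dominant_orientation(gradients)
print(angle)
```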
How is the SIFT descriptor created?
1) Take a small window around the keypoint
2) Weight gradients near the center more.
3) Calculate the gradient orientation and magnitude after using a Gaussian filter.
4) Calculate the canonical orientation
5) Rotate all gradient directions relative to the canonical orientation
6) Create a histogram of gradient orientations for each subregion in the local window (originally 16 subregions with an 8-direction histogram each).
7) These 16 histograms form a 128-dimensional feature vector (16 x 8 = 128), which is used as the descriptor.
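A simplified sketch of how the 16 x 8 = 128 layout comes about, assuming a 16x16 patch split into 4x4 subregions. The Gaussian width, the hard binning without interpolation and the final unit-length normalization are illustrative assumptions, and the function name is hypothetical; this is not the full original algorithm.

```python
import numpy as np

def sift_like_descriptor(patch, canonical_deg=0.0):
    """patch: 16x16 array. Returns a 128-vector (4x4 subregions x 8 bins)."""
    gy, gx = np.gradient(patch.astype(float))
    magnitude = np.hypot(gx, gy)
    # Express every gradient direction relative to the canonical orientation.
    angle = (np.degrees(np.arctan2(gy, gx)) - canonical_deg) % 360.0
    # Weight gradients near the patch center more (assumed Gaussian width).
    ys, xs = np.mgrid[0:16, 0:16] - 7.5
    magnitude = magnitude * np.exp(-(xs ** 2 + ys ** 2) / (2 * 8.0 ** 2))
    bins = (angle // 45.0).astype(int) % 8
    descriptor = np.zeros((4, 4, 8))
    for r in range(16):
        for c in range(16):
            descriptor[r // 4, c // 4, bins[r, c]] += magnitude[r, c]
    descriptor = descriptor.ravel()
    norm = np.linalg.norm(descriptor)
    return descriptor / norm if norm > 0 else descriptor

patch = np.tile(np.arange(16.0), (16, 1))  # intensity ramp along x
d0 = sift_like_descriptor(patch)
d90 = sift_like_descriptor(patch, canonical_deg=90.0)
print(d0.shape)
```

With the ramp patch, all gradients point along x, so all mass falls into bin 0; declaring the canonical orientation to be 90 degrees moves the same mass to the 270-degree bin.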
Describe the main advantages of SIFT
1) Robust to intensity changes
2) Invariant to scale
3) Invariant to rotation
What is the main difference between SURF and SIFT?
SURF is faster since it only considers horizontal and vertical gradients, computed with Haar wavelets. It stores 4 values for each subregion (sum dx, sum dy, sum |dx|, sum |dy|), so the total descriptor has length 16*4 = 64.
Name some improvements to neural networks over the past 30 years
1) Better hardware
2) Deeper networks
3) Larger datasets
4) Other changes: better activation functions, different layers…
How can we adapt gradient descent to fix the vanishing and exploding gradient problems?
1) We can use adaptive step sizes.
2) We can clip the gradient, either by thresholding each component or by rescaling to a maximum L2 norm.
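The two clipping variants can be sketched on a plain list of gradient components. The function names and thresholds are illustrative, not taken from any particular framework.

```python
import math

def clip_by_value(grad, threshold):
    """Clamp every component to the interval [-threshold, threshold]."""
    return [max(-threshold, min(threshold, g)) for g in grad]

def clip_by_norm(grad, max_norm):
    """Rescale the whole vector if its L2 norm exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]

print(clip_by_value([3.0, -4.0, 0.5], 1.0))
print(clip_by_norm([3.0, 4.0], 1.0))
```

Clipping by norm keeps the direction of the gradient, while clipping by value can change it.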
What is the R-CNN method?
Uses selective search for region proposals, an SVM for classification and linear regression for localization. Both the SVM and the regressor use CNN features.
What is the fast R-CNN method?
Uses selective search for region proposals and a CNN for both localization and classification
What is a RoI pooling layer?
Converts the convolutional feature map of a region into a fixed-size output. Used because region proposals can be of arbitrary size.
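A minimal single-channel sketch of the idea, assuming max pooling over a grid of roughly equal cells. The function name, the RoI format and the rounding of cell borders are assumptions for illustration.

```python
import numpy as np

def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """roi = (row0, col0, row1, col1), end-exclusive, in feature-map coordinates."""
    r0, c0, r1, c1 = roi
    region = feature_map[r0:r1, c0:c1]
    out_h, out_w = output_size
    # Split the region into an out_h x out_w grid of roughly equal cells.
    row_edges = np.linspace(0, region.shape[0], out_h + 1).astype(int)
    col_edges = np.linspace(0, region.shape[1], out_w + 1).astype(int)
    pooled = np.empty(output_size)
    for i in range(out_h):
        for j in range(out_w):
            cell = region[row_edges[i]:max(row_edges[i + 1], row_edges[i] + 1),
                          col_edges[j]:max(col_edges[j + 1], col_edges[j] + 1)]
            pooled[i, j] = cell.max()
    return pooled

feature_map = np.arange(36).reshape(6, 6)
pooled_a = roi_max_pool(feature_map, (0, 0, 4, 6))  # 4x6 region
pooled_b = roi_max_pool(feature_map, (1, 2, 6, 5))  # 5x3 region
print(pooled_a.shape, pooled_b.shape)
```

Both regions have different sizes, but each is reduced to the same 2x2 output.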
What is the main difference between fast R-CNN and faster R-CNN?
Faster R-CNN uses a proposal CNN instead of selective search. The proposal and detection networks share feature maps.
How does the Region Proposal Network work in faster R-CNN?
It slides a 3x3 window over the convolutional feature map and, at each position, proposes a number of bounding boxes based on anchors with different scales and aspect ratios.
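The anchor boxes at one sliding-window position can be generated with a short sketch. The three scales and three aspect ratios used as defaults here are illustrative, and the function name is hypothetical; each anchor keeps an area of about scale^2 while its height/width ratio varies.

```python
def make_anchors(center_x, center_y, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (x_min, y_min, x_max, y_max) boxes centered on one position."""
    anchors = []
    for scale in scales:
        for ratio in ratios:  # ratio = height / width
            width = scale / ratio ** 0.5
            height = scale * ratio ** 0.5
            anchors.append((center_x - width / 2, center_y - height / 2,
                            center_x + width / 2, center_y + height / 2))
    return anchors

anchors = make_anchors(100, 100)
print(len(anchors))
print(anchors[1])
```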
Name some advantages/disadvantages of K-Means clustering for segmentation
+ Fast and easy to implement
- No semantics; supervised methods often perform better
How can we use K-means clustering for image segmentation?
1) On the histogram of the pixel intensities
2) Colour similarity
3) position and colour similarity
4) Others…
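Option 1, clustering on pixel intensities, can be sketched in one dimension. The function name, the initial centers and the toy pixel values are made up; for colour or position-plus-colour features, the same loop would run on vectors instead of scalars.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain K-means on scalar intensities; returns (centers, labels)."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    labels = [min(range(len(centers)), key=lambda i: abs(v - centers[i]))
              for v in values]
    return centers, labels

pixels = [10, 12, 11, 200, 205, 198, 13, 202]  # dark and bright pixels
centers, labels = kmeans_1d(pixels, centers=[0, 255])
print(centers)
print(labels)
```

The labels form the segmentation: every pixel is assigned to the cluster with the closest intensity, without any notion of what the segments mean.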
How can we use a classification convolutional network for segmentation?
Change the fully connected parts at the end to convolutional layers. This gives a heat map of class probabilities. Use a transposed convolution layer to upsample this heat map.
What are the basic assumptions behind optical flow?
Brightness constancy and small displacement
What do we know about I(x+u, y+v, t+1) given displacement (u,v) and the brightness constancy assumption?
It is the same as I(x,y,t)
What is the optical flow constraint?
dI/dx * u + dI/dy * v + dI/dt = 0
What assumption does the Lucas-Kanade method make?
Flow, or displacement, is constant in a local neighbourhood
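Under this assumption, every pixel in the window contributes one optical flow constraint with the same (u, v), which gives an overdetermined linear system. A minimal least-squares sketch, using synthetic derivatives that satisfy the constraint exactly (the function name and data are illustrative):

```python
import numpy as np

def lucas_kanade_flow(Ix, Iy, It):
    """Solve Ix*u + Iy*v + It = 0 in the least-squares sense over one window."""
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)  # one row per pixel
    b = -It.ravel()
    # Normal equations: (A^T A) [u, v]^T = A^T b
    return np.linalg.solve(A.T @ A, A.T @ b)

rng = np.random.default_rng(0)
Ix = rng.normal(size=(5, 5))
Iy = rng.normal(size=(5, 5))
true_u, true_v = 0.5, -0.25
It = -(Ix * true_u + Iy * true_v)  # temporal derivative consistent with the flow
flow = lucas_kanade_flow(Ix, Iy, It)
print(flow)
```

The 2x2 matrix A^T A has the same structure as the matrix M used for corner detection, so the system is only well posed where the window contains gradients in more than one direction.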
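What is the condition number of a 2x2 matrix?
abs(lambda_max) / abs(lambda_min), where lambda_max and lambda_min are the eigenvalues of largest and smallest magnitude (this holds for symmetric matrices such as M)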
How can the condition number be interpreted?
A large condition number makes the matrix inverse sensitive to noise and small changes in values.
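A small numeric illustration, assuming symmetric matrices so that the eigenvalue formula applies. The matrices and the perturbation are made up: a change of 1e-6 in the right-hand side changes the solution by 1 for the ill-conditioned matrix.

```python
import numpy as np

def condition_number(matrix):
    """abs(lambda_max) / abs(lambda_min) for a symmetric matrix."""
    eigenvalues = np.abs(np.linalg.eigvalsh(matrix))
    return eigenvalues.max() / eigenvalues.min()

well = np.array([[2.0, 0.0], [0.0, 1.0]])
ill = np.array([[1.0, 0.0], [0.0, 1e-6]])
print(condition_number(well), condition_number(ill))

# Solve ill @ x = b for two almost identical right-hand sides.
x1 = np.linalg.solve(ill, np.array([1.0, 1e-6]))
x2 = np.linalg.solve(ill, np.array([1.0, 2e-6]))
print(x1, x2)
```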
What is the idea behind the Horn-Schunck method?
Define a global energy function:
Integral_(x,y) [ (Ix*u + Iy*v + It)^2 +
alpha * (abs(gradient(u))^2 + abs(gradient(v))^2) ] dxdy
and minimize the energy with respect to u(x,y), v(x,y).
What can we do if the small displacement condition does not hold?
Use a spatial pyramid of downsampled images. Estimate the motion on the coarsest (smallest) image first, then refine it iteratively on successively larger versions.
Derive the optical flow constraint
Constant brightness: I(x+u, y+v, t+1) = I(x,y,t)
Taylor: I(x+u, y+v, t+1) ≈ I(x,y,t) + Ix*u + Iy*v + It
Combining: Ix*u + Iy*v + It = 0
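What is the:
1) formula for calculating the output size of a convolutional layer
2) formula for calculating the effective kernel size of a dilated layer
1) (Xin + 2p - k) / s + 1, where Xin is the input size, p the padding, k the kernel size and s the stride
2) k + (k-1)(d-1), where d is the dilation rate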
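Both formulas can be combined in a small helper. The function names are hypothetical, and the sketch assumes the division is rounded down when the stride does not divide evenly.

```python
def effective_kernel_size(k, d=1):
    """Kernel extent of a dilated convolution: k + (k-1)(d-1)."""
    return k + (k - 1) * (d - 1)

def conv_output_size(x_in, k, p=0, s=1, d=1):
    """(Xin + 2p - k_eff) / s + 1, with floor division (assumed rounding)."""
    k_eff = effective_kernel_size(k, d)
    return (x_in + 2 * p - k_eff) // s + 1

print(conv_output_size(28, 5))             # 5x5 kernel, no padding
print(conv_output_size(32, 3, p=1))        # "same" padding for a 3x3 kernel
print(conv_output_size(224, 7, p=3, s=2))  # strided convolution
print(conv_output_size(32, 3, d=2))        # dilated 3x3 acts like a 5x5
```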