Exam Flashcards

1
Q

Describe the steps of image classification

A

1) Feature extraction
2) Feature description
3) Classification

2
Q

Describe the typical pre-processing of digit recognition in general images

A
  1. Detect the digits in the large image
  2. Normalize the size of the digit, for example to 28x28 pixels
  3. Normalize the location, e.g. place the mass center in the middle
  4. Deslant the digit so that its orientation is canonical
3
Q

Name some advantages and disadvantages of K-Nearest Neighbour

A
  + It works reasonably well
  + No training required
  + Nonlinear decision boundaries
  + Handles multi-class problems naturally
  - All training data must be stored in memory
  - Long evaluation time
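The properties above (no training step, nonlinear boundaries, multi-class by majority vote) show up directly in a minimal numpy sketch; the data and the choice of k are made up for illustration:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify one point by majority vote among its k nearest training points."""
    # No training step: we just store X_train/y_train and search at query time,
    # which is also why all training data must stay in memory.
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to every stored sample
    nearest = np.argsort(dists)[:k]               # indices of the k closest samples
    votes = y_train[nearest]
    return np.bincount(votes).argmax()            # majority class label

# Two well-separated 2-D classes.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X, y, np.array([0.15, 0.1])))   # query near the first cluster
```

Note that every query scans all stored samples, which is the "long evaluation time" drawback.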
4
Q

Name the three conditions Canny proposed for a good edge detector

A

1) Good detection; should detect all real edges
2) Good localization; detected edges should be located where the true edges are
3) Single response; each edge should be detected only once

5
Q

Describe Canny’s algorithm for edge detection

A

1) Gaussian filtering
2) Calculate gradient magnitude and direction
3) Perform non-maximum suppression
4) Perform hysteresis thresholding
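Steps 2 and 4 of the algorithm can be sketched in numpy on a synthetic step edge; Gaussian smoothing and non-maximum suppression are omitted to keep the sketch short, and all names and threshold values here are illustrative:

```python
import numpy as np

def gradient_magnitude_direction(img):
    """Step 2 of Canny: central-difference gradients, magnitude and direction."""
    Iy, Ix = np.gradient(img.astype(float))   # np.gradient returns d/drow, d/dcol
    mag = np.hypot(Ix, Iy)
    direction = np.arctan2(Iy, Ix)
    return mag, direction

def double_threshold(mag, low, high):
    """Step 4 (simplified): strong edges, plus weak candidates for hysteresis."""
    strong = mag >= high
    weak = (mag >= low) & (mag < high)
    return strong, weak

# Synthetic vertical step edge between columns 4 and 5.
img = np.zeros((9, 9))
img[:, 5:] = 1.0
mag, _ = gradient_magnitude_direction(img)
strong, weak = double_threshold(mag, 0.1, 0.4)
print(np.where(strong.any(axis=0))[0])   # columns containing strong responses
```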

6
Q

How can we approximate E(u,v), the error when shifting the neighborhood by (u,v) pixels?

A

E(u,v) ≈ [u, v] M [u, v]^T, where M is the second-moment matrix:

M = [ sum(Ix^2)   sum(Ix*Iy)
      sum(Ix*Iy)  sum(Iy^2) ]

and Ix, Iy are the image derivatives, with the sums taken (often Gaussian-weighted) over the neighborhood.

7
Q

Given the M matrix, what metrics can be used to determine if we have a corner?

A

1) R = min(lambda_1, lambda_2)
2) R = lambda_1*lambda_2 / (lambda_1 + lambda_2 + epsilon)
3) R = lambda_1*lambda_2 - k(lambda_1 + lambda_2)^2
4) R = det(M) - k*trace(M)^2

where lambda_1 and lambda_2 are the eigenvalues of M. Note that 3) and 4) are the same quantity, since det(M) = lambda_1*lambda_2 and trace(M) = lambda_1 + lambda_2.
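The equality of metrics 3) and 4) can be checked numerically; the matrix and the value of k below are arbitrary test choices:

```python
import numpy as np

def corner_responses(l1, l2, k=0.05, eps=1e-9):
    """Metrics 1-3, written directly in terms of the eigenvalues."""
    r_min = min(l1, l2)                            # metric 1
    r_harmonic = l1 * l2 / (l1 + l2 + eps)         # metric 2
    r_harris = l1 * l2 - k * (l1 + l2) ** 2        # metric 3, Harris via eigenvalues
    return r_min, r_harmonic, r_harris

def harris_from_M(M, k=0.05):
    """Metric 4: the same Harris response, without any eigendecomposition."""
    return np.linalg.det(M) - k * np.trace(M) ** 2

# det(M) = l1*l2 and trace(M) = l1 + l2, so metrics 3 and 4 must agree.
M = np.array([[2.0, 0.5], [0.5, 1.0]])
l1, l2 = np.linalg.eigvalsh(M)
print(corner_responses(l1, l2)[2], harris_from_M(M))
```

Metric 4 is the one usually used in practice, precisely because it avoids computing eigenvalues per pixel.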

8
Q

Describe the Harris Corner Detector

A

1) Compute the image derivatives Ix and Iy
2) Create the second-moment matrix M
3) Calculate its eigenvalues lambda_1, lambda_2
4) Calculate the response R = lambda_1*lambda_2 - k(lambda_1 + lambda_2)^2
5) Threshold and perform non-maximum suppression with respect to R
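A minimal numpy sketch of these steps on a synthetic corner; the window size, k, and the plain box window are arbitrary simplifications (real implementations typically use Gaussian weighting, and the thresholding/suppression step is left out):

```python
import numpy as np

def harris_response(img, k=0.05, win=2):
    """Steps 1-4: gradients, per-pixel M over a box window, R = det - k*trace^2."""
    Iy, Ix = np.gradient(img.astype(float))       # step 1: image derivatives
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy     # products entering M
    def box_sum(A):                               # sum over a (2*win+1)^2 window
        out = np.zeros_like(A)
        for dy in range(-win, win + 1):
            for dx in range(-win, win + 1):
                out += np.roll(np.roll(A, dy, axis=0), dx, axis=1)
        return out
    Sxx, Syy, Sxy = box_sum(Ixx), box_sum(Iyy), box_sum(Ixy)
    det = Sxx * Syy - Sxy * Sxy                   # step 4, via det and trace
    trace = Sxx + Syy
    return det - k * trace ** 2

# White quarter-plane: one corner at (10, 10), edges along row 10 and column 10.
img = np.zeros((20, 20))
img[10:, 10:] = 1.0
R = harris_response(img)
corner_R = R[8:13, 8:13].max()   # response near the true corner: large positive
edge_R = R[10, 16]               # response on the edge, far from the corner: negative
print(corner_R > edge_R)
```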

9
Q

How are the DoG and LoG filters related?

A
DoG(x, y, k, sigma) = I * G(k*sigma) - I * G(sigma)
                    approx= (k-1) * sigma^2 * LoG(x, y, sigma)
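The approximation can be verified numerically in 1-D, where the Laplacian reduces to the second derivative of the Gaussian; sigma and k below are arbitrary test values:

```python
import numpy as np

def gaussian(x, s):
    """Normalized 1-D Gaussian."""
    return np.exp(-x**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))

def log_1d(x, s):
    """Second derivative of the 1-D Gaussian (the 1-D analogue of LoG)."""
    return gaussian(x, s) * (x**2 - s**2) / s**4

x = np.linspace(-10, 10, 1001)
sigma, k = 2.0, 1.1
dog = gaussian(x, k * sigma) - gaussian(x, sigma)
approx = (k - 1) * sigma**2 * log_1d(x, sigma)
rel_err = np.max(np.abs(dog - approx)) / np.max(np.abs(dog))
print(rel_err)   # small, and shrinking as k approaches 1
```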
10
Q

Describe the SIFT algorithm for keypoint detection

A

1) Find scale-space extrema using the DoG response
2) Fit a quadratic function around each extremum and estimate a refined keypoint location (which can lie between pixels…)
3) Threshold the keypoint responses to discard weak keypoints

11
Q

Describe how we can determine the canonical orientation of each patch in the SIFT algorithm

A

1) Create a histogram of 36 bins covering orientations from 0-360 degrees
2) Each pixel in a neighborhood around the keypoint votes for an orientation, weighted by its gradient magnitude
3) The keypoint is assigned the orientation corresponding to the largest bin
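These three steps fit in a few lines of numpy; the patch below is a synthetic horizontal ramp, so the dominant orientation should fall in the bin around 0 degrees (SIFT additionally applies Gaussian weighting and peak interpolation, omitted here):

```python
import numpy as np

def dominant_orientation(patch, bins=36):
    """Magnitude-weighted histogram of gradient orientations; return the peak bin center."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 360.0          # orientation in [0, 360)
    hist, edges = np.histogram(ang, bins=bins, range=(0.0, 360.0), weights=mag)
    peak = hist.argmax()
    return (edges[peak] + edges[peak + 1]) / 2.0          # center of the largest bin

# Horizontal ramp: the gradient points along +x everywhere.
patch = np.tile(np.arange(16, dtype=float), (16, 1))
print(dominant_orientation(patch))
```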

12
Q

How is the SIFT descriptor created?

A

1) Take a small window around the keypoint
2) Calculate the gradient orientation and magnitude at each pixel after Gaussian filtering
3) Weigh gradients near the center more
4) Calculate the canonical orientation
5) Rotate all gradient directions relative to the canonical orientation
6) Create a histogram of gradient orientations for each subregion of the local window (originally 4x4 = 16 subregions with 8-bin orientation histograms)
7) Concatenate these 16 histograms into a 128-dimensional feature vector, which is used as the descriptor

13
Q

Describe the main advantages of SIFT

A

1) Robust to intensity changes
2) Invariant to scale
3) Invariant to rotation

14
Q

What is the main difference between SURF and SIFT?

A

SURF is faster since it only considers horizontal and vertical gradients, computed with Haar wavelets. It uses 4 descriptor values per subregion, (sum dx, sum dy, sum |dx|, sum |dy|), so with 16 subregions the total descriptor has length 16*4 = 64.

15
Q

Name some improvements to neural networks over the past 30 years

A

1) Better hardware
2) Deeper networks
3) Larger datasets
4) Other changes: better activation functions, different layer types, …

16
Q

How can we adapt gradient descent to fix the vanishing and exploding gradient problems?

A

1) We can use adaptive step sizes.

2) We can clip the gradient, either element-wise with a threshold or by rescaling its L2 norm.
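Both clipping variants can be sketched in a few lines of numpy; the gradient vector and thresholds below are arbitrary illustrative values:

```python
import numpy as np

def clip_by_value(grad, threshold):
    """Element-wise clipping: each component is forced into [-threshold, threshold]."""
    return np.clip(grad, -threshold, threshold)

def clip_by_norm(grad, max_norm):
    """L2 clipping: rescale the whole gradient only if its norm exceeds max_norm."""
    norm = np.linalg.norm(grad)
    return grad if norm <= max_norm else grad * (max_norm / norm)

g = np.array([3.0, 4.0])        # a gradient with L2 norm 5
print(clip_by_value(g, 2.0))    # components capped, direction changes
print(clip_by_norm(g, 1.0))     # norm capped, direction preserved
```

Note the difference: norm clipping preserves the gradient direction, while value clipping can change it.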

17
Q

What is the R-CNN method?

A

Uses selective search for region proposals, an SVM for classification, and linear regression for localization. Both the classifier and the regressor use CNN features.

18
Q

What is the Fast R-CNN method?

A

Uses selective search for region proposals, and a CNN for both localization and classification.

19
Q

What is a RoI pooling layer?

A

Converts convolutional feature maps into a fixed size. It is used because region proposals can be of arbitrary size.
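A single-channel sketch of the idea in numpy; real RoI pooling first maps RoI coordinates onto the feature map and works per channel, and the sizes below are illustrative:

```python
import numpy as np

def roi_max_pool(feat, out_h, out_w):
    """Max-pool an arbitrary-sized region into a fixed out_h x out_w grid."""
    h, w = feat.shape
    # Bin edges split the region into out_h x out_w roughly equal cells.
    ys = np.linspace(0, h, out_h + 1).astype(int)
    xs = np.linspace(0, w, out_w + 1).astype(int)
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = feat[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].max()
    return out

# An 8x8 region cropped from a feature map, pooled to a fixed 2x2 output.
roi = np.arange(64, dtype=float).reshape(8, 8)
print(roi_max_pool(roi, 2, 2))
```

Whatever the input region's size, the output is always out_h x out_w, which is what lets a fully connected head follow.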

20
Q

What is the main difference between Fast R-CNN and Faster R-CNN?

A

Faster R-CNN uses a proposal CNN instead of selective search. The proposal and detection networks share feature maps.

21
Q

How does the Region Proposal Network work in Faster R-CNN?

A

It slides a 3x3 window over the convolutional feature map and, at each anchor position, proposes a number of bounding boxes with different scales and aspect ratios.

22
Q

Name some advantages/disadvantages of K-means clustering for segmentation

A

+ Fast and easy to implement

- No semantics and supervised methods often perform better

23
Q

How can we use K-means clustering for image segmentation?

A

1) On the histogram of the pixel intensities
2) On colour similarity
3) On position and colour similarity
4) Others…
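Option 1), clustering the raw intensities, in a minimal numpy sketch; the deterministic initialization and the synthetic two-intensity image are illustrative simplifications of real K-means:

```python
import numpy as np

def kmeans_1d(values, k=2, iters=10):
    """Plain Lloyd's algorithm on scalar values (e.g. pixel intensities)."""
    centers = np.linspace(values.min(), values.max(), k)  # simple deterministic init
    for _ in range(iters):
        # Assign each value to its nearest center, then recompute the means.
        labels = np.abs(values[:, None] - centers[None, :]).argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = values[labels == c].mean()
    return labels, centers

# Synthetic image: a dark background with a bright square.
img = np.full((16, 16), 0.1)
img[4:12, 4:12] = 0.9
labels, centers = kmeans_1d(img.ravel(), k=2)
seg = labels.reshape(img.shape)   # the segmentation: one cluster label per pixel
print(sorted(np.round(centers, 2)))
```

As the answer above notes, the labels carry no semantics: cluster 0 is just "dark pixels", not "background".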

24
Q

How can we use a classification convo net for segmentation?

A

Change the fully connected layers at the end into convolutional layers. This gives a heat map of class probabilities. Then use transposed convolution layers to upsample this heat map.

25
Q

What are the basic assumptions behind optical flow?

A

Brightness constancy and small displacement

26
Q

What do we know about I(x+u, y+v, t+1) given displacement (u,v) and the brightness constancy assumption?

A

It is the same as I(x,y,t)

27
Q

What is the optical flow constraint?

A

Ix*u + Iy*v + It = 0, where Ix, Iy and It are the partial derivatives of I with respect to x, y and t.

28
Q

What assumption does the Lucas-Kanade method make?

A

Flow, or displacement, is constant in a local neighbourhood
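Under that assumption, each pixel in the neighbourhood contributes one optical flow constraint Ix*u + Iy*v = -It, and the overdetermined system is solved by least squares. A toy numpy sketch, treating the whole synthetic image as a single window (the shift and test function are made up):

```python
import numpy as np

def lucas_kanade_window(I0, I1):
    """Estimate one (u, v) for the whole window by least squares."""
    Iy, Ix = np.gradient(I0)                         # spatial derivatives
    It = I1 - I0                                     # temporal derivative
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)   # one constraint row per pixel
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

# Synthetic pair: the second frame is the first shifted by (u, v) = (0.3, 0.2).
y, x = np.mgrid[0:32, 0:32].astype(float)
f = lambda x, y: np.sin(x / 3.0) + np.cos(y / 4.0)
I0 = f(x, y)
I1 = f(x - 0.3, y - 0.2)   # brightness constancy: I1(x, y) = I0(x - u, y - v)
print(lucas_kanade_window(I0, I1))
```

The estimate is only approximate because the constraint comes from a first-order Taylor expansion, which is also why the displacement must be small.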

29
Q

What is the condition number of a 2-d matrix?

A

abs(lambda_max) / abs(lambda_min), the ratio between the largest and smallest eigenvalue magnitudes

30
Q

How can the condition number be interpreted?

A

A large condition number makes the matrix inverse sensitive to noise and to small changes in the values.
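This sensitivity is easy to demonstrate with an ill-conditioned 2x2 matrix; the matrix and perturbation below are arbitrary illustrative values:

```python
import numpy as np

# Condition number as the ratio of eigenvalue magnitudes (symmetric matrix).
M = np.array([[1.0, 0.0],
              [0.0, 1e-4]])
lams = np.linalg.eigvalsh(M)
cond = np.abs(lams).max() / np.abs(lams).min()
print(cond)   # large: M is ill-conditioned

b = np.array([1.0, 0.0])
x = np.linalg.solve(M, b)
x_noisy = np.linalg.solve(M, b + np.array([0.0, 1e-3]))  # tiny change in b...
print(np.linalg.norm(x_noisy - x))                       # ...large change in x
```

In the Lucas-Kanade context, a large condition number of the gradient matrix signals the aperture problem: the flow estimate is unreliable.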

31
Q

What is the idea behind the Horn-Schunck method?

A

Define a global energy function:

E = Integral_(x,y) (Ix*u + Iy*v + It)^2 +
alpha*(abs(gradient(u))^2 + abs(gradient(v))^2) dxdy

and minimize the energy with respect to the flow fields u(x,y), v(x,y).

32
Q

What can we do if the small displacement condition does not hold?

A

Use spatial pyramids of downsampled images. Estimate the motion on the coarsest (smallest) level first, then iteratively refine the estimate on the larger versions.

33
Q

Derive the optical flow constraint

A

Constant brightness: I(x+u, y+v, t+1) = I(x,y,t)
Taylor expansion: I(x+u, y+v, t+1) ≈ I(x,y,t) + Ix*u + Iy*v + It
Combining the two: Ix*u + Iy*v + It = 0

34
Q

What is the:

1) formula for calculating the output size of a convolutional layer
2) formula for calculating the effective kernel size of a dilated convolution

A

1) Xout = (Xin + 2p - k) / s + 1, where p is the padding, k the kernel size and s the stride

2) k_eff = k + (k-1)(d-1), where d is the dilation factor
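Both formulas in code form, with a couple of common sanity checks (the example layer sizes are illustrative):

```python
def conv_output_size(x_in, k, p, s):
    """Output size of a convolutional layer: (Xin + 2p - k) / s + 1."""
    return (x_in + 2 * p - k) // s + 1

def effective_kernel_size(k, d):
    """Effective kernel size of a dilated convolution: k + (k-1)(d-1)."""
    return k + (k - 1) * (d - 1)

# A 7x7 stride-2 convolution with padding 3 halves a 224-pixel input.
print(conv_output_size(224, k=7, p=3, s=2))   # 112
# A 3x3 kernel with dilation 2 covers the same extent as a 5x5 kernel.
print(effective_kernel_size(3, 2))            # 5
```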