Computer Vision Flashcards

1
Q

Roberts Cross Operator (Diagonal)

A

A pair of 2x2 kernels that respond to intensity gradients along the two diagonal directions.

2
Q

Prewitt Operator (3x3) (Horizontal & Vertical)

A

Similar to Sobel in that it uses two 3 x 3 kernels.
One for changes in the horizontal direction, and one for changes in the vertical direction.
The two kernels are convolved with the original image to calculate the approximations of the derivatives.

3
Q

Sobel Operator (Horizontal & Vertical)

A

Calculates the gradient of image intensity at each pixel within the image.

The result shows how abruptly or smoothly the image changes at each pixel, and therefore how likely it is that that pixel represents an edge.
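
The calculation above can be sketched in plain Python. This is a minimal illustration, assuming the standard 3x3 Sobel kernels and a toy image stored as a list of lists; the helper names `apply_kernel` and `sobel_magnitude` are made up for this card and do not come from any library.

```python
# Standard Sobel kernels: GX responds to horizontal intensity changes,
# GY to vertical intensity changes.
GX = [[-1, 0, 1],
      [-2, 0, 2],
      [-1, 0, 1]]
GY = [[-1, -2, -1],
      [0, 0, 0],
      [1, 2, 1]]


def apply_kernel(image, kernel, r, c):
    """Correlate a 3x3 kernel with the 3x3 neighbourhood centred on (r, c)."""
    return sum(kernel[i][j] * image[r - 1 + i][c - 1 + j]
               for i in range(3) for j in range(3))


def sobel_magnitude(image):
    """Gradient magnitude for interior pixels (border pixels are left at 0)."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = apply_kernel(image, GX, r, c)
            gy = apply_kernel(image, GY, r, c)
            out[r][c] = (gx * gx + gy * gy) ** 0.5
    return out


# Toy image: dark left half, bright right half (a vertical edge).
image = [[0, 0, 10, 10] for _ in range(4)]
magnitude = sobel_magnitude(image)
flat = sobel_magnitude([[7] * 4 for _ in range(4)])
```

On the toy image the interior pixels beside the step get a magnitude of 40, while a uniform image gives 0 everywhere.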

4
Q

Prewitt Operator (4x4) (Horizontal & Vertical)

A

Similar to Prewitt (3x3)

Covers a larger area, where the focus is not necessarily on immediate pixels but others contributing to the edge change.

5
Q

Canny Edge Detector Assumption

A

Linear filtering and additive Gaussian noise.

6
Q

Canny Edge Detector Properties

A

Edge detector should have:

  • Good detection: filter responds to edge not noise.
  • Good localization: detected edge near true edge.
  • Single response: one response per edge.
7
Q

Canny Edge Detector Process - 1. Gradient Direction Identification

A

Uses a filter based on the first derivative of a Gaussian, because edge detection is susceptible to noise present in raw image data. Therefore:

  1. Raw image is convolved with a Gaussian filter. (The result is a slightly blurred version of original image that is not affected by a single noisy pixel to any significant degree)
  2. Canny algorithm uses 4 filters to detect horizontal, vertical, and diagonal edges in blurred image. The edge detection operator (Roberts, Prewitt, Sobel for example) returns a value for first derivative in the horizontal and vertical directions.
  3. Now, the edge gradient and direction can be determined. The edge direction angle is rounded to one of the four angles representing vertical, horizontal and the two diagonals (0, 45, 90 and 135 degrees).
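
Step 3 can be sketched as follows. This is an illustrative snippet, not a full Canny implementation; it assumes the horizontal and vertical derivatives `gx` and `gy` have already been computed, and the function names are hypothetical.

```python
import math


def quantise_direction(gx, gy):
    """Round the gradient direction to 0, 45, 90 or 135 degrees."""
    # atan2 gives an angle in (-180, 180]; opposite directions describe
    # the same edge orientation, so fold the angle into [0, 180).
    angle = math.degrees(math.atan2(gy, gx)) % 180.0
    # Snap to the nearest multiple of 45; 180 wraps back to 0.
    return (int(round(angle / 45.0)) % 4) * 45


def gradient(gx, gy):
    """Return (magnitude, quantised direction) for one pixel."""
    return math.hypot(gx, gy), quantise_direction(gx, gy)
```

For example, `gradient(3, 4)` has magnitude 5.0, and a gradient pointing straight left (`gx=-1, gy=0`) is folded onto the same 0-degree bin as one pointing straight right.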
8
Q

Canny Edge Detector Process - 2. Edge Detection

A

Using the gradient directions, we can start to detect edges:

  1. Norm of gradient (i.e., along the direction of line/curve).
  2. Thresholding (to respond to edges, not noise).
  3. Thinning (for good localization and single response).
  4. Hysteresis (to improve localization, use a high threshold to start curves and a low threshold to continue them).
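
The hysteresis step can be sketched like this. It is a simplified illustration that assumes the gradient magnitudes have already been thinned, uses 8-connectivity, and stores the image as a list of lists; the name `hysteresis` and its parameters are chosen for this example only.

```python
def hysteresis(magnitude, low, high):
    """Keep strong pixels, plus weak pixels connected to a strong one."""
    rows, cols = len(magnitude), len(magnitude[0])
    edges = [[False] * cols for _ in range(rows)]

    # The high threshold starts curves.
    stack = [(r, c) for r in range(rows) for c in range(cols)
             if magnitude[r][c] >= high]
    for r, c in stack:
        edges[r][c] = True

    # The low threshold continues them through 8-connected neighbours.
    while stack:
        r, c = stack.pop()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and not edges[nr][nc]
                        and magnitude[nr][nc] >= low):
                    edges[nr][nc] = True
                    stack.append((nr, nc))
    return edges


# One strong pixel (9), two weak pixels attached to it, one isolated weak pixel.
edges = hysteresis([[9, 4, 4, 0, 4]], low=3, high=8)
```

Here the two weak pixels next to the strong one are kept, while the isolated weak pixel on the right is rejected even though it is above the low threshold.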
9
Q

Effect of Gaussian kernel (σ) size on the Canny Detector algorithm

A

Choice of σ depends on desired behaviour where:

  • Large σ detects large-scale edges (and better noise suppression).
  • Small σ detects fine features.
10
Q

Hough Transform (HT) Purpose

A

Detects a line using a “voting” scheme, where points vote for a set of parameters that describe a line. The more votes for a particular set, the more evidence that the corresponding line is present in the image. So, the HT can detect MULTIPLE lines in one shot.

11
Q

Hough Transform (HT) Process

A
  1. Initialise H[d,Ѳ] = 0.
  2. For each edge point I[x,y] in the image, for Ѳ = 0 to 180:
    i. d = xcos(Ѳ) + ysin(Ѳ) (where d is the perpendicular distance from the line to the origin)
    ii. H[d,Ѳ] += 1
  3. Find the value(s) of (d,Ѳ) where H[d,Ѳ] is maximum.
  4. The detected line in the image is given by:
    i. d = xcos(Ѳ) + ysin(Ѳ)

What is the running time (measured in the number of votes)?
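
The voting loop can be sketched in plain Python. This is a minimal version, assuming edge points are given as (x, y) tuples, Ѳ is sampled in whole degrees, and d is rounded to the nearest integer; a dictionary stands in for the accumulator H, and the function name is hypothetical.

```python
import math


def hough_lines(edge_points, theta_step=1):
    """Accumulate votes for (d, theta) line parameters."""
    votes = {}  # sparse accumulator: (d, theta in degrees) -> vote count
    for x, y in edge_points:
        for theta in range(0, 180, theta_step):
            t = math.radians(theta)
            d = round(x * math.cos(t) + y * math.sin(t))
            votes[(d, theta)] = votes.get((d, theta), 0) + 1
    return votes


# Ten points on the horizontal line y = 5.
points = [(x, 5) for x in range(0, 100, 10)]
votes = hough_lines(points)
best = max(votes, key=votes.get)
total_votes = sum(votes.values())
```

All ten points vote for d = 5 at Ѳ = 90, so that cell wins. Every edge point casts one vote per sampled angle, so the total number of votes is the number of edge points multiplied by the number of Ѳ steps (here 10 x 180).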
12
Q

Finding Curved Line with Hough Transform

A

For circles:

Instead of the equation for a straight line (y = mx + b, or in normal form d = xcos(Ѳ) + ysin(Ѳ), where d is the perpendicular distance from the line to the origin and Ѳ is the angle this perpendicular makes with the x-axis), we can use the equation of a circle, r^2 = (x-h)^2 + (y-k)^2, whose parameters are the centre (h,k) and the radius r.

Initialise an accumulator H[h,k,r] = 0. Each edge point in the image ( given by [x,y] ) then votes for every (h,k,r) that satisfies the circle equation, and we find the value(s) of (h,k,r) where H[h,k,r] is a maximum.

The Hough transform can then be generalized to detect any curve that can be expressed in parametric form, i.e., y = f(x, a1, a2, …an). The curve is reduced to a point in a simple parameter space, and the voting asks how many points in the image conform to each set of parameters.
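
One common way to apply Hough voting to circles is to let each edge point vote for the centres (h, k) of all circles that could pass through it. The sketch below assumes the radius is already known, which reduces the parameter space to two dimensions; the function name and the 10-degree sampling are choices made for this example.

```python
import math


def hough_circle_centres(edge_points, radius, angle_step=10):
    """Vote for circle centres (h, k), assuming a known radius."""
    votes = {}
    for x, y in edge_points:
        # Every candidate centre lies at distance `radius` from the point.
        for deg in range(0, 360, angle_step):
            t = math.radians(deg)
            h = round(x - radius * math.cos(t))
            k = round(y - radius * math.sin(t))
            votes[(h, k)] = votes.get((h, k), 0) + 1
    return votes


# Synthetic edge points on a circle of radius 10 centred at (20, 20).
true_centre, radius = (20, 20), 10
points = [(true_centre[0] + radius * math.cos(math.radians(a)),
           true_centre[1] + radius * math.sin(math.radians(a)))
          for a in range(0, 360, 10)]
votes = hough_circle_centres(points, radius)
best_centre = max(votes, key=votes.get)
```

When the radius is unknown, the same loop is repeated for each candidate radius and the accumulator gains a third dimension, which is where the storage and running-time cost of the generalized transform comes from.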

13
Q

Corner detection

A

• Corners contain more edges than lines.
• A point on a line is hard to match, while corners are easier.
• Edge detectors tend to fail at corners.
• Intuitively, at a corner:
  1. Right at the corner, the gradient is ill-defined.
  2. Near the corner, the gradient has two different values.

To detect corners:

  1. Filter image
  2. Compute magnitude of the gradient everywhere
  3. Construct the matrix C over a window
  4. Use linear algebra to find the eigenvalues lambda1 and lambda2 of C
  5. If both are big, the window contains a corner
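
The eigenvalue test can be sketched as below. It assumes C is the 2x2 matrix of summed gradient products over the window (sums of Ix*Ix, Ix*Iy and Iy*Iy), which is the usual construction but is an assumption about what this card means by C; the function names and the threshold are illustrative.

```python
import math


def eigenvalues_of_c(gradients):
    """Eigenvalues of C = [[a, b], [b, d]] built from (Ix, Iy) pairs in a window."""
    a = sum(ix * ix for ix, _ in gradients)
    b = sum(ix * iy for ix, iy in gradients)
    d = sum(iy * iy for _, iy in gradients)
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    mean = (a + d) / 2.0
    spread = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    return mean + spread, mean - spread


def classify(gradients, threshold):
    """Label a window by how many eigenvalues exceed the threshold."""
    lam1, lam2 = eigenvalues_of_c(gradients)  # lam1 >= lam2
    if lam2 > threshold:
        return "corner"
    if lam1 > threshold:
        return "edge"
    return "flat"
```

A window whose gradients all point the same way, such as `[(5, 0)] * 4`, has one large eigenvalue and is labelled an edge. A window mixing two gradient directions, such as `[(5, 0), (5, 0), (0, 5), (0, 5)]`, has two large eigenvalues and is labelled a corner.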
14
Q

Local Features Definition

A

Local features are points that can be matched across images, which is important for recognition and pose estimation.

15
Q

A good local image feature to track should:

A
  1. Satisfy brightness constancy.
  2. Have sufficient texture variation.
  3. Correspond to a “real” surface patch.
  4. Not deform too much over time.
16
Q

Depth values in a stereo pair of images

A
  • One image is rectified (aligned) with respect to the other (using the “essential matrix E”).
  • Points on a horizontal line in one image are matched with corresponding points on the same line in the other image.
  • The distance “x” between a matching pair of points is called the disparity. The larger the disparity, the closer that point is to the camera, based on triangulation.
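
The triangulation relationship can be written as a one-line formula. The sketch below assumes a rectified pair from two identical pinhole cameras, with focal length in pixels and baseline in metres; the function and parameter names are illustrative.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth Z = f * B / disparity for a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px


near = depth_from_disparity(100, focal_length_px=500, baseline_m=0.1)
far = depth_from_disparity(50, focal_length_px=500, baseline_m=0.1)
```

Doubling the disparity halves the depth, so `near` is half of `far`. The same formula shows why a wider baseline helps at long range: it produces a larger disparity for the same depth.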
17
Q

Lucas-Kanade Algorithm: Optical flow points in two successive frames of video

A
  • Lucas-Kanade Algorithm: Integrates gradients over an image patch to find features good enough to track using the Harris detector.
  • A constant velocity is assumed for all pixels within an image patch.
  • Optical flow is the measurement of movement that feature points undergo in successive frames.
18
Q

Describe how depth can be calculated from optical flow

A
  • Relative depth can be calculated from the velocity of optical flow points, which is larger when depth is less. So absolute depth could be determined if the camera velocity is known.
  • Even for a camera moving forwards or backwards with no rotation – as depth decreases, the “focus of expansion” velocity increases (and vice-versa).
19
Q

Harris detector

Hint: Eigenvalues.

A

Captures structure of local neighbourhood using an auto-correlation matrix, where:

 * 2 strong eigenvalues = good local feature.
 * 1 strong eigenvalue = contour.
 * 0 = uniform region.

Measures quality of a feature – because the best feature points can be thresholded on the eigenvalues.

20
Q

SIFT (Scale-invariant Feature Transform)

A
  1. Thresholded image gradients are sampled over a 16x16 array of locations in scale space.
  2. An array of orientation histograms is created at each location.
  3. Because SIFT is based on a vector of angles, it is computationally efficient.
21
Q

Comparison of Harris Detector and SIFT algorithms

A
  1. Both algorithms are illumination and rotation invariant because they are based on operators of the gradient but are not deformation invariant.
  2. SIFT is scale-invariant because it is sampled at different scales, but Harris is not scale-invariant.
  3. They are translation invariant for x and y motion perpendicular to the camera's optical axis, but not for z.
22
Q

Erosion

A

Removes outside pixels of a region/blob (and internal holes/regions) usually using a convolution kernel/Mask and an AND operation (or subtracts the convolution of the kernel with the image).

23
Q

Erosion Effect

A

Removes small details such as thin lines and noise points, and widens gaps. Shrinks a region to a skeleton with successive erosions.

24
Q

Dilation

A

Adds pixels to the outside of a region/blob using a convolution kernel and an OR operation (or adds the convolution of the kernel with the image).

25
Q

Dilation effect

A

Enlarges region/blob, thickens lines, and fills small holes/gaps.

26
Q

Open

A

Erode then Dilate image.

27
Q

Open Effect

A

Removes small details such as thin lines, spurs and noise. Smooths jagged edges without changing the size of the original object.

28
Q

Close

A

Dilate then Erode image.

29
Q

Close Effect

A

Closes/fills small gaps/holes and preserves thin lines without changing the size of the original object.
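
The four morphological operations on the cards above can be sketched for binary images as follows. This minimal version assumes a 3x3 square structuring element, images stored as lists of 0/1 values, and pixels outside the image treated as background; the function names are illustrative.

```python
def _neighbourhood(img, r, c):
    """3x3 neighbourhood of (r, c); out-of-bounds pixels count as 0."""
    rows, cols = len(img), len(img[0])
    return [img[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0
            for rr in range(r - 1, r + 2) for cc in range(c - 1, c + 2)]


def erode(img):
    """A pixel survives only if its whole neighbourhood is set (AND)."""
    return [[int(all(_neighbourhood(img, r, c))) for c in range(len(img[0]))]
            for r in range(len(img))]


def dilate(img):
    """A pixel is set if any pixel in its neighbourhood is set (OR)."""
    return [[int(any(_neighbourhood(img, r, c))) for c in range(len(img[0]))]
            for r in range(len(img))]


def opening(img):
    return dilate(erode(img))  # erode, then dilate


def closing(img):
    return erode(dilate(img))  # dilate, then erode


# A 3x3 blob plus one isolated noise pixel.
noisy = [[0] * 7 for _ in range(7)]
for r in range(2, 5):
    for c in range(2, 5):
        noisy[r][c] = 1
noisy[0][6] = 1
opened = opening(noisy)

# A 5x5 blob with a one-pixel hole in the middle.
holed = [[int(1 <= r <= 5 and 1 <= c <= 5) for c in range(7)] for r in range(7)]
holed[3][3] = 0
closed = closing(holed)
```

Opening removes the noise pixel and leaves the 3x3 blob at its original size. Closing fills the hole and leaves the 5x5 blob at its original size.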

30
Q

Homography H

A

Relates the relative pose of 2 cameras viewing a planar scene. Estimated from feature correspondences using RANSAC.

31
Q

Essential Matrix E

A

Relates the relative pose of 2 cameras viewing a 3D scene. Estimated from feature correspondences using RANSAC.

32
Q

Bundle Adjustment (BA)

A
  • Initialize using RANSAC (for E).
  • Estimates a set of 3D points and camera poses which minimizes reprojection error.

33
Q

Classification

A

Detect whether an object is present or not in an image. (i.e. not detecting where it is, but just detecting if such an object exists anywhere in an image).

34
Q

Object Detection

A

Detecting the location of an object in an image (returning regions-of-interest/bounding-box coordinates).

35
Q

Dense Segmentation

A

Label every pixel in an image as belonging to a class (e.g. grass pixels or sheep pixels).

36
Q

Instance Segmentation

A

Label segmented pixels for each instance of a class (e.g. recognise which sheep pixels belong to which individual sheep in a flock).

37
Q

Unsupervised Learning

A

Algorithms work on datasets that are unlabelled and find patterns which would previously not be known to us.

Example: Graph cut

38
Q

Supervised Learning

A

The use of labelled datasets to train algorithms to classify data or predict outcomes accurately.

Example: Supply annotated images

39
Q

Reinforcement Learning

A

The next input depends on the output of the previous input.

Example: Interact with the environment to evaluate performance.

40
Q

Advantages and disadvantages of a structured Light camera (5)

A
  1. Cannot work in direct sunlight because the strong infrared light in sunlight interferes with the low-intensity infrared light projected by the camera.
  2. Cannot work closer than 0.5m because the projected pattern of dots becomes too close together.
  3. Doesn’t work further than 3.5m because the projected dots get too far apart and the intensity is too low.
  4. Motion blur occurs for fast motion because of low intensity.
  5. Accuracy decreases with distance.
41
Q

Advantages and disadvantages of a Time-of-Flight camera (3)

A
  1. Cannot work in direct sunlight because infrared sunlight interferes with the low-intensity infrared camera light.
  2. Limited range due to low-intensity infrared light.
  3. Accuracy is independent of distance.
42
Q

Advantages and disadvantages of a Stereo Camera (9)

A
  1. Potential for highest resolution.
  2. Colour available for each pixel (as well as depth).
  3. Works well in direct sunlight.
  4. Accuracy depends on distance.
  5. Noisy depth values in low ambient light.
  6. Depth accuracy can be increased over long distances using a wider baseline.
  7. Cheap cameras (e.g. Webcams) need expensive calibration for useful depth accuracy.
  8. Works for motion (if well illuminated).
  9. Many gaps in depth values in image regions without features (i.e. regions of uniform colour/intensity). Depth accuracy can be increased using higher resolution cameras.
43
Q

Effect of varying Threshold on the Canny Detector algorithm

Hint: Hysteresis

A

Hysteresis requires two thresholds - high and low:

  1. Apply a high threshold to find genuine edges.
  2. Then while tracing an edge, apply a low threshold to trace faint sections of edges.

A threshold set too high can miss important information but a threshold set too low will falsely identify irrelevant information.

44
Q

How do you remove noise from a 3D point cloud using PCL (Point Cloud Library)?

Hint: Pass

A

Use statistical outlier removal (SOR) filter which consists of two passes:

  1. First pass: For each point, find the mean distance to k-neighbors. 
  2. Second pass: Remove outliers with high means.
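
The two passes can be sketched in plain Python. This illustrates the idea only and is not the PCL API; it assumes points are 3D tuples, uses a brute-force neighbour search, and treats a point as an outlier when its mean neighbour distance exceeds the global mean plus `std_mul` standard deviations. The parameter names `k` and `std_mul` are chosen for this example.

```python
import math


def statistical_outlier_removal(points, k=2, std_mul=1.0):
    """Remove points whose mean k-neighbour distance is unusually high."""
    # First pass: mean distance from each point to its k nearest neighbours.
    mean_dists = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_dists.append(sum(dists[:k]) / k)

    # Second pass: threshold on the distribution of those mean distances.
    mu = sum(mean_dists) / len(mean_dists)
    sigma = math.sqrt(sum((d - mu) ** 2 for d in mean_dists) / len(mean_dists))
    limit = mu + std_mul * sigma
    return [p for p, d in zip(points, mean_dists) if d <= limit]


cloud = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0), (50, 50, 50)]
filtered = statistical_outlier_removal(cloud)
```

The four clustered points are kept and the distant point at (50, 50, 50) is removed. A real point cloud would need a spatial index such as a k-d tree rather than the brute-force search used here.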
45
Q

Three advantages of fiducial marker tracking over natural feature tracking

A

Fiducial marker advantages:

  1. Tracking is more computationally efficient.
  2. More accurate 6-degree of freedom pose.
  3. Usually requires no database to be stored.
46
Q

Three advantages of natural feature tracking over fiducial marker tracking

A

Natural feature advantages:

  1. Don’t need markers in this case.
  2. Natural feature targets catch the attention less.
  3. Natural feature targets also work if only partially in view.
47
Q

Advantages of PyTorch (2)

A
  1. More flexible for experiments.
  2. API is easier to use.

48
Q

Advantages of TensorFlow (2)

A
  1. Runs on more devices.
  2. Larger user community and trained networks.

49
Q

Convex Hull

A
  • Convex hull follows the outline of an object except for concavities.
  • The number of regions between the convex hull and the object is characteristic of the object's shape.
50
Q

How do pixels in a camera differ from the photoreceptors in the human retina in terms of color space, distribution of color, sensitivity, and resolution?

A

Camera:
• Uses RGB colour space and has evenly distributed CCD elements (25% red, 50% green, 25% blue) to approximate equal sensitivity to red, green and blue.
• Lower dynamic range.
• Wider spectral resolution.
• Higher frame rate.
• Potential for higher spatial resolution.

H:
• Resembles CIE colour space.
• Photopic vision: red, green and blue cones.
• Eye has the equivalent of a foveal 6.5 Mpixel 3-colour camera with a narrow-angle lens combined with a peripheral, sensitive 100 Mpixel monochrome camera with a wide-angle lens, but limited to a spatial resolution of only 1-3 cm at 20 m.
• Cognitive vision processing in the brain limits the huge 10^8:1 dynamic range. As a result, we can only distinguish approximately 100 colours and 16-32 shades of B&W.

51
Q

Describe the CIE colourspace and explain its strengths, weaknesses, and applications.

A

CIE

Strengths: Colours are perceptually uniform, conceptually easier to mix colours in this space.

Weaknesses: Challenging to use for computer vision because CIE is based on human perception - some coordinates don’t represent real colours.

Applications: Colour temperature of lighting for photographers, subjectively comparing food colours.

52
Q

Describe the RGB colourspace and explain its strengths, weaknesses, and applications.

A

RGB

Strengths: RGB is used to represent colour by media devices such as cameras and so is in the correct format for computer vision algorithms. The space is a unit cube, so all possible RGB values are realisable, which simplifies range checking of red, green and blue values.

Weaknesses: Not perceptually uniform, so it doesn’t make sense to calculate colour differences in RGB. Different RGB values are needed to produce the same colour on different displays, so it is device-specific.

Applications: Computer graphics

53
Q

Describe the HSV colourspace and explain its strengths, weaknesses, and applications.

A

HSV

Strengths: HSV is more useful (than RGB) for analysing colour such as simplifying colour range checking. A simple transformation of RGB.

Weaknesses: Not perceptually uniform, device-specific.

Applications: Used by artists (in tools such as photoshop). Computer vision-based colour analysis.

54
Q

When segmenting a moving object from a static background: “Background subtraction” is…

A

The first frame, or some derivative of it, is the reference frame.

55
Q

When segmenting a moving object from a static background: “Difference” is…

A

The difference between two adjacent frames, where in this case the previous frame is the reference frame.

56
Q

When segmenting a moving object from a static background: “Ghosting” refers to…

A

A second image of the moving object appearing as an artifact of a difference algorithm.

57
Q

When segmenting a moving object from a static background: “Foreground aperture” refers to…

A

A hole appearing in the moving object as an artefact of a difference algorithm.
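
The four cards above can be illustrated with a one-dimensional toy example. It assumes each frame is a single row of intensities and a uniformly bright object moving one pixel to the right; the function name and threshold are made up for this sketch.

```python
def changed_mask(frame, reference, threshold=1):
    """Mark pixels that differ from the reference by more than the threshold."""
    return [int(abs(p - q) > threshold) for p, q in zip(frame, reference)]


background = [0, 0, 0, 0, 0, 0]  # first frame: empty scene
previous = [0, 9, 9, 9, 0, 0]    # object covers pixels 1-3
current = [0, 0, 9, 9, 9, 0]     # object has moved to pixels 2-4

# Background subtraction: the first frame is the reference.
by_background = changed_mask(current, background)
# Difference: the previous frame is the reference.
by_difference = changed_mask(current, previous)
```

Background subtraction marks exactly the pixels the object now covers. The difference mask marks pixel 1, which the object has already left, and misses pixels 2 and 3 inside the object because their intensity did not change between frames, which is the hole described by the foreground aperture card.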

58
Q

Laplacian of Gaussian filter

A

The Laplacian of Gaussian filter subtracts the low frequencies from the original image - leaving the high frequencies remaining as a sharpened image.

59
Q

Why does an image processed with a Laplacian of Gaussian filter sometimes appear to have more content?

A

Although there is less content in the sharpened image, the accentuated high-frequency edges give the illusion of more content because there appear to be more edges and human perception is sensitive to edges.

60
Q

Generalized Hough Transform - Advantages and Disadvantages

A

Advantages

  1. Computationally efficient for minimal parameters such as a straight line in a noise free image with clear edges.

Disadvantages

  1. Does not scale well to multi-object complex scenes.
  2. Also suffers from matching uncertainty with noisy data.
  3. Substantial computational and storage requirements become acute when object orientation and scale have to be considered in noisy complex scenes.
61
Q

The three main steps in tracking are 1. prediction, 2. data association, and 3. correction.

Briefly describe 1. prediction in the context of the Kalman filter.

A

Predict the state in the current frame based on the state in previous frames. Here the new state is predicted by multiplying the old state by a known constant and then adding zero-mean noise. Therefore, the predicted mean for the new state is the constant times the mean for the old state.

62
Q

The three main steps in tracking are 1. prediction, 2. data association, and 3. correction.

Briefly describe 2. data association in the context of the Kalman filter.

A

Calculate the state from the current frame considering kinematic models and error minimisation.

63
Q

The three main steps in tracking are 1. prediction, 2. data association, and 3. correction.

Briefly describe 3. correction in the context of the Kalman filter.

A

If the measurement error (Gaussian noise) is low, use the measured state from the current frame, otherwise use a higher weighting on the predicted state.
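
Prediction and correction can be sketched for a one-dimensional state. This is a textbook scalar Kalman filter, assuming the state is observed directly and both noise sources are Gaussian; the function and parameter names are illustrative.

```python
def predict(mean, var, a, process_var):
    """New state = a * old state + zero-mean noise with variance process_var."""
    return a * mean, a * a * var + process_var


def correct(pred_mean, pred_var, measurement, meas_var):
    """Blend prediction and measurement, weighted by their uncertainties."""
    gain = pred_var / (pred_var + meas_var)  # Kalman gain, between 0 and 1
    mean = pred_mean + gain * (measurement - pred_mean)
    var = (1.0 - gain) * pred_var
    return mean, var


pred_mean, pred_var = predict(mean=2.0, var=1.0, a=1.0, process_var=0.5)
trusted, _ = correct(2.0, 1.0, measurement=4.0, meas_var=1e-9)
distrusted, _ = correct(2.0, 1.0, measurement=4.0, meas_var=1e9)
```

With a very small measurement variance the corrected mean lands almost exactly on the measurement (4.0). With a very large one it stays close to the prediction (2.0), matching the weighting described in the correction card.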

64
Q

Describe how we can obtain an improved “smoothed” estimate using a Kalman filter.

A
  1. Run two Kalman filters, one moving forward, the other backward in time.
  2. Now combine state estimates.
  3. The crucial point here is that we can obtain a smoothed estimate by viewing the backward filter’s prediction as yet another measurement for the forward filter.
65
Q

Describe two advantages of a Particle Filter (Condensation Algorithm) over a Kalman Filter.

A
  1. Predict multiple positions.
  2. Multi-modal & non-Gaussian.

66
Q

Particle Filter for tracking

A

Also known as the condensation algorithm, it predicts multiple states/positions with non-Gaussian, multimodal distributions.

67
Q

RANSAC Definition

A

RANSAC = Random Sample Consensus. It is a general-purpose framework for fitting a model to data that is contaminated with gross outliers.

68
Q

RANSAC Method

A
  1. Randomly choose a minimum number of data points needed to generate a model (a hypothesis set).
  2. Hypothesise: Compute a model from the hypothesis set. If it contains only inliers then the model will be approximately correct.
  3. Test: Count how many data points would be inliers if the model was correct.
  4. Repeat until a model is found that is supported by a large number of data points.

Example:

You want to find a straight line using only black pixels:

  1. Choose 2 at random, draw a line.
  2. Test = 3 compatible
  3. Repeat: choose another 2 at random, draw a line.
  4. Test = 8 compatible, accept.
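
The line example can be sketched in plain Python. This minimal version assumes points are (x, y) tuples, runs a fixed number of iterations instead of stopping at the first acceptable model, and uses a fixed random seed so the result is repeatable; the function name and default values are choices made for this sketch.

```python
import random


def ransac_line(points, threshold=0.5, iterations=100, seed=0):
    """Return the largest set of points lying close to a line through two samples."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # Hypothesise: two points are the minimum needed to define a line.
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        a, b = y2 - y1, x1 - x2          # line: a*x + b*y + c = 0
        c = -(a * x1 + b * y1)
        norm = (a * a + b * b) ** 0.5
        if norm == 0:
            continue
        # Test: count points within `threshold` of the line.
        inliers = [(x, y) for x, y in points
                   if abs(a * x + b * y + c) / norm <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers


line_points = [(i, 2 * i + 1) for i in range(8)]   # points on y = 2x + 1
outliers = [(3, 20), (6, -5), (1, 12)]
inliers = ransac_line(line_points + outliers)
```

The returned inliers are the eight points on the line, and the three gross outliers are ignored. A final least-squares fit over the inliers is a common refinement that is left out here.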
69
Q

RANSAC to find H

A
  1. Hypothesise: Choose random set of 4 feature matches.
  2. Compute H from these matches.
  3. Test: Count the number of feature matches where |Hx - x’| < threshold.
  4. Repeat until H is compatible with a large number of feature matches.
  5. H -> R,t,n,d: easy to convert between H and the others.
70
Q

RANSAC to find E

A
  1. Hypothesise: Choose random set of 5 feature matches.
  2. Compute E from these matches.
  3. Test: Count number of feature matches where x’^TEx / (normalising constant) < threshold.
  4. Repeat until E is compatible with a large number of feature matches.
  5. Decompose E -> R, t
71
Q

Advantages and Disadvantages of LIDAR

A

Good Range (e.g. used for mapping ground from aircraft)
Accuracy is independent of distance
Works well in direct sunlight
Low resolution
Low frame rate
Has moving parts (e.g. motor rotating mirror)
Expensive

72
Q

Human Spectral Resolution

A

Spectral Resolution: 400 (violet) - 700 nm (red)

73
Q

Human Dynamic range

A

Dynamic range: approximately 10^8:1

74
Q

Human Spatial Resolution

A

Spatial Resolution: 1-3 cm @ 20 m

75
Q

Human Radiometric Resolution:

A

Radiometric Resolution: approx. 100 colours, and 16-32 shades B&W

76
Q

Name the three main stages of a convolutional neural network (in order) (IMCNLP)

A

Input Image -> Convolutional stage -> Non-linear stage -> Pooling stage
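
The three stages can be sketched on a tiny single-channel image. This illustrates the data flow only, assuming a single hand-picked 2x2 kernel, ReLU as the non-linearity and 2x2 max pooling; a real network learns many kernels, and the function names here are illustrative.

```python
def convolve2d(image, kernel):
    """Convolutional stage: 'valid' sliding-window sum (no kernel flip, as in CNNs)."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(len(image[0]) - kw + 1)]
            for r in range(len(image) - kh + 1)]


def relu(feature_map):
    """Non-linear stage: clamp negative responses to zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Pooling stage: keep the largest value in each size x size block."""
    return [[max(feature_map[r + i][c + j]
                 for i in range(size) for j in range(size))
             for c in range(0, len(feature_map[0]) - size + 1, size)]
            for r in range(0, len(feature_map) - size + 1, size)]


image = [[9, 2, 3],
         [4, 1, 6],
         [7, 8, 0]]
kernel = [[1, 0],
          [0, -1]]

convolved = convolve2d(image, kernel)
activated = relu(convolved)
pooled = max_pool(activated)
```

The 3x3 input becomes a 2x2 feature map after convolution, keeps its shape through the non-linearity, and is reduced to a single value by pooling.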

77
Q

Name and describe the gradient operators you would use to create such a set of eight orientations as shown in the diagram to the right and also describe how you would use them to create these orientations.

A

Four filters are needed to detect horizontal, vertical, and diagonal edges in the blurred image. The Roberts cross operator (a) for diagonals and the Prewitt (b, d) or Sobel (c) for horizontal & vertical would enable a set of eight orientations.

Alternatively, Prewitt or Sobel returns a value for the first derivative in the horizontal direction (Gx) and the vertical direction (Gy). From this, the edge gradient and direction can be determined.

For eight unique gradients, locate the vertical filters above and below the key-point, horizontal filters to the left and right of the key-point and the diagonal filters top-left, top-right, bottom-left, bottom-right of the key-point.