Computer Vision Flashcards

Question

Dilation effect

Answer 1

Enlarges region/blob, thickens lines, and fills small holes/gaps.

Answer 2

Erode then Dilate image.

Answer 3

Removes small details such as thin lines, spurs and noise. Smooths jagged edges without changing the size of the original object.

Answer 4

Dilate then Erode image.

Answer 5

Closes/fills small gaps/holes and preserves thin lines without changing the size of the original object.
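
The four morphology cards above can be sketched on a small binary image. This is a minimal illustration, assuming a 3x3 square structuring element and treating pixels outside the image as background; both choices are assumptions, and the function names are hypothetical.

```python
# Binary images are lists of rows containing 0 or 1.
# Assumption: 3x3 square structuring element, outside pixels count as 0.

def neighbourhood(img, r, c):
    """Return the 3x3 neighbourhood values around pixel (r, c)."""
    h, w = len(img), len(img[0])
    return [img[rr][cc] if 0 <= rr < h and 0 <= cc < w else 0
            for rr in range(r - 1, r + 2)
            for cc in range(c - 1, c + 2)]

def dilate(img):
    """A pixel becomes 1 if any pixel in its neighbourhood is 1."""
    return [[int(any(neighbourhood(img, r, c))) for c in range(len(img[0]))]
            for r in range(len(img))]

def erode(img):
    """A pixel stays 1 only if every pixel in its neighbourhood is 1."""
    return [[int(all(neighbourhood(img, r, c))) for c in range(len(img[0]))]
            for r in range(len(img))]

def opening(img):
    """Erode then dilate: removes small specks, keeps object size."""
    return dilate(erode(img))

def closing(img):
    """Dilate then erode: fills small holes, keeps object size."""
    return erode(dilate(img))
```

With this sketch, opening a 3x3 blob that has an isolated noise pixel nearby returns the blob alone, and closing a 5x5 blob with a one-pixel hole returns the filled blob at its original size.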

Answer 6

Relates the relative pose of 2 cameras viewing a planar scene. Estimated from feature correspondences using RANSAC.

Answer 7

Relates the relative pose of 2 cameras viewing a 3D scene. Estimated from feature correspondences using RANSAC.

Answer 8

1. Initialise using RANSAC (for E). 2. Estimate a set of 3D points and camera poses that minimises reprojection error.

Answer 9

Detect whether an object is present or not in an image. (i.e. not detecting where it is, but just detecting if such an object exists anywhere in an image).

Answer 10

Detecting the location of an object in an image (returning regions-of-interest/bounding-box coordinates).

Answer 11

Label every pixel in an image as belonging to a class (e.g. grass pixels or sheep pixels).

Answer 12

Label segmented pixels for each instance of a class (e.g. recognise which sheep pixels in an image belong to which individual sheep in a flock).

Answer 13

Algorithms work on unlabelled datasets and find patterns that were not previously known to us. Example: Graph cut

Answer 14

The use of labelled datasets to train algorithms to classify data or predict outcomes accurately. Example: Supply annotated images

Answer 15

The next input depends on the output of the previous input. Example: Interact with the environment to evaluate performance.

Answer 16

1. Cannot work in direct sunlight because the strong infrared light interferes with the low-intensity projected infrared camera light. 2. Cannot work closer than 0.5 m because the projected pattern of dots becomes too close together. 3. Doesn't work further than 3.5 m because the projected dots get too far apart and the intensity is too low. 4. Motion blur occurs for fast motion because of low intensity. 5. Accuracy decreases with distance.

Answer 17

1. Cannot work in direct sunlight because infrared sunlight interferes with the low-intensity infrared camera light. 2. Limited range due to low-intensity infrared light. 3. Accuracy is independent of distance.

Answer 18

1. Potential for highest resolution. 2. Colour available for each pixel (as well as depth). 3. Works well in direct sunlight. 4. Accuracy depends on distance. 5. Noisy depth values in low ambient light. 6. Depth accuracy can be increased over long distances using a wider baseline. 7. Cheap cameras (e.g. webcams) need expensive calibration for useful depth accuracy. 8. Works for motion (if well illuminated). 9. Many gaps in depth values in image regions without features (i.e. regions of uniform colour/intensity). 10. Depth accuracy can be increased using higher-resolution cameras.

Answer 19

Hysteresis requires two thresholds - high and low: 1. Apply a high threshold to find genuine edges. 2. Then while tracing an edge, apply a low threshold to trace faint sections of edges. A threshold set too high can miss important information but a threshold set too low will falsely identify irrelevant information.
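
The two-threshold tracing described above can be sketched as a flood fill over an edge-strength map. This is a minimal sketch, assuming 8-connectivity and a precomputed grid of gradient magnitudes; the function name and the threshold values in the usage note are hypothetical.

```python
from collections import deque

def hysteresis(strength, low, high):
    """Mark pixels >= high as edges, then grow along neighbours >= low."""
    h, w = len(strength), len(strength[0])
    edge = [[False] * w for _ in range(h)]
    queue = deque()

    # Step 1: the high threshold finds the genuine (strong) edge pixels.
    for r in range(h):
        for c in range(w):
            if strength[r][c] >= high:
                edge[r][c] = True
                queue.append((r, c))

    # Step 2: trace from strong pixels through faint ones using the low threshold.
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if (0 <= rr < h and 0 <= cc < w
                        and not edge[rr][cc] and strength[rr][cc] >= low):
                    edge[rr][cc] = True
                    queue.append((rr, cc))
    return edge
```

For the single row `[9, 5, 5, 0, 5]` with `low=4` and `high=8`, the two faint pixels connected to the strong one are kept, while the last pixel is rejected because the gap at strength 0 disconnects it from any strong edge.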

Answer 20

Use statistical outlier removal (SOR) filter which consists of two passes: 1. First pass: For each point, find the mean distance to k-neighbors. 2. Second pass: Remove outliers with high means.
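
The two passes can be sketched in plain Python. The card does not say how "high" is decided, so the rule used here (global mean plus `std_ratio` standard deviations of the per-point mean distances) is an assumption, as are the function name and the default parameters.

```python
import math
import statistics

def sor_filter(points, k=3, std_ratio=1.0):
    """Statistical outlier removal on a list of coordinate tuples."""
    # First pass: mean distance from each point to its k nearest neighbours.
    mean_dists = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_dists.append(sum(dists[:k]) / k)

    # Assumed rejection rule: mean + std_ratio * standard deviation.
    threshold = (statistics.mean(mean_dists)
                 + std_ratio * statistics.pstdev(mean_dists))

    # Second pass: drop points whose mean neighbour distance is too high.
    return [p for p, d in zip(points, mean_dists) if d <= threshold]
```

This brute-force version compares every pair of points, so it only suits small clouds; a spatial index such as a k-d tree would normally be used for the neighbour search.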

Answer 21

Fiducial marker advantages: 1. Tracking is less computationally expensive. 2. More accurate 6-degree-of-freedom pose. 3. Usually requires no database to be stored.

Answer 22

Natural feature advantages: 1. No markers are needed. 2. Natural feature targets attract less attention. 3. Natural feature targets also work if only partially in view.

Answer 23

1. More flexible for experiments. 2. API is easier to use.

Answer 24

1. Runs on more devices. 2. Larger user community and more trained networks.

Answer 25

* Convex hull follows the outline of an object except for concavities. * The number of regions between the convex hull and the object is characteristic of object shape.
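
A convex hull can be computed from outline points with Andrew's monotone chain method. This is one standard hull algorithm chosen for illustration, not necessarily the one the deck has in mind; the function names are hypothetical.

```python
def cross(o, a, b):
    """Z component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    # Build the lower hull left to right, discarding non-left turns.
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    # Build the upper hull right to left in the same way.
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]
```

For an L-shaped outline with vertices (0,0), (4,0), (4,2), (2,2), (2,4), (0,4), the hull skips the concave corner (2,2). The region between the hull edge from (4,2) to (2,4) and the object is the single concavity of that shape.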

Answer 26

Camera: • Uses RGB colour space and has evenly distributed CCD elements (25% red, 50% green, 25% blue) to approximate equal sensitivity to red, green and blue. • Lower dynamic range. • Wider spectral resolution. • Higher frame rate. • Potential for higher spatial resolution. Human: • Resembles CIE colour space. • Photopic vision: red, green and blue cones. • Eye has the equivalent of a foveal 6.5 Mpixel 3-colour camera with a narrow-angle lens combined with a peripheral, sensitive 100 Mpixel monochrome camera with a wide-angle lens, but limited to a spatial resolution of only 1-3 cm at 20 m. • Cognitive vision processing in the brain limits the huge 10^8:1 dynamic range. Result: we can only distinguish approximately 100 colours and 16-32 shades of B&W.

Answer 27

CIE Strengths: Colours are perceptually uniform, conceptually easier to mix colours in this space. Weaknesses: Challenging to use for computer vision because CIE is based on human perception - some coordinates don’t represent real colours. Applications: Colour temperature of lighting for photographers, subjectively comparing food colours.

Answer 28

RGB Strengths: RGB is used to represent colour by media devices such as cameras and so is in the correct format for computer vision algorithms. It is a unit cube, so all possible RGB values are realisable, which simplifies range checking of red, green and blue values. Weaknesses: Not perceptually uniform, so it doesn’t make sense to calculate colour differences in RGB. Different RGB values are needed to produce the same colour on different displays, so it is device-specific. Applications: Computer graphics

Answer 29

HSV Strengths: HSV is more useful than RGB for analysing colour, for example by simplifying colour range checking. A simple transformation of RGB. Weaknesses: Not perceptually uniform, device-specific. Applications: Used by artists (in tools such as Photoshop). Computer vision-based colour analysis.
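
Colour range checking in HSV can be sketched with Python's standard `colorsys` module. The function name `is_reddish` and all tolerance values are assumptions chosen for illustration.

```python
import colorsys

def is_reddish(r, g, b, hue_tol=20 / 360, min_s=0.5, min_v=0.3):
    """Check whether an 8-bit RGB pixel falls in an assumed 'red' HSV range."""
    # colorsys expects floats in [0, 1] and returns h, s, v in [0, 1].
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)

    # Hue is circular, so red sits at both ends of the scale (0 and 1).
    hue_dist = min(h, 1 - h)
    return hue_dist <= hue_tol and s >= min_s and v >= min_v
```

A bright red (200, 30, 30) and a dark red (120, 10, 10) both pass the same hue test, whereas an equivalent check in RGB would need separate bounds on all three channels for each brightness level.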

Answer 30

The first frame, or some derivative of it, is the reference frame.

Answer 31

The difference between two adjacent frames, where the previous frame is the reference frame.
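
The two choices of reference frame can be compared on a toy greyscale sequence. This is a minimal sketch; the function names and the threshold of 20 grey levels are assumptions.

```python
def frame_difference(frame, reference, threshold=20):
    """Binary mask of pixels that differ from the reference by more than threshold."""
    return [[1 if abs(a - b) > threshold else 0 for a, b in zip(f_row, r_row)]
            for f_row, r_row in zip(frame, reference)]

def first_frame_reference(frames, threshold=20):
    """Difference every later frame against the first frame."""
    reference = frames[0]
    return [frame_difference(f, reference, threshold) for f in frames[1:]]

def previous_frame_reference(frames, threshold=20):
    """Difference every frame against the frame immediately before it."""
    return [frame_difference(cur, prev, threshold)
            for prev, cur in zip(frames, frames[1:])]

# A bright pixel enters an empty one-row scene and moves one step right.
frames = [
    [[10, 10, 10, 10]],
    [[200, 10, 10, 10]],
    [[10, 200, 10, 10]],
]
masks_first = first_frame_reference(frames)
masks_previous = previous_frame_reference(frames)
```

In this sequence the first-frame reference marks only the object's current position in each mask, while the previous-frame reference marks both the new position and the position just vacated in its second mask.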

Answer 32

A second image of the moving object appearing as an artefact of a difference algorithm.

Answer 33

A hole appearing in the moving object as an artefact of a difference algorithm.

Answer 34

The Laplacian of Gaussian filter subtracts the low frequencies from the original image - leaving the high frequencies remaining as a sharpened image.

Answer 35

Although there is less content in the sharpened image, the accentuated high-frequency edges give the illusion of more content because there appear to be more edges and human perception is sensitive to edges.

Answer 36

Advantages: 1. Computationally efficient for models with few parameters, such as a straight line in a noise-free image with clear edges. Disadvantages: 1. Does not scale well to complex multi-object scenes. 2. Suffers from matching uncertainty with noisy data. 3. Computational and storage requirements become acute when object orientation and scale have to be considered in noisy complex scenes.

Answer 37

Predict the state in the current frame based on the state in previous frames. Here the new state is predicted by multiplying the old state by a known constant and then adding zero-mean noise. Therefore, the predicted mean for the new state is the constant times mean for the old state.

Answer 38

Calculate the state from the current frame considering kinematic models and error minimisation.

Answer 39

If the measurement error (Gaussian noise) is low, use the measured state from the current frame, otherwise use a higher weighting on the predicted state.
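
The predict, correct and weighting cards above can be combined into one step of a one-dimensional Kalman filter. This is a sketch under stated assumptions: the state evolves as x_new = d * x_old + noise with variance q, and the measurement is y = m * x + noise with variance r. The symbols d, q, m, r and the function name are not from the deck.

```python
def kalman_step(mean, var, measurement, d=1.0, q=0.01, m=1.0, r=0.1):
    """One predict/correct cycle of a scalar Kalman filter."""
    # Predict: multiply the old state by the known constant d;
    # the zero-mean process noise only increases the variance.
    pred_mean = d * mean
    pred_var = d * d * var + q

    # Kalman gain: close to 1/m when measurement noise r is small,
    # close to 0 when r is large.
    gain = pred_var * m / (m * m * pred_var + r)

    # Correct: blend the prediction with the measurement using the gain.
    new_mean = pred_mean + gain * (measurement - m * pred_mean)
    new_var = (1 - gain * m) * pred_var
    return new_mean, new_var, gain
```

With a very small `r` the corrected mean follows the measurement almost exactly, and with a very large `r` it stays near the prediction, which is the weighting behaviour described in the card above.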

Answer 40

1. Run two Kalman filters, one moving forward, the other backward in time. 2. Now combine state estimates. 3. The crucial point here is that we can obtain a smoothed estimate by viewing the backward filter’s prediction as yet another measurement for the forward filter.

Answer 41

1. Predict multiple positions. 2. Multi-modal & non-Gaussian.

Answer 42

Also known as the condensation algorithm. It predicts multiple states/positions with non-Gaussian, multimodal distributions.

Answer 43

RANSAC = Random Sample Consensus. It is a general-purpose framework for fitting a model to data that is contaminated with gross outliers.

Answer 44

1. Sample: Randomly choose the minimum number of data points needed to generate a model (a hypothesis set). 2. Hypothesise: Compute a model from the hypothesis set. If the set contains only inliers, the model will be approximately correct. 3. Test: Count how many data points would be inliers if the model were correct. 4. Repeat until a model is compatible with a large number of data points. Example: You want to fit a straight line using only the black pixels: 1. Choose 2 at random and draw a line. 2. Test = 3 compatible. 3. Repeat: choose another 2 at random and draw a line. 4. Test = 8 compatible, accept.
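
The straight-line example can be sketched as follows. This is a minimal illustration: it runs a fixed number of iterations and keeps the best model rather than stopping at an acceptance count, and the function names, inlier threshold, iteration count and seed are all assumptions.

```python
import random

def fit_line(p, q):
    """Line a*x + b*y + c = 0 through p and q, with (a, b) of unit length."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = (a * a + b * b) ** 0.5
    if norm == 0:
        return None  # identical points define no line
    a, b = a / norm, b / norm
    return a, b, -(a * x1 + b * y1)

def ransac_line(points, threshold=0.5, iterations=200, seed=0):
    """Return the line with the largest consensus set and its inliers."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        # Sample: two points are the minimum needed to define a line.
        p, q = rng.sample(points, 2)
        # Hypothesise: compute the model from the sample.
        model = fit_line(p, q)
        if model is None:
            continue
        a, b, c = model
        # Test: count points within the distance threshold of the line.
        inliers = [pt for pt in points
                   if abs(a * pt[0] + b * pt[1] + c) <= threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```

Given ten points on the line y = x and a few scattered outliers, the consensus set returned is the ten collinear points, because any line through an outlier is compatible with far fewer points.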

Answer 45

1. Hypothesise: Choose a random set of 4 feature matches. 2. Compute H from these matches. 3. Test: Count the number of feature matches where |Hx - x'| < threshold. 4. Repeat until H is compatible with a large number of feature matches. 5. Decompose H -> R, t, n, d. It is easy to convert between H and the others.
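
The test step of this loop can be sketched on its own: project each source point through a candidate H and count the matches that land near their partner. Computing H from 4 matches is not shown. The function names and the 2-pixel threshold are assumptions.

```python
import math

def apply_homography(H, pt):
    """Map a 2D point through a 3x3 homography using homogeneous coordinates."""
    x, y = pt
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w

def count_inliers(H, matches, threshold=2.0):
    """Count matches (x, x') whose reprojection error |Hx - x'| is below threshold."""
    count = 0
    for src, dst in matches:
        px, py = apply_homography(H, src)
        if math.hypot(px - dst[0], py - dst[1]) < threshold:
            count += 1
    return count

# A pure translation by (5, -3), used here as a stand-in candidate H.
H_shift = [[1, 0, 5], [0, 1, -3], [0, 0, 1]]
matches = [((0, 0), (5, -3)), ((1, 1), (6, -2)), ((2, 2), (50, 50))]
n_compatible = count_inliers(H_shift, matches)
```

Here two of the three matches agree with the candidate H and the third is a gross outlier, so this candidate would score 2 in the RANSAC loop.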

Answer 46

1. Hypothesise: Choose a random set of 5 feature matches. 2. Compute E from these matches. 3. Test: Count the number of feature matches where |x'^T * E * x| / (normalising constant) < threshold. 4. Repeat until E is compatible with a large number of feature matches. 5. Decompose E -> R, t

Answer 47

1. Good range (e.g. used for mapping ground from aircraft). 2. Accuracy is independent of distance. 3. Works well in direct sunlight. 4. Low resolution. 5. Low frame rate. 6. Has moving parts (e.g. motor rotating mirror). 7. Expensive.

Answer 48

Spectral Resolution: 400 (violet) - 700 nm (red)

Answer 49

Dynamic range: approximately 10^8:1

Answer 50

Spatial Resolution: 1-3 cm @ 20 m

Answer 51

Radiometric Resolution: approx. 100 colours, and 16-32 shades B&W

Answer 52

Input Image -> Convolutional stage -> Non-linear stage -> Pooling stage
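
The three stages can be sketched for a single-channel image and a single filter. This is a minimal illustration, assuming "valid" cross-correlation (as most deep learning frameworks implement convolution), ReLU as the non-linearity and 2x2 max pooling; these specific choices are assumptions.

```python
def conv2d_valid(image, kernel):
    """Convolutional stage: slide the kernel over the image (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def relu(feature_map):
    """Non-linear stage: clamp negative responses to zero."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Pooling stage: keep the maximum of each non-overlapping size x size block."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [[max(feature_map[r + i][c + j]
                 for i in range(size) for j in range(size))
             for c in range(0, cols - size + 1, size)]
            for r in range(0, rows - size + 1, size)]

def cnn_layer(image, kernel):
    """Input image -> convolution -> non-linearity -> pooling."""
    return max_pool(relu(conv2d_valid(image, kernel)))
```

With a vertical-edge kernel such as `[[-1, 0, 1]] * 3`, a dark-to-bright edge produces positive responses that survive ReLU and pooling, while a bright-to-dark edge produces negative responses that ReLU sets to zero.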

Answer 53

Four filters are needed to detect horizontal, vertical, and diagonal edges in the blurred image. The Roberts cross operator (a) for diagonals and the Prewitt (b, d) or Sobel (c) for horizontal and vertical would enable a set of eight orientations. Alternatively, Prewitt or Sobel returns a value for the first derivative in the horizontal direction (Gx) and the vertical direction (Gy). From these, the edge gradient magnitude and direction can be determined. For eight unique gradients, locate the vertical filters above and below the key-point, the horizontal filters to the left and right of the key-point, and the diagonal filters top-left, top-right, bottom-left and bottom-right of the key-point.
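
Gradient magnitude and direction from a pair of Sobel responses can be sketched for a single pixel. The convention assumed here is that gx is the derivative along the horizontal axis (columns) and gy along the vertical axis (rows, increasing downwards); the function name is hypothetical.

```python
import math

SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]

def sobel_at(image, r, c):
    """Return (gx, gy, magnitude, direction in degrees) at interior pixel (r, c)."""
    gx = sum(image[r + i - 1][c + j - 1] * SOBEL_X[i][j]
             for i in range(3) for j in range(3))
    gy = sum(image[r + i - 1][c + j - 1] * SOBEL_Y[i][j]
             for i in range(3) for j in range(3))
    magnitude = math.hypot(gx, gy)
    direction = math.degrees(math.atan2(gy, gx))
    return gx, gy, magnitude, direction
```

Quantising `direction` into 45-degree bins is one way to obtain the eight orientations mentioned above from just two filters.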

(77 cards)