w9 gemini Flashcards

1
Q

What is the topic of today’s lecture, according to slide 1?

A

Object recognition

2
Q

What are the three levels of computer vision discussed in the recap on slide 1?

A

Low-level vision, Mid-level vision, High-level vision

3
Q

Name four topics covered under Mid-level vision in the recap (slide 1).

A

Segmentation and grouping, Correspondence problem, Stereo and Depth, Video and Motion

4
Q

What are the three main aspects of object recognition, as defined on slide 2?

A

Identification, Categorisation, Localisation

5
Q

Give three examples of methods for performing object recognition (slide 2).

A

Template matching, Sliding window, Edge matching

6
Q

What is the goal of object identification, according to slide 3?

A

To determine the identity of an individual instance of an object.

7
Q

Give an example of object identification from slide 3.

A

Distinguishing between two specific individuals (Clinton vs. Bush) or two specific phone models (Samsung Galaxy On8 vs. iPhone 7 Plus).

8
Q

What is the goal of object categorisation, as described on slide 4?

A

To determine the category of an object.

9
Q

Provide an example of object categorisation from slide 4.

A

Classifying images as belonging to the category ‘Human’ or ‘Chimpanzee’, or ‘Telephone’ or ‘Calculator’.

10
Q

What is object localisation, according to slide 5?

A

Determining the presence and/or location of an object in an image.

11
Q

What is semantic segmentation, as defined on slide 6?

A

Localisation that is sufficiently fine-grained, over a sufficiently large number of categories, that it amounts to a segmentation of the image.

12
Q

Explain the concept of a category hierarchy in object recognition (slide 7).

A

Classification can occur at different levels of abstraction, from general categories (like ‘object’) to specific instances (like ‘Rex’).

13
Q

What are the three levels in the category hierarchy shown on slide 7?

A

Superordinate level, Basic level, Subordinate level

14
Q

Why is the ‘basic level’ significant for human object recognition (slide 8)?

A

Humans are usually fastest at recognising category members at this level; they start with basic-level categorisation before identification; and it is the first level understood by children.

15
Q

List two reasons why the basic level is considered special (slide 9).

A

It’s the highest level where category members share many common features, and the lowest level where members have features distinct from other categories at the same level.

16
Q

What are the two main requirements for object recognition systems (slide 10)?

A

Sensitivity to image differences relevant to distinguishing objects, and insensitivity/tolerance to differences that don’t affect object identity or category.

17
Q

Give examples of image variations that object recognition systems should be insensitive to (slides 11-13).

A

Background clutter, occlusion, viewpoint, lighting, non-rigid deformations, within-category variation.

18
Q

What are the three main components required for object recognition (slide 14)?

A

Image data, representations of objects, matching techniques.

19
Q

Describe the ‘off-line’ stage of object recognition procedure (slide 15).

A

Extracting representations from training examples.

20
Q

Describe the ‘on-line’ stage of object recognition procedure (slide 15).

A

Extracting a representation from an input image and matching it with training examples to determine the object class or identity.

21
Q

What are the two key aspects in which object recognition methods vary (slide 16)?

A

Representation used and matching procedure.

22
Q

What is the representation used in template matching (slide 17)?

A

An image of the object to be recognised (an array of pixel intensities).

23
Q

Describe the matching process in template matching (slide 17).

A

Searching every image region and calculating the similarity between the template and the image region.

24
Q

Name three similarity measures that can be maximised in template matching (slide 18).

A

Cross-correlation, Normalised cross-correlation (NCC), Correlation coefficient.

25
Q

Name three similarity measures that can be minimised in template matching (slide 19).

A

Sum of Squared Differences (SSD), Euclidean distance, Sum of Absolute Differences (SAD).

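The similarity measures on the last few cards are easy to make concrete. Below is a minimal NumPy sketch (not from the slides) that scores every window origin with the correlation coefficient (the zero-mean, normalised form, to be maximised) and the SAD (to be minimised), assuming single-channel float arrays:

```python
import numpy as np

def match_template(image, template):
    """Score every window of `image` against `template`.

    Returns two maps, one value per window origin: the correlation
    coefficient (maximise) and the sum of absolute differences (minimise).
    """
    H, W = image.shape
    h, w = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    cc = np.zeros((H - h + 1, W - w + 1))
    sad = np.zeros_like(cc)
    for y in range(cc.shape[0]):
        for x in range(cc.shape[1]):
            patch = image[y:y + h, x:x + w]
            p = patch - patch.mean()
            denom = np.sqrt((p ** 2).sum()) * t_norm + 1e-9
            cc[y, x] = (p * t).sum() / denom       # zero-mean NCC
            sad[y, x] = np.abs(patch - template).sum()
    return cc, sad
```
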
26
Q

How can template matching be used to recognise multiple objects (slide 20)?

A

By using multiple templates, one for each object.

27
Q

What is a potential problem with using SAD in template matching, as shown in the example on slide 21?

A

It can produce strong responses in regions that are simply darker overall, rather than at a true match.

28
Q

What is a potential problem with template matching regarding ‘true’ and ‘false’ matches (slide 23)?

A

Distinguishing true matches from false matches, and deciding what constitutes a match and how many peaks to consider.

29
Q

Why can template matching be ineffective if the target object is scaled or rotated (slide 25)?

A

Because the template needs to be very similar to the target object, and scaling or rotation introduces differences.

30
Q

What is a common approach to address viewpoint and within-category variation in template matching (slide 26)?

A

Using multiple templates for each object, representing different viewpoints and variations.

31
Q

Explain the dilemma regarding the threshold in template matching (slide 27).

A

A high threshold avoids false matches but might miss true matches, while a low threshold finds true matches but increases false matches.

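The threshold dilemma shows up directly when a score map is turned into detections. A hypothetical helper, reusing the NumPy import and the `cc` map from the earlier sketch:

```python
def detections(score_map, threshold):
    """Window origins whose score exceeds the threshold.

    Raising `threshold` suppresses false matches but risks missing true
    ones; lowering it recovers true matches at the cost of false alarms.
    """
    ys, xs = np.where(score_map > threshold)
    return list(zip(xs.tolist(), ys.tolist()))
```
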
32
Q

Why can template matching be computationally expensive (slide 28)?

A

Because of the need for many comparisons, especially when dealing with variations in appearance, viewpoints, and scales.

33
Q

Why is template matching sensitive to occlusion (slide 29)?

A

Because if an object is occluded, the template may not fully match the visible parts.

34
Q

What is the fundamental issue that makes template matching not robust (slide 30)?

A

The metric used for comparison is fundamentally not robust to changes in appearance between the template and the image patch.

35
Q

How does ‘sliding window’ differ from template matching (slide 31)?

A

It uses a classifier to determine if an image patch contains the object, and considers patches of different shapes and sizes.
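
As a rough sketch of the difference: the sweep over positions stays, but the per-patch test becomes a learned classifier. Here `classifier(patch) -> score` is an assumed callable, not anything from the slides:

```python
def sliding_window_detect(image, classifier, window_sizes, stride=8):
    """Run a classifier over windows of several shapes and sizes."""
    H, W = image.shape[:2]
    hits = []
    for h, w in window_sizes:
        for y in range(0, H - h + 1, stride):
            for x in range(0, W - w + 1, stride):
                score = classifier(image[y:y + h, x:x + w])
                if score > 0:  # acceptance rule depends on the classifier
                    hits.append((x, y, w, h, score))
    return hits
```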

36
Q

How is computational cost addressed in sliding window techniques (slide 32)?

A

By pre-processing images to select regions that are good candidates to contain an object, often using image segmentation.

37
Q

How does ‘edge matching’ represent objects (slide 33)?

A

Templates and input images are pre-processed to extract edges.

38
Q

Describe the matching process in edge matching (slide 33).

A

Calculating the average of the minimum distances between points on the edge template and points on the edge image.
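
A minimal sketch of that score, assuming both edge maps have already been reduced to lists of (x, y) points. This is the directed version; the slides may also average in the reverse direction:

```python
import numpy as np

def edge_match_score(template_pts, image_pts):
    """Average, over template edge points, of the distance to the
    nearest image edge point (smaller means a better match)."""
    image_pts = np.asarray(image_pts, dtype=float)
    total = 0.0
    for p in np.asarray(template_pts, dtype=float):
        total += np.sqrt(((image_pts - p) ** 2).sum(axis=1)).min()
    return total / len(template_pts)
```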

39
Q

What is the general idea behind ‘model-based’ object recognition (slide 34)?

A

Hypothesize object identity and pose, render the object in the image (‘back-project’), and compare it to the actual image.

40
Q

What are two methods for comparing the back-projected model with the image in model-based recognition (slide 36)?

A

Edge score and Oriented edge score.

41
Q

What is the representation used in ‘intensity histograms’ for object recognition (slide 37)?

A

A histogram of pixel intensity values (either grayscale or colour).

42
Q

What is a benefit of using intensity histograms (slide 37)?

A

Insensitivity to small viewpoint changes.

43
Q

What are two drawbacks of using intensity histograms (slide 37)?

A

Sensitivity to illumination and intra-class appearance variation, and insensitivity to different spatial configurations.
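
A sketch of the representation and one simple way to compare two of them, assuming 8-bit grayscale input; the bin count and the L1 distance are illustrative choices:

```python
import numpy as np

def intensity_histogram(gray, bins=32):
    hist, _ = np.histogram(gray, bins=bins, range=(0, 256))
    return hist / (hist.sum() + 1e-9)   # normalise away image size

def histogram_distance(h1, h2):
    return np.abs(h1 - h2).sum()        # L1 distance; one of many options
```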

44
Q

Why are colour histograms sometimes used in face detection (slide 39)?

A

Because skin has a very small range of intensity-independent colours.
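
One way to make the intensity independence concrete: divide out brightness to get (r, g) chromaticities and test whether they fall inside the narrow skin range. The box bounds below are purely illustrative; a real detector would fit them from labelled skin pixels:

```python
import numpy as np

def skin_mask(rgb):
    """Per-pixel skin test on chromaticities, which discard intensity."""
    s = rgb.astype(float).sum(axis=2) + 1e-9
    r = rgb[..., 0] / s
    g = rgb[..., 1] / s
    # Illustrative bounds only -- not from the slides.
    return (r > 0.35) & (r < 0.55) & (g > 0.25) & (g < 0.37)
```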

45
Q

What are the two components of the ‘Implicit Shape Model’ (ISM) representation (slide 40)?

A

Parts (2D image fragments) and structure (configuration of parts).

46
Q

Describe the process of extracting local object features in ISM (slide 41).

A

Locating interest points and extracting 2D image patches around them.

47
Q

What is an ‘appearance codebook’ in ISM (slide 42)?

A

A collection of clustered patches, where the cluster centres are stored.

48
Q

How is the configuration of parts learned in ISM (slide 43)?

A

By matching codebook features to training images and recording possible object centres for every codebook entry.

49
Q

Explain the matching procedure in ISM (slide 44).

A

Each feature votes for possible object centres, similar to the Generalized Hough Transform.
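
A simplified sketch of that voting step. `features` is assumed to hold the matched (position, codebook index) pairs from the input image, and `offsets[k]` the centre displacements recorded for codebook entry k during training; real ISM also weights its votes:

```python
import numpy as np

def ism_vote(features, offsets, accumulator_shape):
    """Accumulate votes for object centres; peaks suggest detections."""
    acc = np.zeros(accumulator_shape)
    for (x, y), k in features:
        for dx, dy in offsets[k]:
            cx, cy = int(round(x + dx)), int(round(y + dy))
            if 0 <= cy < acc.shape[0] and 0 <= cx < acc.shape[1]:
                acc[cy, cx] += 1.0
    return acc
```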

50
Q

What is the representation used in ‘feature-based object recognition’ (slide 52)?

A

Training image content is transformed into local features invariant to translation, rotation, and scale.

51
Q

How does ‘feature-based object recognition’ perform matching (slide 53)?

A

Local features are extracted from a new image and matched to those from the training image.

52
Q

What are two requirements for feature detection in feature-based recognition (slide 55)?

A

Repeatability despite image variations, and detecting enough features to cover the object.

53
Q

What property should the feature description have in feature-based recognition (slide 55)?

A

Invariance to translation, rotation, scale changes, and lighting variations.

54
Q

What is the representation used in SIFT feature matching (slide 56)?

A

A 128-element histogram of the orientations of the intensity gradients.

55
Q

List four advantages of SIFT features (slide 56).

A

Locality, Distinctiveness, Quantity, Efficiency.

56
Q

Describe the representation used in SIFT feature matching in more detail (slide 57).

A

A set of keypoints with 128-component descriptors obtained from each training image, stored in a database.

57
Q

Describe the matching process in SIFT feature matching (slide 58).

A

Finding the two best-matching descriptors in the training database for each keypoint in the input image.

58
Q

What is the criteria for accepting a match in SIFT feature matching (slide 58)?

A

The ratio of the distance to the nearest descriptor to the distance to the second nearest must be below a threshold.
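
A sketch of that acceptance rule (Lowe's ratio test) over raw descriptor arrays. The 0.8 value is the threshold from Lowe's paper and may differ from the slides:

```python
import numpy as np

def ratio_test_matches(query_desc, train_desc, ratio=0.8):
    """Keep a match only when the nearest training descriptor is clearly
    closer than the second nearest."""
    matches = []
    for i, d in enumerate(query_desc):
        dists = np.linalg.norm(train_desc - d, axis=1)
        j1, j2 = np.argsort(dists)[:2]
        if dists[j1] < ratio * dists[j2]:
            matches.append((i, int(j1)))
    return matches
```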

59
Q

What is used to confirm the consistency of matched locations in SIFT feature matching (slide 61)?

A

RANSAC or Generalised Hough Transform.

60
Q

What is the analogy used to explain ‘Bag-of-words’ (slide 62)?

A

Document recognition.

61
Q

Briefly describe how ‘Bag-of-words’ works for documents (slide 63).

A

Documents are parsed into words, common words are ignored, words are stemmed, assigned identifiers, and documents are represented by word frequency vectors.

62
Q

How is matching done in the ‘Bag-of-words’ model for documents (slide 64)?

A

By calculating the angle between the query and document vectors.
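
Measuring that angle reduces to cosine similarity between the frequency vectors; a minimal sketch:

```python
import numpy as np

def cosine_similarity(query_vec, doc_vec):
    """cos(angle) between word-frequency vectors; 1.0 means same direction."""
    return float(np.dot(query_vec, doc_vec) /
                 (np.linalg.norm(query_vec) * np.linalg.norm(doc_vec) + 1e-9))
```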

63
Q

Explain the core idea behind ‘Bag-of-words’ for images (slide 65).

A

Different objects have distinct sets of features that occur in different frequencies.

64
Q

Name three ways features can be chosen in the ‘Bag-of-words’ model for images (slide 66).

A

Regular grid, interest point detector, random sampling.

65
Q

How are features encoded in the ‘Bag-of-words’ model for images (slide 67)?

A

Using a descriptor (analogous to a word).

66
Q

Describe the process of creating a dictionary in the ‘Bag-of-words’ model (slides 68-69).

A

Encoding many features from many images and clustering them into visual words, forming a visual vocabulary.
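
A sketch of the dictionary step using k-means, a common clustering choice (the slides may use another). `all_descriptors` is assumed to be an (N, D) array pooled from many training images:

```python
from sklearn.cluster import KMeans

def build_vocabulary(all_descriptors, n_words=500):
    """Cluster pooled descriptors; the cluster centres are the visual words."""
    km = KMeans(n_clusters=n_words, n_init=10).fit(all_descriptors)
    return km.cluster_centers_
```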

67
Q

How are images represented in the ‘Bag-of-words’ model (slide 71)?

A

As a histogram showing the frequency of appearance of each codeword in the dictionary.

68
Q

How is matching performed in the ‘Bag-of-words’ model for images (slide 71)?

A

By calculating the distance between the histograms of the input and training images.
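
Putting the last two cards together: quantise an image's descriptors against the vocabulary, build the codeword histogram, and compare histograms (the L1 distance here is an illustrative choice):

```python
import numpy as np

def bow_histogram(descriptors, vocabulary):
    """Frequency of each visual word among the image's descriptors."""
    hist = np.zeros(len(vocabulary))
    for d in descriptors:
        k = int(np.argmin(np.linalg.norm(vocabulary - d, axis=1)))
        hist[k] += 1
    return hist / (hist.sum() + 1e-9)

def bow_distance(h1, h2):
    return float(np.abs(h1 - h2).sum())
```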

69
Q

What is a ‘geometric invariant’ (slide 72)?

A

A property of an object in the scene that does not vary with viewpoint.

70
Q

Give three examples of invariant properties under Euclidean transformations (slide 72).

A

Lengths, angles, areas.

71
Q

Give two examples of invariant properties under Similarity transformations (slide 73).

A

Ratios of lengths, angles.

72
Q

Give three examples of invariant properties under Affine transformations (slide 74).

A

Parallelism, ratios of lengths along lines, ratio of areas.

73
Q

What is the invariant property under Projective transformations (slide 75)?

A

Cross-ratio.

74
Q

Define ‘cross-ratio’ (slide 75).

A

The ratio of ratios of lengths on a line.

75
Q

What is the matching process in ‘geometric invariants’ (slide 77)?

A

Comparing the value of the cross-ratio measured in the image with a database of cross-ratios measured in training images.
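
A worked sketch for four collinear points given by 1-D positions along their line; note that several equivalent conventions for the cross-ratio exist, and this is one of them:

```python
def cross_ratio(a, b, c, d):
    """(AC/BC) / (AD/BD) for collinear points A, B, C, D.

    Invariant under projective transformations, so the value measured in
    an image can be compared against cross-ratios from training images.
    """
    return ((c - a) / (c - b)) / ((d - a) / (d - b))

# Example: equally spaced points 0, 1, 2, 3 give a cross-ratio of 4/3.
assert abs(cross_ratio(0.0, 1.0, 2.0, 3.0) - 4.0 / 3.0) < 1e-12
```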

76
Q

Name five object recognition methods mentioned in the summary (slide 78).

A

Template matching, sliding window, edge matching, model-based, intensity histograms.

77
Q

What are the two main categories of matching procedures for object recognition (slide 79)?

A

Top-down (generative) and Bottom-up (discriminative).

78
Q

Give an example of a top-down object recognition method (slide 79).

A

Model-based.

79
Q

Give an example of a bottom-up object recognition method (slide 79).

A

SIFT.

80
Q

List three categories for classifying object recognition methods based on representation used (slide 80).

A

Pixel intensities vs feature vectors vs geometry, 2D (image-based) vs 3D (object-based), local features vs global features.

81
Q

What are the advantages of using a local representation (slide 81)?

A

Tolerant to viewpoint change, within-class variation, and occlusion.

82
Q

What is a problem with using local representations (slide 81)?

A

Many objects can consist of the same collection of features and hence cannot be distinguished.

83
Q

What is an advantage of using a global representation (slide 82)?

A

Can distinguish similar objects.

84
Q

What are two problems with using global representations (slide 82)?

A

Sensitive to viewpoint and within-class variation, and sensitive to occlusion.

85
Q

What are two solutions to address the limitations of local and global representations (slide 83)?

A

Use features of intermediate complexity, and use a hierarchy of features with a range of complexities.