w9 gemini Flashcards

1
Q

What is the topic of today’s lecture, according to slide 1?

A

Object recognition

2
Q

What are the three levels of computer vision discussed in the recap on slide 1?

A

Low-level vision, Mid-level vision, High-level vision

3
Q

Name four topics covered under Mid-level vision in the recap (slide 1).

A

Segmentation and grouping, Correspondence problem, Stereo and Depth, Video and Motion

4
Q

What are the three main aspects of object recognition, as defined on slide 2?

A

Identification, Categorisation, Localisation

5
Q

Give three examples of methods for performing object recognition (slide 2).

A

Template matching, Sliding window, Edge matching

6
Q

What is the goal of object identification, according to slide 3?

A

To determine the identity of an individual instance of an object.

7
Q

Give an example of object identification from slide 3.

A

Distinguishing between two specific individuals (Clinton vs. Bush) or two specific phone models (Samsung Galaxy On8 vs. iPhone 7 Plus).

8
Q

What is the goal of object categorisation, as described on slide 4?

A

To determine the category of an object.

9
Q

Provide an example of object categorisation from slide 4.

A

Classifying images as belonging to the category ‘Human’ or ‘Chimpanzee’, or ‘Telephone’ or ‘Calculator’.

10
Q

What is object localisation, according to slide 5?

A

Determining the presence and/or location of an object in an image.

11
Q

What is semantic segmentation, as defined on slide 6?

A

Localisation that is sufficiently fine-grained, over a sufficiently large number of categories, that it amounts to a segmentation of the image.

12
Q

Explain the concept of a category hierarchy in object recognition (slide 7).

A

Classification can occur at different levels of abstraction, from general categories (like ‘object’) to specific instances (like ‘Rex’).

13
Q

What are the three levels in the category hierarchy shown on slide 7?

A

Superordinate level, Basic level, Subordinate level

14
Q

Why is the ‘basic level’ significant for human object recognition (slide 8)?

A

Humans are usually fastest at recognising category members at this level; they start with basic-level categorisation before identification; and it is the first level understood by children.

15
Q

List two reasons why the basic level is considered special (slide 9).

A

It’s the highest level where category members share many common features, and the lowest level where members have features distinct from other categories at the same level.

16
Q

What are the two main requirements for object recognition systems (slide 10)?

A

Sensitivity to image differences relevant to distinguishing objects, and insensitivity/tolerance to differences that don’t affect object identity or category.

17
Q

Give examples of image variations that object recognition systems should be insensitive to (slides 11-13).

A

Background clutter, occlusion, viewpoint, lighting, non-rigid deformations, within-category variation.

18
Q

What are the three main components required for object recognition (slide 14)?

A

Image data, representations of objects, matching techniques.

19
Q

Describe the ‘off-line’ stage of object recognition procedure (slide 15).

A

Extracting representations from training examples.

20
Q

Describe the ‘on-line’ stage of object recognition procedure (slide 15).

A

Extracting a representation from an input image and matching it with training examples to determine the object class or identity.

21
Q

What are the two key aspects in which object recognition methods vary (slide 16)?

A

Representation used and matching procedure.

22
Q

What is the representation used in template matching (slide 17)?

A

An image of the object to be recognised (an array of pixel intensities).

23
Q

Describe the matching process in template matching (slide 17).

A

Searching every image region and calculating the similarity between the template and the image region.

24
Q

Name three similarity measures that can be maximised in template matching (slide 18).

A

Cross-correlation, Normalised cross-correlation (NCC), Correlation coefficient.

25
Q

Name three similarity measures that can be minimised in template matching (slide 19).

A

Sum of Squared Differences (SSD), Euclidean distance, Sum of Absolute Differences (SAD).

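The similarity measures on the last few cards are easy to make concrete. Below is a minimal NumPy sketch (not from the slides) that scores every window origin with the correlation coefficient (the zero-mean, normalised form, to be maximised) and the SAD (to be minimised), assuming single-channel float arrays:

```python
import numpy as np

def match_template(image, template):
    """Score every window of `image` against `template`.

    Returns two maps, one value per window origin: the correlation
    coefficient (maximise) and the sum of absolute differences (minimise).
    """
    H, W = image.shape
    h, w = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    cc = np.zeros((H - h + 1, W - w + 1))
    sad = np.zeros_like(cc)
    for y in range(cc.shape[0]):
        for x in range(cc.shape[1]):
            patch = image[y:y + h, x:x + w]
            p = patch - patch.mean()
            denom = np.sqrt((p ** 2).sum()) * t_norm + 1e-9
            cc[y, x] = (p * t).sum() / denom       # zero-mean NCC
            sad[y, x] = np.abs(patch - template).sum()
    return cc, sad
```
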
26
Q

How can template matching be used to recognise multiple objects (slide 20)?

A

By using multiple templates, one for each object.

27
Q

What is a potential problem with using SAD in template matching, as shown in the example on slide 21?

A

It can produce strong responses in regions that are simply darker overall, rather than at a true match.

28
Q

What is a potential problem with template matching regarding ‘true’ and ‘false’ matches (slide 23)?

A

Distinguishing true matches from false matches, and deciding what constitutes a match and how many peaks to consider.

29
Q

Why can template matching be ineffective if the target object is scaled or rotated (slide 25)?

A

Because the template needs to be very similar to the target object, and scaling or rotation introduces differences.

30
Q

What is a common approach to address viewpoint and within-category variation in template matching (slide 26)?

A

Using multiple templates for each object, representing different viewpoints and variations.

31
Q

Explain the dilemma regarding the threshold in template matching (slide 27).

A

A high threshold avoids false matches but might miss true matches, while a low threshold finds true matches but increases false matches.

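The threshold dilemma shows up directly when a score map is turned into detections. A hypothetical helper, reusing the NumPy import and the `cc` map from the earlier sketch:

```python
def detections(score_map, threshold):
    """Window origins whose score exceeds the threshold.

    Raising `threshold` suppresses false matches but risks missing true
    ones; lowering it recovers true matches at the cost of false alarms.
    """
    ys, xs = np.where(score_map > threshold)
    return list(zip(xs.tolist(), ys.tolist()))
```
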
32
Q

Why can template matching be computationally expensive (slide 28)?

A

Because of the need for many comparisons, especially when dealing with variations in appearance, viewpoints, and scales.

33
Q

Why is template matching sensitive to occlusion (slide 29)?

A

Because if an object is occluded, the template may not fully match the visible parts.

34
Q

What is the fundamental issue that makes template matching not robust (slide 30)?

A

The metric used for comparison is fundamentally not robust to changes in appearance between the template and the image patch.

35
Q

How does ‘sliding window’ differ from template matching (slide 31)?

A

It uses a classifier to determine if an image patch contains the object, and considers patches of different shapes and sizes.
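
As a rough sketch of the difference: the sweep over positions stays, but the per-patch test becomes a learned classifier. Here `classifier(patch) -> score` is an assumed callable, not anything from the slides:

```python
def sliding_window_detect(image, classifier, window_sizes, stride=8):
    """Run a classifier over windows of several shapes and sizes."""
    H, W = image.shape[:2]
    hits = []
    for h, w in window_sizes:
        for y in range(0, H - h + 1, stride):
            for x in range(0, W - w + 1, stride):
                score = classifier(image[y:y + h, x:x + w])
                if score > 0:  # acceptance rule depends on the classifier
                    hits.append((x, y, w, h, score))
    return hits
```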

36
Q

How is computational cost addressed in sliding window techniques (slide 32)?

A

By pre-processing images to select regions that are good candidates to contain an object, often using image segmentation.

37
Q

How does ‘edge matching’ represent objects (slide 33)?

A

Templates and input images are pre-processed to extract edges.

38
Q

Describe the matching process in edge matching (slide 33).

A

Calculating the average of the minimum distances between points on the edge template and points on the edge image.
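
A minimal sketch of that score, assuming both edge maps have already been reduced to lists of (x, y) points. This is the directed version; the slides may also average in the reverse direction:

```python
import numpy as np

def edge_match_score(template_pts, image_pts):
    """Average, over template edge points, of the distance to the
    nearest image edge point (smaller means a better match)."""
    image_pts = np.asarray(image_pts, dtype=float)
    total = 0.0
    for p in np.asarray(template_pts, dtype=float):
        total += np.sqrt(((image_pts - p) ** 2).sum(axis=1)).min()
    return total / len(template_pts)
```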

39
Q

What is the general idea behind ‘model-based’ object recognition (slide 34)?

A

Hypothesize object identity and pose, render the object in the image (‘back-project’), and compare it to the actual image.

40
Q

What are two methods for comparing the back-projected model with the image in model-based recognition (slide 36)?

A

Edge score and Oriented edge score.

41
Q

What is the representation used in ‘intensity histograms’ for object recognition (slide 37)?

A

A histogram of pixel intensity values (either grayscale or colour).

42
Q

What is a benefit of using intensity histograms (slide 37)?

A

Insensitivity to small viewpoint changes.

43
Q

What are two drawbacks of using intensity histograms (slide 37)?

A

Sensitivity to illumination and intra-class appearance variation, and insensitivity to different spatial configurations.
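
A sketch of the representation and one simple way to compare two of them, assuming 8-bit grayscale input; the bin count and the L1 distance are illustrative choices:

```python
import numpy as np

def intensity_histogram(gray, bins=32):
    hist, _ = np.histogram(gray, bins=bins, range=(0, 256))
    return hist / (hist.sum() + 1e-9)   # normalise away image size

def histogram_distance(h1, h2):
    return np.abs(h1 - h2).sum()        # L1 distance; one of many options
```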

44
Q

Why are colour histograms sometimes used in face detection (slide 39)?

A

Because skin has a very small range of intensity-independent colours.
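
One way to make the intensity independence concrete: divide out brightness to get (r, g) chromaticities and test whether they fall inside the narrow skin range. The box bounds below are purely illustrative; a real detector would fit them from labelled skin pixels:

```python
import numpy as np

def skin_mask(rgb):
    """Per-pixel skin test on chromaticities, which discard intensity."""
    s = rgb.astype(float).sum(axis=2) + 1e-9
    r = rgb[..., 0] / s
    g = rgb[..., 1] / s
    # Illustrative bounds only -- not from the slides.
    return (r > 0.35) & (r < 0.55) & (g > 0.25) & (g < 0.37)
```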

45
Q

What are the two components of the ‘Implicit Shape Model’ (ISM) representation (slide 40)?

A

Parts (2D image fragments) and structure (configuration of parts).

46
Q

Describe the process of extracting local object features in ISM (slide 41).

A

Locating interest points and extracting 2D image patches around them.

47
Q

What is an ‘appearance codebook’ in ISM (slide 42)?

A

A collection of clustered patches, where the cluster centres are stored.

48
Q

How is the configuration of parts learned in ISM (slide 43)?

A

By matching codebook features to training images and recording possible object centres for every codebook entry.

49
Q

Explain the matching procedure in ISM (slide 44).

A

Each feature votes for possible object centres, similar to the Generalized Hough Transform.
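
A simplified sketch of that voting step. `features` is assumed to hold the matched (position, codebook index) pairs from the input image, and `offsets[k]` the centre displacements recorded for codebook entry k during training; real ISM also weights its votes:

```python
import numpy as np

def ism_vote(features, offsets, accumulator_shape):
    """Accumulate votes for object centres; peaks suggest detections."""
    acc = np.zeros(accumulator_shape)
    for (x, y), k in features:
        for dx, dy in offsets[k]:
            cx, cy = int(round(x + dx)), int(round(y + dy))
            if 0 <= cy < acc.shape[0] and 0 <= cx < acc.shape[1]:
                acc[cy, cx] += 1.0
    return acc
```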

50
Q

What is the representation used in ‘feature-based object recognition’ (slide 52)?

A

Training image content is transformed into local features invariant to translation, rotation, and scale.

51
Q

How does ‘feature-based object recognition’ perform matching (slide 53)?

A

Local features are extracted from a new image and matched to those from the training image.

52
Q

What are two requirements for feature detection in feature-based recognition (slide 55)?

A

Repeatability despite image variations, and detecting enough features to cover the object.

53
Q

What property should the feature description have in feature-based recognition (slide 55)?

A

Invariance to translation, rotation, scale changes, and lighting variations.

54
Q

What is the representation used in SIFT feature matching (slide 56)?

A

A 128-element histogram of the orientations of the intensity gradients.

55
Q

List four advantages of SIFT features (slide 56).

A

Locality, Distinctiveness, Quantity, Efficiency.

56
Q

Describe the representation used in SIFT feature matching in more detail (slide 57).

A

A set of keypoints with 128-component descriptors obtained from each training image, stored in a database.

57
Q

Describe the matching process in SIFT feature matching (slide 58).

A

Finding the two best-matching descriptors in the training database for each keypoint in the input image.

58
Q

What is the criteria for accepting a match in SIFT feature matching (slide 58)?

A

The ratio of the distance to the nearest descriptor to the distance to the second nearest must be below a threshold.
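
A sketch of that acceptance rule (Lowe's ratio test) over raw descriptor arrays. The 0.8 value is the threshold from Lowe's paper and may differ from the slides:

```python
import numpy as np

def ratio_test_matches(query_desc, train_desc, ratio=0.8):
    """Keep a match only when the nearest training descriptor is clearly
    closer than the second nearest."""
    matches = []
    for i, d in enumerate(query_desc):
        dists = np.linalg.norm(train_desc - d, axis=1)
        j1, j2 = np.argsort(dists)[:2]
        if dists[j1] < ratio * dists[j2]:
            matches.append((i, int(j1)))
    return matches
```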

59
Q

What is used to confirm the consistency of matched locations in SIFT feature matching (slide 61)?

A

RANSAC or Generalised Hough Transform.

60
Q

What is the analogy used to explain ‘Bag-of-words’ (slide 62)?

A

Document recognition.

61
Q

Briefly describe how ‘Bag-of-words’ works for documents (slide 63).

A

Documents are parsed into words, common words are ignored, words are stemmed, assigned identifiers, and documents are represented by word frequency vectors.

62
Q

How is matching done in the ‘Bag-of-words’ model for documents (slide 64)?

A

By calculating the angle between the query and document vectors.
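
Measuring that angle reduces to cosine similarity between the frequency vectors; a minimal sketch:

```python
import numpy as np

def cosine_similarity(query_vec, doc_vec):
    """cos(angle) between word-frequency vectors; 1.0 means same direction."""
    return float(np.dot(query_vec, doc_vec) /
                 (np.linalg.norm(query_vec) * np.linalg.norm(doc_vec) + 1e-9))
```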

63
Q

Explain the core idea behind ‘Bag-of-words’ for images (slide 65).

A

Different objects have distinct sets of features that occur in different frequencies.

64
Q

Name three ways features can be chosen in the ‘Bag-of-words’ model for images (slide 66).

A

Regular grid, interest point detector, random sampling.

65
Q

How are features encoded in the ‘Bag-of-words’ model for images (slide 67)?

A

Using a descriptor (analogous to a word).

66
Q

Describe the process of creating a dictionary in the ‘Bag-of-words’ model (slides 68-69).

A

Encoding many features from many images and clustering them into visual words, forming a visual vocabulary.
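
A sketch of the dictionary step using k-means, a common clustering choice (the slides may use another). `all_descriptors` is assumed to be an (N, D) array pooled from many training images:

```python
from sklearn.cluster import KMeans

def build_vocabulary(all_descriptors, n_words=500):
    """Cluster pooled descriptors; the cluster centres are the visual words."""
    km = KMeans(n_clusters=n_words, n_init=10).fit(all_descriptors)
    return km.cluster_centers_
```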

67
Q

How are images represented in the ‘Bag-of-words’ model (slide 71)?

A

As a histogram showing the frequency of appearance of each codeword in the dictionary.

68
Q

How is matching performed in the ‘Bag-of-words’ model for images (slide 71)?

A

By calculating the distance between the histograms of the input and training images.
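
Putting the last two cards together: quantise an image's descriptors against the vocabulary, build the codeword histogram, and compare histograms (the L1 distance here is an illustrative choice):

```python
import numpy as np

def bow_histogram(descriptors, vocabulary):
    """Frequency of each visual word among the image's descriptors."""
    hist = np.zeros(len(vocabulary))
    for d in descriptors:
        k = int(np.argmin(np.linalg.norm(vocabulary - d, axis=1)))
        hist[k] += 1
    return hist / (hist.sum() + 1e-9)

def bow_distance(h1, h2):
    return float(np.abs(h1 - h2).sum())
```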

69
Q

What is a ‘geometric invariant’ (slide 72)?

A

A property of an object in the scene that does not vary with viewpoint.

70
Q

Give three examples of invariant properties under Euclidean transformations (slide 72).

A

Lengths, angles, areas.

71
Q

Give two examples of invariant properties under Similarity transformations (slide 73).

A

Ratios of lengths, angles.

72
Q

Give three examples of invariant properties under Affine transformations (slide 74).

A

Parallelism, ratios of lengths along lines, ratio of areas.

73
Q

What is the invariant property under Projective transformations (slide 75)?

A

Cross-ratio.

74
Q

Define ‘cross-ratio’ (slide 75).

A

The ratio of ratios of lengths on a line.

75
Q

What is the matching process in ‘geometric invariants’ (slide 77)?

A

Comparing the value of the cross-ratio measured in the image with a database of cross-ratios measured in training images.
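
A worked sketch for four collinear points given by 1-D positions along their line; note that several equivalent conventions for the cross-ratio exist, and this is one of them:

```python
def cross_ratio(a, b, c, d):
    """(AC/BC) / (AD/BD) for collinear points A, B, C, D.

    Invariant under projective transformations, so the value measured in
    an image can be compared against cross-ratios from training images.
    """
    return ((c - a) / (c - b)) / ((d - a) / (d - b))

# Example: equally spaced points 0, 1, 2, 3 give a cross-ratio of 4/3.
assert abs(cross_ratio(0.0, 1.0, 2.0, 3.0) - 4.0 / 3.0) < 1e-12
```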

76
Q

Name five object recognition methods mentioned in the summary (slide 78).

A

Template matching, sliding window, edge matching, model-based, intensity histograms.

77
Q

What are the two main categories of matching procedures for object recognition (slide 79)?

A

Top-down (generative) and Bottom-up (discriminative).

78
Q

Give an example of a top-down object recognition method (slide 79).

A

Model-based.

79
Q

Give an example of a bottom-up object recognition method (slide 79).

A

SIFT.

80
Q

List three categories for classifying object recognition methods based on representation used (slide 80).

A

Pixel intensities vs feature vectors vs geometry, 2D (image-based) vs 3D (object-based), local features vs global features.

81
Q

What are the advantages of using a local representation (slide 81)?

A

Tolerant to viewpoint change, within-class variation, and occlusion.

82
Q

What is a problem with using local representations (slide 81)?

A

Many objects can consist of the same collection of features and hence cannot be distinguished.

83
Q

What is an advantage of using a global representation (slide 82)?

A

Can distinguish similar objects.

84
Q

What are two problems with using global representations (slide 82)?

A

Sensitive to viewpoint and within-class variation, and sensitive to occlusion.

85
Q

What are two solutions to address the limitations of local and global representations (slide 83)?

A

Use features of intermediate complexity, and use a hierarchy of features with a range of complexities.