w10 gemini Flashcards

1
Q

What is the difference between the “viewer-centred” and “object-centred” approaches to object recognition?

A

In the viewer-centred approach, the 3D object is modelled as a set of 2D images, showing different views of the object. In the object-centred approach, a single 3D model is used to describe the object.

2
Q

What is a geon?

A

Geons are simple three-dimensional shapes such as spheres, cubes, cylinders, cones, or wedges.

3
Q

What is a structural description?

A

A structural description is a representation of an object in terms of its component geons and their relative locations and sizes.

4
Q

Explain the nearest mean classifier method.

A

The nearest mean classifier calculates the mean of the feature vectors for all the training examples in each class (the prototype). For a new object, it finds the closest class prototype (using Euclidean distance) and assigns the new object to that class label.
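
A minimal NumPy sketch of this procedure (function and variable names are illustrative, not from the course material):

```python
import numpy as np

def nearest_mean_classify(train_X, train_y, x):
    """Assign x to the class whose mean feature vector (prototype) is nearest."""
    labels = np.unique(train_y)
    # One prototype per class: the mean of that class's training feature vectors.
    prototypes = np.array([train_X[train_y == c].mean(axis=0) for c in labels])
    # Euclidean distance from the new object x to each prototype.
    dists = np.linalg.norm(prototypes - x, axis=1)
    return labels[np.argmin(dists)]
```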

5
Q

Explain the nearest neighbour classifier method.

A

The nearest neighbour classifier saves the feature vectors for all the training examples. For a new object, it finds the closest training example (using Euclidean distance) and assigns the new object to the same class label as that closest example.

6
Q

Explain the k-nearest neighbour classifier method with k=3.

A

The k-nearest neighbour classifier with k=3 saves the feature vectors for all the training examples. For a new object, it finds the 3 closest training examples (using Euclidean distance). The new object is assigned to the class label that is the majority among these 3 nearest neighbours.
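
A minimal sketch of k-nearest neighbours (names illustrative, assuming NumPy-array inputs); with k=1 it reduces to the plain nearest neighbour classifier of the previous card:

```python
import numpy as np
from collections import Counter

def knn_classify(train_X, train_y, x, k=3):
    """Assign x to the majority class among its k nearest training examples."""
    # Euclidean distance from x to every stored training example.
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(dists)[:k]        # indices of the k closest examples
    votes = Counter(train_y[nearest])      # tally the class labels among them
    return votes.most_common(1)[0][0]      # return the majority label
```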

7
Q

Write down Bayes’ theorem.

A

p(H|E) = (p(E|H) * p(H)) / p(E)

8
Q

Explain the interpretation of p(H|E) in Bayes’ theorem in relation to a computer vision system.

A

p(H|E) is the posterior probability that a hypothesis H (e.g., the object is a chair) is true, given the image evidence E. This is what the vision system needs to evaluate to determine the most likely explanation for the image data.

9
Q

Explain the interpretation of p(E|H) in Bayes’ theorem in relation to a computer vision system.

A

p(E|H) is the likelihood that if hypothesis H were true, the image would contain particular evidence E. This is based on our understanding of how images are formed, such as how surface properties and lighting create certain images.

10
Q

Explain the interpretation of p(H) in Bayes’ theorem in relation to a computer vision system.

A

p(H) is the prior probability, representing our initial assumptions about the likelihood of a hypothesis being true before seeing any evidence. If a hypothesis is initially improbable, stronger evidence is needed to support it.

11
Q

Explain the interpretation of p(E) in Bayes’ theorem in relation to a computer vision system.

A

p(E) is the probability of observing the evidence E regardless of whether the hypothesis H is true. If the evidence is very common, it reduces our confidence in inferring a specific hypothesis based on that evidence.

12
Q

In the production line problem, what is the probability of objA?

A

p(objA) = 0.75

13
Q

In the production line problem, what is the probability of objB?

A

p(objB) = 0.25

14
Q

In the production line problem, what is the probability of an indistinguishable image given objA at oriA?

A

p(I|objA) = 0.1

15
Q

In the production line problem, what is the probability of an indistinguishable image given objB at oriB?

A

p(I|objB) = 0.2

16
Q

Using Bayes’ theorem, if an indistinguishable image is observed, what is the probability it is objA at oriA?

A

p(objA|I) = (p(I|objA) * p(objA)) / p(I) = k * (0.1 * 0.75) = 0.075k, where k = 1/p(I)

17
Q

Using Bayes’ theorem, if an indistinguishable image is observed, what is the probability it is objB at oriB?

A

p(objB|I) = (p(I|objB) * p(objB)) / p(I) = k * (0.2 * 0.25) = 0.05k, where k = 1/p(I)

18
Q

In the production line problem, if an indistinguishable image is observed, which bin should the robot sort the object into to minimize errors?

A

The robot should sort the object into the bin for objA because p(objA|I) > p(objB|I), meaning it’s more likely to be objA.
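
A worked check of the arithmetic behind this decision (the normalising constant k is 1/p(I), with p(I) obtained from the law of total probability):

```python
# Production line posteriors, written out in full.
p_A, p_B = 0.75, 0.25        # priors for objA and objB
p_I_A, p_I_B = 0.1, 0.2      # likelihoods of an indistinguishable image

p_I = p_I_A * p_A + p_I_B * p_B    # evidence: 0.075 + 0.05 = 0.125
post_A = p_I_A * p_A / p_I         # 0.075 / 0.125 = 0.6
post_B = p_I_B * p_B / p_I         # 0.050 / 0.125 = 0.4

print(post_A, post_B)  # 0.6 > 0.4, so sort into the objA bin
```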

19
Q

What is the viewer-centred approach to object recognition?

A

The 3D object is modelled as a set of 2D images, showing different views of the object. Recognition occurs by matching the current view to a stored view.

20
Q

What is the object-centred approach to object recognition?

A

A single 3D model is used to describe the object. Recognition involves decomposing the viewed object into its components (geons) and matching their arrangement to stored models.

21
Q

What are geons designed to be?

A

Geons are designed to be sufficiently different from each other to be easily discriminated, robust to noise (identifiable even with missing parts), and view-invariant (look similar from most viewpoints).

22
Q

What are the problems with the object-based recognition by components theory?

A

It can be difficult to decompose an image into geons, it’s difficult to represent many natural objects using geons, and it cannot detect finer details necessary for identifying individuals or similar objects.

23
Q

What is the image-based approach to object recognition?

A

Each object is represented by storing multiple 2D views (images). Object recognition occurs when a current pattern matches a stored pattern.

24
Q

What is template matching in the context of image-based recognition?

A

An early form of the image-based approach, in which the current view is directly compared to stored templates. It’s considered too rigid to account for the flexibility of human object recognition.

25
Q

What is the multiple views approach in the context of image-based recognition?

A

A more recent image-based approach where multiple views of objects are stored. Recognition occurs by matching the current view, and interpolation between stored views allows recognition from novel viewpoints.

26
Q

What are configural theories of object recognition?

A

Configural theories emphasize the global shape and relationships between features in object recognition.

27
Q

What are featural theories of object recognition?

A

Featural theories emphasize the individual features of an object in recognition.

28
Q

How do inverted faces relate to configural and featural processing?

A

Inverted faces are processed featurally, meaning individual features are processed independently, and the relationships between them are ignored. Upright faces are processed configurally, or holistically.

29
Q

What are the three main types of theories for how we categorize objects?

A

Rules, prototypes, and exemplars.

30
Q

How does the ‘rules’ theory explain object categorization?

A

Category membership is defined by abstract rules. Anything that satisfies the rule(s) for the category belongs to that category.

31
Q

What are some arguments for and against the ‘rules’ theory?

A

For: explains over-extension of grammar rules. Against: some members are better examples of a category (graded membership).

32
Q

How does the ‘prototypes’ theory explain object categorization?

A

We calculate the average (or prototype) of all individual instances from each category. A new stimulus is compared to these stored prototypes and assigned to the category of the nearest one.

33
Q

What are some arguments for and against the ‘prototypes’ theory?

A

For: explains why prototypical category members are accessed more quickly. Against: variations within a class cannot be represented.

34
Q

How does the ‘exemplars’ theory explain object categorization?

A

Specific individual instances of each category (‘exemplars’) are stored in memory. A new stimulus is compared to these stored exemplars and assigned to the category of the nearest one.

35
Q

What are some arguments for and against the ‘exemplars’ theory?

A

For: explains some kinds of mis-categorizations. Against: struggles with the concept of graded membership (some members being better examples).

36
Q

What is supervised learning in the context of classification?

A

Learning where the class for each data point in the training set is known. New data points are assigned to appropriate classes based on similarity to these labelled training examples.

37
Q

What is unsupervised learning in the context of classification?

A

Learning where the class for each data point is unknown. All data points are assigned to appropriate classes based on similarity without pre-existing labels.

38
Q

What is the nearest mean classifier (prototype) and when is it suitable?

A

It calculates the mean feature vector for each class and assigns new data to the class with the nearest mean. It’s suitable only if the data is linearly separable.

39
Q

What is the nearest neighbour classifier (exemplar) and when is it suitable?

A

It assigns new data to the class of the nearest training example. It’s suitable if the data is non-linearly separable.

40
Q

What is the k-nearest neighbours classifier (exemplar) and when is it suitable?

A

It assigns new data to the class that is the majority among its k nearest training examples. It’s suitable if the data is non-linearly separable.

41
Q

What are some common similarity measures used in classification?

A

Sum of Squared Differences (SSD), Euclidean distance, Sum of Absolute Differences (SAD, i.e. Manhattan distance), cross-correlation, normalised cross-correlation, and the correlation coefficient.
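
Minimal sketches of these measures for two equal-length feature vectors (a NumPy illustration, assuming 1D vector inputs):

```python
import numpy as np

def ssd(a, b):                 # Sum of Squared Differences
    return np.sum((a - b) ** 2)

def euclidean(a, b):           # Euclidean distance (square root of SSD)
    return np.sqrt(ssd(a, b))

def sad(a, b):                 # Sum of Absolute Differences (Manhattan distance)
    return np.sum(np.abs(a - b))

def cross_correlation(a, b):   # raw cross-correlation
    return np.sum(a * b)

def normalised_cross_correlation(a, b):
    return np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b))

def correlation_coefficient(a, b):
    return np.corrcoef(a, b)[0, 1]
```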

42
Q

Describe the ‘What’ and ‘Where’ pathways in the cortical visual system.

A

The ‘What’ pathway (ventral stream) goes from V1 to the inferotemporal cortex and is involved in object identity and category information. The ‘Where’ pathway (dorsal stream) goes from V1 to the parietal cortex and is involved in spatial and motion information.

43
Q

How are receptive fields organized hierarchically in the visual cortex?

A

As you progress along a pathway, neurons’ preferred stimuli get more complex, receptive fields become larger, and there is greater invariance to location.

44
Q

Give examples of receptive fields at different stages (Eye to V1).

A

Eye -> LGN: Centre-surround Cells. LGN -> V1: Simple Cells (respond to edges/bars at a specific orientation and location). Simple Cells -> Complex Cells (respond to edges/bars of a specific orientation within a small region).

45
Q

What is the trend of receptive fields along the ventral pathway?

A

Receptive fields become larger, have higher complexity, and higher invariance to location.

46
Q

What are feedforward models of cortical hierarchy?

A

Models that propose a purely serial, feedforward sequence of cortical information processing, such as HMAX and CNNs (convolutional neural networks).

47
Q

Explain the HMAX model.

A

HMAX is a feedforward model that uses alternating layers of simple (S) and complex (C) cells to increase selectivity and invariance of receptive fields. S-cells respond to conjunctions, and C-cells respond to any input in a small neighbourhood.

48
Q

How do S-cells and C-cells operate in HMAX?

A

S-cells perform a summation (‘and’-like operation) increasing selectivity. C-cells perform a max operation (‘or’-like operation) increasing invariance.
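
A toy illustration of the two operations (hypothetical weights and responses, not the full HMAX model):

```python
import numpy as np

def s_cell(inputs, template):
    """'And'-like weighted sum: responds strongly only when the input
    matches the cell's template of conjoined features (selectivity)."""
    return float(np.dot(inputs, template))

def c_cell(s_responses):
    """'Or'-like max: responds if any afferent S-cell in its small
    neighbourhood responds (invariance to position and scale)."""
    return float(np.max(s_responses))
```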

49
Q

How can HMAX be seen as hierarchical template matching?

A

Features at one stage are built from features at earlier stages, forming increasingly complex templates.

50
Q

What is a Convolutional Neural Network (CNN)?

A

A hierarchical model similar to HMAX that uses standard image processing techniques: convolution and sub-sampling.

51
Q

How do convolution and sub-sampling relate to HMAX?

A

Convolution in CNNs is equivalent to the function of S-layers in HMAX (responding to conjunctions). Sub-sampling in CNNs is equivalent to the function of C-layers in HMAX (responding to any input in a small neighbourhood).
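
A minimal sketch of the two CNN operations on a 2D image using SciPy (kernel and pool size are illustrative assumptions):

```python
import numpy as np
from scipy.signal import correlate2d

def conv_layer(image, kernel):
    """Convolution stage: slides a template over the image,
    analogous to an HMAX S-layer (conjunction/selectivity)."""
    return correlate2d(image, kernel, mode='valid')

def subsample_layer(fmap, pool=2):
    """Sub-sampling stage: max over each pool x pool block,
    analogous to an HMAX C-layer (invariance)."""
    h = fmap.shape[0] - fmap.shape[0] % pool   # crop to a multiple of pool
    w = fmap.shape[1] - fmap.shape[1] % pool
    blocks = fmap[:h, :w].reshape(h // pool, pool, w // pool, pool)
    return blocks.max(axis=(1, 3))
```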

52
Q

What are recurrent models of cortical hierarchy?

A

Models that incorporate feedback connections and lateral connections within cortical regions, allowing for interaction and combination of bottom-up and top-down information.

53
Q

What are the two types of recurrent connections in the cortex?

A

(1) Lateral connections within a region enabling interaction between neurons in the same population. (2) Feedback connections conveying information from higher cortical regions to primary sensory areas.

54
Q

What are bottom-up processes in perception?

A

Using the information in the stimulus itself to aid in identification. They are stimulus-driven and discriminative.

55
Q

What are top-down processes in perception?

A

Using context, previous knowledge, and expectation to aid in identification. They are knowledge-driven and generative.

56
Q

How does Bayesian inference relate to bottom-up and top-down information?

A

Bayes’ Theorem describes an optimum method of combining bottom-up (likelihood) and top-down (prior) information to form a posterior probability.

57
Q

Explain Bayes’ Theorem using the terms posterior, likelihood, prior, and evidence.

A

Posterior, p(object|image): what we want to know, the probability that a particular object is present given the image.
Likelihood, p(image|object): what we can calculate, the probability that the particular image is a projection of the particular object.
Prior, p(object): what we know from prior experience, the probability that the particular object will be present in the environment.
Evidence, p(image): the probability of observing the image; it can often be ignored because it is the same for all possible interpretations.

58
Q

Why is vision considered an inverse problem?

A

Because we know the pixel intensities (outcomes) and want to infer the causes (objects in the scene).

59
Q

Why is vision considered ill-posed?

A

Because there are usually multiple solutions (multiple causes that could give rise to the same outcomes).

60
Q

How does the visual system compensate for the ill-posed nature of vision?

A

By using assumptions, constraints, or priors about the nature of the physical world.

61
Q

Give examples of priors used in Bayesian inference in vision.

A

Texture is circular and homogeneous, light comes from above, faces are convex, size is constant, neighbouring features are related, similar features are related, connected features are related, strings of letters form words, knowledge about image content.

62
Q

What is the difference between discriminative and generative methods in the context of Bayesian inference?

A

Discriminative methods model the posterior probability directly. Generative methods model the likelihood and the prior probability.