Objects--Slides Flashcards
The eye encounters an enormous variety of complex visual images. Yet the brain somehow manages to correctly interpret almost every visual image it receives. It is able to correctly identify objects, materials, and surface shapes, as well as shadows and other lighting effects. It does this effortlessly, even for scenes never encountered before.

Correctly interpreting visual images involves dealing with one or more of these difficult general problems (plus fundamental biological constraints).

Context Problem
Underlying our remarkable ability to interpret images is a set of sophisticated grouping mechanisms that try to link together image features that arise from the same physical source. To understand why grouping mechanisms are essential, it is useful to consider the context problem the visual brain must overcome.
Objects often appear in a complex and varying context of other objects.
What is the solution to context problems?
First, measure attributes of the images in small regions.
Second, combine the small regions into wholes using rules that are related to the physical laws and statistical facts of nature and to past experiences.

Erasing the bottom half of the abstract object reveals the recognizable object. Why are we unable to recognize the square root of 16? The context of other contours causes us to group the contour elements in a way that prevents the square root of 16 from making contact with our stored memory for that object. The implication is that proper grouping is essential for object recognition.


Any way of marking word boundaries helps the recognition process.

Screech Owls
Screech owls have feathers that match the features (bark) of oak trees (the trees they frequently inhabit). If their features cannot be grouped separately from the background, they are more difficult to recognize.

Other examples of camouflage.
Proximity
Gestalt grouping principle.
Objects that are nearby tend to be grouped together.
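The proximity principle can be illustrated with a minimal sketch. The function name `group_by_proximity` and the `max_dist` threshold are assumptions made for illustration; the cards do not specify how the brain measures "nearby."

```python
import math

def group_by_proximity(points, max_dist):
    """Link any two points within max_dist of each other; return the connected groups."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the representative of i's group.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= max_dist:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return sorted(groups.values())

# Hypothetical dot pattern: three dots close together, two far away.
dots = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
print(group_by_proximity(dots, max_dist=1.5))
```

With this threshold the five dots fall into two groups. Shrinking `max_dist` below the smallest spacing leaves every dot in its own group.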
Similarity
Gestalt grouping principle.
Objects that are similar tend to be grouped together.
Good Continuation
Gestalt grouping principle.
Contour elements that are consistent with a smooth contour tend to be grouped together.
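One way to make good continuation concrete is a simple pairwise test on contour elements. This is a sketch under stated assumptions: each element is a hypothetical `(x, y, orientation)` triple, and the `max_dist` and `max_bend` thresholds are invented for illustration rather than taken from measured data.

```python
import math

def orientation_diff(a, b):
    """Smallest angle between two orientations (orientations repeat every pi)."""
    d = abs(a - b) % math.pi
    return min(d, math.pi - d)

def continues_smoothly(e1, e2, max_dist=3.0, max_bend=math.radians(30)):
    """True if two elements are close and both roughly aligned with the line joining them."""
    (x1, y1, t1), (x2, y2, t2) = e1, e2
    if math.hypot(x2 - x1, y2 - y1) > max_dist:
        return False
    link = math.atan2(y2 - y1, x2 - x1)  # direction from e1 to e2
    return (orientation_diff(t1, link) <= max_bend
            and orientation_diff(t2, link) <= max_bend)

a = (0.0, 0.0, 0.0)                 # horizontal element at the origin
b = (2.0, 0.2, math.radians(10))    # nearby, nearly collinear with a
c = (2.0, 0.0, math.radians(90))    # nearby, but perpendicular to a
print(continues_smoothly(a, b))  # True
print(continues_smoothly(a, c))  # False
```

Elements `a` and `b` would be grouped into one contour, while `c` would not, even though it is just as close to `a`.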
Closure
Gestalt grouping principle.
Contours that are consistent with a closed form tend to be grouped together.

Gestalt grouping principles: “proximity” (position) and various possible dimensions of similarity.

The Gestalt principle of “good continuation”: contour elements that are consistent with a smooth curve tend to be grouped together.
Good Continuation…


Because of good continuation, the two straight line segments in A tend to look like a pair of crossing “sticks.” Because of closure, the same two line segments tend to be split at the middle to become parts of two “butterfly wings.”
Example of using simple feature detection followed by Gestalt grouping rules to form groups in a novel texture pattern.
Left to right: input image -> perform feature/primitive detection like V1 -> use good continuation, proximity, etc. to form small groups -> use shape similarity (comparing or matching shapes) to form larger groups from small groups.


Perceptual grouping also makes use of principles that are based upon the three-dimensional properties of the environment. For example, these line segments are grouped into two boxes and a cylinder. Object corners occluding a background object tend to form an “L junction” or an “arrow junction.” Object corners that do not occlude a background object tend to form a “Y junction.” Occluded contours of an object tend to form “T junctions” with the contours of the occluding object. These principles, plus the Gestalt principles, are used to group features into wholes that are likely to correspond to physically separate objects.

The result of appropriate (typical) grouping.

A result of inappropriate grouping.
Why do we have these specific grouping principles?
How does the brain implement these principles?
Are there other grouping principles the brain uses?

To measure statistics of natural contours, we analyzed 20 representative natural images (close-ups, distant shots, forests, mountains, ocean, sky, water, fields, animals). Each image was analyzed separately and the statistics were combined.

The zoomed-in patch is in the middle column. Each arrow at the bottom shows a contour element detected using synthetic neurons with receptive fields like those in primary visual cortex (V1). From all the arrows in all the images one can measure the statistical properties of contours in the environment. From these statistical properties we can predict how people should group contour elements detected in the primary visual cortex. That is, we can figure out the proper good continuation principle and then test whether people use it.

For example, these statistical measurements tell us that two of these elements are likely to belong to the same contour, but the third is not. If human contour grouping mechanisms accurately incorporate these statistical facts of the world, then we should be able to predict human ability to detect the contours from the statistics of natural images.

An experiment to measure human ability to detect contours where the only information available in the stimulus for performing the task is the geometry of the edge elements.

Examples of the different dimensions of contour shapes that were tested in the experiment in the previous slide. In each case, the contour was embedded in a dense background of random contour elements.

The connected contours on the right are the contour groups obtained using the statistics in natural images. Notice how they roughly match the contours you see on the left side. In the forced-choice experiment, we predict that humans pick the interval with the longest group of edge elements.
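The prediction rule for the forced-choice task can be written down directly. In this sketch the grouping step is assumed to have already run, so each interval is just a hypothetical list of groups of element labels; the names `longest_group` and `predicted_choice` are illustrative.

```python
def longest_group(groups):
    """Size of the largest group of edge elements in one interval."""
    return max((len(g) for g in groups), default=0)

def predicted_choice(interval_groups):
    """Index of the interval that contains the longest group."""
    return max(range(len(interval_groups)),
               key=lambda i: longest_group(interval_groups[i]))

# Hypothetical grouping results for the two intervals of one trial.
first = [["e1", "e2"], ["e3"], ["e4", "e5"]]
second = [["e1"], ["e2", "e3", "e4", "e5", "e6"], ["e7"]]
print(predicted_choice([first, second]))  # 1 (the second interval)
```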

There is a high correlation between human ability to detect contours and the model (hypothesis) based directly upon the statistics of contour geometry in natural images. The neural circuits in the brain that perform contour grouping are unknown at this time (although there are some hints from neurophysiological data).
Why do we have these specific grouping principles?
They are rational principles based upon the statistical properties of the natural environment.
How does the brain implement these principles?
It is unknown at this time, but mechanisms consistent with natural scene statistics are plausible.
Are there other grouping principles the brain uses?
It is very likely, and the analysis of natural scene statistics will help us find them.

A room full of chairs. The figure illustrates the viewpoint and category complexity problems (as well as the context problem). There are examples of the same chair seen from many different viewpoints. There are also differently shaped objects that are all chairs.
Template matching
Store a separate template for each possible viewpoint of each object within a category. To recognize, find the template that best matches the input group of image features.
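A minimal sketch of template matching, assuming image features have already been reduced to short numeric vectors. The feature values, the labels, and the sum-of-squared-differences measure are all illustrative choices, not something the cards specify.

```python
def mismatch(a, b):
    """Sum of squared differences between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def recognize(features, templates):
    """templates: list of (label, feature_vector), one entry per stored viewpoint."""
    label, _ = min(templates, key=lambda t: mismatch(features, t[1]))
    return label

# Hypothetical store: every viewpoint of every object needs its own entry.
templates = [
    ("chair", [1, 0, 1, 1]),   # chair, front view
    ("chair", [1, 1, 0, 1]),   # chair, side view
    ("table", [0, 1, 1, 0]),   # table, front view
]
print(recognize([1, 1, 0, 0], templates))  # chair
```

The store grows with every new viewpoint and every new example, which is the weakness of this scheme.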
Structural description
Store a set of shape components and their geometrical relationships for each category of objects. To recognize, find the components that roughly describe the input group of image features, then find the category of objects whose stored components match those input components.
View-based recognition (template interpolation)
Store a modest number of templates for each object category. These templates correspond to typical (canonical) viewpoints and typical examples of each object category. To recognize, interpolate between templates within each category to find the category that best matches the input group of features.
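A structural description can be sketched as a set of parts plus a set of relations between parts. The part names, the relation triples, and the overlap score below are assumptions for illustration only.

```python
def describe(components, relations):
    """A structural description: a set of parts plus a set of part relations."""
    return frozenset(components) | frozenset(relations)

# Hypothetical stored descriptions: same parts, different arrangement.
STORED = {
    "mug": describe({"cylinder", "curved handle"},
                    {("curved handle", "side-of", "cylinder")}),
    "pail": describe({"cylinder", "curved handle"},
                     {("curved handle", "top-of", "cylinder")}),
}

def recognize(description):
    """Return the category whose stored description overlaps most with the input."""
    def overlap(category):
        stored = STORED[category]
        return len(stored & description) / len(stored | description)
    return max(STORED, key=overlap)

seen = describe({"cylinder", "curved handle"},
                {("curved handle", "top-of", "cylinder")})
print(recognize(seen))  # pail
```

Because only parts and their relations are stored, one description per category is enough as long as the parts can be identified in the image.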
Template Matching

Simple template matching is not plausible because of the number of stored templates that would be required. Think of all the templates that would be required just for the letter A.

Example of structural description theory: Biederman’s “recognition by components” model. The geons are simple 3D components; shown here are 5 of the 36 proposed by Biederman. Many common objects can be approximately described with this small collection of geons. In other words, geons are a proposed alphabet for objects. The particular geons Biederman proposes were selected because they can generally be identified independent of viewpoint and because a geon can still be identified if some part of it is obscured.

Recognition by components (in its simplest form) is viewpoint invariant as long as all geons (components) are visible, but for humans viewpoint often matters, at least to some extent.


The two faces at the top are stored views of the same face. The faces on the bottom are examples of the many views that can be simulated by weighted combination of the two stored faces. In other words, from two stored templates a very large number of templates can be generated and then compared with an input stimulus.
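The weighted combination of two stored views can be sketched as follows. The short feature vectors stand in for face images, and the search over evenly spaced weights is an assumption made for illustration, not a claim about how the brain does it.

```python
def blend(view_a, view_b, w):
    """Weighted combination: w=0 gives view_a, w=1 gives view_b."""
    return [(1 - w) * a + w * b for a, b in zip(view_a, view_b)]

def best_blend(view_a, view_b, stimulus, steps=20):
    """Try evenly spaced weights; return (weight, mismatch) of the closest blend."""
    def mismatch(w):
        simulated = blend(view_a, view_b, w)
        return sum((s - t) ** 2 for s, t in zip(stimulus, simulated))
    weights = [i / steps for i in range(steps + 1)]
    best = min(weights, key=mismatch)
    return best, mismatch(best)

# Hypothetical stored views and an input that lies between them.
left = [0.0, 2.0, 4.0]
right = [4.0, 2.0, 0.0]
stimulus = [1.0, 2.0, 3.0]
print(best_blend(left, right, stimulus))
```

Only two templates are stored, yet the input is matched by a simulated in-between view with weight 0.25.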

There is evidence that the main neural pathway for object recognition is the ventral (temporal) pathway, also known as the “what pathway.”

The inferotemporal cortex (IT) has been implicated in complex object perception, including face perception. It consists of AIT, CIT, and PIT (anterior, central, and posterior IT).

Responses of “face cells” in monkey inferotemporal cortex (IT). They constitute only a small proportion of the cells. There are other cells that seem to respond to particular types of objects.

Fusiform face area (FFA) identified in fMRI studies of humans. Notice that it is located in the inferotemporal cortex (IT), as it is in the monkey.

Artificial objects (greebles)

Gauthier, Tarr, and colleagues showed that as humans become expert in identifying greebles, the face area in IT becomes more active. This suggests that there may be special circuits in the area for recognizing highly familiar and complex categories of objects.