Object Recognition Flashcards
definition of object recognition
the process of matching a representation of a viewed object to a representation stored in memory
not trivial because:
- objects vary hugely in the image they project onto the retina, due to changes in viewpoint, posture and illumination
- object constancy
recognition by template matching
- perceived stimuli are compared to templates stored in memory
- if they match, the stimulus is recognised
problem:
would need a template for each variation in stimulus
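The template problem can be made concrete with a toy sketch. Everything here is a hypothetical illustration, not a model from the literature: the stimuli are tiny binary grids, `TEMPLATES`, `recognise` and `rotate_cw` are made-up names, and "matching" is assumed to mean exact equality.

```python
from typing import Dict, Optional, Tuple

Grid = Tuple[str, ...]  # each string is one row; '#' = filled, '.' = empty

# Hypothetical template store: one template per letter, upright only.
TEMPLATES: Dict[str, Grid] = {
    "T": ("###",
          ".#.",
          ".#."),
    "L": ("#..",
          "#..",
          "###"),
}

def recognise(stimulus: Grid, templates: Dict[str, Grid] = TEMPLATES) -> Optional[str]:
    """Return the label of the template that matches exactly, else None."""
    for label, template in templates.items():
        if stimulus == template:
            return label
    return None

def rotate_cw(grid: Grid) -> Grid:
    """Rotate a grid 90 degrees clockwise."""
    return tuple("".join(row[i] for row in reversed(grid))
                 for i in range(len(grid[0])))

upright_t = TEMPLATES["T"]
print(recognise(upright_t))             # matches the stored template
print(recognise(rotate_cw(upright_t)))  # same letter, no matching template
```

The rotated T is not recognised unless a second template for that orientation is added, which is the scaling problem the card describes.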
computational approach (david marr)
treats recognition as a problem of information processing: how is a shape represented in a way that is useful for recognition of 3D objects?
attempts to define: 1) the problem 2) steps needed to solve it
3 steps of computational approach
- primal sketch
- primitive but rich description of the way in which light information changes across visual field
- captures the important underlying structure of image, visible edges and vertices
- not every luminance change represents an edge, not all edges associated with luminance change
- 2.5D sketch
- recovery of surfaces, depth information
- from the point of view of observer
- doesn’t generalise to other viewpoints (viewer-centred)
- 3D model representation
- recovery of volumetric parts that make up the object
- conceptualised as generalised cones, hierarchically organised along principal axes of the object
- independent of pov of observer (object-centred)
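As a rough illustration of the primal sketch step, the snippet below marks candidate edges along a single row of luminance values. This is a simplification assumed for the example: Marr and Hildreth's own proposal used zero-crossings of a smoothed second derivative, whereas this sketch only thresholds the difference between neighbouring values. The function name, the threshold and the sample values are all made up.

```python
def luminance_edges(profile, threshold=20):
    """Return indices i where the luminance step between i and i+1 exceeds threshold."""
    return [i for i in range(len(profile) - 1)
            if abs(profile[i + 1] - profile[i]) > threshold]

# Hypothetical luminance values along one image row:
# a dark region, a bright region, then a mid-grey region, with small fluctuations.
scanline = [10, 12, 11, 80, 82, 81, 79, 30, 29]
print(luminance_edges(scanline))  # [2, 6]
```

The small fluctuations inside each region are ignored, which reflects the point that not every luminance change represents an edge. The sketch cannot find an edge that produces no luminance change at all.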
evaluation of computational approach
- highly influential: the first attempt to lay down, in a formal way, the steps necessary for recognition
- didn’t attempt to work out how these steps would be implemented in a neural architecture: computational rather than algorithmic level
- difficult to find evidence that 3D models of the type described by Marr really exist
recognition by components (biederman)
all objects made up of volumetric components (geons)
36 geons are sufficient to describe all objects
geon structural descriptions: objects are represented by specific geons and the relations between them
edge extraction –> parsing regions of concavity + detection of non-accidental properties –> determination of components –> matching components to object representations
parsing: observers are sensitive to regions of concavity and spend more time looking at these regions
non-accidental properties: tolerant to changes in viewpoint, occlusion, noise
- curvature, parallel edges, co-termination of edges, symmetry, collinearity
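A geon structural description can be sketched as a set of (geon, relation, geon) triples. The example uses Biederman's familiar contrast between a mug and a bucket, which share the same two geons but differ in how they are attached. The data layout, the relation labels and the `match` function are assumptions made for illustration, not part of the theory's formal specification.

```python
from typing import Dict, FrozenSet, Optional, Tuple

Relation = Tuple[str, str, str]      # (geon, categorical relation, geon)
Description = FrozenSet[Relation]

# Hypothetical stored object representations.
STORED: Dict[str, Description] = {
    "mug": frozenset({("curved_handle", "side_attached_to", "cylinder")}),
    "bucket": frozenset({("curved_handle", "top_attached_to", "cylinder")}),
}

def match(description: Description,
          stored: Dict[str, Description] = STORED) -> Optional[str]:
    """Return the stored object whose geons and relations match, else None."""
    for name, candidate in stored.items():
        if candidate == description:
            return name
    return None

seen = frozenset({("curved_handle", "top_attached_to", "cylinder")})
print(match(seen))  # bucket
```

The relations are categorical labels with no metric values, in line with the idea that the relation between parts matters more than exact distances.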
biederman & cooper study
priming effect for the complementary image (same object, different edges and vertices); only a non-visual priming effect for a different exemplar
The priming effect cannot be based on low level image features (specific edges and vertices), as these differed in the complementary images.
–> Has to be object components, or full object.
The priming effect is mediated by object components.
–> Less priming when the complement image shows different components.
evaluation of biederman’s theory
The theory is specified at the algorithmic level, i.e., how the recognition steps could be implemented in a neural architecture.
There is evidence that people are sensitive to geon decomposition, i.e., objects are recognised from their constituent parts.
Categorical relations between parts are important, but the exact metric relation between them is not so critical for object recognition (but it is for face recognition).
issue of viewpoint
both marr and biederman predict viewpoint-invariant object recognition
marr: recognition relies on 3D model, object-centred
biederman: recovery of geons can be achieved from many different viewpoints due to their non-accidental properties
View-based theories of object recognition
objects recognised on the basis of stored views that we have encountered through experience
new views have to be matched to stored views
requires some form of transformation to align and match views: mental rotation, back projection
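The matching step can be sketched by reducing each view to a single orientation angle. That reduction is an assumption made for the example, and the function names and stored angles are hypothetical. A new view is matched to the closest stored view, and the remaining angular difference is the transformation that still has to be carried out.

```python
def angular_distance(a, b):
    """Smallest rotation in degrees (0 to 180) between orientations a and b."""
    diff = abs(a - b) % 360
    return min(diff, 360 - diff)

def nearest_stored_view(new_view, stored_views):
    """Return (closest stored view, rotation needed to align with it)."""
    best = min(stored_views, key=lambda v: angular_distance(new_view, v))
    return best, angular_distance(new_view, best)

stored = [0, 90, 180]  # hypothetical views encountered through experience
print(nearest_stored_view(120, stored))  # (90, 30)
print(nearest_stored_view(350, stored))  # (0, 10)
```

Adding more stored views shrinks the rotation needed for any new view.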
mental rotation
the act of imagining something turning around through space
- example of mental imagery
- ability to actively manipulate imagery
usually measured by asking people to discriminate between objects and their mirror-images presented in different orientations
shepard & metzler: mental rotation mimics real rotation through space
people take progressively longer to do the task as the objects are rotated
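The "progressively longer" pattern is usually summarised as a roughly linear relation between response time and angular disparity. A minimal sketch follows; the function name, the intercept and the rotation rate are illustrative assumptions, not values taken from these notes.

```python
def predicted_rt(angle_deg, intercept_s=1.0, rate_deg_per_s=60.0):
    """Predicted response time in seconds under a linear mental rotation model.

    intercept_s: assumed time for encoding, comparison and response.
    rate_deg_per_s: assumed constant speed of mental rotation.
    """
    return intercept_s + angle_deg / rate_deg_per_s

for angle in (0, 60, 120, 180):
    print(angle, predicted_rt(angle))
```

With these assumed parameters, each additional 60 degrees of rotation adds one second to the predicted response time.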
tarr & pinker (1989) study
recognition takes longer as the angular distance from the upright orientation increases
RT decreases with practice, but increases again with surprise items (objects shown in unpractised orientations)
jolicoeur & milliken studies
Observers take longer to name rotated objects.
• RT is (somewhat) proportional to degree of misorientation
• The effects of viewpoint decrease with familiarity with the rotated views
multiple-views-plus-transformation
Through experience, we build up a network of views of an object
Eventually this could lead to completely invariant object recognition
Nevertheless, according to view-based theories, the representations used for recognition are viewer-centred and dependent on image features.
Object recognition is not achieved through mental rotation
1) mental rotation vs object recognition task
Viewpoint effects were associated with increased activity in the superior parietal lobe during mental rotation, but in the middle and inferior temporal lobe during object recognition
2) patient with damage to the right basal ganglia who
– was impaired on a variety of tasks that required mental rotation
– but had intact recognition of rotated objects and silhouettes