Lecture 13 Flashcards
Gestalt approach to object perception
The whole differs from the sum of its parts.
Principles of perceptual organization
The whole differs from the sum of its parts.
looking at a limited set: what’s most likely, given what I’m likely to see
– Perception is not built up from sensations, but is a result of perceptual organization. [strong top-down influence]
– The mind (somehow) makes simple assumptions about objects in order to recognize them in the environment. (the Gestalt theorists didn’t know anything about the neuroscience involved)
Principles of perceptual organization
8 principles for organizing objects within perceptual scenes have been offered:
1) Prägnanz (good figure/simplicity),
2) similarity,
3) good continuation,
4) proximity (nearness),
5) common region,
6) uniform connectedness,
7) common fate,
8) meaningfulness.
Prägnanz
the simplest interpretation: cognitively keeping your resources as limited as possible in trying to solve a problem: NOT overthinking
when you see a scene, assume it’s the simplest possible version of that scene
ex: the Olympic rings (seen as five overlapping circles, not as a pile of odd-shaped fragments)
(good figure/simplicity)
• Every stimulus is seen as simply as possible
• The easiest interpretation takes fewer cognitive resources.
wouldn’t try to break it up into all the possible objects it could be
similarity
try to group objects by some feature that they have in common
• Similar things are grouped together
• Color is one measure of similarity, but it could be shape, texture, orientation, etc. = any feature that is easy to find
good continuation
when you’re looking at lines within a scene, you’re going to try and keep things as smooth as possible
our visual system makes the assumption that things continue in nice, smooth curves, rather than breaking off and continuing in a different direction
• Connected points resulting in straight or smooth curves belong together.
– Lines are seen as following the smoothest path
– Holds true even for complex images
common region
you’re going to group objects if they’ve been somehow defined graphically as belonging to one another
- elements in the same defined region tend to be grouped together: within a border
proximity
we tend to group things that are a little closer together than other things
- things that are near to each other are grouped together
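A rough way to picture the proximity rule is as gap-based grouping. This is only a sketch under made-up assumptions: the dots sit on a line, and `max_gap` is a hypothetical threshold for "near enough", not a value from the lecture or from any model of the visual system.

```python
def group_by_proximity(positions, max_gap):
    """Group 1-D positions: a gap larger than max_gap starts a new group."""
    groups = []
    for x in sorted(positions):
        if groups and x - groups[-1][-1] <= max_gap:
            groups[-1].append(x)   # close to the previous dot: same group
        else:
            groups.append([x])     # big gap (or first dot): new group
    return groups


dots = [0, 1, 2, 6, 7, 8, 12, 13]
print(group_by_proximity(dots, max_gap=1.5))
# -> [[0, 1, 2], [6, 7, 8], [12, 13]]
```

Changing `max_gap` changes the grouping, which fits the idea that these principles are heuristics rather than fixed rules.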
uniform connectedness
connected regions of visual stimuli are perceived as a single unit
meaningfulness (familiarity)
looking at a complex scene, you’re going to intuitively form groups out of stimuli that seem to go together and form something that means something to you
Stimuli form groups if they appear familiar or meaningful
hidden faces: impose meaningful patterns
common fate
as things move together we’re going to see them as an object
Things moving in the same direction are grouped together
do any principles override others?
some are really important (Prägnanz) but they’re really heuristics - they get you to a good answer quickly by narrowing down the possibilities
we kinda use them all together
Gestalt theorists were also interested in figure-ground segregation
how you pick out objects from a complex visual scene
what is the figure within a scene and what is the background: what rules can you apply to figure that out
Properties of figure and ground:
– The figure is more “thinglike” and more memorable than ground.
– The figure is seen in front of the ground.
– The ground is more uniform (e.g. one color or texture) and extends behind the figure.
– The contour separating figure from ground belongs to the figure (border ownership).
Factors that determine which area is the figure:
– Elements located in the lower part of displays (bias toward seeing things on the bottom as the figure)
– Units that are symmetrical
– Elements that are small tend to be more figure-like (slight bias)
– Units that are oriented vertically (slight bias)
– Elements that have meaning
Gestalt principles operate as….
….as heuristics (probabilistic) that give us the means to quickly organize stimuli in the environment.
trying to describe how the visual system will quickly organize things
aren’t an algorithm, because the same input can give you a different output, BUT they are trying to operationalize steps
- For our purposes, they operate at Marr’s 2nd (algorithmic) level of analysis, as they give us steps followed in the black box to yield a perceptual result.
Gestalt principles give us methods by which the environment can be organized, but don’t get far in solving the problems of identifying occluded objects or seeing objects from different viewpoints.
Recognition-by-components (RBC) theory tries to go further in addressing these issues.
Under the Recognition-by-components (RBC) theory
trying to reduce some of the error: the varying responses you get just by applying heuristics
impose some organization on the environment by looking for specific features in the environment
objects are recognized by volumetric features called geons
– Theory proposes there are 36 geons that combine to make all 3-D objects. (basic visual building blocks)
– Geons include cylinders, rectangular solids, pyramids, etc.
basic toolkit for building up the objects we expect to see: imagine the visual system has 36 types of Legos, and anything you see gets built up from those Legos
Properties of geons – how they function
still Marr’s second level: it’s a model
– View-invariant properties
– Non-accidental properties
– Discriminability
– Principle of componential recovery
View-invariant properties -
ex: when you look at a rectangular solid from most viewpoints you can still see its edges
- aspects of the object that remain visible from (most) different viewpoints.
Non-accidental properties -
you don’t have to recognize it from one particular view (an accidental view would be looking at a cylinder from the bottom and only seeing a circle)
- properties of edges in the retinal image that correspond with the 3-D environment.
Discriminability -
shouldn’t be able to mistake one geon for another geon
- the ability to distinguish geons from one another.
Principle of componential recovery -
to ID an object you should be able to pick out the component geons that make up that object
- the ability to recognize an object if we can identify its geons - overcomes the problem of occlusion
- bridge the gap between top-down and bottom-up processing?
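Componential recovery can be pictured as matching whatever geons are visible against stored geon inventories. This is a toy sketch, not RBC as actually specified: the `OBJECTS` table and its geon names are invented for illustration, and it ignores the spatial relations between geons.

```python
# Hypothetical geon inventories, made up for illustration only.
OBJECTS = {
    "mug": {"cylinder", "curved handle"},
    "bucket": {"tapered cylinder", "curved handle"},
    "flashlight": {"cylinder", "cone"},
}


def candidates(visible_geons, objects=OBJECTS):
    """Return names of objects whose geon set contains every visible geon."""
    visible = set(visible_geons)
    return sorted(name for name, geons in objects.items() if visible <= geons)


# Heavily occluded: only one geon is visible, so several objects still fit.
print(candidates({"cylinder"}))                   # -> ['flashlight', 'mug']
# A second geon becomes visible and narrows it down.
print(candidates({"cylinder", "curved handle"}))  # -> ['mug']
```

The point of the sketch is only that a partly hidden object can still be identified as long as enough of its geons can be picked out.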
a scene contains
- background elements.
– objects organized in meaningful ways with each other and the background. (kind of organization that you’re used to seeing)
Difference between objects and scenes:
– A scene is acted within (setting in which you use tools)
– An object is acted upon (tools, things you use)
Research on perceiving gists of scenes
- The gist is a quick understanding and recognition of major elements in a complex picture.
– Mary Potter (1976) showed that people can do this very accurately when a picture is presented for only 250 ms. (important because our eyes saccade around a visual scene about 3 times a second, always looking for novelty, so this behavioral data matches up with what we know about the visual system)
– Li Fei-Fei (2007) extended this research to demonstrate the range of information that becomes available with more viewing time, using exposures from 27 to 500 ms.
– The point is that our visual system needs time to construct complex images, but can do surprisingly well with a brief glimpse.
- the eye is trained to pick up that info very quickly
– But how?
Mary Potter (1976)
verbal description (“there’s a girl clapping”) or an actual picture of the action
then 16 other pictures in a row
was there a girl clapping? answer YES if the target image was among those presented
showed that people can do this very accurately when a picture is presented for only 250 ms. (important because our eyes saccade around a visual scene about 3 times a second, always looking for novelty, so this behavioral data matches up with what we know about the visual system)
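The saccade comparison is just arithmetic, sketched here with the two numbers from these notes (about 3 saccades a second, 250 ms exposure). The function name is hypothetical, and the result is an average, not a measured fixation duration.

```python
def fixation_window_ms(saccades_per_second):
    """Average time between saccades, in milliseconds."""
    return 1000 / saccades_per_second


window = fixation_window_ms(3)   # about 333 ms, using the lecture's figure
potter_exposure_ms = 250         # exposure reported for Potter (1976)

# The 250 ms exposure fits inside one average fixation window.
print(round(window), potter_exposure_ms <= window)
# -> 333 True
```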
Li Fei-Fei (2007)
what’s the critical time period?
looked at time windows
extended Potter’s research to demonstrate the range of information that becomes available with more viewing time, using exposures from 27 to 500 ms.
masked stimulus: present a stimulus and then a blank screen to take away the memory of that image
27 ms is faster than the time it takes for the signal to travel from the retina to V1 (35-40 ms)
by the time you get to 500 ms you can recreate a scene: put a narrative to the scene