Mid-level Vision Flashcards
Mid-level Vision
Organizes the elements of a visual scene into groups that we can recognize as objects. How do we know which edges go with which object?
Kanizsa figure
Illustrates illusory contours, i.e., borders that are perceived even though nothing changes from one side to the other in an image
Visual system makes inferences about contours
Based on our best guess about what is happening in the world (experiences)
The Kanizsa figure may be explained by occlusion, i.e., when one object blocks another (e.g., the white arrow is on top of the vertical lines and circles)
early theory of structuralism and what breaks this theory
Early psychologists (Wundt, Titchener) thought that perceptions could be understood by analyzing their components (e.g., as the sum of colour, orientation)
(What we see is the sum of its parts; we see what is there.)
According to structuralism, if you break down all the basic components (e.g., color, shape, orientation), you can fully explain what we perceive.
This approach assumes we only perceive what is physically present in the stimulus.
Illusory contours break these rules
Illusory contours are shapes or edges we see, even though no actual line or boundary is present.
Example: The Kanizsa triangle, where we perceive a white triangle on top, even though it’s not really drawn.
🧠 This challenges structuralism because:
There’s no physical edge, color change, or explicit feature at the contour
Yet, we still perceive a coherent shape
Gestalt school of thought
the perceptual whole can be greater than the apparent sum of its parts
- Created a set of organizing principles (Gestalt grouping rules) that describe how the visual system interprets the raw image (i.e., which elements of an image will appear to be grouped together)
- Gestalt rules are useful because they represent regularities in our world (they are based on our experience and can explain some illusions)
Gestalt Grouping Rules: Texture segmentation
carving an image into regions of common texture properties
Gestalt Grouping Rules: Similarity
image chunks that are similar are more likely to be grouped together
Gestalt Grouping Rules: Good continuation
a rule stating that two elements will tend to group together if they seem to lie on the same contour. Elements that are collinear (lying on the same straight line) are most likely on the same contour. Ex: crossing birds’ beaks
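The collinearity idea behind good continuation can be sketched as a simple geometric test. This is a toy illustration under stated assumptions, not a model from the source: line elements are hypothetical pairs of 2-D endpoints listed in order along the candidate contour, and the function names and the 10° tolerance are made up for the example.

```python
import math

def direction(p, q):
    """Angle (radians) of the line running from point p to point q."""
    return math.atan2(q[1] - p[1], q[0] - p[0])

def same_orientation(angle1, angle2, tolerance_deg):
    """True if two orientations differ by at most tolerance_deg (direction of travel ignored)."""
    diff = abs(angle1 - angle2) % math.pi
    diff = min(diff, math.pi - diff)
    return math.degrees(diff) <= tolerance_deg

def likely_same_contour(seg_a, seg_b, tolerance_deg=10.0):
    """Hypothetical good-continuation test for two line elements.

    Assumes seg_a ends where the contour would continue into seg_b. The elements
    group if both of them, and the gap joining them, run in nearly the same
    direction, i.e., they are (almost) collinear."""
    (a0, a1), (b0, b1) = seg_a, seg_b
    ref = direction(a0, a1)
    angles = [direction(b0, b1)]
    if math.dist(a1, b0) > 1e-9:        # only test the gap if the elements don't touch
        angles.append(direction(a1, b0))
    return all(same_orientation(ref, ang, tolerance_deg) for ang in angles)
```

For example, `likely_same_contour(((0, 0), (1, 0)), ((2, 0), (3, 0)))` is `True`, while two parallel but vertically offset elements fail, because the gap between them does not continue the line.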
Gestalt Grouping Rules: Proximity
items near each other are more likely to group together (e.g., an array of stars is seen as rows rather than columns when the stars are closer together within rows)
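The rows-of-stars example can be sketched with a simple distance rule. This uses single-linkage grouping (a swapped-in clustering technique, not a claim about how the brain implements proximity), and the `max_gap` threshold and grid spacing are hypothetical values chosen for illustration.

```python
import math

def group_by_proximity(points, max_gap):
    """Single-linkage grouping: points within max_gap of each other (directly or
    through a chain of near neighbours) end up in the same group."""
    groups = []
    unassigned = list(points)
    while unassigned:
        frontier = [unassigned.pop()]
        group = []
        while frontier:
            p = frontier.pop()
            group.append(p)
            near = [q for q in unassigned if math.dist(p, q) <= max_gap]
            for q in near:
                unassigned.remove(q)
            frontier.extend(near)
        groups.append(sorted(group))
    return sorted(groups)

# Hypothetical 3 x 4 array of "stars": 1 unit apart within a row, 2 units between rows
stars = [(x, 2 * y) for y in range(3) for x in range(4)]
rows = group_by_proximity(stars, max_gap=1.5)
```

With this spacing the stars fall into three groups of four, one per row. Raising `max_gap` above the row spacing merges everything into a single group, which mirrors how the grouping depends on relative rather than absolute distance.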
Gestalt Grouping Rules: Parallelism and symmetry
a rule for figure-ground assignment stating that parallel or symmetrical contours are likely to belong to the same object
Camouflage
The same Gestalt principles that help us find objects can be exploited to hide them
an object’s features group with the environment so as to persuade the observer that the object doesn’t form a perceptual group of its own
Ex: stick bug (similarity - shape + colour is identical to tree), tiger (parallelism: stripes mimic grass)
How do we separate figure from ground (figure-ground assignment)? 5 principles
Process is governed by a collection of principles acting together:
- Surroundedness: if one region is entirely surrounded by another, it is likely that the surrounded region (e.g., the red part is surrounded by yellow) is the figure
- Size: the smaller region is likely to be the figure
- Symmetry: a symmetrical region is more likely to be seen as a figure
- Parallelism: regions with parallel contours are more likely to be seen as figure
- Relative motion: how surface details move relative to an edge can help determine which portion is figure and which is ground
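Because these principles act together, one way to picture figure-ground assignment is as a vote among cues. The sketch below is a toy under stated assumptions: each cue casts one equal-weight vote, the region attributes (`area`, `surrounded_by_other`, `symmetric`, `parallel_contours`) are hypothetical, and relative motion is left out for simplicity.

```python
def figure_votes(region, other):
    """Count how many figure-ground cues favour `region` over `other`."""
    votes = 0
    if region["surrounded_by_other"]:
        votes += 1                      # surroundedness
    if region["area"] < other["area"]:
        votes += 1                      # size: the smaller region tends to be figure
    if region["symmetric"] and not other["symmetric"]:
        votes += 1                      # symmetry
    if region["parallel_contours"] and not other["parallel_contours"]:
        votes += 1                      # parallelism
    return votes

def assign_figure(a, b):
    """Return the name of the region with more cue support, or None if the cues tie."""
    va, vb = figure_votes(a, b), figure_votes(b, a)
    if va == vb:
        return None                     # ambiguous: no cue breaks the tie
    return a["name"] if va > vb else b["name"]

# Hypothetical regions: a small, symmetric red patch surrounded by a yellow field
red = {"name": "red", "area": 20, "surrounded_by_other": True,
       "symmetric": True, "parallel_contours": False}
yellow = {"name": "yellow", "area": 80, "surrounded_by_other": False,
          "symmetric": False, "parallel_contours": False}
```

Here `assign_figure(red, yellow)` returns `"red"`. Equal weights are the simplest choice; a real system would presumably weight cues differently and could return a graded rather than all-or-none answer.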
Summarizing Mid-Level Vision
- Bring together that which should be brought together. We have the Gestalt grouping principles (similarity, proximity, parallelism, symmetry, and so forth), and we have the processes that complete contours and objects even when they are partially hidden behind occluders (e.g., good continuation).
- Split asunder that which should be split asunder. Complementing the grouping principles are the edge-finding processes that divide regions from one another. Figure-ground mechanisms separate objects from the background. Texture segmentation processes divide one region from the next on the basis of image statistics. Use our world knowledge to separate objects from the background.
- Use what you know. Two-dimensional edge configurations are taken to indicate three-dimensional corners or occlusion borders, and objects are divided into parts on the basis of an implicit knowledge of the physics of image formation. Use physics knowledge (we know how things can and cannot work physically).
- Avoid accidents. Avoid interpretations that require assumptions of highly specific, accidental combinations of features or accidental viewpoints. Avoid misinterpretations, e.g., the accidental-viewpoint illusion of a woman who appears to be holding up the Leaning Tower of Pisa.
- Seek consensus and avoid ambiguity. Every image is ambiguous. There are always multiple, even infinite, physical situations that could generate a given image. Using the first four principles, the “committees” of mid-level vision must eliminate all but one of the possibilities, thereby resolving the ambiguity and delivering a single solution to the perceptual problem at hand. Use committees of Gestalt principles to find the interpretation with the most support and eliminate the wrong ones.
Bayesian approach
formal way to model how knowledge about our world can be used to make inferences about what we see (how likely is a hypothesis given what we see)
Bayes’ theorem is a mathematical model that enables us to calculate the probability (P) that the world is in a particular state (A), given a particular observation (O)
P(A|O) = P(A) x P(O|A)/P(O)
Bayesian approach asks us to think about 2 factors:
1. How likely is what you’re proposing (i.e., prior probability)?
2. How consistent is each hypothesis with the observation (i.e., likelihood)?
Ex: P(rain | storm clouds) = P(rain) x P(storm clouds | rain) / P(storm clouds)
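The rain example can be worked through numerically. This is a minimal sketch: the probabilities are made-up illustrative values, and P(O) is computed with the law of total probability, assuming only two hypotheses (rain vs. no rain).

```python
def posterior(prior_a, p_obs_given_a, p_obs_given_not_a):
    """Bayes' theorem: P(A|O) = P(A) * P(O|A) / P(O),
    where P(O) = P(A) * P(O|A) + P(not A) * P(O|not A)."""
    p_obs = prior_a * p_obs_given_a + (1 - prior_a) * p_obs_given_not_a
    return prior_a * p_obs_given_a / p_obs

# Hypothetical numbers: P(rain) = 0.2, P(clouds | rain) = 0.9, P(clouds | no rain) = 0.3
p_rain_given_clouds = posterior(0.2, 0.9, 0.3)   # 0.18 / 0.42 = 3/7, about 0.43
```

Both factors matter: with the same likelihoods, lowering the prior P(rain) to 0.01 drops the posterior to about 0.03, because clouds are then much more often seen without rain.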
Object Recognition (after perception): pathway
There is a progressive change in the responses of cells along the what pathway (Cells in V2 are sensitive to “border ownership” and illusory contours. By V4, cells are interested in more complex attributes.)
As you move forward along the ventral stream:
✅ V1: Basic features
Detects edges, orientation, color, and simple shapes
Small receptive fields
✅ V2: Intermediate features
Responds to border ownership → which side of a border “belongs” to an object
Detects illusory contours (edges that don’t actually exist but are perceived)
✅ V4: Complex features
Sensitive to curves, shapes, patterns, and color constancy
Likely contributes to recognizing object components
✅ IT cortex: Full object recognition
Neurons respond to entire objects, often invariant to size, position, and viewpoint
E.g., some IT neurons respond specifically to faces or hands
Kobatake et al. (1994) Findings
Recorded neural activity in monkeys across visual areas
Showed random arrays of stimuli
Found that:
Early areas (like V1) responded to simple features
V4 neurons responded selectively to more complex combinations of features
This suggests V4 is an intermediate step toward full object representation
(There is a progressive change along the what pathway such that cells in more anterior locations are sensitive to more complex stimuli. The stimuli that activate V4 cells may form the basis of object perception.)
Subtraction method
Researchers show participants visual stimuli (like faces, scenes, or objects) while recording brain activity using fMRI.
They use the subtraction method:
Present two types of stimuli:
-One that includes the mental process of interest (e.g., recognizing a face)
-One that does not (e.g., a scrambled face image)
Subtract brain activity in the control condition from the experimental condition.
✅ The difference reveals brain areas specifically involved in that mental process (e.g., object or face recognition).
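The core arithmetic of the subtraction method can be sketched on a handful of voxels. The response values and the threshold below are hypothetical; real fMRI analyses average many trials and use statistical tests rather than a fixed cutoff.

```python
def subtraction_map(experimental, control):
    """Voxel-by-voxel difference: activity(experimental) - activity(control)."""
    if len(experimental) != len(control):
        raise ValueError("both conditions must cover the same voxels")
    return [e - c for e, c in zip(experimental, control)]

def active_voxels(diff, threshold):
    """Indices of voxels whose difference exceeds the (hypothetical) threshold."""
    return [i for i, d in enumerate(diff) if d > threshold]

# Hypothetical mean responses of 5 voxels
faces     = [1.0, 1.1, 2.9, 0.8, 3.2]   # intact faces: includes face recognition
scrambled = [1.0, 1.2, 1.0, 0.9, 1.1]   # scrambled faces: same low-level features
diff = subtraction_map(faces, scrambled)
face_selective = active_voxels(diff, threshold=1.0)
```

Voxels that respond equally to both conditions cancel out, so only voxels 2 and 4 survive the subtraction as candidates for face-specific processing.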
Scrambling and Brain Activation
Scrambled images: low-level features remain (edges, contrast), but no coherent object
-Increased activity in posterior regions (closer to occipital pole, like V1/V2) → basic visual analysis
-Decreased activity in anterior regions (like V4, IT) → no object-level meaning to process
Recognizable (unscrambled) images:
-Trigger stronger activation in anterior brain regions involved in complex object representation
📌 Key idea: As image structure increases (face, scene, body), activation shifts forward along the ventral stream.
Category-Specific Brain Areas
Fusiform Face Area (FFA): faces
Parahippocampal Place Area (PPA): places/scenes
Extrastriate Body Area (EBA): body/body parts
Object-decoding methods in fMRI
Training:
o Observer in fMRI scanner is presented with images of many objects
o Researchers record brain responses to these “training” images
Test:
o Present observer with a never-before-seen image
o Researchers use brain activity to try to guess what the object is
→ Shows that info about object identity is present in that region
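The train/test logic can be sketched with a nearest-centroid classifier. This is a swapped-in simple method for illustration (studies often use other classifiers with cross-validation), and the 3-voxel patterns and labels are hypothetical.

```python
import math

def train(training_trials):
    """Average the voxel patterns recorded for each object label (one centroid per label)."""
    sums, counts = {}, {}
    for label, pattern in training_trials:
        if label not in sums:
            sums[label] = [0.0] * len(pattern)
            counts[label] = 0
        sums[label] = [s + p for s, p in zip(sums[label], pattern)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def decode(centroids, new_pattern):
    """Guess the object whose average pattern is closest to the new brain response."""
    return min(centroids, key=lambda label: math.dist(centroids[label], new_pattern))

# Training: hypothetical 3-voxel responses to labelled images
training = [
    ("face", [3.0, 1.0, 0.5]), ("face", [2.8, 1.2, 0.4]),
    ("house", [0.6, 1.1, 2.9]), ("house", [0.4, 0.9, 3.1]),
]
centroids = train(training)

# Test: response to a never-before-seen image
guess = decode(centroids, [2.7, 1.0, 0.7])
```

If guesses on many new images are correct more often than chance, the region's activity must carry information about object identity.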
Computational models of visual processing: pandemonium model
“Demons” are roughly analogous to neurons:
Steps:
- Signal is detected - capture what we see
- “Feature demons”: features are extracted (e.g., oriented lines, curves) - pick features of letter B
- “Cognitive demons” each look for a specific combination of features (e.g., one demon per letter) and yell; the more similar the input is to their pattern, the louder they yell
- “Decision demons” pool info across third-layer (cognitive) demons and choose the loudest (decide what letter we are seeing based on the activity of the cognitive demons and how loud they yell)
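The four steps can be sketched as a small pipeline for letters. Assumptions: the feature names and the letter-to-feature table are hypothetical, and each cognitive demon's "loudness" is scored here as matched features minus expected-but-missing features (an illustrative choice, not the model's original scoring), so that a P is not mistaken for a B.

```python
# Hypothetical feature inventory: which features each letter contains
LETTER_FEATURES = {
    "B": {"vertical_line", "upper_curve", "lower_curve"},
    "D": {"vertical_line", "big_curve"},
    "P": {"vertical_line", "upper_curve"},
    "E": {"vertical_line", "horizontal_line"},
}

def feature_demons(signal):
    """Feature demons: report which features are present in the detected signal."""
    return set(signal)

def cognitive_demons(detected):
    """Each cognitive demon yells louder the more of its letter's features are present
    (assumed score: matched features minus expected features that are missing)."""
    return {letter: len(feats & detected) - len(feats - detected)
            for letter, feats in LETTER_FEATURES.items()}

def decision_demon(yells):
    """Decision demon: choose the loudest cognitive demon."""
    return max(yells, key=yells.get)

def recognize(signal):
    detected = feature_demons(signal)    # steps 1-2: signal -> extracted features
    yells = cognitive_demons(detected)   # step 3: cognitive demons yell
    return decision_demon(yells)         # step 4: loudest demon wins
```

For example, `recognize(["vertical_line", "upper_curve", "lower_curve"])` returns `"B"`: the B demon matches all three features, while the P demon matches only two.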
Template
internal representation of a stimulus that is used to recognize the stimulus in the world (e.g., an Arial “A”: every A that we see is compared with this one)
Structural description
description of an object in terms of the nature of its constituent parts and the relationships between them (anything matching this description will be an A: two lines meet at an angle; a third line spans the angle created between them)
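The contrast between a template and a structural description can be sketched with a toy “A” made of three line segments. Assumptions: `TEMPLATE_A` and its coordinates are hypothetical, and a real template would be an image rather than a set of segments. The structural check encodes the description above literally (two lines meet at an angle; a third line spans the angle between them).

```python
from itertools import permutations

# Hypothetical stored template: an upright "A" as three strokes at fixed coordinates
TEMPLATE_A = {((0, 0), (2, 4)), ((2, 4), (4, 0)), ((1, 2), (3, 2))}

def matches_template(strokes):
    """Template matching: the input must coincide exactly with the stored template."""
    return set(strokes) == TEMPLATE_A

def on_segment(p, seg, eps=1e-9):
    """True if point p lies on segment seg (collinear and between its endpoints)."""
    (x0, y0), (x1, y1) = seg
    cross = (x1 - x0) * (p[1] - y0) - (y1 - y0) * (p[0] - x0)
    within = (min(x0, x1) - eps <= p[0] <= max(x0, x1) + eps
              and min(y0, y1) - eps <= p[1] <= max(y0, y1) + eps)
    return abs(cross) < eps and within

def matches_structural_description(strokes):
    """Two lines meet at an angle, and a third line spans the angle between them."""
    for a, b, c in permutations(strokes, 3):
        shared = set(a) & set(b)
        if len(shared) != 1:
            continue                      # a and b must meet at exactly one endpoint
        apex = shared.pop()
        end_a = a[0] if a[1] == apex else a[1]
        end_b = b[0] if b[1] == apex else b[1]
        turn = ((end_a[0] - apex[0]) * (end_b[1] - apex[1])
                - (end_a[1] - apex[1]) * (end_b[0] - apex[0]))
        if abs(turn) < 1e-9 or apex in c:
            continue                      # no real angle, or c touches the apex
        c0, c1 = c
        if ((on_segment(c0, a) and on_segment(c1, b))
                or (on_segment(c0, b) and on_segment(c1, a))):
            return True                   # c spans the angle between a and b
    return False

def shift(strokes, dx, dy):
    """Move every stroke by (dx, dy)."""
    return {((x0 + dx, y0 + dy), (x1 + dx, y1 + dy)) for (x0, y0), (x1, y1) in strokes}
```

In this toy version, shifting the letter with `shift(TEMPLATE_A, 10, 5)` breaks the exact template match, but the shifted strokes still satisfy the structural description, since it depends only on the parts and their relationships.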
Biederman’s recognition-by-components model: Geons + problems
objects are recognized by the identities and relationships of their component parts
Geons: the basic building blocks of perceptual objects
Problems with this: not all objects can be built from geons, and we need more info to characterize objects (size, texture, colour)