Object recognition Flashcards
The object recognition problem
Individual neurons in V1 cannot, by themselves, signal the identity of a whole object
Template Theory
The visual system recognizes objects by matching the neural representation of the image to an internal representation (template) of the same shape stored in the brain
Requires many templates for different object positions
- Flexibility is limited because you cannot store a template for every possible view of every object
General recognition theory (Ashby)
Categorization based on multivariate signal detection theory
- Categories are defined by probabilistic distributions, and categorization is based on decision boundaries that separate perceptual regions.
- If 2 objects have very similar features, they will be harder to distinguish. If 2 objects do not overlap in features, it is easier to detect differences and distinguish them.
Generalized Context Model (GCM) (Nosofsky)
Exemplar-based similarity : comparing a new object to stored examples
The new object is assigned the category of the stored exemplars it is most similar to
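The exemplar idea above can be sketched in a few lines (a toy illustration, not Nosofsky's full model; the feature values and the scaling parameter c are made up):

```python
import math

def gcm_categorize(probe, exemplars, c=1.0):
    """Exemplar-model sketch: similarity to each stored exemplar decays
    exponentially with distance (Shepard's law); the category with the
    largest summed similarity wins."""
    summed = {}
    for features, category in exemplars:
        sim = math.exp(-c * math.dist(probe, features))
        summed[category] = summed.get(category, 0.0) + sim
    return max(summed, key=summed.get)

# Hypothetical 2-D feature vectors (e.g. jaw width, eye size)
stored = [((1.0, 1.0), "A"), ((1.2, 0.9), "A"),
          ((3.0, 3.1), "B"), ((2.9, 3.0), "B")]
print(gcm_categorize((1.1, 1.0), stored))  # most similar to the "A" exemplars
```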
GCM (Nosofsky) : categorizing faces
You store many specific faces you’ve seen before. When seeing a new face, you compare it to stored examples and assign the category based on similarity to past faces.
GRT (Ashby) : categorizing faces
rely on perceptual dimensions (e.g., face shape, jaw width, eye size) and make a decision based on statistical boundaries between the two categories.
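By contrast, GRT's decision-boundary idea can be sketched as a boundary in perceptual space (a toy linear bound with made-up weights, not Ashby's full multivariate model):

```python
def grt_categorize(percept, w=(1.0, 1.0), threshold=4.0):
    """GRT sketch: a percept is a point in a perceptual space
    (e.g. jaw width, eye size); a linear decision bound splits the
    space into two response regions."""
    score = sum(wi * xi for wi, xi in zip(w, percept))
    return "category A" if score < threshold else "category B"

print(grt_categorize((1.0, 1.5)))  # falls below the bound
print(grt_categorize((3.0, 3.0)))  # falls above the bound
```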
Recognition by components (Irving Biederman)
We recognize objects using an alphabet of basic shapes, or geons (geometric ions), that can be combined to form any given object
- About 36 geons, which can make up any object
Limitations : does not handle object variability well; representations are very crude
Grandmother cell theory
- Suggests we have a single neuron for every single concept in our world
- Extreme version of localized representation in the brain (like template theory)
Jennifer Aniston cell?
- A single neuron responded selectively when presented with a picture of Jennifer Aniston
- If that cell died, the person would still probably recognize her
Deep Neural Network (DNN)
- Multilayer neural networks that can be trained to recognize objects.
- Numerous instances of an object are shown to the network, with feedback on its responses
- Over time, the network learns to label new instances of the object that it has never been explicitly trained on (generalization beyond the training set)
Deep Neural Networks (DNNs) rivalled the representational performance of the … in monkeys on an object recognition task.
inferior temporal cortex (IT)
representations of a DNN-based object recognition model successfully predict the representations measured in the inferior temporal cortex (IT) using …
fMRI
intermediate- and high-level image features can predict …
visual awareness and the attentional blink (i.e. predict whether a quickly presented image will be consciously seen)
Retinal Ganglion Cells & LGN detect _____ (localized contrast)
Spots
Primary Visual Cortex (V1) detects …
Edges and bars (orientation selectivity)
Role of Intermediate-level vision (V2, V3, V4, etc)
Grouping features into contours, textures, and surfaces.
Role of high-level vision (IT cortex)
Recognizing complex shapes, objects, and categories.
Object recognition is not just about simple features but about _______ processing across multiple visual areas.
Hierarchical
The receptive fields of _______ cells respond to visual properties crucial for object perception.
Extrastriate
Boundary Ownership
For a given edge or contour, V2 extrastriate neurons determine which side belongs to the object and which side belongs to the background—a fundamental process in figure-ground segregation.
Intermediate (mid) level vision
Loosely defined stage of visual processing after low-level feature extraction (e.g., edges, contrast) and before high-level object recognition and scene understanding.
Key Functions of intermediate vision
- Perception of edges and surfaces
- Determines which regions of an image should be grouped into objects
- Bridges low-level feature detection and high-level object recognition
Primary visual cortex (V1) neurons have ____ receptive fields that detect local edges and contrast.
Small
Orientation-selective V1 neurons
Respond to edges at specific angles
Computerized edge detectors are not as effective as humans in detecting …
meaningful edges
Why do computers miss edges that humans easily perceive ?
Computers rely purely on local contrast and intensity differences.
Illusory contour
A contour that is perceived even though no physical edge exists between one side and the other.
Gestalt theory opposes ______
Structuralism
- perception is holistic : we naturally organize elements into meaningful wholes rather than processing each part independently
Gestalt Grouping Principles
Set of rules that describe when and how elements in an image appear grouped together:
Similarity Gestalt rule
Similar objects (color, shape, size) appear grouped.
Proximity Gestalt rule
Objects that are close to each other tend to be grouped together
Parallelism Gestalt rule
Parallel contours are likely to belong to the same group.
Symmetry Gestalt rule
Symmetrical regions are more likely to be perceived as a group.
Good continuation Gestalt rule
- Lines and edges are perceived as following the smoothest path
Closure Gestalt rule
The mind fills in missing information to perceive complete shapes.
Common fate Gestalt rule
Elements that move together (same direction and speed) are perceived as a group
Figure ground Gestalt rule
The brain separates objects from the background.
Common Region Gestalt rule
Elements are grouped together if they appear to belong to the same larger region.
Connectedness Gestalt rule
Elements tend to be grouped if they are connected. Overrules proximity.
Camouflage
Animals use Gestalt rules to blend into their environments
Breaking color-based camouflage is easier for dichromats (such camouflage is less effective against them)
Perception emerges as the result of the _______ interpretation agreed upon by Gestalt processes.
Dominant
5 principles of intermediate vision
- Group what should be grouped together.
- Separate what should be separated.
- Use prior knowledge to predict.
- Avoid accidents (accidental viewpoints and coincidences that make objects appear as something else).
- Seek consensus and minimize ambiguity.
After processing in the _____ cortex, object information is divided into 2 pathways
Extrastriate cortex (the visual areas beyond the primary visual cortex, V1 or striate cortex)
“Where” Pathway (Dorsal Stream)
- Processes locations and shapes of objects.
- Does not encode object names or functions.
- Extends from the occipital lobe to the parietal lobe
“What” Pathway (Ventral Stream)
- Processes object identity (names) and functions, independent of location.
- Extends from the occipital lobe to the temporal lobe.
Feedforward processes
Information leaving early visual cortex (V1) and moving forward to higher areas such as V4
Feedback processes
- information sent from V4 back to V1, for example to ask for more edge information
As we move from V1 to V2, V3, V4, posterior IT, anterior IT, and prefrontal cortex, neurons respond to more and more … information
abstract and complex
Univariate fMRI analysis
Show pictures of many different objects or scenes, then average the neural response to a category of interest (e.g. faces). Compare that average response to the average response to another category (e.g. faces vs places)
V4 neurons respond to … gratings.
non-Cartesian gratings (polar, i.e. concentric and radial, and hyperbolic) more than to conventional Cartesian (sinusoidal) gratings
- Supports the idea that V4 represents intermediate level shape information
V4 extracts … and is specialized for…
curves, textures, and complex contours
Specialization : local features
Stimuli that activate the neurons in posterior IT (PIT) the best are …
object parts (intermediate processing)
Difference between Posterior IT and anterior IT
Posterior IT : integrates shape features; not whole objects
Anterior IT : whole objects
The … is thought to be the first stage in visual processing to explicitly represent whole objects
Lateral occipital complex LOC (bilateral)
LOC responds strongly to the ____ of objects, even when texture or color is removed
Shape
Invariant representation in LOC
Representations that do not depend on the viewpoint, texture & color
LOC is also involved in …
figure-ground segmentation, helps distinguish objects from their background
The LOC bridges … with ….
mid-level feature processing (V4, PIT) and high-level object recognition (IT cortex, FFA, PPA).
- Part of ventral visual stream
Why can we say the fusiform face area FFA is category preferential instead of category selective ?
The FFA does not respond only to faces (there is also an expertise effect: it responds to objects of visual expertise, such as cars in car experts)
Prosopagnosia
inability to recognize faces due to FFA damage
Invariant Face Recognition in the FFA
FFA helps recognise faces across different angles, lighting conditions, and expressions
Where is the FFA ?
in the fusiform gyrus of the ventral temporal lobe (right hemisphere; sometimes bilaterally).
Parahippocampal Place Area (PPA) is a ______-processing region
Scene and place category selective
PPA is a cortical representation of the …
local visual environment
How does the PPA challenge the idea that object recognition is enough to explain scene perception ?
The PPA shows that spatial layout is key to scene perception : a scene is not just a collection of objects.
scene perception vs. memory-based navigation
Scene perception : PPA
Memory-based navigation : hippocampus
Provides a functional link between vision and spatial cognition, bridging perception and higher-order place representation.
PPA
How is real world size represented in the brain ?
Not far from the fusiform gyrus, there is a medial/lateral contrast between responses to big and small objects
- This organization extends into the dorsal stream, showing that real-world size plays a role beyond location identification
What suggests that IT neurons encode more abstract representations of objects rather than raw sensory features ?
IT neurons demonstrate invariance : they continue to respond to an object regardless of its size, position, or viewpoint.
Machine learning as a decoding method
- Collect fMRI scans of a participant while they view images from multiple known categories.
- Train a computer model to recognize the brain activity patterns associated with each category.
- Test the model to see if it can correctly identify an unseen image based on learned brain activity patterns.
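The decoding logic can be sketched with a nearest-centroid classifier on toy "voxel" patterns (an illustration of the idea; real studies use stronger classifiers such as linear SVMs, and the numbers below are made up):

```python
import math

def train_centroids(examples):
    """'Training': average the voxel patterns of each category."""
    centroids = {}
    for label, patterns in examples.items():
        n = len(patterns)
        centroids[label] = [sum(p[i] for p in patterns) / n
                            for i in range(len(patterns[0]))]
    return centroids

def decode(pattern, centroids):
    """'Testing': label an unseen pattern by its nearest category centroid."""
    return min(centroids, key=lambda c: math.dist(pattern, centroids[c]))

# Toy 3-voxel patterns for two categories
training_data = {"face":  [[2.0, 0.1, 1.0], [1.8, 0.2, 0.9]],
                 "house": [[0.1, 2.1, 1.0], [0.2, 1.9, 1.1]]}
centroids = train_centroids(training_data)
print(decode([1.9, 0.3, 1.0], centroids))  # a face-like unseen pattern
```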
Results of similarity decoding for faces and places
- very strong correlation in pattern of responses to faces between even and odd runs
- very strong correlation in pattern of responses to houses between even and odd runs
- But when you cross the category boundary going from faces to houses, the correlation drops
Distributed representations in decoding faces perception
If you remove the FFA from the pattern, you can still decode whether you are looking at a face or not
- There is information about object categories and faces throughout the visual ventral stream
Voxelwise encoding method
- Collect fMRI scans of a participant while they view images from multiple known categories.
- Define an N-dimensional feature space for the stimuli
- Fit weights that show how each of the N features contributes to the neural signal at each voxel (voxels are the 3D pixel equivalents in the brain)
- Once trained, encoding models can predict responses to new, unseen stimuli by multiplying the weights by the feature values of an unseen set of images or objects
Measure of performance with the voxelwise encoding method
- Correlating the predicted fMRI activity in that voxel to the measured fMRI activity in that voxel.
- This yields a whole-brain map of where the model can explain brain activity.
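The prediction and performance steps reduce to a weighted sum plus a correlation (a toy sketch; the weights, feature vectors, and measured responses below are made up, and real models fit thousands of voxels with regularized regression):

```python
import math

def predict_voxel(weights, features):
    """Encoding-model prediction: a voxel's response is modeled as a
    weighted sum of the stimulus features."""
    return sum(w * f for w, f in zip(weights, features))

def pearson_r(xs, ys):
    """Performance measure: correlate predicted and measured responses."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

weights = [0.5, -0.2, 0.8]                   # fitted beforehand (made up)
stimuli = [[1, 0, 1], [0, 1, 0], [1, 1, 1]]  # feature vectors per image
predicted = [predict_voxel(weights, s) for s in stimuli]
measured = [1.2, -0.3, 0.9]                  # hypothetical fMRI responses
print(pearson_r(predicted, measured))        # high correlation = good model
```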
Gabor patches
Sinusoidal gratings with different frequencies, positions, and orientations
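A Gabor value at a point is just a sinusoid under a Gaussian envelope; a minimal sketch (parameter values are illustrative):

```python
import math

def gabor(x, y, sigma=2.0, freq=0.25, theta=0.0):
    """Gabor patch: a sinusoidal grating (spatial frequency `freq`,
    orientation `theta`) windowed by a Gaussian envelope of width `sigma`."""
    xr = x * math.cos(theta) + y * math.sin(theta)        # rotate coordinates
    envelope = math.exp(-(x * x + y * y) / (2 * sigma ** 2))
    return envelope * math.cos(2 * math.pi * freq * xr)

print(gabor(0, 0))  # center of the patch: 1.0
```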
Identifying natural images from human brain activity : steps
Step 1 : model estimation
- Estimate a receptive-field model for each voxel
- Used Gabor patches as the feature space with the voxelwise encoding-model approach
Step 2 : image identification
- Measure brain activity for an image
- Predict brain activity for a set of images using receptive-field models
Identifying natural images from human brain activity : results
The model showed a strong correlation between the predicted activity for one image and the observed activity for that same image.
- Single images can be identified by decoding or encoding fMRI data.
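The identification step amounts to choosing the candidate image whose model-predicted pattern best matches the observed pattern (a toy sketch; the image names and voxel numbers are made up):

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def identify(observed, predictions):
    """Pick the image whose model-predicted voxel pattern correlates
    best with the observed fMRI pattern."""
    return max(predictions,
               key=lambda img: pearson_r(observed, predictions[img]))

# Hypothetical predicted voxel patterns for three candidate images
predictions = {"img1": [1.0, 0.2, 0.8],
               "img2": [0.1, 1.0, 0.3],
               "img3": [0.4, 0.6, 0.5]}
print(identify([0.9, 0.3, 0.7], predictions))  # best match to img1's prediction
```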
Second order isomorphism
(Roger Shepard)
Similar objects in the world must have similar representations in the mind.
Second order isomorphism
(Roger Shepard) test
- Participants judged the similarity of the shapes of US states either by looking at the shapes or by reading the states' names
- For the names, they had to form a mental image of the state
Results :
The two ways of judging similarity were highly correlated : there is a multidimensional representational space where similarity is encoded, whether you use mental imagery or perceive objects in the world.
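Shepard's result amounts to a high correlation between two sets of pairwise similarity judgments (toy sketch; the ratings below are invented, not his data):

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

# Hypothetical similarity ratings for the same state pairs, judged
# from the drawn shapes vs from mental imagery of the names
from_shapes = [0.9, 0.2, 0.7, 0.4, 0.1]
from_names  = [0.8, 0.3, 0.6, 0.5, 0.2]
print(pearson_r(from_shapes, from_names))  # high correlation
```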