Object perception Flashcards
Describe Picasso image
Very challenging for visual system
But brain recognizes ppl and table and stuff
Must compare to internal models
Why do we need to recognize objects
Ability to recognize and categorize objects is fundamental for survival and interaction with environment
Allows us to navigate world, dangers, food and plays a crucial role in social interactions
Perception to action loop
Technological context = replicating these abilities = can enhance our safety, health and well being, handling tasks ranging from autonomous driving to early detection of disease in medical images
What are challenges of creating effective object recognition systems
Reflect complexities of visual processing in brain = variability of objects - brain deals with variance, context, lighting conditions = influences what it looks like
Requires systems capable of abstraction and generalization from limited exs, similar to human ability to learn and recognize new or unfamiliar objects - brains and AI
Describe 5 blind monks
None of them have same conclusion - diff based on what they perceiving - touching
Need to increase info to get full pic
Describe template theory
Proposition that visual system recognizes objects by matching the neural representation of the image with an internal representation of the shape in brain
Drawback = hard to store unique template for every occurrence of object ever seen
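A minimal sketch of the template drawback, assuming a toy 1-D "image" of 0/1 pixels and a hypothetical pixel-by-pixel match score (not a model of real neural matching). The same shape shifted by two pixels no longer matches the stored template well, so a new template would be needed for each position.

```python
def match_score(image, template):
    """Fraction of pixels that agree between input image and stored template."""
    return sum(a == b for a, b in zip(image, template)) / len(template)

# Hypothetical stored template: a 3-pixel bar starting at position 2.
template = [0, 0, 1, 1, 1, 0, 0, 0]

same_view = [0, 0, 1, 1, 1, 0, 0, 0]   # identical to the template
shifted = [0, 0, 0, 0, 1, 1, 1, 0]     # same bar, shifted right by 2 pixels

score_same = match_score(same_view, template)
score_shifted = match_score(shifted, template)
print(score_same, score_shifted)  # 1.0 0.5
```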
Describe exemplar theory
Brain recognizes objects by comparing them to multiple stored examples rather than single template
Not relying on idealized template - recognition occurs based on what you already experienced in the past, more flexible than template
Describe generalized context model
Diff metrics
More formal and mathematical
Formalized exemplar theory in generalized context model
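A minimal sketch of a GCM-style computation, under stated assumptions: city-block distance as the metric, similarity decaying exponentially with distance, and category probability as the share of summed similarity. The feature values and the sensitivity parameter `c` are hypothetical, chosen only for illustration.

```python
import math

def gcm_probabilities(probe, exemplars, c=1.0):
    """Return P(category | probe) from summed similarity to stored exemplars.

    exemplars: dict mapping category name -> list of stored feature vectors.
    c: sensitivity (assumed value); larger c = similarity falls off faster.
    """
    summed = {}
    for category, items in exemplars.items():
        total = 0.0
        for item in items:
            # City-block distance between the probe and one stored exemplar.
            distance = sum(abs(p - x) for p, x in zip(probe, item))
            # Similarity decays exponentially with distance.
            total += math.exp(-c * distance)
        summed[category] = total
    norm = sum(summed.values())
    return {category: s / norm for category, s in summed.items()}

# Hypothetical 2-feature exemplars (e.g. body size, snout length).
stored = {
    "dog": [(0.8, 0.7), (0.6, 0.9), (0.7, 0.8)],
    "cat": [(0.3, 0.2), (0.2, 0.3), (0.4, 0.1)],
}
probs = gcm_probabilities((0.65, 0.75), stored)
print(probs)
```

Swapping the distance line for a Euclidean distance is one way to try a diff metric without changing the rest.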
Describe prototype model
Ppl form average of category of objects store - abstract prototype that represents best ex of category
Like best ex of dog - compared to this
Cognitive categories organized around prototypes
View as typical ex or an average over several examples that form category
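A minimal sketch of the prototype idea, assuming the prototype is the feature-by-feature average of stored examples and that a new item goes to the category with the nearest prototype (Euclidean distance is an assumption here). Feature values are hypothetical.

```python
def prototype(examples):
    """Average the stored examples feature by feature."""
    n = len(examples)
    return tuple(sum(values) / n for values in zip(*examples))

def classify_by_prototype(probe, categories):
    """Assign the probe to the category whose prototype is nearest."""
    best_label, best_dist = None, float("inf")
    for label, examples in categories.items():
        proto = prototype(examples)
        dist = sum((p - q) ** 2 for p, q in zip(probe, proto)) ** 0.5
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical 2-feature examples per category.
categories = {
    "dog": [(0.8, 0.7), (0.6, 0.9), (0.7, 0.8)],
    "cat": [(0.3, 0.2), (0.2, 0.3), (0.4, 0.1)],
}
label = classify_by_prototype((0.65, 0.75), categories)
print(label)  # dog
```

Only one vector per category is compared at recognition time, which is the storage advantage over keeping every exemplar.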
Storage - template theory
Fixed templates for each object
Recognition process template theory
Direct matching to a single internal representation
Flexibility template theory
Limited - sensitive to variations
Cannot store infinite templates, grandmother cells
Scalability template theory
Requires many templates for diff views
Dog ex template theory
Matches input to a stored dog template
Like side view of Labrador
Storage exemplar theory
Multiple stored examples - exemplars
Recognition process exemplar theory
Comparison with multiple previously seen instances
Flexibility exemplar theory
High - handles variability well
Scalability exemplar theory
Stores many exemplars but generalizes well
Dog ex exemplar theory
Compares input to multiple stored dog examples - various breeds, angles and contexts
Storage prototype theory
Single abstracted prototype per category
Recognition process prototype theory
Matching to the most representative prototype
Flexibility prototype theory
Moderate - allows for some variation but relies on an average
Scalability prototype theory
Requires storing only one prototype per category
Dog ex prototype theory
Compares input to an idealized average dog that represents the category
Describe general recognition theory - categorizing based on
Multivariate
Extension of signal detection theory
Focusing on how perceptual distributions influence decision making
Describe general recognition theory - categorizing defined by
Probabilistic distributions and categorization is based on decision boundaries that separate perceptual regions
Ability to differentiate objects depends on how much their features overlap
If do not overlap = easier to find decision boundary (hyperplanes) = decision easier and faster
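A minimal sketch of the overlap point, simplified to one perceptual dimension (GRT itself is multivariate). It assumes two categories with equal-variance Gaussian perceptual distributions and equal base rates, so the best boundary sits midway between the means and accuracy depends only on how far apart the means are relative to the spread. The means and sd used are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def boundary_and_accuracy(mean_a, mean_b, sd):
    """Midpoint decision boundary and expected proportion correct."""
    boundary = (mean_a + mean_b) / 2.0
    # Separation of the two distributions in sd units (d-prime).
    d_prime = abs(mean_b - mean_a) / sd
    accuracy = normal_cdf(d_prime / 2.0)
    return boundary, accuracy

# Little overlap (means far apart) vs. heavy overlap (means close).
far_boundary, far_accuracy = boundary_and_accuracy(0.0, 4.0, 1.0)
near_boundary, near_accuracy = boundary_and_accuracy(0.0, 1.0, 1.0)
print(far_boundary, round(far_accuracy, 3))
print(near_boundary, round(near_accuracy, 3))
```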
Compare gcm vs grt
Gcm = store many specific faces you’ve seen before, when seeing new face = compare it to stored examples and assign the category based on similarity to past faces
Grt = rely on perceptual dimensions - face shape, jaw width, eye size - make a decision based on statistical boundaries between 2 categories
Describe recognition by components
Structuralist theory
Alphabet of shapes - geometric ions - geons
Geons are combined to form objects
Limitation = doesn’t really handle variability we see in objects, just a crude characterization of objects
Describe grandmother cell theory - gen
Kinda impossible bc some variability - one single neuron for every single concept in physical world
- single neuron responsible for recognizing grandma?
Describe grandmother cell theory - specifics
Concept contributes to ongoing debate between localized vs distributed representation
Extreme ex of localized representation in brain
Also if cell dies = does that mean wont recognize grandma anymore
Describe grandmother cell theory - Jennifer Aniston
Ppl with electrodes in brain - epilepsy treatment - did exp
Jennifer Aniston = cell fires when see her, other systems also fire if hear her voice - also for Harrison Ford for some ppl
Kinda supports grandmother cell theory
Describe computational models of object recognition
Deep neural network =
Multiple layer neural networks capable of being trained to recognize objects
Numerous instances of an object shown to network with feedback provided
Over time = network learns to recognize new instances of object that it has never been explicitly trained on
- need to generalize so can see object
Describe deep neural network ex
AlexNet
Stimulus -> layer 1 -> … layers 6-8 = human face
So small area of stimulus inputted and processed = spatial average passed on until can put label of human face on it
Describe deep learning in object recognition
Deep neural networks rival representational performance of inferior temporal cortex - IT - in monkeys in object recognition task
Representations of DNN based object recognition model successfully predict the representations measured in inferior temporal cortex using fMRI
Using DNN to model visual properties of stimuli = demonstrate that intermediate and high level image features can predict visual awareness and provide mechanistic explanation for phenomenon of attentional blink - like if show image v quick
How do recognize objects
Detecting spots and edges and bars
= use retinal ganglion cells, lateral geniculate nucleus and primary visual cortex - V1
What detects spots
Retinal ganglion cells and LGN - localized contrast
What detects edges and bars
Primary visual cortex - orientation selectivity - combine spots
How do spots become objects and surfaces
Brain performs sophisticated processing beyond V1
Integrating visual features into structured representations of objects
- intermediate level vision and high level vision
Describe intermediate level vision
V2, V3, V4 etc -> grouping features into contours, textures and surfaces
Describe high level vision
IT cortex -> recognizing complex shapes, objects and categories - tolerance to variability
What is object recognition about
Not just about simple features but about hierarchical processing across multiple visual areas - feedback and feed forward features
Describe lines to border to textures - gen
Receptive fields of extrastriate cells respond to visual properties crucial for object perception
Only respond if boundary belongs to object and not background
Describe lines to border to textures - ex of boundary ownership
For given edge or contour = neurons determine which side belongs to object and which belongs to background - a fundamental process in figure ground segregation
Describe intermediate mid level vision = define
Loosely defined stage of visual processing that occurs after low level feature extraction - like edges, contrast - and before high level object recognition and scene understanding
Describe intermediate mid level vision = key functions
Perception of edges and surfaces
Determines which regions of an image should be grouped into objects
Bridges low level feature detection and high level object recognition
How do we detect object edges - intermediate mid level vision
Primary visual cortex V1 neurons have small receptive fields that detect local edges and contrast
Neurons are orientation selective - responding to edges at specific angles
How do we know which edges belong together
Complicated
Computerized edge detectors are not as effective as humans in detecting meaningful edges
As humans = can see it better
Can computers detect edges the same as humans
Locally contrast between background and foreground not strong enough
Computers miss edges that humans easily perceive bc they rely on local contrast and intensity differences
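A minimal sketch of why a purely local detector misses some edges, assuming a hypothetical 1-D row of pixel intensities and a fixed contrast threshold (real edge detectors are more elaborate). The low-contrast boundary is still a boundary, but the local intensity step never reaches the threshold.

```python
def local_contrast_edges(row, threshold):
    """Return indices i where |row[i + 1] - row[i]| exceeds the threshold."""
    return [i for i in range(len(row) - 1) if abs(row[i + 1] - row[i]) > threshold]

# Hypothetical intensity rows: same edge position, different contrast.
strong = [10, 10, 10, 200, 200, 200]
weak = [100, 100, 100, 110, 110, 110]

strong_edges = local_contrast_edges(strong, 50)
weak_edges = local_contrast_edges(weak, 50)
print(strong_edges)  # [2]
print(weak_edges)    # []
```

Lowering the threshold would recover the weak edge but would also start flagging noise, which is the trade-off a local detector cannot escape without grouping cues.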
Describe illusory contour
Contour that is perceived even though no physical edge exists between one side and the other
Edge detectors fail
Mind fills gaps
Problem with some of more structuralist theories - mind can solve problem
What is Gestalt theory
Whole is greater than sum of parts
Opposes structuralism - which emphasizes breaking perception into basic elements
Suggests that perception is holistic = meaning we naturally organize elements into meaningful wholes rather than processing each part independently
Define gestalt grouping principles
Set of rules that describe when and how elements in an image appear grouped together
Define gestalt = similarity
Similar objects - colour, shape, size or texture = appear grouped together - perceived as group
Segment animal from background
Define gestalt = Proximity
Elements close to each other tend to be grouped together in perception
Define gestalt = Good continuation
Lines and edges are perceived as following the smoothest path
Doesn’t explain everything tho - group as x, if have context = beak = ex
Define gestalt = Closure
Mind fills in missing info to perceive complete shapes
Illusory contours - segment arrow from background
Define gestalt = Common fate
Elements moving together are grouped
Flock of birds - moving together in shared direction
Define gestalt = Figure ground
Brain separates objects from background
Vase segmented in foreground ex
Define gestalt = Common region
Elements located within a shared boundary or enclosed area are perceived as a group
Stronger than proximity
Define gestalt = Connectedness
Elements visually connected by lines tend to be grouped
Overrules proximity
Describe parallelism
Parallel contours are likely to belong to same group
Describe symmetry
Symmetrical regions are more likely to be perceived as a group
Describe camouflage
Animals take advantage of gestalt grouping principles to form groups in their environment
Sometimes camouflage is used to confuse the observer
Like Tiger - most animals are dichromats so camo better
What are gestalt rules good for - gen
All together = can help figure out object
Ambiguity and perceptual committees
Metaphor for how perception operates
Committees must integrate conflicting inputs and reach consensus
Many diff and sometimes competing principles influence perception
Perception emerges as result of dominant interpretation agreed upon by these processes
Combined info from many regions = draw most likely conclusions
Describe bust of Voltaire image
Similarity, closure, good continuation, proximity, figure ground organization
Name and briefly describe the 5 principles of intermediate vision
1 = group what should be grouped together
2 = separate what should be separated
3 = use prior knowledge - brain stores experiences to avoid mistakes/surprises
4 = avoid accidents, like leaning tower Pisa illusion
5 = seek consensus and minimize ambiguity = on most likely hypothesis, what’s nature of object in front of me
Describe theory of ventral and dorsal pathways
After processing in extrastriate cortex, object info divided into 2 distinct pathways = where and what pathways
Describe Where pathway
Dorsal stream
Processes locations and shapes of objects
Does not encode object names or functions
Extends from occipital lobe to parietal lobe
Describe what pathway
Processes object identity - names and functions, independent of location
Extends from occipital lobe to temporal lobe - inferotemporal cortex
Not unidirectional
v1 = bigger
V2 = complex, boundary ownership
V4 = cells that respond to linear shapes
What does where and what pathway help
Supports spatial awareness = dorsal
And object recognition = ventral
In visual perception
Describe neural responses - to what in area v4 and explain
Neural response to polar, hyperbolic and Cartesian gratings in area v4 of monkey
V4 = bridges early edge detection - v1 and object recognition in inf temporal cortex
More responses to these specific patterns
Not much activity for sinusoidal gratings or patches with oriented linear edges
What happens after v4
V2 = bit more complex boundaries = fore and background
V4
Posterior IT = responds to object parts but not whole objects - don’t need whole object there
Describe lateral occipital complex generally
Some areas show specificity = preferential responses to certain categories
Results obtained by univariate analysis - functional MRI = averaging - take average response and contrast it
Shown pics and see if area responds more to one thing
Describe lateral occipital complex results
Responds more to objects
LOC = first stage in visual hierarchy = where full objects explicitly represented - complete objects - whole
Responds strongly to shape defined objects - doesn’t matter orientation, viewpoints, sizes, positions
Partial invariance - doesn’t respond to specific image but just shape of object - also involved in figure ground segmentation and distinguishing object from background
Why is LOC important for high level vision
Bridges mid level feature processing - V4, PIT with high level object recognition = IT cortex, FFA, PPA
Supports invariant object recognition - crucial for recognizing objects across diff contexts
Provides whole object representations - making it a key step in ventral visual stream
What is loc for
Major hub for object recognition = makes it essential part of understanding how brain transforms raw visual input into meaningful objects
Describe location of fusiform face area
In fusiform gyrus of ventral temporal lobe
Usually only in right hemisphere
Sometimes bilateral
Describe category selectivity of fusiform face area
Highly tuned to faces but also responds to expert level recognition
Preferential to faces
Describe invariant face recognition fusiform face area
Helps recognize faces across diff angles, lighting, expression = suggest view invariant representation
Describe damage to fusiform face area
Linked to prosopagnosia - cannot recognize faces anymore
- do not know if FFA cares about identity - more research needed
But this condition doesn’t mean it’s linked to identity - bc identification could be based on info from FFA
What is still debated about fusiform face area
Some research argues ffa is not strictly for faces but instead specializes in fine grained within category visual recognition
Describe parahippocampal place area - gen
Region just posterior to hippocampus
Responds preferentially to places
parahippocampal place area = first identification
As dedicated scene processing region
parahippocampal place area= challenges what
Idea that object recognition alone explains scene perception - scene not just a collection of objects
Spatial layout is key
parahippocampal place area Distinguishes what
PPA from hippocampal spatial navigation = refines understanding of scene perception vs memory based navigation
parahippocampal place area Provides what
Functional link between vision and spatial cognition - bridging perception and higher order place representation
Also responds to other things too - not completely separated = all regions contribute some
V4 function
Extracts curves, textures and complex contours
V4 specialization
Sensitive to local features
Loc function
Encodes whole object representations
Pit function
Represent object parts
Pit specialization
Intermediate processing
Loc specialization
Sensitive to shape, invariant to texture and colour
Ffa function
Recognizes faces
Ffa specialization
Category selective
Ppa function
Recognizes scenes and places
Ppa specialization
Category selective
Describe Ffa and Ppa
Receive projections from lower level regions to help them process info about category they prefer to respond to
Describe real world size
Small and big objects = projection onto brain medially and laterally to fusiform gyrus = contrast between sizes of objects
Also on dorsal part = why does where care about objects - more than just spatial location in dorsal pathway
Describe role of context in object recognition
Context helps guide recognition of objects
Describe viewpoint and scale invariance
Many IT neurons demonstrate invariance - at cellular level in neurons in ITC = meaning they continue to respond to an object regardless of its size, position or viewpoint = suggests that IT neurons encode more abstract representations of objects rather than raw sensory features
Invariance essential for object recognition
Can rotate = neurons still respond. But if rotate too much = response dampens
Describe decoding methods
Studying brain computer interface = brought machine learning
Departed from univariate methods =
● Collect fMRI scans of a participant while they view images from multiple known categories. - show more images = better decoding area
● Train a computer model to recognize the brain activity patterns associated with each category.
● Test the model to see if it can correctly identify an unseen image based on learned brain activity patterns. - show image
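The three steps above can be sketched with simulated data. This is only an illustration under stated assumptions: the "voxel patterns" are synthetic (a hypothetical template per category plus Gaussian noise), and the "computer model" is a simple nearest-centroid classifier, swapped in for whatever classifier a real study would use.

```python
import random

def make_pattern(template, rng, noise_sd=0.3):
    """Simulate one noisy voxel pattern for a category (synthetic data)."""
    return [t + rng.gauss(0.0, noise_sd) for t in template]

def centroid(patterns):
    """Average pattern across trials, voxel by voxel."""
    return [sum(v) / len(patterns) for v in zip(*patterns)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def decode(pattern, centroids):
    """Predict the category whose training centroid is closest."""
    return min(centroids, key=lambda label: sq_dist(pattern, centroids[label]))

rng = random.Random(0)
# Hypothetical category templates over 20 voxels.
templates = {
    "face": [1.0] * 10 + [0.0] * 10,
    "house": [0.0] * 10 + [1.0] * 10,
}

# Step 1: collect patterns for known categories.
train = {label: [make_pattern(t, rng) for _ in range(20)]
         for label, t in templates.items()}
# Step 2: train the model (here: one centroid per category).
centroids = {label: centroid(patterns) for label, patterns in train.items()}
# Step 3: test on unseen patterns.
test_set = [(label, make_pattern(t, rng))
            for label, t in templates.items() for _ in range(10)]
accuracy = sum(decode(p, centroids) == label for label, p in test_set) / len(test_set)
print(accuracy)
```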
Describe decoding based on similarity
One of first exps = even and odd runs = while doing exp = give break to ppl, between showing them houses and faces
If do not cross categorical divisions = have strong correlation of distributed patterns on the 2 runs
correlation drops = if switch categories
= concept of distributed representations - remove group of voxels corresponding to ffa but still decode if looking at face = distributed system involved in object recognition
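A minimal sketch of the even/odd run logic, assuming synthetic data: each category has a hypothetical stable pattern across 200 voxels, and each run adds independent noise. Correlating the same category across runs should come out high; correlating across categories should drop toward zero.

```python
import random

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

rng = random.Random(1)
n_voxels = 200
# Hypothetical stable distributed pattern per category.
signal = {c: [rng.gauss(0.0, 1.0) for _ in range(n_voxels)]
          for c in ("face", "house")}

def run_pattern(category):
    """One run's measured pattern = stable signal + run-specific noise."""
    return [s + rng.gauss(0.0, 0.5) for s in signal[category]]

odd = {c: run_pattern(c) for c in signal}
even = {c: run_pattern(c) for c in signal}

within = pearson(odd["face"], even["face"])     # same category across runs
between = pearson(odd["face"], even["house"])   # category switched
print(round(within, 2), round(between, 2))
```

To mimic the voxel-removal point, the same comparison can be rerun on a subset of voxel indices and the within/between gap checked again.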
Describe encoding method = all steps
● Collect fMRI scans of a participant while they view images from multiple known categories. - build model
● Define a feature space, e.g. a gabor wavelet pyramid for visual stimuli.
● Fit weights that show how each feature contributes to the neural signal at each voxel.
● Once trained, encoding models can predict responses to new, unseen stimuli. Comparing the predicted responses with actual fMRI data allows researchers to assess the accuracy of the model and understand the representational structure of the brain region under study. - model has feature set that is rich enough to understand what brain processing
Describe voxelwise encoding models
Fmri activity
Feature space - multi dimensional encodes orientations, contrast
Multiply learned weights by feature space for unseen images -> then gives you the predicted fMRI activity
= look at performance = correlate predicted and measured fMRI activity in that voxel and make map of where in brain model can explain activity
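A minimal sketch of an encoding model for a single voxel, under stated assumptions: the data are simulated, each image is described by 3 hypothetical feature values (standing in for a richer feature space), the voxel response is a weighted sum of features plus noise, and the weights are fit by plain gradient descent on squared error (real studies typically use regularized regression).

```python
import random

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def predict(weights, features):
    """Predicted voxel response = weighted sum of image features."""
    return sum(w * f for w, f in zip(weights, features))

def fit_weights(feature_rows, responses, lr=0.05, epochs=500):
    """Fit one weight per feature by gradient descent on mean squared error."""
    n_features = len(feature_rows[0])
    n = len(feature_rows)
    weights = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for row, y in zip(feature_rows, responses):
            error = predict(weights, row) - y
            for k in range(n_features):
                grad[k] += 2.0 * error * row[k] / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights

rng = random.Random(3)
true_weights = [0.5, -1.0, 2.0]  # hypothetical tuning of the simulated voxel

def simulate(n_images):
    """Simulated image features and noisy voxel responses."""
    feats = [[rng.gauss(0.0, 1.0) for _ in true_weights] for _ in range(n_images)]
    resp = [predict(true_weights, f) + rng.gauss(0.0, 0.1) for f in feats]
    return feats, resp

train_x, train_y = simulate(40)   # images used to fit the weights
test_x, test_y = simulate(20)     # new, unseen images

weights = fit_weights(train_x, train_y)
predicted = [predict(weights, f) for f in test_x]
score = pearson(predicted, test_y)  # predicted vs measured, for this voxel
print([round(w, 2) for w in weights], round(score, 3))
```

Repeating the fit and the correlation for every voxel would give the map of where the feature space explains activity.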
Describe exp of identifying natural images from human brain activity
Stage 1 = model estimation = pyramidal hierarchy of Gabor patches = sinusoidal gratings with diff orientations spatial frequencies and positions in an image = gabors as feature space
Stage 2 = image identification = measure brain activity for an image
= can identify which image person looking at if model successful at encoding right features
Graph = correlation of measured voxel activity and predicted
When diagonal pops = means that model has stronger correlation between predicted activity for one image and observed activity for same image
Strong diagonal means model very rarely better predicted response associated with other image
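A minimal sketch of the identification step, assuming simulated data: the "predicted" patterns stand in for an encoding model's output across 100 voxels, and the "measured" patterns are those predictions plus noise. Each measured pattern is assigned to the image whose predicted pattern correlates with it most strongly, which corresponds to a strong diagonal in the correlation matrix.

```python
import random

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

rng = random.Random(2)
n_images, n_voxels = 8, 100

# Hypothetical model predictions: one voxel pattern per candidate image.
predicted = [[rng.gauss(0.0, 1.0) for _ in range(n_voxels)]
             for _ in range(n_images)]
# Simulated measurements: prediction plus measurement noise.
measured = [[p + rng.gauss(0.0, 0.7) for p in row] for row in predicted]

# corr[i][j] = correlation of measured pattern i with predicted pattern j.
corr = [[pearson(m, p) for p in predicted] for m in measured]

# Identify each measured pattern as the image with the best-matching prediction.
identified = [max(range(n_images), key=lambda j: row[j]) for row in corr]
accuracy = sum(i == j for i, j in enumerate(identified)) / n_images
print(identified, accuracy)
```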
Describe second order isomorphism
Representational similarity analysis
Similar objects in world must have similar representations in mind
Study = judge states of USA by shape = if similar
Saw these 2 things highly correlated = also by name but still rate accordingly = has to form mental image of the state
= multidimensional representational space where similarity encoded
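A minimal sketch of the second order idea, assuming toy data: four items with hypothetical 2-D coordinates in a "world" space, and a "mind" space that codes the same items with different numbers (scaled and shifted). The representations themselves differ, but the pairwise dissimilarities line up, which is what representational similarity analysis compares. Euclidean distance is an assumed dissimilarity measure.

```python
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def dissimilarity_matrix(patterns):
    """Pairwise Euclidean distances between item representations."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5
    return [[dist(a, b) for b in patterns] for a in patterns]

def upper_triangle(matrix):
    """Off-diagonal entries above the diagonal, as a flat list."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

# Hypothetical items in a "world" feature space.
world = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
# Hypothetical "mind" space: same items, different code (scaled and shifted).
mind = [(2.0 * x + 5.0, 2.0 * y - 1.0) for x, y in world]

world_rdm = dissimilarity_matrix(world)
mind_rdm = dissimilarity_matrix(mind)

# Second order comparison: correlate the two sets of pairwise dissimilarities.
second_order = pearson(upper_triangle(world_rdm), upper_triangle(mind_rdm))
print(round(second_order, 6))
```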