Exam 2 Flashcards
What is the entry level of visual recognition?
Sort of analogous to the basic level of categories – the level your visual system recognizes things at
—> objects most easily distinguishable in terms of the relations between their categorical parts
What does it mean (very broadly) to recognize an object?
Match a representation of the stimulus (the visual image) to a representation stored in LTM. When you succeed in finding a match, you’ve recognized the object.
What are some characteristics of the starting point of object recognition – area V1?
~Neurons in V1 respond to bars and edges (simple features) in the visual image
-each neuron responds to a specific combination of location in visual field, orientation, spatial frequency, etc…
~Very sensitive to very particular details – change any one of these properties, and you change which neurons respond to the stimulus
-thus, sensitive to viewpoint
-this means the representations we use for pattern recognition are NOT V1 representations
~V1 is retinotopically mapped
What are the general properties of human object recognition (aka, properties of the representation of shape) and how do they relate to V1? What does this imply about the representations we’re matching to LTM?
Invariant with...
-translation across retina (unlike V1)
-changes in scale (unlike V1)
-left-right reflection (unlike V1)
Sensitive to...
-rotation in picture plane (like V1)
and, to a lesser extent...
-rotation in depth (unlike V1)
Representations we're matching don't have the properties of V1 representations --> we use V1 to compute something else
What do neurons in inferotemporal cortex do?
Provide info about object identity: what we’re looking at
-mostly (but not exclusively) shape
^because shape is really diagnostic for object identity
What are some of the response properties of neurons in IT in macaque monkeys?
~Some (majority?) respond to object shape, independent of viewpoint
~Others respond to particular shapes in particular views
^if something has special relevance, behaviorally, it would be nice to recognize it more quickly –> neurons in IT can learn this!
What is some evidence that neurons in IT can learn?
~Some neurons respond to particular shapes in particular views
-the more you train a monkey on a particular view of a given shape, the more likely you are to find a neuron dedicated to that view
What is visual priming?
Priming: processing something on one occasion makes you faster and more accurate to process that thing and related things on subsequent occasions
-form of learning
How can priming be used as an index of visual representation?
The more two things have in common, the more they prime one another because activating a representation on one occasion makes it easier to activate again on a subsequent occasion
-therefore, the magnitude of priming is a measure of how much the mental representation of one thing has in common with the mental representation of the other
What is the purpose of the different exemplar control condition in priming experiments?
Prime with Exemplar 1, and probe with Exemplar 2
~Different exemplars have:
-same name
-same (or similar) concepts
-different (albeit similar) shapes
SO, priming from Exemplar 1 to Exemplar 2 provides an estimate of non-visual priming
How can we calculate visual priming?
Total priming = visual priming + non-visual priming
SO,
Visual priming = total priming - non-visual priming
What is the purpose of the identical image condition in priming experiments?
Prime from an image to itself
This is a measure of total priming
In summary, how do we estimate the magnitude of visual priming?
- Observe total priming by observing how much a stimulus primes itself in the identical image condition
- Estimate non-visual priming by observing how much one object primes a different object with the same name in the different exemplar condition
- Estimate visual priming by subtracting non-visual priming from total priming
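A worked example with made-up numbers (not from the lecture): if responses are 100 ms faster in the identical image condition (total priming = 100 ms) and 40 ms faster in the different exemplar condition (non-visual priming = 40 ms), then visual priming = 100 - 40 = 60 ms.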
Compare priming for identical images, translated images, and different exemplars. Is the visual representation of shape dependent on the location of the image in the visual field? What about size? What about left/right reflection?
What does this mean about the representation we’re priming?
~Location: NOPE! We see complete visual priming: priming for a different location is equal to priming for the identical image
~Size: also nope, priming for different sizes is equal to priming for the identical image
~Reflection: also also nope, priming for a reflected image is equal to priming for the identical image
SO, the representation we’re priming is far away from V1 (because invariant to things V1 is sensitive to)
What’s the overall conclusion of the more complicated case of whether or not priming is invariant to rotation in depth?
Whether depth rotation matters depends on how it’s measured – the nature of the stimuli and the nature of the task
-if the stimuli have distinct volumetric parts and the same parts are visible in both views, priming is largely invariant to rotation in depth
Basically, use or ignore orientation information to the extent that it’s advantageous
–> orientation is separate from shape –> we match it separately –> so, it can help overcome noise
What’s Biederman’s Recognition-by-Components theory of object recognition?
~Use non-accidental properties of image edges (e.g., in V1) to make inferences about the volumetric (3D) shapes of an object's parts (geons)
~Use the geons and their spatial relations to represent object shape
~Recognize objects based on their geons and the relations among them
What the fuck’s a geon?
Geons are: categories of generalized cylinders
–> within categories, treated the same
~Geons are imprecise
-vagueness makes them robust to variation in viewpoint and makes it possible to recognize objects as members of a category
(vague permits generalization as a natural consequence)
Where do you “get” geons from?
You can recover 3D properties of a geon's shape from 2D non-accidental properties of image edges (e.g., collinearity, parallelism, symmetry, cotermination – properties that almost never arise by accident of viewpoint)
-provides a way to go from 2D information in V1 to representation of 3D object shape
Why is it important that geons are represented categorically?
This means they are naturally robust to variations in viewpoint, and good for category recognition
Even spatial relations between them are imprecise and represented in a categorical way
What’s the sequence of events according to RBC?
- find non-accidental image properties
- use them to characterize the geons in the images (and the relations among them)
- match the geons and relations to an object model stored in LTM
- use the object model to access the concept and name
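A toy sketch of this sequence in Python – the geon names, relation labels, and stored object models below are invented for illustration, not RBC's actual inventory:

# Toy RBC-style matching: extracted geons + a relation get matched to stored object models.
stored_models = {
    "mug":      {("cylinder", "curved-tube"): "attached-to-side"},
    "suitcase": {("brick", "curved-tube"): "attached-to-top"},
}

def recognize(geons, relation):
    # Return the stored model whose parts and relation match the input.
    for name, model in stored_models.items():
        for parts, rel in model.items():
            if set(parts) == set(geons) and rel == relation:
                return name
    return None

# A brick with a curved tube attached on top matches the "suitcase" model.
print(recognize(["curved-tube", "brick"], "attached-to-top"))   # -> suitcase

The point is just that recognition reduces to matching a small set of categorical parts and relations, not a detailed image.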
Give an overview of how the JIM model works.
Layer 1: detects image edges by location in image – retinotopically mapped
2: discovers vertices, axes, and blobs (non-accidental properties!) – still retinotopically mapped
3: discovers geon attributes
-categorical attributes go straight to layer 6 (object memory) for object recognition
-metric attributes go to layers 4&5 to compute relations
4 & 5: decompose object into components (geons) and calculate spatial relations between them (with info from 3)
6: put relations in pairs (with info from 3)
7: put paired relations into objects, and contains neurons that learn to respond to particular objects
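To make the layer 3 -> layers 4 & 5 step concrete, here's a toy Python sketch (the attribute names and values are made up): metric attributes of two geons are converted into the kind of categorical relations that layer 6 binds to geons.

def categorical_relations(geon_a, geon_b):
    # Layers 4 & 5 (roughly): turn metric attributes (vertical position, size)
    # into categorical relations that get bound to the geons in layer 6.
    relations = []
    relations.append("above" if geon_a["y"] > geon_b["y"] else "below")
    relations.append("larger" if geon_a["size"] > geon_b["size"] else "smaller")
    return relations

# e.g., a small curved tube (a handle) sitting above a larger brick (a body):
handle = {"y": 5.0, "size": 1.0}
body   = {"y": 2.0, "size": 4.0}
print(categorical_relations(handle, body))   # -> ['above', 'smaller']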
Why is it important that the JIM model represents all geon attributes as independent in layer 3?
In this layer, information has been torn apart so that the confounding of location is no longer an issue – for example, one neuron will respond to a curved axis regardless of where it is.
This means that all the attributes are independent, and is important for invariance. It also introduces the binding problem…
How does the JIM model represent the binding of geons to their relations?
By synchrony of firing: the units representing a geon's attributes fire in synchrony with the units representing its relations when they belong together (see the next card)
How does the JIM model conquer the binding problem?
~You can’t know which neurons go with which in advance, so binding must be dynamic
~Solution: use non-accidental properties to synchronize bound neurons
-the synchrony relations established in layers 1 and 2, which represent binding, are preserved in later layers
-if synchrony = binding, asynchrony = NOT binding!
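A toy Python illustration of "synchrony = binding" – the unit names and phase tags are invented; the point is that units sharing a firing phase are read out as one bound group, and units on a different phase are kept separate:

from collections import defaultdict

# Units that share a firing phase are treated as bound into one group;
# units on a different phase belong to a different geon.
units = {
    "curved axis":    0,   # geon 1's attributes and relation fire on phase 0
    "round section":  0,
    "above":          0,
    "straight axis":  1,   # geon 2's attributes and relation fire on phase 1
    "square section": 1,
    "below":          1,
}

groups = defaultdict(list)
for unit, phase in units.items():
    groups[phase].append(unit)

print(dict(groups))
# {0: ['curved axis', 'round section', 'above'],
#  1: ['straight axis', 'square section', 'below']}

Since only a few phases can be kept distinct at once, only a few bound groups can coexist – which is the capacity limit in the next card.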
Is there a capacity limit to the JIM system?
Yes! There are only a limited number of groups that can be out of synchrony with one another, so there’s necessarily a capacity limit.
What are some predictions from the JIM model, and which are true?
- Structural description requires attention
-true: you don't generate a representation of an image and its parts and relations without attention
- Perception of relations requires attention
-true
- Object recognition requires a structural description
-false
- Object recognition requires visual attention
- Object recognition no faster than dynamic binding
-therefore, predictions 4 & 5 are false…
What did patient DF suffer from?
Bilateral damage to lateral occipital cortex (the VENTRAL processing stream)
~Phenomenologically blind to shape (visual form agnosia), but her dorsal pathway is intact (governs attention and talks to motor cortex)
–> she walks around fine without bumping into things, but can't recognize or identify the objects she sees
~Demonstrates that the two functions of the visual pathways (ventral recognition vs. dorsal action) are separate!!
Describe that experiment with the bar of light and how DF performed when asked to match its orientation.
~When told simply to match the orientation of the bar of light, she fails because she can’t make explicit visual judgements
~When told that there’s a mailslot on the wall, and instructed to post a letter, she succeeds because this is a MOTOR task!
Give some evidence that neurons in V1 fire in synchrony when bound (Gray and Singer, 1988).
Two neurons with collinear receptive fields…
- take two bars, move one across its receptive field from top to bottom and the other from bottom to top
- both neurons fire, but their spikes are not synchronized
- two bars, move both across the receptive fields from top to bottom (same direction)
- temporal correlation increased
- one long bar that goes across both fields
- fire strongly in synchrony –> signals likelihood of being same object