Perception and Modularity 2: Face and Object Recognition Flashcards
What are the similarities between face and object recognition?
both are fast and automatic
we have a large repertoire for both (look at something and instantaneously know what it is)
How are the goals of face and object recognition different?
-when you recognize an object, you recognize it as a member of a class; being able to recognize individuals is the key aspect of face recognition
-objects get basic-level identification vs individual identification for faces: "what is it?" vs "who is it?"
What is "basic" about the basic level of identification?
-this is really a question about conceptual organization rather than perception
-dog, cat, and bird are more basic than phoebe, robin, and dove
-bird is more general than dove but not as general as animal, so it is still somewhat specific
What are the three main psycholinguistic properties for words of basic level categories?
-the preferred term when naming
-short
-appear first in children’s vocabulary
What information is the most important at the basic level for object recognition?
-parts
-two or three parts are often sufficient to identify an object at the basic level
-we have a fairly restricted set of parts that can be combined in different ways to provide an abstract illustration of the 3D shape of an object
What is some prima facie evidence that parts are identified rapidly and automatically?
-rapid serial visual presentation
-we can recognize the line drawing of an object
-if you were told to raise your hand when you see a flashlight on the screen, you can recognize it with only 100 ms of exposure; it takes longer to raise your hand than to recognize the object
What did Tanaka and Farah do in their experiment with configural vs featural representations?
-taught people to recognize noses and doors in isolation
-they also taught people to recognize whole faces and houses
-they found that people are better at recognizing noses within whole faces than in isolation, but equally good at recognizing doors in isolation and within whole houses
What does the chimera face illusion show in regards to face processing?
-the preference for configural processing of faces is so strong that it is not immediately obvious that you are looking at a face chimera, whereas for objects you can pull out parts fairly readily, even in unusual objects
How can we define the information that allows us to recognize faces through reverse engineering of face recognition?
we can use noise
-start with a base image created by morphing: an average of male and female faces with a neutral expression, so the gender is ambiguous and the expression is neutral
-add somewhat random noise built from patches at a range of orientations and spatial frequencies, overlaid on the image, until we get a noise mask whose components (and the degree to which each is present or absent) are recoverable
-taking the baseline image and adding this extra something means a different array of cells is activated, so we can characterize what is landing on the retina and characterize how stimuli are similar to and different from one another via the noise
-by adding noise to a face image, we can extract which noise pattern is important for which categorical judgment about the face
When we use noise to mask an image what are we replicating?
which Gabors (i.e., simple cells in V1) are activated in response to the surface of a face
When we overlay different noise patterns on the neutral-expression average of male and female faces, what do we see? What question does this finding allow us to ask?
people's judgments of the gender and facial expression of the face change
-what patterns of noise shift the perception of gender and facial expression?
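A minimal sketch of this reverse-correlation logic, assuming we have saved each trial's noise mask and the participant's binary judgment (the names and the +1/-1 coding are illustrative, not the original study's code):

```python
import numpy as np

def classification_image(noise_masks, judgments):
    """Average each trial's noise, weighted by the judgment it produced.

    noise_masks: (n_trials, H, W) noise added to the ambiguous base face
    judgments:   (n_trials,) +1 or -1 (e.g. judged "male" vs "female")
    Returns the noise pattern that systematically shifts the percept.
    """
    masks = np.asarray(noise_masks, dtype=float)
    resp = np.asarray(judgments, dtype=float)
    return (resp[:, None, None] * masks).mean(axis=0)

# Toy usage with random data standing in for real trials.
rng = np.random.default_rng(0)
masks = rng.standard_normal((200, 64, 64))
judgments = rng.choice([-1.0, 1.0], size=200)
diagnostic_noise = classification_image(masks, judgments)
```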
When we have a base image and two sets of class images, where in one set the expression changes from happy to sad and in the other the face changes from female to male, where do we see distortion?
class image set one - happy to sad - distortion around mouth
class image set two - female to male - distortion around eyes and mouth (females have bigger eyes and lips)
How are expression and gender judgments different from individual recognition?
they are driven by local features like smiling and frowning and relative size of eyes and mouth
What information do we use when we try to tell people apart within large categories aka individual identification?
-Gabor filters: these are a reasonable model of the receptive field of V1 neurons; sinusoids at different orientations and spatial frequencies provide an overcomplete representation at a specific retinotopic location, with the location defined by a Gaussian envelope
Example: take Tom Cruise and John Travolta, add noise, and ask "is this Tom Cruise or John Travolta?" Some noisy images are judged 50/50 and some are very Tom Cruise or very John Travolta; the average across people tells us how effectively we reproduced one or the other. We can transform one of these faces into the other by applying a noise mask, which captures something about how these images are being processed in primary visual cortex: we can add noise that taps into its processing properties.
What is a Gabor jet?
a collection of Gabors at a range (from low to high) of spatial frequencies (5 scales) and orientations (8)
What does summarizing the activity in a gabor jet give us?
a simplified model of the information in V1 at a specific location
i.e. the one at the top will respond to thick horizontal lines; as you go down, the others respond more weakly because the orientation is off; we can describe a patch of an image based on how similar it is to these Gabors
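A minimal sketch of a Gabor jet, assuming 5 wavelengths by 8 orientations and a Gaussian envelope whose width tracks the wavelength (these parameter choices are illustrative):

```python
import numpy as np

def gabor(shape, wavelength, theta, phase=0.0):
    """A 2D Gabor patch: a sinusoid at one orientation and spatial
    frequency under a Gaussian envelope (which defines the location)."""
    h, w = shape
    y, x = np.mgrid[0:h, 0:w]
    y = y - (h - 1) / 2.0  # center coordinates on the patch
    x = x - (w - 1) / 2.0
    sigma = 0.5 * wavelength  # assumption: envelope scales with wavelength
    rotated = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * rotated / wavelength + phase)

def gabor_jet(patch, n_scales=5, n_orientations=8):
    """Dot a patch with 5 scales x 8 orientations of Gabors centered at
    one location -- a crude stand-in for the V1 cells at that spot."""
    responses = []
    for s in range(n_scales):
        wavelength = 4.0 * 2**s  # 4 to 64 px, i.e. five frequency scales
        for k in range(n_orientations):
            theta = k * np.pi / n_orientations
            g = gabor(patch.shape, wavelength, theta)
            responses.append(float((patch * g).sum()))
    return np.array(responses)  # 40 numbers summarizing the patch

# Toy usage: a random 65x65 "image patch".
patch = np.random.default_rng(1).standard_normal((65, 65))
jet = gabor_jet(patch)
```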
What does a Gabor-based representation of the face look like, and how do you make one?
if all the faces are lined up and facing the same direction, a simple grid works well
-we can then plot what each point on the grid responds to, with spatial frequency on the x-axis (5 scales) and orientation on the y-axis (8 orientations)
How can you get some viewpoint invariance in a Gabor-based representation of the face?
by quickly finding some landmarks and warping the grid
What is Gabor similarity?
a way to compare two images and get a numerical answer: we can compute the distance between them and see how alike they are
-similarity = Σ(Ma × Mb) / sqrt((Σ Ma²) × (Σ Mb²)), where Ma and Mb are the Gabor magnitude values for images a and b
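A minimal sketch of that computation, assuming Ma and Mb are jet magnitude vectors like the one above; the Euclidean distance used below for the match-to-sample predictions is included for comparison:

```python
import numpy as np

def gabor_similarity(ma, mb):
    """Normalized dot product of two jet magnitude vectors:
    sum(Ma*Mb) / sqrt(sum(Ma^2) * sum(Mb^2)).
    Close to 1 for near-identical patterns, near 0 for unrelated ones."""
    ma = np.abs(np.asarray(ma, dtype=float))
    mb = np.abs(np.asarray(mb, dtype=float))
    return float((ma * mb).sum() / np.sqrt((ma**2).sum() * (mb**2).sum()))

def gabor_distance(ma, mb):
    """Euclidean distance between two jets -- the quantity said to
    predict match-to-sample error rates."""
    ma = np.asarray(ma, dtype=float)
    mb = np.asarray(mb, dtype=float)
    return float(np.linalg.norm(ma - mb))
```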
What is the match to sample task? With what speed can we do this?
people are presented with three faces, one on top and two on the bottom, and have to determine: which of the bottom two faces is the same as the top one?
(we can do this quickly, even with a brief 300 ms stimulus presentation)
As the calculated Euclidean distance between the Gabor (simple cell) representations of two faces grows, what happens to the error rate in the match-to-sample task?
the error rate drops, because the greater the distance, the more metrically dissimilar the two faces are and the easier it is for people to tell them apart
Similarity between two faces during the match to sample task is predicted by a Gabor Jet model with what accuracy?
-a computer V1 representation (the Gabor jet model) can predict very accurately how hard it is to tell two faces apart
-this supports the claim that face recognition is supported in part by processes that are sensitive to the metrical similarity of the surface image
Why are people surprisingly bad at ignoring changes in lighting direction in speeded face recognition?
due to dependence on surface information
-changing the lighting direction interferes with face recognition: sensitivity is lower in different lighting conditions and people are slower to identify people, so reaction time increases while sensitivity decreases
How does contrast inversion affect face processing?
it disrupts face processing and appears to impact processing by reducing our ability to recover surface information
-face processing is dependent on surface information, as modeled by Gabor jets
How does contrast inversion affect object processing?
object processing is less affected because it depends on parts more than surfaces (though you cannot tell what the object is made of under contrast inversion)
What are five important aspects of face individuation that fail to explain much about object identification?
-surface based
-metric sensitivity
-preserves fine detail
-configural
-viewpoint dependent
What did Hayworth and Biederman do in their study for object recognition?
-subjects named a series of briefly presented pictures, shown one at a time, in a first block of trials
-in the second block the pictures included:
-identical pictures
-different exemplars of the same basic level object
-complements
-new items
What are the five things that could possibly mediate the priming of picture naming?
-local image features
-whole object templates
-basic level conceptual or lexical or word production priming
-subordinate level conceptual priming
-parts
Definitions:
Conceptual Priming: This involves the activation of related concepts in memory, facilitating the processing of related information. For example, seeing the word “doctor” might make you faster to recognize the word “nurse.”
Lexical Priming: This focuses specifically on the activation of words and their meanings. For instance, if you hear the word “bread,” you might be quicker to respond to “butter” due to their lexical association.
Word Production Priming: This type involves the facilitation of the production of words following exposure to related stimuli, often assessed in tasks requiring naming or generating words.
Subordinate Level: The most specific category (e.g., “beagle”).
In subordinate level conceptual priming, exposure to a specific example at the subordinate level can facilitate the recognition or processing of related, even more specific concepts. For example, if you hear “beagle,” you might find it easier to think of specific traits or related breeds like “basset hound” or “poodle.”
What were the findings of Hayworth and Biederman in their object recognition task for the complement created by deleting every other line and vertex in the image?
-people are faster and more accurate at all the objects after the first block
-the identical and the complement conditions had the same response time and error percentage, which were lower than for the different exemplar
-this is for the complement created by deleting every other line and vertex in the image, so that if you superimposed the complements they would add up to a complete model - the parts are still preserved this way
For the complement created by deleting every other line and vertex in the image, so that if you superimposed the complements they would add up to a complete model, what would their Gabor jets look like?
they would be the complete opposite of one another, because opposing Gabors are activated when every other line and vertex is deleted
For the complement created by deleting every other line and vertex in the image, so that if you superimposed the complements they would add up to a complete model, what do the erasures preserve?
your ability to recognize the parts of the flashlight
What does the advantage of the identical over the different exemplar condition show?
that priming was visual and was not basic level conceptual or verbal
For the complement created by deleting every other line and vertex in the image, so that if you superimposed the complements they would add up to a complete model, what does the equivalence of the complementary and identical conditions indicate?
that none of the visual priming is due to local features like contours or vertices (unlike faces, which are dependent on local features)
-it seems that priming is dependent on parts
When, instead of deleting half the vertices, Hayworth and Biederman deleted half the parts, what were the results?
-the complement condition had a much greater response time and error percentage than the identical image
-the complement was equivalent to the different exemplar: the complement was treated as an entirely different image
In the parts-complement portion of the study, what did the advantage of the identical over the different exemplar condition reveal?
that priming was visual and not basic level conceptual or verbal
In the parts complement portion of the study, what did the superior performance on the identical condition compared to the complementary parts condition (which was equivalent to the different exemplar) show?
-recognizing the same elephant from a different set of parts is sort of like recognizing a whole other elephant
All in all, what ultimately ended up mediating the priming of picture naming in the Hayworth and Biederman experiment?
parts
What was the main difference found between the feature deletion block and the part deletion block in the study?
-the feature deletion still allowed the parts that make up the object to be recovered
-the priming we see for the feature deletion is the same as the priming we see for the identical drawing, which means they are treated as the same image, processing-wise
-the priming we see for the parts deletion is the same as the priming for the different exemplar, which means the visual system treats the parts complement as a completely new image
In lower level perceptual areas what does repeated stimulation with the same thing lead to?
reduced responding in fmri
In higher level regions we want to know what is perceived as the same via fmri so what can we do?
we can present two different images of the same object, like a drawing and a photograph, and see if there is an attenuated response due to the repetition suppression phenomenon
In what region of the brain has object-based adaptation been observed?
the LOC, or lateral occipital complex
-reduced activity was observed in the LOC when the basic level object was repeated, independent of format
Are mirror reversed objects represented the same in the LOC?
-yes, because the LOC recognizes that it is the same object, just in a different orientation
What is the LOC generally understood to play a role in?
object recognition
What is the LOC in the brain?
it is fairly dispersed across the lateral surface of the occipital lobe, spanning connected sulci and gyri; it shows adaptation, which suggests it represents abstract structure, and it is theorized to be how we recognize shapes, as opposed to faces
How can we use this complement priming paradigm to test what kinds of computations the LOC does to help us recognize objects?
we can use an fmri adaptation test of parts-based priming
What was performed in the fmri adaptation test of parts based priming?
-subjects judged whether the images were of the same exemplar
-a mirror-inverted complement of the feature-deleted image was shown, as was a mirror-inverted complement of the parts-deleted plane
How will cells in V1 see the mirror-inverted complement of the feature-deleted image?
orientation- and complement-wise it does not look the same to V1, because retinotopically it activates a different set of V1 neurons; the LOC, however, does not care about orientation and will say they are the same object (this means the LOC represents objects abstractly)
What were the fMRI bold responses for the local feature deleted block and the parts deleted block?
-for the local-feature-deleted block, the LOC responds with an attenuated curve, the same as for the identical exemplar
-for the parts-deleted block, the LOC responds with a normal (unattenuated) curve, the same as for the different exemplar
What is a part?
-a part seems obvious when talking about a particular object - like the stem of a glass or the handle of a watering can
-in Biederman's geon theory there is a very specific definition in terms of shape
-continuation is needed for an object's parts to be recoverable, while a nonrecoverable image has no contours that allow continuation
How is object recognition composed hierarchically?
-objects are composed of parts
-parts are defined by conjunctions of features
-features are abstractions from images that help us achieve invariance across viewing conditions
-essentially we have an array of cells, or a network, doing feature detection; combinations of edges give rise to corners, different vertices, or parallelism (very important for this domain); the model is binary (is there an edge, yes or no?), and moving up the hierarchy we eventually determine: is this a table?
-we are recovering three-dimensional objects, so we can ask yes-or-no questions (see the toy sketch below)
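A toy sketch of that binary, hierarchical logic; the feature and part names below are purely illustrative, not Biederman's actual geon inventory:

```python
def classify_part(straight_axis, parallel_sides, curved_cross_section):
    """Conjunctions of yes/no non-accidental features -> a part label.
    (Illustrative labels, not the real geon vocabulary.)"""
    if straight_axis and parallel_sides:
        return "cylinder" if curved_cross_section else "brick"
    return "curved-axis part"

def classify_object(part_labels):
    """A crude structural description: which parts are present."""
    if "cylinder" in part_labels and "brick" in part_labels:
        return "maybe a flashlight"
    return "unknown object"

# Yes/no questions all the way up the hierarchy.
parts = [classify_part(True, True, True),   # -> "cylinder"
         classify_part(True, True, False)]  # -> "brick"
print(classify_object(parts))               # -> "maybe a flashlight"
```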
What are some local features used in object identification?
- straightness / smooth continuation - is this line straight or curved?
- cotermination - yes gives a vertex (L, Y, or arrow); no gives a T junction
- parallelism - yes (allowing some bias for perspective in depth) vs no
-these will all be true at different orientations
What are non-accidental properties (NAPs)?
combinations of features that distinguish 3d shapes from one another irrespective of the viewpoint, lighting conditions, contrast, polarity, etc.
What can combinations of NAPs be used to describe?
an inventory of primitive shapes
What does recognition by components depend on? What is not important for this system?
contrastive differences between features based on nonaccidental properties
-metrical differences are not important for this system
What happens to object recognition if you make a metrical change making it more complex or concave?
-if you increase the degree of curvature and make it more concave, nothing happens to object recognition, because the apparent degree of curvature depends on your viewpoint; this differs from face recognition, which does depend on metrical differences
What does recognition by components mean?
we recognize objects using NAPs to identify basic shapes then use those shapes and their 3d spatial relationships to identify the object
-there is an invariance to viewpoint, lighting, surface properties, and inverted contrast
-this is different from V1, which does pay attention to viewpoint and surface properties
How is the classification of objects (including faces vs objects) different from face individuation?
object classification - NAP-defined parts (geons), edge based, viewpoint invariant
face individuation - surface based, metric sensitivity, preserves fine detail, configural, viewpoint dependent
What does the V1-V4 lattice of spatial filters feed into?
-nonaccidental classifiers of orientation and depth discontinuities, grouping and decomposition, and structural description -> classification of objects
OR
-face individuation
How do we study representations in the brain?
so far we have looked at:
-subtraction (do you like faces more than houses)
-adaptation (do you think these two cups are the same thing)
-an alternative is classification (this was also seen in the Freiwald and Tsao paper)
-we can look at a distributed pattern of activation in the brain and from these determine what the stimulus was
Haxby et al looked at the pattern of activity over the ventral temporal cortex (the whole bottom surface of the brain); what did they do?
instead of looking at which voxels were more active for one condition than another they looked at the correlation between the pattern of activity across the stimulus classes
-so, for example, activity for faces in one set of runs was highly correlated with activity for faces collected at another time
-they used a split-half design: they split the data into odd and even runs and then looked at the pattern of response during the face runs
-this is not canonically the face cortex; it is just parts of the ventral temporal cortex - the r value shows how correlated the different runs are, and the response across a bunch of different faces is the same between even and odd runs
-the correlation is low between categories
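A minimal sketch of that split-half correlation analysis, assuming we already have one mean voxel pattern per category for the even runs and for the odd runs (names and data are illustrative):

```python
import numpy as np

def split_half_correlations(even, odd):
    """Correlate each category's mean voxel pattern from the even runs
    with every category's mean pattern from the odd runs.

    even, odd: dicts mapping category name -> (n_voxels,) pattern.
    High diagonal / low off-diagonal = category-specific patterns.
    """
    cats = sorted(even)
    r = np.zeros((len(cats), len(cats)))
    for i, a in enumerate(cats):
        for j, b in enumerate(cats):
            r[i, j] = np.corrcoef(even[a], odd[b])[0, 1]
    return cats, r

# Toy usage: two categories sharing a stable pattern across run halves.
rng = np.random.default_rng(2)
true_patterns = {c: rng.standard_normal(50) for c in ("faces", "houses")}
even = {c: p + 0.3 * rng.standard_normal(50) for c, p in true_patterns.items()}
odd = {c: p + 0.3 * rng.standard_normal(50) for c, p in true_patterns.items()}
print(split_half_correlations(even, odd)[1])  # large on-diagonal values
```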
What was the holistic analysis conducted in the Haxby et al study?
-a holistic analysis across all areas, looking at the correlations after taking out the areas that responded most strongly to those images - the red bars are the entire ventral temporal cortex and the orange bars are after you have removed the cortex that is most selective to that category
-red and orange are positive correlations and blue is negative correlations
What did Norman et al do in regards to the classifier technique?
-did a comparison of correlations
-if brain activity in an area is more similar within a stimulus type than between stimulus types we can say that brain area is somehow representing information relevant to telling those stimuli apart
-they saw high pattern similarity within categories and high pattern dissimilarity between categories
What did Kriegeskorte et al do?
they took a bunch of different stimuli and made the stimuli distinct from one another, even when they belonged to the same group (animate: human and nonhuman; inanimate: natural and artificial), and then mapped the early visual cortex and the LOC in humans and monkeys
What was seen in patterns of representational similarity?
distance between faces is short and distance between houses is short but distance between faces and houses is long
-cool colors in the matrix mean similar and hot colors mean dissimilar
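A minimal sketch of how such a representational dissimilarity matrix can be computed, assuming a stimuli-by-voxels activity matrix and using 1 minus the Pearson correlation as the distance (a common choice, not necessarily the paper's exact one):

```python
import numpy as np

def rdm(patterns):
    """Representational dissimilarity matrix: 1 - Pearson correlation
    between the activity patterns of every pair of stimuli.

    patterns: (n_stimuli, n_voxels) array.
    Small entries mean the region treats two stimuli as similar.
    """
    return 1.0 - np.corrcoef(patterns)

# Toy usage: two "faces" share one pattern, two "houses" share another.
rng = np.random.default_rng(3)
face, house = rng.standard_normal(100), rng.standard_normal(100)
stims = np.stack([face + 0.2 * rng.standard_normal(100),
                  face + 0.2 * rng.standard_normal(100),
                  house + 0.2 * rng.standard_normal(100),
                  house + 0.2 * rng.standard_normal(100)])
print(rdm(stims).round(2))  # low within category, high between
```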
What did Kriegeskorte et al see in the early visual cortex?
-they mapped early V1: you can pick out a small diagonal blue line where identical images elicit identical responses; otherwise there was no correlation - it looks like confetti
-faces were posed in slightly different orientations and a lot of different scales were used, but they were still perceived by cells in V1 retinotopically
What did Kriegeskorte et al see in the human IT?
-the IT seems to be strongly organized by category structure
-in human inferior temporal cortex you can see macro structure: a blue and red checkerboard, where blue is similar and red is dissimilar; animate things are treated the same as one another, inanimate things are treated the same as one another, and faces are treated the same as one another (a small blue box)
What did Kriegeskorte et al see in the primate IT?
the same thing was seen as in the human IT, with monkeys showing less differentiation among faces
-this shows that this phenomenon is conserved across species
-this can be attributed to shared ancestry, or to the fact that these are human-raised monkeys who see this kind of stuff all the time
What did humans find more dissimilar than monkeys in the IT?
-human faces
-what is similar to monkeys is generally similar to humans, but there are some exceptions: for things on the line, monkeys and humans see them the same; things below the line monkeys see as more dissimilar; things above the line humans see as more dissimilar - human faces are more dissimilar to humans, while monkeys see them as just "human" and do not differentiate human faces by individual
-monkeys' dissimilarity scores for human faces are lower, while humans' dissimilarity scores for human faces are higher
Does the aIT represent information that can be used for face individuation?
-they stretched faces and houses and found that:
-in aIT, with 500 voxels, you can tell a face apart from all the other faces in the data set; if you do this anywhere else, you can't
-the FFA responds strongly when you show it a face but does not distinguish between faces; you can only tell who you are looking at by looking at aIT cortex
Where does information get transformed in a way that helps us identify faces and objects?
ventral visual region
How do distributed representations play a role in the process of object and face identification?
-there is not a shoe area or a bottle area; the representation is distributed
Is representational similarity highly conserved across species?
yes
If the FFA is important for face individuation, why does it not seem to contain information that distinguishes individual faces? What intermediate processing stages are there between the FFA (for faces) and LOC (for objects) and the aIT, where there seem to be representations that are important for discriminating both faces and objects?
the FFA might alert you that you have a face and might tell you that the information needs to go to the aIT
-the man who mistook his wife for a hat: his compensatory strategy was to treat faces as objects, identifying them by parts
-overall, we don't know what the FFA is really doing