WEEK 3: ACHIEVING OBJECT CONSTANCY Flashcards

1
Q

Why do we need visual object constancy?

A

The visual system must recognise familiar objects while generalising over irrelevant variation caused by depth rotations, plane rotations, and changes in size, position and lighting. Object constancy must be achieved quickly and accurately: there is no time for slow, iterative or double-checking processes.

2
Q

What does object constancy allow us to do?

A

To access the SAME semantic information whatever view is seen.

3
Q

What are the two factors needed for object constancy?

A

GENERALISATION across variation between stimuli in order to identify objects (= achievement of object constancy), and DISCRIMINATION between stimuli (= categorisation). Both generalisation and discrimination are necessary to access semantic knowledge effectively from the input stimulus, and the trade-off between the two is tricky for our visual system.

4
Q

What are the four alternative accounts of how we generalise in order to categorise objects?

A
  1. defining features
  2. templates
  3. multiple views plus transformations
  4. structural descriptions
5
Q

What is the defining features account?

A

A unique feature distinguishes the target object from the alternative distractors, whatever your viewpoint. But are there defining features that uniquely discriminate all the everyday objects we know, across all the different views we encounter in daily life? Probably not: many different views of everyday familiar objects are possible (what is unique about the side view of a car?). Conclusion: defining features are only useful if just a small set of distinctive objects must be distinguished, so this route probably has little role in everyday object recognition.
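A toy sketch of the defining-features idea, assuming each object can be reduced to a set of features visible from one view (the feature lists below are hypothetical illustrations, not lecture data). An object has a defining feature only if no other object in the set shares it, so adding similar objects quickly leaves some objects with nothing unique:

```python
def defining_features(objects):
    """Map each object to the features that no other object in the set has.

    objects: dict of object name -> set of features visible from one view.
    An empty result for an object means it has no defining feature.
    """
    result = {}
    for name, features in objects.items():
        # Pool every feature belonging to the other objects in the set.
        others = set().union(*(f for other, f in objects.items() if other != name))
        result[name] = features - others
    return result

# Hypothetical side-view features for a small, distinctive set of objects.
small_set = {
    "car": {"wheels", "windows"},
    "boat": {"hull", "mast"},
}

# Hypothetical larger set: adding similar objects removes the car's defining features.
large_set = dict(small_set, bus={"wheels", "windows", "doors"}, lorry={"wheels", "trailer"})

print(sorted(defining_features(small_set)["car"]))  # ['wheels', 'windows']
print(defining_features(large_set)["car"])          # set(): nothing unique any more
```

The boat keeps its unique features in both sets, which mirrors the card's point: the strategy only works while the set of objects stays small and distinctive.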

6
Q

What did Hayward and Williams (2000) find in Experiment 1, when they showed coloured objects in a picture-picture matching task and rotated the second view away from the first (e.g. by 40 degrees)?

A

Experiment 1: group A saw objects that each had a unique colour; group B saw objects with unique parts; group C saw objects with neither unique colours nor unique parts. People were generally worse when the object was rotated from the first to the second view. Group C was the slowest, followed by group B, while group A's response times stayed flat as rotation increased, i.e. view-invariant performance.

7
Q

What did Hayward and Williams (2000) find in Experiment 2, when group A now saw all the objects in the same colour (all grey)?

A

Their performance now got worse (response times were longer) as the number of degrees of rotation increased: performance was now view-sensitive, whereas before it had been view-invariant.

8
Q

What are template theories?

A

They assume we store a different internal representation for each significantly different view of an object that we encounter:
- a side view of a car is then matched to a stored 2D representation of a side view of a car
- a front view of a car is matched to a stored front view of a car, etc.
Once enough different views of a given object have been stored, most input stimuli can be matched fairly directly to one of the stored views. This has large memory demands but requires less online computation (so may be faster) than structural description accounts. Unlike structural description theories, there is no decomposition of the image into its parts and no coding of the spatial relations between the parts. Because we store an internal representation of every significantly different view of every object we see, there is an awful lot of information to store in memory: a combinatorial explosion.
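A minimal sketch of view-based template matching, assuming "images" are tiny binary grids and that matching means picking the stored view with the smallest pixel-wise difference (the grids, object names and the sum-of-squared-differences measure are all illustrative assumptions). The whole image is compared directly, with no parts or spatial relations, and every extra object or view adds another stored template:

```python
def ssd(a, b):
    """Sum of squared pixel differences between two equal-sized grids."""
    return sum((pa - pb) ** 2
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))

# Hypothetical stored 2D views: one tiny binary grid per (object, view) pair.
# Every new object or view adds another entry, so memory grows as objects x views.
TEMPLATES = {
    ("car", "side"):  ((0, 1, 1, 0),
                       (1, 1, 1, 1),
                       (1, 0, 0, 1)),
    ("car", "front"): ((0, 1, 1, 0),
                       (0, 1, 1, 0),
                       (1, 1, 1, 1)),
    ("mug", "side"):  ((1, 1, 1, 0),
                       (1, 0, 1, 1),
                       (1, 1, 1, 0)),
}

def recognise(image):
    """Return the (object, view) key of the stored template closest to the image.

    No parts and no spatial relations are computed: the whole image is compared directly.
    """
    return min(TEMPLATES, key=lambda key: ssd(image, TEMPLATES[key]))

# A slightly noisy side view of a car still matches the stored side view.
noisy_side_view = ((0, 1, 1, 0),
                   (1, 1, 1, 1),
                   (1, 0, 1, 1))
print(recognise(noisy_side_view))  # ('car', 'side')
```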

9
Q

What is the physiological evidence for object-specific cells?

A

Hubel & Wiesel (1962) used electrophysiology to reveal a hierarchy of cells responding to increasingly complex inputs, from the simplest features (edges, corners) upwards. There was speculation that there might even be ‘grandmother cells’ that respond to a particular object. For example, single-cell recording studies have detected hand-specific neurons in monkey inferotemporal cortex (Gross et al., 1972). There do seem to be very specialised cells in our brains, i.e. cells finely tuned to recognise things like hands.

10
Q

For what other body part is there also good evidence of specific cells in monkeys?

A

The face (face-specific neurons in the inferotemporal cortex).

11
Q

What did Bruce et al. (1981) find in a study of face-specific neurons?

A

These cells fired more intensely to more face-like stimuli than to various visual control stimuli, i.e. the cells fire preferentially for faces.

12
Q

What is the imaging evidence for object-specific cells?

A

More recent fMRI studies have demonstrated separate regions in the human fusiform gyrus for the perception of faces (Kanwisher et al., 1997), stick figures (Peelen & Downing, 2005) and body parts (Downing et al., 2001). However, we shouldn’t get carried away by this positive evidence, as it only covers a narrow range of stimulus types. We haven’t found evidence for templates for bananas, trees, pens, etc. (i.e. everyday objects).

13
Q

What other questions are left open by the template account?

A
  • How can so many views be stored and accessed efficiently?
  • What if the input does not exactly match a stored template?
    - When do we decide to store a new view?
    - How do we recognise previously unseen views?
14
Q

What is the conclusion about template theories?

A

Templates are usually considered too inflexible and too expensive in resources (e.g. memory capacity) to be a general solution. Templates may let us recognise a few types of biologically or socially important stimuli (faces, hands, human bodies), but they seem unlikely to be useful for general-purpose object recognition; they may be used only in narrow circumstances, i.e. for things that are socially relevant to us, such as faces and hands. There is, however, good evidence that we store a large amount of detailed visual information, but this does not necessarily mean that we just store templates.

15
Q

What study suggests we can store a large amount of detailed visual information (evidence FOR the template theory)?

A

Brady et al. (2008): 14 people were shown 2,500 photos over 5.5 hours and then did a two-alternative forced-choice (2AFC) task. On each trial you see two different images and have to say which one you saw before. There were three conditions (novel, exemplar and state); the exemplar and state conditions were the hardest, but most people got the right answer in each condition. Conclusion: huge and detailed storage capacity for visual images.
[Figure: example 2AFC test pairs for the novel, exemplar and state conditions, with the number of people choosing the correct item shown for each pair.]

16
Q

What is another study that suggests we store a huge amount of visual information?

A

Vogt & Magnussen (2007): at study, subjects saw 400 photos of doors; one group saw the original photos and the other saw edited photos. They were then tested with a forced-choice task after delays of 30 mins, 150 mins, 2 days and 9 days. At test, both groups saw 50 photos identical to their study photos and 50 photos of new doors that they hadn’t seen. The group shown the original photos was better at recognising them than the group shown the edited photos. Conclusion: huge and detailed storage capacity for visual images.

17
Q

What are the multiple views + transformation theories of object recognition?

A

To improve the efficiency of matching and to reduce the number of different templates that must be stored, a transformation process is used (e.g. mental rotation, alignment or interpolation). This transformation process normalises the input image to match the size, position and view of a stored template view (e.g. Tarr and Pinker, 1985: the multiple views + transformation hypothesis), making template matching more useful for general object recognition. Now you need far fewer templates but must carry out many transformations; this still reduces memory requirements, although some argue it is still a lot to memorise.
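A minimal sketch of normalising an input before template matching, assuming shapes are 2D point sets with known point-to-point correspondence. Position and size are removed by centring and rescaling, and view (here, only picture-plane rotation) is handled by a brute-force search over rotations. This search is an illustrative stand-in, not a claim about how mental rotation, alignment or interpolation is actually implemented:

```python
import math

def normalise(points):
    """Remove position and size: centre on the centroid, scale to unit RMS radius."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centred = [(x - cx, y - cy) for x, y in points]
    scale = math.sqrt(sum(x * x + y * y for x, y in centred) / n)
    return [(x / scale, y / scale) for x, y in centred]

def rotate(points, degrees):
    """Rotate points anticlockwise about the origin."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def mismatch(a, b):
    """Sum of squared distances between corresponding points."""
    return sum((ax - bx) ** 2 + (ay - by) ** 2 for (ax, ay), (bx, by) in zip(a, b))

def transform_and_match(image, template, step=5):
    """Normalise both shapes, then search rotations of the input for the best fit.

    Returns (rotation applied to the input in degrees, remaining mismatch).
    """
    img, tpl = normalise(image), normalise(template)
    best = min(range(0, 360, step), key=lambda d: mismatch(rotate(img, d), tpl))
    return best, mismatch(rotate(img, best), tpl)

# Hypothetical stored view: an asymmetric L-shaped outline.
template = [(0, 0), (0, 3), (1, 0), (2, 0)]
# Input: the same shape rotated by 40 degrees, doubled in size and shifted.
image = [(2 * x + 5, 2 * y - 1) for x, y in rotate(template, 40)]

angle, error = transform_and_match(image, template)
print(angle)  # 320, i.e. rotated back by 40 degrees
```

Only one template is stored here, but every recognition attempt pays for the rotation search. That is the memory-for-computation trade the card describes.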

18
Q

Why might the memory requirements for multiple views + transformation theories still be too much?

A

Inputs can only be transformed over a limited range of viewing positions to match any individual template, so multiple templates of different views of the same object must still be stored. We must align the input to the nearest stored representation before we can match it to that template representation.
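A small sketch of why a limited transformation range still forces multiple templates, assuming views vary by rotation about a single axis and that each template can absorb at most a fixed number of degrees of rotation (the `max_range` values and stored angles are hypothetical). The input is aligned to the nearest stored view, and matching fails if that view is out of range:

```python
import math

def angular_distance(a, b):
    """Smallest difference between two viewing angles, in degrees (0 to 180)."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def match_view(input_angle, stored_angles, max_range):
    """Return the nearest stored view if the input can be transformed onto it, else None."""
    nearest = min(stored_angles, key=lambda s: angular_distance(input_angle, s))
    return nearest if angular_distance(input_angle, nearest) <= max_range else None

def views_needed(max_range):
    """Fewest evenly spaced stored views that cover a full 360-degree rotation."""
    return math.ceil(360 / (2 * max_range))

stored = [0, 90]                     # hypothetical stored views of one object
print(match_view(100, stored, 30))   # 90: within range of the 90-degree view
print(match_view(200, stored, 30))   # None: too far from every stored view
print(views_needed(30))              # 6 templates for this one object
```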

19
Q

What does the MVPT (multiple views plus transformation) account suggest we should be better at recognising?

A

Upright views, because you are more likely to have seen upright views. Other views take longer to work out because you have to transform the object back to its original view.

20
Q

What does Tarr’s multiple views plus transformation account suggest?

A

Tarr and Pinker (1989):

  1. People will be better at recognising familiar views of an object than new views (if object recognition is view-sensitive).
  2. Object recognition will get worse (slower and less accurate) the further an input view has been rotated from the nearest stored view.
  3. This view sensitivity will increase if the recognition task is harder (e.g. for subordinate-level compared to basic-level recognition).
  4. New views of an object can be stored, and once enough views of an object have been stored, its recognition may be view-invariant. Recognition is harder when the objects are more similar. For more familiar objects you will be able to store lots of new views, because you see them from lots of different viewpoints.
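Predictions 1, 2 and 4 can be sketched with a toy response-time model, assuming response time is a fixed cost plus a cost that grows linearly with the rotation needed to reach the nearest stored view. The linear form and the parameters (500 ms base, 2 ms per degree) are made-up assumptions for illustration, not values from Tarr and Pinker:

```python
def angular_distance(a, b):
    """Smallest difference between two viewing angles, in degrees (0 to 180)."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def predicted_rt(input_angle, stored_angles, base_ms=500.0, ms_per_degree=2.0):
    """Hypothetical response time: a fixed cost plus a cost that grows linearly
    with the rotation needed to reach the nearest stored view."""
    nearest = min(angular_distance(input_angle, s) for s in stored_angles)
    return base_ms + ms_per_degree * nearest

# Only the familiar (e.g. upright) view stored: cost rises with rotation.
familiar_only = [0]
print([predicted_rt(a, familiar_only) for a in (0, 60, 120)])   # [500.0, 620.0, 740.0]

# Storing a new view at 120 degrees removes the cost for that view.
with_new_view = [0, 120]
print([predicted_rt(a, with_new_view) for a in (0, 60, 120)])   # [500.0, 620.0, 500.0]

# With many stored views, the worst case flattens towards view invariance.
many_views = list(range(0, 360, 45))
print(max(predicted_rt(a, many_views) for a in range(360)))     # 544.0
```

Prediction 3 could be modelled by a larger `ms_per_degree` for harder, subordinate-level tasks.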
21
Q

Multiple views plus transformation theories have lower memory demands but higher what?

A

Computational demands.

22
Q

What is the evidence against multiple views plus transformation theories?

A

The computational demands might make the process too slow. Also, what are the transformations? Tarr and Pinker originally proposed mental rotation, but there is evidence against this claim.

23
Q

What are the conclusions about the multiple views plus transformation theories?

A

A promising approach, but current accounts are insufficiently detailed to be properly assessed.

24
Q

What are structural description theories?

A

The strongest versions of structural description theories assume that, whatever angle we see an object from, we always access the same 3D model, and this model should always match a stored 3D model of that object. Each 3D model comprises hierarchically arranged parts and the spatial relations between them. This was suggested back in the 1970s: we have a skeleton model of the main components, and object recognition proceeds by working out which of these skeleton models is out there in the world (internal reconstruction).

25
Q

What is the problem with structural description theories?

A

They are very computationally complicated, because the structure of something has to be worked out online. Also, the visual system is only partially able to achieve object constancy, whereas the strongest versions imply complete constancy.

26
Q

Who are the two main researchers associated with structural description theories?

A

Marr (1982)

Biederman (1987)

27
Q

What did Marr (1982) suggest about structural description theories?

A

He suggested that we internally reconstruct the 3D objects that we see. Achieving object constancy is then trivial: all views of a given object match onto the same stored representation. But this reconstruction seems difficult, computationally intensive and probably slow. Can it explain human object recognition, which is fast as well as accurate?

28
Q

What did Biederman (1987) suggest about structural description theories?

A

He instead suggested that the visual system extracts simple, non-accidental features (parallel lines, T-junctions, etc.) that are invariant across different views of an object. These features are used to infer which of an alphabet of 3D parts is present in the image and how they are spatially related to each other. He proposed that there are fewer than 50 types of parts (cube, block, wedge, etc.), which Biederman called ‘geons’. He also said that objects are described by two or more geons and the spatial relations between them: this is a GSD, a geon structural description.

29
Q

Where does the representational power of Biederman’s approach come from?

A

From the huge number of possible combinations of a small set of parts (geons) in different spatial relations to each other, a bit like the way the 26 letters of the alphabet can produce millions of words.
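The combinatorial arithmetic can be sketched as follows, assuming for illustration 36 geon types (within the "fewer than 50" on the card) and 5 relation types (a made-up number). The sketch also assumes a simple chain structure in which each extra part adds one relation. It counts possible descriptions, not real objects:

```python
def n_descriptions(n_geons, n_relations, n_parts):
    """Count chain-like descriptions: n_parts geons joined by (n_parts - 1) relations.

    Each part can be any geon type and each link any relation type.
    """
    return n_geons ** n_parts * n_relations ** (n_parts - 1)

# Illustrative numbers only: 36 geon types and 5 relation types.
print(n_descriptions(36, 5, 2))   # 6480
print(n_descriptions(36, 5, 3))   # 1166400

# The alphabet analogy: 26 letters give 26 ** 5 possible five-letter strings.
print(26 ** 5)                    # 11881376
```

The count grows exponentially with the number of parts. That is why a small alphabet of parts can, in principle, describe a very large number of objects.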

30
Q

According to Biederman’s account, what is not coded?

A

Metric differences. So although his geon-based representations can be used to distinguish different basic-level objects (cat vs dog, chair vs bench, etc.), they cannot distinguish different faces or similar pairs of objects (subordinate-level recognition): because different faces are made of the same geons, they are indistinguishable.

31
Q

What are the two types of spatial relations (according to Cooper and Wojan, 1996)?

A

Categorical and coordinate. Categorical relations can tell a face from a chair, but not Anne’s face from Deb’s. Coordinate relations are needed to code metric (quantitative) differences, which are necessary to tell two faces apart.
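A toy contrast between the two relation types, assuming face parts are reduced to (x, y) positions (the positions are hypothetical). The rule that picks "above/below" versus "left of/right of" by comparing offsets is an illustrative assumption, not Cooper and Wojan’s model. The categorical descriptions of the two faces are identical, but the coordinate descriptions differ:

```python
def categorical(a, b):
    """Coarse relation of part a to part b (y increases upwards)."""
    dx, dy = a[0] - b[0], a[1] - b[1]
    if dx == 0 and dy == 0:
        return "same place"
    if abs(dy) >= abs(dx):
        return "above" if dy > 0 else "below"
    return "right of" if dx > 0 else "left of"

def coordinate(a, b):
    """Metric relation: the exact displacement of part a from part b."""
    return (a[0] - b[0], a[1] - b[1])

def describe(face, relation):
    """Describe eyes-to-nose and nose-to-mouth using the given relation type."""
    return (relation(face["eyes"], face["nose"]),
            relation(face["nose"], face["mouth"]))

# Hypothetical part positions (x, y) for two faces.
anne = {"eyes": (0, 10), "nose": (0, 6), "mouth": (0, 2)}
deb = {"eyes": (0, 12), "nose": (0, 7), "mouth": (0, 3)}

print(describe(anne, categorical) == describe(deb, categorical))  # True: same description
print(describe(anne, coordinate), describe(deb, coordinate))      # ((0, 4), (0, 4)) ((0, 5), (0, 4))
```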

32
Q

What relations does Biederman’s account use?

A

Categorical relations.

33
Q

How does Biederman claim constancy is achieved in terms of the spatial relations between parts?

A

Three similar-looking mugs with different metric variation would all activate the same GSD representation (categorical relations, i.e. a cylinder next to a curved cylinder), despite the different sizes and aspect ratios of the two cylinders. This is how constancy is achieved.

34
Q

What is the exception for Biederman’s account?

A

Plane rotation. As the bucket rotates in the picture plane, the categorical (as well as the coordinate) spatial relation between the two cylinders changes: the handle is first ‘above’, then ‘right of’, ‘left of’ and then ‘below’.
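A sketch of why plane rotation breaks a categorical description, assuming the bucket’s body and handle are reduced to two hypothetical centre points and the same toy above/below/left/right rule is used. Rotating the picture anticlockwise in 90-degree steps changes the handle’s categorical relation at every step, so the GSD changes:

```python
import math

def categorical(a, b):
    """Coarse relation of part a to part b (y increases upwards)."""
    dx, dy = a[0] - b[0], a[1] - b[1]
    if dx == 0 and dy == 0:
        return "same place"
    if abs(dy) >= abs(dx):
        return "above" if dy > 0 else "below"
    return "right of" if dx > 0 else "left of"

def rotate(point, degrees):
    """Rotate a point anticlockwise about the origin (the picture plane)."""
    t = math.radians(degrees)
    x, y = point
    return (x * math.cos(t) - y * math.sin(t), x * math.sin(t) + y * math.cos(t))

body = (0.0, 0.0)      # hypothetical centre of the bucket's main cylinder
handle = (0.0, 1.0)    # hypothetical centre of the curved handle, upright view

for angle in (0, 90, 180, 270):
    print(angle, categorical(rotate(handle, angle), rotate(body, angle)))
# 0 above / 90 left of / 180 below / 270 right of
```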

35
Q

What does Biederman’s structural description theory of object recognition predict?

A

This theory predicts no effect of viewpoint on recognising an object (i.e. object constancy) if three conditions are met for the geon structural description (GSD):

  1. THE OBJECT CAN BE REPRESENTED BY A GSD
  2. OTHER, TO-BE-DISCRIMINATED OBJECTS MUST ACTIVATE DIFFERENT GSDs
  3. THE VIEWPOINT CHANGE MUST NOT CAUSE THE OBJECT’S GSD TO ALTER (SO THE OBJECT MUST PRODUCE A STABLE DESCRIPTION OF ITS PARTS)
It predicts that we are insensitive to view changes, so we achieve constancy, especially across rotations in depth. But there are conditions: a change in viewpoint need not produce a different structural description, and constancy is only predicted when it does not.
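The three conditions can be written as a small check, assuming a GSD is represented as a hashable tuple and that we know which GSD is recovered from each view (the GSDs and view labels below are hypothetical). The function returns True only when all three conditions hold, i.e. when the theory predicts no viewpoint effect:

```python
def predicts_view_invariance(gsd_by_view, distractor_gsds):
    """Check the three conditions under which no viewpoint effect is predicted.

    gsd_by_view: view label -> GSD recovered from that view (None if no GSD).
    distractor_gsds: GSDs of the other objects that must be discriminated.
    """
    gsds = list(gsd_by_view.values())
    # 1. The object can be represented by a GSD.
    if any(g is None for g in gsds):
        return False
    # 2. To-be-discriminated objects must activate different GSDs.
    if any(g in distractor_gsds for g in gsds):
        return False
    # 3. The viewpoint change must not alter the object's GSD.
    return len(set(gsds)) == 1

# Hypothetical GSDs written as (geon, relation, geon) tuples.
mug_side = ("cylinder", "side-attached", "curved cylinder")
mug_from_above = ("cylinder", "top-attached", "curved cylinder")  # relation recovered differently
lamp = ("cone", "on top of", "cylinder")

print(predicts_view_invariance({"0 deg": mug_side, "45 deg": mug_side}, [lamp]))        # True
print(predicts_view_invariance({"0 deg": mug_side, "90 deg": mug_from_above}, [lamp]))  # False: GSD changed
print(predicts_view_invariance({"0 deg": mug_side, "45 deg": mug_side}, [mug_side]))    # False: distractor shares GSD
```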
36
Q

What does Biederman’s account say about view invariance?

A

It predicts only partial view invariance. A given representation will match a range of views but cannot cope with major changes. Is it a halfway house between Marr’s strong account and multiple views plus transformation theories?

37
Q

Does Biederman’s account or Marr’s fit the experimental data better?

A

Biederman’s

38
Q

What is another advantage of Biederman’s (1987) theory?

A

It is relatively explicit: he says what information is coded and what is not. Contrast this with, for example, Tarr and Pinker’s (1989) multiple views plus transformation account, which is rather vague. Giving a precise account of what information we encode makes Biederman’s account quite testable.

39
Q

What is the conclusion about structural description theories?

A
  • A popular approach
  • Computationally expensive
  • Current theories do not fully predict the pattern of view sensitivity in human visual object recognition
  • But more explicit than some multiple-views-plus-transformation accounts
  • There are elements of each account in the other accounts
  • The initial two accounts are the precedents for the later accounts
  • In the next few lectures we will examine how well they explain object constancy