lecture 2- object perception Flashcards
how are the same object at different perspective processed?
- recognise objects as same
- but the visual info reaching the retina is different for each viewpoint
what is meant by gestalt psychology?
- ‘grouping principles’ of perceptual organisation
- group things in mind depending on different criteria
- parts of image seen as belonging together (these parts are likely to arise from the same object)
- ‘whole is more than the sum of parts’
what are the gestalt grouping principles?
- similarity (e.g in luminance, shape, colour)
- proximity (tend to see groups where closer together)
- closure (closing areas of space into e.g a square)
- good continuation
- common fate (discs moving together at same speed and direction vs. still ones)
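NOTE (illustrative sketch, not from the lecture): the proximity principle can be mimicked by putting dots into the same group whenever they lie within some distance of each other. The function name `group_by_proximity` and the threshold `max_gap` are assumptions made up for this example.

```python
from math import dist


def group_by_proximity(points, max_gap):
    """Group 2D points so that points within max_gap of a group join it.

    max_gap is a hypothetical distance threshold chosen for illustration.
    """
    groups = []  # each group is a list of points
    for p in points:
        # Every existing group with at least one member close to p
        near = [g for g in groups if any(dist(p, q) <= max_gap for q in g)]
        # Merge those groups together with p into one group
        merged = [p]
        for g in near:
            merged.extend(g)
            groups.remove(g)
        groups.append(merged)
    return groups


# Three dots close together on the left, two close together on the right
dots = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
groups = group_by_proximity(dots, max_gap=1.5)
print(sorted(len(g) for g in groups))  # [2, 3]
```

The dots fall into a group of three and a group of two, matching the impression of two clusters.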
what is a figure-ground?
- area bounded by contour (closure) is seen as a separate object
- contours seen as belonging to one object at a time (can only see one option at a time, not simultaneously)
what is Marr’s model of recognition?
- primal sketch of what objects look like- 2D rep of luminance (enables us to detect edges and contours)
- 2 1/2 D sketch- description of depth, orientation, shading, texture, motion, binocular disparity- (viewpoint dependent)
- 3D model- description of 3D shape of object (view point invariant)
- analyse image with a range of edge filters (like the Hubel & Wiesel cells in visual cortex)
- use gestalt grouping (continuity) to find outline
- segment outline at concavities (dips), from there identify principal axis of sections
- define arrangements of parts (cylinders): start with biggest (principal axis) and work progressively through smaller
NOTE: visibility (angle) of principal axis is important
explain Biederman’s ‘recognition by components’ computational model
- edge extraction- surface characteristics: luminance, texture, colour
- detect arrangement of edges: curvature, parallel, co-terminating, symmetry (don’t alter with viewpoint)
- segment object into components: detect concave parts
- determine GEON type for each component: 36 needed- any combo of these create any object (most only need few)
- determine arrangement of GEONs and match GEON description to memory
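NOTE (illustrative sketch, not from the lecture): the final matching step can be pictured as comparing a description of "which GEONs, in which arrangement" against stored descriptions. The GEON names, relation labels, the `MEMORY` table and the overlap score below are all assumptions for illustration, not Biederman's actual representation.

```python
# Stored structural descriptions: sets of (geon, relation, geon) triples.
# Names and relations are made up for this example.
MEMORY = {
    "mug": {("cylinder", "side-attached", "curved handle")},
    "bucket": {("cylinder", "top-attached", "curved handle")},
    "torch": {("cylinder", "end-to-end", "cone")},
}


def match_to_memory(description, memory=MEMORY):
    """Return the stored object whose description overlaps most with the input."""

    def overlap(name):
        stored = memory[name]
        # Shared triples divided by all triples (Jaccard overlap)
        return len(description & stored) / len(description | stored)

    return max(memory, key=overlap)


# Same two GEONs as a mug, but the handle is attached on top
seen = {("cylinder", "top-attached", "curved handle")}
print(match_to_memory(seen))  # bucket
```

The mug and bucket entries use the same components, so only the arrangement tells them apart.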
what are the problems with Biederman’s model?
- does not differentiate objects within class (telling difference between different faces/mugs)
- does not use surface pattern
- recognition is viewpoint invariant (can recognise regardless of viewing angle) BUT can matter when see object from
what are the two object processing pathways? (Ungerleider & Mishkin 1982)
- ‘what’- bilateral removal of inferior temporal area (TE) = severe impairment in object discrimination, but can grab correct option well
- ‘where’- bilateral removal of posterior parietal cortex = impairment in landmark discrimination, but could do novel object discrimination
- DOUBLE DISSOCIATION
what are the symptoms of object agnosia and which pathway does it affect?
- ventral visual pathway
- no loss of intelligence
- failure to recognise objects
- no simple visual impairment
- can see edges but cannot put them together
- may draw object ok but not recognise drawing
what is the case study for object agnosia?
DF- CO poisoning, Milner’s posting task
- can: accurate guidance of hand and fingers toward object (can reach/grasp/pick up)
- can’t: report object size/shape/orientation (when asked to indicate size with fingers, or to match the angle of an object to the slot while holding it in front of it)
what is the ‘titchener circles’ illusion? (Aglioti 1995)
- target circle size is influenced by surrounding array of circles
2 options: - discs physically the same (but look different)
- discs perceptually the same (but actually different)
- perception of disc size strongly affected by illusion BUT grip aperture correct despite it
- shows object analysis for action separate from conscious perception of object
what are the symptoms of optic ataxia and which pathway does it affect?
- dorsal visual pathway
- difficulty completing visually-guided reaching tasks
- difficulty reaching in the right direction
- difficulty positioning fingers correctly towards an object
- little relationship between grip aperture and object size
what are views on the purpose of vision?
- early models: to construct an internal model of reality (outside = reality, trying to create accurate version of outside in head)- foundation for all visually derived thought and action
- now: more focus on ‘requirements of vision’ (Goodale & Milner 1992)- considering what visual info is doing for us (instead of making neural representation- modularity based upon what ‘uses’ vision can have)
what are the requirements of vision?
- vision for perception (identification of object- ‘object-centered’)
- vision for action (how we can change that thing in space- ‘viewer-centered’)
- require different transformations of visual signals
- need to encode size, orientation and location of objects relative to others
- start with this frame of reference- needs a perceptual representation of objects that transcends particular viewpoints (while preserving info about spatial relationships)
what is the function of the LOC?
- lateral occipital cortex
- how we recognise the same object in more than one location => identity representation
- location-tolerant object info and object-tolerant location info
- doesn’t matter where object is, can still recognise
what is the single cell representation of objects?
- Hubel & Wiesel found hierarchical processing in visual cortex
- simple cell to complex cell to hyper-complex cell (which has the end-stopping property)
- pattern-processing in inferotemporal cortex
what object features are the different regions of the temporal lobe responsible for?
V1: edges
V2: contours
V4: colour & shape
PIT: simple features
AIT: elaborate features
what is the cell selectivity for pattern processing in the inferotemporal cortex?
- cells respond to code shape, colour and texture
- respond to all objects with these properties
- generalise across position
- organised in columns through cortex surface (e.g columns of object-selective cells in AIT)
- posterior region cells more orientation- and size-specific (general), anterior region more responsive to specific objects
what are the steps in Riesenhuber & Poggio’s 2002 hierarchical model of object recognition?
- input image, first processed by simple cells (different cells for different areas of visual space) => different orientations in different space coded for
- complex cells provide input into hypercomplex and V2 cells
NOTE: two options for simple to complex cells - MAX operation (complex cell will respond to which simple cell responds most)
- pooling (weighted sum of info, adding composites together)
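NOTE (illustrative sketch, not from the lecture): the two options for combining simple-cell responses into a complex-cell response can be written out directly. The response values and weights are made up, and the function names are assumptions for this example.

```python
def max_pool(simple_responses):
    """MAX operation: the complex cell takes the strongest simple-cell response."""
    return max(simple_responses)


def weighted_sum_pool(simple_responses, weights):
    """Pooling: the complex cell takes a weighted sum of all simple-cell responses."""
    return sum(w * r for w, r in zip(weights, simple_responses))


# Three simple cells covering neighbouring positions (made-up firing rates)
target_only = [0.0, 0.9, 0.0]    # preferred bar at the middle position
with_clutter = [0.3, 0.9, 0.0]   # same bar plus a weaker stimulus nearby
weights = [1 / 3, 1 / 3, 1 / 3]

print(max_pool(target_only), max_pool(with_clutter))
print(weighted_sum_pool(target_only, weights), weighted_sum_pool(with_clutter, weights))
```

In this toy case the MAX output is the same with and without the extra stimulus, while the weighted sum changes.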
what are the properties of the hierarchical model of object recognition?
- anatomically and physiologically plausible- based on known connections and properties of brain cells (from V1 to IT)
- based on earlier hierarchical models
- copes with viewpoint dependence and independence
- incorporates theories of learning
- copes with multiple objects and objects in different context
what is the role of context in object recognition?
- context can be normal or abnormal
- within scene -> recognition of objects is easier in correct context
- within object-> word context biases interpretation
- word superiority effect- detecting a letter is easier when in a word
explain bottom-up processing of letter recognition
- stimulus is detected by low-level feature detectors tuned to features of the letter
- these low-level detector cells connect to mid-level pattern detectors, which hold a fuller representation (more like the letter)
- mid-level excites the correct high-level object detector and inhibits the incorrect ones
explain top-down influence on letter recognition
- we have memorised concepts in temporal lobe (e.g what has a tail? rat and cat-yes, mat- no)
- excites possible highs, inhibits ones that don’t match concepts
- excites suitable mids
- anatomy- more connections descend than ascend
what happens when we combine bottom-up processing and top-down influence in letter recognition?
- bidirectional processing models: info flow is bottom-up and top-down
- expectations (of what something might be) lower threshold for likely items (allows to detect better)
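NOTE (illustrative sketch, not from the lecture): the idea that expectation lowers the detection threshold can be shown with one comparison. The function `detect`, the `expectation` term and all numbers are assumptions made up for this example.

```python
def detect(evidence, base_threshold, expectation=0.0):
    """Report an item as detected when bottom-up evidence reaches threshold.

    expectation (0 to 1) is a hypothetical top-down term that lowers the
    threshold for items that fit the current context.
    """
    threshold = base_threshold * (1.0 - expectation)
    return evidence >= threshold


# Same weak bottom-up evidence for a letter, without and with word context
weak_evidence = 0.4
print(detect(weak_evidence, base_threshold=0.5))                   # False
print(detect(weak_evidence, base_threshold=0.5, expectation=0.3))  # True
```

The bottom-up signal is identical in both calls; only the top-down term changes the outcome.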
what is the role of temporal context and object permanence in expectations?
- we expect an object moved behind a screen to reappear with the same form when screen removed
- Bower: >6-month-olds are surprised if the object is gone when the screen is removed (they expect the object to still be there)
- indicates that at this age they have object permanence, e.g. for their own toy (know objects still exist when hidden)
how does similarity factor into object representation?
- similarity underlies how objects are represented and organised (objects are represented in brain in terms of similarity)
- therefore guiding classification, naming, behaviour
- similar objects = similar neural representation
what makes objects similar/dissimilar? explain the experiment used here
Cichy 2019- investigated link between brain activity and different object properties
- groups: asked to judge similarity of shape, function, colour, background or freely without instruction
- asked to put similar objects close to each other within circle depending on group condition (close together = similar)
- compared perceived similarity (free vs. other to see which closest)
- ‘free’ group similar to ‘function’ group
what do we find when we compare psychological similarity matrixes against similarity matrixes from different brain voxels?
- lower perceptual distance = more similar
- perceived similarity of objects related to ventral visual cortex activity
- representations emerge within 200ms- object colour first (earlier in hierarchy), then shape, then background and free-arrangement in ventral temporal cortex
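NOTE (illustrative sketch, not from the lecture): comparing a psychological similarity matrix with a brain-derived one usually means correlating their off-diagonal entries. The matrices below are made-up numbers for three objects, and Pearson correlation is used only for simplicity; it is not necessarily the measure used in the study.

```python
from math import sqrt


def upper_triangle(matrix):
    """Flatten the entries above the diagonal (each object pair once)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical dissimilarities (0 = identical) between three objects
behaviour = [[0.0, 0.2, 0.9],
             [0.2, 0.0, 0.8],
             [0.9, 0.8, 0.0]]
brain = [[0.0, 0.3, 0.7],
         [0.3, 0.0, 0.6],
         [0.7, 0.6, 0.0]]

r = pearson(upper_triangle(behaviour), upper_triangle(brain))
print(round(r, 2))
```

A high correlation would mean that object pairs judged as similar also evoke similar activity patterns in that region.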
how might AI perform object recognition and what do we need it to do?
- should provide us with accurate and meaningful information
- an understanding of context/place can be used to help determine objects
- usual method is supervised learning- train network on known objects
what are the two requirements for supervised learning?
- algorithm must be suited to task
- must have training dataset with appropriate ‘coverage’ (led to rise of multi-million datasets of objects)
what is a convolutional neural network (CNN)?
- a deep-learning algorithm that learns features directly through data
- (for objects) CNN + 1.2 million object database + 2.5 million scene database ~ human performance
- takes an image and convolves it with filters across successive layers; early layers represent edges
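NOTE (illustrative sketch, not from the lecture): convolving an image with a small filter gives a map of where that filter's feature occurs. The filter below is hand-set to respond to a dark-to-light vertical edge; in a real CNN the filter values are learned from data. The function name `convolve2d` and the tiny image are assumptions for this example.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image with no padding.

    As in most CNN libraries, the kernel is not flipped (cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Luminance image: dark on the left (0), light on the right (1)
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
vertical_edge = [[-1, 1]]  # responds where luminance increases left to right

print(convolve2d(image, vertical_edge))  # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```

The output is non-zero only in the column where the luminance changes, i.e. at the edge.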
what do deep neural networks (DNNs) do?
- predict behavioural assessment of perceived similarity
- show spatial invariance: can extract relevant features, handle noisy/cluttered images, and perform image classification/object detection/semantic segmentation
what does an AI system need in order to run CNNs/DNNs?
- large computers
- models that do not ‘over-fit’ (fitting noise in the training data, i.e. detecting patterns that aren’t really there)
- layers to be interpretable
- ability to handle irregular structures