Object recognition with reading Flashcards
Gestalt approach
A reaction against structuralism. Wertheimer, Koffka and Köhler were the founding Gestalt psychologists. "The whole is different from the sum of its parts."
Gestalt rules
Pragnanz (of several geometrically possible organizations, the one that actually occurs is the one with the best, simplest and most stable shape), similarity, good continuation, proximity, common fate, and meaningfulness or familiarity.
Modern Gestalt principles, Palmer (1999, 1992) and Palmer & Rock (1994):
common region, connectedness and synchrony.
Palmer and Beck 2000
Used the repetition discrimination task to show that grouping affects both how something looks and how quickly we can process it. Subjects looked at images and pressed one button when two circles were next to each other and a different button when two squares were. When the two items fell within the same group, reaction times were significantly faster.
Figure ground segregation
An important step in object recognition. The figure is more "thing-like" and memorable and is seen as in front of the ground; the ground is seen as unformed and as extending behind the figure; the contour separating figure and ground seems to belong to the figure. Symmetrical objects are often seen as figures.
Limitations of gestalt
More descriptive than explanatory. In some cases it is difficult to say what constitutes the “simplest” or “best” form, or how “good continuation” should be defined. A good starting point, though.
Treisman 1987, 1993, 1998
Feature integration theory
Steps in FIT
Identify primitives (preattentive), combine primitives (attentive), perceive the object, compare it to memory, and recognise the object.
Primitives
Curvature, tilt, colour, line crossings, line ends, movement, closed areas, contrast and brightness. If the efficiency of a visual search is independent of the number of items presented, the target is said to pop out. Features that pop out are primitives.
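The pop-out criterion can be made concrete as a search slope: fit reaction time against the number of items and check whether the slope is close to zero. The following is a minimal sketch, not a method from the source. The function names, the example reaction times and the 10 ms-per-item cut-off are all assumptions chosen for illustration.

```python
def search_slope(set_sizes, reaction_times):
    """Least-squares slope of reaction time against set size (ms per item)."""
    n = len(set_sizes)
    mean_x = sum(set_sizes) / n
    mean_y = sum(reaction_times) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(set_sizes, reaction_times))
    var = sum((x - mean_x) ** 2 for x in set_sizes)
    return cov / var


def is_popout(set_sizes, reaction_times, threshold_ms_per_item=10.0):
    """Treat a search as pop-out when adding items barely changes reaction time.

    The threshold is an illustrative assumption, not a value from the source.
    """
    return abs(search_slope(set_sizes, reaction_times)) < threshold_ms_per_item


# Invented example data: mean reaction times (ms) for displays of 4, 8 and 16 items.
sizes = [4, 8, 16]
feature_search_rts = [450, 452, 451]      # flat: behaves like a primitive
conjunction_search_rts = [500, 600, 800]  # grows with the number of items

feature_is_popout = is_popout(sizes, feature_search_rts)
conjunction_is_popout = is_popout(sizes, conjunction_search_rts)
```

With these made-up numbers the feature search has a near-zero slope and counts as pop-out, while the conjunction search costs about 25 ms per extra item and does not.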
Boundaries in FIT
If two regions contain different primitives, a boundary appears between them.
Illusory conjunction
Primitive features from different stimuli (e.g. the colour of one and the shape of another) are sometimes combined inappropriately.
Role of attention in FIT
Attention binds features together into objects
FIT main components and restrictions
Concerned with early feature extraction and processing. Highlights the role of attention, but can only account for basic features.
Biederman
Recognition by components
RBC basics
36 geons, which are invariant and discriminable from almost all viewpoints because of their ‘non-accidental properties’. Resistant to visual noise.
RBC steps
Edge extraction; parse the image into regions at concavities and detect non-accidental properties; determine the components; match the components to object representations in memory. Once the geons can be determined, the object can be recognised.
Principle of componential recovery (Goldstein 2010)
If we can recover an object's geons, we can recognise the object.
Biederman 1987
Removal of contours defining concavities affects object recognition
Biederman & Cooper, 1991
Priming of contour-deleted images. Visual priming is seen if the same geons are intact; otherwise only semantic priming is seen.
RBC problems
Doesn’t distinguish between different types of the same object. No direct evidence for geons, only evidence of a representation of that sort. Also, neurons can distinguish between much smaller differences than geons capture. RBC may need to be refined to show how we distinguish between, for example, faces that have the same features but look different (Goldstein 2010).
Perrett and Oram, 1993
Based on geons alone, we cannot distinguish between two different finches.
Marr theory and steps
Computational approach: image, primal sketch (find contours), 2.5D sketch (viewer centered), 3D model description (object centered, find axes), object catalogue.
Marr Finding contours
Identify sharp changes in contrast (not shadows). Zero-crossings: take the second derivative of the intensity profile and find where it crosses zero, i.e. where the rate of change of the rate of change is 0. Simple cells perform this, acting as edge detectors.
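The zero-crossing idea can be sketched on a one-dimensional luminance profile: take the discrete second difference and look for a sign change. This is a simplified illustration, not Marr's full operator. It omits the Gaussian smoothing normally applied first, and the function names and luminance values are made up for the example.

```python
def second_difference(signal):
    """Discrete second derivative: s[i-1] - 2*s[i] + s[i+1] for interior samples."""
    return [signal[i - 1] - 2 * signal[i] + signal[i + 1]
            for i in range(1, len(signal) - 1)]


def zero_crossings(signal):
    """Return pairs of signal indices that bracket each sign change
    in the second difference (exact zeros are skipped over)."""
    d2 = second_difference(signal)
    crossings = []
    prev = None  # index into d2 of the last non-zero value seen
    for k, value in enumerate(d2):
        if value == 0:
            continue
        if prev is not None and d2[prev] * value < 0:
            # d2[k] belongs to signal index k + 1
            crossings.append((prev + 1, k + 1))
        prev = k
    return crossings


# Invented luminance profile: dark region, a sharp rise, then a bright region.
profile = [10, 10, 11, 14, 22, 28, 30, 30, 30]
edges = zero_crossings(profile)

# A uniform region has no second-derivative sign change, hence no edge.
flat_edges = zero_crossings([10, 10, 10, 10, 10])
```

For this profile the crossing is bracketed by samples 3 and 4, which is where luminance rises most steeply (14 to 22).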
Zero-crossings and 2.5D image
Images can be combined from a coarse to a fine scale to get a raw primal sketch. Gestalt principles are used to define where there are blobs, bars, edges and features. This feeds the 2.5D sketch.
3D sketch
Carve up the image at regions of concavity to get a number of 3D primitives. The primitives are generalised cones (Marr and Nishihara 1978), organised hierarchically. Finding the principal axis of each cone makes it possible to identify the nature of the object. Recognition involves matching the 3D model against a catalogue of stored 3D representations. Object centered (regardless of viewpoint). The lengths and arrangement of the cones' axes could distinguish between object classes.
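The catalogue-matching step can be pictured as a nearest-neighbour lookup over axis descriptions. This is a toy sketch only: the catalogue entries, the choice of three axes (torso, neck, leg) and every number below are invented for illustration, and Euclidean distance is an assumed similarity measure, not something the theory specifies.

```python
import math

# Hypothetical catalogue: each class is described by the relative lengths of
# its component axes (torso, neck, leg), scaled so the torso is 1.0.
# All values are invented for illustration.
CATALOGUE = {
    "human": (1.0, 0.2, 1.1),
    "giraffe": (1.0, 1.2, 1.3),
    "dachshund": (1.0, 0.2, 0.2),
}


def normalise(axes):
    """Scale axis lengths by the first (principal) axis so overall size is ignored."""
    principal = axes[0]
    return tuple(a / principal for a in axes)


def recognise(axes, catalogue=CATALOGUE):
    """Return the catalogue class whose axis description is nearest to the input."""
    observed = normalise(axes)
    return min(catalogue, key=lambda name: math.dist(observed, catalogue[name]))


# Axis lengths measured in arbitrary units; only their ratios matter.
tall_animal = recognise((2.0, 2.3, 2.7))
low_animal = recognise((50.0, 11.0, 9.0))
```

Because the description is built from axis ratios, the match depends on neither the object's size nor the viewpoint it was measured from. That is the sense in which the representation is object centered. The sketch also shows the stated limitation: two exemplars with similar axis ratios land on the same class.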
Marr problems and strengths
Covers only the entry/basic level of object recognition.
Arguably still provides no model for distinguishing between different exemplars within the same class. A theoretical approach. Draws heavily on Hubel and Wiesel’s work on simple, complex and hypercomplex cells (Goldstein 2010). Takes into account objects in real-world settings.
Comparison of theories
Many of these theories explain different aspects of perception, so each contributes to current theories. All of them have flaws.
Tarr and Vuong 2002 pessimism
Fairly pessimistic: ‘the truth is, at present, that no single model can explain the range of behavioural, neurophysiological, and neuropsychological data that has been obtained under various conditions. Indeed, perhaps the most significant challenge to any theory is that human object recognition is so flexible, supporting accurate recognition across a myriad of tasks, levels of specificity, degrees of expertise, and changing viewing parameters’.
Tarr and Vuong 2002 two types of theory
One matches three-dimensional object representations to the three-dimensional features of objects (e.g. Biederman’s RBC).
The other uses 2D or 2.5D processes.
Tarr and Vuong 2002 theories should have similar properties
Image must be decomposed into features that are recognisable
Coding of spatial relations of features
Multiple views for single objects to account for features at different viewpoints
Generalisation mechanisms that normalise the object across viewing conditions
Plasticity that can support recognition tasks ranging from the highly specific individual level to the categorical entry level