Object Perception LOs Flashcards
Template approach theory
compares input to a model/template previously stored in memory; the stimulus is categorized only when it is an exact match.
PRO: successfully used by machines (eg. reading MICR numbers at the bottom of a cheque)
CON: cannot handle novel stimuli, cannot handle variations within a stimulus, too many templates required, cannot handle context
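A minimal sketch of exact-match template recognition, under stated assumptions: the 3x3 binary glyphs, the `TEMPLATES` table and the function name `match_template` are hypothetical and exist only to illustrate the card. It shows why a small variation (the same bar shifted one column) fails to match.

```python
# Hypothetical 3x3 binary templates; the glyphs and labels are illustrative only.
TEMPLATES = {
    "I": ((0, 1, 0), (0, 1, 0), (0, 1, 0)),
    "L": ((1, 0, 0), (1, 0, 0), (1, 1, 1)),
}


def match_template(stimulus):
    """Return the label whose stored template equals the stimulus exactly, else None."""
    for label, template in TEMPLATES.items():
        if stimulus == template:
            return label
    return None


exact_i = ((0, 1, 0), (0, 1, 0), (0, 1, 0))
shifted_i = ((1, 0, 0), (1, 0, 0), (1, 0, 0))  # same vertical bar, moved one column left

print(match_template(exact_i))    # matches the stored "I" template
print(match_template(shifted_i))  # no stored template is identical, so no match
```

Handling the shifted bar would require storing another template, which is the "too many templates required" problem in miniature.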
What are the theories of object perception?
Template approach, prototype approach, pandemonium, Marr’s approach, recognition by components
Prototype approach
individual instances not stored, represented as prototype (abstraction of typical or best example of object), categorization based on distance between perceived item and prototype
PRO: more flexible than templates
CON: cannot handle context
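A rough sketch of prototype-based categorization, assuming objects can be summarized as numeric feature vectors. The feature values, the `EXEMPLARS` table and the function names are made up for illustration; the prototype is taken to be the average of the examples, and Euclidean distance is one possible distance measure among many.

```python
import math

# Hypothetical feature vectors (e.g. size, elongation); the numbers are invented.
EXEMPLARS = {
    "bird": [(1.0, 2.0), (1.2, 2.4), (0.8, 1.6)],
    "fish": [(3.0, 0.5), (3.4, 0.7), (2.6, 0.3)],
}


def build_prototypes(exemplars):
    """Abstract each category into one prototype: the mean of its examples."""
    prototypes = {}
    for label, items in exemplars.items():
        dims = len(items[0])
        prototypes[label] = tuple(
            sum(item[i] for item in items) / len(items) for i in range(dims)
        )
    return prototypes


def categorize(item, prototypes):
    """Assign the item to the category whose prototype is closest."""
    return min(prototypes, key=lambda label: math.dist(item, prototypes[label]))


prototypes = build_prototypes(EXEMPLARS)
print(categorize((1.1, 1.9), prototypes))  # a novel item, never stored as an instance
```

Only the prototypes are kept after `build_prototypes` runs, so a novel item can still be categorized, which is the flexibility the card credits to this approach.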
Pandemonium
Stage 1: Image demon: receives sensory input.
Stage 2: Feature demons: analyze input in terms of features; each is activated by its specific feature.
Stage 3: Cognitive demons: determine which patterns of features are present, corresponding to known objects.
Stage 4: Decision demon: identifies the pattern by listening for the cognitive demon shouting the loudest.
PRO: can identify a wide range of stimuli - just specify component features, feature-detectors physiologically relate to cells in the visual system
CON: doesn’t define features, cannot handle organizational principles (Gestalt Laws), cannot handle context effects, cannot be applied to 3-D objects
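The four stages can be sketched as a small pipeline. Everything specific here is an assumption made for illustration: the feature names, the three-letter inventory in `LETTER_FEATURES`, and the scoring rule used for how loudly a cognitive demon shouts (matched features minus unexpected ones, scaled by pattern size).

```python
# Hypothetical feature inventory for three letters; feature names are illustrative only.
LETTER_FEATURES = {
    "A": {"left_oblique", "right_oblique", "horizontal_bar"},
    "H": {"left_vertical", "right_vertical", "horizontal_bar"},
    "L": {"left_vertical", "bottom_horizontal"},
}


def image_demon(stimulus):
    """Stage 1: receive the sensory input (here, already a collection of observed features)."""
    return frozenset(stimulus)


def feature_demons(image):
    """Stage 2: each feature demon fires only if its own feature is present."""
    known = set().union(*LETTER_FEATURES.values())
    return {feature for feature in known if feature in image}


def cognitive_demons(active_features):
    """Stage 3: each cognitive demon shouts according to how well its pattern is matched."""
    shouts = {}
    for letter, pattern in LETTER_FEATURES.items():
        hits = len(pattern & active_features)
        unexpected = len(active_features - pattern)
        shouts[letter] = (hits - unexpected) / len(pattern)  # assumed scoring rule
    return shouts


def decision_demon(shouts):
    """Stage 4: pick the cognitive demon shouting the loudest."""
    return max(shouts, key=shouts.get)


def recognize(stimulus):
    return decision_demon(cognitive_demons(feature_demons(image_demon(stimulus))))


print(recognize({"left_vertical", "right_vertical", "horizontal_bar"}))
```

The sketch also makes one listed weakness visible: the features are simply handed in as labels, and nothing in the model says what counts as a feature or how to extract one.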
Marr’s Approach
defines object with respect to the object itself (object-centered), determines object's primary axis using generalized cones (have an axis of orientation, certain location/centre of mass, overall size).
Create shape descriptions of object at diff levels of detail.
Each level of hierarchy contains info about: axes of cones, arrangement of axes of component cones, internal reference to 3-D description of component models
3-D model description: object centered, invariant over changes in position of viewer (viewpoint invariance)
Object identification: finds match between 3-D model description and stored catalog of 3-D models of known objects
Specificity Index (level of detail): searches through hierarchy of stored info until info in model and in catalog have same level of specificity - bottom up (eg. object -> biped -> human -> male or female)
Adjunct (subcomponent) index (whole-to-parts reference): relations info about components (location, orientation, relative sizes) to help determine object TOP DOWN (eg. human -> arm -> forearm -> hand -> David)
Parent (supercomponent) index (parts-to-whole reference): as each component is identified it provides info on what the whole object is likely to be TOP DOWN (eg. hand -> forearm -> arm -> human -> David)
PRO: doesn’t rely on catalog of features, is economical, handles variation and novel stimuli, allows for top-down processing, accounts for organizational principles (gestalt laws)
CON: physiological evidence is questionable, identifies objects by gross features not details
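A loose sketch of a search by specificity through a stored catalog, going from coarse to fine until the description no longer supports a more specific match. The `CATALOG` entries, the attribute names (`limbs_on_ground`, `arms`, `wings`) and the tests are hypothetical stand-ins; real 3-D model descriptions are built from axes of generalized cones, not from attribute dictionaries.

```python
# Hypothetical catalog: each entry has a coarse test and a list of more specific children.
CATALOG = {
    "object": {"test": lambda d: True, "children": ["biped", "quadruped"]},
    "biped": {"test": lambda d: d.get("limbs_on_ground") == 2, "children": ["human", "ostrich"]},
    "quadruped": {"test": lambda d: d.get("limbs_on_ground") == 4, "children": []},
    "human": {"test": lambda d: d.get("arms") == 2, "children": []},
    "ostrich": {"test": lambda d: d.get("wings") == 2, "children": []},
}


def identify(description, node="object"):
    """Descend the hierarchy while some more specific entry still matches the description."""
    path = [node]
    while True:
        for child in CATALOG[node]["children"]:
            if CATALOG[child]["test"](description):
                node = child
                path.append(node)
                break
        else:
            # No child matched: the description and the catalog agree only up to this level.
            return path


print(identify({"limbs_on_ground": 2, "arms": 2}))
print(identify({"limbs_on_ground": 2}))  # a coarser description stops earlier
```

The second call stops at a coarser level because the description carries no information that would separate the more specific entries.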
Recognition By Components
assumes that the visual scene can be decomposed into constant basic elements called geons; different geons have different non-accidental properties (not an artefact of viewing position but rather a reflection of a property of the world)
Principle of componential recovery: if an object's geons can be determined then the object can be recognized or identified even if the object is partially obscured
Edge extraction -> detection of non-accidental properties / parsing of regions of concavity -> determination of components -> matching components to ____ representations
PRO: has well-defined components, can handle novel stimuli and variation, is economical
CON: geons not always reliably determined, may be too broad (objects also differ in their details), is viewpoint invariant (objects are most easily identified from a canonical (typical) viewpoint)
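A minimal sketch of componential recovery, assuming each known object is stored as a set of geon labels and that recognition picks the object with the largest fraction of its geons visible. The `OBJECTS` inventory, the geon names and the `min_overlap` cutoff are hypothetical. The sketch ignores the relations among geons and the earlier edge-extraction stages.

```python
# Hypothetical geon inventories; a fuller account would also encode relations among geons.
OBJECTS = {
    "mug": {"cylinder", "curved_handle"},
    "bucket": {"truncated_cone", "curved_handle"},
    "lamp": {"cylinder", "truncated_cone", "brick"},
}


def recognize_from_geons(visible_geons, min_overlap=0.5):
    """Return the stored object with the largest share of its geons visible, or None."""
    best_label, best_score = None, 0.0
    for label, geons in OBJECTS.items():
        score = len(geons & visible_geons) / len(geons)
        if score > best_score:  # on a tie, the first object listed is kept
            best_label, best_score = label, score
    return best_label if best_score >= min_overlap else None


# A lamp whose brick-shaped base is hidden behind another object.
print(recognize_from_geons({"cylinder", "truncated_cone"}))
```

Because matching is done on component labels alone, two objects built from the same geons but differing in detail would be indistinguishable here, which echoes the "may be too broad" criticism.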
What is the basis of the structural description approach?
Is different from image-based models
Image-based models: traditional models of visual perception focus on analyzing aspects of 2-D retinal image (junctions, features, etc.), rely on viewpoint-dependent frame of reference, it is difficult to represent a fully 3-D world
Structural-description models: structural description is a set of symbolic propositions about a particular configuration, these are different in the picture domain (2D) but are the same in the object domain (3D), relationships among components are important (eg. brick joined at midpoint to another brick)
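One way to picture a structural description is as an order-independent set of symbolic propositions. The sketch below assumes a hypothetical encoding (part names mapped to part types, plus relation triples); the relation names such as `joined_at_midpoint_to` are invented for the brick example and are not taken from any particular model.

```python
def describe(parts, relations):
    """Build a structural description: typed parts plus relations, ignoring listing order."""
    return (frozenset(parts.items()), frozenset(relations))


# The same T-shaped object as it might be described from two different 2-D views:
# the propositions come out in a different order, but they are the same propositions.
t_from_front = describe(
    {"a": "brick", "b": "brick"},
    [("a", "joined_at_midpoint_to", "b"), ("a", "perpendicular_to", "b")],
)
t_from_above = describe(
    {"b": "brick", "a": "brick"},
    [("a", "perpendicular_to", "b"), ("a", "joined_at_midpoint_to", "b")],
)

# A different configuration of the same two parts (an L shape).
l_shape = describe(
    {"a": "brick", "b": "brick"},
    [("a", "joined_at_end_to", "b"), ("a", "perpendicular_to", "b")],
)

print(t_from_front == t_from_above)  # same object-domain description
print(t_from_front == l_shape)       # same parts, different relationship
```

The T and the L use identical components, so only the relation between them tells the two objects apart, which is why relationships among components matter in this approach.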
How does Stanley’s vision work? What is the current state of robotic vision in autonomous vehicles?
Stanley used environment sensors, positioning sensors and 6 Pentium M computers running Linux
Environment state consists of multiple maps that are combined into a 2-D environment map
Drivable area was determined by laser analysis projected into visual image
Extrapolation made to similar visual areas out of laser range: vision initially classified grass as non-drivable (green area); once the lasers scanned it and concluded grass is drivable, all grass areas in visual range were reclassified as drivable (red area)
Data continually evaluated by a learning algorithm which can adapt to new terrain
Vision not used for steering control but for velocity control
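The laser-to-vision extrapolation can be sketched as a tiny adaptive color model. This is a simplified stand-in, not Stanley's actual software: the class name `DrivableColorModel`, the single RGB mean per terrain, the distance `threshold` and the `learning_rate` are all assumptions chosen for illustration.

```python
import math


class DrivableColorModel:
    """Toy road-appearance model: one running RGB mean per terrain the lasers marked drivable."""

    def __init__(self, threshold=40.0, learning_rate=0.5):
        self.threshold = threshold          # assumed maximum color distance for "similar"
        self.learning_rate = learning_rate  # assumed adaptation speed
        self.means = []

    def _nearest(self, rgb):
        if not self.means:
            return None, math.inf
        index = min(range(len(self.means)), key=lambda i: math.dist(rgb, self.means[i]))
        return index, math.dist(rgb, self.means[index])

    def learn(self, rgb):
        """Called for a pixel whose location the lasers scanned and found drivable."""
        index, distance = self._nearest(rgb)
        if distance > self.threshold:
            self.means.append(tuple(float(c) for c in rgb))  # new kind of terrain
        else:
            lr = self.learning_rate
            old = self.means[index]
            self.means[index] = tuple((1 - lr) * m + lr * c for m, c in zip(old, rgb))

    def is_drivable(self, rgb):
        """Classify a pixel beyond laser range by its similarity to learned terrain colors."""
        _, distance = self._nearest(rgb)
        return distance <= self.threshold


model = DrivableColorModel()
model.learn((120, 120, 120))              # gravel confirmed by the lasers
print(model.is_drivable((40, 160, 50)))   # distant grass: not yet known to be drivable
model.learn((45, 155, 55))                # lasers scan nearby grass and find it drivable
print(model.is_drivable((40, 160, 50)))   # similar grass in visual range is reclassified
```

In this sketch the lasers act as the teacher and the camera generalizes their verdict to similar-looking areas farther ahead, mirroring the grass example in the card.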
NOW: Waymo autonomous vehicles, Tesla enhanced autopilot
Consumer Technologies: autonomous cruise control, automatic parking, lane departure warning, pre-collision braking and throttle management
Current technology: no single sensor currently equals human visual perception, some sensors have capabilities that human drivers do not (sensing through fog with radar), equaling or exceeding human sensing capabilities requires a variety of sensors whose data must be integrated to form a unified representation of roadway and environment