Vision: theories of visual processing flashcards
discuss classifications and types of receptive fields
Receptive fields can be unimodal, e.g. excitatory only, or bimodal, with some regions excitatory and others inhibitory.
Receptive fields can be on-centre or off-centre, or have a more complex structure, e.g. colour-opponent cells.
Receptive field size is inversely related to the acuity of that region.
E.g. small tactile receptive fields on the fingertips, or small visual receptive fields near the fovea.
state the order of neurons in a pathway from photoreceptors to the thalamus
photoreceptors -> bipolar cells -> ganglion cells -> lateral geniculate nucleus (LGN)
explain how receptive fields can have temporal as well as spatial structure
Some cells show sustained activation for as long as a stimulus is present.
Others have a transient response: they fire when the stimulus changes (onset or offset).
Receptive fields can also be non-linear, with no clear excitatory or inhibitory regions.
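A minimal sketch of the sustained/transient distinction, assuming an idealised light that is switched on and off in discrete time steps. The function names are hypothetical and the models are deliberately simplistic: a real cell's response is graded, noisy and has its own time course.

```python
def sustained_response(stimulus):
    """Sustained cell: output follows the stimulus level for as long as it is present."""
    return [max(0.0, float(s)) for s in stimulus]

def transient_response(stimulus):
    """Transient cell: output is the size of the change since the previous time step."""
    previous = 0.0
    out = []
    for s in stimulus:
        s = float(s)
        out.append(abs(s - previous))  # responds to both onset and offset
        previous = s
    return out

# Light switched on at t=2 and off at t=6
stimulus = [0, 0, 1, 1, 1, 1, 0, 0]
print(sustained_response(stimulus))  # [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
print(transient_response(stimulus))  # [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
```

The transient cell is silent while the light is steadily on, which is why such cells are described as detecting change.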
why are cat models not a perfect match for human vision
Cat and monkey models are often used to elucidate human vision, but neither is a perfect match, e.g. cats have only very limited colour vision.
explain the following terms
Simple cells
Complex cells
hypercomplex cells
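Simple cells- respond to flashed bars or edges at a particular orientation and position, with distinct excitatory and inhibitory subregions
Complex cells- respond to a bar at a specific orientation anywhere within the receptive field, often only when it moves in a particular direction
Hypercomplex cells- the stimulus must now also be a particular length (bars that are too long reduce the response)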
within the six-layered cortical structure, where are simple and complex cells predominantly found?
Simple cells are concentrated in layer 4, where input from the LGN arrives, while complex cells predominate in the layers above and below it.
What are visual receptive fields for?
- receptive fields as templates: ‘fly detectors’ in the frog’s retina (Barlow 1953); ‘bug detectors’ in the frog’s retina and optic tectum (Lettvin et al 1959); Hubel & Wiesel’s template matching model of visual cortex
- receptive fields as ‘filters’ (Lettvin et al 1959): lateral inhibition (Ratliff & Hartline); Mach bands; spatial frequency analyzers (Campbell & Robson 1968) (FHS Part 1)
- receptive fields as “zero crossing” detectors (Marr & Hildreth 1980)
explain the theory of receptive fields as templates
Successive stages of processing combine the outputs of earlier neurons, so each neuron builds an increasingly complex interpretation of the visual stimulus.
Converging inputs should produce progressively larger receptive fields at each stage, which is indeed observed.
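A sketch of the convergence idea in Hubel & Wiesel's template model: a hypothetical simple cell sums the outputs of on-centre LGN cells whose centres lie along a vertical line, so its receptive field becomes an elongated, orientation-selective template. Assumptions (not from the cards): each LGN cell is a difference of Gaussians with centre/surround widths 1 and 2, five centres, a 21×21 grid and purely linear summation.

```python
import math

def gaussian(x, y, sigma):
    """2D isotropic Gaussian with unit volume."""
    return math.exp(-(x * x + y * y) / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)

def lgn_cell(x, y, cx, cy, sigma_centre=1.0, sigma_surround=2.0):
    """Assumed on-centre, off-surround receptive field (difference of Gaussians) at (cx, cy)."""
    dx, dy = x - cx, y - cy
    return gaussian(dx, dy, sigma_centre) - gaussian(dx, dy, sigma_surround)

# Hypothetical simple cell: LGN centres aligned on a vertical line
CENTRES = [(0, cy) for cy in (-4, -2, 0, 2, 4)]
GRID = range(-10, 11)

def simple_cell_rf(x, y):
    """Receptive field of the simple cell = sum of the converging LGN receptive fields."""
    return sum(lgn_cell(x, y, cx, cy) for cx, cy in CENTRES)

def response(stimulus):
    """Linear response: receptive-field weight times stimulus, summed over the grid."""
    return sum(simple_cell_rf(x, y) * stimulus(x, y) for x in GRID for y in GRID)

def vertical_bar(x, y):
    return 1.0 if x == 0 else 0.0

def horizontal_bar(x, y):
    return 1.0 if y == 0 else 0.0

print(f"vertical bar:   {response(vertical_bar):.3f}")
print(f"horizontal bar: {response(horizontal_bar):.3f}")
```

The vertical bar covers every excitatory centre, while the horizontal bar crosses one centre and several inhibitory surrounds, so the summed template prefers the vertical orientation.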
evaluate the template matching theory of receptive fields
–No one has ever been able to trace one of these proposed pathways.
–There is far more overlap, or convergence, between the LGN and striate cortex than the model implies.
–The brain has feedback as well as feedforward connections, but the model includes only feedforward ones.
–Timing: the response latencies of simple, complex and hypercomplex cells don’t conform to the sequence implied by the model.
–The logical extension of the theory is ‘grandmother cells’ that respond to only one specific stimulus, which is clearly not correct.
–Specificity: responses are not as specific as the theory requires, although it is impossible to test all possible stimuli.
–There are too few neurons to encode all possible combinations of features (combinatorial explosion).
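The combinatorial-explosion argument is simple arithmetic. This sketch assumes a hypothetical one-neuron-per-combination scheme in which every feature is independent and takes 10 values; the ~86 billion figure is the commonly cited estimate for neurons in the whole human brain, not just visual cortex.

```python
NEURONS_IN_HUMAN_BRAIN = 86e9  # commonly cited estimate (~86 billion)

def combinations(values_per_feature, n_features):
    """Number of distinct feature combinations if every feature is independent."""
    return values_per_feature ** n_features

for n_features in (5, 10, 15):
    n = combinations(10, n_features)
    verdict = "fits within" if n <= NEURONS_IN_HUMAN_BRAIN else "exceeds"
    print(f"{n_features} features x 10 values: {n:.1e} combinations, {verdict} the neuron count")
```

Adding just five more features multiplies the number of required cells by 100,000, which is why dedicating one neuron to each combination cannot scale.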
explain ensemble coding
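The best current model is ‘ensemble’ (population) coding: a stimulus is represented by the pattern of activity across many neurons, each responding to highly abstracted combinations of features.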
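A minimal illustration of ensemble coding, using a population-vector decoder as an assumed stand-in (the cards do not specify a decoding scheme). Eight hypothetical neurons are broadly tuned to motion direction, so no single cell is specific, yet the ensemble pins the direction down exactly. Tuning shape, baseline and gain are illustrative assumptions.

```python
import math

PREFERRED = [i * 45.0 for i in range(8)]  # hypothetical preferred directions (degrees)

def firing_rate(stimulus_deg, preferred_deg, baseline=10.0, gain=8.0):
    """Broad cosine tuning: each neuron responds to a wide range of directions."""
    return max(0.0, baseline + gain * math.cos(math.radians(stimulus_deg - preferred_deg)))

def population_vector(rates):
    """Decode direction as the rate-weighted vector sum of the preferred directions."""
    x = sum(r * math.cos(math.radians(p)) for r, p in zip(rates, PREFERRED))
    y = sum(r * math.sin(math.radians(p)) for r, p in zip(rates, PREFERRED))
    return math.degrees(math.atan2(y, x)) % 360.0

rates = [firing_rate(100.0, p) for p in PREFERRED]
print(round(population_vector(rates), 1))  # 100.0
```

A direction of 100° lies between two preferred directions, so no neuron is "the" 100° cell; the information exists only in the pattern across the population.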
state the difference between additive and subtractive colour mixing
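Additive mixing combines lights (their wavelengths add together); subtractive mixing combines pigments, each of which absorbs (subtracts) some wavelengths.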
explain the color matching experiment
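Participants match a test light of a single (pure) wavelength by mixing three primary lights and adjusting their intensities.
A primary light is one that cannot be matched by a combination of the other two primaries.
Three primaries are needed (and sufficient), so the inference is that human colour vision is trichromatic.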
-> at the time it was thought that particles were emitted from the eye
peak wavelength absorbance of the blue cone
420nm
peak wavelength absorbance of green cone
534nm
peak wavelength absorbance of red cone
564nm
the most common form of colour blindness is a lack of red or green photopigments: what are these disorders called and how might they arise?
Deuteranopia (missing green) or protanopia (missing red), resulting in difficulty distinguishing longer wavelengths of light; together these occur in about 2% of males.
They arise from erroneous unequal homologous recombination:
The prevalence of these disorders is largely due to the high level of homology between the red and green photopigments (about 96% of the amino acids in each sequence are the same). The genes are also located in adjacent stretches of the X chromosome; this, together with the homology of the two photopigments, gives a relatively high likelihood of errors arising during recombination.
what is a lack of blue photopigment called
Tritanopia (lack of blue cones) is comparatively extremely rare, occurring in less than 0.01% of the population.
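Explain the principle of univariance
Three photoreceptor types are needed because:
A single photoreceptor type conflates a weak light at its optimal wavelength with a strong light at a suboptimal wavelength, because its output depends only on how many photons it absorbs, not on their wavelength (univariance). Such lights are metamers: they give an identical response to the visual system, so they cannot be distinguished.
Dichromats also have trouble distinguishing some colours.
Three cone types is a trade-off between acuity and metabolic cost.
More cone types make metamers less likely.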
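A sketch of univariance and metamerism. It uses the peak wavelengths from the cards above but assumes Gaussian sensitivity curves with a 40 nm width; real cone spectra are not Gaussian, so the numbers are illustrative only. A dim light at the L cone's peak is compared with a brighter 620 nm light whose intensity is chosen so the L cone absorbs exactly as many photons.

```python
import math

PEAKS_NM = {"S": 420.0, "M": 534.0, "L": 564.0}  # peak sensitivities from the cards
SIGMA_NM = 40.0  # assumed bandwidth; real cone spectra are not Gaussian

def sensitivity(cone, wavelength_nm):
    """Assumed Gaussian spectral sensitivity, equal to 1.0 at the cone's peak."""
    return math.exp(-((wavelength_nm - PEAKS_NM[cone]) ** 2) / (2 * SIGMA_NM ** 2))

def cone_response(cone, wavelength_nm, intensity):
    """Univariance: output depends only on photons absorbed, not on wavelength itself."""
    return intensity * sensitivity(cone, wavelength_nm)

dim_optimal = (564.0, 1.0)
bright_suboptimal = (620.0, 1.0 / sensitivity("L", 620.0))  # equal L-cone absorption

for cones in (["L"], ["S", "M", "L"]):
    a = [cone_response(c, *dim_optimal) for c in cones]
    b = [cone_response(c, *bright_suboptimal) for c in cones]
    same = all(math.isclose(x, y) for x, y in zip(a, b))
    print(cones, "->", "metamers" if same else "distinguishable")
```

With only the L cone the two lights are metamers; adding the S and M cones gives a different response pattern for each light, so they can be told apart.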
why are we foveal dichromats?
The very centre of the fovea has no blue (S) cones, which prioritizes spatial acuity, so we are dichromats there.
what are the colour opponent channels, and how can this be easily demonstrated?
red – green
blue – yellow
light – dark
Some colours mix while opponent pairs cancel each other out, e.g. red + yellow = orange, but blue + yellow light = white (there is no ‘bluish yellow’).
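how is colour opponency achieved and how can it be modeled?
It is achieved by colour-opponent centre-surround receptive fields in retinal ganglion cells and the LGN, which compare the outputs of different cone types (e.g. an L-cone centre with an M-cone surround).
It can therefore be modeled as a 3D space with three axes: red-green, blue-yellow and light-dark.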
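A sketch of the three opponent axes as a linear recoding of cone signals. The unit weights (and averaging L and M for "yellow") are assumptions for illustration, not measured physiological weights. Because the recoding is linear, a yellow light and a blue light cancel on the blue-yellow axis when mixed, leaving only a luminance signal.

```python
def opponent_channels(L, M, S):
    """Hypothetical linear opponent coding of cone signals (unit weights assumed)."""
    red_green = L - M              # positive = reddish, negative = greenish
    blue_yellow = S - (L + M) / 2  # positive = bluish, negative = yellowish
    light_dark = L + M             # luminance; S cones assumed to contribute little
    return red_green, blue_yellow, light_dark

yellow = (1.0, 1.0, 0.0)  # excites L and M cones, not S
blue = (0.0, 0.0, 1.0)    # excites S cones only
mixture = tuple(a + b for a, b in zip(yellow, blue))

print(opponent_channels(*yellow))   # (0.0, -1.0, 2.0)
print(opponent_channels(*blue))     # (0.0, 1.0, 0.0)
print(opponent_channels(*mixture))  # (0.0, 0.0, 2.0)
```

The mixture sits at zero on both chromatic axes, i.e. it is neutral (white), matching the blue + yellow = white demonstration.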
what is colour constancy? state some evidence for it
The perceived colour of a surface stays the same irrespective of the illumination.
Land’s ‘colour Mondrian’ demonstration
how can colour constancy be explained?
Local adaptation: colours are adjusted to be consistent with their local surroundings
These corrections only happen in the local part of the retina being stimulated
Helmholtz’s theory of unconscious inference: the brain adjusts our perception of the image based on what we know or predict.
state a problem with seeing in 3D
Projecting a 3D environment onto a 2D surface (the retina) loses information
An infinite number of shapes could give rise to the same retinal image
Distance itself is invisible
what are the 3 direct cues our brains can use to perceive an image as 3D
accommodation
vergence eye movements
binocular disparity
explain how accommodation can contribute to 3D perception
The brain can use how much the lens has to change shape to bring the object into focus
Ciliary muscles contract -> the lens becomes more rounded (to focus on near objects)
Codes distance, not depth
Only works up to about 2 m
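Why accommodation only works up to about 2 m can be seen from the optics. Under a thin-lens simplification (an assumption, for an eye focused at optical infinity when relaxed), the extra lens power needed to focus at distance d is 1/d diopters, so each doubling of distance needs half as much change as the one before.

```python
def accommodation_diopters(distance_m):
    """Extra lens power (diopters) needed to focus at distance_m,
    relative to focusing at optical infinity (thin-lens approximation): 1 / distance."""
    return 1.0 / distance_m

# Change in lens power needed when an object's distance doubles
for near, far in [(0.25, 0.5), (0.5, 1.0), (1.0, 2.0), (2.0, 4.0), (4.0, 8.0)]:
    change = accommodation_diopters(near) - accommodation_diopters(far)
    print(f"{near} m -> {far} m: {change:.3f} D")
```

Beyond 2 m the required changes are only a fraction of a diopter, so the signal from the ciliary muscles becomes too small to be useful.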
explain how vergence eye movements can contribute to seeing in 3D
The angle between the two eyes’ lines of sight when both fixate an object indicates how far away it is
Only useful up to about 6 m
Berkeley discounted as a viable source of information
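A sketch of the geometry behind the ~6 m limit. It assumes a typical interpupillary distance of 6.3 cm and a target straight ahead; both are illustrative assumptions. The vergence angle is 2·atan((IPD/2)/d), which changes rapidly at near distances and very little beyond a few metres.

```python
import math

IPD_M = 0.063  # assumed typical interpupillary distance (~6.3 cm)

def vergence_angle_deg(distance_m, ipd_m=IPD_M):
    """Angle between the two lines of sight when both eyes fixate a point straight ahead."""
    return math.degrees(2 * math.atan((ipd_m / 2) / distance_m))

for d in (0.5, 1.0, 2.0, 6.0, 12.0):
    print(f"{d:>4} m: {vergence_angle_deg(d):.2f} deg")
```

Going from 6 m to 12 m changes the angle by only about 0.3 degrees, too small a difference for eye position to signal distance reliably.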
explain how binocular disparity can cause perception of 3D
The difference between the images on the two retinas is used to gauge depth
Codes depth, not distance
3D red-green glasses are an example of how this disparity can be tricked to generate a perception of 3D
Must be scaled by viewing distance to be interpreted correctly
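A sketch of why disparity must be scaled by viewing distance. It assumes a 6.3 cm interpupillary distance and points straight ahead. The relative disparity between two points is the difference of their vergence angles; for small angles it is roughly IPD·Δd/d², so recovering depth requires knowing d.

```python
import math

IPD_M = 0.063  # assumed interpupillary distance

def vergence_rad(distance_m):
    return 2 * math.atan((IPD_M / 2) / distance_m)

def relative_disparity_rad(near_m, far_m):
    """Binocular disparity between two points straight ahead at different distances."""
    return vergence_rad(near_m) - vergence_rad(far_m)

def depth_from_disparity(disparity_rad, viewing_distance_m):
    """Small-angle approximation: depth ~ disparity * distance**2 / IPD.
    The same disparity implies a larger depth at a larger viewing distance."""
    return disparity_rad * viewing_distance_m ** 2 / IPD_M

for d in (0.5, 1.0, 2.0):
    eta = relative_disparity_rad(d, d + 0.1)  # a fixed 10 cm depth step
    print(f"viewing distance {d} m: disparity {math.degrees(eta) * 60:.1f} arcmin")
```

The same 10 cm depth step produces a disparity roughly 1/d² as large at greater distances, so without a distance estimate the disparity alone is ambiguous.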
name the pictorial cues that our brain can use to discern 3D elements of an image
Interposition (occlusion): nearer things hide things further away
Shadow: shape from shading; we expect light to come from above, so flipping the image flips our perception of depth. Cast shadows also provide a cue to height.
Aerial perspective: distant objects look hazier and lower in contrast
Texture gradients: surface texture becomes finer and denser with distance
Height relative to the horizon
state a monocular cue for seeing in 3D
Linear perspective: parallel lines appear to converge with distance
explain the theory of indirect perception
There is insufficient information in the image
Therefore we must make inferences about our environment to perceive it
We use rules of thumb (heuristics) to extract information from the image
explain the theory of direct perception
There is no ambiguity
There is more than enough information for direct pick-up
One of the richest sources of information is movement, and the image changing as the observer moves
This is referred to as optic flow (J.J. Gibson)
explain motion parallax
As the observer moves, near objects move further across the retina than distant objects.
This gives a gradient of motion vectors that shrink into the distance.
Expansion or contraction (moving towards or away from objects) works on the same principle as motion parallax.
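A minimal sketch of the motion-parallax gradient. It assumes a point at distance d in the direction perpendicular to the observer's motion, for which the image moves at v/d radians per second; the 1.4 m/s walking speed is an illustrative assumption.

```python
import math

def angular_speed_deg_per_s(observer_speed_m_s, distance_m):
    """Image speed of a point perpendicular to the observer's direction of travel:
    omega = v / d radians per second (converted to degrees here)."""
    return math.degrees(observer_speed_m_s / distance_m)

walking_speed = 1.4  # m/s, an assumed walking pace
for d in (1, 5, 25, 100):
    print(f"object at {d:>3} m moves {angular_speed_deg_per_s(walking_speed, d):6.2f} deg/s")
```

Image speed falls off as 1/d, which is the gradient of motion vectors that the visual system can read as relative depth.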
explain the kinetic depth effect
Rotating an object enables its 3D shape to be decoded, e.g. from the changing occlusions and shadows
Stereokinetic effect:
Changes in a potentially 3D pattern (e.g. a rotating flat figure) enable us to infer 3D shape
how did Gibson classify information about the outside world?
Extero-specific information (about 3D structure of the environment, objects and layout)
Proprio-specific information (about our own movements within the world). This can be demonstrated with Lee’s swinging room: moving the room creates a distorted version of the optic flow that normally accompanies self-motion, and observers compensate by swaying with the room.
explain Gibson’s theory
The starting point in vision is the optic array, the spatial pattern of light reaching the eye
Movement transforms the optic array
The visual system identifies invariants in the field of view
Perception is direct
Affordances are the end product of the visual system: how a thing relates to the person, i.e. what can be done with it
brain damage can produce optic ataxia: what areas of the brain are damaged and what is this syndrome?
Optic ataxia (parietal cortex damage): able to perceive the orientation of objects but unable to adjust movements to that orientation, so has trouble reaching for and interacting with objects
brain damage can produce visual agnosia: what areas of the brain are damaged and what is this syndrome?
Visual agnosia (occipital cortex damage): unable to describe the orientation of an object (cannot recognise it) but able to interact with it correctly
what do optic ataxia and visual agnosia imply about visual processing?
These two conditions form a double dissociation, implying separate processing streams for perception and action
approaches to understanding perception
Psychological:
Psychophysics: techniques for systematically probing the characteristics of our senses, e.g. measuring thresholds (Weber)
Phenomenology, e.g. illusions and aftereffects (Lectures 4–6): Mach, Hering
Physiological:
Single cell recording e.g. Hubel & Wiesel (Lecture 2)
Computational: modeling of the underlying processes
explain Marr’s model of information processing
Stage: computational goal (“make explicit …”)
Grey-level representation: light intensities
Raw primal sketch: intensity changes
Full primal sketch: contours, boundaries
2½D sketch: surfaces and their orientations
3D representation: 3D structure in object-centered coordinates
how can the visual image be mathematically modeled?
The visual image can be represented by a set of parameters: image = f(x, y, I, λ, t), where x and y are position in the image, I is intensity, λ is wavelength and t is time.
Everything we see can be represented explicitly as some mathematical combination of x, y, I, λ and t (and d, in binocular vision), e.g. an “edge” is computed as dI/dx.
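A minimal sketch of computing an edge as dI/dx, holding y, λ and t fixed so the image reduces to a 1D luminance profile. The derivative is approximated by differences between neighbouring samples; the threshold is a hypothetical parameter.

```python
def derivative(intensity):
    """Discrete approximation of dI/dx: difference between neighbouring samples."""
    return [b - a for a, b in zip(intensity, intensity[1:])]

def find_edges(intensity, threshold=0.5):
    """Positions (between sample i and i+1) where |dI/dx| exceeds the threshold."""
    return [i for i, g in enumerate(derivative(intensity)) if abs(g) > threshold]

# 1D luminance profile: a dark region (I=1) meeting a bright region (I=5)
profile = [1, 1, 1, 1, 5, 5, 5, 5]
print(derivative(profile))  # [0, 0, 0, 4, 0, 0, 0]
print(find_edges(profile))  # [3]
```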
for a boundary between two areas of different luminance (I), how mathematically does a center-surround receptive field function?
As a zero-crossing detector: it effectively takes the second derivative of the luminance profile, which passes through zero at the boundary.
Mathematically this is a Laplacian (∇²) of a Gaussian-smoothed image, also called a Mexican hat filter, and it is closely approximated by a DoG (difference of Gaussians) filter.
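A 1D sketch of a centre-surround (DoG) filter acting as a zero-crossing detector on a step edge. The kernel radius, the centre/surround widths (a ratio of about 1.6 is a common choice for approximating the Laplacian of Gaussian) and the small `min_step` used to ignore numerical noise far from the edge are all illustrative assumptions.

```python
import math

def gaussian_1d(x, sigma):
    return math.exp(-x * x / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def dog_kernel(radius=8, sigma_centre=1.0, sigma_surround=1.6):
    """1D difference-of-Gaussians ("Mexican hat") kernel: narrow centre minus wide surround."""
    return [gaussian_1d(x, sigma_centre) - gaussian_1d(x, sigma_surround)
            for x in range(-radius, radius + 1)]

def convolve_same(signal, kernel):
    """Convolution with edge padding; the output has the same length as the input."""
    r = len(kernel) // 2
    padded = [signal[0]] * r + list(signal) + [signal[-1]] * r
    return [sum(k * padded[i + j] for j, k in enumerate(kernel)) for i in range(len(signal))]

def zero_crossings(response, min_step=1e-3):
    """Indices i where the response changes sign between samples i and i+1."""
    return [i for i in range(len(response) - 1)
            if response[i] * response[i + 1] < 0
            and abs(response[i] - response[i + 1]) > min_step]

profile = [1.0] * 20 + [5.0] * 20  # step edge between index 19 and 20
filtered = convolve_same(profile, dog_kernel())
print(zero_crossings(filtered))
```

The filtered signal is negative just on the dark side of the boundary and positive just on the bright side, so the zero crossing marks where the edge is.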
in Marr’s model, Gestalt grouping mechanisms are used to move from the raw primal sketch to the full primal sketch. These include:
- Prägnanz - the law of simplicity
- Similarity - group similar objects together
- Good continuation - join the dots by the smoothest path
- Proximity - group near things together
- Common region - group objects in the same region of space together
- Uniform connectedness - connected regions with uniform properties are perceived as a single unit
- Synchrony - simultaneous events perceived as belonging together
- Common fate - things moving in the same direction belong together
- Meaningfulness/familiarity - things that form recognizable patterns are grouped together
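Proximity is the easiest of these laws to express as a rule. This is a hypothetical 1D sketch, not Marr's mechanism: dots are sorted and a new group starts whenever the gap to the previous dot exceeds an assumed `max_gap`.

```python
def group_by_proximity(positions, max_gap):
    """Group 1D dot positions: a dot within max_gap of the previous one joins its group."""
    groups = []
    for p in sorted(positions):
        if groups and p - groups[-1][-1] <= max_gap:
            groups[-1].append(p)
        else:
            groups.append([p])
    return groups

dots = [0, 1, 2, 6, 7, 8, 12, 13, 14]
print(group_by_proximity(dots, max_gap=2))  # [[0, 1, 2], [6, 7, 8], [12, 13, 14]]
```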
explain and criticize the theory of generalised cones as a way of understanding the world
Objects are broken down into general shapes (generalised cones) that we perceive as consistent
However, the responses of ‘face’ neurons in macaque monkeys show nothing like this
criticize Marr’s computational approach
Marr’s model is bottom-up, i.e. it assumes all the information needed is present in the grey-level image
A purely bottom-up strategy doesn’t work:
computer scientists have found it impossible to implement without using feedback from later stages in the model to guide processes in earlier stages
every level in the visual hierarchy in the brain projects back to the area from which it receives information (and sometimes to lower ones too)
Marr’s model is far too simple