Lec 6- Flashcards

1
Q

In general, as we move up the hierarchy, what happens to receptive fields and retinotopic maps?
Unlike P- and M-cells, do cells in higher visual areas respond to the linear sum of spots of light?
And what is the current thinking? Explain the different parts: V1 first, then increasing specialisation as we go higher.

A

In general, as we move up the hierarchy, receptive fields become larger, retinotopic maps become less precise or even absent, and the type of stimulus that visual neurons prefer becomes more elaborate.

In particular, unlike P-cells and simple cells, many cells in higher visual areas do not respond to the linear sum of (spots of) light presented within their receptive fields but respond strongly only when stimulated with specific stimulus configurations.

As you can imagine, classifying these cells is a very difficult thing to do because it is difficult to discover what the appropriate stimuli are.
Thus, current thinking is that V1 is a preliminary area, where information about all visual sub-modalities (luminance, colour, movement, stereopsis) is represented. From here, the cortex becomes more specialised, with separate ‘streams’ projecting to different areas, each of which specialises in a different type of information (e.g. colour in V4, motion in MT).

2
Q

V1 revision (Area 17)

What is blindsight, and what does it mean for 'seeing' and V1?

A

Evidence from single-cell recordings, optical imaging and various anatomical staining and labelling techniques indicates that: 1) V1 is retinotopically mapped, 2) has columnar organisation for both orientation and size (spatial frequency), 3) has cells selective for wavelength (blob cells interspersed amongst interblob cells), 4) has some cells selective for direction of motion, 5) is organised for ocular dominance, 6) has cells classified as simple, complex and end-stopped (hypercomplex). Cytochrome oxidase stains for blobs in V1 and thin stripes in V2.

Blindsight
Individual case studies of patients with brain damage to V1 suggest that intact V1 is required for conscious perception. Because such patients claim that they cannot see, but usually have an intact retina and LGN, they are sometimes described as 'cortically blind': consciously they cannot see, but subconsciously they can still respond to visual stimuli.

However, this does not mean that V1 is necessarily the site of ‘seeing’ (indeed, growing evidence suggests it is not), but that for conscious percepts, visual information must first pass through V1 and perhaps then on to further areas.

But not all visual information must pass through V1 to get into visual cortex. There are minor routes to other visual parts of the brain that project directly from sub-cortical areas. These other routes are probably the substrate for the 'seeing without visual awareness' phenomenon known as blindsight. These patients have no V1 and no conscious vision, yet perform well on discrimination tasks: they can report whether a flashed light was in the upper or lower field even though they insist they cannot see it. So V1 is involved in processing visual data but is not necessarily the site of 'seeing', because these patients can still respond to visual information subconsciously.

3
Q

V2: where does this area receive most of its input from, and what does this mean for retinotopic mapping? How do its RFs compare with V1, and what are most of its cells?

Illusory contours, and an example of an illusion (the Kanizsa triangle).

A

This area is the second visual area in the hierarchy (also called area 18 or pre striate cortex) and receives the most substantial part of its input from V1. Its point-to-point connection with V1 means that, like V1, it is retinotopically mapped, though with slightly less precision than V1. Receptive fields are typically larger than those found in V1, and most cells are complex cells.

Many cells in V2 respond to illusory contours.
In the Kanizsa triangle, a white upward-pointing triangle is seen superimposed on the background elements, yet there is no luminance cue for the boundary of the figure, so the contours are illusory. There is also a weak brightness illusion: the triangle in the foreground appears lighter than the background. In the vertical-bar version you can see that the colours are all the same white, because the illusion of the vertical bar is abolished when the gaps in the inducing elements are closed, so it no longer looks 'whiter than white'.

This is because the brain tends to perceive incomplete figures as complete or closed, and prefers continuous, smooth contours to fragmented or discontinuous ones: it fills in the missing parts.

4
Q

Cytochrome oxidase stripes
What does V2 stain for?

What are cells within the thin and thick stripes important for?
Where do they all receive input from?
What are the interstripes important for?

A

Like V1, V2 also stains for cytochrome oxidase (CO). In V2, CO rich areas form ‘thick stripes’ and ‘thin stripes’ with intermediate unstained ‘interstripes’.

Cells within thin stripes appear to be important for colour and receive projections from the blobs in V1
whereas cells in thick stripes appear to be important for stereo vision (see later lecture).
The interstripes receive input from the interblob regions of V1 and are thought to be important for spatial form.

5
Q

Visual streams = DORSAL/VENTRAL PATHWAYS = 'where' and 'what'
The 'what' and 'where' streams

Evidence: are these two streams linked or segregated?
Functional aspects of these streams: how do we know that 'where' is __ and 'what' is __?

A

There is a pair of parallel processing streams through visual cortex. One of these (the 'what' stream) projects to the temporal lobe (IT) and deals with object structure. The other (the 'where' stream) projects to the parietal lobe (e.g. area 7a, which also receives input from the pulvinar) and deals with object location and spatial relations.

There is good evidence for segregation between these two streams. Baizer et al (1991) injected large amounts of two different retrograde tracers into monkey brains. One tracer was injected into part of the parietal lobe and the other into part of the temporal lobe. Upon inspection of earlier visual areas, they found almost no neurons that contained both tracers.

Evidence for the functional aspects of these streams comes from clinical case studies of damage to the relevant brain regions. For example, damage to the parietal lobe (the ‘where’ system) can cause patients to have problems orienting towards or reaching for objects (a condition known as optic ataxia) but does not impair performance in object discrimination and recognition tasks.
On the other hand, damage to the temporal lobe (the 'what' stream) can result in patients having problems with form perception and recognition, and visual memory, but spares performance on visuospatial tasks. So parietal = visuospatial tasks; temporal = what the object is.

6
Q

Once again about the dorsal/ventral streams:
What are they and where do they project to?

Damage here?

A

Where/what pathways:
Dorsal = 'where', so parietal lobe. Ventral = 'what', so temporal lobe.

Dorsal: projects from V1 to V2 to MT (V5), and extends to area 7a.
Ventral: projects from V1 to V2 to V4 and on to IT.

Parietal = dorsal
Damage to the parietal lobe (i.e. to part of the dorsal pathway) can result in optic ataxia, as in the last flashcard: the patient is unable to grasp an oriented postcard correctly but is able to report its orientation, because the temporal lobe still works. They can see the world but cannot act upon it. The dorsal stream guides how we perform actions.

Damage to ventral = temporal
Visual agnosia: an inability to recognise objects.
Or prosopagnosia.

7
Q

V1 vs IT

And damage to IT or the ventral stream.
IT and faces: how are faces encoded?

A

V1 has cortical (retinotopic) maps. It knows nothing about top-down processing: it looks at different edges and orientation distributions, encoding information only about local orientation and features. It cannot tell you where the parts of, say, a flower are in relation to one another.

IT no longer has a retinotopic map; its cells tell us more about structural components, e.g. that the edge is there or the stem is there. IT has a columnar organisation of 'elaborate' cells. In one experiment a region of IT responded only to faces. Faces are encoded by the distribution of activity across a population of cells, and it is that pattern of activity that is specific to an individual face, so on the order of 100 cells are needed to code one individual.

Prosopagnosia typically arises from damage to regions within the ventral stream of the brain, in the inferior temporal cortex (IT). This condition is characterised by an inability to recognise faces, including those of familiar individuals, despite intact visual perception in other respects. It is not solely due to damage in IT cortex; other areas within the ventral stream can be involved. Prosopagnosia can result from various causes, including brain injury, stroke or neurodevelopmental conditions.

Within-class identification can also be impaired: patients cannot tell which member of a class something is, e.g. which type of frog or which make of car. Some people recognise some categories better than others, so a fair comparison is needed. Some patients can no longer identify their own cows or birds as well as they used to, so the deficit may be one of within-class discrimination in general rather than faces in particular.

8
Q

Area 7a

A

This area is located in the posterior part of the parietal lobe and receives information about head and body position as well as visual input.

RFs are large and often span both visual hemifields. So called gaze-locked cells have been reported in this area. The key requirement for these cells is that the animal must be gazing in a particular direction. If the direction of gaze is inappropriate for a cell, then no RF can be found. Thus, these cells will respond to objects only when an animal is gazing in the ‘right’ direction, with different cells preferring different directions.

9
Q

V3
Is it retinotopically mapped? Describe its RFs.
What are cells in V3 selective for?
Is it involved in colour processing, and why?

A

V3 is retinotopically mapped (though less precisely than V2) and its receptive fields are larger than those typical of V1 and V2. Cells have been found that are selective for isoluminant contours.

Cells in V3 are markedly selective for orientation and motion, receiving input via the transient magnocellular pathways from V1 and V2. In V3, form is defined by luminance variations that happen to be moving. Pattern-selective cells have also been found in V3, similar to those in MT.

Early reports suggested that V3 was not involved in processing colour, though this view has since been challenged. For example, Gegenfurtner found cells (similar to those found in V4) that responded to so-called 'isoluminant' stimuli containing chromatic modulation alone. Such stimuli might contain, say, red and green stripes, but if photographed using black-and-white film they would appear featureless, a uniform grey. If cells respond to these kinds of stimuli they must be seeing the chromatic modulation, which means they must have wavelength-selective properties.
So for an isoluminant stimulus the boundary is defined by colour alone, yet V3 cells can respond: the luminance is the same everywhere and the stimulus differs only in colour.

10
Q

MT (middle temporal area)
What are these cells sensitive to?
RFs?
What does damage here lead to?
What is MT also involved in?

How does it differ from V1?

A

All of the cells in this area are sensitive to direction of motion.
They have fairly large receptive fields and are not concerned with the colour of the stimulus.

Unlike V1, the cells in this region have a columnar organisation for motion. Some cells in this area appear to be sensitive to relative motions and could be used in computing form and depth from motion. Furthermore, some of these same cells are also sensitive to stereo disparity, suggesting that depth cues from motion and stereopsis may be combined here.

MT is probably also involved in grouping together individual motion elements to achieve a representation of object motion. V1 cells are only orientation (component) selective, so in V1 there are no 'vertical' cells that will respond to the overall motion of a compound stimulus (e.g. a plaid made of two drifting gratings). However, in MT, some of the cells that prefer vertical component motion will also respond to the compound stimulus. These cells are called 'pattern' cells and correspond more closely with our actual perceptual experience of motion than do the motion-selective cells in V1.

Clinical lesions in MT lead to a condition known as akinetopsia (motion blindness). In this very rare condition, patients appear to have no visual representation of motion. They can recognize objects and understand spatial layout but cannot see things move.

11
Q

MST (medial superior temporal area) vs MT

A

MT (the middle temporal area), also known as V5, is primarily involved in processing visual motion. It is sensitive to the direction and speed of moving visual stimuli. Neurons in MT respond preferentially to motion in specific directions and at specific speeds, making it crucial for motion perception and for tracking moving objects.

MST is located adjacent to MT.
MST is involved in processing complex motion information, including the perception of optic flow (the apparent motion of visual elements in a scene as an observer moves through it) and the perception of three-dimensional motion. MST integrates information from MT and other visual areas to provide a more comprehensive understanding of visual motion and spatial orientation. Some cells are selective for complex 2D motion, e.g. expansions, contractions, rotations and spirals.

MT is primarily responsible for processing basic visual motion, such as direction and speed, while MST is involved in processing more complex aspects of motion perception, such as optic flow and three-dimensional motion.
- Specificity: Neurons in MT are highly tuned to specific directions and speeds of motion, whereas MST neurons are more broadly tuned and respond to a wider range of motion stimuli, including complex patterns of motion.

12
Q

V6 (parieto-occipital visual area)

A

V6, also known as the dorsomedial area (DM), is a region within the visual cortex near the junction of the parietal and occipital lobes.
V6 is part of the dorsal visual stream, which is involved in processing spatial information and guiding actions based on visual input.
It is also called the parieto-occipital visual area (PO).
Gaze-locked cells have been reported here.

V6 is thought to play a role in processing motion and integrating visual information with other sensory inputs to aid in spatial perception and navigation. While not as extensively studied as some other areas of the visual cortex, V6 contributes to our understanding of how the brain processes visual information to navigate and interact with the environment.

13
Q

What have some cells in V6 been called?
Explain this properly below.

A

Real position cells

Real position cells are neurons in the brain’s visual cortex, specifically in the V6 region. Unlike other neurons whose receptive fields move with eye movements, real position cells maintain their receptive fields fixed in the real world. This means they respond to objects in a consistent position relative to the observer’s body, rather than relative to the position of the object on the retina. They help provide information about the stability of objects in the environment as the observer changes gaze direction.

V6 is thought to be responsible for representing space in egocentric coordinates, as opposed to the far less useful retinal coordinates of V1 and V2.
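A minimal sketch of the idea of combining retinal position with eye position; the function name and the simple additive combination of gaze direction and retinal offset are illustrative assumptions, not the lecture's model.

```python
def egocentric_direction(retinal_offset_deg, gaze_direction_deg):
    """Combine where the stimulus falls on the retina with where the eye is pointing.

    A cell whose response depends only on the sum behaves like a 'real position'
    cell: its preferred location stays fixed in head/body-centred space even
    though the retinal position changes with every eye movement.
    """
    return gaze_direction_deg + retinal_offset_deg

# The same object, 20 degrees to the right of the head, under two gaze directions:
print(egocentric_direction(retinal_offset_deg=20, gaze_direction_deg=0))   # looking straight ahead -> 20
print(egocentric_direction(retinal_offset_deg=0,  gaze_direction_deg=20))  # looking straight at it -> 20
```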

14
Q

V4
Retinal (retinotopic) map and columnar organisation for colour
Non-selectivity for light wavelength
Damage to V4
Specific visual loss
Chromatopsia from CO (carbon monoxide) poisoning: why may colour vision remain intact?
Why are blob cells in V1 more resistant to damage from carbon monoxide poisoning?
Is V4 just for colour processing? What else is involved in colour processing?

A
  1. Retinal Map and Columnar Organization for Color: This refers to the structure of the visual cortex, where cells are organized in columns and process specific aspects of visual information. In V4, there’s a particular arrangement for color perception.
  2. Non-selectivity for Light Wavelength: Some cells in V4 aren’t picky about the color of light hitting the retina, which is unusual because one might expect them to be highly specific.
  3. Achromatopsia from V4 Lesions: When V4 is damaged, patients experience achromatopsia, where they see the world only in shades of grey, indicating the critical role of V4 in color vision.
  4. Specific Visual Loss: Unlike damage to other visual areas like V1, which would result in complete blindness, lesions in V4 lead to selective visual deficits, highlighting the specificity of each visual area’s function.
  5. Chromatopsia from Carbon Monoxide Poisoning: In some cases of carbon monoxide poisoning, color vision remains intact, but patients struggle with fine detail and object recognition. This suggests damage to specific regions like the interblob areas in V1, while sparing other regions like the blob areas.
  6. Metabolic Activity and Blood Supply: Blob regions in V1, which are metabolically active and receive a larger blood supply, may be more resistant to damage from carbon monoxide poisoning compared to other regions.
  7. V2 and V4 Connectivity: Thin-stripe regions in V2, involved in color processing, project to V4. However, V4 is not solely dedicated to color; it also processes complex spatial patterns and forms.
15
Q

Colour constancy

A

Color constancy is the ability to perceive the color of an object as relatively constant despite changes in lighting conditions. Essentially, it’s the phenomenon where the perceived color of an object remains the same under different illuminations. For example, a red apple will still appear red whether it’s viewed under natural sunlight or artificial indoor lighting, even though the wavelengths of light reflecting off the apple may vary. This perceptual stability allows us to recognize and identify objects accurately in varying environments.

The wavelengths that reach our eye depend on: the spectral composition of the illumination, and the spectral reflectance properties of the surface. We are mainly interested in the second. Human vision appears to be able to disregard the illumination and deliver results relating to just the spectral reflectance properties of surfaces.

16
Q

A bit more on colour constancy mistakes and tungsten light

A

The spectral composition of tungsten lighting is much more biased towards the long wavelengths (reds) than is natural daylight.

Vision can make small mistakes due to the illuminant, e.g. artificial lighting in shops. Colour constancy might be due to top-down processing, i.e. our brain telling us that bananas are yellow. On the other hand, it might rely on an average estimate of the wavelength over the visual scene: if the illuminant is constant across the scene, it can be cancelled out. Estimate the illuminant, then analyse the object.

Cells in V4 have large RFs, around 30 times larger than in V1, which may underlie the colour constancy effect.
The ability to discount the illuminant = colour constancy.

17
Q

How is colour constancy achieved

A

Top-down.
Bottom-up: comparison of wavelengths across large areas of the retina, which is used to estimate the spectral composition of the illuminant; the illuminant is then discounted by subtraction.
Some cells in V4 have this property: large RFs pulling information from a large area, but delivering a signal about a small patch.

Destroying colour constancy removes that averaging, so you see the patch in isolation and see the true wavelength. The wavelength we normally see is a combination of the averaged illuminant and the wavelength of the object; to see the true wavelength we need to block the averaging and look at the patch in isolation.
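A minimal bottom-up sketch of "estimate the illuminant from a large area, then discount it", in the spirit of a grey-world correction. The array names and the division-by-the-scene-mean rule are illustrative assumptions (the lecture describes discounting by subtraction; division is used here as a simple stand-in).

```python
import numpy as np

# scene: height x width x 3 array of (R, G, B) light reaching the eye,
# i.e. surface reflectance multiplied by the illuminant.
scene = np.random.rand(64, 64, 3) * np.array([1.4, 1.0, 0.6])  # reddish, tungsten-like illuminant

# Estimate the illuminant by averaging over a large region (cf. the large V4 RFs),
# then discount it so the output reflects surface properties rather than the light source.
illuminant_estimate = scene.mean(axis=(0, 1))
constancy_corrected = scene / illuminant_estimate

print(illuminant_estimate)                    # biased towards long wavelengths
print(constancy_corrected.mean(axis=(0, 1)))  # channels now roughly equal: illuminant discounted
```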

18
Q

Why is colour constancy not perfect

A

Examples: cloth viewed under artificial shop lighting (greens and blues can be misjudged); extreme or unusual lighting, e.g. in nightclubs.

Colour constancy exists to deliver constancy in our perception of the world, so that surfaces look the same colour irrespective of the illumination. But the visual system has to make some assumptions about the world, and so it can be misled.

19
Q

Different areas and their names

A

V1 = area 17
V2 = area 18
V5 = MT
Area 7a = parietal lobe
IT = ventral stream. V6 (also called PO) = dorsal stream.

20
Q

What is simultanagnosia?

If 5 visual neurons each have 10 distinct response levels, how many different faces can they code for?

A

Condition where people are unable to see more than one object amongst a group of two or more closely placed objects.

Each neuron has 10 distinct response levels.
With 5 neurons, each additional neuron multiplies the number of distinguishable patterns by 10:
10 × 10 × 10 × 10 × 10 = 100,000 different faces.
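The same counting argument in general form (the symbols $L$ and $N$ are mine, not from the lecture):

$$\text{distinguishable patterns} = L^{N}, \qquad L = 10,\ N = 5 \;\Rightarrow\; 10^{5} = 100\,000.$$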

21
Q

As we move up the cortical visual hierarchy, which of the following is false about visual neurons?

And which are true?

A

Their retinotopic maps become less precise = TRUE, as we go from the highly ordered map in V1 to less ordered maps in higher areas with more complex processing.
Their RFs become larger = TRUE, as they integrate information from larger regions of the visual field.
Their stimulus selectivities become more difficult to characterise = TRUE: responses are complex, varied and more abstract, so they are hard to characterise.
Their stimulus selectivity becomes more refined = TRUE, as they become more selective.

FALSE: it becomes possible to determine their responses to complex stimuli by knowing their responses to small spots of light.
False because, although responses to spots give some insight, the full response to a complex stimulus cannot be predicted from responses to simple stimuli.

22
Q

Ungerleider and Mishkin proposed that the number of major visual streams in visual cortex is...?
Milner and Goodale proposed which dichotomy for primate vision?

A

2 = dorsal and ventral.
Action/perception, which is a reworking of the what/where dichotomy.

Zeki argued that 'what and where' does not provide good insight into the function of the brain.

23
Q

Action vs perception

A

Milner and Goodale suggested a radical reinterpretation of the cortical visual pathways: 'where' should be relabelled action and 'what' should be relabelled perception, the action stream enabling the organism to interact rapidly with its environment.
Perception: working out what is there and what needs to be done, e.g. 'there is a hole, don't go in there'.
Action: the visuomotor control that actually moves your legs away.

24
Q

The visual illusion called the Titchener circles:
evidence for a dissociation between action and perception.

A

The two central circles appear to be different sizes, but the left and right ones are actually the same.
When a circle is surrounded by smaller circles it looks bigger, and when it is surrounded by bigger circles it looks smaller.
This is a spatial contrast effect: the surrounding circles make the middle ones appear to change.

The illusion is a perceptual effect, i.e. carried by the ventral system, which guides perception; the dorsal system is the one that guides actions (e.g. grasping) and is much less fooled by the illusion. This dissociation is evidence for separate action and perception systems.

Ventral = what = perception.
Dorsal = where = action.

25
Q

Which of the following is true about the two-pathway model of vision beyond V1?

Does the dorsal stream make rapid decisions?

A

The ventral stream is thought to be involved in the perception of motion = NO; motion is the dorsal ('where') system.
The ventral system is thought to be involved in directing visually guided actions on the world = NO; the ventral stream handles identification and recognition of objects.
The dorsal system is linked to an inability to recognise faces = NO; that is the ventral stream.
The main distinction between the dorsal and ventral streams is that the ventral stream is concerned with stationary objects whereas the dorsal is concerned with moving objects = NO; the dorsal stream is about motion and 'where', the ventral stream about object identification and recognition, not simply moving versus stationary objects.

TRUE: the dorsal stream is thought to be involved in rapid decision-making processes that do not necessarily lead to conscious visual percepts.

26
Q

Which is generally false

A

Different parts of primary visual cortex separated by a few mm or so encode image data from different regions of the retina = TRUE.
Different parts of IT separated by a few mm or so encode different types of image features = TRUE.
Neighbouring columns of cells in MT are selective for different directions of motion = TRUE.
Neighbouring columns of cells in V1 are selective for different orientations of luminance contour = TRUE.

There are different cells in V4 selective for each one of the hundreds of different colours that we can see = FALSE.

27
Q

According to Zeki, which one of the following visual areas is not well accommodated by the what/where theory?

A

V3

28
Q

David Marr: one way in which human vision may work.
Marr's three levels of understanding an information-processing system (the computational approach): split into 3 further things.

The computational approach to vision, and a computational framework for achieving object recognition: from retinal image to object recognition.

A

Computational theory: sets out the theory behind the problem. Applicable to understanding both the behaviour of visual neurons and psychophysical behaviour.
Thinking at the theory level of understanding: the maths may involve a computer but does not demand one. It is about understanding and defining the mapping or process being investigated: what is the goal of the computation?

Algorithm: the procedural details of how the problem is to be solved, given the limitations imposed by the visual environment and an understanding of its laws.

Implementation: the machinery upon which the algorithm is to be implemented, e.g. our brains. What information is available at the outset of the computation? For any given algorithm there could be several different ways of implementing it.

29
Q

Constraints.
The 2 ingredients of a computational theory.

A

Constraints are assumptions and generalities about the world, but they are important in helping to recover the distal stimulus from the proximal stimulus. Examples:
No two objects can occupy the same position in 3D space at the same time.
Matter is generally cohesive.
Lighting usually comes from above.
(See the five-disc experiment.)

Two ingredients of a computational theory:
1) Specification of the problem: what does human vision need to achieve, and does it meet the criteria?
2) Understanding the constraints imposed by the environment.

30
Q

What is computer vision?

What is a computational model of biological vision?

What is the computational approach to vision?

A

Computer vision: making computers, machines or robots see, i.e. using visual information to drive behaviour inside the machine. It may inspire experimentation in human vision.

Computational model of biological vision: running algorithms on computers to see whether they behave in the same way as the biological system, in order to understand biological vision. It involves getting a computer to 'see', but it is not computer vision, because you are not getting the computer to see something useful; you are running tests on it to work out how our own biological system works. So we do this to understand ourselves.

Computational approach: acknowledging Marr's 3 levels.

31
Q

Bulbous blob

A

The visual system assumes that lighting comes from above, so each blob appears to protrude towards us. Bulbous shapes look bulbous, so the visual system gets it right: it is using the assumption to help its computational processes.

The same bulbous blobs look like indentations when the lighting is placed below, so vision gets it wrong.

Some of these assumptions are important in helping us to constrain our interpretation of the retinal image, and most of the time they are good and correct.

32
Q

The five-disc experiment.
What does it provide a general illustration of: well-posed or ill-posed vision?

The five-disc experiment rule of thumb.

A

If we put five discs in a black box we cannot see into, and look into the box with one eye through a peephole, we see 5 discs. They all appear to be the same size and at the same distance, because they are.

But if we halve a disc's size and also halve its viewing distance (so it is closer), the halved distance would make its retinal image twice as big while the halved size makes it half as big: the retinal image size stays exactly the same as before, so the discs appear to be the same size again.

Move a disc further back and double its size: again the same retinal image.

And so on: there is an infinite number of arrangements that will produce exactly the same retinal image. If we manipulate distance and size together, the retinal image size can be kept the same and the discs will look the same to the observer.
So the retinal image we have at any instant is consistent with any one of numerous interpretations of distance and object size. The visual system can get it wrong, because the world may differ from the perceptual experience it constructs from its assumptions.

The 5-disc experiment provides a general illustration of the ill-posed nature of vision: there is more than one solution, as in the 5-disc experiment. If a problem is well posed, there is a unique solution. In general, vision is ill posed; there were multiple solutions to the disc problem, so the visual system must be making assumptions and performing computations to derive its consistent solution. We stick with one solution, but it can flip.

The 5-disc experiment rule of thumb: if images are adjacent on the retina and of the same or similar size, then presumably the distal stimuli (the real objects) are also the same size. The chance of different-sized objects at different distances randomly producing the same retinal sizes side by side is low.

33
Q

Understanding the 3 levels of the computational approach with a non-biological example:
the cash register.

A

1. Computational theory: calculate the cost of the goods, i.e. addition. It does not matter what order you pay in (commutativity), or how the items are grouped (associativity); only the overall price matters.
2. Algorithm: the prices of the goods are input and the total is output; the algorithm works in pounds and pence.
3. Implementation: originally mechanical, but now electronic technology implements the algorithm on a man-made machine (e.g. gears and pulleys, or silicon).

Each of the 3 levels is independent of the others. The computational theory does not dictate the algorithm, and for the process by which input goes to output there can be more than one solution that works equally well, e.g. as in the five-disc experiment.
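A minimal sketch of the three levels using the cash-register example; the prices, the function name and the use of Python are illustrative assumptions, not part of the lecture.

```python
# Computational theory: the goal is the total cost, and addition is commutative
# and associative, so the order in which items are scanned cannot matter.
prices_pence = [199, 350, 75]
assert sum(prices_pence) == sum(reversed(prices_pence))   # order does not change the total

def total_in_pounds(prices_pence):
    """One possible algorithm: accumulate in pence, convert to pounds at the end."""
    total = 0
    for p in prices_pence:   # a different algorithm (e.g. adding in pounds) would serve the same theory
        total += p
    return total / 100       # representing money as pounds-and-pence is an algorithmic choice

print(total_in_pounds(prices_pence))   # 6.24
# Implementation level: here the algorithm happens to run on a digital computer,
# but the same algorithm could be realised by a mechanical till.
```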

34
Q

Marr's 3 levels of representation now (not the 3 levels of understanding information processing):
raw and full primal sketches.

A

From the retinal image to the first level, the raw primal sketch: a list of basic image elements (bars, blobs) and their properties (contrast, orientation, position); a rich symbolic description of the image.
Full primal sketch: a list of specific parts of the image and their properties grouped together; larger structures are formed using grouping laws.
2½-D sketch: the input to the 3D model, built from cues such as binocular disparity, perspective and optic flow.

3D model: takes its input from the 2½-D sketch. Here objects are described as solid entities using axis-based volumetric and surface primitives.

35
Q

The theory about the 3 stages.
Is information processing top-down or bottom-up?

A

Cube example: the 3D cube we see is the final stage. But before that, the system needs the angles and orientations, and analyses them into a 2D description.

2D: shape, depth cues and surfaces.
Then we see the cube, so computations have to take place first.

Information processing is driven bottom-up: from the retinal image through the raw and full primal sketches, then the 2½-D sketch, and then the full 3D model.

36
Q

Symbolic coding.
What properties does each symbol have?
Why discrete symbols rather than continuous signals?

A

A symbol is a label. Examples: a solid shape in the 3D model, a surface in the 2½-D sketch, and an edge in the raw primal sketch. These symbols are the primitives, the building blocks, at each of the levels.
An edge may be a primitive at the level of the raw primal sketch.

Each symbol has properties, e.g. length, orientation, sharpness, contrast.

Discrete symbols vs. continuous analogue signals:

Continuous analogue signals:
• These signals, like the retinal image, can vary continuously (e.g. light intensity at different points).
• They represent data in a variable, non-discrete manner.

Advantages of discrete symbols:
• Explicit features:
  • Discrete symbols simplify complex information by highlighting essential features.
  • Instead of considering every pixel's light intensity, the visual system uses symbolic codes, such as edges, to build a 3D model of the object.
  • Edges define surfaces, and the connections between surfaces help to identify the object's shape and structure.
• Data compression:
  • Discrete symbols reduce redundancy by focusing only on important aspects of the retinal image.
  • Unnecessary details are discarded, making the system more efficient.
  • This approach allows more information to be processed and stored in a limited space, similar to data compression in digital storage like iPhones.

Discrete symbols offer two key advantages over continuous analogue signals like retinal images: they make important features explicit and enable data compression. By using symbolic codes such as edges to build 3D models, the system emphasizes essential aspects and discards redundant information, improving efficiency and allowing more information to be processed and stored.
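A minimal sketch of what a symbolic edge token might look like compared with the analogue pixel array it summarises; the EdgeToken fields and the numbers are illustrative assumptions, not Marr's actual data structure.

```python
from dataclasses import dataclass

@dataclass
class EdgeToken:
    """A discrete symbol in the raw primal sketch: a labelled edge with its properties."""
    x: float            # position (pixels)
    y: float
    orientation: float  # degrees
    contrast: float
    length: float       # pixels

# A 100 x 100 patch of retinal image is 10,000 continuous luminance samples,
# but if it contains one boundary it might be summarised by a single token:
token = EdgeToken(x=50, y=50, orientation=90.0, contrast=0.4, length=80.0)
print(token)   # the essential features are explicit, and the representation is highly compressed
```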

37
Q

What does biological vision serve as proof of
What does computer vision science aim to do

A

BV serves as existence proof that certain visual problems are computationally tractable.

Computer vision science aims to solve problems in vision.

38
Q

What is the computational approach to vision applicable to

A

Applicable to understanding both the behaviour of individual neurons and psychophysical behaviour.

39
Q

What are the computational level, the algorithmic level and the implementation level, again?

A

Computational level: what the system actually does and why, and how it defines the problem. It requires a specification of the visual problem to be investigated and of the goals of the computation: what needs to be detected, basically. It involves analysing the constraints imposed on the problem by the visual environment.

Algorithmic level: how the computation is carried out; it specifies the particular way in which input is converted to output. Software may be written to express it, but the concept of the algorithm is independent of whether or not it is actually written.

Implementation level: the physical machinery (e.g. neurons, or man-made hardware) on which the algorithm runs.

40
Q

Symbolic codes and analogue codes.

One aim of the primal sketch.

A

Digital code = discrete and symbolic. A symbolic code discards redundant detail, giving data compression.

Analogue code = continuous signals, not symbolic; sensitive to change and subtle variations.

One aim of the primal sketch is to produce a symbolic description of the stimulus rather than an analogue description.

41
Q

According to Marr, what do neuroanatomists and psychophysicists address?

A

Neuroanatomists address primarily the implementation level of vision.
Psychophysicists address primarily the algorithmic level of vision.

42
Q

Where are certain features made explicit

A

Raw primal sketch = where basic features, e.g. edges and gradients, are made explicit.
Full primal sketch = where visual texture, shading and colour regions are made explicit and clusters of features are grouped together.
2½-D sketch = a viewer-centred representation of the spatial layout of the scene's surfaces.
3D model = grouping plus full 3D position; depth information can also be computed at earlier stages.

43
Q

Raw primal sketch:
its goal, and what it does.
What do boundaries and markings tend to show up as?

What is a reasonable starting point?

A

The goal of the Raw Primal Sketch (RPS) is to provide the richest possible description of the image. Object boundaries, surface markings and shadows in the external world are clearly a good thing to know about and so the RPS must provide a description that makes these things easy to recover.

Boundaries and markings tend to show up in images as luminance changes, so a reasonable starting point is to figure out how we might detect and locate luminance changes.

Fig 1 is an image of the Mona Lisa along with a 3D plot of its luminance profile. The job is to find the locations and orientations of luminance changes in this luminance profile. So, in this lecture we will focus on how 'edge' primitives can be found and written into the raw primal sketch. The retinal image is the starting point: how do we go from that to a representation of the stimulus built from basic features like edges?

44
Q

Scales
Luminance changes at multiple scales

A

Luminance changes in images occur at a variety of spatial scales: shadows give gentle changes in luminance at their borders, whereas object boundaries give abrupt changes.
Vision has to estimate the boundaries of shapes and objects; we can see the outlines of objects. Different information is carried at different spatial scales, so visual analysis must be performed at different spatial scales, and we can address this from a computational point of view.

Different spatial scales = different types of information. A coarse spatial scale carries the contour of a cat's arched back; a finer scale carries the contours of the individual hairs on the cat's back.

45
Q

Gaussian weightings to detect all these changes

A

If we are going to detect all these changes, we need to construct analysers that work at different spatial scales.
One convenient way of doing this is to blur the image with Gaussian weighting functions, where the spread of the Gaussian function determines the spatial scale: large spreads produce a lot of blurring (i.e. a lot of spatial averaging) so that large spatial structures (i.e. low spatial frequencies) are emphasised, whereas narrow spreads introduce less blurring, and so finer detail (i.e. high spatial frequencies) is retained.

Thus, broad blurring functions are poor for revealing spatially rapid luminance changes (e.g. a pile of matchsticks) but good for analysing slow luminance changes (e.g. the border of a cloud), and vice-versa for narrow blurring functions.

Fig 2 shows three Gaussian weighting functions, where the spread gets broader from left to right. In Fig 2a, the x-axis is a spatial dimension and the y- axis shows ‘weight’, or equivalently responsiveness. In practice we would want to employ a far greater range of scales than those shown here. For example, we might cover the range 1:32 or more, rather than 1:4 as shown.
Fig 2a shows our weighting functions in one-dimension (the spatial dimension is shown only by the x-axis), though it is easy to envisage a two- dimensional version of the Gaussian function. Simply imagine rotating the function through 360° around a vertical spindle. A volumetric representation of this function would look a bit like a rather curvaceous traffic cone. Such functions are sometimes called ‘Gaussian Blobs’. Fig 2b shows how these look when viewed from above.
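A minimal sketch of building such Gaussian weighting functions ('Gaussian blobs') at several spatial scales; the kernel sizes, sigma values and function name are illustrative assumptions.

```python
import numpy as np

def gaussian_blob(sigma, size=None):
    """2D Gaussian weighting function; sigma (its spread) sets the spatial scale."""
    if size is None:
        size = int(6 * sigma) | 1              # odd width covering roughly +/- 3 sigma
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return g / g.sum()                          # weights sum to 1, so blurring preserves mean luminance

# A 1:4 range of scales as in Fig 2; in practice a far wider range (e.g. 1:32) would be used.
blobs = [gaussian_blob(sigma) for sigma in (1.0, 2.0, 4.0)]
```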

46
Q

How does convolution work

A

Convolution is a process where a weighting function, which can be thought of as a receptive field, is applied to an image. The weighting functions, as shown in Fig 2, have only positive lobes and are depicted in profile (Fig 2a) and in 2D from above, with weights expressed as grey levels (Fig 2b). To perform convolution, the weighting function is placed at each position in the image, and the response is calculated by considering all the pixels beneath it. This response value is then written into the corresponding position in the output image, and this is repeated for every possible position, generating the output image. Broad Gaussian functions, which average over a larger area, introduce more blur compared to narrower Gaussian functions.

Fig 3 illustrates this with images of the Mona Lisa convolved with 2D Gaussian functions at three different spatial scales. The narrowest scale, resulting in the least blur, is shown in the top right panel, while the broadest scale, resulting in the most blur, is in the bottom right panel. Different image details are preserved at different scales: broader scales emphasize large structures, like the border between trees and the sky, while narrower scales retain finer details, such as the eyes.

This process demonstrates how convolution with Gaussian functions at various scales can affect image clarity and detail, preserving different types of information based on the chosen scale.
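A minimal sketch of this blurring-by-convolution at three spatial scales, using scipy's Gaussian filter as a stand-in for sliding a Gaussian receptive field over every image position; the random image and the sigma values are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# A stand-in for a greyscale image such as the Mona Lisa (values are luminances).
image = np.random.rand(256, 256)

# Each call is equivalent to placing a Gaussian weighting function at every position,
# taking the weighted sum of the pixels beneath it, and writing that to the output image.
fine   = gaussian_filter(image, sigma=1.0)   # narrow kernel: least blur, fine detail retained
medium = gaussian_filter(image, sigma=2.0)
coarse = gaussian_filter(image, sigma=4.0)   # broad kernel: most blur, only large structures survive
```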

47
Q

Convolution vs filtering

A

We take an image, filter it, and get an output image via convolution.
Convolution is just one way in which filtering can be achieved. Filtering means removing part of the input, in this case the finer detail.

We get one output image at each spatial scale, one for each weighting function.

The more blurred version gives the overall impression of the scene.
Fine scale = details.
Coarse scale = general locations.

So we generate Gaussian weighting functions, and these blur the image via convolution, which filters the image at different spatial scales.

48
Q

Finding edges = luminance changes.
Why blurred step edges?

The step edge.

A

A step edge is located where luminance changes most rapidly, i.e. where the gradient of the luminance profile is steepest. Differentiating the profile tells us about these peaks in the gradient.

Why blurred? By the time the image has passed through the eye's optics it may be blurred, and edges can also be blurred in the real world.

Mona Lisa in Fig 1 is an example of a luminance function of [x,y]. Fig 4 is another example of a 2D luminance profile, this time, a blurred step-edge.

Warning: don’t confuse the representation of the proximal stimulus in Fig4, with the (physical) distal stimulus that gave rise to it. Fig 4 does not represent a physical step, but rather the luminance profile of a light-dark border. A physical step, lit from above, might give rise to this kind of luminance distribution, but so would an optically blurred image of a white piece of paper against a black background.

49
Q

Definition of edge location

A

A reasonable definition of the spatial location of a luminance change (i.e. the x-value at which the edge is positioned) is the position where f is steepest (i.e. where the luminance profile has maximum gradient).

(In Marr’s terms, this is part of the computational theory of edge detection). Fortunately, there is a simple mathematical procedure that we can use for calculating a function’s gradient. It’s called differentiation—a general procedure for calculating rates of change. a mathematical technique for computing gradients of smooth functions.

50
Q

First derivative
Second derivative

A

The first derivative is the gradient of the function at each position: it gives the relative steepness of the luminance profile at each x-location. But biological vision may differentiate a second time.

To differentiate again we compute the gradient of the first derivative. This second derivative has both positive and negative values, because the slope of the first derivative is positive to the left of its peak and negative to the right. The point at which the second derivative passes through zero therefore corresponds to the peak in the first derivative (everything shifts along by one differentiation). So luminance changes can be found by locating zero-crossings in the second derivative of the luminance waveform, which is central to Marr and Hildreth's edge detection.
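A minimal one-dimensional sketch of this: a blurred step edge is differentiated twice, and the zero-crossing of the second derivative lands at the peak of the first derivative. The step position, blur amount and array sizes are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

# A blurred step edge: dark on the left, light on the right, edge near x = 100.
x = np.arange(200)
luminance = gaussian_filter1d((x > 100).astype(float), sigma=5)

d1 = np.gradient(luminance)     # first derivative: peaks where the profile is steepest
d2 = np.gradient(d1)            # second derivative: positive then negative around that peak

peak_of_gradient = np.argmax(d1)                       # location of maximum gradient
zero_crossings = np.where(d2[:-1] * d2[1:] < 0)[0]     # sign changes in the second derivative
print(peak_of_gradient, zero_crossings)                # the zero-crossing sits at (or next to) the peak
```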

51
Q

Edges at other orientations: what could we use instead?

A

We might propose a set of differential operators that compute derivatives in several different directions. Indeed, this is the solution that is offered by some machine vision algorithms (e.g. Canny, 1986).

However, it would be nice to reduce computational burden by finding a single operator that can reveal edges at all orientations. It turns out that an operator that sums the second derivatives computed in both the x direction and the y direction is just such an operator (Marr & Hildreth, 1980). This operator is called the Laplacian operator.
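In standard notation (my notation, not copied from the handout), the Laplacian of the image luminance f(x, y) is the sum of its second derivatives in the x and y directions:

$$\nabla^{2} f = \frac{\partial^{2} f}{\partial x^{2}} + \frac{\partial^{2} f}{\partial y^{2}}$$

Because this sum does not single out any direction, the operator is isotropic and so reveals edges at all orientations.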

52
Q

How do we implement all of this

A

Conveniently, we don’t need to blur and differentiate the image separately. I
f we first differentiate the Gaussian blurring function to create a new function (where ( G(s) ) is a Gaussian with a spatial scale ( s )) and then convolve the image with this new function, it’s equivalent to blurring the image first and then computing the second derivatives. This simplifies the process significantly.

So does the first and second derivative together. So differentiate with this adn then convolve and blur
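A minimal numerical check of this equivalence using scipy (the random image and the scale value are illustrative assumptions): summing the second derivatives of a Gaussian-blurred image gives the same result as a single convolution with the Laplacian-of-Gaussian operator.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, gaussian_laplace

image = np.random.rand(128, 128)   # stand-in for a greyscale retinal image
sigma = 2.0                        # spatial scale s of the Gaussian G(s)

# Route 1: blur with G(s), then take the second derivative along each image axis and sum.
route1 = (gaussian_filter(image, sigma, order=(2, 0)) +
          gaussian_filter(image, sigma, order=(0, 2)))

# Route 2: a single convolution with the Laplacian-of-Gaussian kernel.
route2 = gaussian_laplace(image, sigma)

print(np.allclose(route1, route2))   # True: one convolution does both jobs
```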

53
Q

Edge assertions in raw primal sketch

A

The receptive fields of retinal ganglion cells have evolved to emphasise luminance changes in the image, but they don’t actually detect them.

The Raw Primal Sketch is a symbolic list, so how might the measurements provided by the retina and LGN be turned into the symbolic edge assertions that are the elements of the Raw Primal Sketch? In Marr's scheme, this is accomplished in the early stages of cortical processing, as summarised in the next card.

54
Q

Summary-computational theory, algorithm, implementation

A

Computational theory: Luminance changes in the image correspond to object boundaries, markings (e.g. zebra stripes) and shadows. Thus, the task is to detect and locate the luminance changes in the image which, by definition, are local maxima in the luminance gradient of the image. Thus, the input is the 2D luminance array of the image and the output is a list of edge tokens, labelled for position and orientation.

Algorithm: The Laplacian (∇²) can be used to reveal zero-crossings which correspond with maxima in the luminance gradient. The Laplacian is an isotropic second-order differential operator—i.e. it computes the second derivative of the image in all directions. Because it is an isotropic operator (i.e. circular) it will reveal edges at all orientations. Because luminance changes occur at several spatial scales, several different sized operators are required. This can be achieved by appropriately setting the scale of the Gaussian blurring function (G[s]). The Laplacian can be applied to Gaussian blurring functions of different spatial scales to produce an appropriate set of weighting functions, sometimes called 'convolution kernels' (in the case here, ∇²G[s]). Edges are asserted in the raw primal sketch at image locations where ZCs are found in at least two neighbouring scales after convolution.

Implementation: The Laplacian of a Gaussian (∇²G[s]) is approximated by the centre-surround receptive field profiles of visual neurons such as those found in the retina and LGN. Parallel arrays of on- and off-centre cells connected with AND gates could then be used to detect oriented segments of zero-crossings. Such devices would have oriented receptive fields (in this respect at least, they would be like simple cells) and would be labelled for position and orientation.
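A minimal sketch of the "assert an edge where zero-crossings coincide at two neighbouring scales" rule; the zero-crossing test, the two sigma values and the random stand-in image are illustrative assumptions rather than Marr and Hildreth's exact procedure.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def zero_crossings(response):
    """Mark pixels where the LoG response changes sign relative to a neighbour."""
    zc = np.zeros(response.shape, dtype=bool)
    zc[:-1, :] |= response[:-1, :] * response[1:, :] < 0   # sign change down a column
    zc[:, :-1] |= response[:, :-1] * response[:, 1:] < 0   # sign change along a row
    return zc

image = np.random.rand(128, 128)                    # stand-in for a greyscale image
zc_fine   = zero_crossings(gaussian_laplace(image, sigma=1.0))
zc_coarse = zero_crossings(gaussian_laplace(image, sigma=2.0))

# Edge tokens are asserted only where zero-crossings are found at both neighbouring scales.
edges = zc_fine & zc_coarse
```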

55
Q

Summary of the summary

A
  • Luminance Changes and Edges: Changes in brightness in an image indicate boundaries, patterns, or shadows on objects.
  • Task: Detect and locate these brightness changes, which are local peaks in the brightness gradient.
  • Input and Output: Start with the image’s 2D brightness array and produce a list of edges, noting their positions and directions.
  • Laplacian for Zero-Crossings: The Laplacian operator helps identify zero-crossings, which align with peaks in brightness changes.
  • Isotropic Operator: The Laplacian calculates the second derivative in all directions, making it circular and able to detect edges in any orientation.
  • Multiple Scales: Brightness changes happen at various scales, so operators of different sizes are needed.
  • Gaussian Blurring: Adjust the Gaussian blur function to different scales and apply the Laplacian to each scale to create suitable filters (convolution kernels).
  • Edge Detection: After applying these filters, edges are identified where zero-crossings appear consistently across at least two scales.
  • Laplacian of Gaussian: Approximated by the receptive fields of visual neurons in the retina and LGN.
  • On- and Off-Centre Cells: Use these cells in parallel arrays, connected with AND gates, to detect zero-crossings and identify edges.
  • Oriented Receptive Fields: These fields, similar to simple cells, help label edges by their position and direction.
56
Q

Iqra important points
Which basic features are made explicit, and with attributes of what?

When is the first derivative (gradient) differentiated?
What about the second?

What is the Laplacian operator?
What does the Laplacian correspond with?

Where are the zero-crossings in the image after convolution, and what is convolution?
What does convolution require?
Unless the kernel is a delta function, what does convolution do?
What does a delta function do?

What scale does computer vision operate at, and why?

A

Basic features made explicit = edges, bars, blobs, terminations
With attributes of = orientation, contrast, length, width, position

First derivative (gradient) = differentiate ONCE (2 parameters).
Second derivative = differentiate again; the Laplacian computes it in all directions.

Laplacian operator = non-oriented (isotropic), 2nd derivative, reveals edges @ all orientations
The zero-crossings the Laplacian delivers correspond with the location of the gradient maximum in a step edge.
Marr & Hildreth…

Zero-crossings appear in the image after convolution.

Convolution = 'looking through' a filter at an image.
It requires a kernel, here with a centre-surround construction.
UNLESS the kernel is a delta function, convolution transforms the input.
A delta function leaves the image as it is.
The zero-crossings correspond with the maximum luminance gradient in the retinal image.

Computer vision operates at multiple spatial scales – why?
different info found at different scales

57
Q

Fine detail in a scene: what is it processed by, and what is it not processed by?

A

It is carried by a high spatial frequency filter: one that responds to rapid changes in light intensity, i.e. fine detail and edges.

A filter with a large RF responds to the larger, coarser aspects of the visual scene.
A low spatial frequency filter responds to slow changes in light intensity, i.e. broad, general features.
'Very excited cells' = no.
'A highly specialised area of the brain' = no; the question asks specifically what carries it.

58
Q

Why do many artificial edge-detection schemes perform their analysis at multiple spatial scales?

And the length of the coast of Britain depends on what?

A

Because different image features exist at different spatial scales.
Human vision also operates at multiple spatial scales.

The length of the coastline of Britain depends on the scale of analysis.

59
Q

The Laplacian operator is what

A

A non-oriented (isotropic) operator.
A second-order differential operator.

60
Q

First derivative of a function
Second derivative of a function

A

The gradient of the function

Second can be calculated by differentiating the first derivative.

61
Q

Does the Laplacian operator differentiate the image once at all orientations

A

No: it differentiates the image twice (it is a second-order operator), and it does so in all orientations at once.

62
Q

If an image is first convolved with a blurring function G(s) and then operated upon by the Laplacian ∇², the result is the same as if the image had only been convolved with a Laplacian of a Gaussian, ∇²G(s). It's true, but explain why.

A

The blurring function G(s) first smooths the image, reducing noise and fine detail; then applying the Laplacian operator ∇² gives the second derivative of the smoothed image, which enhances rapid intensity changes, so that what remains after blurring is your edge, with less noise.

Because blurring and differentiation are both linear operations, applying the Laplacian of a Gaussian combines both steps into one: convolving the image with the single ∇²G(s) kernel blurs it and takes the second derivatives at the same time, giving the same edge-enhancing, noise-reducing result.

63
Q

The output of a Laplacian operator produces what
Where are luminance boundaries located

A

The output of a Laplacian operator produces a zero-crossing that corresponds with the location of the gradient maximum in a blurred step edge.

Luminance boundaries are located incorrectly close to sharp corners by the zero-crossings in the output of a Laplacian-of-Gaussian operator. Zero-crossings = maximum gradient = the steepest luminance change.

64
Q

Why is Marr and Hildreth's theory of edge detection actually valuable?
In their theory, zero-crossings…

A

It provides a sound mathematical basis for the computational operations that are performed by the visual neurons.

Zero-crossings occur in the second derivative; in the image after convolution they correspond with the maximum luminance gradient in the retinal image.

65
Q

Process of looking through a filter at an image=
Convolution

According to Marr the full primal sketch is

In the full primal sketch features are grouped into what

A

Looking through a filter at an image = convolution.
Convolution requires a convolution kernel, here with a centre-surround construction.

According to Marr, the full primal sketch is where clusters of features are grouped together into higher-order symbols.
In the full primal sketch, features are grouped into higher-order structures.