Lecture 13- Flashcards

1
Q

Motion information about

A

Ego (self) motion
Time to contact
Surface structure
Identification of moving people and animals (camouflage)
Emotion

2
Q

The physical world is stable but the retinal image is not

How do we detect movement?

A

The physical world is stable, yet the retinal image is in constant motion. To pick up the movement of objects we need to detect their retinal motion against this constant and complex retinal flow.

First, the retinal flow produced by eye movements must be removed, i.e. compensated for. Two accounts of this compensation are the inflow theory and the outflow theory.

The inflow theory suggests that feedback from stretch receptors in eye muscles helps the brain compensate for retinal motion caused by eye movements.
In contrast, the outflow theory, proposed by Helmholtz, suggests that compensation occurs by comparing retinal motion with the signal that initiated the eye movement, rather than relying on feedback.
To test these theories, one can close one eye and gently push the other to induce a passive eye movement. The observed effect contradicts the inflow theory: the world appears to shift in the opposite direction to the eye movement. This suggests that compensation does not rely solely on feedback from eye muscles. Instead, it supports the outflow theory, where compensation is based on comparing retinal motion with eye movement instructions.

However, it’s possible that both theories play a role in compensating for eye movements, along with full-field motion cues. So, while the evidence challenges inflow theory, it doesn’t conclusively prove outflow theory as the sole explanation.

3
Q

Dynamic raw primal sketch = encoding direction and speed

A

In motion analysis, the initial step involves encoding the direction and speed of individual image features, which are then added to the raw primal sketch elements. This results in a dynamic raw primal sketch, represented as EDGE(position, orientation, size, contrast, direction, speed …). Humans appear to utilize at least two different systems for extracting motion information: a long-range system and a short-range system. However, recent experiments have prompted a re-evaluation of previous evidence, indicating the existence of four distinct motion processing mechanisms.

Long range = apparent motion: no motion aftereffect; operates over long distances and short temporal gaps.
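As a concrete sketch of the representation, the dynamic raw primal sketch primitive could be written as a data structure. This is a minimal illustration in Python, not code from the lecture; the field names simply follow the EDGE(position, orientation, size, contrast, direction, speed, ...) notation above, and the example values are invented:

```python
from dataclasses import dataclass

@dataclass
class Edge:
    position: tuple[float, float]  # image coordinates of the feature
    orientation: float             # contour orientation (degrees)
    size: float                    # spatial scale at which it was detected
    contrast: float                # luminance contrast across the edge
    direction: float               # added for the dynamic sketch: motion direction (degrees)
    speed: float                   # added for the dynamic sketch: image speed (deg/s)

# A vertical edge drifting rightwards (hypothetical values):
e = Edge(position=(12.0, 40.0), orientation=90.0, size=2.0,
         contrast=0.3, direction=0.0, speed=4.5)
```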

4
Q

long range

A

The long-range system, also known as classical apparent motion, operates indirectly by tracking individual image features over time. This system is highly adaptable, recognizing various complex objects as features. For example, different sequences of images can create the illusion of an object moving laterally. Even when the object changes shape or luminance between frames, the impression of motion remains. Experimenting with sequences of images, such as those in Fig 2, demonstrates this apparent motion effect.

However, this system faces a correspondence problem similar to stereopsis: determining which dot in one frame corresponds to a dot in the next frame. In more complex scenarios, like random dot kinematograms, where two different patterns are presented in sequence, the visual system must decipher the correspondence between dots across frames.

At higher frame rates, apparent motion closely resembles real motion, as the rapid succession of images blurs together, making the motion appear smooth. This phenomenon underscores the human visual system’s ability to perceive motion even in rapidly changing visual stimuli.

5
Q

The short range system= real motion

A

The short range system actually signals the speed and direction of moving image features directly and is thought to rely upon specific motion sensors, sensitive to spatial and temporal luminance changes.

The simplest way that such a system might work is by using a sequence detector. Sensors are connected together at slightly different positions so that the motion detector responds only when the primary sensors are activated in the right order and with appropriate timing. The response of such a detector can, in principle, signal both direction and speed

6
Q

Plausible mechanism, physiologically

A

The problem addressed here is detecting a small movement of a light spot or contour over a short period of time, which is part of a theory for understanding how motion is detected.

One approach to solving this problem is using a Reichardt detector, outlined in Figure 3. This detector consists of two parts: one responds to motion in one direction, and the other responds to motion in the opposite direction. Each part combines signals from two light sensors, one of which is delayed. For example, if a spot moves to the right between two frames, the left-hand sensor will activate first and its signal will be delayed and combined with the signal from the right-hand sensor, causing the unit to respond to rightward motion. Similarly, the unit on the right responds to leftward motion.

However, both units also respond to a stationary bright stimulus. To solve this, the outputs of the two units are compared: if only one unit responds, there was motion, and which unit responded determines the direction of that motion.
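A minimal computational sketch of such a detector (my own illustration in Python, not code from the lecture): each subunit multiplies one sensor's delayed signal with the other sensor's current signal, and the opponent stage subtracts the two subunit outputs, which removes the response to stationary stimuli.

```python
import numpy as np

def reichardt(left: np.ndarray, right: np.ndarray, delay: int = 1) -> float:
    """Return >0 for rightward motion, <0 for leftward, ~0 for stationary input."""
    # Delay each sensor's signal by shifting it in time (zero-padded).
    d_left = np.concatenate([np.zeros(delay), left[:-delay]])
    d_right = np.concatenate([np.zeros(delay), right[:-delay]])
    rightward = np.sum(d_left * right)   # left sensor fires first, then right
    leftward = np.sum(d_right * left)    # right sensor fires first, then left
    return rightward - leftward          # opponent stage cancels stationary responses

# A bright spot passing the left sensor at t=1 and the right sensor at t=2:
left = np.array([0., 1., 0., 0.])
right = np.array([0., 0., 1., 0.])
print(reichardt(left, right))   # positive -> rightward motion
print(reichardt(right, left))   # negative -> leftward motion
print(reichardt(left, left))    # ~0 -> no net motion signal
```

With a fixed delay, the detector responds best when the stimulus takes exactly `delay` time steps to travel between the two sensors, which is how such a unit can in principle signal speed as well as direction.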

7
Q

One final tweak

A

If, instead of circular lightness detectors, we use orientation-selective sensors as inputs, the motion detector also becomes selective for contour orientation.

8
Q

The aperture problem
What may be the site where this integration takes place

A

Applies to all motion-detecting mechanisms, biological or artificial, whose window on the stimulus (i.e. receptive field) is smaller than the object. The problem lies in the ambiguity of moving contours, not within the mechanism that underlies their detection.

When a motion sensor can only observe a small part of an image, it encounters what’s known as the aperture problem. This problem arises because the motion perceived by the sensor is limited to the component perpendicular to an edge, not the actual motion along the edge.
For example, when a large rectangle moves diagonally upwards to the right, a motion sensor would detect rightward motion for the right edge and upward motion for the top edge. In simpler terms, the sensor only sees part of the movement, not the whole picture. This poses a challenge for the visual system, as it must integrate all these localized motion signals to understand the true motion of the entire object. This integration process likely occurs in areas like MT in the brain, rather than in the initial visual processing area, V1.
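The geometry can be made explicit with a small vector sketch (my own illustration; the numbers are invented): a local sensor recovers only the projection of the true motion onto the edge normal.

```python
import numpy as np

true_motion = np.array([1.0, 1.0])   # rectangle moving diagonally up and to the right
edge_normal = np.array([1.0, 0.0])   # right-hand (vertical) edge: normal points rightwards

# The sensor reports only the motion component perpendicular to the edge.
perceived = (true_motion @ edge_normal) * edge_normal
print(perceived)  # [1. 0.] -> pure rightward motion, despite the diagonal true motion

# For the top (horizontal) edge, edge_normal = [0, 1] gives [0. 1.]: pure upward motion.
# Only by combining such local components (plausibly in MT) can the true diagonal
# motion of the whole object be recovered.
```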

9
Q

Motion aftereffect

A

After adapting to movement in one direction, stationary objects appear to move in the opposite direction, due to fatigue of direction-selective units.

Direction = comparing the amount of response in units selective for opposite directions.
Stationary objects = balanced response, but adapting to one direction destroys this balance through direction-selective fatigue, producing a signal for movement in the opposite direction.
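A toy opponent-coding sketch of this balance (an invented illustration, with made-up gain values):

```python
def perceived_motion(stimulus_drive: float, gain_right: float, gain_left: float) -> float:
    """Positive = seen as rightward, negative = leftward, zero = stationary."""
    return (gain_right - gain_left) * stimulus_drive

# Unadapted: a stationary pattern drives both units equally -> balanced, no motion seen.
print(perceived_motion(1.0, gain_right=1.0, gain_left=1.0))  # 0.0

# After adapting to rightward motion the rightward unit is fatigued (lower gain),
# so the same stationary pattern yields a net leftward signal: the aftereffect.
print(perceived_motion(1.0, gain_right=0.6, gain_left=1.0))  # -0.4 -> leftward
```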

More complex motion aftereffects, e.g. from a rotating spiral, can be understood in the same way. If the adapting pattern appeared to expand and rotate rightwards, then the stationary spiral observed afterwards will appear to contract and rotate leftwards.

Stimuli processed by the long-range motion system do not produce motion aftereffects, which is evidence that these two types of stimuli are processed by two different mechanisms: long and short range.

10
Q

The dynamic full primal sketch

A

Common fate: the most powerful grouping cue. Features that move together in the same direction at the same speed are grouped as belonging together. But the grouping must be more complex than this.

Motion through the world produces smooth but complex gradients of speed and direction, so not all features of an object move with exactly the same speed and direction, yet they all belong to the same object.

11
Q

The dynamic 2-1/2D sketch
Timing of action

A

When something moves towards us or we move towards it, we perceive an expanding pattern of motion. This expanding motion pattern has a focal point called the Focus of Expansion (FOE), which indicates whether an approaching object will collide with us or pass by. Even infants react to this cue, instinctively flinching if presented with a pattern suggesting an object on a collision course. Similarly, we can time our actions based on these expanding flow patterns.

The rate of expansion of the image depends on both the object’s distance and its speed of approach. Although measuring expansion rate doesn’t give separate estimates of distance and speed, it provides something more valuable: the “time to contact,” which is the distance to the target divided by the speed of approach. This measure helps us gauge when to act, regardless of whether we’re moving fast towards a distant object or slowly towards a nearby one. Human observers excel at estimating time to contact from expanding flow patterns, and various organisms, from drivers to diving gannets, likely use similar measures to guide their actions.

12
Q

Extracting 3d info from relative motion

A

Time to contact not only helps estimate the 3D arrangement of surfaces in our surroundings but also provides a depth map of the external world. By measuring the rate of expansion across the image, we can gauge the time it takes to reach various points, essentially creating a time-based depth map. This method isn’t limited to movement towards a specific target; it works for any direction of movement.

Motion parallax further aids depth perception: closer objects appear to move faster across our retina than distant ones, a phenomenon evident when observing telegraph poles and trees from a moving train.

Additionally, movement relative to a surface generates smooth motion gradients in the image, with the type of gradient indicating the orientation of the surface—horizontal or vertical tilt. As the surface angle increases, the speed gradient in the image becomes more pronounced. By analyzing these gradients in retinal flow patterns, we can extract valuable information about the layout of 3D surfaces. Experimental evidence suggests that humans excel at using this information, supported by neurophysiological findings indicating selectivity for speed gradients in certain brain cells, such as those found in MT.

13
Q

Biological motion

A

Biological motion
Gunnar Johansson (1973) attached a few lights to subjects’ joints (Fig 6) and filmed them in the dark so that only the lights were visible. A single frame from the resulting movie looks like an incoherent cluster of lights. But, as soon as the movie runs and the lights move about, even though the stimulus is now physically more complex, it is instantly recognisable as a person moving about. It is quite possible, using this kind of display, to make sense of two people dancing together, to recognise whether they are male or female, and even to estimate the weight of a box (also visible only as a few lights) that is lifted. The type of motion present in these stimuli has been referred to as ‘biological motion.’

Evidence from single cell recordings in macaque monkey suggests that some cells in area STP (superior temporal polysensory area), which receives input from both the dorsal ‘where’ —or ‘motion’—pathway and the ventral ‘what’—or ‘form’—pathway, are responsive for various body movements including walking

14
Q

Humanistic motive and intent

A

Even though the stimuli were chosen to be very simple cartoons, observers spontaneously attributed them with human characteristics.

So the way things move is used to work out relations such as causality (e.g. A hits B, and that is why B moved), in subtle and sophisticated ways involving humanistic motive and intent.

15
Q

Long range

A

Apparent motion but no motion aftereffect
Over long distances and short temporal gaps
Can tolerate changes in colour, shape, luminance

The long-range motion detection system focuses more on how things move rather than their specific details like color, shape, or brightness. So, even if something changes color, shape, or brightness as it moves, this system can still detect its motion because it tracks the movement of individual features over time. In simpler terms, it pays attention to how things move rather than what they look like, which helps it tolerate changes in color, shape, and brightness while still recognizing motion.

16
Q

Short-range motion detection system vs long-range

A

The short-range motion detection system, on the other hand, focuses on detecting motion within a smaller, localized area of the visual field. It may be more sensitive to specific details like color, shape, and brightness changes because it operates within a limited spatial range. This system might be better at detecting fine details or subtle changes in nearby objects but may not be as effective at perceiving motion across larger distances or with rapidly moving objects. Essentially, while the long-range system tracks motion over a broader scope, the short-range system is more attuned to detecting motion in specific areas or objects within its immediate vicinity.

The short-range system deals with real, continuous motion detected by motion detectors, and it experiences the motion aftereffect. The short-range motion detection system is more sensitive to specific details and changes in the visual field. When exposed to a moving stimulus for a while, it adapts to that motion; when the stimulus stops, it may continue signalling motion in the opposite direction, causing the motion aftereffect. Long-range motion detection focuses on overall motion patterns and is less likely to experience this effect.

When motion becomes faster than the temporal resolution of the sensors, the true motion is unknown and ambiguous. E.g. a cell selective for upward motion responds strongly to a moving square: we see upward motion, but the square could also have moved left or right; we cannot tell.
In short-range motion detection there is thus a problem when the speed of the motion exceeds the temporal resolution of the sensors: they cannot accurately perceive the true motion, leading to ambiguity. So while we perceive the motion as upward, the true direction of the object’s movement remains unknown, a limitation of the short-range motion detection system.
In contrast, long-range sensors, which detect motion over larger distances, can tolerate faster motion because they are designed to track more gradual changes in movement direction over longer periods.

17
Q

Why do we remove retinal motion that arises due to eye movements from visual analysis?

A

Not useful info

We eliminate retinal motion caused by eye movements from visual analysis because it can interfere with our perception of stable objects in the environment. If our visual system were to interpret every motion on the retina as movement in the external world, it would lead to confusion and distortions in our perception. Therefore, the brain filters out retinal motion caused by eye movements to ensure that we perceive the world as stable and coherent.

18
Q

Retinal cues

A

Retinal motion cues are indeed important for many visual tasks, such as tracking moving objects or estimating motion direction. However, when it comes to analyzing overall scene motion, the visual system needs to compensate for the motion of the eyes themselves to avoid interpreting that motion as motion in the external world. This is why, in some cases, the visual system needs to distinguish between retinal motion cues arising from eye movements and those arising from motion in the external environment.

So it is not useful info when looking at overall scene motion. Retinal motion cues do provide information about:
Object boundaries
Causality
Motive and intent
Time to contact
Surface slant
Relative depth

Take the example of a person walking towards you:
- Object boundaries: Retinal cues help you perceive the person’s outline and distinguish them from the background.
- Time to contact: You estimate how long it will take for the person to reach you based on their speed and distance.
- Relative depth: You perceive the person as closer to you compared to objects in the background.
- Surface slant: If the person is walking on an inclined surface, you might notice changes in their posture or gait, indicating the slope.
- Causality: You understand that the person’s motion is driven by their intention to approach you.

These retinal cues contribute to your overall understanding of the person’s motion, allowing you to react appropriately to their approach. However, to fully comprehend their movement trajectory and predict their path accurately, your brain integrates these cues with other sensory information and contextual factors. We don’t want to eliminate retinal cues. Instead, in certain contexts, such as when studying overall motion perception or when focusing on specific motion processing mechanisms, researchers may isolate or control for specific cues to better understand their individual contributions. However, in everyday perception and interaction with the environment, retinal cues are essential for making sense of motion and understanding the world around us.

19
Q

Reichardt model (for short range)

A

The Reichardt model is a computational model used to detect motion in visual systems. It works by comparing signals from two adjacent light sensors over time. In short-range motion detection, this model is effective because it can detect small, rapid changes in motion, like those caused by nearby objects moving quickly. However, for long-range motion detection, where objects are farther away and moving more slowly, the Reichardt model isn’t as effective, because it’s better suited for detecting rapid changes in motion than slower, more gradual movements. By adding orientation-selective inputs, the detector can be made selective for contour orientation as well as direction.

If the model has pairs of units responsive to L and R motion, the two units signal opposite directions.
If both units respond equally = stationary object.
Units can be made to respond to faster motion if the distance between the pairs of inputs for each unit is increased.

  1. Pairs of units response to left (L) and right (R) motion: In this model, there are pairs of neural units that are sensitive to motion in opposite directions. For example, one unit might detect leftward motion (L), while its paired unit detects rightward motion (R).
  2. If both units respond, it indicates a stationary object: If both units in a pair respond equally, it suggests that there is no motion occurring. This is because if an object is moving, one unit in the pair should respond more strongly than the other, indicating the direction of motion. But if both units respond equally, it suggests that the object is not moving.
  3. Units respond to faster motion with increased distance between inputs: with the delay stage unchanged, increasing the separation between the two inputs of a pair means the stimulus must cover a greater distance in the same time, so the unit becomes tuned to higher speeds.

Overall: the model compares the responses of pairs of units sensitive to opposite directions of motion, with the strength of response indicating the presence, direction, and speed of motion.

20
Q

TTC

A

Expanding patterns of retinal motion are used to calculate time to contact (TTC).
TTC = 1 / expansion rate.
The focus of expansion is used to help calculate the direction of heading.
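A worked numerical sketch of TTC = 1 / expansion rate (my own numbers, for illustration only): the image size of an approaching object scales roughly as 1/distance, so the relative rate at which the image grows gives TTC without separate distance or speed estimates.

```python
def time_to_contact(size_now: float, size_prev: float, dt: float) -> float:
    """TTC = 1 / relative rate of image expansion."""
    expansion_rate = (size_now - size_prev) / (size_prev * dt)
    return 1.0 / expansion_rate

# Object at 10 m approached at 2 m/s (true TTC = 5 s). Image size ~ 1/distance,
# so over dt = 0.1 s the image grows from 1/10.0 to 1/9.8 (arbitrary units):
print(time_to_contact(1 / 9.8, 1 / 10.0, 0.1))  # ~4.9 s
```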

21
Q

Inflow vs outflow theory

A

Outflow = retinal motion compared with the eye movement instructions. Also known as corollary discharge theory, or efference copy.
Inflow = retinal motion compared with feedback signals from the eye muscles.
Inflow cannot explain the passive eye movement result: when the eye is pushed, the stretch receptors still provide feedback, yet no compensation occurs and the world appears to move. Therefore inflow theory alone cannot account for how eye movements are compensated.

The inflow theory suggests that sensory feedback from eye muscles, called stretch receptors, is sent to the brain to compensate for retinal motion. This feedback allows the brain to adjust visual perception based on eye movements.

In contrast, the outflow theory, also known as corollary discharge theory, proposes that the brain predicts retinal motion by comparing the intended eye movement signal with actual retinal motion. This prediction allows the brain to separate self-generated motion from external motion in the visual field.

Inflow Theory: Imagine you’re driving a car and your friend is sitting next to you. In this theory, it’s like your friend is constantly telling you about the movements they see you making while driving. So, if you turn the steering wheel to the left, your friend tells you, “Hey, you turned the wheel to the left!” This constant feedback helps you adjust your driving based on what you’re doing.

Outflow Theory (Corollary Discharge Theory): Now, think of it this way: before you make a turn while driving, your brain sends a message to itself saying, “Hey, I’m about to turn left.” This message is like a prediction of what’s going to happen. So, when you actually make the turn, your brain already knows what to expect and can adjust your perception accordingly, without needing constant feedback from your movements.


22
Q

Correspondence problem of motion

A

The problem of associating image points from one movie frame with the same points in a subsequent frame.

23
Q

The aperture problem

A

Problem for a moving oriented luminance contour viewed through an aperture:
the true direction of motion is ambiguous.

The aperture problem arises when observing a moving oriented luminance contour through a restricted aperture, such as a small window or a limited field of view. In this scenario, the true direction of motion of the contour becomes ambiguous because only a portion of the contour is visible. Due to this restricted view, the observer cannot accurately determine the complete motion of the contour. This problem highlights the limitations of perceiving motion when only partial information is available, leading to an ambiguous interpretation of the motion direction.

(Other cues, e.g. convergence, calculate distance to nearby points of fixation.)

24
Q

What can motion provide and not provide info about

A

TTC, surface slant, relative depth, object trajectory.
Not relative or absolute object size.

25
Q

Is it desirable to remove the retinal motion that arises due to eye movements from further VISUAL analysis because it provides no useful info about the visual world?

A

Yes, but it is still important for other purposes, e.g. navigation.

26
Q

When the frame rate of a series of displaced images becomes faster than the temporal resolution of the visual system, apparent motion becomes the same as real motion. Explain.

A

Temporal resolution is the ability of our visual system to perceive changes in stimuli over time. When the frame rate exceeds the temporal resolution, the visual system can no longer resolve the individual frames, so the motion is not jerky; the frames blend into a smooth, continuous sequence.

So when frame rate exceeds temporal resolution, the images blend together and we get continuous motion (cf. flicker fusion). Apparent motion = real motion: they are indistinguishable because the visual system cannot tell the frames apart.

27
Q

In optic flow the focus of expansion can be used, but not for time to contact. Why, and what for?

A

TTC comes from the expansion rate, but the system first needs to know what direction it is heading in to predict the contact.

Focus of expansion = can be used to help calculate the direction of heading.

28
Q

Relative motion cues: what do they provide and what not?

A

Object boundaries
Causality
Motive and intent
Time to collision

Not relative reflectance

29
Q

What is the aperture problem

A

The problem that, for a moving oriented luminance contour viewed through an aperture, the true direction of motion is ambiguous.

30
Q

Consider the Reichardt model, consisting of pairs of units responsive to left and right directions of motion. Which statements are true?

A

The units could be made to respond to faster motion if the distance between the pairs of inputs for each unit were increased: with the delay unchanged, a greater separation means the stimulus must travel further in the same time, so the detector is tuned to higher speeds.

Alternatively, they could be made to respond to faster motion by decreasing the length of the delay stage in their circuitry.

31
Q

In an unadapted state, a cell in primary visual cortex selectively tuned for rightward motion responds strongly to a moving square. What can you deduce?

Then the same for upward motion.

A

The square may have moved directly to the right, but it is not possible to say what direction of motion you saw, as more complex motions can also drive the cell.

And next: the square may have moved directly upward, but again it is not possible to say what direction of motion you saw.

32
Q

NOW ADAPted state

A

When you are in an unadapted state:
- The Situation: A cell in your primary visual cortex is selective for rightward motion and responds strongly to a moving square.
- Possible Deductions:
- The square likely moved to the right because the cell tuned to rightward motion responded strongly.
- However, without additional context, someone cannot definitively conclude that the motion perceived was solely rightward. The response could be influenced by other factors, such as combined motion vectors or complex shapes.

When you are in an adapted state (having been exposed to a particular motion for a prolonged period):
- Adaptation: Cells tuned to the direction of adaptation (e.g., rightward motion if you’ve been looking at rightward motion for a while) may become less responsive due to neural adaptation.
- Effect: This adaptation can cause changes in perception due to the decreased responsiveness of the adapted cells.

  • Rightward Adaptation: Suppose you have been exposed to rightward motion for some time.
  • Observation: Now, when the square moves, the response of the rightward-tuned cells may be weaker than in the unadapted state due to adaptation.
  • Deductions in Adapted State:
    • If a rightward-tuned cell still responds strongly despite adaptation, it indicates a very strong rightward motion signal.
    • Perception might be altered. After adaptation to rightward motion, neutral (non-moving) or weakly moving objects might seem to move leftward (motion aftereffect).

In an unadapted state, the strong response of a rightward motion cell to a moving square indicates rightward motion, but precise perception details might need more context.

In an adapted state:
- The rightward-tuned cell’s response might be weaker due to adaptation.
- A strong response despite adaptation suggests a strong rightward motion.
- Perceptual effects might include altered motion perception due to adaptation.

33
Q

Introduction

A

Most objects do not emit light; they reflect back the light reaching them from some other source, e.g. the sun.

The illumination (I) of a surface is the amount of light falling onto the surface.
Luminance (L) is the amount of light coming back from the surface.

Luminance depends on both the illumination and the reflectance of the surface:
L = Ir

Brightness = perception of light
Lightness = perception of surfaces

34
Q

Physical and psychological variables

A

Physical = reflectance: how much light a surface will reflect, sometimes called albedo, ranging from 0 to 1. Albedo 0 reflects no light (everything is absorbed); albedo 1 reflects all light back off the surface.
Illuminance: the level of the light source; the measurable amount of light falling on a surface.
Luminance: the amount of light reflected back from the surface. Not the same as reflectance: reflectance is the percentage of the illumination reflected off, whereas luminance is the amount of light that reaches our eye.

  • Illuminance is the input light falling on a surface.
  • Reflectance is the property of the surface that determines how much of that light is reflected.
  • Luminance is the output light that is reflected (or emitted) from the surface and perceived by an observer.

Consider a white piece of paper under a lamp:
- Illuminance: The light from the lamp hitting the paper.
- Reflectance: The paper’s property, which determines how much of the light it reflects (a high percentage for white paper).
- Luminance: The brightness of the paper as perceived by an observer, resulting from the reflected light.

35
Q

Luminance equation

A

Luminance = illuminance × reflectance (L = I × r)
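A trivial numerical sketch of the equation (illustrative values only, echoing the surfaces used in the next cards):

```python
def luminance(illuminance: float, reflectance: float) -> float:
    return illuminance * reflectance  # L = I * r

print(luminance(100, 1.0))  # white surface (albedo 1.0) in bright sun -> 100 units
print(luminance(100, 0.1))  # black surface (albedo 0.1), same sun     -> 10 units
print(luminance(10, 1.0))   # white surface under dim light            -> 10 units
```

Note that the white surface under dim light returns the same luminance as the black surface in bright sun, so absolute luminance alone cannot reveal reflectance; this is the problem lightness constancy must solve.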

36
Q

Lightness constancy

How can this be achieved- how can we make good estimates of r when we are only given access to L at the back of our eye-

A

We see the same objects under different lighting conditions and their appearances do not change, so the visual system is able to return good estimates of reflectance (r, albedo) in spite of variations in illumination. This is called lightness constancy.

A white surface with albedo 1 reflects 100 percent. From the sun, 100 units of light shine down onto the surface, and it reflects all of that light back. To an observer looking at the surface, 100 units of light (L) are reflected back off that surface into the back of the eye.

A second, black surface with an albedo of 0.1 reflects 10 percent of the incident light to the observer: 100 units fall on it but only 10 units reflect back. So the one on the left looks lighter.

37
Q

But now let's suppose dim illumination: the sun goes behind clouds, so illumination drops from 100 to 10 units.

A

The reflectance property does not change, so the white surface now reflects 10 units of light.

The dark one now reflects 1 unit of light.

(The amount of light reflected back is in the same ratio: still 100 percent vs 10 percent. So we get lightness constancy irrespective of illumination. Surface reflectance is an unchanging property of real objects, and the ratio of luminance across the two surfaces stays the same.)

38
Q

If we can compare adjacent luminance levels by comparing their ratio, then what?

How can this algorithm be implemented by the visual system?

RGCs, changes in reflectance, and peak responses

A

We then know their relative values of reflectance.

This is not the same as having an absolute measure of r, but it is perhaps good enough: it tells you relative reflectance, not absolute.

Retinal ganglion cells play a crucial role in processing visual information by making spatial comparisons within the visual field. Their receptive fields are designed to use lateral inhibition, a process where the cell compares the amount of light in adjacent regions of the image. When adjacent regions have the same luminance, the cell remains inactive. However, when there is a difference in luminance between adjacent regions, the cell responds, with the magnitude of the response providing a measure of the relative reflectance of the surfaces forming the boundary. This ability to detect changes in reflectance is vital for identifying surface markings, shadows, and boundaries between different objects. Thus, retinal ganglion cells, through their receptive fields and the mechanism of lateral inhibition, are essential for distinguishing various visual features and contributing to our perception of the environment.

Changes in reflectance tell us about surface markings, shadows and boundaries between different objects, and might be given by the zero crossings in the responses of retinal ganglion cells. Peak responses, or the gradient at the zero crossings, give information about relative lightness.
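A minimal sketch of the ratio comparison across a boundary (my own illustration): because L = I × r for both surfaces under common illumination, the luminance ratio equals the reflectance ratio and the illumination cancels, which is effectively what the lateral-inhibition comparison computes.

```python
def relative_reflectance(lum_a: float, lum_b: float) -> float:
    return lum_a / lum_b  # (I * r_a) / (I * r_b) = r_a / r_b: illumination cancels

print(relative_reflectance(100, 10))  # bright sun: white vs black -> 10.0
print(relative_reflectance(10, 1))    # dim light:  white vs black -> 10.0 (same ratio)
```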

39
Q

If the visual system assumes the lightest region of the image is white what can it do

A

Anchoring.
It can then estimate the lightness (the r values) of every surface across the retinal image: e.g. one point is 3 times darker than the anchor, another 2 times, so the absolute lightness of all grey levels in a scene can be estimated.
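A sketch of the anchoring heuristic (invented values): take the brightest region as white (r = 1) and scale every other region's reflectance estimate from it.

```python
luminances = [10.0, 5.0, 2.0, 1.0]   # luminance measured at four surfaces
anchor = max(luminances)             # brightest region, assumed to be white (r = 1.0)
reflectances = [lum / anchor for lum in luminances]
print(reflectances)                  # [1.0, 0.5, 0.2, 0.1]
```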

40
Q

Lightness contrast

A

Two squares with the same luminance: if one is placed in a darker surround it appears brighter, and in a lighter surround it appears darker.
Lightness contrast is a consequence of lightness constancy.

It leads to an error, but rarely.

41
Q

Craik corn sweet o brine illusion

A

The right side looks darker than the left, but if you mask off the middle region the two sides appear the same. This is because of the step edge: the brain takes the edge as evidence that the two regions are different shades.

demonstrates how the responses of retinal ganglion cells influence our perception of surface lightness at luminance borders. In this illusion, most of the stimulus has a uniform luminance except for the central region, which features two smooth ramps surrounding an abrupt change in luminance. Retinal ganglion cells primarily respond to the abrupt change but are less sensitive to the smooth ramps. This response pattern is similar to what would be evoked by a real light-dark luminance border, creating the perception of a step change between two areas of different lightness, even though the actual luminance is nearly uniform. This illusion shows that the visual system interprets the ganglion cell responses as changes in relative reflectance, leading us to perceive a uniform lightness on either side of the edge despite the continuous nature of the luminance ramps.

42
Q

Shadows

A

Shadows produce luminance borders, but these are not interpreted as discontinuities or changes in reflectance; rather as changes in illumination. How are they correctly interpreted as shadows?

Penumbra: the fuzzy region helps indicate that a luminance change is a shadow.
Shadow illusions (the Argyle illusion and the checker-shadow illusion) = the importance of higher-level processes in vision for surface lightness.
Both bottom-up influences (we register where the light is) and top-down cognition are involved.

Shadows without penumbras: again top-down; it is unlikely for a tree shape to be painted on a house, so it is interpreted as a shadow.

43
Q

Argyle illusion

A

The grey diamonds look much lighter than the grey columns centrally, even though the luminance is identical.

In the Argyle illusion, even though the background grey levels are the same in each version, our brains interpret the patterns differently. In the first version, we might see tessellating triangles with some shaded areas, but even if we’re told to focus on the actual brightness levels, our brains still tend to see the up-pointing triangles as lighter. In the second version, the shadow explanation doesn’t work, so we see black and white shapes on a gray background. And in the third version, where the shapes are spaced differently, the illusion remains because the shadow idea still works. This shows that our brain’s interpretation of lightness isn’t just about what our eyes see—it’s also influenced by higher-level thinking processes.

The “shadow idea” refers to the interpretation of the lighter and darker regions in the pattern as being caused by shadows. In the Argyle illusion, when certain parts of the pattern are shaded or appear darker, our brains might interpret them as shadows cast by objects, even if there are no actual shadows present. This interpretation affects how we perceive the overall brightness and contrast of the pattern.

So if there are gaps between them, they cannot be shadows, and we do not interpret them as shadows. In the Argyle illusion, when regions appear darker it is because our brains are interpreting them as shadows.

44
Q

Overall brightness

A

If retinal ganglion cells compute ratios, they provide no information about absolute values, yet we see the difference between a brightly lit sunny day, and an overcast grey day. Some other mechanism must also be involved in telling us about brightness.

45
Q

Specular reflectance

A

Surface reflectance comes in two flavours.
Specular (mirror-like) reflectance means that incident light is reflected in a predictable direction, like a ball bouncing off the floor. This is responsible for the highlights seen on many surfaces (remember the highlights on the vase in the lecture on depth cues).

Diffuse (matte) reflectance means that incident light can be reflected in any direction. Most surfaces have both a specular and a diffuse component, so the coding of reflectance is both subtle and complex, underpinning the perception of things like surface glossiness.
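A standard way to make the two components concrete is a Phong-style shading sketch. This is a generic graphics model, not something from the handout, and the coefficients are invented:

```python
import numpy as np

def shade(normal, light_dir, view_dir, kd=0.7, ks=0.3, shininess=32):
    """Return diffuse + specular intensity for the given surface and directions."""
    n = normal / np.linalg.norm(normal)
    l = light_dir / np.linalg.norm(light_dir)
    v = view_dir / np.linalg.norm(view_dir)
    diffuse = kd * max(0.0, n @ l)                # matte: scattered in all directions
    r = 2 * (n @ l) * n - l                       # mirror reflection of the light ray
    specular = ks * max(0.0, r @ v) ** shininess  # highlight: tight around r
    return diffuse + specular

# Viewing straight down the reflection direction gives the bright highlight:
print(shade(np.array([0., 0., 1.]), np.array([0., 0., 1.]), np.array([0., 0., 1.])))  # 1.0
```

Perceived glossiness then corresponds to how strong and tight the specular term is relative to the diffuse term.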

46
Q

Shading

A

This passage explains the concept of shading in visual perception using an example of two cinder cones photographed from above. In the first image (Fig 5A), the lighting creates shading that helps us perceive the shape of the cones.

When the same image is flipped upside down (Fig 5B), the shading creates an illusion of craters instead of cones.

This demonstrates two points:
firstly, shading influences our perception of shape, as discussed in previous lectures on depth cues.
Secondly, our visual system typically assumes that lighting comes from above, so the perception of the image as craters requires effort to overcome this assumption.
Just like other visual illusions, such as the Necker cube or Schroeder staircase, our perception can be altered with conscious effort.

47
Q

Lightness and the 2-1/2D sketch

A

It seems that lightness perception does not occur at the level of the Raw Primal Sketch, but at the level of the 2-1/2D Sketch, where a surface representation that includes depth is available (e.g. Fig 4).

The visual system seems capable of building up a sophisticated model which includes the direction and nature of light sources, types of surface reflectance and 3D surface layout, all from the simple information about luminance available in a single black and white image. In doing this, it seems able to use luminance information (e.g. shading) as a way to interpret surface layout (e.g. Fig 5) but also to use information about surface layout (e.g. Fig 4) as a way to interpret luminance. Very tricky!

48
Q

Iqra's important points

A

Physical (luminance) vs psychological variables
Brightness = light; lightness = surfaces
L = r × I

49
Q

Relative vs absolute reflectance
Is lightness constancy a good estimate of reflectance?

A

Relative = calculated from the luminance ratio.
Absolute = cannot be calculated from the information at 2 neighbouring patches of the image; it needs more than the ratio.

Yes, lightness constancy is a good estimate of reflectance, and it occurs when surfaces appear to be of the same lightness when viewed under different lighting conditions.

50
Q

Luminance contrast
RGCs

A

Contrast = same physical intensities yielding differences in perceived lightness.
RGCs are insensitive to gentle luminance changes but sensitive to sudden ones, like a step edge.

51
Q

There is more to perception

A

Shadows: RGCs are not sensitive to the gentle penumbra, only to the sharper luminance change, so the visual system must decide whether a border is a change in illumination or a change in reflectance.
3D layout matters too (light differs between rooms, etc.): surface lightness depends on perceived depth, which affects the interpretation of illumination.

52
Q

Lightness constancy is desirable if...
And what is it?

A

Desirable if it is more profitable for vision to encode properties of surfaces than absolute light levels.

It is the phenomenon that surfaces appear to be of the same lightness when viewed under different lighting conditions.
Different lightness = lightness contrast.

53
Q

The relative and absolute reflectance values of spatially neighbouring patches of an image

A

Relative: calculated from their luminance ratio (reflectance is the proportion of light a surface reflects).

Absolute: cannot be calculated from the information contained at only those two patches, as it needs knowledge of the exact proportion of light each patch reflects, i.e. the illumination, and therefore more information than the 2 patches provide.

54
Q

Lastly the Argyle illusion

A

Bottom-up retinal processes are insufficient by themselves to account for human perception of lightness. Lightness perception is not solely determined by the raw input received; contextual factors and higher-level processing also contribute.

So top-down processing is important; bottom-up retinal processes are insufficient.

55
Q

Recap

A

We are now at the final stage in Marr's computational framework for object recognition:
the 3D model. But first, a recap.

Raw primal sketch, then full primal sketch. In the raw primal sketch, simple features, e.g. edges, are made explicit. We saw how this could be achieved by convolving the retinal image (proximal stimulus) with a Laplacian of a Gaussian at different spatial scales and marking an edge where zero crossings coincided across two or more scales.
These edges have attributes, e.g. orientation; in the full primal sketch, higher-order descriptions become available by grouping the lower-order primitives.
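A compact sketch of that recap (my own code, using a synthetic step image): convolve with a Laplacian of Gaussian at two scales and mark edges where the zero crossings coincide.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def zero_crossings(response: np.ndarray) -> np.ndarray:
    """Mark pixels where the filter response changes sign against a neighbour."""
    sign = np.sign(response)
    zc = np.zeros(response.shape, dtype=bool)
    zc[:, 1:] |= sign[:, 1:] != sign[:, :-1]  # horizontal neighbours
    zc[1:, :] |= sign[1:, :] != sign[:-1, :]  # vertical neighbours
    return zc

image = np.zeros((32, 32)); image[:, 16:] = 1.0             # vertical luminance step
edges = zero_crossings(gaussian_laplace(image, sigma=1.0)) & \
        zero_crossings(gaussian_laplace(image, sigma=2.0))  # coincide across two scales
print(np.argwhere(edges)[:3])                               # edge marked near column 16
```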

56
Q

Object recognition
Invariants

A

Recognition is flexible across many invariances. Invariant = recognition does not depend on that property.
It does not depend on size (if we can resolve it, we can recognise it), nor on orientation; it is also viewpoint invariant, so you can look at something from many viewpoints and still recognise it.

Mirror reversal: recognition is invariant to that as well.
It also survives partial occlusion: when an object is partially occluded, we recognise it from what remains in view.

57
Q

The three-spined stickleback and grayling butterflies

A

Sticklebacks recognised rival fish just by the red belly: not the actual appearance, it did not matter what the model looked like, just the red belly. These are so-called trigger features (the red belly and the eye).

Grayling butterflies: paper models of butterflies were made, both realistic and fake. A male seeking a mate wants to chase the right insects, and the trigger feature here is the flapping of the wings, giving roughly a 50/50 chance of finding a mate.

58
Q

Primates
If we use marrs framework

A

More sophisticated than sticklebacks and butterflies:
more subtle and complex features. Trigger features do not help; we need to use a collection of features.

Raw primal sketch: basic features, e.g. edges, made explicit.
Full primal sketch: features grouped together into larger structures.
2-1/2D sketch: makes explicit surface properties (e.g. roughness) and an estimate of 3D layout, i.e. which parts of the object are close to us and which are further away.
3D model: we need something more to recognise the object.

The current sensory response, combined with memory, gives rise to recognition. But how is it associated with memory?

59
Q

Why do we need axis based volumetric description

A

Marr thought the 2-1/2D sketch was a poor candidate as the basis for object recognition: it is a description from a single viewpoint (egocentric), and only visible physical attributes are described. This would mean that, instead of storing a single description of an object in memory, many representations from different viewpoints would have to be stored.

It is better to build a single description of an object than to store many in memory. So another level was needed: the 3D model, an axis-based volumetric description capturing the full 3D structure of the object, not just its visible surfaces.
Just as edges etc. are the primitives of the primal sketch, the primitives at this level of representation are what Marr referred to as generalised cones.

60
Q

Generalised cones

A

A surface created by moving a cross-section along a smooth axis, where the cross-section can change in size but not shape, and the axis can be curved. Here the cross-section is elliptical. This lets you code an entire object, e.g. a banana, with only a small number of pieces of information.

So compact generalised-cone descriptions can be built at the level of the 3D model for any object: banana, vase, cylinder, each with an axis and cross-sections along it.

A rich set of volumetric primitives that are powerful, can represent many different objects, and yet can be simply encoded.

Most objects are more complicated than vases, so how can complex objects be broken down? Marr and Nishihara said that every complex object can be broken down into component parts, each a generalised cone; a human being is not simply a cylinder. One must identify the major axis along which the initial cross-section is described.
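A sketch of the construction (my own parameterisation, with a circular cross-section and a straight axis for simplicity): the whole surface is coded by just an axis plus a function giving the cross-section's size along it.

```python
import numpy as np

def generalized_cone(radius_fn, length=1.0, n_axis=20, n_theta=24):
    """Surface points made by sweeping a circular cross-section along a straight axis."""
    pts = []
    for t in np.linspace(0.0, length, n_axis):            # position along the axis
        r = radius_fn(t)                                  # size may vary, shape may not
        for theta in np.linspace(0, 2 * np.pi, n_theta, endpoint=False):
            pts.append((r * np.cos(theta), r * np.sin(theta), t))
    return np.array(pts)

cylinder = generalized_cone(lambda t: 0.5)                    # constant radius
cone = generalized_cone(lambda t: 1.0 - t)                    # shrinking radius: a traffic cone
vase = generalized_cone(lambda t: 0.5 + 0.3 * np.sin(3 * t))  # bulging profile
print(cylinder.shape)                                         # (480, 3) surface points
```

The compactness is the point: each object above is fully specified by one axis and one radius function, a handful of parameters rather than a list of surface patches.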

61
Q

Generalised cones from handout

A

The concept of storing different representations for the same object arises from the viewer-centered nature of 2-1/2D sketches, where coordinates are relative to the viewer and change with viewpoint alterations. In contrast, 3D models aim to generate unique volumetric representations based on past visual experiences, known as 3D model descriptions, stored in memory for later recognition. These descriptions are object-centered, allowing matches from various angles.

Marr & Nishihara proposed representing objects with generalized cones, surfaces formed by moving a cross-section along a smooth axis.
This approach provides a rich set of volumetric primitives, such as cylinders and cones, for representing diverse shapes symbolically. Complex objects are hierarchically represented by breaking them down into smaller elements, each represented by generalized cones, forming a hierarchy of 3D models known as the 3D model description. This hierarchical representation allows for the encoding of finer details and connections between components, facilitating efficient object recognition from different perspectives.

62
Q

Identification of the major and component axes

A

The process of finding the major axes for complex objects involves several steps. Initially, the major axis of an object is identified based on the occluding contours of its silhouette in the 2-1/2D sketch, similar to determining the major axis from an object’s silhouette. This is relatively straightforward for simple objects like a vase but becomes more challenging for complex shapes. To address this, convex and concave contours are identified and marked to delineate segmentation points, allowing the outline to be divided into smaller regions, with each region’s axis identified. This process is often automated by computer algorithms.

However, this method may not always work perfectly, especially for objects viewed from unusual angles where the foreshortened representation in the 2-1/2D sketch can misplace the major axis for the 3D model. In such cases, additional depth cues from the 2-1/2D sketch, such as shading, texture gradients, and binocular disparity, become crucial in adjusting the major axis assignment. Once the axis is identified, the generalized cone representation is further developed by defining the shape of the cross-section and characterizing how the cross-section varies along the axis based on surface-based information in the 2-1/2D sketch.

63
Q

So storage

A

The challenge of storing and indexing a diverse range of 3D object representations for recognition purposes raises questions about the organization of this catalog. One approach is to structure the catalog hierarchically, where objects are grouped into categories based on their visual complexity and similarities. For instance, at the top level, there might be basic shapes like a cylinder, while subsequent levels could include more complex objects like bipeds, post-boxes, and hat-stands. When attempting to match a current representation with a stored one, the search starts at the highest level and proceeds down the hierarchy until a suitable match is found.

This hierarchical organization allows for efficient recognition by narrowing down the search space based on the visual complexity of objects. Moreover, objects within the same class can be recognized as similar yet distinct, enabling finer distinctions between objects.

For example, both a Spitfire and a Hurricane fighter aircraft would be categorized under WWII fighter aircraft, but further down the hierarchy, they would have separate representations capturing their unique characteristics. While this hierarchical scheme simplifies the concept of catalog organization, more complex linking mechanisms, as suggested by Marr, may be needed to enhance search efficiency. Nonetheless, this simplified approach captures the core idea of Marr’s proposal, which offered a novel perspective on addressing the challenging problem of object recognition and cataloging.
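A toy version of the coarse-to-fine search (all names and data structure invented for illustration): descend the hierarchy, following matching branches and keeping the most specific level that still matches.

```python
catalogue = {
    "cylinder-like": {
        "biped": {"human": {}, "ape": {}},
        "WWII fighter aircraft": {"Spitfire": {}, "Hurricane": {}},
    },
}

def search(node, matches, path=()):
    """Follow matching branches downwards; settle for the coarser match otherwise."""
    for label, children in node.items():
        if matches(label):
            if not children:                    # leaf: most specific match found
                return path + (label,)
            deeper = search(children, matches, path + (label,))
            return deeper or path + (label,)    # fall back to the coarser category
    return None

# A stimulus whose description matches at three levels of specificity:
keywords = {"cylinder-like", "WWII fighter aircraft", "Spitfire"}
print(search(catalogue, lambda label: label in keywords))
# -> ('cylinder-like', 'WWII fighter aircraft', 'Spitfire')
```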

64
Q

Invariances
Marr's theory has 3:
Position invariance
Viewpoint invariance
Orientation invariance

A

Position: the model does not care where the object is in the visual field; it responds the same and recognises the object anywhere. But a human might not always be able to discern it: humans have poor peripheral vision, set by the retina in the very first stages of visual processing.

Viewpoint invariance: recognising objects from all sorts of angles. Humans sometimes struggle; we may misassign the major axis, and Marr's model gets this wrong as well, so they fail in the same way here, which speaks for the model. Eventually both succeed after exploiting additional depth cues from the 2-1/2D sketch.

Orientation invariance: Thatcherisation (inverting the eyes and mouth) illustrates a failure of this. In isolation the features look okay upside down, but recognition of identity takes longer if the face is upside down than the right way around. Human vision fails here, but Marr and Nishihara's model does not, as it does not care about orientation.

65
Q

Which face looks happier

A

There is a preference for the smile on the left side of the face.

The right hemisphere judges facial expression and emotions, so a smile in the left visual field strongly stimulates the right hemisphere (possibly). So the asymmetry arises in our brains:
ASYMMETRY OF PERCEPTION.

Marr and Nishihara's model fails here: why would we build up an asymmetric representation? Their theory could still survive with appropriate adjustments, but the test is whether it is consistent with the theory.

66
Q

Alternative forced choice

A

Good, not complicated.
A rating scale cannot be used, as it is subjective.
A scale can also give a floor effect if participants say 1 to everything, so conditions cannot be compared: messy.
The benefit of forced choice: good clean data with a clear insight as to what is happening.

67
Q

Biedermans geons

A

Geons = a set of Lego-like bricks. Biederman proposed around 36 of these which, when put together, can be used to build or describe any object we look at.

This does not fix the problem with orientation invariance, but offers an alternative building block.
As in Marr and Nishihara's scheme, objects are represented in terms of their structure.
The primitives of object recognition here are geons, not generalised cones.

Benefit: geons are defined by 2D features, which means that object recognition can follow directly from the primal sketch without the need for a contribution from the 2-1/2D sketch. Biederman did not deny that the 2-1/2D sketch exists and does what it does; rather, recognition goes from the primal sketch to the 3D model without passing through it.

68
Q

Bulthoff and Edelman
Viewpoint dependent memories

A

They devised 3D objects from bent paper clips and got participants to recognise the objects from different viewpoints. A 3D representation of the object in terms of building blocks should be formed at one viewpoint; when looking from a different viewpoint, participants should then use what is stored in memory.

They found participants poor at recognising the objects from new viewpoints: it is difficult to recognise complex unfamiliar objects from novel viewpoints, even when previously studied in conditions thought to promote 3D interpretations. They concluded that for these types of objects, recognition is viewpoint dependent.

Rather than storing 3D representations of objects in memory, we may store viewpoint-specific representations in memory: structural and image-based theories are both used.

69
Q

Viewpoint dependent memories

A

The number of viewpoints that need to be stored is not that great.
E.g. think of a car: we picture cars from a three-quarter view, as that is all we really need to tell cars apart.

70
Q

General criticisms of Marr's framework for vision

A

1) Anatomical studies show reciprocal connections between nearly all visual areas, including those higher up in the anatomical hierarchy, making a feed-forward model seem rather under-specified for such intricate hardware.

2) More recent connectionist models of visual processing include feedback connections to earlier stages, and thus are not purely bottom-up but also have a top-down element.

3) Some experimental phenomena indicate the importance of top-down processing, e.g. the rat-man figure and the Argyle illusion, where we think of shadows.

4) During the 80s and 90s it became increasingly apparent that edge-detection models were always going to struggle to achieve perfect segmentation of objects from backgrounds. More recent models make an initial estimate of edge location and use this to make a first guess of object structure, which is then fed back to edge detection to refine its computations.
5) We may not need to pass through the 2-1/2D sketch at all, as in Biederman's scheme.

71
Q

One last one

A

Biederman's geon proposal (section 3.1.2) challenges the need for information to first pass through a surface description stage (i.e. the 2-1/2D sketch).
This is not to say that surface descriptions do not exist; experiments with random dot kinematograms containing speed gradients (handout 13) and random dot stereograms (handout 12) show they do. Rather, in some circumstances it seems that this level of description is not necessary for the recognition of objects. Of course, vision performs many tasks and surface descriptions may be more important in some (e.g. grasping, navigation) than in others.
These criticisms (and many others) aside, the work of David Marr has had enormous impact on the vision community and, although in detail many of his ideas are slowly being shown to be wrong, the formality of his approach and the language of his ideas have been an inspiration to many who have followed.

72
Q

Iqra's important points

A

Tinbergen: used crude paper models of butterflies. His work suggested effective recognition of rival males, and a simple visual information-processing account of the three-spined stickleback.

Marr and Nishihara: 3D model description built from a hierarchy of 3D models.
Proposed the use of volumetric, axis-based primitives: basic shapes defined along a specific axis, capturing the volumetric structure of objects, not just their visible surfaces.
Using the generalised cone: a surface created by moving a cross-section along a smooth axis; the cross-section may vary in size but its shape is constant.
E.g. cylinder, traffic cone, cigar, bottle, pencil, banana, vase.
There is no clear evidence that complex objects are actually represented as cones.
An early stage of processing = identifying the object's major axis.
This is fed by information made explicit at the level of the 2-1/2D sketch.

73
Q

Object recognition model: Iqra's important points

A

Objects are described in object-centred coordinate systems, in terms of the structure of their component parts.

Designed to produce viewpoint-invariant object representations that could be stored in memory. Invariant means we should recognise the objects regardless of position in the visual field, but there is no experimental work confirming this.

74
Q

Rest

A

Thatcherisation (inverting features) = a failure of orientation invariance, because we obviously cannot always tell orientations apart.

Rotating an object's orientation through 180 degrees should have no effect on the model, but it can affect visual perception; experimental work does not support the model's prediction here, since the model does not fail but human vision does.

Rotating an object's orientation through a small angle (10 degrees) should not affect the model; experimental work supports this.

Mirror reversals of asymmetric faces = unusual effects in emotion perception.

Criticism: top-down processing is missing.
Object recognition (Biederman) = primal sketch straight to the 3D model, not via the 2-1/2D sketch.
Built from a finite set of 3D shapes having 2D descriptions, the geons: cubes, wedges, cones, cylinders.
For object recognition the intermediate stage is not needed; we can go straight from the primal sketch to object recognition because of the way geons are defined in 2D. So, being defined in 2D, they go straight to the 3D model.