Lecture 13- Flashcards
Motion provides information about:
Ego-motion (self-motion)
Time to contact
Surface structure
Identification of moving people and animals (defeats camouflage)
Emotion
The physical world is stable, but the retinal image is in constant motion.
How do we detect movement?
The physical world is stable but the retinal image is in constant motion. To pick up the movement of objects, we need to detect their retinal motion against this constant and complex retinal flow.
First, the retinal flow produced by eye movements must be removed to allow compensation. Two accounts of this compensation: inflow theory and outflow theory.
The inflow theory suggests that feedback from stretch receptors in eye muscles helps the brain compensate for retinal motion caused by eye movements.
In contrast, the outflow theory, proposed by Helmholtz, suggests that compensation occurs by comparing retinal motion with the signal that initiated the eye movement, rather than relying on feedback.
To test these theories, one can close one eye and gently push the other to induce a passive eye movement. The observed effect contradicts the inflow theory: the world appears to shift in the opposite direction to the eye movement. This suggests that compensation does not rely solely on feedback from eye muscles. Instead, it supports the outflow theory, where compensation is based on comparing retinal motion with eye movement instructions.
However, it’s possible that both theories play a role in compensating for eye movements, along with full-field motion cues. So, while the evidence challenges inflow theory, it doesn’t conclusively prove outflow theory as the sole explanation.
Dynamic raw primal sketch = encoding direction and speed
In motion analysis, the initial step involves encoding the direction and speed of individual image features, which are then added to the raw primal sketch elements. This results in a dynamic raw primal sketch, represented as EDGE(position, orientation, size, contrast, direction, speed …). Humans appear to utilize at least two different systems for extracting motion information: a long-range system and a short-range system. However, recent experiments have prompted a re-evaluation of previous evidence, indicating the existence of four distinct motion processing mechanisms.
Long range = apparent motion; no motion aftereffect; operates over long distances and short temporal gaps.
Long range
The long-range system, also known as classical apparent motion, operates indirectly by tracking individual image features over time. This system is highly adaptable, recognizing various complex objects as features. For example, different sequences of images can create the illusion of an object moving laterally. Even when the object changes shape or luminance between frames, the impression of motion remains. Experimenting with sequences of images, such as those in Fig 2, demonstrates this apparent motion effect. However, this system faces a correspondence problem similar to stereopsis: determining which dot in one frame corresponds to a dot in the next frame. In more complex scenarios, like random dot kinematograms, where two different patterns are presented in sequence, the visual system must decipher the correspondence between dots across frames. At higher frame rates, apparent motion closely resembles real motion, as the rapid succession of images blurs together, making the motion appear smooth. This phenomenon underscores the human visual system’s ability to perceive motion even in rapidly changing visual stimuli.
The short-range system = real motion
The short range system actually signals the speed and direction of moving image features directly and is thought to rely upon specific motion sensors, sensitive to spatial and temporal luminance changes.
The simplest way that such a system might work is by using a sequence detector: sensors at slightly different positions are connected so that the motion detector responds only when the primary sensors are activated in the right order and with appropriate timing. The response of such a detector can, in principle, signal both direction and speed.
A physiologically plausible mechanism
The problem addressed here is detecting a small movement of a light spot or contour over a short period of time, which is part of a theory for understanding how motion is detected.
One approach to solving this problem is using a Reichardt detector, outlined in Figure 3. This detector consists of two parts: one responds to motion in one direction, and the other responds to motion in the opposite direction. Each part combines signals from two light sensors, one of which is delayed. For example, if a spot moves to the right between two frames, the left-hand sensor will activate first and its signal will be delayed and combined with the signal from the right-hand sensor, causing the unit to respond to rightward motion. Similarly, the unit on the right responds to leftward motion.
However, both units also respond to a stationary bright stimulus. To solve this, the outputs of the two units are compared: if only one unit responds, there was motion, and which unit responded determines its direction; if both respond equally, the stimulus was stationary.
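The delay-and-compare scheme above can be sketched in a few lines of Python. This is an illustrative toy, not a physiological model; the sensor sequences and delay value are invented for the example.

```python
def reichardt(left, right, delay=1):
    """Opponent Reichardt detector over two adjacent sensor time series.

    left, right: samples from two neighbouring light sensors over time.
    Returns a signed response: positive = rightward motion,
    negative = leftward, ~0 = stationary stimulus.
    """
    response = 0
    for t in range(delay, len(left)):
        # Half-detector 1: left sensor fired first (rightward motion)
        rightward = left[t - delay] * right[t]
        # Half-detector 2: right sensor fired first (leftward motion)
        leftward = right[t - delay] * left[t]
        response += rightward - leftward  # opponent comparison
    return response

# A spot sweeping left-to-right activates the sensors in sequence:
print(reichardt([1, 0, 0], [0, 1, 0]))   # positive -> rightward
# A sustained stationary stimulus drives both halves equally:
print(reichardt([1, 1, 1], [1, 1, 1]))   # 0 -> no motion signalled
```

The opponent subtraction is what removes the response to stationary stimuli, exactly as described above.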
One final tweak
If we replace the circular lightness detectors with orientation-selective sensors, the motion detector also becomes selective for contour orientation.
The aperture problem
Where might this integration take place?
The aperture problem applies to any motion-detecting mechanism, biological or artificial, whose window on the stimulus (its receptive field) is smaller than the object. The ambiguity lies in the moving contours themselves, not in the mechanism that underlies their detection.
When a motion sensor can only observe a small part of an image, it encounters what’s known as the aperture problem. This problem arises because the motion perceived by the sensor is limited to the component perpendicular to an edge, not the actual motion along the edge.
For example, when a large rectangle moves diagonally upwards to the right, a motion sensor would detect rightward motion for the right edge and upward motion for the top edge. In simpler terms, the sensor only sees part of the movement, not the whole picture. This poses a challenge for the visual system, as it must integrate all these localized motion signals to understand the true motion of the entire object. This integration process likely occurs in areas like MT in the brain, rather than in the initial visual processing area, V1.
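The edge-only measurement can be made concrete with a small sketch (the function name and numbers are my own, for illustration): projecting the true velocity onto the edge's normal gives exactly what a local sensor reports.

```python
import math

def component_through_aperture(true_velocity, edge_angle_deg):
    """Return the motion a local sensor sees for an edge at the given
    orientation: only the component perpendicular to the edge.
    Edge angle 0 = horizontal edge, 90 = vertical edge."""
    a = math.radians(edge_angle_deg)
    nx, ny = -math.sin(a), math.cos(a)   # unit normal to the edge
    vx, vy = true_velocity
    s = vx * nx + vy * ny                # signed speed along the normal
    return (s * nx, s * ny)

# Rectangle moving diagonally up-right with true velocity (1, 1):
print(component_through_aperture((1, 1), 90))  # vertical right edge -> ~(1, 0), pure rightward
print(component_through_aperture((1, 1), 0))   # horizontal top edge -> ~(0, 1), pure upward
```

Both sensors report different directions for the same object, which is why later integration (e.g. in MT) is needed to recover the true diagonal motion.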
Motion aftereffect
After adapting to movement in one direction, stationary objects appear to move in the opposite direction (fatigue of direction-selective units).
Direction = comparing the amount of response in units selective for opposite directions.
Stationary objects = balanced response, but adapting to one direction destroys this balance through direction-selective fatigue, producing a signal for movement in the opposite direction.
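The opponent account of the aftereffect can be sketched numerically (the gain values are toy numbers I have invented for illustration):

```python
def net_motion_signal(drive, gain_up=1.0, gain_down=1.0):
    """Opponent comparison of an 'up' unit and a 'down' unit.
    Positive = upward motion signalled, negative = downward.
    A stationary pattern drives both units equally."""
    return gain_up * drive - gain_down * drive

# Before adaptation, a stationary pattern yields a balanced response:
print(net_motion_signal(1.0))                  # 0.0 -> no motion seen
# After adapting to upward motion, the 'up' unit is fatigued:
print(net_motion_signal(1.0, gain_up=0.5))     # -0.5 -> downward aftereffect
```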
More complex motion aftereffects, e.g. from a rotating spiral, can be understood in the same way. If the adapting pattern appeared to expand and rotate rightwards, then a stationary spiral observed afterwards will appear to contract and rotate leftwards.
Stimuli processed by the long-range motion system do not produce motion aftereffects, which is evidence that these two types of stimuli are processed by two different mechanisms: long- and short-range.
The dynamic full primal sketch
Common fate: the most powerful grouping cue, signalling that features belong together. Features that move together in the same direction at the same speed are grouped. The rule must in fact be more complex:
Motion through the world produces smooth but complex gradients of speed and direction, so not all features will move with exactly the same speed and direction, yet they all belong to the same object.
The dynamic 2 1/2-D sketch
Timing of action
When something moves towards us or we move towards it, we perceive an expanding pattern of motion. This expanding motion pattern has a focal point called the Focus of Expansion (FOE), which indicates whether an approaching object will collide with us or pass by. Even infants react to this cue, instinctively flinching if presented with a pattern suggesting an object on a collision course. Similarly, we can time our actions based on these expanding flow patterns.
The rate of expansion of the image depends on both the object’s distance and its speed of approach. Although measuring expansion rate doesn’t give separate estimates of distance and speed, it provides something more valuable: the “time to contact,” which is the distance to the target divided by the speed of approach. This measure helps us gauge when to act, regardless of whether we’re moving fast towards a distant object or slowly towards a nearby one. Human observers excel at estimating time to contact from expanding flow patterns, and various organisms, from drivers to diving gannets, likely use similar measures to guide their actions.
Extracting 3D info from relative motion
Time to contact not only helps estimate the 3D arrangement of surfaces in our surroundings but also provides a depth map of the external world. By measuring the rate of expansion across the image, we can gauge the time it takes to reach various points, essentially creating a time-based depth map. This method isn’t limited to movement towards a specific target; it works for any direction of movement.
Motion parallax further aids depth perception: closer objects appear to move faster across our retina than distant ones, a phenomenon evident when observing telegraph poles and trees from a moving train.
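Under a small-angle approximation, the parallax rule is just speed over distance. The numbers below are made up for illustration:

```python
def retinal_speed(observer_speed, distance):
    """Approximate angular speed (rad/s) of a point at `distance` metres,
    for an observer translating sideways at `observer_speed` m/s,
    using the small-angle approximation omega ~= v / d."""
    return observer_speed / distance

# From a train travelling at 30 m/s:
print(retinal_speed(30, 10))    # 3.0 rad/s: a nearby pole streaks past
print(retinal_speed(30, 100))   # 0.3 rad/s: a distant tree drifts slowly
```

The tenfold difference in retinal speed for a tenfold difference in distance is the depth cue.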
Additionally, movement relative to a surface generates smooth motion gradients in the image, with the type of gradient indicating the orientation of the surface—horizontal or vertical tilt. As the surface angle increases, the speed gradient in the image becomes more pronounced. By analyzing these gradients in retinal flow patterns, we can extract valuable information about the layout of 3D surfaces. Experimental evidence suggests that humans excel at using this information, supported by neurophysiological findings indicating selectivity for speed gradients in certain brain cells, such as those found in MT.
Biological motion
Gunnar Johansson (1973) attached a few lights to subjects’ joints (Fig 6) and filmed them in the dark so that only the lights were visible. A single frame from the resulting movie looks like an incoherent cluster of lights. But, as soon as the movie runs and the lights move about, even though the stimulus is now physically more complex, it is instantly recognisable as a person moving about. It is quite possible, using this kind of display, to make sense of two people dancing together, to recognise whether they are male or female, and even to estimate the weight of a box (also visible only as a few lights) that is lifted. The type of motion present in these stimuli has been referred to as ‘biological motion.’
Evidence from single-cell recordings in macaque monkey suggests that some cells in area STP (superior temporal polysensory area), which receives input from both the dorsal ‘where’ (or ‘motion’) pathway and the ventral ‘what’ (or ‘form’) pathway, are responsive to various body movements, including walking.
Humanistic motive and intent
Even though the stimuli were chosen to be very simple cartoons, observers spontaneously attributed human characteristics to them.
So the way things move is used to work out causal relations, e.g. A hits B and that is why B moved, and in subtler, more sophisticated ways to attribute humanistic motive and intent.
Long range
Apparent motion but no motion aftereffect
Over long distances and short temporal gaps
Can tolerate changes in- colour, shape, luminance
The long-range motion detection system focuses more on how things move rather than their specific details like color, shape, or brightness. So, even if something changes color, shape, or brightness as it moves, this system can still detect its motion because it tracks the movement of individual features over time. In simpler terms, it pays attention to how things move rather than what they look like, which helps it tolerate changes in color, shape, and brightness while still recognizing motion.
Short-range motion detection system (vs long-range)
The short-range motion detection system, on the other hand, focuses on detecting motion within a smaller, localized area of the visual field. It may be more sensitive to specific details like color, shape, and brightness changes because it operates within a limited spatial range. This system might be better at detecting fine details or subtle changes in nearby objects but may not be as effective at perceiving motion across larger distances or with rapidly moving objects. Essentially, while the long-range system tracks motion over a broader scope, the short-range system is more attuned to detecting motion in specific areas or objects within its immediate vicinity.
Real, continuous motion, detected directly by motion detectors; shows the motion aftereffect. The short-range motion detection system is more sensitive to specific details and changes in the visual field. When exposed to a moving stimulus for a while, it adapts to that motion, so when the stimulus stops it may continue signalling motion in the opposite direction, causing the motion aftereffect. Long-range motion detection focuses on overall motion patterns and is less likely to show this effect.
When motion becomes faster than the temporal resolution of the sensors, the true motion is unknown and ambiguous. E.g. a cell selective for upward motion responds strongly to a moving square: we see the edge move upwards, but the square could also be moving left or right; we cannot tell.
In short-range motion detection, there is a problem when the speed of the motion exceeds the temporal resolution of the sensors: they cannot accurately perceive the true motion, leading to ambiguity. So, while we perceive the motion as upward, the true direction of the object’s movement remains unknown due to the limitations of the short-range system.
In contrast, long-range sensors, which detect motion over larger distances, can tolerate faster motion because they’re designed to track more gradual changes in movement direction over longer periods.
Why do we remove the retinal motion that arises due to eye movements from visual analysis?
Not useful info
We eliminate retinal motion caused by eye movements from visual analysis because it can interfere with our perception of stable objects in the environment. If our visual system were to interpret every motion on the retina as movement in the external world, it would lead to confusion and distortions in our perception. Therefore, the brain filters out retinal motion caused by eye movements to ensure that we perceive the world as stable and coherent.
Retinal cues
Retinal motion cues are indeed important for many visual tasks, such as tracking moving objects or estimating motion direction. However, when it comes to analyzing overall scene motion, the visual system needs to compensate for the motion of the eyes themselves to avoid interpreting that motion as motion in the external world. This is why, in some cases, the visual system needs to distinguish between retinal motion cues arising from eye movements and those arising from motion in the external environment.
So not useful info when looking at overall motion
Object boundaries
Causality
Motive and intent
Time to contact
Surface slant
Relative depth
Let’s take the example of a person walking towards you.
- Object boundaries: Retinal cues help you perceive the person’s outline and distinguish them from the background.
- Time to contact: You estimate how long it will take for the person to reach you based on their speed and distance.
- Relative depth: You perceive the person as closer to you compared to objects in the background.
- Surface slant: If the person is walking on an inclined surface, you might notice changes in their posture or gait, indicating the slope.
- Causality: You understand that the person’s motion is driven by their intention to approach you.
These retinal cues contribute to your overall understanding of the person’s motion, allowing you to react appropriately to their approach. However, to fully comprehend their movement trajectory and predict their path accurately, your brain integrates these cues with other sensory information and contextual factors. We don’t want to eliminate retinal cues. Instead, in certain contexts, such as when studying overall motion perception or when focusing on specific motion processing mechanisms, researchers may isolate or control for specific cues to better understand their individual contributions. However, in everyday perception and interaction with the environment, retinal cues are essential for making sense of motion and understanding the world around us.
Reichardt model (for short range)
The Reichardt model is a computational model used to detect motion in visual systems. It works by comparing signals from two adjacent light sensors over time. In short-range motion detection, this model is effective because it can detect small, rapid changes in motion, like those caused by nearby objects moving quickly. However, for long-range motion detection, where objects are farther away and moving more slowly, the Reichardt model isn’t as effective because it’s better suited to rapid changes in motion than to slower, more gradual movements. By adding orientation-selective sensors, the detector becomes selective for contour orientation as well as direction.
The model has pairs of units responding to leftward (L) and rightward (R) motion, i.e. opposite directions.
If both units respond = stationary object.
Units are tuned to faster motion if the distance between the pair of inputs for each unit is increased.
- Pairs of units response to left (L) and right (R) motion: In this model, there are pairs of neural units that are sensitive to motion in opposite directions. For example, one unit might detect leftward motion (L), while its paired unit detects rightward motion (R).
- If both units respond, it indicates a stationary object: If both units in a pair respond equally, it suggests that there is no motion occurring. This is because if an object is moving, one unit in the pair should respond more strongly than the other, indicating the direction of motion. But if both units respond equally, it suggests that the object is not moving.
- Units tuned to faster motion with increased distance between inputs: with the delay fixed, increasing the separation between the two inputs of a unit means the stimulus must cross a larger gap in the same time, so the unit becomes tuned to faster motion.
Overall, the model compares the responses of pairs of units sensitive to opposite directions of motion, with the relative response indicating the presence, direction, and speed of motion.
TTC
Expanding patterns of retinal motion are used to calculate time to contact (TTC).
TTC = 1 / expansion rate.
The focus of expansion is used to help calculate the direction of heading.
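A toy calculation (the distances, speeds, and sampling step are invented for illustration) shows why TTC needs neither distance nor speed on its own: image size is proportional to 1/distance, so size divided by its growth rate already gives time to contact.

```python
def time_to_contact(image_size, growth_rate):
    """TTC = image size / rate of image expansion
    (equivalently 1 / relative expansion rate)."""
    return image_size / growth_rate

# Object 10 m away, approaching at 2 m/s: true TTC = 5 s.
# Image size ~ 1/distance (arbitrary units), sampled 0.1 s apart:
size_now = 1 / 10.0       # distance 10.0 m
size_next = 1 / 9.8       # distance 9.8 m, 0.1 s later
rate = (size_next - size_now) / 0.1
print(time_to_contact(size_now, rate))   # ~4.9 s, close to the true 5 s
```

Neither the 10 m nor the 2 m/s appears in the computation; only the image measurements do.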
Inflow vs outflow theory
Outflow= motion compared with eye movement instructions. Also known as corollary discharge theory and efference copy
Inflow = retinal motion compared with feedback signals from the eye muscles.
Inflow theory cannot explain the passive eye-movement result: when the eye is pushed, the stretch receptors still signal the movement, so inflow predicts the retinal motion should be compensated, yet the world appears to move. Inflow alone therefore cannot account for how eye movements are compensated.
The inflow theory suggests that sensory feedback from eye muscles, called stretch receptors, is sent to the brain to compensate for retinal motion. This feedback allows the brain to adjust visual perception based on eye movements.
In contrast, the outflow theory, also known as corollary discharge theory, proposes that the brain predicts retinal motion by comparing the intended eye movement signal with actual retinal motion. This prediction allows the brain to separate self-generated motion from external motion in the visual field.
Inflow Theory: Imagine you’re driving a car and your friend is sitting next to you. In this theory, it’s like your friend is constantly telling you about the movements they see you making while driving. So, if you turn the steering wheel to the left, your friend tells you, “Hey, you turned the wheel to the left!” This constant feedback helps you adjust your driving based on what you’re doing.
Outflow Theory (Corollary Discharge Theory): Now, think of it this way: before you make a turn while driving, your brain sends a message to itself saying, “Hey, I’m about to turn left.” This message is like a prediction of what’s going to happen. So, when you actually make the turn, your brain already knows what to expect and can adjust your perception accordingly, without needing constant feedback from your movements.
The correspondence problem of motion
The problem of associating image points in one movie frame with the same points in a subsequent frame.
The aperture problem
The problem that, for a moving oriented luminance contour viewed through an aperture,
the true direction of motion is ambiguous.
The aperture problem arises when observing a moving oriented luminance contour through a restricted aperture, such as a small window or a limited field of view. In this scenario, the true direction of motion of the contour becomes ambiguous because only a portion of the contour is visible. Due to this restricted view, the observer cannot accurately determine the complete motion of the contour. This problem highlights the limitations of perceiving motion when only partial information is available, leading to an ambiguous interpretation of the motion direction.
Other cues, e.g. convergence, calculate the distance to nearby points of fixation.
What can motion provide, and not provide, info about?
Provides: TTC, surface slant, relative depth, object trajectory.
Does not provide: relative or absolute object size.
Is it desirable to remove the retinal motion that arises due to eye movements from further visual analysis, because it provides no useful info about the visual world?
Yes, but it is still important for some tasks, e.g. navigation.
When the frame rate of a series of displaced images becomes faster than the temporal resolution of the visual system, apparent motion becomes the same as real motion. Explain.
Temporal resolution is the ability of our visual system to perceive changes in stimuli over time. When the frame rate exceeds this resolution, the system can no longer register the individual frames, so the sequence is not jerky; instead, the frames blend into smooth, continuous motion.
So when the frame rate exceeds temporal resolution, the images blend together and we get continuous motion (flicker fusion). Apparent motion = real motion; the two become indistinguishable.
In optic flow, what can the focus of expansion be used for (not time to contact)? Why, and what for?
TTC matters, but the system first needs to know what direction it is heading in before it can assess contact.
Focus of expansion = can be used to help calculate the direction of heading.
Relative motion cues: what do they and do they not provide?
Object boundaries
Causality
Motive and intent
Time to collision
Not relative reflectance
What is the aperture problem
The problem that, for a moving oriented luminance contour viewed through an aperture, the true direction of motion is ambiguous.