Perceiving Depth Flashcards
Sources of depth information that would be available in a 2D picture (cues that rely on only one eye)
Pictorial (monocular)
– Information obtained from/relating to motion
Kinetic Cues
– Cues based on sensing the position of the eyes and muscle tension
Oculomotor
– Information obtained by comparing input from the left and right eyes (binocular)
Stereoscopic
Pretend this puppy is driving this car… Objects nearer to him appear to move
Faster
Motion Parallax/Motion Perspective
- when the observer moves, the displacement of an object’s image on the eye depends on the object’s distance
- closer objects move more than farther objects
- Optic flow: when the whole visual field is considered,
Expansion/Contraction
- when an object approaches, its image expands
- if the object is on a collision path with the observer, the expansion is symmetric
Accretion/Deletion of Texture
- when one surface moves relative to another, the nearer surface progressively covers (deletion) or uncovers (accretion) background texture on the farther surface
Kinetic Information
- means relating to motion
- motion perspective/motion parallax
- optical expansion/contraction
- accretion/deletion of texture
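A minimal sketch of the motion parallax relationship listed above, assuming an observer translating in a straight line and a stationary object directly to the side (abeam). Under that assumption the object's angular speed is roughly v/d, so nearer objects sweep across the eye faster. The function name and the speed and distance values are hypothetical, chosen only for illustration.

```python
import math

def angular_speed_deg(observer_speed_m_s, distance_m):
    """Angular speed (deg/s) of a stationary object directly abeam
    of an observer moving in a straight line: omega = v / d."""
    return math.degrees(observer_speed_m_s / distance_m)

speed = 10.0  # m/s, hypothetical driving speed
for distance in (5.0, 50.0, 500.0):  # metres, hypothetical objects
    omega = angular_speed_deg(speed, distance)
    print(f"object at {distance:6.1f} m sweeps {omega:8.3f} deg/s")
```

The same observer speed produces a tenfold drop in image speed for each tenfold increase in distance, which is the "closer objects move more" rule.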
Stereoscopic: Binocular Disparity
- differences in the two eyes’ views of an object
- amount of disparity depends on the distance of an object from the observer
- the two images of a three-dimensional world are not the same
Stereoscopic: Horopter
- refers to the set of points in the world that have zero binocular disparity relative to the fixated point (their images fall on corresponding positions in the two eyes)
- crossed disparity indicates that a point is nearer to the observer than the point being fixated.
- uncrossed disparity indicates that a point is farther from the observer than the point being fixated
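A rough sketch of how disparity relates to distance, assuming points that lie straight ahead on the midline and an interocular distance of 63 mm (an assumed typical value, not from the notes). Disparity is approximated here as the difference between the vergence angle of the point and that of the fixated point; the sign convention (positive = crossed) is also an assumption made for the example.

```python
import math

IPD_M = 0.063  # assumed interocular distance in metres

def vergence_angle(distance_m, ipd_m=IPD_M):
    """Angle (radians) between the two lines of sight to a midline point."""
    return 2.0 * math.atan(ipd_m / (2.0 * distance_m))

def disparity_deg(point_m, fixation_m, ipd_m=IPD_M):
    """Disparity of a point relative to fixation; positive means nearer."""
    diff = vergence_angle(point_m, ipd_m) - vergence_angle(fixation_m, ipd_m)
    return math.degrees(diff)

def classify(point_m, fixation_m):
    """Label the disparity as crossed, uncrossed, or zero."""
    d = disparity_deg(point_m, fixation_m)
    if d > 0:
        return "crossed"      # point is nearer than fixation
    if d < 0:
        return "uncrossed"    # point is farther than fixation
    return "zero"             # point is at the fixation distance

for point in (0.5, 1.0, 2.0):  # metres, with fixation at 1 m
    print(point, classify(point, 1.0), round(disparity_deg(point, 1.0), 3))
```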
Oculomotor: Accommodation
- refers to changes in the shape of the lens to achieve focused images at varying distances
- may provide distance info via unconscious sensing of the muscular movements (in the ciliary muscles) that produce the lens changes
Oculomotor: Convergence/Divergence
- refers to the turning of the two eyes to bring a particular point onto the center of fixation (fovea) of each eye
- provides depth info via unconscious sensing of the muscular movements used to turn the eyes
Divergence: when the eyes turn outward to fixate something farther away
- doesn’t work past 2 meters (near space)
Oculomotor cues provide ______
metric: absolute distance information
* limitation: not useful past about 2 meters, where the lens (accommodation) is as thin as it gets
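One way to see why convergence gives absolute distance only in near space: a sketch that converts between fixation distance and convergence angle, assuming a midline target and an interocular distance of 63 mm (an assumed value). The function names are hypothetical.

```python
import math

IPD_M = 0.063  # assumed interocular distance in metres

def convergence_deg(distance_m, ipd_m=IPD_M):
    """Convergence angle (degrees) needed to fixate a midline target."""
    return math.degrees(2.0 * math.atan(ipd_m / (2.0 * distance_m)))

def distance_from_convergence(angle_deg, ipd_m=IPD_M):
    """Invert the relation: absolute distance from a sensed angle."""
    return ipd_m / (2.0 * math.tan(math.radians(angle_deg) / 2.0))

previous = None
for distance in (0.25, 0.5, 1.0, 2.0, 4.0, 8.0):
    angle = convergence_deg(distance)
    change = "" if previous is None else f"(change {previous - angle:.2f} deg)"
    print(f"{distance:5.2f} m -> {angle:6.2f} deg {change}")
    previous = angle
```

Each doubling of distance halves the angle, so beyond a couple of meters the remaining change is small, consistent with the 2-meter limit stated above.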
Pictorial Information (monocular)
- monocular cues (can operate with only one eye)
- mostly relate to the rules of optics and geometry that govern the projection of the world onto the retina
- involve using the rules of projection in reverse (inverse optics)
- laws of optics: scene → retina
- inverse optics: retina → scene
What are some results of the laws of optics that the brain might use to infer depth of objects in a 2D image?
- Nearer objects take up more of the visual field
- the farther away an object is, the nearer its image lies to the horizon (vanishing point)
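A small sketch of the first rule above, assuming a flat object viewed head-on: the forward direction (scene → retina) gives the visual angle from size and distance, and the reverse direction (retina → scene) recovers distance when the physical size is assumed known. The 1.8 m height and the function names are hypothetical.

```python
import math

def visual_angle_deg(size_m, distance_m):
    """Laws of optics (scene -> retina): angle subtended by an object."""
    return math.degrees(2.0 * math.atan(size_m / (2.0 * distance_m)))

def distance_from_angle(size_m, angle_deg):
    """Inverse optics (retina -> scene): distance, given an assumed size."""
    return size_m / (2.0 * math.tan(math.radians(angle_deg) / 2.0))

person_height = 1.8  # metres, hypothetical known size
for distance in (2.0, 20.0):
    angle = visual_angle_deg(person_height, distance)
    recovered = distance_from_angle(person_height, angle)
    print(f"{distance:5.1f} m -> {angle:6.2f} deg -> recovered {recovered:5.1f} m")
```

Without the size assumption, the same visual angle is consistent with a small near object or a large far one, which is why the inverse step needs extra information.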
Perception as an inference:
- the brain infers the scene based on its probability
- the nervous system calculates the probability of each scene given the sensory evidence and prior knowledge, and chooses the scene that has the highest probability
- unnoticed judgement: Al-Haytham/Alhazen
- unconscious inference: von Helmholtz
- Bayes rule
Bayes Rule
- the probability of a specific scene given an image is proportional to the probability that the scene gives rise to that image, times the probability of that scene in general
- P(Sx | I) [posterior] ∝ P(I | Sx) [likelihood] × P(Sx) [prior]
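A minimal worked example of the rule above for two candidate scenes. The scene labels and all probability values are made up for illustration; only the multiply-then-normalize structure comes from the rule itself.

```python
def posterior(likelihoods, priors):
    """Posterior over scenes: likelihood times prior, then normalize."""
    unnormalized = {s: likelihoods[s] * priors[s] for s in priors}
    total = sum(unnormalized.values())
    return {s: value / total for s, value in unnormalized.items()}

# Hypothetical values: P(Sx), the prior belief in each scene
priors = {"convex": 0.7, "concave": 0.3}
# Hypothetical values: P(I | Sx), how well each scene explains the image
likelihoods = {"convex": 0.4, "concave": 0.6}

post = posterior(likelihoods, priors)
best = max(post, key=post.get)  # scene with the highest posterior
print(post, best)
```

Here the image slightly favors the concave scene, but the stronger prior for the convex scene wins, so the chosen percept is "convex".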
Combining depth cues
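- P(d | c1, c2, c3, …) ∝ P(c1 | d) P(c2 | d) … P(d), where d is depth and c1, c2, … are the cues (this assumes the cues are independent given the depth)
- the nervous system follows a statistically optimal rule in combining different cues (a reliability-weighted average if P(d) is uniform and the cue likelihoods are Gaussian)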
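A sketch of the weighted-average case, assuming each cue gives an independent Gaussian estimate of depth and the prior P(d) is uniform. Under those assumptions each cue is weighted by its reliability (inverse variance). The cue estimates and noise levels below are hypothetical.

```python
def combine_cues(estimates, sigmas):
    """Reliability-weighted average of independent Gaussian cue estimates.

    Returns the combined estimate, its standard deviation, and the weights.
    """
    precisions = [1.0 / (s * s) for s in sigmas]   # reliability = 1 / variance
    total = sum(precisions)
    weights = [p / total for p in precisions]
    combined = sum(w * e for w, e in zip(weights, estimates))
    combined_sigma = (1.0 / total) ** 0.5
    return combined, combined_sigma, weights

# Hypothetical: disparity says 2.0 m (sigma 0.5), texture says 3.0 m (sigma 1.0)
estimate, sigma, weights = combine_cues([2.0, 3.0], [0.5, 1.0])
print(estimate, sigma, weights)
```

The more reliable cue gets the larger weight, and the combined estimate is less variable than either cue alone.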
Which cue(s) provide non-metric information?
Occlusion
Which cue(s) provide metric information?
Convergence, Accommodation, and Familiar Size
Which cue(s) provide relative metric information?
Motion Parallax
Relative Size
Relative Height
Binocular Disparity
Pictorial Depth cues
- Relative Size
- Familiar Size
- Texture Gradient
- Relative Height
- Linear Perspective
- Occlusion
- Aerial Perspective
Bayesian Inference
Combining multiple sources of information to arrive at a final percept of the depth to an object or a final interpretation of a 2D image
The final percept will be some combination of prior beliefs and the evidence at hand
Priors
Before even looking at an image, we have an a priori belief about how likely it is that the world is in a given state
- The evidence we obtain from the image is combined with these a priori beliefs
Bayesian Inference: The Basic Idea
- Multiple kinds of information are taken into account to arrive at a final decision or percept
- A priori, humans have some belief about the likelihood of the world being in a given state (PRIOR)
- The retinal image could have been produced by many possible states of the world. However, assuming a given state of the world, that state has a likelihood of producing the retinal image supposing an image of the scene were taken from a random viewpoint (LIKELIHOOD)
Likelihood:
Assuming a given state of the world is true, the likelihood of that state producing the 2D image shown
Posterior:
Probability that reality is in a given state of the world, given that we observed image I
Generic viewpoint:
The vast majority of random viewpoints of a scene will fall into this category; these viewpoints provide similar information about the scene and suggest roughly the same surfaces
Accidental viewpoint:
A kind of “freak accident” viewpoint. The 3D surfaces suggested at this viewpoint are very unlike those existing in the real world.