Seeing in 3D Flashcards

Question 1

Q

What is 3D vision?

Answer

A

When a 3D object or scene is projected onto the 2D retina, information about the third dimension is lost.
3D objects with very different shapes can give rise to exactly the same retinal image.
There is insufficient information in the retinal image alone to reconstruct the shape of the object.
An infinite number of 3D shapes could give rise to a given 2D projection.
These shapes can only be disambiguated by knowing the distance and the depth (relative distance) of their features.
The concept originates with Berkeley and the veil of perception, whereby the perception of an object and the object itself are two distinct entities, and all we can be certain of is the existence of the perception.

Question 2

Q

What are direct cues for depth and distance?

Answer

A

Sources of information for which there is a direct 1:1 correspondence between physical parameters and physiological signals.
The direct cues are accommodation, vergence eye movements and binocular disparity.

Question 3

Q

What are pictorial cues for depth and distance?

Answer

A

Sources of information about depth and distance for which there is no direct correspondence between physical parameters and physiological signals.
Instead, they derive from our ability to learn about the relationships between distance and other cues in the environment.
Pictorial because they’re the cues used by artists to depict depth and distance in paintings.

Question 4

Q

What is accommodation?

Answer

A

A direct cue.
The process of adjusting the thickness of the lens of the eye in order to bring something into focus.
When the ciliary muscles are completely relaxed, the lens is flat and the eye is focused at infinity.
When the ciliary muscles contract, the lens becomes fatter and the focal length decreases, so that at the limit of our near vision the lens is as fat as it can be.
Theoretically, the effort produced by the ciliary muscles to bring an object into focus, or the value of the nervous signal causing them to contract, could be read out to provide an index of the distance to the plane of focus.
This is not a useful source of information because the change in lens shape is only significant for distances up to about 2 m.
The readout might specify distance to the point of focus but it provides little information about depth, the relative distance between two points.
Points in front of or behind the plane of focus are blurred, and the blur contains no useful information about whether the point is nearer or further than the plane of focus, or by how much.
Recovering 3D shape from accommodation would entail keeping track of multiple readings from successive fixations over time.
This would be a slow process that would only be possible over short distances.
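
The short useful range of accommodation can be illustrated with a small calculation. This is a minimal sketch, assuming the standard optics convention (not stated on the card) that focusing demand in dioptres is the reciprocal of the viewing distance in metres; the function name is hypothetical.

```python
def accommodation_demand(distance_m: float) -> float:
    """Focusing demand in dioptres (D) for a target at distance_m metres.

    Assumption: a relaxed eye is focused at infinity, i.e. a demand of 0 D.
    """
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return 1.0 / distance_m


if __name__ == "__main__":
    # Each step doubles the distance; the change in demand shrinks rapidly.
    for d in (0.25, 0.5, 1.0, 2.0, 4.0, 8.0):
        print(f"{d:5.2f} m -> {accommodation_demand(d):.3f} D")
```

Under this assumption, moving a target from 0.25 m to 0.5 m changes the demand by 2 D, whereas moving it from 4 m to 8 m changes it by only 0.125 D, which is consistent with the signal carrying little information beyond a couple of metres.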

Question 5

Q

What are vergence eye movements?

Answer

A

Vergence is the angle formed between the optic axes of the two eyes when we fixate on a particular point in the scene.
The nearer the point, the more the eyes turn inwards, or converge, to fixate it.
Berkeley discounted this as a viable source of information about distance; however, it is useful for distances up to 6 m.
Vergence codes for distance, not depth, so depth can only be inferred from multiple fixations.
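
The relationship between fixation distance and vergence angle can be sketched with simple trigonometry. This is an illustrative calculation only, assuming the fixated point lies straight ahead on the midline and an eye separation of 65 mm; the function name and default value are assumptions.

```python
import math


def vergence_angle_deg(distance_m: float, interocular_m: float = 0.065) -> float:
    """Angle between the two optic axes when fixating a midline point.

    Assumption: symmetric fixation, so each eye rotates inwards by
    atan((interocular / 2) / distance).
    """
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return math.degrees(2.0 * math.atan(interocular_m / (2.0 * distance_m)))


if __name__ == "__main__":
    for d in (0.25, 0.5, 1.0, 2.0, 6.0, 20.0):
        print(f"{d:5.2f} m -> {vergence_angle_deg(d):.2f} deg")
```

With these assumptions the angle is about 7.4 degrees at 0.5 m but only about 0.6 degrees at 6 m, so the signal changes very little with distance once the target is a few metres away.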

Question 6

Q

What is binocular disparity?

Answer

A

The eyes are set apart by 62-65mm on average.
As a result they have different view points.
When we fixate at a point in a scene they receive slightly different images.
Depending on the position of an object in one’s field of vision, its image can fall at a different distance from the fovea in each eye: this difference is the disparity.
Disparity is best measured in terms of visual angle.
If the rays from the object project into the right eye at an angle β and into the left eye at an angle δ, then the disparity is given by the difference between the two angles, β − δ.
Disparity is proportional to the depth between objects and to the distance between the eyes, and it also varies with the distance to the objects.
Binocular disparity could be very useful for assessing 3D shape because the depth of all points in the image, relative to the fovea, could in theory be measured simultaneously.
Interpreting disparity as depth also depends on obtaining an independent reading of the viewing distance.
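
The dependence of disparity on depth, eye separation and viewing distance can be sketched numerically. This is a minimal illustration, assuming both points lie straight ahead on the midline (where the disparity equals the difference between the angles the two points subtend at the eyes) and an eye separation of 65 mm; all function names are hypothetical.

```python
import math

ARCMIN_PER_RAD = 60.0 * 180.0 / math.pi


def subtended_angle_rad(distance_m: float, interocular_m: float = 0.065) -> float:
    """Angle subtended at a midline point by the two eyes."""
    return 2.0 * math.atan(interocular_m / (2.0 * distance_m))


def disparity_arcmin(fixation_m: float, depth_m: float,
                     interocular_m: float = 0.065) -> float:
    """Disparity of a point depth_m behind the fixation point (exact geometry)."""
    near = subtended_angle_rad(fixation_m, interocular_m)
    far = subtended_angle_rad(fixation_m + depth_m, interocular_m)
    return (near - far) * ARCMIN_PER_RAD


def disparity_arcmin_approx(fixation_m: float, depth_m: float,
                            interocular_m: float = 0.065) -> float:
    """Small-angle approximation: interocular * depth / distance**2."""
    return interocular_m * depth_m / fixation_m ** 2 * ARCMIN_PER_RAD


if __name__ == "__main__":
    # The same 10 cm depth interval viewed from 1 m and from 2 m.
    for d in (1.0, 2.0):
        print(d, disparity_arcmin(d, 0.1), disparity_arcmin_approx(d, 0.1))
```

In the approximation, doubling the viewing distance divides the disparity of a fixed depth interval by four, which is why a disparity value on its own cannot be turned into a depth without an estimate of distance.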

Question 7

Q

What did Wheatstone’s (1838) stereoscope study demonstrate?

Answer

A

Demonstrated that binocular disparity gives rise to the perception of depth.
The observer sits in front of a pair of mirrors angled so that each eye sees an image of a scene or object, drawn or photographed as if seen from the viewpoint corresponding to that eye.
The disparities in the two drawings correspond to the disparities that would have resulted from the object being projected onto the retina.
The observer fuses the two images so that they are seen as one.
The disparities are interpreted as if the observer were looking at the original 3D scene, and that is what the observer sees.

Question 8

Q

What do stereoanaglyph spectacles tell us about 3D vision?

Answer

A

A red filter is placed over one eye and a green filter over the other.
The red filter should only allow red light to pass, while the green filter should only allow green light to pass.
If we draw an object or a scene from two viewpoints 65 mm apart, one in red and one in green, superimpose them and then view them through the anaglyph spectacles, each eye will only see the drawing in the colour of its corresponding filter.
The right eye, covered with the red filter, will only see the red drawing; the left eye, covered with the green filter, will only see the green drawing.
The visual system can detect the disparities between the two images in the same way as in the Wheatstone stereoscope and then interprets them as depth.
The images will stand out in 3D.
Example: a wireframe model of a cube. If you look at it through one eye, all disparities are removed and it appears as the standard Necker cube illusion.
Its 3D shape is ambiguous and the percept may flip back and forth between a configuration with the upper face out and one with the lower face out.
When observed through both eyes with the anaglyph spectacles, it adopts a stable upper-face-out configuration, which is the shape depicted by the disparities between the red and green images.
More elaborate images can be used in which the surface varies continuously in depth.
For example, the apex of a cone that flips between being nearer and further away.

Question 9

Q

What can random dot stereograms resolve?

Answer

A

It can be argued either that the perception of depth from binocular disparity is a prerequisite for 3D shape perception OR that depth perception depends on first having identified shapes.
Random dot stereograms are images that contain disparity information but no detectable shape information.
Julesz (1960) took a pair of identical images consisting of randomly assigned black and white pixels and then, in one of them, displaced a patch of pixels corresponding to a shape by a small amount corresponding to some disparity.
The small gap that was left was filled with random values.
When viewed through a stereoscope, the visual system fuses or aligns the pixels in the background of the two images, then detects that there is a disparity between the pixels in the foreground part of the image relative to the background, and interprets their depth as being different.
Does this happen in normal vision, and is it sufficient to allow the visual system to work out the shape of the patch?
If this experiment is carried out using red and green dots and anaglyph glasses, a square can be seen to stand out in the foreground against the background. When one eye is covered the square disappears, because there is no information about the shape in either the red or the green image alone.
The only information is in the disparity between the two images, which we only detect with both eyes open.
The shape is inferred from the disparity.
The further away the stereogram is viewed from, the greater the perceived depth, which demonstrates that disparity has to be scaled for viewing distance.
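
The construction described above (two identical random images, a displaced patch, and a refilled gap) can be sketched in a few lines. This is an illustrative reconstruction rather than Julesz's actual procedure: the image size, patch size, shift and function name are all assumptions, and the patch is shifted leftwards in the right eye's image, which under the usual convention corresponds to a patch seen in front of the background.

```python
import random


def make_random_dot_stereogram(size=64, patch=24, shift=3, seed=0):
    """Return (left, right): two size x size grids of 0/1 pixels.

    The right image is a copy of the left, except that a central square
    patch is displaced `shift` pixels to the left and the strip it
    uncovers is refilled with fresh random pixels.
    """
    rng = random.Random(seed)
    left = [[rng.randint(0, 1) for _ in range(size)] for _ in range(size)]
    right = [row[:] for row in left]

    top = (size - patch) // 2
    col = (size - patch) // 2
    for r in range(top, top + patch):
        # Displace the patch horizontally in the right eye's image.
        right[r][col - shift:col - shift + patch] = left[r][col:col + patch]
        # Fill the uncovered strip with new random values.
        for c in range(col + patch - shift, col + patch):
            right[r][c] = rng.randint(0, 1)
    return left, right


if __name__ == "__main__":
    left, right = make_random_dot_stereogram()
    changed_rows = sum(1 for a, b in zip(left, right) if a != b)
    print("rows that differ between the two images:", changed_rows)
```

Each image on its own is just random noise; the square exists only in the relationship between the two images.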

Question 10

Q

Evaluate binocular disparity as a cue for 3D vision.

Answer

A

Binocular disparity is a useful source of information about depth but not distance.
Estimates of disparity thresholds vary according to how they are measured.
Using the Howard-Dolman peg test, which involves two pegs placed at different positions and distances in the visual field, the minimum detectable disparity is 0.5 minutes of arc, which is comparable to the diameter of a cone photoreceptor.
Stereoacuity, the ability to detect binocular disparity, is so sensitive that it is referred to as a hyperacuity.
In real life this corresponds to a depth of 8 cm at a viewing distance of 6 m.
Stereopsis can precede shape perception but it does not necessarily do so.
Disparity must be scaled by viewing distance to be correctly interpreted.
This is why the VIP seats in the middle of a cinema are at the optimum viewing distance for 3D movies.
Viewing distance is necessary for interpreting depth; it is measured well enough at close distances by accommodation and vergence eye movements, but not further away.
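
The figure of 8 cm at 6 m can be checked by converting a disparity threshold into a depth. This is a rough sketch, assuming the small-angle approximation disparity ≈ eye separation × depth / distance² (not given on the card) and an eye separation of 65 mm; the function name is hypothetical.

```python
import math


def depth_threshold_m(disparity_arcmin: float, viewing_distance_m: float,
                      interocular_m: float = 0.065) -> float:
    """Smallest detectable depth interval for a given disparity threshold.

    Assumption: small-angle approximation, rearranged for depth:
    depth = disparity * distance**2 / interocular.
    """
    disparity_rad = math.radians(disparity_arcmin / 60.0)
    return disparity_rad * viewing_distance_m ** 2 / interocular_m


if __name__ == "__main__":
    for d in (0.6, 2.0, 6.0):
        print(f"{d:4.1f} m -> {depth_threshold_m(0.5, d) * 100:.2f} cm")
```

Under these assumptions a 0.5 arcmin threshold gives roughly 8 cm at 6 m, and the same disparity corresponds to a depth 100 times smaller at 0.6 m, which illustrates why disparity has to be scaled by viewing distance.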

Question 11

Q

What is occlusion?

Answer

A

Pictorial cues - interposition (occlusion)
E.g. some cards seem to be placed further away than others; this is inferred from the fact that the nearer cards occlude those behind them.
When we look at the actual set-up, we see that the layout is different.
The cards that seem to be at the front are actually the furthest away, and vice versa.
But the nearer cards have corners cut away so as to allow the further away cards to be seen.
This leads the visual system to assume that the further away cards are occluding the nearer ones.
Even when we know the layout of the scene, we are unable to override the illusion that the smallest card is the nearest one.
Knowledge does not override perception, therefore the illusion is cognitively impenetrable.
Perhaps this is because occlusion is so common in the natural environment that it is safest for the visual system always to assume that if it looks like occlusion, it is.

Question 12

Q

What is shape and shading?

Answer

A

Pictorial cues - shape and shading
The visual system tends to assume that light is directional and that brighter patches therefore correspond to directly illuminated surfaces.
The darker patches correspond to surfaces in the shade.
So the best shape compatible with this can change depending on the orientation of a picture: the same image can appear to be a dome or a crater.
The visual system assumes that light comes from above.
Shadows can be used to convey an impression of depth.
The less a shadow is occluded by its corresponding object, the higher the object seems above the background.
The visual system assumes that the shadow lies on the surface of the background and that less occlusion implies more distance between the object and its simulated shadow.
This is compatible with properties of surfaces and light sources that the visual system may have learnt through experience.

Question 13

Q

What is aerial perspective?

Answer

A

Aerial perspective is the tendency for things to seem less distinct and bluer the further away they are, due to moisture and pollution in the atmosphere.
Perception of size is scaled by perception of distance: the same retinal image size will produce a larger or smaller apparent size depending on the object’s perceived distance (where it is placed in an image).
The further away it seems, the larger it must be.

Question 14

Q

What are texture gradients?

Answer

A

Texture gradients: texture elements appear smaller and more densely packed as a surface recedes with distance.
Perception of size is scaled by perception of distance: the same retinal image size will produce a larger or smaller apparent size depending on the object’s perceived distance (where it is placed in an image).
The further away it seems, the larger it must be.

Question 15

Q

What is height relative to the horizon?

Answer

A

The higher in the image and the closer to the horizon, the further away the object.
Perception of size is scaled by perception of distance: the same retinal image size will produce a larger or smaller apparent size depending on the object’s perceived distance (where it is placed in an image).
The further away it seems, the larger it must be.

Question 16

Q

What is linear perspective?

Answer

A

The principle is that in real life, for a given object, the image size on the retina halves as the viewing distance doubles.
Reducing the dimensions of objects appropriately can be used to convey a sense of their depth.
Linear perspective extends towards an implied vanishing point.
Some ambiguity may be resolved on the basis of inbuilt preferences or assumptions - ‘heuristics’.
One hypothesis is that, as a result of our lifestyle, we are biased towards 3D interpretations of images which are compatible with living in ‘carpentered’ environments.
Penrose triangle - an impossible structure, yet we readily accept it as a 3D object.
It is argued that this is because we have an innate preference for 3D interpretations.
We tend to go along with the 3D interpretation, even when it is obviously wrong, suggesting that we have a preference for it.
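
The halving principle behind linear perspective can be checked with the visual angle formula. This is a minimal sketch, assuming a flat object viewed head-on; the function name and the example figure of a 1.8 m tall person are illustrative assumptions.

```python
import math


def visual_angle_deg(object_size_m: float, distance_m: float) -> float:
    """Visual angle subtended by an object viewed head-on."""
    return math.degrees(2.0 * math.atan(object_size_m / (2.0 * distance_m)))


if __name__ == "__main__":
    # A hypothetical 1.8 m tall person at successively doubled distances.
    for d in (5.0, 10.0, 20.0, 40.0):
        print(f"{d:5.1f} m -> {visual_angle_deg(1.8, d):.2f} deg")
```

The halving is an approximation that holds well once the object is small relative to its distance: between 10 m and 20 m the angle falls by a factor very close to two.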

Question 17

Q

Why do we need to perceive distance?

Answer

A

One of the most important pieces of information needed to interpret a visual scene is distance.
We need to scale images to judge size and shape correctly.
We need it to map the layout of the environment and the objects within it for the purposes of navigation and interaction.
The problem is that it is not obvious how distance is measured by the visual system, at least beyond a few meters.
This may account for why we rely so much on potentially ambiguous pictorial cues.
If this is the case, why is the visual system so sensitive to binocular disparity if it is incapable of scaling it directly with an accurate estimate of viewing distance, at least beyond a couple of meters?

Question 18

Q

What are conflicting cues?

Answer

A

Given that we don’t always have access to accurate distance information, we tend to rely on multiple sources of potentially ambiguous information.
These are referred to by Helmholtz as cues and clues.
Sometimes they convey the same information but sometimes they conflict.
When they do, the visual system must decide which interpretation to go with.
Conflicts can be revealed by using anaglyph spectacles to observe red and green images, starting with the red lens over the right eye as intended and then reversing the glasses, which reverses the disparities. Whichever cue wins the conflict is evidence for which cue the visual system favours.
The visual system favours the pictorial cues over binocular disparity when they conflict.
The visual system favours pictorial cues over shape
We rely on multiple cues to infer shape
When we look at a screen during a Teams meeting, vergence cues and disparity cues tell the visual system that the scene is flat, but other cues such as lighting and movement tell the visual system that it isn’t.
This is tiring, as the visual system is constantly having to suppress some powerful cues in favour of others.
We can test this by watching Teams with one eye covered, which removes both vergence and disparity cues. The image should look more 3D than it would with both eyes open.
Alternatively, display the meeting on a large TV screen and sit several meters away.
At such a distance the depth information conveyed by vergence eye movements should be negligible and that conveyed by disparity very much reduced, so that there is less of a conflict.

Question 19

Q

What is the Ames room?

Answer

A

A trapezoidal room with a foreshortened wall adjacent to a raised floor and lowered ceiling on the right-hand side, and a low floor and high ceiling on the left-hand side.
The room is viewed through a peephole.
The geometry is set up so that the projections of the walls, floors, ceilings and windows of the trapezoidal room through the peephole are indistinguishable from those of a regular cuboid room.
In the demonstration, a person in the near opposite corner looks oversized whereas a person in the far opposite corner looks tiny.
A person walking along the back wall from one corner to the other seems to shrink or grow depending on the direction in which they walk.
We know that people don’t shrink or grow in the course of a couple of seconds.
But the retinal image of a person is scaled on the assumption that the room is cuboid.
If a person were to move diagonally through a cuboid room tracing the same path, the retinal image of the person would change in exactly the same way, but we wouldn’t perceive them to change in size.
So our interpretation of the retinal image is influenced by our inbuilt assumptions about the shapes of rooms.
The effect can be used in cinematography to film dwarves and giants in the same scene.
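
The size distortion in the Ames room follows from scaling a retinal image by the wrong distance. This is a simplified sketch of that size-distance scaling, assuming perceived size is recovered from the visual angle and an assumed distance; the function names and all the distances used are hypothetical, not measurements of a real Ames room.

```python
import math


def visual_angle_rad(size_m: float, distance_m: float) -> float:
    """Visual angle subtended by an object viewed head-on."""
    return 2.0 * math.atan(size_m / (2.0 * distance_m))


def perceived_size_m(true_size_m: float, true_distance_m: float,
                     assumed_distance_m: float) -> float:
    """Size implied by the retinal image if the object is assumed
    to be at assumed_distance_m rather than its true distance."""
    angle = visual_angle_rad(true_size_m, true_distance_m)
    return 2.0 * assumed_distance_m * math.tan(angle / 2.0)


if __name__ == "__main__":
    # Hypothetical set-up: two 1.8 m people, one really 3 m away and one
    # really 6 m away, both assumed to stand on a back wall 4.5 m away.
    print("near corner:", perceived_size_m(1.8, 3.0, 4.5))
    print("far corner: ", perceived_size_m(1.8, 6.0, 4.5))
```

With these made-up numbers, the person in the near corner is scaled up to about 2.7 m and the person in the far corner down to about 1.35 m, while a correct distance assumption returns the true size.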

Question 20

Q

What can we conclude about seeing in 3D?

Answer

A

It is not obvious that direct visual cues can provide the information needed to reconstruct the 3D scene unambiguously.
Berkeley first articulated the problem and suggested that the recovery of distance from visual images might depend on us learning to associate distance with other cues in the visual environment.
Interpretation of many cues therefore depends on heuristics or assumptions which encapsulate knowledge of the world.
According to Helmholtz, perception relies on cues and clues, meaning incomplete information from multiple sources, which are then interpreted through unconscious inference.
Gregory referred to perception as hypothesis testing, implying that it involves a process of trying to find the best fit or interpretation of the available cues.
Cues and clues often conflict as they point to different interpretations of the same scene.
When they do, the conflict needs to be resolved in a way which preserves the most useful information about the scene.
According to this account, perception is said to be indirect because there isn’t sufficient information in the retinal image alone to specify unambiguously the structure of the scene that gave rise to it.

(20 cards)