Past paper 1 Section A (attention, audition and vision) Flashcards
A) Exogenous vs. endogenous attentional cues
One influential concept of
visual attention was proposed by Posner (1980, JEP: Gen., Vol. 109) to account for findings from his
studies of attention-cueing. In a typical cueing paradigm, cues (e.g., central arrows) are presented
prior to the appearance of a target that could appear to the left or right of a central fixation point. The
cue summons the subject’s attention to either the left or right location. When, on some trials, the target
appeared at the location to which attention had been cued (validly-cued trials), participants detected
the target and responded to it more quickly than when neither position was preferentially cued
(neutral trials). Conversely, on trials where the target appeared on the opposite side to the cued location (invalidly-cued trials), participants detected the target and responded to it more slowly than
on neutral trials. The movement of attention from its
initial central location to the location of the cue was characterized by Posner (and other authors
subsequently) as the movement of a “spotlight” of attention across the visual field. Stimuli falling
within the spotlight were assumed to reach awareness more rapidly, and hence to be responded to more quickly, than those outside the spotlight.
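The logic of comparing valid, neutral and invalid trials reduces to simple arithmetic on mean reaction times. A minimal sketch, with hypothetical RT values chosen only to illustrate the typical pattern:

```python
# Hypothetical mean reaction times (ms) for the three trial types.
rt = {"valid": 250, "neutral": 280, "invalid": 320}

# Benefit of attending to the correct location; cost of attending to the wrong one.
benefit = rt["neutral"] - rt["valid"]   # 30 ms faster on validly-cued trials
cost = rt["invalid"] - rt["neutral"]    # 40 ms slower on invalidly-cued trials

print(f"cueing benefit: {benefit} ms, cueing cost: {cost} ms")
```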
Posner studied two types of spatial cueing. Endogenous orienting is elicited by centrally presented
symbolic cues (e.g., ►). These cues reliably cue attention when they are spatially informative – that
is, when they predict the likely position in which the target will appear. Such cues typically reflect
voluntary shifts of attention. However, some types of cue can summon attention even when they hold no
information about the likely position of the target. This type of cueing, associated in particular with
peripheral onsets, elicits exogenous orienting that is fast and reflexive (involuntary). Exogenous cues
can often summon attention to a location even when they are counter-predictive – that is, when a cue
on one side predicts that the target is most likely to appear on the other side. Eriksen & Eriksen
(1974) concluded that the minimum size to which human observers could set their hypothetical attention
‘spotlight’ was about 1 degree of visual angle. This conclusion reflected their finding that the interfering effects
of task-irrelevant flankers were minimised when they fell more than 0.5 deg of visual angle away from
the target.
B) Interocular transfer of adaptation
Interocular transfer refers to a change in threshold in an eye that was occluded during adaptation, similar to, but of lower magnitude than, the change in the fixating eye that received the visual stimulation.
No interocular transfer: colour and light/dark adaptation.
Interocular transfer: tilt aftereffect, motion aftereffect.
We have seen in previous lectures that adaptation to specific visual attributes
(a direction of motion, a particular wavelength of light or a particular orientation) can help to specify
the nature of the neural mechanisms underpinning these aspects of vision. In this lecture, we discuss
the interocular transfer of the effects of adaptation, to help understand where in the visual system
these adaptation effects occur. When an adapting stimulus is viewed with only one eye, the
aftereffects of the adaptation on subsequent perception can be measured separately for a test stimulus viewed with the adapted eye versus a test stimulus viewed with the unadapted eye.
If the site of adaptation in the eye/brain is pre-cortical (along the geniculostriate pathway), the affected cells
are monocular (receiving input only from one eye) and an aftereffect should only be seen in the
adapted eye.
Conversely, if the site of adaptation is in the visual cortex, the affected cells are much
more likely to be binocular (receiving input from two eyes) and we should then expect the aftereffect
to be similar when viewed in the adapted versus unadapted eyes.
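The degree of transfer is commonly expressed as the aftereffect in the unadapted eye as a percentage of that in the adapted eye. A minimal sketch with hypothetical aftereffect magnitudes:

```python
def interocular_transfer_pct(adapted_eye: float, unadapted_eye: float) -> float:
    """Aftereffect in the unadapted eye as a percentage of that in the adapted eye."""
    return 100.0 * unadapted_eye / adapted_eye

# Hypothetical tilt-aftereffect sizes (degrees of apparent tilt).
print(interocular_transfer_pct(2.0, 1.3))  # 65% -> consistent with a cortical (binocular) site
print(interocular_transfer_pct(2.0, 0.0))  # 0%  -> consistent with a pre-cortical (monocular) site
```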
C) Visual search
Conscious vision is also extremely limited in terms of how many objects we are aware of at any time.
While our intuition may tell us that we perceive many objects in a scene at once, phenomena such as
change blindness demonstrate that we are aware of only some of the objects in a scene at a time. To
what extent, then, can we process all objects in the visual scene at once?
The most powerful technique
for investigating this question is the ‘visual search’ paradigm. When an item differs from all items in a
display because of a single unique feature that is processed at an early stage of vision (e.g. colour,
edge orientation), it perceptually ‘pops out’ from the other items. Accordingly, the time taken to find
that odd-one-out item is short and independent of the number of other items in the display. This
phenomenon is termed ‘efficient, parallel search’ and implies that each of the items in the display was
processed simultaneously in terms of the feature that distinguished the odd-one-out. However,
higher-level properties such as letter identity and facial configuration behave differently from the
features of early vision: they do not pop out, and search for them slows as more items are added to the display.
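Search efficiency is quantified by the slope of the function relating reaction time to the number of items (set size). A minimal sketch with hypothetical data; the flat slope marks efficient, parallel search:

```python
import numpy as np

set_sizes = np.array([4, 8, 16, 32])
rt_feature = np.array([452, 449, 455, 451])       # pop-out: RT independent of set size
rt_inefficient = np.array([520, 620, 820, 1220])  # e.g. a letter-identity search

for label, rts in (("feature", rt_feature), ("inefficient", rt_inefficient)):
    slope, intercept = np.polyfit(set_sizes, rts, 1)  # least-squares line RT = a + b*N
    print(f"{label} search: {slope:.1f} ms/item")     # ~0 vs ~25 ms/item
```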
D) Auditory filter
The process responsible for the frequency selectivity of the auditory system. The initial stages of auditory processing are often described as consisting of a bank of auditory filters with different centre frequencies.
The approach has been expanded to estimate the width of a range of different auditory frequency channels –
giving rise to psychophysical tuning curves, obtained by varying the spectral content of the masking noise (e.g.,
Vogten, 1974). A pure tone is presented at a low level (10 dB above detection threshold) to target a single
auditory channel. The masking stimulus is either a sine wave or noise filtered to cover only a small frequency
range. The sound level of the masker is adjusted to find the level at which it just masks the tone. This is done for
several frequencies near the pure tone signal.
This method assumes that the listener’s performance depends on using a single bandpass frequency
channel. There is likely to be overlap between perceptual channels, so listeners might have used
neighbouring channels to perform the task (an idea referred to as off-frequency listening). Subsequent
work has refined the approach to make this less likely.
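Such masking experiments yield estimates of filter bandwidth. These are often summarised with the equivalent rectangular bandwidth (ERB); the formula below is the widely used Glasberg & Moore (1990) approximation, not derived in these notes:

```python
def erb_hz(centre_freq_hz: float) -> float:
    """Equivalent rectangular bandwidth of the auditory filter at a given
    centre frequency (Glasberg & Moore, 1990; moderate sound levels)."""
    return 24.7 * (4.37 * centre_freq_hz / 1000.0 + 1.0)

for f in (250, 1000, 4000):
    print(f"{f} Hz -> ERB ~ {erb_hz(f):.0f} Hz")  # filters broaden with centre frequency
```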
A) Perceptual constancy
Vision automatically adjusts perception according to current conditions: Perceptual Constancy
Across both size and lightness constancy, we see that the retinal input alone cannot explain constancy,
and that our high-level interpretation of the scene has a significant impact.
The adjustments vision makes to compensate for these changing conditions can make physically identical
retinal inputs look very different.
Size constancy (distance + size): An object’s size cannot be worked out from its retinal image alone. As the object gets
further away, its retinal image decreases proportionately. Emmert’s Law (1881) states that the
perceived size of an object must therefore be scaled up according to its distance from the observer.
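The scaling can be written as one line of arithmetic: perceived size is proportional to retinal image size multiplied by perceived distance. A minimal sketch (arbitrary units):

```python
def perceived_size(retinal_size: float, perceived_distance: float) -> float:
    """Size-constancy scaling: perceived size ~ retinal size x perceived distance."""
    return retinal_size * perceived_distance

# Doubling the distance halves the retinal image, so the two effects cancel
# and the object looks the same size.
print(perceived_size(retinal_size=2.0, perceived_distance=10.0))  # 20.0
print(perceived_size(retinal_size=1.0, perceived_distance=20.0))  # 20.0
```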
Lightness constancy (illumination + reflectance): In contrast to brightness (the perceptual correlate of light intensity), ‘lightness’
refers to how reflective a surface is. Dark surfaces reflect only a small proportion of the light striking
their surfaces, absorbing the rest, whereas lighter surfaces reflect a greater proportion of light. Light
striking the retina from an object is a function of the level of illumination and the surface reflectance
(‘lightness’) of an object.
To work out an object’s lightness, vision must therefore be able to discount how much of the light reaching the eye is due to the illumination level and how much to the low or high reflectance of the object. By doing this, we perceive the same lightness despite changes in the surrounding illumination.
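The same relationship can be made explicit: luminance at the eye is the product of illumination and reflectance, so recovering lightness amounts to dividing out an estimate of the illuminant. A minimal sketch:

```python
def luminance(illumination: float, reflectance: float) -> float:
    return illumination * reflectance  # light reaching the eye from the surface

def perceived_lightness(lum: float, estimated_illumination: float) -> float:
    return lum / estimated_illumination  # discount the illuminant

# The same mid-grey paper (reflectance 0.5) under dim and bright illumination:
for illum in (100.0, 1000.0):
    print(perceived_lightness(luminance(illum, 0.5), estimated_illumination=illum))  # 0.5 both times
```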
B) Principle of Univariance
Although individual neurons are more responsive to some properties than to others, their responses are still ambiguous regarding the presence or absence of a particular feature - why?
We have seen that a cell’s state of adaptation can affect its response to a stimulus; stimulus salience also affects the response, with more salient stimuli eliciting larger responses.
So multiple factors affect a cell’s response, yet the response itself varies along only one dimension
(it responds more or less): this is the principle of univariance. Hence, if a neuron is responding at 50% of its maximum firing rate, this
might be due either to the presence of a faint stimulus that it is tuned to, or to a more salient stimulus
of an orientation that the cell prefers less.
We simply can’t tell from the firing of one cell - its response
is ambiguous - but by combining responses of multiple cells (pattern coding), we can (as illustrated in
the lecture).
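This can be made concrete with a toy example. Below, each cell’s response is sensitivity × intensity (one output dimension), so a single response confounds stimulus identity with salience, but the pattern across two differently tuned cells does not. All sensitivities are hypothetical:

```python
# Hypothetical sensitivities of two orientation-tuned cells.
sens_a = {"vertical": 1.0, "oblique": 0.5}
sens_b = {"vertical": 0.5, "oblique": 1.0}

def response(sensitivity: float, intensity: float) -> float:
    return sensitivity * intensity  # univariance: one number out

# A faint vertical and a stronger oblique stimulus give cell A identical responses...
print(response(sens_a["vertical"], 0.5), response(sens_a["oblique"], 1.0))  # 0.5 0.5

# ...but the A:B response ratio differs, so the pair of cells disambiguates.
print(response(sens_a["vertical"], 0.5) / response(sens_b["vertical"], 0.5))  # 2.0
print(response(sens_a["oblique"], 1.0) / response(sens_b["oblique"], 1.0))   # 0.5
```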
C) Cone of confusion
The binaural cues to location we have discussed are, by themselves, ambiguous. IID and ITD signal a relative
difference in lateralisation that is consistent with a cone of different solutions: the sound source could be
at any position on this cone and generate exactly the same IIDs and ITDs. How does the brain deal with
this ambiguity?
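The ambiguity is easy to demonstrate with a simplified spherical-head model in which ITD ≈ (d/c)·sin(azimuth); the head width and the formula are modelling assumptions, not lecture values. A source 30° to the front-right and its mirror position 150° to the back-right produce identical ITDs:

```python
import math

def itd_s(azimuth_deg: float, head_width_m: float = 0.22, c_m_s: float = 343.0) -> float:
    """Simplified spherical-head model: ITD ~ (d / c) * sin(azimuth)."""
    return (head_width_m / c_m_s) * math.sin(math.radians(azimuth_deg))

print(f"{itd_s(30) * 1e6:.0f} us")   # ~321 us (front-right)
print(f"{itd_s(150) * 1e6:.0f} us")  # ~321 us (back-right): the front/back confusion
```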
One simple way is by moving the head. This provides more
information (with a real sound source, not when using
headphones!) and can disambiguate the direction of the
sound based on an intersection of constraints.
Another valuable source of information comes from the
patterns of reflections, filtering and resonance produced by
the outer ear (pinna cues), as well as by the shape of our
heads and upper torso. Together these physical properties of
the listener define the Head-Related Transfer Function (HRTF).
This complex filtering means that sounds originating from
different physical locations have different spectral properties.
A listener calibrated to the filtering properties of their own
auditory system can use this information to localise sounds.
These cues greatly help with judgments of sound source
elevation, but can also contribute to azimuth localisation
(e.g., some individuals who have unilateral hearing
impairment (i.e. are deaf in one ear) can localise sounds
well based on these cues).
Batteau (1967) demonstrated the importance of pinna
cues, by contrasting the experience of sound via
headphones for a normal stereophonic recording, and a
recording made where the microphones were encased in
casts of a listener’s outer ear. Normal stereophonic
recording gave rise to listeners perceiving the sound
source to be inside their heads. By contrast, recordings
made inside casts of the pinnae were perceived as coming
from outside the head. Similar observations have been
made with full measurements of the HRTF. Other evidence
for the importance of the pinnae was provided by Gardner
and Gardner (1973), who showed that localisation in the median plane (elevation) was much poorer if the pinnae were filled to
remove their filtering effects.
D) Multiple Object Tracking
In a typical MOT task, eight identical items, usually filled circles, are presented to the participant at the beginning of a trial. Some of the items are highlighted for a short period of time (by blinking or changing colour), indicating that they are the targets to be tracked.
After the targets revert to the same appearance as the other items, all items start moving around unpredictably, bumping into each other or the border. After a short period, the items stop moving simultaneously. The participant is then asked either to identify all targets by clicking on them (full report) or to judge whether one specified item is one of the targets (partial report).[7]
Up to a maximum of about 5 out of 10 moving objects can be tracked successfully in a typical MOT task.[1] However, this capacity varies with the speed of the moving targets: up to 8 targets can be tracked if they move relatively slowly, while only 1 can be tracked if they move at high speed.[9]
Moving objects continue to be tracked when they pass behind an occluder. Under certain conditions, they can even be tracked if all items disappear together for a very brief period of time.[10]
A person is able to perform two MOT tasks simultaneously if the targets are presented in separate hemifields. In other words, a participant can track twice as many moving objects if the objects are divided between the left and right hemifields.[11]
The surface properties of the moving targets are largely irrelevant to MOT performance.[10] Participants also find it very hard to detect changes in target properties during a MOT task: even when targets are successfully tracked, participants often fail to notice a colour or shape change that occurred during the motion phase.[12]
A) The tilt aftereffect
When observers adapt to a patch of lines oriented 10-15 degrees from vertical, this causes misperception of a second patch of vertical lines, which appear to be tilted
in the opposite direction to those in the adapted patch (the negative tilt aftereffect). A sharp peak in the
aftereffect arises at around 10-15 degrees of separation between the adapting and test orientations,
suggesting that units in the visual system tuned to orientations 10-15 degrees apart may
mutually inhibit each other.
Intriguingly, Hubel & Wiesel (1977), in their recordings from primary visual
cortex in the macaque, found that within functional units of the cortex known as ‘hypercolumns’, cells
in neighbouring regions tended to code orientations that were around 10-15 degrees apart. Perhaps
the tilt after-effect reflects inhibitory interactions between neighbouring regions in visual cortex:
these inhibitory interactions may help to fine-tune our perception of a line’s orientation. Perception of
orientation therefore seems to depend upon the relative firing of several orientation-sensitive
neurons coding a particular area of the visual field: another example of pattern coding.
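The repulsion can be illustrated with a toy population code: orientation channels with Gaussian tuning, with those near the adapted orientation reduced in gain, make a vertical test decode as tilted away from the adaptor. All parameter values here are illustrative, not from the lecture:

```python
import numpy as np

prefs = np.arange(-90.0, 91.0, 5.0)  # preferred orientations (deg), 0 = vertical
sigma = 15.0                         # tuning width (illustrative)

def population_response(stim_deg: float, gains: np.ndarray) -> np.ndarray:
    return gains * np.exp(-0.5 * ((prefs - stim_deg) / sigma) ** 2)

baseline = np.ones_like(prefs)
# Adapting at +15 deg halves the gain of channels tuned near +15 deg.
adapted = baseline * (1.0 - 0.5 * np.exp(-0.5 * ((prefs - 15.0) / sigma) ** 2))

for gains, label in ((baseline, "baseline"), (adapted, "after adaptation")):
    r = population_response(0.0, gains)      # vertical test stimulus
    decoded = np.sum(prefs * r) / np.sum(r)  # centre-of-gravity read-out
    print(f"{label}: decoded orientation {decoded:+.1f} deg")  # ~0 vs a negative (repelled) value
```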
B) The volley principle
However, there is a problem for temporal accounts of frequency coding – individual neurons are not
capable of producing more than about 1000 spikes / s, and we
can hear sounds up to 20 times this value. How could timing
information be useful given this limit on what individual
axons can signal? Wever and Bray (1937) suggested that
we need to think about more than 1 axon doing the firing.
They pointed out that a small group of axons could signal
much higher rates if their outputs were pooled. This is
known as the volley principle (where volley doesn’t refer to
hitting a tennis ball but rather is used in the sense of “a
number of bullets, arrows, or other projectiles discharged at
one time: the infantry let off a couple of volleys.” from the OED).
This reasoning is sound, and subsequent recordings from ganglion cells show that precise information about
timing can be seen in the responses of aggregated axons: a clear temporal signal. However, this ability to
show phase locking to the stimulus disappears once frequencies exceed 4-5 kHz. Thus, information in
temporal codes would not be conveyed to the brain for frequencies of 5-20 kHz.
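The arithmetic of pooling is easy to check: if each of five fibres phase-locks to every fifth cycle of a 3 kHz tone, no single fibre exceeds 600 spikes/s, yet the pooled volley still marks every cycle. A minimal, idealised sketch:

```python
freq_hz = 3000.0
n_fibres = 5
n_cycles = 20

# Fibre f fires on cycles f, f + 5, f + 10, ... (offset by one cycle from its neighbour).
spikes = {f: [c / freq_hz for c in range(f, n_cycles, n_fibres)] for f in range(n_fibres)}
pooled = sorted(t for train in spikes.values() for t in train)

duration_s = n_cycles / freq_hz
print(f"{len(spikes[0]) / duration_s:.0f} spikes/s per fibre")  # 600: within physiological limits
print(f"{len(pooled) / duration_s:.0f} events/s pooled")        # 3000: one event per cycle
```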
C) Receptive fields
A receptive field is the region of the visual field in which
light stimulation causes a receptor to respond. By extension, other cells that are not themselves
stimulated by light, but whose responses are driven by photoreceptors (e.g., retinal ganglion cells,
visual cortical neurons) also have receptive fields.
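A classic model of a retinal ganglion cell’s receptive field is a centre-surround (‘difference of Gaussians’) weighting of the light falling on it. A minimal 1-D sketch with illustrative parameters:

```python
import numpy as np

x = np.linspace(-3, 3, 601)  # position across the receptive field (deg)

# Narrow excitatory centre minus a broader inhibitory surround, roughly balanced.
rf = np.exp(-0.5 * (x / 0.3) ** 2) - 0.3 * np.exp(-0.5 * (x / 1.0) ** 2)

small_spot = (np.abs(x) < 0.3).astype(float)  # light on the centre only
large_spot = (np.abs(x) < 2.0).astype(float)  # light on centre and surround

print(np.sum(rf * small_spot))  # strong response
print(np.sum(rf * large_spot))  # much weaker: the surround cancels the centre
```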
D) The duplex theory of sound localisation
Lord Rayleigh noticed that the different distances from a sound source to the two ears could give rise to differences in the
timing of signals. He tested this idea with two tuning forks, finding that timing differences alone were
sufficient to evoke an impression of auditory location:
interaural timing differences. He proposed that listeners
use different cues for different frequencies of sound - an
idea referred to as the duplex theory (1907). The two
cues are complementary in providing signals for
localisation at different parts of the sound frequency space.
In particular, timing differences are likely to be useful for
low sound frequencies, while intensity differences are
useful for high sound frequencies. Timing differences are
unreliable at higher frequencies because of an aliasing
problem (the brain doesn’t know which sound peaks to
match between the ears), while as noted above, level
differences are unreliable for low frequency sounds
because there is little acoustic shadowing.
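A back-of-envelope calculation shows where the aliasing sets in: peak matching becomes ambiguous roughly when one wavelength fits across the head. The head width below is an assumed round number:

```python
c_m_s = 343.0        # speed of sound (m/s)
head_width_m = 0.22  # assumed interaural distance (m)

crossover_hz = c_m_s / head_width_m  # frequency at which wavelength equals head width
print(f"~{crossover_hz:.0f} Hz")     # ~1559 Hz: ITDs become ambiguous above roughly this frequency
```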
Theoretical calculations and physical measurements
confirm the differential reliability of these signals at different
frequencies. Further, psychophysical experiments using
headphone presentation allow the two cues to be studied in
isolation - demonstrating that both support very fine
discriminations of auditory location: for pure tone stimuli
located directly in front of listeners, differences of ~ 1 degree
can be detected. For ITD stimuli, this is a 10 μsec time
difference; for IID stimuli, this is a 0.5 dB difference. The cues
can be combined together or put into opposition (termed ‘cue
trading’). The percept of sound direction for conflicting cues
depends on the frequency of the stimuli. While we can think
about these different types of information for different
scenarios, it is quite likely that we regularly experience a
mixture of IIDs and ITDs as we encounter broadband sounds
that have both high and low frequency components (e.g. speech) in our daily lives.
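The ~1 degree and ~10 μsec figures quoted above are mutually consistent under the same simplified sine model used earlier (assumed head width 0.22 m):

```python
import math

c_m_s, head_width_m = 343.0, 0.22
itd_for_1_deg = (head_width_m / c_m_s) * math.sin(math.radians(1.0))
print(f"{itd_for_1_deg * 1e6:.0f} us")  # ~11 us, close to the ~10 us threshold quoted above
```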