Audition Flashcards

1
Q

Lecture 1 content

A

This lecture is about the physical events that underlie our ability to hear. We’ll be addressing the following
questions:
* What is sound?
* How do we measure and characterise sounds?
* How does sound reach the ear?
* How do physical vibrations get translated into nerve impulses?

Here are some of the technical terms for the concepts we will cover:
* Simple harmonic motion and the resonance properties of physical materials
* Fundamental frequencies (F0) and harmonics (overtones)
* Propagation of sound in the air
* The Pinna and Ear Canal
* The Tympanic membrane and the Ossicles (malleus, incus, stapes)
* The Cochlea: oval window, round window, Basilar Membrane, Organ of Corti
* Inner and Outer hair cells, Tectorial Membrane
* The transduction of vibration energy into electrical responses

2
Q

Lecture 2 content

A

This lecture is all about how sound information is encoded in the cochlea and how this relates to our
perception of sound. We’ll address the following questions:
* How does the basilar membrane work in response to sounds?
* How do hair cells encode and relay sound information?
* How is the information encoded by the cochlea used to determine pitch and loudness?
Here are some of the concepts we will cover:
* Travelling waves on the Basilar membrane
* Spectral decomposition by the cochlea
* Outer hair cells, active amplification and otoacoustic emissions
* Inner hair cells and their afferent connections
* Place-based encoding along the basilar membrane
* Phase locking of auditory ganglion cells
* The volley principle for conveying sound timing information
* Place vs. time codes for pitch
* Missing fundamental sounds
* Rate and number of neuron encoding schemes for loudness

3
Q

Lecture 3 content

A

In the previous lectures we considered ideas related to the properties of sound sources (e.g. pitch, timbre).
This lecture concerns how sound information is used to locate sources. We’ll address the following:
* How does encoded sound information progress from the cochlea to the cortex?
* How are binaural (i.e. from two ears) signals used for localisation?
* What are the neural circuits involved in extracting binaural signals?
* What is the role for monaural information?
Here are some of the concepts we will cover:
* The ascending auditory pathway: cochlear nucleus, superior olivary complex, inferior colliculus,
medial geniculate nucleus, auditory cortex
* Binaural processing in the brain stem and the preservation of tonotopy
* Spherical coordinates (azimuth, elevation, distance) and binaural sound cues
* Sound intensity with distance, acoustic shadows and interaural intensity differences (IIDs)
* Timing cues to lateralisation and interaural timing differences (ITDs)
* Rayleigh’s duplex theory and frequency dependency
* Difference coding in the Lateral vs. Medial Superior Olive and Jeffress’ coincidence detection
* Pinna cues and the Head Related Transfer Function (HRTF)

4
Q

What is sound?

A

Objects in the world have physical properties relating to their size, shape, material composition and location.
Our ability to recognise and interact with these objects depends on the brain making sophisticated deductions
(inferences) based on the information delivered by the sensory organs (eyes, ears, etc.). Through experience
and evolution, these inferences are shaped by the information contained in the signals themselves and our
knowledge of our own sensory transduction processes.
Imagine you are sitting in your house. You hear a door slam shut followed by an electronic beep. The sound
information reaching your ears allows you to work out:
The sound came from outside. [Where]
The slam happened before the beep sound. [When]
The door was a car door. The beep was the locking mechanism. [What]
It was someone arriving in their car. [What]
The sound information by itself would not allow you to work out, for instance: the colour of the car that arrived
or whether the person was left or right handed. Why are some inferences possible (Type of door) and others
not (e.g. colour)? You (and your brain) know (implicitly if not necessarily explicitly) that the sound vibrations
reaching your ears caused by the door relate to its location, weight (mass), elasticity (stiffness), physical
dimensions and shape, material composition (e.g. wood vs steel).
To summarise, when things happen to objects they can often emit sound (vibration) energy, and different objects emit sound energy that reflects their own physical characteristics.
Simple harmonic motion of a mass-and-spring system

5
Q

What causes the sound?

A

Physical objects have mass and a certain amount of elasticity (even glass bends a bit). If a force is applied to an object, it will bend until counteracting forces push it back (or it breaks). The action of the incoming force and the counteracting force can be modelled simply as a mass-spring system, in which the mass of the object and the stiffness of the spring determine the frequency of vibration. Applying force to the object sets it vibrating, but the vibration frequency is mostly determined by the object's physical properties. For example, hitting a drum and striking a bell can be done with the same percussive action, but they will not sound alike: the vibrations come from the object more than from the thing that hits it.

A sine wave can be used to describe the vibrating behaviour of many objects. The 'standard' sine wave, as seen in the simple harmonic motion diagram above, can be specified using just three simple parameters (amplitude, frequency and phase). With these we can describe any possible sine wave.
The Frequency and Amplitude of a vibration are related to the pitch of a sound and its loudness.
A sine wave is a good description of real vibrations, and a single sine wave (a pure tone) gives rise to a particular
sound.
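As a brief illustration (not part of the original notes), the three parameters map directly onto the standard expression for a sine wave; the Python sketch below uses arbitrary example values.

```python
import numpy as np

def sine_wave(t, amplitude, frequency, phase):
    """Instantaneous value of a pure tone: A * sin(2*pi*f*t + phase)."""
    return amplitude * np.sin(2 * np.pi * frequency * t + phase)

t = np.linspace(0, 0.01, 1000)                                    # 10 ms of time points
tone = sine_wave(t, amplitude=1.0, frequency=440.0, phase=0.0)    # an illustrative 440 Hz pure tone
```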

However, when most real objects vibrate, they
produce vibrations at many frequencies. To illustrate this, we can think about the vibrations that are produced if we pluck a guitar string. The string will vibrate back and forth along its whole length at a frequency determined by its
tension, mass and stiffness. However, while the whole
string has a particular mass, it is useful to think of the string as being composed of two halves, each of which is half the length (with half the mass) and therefore vibrates at twice the frequency. In
fact, it’s not just half the string, we can think of thirds,
quarters etc of the string, each of which will vibrate at
higher frequencies related to integer (i.e. whole number)
multiples of the basic frequency of the string. More
formally, real objects tend to vibrate at a Fundamental
resonant frequency (F0) and at harmonics (overtones)
of that frequency: 2 x F0, 3 x F0, 4 x F0, and so on. When
these vibrations happen, they do so all at once, so the
sound that we hear corresponds to the combination of the
Fundamental and its higher harmonics.
Simple strings are easy to visualise and understand as the
vibration is in only one dimension. But real objects can
vibrate in two or three dimensions. For example, plucking
a guitar string causes the body of the guitar to vibrate
along its length, side to side, and front to back. These
vibrations happen along all of these dimensions at once,
at the fundamental and higher harmonics. This complex
pattern of vibrations is expressed in the Frequency
Spectrum for a given sound. This spectrum, together with
the Amplitude envelope of the sound is responsible for
the different sounds made by objects. The envelope is
described as having an onset, steady state, and an offset.
The onset plays a major role in how an instrument sounds.
For example, the same musical note played on a bell and
a piano produce different amounts of vibration energy at
different frequencies that start at different times and last
for different amounts of time. This difference in the way
things sound is referred to as the timbre (or texture) of a sound.

6
Q

How do we interpret complex real sounds?

A

The insights of a famous French mathematician come to our aid. Joseph Fourier (1768-1830) wasn't interested in sound, but in the abstract problem of how heat energy flows through a solid object. Yet his basic mathematical research has had a huge and lasting impact: it's no
exaggeration that much of the technology we take for granted today originates from Fourier’s theory.
Fourier proposed that any signal, no matter how complex, could be described as the sum of a family
of simple sine waves. Thus, when presented with a complex signal, we can decompose it into a number of
simple constituent parts, each of which depends on only three parameters: amplitude, frequency and phase.

7
Q

What is a frequency spectrum?

A

When using Fourier analysis to decompose a signal into its constituent parts, we typically plot the Amplitude and Phase of the signal at each frequency. Phase conveys important information, but its interpretation is complicated, so we will say no more about it and dwell on the Frequency spectrum. We can think about the
Amplitude parameter as a quantitative value – it tells us
how much of that frequency is present.
A pure tone (single sine wave) is represented on a
spectrum by a single line that plots the amplitude of that
sine for that single frequency. A “complex tone” is defined
as an acoustic stimulus consisting of a combination of
pure tones – this gives rise to a spectrum with a series of
lines on it. When played, we hear the sound of all of the
components together, rather than hearing the individual
pure tones that compose the sound.
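As a hedged sketch (not part of the notes), the code below uses NumPy's FFT to compute the amplitude spectrum of a complex tone built from three pure tones; the component frequencies, amplitudes and sampling rate are arbitrary illustrative choices.

```python
import numpy as np

fs = 8000                          # sampling rate (Hz), chosen for illustration
t = np.arange(0, 1.0, 1 / fs)      # one second of samples

# A "complex tone": the sum of three pure tones (arbitrary example components)
signal = (1.00 * np.sin(2 * np.pi * 200 * t)
          + 0.50 * np.sin(2 * np.pi * 400 * t)
          + 0.25 * np.sin(2 * np.pi * 600 * t))

spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
amplitude = np.abs(spectrum) * 2 / len(signal)   # scaled so peaks match the component amplitudes

# The spectrum contains lines only at 200, 400 and 600 Hz
for f in (200, 400, 600):
    print(f, round(amplitude[freqs == f][0], 2))
```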

8
Q

The harmonic series of a square wave

A

A classic example of Fourier decomposition is provided by
considering a square wave. The square wave has the
special property that it is composed of (an infinite) family of
sine components that are odd harmonics of the fundamental (i.e., F0, 3F0, 5F0, 7F0, etc.), and the amplitude of each component decreases in proportion to the harmonic number (i.e., 1, 1/3, 1/5, 1/7, etc.). While the series continues to infinity, we can obtain a reasonable approximation of the square wave with a small number of components.
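A minimal sketch (not from the notes) of this synthesis, using an arbitrary 100 Hz fundamental: summing a handful of odd harmonics with 1, 1/3, 1/5, ... weights already looks quite square.

```python
import numpy as np

def square_wave_approx(t, f0, n_components):
    """Sum the first n_components odd harmonics of f0, weighted 1, 1/3, 1/5, ..."""
    wave = np.zeros_like(t)
    for k in range(n_components):
        n = 2 * k + 1                                    # odd harmonic number: 1, 3, 5, ...
        wave += np.sin(2 * np.pi * n * f0 * t) / n
    return (4 / np.pi) * wave                            # scale of the ideal +/-1 square wave

t = np.linspace(0, 0.02, 2000)                           # 20 ms
rough = square_wave_approx(t, f0=100, n_components=5)    # already recognisably square
better = square_wave_approx(t, f0=100, n_components=50)  # closer still to a true square wave
```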

9
Q

Continuous spectra

A

The above examples have discrete Fourier series (i.e. lines
in the spectra at individual frequencies). This is the case
when the source is oscillatory, as in a violin string. Other
sound sources are more noise-like, for example environmental sounds such as wind or running water. A special
case of this is random signal fluctuations that you might
hear if you mistune an analogue radio (‘crackle’) – this
random noise is generally referred to as ‘white noise’ and
it has a spectrum that is flat (continuous horizontal line) –
i.e. it contains all possible frequencies of sound and all of
these frequencies have the same amplitude.

10
Q

The limitations of a strict Fourier understanding

A

This understanding of Fourier analysis is based on the assumption that the signals we have looked at (the sine wave, the square wave and white noise) go on for an infinite time to generate the spectra we see above. Introducing temporal changes (i.e. stops and starts) in the time domain requires additional components in the frequency domain. Hence, a brief click is composed of the sum of frequencies over a wide range. Sensory systems such as audition are understood to process signals over a finite time window and within limited frequency bands, where a range of frequencies is considered within each band rather than a single discrete frequency value.
In sum, Fourier analysis provides the scaffolding for understanding how sensory systems such as your ears
work, but strict Fourier analysis is not a biologically realistic model.

11
Q

How does sound reach the ears?

A

A vibrating object causes the molecules in the medium
around it (e.g. air or water) to vibrate as well. This causes an
increase in the concentration of molecules (air pressure) in
one place (compression) and reduces it (rarefaction)
elsewhere. These pressure differences are not sustainable
because there is nothing to prevent the air in the high-pressure region from moving toward a lower-pressure region.
This causes a travelling wave through the air.
In an open space, sound waves travel out in all three
dimensions away from the source. The same amount of
vibration energy is therefore spread over a larger and larger area as we move further from the source. The sound intensity thus decreases in proportion to the square of the distance (the inverse square law), while the pressure amplitude falls in proportion to the distance itself.
* In a free field, an omnidirectional sound source radiates sound equally in all directions.
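A minimal numerical sketch of the inverse square law (not from the notes): relative intensity drops with the square of distance, which works out to roughly a 6 dB drop per doubling of distance in a free field.

```python
import math

def relative_intensity(distance_m, reference_distance_m=1.0):
    """Sound intensity relative to that at the reference distance (inverse square law)."""
    return (reference_distance_m / distance_m) ** 2

for d in (1, 2, 4, 8):
    drop_db = 10 * math.log10(relative_intensity(d))     # intensity ratio expressed in decibels
    print(d, "m:", relative_intensity(d), round(drop_db, 1), "dB")
# Each doubling of distance reduces the level by about 6 dB.
```
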
Another property that changes with distance relates to the frequency content of a sound. When sound is
propagated through the air, some of its energy is absorbed by the air molecules it passes through. This absorption affects high frequencies much more than low frequencies. For example, a lightning strike sounds very
different depending on how close you are to the location of the strike. Nearby, it sounds like a loud ‘crack’ (it
has low frequency and high frequency sound energy). Further away, it sounds like a low ‘rumble’ of thunder –
most of the high frequency information was absorbed in the intervening air.

12
Q

How are sounds characterised?

A

We covered the frequency content of sounds (above),
now we turn to the amplitude of the sound waves. This
is typically done in terms of the pressure that a sound
source produces. The human ear is sensitive to a huge
range of sound pressures. The lowest pressure level
we can hear is equivalent to about one ten millionth of
the pressure that is put on your finger by a penny. This
contrasts with the most intense sounds we can tolerate, which carry about a million million times (i.e., 1,000,000,000,000) more power than the quietest we can hear (a pressure ratio of about a million). To
cope with this huge range of values, it’s useful to use
a logarithmic scale (log 1 = 0; log 10 = 1; log 100 = 2;
log 1000 = 3; etc), and express a particular sound
pressure relative to the lowest pressure we can hear.
This ratio value is typically expressed in decibels. The 'standard' decibel scale for quantifying sound uses a reference pressure of 20 micropascals (µPa), which is roughly the quietest sound the average person can hear (although individuals differ). When this reference is used, pressure levels are quoted as "dB SPL", where SPL stands for 'sound pressure level'.
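As a small worked sketch (not from the notes), dB SPL is just the logarithmic ratio of a pressure to the 20 µPa reference; the example pressures below are arbitrary illustrative values.

```python
import math

P_REF = 20e-6   # reference pressure: 20 micropascals (the 0 dB SPL reference)

def db_spl(pressure_pa):
    """Sound pressure level in dB SPL for a pressure given in pascals."""
    return 20 * math.log10(pressure_pa / P_REF)

print(db_spl(20e-6))   # 0.0   -> the reference, roughly the quietest audible sound
print(db_spl(1.0))     # ~94   -> 1 Pa, a common calibration level
print(db_spl(20.0))    # ~120  -> a million-fold pressure ratio above the reference
```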

13
Q

How are vibrations translated into nerve impulses?

A

Vibrations are channelled by the outer ear (pinna and meatus) to the ear drum (tympanic membrane).
The air pressure differences between the two sides of the ear drum cause it to move in and out with the sound.
The next step is the complex ossicular chain of the
middle ear: the malleus, incus and stapes. The malleus is attached to the ear drum, while the stapes interfaces with the oval window of the cochlea. This chain of bones acts as an impedance-matching device. This is needed because the cochlea
is filled with fluid (perilymph) that has a much higher
impedance than air. By analogy, if you shout to a friend
swimming underwater, they are unlikely to hear because
the majority of the sound energy would bounce off the
surface of the water due to the water’s higher
impedance. So it is with the cochlea – if the ossicles are
damaged – sounds need to be very loud for a person to
have any chance of hearing them. The area of the ear
drum is much larger than the face of the stapes, meaning
that the ossicles concentrate the same force over a
much smaller area (think of it like a drawing pin – you
can push these into a hard wall because the point of the
pin is very small in relation to the area you push with your
thumb). The hinge-like ossicular chain also provides a
degree of leverage to further amplify the movement of
the stapes. This arrangement is quite complex, because
muscles interfacing with the ossicles allow a reflex to
partially disengage the stapes during loud sounds (the
acoustic reflex). This reflex is rather slow (it won’t
protect your ears from sudden loud sounds), but it can
help reduce damage to the cochlea in high sound
pressure environments. It also disengages when we
speak – probably accounting for why recordings of our
own voices sometimes sound odd.
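The area ratio and lever action described above can be put into rough numbers. The figures below (effective eardrum area, stapes footplate area, lever ratio) are commonly quoted textbook values and are assumptions rather than values given in these notes, so treat this as an illustrative sketch.

```python
import math

# Commonly quoted textbook values -- assumptions, not figures from the lecture
area_eardrum_mm2 = 55.0     # effective area of the tympanic membrane
area_stapes_mm2 = 3.2       # area of the stapes footplate
lever_ratio = 1.3           # mechanical advantage of the ossicular lever

pressure_gain = (area_eardrum_mm2 / area_stapes_mm2) * lever_ratio
gain_db = 20 * math.log10(pressure_gain)

print(round(pressure_gain, 1))  # ~22x increase in pressure at the oval window
print(round(gain_db, 1))        # ~27 dB, helping to offset the air/fluid impedance mismatch
```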

14
Q

Name the structures of the cochlea / inner ear

A

The cochlea is a snail-like structure embedded in the rigid temporal bone. It is a spiral of ~ 2.5 turns, is ~ 3.5
cm in length and has an average diameter of ~ 2 mm. It is divided into 3 compartments: the scala vestibuli
(trans. “entrance steps”), the scala tympani (trans. “steps to the drum”), and scala media (trans. “middle
steps”). The scala vestibuli and tympani interconnect at the apex (top) of the cochlea at the helicotrema (trans.
“hole in the helix”), and are filled with perilymph (trans. “surrounding clear fluid”). The scala media is much
smaller and sits away from the centre of the spiral – it is filled with the potassium (K+) ion rich fluid endolymph
(trans. “inside clear fluid”).
When the stapes moves, pressure is applied to the fluid
inside the cochlea causing it to move. This movement will
cause a movement of the Basilar membrane (B.M.) that
divides the scala vestibuli and scala tympani. As the
perilymph does not compress, the increased pressure
causes the round window to bulge outwards as the oval
window is deflected inwards. Different frequencies of
sound stimulation give rise to maximal vibrations of the
membrane at different locations (tonotopy). Low
frequency sounds cause maximal movement near the
apex of the cochlea, while high frequency sounds cause
displacement at the basal end near the stapes.

15
Q

Frequency selectivity arises from two factors:

A
  1. The stiffness of the BM changes along its length: it is stiff at the base and floppy at the apex (a high-to-low
    stiffness gradient).
  2. The perilymph has inertia: this means that it is much harder to produce the necessary accelerations and
    decelerations for high frequencies than for low frequencies. As such, high-frequency movements are attenuated
    with increasing distance from the oval window (i.e., a low-to-high impedance gradient).
16
Q

Mechanism to detect sound

A

Movements of the BM cause movements of the organ of Corti, which is located in the scala media. Auditory hair cells are mounted on the BM side, pointing up towards the tectorial membrane. When the BM moves, it causes a shearing motion between the portion of the organ of Corti that sits on the BM and the tectorial membrane (the 'roof' of the structure) that sits above it. This movement deflects the stereocilia of the hair cells.
There are two types of hair cells: inner (towards the centre of the spiral) and outer, and they have different functions (explored in the next lecture).
The stereocilia on the tops of the hair cells form rows that
are connected together by tip links. When there is
movement in one direction, the tip links cause an opening
of ion channels in the stereocilia. The endolymph in the
surrounding scala media is rich in positively charged
potassium (K+) which enters the cell through the open ion
channels causing the cell to depolarise. In turn this leads
to a release of glutamate providing an excitatory
glutamatergic connection with the neighbouring spiral
ganglion cells. These ganglion cells have long axons that
travel through the auditory nerve (the VIIIth cranial nerve)
to the cochlear nucleus of the brainstem.
The deflection of the stereocilia changes the extent of hair
cell depolarisation by changing the number of open ion
channels and controlling the flow of electrical current into
the cell. This dependency allows vibration to be
transduced into an analogue electrical signal. This
analogue depolarisation is transmitted to the brain via
discrete action potentials in the ganglion cells.

17
Q

What are filters?

A

Filters and filtering are ways to separate things, be it
the coffee grounds from your drink, or unwanted
frequencies in a signal. In a music system’s
loudspeakers, filters separate the high frequencies from
the low so that the bass notes go to the big speaker
and the high notes go to the tweeter. Electronic filters come in a few basic types: high pass, low pass, band pass and band stop. A high pass filter will remove or attenuate low frequencies below a certain cut-off point. A low pass filter will remove or attenuate frequencies above a certain cut-off point. A bandpass filter has an upper and a lower cut-off
point and passes only frequencies in the range between
these two. A band stop filter attenuates the frequencies between the cut-off points, leaving those above and below the range intact. Bandpass filters are also
described by their centre frequency and bandwidth.
The idea of the bandpass filter is particularly relevant to the study of the auditory system, because one part
of our hearing system is thought to behave like a bank
of overlapping bandpass filters, covering the range of
audible frequencies, 20 – 20,000 Hz.
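As a hedged illustration of the band-pass idea (not from the notes), the sketch below builds a digital band-pass filter with SciPy and applies it to a mixture of three tones; the cut-off frequencies, filter order and sampling rate are arbitrary choices.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 44100                      # sampling rate (Hz)

# A 4th-order Butterworth band-pass filter passing roughly 500-2000 Hz (arbitrary cut-offs)
sos = butter(4, [500, 2000], btype='bandpass', fs=fs, output='sos')

t = np.arange(0, 0.1, 1 / fs)
mixture = (np.sin(2 * np.pi * 100 * t)       # below the pass band
           + np.sin(2 * np.pi * 1000 * t)    # inside the pass band
           + np.sin(2 * np.pi * 8000 * t))   # above the pass band

filtered = sosfiltfilt(sos, mixture)         # largely only the 1000 Hz component survives
```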

18
Q

How do we characterise the filters used in the
human auditory system?

A

What about the frequency range that individual filters
can process? We expect that the band pass filters of
the auditory system will respond to more than just one
frequency – but is the range: ±0.1 Hz, ±2 Hz, ±50 Hz, ±
1000 Hz, or what? To find out, Fletcher (1940)
conducted a classic experiment based on masking.
Masking is when one sound is made less audible due to
the presence of another sound. Have you ever had
difficulty hearing someone talk in a nightclub or busy
pub? That’s masking!
Fletcher started by passing white noise through a band
pass filter, so that only particular frequencies were
present in the noise: so called band pass filtered
noise. He then presented a single pure tone centred in
the middle of the band pass filtered noise. Next, he
measured how the sound level needed to be changed
so that listeners could detect the tone in the presence
of the filtered noise. He changed the characteristics of
the noise to see how this affected performance. The
pass band for the noise was centred at the frequency of
the pure tone, and he systematically changed the width
of the pass band to see how exposing listeners to more
and more noise affected their performance. He found
that as the bandwidth of the noise increased,
performance got worse, and listeners needed more and
more sound pressure in the pure tone to detect it.
However, this didn't go on indefinitely: after a certain
point, adding more noise didn’t make performance any
worse. The big idea was that the pure tone signal was
passing through one filter only. Once the noise width
exceeded the bandwidth of this filter, no more noise
was going to go in. You had reached the maximum noise-to-signal ratio, hence performance would not get any worse. In this way Fletcher was able to estimate the width of the auditory filter. The bandwidth at which the signal threshold stops increasing was called the 'critical bandwidth' (CB).
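A minimal sketch of the logic behind Fletcher's result, assuming an idealised rectangular auditory filter and a flat-spectrum masker (both simplifications): the noise power admitted by the filter grows with masker bandwidth only up to the critical bandwidth. The 160 Hz value used here is an assumed illustrative figure, not one taken from the notes.

```python
def noise_power_in_filter(noise_bandwidth_hz, critical_bandwidth_hz, noise_spectrum_level=1.0):
    """Noise power admitted by an idealised rectangular auditory filter."""
    return noise_spectrum_level * min(noise_bandwidth_hz, critical_bandwidth_hz)

# Widening the masker beyond the (assumed) 160 Hz critical band adds no further masking
for bw in (40, 80, 160, 320, 640):
    print(bw, "Hz wide masker ->", noise_power_in_filter(bw, critical_bandwidth_hz=160))
```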

19
Q

What is the shape of the Auditory filter?

A

The approach has been expanded to estimate the width of a range of different auditory frequency channels –
giving rise to psychophysical tuning curves by varying the spectral content of the presented noise (e.g.,
Vogten, 1974). A pure tone is presented at a low level (10 dB above detection threshold) to target a single
auditory channel. The masking stimulus is either a sine wave or noise filtered to cover only a small frequency
range. The sound level of the mask is adjusted to measure when masking just kicks in. This is done for
several frequencies near the pure tone signal.

This method assumes that the listener’s performance depends on using a single bandpass frequency
channel. There is likely to be overlap between perceptual channels, so listeners might have used
neighbouring channels to perform the task (an idea referred to as off-frequency listening). Subsequent
work has refined the approach to make this less likely.

20
Q

How does the basilar membrane work?

A

The cochlea is a small structure buried in the temporal bone, with the basilar membrane hidden away inside
a complex 3D spiral. Understanding how it functions is therefore something of a challenge. Before there
were any measurements of its function, scientists speculated on the way it might work. Two major classes of
theory about the functions of the BM relate to the idea that:
* Sound frequency information is determined by a spatial position code along the basilar membrane.
Different places on the BM resonate with different sound frequencies. Neurons can read off the
activity at these locations, giving rise to axons that respond to a specific
frequency (or narrow range of frequencies).
* Sound frequency information is determined by the frequency at which
neurons fire: a temporal code. Thus, particular locations on the BM are
not critical, rather it is the rate at which action potentials are produced.
The Hungarian biophysicist Georg von Békésy revolutionised our understanding
of the functions of the BM using careful observations of the BM response to
sound in human cadaver cochleae. His technique involved drilling a small hole in
a cochlea and placing silver particles onto the surface of
Reissner’s membrane (the top membrane of the scala
media, but its motion mimics that of the BM). He then
observed motion of the membrane through a microscope,
using stroboscopic illumination to help see the very rapidly
moving surface. He had to apply very intense sounds (140
dB!) to get sufficiently large amplitudes of movement so that
they were visible, and was only able to record from near the
apex of the cochlea (i.e., low frequency sounds). His insights
into the cochlea won him the Nobel Prize (1961).
von Békésy used a Fourier based approach to understand
the BM’s movement. By measuring the response of the BM
to pure tones, he was able to predict its response to complex
sounds based on a Fourier decomposition. One of his main
observations was the travelling wave vibration. He
demonstrated that different tone frequencies evoke maximal
excursion of the BM at different places along its length: i.e.,
tonotopy.
Subsequent work has refined these methods, for instance
using lasers or radioactive sources to study the movement of
the BM at less intense sound levels. A significant
development has been the appreciation that the movements
of the BM are more tightly constrained in position than was
measured by von Békésy. In particular, post-mortem BMs
show lower amplitude movements and so appear less
specific to a given frequency.
This provides compelling evidence that the BM acts like a
mechanical frequency analyser. Different places along the
BM vibrate for different frequency components, where the
tonotopy is approximately logarithmic with the frequency of
stimulation. This is like Fourier analysis, but differs in that
the bandwidths (i.e., frequency range) are quite broad.
Further, because the BM is a continuous sheet, responses
at two locations are not independent of each other. Although
different frequencies change the location of the peak of the
response, many other portions of the BM still respond to the
travelling wave of a given frequency, just with smaller
amplitude.
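One widely used empirical description of this roughly logarithmic place-frequency map is the Greenwood function. The human parameter values below are the commonly quoted ones from the literature rather than figures given in these notes, so treat the sketch as illustrative.

```python
def greenwood_frequency(x):
    """Approximate characteristic frequency (Hz) at relative position x along the BM.

    x = 0 at the apex, x = 1 at the base; parameters are commonly quoted human
    values for the Greenwood (1990) function -- an assumption, not from the notes.
    """
    A, a, k = 165.4, 2.1, 0.88
    return A * (10 ** (a * x) - k)

for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(round(x, 2), round(greenwood_frequency(x)), "Hz")   # ~20 Hz at the apex to ~20 kHz at the base
```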

21
Q

How do the hair cells convey sound information?

A

We know from Lecture 1 that the hair cells in the organ of Corti are responsible for transducing physical
vibration into electrical impulses, but we didn’t discuss their functions. There are two types of hair cells, inner
hair cells (closest to the centre of the cochlea spiral) and outer hair cells.
Outer hair cells play a critical role in actively amplifying the incoming sound. In particular, when their
stereocilia are stimulated they contract, pulling the tectorial membrane with them and adding vibration
energy to the organ of Corti. They contain a remarkable motor protein (prestin) that is one of the fastest-acting biological motors known (much faster than the myosin molecules that contract muscle). They
have been observed to contract and expand over 70,000 times per second, but may be able to go faster!
Their movement was beautifully demonstrated by Jonathan Ashmore at UCL – using a patch clamp pipette
on a single outer hair cell from a guinea pig, he injected an electrical current waveform of “Rock around the
clock” – demonstrating movements that mirror changes in sound level.
The amount of active amplification provided by the outer
hair cells is not constant – it amplifies low intensity
sounds a lot (~50 dB) and the gain reduces as the input
intensity increases. This change of amplification is a
significant non-linearity in cochlear behaviour. The
‘active process’ of amplification can introduce new
frequency information on the Basilar Membrane. You
should hope your music system is linear – you don’t want
it introducing its own frequencies to your favourite tunes,
but the human auditory system is anything but linear!
The active amplification process makes the relationship
between input and output curved (a compressive
nonlinearity).
The behaviour of the outer hair cells can be
diagnostically very useful. We can stimulate the ear with
two pure tones, and then record the sound information
that comes back from the ear. If the hair cells are
working correctly, then we can measure sounds coming
out of the ear that contain frequencies that weren’t in the
input. These sounds are referred to as otoacoustic
emissions – and nowadays they form the basis of the
new-born hearing screen.
Damage to the outer hair cells significantly reduces
active amplification with the result that individuals can
have significant hearing impairments. Thus, the outer
hair cells play a key role in normal hearing.
The inner hair cells have a quite different function and
are critically involved in the fast transmission of sound
information to the brain. Outer hair cells generally form
synapses to a single ganglion cell which pools responses over several hair cells. Inner hair cells however
speak to multiple ganglion cells, and they don’t share their audience! Each individual inner hair cell has
synaptic connections to around 20 ganglion cells which send fast axons through the auditory nerve to the
cochlear nucleus.
The organisation of the cochlear nerve is tonotopic, with individual axons responding most to a particular
sound frequency (the characteristic frequency) and only responding to a limited range of frequencies (i.e.,
band pass filters). The ganglion cells that connect with a given hair cell have different characteristics, and
are classified into 3 types depending on how many action potentials they produce ‘at rest’ (i.e. when there is
no sound). High spontaneous rate axons (20-50 spikes per second in quiet) will respond to low intensity
sounds, but they saturate (i.e. can’t fire any faster) once sound level reaches ~ 40 dB SPL. Medium (< 18
spikes/s at rest) and Low (< 1 spike/s) spontaneous rate axons will not increase their firing rates with sound
level until it reaches 20-30 dB SPL, and they continue increasing firing until sound level increases to 80 dB
SPL or more. Thus, individual axons have different response gains: some respond better to low sound levels and others to high sound levels. This differential activity may contribute to our perception of loudness over
such a wide range.
All of this is quite suggestive of the importance of a place code for signalling sound frequency. However,
reliable temporal information is also available to the brain…

22
Q

Explain phase locking

A

Measurements of action potentials (‘spikes’) in the cochlear
nerve reveal an interesting behaviour. Spikes tend to occur
around the peak amplitude of a sound wave – i.e., their
firing is synchronized to the phase of the sound source.
This relationship isn’t perfect – action potentials are
stochastic events (there is a degree of randomness to
them) – so there isn’t an action potential fired for every
cycle of the stimulus, and the precise timing of the spike
relative to the peak amplitude of the sound will vary.
Nevertheless, there is a strong relationship between the timing of spikes in individual neurons and the phase of the stimulating sound wave. This phase locking tends to become stronger as the
sound level is increased (i.e., action potentials are more
likely to occur, and their timing will be more tightly coupled
to the stimulus). This clearly suggests a useful temporal
code for sound frequency.
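A toy simulation of the idea (not from the notes, with arbitrary parameter values): spike probability is modulated by the stimulus waveform, so spikes are stochastic, skip many cycles, yet cluster near the peak of each cycle.

```python
import numpy as np

rng = np.random.default_rng(0)

fs = 20000                                       # time resolution (Hz)
f_tone = 250                                     # stimulus frequency (Hz), arbitrary
t = np.arange(0, 1.0, 1 / fs)
stimulus = np.sin(2 * np.pi * f_tone * t)

# Toy model: firing probability follows the half-wave rectified stimulus
rate = 200 * np.clip(stimulus, 0, None)          # instantaneous rate in spikes/s
spikes = rng.random(len(t)) < rate / fs          # Bernoulli approximation of a Poisson process

spike_phases = (t[spikes] * f_tone) % 1.0        # phase of each spike within its cycle
print(len(t[spikes]), "spikes; mean phase ~", round(spike_phases.mean(), 2))  # clusters near 0.25 (the peak)
```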

23
Q

How could timing information be useful given the limit on what individual axons can signal? (The volley principle)

A

However, there is a problem – individual neurons are not
capable of producing more than 1000 spikes / s, and we
can hear sounds up to 20 times this value. How could timing
information be useful given this limit on what individual
axons can signal? Wever and Bray (1937) suggested that
we need to think about more than 1 axon doing the firing.
They pointed out that a small group of axons could signal
much higher rates if their outputs were pooled. This is
known as the volley principle (where volley doesn’t refer to
hitting a tennis ball but rather is used in the sense of “a
number of bullets, arrows, or other projectiles discharged at
one time: the infantry let off a couple of volleys.” from the OED).
This reasoning is sound, and subsequent recordings from ganglion cells show that precise information about
timing can be seen in the responses of aggregated axons: a clear temporal signal. However, this ability to
show phase locking to the stimulus disappears once frequencies exceed 4-5 kHz. Thus, information in
temporal rate codes would not be conveyed to the brain for frequencies of 5-20 kHz.
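A toy sketch of the volley principle under assumed parameter values: each axon phase-locks but can only mark a fraction of the cycles of a 2 kHz tone, whereas the pooled population marks nearly every cycle.

```python
import numpy as np

rng = np.random.default_rng(1)

f_tone = 2000                    # 2 kHz tone: a 0.5 ms period, too fast for any single axon
n_cycles = 40
n_axons = 10
max_rate_per_axon = 500          # illustrative firing-rate cap for a single axon (spikes/s)

cycle_times = np.arange(n_cycles) / f_tone
p_fire = max_rate_per_axon / f_tone          # probability an axon fires on any given cycle (0.25)

# Each axon phase-locks but skips most cycles; its spikes fall only on the cycles it responds to
axon_spikes = [cycle_times[rng.random(n_cycles) < p_fire] for _ in range(n_axons)]

pooled = np.sort(np.concatenate(axon_spikes))
cycles_marked = len(np.unique(np.round(pooled * f_tone)))
print(f"single axon: ~{int(p_fire * n_cycles)} of {n_cycles} cycles; pooled: {cycles_marked} of {n_cycles}")
```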

24
Q

Pitch comparison in simple tones

A

We can posit theories of pitch in terms of sensory signalling relating to place on the BM or in terms of pitch
relating to the temporal firing pattern.
How can we test these ideas? Moore (1973) conducted a classic experiment that took the limitations of the
temporal rate code (<~4 kHz) as its starting point. He asked how well listeners could tell the difference
between two pure tones of similar frequency. He found that this ability changed with frequency – listeners
were best in the range 0.5 – 2 kHz. Once sounds exceeded 5 kHz, listeners got much worse – suggesting that
tone frequency discrimination falls over (the scale is logarithmic!) once temporal rate code information is
unavailable.
A similar conclusion was reached by Attneave & Olson (1971) – they asked listeners to identify the melody
conveyed by a sequence of pure tones – they found that performance was poor for sounds above 4 kHz.
This again suggests the temporal rate code signals may be critical to perceive musical intervals.
Pitch and frequency are quite easy to equate when using pure tones, however adding harmonics makes it
more complicated. To measure pitch, we can use pure tones as a yardstick. For instance, we can run an experiment in which a test sound is presented to listeners, and they then adjust the frequency of a pure tone
so that it matches the pitch of the test sound.
If we present a 100 Hz pure tone, they’d adjust the
comparison so that it was 100 Hz more or less. Similarly,
a 300 Hz pure tone would sound like / be matched to a
300 Hz pure tone. So far, so simple.

25
Q

Pitch comparison in complex tones: the fundamental frequency and periodicity pitch

A

Now, imagine we present a complex tone, one consisting
of a 100 Hz pure tone fundamental and its first 5
harmonics (i.e., 200, 300, 400, 500, 600 Hz) presented together. What would they hear? The lowest?
The highest? The average? Listeners would say that this
complex tone matched a 100 Hz tone – i.e. the pitch is
determined by the Fundamental frequency.
What about if we mess around with the harmonic series?
Imagine removing the 300 Hz pure tone from the
complex tone. It wouldn’t matter – the pitch would still be
heard at the Fundamental.
Now things get more interesting. If we presented a complex tone consisting of 3 tones: 200, 400, 600 Hz –
listeners would match this to a pure tone of 200 Hz, because in this case the Fundamental would be 200 Hz.
But, if the complex tone instead was 200, 300, 400 Hz, they would match it to a pure tone of 100 Hz!
In this case the spacing between the harmonics (100 Hz) indicates that 200 Hz is not the fundamental
frequency of the sound because 300 Hz is not an integer multiple of 200 Hz. Thus the auditory system
resolves this apparently odd input by inferring that the largest valid fundamental (i.e. the highest common divisor) is 100 Hz, giving rise to a pitch at 100 Hz. This is called the Periodicity Pitch or the pitch of the
missing fundamental.
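For integer-valued component frequencies, the 'largest valid fundamental' described above is simply the highest common divisor, as this small sketch (not from the notes) shows.

```python
from math import gcd
from functools import reduce

def periodicity_pitch(harmonic_frequencies_hz):
    """Highest common divisor of the components: the (possibly missing) fundamental frequency."""
    return reduce(gcd, harmonic_frequencies_hz)

print(periodicity_pitch([100, 200, 300, 400, 500, 600]))  # 100 -> pitch at the fundamental
print(periodicity_pitch([200, 400, 600]))                 # 200
print(periodicity_pitch([200, 300, 400]))                 # 100 -> the "missing fundamental"
```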

26
Q

Is pitch determined by (1) sensory signalling relating to place on the BM, or (2) by the temporal firing pattern?

A

(1) Not really. A pure place-based account of pitch has difficulty with
these sounds – they indicate that pitch can be heard for
locations on the BM for which there is no significant
vibration energy. The key determinant of pitch (at least
below 4 kHz) would appear to be conveyed by temporal
signals.
The octave is a doubling of frequency, but notes separated
by an octave or two sound similar. Also, the fact that tones with different frequency components can have the same pitch casts doubt on the place theory of pitch perception.
It is thought pitch perception is most likely occurring in
more central auditory structures, past the cochlea.

(2) Yes. Moore (1973) conducted a classic experiment that took the limitations of the
temporal rate code (<~4 kHz) as its starting point. He asked how well listeners could tell the difference
between two pure tones of similar frequency. He found that this ability changed with frequency – listeners
were best in the range 0.5 – 2 kHz. Once sounds exceeded 5 kHz, listeners got much worse – suggesting that
tone frequency discrimination falls over (the scale is logarithmic!) once temporal rate code information is
unavailable.

27
Q

Does loudness depend on changes in firing rate or on the number of recruited neurons?

A

Both encoding schemes have merit. Psychophysical loudness matching experiments (e.g.,
adjust the amplitude of this 1000 Hz pure tone so it matches the loudness of a test sound) reveal 'iso-loudness' (equal-loudness) contours, which demonstrate that loudness is (at least) a function of frequency and sound pressure
level. The relative contributions of firing rate and number of neurons firing may well change at
different frequencies, just as the ability to sense pitch may depend on both place and rate type codes
for different frequencies.

28
Q

Psychoacoustics and deceiving the Ear

A

The relationship between the physical stimulus (that is the sound wave) and the perception of the sound, is
also influenced by such things as expectation and experience. It has been shown that when an entire word is
missing from a sentence but replaced by a cough sound, people report hearing the missing word. This is
obviously a more central auditory perceptual effect. The phenomenon of the missing fundamental was manipulated by R. Shepard to create an auditory illusion of an endlessly ascending or descending tone.

The word-restoration example above is an instance of context effects in speech recognition.

29
Q

Where does information go on leaving the cochlea?

A

We previously considered the different types of
connections leaving the hair cells of the cochlea. These
ganglion cells send axons through the auditory nerve to
terminate at the cochlear nucleus. From there the main
connections are with the Superior Olivary Complex
(SOC) in the brain stem. This 2nd level of auditory
processing integrates signals from the two ears. This
makes quite a marked contrast with the anatomy of the
visual system where inputs from the two eyes are not
integrated until the cortex. From the SOC, the main
projection is to the Inferior colliculus (involved in
localisation), then to the Medial Geniculate Nucleus and
finally to the primary auditory cortex. The
binaural combination of signals occurs at all stages of
processing from the SOC up. The tonotopy that we
discussed in relation to the Basilar Membrane and auditory
nerve fibres is preserved in the ascending pathway such
that the primary auditory cortex is tonotopically organised
in a way that mirrors the BM.

30
Q

How can we localise events using the ears?

A

The main cues we use for localisation of a sound object
come from inter-aural disparities: interaural timing differences and interaural intensity differences.
Because the ears
sit on opposite sides of the head, they sample the world
at different places, which means that they ‘hear’ slightly
different things. We can describe the middle of the
head as the intersection of the interaural axis (a
geometrical construct of a straight line that passes
through both ears) and the midline of the head. We
can construct a sphere centred on this point; this describes all the places that are equally distant from the
centre of the head. Using this spherical coordinate system,
we can describe any position in space in terms of its
direction from the listener (azimuth and elevation) and its
distance. Here we will focus on cues to direction only.
Using this coordinate system, you will see that sound
sources located away from the midline are at different
physical distances from the two ears. Because the ears of
an adult are about 18cm apart there are implications for the
intensity of sounds reaching the ears and the time of
arrival.
The small distance between the ears, of itself, will not
introduce a noticeable difference in sound intensity.
However, sound is blocked by the head, meaning that the
head casts an acoustic shadow. In particular, the ear on the side of the head away from the sound source
will receive a less intense sound because the head gets in the way, reflecting and diffracting the
approaching sound energy. The magnitude of this shadowing effect depends on the frequency of the
sound relative to the size of the head. In particular, high frequency sounds get reflected easily by
obstacles, while low frequency sounds get diffracted around them.
This results in a greater shadowing effect for high frequency sounds than for low frequency sounds.

31
Q

IID and ITD

A

Lord Rayleigh noticed the potential for different distances for the two ears to give rise to differences in the
timing of signals. He tested this idea with two tuning forks, finding that timing differences alone were
sufficient to evoke an impression of auditory location:
interaural timing differences. He proposed that listeners
use different cues for different frequencies of sound - an
idea referred to as the duplex theory (1907). The two
cues are complementary in providing signals for
localisation at different parts of the sound frequency space.
In particular, timing differences are likely to be useful for
low sound frequencies, while intensity differences are
useful for high sound frequencies. Timing differences are
unreliable at higher frequencies because of an aliasing
problem (the brain doesn’t know which sound peaks to
match between the ears), while as noted above, level
differences are unreliable for low frequency sounds
because there is little acoustic shadowing.
Theoretical calculations and physical measurements
confirm the differential reliability of these signals at different
frequencies. Further, psychophysical experiments using
headphone presentation allow the two cues to be studied in
isolation - demonstrating that both support very fine
discriminations of auditory location: for pure tone stimuli
located directly in front of listeners, differences of ~ 1 degree
can be detected. For ITD stimuli, this is a 10 μsec time
difference; for IID stimuli, this is a 0.5 dB difference. The cues
can be combined together or put into opposition (termed ‘cue
trading’). The percept of sound direction for conflicting cues
depends on the frequency of the stimuli. While we can think
about these different types of information for different
scenarios, it is quite likely that we regularly experience a
mixture of IIDs and ITDs as we encounter broadband sounds
that have both high and low frequency components (e.g. speech) in our daily lives.
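A rough numerical sketch (not from the notes) of how large ITDs are: it uses the ~18 cm ear separation quoted above and a straight-line, far-field approximation that ignores the extra path around the head, so the numbers are only indicative.

```python
import math

EAR_SEPARATION_M = 0.18      # ~18 cm, as quoted in the notes
SPEED_OF_SOUND_M_S = 343.0   # speed of sound in air at roughly 20 degrees C

def itd_seconds(azimuth_deg):
    """Approximate interaural time difference for a distant source at the given azimuth."""
    return (EAR_SEPARATION_M / SPEED_OF_SOUND_M_S) * math.sin(math.radians(azimuth_deg))

for az in (0, 15, 45, 90):
    print(az, "deg:", round(itd_seconds(az) * 1e6), "microseconds")
# 0 deg (straight ahead) -> 0 us; 90 deg (directly to one side) -> ~525 us with these values
```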

32
Q

What are the neural mechanisms that support sound localisation?

A

Work in barn owls suggests that different portions of the Superior Olivary Complex (the first stage of binaural combination) play important roles in binaural processing:
* Medial Superior Olive: measures timing differences; individual cells appear to prefer particular delays between the left and right ears.
* Lateral Superior Olive: measures intensity differences, contrasting the firing rates of inputs from the left and right cochlear nuclei.
The timing differences that humans are able to use are tiny! What could be responsible for this exquisite
sensitivity? In the early 1950s a psychophysicist called Lloyd
Jeffress proposed a model of how timing comparisons might
be signalled based on interconnections between the left and
right ears to detect coincidence. His suggestion was based
on the idea that signals from the contralateral cochlear
nucleus would have to travel further, and neural transmission
has a finite speed. The model proposes a variety of different
axon lengths to meet the signal coming from the contralateral
ear. These different path lengths equate to different relative
timings between the ears. The coincidence detector will fire
only when it receives concurrent inputs from the left and right
ears, where this simultaneity of inputs depends on the neural
path length, thus signalling the azimuth of the sound source.
Subsequent investigations have revealed that the anatomical
structure of the Medial Superior Olive is remarkably like the wiring diagram predicted by Jeffress.
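A toy cross-correlation sketch of the coincidence-detection idea, not a model of real MSO physiology: a set of internal delay lines is scanned, and the delay that best cancels the external ITD produces the largest "coincidence" response. All parameter values are arbitrary.

```python
import numpy as np

fs = 100_000                                         # 10 microsecond time resolution
t = np.arange(0, 0.05, 1 / fs)
f_tone = 500                                         # a low-frequency tone, where ITDs are useful

true_itd = 300e-6                                    # the sound reaches the left ear 300 us earlier
left = np.sin(2 * np.pi * f_tone * t)
right = np.sin(2 * np.pi * f_tone * (t - true_itd))  # right-ear signal is delayed by the ITD

candidate_itds = np.arange(-600e-6, 601e-6, 50e-6)   # the model's internal delay lines
responses = []
for d in candidate_itds:
    shift = int(round(d * fs))
    delayed_left = np.roll(left, shift)              # delay line: shift the left input by d
    responses.append(np.sum(delayed_left * right))   # "coincidence" = correlation of the two inputs

best = candidate_itds[int(np.argmax(responses))]
print(round(best * 1e6), "microseconds")             # ~300: the detector tuned to the true ITD wins
```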

33
Q

How to deal with the ambiguity of the cone of confusion

A

The cues to direction we have discussed are, by themselves, ambiguous. IID and ITD suggest a relative difference in lateralisation that could exist within a cone of possible solutions, i.e. the sound source could be
at any position within this cone, and generate exactly the same IID and ITDs. How does the brain deal with
this ambiguity?
One simple way is by moving the head. This provides more
information (with a real sound source, not when using
headphones!) and can disambiguate the direction of the
sound based on an intersection of constraints.
Another valuable source of information comes from the
patterns of reflections, filtering and resonance that come from
the outer ear (pinna cues), as well as the shape of our
heads and upper torso. Together these physical properties of
the listener define the Head Related Transfer Function.
This complex filtering means that sounds originating from
different physical locations have different spectral properties.
A listener calibrated to the filtering properties of their own
auditory system can use this information to localise sounds.
These cues greatly help with judgments of sound source
elevation, but can also contribute to azimuth localisation
(e.g., some individuals who have unilateral hearing
impairment (i.e. are deaf in one ear) can localise sounds
well based on these cues).
Batteau (1967) demonstrated the importance of Pinna
cues, by contrasting the experience of sound via
headphones for a normal stereophonic recording, and a
recording made where the microphones were encased in
casts of a listener’s outer ear. Normal stereophonic
recording gave rise to listeners perceiving the sound
source to be inside their heads. By contrast, recordings
made inside casts of the pinnae were perceived as coming
from outside the head. Similar observations have been
made with full measurements of the HRTF. Other evidence
for the importance of the pinnae was provided by Gardner
and Gardner (1973), who showed that localisation in the median plane (i.e. elevation judgements) was much poorer if the pinna cavities were filled to
remove their filtering effects.

34
Q

Other benefits from having two ears

A

Simply hearing the same thing with two ears can have benefits. If the noise associated with each ear is
independent, then some of the noise will cancel out, meaning that signals are more detectable: i.e., same
signal but different noise boosts performance. These improvements can be up to a factor of √2.
If the noise associated with each ear is the same, however, it helps if the signal presented to each ear is
slightly different. In particular, a signal that is inaudible because it is masked by noise can become audible if
either (1) the phase of the signal is changed in one ear; or (2) the signal is turned off in one ear!! This is likely
due to the mechanisms of spatial hearing. In these cases, there exists a disparity between the signal and the
background noise, such that they will be attributed to different physical locations in space. This phenomenon
is often referred to as the cocktail party effect, where individual voices can stand out from the background
noise because the brain processes them as originating from different physical locations.
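A small numerical illustration (not from the notes) of why independent noise at the two ears can give up to a √2 benefit: averaging the two ears leaves the signal unchanged but reduces the noise standard deviation by a factor of √2.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 100_000
signal = 1.0                                         # arbitrary constant signal level
noise_left = rng.normal(0, 1, n)                     # independent noise at each ear
noise_right = rng.normal(0, 1, n)

single_ear = signal + noise_left
both_ears = signal + (noise_left + noise_right) / 2  # averaging the two ears

print(round(np.std(single_ear), 3))   # ~1.0
print(round(np.std(both_ears), 3))    # ~0.707 -> noise reduced by a factor of sqrt(2)
```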

35
Q

The Jeffress model

A

The timing differences that humans are able to use are tiny! What could be responsible for this exquisite
sensitivity? In the early 1950s a psychophysicist called Lloyd
Jeffress proposed a model of how timing comparisons might
be signalled based on interconnections between the left and
right ears to detect coincidence. His suggestion was based
on the idea that signals from the contralateral cochlear
nucleus would have to travel further, and neural transmission
has a finite speed. The model proposes a variety of different
axon lengths to meet the signal coming from the contralateral
ear. These different path lengths equate to different relative
timings between the ears. The coincidence detector will fire
only when it receives concurrent inputs from the left and right
ears, where this simultaneity of inputs depends on the neural
path length, thus signalling the azimuth of the sound source.
Subsequent investigations have revealed that the anatomical
structure of the Medial Superior Olive is remarkably like the wiring diagram predicted by Jeffress.