Audition Flashcards
Lecture 1 content
This lecture is about the physical events that underlie our ability to hear. We’ll be addressing the following
questions:
* What is sound?
* How do we measure and characterise sounds?
* How does sound reach the ear?
* How do physical vibrations get translated into nerve impulses?
Here are some of the technical terms for the concepts we will cover:
* Simple harmonic motion and the resonance properties of physical materials
* Fundamental frequencies (F0) and harmonics (overtones)
* Propagation of sound in the air
* The Pinna and Ear Canal
* The Tympanic membrane and the Ossicles (malleus, incus, stapes)
* The Cochlea: oval window, round window, Basilar Membrane, Organ of Corti
* Inner and Outer hair cells, Tectorial Membrane
* The transduction of vibration energy into electrical responses
Lecture 2 content
This lecture is all about how sound information is encoded in the cochlea and how this relates to our
perception of sound. We’ll address the following questions:
* How does the basilar membrane work in response to sounds?
* How do hair cells encode and relay sound information?
* How is the information encoded by the cochlea used to determine pitch and loudness?
Here are some of the concepts we will cover:
* Travelling waves on the Basilar membrane
* Spectral decomposition by the cochlea
* Outer hair cells, active amplification and otoacoustic emissions
* Inner hair cells and their afferent connections
* Place-based encoding along the basilar membrane
* Phase locking of auditory ganglion cells
* The volley principle for conveying sound timing information
* Place vs. time codes for pitch
* Missing fundamental sounds
* Firing rate and number-of-neurons encoding schemes for loudness
Lecture 3 content
In the previous lectures we considered ideas related to the properties of sound sources (e.g. pitch, timbre).
This lecture concerns how sound information is used to locate sources. We’ll address the following:
* How does encoded sound information progress from the cochlea to the cortex?
* How are binaural (i.e. from two ears) signals used for localisation?
* What are the neural circuits involved in extracting binaural signals?
* What is the role for monaural information?
Here are some of the concepts we will cover:
* The ascending auditory pathway: cochlear nucleus, superior olivary complex, inferior colliculus,
medial geniculate nucleus, auditory cortex
* Binaural processing in the brain stem and the preservation of tonotopy
* Spherical coordinates (azimuth, elevation, distance) and binaural sound cues
* Sound intensity with distance, acoustic shadows and interaural intensity differences (IIDs)
* Timing cues to lateralisation and interaural timing differences (ITDs)
* Rayleigh’s duplex theory and frequency dependency
* Difference coding in the Lateral vs. Medial Superior Olive and Jeffress’ coincidence detection
* Pinna cues and the Head Related Transfer Function (HRTF)
What is sound?
Objects in the world have physical properties relating to their size, shape, material composition and location.
Our ability to recognise and interact with these objects depends on the brain making sophisticated deductions
(inferences) based on the information delivered by the sensory organs (eyes, ears, etc.). Through experience
and evolution, these inferences are shaped by the information contained in the signals themselves and our
knowledge of our own sensory transduction processes.
Imagine you are sitting in your house. You hear a door slam shut followed by an electronic beep. The sound
information reaching your ears allows you to work out:
The sound came from outside. [Where]
The slam happened before the beep sound. [When]
The door was a car door. The beep was the locking mechanism. [What]
It was someone arriving in their car. [What]
The sound information by itself would not allow you to work out, for instance: the colour of the car that arrived
or whether the person was left or right handed. Why are some inferences possible (Type of door) and others
not (e.g. colour)? You (and your brain) know (implicitly, if not necessarily explicitly) that the sound vibrations
caused by the door and reaching your ears relate to its location, weight (mass), elasticity (stiffness), physical
dimensions and shape, and material composition (e.g. wood vs steel).
To summarise, when things happen to objects they can often emit sound (vibration) energy, and different
objects emit sound energy that reflects their own physical characteristics.
Simple Harmonic motion of a mass and spring system
What causes the sound?
Physical objects have mass and a certain amount of elasticity (even glass bends a bit). If a force is applied to an
object, it will bend until counteracting forces push it back (or it breaks). The action of the incoming and
counteracting forces can be modelled simply as a mass-spring system, in which the mass of the object and the
stiffness of the spring determine the frequency of vibration. The act of applying force to the object sets it into
vibration, but the vibration frequency is mostly determined by the object's physical properties. For example,
hitting a drum and striking a bell can be done with the same percussive action, but they will not sound alike. The
vibrations come from the object more than from the thing that hits it.
A sine wave can be used to describe the vibrating behaviour of many objects. The ‘standard’ sine wave – as seen
in the simple harmonic motion diagram above – can be specified using just 3 simple parameters (amplitude,
frequency and phase). With these we can describe any possible sine wave.
The Frequency and Amplitude of a vibration are related to the pitch of a sound and its loudness.
A sine wave is a good description of real vibrations, and a single sine wave (a pure tone) gives rise to a particular
sound.
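To make these three parameters concrete, here is a minimal Python sketch (the function name, sample rate and example values are illustrative choices, not part of the lecture) that generates a sampled sine wave from its amplitude, frequency and phase:

```python
import numpy as np

def sine_wave(amplitude, frequency, phase, duration=0.01, sample_rate=44100):
    """Generate a sampled sine wave from its three defining parameters."""
    t = np.arange(0, duration, 1.0 / sample_rate)  # time axis in seconds
    return amplitude * np.sin(2 * np.pi * frequency * t + phase)

# A 440 Hz pure tone with unit amplitude and zero phase
pure_tone = sine_wave(amplitude=1.0, frequency=440.0, phase=0.0)
```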
However, when most real objects vibrate, they
produce vibrations at many frequencies. To illustrate this, we can think about the vibrations that are produced if we pluck a guitar string. The string will vibrate back and forth along its whole length at a frequency determined by its
tension, mass and stiffness. However, while the whole
string has a particular mass, it is useful to think of the string as being composed of two halves, each of which has half
the mass and therefore vibrates at twice the frequency. In
fact, it’s not just half the string, we can think of thirds,
quarters etc of the string, each of which will vibrate at
higher frequencies related to integer (i.e. whole number)
multiples of the basic frequency of the string. More
formally, real objects tend to vibrate at a Fundamental
resonant frequency (F0) and at harmonics (overtones)
of that frequency: 2 x F0, 3 x F0, 4 x F0, and so on. When
these vibrations happen, they do so all at once, so the
sound that we hear corresponds to the combination of the
Fundamental and its higher harmonics.
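As a rough illustration of this idea, the sketch below (assuming an arbitrary F0 and made-up harmonic amplitudes) builds a complex tone by summing the fundamental and its integer-multiple harmonics:

```python
import numpy as np

sample_rate = 44100
t = np.arange(0, 0.05, 1.0 / sample_rate)      # 50 ms of signal

f0 = 110.0                                     # fundamental frequency in Hz (illustrative)
amplitudes = [1.0, 0.5, 0.3, 0.2]              # made-up amplitudes for F0, 2*F0, 3*F0, 4*F0

# The complex tone is the sum of the fundamental and its harmonics
complex_tone = sum(a * np.sin(2 * np.pi * (n + 1) * f0 * t)
                   for n, a in enumerate(amplitudes))
```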
Simple strings are easy to visualise and understand as the
vibration is in only one dimension. But real objects can
vibrate in two or three dimensions. For example, plucking
a guitar string causes the body of the guitar to vibrate
along its length, side to side, and front to back. These
vibrations happen along all of these dimensions at once,
at the fundamental and higher harmonics. This complex
pattern of vibrations is expressed in the Frequency
Spectrum for a given sound. This spectrum, together with
the Amplitude envelope of the sound is responsible for
the different sounds made by objects. The envelope is
described as having an onset, steady state, and an offset.
The onset plays a major role in how an instrument sounds.
For example, the same musical note played on a bell and
a piano produce different amounts of vibration energy at
different frequencies that start at different times and last
for different amounts of time. This difference in the way
things sound is referred to as the timbre (or texture) of a sound.
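A minimal sketch of such an envelope, assuming simple linear onset and offset ramps and arbitrary durations, applied to a pure tone:

```python
import numpy as np

sample_rate = 44100

def envelope(onset, steady, offset, sample_rate=44100):
    """Piecewise-linear amplitude envelope: rise, hold, fall (durations in seconds)."""
    rise = np.linspace(0.0, 1.0, int(onset * sample_rate))
    hold = np.ones(int(steady * sample_rate))
    fall = np.linspace(1.0, 0.0, int(offset * sample_rate))
    return np.concatenate([rise, hold, fall])

env = envelope(onset=0.01, steady=0.2, offset=0.05)    # a fast, percussive onset
t = np.arange(len(env)) / sample_rate
shaped_tone = env * np.sin(2 * np.pi * 440.0 * t)      # envelope applied to a 440 Hz tone
```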
How do we interpret complex real sounds?
The insights of a famous French mathematician come to our aid. Joseph Fourier (1768-
1830) wasn’t interested in sound, but in the abstract problem of how heat energy flows through a solid object. Yet his basic research in mathematics has had a huge and lasting impact – it’s no
exaggeration that much of the technology we take for granted today originates from Fourier’s theory.
Fourier proposed that any signal, no matter how complex, could be described as the sum of a family
of simple sine waves. Thus, when presented with a complex signal, we can decompose it into a number of
simple constituent parts, each of which depends on only 3 parameters: amplitude, frequency and phase.
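As a small illustration of this decomposition in practice, the sketch below builds a signal from two sine components and recovers their amplitudes and phases with a discrete Fourier transform (numpy's FFT standing in for the continuous mathematics; all values are illustrative):

```python
import numpy as np

sample_rate = 1000
t = np.arange(0, 1.0, 1.0 / sample_rate)

# A complex signal: two sine components with different amplitude, frequency and phase
signal = 1.0 * np.sin(2 * np.pi * 50 * t) + 0.4 * np.sin(2 * np.pi * 120 * t + np.pi / 4)

spectrum = np.fft.rfft(signal)                        # Fourier decomposition
freqs = np.fft.rfftfreq(len(signal), 1.0 / sample_rate)
amplitude = np.abs(spectrum) * 2 / len(signal)        # amplitude of each sine component
phase = np.angle(spectrum)                            # phase of each sine component

# Peaks in `amplitude` appear at 50 Hz and 120 Hz, recovering the two components
```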
What is a frequency spectrum?
When using Fourier Analysis to decompose a signal into
its constituent parts we typically plot the Amplitude and
Phase parameters of the signal at each frequency. Phase
conveys important information, but its interpretation is
complicated, so we will say no more about it and dwell on
the Frequency spectrum. We can think about the
Amplitude parameter as a quantitative value – it tells us
how much of that frequency is present.
A pure tone (single sine wave) is represented on a
spectrum by a single line that plots the amplitude of that
sine for that single frequency. A “complex tone” is defined
as an acoustic stimulus consisting of a combination of
pure tones – this gives rise to a spectrum with a series of
lines on it. When played, we hear the sound of all of the
components together, rather than hearing the individual
pure tones that compose the sound.
The harmonic series of a square wave
A classic example of Fourier decomposition is provided by
considering a square wave. The square wave has the
special property that it is composed of (an infinite) family of
sine components that are odd harmonics of the
fundamental (i.e., F0, 3F0, 5F0, 7F0, etc.) and the amplitude
of each component decreases in inverse proportion to the
harmonic number (i.e., 1, 1/3, 1/5, 1/7, etc.). While the spectrum continues to infinity,
we can have a reasonable approximation of the square
wave signal with a small number of components.
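The sketch below illustrates this approximation, summing odd harmonics with relative amplitudes 1, 1/3, 1/5, ... (the choice of F0 and number of components is arbitrary):

```python
import numpy as np

sample_rate = 44100
t = np.arange(0, 0.02, 1.0 / sample_rate)
f0 = 200.0                                       # illustrative fundamental frequency

def square_approx(n_harmonics):
    """Sum the first n odd harmonics (F0, 3F0, 5F0, ...) with amplitudes 1, 1/3, 1/5, ..."""
    odd = np.arange(1, 2 * n_harmonics, 2)       # 1, 3, 5, 7, ...
    return sum((1.0 / k) * np.sin(2 * np.pi * k * f0 * t) for k in odd)

rough = square_approx(3)      # already recognisably square
better = square_approx(25)    # much closer to an ideal square wave
```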
Continuous spectra
The above examples have discrete Fourier series (i.e. lines
in the spectra at individual frequencies). This is the case
when the source is oscillatory, as in a violin string. Other
sound sources can be more noise-like, for example
environmental sounds like wind or running water. A special
case of this is random signal fluctuations that you might
hear if you mistune an analogue radio (‘crackle’) – this
random noise is generally referred to as ‘white noise’ and
it has a spectrum that is flat (continuous horizontal line) –
i.e. it contains all possible frequencies of sound and all of
these frequencies have the same amplitude.
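A quick sketch of this idea, generating samples of Gaussian white noise and inspecting its spectrum (the sample rate and random seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
noise = rng.normal(0.0, 1.0, 44100)     # one second of Gaussian white noise at 44.1 kHz

amplitude = np.abs(np.fft.rfft(noise))
# Unlike a tone, no single frequency dominates: averaged over many samples the
# spectrum is flat, i.e. all frequencies are present with roughly equal amplitude.
```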
The limitations of a strict Fourier understanding
This understanding of Fourier analysis is based on the
assumption that the signals we have looked at (the sine
wave, the square wave and white noise waveforms) go on for an infinite time to generate the spectra we see
above. Introducing the temporal changes (i.e. stops and starts) in the time domain requires additional
components in the frequency domain. Hence, a brief click is composed of the sum of frequencies over a wide
range. Sensory systems such as audition are understood to process signals over a finite time window and
within limited frequency bands, where a range of frequencies is considered within each band rather than
being limited to a single discrete frequency value.
In sum, Fourier analysis provides the scaffolding for understanding how sensory systems such as your ears
work, but strict Fourier analysis is not a biologically realistic model.
How does sound reach the ears?
A vibrating object causes the molecules in the medium
around it (e.g. air or water) to vibrate as well. This causes an
increase in the concentration of molecules (air pressure) in
one place (compression) and reduces it (rarefaction)
elsewhere. These pressure differences are not sustainable
because there is nothing to prevent the air in the high-pressure region from moving toward a lower-pressure region.
This causes a travelling wave through the air.
In an open space, sound waves travel out in all three
dimensions away from the source. The same amount of
vibration energy is therefore spread out over an ever-larger
area as we get further from the source, so the intensity of the
sound decreases in proportion to the square of the
distance (the inverse square law).
* In a free field, an omnidirectional sound source radiates in all directions
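A small sketch of the inverse square law (distances chosen arbitrarily): each doubling of distance spreads the energy over four times the area, reducing the intensity by about 6 dB:

```python
import numpy as np

def relative_intensity(distance, reference_distance=1.0):
    """Sound intensity relative to its value at the reference distance (inverse square law)."""
    return (reference_distance / distance) ** 2

for d in [1, 2, 4, 8]:
    drop_db = 10 * np.log10(relative_intensity(d))
    print(f"{d} m: {drop_db:.1f} dB relative to 1 m")   # about -6 dB per doubling of distance
```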
Another property that changes with distance relates to the frequency content of a sound. As sound
propagates through the air, some of its energy is absorbed by the air molecules, and this absorption
affects high frequencies much more than low frequencies. For example, a lightning strike sounds very
different depending on how close you are to the location of the strike. Nearby, it sounds like a loud ‘crack’ (it
has low frequency and high frequency sound energy). Further away, it sounds like a low ‘rumble’ of thunder –
most of the high frequency information was absorbed in the intervening air.
How are sounds characterised?
We covered the frequency content of sounds (above),
now we turn to the amplitude of the sound waves. This
is typically done in terms of the pressure that a sound
source produces. The human ear is sensitive to a huge
range of sound pressures. The lowest pressure level
we can hear is equivalent to about one ten millionth of
the pressure that is put on your finger by a penny. This
contrasts with the greatest sound intensities we can
sense, which are about a million million times greater (i.e.,
1,000,000,000,000) than the least we can hear. To
cope with this huge range of values, it’s useful to use
a logarithmic scale (log 1 = 0; log 10 = 1; log 100 = 2;
log 1000 = 3; etc), and express a particular sound
pressure relative to the lowest pressure we can hear.
This ratio value is typically measured in decibels. The
‘standard’ decibel scale for quantifying sound uses a
reference pressure of 20 micropascals (µPa), which is the
average value of the quietest sound people can hear (although
individuals differ). When we use this reference, sound
pressure levels are quoted as “dB SPL”, where SPL
refers to the ‘sound pressure level’.
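A minimal sketch of the dB SPL calculation, using the 20 micropascal reference described above (the example pressures are illustrative):

```python
import numpy as np

P_REF = 20e-6   # reference pressure: 20 micropascals, roughly the quietest audible sound

def db_spl(pressure_pa):
    """Convert a sound pressure in pascals to decibels SPL."""
    return 20 * np.log10(pressure_pa / P_REF)

print(db_spl(20e-6))   #   0 dB SPL, the threshold of hearing
print(db_spl(0.02))    #  60 dB SPL, roughly conversational speech
print(db_spl(20.0))    # 120 dB SPL, painfully loud
```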
How are vibrations translated into nerve impulses?
Vibrations are channelled by the outer ear (pinna and meatus) to the ear drum (tympanic membrane).
The air pressure differences between the two sides of the ear drum cause it to move in and out with the sound.
The next step is the complex ossicular chain of the
middle ear: the malleus, incus and stapes. The Malleus
is attached to the ear drum, while the stapes interfaces
with the oval window of the cochlea. This complex
chain of bones serves to act as an impedance
matching device. This is needed because the cochlea
is filled with fluid (perilymph) that has a much higher
impedance than air. By analogy, if you shout to a friend
swimming underwater, they are unlikely to hear because
the majority of the sound energy would bounce off the
surface of the water due to the water’s higher
impedance. So it is with the cochlea – if the ossicles are
damaged – sounds need to be very loud for a person to
have any chance of hearing them. The area of the ear
drum is much larger than the face of the stapes, meaning
that the ossicles concentrate the same force over a
much smaller area (think of it like a drawing pin – you
can push these into a hard wall because the point of the
pin is very small in relation to the area you push with your
thumb). The hinge-like ossicular chain also provides a
degree of leverage to further amplify the movement of
the stapes. The arrangement is further complicated by
muscles interfacing with the ossicles, which allow a reflex to
partially disengage the stapes during loud sounds (the
acoustic reflex). This reflex is rather slow (it won’t
protect your ears from sudden loud sounds), but it can
help reduce damage to the cochlea in high sound
pressure environments. It also disengages when we
speak – probably accounting for why recordings of our
own voices sometimes sound odd.
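As a rough illustration of the combined area and lever effect, the sketch below uses commonly quoted approximate values for the eardrum area, stapes footplate area and ossicular lever ratio; these numbers are not from the lecture and are only indicative:

```python
import math

# Commonly cited approximate values (illustration only, not from the lecture text)
eardrum_area_mm2 = 55.0   # effective area of the tympanic membrane
stapes_area_mm2 = 3.2     # area of the stapes footplate at the oval window
lever_ratio = 1.3         # mechanical advantage of the ossicular lever

pressure_gain = (eardrum_area_mm2 / stapes_area_mm2) * lever_ratio
gain_db = 20 * math.log10(pressure_gain)

print(f"Pressure gain ~ {pressure_gain:.0f}x ({gain_db:.0f} dB)")
# roughly a 20-fold pressure increase, i.e. on the order of 25-30 dB of impedance matching
```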
Name the structures of the cochlea / inner ear
The cochlea is a snail-like structure embedded in the rigid temporal bone. It is a spiral of ~ 2.5 turns, is ~ 3.5
cm in length and has an average diameter of ~ 2 mm. It is divided into 3 compartments: the scala vestibuli
(trans. “entrance steps”), the scala tympani (trans. “steps to the drum”), and scala media (trans. “middle
steps”). The scala vestibuli and tympani interconnect at the apex (top) of the cochlea at the helicotrema (trans.
“hole in the helix”), and are filled with perilymph (trans. “surrounding clear fluid”). The scala media is much
smaller and sits away from the centre of the spiral – it is filled with the potassium (K+) ion rich fluid endolymph
(trans. “inside clear fluid”).
When the stapes moves, pressure is applied to the fluid
inside the cochlea causing it to move. This movement will
cause a movement of the Basilar membrane (B.M.) that
divides the scala vestibuli and scala tympani. As the
perilymph does not compress, the increased pressure
causes the round window to bulge outwards as the oval
window is deflected inwards. Different frequencies of
sound stimulation give rise to maximal vibrations of the
membrane at different locations (tonotopy). Low
frequency sounds cause maximal movement near the
apex of the cochlea, while high frequency sounds cause
displacement at the basal end near the stapes.
Frequency selectivity arises from two factors:
- The stiffness of the BM changes along its length – it is stiff at the base and floppy at the apex (a high to low
stiffness gradient).
- The perilymph has inertia – this means that it is much harder to produce the necessary accelerations and
decelerations for high frequencies than for low frequencies. As such, high frequency movements are attenuated
as distance increases away from the oval window (i.e., a low to high impedance gradient).
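One common way of describing this tonotopic map quantitatively is the Greenwood place-frequency function; the sketch below uses the standard human constants, which are not given in the lecture and are included only for illustration:

```python
import numpy as np

def greenwood_frequency(x):
    """Greenwood place-frequency map for the human cochlea.

    x is the position along the basilar membrane as a proportion of its length,
    measured from the apex (x = 0) to the base (x = 1). Returns the characteristic
    frequency in Hz. The constants A = 165.4, a = 2.1 and k = 0.88 are the values
    commonly quoted for humans.
    """
    return 165.4 * (10 ** (2.1 * x) - 0.88)

for x in [0.0, 0.25, 0.5, 0.75, 1.0]:
    print(f"x = {x:.2f} (apex to base): ~{greenwood_frequency(x):.0f} Hz")
# Low frequencies map to the apex, high frequencies to the base near the stapes.
```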