Audition and cross-modal integration Flashcards
What is sound?
- characteristics of sound
- complex sounds
Sound = longitudinal wave (particle movement parallel to the direction of propagation), carrying information
- air molecules are normally uniformly distributed; vibrating objects disturb this, causing alternating compression (high pressure) and rarefaction (low pressure)
- vibration frequencies tell you about the physical properties of the object
- Damping = sine waves getting smaller over time as the vibrating object loses energy; separately, sound intensity falls with distance travelled^2 (inverse square law), so amplitude falls in proportion to distance - see the sketch below
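A quick numeric sketch of that falloff (a toy calculation, assuming ideal free-field spreading; not from the lecture notes):

```python
# Sketch: free-field spreading loss from the inverse square law.
# Intensity falls as 1/r^2, so pressure amplitude falls as 1/r,
# and each doubling of distance costs about 6 dB.
import math

def level_change_db(r1, r2):
    """dB change in sound pressure level moving from distance r1 to r2."""
    return 20 * math.log10(r1 / r2)  # amplitude ratio (1/r2)/(1/r1) = r1/r2

print(level_change_db(1.0, 2.0))   # -6.02 dB per doubling of distance
print(level_change_db(1.0, 10.0))  # -20 dB at 10x the distance
```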
Complex sounds vs pure tones:
- pure tones have 3 parameters (A, f, phase)
- complex tones = two or more simple/pure tones –> a fundamental frequency + harmonics (overtones) at integer multiples of that frequency
- Fourier analysis –> any signal can be described as a sum of simple sine waves, each with amplitude, frequency and phase parameters (BUT: assumes infinite time - audition = finite time window + limited frequency bands); see the sketch below
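A minimal Fourier sketch (assumes numpy; the tone frequencies, sample rate and peak threshold are illustrative):

```python
# Sketch: Fourier decomposition of a complex tone.
# A 200 Hz fundamental plus two harmonics comes back as three spectral peaks.
import numpy as np

fs = 8000                      # sample rate (Hz)
t = np.arange(fs) / fs         # 1 s analysis window (finite, as in audition)
tone = (1.00 * np.sin(2 * np.pi * 200 * t)     # fundamental
        + 0.50 * np.sin(2 * np.pi * 400 * t)   # 2nd harmonic
        + 0.25 * np.sin(2 * np.pi * 600 * t))  # 3rd harmonic

spectrum = np.abs(np.fft.rfft(tone)) / (fs / 2)  # normalised amplitude spectrum
freqs = np.fft.rfftfreq(fs, d=1 / fs)

print(freqs[spectrum > 0.1])  # [200. 400. 600.] - the sine components recovered
```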
How does sound get to the ear?
- ear drum, ossicles, cochlea, basilar membrane
- Outer ear channels vibrations - emphasises speech-relevant frequencies
- vibrations –> tympanic membrane (ear drum)
- ear drum – ossicular chain –> cochlea (impedance matching device = converts air vibrations into higher-pressure movements of the oval window, so energy isn't reflected at the air–fluid boundary)
- ear drum has a larger area than the stapes footplate –> the same force over a smaller area amplifies pressure (worked example below)
- stapes moving applies pressure to perilymph - causes basilar membrane to move
- BM - tonotopically organised; vibration causes a shearing motion between the organ of Corti sitting on the BM + the tectorial membrane –> causes movement of the stereocilia of the hair cells
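A worked example of the middle-ear pressure gain (a sketch using typical textbook values for the areas and lever ratio; these numbers are assumptions, not lecture figures):

```python
# Sketch: middle-ear pressure gain from the area ratio plus the ossicular lever.
import math

eardrum_area = 55.0  # mm^2, approximate effective area of the tympanic membrane
stapes_area = 3.2    # mm^2, approximate area of the stapes footplate
lever_ratio = 1.3    # approximate mechanical advantage of the ossicular chain

pressure_gain = (eardrum_area / stapes_area) * lever_ratio
gain_db = 20 * math.log10(pressure_gain)  # pressure ratio -> dB

print(f"pressure gain ~{pressure_gain:.0f}x (~{gain_db:.0f} dB)")
# ~22x (~27 dB): the same force focused onto a smaller area raises the pressure
```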
Frequency filtering
- auditory system as overlapping band-pass filters
- masking
- psychophysical tuning curves
Auditory system can hear 20Hz - 20,000Hz
- most sensitive at 1kHz-8kHz
- can measure the minimum detectable sound (detected 50% of the time) to work out thresholds
Band-pass filters –> block extreme frequencies, only allow a central band of frequencies through
Masking = one sound made less audible due to presence of another sound
- Fletcher (1940) –> measured frequency range where masking signal interfered with pure tone detection - critical band provides estimate of width of perceptual auditory filter
BUT: assumes rectangular, non-overlapping bandpass filters - a simplification (a toy overlapping filter bank is sketched below)
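A toy overlapping filter bank (assumes numpy + scipy; the octave-wide channels and centre frequencies are illustrative choices, not Fletcher's measured critical bands):

```python
# Sketch: the auditory periphery as a bank of overlapping band-pass channels.
# Only the channel centred near a tone passes it strongly; a masker falling
# inside that same channel would interfere with the tone's detection.
import numpy as np
from scipy import signal

fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)  # 1 kHz probe tone

for centre in [500, 1000, 2000]:     # one-octave-wide channels
    low, high = centre / np.sqrt(2), centre * np.sqrt(2)
    sos = signal.butter(4, [low, high], btype="bandpass", fs=fs, output="sos")
    out = signal.sosfilt(sos, tone)
    rms = np.sqrt(np.mean(out[fs // 2:] ** 2))  # steady-state output level
    print(f"{centre:4d} Hz channel: output RMS = {rms:.3f}")
```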
Psychophysical tuning curves (Vogten, 1974)
- tested filtering of pure tones: a pure tone is presented at a low level to target a single auditory channel
- reveals the shape of the perceptual auditory filters (inverted-U shape –> not rectangular)
- measured filters may appear wider than they really are if listeners use more than one channel - off-frequency listening
Basilar membrane
- how does it function as a bandpass filter?
Tonotopically organised
Place code:
- Helmholtz - different locations resonate at different frequencies (neurons respond to this placement)
- von Békésy (1960) - drilled holes in cadaver cochleae + presented frequencies to see where the BM vibrated - maximum vibration occurs at a different place for each frequency
Temporal code:
- BM vibrates at the frequency of the input, sending synchronised neural firing to the brain
- the rate of firing indicates the frequency (greater frequency = greater firing rate)
Plack (2013) - likely a combination of temporal + place coding
Hair cells + phase locking
- outer hair cells
- inner hair cells
- phase locking
Outer hair cells (~12,000) –> cochlear amplifiers - amplify input vibrations
- amplification is greatest for low input sound levels (<35dB); ~90dB is the maximum level at which it operates
- active process of amplification introduces frequencies not present in input (otoacoustic emissions)
- Kemp (1978) - otoacoustic emissions help identify whether the BM is working properly - stimulate with a pure tone and analyse the output
Inner hair cells (3500) –> fast transmission of sound information to brain - one hair cell to many ganglion cells (~20) mapping
- cochlear nerve tonotopically organised - each axon is most responsive to a characteristic frequency
- Types of axon:
  - high spontaneous rate - respond to quiet sounds (<40dB)
  - medium + low spontaneous rate - don't respond until 20dB + saturate at 80dB
- successive hair cells differ in frequency by 0.2%
Phase locking: ganglion cells fire at a fixed phase (peak amplitude) of the waveform - works only for low frequencies
- maximum firing rate of a single axon is ~1000Hz (one spike per cycle at 1kHz) - need a population response/coding to get up to 20kHz
- Wever + Bray (1930) - higher rates can be signalled if axon outputs are pooled (Wever's volley principle) - see the simulation below
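A toy simulation of the volley idea (assumes numpy; the axon count and stimulus frequency are illustrative):

```python
# Sketch: Wever's volley principle. Each axon fires well below 1000 spikes/s,
# but because every spike is phase-locked, the pooled train still carries the
# full stimulus periodicity.
import numpy as np

freq = 3000.0           # stimulus frequency (Hz) - beyond any single axon
n_axons = 5             # axons taking turns across cycles
cycles = np.arange(30)  # 30 stimulus cycles
period = 1.0 / freq

# Axon k fires on every n_axons-th cycle, offset by k cycles
spikes = np.sort(np.concatenate(
    [cycles[k::n_axons] * period for k in range(n_axons)]))

per_axon_rate = len(cycles[0::n_axons]) / (len(cycles) * period)
print(f"single-axon rate: {per_axon_rate:.0f} spikes/s")        # 600, under 1000
print(f"pooled interval: {np.diff(spikes).mean() * 1e6:.1f} us "
      f"(stimulus period {period * 1e6:.1f} us)")               # they match
```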
Pitch and loudness:
- pitch (place vs temporal)
- loudness (firing rate vs number)
Pitch = sounds organised on a musical scale from high to low
- for complex tones, pitch stays the same even if the harmonics differ OR the fundamental frequency alone is removed (the 'missing fundamental')
- related to frequency - high freq. will have high pitch
Place = related to place of max response on BM - represented neurally
BUT if pitch relied on place alone, you'd expect no performance change above 4kHz
- Moore (1973) - much worse at distinguishing pitch above 5kHz
- Attneave + Olson (1971) - frequencies can still be discriminated above 4kHz, but pitch can't be clearly perceived
Temporal - pitch related to time intervals between APs
BUT: can't work above ~4kHz because phase locking breaks down
Loudness = sounds ordered from a scale from quiet to loud
- related to amplitude - higher A = louder
Firing rate –> louder sounds = higher firing rate
Number of neurons –> louder sounds linked to more neurons
- the relative contributions of each may change with frequency
BM + nerve encoding reflects patterns of activity with both spatial and temporal features
Cochlea to brain
ganglion cells - [via auditory nerve] -> cochlear nucleus –> superior olivary complex (in the brain stem - integrates info from both ears) –> inferior colliculus (for localisation) –> medial geniculate nucleus –> primary auditory cortex
tonotopy is preserved
Localisation of sounds
- IID
- ITD
Inter-aural intensity differences (IID) - due to the shadow of the head (acoustic shadow), sound is louder at the ear nearer the source (intensity drops over distance)
Inter-aural timing differences (ITD) - the time interval between the sound reaching one ear and the other
- head reflects + diffracts sound –> high f = more easily reflected (greater shadowing); low f = more easily diffracted (shadowing effect not so big)
- both cues are greatest at 90 degrees azimuth (azimuth = position of the sound relative to the listener, measured from straight ahead) - i.e. when the source lies on the inter-aural axis (the line between the ears)
- Lord Rayleigh noticed ITDs - used timing + intensity to localise sounds, looking at the delay between a tuning fork sounding and its detection at each ear
- Low f: IID unreliable (little shadowing); high f (above ~1500Hz): ITD ambiguous - an aliasing problem (which peaks to match between ears?)
Rayleigh's duplex theory (1907): ITD for low f, IID for high f (ITD sketch below)
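A sketch of ITD as a function of azimuth using Woodworth's spherical-head approximation, ITD = (r/c)(θ + sin θ); the head radius and speed of sound are typical assumed values:

```python
# Sketch: ITD vs azimuth for a spherical head (Woodworth approximation).
import math

r = 0.0875  # head radius in metres (typical value)
c = 343.0   # speed of sound in air, m/s

def itd_us(azimuth_deg):
    """ITD in microseconds; azimuth measured from straight ahead."""
    theta = math.radians(azimuth_deg)
    return (r / c) * (theta + math.sin(theta)) * 1e6

for az in [0, 30, 60, 90]:
    print(f"azimuth {az:2d} deg -> ITD ~{itd_us(az):3.0f} us")
# zero straight ahead, maximal (~660 us) at 90 deg on the inter-aural axis
```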
Bregman (1990) - many things happening around us; need to keep track of multiple sound sources at once (auditory scene analysis)
- grouping with temporal neighbours, or grouping based on frequency range
- components sharing the same fundamental f are treated as one distinct sound
- grouped based on synchrony of onset/offset, same spatial location
How is the auditory system so sensitive to location of sound in space?
- Jeffress
Jeffress model (1948) –> neural transmission has a fixed speed
- signals from contralateral cochlear nucleus travel further
- a variety of axon lengths to meet incoming signal
- different paths equate to different timings between ears
- coincidence detector fires if concurrent inputs from L+R ears
anatomical structure of the medial superior olive is similar to the wiring diagram he suggested (toy simulation below)
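A toy Jeffress-style delay line (assumes numpy; the sample rate, tone frequency and candidate delays are illustrative):

```python
# Sketch: Jeffress coincidence detection. Each detector delays the left-ear
# signal by a different amount (a longer axon); the detector whose internal
# delay cancels the external ITD gets the most coincident input and wins.
import numpy as np

fs = 100_000                 # 10 us resolution
t = np.arange(2000) / fs
true_itd = 300e-6            # sound reaches the left ear 300 us earlier

left = np.sin(2 * np.pi * 500 * t)
right = np.sin(2 * np.pi * 500 * (t - true_itd))  # delayed copy at the right ear

best_delay, best_score = None, -np.inf
for delay_us in range(0, 700, 50):                # candidate internal delays
    shift = int(delay_us * 1e-6 * fs)
    score = np.sum(left[:len(left) - shift] * right[shift:])  # coincidences
    if score > best_score:
        best_delay, best_score = delay_us, score

print(f"winning detector: {best_delay} us (true ITD = 300 us)")
```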
How do we avoid confusion?
ITD + IID are identical everywhere on a 'cone of confusion' - can't tell if a sound is in front/behind or above/below
- Head movements - create disparity by shifting the ears' positions in space
- pinna cues, head shape + upper torso help –> head-related transfer function - complex filtering means sounds from different spatial locations have different spectral properties
- Pinna cues
- Batteau (1967) - normal recording vs recording made with microphones encased in casts of the outer ear - over headphones, the normal recording sounds inside the head; the pinna-cast recording sounds like it's coming from outside
- Gardner + Gardner (1973) - azimuth discrimination is worse if the pinna cavities are filled - removes the filtering effects
- interference effects created by the pinna mean sounds from different directions are modified in a unique way
- Hofman et al. (1998) - moulds worn in the ears for weeks - localisation initially perturbed, improves over time (adjustment), then back to normal when the moulds are removed
- cocktail party effect - noise associated with each ear is different, so can pick out individual voices based on location
auditory signals > visual signals
TEMPORAL (audition has better temporal resolution):
Sekuler et al. (1997) - two discs moving towards each other - with no sound they appear to cross diagonally (stream past); in the audiovisual condition (a click as they overlap) they appear to bounce (greatest effect when the sound is synchronous)
Shams et al. (2000) - sound-induced flash illusion –> with 2 beeps + 1 flash, observers perceive 2 flashes; with 3+ beeps + 1 flash they perceive more flashes, but the effect plateaus
visual signals > auditory signals
SPATIAL (vision has greater spatial resolution):
Ventriloquist effect - the voice sounds like it's coming from the dummy ('captured' by the seen mouth movements)
Cinemas (Altman, 1980) - dialogue sounds like it's coming from the actors on screen rather than the loudspeakers
Prior knowledge and multi sensory integration
Bayesian estimation:
- calculate unknown probability of event given known probabilities
- e.g. reliability of modality
- probability + prior knowledge
Perceptual judgements incorporate previously acquired knowledge of the statistical structure of the world (e.g. the light-from-above assumption) - a reliability-weighted combination sketch follows
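A minimal sketch of reliability-weighted (maximum-likelihood) cue combination, the standard Bayesian account of effects like ventriloquism; the variances below are illustrative, not measured values:

```python
# Sketch: optimal fusion of two noisy estimates under Gaussian assumptions.
# Each modality is weighted by its reliability (1/variance), so the more
# reliable cue dominates and the fused estimate is more precise than either.
def combine(est_a, var_a, est_b, var_b):
    w_a = (1 / var_a) / (1 / var_a + 1 / var_b)
    fused = w_a * est_a + (1 - w_a) * est_b
    fused_var = 1 / (1 / var_a + 1 / var_b)
    return fused, fused_var

# Ventriloquism-style example: vision is spatially precise, audition is not
visual_deg, visual_var = 0.0, 1.0   # dummy's mouth at 0 deg (low variance)
audio_deg, audio_var = 10.0, 25.0   # voice actually at 10 deg (high variance)

loc, var = combine(visual_deg, visual_var, audio_deg, audio_var)
print(f"perceived location ~{loc:.1f} deg, variance {var:.2f}")
# ~0.4 deg: the reliable visual cue 'captures' the sound, as in the
# ventriloquist effect
```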