hearing, music and speech Flashcards
onset discrimination
Listeners can detect very brief differences in timing between the two ears. At the best frequencies (around 1000 Hz), some listeners can detect differences as small as 10 μs.
medial superior olive (MSO)
A relay station in the brainstem where inputs
from both ears contribute to detection of the interaural time difference.
first place in the auditory system where input from both ears converges
firing rates of neurons here increase in response to very brief time differences between inputs from the two ears of cats
The properties of the ILD relevant for auditory localization are similar to those of the ITD:
sounds are more intense at the ear that is closer to the sound source, and less intense at the ear farther away from the source
the ILD is largest at 90 and -90 degrees. It is nonexistent at 0 degrees (directly in front) and 180 degrees (directly behind)
between these two extremes, the ILD correlates with the angle of the sound source, but because of the irregular shape of the head, the correlation is less precise than it is with ITDs
spatial hearing and blindness
many studies have shown that severe loss of vision can result in improved localization of sounds in space
region of the visual cortex is recruited to process auditory inputs when visual inputs are no longer available
attack
the part of a sound during which amplitude increases (onset)
the way a complex sound begins
decay
the part of a sound during which amplitude decreases (offset)
the way a complex sound ends
auditory scene analysis
processing an auditory scene consisting of multiple sound sources into separate sound images
good continuation
Gestalt grouping rule stating that sounds will tend to group together as continuous if they seem to share a common path, similar to a shared contour for vision.
acoustic startle reflex
The very rapid motor response to a sudden sound. Very few neurons are involved in the basic startle reflex, which can also be affected by emotional state.
rapid body movement following an abrupt sound - very fast
inattentional deafness
The failure to notice a fully audible but unexpected sound because attention was engaged on another auditory stream.
chord
A combination of three or more musical notes with different pitches played simultaneously
absolute pitch
perfect pitch
a rare ability whereby some people are able to accurately name or produce notes without comparison to other notes
melody
A sequence of notes or chords perceived as a single coherent structure.
tempo
The perceived speed of the presentation of sounds.
syncopation
Any deviation from a regular rhythm.
rhythm
A repeated pattern of sounds comprised of strong and weak elements.
vocal folds
The pair of elastic tissues that vibrate due to airflow generated by the lungs, depending on how close together or far apart and how tense or lax they are
phonation
The process through which vocal folds are made to vibrate when air pushes out of the lungs
respiration and phonation
to initiate a speech sound, air must be pushed out of the lungs, through the trachea and up to the larynx.
the diaphragm flexes to draw air into the lungs, and elastic recoil forces air back out
at the larynx, air must pass through the two vocal folds, which are made up of muscle tissue that can be adjusted to vary how freely air passes through the opening between them
three basic components of speech
respiration (lungs)
phonation (vocal folds)
articulation (vocal tract)
vocal tract
the area above the larynx
the oral tract and nasal tract combined
articulation
The act or manner of producing a speech sound using the articulators—vocal tract structures including the mouth, tongue, soft palate, and jaw
manipulation of mouth structures
resonator
Most objects, such as musical instruments and vocal tracts, are resonators because, due to their shape, they increase amplitude at some frequencies, called resonant frequencies, compared to other frequencies.
changing the size and shape of the space through which sound passes increases and decreases energy at different frequencies
formant
A resonance of the vocal tract. Formants are specified by their center frequency and are denoted by integers that increase with relative frequency.
labeled by number, from lowest frequency to highest
categorical perception
For speech, as well as other complex sounds and images, the phenomenon by which the discrimination of items is little better than the ability to label items.
Loudness
the psychological aspect of sound related to perceived intensity or amplitude
Pitch
the psychological aspect of sound related mainly to the fundamental frequency
Timbre
the psychological sensation by which a listener can judge that two sounds with the same loudness and pitch are dissimilar
Conveyed by harmonics and other high frequencies
Localization
knowing where the sound source is
Duration
length of time the stimulus is presented
Density
hollow vs solid sound - if it has an echo
Dissonance
how well/poorly do the notes go together
Or/piano & kid running around
some depend on history and our experiences
Lower frequencies
Farther from the oval window
Displace basilar membrane at the apex of the cochlea
Higher frequencies
Displace basilar membrane at the base of the cochlea
Closer to the oval window as frequency increases
Place code (place principle)
the frequency of a sound is coded by the place along the cochlear partition that has the greatest mechanical displacement
High-frequency sounds produce their greatest displacement closer to the base
Temporal code (frequency principle)
the frequency of a sound is coded by the timing of neural firing as it relates to the period of the sound
Problem: every neuron has a refractory period (an action potential takes time to complete), so a single neuron cannot fire on every cycle above roughly 1000 Hz
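The refractory-period limit on the temporal code can be illustrated with a little arithmetic. This is a minimal sketch, assuming an absolute refractory period of about 1 ms (a round figure chosen to match the 1000 Hz limit above, not a measured value); the function names are hypothetical.

```python
def max_firing_rate_hz(refractory_period_s: float) -> float:
    """Upper bound on a single neuron's firing rate: one spike per refractory period."""
    if refractory_period_s <= 0:
        raise ValueError("refractory period must be positive")
    return 1.0 / refractory_period_s


def can_fire_every_cycle(frequency_hz: float, refractory_period_s: float = 0.001) -> bool:
    """True if one neuron could, in principle, spike on every cycle of the tone.

    The 1 ms default is an assumption made for illustration only.
    """
    return frequency_hz <= max_firing_rate_hz(refractory_period_s)


if __name__ == "__main__":
    for tone_hz in (250, 500, 4000):
        print(tone_hz, "Hz:", can_fire_every_cycle(tone_hz))
```

Under this assumption, a 500 Hz tone is within reach of a single neuron, while a 4000 Hz tone is not.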
Harmonic spectrum
the spectrum of a complex sound in which energy is at integer multiples of the fundamental frequency
Fundamental frequency
lowest frequency of harmonic spectrum
Also, the greatest common divisor of the component frequencies
Perceived pitch is determined by this
What happens when the first harmonic is missing?
The pitch listeners hear will still correspond to the fundamental frequency
Missing-fundamental effect
perceived pitch corresponds to the fundamental frequency, even if it is missing
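The "greatest common divisor" description of the fundamental can be checked directly. The sketch below is only an illustration under simplifying assumptions: component frequencies are whole numbers of Hz, the example values are made up, and the function name is hypothetical. It is not a model of how the auditory system computes pitch.

```python
from functools import reduce
from math import gcd


def implied_fundamental_hz(component_frequencies_hz):
    """Greatest common divisor of the component frequencies (whole-number Hz)."""
    if not component_frequencies_hz:
        raise ValueError("need at least one component frequency")
    return reduce(gcd, component_frequencies_hz)


if __name__ == "__main__":
    full_spectrum = [200, 400, 600, 800]     # first harmonic present
    missing_fundamental = [400, 600, 800]    # first harmonic removed
    print(implied_fundamental_hz(full_spectrum))
    print(implied_fundamental_hz(missing_fundamental))
```

Both spectra imply the same 200 Hz fundamental, which matches the card: removing the first harmonic does not change the pitch listeners report.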
Azimuth
the angle of a sound source on the horizon relative to a point in the center of the head between the ears; measured in degrees
Sound localization
Two ears: critical for determining auditory locations
That sound is closer to one ear than another
Sounds arrive slightly sooner at the ear closer to the source
Interaural time differences (ITDs)
Interaural level differences (ILDs)
Use a combination of cues
ITDs: < 1600 Hz
ILDs: > 800 Hz
Interaural time differences (ITD)
the difference in time between a sound arriving at one ear versus the other
Know location because of which ear it arrives at first
Interaural time differences
Sound travels quickly
The time it takes for a sound to reach each ear depends on where it comes from in space
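How the ITD grows with azimuth can be sketched with a simple geometric model. The code below assumes a spherical head and a distant source (Woodworth's approximation), a head radius of 0.0875 m, and a speed of sound of 343 m/s. None of these values come from the cards, and the function name is hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0   # assumed: dry air at about 20 degrees C
HEAD_RADIUS_M = 0.0875       # assumed: typical adult head radius


def itd_seconds(azimuth_deg: float) -> float:
    """Approximate ITD for a distant source; positive azimuth means source on the right.

    Uses the spherical-head approximation ITD = (r / c) * (theta + sin(theta)).
    """
    # Wrap the angle into [-180, 180).
    az = (azimuth_deg + 180.0) % 360.0 - 180.0
    # A source behind the head gives the same ITD as its front mirror image.
    if az > 90.0:
        az = 180.0 - az
    elif az < -90.0:
        az = -180.0 - az
    theta = math.radians(abs(az))
    itd = HEAD_RADIUS_M / SPEED_OF_SOUND_M_S * (theta + math.sin(theta))
    return math.copysign(itd, az)


if __name__ == "__main__":
    for azimuth in (0, 30, 90, 150, 180):
        print(azimuth, "deg:", round(itd_seconds(azimuth) * 1e6), "microseconds")
```

In this model the ITD is zero straight ahead and directly behind, largest at 90 degrees, and identical for 30 and 150 degrees, which is the front-back ambiguity behind the cone of confusion.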
Interaural level differences (ILD)
the difference in level (intensity) between a sound arriving at one ear versus the other
sounds more intense at the ear closer to the source because the head partially blocks the sound pressure wave from reaching the opposite ear
Different positions around the head
Cone of confusion
A region of positions in space where all sounds produce the same ITDs and ILDs
Happens when sound comes from directly in front or behind
Same time and level intensity differences
Brain may struggle to determine the actual location of a sound
Perceptual phenomenon
Head movement in sound localization
ITDs and ILDs, which leave cones of confusion, are not the only cues for locating sound sources
Head movement: only one spatial location will be consistent with the ITDs and ILDs before and after the movement
Spectral cues in sound localization
Directional transfer function (DTF):
A measure that describes how the pinna, ear canal, head, and torso change the intensity of sounds with different frequencies that arrive at each ear from different locations in space (azimuth and elevation)
Pinna have a complex shape and can funnel certain sound frequencies more efficiently than others
Intensities of frequencies vary because of direction of sound
Size and shape of one’s body can impact which frequencies reach the ear more quickly
A measure that describes how the pinna, ear canal, head, and torso change the intensity of sounds with different frequencies that arrive at each ear from different points in space
Example: live music sounds different from music heard through headphones, because earbuds bypass the pinna
Auditory Distance Perception
We are better at judging the direction of a sound than how far away it is
relative intensity of the sound
spectral composition of sounds
relative amounts of direct vs. reverberant energy
ITDs, ILDs, and DTFs do NOT provide much information about distance when sound is >1m away
relative intensity of the sound
If there are two identical sound sources, we can judge which is closer from their relative intensity, but this requires us to make assumptions
Bullfrog: louder = closer (an assumption)
Inverse square law
As distance from a source increases, intensity decreases faster than the distance grows
Intensity is proportional to the inverse of the squared distance
The effectiveness of relative intensity decreases as the distance increases
Sounds farther away do not seem to change direction relative to the listener as much as nearer sounds do (like motion parallax)
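The inverse-square relationship, and why relative intensity becomes a weaker cue at larger distances, can be worked through numerically. This is a minimal sketch assuming an idealized point source in a free field with no reflections or absorption; the function names and example distances are hypothetical.

```python
import math


def relative_intensity(distance_m: float, reference_distance_m: float = 1.0) -> float:
    """Intensity at distance_m as a fraction of the intensity at reference_distance_m."""
    if distance_m <= 0 or reference_distance_m <= 0:
        raise ValueError("distances must be positive")
    return (reference_distance_m / distance_m) ** 2


def level_change_db(distance_m: float, reference_distance_m: float = 1.0) -> float:
    """Change in level (dB) when moving from reference_distance_m to distance_m."""
    return 10.0 * math.log10(relative_intensity(distance_m, reference_distance_m))


if __name__ == "__main__":
    # The same 1 m step matters far more close to the source than far from it.
    print("1 m -> 2 m:", round(level_change_db(2.0, 1.0), 2), "dB")
    print("10 m -> 11 m:", round(level_change_db(11.0, 10.0), 2), "dB")
```

Doubling the distance cuts intensity to a quarter (a drop of about 6 dB), while stepping from 10 m to 11 m changes the level by less than 1 dB, so the cue carries little information for distant sources.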
spectral composition of sounds
Higher frequencies decrease in energy more than lower frequencies with distance (atmospheric absorption, objects), so we judge the signal as coming from farther away
The sound-absorbing qualities of air dampen high frequencies more than low frequencies
So when sound sources are farther away, higher frequencies decrease in energy more than low frequencies
The farther away the sound source, the “muddier” it sounds
Change noticeable only for large distances
Relative amounts of direct vs reverberant energy
Sounds are some combination of this
direct vs reverberant energy
The relative amount of reverberant energy increases with distance (because direct energy decreases with distance)
We are poor at knowing how far a sound is
direct energy
arrives directly from the source; when a sound is closer, most of the energy is direct
reverberant energy
has bounced off surfaces in the environment; provides a greater proportion of the total when the source is farther away
Tone height
a sound quality corresponding to the level of pitch
Monotonically related to frequency
Tone chroma
a sound quality shared by tones that have the same octave interval- related to the octave
Each note on the musical scale (A-G) has a different chroma
Octave
the interval between two sound frequencies having a ratio of 2:1
When one of two periodic sounds is double the frequency of the other, the two sounds are one octave apart
Example (fundamental frequencies):
C3 = 130.8 Hz
C4 = 261.6 Hz (C3 x 2, one octave up)
C5 = 523.2 Hz (C4 x 2, one octave up)
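The 2:1 octave relationship in the example can be expressed as a short calculation. This sketch uses the C3, C4, and C5 frequencies given above; the function names and the tolerance used to decide "same chroma" are assumptions made for illustration.

```python
import math


def shift_octaves(frequency_hz: float, octaves: float) -> float:
    """Frequency reached by moving a number of octaves up (or down, if negative)."""
    return frequency_hz * 2.0 ** octaves


def octaves_apart(low_hz: float, high_hz: float) -> float:
    """Number of octaves between two frequencies (log base 2 of their ratio)."""
    return math.log2(high_hz / low_hz)


def same_chroma(f1_hz: float, f2_hz: float, tolerance: float = 1e-6) -> bool:
    """True if the two tones are a whole number of octaves apart."""
    distance = octaves_apart(f1_hz, f2_hz)
    return abs(distance - round(distance)) < tolerance


if __name__ == "__main__":
    c3 = 130.8
    print("C4:", shift_octaves(c3, 1))
    print("C5:", shift_octaves(c3, 2))
    print("C3 and C5 share chroma:", same_chroma(c3, 523.2))
```

Tone height rises steadily with frequency, while chroma repeats each time the frequency doubles, which is why C3, C4, and C5 sound like "the same note" at different heights.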
Music and Cultural Differences
Some relationships between notes, such as octaves, are universal
Musical scales vary widely across cultures
Different numbers of notes within an octave (e.g., 7 vs 5)
Listeners' estimates of intervals between notes correspond to the musical scale of their culture
Six-month-old infants detect inappropriate notes in both scales but US adults only detect deviations from the Western scale
Source segregation (auditory scene analysis)
processing an auditory scene consisting of multiple sound sources into separate sound images
Spatial sound source
Sounds that came from the same area are treated like they are coming from the same source
If sounds are moving in space, they can be easier to separate (the same holds if the listener moves)
Temporal sound source
Sound components beginning at the same time are treated as coming from the same source
Helps group harmonics into a single complex sound
Onset sound source (common fate)
Gestalt grouping rule stating that the tendency of sounds to group together will increase if they begin and/or end at the same time
Auditory stream segregation
the perceptual organization of a complex acoustic signal into separate auditory events for which each stream is heard as a separate event
Dividing the auditory world into separate auditory objects
Challenge with competing sound in the environment
Spectrogram
a pattern for sound analysis that provides a three-dimensional display plotting time on the horizontal axis, frequency on the vertical axis, and intensity in colour or gray scale
How do humans recognize speech?
Variability problem: the same speech sound may not be acoustically the same the next time we hear it
Speech is the basis of human language- connects us with other people
Lower pharynx makes it easier for humans to choke on food
Acoustic-phonetic invariance
There must be some constant set of acoustic features associated with each perceived phoneme- depends on prior phonemes
NO: there is a lack of invariance
Varies with context
We perceive sounds based on relative changes in the spectrum (spectral contrast)
Phoneme- smallest unit of speech, allows us to tell the difference between different words
Coarticulation
the influence of one phoneme on the acoustic properties of another, due to the articulatory movements required to produce them in sequence
The overlap of articulation in space and time
Speech sounds are most often described in terms of articulation
Speech production is very fast: 10-15 consonants and vowels per second
Inertia prevents tongue, lips, jaw, etc. from moving too fast
Experienced talkers position tongue, etc. in anticipation of next consonant or vowel, causing coarticulation
spreads out vowel and consonant information to aid understanding
Some other sources of variation
Noise
Sloppy pronunciation
“Did you go to the store?” > “Dijoo…?”
Accents
Gender (male: 80-200 Hz, female: 150-320 Hz)
Speed: the rate at which we talk
Motor theory of speech perception
motor processes used to produce speech sounds are used in reverse to understand the acoustic speech signal
Evidence in favour of the motor theory
the McGurk effect: auditory and visual information can differ and interfere (da, ga, ba)
Evidence against the motor theory
Speech production is just as complex, so appealing to production to understand perception doesn’t help much
Nonhuman animals can respond to human speech signals, despite not being able to produce them
Categorical Perception
Manipulate sound stimuli to vary continuously from ‘bah’ to ‘dah’ to ‘gah’
Instead of perceiving gradual, continuous changes, people perceive sharp categorical boundaries between the stimuli
Listeners report hearing differences between sounds only when those differences would result in different labels for the sounds
Multiple Acoustic Cues
We do NOT need individual acoustic invariants to distinguish speech sounds
Integrate multiple sources of information to recognize patterns
E.g., visual, onset, duration, frequency, etc.
Experience
Meaning and Phoneme Perception
What do you hear?
Phonemic restoration effect
“There was time to *ave…”: the missing sound is filled in to be consistent with the context
rave? save? wave? shave?
How do humans recognize speech
Variability problem
Segmentation problem
How do we segment sound into words?
Look for breaks in sound stimulus?
Example in Czech: “Tak by jste nevedeli kde jedno slovo konci a druhe slovo zacina” (“So you would not know where one word ends and the next word begins”)
Phoneme co-occurrence
Meaning and Segmentation
What do you hear?
An American delights in simple play things
Anna Mary candy lights since imp pulp lay things
I’m blue song…
Backwards records
Context matters
Relative differences matter, not absolutes
Meaning, expectation, experience are powerful
Development of speech perception
Begins in utero
Newborns prefer hearing their mother’s voice
over other women’s voices
Four-day-old French babies prefer hearing French over Russian
Newborns prefer hearing children’s stories that were read aloud by their mothers during their third trimester of pregnancy
Development of speech perception
Infants
Begin filtering out irrelevant acoustics
Example: ‘r’ and ‘l’ are not distinguished in Japanese