Speech Perception Flashcards
Speech
Complex acoustic stimulus used by most humans
Often essential for language and language development
Understanding of speech perception requires knowledge of:
Speech production
Language
Auditory system
Speech perception is a multifaceted and complicated topic
Cooper, Liberman & Borst (1951)
Discovered that a two-formant pattern with proper F1 and F2 transitions elicited perception of stop-vowel syllables even without inclusion of a stop-burst in the signal
The /ba/-/da/-/ga/ Experiment
Cooper et al. (1951) asked: What happens to listeners’ perception when the starting frequency of F2 is changed in small and systematic steps over a large range of frequencies?
They created a continuum of stimuli to investigate this question
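A minimal sketch of how such a continuum can be laid out, assuming Python with NumPy. Only the F2 starting frequency changes from stimulus to stimulus; the onset, steady-state, and duration values below are invented for illustration and are not the original Haskins synthesis parameters.

```python
import numpy as np

# Continuum sketch: only the F2 onset changes across stimuli (illustrative values).
N_STEPS = 14            # number of stimuli along the continuum
F2_ONSET_LO = 900.0     # Hz, a /b/-like (rising) F2 onset -- assumed value
F2_ONSET_HI = 2700.0    # Hz, a /g/-like (falling) F2 onset -- assumed value
F2_STEADY = 1200.0      # Hz, F2 steady state for the vowel /a/ -- assumed value
TRANSITION_MS = 50      # duration of the formant transition
VOWEL_MS = 250          # duration of the steady-state vowel

f2_onsets = np.linspace(F2_ONSET_LO, F2_ONSET_HI, N_STEPS)

def f2_track(onset_hz, frames_per_s=100):
    """Piecewise-linear F2 track (Hz), one value per frame: a linear transition
    from the onset to the vowel steady state, then the steady state itself."""
    n_trans = int(TRANSITION_MS * frames_per_s / 1000)
    n_vowel = int(VOWEL_MS * frames_per_s / 1000)
    return np.concatenate([np.linspace(onset_hz, F2_STEADY, n_trans),
                           np.full(n_vowel, F2_STEADY)])

for i, onset in enumerate(f2_onsets, start=1):
    print(f"stimulus {i:2d}: F2 onset = {onset:6.0f} Hz, {len(f2_track(onset))} frames")
```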
Categorical Perception
Relatively continuous variation of the physical stimulus—the starting frequency of the F2 transition—did not result in a continuous change in the perceptual response.
Place of articulation seemed to be perceived categorically: a series of adjacent stimuli yielded one response, followed by a sudden change in the response pattern at the next step along the continuum
When the labeling functions for two adjacent phonemes (like /b/ and /d/, or /d/ and /g/) changed suddenly, they crossed at a point where 50% of the responses were for one label, and 50% for the adjacent label
The 50% point was called the phoneme boundary and indicated the stimulus defining the categorical distinction between the two sounds.
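In practice, the phoneme boundary can be estimated by fitting a smooth labeling function to the response proportions and solving for the 50% crossover. A minimal sketch, assuming NumPy and SciPy are available; the response proportions are invented, not data from the original study.

```python
import numpy as np
from scipy.optimize import curve_fit

# Invented labeling data for a two-category stretch of the continuum (e.g.,
# /b/ vs. /d/): proportion of /d/ responses at each of 7 stimulus steps.
steps = np.arange(1, 8)
prop_d = np.array([0.02, 0.04, 0.05, 0.10, 0.85, 0.96, 0.98])

def logistic(x, boundary, slope):
    """Logistic labeling function; 'boundary' is the stimulus value at which
    the predicted proportion of /d/ responses is exactly 0.50."""
    return 1.0 / (1.0 + np.exp(-slope * (x - boundary)))

(boundary, slope), _ = curve_fit(logistic, steps, prop_d, p0=[4.0, 1.0])
print(f"estimated phoneme boundary: step {boundary:.2f} (slope {slope:.2f})")
```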
Categorical perception is demonstrated when continuous variation in a physical stimulus is perceived in a discontinuous (i.e., categorical) way.
The study of psychological reactions to variations in physical stimuli is called psychophysics
Categorical perception is an example of a psychophysical phenomenon
Labeling vs Discrimination
A discrimination experiment was required to verify the categorical perception of stop consonant place of articulation.
The earlier categorical perception functions may have been due to the listeners’ restriction to just three response categories. For example, listeners were not permitted to respond, “This stimulus sounds as if it is midway between a /b/ and a /d/ (or between a /d/ and a /g/)”
When listeners were asked whether two stimuli were the same or different, they said “same” for stimuli chosen from within a category and “different” for stimuli chosen from adjacent categories
Categorical labeling functions were confirmed by the discrimination experiment.
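One simple way to formalize the link between labeling and discrimination: assume listeners covertly label each stimulus and respond “different” only when the two covert labels differ. Under that assumption, discrimination performance is predictable from the labeling function alone. The sketch below uses invented labeling probabilities and is an illustration of the logic, not the original analysis.

```python
import numpy as np

# p[i] is an invented probability of labeling stimulus i as /d/ (rather than /b/)
# along a 7-step continuum.
p = np.array([0.02, 0.04, 0.05, 0.10, 0.85, 0.96, 0.98])

def p_different(p_i, p_j):
    """Probability that two independently labeled stimuli get different labels:
    (i labeled /d/ and j labeled /b/) or the reverse."""
    return p_i * (1 - p_j) + p_j * (1 - p_i)

for i in range(len(p) - 1):
    print(f"pair {i + 1}-{i + 2}: predicted P('different') = {p_different(p[i], p[i + 1]):.2f}")
# The prediction is low for within-category pairs and peaks for the pair that
# straddles the phoneme boundary, matching the categorical pattern.
```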
Liberman, Cooper, Shankweiler, and Studdert-Kennedy (1967) pointed to categorical perception as a cornerstone of the motor theory of speech perception.
Listeners do not hear the continuous changes in F2 starting frequency, at least until a category boundary is reached, because they cannot produce continuous changes in place of articulation.
Places of articulation for stops are essentially categorical, allowing no “in-between” articulatory placements.
Motor Theory of Speech Perception
Built on the idea that speech perception is constrained by speech production
Categorical production of a speech feature, such as place of articulation for stops, limits speech perception to the same categories. Detection of acoustic differences within categories is therefore not possible.
Liberman et al.’s (1967) focus on the role of speech production in speech perception extended beyond the demonstration of categorical perception
Regarded the lack of acoustic invariance for a given stop consonant as a problem for a theory of speech perception in which listeners based their phonetic decisions on information in the acoustic signal.
Instead, the constant factor in speech perception, at least for stop consonants, was thought to be the articulatory characteristics of a stop consonant.
Liberman et al. (1967) argued for a species-specific mechanism in the brain of humans—a specialized and dedicated module for the perception of speech.
An important component of this claim was the “match” between the capabilities of the speech production and speech perception mechanisms.
The match was proposed as an evolutionary, encoded form of communication.
The encoding is on the speech production side of communication; the decoding is provided by the special perceptual mechanism in the brain of humans.
Motor Theory Primary Claims
Speech perception is a species-specific human endowment
The acoustic signal associated with a given sound is far too variable to be useful for speech perception, but the underlying articulatory behavior is not; hence the claim that speech is perceived by reference to articulation.
Speech Perception is Species Specific
The ability to speak and form millions of novel sentences is exclusive to humans.
By extension, speech perception, thought of as a capability “matched” to speech production, is regarded by many scientists as an exclusively human capability.
There is evidence in monkeys, bats, and birds (and other animals) of perceptual mechanisms matched to the specific vocalizations produced by each of these animals
Categorical Perception in Infants
Demonstration of categorical perception in infants as young as 1 month
Taken as evidence that the mechanism is innate and hence as strong support for the motor theory of speech perception
The infant categorical perception functions were very much like those obtained from adult listeners, even though infants do not produce speech.
Data were obtained using the high-amplitude sucking paradigm
Possible Falsification of the Motor Theory
Kuhl et al. and others demonstrated categorical perception for voice onset time (VOT) and stop place of articulation in chinchillas and monkeys, respectively.
If categorical perception is the result of a special linkage between human speech production and perception, as claimed by Liberman et al. (1967), the finding of categorical speech perception in animals could be considered a falsification of the linkage specifically, and the motor theory in general
Duplex Perception
Phenomenon in which the speech module and general auditory mechanisms seem to be activated simultaneously by one signal
If the F3 transition portion is edited out of the schematic three-formant pattern and played to listeners, the brief signal (~50 ms in duration) sounds something like a bird chirp or whistle glide.
People do not hear these isolated transitions as phonetic events; they hear “chirps,” quick frequency glides (glissandi, in musical terms).
Listeners hear the three-formant pattern as either /g/ or /d/, but when that brief, apparently critical F3 transition is isolated from the spectrographic pattern and played to listeners, they hear something with absolutely no phonetic quality.
Duplex Perception–Same Ear Experiments
Whalen and Liberman (1987) discovered that a duplex perception was obtainable when the base and isolated F3 transition were delivered to the same ear, provided the isolated F3 transition was increased in intensity relative to the base.
When the “chirp” intensity was relatively low in comparison with the “base,” listeners heard a good /dɑ/ or /gɑ/ depending on which F3 transition was used. As the F3 “chirp” was increased in intensity, a threshold was reached at which listeners heard both a good /dɑ/ or /gɑ/ plus a “chirp.”
Fowler and Rosenblum repeated this experiment using a recording of a slamming metal door, split into a “base” and a “chirp”
Relatively low “chirp” intensities in combination with the “base” produced a percept of a slamming metal door. As the “chirp” intensity was raised, a threshold was reached at which listeners heard the slamming metal door plus the shaking can of rice/tambourine/jangling keys.
Fowler and Rosenblum thus evoked a duplex percept exactly parallel to the one described above for /dɑ/ and /gɑ/, except in this case for nonspeech sounds.
Acoustic Invariance & Theories of Speech Perception
The lack of acoustic invariance for speech sounds was an important catalyst for the development of the motor theory of speech perception.
Blumstein and Stevens (1979) performed an analysis of stop-burst acoustics that led them to reject this central claim of the motor theorists.
Liberman and Mattingly (1985) identified complications with “auditory theories of speech perception” which claim that information in the speech acoustic signal is sufficient, and sufficiently consistent, to support speech perception.
These theories regard the auditory mechanisms for speech perception to be the same as mechanisms for the perception of any acoustic signal
Acoustic Invariance & Theories of Speech Perception Continued
Liberman and Mattingly pointed to “extraphonetic” factors (e.g., sex, age, speaking rate) that cause variation in the acoustic characteristics of speech sounds
An auditory theory of speech perception requires one of the following:
listeners must learn and store all these different formant patterns OR
employ some sort of cognitive process to place all formant patterns on a single, “master” scale.
The “speaker (talker) normalization” problem is the question of how one hears the same vowel (or consonant) when so many different-sized vocal tracts produce it with different formant frequencies
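One concrete way to picture the “single master scale” option mentioned above is a talker-normalization scheme such as Lobanov-style z-scoring of formants. The sketch below uses invented F1/F2 values and illustrates the auditory-theory side of the debate only; it is not the motor theory’s solution, which is described in the next card.

```python
import numpy as np

# Hypothetical F1/F2 values (Hz) for three vowels produced by three talkers with
# different vocal-tract sizes; the numbers are invented for illustration only.
talkers = {
    "man":   {"i": (300, 2300), "a": (700, 1200), "u": (320, 900)},
    "woman": {"i": (350, 2700), "a": (850, 1400), "u": (380, 1000)},
    "child": {"i": (420, 3200), "a": (1000, 1700), "u": (450, 1200)},
}

def lobanov_normalize(vowels):
    """Z-score each formant within a talker (a Lobanov-style normalization), so
    formant values are expressed relative to that talker's own mean and spread."""
    arr = np.array(list(vowels.values()), dtype=float)   # rows: vowels; cols: F1, F2
    z = (arr - arr.mean(axis=0)) / arr.std(axis=0)
    return {v: (round(float(z[i, 0]), 2), round(float(z[i, 1]), 2))
            for i, v in enumerate(vowels)}

for name, vowels in talkers.items():
    print(name, lobanov_normalize(vowels))
# After normalization, each talker's /i/, /a/, and /u/ land in nearly the same
# region of the normalized space even though the raw frequencies differ widely.
```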
How the Motor Theory Addresses the Speaker Normalization Problem
Argues that the perception of different formant transition patterns is mediated by a special mechanism that extracts intended articulatory gestures and “outputs” these gestures as the percepts.
For example, the motor theory assumes that the intended gestures for the vowel in a given word are roughly equivalent for men, women, and children, even if the outputs of their different-sized vocal tracts are different.
The special speech perception module registers the same intended gesture for all three speakers, and hence the same vowel perception (or the same consonant perception).
Another Issue with Auditory Theories
For any given sound, there are at least several different acoustic cues that can contribute to the proper identification of the sound.
Liberman and Mattingly (1985) pointed out that none of these individual values are necessarily critical to the proper identification of a sound segment, but the collection of the several values may be.
Among these several cues, the acoustic value of one can be “offset” by the acoustic value of another to yield the same phonetic percept
Best, Morrongiello, and Robson (1981) Continued
When the length of the closure interval between the /s/ and /eɪ/ was rather short (~30–50 ms), it resulted in roughly equal numbers of “say” and “stay” responses
When the F1 starting frequency was the higher one, a longer closure interval was required for listeners to hear “stay.”
When the F1 starting frequency was the lower one, a shorter closure interval allowed the listeners to hear “stay.”
The two cues to the presence of a /t/ between the /s/ and /eɪ/ seemed to “trade off” against each other to produce the same percept—a clear /t/ between the fricative and the following vowel.
“Trading relations” is the term used for any set of speech cues that can be manipulated in opposite directions to yield a constant phonetic percept.
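A toy decision rule makes the trading idea concrete: treat the percept as depending on a weighted combination of the two cues, so that a change in one cue can be offset by an opposite change in the other. The threshold and trade-off values below are invented for illustration and are not Best, Morrongiello, and Robson’s measurements.

```python
# Toy model of the "say"/"stay" trading relation; all numbers are invented.

def hears_stay(closure_ms, f1_onset_hz,
               base_threshold_ms=60.0, f1_tradeoff_ms_per_hz=0.1,
               reference_f1_hz=400.0):
    """Return True if the combined cue evidence favors a /t/ percept ("stay").
    A lower F1 onset (a stronger stop cue) lowers the closure duration needed."""
    required_closure = base_threshold_ms + f1_tradeoff_ms_per_hz * (f1_onset_hz - reference_f1_hz)
    return closure_ms >= required_closure

# With the higher F1 onset, a 70-ms closure is not enough (needs ~80 ms) ...
print(hears_stay(closure_ms=70, f1_onset_hz=600))   # False
# ... but the same 70-ms closure suffices when the F1 onset is lower (~60 ms needed).
print(hears_stay(closure_ms=70, f1_onset_hz=400))   # True
```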
Sufficient Acoustic Invariance
Blumstein and Stevens (1979) demonstrated a fair degree of acoustic consistency for stop consonant place of articulation, and many automatic classification experiments imply consistency in the acoustic signal for vowels, diphthongs, nasals, fricatives, and semivowels (see the sketch after this card).
Lindblom (1990) argued for a more flexible view of speech acoustic variability that does not need absolute acoustic invariance for a speech sound, but only enough to maintain discriminability from neighboring sound classes.
Presumably, an initial front-end acoustic analysis of the speech signal by general auditory mechanisms is supplemented by higher-level processing which resolves any ambiguities in sound identity.
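The automatic classification idea can be illustrated with a toy nearest-centroid classifier in F1/F2 space: a token does not need to match any stored example exactly, only to lie closer to its own category than to neighboring categories, which is Lindblom’s “sufficient discriminability” in miniature. All formant values below are invented.

```python
import numpy as np

# Toy nearest-centroid vowel classifier in F1/F2 space (invented training tokens).
training = {
    "i": [(300, 2300), (350, 2700), (320, 2500)],
    "a": [(700, 1200), (850, 1400), (780, 1300)],
    "u": [(320, 900), (380, 1000), (350, 950)],
}
centroids = {v: np.mean(np.array(tokens, dtype=float), axis=0)
             for v, tokens in training.items()}

def classify(f1_hz, f2_hz):
    """Assign a new token to the vowel whose centroid is closest (Euclidean)."""
    token = np.array([f1_hz, f2_hz], dtype=float)
    return min(centroids, key=lambda v: np.linalg.norm(token - centroids[v]))

# The new token matches no stored example exactly, yet it is classified correctly
# because it is closer to the /a/ centroid than to /i/ or /u/.
print(classify(820, 1350))   # -> 'a'
```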
Bottom-up vs. Top-down processing