Speech Perception Flashcards

1
Q

Speech

A

Complex acoustic stimulus used by most humans
Often essential for language and language development
Understanding of speech perception requires knowledge of:
Speech production
Language
Auditory system
Speech perception is a multifaceted and complicated topic

2
Q

Cooper, Liberman & Borst (1951)

A

Cooper, Liberman & Borst (1951)
Discovered that a two-formant pattern with proper F1 and F2 transitions elicited perception of stop-vowel syllables even without inclusion of a stop-burst in the signal

3
Q

The /ba/-/da/-/ga/ Experiment

A

Cooper et al. (1951) asked: What happens to listeners’ perception when the starting frequency of F2 is changed in small and systematic steps over a large range of frequencies?
They created a continuum of stimuli to investigate this question

4
Q

Categorical Perception

A

Relatively continuous variation of the physical stimulus—the starting frequency of the F2 transition—did not result in a continuous change in the perceptual response.
Place of articulation seemed to be perceived categorically: a series of adjacent stimuli yielded one response, followed by a sudden change in response pattern at the next step along the continuum
When the labeling functions for two adjacent phonemes (like /b/ and /d/, or /d/ and /g/) changed suddenly, they crossed at a point where 50% of the responses were for one label, and 50% for the adjacent label
This 50% point was called the phoneme boundary and indicated the stimulus defining the categorical distinction between the two sounds.
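To make the idea concrete, here is a minimal sketch of how a phoneme boundary can be located from labeling data; the continuum steps and response percentages below are invented for illustration, not data from Cooper et al.:

```python
# Hypothetical labeling data for a /b/-/d/ continuum varying F2 onset:
# percent /b/ responses at each of eight continuum steps.
steps = [1, 2, 3, 4, 5, 6, 7, 8]
pct_b = [98, 97, 95, 90, 45, 8, 4, 2]  # abrupt shift between steps 4 and 5

def phoneme_boundary(steps, pct):
    """Linearly interpolate the continuum value where labeling crosses 50%."""
    for (x0, y0), (x1, y1) in zip(zip(steps, pct), zip(steps[1:], pct[1:])):
        if (y0 - 50) * (y1 - 50) <= 0:  # the 50% point lies in this interval
            return x0 + (50 - y0) * (x1 - x0) / (y1 - y0)
    return None

print(phoneme_boundary(steps, pct_b))  # ~4.89: the /b/-/d/ phoneme boundary
```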

5
Q

Categorical perception is demonstrated

A

Categorical perception is demonstrated when continuous variation in a physical stimulus is perceived in a discontinuous (i.e., categorical) way.
The study of psychological reactions to variations in physical stimuli is called psychophysics
Categorical perception is an example of a psychophysical phenomenon

6
Q

Labeling vs Discrimination

A

A discrimination experiment was required to verify the categorical perception of stop consonant place of articulation.
The previous categorical perception functions may have been due to the listeners’ restriction to just three response categories. For example, listeners were not permitted to respond, “This stimulus sounds as if it is midway between a /b/ and /d/ (or between a /d/ and /g/)”
When listeners were asked whether two stimuli were the same or different, they said “same” for stimuli chosen within a category and “different” for stimuli chosen from adjacent categories
Categorical labeling functions were confirmed by the discrimination experiment.
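One classical way to connect labeling and discrimination is to predict discrimination from labeling probabilities: if listeners can discriminate only stimuli that they label differently, predicted accuracy for a pair is approximately 0.5 + 0.5(p1 − p2)², where p1 and p2 are the probabilities of one label for the two stimuli. A sketch under that assumption (the probabilities are invented):

```python
def predicted_discrimination(p1, p2):
    """Predict pair discrimination from labeling probabilities, assuming
    listeners discriminate only via covert labels; 0.5 is chance."""
    return 0.5 + 0.5 * (p1 - p2) ** 2

# Within-category pair: both stimuli almost always labeled /b/ -> near chance.
print(predicted_discrimination(0.98, 0.95))  # ~0.500
# Cross-boundary pair: the label flips -> well above chance.
print(predicted_discrimination(0.90, 0.08))  # ~0.836
```

Categorical perception shows exactly this pattern: near-chance discrimination within categories and a sharp peak at the boundary.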

7
Q

Liberman, Cooper, Shankweiler, and Studdert-Kennedy (1967) pointed to

A

Liberman, Cooper, Shankweiler, and Studdert-Kennedy (1967) pointed to categorical perception as a cornerstone of the motor theory of speech perception.
Listeners do not hear the continuous changes in F2 starting frequency, at least until a category boundary is reached, because they cannot produce continuous changes in place of articulation.
Places of articulation for stops are essentially categorical, allowing no “in-between” articulatory placements.

8
Q

Motor Theory of Speech Perception

A

Built on the idea that speech perception is constrained by speech production
Categorical production of a speech feature, such as place of articulation for stops, limits speech perception to the same categories. Detection of acoustic differences within categories is therefore not possible.
Liberman et al.’s (1967) focus on the role of speech production in speech perception extended beyond the demonstration of categorical perception
Regarded the lack of acoustic invariance for a given stop consonant as a problem for a theory of speech perception in which listeners based their phonetic decisions on information in the acoustic signal.
Instead, the constant factor in speech perception, at least for stop consonants, was thought to be the articulatory characteristics of a stop consonant.

9
Q

Liberman et al. (1967) argued for

A

Liberman et al. (1967) argued for a species-specific mechanism in the brain of humans—a specialized and dedicated module for the perception of speech.
An important component of this claim was the “match” between the capabilities of the speech production and speech perception mechanisms.
The match was proposed as an evolutionary, encoded form of communication.
The encoding is on the speech production side of communication; the decoding is provided by the special perceptual mechanism in the brain of humans.

10
Q

Motor Theory Primary Claims

A

Speech perception is a species-specific human endowment
Speech acoustic signal associated with a given sound is far too variable to be useful for speech perception, but the underlying articulatory behavior is not, hence the claim that speech is perceived by reference to articulation.

11
Q

Speech Perception is Species Specific
The ability to speak and form millions of

A

The ability to speak and form millions of novel sentences is exclusive to humans.
By extension, speech perception, conceived as a capability “matched” to speech production, is regarded by many scientists as an exclusively human capability.
There is evidence in monkeys, bats, and birds (and other animals) of perceptual mechanisms matched to the specific vocalizations produced by each of these animals

12
Q

Categorical Perception in Infants

A

Demonstration of categorical perception in infants as young as 1 month
Taken as evidence that the mechanism is innate and hence as strong support for the motor theory of speech perception
The infant categorical perception functions were very much like those obtained from adult listeners, even though infants do not produce speech.
Data were obtained using the high-amplitude sucking paradigm

13
Q

Possible Falsification of the Motor Theory

A

Kuhl et al. and others demonstrated categorical perception for voice onset time (VOT) and stop place of articulation in chinchillas and monkeys, respectively.
If categorical perception is the result of a special linkage between human speech production and perception, as claimed by Liberman et al. (1967), the finding of categorical speech perception in animals could be considered a falsification of the linkage specifically, and the motor theory in general

14
Q

Duplex Perception

A

Phenomenon in which the speech module and general auditory mechanisms seem to be activated simultaneously by one signal
If the F3 transition portion is edited out from the schematic signal in the upper part of the figure and played to listeners, the brief signal (~50 ms in duration) sounds something like a bird chirp or whistle glide.
People hear these isolated transitions as “chirps,” quick frequency glides (glissandi, in musical terms); they are not heard as phonetic events.
Listeners hear the three-formant pattern as either /g/ or /d/, but when that brief, apparently critical F3 transition is isolated from the spectrographic pattern and played to listeners, they hear something with absolutely no phonetic quality.

15
Q

Duplex Perception–Same Ear Experiments

A

Whalen and Liberman (1987) discovered that a duplex perception was obtainable when the base and isolated F3 transition were delivered to the same ear, provided the isolated F3 transition was increased in intensity relative to the base.
When the “chirp” intensity was relatively low in comparison with the “base,” listeners heard a good /dɑ/ or /gɑ/ depending on which F3 transition was used. As the F3 “chirp” was increased in intensity, a threshold was reached at which listeners heard both a good /dɑ/ or /gɑ/ plus a “chirp.”
Fowler and Rosenblum repeated this experiment using the slamming metal door signal split into a base and chirp
Relatively low “chirp” intensities in combination with the “base” produced a percept of a slamming metal door. As the “chirp” intensity was raised, a threshold was reached at which listeners heard the slamming metal door plus the shaking can of rice/tambourine/jangling keys.
Fowler and Rosenblum thus evoked a duplex percept exactly parallel to the one described above for /dɑ/ and /gɑ/, except in this case for nonspeech sounds.

16
Q

Acoustic Invariance & Theories of Speech Perception
The lack of acoustic invariance for speech

A

The lack of acoustic invariance for speech sounds was an important catalyst for the development of the motor theory of speech perception.
Blumstein and Stevens (1979) performed an acoustic analysis of stop bursts that led them to reject this central claim of the motor theorists.
Liberman and Mattingly (1985) identified complications with “auditory theories of speech perception” which claim that information in the speech acoustic signal is sufficient, and sufficiently consistent, to support speech perception.
These theories regard the auditory mechanisms for speech perception to be the same as mechanisms for the perception of any acoustic signal

17
Q

Acoustic Invariance & Theories of Speech Perception
Liberman and Mattingly pointed to what

A

Liberman and Mattingly pointed to “extraphonetic” factors that cause variation in the acoustic characteristics of speech sounds (e.g., sex, age, speaking rate)
An auditory theory of speech perception requires one of the following:
listeners must learn and store all these different formant patterns OR
employ some sort of cognitive process to place all formant patterns on a single, “master” scale.
“Speaker (talker) normalization” problem: question of how one hears the same vowel (or consonant) when so many different-sized vocal tracts produce it with different formant frequencies

18
Q

How the Motor Theory Addresses the Speaker Normalization Problem

A

Argues that the perception of different formant transition patterns is mediated by a special mechanism that extracts intended articulatory gestures and “outputs” these gestures as the percepts.
For example, the motor theory assumes that the intended gestures for the vowel in a given word are roughly equivalent for men, women, and children, even if the outputs of their different-sized vocal tracts are different.
The special speech perception module registers the same intended gesture for all three speakers, and hence the same vowel perception (or the same consonant perception).

19
Q

Another Issue with Auditory Theories
For any given sound, there are at least

A

For any given sound, there are at least several different acoustic cues that can contribute to the proper identification of the sound.
Liberman and Mattingly (1985) pointed out that none of these individual values are necessarily critical to the proper identification of a sound segment, but the collection of the several values may be.
Among these several cues, the acoustic value of one can be “offset” by the acoustic value of another to yield the same phonetic percept

20
Q

Best, Morrongiello, and Robson (1981) Continued
When the length of the closure interval

A

When the length of the closure interval between the /s/ and /eɪ/ was rather short (~30–50 ms), it resulted in roughly equal “say” and “stay” responses
When the F1 starting frequency was the higher one, a longer closure interval was required for listeners to hear “stay.”
When the F1 starting frequency was the lower one, a shorter closure interval allowed the listeners to hear “stay.”
The two cues to the presence of a /t/ between the /s/ and /eɪ/ seemed to “trade off” against each other to produce the same percept—a clear /t/ between the fricative and the following vowel.
“Trading relations” is the term used for any set of speech cues that can be manipulated in opposite directions to yield a constant phonetic percept.
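A trading relation can be mimicked with a toy linear cue-combination rule. The weights, reference frequency, and criterion below are invented for illustration, not values from Best, Morrongiello, and Robson:

```python
def hears_stay(closure_ms, f1_onset_hz, w_f1=0.08, criterion=40.0):
    """Toy decision rule: a longer closure and a lower F1 onset both count as
    evidence for a /t/; "stay" is heard when the combined evidence exceeds a
    criterion. All parameter values are invented for illustration."""
    evidence = closure_ms - w_f1 * (f1_onset_hz - 200)
    return evidence > criterion

print(hears_stay(closure_ms=45, f1_onset_hz=230))  # True: low F1, short closure
print(hears_stay(closure_ms=45, f1_onset_hz=430))  # False: higher F1 onset
print(hears_stay(closure_ms=65, f1_onset_hz=430))  # True: longer closure compensates
```

Raising the F1 onset removes some evidence for the stop and lengthening the closure restores it: opposite manipulations, constant percept.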

21
Q

Sufficient Acoustic Invariance

A

Blumstein and Stevens (1979) demonstrated a fair degree of acoustic consistency for stop consonant place of articulation, and many automatic classification experiments imply consistency in the acoustic signal for vowels, diphthongs, nasals, fricatives, and semivowels.
Lindblom (1990) argued for a more flexible view of speech acoustic variability that does not need absolute acoustic invariance for a speech sound, but only enough to maintain discriminability from neighboring sound classes.
Presumably, an initial front-end acoustic analysis of the speech signal by general auditory mechanisms is supplemented by higher-level processing which resolves any ambiguities in sound identity.
Bottom-up vs. Top-down processing

22
Q

Bottom-up & Top-down Processing

A

The front-end analysis (bottom-up) is like a hypothesis concerning the identity of the sequence of incoming sounds, based on initial processing of the incoming acoustic signal.
The higher-level processes (top-down) include knowledge of the context in which each sound is produced, plus syntactic, semantic, and pragmatic constraints on the message.
Listeners bring more to the speech perception process than a capability for acoustic analysis.
These additional sources of knowledge considerably loosen the demand for strict acoustic invariance for each sound segment.

23
Q

Bottom-up & Top-down Processing
Top-down processes influence

A

Top-down processes influence the bottom-up analyses, taking advantage of the rich source of information in the auditory signal.
Stevens (2005) has proposed a speech perception model in which bottom-up auditory mechanisms analyze the incoming speech signal for segment identity and top-down processes resolve ambiguities emerging from this front-end analysis.
When an account of speech perception is framed within the general cognitive abilities of humans, including top-down processes, a role for general auditory analysis in the perception of speech becomes much more plausible (Lotto & Holt, 2006).
In this view, the lack of strict acoustic invariance for speech sounds cannot be used as an argument against a primary role of general auditory mechanisms in speech perception.

24
Q

Auditory Explanation of Perception of Place

A

Fruchter and Sussman (1997) explored the perceptual value of locus equations by varying the parameters in small steps and presenting the resulting linear functions to listeners for identification of /b/, /d/, and /g/
They found that the varying combinations of F2 onset and F2 target were not heard as continuous variations, but were clustered in the categories /b/, /d/, and /g/, consistent with the acoustic measurements that separated the three stops.
Sussman, Fruchter, Hilbert, and Sirosh (1998) explained that many animals have special neural mechanisms for connecting two acoustic events and using those connections to establish categories
They argued that these connections may be species-specific and matched to the vocalization characteristics of the different species
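A locus equation is a straight-line fit relating F2 at vowel onset to F2 at the vowel target, with a characteristic slope and intercept for each stop place. A toy classifier in that spirit, using invented slope and intercept values rather than Fruchter and Sussman’s fitted ones:

```python
# Illustrative (slope, intercept-in-Hz) pairs per stop; placeholder values.
LOCUS = {"b": (0.75, 300.0), "d": (0.45, 1200.0), "g": (1.10, -300.0)}

def classify_stop(f2_onset_hz, f2_target_hz):
    """Pick the stop whose locus line F2_onset = k * F2_target + c passes
    closest to the observed (F2_target, F2_onset) point."""
    return min(LOCUS, key=lambda stop: abs(LOCUS[stop][0] * f2_target_hz
                                           + LOCUS[stop][1] - f2_onset_hz))

print(classify_stop(f2_onset_hz=1450, f2_target_hz=1500))  # 'b'
```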

25
Q

Replication of Speech Perception Effects Using Nonspeech Signals

A

Categorical perception of speech signals has been a centerpiece of the original and revised motor theory.
However, the demonstration of the same effects with nonspeech signals seems to damage the proposed link between speech production and speech perception implied by findings of categorical perception for speech signals
If the results of a nonspeech experiment are the same as those of a speech experiment, the categorical perception effect can be attributed to general auditory mechanisms, not perceptual mechanisms specialized for speech.

26
Q

Pisoni (1977)

A

Reviewed speech perception experiments in which labeling and discrimination data suggested categorical perception of voiced and voiceless stops.
Figure 12-11 shows a typical data set
The interpretation of categorical perception of VOT was consistent with the motor theory of speech perception. Speakers cannot produce continuous changes in VOT, so they cannot perceive them

27
Q

Animal and Infant Perception of Speech Signals

A

Auditory theorists point to the demonstration in animals of categorical perception for many speech sound contrasts, as well as the ability of animals to learn phonetic contrasts when properly trained, as evidence for the use of general auditory mechanisms in the perception of speech
Saffran and Thiessen (2007) have summarized evidence for the human infant’s use of phonetic data from the environment in the development of sound categories.
They argue for general cognitive mechanisms in a child’s learning of speech and language that allow the child to build an enormous phonetic database which is used to organize regularities within the data.
This “statistical learning” is a theoretical approach to child speech and language learning

28
Q

Direct Realism Theory of Speech Perception

A

Alternative to both the motor theory and a general auditory approach
J.J. Gibson proposed the idea that animals, including humans, perceive the visual layouts of environments directly, by linking the stimulation of their senses with the sources of the stimulation.
Objects in the environment structure the medium through which they are conveyed to the senses
In Gibson’s view, perceivers do not “process” and “encode” the light waves via cognitive operations whose output is a symbolic representation.
Some scientists (Cleary & Pisoni, 2001) question the utility of direct realism as a theory because it is hard to understand how reasonable experimental tests can be made to support or falsify it.

29
Q

How does Direct Realism differ from other theories?

A

Motor theory requires a special, completely automatic mechanism (a module) to transform acoustic signals into articulatory behavior. Direct realism proposes no special mechanisms.
In direct realism, listeners hear the articulatory gestures, whereas in a general auditory approach, listeners hear the acoustic signal, not the gestures that produced the signal.
Direct realists argue that a general auditory approach to speech perception is overly complicated because a listener must learn and store all the different variants of spectra for a given sound (due to coarticulation).

30
Q

Direct Realism vs General Auditory Approaches

A

In direct realism, listeners are not burdened with the learning and storage problem of these many spectral and temporal variations because they “hear” the articulatory gestures combined, or co-produced.
In direct realism, the degree to which two articulatory gestures, such as lingual and labial gestures, are coproduced, or overlapped, is perceived directly.
In a general auditory approach, the large number of variations introduces acoustic variability for a given sound that may complicate the learning of speech sound categories and the mature form of speech perception

31
Q

Vowel Perception

A

Strong categorical perception is present for consonants but not vowels
Motor theorists argue that humans can produce continuous variations in vowels so it is logical that they can perceive them
However, it is not strictly true that vowels can be produced continuously
Vowels and consonants are not treated differently in auditory theories
A vowel system, and the acoustic characteristics of the segments within the system, can be established over the course of phonetic learning by auditory exposure to thousands of vowels.
Along the learning trajectory in language development an acoustic template is organized for a vowel category.

32
Q

Vowel Perception
Vowel templates are often

A

Vowel templates are often conceptualized in terms of formant frequencies at the temporal middle of a vowel. However, there are some issues with this view:
Why would vowel acoustic templates be developed only for the target formant frequencies?
Auditory measurement of target formant frequencies at the temporal middle of the vowel would require an analysis window centered around the midpoint
Short measurement windows may be no more than 20% of the overall vowel duration. Such “slice-in-time” sampling of vowel acoustics ignores frequency change throughout the vowel.
Hillenbrand and Nearey (1999) showed that vowels whose formant frequencies changed naturally over time were identified better than synthesized “flat formant” versions (90–95% vs. 70–75% correct)

33
Q

Normalization

A

Normalization reduces or eliminates some of the difficulties associated with large variability in formant frequencies resulting from factors such as vocal tract length.
Speaker normalization is an auditory process that represents the acoustic characteristics of speech sounds on a common scale, reducing or eliminating the variability due to various factors
An example of normalization is the use of formant frequency ratios, rather than individual formant frequencies, as the acoustic representation of vowels
Formant frequencies for a given vowel are different depending on the length of the vocal tract, but formant ratios may be more constant for different vocal tract lengths.
The auditory system may process the acoustic signal for vowels using ratios such as F2/F1 and F3/F2, rather than the single formant frequencies F1, F2, and F3.
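The intuition behind formant ratios can be seen in the uniform-tube approximation of a neutral (schwa-like) vocal tract, whose resonances are Fn = (2n − 1)c / 4L. The absolute formant frequencies scale with tract length L, but their ratios do not. A short sketch (the tract lengths are rough illustrative values):

```python
C = 35000.0  # speed of sound in warm, moist air, cm/s

def formants(tract_len_cm, n=3):
    """Resonances of a uniform tube closed at the glottis, open at the lips."""
    return [(2 * k - 1) * C / (4 * tract_len_cm) for k in range(1, n + 1)]

for L in (17.5, 14.0, 10.0):  # roughly adult male, adult female, child
    f1, f2, f3 = formants(L)
    print(f"L={L} cm: F1={f1:.0f} F2={f2:.0f} F3={f3:.0f} "
          f"F2/F1={f2 / f1:.2f} F3/F2={f3 / f2:.2f}")
# The formants shift with tract length, but the ratios stay 3.00 and 1.67.
```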

34
Q

Support for Formant Ratios

A

The ratio approach is consistent with the idea that formant frequencies for a given vowel produce a pattern of stimulation on the basilar membrane, the part of the cochlea containing cells responsible for hearing sensation
This pattern can be moved up and down the basilar membrane, which is to say up and down the frequency range important for speech perception (roughly 50–12,000 Hz).
The exact frequencies stimulated on the basilar membrane are not critical for vowel perception; the pattern of stimulation is critical, and it is similar regardless of vocal tract length.

35
Q

Talker Normalization

A

Calibration, by a listener, of an individual talker’s vowel space
Frequencies of a speech or nonspeech signal preceding a vowel affect the identification of that vowel.
A low frequency signal preceding a vowel produces a different effect on vowel identification compared with a preceding high-frequency signal.
It is as if the frequency of the signal preceding the vowel establishes frequency expectations for the listener that are carried over to the vowel identification
A second finding is that the intelligibility of a list of words is better when the words are spoken by a single talker than when they are spoken by two or more talkers

36
Q

Direct Realism & Vowels

A

In the direct realism theory of speech perception, articulatory gestures are perceived directly and no special mechanisms are required to make perceptual decisions concerning phonetic events.
The acoustic variability for a given vowel becomes a non-issue
The directly-perceived gestures for the vowel may have variable formant frequencies, especially across speakers whose vocal tract lengths are very different (e.g., men versus children), but the gestures are nearly the same.
Unfortunately, it is not true that different speakers use the same articulatory gestures for a specific sound.
Johnson, Ladefoged, and Lindau (1993) showed substantial articulatory variation for the same vowel across different speakers.
Westbury, Hashi, and Lindstrom (1998) reported significant speaker-to-speaker variability of prevocalic “r”
Direct realism does not have a ready answer at the level of articulatory gestures for this variability

38
Q

Speech Perception & Word Recognition

A

Speech sound identification is clearly an important part of speech perception, but just as clearly the goal of speech perception is to recognize words, their combinations, and ultimately the message they convey.
Can imagine the lexicon as consisting of word “units” represented by strings of phonemes.
Spoken word recognition occurs when the incoming sounds are identified and well matched to one of these stored word units.
However, analysis of speech acoustic information is not independent of lexical effects. Ganong (1980) showed this effect:
Listeners responded to stimuli along a /d/-/t/ continuum.
When the real word “dash” was at one end of the continuum and the nonword “tash” at the other, a significant increase in /d/ responses was observed at the maximally ambiguous VOTs.

39
Q

Speech Perception & Word Recognition

A

Ganong’s work showed that top-down processes interact with processing of the incoming acoustic signal.
Spoken word recognition does not require a complete acoustic analysis of all sounds in a word. Listeners make decisions concerning word identity before all incoming sounds are analyzed. This is especially the case for longer, multisyllabic words
Spoken word recognition unfolds over time, by a continuous process of lexical activation as sound analysis and top-down information become increasingly available as the acoustic signal enters the auditory system

40
Q

Lexical Activation & Word Recognition

A

The process by which a candidate set of words is activated by the incoming acoustic signal.
The activation is typically by the acoustic signal associated with word-onset sounds.
As the acoustic input continues and more sounds are available to the listener, the number of word candidates decreases—the word-recognition process, including both bottom-up and top-down processes, brings a listener closer to the complete word. The acoustic analysis is the bottom-up part of spoken word recognition. As the acoustic analysis is under way, top-down processes eliminate lexical candidates.
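A cohort-style sketch of this activation-and-pruning process, using letters as stand-ins for phonemes and a toy lexicon (real models also weight candidates by top-down factors such as word frequency and sentence context):

```python
LEXICON = ["cat", "can", "candle", "candy", "cap", "dog"]

def active_cohort(heard_so_far, lexicon=LEXICON):
    """Return the word candidates still consistent with the input so far."""
    return [w for w in lexicon if w.startswith(heard_so_far)]

for heard in ("c", "ca", "can", "cand", "candl"):
    print(heard, "->", active_cohort(heard))
# The candidate set shrinks as each new sound arrives; "candle" can be
# recognized at "candl", before the final sound is even analyzed.
```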

41
Q

Speech Perception & Word Recognition

A

The acoustic signal for each sound is variable depending on such factors as dialect, speaking rate, speech style (formal vs. casual), and immediate phonetic context
“Phonetic recalibration”: the acoustic boundary between two contrasting phonemes (such as the /r/-/w/, /s/-/ʃ/, or /b/-/d/ contrasts) is reset depending on factors such as dialect, speaking rate, or exposure to a new language
Effect of top-down process in speech perception
Dependent on categorical perception method

42
Q

Speech Intelligibility

A

Refers to the effectiveness of communicating an oral message to listeners.
Speech intelligibility tests were originally designed to measure the effectiveness of speech transmission over communication systems, such as telephone land lines
The original concept can be extended to individuals with hearing disorders and speech disorders
Speech reception tests can provide a speech discrimination score
Intelligibility tests are also used for the evaluation of individuals with speech impairment, such as speakers with dysarthria or hearing loss

43
Q

Speech Intelligibility

A

Need to consider additional factors that contribute to speech intelligibility
E.g., “Mr. Jones has a speech intelligibility score of 75%” is not informative on its own
Additional information is needed about the test, the listening conditions (including the specific speech materials), and the listeners
Words in sentences tend to have better intelligibility than words in isolation
Sentences provide a context that makes words more predictable than words heard in isolation.

44
Q

“Explanatory” Speech Intelligibility Tests

A

Kent et al. (1989) extended the interpretation of speech intelligibility testing by isolating the specific phonetic problems that contributed to intelligibility deficits.
This idea emerged from a common clinical observation that two individuals with the same overall speech intelligibility score (e.g., 60%) may have very different reasons for the intelligibility deficits.
Single-word, multiple choice instrument in which the response alternatives are related to the target by carefully manipulated phonetic contrasts.
An individual with a speech disorder produces the word list, which is then presented to a panel of listeners for word identification.
The analysis of the data includes not only the total number of incorrect words, but the phonetic contrast errors underlying the incorrect choices.
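A sketch of the kind of tally such a test supports; the words, listener choices, and contrast labels below are invented for illustration, not items from Kent et al.’s instrument:

```python
from collections import Counter

# (intended word, listener's choice, phonetic contrast separating them);
# None marks a correct identification.
responses = [
    ("pea", "bee", "voicing"), ("sun", "ton", "manner"),
    ("pea", "pea", None),      ("cap", "cat", "place"),
    ("sun", "fun", "place"),   ("bee", "pea", "voicing"),
]

correct = sum(1 for target, choice, _ in responses if target == choice)
print(f"intelligibility: {100 * correct / len(responses):.0f}%")  # 17%
print(Counter(c for _, _, c in responses if c).most_common())
# Vulnerable contrasts: [('voicing', 2), ('place', 2), ('manner', 1)]
```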

45
Q

“Explanatory” Speech Intelligibility Tests
This analysis generates a profile of “vulnerab

A

This analysis generates a profile of “vulnerable” contrasts—those frequently involved in mismatches between the speaker’s intended word and the listener’s choice
“Vulnerable” contrasts also provided an explanation for why the same overall intelligibility score might be obtained for two individuals whose speech disorders sound so different.
Different profiles suggest different therapy priorities.
Such speech intelligibility tests are probably of greatest use when a single speaker’s progress (or decline) is tracked across therapy or progression of a disease.
The individual serves as his or her own control in evaluating the effects of management or disease progression.

46
Q

Scaled Speech Intelligibility

A

Can assign numbers to reflect perceived magnitudes of speech intelligibility
Additionally, can also have perceptual scaling of other speech dimensions including speech normalcy, speech acceptability, naturalness, hypernasality, articulatory imprecision, and voice qualities such as breathiness, hoarseness, roughness, and strain.
Speech intelligibility is scaled often in the research literature, and in clinical practice
Equal-appearing interval scales are simple, can be administered quickly, and probably reflect the potential effects of all aspects of speech production (articulation, voice quality, prosody) on speech intelligibility.
Nonsegmental factors such as voice quality can contribute to scaled speech intelligibility and may not be captured well by word and/or sentence intelligibility tests that use percentage correct measures.

47
Q

Possible Issues with Scaling Procedures

A

The equal steps between numbers along the scale may imply that a difference between, for example, a scale value of 4 and 5 is psychologically equivalent to the difference between 5 and 6
Issue of floor and ceiling effects and how the endpoints of the scale are anchored
Direct magnitude estimation (DME) is one method of avoiding linearity and endpoint problems
Listeners are told to assign the numbers as ratios relative to a defined anchor, or to the immediately preceding stimulus
Listeners hear a sequence of speech stimuli varying in intelligibility and they assign numbers to each stimulus to reflect the magnitude of the speech intelligibility deficit
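A minimal sketch of how DME judgments might be pooled across listeners; the ratings are invented, and geometric averaging is used because DME responses are ratio-scaled, so each listener’s arbitrary choice of modulus cancels out:

```python
from statistics import geometric_mean

# Hypothetical DME ratings of the same five speech samples by two listeners
# who chose very different number ranges (moduli).
ratings = {
    "listener 1": [10, 20, 40, 15, 80],
    "listener 2": [5, 11, 19, 8, 42],
}

def dme_scale(ratings):
    """Normalize each listener by their own geometric mean, then average
    geometrically across listeners for each stimulus."""
    normed = {who: [r / geometric_mean(rs) for r in rs]
              for who, rs in ratings.items()}
    n = len(next(iter(normed.values())))
    return [geometric_mean([vals[i] for vals in normed.values()])
            for i in range(n)]

print([round(v, 2) for v in dme_scale(ratings)])
# approximately [0.4, 0.83, 1.54, 0.61, 3.24]: the two listeners agree on
# the relative deficits despite their different number ranges.
```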

48
Q

Phonetic Transcription

A

Speech perception task in which a listener generates a symbol or sequence of symbols to represent spoken sounds.
SLPs use phonetic transcription to document sound pattern errors in clients who have developmental speech delay or other types of sound acquisition disorders.
Researchers use phonetic transcription to examine the nature of speech sound development in children and adults with conditions affecting speech

49
Q

Phonetic Transcription
Phonetic transcription is notoriously unreliable

A

Phonetic transcription is notoriously unreliable—two skilled transcribers often use different symbols for identical sounds in the same spoken word
Based on perceptual processes in listening to speech
The auditory theory seems to be the best match for transcription of normal and disordered speech.
The auditory theory is based on analysis that is not different from the auditory analysis of other signals in the environment.
Consistent with the uncertainties of phonetic transcription.
There is evidence of a range of acoustic patterns consistent with a specific sound category, with some acoustic patterns judged as “excellent” versions of the sound and others judged as poor representatives

50
Q

Why should SLPs and AuDs care about speech perception?

A

Clients engage the services of speech-language pathologists to be understood better—to be more intelligible
Audiologists have as a top priority improving clients’ ability to understand speech.
An understanding of speech perception processes in the typical listener can assist speech-language pathologists and audiologists in developing a remediation plan.

51
Q

Speech Perception for SLPs and AuDs

A

Persons with damage to the cerebellum, a part of the brain that plays an important role in regulating sequential motor behavior (among other things), often have a speech disorder called ataxic dysarthria.
The knowledge of spoken word recognition, which depends on an understanding of speech perception, may direct a clinician to plan therapy that is likely to benefit both the speaker and the listener.
An understanding of speech perception and word recognition allows a clinician to exploit knowledge concerning typical listening strategies and therefore maximize the effect of speech therapy.
A goal of audiometric testing is the identification of the frequencies of hearing loss in clients.
When a hearing aid is programmed for optimal speech understanding, knowledge of speech perception is important in matching the amplification characteristics of the aid to hearing loss patterns across frequency.

52
Q

Review
The concept of acoustic invariance

A

The concept of acoustic invariance for speech sounds—or the lack of it—plays a central role in the various theories of speech perception.
The motor theory of speech perception was based on the finding of categorical perception for stop consonant place of articulation.
Additional findings, including duplex perception and trading relations, were used to support the motor theory.
The general auditory approach takes the perspective that the speech acoustic signal is sufficiently consistent to support speech sound perception, and that general auditory mechanisms (not special mechanisms) are used in the perception of speech signals.

53
Q

Review

A

Direct realism claims that articulatory gestures are perceived directly (not by special mechanisms), and that listeners hear articulatory gestures, not acoustic representations of speech sounds that must be processed by cognitive mechanisms for proper recognition.
Speech perception involves not only the perception of speech sounds, but the access of words from the lexicon by a combination of bottom-up and top-down processes.
The most common estimates of speech intelligibility use word or sentence tests, in which the scores are expressed as percentages of correctly heard words, or scaling techniques in which listeners attach numbers to the magnitudes of a speech intelligibility deficit.