Topic 6 - Speech Perception Flashcards
Acoustic Signal
Produced by air that is pushed up from the lungs through the vocal cords and into the vocal tract
Vowels
Vowels are produced by vibration of the vocal cords and changes in the shape of the vocal tract by moving the articulators
These changes in shape cause changes in the resonant frequency of the vocal tract and produce peaks in pressure at a number of frequencies called formants
Each vowel has a characteristic series of ‘formants’ (resonant frequencies)
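A minimal source-filter sketch in Python (NumPy/SciPy) makes this concrete: a glottal pulse train stands in for vocal-cord vibration, and one second-order resonator per formant shapes it into a vowel-like sound. The formant frequencies (roughly 700, 1200, and 2600 Hz for an /a/-like vowel) and the bandwidth are illustrative assumptions, not values from these cards.

```python
import numpy as np
from scipy.signal import lfilter

fs = 16000                      # sample rate (Hz)
f0 = 120                        # fundamental frequency of vocal-cord vibration (Hz)
n = int(0.5 * fs)               # 0.5 s of audio

# Source: impulse train approximating glottal pulses at f0
source = np.zeros(n)
source[::fs // f0] = 1.0

# Filter: cascade of two-pole resonators, one per formant (assumed /a/-like values)
formants = [700, 1200, 2600]    # Hz; vowel-dependent, illustrative
bandwidth = 100                 # Hz; illustrative
vowel = source
for f in formants:
    r = np.exp(-np.pi * bandwidth / fs)     # pole radius sets the bandwidth
    theta = 2 * np.pi * f / fs              # pole angle sets the resonant frequency
    vowel = lfilter([1.0], [1.0, -2 * r * np.cos(theta), r ** 2], vowel)

vowel /= np.abs(vowel).max()    # normalize amplitude
```

Changing `formants` (e.g. to roughly 300, 2300, 3000 Hz for an /i/-like vowel) changes which vowel the output resembles, mirroring how moving the articulators changes the vocal tract's resonances.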
Articulators
structures such as the tongue, lips, teeth, jaw, and soft palate
Formants
Resonant Frequencies
The first formant has the lowest frequency, the second the next higher frequency, and so on
Sound/Speech spectrograms
Sound/speech spectrograms show how the frequency and intensity of speech change over time
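SciPy can compute a spectrogram directly; a short sketch, assuming the synthesized `vowel` signal and sample rate `fs` from the sketch above:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import spectrogram

# Power as a function of frequency and time; formants appear as dark horizontal bands
freqs, times, Sxx = spectrogram(vowel, fs=fs, nperseg=512)
plt.pcolormesh(times, freqs, 10 * np.log10(Sxx + 1e-12))  # intensity in dB
plt.xlabel("Time (s)")
plt.ylabel("Frequency (Hz)")
plt.show()
```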
Consonants
Produced by a constriction of the vocal tract and air flow around articulators
Phoneme
smallest unit of speech that changes meaning of a word
In English there are 47 phonemes, including 13 major vowel sounds and 24 major consonant sounds
Variability problem
There is no simple correspondence between the acoustic signal and individual phonemes
- Variability comes from a phoneme’s context
- Acoustic signals that vary can be perceived categorically, i.e., as ‘the same’ sound (/b/ as in ‘bat’ or ‘bite’)
Coarticulation
overlap between articulation of neighbouring phonemes also causes variation
Variability between different speakers
Speakers differ in pitch, accent, speaking speed, and pronunciation
- This variable acoustic signal must be transformed into familiar words
- People nevertheless perceive speech easily despite these variability problems, because of perceptual constancy
Categorical perception
This occurs when a wide range of acoustic cues results in the perception of a limited number of sound categories
Voice onset time (VOT)
time delay between when a sound starts and when voicing begins (vocal cords begin to vibrate)
VOT experiment Eimas & Corbit
VOT for /da/ is 17ms (short) and /ta/ is 91ms (long)
Computers were used to create stimuli with a range of VOTs from long to short
Listeners do not hear the incremental changes; instead, they hear a sudden change from /da/ to /ta/ at the phonetic boundary
Thus, we experience perceptual constancy for the phonemes within a given range of VOT
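A toy simulation of such a boundary (an illustrative model, not Eimas and Corbit's analysis): the probability of reporting /ta/ is treated as a steep logistic function of VOT, with the boundary placed at an assumed 35 ms.

```python
import numpy as np

def prob_ta(vot_ms, boundary=35.0, steepness=0.5):
    """P(listener reports /ta/) as a logistic function of VOT (ms).
    The boundary and steepness values are illustrative assumptions."""
    return 1.0 / (1.0 + np.exp(-steepness * (vot_ms - boundary)))

# VOT varies continuously, but the reported phoneme flips abruptly near the boundary
for vot in [17, 25, 30, 35, 40, 50, 91]:
    p = prob_ta(vot)
    print(f"VOT {vot:3d} ms -> P(/ta/) = {p:.2f}, heard as {'/ta/' if p > 0.5 else '/da/'}")
```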
The McGurk Effect
Visual stimulus shows a speaker saying “ga-ga.”
Auditory stimulus has a speaker saying “ba-ba.”
Observer watching and listening hears “da-da,” a fusion that is midway between “ga” and “ba.”
Observer with eyes closed will hear “ba.”
Physiological link between vision and speech
Calvert et al. showed that the same brain areas are activated for lip reading and speech perception.
FFA activation
Von Kriegstein et al. showed that the FFA is activated when listeners hear familiar voices
This shows a link between perceiving faces and voices.
Experiment by Rubin et al.
meaning and phoneme
Short words (sin, bat, and leg) and short nonwords (jum, baf, and teg) were presented to listeners.
The task was to press a button as quickly as possible when they heard a target phoneme
On average, listeners were faster with words (580 ms) than non-words (631 ms)
Experiment by Warren
meaning and phoneme
Listeners heard a sentence that had a phoneme covered by a cough
The task was to state where in the sentence the cough occurred.
Listeners could not correctly identify the position of the cough, and they also did not notice that a phoneme was missing. This is called the phonemic restoration effect
This did not happen for non-word sentences
Experiment by Miller and Isard
meaning and phoneme
Stimuli were three types of sentences:
- Normal grammatical sentences
- Anomalous sentences that were grammatical but made no sense
- Ungrammatical strings of words
Listeners were to shadow (repeat aloud) the sentences as they heard them through headphones
Results showed that listeners were:
- 89% accurate with normal sentences
- 79% accurate for anomalous sentences
- 56% accurate for ungrammatical word strings
Segmentation Problem
there are no physical breaks in the continuous acoustic signal
Top-down processing, including knowledge a listener has about a language, affects perception of the incoming speech stimulus
Segmentation is affected by context, meaning, and our knowledge of word structure
Speech Segmentation
the perception of individual words in a conversation
Word Structure, Segmentation
Transitional probabilities - the chance that one sound will follow another in a language
Statistical learning - the process of learning transitional probabilities and other language characteristics
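A small sketch of estimating transitional probabilities from a syllable stream (the syllables and made-up words are illustrative; real corpora would first require syllable tokenization):

```python
from collections import Counter

def transitional_probabilities(syllables):
    """P(next | current) for adjacent syllables: count(x followed by y) / count(x)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

# Three made-up words (bidaku, padoti, golabu) strung together in varying order:
# within a word the next syllable is predictable (high TP); across word
# boundaries it varies (low TP).
stream = ("bi da ku pa do ti go la bu pa do ti "
          "bi da ku go la bu bi da ku pa do ti").split()
for (a, b), p in sorted(transitional_probabilities(stream).items()):
    print(f"P({b} | {a}) = {p:.2f}")
```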
Statistical Learning experiment by Saffran et al.
Learning phase - infants heard two-minute strings of continuous sound made up of nonsense words, in which transitional probabilities were high within words and lower between words
Nonsense words were in random order within the string
If infants use transitional probabilities, they should recognize the words as units even though the string of words had no breaks
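Continuing the sketch above, a simple segmenter can recover the made-up words by placing a boundary wherever the transitional probability dips below a threshold (the 0.8 cutoff is an arbitrary illustrative choice, not a value from the study):

```python
def segment(syllables, tps, threshold=0.8):
    """Insert a word boundary wherever P(next | current) falls below threshold."""
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tps[(a, b)] < threshold:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words

tps = transitional_probabilities(stream)   # from the previous sketch
print(segment(stream, tps))
# -> ['bidaku', 'padoti', 'golabu', 'padoti', 'bidaku', 'golabu', 'bidaku', 'padoti']
```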
Indexical characteristics
characteristics of the speaker’s voice such as age, gender, emotional state, level of seriousness, etc
Speaker characteristics experiment by Palmeri et al.
Listeners were to indicate when a word was new in a sequence of words.
Results showed that they were much faster when the same speaker said all the words than when a different speaker said each word
Broca’s aphasia
Individuals have damage in Broca’s area in frontal lobe
Laboured and stilted speech and short sentences, but they understand others
Wernicke’s aphasia
individuals have damage in Wernicke’s area in temporal lobe
Speak fluently but the content is disorganized and not meaningful
They also have difficulty understanding others and word deafness may occur in extreme cases
Brain damage
Some patients with brain damage can discriminate words but are unable to discriminate syllables (and vice versa)
Brain scans found:
A “voice area” in the superior temporal sulcus (STS) that is activated more by voices than by other sounds
A ventral stream for recognizing speech and a dorsal stream that links the acoustic signal to movements for producing speech - called the dual stream model of speech perception
Experience dependent plasticity
Before age one, human infants can tell the difference between the sounds used in all languages
The brain becomes “tuned” to respond best to speech sounds that are in the environment
The ability to differentiate other sounds disappears when those sounds are not reinforced by the environment