Speech Perception Flashcards
acoustic signal/acoustic stimulus
patterns of pressure changes in the air produced by the position/movement of structures within the vocal apparatus
role of vocal cords and vocal tract
the acoustic signal for most speech sounds is created by air that is pushed up from the lungs past the vocal cords and into the vocal tract (the airway above the larynx used for producing speech - it includes the oral and nasal tracts). The sound produced depends on the shape of the vocal tract as the escaping air is pushed through it.
articulators
structures such as the tongue, lips, teeth, jaw and soft palate, which alter the shape of the vocal tract by moving
How are vowels produced?
by vibration of the vocal cords; the specific sounds of each vowel are created by changing the overall shape of the vocal tract. This change of shape changes the resonant frequency of the vocal tract and produces peaks of pressure at a number of different frequencies. The frequencies at which these peaks occur are called formants
formants
each vowel has a particular series of formants. The first formant has the lowest frequency; the second formant is the next highest.
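-> Illustration (not from the source): a minimal Python sketch of how formants arise, assuming rough /a/-like formant values of 700 Hz and 1200 Hz. A pulse train stands in for vocal-cord vibration, and each two-pole resonant filter stands in for one resonance of the vocal tract:

    import numpy as np
    from scipy.signal import lfilter

    fs = 16000                            # sampling rate (Hz)
    f0 = 120                              # pulse rate, standing in for vocal-cord vibration
    source = (np.arange(fs) % (fs // f0) == 0).astype(float)  # 1 s pulse train

    def resonator(signal, freq, bw=80.0):
        # Two-pole resonant filter: a crude model of a single formant peak.
        r = np.exp(-np.pi * bw / fs)
        theta = 2 * np.pi * freq / fs
        return lfilter([1.0], [1.0, -2 * r * np.cos(theta), r * r], signal)

    vowel = resonator(resonator(source, 700.0), 1200.0)  # assumed F1 ~ 700 Hz, F2 ~ 1200 Hz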
How are consonants produced?
by constriction (narrowing) of the vocal tract. For example, to produce a /d/, the speaker places the tongue against the ridge above the upper teeth (the alveolar ridge) and releases a slight rush of air while moving the tongue away from it
Manner of articulation
how the articulators interact when making a speech sound, e.g. /b/ is created by blocking the airflow and releasing it quickly
Place of articulation
the location where the articulation occurs - the lips, the alveolar ridge, or the soft palate (e.g., when saying /g/, /d/, and /b/, the place of articulation moves from the back to the front of the mouth)
voicing
whether the vocal cords are vibrating (/b/, /m/, /z/, etc.) or not (/p/, /s/, etc.)
Sound spectrogram
a visual representation of a sound’s spectrum of frequencies as it varies with time. It shows which frequencies are present in a sound at each moment in time
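-> Illustration (not from the source): a minimal Python sketch of computing and plotting a spectrogram with SciPy; the filename speech.wav is a hypothetical mono recording:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.io import wavfile
    from scipy.signal import spectrogram

    rate, samples = wavfile.read("speech.wav")  # hypothetical mono recording
    freqs, times, power = spectrogram(samples, fs=rate, nperseg=512)

    # Time on the x-axis, frequency on the y-axis; darkness = energy.
    plt.pcolormesh(times, freqs, 10 * np.log10(power + 1e-12), shading="auto")
    plt.xlabel("Time (s)")
    plt.ylabel("Frequency (Hz)")
    plt.show()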
Formant transitions
rapid shifts in frequency preceding or following formants
phoneme
the shortest segment of speech that, if changed, would change the meaning of a word. Phonemes do not refer to letters but to speech sounds that determine the meaning of speech
variability problem
a particular phoneme can be associated with a number of different acoustic signals
Coarticulation
the overlap between the articulation of neighboring phonemes; e.g., the /b/ in "bat" and "boot" is articulated differently because of the vowel that follows it. Since the articulators are constantly moving while we talk, the shape of the vocal tract associated with a particular phoneme is influenced by the sounds that precede and follow that phoneme
between-speaker differences
variations in how different people pronounce words. Variations between speakers such as pitch, speed and accent result in a particular phoneme having different acoustic signals for different speakers
within-speaker differences
variations in how an individual pronounces words. For example, talking to a friend versus talking to a teacher might lead to differences in pronunciation
motor theory of speech perception
based on the proposal that motor commands have a 1:1 relationship to phonemes. It states that hearing a sound triggers the motor processes in the listener associated with producing that sound. The theory has since been discredited (see the card below)
categorical perception
stimuli that exist along a continuum are perceived as divided into discrete categories
voice onset time (VOT)
the time delay between when a sound begins and when the vocal cords begin vibrating
phonetic boundary
the VOT at which perception changes from one category to another. The fact that all stimuli on the same side of the phonetic boundary are perceived as the same category is an example of perceptual constancy
phonemic restoration effect
sounds missing from speech can be restored by the brain and actually be perceived
-> more likely to happen for longer words
-> influenced by meaning of words following the missing phoneme
speech segmentation
the perception of individual words in a conversation. The acoustic signal for spoken sentences is continuous, but we perceive breaks between words. The meaning of words helps us segment speech: knowing which words fit a given context tells us where one word ends and the next begins
transitional probability
the chance that one sound will follow another sound. Every language has different transitional probabilities, which listeners learn along with the words and sentences of the language
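-> Illustration (not from the source): a minimal Python sketch of estimating transitional probabilities from a toy, hypothetical syllable stream; within-word transitions (e.g., "pre" -> "ty") come out more probable than across-word transitions:

    from collections import Counter

    # Hypothetical syllable stream: "pretty baby ... pretty garden ... pretty"
    syllables = ["pre", "ty", "ba", "by", "pre", "ty", "gar", "den", "pre", "ty"]

    pair_counts = Counter(zip(syllables, syllables[1:]))   # count adjacent pairs
    first_counts = Counter(syllables[:-1])                 # count each leading syllable

    # P(next | current) = count(current, next) / count(current)
    for (cur, nxt), n in sorted(pair_counts.items()):
        print(f"P({nxt} | {cur}) = {n / first_counts[cur]:.2f}")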
Why is speech perception multimodal?
it can be influenced by information from other senses such as vision and touch
McGurk effect
seeing a person’s lips make the movements for one sound influences how the sound that actually arrives at the ears is perceived
aphasia
a language disorder caused by damage in a specific area of the brain that controls language expression and comprehension
Broca’s aphasia
caused by damage to Broca’s area in the frontal lobe. It causes patients to have slow, labored, ungrammatical speech. They also have difficulty understanding some types of sentences (e.g., sentences that contain connecting words such as “by”).
Wernicke’s aphasia
caused by damage to Wernicke’s area in the temporal lobe. It causes patients to produce fluent & grammatically correct, but meaningless speech. They are also unable to understand speech and writing
word deafness
inability to recognize words, even though the ability to hear pure tones remains intact. It occurs in the most extreme form of Wernicke’s aphasia
Dual-stream model of speech perception
a model that identifies 2 streams for speech perception
ventral stream
supports speech comprehension. Sends signals from the anterior auditory area to the frontal cortex.
dorsal stream
involved in linking the acoustic signal to the movements used to produce speech. Sends signals from the posterior auditory area to the parietal lobe and motor areas.
voice cells
neurons in the monkey’s "what" auditory pathway in the temporal lobe that respond more strongly to recordings of monkey calls than to calls of other animals
Why has the Motor Theory of Speech Perception been disproven?
! It does not explain how people with brain damage that disables their speech motor system can still perceive speech.
! It does not explain how infants can understand speech before they have learned to speak.
However, a TMS study suggests the motor system can still support perception: stimulating motor areas associated with making specific sounds aided perception of those sounds.
-> Stimulation of the lip area resulted in faster responses to the phonemes /b/ and /p/
-> Stimulation of the tongue area resulted in faster responses to the phonemes /d/ and /t/
Study: VOT
Listeners are asked to indicate what sound they hear when the researchers vary the VOT between 0 and 80 ms
-> Listeners only reported hearing /da/ or /ta/, despite the fact that a large number of different stimuli (with different VOTs) were presented
-> From 0 to about 35 ms, listeners hear /da/; at about 35 ms and above, their perception changes abruptly to /ta/.
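-> Illustration (not from the source): a minimal Python sketch of the categorical pattern above, using the ~35 ms /da/-/ta/ boundary reported in this study:

    # All VOTs on the same side of the phonetic boundary map to the same category.
    def perceived_syllable(vot_ms: float) -> str:
        return "/da/" if vot_ms < 35 else "/ta/"

    for vot in range(0, 81, 10):
        print(f"VOT = {vot:2d} ms -> {perceived_syllable(vot)}")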
Meaningful context study
Listeners were presented with a series of short words or nonwords. They were asked to press a key as rapidly as possible whenever they heard a sound that began with the phoneme /b/. On average, it took participants 632 ms to respond to the nonwords and 580 ms to respond to the real words.
Study: phonemic restoration effect
Participants listened to a recording of the sentence "The state governors met with their respective legislatures convening in the capital city," in which the first /s/ in "legislatures" was replaced with the sound of a cough
-> Participants were asked where in the sentence the cough occurred
-> None of the participants identified the correct position of the cough. None noticed that the /s/ in “legislatures” was missing.
Study: McGurk effect
fMRI activity was recorded in observers who watched a silent videotape of a person making mouth movements for saying numbers.
-> Watching the lips move activated an area in the auditory cortex that was previously shown in another experiment to be activated when people are perceiving speech.
Study: Dual-stream model of speech perception
fMRI response was measured in 2 conditions: telling a story (production condition) and listening to the story (comprehension condition)
-> Several brain areas responded in both conditions => this could be due to shared mechanisms in speech production and comprehension, but could also be explained with different types of processing within these 2 regions in the different conditions
-> However, the neural responses to production and comprehension were coupled (followed the same time course), suggesting shared processing mechanisms
Speech Perception and Face Perception Study
Listeners who listened to sentences spoken by familiar or unfamiliar speakers had their brain activity measured with fMRI
-> In both conditions, the superior temporal sulcus (associated with speech perception) was activated
-> When familiar speakers were listened to, the fusiform face area was also activated
What happens when speech sounds become part of a language?
they come to activate anterior and ventral regions, mostly in the left superior temporal cortex, but also in the posterior superior temporal cortex