Speech Perception Flashcards
Speech Production
- cavities/articulators: speech sounds
- vocal cords: voicing
- lungs: power source for speech
Source-filter Theory of Speech Production
Source-filter theory: the unshaped source material (sound from vocal folds) is shaped by the articulators (filter), giving rise to a sound with characteristics of both source and filter
vocal tone (from vocal cords; source) + resonatory cavities/articulators (filter) = speech (output)
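The source + filter = output idea can be sketched numerically. This is only an illustration: the 100 Hz source, the single 500 Hz resonance, the 100 Hz bandwidth, and the function names are all assumptions made up for the example, not values from these cards.

```python
# Source-filter sketch: output amplitude = source amplitude x filter gain.
# Every number below is an illustrative assumption.

F0 = 100            # assumed fundamental frequency of the source (Hz)
RESONANCE = 500     # assumed resonance of the filter (Hz)
BANDWIDTH = 100     # assumed width of that resonance (Hz)


def source_amplitude(harmonic_number):
    """Source: harmonics of F0 get weaker as the harmonic number rises."""
    return 1.0 / harmonic_number


def filter_gain(frequency_hz):
    """Filter: a simple resonance curve that peaks at RESONANCE."""
    distance = (frequency_hz - RESONANCE) / BANDWIDTH
    return 1.0 / (1.0 + distance ** 2)


def output_spectrum(n_harmonics=10):
    """Output: each source harmonic scaled by the filter gain at its frequency."""
    spectrum = {}
    for k in range(1, n_harmonics + 1):
        freq = k * F0
        spectrum[freq] = source_amplitude(k) * filter_gain(freq)
    return spectrum


spectrum = output_spectrum()
strongest = max(spectrum, key=spectrum.get)
print(strongest)  # harmonic that the filter emphasizes most
```

The source alone is strongest at 100 Hz, but after filtering the strongest harmonic sits at the resonance, so the output carries characteristics of both source and filter.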
Articulation
Articulation: the approach or contact of two speech organs
- eg: tip of tongue + upper teeth for “th” in thin
Acoustics
Acoustics: the study of the physical properties of sounds, eg:
- loudness
- pitch
Frequency
Frequency: (acoustic property) described in terms of cycles per second; measured in Hertz (Hz)
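As a quick worked example of "cycles per second", the sketch below computes frequency from a cycle count and a duration, and the period of one cycle from a frequency. The function names and the numbers are illustrative assumptions.

```python
def frequency_hz(cycles, seconds):
    """Frequency = number of complete cycles / elapsed time in seconds."""
    return cycles / seconds


def period_s(frequency):
    """Period = duration of one cycle, the reciprocal of frequency."""
    return 1.0 / frequency


print(frequency_hz(440, 2))  # 440 cycles in 2 seconds -> 220.0 Hz
print(period_s(100))         # duration in seconds of one cycle of a 100 Hz tone
```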
Pitch
Pitch: auditory property related to frequency
Formants
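Formants: overtone frequencies that are emphasized by a vocal tract in a particular shape and give vowels their characteristic sounds
- fundamental frequency (F0): the lowest frequency of the voice, set by vocal fold vibration; it belongs to the source and is not itself a formant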
- first formant (F1)
- second formant (F2)
Sound Variability
Sound variability: no one-to-one correspondence between acoustic signal and sound perceived
Sound varies:
1) Across speakers
2) Within speakers
Sound Variability
1) Across Speakers
Sound variability across speakers:
- different speakers have different mouth sizes, shapes and vocal tracts
- individuals differ in the range of their fundamental frequency (F0)
Sound Variability
2) Within Speakers
Sound variability within speakers:
Articulatory Speed: articulators are not always in the ideal position; production therefore is not “ideal”
Coarticulation Effects: variation in the pronunciation of a phoneme caused by the articulatory properties of neighboring sounds
- eg: cat vs can
Parallel Transmission: information for neighboring segments overlaps in time
- as a result, it’s difficult to know where a sound begins and ends in the speech signal
Sound Variability
Other Sources
Other sources of sound variability:
- foreign accents/variants
- noise
Perceptual Invariance
Perceptual Invariance (despite the “lack of invariance” in the signal): the ability to perceive sounds that have highly variable acoustic manifestations as instances of the same sound category
Speech Perception
Interpreting sounds into categories
- our minds impose a lot of structure on speech sounds (as a result of learning)
- we mentally group clusters of similar sounds (allophones/variants) into phonemes
- mental categories play an important role; these categories warp perception
VOT
Voice Onset Time (VOT): the time difference between the release of the stop and the onset of vocal fold vibration (voicing)
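The definition amounts to a subtraction, sketched below. The function name and the millisecond timestamps are hypothetical; in practice both time points would be read off a waveform or spectrogram.

```python
def voice_onset_time_ms(release_ms, voicing_onset_ms):
    """VOT = time of voicing onset minus time of stop release (in ms).

    Positive: voicing starts after the release (voicing lag).
    Negative: voicing starts before the release (prevoicing).
    """
    return voicing_onset_ms - release_ms


# Hypothetical measurements, in ms from the start of a recording.
print(voice_onset_time_ms(release_ms=100, voicing_onset_ms=160))  # 60
print(voice_onset_time_ms(release_ms=100, voicing_onset_ms=80))   # -20
```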
Speech Perception Tasks
Tasks for Speech Perception
1) Forced Choice Identification Task
2) ABX Discrimination Task
Speech Perception Tasks
1) Forced Choice Identification Task
Forced Choice Identification Task
- asks the participants to label the stimuli
- eg: “what is this sound?” (di // ti ?)
Speech Perception Tasks
2) ABX Discrimination Task
ABX Discrimination Task
- present two different stimuli (A and B), one of which is a control sample, the other a modified sample
- eg: “di” and “ti”
- present a third stimulus (X)
- participants must decide whether stimulus X is more representative of A or B
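The trial structure above could be simulated as follows. This is a sketch under assumptions: X is taken to be a copy of either A or B (a common version of the task), the stimuli are plain strings, and `make_abx_trial` and `proportion_correct` are hypothetical helper names.

```python
import random


def make_abx_trial(a, b, rng):
    """Build one trial: present A, then B, then X (a copy of A or of B)."""
    answer = rng.choice(["A", "B"])
    x = a if answer == "A" else b
    return {"A": a, "B": b, "X": x, "answer": answer}


def proportion_correct(trials, responses):
    """Share of trials where the response ('A' or 'B') names X's source."""
    correct = sum(1 for t, r in zip(trials, responses) if t["answer"] == r)
    return correct / len(trials)


rng = random.Random(0)  # fixed seed so the example is repeatable
trials = [make_abx_trial("di", "ti", rng) for _ in range(20)]

# A listener who always hears the difference answers every trial correctly.
perfect_responses = [t["answer"] for t in trials]
print(proportion_correct(trials, perfect_responses))  # 1.0
```

Scores near 0.5 would mean the listener cannot tell A and B apart, since guessing is right about half the time.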
Categorical Perception
Categorical Perception: perception of continuous changes in a stimulus as having a sharp break between discrete categories
- despite the variability, you interpret the gradience of sounds into categories (with sharp boundaries)
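One way to picture the sharp boundary is a steep S-shaped labeling curve over a continuum of stimuli. The sketch below assumes a “di” to “ti” continuum that varies in VOT, a boundary at 30 ms, and a logistic curve with an arbitrary steepness; all of these are illustrative choices, not measured values.

```python
import math

BOUNDARY_MS = 30.0   # assumed category boundary on the VOT continuum
STEEPNESS = 0.5      # assumed sharpness of the boundary (per ms)


def prob_ti(vot_ms):
    """Probability of labeling a stimulus 'ti', modeled as a logistic curve."""
    return 1.0 / (1.0 + math.exp(-STEEPNESS * (vot_ms - BOUNDARY_MS)))


def label(vot_ms):
    """Pick the more likely label for a stimulus with the given VOT."""
    return "ti" if prob_ti(vot_ms) >= 0.5 else "di"


continuum = [0, 10, 20, 30, 40, 50, 60]   # equal 10 ms steps
labels = [label(v) for v in continuum]
print(labels)  # ['di', 'di', 'di', 'ti', 'ti', 'ti', 'ti']
```

The acoustic steps are equal in size, yet the labels flip abruptly around the boundary: stimuli at 0 and 20 ms get the same label, while 20 and 40 ms fall into different categories.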
Speech Perception Cues
(1) Visual Cues
McGurk effect:
- an audio-visual illusion that illustrates how perceivers merge info for speech sounds across the senses
- eg: VIDEO “ga” + RECORDING “ba” → PERCEPTION is “da”
- we integrate information from visual cues
visual cues are a bottom-up process:
- interpretation
- ↑
- sensory input
Speech Perception Cues
(2) Contextual Effects:
Ganong Effect
Ganong effect
- the impact of lexical knowledge on auditory perception of words when stimuli are acoustically ambiguous
- the context in which a sound occurs affects how it is perceived
- the perception of phonetically ambiguous sounds can be biased by whether the resulting word-form is a lexical item or not
context cues are a top-down process
- knowledge
- ↓
- interpretation
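The lexical bias in the Ganong effect can be sketched as a nudge applied to an acoustic estimate. Everything here is an assumption for illustration: the tiny lexicon, the /d/ vs /t/ contrast, the bias size of 0.2, and the function name `perceived_label`.

```python
def perceived_label(acoustic_p_d, word_if_d, word_if_t, lexicon, bias=0.2):
    """Return 'd' or 't' for a sound that may be acoustically ambiguous.

    acoustic_p_d: how /d/-like the sound is from acoustics alone (0 to 1).
    If only one of the two candidate word-forms is a real word,
    the estimate is nudged toward that word (top-down knowledge).
    """
    p = acoustic_p_d
    if word_if_d in lexicon and word_if_t not in lexicon:
        p += bias
    elif word_if_t in lexicon and word_if_d not in lexicon:
        p -= bias
    return "d" if p >= 0.5 else "t"


lexicon = {"dash", "task"}  # hypothetical mini-lexicon

# The same ambiguous sound (0.5) is heard differently in different contexts.
print(perceived_label(0.5, "dash", "tash", lexicon))  # d
print(perceived_label(0.5, "dask", "task", lexicon))  # t
# A clearly /d/-like sound is still heard as /d/ despite the lexical nudge.
print(perceived_label(0.9, "dask", "task", lexicon))  # d
```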
Speech Perception Cues
(2) Contextual Effects:
Phoneme Restoration Effect
Phoneme Restoration Effect: under certain conditions, sounds that are actually missing from a speech signal can be restored by the brain, and may appear to be heard
- “hallucinating effects” on sounds
- non-speech sound replaces a speech sound
- eg: /s/ in “legi[^]lature”
context cues are a top-down process
- knowledge
- ↓
- interpretation