Speech Flashcards
Vowels are __________ airflow
unobstructed
Consonants are ________ airflow
obstructed
Three dimensions of consonants:
Place - bilabial, labio-dental, dental, alveolar, palatal, velar, glottal
Manner - stops, fricatives, affricates, nasals, liquids, glides
Voicing
Spectrograms are:
“Visual speech”
Three components of a spectrogram:
- Frequency of the acoustic signal - speech sounds consist of several frequencies (y-axis)
- Time - all speech signals have a temporal aspect (x-axis)
- Intensity - shown by darkness or color (the third dimension)
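The three components can be illustrated with a minimal sketch that builds a spectrogram by hand. It uses a naive Fourier transform over short frames; the sample rate, frame size, and the pure tone standing in for speech are all assumptions chosen for illustration, not values from these notes.

```python
import cmath
import math

SAMPLE_RATE = 8000  # Hz (assumed for illustration)
FRAME_SIZE = 64     # samples per time frame (assumed)
HOP = 32            # samples between frame starts (assumed)
TONE_HZ = 1000      # a pure tone standing in for a speech signal (assumed)


def spectrogram(samples, frame_size=FRAME_SIZE, hop=HOP):
    """Return one list of magnitudes per time frame.

    Outer index = time (x-axis), inner index = frequency bin (y-axis),
    value = intensity (the darkness or color of that cell).
    """
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        magnitudes = []
        # Naive discrete Fourier transform, keeping bins up to the Nyquist frequency.
        for k in range(frame_size // 2 + 1):
            total = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(frame)
            )
            magnitudes.append(abs(total))
        frames.append(magnitudes)
    return frames


signal = [math.sin(2 * math.pi * TONE_HZ * t / SAMPLE_RATE) for t in range(256)]
spec = spectrogram(signal)

bin_width_hz = SAMPLE_RATE / FRAME_SIZE
loudest_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(f"{len(spec)} time frames; most intense frequency is about "
      f"{loudest_bin * bin_width_hz:.0f} Hz")
```

Real speech would show several intense frequency bands at once rather than the single band this tone produces.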
Interesting properties of the speech signal
Parallel transmission
Segmentation problem
What is parallel transmission?
Phonemes are encoded at the same time; there are no breaks between them
What is the segmentation problem?
It’s acoustically hard to tell where words begin and end, but we have no problem perceiving words (we can hear words in our language even when people talk fast)
What is the lack of invariance problem?
There is no one-to-one correspondence between the acoustic cues and the phonemes perceived
What is the psychological definition of a phoneme?
A category of sounds that we perceive to be the same sound
What are sources of variability in speech?
Coarticulation - related to parallel transmission; the articulation of phonemes overlaps, so how we say a sound is affected by what comes before and after it
Variability between speakers - gender, pitch, accent, speed, age
Variability within speakers - people are sloppy speakers
What is the original McGurk effect?
You see a speaker articulating /ga/, hear /ba/ over headphones, but perceive the speaker saying /da/
The McGurk effect provides strong evidence for __________
the motor theory of speech perception
Perception is a compromise between:
what is heard and what is seen
Motor theory of speech perception
We use our knowledge of production to understand speech
Addresses the lack of invariance problem - perception is based on articulatory information and not just the signal
Cohort model
Three stages:
- select a set of candidates - based on phonetic information (bottom-up)
- narrow the set based on more information and other variables (recognition point - the point at which a word becomes unique - can be auditory or visual)
- fit the selected item into the context
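The three stages can be sketched in a few lines. This is a toy illustration only: the lexicon is made up, letters stand in for phonemes, and `fit_context` is a hypothetical placeholder for the integration stage.

```python
# Toy lexicon (assumed); letters stand in for phonemes.
LEXICON = ["candle", "candy", "canvas", "captain", "cat", "dog"]


def cohort_recognition(spoken, lexicon=LEXICON):
    """Process the input one segment at a time.

    Returns (word, recognition_point), where recognition_point is the number
    of segments heard when the word became unique, or (None, n) if no single
    word was isolated after n segments.
    """
    cohort = list(lexicon)
    for i, segment in enumerate(spoken):
        # Stages 1 and 2: the first segment selects the initial candidates;
        # each later segment narrows the set.
        cohort = [w for w in cohort if len(w) > i and w[i] == segment]
        if len(cohort) == 1:
            return cohort[0], i + 1  # recognition point reached
        if not cohort:
            return None, i + 1
    return None, len(spoken)


def fit_context(word, words_allowed_by_context):
    """Stage 3 placeholder (hypothetical): does the word fit the context?"""
    return word in words_allowed_by_context


word, point = cohort_recognition("candle")
print(word, point)
print(fit_context(word, {"candle", "lamp"}))
```

With this lexicon, "candle" only becomes unique at its fifth segment, because "candy" shares its first four.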
TRACE model
Connectionist model
Words are represented across different levels (words, phonemes, features)
All these levels interact with each other
Top-down effects
Context effects, illusions, phonemic restoration, verbal transformation, sinewave speech, backward speech
What are context effects?
Listeners wore headphones and heard sentences presented in noise; they were asked to identify as many words as possible
Three sentence types were presented: syntactically and semantically correct, syntactically correct but semantically incorrect, and both syntactically and semantically incorrect
People did best when the sentences were syntactically and semantically correct and worst when both were incorrect
What is phonemic restoration?
People hear a phoneme that has been removed from a word if it is replaced by another sound (such as a cough or noise) - they restore it
Multiple restorations are possible
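Why more than one restoration is possible can be shown with a small sketch. It is not a model of perception: the lexicon and context hints are invented, letters stand in for phonemes, and `?` marks the replaced segment.

```python
# Toy lexicon and context hints (both assumed for illustration).
LEXICON = ["feel", "heel", "peel", "reel", "meal"]
CONTEXT_HINTS = {"shoe": "heel", "orange": "peel", "fishing": "reel"}


def restorations(masked, lexicon=LEXICON):
    """Return every lexicon word consistent with the input; '?' matches any segment."""
    return [
        word
        for word in lexicon
        if len(word) == len(masked)
        and all(m in ("?", c) for m, c in zip(masked, word))
    ]


def restore(masked, context_word):
    """Pick the candidate suggested by the context (top-down), if there is one."""
    preferred = CONTEXT_HINTS.get(context_word)
    return preferred if preferred in restorations(masked) else None


print(restorations("?eel"))
print(restore("?eel", "shoe"))
```

The same masked input yields several candidates, and the surrounding context decides which one is chosen.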
Prosody
The music of language
Prosodic factors
Affect the overall utterance meaning
Stress, intonation, tone, rate/length, pausing