Speech Recognition Flashcards
what is acoustic phonetics?
the study of the physical properties of speech sounds
what is sound?
a vibration that propagates through a medium (such as air) as an acoustic wave
(we describe a sound by how we perceive its characteristics, such as frequency and amplitude)
what is frequency?
the number of complete cycles a sound wave makes per second, measured in hertz (Hz).
what is amplitude?
the height of the wave
the taller the wave, the louder the sound
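As a rough illustration of the frequency and amplitude cards above, the sketch below builds a pure tone and measures both properties. The function names (`sine_wave`, `count_cycles`, `peak_amplitude`) and the 8000 Hz sample rate are assumptions chosen for this example, not part of the original notes.

```python
import math

def sine_wave(freq_hz, amplitude, duration_s=1.0, sample_rate=8000):
    """Sample a pure tone: amplitude * sin(2 * pi * f * t)."""
    n = int(duration_s * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]

def count_cycles(samples):
    """Count cycles as upward zero crossings (negative to non-negative)."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)

def peak_amplitude(samples):
    """Height of the wave: the largest distance from zero."""
    return max(abs(s) for s in samples)

quiet = sine_wave(freq_hz=100, amplitude=0.5)
loud = sine_wave(freq_hz=100, amplitude=1.0)

# Cycles counted over one second approximate the frequency in Hz.
print("estimated frequency:", count_cycles(quiet), "Hz")
# Same frequency, different wave height (perceived as different loudness).
print("peak amplitudes:", peak_amplitude(quiet), peak_amplitude(loud))
```

Counting zero crossings only works for a clean single tone; real speech mixes many frequencies, which is why a spectrogram is used instead.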
what is a sound spectrogram?
a visual representation of the spectrum of frequencies in a sound as it changes over time
axes of a sound spectrogram?
frequency of sound on vertical axis, time on horizontal axis, intensity shown by darkness
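The axes described above can be sketched with a small short-time Fourier transform: the signal is cut into overlapping frames (time axis), and each frame is broken into frequency bins (frequency axis) whose magnitude plays the role of darkness. This is a minimal pure-Python sketch; the names `spectrogram`, `strongest_frequency`, and `tone`, and the frame and hop sizes, are assumptions for illustration.

```python
import cmath
import math

def spectrogram(samples, sample_rate, frame_size=64, hop=32):
    """Return (times, freqs, grid); grid[t][f] is the intensity at frame t, bin f."""
    # Hann window to soften the frame edges before the transform.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    n_bins = frame_size // 2 + 1
    freqs = [k * sample_rate / frame_size for k in range(n_bins)]
    times, grid = [], []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        # Naive discrete Fourier transform of one frame (slow but dependency-free).
        row = [abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                       for n in range(frame_size)))
               for k in range(n_bins)]
        times.append(start / sample_rate)
        grid.append(row)
    return times, freqs, grid

def strongest_frequency(row, freqs):
    """Frequency of the most intense bin in one time frame (the 'darkest' point)."""
    return freqs[max(range(len(row)), key=row.__getitem__)]

def tone(freq_hz, n_samples, sample_rate):
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n_samples)]

# A 500 Hz tone followed by a 1500 Hz tone: the dark band should jump upward.
rate = 8000
signal = tone(500, 800, rate) + tone(1500, 800, rate)
times, freqs, grid = spectrogram(signal, rate)
print(strongest_frequency(grid[0], freqs), strongest_frequency(grid[-1], freqs))
```

Plotting `grid` with time on the horizontal axis and `freqs` on the vertical axis would give the familiar spectrogram picture.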
sound spectrogram formants?
dark bands (i.e., the frequencies with the most intensity)
▪ Steady-state formant (stays the same over time)
▪ Formant transition (changes over time)
problems posed for speech recognition?
- lack of invariance
- problem in speaker variability
- segmentation problem
what is lack of invariance?
no one-to-one correspondence between speech cues and perception
what is the problem of speaker variability?
People differ in their production of speech sounds, both across people and across occasions
what is the segmentation problem?
People typically do not leave breaks between words when speaking.
what is categorical perception?
We do not discriminate well between sounds within the same phonemic category
- ex: we hear a speech sound as one phoneme or another, not as something in between
modularity (revisited) and categorical perception?
- Some people have taken categorical perception as evidence for a speech perception module
- but chinchillas also show categorical perception, which suggests it is not specific to human speech
what are some speech segmentation strategies?
- possible word constraint: tendency to segment speech so that each segment is a possible word
- Bilingual speakers tend to use strategies that are consistent with their dominant language
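The possible word constraint above can be illustrated with a toy check: carving a word out of a stretch of speech is only acceptable if whatever is left over could itself be a word. The sketch below uses spelling instead of phonemes and treats "contains a vowel letter" as a crude stand-in for "possible word"; both simplifications, and the names `is_possible_word` and `segmentation_allowed`, are assumptions for illustration only.

```python
VOWELS = set("aeiou")

def is_possible_word(chunk):
    """Crude stand-in: a chunk could be a word only if it contains a vowel."""
    return any(ch in VOWELS for ch in chunk)

def segmentation_allowed(target, utterance):
    """True if carving `target` out of `utterance` leaves only possible words."""
    idx = utterance.find(target)
    if idx == -1:
        return False
    residues = [utterance[:idx], utterance[idx + len(target):]]
    # Empty residues are fine; non-empty ones must be possible words.
    return all(is_possible_word(r) for r in residues if r)

# "f" alone cannot be a word, so this segmentation is disfavoured.
print(segmentation_allowed("apple", "fapple"))
# "vuff" is not a real word but could be one, so this segmentation is allowed.
print(segmentation_allowed("apple", "vuffapple"))
```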
what does context and speech recognition involve? when are people better at identifying words?
- people are better at identifying words when presented in sentences than when presented in isolation
- speech recognition involves bottom-up and top-down processing
what are the two views of context and speech recognition?
- Autonomous view: context has an effect only after lexical access
- Interactionist view: allows context to affect earlier (lower) levels of processing
context and speech recognition examples?
- When shadowing, Ps have a tendency to correct speech errors (e.g., Marslen-Wilson & Welsh)
- Phonemic restoration effect (e.g., Warren)
- Semantic & syntactic factors in speech perception (e.g., Miller & Isard)
when shadowing, when are Ps more likely to correct speech errors?
Ps are more likely to correct when...
- the context is highly predictable (role of semantic & syntactic factors)
- the presented phoneme differs from the target phoneme by fewer distinctive features (role of bottom-up processing, e.g., gar vs. car)
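To make "differs by fewer distinctive features" concrete, the sketch below counts feature mismatches between two consonants. It assumes a simplified three-feature description (place, manner, voicing) and a small hand-picked phoneme table; the `FEATURES` table and `feature_distance` function are illustrative and not a full feature system.

```python
# Simplified consonant features (assumption: place, manner, voicing only).
FEATURES = {
    "g": {"place": "velar",    "manner": "stop",      "voiced": True},
    "k": {"place": "velar",    "manner": "stop",      "voiced": False},
    "b": {"place": "bilabial", "manner": "stop",      "voiced": True},
    "s": {"place": "alveolar", "manner": "fricative", "voiced": False},
}

def feature_distance(a, b):
    """Number of features on which phonemes a and b differ."""
    return sum(1 for name in FEATURES[a] if FEATURES[a][name] != FEATURES[b][name])

# "gar" vs. "car": /g/ and /k/ differ only in voicing, so the error is easy to restore.
print("g vs k:", feature_distance("g", "k"))
# /s/ and /g/ differ in place, manner, and voicing.
print("s vs g:", feature_distance("s", "g"))
```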
what is the phonemic restoration effect?
the illusion that a phoneme deleted from a string of speech is actually there.
(ex: a cough replacing a phoneme)
what are semantic & syntactic factors in speech perception?
Ps shadow word strings in varying degrees of background noise
- 3 types of word strings:
* grammatical
* anomalous
* ungrammatical
what is prosody?
tune and rhythm of speech
Prosodic factors in speech recognition?
- Stress
- speech rate
- characteristics of individual speakers
what is the McGurk effect?
- Hear /ba/
- See /ga/
- Both together: perceive /da/
*** Shows the importance of both visual & auditory information for speech perception
what are the two models of speech recognition?
- cohort
- TRACE
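The cohort model is only named above, so the following is a loose sketch of its core idea rather than a full implementation: the first sounds of a word activate a cohort of candidate words, and the cohort shrinks as more of the word is heard. Letters stand in for phonemes, and the lexicon and the names `cohort_trace` and `uniqueness_point` are assumptions for illustration.

```python
def cohort_trace(heard, lexicon):
    """Return (prefix, candidates) after each successive segment of `heard`."""
    steps = []
    for i in range(1, len(heard) + 1):
        prefix = heard[:i]
        steps.append((prefix, sorted(w for w in lexicon if w.startswith(prefix))))
    return steps

def uniqueness_point(word, lexicon):
    """Number of segments needed before `word` is the only candidate left."""
    for prefix, cohort in cohort_trace(word, lexicon):
        if cohort == [word]:
            return len(prefix)
    return None

# Toy lexicon (hypothetical).
lexicon = ["trespass", "tress", "trestle", "trend", "trek"]
for prefix, cohort in cohort_trace("trestle", lexicon):
    print(prefix, cohort)
print("uniqueness point:", uniqueness_point("trestle", lexicon))
```

This sketch is purely bottom-up and ignores context effects and the graded, interactive activation that TRACE emphasizes.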