speech perception Flashcards
challenges of speech perception
- no clear gaps between words
- co-articulation: the acoustic realisation of speech depends on what you’ve just said and what you are about to say > the same word can come out differently each time
- pronunciation also varies from speaker to speaker (accents, etc.)
how do we produce speech
- lungs push air up the trachea
- the moving air makes the vocal cords in the larynx vibrate
- sounds from the vocal cords are then shaped by the supralaryngeal vocal tract
labial consonants
the lips touch or come together
alveolar consonants
tongue touches the ridge just behind the teeth
velar consonants
tongue touches the back of the mouth (the velum)
stop
airflow stops completely
voiced
the vocal cords vibrate during production
unvoiced
no vibration
fricative
the vocal tract is narrowed but not completely closed, so air forced through produces friction
nasal
airflow redirected to nasal cavity
sound waves
- periodic displacement of air molecules, creating increases and decreases in air pressure
- molecules come closer together or move further apart > pressure increases and decreases
- plotting these changes in sound pressure over time gives the waveform
spectrograms
split sound into different frequencies at each moment
- amplitude at each frequency is indicated by colour
- splitting sound into frequency channels approximates the information the brain receives from the ear
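A minimal sketch of this idea, not from the lecture: scipy's short-time Fourier analysis splits a synthetic two-tone signal into frequency channels at each moment, as a spectrogram does. The sample rate, tone frequencies, and window length are arbitrary choices.

```python
# Spectrogram sketch: split a sound into frequency channels over time.
# The input is a synthetic two-tone signal; all parameters are arbitrary.
import numpy as np
from scipy.signal import spectrogram

fs = 16000                                 # sample rate (Hz)
t = np.arange(fs // 2) / fs                # half a second of sample times
# Synthetic signal: a 300 Hz tone followed by a 1200 Hz tone
sound = np.concatenate([np.sin(2 * np.pi * 300 * t),
                        np.sin(2 * np.pi * 1200 * t)])

# freqs[i] = frequency channel (Hz), times[j] = moment (s),
# power[i, j] = energy in that channel at that moment (plotted as colour)
freqs, times, power = spectrogram(sound, fs=fs, nperseg=512)

peak = freqs[power.argmax(axis=0)]         # dominant frequency per moment
print(peak[:3], peak[-3:])                 # ~300 Hz early, ~1200 Hz late
```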
source and filter theory
source: the vibrations of the vocal cords
filter: the supralaryngeal vocal tract structures that shape the sound produced by the source
source only
- from the source alone you can perhaps tell whether an utterance is a question or a statement, the speaker's gender, and whether they sound happy or sad
source AND filter
- intelligible speech
- the filter (supralaryngeal vocal tract, lips, teeth) is important for speech sounds - PHONEMES
- filtering appears as bands of energy at certain frequencies in spectrograms, called FORMANTS
lowest three formants
F1 F2 F3
these are important cues for identifying vowels
- the brain can tell which vowel it is hearing by detecting these auditory CUES
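A toy illustration of source-filter theory, not the lecture's own example: an impulse train stands in for the vocal-cord source, and second-order resonators stand in for vocal-tract formants. The F1/F2 values (700 and 1200 Hz) are rough, assumed values for an /a/-like vowel.

```python
# Toy source-filter synthesis: a glottal-like pulse train (source) shaped by
# resonators at formant frequencies (filter). All values are illustrative.
import numpy as np
from scipy.signal import lfilter

fs = 16000                                   # sample rate (Hz)
f0 = 120                                     # source: vocal-cord vibration rate (Hz)

# Source: impulse train at the fundamental frequency (vocal-cord pulses)
source = np.zeros(fs)                        # 1 second of audio
source[:: fs // f0] = 1.0

def formant(x, freq, bw):
    """Second-order resonance: a crude stand-in for one vocal-tract formant."""
    r = np.exp(-np.pi * bw / fs)             # pole radius set by bandwidth
    theta = 2 * np.pi * freq / fs            # pole angle set by centre frequency
    a = [1.0, -2 * r * np.cos(theta), r * r] # feedback (denominator) coefficients
    return lfilter([1.0 - r], a, x)

# Filter: cascade of resonators at the assumed /a/-like F1 and F2
vowel = formant(formant(source, 700, 100), 1200, 100)
# 'source' alone carries pitch; 'vowel' additionally carries formant structure
```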
formants for vowels
F1 F2 F3
formants for consonants
F2 F3
CATEGORICAL PERCEPTION: DEMONSTRATION
- a continuum of synthesised sounds: 'ba' at one end, 'da' at the other
- in the middle: sounds that are ambiguous between the two
- task: report which sound ('ba' or 'da') was heard at each step
- the results show the two signatures of categorical perception (next cards)
1st signature of categorical perception
Phoneme boundary: where participants are equally likely to respond 'ba' or 'da'
2nd signature of categorical perception
- discrimination peak near the phoneme boundary
CATEGORICAL PERCEPTION
the tendency to perceive gradual sensory changes in a discrete fashion
3 hallmarks of categorical perception
- abrupt change in identification at phoneme boundary
- discrimination peak at phoneme boundary
- discrimination predicted from identification (two sounds only seem 'different' if they are identified as different phonemes; see the sketch below)
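A small numerical sketch of the third hallmark, under the assumption that identification follows a sharp sigmoid along the continuum (the slope value is invented): two sounds are discriminable only in so far as they receive different labels, which makes predicted discrimination peak at the boundary.

```python
# Discrimination predicted from identification: two sounds only seem
# 'different' if they get different phoneme labels. Parameters are invented.
import numpy as np

steps = np.arange(1, 11)                      # 10-step 'ba'-'da' continuum
boundary, slope = 5.5, 2.0                    # boundary midway; slope assumed

# Hallmark 1: identification changes abruptly at the phoneme boundary
p_da = 1 / (1 + np.exp(-slope * (steps - boundary)))  # P(respond 'da')

# Hallmark 3: predicted discrimination of adjacent pairs =
# probability that the two stimuli receive different labels
p_diff = p_da[:-1] * (1 - p_da[1:]) + (1 - p_da[:-1]) * p_da[1:]

for pair, p in zip(zip(steps[:-1], steps[1:]), p_diff):
    print(pair, round(float(p), 2))  # Hallmark 2: peak at the 5-6 pair
```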
context effects
- speech perception depends on prior knowledge and contexts
- e.g. the McGurk effect and the Ganong effect (next cards)
McGurk effect
- lip movements are paired with a mismatching speech sound: what we hear is changed by what we see
Ganong effect
- continuum paradigm with an ambiguous /g/-/k/ sound
- the same sound is presented in a 'giss'-'kiss' continuum or a 'gift'-'kift' continuum
- the boundary shifts: listeners are biased towards whichever of 'g' or 'k' makes a real word
motor theory of speech perception
LIBERMAN
Component 1
- speech perception = the result of a specialised speech module that operates separately from the mechanisms involved in perceiving non-speech sounds
AND is UNIQUELY HUMAN
EVIDENCE: speech, but not other sounds, is perceived categorically e.g. you hear yanny OR laurel - not both
this component has basically been proven wrong (see evidence against)
motor theory of speech perception
LIBERMAN
Component 2
- The objects of speech perception are intended articulatory events rather than acoustic events
EVIDENCE: speech sounds are highly variable (co-articulation), but the intended gestures are more constant
> so we may be interpreting gestures rather than sounds
this component may still be right
motor theory fMRI evidence FOR
task: listen to meaningless monosyllables
outcome: auditory cortex is activated (sound is being processed)
BUT
motor and premotor areas are also activated, which is evidence that we interpret speech gesturally
motor theory TMS evidence FOR
- TMS over premotor areas interferes with phoneme discrimination in noise but not colour discrimination
MOTOR AREAS ARE CAUSALLY INVOLVED IN SPEECH PERCEPTION
motor theory evidence AGAINST
- categorical perception can also be demonstrated for non-speech sounds (e.g. musical intervals)
> so not a result of a specialised speech module
- with training, chinchillas show the same phoneme boundary for a da/ta continuum as humans > not uniquely human
classic model of brain basis of speech perception
- superior temporal gyrus for speech perception (Wernicke’s area)
- inferior frontal gyrus for speech production (Broca’s area)
- left hemisphere dominant
more up to date model of brain basis of speech perception: dorsal and ventral streams
- 2 streams for speech processing that are engaged in a task dependent manner
dorsal stream: mapping speech sounds onto articulatory representations - activated for tasks focusing on perception of speech sounds - e.g. phoneme perception
ventral stream:
mapping speech sounds onto lexical representations - activated for tasks focussing on comprehension e.g. word recognition
- can explain why some aphasics can’t tell apart phonemes but can recognise words and vice versa
dorsal stream - brain basis of speech perception
- mapping speech sounds onto articulatory representations
- activated for tasks focusing on perception of speech sounds - e.g. phoneme perception
- left hemisphere dominant
- Broca’s area = involved in perception NOT JUST PRODUCTION
ventral stream - brain basis of speech perception
- mapping speech sounds onto lexical representations
- activated for tasks focussing on comprehension e.g. word recognition
- bilateral - left AND right hemispheres
evidence for ventral stream processing
- anterior temporal damage associated with semantic impairment (ventral)
- inferior temporal damage associated with comprehension deficits (ventral)
evidence for dorsal stream processing
- listening to syllables activates motor and premotor areas (dorsal)
- TMS over premotor areas interferes with phoneme discrimination in noise but not colour discrimination (dorsal)
process of recognising spoken words: cohort model
- a set of word representations in your mind - what words should sound like - the LEXICON
- if you hear 'c', all words in the lexicon that start with that sound are activated
- as time goes on and you hear more and more of the word, fewer candidate words remain activated
until... the UNIQUENESS POINT:
the time-point in the speech when only one word remains consistent with the input > the word is recognised at the UP, even before the whole word has been produced
uniqueness point
the time-point in the speech when only one word remains consistent with the input > the word is recognised at the UP, even before the whole word has been produced = optimal efficiency
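A minimal sketch of finding a word's uniqueness point against a toy lexicon (the five words are invented for illustration, not from the lecture):

```python
# Cohort-style uniqueness point: the earliest prefix at which only one
# lexicon word remains consistent with the input. Toy lexicon for illustration.
LEXICON = ["captain", "captive", "capture", "cat", "candle"]

def cohort(prefix, lexicon):
    """All words still consistent with the speech input heard so far."""
    return [w for w in lexicon if w.startswith(prefix)]

def uniqueness_point(word, lexicon):
    """First position at which the cohort shrinks to just this word."""
    for i in range(1, len(word) + 1):
        if cohort(word[:i], lexicon) == [word]:
            return i
    return None  # word is a prefix of another word, so it never becomes unique

for w in LEXICON:
    print(w, uniqueness_point(w, LEXICON))
# 'captain' becomes unique at 'capta' (position 5), before the word ends
```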
key features of cohort model
- words are activated immediately upon minimal input
- multiple words are activated
- words compete for recognition - lexical competition
cohort model: evidence from shadowing task
- average response latency was 250 ms
- average word duration was 375 ms
- so listeners repeated words before hearing them in full > consistent with recognition at the uniqueness point
limitations of cohort model
- it is a verbal model
- so it is hard to evaluate precisely
- solution: implement it as a computer model (TRACE)
TRACE computer model of speech perception
- three levels of units: acoustic features > phonemes > words
- connections between levels are bi-directional and excitatory (allowing TOP-DOWN EFFECTS)
- connections within levels are inhibitory, producing competition between alternatives (sketch below)
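Not the full TRACE model, just a minimal sketch of these two connection principles (bottom-up direction only; the words, weights, and phoneme inputs are all invented for illustration):

```python
# Between-level connections are excitatory, within-level connections are
# inhibitory, so word units compete. Weights and inputs are invented.
import numpy as np

# Phoneme-level evidence for an input like 'gi...': /g/ strong, /k/ weak
phoneme_input = {"g": 0.8, "k": 0.2, "i": 1.0}

words = ["gift", "kiss"]
act = np.zeros(2)                            # word-level activations

EXCITE, INHIBIT, DECAY = 0.3, 0.4, 0.1

for step in range(10):
    # Between-level excitation: each word gains support from its phonemes
    bottom_up = np.array([phoneme_input["g"] + phoneme_input["i"],   # gift
                          phoneme_input["k"] + phoneme_input["i"]])  # kiss
    # Within-level inhibition: each word suppresses its competitor
    act = np.clip(act + EXCITE * bottom_up
                  - INHIBIT * act[::-1] - DECAY * act, 0, 1)

print(dict(zip(words, act.round(2))))        # 'gift' wins the competition
```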
TRACE model: evidence from eye tracking
- task: take one item and move it to a different location
- as more of the spoken word is revealed, participants increasingly look at the named object, and even at objects with similar-sounding (e.g. rhyming) names
- implementing this in the computer model produces similar curves, so TRACE does a good job of capturing the dynamics of human word recognition
TRACE and context
- for the gift/kift and kiss/giss continua
- if you hear an ambiguous sound in the context of the rest of a word, you are biased towards the interpretation you recognise as a real word > given '-ift', 'g' (gift) is preferred; given '-iss', 'k' (kiss) is preferred