W3: speech perception, production and errors Flashcards
What is speech?
A continuous stream of sound - there are no gaps between words, yet we can still understand it and link it to meaning
Explain phonemes in speech?
They sound different in different contexts
Can vary due to: Loudness, excitement and the sounds of the words around it
What is the segmentation problem?
There are no clear boundaries between words - they blur and things can come out sounding very different
E.g. Isle of view - I love you
Mister Abbot - Mr Rabbit
How do we solve the segmentation problem?
Possible word constraint: we like to segment speech so that it maps onto whole, possible words
Meaning constraints: we also prefer the mapping to make sense
E.g. we hear ‘as Princess Diana says’, not ‘aspirin says Diana says’
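The possible word constraint can be sketched as a search for ways of cutting an unbroken stream into lexicon entries only. This is a toy illustration rather than a model from the lecture: the `LEXICON` set, the `segmentations` function and the use of letters in place of phonemes are all assumptions made for the example.

```python
# Toy lexicon (an assumption for illustration); letters stand in for phonemes.
LEXICON = {"i", "love", "you", "isle", "of", "view", "a", "an", "ice", "nice", "man"}


def segmentations(stream, lexicon=LEXICON):
    """Return every way to split `stream` into words that are all in `lexicon`."""
    if not stream:
        return [[]]  # one way to segment nothing: the empty parse
    results = []
    for end in range(1, len(stream) + 1):
        word = stream[:end]
        if word in lexicon:
            # Keep this word only if the remainder can also be fully segmented.
            for rest in segmentations(stream[end:], lexicon):
                results.append([word] + rest)
    return results


print(segmentations("iloveyou"))  # a single whole-word parse
print(segmentations("aniceman"))  # ambiguous: two whole-word parses
```

When more than one parse survives, as with "a nice man" and "an ice man", the possible word constraint alone cannot decide, which is where the meaning constraint would come in.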
What is the invariance problem?
Phonemes are not always pronounced/perceived the same way in every context
They vary according to: the surrounding sounds, speech speed, speaker accent and speech formality
Explain assimilation of the invariance problem
Sounds take on some of their neighbours’ properties
- ‘o’ is not normally a nasal sound but is nasalised (air comes out of your nose) when next to nasal consonants (song, gone)
Explain co-articulation effects of the invariance problem
Sounds can be produced more quickly/easily (modified to fit with the next sound along)
The listener gets a clue as to what sound is coming next
Co-articulating - producing together
E.g. Thompson (the ‘p’ came from the sound made in the transition between ‘Thom’ and ‘son’, and was eventually added to the spelling)
Explain allophones in the invariance problem
Variants of the same phoneme that are pronounced slightly differently but do not contribute to differences in meaning
Explain the categorical perception of phonemes
Despite many variations in phonemes, we only ever perceive speech sounds as one phoneme or the other
How do we categorise stop consonants?
It depends on the voice onset time (VOT)
Some consonants you can say for as long as you like (mmmm, ssss) because there is still air coming out, but with stop consonants you completely stop the air (p, b, d, t)
When you produce a stop consonant, what is the VOT?
The time between the ‘burst’ (when you release the air) and the start of voicing (when your vocal cords start to vibrate)
VOT is the only difference between perceiving..
p or b
t or d
k or g
The only difference between the sounds in each pair is how soon your vocal cords start vibrating after you have released the air
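Categorical perception along the VOT continuum can be sketched as a simple threshold rule. The 30 ms boundary below is an illustrative assumption, not a figure from these notes (real boundaries vary with place of articulation, speaker and language), and `perceive_stop` and `PAIRS` are hypothetical names.

```python
# Illustrative category boundary in milliseconds (an assumption for this sketch).
BOUNDARY_MS = 30

# Voiced / voiceless pairs named on the card.
PAIRS = {"bilabial": ("b", "p"), "alveolar": ("d", "t"), "velar": ("g", "k")}


def perceive_stop(place, vot_ms, boundary_ms=BOUNDARY_MS):
    """Map a continuous VOT value onto one of two phoneme categories."""
    voiced, voiceless = PAIRS[place]
    # Short lag between burst and voicing -> voiced; long lag -> voiceless.
    return voiced if vot_ms < boundary_ms else voiceless


# VOT changes in equal steps, but the percept flips only once, at the boundary.
for vot in range(0, 70, 10):
    print(vot, perceive_stop("bilabial", vot))
```

The input varies continuously, yet the output is always one category or the other, never something in between.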
Explain context in understanding words
Single words alone are much harder to understand than if they were in a sentence - especially if there is a noisy background
The perception of words in speech is influenced by higher-level knowledge of semantics and syntax
Top-down processing in the understanding of spoken words?
Context and our expectations help us to decode and understand the words
Explain a study on how top-down processing helps to restore phonemes
Warren and Warren
- Presented participants with a word in which one phoneme was masked by a cough (*), each time in a different sentence
E.g. It was found that the *eel was on the shoe
It was found that the *eel was on the orange
- People didn’t realise the phoneme was missing; they restored it in their mind and thought they had heard it
What is the process you go through for every word you hear?
E.g. ‘can’: sensory input reaches your lexicon
Selection phase: ‘can’ is activated in your lexicon more than other similar words
Recognise the word
Lexical access - what does ‘can’ mean?
Then integrate it into the sentence
What are the 4 models of speech perception?
- Template matching
- Analysis by synthesis
- Cohort model
- TRACE model
What is the template matching model? Are there any critiques?
Every word we hear is stored as a template in our lexicon
- when we hear a word we match it to our mental template
= recognition
BUT too much variation in speech for this to be plausible (so much variance in pronunciations, even accents - different versions that would need to be stored)
What is the analysis by synthesis model of speech perception? Any strengths and weaknesses?
Motor theory
We interpret the speech we hear by matching it to how we would produce speech ourselves
It accounts for speaker differences, unlike the template-matching model - if someone pronounces something differently from us, we work out how we would produce it ourselves in order to understand it
HOWEVER, it does not explain how articulated sounds are turned into the heard target
If someone says something completely weird and unexpected, we can still understand it (perception is driven by data, not hypothesis)
Is there any evidence of motor processes in speech perception?
When listening to others speak, brain imaging shows that the motor cortex is activated
What is the cohort model of speech perception?
When we hear speech we are setting up a cohort of possible words to decide what we heard
- items are eliminated from this cohort until there is only one word left which is assumed to be the word heard
What are the 3 stages of the cohort model of speech perception?
Access stage: when you hear a word it activates a set of words (cohort)
Selection stage: one item chosen from the cohort
Integration stage: the word’s syntactic and semantic properties are used to integrate the word into the sentence
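The access and selection stages can be sketched as repeated filtering of a candidate list as more of the word is heard. This is a minimal illustration, assuming a toy `LEXICON` and letters in place of phonemes; `cohort_trace` is a hypothetical name, and the integration stage is not modelled.

```python
# Toy lexicon (an assumption); letters stand in for phonemes.
LEXICON = ["can", "candle", "candy", "cap", "cat", "dog"]


def cohort_trace(heard, lexicon=LEXICON):
    """Return the cohort remaining after each successive segment of `heard`."""
    cohort = list(lexicon)
    history = []
    for i in range(1, len(heard) + 1):
        prefix = heard[:i]
        # Eliminate every candidate that no longer matches the input so far.
        cohort = [word for word in cohort if word.startswith(prefix)]
        history.append((prefix, cohort))
    return history


for prefix, cohort in cohort_trace("candl"):
    print(prefix, cohort)
```

With this lexicon the cohort shrinks from five words after "c" to the single word "candle" after "candl", the point at which selection can happen.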
What is the TRACE model of speech perception?
Connectionist model - interactive
Context of word and sentence can facilitate the perception of individual sounds
Processing occurs through excitatory and inhibitory connections between processing units called nodes
NODES:
- each node has a resting level and a threshold so when you perceive distinctive features, phonemes and whole words, nodes become activated
- If it gets above the threshold, it is considered for matching the input and may excite or inhibit other nodes
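The node behaviour described above can be sketched with a very small network. This is not the TRACE implementation: the weights, the clamping of activation to the range -1 to 1, the missing decay and the missing top-down feedback are all simplifying assumptions, and `Node`, `step` and `run_can_example` are hypothetical names.

```python
class Node:
    def __init__(self, name, resting=0.0, threshold=0.5):
        self.name = name
        self.resting = resting
        self.threshold = threshold
        self.activation = resting  # every node starts at its resting level
        self.links = []            # (target node, weight) pairs

    def connect(self, target, weight):
        # weight > 0 is an excitatory connection, weight < 0 an inhibitory one
        self.links.append((target, weight))

    def above_threshold(self):
        return self.activation > self.threshold


def step(nodes):
    """One cycle: only above-threshold nodes excite or inhibit their targets."""
    deltas = {node: 0.0 for node in nodes}
    for node in nodes:
        if node.above_threshold():
            for target, weight in node.links:
                deltas[target] += weight * node.activation
    for node in nodes:
        # Clamp so activation stays within a fixed range (an assumption).
        node.activation = max(-1.0, min(1.0, node.activation + deltas[node]))


def run_can_example(cycles):
    """Hear the phonemes of 'can' and let 'can' and 'cap' compete."""
    k, a, n = Node("k"), Node("a"), Node("n")
    can, cap = Node("can"), Node("cap")
    for phoneme in (k, a):          # shared phonemes excite both words
        phoneme.connect(can, 0.25)
        phoneme.connect(cap, 0.25)
    n.connect(can, 0.25)            # the final phoneme supports only 'can'
    can.connect(cap, -1.0)          # competing words inhibit each other
    cap.connect(can, -1.0)
    for phoneme in (k, a, n):
        phoneme.activation = 1.0    # these phonemes were heard
    nodes = [k, a, n, can, cap]
    for _ in range(cycles):
        step(nodes)
    return {node.name: node.activation for node in nodes}


print(run_can_example(3))
```

After the first cycle only ‘can’ is above threshold, so it starts to inhibit ‘cap’, whose activation then falls while ‘can’ stays at the maximum.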
Which of the speech recognition models focuses on matching speech signals to phonetic segments?
Analysis by synthesis - motor theory