W3: speech perception, production and errors Flashcards
What is speech?
A continuous stream of sound with no gaps between words, yet we can still understand it and link it to meaning
Explain phonemes in speech?
They sound different in different contexts
They can vary with loudness, excitement and the sounds of the surrounding words
What is the segmentation problem?
There are no clear boundaries between words: they blur together, so the same stretch of speech can be heard in very different ways
E.g. Isle of view - I love you
Mister Abbot - Mr Rabbit
How do we solve the segmentation problem?
Possible word constraint: we prefer to segment speech so that every part maps onto a whole, possible word, with no leftover sounds that could not be a word
Meaning constraints: we also prefer the mapping to make sense
E.g. ‘aspirin says Diana says’ vs ‘as Princess Diana says’
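A minimal Python sketch of the possible word constraint, assuming a toy lexicon of ASCII pseudo-phonemic forms in which letters stand in for sounds. `SOUND_LEXICON` and `segmentations` are hypothetical names. The sketch lists every way to split an unbroken stream into whole words and rejects any split that leaves a non-word fragment; the ‘ice cream’ / ‘I scream’ ambiguity is used because it splits cleanly in this simplified notation.

```python
from functools import lru_cache

# Hypothetical toy lexicon: ASCII pseudo-phonemic forms -> written words.
SOUND_LEXICON = {
    "ai": "I",
    "ais": "ice",
    "skriim": "scream",
    "kriim": "cream",
}


def segmentations(stream: str) -> list[list[str]]:
    """Return every split of an unbroken sound stream into whole lexicon words.

    Splits that would leave a chunk which is not a possible word are dropped,
    which is the possible word constraint in its simplest form.
    """

    @lru_cache(maxsize=None)
    def parse(start: int) -> tuple[tuple[str, ...], ...]:
        if start == len(stream):
            return ((),)  # the empty remainder has exactly one (empty) parse
        parses = []
        for end in range(start + 1, len(stream) + 1):
            word = SOUND_LEXICON.get(stream[start:end])
            if word is not None:
                # Only carry on if this chunk is a whole word.
                for rest in parse(end):
                    parses.append((word,) + rest)
        return tuple(parses)

    return [list(p) for p in parse(0)]


print(segmentations("aiskriim"))  # [['I', 'scream'], ['ice', 'cream']]
```

Both splits survive because each consists only of whole words. Choosing between them is the job of the meaning constraint, which this sketch does not model.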
What is the invariance problem?
Phonemes are not always pronounced/perceived the same way, in every context
They vary according to: the surrounding sounds, speech speed, speaker accent and speech formality
Explain assimilation in the invariance problem
Sounds take on some of their neighbours’ properties
- ‘o’ is not normally a nasal sound but is nasalised (air comes out through the nose) next to nasal consonants (song, gone)
Explain co-articulation effects in the invariance problem
Sounds are modified to fit with the next sound along, so they can be produced more quickly and easily
The listener gets a clue as to what sound is coming next
Co-articulating - producing together
E.g. Thompson (the p came from the sound made in the transition between ‘Thom’ and ‘son’, and was eventually written into the spelling)
Explain allophones in the invariance problem
Different pronunciations of the same phoneme that do not change meaning, e.g. the aspirated p in ‘pin’ vs the unaspirated p in ‘spin’
Explain the categorical perception of phonemes
Despite continuous variation in how phonemes are produced, we only ever perceive a speech sound as one phoneme or the other, never as something in between
How do we categorise stop consonants?
It depends on the voice onset time (VOT)
Some consonants can be held for as long as you like (mmmm, ssss) because air keeps flowing, but stop consonants (p, b, d, t) completely block the airflow for a moment
When you produce a stop consonant, what is the VOT?
The time between the ‘burst’ (when the air is released) and the start of voicing (when the vocal cords start vibrating)
VOT timing is the only difference between perceiving:
p or b
t or d
k or g
The only difference between these sounds is when the vocal cords start vibrating after the air has been released
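The labelling side of categorical perception can be sketched as a single cut-off on a continuous VOT scale. The 25 ms boundary, the `perceive_stop` name and the use of one boundary for all three pairs are assumptions for illustration only; real boundaries differ by place of articulation, speaker and language.

```python
# Voiced/voiceless stop pairs from the flashcard above.
STOP_PAIRS = {
    "bilabial": ("b", "p"),
    "alveolar": ("d", "t"),
    "velar": ("g", "k"),
}

# ASSUMPTION: one illustrative category boundary shared by every pair.
VOT_BOUNDARY_MS = 25.0


def perceive_stop(vot_ms: float, place: str = "bilabial",
                  boundary_ms: float = VOT_BOUNDARY_MS) -> str:
    """Map a continuous VOT value onto one of two phoneme categories."""
    voiced, voiceless = STOP_PAIRS[place]
    # Voicing starts soon after the burst -> voiced; a longer lag -> voiceless.
    return voiced if vot_ms < boundary_ms else voiceless


for vot in [0, 10, 20, 30, 40, 60]:
    print(f"VOT {vot:>2} ms -> /{perceive_stop(vot)}/")
```

The 20 ms and 30 ms tokens are only 10 ms apart yet get different labels, while 30 ms and 60 ms are both /p/: a continuous input, but only two perceived categories.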
Explain context in understanding words
Single words heard alone are much harder to understand than the same words in a sentence, especially against a noisy background
The perception of words in speech is influenced by higher-level knowledge of semantics and syntax
What is top-down processing in the understanding of spoken words?
Our context and expectations help us to decode the words we hear
Explain a study on how top-down processing helps to restore phonemes
Warren and Warren
- Participants heard sentences in which one phoneme of a word had been replaced by a cough, with the word placed in a different sentence each time
E.g. It was found that the *eel was on the shoe (heard as ‘heel’)
It was found that the *eel was on the orange (heard as ‘peel’)
- People did not realise the phoneme was missing: they restored it in their minds and thought they had heard it
What is the process you go through for every word you hear?
Sensory input: you hear ‘can’ and the input reaches your lexicon
Selection: ‘can’ is activated in your lexicon more strongly than other similar words
Recognition: the word is identified
Lexical access: what does ‘can’ mean?
Integration: the word is integrated into the sentence
What are the 4 models of speech perception?
- Template matching
- Analysis by synthesis
- Cohort model
- TRACE model
What is the template matching model? Are there any critiques?
Every word we know is stored as a template in our lexicon
- When we hear a word we match it to our mental template
= recognition
BUT there is too much variation in speech for this to be plausible: pronunciations vary so much (even across accents) that many different versions of each word would need to be stored
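The critique can be made concrete with an exact-match lookup. This is a hypothetical sketch: letters stand in for phonemes, and `TEMPLATES` and `template_match` are illustrative names, assuming one stored pronunciation per word.

```python
from typing import Optional

# Hypothetical template store: one stored pronunciation per word.
TEMPLATES = {
    "kan": "can",
    "kat": "cat",
}


def template_match(heard: str) -> Optional[str]:
    """Recognise a word only if the input exactly matches a stored template."""
    return TEMPLATES.get(heard)


print(template_match("kan"))   # can
print(template_match("kaan"))  # None: a drawn-out variant has no template
```

Every new accent, speaking rate or context would need its own entry for this to work, which is the plausibility problem the critique points to.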
What is the analysis by synthesis model of speech perception? Any strengths and weaknesses?
Motor theory
We interpret the speech we hear by matching it to how we would produce speech ourselves
Unlike the template-matching model, it accounts for speaker differences: if someone pronounces something differently from us, we work out how we would pronounce it ourselves in order to understand it
HOWEVER, it does not explain how articulated sounds are turned into the heard target
We can still understand someone who says something completely strange and unexpected, so perception seems to be driven by the data, not by hypotheses
Is there any evidence of motor processes in speech perception?
When listening to others speak, brain imaging shows that the motor cortex is activated
What is the cohort model of speech perception?
When we hear speech we set up a cohort of possible words to decide what we heard
- Items are eliminated from this cohort as more of the word is heard, until only one word is left, which is taken to be the word heard
What are the 3 stages of the cohort model of speech perception?
Access stage: when you hear a word it activates a set of words (cohort)
Selection stage: one item is chosen from the cohort
Integration stage: the word’s syntactic and semantic properties are used to integrate it into the sentence
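A minimal sketch of the access and selection stages, assuming a hypothetical six-word lexicon in which letters stand in for phonemes. `cohort_steps` and `recognition_point` are illustrative names, not part of any published implementation. The first sound sets up the initial cohort (access); each later sound eliminates words that no longer match (selection). Integration is not modelled.

```python
from typing import Optional

# Hypothetical mini-lexicon; letters stand in for phonemes (an assumption).
LEXICON = ["can", "candle", "candy", "cap", "cat", "dog"]


def cohort_steps(heard: str, lexicon: list[str]) -> list[tuple[str, list[str]]]:
    """Return the cohort left after each successive sound of `heard`."""
    steps = []
    for i in range(1, len(heard) + 1):
        prefix = heard[:i]
        # i == 1 is the access stage (initial cohort); later steps are
        # selection, where words that no longer match the input drop out.
        cohort = [w for w in lexicon if w.startswith(prefix)]
        steps.append((prefix, cohort))
    return steps


def recognition_point(heard: str, lexicon: list[str]) -> Optional[tuple[str, int]]:
    """Return (word, sounds heard so far) once only one candidate remains."""
    for prefix, cohort in cohort_steps(heard, lexicon):
        if len(cohort) == 1:
            return cohort[0], len(prefix)
    return None


for prefix, cohort in cohort_steps("candle", LEXICON):
    print(prefix, cohort)
print(recognition_point("candle", LEXICON))  # ('candle', 5)
print(recognition_point("can", LEXICON))     # None
```

‘candle’ is identified after five sounds, before the word is complete. ‘can’ is never narrowed to a single candidate because ‘candle’ and ‘candy’ still match when the input ends; this sketch cannot resolve that without following input or context.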
What is the TRACE model of speech perception?
Connectionist model - interactive
Word and sentence context can facilitate the perception of individual sounds
Processing occurs through excitatory and inhibitory connections between processing units called nodes
NODES:
- Each node has a resting level and a threshold; nodes for distinctive features, phonemes and whole words become activated as those units are perceived
- If a node’s activation rises above its threshold, it is considered for matching the input and may excite or inhibit other nodes
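The node dynamics described above can be sketched as a much-simplified interactive activation loop. This is not TRACE itself: it models only word nodes (no feature or phoneme nodes, no time slices, no top-down feedback), letters stand in for phonemes, and every parameter value and name (`run_word_level`, `EXCITE`, `INHIBIT`, `DECAY`, `THRESHOLD`) is hypothetical. Each word node starts at its resting level, gains excitation for every input phoneme it matches in position, is inhibited by active competitors, and decays back towards rest.

```python
# Hypothetical parameters, chosen only to make the dynamics visible.
MAX_ACT, MIN_ACT, REST = 1.0, -0.2, 0.0
EXCITE, INHIBIT, DECAY = 0.1, 0.3, 0.1
THRESHOLD = 0.5


def run_word_level(heard: str, words: list[str], cycles: int = 15):
    """Simplified word-level interactive activation (not the full TRACE model)."""
    act = {w: REST for w in words}  # every node starts at its resting level
    for _ in range(cycles):
        new_act = {}
        for w in words:
            # Bottom-up excitation: one unit per input phoneme matching in position.
            matches = sum(1 for p, q in zip(w, heard) if p == q)
            # Lateral inhibition: only positively active competitors inhibit.
            competition = sum(max(act[o], 0.0) for o in words if o != w)
            net = EXCITE * matches - INHIBIT * competition
            a = act[w]
            # Excitation pushes towards MAX_ACT, inhibition towards MIN_ACT.
            delta = net * (MAX_ACT - a) if net > 0 else net * (a - MIN_ACT)
            # Decay pulls the node back towards its resting level.
            new_act[w] = a + delta - DECAY * (a - REST)
        act = new_act  # all nodes update together (synchronous update)
    above_threshold = [w for w in words if act[w] > THRESHOLD]
    return act, above_threshold


activations, recognised = run_word_level("cat", ["cat", "cap", "can"])
print(recognised)  # ['cat']
```

With these settings ‘cat’ matches all three input sounds and climbs above threshold, while ‘cap’ and ‘can’ match only two, are pushed down by the active ‘cat’ node and stay below threshold.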
Which of the speech recognition models focuses on matching speech signals to phonetic segments?
Analysis by synthesis - motor theory
Which of the speech recognition models describe the recognition of auditory words? (2)
Cohort model
TRACE model
What is the subglottal system and what is its role?
Providing air for speech
Lungs and associated muscles
Trachea
What is the role of the larynx in speech production?
Used for phonation: making sounds by vibrating the vocal folds
What is the role of the vocal tract in speech production?
The moving and non-moving parts determine how sounds are articulated
Explain the production of vowels and consonants?
Vowels: don’t obstruct the passage of air
Consonants: do obstruct the passage of air
What are pure vowels?
They have just one vowel sound
Bit, bat, bet, bomb, boot
What are diphthongs?
Two vowels produced in a smooth glide that moves from one vowel to the other
Bake, bike
What 3 things can you vary to produce vowels?
Tongue height - raised or lowered
Tongue position - front or back (which part of the tongue is most involved)
Lip position - rounded or unrounded
Explain the phonation process
The vocal folds repeatedly open and close, releasing puffs of air that flow up through the oral cavity
Voiced/phonated sounds are produced by this vocal fold vibration; unvoiced sounds are produced without it
Speech errors are:
Quite common
What are some types of speech errors? (2)
Spoonerisms: swapping word beginnings
Freudian slips: errors showing true thoughts
Why are speech errors interesting for linguistics?
They tell us how speech is represented in the brain
If you can make an error involving a unit (a sound, a word), that unit must be represented in the brain
How do you collect errors for research?
Recorded corpora (large amounts of recorded speech, e.g. radio audio) capture errors as they occur naturally, BUT you have no control over what may be causing the errors
Experiments
The researcher tries to elicit speech errors by making the task difficult or stressful, BUT the tasks may be too artificial
What errors do we make with phonetic segments?
Voicing and nasality
What phoneme errors do we make?
Anticipation errors: Later sounds come earlier than intended
Perseveration errors: sounds produced earlier reappear later on
Exchange errors: units of varying sizes change places (your shrine spinks)
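A spoonerism-style exchange error can be sketched as swapping the initial consonant clusters of two words. This assumes the intended phrase behind ‘your shrine spinks’ was ‘your spine shrinks’ (the flashcard gives only the error), and uses written vowel letters as a crude stand-in for vowel sounds; `split_onset` and `exchange_onsets` are hypothetical names.

```python
VOWEL_LETTERS = set("aeiou")  # crude stand-in for vowel sounds (assumption)


def split_onset(word: str) -> tuple[str, str]:
    """Split a word into its initial consonant cluster and the rest."""
    i = 0
    while i < len(word) and word[i] not in VOWEL_LETTERS:
        i += 1
    return word[:i], word[i:]


def exchange_onsets(first: str, second: str) -> tuple[str, str]:
    """Simulate an exchange error: the two word onsets swap places."""
    onset1, rest1 = split_onset(first)
    onset2, rest2 = split_onset(second)
    return onset2 + rest1, onset1 + rest2


print(exchange_onsets("spine", "shrinks"))  # ('shrine', 'spinks')
```

The unit exchanged here is a consonant cluster; exchange errors can also involve larger units such as whole words, which this sketch does not cover.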
Explain Garrett’s model of speech production
Conceptualisation: pre-verbal message
Formulation: translate it into linguistic form (the words you need, the order they need to go in (syntax) and their sounds)
Execution: detailed phonetic and articulatory planning, then articulation (saying it)
What does Garrett’s model suggest are the two stages of syntactic planning?
Functional level:
- content words are selected
- specify meaning
- words assigned to syntactic roles
Positional level:
- function words are selected (a, the)
- words are put in the correct order
Sentence frames?
We specify a syntactic frame for our sentence - with slots for content words
What provides evidence towards the two levels of syntactic planning (functional and positional levels)?
Words only get exchanged if they’re involved at the same level of processing
- Content and function words never change places
- Words can swap places across phrases but always keep their syntactic class (a noun remains a noun)
- Sounds only change places over small distances
What shows that speech production isn’t purely serial, as Garrett suggested?
Word-blend errors
Two words are retrieved from the lexicon simultaneously, which suggests retrieval happens in parallel
E.g. ‘I expect’ + ‘I suppose’ = ‘I expose’
Phrase-blend errors
Whole phrases seem simultaneously activated and cross over where they seem the most alike
‘I am making a cup of tea’ & ‘I am putting the kettle on’
= ‘I am making the kettle on’
Both phrases were activated and planned; their ‘-ing’ verbs (making/putting) match, and this is where the switch happens
Non-plan internal errors
You are talking about one thing while thinking about something else, and those thoughts intrude on the words and change what you say
Environmental contamination
Something in the environment changes an individual sound in what you are saying
E.g. trying to say ‘it has been raining for a while’, but as you say it someone runs towards you and you say ‘it has been running for a while’