computational models of speech perception Flashcards
challenges for lexical access
- speech is a continuous stream
- homonyms
–> spelt and said the same but different meanings
- homophones
–> spelt differently, said the same, different meaning
- co-articulation
- different accents
- invariance problem
–> it is hard to define fixed acoustic properties for units of speech (e.g. phonemes, syllables, words)
- ambiguity in word boundaries
–> e.g. four candles vs fork handles
ways to disambiguate the speech stream
- categorical perception
- voice onset time
- perceptual learning
- top down processing
categorical perception
ability to distinguish between sounds on a continuum based on voice onset times
voice onset time
the time between the release of a consonant and the onset of vocal cord vibration
–> e.g. voiced ‘VVVVa’ vs voiceless ‘FFa’
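A minimal sketch of categorical perception along a voice onset time continuum. It assumes a /b/-/p/ contrast and a category boundary of 25 ms; both are illustrative assumptions, not values from these cards, and real boundaries vary by language and speaker.

```python
# Assumed category boundary in milliseconds (illustrative only).
VOT_BOUNDARY_MS = 25.0

def categorise_stop(vot_ms, boundary_ms=VOT_BOUNDARY_MS):
    """Label a sound as voiced 'b' or voiceless 'p' from its voice onset time."""
    return "b" if vot_ms < boundary_ms else "p"

# Equal physical steps along the continuum...
continuum = [0, 10, 20, 30, 40, 50]
# ...are heard as only two categories, with a sharp switch at the boundary.
labels = [categorise_stop(v) for v in continuum]
print(labels)
```

The point of the sketch is that the 20 ms and 30 ms items differ by the same amount as the 0 ms and 10 ms items, yet only the first pair receives different labels.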
perceptual learning
- adjust categorical perception based on sounds we hear
- seems to be hard wired
–> babies and primates can do this
top down processing
- e.g. “The state governors met with their respective legislatures convening in the capital city.”
- cough disguising one sound in “legislatures”
- top down processing allows us to recognise the word, even though we cannot identify which sound was blocked by the cough
spreading activation
- facilitates predictions of what may be coming up next via activation of items that are related to the acoustic input
- when hearing ‘apple’
–> appeal
–> apron
–> appear
- all may be activated in the lexicon
what lexical characteristics affect the speed of lexical access?
- word length
–> long words are slower to process
- neighbourhood density
–> words with lots of neighbours are processed more slowly
- frequency
–> the more frequently a word is accessed in the lexicon, the quicker you can access it
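Neighbourhood density can be sketched as a count of the words that differ from the target by one sound. The version below is a rough illustration that uses letters as a stand-in for phonemes and a made-up toy lexicon; a neighbour is assumed to be any word one substitution, insertion or deletion away.

```python
def is_neighbour(a, b):
    """True if b differs from a by exactly one substitution, insertion or deletion."""
    if a == b:
        return False
    if len(a) == len(b):
        # Same length: exactly one position may differ.
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    shorter, longer = sorted((a, b), key=len)
    # Lengths differ by one: removing one symbol from the longer must give the shorter.
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))

def neighbourhood_density(word, lexicon):
    """Number of words in the lexicon that are neighbours of `word`."""
    return sum(is_neighbour(word, other) for other in lexicon)

# Hypothetical toy lexicon (letters stand in for phonemes).
toy_lexicon = ["cat", "cap", "cot", "bat", "cart", "at", "dog"]
density = neighbourhood_density("cat", toy_lexicon)
print(density)
```

Here ‘cat’ has five neighbours (cap, cot, bat, cart, at), so on the account above it would face more competition than a word with none.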
lexical access is based on?
- bottom up processing (acoustic input)
- top down processing (disambiguating the speech stream)
- lexical characteristics
- context
- spreading activation that facilitates predictions
mechanics of lexical access (3 options)
- gradually activate the word that matches the acoustic input
- activate all words that start with the same sound as the acoustic input and gradually de-activate words that no longer match the sounds
- gradually activate the word that matches the acoustic input more than other words
option 1 for lexical access
- e.g. word = apricot
–> we break up each sound
–> ay - pri - cot
option 2 for lexical access
- e.g. word = apricot
- as soon as we hear ‘ay’ we might think of:
–> april
–> ape
–> apricot
- as we continue to hear the word we de-activate words that no longer match
–> after ay - pri is heard ‘ape’ is de-activated
–> after ay - pri - cot is heard ‘april’ is de-activated
option 3 for lexical access
- e.g. word = apricot
- when we hear ‘ay’
- we might activate:
–> ape
–> apricot
–> april
–> say
–> pay
- as we hear more of the word, we activate the word that matches the input more than others
–> e.g. apricot more than say
2 models of speech comprehension
- The Cohort Model
–> Marslen-Wilson (1987)
–> predicts that we access words in the lexicon via activation of all words sharing initial features and gradually de-activate words that stop matching the features (option 2)
- The TRACE Model
–> McClelland & Elman (1986)
–> predicts that features activate phonemes, which in turn activate words; activation gradually increases for words that match the features, and the word with the most activation wins (option 3)
The Cohort Model
- we activate words in the mental lexicon (the cohort) that match the input
–> e.g. the ‘ay’ in apricot can activate apple, apart, apricot, apex, april, ape
- we then gradually de-activate items that fail to match the input
- we reach a uniqueness point (we now know the word - only one word)
- items that do not match the onset of the word (‘ay’ in apricot) are not activated
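The elimination process and the uniqueness point can be sketched as repeated prefix filtering. This is a simplified illustration, not the published model: letters stand in for sounds, and the toy lexicon and function names are assumptions made for the example.

```python
def cohort_after(heard, lexicon):
    """Words still in the cohort once `heard` has been perceived from word onset."""
    return [w for w in lexicon if w.startswith(heard)]

def uniqueness_point(word, lexicon):
    """Number of segments after which only `word` remains, or None if never unique."""
    for n in range(1, len(word) + 1):
        if cohort_after(word[:n], lexicon) == [word]:
            return n
    return None

# Hypothetical toy lexicon (letters stand in for phonemes).
toy_lexicon = ["apricot", "april", "ape", "apex", "apple"]

print(cohort_after("ap", toy_lexicon))          # all five words still match
print(cohort_after("apr", toy_lexicon))         # only apricot and april survive
print(uniqueness_point("apricot", toy_lexicon))
print(uniqueness_point("ape", toy_lexicon))
```

In this toy lexicon ‘apricot’ becomes unique at its fifth segment, while ‘ape’ never does, because ‘apex’ still matches when ‘ape’ ends. Short words embedded in longer ones are a known difficulty for a purely onset-driven account.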
neighbourhood effects in The Cohort Model
- words that match the acoustic input compete for activation
–> neighbours compete with each other for recognition
- learning the word ‘aprikol’ will slow down the recognition of the word apricot
frequency effects in The Cohort Model
- words with high frequency have high resting states
–> less activation required to recognise high frequency words
–> e.g. apricot would be recognised more quickly (ms) than a low frequency word (aprikol)
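The idea of resting states can be sketched by counting how many cycles of input a word needs before it crosses a recognition threshold. The resting levels, gain and threshold below are arbitrary illustrative numbers, not values from the model or from any experiment.

```python
def cycles_to_threshold(resting, gain=10, threshold=100):
    """Count how many input cycles are needed for activation to reach threshold."""
    activation, cycles = resting, 0
    while activation < threshold:
        activation += gain  # each cycle of matching input adds a fixed amount
        cycles += 1
    return cycles

# Assumed resting levels: a high frequency word starts closer to threshold.
resting_levels = {"apricot": 60, "aprikol": 20}

for word, resting in resting_levels.items():
    print(word, cycles_to_threshold(resting))
```

With these assumed values the high frequency word needs 4 cycles and the low frequency word needs 8, which is the direction of the frequency effect described above.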
evidence of The Cohort Model
- Gating experiments
–> Ps are presented with fragments of words that gradually reveal the whole word and asked to guess what the word is after each presentation
–> e.g. John went to the zoo and saw a ‘ca…’
–> cap
–> cat
–> kangaroo
- gradually more of the word after ‘ca’ is presented to help guess the word
early findings of Gating Paradigm (Grosjean, 1980)
- presented the word stretcher using the paradigm
what do Gating experiments suggest?
- recognition of a word is a gradual process that starts from word onset and continues until the end of the word
- candidate words that no longer fit the acoustic input are eliminated
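A rough sketch of how gated stimuli could be constructed and how the set of plausible guesses shrinks at each gate. Real gating studies cut the audio by duration; here letters stand in for stretches of speech, and the lexicon is a made-up example.

```python
def gates(word, step=1):
    """Successively longer fragments of `word`, always ending with the full word."""
    cuts = list(range(step, len(word), step)) + [len(word)]
    return [word[:c] for c in cuts]

# Hypothetical toy lexicon of possible guesses (letters stand in for sounds).
toy_lexicon = ["cap", "cat", "captain", "cot"]

for fragment in gates("cat"):
    # Guesses consistent with what has been presented so far.
    guesses = [w for w in toy_lexicon if w.startswith(fragment)]
    print(fragment, guesses)
```

Each later gate can only keep or remove candidates, never add them, which mirrors the gradual elimination that gating results are taken to show.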
architecture of The Cohort Model
- facilitatory signals are sent to words that match the speech input
- inhibitory signals are sent to words that do not match the speech input
- bottom-up processing has priority
–> but what about when we can activate a word with missing sounds? (e.g. the legislature cough example)
what is the phoneme restoration effect?
- a sound is missing
- part of the word is missing or blocked by noise
- BUT we can still process the full word
- this can’t be bottom up processing
–> top-down processing has to be involved
3 stages to word recognition in The Cohort Model
- access
–> acoustic-phonetic information is mapped onto lexical items
- selection
–> candidate words that mismatch the acoustic input are de-selected
- integration
–> semantic and syntactic properties of the word are integrated and checked against the sentence
the impact of context on The Cohort Model
- sentence context does not influence the process of lexical access
- lexical selection is based on activation of phonology and semantic information
- integration is affected by sentence context
the priming paradigm
- lexical decision task
- asked to determine if a presented word is a word or non-word
- a prime can be presented before the target word
- if the prime is semantically similar to the target word, we are quicker to determine the word is a word
- prime is rapid (we don’t even see the prime)
- related words = faster response
cross modal priming (Zwitserlood, 1989)
- prime word is auditory
- target word is visual
- had related/unrelated words and did the lexical decision / priming task
- found that reaction time was faster for related words than unrelated words
cross modal priming (Zwitserlood, 1989) - word fragments
- given a word fragment
–> e.g. ‘capt’
- then given slave, ship and wicket as visual targets
- BOTH slave and ship should be activated in the lexicon
- wicket should not
- would predict to see faster response to slave and ship compared to wicket
cross modal priming (Zwitserlood, 1989) - neutral vs bias
- neutral:
–> sentence where ‘captain’ and ‘captive’ would both be fine
- bias:
–> a sentence like ‘the men had spent many years serving under their capt’ should lead to activation of captain and not captive
–> thus we would expect to see a priming effect ONLY for the ship target and NOT for slave or wicket
–> BUT STILL FOUND PRIMING FOR BOTH SLAVE AND SHIP
cross modal priming (Zwitserlood, 1989) - complete sentence
- bias sentences were then completed
- “The men had spent many years serving under their captain”
- NOW priming was ONLY found for ship
1978-1997 Cohort Model predictions
- items that match acoustic input but do not match sentence context are activated
- items that match acoustic input but do not match sentence context are deactivated once the word is selected
revised Cohort Model (1994)
- context influences selection/integration of word into sentence
- the word with semantic activation that fits the context of the sentence will be integrated into the sentence
- semantic representation of captain is a better fit to the sentence (men served under their___) than the semantic representation of captive and helps to single out ‘captain’ as the appropriate word
summarise The Cohort Model
- speech perception is based on matching acoustic input to stored representations of words in the lexicon
- words are recognised via a competitive process that activates a word ‘cohort’
- cohort candidates do not actively engage with each other
- words are identified when they reach their uniqueness point
- cohort candidates that do not match acoustic input are eliminated
- context does not constrain activation of initial cohorts but allows for rapid elimination of candidates that do not match sentence context
the TRACE model
- in TRACE words are recognized “incrementally by slowly ramping up the activation of the correct units at the phoneme and word levels”
- gradual activation of item that matches the input
competition in the TRACE model
- lexical competitive inhibition
–> inhibitory signals are sent to words that no longer match
–> more activation/signals sent to the word that does match
architecture of TRACE model
- implemented computational model based on connectionist principles
- processing units (nodes) correspond to mental representations of:
1. features (voicing, manner of production)
2. phonemes
3. words
architecture of TRACE model - continued
- bottom-up processing
–> each level is connected via facilitatory connections
–> activation spreads up from features to lexical items
- top down processing
–> facilitatory connections between levels also travel down from the lexical level to the phoneme level and the feature level
- connections between nodes within each level are inhibitory
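A heavily simplified sketch of two of these ingredients at the word level only: bottom up excitation from the input and inhibitory connections between nodes within a level. It is not the TRACE implementation; the feature and phoneme levels and the top down connections are left out, and every parameter and support value is an assumption chosen for illustration.

```python
def update(activations, support, excite=0.2, inhibit=0.1, decay=0.1):
    """One cycle: each word gains from its bottom up support, loses to active rivals."""
    total = sum(activations.values())
    new = {}
    for word, act in activations.items():
        rivals = total - act  # summed activation of the other word nodes
        net = excite * support[word] - inhibit * rivals - decay * act
        new[word] = min(1.0, max(0.0, act + net))  # keep activation in [0, 1]
    return new

# Assumed bottom up support for the spoken word "beaker" (illustrative values).
support = {"beaker": 1.0, "beetle": 0.5, "speaker": 0.6, "pram": 0.0}
activations = {word: 0.0 for word in support}

for _ in range(15):
    activations = update(activations, support)

winner = max(activations, key=activations.get)
print(winner, activations)
```

Because every node inhibits its rivals in proportion to its own activation, a small early advantage grows over cycles, so the best matching word ends up with the most activation.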
differences between TRACE model and Cohort model
- bottom-up and top-down processing in trace model
- top down processing reinforces activation of the nodes selected in previous levels
–> this accounts for the phoneme restoration effect
–> Cohort does not
is the TRACE model a radical activation model?
- yes
- any consistency between input and representation may result in some degree of activation
summarise the TRACE model
- nodes influence each other according to their activation levels & strengths of connections
- activation develops as a pattern of excitation from facilitation and inhibition
- candidate words are activated based on the pattern of activation
- bottom up and top down processes
–> bottom up - activation from feature to word level
–> top down - activation from word to feature level
Allopenna et al (1998)
- using an eye tracking study demonstrated that words with overlapping phonology, that do not start with the same onset as the speech input (rhyme competitors), are activated in speech perception
–> i.e. endings are similar
Allopenna et al (1998) - method
- participants are presented with a grid that contains images of items, e.g. a beaker, a beetle, a speaker and a pram
- participants are asked to click on the beaker and place it under the triangle
- participants’ eye movements are monitored whilst they complete the task
- if words related to beaker are active in the lexicon participants will look towards those items
cohort model predictions for Allopenna et al (1998)
- no activation for pram or speaker
- should have activation for beetle and beaker
–> they start with the same sound
- should look at beaker and beetle
TRACE model predictions for Allopenna et al (1998)
- should look at beaker and beetle and speaker
- NOT pram
- 3 words are all similar in some way and so should be looked at
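The two sets of predictions can be sketched by classifying each picture as an onset competitor or a rhyme competitor of the target. The phoneme transcriptions below are rough and assumed for illustration, as are the overlap criteria (first two segments for onset, last three for rhyme).

```python
# Rough transcriptions as lists of sound symbols (assumed for illustration).
items = {
    "beaker": ["b", "i", "k", "er"],
    "beetle": ["b", "i", "t", "l"],
    "speaker": ["s", "p", "i", "k", "er"],
    "pram": ["p", "r", "a", "m"],
}

def competitors(target_name, items, onset_len=2, rhyme_len=3):
    """Return (onset competitors, rhyme competitors) of the target word."""
    target = items[target_name]
    others = {w: p for w, p in items.items() if w != target_name}
    onset = [w for w, p in others.items() if p[:onset_len] == target[:onset_len]]
    rhyme = [w for w, p in others.items() if p[-rhyme_len:] == target[-rhyme_len:]]
    return onset, rhyme

onset, rhyme = competitors("beaker", items)

# Cohort: only the target and words sharing its onset should attract looks.
cohort_prediction = ["beaker"] + onset
# TRACE: rhyme competitors should attract looks as well.
trace_prediction = ["beaker"] + onset + rhyme

print(cohort_prediction)
print(trace_prediction)
```

Under these assumptions the Cohort prediction is beaker and beetle, the TRACE prediction adds speaker, and pram appears in neither.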
Allopenna et al (1998)- results
- participants looked at the beaker, the beetle and the speaker
–> participants looked at the beaker and the beetle in the first 400ms after the word was heard
–> participants also looked at the speaker between 400-600ms after the word was heard
–> slower but it’s there (in line with TRACE model)
conclusions from Allopenna et al (1998)
- the evidence from Allopenna et al. (1998) and others suggests that words that overlap with the input in any part of the word, including rhymes, may become activated
- the initial cohort of words activated in response to the speech stream is not limited to words with the same onset
top down processing
- facilitatory links between words and phonemes should result in more accurate detection of phonemes in words compared to non-words
- participants asked to detect a /t/ or /k/ in words (e.g., heighten) and non words (e.g., vinten) should find it easier to identify the /t/ in heighten compared to vinten
Mirman et al (2008) results - top down processing
- faster identification of /t/ and /k/ in real words
- demonstrates the effect of top-down processing
is top down processing superior?
- participants were able to accurately detect phonemes in non-words that were word like (Frauenfelder et al. 1990 - e.g. /t/ in vocabutary)
- participants did not resolve ambiguous phonemes in favour of the phoneme that would create a word unless the stimuli were degraded (McQueen, 1991 - e.g. identifying ‘sh’ as the final phoneme of ‘fiss’)
TRACE vs Cohort
- the TRACE model emphasises top down processing
- the Cohort model minimises the impact of top down processing
- the Cohort model predicts that lexical access is biased towards activation of words with shared onsets
- the TRACE model accommodates the activation of rhyming competitors
- the TRACE model does not provide an account of how context might affect speech perception
- the evidence also suggests that there is a tendency to activate words that start with the same sounds
conclusions
- models of speech perception agree that we access words in the lexicon via activation of lexical representations
- there is also agreement that activation is based on processes that involve facilitatory signals and competition
- however the models take different routes to comprehension