computational models of speech perception Flashcards

1
Q

challenges for lexical access

A
  • speech is a continuous stream
  • homonyms
    –> spelt and said the same but different meanings
  • homophones
    –> spelt differently, said the same, different meaning
  • co-articulation
  • different accents
  • invariance problem
    –> no one-to-one mapping between the acoustic signal and linguistic units (e.g. phonemes, syllables, words)
  • ambiguity in word boundaries
    –> e.g. four candles vs fork handles
2
Q

ways to disambiguate the speech stream

A
  • categorical perception
  • voice onset time
  • perceptual learning
  • top down processing
3
Q

categorical perception

A

sounds that vary along a continuum (e.g. voice onset time) are perceived as belonging to distinct categories rather than as gradual changes

4
Q

voice onset time

A

the delay between the release of a consonant and the start of vocal cord vibration
–> VVVVa vs FFa

5
Q

perceptual learning

A
  • adjust categorical perception based on sounds we hear
  • seems to be hard wired
    –> babies and primates can do this
6
Q

top down processing

A
  • e.g. “The state governors met with their respective legislatures convening in the capital city.”
  • a cough disguises one sound in “legislatures”: top down processing allows us to recognise the word without noticing which sound was blocked by the cough
7
Q

spreading activation

A
  • facilitates predictions of what may be coming up next via activation of items that are related to the acoustic input
  • when hearing ‘apple’
    –> appeal
    –> apron
    –> appear
  • all may be activated in the lexicon
8
Q

what lexical characteristics affect the speed of lexical access?

A
  • word length
    –> long words are slower to process
  • neighbourhood density
    –> words with lots of neighbours are processed more slowly
  • frequency
    –> the more frequently a word is accessed in the lexicon, the quicker you can access it
9
Q

lexical access is based on?

A
  • bottom up processing (acoustic input)
  • top down processing (disambiguating the speech stream)
  • lexical characteristics
  • context
  • spreading activation that facilitates predictions
10
Q

mechanics of lexical access (3 options)

A
  1. gradually activate the word that matches the acoustic input
  2. activate all words that start with the same sound as the acoustic input and gradually de-activate words that no longer match the sounds
  3. gradually activate the word that matches the acoustic input more than other words
11
Q

option 1 for lexical access

A
  • e.g. word = apricot
    –> we break up each sound
    –> ay - pri - cot
12
Q

option 2 for lexical access

A
  • e.g. word = apricot
  • as soon as we hear ‘ay’ we might think of:
    –> april
    –> ape
    –> apricot
  • as we continue to hear the word we de-activate words that no longer match
    –> after ay - pri is heard ‘ape’ is de-activated
    –> after ay - pri - cot is heard ‘april’ is de-activated
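The de-activation steps above can be sketched in a few lines of Python. This is only an illustration, not an implementation of any published model: the chunk transcriptions in LEXICON and the function name winnow_cohort are assumptions made up for this example.

```python
# Toy lexicon: each word is a list of sound chunks (hypothetical transcriptions).
LEXICON = {
    "ape": ["ay", "p"],
    "april": ["ay", "pri", "l"],
    "apricot": ["ay", "pri", "cot"],
    "say": ["s", "ay"],
}

def winnow_cohort(lexicon, heard_chunks):
    """Return the list of surviving candidates after each chunk is heard."""
    cohort = list(lexicon)
    heard = []
    history = []
    for chunk in heard_chunks:
        heard.append(chunk)
        # Keep only words whose opening chunks match everything heard so far.
        cohort = [w for w in cohort if lexicon[w][:len(heard)] == heard]
        history.append(cohort)
    return history

for step, survivors in enumerate(winnow_cohort(LEXICON, ["ay", "pri", "cot"]), 1):
    print(step, survivors)
```

Note that ‘say’ never enters the cohort in this sketch, because it does not start with ‘ay’.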
13
Q

option 3 for lexical access

A
  • e.g. word = apricot
  • when we hear ‘ay’
  • we might activate:
    –> ape
    –> apricot
    –> april
    –> say
    –> pay
  • as we hear more of the word, we activate the word that matches the word more than others
    –> e.g. apricot more than say
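A rough Python sketch of this graded option, under loud assumptions: the chunk transcriptions in TOY_LEXICON are invented, and "activation" is simply a count of heard chunks that occur anywhere in a word. Real models use continuous activation over time, so treat this as a picture of the idea only.

```python
# Toy lexicon: each word is a list of sound chunks (hypothetical transcriptions).
TOY_LEXICON = {
    "ape": ["ay", "p"],
    "apricot": ["ay", "pri", "cot"],
    "april": ["ay", "pri", "l"],
    "say": ["s", "ay"],
    "pay": ["p", "ay"],
}

def graded_activation(lexicon, heard_chunks):
    """Add one unit of activation to every word containing each heard chunk."""
    activation = {word: 0 for word in lexicon}
    for chunk in heard_chunks:
        for word, chunks in lexicon.items():
            if chunk in chunks:  # position in the word does not matter here
                activation[word] += 1
    return activation

def most_active(activation):
    """Return the word with the highest activation."""
    return max(activation, key=activation.get)

after_ay = graded_activation(TOY_LEXICON, ["ay"])
after_all = graded_activation(TOY_LEXICON, ["ay", "pri", "cot"])
print(after_ay)
print(after_all, most_active(after_all))
```

Unlike option 2, ‘say’ and ‘pay’ receive some activation from ‘ay’, but apricot ends up with the most.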
14
Q

2 models of speech comprehension

A
  1. The Cohort Model
    –> Marslen-Wilson (1987)
    –> predicts that we access words in the lexicon via activation of all words sharing initial features and gradually de-activate words that stop matching the features (option 2)
  2. The TRACE Model
    –> McClelland & Elman (1986)
    –> predicts that features activate phonemes that activate words with a gradual increase in activation of words that match all features so that the word with the most activation wins (option 3)
15
Q

The Cohort Model

A
  • we activate words in the mental lexicon (the cohort) that match the input
    –> e.g. the ‘ay’ in apricot can activate apricot, apex, april, ape
  • we then gradually de-activate items that fail to match the input
  • we reach a uniqueness point (we now know the word - only one word)
  • items that do not match the onset of the word (‘ay’ in apricot) are not activated
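The uniqueness point can be illustrated with a short Python sketch. It assumes letters stand in for sounds and uses a made-up four-word lexicon (WORDS); the function name uniqueness_point is hypothetical.

```python
def uniqueness_point(target, lexicon):
    """Return how many segments are needed before target is the only match.

    Letters are used as a stand-in for speech sounds (an assumption).
    Returns None if the target never becomes unique.
    """
    for n in range(1, len(target) + 1):
        cohort = [w for w in lexicon if w.startswith(target[:n])]
        if cohort == [target]:
            return n
    return None

WORDS = ["apricot", "april", "ape", "apex"]  # hypothetical mini lexicon
print(uniqueness_point("apricot", WORDS))
print(uniqueness_point("ape", WORDS))
```

In this toy lexicon ‘ape’ never reaches a uniqueness point, because ‘apex’ still matches everything heard.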
16
Q

neighbourhood effects in The Cohort Model

A
  • words that match the acoustic input compete for activation
    –> neighbours compete with each other for recognition
  • learning the word ‘aprikol’ will slow down the recognition of the word apricot
17
Q

frequency effects in The Cohort Model

A
  • words with high frequency have high resting states
    –> less activation required to recognise high frequency words
    –> e.g. apricot would be recognised more quickly (ms) than a low frequency word (aprikol)
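The resting-state idea can be sketched in Python as a head start towards a recognition threshold. All numbers here (resting levels, threshold, input per step) are arbitrary units chosen for illustration, not values from the model.

```python
def steps_to_recognise(resting_level, threshold=10, input_per_step=1):
    """Count how many input steps are needed to reach the threshold.

    Integer units are arbitrary; a higher resting level models a
    higher-frequency word (an illustrative assumption).
    """
    activation = resting_level
    steps = 0
    while activation < threshold:
        activation += input_per_step
        steps += 1
    return steps

high_frequency = steps_to_recognise(resting_level=6)  # e.g. apricot
low_frequency = steps_to_recognise(resting_level=1)   # e.g. aprikol
print(high_frequency, low_frequency)
```

The word with the higher resting level needs fewer steps of input, which corresponds to faster recognition.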
18
Q

evidence of The Cohort Model

A
  • Gating experiments
    –> Ps are presented with fragments of words that gradually reveal the whole word and asked to guess what the word is after each presentation
    –> e.g. John went to the zoo and saw a ‘ca…’
    –> cap
    –> cat
    –> kangaroo
  • progressively more of the word is revealed to help participants guess it
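How the fragments (gates) for such an experiment might be built can be sketched in Python. In the real paradigm the gates are slices of audio, so using letters and the step size below are assumptions made purely for illustration; the function name gates is hypothetical.

```python
def gates(word, step=1):
    """Return successively longer fragments of word, ending with the whole word.

    Letters stand in for stretches of audio (an assumption); step is the
    number of letters added at each presentation.
    """
    return [word[:end] for end in range(step, len(word) + step, step)]

print(gates("cat"))
print(gates("stretcher", step=3))
```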
19
Q

early findings of Gating Paradigm (Grosjean, 1980)

A
  • presented the word stretcher using the paradigm
20
Q

what do Gating experiments suggest?

A
  • recognition of a word is a gradual process that starts from word onset and continues until the end of the word
  • candidate words that no longer fit the acoustic input are eliminated
21
Q

architecture of The Cohort Model

A
  • facilitatory signals are sent to words that match the speech input
  • inhibitory signals are sent to words that do not match the speech input
  • bottom-up processing has priority
    –> but what about when we can activate a word with missing sounds? (e.g. the legislature cough example)
22
Q

what is the phoneme restoration effect?

A
  • a sound is missing
  • part of the word is missing or blocked by noise
  • BUT we can still process the full word
  • this can’t be bottom up processing
    –> top-down processing has to be involved
23
Q

3 stages to word recognition in The Cohort Model

A
  1. access
    –> acoustic-phonetic information is mapped onto lexical items
  2. selection
    –> candidate words that mismatch the acoustic input are de-selected
  3. integration
    –> semantic and syntactic properties of the word are integrated and checked against the sentence
24
Q

the impact of context on The Cohort Model

A
  • sentence context does not influence the process of lexical access
  • lexical selection is based on activation of phonology and semantic information
  • integration is affected by sentence context
25
Q

the priming paradigm

A
  • lexical decision task
  • asked to determine if a presented word is a word or non-word
  • a prime can be presented before the target word
  • if the prime is semantically similar to the target word, we are quicker to determine the word is a word
  • the prime is presented very briefly (we are not even aware of seeing it)
  • related words = faster response
26
Q

cross modal priming (Zwitserlood, 1989)

A
  • prime word is auditory
  • target word is visual
  • had related/unrelated words and did the lexical decision / priming task
  • found that reaction time was faster for related words than unrelated words
27
Q

cross modal priming (Zwitserlood, 1989) - word fragments

A
  • the auditory prime is a word fragment
    –> e.g. ‘capt’
  • the visual targets are then slave, ship and wicket
  • BOTH slave and ship should be activated in the lexicon (via captive and captain)
  • wicket should not
  • would predict faster responses to slave and ship compared to wicket
28
Q

cross modal priming (Zwitserlood, 1989) - neutral vs bias

A
  • neutral:
    –> sentence where ‘captain’ and ‘captive’ would be fine
  • bias:
    –> a sentence like ‘men had spent many years serving under their capt’ should lead to activation of captain and not captive
    –> thus we would expect to see a priming effect ONLY for the ship target and NOT for slave or wicket
    –> BUT STILL FOUND PRIMING FOR BOTH SLAVE AND SHIP
29
Q

cross modal priming (Zwitserlood, 1989) - complete sentence

A
  • bias sentences were then completed
  • “The men had spent many years serving under their captain”
  • NOW priming was ONLY found for ship
30
Q

1978-1997 Cohort Model predictions

A
  • items that match acoustic input but do not match sentence context are activated
  • items that match acoustic input but do not match sentence context are deactivated once the word is selected
31
Q

revised Cohort Model (1994)

A
  • context influences selection/integration of word into sentence
  • the word with semantic activation that fits the context of the sentence will be integrated into the sentence
  • semantic representation of captain is a better fit to the sentence (men served under their___) than the semantic representation of captive and helps to single out ‘captain’ as the appropriate word
32
Q

summarise The Cohort Model

A
  • speech perception is based on matching acoustic input to stored representations of words in the lexicon
  • words are recognised via a competitive process that activates a word ‘cohort’
  • cohort candidates do not actively engage with each other
  • words are identified when they reach their uniqueness point
  • cohort candidates that do not match acoustic input are eliminated
  • context does not constrain activation of initial cohorts but allows for rapid elimination of candidates that do not match sentence context
33
Q

the TRACE model

A
  • in TRACE words are recognized “incrementally by slowly ramping up the activation of the correct units at the phoneme and word levels”
  • gradual activation of item that matches the input
34
Q

competition in the TRACE model

A
  • lexical competitive inhibition
    –> inhibitory signals are sent to words that no longer match
    –> more activation/signals sent to the word that does match
35
Q

architecture of TRACE model

A
  • implemented computational model based on connectionist principles
  • processing units (nodes) correspond to mental representations of:
    1. features (voicing, manner of production)
    2. phonemes
    3. words
36
Q

architecture of TRACE model - continued

A
  • bottom-up processing
    –> each level is connected via facilitatory connections
  • activation spreads up from features to lexical items
  • top down processing
    –> facilitatory connections between levels also travel down from the lexical level to the phoneme level and the feature level
  • connections between nodes within each level are inhibitory
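The connection pattern above can be sketched as a tiny interactive-activation loop in Python. This is a heavily simplified illustration, not the TRACE implementation: it has only a phoneme level and a word level (no feature level, no time slices), letters stand in for phonemes, and the lexicon and all parameter values are arbitrary assumptions.

```python
def run_interactive_activation(lexicon, heard, cycles=10,
                               excite=0.1, feedback=0.05, inhibit=0.05):
    """Run a toy two-level network and return (phoneme, word) activations.

    Between levels: excitatory links (bottom up and top down).
    Within a level: inhibitory links. All weights are arbitrary.
    """
    phonemes = sorted({p for segs in lexicon.values() for p in segs} | set(heard))
    phon_act = {p: 0.0 for p in phonemes}
    word_act = {w: 0.0 for w in lexicon}
    for _ in range(cycles):
        new_phon = {}
        for p in phonemes:
            bottom_up = 1.0 if p in heard else 0.0
            # Top down: words containing this phoneme feed activation back.
            top_down = sum(word_act[w] for w, segs in lexicon.items() if p in segs)
            rivals = sum(a for q, a in phon_act.items() if q != p)
            net = excite * bottom_up + feedback * top_down - inhibit * rivals
            new_phon[p] = min(1.0, max(0.0, phon_act[p] + net))
        new_word = {}
        for w, segs in lexicon.items():
            # Bottom up: a word is excited by its active phonemes.
            bottom_up = sum(phon_act[p] for p in set(segs))
            rivals = sum(a for v, a in word_act.items() if v != w)
            net = excite * bottom_up - inhibit * rivals
            new_word[w] = min(1.0, max(0.0, word_act[w] + net))
        phon_act, word_act = new_phon, new_word
    return phon_act, word_act

MINI_LEXICON = {"cat": ["c", "a", "t"], "cap": ["c", "a", "p"], "dog": ["d", "o", "g"]}
phon, words = run_interactive_activation(MINI_LEXICON, heard=["c", "a", "t"])
print(sorted(words, key=words.get, reverse=True))
```

In this sketch the partially matching word (‘cap’) gains some activation but stays below the fully matching word (‘cat’), while the unrelated word (‘dog’) stays at zero.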
37
Q

differences between TRACE model and Cohort model

A
  • bottom-up and top-down processing in the TRACE model
  • top down processing reinforces activation of the nodes selected in previous levels
    –> this accounts for the phoneme restoration effect
    –> Cohort does not
38
Q

is the TRACE model a radical activation model?

A
  • yes
  • any consistency between input and representation may result in some degree of activation
39
Q

summarise the TRACE model

A
  • nodes influence each other according to their activation levels & strengths of connections
  • activation develops as a pattern of excitation from facilitation and inhibition
  • candidate words are activated based on the pattern of activation
  • bottom up and top down processes
    –> bottom up - activation from feature to word level
    –> top down - activation from word to feature level
40
Q

Allopenna et al (1998)

A
  • using an eye tracking study demonstrated that words with overlapping phonology, that do not start with the same onset as the speech input (rhyme competitors), are activated in speech perception
    –> i.e. endings are similar
41
Q

Allopenna et al (1998) - method

A
  • participants are presented with a grid that contains images of items, e.g. a beaker, a beetle, a speaker and a pram
  • participants are asked to click on the beaker and place it under the triangle
  • participants’ eye movements are monitored whilst they complete the task
  • if words related to beaker are active in the lexicon participants will look towards those items
42
Q

cohort model predictions for Allopenna et al (1998)

A
  • no activation for pram or speaker
  • should have activation for beetle and beaker
    –> they start with the same sound
  • should look at beaker and beetle
43
Q

TRACE model predictions for Allopenna et al (1998)

A
  • should look at beaker and beetle and speaker
  • NOT pram
  • 3 words are all similar in some way and so should be looked at
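The contrast between the two sets of predictions can be sketched in Python by measuring onset overlap and ending overlap with the target. Letters stand in for sounds, and the two-letter minimum overlap is an arbitrary assumption; neither model literally works this way.

```python
def shared_onset(a, b):
    """Number of matching letters at the start of both words."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def shared_ending(a, b):
    """Number of matching letters at the end of both words."""
    return shared_onset(a[::-1], b[::-1])

def predicted_looks(target, items, min_overlap=2):
    """Return (onset-only, onset-or-ending) predictions for the display items."""
    onset_only = [w for w in items if shared_onset(target, w) >= min_overlap]
    onset_or_ending = [w for w in items
                       if shared_onset(target, w) >= min_overlap
                       or shared_ending(target, w) >= min_overlap]
    return onset_only, onset_or_ending

DISPLAY = ["beaker", "beetle", "speaker", "pram"]
cohort_like, trace_like = predicted_looks("beaker", DISPLAY)
print(cohort_like)
print(trace_like)
```

The onset-only list corresponds to the Cohort-style prediction and the onset-or-ending list to the TRACE-style prediction described in these cards.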
44
Q

Allopenna et al (1998)- results

A
  • participants looked at the beaker, the beetle and the speaker
    –> participants looked at the beaker and the beetle in the first 400ms after the word was heard
    –> participants also looked at the speaker between 400-600ms after the word was heard
    –> slower but it’s there (in line with TRACE model)
45
Q

conclusions from Allopenna et al (1998)

A
  • the evidence from Allopenna et al. (1998) and others suggests that words that overlap with the input in any part of the word (including rhymes) may become activated
  • the initial Cohort of words activated in response to the speech stream is not limited to words with the same onset
46
Q

top down processing

A
  • facilitatory links between words and phonemes should result in more accurate detection of phonemes in words compared to non-words
  • participants asked to detect a /t/ or /k/ in words (e.g., heighten) and non words (e.g., vinten) should find it easier to identify the /t/ in heighten compared to vinten
47
Q

Mirman et al (2008) results - top down processing

A
  • faster identification of /t/ and /k/ in real words
  • demonstrates the effect of top-down processing
48
Q

is top down processing superior?

A
  • participants were able to accurately detect phonemes in non-words that were word-like (Frauenfelder et al., 1990 - e.g. /t/ in vocabutary)
  • participants failed to complete ambiguous phonemes with a phoneme that would create a word unless stimuli were degraded (McQueen, 1991 - e.g. identifying an ambiguous final sound as ‘sh’, making ‘fish’ rather than ‘fiss’)
49
Q

TRACE vs Cohort

A
  • the TRACE model emphasises top down processing
  • the Cohort model minimises the impact of top down processing
  • the Cohort model predicts that lexical access is biased towards activation of words with shared onsets
  • the TRACE model accommodates the activation of rhyming competitors
  • the TRACE model does not provide an account of how context might affect speech perception
  • the evidence also suggests that there is a tendency to activate words that start with the same sounds
50
Q

conclusions

A
  • models of speech perception agree that we access words in the lexicon via activation of lexical representations
  • there is also agreement that activation is based on processes that involve facilitatory signals and competition
  • however the models take different routes to comprehension