lecture 4 - speech perception Flashcards
comprehension skills =
reading (visual word recognition)
speech perception harder than reading because…
greater memory demands
ambiguous signal
harder to segment
transitory (not permanent)
categorical perception
stimuli that are intermediate between 2 phonemes are categorised as one phoneme or the other, with an abrupt boundary between the 2 categories
decoding =
extracting discrete elements (phonemes or sounds)
5 stages of speech perception
1) decoding
2) phoneme/syllable identification
3) word identification
4) interpretation
5) meaning of current sentence is integrated with preceding speech to construct overall message
Mattys et al: 2 types of adverse conditions =
1) energetic masking = distracting sounds cause intelligibility to be degraded (other voices/noise)
2) informational masking = cognitive load makes speech perception harder (affects top-down processing)
co-articulation
pronunciation of a phoneme depends on the preceding and following phonemes e.g. ‘bill’, ‘rub’ - increases variability in the signal BUT allows you to predict the next phoneme
more co-articulation within words than between them
assimilation
phonemes take on acoustic properties of neighbouring phonemes
stress
in English, initial syllable of most words is stressed - strings of words without initial stress are misperceived
Mattys et al’s Hierarchical approach to segmentation: 3 categories of cues
1) lexical (syntax, word knowledge or semantics) - used in optimal interpretative conditions
2) segmental (e.g. co-articulation) - used when lexical info is poor
3) metrical prosody (e.g. word stress) - used when segmental info is poor
(LSM - i love matty so much)
McGurk effect
when we watch someone mouth /ga/ dubbed with the sound /ba/, we hear /da/
multimodal perception of speech - an automatic bottom-up process triggered by discrepant visual and auditory signals
effect is stronger when crucial word is presented in semantically congruent way
2 extreme positions of context effects
‘interactionist account’ = context affects processing at early stage & influences word perception
‘autonomous account’ = context affects later processing - can only contribute to evaluation and integration of lexical processing not its generation
phonemic restoration effect (context effect)
evidence that sentence context can influence phoneme perception e.g. phoneme replaced with a cough - perception of that word was influenced by the sentence it was in
Ganong effect (AKA lexical identification shift) (context effect)
tendency to perceive ambiguous sound as a phoneme that would complete a real word rather than completing nonsense words
TRACE model (McClelland & Elman)
bottom-up and top-down processing interact flexibly in word recognition - all sources of info are used at the same time
TRACE model assumptions:
1) there are processing units/nodes at 3 different levels:
1) features (voicing/manner of production); 2) phonemes; 3) words
feature nodes are connected to phoneme nodes, which are connected to word nodes - connections between levels operate in both directions and are always facilitatory
connections between nodes at same level = inhibitory & bidirectional (once a unit is activated it inhibits its competitors)
nodes influence each other in proportion to their activation levels and the strength of their interconnections
TRACE model: process
as excitation and inhibition spread across nodes, a pattern of activation develops
the word identified = determined by activation level of possible candidate words
TRACE model: bottom-up and top-down activation
bottom-up activation = proceeds upwards from feature level to phoneme to word level
top-down activation = proceeds downwards from word level to phoneme level to feature level
activation from word level to phoneme level would facilitate phoneme detection =
word superiority effect (evidence for top-down processing)
TRACE model can explain Ganong effect
top-down activation from word level is responsible
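A rough sketch of how top-down feedback could produce the Ganong effect. This is a toy illustration, not the published TRACE simulation: the function name `settle`, the weights, the number of update cycles and the gift/kiss examples are all assumptions chosen for the demo. Word-level support is simplified to a boost proportional to the phoneme's own activation, and rival phonemes inhibit each other.

```python
# Toy sketch of top-down feedback in the spirit of TRACE (not the published model).
# The weights (feedback, inhibition) are made-up illustrative values.

def settle(bottom_up, completes_word, steps=20, feedback=0.3, inhibition=0.2):
    """Return phoneme activations after repeated update cycles.

    bottom_up:      phoneme -> strength of acoustic evidence (0..1)
    completes_word: phoneme -> True if that phoneme would complete a real word
    """
    act = dict(bottom_up)
    for _ in range(steps):
        new_act = {}
        for phoneme, level in act.items():
            # Within-level inhibition from competing phonemes
            rivals = sum(a for other, a in act.items() if other != phoneme)
            # Top-down boost only when a word node supports this phoneme
            top_down = feedback * level if completes_word[phoneme] else 0.0
            net = bottom_up[phoneme] + top_down - inhibition * rivals
            new_act[phoneme] = min(1.0, max(0.0, net))  # keep within 0..1
        act = new_act
    return act

# A sound exactly halfway between /g/ and /k/
ambiguous = {"g": 0.5, "k": 0.5}

before_ift = settle(ambiguous, {"g": True, "k": False})   # "gift" is a word, "kift" is not
before_iss = settle(ambiguous, {"g": False, "k": True})   # "kiss" is a word, "giss" is not
no_lexical = settle(ambiguous, {"g": False, "k": False})  # neither completes a word

print(before_ift, before_iss, no_lexical)
```

With identical acoustic input, /g/ ends up more active before "ift" and /k/ more active before "iss"; with no word-level support the two stay tied.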
TRACE model can explain categorical perception
discrimination boundary becomes sharper because of mutual inhibition between phoneme units at phoneme level - one phoneme becomes increasingly activated whilst others become inhibited (this is within-level inhibition rather than top-down processing)
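A minimal sketch of how mutual inhibition could sharpen a category boundary. The two units (labelled /b/ and /p/ here), the function name `b_response` and the parameter values are assumptions for illustration only, not values from TRACE. The acoustic evidence changes smoothly along the continuum, but after the units compete the response is close to all-or-none.

```python
# Toy sketch: two phoneme units inhibit each other until one wins.
# rate and inhibition are made-up illustrative values.

def b_response(evidence_for_b, steps=200, rate=0.2, inhibition=0.9):
    """Proportion of final activation held by the /b/ unit (0..1)."""
    b_in, p_in = evidence_for_b, 1.0 - evidence_for_b
    b = p = 0.0
    for _ in range(steps):
        # Each unit is driven by its input and suppressed by its rival
        b_target = max(0.0, b_in - inhibition * p)
        p_target = max(0.0, p_in - inhibition * b)
        # Move both units a fraction of the way towards their targets
        b, p = b + rate * (b_target - b), p + rate * (p_target - p)
    return b / (b + p)

# Step along a continuum from clearly /p/ to clearly /b/
for evidence in (0.1, 0.4, 0.45, 0.5, 0.55, 0.6, 0.9):
    print(evidence, round(b_response(evidence), 3))
```

Stimuli on the same side of the boundary (e.g. 0.6 and 0.9) give almost the same response, whereas a small step across the boundary (0.45 to 0.55) flips the response. That pattern is the signature of categorical perception.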
TRACE model criticisms
1) attaches excessive importance to top-down processes (in practice, mispronunciations have a strong negative effect on speech perception, which strong top-down influence should prevent)
2) may be too flexible to be tested
3) model was tested only on small set of short words and is yet to be tested on more complex vocabularies
TRACE model support
1) copes well with noisy input (emphasis on top-down)
2) accounts for:
Categorical speech perception
Lexical identification shift
Word superiority effect in phoneme naming
Word frequency effect
Cohort model (Marslen-Wilson & Tyler) - 3 stages
focus more on bottom-up processes
3 stages:
1) access stage (all words conforming to sound sequence so far become active = the cohort)
2) selection stage (words are eliminated if they cease to match further info from the presented word or are inconsistent with context)
3) integration stage (semantic and syntactic properties of chosen word are used to integrate it into sentence)
uniqueness point (cohort model)
point when only one word is consistent with the acoustic signal
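A minimal sketch of the access and selection stages and the uniqueness point, using a made-up five-word lexicon. Letters stand in for phonemes, and the names `LEXICON`, `cohort_after` and `uniqueness_point` are hypothetical. Context effects and graded activation (as in the revised model) are left out.

```python
# Toy sketch of the cohort idea: spelling is used as a stand-in for phonemes,
# and the lexicon is invented for illustration.

LEXICON = ["candle", "candy", "cannon", "captain", "cat"]

def cohort_after(heard, lexicon):
    """Words still consistent with the input heard so far (the cohort)."""
    return [word for word in lexicon if word.startswith(heard)]

def uniqueness_point(word, lexicon):
    """Number of segments needed before only `word` is left in the cohort.

    Returns None if the word never becomes unique before its end
    (e.g. when it is the start of a longer word in the lexicon).
    """
    for n in range(1, len(word) + 1):
        if cohort_after(word[:n], lexicon) == [word]:
            return n
    return None

# The cohort shrinks as more of "candle" is heard
for n in range(1, len("candle") + 1):
    print("candle"[:n], cohort_after("candle"[:n], LEXICON))

print(uniqueness_point("candle", LEXICON))
```

In this lexicon "candle" only becomes unique once "candl" has been heard, while "captain" is already unique at "cap".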
parallel interactive processing (cohort model) =
various sources of info are processed in parallel, allowing for their interaction
isolation point (cohort model) =
point in a word where a given proportion of listeners identify the word correctly, though they may not be confident about it (typically earlier than uniqueness point - biased by context)
lexical access (cohort model) =
point at which all info about a word becomes available following recognition
integration (cohort model) =
the start of the comprehension process proper - semantic and syntactic properties of the word are integrated into a higher-level sentence representation
revised cohort model:
words vary in their activation
original = context influences selection stage; revised = context only influences later integration stage (greater emphasis on bottom-up)
Word-initial cohort may include words with similar (rather than exact) matches to the initial phoneme
shadowing task (cohort model) =
pps listen to speech containing distortions & repeat it back
50% of the time participants repeat back as it should be, without the distortion
Most frequent when distortion was slight, on final syllable and word predictable from context
Listening for mispronunciations task (cohort model) =
pps listen to speech where a sound is distorted & detect changes
Participants are more sensitive to changes to the beginning of the words (e.g., poot – boot)
common ground and egocentric heuristic
we make more use of common ground when talking to people we are familiar with than when listening to strangers