Week 3-Language Part 2 Flashcards
Define Speech Processing
A process of progressively extracting invariant, discrete representations from a variable, continuous input.
What are 2 features of a speech signal, and how can they cause problems for speech processing?
- Continuous: distributed in time and fast-fading, so we cannot resample what was said. Words are not neatly segmented (e.g., by pauses marking where each one starts and ends). Consecutive speech sounds blend into each other due to mechanical constraints on the articulators.
- Variable: speakers differ; pitch is affected by age and sex; there are different accents and talking speeds; speech is often heard in noise.
Give a simple definition of the word segmentation problem
When do words start and end?
Define Word Segmentation (Cutler & Norris, 1988)
The rhythmic structure of English is stress-timed (some syllables are emphasised), e.g., LEttuce, TROUsers, ciGAR.
What is the Metrical Segmentation Strategy (MSS)?
In English, stressed (strong) syllables are likely word onsets, so continuous speech is segmented at stressed syllables (Cutler & Norris, 1988).
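The strategy can be sketched in a few lines of Python. This is only an illustration, assuming the input is already split into syllables and that stress is marked with capitals as in these cards; the function name `mss_segment` is hypothetical and not taken from Cutler & Norris (1988).

```python
def mss_segment(syllables):
    """Group syllables into candidate words, opening a new word at each stressed syllable.

    Assumption of this sketch: a syllable written entirely in capitals is stressed.
    """
    words = []
    for syl in syllables:
        if syl.isupper() or not words:
            words.append(syl)      # stressed syllable (or very first syllable): start a new word
        else:
            words[-1] += syl       # unstressed syllable: attach it to the current word
    return words

print(mss_segment(["LE", "ttuce", "TROU", "sers"]))  # ['LEttuce', 'TROUsers']
print(mss_segment(["a", "LERT"]))                    # ['a', 'LERT']
```

The second call shows the weakness of a purely stress-based strategy: a word that begins with an unstressed syllable is split at its stressed syllable.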
What are stressed syllables?
Syllables with full vowels, e.g., the LE in LEttuce
What are unstressed syllables?
Syllables with reduced vowels, e.g., the be in beHIND
What are content words?
Nouns, verbs and adjectives
What are Grammatical words?
Articles, pronouns, prepositions and conjunctions
What evidence is there in favour of the MSS? (Cutler & Carter, 1987)
74% of stressed syllables in English correspond to the sole or initial syllable of a content word. This is not the case for unstressed syllables: only 5% correspond to content words.
Why is the Metrical segmentation strategy not infallible?
■ Because it is only a strategy: words with complex syllabic structure can be segmented incorrectly (* marks an incorrect segmentation).
Alert → *lert
Assassinate → *sassinate
- Listeners need other sources of information to segment successfully.
■ MSS is language specific. Other languages may use different strategies.
■ It solves the child’s paradox (how could a child segment a word, if the child does not know the word?)
What is the Hierarchy of segmentation cues? (Mattys et al., 2005)
Tier 1: Lexical
-Lexical
-Sentential context (pragmatics, syntax, semantics) –> Lexical knowledge
-Interpretive conditions are optimal
Tier 2: Segmental
-Sub-lexical
-Phonotactics; acoustic-phonetics (coarticulation, allophony)
-Interpretive conditions contain poor lexical information
Tier 3: Metrical Prosody
-Sub-lexical
-Word stress
-Interpretive conditions contain poor segmental information
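One way to read the hierarchy is as a fallback order: listeners use the highest tier whose information is usable. The sketch below is a deliberate simplification under that assumption (a hard either/or choice with two boolean inputs); the function name and parameters are hypothetical and not part of Mattys et al. (2005).

```python
def segmentation_cue(lexical_info_good, segmental_info_good):
    """Return the tier of cue a listener is assumed to rely on (illustrative only)."""
    if lexical_info_good:
        return "Tier 1: Lexical"           # optimal interpretive conditions
    if segmental_info_good:
        return "Tier 2: Segmental"         # lexical information is poor
    return "Tier 3: Metrical prosody"      # segmental information is poor too

print(segmentation_cue(True, True))    # Tier 1: Lexical
print(segmentation_cue(False, True))   # Tier 2: Segmental
print(segmentation_cue(False, False))  # Tier 3: Metrical prosody
```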
What is Lexical Selection?
■ Segmented stream is the input for lexical selection.
– A search process that finds the best fit between the input and the abstract lexical representations in our mental lexicon.
– Fast: it starts as soon as there is some information about the word, and can finish before the word has been fully pronounced.
■ Words in context can be recognised within 175-200 ms of their onset, or when only part of their acoustic content has been presented.
What evidence is there for Lexical Selection from Shadowing? (Marslen-Wilson, 1975)
Task: Participants hear a sentence and they repeat aloud what they heard.
Results: Participants corrected errors in the words (such as mispronunciations) when repeating them.
The corrections occurred before the incorrect word had been presented in full.
We recognise words fast: shadowing is not mere repetition of sounds, because listeners access known lexical representations.
What evidence is there for Lexical Selection from gating? (Tyler & Wessels, 1983)
Task: Participants listen to a word that has been cut into fragments (gates) of different durations.
The gates always start from the beginning of the word and become progressively longer (e.g., +25 ms each time).
After each gate, participants say what they think the word is.
-Listeners consider multiple word candidates that are consistent with the incoming speech.
-Listeners can recognise a word before it ends once it reaches its uniqueness point.
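The gate structure can be made concrete with a short sketch. It assumes a fixed 25 ms increment (the example step mentioned in the task description) and a hypothetical 110 ms word; the function name `gate_durations` is illustrative only.

```python
def gate_durations(word_ms, step_ms=25):
    """Return the duration of each gate, from the first fragment up to the whole word."""
    gates = list(range(step_ms, word_ms, step_ms))
    gates.append(word_ms)  # the final gate always presents the complete word
    return gates

print(gate_durations(110))  # [25, 50, 75, 100, 110]
```

Every gate starts at word onset, so each one contains all the material of the previous gate plus a little more.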
What is the Cohort model? (Marslen-Wilson & Welsh, 1978)
A word is recognised at the point where it is the only word still consistent with the input (Recognition Point).
Optimally efficient system: it makes maximally effective use of the incoming signal. A word is recognised as soon as the information needed to differentiate it from its competitors is available, even before the end of the word.
What are the 3 stages of the Cohort model? (Marslen-Wilson & Welsh, 1978)
- Access: activation of initial set of candidates based on word-initial cohort.
- Selection: words that mismatch the incoming signal removed from the cohort.
- Integration: their syntactic and semantic properties integrated with the context.
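The access and selection stages can be sketched as progressive filtering of a candidate set. This is a toy illustration, assuming letters stand in for speech segments and using a tiny hypothetical lexicon; integration with context is not modelled, and the function name `recognition_point` is not from Marslen-Wilson & Welsh (1978).

```python
def recognition_point(target, lexicon):
    """Return (segments heard, remaining cohort) at the point where the target is unique."""
    # Access: the first segment activates every word sharing that onset.
    cohort = {w for w in lexicon if w.startswith(target[0])}
    for n in range(1, len(target) + 1):
        # Selection: remove candidates that mismatch the input heard so far.
        cohort = {w for w in cohort if w.startswith(target[:n])}
        if cohort == {target}:
            return n, cohort
    return None, cohort  # never unique, e.g., the target is the onset of a longer word

lexicon = ["captain", "captive", "capital", "cat"]  # hypothetical mini-lexicon
print(recognition_point("captive", lexicon))  # (5, {'captive'})
```

Here "captive" is recognised after five segments ("capti"), two segments before the word ends, because that is where "captain" drops out of the cohort.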
What does the Cohort model suggest?
The Cohort model suggests that word onsets, i.e., the information available at the very beginning of a word, help to set up the search and the cohort.
This is consistent with evidence that word onsets are particularly salient (Cole and Jakimik, 1980), but information arriving at later points can also activate lexical entries.
What evidence from Allopenna et al. (1998) supports the salience of word onsets and additional late activation?
Task: Participants follow spoken instructions, e.g., “Pick up the beaker; put it below the diamond.”
However, other objects are also present: an onset-overlap distractor (BEetle), a rhyme distractor (speakER) and a baseline (pram; no overlapping sounds).
Measure: Eye movements (proportion of fixations on each object)
Results: Rhymes compete! People were distracted by both beetle and speaker at some point. This led to modifications of the Cohort model.
This is helpful since word boundaries are not always reliably detected and the word onset may not be available (e.g., because of noise).
What is meant by Access to meaning? (Swinney, 1979)
■ Ambiguous context
■ “Rumour had it that, for years, the government building had been plagued with problems. The man was not surprised when he found bugs in the corner of his room.”
■ Disambiguating context
■ “Rumour had it that, for years, the government building had been plagued with problems. The man was not surprised when he found several spiders, cockroaches and other bugs in the corner of his room.”
Swinney (1979):
Task: lexical decision on the target word
Presentation point of the written target: early or late (200ms later)
Targets:
ANT (related to dominant meaning)
SPY (related to non-dominant meaning)
SEW (unrelated control)
What were Swinney’s (1979) findings?
■ Different meanings are initially activated: contextual info is not used to determine which words are considered for recognition
■ Contextual information is critical for the selection of the appropriate meaning amongst the activated alternatives.
* = significantly faster lexical decision compared to the response to the unrelated target word (SEW)
What procedure did Marslen-Wilson, Brown, & Tyler (1988) use to study context effects (in word monitoring)?
Task: Listening to sentences and monitoring for specific words
Results:
1. Word in isolation: guitar ~300ms
2. Normal: The boy held the guitar. ~240ms (quicker due to contextual information).
3. Pragmatically anomalous: The boy buried the guitar. ~268ms
4. Semantically anomalous: The boy drank the guitar. ~291ms
5. Categorically anomalous: The boy slept the guitar. ~320ms
-Multiple types of contextual information are integrated during spoken word recognition
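The size of each context effect is easier to see as a difference from the normal sentence. The sketch below uses only the approximate latencies listed above; the variable names are illustrative.

```python
# Approximate monitoring latencies (ms) from the results above.
latencies_ms = {
    "isolation": 300,
    "normal": 240,
    "pragmatic anomaly": 268,
    "semantic anomaly": 291,
    "categorical anomaly": 320,
}

baseline = latencies_ms["normal"]
# Cost of each condition relative to a normal sentence context.
cost_ms = {cond: rt - baseline for cond, rt in latencies_ms.items()}
print(cost_ms)
# {'isolation': 60, 'normal': 0, 'pragmatic anomaly': 28, 'semantic anomaly': 51, 'categorical anomaly': 80}
```

The cost grows from pragmatic to semantic to categorical violations, and a categorical anomaly is even slower than hearing the word with no context at all.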
Where does speech processing occur in the brain?
Bilateral activity in Heschl’s gyrus (low-level processing), STG (superior temporal gyrus) and MTG (middle temporal gyrus) for simple mapping of sound to meaning
Left-lateralised activation in the dorsal stream, and especially left IFG (inferior frontal gyrus), for more complex spoken inputs (e.g., sentences)
What is the evolutionary context for speech processing in the brain? (Gil-da-Costa et al., 2006)
Non-human primates also communicate by exchanging meaningful calls
This triggers comparable bilateral activity in the brain of a macaque
Suggests evolutionary continuity of the bilateral system that supports mapping from sound to meaning (ventral stream)
Strong dorsal connections between temporal and frontal areas in the left hemisphere are unique to humans (important for syntax).