Introduction to speech and word recognition Flashcards
Speech segmentation
the process of breaking continuous speech into individual words or meaningful units
Challenges in Speech Segmentation
A lack of clear pauses in speech.
Homophones and ambiguity - words can sound the same but have different meanings.
Variability in speech
How does speech vary across speakers?
- Variability in voice - accent, speech rate
- Variability in how clear the speech is - outside noise
- Variability in how carefully words are pronounced; some speakers reduce or drop sounds rather than saying the word in full.
- Variability in how phonemes sound depending on the phonemes around them.
assimilation
Variability in how a phoneme sounds: its pronunciation shifts to become more like the surrounding phonemes.
How do we segment words?
- using pauses
- stress patterns
- phonotactics
- prosodic cues
stress patterns
In English, the beginnings of words are often stressed.
phonotactics
Some speech sounds can occur only in certain positions within a word. E.g., /nd/ is allowed at the end of a word (‘end’) but not at the onset (‘nde’ is illegal).
prosodic cues
Syllables at the start of a word are longer than medial syllables. The length of the initial syllable is also influenced by word length (it is shorter when part of a longer word, e.g. ‘ham’ in ‘hamster’ is shorter than ‘ham’ on its own).
Stages of spoken word recognition
Activation → selection → integration
what are the two pathways of lexical activation?
serial or parallel processing
describe serial processing of language
Recognition starts only after we have heard or seen the whole word: find an exact match, then retrieve the meaning.
describe parallel processing
We try to guess the word as quickly as we can: identify the first sounds/letters and look for (partial) matches, then modify the shortlist as more input comes in. Processing is incremental, with multiple candidates considered at once.
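The shortlist idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: letters stand in for speech sounds, and `LEXICON`, `narrow_candidates` and `shortlist_over_time` are hypothetical names invented for this example, not part of any published model.

```python
# Toy lexicon; the words are illustrative assumptions, not taken from a corpus.
LEXICON = ["snow", "snore", "snail", "snack", "sun", "camel"]


def narrow_candidates(heard_so_far, lexicon=LEXICON):
    """Return every word that is still consistent with the input heard so far."""
    return [word for word in lexicon if word.startswith(heard_so_far)]


def shortlist_over_time(word, lexicon=LEXICON):
    """Yield (fragment, candidates) as each new letter of the word arrives."""
    for end in range(1, len(word) + 1):
        fragment = word[:end]
        yield fragment, narrow_candidates(fragment, lexicon)


# Watch the shortlist shrink as more of "snow" is heard.
for fragment, candidates in shortlist_over_time("snow"):
    print(fragment, candidates)
```

After "sn" the shortlist still holds four words, after "sno" it holds two, and only the full input leaves "snow" alone, which is the incremental narrowing the card describes.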
evaluate serial
+ Accuracy
- Slow, and word endings aren’t clearly marked, so it is unclear when to stop
evaluate parallel
+ Fast: you don’t have to wait for the whole of ‘snow’ once you have heard ‘sn’
- You might commit to the wrong word, and revision would then be needed
is it more likely we use serial or parallel processing of language?
parallel
What happens during the Activation stage?
Recognize phonemes (speech sounds)
Activate possible lexical candidates (words that match the sounds so far)
What happens during the Selection stage?
Choose the best-fitting word from the activated candidates
What happens during the Integration stage?
Use stored knowledge about the chosen word
Integrate it with sentence context for meaning
What is the overall process of spoken word recognition?
Spoken input → Recognize phonemes → Activate word candidates → Select the best match → Integrate into sentence meaning
what are gating studies?
Gating studies are a research method used to investigate how listeners recognize spoken words over time as they hear more of the speech signal.
how do gating studies work
Participants hear partial word fragments (e.g., “c-“, “ca-“, “cam-“).
After each fragment, they guess the word.
Researchers measure when the word is first recognized.
What are the two key measures in gating studies?
Isolation Point – The moment a listener first correctly identifies the word.
Recognition Point – The moment the listener is fully confident in their guess.
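As a rough sketch of how these two measures could be read off one participant's responses, the Python below scans a list of (guess, confidence) pairs, one per gate. The response data, the 0 to 1 confidence scale and the 0.8 threshold are all assumptions made up for illustration; real studies define "confident" in their own way.

```python
def isolation_point(responses, target):
    """Return the first gate (1-based) at which the guess matches the target."""
    for gate, (guess, _confidence) in enumerate(responses, start=1):
        if guess == target:
            return gate
    return None  # the target was never guessed


def recognition_point(responses, target, threshold=0.8):
    """Return the first gate with a correct guess at or above the confidence threshold."""
    for gate, (guess, confidence) in enumerate(responses, start=1):
        if guess == target and confidence >= threshold:
            return gate
    return None  # never recognised with enough confidence


# Hypothetical responses to the gates "c-", "ca-", "cam-", "came-", "camel".
responses = [("cat", 0.2), ("can", 0.3), ("camel", 0.4), ("camel", 0.6), ("camel", 0.9)]

print(isolation_point(responses, "camel"))    # gate of the first correct guess
print(recognition_point(responses, "camel"))  # gate of the first confident correct guess
```

With these made-up responses the isolation point falls at gate 3 and the recognition point at gate 5, so the listener identifies the word before being sure of it.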
What do gating studies reveal about word recognition?
Words can be recognized before they are fully heard.
Context speeds up recognition.
Frequent words are recognized faster than rare ones.
Words with many similar-sounding competitors take longer to recognize.
How does context affect gating studies?
Words in sentences are recognized earlier than isolated words because listeners use context clues to predict meaning.
isolation point
listener chooses target word, but with little confidence
recognition point
Listener recognises target word with confidence
uniqueness point
THEORETICALLY derived point, target word is only possibility
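Because the uniqueness point is derived from the lexicon rather than from listeners, it can be computed directly. The sketch below assumes a tiny invented lexicon and uses letters in place of phonemes; `uniqueness_point` is a hypothetical helper written for this example.

```python
def uniqueness_point(target, lexicon):
    """Return the 1-based position at which target is the only word left, or None."""
    for end in range(1, len(target) + 1):
        prefix = target[:end]
        candidates = [word for word in lexicon if word.startswith(prefix)]
        if candidates == [target]:
            return end
    return None  # another word still shares the whole of target as its prefix


# Toy lexicon, assumed for illustration only.
lexicon = ["camel", "camera", "camp", "candle", "cat", "cattle"]

for word in lexicon:
    print(word, uniqueness_point(word, lexicon))
```

In this toy lexicon "candle" becomes unique at its third letter, whereas "cat" never does, because "cattle" is still a possibility when "cat" ends.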
How does knowledge of words aid speech segmentation?
Listeners use existing word knowledge to identify word boundaries.
Example: speechsoundstendtoruntogether → Recognizing “speech” as a word helps break it apart.
How do listeners determine word boundaries in continuous speech?
Listeners rule out unlikely words to make sense of speech.
Spee → Not a word
Speech → Word ✅
Speechs → Not a word
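The same rule-out logic can be sketched as a small dictionary search in Python. This is a simplification, assuming the listener's word knowledge is just a set of spellings (`VOCAB`, chosen for this example) and that the goal is any split in which every piece is a known word; it is not a claim about how listeners actually do it.

```python
from functools import lru_cache

# Assumed mini-vocabulary standing in for the listener's word knowledge.
VOCAB = {"speech", "sounds", "sound", "tend", "to", "run", "together", "get", "her"}


def segment(stream, vocab=frozenset(VOCAB)):
    """Return one split of stream into known words, or None if no split exists."""

    @lru_cache(maxsize=None)
    def helper(start):
        if start == len(stream):
            return ()  # reached the end: nothing left to segment
        # Try longer pieces first; pieces that are not words are ruled out.
        for end in range(len(stream), start, -1):
            piece = stream[start:end]
            if piece in vocab:
                rest = helper(end)
                if rest is not None:
                    return (piece,) + rest
        return None  # no known word starts here and leads to a full split

    result = helper(0)
    return list(result) if result is not None else None


print(segment("speechsoundstendtoruntogether"))
print(segment("speechs"))  # no valid split: "s" is not a word
```

"Spee" is never accepted because it is not in the vocabulary, and "speechs" fails as a whole because accepting "speech" leaves an "s" that matches no word.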
When does context become more important in word recognition?
In noisy environments (e.g., a loud café)
With unfamiliar accents (e.g., “bear” vs. “beer”)
When the speech signal is unclear
Why is context crucial in phoneme and word recognition?
Context allows listeners to:
Predict upcoming words
Resolve ambiguities
Compensate for unclear speech
What are the two main models of spoken word recognition?
Autonomous (Independent) Model
Interactive Model
How does the Autonomous Model explain word recognition?
Bottom-up processing: Speech sounds are recognized first.
Context is only used after phoneme and word recognition.
Example: “s s s snow” → Recognize “snow” → Then activate related concepts (e.g., winter, polar bears).
How does the Interactive Model explain word recognition?
Top-down & bottom-up processing: Context and phonemes interact.
Listeners use prior knowledge to predict and interpret sounds.
Example: “s s s” in a winter discussion → Predict “snow” before full word is spoken.
What is an example of top-down processing in spoken word recognition?
Hearing an unclear word in background noise but recognizing it based on context.
Example: “b–r” in a bar or zoo setting → “beer” or “bear” depending on context.
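One way to picture this interaction is to multiply how well each word fits the unclear sound by how expected it is in the setting. The sketch below is a simple Bayesian-style illustration, not a model named in these notes, and every number in it is invented for the example.

```python
def combine(acoustic_fit, context_expectation):
    """Weight each word's acoustic fit by its contextual expectation and normalise."""
    scores = {
        word: fit * context_expectation.get(word, 0.0)
        for word, fit in acoustic_fit.items()
    }
    total = sum(scores.values())
    if total == 0:
        raise ValueError("no candidate is supported by both sound and context")
    return {word: score / total for word, score in scores.items()}


# The unclear "b-r" fits both words equally well (assumed values).
acoustic_fit = {"beer": 0.5, "bear": 0.5}

# Assumed expectations for each setting.
bar_context = {"beer": 0.9, "bear": 0.1}
zoo_context = {"beer": 0.1, "bear": 0.9}

in_bar = combine(acoustic_fit, bar_context)
in_zoo = combine(acoustic_fit, zoo_context)
print(max(in_bar, key=in_bar.get))
print(max(in_zoo, key=in_zoo.get))
```

The sound is identical in both cases; only the context term changes, and that alone flips the preferred word from "beer" to "bear".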
In Warren (1970), participants were asked where they heard the cough/buzz and whether that sound replaced a phoneme. What did the study find?
- Participants reported hearing all the phonemes and mislocated the cough/buzz
- People ‘hear’ the full word despite cough/buzz
phoneme restoration effect?
Listeners “fill in” a speech sound that has been replaced by noise, perceiving the original word as if the missing sound were present.
Ganong effect
A phenomenon in speech perception where listeners’ knowledge of words influences how they hear ambiguous sounds: an ambiguous sound tends to be heard as the phoneme that makes a real word. This is a top-down process.
McGurk effect
an audiovisual illusion where the brain perceives a sound different from the one heard, due to conflicting visual and auditory information, leading to a fused percept.
Are words recognised faster or slower within context?
faster when they appear in a meaningful context
“Camel” (No Context)
Takes the longest time to recognize because there’s no surrounding information to help.
“The kids rode on a camel”
Faster recognition because the sentence structure gives some expectation about the word.
“At the zoo, the kids rode on a camel”
(Long Context) → Fastest recognition because the setting (“At the zoo”) makes “camel” even more predictable.
No context, long syllable length
longest to recognise
Are words with long or short syllable lengths quicker to recognise?
short
Why do we recognise words quicker in sentences/context?
This is because of interactive processing, where context and prior knowledge play a crucial role in word recognition. If you’re reading a sentence and come across an ambiguous word (like “bat”), your brain can use the surrounding words and context to help figure out which meaning of “bat” is being used (the flying mammal or the sports equipment). Context helps you process the word more quickly and accurately because it reduces uncertainty and guides you toward the correct interpretation.