Word Recognition Flashcards

1
Q

COHORT’s assumptions (per TRACE)

A
  • Uses first sound or first CV to determine words in initial candidate set (“cohort”).
  • Eliminates words from cohort as successive phonemes arrive.
    • Phoneme-to-word inhibition
    • Words can also be eliminated based on semantic content, but initial cohort is determined by acoustics.
  • Word recognition occurs when there is a single item left in candidate set.
  • Word recognition can influence phoneme identification after the word has been recognized.
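The elimination logic above can be sketched in a few lines of Python (the toy lexicon and the use of letters as stand-in phonemes are for illustration only):

```python
def cohort_recognize(phonemes, lexicon):
    """COHORT-style recognition: the initial cohort is every word
    sharing the first segment; each later segment eliminates
    mismatching words; recognition occurs when one candidate remains."""
    cohort = {w for w in lexicon if w.startswith(phonemes[0])}
    for i in range(1, len(phonemes)):
        cohort = {w for w in cohort if w.startswith(phonemes[: i + 1])}
        if len(cohort) == 1:
            return cohort.pop(), i  # recognized at the uniqueness point
    return None, len(phonemes) - 1  # more than one candidate remains

lexicon = ["candle", "candy", "castle", "cat"]
print(cohort_recognize("candl", lexicon))  # ('candle', 4): unique at /l/
print(cohort_recognize("cand", lexicon))   # (None, 3): candle/candy remain
```

Note there is no way in this scheme to re-admit a word once eliminated, which is exactly the recovery problem raised below.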
2
Q

COHORT person

A

Marslen-Wilson, circa 1980

3
Q

Good evidence for COHORT

A

In gating, the uniqueness point usually matches the participant’s acceptance/certainty point.

Confirms COHORT’s prediction that words are recognized when just one item remains in cohort.

4
Q

COHORT’s problems (per TRACE)

A
  • Cannot cope with distortion or underspecified onsets
  • No way to recover items that were removed from cohort
    • We want listeners to reject “present” as pleasant but accept “blacelet” as bracelet.
  • Assumes listeners always know where a word’s onset is.
5
Q

COHORT’s three stages of word recognition

A
  1. Access (when the bottom-up perceptual input first activates lexical representations)
  2. Selection (narrowing down the activated candidate set)
  3. Integration (retrieve and integrate semantic/syntactic details)

COHORT

  1. accesses all items consistent with input and
  2. evaluates the multiple words in parallel,
  3. using information as it becomes available.
6
Q

COHORT’s segmentation strategy

A

Segmentation is implicit.

Utterance onset marks onset of first word.

Offset of each word marks onset of next word.

7
Q

Controversial features of TRACE

A

Spatializing time, so lots of units are duplicated.

Assuming interactive effects between layers.

8
Q

Activation in TRACE

A

It’s continuous, based on how acoustic/phonetic features map onto lexical representations.

Supports partial activation of rhymes because they are partial bottom-up matches.

9
Q

Interactivity in TRACE

A
  • Top-down connections from lexical items to phonemes.
  • The top-down connections from phonemes to feature detectors are usually disabled.
10
Q

Competition in TRACE

A

Temporally overlapping units in the phonemic and lexical layers inhibit one another.
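A minimal sketch of this competition dynamic (the update rule, the gamma parameter, and the overlap sets are invented for illustration; they are not TRACE's actual equations):

```python
def inhibit_step(activations, overlaps, gamma=0.1):
    """One update: each unit loses activation proportional to the
    summed activation of the units it temporally overlaps with."""
    new = {}
    for u, a in activations.items():
        pressure = sum(activations[v] for v in overlaps.get(u, []))
        new[u] = max(0.0, a - gamma * pressure)  # clamp at zero
    return new

# "cat" and "cap" overlap in time, so they inhibit each other;
# the starting activations are arbitrary.
acts = {"cat": 0.6, "cap": 0.4}
overlaps = {"cat": ["cap"], "cap": ["cat"]}
for _ in range(10):
    acts = inhibit_step(acts, overlaps)
# the initially stronger unit suppresses the weaker one faster
```

This "rich get richer" dynamic is what lets one lexical hypothesis win without any explicit decision rule.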

11
Q

How TRACE models time

A

Units in phoneme and lexical layers are repeated every few time slices.

It spatializes time.

12
Q

Coarticulation in TRACE

A

Input phoneme’s features are spread over 11 steps, but the centers of adjacent input phonemes are 6 steps apart.

13
Q

TRACE’s acoustic features

A
  • Acute
  • Burst
  • Consonantal
  • Diffuse
  • Power
  • Vocalic
  • Voiced

Each feature has nine levels of activation, and each level has its own feature detector at every timestep.

So there would be Voice0, Voice1, …, Voice8 feature detector units at each step.
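As a quick sanity check on the unit counts above (the unit names are illustrative, not TRACE's internal labels):

```python
# Enumerate TRACE-style feature-detector units for one time slice:
# seven acoustic features, nine activation levels each.
FEATURES = ["Acute", "Burst", "Consonantal", "Diffuse",
            "Power", "Vocalic", "Voiced"]

def detector_units(timestep):
    """One named detector per (feature, level) pair at this time slice."""
    return [f"{feat}{level}@t{timestep}"
            for feat in FEATURES for level in range(9)]

print(len(detector_units(0)))  # 63 detector units per time slice
```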

14
Q

Shortlist’s main idea

A
  • At each phoneme time-step, a shortlist of matching words is generated
  • The words in the shortlists that overlap each other compete with each other via lateral inhibition
  • Separates lexical access (shortlist formation based on match scores) from competition (overlapping words across the lists have to compete with each other)
15
Q

Shortlist’s scoring system

A
  • 1 point for each matching phoneme
  • -3 points for each mismatching phoneme
  • Strong mismatch penalty will keep mostly onset-matching items in the shortlist.
  • Rhymes will only appear in the list
    • when shortlist is sparse and
    • when there have been multiple matching phonemes to overcome initial mismatch.
  • There is a shortlist at each phoneme timestep, consisting of words with top match scores.
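The scoring rule can be sketched directly (alignment is simplified here to word-onset alignment, with letters standing in for phonemes):

```python
def match_score(candidate, input_phonemes):
    """Shortlist-style score: +1 per matching phoneme, -3 per mismatch.
    Candidate and input are aligned from the word onset; length
    differences are ignored for simplicity."""
    score = 0
    for c, p in zip(candidate, input_phonemes):
        score += 1 if c == p else -3
    return score

# After hearing "cat", the onset-matching word outscores the rhyme:
print(match_score("cat", "cat"))  # 3
print(match_score("bat", "cat"))  # -3 + 1 + 1 = -1: rhyme penalized at onset
```

The -3 penalty is why a rhyme needs several matching phonemes before it can climb into a sparse shortlist.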
16
Q

Simple recurrent networks

A
  • Learning network
  • Layers for input units, hidden units, context units, output units
  • Context units are an exact copy of the last time step’s hidden units
  • Hidden units combine information from input and previous state.
  • Interactive in the sense that the context interacts with the input units.
    • Recurrence is self-feedback
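The wiring can be sketched as one Elman-style forward step (weights are random and untrained, and the layer sizes are arbitrary; this only shows the data flow):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 8, 4
W_in = rng.normal(size=(n_hid, n_in))    # input -> hidden
W_ctx = rng.normal(size=(n_hid, n_hid))  # context -> hidden
W_out = rng.normal(size=(n_out, n_hid))  # hidden -> output

def srn_step(x, context):
    """Hidden units combine the current input with the copied previous
    hidden state (the context units); output is read off hidden."""
    hidden = np.tanh(W_in @ x + W_ctx @ context)
    output = W_out @ hidden
    return output, hidden  # hidden becomes the next step's context

context = np.zeros(n_hid)
for x in np.eye(n_in):  # feed a short one-hot input sequence
    output, context = srn_step(x, context)
```

Returning `hidden` as the next `context` is the whole recurrence: the network sees its own previous state alongside each new input.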
17
Q

Overtraining in SRNs

A
  • If you train an SRN until error asymptotes, it will not show rhyme effects
  • If you train until each target reaches a recognition threshold, rhyme effects remain intact.
  • Adults learning novel neighborhoods look like these SRNs (Magnuson et al. 2003)
18
Q

Localist representations

A

One unit for each word.

Units compete as their activations change over time.

19
Q

Distributed representations

A
  • All items are represented by a shared set of units.
  • Competition shows up in the blend of hidden representations.
  • Predicts priming effects: an ambiguous input activates a blend of competitors.
20
Q

Distributed Cohort Model

A

SRN but two output layers: phonological and lexical semantics

Distributed representation of phonological and semantic features in the hidden units

21
Q

Cross-Modal Semantic Priming

A
  • Castle-candy-sweet priming
  • Castle activates its cohort (including candy), which in turn activates semantic cohorts (including sweet).
  • Works for cohorts but not for rhymes
    • Cohorts don’t get much inhibition during start of word
    • When rhymes finally have perceptual support, the onset-cohort are strongly inhibiting them
22
Q

Naming / Repetition / Shadowing

A
  • Play a wordlike stimulus. Listener repeats word.
  • Measure accuracy and response latency.
  • Word frequency and neighborhood density influence outcome measures.
  • Unlike real recognition
    • bc it’s post-perceptual
    • bc listener may pay close attention to the speech sounds (without semantic processing).
23
Q

Lexical Decision

A
  • Play a wordlike stimulus. Listener decides whether stimulus is a word or not.
  • Measure accuracy and response latency.
  • Sensitive to
    • word frequency (more false rejects of less common words)
    • neighborhood density (probably, slower to make decision when stimulus overlaps with a lot of words).
  • Unlike real recognition bc
    • it’s a post-perceptual judgment about stimulus.
    • Also a listener can make decision w/o knowing meaning of word (I can recognize tamp but I don’t know what it means).
24
Q

Gating

A
  • Early gates yield a wide variety of guesses (many words activated)
  • Uniqueness point: when only one completion of the word remains
  • Recognition point: when listeners consistently guess the word correctly
  • Recognition can precede uniqueness, especially for higher frequency words
25
Q

DAS similarity

A
  • Neighborhood consists of all words that differ by Deletion, Addition, or Substitution of just one phoneme
  • Treats rhymes as just as neighborly as cohorts.
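A DAS neighbor check is a one-edit-distance test; a minimal sketch (letters again stand in for phonemes):

```python
def is_das_neighbor(a, b):
    """True if b differs from a by exactly one phoneme Deletion,
    Addition, or Substitution."""
    if a == b:
        return False
    la, lb = len(a), len(b)
    if abs(la - lb) > 1:
        return False
    if la == lb:  # substitution: same length, exactly one mismatch
        return sum(x != y for x, y in zip(a, b)) == 1
    # addition/deletion: longer must equal shorter after dropping one segment
    short, long_ = (a, b) if la < lb else (b, a)
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))

print(is_das_neighbor("cat", "cap"))  # True: substitution
print(is_das_neighbor("cat", "bat"))  # True: rhymes count as neighbors too
print(is_das_neighbor("cat", "at"))   # True: deletion
print(is_das_neighbor("cat", "dog"))  # False
```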
26
Q

Cohort similarity

A

Neighborhood consists of all words that share first 1-2 phonemes

27
Q

Graded similarity metric

A
  • Neighborhood based on phonetic features
  • cat/pat (1 feature diff) more similar than cab/bat (2 feature diff)
  • bull/veer are neighbors (3 feature diff)
28
Q

Frequency-Weighted Neighborhood Probability

A
  • Frequency-weighted similarity score: similarity of candidate to input times candidate’s frequency: FWSS(t,x) = f(t) * S(t,x)
  • Neighborhood probability is a candidate’s FWSS divided by the summed FWSS of the word and its neighbors
  • For DAS similarity, S(t,x) is 1 or 0, so it reduces to FWNP(t) = f(t) / ∑ f(w)
    • Proportion of neighborhood frequency contributed by the word t
  • Frequency-weighting applies a prior probability
  • If 2 words matched on neighborhood size, more frequent one is easier to recognize
  • If 2 words matched on frequency, one in less dense neighborhood is easier to recognize
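The reduced DAS form of the formula is a one-liner; the frequency counts here are made up for illustration:

```python
def fwnp(target, neighborhood_freqs):
    """Frequency-weighted neighborhood probability under DAS similarity
    (S = 1 for the word and its neighbors, 0 otherwise), so
    FWNP(t) = f(t) / sum of f(w) over the word and its neighbors."""
    return neighborhood_freqs[target] / sum(neighborhood_freqs.values())

freqs = {"cat": 80, "bat": 10, "cap": 10}  # target plus its DAS neighbors
print(fwnp("cat", freqs))  # 0.8: a frequent word dominates its neighborhood
```

Holding neighborhood size constant, raising f(t) raises FWNP(t); holding f(t) constant, adding or strengthening neighbors lowers it, matching the two predictions above.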
29
Q

Principles of spoken word recognition

A
  • Multiple activation: Multiple candidate words are activated
  • Activation of candidate word is based on
    • Similarity: degree of fit between speech and candidate
    • Priors: prior probability of word
  • Competition: Competition among candidates leads to recognition
30
Q

How to test model against data

A
  • Establish linking hypothesis between model and data
  • Figure out if success/failure is due to theory, implementation (input representation, number of units, etc.), model parameters, or linking hypothesis
31
Q

How to model frequency in TRACE

A
  1. Post-lexical (adjust choice rule that transforms activations)
  2. Resting activations
  3. Manipulate phoneme-to-word weights (effect of frequency is proportional to amount of word heard)

All three give very similar results (Dahan et al., 2001), but the bottom-up weight scheme fit the data best.

32
Q

Arguments for feedback (as in TRACE)

A
  • We see lexical effects on sublexical tasks
  • Makes model robust to noise
  • Implicitly encodes sublexical probabilities (diphone etc)
  • TRACE more accurate and faster if feedback on
  • Top-down knowledge can guide perceptual recalibration
  • Interactive activation systems can perform optimal Bayesian inference
33
Q

Arguments against feedback (in TRACE)

A
  • Lexical effects can be captured if perceptual and lexical info integrated postlexically
  • No way to increase information in raw signal, so best bottom-up signal is best guess
  • Predicts perceptual hallucinations
    • But listeners do show lexically influenced perception, like phoneme restoration or failing to notice some mispronunciations
34
Q

Magnuson, Dixon, Tanenhaus, and Aslin, 2007

A
  • Eyetracking
  • Demonstrated competition effects even though competitors were not displayed
  • Manipulated frequency, neighborhood density (one phoneme DAS), and cohort density
    • High frequency: Early, continuous facilitating effect
    • High cohort density: Early, continuous inhibitory effect
    • High neighbor density: Early facilitating, late inhibitory effect (rhymes kick in)
  • Competitor set dynamically changes as word unfolds
35
Q

Dahan, Magnuson, Tanenhaus, and Hogan, 2001

A
  • Subcategorical phonetic mismatches
  • Ne(ck)t slower than ne(p)t in eyetracking
  • Marslen-Wilson and Warren, 1994, found no RT difference for these kinds of stimuli.
    • Concluded part of “neck” could not be inhibiting “net”
  • Here, eyetracking data shows the predicted difference.
  • Compatible with lateral inhibition
36
Q

Phonotactics in TRACE

A

Implicit

The more frequently a phoneme or n-phone appears in the TRACE lexicon, the more top-down feedback it receives.

37
Q

Hannagan, et al., 2013

A
  • Implemented an activation model inspired by TRACE but without the spatial duplication of time
    • Used string kernels
    • String kernels achieve spatial invariance in visual word recognition, so they may generalize to time.
  • Layers
    • Input phonemes over time
    • 1-Phone and 2-Phone levels
    • Word level (lateral inhibition here)
    • /d/o/g/ activates /do/, /dg/, /og/ but not /gd/
  • Proof of concept
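The open-bigram idea behind the string kernel can be sketched directly (letters stand in for phonemes; the actual model's representation is richer):

```python
def open_bigrams(phonemes):
    """All ordered (open) bigrams of a phoneme sequence: every pair
    that preserves temporal order, adjacent or not."""
    return {phonemes[i] + phonemes[j]
            for i in range(len(phonemes))
            for j in range(i + 1, len(phonemes))}

print(open_bigrams("dog"))  # {'do', 'dg', 'og'} -- but never 'gd'
```

Because only order-preserving pairs are generated, "dog" and "god" get different representations without any time-specific duplication of units.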
38
Q

Temporal problems for spoken word recognition models

A
  1. Temporal order over phonemes: dog vs. god
  2. Multi-token independence problem: do vs. dude
  3. Temporal order over words: man bites dog vs. dog bites man
  4. Segmentation problem
39
Q

Phenomena simulated by TRACE

A
  • Lexical effects
    • Main Ganong effect: Xlug as /p/, Xlood as /b/
    • Time pressure eliminates effect
    • Late lexical effects (targeX as /t/). Context has stronger effect for ambiguous phonemes late in word.
    • No lexical effects for unambiguous phonemes
  • Phoneme monitoring data
    • No lexical effect on RT for word-initial targets (respond when you hear word starting with /g/)
    • Lexical effect on RT for later targets (faster on secreT vs. gulduT)
  • Lexical conspiracy: Phonotactic effects from lexical statistics
    • Xluly (X in p-t continuum). Interprets as “Xl” early on then shifts to “tl” by end, a phonotactically illegal sequence.
  • Simulating cohort and rhyme competition in eyetracking
  • Sensitivity to subcategorical mismatches
  • Lexical basis for segmentation
40
Q

The Shortlist Models

A
  • Shortlist A, Merge, Shortlist B
  • Differentiates time-specific tokens and time-invariant words
  • Generates a shortlist of tokens and lexical lattice on the fly
    • How could lattice be wired on the fly?
  • Core theoretical Assumptions
    • Prelexical to Lexical paths are feedforward only
    • Candidate selection is based on matching and mismatching information
    • Implicit segmentation via competition among candidates
41
Q

similarity in neighborhood activation model

A
  • Global similarity
  • Static set of competitors
  • Recognition RT is related to frequency-weighted neighborhood probability
    • More neighbors, harder.
    • More frequent neighbors, harder.
  • Naming, decision, recognition in noise allow global similarity to matter
42
Q

why input to spoken word models should not be phoneme units

A

They overlap too much.

Phoneme certainty is never 100% at any one point in time.

43
Q

some ways to make non-phoneme input to word recognition models

A
  1. Fake acoustics
  2. Real acoustics
  3. Simulate output of prelexical processing
44
Q

Shortlist B

A
  • Norris and McQueen 2007
  • Most radical move: Abandons activation entirely
    • What is an activation, anyway?
  • No inhibition among competing hypotheses
  • Competition reflected in differing path probabilities
  • Assumes word recognition is optimal, in a Bayesian sense
  • As ambiguity increases, prior beliefs matter more.
  • Priors require us to encode frequency.
  • Likelihood functions are more important than similarity metrics
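The competition-without-inhibition idea can be sketched as a single Bayesian update (the words, priors, and likelihood values are invented for illustration; Shortlist B's actual likelihoods come from perceptual confusion data):

```python
def posterior(priors, likelihoods):
    """Shortlist B-style competition: each word's path probability is
    prior x likelihood, renormalized. No lateral inhibition; competition
    falls out of the normalization."""
    unnorm = {w: priors[w] * likelihoods.get(w, 0.0) for w in priors}
    z = sum(unnorm.values())
    return {w: p / z for w, p in unnorm.items()}

# Ambiguous input (equal likelihoods): the frequency-based prior decides.
priors = {"cat": 0.9, "cad": 0.1}        # word frequency as prior
likelihoods = {"cat": 0.5, "cad": 0.5}   # input equally consistent with both
post = posterior(priors, likelihoods)
print(post)  # "cat" wins on its prior alone
```

When the likelihoods are sharp, they dominate and the priors barely matter; as the input gets more ambiguous, the priors take over, which is the trade-off the card describes.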