Lecture 8 - Categorical Perception and Learning Flashcards by andrea shelton

Statistical Learning

Through mere exposure, we seem to learn
what kinds of things go with other kinds of things.

We learn contingencies over time.

The line between associative and non-associative learning starts to blur.

Through perceptual learning, we seem to BUILD and STORE

specific stimulus distinctions.

• These stimulus features can be used to identify and
categorize different types of things.

Once established, these feature categories become the basis for top-down perceptual processing (e.g. recognizing feathers on male and female chicks).
We have a storehouse of different objects and can filter that down to fit the environment.

Example: Perceiving breaks between words

The segmentation problem

How do you find the breaks when there's always a continuous signal?

there are no physical breaks in the continuous acoustic signal of speech. [High computational complexity]

– Top-down processing, including knowledge a listener has about a language,
affects perception of the incoming speech stimulus (parse the speech as it’s coming in).

– Segmentation is affected by context, meaning, and our knowledge of word structure.

non-associative learning

helps us see how we respond to and distinguish stimuli in the environment

perceptual learning - we become better and better at telling things apart

associative learning

finds contingencies: between two different stimuli (classical conditioning) or between a response and an outcome (operant conditioning)

just building contingencies between two things (learning language)

What kind of learning reviewed so far seems specifically

useful for speech segmentation?

Statistical learning

helps us know when the breaks are coming: knowing the probabilities of when certain syllables tend to follow other syllables

Saffran, Aslin & Newport (1996)

demonstrated that

infants can detect word boundaries with
different transitional probabilities. [innate tendency]
- we have the innate ability to track different contingencies

• A continuous stream of sounds becomes segmented.
…bidakupadotigolabubidakutupiro…
…bidaku/padoti/golabu/bidaku/tupiro…

• And this should apply to natural speech.

…lookattheprettybaby…

…look/at/the/pretty/baby…

High likelihood PRE–>TTY
High likelihood BA –> BY
Low likelihood TTY–>BA
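
The boundary-finding idea above can be sketched in a few lines of Python. This is only an illustration, not the procedure used by Saffran, Aslin & Newport (1996): the word order (`ORDER`), the function names, and the 0.75 threshold are all assumptions chosen for the example. The transitional probability is computed as TP(Y | X) = count(XY) / count(X), and a break is placed wherever it drops below the threshold.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """TP(y | x) = count(x followed by y) / count(x with a following syllable)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(x, y): n / first_counts[x] for (x, y), n in pair_counts.items()}

def segment(syllables, tps, threshold):
    """Place a word break wherever the TP between neighbours is below threshold."""
    words, current = [], [syllables[0]]
    for x, y in zip(syllables, syllables[1:]):
        if tps[(x, y)] < threshold:
            words.append("".join(current))
            current = []
        current.append(y)
    words.append("".join(current))
    return words

# The four nonsense words from the stream above, keyed by a letter.
WORDS = {
    "A": ["bi", "da", "ku"],
    "B": ["pa", "do", "ti"],
    "C": ["go", "la", "bu"],
    "D": ["tu", "pi", "ro"],
}
# Hypothetical word order (no word repeats back to back).
ORDER = "ABCADBDCBACDABDCA"

stream = [syllable for key in ORDER for syllable in WORDS[key]]
tps = transitional_probabilities(stream)
segmented = segment(stream, tps, threshold=0.75)  # threshold is an assumption

print(tps[("bi", "da")])   # within-word transition
print(tps[("ku", "pa")])   # across-word transition
print("/".join(segmented[:5]))
```

Within a word, each syllable is always followed by the same syllable, so the TP is high; across a word boundary the next syllable varies, so the TP is lower. That difference is the only cue the sketch uses.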

Perceiving features

In order to track probabilities, we need to first distinguish

basic features (e.g. syllables) of the stimulus.

have to be able to ID syllables and be able to build those categories up

Some feature detection seems to be innate.

constraints

• Frogs have ‘bug’ detectors: group of cells that detect the size and shape and movement pattern of bugs that induces them to flick out their tongues (Lettvin et al., 1959).

• Visual system has simple and complex edge detectors: straight lines, edges: occur as early as you can train the system
(Hubel & Wiesel, 1959, 1962).

• Babies have phonetic discrimination for all language
sounds up to 10 months of age.

But all of these feature detectors seem to be shaped by both experience and ‘top-down’ influences.

We have all these innate abilities to detect things in the environment, but we can shape them with top-down knowledge.
- experience dependent plasticity

Critical periods (e.g. phonetic discrimination)
Mere exposure and discrimination training
we can form many many different types of representations

How do we (as babies) initially discriminate the different

phonemes (speech sounds) that make up syllables?

Acoustic Speech Waveform 
        |
       V
Phonemes
[d]
[da] [di] [du]
Words
Don
dean
dune

babies can make discrimination from the acoustic signals that make up syllables

we pull out phonemes (smallest perceived sound from a sound signal)

phonemes can be attached to vowel sounds which creates a syllable and those syllables create words

Sound spectrograms

are often used to show changes in frequency
and intensity for speech.

– These are plotted by frequency (and amplitude) over time.

– Formants are the enhanced
(darker) bands of frequencies.

Consonants

are produced by a constriction of

the vocal tract (using the articulators).

Formant transitions

rapid changes in frequency preceding or following
consonants as you’re producing a sound

when you produce a “duh” or “buh”

This results in production of the basic unit of
speech sound – the phone.

phone

speech signal

the basic unit of
speech sound

phoneme

thing you understand

smallest unit of perceived speech stimulus that changes meaning of a word (bad vs pad). These are defined by your language.

if you change the phoneme you’re changing the meaning of the word that it’s attached to

The variability problem

there is no simple
correspondence between the acoustic signal (phones) and perceived phonemes.
- no one thing in the signal that you can “key in on”

Perceiving features in speech… is hard

Variability from context:

the acoustic signal associated with a phoneme
varies with acoustic context.

what the phoneme or phone is being attached to

coarticulation

Coarticulation:

overlap between
articulation of neighboring
phonemes causes variation in formant transitions. Yet, we still perceive the same /d/.

While you're articulating one phone, it's attached to other phones, and you're trying to articulate the next phone as well.

You're articulating all those things together at once.

You're always pairing that acoustic info with other acoustic info.

Variability from different

speakers

– Speakers differ in pitch,
accent, speed in speaking, and pronunciation.

– This acoustic signal must be
transformed into familiar
phonemes and words.

How?

One way we deal with the
variability problem is through
categorical perception.

(one of the ways)

it leads us through the valley

– This occurs when a continuum of stimulus energies (a lot of acoustic signals coming at you) is perceived as a limited number of sound categories (you don’t hear a continuous stimulus; it’s broken down).

– This can be accomplished through the use of acoustic cues (sets different syllables and phonemes apart).

acoustic cue

example

– An example of this comes from experiments on voice onset time (VOT): time delay between when a sound starts and when voicing (vocal cord vibrating) begins.

• Stimuli are /ba/ (short VOT)
and /pa/ (long VOT)

CogLab #40

VOT

You (n = 224) heard 9 different synthetic speech stimuli with a range of VOTs from short (0 ms) to long (80 ms).

• Task: What do you hear? (pa
or ba – identification).

dependent on the critical period: 10-12 months of exposure to these phonemes
Thus, we experience perceptual constancy for the phonemes within a given range of VOT.
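
To make the idea of perceptual constancy within a VOT range concrete, here is a minimal Python sketch of an identification task over nine stimuli from 0 ms to 80 ms. It assumes a logistic identification function; the boundary location (`BOUNDARY_MS`), the steepness (`SLOPE`), and the function names are hypothetical values picked for illustration, not results from the CogLab data.

```python
import math

BOUNDARY_MS = 35.0  # hypothetical phonetic boundary
SLOPE = 0.5         # hypothetical steepness of the category change (per ms)

def p_pa(vot_ms):
    """Assumed probability of reporting /pa/ for a given VOT (logistic curve)."""
    return 1.0 / (1.0 + math.exp(-SLOPE * (vot_ms - BOUNDARY_MS)))

def label(vot_ms):
    """Report the more likely category for this VOT."""
    return "pa" if p_pa(vot_ms) >= 0.5 else "ba"

vots = [step * 10 for step in range(9)]  # 9 stimuli: 0, 10, ..., 80 ms
labels = [label(v) for v in vots]

for vot, name in zip(vots, labels):
    print(f"{vot:2d} ms -> /{name}/ (p_pa = {p_pa(vot):.3f})")
```

Under these assumptions, equal 10 ms steps barely change the response inside a category but flip it between 30 ms and 40 ms, which is the pattern the card describes.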

Perhaps, as babies, we perceive basic speech information by*:

Using innate (species-specific) perceptual abilities to identify phones by acoustic cues (e.g. VOT).
Relying on mere exposure to allow these categories to become (and remain) clear.
Once we have those categories we can track which sounds go together to form words using statistical learning.
Later, improving performance when speaking using discrimination training (with operant conditioning).
highly dependent on the environment: feedback that helps train the system

phonetic boundary.

As you increase VOT, listeners do not hear the incremental changes. Instead, they hear a sudden change from /da/ to /ta/. If two stimuli are on opposite sides of the phonetic boundary, you hear two different things (a great constraint).

Is there a theoretical model that shows how this might be done (and is biologically plausible)?

McClelland & Rumelhart's (1981) Interactive Activation Model

developed a connectionist model which may account for some patterns in language learning.

• Originally developed for printed language (but can be used for acoustics as well).
• Start off: Feature detectors are activated when they match the stimulus. (Note: they can be spatially sensitive.)
- sensitive to a line of a certain orientation: if it's part of a letter, that letter node becomes active (T)
- excitatory connection: excites a T (activates the "T" words)
- inhibitory connection: L: we're not an L (don't activate the "L"s)
• They excite letter nodes when the detected feature is part of the represented object (otherwise inhibit).
• Letter nodes excite word nodes if they are a part of the word representation (otherwise inhibit).
• All letter stimuli are evaluated individually.

• R and K are equally likely letters in the fourth position, based purely on features. The D doesn’t match at the feature level.

All the letters that are part of WORK activate their nodes because they're highly likely (we can track that); individual letters are primed, or pre-activated.

• The “WORK” node is already activated and sends feedback to K to pre-activate it (priming?).
• We might explain this behaviorally, noting that R has a low probability of following R.
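
The WORK example can be sketched as a toy, one-pass calculation in Python. This is not the model's actual equations (there is no inhibition and no activation settling over time), and every number, the two-word `LEXICON`, and the `FEEDBACK` weight are hypothetical. It only shows how feedback from a word node can break the tie between K and R.

```python
# Bottom-up feature support for each letter position (hypothetical values).
# The first three letters are clear; in the fourth position K and R match the
# visible features equally well and D matches poorly.
BOTTOM_UP = [
    {"W": 1.0},
    {"O": 1.0},
    {"R": 1.0},
    {"K": 0.5, "R": 0.5, "D": 0.1},
]
LEXICON = ["WORK", "WORD"]  # hypothetical, tiny word level
FEEDBACK = 0.4              # hypothetical weight of word-to-letter feedback

def word_activation(word):
    """Average bottom-up support for the word's letters, position by position."""
    total = sum(BOTTOM_UP[i].get(ch, 0.0) for i, ch in enumerate(word))
    return total / len(word)

def letter_activation(position):
    """Bottom-up support plus feedback from words with that letter in place."""
    result = {}
    for letter, support in BOTTOM_UP[position].items():
        top_down = sum(
            word_activation(w) for w in LEXICON if w[position] == letter
        )
        result[letter] = support + FEEDBACK * top_down
    return result

fourth = letter_activation(3)
best = max(fourth, key=fourth.get)
print(fourth, best)
```

Because no word in this small lexicon ends in "WORR", R gets no feedback and keeps only its bottom-up support, while K is boosted by the active WORK node.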

Lecture 8 - Categorical Perception and Learning Flashcards

(27 cards)