Speech Perception & Comprehension Flashcards

1
Q

SPEECH = VARIABLE

A
  • every word takes dif acoustic shape each time it’s uttered; due to:
    1) speaker (vocal tract size/regional accent/socio-economic status)
    2) articulation rate (4-5 syllables/sec in sentences)
    3) prosody (music of speech ie. rhythm/melody/amplitude)
    4) mode (voiced/whispered/creaky)
    5) coarticulation (individual phonemes influenced by preceding/upcoming segments ie. regressive/progressive assimilation)
2
Q

VISUALISING SOUND

A
  • 2 main ways:
    1) WAVEFORM
  • y-axis represents amplitude (w/0 on horizon); x-axis represents time
    2) SPECTROGRAM
  • derived from (short-time) Fourier transform of signal
  • x-axis = time; y-axis = frequency
  • colour = 3rd dimension = energy (ie. amplitude) aka. brighter = stronger
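The 2 visualisations can be sketched in Python; a minimal sketch assuming numpy/scipy, with a synthetic rising chirp standing in for real recorded speech:

```python
import numpy as np
from scipy.signal import spectrogram

fs = 16000                                 # sampling rate (Hz)
t = np.arange(0, 1.0, 1 / fs)              # 1 second of samples
wave = np.sin(2 * np.pi * (200 + 300 * t) * t)   # rising "chirp" stand-in for speech

# WAVEFORM: the raw samples themselves -> amplitude (y) against time (x).
# SPECTROGRAM: short-time Fourier transform -> time on x-axis, frequency on
# y-axis, power as the "colour" (3rd) dimension.
freqs, times, Sxx = spectrogram(wave, fs=fs, nperseg=256)

print(Sxx.shape)   # (frequency bins, time frames)
```

Plotting `Sxx` with brightness mapped to power reproduces the brighter = stronger convention described above.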
3
Q

SPEECH = QUASI-CONTINUOUS

A
  • no unique/systematic way to flag word boundaries aka. rarely silence between 2 words
  • short silences (100ms) typically correspond to vocal tract closing to produce so-called plosive/STOP consonant in “pocket”
4
Q

SPEECH = LEXICALLY AMBIGUOUS

A
  • words = made of limited number of sounds/syllables aka. embedded words = everywhere inside other words
  • ie. captain -> cap
  • ambiguity also arises due to straddling words as soon as we put 2 words together
  • ie. clean ocean -> notion
5
Q

SPEECH = AUDIOVISUAL

A
  • visual info given by lips/adjacent facial areas about articulation = integral to speech perception when available
6
Q

MCGURK & MACDONALD’S ILLUSION (1976)

A
  • visual signal should be weakly constraining for it to work aka. visual /ga/ = ^ ambiguous > visual /ba/
  • /ga/ = don’t actually see speaker closing vocal tract at back (velum)
  • so visual cues = also compatible w/ /da/
  • visual /ba/ = unambiguous as you see lips closing preventing illusion from occurring
  • visual signal must be compatible w/both back/medial closure of vocal tract (/ga/ VS /da/); conflict w/front closure implied by auditory /ba/ attracts perception towards mid-point between front/back of mouth (/da/)
    FUSION
  • /ga/ (vision) + /ba/ (audition) = /da/ (perception)
7
Q

INFO FOR IDENTIFYING WORDS

A

PHONEMES
SUPRA-PHONEMIC INFO

8
Q

PHONEMES

A
  • building blocks of vocab
  • smallest units in signal allowing meaning distinction (ie. bat/mat have 3 phonemes & differ by 1st one)
  • limited number so words are created by combining them in unlimited ways specific to language
  • English = 20 vowels & 24 consonants
9
Q

SUPRA-PHONEMIC INFO

A
  • prosody/music of speech (ie. rhythm/melody/energy) ie:
    1) lexical stress/accentuation (ADmiral/admiRAtion)
    2) tones (same string of phonemes can have dif meanings depending on pitch contour in some languages ie. ma in Mandarin (horse/mother/scold))
10
Q

SUPRA-PHONEMIC INFO: DAHAN ET AL. (2001)

A
  • carried by larger chunks > phonemes ie. syllables
  • languages vary in terms of importance of supra-phonemic info for recognising words (ie. French < English < Mandarin)
  • phonemic/prosodic info is needed for lexical distinctions BUT word recognition = also sensitive to subtle articulatory details ie. co-articulation cues
  • the way in which vowel is pronounced/sounds depends on identity of following consonant
11
Q

SPEECH = MENTAL CATEGORIES

A
  • when presented w/exemplars along continuum of syllables between 2 end-points (ie. gi-ki) we perceive whole continuum section as 1 category (ie. gi) while the other is a separate category (ie. ki) despite physical changes within each category
  • aka. step-like shift indicating category boundary at some point in continuum
  • we experience stimulus as either 1 or other BUT not as in-between aka. categorical perception
  • most obvious in consonants (ie. rapid acoustic changes) > vowels/tonal info (steadier/continuous)
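The step-like identification curve can be sketched as a toy model; the logistic form, boundary location, and slope below are illustrative assumptions, not fitted data:

```python
import math

steps = range(1, 8)            # hypothetical 7-step /gi/-/ki/ continuum
boundary, slope = 4.0, 3.0     # assumed category boundary & steepness

def p_ki(step):
    """Probability of labelling a continuum step as /ki/ (steep logistic)."""
    return 1 / (1 + math.exp(-slope * (step - boundary)))

# Percepts jump from one category to the other near the boundary rather
# than changing gradually with the acoustics.
for s in steps:
    label = "ki" if p_ki(s) > 0.5 else "gi"
    print(s, round(p_ki(s), 2), label)
```

Exemplars 1-3 are heard as /gi/ and 5-7 as /ki/ despite equal acoustic spacing: the step-like shift described above.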
12
Q

CATEGORICAL PERCEPTION IN DISCRIMINATION TASKS

A
  • can also occur in discrimination tasks
  • hearing dif between 2 adjacent exemplars in continuum is maximal at category boundary (ie. across categories) BUT at chance within category
  • category boundary lies at roughly the same location of continuum for all speakers of given language
13
Q

CATEGORICAL PERCEPTION IN CONSONANT CONTRASTS

A
  • cannot be easily demonstrated on all contrasts as you need to identify key parameters involved in contrast & latter must be easily manipulated
  • ie. voicing distinction (pa/ba; ga/ka) = regulated by 1 acoustic parameter aka. Voice Onset Time (VOT) corresponding to noisy segment from release of consonant closure (burst) up to start of periodicity in vowel
  • aka. voiced consonants (b/d/g) = shorter VOT > voiceless counterparts (p/t/k) in English
14
Q

VOICE ONSET TIME (VOT)

A
  • can be manipulated to create continuum from voiced consonant to voiceless counterpart (ie. gi VS ki) & see if perception follows progression along continuum linearly VS showing mental categories
  • pps asked if 2 stimuli adjacent on continuum = same/dif acoustically -> maximal discrimination occurs at perceptual boundary & would be at chance for all other adjacent comparisons
15
Q

WERKER & TEES (1984)

A
  • examined ability of English infants to discriminate non-native (ie. Hindi/Salish) contrasts during 1st year of life
  • cross-sectional/longitudinal approaches using conditioned head-turn paradigm
  • newborns come to life equipped to deal w/any possible phonetic contrast
  • sensitivity to non-native contrasts disappears w/exposure to language BUT native contrasts = maintained
  • aka. infants transform language-general phonetic skills -> language-specific phonological abilities via “winnowing” (aka. narrowing down) initial set of “innate” discrimination abilities
16
Q

SPEECH -> WORD MAPPING

A

DIRECTIONALITY OF LEXICAL ACCESS
ACTIVE COMPETITION BETWEEN WORDS
INTERACTIVITY FROM MEMORY -> PERCEPTION (VICE VERSA)

17
Q

DIRECTIONALITY OF LEXICAL ACCESS

A
  • auditory memories for words “open up” only if initial sound = perceived
  • left -> right processing aka. first few sounds carry most of info weight (word endings = easily “guessed”)
  • contrasts w/parallel processing in visual/orthographic modality
  • importance of Uniqueness Point
18
Q

ALLOPENNA ET AL. (1998)

A
  • used eye fixations to determine which word was being evoked by incoming speech (“and now click on the beaker”)
  • more fixations of onset-overlapping words (ie. beetle) > rhyme-overlapping words (ie. speaker)
  • so word memories = more easily evoked by initial sounds > endings (ie. directionality)
19
Q

MARSLEN-WILSON & WELSH (1978)

A

THE COHORT MODEL
STEP 1) ACTIVATION
- first sound of word activates all words in memory beginning w/said sound aka. cohort
STEP 2) DE-ACTIVATION
- words no longer match signal as it unfolds = progressively rejected from cohort
- initial cohort gets smaller as more info arrives
STEP 3) UNIQUENESS POINT
- aka. word identification just occurred
- info afterwards barely has role in word recognition; system already committed itself
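The 3 steps can be sketched as a toy winnowing loop; the word list and the letter-by-letter input are illustrative simplifications, not part of the 1978 model itself:

```python
# Toy lexicon (hypothetical) for illustrating the Cohort model's steps.
LEXICON = ["captain", "capital", "cap", "candle", "trespass"]

def cohort_trace(spoken):
    """Yield the cohort after each incoming sound (letters as a proxy)."""
    for i in range(1, len(spoken) + 1):
        prefix = spoken[:i]
        # STEPS 1-2: activate words beginning w/the signal so far, and
        # reject any that no longer match as it unfolds.
        cohort = [w for w in LEXICON if w.startswith(prefix)]
        yield prefix, cohort
        # STEP 3: uniqueness point = cohort narrowed to a single word.
        if len(cohort) == 1:
            break

for prefix, cohort in cohort_trace("captain"):
    print(prefix, cohort)
```

With this lexicon the trace stops at "capt": the uniqueness point, after which later sounds play little role because the system has already committed.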

20
Q

COMPETITION BETWEEN LEXICAL CANDIDATES

A
  • each phoneme in input can logically only belong to 1 word at a time
  • so sound-overlapping memories (ie. succeed/seed) = necessarily enemies
21
Q

INTERACTIVITY

A
  • 2 processes (A/B) said to interact if B receives A output as input & able to send result of computations back to A before A = completed
  • similar to “Larsen” acoustical effect
  • 2 conditions must be met:
    1) info must be allowed to travel both ways
    2) info should be passed to next lvl immediately, before current lvl finishes own computation (ie. cascading NOT seriality)
22
Q

THE GANONG EFFECT (1980)

A
  • lexical content dictating how preceding ambiguous phoneme should be heard = compatible w/notion that perception is not autonomous process but 1 that long-term lexical memory influences
  • Ganong effect motivated inclusion of top-down connections in TRACE (McClelland & Elman (1986))
  • BUT could be that Ganong doesn’t reflect influence of memory on perception per se but simply combination of perception + memory influencing conscious decision 1 needs to take to complete labelling task
23
Q

SPEECH STREAM -> WORD SEGMENTATION

A

LEXICAL SOLUTIONS
PRELEXICAL CUES

24
Q

LEXICAL SOLUTIONS

A

1) word offset anticipation
2) lateral inhibition between word memories
- solution proposed in Cohort model
- segmentation = by-product of word recognition
- lexical boundaries perceived as consequence of recognising words in speech

25
Q

LS: WORD OFFSET ANTICIPATION

A
  • useful for words that have their uniqueness point at/on their last sound (ie. catheDRALrenovated)
  • BUT many cases of short words embedded at start of longer words aka. clearly not ideal (ie. CATerp…/ etic VS illar?)
26
Q

LS: LATERAL INHIBITION BETWEEN WORD MEMORIES

A
  • if overlapping words = enemies & adjacent words = friends -> segmentation outcome = optimal:
    1) elected words don’t overlap by any of their sounds (aka. 1-to-1 mapping only)
    2) no phonetic segment is left unaccounted for (aka. exhaustivity)
  • “shipinquiry” = ONLY “ship” + “inquiry” despite other words fully compatible w/portions of signal (ie. in/ink/shipping)
  • BUT ship/inquiry = only 2 words w/which conditions 1 & 2 are met
  • some memories could have activation lvl pushed below baseline via lateral inhibition from overlapping competitors; contrasts w/machine parsing “recognise speech”
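Conditions 1 & 2 can be sketched as an exhaustive parse over a toy lexicon (the word list is a hypothetical fragment chosen to mirror the “shipinquiry” example):

```python
# Toy lexicon (hypothetical) containing the target words plus embeddings.
LEXICON = {"ship", "in", "ink", "inquiry", "shipping"}

def segmentations(stream):
    """All parses of a boundary-free stream into known words such that
    (1) elected words don't overlap and (2) no segment is left over."""
    if not stream:
        return [[]]        # empty remainder = one complete, valid parse
    parses = []
    for i in range(1, len(stream) + 1):
        word = stream[:i]
        if word in LEXICON:
            for rest in segmentations(stream[i:]):
                parses.append([word] + rest)
    return parses

print(segmentations("shipinquiry"))   # only ['ship', 'inquiry'] meets both conditions
```

Candidates like "in" and "ink" match portions of the signal but leave unaccounted residue, so they drop out, mirroring how lateral inhibition leaves only the compatible pair.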
27
Q

PRELEXICAL CUES (TRUBETZKOY (1939))

A
  • listeners learn that certain properties of language = associated w/presence/absence of word boundaries
  • these cues modulate lexical competition to favour certain segmentation outcomes
  • cues can be proximal (located at word boundary)/distal (further away)
  • 3 types:
    1) allophonic cues
    2) rhythmic cues
    3) phonotactic cues
28
Q

ALLOPHONIC CUES

A
  • phonemes take on particular shape/quality depending on position relative to word/syllable boundaries
    IE) ENGLISH
  • voiceless stop consonants = aspirated at syllable/word onset
  • vowels can be preceded by glottal stop at word onset (ie. grey t(h)ape VS great (?)ape)
  • speakers universally lengthen word-initial sounds (ie. great aaaape VS great tttttape)
  • syllables tend to be shorter when part of multisyllabic word compared to being a monosyllabic word in own right
29
Q

RHYTHMIC CUES (CUTLER & BUTTERFIELD (1992))

A
  • relate to beat of language & relative syllable weight
    IE) ENGLISH
  • listeners take STRONG syllables as word onsets
  • reflects that STRONG-weak = most common stress pattern in said language
30
Q

PHONOTACTIC CUES

A
  • each language has own rules on sequencing sounds inside words/syllables & on which sound occupies which position; need not be all or nothing (aka. restrictions) but based on probabilities (ie. position specific frequencies)
    IE) ENGLISH
  • “th” (ie. that) = always followed by vowel aka. must be word boundary right after “th” when consonant follows (ie. “bathe more”)
    IE) FINNISH
  • vowels = either all of same category (all back OR front) or neutral; changing back -> front (vice versa) in running speech signifies word boundary
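The English “th” cue above can be sketched as a toy detector; it uses spelling as a crude proxy for phonemes, so the vowel set and examples are illustrative only:

```python
# Toy phonotactic cue: in English, "th" (as in "that") is always followed
# by a vowel inside a word, so "th" + consonant implies a word boundary
# right after the "th" (ie. "bathe more"). Orthography stands in for sound.
VOWELS = set("aeiou")

def th_boundaries(stream):
    """Indices where a boundary is inferred right after 'th'."""
    cuts = []
    for i in range(len(stream) - 2):
        if stream[i:i + 2] == "th" and stream[i + 2] not in VOWELS:
            cuts.append(i + 2)
    return cuts

print(th_boundaries("bathemore"))  # no cut: "th" followed by vowel "e"
print(th_boundaries("bathmore"))   # cut inferred right after "th"
```

Real listeners exploit such position-specific probabilities gradiently rather than as all-or-nothing rules, as the card notes.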
31
Q

SUMMARY

A

- word recognition = strongly directional; follows unfolding of signal
- lexical competitors = words matching signal from OWN onset; don’t have to be aligned w/word onset
- segmentation can be solved by:
1) recognising words in signal (lexical solution)
2) learning that some linguistic events correlate w/presence/absence of word boundaries (pre-lexical solution)
- phoneme perception = influenced by words we know (interactivity)