Final Flashcards
difference between consonants and vowels
consonants have lower amplitude and are more constricted
dysarthria vowel space
reduced
does precipitous high-frequency hearing loss cause more difficulty with glides or vowels?
glides
spectrum of /l/
energy dip at frequency of side branch (anti-resonance)
formant transition
changing resonant frequencies of vocal tract due to movement of the articulators during voicing
release spectra for stops
depends on size of cavity in front of occlusion; alveolars: high freq. energy; velars: mid freq.; bilabials: low or flat (due to coupling of release with cavity BEHIND occlusion)
acoustic invariance
constant presence of acoustic cue that uniquely specifies an element of speech; Blumstein & Stevens tried to prove this with the spectra of stop bursts, but found this to be true only in 85% of bursts, so either perception relies on something other than acoustic invariance or the stop burst is just one of many acoustic cues used
formant transitions for CV
F1 always rises (vowels more open); F2 (tongue movement) rises, falls, or stays flat
VOT
time between release of occlusion and onset of voicing: negative if voicing starts before release, zero if simultaneous (mostly in English), positive if voicing starts after release
voiced/voiceless stops in English
voiced VOT usually 25 ms or less; voiceless 40 ms or more
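A minimal sketch of the two VOT cards above, assuming event times are read off a waveform or spectrogram in milliseconds. The function names and the "ambiguous" label for the gap between 25 and 40 ms are assumptions made for illustration, not part of the course material.

```python
def vot_ms(release_ms, voicing_onset_ms):
    """VOT = time of voicing onset minus time of release (both in ms).

    Negative when voicing starts before the release, zero when they
    coincide, positive when voicing starts after the release.
    """
    return voicing_onset_ms - release_ms


def classify_english_stop(vot):
    """Label a stop using the thresholds from the card above.

    voiced:    VOT of 25 ms or less (negative VOT included)
    voiceless: VOT of 40 ms or more
    Anything in between is labelled 'ambiguous' (an assumption of this sketch).
    """
    if vot <= 25:
        return "voiced"
    if vot >= 40:
        return "voiceless"
    return "ambiguous"


# Hypothetical measurements: release at 100 ms, voicing begins at 110 ms.
print(classify_english_stop(vot_ms(100, 110)))  # voiced (VOT = 10 ms)
print(classify_english_stop(vot_ms(100, 165)))  # voiceless (VOT = 65 ms)
```

The cutoffs are the "usually" values from the card, so real tokens near the boundary should not be treated as cleanly classified.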
pressure in cleft palate
can’t build up pressure behind occlusion, so trouble with stops and fricatives
difference between affricates and fricatives
rise time for amplitude onset in affricate is shorter
nasals
closed side branch forms anti-resonances of vocal tract; all resonant frequencies above anti-resonance lowered; spectral valleys in spectrum due to anti-resonances; occlusion in oral cavity and lowering of velum; overall low frequency with dominant low frequency emphasis nasal murmur
are perceptual judgments of nasality reliable when compared to nasometer?
yes
fricatives
source is turbulent airflow; spectrum determined by size and shape of cavity in front of constriction; aperiodic; amplitude determined by velocity of air molecules passing through constriction: narrow constriction impedes airflow and increases speed, so narrower=louder; contains broadband energy; fricatives are lowest amplitude element of speech
strident fricatives
z, s, zh, sh; sharper constriction and more energy (louder) in spectra; also, voiced fricatives are louder and shorter; voiced fricative has low frequency drop at beginning
how many syllables per second and sounds per second?
6-7 syllables and 10-14 sounds
Lindblom’s short term memory model for coarticulation
1. speech movements preprogrammed in short term storage; 2. storage continually updated and changed; 3. size of storage limited, so instructions become more compressed as length increases, but limit to compressibility since each segment must remain minimally distinct
prosody
changes in duration, f0, and amplitude (+timbre)
stress definition
relative salience of syllables; perceptual, not acoustic measure
cross-linguistic prosody
English predominantly strong-weak; Finnish similar to English; French has accent at end of phrase; Welsh has stress opposite that of English
Why does increasing subglottal pressure result in increasing rate of glottal pulsing?
dynamics of more forcefully separating VF + increased tension to keep VF together = louder speech and higher f0; note that these CAN be controlled separately, but often are not (for example, in question intonation f0 rises without a rise in subglottal pressure)
Lieberman’s breath group theory
utterance that occurs between two respiratory inspirations; unmarked maintains constant VF tension; marked does not
stress patterns in baby babble
not very reliable marker, but if the language is more consistent, the patterns may be acquired earlier
British vs. American mothers
British had longer duration and was less exaggerated and more consistently loud; American had higher pitch, more isolated words, more repetition, more salience (before boundary, using pauses)=American babies learning more words
Shepard’s vowel space
not acoustic, but based on perception of misidentified vowels (closer together=more likely confused); but same location as Peterson & Barney
simple target models for vowel perception
only formant values are cues (first two formant values, which may sometimes be averaged)
elaborated target models
contain an element of normalization, for example formant ratios; intrinsic normalization =vowel has sufficient acoustic info. for i.d., while extrinsic =acoustic information and other cues are used to i.d. vowel (such as information about speaker or information from surrounding speech or dialect, etc.)
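A minimal sketch of why a formant ratio can serve as intrinsic normalization. It assumes, as a simplification, that a shorter vocal tract scales every formant by the same factor. The formant values and the scale factor are hypothetical numbers chosen for illustration, not measurements.

```python
import math


def formant_ratio(f1_hz, f2_hz):
    """Return F2/F1, a cue that ignores uniform scaling of the formants."""
    return f2_hz / f1_hz


# Hypothetical talker A: F1 and F2 in Hz for one vowel.
talker_a = (300.0, 2400.0)

# Hypothetical talker B: same vowel, all formants 20% higher
# (simplifying assumption: uniform scaling from a shorter vocal tract).
scale = 1.2
talker_b = tuple(f * scale for f in talker_a)

ratio_a = formant_ratio(*talker_a)
ratio_b = formant_ratio(*talker_b)

# The raw formant values differ, but the ratio is the same for both talkers.
print(talker_a, talker_b)
print(math.isclose(ratio_a, ratio_b))  # True
```

Real talkers do not differ by a single scale factor, which is one reason extrinsic cues (the talker, the surrounding speech) are also proposed.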
priming
influence perception of vowel by preceding speech, for example by giving speech with lower F1 first, vowel then heard as “bit” (lower F1) instead of “bet”
evidence against simple and elaborated target models
from Strange and colleagues, who took out formant info. from middle of vowels, to suggest that dynamic information, specifically formant transitions, are important
dynamic specification models
vowel perception based on formant trajectories (type of transition with knowledge of where you’re going)
Stevens & House said that
formant values dependent on consonants coarticulated with
why would you want to combine three theories?
message redundancy so that all cues can converge on one phoneme, so that if you lose a cue, you can still rely on others for understanding
what has the highest frequency amplitude peak?
f0
Haskins lab
painted spectrograms in 1952 as part of developing a prosthetic reading device for blind veterans
Liberman et al. study 1952
found that same frequency burst would be judged as p, t, k depending on F2, but didn’t anticipate vowel environment (burst emphasis of stop), but then looked at transitions
Kewley-Port then found that
changing spectrum of burst over time could identify stop; but data remained inconclusive
acoustic cues for perception of stops
place: F2, changing spectral characteristics of burst over time, spectral emphasis of burst; manner: F1, transition, silent period (or voice bar), rapid onset of energy
acoustic cues for perception of all categories
place: F2; manner: F1
acoustic cues for perception of liquids and glides
place: dynamic formants, specific configuration of formant transitions (e.g., /r/ F3 drops); manner: dynamic formants (glides more than liquids)
acoustic cues for perception of nasals
place: F2; manner: murmur, antiresonances, damped frequencies above antiresonance
acoustic cues for perception of fricatives
place: spectral peak/emphasis; manner: high energy, aperiodic, broadband energy
categorical perception
identification (label stimuli along continuum) and discrimination (judge whether sounds are same or different); how we categorize different acoustic signals as the same thing; can train self to hear smaller differences, but why do so if not phonemic contrast?; insensitive to differences within phonetic category, but hypersensitive across boundary
criteria for categorical perception
sharp slope in identification at phonetic category boundary; discrimination of acoustic differences above chance across phonetic category boundaries; discrimination of acoustic differences is at chance within phonetic categories
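The three criteria above can be sketched as a simple check on identification and discrimination data. Everything numeric here is an assumption for illustration: chance is taken as 0.5 (a two-choice task), the tolerance and minimum identification jump are arbitrary, and the data are hypothetical.

```python
def is_categorical(ident, discrim, boundary, chance=0.5, tol=0.1, min_jump=0.5):
    """Check the three criteria for categorical perception.

    ident:    proportion of 'category A' labels at each continuum step
    discrim:  proportion correct for each adjacent pair (step i vs. i + 1)
    boundary: index of the pair that straddles the category boundary
    chance, tol, min_jump: illustrative thresholds (assumptions)
    """
    # 1. Sharp slope in identification at the category boundary.
    sharp_slope = abs(ident[boundary] - ident[boundary + 1]) >= min_jump
    # 2. Discrimination above chance across the boundary.
    across_above_chance = discrim[boundary] > chance + tol
    # 3. Discrimination at chance within categories.
    within_at_chance = all(
        abs(d - chance) <= tol for i, d in enumerate(discrim) if i != boundary
    )
    return sharp_slope and across_above_chance and within_at_chance


# Hypothetical six-step continuum with the boundary between steps 2 and 3.
ident = [1.00, 0.98, 0.95, 0.10, 0.03, 0.00]
discrim = [0.52, 0.50, 0.90, 0.49, 0.53]
print(is_categorical(ident, discrim, boundary=2))  # True

# Hypothetical continuous pattern: gradual labelling, uniform discrimination.
ident_cont = [1.0, 0.8, 0.6, 0.4, 0.2, 0.0]
discrim_cont = [0.75, 0.75, 0.75, 0.75, 0.75]
print(is_categorical(ident_cont, discrim_cont, boundary=2))  # False
```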
are we better or worse at discriminating vowels than consonants within a category?
better
two extremes for categorical perception
speech is special (Liberman, etc.) and categorical is not categorical (Pisoni, etc.)
What is the voiced/voiceless boundary for humans?
30 ms (however, chinchillas also close)
infant speech perception and perceptual narrowing
6-8 mos. infants can easily discriminate non-native contrasts, but 10-12 mos. old perform more like adults
what happens if you insert silence between s and lit?
perceive split as long as you make the silence long enough (trading off occurring; need longer silence)
other examples of perceptual integration of acoustic cues
lower F1 onsets cue voiced stops but also require longer VOT
b and w similar acoustically
although w has a longer, more gradual transition; but if you said w faster and shortened the transition, it would sound more like b
when listeners use acoustic cues to i.d. speech, do they compensate for rate of speech?
yes; different processing (when speech is fast, listeners change percept to /w/ more quickly, compared to /b/)
motor theory
1. objects of speech perception are the intended phonetic gestures of the speaker (abstract type of invariance embedded in motor control of articulators); 2. if perception and production share same invariants, must be intimately linked (since acoustic cues aren’t sufficient)
more motor theory
different sound–same percept; same sound–different percept (same freq. burst judged as diff. stop depending on F2 value); you can override cues, for example changing silent duration in slit will lead to split
duplex perception
for motor theory–separate formants and transition, end up with chirp–speech and non-speech sounds processed differently?
black box of motor theory
acoustic signal undergoes mysterious transformation (unexplained) into intended phonetic gesture before it comes out as a phoneme
McGurk Effect
integrate visual and auditory cues into one percept that is between them; argument for perceiving phonetic gestures
quantal theory
think of this as categorical production; composed of discrete units and varies by steps, not continuously; because of this quantal nature of speech, invariant cues to speech are either acoustic or arise from processing acoustic signal in auditory system; invariance due to quantal nature of speech or quantal way that aud. system responds to speech; this theory can be tested in lab
more quantal theory
ultimate units of perception are distinctive features (ex: +/-bilabial)
quantal theory units of perception
place: burst freq. emphasis (but only 80% of time); spectrum of burst over time; nasality: spectra before closure and after and during the murmur; consonantal: change in spectrum over time (more dramatic for stop than glide); strident: high frequency energy
theories of speech perception
motor theory way on left under motor; quantal in middle; auditory on right
Can production lead to perception in infants?
Maybe. Think about timeline for expressive vs. receptive skills–similar. Also consider infants’ interest in listening to sounds; if they have one favorite babble pattern (ex: ba) and you play that vs. other sounds, they are more interested in the one they produce; if have two favorite sounds and you play two out of three, then more interested in novel sound (in same category that is in reach)–roots of phonology?
look at actual slides and
study quizzes
categorical perception, quantal theory, motor theory
1. word-final devoicing in Russian; 2. minimal pairs–note how small change has big consequences; 3. focus on placement when working with artic–want gesture to be right