Unit 4 Flashcards
Consonant production misc.
A consonant is produced with a constricted or closed vocal tract
For almost all consonants (all except nasals and semi-vowels), the sound that distinguishes one place of articulation from another is aperiodic noise. As we change the place where the constriction occurs and the amount of constriction, the aperiodic noise takes on different characteristics
Voiceless noise is generated by forcing air through a small opening
Resonant consonants are periodic and extremely vowel like
Only vowels and some semi-vowels can serve as the nucleus of a syllable
As a rule, you cannot have a syllable that is made up of just a consonant
Some consonants can sometimes be a nucleus
Acoustic features of consonants
Creating a relatively closed vocal tract by constricting the tract somewhere along the tube
Air going from the large part of the tube is forced into a small opening, where it speeds up; as it comes out the other side and hits the pressure there, the air particles start to spin randomly (called eddies). The swirling is called turbulence (there is no periodicity to it)
The aperiodic sound is like noise (shhhh); the characteristics of the noise depend on several factors:
- Place: where the constriction occurs in the vocal tract
- Degree: how tightly the space is constricted (air pushed through a small pinhole vs. flowing like air under a door); t and k have complete constriction, sh and f do not
- Duration: how long the air is pushed through the constriction
- Voicing overlay: turning the voice off and on gives a combination of voicing and noise layered on top of each other
- Rise time: how rapidly you go from 0 amplitude to the maximum amplitude of the consonant (plosives such as p, t, b have short rise times; fricatives such as sh have very long rise times; affricates are in the middle)
- Formant transitions: which consonants are being used?
Semi-vowels are not obstruents and are produced a little differently than obstruent (pressure) consonants
Nasals and semi-vowels are part consonant and part vowel
Pressure consonants are primarily responsible for the listener figuring out what is being said
By losing or distorting consonants, you lose meaning and intelligibility
Stop Gap
A stop gap occurs in initial position, when you just start to create the p; you can also have a stop gap in the middle of a word (buttercup) and at the end
In order to produce the plosive you have to completely obstruct the flow of air
Stop Gaps in connected speech
Depending on which consonant you're producing, your speaking rate, and the sound that follows, the stop gap can vary from 50-150 msec
If it lasts more than 150 msec, it will sound distorted
You can create pulses from buildups of pressure
There is a voice bar for "raggedy" because g is voiced and we leave voicing on; in "buttercup" we turn voicing off
We always take the path of least resistance
Do we voice voiceless sounds if they precede a vowel?
Stops: noise burst
Just anterior to the point of constriction, eddies are created; because of the built-up pressure, they get big enough that you can actually hear them. It is a sudden explosion of pressure as the air escapes the constriction.
The burst can range from 5 to 40 msec.
On average, the duration of a plosive burst is about 10 msec and the rise time to maximum amplitude is about 10 msec
The duration and rise time allow us to determine whether it is a stop
Burst frequencies are determined by where the stop is produced in the oral cavity.
There's more energy between 500 and 1500 Hz for bilabials
If we move the tongue back to the alveolar position, the primary energy is above 4000 Hz
If we go all the way back to velar, the spectrum comes down a little bit; the energy is between 1500 and 4000 Hz
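The three burst ranges above can be sketched as a small lookup. This is only a study aid: the function name is made up, and the notes do not say which place owns the exact boundary values (1500 Hz and 4000 Hz), so the handling of those edges is an assumption.

```python
def place_from_burst_peak(peak_hz: float) -> str:
    """Map the peak-energy frequency of a stop burst (Hz) to a place of articulation.

    Ranges come from the notes: bilabial 500-1500 Hz, velar 1500-4000 Hz,
    alveolar above 4000 Hz. Edge handling at 1500 and 4000 Hz is an assumption.
    """
    if 500 <= peak_hz <= 1500:
        return "bilabial"
    if 1500 < peak_hz <= 4000:
        return "velar"
    if peak_hz > 4000:
        return "alveolar"
    return "unknown"  # below 500 Hz is not covered by the notes


if __name__ == "__main__":
    for hz in (1000, 2500, 4500):
        print(hz, "Hz ->", place_from_burst_peak(hz))
```

Real bursts overlap more than this, so the lookup is for memorizing the ranges, not for analyzing recordings.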
Stops: Voice onset time
Now we are going to take that plosive burst and combine it with a following vowel
Since the vowel is voiced, we call this the voice onset time (VOT): the time between the plosive burst and the onset of voicing for the following vowel. Very critical feature in the speech of all languages. A primary acoustic cue for whether that plosive is voiced or voiceless. If the delay is between 40-80 msec long, you'll hear it as p and say it's a voiceless plosive. However, if the time is very short, between -10 and +20 msec, you'll hear it as voiced b. You can have a -10 by beginning voicing before you release. The slower you speak, the longer the VOTs will be.
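A minimal sketch of the VOT ranges, assuming the burst time and the voicing onset time have already been measured in msec. The function names and the "ambiguous" label for values outside the two stated ranges are illustrative assumptions, not part of the notes.

```python
def vot_ms(burst_ms: float, voicing_ms: float) -> float:
    """VOT = voicing onset minus burst release; negative means voicing started first."""
    return voicing_ms - burst_ms


def voicing_from_vot(vot: float) -> str:
    """Label a plosive from its VOT using the ranges in the notes.

    -10 to +20 msec -> voiced (e.g., b); 40 to 80 msec -> voiceless (e.g., p).
    Anything else is labeled "ambiguous" (an assumption; the notes give no label).
    """
    if -10 <= vot <= 20:
        return "voiced"
    if 40 <= vot <= 80:
        return "voiceless"
    return "ambiguous"


if __name__ == "__main__":
    print(voicing_from_vot(vot_ms(burst_ms=100.0, voicing_ms=160.0)))  # VOT = 60 msec
    print(voicing_from_vot(vot_ms(burst_ms=100.0, voicing_ms=95.0)))   # VOT = -5 msec
```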
VOTs develop and improve with age, and the timing becomes more precise.
Children are able to perceive these acoustic timing relationships, which are important in understanding the development of speech and language.
Children and adults who stutter have longer VOTs
The time the lips are closed and pressure is building up is the stop gap
Maximum amplitude occurs for frequencies in a specific range, around 500 Hz
When you hear the maximum amplitude and the 10 msec duration, your ear will know it's a stop
You're training your ear to hear energy above 4,000 Hz
Which plosives will a client not hear with a hearing loss (HL) above 3,000 Hz? /tit/ vs /k/, /p/ [t and d have more energy above 4,000 Hz]
Can distinguish between p and b because of their different voice onset times. Voiceless: 40-80 msec
Categorical perception of VOT and consonants
Brain processes and categorizes consonants and vowels differently
Consonants are processed categorically, but there are limits at their acoustic boundaries, where you can blend from one category to another
Vowels are processed continuously
Stops: Aspiration during voice onset time
Description: Audible release of air between noise burst and following vowel.
VOT for aspirated consonant typically longer. Sounds “breathy”.
Voiced-voiceless cue
Formant Transitions and Consonant Perception: stops
Once the vocal folds start vibrating, you will see the bending of the formants
Formant transitions provide a great deal of information about what the vowel and consonant are
The transition of formant 1 gives your brain information about the manner of production (stop). The formant will always start low and will usually curve up. More constriction: F1 starts at a lower point and will always be going up
Formant 2 gives information about place (e.g., whether it is bilabial): the starting point for F2 always points to the spectral energy of that consonant. The blue circle (on the slide) reflects the peak energy. For d, F2 has to go down, and the degree of bend depends on how far it has to move
Primary acoustic cue for identifying place of articulation for a non-released plosive consonant
- formant 2 bends toward the spectral energy for where the consonant would be
Primary acoustic cue for the manner of articulation for a syllable- or word-initial stop
- formant 1 starts from a lower level and moves up into position; for a terminal consonant, formant 1 goes high to low, pointing down to 0
Formants of plosives
Formants 2 and 3 always “bend” toward (or away from) the primary consonant energy to the respective formant positions for the following vowel.
F1 always bends from (CV) or towards (VC) 0 Hz
Stops: formant transitions in VC structure
The second formant will point toward where the spectral energy of the following consonant is going to be
We hear "pip" even without the last p because the formants are transitioning out
Sound is dynamic
Fricatives
Where you constrict the airflow in the oral cavity changes the length of the tube and determines where the primary spectral energy is for that fricative
Manner of production: produced by creating aperiodic random airflow
If you sustain the airflow for about 130 msec, you get a fricative-type sound
Clusters: when you put it with another sound, coarticulation makes it much shorter, about 50 msec
Phrase final: 200 msec
Duration: Average: 130 msec. Clusters (“flow”): 50 msec. Phrase final (“bath”): 200 msec. Rise time: approx. 76 msec.
Fricatives have to be in this range; lengthening isn't a primary marker, but the fricative has to be at least 50 msec or it will sound like something else
If you change the length of the tube or the placement of the constriction, you are manipulating spectral energy
Strident fricatives have energy concentrated in a smaller frequency range; diffuse fricatives have energy spread out over a wider band
Spectral energy for English fricatives
More concentration of energy in the higher frequency bands
Voiceless th gets little amplification, whereas our stridents (s, z, sh) are produced with more precision, greater power, and concentrated energy, because the air is concentrated through a small place
Know where the energy is concentrated for bilabial, velar, and alveolar (s is 4,000 Hz)
Affricates
The manner of articulation is identified by duration; affricate: between 75-130 msec (ch, dg)
Placement for the affricates is the same, with the same spectral characteristics; the difference is duration and rise time (between plosive and fricative)
Spectral Energy = similar to “sh” (>2k)
Duration = 75-130 msec
Rise Time = 33 msec (stops = 10 msec, fricatives = 76 msec)
Transitions = 75-150 msec (stops = 50-75 msec)
Spectrograms = look like fricatives, but shorter
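The rise-time cue can be sketched as a nearest-value lookup, using the average rise times given in these notes (stops about 10 msec, affricates about 33 msec, fricatives about 76 msec). Picking the closest reference value is an assumption made for illustration; the notes only give the averages, not decision boundaries.

```python
# Average rise times from the notes (msec).
REFERENCE_RISE_MS = {"stop": 10.0, "affricate": 33.0, "fricative": 76.0}


def manner_from_rise_time(rise_ms: float) -> str:
    """Return the manner whose average rise time is closest to the measured one."""
    return min(REFERENCE_RISE_MS, key=lambda m: abs(REFERENCE_RISE_MS[m] - rise_ms))


if __name__ == "__main__":
    for rise in (12.0, 30.0, 70.0):
        print(rise, "msec ->", manner_from_rise_time(rise))
```

Duration would be a second cue to combine with this one (for example, affricates at 75-130 msec), but it is left out to keep the sketch short.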
Nasals
Production
Occluded oral cavity
Split air flow and sound = anti-resonances
Voiced continuants
Consonants (degree of constriction)
Syllable nucleus = yes
Weak formants
Anti-resonances=weak formants
Nasal cavity damping = weak formants
The nasal and oral cavities both serve to resonate sound; you have to block the mouth exit for nasal speech
You divide the sound into two columns, which creates a void; so instead of having just resonances, by dividing the airstream into the nasal cavity we also get anti-resonances, which suck the energy out of the resonances (suppression of certain harmonics)
A nasal by its nature is a voiced continuant
The difference between ing and k is the dropped soft palate
All the bones are lined with mucous membrane, so you're going to get anti-resonances: take the amplitude of the resonances in the nasal cavity and then decrease it?
Also, because of the nasal cavity and its absorption characteristics, nasals have weaker formants, and the bandwidths in nasals get wider
The lowest formant is extremely low (below 500 Hz) and is what we hear as the murmur
Formant 1: very weak
Formants 2 and 3: the F2 transition will distinguish m, n, ng
Nasal emission: audible escape of air through the nasal cavity
Vowel coloring: one sound takes on the characteristics of another sound as you coarticulate; nasals have a strong impact on the nasal characteristics of vowels, so the vowels that come before nasals take on a nasal coloring
Semi-vowels (glides and liquids)
Have characteristics of both obstruents and vowels
Semi-constriction of the vocal tract combined with an open voicing portion that's more like a vowel
The articulators move to a semi-occluded vocal tract: going open to closed for “are”
Now we are going open to closed or closed to open, whereas diphthongs were open to open
Semi-vowels do not serve as a syllable nucleus, except for the vocalic, light-colored /er/ (bird)
Semi-vowels have a vowel next to them
Constriction interval needs to be less than 100 msec in order for it to be perceived as a semi-vowel
It is primarily formant 2 that carries the primary acoustic cue for the semi-vowels; for one of them (/r/) it is F3. F1 is not important
Glides (w and j)
Combination of an occlusion (of the lips, for w) and then movement from that point into whatever vowel comes next
If it is too long it sounds like 2 different sounds
F1 is not a major cue; it falls in frequency going into the constriction interval or rises coming out of it
F2 for w falls into it as well
With j, you get the opposite: the F2 transition rises into the constriction interval, which distinguishes j from the other 3 semi-vowels
The rate of the transition is crucial
/w/ (as in “we”)
b + u + transition to vowel
/j/ (as in “yes”)
[d + i] + transition
Perception of Transition Duration
40-60 msec = stop
60-100 msec = glide
>100 msec = vowel + vowel
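The transition-duration categories above can be sketched as a simple threshold function. The function name is illustrative, and assigning exactly 60 msec to "glide" and anything under 40 msec to "unspecified" are assumptions, since the notes do not say which side of each boundary is which.

```python
def percept_from_transition(duration_ms: float) -> str:
    """Map a formant transition duration (msec) to the percept listed in the notes."""
    if 40 <= duration_ms < 60:
        return "stop"
    if 60 <= duration_ms <= 100:
        return "glide"
    if duration_ms > 100:
        return "vowel + vowel"
    return "unspecified"  # under 40 msec is not covered by the notes


if __name__ == "__main__":
    for ms in (50, 80, 150):
        print(ms, "msec ->", percept_from_transition(ms))
```

This mirrors the b + u versus /w/ contrast: the same articulatory path is heard as a stop when the transition is fast and as a glide when it is slower.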
Liquids (r and l)
For l, formant 3 isn't moving much; it doesn't play a role in l
With the /er/ sound (red, rat, bird): rapidly rising or falling F3, one of the few times F3 plays a role
Dark r (red, root): serves as a consonant (tongue far back, posterior). Light r (vocalic): can serve as a syllable nucleus (early)
To teach r: put the blade of the tongue on the alveolar ridge (or make a d), phonate, and run the tongue along the roof of the mouth while phonating (sounds like l); go back as far as you can, drop the tongue, then round the lips and pop a vowel
Tongue position for l is more complex: the sound is emitted around the sides of the tongue, which gives a low first formant
“Dark” /r/ - CV (“root”): posterior tongue
“Light” /r/ - VC (“early”): palatal tongue, like high front vowels
Formant structure of /l/: complex (lateral emission of air); similar to /r/ without lowering of F3
Primary acoustic cue for manner of articulation for a word initial stop?
- F1 → will always start at its lowest level (toward 0) and move up into the formant position for the following vowel
- Final plosive: high to low
What is the primary acoustic cue for place of articulation for a non-released plosive consonant?
- F2/F3 come out of the vowel and bend toward the spectral energy for where that consonant energy would be
- The ear hears the form of the transition and fills in the blanks
Suprasegmentals
Linguistic information and structure:
Question vs statement (intonation)
Noun vs verb (stress)
Intent ( “I want YOU to walk the dog”)
Affect (emotion) of the speaker not always conveyed in the segments (words) themselves
Speech Rhythm based on variations in stress locations and pauses
Prosody, melodies of speech
Segments are individual sound categories (phonemes); we combine phonemes through blending and transitions to create meaningful utterances
The segmental characteristics of our language are part of our code
Phonemes are laid on the rhythmic prosody pattern
Acoustic features that transcend or move across these segmental boundaries to provide the melodies and rhythms of natural speech
These prosodic patterns common to our language can reflect significant amounts of information beyond just what the segments can give you
Can take the same segments, shift the stress, and convey different linguistic info (permit the noun and permit the verb)
Speech has rhythmic patterns that you cannot change
Acoustic features of suprasegmentals
Acoustic features:
- Pitch (Fo) and pitch variation
- Loudness (intensity) and loudness variations
- Duration (length)
- Pausing
Suprasegmental patterns:
- Tonation
- Intonation
- Stress/Emphasis
- Duration
- Rate
Variation in pitch during speech is intonation
Stress can impact what word is heard (permit, as in a license, vs. “I permit you to do that”)
We can change and affect where we pause
Neural processing of suprasegmentals
Depends on language and task: what is the listener trying to process?
Left Hemisphere (English) = Syntactic information for
Right Hemisphere = Prosodic Information at phrase and sentence level
Superior temporal and inferior frontal cortex
Tonal Languages (e.g., Mandarin, Vietnamese). Note: Japanese is considered to have a combination of segmental and tonal features
Supra-segmentals at the word, phrase, and sentence level carry significant syntactic information
Processed in the left hemisphere
Examples of intonation and emotions
Neutral: Fundamental frequency: neutral emotion
Disgust: more pauses, downward inflection
Uncertain
Excited: upward
Utterance-level declination of Fo
“F0 Declination” is the tendency for F0 to decrease over the course of an utterance
The vocal folds tend to speed up as the breath accelerates
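One way to put a number on F0 declination is to fit a straight line to the F0 values across the utterance and look at the slope. The sketch below uses an ordinary least-squares slope; the function name and the sample values are made up for illustration, and the notes do not prescribe any particular measurement method.

```python
def f0_slope_hz_per_s(times_s: list[float], f0_hz: list[float]) -> float:
    """Least-squares slope of F0 over time; a negative slope indicates declination."""
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_f = sum(f0_hz) / n
    # Covariance of time and F0 divided by the variance of time.
    num = sum((t - mean_t) * (f - mean_f) for t, f in zip(times_s, f0_hz))
    den = sum((t - mean_t) ** 2 for t in times_s)
    return num / den


if __name__ == "__main__":
    # Hypothetical F0 measurements (Hz) at 0, 1, and 2 seconds into an utterance.
    slope = f0_slope_hz_per_s([0.0, 1.0, 2.0], [200.0, 190.0, 180.0])
    print("declining" if slope < 0 else "not declining", slope)
```

Individual syllables can still rise locally while the overall fitted slope is negative.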
Pitch contour for statements and questions
Statement = Falling Fo
Question = Falling + Rising Fo
No (angry) is down
Uncertain is down and up
Within Utterance or Fo Contour Variations
Individual syllables may receive a slight upward inflection, whereas the overall pitch of the utterance declines or remains relatively flat
Typically associated with the length and complexity of the utterance as well as intent
Linguistic stress
Linguistic stress: putting more emphasis on certain portions of the utterance than others; to add emphasis, you put more physiological effort into making the sound
Acoustic parameters our brain uses to tell that one part of an utterance has more emphasis than another: duration (stressed syllables are longer than unstressed ones)
Fo (pitch), amplitude, and duration allow our brain to hear specific emphasis on one part of the word, and our brain translates it into some kind of meaning
Speaking rate
Speaking rate can be changed in a lot of different ways
As you speak slower, the durations of the sounds get longer, so to speak slower you can make your durations longer (typically vowels tend to be more stretched out than other sounds)
Increasing pause durations can slow down the speaking rate
When speakers get particularly fast, they tend to drop a few syllables; when they go slower, we can understand them
We can change the speed of movement; we can make the movement longer or shorter, or use released and unreleased formants
In the process of learning, speakers may often take the path of least resistance
Development of Suprasegmentals
Fetus: Responds to sound stimuli and prosody during third trimester
0-6 months: response to biologically driven needs
2-3 months: linguistic discrimination emerges based on adult prosody
6 months: production of wide range of suprasegmentals
6-12 months: learned prosodic patterns of pitch, rhythm and pausing
>12 months: integration into adult-like patterns
The sounds that are generated respond to the human's needs; information is transferred from child to parent through segmental information
0-6 months: the child can perceive tone of voice
6 months: not making phonemes yet, but creating adult-like suprasegmental patterns
>12 months: first word came first