Speech Acoustics Flashcards
Voiced sounds
- Voiced sounds: the vocal folds shut, air pressure builds up behind them, the folds are forced open, air flows through, the folds snap shut and the cycle begins again. For voiced sounds the airflow is periodically interrupted.
Voiceless sounds
- Voiceless sounds: the vocal cords are open all the time, so there is no periodic structure to the sound produced. These are often noise-like sounds.
Where does air flow during speech production after the vocal cords?
- Air flows up through the pharynx into the mouth and, for nasal sounds (where the soft palate is lowered), into the nasal cavity.
Source-filter model
- The simplest description of speech production is called the source-filter model.
- For a voiced sound the periodic airflow from the larynx is the source and has a certain spectral content (for unvoiced sounds the source is the turbulence).
- The source is then modified (filtered) by the vocal tract, whose shape is set by the articulators.
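As a rough illustration of the source-filter idea, the sketch below passes a periodic pulse train (standing in for the voiced source) through a single two-pole resonator (standing in for one vocal tract resonance). It is a deliberate simplification: the function names, the 100 Hz source, and the 500 Hz resonance with 100 Hz bandwidth are all illustrative assumptions, and a real vocal tract has several resonances.

```python
import math


def pulse_train(f0_hz, sample_rate, n_samples):
    """Source: one unit pulse per period, a crude stand-in for voicing."""
    period = int(round(sample_rate / f0_hz))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def resonator(signal, centre_hz, bandwidth_hz, sample_rate):
    """Filter: two-pole resonator, y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius
    a1 = 2.0 * r * math.cos(2.0 * math.pi * centre_hz / sample_rate)
    a2 = -r * r
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


SAMPLE_RATE = 8000  # Hz, assumed for the example
source = pulse_train(100.0, SAMPLE_RATE, 400)            # periodic source
output = resonator(source, 500.0, 100.0, SAMPLE_RATE)    # filtered source
```

The output keeps the periodicity of the source (it repeats every 80 samples here), while its spectral shape is set by the resonator.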
Period and frequency
- Period = time required to complete one cycle.
- Equation: T = 1/f
- Thus frequency (f) and period (T) are related.
- It is an inverse relationship: the larger the period, the slower the cycle and the fewer the cycles per second.
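A quick worked illustration of T = 1/f is sketched below. The function names and the example frequencies (100 Hz and 250 Hz) are illustrative assumptions, not values from these cards.

```python
def period_from_frequency(f_hz: float) -> float:
    """Return the period T in seconds for a frequency f in Hz (T = 1/f)."""
    if f_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1.0 / f_hz


def frequency_from_period(t_s: float) -> float:
    """Return the frequency f in Hz for a period T in seconds (f = 1/T)."""
    if t_s <= 0:
        raise ValueError("period must be positive")
    return 1.0 / t_s


# Hypothetical example values: a lower and a higher frequency.
for f in (100.0, 250.0):
    t_ms = period_from_frequency(f) * 1000.0
    print(f"f = {f:.0f} Hz -> T = {t_ms:.1f} ms")
```

The higher frequency gives the shorter period, which is the inverse relationship described above.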
Complex periodic sounds
- Complex periodic sounds have a repetition frequency (the fundamental) and higher-frequency components at integer multiples of the fundamental (harmonics).
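The relationship between the fundamental and its harmonics can be sketched by summing sinusoids at multiples of the fundamental. The 100 Hz fundamental, the three amplitudes and the 8000 Hz sample rate are assumed values chosen only for illustration.

```python
import math


def harmonic_frequencies(f0_hz, n_harmonics):
    """Frequencies at f0, 2*f0, 3*f0, ... (the fundamental counts as the first)."""
    return [f0_hz * k for k in range(1, n_harmonics + 1)]


def complex_tone(f0_hz, amplitudes, sample_rate, n_samples):
    """Sum one sinusoid per harmonic; amplitudes[0] belongs to the fundamental."""
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        value = sum(
            a * math.sin(2.0 * math.pi * f0_hz * k * t)
            for k, a in enumerate(amplitudes, start=1)
        )
        samples.append(value)
    return samples


F0 = 100.0          # Hz, assumed fundamental
SAMPLE_RATE = 8000  # Hz, assumed
wave = complex_tone(F0, [1.0, 0.5, 0.25], SAMPLE_RATE, 400)
period_samples = int(SAMPLE_RATE / F0)  # samples in one cycle of the fundamental

print(harmonic_frequencies(F0, 3))
```

Although the wave contains three components, it repeats once per period of the fundamental, so `wave[n]` matches `wave[n + period_samples]` up to rounding error.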
Spectrograms
- To view the changes in speech spectrum through time we use a spectrogram.
- This shows the acoustic energy at each frequency and time.
- The darker the display, the more energy is present at that frequency.
- Allows us to view the dynamic changes in the speech spectrum.
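One way to see how a spectrogram is built is to cut the signal into short frames and take the spectrum of each frame. The sketch below does this with a plain DFT from the standard library; it is a teaching sketch, not how analysis software implements it. The test signal (a tone that jumps from 500 Hz to 1500 Hz), the frame length and the sample rate are all assumptions, and spectral magnitude is used as a stand-in for energy.

```python
import cmath
import math


def frame_spectrum(frame):
    """Return DFT magnitudes (bins 0..N/2) of one Hann-windowed frame."""
    n = len(frame)
    windowed = [
        x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]
    magnitudes = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(windowed)
        )
        magnitudes.append(abs(acc))
    return magnitudes


def spectrogram(signal, frame_len, hop):
    """Return one magnitude spectrum per frame (rows = time, columns = frequency)."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    return [frame_spectrum(signal[s:s + frame_len]) for s in starts]


def peak_hz(spectrum, sample_rate, frame_len):
    """Frequency of the strongest bin, i.e. the darkest point in that time slice."""
    k = max(range(len(spectrum)), key=spectrum.__getitem__)
    return k * sample_rate / frame_len


SAMPLE_RATE = 8000  # Hz, assumed
FRAME_LEN = 64      # samples per frame, so bins are 125 Hz apart
# Hypothetical test signal: 500 Hz for the first half, 1500 Hz for the second.
signal = [
    math.sin(2.0 * math.pi * (500.0 if n < 256 else 1500.0) * n / SAMPLE_RATE)
    for n in range(512)
]
spec = spectrogram(signal, FRAME_LEN, hop=FRAME_LEN)
for i, frame_mags in enumerate(spec):
    print(i, peak_hz(frame_mags, SAMPLE_RATE, FRAME_LEN))
```

The strongest frequency moves from the lower tone to the higher tone partway through the frames, which is the kind of change over time that a spectrogram makes visible.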
Vowels on spectrograms
- Relatively long duration, usually voiced, higher energy sounds.
- Tend to be dominated by low frequency energy.
- Characterised acoustically by peaks in the spectrum, called formants.
- Formants are produced as a result of the acoustic resonances in the vocal tract.
- Changes in the shape of the vocal tract produce different formant frequencies, which characterise the different vowels.
- Formants are abbreviated here as f1, f2 and f3.
Voice pitch on spectrograms
- For many periodic sounds, we hear a subjective quantity known as pitch.
- Generally, the shorter the period of the waveform, the higher the perceived pitch, and vice versa.
- Fundamental frequency (and hence voice pitch) varies between men, women and children.
- Voice pitch (intonation) is used to carry linguistic information.
Vowel transitions
- Formants stay at the same frequency for a steady state vowel.
- This rarely happens in speech.
- Usually the formant frequencies are in a state of change from one sound to the next.
- Eg diphthongs.
Context dependency
- When vowels follow or precede different consonants, the vocal tract changes shape during part of the vowel.
- The vocal tract has one shape for the consonant and then has to change shape for the vowel.
Fricatives
- Fricatives /s/ /f/ /ʃ/ and the voiced equivalents /z/ /v/ /ʒ/
Acoustics of consonant contrasts: fricatives
- Aperiodic signal occurring as a result of turbulence at the constriction in the vocal tract.
- Frication extends from 1 kHz upwards, to as high as 8-10 kHz for /s/.
- Frication has different intensities, frequency ranges and durations.
- Non-sibilants (f, θ, v, ð) are weaker than their sibilant (s, ʃ, z, ʒ) counterparts.
- Voiceless non-sibilants (f, θ) are the weakest sounds of English.
- /s/ and /z/ have energy between 4 and 8 kHz.
- /ʃ/ and /ʒ/ have energy at lower frequencies.
- /h/ is produced by aspiration turbulence rather than by fricative turbulence.
Plosives
- Plosives /p/ /b/ /t/ /d/ /k/ /g/
Affricates
- Affricates /tʃ/ /dʒ/
Acoustics of consonant contrasts: affricates and plosives
- Plosives have a temporary interruption of airflow
- Can see interruption more clearly when plosive is between syllables.
- Plosive release associated with formant transitions as vocal tract changes shape (eg f1 rise)
- Shape of f2 transition is highly indicative of the place of articulation of the preceding plosive.
- Affricates are like plosives with a more slowly opening release so that some friction occurs.
Nasals
- Nasals /m/ /n/ /ŋ/
Acoustics of consonant contrasts: nasals
- The main acoustic cue to nasals is a strong nasal formant at around 300 Hz, with F2 and higher resonances weakened because energy passes through the nasal cavity rather than the buccal (oral) cavity.
- F2 transitions important in distinguishing between nasals.
Approximants
- Approximants /w/ /r/ /j/ /l/
- The main acoustic cue for approximants is the change in formant frequencies as production moves from the consonant into the following vowel.
Audibility and acoustic cues
- Many speech perception studies are carried out in ideal conditions.
- In the real world there are many reasons why listening conditions are degraded.
Reasons for degradation of speech
- Loss of frequency components of signal
- Addition of non-speech background noise
- Addition of competing talkers
- Reverberation
- Distortion by processing devices
- Distortion by damaged ears
- Non-fluent speakers
Redundancy
- The problems of variability and listening to speech in difficult conditions are made easier because the speech signal contains a lot of redundant information.
- In other words, there is usually much more information in the signal than we need to identify the utterance.
- This can be demonstrated by sine wave speech.
Sine wave speech
- Take natural speech and extract only the frequencies of the first three formants.
- Replace these three formants with sinusoids that vary in frequency and amplitude according to the original formants.
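The second step above can be sketched as follows, assuming the formant tracks have already been measured. The track values below are made-up numbers for illustration, not measurements from real speech, and the function name, frame duration and sample rate are likewise assumptions. Extracting formants from natural speech is not shown.

```python
import math


def sine_wave_speech(formant_tracks, frame_dur_s, sample_rate):
    """Sum one sinusoid per formant track.

    formant_tracks: list of tracks, each a list of (freq_hz, amplitude)
    pairs, one pair per analysis frame.
    """
    n_frames = len(formant_tracks[0])
    samples_per_frame = int(round(frame_dur_s * sample_rate))
    phases = [0.0] * len(formant_tracks)  # running phase of each sinusoid
    out = []
    for frame in range(n_frames):
        for _ in range(samples_per_frame):
            value = 0.0
            for i, track in enumerate(formant_tracks):
                freq, amp = track[frame]
                # Accumulate phase so frequency changes do not cause jumps.
                phases[i] += 2.0 * math.pi * freq / sample_rate
                value += amp * math.sin(phases[i])
            out.append(value)
    return out


# Hypothetical tracks for the first three formants over four frames.
tracks = [
    [(300.0, 1.0), (400.0, 1.0), (500.0, 1.0), (600.0, 1.0)],
    [(2200.0, 0.5), (2000.0, 0.5), (1800.0, 0.5), (1600.0, 0.5)],
    [(2900.0, 0.25), (2800.0, 0.25), (2700.0, 0.25), (2600.0, 0.25)],
]
samples = sine_wave_speech(tracks, frame_dur_s=0.01, sample_rate=8000)
print(len(samples))
```

The result contains only three time-varying sinusoids, with none of the harmonic structure of natural speech.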
Non-acoustic cues
- Non-acoustic cues include:
- Semantic cues (the meaning of preceding and following words and the subject matter)
- Syntactic cues (grammatical rules)
- Circumstantial cues (speaker identity, listening environment)
- Visual cues (eg lip reading)
Audio-visual integration
- Movements of a speaker's lips and face provide important cues for speech perception.
- What we hear is heavily influenced by what we see.
- For example the McGurk effect.