Speech Acoustics Flashcards

1
Q

Voiced sounds

A
  • Voiced sounds: the vocal folds close, air pressure builds up behind them, the folds are forced open, air flows through, the folds snap shut and the cycle begins again. For voiced sounds, the airflow is therefore periodically interrupted.
2
Q

Voiceless sounds

A
  • Voiceless sounds: the vocal cords are held open throughout, so there is no periodic structure to the sound produced. These are often noise-like sounds.
3
Q

Where does air flow during speech production after passing the vocal cords?

A
  • Air flows up through the pharynx into the mouth and, for nasal sounds (where the soft palate is lowered), into the nasal cavity.
4
Q

Source filter model

A
  • The simplest description of speech production is called the source-filter model.
  • For a voiced sound, the periodic airflow from the larynx is the source and has a characteristic spectral content (for voiceless sounds, the source is turbulence).
  • This source is then modified (filtered) by the vocal tract, whose shape is set by the articulators.
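The source-filter idea can be sketched in code. This is a minimal illustration, not a realistic synthesiser: it assumes an impulse train as a crude stand-in for the glottal airflow (the source) and a cascade of two-pole resonators, one per formant, as the vocal-tract filter. The function names, sampling rate and formant values are hypothetical choices made only for the example.

```python
import math


def glottal_source(f0, fs, n_samples):
    """Impulse train with one pulse per glottal cycle: a crude stand-in
    for the periodic airflow from the larynx (the 'source')."""
    period = int(round(fs / f0))  # samples per cycle
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def formant_resonator(x, freq, bandwidth, fs):
    """Two-pole resonator: one simple way to model a single formant."""
    r = math.exp(-math.pi * bandwidth / fs)  # pole radius sets the bandwidth
    theta = 2 * math.pi * freq / fs          # pole angle sets the centre frequency
    a1, a2 = 2 * r * math.cos(theta), -r * r
    y, y1, y2 = [], 0.0, 0.0
    for sample in x:
        out = (1 - r) * sample + a1 * y1 + a2 * y2
        y.append(out)
        y1, y2 = out, y1
    return y


def source_filter(f0, formants, fs=8000, duration=0.1):
    """Source: impulse train at f0. Filter: cascade of formant resonators."""
    signal = glottal_source(f0, fs, int(fs * duration))
    for freq, bw in formants:
        signal = formant_resonator(signal, freq, bw, fs)
    return signal


# Hypothetical (frequency, bandwidth) pairs in Hz, chosen only for illustration.
speech_like = source_filter(100, [(500, 80), (1500, 100), (2500, 120)])
print(len(speech_like))  # 800 samples = 0.1 s at 8 kHz
```

Keeping source and filter as separate functions mirrors the model itself: the same source can be passed through different formant settings (different vocal-tract shapes), and the same filter can be driven by a different f0.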
5
Q

Period and frequency

A
  • Period (T) = the time required to complete one cycle.
  • Equation: T = 1/f
  • Thus frequency (f) and period (T) are related.
  • It is an inverse relationship: the longer the period, the slower the cycle and the fewer cycles per second.
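A quick worked example of T = 1/f. The frequencies used below are arbitrary example values, and the helper names are hypothetical.

```python
def period_from_frequency(f_hz):
    """T = 1/f: period in seconds for a frequency in hertz."""
    return 1.0 / f_hz


def frequency_from_period(t_s):
    """f = 1/T: frequency in hertz for a period in seconds."""
    return 1.0 / t_s


for f in (100, 200, 300):
    print(f"{f} Hz -> period {period_from_frequency(f) * 1000:.2f} ms")
# 100 Hz -> period 10.00 ms
# 200 Hz -> period 5.00 ms
# 300 Hz -> period 3.33 ms
```

Doubling the frequency halves the period, which is the inverse relationship described above.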
6
Q

Complex periodic sounds

A

Complex periodic sounds have a repetition frequency (the fundamental) and higher-frequency components at multiples of the fundamental (harmonics).
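A small sketch of a complex periodic sound: sinusoids at integer multiples of a fundamental are summed, and the result repeats at the rate of the fundamental. The 125 Hz fundamental, the harmonic amplitudes and the 8 kHz sampling rate are illustrative assumptions.

```python
import math


def harmonic_frequencies(f0, n_harmonics):
    """Frequencies of the fundamental and its harmonics: f0, 2*f0, 3*f0, ..."""
    return [f0 * k for k in range(1, n_harmonics + 1)]


def complex_periodic_wave(f0, amplitudes, fs, n_samples):
    """Sum of sinusoids at f0, 2*f0, 3*f0, ... with the given amplitudes."""
    return [
        sum(a * math.sin(2 * math.pi * k * f0 * n / fs)
            for k, a in enumerate(amplitudes, start=1))
        for n in range(n_samples)
    ]


print(harmonic_frequencies(125, 5))  # [125, 250, 375, 500, 625]
wave = complex_periodic_wave(125, [1.0, 0.5, 0.25], fs=8000, n_samples=128)
# The waveform repeats every fs / f0 = 64 samples: its repetition rate is the fundamental.
print(max(abs(wave[n] - wave[n + 64]) for n in range(64)) < 1e-9)  # True
```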

7
Q

Spectrograms

A
  • To view changes in the speech spectrum over time, we use a spectrogram.
  • This shows the acoustic energy at each frequency and time.
  • The darker the display, the more energy is present at that frequency.
  • Allows us to view the dynamic changes in the speech spectrum.
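A minimal sketch of how spectrogram values can be computed, assuming the common short-time Fourier transform approach: the signal is cut into short overlapping frames, each frame is windowed, and the magnitude spectrum of each frame is taken. The frame length, hop size and Hann window are illustrative choices, not taken from these notes; a plain DFT is used so the example needs only the standard library. In a displayed spectrogram, larger magnitudes would be drawn darker.

```python
import cmath
import math


def spectrogram(signal, frame_len=64, hop=32):
    """Short-time Fourier transform magnitudes: one row per time frame,
    one column per frequency bin from 0 Hz up to fs/2."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        frames.append([
            abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len)))
            for k in range(frame_len // 2 + 1)
        ])
    return frames


fs = 8000
# Test signal: 500 Hz for the first 50 ms, then 1500 Hz for the next 50 ms.
tone = [math.sin(2 * math.pi * (500 if n < 400 else 1500) * n / fs) for n in range(800)]
spec = spectrogram(tone)
bin_hz = fs / 64  # 125 Hz per frequency bin
for t in (0, len(spec) - 1):
    row = spec[t]
    peak_bin = max(range(len(row)), key=row.__getitem__)
    print(f"frame {t}: strongest energy near {peak_bin * bin_hz:.0f} Hz")
# frame 0: strongest energy near 500 Hz
# frame 23: strongest energy near 1500 Hz
```

Shorter frames give finer time resolution but coarser frequency resolution, which is the usual trade-off when choosing spectrogram settings.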
8
Q

Vowels on spectrograms

A
  • Relatively long duration, usually voiced, higher energy sounds.
  • Tend to be dominated by low frequency energy.
  • Characterised acoustically by peaks in the spectrum- formants.
  • Formants are produced as a result of the acoustic resonances in the vocal tract.
  • Changes in the shape of the vocal tract produce different formant frequencies, which characterise the different vowels.
  • Formants are abbreviated F1, F2, F3 (first, second and third formants).
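To connect formants with vocal-tract resonances, a common textbook idealisation (not from these notes) treats the vocal tract as a uniform tube closed at the glottis and open at the lips, which resonates at F_n = (2n - 1) c / (4 L). The 350 m/s speed of sound and the tube lengths below are assumed values. A real vocal tract is not uniform: moving the tongue and lips changes its shape and shifts the formants away from these values, which is what distinguishes vowels.

```python
def uniform_tube_formants(length_m, n_formants=3, speed_of_sound=350.0):
    """Resonances of an idealised uniform tube closed at the glottis and
    open at the lips: F_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * speed_of_sound / (4 * length_m)
            for n in range(1, n_formants + 1)]


print(uniform_tube_formants(0.175))  # about 500, 1500, 2500 Hz (neutral, schwa-like vowel)
print(uniform_tube_formants(0.145))  # shorter tube: every resonance moves up
```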
9
Q

Voice pitch on spectrograms

A
  • For many periodic sounds, we hear a subjective quantity known as pitch.
  • Generally, the shorter the period of the waveform, the higher the perceived pitch, and vice versa.
  • Fundamental frequency (and hence voice pitch) varies between men, women and children.
  • Variation in voice pitch (intonation) is used to carry linguistic information.
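Because pitch follows the period of the waveform, one common way to estimate fundamental frequency is autocorrelation: find the lag at which the signal best matches a shifted copy of itself, then convert that period to frequency with f = 1/T. This is a minimal sketch of that general technique, not a method from these notes; the 50-400 Hz search range and the synthetic two-harmonic "voice" are assumptions, and practical pitch trackers add framing, normalisation and voicing decisions.

```python
import math


def estimate_f0_autocorrelation(x, fs, fmin=50.0, fmax=400.0):
    """Return fs / lag for the lag (in samples) with the largest
    autocorrelation inside the plausible pitch range [fmin, fmax]."""
    min_lag = int(fs / fmax)  # shortest period considered
    max_lag = int(fs / fmin)  # longest period considered
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[n] * x[n + lag] for n in range(len(x) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag


fs = 8000
# Synthetic voiced sound: 200 Hz fundamental plus a weaker harmonic at 400 Hz.
voice = [math.sin(2 * math.pi * 200 * n / fs) + 0.5 * math.sin(2 * math.pi * 400 * n / fs)
         for n in range(800)]
print(estimate_f0_autocorrelation(voice, fs))  # 200.0
```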
10
Q

Vowel transitions

A
  • Formants stay at the same frequency for a steady-state vowel.
  • This rarely happens in running speech.
  • Usually the formant frequencies are changing from one sound to the next.
  • E.g. diphthongs, where the formants move from one vowel position towards another.
11
Q

Context dependency

A
  • When vowels follow or precede different consonants, the vocal tract changes shape during part of the vowel.
  • The vocal tract has one shape for the consonant and then has to change shape for the vowel.
12
Q

Fricatives

A
  • Fricatives /s/ /f/ /ʃ/ and their voiced equivalents /z/ /v/ /ʒ/
13
Q

Acoustics of consonant contrasts: fricatives

A
  • Aperiodic signal occurring as a result of turbulence at a constriction in the vocal tract.
  • Frication extends from about 1 kHz upwards, reaching as high as 8-10 kHz for /s/.
  • Frication differs in intensity, frequency range and duration.
  • Non-sibilants (/f/, /θ/, /v/, /ð/) are weaker than their sibilant (/s/, /ʃ/, /z/, /ʒ/) counterparts.
  • Voiceless non-sibilants (/f/, /θ/) are the weakest sounds of English.
  • /s/ and /z/ have energy between 4 and 8 kHz.
  • /ʃ/ and /ʒ/ have energy at lower frequencies than /s/ and /z/.
  • /h/ is produced by aspiration turbulence rather than by fricative turbulence.
14
Q

Plosives

A

Plosives /p/ /b/ /t/ /d/ /k/ /g/

15
Q

Affricates

A

Affricates /tʃ/ /dʒ/

16
Q

Acoustics of consonant contrasts: affricates and plosives

A
  • Plosives have a temporary interruption of airflow.
  • The interruption can be seen more clearly when the plosive occurs between syllables.
  • The plosive release is associated with formant transitions as the vocal tract changes shape (e.g. a rise in F1).
  • The shape of the F2 transition is highly indicative of the place of articulation of the preceding plosive.
  • Affricates are like plosives with a more slowly opening release, so that some friction occurs.
17
Q

Nasals

A

Nasals /m/ /n/ /ŋ/

18
Q

Acoustics of consonant contrasts: nasals

A
  • The main acoustic cue to nasals is a strong nasal formant at around 300 Hz, with weak F2 and higher resonances, because energy passes through the nasal cavity rather than the buccal (oral) cavity.
  • F2 transitions are important in distinguishing between the nasals.
19
Q

Approximants

A

Approximants /w/ /r/ /y/ /l/
- The main acoustic cue for approximants is the change in formant frequencies as production moves from the consonant into the following vowel.

20
Q

Audibility and acoustic cues

A

- Many speech perception studies are carried out in ideal listening conditions.
- In the real world, listening conditions are degraded for many reasons.

21
Q

Reasons for degradation of speech

A

- Loss of frequency components of the signal
- Addition of non-speech background noise
- Addition of competing talkers
- Reverberation
- Distortion by processing devices
- Distortion by damaged ears
- Non-fluent speakers
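One of these degradations, added background noise, is often simulated by mixing noise into a signal at a chosen signal-to-noise ratio, SNR (dB) = 10 log10(P_speech / P_noise), where P is mean power. This is a minimal sketch under stated assumptions: a 200 Hz tone stands in for speech, seeded Gaussian white noise stands in for background noise, and `mix_at_snr` is a hypothetical helper name.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so that
    10 * log10(P_speech / P_noise) equals snr_db (P = mean squared amplitude)."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(v * v for v in noise) / len(noise)
    target_p_noise = p_speech / 10 ** (snr_db / 10)
    scale = math.sqrt(target_p_noise / p_noise)  # amplitude scale for the noise
    return [s + scale * v for s, v in zip(speech, noise)]


fs = 8000
speech = [math.sin(2 * math.pi * 200 * n / fs) for n in range(800)]  # stand-in for speech
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in speech]  # white Gaussian "background noise"
noisy = mix_at_snr(speech, noise, snr_db=0.0)  # 0 dB: equal speech and noise power
```

Lower (or negative) SNR values make the task harder for the listener; reverberation, by contrast, is usually simulated by convolving the signal with a room impulse response rather than by adding noise.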

22
Q

Redundancy

A
  • The problems of variability and of listening to speech in difficult conditions are made easier because the speech signal contains a lot of redundant information.
  • In other words, there is usually much more information in the signal than we need to identify the utterance.
  • This can be demonstrated by sine wave speech.
23
Q

Sine wave speech

A

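- Take natural speech and track the frequencies of the first three formants over time.
- Replace these three formants with sinusoids that vary in frequency and amplitude according to the original formants.
- Listeners can often still understand the result, even though most of the acoustic detail has been discarded.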
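The resynthesis step can be sketched as a sum of three sinusoids whose frequencies and amplitudes follow formant tracks. This is a minimal sketch under stated assumptions: the tracks below are hypothetical straight-line glides (real sine wave speech uses formant tracks measured from a recording), and `linear_track` and `sine_wave_speech` are illustrative helper names. Phase is accumulated sample by sample so that each sinusoid can glide smoothly without clicks.

```python
import math


def linear_track(start, end, n_samples):
    """Straight-line glide from start to end over n_samples (hypothetical helper)."""
    return [start + (end - start) * i / (n_samples - 1) for i in range(n_samples)]


def sine_wave_speech(tracks, fs):
    """tracks: list of (frequencies, amplitudes), one pair per formant,
    each a per-sample list. Each formant is replaced by a single sinusoid."""
    n_samples = len(tracks[0][0])
    out = [0.0] * n_samples
    for freqs, amps in tracks:
        phase = 0.0
        for n in range(n_samples):
            out[n] += amps[n] * math.sin(phase)
            # accumulate phase so the frequency can change without discontinuities
            phase += 2 * math.pi * freqs[n] / fs
    return out


fs = 8000
total = int(0.2 * fs)  # 200 ms
# Hypothetical F1-F3 glides and fixed amplitudes, for illustration only.
tracks = [
    (linear_track(300, 700, total), [1.0] * total),
    (linear_track(2200, 1200, total), [0.5] * total),
    (linear_track(2800, 2500, total), [0.25] * total),
]
signal = sine_wave_speech(tracks, fs)
```

The output has no harmonic structure at all (just three tones), which is why its intelligibility is taken as evidence that the formant pattern alone carries much of the information.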

24
Q

Non acoustic cues

A
  • Non-acoustic cues include:
  • Semantic cues (the meaning of preceding and following words, and the subject matter)
  • Syntactic cues (grammatical rules)
  • Circumstantial cues (speaker identity, listening environment)
  • Visual cues (e.g. lip reading)
25
Q

Audio-visual integration

A
  • Movements of a speaker's lips and face provide important cues for speech perception.
  • What we hear is heavily influenced by what we see.
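  • For example, the McGurk effect: when audio of /ba/ is dubbed onto video of a speaker saying /ga/, listeners often report hearing /da/.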