SPEECH: Speech signal analysis Flashcards

1
Q

how do we analyse speech and any other sound?

A

one way is in the time domain, i.e. by looking at how the amplitude changes over time

-You can measure the rapid changes in amplitude over time (called temporal fine structure, TFS).

-You can also measure the slower fluctuations in amplitude over time (called the temporal envelope, E).

2
Q

what is temporal fine structure (TFS)?
what information does temporal fine structure (TFS) give?

A

the rapid changes in amplitude over time (the fast oscillations of the waveform)

-The fine structure carries the pitch information in speech, such as the fundamental frequency (F0) and its harmonics.
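Because the fine structure of a voiced sound repeats once per vocal fold cycle, F0 can be read off its repetition period. A minimal Python sketch using simple autocorrelation, assuming a clean synthetic vowel-like signal (harmonics 1 to 5 of 125 Hz, sampled at 8 kHz) and an assumed pitch search range of 60 to 400 Hz; `estimate_f0` is a hypothetical helper, not a robust pitch tracker.

```python
import math

def estimate_f0(signal, fs, f0_min=60.0, f0_max=400.0):
    """Return an F0 estimate (Hz) from the lag at which the signal best
    matches a shifted copy of itself (its autocorrelation peak)."""
    n = len(signal)
    min_lag = int(fs / f0_max)  # shortest period searched (samples)
    max_lag = int(fs / f0_min)  # longest period searched (samples)
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalised autocorrelation at this lag
        r = sum(signal[i] * signal[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag

# Synthetic "voiced" signal: harmonics 1..5 of 125 Hz with 1/k amplitudes
fs = 8000
f0 = 125.0
signal = [sum(math.sin(2 * math.pi * k * f0 * t / fs) / k for k in range(1, 6))
          for t in range(800)]
print(estimate_f0(signal, fs))  # 125.0 (8000 Hz / 64-sample period)
```

Real speech needs extra care (voicing detection, octave errors), but the principle is the same: the period of the fine structure gives F0.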

3
Q

what is the temporal envelope (E)?
what information does the envelope (E) give?

A

the slower fluctuations in amplitude over time

-The envelope carries information that is very important for understanding speech.
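One simple way to pull the envelope out of a waveform is to full-wave rectify it and then smooth it, so the fast fine-structure oscillations average out and only the slow fluctuations remain. A minimal Python sketch, assuming a moving-average smoother with an assumed 10 ms window (the Hilbert transform is another common choice, not shown here); `extract_envelope` is a hypothetical helper name.

```python
import math

def extract_envelope(signal, fs, window_ms=10.0):
    """Approximate the temporal envelope: rectify, then moving-average smooth."""
    w = max(1, int(fs * window_ms / 1000))  # window length in samples
    rectified = [abs(x) for x in signal]
    env, acc = [], 0.0
    for i, x in enumerate(rectified):
        acc += x
        if i >= w:
            acc -= rectified[i - w]  # drop the sample leaving the window
        env.append(acc / min(i + 1, w))  # average of the last w samples
    return env

# Test signal: a 1000 Hz carrier (fine structure) whose amplitude is
# slowly modulated at 4 Hz (envelope), sampled at 8 kHz for 1 s
fs = 8000
sig = [(1 + 0.8 * math.sin(2 * math.pi * 4 * t / fs)) * math.sin(2 * math.pi * 1000 * t / fs)
       for t in range(fs)]
env = extract_envelope(sig, fs)
# Large near the modulator peak (~62 ms), small near its trough (~187 ms)
print(round(env[540], 2), round(env[1540], 2))
```

The window has to be long compared with one carrier cycle (1 ms here) but short compared with the envelope changes you want to keep (250 ms per modulation cycle here).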

4
Q

what is a time-domain waveform?
what are its axes?

A

a visual representation of amplitude change over time

X axis: time. Y axis: amplitude.

e.g. a modulated sine wave is periodic (it repeats over time) whereas noise is aperiodic (it does not repeat).
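Periodicity can also be checked numerically: a periodic waveform matches a copy of itself shifted by one period, whereas noise matches no shifted copy. A minimal Python sketch using normalised autocorrelation, assuming a plain 200 Hz sine (standing in for a periodic sound) and seeded Gaussian noise, both at 8 kHz; `periodicity` is a hypothetical helper and the lag range is an assumed pitch range.

```python
import math
import random

def periodicity(signal, min_lag, max_lag):
    """Highest normalised autocorrelation over a range of lags.
    Near 1: the waveform repeats (periodic). Near 0: it does not (aperiodic)."""
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        a = signal[:-lag]
        b = signal[lag:]
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        if den > 0:
            best = max(best, num / den)
    return best

fs = 8000
n = 2000
sine = [math.sin(2 * math.pi * 200 * t / fs) for t in range(n)]  # period = 40 samples
random.seed(0)  # fixed seed so the result is repeatable
noise = [random.gauss(0.0, 1.0) for _ in range(n)]

# Search periods from 2.5 ms to 25 ms (400 Hz down to 40 Hz)
print(round(periodicity(sine, 20, 200), 3))   # close to 1: periodic
print(round(periodicity(noise, 20, 200), 3))  # close to 0: aperiodic
```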

5
Q

what is voice onset time (VOT)?
how do we view it?

A

Voice onset time (VOT) is the time between the release of a plosive and the beginning of vocal fold vibration.

It is a feature best viewed using a time-domain waveform.

e.g. Voiceless stop consonants (p, t, k) usually have a longer VOT than voiced stop consonants (b, d, g), and the two can easily be told apart by looking at the time-domain waveform.
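VOT itself is just a subtraction between two points marked on the waveform: the release burst and the start of periodic voicing. A minimal Python sketch, assuming both landmarks have already been marked by hand; the example times are made up, and the 25 ms voiced/voiceless boundary is an illustrative assumption (a rough figure for English word-initial stops; the boundary differs between languages). The helper names are hypothetical.

```python
def voice_onset_time(release_ms, voicing_onset_ms):
    """VOT = start of vocal fold vibration minus release of the plosive (ms).
    Negative values mean voicing started before the release (prevoicing)."""
    return voicing_onset_ms - release_ms

def classify_stop(vot_ms, boundary_ms=25.0):
    """Label a stop using an assumed, language-dependent VOT boundary."""
    return "voiceless (p, t, k)" if vot_ms > boundary_ms else "voiced (b, d, g)"

# Hypothetical landmarks (ms) read from two waveforms
for release, voicing in [(100.0, 110.0), (100.0, 170.0)]:
    vot = voice_onset_time(release, voicing)
    print(vot, classify_stop(vot))
# 10.0 voiced (b, d, g)
# 70.0 voiceless (p, t, k)
```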

6
Q

why do we show time-domain waveforms together with their corresponding frequency-domain spectra?

A

Speech sounds such as vowels differ mostly in their frequency components.

You might find them relatively difficult to characterize in the time domain, as they are all periodic and there is no clear way to describe how they differ in time.

7
Q

what does a frequency spectrum show?

A

Visual representation of amplitude change across frequency

X axis: frequency (Hz)
Y axis: amplitude
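A frequency spectrum can be computed from a waveform with a discrete Fourier transform (DFT). A minimal Python sketch, written as a naive DFT for clarity rather than speed, assuming a synthetic signal of two sine components (200 Hz and 600 Hz) sampled at 8 kHz; `magnitude_spectrum` is a hypothetical helper name.

```python
import cmath
import math

def magnitude_spectrum(signal, fs):
    """Naive DFT: returns (frequencies in Hz, amplitudes) from 0 Hz to fs/2.
    Amplitudes are scaled by 2/N so a sine of amplitude A appears as about A
    (the 0 Hz and fs/2 bins are not specially corrected)."""
    n = len(signal)
    freqs, amps = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal))
        freqs.append(k * fs / n)
        amps.append(2 * abs(coeff) / n)
    return freqs, amps

# Two sine components: 200 Hz (amplitude 1.0) and 600 Hz (amplitude 0.5)
fs = 8000
n = 400  # 400 samples -> 20 Hz between frequency bins
signal = [math.sin(2 * math.pi * 200 * t / fs) + 0.5 * math.sin(2 * math.pi * 600 * t / fs)
          for t in range(n)]
freqs, amps = magnitude_spectrum(signal, fs)
peaks = sorted(range(len(amps)), key=lambda k: amps[k], reverse=True)[:2]
for k in sorted(peaks):
    print(freqs[k], round(amps[k], 3))
# 200.0 1.0
# 600.0 0.5
```

Plotting `freqs` on the x axis against `amps` on the y axis gives the spectrum described on this card.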

8
Q

what are formants?

A

Formants are the prominent peaks observed in the smoothed spectrum of speech sounds in the frequency domain.

They represent resonant frequencies in the vocal tract and are shown as F1, F2, F3, and so on.

These formants shift in frequency depending on the spoken vowel.
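Why formants sit where they do can be illustrated with the textbook uniform-tube approximation: for a neutral vowel (schwa), the vocal tract behaves roughly like a tube closed at the glottis and open at the lips, which resonates at odd multiples of a quarter wavelength, F_n = (2n - 1) * c / (4 * L). A minimal Python sketch, assuming a 17.5 cm tract and a speed of sound of 350 m/s (typical illustrative values); real vowels shift these resonances by changing the shape of the tract.

```python
def uniform_tube_formants(length_m, n_formants=3, c=350.0):
    """Resonances of a tube closed at one end and open at the other:
    F_n = (2n - 1) * c / (4 * L), with c the speed of sound in m/s."""
    return [(2 * n - 1) * c / (4 * length_m) for n in range(1, n_formants + 1)]

print([round(f) for f in uniform_tube_formants(0.175)])  # [500, 1500, 2500]
print([round(f) for f in uniform_tube_formants(0.145)])  # shorter tract -> higher formants
```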

9
Q

what are harmonics?

A

Harmonics are the smaller, narrower peaks under the smoothed line (the spectral envelope) in the spectrum.

They are equally spaced on the frequency axis and are integer multiples of the fundamental frequency (F0).

The spacing between neighbouring harmonics equals F0, so when F0 changes with pitch and intonation the spacing changes with it; within any one spectrum, though, the spacing is the same all the way up.
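The relationship between F0 and harmonic spacing is simple arithmetic: the k-th harmonic is k x F0, so neighbouring harmonics are always F0 apart. A minimal Python sketch, assuming two example F0 values (120 Hz and 200 Hz); the helper names are hypothetical.

```python
def harmonic_frequencies(f0, max_freq):
    """Integer multiples of F0 (k * F0) up to max_freq."""
    return [k * f0 for k in range(1, int(max_freq // f0) + 1)]

def spacings(freqs):
    """Gaps between neighbouring harmonics."""
    return [b - a for a, b in zip(freqs, freqs[1:])]

low = harmonic_frequencies(120.0, 1000.0)
high = harmonic_frequencies(200.0, 1000.0)
print(low)  # [120.0, 240.0, 360.0, 480.0, 600.0, 720.0, 840.0, 960.0]
print(set(spacings(low)), set(spacings(high)))  # {120.0} {200.0}
```

Raising the pitch from 120 Hz to 200 Hz widens the gaps, but within each voice the gaps stay equal.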

10
Q

how does speaking different vowels while keeping pitch unchanged affect the formant frequencies and the harmonics?

A

If you speak different vowels while holding your pitch unchanged, the formant frequencies will change but the harmonics will not, because the harmonics depend only on F0.
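This can be sketched numerically with the source-filter idea: the voice source fixes the harmonics (multiples of F0), while the vowel's vocal-tract shape fixes the formants, which roughly decide which harmonics get boosted. A minimal Python sketch, assuming F0 = 100 Hz and approximate adult male F1/F2 averages of about 270/2290 Hz for /i/ ("heed") and 730/1090 Hz for /ɑ/ ("hod"); the helper names are hypothetical.

```python
def harmonic_frequencies(f0, max_freq):
    """Harmonics depend only on the source: k * F0."""
    return [k * f0 for k in range(1, int(max_freq // f0) + 1)]

def nearest_harmonic(formant_hz, harmonics):
    """Roughly, the harmonic closest to a formant is the one it boosts most."""
    return min(harmonics, key=lambda h: abs(h - formant_hz))

# Approximate F1/F2 (Hz) for an adult male voice; "a" stands for /ɑ/
vowels = {"i": (270.0, 2290.0), "a": (730.0, 1090.0)}
f0 = 100.0  # pitch held constant across both vowels

for vowel, formants in vowels.items():
    harmonics = harmonic_frequencies(f0, 3000.0)  # identical for every vowel
    boosted = [nearest_harmonic(f, harmonics) for f in formants]
    print(vowel, harmonics[:3], boosted)
# i [100.0, 200.0, 300.0] [300.0, 2300.0]
# a [100.0, 200.0, 300.0] [700.0, 1100.0]
```

The harmonic list is the same for both vowels; only the formants, and therefore which harmonics stand out, differ.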

11
Q

what is a spectrogram? what are its axes?

A

Visual representation of amplitude change over time and across frequency

X axis: time
Y axis: frequency
Color of the pixels: amplitude
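A spectrogram is built by cutting the signal into short, overlapping frames and taking a spectrum of each one (a short-time Fourier transform). A minimal Python sketch, assuming a 20 ms Hann-windowed frame (160 samples at 8 kHz), a 12.5 ms hop, and a naive DFT for clarity rather than speed; the test tone is synthetic and `spectrogram` is a hypothetical helper name.

```python
import cmath
import math

def spectrogram(signal, fs, frame_len=160, hop=100):
    """Short-time Fourier transform magnitudes.
    Returns frame start times (s), bin frequencies (Hz) and one list of
    magnitudes per frame: time on x, frequency on y, magnitude = colour."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len) for i in range(frame_len)]  # Hann
    n_bins = frame_len // 2 + 1
    times, frames = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + i] * window[i] for i in range(frame_len)]
        mags = [abs(sum(x * cmath.exp(-2j * math.pi * k * i / frame_len)
                        for i, x in enumerate(seg)))
                for k in range(n_bins)]
        times.append(start / fs)
        frames.append(mags)
    freqs = [k * fs / frame_len for k in range(n_bins)]
    return times, freqs, frames

# A tone that jumps from 500 Hz to 1500 Hz after 0.1 s
fs = 8000
signal = [math.sin(2 * math.pi * (500 if t < 800 else 1500) * t / fs) for t in range(1600)]
times, freqs, frames = spectrogram(signal, fs)
for t, mags in zip(times, frames):
    peak = freqs[max(range(len(mags)), key=mags.__getitem__)]
    print(f"{t:.4f} s: strongest frequency {peak:.0f} Hz")
```

Unlike a single spectrum, the frame-by-frame output shows the jump from 500 Hz to 1500 Hz as it happens over time.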

12
Q

what is the disadvantage of using a frequency spectrum rather than a spectrogram?

A

It only provides spectral information at one moment in time, or averaged over a period of time, so you won’t see how the frequency information changes OVER TIME. A spectrogram shows this change.

13
Q

when looking at a spectrogram, what does it mean when you see stripe-like patterns, or no pattern at all?

A

-A stripe-like pattern, with the spacing between stripes changing from time to time, means there are spectral peaks at frequencies that are harmonically related (the harmonics). Since the spacing equals F0, this means the person’s pitch goes up and down over time while they speak.

-No pattern means no harmonics, so the sound is aperiodic: noise or noise-like. Many consonants are voiceless, so they don’t involve vocal fold vibration, and they produce aperiodic, noise-like signals.
