Speech Acoustic Measurement & Analysis Flashcards

1
Q

Sound Spectrograph

A

Developed during WWII in the course of seeking methods to encode and decode messages
Able to display formants as a continuous function of time

2
Q

Spectrum Analysis

The spectrum analyzer performs its analysis

A

The spectrum analyzer performs its analysis by moving a fixed-width analysis band, or filter, across the entire frequency range.

If the analysis band is swept continuously across the entire frequency range, it will provide overall voltage outputs as a continuous

3
Q

Schematic diagram showing how the spectrograph

A

Schematic diagram showing how the spectrograph performs spectral analysis by sweeping an analysis band of fixed width (e.g., 300 Hz) across the frequency range of interest, and recording the average voltages from the analysis band as a continuous

4
Q

The Original Sound Spectrograph

The invention of the spectrograph allowed for the study

A

The invention of the spectrograph allowed for the study of the time-varying acoustic results of articulatory processes
▪ If articulator movements are changing as a function of time, and therefore changing vocal tract configuration as a function of time, the changes are reflected in formant transitions—formant frequencies that change over time

5
Q

The Original Sound Spectrograph

The time-varying patterns of electromagnetic strength are

A

The time-varying patterns of electromagnetic strength are submitted to a spectrum analyzer in the form of time-varying voltages, where voltage is proportional to sound intensity (greater voltage = greater intensity) and the speed with which the voltage changes is proportional to frequency (faster voltage changes [shorter periods] = higher frequency).

6
Q

The Original Sound Spectrograph

The energy in the spectrum is sampled using an analysis band

A

The energy in the spectrum is sampled using an analysis band, or filter, that has a bandwidth of 300 Hz that is swept continuously across the entire frequency range of interest.
Because the voltage output from the analysis band is available for all frequencies and at every point in time, the spectrograph creates a total picture of the

7
Q

Digital Spectrograms

Produced almost instantaneously after

A

Produced almost instantaneously after an utterance has been recorded.
▪ The principles previously discussed for the original spectrograph are basically the same in digital spectrograms; the frequency and amplitude analyses are performed by moving a digital filter from low to high frequencies.

The output is a digital magnitude that is proportional to amplitude as a function of frequency.
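The idea of moving a fixed-width analysis band from low to high frequencies can be sketched in a few lines. This is a minimal illustration, assuming NumPy is available; the function name `swept_band_levels`, the 100 Hz step, and the synthetic 1000 Hz test tone are assumptions made for the example, not the procedure of any particular analysis program.

```python
import numpy as np

def swept_band_levels(signal, fs, bandwidth=300.0, step=100.0):
    """Move a fixed-width analysis band from low to high frequencies.

    Returns (centers, levels): band center frequencies in Hz and the
    RMS spectral magnitude found inside each band position.
    """
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    # Band centers run from the lowest to the highest position that fits.
    centers = np.arange(bandwidth / 2, fs / 2 - bandwidth / 2 + step, step)
    levels = []
    for fc in centers:
        in_band = (freqs >= fc - bandwidth / 2) & (freqs < fc + bandwidth / 2)
        levels.append(np.sqrt(np.mean(spectrum[in_band] ** 2)))
    return centers, np.array(levels)

# Synthetic test signal (an assumption for illustration): a 1000 Hz tone.
fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)
centers, levels = swept_band_levels(tone, fs)
peak_center = centers[np.argmax(levels)]  # a band position covering 1000 Hz
```

Each value in `levels` plays the role of the digital magnitude for one position of the band, so plotting `levels` against `centers` gives amplitude as a function of frequency.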

8
Q

Interpretation of Spectrograms

Important features

A

Important features of the spectrographic display include:
the x-, y-, and z-axes
glottal pulses
formant frequencies
silent intervals
stop bursts
aperiodic intervals

The spectrogram shows a series of chunks, or segments, as the pattern is inspected from left to right.
The chunks, or segments, are important because they often correspond roughly to speech sounds.

9
Q

Axes

A

X-axis: time
Y-axis: frequency
Z-axis: intensity (third dimension of the spectrogram), coded by the darkness of the pattern at any point on the spectrographic display
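The three axes map directly onto the array a program builds before drawing a spectrogram. The sketch below is a minimal short-time Fourier analysis, assuming NumPy; the function name `spectrogram`, the frame length, the hop size, and the 500 Hz test tone are illustrative assumptions.

```python
import numpy as np

def spectrogram(signal, fs, frame_len=256, hop=128):
    """Short-time spectrum of a signal.

    Rows of the result are frequency (y-axis), columns are time (x-axis),
    and each cell holds intensity in dB (z-axis, drawn as darkness).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    intensity_db = 20 * np.log10(magnitude + 1e-12)  # small offset avoids log(0)
    times = (np.arange(n_frames) * hop + frame_len / 2) / fs
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    return times, freqs, intensity_db

# Synthetic input (an assumption for illustration): one second of a 500 Hz tone.
fs = 8000
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 500 * t)
times, freqs, intensity_db = spectrogram(signal, fs)
```

A display routine would then paint `intensity_db[i, j]` as a gray level at frequency `freqs[i]` and time `times[j]`, with larger values drawn darker.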

10
Q

Glottal Pulses

Dark vertical lines that are

A

Dark vertical lines that are an acoustic result of vocal fold vibration.
Each individual line reflects a single glottal pulse: a point of excitation, when the vocal folds close quickly at the end of a glottal cycle and create a pressure wave whose spectrum is shaped by the vocal tract filter.

11
Q

Formant Frequencies

The dark bands seen in the patterns

A

The dark bands seen in the patterns with regularly spaced glottal pulses. This pattern is seen for any speech sound produced with a relatively open vocal tract and voicing, including vowels, diphthongs, and semivowels (/l/, /w/, /ɹ/, /j/).

Nasals are voiced and radiate sound

12
Q

Formant Frequencies

The formant frequencies change as

A

The formant frequencies change as a function of time in connected speech. The constant movement of the formants during speech production reflects the constant change in the configuration of the vocal tract.

13
Q

Silent Intervals and Stop Bursts

Appears as a

A

Appears as a brief blank spot, or gap, on the spectrogram
▫ Because the vocal tract is, in theory, completely sealed during the closure, acoustic energy should not be radiated from the vocal tract

If intensity is scaled on a spectrogram as the darkness of the trace, a white or nearly white segment indicates a

14
Q

Silent Intervals and Stop Bursts

For voiced stops, there is a small amount of

A

For voiced stops, there is a small amount of periodic energy on the baseline due to vocal fold vibration during vocal tract closure
▪ Vocal fold vibration during a closure interval causes vibration of the walls of the vocal tract
▪ This energy is seen only in the lowest frequencies of the spectrogram because the walls of the vocal tract vibrate only at the lowest frequencies of vocal fold vibration, filtering out the higher source harmonics

15
Q

Pauses vs Stop Closure

A

Pauses in speech are typically at least 150 ms and voiceless stop closure intervals are typically less than 120 ms.
▪ Many scientists have adopted a criterion of 200 ms: silent intervals of 200 ms or greater are identified as pauses; those less than 200 ms are subject to further evaluation
▫ This criterion is less reliable in certain speech disorders
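The 200 ms criterion amounts to a simple threshold rule. A minimal sketch follows; the function name `classify_silent_interval` and the example durations are hypothetical, and the rule inherits the caveat above about certain speech disorders.

```python
def classify_silent_interval(duration_ms, criterion_ms=200.0):
    """Label a silent interval using the 200 ms criterion.

    Intervals at or above the criterion are called pauses; shorter ones
    are not classified here and need further evaluation (for example,
    they may be stop closures).
    """
    if duration_ms >= criterion_ms:
        return "pause"
    return "needs further evaluation"

# Example durations in milliseconds (made up for illustration).
labels = [classify_silent_interval(d) for d in (80, 150, 200, 450)]
```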

16
Q

Aperiodic Intervals

A

Shown in a spectrogram as an interval of energy having no repeating pattern
These aperiodic intervals are most commonly associated with fricatives and the release phase of stops and affricates. Aperiodic energy may also be mixed with periodic energy for certain sound segments (such as voiced fricatives) or phonation types (such as breathy voice).

17
Q

Segmentation of Spectrograms

A

Identification of pieces of the display that correspond roughly to phonemic or phonetic units
Glottal pulses play an important role in the measurement of various attributes of the spectrogram
They are often used to identify onsets and offsets of segments in a spectrogram

18
Q

Segmentation of Vowels

A

Segmentation of vowels is fairly simple when they are located between two obstruents
▪ The beginning of the vowel is taken as the first “full” glottal pulse, and the end of the vowel is the last full glottal pulse.
▪ A full glottal pulse is one that extends from the baseline at least through F2 of the display.
▫ This distinguishes it from the glottal pulses seen in voiced stops, affricates, and fricatives, as well as some low-intensity pulses that extend only through F1
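The "first and last full glottal pulse" rule can be written out as a short procedure. This is only a sketch under stated assumptions: each pulse is represented by a hand-measured pair (time in seconds, highest frequency its striation reaches in Hz), and the names `Pulse` and `vowel_boundaries` and all numeric values are hypothetical.

```python
from typing import List, Optional, Tuple

# (time_s, top_hz): when the pulse occurs and how high its dark vertical
# line extends on the display. Hand-measured values, assumed for this sketch.
Pulse = Tuple[float, float]

def vowel_boundaries(pulses: List[Pulse], f2_hz: float) -> Optional[Tuple[float, float]]:
    """Return (onset, offset) as the times of the first and last full pulse.

    A pulse counts as "full" when it extends from the baseline at least
    through F2, that is, when top_hz >= f2_hz.
    """
    full_times = [time_s for time_s, top_hz in pulses if top_hz >= f2_hz]
    if not full_times:
        return None
    return min(full_times), max(full_times)

# Made-up measurements: the first and last pulses stop short of F2.
pulses = [(0.100, 400.0), (0.108, 1900.0), (0.116, 2500.0),
          (0.124, 2400.0), (0.132, 600.0)]
onset, offset = vowel_boundaries(pulses, f2_hz=1800.0)
```

The weak pulses at 0.100 s and 0.132 s are excluded because they reach only the F1 region, so the vowel is measured from 0.108 s to 0.124 s.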

19
Q

Segmentation of Vowels & Nasals

A

When vowels are located before or after nasals, the change from an oral to nasal filter function (a murmur) can be used to find a vowel-nasal or nasal-vowel boundary
▫ There is a sudden change in intensity at the boundary between a nasal and a vowel, and the sudden appearance of the low-frequency F1 characteristic of nasal cavity resonance

20
Q

Segmentation of Vowels & Semivowels

A

When vowels are located before or after semivowels (/ɹ, l, w, j/) or diphthongs (/ɑɪ, ɔɪ, ɑʊ, eɪ, oʊ/), the segmentation problem is very difficult because there are no natural boundaries.
▫ Examples: “yellow” (/jɛloʊ/) or “I honor” (/ɑɪjɑnɚ/)

21
Q

Segmentation of Obstruents

Often straightforward to segment obstruent

A

Often straightforward to segment obstruents
▪ Stops: the offset (or end) of the closure interval is taken as the burst (same for the closure offset of affricates)
▪ Fricatives: the offset is taken as the end of the frication noise, or the first full glottal pulse of the following vowel

22
Q

Segmentation of Obstruents

When two fricatives follow each other,

A

When two fricatives follow each other, the changing spectrum must be used to identify the end of one and beginning of the other
▫ Can be challenging
▪ When two stops follow each other, the only way to distinguish the closure intervals is if the first stop is released, producing a burst.

23
Q

Suprasegmentals

A

Prosodic variables
▪ Fundamental frequency (F0) contour
▫ Across segments and sentence-level utterances
▪ Can be computed by pitch tracker analysis programs
▪ General declination F0 pattern for declarative utterances
▪ Fall-rise pattern of F0 common for question utterances
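One common way a pitch tracker can estimate F0 is to look for the strongest peak in a frame's autocorrelation. The sketch below assumes NumPy and shows that single technique on one frame; the function name `estimate_f0`, the 75 to 500 Hz search range, and the synthetic 120 Hz frame are assumptions for illustration, and real pitch trackers add voicing decisions and smoothing that are not shown here.

```python
import numpy as np

def estimate_f0(frame, fs, f0_min=75.0, f0_max=500.0):
    """Estimate F0 (Hz) of one voiced frame from its autocorrelation peak."""
    frame = frame - np.mean(frame)
    # Keep non-negative lags only: ac[k] is the autocorrelation at lag k.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / f0_max)  # shortest period searched
    lag_max = int(fs / f0_min)  # longest period searched
    best_lag = lag_min + np.argmax(ac[lag_min:lag_max + 1])
    return fs / best_lag

# Synthetic voiced frame (an assumption): 120 Hz fundamental plus two harmonics.
fs = 16000
t = np.arange(int(0.04 * fs)) / fs  # 40 ms frame
frame = sum(np.sin(2 * np.pi * 120 * k * t) / k for k in (1, 2, 3))
f0 = estimate_f0(frame, fs)  # close to 120 Hz
```

Repeating the estimate on successive frames gives an F0 contour, from which patterns such as declination can be read.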

24
Q

Suprasegmentals

Shape of an F0 contour carries

A

Shape of an F0 contour carries grammatical and affective information
▪ F0 variation conveys emotional state, provides signals concerning the rules of turn-taking, and marks important aspects of an individual’s personality

25
Q

Tonal Languages

A

Examples: Thai, Igala, Mandarin, Cantonese
▪ Shape of the F0 contour across a vocalic segment has phonemic contrastive function
▫ In Mandarin the sequence /bɑ/ produced with a flat F0 contour means “eight,” but /bɑ/ spoken with a rapid fall-rise contour means “to hold.”

26
Q

Digital Analysis of Speech

A

Goal is to produce an accurate spectral analysis as a function of time (similar to a spectrogram).
▪ There are many different algorithms and hardware solutions for this purpose
▪ Original spectrographs used a continuous, analog process
▪ Computer analysis of speech requires a conversion of analog speech waveforms to digital representation

27
Q

Speech Analysis by Computer: From Recording to Analysis to Output

A

All computers take an analog signal and digitize it, converting it to a series of discrete numbers in binary code (sequences of zeros and ones)
▪ The analog signal is digitized with filtering at a specific sampling rate and bit level

28
Q

Sampling rate

A

Sampling rate: number of times per second the computer takes a “snapshot” of the signal and stores it in digital form
▫ A standard sampling rate is 44.1 kHz
▫ The highest signal frequency that can be analyzed reliably is one-half the sampling rate (the Nyquist frequency).
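The one-half rule can be checked with simple arithmetic. In this sketch the function names `nyquist_frequency` and `apparent_frequency` are hypothetical, and the 30 kHz tone is a made-up example of a component above the limit.

```python
def nyquist_frequency(fs_hz):
    """Highest frequency that can be represented at sampling rate fs_hz."""
    return fs_hz / 2.0

def apparent_frequency(f_hz, fs_hz):
    """Frequency at which a tone of f_hz appears after sampling at fs_hz
    when nothing has removed components above fs_hz / 2."""
    folded = f_hz % fs_hz
    return min(folded, fs_hz - folded)

nyquist = nyquist_frequency(44100)        # 22050.0 Hz
alias = apparent_frequency(30000, 44100)  # 14100 Hz
```

A 30 kHz component sampled at 44.1 kHz is above the 22.05 kHz limit, so it shows up at 14.1 kHz instead of its true frequency.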

29
Q

Filters

A

Filters: eliminate all frequencies above one-half of the sampling rate
▫ These are called anti-aliasing filters
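The same principle applies when an already digitized recording is reduced to a lower sampling rate: energy above half the new rate has to be removed first. The sketch below, assuming NumPy, uses a windowed-sinc low-pass filter as one possible design; the function name `lowpass_fir`, the 101-tap length, the 5000 Hz cutoff, and the two test tones are assumptions for illustration.

```python
import numpy as np

def lowpass_fir(cutoff_hz, fs, num_taps=101):
    """Windowed-sinc low-pass filter coefficients (Hamming window)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = np.sinc(2 * cutoff_hz / fs * n) * np.hamming(num_taps)
    return h / np.sum(h)  # normalize to unity gain at 0 Hz

fs = 44100
t = np.arange(fs // 10) / fs
low = np.sin(2 * np.pi * 1000 * t)    # below the new limit, should pass
high = np.sin(2 * np.pi * 15000 * t)  # above the new limit, should be removed

# Target rate 11025 Hz has a one-half limit of 5512.5 Hz, so cut at 5000 Hz.
h = lowpass_fir(cutoff_hz=5000, fs=fs)
low_out = np.convolve(low, h, mode="valid")
high_out = np.convolve(high, h, mode="valid")
decimated = low_out[::4]  # keep every 4th sample: 44100 / 4 = 11025 Hz

low_rms = np.sqrt(np.mean(low_out ** 2))
high_rms = np.sqrt(np.mean(high_out ** 2))
```

After filtering, `high_rms` is a small fraction of `low_rms`, so the 15 kHz component can no longer fold down into the band kept after decimation.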

30
Q

Bit Rate

A

Bit rate (more precisely, bit depth): determines the number of amplitude levels available for storing amplitude variations in an analog signal (n bits give 2^n levels)
▫ The process of coding amplitude information during A-to-D conversion is called quantization
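Quantization can be illustrated by rounding each sample to the nearest of a fixed set of amplitude levels, where n bits give 2**n levels. This is a minimal sketch assuming NumPy and a signal scaled to the range -1 to 1; the function name `quantize` and the 5 Hz test signal are assumptions for illustration.

```python
import numpy as np

def quantize(signal, n_bits):
    """Round a signal in [-1, 1] to one of 2**n_bits evenly spaced levels."""
    levels = 2 ** n_bits
    step = 2.0 / (levels - 1)  # spacing between adjacent levels
    return np.round((signal + 1.0) / step) * step - 1.0

t = np.linspace(0, 1, 1000, endpoint=False)
signal = np.sin(2 * np.pi * 5 * t)

coarse = quantize(signal, 3)   # at most 8 distinct amplitude values
fine = quantize(signal, 16)    # at most 65536 distinct amplitude values

coarse_error = np.max(np.abs(signal - coarse))
fine_error = np.max(np.abs(signal - fine))
```

More bits mean more levels and a smaller rounding error, which is why `fine_error` is far smaller than `coarse_error`.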

31
Q

Digital Parameters for Speech Analysis

A

Speech analysis programs have algorithms for display, editing, and analysis of speech waveforms.
▪ Linear predictive coding (LPC): used to generate spectra to estimate formant frequencies
▪ Programs can make errors during LPC analysis and F0 analysis, especially with clinical populations (e.g., hypernasality, breathy voice)
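How LPC leads to formant estimates can be sketched as follows: fit a linear predictor to the signal, then read resonance frequencies from the angles of the predictor's poles. The sketch assumes NumPy and uses the autocorrelation method; the function names, the order of 2, and the synthetic single-resonance signal at 700 Hz are assumptions for illustration. Real programs typically use higher orders on windowed, pre-emphasized speech frames and are not reproduced here.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC: solve the normal equations R a = r.

    The predictor is x[n] ~ sum over k of a[k] * x[n - 1 - k].
    """
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    R = np.array([[ac[abs(i - j)] for j in range(order)] for i in range(order)])
    r = ac[1:order + 1]
    return np.linalg.solve(R, r)

def resonance_frequencies(a, fs):
    """Frequencies (Hz) of the complex pole pairs of the LPC filter."""
    poles = np.roots(np.concatenate(([1.0], -a)))
    poles = poles[np.imag(poles) > 0]  # one pole from each conjugate pair
    return np.sort(np.angle(poles) * fs / (2 * np.pi))

# Synthetic signal (an assumption): noise passed through one 700 Hz resonator.
fs = 10000
rng = np.random.default_rng(0)
radius, f_res = 0.97, 700.0
a1 = 2 * radius * np.cos(2 * np.pi * f_res / fs)
a2 = -radius ** 2
x = np.zeros(5000)
noise = rng.standard_normal(len(x))
for n in range(2, len(x)):
    x[n] = a1 * x[n - 1] + a2 * x[n - 2] + noise[n]

formants = resonance_frequencies(lpc_coefficients(x, order=2), fs)
```

With this clean, single-resonance input the estimate lands close to 700 Hz. Signals that do not fit the all-pole assumption well, such as nasalized or breathy speech, are where the errors noted above tend to arise.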

32
Q

Review

Speech waveforms provide information

A

Speech waveforms provide information on amplitude fluctuations as a function of time and fundamental frequency (F0) for voiced sounds, but do not provide direct access to formant frequencies
▪ The spectrogram shows formant frequencies as a function of time and allows a user to infer changes in vocal tract configuration resulting from movement of the articulators.

33
Q

Review

It is possible to segment phonemic and phonetic

A

It is possible to segment phonemic and phonetic information from spectrograms
▫ segmental durations, vowel formant frequencies, obstruent spectral characteristics
▪ Can analyze suprasegmental characteristics such as F0 and intensity contours
▪ Computer-based speech analysis requires consideration of sampling rate, anti-aliasing filters, and quantization.

34
Q

Review

Analysis of formant frequencies is

A

Analysis of formant frequencies is generally fast, reliable, automatic, and accurate except in cases where a speaker has a breathy voice quality and/or excessive nasality.