03 - Speech Signal Representations Flashcards by Joachim Andreasen

What is the ‘Zero-Crossing Rate’?

It is the rate at which a signal changes sign, i.e. crosses the zero level (the horizontal axis), usually counted per frame or per second.
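
A minimal sketch of the idea, assuming the rate is normalised by the number of adjacent sample pairs in a frame (other normalisations, such as crossings per second, are equally common). The function name `zero_crossing_rate` is just an illustrative choice.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zero counts as positive)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for prev, curr in zip(frame, frame[1:])
        if (prev >= 0) != (curr >= 0)
    )
    return crossings / (len(frame) - 1)


alternating = [1.0, -1.0, 1.0, -1.0, 1.0]  # sign changes at every step
slow = [1.0, 2.0, 1.0, -1.0, -2.0]         # sign changes once
print(zero_crossing_rate(alternating), zero_crossing_rate(slow))
```

The rapidly alternating frame gets a much higher rate than the slowly varying one, which is why the measure is a rough indicator of frequency content.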

When is it useful to check the Zero-Crossing Rate?

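The Zero-Crossing Rate is a cheap time-domain indicator of the frequency content of a signal: a high rate suggests noise-like or high-frequency content (for example unvoiced sounds such as fricatives), while a low rate suggests voiced speech. In speech analysis it is often checked, together with the energy, to separate voiced from unvoiced segments and speech from silence.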

What does the ‘Autocorrelation’ measure?

The similarity between a signal and a time-delayed copy of itself, as a function of the delay (lag).
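
As a rough illustration, the sketch below computes the raw (unnormalised) autocorrelation r[k] = sum over n of x[n] * x[n + k]. The choice of no normalisation and the helper name `autocorrelation` are assumptions made for this example.

```python
def autocorrelation(x, max_lag):
    """Raw autocorrelation r[k] = sum_n x[n] * x[n + k] for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


periodic = [1.0, 0.0, -1.0, 0.0] * 4  # repeats every 4 samples
r = autocorrelation(periodic, 4)
print(r)
```

Apart from lag 0, the largest value appears at lag 4, the period of the signal. This is the property used when estimating pitch from voiced speech.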

What does the term ‘Windowing’ refer to?

Answer from research:
Windowing is the process of multiplying a frame by a window function (for example Hamming or Hann) so that the amplitude of the signal is gradually reduced towards the edges of the frame. It is applied to frames so that each one can be analysed as quasi-stationary, and the tapering reduces the spectral leakage caused by abrupt frame edges.

From the slides (I partially disagree with this):
It is the process of splitting the input signal into temporal segments where the signal can be considered quasi-stationary. That is, the segments are not truly stationary, but for the purposes of the analysis they can be treated as such.
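
Both views can be combined in one small sketch: first split the signal into frames, then multiply each frame by a window. The helper names (`split_into_frames`, `apply_window`) and the choice to drop trailing samples that do not fill a frame are assumptions for illustration only.

```python
import math


def hamming(length):
    """Symmetric Hamming window; requires length >= 2."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]


def split_into_frames(signal, frame_length, hop_length):
    """Slice the signal into (possibly overlapping) frames; leftover samples are dropped."""
    last_start = len(signal) - frame_length
    return [signal[s:s + frame_length] for s in range(0, last_start + 1, hop_length)]


def apply_window(frame, window):
    """Multiply the frame sample by sample with the window."""
    return [x * w for x, w in zip(frame, window)]


signal = [1.0] * 10
frames = split_into_frames(signal, frame_length=4, hop_length=2)
windowed = [apply_window(frame, hamming(4)) for frame in frames]
print(len(frames), windowed[0])
```

With a constant input, the windowed frame simply shows the window shape: small at the edges and larger in the middle.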

What does the term ‘Frame’ refer to?

It is a short, contiguous segment of a signal that is isolated for analysis and processing.

What is the Overlap-add algorithm?

It is an algorithm that describes the process of recombining overlapping frames of a signal after some processing.

What are the steps of the Overlap-add algorithm (think LAB 2)?

The input signal is split into frames of a certain frame length at a certain interval (hop length)
We compute the output frame from each input frame
We apply a window to each output frame (for example Hamming)
We recombine the overlapping output frames by adding them together
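
The four steps can be sketched as follows. This is not the LAB 2 code: the function name `overlap_add`, the identity "processing", and the use of a periodic Hann window (instead of Hamming) are assumptions. Hann is chosen here because at 50% overlap the shifted windows sum to exactly one, so an unprocessed signal is reconstructed wherever two frames overlap.

```python
import math


def periodic_hann(length):
    """Periodic Hann window; copies shifted by length / 2 sum to 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / length) for n in range(length)]


def overlap_add(signal, frame_length, hop_length, process):
    window = periodic_hann(frame_length)
    output = [0.0] * len(signal)
    for start in range(0, len(signal) - frame_length + 1, hop_length):
        frame = signal[start:start + frame_length]               # step 1: take a frame
        processed = process(frame)                               # step 2: compute the output frame
        windowed = [p * w for p, w in zip(processed, window)]    # step 3: window the output frame
        for i, value in enumerate(windowed):                     # step 4: add into the output signal
            output[start + i] += value
    return output


signal = [float(n % 5) for n in range(32)]
result = overlap_add(signal, frame_length=8, hop_length=4, process=lambda frame: frame)
print(result[4:12])
```

The first and last half-frames are covered by only one window, so they come out attenuated. Real implementations usually pad the signal to avoid this.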

Why do we use the Overlap-add algorithm?

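It allows efficient computation of the convolution of a long signal with a filter: the signal is broken into smaller segments, each segment is convolved separately, and the overlapping results are added together. More generally, it lets us process a signal frame by frame and still resynthesize a continuous output.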

What is the Short-Time Fourier Transform?

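The short-time Fourier transform is a type of Fourier analysis used to determine how the frequency content of a signal changes over time: the signal is divided into short, fixed-length (usually overlapping) windowed frames, and the Fourier transform of each frame is computed.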
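
A small sketch of the idea, using a direct O(N^2) DFT so that it runs with only the standard library (a real implementation would use an FFT). The function names, the Hann window, and the test signal are assumptions for illustration.

```python
import cmath
import math


def dft(frame):
    """Direct discrete Fourier transform of one frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def stft_magnitude(signal, frame_length, hop_length):
    """Magnitude spectra (bins 0..frame_length/2) of Hann-windowed frames."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_length) for n in range(frame_length)]
    spectra = []
    for start in range(0, len(signal) - frame_length + 1, hop_length):
        windowed = [signal[start + n] * window[n] for n in range(frame_length)]
        spectra.append([abs(c) for c in dft(windowed)[:frame_length // 2 + 1]])
    return spectra


# First half: sinusoid at bin 2 of a 16-point DFT; second half: bin 6.
signal = ([math.sin(2 * math.pi * 2 * n / 16) for n in range(64)]
          + [math.sin(2 * math.pi * 6 * n / 16) for n in range(64)])
spectra = stft_magnitude(signal, frame_length=16, hop_length=8)
peak_bins = [max(range(len(s)), key=lambda k: s[k]) for s in spectra]
print(peak_bins)
```

The strongest bin moves from 2 to 6 partway through the signal. Plotting these magnitude spectra frame by frame as an image, with time on one axis and frequency on the other, gives a spectrogram.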

What is a ‘Spectrogram’?

A spectrogram is a visual representation of the frequency content of a signal over time. It is a plot that displays the intensity of the different frequencies of a signal as they change over time, typically represented as a heat map or color map. The x-axis is time and the y-axis is frequency (Hz).

What is a Source-Filter Model?

In speech signal processing, the source-filter model is a mathematical model that represents the speech signal as a sound source (the excitation, such as glottal pulses or noise) passed through a linear acoustic filter (the vocal tract).

What is a ‘Second Order All-Pole Filter’?

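It is a filter whose transfer function has two poles and no (finite) zeros, i.e. a constant numerator over a second-order denominator. A complex-conjugate pole pair gives a single resonance, so such a filter can model one formant.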
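
In the time domain, a transfer function of the form H(z) = 1 / (1 + a1 z^-1 + a2 z^-2) corresponds to the difference equation y[n] = x[n] - a1 y[n-1] - a2 y[n-2]. The sketch below implements that recursion directly; the sign convention and the example coefficients are assumptions chosen for illustration.

```python
def all_pole_second_order(x, a1, a2):
    """Filter x with H(z) = 1 / (1 + a1 z^-1 + a2 z^-2), zero initial state."""
    y = []
    for n, sample in enumerate(x):
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y.append(sample - a1 * y1 - a2 * y2)
    return y


impulse = [1.0] + [0.0] * 4
h = all_pole_second_order(impulse, a1=-1.0, a2=0.5)
print(h)
```

With these coefficients the poles lie at 0.5 ± 0.5j, inside the unit circle, so the impulse response oscillates and decays.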

What is ‘Linear Prediction’?

Linear prediction is a method that tries to predict a signal sample using a linear combination of the signal’s past samples.

What is the ‘residue’ of Linear Prediction?

It is the prediction error, i.e. the difference between the actual sample and its predicted value. It is more commonly called the residual.

What does Linear Prediction Optimization refer to?

The minimization of the energy of the prediction error.
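
One standard way to carry out this minimization is the autocorrelation method solved with the Levinson-Durbin recursion, sketched below. Whether the course uses this particular solver is an assumption, and the helper names and the test signal (a decaying exponential) are illustrative. The predictor convention is x̂[n] = a_1 x[n-1] + ... + a_p x[n-p].

```python
def autocorrelation(x, max_lag):
    return [sum(x[n] * x[n + k] for n in range(len(x) - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return (a_1..a_p, minimum prediction-error energy) from autocorrelation r."""
    a = [0.0] * (order + 1)  # a[k] weights x[n-k]; a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:], error


def residual(x, coeffs):
    """Prediction error e[n] = x[n] - sum_k a_k x[n-k] (samples before n = 0 taken as zero)."""
    out = []
    for n in range(len(x)):
        prediction = sum(c * x[n - k] for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(x[n] - prediction)
    return out


signal = [0.9 ** n for n in range(200)]  # each sample is 0.9 times the previous one
coeffs, error_energy = levinson_durbin(autocorrelation(signal, 2), 2)
e = residual(signal, coeffs)
print(coeffs, error_energy)
```

For this signal the predictor recovers a_1 close to 0.9 and a_2 close to 0, and the residual is essentially a single impulse at n = 0: everything after the first sample is predictable from the past.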

What is a ‘cepstrum’?

The cepstrum is defined as the inverse discrete Fourier transform (IDFT) of the log magnitude of the discrete Fourier transform (DFT) of a signal.
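
The definition translates almost line by line into code. The sketch below uses a direct DFT to stay within the standard library; the function names and the small floor that guards against log(0) are assumptions added for this example.

```python
import cmath
import math


def dft(x, inverse=False):
    """Direct DFT; with inverse=True the inverse DFT (including the 1/N factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def real_cepstrum(frame, floor=1e-12):
    spectrum = dft(frame)                                              # DFT of the signal
    log_magnitude = [math.log(max(abs(c), floor)) for c in spectrum]   # log magnitude
    return [c.real for c in dft(log_magnitude, inverse=True)]          # inverse DFT


# An impulse scaled by e has a flat magnitude spectrum equal to e, so its log is 1 everywhere.
cepstrum = real_cepstrum([math.e] + [0.0] * 7)
print([round(c, 6) for c in cepstrum])
```

A constant log spectrum ends up entirely in the first cepstral coefficient, which illustrates that overall gain lives at quefrency 0 while spectral detail is spread over the higher coefficients.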

What is the ‘mel frequency spectrum’?

The mel spectrum is a frequency representation where the frequencies are scaled to better match the human perception of sound. This scaling is accomplished using the mel frequency scale, which is a non-linear transformation of frequency that gives finer resolution to lower frequencies than to higher frequencies.
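
Several formulas for the mel scale are in use. The sketch below assumes one widely used variant, mel = 2595 * log10(1 + f / 700); the course material may define it differently.

```python
import math


def hz_to_mel(f_hz):
    """One common mel-scale formula (an assumption; other variants exist)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


low_step = hz_to_mel(1100.0) - hz_to_mel(1000.0)   # 100 Hz step at low frequency
high_step = hz_to_mel(5100.0) - hz_to_mel(5000.0)  # 100 Hz step at high frequency
print(hz_to_mel(1000.0), low_step, high_step)
```

The same 100 Hz step covers more mels at 1 kHz than at 5 kHz, which is the non-linear compression of high frequencies described above.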

Why do we transform using the mel frequency scale?

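Because it approximates how humans perceive pitch: our frequency resolution is finer at low frequencies than at high frequencies, so the mel scale gives a representation that is closer to what we actually hear.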

What are the steps for computing the ‘Mel-Frequency Cepstral Coefficients’ (MFCC)?

We compute the DFT (Discrete Fourier Transform) of each windowed frame of the signal
We compute the log-mel-spectrum (apply a mel filterbank to the magnitude or power spectrum, then take the log)
We compute the DCT (Discrete Cosine Transform) of the log-mel-spectrum
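
The steps can be sketched for a single frame as below. Everything specific here is an assumption made for illustration: the triangular filterbank with centres equally spaced on the mel scale, the mel formula, the unnormalised DCT-II, the number of filters and coefficients, and the log floor. Toolkits differ in all of these details.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Step 1: DFT of the frame (bins 0..N/2), returned as power."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


def mel_filterbank(num_filters, frame_length, sample_rate):
    """Triangular filters whose edges are equally spaced in mel from 0 Hz to Nyquist."""
    max_mel = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(max_mel * i / (num_filters + 1)) for i in range(num_filters + 2)]
    bin_hz = [k * sample_rate / frame_length for k in range(frame_length // 2 + 1)]
    bank = []
    for m in range(1, num_filters + 1):
        lo, centre, hi = edges[m - 1], edges[m], edges[m + 1]
        bank.append([max(0.0, min((f - lo) / (centre - lo), (hi - f) / (hi - centre)))
                     for f in bin_hz])
    return bank


def dct_ii(values, num_coeffs):
    """Unnormalised DCT-II, keeping the first num_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(num_coeffs)]


def mfcc(frame, sample_rate, num_filters=8, num_coeffs=5, floor=1e-10):
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    mel_energies = [sum(w * p for w, p in zip(filt, spectrum)) for filt in bank]
    log_mel = [math.log(max(energy, floor)) for energy in mel_energies]  # step 2: log-mel-spectrum
    return dct_ii(log_mel, num_coeffs)                                   # step 3: DCT


frame = [1.0] + [0.0] * 63          # an impulse: flat spectrum
quiet = mfcc(frame, sample_rate=8000)
loud = mfcc([2.0 * x for x in frame], sample_rate=8000)
print(quiet, loud)
```

Doubling the amplitude only changes the first coefficient, because a gain adds the same constant to every log-mel value and the DCT puts a constant entirely into coefficient 0.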

What are the Linear Prediction Coefficients (LPC)?

The linear prediction coefficients are the weights of the linear combination of past samples that is used in linear prediction analysis to estimate the next value of a time series. For speech, they describe the all-pole filter that models the vocal tract.

What is the Discrete Cosine Transform (DCT)?

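The discrete cosine transform is a frequency representation similar to the Fourier transform, but it uses only real-valued cosine basis functions, so real input gives real output. It corresponds to the DFT of a symmetrically extended signal and tends to concentrate the energy of smooth data in a few low-order coefficients.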

MFCC is an abstract domain that is difficult to interpret. True or false?

True

The log-mel spectrum is not perceptually motivated. True or false?

False

The Discrete Cosine Transform decorrelates sequentially correlated data. True or false?

True

MFCCs are a computationally efficient representation. True or false?

True

We can obtain dynamic information with delta and delta-delta MFCCs. True or false?

True
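
As a rough sketch, delta features can be approximated by a central difference over time, and delta-delta features by applying the same operation again. This simple difference and the edge handling (repeating the first and last frame) are assumptions; toolkits typically use a regression over a wider window.

```python
def delta(features):
    """Central difference over time for a list of per-frame coefficient vectors."""
    padded = [features[0]] + features + [features[-1]]  # repeat the edge frames
    return [[(nxt - prv) / 2.0 for prv, nxt in zip(padded[t], padded[t + 2])]
            for t in range(len(features))]


# Four frames with two coefficients each: the first changes over time, the second is static.
mfccs = [[0.0, 1.0], [1.0, 1.0], [4.0, 1.0], [9.0, 1.0]]
d = delta(mfccs)
dd = delta(d)
print(d, dd)
```

The static coefficient gets a delta of zero in every frame, while the changing one gets a non-zero delta, which is the dynamic information the card refers to.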

Summary question: Time-Domain Features refers to?

Root Mean Squared (RMS), zero-crossing rate and auto-correlation.
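
Of these features, RMS is the simplest to compute: the square root of the mean of the squared samples in a frame. A minimal sketch, with an illustrative function name:

```python
import math


def rms(frame):
    """Root mean square of the samples in one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


loud = [3.0, -3.0, 3.0, -3.0]
quiet = [0.5, -0.5, 0.5, -0.5]
print(rms(loud), rms(quiet))
```

RMS tracks the energy of a frame regardless of sign, so it is a common companion to the zero-crossing rate when separating speech from silence.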

Summary question: Windowing refers to?

Handling non-stationary signals.

Summary question: Source-Filter Model and LPC (Linear Predictive Coding) refers to?

Estimation of the vocal tract filter and the excitation (the source signal that drives the filter).

Summary question: Frequency Representations refers to?

Log-spectrum, log-mel-spectrum, cepstrum and MFCCs.

What is a 'formant'?

In speech processing, a formant is a resonance in the vocal tract that results in a peak of energy in the speech signal at a particular frequency. Formants are important for distinguishing between different vowel sounds and correspond to high-energy areas of the spectrum of a vowel.

What is a 'cascade combination'?

In signal processing, the cascade combination of two systems means that the output of the first system is fed as input to the second system.
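
A small sketch of a cascade, using two simple FIR filters as the systems. The helper names `fir` and `cascade` and the example filters are assumptions for illustration.

```python
def fir(coeffs):
    """Build a system y[n] = sum_k coeffs[k] * x[n - k] (zero before the start)."""
    def system(x):
        return [sum(c * x[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
                for n in range(len(x))]
    return system


def cascade(first, second):
    """Feed the output of the first system into the second."""
    return lambda x: second(first(x))


smooth = fir([0.5, 0.5])        # two-point average
difference = fir([1.0, -1.0])   # first difference
combined = cascade(smooth, difference)

impulse = [1.0, 0.0, 0.0, 0.0]
print(combined(impulse))
print(cascade(difference, smooth)(impulse))
```

The impulse response of the cascade equals the convolution of the two individual impulse responses, and for linear time-invariant systems like these the order of the two stages does not change the result.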
