L5 SOUND PROCESS SUM Flashcards
● The basic parameters of sound.
The speech sound is a time-varying signal, created by a mechanical oscillation of pressure transmitted through air. Some basic features of a sound signal are its speed, sound pressure level (intensity), loudness (volume) and frequency.
● What is Fourier analysis? What is the FFT? What are its inputs and outputs?
Any signal can be decomposed into a series of sinusoidal waves with different frequencies, amplitudes and phase shifts. This decomposition is called Fourier analysis, because the concept was first introduced by the French mathematician Jean-Baptiste Fourier in 1807. A purely sinusoidal signal has only one component with one frequency, but a signal usually has more than one component and is a mix of different frequencies, called a spectrum of frequencies. A plot with frequency on the x-axis and amplitude on the y-axis is called a frequency spectrum plot. If you have ever looked at the display of a stereo equalizer, you have seen the result of a Fourier analysis. The Fourier transform converts a time-domain representation of a signal into a frequency-domain one. The Fast Fourier Transform (FFT) is an efficient algorithm for computing the Fourier transform of a sampled signal. Imagine the FFT as a black box. This box has two inputs: a vector with the signal samples, x(n), and the number of samples, N. The output is a vector a whose elements a_k give the amplitudes of the component frequencies (in general each a_k is a complex number that carries both the amplitude and the phase of its component).
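The black-box view of the FFT can be made concrete with a small sketch. This is only an illustration, not the course's implementation: it uses a textbook recursive radix-2 FFT (so N must be a power of two), and the names `fft`, `amplitude_spectrum`, the sampling rate `fs` and the test signal are all assumptions chosen for the example.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT. len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # FFT of the even-indexed samples
    odd = fft(x[1::2])   # FFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def amplitude_spectrum(x, fs):
    """Return (frequencies in Hz, amplitudes) for bins 0 .. N/2 - 1."""
    n = len(x)
    spectrum = fft(x)
    freqs = [k * fs / n for k in range(n // 2)]
    # Scale by 2/N so a sinusoid of amplitude A appears as A (the DC bin uses 1/N).
    amps = [abs(spectrum[k]) * (1 if k == 0 else 2) / n for k in range(n // 2)]
    return freqs, amps

fs = 64  # assumed sampling rate in Hz
n = 64   # number of samples N (a power of two)
# Test signal: a 5 Hz sinusoid of amplitude 1.0 plus a 12 Hz sinusoid of amplitude 0.5
signal = [
    1.0 * math.sin(2 * math.pi * 5 * t / fs) + 0.5 * math.sin(2 * math.pi * 12 * t / fs)
    for t in range(n)
]

freqs, amps = amplitude_spectrum(signal, fs)
peaks = [(f, round(a, 3)) for f, a in zip(freqs, amps) if a > 0.1]
print(peaks)  # [(5.0, 1.0), (12.0, 0.5)]
```

Plotting `amps` against `freqs` would give the frequency spectrum plot described above, with one peak per sinusoidal component.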
● How can signals be represented in the frequency domain (sinusoidal signals, sum of sinusoids, periodic signals, short-time Fourier analysis, spectrogram)?
A spectrogram is a visual rendering of the signal’s frequency spectrum as a function of time. It is essentially a set of short-time Fourier transforms plotted side by side along the time axis. It is effectively a 3D plot, with time on the x-axis, frequency on the y-axis, and amplitude as the third dimension, shown by the color of each pixel. Most spectrograms are displayed in shades of gray. The level of gray represents the amplitude, with darker meaning louder.
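As a rough sketch of how a spectrogram is assembled, the code below computes one magnitude spectrum per frame and stacks them in time order. It is a simplified illustration under stated assumptions: a naive DFT is used instead of an FFT to keep it short, frames do not overlap and are not windowed, and the function names, `fs`, `frame_len` and `hop` are hypothetical choices for the example.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitudes of DFT bins 0 .. N/2 of one frame (naive O(N^2) DFT)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags

def spectrogram(signal, frame_len, hop):
    """One magnitude spectrum per frame: rows are time steps, columns are frequency bins."""
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        rows.append(dft_magnitudes(signal[start:start + frame_len]))
    return rows

fs = 64         # assumed sampling rate in Hz
frame_len = 32  # samples per frame, so each bin is fs / frame_len = 2 Hz wide

# Test signal: one second of a 4 Hz tone followed by one second of a 16 Hz tone
signal = [math.sin(2 * math.pi * 4 * t / fs) for t in range(64)]
signal += [math.sin(2 * math.pi * 16 * t / fs) for t in range(64)]

spec = spectrogram(signal, frame_len=frame_len, hop=frame_len)

# Frequency (Hz) of the strongest bin in each frame
dominant = [row.index(max(row)) * fs / frame_len for row in spec]
print(dominant)  # [4.0, 4.0, 16.0, 16.0]
```

Drawing `spec` as an image, with each row as one column of pixels and the magnitude mapped to a gray level, would give the plot described above. The single spectrum of the whole signal would show both tones but not when each one occurs.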
● What is windowing? Why do we need it?
Short-time signal processing is practically always done using windowing. In short-time signal processing, signals are cut into small pieces called frames, which are processed one at a time. Each frame is multiplied by a window function in order to improve its frequency-domain representation. One problem with cutting a signal (for example a vowel) into frames is that a feature might be lost because part of the vowel falls in one frame and part in the next. The solution to this lost-feature problem is to overlap the frames. To smooth the signal at the edges of each frame, a tapered window can be applied instead of a plain rectangular cut, e.g. the Hamming window.
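The framing, overlap and Hamming window described above can be sketched as follows. This is a minimal illustration, assuming the common Hamming coefficients 0.54 and 0.46; the function names, the frame length and the 50% overlap are choices made for the example only.

```python
import math

def hamming(n):
    """Hamming window of length n: w[i] = 0.54 - 0.46 * cos(2*pi*i / (n - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def windowed_frames(signal, frame_len, hop):
    """Cut signal into frames of frame_len samples, spaced hop samples apart,
    and multiply each frame by a Hamming window."""
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames

signal = [1.0] * 16  # constant test signal, so each frame shows the window shape itself
frames = windowed_frames(signal, frame_len=8, hop=4)  # hop = frame_len / 2 gives 50% overlap

print(len(frames))                         # 3 frames, starting at samples 0, 4 and 8
print([round(v, 2) for v in frames[0]])    # tapers towards 0.08 at both edges
```

Because the hop is smaller than the frame length, samples 4 to 11 appear in two frames each, so a feature that straddles a frame boundary is still seen whole in a neighboring frame.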
● How can we filter a sound signal?
A sound signal can be filtered in the time domain or in the frequency domain. Possibly the most obvious approach to reducing noise is to take an average. Two options are simple averaging over a number of readings and an exponentially weighted moving average (EWMA). In simple averaging, we have to know the number of samples to consider. The EWMA filter gives more importance to more recent data by discounting older data in an exponential manner (hence the name). The moving average is the most common filter in DSP, mainly because it is the easiest digital filter to understand and use. In spite of its simplicity, the moving average filter is optimal for a common task: reducing random noise while retaining a sharp step response. This makes it the premier filter for time-domain encoded signals. However, the moving average is the worst filter for frequency-domain encoded signals, with little ability to separate one band of frequencies from another.
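The two averaging filters can be sketched in a few lines. This is an illustrative version only: the function names, the window length, the smoothing factor `alpha` and the choice to seed the EWMA with the first sample are assumptions made for the example.

```python
def moving_average(samples, window):
    """Simple moving average over the last `window` samples
    (over fewer samples at the start, until the window has filled)."""
    out = []
    running = 0.0
    for i, s in enumerate(samples):
        running += s
        if i >= window:
            running -= samples[i - window]  # drop the sample that left the window
        out.append(running / min(i + 1, window))
    return out

def ewma(samples, alpha):
    """Exponentially weighted moving average:
    y[n] = alpha * x[n] + (1 - alpha) * y[n-1], seeded with the first sample."""
    out = []
    y = None
    for s in samples:
        y = s if y is None else alpha * s + (1 - alpha) * y
        out.append(y)
    return out

step = [0.0] * 4 + [1.0] * 4  # a step input, to compare step responses

ma = moving_average(step, window=2)
ew = ewma(step, alpha=0.5)

print(ma)  # [0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 1.0, 1.0]
print(ew)  # [0.0, 0.0, 0.0, 0.0, 0.5, 0.75, 0.875, 0.9375]
```

The moving average reaches the new level after exactly `window` samples, while the EWMA approaches it gradually, with the weight of each older sample shrinking by a factor of `1 - alpha` per step.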