Lecture 4 - Time Domain Flashcards
What are the three broad categories of speech sounds?
- voiced
- unvoiced
- silence
Give the short, medium and long analysis intervals in milliseconds, and what each is used for.
- Short intervals (5-20 msec): uncertainty due to the small amount of data, varying pitch and varying amplitude.
- Medium intervals (20-100 msec): uncertainty due to changes in sound quality, transitions between sounds and rapid transients in speech.
- Long intervals (100-500 msec): uncertainty due to the large number of sound changes. Used in cases like estimating the audio quality in a Google Hangout.
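What’s the typical window overlap in speech?
50%. A Hamming window tapers toward zero at its edges, so samples near a frame boundary are heavily attenuated; with 50% overlap those samples fall near the centre of the neighbouring frame and are still well represented.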
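A minimal sketch of framing with a Hamming window and 50% overlap, using only the standard library. The 16 kHz sample rate, the 20 msec frame length, the 200 Hz test tone and the function names are illustrative assumptions, not values from the lecture.

```python
import math

def hamming(n_samples):
    """Symmetric Hamming window: w[n] = 0.54 - 0.46*cos(2*pi*n/(N-1))."""
    if n_samples == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]

def frame_signal(signal, frame_len, overlap=0.5):
    """Split a signal into Hamming-windowed frames with fractional overlap."""
    hop = max(1, int(round(frame_len * (1 - overlap))))  # 50% overlap -> hop = frame_len / 2
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames

# Assumed setup: 16 kHz audio, 20 msec frames (320 samples), 0.1 s of a 200 Hz tone.
sample_rate = 16000
frame_len = sample_rate * 20 // 1000
signal = [math.sin(2 * math.pi * 200 * n / sample_rate)
          for n in range(sample_rate // 10)]
frames = frame_signal(signal, frame_len)
```

Each sample that sits at the attenuated edge of one frame lies at the centre of the next, which is the point of the 50% overlap.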
Give the difference between a rectangular and a Hamming window for speech signals.
- Rectangular: simple to implement, but in the frequency domain it has a narrow main lobe and large side lobes.
- Hamming: the main-lobe bandwidth is about twice that of the rectangular window, and the attenuation outside the passband is much greater.
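The trade-off between the two windows can be checked numerically. The sketch below evaluates the magnitude spectrum of each window on a frequency grid and reports the position of the first spectral minimum (half the main-lobe width) and the highest side-lobe level. The window length of 32 and the 1024-point grid are arbitrary assumptions chosen to keep the example fast.

```python
import cmath
import math

def hamming(n_samples):
    """Symmetric Hamming window of length n_samples (n_samples > 1)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]

def spectrum_db(window, n_freqs=1024):
    """Magnitude spectrum in dB (normalised to DC) on a grid over [0, pi)."""
    dc = abs(sum(window))
    mags = []
    for k in range(n_freqs):
        omega = math.pi * k / n_freqs
        val = abs(sum(w * cmath.exp(-1j * omega * n) for n, w in enumerate(window)))
        mags.append(20 * math.log10(max(val / dc, 1e-12)))  # clamp to avoid log(0)
    return mags

def first_null_and_sidelobe(mags):
    """Return (grid index of the first local minimum, peak level beyond it in dB)."""
    k = 1
    while k < len(mags) - 1 and not (mags[k] <= mags[k - 1] and mags[k] <= mags[k + 1]):
        k += 1
    return k, max(mags[k:])

n = 32  # assumed window length
rect_null, rect_sl = first_null_and_sidelobe(spectrum_db([1.0] * n))
ham_null, ham_sl = first_null_and_sidelobe(spectrum_db(hamming(n)))
print("rectangular: first null index", rect_null, "peak side lobe (dB)", round(rect_sl, 1))
print("hamming:     first null index", ham_null, "peak side lobe (dB)", round(ham_sl, 1))
```

The Hamming window's first null should sit roughly twice as far from DC as the rectangular window's, while its side lobes are far lower.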
What is the downside of the zero-crossing measure?
It is sensitive to DC offset. When the signal is shifted up (or down), it crosses zero less often, which can reduce the zero-crossing count.
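A small sketch of the DC-offset problem, assuming a synthetic 100 Hz tone at 8 kHz and a hypothetical offset of 1.5 (larger than the tone's amplitude, so every crossing disappears). Subtracting the mean before counting is one common remedy; it is shown here as an illustration, not as the lecture's prescribed fix.

```python
import math

def zero_crossings(signal):
    """Count sign changes between consecutive samples."""
    count = 0
    for prev, cur in zip(signal, signal[1:]):
        if (prev >= 0) != (cur >= 0):
            count += 1
    return count

def remove_dc(signal):
    """Subtract the mean so the signal is centred on zero."""
    mean = sum(signal) / len(signal)
    return [s - mean for s in signal]

# Assumed test signal: 0.1 s of a 100 Hz tone at 8 kHz. The small phase
# term keeps samples away from exact zeros so the count is unambiguous.
sample_rate = 8000
phase = 0.3
tone = [math.sin(2 * math.pi * 100 * n / sample_rate + phase) for n in range(800)]
shifted = [s + 1.5 for s in tone]  # hypothetical DC offset

clean_count = zero_crossings(tone)
shifted_count = zero_crossings(shifted)
restored_count = zero_crossings(remove_dc(shifted))
print(clean_count, shifted_count, restored_count)
```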
At what frequency is the energy of voiced and unvoiced speech most concentrated?
Voiced speech; 700 Hz
Unvoiced speech; 2.5 kHz
What is the frequency threshold between voiced and unvoiced speech?
voiced < 1.5 kHz
unvoiced > 1.5 kHz
What is autocorrelation?
Autocorrelation measures the relationship between a time series and a lagged version of itself over successive time intervals.
e.g. Suppose we track the temperature of a city every day. If today’s temperature is similar to yesterday’s, and yesterday’s temperature is similar to the day before, we say that the temperature data is autocorrelated.
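The temperature example can be made concrete with a normalised autocorrelation at a given lag. This is a minimal sketch; the temperature values are made up for illustration.

```python
def autocorrelation(series, lag):
    """Normalised autocorrelation of a series at the given lag (mean removed)."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    if denom == 0:
        return 0.0  # a constant series carries no variation to correlate
    num = sum(centered[i] * centered[i + lag] for i in range(n - lag))
    return num / denom

# Hypothetical daily temperatures (degrees C) that drift slowly upward.
temps = [12.0, 12.5, 13.1, 13.0, 13.8, 14.5, 14.2, 15.0, 15.6, 16.1]
lag0 = autocorrelation(temps, 0)  # a series always matches itself perfectly
lag1 = autocorrelation(temps, 1)  # today compared with yesterday
print(lag0, lag1)
```

A lag-1 value well above zero means each day resembles the previous one, which is what "autocorrelated" means here.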
Why is pitch tracking useful?
- Pitch helps distinguish speakers or emotions.
- Can be used in tone analysis, e.g. in tonal languages like Mandarin.
What are the challenges in pitch tracking?
- Noise
- Multiple fundamental frequencies
- Deciding whether a speaker is present at all (there may be no pitch to track)
Name some reliable algorithms for pitch tracking.
- YIN
- RAPT
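For intuition only, here is a deliberately simple autocorrelation-based pitch estimator for a single frame. It is neither YIN nor RAPT, which add refinements that make them far more robust. The search range of 60-400 Hz, the voicing threshold of 0.3 and the synthetic test frame are all assumptions made for this sketch.

```python
import math

def estimate_pitch(frame, sample_rate, f_min=60.0, f_max=400.0, threshold=0.3):
    """Return an F0 estimate in Hz from the autocorrelation peak, or None."""
    mean = sum(frame) / len(frame)
    x = [s - mean for s in frame]
    energy = sum(v * v for v in x)
    if energy == 0:
        return None  # silent frame: nothing to track
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(x) - 1)
    best_lag, best_val = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        val = sum(x[i] * x[i + lag] for i in range(len(x) - lag)) / energy
        if val > best_val:
            best_lag, best_val = lag, val
    if best_lag is None or best_val < threshold:
        return None  # weak periodicity: treat the frame as unvoiced
    return sample_rate / best_lag

# Assumed test frame: 40 msec at 8 kHz, a 200 Hz fundamental plus one harmonic.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200 * n / sample_rate)
         + 0.5 * math.sin(2 * math.pi * 400 * n / sample_rate)
         for n in range(320)]
f0 = estimate_pitch(frame, sample_rate)
silent = estimate_pitch([0.0] * 320, sample_rate)
print(f0, silent)
```

Returning None for silent or weakly periodic frames is a crude stand-in for the "is there a speaker?" decision; noise and multiple fundamental frequencies will still fool this estimator.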