04 - Speech Pattern Classification Flashcards
What is meant by linguistic information?
Information that is explicitly present in, or almost uniquely inferable from, the written message.
What is meant by paralinguistic information?
Information that is not inferable from the written message but is added by the speaker to complement the linguistic information, such as attitude and intonation. Example: the emphasis in “I am SOO excited!”
What is meant by nonlinguistic information?
Information about other factors such as age, gender, idiosyncrasy (personal traits), and the physical and emotional state of the speaker. In general, conditions that are not related to the linguistic content and cannot be controlled by the speaker. Example: a voice that sounds hoarse because the speaker has a cold.
Speech Pattern Classification refers to?
Extracting information such as language or accent from speech: taking a speech input and converting it into a sequence of class labels.
In speech classification, the system is normally divided into three models. What are they?
Acoustic model, language model, pronunciation model
What are the two main blocks of the speech pattern classification process?
Feature extraction and classification.
What is the local region of analysis?
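The frame level. The signal is split into short frames (typically a few tens of milliseconds each), and features are computed per frame.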
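Framing can be sketched as follows. This is only an illustration: the helper name `frame_signal`, the 16 kHz sample rate, and the 25 ms frame / 10 ms hop are assumptions chosen for the example, not values given in these notes.

```python
import numpy as np

def frame_signal(signal, frame_len, hop_len):
    """Split a 1-D signal into overlapping frames (one row per frame)."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    # Row i holds samples [i * hop_len, i * hop_len + frame_len).
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return signal[idx]

# Assumed settings: 16 kHz audio, 25 ms frames, 10 ms hop.
sample_rate = 16000
frame_len = sample_rate * 25 // 1000   # 400 samples
hop_len = sample_rate * 10 // 1000     # 160 samples

signal = np.random.default_rng(0).standard_normal(sample_rate)  # 1 s of noise
frames = frame_signal(signal, frame_len, hop_len)
print(frames.shape)  # (98, 400)
```

Each row of `frames` is one local region of analysis; per-frame features are computed row by row.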
What is the global region of analysis?
The utterance level. Functionals (statistics such as the mean, median, and max) are computed over the frame-level features of the whole utterance.
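A minimal sketch of functionals, assuming the frame-level features are stored as an array with one row per frame. The function name `functionals` and the toy numbers are made up for illustration.

```python
import numpy as np

def functionals(frame_features):
    """Collapse (n_frames, n_features) into one fixed-length vector."""
    x = np.asarray(frame_features, dtype=float)
    # Mean, median, and max per feature, taken across all frames.
    stats = [x.mean(axis=0), np.median(x, axis=0), x.max(axis=0)]
    return np.concatenate(stats)

# Toy input: 3 frames, 2 features per frame.
frame_features = np.array([[1.0, 10.0],
                           [2.0, 20.0],
                           [6.0, 30.0]])
utterance_vector = functionals(frame_features)
print(utterance_vector)  # [ 3. 20.  2. 20.  6. 30.]
```

The output length depends only on the number of features and statistics, not on how many frames the utterance has.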
What does ASR stand for?
Automatic Speech Recognition
What is the segmental region of analysis?
The segment level: units such as phonemes, voiced/unvoiced regions, and words.
Spectral features in feature extraction are?
Classical speech (ASR) features based on spectral measures, most commonly the MFCCs (Mel-frequency cepstral coefficients).
Prosodic features in feature extraction are?
Pitch, energy, formants, timing, articulation etc.
What are the delta-coefficients?
Delta features (first-order derivatives) provide information about the rate of change of the acoustic features over time. They are obtained by computing the differences between consecutive frames of the acoustic features. Delta features capture the dynamics of the speech signal and can help in modeling the transitions between different phonetic units.
What are the double-delta coefficients?
Double-delta features (second-order derivatives) provide information about the acceleration or curvature of the acoustic features. They are computed by taking the differences between consecutive delta features. Double-delta features capture the changes in the rate of change of the acoustic features and can provide additional temporal information beyond the delta features.
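The delta and double-delta cards can be sketched with plain frame-to-frame differences, as described above. The helper `simple_delta` is a made-up name for this example, and setting the first frame's delta to zero is an assumption. Many toolkits instead use a regression over several neighbouring frames, which is smoother but follows the same idea.

```python
import numpy as np

def simple_delta(features):
    """First-order difference along time; the first frame gets delta 0."""
    x = np.asarray(features, dtype=float)
    # Prepending the first frame keeps the output the same shape as the input.
    return np.diff(x, axis=0, prepend=x[:1])

# Toy input: 4 frames, 1 feature per frame.
feats = np.array([[0.0], [1.0], [4.0], [9.0]])
delta = simple_delta(feats)    # rate of change:   0, 1, 3, 5
delta2 = simple_delta(delta)   # change of change: 0, 1, 2, 2

# Static, delta, and double-delta features are usually stacked per frame.
stacked = np.hstack([feats, delta, delta2])
print(stacked.shape)  # (4, 3)
```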
When do we use Cepstral Mean (Variance) Normalization (CMN/CMVN)?
We use it as a pre-processing step on the cepstral features, before the actual analysis, to reduce the variation caused by different recording channels.
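A minimal per-utterance CMVN sketch. The function name `cmvn`, the `eps` guard, and the random toy features are assumptions for illustration. The idea is that a fixed channel shows up as a roughly constant offset in the cepstral domain, so subtracting the per-utterance mean removes it.

```python
import numpy as np

def cmvn(features, normalize_variance=True, eps=1e-8):
    """Normalize each feature dimension over the frames of one utterance."""
    x = np.asarray(features, dtype=float)
    out = x - x.mean(axis=0)               # CMN: zero mean per dimension
    if normalize_variance:
        out = out / (x.std(axis=0) + eps)  # CMVN: unit variance as well
    return out

# Toy cepstral features: 50 frames, 3 coefficients.
rng = np.random.default_rng(0)
x = rng.standard_normal((50, 3)) * np.array([1.0, 2.0, 3.0]) + np.array([5.0, -2.0, 0.0])

normalized = cmvn(x)
shifted = cmvn(x + 7.0)  # same utterance with a constant "channel" offset
print(np.allclose(normalized, shifted))  # True
```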
Prosodic features refer to four things. What are they?
Fundamental frequency (F0): mean, median, pitch contour etc.
Energy: shimmer, energy contours, voice level etc.
Duration: Speech rate, ratio of duration of voiced/unvoiced regions etc.
Formants: first to fourth formants, bandwidths etc.
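Two of these prosodic measures, energy and F0, can be sketched for a single frame. This is a rough illustration, not a production pitch tracker: the autocorrelation method, the function names, and the 75 to 400 Hz search range are all assumptions picked for the example.

```python
import numpy as np

def frame_energy_db(frame, eps=1e-12):
    """Mean-square energy of one frame, in dB."""
    frame = np.asarray(frame, dtype=float)
    return 10.0 * np.log10(np.mean(frame ** 2) + eps)

def estimate_f0_autocorr(frame, sample_rate, f0_min=75.0, f0_max=400.0):
    """Estimate F0 from the strongest autocorrelation peak in a lag range."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    # Keep only non-negative lags of the autocorrelation.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sample_rate / f0_max)
    lag_max = int(sample_rate / f0_min)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sample_rate / lag

# Synthetic "voiced" frame: a 200 Hz sine, 40 ms at 16 kHz.
sample_rate = 16000
n = np.arange(640)
frame = np.sin(2 * np.pi * 200.0 * n / sample_rate)

f0 = estimate_f0_autocorr(frame, sample_rate)
energy = frame_energy_db(frame)
print(f0)  # 200.0
```

Running these per frame gives pitch and energy contours, from which statistics such as the mean or median can then be taken.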