01 - Speech Production, Perception, Phonetics Flashcards
What is an end-to-end ML model?
An ML model designed to map the input directly to the output, without relying on multiple intermediate stages or components.
What are the ethics for speech technologies?
Don’t record, don’t clone, consider
What is the main use of the respiratory system?
Breathing, and controlling the air pressure used in speech production.
What is the main use of the phonatory system?
It is a system of throat valves and protective cartilage (the larynx) that excites the oral and nasal cavities to produce sound (aka the human voice).
What is the main use of the articulatory system?
It uses the upper vocal tract to excite the oral cavity and shape the spectral content of the voice. In other words, it is how we articulate and pronounce different sounds.
How are vowel sounds produced?
By vibrating the vocal cords with no obstruction of the airflow.
What is a node in a standing wave?
The place of no displacement.
What is an antinode in a standing wave?
The place of most displacement.
If the first formant is 500 Hz (F1 = 500 Hz), what are F2, F3 and F4?
1500 Hz (F1 x 3), 2500 Hz (F1 x 5) and 3500 Hz (F1 x 7)
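The odd multiples in this answer follow from treating the vocal tract as a uniform tube closed at the glottis and open at the lips (a quarter-wavelength resonator), whose resonances are F_n = (2n - 1) * c / (4L). A minimal sketch, assuming that idealized model; the function name and the values c = 350 m/s and L = 0.175 m are illustrative choices, not taken from the flashcards:

```python
def tube_formants(length_m, n_formants=4, speed_of_sound=350.0):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Assumption: idealized quarter-wavelength resonator, F_n = (2n - 1) * c / (4L).
    """
    return [
        (2 * n - 1) * speed_of_sound / (4 * length_m)
        for n in range(1, n_formants + 1)
    ]


# Illustrative values: a 17.5 cm tube gives F1 close to 500 Hz,
# and the higher formants land on odd multiples of F1.
formants = tube_formants(0.175)
ratios = [f / formants[0] for f in formants]
print([round(f) for f in formants])
print([round(r) for r in ratios])
```

A real vocal tract is not a uniform tube, so measured formants shift with the vowel being articulated; the sketch only reproduces the neutral-tube case used in the question.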
What is a consonant sound?
A sound produced by obstructing or restricting the air flow in the vocal tract.
What does the decibel (dB) measure?
It measures the ratio between two power values on a logarithmic scale.
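As a worked illustration of that definition, the level in dB is 10 * log10(P / P_ref). A minimal sketch; the function names are hypothetical, and the amplitude variant relies on the standard assumption that power is proportional to amplitude squared:

```python
import math


def power_ratio_to_db(power, reference_power):
    """Level in dB of `power` relative to `reference_power` (same units, both > 0)."""
    return 10.0 * math.log10(power / reference_power)


def amplitude_ratio_to_db(amplitude, reference_amplitude):
    """Same level computed from amplitudes: power ~ amplitude**2, so the factor is 20."""
    return 20.0 * math.log10(amplitude / reference_amplitude)


# 100 times the power and 10 times the amplitude describe the same level.
print(power_ratio_to_db(100.0, 1.0))
print(amplitude_ratio_to_db(10.0, 1.0))
```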
What is the Mel Frequency Scale?
It is a perceptually motivated frequency scale based on the human auditory system’s response to sound.
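The flashcard does not give a formula, and several variants of the mel scale exist. A minimal sketch using one commonly cited conversion, mel = 2595 * log10(1 + f / 700); the function names are hypothetical and the choice of this variant is an assumption:

```python
import math


def hz_to_mel(freq_hz):
    """Convert frequency in Hz to mels (assumed variant: 2595 * log10(1 + f / 700))."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel for the same assumed variant."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# Equal steps in Hz shrink in mels as frequency rises,
# mirroring the ear's coarser resolution at high frequencies.
low_step = hz_to_mel(1000.0) - hz_to_mel(500.0)
high_step = hz_to_mel(4500.0) - hz_to_mel(4000.0)
print(low_step > high_step)
```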
What is a phone and what is a phoneme?
A phone is a unit of sound produced by the human vocal apparatus. A phoneme is a unit of sound that distinguishes a word from another (in a given language).
What does the IPA stand for?
It stands for the International Phonetic Alphabet. NOT to be confused with the NATO phonetic alphabet (which is Alfa, Bravo, Charlie, etc.)
Which two ASCII-based phonetic alphabets are used alongside IPA in this class?
ARPAbet = General American English
SAMPA = European Portuguese, British English, and General American (GA) English