SPEECH PROCESSING Flashcards
What are the three parts of Speech Processing
Speech Coding
Speech Synthesis
Speech Recognition
What is speech coding
compressing a speech signal into a compact digital representation for storage or transmission (e.g. encoding it in MP3 format)
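MP3 encoding is too involved to show on a card, so the sketch below swaps in a much simpler technique: mu-law companding, the scheme used in G.711 telephony. It squeezes each sample into 8 bits while keeping more resolution for quiet sounds. The function names and the test sample are illustrative assumptions, not part of any particular codec library.

```python
import math

MU = 255  # standard mu-law parameter for 8-bit telephony


def mu_law_encode(x):
    """Map a sample in [-1, 1] to an 8-bit code in 0..255."""
    # Logarithmic compression: small amplitudes get more of the code range.
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round((y + 1) / 2 * 255))


def mu_law_decode(code):
    """Approximately invert mu_law_encode."""
    y = code / 255 * 2 - 1
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


sample = 0.5  # hypothetical input sample
code = mu_law_encode(sample)
restored = mu_law_decode(code)
```

The round trip is lossy: `restored` is close to `sample` but not identical, which is the trade-off any lossy speech coder makes.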
What is speech synthesis
Construct a speech waveform from words
the output can vary in speaker voice quality or accent
What is speech recognition
field of developing methodologies and technologies to translate spoken language into text by computers
(used in voice assistants)
What are the 5 components of speech recognition
1) Audio input
2) Feature extraction
3) Language modelling
4) Pattern matching
5) Output generation
What is audio input
This is the human speech that the user provides as input to the device
What is feature extraction
analyses the audio signal to extract relevant features that can be used for further processing. Features include:
pitch, intensity, and spectral properties of the signal
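As a rough illustration of two of these features, the sketch below measures intensity as root-mean-square amplitude and estimates pitch from the autocorrelation peak. The synthetic 200 Hz tone, the 8 kHz sample rate, and the 50 to 400 Hz search range are assumptions chosen for the example. Real front ends typically work on short overlapping frames of recorded speech.

```python
import math


def rms_intensity(frame):
    """Root-mean-square amplitude, a simple measure of intensity."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def estimate_pitch(frame, sample_rate, fmin=50.0, fmax=400.0):
    """Estimate pitch in Hz from the lag with the highest autocorrelation."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


sample_rate = 8000
# Synthetic 200 Hz tone standing in for a voiced speech frame (an assumption).
frame = [0.5 * math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(800)]

intensity = rms_intensity(frame)
pitch = estimate_pitch(frame, sample_rate)
```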
What is Pattern matching
extracted features are compared to a database of speech patterns to identify the words spoken by the user
This database, the “acoustic model”, is built by applying machine learning to large amounts of speech data
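A toy version of this comparison step is sketched below: each word has one stored feature vector, and the closest one wins. The words and numbers in `TEMPLATES` are made up for illustration. A real acoustic model is statistical and learned from data, not a hand-written lookup table.

```python
import math

# Hypothetical "acoustic model": one stored feature vector per word.
TEMPLATES = {
    "yes": [0.9, 0.2, 0.4],
    "no": [0.1, 0.8, 0.3],
    "stop": [0.5, 0.5, 0.9],
}


def match_word(features, templates=TEMPLATES):
    """Return the word whose template is nearest in Euclidean distance."""
    return min(templates, key=lambda word: math.dist(features, templates[word]))


# Feature vector as it might come out of the feature extraction step.
recognised = match_word([0.85, 0.25, 0.35])
```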
What is Language modelling
system uses a probabilistic model to predict which words are likely to occur next in the user’s speech
based on the context and grammar of the language
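One of the simplest probabilistic language models is a bigram model, which predicts the next word from the previous one using counts. The sketch below assumes a tiny made-up corpus. Production systems use far larger corpora and usually longer contexts or neural models.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev):
    """Turn the follower counts of `prev` into probabilities."""
    total = sum(counts[prev].values())
    return {w: c / total for w, c in counts[prev].items()} if total else {}


# Hypothetical training sentences.
corpus = [
    "book a flight to paris",
    "book a hotel in paris",
    "book a flight to rome",
]
model = train_bigram(corpus)
probs = next_word_probs(model, "a")
most_likely = max(probs, key=probs.get)
```

Here "flight" follows "a" in two of the three sentences, so the model prefers it over "hotel" when the acoustic evidence is ambiguous.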
What is Output generation
system converts the recognised words into text or actions based on the user’s intention, which can then be used for various applications
What are the 3 speech recognition techniques
Acoustic phonetic approach
Hidden Markov Model (HMM) based approaches
Deep learning approaches
Speech recognition techniques: HMM model
model speech as a sequence of states, where each state corresponds to a specific segment of speech
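To make the state-sequence idea concrete, the sketch below uses the Viterbi algorithm to find the most likely hidden states for a short run of observations. The two states ("sil", "voiced"), the two observation symbols ("quiet", "loud"), and all probabilities are invented for illustration. Real recognisers use many more states and continuous acoustic features.

```python
def viterbi(obs, states, start, trans, emit):
    """Return the most likely state sequence for the observations."""
    # best[t][s]: probability of the best path that ends in state s at time t
    best = [{s: start[s] * emit[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: best[t - 1][p] * trans[p][s])
            best[t][s] = best[t - 1][prev] * trans[prev][s] * emit[s][obs[t]]
            back[t][s] = prev
    # Trace back from the most probable final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical model parameters.
states = ["sil", "voiced"]
start = {"sil": 0.6, "voiced": 0.4}
trans = {
    "sil": {"sil": 0.7, "voiced": 0.3},
    "voiced": {"sil": 0.4, "voiced": 0.6},
}
emit = {
    "sil": {"quiet": 0.9, "loud": 0.1},
    "voiced": {"quiet": 0.2, "loud": 0.8},
}

decoded = viterbi(["quiet", "quiet", "loud"], states, start, trans, emit)
```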
What are some speech-related applications
Speech processing (book flight over the phone)
Information extraction
Machine translation
Question answering
Summarisation
e.g. customer service, transcription services, language learning, automotive systems, accessibility aids
Speech recognition techniques: Acoustic phonetic approach
involves analysing the acoustics of speech to identify phonetic units, the smallest units of sound that make up words
requires a deep understanding of phonetics and relies on the analysis of the speech signal’s frequency components
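Frequency analysis of this kind can be sketched with a discrete Fourier transform, which shows which frequencies carry the most energy in a frame. The two-tone test signal (500 Hz and 1500 Hz), the 8 kHz sample rate, and the 160-sample frame are assumptions for the example. The naive transform below favours clarity over speed.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for the bins below the Nyquist frequency."""
    n = len(frame)
    return [
        abs(sum(frame[k] * cmath.exp(-2j * math.pi * b * k / n) for k in range(n)))
        for b in range(n // 2)
    ]


sample_rate = 8000
n = 160
# Synthetic frame with a strong 500 Hz and a weaker 1500 Hz component.
frame = [
    math.sin(2 * math.pi * 500 * k / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 1500 * k / sample_rate)
    for k in range(n)
]

spectrum = magnitude_spectrum(frame)
bin_hz = sample_rate / n  # frequency spacing between neighbouring bins
strongest = sorted(range(len(spectrum)), key=spectrum.__getitem__, reverse=True)[:2]
peaks_hz = sorted(b * bin_hz for b in strongest)
```

In real speech, peaks like these correspond to resonances of the vocal tract, which is the kind of cue a phonetic analysis looks for.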
Speech recognition techniques: Deep Learning
branch of Machine Learning based on a set of algorithms to model high level abstractions in data
uses deep graphs with multiple hierarchical processing layers, composed of multiple linear and non-linear transformations
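The phrase "linear and non-linear transformations" can be illustrated with a tiny two-layer forward pass: each layer applies a weighted sum (linear), followed by a ReLU (non-linear) on the hidden layer. The weights and input here are arbitrary numbers chosen for the example. A trained speech model would learn its weights from data and have many more layers and units.

```python
def linear(x, weights, bias):
    """Linear transformation: one weighted sum per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    """Non-linear transformation: negative values are clipped to zero."""
    return [max(0.0, v) for v in x]


def forward(x, layers):
    """Pass x through each layer, with ReLU after all but the last."""
    for i, (weights, bias) in enumerate(layers):
        x = linear(x, weights, bias)
        if i < len(layers) - 1:
            x = relu(x)
    return x


# Hypothetical weights: 2 inputs -> 2 hidden units -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.1]),
]
output = forward([2.0, 1.0], layers)
```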