SPEECH PROCESSING Flashcards

Question 1

Q

What are the three parts of Speech Processing

Answer

A

Speech Coding
Speech Synthesis
Speech Recognition

Question 2

Q

What is speech coding

Answer

A

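compressing a digital speech signal into fewer bits for efficient storage or transmission
(e.g. MP3 for general audio; telephony uses dedicated speech codecs)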
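
One way to picture speech coding is mu-law companding, a simple technique from telephony that squeezes each sample into 8 bits. The sketch below is an illustration only: it is not MP3 and not a bit-exact implementation of any standard codec, and the function names are made up for this card.

```python
import math

MU = 255  # companding parameter commonly used with 8-bit mu-law

def mu_law_encode(sample: float) -> int:
    """Compress one sample in [-1.0, 1.0] to an 8-bit code (0..255)."""
    sample = max(-1.0, min(1.0, sample))  # clip out-of-range input
    # Logarithmic compression: quiet sounds get more resolution than loud ones.
    compressed = math.copysign(
        math.log1p(MU * abs(sample)) / math.log1p(MU), sample
    )
    # Map [-1, 1] onto the integer range 0..255.
    return int(round((compressed + 1.0) / 2.0 * 255))

def mu_law_decode(code: int) -> float:
    """Expand an 8-bit code back to an approximate sample in [-1.0, 1.0]."""
    compressed = code / 255 * 2.0 - 1.0
    return math.copysign(((1 + MU) ** abs(compressed) - 1) / MU, compressed)

if __name__ == "__main__":
    for x in (-0.5, 0.01, 0.5):
        code = mu_law_encode(x)
        print(x, "->", code, "->", round(mu_law_decode(code), 4))
```

The decoded value is close to the original but not identical: the coding is lossy, which is the trade-off made for the smaller size.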

Question 3

Q

What is speech synthesis

Answer

A

Construct a speech waveform from words (text-to-speech)
output can be varied by speaker quality or accent

Question 4

Q

What is speech recognition

Answer

A

field of developing methodologies and technologies to translate spoken language into text by computers
(used in voice assistants)

Question 5

Q

What are the 5 components of speech recognition

Answer

A

1) Audio input
2) Feature extraction
3) Language modelling
4) Pattern matching
5) Output generation

Question 6

Q

What is audio input

Answer

A

This is the human speech that the user provides as input to the device

Question 7

Q

What is feature extraction

Answer

A

analyses the audio signal to extract relevant features that can be used for further processing
Features include: pitch, intensity, and spectral properties of the signal
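
A minimal sketch of the three feature types named above, computed on one synthetic frame. The 8000 Hz sample rate, the 200 Hz test tone, and all function names are assumptions for illustration; real systems typically use optimised libraries and richer features.

```python
import math

def rms_intensity(frame):
    """Intensity as the root-mean-square amplitude of the frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def pitch_autocorr(frame, sample_rate, fmin=50.0, fmax=400.0):
    """Pitch estimate: the lag at which the frame best matches itself."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)

    def autocorr(lag):
        return sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))

    best_lag = max(range(min_lag, max_lag + 1), key=autocorr)
    return sample_rate / best_lag

def dft_magnitudes(frame):
    """Spectral property: magnitude of each frequency bin (naive DFT)."""
    n = len(frame)
    mags = []
    for k in range(n // 2):
        re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mags.append(math.hypot(re, im))
    return mags

# Synthetic 50 ms frame: a 200 Hz tone standing in for a voiced sound.
SAMPLE_RATE = 8000
frame = [0.5 * math.sin(2 * math.pi * 200 * t / SAMPLE_RATE) for t in range(400)]

intensity = rms_intensity(frame)
pitch_hz = pitch_autocorr(frame, SAMPLE_RATE)
mags = dft_magnitudes(frame)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
peak_hz = peak_bin * SAMPLE_RATE / len(frame)  # strongest spectral component

if __name__ == "__main__":
    print(intensity, pitch_hz, peak_hz)
```

For this pure tone the pitch estimate and the spectral peak both land on 200 Hz; on real speech the two usually differ, because the spectrum also carries the shape of the vocal tract.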

Question 8

Q

What is Pattern matching

Answer

A

extracted features are compared to a database of speech patterns to identify the words spoken by the user
This database, the “acoustic model”, is built by applying ML to large amounts of speech data
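
The comparison step can be sketched with dynamic time warping (DTW), a classic way to match a feature sequence against stored templates even when the speaking rate differs. This is a stand-in, not a learned acoustic model: the templates, their values, and the function names are hypothetical.

```python
def dtw_distance(a, b):
    """Cost of the best alignment between two 1-D feature sequences."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: stretch a, stretch b, or step both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]

# Hypothetical "database" of one feature contour per word.
TEMPLATES = {
    "yes": [1.0, 3.0, 4.0, 2.0],
    "no": [4.0, 2.0, 1.0, 1.0],
}

def recognise(features):
    """Return the word whose template is closest to the input features."""
    return min(TEMPLATES, key=lambda word: dtw_distance(features, TEMPLATES[word]))

if __name__ == "__main__":
    slow_yes = [1.0, 1.0, 3.0, 4.0, 4.0, 2.0]  # same contour, spoken more slowly
    print(recognise(slow_yes))
```

The slowed-down input still matches "yes" because DTW lets one template frame align with several input frames.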

Question 9

Q

What is Language modelling

Answer

A

system uses a probabilistic model to predict which words are likely to occur next in the user’s speech
based on the context and grammar of the language
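
A minimal sketch of such a probabilistic model is a bigram model, which predicts the next word from the previous one using counts. The three-sentence corpus and the function name are invented for illustration; practical language models are trained on far more text and use longer context.

```python
from collections import Counter, defaultdict

# Tiny hypothetical training corpus.
CORPUS = [
    "book a flight to paris",
    "book a flight to rome",
    "book a hotel in paris",
]

# Count how often each word follows each other word.
bigram_counts = defaultdict(Counter)
for sentence in CORPUS:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        bigram_counts[prev][nxt] += 1

def next_word_probs(prev):
    """Estimate P(next word | previous word) from the counts."""
    counts = bigram_counts.get(prev, Counter())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

if __name__ == "__main__":
    print(next_word_probs("a"))
```

After "a", the model prefers "flight" (seen twice) over "hotel" (seen once), so an ambiguous sound in that position would be resolved towards "flight".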

Question 10

Q

What is Output generation

Answer

A

system converts the recognised words into text or actions based on the user’s intention, which can then be used for various applications

Question 11

Q

What are the 3 speech recognition techniques

Answer

A

Acoustic phonetic approach
Hidden Markov Model (HMM) based approaches
Deep learning approaches

Question 12

Q

Speech recognition techniques: HMM model

Answer

A

models speech as a sequence of states, where each state corresponds to a specific segment of speech
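
The idea of a state sequence can be sketched with the Viterbi algorithm, which finds the most likely hidden states for a series of observations. The two states, the "quiet"/"loud" observations, and every probability below are hypothetical values chosen for illustration.

```python
def viterbi(obs, states, start, trans, emit):
    """Return the most likely state path and its probability."""
    # best[t][s]: probability of the best path that ends in state s at time t.
    best = [{s: start[s] * emit[s][obs[0]] for s in states}]
    back = [{}]
    for o in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] * trans[p][s])
            scores[s] = best[-1][prev] * trans[prev][s] * emit[s][o]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the best path backwards from the most likely final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]

STATES = ["sil", "speech"]
START = {"sil": 0.8, "speech": 0.2}
TRANS = {
    "sil": {"sil": 0.6, "speech": 0.4},
    "speech": {"sil": 0.3, "speech": 0.7},
}
EMIT = {
    "sil": {"quiet": 0.9, "loud": 0.1},
    "speech": {"quiet": 0.2, "loud": 0.8},
}

path, prob = viterbi(["quiet", "loud", "loud", "quiet"], STATES, START, TRANS, EMIT)

if __name__ == "__main__":
    print(path, prob)
```

With these numbers the decoded path is silence, speech, speech, silence: each state covers one segment of the input, as the card describes.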

Question 13

Q

What are some speech-related applications

Answer

A

Speech processing (book flight over the phone)
Information extraction
Machine translation
Question answering
Summarisation
e.g. customer service, transcription services, language learning, automotive systems, accessibility aids

Question 14

Q

Speech recognition techniques: Acoustic phonetic approach

Answer

A

involves analysing the acoustics of speech to identify phonetic units, the smallest units of sound that make up words
requires a deep understanding of phonetics and relies on the analysis of a speech’s frequency components
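
As a rough illustration of identifying a phonetic unit from frequency components, the sketch below labels a vowel by its first two formants (resonant frequencies). The formant values are approximate textbook-style figures treated here as assumptions, and the function name is hypothetical.

```python
import math

# Approximate (F1, F2) formant frequencies in Hz for three vowels.
# Illustrative assumptions only; real values vary widely between speakers.
VOWEL_FORMANTS = {
    "i": (270.0, 2290.0),  # as in "beet"
    "a": (730.0, 1090.0),  # as in "father"
    "u": (300.0, 870.0),   # as in "boot"
}

def classify_vowel(f1, f2):
    """Pick the vowel whose reference formants are nearest to (f1, f2)."""
    return min(
        VOWEL_FORMANTS,
        key=lambda v: math.dist((f1, f2), VOWEL_FORMANTS[v]),
    )

if __name__ == "__main__":
    print(classify_vowel(280.0, 2250.0))
```

Hand-built rules like this depend on phonetic knowledge and break down across speakers, which is part of why statistical approaches took over.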

Question 15

Q

Speech recognition techniques: Deep Learning

Answer

A

branch of Machine Learning based on a set of algorithms to model high level abstractions in data
uses deep graphs with multiple hierarchical processing layers, composed of multiple linear and non-linear transformations
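
The "linear and non-linear transformations" can be sketched as a tiny two-layer network run forwards on one feature vector. The weights, sizes, and input are made-up values; a real network learns its weights from data and has many more layers and units.

```python
import math

def linear(x, weights, bias):
    """Linear transformation: one weighted sum per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(v):
    """Non-linear transformation: negative values become zero."""
    return [max(0.0, a) for a in v]

def softmax(v):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical weights: 3 input features -> 2 hidden units -> 3 classes.
W1 = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
B1 = [0.0, 0.1]
W2 = [[1.0, -1.0], [-0.5, 0.7], [0.2, 0.3]]
B2 = [0.0, 0.0, 0.0]

features = [1.0, 2.0, 3.0]
hidden = relu(linear(features, W1, B1))   # layer 1: linear, then non-linear
probs = softmax(linear(hidden, W2, B2))   # layer 2: linear, then softmax
predicted_class = probs.index(max(probs))

if __name__ == "__main__":
    print(hidden, probs, predicted_class)
```

Stacking more such layers is what makes the network "deep": each layer works on the output of the one before it.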

Question 16

Q

Why is speech recognition difficult: linguistic perspective

Answer

A

Many sources of variation
speaker: tuned for a particular speaker or speaker-independent
environment: noise
style: planned monologue or conversation

Question 17

Q

Why is speech recognition difficult: ML perspective

Answer

A

as a classification problem: high-dimensional output space
as a seq2seq problem: very long input sequences
data is often noisy
manual speech transcription is expensive
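
A quick back-of-the-envelope calculation shows the scale of the first two points. The vocabulary size, sentence length, utterance duration, and frame step are assumed round numbers, not figures from any particular system.

```python
# Output space: how many distinct word sequences could be the answer?
VOCAB_SIZE = 10_000      # assumed vocabulary size
SENTENCE_LENGTH = 10     # assumed number of words in the utterance
output_space = VOCAB_SIZE ** SENTENCE_LENGTH

# Input length: how many feature frames does the model have to read?
DURATION_S = 10          # assumed utterance duration in seconds
FRAME_STEP_MS = 10       # assumed step between feature frames
num_frames = DURATION_S * 1000 // FRAME_STEP_MS

if __name__ == "__main__":
    print(f"possible outputs: {output_space:.1e}")
    print(f"input frames: {num_frames}")
```

Under these assumptions there are 10^40 candidate sentences, and the model reads 1000 input frames to produce only 10 output words.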