SPEECH PROCESSING Flashcards

1
Q

What are the three parts of speech processing?

A

Speech Coding
Speech Synthesis
Speech Recognition

2
Q

What is speech coding?

A

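Compressing a speech signal into a compact digital representation for efficient storage or transmission
(MP3 is a familiar example of audio compression, though it is a general audio format rather than a speech-specific codec)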
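
A minimal sketch of one classic speech-coding idea, mu-law companding, which squeezes each sample into 8 bits while keeping more resolution for quiet sounds. The card itself does not name a method; the constant mu = 255 (the value used in telephony) and the function names are assumptions for illustration only.

```python
import math

MU = 255  # companding constant; assumed here, not taken from the card

def mu_law_encode(x, mu=MU):
    """Compress a sample in [-1, 1] to an 8-bit code (0..255)."""
    x = max(-1.0, min(1.0, x))
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)  # still in [-1, 1]
    return int(round((y + 1) / 2 * 255))

def mu_law_decode(code, mu=MU):
    """Expand an 8-bit code back to an approximate sample in [-1, 1]."""
    y = code / 255 * 2 - 1
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

# Round trip: the decoded value is close to, but not exactly, the original.
approx = mu_law_decode(mu_law_encode(0.5))
```

The round trip is lossy by design: the decoder recovers an approximation, which is the trade-off any lossy speech coder makes.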

3
Q

What is speech synthesis?

A

Constructing a speech waveform from words (text)
The output can be varied in speaker voice quality or accent

4
Q

What is speech recognition?

A

The field that develops methods and technologies for computers to translate spoken language into text
(used, for example, in voice assistants)

5
Q

What are the 5 components of speech recognition?

A

1) Audio input
2) Feature extraction
3) Language modelling
4) Pattern matching
5) Output generation

6
Q

What is audio input?

A

This is the human speech that the user provides as input to the device

7
Q

What is feature extraction?

A

Analyses the audio signal to extract relevant features that can be used for further processing
Features include pitch, intensity, and spectral properties of the signal
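
A minimal sketch of frame-based feature extraction, assuming a simple setup that is not from the card: the signal is cut into short overlapping frames, and each frame yields an intensity value (RMS amplitude) and one spectral property (the strongest frequency in a naive DFT). The test tone, frame sizes, and function names are all hypothetical.

```python
import cmath
import math

def frames(signal, size, hop):
    """Split a signal into overlapping frames of `size` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]

def rms(frame):
    """Intensity of a frame as root-mean-square amplitude."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def dominant_frequency(frame, sample_rate):
    """Spectral property: frequency of the strongest DFT bin (naive O(n^2) DFT)."""
    n = len(frame)
    mags = []
    for k in range(1, n // 2):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append((abs(acc), k))
    _, k_best = max(mags)
    return k_best * sample_rate / n

# Synthetic input (hypothetical): a 200 Hz sine sampled at 8 kHz.
SAMPLE_RATE = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * t / SAMPLE_RATE) for t in range(800)]
# One (intensity, dominant frequency) pair per 20 ms frame, with a 10 ms hop.
features = [(rms(f), dominant_frequency(f, SAMPLE_RATE)) for f in frames(tone, 160, 80)]
```

Real systems typically use an FFT and richer spectral features; the naive DFT is used here only to keep the sketch dependency-free.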

8
Q

What is pattern matching?

A

The extracted features are compared to a database of speech patterns to identify the words spoken by the user
This database, the "acoustic model", is built by applying machine learning to a large amount of speech data
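
A minimal sketch of comparing features against stored patterns. It uses dynamic time warping (DTW), a classic template-matching technique, as an illustrative stand-in for a learned acoustic model; the card does not say which comparison method is used. The tiny template "database" and its values are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]

# Hypothetical "database" of stored feature templates, one per word.
TEMPLATES = {
    "yes": [1.0, 3.0, 4.0, 3.0, 1.0],
    "no": [4.0, 2.0, 1.0, 1.0, 2.0],
}

def recognise(features):
    """Return the word whose template is closest to the input under DTW."""
    return min(TEMPLATES, key=lambda w: dtw_distance(features, TEMPLATES[w]))

# A slower utterance of "yes": same shape, stretched in time.
word = recognise([1.0, 1.0, 3.0, 4.0, 4.0, 3.0, 1.0])
```

DTW tolerates differences in speaking rate, which is why the stretched input still matches its template.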

9
Q

What is language modelling?

A

The system uses a probabilistic model to predict which words are likely to occur next in the user's speech,
based on the context and grammar of the language
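
A minimal sketch of a probabilistic next-word predictor, assuming the simplest possible case: a bigram model estimated by counting word pairs. The three-sentence corpus and the function names are hypothetical; practical systems use far larger corpora and smoothing or neural models.

```python
from collections import Counter, defaultdict

def train_bigram(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_probs(counts, prev):
    """Maximum-likelihood P(next | prev); empty dict if prev was never seen."""
    total = sum(counts[prev].values())
    return {w: c / total for w, c in counts[prev].items()} if total else {}

corpus = ["book a flight", "book a hotel", "book a flight to paris"]
model = train_bigram(corpus)
probs = next_word_probs(model, "a")  # "flight" is twice as likely as "hotel"
```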

10
Q

What is output generation?

A

The system converts the recognised words into text or actions based on the user's intention, which can then be used by various applications

11
Q

What are the 3 speech recognition techniques?

A

Acoustic phonetic approach
Hidden Markov Model (HMM) based approaches
Deep learning approaches

12
Q

Speech recognition techniques: Hidden Markov Model (HMM)

A

Models speech as a sequence of hidden states linked by transition probabilities, where each state corresponds to a specific segment of speech
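
A minimal sketch of decoding with an HMM, using the Viterbi algorithm to find the most likely state sequence for a series of observations. The two states ("sil" for silence, "speech"), the "low"/"high" energy observations, and every probability below are made-up toy values, not from the card.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence for a sequence of observations."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        best.append({
            s: max((prev[p][0] * trans_p[p][s] * emit_p[s][obs], p) for p in states)
            for s in states
        })
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for t in range(len(best) - 1, 0, -1):
        state = best[t][state][1]
        path.append(state)
    return path[::-1]

# Toy model (all numbers hypothetical).
STATES = ["sil", "speech"]
START = {"sil": 0.8, "speech": 0.2}
TRANS = {"sil": {"sil": 0.6, "speech": 0.4}, "speech": {"sil": 0.3, "speech": 0.7}}
EMIT = {"sil": {"low": 0.9, "high": 0.1}, "speech": {"low": 0.2, "high": 0.8}}

path = viterbi(["low", "high", "high", "low"], STATES, START, TRANS, EMIT)
```

In a real recogniser the states would typically model parts of phonemes and the observations would be feature vectors rather than two symbols.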

13
Q

What are some speech-related applications?

A

Speech processing (e.g. booking a flight over the phone)
Information extraction
Machine translation
Question answering
Summarisation
e.g. customer service, transcription services, language learning, automotive systems, accessibility aids

14
Q

Speech recognition techniques: Acoustic phonetic approach

A

Involves analysing the acoustics of speech to identify phonetic units, the smallest units of sound that make up words
Requires a deep understanding of phonetics and relies on analysing the frequency components of the speech signal

15
Q

Speech recognition techniques: Deep Learning

A

A branch of machine learning based on a set of algorithms that model high-level abstractions in data
Uses deep graphs with multiple hierarchical processing layers, composed of multiple linear and non-linear transformations
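
A minimal sketch of what "layers composed of linear and non-linear transformations" looks like in code: two stacked layers, each a linear map followed by a non-linearity. The weights are hand-picked and untrained, and the layer sizes and names are assumptions for illustration only.

```python
import math

def linear(x, weights, bias):
    """Linear transformation: y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    """Non-linear transformation: clamp negatives to zero."""
    return [max(0.0, v) for v in x]

def softmax(x):
    """Non-linear transformation: turn scores into probabilities."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

# Hand-picked (untrained) weights, purely illustrative.
W1, B1 = [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]
W2, B2 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]

def forward(x):
    """Pass an input through two layers: linear -> ReLU -> linear -> softmax."""
    hidden = relu(linear(x, W1, B1))
    return softmax(linear(hidden, W2, B2))

out = forward([2.0, 1.0])  # two class probabilities that sum to 1
```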

16
Q

Why is speech recognition difficult: linguistic perspective

A

Many sources of variation:
Speaker: system tuned for a particular speaker, or speaker-independent
Environment: background noise
Style: planned monologue or spontaneous conversation

17
Q

Why is speech recognition difficult: ML perspective

A

As a classification problem: high-dimensional output space
As a seq2seq problem: very long input sequences
Data is often noisy
Manual speech transcription is expensive

18
Q

How is speech produced?

A

The vocal cords vibrate
The vocal tract acts as an acoustic tube, and changes in its shape alter the sound
This creates a sound pressure wave

19
Q

Voice parameters: pitch-frequency?

A

Frequency of vocal cord vibration
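
A minimal sketch of estimating pitch frequency from a signal, assuming an autocorrelation approach (one common method; the card does not specify one). The search range of 50 to 500 Hz and the synthetic 100 Hz tone are hypothetical choices.

```python
import math

def estimate_pitch(signal, sample_rate, fmin=50, fmax=500):
    """Estimate fundamental frequency (Hz) from the autocorrelation peak."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)

    def autocorr(lag):
        # Similarity between the signal and a copy of itself shifted by `lag` samples.
        return sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))

    # The lag with the strongest self-similarity is taken as one pitch period.
    best_lag = max(range(min_lag, max_lag + 1), key=autocorr)
    return sample_rate / best_lag

# Synthetic input (hypothetical): a 100 Hz sine sampled at 8 kHz.
tone = [math.sin(2 * math.pi * 100 * t / 8000) for t in range(800)]
pitch_hz = estimate_pitch(tone, 8000)
```

On real voiced speech the estimate is usually made frame by frame, since pitch changes over time.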

20
Q

Voice parameters: Jitter?

A

Cycle-to-cycle variation of pitch frequency

21
Q

Voice parameters: Shimmer?

A

Variation in amplitude of glottal pulses
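
A minimal sketch of how jitter and shimmer can be quantified, assuming one common "local" definition: the mean absolute difference between consecutive cycles, divided by the mean value. The period and amplitude lists are hypothetical measurements, one value per glottal cycle.

```python
def mean_abs_consecutive_diff(values):
    """Average absolute change from one cycle to the next."""
    return sum(abs(a - b) for a, b in zip(values, values[1:])) / (len(values) - 1)

def local_jitter(periods):
    """Cycle-to-cycle variation of pitch period, relative to the mean period."""
    return mean_abs_consecutive_diff(periods) / (sum(periods) / len(periods))

def local_shimmer(amplitudes):
    """Cycle-to-cycle variation of peak amplitude, relative to the mean amplitude."""
    return mean_abs_consecutive_diff(amplitudes) / (sum(amplitudes) / len(amplitudes))

# Hypothetical per-cycle measurements.
periods = [0.0100, 0.0102, 0.0098, 0.0100]  # seconds
amplitudes = [1.0, 0.9, 1.1, 1.0]           # arbitrary units

jitter = local_jitter(periods)        # roughly 2.7 %
shimmer = local_shimmer(amplitudes)   # roughly 13 %
```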

22
Q

Voice parameters: Tremor?

A

Combined jitter and shimmer (fluctuation in both pitch and amplitude)

23
Q

Voice parameters: Harmonics-to-noise ratio (HNR)?

A

Ratio of periodic to aperiodic energy or cycle to cycle differences in voice signal
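
A minimal sketch of the ratio described above, assuming a deliberately simple estimator: the average waveform across cycles is treated as the periodic part, and the cycle-to-cycle differences from that average are treated as the aperiodic part. It assumes the period (in samples) is already known and constant; the function name and test signal are hypothetical.

```python
import math

def hnr_db(signal, period):
    """HNR in dB, treating the cycle-averaged waveform as the periodic part."""
    n_cycles = len(signal) // period
    cycles = [signal[c * period:(c + 1) * period] for c in range(n_cycles)]
    # Periodic part: the average cycle.
    template = [sum(c[i] for c in cycles) / n_cycles for i in range(period)]
    harmonic = n_cycles * sum(v * v for v in template)
    # Aperiodic part: what each cycle adds on top of the average cycle.
    noise = sum((c[i] - template[i]) ** 2 for c in cycles for i in range(period))
    return float("inf") if noise == 0 else 10 * math.log10(harmonic / noise)

# Hypothetical signal: a 4-sample cycle with small alternating perturbations.
cycle_a = [0.1, 1.1, 0.1, -0.9]
cycle_b = [-0.1, 0.9, -0.1, -1.1]
ratio_db = hnr_db(cycle_a + cycle_b + cycle_a + cycle_b, 4)  # about 17 dB
```

A perfectly repeating signal has no aperiodic energy, so the function returns infinity in that case.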