SR Flashcards
What is Speech Recognition?
An interdisciplinary subfield of computational linguistics that develops methodologies and technologies that enable computers to recognize spoken language and translate it into text
ASR
automatic speech recognition
STT
speech-to-text
Speech Signal:
Amplitude over time
Fundamental problem
1 Given:
2 Wanted:
3 Search:
1: an observation (ADC, FFT) X = x1, x2, …, xT
2: the corresponding word sequence W = w1, w2, …, wm
3: the most likely word sequence W’
W’
= argmax_W P( W|X )
P( W|X )
p( X|W ) * P( W ) / p( X )
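Why p( X ) drops out of the search: a short derivation using only the symbols from the cards above. The denominator does not depend on W, so it cannot change which W wins the argmax.

```latex
\begin{align*}
W' &= \operatorname*{arg\,max}_{W} P(W \mid X) \\
   &= \operatorname*{arg\,max}_{W} \frac{p(X \mid W)\, P(W)}{p(X)} \\
   &= \operatorname*{arg\,max}_{W} p(X \mid W)\, P(W)
\end{align*}
```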
p( X|W ) The acoustic model
how likely is it to observe X when W is spoken
P( W ) The language model
how likely is it that W is spoken, a priori
What is X ?
The Problem of Pre-Processing (Vorverarbeitung)
What is p( X|W ) ?
The Problem of Acoustic Modelling (Akustische Modellierung)
What is P( W ) ?
The Problem of Language Modelling (Sprachmodellierung)
How do we find argmax W?
The Search Problem (Suche)
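Toy example of the search step: a minimal sketch, assuming the candidate word sequences and their scores are already given. The candidate strings, the numbers, and the names `candidates`, `score` and `best_word_sequence` are all invented for illustration and do not come from a real recognizer. Scores are kept as log-probabilities, so the product p( X|W ) * P( W ) becomes a sum.

```python
# Hypothetical candidates W for one observation X.
# "log_acoustic" stands for log p(X|W), "log_lm" for log P(W).
# All values are made up for illustration only.
candidates = {
    "recognize speech": {"log_acoustic": -12.0, "log_lm": -4.0},
    "wreck a nice beach": {"log_acoustic": -11.5, "log_lm": -9.0},
    "recognize peach": {"log_acoustic": -13.0, "log_lm": -7.5},
}


def score(entry):
    # log p(X|W) + log P(W); log p(X) is identical for every W and is dropped.
    return entry["log_acoustic"] + entry["log_lm"]


def best_word_sequence(cands):
    # Exhaustive argmax over the (tiny) candidate list.
    return max(cands, key=lambda w: score(cands[w]))


best = best_word_sequence(candidates)
print(best)  # prints "recognize speech"
```

Here the second candidate has the best acoustic score, but the language model makes the first one win overall. Enumerating every possible W like this is not feasible for a realistic vocabulary, which is why search is a problem in its own right.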