Lecture 8 - From HMMs to End-to-End Systems Flashcards

Question 1

Q

In Large Vocabulary Continuous Speech Recognition (LVCSR), how many words are in the vocabulary?

Answer

A

80,000 - 100,000

Question 2

Q

When designing an ASR system, some questions to ask are: is the speech constrained or natural, and is the vocabulary small or large?

Explain the difference between a small and a large vocabulary.

Answer

A

Small vocabulary
- Isolated words; each word has its own dedicated acoustic model

Large vocabulary
- Modelled at the sub-word level
- One acoustic model per phoneme
- Words recognised as concatenated sequences of phoneme models
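
The sub-word idea above can be sketched in a few lines. This is only an illustration: the lexicon entries, the phoneme labels, and the `hmm_` naming are hypothetical and not taken from the lecture. It shows how several words reuse the same small set of phoneme models instead of each needing its own model.

```python
# Hypothetical pronunciation lexicon: word -> phoneme sequence.
# The words and phoneme labels are illustrative assumptions.
LEXICON = {
    "cat": ["k", "ae", "t"],
    "act": ["ae", "k", "t"],
    "tack": ["t", "ae", "k"],
}


def word_model(word, lexicon=LEXICON):
    """Return the chain of phoneme-model names that represents `word`."""
    return [f"hmm_{ph}" for ph in lexicon[word]]


def phoneme_inventory(lexicon=LEXICON):
    """Return the distinct phonemes needed to cover every word in the lexicon."""
    return sorted({ph for phones in lexicon.values() for ph in phones})


print(word_model("cat"))      # chain of three phoneme models
print(phoneme_inventory())    # three shared phonemes cover all three words
```

Adding a new word to this toy lexicon only needs a new pronunciation entry, not a new acoustic model, which is why the sub-word approach scales to large vocabularies.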

Question 3

Q

What is the disadvantage of MFCCs?

Answer

A

MFCCs are not robust to noise.

Question 4

Q

When HMMs are used for ASR, what role do they play?

Answer

A

HMMs serve as the acoustic model.

Question 5

Q

Explain the difference between small and large vocabulary when using HMMs as the acoustic model

Answer

A

Small vocabulary
- Word-level HMM

Large vocabulary
- Phone-level HMM (40 monophones)

A 2-state HMM is used to model each phoneme.
Words are built by chaining phoneme models.
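
As a rough sketch of how a word-level model can be assembled from phoneme-level HMMs, the snippet below chains the states of each phoneme into one left-to-right HMM. The function name `build_word_hmm`, the state naming, and the self-loop probability of 0.6 are assumptions made for illustration; the default of 2 states per phoneme follows the card above.

```python
def build_word_hmm(phonemes, states_per_phone=2, self_loop=0.6):
    """Chain phoneme HMMs into one left-to-right word HMM.

    Returns (state_names, transition_matrix). Each state either stays
    where it is (self_loop) or moves on to the next state. The last
    state's remaining probability mass (1 - self_loop) is the chance of
    leaving the word, so its row does not sum to 1 inside the matrix.
    """
    names = [f"{ph}_{i}" for ph in phonemes for i in range(states_per_phone)]
    n = len(names)
    trans = [[0.0] * n for _ in range(n)]
    for i in range(n):
        trans[i][i] = self_loop                  # stay in the same state
        if i + 1 < n:
            trans[i][i + 1] = 1.0 - self_loop    # advance to the next state
    return names, trans


# Hypothetical pronunciation of "cat" as three phonemes.
names, trans = build_word_hmm(["k", "ae", "t"])
print(names)
```

With 3 phonemes and 2 states each, the word model has 6 states, and the only way through it is from left to right.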

Question 6

Q

A key problem for HMMs in ASR: given an observation sequence, how do we compute the state sequence most likely to have produced it?

What is the solution to this?

Answer

A

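Use the Viterbi algorithm.

The Viterbi algorithm defines the best score along a single path at time t that accounts for the first t observations and ends in state Si. Taking the highest final score and tracing back through the best predecessors gives the most likely state sequence.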
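
The recursion described above can be sketched as follows. This is a minimal pure-Python version working in log probabilities; the two states, the three observation symbols, and all probability values are made-up toy numbers, not figures from the lecture.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_log_score, best_state_path) for an observation sequence."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # delta[t][s]: best log score of any single path that accounts for
    # observations 0..t and ends in state s.
    delta = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    backptr = [{}]

    for t in range(1, len(observations)):
        delta.append({})
        backptr.append({})
        for s in states:
            # Best predecessor state for arriving in s at time t.
            prev = max(states, key=lambda r: delta[t - 1][r] + log(trans_p[r][s]))
            delta[t][s] = (delta[t - 1][prev] + log(trans_p[prev][s])
                           + log(emit_p[s][observations[t]]))
            backptr[t][s] = prev

    # Pick the best final state, then follow the back-pointers.
    last = max(states, key=lambda s: delta[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    path.reverse()
    return delta[-1][last], path


# Toy model (all values are illustrative assumptions).
states = ["s1", "s2"]
start_p = {"s1": 0.6, "s2": 0.4}
trans_p = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
emit_p = {"s1": {"a": 0.5, "b": 0.4, "c": 0.1},
          "s2": {"a": 0.1, "b": 0.3, "c": 0.6}}

score, path = viterbi(["a", "b", "c"], states, start_p, trans_p, emit_p)
print(path, math.exp(score))
```

For this toy model the best path is s1, s1, s2 with probability 0.01512. Working in log space avoids numerical underflow on long observation sequences, which matters for real speech where an utterance spans hundreds of frames.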

Question 7

Q

What are the challenges in an ASR system?

Answer

A

Atypical speakers (e.g. children, speakers with speech impediments)
Colloquialisms and disfluencies (um, er, coughs)
Noise (one mitigation: incorporate visual information)
Emotion and intent
Limits of current approaches -> use of deep learning (DL)

Question 8

Q

What is the McGurk effect?

Answer

A

We don’t perceive speech from sound alone; speech perception is audio-visual.

The shape that the mouth makes also has an influence on the type of sound being perceived.

Audio ‘ba’ + video ‘fa’ -> perceive ‘fa’

Audio ‘ba’ + video ‘ba’ -> perceive ‘ba’

(8 cards)