Lecture 8 - From HMMs to End-to-End Systems Flashcards

1
Q

In Large Vocabulary Continuous Speech Recognition (LVCSR), how many words are in the vocabulary?

A

80,000 - 100,000

2
Q

When designing an ASR system, questions to ask include: is the speech constrained or natural? Is the vocabulary small or large?

Explain the difference between a small and a large vocabulary.

A

Small vocabulary
- Isolated words, each with a dedicated acoustic model

Large vocabulary
- Modelling at the sub-word level
- An acoustic model for each phoneme
- Words recognised as concatenated sequences of phoneme models
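A minimal sketch of the large-vocabulary idea, assuming a hypothetical pronunciation lexicon (`LEXICON`) and hypothetical model names of the form `hmm_<phoneme>`: each word is expanded into its phoneme sequence, and a sentence becomes the concatenation of those phoneme models. The phoneme symbols and helper names are illustrative only.

```python
# Hypothetical pronunciation lexicon: maps each word to its phoneme sequence.
# The phoneme symbols are illustrative, not from a real dictionary.
LEXICON = {
    "cat": ["k", "ae", "t"],
    "sat": ["s", "ae", "t"],
    "mat": ["m", "ae", "t"],
}


def word_to_models(word, lexicon=LEXICON):
    """Return the names of the (hypothetical) phoneme models that make up `word`."""
    return [f"hmm_{ph}" for ph in lexicon[word]]


def sentence_to_models(words, lexicon=LEXICON):
    """Concatenate the phoneme models of every word, in order."""
    models = []
    for w in words:
        models.extend(word_to_models(w, lexicon))
    return models


print(sentence_to_models(["cat", "sat"]))
# ['hmm_k', 'hmm_ae', 'hmm_t', 'hmm_s', 'hmm_ae', 'hmm_t']
```

Because the phoneme models are shared across words (here "cat", "sat" and "mat" reuse `hmm_ae` and `hmm_t`), adding a new word only needs a new lexicon entry, not a new acoustic model.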

3
Q

What is the disadvantage of MFCCs (as used in HMM-based ASR)?

A

MFCCs are not robust to noise.

4
Q

When using HMMs for ASR, what role do they play?

A

HMMs serve as the acoustic model.
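For context, here is a sketch of the standard statistical formulation of ASR (textbook background, not taken from these notes). The recogniser chooses the word sequence Ŵ that is most probable given the observations O. The HMM acoustic model supplies P(O | W), a language model supplies P(W), and P(O) can be dropped because it does not depend on W:

```latex
\hat{W} = \arg\max_{W} P(W \mid O)
        = \arg\max_{W} \frac{P(O \mid W)\,P(W)}{P(O)}
        = \arg\max_{W} P(O \mid W)\,P(W)
```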

5
Q

Explain the difference between small and large vocabulary when using HMMs as the acoustic model

A

  1. Small vocabulary
    - Word-level HMM
  2. Large vocabulary
    - Phone-level HMM (40 monophones)
  • A 2-state HMM is used to model each phoneme.
  • Words are built from phonemes.
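A minimal sketch of how a word HMM can be built by chaining phone HMMs. It assumes left-to-right 2-state phone models that all share one hypothetical self-loop probability `p_stay`, whereas real systems estimate separate transition probabilities per state. It also adds a non-emitting exit state for illustration. The function name `concat_phone_hmms` is hypothetical.

```python
def concat_phone_hmms(phones, states_per_phone=2, p_stay=0.6):
    """Build a word-level transition matrix by chaining left-to-right phone HMMs.

    States are labelled (phone, k) for k in 0..states_per_phone-1. One extra
    non-emitting 'exit' state at the end absorbs the word. All states share the
    same (hypothetical) self-loop probability p_stay.
    """
    labels = [(ph, k) for ph in phones for k in range(states_per_phone)]
    n = len(labels) + 1  # +1 for the exit state
    A = [[0.0] * n for _ in range(n)]
    for i in range(len(labels)):
        A[i][i] = p_stay          # self-loop: stay in the current state
        A[i][i + 1] = 1 - p_stay  # advance to the next state (next phone or exit)
    A[-1][-1] = 1.0               # the exit state is absorbing
    return labels, A


# Word "cat" built from the phonemes k, ae, t (illustrative phoneme symbols).
labels, A = concat_phone_hmms(["k", "ae", "t"])
print(labels[1], "->", labels[2], A[1][2])
# ('k', 1) -> ('ae', 0) 0.4
```

The transition from the last state of one phone to the first state of the next is just another forward step, so the concatenated word model is itself one longer left-to-right HMM.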
6
Q

A problem for HMMs in ASR: given an observation sequence, how do we compute the most likely state sequence to have produced it?

What is the solution to this?

A

Use the Viterbi algorithm.

The Viterbi algorithm defines δ_t(i), the best score (highest probability) along a single path at time t that accounts for the first t observations and ends in state S_i.
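A minimal sketch of the Viterbi algorithm in Python. It assumes discrete observation symbols and dictionary-based probability tables, whereas real ASR systems score continuous feature frames (e.g. MFCCs) with densities such as GMMs or neural networks. It works in the log domain to avoid numerical underflow. `delta[t][s]` holds the best log-score along a single path that accounts for the first t+1 observations (0-based t) and ends in state s. The toy model and all names are hypothetical.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (best_state_path, best_log_prob) for a discrete-observation HMM."""

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # Initialisation (t = 0): start probability times emission of the first symbol.
    delta = [{s: log(start_p[s]) + log(emit_p[s][obs[0]]) for s in states}]
    backptr = [{}]

    # Recursion: extend the best single path into each state, one observation at a time.
    for t in range(1, len(obs)):
        delta.append({})
        backptr.append({})
        for s in states:
            best_prev = max(states, key=lambda r: delta[t - 1][r] + log(trans_p[r][s]))
            delta[t][s] = (delta[t - 1][best_prev] + log(trans_p[best_prev][s])
                           + log(emit_p[s][obs[t]]))
            backptr[t][s] = best_prev

    # Termination: pick the best final state, then follow back-pointers.
    last = max(states, key=lambda s: delta[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    path.reverse()
    return path, delta[-1][last]


# Hypothetical 2-state toy model with observation symbols "a", "b", "c".
states = ["S1", "S2"]
start_p = {"S1": 0.6, "S2": 0.4}
trans_p = {"S1": {"S1": 0.7, "S2": 0.3}, "S2": {"S1": 0.4, "S2": 0.6}}
emit_p = {"S1": {"a": 0.5, "b": 0.4, "c": 0.1}, "S2": {"a": 0.1, "b": 0.3, "c": 0.6}}

path, logp = viterbi(["a", "b", "c"], states, start_p, trans_p, emit_p)
print(path)  # ['S1', 'S1', 'S2']
```

For this toy model the best path's probability is 0.6·0.5 · 0.7·0.4 · 0.3·0.6 = 0.01512, which `math.exp(logp)` recovers. Keeping only the best predecessor per state (rather than summing over all of them, as the forward algorithm does) is what makes this a single-path score.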

7
Q

What are the challenges in an ASR system?

A

  1. Atypical speakers (e.g. children, people with speech impediments)
  2. Colloquialisms and disfluencies (um, er), coughs
  3. Noise (one remedy: incorporate visual information)
  4. Emotion and intent
  5. Limits of current approaches, motivating the use of deep learning (DL)
8
Q

What is the McGurk effect?

A

We don’t perceive speech from sound alone; perception is audio-visual.

The shape of the speaker’s mouth also influences which sound is perceived.

Audio ‘ba’ + video ‘fa’ → perceived as ‘fa’

Audio ‘ba’ + video ‘ba’ → perceived as ‘ba’
