Speech Recognition Flashcards

1
Q

what is acoustic phonetics?

A

study of the physical properties of speech

2
Q

what is sound?

A

a vibration that propagates as an acoustic wave through a medium such as air
(we perceive it through characteristics such as pitch and loudness)

3
Q

what is frequency?

A

the number of complete cycles a sound wave goes through per second (one cycle runs from one peak to the next), measured in hertz (Hz).
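
As a rough illustration of "cycles per second", the sketch below synthesizes a pure tone and estimates its frequency by counting one upward zero crossing per cycle. The sample rate, the 440 Hz tone, and the function names are assumptions made for the example, not part of the card.

```python
import math

def make_tone(freq_hz, sample_rate=8000, duration_s=1.0, amplitude=1.0):
    """Return the samples of a sine wave (a pure tone)."""
    n = int(sample_rate * duration_s)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]

def estimate_frequency(samples, sample_rate=8000):
    """Count upward zero crossings (one per cycle) and divide by the duration."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings / (len(samples) / sample_rate)

# Hypothetical example: a one-second 440 Hz tone.
estimated = estimate_frequency(make_tone(440))
print(estimated)  # close to 440 cycles per second
```

Counting crossings only works for clean, simple tones; real speech contains many frequencies at once, which is why spectrograms are used instead.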

4
Q

what is amplitude?

A

height of the wave
taller the wave = louder the sound
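
A small sketch of the amplitude idea, assuming two made-up tones that differ only in wave height. It measures the peak amplitude of each and expresses the difference in decibels, a common way to compare sound levels. All names and values are illustrative.

```python
import math

def peak_amplitude(samples):
    """Height of the wave: the largest distance from zero."""
    return max(abs(s) for s in samples)

def level_difference_db(amp_a, amp_b):
    """Level of amplitude a relative to amplitude b, in decibels."""
    return 20 * math.log10(amp_a / amp_b)

# Two hypothetical 5 Hz tones sampled 100 times per second; one is twice as tall.
quiet = [0.1 * math.sin(2 * math.pi * 5 * i / 100) for i in range(100)]
loud = [0.2 * math.sin(2 * math.pi * 5 * i / 100) for i in range(100)]

difference_db = level_difference_db(peak_amplitude(loud), peak_amplitude(quiet))
print(round(difference_db, 2))  # doubling the amplitude adds about 6 dB
```

Perceived loudness does not grow in direct proportion to amplitude, so the decibel figure is a physical measure rather than a statement about how much louder the tone sounds.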

5
Q

what is a sound spectrogram?

A

a visual representation of the spectrum of frequencies in a sound as it changes over time

6
Q

axes of a sound spectrogram?

A

frequency of sound on vertical axis, time on horizontal axis, intensity shown by darkness
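
The three dimensions above can be sketched in plain Python: cut the signal into short frames (time), measure the energy at each frequency within a frame (frequency), and keep the magnitudes (intensity, which a plot would draw as darkness). This is a simplified sketch using a naive discrete Fourier transform with no windowing; the frame size, hop, sample rate, and test signal are assumptions for the example.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitude of each frequency bin from 0 Hz up to the Nyquist bin (naive DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

def spectrogram(samples, frame_size=64, hop=32):
    """Rows are time frames, columns are frequency bins, values are intensities."""
    starts = range(0, len(samples) - frame_size + 1, hop)
    return [dft_magnitudes(samples[s:s + frame_size]) for s in starts]

# Hypothetical signal: a tone that jumps from 500 Hz to 1500 Hz halfway through.
rate = 8000
signal = [math.sin(2 * math.pi * (500 if i < 512 else 1500) * i / rate)
          for i in range(1024)]

spec = spectrogram(signal)
bin_hz = rate / 64  # width of one frequency bin for 64-sample frames
first_peak_hz = spec[0].index(max(spec[0])) * bin_hz
last_peak_hz = spec[-1].index(max(spec[-1])) * bin_hz
print(first_peak_hz, last_peak_hz)
```

The strongest bin moves from the early frames to the late frames, which is the kind of change over time that a spectrogram makes visible.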

7
Q

sound spectrogram formants?

A

formants: the dark bands (i.e., the frequencies with the most intensity)
▪ Steady-state formants (stay the same over time)
▪ Formant transitions (change over time)

8
Q

problems posed for speech recognition?

A
  • lack of invariance
  • problem of speaker variability
  • segmentation problem
9
Q

what is lack of invariance?

A

no one-to-one correspondence between the acoustic cues in the speech signal and the phonemes we perceive

10
Q

what is the problem of speaker variability?

A

People differ in how they produce speech sounds, both across people and across occasions

11
Q

what is the segmentation problem?

A

people typically do not leave breaks between words when speaking, so the listener has to work out where each word begins and ends.
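
To see why missing breaks are a problem, the sketch below lists every way an unbroken stream can be divided into known words. Letters stand in for speech sounds, and the tiny lexicon and function name are hypothetical.

```python
def segmentations(stream, lexicon):
    """Return every way to split an unbroken stream into words from the lexicon."""
    if not stream:
        return [[]]  # one way to segment nothing: no words
    results = []
    for end in range(1, len(stream) + 1):
        head = stream[:end]
        if head in lexicon:
            # Keep this word and segment whatever follows it.
            results.extend([head] + rest
                           for rest in segmentations(stream[end:], lexicon))
    return results

# Hypothetical toy lexicon.
lexicon = {"no", "now", "where", "here", "nowhere"}
parses = segmentations("nowhere", lexicon)
for parse in parses:
    print(" ".join(parse))
```

The same stream supports several word sequences, so the signal alone does not settle where the boundaries are.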

12
Q

what is categorical perception?

A

we have difficulty discriminating between sounds that fall within the same phonemic category
- ex: we classify speech sounds as one phoneme or another, with nothing in between
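
A minimal sketch of an idealized categorical listener, assuming a voice onset time (VOT) continuum between /b/ and /p/. The 25 ms boundary is a made-up illustrative value, and real listeners are not this all-or-none.

```python
BOUNDARY_MS = 25  # hypothetical category boundary on the VOT continuum

def identify(vot_ms):
    """Label a stimulus as /b/ (short VOT) or /p/ (long VOT)."""
    return "b" if vot_ms < BOUNDARY_MS else "p"

def discriminated(vot_a, vot_b):
    """Idealized listener: two sounds are told apart only if their labels differ."""
    return identify(vot_a) != identify(vot_b)

within = discriminated(5, 15)   # 10 ms apart, both labelled /b/
across = discriminated(20, 30)  # also 10 ms apart, but straddling the boundary
print(within, across)
```

Both pairs differ by the same physical amount, yet only the pair that crosses the category boundary is heard as different.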

13
Q

modularity (revisited) and categorical perception?

A
  • Some people have taken categorical perception as evidence for a speech perception module
  • but chinchillas also show categorical perception, which suggests it is not specific to a human speech module
14
Q

what are some speech segmentation strategies?

A
  • possible word constraint: tendency to segment speech so that each segment is a possible word
  • Bilingual speakers tend to use strategies that are consistent with their dominant language
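
The possible word constraint can be sketched as a check on what is left over once a word has been found in the stream. The rule used here (a leftover chunk counts as a possible word only if it contains a vowel) is a simplifying assumption, letters stand in for sounds, and the nonsense items are illustrative.

```python
VOWELS = set("aeiou")

def is_possible_word(chunk):
    """Toy assumption: a chunk could be a word only if it contains a vowel."""
    return any(ch in VOWELS for ch in chunk)

def spotting_is_easy(target, stream):
    """True if everything left over around the target could itself be a word."""
    start = stream.find(target)
    if start < 0:
        return False
    before = stream[:start]
    after = stream[start + len(target):]
    return all(is_possible_word(rest) for rest in (before, after) if rest)

print(spotting_is_easy("apple", "fapple"))     # leftover "f" is not a possible word
print(spotting_is_easy("apple", "vuffapple"))  # leftover "vuff" is a possible word
```

A segmentation that strands a lone consonant is disfavoured, which makes the embedded word harder to spot.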
15
Q

how does context affect speech recognition? when are people better at identifying words?

A
  • people are better at identifying words when presented in sentences than when presented in isolation
  • speech recognition involves bottom-up and top-down processing
16
Q

what are the two views of context and speech recognition?

A
  • Autonomous view: context has an effect only after lexical access
  • Interactionist view: allows context to affect earlier (lower) levels of processing
17
Q

context and speech recognition examples?

A
  • When shadowing, Ps have a tendency to correct speech errors (e.g., Marslen-Wilson & Welsh)
  • Phonemic restoration effect (e.g., Warren)
  • Semantic & syntactic factors in speech perception (e.g., Miller & Isard)
18
Q

when shadowing, under what conditions do Ps tend to correct speech errors?

A

Ps are more likely to correct an error when the...
- context is highly predictable (role of semantic & syntactic factors)
- presented phoneme differs from the target phoneme by fewer distinctive features (role of bottom-up processing, e.g., gar vs. car)
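
The "fewer distinctive features" idea can be made concrete by counting the features on which two phonemes differ. The three-feature description below (voicing, place, manner) is a simplified assumption; full feature systems use more features.

```python
# Simplified feature table: voicing, place of articulation, manner of articulation.
FEATURES = {
    "g": {"voiced": True, "place": "velar", "manner": "stop"},
    "k": {"voiced": False, "place": "velar", "manner": "stop"},
    "m": {"voiced": True, "place": "bilabial", "manner": "nasal"},
}

def feature_distance(a, b):
    """Number of features on which phonemes a and b differ."""
    return sum(1 for name in FEATURES[a] if FEATURES[a][name] != FEATURES[b][name])

print(feature_distance("g", "k"))  # "gar" vs. "car": differ in voicing only
print(feature_distance("m", "k"))  # "mar" vs. "car": differ on all three features
```

On this account, "gar" is a near miss for "car" and is more likely to be corrected than a mispronunciation that is further away.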

19
Q

what is the phonemic restoration effect?

A

the illusion that a phoneme deleted from a string of speech is actually there.
(ex: a cough replacing a phoneme)

20
Q

what are semantic & syntactic factors in speech perception?

A

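In Miller & Isard's study, Ps shadowed word strings in varying degrees of background noise
- 3 types of word strings:
* grammatical (well formed and meaningful)
* anomalous (well formed but meaningless)
* ungrammatical (neither well formed nor meaningful)
- shadowing was most accurate for grammatical strings and least accurate for ungrammatical strings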

21
Q

what is prosody?

A

tune and rhythm of speech

22
Q

Prosodic factors in speech recognition?

A
  • Stress
  • speech rate
  • characteristics of individual speakers
23
Q

what is the McGurk effect?

A
  • Hear /ba/
  • See /ga/
  • Both together: perceive /da/
    *** Shows the importance of both visual & auditory information for speech perception
24
Q

what are the two models of speech recognition?

A
  • cohort
  • TRACE
25
Q

what is cohort?

A

Spoken word recognition occurs in stages:
1. Access stage: set up initial cohort, strictly bottom-up process
2. Selection stage: words are eliminated from cohort until 1 item is left
3. Integration stage: the selected item is integrated into the representation of the utterance
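
The access and selection stages can be sketched as progressive filtering of a candidate set. The lexicon, the rough phoneme transcriptions, and the function name below are hypothetical, and the integration stage is not modelled.

```python
def recognize(heard, lexicon):
    """Return (word, n): the last word left in the cohort and the phonemes needed."""
    # Access stage: strictly bottom-up, every word sharing the first phoneme joins.
    cohort = {w: p for w, p in lexicon.items() if p[:1] == heard[:1]}
    # Selection stage: each further phoneme eliminates candidates that mismatch.
    for n in range(1, len(heard) + 1):
        cohort = {w: p for w, p in cohort.items() if p[:n] == heard[:n]}
        if len(cohort) == 1:
            return next(iter(cohort)), n
    return None, len(heard)

# Hypothetical toy lexicon with rough, made-up phoneme transcriptions.
LEXICON = {
    word: transcription.split()
    for word, transcription in {
        "crocodile": "k r o k @ d ai l",
        "crockery": "k r o k @ r i",
        "crocus": "k r ou k @ s",
        "dial": "d ai @ l",
    }.items()
}

word, n_phonemes = recognize(LEXICON["crocodile"], LEXICON)
print(word, n_phonemes, LEXICON["crocodile"][n_phonemes - 1])
```

With this toy lexicon the cohort shrinks to a single item before the word has finished, which is the sense in which the model allows recognition before the end of the word.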
26
Q

what is the selection stage based on?

A

- additional phonemic information
- context of spoken sentence (early version of model)
27
Q

what part of the word do we pay more attention to?

A

the beginning - supported by the selection stage of the cohort model, which narrows down candidates starting from the word's onset
28
Q

what is the TRACE model?

A

a model that takes all of the various sources of information found in speech and integrates them to identify single words.
29
Q

what are connectionist models composed of?

A

- they contain a system of interconnected nodes
- they have excitatory (facilitatory) and inhibitory connections between nodes
- processing is massively parallel
- has both top-down and bottom-up processing
30
Q

what is lexical access?

A

the retrieval of words from the mental lexicon, both in recognition and in production.
31
Q

what is the TRACE model composed of?

A

- connectionist model
- 3 levels in the network: word, phoneme, and feature
- incorporates top-down effects on the activation of features
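
A heavily simplified sketch of the interactive idea, not the TRACE model itself: it has only phoneme and word levels (the feature level is left out), letters stand in for phonemes, and the weights, cycle count, and function names are arbitrary assumptions. It shows bottom-up excitation, inhibition between competing words, and top-down feedback resolving an ambiguous first sound.

```python
def clamp(x):
    """Keep activations between 0 and 1."""
    return max(0.0, min(1.0, x))

def run_sketch(phoneme_input, lexicon, cycles=5, up=0.1, down=0.1, inhibit=0.1):
    """phoneme_input: one dict per position mapping phoneme -> activation."""
    phon = [dict(position) for position in phoneme_input]
    words = {w: 0.0 for w in lexicon}
    for _ in range(cycles):
        # Bottom-up: a word is supported by its phonemes at matching positions.
        support = {w: sum(phon[i].get(p, 0.0) for i, p in enumerate(w)) / len(w)
                   for w in words}
        total = sum(words.values())
        # Word nodes are excited by their support and inhibited by competitors.
        words = {w: clamp(act + up * support[w] - inhibit * (total - act))
                 for w, act in words.items()}
        # Top-down: active words feed activation back to their own phonemes.
        for w, act in words.items():
            for i, p in enumerate(w):
                phon[i][p] = clamp(phon[i].get(p, 0.0) + down * act)
    return phon, words

# Hypothetical input: first sound ambiguous between g and k, followed by "ift".
heard = [{"g": 0.5, "k": 0.5}, {"i": 1.0}, {"f": 1.0}, {"t": 1.0}]
phon, words = run_sketch(heard, ["gift", "kiss"])
print(phon[0]["g"] > phon[0]["k"], words["gift"] > words["kiss"])
```

Because "gift" matches more of the input than "kiss", it becomes the more active word and its feedback tips the ambiguous first sound toward g.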
32
Q

what is a lexicon?

A

the vocabulary of a person, language, or branch of knowledge.
33
Q

examples of lexicon?

A

"No-hitter," "go-ahead run," and "Baltimore chop" are part of the baseball lexicon.
34
Q

example of prosody?

A

"Yeah, that was a great movie," can mean that the speaker liked the movie or the exact opposite, depending on the speaker's intonation.
35
Q

what is coarticulation (lack of invariance)?

A

the process of articulating more than one phoneme at a time, so that the sound of a phoneme varies with its neighbors
36
Q

what is segmentation?

A

the process of dividing the speech signal into component words
37
Q

example of segmentation?

A

when we sound out the word dog, we separate it into its three sounds: /d/-/o/-/g/ (segmentation at the level of phonemes rather than words)
38
Q

cohort model example?

A

when discriminating between "crocodile" and "dial", the recognition point comes at the /d/ in "crocodile", which is much earlier than the /l/ sound in "dial".