Lecture 5 & 6 Flashcards

1
Q

What are 4 acoustic differences between the speech of males and females? What is the source of each difference?

A

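Females have a higher F0 (shorter, lighter vocal folds)
Females have a larger vowel space with higher formant frequencies (shorter vocal tract = higher resonances)
Females have increased breathiness (vocal folds remain open for a greater proportion of each glottal cycle, and often do not close completely)
Females have a relatively more dominant fundamental frequency (the first harmonic is stronger relative to the higher harmonics, associated with the breathier phonation)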

2
Q

Why should a bandwidth span at least two harmonics during spectral analysis?

A

So that individual harmonics are not resolved as separate spectral peaks and mistaken for formants.

3
Q

Why should a bandwidth be changed for males’ vs. females’ speech?

A

Because harmonics are spaced F0 apart, and F0 differs between males and females. A bandwidth of 300 Hz spans two harmonics for a male voice with an F0 of 150 Hz, but fewer than two harmonics for a female voice, whose F0 is typically above 150 Hz. A bandwidth that is too narrow will resolve individual harmonics, and these may be confused with formants.
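As a worked check of this rule of thumb, the sketch below computes how many harmonic spacings an analysis bandwidth covers (harmonics lie F0 apart, so a filter of width B spans B / F0 spacings). The function names and the female and child F0 values are illustrative assumptions, not figures from the lecture.

```python
def harmonics_spanned(bandwidth_hz: float, f0_hz: float) -> float:
    """Number of harmonic spacings covered by an analysis filter of this width."""
    return bandwidth_hz / f0_hz


def min_bandwidth(f0_hz: float, n_harmonics: int = 2) -> float:
    """Smallest bandwidth that spans `n_harmonics` harmonic spacings."""
    return n_harmonics * f0_hz


# Illustrative F0 values (assumed); 150 Hz matches the male example above.
for label, f0 in [("male", 150.0), ("female", 220.0), ("child", 300.0)]:
    spanned = harmonics_spanned(300.0, f0)
    print(f"{label}: F0={f0:.0f} Hz, a 300 Hz filter spans {spanned:.2f} "
          f"harmonics; at least {min_bandwidth(f0):.0f} Hz is needed")
```

With a 300 Hz filter the male voice just meets the two-harmonic criterion, while the higher-pitched voices need a wider filter.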

4
Q

Acoustically, what characterises infants’ vocalisations?

A

A very HIGH F0 and a highly VARIABLE F0

A very large bandwidth

5
Q

What kind of developmental changes affect the acoustic characteristics of children’s speech?

A
  1. Biological/anatomical
  2. Neurological maturation
  3. Development/refinement of speech motor control and language processing (e.g. phonological knowledge)
6
Q

In what ways does the speech of children mature?

A
  1. Fundamental frequency decreases
  2. Average segment duration decreases (as they become faster and more efficient in production)
  3. Variability in production decreases
7
Q

What are the main acoustic differences between speech spoken clearly (with care) and more informal, conversational speech?

A
  1. Slower speaking rate, greater pausing, longer segment durations
  2. Fewer reduced forms of consonants (e.g. final stops are more likely to be realised)
  3. Greater intensity for some types of consonants, making them acoustically more distinctive.
8
Q

What is hyperspeech?

A

When the listening environment is demanding (e.g. high levels of background noise), the speaker devotes more resources to articulation to make speech clearer and more distinctive.

9
Q

What is hypospeech?

A

When listening demands are low (e.g. a quiet environment), the speaker applies fewer resources to articulation. Hypospeech is characterised by reduced articulatory effort and smaller deviations of the articulators from their neutral (rest) positions.

10
Q

How can we apply the hyperspeech/hypospeech trade-off to children with intelligibility issues?

A

Children with reduced resource capacity may habitually bias their speech toward the hypospeech end of the continuum because of concurrent resource demands in other cognitive domains (e.g. comprehension, semantics, working memory).

11
Q

How must we consider this resource capacity/demand in therapy?

A

Consider the speaking context (e.g. reduce background noise) so that the child can devote maximum resources to speech output.
Encourage the child to consciously use the hyperspeech end of their articulatory continuum to a greater extent, e.g. through minimal pairs, which highlight the pragmatic consequences of misarticulated speech.

12
Q

What is the difference between intonation and prosody?

A
Intonation = modulation of F0
Prosody = a broader concept covering intonation, rhythm, rate and intensity variations.
13
Q

How is prosody measured acoustically?

A

F0 contour, amplitude contour, and duration
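A minimal sketch of extracting these three measures from a mono signal (a list of samples): an F0 contour from a simple autocorrelation pitch estimate, an RMS amplitude contour, and the total duration. The frame and hop sizes, the 75-500 Hz search range and the plain autocorrelation method are assumptions chosen for illustration; dedicated analysis software uses more robust pitch trackers.

```python
import math


def frames(signal, frame_len, hop):
    """Split a signal into overlapping analysis frames."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def rms(frame):
    """Root-mean-square amplitude of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def f0_autocorr(frame, fs, fmin=75.0, fmax=500.0):
    """Estimate F0 from the autocorrelation peak within [fmin, fmax].

    Returns 0.0 when no positive peak is found (treated as unvoiced).
    """
    min_lag = int(fs / fmax)
    max_lag = min(int(fs / fmin), len(frame) - 1)
    best_lag, best_r = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag if best_lag else 0.0


def prosody_contours(signal, fs, frame_ms=40.0, hop_ms=10.0):
    """Return (F0 contour in Hz, RMS amplitude contour, duration in seconds)."""
    n = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    fr = frames(signal, n, hop)
    f0s = [f0_autocorr(f, fs) for f in fr]
    amps = [rms(f) for f in fr]
    return f0s, amps, len(signal) / fs


# Usage: a steady 200 Hz test tone, 0.2 s long, sampled at 8 kHz.
fs = 8000
tone = [math.sin(2 * math.pi * 200 * i / fs) for i in range(int(0.2 * fs))]
f0s, amps, dur = prosody_contours(tone, fs)
```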

14
Q

How is stress realised acoustically?

A
  1. Longer duration
  2. Increase in fundamental frequency
  3. Increase in amplitude.
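A toy illustration of how the three cues above can be combined: each syllable's duration, F0 and intensity are standardised across the word and summed, and the syllable with the highest total is taken as stressed. The equal weighting of the cues, the class and function names, and the measurement values are all hypothetical; real listeners and analyses need not weight the cues equally.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Syllable:
    label: str
    duration_ms: float   # syllable duration
    f0_hz: float         # mean fundamental frequency
    intensity_db: float  # mean intensity


def _z(values):
    """Standardise values to z-scores; zero spread gives all zeros."""
    m, s = mean(values), pstdev(values)
    return [0.0 if s == 0 else (v - m) / s for v in values]


def most_stressed(syllables):
    """Return the syllable with the largest summed duration/F0/intensity z-score."""
    zd = _z([s.duration_ms for s in syllables])
    zf = _z([s.f0_hz for s in syllables])
    zi = _z([s.intensity_db for s in syllables])
    scores = [d + f + i for d, f, i in zip(zd, zf, zi)]
    return syllables[scores.index(max(scores))]


# Hypothetical measurements for the two syllables of the noun "OBject".
word = [Syllable("ob", 210, 180, 72), Syllable("ject", 150, 140, 66)]
print(most_stressed(word).label)  # ob
```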
15
Q

What are the two main steps in text-to-speech synthesis?

A
  1. Text to phoneme conversion
  2. Speech synthesis

16
Q

Describe text to phoneme conversion

A

Words and punctuation are converted to a string of phoneme symbols that are:

  1. Prosodically marked to specify stress, intonation and duration variations.
  2. Syntactically marked to disambiguate words like “object” (OBject as a noun vs obJECT as a verb).
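The two kinds of marking can be sketched with a toy converter: a hypothetical mini-lexicon lists pronunciations per part of speech (stress marked with ˈ), a crude left-context rule chooses between noun and verb readings of “object”, and punctuation becomes phrase-boundary symbols for pausing and intonation. The lexicon, the rule and the boundary symbols are illustrative assumptions; real systems use full dictionaries, proper syntactic analysis and letter-to-sound rules for unknown words.

```python
import re

# Hypothetical mini-lexicon: word -> {part of speech: pronunciation (IPA)}.
LEXICON = {
    "i": {"pronoun": "aɪ"},
    "the": {"determiner": "ðə"},
    "to": {"preposition": "tə"},
    "object": {"noun": "ˈɒbdʒɪkt", "verb": "əbˈdʒɛkt"},
}

# Punctuation -> prosodic boundary marker (| minor, || major).
BOUNDARIES = {",": "|", ".": "||", "?": "||", "!": "||"}


def guess_pos(prev_word, options):
    """Pick a part of speech with a crude left-context rule (illustrative only)."""
    if len(options) == 1:
        return next(iter(options))
    if prev_word in {"i", "you", "we", "they", "to"}:
        return "verb"
    return "noun"  # default, e.g. after a determiner such as "the"


def text_to_phonemes(text):
    """Convert words and punctuation to a string of phoneme symbols."""
    out, prev = [], None
    for tok in re.findall(r"[a-z']+|[.,?!]", text.lower()):
        if tok in BOUNDARIES:
            out.append(BOUNDARIES[tok])
            prev = None
            continue
        options = LEXICON.get(tok)
        if options is None:
            out.append(f"<{tok}?>")  # letter-to-sound rules would go here
        else:
            out.append(options[guess_pos(prev, options)])
        prev = tok
    return " ".join(out)


print(text_to_phonemes("I object to the object."))
```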
17
Q

Describe phoneme to speech conversion

A

Phoneme symbols are used to compute a digital sound wave, which is converted by digital-to-analogue (D-A) conversion into an electrical signal that drives a loudspeaker.
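The digital half of this step can be sketched with Python's standard `wave` module: computed samples (floats in [-1, 1]) are quantised to 16-bit integers and stored as a digital waveform, and the D-A conversion itself is performed by the sound hardware when the file is played. The 16 kHz sampling rate, 16-bit depth and the test tone are assumptions for illustration.

```python
import io
import math
import struct
import wave


def to_pcm16(samples):
    """Quantise floats in [-1, 1] to little-endian signed 16-bit integers."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    return b"".join(struct.pack("<h", int(round(s * 32767))) for s in clipped)


def write_wav(samples, fs, fileobj):
    """Write a mono 16-bit WAV; the playback device does the D-A conversion."""
    with wave.open(fileobj, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes = 16 bits per sample
        w.setframerate(fs)
        w.writeframes(to_pcm16(samples))


# Usage: a 0.1 s, 220 Hz tone written to an in-memory file.
fs = 16000
tone = [0.5 * math.sin(2 * math.pi * 220 * n / fs) for n in range(fs // 10)]
buf = io.BytesIO()
write_wav(tone, fs, buf)
```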

18
Q

What are the two main types of speech synthesisers?

A

Formant synthesis and concatenative synthesis (concatenated speech)

19
Q

Describe formant synthesis

A

Two types of sound sources are used: voice pulses (e.g. for vowels) and noise (e.g. for fricatives).
Each phoneme is specified by its acoustic features (fundamental frequency, duration, noise amplitude etc.).
Filters (resonators set to the formant frequencies) are then applied to shape the source into the specified sound.
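A minimal source-filter sketch of this idea: an impulse train stands in for voice pulses, white noise for frication, and each formant is a second-order digital resonator (the coefficient formulas used in Klatt-style formant synthesisers) applied in cascade. The formant frequencies and bandwidths below are rough, illustrative values, not parameters from a real synthesiser.

```python
import math
import random


def pulse_train(f0, dur, fs):
    """Voice source: one unit impulse per glottal period (for vowels)."""
    period = int(round(fs / f0))
    return [1.0 if n % period == 0 else 0.0 for n in range(int(dur * fs))]


def noise_source(dur, fs, seed=0):
    """Noise source: uniform white noise (for fricatives)."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(int(dur * fs))]


def resonator(x, freq, bw, fs):
    """Second-order digital resonator modelling one formant."""
    t = 1.0 / fs
    c = -math.exp(-2 * math.pi * bw * t)
    b = 2 * math.exp(-math.pi * bw * t) * math.cos(2 * math.pi * freq * t)
    a = 1.0 - b - c  # normalises the gain at 0 Hz to 1
    y1 = y2 = 0.0
    out = []
    for s in x:
        y = a * s + b * y1 + c * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def synthesise(source, formants, fs):
    """Filter the source through a cascade of (frequency, bandwidth) resonators."""
    out = source
    for freq, bw in formants:
        out = resonator(out, freq, bw, fs)
    return out


fs = 16000
# Roughly /ɑ/-like formants for an adult male voice (illustrative values, Hz).
vowel = synthesise(pulse_train(120, 0.1, fs),
                   [(730, 90), (1090, 110), (2440, 170)], fs)
# Crude high-frequency hiss, loosely /s/-like (illustrative values, Hz).
fricative = synthesise(noise_source(0.1, fs), [(4500, 1000)], fs)
```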

20
Q

How are rules specified in formant synthesis?

A

Rules are applied automatically and include contextual constraints (e.g. adjacent phonemes and sentence prosody). For example, the formant frequencies of a vowel change in a particular direction according to the following vowel.

21
Q

How often are the acoustic parameters specified in formant synthesis? Why?

A

Every 5-6 ms, to reflect the dynamic spectral changes that are characteristic of human speech.
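To illustrate both the 5-6 ms update interval and a contextual rule of the kind described in the previous card, the sketch below produces one F2 value per 5 ms parameter frame and lets F2 glide linearly toward the next segment's target near the end of each segment. The linear glide, the 40 ms transition length and the F2 targets are hypothetical choices, not rules taken from a real synthesiser.

```python
def parameter_frames(segments, frame_ms=5.0, transition_ms=40.0):
    """Return one F2 value (Hz) per frame, updated every `frame_ms` ms.

    segments: list of (duration_ms, f2_target_hz). During the last
    `transition_ms` of each segment except the final one, F2 moves
    linearly toward the next segment's target (hypothetical rule).
    """
    total = sum(d for d, _ in segments)
    frames = []
    for k in range(int(total // frame_ms)):
        t = k * frame_ms  # frame start time in ms
        start = 0.0
        for i, (dur, target) in enumerate(segments):
            if t < start + dur:
                glide_start = start + dur - transition_ms
                if i + 1 < len(segments) and t >= glide_start:
                    nxt = segments[i + 1][1]
                    frac = (t - glide_start) / transition_ms
                    frames.append(target + frac * (nxt - target))
                else:
                    frames.append(target)
                break
            start += dur
    return frames


# Usage: two 100 ms segments with illustrative F2 targets of 1200 and 2200 Hz.
f2 = parameter_frames([(100, 1200), (100, 2200)])
```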

22
Q

What are some advantages of formant synthesis?

A
  1. Relatively high efficiency (rules do not take up a large amount of memory)
  2. Fast output rates of up to 600 words per minute
  3. Highly flexible – can provide a reasonable pronunciation of unknown words
  4. Reasonable intelligibility
23
Q

What are some disadvantages of formant synthesis?

A
  1. Not natural sounding – sounds robotic (particular difficulty with prosody)
  2. Development time is long (because of the large number of rules to be created) and therefore the products are expensive
24
Q

Describe concatenative speech synthesis.

A

This approach stores, concatenates (links together) and smooths sections of PRE-RECORDED speech. Digital recordings of an actual speaker are broken up into small units, which can be reconstituted into larger novel utterances.
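A minimal sketch of the "concatenate and smooth" idea: stored units (lists of samples) are joined end to end, and each join is smoothed with a short linear crossfade. Real systems choose join points and smoothing methods more carefully (e.g. pitch-synchronous overlap-add); the linear crossfade and the overlap length here are simplifying assumptions.

```python
def crossfade_concat(units, overlap):
    """Join recorded units, blending `overlap` samples at each join.

    The end of the output so far fades out while the start of the next
    unit fades in, which hides abrupt jumps in the waveform.
    """
    out = list(units[0])
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit))
        for i in range(n):
            w = (i + 1) / (n + 1)  # fade-in weight for the incoming unit
            j = len(out) - n + i
            out[j] = (1 - w) * out[j] + w * unit[i]
        out.extend(unit[n:])
    return out


# Usage: two toy "units" joined with a 4-sample crossfade.
joined = crossfade_concat([[1.0] * 10, [0.0] * 10], overlap=4)
```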

25
Q

Is the phoneme a useful unit of speech to concatenate? Why?

A

No. Phonemes are strongly coarticulated, so the number of possible phoneme-to-phoneme transitions is large, and joining phonemes recorded out of context does not produce smooth transitions.

26
Q

What units of speech are more successful to concatenate?

A

Syllables, diphones and demisyllables (as well as whole words in some systems). A fairly manageable number of digitally recorded syllables can be stored and combined into virtually any English utterance. Syllables are also useful because they have transitions built into them.

27
Q

What are diphones?

A

Diphones are formed by placing cuts in recorded CV or VC sequences at the steady-state portion of the vowel, so that each diphone runs from the middle of one phone to the middle of the next. All words that share a particular phoneme sequence use the same diphone for that sequence, and diphones are recombined to form longer utterances. E.g. the words ‘cat’ and ‘cab’ use the same diphone for /kæ/, but the next diphone differs: ‘cat’ uses /æt/ and ‘cab’ uses /æb/. The /æ/ sound is therefore produced by joining the two diphones in the middle of the vowel.
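The decomposition in the ‘cat’/‘cab’ example can be sketched as a function that turns a phoneme sequence into the diphones covering it, padding with silence (written `_`, an assumed notation) so the first and last phones also have a join point.

```python
def to_diphones(phonemes, silence="_"):
    """Return the diphones (middle of one phone to middle of the next)
    that cover a phoneme sequence, with silence added at both ends."""
    padded = [silence] + list(phonemes) + [silence]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


print(to_diphones(["k", "æ", "t"]))  # ['_-k', 'k-æ', 'æ-t', 't-_']
print(to_diphones(["k", "æ", "b"]))  # ['_-k', 'k-æ', 'æ-b', 'b-_']
```

The two words share their first two diphones and differ only from the middle of the vowel onward.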

28
Q

How many diphones are needed to synthesise English speech?

A

About 2000
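As a rough check of this figure, assuming an inventory of about 44 English phonemes (the exact count varies with accent and analysis), the number of ordered phoneme pairs gives the approximate size of the diphone inventory:

$$N_{\text{diphones}} \approx N_{\text{phonemes}}^{2} \approx 44^{2} = 1936 \approx 2000$$

Not every pair occurs in English, while pairs involving silence add a few more, so this is an estimate rather than an exact count.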

29
Q

What are demisyllables?

A

They are half a syllable – either the onset up to the middle of the vowel, or from the middle of the vowel to the end of the coda.

30
Q

What are the advantages of concatenative speech synthesis?

A

It sounds more natural and is more intelligible than formant synthesis. It also has shorter development times and is relatively less expensive.

31
Q

What are the disadvantages of concatenative speech synthesis?

A

It requires a great deal of memory storage (although this is becoming less of a problem with the memory capacity of modern computers). Prosody can be just as unnatural as in formant synthesis, and there can be audible discontinuities in the middle of vowels and between syllables.

32
Q

Does perceiving synthesised speech require more or less cognitive effort on the part of the listener?

A

More

33
Q

What are some strategies to enhance the intelligibility of synthesised speech?

A

Slow down the speech rate and provide accompanying written text.

34
Q

In synthesised speech, what kinds of grammatical structures are perceived best?

A

Sentences are understood better than single words (indicating top-down processing). Simple sentences with high probability words are better understood than grammatically complex sentences.

35
Q

Give four examples of the clinical application of synthesised speech

A
  1. Reading instruction – useful for children and adults with developmental or acquired reading disorders
  2. Communication aids – people with speech production disabilities (e.g. dysarthria) can type a message, which will be spoken aloud
  3. Multilingual communication systems – which require speech recognition, linguistic translation, and speech synthesis – still in early stages
  4. Research – the researcher can control acoustic features of speech, and therefore investigate the acoustic cues used during speech perception
36
Q

What is a speech recognition system?

A

A system that converts spoken utterances into phonemic symbols and then into text (e.g. in a word-processing document).

37
Q

What is a speech understanding system?

A

A system that goes further than speech recognition by taking the recognised words and analysing their meaning (semantics).

38
Q

What is a speech language system?

A

Speech language systems incorporate speech recognition, understanding and speech synthesis to provide a communicative interaction with a person.
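The architecture can be sketched as a chain of stages for one conversational turn. Every component here is a hypothetical stand-in passed in as a function, and the separate `respond` stage (choosing what to say back) is an assumption added to make the interaction complete; the card itself names only recognition, understanding and synthesis.

```python
from typing import Callable


def speech_language_system(
    audio: bytes,
    recognise: Callable[[bytes], str],
    understand: Callable[[str], str],
    respond: Callable[[str], str],
    synthesise: Callable[[str], bytes],
) -> bytes:
    """Run one turn: recognition -> understanding -> response -> synthesis."""
    words = recognise(audio)       # speech recognition: audio to text
    meaning = understand(words)    # speech understanding: text to meaning
    reply_text = respond(meaning)  # dialogue logic chooses what to say (assumed stage)
    return synthesise(reply_text)  # speech synthesis: text to audio


# Hypothetical stand-in components so the pipeline runs end to end.
reply = speech_language_system(
    b"placeholder audio",
    recognise=lambda audio: "what time is it",
    understand=lambda text: "ASK_TIME" if "time" in text else "UNKNOWN",
    respond=lambda intent: "It is ten o'clock." if intent == "ASK_TIME" else "Sorry?",
    synthesise=lambda text: text.encode("utf-8"),
)
print(reply)  # b"It is ten o'clock."
```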