Introduction to speech perception Flashcards

1
Q

What are some challenges of speech perception?

A

Recording of sentence “he guessed the answer to the question in the exam”

  1. Unlike written language, there are no clear gaps between words
    E.g. “answer” is one word, but it may contain two separate acoustic events. Conversely, “in” and “the” are two different words, but there is no gap between them in the signal.
  2. “the” sounds different in different positions (co-articulation: the acoustic realisation of a speech sound depends on what you have just said and what you are about to say)
    This adds variability to the acoustic signal and can make speech hard for a computer to recognise
  3. Accent, gender and speaking rate
  4. Time constraints
    - We hear up to 200 words per minute
    - Sound is fleeting (sound is always changing, a temporal signal)
    - “Now-or-never bottleneck”: speech arrives quickly and the sound does not stay static, so each word must be processed before the next one arrives
2
Q

Why study speech perception?

A
  • Speech is the primary means by which we communicate
  • More broadly, it underpins reading: learning to read requires you to learn the relationship between letters and speech sounds (phonics)
  • Listeners with some form of hearing loss may receive a cochlear implant, which directly stimulates the auditory nerve. This restores hearing to some extent, but the brain has to adapt to novel sensory information.
  • Individuals with developmental language disorder: studying speech perception helps us understand what is going on and develop strategies to help them
3
Q

How do we produce speech?
- what does speech require
- ____ pushes air to _____
- what does this result in
- what are sounds shaped by
- including?
- what are these structures important for?

A
  1. Speech requires a basic energy source. This initial energy source is provided by the lungs
  2. The lungs push air up the trachea (windpipe)
  3. which vibrates the vocal cords in the larynx (voicebox)
  4. Sounds from the vocal cords are then shaped by the supralaryngeal (all the structures above the larynx) vocal tract, including:
    - Pharynx
    - Oral cavity (and lips, tongue, teeth)
    - Nasal cavity
  5. These structures are important for shaping the sounds - you need these for intelligible speech
4
Q

What method can be used to see speech production?

A

MRI

5
Q

Describing speech: Consonants

How are consonants produced?

A

With a constriction in the vocal tract

6
Q

Describing speech: Consonants
What are the 3 main features it’s classified by?

A
  1. Stop: the constriction is complete (air flow stops completely). Stops can be voiced (vocal cords vibrating, e.g. b, d, g) or voiceless (e.g. p, t, k).
  2. Fricative: the constriction is not complete, so air is forced through a narrow gap
  3. Nasal: air flow is redirected to the nasal cavity
7
Q

Describing speech: Consonants

A

Stop:
+voice: b, d, g
-voice: p, t, k

d: the constriction happens where the tongue touches the ridge just behind the upper teeth
g: the back of the tongue touches the back of the roof of the mouth

Fricative:
+voice: v, z
-voice: f, s

Nasal:
m, n, ŋ (“ng” as in “sing”)

8
Q

What are sound waves?

A

Periodic displacement of air molecules, creating increases and decreases in air pressure

9
Q

Speech as sound waves:

  • what is happening
  • what is formed
A
  • A vibrating source (like a plate moving back and forth) moves the surrounding air molecules; in speech, the source is the vibrating vocal cords. These pressure changes are picked up by the ear, which converts them into the sensation of sound.
  • Plotting the changes in sound pressure over time shows that at certain moments the air molecules bunch together, producing an increase in pressure.
  • The result is a sound waveform, which is perceived by the brain.
10
Q

In relation to a sound waveform, what is amplitude and period?

A

Amplitude:
- related to loudness
- larger the peaks the louder

Period:
- inversely related to frequency; important cue to pitch
- peaks closer together = higher frequency and pitch
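
The inverse relationship between period and frequency can be written as frequency = 1 / period. A minimal sketch, assuming the period is measured in seconds (the function name and example values are illustrative, not from the lecture):

```python
def frequency_hz(period_s: float) -> float:
    """Return the frequency in Hz for a period given in seconds."""
    if period_s <= 0:
        raise ValueError("period must be positive")
    return 1.0 / period_s


# Peaks 10 ms apart versus peaks 5 ms apart (illustrative values).
low = frequency_hz(0.010)
high = frequency_hz(0.005)

# Halving the period doubles the frequency, so the pitch is heard as higher.
print(low, high)
```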

11
Q

Speech as sound waves:
- what is speech associated with?
- how do you get speech?
- what is speech a mix of?

A

Speech is more complex than a simple tone (a beep): it contains much more variation.

There is a relationship between the waveform of a simple tone and that of a more complicated sound. You get speech by mixing simple sounds together and shaping the amplitude of the mix over time.

Speech is a mix of many simpler sounds that together create a more complex signal.
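
The idea of building a complex sound from simple tones can be sketched in a few lines. This is only an illustration, assuming an 8000 Hz sample rate, three arbitrary tone frequencies and a triangular fade-in/fade-out as the amplitude shape; real speech is far richer than this.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)


def tone(freq_hz, amplitude, n_samples, sample_rate=SAMPLE_RATE):
    """A simple tone: one sine wave at a single frequency."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]


def mix(*waves):
    """Add several waves together, sample by sample."""
    return [sum(samples) for samples in zip(*waves)]


def apply_envelope(wave):
    """Shape the amplitude over time with a triangular fade in and out."""
    n = len(wave)
    return [s * (1 - abs(2 * i / (n - 1) - 1)) for i, s in enumerate(wave)]


n = 800  # 0.1 s of sound at 8000 Hz
complex_wave = mix(tone(100, 1.0, n), tone(200, 0.5, n), tone(300, 0.25, n))
shaped = apply_envelope(complex_wave)
```

The mixed waveform no longer looks like a single smooth sine wave, even though it is made only of three simple tones.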

12
Q

Spectrogram: Analysing the frequencies of speech

1- what is a spectrogram?
2- difference between dark grey and light grey?
3- why is it useful?
4- what is being split?

A
  1. A spectrogram is a graph showing how sound amplitude varies as a function of time (x-axis) and frequency (y-axis)
  2. Dark grey = large amplitude, light grey = small amplitude
  3. Useful because the ear splits sound by frequency, so a spectrogram better captures the information available to the brain.
  4. The sound is split into its different frequency components. The ears and brain likewise split the incoming information into frequency channels.
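
A spectrogram can be sketched as "cut the sound into short frames, then measure the amplitude at each frequency within each frame". The example below is a rough illustration using a naive discrete Fourier transform on non-overlapping frames; the sample rate, frame length and test tones are assumptions chosen to keep the arithmetic simple, and real tools use overlapping, windowed frames.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Amplitude at each frequency bin for one frame (naive DFT)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))) / n
            for k in range(n // 2)]


def spectrogram(signal, frame_len):
    """One row of frequency amplitudes per non-overlapping time frame."""
    return [dft_magnitudes(signal[start:start + frame_len])
            for start in range(0, len(signal) - frame_len + 1, frame_len)]


SAMPLE_RATE = 1000  # Hz (assumed)
FRAME_LEN = 100     # samples per frame, so each bin is 10 Hz wide

# A 50 Hz tone followed by a 200 Hz tone (illustrative signal).
signal = ([math.sin(2 * math.pi * 50 * n / SAMPLE_RATE) for n in range(200)]
          + [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(200)])

frames = spectrogram(signal, FRAME_LEN)
bin_hz = SAMPLE_RATE / FRAME_LEN
# Frequency with the largest amplitude in each time frame.
peaks = [row.index(max(row)) * bin_hz for row in frames]
```

Each row of `frames` corresponds to one column of a spectrogram plot: the largest values would be drawn darkest.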
13
Q

Adding source and filter to how we produce speech

A

The lungs push air up the trachea (windpipe)

Which vibrates the vocal cords in the larynx (voicebox) → ‘Source’

Sounds from the vocal cords are then shaped by the supralaryngeal vocal tract → ‘Filter’

  • Pharynx
  • Oral cavity (and lips, tongue, teeth)
  • Nasal cavity
14
Q

Source-filter theory

Source only

A

Source (vocal cords) important for voice pitch and intonation

On its own, the source conveys some information, such as voice pitch

15
Q

Source-filter theory

Source + filter

A

Comparing the source alone with source + filter shows how important the filter is for making speech intelligible

Filter (supralaryngeal vocal tract) important for producing different speech sounds (phonemes)

Filtering appears as bands of energy at certain frequencies called ‘formants’ (in Latin, “formare” = “to shape”)

The lowest three formant frequencies are the most important for speech intelligibility (labelled F1, F2 and F3)

16
Q

Source-filter theory: Vowels

  1. What happens when changing from front to back vowels?
  2. What happens when changing from high to low vowels?
A
  1. Changing from front to back vowels, e.g. “heed” vs “hod”: F2 frequency decreases
  2. Changing from high to low vowels, e.g. “heed” vs “had”: F1 frequency increases
17
Q

Source-filter theory: Vowels
Key Point

A

So your brain can know which vowel it is hearing by detecting these auditory “cues”
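
As a rough illustration of using formant cues to identify a vowel, the sketch below picks the stored vowel whose (F1, F2) pair is closest to the measured one. The formant values are approximate, commonly quoted adult-male averages included only for illustration; they are not taken from these notes, and nearest-neighbour matching is an assumption, not a claim about how the brain does it.

```python
import math

# Approximate (F1, F2) values in Hz; illustrative only.
VOWEL_FORMANTS = {
    "heed": (270, 2290),   # high front vowel
    "had": (660, 1720),    # low front vowel
    "hod": (730, 1090),    # low back vowel
    "who'd": (300, 870),   # high back vowel
}


def identify_vowel(f1, f2):
    """Return the word whose stored formants are nearest to (f1, f2)."""
    return min(VOWEL_FORMANTS,
               key=lambda word: math.dist((f1, f2), VOWEL_FORMANTS[word]))


print(identify_vowel(300, 2200))
print(identify_vowel(700, 1150))
```

In this table, moving from a front vowel to a back vowel lowers F2, and moving from a high vowel to a low vowel raises F1.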

18
Q

Source-filter theory: Consonants
What are important cues for identifying consonants?

A

Second and third formants (F2 and F3) are important cues for identifying consonants

At the start of each consonant, the formants take on a different shape for p, t and k.

19
Q

How do we perceive phonemes?
(Categorical perception and how to demonstrate it)

3 things

A
  1. Set up a continuum of sounds between two phonemes
  2. Run an identification experiment
  3. Run a discrimination experiment
20
Q

How do we perceive phonemes:
1. Set up a continuum of sounds between two phonemes

A

There is a different sound at each end of the continuum

The midpoint of the continuum is ambiguous: an intermediate sound between ‘ba’ and ‘da’.

You hear ‘ba’ at the beginning, then something intermediate between the two, and by the end a clear ‘da’
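
One way to picture the continuum is as equal steps of a single acoustic cue between the two endpoint sounds. The sketch below assumes the cue is the F2 onset frequency and uses made-up endpoint values; both are illustrative assumptions, not details from the lecture.

```python
def make_continuum(start, end, n_steps):
    """Equally spaced cue values from one endpoint sound to the other."""
    if n_steps < 2:
        raise ValueError("a continuum needs at least two steps")
    step = (end - start) / (n_steps - 1)
    return [start + i * step for i in range(n_steps)]


# Hypothetical F2 onset values in Hz for a clear 'ba' and a clear 'da'.
continuum = make_continuum(1000.0, 1800.0, 9)
ambiguous = continuum[len(continuum) // 2]  # the middle, ambiguous step
```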

21
Q

How do we perceive phonemes:
2. Run an identification experiment

A

Identify if the sound you’re hearing is a ba or da sound.

Plot the percentage of responses.

When the sound is a clear, unambiguous ba, people respond ‘ba’ most of the time.

At the other end of the continuum, when the sound is a clear da, people hardly ever respond ‘ba’ (they respond ‘da’ instead).

The point on this graph where listeners are equally likely to respond ‘ba’ and ‘da’ is referred to as the phoneme boundary.

One of the main signatures of categorical perception is an abrupt transition in this graph around the phoneme boundary: perception suddenly changes.
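
The shape of the identification curve and the location of the phoneme boundary can be sketched with a toy model. The logistic function, its slope and the nine-step continuum are assumptions used to generate illustrative data; in a real experiment the percentages come from listeners' responses.

```python
import math


def percent_ba(step, boundary=5.0, slope=2.0):
    """Toy identification curve: % of 'ba' responses at a continuum step."""
    return 100.0 / (1.0 + math.exp(slope * (step - boundary)))


def phoneme_boundary(steps, responses):
    """Interpolate the step at which 'ba' responses cross 50%."""
    points = list(zip(steps, responses))
    for (s0, r0), (s1, r1) in zip(points, points[1:]):
        if r0 >= 50.0 >= r1 and r0 != r1:
            return s0 + (r0 - 50.0) * (s1 - s0) / (r0 - r1)
    return None  # responses never cross 50%


steps = list(range(1, 10))
responses = [percent_ba(s) for s in steps]
boundary = phoneme_boundary(steps, responses)
```

A steeper slope gives a more abrupt transition around the boundary, which is the pattern expected under categorical perception.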

22
Q

How do we perceive phonemes?
3. Run a discrimination experiment

A

Play pairs of adjacent sounds on the continuum and ask listeners to say whether they are the same or different

Plot the % of “different” responses

Discrimination peaks near the phoneme boundary

23
Q

What is categorical perception?

A

The tendency to perceive gradual sensory changes in a discrete fashion

24
Q

What are three hallmarks of categorical perception?

A
  1. Abrupt change in identification at phoneme boundary
  2. Discrimination peak at phoneme boundary
  3. Discrimination can be predicted from identification (two sounds only sound “different” if they are classified as different phonemes)
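
The third hallmark can be illustrated with a simplified labelling model: assume a pair is judged “different” only when its two sounds receive different labels, so the predicted rate is p1(1 - p2) + (1 - p1)p2. The logistic identification curve and its parameters below are assumptions for illustration only.

```python
import math


def p_ba(step, boundary=5.5, slope=2.0):
    """Toy probability of labelling a continuum step as 'ba'."""
    return 1.0 / (1.0 + math.exp(slope * (step - boundary)))


def predicted_different(step_a, step_b):
    """Probability that the two sounds receive different labels."""
    pa, pb = p_ba(step_a), p_ba(step_b)
    return pa * (1 - pb) + (1 - pa) * pb


# Adjacent pairs along a nine-step continuum.
pairs = [(s, s + 1) for s in range(1, 9)]
predictions = [predicted_different(a, b) for a, b in pairs]
peak_pair = pairs[predictions.index(max(predictions))]
```

With the boundary placed between steps 5 and 6, the predicted discrimination peak falls on the pair that straddles it, while pairs within a category are rarely labelled differently.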
25
Q

Yanny or Laurel? Categorical perception in action

A

Yanny- 47%
Laurel- 53%

The sound is ambiguous, but your brain can’t help latching onto one specific interpretation rather than an intermediate mix of the two.

When understanding speech, your brain will try to latch onto a specific interpretation

26
Q

Context influences speech perception:
Green needle/ brainstorm

A

It is exactly the same sound each time, but different expectations change how you perceive it

An example of how speech perception depends on prior knowledge or context

27
Q

Context influences speech perception:
Visual context “McGurk effect”

A

You hear one thing and see another, and what you perceive is changed by what you see. This is a prior-context effect: lip movements tend to precede the speech sound that you hear, so they influence what you perceive.

28
Q

Context influences speech perception:
Lexical context “Ganong effect”

A
  • The listener does an identification task
  • The graph plots the % of “g” responses
  • The sound is presented before “iss”
  • The sound is then presented before “ift”

The graph shows:

At the midpoint, when the ambiguous sound is placed in an “ift” context, you are biased towards “g”, because g combined with “ift” makes a real word (gift).

Placed in front of “iss”, the same sound is biased towards “k”, because “kiss” is a real word and “giss” isn’t.

It is exactly the same sound, ambiguous between g and k, yet perception is biased towards the interpretation that makes a real word.
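
The lexical bias can be pictured as a shift of the phoneme boundary depending on which word the context would complete. This is a toy sketch: the logistic curve, the nine-step g-to-k continuum and the size of the shift are all assumptions, not measured values.

```python
import math


def percent_g(step, lexical_shift=0.0, boundary=5.0, slope=1.5):
    """Toy % of 'g' responses; step 1 = clear g, step 9 = clear k.

    A positive lexical_shift moves the boundary towards the k end,
    which produces more 'g' responses for ambiguous sounds.
    """
    shifted = boundary + lexical_shift
    return 100.0 / (1.0 + math.exp(slope * (step - shifted)))


midpoint = 5  # the ambiguous step
in_ift = percent_g(midpoint, lexical_shift=+1.0)  # 'gift' is a real word
in_iss = percent_g(midpoint, lexical_shift=-1.0)  # 'kiss' is a real word

# At a clear endpoint the context makes much less difference.
endpoint_gap = abs(percent_g(1, +1.0) - percent_g(1, -1.0))
```

In this sketch the same ambiguous step is mostly labelled “g” before “ift” and mostly “k” before “iss”, while clear endpoint sounds are barely affected.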