Introduction to speech perception Flashcards by Tilly Phillips

What are some challenges of speech perception?

Recording of sentence “he guessed the answer to the question in the exam”

Unlike written language, there are no clear gaps between words
E.g. “answer” is one word, but it may contain two acoustic events. Conversely, “in” and “the” are two different words, but there is no gap between them in the signal.
“the” sounds different in different positions (co-articulation: the acoustic realisation of a speech sound depends on what you have just said and what you are about to say)
This adds variability to the acoustic signal and can make speech hard for a computer to understand
Accent, gender and speaking rate add further variability
Time constraints
- We hear up to 200 words per minute
- Sound is fleeting (a temporal signal that is always changing)
- “Now-or-never bottleneck”: speech comes in quickly and does not stay static, so you need to process the word you have just heard before the next word comes in

Why study speech perception?

Speech is the primary means by which we communicate
More broadly, reading: learning to read requires you to learn the relationship between letters and speech sounds (phonics)
Listeners with some form of hearing loss: a cochlear implant directly stimulates the auditory nerve and restores hearing to some extent, so adapting to an implant requires the brain to adapt to novel sensory information.
Individuals with developmental language disorder: helpful for understanding what is going on and developing strategies to help them

How do we produce speech?
- what does speech require
- ____ pushes air to _____
- what does this result in
- what are sounds shaped by
- including?
- what are these structures important for?

Speech requires a basic energy source. This initial energy source is provided by the lungs
The lungs push air up the trachea (windpipe)
which vibrates the vocal cords in the larynx (voicebox)
Sounds from the vocal cords are then shaped by the supralaryngeal (all the structures above the larynx) vocal tract, including:
- Pharynx
- Oral cavity (and lips, tongue, teeth)
- Nasal cavity
These structures are important for shaping the sounds - you need these for intelligible speech

What method can be used to see speech production?

MRI

Describing speech: Consonants

How are consonants produced?

With a constriction in the vocal tract

Describing speech: Consonants
What are the 3 main types they are classified into?

Stop: the constriction is complete (air flow stops completely). Stops can be voiced (vocal cords vibrating, e.g. b, d, g) or voiceless (e.g. p, t, k).
Fricative: the constriction is incomplete, so air is forced through a narrow gap
Nasal: air flow is redirected through the nasal cavity

Describing speech: Consonants

Stop:
+voice: b, d, g
-voice: p, t, k

d: the constriction happens where the tongue touches the ridge just behind the upper teeth
g: the back of the tongue touches the back of the roof of the mouth

Fricative:
+voice: v, z
-voice: f, s

Nasal:
m, n, ŋ (the “ng” sound in “sing”)

What are sound waves?

Periodic displacement of air molecules, creating increases and decreases in air pressure

Speech as sound waves:

what is happening
what is formed

A vibrating source (e.g. a plate moving back and forth, or the vibrating vocal cords) moves the surrounding air molecules. These movements are picked up by the ear, which converts them into the sensation of sound.
Plotting the changes in sound pressure over time: at certain moments the air molecules bunch together and there is an increase in pressure.
A sound waveform is formed and perceived by the brain.

In relation to a sound waveform, what is amplitude and period?

Amplitude:
- related to loudness
- larger the peaks the louder

Period:
- inversely related to frequency; important cue to pitch
- peaks closer together = higher frequency and pitch
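
The inverse relationship between period and frequency can be written as f = 1/T. Below is a minimal Python sketch of that relationship; the two period values are hypothetical and chosen only for illustration.

```python
def frequency_from_period(period_s: float) -> float:
    """Return frequency in hertz for a period given in seconds (f = 1 / T)."""
    if period_s <= 0:
        raise ValueError("period must be positive")
    return 1.0 / period_s

# Hypothetical example values: peaks 5 ms apart vs peaks 10 ms apart.
close_peaks_hz = frequency_from_period(0.005)
wide_peaks_hz = frequency_from_period(0.010)

# Peaks that are closer together give the higher frequency (and higher pitch).
print(close_peaks_hz > wide_peaks_hz)
```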

Speech as sound waves:
- what is speech associated with?
- how do you get speech?
- what is speech a mix of?

Speech is more complicated than a simple beep: it has more variation and a more complex waveform.

There is a relationship between the waveform of a simple tone and that of a more complicated sound. You get speech essentially by mixing simple sounds together and shaping the amplitude of the mix over time.

Speech is a mix of lots of simpler sounds, which together create the more complex speech signal.
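
The idea of building a complex sound from simpler ones can be sketched by adding a few pure tones together. This is only a toy illustration, not a speech synthesiser: the sample rate, component frequencies and amplitudes are all assumed values.

```python
import math

def tone(freq_hz, amplitude, t):
    """Value of a pure tone (a sine wave) at time t in seconds."""
    return amplitude * math.sin(2 * math.pi * freq_hz * t)

def complex_wave(components, t):
    """Sum several pure tones, given as (frequency, amplitude) pairs."""
    return sum(tone(freq_hz, amplitude, t) for freq_hz, amplitude in components)

SAMPLE_RATE = 8000  # samples per second (assumed)

# Hypothetical mix: a 100 Hz tone plus two weaker tones at 200 Hz and 300 Hz.
components = [(100, 1.0), (200, 0.5), (300, 0.25)]

# One 10 ms stretch of the mixed waveform (80 samples at 8000 Hz).
samples = [complex_wave(components, n / SAMPLE_RATE) for n in range(SAMPLE_RATE // 100)]
```

Plotting `samples` would show a shape that is more complicated than any single sine wave, yet still repeats every 10 ms because every component fits a whole number of cycles into that time.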

Spectrogram: Analysing the frequencies of speech

1- what is a spectrogram?
2- difference between dark grey and light grey?
3- why is it useful?
4- what is being split?

A spectrogram is a graph showing how sound amplitude varies as a function of time (x-axis) and frequency (y-axis)
Dark grey = large amplitude, light grey = small amplitude
It is useful because the ear also splits sound by frequency, so a spectrogram better captures the information available to the brain.
The sound is split into its different frequency components, just as the ears and brain split the incoming information into frequency channels.
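
A spectrogram can be sketched by cutting the signal into short time frames and measuring the amplitude at each frequency within each frame. The Python below is a deliberately simplified version (non-overlapping frames, no windowing, a plain discrete Fourier transform); the sample rate, frame length and test signal are assumptions made for the example.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (assumed)
FRAME_LEN = 80      # 10 ms frames, so each frequency bin is 100 Hz wide

def dft_magnitudes(frame):
    """Amplitude at each frequency bin of one frame (plain DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

def spectrogram(signal, frame_len):
    """One list of bin amplitudes per time frame: rows are time, columns are frequency."""
    return [
        dft_magnitudes(signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, frame_len)
    ]

def peak_frequency_hz(magnitudes):
    """Frequency of the bin with the largest amplitude (the 'darkest' point)."""
    peak_bin = max(range(len(magnitudes)), key=lambda k: magnitudes[k])
    return peak_bin * SAMPLE_RATE / FRAME_LEN

# Hypothetical test signal: a 500 Hz tone for 50 ms, then a 1000 Hz tone for 50 ms.
signal = [
    math.sin(2 * math.pi * (500 if n < 400 else 1000) * n / SAMPLE_RATE)
    for n in range(800)
]
frames = spectrogram(signal, FRAME_LEN)
```

Here `peak_frequency_hz(frames[0])` picks out the 500 Hz tone and `peak_frequency_hz(frames[-1])` the 1000 Hz tone, which is the change over time that a spectrogram makes visible.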

Adding source and filter to how we produce speech

The lungs push air up the trachea (windpipe)

Which vibrates the vocal cords in the larynx (voicebox) → ‘Source’

Sounds from the vocal cords are then shaped by the supralaryngeal vocal tract → ‘Filter’

Pharynx
Oral cavity (and lips, tongue, teeth)
Nasal cavity

Source-filter theory

Source only

Source (vocal cords): important for voice pitch and intonation

On its own, the source provides some information, such as voice pitch

Source-filter theory

Source + filter

This shows how important the filter is for making intelligible speech

Filter (supralaryngeal vocal tract) important for producing different speech sounds (phonemes)

Filtering appears as bands of energy at certain frequencies called ‘formants’ (in Latin, “formare” = “to shape”)

The lowest three formant frequencies are the most important for speech intelligibility (labelled F1, F2 and F3)
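
The source-filter idea can be sketched as a multiplication: the source supplies harmonics at multiples of the voice pitch, and the filter boosts whichever harmonics fall near a formant. This is a toy model only. The pitch, formant frequencies, bandwidth and the shape of the resonance curve are all assumptions made for illustration, not measurements of a real vocal tract.

```python
F0_HZ = 100                      # hypothetical voice pitch (set by the source)
FORMANTS_HZ = [300, 2300, 3000]  # hypothetical F1, F2, F3 (set by the filter)
BANDWIDTH_HZ = 50                # hypothetical width of each resonance

def source_amplitude(harmonic_number):
    """Source harmonics get weaker as frequency rises (assumed 1/k roll-off)."""
    return 1.0 / harmonic_number

def filter_gain(freq_hz):
    """Simple resonance curve with a peak at each formant frequency."""
    return sum(
        1.0 / (1.0 + ((freq_hz - formant) / BANDWIDTH_HZ) ** 2)
        for formant in FORMANTS_HZ
    )

# Harmonics of the source from 100 Hz up to 4000 Hz.
harmonics_hz = [F0_HZ * k for k in range(1, 41)]

# Output = source x filter, evaluated at each harmonic.
output = {f: source_amplitude(f // F0_HZ) * filter_gain(f) for f in harmonics_hz}

strongest_hz = max(output, key=output.get)
```

In this sketch the harmonics stay 100 Hz apart whatever the formants are, so the pitch comes from the source, while the bands of energy near 300, 2300 and 3000 Hz come from the filter.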

Source-filter theory: Vowels

What happens when changing from front to back vowels?
What happens when changing from high to low vowels?

Changing from front to back vowels (e.g. “heed” vs “had”): F2 frequency decreases
Changing from high to low vowels (e.g. “heed” vs “hod”): F1 frequency increases

Source-filter theory: Vowels
Key Point

So your brain can know which vowel it is hearing by detecting these auditory “cues”
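
One way to picture the use of formant cues is a lookup that picks the vowel whose F1 and F2 are closest to what was heard. The sketch below is an illustration rather than a model of what the brain does, and the formant values are approximate figures assumed for the example.

```python
import math

# Approximate, illustrative (F1, F2) values in Hz; assumed for this sketch.
VOWEL_FORMANTS = {
    "heed": (270, 2290),
    "had": (660, 1720),
    "hod": (730, 1090),
}

def identify_vowel(f1_hz, f2_hz):
    """Return the word whose vowel has the nearest (F1, F2) pair."""
    def distance(word):
        ref_f1, ref_f2 = VOWEL_FORMANTS[word]
        return math.hypot(f1_hz - ref_f1, f2_hz - ref_f2)
    return min(VOWEL_FORMANTS, key=distance)

# Low F1 with high F2 is heard as the vowel in "heed";
# high F1 with low F2 is heard as the vowel in "hod".
heard_first = identify_vowel(280, 2250)
heard_second = identify_vowel(700, 1100)
```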

Source-filter theory: Consonants
What are important cues for identifying consonants?

Second and third formants (F2 and F3) are important cues for identifying consonants

For each of these consonants (p, t and k), the formants take on a different shape at the beginning of the sound.

How do we perceive phonemes?
(Categorical perception and how to demonstrate it)

3 things

Set up a continuum of sounds between two phonemes
Run an identification experiment
Run a discrimination experiment

How do we perceive phonemes:
1. Set up a continuum of sounds between two phonemes

There is a different sound at each end of the continuum

The midpoint of the continuum is ambiguous: an intermediate sound between ‘ba’ and ‘da’.

You hear a clear ‘ba’ at the beginning, then something intermediate between the two, and by the end a clear ‘da’

How do we perceive phonemes:
2. Run an identification experiment

Identify if the sound you’re hearing is a ba or da sound.

Plot the percentage of responses.

When the sound is a clear, unambiguous ba, people respond ba most of the time.

At the other end of the continuum, when the sound is a clear da, people hardly ever respond ba (they respond da instead).

If we find the point on this graph where listeners are equally likely to respond ‘ba’ and ‘da’- this is referred to as the phoneme boundary.

One of the main signatures of categorical perception is that around the phoneme boundary, you have an abrupt transition in this graph. Perception suddenly changes.
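
The phoneme boundary can be estimated from identification data by finding where the response curve crosses 50%. The sketch below uses linear interpolation between neighbouring steps; the response percentages are hypothetical and invented for illustration.

```python
# Hypothetical identification data: % "ba" responses at continuum steps 1..8.
percent_ba = [98, 96, 93, 80, 25, 6, 3, 2]

def phoneme_boundary(percentages, criterion=50.0):
    """Continuum position (1-based) where responses cross the criterion."""
    for i in range(len(percentages) - 1):
        left, right = percentages[i], percentages[i + 1]
        if left >= criterion > right:
            # Linear interpolation between step i+1 and step i+2.
            fraction = (left - criterion) / (left - right)
            return (i + 1) + fraction
    return None  # the curve never crosses the criterion

boundary = phoneme_boundary(percent_ba)

# The abrupt transition shows up as one very large drop between adjacent steps.
steepest_drop = max(
    percent_ba[i] - percent_ba[i + 1] for i in range(len(percent_ba) - 1)
)
```

With these made-up numbers the boundary falls between steps 4 and 5, and most of the change in responses happens across that single step.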

How do we perceive phonemes?
3. Run a discrimination experiment

Play pairs of adjacent sounds on the continuum and ask listeners to say whether they are the same or different

Plot the % of different responses

Discrimination peak near the phoneme boundary

What is categorical perception?

The tendency to perceive gradual sensory changes in a discrete fashion

What are three hallmarks of categorical perception?

Abrupt change in identification at phoneme boundary
Discrimination peak at phoneme boundary
Discrimination predicted from identification (two sounds only sound “different” if you classify them as different phonemes)
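
The third hallmark can be sketched numerically: if two sounds are judged “different” only when they receive different labels, the predicted rate of “different” responses follows from the identification data. This is a simplified assumption (each sound labelled independently), and the percentages are hypothetical.

```python
# Hypothetical identification data: % "ba" responses at continuum steps 1..8.
percent_ba = [98, 96, 93, 80, 25, 6, 3, 2]

def predicted_different(p_first, p_second):
    """Probability that two sounds get different labels (one ba, one da)."""
    return p_first * (1 - p_second) + (1 - p_first) * p_second

proportions = [p / 100 for p in percent_ba]

# Predicted proportion of "different" responses for each adjacent pair.
predicted = [
    predicted_different(proportions[i], proportions[i + 1])
    for i in range(len(proportions) - 1)
]

peak_index = max(range(len(predicted)), key=lambda i: predicted[i])
peak_pair = (peak_index + 1, peak_index + 2)  # 1-based continuum steps
```

With these made-up numbers the predicted discrimination peak lands on the pair of steps that straddles the phoneme boundary, matching the second hallmark.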

Yanny or Laurel? Categorical perception in action

Yanny: 47%, Laurel: 53%. The sound is ambiguous, but when understanding speech your brain can’t help latching onto one specific interpretation rather than hearing an intermediate mix of the two.

Context influences speech perception: Green needle/ brainstorm

The sound is exactly the same each time, but your expectations differ, which changes how you perceive it. This is an example of how speech perception depends on prior knowledge and context.

Context influences speech perception: Visual context “McGurk effect”

You hear one thing and see another, and what you perceive is changed by what you see. This is a prior context effect: lip movements tend to precede the speech that you hear, and this influences what you perceive.

Context influences speech perception: Lexical context “Ganong effect”

- The listener does an identification task on a sound that is ambiguous between g and k
- The ambiguous sound is presented before “iss”, then before “ift”
- The graph plots the % of g responses
At the midpoint of the continuum, placing the ambiguous sound in the “ift” context biases listeners towards g, because g plus “ift” makes a real word (“gift”). Placing it in front of “iss” biases listeners towards k, because “kiss” is a real word and “giss” isn’t.
Even though it is exactly the same sound, ambiguous between g and k, perception is biased towards the interpretation that makes a real word.
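
The Ganong effect can be pictured as a shift of the phoneme boundary depending on lexical context. The sketch below models identification with a logistic curve; the continuum length, boundary positions and slope are hypothetical values chosen only to illustrate the direction of the bias.

```python
import math

def prob_g(step, boundary, slope=2.0):
    """Logistic identification curve: probability of a "g" response."""
    return 1.0 / (1.0 + math.exp(slope * (step - boundary)))

# Hypothetical 7-step continuum from a clear g (step 1) to a clear k (step 7).
BOUNDARY_IFT = 4.5  # assumed: "gift" is a word, so more steps are heard as g
BOUNDARY_ISS = 3.5  # assumed: "kiss" is a word, so more steps are heard as k

midpoint = 4  # the most ambiguous sound on the continuum
p_g_ift = prob_g(midpoint, BOUNDARY_IFT)
p_g_iss = prob_g(midpoint, BOUNDARY_ISS)
```

The same ambiguous midpoint sound gets mostly g responses in the “ift” context and mostly k responses in the “iss” context, while the clear sounds at either end are barely affected.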

Introduction to speech perception Flashcards

(28 cards)