Introduction to speech perception Flashcards
What are some challenges of speech perception?
Recording of sentence “he guessed the answer to the question in the exam”
- Unlike written language, no clear gaps between words
- E.g. “answer” is one word, but it may contain two acoustic events. Conversely, “in” and “the” are two different words, but there’s no gap between them in the signal.
- “the” sounds different in different positions (co-articulation: the acoustic realisation of a speech sound depends on what you’ve just said and what you’re about to say)
- This adds variability to acoustic speech and can make it hard for a computer to understand
- Accent, gender and speaking rate add further variability
- Time constraints
- We hear up to 200 words per minute
- Sound is fleeting (sound is always changing, a temporal signal)
- “Now-or-never bottleneck”: speech comes in quickly and sound doesn’t stay static, so you need to process the word you’ve just heard before the next word comes in
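To put the time constraint in numbers, here is a minimal sketch that converts a speaking rate into the average time available per word. The function name is only illustrative, and it assumes words are evenly spaced, which real speech is not.

```python
def ms_per_word(words_per_minute: float) -> float:
    """Average time available per word, in milliseconds."""
    return 60_000 / words_per_minute


# At 200 words per minute there are, on average, 300 ms per word.
print(ms_per_word(200))  # 300.0
```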
Why study speech perception?
- Primary means by which we communicate
- More broadly, reading: learning to read requires you to learn the relationship between letters and speech sounds (phonics)
- Listeners with some form of hearing loss: a cochlear implant directly stimulates the auditory nerve, which restores hearing to some extent. Adapting to an implant requires the brain to adapt to novel sensory information.
- Individuals with developmental language disorder: helpful for understanding what’s going on and developing strategies to help them
How do we produce speech?
- what does speech require
- ____ pushes air to _____
- what does this result in
- what are sounds shaped by
- including?
- what are these structures important for?
- Speech requires a basic energy source. This initial energy source is provided by the lungs
- The lungs push air up the trachea (windpipe)
- which vibrates the vocal cords in the larynx (voicebox)
- Sounds from the vocal cords are then shaped by the supralaryngeal (all the structures above the larynx) vocal tract, including:
- Pharynx
- Oral cavity (and lips, tongue, teeth)
- Nasal cavity
- These structures are important for shaping the sounds: you need them for intelligible speech
What method can be used to see speech production?
MRI
Describing speech: Consonants
How are consonants produced?
With a constriction in the vocal tract
Describing speech: Consonants
What are the 3 main features it’s classified by?
- Stop: the constriction is complete (air flow stops completely). Stops can be voiced (vocal cords vibrating, e.g. b) or voiceless (e.g. p).
- Fricative: the constriction is partial, so air flow is not stopped completely
- Nasal: air flow is redirected through the nasal cavity
Describing speech: Consonants
Stop:
+voice: b, d, g
-voice: p, t, k
d: the constriction is made where the tongue tip touches the ridge just behind the upper teeth
g: the constriction is made where the back of the tongue touches the back of the roof of the mouth
Fricative:
+voice: v, z
-voice: f, s
Nasal:
m, n, ŋ (“ng”, as in “sing”)
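As a revision aid, the classification above can be written as a small lookup table. This is only a sketch: the names CONSONANTS, describe and with_features are made up for illustration, only m and n are included for the nasals, and marking the nasals as voiced is an addition not stated on the card.

```python
# Each consonant maps to (manner, voiced?).
CONSONANTS = {
    "b": ("stop", True), "d": ("stop", True), "g": ("stop", True),
    "p": ("stop", False), "t": ("stop", False), "k": ("stop", False),
    "v": ("fricative", True), "z": ("fricative", True),
    "f": ("fricative", False), "s": ("fricative", False),
    # Assumption: nasals are treated as voiced (not stated on the card).
    "m": ("nasal", True), "n": ("nasal", True),
}


def describe(consonant: str) -> str:
    """Return a label such as 'voiced stop' for a consonant in the table."""
    manner, voiced = CONSONANTS[consonant]
    return f"{'voiced' if voiced else 'voiceless'} {manner}"


def with_features(manner: str, voiced: bool) -> list[str]:
    """List the consonants that share a manner and a voicing value."""
    return sorted(c for c, (m, v) in CONSONANTS.items()
                  if m == manner and v == voiced)


print(describe("b"))                      # voiced stop
print(with_features("fricative", False))  # ['f', 's']
```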
What are sound waves?
Periodic displacement of air molecules, creating increases and decreases in air pressure
Speech as sound waves:
- what is happening
- what is formed
- A vibrating source (e.g. a plate moving back and forth) moves the surrounding air molecules around; in speech, the source is the vibrating vocal cords. These pressure changes are picked up by the ear, which converts them into the sensation of sound.
- Plotting the changes in sound pressure over time: at certain moments the air molecules come together and there’s an increase in pressure.
- The sound waveform is formed and perceived by the brain.
In relation to a sound waveform, what is amplitude and period?
Amplitude:
- related to loudness
- the larger the peaks, the louder the sound
Period:
- inversely related to frequency; important cue to pitch
- peaks closer together = higher frequency and pitch
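A minimal sketch of amplitude and period using two pure tones. The sample rate, frequencies and amplitudes are arbitrary values chosen for illustration, and the function names are hypothetical.

```python
import math

SAMPLE_RATE = 8000  # samples per second (illustrative choice)


def sine_wave(amplitude, frequency_hz, n_samples):
    """Samples of a pure tone: amplitude sets peak height, frequency sets spacing."""
    return [amplitude * math.sin(2 * math.pi * frequency_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]


def period_s(frequency_hz):
    """Period is the inverse of frequency."""
    return 1.0 / frequency_hz


def count_peaks(samples):
    """Count local maxima (samples higher than both neighbours)."""
    return sum(1 for i in range(1, len(samples) - 1)
               if samples[i - 1] < samples[i] > samples[i + 1])


low = sine_wave(0.5, 100, 400)   # quieter, lower pitch (50 ms of signal)
high = sine_wave(1.0, 200, 400)  # louder, higher pitch (50 ms of signal)

print(period_s(100), period_s(200))          # 0.01 0.005
print(count_peaks(low), count_peaks(high))   # 5 10
print(max(high) > max(low))                  # True
```

Doubling the frequency halves the period, so twice as many peaks fit into the same stretch of time.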
Speech as sound waves:
- what is speech associated with?
- how do you get speech?
- what is speech a mix of?
Speech is more complicated than a simple beep: it has more variation and is more complex.
The waveform of a simple tone and the more complicated waveform of speech are related: speech is essentially a mix of simple sounds added together, with the amplitude shaped over time.
So speech is a mix of lots of simpler sounds, which together create the more complex speech signal.
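The idea of mixing simple sounds can be sketched by adding a few pure tones and then shaping the amplitude over time. This is a toy example, not real speech: the three frequencies, their amplitudes and the triangular fade-in/fade-out envelope are all assumptions made for illustration.

```python
import math

SAMPLE_RATE = 8000  # samples per second (illustrative choice)


def tone(frequency_hz, amplitude, n_samples):
    """Samples of a single pure tone."""
    return [amplitude * math.sin(2 * math.pi * frequency_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]


def mix(*tones):
    """Add several tones together, sample by sample."""
    return [sum(samples) for samples in zip(*tones)]


def apply_envelope(samples):
    """Shape amplitude over time with a linear fade in and fade out."""
    n = len(samples)
    return [s * (1 - abs(2 * i / (n - 1) - 1)) for i, s in enumerate(samples)]


n = 800  # 100 ms of signal
mixed = mix(tone(100, 1.0, n), tone(200, 0.5, n), tone(300, 0.25, n))
complex_wave = apply_envelope(mixed)

# The mix is no longer a simple sine shape, but it still repeats every
# 80 samples (10 ms), the period of its lowest component (100 Hz).
print(abs(mixed[10] - mixed[90]) < 1e-9)  # True
```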
Spectrogram: Analysing the frequencies of speech
1- what is a spectrogram?
2- difference between dark grey and light grey?
3- why is it useful?
4- what is being split?
- A spectrogram is a graph showing how sound amplitude varies as a function of time (x-axis) and frequency (y-axis)
- Dark grey = large amplitude, light grey = small amplitude
- Useful because the ear splits sound by frequency, so a spectrogram better captures the information available to the brain.
- The sound is split into its different frequency components, just as the ears and brain split the incoming information into frequency channels
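A rough sketch of how a spectrogram can be computed: cut the signal into short frames and measure the amplitude at each frequency within each frame. It uses a slow, plain discrete Fourier transform from the standard library only; real tools use faster methods and overlapping, windowed frames. The test signal (a 500 Hz tone followed by a 1000 Hz tone), the sample rate and the frame size are illustrative assumptions.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (illustrative choice)
FRAME = 160         # 20 ms per frame, giving 50 Hz per frequency bin


def tone(frequency_hz, n_samples):
    return [math.sin(2 * math.pi * frequency_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]


def dft_magnitudes(frame):
    """Amplitude at each frequency bin from 0 Hz up to half the sample rate."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def spectrogram(signal, frame_size=FRAME):
    """One row per time frame; each row holds amplitude per frequency bin."""
    return [dft_magnitudes(signal[start:start + frame_size])
            for start in range(0, len(signal) - frame_size + 1, frame_size)]


signal = tone(500, 800) + tone(1000, 800)  # 100 ms at 500 Hz, then 100 ms at 1000 Hz
spec = spectrogram(signal)

# Frequency with the largest amplitude in each time frame.
peak_hz = [row.index(max(row)) * SAMPLE_RATE / FRAME for row in spec]
print(peak_hz)
# [500.0, 500.0, 500.0, 500.0, 500.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
```

Here spec[t][f] is the amplitude at time frame t and frequency bin f. Plotting it with time on the x-axis, frequency on the y-axis and darker shades for larger values gives the usual spectrogram picture.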
Adding source and filter to how we produce speech
The lungs push air up the trachea (windpipe)
Which vibrates the vocal cords in the larynx (voicebox) → ‘Source’
Sounds from the vocal cords are then shaped by the supralaryngeal vocal tract → ‘Filter’
- Pharynx
- Oral cavity (and lips, tongue, teeth)
- Nasal cavity
Source-filter theory
Source only
Source (vocal cords) is important for voice pitch and intonation
On its own, the source still carries some information, such as voice pitch
Source-filter theory
Source + filter
This shows how important the filter is for making intelligible speech
Filter (supralaryngeal vocal tract) important for producing different speech sounds (phonemes)
Filtering appears as bands of energy at certain frequencies called ‘formants’ (in Latin, “formare” = “to shape”)
The lowest three formant frequencies are the most important for speech intelligibility (labelled F1, F2 and F3)
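Source-filter theory can be sketched in the frequency domain: the source supplies energy at multiples of the voice pitch, and the filter boosts whichever of those fall near the formants. Everything numeric here is an assumption for illustration: the formant values (800, 1200 and 2600 Hz) are placeholders rather than measurements, the resonance curve is a simplified shape, and the source harmonics are given equal strength, which real voices do not have.

```python
def source_harmonics(f0_hz, max_hz=4000):
    """Source: energy at whole-number multiples of the voice pitch (f0)."""
    return [f0_hz * k for k in range(1, int(max_hz // f0_hz) + 1)]


def filter_gain(freq_hz, formants_hz, bandwidth_hz=100.0):
    """Filter: a simplified resonance peak around each formant frequency."""
    return sum(1.0 / (1.0 + ((freq_hz - f) / bandwidth_hz) ** 2)
               for f in formants_hz)


def output_spectrum(f0_hz, formants_hz):
    """Source shaped by filter: amplitude of each harmonic after filtering."""
    return {h: filter_gain(h, formants_hz) for h in source_harmonics(f0_hz)}


def strongest(spectrum, n=3):
    """The n harmonics with the largest amplitude, in ascending frequency."""
    return sorted(sorted(spectrum, key=spectrum.get, reverse=True)[:n])


FORMANTS = [800, 1200, 2600]  # hypothetical F1, F2, F3 in Hz

low_voice = output_spectrum(100, FORMANTS)   # lower pitch, same vocal tract shape
high_voice = output_spectrum(200, FORMANTS)  # higher pitch, same vocal tract shape

print(len(low_voice), len(high_voice))  # 40 20
print(strongest(low_voice))             # [800, 1200, 2600]
print(strongest(high_voice))            # [800, 1200, 2600]
```

Changing the pitch changes how the harmonics are spaced (source), but the bands of energy stay at the same formant frequencies (filter).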