Week 8 Speech + Music Flashcards
Vocalisation
- = acoustic energy arising from vocal tract
1. Air pushed from lungs provides molecular disturbance
2. Air crosses vocal folds
o Vocal folds longer in men
o Folds lose tautness with age
3. Resonates through larger cavities
4. Articulated by throat, tongue, lips, teeth, jaw
o Hundreds of fine movements per second
o Also goes through nasal cavity – resonance cavity
o Using all the muscles to form that air into a particular spatial pattern and
push it out the mouth
o Face is over-represented in somatotopic map
Speech perception requires
- Hearing
o Functional auditory system
o Basic 3 levels of auditory processing
Spatial localisation
Signal to noise optimising
Recognition of sound as vocal energy
Gives you identifiable sound
- Speech processing – gives you meaning
o Semantic content
o Paralinguistic information – gives extra info beyond just the words
Person speaking
Affective state
Get affective mood information
Intentions
Questions, monotone
Conveyed by fluctuations
Non-speech vs speech vocalisation
- Non-speech
o Screaming, laughing, bawling, grunting
o No words
o Things we see in other species that can vocalise
- Speech
o Words
o Semantics + prosody
o Only in humans
Phonemes
Semantic content
= distinct sounds used to create words in spoken language
- The fundamental unit of speech; if you change a phoneme, you can change the
meaning of a word
- Written /b/ etc.
- Every language has its own phonemes
- ~ 12 phonemes/sec at normal rate of speech
- Each phoneme has a very specific pattern of acoustic energy
o Specific to phoneme. Specific to individual
o Formants – the frequencies at which peaks of acoustic energy occur
Each vowel related phoneme has a characteristic formant pattern
Consonants provide formant transitions – rapid shifts in frequency
- A string of phonemes (a spoken word) gives a specific sound spectrogram
o Specific to word, specific to individual
o Temporal time scale – say it fast and process it fast
o Every person has a different spectrogram for the same word
- Tech applications based on this specificity
o If every individual says a word in a specific pattern of energy that someone
can’t fake – can use this as a security device
Voice recognition for security
Way you say your name is as unique as your fingerprint
Sounds the same as someone else but acoustic energy will be different
Speech recognition for communication with machines/computer interface
Talk to a computer interface of some sort
Train computer to recognise words based on acoustic energy patterns
associated with them
Problem
Computer only knows a certain number of patterns so doesn’t always recognise what you’re saying
Your pattern might not match one embedded in the system
Put as many patterns as possible in the system to compensate
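The matching problem described above can be sketched as a nearest-template lookup. This is only an illustration under stated assumptions, not how any real recogniser works: the three-number "energy patterns", the word list, and the `max_distance` threshold are all invented for the example.

```python
import math

# Hypothetical stored templates: word -> made-up acoustic energy pattern.
TEMPLATES = {
    "yes": [0.9, 0.2, 0.1],
    "no": [0.1, 0.8, 0.3],
    "stop": [0.4, 0.4, 0.9],
}


def recognise(pattern, templates=TEMPLATES, max_distance=0.3):
    """Return the closest stored word, or None if nothing is close enough."""
    best_word, best_dist = None, math.inf
    for word, template in templates.items():
        dist = math.dist(pattern, template)  # Euclidean distance
        if dist < best_dist:
            best_word, best_dist = word, dist
    # A pattern too far from every template is treated as "not recognised".
    return best_word if best_dist <= max_distance else None


close_to_yes = recognise([0.85, 0.25, 0.1])  # near the "yes" template
unfamiliar = recognise([0.5, 0.5, 0.5])      # matches nothing well enough
```

Adding more templates per word (different speakers, accents, loudness) makes it more likely that some stored pattern falls within the threshold, which is the "put as many patterns as possible in the system" strategy.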
Sound spectrograms
- Illustrate phoneme spectra, cadence, intonation
- Small time scale
o Speech has a fine temporal structure
o There are rapid fluctuations in acoustic energy
- How do we make sense of the rapid acoustic energy?
o The brain can bind phonemes and parse streams of acoustic energy into words
- ~5% of children are perceptually speech impaired (LBLI – language based learning impairment)
o Brain isn’t able to grasp concept of binding and parsing
o Taking longer in perceptual processing to understand temporal structure of speech – talking slower helps
Bottom Up Processing
maybe each phoneme is indicated by a specific fibre activation pattern (pattern encoding)
- Level: auditory fibres in cochlea
o Tonotopical basilar membrane in cochlea
o Maybe each spectrogram maps to a specific ‘neurogram’
o High frequency at base, low frequency at apex
Each hair cell has a frequency that it responds to best
o As you say a word
Each phoneme has a different amount of energy
Maps straight from map of cochlea and travels straight up stream
Different pattern of activation for each sound – brain puts it together
Shortcomings (Speech processing – content)
o Coarticulation
Sometimes we say a phoneme but the energy is different depending
on what comes after it
/d/ will activate different fibres along the basilar membrane of the cochlea in different
situations – different words
Acoustic energy is different but resulting perception is the same
E.g. /di/ and /du/ have different spectrograms for the /d/, but in both
instances we hear /d/
o Phoneme variance
/t/ from different people have different spectrograms but we
understand each other’s /t/
/t/ at 25dB and /t/ at 60dB have different spectrograms but both are
understood as /t/ - as a whisper or loud
Resulting perception is the same
This indicates perceptual constancy is occurring
This occurs on the word and sentence levels
The stimuli are different but resulting perceptions are the
same
E.g. ‘the’ – 50 different spectrograms – understood as ‘the’
Perceptual constancy (Speech processing – content)
o Different incoming stimuli result in the same perceptual interpretation
o Examples of perceptual constancy
Phoneme constancy
Speech constancy
o Suggests a lot of top-down processing is occurring in our understanding of
speech – built a knowledge base and rules about patterns to help perceive speech
Cortical, Top Down Processing
speech perception needs some incoming bottom-up sensory information but cortical processing is critical
- Via experience, build a knowledge base to assist in interpreting and understanding sensory info coming in from the environment
o Apply knowledge base to the signal to make sense of it
- Level: auditory cortex
o A1 + secondary/tertiary bands
o A1 does not preferentially respond to speech, responds to any noise
o Some association areas do preferentially respond to speech
- Wernicke’s area
o In left hemisphere
o Critical for hearing words
o Damage = intact hearing but can’t understand speech
Receptive aphasia – can hear signal but not the semantics, has no meaning associated
- Broca’s area
o In left hemisphere
o Critical for speaking words
o Damage = intact hearing but can’t speak coherent words
Expressive aphasia
Can understand what is spoken but can’t put sounds together into
meaningful words
Paralinguistic info
- R hemisphere
o Areas of temporal lobe, prefrontal cortex, limbic
o Strong activation to emotional intonation and non-speech utterances
o Strong activation during speaker identification
o Who is talking, are they in a good mood, are they asking questions?
- Prosody – intonation
o Signals intent – declarative/interrogative
o Signals mood
- Damage
o Dysprosody = content intact but can’t understand intonation
o Phonagnosia = content intact but can’t identify speaker
o Autism – problems in superior temporal sulcus processing
Understand speech fine but miss nuances – sarcasm, comedy, anger
Intonation based, not content based
Speech perception requires
- Hearing
o Functional auditory system – pinna to A1
o Basic 3 levels of auditory processing – A1, parabelts, ventral/dorsal stream
- Speech content
o Content – Wernicke’s, Broca’s – LH
o Paralinguistic info – person, affective state, intentions
If you don’t have these – miss subtleties of human interaction
RH laden
STS, PFC, limbic system
Phoneme recognition
speech processing
o Acoustic energy pattern
o Phonemic restoration effect
Do need acoustic energy pattern coming in – although the pattern may be missing some of the phonemes you heard
Use knowledge to fill in missing phonemes
People might not pronounce all the phonemes in a word – brain fills in gaps – restoration gap
o Indexical characteristics
Knowledge about the person
Their accent – using that info to help understand what they’re saying
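Phonemic restoration can be sketched as filling a gap from stored knowledge. A minimal toy version, assuming letters stand in for phonemes, `?` marks a sound that was masked or never pronounced, and a tiny made-up vocabulary plays the role of the listener's knowledge base:

```python
# Hypothetical vocabulary standing in for the listener's knowledge base.
VOCABULARY = ["legislature", "literature", "signature"]


def restore(heard, vocabulary=VOCABULARY):
    """Return the vocabulary words consistent with `heard` ('?' = missing sound)."""
    matches = []
    for word in vocabulary:
        if len(word) != len(heard):
            continue
        # Every sound that did arrive must agree with the candidate word.
        if all(h == "?" or h == w for h, w in zip(heard, word)):
            matches.append(word)
    return matches


restored = restore("legi?lature")
```

The incoming pattern is still required (bottom-up), but the missing piece is supplied from what the listener already knows (top-down). If several words fit, context would have to decide between them.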
Context
speech recognition
o Language appropriate combinations of letters
o Topic of conversation
Use context to help figure out what they’re saying
Boundaries
speech recognition
o Language appropriate combinations of syllables
There are no boundaries between the words you speak but you hear
separate words
The breaks are an auditory illusion – acoustic energy is continuous but
you can parse the words
Need to parse in the right places to separate the info into meaningful
words
E.g. so I got out of bed this morning VS so I got out of bed this morn ing
o Knowledge of vocabulary
Need knowledge of English and vocabulary
We perceptually hear separate words based on knowledge of
language
The only time we usually acoustically break is to signal ‘done’ in flow of conversation
It sounds weird when people actually pause between words
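Parsing a continuous signal into words using vocabulary knowledge can be sketched as follows. This is a toy illustration, assuming an unbroken string of letters stands in for the continuous acoustic stream and a small made-up word set stands in for the listener's vocabulary; it is not a model of how the brain does it.

```python
# Hypothetical vocabulary; note it contains "morn" but not "ing".
VOCABULARY = {"so", "i", "got", "out", "of", "bed", "this", "morning", "morn"}


def segment(stream, vocabulary=VOCABULARY):
    """Split an unbroken string into vocabulary words, or return None if impossible."""
    # best[i] holds a list of words that exactly covers stream[:i], or None.
    best = [None] * (len(stream) + 1)
    best[0] = []
    for end in range(1, len(stream) + 1):
        for start in range(end):
            piece = stream[start:end]
            if best[start] is not None and piece in vocabulary:
                best[end] = best[start] + [piece]
                break
    return best[-1]


words = segment("soigotoutofbedthismorning")
```

A break after "morn" is rejected because the leftover "ing" is not a word the listener knows, so the only parse that covers the whole stream ends in "morning".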
Supplementary sensory input
speech recognition
o Incoming visual input
Lots of our hearing is also seeing
Reading lips, body language, facial expressions, hand gestures
Humans seem to be natural lip readers e.g. McGurk effect
McGurk effect – what you see overrides what you hear; if you close your eyes you hear the sound as it is but if you open your eyes it can influence what we hear
o Doesn’t matter what knowledge you bring
o Conflicting senses – brain tries to make sense of the
conflict
o The modality that is providing more salient info takes
over or combines with the other
o Seeing the articulators changing visually changes the phoneme you hear
Speech-reading greatly aids speech processing for a person with a cochlear implant
o Previously gained knowledge of how to interpret that input
Knowledge about the visual information and how to interpret it
Practice
speech recognition
o Becoming a native listener of a language takes years
Exposure to the language builds up the top-down categories to understand the bottom-up categories
o Starts in utero?
Newborn preference for maternal voice vs other females
Newborn preference for maternal native language vs other languages
o Phoneme set hones in infancy
Unimportant phonemes start to become ignored ~6 months
If you haven’t been exposed, lose ability to discriminate
E.g. in Japanese, /r/ and /l/ discrimination is not useful, so it is ignored
Important vowels and consonants become delineated, defined
o As exposure to language and knowledge (vocab, syntax) progresses, this facilitates top-down recognition of language appropriate syllables, words, boundaries
Start to be able to understand and parse quickly
Need to be exposed to phonemes passively in infancy – is important
for language later in life
Just hearing them will train brain to discriminate – when it comes to learning the language the brain has been primed
o Baby talk – ‘motherese’
When speaking to infants adults often unconsciously use higher pitch and frequent intonation fluctuation
Also tend to do this with pets and partners
Baby talk is different to the other situations
We stress vowels when talking to human babies
Use articulated voice to help learn the formants
o A1 forms the picture of the formants – babies can learn them better later on – plasticity
Unconscious attempt to amplify phonetic characteristics of native language especially vowel articulation
Babies are born with the capacity to recognise global phoneme library
Experience dependent plasticity occurs in infancy; and only
frequently heard phonemes persist
Kuhl et al., 1997
Want language diversity – give them exposure to as many phonemes as possible
Melodies, combinations of notes – helps language abilities later
Top Down Processing
speech perception
o Able to perceive and understand despite immensely variant acoustic signals
Phonemes vary between and within individuals
Phoneme pronunciation differs between individuals/accents
Oftentimes phonemes missing or muddled
Acoustic signal is extremely rapid
o Able to parse words and sentences despite continuous acoustic signal
Robots have trouble with this
Can make voice recognition but don’t have the knowledge base when
the acoustic signal is a bit muddled or in an accent
o William James
Part of what we perceive comes through our senses from the object
before us, another part always comes out of our own head
Top-down processing
Robots don’t have this – they lack this second component of
perception
o Attempts to give a robot top-down processing by giving it a knowledge base
If it has a detector and an open source database knowledge base, can see if the incoming info matches anything in the database and make sense of it
Auditory scene analysis
o Able to delineate and perceive vocalisations in audibly cluttered
environments
Robots also have trouble with this
Pick up all the sounds as one signal
o Auditory scene = the whole array of sounds in your present environment
o Analysis = processing the auditory scene into separate auditory images;
source segregation (parsing)
Achieved via the 3 basic levels of processing: signal to noise
optimisation, spatial localisation, sound recognition
o While not consciously attending to all the info, still process it subconsciously –
if something happens you can shift attention and pick up on salient
information
o Music playback on a speaker is a good example of ASA
Even though the sound is coming through one speaker – one spatial location – you can still parse it apart
Can tell the vocals from the instruments
We can focus on one person talking and ignore all others
Why Music
- Seemingly odd of humans
o Spans globe, age, SE spectrum
o Can be communicative, with or without words
No semantic content but you still know the feelings
- Has persisted for thousands of years
o Is it a useful means of communication?
What messages can you convey
Good at it – attract people to you
o Is it simply hedonic?
Pinker’s idea of auditory cheesecake
Do we just like it?
Which came first, speech or music?
o Darwin
Music came first
A way to sexually select
People will be lured to you – brings potential mates
Shows you have skill and creativity
o Spencer
Language came first
The prosody segued into singing
o Diana Deutsch
Speech to song illusion
Loop a spoken phrase
Sounds like sung words over time
Brain gets altered for particular phrase – sounds like singing every
time you hear it
Supports that speech came first
Inflections are looped – frequency and pitch change
Expand this and make it pronounced - singing
What is music?
- Definition: a perceptual experience created by purposely combining a limited set of acoustic signals (notes)
- Note: fundamental ‘unit’ of music
o Can vary in loudness
o Has a specific pitch and duration
o Notes with the same pitch and duration can still differ in timbre
o Every voice category (or instrument) covers a specific range of frequencies
Vary differently from speech
o Very specific differences in Hz for each note
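The "very specific differences in Hz" can be made concrete with a small calculation. This sketch assumes 12-tone equal temperament with A4 tuned to 440 Hz, which is a common convention but is not stated in these notes; the note-name table and function name are just for illustration.

```python
A4_HZ = 440.0  # assumed concert-pitch reference (not from the notes)

# Semitone offset of each note name from C within one octave.
NOTE_OFFSETS = {"C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
                "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11}


def note_frequency(name, octave):
    """Frequency in Hz of a note under 12-tone equal temperament."""
    semitones_from_a4 = NOTE_OFFSETS[name] - NOTE_OFFSETS["A"] + 12 * (octave - 4)
    # Each semitone multiplies frequency by the twelfth root of 2.
    return A4_HZ * 2 ** (semitones_from_a4 / 12)


a3 = note_frequency("A", 3)  # one octave below A4
g3 = note_frequency("G", 3)  # two semitones below A3
```

Under these assumptions A3 comes out at 220 Hz and G3 at roughly 196 Hz, so neighbouring notes are separated by a fixed ratio rather than a fixed number of Hz.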
Harmonics
o The sound wave that comes out of the hammer hitting the string
Will have harmonics – overtones related to the 220 Hz frequency, but won’t just be 220 Hz by itself
Creates complex waves
o Hit A3 (220 Hz in standard tuning): creates a 220 Hz wave but with lots of waveforms on top of each other
o Harmonic – start with the base frequency of the note
Fundamental frequency
Harmonics fall at integer multiples of it – doubled, tripled, etc.
o Fundamental frequency = n
Harmonics = n×2, n×3, n×4, n×5…
Different instruments will give different patterns of harmonics for the same note
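The harmonic series and the idea of instrument-specific harmonic patterns can be sketched in a few lines. The amplitude profiles below are invented purely for illustration (loosely following the notes: more high-harmonic energy for the guitar-like profile, a strong second harmonic for the bassoon-like one) and are not measured values.

```python
import math


def harmonic_series(fundamental_hz, count=5):
    """The fundamental (n) followed by n*2, n*3, ... up to `count` components."""
    return [fundamental_hz * n for n in range(1, count + 1)]


def complex_wave(t, fundamental_hz, amplitudes):
    """Sum of sine harmonics at time t (seconds); amplitudes[k] scales harmonic k+1."""
    return sum(a * math.sin(2 * math.pi * fundamental_hz * (k + 1) * t)
               for k, a in enumerate(amplitudes))


# Made-up relative strengths of harmonics 1..5 for two instruments.
GUITAR_LIKE = [1.0, 0.5, 0.4, 0.3, 0.3]
BASSOON_LIKE = [0.5, 1.0, 0.3, 0.1, 0.05]

harmonics = harmonic_series(220)
guitar_sample = complex_wave(0.001, 220, GUITAR_LIKE)
bassoon_sample = complex_wave(0.001, 220, BASSOON_LIKE)
```

Both waves are built on the same 220 Hz fundamental, so they are the same note, but the different weighting of the harmonics gives a different waveform, which is what is heard as timbre.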
Pitch perception exhibits perceptual constancy
o Different incoming stimuli can result in the same pitch perception, as long as fundamental frequencies are the same
The same note from two different instruments has different characteristics – timbre due to the harmonics
Each musician plays G3
Different acoustic energy patterns – differing in power spectrum of
harmonics but it is the same note
Guitar has a lot of higher frequency
harmonics
Bassoon has more power in second
harmonic
o Even if you remove the fundamental frequency but maintain the harmonics, the same pitch perception can result
The effect of the missing fundamental
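A piece of arithmetic shows why the remaining harmonics still point to the same pitch. This sketch assumes whole-number frequencies and takes their greatest common divisor; it illustrates the numerical relationship only and makes no claim about how the auditory system actually arrives at the pitch.

```python
from functools import reduce
from math import gcd


def implied_fundamental(harmonics_hz):
    """Largest frequency (integer Hz) of which every component is a multiple."""
    return reduce(gcd, harmonics_hz)


with_fundamental = implied_fundamental([220, 440, 660, 880])
without_fundamental = implied_fundamental([440, 660, 880])  # 220 Hz removed
```

Both lists imply 220 Hz: the spacing of the harmonics carries the fundamental even when no energy is present at that frequency.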
Music perception requires
- Hearing
o Functional auditory system
o Basic 3 levels of auditory processing
- Music processing
o Pitch, rhythm – don’t need to learn this when learning speech
o Motoric? – motor processing inherently linked with auditory processing
o Cognitive assessment
Enjoyment, emotion, meaning
Personalised, cultural, memory links activated
Cortical processing of music
- Brain areas activated
o Everywhere – the whole brain
Even visceral
o Frontal – preferences
o A1 – hearing stuff
Belt regions – right belts and parabelts
Music is more on the right side of the brain
Left is more for language
o Motor cortex
o Sensorimotor
o Visual – watching music performed
Pitch
- The perceptual submodality of sound based on the frequency of the acoustic energy (~pitch/tone/note)
- Different pitches can be patterned in a certain way to make a melody
o We can usually still perceive and recognise melodies even if the pitches are wrong or the absolute interval is skewed
- Melodies show perceptual constancy
o Melodic constancy – as long as the relative pattern is maintained we can usually recognise it
Need to maintain melodic contour
Melody is preserved and can make sense of it – recognise it
If it is inverted – can’t recognise
- Sometimes the relationship between the notes can be very small
o Difficult to distinguish for some people
o Small differences in Hz
- Moving up the keyboard to the right increases frequency and tone height
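Melodic constancy can be sketched by comparing the relative pattern of a melody rather than its absolute pitches. This example assumes notes are written as MIDI note numbers (one unit per semitone), a representation chosen only for convenience; the helper names are hypothetical.

```python
def intervals(melody):
    """Semitone steps between successive notes: the relative pattern."""
    return [b - a for a, b in zip(melody, melody[1:])]


def same_relative_pattern(a, b):
    """True when two melodies have identical step-by-step intervals."""
    return intervals(a) == intervals(b)


original = [60, 60, 67, 67, 69, 69, 67]             # C C G G A A G
transposed = [n + 5 for n in original]              # every pitch shifted up
inverted = [2 * original[0] - n for n in original]  # contour flipped upside down

transposed_matches = same_relative_pattern(original, transposed)
inverted_matches = same_relative_pattern(original, inverted)
```

Every absolute pitch changes in the transposed version, yet the interval pattern is identical, so it counts as the same melody. Inversion keeps the step sizes but reverses their direction, which breaks the contour.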
Pitch differences between individuals
- Absolute pitch
o Perfect pitch
o Heightened ability to identify and name specific pitches
o Used to be considered innate – however correlations to early musical training, early blindness, early linguistic exposure
Train – take perceptual component and add in memory component
Prime auditory cortex to listen to small differences – early exposure to music or tonal language
o ~1 in 10,000
- Relative pitch
o Heightened ability to identify tonal intervals but less accurate at naming specific pitches
o Can be honed, often seen in musicians
- Normal pitch
- Amusia
o Tone deaf
o Inability to differentiate small differences in pitch that characterise music
o Can be difficult to perceive melodies
o Can be congenital amusia or secondary to brain damage
o Can be overcome with training
Experience dependent plasticity
- In monkeys, active training increases tone discrimination performance and auditory cortex remapping
- In humans, highly skilled musicians have ~25% increase in cortical representation of musical tones
- In humans, sound localisation ability is highly developed in conductors
o Plasticity in belt regions and brainstem
o Superior olive and cochlear nucleus
Rhythm
- The temporal patterning of the acoustic signals
- Does not necessarily need a melody but a melody always has a rhythm
- Rhythm provides perceptual organisation to the music
o Deviations from this organisation can seem pleasant or unpleasant
o E.g. syncopation, jazz improv
- Often drives the mood/feeling of music
Music and emotion
- If notes themselves are non-referential, then why do certain combinations of notes (melodies) make us feel
o Sad, happy, proud, energetic
- Perhaps due to
o Familiarity/conditioning
o A specific memory (episodic link)
o Intrinsic musical structure
Tempo, loudness, key
Instrument timbre
Music feels
o Can use music to make you feel happier or to wallow in your sadness
o At the gym – energy component
Part of this is loudness
Sympathetic nervous system activation – cortisol release, waiting for something in the environment to happen
Amps you up physiologically
Tempo
Using music for athletes’ performance
Bpm match heart rate or running pace
Music has power to affect you in a way that language alone doesn’t
- Conditioned from an early age
Religious sounds, radio sounds, funeral sounds, sports sounds
Conditioned as a group – society – you know certain songs are happy
Conditioned to feel proud – national anthem
- Memory
Brings you back to a time from the past
Unique to you as an individual
- Intrinsic
Sound waves create emotion
Keys – consonant chords are pleasant, dissonant chords are
not
Certain instruments for happy music – have timbre with a
certain feel
Peter and the Wolf – each animal has an instrument that
matches the character
Music Therapy
o Music is increasingly being used as a part of speech therapy for aphasia
o Activate right A1 – lack of language in that area
o When congenital aphasia, or selective aphasia
Seems like music is a way to release inhibition in the brain and let you speak again
Sacks’s hypothesis
music therapy
The right Broca’s area is inhibiting the left Broca’s area – aberrant inhibition across hemispheres
Maybe music can distract or engage the right so it disinhibits the left
and lets it work again
Right corollary of Broca’s region – it is usually on the left
Functional recovery
Aphasia that lacks prosody
music therapy
Stroke – regain normal speech
Sound robotic – lost inflection
Use music to reengage the right hemisphere
Paralinguistic prosody area – right hemisphere – learn to use
inflection again
Get patients to sing things and tone down the intonations to a
normal speech level
Tourette’s, autism, movement disorders (PD)
music therapy
“we listen with our muscles” Nietzsche
Sacks
Patients in the 60’s who were catatonic, wake them up with music
Wake them up out of their wheelchairs and respond to things
o Start to walk around
o Motoric component to music that stimulates you to move
o All of a sudden the catatonic people getting up and dancing – environmental enrichment, don’t need drugs or anything else
Even babies will move to music – innate tendencies to move to a beat
Also in animals
We respond - energy in patterns makes us want to move
Memorisation
music therapy
Melody helps you remember and encode information
Will remember a song you liked from 10 years ago