Week 8 Speech + Music Flashcards

1
Q

Vocalisation

A
  • = acoustic energy arising from the vocal tract
    1. Air pushed from the lungs provides the molecular disturbance
    2. Air crosses the vocal folds
    o Vocal folds are longer in men
    o Folds lose tautness with age
    3. Resonates through the larger cavities
    4. Articulated by the throat, tongue, lips, teeth and jaw
    o Hundreds of fine movements per second
    o Air also passes through the nasal cavity – a resonance cavity
    o All these muscles shape the airstream into a particular pattern and push it out the mouth
    o The face is over-represented in the somatotopic map
2
Q

Speech perception requires

A
- Hearing
o Functional auditory system
o Three basic levels of auditory processing
 Spatial localisation
 Signal-to-noise optimisation
 Recognition of sound as vocal energy
 Gives you an identifiable sound
- Speech processing – gives you meaning
o Semantic content
o Paralinguistic information – gives extra info beyond just the words
 Person speaking
 Affective state
 Gives affective/mood information
 Intentions
 e.g. questions vs. monotone
 Conveyed by intonation fluctuations
3
Q

Non-speech vs speech vocalisation

A
- Non-speech
o Screaming, laughing, bawling, grunting
o No words
o Things we see in other species that can vocalise
- Speech
o Words
o Semantics + prosody 
o Only in humans
4
Q

Phonemes

Semantic content

A

= distinct sounds used to create words in spoken language
- The fundamental unit of speech; changing a phoneme can change the meaning of a word
- Written /b/ etc.
- Every language has its own phoneme set
- ~12 phonemes/sec at a normal rate of speech
- Each phoneme has a very specific pattern of acoustic energy
o Specific to the phoneme, and specific to the individual
o Formants – the frequencies at which peaks of acoustic energy occur
 Each vowel phoneme has a characteristic formant pattern
 Consonants provide formant transitions – rapid shifts in frequency
- A string of phonemes (a spoken word) gives a specific sound spectrogram
o Specific to the word, specific to the individual
o Fine temporal scale – spoken fast and processed fast
o Every person has a slightly different spectrogram for the same word
- Tech applications are based on this specificity
o If every individual says a word with a specific pattern of energy that someone else can’t fake, this can be used as a security device – a sketch of computing such a spectrogram follows
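A minimal sketch (not from the lecture) of computing such a sound spectrogram in Python with scipy; the recording name is hypothetical:

```python
# Minimal sketch: compute the sound spectrogram of a spoken word.
# "word.wav" is a hypothetical mono recording.
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from scipy.signal import spectrogram

rate, samples = wavfile.read("word.wav")
freqs, times, power = spectrogram(samples, fs=rate, nperseg=512)

plt.pcolormesh(times, freqs, 10 * np.log10(power + 1e-12))  # dB scale
plt.xlabel("Time (s)"); plt.ylabel("Frequency (Hz)")
plt.title("Formants appear as horizontal bands of energy")
plt.show()
```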

5
Q

Voice recognition for security

A

 The way you say your name is as unique as your fingerprint

 It may sound the same as someone else’s to the ear, but the acoustic energy pattern will be different
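A toy sketch of the idea, assuming each utterance is reduced to a spectral ‘voiceprint’ and compared by cosine similarity; the features and the threshold are illustrative only, real systems are far more robust:

```python
# Toy sketch of voice-based verification (illustrative, not a real system).
import numpy as np
from scipy.signal import spectrogram

def voiceprint(samples, rate):
    """Time-averaged spectral energy profile, normalised to unit length."""
    _, _, power = spectrogram(samples, fs=rate, nperseg=512)
    profile = power.mean(axis=1)              # average energy per frequency band
    return profile / np.linalg.norm(profile)

def same_speaker(enrolled, probe, rate, threshold=0.9):
    # Cosine similarity between stored and incoming voiceprints;
    # the 0.9 threshold is an arbitrary placeholder.
    return float(voiceprint(enrolled, rate) @ voiceprint(probe, rate)) >= threshold
```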

6
Q

Speech recognition for communication with machines/computer interface

A

 Talk to a computer interface of some sort
 Train the computer to recognise words based on the acoustic energy patterns associated with them
 Problem
 The computer only knows a certain number of patterns, so it doesn’t always recognise what you’re saying
 Your pattern might not match one embedded in the system
 Put as many patterns as possible into the system to compensate – a toy sketch of this follows below
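A toy sketch of that pattern-store approach; the feature vectors and the distance threshold are illustrative:

```python
# Toy sketch: recognise a word by finding the nearest stored
# acoustic-energy pattern. Vectors and threshold are illustrative.
import numpy as np

templates = {                       # word -> stored energy pattern
    "yes": np.array([0.9, 0.2, 0.1, 0.7]),
    "no":  np.array([0.1, 0.8, 0.6, 0.2]),
}

def recognise(pattern, max_distance=0.5):
    word, stored = min(templates.items(),
                       key=lambda kv: np.linalg.norm(kv[1] - pattern))
    # If the speaker's pattern is too far from every template,
    # recognition fails - the problem described above.
    return word if np.linalg.norm(stored - pattern) <= max_distance else None
```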

7
Q

Sound spectrograms

A
  • Illustrate phoneme spectra, cadence, intonation
  • Small time scale
    o Speech has a fine temporal structure
    o There are rapid fluctuations in acoustic energy
  • How do we make sense of this rapid acoustic energy?
    o The brain can bind phonemes and parse streams of acoustic energy into words
  • ~5% of children are perceptually speech impaired (LBLI – language-based learning impairment)
    o The brain struggles with this binding and parsing
    o Perceptual processing of the temporal structure of speech takes longer – talking slower helps
8
Q

Bottom Up Processing

A

maybe each phoneme is indicated by a specific fibre activation pattern (pattern encoding)
- Level: auditory fibres in the cochlea
o Tonotopic basilar membrane in the cochlea
o Maybe each spectrogram maps to a specific ‘neurogram’
o High frequencies at the base, low frequencies at the apex
 Each hair cell has a frequency it best responds to
o As you say a word
 Each phoneme has a different pattern of energy
 This maps straight from the cochlear map and travels upstream
 A different pattern of activation for each sound – the brain puts it together
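The base-to-apex frequency map can be made concrete with Greenwood’s frequency-position function; the constants below are the commonly cited human values, so treat the output as approximate:

```python
# Greenwood's approximate tonotopic map for the human basilar membrane.
# x = relative position, 0.0 at the apex to 1.0 at the base.
def greenwood_hz(x, A=165.4, a=2.1, k=0.88):
    return A * (10 ** (a * x) - k)

print(round(greenwood_hz(0.0)))   # apex -> ~20 Hz (low frequencies)
print(round(greenwood_hz(1.0)))   # base -> ~20,700 Hz (high frequencies)
```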

9
Q

Shortcomings (Speech processing – content)

A

o Coarticulation
 Sometimes we say a phoneme but the energy is different depending
on what comes after it
 /d/ will activate different fibres in cochlea basal membrane in different
situations – different words
 Acoustic energy is different but resulting perception is the same
 E.g. /di/ and /du/ have different spectrograms for the /d/, but in both instances we hear /d/
o Phoneme variance
 /t/ from different people have different spectrograms but we
understand each other’s /t/
 /t/ at 25 dB and /t/ at 60 dB have different spectrograms but both are understood as /t/ – whether whispered or loud
 Resulting perception is the same
 This indicates perceptual constancy is occurring
 This occurs on the word and sentence levels
 The stimuli are different but resulting perceptions are the
same
 E.g. ‘the’ – 50 different spectrograms – understood as ‘the’

10
Q

Perceptual constancy (Speech processing – content)

A

o Different incoming stimuli result in the same perceptual interpretation
o Examples of perceptual constancy
 Phoneme constancy
 Speech constancy
o Suggests a lot of top-down processing occurs in our understanding of speech – we build a knowledge base and rules about patterns that help us perceive speech

11
Q

Cortical , Top down Processing

A

speech perception needs some incoming bottom-up sensory information but cortical processing is critical
- Via experience, build a knowledge base to assist in interpreting and understanding sensory info coming in from the environment
o Apply knowledge base to the signal to make sense of it
- Level: auditory cortex
o A1 + secondary/tertiary bands
o A1 does not preferentially respond to speech – it responds to any noise
o Some association areas do preferentially respond to speech
- Wernicke’s area
o In left hemisphere
o Critical for hearing words
o Damage = intact hearing but can’t understand speech
 Receptive aphasia – can hear signal but not the semantics, has no meaning associated
- Broca’s area
o In left hemisphere
o Critical for speaking words
o Damage = intact hearing but can’t speak coherent words
 Expressive aphasia
 Can understand what is spoken but can’t put sounds together into
meaningful words

12
Q

Paralinguistic info

A
  • R hemisphere
    o Areas of temporal lobe, prefrontal cortex, limbic system
    o Strong activation to emotional intonation and non-speech utterances
    o Strong activation during speaker identification
    o Who is talking, are they in a good mood, are they asking questions?
  • Prosody – intonation
    o Signals intent – declarative/interrogative
    o Signals mood
  • Damage
    o Dysprosody = content intact but can’t understand intonation
    o Phonagnosia = content intact but can’t identify speaker
    o Autism – problems in superior temporal sulcus processing
      Understand speech fine but miss nuances – sarcasm, comedy, anger
      Intonation based, not content based
13
Q

Speech perception requires

A
  • Hearing
    o Functional auditory system – pinna to A1
    o Three basic levels of auditory processing – A1, parabelts, ventral/dorsal streams
  • Speech processing
    o Content – Wernicke’s, Broca’s – LH
    o Paralinguistic info – person, affective state, intentions
      If you don’t have these you miss the subtleties of human interaction
      RH laden
      STS, PFC, limbic system
14
Q

Phoneme recognition

speech processing

A

o Acoustic energy pattern
o Phonemic restoration effect
 You do need an incoming acoustic energy pattern – although that pattern may be missing some of the phonemes you ‘heard’
 Knowledge is used to fill in the missing phonemes
 People might not pronounce all the phonemes in a word – the brain fills in the gaps (the restoration effect)
o Indexical characteristics
 Knowledge about the person
 Their accent – using that info to help understand what they’re saying

15
Q

Context

speech recognition

A

o Language appropriate combinations of letters
o Topic of conversation
 Use context to help figure out what they’re saying

16
Q

Boundaries

speech recognition

A

o Language appropriate combinations of syllables
 There are no boundaries between the words you speak but you hear
separate words
 The breaks are an auditory illusion – acoustic energy is continuous but
you can parse the words
 Need to parse in the right places to separate the info into meaningful
words
 E.g. so I got out of bed this morning VS so I got out of bed this morn ing
o Knowledge of vocabulary
 Need knowledge of English and vocabulary
 We perceptually hear separate words based on knowledge of
language
 The only time we usually acoustically break is to signal ‘done’ in flow of conversation
 It sounds weird when people actually pause between words

17
Q

Supplementary sensory input

speech recognition

A

o Incoming visual input
 Lots of our hearing is also seeing
 Reading lips, body language, facial expressions, hand gestures
 Humans seem to be natural lip readers, e.g. the McGurk effect
 McGurk effect – what you see can override what you hear; with your eyes closed you hear the sound as produced, but with your eyes open the visual input can change what you hear
o It doesn’t matter what knowledge you bring
o With conflicting senses, the brain tries to make sense of the conflict
o The modality providing the more salient info takes over or combines with the other
o Seeing the articulators move visually changes the phoneme you hear
 Speech-reading greatly aids speech processing for a person with a cochlear implant
o Previously gained knowledge of how to interpret that input
 Knowledge about the visual information and how to interpret it

18
Q

Practice

speech recognition

A

o Becoming a native listener of a language takes years
 Exposure to the language builds up the top-down categories to understand the bottom-up categories
o Starts in utero?
 Newborn preference for maternal voice vs other females
 Newborn preference for maternal native language vs other languages
o The phoneme set is honed in infancy
 Unimportant phonemes start to be ignored from ~6 months
 Without exposure, the ability to discriminate them is lost
 E.g. in Japanese, /r/ vs /l/ discrimination is not useful → ignored
 Important vowels and consonants become delineated, defined
o As exposure to language and knowledge (vocab, syntax) progresses, this facilitates top-down recognition of language appropriate syllables, words, boundaries
 Start to be able to understand and parse quickly
 Need to be exposed to phonemes passively in infancy – is important
for language later in life
 Just hearing them will train the brain to discriminate – when it comes to learning the language later, the brain has been primed
o Baby talk – ‘motherese’
 When speaking to infants adults often unconsciously use higher pitch and frequent intonation fluctuation
 Also tend to do this with pets and partners
 Baby talk is different to the other situations
 We stress vowels when talking to human babies
 An articulated voice helps the infant learn the formants
o A1 forms a picture of the formants – babies can then learn them better later on – plasticity
 Unconscious attempt to amplify phonetic characteristics of native language especially vowel articulation
 Babies are born with the capacity to recognise global phoneme library
 Experience-dependent plasticity occurs in infancy, and only frequently heard phonemes persist
 Kuhl et al., 1997
 Want language diversity – give them exposure to as many phonemes as possible
 Melodies, combinations of notes – helps language abilities later

19
Q

Top Down Processing

speech perception

A

o Able to perceive and understand despite immensely variant acoustic signals
 Phonemes vary between and within individuals
 Phoneme pronunciation differs between individuals/accents
 Oftentimes phonemes missing or muddled
 Acoustic signal is extremely rapid
o Able to parse words and sentences despite continuous acoustic signal
 Robots have trouble with this
 We can build voice recognition, but it lacks the knowledge base to cope when the acoustic signal is a bit muddled or in an accent
o William James
 Part of what we perceive comes through our senses from the object
before us, another part always comes out of our own head
 Top-down processing
 Robots don’t have this – they lack this second component of perception
o Attempts to give a robot top-down processing by giving it a knowledge base
 If it has a detector and an open source database knowledge base, can see if the incoming info matches anything in the database and make sense of it

20
Q

Auditory scene analysis

A

o Able to delineate and perceive vocalisations in audibly cluttered
environments
 Robots also have trouble with this
 They pick up all the sounds as one signal
o Auditory scene = the whole array of sounds in your present environment
o Analysis = processing the auditory scene into separate auditory images;
source segregation (parsing)
 Achieved via the 3 basic levels of processing: signal to noise
optimisation, spatial localisation, sound recognition
o While not consciously attending to all the info, still process it subconsciously –
if something happens you can shift attention and pick up on salient
information
o Music playback on a speaker is a good example of ASA
 Even though the sound is coming through one speaker – one spatial location – you can still parse it apart
 Can tell the vocals from the instruments
 We can focus on one person talking and ignore all others

21
Q

Why Music

A
- A seemingly odd human behaviour
o Spans the globe, all ages, the socioeconomic spectrum
o Can be communicative, with or without words
 No semantic content, but you still know the feelings
- Has persisted for thousands of years
o Is it a useful means of communication?
 What messages can you convey?
 Being good at it attracts people to you
o Is it simply hedonic?
 Pinker’s idea of music as ‘auditory cheesecake’
 Do we just like it?
22
Q

Which came first, speech or music?

A

o Darwin
 Music came first
 A means of sexual selection
 People will be lured to you – brings potential mates
 Shows you have skill and creativity
o Spencer
 Language came first
 The prosody of speech segued into singing
o Diana Deutsch
 Speech-to-song illusion
 Loop a spoken phrase
 Over time it sounds like sung words
 The brain gets altered for that particular phrase – it sounds like singing every time you hear it
 Supports the idea that speech came first
 Inflections are looped – frequency and pitch changes
 Expand these and make them pronounced – singing
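A minimal sketch of preparing the looped stimulus; “phrase.wav” is a hypothetical mono clip:

```python
# Minimal sketch: loop a spoken phrase, as in the speech-to-song illusion.
# "phrase.wav" is a hypothetical mono recording.
import numpy as np
from scipy.io import wavfile

rate, phrase = wavfile.read("phrase.wav")
looped = np.tile(phrase, 10)      # repeat the phrase ten times
wavfile.write("phrase_looped.wav", rate, looped)
# After a few repetitions most listeners report the phrase
# starting to sound sung rather than spoken.
```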

23
Q

what is music?

A
  • Definition: a perceptual experience created by purposely combining a limited set of
    acoustic signals (notes)
  • Note: the fundamental ‘unit’ of music
    o Can vary in loudness
    o Has a specific pitch and duration
    o Notes with the same pitch and duration can still differ in timbre
    o Every voice category (or instrument) covers a specific range of frequencies
     These vary differently than in speech
    o Each note has a very specific frequency in Hz
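Those exact Hz values follow, in standard equal-temperament tuning with A4 = 440 Hz, from each semitone multiplying the frequency by 2^(1/12); a quick worked sketch:

```python
# Equal-temperament note frequencies (standard tuning, A4 = 440 Hz).
# Each semitone step multiplies the frequency by 2**(1/12).
def note_hz(semitones_from_a4):
    return 440.0 * 2 ** (semitones_from_a4 / 12)

print(note_hz(0))              # A4 = 440.0 Hz
print(note_hz(-12))            # A3 = 220.0 Hz (one octave down)
print(round(note_hz(-13), 2))  # G#3 = 207.65 Hz (one semitone below A3)
```
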
24
Q

Harmonics

A

o The sound wave that comes out of the hammer hitting the string
 Will have harmonics – overtones related to the fundamental frequency; it won’t just be that frequency by itself
 Creates complex waves
o Hit A3 and you create a 220 Hz wave, but with lots of wave forms on top of each other
o Harmonics start with the base frequency – the note
 The fundamental frequency
 Interwoven with its integer multiples – doubled, tripled, etc.
o Fundamental frequency = n
 n×2, n×3, n×4, n×5 …
 Different instruments give different patterns of harmonics for the same note
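A minimal sketch of building such a complex wave: the fundamental n plus harmonics n×2, n×3, …; the amplitude weights are illustrative, and it is exactly these weightings that differ between instruments:

```python
# Minimal sketch: a complex tone as fundamental + integer harmonics.
# The harmonic weights are illustrative; they set the timbre.
import numpy as np

rate = 44100
t = np.arange(rate) / rate               # one second of samples
n = 220.0                                # fundamental frequency (A3)
weights = [1.0, 0.5, 0.33, 0.25, 0.2]    # harmonics n*1 .. n*5

tone = sum(w * np.sin(2 * np.pi * n * (h + 1) * t)
           for h, w in enumerate(weights))
tone /= np.abs(tone).max()               # normalise to [-1, 1]
```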

25
Q

Pitch perception exhibits perceptual constancy

A

o Different incoming stimuli can result in the same pitch perception, as long as the fundamental frequencies are the same
 Two different notes have different characteristics – timbre due to the harmonics
 Each musician plays G3
 Different acoustic energy patterns – differing in the power spectrum of the harmonics – but it is the same note
 A guitar has a lot of higher-frequency harmonics
 A bassoon has more power in the second harmonic
o Even if you remove the fundamental frequency but maintain the harmonics, the same pitch perception can result
 The effect of the missing fundamental – see the sketch below
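A small sketch of the missing fundamental, assuming numpy: synthesise only harmonics 2 to 5 of 220 Hz, then confirm by autocorrelation that the waveform still repeats at the 220 Hz period, which is the pitch we hear:

```python
# Missing fundamental sketch: harmonics 2-5 of 220 Hz, with the
# 220 Hz component itself absent. The waveform still repeats
# every 1/220 s, the period heard as the pitch.
import numpy as np

rate, n = 44100, 220.0
t = np.arange(4410) / rate          # 0.1 s is enough for the demo
tone = sum(np.sin(2 * np.pi * n * h * t) for h in (2, 3, 4, 5))

ac = np.correlate(tone, tone, mode="full")[len(tone) - 1:]  # lags 0..N-1
lag = np.argmax(ac[50:]) + 50       # skip the trivial peak at lag 0
print(round(rate / lag, 1))         # ~220 Hz, despite no 220 Hz energy
```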

26
Q

Music perception requires

A
  • Hearing
    o Functional auditory system
    o Three basic levels of auditory processing
  • Music processing
    o Pitch, rhythm – don’t need to learn this when learning speech
    o Motoric? – motor processing inherently linked with auditory processing
    o Cognitive assessment
     Enjoyment, emotion, meaning
     Personalised, cultural, memory links activated
27
Q

Cortical processing of music

A
- Brain areas activated
o Everywhere – the whole brain 
 Even visceral
o Frontal – preferences 
o A1 – hearing stuff
 Belt regions – right belts and parabelts
 Music is more on the right side of the brain
 Left is more for language
o Motor cortex
o Sensorimotor
o Visual – watching music performed
28
Q

Pitch

A
  • The perceptual submodality of sound based on the frequency of the acoustic energy
    (~pitch/tone/note)
  • Different pitches can be patterned in a certain way to make a melody
    o We can usually still perceive and recognise melodies even if the pitches are wrong or the absolute interval is skewed
  • Melodies show perceptual constancy
    o Melodic constancy – as long as the relative pattern is maintained we can recognise usually
     Need to maintain melodic contour
     Melody is preserved and can make sense of it – recognise it
     If it is inverted – can’t recognise
  • Sometimes the relationship between the notes can be very small
    o Difficult to distinguish for some people
    o Small differences in Hz
  • Moving up the keyboard to the right increases frequency and tone height
29
Q

Pitch differences between individuals

A
  • Absolute pitch
    o Perfect pitch
    o Heightened ability to identify and name specific pitches
    o Used to be considered innate – however it correlates with early musical training, early blindness, and early linguistic exposure
     Train – take perceptual component and add in memory component
     Prime auditory cortex to listen to small differences – early exposure to music or tonal language
    o ~1 in 10,000
  • Relative pitch
    o Heightened ability to identify tonal intervals but less accurate at naming specific pitches
    o Can be honed, often seen in musicians
  • Normal pitch
  • Amusia
    o Tone deaf
    o Inability to differentiate small differences in pitch that characterise music
    o Can be difficult to perceive melodies
    o Can be congenital amusia or secondary to brain damage
    o Can be overcome with training
30
Q

Experience dependent plasticity

A
  • In monkeys, active training increases tone discrimination performance and auditory
    cortex remapping
  • In humans highly skilled musicians have ~25% increase in cortical representation of
    music tones
  • In humans, sound localisation ability is highly developed in conductors
    o Plasticity in belt regions and the brainstem
    o Superior olive and cochlear nucleus
31
Q

Rhythm

A
  • The temporal patterning of the acoustic signals
  • Does not necessarily need a melody but a melody always has a rhythm
  • Rhythm provides perceptual organisation to the music
    o Deviations from this organisation can seem pleasant or unpleasant
    o E.g. syncopation, jazz improv
  • Often drives the mood/feeling of music
32
Q

Music and emotion

A
  • If notes themselves are non-referential, then why do certain combinations of notes
    (melodies) make us feel
    o Sad, happy, proud, energetic
  • Perhaps due to
    o Familiarity/conditioning
    o A specific memory (an episodic link)
    o Intrinsic musical structure
      Tempo, loudness, key
      Instrument timbre
33
Q

Music feels

A
o Can use music to make you feel happier or to wallow in your sadness
o At the gym – an energy component
 Part of this is loudness
 Sympathetic nervous system activation – cortisol release, readying you for something in the environment to happen
 Amps you up physiologically
 Tempo
 Music is used to aid athletes’ performance
 BPM matched to heart rate or running pace – see the sketch below
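A toy sketch of the tempo-matching idea: pick the track whose BPM is closest to the target cadence or heart rate; the playlist and numbers are illustrative:

```python
# Toy sketch: match music tempo to a target cadence/heart rate.
# The playlist and BPM values are illustrative.
playlist = {"Track A": 128, "Track B": 150, "Track C": 174}   # BPM

def best_track(target_bpm):
    return min(playlist, key=lambda name: abs(playlist[name] - target_bpm))

print(best_track(172))   # -> "Track C"
```
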
34
Q

Music has power to affect you in a way that language alone doesn’t

A

-Conditioned from an early age
 Religious sounds, radio sounds, funeral sounds, sports sounds
 Conditioned as a group – society – you know certain songs are
happy
 Conditioned to feel proud – national anthem
-Memory
 Brings you back to a time from the past
 Unique to you as an individual
-Intrinsic
 Sound waves create emotion
 Keys – consonant chords are pleasant, dissonant chords are
not
 Certain instruments for happy music – have timbre with a
certain feel
 Peter and the Wolf – each animal has an instrument that matches the character

35
Q

Music Therapy

A

o Music is increasingly being used as part of speech therapy for aphasia
o Activates the right A1 – the side lacking language areas
o In congenital aphasia, or selective aphasia
 Music seems to be a way to release inhibition in the brain and let the person speak again

36
Q

Sack’s hypothesis

music therapy

A

 The right Broca’s area is inhibiting the left Broca’s area – aberrant inhibition across hemispheres
 Maybe music can distract or engage the right so that it disinhibits the left and lets it work again
 The right-hemisphere correlate of Broca’s region – Broca’s area itself is usually on the left
 Functional recovery

37
Q

Aphasia that lacks prosody

music therapy

A

 After a stroke, patients may regain normal speech content
 But sound robotic – inflection is lost
 Use music to re-engage the right hemisphere
 The paralinguistic prosody area is in the right hemisphere – patients learn to use inflection again
 Get patients to sing things, then tone the intonations down to a normal speech level

38
Q

Tourette’s, autism, movement disorders (PD)

music therapy

A

 “we listen with our muscles” Nietzsche
 Sacks
 Patients in the 1960s who were catatonic could be ‘woken up’ with music
 They would come up out of their wheelchairs and respond to things
o Start to walk around
o There is a motoric component to music that stimulates you to move
o All of a sudden the catatonic patients were getting up and dancing – environmental enrichment, no drugs or anything else needed
 Even babies will move to music – innate tendencies to move to a beat
 Also in animals
 We respond - energy in patterns makes us want to move

39
Q

Memorisation

music therapy

A

 Melody helps you remember and encode information

 Will remember a song you liked from 10 years ago
