Lec 10/ TB Ch 11 Flashcards

1
Q
  • Lowest freq of harmonic spectrum: 2 names
  • 1st vs 2nd harmonic freq
  • Missing-fundamental effect
    • definition
    • LS graphs
    • RS graphs
      • connection to fundamental freq
  • Rogue waves
A

Complex sounds

  • Harmonics
    • Lowest frequency of harmonic spectrum:
      • Fundamental frequency
        • 2nd harmonic is 2x as high as fundamental freq

Missing fundamental part 1

  • The missing fundamental (harmonic #1) is not, or only barely, noticed.
    • Harmonic #1 is deleted/not played
      • How do we still perceive it?
    • Hear a tone, we hum it back
    • If the tone lacks the fundamental freq, we still perceive (and hum) that pitch
    • The auditory system fills in the missing fundamental freq
    • It happens w/ a full set of harmonics or just a few
  • More harmonics can be missing. How come? - demo
    • LS: pure tones w/ increasing frequency
    • RS #1: when all freq are played together, we only perceive the fundamental frequency
    • RS#2: when all freq are played and the fundamental freq is removed, we still only perceive the fundamental freq
    • # 3&4: still hear faint fundamental freq
  • RS#1: Looking at the summed sound pressure wave, it repeats at the period of the fundamental freq
  • This explains why the fundamental freq is perceived even though it is not there
  • This is similar to rogue waves
    • In the ocean, waves can be superimposed
2
Q
  • graphs 1,2,3
    • how are they related
  • Special feature of graph 4
  • How can you hear 250 Hz?
  • What explains this?
    • place code vs temporal code?
A

Missing fundamentals part 3

  • Graphs #1,2,3: 500, 750, 1000 Hz (differ by increments of 250 Hz)
    • missing fundamental harmonic: 250 Hz
  • How can you hear 250 Hz in Graph 4?
    • Graph 4: all 3 waveforms superimposed; the peaks of the 2nd, 3rd & 4th harmonics align every 4 ms
      • These three waveforms come into alignment every 4 ms, which is exactly the period of their fundamental frequency, 250 Hz (see the numeric sketch below)
      • Added up, they yield a fluctuation in energy at 250 Hz
      • Indeed, every harmonic of 250 Hz has an energy peak every 4 ms
      • This explains why listeners perceive the pitch of this complex tone as 250 Hz, even though the tone has no 250-Hz component
    • What explains this: place code vs temporal code?
      • On the basilar membrane, only regions sensitive to 500, 750, and 1000 Hz are active
      • → The place code does not explain the missing-fundamental effect
      • It must be the temporal code that explains it: the superimposed waveforms drive the population response of neurons
        • Neurons respond especially strongly at the rate of the fundamental freq
        • The population code shows increased activity across multiple neurons, following the volley principle
        • Reflecting the fundamental freq of 250 Hz
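A quick numeric check of the 4-ms claim above (a sketch of mine in Python/NumPy, not lecture code): summing the 500-, 750-, and 1000-Hz harmonics gives a waveform that repeats every 4 ms, the period of the absent 250-Hz fundamental.

```python
# Sketch (assumes NumPy): sum the 500/750/1000 Hz harmonics and verify the
# summed wave repeats every 4 ms (= 1/250 Hz).
import numpy as np

fs = 48_000                       # sample rate in Hz (arbitrary choice)
t = np.arange(0, 0.02, 1 / fs)    # 20 ms of signal
wave = sum(np.sin(2 * np.pi * f * t) for f in (500, 750, 1000))

# The autocorrelation peaks again at the waveform's period.
ac = np.correlate(wave, wave, mode="full")[len(wave) - 1:]
lag = np.argmax(ac[100:]) + 100   # skip the trivial zero-lag peak
print(f"period = {1000 * lag / fs:.2f} ms")   # -> 4.00 ms, i.e. 250 Hz
```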
3
Q
  • Timbre
  • What does perception of timbre depend on?
  • “Timbre contrast” or “timbre aftereffect” experiment
    • vowel “ee”: # of formants?
    • Methods
      • Graph A
      • Graph b
      • Graph c
    • Conclusion
  • Attack vs decay
    • definition
    • amplitude change
A

Complex sounds cont

  • Timbre: Psychological sensation by which a listener can judge that two sounds that have the same loudness and pitch are dissimilar; conveyed by harmonics and other high frequencies
    • Possibly also conveyed by additional noise components?
    • Perception of timbre depends on context in which sound is heard
    • Experiment by Summerfield et al. (1984) - “Timbre contrast” or “timbre aftereffect”
  • Study – freq spectrum of vowel ‘ee’
    • RS: “ee” has a peak at a lower harmonic, a middle peak, and a third peak (3 peaks in the harmonic spectrum = the fingerprint of a vowel = formants)
    • # 1: Summerfield used synthesizer to create artificial sounds
      • Played the one on LS (a)
        • It looks like a broken comb
        • Some harmonics are deleted (3 patches)
        • Rs played this tone first where the harmonics are deleted (similar to color adaptation: stare at red spot)
      • Then played (b)
        • (similar to color adaptation: stare at white wall -> see afterimage of green dot)
        • Then you hear an aftersound
        • Their perception was influenced by what they heard before
    • IOW: Rs first played the opposite of the vowel “ee”
    • Ppl adapt to:
      • The first (lowest) patch of harmonics -> it doesn’t appear as loud afterwards
      • Same for the other (red) patches -> reduced ability to hear those freq
      • But the intermediate freq -> you still hear them well
      • As a result, what they hear sounds like the vowel “ee”
    • As such, the context (adapting sound @a) matters
  • Attack and decay of sound
    • How sound starts/ends
    • Attack: Part of a sound during which amplitude increases (onset)
    • Decay: Part of a sound during which amplitude decreases (offset)
    • Ex. pluck string: attack is immediate
    • Ex. violin & bow: not as immediate
4
Q
  • Auditory Scene Analysis
  • Auditory scene
  • Multiple sound sources in env (complex sounds) - How does auditory system sort out these sources?
    • 2 names
    • Example: frog + bird + splash
  • strategies to segregate sound sources
    • 1: Spatial separation
      • example: frog & bird
    • 2: motion parallax
    • 3: spectral/temporal qualities/ auditory stream segregation
      • define
      • describe graphs
      • This aligns w/ which gestalt law?
      • Bach fugue example
    • 4: group by timbre
      • Bach organs & timbre
      • This aligns w/ which gestalt law?
    • 5: grouping by onset
      • definition
      • Rasch (1987): what helps to distinguish 2 notes from each other?
      • This aligns w/ which gestalt law?
        • Spectrogram vs Spectrum dimensions
      • Did the bottle break
        • Top vs bottom
        • Timbre
  • Multisensory integration
    • definition
    • Which 2 sensory modalities use this?
    • Ventriloquist effect
A

Hearing in the env

  • Auditory scene: the entirety of sounds that is audible in a given moment and that conveys information about the events happening in that moment
  • Multiple sound sources in env (complex sounds) - How does auditory system sort out these sources?
    • Source segregation, or auditory scene analysis (i.e. separate the sounds of frog + bird + splash)
    • Auditory Scene Analysis (how does the auditory system know which sounds to group together and which to segregate)
      • # of strategies to segregate sound sources
    • # 1: Spatial separation between sounds (frog on left croaks; bird on right chirps)
    • # 2: motion parallax (when you move around, sound sources move past your head -> spatial separation)
    • # 3: Separation on basis of sounds’ spectral or temporal qualities
    • Auditory stream segregation:
      • Perceptual organization of a complex acoustic signal into separate auditory events for which each stream is heard as a separate event
        • Y-axis = freq
        • X-axis = time
        • LS: Dalaldala
          • The freq/pitch of 2 sounds are close together
          • So our auditory system groups these 2 sounds together based on the Gestalt principle of similarity (Lec 4)
        • RS: The freq/pitch of 2 sounds are far -> don’t group them together
    • Gestalt law: similarity
      • Bach: One instrument playing, but it sounds like 2 is playing
  • # 4: Grouping by timbre
    • Organ can play same pitch w/ different timbres (2 type of pipes made of different material)
    • Can tell what belongs to what
    • This resembles the gestalt law of good continuation, based on the tone color (timbre)
  • # 5: Grouping by onset
    • Harmonics of a speech sound or music:
      • If the sounds start together -> we tend to group them together
      • If the sounds have diff onsets, we don’t group them together
    • Grouping different harmonics into a single complex tone
    • Rasch (1987): It is much easier to distinguish two notes from one another when the onset of one precedes the onset of the other by a very short time
    • -> Resembles the Gestalt law of common fate
    • Does the bottle break?
    • Spectrogram: A pattern for sound analysis that provides a 3D display of intensity as a function of time and frequency (see the sketch after this list)
      • Spectrum: 2D (x-axis = freq; y-axis = power)
      • Spectrogram: 3D, with color used for the extra dimension
        • 1st dimension (vertical axis) = freq
        • 2nd dimension (color) = power or intensity
        • 3rd dimension (horizontal axis) = time
        • Based on the spectrogram, we can tell if the bottle is bouncing or breaking
      • Top graph = bounce
        • Fingers or spikes: certain frequencies have more power than others, and the same frequencies recur across bounces -> the timbre stays the same
        • Since the timbre is constant, we can tell the bottle didn’t break
      • Bottom graph = broke
        • After drop, there are different pieces -> messy
        • The different pieces bounce off the floor at different times
        • Each vertical spectrum is different; the timbres are different
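As a rough illustration of the three dimensions just described (a sketch of mine, assuming NumPy, SciPy, and Matplotlib are available; the decaying 3-kHz “clink” is an invented stand-in for the bottle sound, not lecture data):

```python
# Sketch: plot a spectrogram -- time on x, frequency on y, intensity as color.
import numpy as np
from scipy.signal import spectrogram
import matplotlib.pyplot as plt

fs = 22_050                                   # sample rate (Hz)
t = np.arange(0, 1.0, 1 / fs)
clink = np.sin(2 * np.pi * 3000 * t) * np.exp(-30 * t)  # invented test sound

f, times, Sxx = spectrogram(clink, fs=fs, nperseg=512)
plt.pcolormesh(times, f, 10 * np.log10(Sxx + 1e-12))    # color = intensity (dB)
plt.xlabel("Time (s)")                        # 3rd dimension: horizontal axis
plt.ylabel("Frequency (Hz)")                  # 1st dimension: vertical axis
plt.show()
```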

Auditory scene analysis

  • Multisensory integration: vision (usually) helps audition to tell what belongs together
  • Ventriloquist effect: An audio-visual illusion in which sound is misperceived as emanating from a source that can be seen to be moving appropriately when it actually emanates from a different invisible source.
    • When there are 2 ppl and only 1 person is moving their lips, we tend to perceive the sound as coming from that person
    • Visual dominance for location.
5
Q
  • restoration effect in visual modality - describe top 2 figures
  • Which gestalt law is used here?
  • Kluender and Jenison - restoration effect for sound
    • methods: 2 steps
      • 2 conditions
      • results
  • Restoration effect for complex sounds
    • What sources of info is used?
    • Study: The mailman brought the letter.
      • 3 conditions (see spectrogram)
      • explanation
A

Continuity and restoration effects

  • Restoration based on the gestalt law of good continuation: In spite of interruptions, one can still “hear” sound
    • LS: looks like a bunch of scattered “chicken feet” fragments
    • RS: when you occlude them partially, our visual system perceives the fragments as continuous -> we see a cube
    • Experiments that use signal detection task (e.g., Kluender and Jenison) suggest that at some point, restored missing sounds are encoded in the brain as if they were actually present!
    • Methods:
      • # 1: Rs played a pure tone over time (red line)
      • # 2: At some time, they played some white noise (ex. radio w/ poor reception, all sorts of energy/freq) = grey box
      • There are 2 conditions
        • Condition 1: you hear the pure tone and the noise simultaneously
        • Condition 2: you hear the pure tone, then the noise, then the pure tone again
        • Results: used a signal detection measure (d’) of whether listeners could tell the two conditions apart (see the d’ sketch below)
        • Many people’s d’ = 0
        • They cannot tell whether the tone kept playing during the noise or was interrupted
        • It seems the brain reconstructed a perception of the sound based on the gestalt law of good continuation
    • This also applies if Rs plays a pure tone that gradually increases in pitch
    • -> restoration effects for pure tones
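For the d’ measure mentioned above, a minimal sketch of mine (assumes SciPy; the rates are invented examples): d’ = 0 means the restored tone is indistinguishable from a physically continuous one.

```python
# Sketch: d' from hit and false-alarm rates, as in a detection task like
# Kluender and Jenison's.
from scipy.stats import norm

def d_prime(hit_rate: float, fa_rate: float) -> float:
    """d' = z(hit rate) - z(false-alarm rate)."""
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

print(d_prime(0.70, 0.70))  # 0.0   -> listener cannot tell the conditions apart
print(d_prime(0.90, 0.10))  # ~2.56 -> easy discrimination
```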
  • Restoration of complex sound, (e.g., music, speech)
    • Here, we don’t just use gestalt law of good continuation/ auditory info, we use “Higher-order” sources of information
    • Ex. we may use semantic info
      • Ex. the phoneme “wh” is missing, replaced as white noise
        • Ex 1: “The *eel fell off the car.” (wheel)
          • we perceive this as “wheel”
        • Ex 2: “The *eel fell off … the table.” (meal)
          • we perceive this as “meal”

The restoration effect

  • our perceptual system is taking into account there is smth occluding the sound you actually want to hear
  • Phonemic restoration: Noise helps comprehension
    • Spectrogram for the sentence “The mailman brought the letter.”
      • For a = normal sentence
      • B: remove certain parts of the sentence (no noise) -> cannot understand sentence
        • There’s nothing, so the perceptual system doesn’t see a reason why we aren’t hearing this -> does not complete the sentence
      • C: replace certain parts of the sentences w/ white noise -> can understand it
        • Since there is white noise, perceptual system assumes smth is missing -> completes the sentence
      • Similar to the chicken feet phenom earlier
        • LS: no bars, no reason to believe it is a cube
        • RS: there are bars occluding things -> likely to see a cube
6
Q
  • Pythagoras: music & math, planets & math
  • Musical notes freq range
  • Pitch
  • Freq most audible for us
  • Max freq musical instruments can play
  • Freq most played by instruments
  • Why aren’t there many instruments playing 5000 Hz?
    • 2 reasons
  • Octave
  • 2 dimensions of musical pitch
    • tone height
    • tone chroma
    • musical helix - describe
  • Chords
  • dissonant
  • consonant
  • distance b/w
    • 1st → 2nd harmonic
    • 2nd → 3rd harmonic
    • 3rd → 4th harmonic
    • 4th → 5th harmonic
  • Ratios count
  • What relationship between notes is universal?
  • Western versus Javanese on # of notes in an octave?
  • Musicians’ estimates of intervals between notes vs infant?
  • Melody
    • definition
    • aka
    • Defined by relationship between ??? not ???
  • Tempo
  • Fugue
  • Rhythm: is it specific to music?
  • Bolton (1894) study
    • method
    • result
A

Music

  • Music is a way to express thoughts and emotions
  • Pythagoras: Numbers and music intervals
    • Found that music has mathematical regularities, and thought this was related to the distances b/w planets
  • Some clinical psychologists practice music therapy
  • Musical notes
    • Sounds of music extend across a frequency range from about 25 to 4500 Hz (perceived as pitch)
    • Pitch: The psychological aspect of sounds related mainly to the fundamental frequency
    • Bottom of the red curve = bottom of the blue graph
      • Based on both graphs, our best audibility is around 4500 Hz
  • musical instruments can play up to 4500 Hz, not beyond
  • Also note: we can perceive 5000 Hz better than 100 Hz (based on the red graph)
    • There are many instruments that can play 100 Hz (ex. guitar, harp, piano), but not 5000 Hz -> why?
    • Our temporal code does not work at 5000 Hz, and the temporal code is essential for enjoying music
      • Listeners have great difficulty perceiving octave relationships between tones when one or both tones are greater than 5 kHz -> limit of temporal frequency coding
      • For sounds below this freq, we can perceive the temporal patterns
  • Octave: The interval between two sound frequencies having a ratio of 2:1
    • Example: Middle C (C4) has a fundamental frequency of 261.6 Hz; notes that are one octave from middle C are 130.8 Hz (C3) and 523.2 Hz (C5)
    • C3 (130.8 Hz) sounds more similar to C4 (261.6 Hz) than to E3 (164.8 Hz)
    • There is more to musical pitch than just frequency!
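As a quick arithmetic check (a sketch of mine), the 2:1 octave ratio reproduces the example frequencies above:

```python
# Sketch: octave = 2:1 frequency ratio, starting from middle C (C4).
C4 = 261.6             # Hz, middle C
print(C4 / 2)          # 130.8 Hz -> C3, one octave below
print(C4 * 2)          # 523.2 Hz -> C5, one octave above
```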

Musical pitch has 2 dimensions

  • Tone height: A sound quality corresponding to the level of pitch. Tone height is monotonically related to frequency
  • Tone chroma: A sound quality shared by tones that have the same octave interval
    • Each note on the musical scale (A–G) has a different chroma
  • Musical helix: Can help visualize musical pitch
    • Notes go around the helix
    • C2 and C3 sit at the same angle on the helix (both shown in green): same chroma, different tone height (see the sketch below)
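A sketch of mine, not from the lecture, of how the helix can be computed: height rises with log-frequency (tone height), while the angle wraps around once per octave (tone chroma), so notes an octave apart land at the same angle.

```python
# Sketch: place a frequency on the musical helix (angle = chroma, height = log f).
import math

def helix_coords(freq_hz: float, ref_hz: float = 261.6):   # ref = middle C
    octaves = math.log2(freq_hz / ref_hz)                  # tone height
    angle = (octaves % 1.0) * 360                          # tone chroma (deg)
    return angle, octaves

print(helix_coords(130.8))   # C3: angle 0 deg, height -1.0 (same chroma as C4)
print(helix_coords(261.6))   # C4: angle 0 deg, height  0.0
print(helix_coords(164.8))   # E3: angle ~120 deg -> different chroma
```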

Music - patterns

  • Chords: Created when two or more notes are played simultaneously
  • Consonant: Have simple ratios of note frequencies (3:2, 4:3)
  • Dissonant: Less elegant ratios of note frequencies (16:15, 45:32)
    • If you listen to a lot of jazz, some dissonant sounds can come to seem consonant
  • 1st harmonic -> 2nd harmonic = octave
  • 2nd harmonic -> 3rd harmonic = 5th
  • 3rd -> 4th harmonic = 4th
  • 4th -> 5th harmonic = 3rd
  • Ratios count: chords across several octaves perceived as the “same”.
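A sketch of mine restating the list above in code: the interval between successive harmonics is just the ratio of their harmonic numbers.

```python
# Sketch: intervals between successive harmonics of an arbitrary fundamental.
intervals = {"octave": (2, 1), "fifth": (3, 2), "fourth": (4, 3), "third": (5, 4)}
for name, (hi, lo) in intervals.items():
    print(f"harmonic {lo} -> {hi}: ratio {hi}:{lo} = {hi / lo:.3f} ({name})")
```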

Cultural differences

  • Some relationships between notes, such as octaves, are universal
  • Research on music perception: Western versus Javanese
    • Javanese culture: Fewer notes within an octave; greater variation in note’s acceptable frequencies
    • Pelog scale in Javanese music
    • Musicians’ estimates of intervals between notes correspond to the music scale from their culture
    • Infants detect inappropriate notes within both scales (there’s smth innate in both of these scales)

Melody

  • Melody: An arrangement of notes or chords in succession (chroma (not tone height) & rhythm) forming a gestalt
    • Examples: “Twinkle, Twinkle, Little Star” or “Baa Baa Black Sheep”
  • Defined by relationship between successive notes, not specific sounds
  • Melodies can change octaves or keys and still be the same melody
  • Notes and chords vary in duration
  • Tempo: The perceived speed of the presentation of sounds
  • Melodies/themes as (gestalts) building blocks of music
  • Fugue: a compositional technique (in classical music) in two or more voices, built on a subject (theme) that is introduced at the beginning and then repeated at different pitches.
    • A long-held chord -> the pattern/theme recurs in the fugue at different pitches

Rhythm: not just in music

  • Lots of activities have rhythm: Walking, waving, finger tapping, etc.
  • Bolton (1894) played sounds perfectly spaced in time. Listeners are predisposed to group sounds into rhythmic patterns
  • Ex. we hear rhythmic patterns in car and train rides
7
Q
  • # of speech sounds
  • vocal tract
  • 3 parts of speech production & organs
  • respiration & phonation
    • process of initiating speech
    • children vs adult vocal folds
      • → voice pitch
  • Articulation
    • What are sounds modulated by?
    • how do we change the shape of vocal tract?
A

Speech production

  • Humans are capable of producing lots of different speech sounds:
    • ~5,000 languages spoken today, utilizing >850 different speech sounds
    • -> flexibility of the human vocal tract: it can learn to produce more than 850 diff speech sounds
  • Vocal tract: The airway above the larynx used for the production of speech
  • – Respiration (lungs): also diaphragm and trachea
  • – Phonation (vocal cords): also larynx (voice box)
  • – Articulation (vocal/oral tract): the circled area on the slide
    • Involves alveolar ridge, tongue, hard palate, soft palate/velum, teeth, lips, epiglottis, nasal tract, pharynx (air flow through)
  • Respiration and phonation
    • Initiating speech: Diaphragm pushes air out of lungs, through trachea, up to larynx
    • At larynx: Air must pass through two vocal folds
      • Children: Small vocal folds -> higher pitched
      • Adult men: Larger mass of vocal folds (or when you have a cold) -> lower pitch
      • Thus, the larynx produces the basic (source) sound with the vocal folds
  • Articulation (sounds are modulated by vocal tract)
    • Area above larynx: Vocal tract
    • Humans can change the shape of their vocal tract by manipulating their jaws, lips, tongue body, tongue tip, and velum (soft palate)
      • Resonance characteristics created by changing size and shape of vocal tracts to affect sound frequency distribution
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q
  • formants
  • Describe top graphs: a → b → c
  • What is the vocal tract doing?
  • In the spectrogram, what do the 3 stripes correspond to?
    • Boot vs bat
A

Formants

  • Peaks in speech spectrum: Formants
    • Labelled by number, from lowest to highest (F1, F2, F3, …)—concentrations in energy occur at different frequencies, depending on length of vocal tract
    • If LS top graph = the harmonic spectrum of what comes out of your vocal folds/larynx
      • Vocal tract will modify it to look like LS bottom graph
      • Ex. top graph: fundamental freq most powerful -> bottom: 5th harmonic has the most power
      • These peaks = formants, also seen in the timbre aftereffect
    • So the vocal tract acts like a filter function that amplifies certain freq (F1, F2, F3) (see the filter sketch at the end of this card)
    • This depends on shape of vocal tract
  • Formants show up in spectrograms as bands of acoustic energy that undulate up and down, depending on the speech sounds being produced.
  • Spectrogram
    • X-axis = time
    • Y-axis = freq
    • Color: energy
      • The 3 stripes = the formants (F1, F2, F3)
  • The speech sounds produced depend on the tongue position (front, back, high, low)
    • Ex. boot (back, high) vs bat (front, low)
  • Spectrogram: A pattern for sound analysis that provides a three-dimensional display plotting time on the horizontal axis, frequency on the vertical axis, and intensity on a color or gray scale
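The filter sketch referenced above (mine, assumes NumPy; the formant values are illustrative, not the lecture's): the vocal tract as a filter that boosts harmonics near the formant frequencies, reshaping the spectrum that leaves the larynx.

```python
# Sketch: a harmonic "source" spectrum from the larynx, reshaped by a toy
# vocal-tract filter with one resonance bump per formant.
import numpy as np

f0 = 125                                  # Hz, glottal fundamental frequency
n = np.arange(1, 41)
harmonics = f0 * n                        # source spectrum: harmonic comb...
source = 1.0 / n                          # ...whose amplitude falls with freq

formants = [500, 1500, 2500]              # assumed F1, F2, F3 (Hz)
def tract_gain(f):
    return sum(np.exp(-((f - F) / 150) ** 2) for F in formants)

output = source * tract_gain(harmonics)
print(f"strongest harmonic after filtering: {harmonics[np.argmax(output)]} Hz")
# -> 500 Hz (near F1), even though the source peaked at the 125 Hz fundamental
```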
9
Q
  • what do VOWELS do to the vocal tract?
    • how do we produce diff vowels?
  • 3 modes of obstruction when producing consonants
    • bah vs dah vs gah
    • s vs t
    • zoo vs sue
  • Issue when we have to produce speech fast?
  • Coarticulation
    • Definition
    • Explanation
  • Bah vs boo spectrogram
    • why does the b look different?
  • How are sounds perceived despite coarticulation?
  • Categorical perception
    • Sound: bah, dah, gah spectrum
      • For the sounds w/in categories, how do ppl perceive them?
    • Visual: blue → green spectrum
      • For the colors w/in categories, how do ppl perceive them?
  • McGurk effect Exp
  • How do we use categorical perception despite a lack of invariances?
    • It depends on X??
    • Color cube
    • Baa vs Boo
    • Why does eebah and oodah look the same?
    • How do we perceive syllables?
  • What other phenom is this also related?
  • explain
  • Baa vs boo
A

Classifying speech sounds

  • Speech sounds: Most often described in terms of articulation
  • Vowels open the vocal tract; different vowels are made with differently shaped tongue and lips
  • Consonants obstruct the vocal tract: 3 modes that obstruction can happen
    • Place of articulation (e.g., at lips, at alveolar ridge, etc.)
      • Ex. “bah”: b is obstructing air flow w/ lips
      • Ex. dah: d obstructs air flow w/ alveolar ridge
      • Ex. gah: g obstructs air flow w/ back of the tongue
    • Manner of articulation (totally/partially/slightly obstructed airflow)
      • “s”: partially obstructed
      • “t”: totally obstructed
    • Voicing: Whether the vocal cords are vibrating or not
      • Ex. zoo from Z = vibrating starting w/ “z”
      • Ex. Sue = vocal folds are not vibrating during the “s”
  • Speech production is very fast
    • Since we need to move things in our vocal tract really fast, they might not be at the ideal position to create each sound
  • Coarticulation: The phenomenon in speech whereby attributes of successive speech units overlap in articulatory or acoustic patterns
    • Inertia prevents tongue, lips, jaw, etc. from moving too fast
      • Ex. you can only move your tongue so fast
      • Ex. Bah vs boo spectrogram
        • There’s overlap w/ the vowel that comes after the b
        • Here the b for the two words looks different
        • B is different depending on what comes next
    • Lack of invariance (there’s variance)
  • -> How are sounds perceived despite coarticulation?
  • Categorical perception
    • (Artificial) sound stimuli can vary continuously from “bah” (LS) to “dah” (middle) to “gah” (RS)
      • Aka continuum
        • For visual perception: if it is blue -> green, we see blue, bluish grey, grey, greenish grey, green
    • But people do not perceive continuous variation; they perceive sharp categorical boundaries b/w stimuli, and don’t perceive differences b/w sounds within a category (see the identification-curve sketch below)
      • Ex. we perceive bah, dah, gah
      • If you play things b/w bah and dah, ppl either say bah or dah and are absolutely convinced about it, even though the stimuli are ambiguous
  • McGurk effect Exp: See gah, hear bah. Perceive dah
    • audiovisual illusion that illustrates categorical perception
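A toy model of mine (the numbers are invented, not experimental data): a logistic identification curve over a 10-step bah-to-dah continuum shows the sharp category boundary described above.

```python
# Sketch: near-0 then near-1 identification with an abrupt flip -- listeners
# do not report the in-between blends they would see on a blue->green strip.
import math

def p_dah(step: int, boundary: float = 5.0, steepness: float = 2.5) -> float:
    """Probability of reporting 'dah' at one step of the continuum."""
    return 1 / (1 + math.exp(-steepness * (step - boundary)))

for step in range(1, 11):
    heard = "dah" if p_dah(step) > 0.5 else "bah"
    print(f"step {step:2d}: P(dah) = {p_dah(step):.2f} -> heard as {heard}")
```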
  • So then: how does categorical perception work despite the lack of invariance?
    • It depends on context
    • Similar to color perception
      • We see the red tile in the shadow as red, just as we see the red tile in the light as red
      • The top tile looks brown, the one in the shadow looks orange: they are physically the same, but we perceive them differently
      • Perception of speech depends on the context (ex. what you hear b4)
  • Back to coarticulation …
    • Ex. Baa vs boo = b are very different depending on the vowel coming after b/c of coarticulation
    • It’s more complicated
    • ‘bah’ after ‘ee’ (LS top graph) sounds very similar to ‘dah’ after ‘oo’ (RS bottom graph)
      • Here “bah” and “dah” look the same due to coarticulation
        • We perceive syllables on the basis of the relative change in the spectrum: spectral contrast.
      • Spectral context (ex. we just heard “ee” or we just heard the “oo”)
      • Relative to the spectral context, we hear “b” and “d” differently
  • This is also related to the timbre aftereffect
    • Spectral contrast results in adaptation to timbre
    • So we perceive the sound of graph (b) as “ee”, as in the Summerfield experiment
    • We can modify things based on the spectral contrast/context
10
Q
  • do we use only one cue to recognize faces/objects/sounds?
  • Does this work for languages we are learning?
  • So what does perception depend on?
  • Prenatals
    • what voice do their prefer?
    • What language do their prefer?
  • “r”/”l” for Japanese vs English
  • bb w/ English speaking parents vs Bb w/ Hindi speaking parents: describe graph
  • Conclusion
  • 2nd challenge when learning a 2nd language
  • Saffran, Aslin, and Newport (1996):
    • Methods: learning new language
      • 3 steps
    • Statistical learning
A
  • Using multiple acoustic cues (for speech perception, we use multiple cues together to recognize speech sounds)
    • This is similar to face recognition
      • Similar features can be used in different combinations and we can rely on multiple cues to recognize a face
  • -> use multiple cues: this doesn’t work well on languages we are still learning
    • So, Perception depends on experience
  • Learning to listen
    • Babies learn to listen even before they are born!
    • Prenatal experience: Newborns prefer hearing their mother’s voice over other women’s voices
    • 4-day-old French babies prefer hearing French over Russian (they prefer the language their moms speak)
  • Becoming a native listener
    • There are more than 850 speech sounds, but not all of them are used in each language; we do not need the ability to distinguish between all of them
    • Sound distinctions are specific to various languages
    • Example: “r” and “l” are not distinguished in Japanese
    • Infants begin filtering out irrelevant acoustics long before they start to say speech sounds
    • Exp: bb w/ English speaking parents vs Bb w/ Hindi speaking parents
      • When you play Hindi sounds to 6-8 mo bb w/ English speaking parents -> they are responsive
      • Their responsiveness declines with age
      • 10-12 mo bb w/ English speaking parents: not responsive, they perceive them as irrelevant
      • 11-12 mo bb w/ Hindi-speaking parents -> still very responsive
    • Thus, as we grow, our mind spends more resources on learning our own language, so it only responds to sounds that are relevant in our language
  • Another challenge when learning a 2nd language
    • Learning words
      • How do we know where one word ends and another begins?
      • Ex. the spectrogram for the sentence “where are the silences b/w words”
        • Here “where are the” is spoken continuously w/ very little pauses
        • We can be choppy: where, are, the; but no one speaks like this
        • For new learners, it becomes difficult to tell where words end/start
  • Bb learn new words really quickly: hear a word once, know it
  • Research by Saffran, Aslin, and Newport (1996):
    • # 1: Created a novel language (w/ its own phonetic rules) and infants listened to sentences for two minutes
      • Phonetic rules: how likely is “s” coming after “p” (ex. spoon)
        • IOW: how likely are certain phonemes in this specific sequence (examine w/in words rather than b/w words?)
      • tupiro, golabu, dapiku, tilado
    • # 2: Played novel words from that language that bb had never heard b4 (those novel words follow the same phonological rules)
    • Afterwards, infants could already distinguish between words and non-words in the novel language
  • Statistical learning: Certain sounds (those making up words) are more likely to occur together, and babies are sensitive to those probabilities (see the sketch below)
    • How can you tell if sounds belong to the same words vs 2 different words being pieced together
      • IOW: The phonological rules w/in words are regular; while the phonological rules b/w word (from one word to another) do not apply anymore
      • This is how we can tell and infants learn them fast
  • Saffran suggests that bb learn to pick words out of the speech stream by accumulating experience w/ sounds that tend to occur together
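A toy version of mine of the Saffran et al. idea (the word list is theirs; the stream construction is my simplification): transitional probabilities between syllables are high within words and near chance across word boundaries, which is the statistic infants appear to track.

```python
# Sketch: transitional probabilities in a pause-free stream of novel words.
import random
from collections import Counter

words = ["tupiro", "golabu", "dapiku", "tilado"]   # the novel lexicon
def syllables(w):
    return [w[i:i + 2] for i in range(0, len(w), 2)]

random.seed(0)
# A continuous "speech stream": 300 randomly ordered words, no pauses.
stream = [s for _ in range(300) for s in syllables(random.choice(words))]

pairs = Counter(zip(stream, stream[1:]))
firsts = Counter(stream[:-1])
def tp(a, b):                                      # P(next = b | current = a)
    return pairs[(a, b)] / firsts[a]

print(f"within word:  P(pi|tu) = {tp('tu', 'pi'):.2f}")  # 1.00
print(f"across words: P(go|ro) = {tp('ro', 'go'):.2f}")  # ~0.25 (chance)
```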
11
Q
  • Why can’t we study speech in other species?
  • What other techniques do we use?
    • Brain damages: Why is this not an ideal technique?
    • Neuroimaging: 2 main issues?
  • Listening speech: what brain areas are more active?
  • What brain areas are more activated for complex language tasks?
    • brain area?
    • L vs R hemisphere?
A

Speech

Speech in the brain

  • For visual system: we can conduct studies on animals as there are similarities b/w the visual system for humans and animals
    • We can’t do this for speech as other animals do not have similar capabilities
  • Use different techniques
    • Identify brain areas for speech
    • We cannot damage patients’ brains to study this; we can only study this when bad events (stroke, tumors) happen
      • Issue: stroke damage follows the patterns of blood vessels
      • Language-deficit symptoms look different for patients who had a stroke vs tumors removed
      • The patterns of blood vessels do not say anything specific about whether structures for language production are close together in the brain
      • Are deficits linked because the structures are neighbors in the brain, or just because the neighboring structures are perfused by the same blood vessels?
      • This is the main difficulty of neuropsych studies in patients
  • Brain damage follows patterns of blood vessels, not brain function, so difficult to study
  • TB: Damage from stroke may cover just part of a particular brain fx -> leaving some of the fx undamaged
  • Or damage can cover a wide region that includes some or all of a particular brain fx and part of other fx
  • Examining healthy ppl w/ EEG, MEG, PET, and fMRI studies helps us learn about speech processing in the brain
    • Correlational only
    • Hard to create well-controlled nonspeech stimuli because humans are so good at understanding even severely distorted speech
      • Challenges: difficult to find control conditions
      • Ex. when there’s wind-like noise in experiments, ppl interpret a speech pattern in it even though it’s just random noise in the lab
      • IOW: the perceptual system is so sensitive to speech that it attempts to perceive speech in control conditions
  • Listening to speech: Left and right superior temporal lobes are activated more strongly in response to speech than to nonspeech sounds
  • Categorical perception tasks: Listeners attempt to discriminate sounds like “bah” and “dah” while having their brain scanned
  • As sounds become more complex, they are processed by more anterior and ventral regions of superior temporal cortex – in both hemispheres
  • As sounds become more speech-like, more activation in the left brain
  • Research indicates that some “speech” areas become active when lip-reading (multisensory aspects of speech processing?)