Lec 10/ TB Ch 11 Flashcards
1
Q
- Lowest freq of harmonic spectrum: 2 names
- 1st vs 2nd harmonic freq
- Missing-fundamental effect
- definition
- LS graphs
- RS graphs
- connection to fundamental freq
- Rogue waves
A
Complex sounds
- Harmonics
- Lowest frequency of harmonic spectrum: fundamental frequency (also called the first harmonic)
- 2nd harmonic is 2x as high as the fundamental freq
Missing fundamental part 1
- Missing fundamental (harmonic #1) is not/barely noticed
- Harmonic #1 is deleted/not played
- How do we perceive it?
- Hear a tone, we hum it back out
- If the tone lacks the fundamental freq, we still perceive it
- The auditory system completes the missing fundamental freq
- This happens w/ a full set of harmonics or only a few
- More harmonics can be missing. How come? - demo
- LS: pure tones w/ increasing frequency
- RS #1: when all freq are played together, we perceive only the pitch of the fundamental frequency
- RS #2: when all freq are played and the fundamental freq is removed, we still perceive only the fundamental freq
- RS #3 & #4: with even more harmonics removed, we still (faintly) hear the fundamental freq
- RS #1: looking at the sound pressure wave, the summed waveform repeats with the cycle (period) of the fundamental freq
- This explains why the fundamental freq is perceived even though it is not physically present
- X
- This is similar to rogue waves
- In the ocean, waves can be superimposed
*
2
Q
- graphs 1,2,3
- how are they related
- Special feature of graph 4
- How can you hear 250 Hz?
- What explains this?
- place code vs temporal code?
A
Missing fundamentals part 3
- Graphs #1,2,3: 500, 750, 1000 Hz (differ by increments of 250 Hz)
- missing fundamental harmonic: 250 Hz
- How can you hear 250 Hz in Graph 4?
- Graph 4: all 3 overlap; the 2nd, 3rd & 4th harmonics align in their peaks every 4 ms
- These three waveforms come into alignment every 4 ms, which happens to be the period of the fundamental frequency for these three harmonics: 250 Hz
- Added up, they yield a fluctuation in energy at 250 Hz (see the sketch at the end of this card)
- Indeed, every harmonic of 250 Hz will have an energy peak every 4 ms
- This explains why listeners perceive the pitch of this complex tone to be 250 Hz, even though the tone has no 250-Hz component
- What explains this: place code vs temporal code?
- On the basilar membrane, only the regions sensitive to 500, 750, and 1000 Hz are active
- → The place code does not explain the missing-fundamental effect
- It must be the temporal code that explains it: the superimposition shows up in the population response of neurons
- Neurons respond especially strongly at this particular rate (the rate of the fundamental freq)
- The population code shows increased activity across multiple neurons, according to the volley principle
- Reflecting fundamental freq 250 Hz
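A minimal numerical sketch (Python; illustrative only, not from the lecture) of the Graph 4 point: harmonics at 500, 750, and 1000 Hz sum to a waveform that repeats every 4 ms, the period of the missing 250 Hz fundamental, even though there is no energy at 250 Hz. The sampling rate and signal length are arbitrary choices.

```python
import numpy as np

fs = 100_000                       # assumed sampling rate (Hz), arbitrary
t = np.arange(0, 0.02, 1 / fs)     # 20 ms of signal

# 2nd, 3rd, and 4th harmonics of a 250 Hz fundamental; the fundamental itself is absent
wave = sum(np.sin(2 * np.pi * f * t) for f in (500, 750, 1000))

# The summed waveform repeats with the period of the missing fundamental (4 ms)
shift = int(fs / 250)                              # 4 ms expressed in samples
print(np.allclose(wave[:-shift], wave[shift:]))    # True

# ...yet a Fourier analysis shows essentially no energy at 250 Hz itself
spectrum = np.abs(np.fft.rfft(wave))
freqs = np.fft.rfftfreq(len(wave), 1 / fs)
k = np.argmin(np.abs(freqs - 250))                 # bin closest to 250 Hz
print(spectrum[k] / spectrum.max())                # ~0
```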
3
Q
- Timbre
- What does perception of timbre depend on?
- “Timbre contrast” or “timbre aftereffect” experiment
- vowel “ee”: # of formants?
- Methods
- Graph A
- Graph b
- Graph c
- Conclusion
- Attack vs decay
- definition
- amplitude change
A
Complex sounds cont
- Timbre: Psychological sensation by which a listener can judge that two sounds that have the same loudness and pitch are dissimilar; conveyed by harmonics and other high frequencies
- Possible to additional noise??
- Perception of timbre depends on context in which sound is heard
- Experiment by Summerfield et al. (1984) - “timbre contrast” or “timbre aftereffect”
- Study – freq spectrum of vowel ‘ee’
- RS: “ee” has a peak at lower harmonic, a middle peak, and third peak (3 peaks in harmonic spectrum = fingerprints of vowels = formants)
- # 1: Summerfield used synthesizer to create artificial sounds
- Played the one on LS (a) first
- It looks like a broken comb
- Some harmonics are deleted (3 patches)
- Rs played this tone first, where the harmonics are deleted (similar to color adaptation: stare at a red spot)
- Then played (b)
- (similar to color adaptation: stare at a white wall -> see an afterimage of a green dot)
- Then you hear an aftersound
- Their perception was influenced by what they heard before
- IOW: Rs first played the opposite of the vowel “ee”
- Ppl adapt to the frequencies that were played
- The fundamental freq (first patch) -> doesn’t appear as loud afterwards
- Same for the other red patches -> reduced ability to hear those (red) freq
- But for the other intermediate freq -> you hear them well
- As a result, what they hear in (b) sounds like the vowel “ee”
- As such, the context (adapting sound @a) matters
- Attack and decay of sound
- How sound starts/ends
- Attack: Part of a sound during which amplitude increases (onset)
- Decay: Part of a sound during which amplitude decreases (offset)
- Ex. pluck string: attack is immediate
- Ex. violin & bow: not as immediate
4
Q
- Auditory Scene Analysis
- Auditory scene
- Multiple sound sources in env (complex sounds) - How does auditory system sort out these sources?
- 2 names
- Example: frog + bird + splash
- strategies to segregate sound sources
- 1: Spatial separation
- example: frog & bird
- 2: motion parallax
- 3: spectral/temporal qualities/ auditory stream segregation
- define
- describe graphs
- This aligns w/ which gestalt law?
- Bach fugue example
- 4: group by timbre
- Bach organs & timbre
- This aligns w/ which gestalt law?
- 5: grouping by onset
- definition
- Rasch (1987): what helps to distinguish 2 notes from each other?
- This aligns w/ which gestalt law?
- Spectrogram vs spectrum dimensions
- Did the bottle break
- Top vs bottom
- Timbre
- Multisensory integration
- definition
- Which 2 sensory modalities use this?
- Ventriloquist effect
A
Hearing in the env
- Auditory scene: the entirety of sounds that is audible in a given moment and that conveys information about the events happening in that moment
- Multiple sound sources in env (complex sounds) - How does auditory system sort out these sources?
- Source segregation, or auditory scene analysis (i.e. separate the sounds of frog + bird + splash)
- Auditory Scene Analysis (how does the auditory system know which sounds to group together and which to segregate?)
- # of strategies to segregate sound sources
- # 1: Spatial separation between sounds (frog on left croaks; bird on right chirps)
- # 2: motion parallax (when you move around, sound sources move past your head -> spatial separation)
- # 3: Separation on basis of sounds’ spectral or temporal qualities
Auditory stream segregation:
- Perceptual organization of a complex acoustic signal into separate auditory events for which each stream is heard as a separate event
- Y-axis = freq
- X-axis = time
- LS: “dala-dala” (alternating tones demo)
- The freq/pitch of 2 sounds are close together
- So our auditory system groups these 2 sounds together based on the Gestalt principle of similarity (Lec 4)
- RS: The freq/pitch of 2 sounds are far -> don’t group them together
- Gestalt law: similarity
- Bach: One instrument playing, but it sounds like 2 are playing
- x
- # 4: Grouping by timbre
- Organ can play the same pitch w/ different timbres (2 types of pipes made of different materials)
- Can tell which sound belongs to which source
- This resembles the Gestalt law of good continuation (based on the color in the figure)
- x
- # 5: Grouping by onset
- Harmonics of speech sound or music
- If the sounds start together (same onset) -> we tend to group them together
- If the sounds have diff onsets, we don’t group them together
- Grouping different harmonics into a single complex tone
- Rasch (1987): It is much easier to distinguish two notes from one another when onset of one precedes onset of other by very short time
- -> Resembles Gestalt law of common fate
- Does the bottle break?
- Spectogram: A pattern for sound analysis that provides a 3D display of intensity as a function of time and frequency
- Spectrum: 2D (x-axis = freq; y-axis = power)
- Spectrogram: 3D; the 3rd dimension (intensity) is shown w/ color
- Vertical axis = freq
- Horizontal axis = time
- Color = power or intensity
- (see the spectrogram sketch at the end of this card)
- Based on the spectrogram, we can tell if the bottle is bouncing or breaking
- Top graph = bounce
- Fingers or spikes: certain frequencies have more power than others, and these frequencies recur across bounces – they have the same timbre
- Since the timbre is constant, we can tell the bottle didn’t break
- Bottom graph = broke
- After drop, there are different pieces -> messy
- The different pieces bounce off the floor at different times
- Each vertical slice (spectrum) is different; the timbres are different
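A rough sketch of how a spectrogram like the bottle figure is computed, assuming scipy is available; the toy signal and parameters are made up for illustration. Each column of the output is a spectrum at one moment in time, so a repeating “bounce” gives repeating columns (same timbre each time).

```python
import numpy as np
from scipy import signal

fs = 16_000                                  # assumed sampling rate (Hz)
t = np.arange(0, 1.0, 1 / fs)

# Toy "bouncing bottle": short 2 kHz bursts that recur with the same spectrum
x = np.sin(2 * np.pi * 2000 * t) * (np.sin(2 * np.pi * 4 * t) > 0.9)

# The three dimensions from the notes: frequency (rows), time (columns), power (values)
f, times, Sxx = signal.spectrogram(x, fs=fs, nperseg=512)
print(Sxx.shape)        # (len(f), len(times)); Sxx[i, j] = power at f[i] and times[j]
```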
Auditory scene analysis
- Multisensory integration: vision (usually) helps audition to tell what belongs together
- Ventriloquist effect: An audio-visual illusion in which sound is misperceived as emanating from a source that can be seen to be moving appropriately when it actually emanates from a different invisible source.
- When there are 2 ppl and only 1 person is moving their lips, we tend to perceive the sound as coming from that person
- Visual dominance for location.
5
Q
- restoration effect in visual modality - describe top 2 figures
- Which gestalt law is used here?
- Kluender and Jenison - restoration effect for sound
- methods: 2 steps
- 2 conditions
- results
- Restoration effect for complex sounds
- What sources of info is used?
- Study: The mailman brought the letter.
- 3 conditions (see spectrogram)
- explanation
A
Continuity and restoration effects
- Restoration based on the gestalt law of good continuation: In spite of interruptions, one can still “hear” sound
- LS: looks like a bunch of chicken feet (disconnected segments)
- RS: when you occlude them partially, our visual system perceives the “chicken feet” as continuous -> we see a cube
- X
- Experiments that use signal detection task (e.g., Kluender and Jenison) suggest that at some point, restored missing sounds are encoded in the brain as if they were actually present!
- Methods:
- # 1: Rs played a pure tone over time (red line)
- # 2: At some time, they played some white noise (ex. radio w/ poor reception, all sorts of energy/freq) = grey box
- There are 2 conditions
* Condition 1: you hear the pure tone, and the noise simultaneously
* Condition 2: you hear the pure tone, then the noise, then the pure tone again (tone interrupted)
- Results: used d′ to measure whether listeners could tell the two conditions apart (see the d′ sketch below)
- Many people’s d’ = 0
- They cannot tell if the sound is playing at the same time as the noise in the background
- It seems the brain reconstructed a perception of the sound based on the gestalt law of good continuation
- This also applies if Rs plays a pure tone that gradually increases in pitch
- -> restoration effects for pure tones
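A small sketch of the d′ computation behind the “d′ = 0” result, assuming a standard yes/no signal-detection analysis (the example hit and false-alarm rates are made up). d′ near 0 means listeners respond the same way whether the tone actually continued behind the noise or not.

```python
from scipy.stats import norm

def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index for a yes/no task: d' = z(hits) - z(false alarms)."""
    return norm.ppf(hit_rate) - norm.ppf(false_alarm_rate)

# Saying "the tone continued" equally often in both conditions gives d' = 0
print(d_prime(0.75, 0.75))   # 0.0  -> cannot discriminate the two conditions
print(d_prime(0.90, 0.20))   # ~2.1 -> easy discrimination, shown for comparison
```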
- X
- Restoration of complex sound, (e.g., music, speech)
- Here, we don’t just use gestalt law of good continuation/ auditory info, we use “Higher-order” sources of information
- Ex. we may use semantic info
- Ex. the phoneme “wh” is missing, replaced by white noise
- Ex 1: “The *eel fell off the car.” (wheel)
- we perceive this as “wheel”
- Ex 2: “The *eel fell off … the table.” (meal)
- we perceive this as “meal”
The restoration effect
- Our perceptual system takes into account that smth is occluding the sound you actually want to hear
- Phonemic restoration: Noise helps comprehension
- Spectrogram for the sentence “The mailman brought the letter.”
- For a = normal sentence
- B: remove certain parts of the sentence (no noise) -> cannot understand sentence
- There’s nothing there, so the perceptual system doesn’t see a reason why we aren’t hearing those parts -> does not complete the sentence
- C: replace certain parts of the sentences w/ white noise -> can understand it
- Since there is white noise, perceptual system assumes smth is missing -> completes the sentence
- Similar to the chicken feet phenom earlier
- LS: no bars, no reason to believe it is a cube
- RS: there are bars occluding things -> likely to see a cube
*
6
Q
- Pythagoras: music & math, planets & math
- Musical notes freq range
- Pitch
- Freq most audible for us
- Max freq musical freq can play
- Freq most played by instruments
- Why aren’t there many instruments playing 5000 Hz?
- 2 reasons
- x
- Octave
- 2 dimensions of musical pitch
- tone height
- tone chroma
- musical helix - describe
- Chords
- dissonant
- consonant
- distance b/w
- 1st → 2nd harmonic
- 2nd → 3rd harmonic
- 3rd → 4th harmonic
- 4th → 5th harmonic
- Ratios count
- x
- What relationship between notes is universal?
- Western versus Javanese on # of notes in an octave?
- Musicians’ estimates of intervals between notes vs infant?
- x
- Melody
- definition
- aka
- Defined by relationship between ??? not ???
- Tempo
- Fugue
- Rhythm: is it specific to music?
- Bolton (1894) study
- method
- result
A
Music
- Music is a way to express thoughts and emotions
- Pythagoras: Numbers and music intervals
- Found that music has mathematical regularities, and thought this was related to the distances b/w planets
- Some clinical psychologists practice music therapy
- X
- Musical notes
- Sounds of music extend across a frequency range from about 25 to 4500 Hz (aka pitch)
- Pitch: The psychological aspect of sounds related mainly to the fundamental frequency
- Bottom of red curve = bottom of blue graph
- Based on both of these graphs, our best audibility extends up to around 4500 Hz
- musical instruments can play up to 4500 Hz, not beyond
- Also note: we can perceive 5000 Hz better than 100 Hz (based on the red graph)
- There are many instruments that can play 100 Hz (ex. guitar, harp, piano), but not 5000 Hz -> why?
- Our temporal code does not work above ~5000 Hz, and the temporal code is essential for enjoying music
- Listeners: Great difficulty perceiving octave relationships between tones when one or both tones are greater than 5 kHz -> temporal frequency coding
- For sounds below this frequency -> we can perceive the temporal patterns
- x
- Octave
- Octave: The interval between two sound frequencies having a ratio of 2:1
- Example: Middle C (C4) has a fundamental frequency of 261.6 Hz; notes that are one octave from middle C are 130.8 Hz (C3) and 523.2 Hz (C5)
- C3 (130.8 Hz) sounds more similar to C4 (261.6 Hz) than to E3 (164.8 Hz)
- There is more to musical pitch than just frequency!
Musical pitch has 2 dimensions
- Tone height: A sound quality corresponding to the level of pitch. Tone height is monotonically related to frequency
- Tone chroma: A sound quality shared by tones that have the same octave interval
- Each note on the musical scale (A–G) has a different chroma
- Musical helix: Can help visualize musical pitch
- Notes go around the helix
- C2 and C3 sit at the same angle on the helix (both shown in green): same chroma, one octave apart in tone height
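A minimal sketch of tone height vs. tone chroma, assuming equal temperament and A4 = 440 Hz (an assumption, not stated in the lecture): frequencies an octave apart map to the same chroma but different heights.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def height_and_chroma(freq_hz, a4=440.0):
    """Map a frequency to (octave number = tone height, note name = tone chroma)."""
    midi = round(69 + 12 * math.log2(freq_hz / a4))   # nearest equal-tempered note
    return midi // 12 - 1, NOTE_NAMES[midi % 12]

for f in (130.8, 261.6, 523.2, 164.8):
    print(f, height_and_chroma(f))
# 130.8 -> (3, 'C'); 261.6 -> (4, 'C'); 523.2 -> (5, 'C'): same chroma, different height
# 164.8 -> (3, 'E'): different chroma, even though it is closer in frequency to C3
```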
Music - patterns
- Chords: Created when two or more notes are played simultaneously
- Consonant: Have simple ratios of note frequencies (3:2, 4:3)
- Dissonant: Less elegant ratios of note frequencies (16:15, 45:32)
- If you listen to more jazz, some dissonant sounds seem to be consonant
- 1st harmonic -> 2nd harmonic = octave
- 2nd harmonic -> 3rd harmonic = 5th
- 3rd -> 4th harmonic = 4th
- 4th -> 5th harmonic = 3rd
- Ratios count: chords across several octaves perceived as the “same”.
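A tiny sketch checking the interval list above: the frequency ratios between successive harmonics are the simple ratios that sound consonant. The fundamental value is arbitrary; only the ratios matter.

```python
from fractions import Fraction

f0 = 250                                      # arbitrary fundamental (Hz)
harmonics = [n * f0 for n in range(1, 6)]     # 1st through 5th harmonics

names = {Fraction(2, 1): "octave", Fraction(3, 2): "fifth",
         Fraction(4, 3): "fourth", Fraction(5, 4): "major third"}

for lo, hi in zip(harmonics, harmonics[1:]):
    ratio = Fraction(hi, lo)
    print(f"{lo} Hz -> {hi} Hz: ratio {ratio} ({names[ratio]})")
# 2, 3/2, 4/3, 5/4 -- simple ratios = consonant;
# dissonant intervals have more complex ratios such as 16:15 or 45:32
```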
- X
Cultural differences
- Some relationships between notes, such as octaves, are universal
- Research on music perception: Western versus Javanese
- Javanese culture: Fewer notes within an octave; greater variation in note’s acceptable frequencies
- Pelog scale in Javanese music:
- Musicians’ estimates of intervals between notes correspond to the music scale from their culture
- Infants detect inappropriate notes within both scales (there’s smth innate in both of these scales)
Melody
- Melody: An arrangement of notes or chords in succession (chroma (not tone height) & rhythm) forming a gestalt
- Examples: “Twinkle, Twinkle, Little Star” or “Baa Baa Black Sheep”
- Defined by relationship between successive notes, not specific sounds
- Melodies can change octaves or keys and still be the same melody
- Notes and chords vary in duration
- Tempo: The perceived speed of the presentation of sounds
- x
- Melodies/themes as (gestalts) building blocks of music
- Fugue: a compositional technique (in classical music) in two or more voices, built on a subject (theme) that is introduced at the beginning and then repeated at different pitches.
- Long-held chord -> then a pattern = the theme of the fugue, repeated at different pitches
Rhythm: not just in music
- Lots of activities have rhythm: Walking, waving, finger tapping, etc.
- Bolton (1894) played sounds perfectly spaced in time. Listeners are predisposed to group sounds into rhythmic patterns
- Car and train rides
7
Q
- # of speech sounds
- vocal tract
- 3 parts of speech production & organs
- x
- respiration & phonation
- process of initiating speech
- children vs adult vocal folds
- → voice pitch
- Articulation
- What are sounds modulated by?
- how do we change the shape of vocal tract?
A
Speech production
- Humans are capable of producing lots of different speech sounds:
- ~5000 languages spoken today, utilizing >850 different speech sounds ->
- flexibility of the human vocal tract: we can learn to produce more than 850 diff speech sounds
- Vocal tract: The airway above the larynx used for the production of speech
- X
- – Respiration (lungs): also diaphragm and trachea
- – Phonation (vocal cords): also larynx and voice box
- – Articulation (vocal/oral tract): the circled area in the figure
- Involves alveolar ridge, tongue, hard palate, soft palate/velum, teeth, lips, epiglottis, nasal tract, pharynx (air flow through)
- X
- Respiration and phonation
- Initiating speech: Diaphragm pushes air out of lungs, through trachea, up to larynx
- At larynx: Air must pass through two vocal folds
- Children: Small vocal folds -> higher pitched
- Adult men: Larger mass of vocal folds (or when you have a cold) -> lower pitch
- Thus, the larynx produces the basic (source) sound with the vocal folds
- Articulation (sounds are modulated by vocal tract)
- Area above larynx: Vocal tract
- Humans can change the shape of their vocal tract by manipulating their jaws, lips, tongue body, tongue tip, and velum (soft palate)
- Resonance characteristics are created by changing the size and shape of the vocal tract, which affects the sound’s frequency distribution
8
Q
- formants
- Describe top graphs: a → b → c
- What is the vocal tract doing?
- In the spectrogram, what do the 3 stripes correspond to?
- Boot vs bat
A
Formants
- Peaks in speech spectrum: Formants
- Labelled by number, from lowest to highest (F1, F2, F3, …)—concentrations in energy occur at different frequencies, depending on length of vocal tract
- If LS top graph = the harmonic spectrum of what comes out of your vocal folds/larynx
- Vocal tract will modify it to look like LS bottom graph
- Ex. top graph: fundamental freq most powerful -> bottom: 5th harmonic has the most power
- These peaks = formants; also seen in the timbre aftereffect
- So the vocal tract is acting like a filter function that amplifies certain freq (F1, F2, F3) (see the sketch at the end of this card)
- This depends on the shape of the vocal tract
- Formants show up in spectrograms as bands of acoustic energy that undulate up and down, depending on the speech sounds being produced
- Spectrogram
- X-axis = time
- Y-axis = freq
- Color: energy
- The 3 stripes/bands = formants
- The speech sounds produced depend on the tongue position (front, back, high, low)
- Ex. boot (back, high) vs bat (front, low)
- Spectrogram: A pattern for sound analysis that provides a three-dimensional display plotting time on the horizontal axis, frequency on the vertical axis, and intensity on a color or gray scale
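A minimal “source-filter” sketch of the idea above: the vocal tract acts like a filter over the harmonic spectrum coming from the larynx. The fundamental, formant frequencies, and bandwidths below are illustrative assumptions, not values from the lecture.

```python
import numpy as np

f0 = 125                                   # assumed vocal-fold fundamental (Hz)
n = np.arange(1, 25)                       # harmonic numbers
freqs = n * f0                             # harmonic spectrum leaving the larynx
source = 1.0 / n                           # source: the fundamental is strongest

# Vocal tract as a filter: broad gain peaks at the formant frequencies
formants = [500, 1500, 2500]               # illustrative formant values (assumed)
gain = sum(np.exp(-((freqs - F) ** 2) / (2 * 150.0 ** 2)) for F in formants)
output = source * gain

print(freqs[np.argmax(source)])   # 125 Hz: before the filter, the fundamental dominates
print(freqs[np.argmax(output)])   # 500 Hz: after the filter, a harmonic near F1 dominates
```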
9
Q
- what do VOWELS do to the vocal tract?
- how do we produce diff vowels?
- 3 modes of obstruction when producing consonants
- bah vs dah vs gah
- s vs t
- zoo vs sue
- Issue when we have to produce speech fast?
- Coarticulation
- Definition
- Explanation
- Bah vs boo spectrogram
- why does the b look different?
- How are sounds perceived despite coarticulation?
- x
- Categorical perception
- Sound: bah, dah, gah spectrum
- For the sounds w/in categories, how do ppl perceive them?
- Visual: blue → green spectrum
- For the colors w/in categories, how do ppl perceive them?
- Sound: bah, dah, gah spectrum
- McGurk effect Exp
- x
- How do we use categorical perception despite a lack of invariances?
- It depends on X??
- Color cube
- Baa vs Boo
- Why do “eebah” and “oodah” look the same?
- How do we perceive syllables?
- What other phenom is this related to?
- explain
- Baa vs boo
A
Classifying speech sounds
- Sound: Most often described in terms of articulation
- Vowels: produced with an open vocal tract; different vowels come from differently shaped tongue and lips
- Consonants obstruct the vocal tract: 3 modes that obstruction can happen
- Place of articulation (e.g., at lips, at alveolar ridge, etc.)
- Ex. “bah”: b is obstructing air flow w/ lips
- Ex. dah: d obstructs air flow w/ alveolar ridge
- Ex. gah: g obstructs air flow w/ back of the tongue
- Manner of articulation (totally/partially/slightly obstructed airflow)
- “s”: partially obstructed
- “t”: totally obstructed
- Voicing: Whether the vocal cords are vibrating or not
- Ex. zoo from Z = vibrating starting w/ “z”
- Ex. Sue = larynx is not really vibrating for the “s”
- Speech production: Very fast
- Since we need to move things (e.g., the articulators in our vocal tract) really fast, they might not be at the ideal position to create the sound
- Coarticulation: The phenomenon in speech whereby attributes of successive speech units overlap in articulatory or acoustic patterns
- Inertia prevents tongue, lips, jaw, etc. from moving too fast
- Ex. you can only move your tongue so fast
- Ex. Bah vs boo spectrogram
- There’s overlap w/ the vowel that comes after the b
- Here the b for the two words looks different
- B is different depending on what comes next
- Lack of invariance (there’s variance)
- -> How are sounds perceived despite coarticulation?
- X
- Categorical perception
- (Artificial) sound stimuli can vary continuously from “bah” (LS) to “dah” (middle) to “gah” (RS)
- Aka continuum
- For visual perception: if it is blue -> green, we see blue, bluish grey, grey, greenish grey, green
- But people do not perceive continuous variation, they perceive sharp categorical boundaries btw. stimuli; don’t perceive differences btw. sounds within category
- Ex. we perceive bah, dah, gah
- If you play sounds b/w bah and dah, ppl say either bah or dah and are absolutely convinced about it, even though the stimuli are ambiguous
- McGurk effect Exp: See gah, hear bah. Perceive dah
- audiovisual illusion that illustrates categorical perception
- x
- So then: how does categorical perception work despite a lack of invariance?
- It depends on context
- Similar to color perception
- We see the red tile in the shadow as red, just as we see the red tile in the light as red
- The top tile looks brown, the one in the shadow looks orange: they are physically the same but we perceive them differently
- Perception of speech depends on the context (ex. what you hear b4)
- Back to coarticulation …
- Ex. Baa vs boo = b are very different depending on the vowel coming after b/c of coarticulation
- It’s more complicated
- ‘bah’ after ‘ee’ (LS top graph) sounds very similar to ‘dah’ after ‘oo’ (RS bottom graph)
- Here “bah” and “dah” look the same (acoustically) due to coarticulation
- We perceive syllables on the basis of the relative change in the spectrum: spectral contrast.
- Spectral context (ex. we just heard “ee” or we just heard the “oo”)
- Relative to the spectral context, we hear “b” and “d” differently
- This is also related to the timbre aftereffect
- Spectral contrast results in adaptation to timbre
- So we perceive the “b graph sound” as ee
- We can modify things based on the spectral contrast/context
*
10
Q
- do we use only one cue to recognize faces/objects/sounds?
- Does this work for languages we are learning?
- So what does perception depend on?
- Prenatals
- what voice do they prefer?
- What language do they prefer?
- “r”/”l” for Japanese vs English
- bb w/ English speaking parents vs Bb w/ Hindi speaking parents: describe graph
- Conclusion
- x
- 2nd challenge when learning a 2nd language
- x
- Saffran, Aslin, and Newport (1996):
- Methods: learning new language
- 3 steps
- Statistical learning
*
A
- Using multiple acoustic cues (for speech perception, we use multiple cues together to recognize speech sounds)
- This is Similar to face recognition
- Similar features can be used in different combinations and we can rely on multiple cues to recognize a face
- -> use multiple cues: this doesn’t work well on languages we are still learning
- So, Perception depends on experience
- X
- Learning to listen
- Babies learn to listen even before they are born!
- Prenatal experience: Newborns prefer hearing their mother’s voice over other women’s voices
- 4-day-old French babies prefer hearing French over Russian (prefer the language their moms speak)
- X
- Becoming a native listener
- There are more than 850 speech sounds, but not all of these sounds are used in each language; we do not need the ability to distinguish between all of these speech sounds
- Sound distinctions are specific to various languages
- Example: “r” and “l” are not distinguished in Japanese
- Infants begin filtering out irrelevant acoustics long before they start to say speech sounds
- Exp: bb w/ English speaking parents vs Bb w/ Hindi speaking parents
- When you play Hindi sounds to 6-8 mo bb w/ English speaking parents -> they are responsive
- Their responses decline
- 10-12 mo bb w/ English speaking parents: not responsive, they perceive them as irrelevant
- 11-12 mo bb w. Hindi speaking parents -> still very responsive
- Thus, as we grow, our minds spend more resources on learning our own language, so we respond only to sounds that are relevant in our language
- Another challenge when learning a 2nd language
- Learning words
- How do we know where one word ends and another begins?
- Ex. the spectrogram for the sentence “where are the silences b/w words”
- Here “where are the” is spoken continuously w/ very little pauses
- We can be choppy: where, are, the; but no one speaks like this
- For new learners, it becomes difficult to tell where the words end/start
- X
- Bb learn new words really quickly: hear it once, know it
- Research by Saffran, Aslin, and Newport (1996):
- # 1: Created a novel language (w/ its own phonetic rules) and infants listened to sentences for two minutes
- Phonetic rules: how likely is “s” coming after “p” (ex. spoon)
- IOW: how likely are certain phonemes in this specific sequence (examine w/in words rather than b/w words?)
- tupiro, golabu, dapiku, tilado
- # 2: played novel words from that language bb have never heard b4 (those novel words have the same phonological rules)
- Afterwards, infants could already distinguish between words and non-words in the novel language
- Statistical learning: Certain sounds (making words) are more likely to occur together and babies are sensitive to those probabilities
- How can you tell if sounds belong to the same words vs 2 different words being pieced together
- IOW: The phonological rules w/in words are regular, while from one word to the next they no longer apply
- This is how we can tell, and infants learn it fast (see the sketch at the end of this card)
- Saffran suggests that bb learn to pick words out of the speech stream by accumulating experience w/ sounds that tend to occur together
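A rough sketch of the statistical-learning idea using the made-up words from the notes (the syllable split is assumed): transitional probabilities between syllables are high inside a word and low across word boundaries, which is the cue that marks where words begin and end.

```python
import random
from collections import Counter

# Made-up words from the notes, split into syllables (split is assumed)
words = [["tu", "pi", "ro"], ["go", "la", "bu"], ["da", "pi", "ku"], ["ti", "la", "do"]]

# A continuous "speech stream": words in random order with no pauses between them
random.seed(0)
stream = [syl for _ in range(500) for syl in random.choice(words)]

pairs = Counter(zip(stream, stream[1:]))
firsts = Counter(stream[:-1])

def tp(a, b):
    """Transitional probability P(next syllable = b | current syllable = a)."""
    return pairs[(a, b)] / firsts[a]

print(round(tp("tu", "pi"), 2))   # within a word: 1.0 ("tu" is always followed by "pi")
print(round(tp("ro", "go"), 2))   # across a word boundary: ~0.25 (any word can follow)
```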
*
11
Q
- Why can’t we study speech in other species?
- What other techniques do we use?
- Brain damages: Why is this not an ideal technique?
- Neuroimaging: 2 main issues?
- Listening speech: what brain areas are more active?
- What brain areas are more activated for complex language tasks?
- brain area?
- L vs HR?
*
A
Speech in the brain
- For visual system: we can conduct studies on animals as there are similarities b/w the visual system for humans and animals
- We can’t do this for speech as other animals do not have similar capabilities
- Use different techniques
- Identify brain areas for speech
- We cannot damage patients’ brains to study this; we can only study this when bad events (stroke, tumors) happen
- Issue: stroke -> damage follows the patterns of the blood vessels
- Language deficits symptoms look different for patients who have a stroke vs tumors removed
- The pattern of blood vessels does not say anything specific about whether the structures for language production that are damaged together are actually close together functionally in the brain
- Is a joint deficit due to neighboring functional structures in the brain, or just structures that happen to be perfused by the same blood vessels?
- This is the main difficulty of nropsych studies in patients
- Brain damage follows patterns of blood vessels, not brain function, so difficult to study
- TB: Damage from stroke may cover just part of a particular brain function -> leaving some of the function undamaged
- Or damage can cover a wide region that includes some or all of a particular brain function and part of other functions
- Examine healthy ppl w/ EEG, MEG, PET and fMRI studies: Help to learn about speech processing in the brain
- Correlational only
- Hard to create well-controlled nonspeech stimuli because humans are so good at understanding even severely distorted speech
- Challenges: difficult to find control conditions
- Ex. when a wind-like noise is played in experiments, ppl interpret speech-like patterns in it even though it’s just random noise in the lab
- IOW: the perceptual system is so sensitive to speech that it attempts to perceive speech even in control conditions
- Listening to speech: Left and right superior temporal lobes are activated more strongly in response to speech than to nonspeech sounds
- X
- Categorical perception tasks: Listeners attempt to discriminate sounds like “bah” and “dah” while having their brain scanned
- As sounds become more complex, they are processed by more anterior and ventral regions of superior temporal cortex – in both hemispheres
- As sounds become more speech-like, more activation in the left brain
- Research indicates that some “speech” areas become active when lip-reading (multisensory aspects of speech processing?)