Lec 2/ TB Ch 5 Flashcards
- 5 main features of speech perception
- sound patterns are converted to neural signals
* Ex. phonemes- /r/ and /l/ are easily noticed by Eng native speakers, difficult for Japanese speakers
- need to be sensitive to subtle cues, but also accommodate individual diff among talkers
* Ex. we need to distinguish “goat” from “coat”
- identify subtle boundaries b/w words
* i.e. silent pauses b/w words
- comprehend speech at high speed
* rate of phoneme processing is fast
- Casual speech: 10-15 phonemes/s
- Fast speech: 20-30 phonemes/s
- Artificially accelerated speech: 40-50 phonemes/s
- There are also systems that analyze grammar and semantics, and motor systems for articulation
dual stream model
- Location of initial stages of speech perception
- 2 streams for further processing
- 1 initial stages of speech perception happens in superior temporal regions
- 2 processing splits into 2 streams
- Ventral stream: processing in other temporal area; comprehension
- I.e. Map sound onto meaning
- Dorsal stream: processing in temporoparietal and frontal areas; auditory-motor transformations
- i.e. Map sound onto action
Basic Properties of Speech Sounds
- General path - 4 parts
- 2 Parts in larynx
- whisper vs normal speech
- What is pitch based on?
- vocal tract
- location
- 4 chambers
- What influences their resonance range
- Fx of chambers
- fx of articulators
- 3 articulators
- Evolved part in humans that allow for speech
- Avg # of vowels
- Avg # of consonants
- Coarticulation
- General path for speech: air from lungs → trachea (windpipe) → larynx (voice box) → vocal tract
- Larynx: glottis (opening) + vocal folds (2 flaps of retractable muscle tissue)
- whisper speech: vocal folds spread apart, sounds like hissing (sss)
- normal speech: vocal cords stretched over the glottis, sounds like a buzz (zzz)
- pitch is based on vocal fold vibrating frequency
- Sound is not a pure tone, it has many harmonics
- Vocal tract: abv larynx, 4 chambers
- Pharynx (throat)
- Nasal cavity
- Oral cavity
- Opening b/w lips
- Each chamber has a unique shape → determines its resonance range
- Each chamber = filter: allow/block specific sound frequencies
- 3 articulators that can modify resonance in each chamber
- Velum (soft palate): opens/closes nasal cavity
- Tongue body, tip, root
- Lips
- Evolved human anatomy for speech
- b4: tongue can’t move in oral cavity → can’t create vowels
- now: larynx shifted down, so the tongue can move vertically and horizontally → can create vowels
- phonemes hv distinctive features
- Ex. diff vowels (max 15 in German; avg = 5)
- Consonants hv distinctive features
- Ex. max 120 of consonants (avg = 20)
- Language rules vary
- Coarticulation: when adjacent phonemes are articulated w/ diff body parts, we smooth them together to make speech more efficient (ex. communicate faster)
- Ex. /n/, /d/ are normally articulated at the alveolar ridge; but are articulated at the teeth in “month/width”
- This is b/c we anticipate the th sound
- first 2 stages of dual stream model
- spectrotemporal analyses
- 2 ways the stages are organized
- Hierarchical
- 3 lvs
- 2 aspects of phonemes
- 2 parts of syllables
- 2 sub-parts in Rime
- Monkey tonal screams study
- hypothetical hierarchical neural network for monkey speech perception
- 3 lv
- lv 1: 2 steps
- lv 2: 2 steps
- lv 3: 2 steps
- What is Δt1?
- What type of pathway?
- Location?
- How is it diff from the human one?
Early Cortical Stages of Speech Perception
- primary auditory cortex at Heschl’s gyrus & dorsal superior temporal gyrus conduct spectrotemporal analyses
* spectrotemporal (frequency & time) analyses: receive info from the thalamus, extract info about the stimulus's frequency content and its rate over time
- send this frequency and rate info on to the phonological network (STS)
- These 2 stages of speech perception are organized hierarchically and bilaterally
Hierarchical Organization
- Speech patterns are complex auditory stimuli; the structure has multiple levels
- Segmental structure lv - phoneme
* Ex. cat has 3 phonemes: /k/, /æ/, and /t/
* Each phoneme has 2 aspects: acoustic (sound) and articulatory (uses specific muscles)
- Syllabic structure - CVC
* CVC structure: consonant-vowel-consonant
* 2 parts
- Onset: consonant /k/
- Rime: remainder (see the parse sketch below)
* 2 sub-parts
- Nucleus: vowel /æ/
- Coda: consonant /t/
- Morphophonological structure - whole thing/word
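- Minimal sketch (mine, not from the textbook) of the onset/rime decomposition above; the tiny vowel set and the helper function are illustrative assumptions:

```python
# Minimal sketch of the syllable decomposition described above:
# onset + rime, where rime = nucleus (vowel) + coda.
# The tiny vowel inventory and this helper are illustrative assumptions.
VOWELS = {"æ", "ɛ", "ɪ", "ɒ", "ʌ", "ʊ"}

def parse_syllable(phonemes):
    """Split a list of phonemes into onset + rime (nucleus + coda)."""
    for i, p in enumerate(phonemes):
        if p in VOWELS:  # the nucleus is the first vowel
            return {"onset": phonemes[:i],
                    "rime": {"nucleus": [p], "coda": phonemes[i + 1:]}}
    raise ValueError("no vowel (nucleus) found")

print(parse_syllable(["k", "æ", "t"]))
# {'onset': ['k'], 'rime': {'nucleus': ['æ'], 'coda': ['t']}}
```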
- Monkey study
- Examine how monkeys perceive speech/calls or tonal scream
- hypothetical hierarchical neural network for monkey speech perception
- Lower order cells: detect specific “frequency modulated (FM) sweeps” at a specific time
* cells detect the upward FM sweep @ time 1 (1st 200 ms) and the downward sweep @ time 2 (2nd 200 ms)
* extract the sweeps and send them to mid-lv cells
- mid-lv cells: (T1 and T2) combine inputs from the lower lv
* detect harmonic patterns in each time frame
- High-lv cells: combine inputs from the middle lv
* detect complex auditory stimuli w/ spectrotemporal features in tonal screams
- NOTE: the connection from the T1 cell to the cell at the highest lv has a delay (Δt1) → this holds up the signal long enough that the inputs from T1 and T2 arrive at the top cell at the same time (see the sketch after this block)
- feedforward synaptic pathway in the STG
- The one for human language is more complex: has dorsal and ventral streams
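- Toy sketch (mine) of the delay-line logic above; the boolean frame coding and these helper functions are assumptions for illustration, not the actual model:

```python
# Toy sketch of the hypothetical hierarchical network above: lower-order
# cells detect FM sweeps per 200 ms frame, mid-level (T1/T2) cells combine
# them, and the high-level cell fires only when the T1 output (delayed by
# Δt1) and the T2 output arrive together. Boolean coding is an assumption.

def low_level(frame):
    """Lower-order cells: detect upward/downward FM sweeps in one frame."""
    return {"up": frame.get("up", False), "down": frame.get("down", False)}

def mid_level(sweeps):
    """Mid-level cell: signals a harmonic pattern in its time frame."""
    return sweeps["up"] or sweeps["down"]

def high_level(t1_delayed, t2):
    """High-level cell: coincidence detector for the whole tonal scream."""
    return t1_delayed and t2

frame1, frame2 = {"up": True}, {"down": True}   # two successive 200 ms frames
t1 = mid_level(low_level(frame1))               # available at t ≈ 200 ms
t2 = mid_level(low_level(frame2))               # available at t ≈ 400 ms
# Δt1 holds t1 back so both inputs reach the top cell at the same time:
print(high_level(t1_delayed=t1, t2=t2))         # True -> scream detected
```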
How/steps Auditory info is transformed b4 reaching the cerebral cortex
- 2 steps
- Spiral ganglion
- Hair cells: near base vs apex
- 3 lv of nuclei in brainstem
- 2 properties this ascending pathway maintains
- 2 paths for info processing
- Descending path: 2 parts
- Descending path fx
Box 5.2: From Cochlea to Cortex
- Auditory info is transformed b4 reaching the cerebral cortex
- sound is coded as electrical signals in the spiral ganglion
* Spiral ganglion: part of the cochlea in the inner ear
* The sound waves move across many sensory receptors/hair cells
* Hair cells are organized by frequency
- Near base: high frequency sounds
- Near apex: low frequency sounds
- signals travel through the cochlear (auditory) nerve to the brainstem; then pass thru 3 lv of nuclei
* Superior olivary nucleus
* Lateral lemniscus
* Inferior colliculus
- from there, signals reach the primary auditory cortex in Heschl's gyrus (aka transverse temporal gyrus) via the thalamus
- EEG studies: this ascending path maintains spectral and temporal properties of sound
- Info processing is bottom-up (ascending path) and top-down (descending path)
- Top-down/Descending pathway: cog states (ex. selective attention) send signals back down, reaching as far as the spiral ganglion
- regulate early stages of auditory perception in a top-down way
- Rs: auditory brainstem is neither passive nor hardwired; it can be modified
- Ex. dev musical skills, learn tonal language
- define bilateral organization
- Which stages in Dual Stream Model is organized this way?
- Which hemispheres used for speech perception?
- Damage to which region impairs speech perception most?
- Studies
- 1 Binder et al 2000
- Showed ppl 5 types of auditory stimuli: 5 stimuli?
- What does the result show?
- 3 main results
- Limitation
- 2 explanation
- What do ST areas do?
- 2 Okada and Hickok 2006 - fMRI study: Bilateral STS areas are sensitive to phonological neighbourhood density
- phonological neighbourhood
- Results
- Lesion studies
- 3 Hickok et al 2008
- Showed LH & RH are each capable of speech perception
- 2 part method
- Wada procedure
- Task
- Results
- 4 Word deafness study implication
- 2 MP of all studies
- Role of LH
- Role of RH
Bilateral Organization = Both Hemispheres Contribute to Speech Perception
- Recall: Early cortical stages of speech perception (aka 1st 2 stages of Dual Stream Model) is organized bilaterally
- LH and RH: activated by speech stimuli, can perceive speech
- Bilateral damage to STG/STS impair speech perception
- Binder et al 2000
- Showed ppl 5 types of auditory stimuli
- Unstructured noise
- FM tones
- Words (ex. desk, fork, stream)
- Pronounceable pseudowords (ex. sked, korf, reemst)
- Reversed words
- Patterns of activation:
- Auditory area on dorsal plane of STG responded more to tones > noise
- Region in mid-lateral STG responded to speech > tones > noise
- Middle sector of STS responded more to speech > tones
- This support the hypothesis that early cortical stages of speech perception are organized hierarchically and bilaterally
- dorsal part of STG: conduct spectrotemporal analyses
- lateral part of STG and middle part of STS: detect complex feature combinations in human speech
- These areas are sequentially engaged in the LH
- Limitation: STS is activated for words, pseudowords, and reversed words
- 2 explanations
- Pseudowords used the same regions as real words b/c they share phoneme and syllable features
- All 3 stimuli (reversed, real, and pseudowords) have equivalent acoustic complexity
- Hickok and Poeppel 2007
- Many fMRI studies agree that parts of lateral STG and middle STS in LH and RH contribute more to perceptual analysis of speech than non-speech info
- Okada and Hickok 2006
- Bilateral STS areas are sensitive to phonological neighbourhood density
- Some words hv many similar-sounding words
- Ex. cat belongs to the neighbourhood including: cab, cad, calf, cash, etc
- Other words have few associates (ex. spinach, obtuse)
- Words from high density neighbourhood activate more phonological competitors
- fMRI results: STS was engaged bilaterally by high density words more extensively
- IOW: STS represents the phonological competitors that are activated during auditory word recognition
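- Minimal sketch (mine) of the density idea; the toy lexicon and the one-segment-edit definition of a neighbour are assumptions:

```python
# Minimal sketch of phonological neighbourhood density: count lexicon words
# that differ from the target by one substituted, inserted, or deleted
# segment. The toy lexicon and the one-edit rule are assumptions.

def is_neighbour(a, b):
    """True if a and b differ by exactly one segment edit."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):                      # one substitution
        return sum(x != y for x, y in zip(a, b)) == 1
    short, long_ = sorted((a, b), key=len)    # one insertion/deletion
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))

LEXICON = ["cab", "cad", "can", "cap", "cot", "hat", "mat", "spinach", "obtuse"]

def density(target):
    return sum(is_neighbour(target, w) for w in LEXICON)

print(density("cat"))      # high-density word: 7 competitors in this toy lexicon
print(density("spinach"))  # low-density word: 0
```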
- Neuropsychology lesion studies
- Hickok et al 2008
- Showed both LH and RH are independently capable of speech perception
- Method:
- Wada procedure: Inject sodium amobarbital to temporarily shut down an entire hemisphere (20 patients)
- Task
* Listen to a word (ex. bear), then point to matching picture on the sheet
* Other pictures are distractors
- Phonemic distractor (ex. a pear)
- Semantic distractor (ex. moose)
- Unrelated picture (ex. grapes)
- Results:
- there were more phonemically based errors under LH anaesthesia
- But the error rate was still low (~10%)
- IOW: when LH is offline temp, RH can still perceive speech well
- This supports dual stream model: early cortical stages of speech perception are bilaterally organized
- Word deafness studies
- Supports Dual Stream Model
- Word deafness: neuro disorder where most hearing and non-speech sounds are intact, but speech perception is disrupted
- It is a continuum of severity
- Most cases (70%) have symmetric bilateral lesions that affect the middle and posterior parts of STG, but not the HG
- IOW: need to damage higher-order auditory systems in both hemispheres to cause the disorder
- Overall, studies show LH and RH each play a role in speech perception
- But there is some functional asymmetry
- L: dominant for integrating signals for rapidly changing phonemes
- R: dominant to integrate signals for longer syllables
The Two Hemispheres Have Partially Different Temporal Windows for Speech Perception
- 2 types of phonological info
- “asymmetric sampling in time” hypothesis
- 2 parts
- primary auditory cortex fx
- higher order auditory fx
- LH vs RH
- Liebenthal et al 1995 - Compare how ppl discriminate familiar phonemic sounds w/ nonphonemic sounds equal in complexity
- Phonemic discrimination task stimuli
- Nonphonemic discrimination task stimuli
- Overall method
- Results
- Categorical perception
- Phonemic discrimination task vs nonphonemic
- fMRI result/overall
- Abrams et al 2008 - Used EEG to record LH and RH temporal patterns when children listened passively to sentence in 3 modes of speech
- 3 modes
- Speech envelope
- Results about RH
- Ex. some phonological info occurs quickly (i.e. 50 ms)
- Ex. contrast b/w /k/, /g/
- Ex. contrast b/w pest and pets
- Some occur more slowly (200 ms)
- Ex. Cues for syllable stress
- “asymmetric sampling in time” hypothesis
- Proposed by Poeppel
- Primary auditory cortex in both hemispheres create symmetric representations of auditory signals
- The higher order auditory cortex in both hemispheres filter them through diff temporal window that produce asymmetric representations in “chunks”
* LH: more sensitive to auditory variation around 50ms, to detect tiny distinctions
* RH: more sensitive to longer auditory pattern around 200ms, to extract info at syllables
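- Rough sketch (mine) of the 2 temporal windows; the synthetic signal and windowed-RMS measure are assumptions, not the actual model:

```python
# Rough sketch of the "asymmetric sampling in time" idea: the same input
# analysed in ~50 ms chunks (LH-like) vs ~200 ms chunks (RH-like).
# The synthetic signal and the RMS measure are illustrative assumptions.
import numpy as np

fs = 16000                                        # sample rate (Hz)
t = np.arange(0, 1.0, 1 / fs)
signal = (1 + np.sin(2 * np.pi * 4 * t)) * np.random.randn(t.size)  # ~4 Hz "syllable" rhythm

def windowed_rms(x, win_ms):
    win = int(fs * win_ms / 1000)
    n = x.size // win
    return np.sqrt((x[: n * win].reshape(n, win) ** 2).mean(axis=1))

lh_like = windowed_rms(signal, 50)   # 20 values/s: fine, phoneme-scale detail
rh_like = windowed_rms(signal, 200)  # 5 values/s: coarse, syllable-scale chunks
print(lh_like.size, rh_like.size)    # 20 5
```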
- Liebenthal et al 1995
- Compare how ppl discriminate familiar phonemic sounds w/ nonphonemic sounds equal in complexity
- Phonemic discrimination task stimuli
- Created 8 stimuli that are CV syllables (#1-8)
- A continuum of /ba/ to /da/
- Nonphonemic discrimination task stimuli
- Alter the sounds in Phonemic discrimination task so the stimuli are not sounds that are naturally produced by human vocal tract
- Method
- Ppl were scanned while performing a task; they need to determine if the given sound X is identical to the first or second sound in a prev presented pair (ex. 2&4, 4&6, 6&8)
- Results
- There is categorical perception for the phonemic continuum but not the nonphonemic continuum
- Categorical perception: perceive 2 speech sounds that belong to the same category as more similar to each other (ex. 2 instances of /b/) compared to speech sounds from different categories (ex. /b/ vs /d/)
- But the objective acoustic differences (i.e. formants/peaks of acoustic energy in vocal tract frequency) are the same
- In particular, in the phonemic continuum
- discrimination b/w 4&6 is good
- This is b/c the 2 tokens straddle the sharp boundary b/w the /ba/ and /da/ categories
- The discrimination b/w 2&4 and 6&8 were poor
- This is b/c the 2&4 are located in the /ba/ category
- 6&8 are located in the /da/ category
- In the nonphonemic continuum, there is no such difference in performance; this suggests no category boundary was detected
- fMRI results
- all sounds engaged dorsal STG bilaterally and to equal degrees
- phonemic stimuli engaged the middle STS in LH more than nonphonemic stimuli
- no areas were activated more by nonphonemic than phonemic stimuli
- the STS activation associated w/ the contrast b/w phonemic and nonphonemic stimuli was stronger on the left
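- Toy illustration (mine) of the pattern above: a sharp (logistic) identification boundary predicts good 4&6 discrimination and poor 2&4 / 6&8 discrimination; the logistic parameters are assumptions:

```python
# Toy illustration of categorical perception on the 8-step /ba/-/da/
# continuum: identification follows a sharp logistic boundary, so pairs
# that straddle the boundary differ much more in category membership than
# within-category pairs. Boundary location and slope are assumptions.
import math

def p_da(step, boundary=5.0, slope=2.5):
    """Probability of labelling a continuum step as /da/."""
    return 1 / (1 + math.exp(-slope * (step - boundary)))

for a, b in [(2, 4), (4, 6), (6, 8)]:
    # same 2-step acoustic distance, very different category distance
    print(f"pair {a}&{b}: category difference = {p_da(b) - p_da(a):.2f}")
# 2&4 and 6&8 stay within one category (small difference -> poor
# discrimination); 4&6 straddles the boundary (large difference -> good).
```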
- Conclusion
- “asymmetric sampling in time” hypothesis states
- Higher order process: LH: more sensitive to auditory variation around 50ms, to detect tiny distinctions
- Study results support theory: Discriminating sounds along phonetic /ba/-/da/ continuum activate left STS more than right STS
- “asymmetric sampling in time” hypothesis states
- bilateral Primary auditory cortexes create symmetric representations of auditory signals
- Study challenges this: left bias to STS w/ phonemic sounds was bigger than that w/ nonphonemic sounds
- explanation: phonemic sounds were more familiar; nonphonemic = unfamiliar
- Maybe the left temporal lobe is also responsible for categorical perception (familiar vs unfamiliar)
- Overall, the study shows the LH prefers short auditory signals, and processes them categorically
- Posterior portion of left STS contribute to speech perception by using auditory info and visual info (from lip and tongue)
- Asymmetric sampling in time hypothesis for the RH
- Abrams et al 2008
- Used electrophysiology to record temporal patterns of LH and RH on children
- Children listened passively to sentence in 3 modes of speech
- Ex. the young boy left home
- Clear: enhanced diction, intelligible
- Conversational: natural informal manner
- Compressed: 2x rate
- Speech envelope: the slow variation over time in the acoustic energy (amplitude) of speech; this reflects the syllable pattern
- Results:
- There are 3 electrodes on left temporal lobe; 3 on right
- 3 on the right are more reliable at tracking the speech envelope in all 3 conditions
- Also showed larger responses
- Red lines (3 electrodes on RH) conform to the speech envelope line more than the blue lines (3 electrodes on LH)
- The ERPs recorded from RH correlates better w/ the speech envelope compared to those from the LH
- Conclusion:
- Results suggest the RH is dominant for processing speech on a slow time scale for syllable patterns
- IOW: supports “asymmetric sampling in time” hypothesis
- RH: more sensitive to longer auditory pattern around 200ms, to extract info at syllables
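- Rough sketch (mine) of the envelope-tracking measure; the synthetic "speech" and "response" signals and the moving-average smoother are assumptions, not the study's actual pipeline:

```python
# Rough sketch of envelope tracking: extract the slow speech envelope
# (rectify + smooth), then correlate it with a recorded response.
# The synthetic signals and the moving-average smoother are assumptions.
import numpy as np

fs = 1000                                  # Hz
t = np.arange(0, 2.0, 1 / fs)
speech = np.sin(2 * np.pi * 100 * t) * (1 + np.sin(2 * np.pi * 4 * t))  # ~4 Hz syllable rhythm

def envelope(x, smooth_ms=50):
    win = int(fs * smooth_ms / 1000)
    return np.convolve(np.abs(x), np.ones(win) / win, mode="same")

env = envelope(speech)
erp_rh = env + 0.3 * np.random.randn(env.size)   # pretend RH response: tracks the envelope
erp_lh = np.random.randn(env.size)               # pretend LH response: tracks it poorly

print(np.corrcoef(env, erp_rh)[0, 1])  # high correlation (RH-like)
print(np.corrcoef(env, erp_lh)[0, 1])  # near zero (LH-like, in this toy)
```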
Summary
- Early cortical stages of speech perception start in HG and project into STG and STS
- Auditory processing here is hierarchically organized
- Lower lv conduct elementary spectrotemporal analysis
- Higher lv: extract more complex phonological patterns
- Also, it is bilaterally organized
- 2 hemispheres have diff fx contributions:
- LH: detect + categorize rapidly changing phonemic features (50ms)
- RH: deal w/ longer syllabic info (200 ms)
- Beauchamp et al 2010
- Examine if the left posterior STS create the McGurk effect
- Used fMRI and TMS
- Stage 1 – fMRI
- Measure ppl brain activity in 2 conditions
- Listen to spoken words and watch faces produce words
- Only watch faces produce words
- Analysis results
- Region in left posterior STS respond to both auditory and visual speech
- Stage 2: TMS
- Stimulate the centre of STS, and the control site (dorsal & posterior) in 2 conditions
- McGurk stimuli w/ male voice and face
- McGurk stimuli w/ female voice and face
- Analysis results:
- TMS delivered to STS reduced the chance of fusing auditory and visual signals of McGurk stimuli
- TMS delivered to the control site did not alter the perception
- Ppl's reports suggest auditory input dominated visual input (they heard /ba/ instead of the McGurk fusion)
- TMS only disrupted the McGurk effect when it was delivered to the STS b/w -100 ms (100 ms b4 showing the McGurk stimuli) and +100 ms
- Conclusion: left posterior STS is responsible for auditory-visual integration during speech perception
- Why do most ppl perceive a blend of the syllable /da/ in McGurk effect?
- 2 streams in visual processing
- Double dissociation showing 2 visual streams can be selectively impaired
- Effects when “what stream” is damaged
- Effects when “how stream” is damaged
- Double dissociation & speech processing
- 2 types of impaired abilities
- transcortical sensory aphasia
- conduction aphasia
- What does this suggest about the dual stream model?
- Double dissociation for auditory monitoring & comprehension
- Auditory comprehension task
- Auditory monitoring task
- Miceli et al 1980 - Gave auditory comprehension and discrimination/monitoring tasks to aphasia patients
- auditory comprehension task: 6 pics
- Result
Box 5.3: The Neural Substrates of Auditory–Visual Integration During Speech Perception: A Combined fMRI and TMS Study of the McGurk Effect
- McGurk effect: illusion in face-to-face speech perception; brain fuses auditory and visual signals
- Method: Present the audio recording of syllable /ba/ w/ video recording of face/mouth pronouncing /ga/
- Result: most ppl perceive a blend, the syllable /da/
- Explanation: the brain integrate 2 competing sensory signals by adopting an intermediate interpretation
- Alveolar /da/ is midway b/w labial /ba/ (sound) and velar /ga/ (visual)
A Double Dissociation Between Comprehension and Repetition: Initial Evidence for Separate Streams of Speech Processing
- Visual processing occurs at occipital lobe, then it splits into 2 channels
- Channel 1: enters the ventral temporal cortex
- Aka “what” path; provide info on shape, color, texture to recognize the object
- Channel 2: runs dorsally, enter superior parietal cortex to the premotor cortex
- “how” path: responsible for visual-motor transformations, help coordinate bodily interaction w/ objects
- Ex. reach out and grasp apple
- Evidence
- Study: double dissociation
- 2 visual streams can be selectively impaired
- Damage to “what” path disrupts the ability to perceive and identify visually presented objects
- Ex. Patient DF cannot say if a pencil is oriented vertically or horizontally
- But she can reach out and grasp it
- Damage to “how” stream disrupts the ability to act appropriately on visually presented objects; but patients can still recognize them
- Ex. patient w/ optic ataxia
- They aim at the wrong direction to reach and grasp objects
- But they can recognize the object perfectly
- double dissociation cases in speech processing
- Ex. focal brain damage can selectively impair comprehension (knowing “what” is said/content) or repetition (knowing “how” it is said/vocal action)
- Ex. patient w/ transcortical sensory aphasia: can’t understand meanings in words and sentences; but can perfectly repeat words and sentences
- Ex. patients w/ conduction aphasia: understand perfectly; terrible at repetition
- This suggests that after early cortical stages of speech perception, info is further processed in 2 separate streams
- Route 1: link phonological representations w/ lexical semantic system
- Route 2: link phonological rep w/ motor articulatory system
- IOW: Dual Stream Model
double dissociation studies for auditory comprehension tasks and auditory monitoring task
- Auditory comprehension task: word-pic matching
- Auditory monitoring task: discriminate and identify phoneme
- Ex. Miceli et al 1980
- Gave auditory comprehension and monitoring tasks to 70 aphasia patients
- Comprehension task: match words/pics
- 6 pictures:
- the target
- semantically related distractors
- phonologically related distractor
- 3 unrelated distractors
- Discrimination/monitoring task: make same-diff judgements on pairs of syllables
- Set: prin,trin,krin,brin,drin,grin
- Results: double dissociation
- Some were perfect at both tasks
- Some did poorly at both tasks
- Some were good at the comprehension task but poor at the discrimination task
- Some were good at the discrimination task but poor at the comprehension task
- Summary: Some ppl do well in auditory comprehension tasks, but poorly at auditory monitoring tasks
- Ex. matches “cat” w/ a “cat” picture ; can’t tell apart “cat” vs “cot”
- Ventral stream - aka
- fx
- 2 functional-anatomical components
- other connection?
- location
- LH bias?
- Lexical interface
- fx
- 2 views for mapping process
- Lemma
- Evidence: lexical interface in pMTG and pITG w/ LH bias
- Patients w/ Wernicke aphasia
- Patients w/ transcortical sensory aphasia
- Boatman et al 2000 - Examine how electrical interferences at diff sites influenced task performance
- 2 main results
- 2 modifications to model
- Dronkers et al 2004 - Showed lexical interface depends largely on left pMTG
- What is CYCLE-R?
- Results - role of pMTG
- patient w/ corpus callosum severed - they can still understand some words
- What does this suggest?
- The combinatorial Network
- What is it?
- 2 things it implements
- Evidence
- Rogalsky and Hickok 2009 - examine if a portion of the left lateral ATL responds to compositional semantics, while another portion responds to syntactic structure
- Methods
- 2 tasks
- Key finding
- 2 suggestions
- Summary
- Ventral path 2 parts
- Lexical
- fx
- location
- Combinatorial
- fx
- location
The Ventral “What” Stream: From Sound to Meaning
- The “what” stream fx
- map sound to meaning
- form integrated meanings of complex speech (ex. phrase, sentence)
- 2 functional-anatomical components
- Lexical interface:
- Connected to phonological network
- Location: posterior MTG, ITS
- LH bias
- Combinatorial network
- Location: anterior MTG, ITS
- LH dominance
- The Lexical Interface
- maps phonological structures (from the phonological network) onto semantic structures
- NOTE: it does not itself store the meanings of words
- mapping process - 2 views
- One-stage view
* word meaning (ex. concept of cat) projects to phonological representation (ex. /kæt/)
* then to a more specific phonological representation that spells things out (ex. /k/, /æ/, /t/)
- Two-stage view
* Additional lv: lemma
* Lemma: indicates grammar category (ex. cat is a type of noun); bridge b/w semantics and phonology,
- Evidence: lexical interface in pMTG and pITG w/ LH bias
- Study: Patients w/ Wernicke aphasia w/ the worst comprehension deficits
- They tend to have lesions on the left pMTG
- Study: Patients w/ transcortical sensory aphasia
- Have damaged pMTG and pITG
- Their understanding of spoken words, phrases, and sentences is severely impaired
- MP: These impairments affect the neural mechanisms that map sound to meaning
- Boatman et al 2000
- 6 patients
- Drs implanted an electrode array over the left lateral cortex
- Examine how direct electrical interferences at diff sites influenced performance on 7 tasks
- Method
- Send electrical current b/w 2 adjacent electrodes, for 5 s
* Tested 81 electrode pairs per patient
- Results
- Stimulating 29/81 electrode pairs triggered temporary (short-term) transcortical sensory aphasia
- Most of the electrodes were in pMTG
- Also interfere w/ auditory comprehension (can hear, don’t understand)
- Interfere w/ oral reading
- Ex. phonemic paraphasia (say orly, not nearly)
- Ex. semantic substitutions (say stick, not pencil)
- Stimulating 19/29 critical sites -> impaired oral object naming
- Stimulating 10/29 critical sites -> no effect on naming
- IOW: semantic knowledge is not affected
- This showed disruption can happen b/w LH phonology and lexical semantic processing in patients
- Supports the Dual Stream model: The ventral stream has lexical interface, relays b/w sound and meaning during speech comprehension
- Dronkers et al 2004
- Showed lexical interface depends largely on left pMTG
- 65 chronic stroke patients w/ LH lesion
- Method: Did Curtiss-Yamada Comprehensive Language Evaluation -receptive (CYCLE-R)
- 11 subtests on sentence-pic matching
- Simple to complex sentences
- Simple: ex the clown has a balloon
- Complex: ex the girl is kissing the boy that the clown is hugging
- Analysed performance and lesions site w/ voxel-based lesion-symptom mapping
- Results: the worst deficits were strongly associated w/ left pMTG damage
- Patients w/ damage to this area did normally on tasks using the simplest sentence type, but poorly on the others
- They also failed 3 comprehension tasks in the Western Aphasia Battery
- Conclusion: left pMTG plays a key role in understanding words
- But the data can’t tell if pMTG contributes to conceptual-semantic or phonological aspects, or linking b/w form and concept
- Boatman et al 2000 (see abv) showed pMTG links form and concept
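- Rough sketch (mine) of the voxel-based lesion-symptom mapping logic: at each voxel, compare scores of patients w/ vs w/o a lesion there; random data and scipy's t-test stand in for the real pipeline, which also corrects for multiple comparisons:

```python
# Rough sketch of voxel-based lesion-symptom mapping (VLSM). Random lesion
# masks and behavioural scores stand in for real patient data; a real
# analysis would also correct for multiple comparisons.
import numpy as np
from scipy import stats

n_patients, n_voxels = 65, 1000
lesions = np.random.rand(n_patients, n_voxels) > 0.8   # binary lesion masks
scores = np.random.rand(n_patients)                    # comprehension scores

t_map = np.full(n_voxels, np.nan)
for v in range(n_voxels):
    hit, spared = scores[lesions[:, v]], scores[~lesions[:, v]]
    if hit.size > 1 and spared.size > 1:
        # large t: patients lesioned at this voxel score worse than the rest
        t_map[v] = stats.ttest_ind(spared, hit, equal_var=False).statistic

print(np.nanmax(t_map))
```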
- Study: patient w/ corpus callosum severed – they can understand some words
- Suggest there may be bilateral capability in lexical and semantic access
- The combinatorial Network:
- the lexical interface maps sound to meaning → then sends it to combinatorial network @ the lateral ATL (anterior temporal lobe), LH bias
- draws on semantic and syntactic info to construct the integrated meanings of phrases and sentences
- Evidence
- fMRI, PET study: left lateral ATL respond more to intelligible, correct sentences than unintelligible multi-words
- Rogalsky and Hickok 2009
- some studies suggest one part of the left lateral ATL is more sensitive to compositional semantics and another part to syntactic structure → this study examines that
- fMRI study
- Identified region of interest (ROI) in left ATL
* Ppl passively listen to nouns, record which voxels were active
- Ppl did 2 tasks
* Task 1: Ppl listened to sentences, then pressed a button when they detected a semantic anomaly (Ex. “the bb spilled some carpet on the milk”)
* Task 2: Ppl listened to sentences and pressed a button when they detected a syntactic anomaly (Ex. “the plumber w/ the glasses were installing the sink”)
- Rs discarded anomalous sentences; only looked at the normal sentences; this ensures the ROI activity differences are not due to diff in sentences
- Finding: ATL ROI was equally sensitive to semantic and syntactic task
- This suggests the left lateral ATL implements combinatorial network
- Suggests the semantic and syntactic features are processed in an interactive, not segregated manner
Summary
- Ventral path in dual stream model: speech comprehension
- 2 parts
- Lexical interface: maps sound to meaning
* Location: pMTG and pITG bilaterally, LH bias
- Combinatorial network: use syntactic and semantic info, to integrate the phrases/sentence meaning
* Location: ATL, LH bias
- 2 modifications to model
- ATL may be a LT storage site for words & may connect a word's features to its core content
* Ex. connect the visual image and function to the word “spoon”
* This info is then sent to the combinatorial network in ATL
- The ventral stream operates w/ the dorsal stream to work out morphology and syntax
- Dorsal pathway aka?
- 4 fx
- 2 components
- connections?
- Location
- The Sensorimotor Interface
- left Planum temporale (PT) aka?
- fx
- Connections
- When is Spt active?
- Evidence
- Hickok et al 2003 - ppl did the “speech” condition task
- Speech condition task
- Results
- Spt area
- Anterior STG
- what does this suggest
- Part 2
- Music condition task
- Spt and STG results
- What does this suggest?
- Pa and Hickok 2008 - fMRI study showed that Spt only regulates vocal tract
- Specific population?
- Music condition task
- Play condition task
- Main finding
- Aphasia studies
- What causes conduction aphasia?
- Conduction aphasia
- How does conduction aphasia affect dual stream model?
- Buchsbaum et al 2011 - overlaid the lesions sites, added more data from patients w/ conduction aphasia
- Main finding
- logopenic progressive aphasia
- Articulatory network
- 4 locations
- 2 fx
- Auditory-verbal STM
- Digit-span task
- Method
- When STM does it engages?
- auditory-verbal STM process - 2 steps
- frontal articulatory network
- fMRI studies: when is frontal articulatory network activated
- 4 fx of motor system
- Prev “double dissociation b/w comprehension and monitoring”
- 2 main results
- What type of aphasia these patients hv?
- What type of lesions these patients hv?
- 2 main reasons that explain results
- TMS study: TMS lips and tongue in left PMC to see if this enhance certain fx
- 2 Hypothesis
- Key result
- What does this suggest?
- Pulvermuller et al 2006 - measure motor brain area activity when ppl listen passively to speech
- Results
- 2 problems from all these studies
- Hickok et al defence to 2nd problem
- Summary
- Dorsal route main fx
- 3 main steps
- 3 main fx of articulatory network
- Dorsal pathway = “how” system
- Dorsal stream fx
- It maps sounds onto action (how it is articulated w/ muscles)
- help learn language by controlling muscles to imitate speech patterns
- foundation of phonological loop (aka auditory-verbal STM)
* Memory is kept alive by covert repetition
- Helps speech perception
- 2 main fx-anatomical components
- Sensorimotor interface:
* Connects to phonological network and spectrotemporal analysis
* parietal-temporal Spt; LH bias
- Articulatory network
* Connects to sensorimotor interface
* Located in pIFG, anterior insula; LH bias
- The Sensorimotor Interface
- Located in left Planum temporale (PT), aka area Spt (Area Spt: sylvian parietal-temporal)
- a sensorimotor integration system that uses auditory info to help guide movement of vocal tract
- Spt connects the word’s “sound image” in middle STS (i.e. phonological network) w/ word’s “motor image” in posterior FL (i.e. articulatory network)
- fMRI Study: Spt is active in speech perception, speech production (covert & overt)
- Hickok et al 2003
- Method: “speech” condition task
- Ppl heard 3-s meaningless sentence
* Real nouns and verbs were replaced w/ pseudowords
- Ppl covertly rehearsed the sentences for 15 s
- Ppl heard another 3s meaningless sentence
- Ppl rested for 15s
- Results
- Spt area was activated during 2 auditory stimulation phases, and covert rehearsal phase
- It dropped to baseline during rest phase
- Anterior parts of dorsal STG were activated in the 2 auditory stimulation phases, but not in the rehearsal phase
- IOW: these areas help speech perception, but are not part of the sensorimotor interface
- Part 2 of study
- Examine if Spt area contributes to sensorimotor coordination for other vocal sounds/actions
- Method w/ “music condition”
- Ppl heard a 3 s unfamiliar melody
- Ppl hummed melody for 15s
- Ppl heard another 3s unfamiliar melody
- Ppl rested for 15s
- Results: similar to speech condition
- Spt and anterior STG activated
- This suggests that Spt is a sensorimotor integration system not just for phonological material but for any “doable” vocal sounds
- Pa and Hickok 2008
- fMRI study showed that Spt only regulates vocal tract
- Method – skilled pianists do music condition and play condition tasks
- Music condition: similar to music condition procedure above
- Ppl heard a 3 s unfamiliar melody
- Ppl hummed melody for 15s
- Ppl heard another 3s unfamiliar melody
- Ppl rested for 15s
- play condition
- Ppl heard a 3 s unfamiliar melody
- Ppl imagined playing the melody on a keyboard
- Ppl heard another 3s unfamiliar melody
- Ppl rested for 15s
- Results
- Music condition: more activation in area Spt in auditory stimuli and covert rehearsal phases, but not in rest phase
- Play condition: more activation in anterior intraparietal sulcus in the auditory stimulation and covert rehearsal phases, but not in the rest phase
- This aligns w/ other findings
- Anterior intraparietal sulcus is a sensorimotor interface for perceptually guided actions (ex. playing piano)
- MP: Area Spt maps sounds and actions for vocal tract only
- Aphasia studies
- Damage to left supramarginal gyrus and area Spt leads to conduction aphasia
- Conduction aphasia: language comprehension intact; language production is impaired
- Specifically: production is distorted by phonemic paraphasias (ex. say “tephelon”, not “telephone”), and repetition is impaired
- For Dual Stream model, this aphasia damages sensorimotor interface
- Comprehension is preserved as there is no lesion on ventral stream
- There’s phonemic paraphasia (esp for long, complex, low f words)
- This is b/c the relay station is damaged, and the auditory info is stuck, can’t help guide movement of vocal tract
- Buchsbaum et al 2011
- Part 1: rs overlaid lesions sites of 15 patients w/ conduction aphasia
- Results: common damaged region in 85% of cases is in the left temporoparietal area, incl area Spt
- Part 2: Combined imaging data from 105 healthy ppl from studies similar to Hickok (see abv)
- Combined analysis showed 50% of ppl showed sig activation in area Spt during auditory stimulation (encoding) and covert rehearsal phases of the tasks
- There’s only 50% b/c there are indiv diff in neuroanatomy in PT and area Spt
- Part 3: rs superimposed lesion data and fMRI data
- There’s 85% lesion overlap and sig activation during encoding and rehearsal task at the area Spt
- These findings support the hypothesis that conduction aphasia is an impairment of the sensorimotor interface in the dual stream model
- NOTE: logopenic progressive aphasia is associated w/ gradual atrophy in area Spt
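- Minimal sketch (mine) of the lesion-overlay step in Part 1, w/ random binary masks standing in for real lesion maps:

```python
# Minimal sketch of a lesion-overlap map: stack binary lesion masks and
# find voxels damaged in most patients. Random masks are placeholders.
import numpy as np

n_patients, n_voxels = 15, 5000
masks = np.random.rand(n_patients, n_voxels) > 0.7   # 15 binary lesion maps

overlap = masks.mean(axis=0)          # proportion of patients lesioned per voxel
print(overlap.max())                  # voxels near 1.0 are damaged in nearly all cases
print((overlap >= 0.85).sum())        # count of voxels lesioned in >= 85% of patients
```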
- The Articulatory Network
- Location: left pIFG (Broca’s area), premotor cortex and PMC (which control the vocal apparatus), and anterior insula
- fx: auditory-verbal STM, and some speech perception
- Auditory-verbal STM: aka phonological loop, keep phonological info in mind for a short time, used in covert rehearsal
- Digit-span task: lab test that determine the longest string of random digits a person can repeat correctly
- Most ppl: 7 items (ex. telephone #: 7 digits)
- Used when you need to remember driving directions
- This uses auditory-verbal STM when you rehearse it covertly
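- Minimal sketch (mine) of the digit-span procedure; the simulated participant w/ a fixed capacity is an assumption standing in for a real subject:

```python
# Minimal sketch of a digit-span test: present ever-longer random digit
# strings and record the longest one reproduced correctly. A simulated
# participant with a fixed capacity stands in for a real person.
import random

def simulated_recall(digits, capacity=7):
    """Pretend participant: perfect recall up to `capacity` digits."""
    return digits if len(digits) <= capacity else []

def digit_span(max_len=10):
    span = 0
    for length in range(3, max_len + 1):
        digits = [random.randint(0, 9) for _ in range(length)]
        if simulated_recall(digits) == digits:
            span = length
        else:
            break
    return span

print(digit_span())  # ~7 for most ppl, per the note above
```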
- Neural substrates of auditory-verbal STM
- speech we perceive activates phonological reps / auditory-verbal STM in the STS bilaterally
- These phonological reps are maintained in the dorsal stream by frontal articulatory network
* frontal articulatory network: controls and refreshes the phonological rep via the sensorimotor interface (in Spt area)
- fMRI studies: frontal articulatory network were activated during covert rehearsal
- Evidence: motor system helps perceive, recog, program, and execute actions
- Studies: prev “double dissociation b/w comprehension and monitoring”
- Some brain damaged patients do well on auditory comprehensions tasks (ex. word-pic matching: match the word “cat” w/ cat pic)
- But they suck at auditory monitoring tasks
- (ex. phoneme discrimination and identification – can’t determine if “cat” and “cot” are diff words; OR if “cat” has the vowel /æ/)
- They tend to hv Broca’s aphasia or conduction aphasia; lesions are in left frontal or left frontoparietal regions
- IOW: these findings show that monitoring tasks rely on the dorsal stream
- i.e. need to pay attention to phonological structure of utterances
- 2 reasons
- They need auditory-verbal STM to keep relevant phonological rep online to make discrimination/identification (monitoring task)
- These tasks involve segmented syllables in the phonemes; this needs the articulatory network
- TMS studies show that auditory monitoring tasks use articulatory network
- TMS targeted Broca’s area, primary motor cortex/PMC, premotor cortex
- D’Ausilio et al 2009
- Method:
- Identify the areas for controlling the lips and tongue in left PMC in ppl
* NOTE: lip area is superior to tongue area as in the homunculus
* They located it using the coordinates of the peak activations in fMRI
- Ppl did task
* On each trial, present 1 of 4 speech sounds
- 2 produced w/ lips (/bæ/ or /pæ/)
- 2 produced w/ tongue (/dæ/ or /tæ/)
* Ppl were asked to identify each sound by pressing 1 of 4 buttons
* To avoid ceiling effects (otherwise not hard enough), sounds were embedded in 500 ms of white noise -> ~75% correct responses
* For 60/80 trials, 2 TMS pulses were delivered to the lip or tongue area
- TMS 1 at 100 ms after noise onset
- TMS 2 at 150 ms after noise onset (i.e. 50 ms b4 the consonant is presented)
- Assumption: pulses enhance activity in stimulated areas
- so, rs predict stimulating the lip area will improve telling apart the labial sounds (/bæ/ and /pæ/)
- Stimulating tongue area improve telling apart dental sounds /dæ/ and /tæ/
- Results support predictions
- In the lip area,
- trials w/ TMS led to faster RT (lower than 100) to recognize lip-produced sounds compared to no TMS
- Had slower RT (100+) to recognise tongue-produced sounds
- In tongue area: opposite
- Trials w/ TMS -> faster RT to recognize tongue-produced sounds
- Slower RT to recog lip-produced sounds
- Stimulation of motor area (tongue or lip) help identify speech sounds produced by the respective area; inhibits identification produced from other area
- This supports the idea that the articulatory network helps us pay attention to the phonological makeup of perceived speech
- Articulatory network is also engaged when ppl listen passively to utterances
- Pulvermuller et al 2006
- Measure motor activity in ppl’s brain while doing 3 tasks
- Lip and tongue movements
- Silent production of lip-related (/pa/) and tongue-related (/ta/) sounds
- Passive perception of lip-related (/pa/) and tongue-related (/ta/) sounds
- Findings:
- Some motor areas were engaged during all 3 tasks
- MP: Articulatory network is also engaged when ppl listen passively to utterances
- Problem:
- studies above did not show that motor responses to speech are sig diff from those to other sounds
* Ex. Watkins et al 2003 - report motor activations triggered by speech sounds are not sig greater than those triggered by nonverbal sounds (ex. car engine, breaking glass)
- Articulatory network may modulate passive perception of speech, but may not help comprehension
* Hickok et al defence - there’s evidence:
- Large left frontal lesions reduce speech production
* But there is only an 8% error rate discriminating phonemic pairs (ex. match “bear” to pic)
- Bb have speech perception b4 they have speech production
Summary
- Dorsal route: maps speech perception onto speech production
- Auditory rep in dorsal STG and mid STS (spectrotemp analysis & phonological network) are transmitted to area Spt (sensorimotor interface)
- Area Spt transform input; then send signals to articulatory network at left posterior FL
- Articulatory network
* Helps acquire auditory based speech-motor patterns
* Helps auditory-verbal STM
* Help perceptual processing in speech
- Scott’s model of turn taking
Box 5.4: Might Articulatory Activation During Speech Perception Facilitate Turn-Taking?
- Scott et al 2009
- Scott’s model of turn taking/ Hypothesis: dorsal pathway may help speech perception; it tracks rhythm and rate of talkers, which helps smooth turn-taking
- This H is consistent w/ findings
- Ex. during convos, ppl involuntarily align their conceptual and syntactic structure, breathing and pronunciation
- Turn-taking happens rapidly (~1/2 s)
Lec
- MRI - fx
- pro
- con
- How do we combine MRI + Neuropsychology methods?
- fMRI - fx
- pro
- con
- Why is temp resolution shit? - 2 reasons
- How do we combine fMRI + cross-species comparisons?
- Dogs: LH fx, RH fx
- How does praise their trigger reward system?
- What is the cross-species similarity b/w dog and humans?
- What does this suggest about the evolution for language acquisition?
MRI and fMRI
- Magnetic Resonance Imaging (MRI): images of brain structure (static pic)
- Highly popular technique - Increasing # of articles using MRI
- Pros: Excellent Spatial Resolution
- Cons: Shows structure, not how it functions
- We can combine MRI + Neuropsychology methods
- Back then, we looked at patients w/ Broca’s/Wernicke’s aphasia post mortem
- MRI allows us to see the damage in vivo
- functional Magnetic Resonance Imaging (fMRI): try to correlate images to neural activity (e.g., increased blood flow to a brain region responsible for a language ability)
- Advantages: Excellent Spatial Resolution
- Disadvantages: Poor temporal resolution, costly
- Why is temp resolution shit?
- 1 We are not directly measuring neural activity; we are only measuring the correlate (i.e. blood flow)
- 2 When a brain area is processing smth intensely, neurons fire, which requires energy and resources; after that, the lost resources need to be replenished
- It takes time to fire, send signal for more blood flow and oxygen in the brain regions → IOW: signal lags
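- Rough sketch (mine) of the lag: convolve a brief neural event w/ a slow hemodynamic response function (this simplified gamma-shaped HRF is an assumption, not the canonical one):

```python
# Rough sketch of why the BOLD signal lags neural activity: the measured
# signal is the neural event convolved with a slow hemodynamic response
# function (HRF). This gamma-like HRF and its parameters are assumptions.
import numpy as np

dt = 0.1                                   # seconds per sample
t = np.arange(0, 30, dt)
hrf = (t ** 5) * np.exp(-t)                # crude gamma-like HRF, peaks ~5 s
hrf /= hrf.max()

neural = np.zeros_like(t)
neural[int(1 / dt)] = 1.0                  # brief neural event at t = 1 s

bold = np.convolve(neural, hrf)[: t.size] * dt
print(t[np.argmax(bold)])                  # BOLD peaks ~6 s, well after the 1 s event
```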
- We can combine fMRI and cross-species comparisons
- Similar to humans, dogs use LH to process speech and RH to process intonation
- Praise triggers the reward system only if the word and intonation match
- There are cross-species similarities for language-related abilities (i.e. LH bias)
- This suggests these abilities evolved a long time ago, as they exist in species that are very different from us
- Lec
- 2 evo approaches
- How does studying genetically modified “dyslexic” mice help us understand language?
- PET - 2 step process
Evolutionary Approaches / Cross-species Comparisons
- 1 evolutionary trees (ex. dogs and humans) show how different language-related abilities develop.
- 2 we can use invasive investigations on animal models that are not typically examined in humans
- (e.g., Holly Fitch @ UConn - genetically modified “dyslexic” mice)
- There’s a gene that contributes to dyslexia in humans; mice are genetically modified to carry it
- NOTE: mice don’t read, but they have visual and memory abilities that align w/ some aspects of dyslexia
- Rs examine how this gene enables or disables certain cog fx, to understand how dyslexics suffer language impairments and to find ways to remediate them
fMRI’s earlier cousin: PET
- PET: Positron Emission Tomography
- Similar principles
- Inject a radioactive tracer; the tracer (bound to oxygen) decays and releases positrons, and the ring of detectors around the head picks up the resulting radiation
- The detector shows which brain locations accumulated the most tracer; this is the proxy for neural activity
- Lec
- MEG
- 3 pros
- EEG issue
- MEG
- how does it have good spatial and temp resolution
- Study - Examine how monolingual and bilingual bb process spoken language stimuli
- Methods 4 steps
- MEG vs fMRI
- 3 main results
- 2 methods best in spatial and temporal resolution
- Major difference
- 3 main cons of MEG
MEG/ Magnetoencephalography
- 1,2 Combines the strengths of both fMRI and EEG — good spatial and temporal resolution
- 3 It’s one of the best techniques, b/c other methods that have good temporal and spatial resolution are invasive
- MEG vs EEG
- EEG detects electrical activity on scalp
- Cons: bones, brain tissue, and skin conduct electricity, so it is difficult to trace where the signal came from
- MEG monitors magnetic activity, which radiates from the source
- Magnetic activity is not affected by the bones, brain tissue, skin like EEG
- Rs can create a math model to find out the source of signal
- IOW: good spatial resolution
- When neurons fire, there’s electrical activity (i.e. EM source). You can detect magnetic activity instantaneously
- IOW: good temporal resolution
- Study: bilingualism in babies using MEG
- Examine how monolingual and bilingual bb process spoken language stimuli
- Method
- Bring monolingual bb (Eng only) and bilingual bb (Eng and Spanish) to lab
- Digitize the bb’s skull shape, so they can determine the location of the brain matter, which allows them to locate the brain activity
- Bb in MEG room
* MEG: Cooled w/ liquid He, but super quiet
* fMRI: Cooled w/ liquid He, but super noisy
- Listened to /da/ and /ta/ sounds
* Some were Eng sounds, some Spanish, some common to both languages
- Results 1:
- Monolingual: specialized to process English only, not Spanish
- Bilingual: specialized to process both Eng and Spanish
- This suggests, by 11 mo, bb’s brains are specialized to process whatever language is present
- Result 2: Compared to monolingual bb, bilingual showed more activity in prefrontal and orbitofrontal cortex
- These areas are associated w/ EF, and are active when bilingual adults are speaking back and forth in 2 languages
- These areas are also active when bilingual bb are listening to a stream of sounds w/ Eng and Spanish sounds
- From birth, bilingual bb practice switching b/w 2 languages
- This may be related to increased brain activity in areas for EF
- IOW: by 11 mo (~1 yr), bb have sophisticated representations of their own language
- Monolingual bb tune out non-English features
- Bilingual bb are sensitive to features common to both languages; engage PFC more often
Overall summary of methods and their temp + spatial resolution
- EEG: poor spatial resolution, good temporal resolution
- fMRI: good spatial, poor temporal
- Best in temp and spatial
- iEEG: intracranial EEG
- Open the scalp and skull, place electrodes directly on/in the brain
- You know where you are working in the brain
- MEG: non-invasive
- Why not use MEG?
- Cost
- Very sensitive equipment; needs magnetic shielding
- Can’t pinpoint deep brain structures (ex. thalamus), but ok for cortical areas
- Lec
- process of direct cortical stimulation - 2 main steps for epileptic patients
Direct Cortical Stimulation
- Feed electrical activity to the brain
- Relatively rare
- open scalp and skull
- Stimulate brain to locate where language is represented and source of epileptic seizures
* We want to avoid operating on the brain area for language (and other vital areas)
* Ex. For some epileptic patients, we need to remove brain tissue in a deep brain structure
- To get to the deep brain, we would damage other tissue along the way
- We want to avoid language areas, as language is key in our life
- This method helps locate where the language areas are, and helps us devise an alternative path to remove the affected brain region while bypassing the language areas