Wk4b - CI Speech Processing Strategies Flashcards
What is a speech processing strategy?
An algorithm within the CI speech processor that converts sounds picked up by the microphone into electrical signals sent to the implant electrodes
Speech processing strategies turn a broadband acoustic signal into 12-22 _____ ______ pulse trains
Amplitude modulated - one pulse train for every CI channel/electrode
- Med-EL has the fewest (12 electrodes) and Cochlear Americas has the most (22)
How many accredited CI manufacturers are currently in Canada?
4: Advanced Bionics, Cochlear Americas, Med-EL, Oticon Medical
In an electrode array, we need both positive and negative sources to stimulate. How many of each?
Each electrode driver contains at least one positive and one negative source; there can be many positives, but only one negative
What does the number of electrode drivers tell us?
How many electrodes can fire at once; may vary from 1 (Cochlear Americas) to 16 (Advanced Bionics)
What is the maximum rate?
The max pulses per second that these CIs can fire
- Oticon Medical has the lowest (19000)
- Advanced Bionics has the highest (83000)
What does the DSP-unit (of the CI’s external unit) do?
- receives mic input
- extracts features of the sound
- converts features into bitstream
- contains maps (pt specific info)
What does the External Unit (Speech Processor) of the CI consist of?
The DSP unit and Power Amplifier
In CI’s, first the signal is processed through the DSP unit, then the ____ ____, then sent to a __-______, which sends the signal through the skin to a receiver in the internal unit
Power amplifier; Radio Frequency (RF)-transmitter
What does the internal unit of a CI consist of?
RF-receiver
Hermetically sealed stimulator
Telemetry System
What does the Hermetically Sealed Stimulator of the CI’s Internal Unit do?
- receives power from RF-signal
- decodes RF-bitstream
- conversion to electric currents
What is the function of the telemetry system of the CI’s Internal unit?
To measure impedances and eCAPs (action potentials)
What does eCAP stand for?
Electric Compound Action Potential
What are Back Telemetry and ECAPs used for?
- to check the status of the internal unit (e.g. voltages)
- to measure and monitor critical info about the electrode-tissue interface (e.g. electrode impedance (non-audible stimulus), field potential, neural responses)
- to conduct neural response telemetry (NRT) (AN response to electric stimulation) (like ABR, but response usually buried in electric artifact)
T/F: It is possible for an audiologist to overwrite a CI’s safety checks
False - the CI performs these automatically and they cannot be overwritten
What are some commonly implemented safety checks in CIs?
- stimulation parameter check
- max charge check
- charge balance check
- parity check to detect bit error from RF-transmission or decoding
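As a rough illustration of the charge-balance check above, a biphasic pulse is balanced when amplitude × duration sums to zero across its phases. This is only a sketch with made-up function names, units, and tolerance; real firmware logic is manufacturer-specific:

```python
# Sketch of a charge-balance safety check (illustrative, not any
# manufacturer's actual firmware). A biphasic pulse is charge-balanced
# when current * phase_duration sums to zero across its phases.

def charge_balanced(phases, tolerance=1e-12):
    """phases: list of (current_uA, duration_us) tuples.
    Net charge must be ~0 to avoid tissue damage."""
    net_charge = sum(current * duration for current, duration in phases)
    return abs(net_charge) <= tolerance

# Symmetric biphasic pulse: -100 uA for 25 us, then +100 uA for 25 us
print(charge_balanced([(-100, 25), (100, 25)]))   # True
# Unbalanced pulse: the anodic phase carries less charge
print(charge_balanced([(-100, 25), (80, 25)]))    # False
```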
What does CIS stand for?
Continuous Interleaved Sampling
- most modern strategies are based on this method
What speech features are important for the speech processor?
Voicing
Pitch
Spectrum/Formants
Pitch is represented by the fundamental frequency, ____. What are the problems with encoding it?
F0
We could use pulse rate, but it is unlikely we would have an electrode so far apically, and pitch alone is not enough to understand what is being said
Why can’t we encode F2 and F3?
Their frequencies are >700 Hz, which is above the rate-pitch limit (250-500 pps)
- for F3 we would need to use place coding (electrode location)
Why must the charges be balanced in a CI?
To avoid tissue damage
What do we need to consider when proposing signal processing strategies? Name 3
- charge equalization to avoid tissue damage
- spread of electric field in fluid (poor freq selectivity)
- electric field interactions when stimulating 2 electrodes simultaneously
- monopolar vs bipolar stimulation
- preservation of AN function across CI users and across the array w/in one user
- location of electrode relative to modiolus
- narrow dynamic range (10-20 dB vs 100 dB in NH listeners)
How does the spread of the electric field impact frequency?
Poor frequency selectivity
What were the names of the 2 first speech processing strategies?
F0/F2
F0/F1/F2
What was the goal of the first speech processing strategies?
To avoid overloading the auditory system by extracting speech info explicitly (using only F0, F1, and F2)
How were the formants encoded in the first speech processing strategies (F0/F2 and F0/F1/F2)?
F0 encoded with pulse rate (b/c lower freq)
Formant freq encoded by electrode placement
With the old strategies (F0/F2, F0/F1/F2) how many electrodes were active at once?
One or two
one for pitch and one for the formant
Why are F0/F2 and F0/F1/F2 no longer used?
Poor performance
How do modern speech processing strategies differ from the original strategies?
They do NOT attempt to extract speech features explicitly
- they encode speech features implicitly by accurately representing the spectro-temporal structure of the speech signal
- i.e. extract temporal envelope in several independent freq bands
How does CIS work?
- divides incoming sound into several freq bands
- extracts temporal envelope in each band
- uses compressed version of envelope to modulate a fixed-rate train of pulses on each electrode
- pulses for each electrode are interleaved in time
How does CIS divide incoming signals into diff freq bands (step 1)?
By using a set of bandpass filters
- typically one freq band per electrode
What are the pros and cons of matching the frequency map to the electrode placement in the cochlea?
Pros: preserves natural BM place-to-frequency relationship
Cons: loss of low frequency info, since electrode cannot reach the apex
What are the pros and cons of mapping frequencies so that the entire speech frequency range is represented on the electrode?
Pros: All frequencies represented
Cons: Unnatural frequency-to-place mapping (compressed or shifted)
Which of the two theories on frequency mapping onto electrodes is better in practice? Why?
Mapping the frequencies so that the entire speech frequency range is represented, but in a condensed area of the cochlea
- better b/c CI listeners can adapt to some distortion, and frequency shifts still yield intelligible speech
HOWEVER, not all distortions can be understood
What did the frequency reversal simulation show us?
That too much distortion makes a speech signal unintelligible
Which scale do we use with CI’s: linear or logarithmic? Why?
Logarithmic; it is closer to the tonotopic organization of the cochlea
What is the typical range covered by the CIS bandpass filters? What determines the filter widths?
120-8000 Hz
The number of electrodes determines the filter width
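The logarithmic band allocation can be sketched by spacing band edges at a constant frequency ratio across the 120-8000 Hz range; the specific numbers and function name here are illustrative, not any manufacturer's actual filter bank:

```python
# Sketch: logarithmically spaced band edges over the typical CIS analysis
# range (120-8000 Hz). The number of electrodes sets the number of bands,
# so fewer electrodes means wider filters. Values are illustrative.

def band_edges(low_hz=120.0, high_hz=8000.0, n_bands=12):
    ratio = (high_hz / low_hz) ** (1.0 / n_bands)   # constant ratio per band
    return [low_hz * ratio ** i for i in range(n_bands + 1)]

edges = band_edges(n_bands=12)
print(round(edges[0]), round(edges[-1]))   # 120 8000
# Each band spans the same ratio (log spacing), not the same width in Hz:
widths = [edges[i + 1] - edges[i] for i in range(12)]
print(widths[0] < widths[-1])              # True: low-freq bands are narrower
```

Log spacing roughly mirrors the tonotopic (place-frequency) organization of the cochlea, which is why it is preferred over a linear scale.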
T/F: The acoustic frequency range of an electrode typically matches its Greenwood frequency
False
What is the second CIS step?
Envelope extraction
What two things can the band-pass filter output be broken down into? Which one does CIS use?
- Carrier (temporal fine structure)
- Amplitude modulation (envelope)
- CIS uses the envelope
How is the envelope used in CIS?
- each electrode is sent a pulse stream at a constant high rate (greater than or equal to 1000 pps)
- the amplitude of the pulses is modulated by the envelope of the corresponding bandpass filter's output
What are the two methods of extracting the envelope?
- Rectification and low-pass filtering
- Hilbert transform
Describe rectification and low-pass filtering (Method 1 of envelope extraction)
- The lower half of the filter output signal is erased (half-wave rectification: only the positive half-cycles are kept, b/c only positive peaks will trigger an action potential)
- Low-pass filter the half-wave rectified signal (maintains the shape of the envelope and discards the fine structure)
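Method 1 can be sketched in a few lines: keep only the positive half-cycles, then smooth with a low-pass filter. The sample rate, cutoff, and one-pole filter here are simplifying assumptions; real processors use higher-order filters per analysis band:

```python
import math

# Sketch of envelope extraction via half-wave rectification plus a simple
# one-pole low-pass filter (illustrative parameters).

def extract_envelope(signal, fs=16000, cutoff_hz=200.0):
    rectified = [max(s, 0.0) for s in signal]        # keep positive half only
    # One-pole IIR low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    env, y = [], 0.0
    for x in rectified:
        y += a * (x - y)
        env.append(y)
    return env

# 1 kHz tone amplitude-modulated at 50 Hz: the extracted envelope should
# follow the slow 50 Hz modulation, not the 1 kHz carrier.
fs = 16000
tone = [(1.0 + 0.5 * math.sin(2 * math.pi * 50 * n / fs))
        * math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs // 10)]
env = extract_envelope(tone, fs)
print(all(e >= 0.0 for e in env))   # True: envelope is non-negative
```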
Describe the Hilbert transform (Method 2 of envelope extraction)
- Generate a 90 degrees phase shifted signal “Y”(based on the original signal “X”)
- develop an envelope based on the equation: envelope = sqrt (X^2 + Y^2)
(- similar to modern FB cancellation, but not inverted)
(envelope is more precise with Hilbert compared to Method 1)
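The envelope = sqrt(X^2 + Y^2) step can be demonstrated directly. For this sketch the 90-degree-shifted partner Y is written down analytically (since the test signal is a known AM tone); a real processor would compute Y with an actual Hilbert transform, e.g. via an FFT:

```python
import math

# Sketch of the Hilbert-style envelope computation on a narrowband AM
# tone. Parameters and the analytic 90-degree shift are illustrative.

fs = 16000
f_carrier, f_mod = 1000.0, 50.0
n_samples = fs // 10

def am(n, phase_shift=0.0):
    t = n / fs
    a = 1.0 + 0.5 * math.sin(2 * math.pi * f_mod * t)   # slow envelope
    return a * math.sin(2 * math.pi * f_carrier * t + phase_shift)

x = [am(n) for n in range(n_samples)]
y = [am(n, phase_shift=-math.pi / 2) for n in range(n_samples)]  # 90 deg shift
envelope = [math.sqrt(xi * xi + yi * yi) for xi, yi in zip(x, y)]

# The recovered envelope matches the known modulator a(t) here, because
# sqrt(sin^2 + cos^2) = 1 leaves only the slow amplitude term:
expected = [1.0 + 0.5 * math.sin(2 * math.pi * f_mod * n / fs)
            for n in range(n_samples)]
print(max(abs(e - a) for e, a in zip(envelope, expected)) < 1e-9)  # True
```

This precision on narrowband signals is exactly why Hilbert outperforms rectification + low-pass filtering there; on wideband signals the sqrt(X^2 + Y^2) envelope retains fine structure and the advantage disappears.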
Which envelope extraction method is better for single tone waves?
Hilbert - the envelope is very precise
Does Hilbert work well with narrow band-width signals?
Yes
What type of signal does Hilbert not work well with?
Wideband signals - the extracted envelope follows half of the waveform, retaining lots of fine-structure info
- Hilbert works better with as many electrodes as possible (shallower insertion -> fewer, wider filter bands -> poorer results)
How does the cutoff of low-pass filtering impact envelope extractions?
- If we use a 50 Hz cutoff, envelope has limited amount of fine structure remaining (closer to Hilbert)
- 200 Hz cutoff, we get a good amount of temporal fine structure, with extracted envelope, which can be useful
Which is better: half-rectification and low-pass filtering OR Hilbert transform?
No clear winner
What is the third step of CIS?
Compression and conversion to current level
Why does the speech processing signal need to be compressed?
- the envelope levels follow the signal SPL, and may vary over >80 dB range
- the range of current levels b/w threshold (T) and max comfort level (C) is only 6-30 dB
THEREFORE we cannot convert amplitude to current levels directly
Can we encode the entire dynamic range into the available “room” between T and C?
No - too much, even with compression
Since we cannot encode the entire dynamic range of human hearing into electric hearing, which parts are discarded?
Anything below 20 dB
Anything above 90 dB (everything above this is set to the max amount)
(the remaining ~70 dB input range is still roughly twice that available to us electrically)
What does ASW stand for?
Adaptive Sound Window - an input dynamic range that changes with the environment
- adaptively reduces 75 dB range to 55 dB (25-80 in quiet and 45-100 in loud)
What are the steps involved in the compression and conversion to current level (Step 3 of CIS)?
- Omit <20 dB and set everything >100 dB to max
- Map the input dynamic range (IDR) to the adaptive sound window (ASW)
- Map ASW to electric dynamic range (compress those 55 dB to whatever is available, or T -> C range)
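The final mapping step can be sketched as: clip to the input window, then rescale into the T-C current range. All numbers here are made up (current units, T/C values, window edges), and a simple linear map stands in for the compressive (log-like) function a real processor would use:

```python
# Sketch of CIS step 3's level mapping with hypothetical values: clip the
# input to a 20-90 dB window, then map that range onto the electric
# dynamic range between threshold (T) and comfort (C) levels.
# Real maps are compressive (log-like); linear is used here for clarity.

def map_to_current(level_db, t_level=100.0, c_level=200.0,
                   idr_low=20.0, idr_high=90.0):
    """Map an acoustic envelope level (dB SPL) to a current level
    (arbitrary clinical units) between T and C."""
    level_db = min(max(level_db, idr_low), idr_high)   # clip to the window
    fraction = (level_db - idr_low) / (idr_high - idr_low)
    return t_level + fraction * (c_level - t_level)

print(map_to_current(20.0))    # 100.0 -> threshold
print(map_to_current(90.0))    # 200.0 -> comfort
print(map_to_current(120.0))   # 200.0 -> clipped to max
```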
What is an alternative to ASW?
- Discard everything below 20 dB
- Infinitely compress everything above 90 dB
- Set the M (most comfortable) and T
- Mapping the IDR to the electrical DR by compressing the remaining input levels to fit
- then, a sensitivity control will adjust the IDR, leading to more or less compression (e.g. b/w 40 and 80 dB, thereby reducing sensitivity) (this setting can be adjusted by the AUD or sometimes the user)
How does “volume” in CIs differ from HAs?
Volume in CIs doesn’t make soft sounds louder, it increases the M level
e.g. decrease M from 1 to 0.8; output stays b/w M and T
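This M-level view of volume can be sketched with hypothetical T/M values in arbitrary clinical units; the function and numbers are illustrative only:

```python
# Sketch of CI "volume" as an M-level scale (illustrative): volume does
# not amplify soft sounds; it rescales the upper end of the output range.

def apply_volume(current, t_level=100.0, m_level=200.0, volume=1.0):
    """Rescale the output so it spans T to T + volume*(M - T)."""
    top = t_level + volume * (m_level - t_level)
    fraction = (current - t_level) / (m_level - t_level)
    return t_level + fraction * (top - t_level)

print(apply_volume(200.0, volume=0.8))   # 180.0: max output reduced
print(apply_volume(100.0, volume=0.8))   # 100.0: threshold unchanged
```

Contrast with a hearing aid, where lowering the volume also reduces audibility of soft sounds; here the threshold end of the range stays fixed.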
Summarize Step 3 of CIS
- sound wave level obtained at mic
- set IDR
- implement ASW (alternative: manual sensitivity or skip)
- map this reduced envelope to electric DR (knowing T, C and M)
- simple compensation for summation effect is volume control through M-level reduction
- output is now 12-22 channels in current units
What are 2 reasons against sending the 12-22 channel output (end of CIS Step 3) to the electrodes?
- safety
- channel interaction
What is step 4 of CIS?
Generation of modulated pulse trains
- need to multiply a biphasic pulse train (e.g. 1000 pps) with amplitude corresponding to the envelope from Step 3
How do we avoid electrode interaction?
Interleaved sampling and presentation (e.g. one full stimulation cycle across all electrodes lasts 1 ms, i.e. 1/pulse rate)
- e.g. electrode 1 stimulated - pause - electrode 2 - pause - electrode 3….
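The interleaving above can be sketched by giving each electrode its own time slot within the pulse period, so no two electrodes ever fire simultaneously (electrode count, rate, and cycle count are illustrative):

```python
# Sketch of interleaved stimulation timing (illustrative numbers): each
# electrode gets a fixed-rate pulse train, offset in time so no two
# electrodes ever share a stimulation slot.

def pulse_onsets(n_electrodes=12, rate_pps=1000, n_cycles=3):
    period_us = 1_000_000 / rate_pps        # 1000 us per cycle at 1000 pps
    slot_us = period_us / n_electrodes      # each electrode's time slot
    onsets = {}
    for e in range(n_electrodes):
        onsets[e] = [cycle * period_us + e * slot_us
                     for cycle in range(n_cycles)]
    return onsets

onsets = pulse_onsets()
# No two electrodes share an onset time -> no simultaneous stimulation:
all_times = [t for times in onsets.values() for t in times]
print(len(all_times) == len(set(all_times)))   # True
```

Staggering the slots this way is what avoids the electric-field interactions that simultaneous stimulation of neighboring electrodes would cause.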
During Step 4 of CIS, the _____ from Step 3 are converted into AM pulse trains
Envelope outputs
What is good about CIS?
- interleaved stimulation avoids channel interactions
- preservation of tonotopic organization (via bandpass filters and which electrodes they send info to)
- high pulse rates allow representation of F0 and voiced/unvoiced info
- better speech reception scores
What are the problems with CIS?
- fixed rate pulsatile stimulation -> unnecessary synchronization of neural response
- severe distortions in temporal discharge patterns
- no delivery of phase info