Wk4b - CI Speech Processing Strategies Flashcards
What is a speech processing strategy?
An algorithm within the CI speech processor that converts sounds picked up by the microphone into electrical signals sent to the implant electrodes
Speech processing strategies turn a broadband acoustic signal into 12-22 _____ ______ pulse trains
Amplitude modulated - one pulse train for every CI channel/electrode
- Med-EL has the fewest (12 electrodes) and Cochlear Americas has the most (22)
How many accredited CI manufacturers are currently in Canada?
4: Advanced Bionics, Cochlear Americas, Med-EL, Oticon Medical
In an electrode array, we need both positive and negative sources to stimulate. How many of each?
Each electrode driver contains at least one positive and one negative source; there can be many positives, but only one negative
What does the number of electrode drivers tell us?
How many electrodes can fire at once; may vary from 1 (Cochlear Americas) to 16 (Advanced Bionics)
What is the maximum rate?
The max pulses per second that these CIs can fire
- Oticon Medical has the lowest (19000)
- Advanced Bionics has the highest (83000)
What does the DSP-unit (of the CI’s external unit) do?
- receives mic input
- extracts features of the sound
- converts features into bitstream
- contains maps (pt specific info)
What does the External Unit (Speech Processor) of the CI consist of?
The DSP unit and Power Amplifier
In CI’s, first the signal is processed through the DSP unit, then the ____ ____, then sent to a __-______, which sends the signal through the skin to a receiver in the internal unit
Power amplifier; Radio Frequency (RF)-transmitter
What does the internal unit of a CI consist of?
RF-receiver
Hermetically sealed stimulator
Telemetry System
What does the Hermetically Sealed Stimulator of the CI’s Internal Unit do?
- receives power from RF-signal
- decodes RF-bitstream
- conversion to electric currents
What is the function of the telemetry system of the CI’s Internal unit?
To measure impedances and eCAPs (action potentials)
What does eCAP stand for?
Electric Compound Action Potential
What are Back Telemetry and ECAPs used for?
- to check the status of the internal unit (e.g. voltages)
- to measure and monitor critical info about the electrode-tissue interface (e.g. electrode impedance (non-audible stimulus), field potential, neural responses)
- to conduct neural response telemetry (NRT) (AN response to electric stimulation) (like ABR, but response usually buried in electric artifact)
T/F: It is possible for an audiologist to overwrite a CI’s safety checks
False - the CI performs these automatically and they cannot be overwritten
What are some commonly implemented safety checks in CIs?
- stimulation parameter check
- max charge check
- charge balance check
- parity check to detect bit error from RF-transmission or decoding
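As a rough illustration of the charge-balance check above, a biphasic pulse is balanced when amplitude × duration sums to zero across its phases. This is only a sketch with made-up function names, units, and tolerance; real firmware logic is manufacturer-specific:

```python
# Sketch of a charge-balance safety check (illustrative, not any
# manufacturer's actual firmware). A biphasic pulse is charge-balanced
# when current * phase_duration sums to zero across its phases.

def charge_balanced(phases, tolerance=1e-12):
    """phases: list of (current_uA, duration_us) tuples.
    Net charge must be ~0 to avoid tissue damage."""
    net_charge = sum(current * duration for current, duration in phases)
    return abs(net_charge) <= tolerance

# Symmetric biphasic pulse: -100 uA for 25 us, then +100 uA for 25 us
print(charge_balanced([(-100, 25), (100, 25)]))   # True
# Unbalanced pulse: the anodic phase carries less charge
print(charge_balanced([(-100, 25), (80, 25)]))    # False
```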
What does CIS stand for?
Continuous Interleaved Sampling
- most modern strategies are based on this method
What speech features are important for the speech processor?
Voicing
Pitch
Spectrum/Formants
Pitch is represented by the fundamental frequency, ____. What are the problems with encoding it?
F0
We could use pulse rate, but it is unlikely we would have an electrode so far apically, and pitch alone is not enough to understand what is being said
Why can’t we encode F2 and F3?
Their frequencies are >700 Hz, which is above the rate-pitch limit (250-500 pps)
- for F3 we would need to use place coding (electrode location)
Why must the charges be balanced in a CI?
To avoid tissue damage
What do we need to consider when proposing signal processing strategies? Name 3
- charge equalization to avoid tissue damage
- spread of electric field in fluid (poor freq selectivity)
- electric field interactions when stimulating 2 electrodes simultaneously
- monopolar vs bipolar stimulation
- preservation of AN function across CI users and across the array w/in one user
- location of electrode relative to modiolus
- narrow dynamic range (10-20 dB vs 100 dB in NH listeners)
How does the spread of the electric field impact frequency?
Poor frequency selectivity
What were the names of the 2 first speech processing strategies?
F0/F2
F0/F1/F2
What was the goal of the first speech processing strategies?
To avoid overloading the auditory system by extracting speech info explicitly (using only F0, F1, and F2)
How were the formants encoded in the first speech processing strategies (F0/F2 and F0/F1/F2)?
F0 encoded with pulse rate (b/c lower freq)
Formant freq encoded by electrode placement
With the old strategies (F0/F2, F0/F1/F2) how many electrodes were active at once?
One or two
one for pitch and one for the formant
Why are F0/F2 and F0/F1/F2 no longer used?
Poor performance
How do modern speech processing strategies differ from the original strategies?
They do NOT attempt to extract speech features explicitly
- they encode speech features implicitly by accurately representing the spectro-temporal structure of the speech signal
- i.e. extract temporal envelope in several independent freq bands
How does CIS work?
- divides incoming sound into several freq bands
- extracts temporal envelope in each band
- uses compressed version of envelope to modulate a fixed-rate train of pulses on each electrode
- pulses for each electrode are interleaved in time
How does CIS divide incoming signals into diff freq bands (step 1)?
By using a set of bandpass filters
- typically one freq band per electrode
What are the pros and cons of matching the frequency map to the electrode placement in the cochlea?
Pros: preserves natural BM place-to-frequency relationship
Cons: loss of low frequency info, since electrode cannot reach the apex
What are the pros and cons of mapping frequencies so that the entire speech frequency range is represented on the electrode?
Pros: All frequencies represented
Cons: Unnatural frequency-to-place mapping (compressed or shifted)
Which of the two theories on frequency mapping onto electrodes is better in practice? Why?
Mapping the frequencies so that the entire speech frequency range is represented, but in a condensed area of the cochlea
- better b/c CI listeners can adapt to some distortion, and frequency shifts still yield intelligible speech
HOWEVER, not all distortions can be understood
What did the frequency reversal simulation show us?
That too much distortion makes a speech signal unintelligible
Which scale do we use with CI’s: linear or logarithmic? Why?
Logarithmic; it is closer to the tonotopic organization of the cochlea
What is the typical range covered by the CIS bandpass filters? What determines the filter widths?
120-8000 Hz
The number of electrodes determines the filter width
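The logarithmic band allocation can be sketched by spacing band edges at a constant frequency ratio across the 120-8000 Hz range; the specific numbers and function name here are illustrative, not any manufacturer's actual filter bank:

```python
# Sketch: logarithmically spaced band edges over the typical CIS analysis
# range (120-8000 Hz). The number of electrodes sets the number of bands,
# so fewer electrodes means wider filters. Values are illustrative.

def band_edges(low_hz=120.0, high_hz=8000.0, n_bands=12):
    ratio = (high_hz / low_hz) ** (1.0 / n_bands)   # constant ratio per band
    return [low_hz * ratio ** i for i in range(n_bands + 1)]

edges = band_edges(n_bands=12)
print(round(edges[0]), round(edges[-1]))   # 120 8000
# Each band spans the same ratio (log spacing), not the same width in Hz:
widths = [edges[i + 1] - edges[i] for i in range(12)]
print(widths[0] < widths[-1])              # True: low-freq bands are narrower
```

Log spacing roughly mirrors the tonotopic (place-frequency) organization of the cochlea, which is why it is preferred over a linear scale.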
T/F: The acoustic frequency range of an electrode typically matches its Greenwood frequency
False
What is the second CIS step?
Envelope extraction
What two things can the band-pass filter output be broken down into? Which one does CIS use?
- Carrier (temporal fine structure)
- Amplitude modulation (envelope)
- CIS uses the envelope
How is the envelope used in CIS?
- each electrode is sent a pulse stream at a constant high rate (greater than or equal to 1000 pps)
- the amplitude of the pulses is modulated by the envelope of the corresponding bandpass filter's output
What are the two methods of extracting the envelope?
- Rectification and low-pass filtering
- Hilbert transform
Describe rectification and low-pass filtering (Method 1 of envelope extraction)
- The lower half of the filter output signal is erased (half-wave rectification: only the positive half-cycles are kept, b/c only positive peaks will trigger an action potential)
- Low-pass filter the half-wave rectified signal (maintains the shape of the envelope and discards the fine structure)
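Method 1 can be sketched in a few lines: keep only the positive half-cycles, then smooth with a low-pass filter. The sample rate, cutoff, and one-pole filter here are simplifying assumptions; real processors use higher-order filters per analysis band:

```python
import math

# Sketch of envelope extraction via half-wave rectification plus a simple
# one-pole low-pass filter (illustrative parameters).

def extract_envelope(signal, fs=16000, cutoff_hz=200.0):
    rectified = [max(s, 0.0) for s in signal]        # keep positive half only
    # One-pole IIR low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    env, y = [], 0.0
    for x in rectified:
        y += a * (x - y)
        env.append(y)
    return env

# 1 kHz tone amplitude-modulated at 50 Hz: the extracted envelope should
# follow the slow 50 Hz modulation, not the 1 kHz carrier.
fs = 16000
tone = [(1.0 + 0.5 * math.sin(2 * math.pi * 50 * n / fs))
        * math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs // 10)]
env = extract_envelope(tone, fs)
print(all(e >= 0.0 for e in env))   # True: envelope is non-negative
```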
Describe the Hilbert transform (Method 2 of envelope extraction)
- Generate a 90 degrees phase shifted signal “Y”(based on the original signal “X”)
- develop an envelope based on the equation: envelope = sqrt (X^2 + Y^2)
(- similar to modern FB cancellation, but not inverted)
(envelope is more precise with Hilbert compared to Method 1)
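The envelope = sqrt(X^2 + Y^2) step can be demonstrated directly. For this sketch the 90-degree-shifted partner Y is written down analytically (since the test signal is a known AM tone); a real processor would compute Y with an actual Hilbert transform, e.g. via an FFT:

```python
import math

# Sketch of the Hilbert-style envelope computation on a narrowband AM
# tone. Parameters and the analytic 90-degree shift are illustrative.

fs = 16000
f_carrier, f_mod = 1000.0, 50.0
n_samples = fs // 10

def am(n, phase_shift=0.0):
    t = n / fs
    a = 1.0 + 0.5 * math.sin(2 * math.pi * f_mod * t)   # slow envelope
    return a * math.sin(2 * math.pi * f_carrier * t + phase_shift)

x = [am(n) for n in range(n_samples)]
y = [am(n, phase_shift=-math.pi / 2) for n in range(n_samples)]  # 90 deg shift
envelope = [math.sqrt(xi * xi + yi * yi) for xi, yi in zip(x, y)]

# The recovered envelope matches the known modulator a(t) here, because
# sqrt(sin^2 + cos^2) = 1 leaves only the slow amplitude term:
expected = [1.0 + 0.5 * math.sin(2 * math.pi * f_mod * n / fs)
            for n in range(n_samples)]
print(max(abs(e - a) for e, a in zip(envelope, expected)) < 1e-9)  # True
```

This precision on narrowband signals is exactly why Hilbert outperforms rectification + low-pass filtering there; on wideband signals the sqrt(X^2 + Y^2) envelope retains fine structure and the advantage disappears.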
Which envelope extraction method is better for single tone waves?
Hilbert - the envelope is very precise
Does Hilbert work well with narrow band-width signals?
Yes
What type of signal does Hilbert not work well with?
Wideband signals - the extracted envelope follows half of the waveform, retaining lots of fine-structure info
- Hilbert works better with as many electrodes as possible (shallower insertion -> fewer, wider filter bands -> poorer results)
How does the cutoff of low-pass filtering impact envelope extractions?
- If we use a 50 Hz cutoff, envelope has limited amount of fine structure remaining (closer to Hilbert)
- 200 Hz cutoff, we get a good amount of temporal fine structure, with extracted envelope, which can be useful
Which is better: half-rectification and low-pass filtering OR Hilbert transform?
No clear winner
What is the third step of CIS?
Compression and conversion to current level
Why does the speech processing signal need to be compressed?
- the envelope levels follow the signal SPL, and may vary over >80 dB range
- the range of current levels b/w threshold (T) and max comfort level (C) is only 6-30 dB
THEREFORE we cannot convert amplitude to current levels directly
Can we encode the entire dynamic range into the available “room” between T and C?
No - too much, even with compression
Since we cannot encode the entire dynamic range of human hearing into electric hearing, which parts are discarded?
Anything below 20 dB
Anything above 90 dB (everything above this is set to the max amount)
(the remaining ~70 dB input range is still roughly twice that available to us electrically)
What does ASW stand for?
Adaptive Sound Window - an input dynamic range that changes with the environment
- adaptively reduces 75 dB range to 55 dB (25-80 in quiet and 45-100 in loud)
What are the steps involved in the compression and conversion to current level (Step 3 of CIS)?
- Omit <20 dB and set everything >100 dB to max
- Map the input dynamic range (IDR) to the adaptive sound window (ASW)
- Map ASW to electric dynamic range (compress those 55 dB to whatever is available, or T -> C range)
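The final mapping step can be sketched as: clip to the input window, then rescale into the T-C current range. All numbers here are made up (current units, T/C values, window edges), and a simple linear map stands in for the compressive (log-like) function a real processor would use:

```python
# Sketch of CIS step 3's level mapping with hypothetical values: clip the
# input to a 20-90 dB window, then map that range onto the electric
# dynamic range between threshold (T) and comfort (C) levels.
# Real maps are compressive (log-like); linear is used here for clarity.

def map_to_current(level_db, t_level=100.0, c_level=200.0,
                   idr_low=20.0, idr_high=90.0):
    """Map an acoustic envelope level (dB SPL) to a current level
    (arbitrary clinical units) between T and C."""
    level_db = min(max(level_db, idr_low), idr_high)   # clip to the window
    fraction = (level_db - idr_low) / (idr_high - idr_low)
    return t_level + fraction * (c_level - t_level)

print(map_to_current(20.0))    # 100.0 -> threshold
print(map_to_current(90.0))    # 200.0 -> comfort
print(map_to_current(120.0))   # 200.0 -> clipped to max
```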
What is an alternative to ASW?
- Discard everything below 20 dB
- Infinitely compress everything above 90 dB
- Set the M (most comfortable) and T
- Mapping the IDR to the electrical DR by compressing the remaining input levels to fit
- then, a sensitivity control will adjust the IDR, leading to more or less compression (e.g. b/w 40 and 80 dB, thereby reducing sensitivity) (this setting can be adjusted by the AUD or sometimes the user)
How does “volume” in CIs differ from HAs?
Volume in CIs doesn’t make soft sounds louder, it increases the M level
e.g. decrease M from 1 to 0.8; output stays b/w M and T
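This M-level view of volume can be sketched with hypothetical T/M values in arbitrary clinical units; the function and numbers are illustrative only:

```python
# Sketch of CI "volume" as an M-level scale (illustrative): volume does
# not amplify soft sounds; it rescales the upper end of the output range.

def apply_volume(current, t_level=100.0, m_level=200.0, volume=1.0):
    """Rescale the output so it spans T to T + volume*(M - T)."""
    top = t_level + volume * (m_level - t_level)
    fraction = (current - t_level) / (m_level - t_level)
    return t_level + fraction * (top - t_level)

print(apply_volume(200.0, volume=0.8))   # 180.0: max output reduced
print(apply_volume(100.0, volume=0.8))   # 100.0: threshold unchanged
```

Contrast with a hearing aid, where lowering the volume also reduces audibility of soft sounds; here the threshold end of the range stays fixed.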
Summarize Step 3 of CIS
- sound wave level obtained at mic
- set IDR
- implement ASW (alternative: manual sensitivity or skip)
- map this reduced envelope to electric DR (knowing T, C and M)
- simple compensation for summation effect is volume control through M-level reduction
- output is now 12-22 channels in current units
What are 2 reasons against sending the 12-22 channel output (end of CIS Step 3) to the electrodes?
- safety
- channel interaction
What is step 4 of CIS?
Generation of modulated pulse trains
- need to multiply a biphasic pulse train (e.g. 1000 pps) with amplitude corresponding to the envelope from Step 3
How do we avoid electrode interaction?
Interleaved sampling and presentation (e.g. one full stimulation cycle across all electrodes lasts 1 ms, i.e. 1/pulse rate)
- e.g. electrode 1 stimulated - pause - electrode 2 - pause - electrode 3….
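The interleaving above can be sketched by giving each electrode its own time slot within the pulse period, so no two electrodes ever fire simultaneously (electrode count, rate, and cycle count are illustrative):

```python
# Sketch of interleaved stimulation timing (illustrative numbers): each
# electrode gets a fixed-rate pulse train, offset in time so no two
# electrodes ever share a stimulation slot.

def pulse_onsets(n_electrodes=12, rate_pps=1000, n_cycles=3):
    period_us = 1_000_000 / rate_pps        # 1000 us per cycle at 1000 pps
    slot_us = period_us / n_electrodes      # each electrode's time slot
    onsets = {}
    for e in range(n_electrodes):
        onsets[e] = [cycle * period_us + e * slot_us
                     for cycle in range(n_cycles)]
    return onsets

onsets = pulse_onsets()
# No two electrodes share an onset time -> no simultaneous stimulation:
all_times = [t for times in onsets.values() for t in times]
print(len(all_times) == len(set(all_times)))   # True
```

Staggering the slots this way is what avoids the electric-field interactions that simultaneous stimulation of neighboring electrodes would cause.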
During Step 4 of CIS, the _____ from Step 3 are converted into AM pulse trains
Envelope outputs
What is good about CIS?
- interleaved stimulation avoids channel interactions
- preservation of tonotopic organization (via bandpass filters and which electrodes they send info to)
- high pulse rates allow representation of F0 and voiced/unvoiced info
- better speech reception scores
What are the problems with CIS?
- fixed rate pulsatile stimulation -> unnecessary synchronization of neural response
- severe distortions in temporal discharge patterns
- no delivery of phase info