speech perception - exam 2 Flashcards
bottom up processing
data-driven
using sensory info of incoming signal
small details
the actual sounds
top down processing
hypothesis driven
using the knowledge of our own language to understand speech
big picture
brain will expect a word more than a non word when given an ambiguous signal
Ganong Effect
play /d/ & /t/ on a continuum (make one end a word & one a non word - deach & teach)
we tend to favor the word over the non word at the category boundary
results in shifting the category boundary so %word takes up more area on the graph than %nonword
top down processing
sine wave speech
created by replacing formant freqs w/ sine waves
initially unintelligible
becomes understandable once the listener knows what the person is saying
top down processing
pop out effect
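A minimal sketch of the construction described above: each formant is replaced by one sine wave at that formant's freq, and the sine waves are added together. The formant values (500, 1500, 2500 Hz, roughly a neutral vowel), the 22050 Hz sampling rate, and the function name sine_wave_vowel are illustrative assumptions. Real sine wave speech follows formant freqs as they move over time; this sketch keeps them fixed.

```python
import math

def sine_wave_vowel(formants_hz, duration_s=0.5, sample_rate=22050):
    """Sum one sine wave per formant frequency (static formants only)."""
    n_samples = int(duration_s * sample_rate)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        # One sinusoid per formant, averaged so the result stays within [-1, 1].
        value = sum(math.sin(2 * math.pi * f * t) for f in formants_hz)
        samples.append(value / len(formants_hz))
    return samples

# Hypothetical, roughly neutral-vowel formants (F1, F2, F3).
signal = sine_wave_vowel([500, 1500, 2500])
print(len(signal))
```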
phoneme restoration effect
listeners “fill in” missing phonemes in a word, relying on context & expectations
top down effect allows continuity in perception even w/ absent sounds - noisy environments
priming
exposure to one stimulus influences a response to a subsequent stimulus
just seeing options yanny & laurel primes you to hear one or the other (& not some secret 3rd thing)
laurel/yanny
your brain chooses which freqs to pay attention to
the laurel/yanny signal is ambiguous: attending to lower freqs = laurel, attending to higher freqs = yanny
attention changes perception of sound –> top down
low quality recording & noise at high freqs make it plausible to mix up F3 & F2
plausible masker
playing sound over a sentence w/ gaps
easier to understand the sentence w/ the sound than w/out it
w/ masker – people couldn’t tell where the masker was & thought all phonemes were present
what are whistled languages
whistled versions of spoken language - must speak the language to understand
can overcome ambient noise & distance much better than speech
higher freqs make whistles harder to mask
useful in mountainous regions & w/ shepherds
pitch based whistling
used in tonal languages
whistles emulate pitch contours
speech is stripped of articulation
leaves only suprasegmental features like duration & tone
formant based whistling
used in non-tonal languages
whistles emulate articulatory features
timbral variations are transformed into pitch variations
Lombard effect
auditory feedback causes compensatory changes in speech output
involuntary (& usually unknown to speaker) increase in volume & clarity when speaking in noisy environments
static played louder in headphones
she spoke louder (she didn’t know)
receiving less feedback from her own voice so increased volume until she was receiving feedback again
suggests the volume increase is not just an adjustment for your communication partner
how do we compensate for loud environments
volume
increasing pitch
increasing vowel duration
prolonging duration of content words (vs function words)
larger facial movements
sensorimotor adaptation
oppose feedback changes
learned over time
feedback loop
when speakers hear altered feedback, they adjust their speech in response
demonstrates feedback loop between production & perception
acuity relationships
how well you discriminate sounds predicts how differently you produce sounds
adaptive dispersion
hypothesis suggesting that vowel sounds in a language spread out within the F1-F2 space to maximize distinctiveness
maximize perceptual distance between them
vowels tend to spread out around the edges in all languages
cocktail party effect
ability to focus on one speaker in a noisy environment
auditory attention enhancing the neural representation of the target speech stream
article 2
play a sound where 2 speakers are saying different things at the same time
underlying signal stays the same but brain representation (multi-electrode surface recordings from the cortex) changes depending on who you are listening for
the representation while attending to one speaker was very similar to the representation when hearing that speaker alone
attention can be trained
the study was performed on patients who already needed surgery
example of top down processing
listener might hear an unclear sound in a familiar sentence & interpret it correctly due to context
“the quick b— fox jumped over the log”
can guess “brown” because high predictability sentence
temporal modulation
how fast loudness is changing in speech
faster modulation = rougher speech
faster changes on the outsides of the graph
continuous signals
continuous in both time & amplitude
infinite
analog
discrete signals
discrete in both time & amplitude
limit to how many decimal places
digital
how to convert analog to digital
limit decimal places in time = sampling
limit decimal places in amp = quantization
sampling
choosing points in time to measure
sampling rate
how often you measure
how many points do we need to reconstruct a sine wave’s freq
at least 2 per cycle (strictly, slightly more than 2, since exactly 2 can land on the zero crossings)
aliasing
not taking enough sample points
becomes a different freq in the reconstruction
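A small sketch of why too few sample points turn into a different freq: a 7000 Hz sine sampled at 10000 Hz (fewer than 2 points per cycle) gives the same sample values, apart from a sign flip, as a 3000 Hz sine. The function names alias_frequency and sample_sine and the specific numbers are illustrative assumptions.

```python
import math

def alias_frequency(freq_hz, sample_rate):
    """Frequency that a sampled sine appears to have after reconstruction."""
    folded = freq_hz % sample_rate            # sampling cannot tell f from f + k*fs
    return min(folded, sample_rate - folded)  # fold into the range 0..fs/2

def sample_sine(freq_hz, sample_rate, n_samples):
    """Measure a sine wave at evenly spaced points in time."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]

fs = 10000
too_high = sample_sine(7000, fs, 50)  # above fs/2, so it aliases
alias = sample_sine(3000, fs, 50)     # the freq it is mistaken for

print(alias_frequency(7000, fs))      # 3000
# The 7000 Hz samples equal the 3000 Hz samples with the sign flipped.
print(all(abs(a + b) < 1e-9 for a, b in zip(too_high, alias)))
```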
Nyquist freq
highest freq that can be captured w/ a given sampling rate
1/2 the sampling rate
what sampling rate is needed for speech
determine freqs we care about in speech (75-8000 Hz)
not much above 10kHz - call it 11 to be safe
11kHz = Nyquist
sample at 22kHz to make sure we get everything
can sample lower & still get intelligible speech but you might start to lose detail (f confusable w/ s for example)
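The arithmetic above can be checked directly. The helper names are hypothetical; the 11 kHz ceiling is the figure chosen in these notes, and the 8000 Hz rate is only an example of sampling lower.

```python
def nyquist_frequency(sample_rate_hz):
    """Highest freq that a given sampling rate can capture."""
    return sample_rate_hz / 2

def min_sample_rate(highest_freq_hz):
    """Sampling rate whose Nyquist freq equals the highest freq of interest."""
    return 2 * highest_freq_hz

speech_ceiling_hz = 11_000           # "call it 11 to be safe"
rate = min_sample_rate(speech_ceiling_hz)
print(rate)                          # 22000
print(nyquist_frequency(rate))       # 11000.0
# A lower rate captures a smaller range, so high-freq detail is lost.
print(nyquist_frequency(8_000))      # 4000.0
```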
quantization
choosing values for the measurements
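A minimal sketch of uniform quantization, assuming amplitudes between -1 and 1 and a hypothetical helper named quantize. Each measurement is rounded to the nearest of a fixed set of evenly spaced levels. More bits means more levels, so the worst-case rounding error (half a step) gets smaller.

```python
def quantize(value, n_bits):
    """Round a value in [-1, 1] to the nearest of 2**n_bits evenly spaced levels."""
    levels = 2 ** n_bits
    step = 2 / (levels - 1)                 # spacing between adjacent levels
    index = round((value + 1) / step)       # nearest level, counted up from -1
    index = max(0, min(levels - 1, index))  # clamp out-of-range input
    return -1 + index * step

sample = 0.1
for bits in (2, 4, 8):
    q = quantize(sample, bits)
    # Print the bit depth, the stored value, and the rounding error.
    print(bits, q, abs(q - sample))
```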
how to encode speech
break the signal into smaller chunks
quantize the louder chunks w/ more bits & less rounding
quantize the quieter chunks w/ fewer bits & more rounding because they will likely be masked - don’t waste storage bits on something that won’t really be heard anyway
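The scheme above can be sketched under some stated assumptions: fixed-length chunks, loudness measured as RMS amplitude, and two bit depths (8 bits for loud chunks, 3 bits for quiet ones) picked by a hypothetical threshold. All names and numbers here are illustrative; this shows the idea in the notes, not any particular codec.

```python
import math

def quantize(value, n_bits):
    """Round a value in [-1, 1] to the nearest of 2**n_bits evenly spaced levels."""
    levels = 2 ** n_bits
    step = 2 / (levels - 1)
    index = max(0, min(levels - 1, round((value + 1) / step)))
    return -1 + index * step

def rms(chunk):
    """Root-mean-square amplitude, used here as a simple loudness measure."""
    return math.sqrt(sum(x * x for x in chunk) / len(chunk))

def encode(signal, chunk_size=100, loud_threshold=0.2, loud_bits=8, quiet_bits=3):
    """Quantize each chunk with a bit depth chosen from its loudness."""
    encoded = []
    for start in range(0, len(signal), chunk_size):
        chunk = signal[start:start + chunk_size]
        bits = loud_bits if rms(chunk) >= loud_threshold else quiet_bits
        encoded.append((bits, [quantize(x, bits) for x in chunk]))
    return encoded

# Test signal: 100 loud samples followed by 100 quiet samples of the same sine.
loud = [0.8 * math.sin(2 * math.pi * n / 20) for n in range(100)]
quiet = [0.05 * math.sin(2 * math.pi * n / 20) for n in range(100)]
chunks = encode(loud + quiet)

print([bits for bits, _ in chunks])  # [8, 3]
total_bits = sum(bits * len(values) for bits, values in chunks)
print(total_bits)                    # 1100, versus 1600 at 8 bits throughout
```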