Exam 3 Flashcards

1
Q

what is the external part of the ear called?

A

the pinna

2
Q

what does the outer ear consist of?

A

pinna, ear canal, and the eardrum

3
Q

what does the middle ear consist of?

A

from the eardrum to the oval window; contains three small bones: the malleus, incus, and stapes

4
Q

passage through the middle ear does what to the sound?

A

amplifies it

5
Q

what does the inner ear consist of?

A

the semicircular canals and the cochlea

6
Q

what happens in the cochlea?

A

mechanical sound waves are converted to electrical nerve impulses

7
Q

for an unwound cochlea, there is a thicker end and a thinner end - which is which (Apical or Basal)?

A

thicker is the Apical end, thinner is the Basal end

8
Q

for an unwound cochlea, which frequencies do the Apical and Basal ends move more for?

A

the Apical end moves more for lower frequencies because thicker = lower resonant frequency
the Basal end moves more for higher frequencies because thinner = higher resonant frequency
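
As rough physical intuition (a textbook mass-spring idealization, not from the deck), a resonator's natural frequency rises with stiffness and falls with mass:

    f_0 = \frac{1}{2\pi}\sqrt{\frac{k}{m}}

so the thicker, heavier apical end resonates at lower frequencies than the thin, stiff basal end.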

9
Q

the basilar membrane has a thick end and a thin end - which is which (Apical and Basal)?

A

Apical is thick and Basal is thin

10
Q

the basilar membrane is tonotopically organized - what does that mean?

A

different locations on the membrane correspond to different frequencies

11
Q

Denes and Pinson 1993 : 90
- shows how far the basilar membrane is pushed out of place by different frequencies

What conclusion was found?

A

lower frequencies peak farther from the stapes (25 Hz -> 30 mm) than higher frequencies do (1600 Hz -> 17 mm)

12
Q

explain how hair cells work - what is their role?

A

they are attached to the basilar membrane; a hair cell fires if movement of the basilar membrane pushes the cell sufficiently far out of position

13
Q

what is the response curve of a hair cell?

A

shows the lowest intensity at which a pure tone at a given frequency triggers firing of the cell - the low point shows the frequency the hair cell responds to most readily - the closer to the apical (thick) end, the lower the resonant frequency

14
Q

Moore 1997 : 33
- shows the response curves for different hair cells

What does the lowest point show?

A

the lowest point is the characteristic frequency - the frequency at which the cell will fire at the lowest amplitude
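
A minimal Python sketch of reading a characteristic frequency off a response curve (the threshold values are invented for illustration): it is simply the frequency with the lowest firing threshold.

    # hypothetical tuning curve: pure-tone frequency (Hz) -> threshold intensity (dB)
    tuning_curve_db = {250: 62, 500: 40, 1000: 18, 2000: 35, 4000: 58}

    # the characteristic frequency is the low point of the curve
    characteristic_freq = min(tuning_curve_db, key=tuning_curve_db.get)
    print(characteristic_freq)  # 1000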

15
Q

what is the most important factor of a hair cell?

A

the location of it - they are all the same otherwise

16
Q

what causes hearing loss at certain frequencies?

A

the hair cells are pushed too far and sheared off

17
Q

how are the outer hair cells different from the inner hair cells?

A

when the outer hair cells fire they change length to push back on the basilar membrane and amplify the signal

18
Q

Denes and Pinson 1993 : 95
- shows humans' hearing range

What does this show about our speech sounds as humans?
What is the peak sensitivity?

A

speech sounds evolved to be where our hearing is particularly good - the peak sensitivity is between 1000 and 10,000 Hz

19
Q

tonotopically organized signals from the ear are passed to the brain through what?

A

the auditory nerve, through various bodies in the brainstem and to the cerebrum (uppermost and outermost part of the brain)

20
Q

signals from the right ear are passed to where? what is this called?

A

the left hemisphere of the brain - decussation

21
Q

where is the auditory cortex located and what does it border?

A

in the temporal lobe of each hemisphere of the cerebral cortex, on the superior temporal gyrus (STG) - borders the lateral (Sylvian) fissure

22
Q

what is the primary auditory cortex?

A

entryway into the cerebral cortex for signals from the ears

23
Q

how is the primary auditory cortex organized?

A

tonotopically - different locations correspond to different frequency bands

24
Q

the frequency-based locations in the primary auditory cortex correspond to what?

A

frequency-sensitive locations on the basilar membrane

25
Q

damage to the primary auditory cortex could cause what?

A

aphasia

26
Q

Bear et al. 2007
- both hemispheres of the brain have an auditory cortex

But what?

A

but one is dominant for speech processing - the left for 93% of people (96% of right-handed, 70% of left-handed)

27
Q

what is dichotic listening?

A

a task in which different stimuli are presented to the two ears simultaneously - speech materials are processed in the hemisphere opposite the ear that receives them, so there is often a right-ear processing advantage for speech but NOT for non-speech sounds like music or humming

28
Q

what is Wernicke’s Area and where is it?

A

the middle region of the STG; if injured, it causes problems with speech perception and comprehension (Wernicke’s Aphasia)

29
Q

where is Wernicke’s Area in relation to the auditory cortex?

A

posterior to it - the auditory cortex is the “bottom” part of the STG

30
Q

when and who discovered Wernicke’s area?

A

1874, by the German neurologist Carl Wernicke - it was early evidence for brain area specialization

31
Q

True or false:

electrical stimulation of Wernicke’s area interferes with identification of speech sounds, discrimination between speech sounds, and comprehension of speech

A

true

32
Q

what are combination-sensitive neurons?

A

found in the STG, they respond to particular patterns of frequency and amplitude - they fire only if a particular combination of primary cells is activated

33
Q

Mesgarani, Cheung, Johnson and Chang (2014) - study of 6 adults whose skulls were opened for epilepsy surgery; electrodes were placed on the surface of the left STG (electrocorticography).

True or false: this study was testing to see which parts of the brain activated when the patient was producing speech and when they were not.

A

False - the study was to see which parts of the brain were active when speech was playing, but inactive during silence

34
Q

Mesgarani, Cheung, Johnson and Chang (2014) - study of 6 adults whose skulls were opened for epilepsy surgery; electrodes were placed on the surface of the left STG (electrocorticography).

True or false: the patients passively listened to 500 samples of SAE sentences

A

True

35
Q

Mesgarani, Cheung, Johnson and Chang (2014) - study of 6 adults whose skulls were opened for epilepsy surgery; electrodes were placed on the surface of the left STG (electrocorticography).

True or false: researchers found that when passively listening to speech, the STG was activated constantly, but was not in silence

A

False - different groups of neurons in the STG activated for different classes of sounds

36
Q

Mesgarani, Cheung, Johnson and Chang (2014) - study of 6 adults whose skulls were opened for epilepsy surgery; electrodes were placed on the surface of the left STG (electrocorticography).

True or false:
e1 responded to the sibilant fricatives /s, ʃ , z/

A

False - e1 responded to the plosives /b, d, g, p, t, k/

37
Q

Mesgarani, Cheung, Johnson and Chang (2014) - study of 6 adults whose skulls were opened for epilepsy surgery; electrodes were placed on the surface of the left STG (electrocorticography).

True or false:
e2 responded to the sibilant fricatives /s, ʃ , z/

A

true

38
Q

Mesgarani, Cheung, Johnson and Chang (2014) - study of 6 adults whose skulls were opened for epilepsy surgery; electrodes were placed on the surface of the left STG (electrocorticography).

True or false:
e3 responded to the low-back vocoids (vowels and glides) /ɑ, aʊ/

A

true

39
Q

Mesgarani, Cheung, Johnson and Chang (2014) - study of 6 adults whose skulls were opened for epilepsy surgery; electrodes were placed on the surface of the left STG (electrocorticography).

True or false:
e4 responded to the plosives /b, d, g, p, t, k/

A

False - e4 responded to the high-front vocoids /i, j/

40
Q

Mesgarani, Cheung, Johnson and Chang (2014) - study of 6 adults whose skulls were opened for epilepsy surgery; electrodes were placed on the surface of the left STG (electrocorticography).

True or false:
e5 responded to nasals /m, n, ŋ/

A

true

41
Q

Mesgarani, Cheung, Johnson and Chang (2014)
What is PSI?

A

phoneme selectivity index - it represents the number of other phonemes statistically distinguishable from that phoneme in the response of a specific electrode

42
Q

Mesgarani, Cheung, Johnson and Chang (2014)
what does a PSI = 0 mean?

A

that electrode does NOT distinguish between that phoneme and any others

43
Q

Mesgarani, Cheung, Johnson and Chang (2014)
true or false:
PSI = 32 means the electrode can detect 32 phonemes

A

false - it means the electrode is maximally selective, the phoneme is distinguishable from all other phonemes in the response of that electrode
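
A hedged Python sketch of a PSI-style count (the response data and the use of a t-test are assumptions for illustration, not the paper's exact statistic): for one electrode, count how many other phonemes are statistically distinguishable from a target phoneme.

    from scipy.stats import ttest_ind

    def psi_like(responses, target, alpha=0.05):
        """responses: dict mapping each phoneme to a list of response amplitudes."""
        count = 0
        for phoneme, values in responses.items():
            if phoneme == target:
                continue
            _, p = ttest_ind(responses[target], values)
            if p < alpha:
                count += 1  # target is distinguishable from this phoneme
        return count  # 0 = no selectivity; with 33 phonemes, 32 = maximally selective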

44
Q

Mesgarani, Cheung, Johnson and Chang (2014)
true or false:
neurons sensitive to a particular acoustic combination are located near neurons sensitive to similar combinations and are therefore tonotopically organized

A

false - they are organized by phonetic category

45
Q

Mesgarani, Cheung, Johnson and Chang (2014)
true or false:
from the basilar membrane to the primary auditory cortex, sound is represented tonotopically in the form of a time-varying frequency spectrum, corresponding to a spectrogram

A

true

46
Q

what is a category?

A

a set of entities or events that all elicit an equivalent response

47
Q

categories are essential to learning and cognition - why?

A

we can only generalize particular experiences to general knowledge through the use of categories

48
Q

true or false:
speech categories are the same across people and situations

A

false - they vary greatly from speaker to speaker and context to context; each person has a broad range of phonetic events they pull from to decode a word or sound

49
Q

true or false:
an acoustic continuum is a series of items that differ gradiently for a series of acoustic properties

A

false - the items differ in only ONE acoustic property, not multiple

50
Q

true or false:
an F1 continuum would be a series of items that have the same F1 but are different in other aspects

A

false - the items would differ ONLY in F1

51
Q

true or false:
in an F1 continuum, each item in the series differs from the preceding member by the same F1 step

A

true
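
A minimal Python sketch of building such a continuum (the endpoint and step values are hypothetical): every item differs from its neighbor by the same F1 increment and in nothing else.

    # eleven synthetic items from 300 Hz to 800 Hz in equal 50 Hz steps
    start_hz, end_hz, n_items = 300, 800, 11
    step_hz = (end_hz - start_hz) / (n_items - 1)
    f1_continuum = [start_hz + i * step_hz for i in range(n_items)]
    print(f1_continuum)  # [300.0, 350.0, ..., 800.0]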

52
Q

Lisker & Abramson 1970
- varied VOT in word-initial stops, using speech synthesis from -150 to 150 ms in 10 ms steps for each place of articulation (bilabial, apical, velar) - subjects who spoke Thai, Spanish, and English were asked to identify the initial consonant of the stimulus among a choice of sounds in their language

true or false:
the space where the lines for either sound meets is called the perceptual/identification boundary

A

true

53
Q

Lisker & Abramson 1970
- varied VOT in word-initial stops, using speech synthesis from -150 to 150 ms in 10 ms steps for each place of articulation (bilabial, apical, velar) - subjects who spoke Thai, Spanish, and English were asked to identify the initial consonant of the stimulus among a choice of sounds in their language

true or false:
the study found that at low VOT, English speakers identified the stop as voiceless 100% of the time

A

false - at low VOT the subjects identified the sounds as VOICED 100% of the time

54
Q

Lisker & Abramson 1970
- varied VOT in word-initial stops, using speech synthesis from -150 to 150 ms in 10 ms steps for each place of articulation (bilabial, apical, velar) - subjects who spoke Thai, Spanish, and English were asked to identify the initial consonant of the stimulus among a choice of sounds in their language

true or false:
at high VOT the English subjects identified the stop as voiceless 100% of the time

A

true

55
Q

Lisker & Abramson 1970
- varied VOT in word-initial stops, using speech synthesis from -150 to 150 ms in 10 ms steps for each place of articulation (bilabial, apical, velar) - subjects who spoke Thai, Spanish, and English were asked to identify the initial consonant of the stimulus among a choice of sounds in their language

true or false:
the perceptual / identification boundary is where subjects were able to tell the stops apart 100% of the time

A

false - the boundary is where they identified the stimulus as voiced 50% of the time and voiceless 50% of the time
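
A Python sketch of locating that 50% crossover by linear interpolation (the response proportions below are invented, not Lisker & Abramson's data):

    # identification data: VOT step (ms) -> proportion of "voiced" responses
    vot_ms      = [-20, -10, 0, 10, 20, 30, 40, 50]
    prop_voiced = [1.0, 1.0, 0.98, 0.9, 0.55, 0.15, 0.02, 0.0]

    def boundary(x, y, criterion=0.5):
        # find the interval where the curve crosses the criterion, then interpolate
        for (x0, y0), (x1, y1) in zip(zip(x, y), zip(x[1:], y[1:])):
            if (y0 - criterion) * (y1 - criterion) <= 0:
                return x0 + (criterion - y0) * (x1 - x0) / (y1 - y0)

    print(boundary(vot_ms, prop_voiced))  # 21.25 (ms)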

56
Q

true or false:
Lisker and Abramson (1964) found that further forward places of articulation are associated with greater VOT values

A

false - places of articulation that are further back are associated with greater VOT values

57
Q

Lisker & Abramson 1970
- varied VOT in word-initial stops, using speech synthesis from -150 to 150 ms in 10 ms steps for each place of articulation (bilabial, apical, velar) - subjects who spoke Thai, Spanish, and English were asked to identify the initial consonant of the stimulus among a choice of sounds in their language

true or false:
a conclusion drawn from this study is that the identification boundary is at a lower VOT for alveolars and velars than for bilabials

A

false - alveolar and velar sounds are further back in the mouth = greater (higher) VOT, so their identification boundaries are at higher VOT values

58
Q

what is categorical perception?

A

listeners ignore the differences of sounds on the same side of the perceptual boundary and only discriminate sounds that lie on opposite sides

58
Q

Lisker & Abramson 1970
- varied VOT in word-initial stops, using speech synthesis from -150 to 150 ms in 10 ms steps for each place of articulation (bilabial, apical, velar) - subjects who spoke Thai, Spanish, and English were asked to identify the initial consonant of the stimulus among a choice of sounds in their language

true or false:
this study found that listeners differentiate sounds within each side of the perceptual boundary

A

false - they ignore the differences of those on the same side and only discriminate sounds that lie on opposite sides of the boundary

59
Q

Liberman et al 1957
- synthesized a series of stop-vowel syllables that were alike in steady-state values of F1 and F2 - they only differed in the onset value of the initial F2 transition, from way above the F2 steady-state to way below it (hand-drawn, they looked like eyebrows) - subjects were asked to identify the consonant as b, d, or g

true or false:
when F2 pointed down, subjects identified the consonant as d

A

false - F2 pointing down was identified as b

60
Q

Liberman et al 1957
- synthesized a series of stop-vowel syllables that were alike in steady-state values of F1 and F2 - they only differed in the onset value of the initial F2 transition, from way above the F2 steady-state to way below it (hand-drawn, they looked like eyebrows) - subjects were asked to identify the consonant as b, d, or g

true or false:
when F2 was flat, subjects identified the consonant as g

A

false - F2 was flat it was identified as d

61
Q

Liberman et al 1957
- synthesized a series of stop-vowel syllables that were alike in steady-state values of F1 and F2 - they only differed in the onset value of the initial F2 transition, from way above the F2 steady-state to way below it (hand-drawn, they looked like eyebrows) - subjects were asked to identify the consonant as b, d, or g

true or false:
when F2 pointed up, subjects identified the consonant as b

A

false - F2 pointed up was identified as g

62
Q

Liberman et al 1957
- synthesized a series of stop-vowel syllables that were alike in steady-state values of F1 and F2 - they only differed in the onset value of the initial F2 transition, from way above the F2 steady-state to way below it (hand-drawn, they looked like eyebrows) - subjects were asked to identify the consonant as b, d, or g

there is only one ambiguous stop - between which two stimuli does it lie?

A

between stimuli #3 (almost always b) and #5 (almost always d) - the boundary between b and d

63
Q

Liberman et al 1957
discrimination experiment:
- synthesized a series of stop-vowel syllables that were alike in steady-state values of F1 and F2 - they only differed in the onset value of the initial F2 transition, from way above the F2 steady-state to way below it (hand-drawn, they looked like eyebrows) - subjects listened to a series of 3 syllables (b, d, or g) together (e.g. ABX) where A and B are different and X is identical to either A or B

true or false:
if two of the syllables were within the same category (same side) subjects found it hard to discriminate between them

A

true
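
A toy Python sketch of the categorical-perception prediction behind this ABX result (the step numbers and boundary location are invented): discrimination beats chance only when the two stimuli receive different labels.

    # identification: continuum steps below the boundary are labeled "b"
    def label(step, boundary=4):
        return "b" if step < boundary else "d"

    # ABX discrimination is predicted to succeed only across the boundary
    def predicted_discriminable(a, b):
        return label(a) != label(b)

    print(predicted_discriminable(2, 3))  # False - same side, near chance
    print(predicted_discriminable(3, 5))  # True - spans the boundary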

64
Q

why might humans be more sensitive to acoustic cues that distinguish categories and insensitive to those within the categories?

A

because the acoustic differences within categories do NOT help with our goal of identifying what sound is being produced

65
Q

Miyawaki et al. 1975
- synthesized syllables with a sonorant consonant followed by [ɑ]; the only difference was the frequency of F3 in the consonant (r, l) - subjects (SAE and Japanese) heard each in random order and were asked to determine whether they heard l or r (law or raw).

true or false:
stimuli with a low F3 in the consonant were identified as “l” nearly 100% of the time

A

false - stimuli with a low F3 were identified as “r” nearly 100% of the time

66
Q

Miyawaki et al. 1975
- synthesized syllables with a sonorant consonant followed by [ɑ]; the only difference was the frequency of F3 in the consonant (r, l) - subjects (SAE and Japanese) heard each in random order and were asked to determine whether they heard l or r (law or raw).

true or false:
stimuli with a high F3 in the consonant were identified as “l” nearly 100% of the time

A

true

67
Q

Miyawaki et al. 1975
- synthesized syllables with a sonorant consonant followed by [ɑ]; the only difference was the frequency of F3 in the consonant (r, l) - subjects (SAE and Japanese) heard each in random order and were asked to determine whether they heard l or r (law or raw).

there was one stimulus that could not be clearly assigned by the subjects - what does this mean and which one was it?

A

stimulus 7 - it lay at the identification boundary between l and r

68
Q

Miyawaki et al. 1975
- synthesized syllables with a sonorant consonant followed by [ɑ]; the only difference was the frequency of F3 in the consonant (r, l) - subjects (SAE and Japanese) heard each in random order and were asked to determine whether they heard l or r (law or raw).

what were the three main findings of this study?

A
  1. SAE speakers did well distinguishing the sounds on opposite sides of the boundary
  2. SAE speakers were guessing/leaving it to chance when discriminating within the categories
  3. Japanese speakers, having no contrast between the sounds in Japanese, could not distinguish the sounds
69
Q

are vowels similar in discrimination to consonants? why or why not?

A

no - there is a perceptual boundary, but it is not a peak in discriminability as with consonants; discrimination is gradable, and people can discriminate within vowel categories as well as between them

70
Q

what is one hypothesis as to why consonants have a perceptual boundary and vowels don’t?

A

categorical perception may be limited to rapid, dynamic acoustic properties, like the VOT and F2 formant transitions between consonants and vowels, but vowels have steady-state formant patterns that stay the same for what in speech is a long time

71
Q

what is speaker normalization?

A

the listener’s ability to handle/understand the differences among speakers, even when a voice is unlike any they have heard before

72
Q

what are the 3 main ways speaker’s voices differ and which one is the MAIN way?

A
  1. MOST IMPORTANTLY, they differ in formant frequencies
  2. they differ in f0 (higher or lower pitch) depending on the length of their vocal cords
  3. they differ in voice quality, as measured by open quotient or spectral tilt
73
Q

true or false:
only F1 is higher in women than men

A

false - F1 and F2 are higher in women than men

74
Q

men are generally larger than women, and women are larger than children - what does this mean in terms of their voices?

A

men have longer vocal tracts than women, who have longer vocal tracts than children; therefore men have the lowest resonant frequencies, then women, then children - however, that does not mean that all large people have the deepest voices
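
A rough Python sketch of why longer vocal tracts mean lower resonant frequencies, using the textbook quarter-wave idealization of a uniform tube closed at the glottis and open at the lips (the lengths are illustrative, not measurements):

    # resonances of a uniform quarter-wave tube: F_n = (2n - 1) * c / (4 * L)
    def resonances(length_m, c=350.0, n=3):
        return [(2 * k - 1) * c / (4 * length_m) for k in range(1, n + 1)]

    print(resonances(0.175))  # ~17.5 cm tract: [500.0, 1500.0, 2500.0] Hz
    print(resonances(0.145))  # shorter tract: roughly [603, 1810, 3017] Hz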

75
Q

true or false:
the difference between men and women lies mainly in the length of the pharynx

A

true

76
Q

true or false:
Peter Ladefoged has formant values for his vowels that are close to those of SAE but are not the same vowel and are therefore easily confused

A

false - though the formant values are close for one vowel said by him and a different one said by an SAE speaker, they do NOT get confused for each other - the distinction is NOT ONLY in formant values (speaker normalization)

77
Q

what is one of the biggest problems when developing automatic speech recognition software?

A

computers cannot, as easily or as well as humans, perform speaker normalization when encountering a new voice unlike what they’ve heard before

78
Q

Ladefoged and Broadbent 1957
- synthesized 4 syllables differing only in F1 and F2; in isolation the syllables were identified as bit, bet, bat, and but - they also synthesized (via F1 and F2) the syllables of a carrier sentence “Please say what this word is” before the test word, which subjects had to identify.

true or false:
with the “normal” carrier sentence test word A (F1: 375 Hz) was identified as “bat”

A

false - with the “normal” carrier sentence test word A (375 Hz) was identified as “bit”

79
Q

Ladefoged and Broadbent 1957
- synthesized 4 syllables differing only in F1 and F2; in isolation the syllables were identified as bit, bet, bat, and but - they also synthesized (via F1 and F2) the syllables of a carrier sentence “Please say what this word is” before the test word, which subjects had to identify.

true or false:
with the “normal” carrier sentence test word B (F1: 450) was identified as “bet”

A

true

80
Q

Ladefoged and Broadbent 1957
- synthesized 4 syllables differing only in F1 and F2; in isolation the syllables were identified as bit, bet, bat, and but - they also synthesized (via F1 and F2) the syllables of a carrier sentence “Please say what this word is” before the test word, which subjects had to identify.

true or false:
with the “normal” carrier sentence test word C (F1: 575 Hz) was identified as “but”

A

false - with the “normal” carrier sentence test word C (F1: 575 Hz) was identified as “bat”

81
Q

Ladefoged and Broadbent 1957
- synthesized 4 syllables differing only in F1 and F2; in isolation the syllables were identified as bit, bet, bat, and but - they also synthesized (via F1 and F2) the syllables of a carrier sentence “Please say what this word is” before the test word, which subjects had to identify.

true or false:
with the “normal” carrier sentence test word D (F1: 600 Hz, F2: 1300 Hz) was identified as “bat”

A

false - with the “normal” carrier sentence test word D (F1: 600 Hz, F2: 1300 Hz) was identified as “but”

82
Q

Ladefoged and Broadbent 1957
- synthesized 4 syllables differing only in F1 and F2; in isolation the syllables were identified as bit, bet, bat, and but - they also synthesized (via F1 and F2) the syllables of a carrier sentence “Please say what this word is” before the test word, which subjects had to identify.

true or false:
when F1 was lowered in the carrier sentence, test word A (375 Hz) started to be identified as “bet”

A

true - in the low F1 context, the value of 375 Hz counted as high in comparison so the vowel was judged to be low

83
Q

Ladefoged and Broadbent 1957
- synthesized 4 syllables differing only in F1 and F2; in isolation the syllables were identified as bit, bet, bat, and but - they also synthesized (via F1 and F2) the syllables of a carrier sentence “Please say what this word is” before the test word, which subjects had to identify.

true or false:
when F1 was raised in the carrier sentence, test word B (450 Hz) started to be identified as “bat”

A

false - “bet” began to be identified as “bit” because with the context of high F1 values in the carrier, 450 Hz counted as low in comparison so the vowel was judged to be high

84
Q

Ladefoged and Broadbent 1957
- synthesized 4 syllables differing only in F1 and F2; in isolation the syllables were identified as bit, bet, bat, and but - they also synthesized (via F1 and F2) the syllables of a carrier sentence “Please say what this word is” before the test word, which subjects had to identify.

true or false:
when F1 was raised in the carrier sentence, test word C (575 Hz) started to be identified as “bit”

A

false - “bat” started to be identified as “bet” because compared to the high F1 in the carrier, 575 Hz was not that high so the vowel was judged to be mid rather than low

85
Q

Ladefoged and Broadbent 1957
- synthesized 4 syllables differing only in F1 and F2; in isolation the syllables were identified as bit, bet, bat, and but - they also synthesized (via F1 and F2) the syllables of a carrier sentence “Please say what this word is” before the test word, which subjects had to identify.

true or false:
when F2 was lowered in the carrier sentence, test word D (F1: 600 Hz, F2: 1300 Hz) started to be identified as “but”

A

false - “but” started to be identified as “bat” because compared to the low F2 values, 1300 Hz was not all that low, and was judged to be front

86
Q

Ladefoged and Broadbent 1957
- synthesized 4 syllables differing only in F1 and F2; in isolation the syllables were identified as bit, bet, bat, and but - they also synthesized (via F1 and F2) the syllables of a carrier sentence “Please say what this word is” before the test word, which subjects had to identify.

what was the conclusion found by this study?

A

listeners notice where the formants are in vowels from a new speaker and adapt their model of the vowel space to fit the new voice - their expectations change as they learn where the new speaker’s vowels are, which can happen in a matter of seconds - this is intelligent problem solving, NOT passive perception

87
Q

Mullenix et al. 1989
- one group of listeners identified lists of words in noise produced by a single speaker, while another group heard the same words produced by multiple speakers

true or false:
the group hearing a single speaker in noise identified the words more slowly and less accurately than those hearing multiple speakers

A

false - the group hearing a single speaker in noise was faster and more accurate, because over that short period they were able to learn more about the single voice and improve their processing of that speaker’s speech

88
Q

Nygaard and Pisoni 1998
- subjects listened to samples from 10 different speakers over 10 days and learned the voices well enough to match a new sample to them; they were then presented with words that they had to identify in noise.

true or false:
subjects made fewer errors identifying words in noise if it was produced by one of the voices they were already familiar with

A

true

89
Q

what is priming?

A

previous exposure to one stimulus (the prime) improves processing performance (accuracy and speed) on the task with a later stimulus (the target)

90
Q

Nygaard and Pisoni 1998
- subjects listened to samples from 10 different speakers over 10 days and learned the voices well enough to match a new sample to them; they were then presented with words that they had to identify in noise.

how does priming help explain the results of this study?

A

the priming was greater when the prime and the target were produced by the same voice, which implies that the voice was part of the memory representation for the prime

91
Q

Goldinger 1996, 1998
- exposed subjects to words produced by different speakers in a study session, they were tested in various tasks involving those words in a test session (e.g. have you heard this word before in the study session?)

what were the results of this experiment?

A

the subjects were quicker and more accurate in making this judgement if they heard the word produced by the same speaker who produced it in the study session

92
Q

Goldinger 1996, 1998
- exposed subjects to words produced by different speakers in a study session, they were tested in various tasks involving those words in a test session - they were asked to identify sounds in the word, discriminate sounds in the word, and repeat the word as quickly as possible (shadowing)

what were the results of these tasks?

A

all tasks were performed faster and more accurately if the test word was produced by the same voice that had produced it in the study session

93
Q

Goldinger 1996, 1998
- exposed subjects to words produced by different speakers in a study session, they were tested in various tasks involving those words in a test session - they were asked to identify sounds in the word, discriminate sounds in the word, and repeat the word as quickly as possible (shadowing)

true or false:
this experiment shows that activating both word and voice at the same time is less effective than just activating the word

A

false - it is more effective to activate both the word and the voice because they both activate memory of that word and that voice

94
Q

what is an exemplar?

A

a single remembered instance of a category - in exemplar theory, the memory representation of a category like the word “cat” consists of every instance of that word one has ever encountered, organized by recency, speaker, context, etc.
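
A toy Python sketch of an exemplar model (the features, stored tokens, and similarity weighting are all invented for illustration): a new token is categorized by its summed similarity to every stored exemplar, which is why recent, same-voice exemplars speed processing.

    import math

    # stored exemplars: ((F1, F2) in Hz, word label) - illustrative values
    exemplars = [
        ((500.0, 1500.0), "cat"),
        ((520.0, 1480.0), "cat"),
        ((650.0, 1100.0), "cot"),
    ]

    def similarity(x, y, sensitivity=0.01):
        return math.exp(-sensitivity * math.dist(x, y))

    def categorize(token):
        scores = {}
        for features, word in exemplars:
            scores[word] = scores.get(word, 0.0) + similarity(token, features)
        return max(scores, key=scores.get)

    print(categorize((510.0, 1490.0)))  # "cat"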

95
Q

true or false:
if you heard a word recently from a speaker, you can more quickly process that word in a new instance from the same speaker

A

true - speaker normalization is partially responsible for this

96
Q

what is the perceptual challenge of coarticulation?

A

there is a different version of every phoneme for every preceding sound and for every following sound - the differences can be as large as those between categories

97
Q

what is the main problem with speech recognition programs?

A

it is hard for them to account for coarticulation, so a speech sound or spoken word from one context won’t match a sample of the same sound or word from another context

98
Q

true or false:
to counteract coarticulation problems, listeners remember vowels with the sound that precedes it, not by itself

A

false - they remember the vowel together with the sound that precedes it and the sound that follows it

99
Q

Lindblom and Studdert-Kennedy 1967
- series of high vowels synthesized, varying just in the freq. of F2, from clear I with high F2 to clear ʊ with low F2 - the vowels were spliced into three environments: isolation, w__w, j__j

what would be the expectation for the F2 of the vowel between j__j?

A

F2 would be high, like it is in j

100
Q

Lindblom and Studdert-Kennedy 1967
- series of high vowels synthesized, varying just in the freq. of F2, from clear I with high F2 to clear ʊ with low F2 - the vowels were spliced into three environments: isolation, w__w, j__j

what would be the expectation for the F2 of the vowel between w__w?

A

F2 of the vowel would be low, like w

101
Q

Lindblom and Studdert-Kennedy 1967
- series of high vowels synthesized, varying just in the freq. of F2, from clear I with high F2 to clear ʊ with low F2 - the vowels were spliced into three environments: isolation, w__w, j__j - 10 speakers of SAE listened to the words and identified them as containing I or ʊ

true or false:
when the vowel had a high F2 it was identified as ʊ no matter the context

A

false - it was identified as I no matter the context

102
Q

Lindblom and Studdert-Kennedy 1967
- series of high vowels synthesized, varying just in the freq. of F2, from clear I with high F2 to clear ʊ with low F2 - the vowels were spliced into three environments: isolation, w__w, j__j - 10 speakers of SAE listened to the words and identified them as containing I or ʊ

true or false:
when the vowel had a low F2 it was identified as ʊ no matter the context

A

true

103
Q

Lindblom and Studdert-Kennedy 1967
- series of high vowels synthesized, varying just in the freq. of F2, from clear I with high F2 to clear ʊ with low F2 - the vowels were spliced into three environments: isolation, w__w, j__j - 10 speakers of SAE listened to the words and identified them as containing I or ʊ

true or false:
when the vowel has intermediate F2 values, it was identified as ʊ more often in the low F2 w__w environment than in isolation

A

false - it was identified as I

104
Q

Lindblom and Studdert-Kennedy 1967
- series of high vowels synthesized, varying just in the freq. of F2, from clear I with high F2 to clear ʊ with low F2 - the vowels were spliced into three environments: isolation, w__w, j__j - 10 speakers of SAE listened to the words and identified them as containing I or ʊ

true or false:
when the vowel has intermediate F2 values, it was identified as I less often in the high F2 j__j environment than in isolation

A

true

105
Q

Lindblom and Studdert-Kennedy 1967
- series of high vowels synthesized, varying just in the freq. of F2, from clear I with high F2 to clear ʊ with low F2 - the vowels were spliced into three environments: isolation, w__w, j__j - 10 speakers of SAE listened to the words and identified them as containing I or ʊ

how did the identification boundary shift for I and ʊ?

A

shifted toward lower values of F2 in the low F2 context and toward higher F2 values in the higher F2 context

106
Q

Lindblom and Studdert-Kennedy 1967
- series of high vowels synthesized, varying just in the freq. of F2, from clear I with high F2 to clear ʊ with low F2 - the vowels were spliced into three environments: isolation, w__w, j__j - 10 speakers of SAE listened to the words and identified them as containing I or ʊ

what is the interpretation (2) of this study’s results?

A

when listeners hear a vowel in a context that raises F2, such as j__j, they know that part of the height of F2 for that vowel is due to context - to compensate for this effect they raise the F2 boundary between I and ʊ which reduces the range of F2 values that are identified as I

when listeners hear a vowel in a context that lowers F2, such as w__w, they know that part of the lowness of F2 for the vowel is due to context - to compensate for this, they lower the F2 boundary between I and ʊ which increases the range of F2 values that are identified as I

107
Q

Mann and Repp 1980
- investigated the effect of a following vowel on the perceptual distinction between s and ʃ - synthesized a continuum of 9 fricatives varying in center frequency from ʃ (1957 Hz) to s (3917 Hz) - each occurred with two different following vowels, ɑ and u - listeners were asked to identify “sh” or “s”

true or false:
listeners had more “sh” responses with higher center freq. (to the left of the chart) than with lower ones (to the right)

A

false - more “sh” responses with lower center freq. (to the left) than with higher ones (to the right)

108
Q

what is the difference between s and ʃ ?

A

both are sibilants with intense noise extending down from the highest freq., but the center freq. of the noise is lower in ʃ than in s (the noise extends down lower in ʃ)

109
Q

Mann and Repp 1980
- investigated the effect of a following vowel on the perceptual distinction between s and ʃ - synthesized a continuum of 9 fricatives varying in center frequency from ʃ (1957 Hz) to s (3917 Hz) - each occurred with two different following vowels, ɑ and u - listeners were asked to identify “sh” or “s”

true or false:
listeners had more “sh” responses at the higher center frequencies in the context of ɑ than in the u context

A

true

110
Q

Mann and Repp 1980
- investigated the effect of a following vowel on the perceptual distinction between s and ʃ - synthesized a continuum of 9 fricatives varying in center frequency from ʃ (1957 Hz) to s (3917 Hz) - each occurred with two different following vowels, ɑ and u - listeners were asked to identify “sh” or “s”

what is the interpretation of the results of this study?

A

listeners take into account the following vowel when identifying a fricative - they know the effects of coarticulation on each sound - because u is rounded, it lengthens the vocal tract and lowers the frequency, so the center frequency of a sibilant is lower if it is followed by u rather than ɑ

111
Q

Mann and Repp 1980
- investigated the effect of a following vowel on the perceptual distinction between s and ʃ - synthesized a continuum of 9 fricatives varying in center frequency from ʃ (1957 Hz) to s (3917 Hz) - each occurred with two different following vowels, ɑ and u - listeners were asked to identify “sh” or “s”

true or false:
in order to identify a fricative as ʃ, the center frequency has to be lower before u

A

true - because listeners attribute some of the lowness of the center freq. there to the vowel context

112
Q

Mann and Repp 1980
- investigated the effect of a following vowel on the perceptual distinction between s and ʃ - synthesized a continuum of 9 fricatives varying in center frequency from ʃ (1957 Hz) to s (3917 Hz) - each occurred with two different following vowels, ɑ and u - listeners were asked to identify “sh” or “s”

true or false:
knowing that u has a lowering effect on center frequency, listeners adjust the expected center frequency for s upward

A

false - they adjust it downward

113
Q

West 1999
- investigated the coarticulatory effects of the approximants l and r on neighboring vowels - for the phrases “a berry” and “a belly” the coarticulation effects extended so far that all vowels had differences in F1-F3 depending on the medial consonant - the liquid was replaced with noise and listeners were asked to identify which word they heard

what did the study find?

A

the subjects were able to identify the word correctly even when noise covered most of the word, as long as they could still hear the vowel preceding the liquid

114
Q

West 1999
- investigated the coarticulatory effects of the approximants l and r on neighboring vowels - for the phrases “a berry” and “a belly” the coarticulation effects extended so far that all vowels had differences in F1-F3 depending on the medial consonant - the liquid was replaced with noise and listeners were asked to identify which word they heard

what does the knowledge from the results of this study show us?

A

our identification is noise-resistant, since coarticulation spreads out the acoustic evidence a listener can use in identification

115
Q

what is bottom-up processing?

A

figuring out the bigger units on the basis of the smaller units they contain - for example, determining what word is being produced from the sounds it contains

116
Q

what is top-down processing?

A

figuring out the smaller constituent units on the basis of the bigger units that contain them - for example, the identification of speech sounds is informed by our knowledge of words in our language

117
Q

Ganong 1980
- synthesized words varying just in the VOT of an initial alveolar or velar stop - in one class of series (word-nonword), a voiced stop in that position would form a word and a voiceless stop would form a non-word (e.g. “dash” and “tash”) - in the other condition (nonword-word) an initial voiced stop would form a non-word and a voiceless stop would form a word (e.g. “dask” and “task”) - listeners had to identify the initial sound as voiced or voiceless

what was the measure of this study?

A

the percentage of times that subjects identified a stimulus as voiced (d or g), with 2 factors: the VOT of the initial stop and word status (word-nonword or nonword-word)

118
Q

Ganong 1980
- synthesized words varying just in the VOT of an initial alveolar or velar stop - in one class of series (word-nonword), a voiced stop in that position would form a word and a voiceless stop would form a non-word (e.g. “dash” and “tash”) - in the other condition (nonword-word) an initial voiced stop would form a non-word and a voiceless stop would form a word (e.g. “dask” and “task”) - listeners had to identify the initial sound as voiced or voiceless

true or false:
the hypothesis of the study was that stops with lower VOT values will more often be identified as voiceless than stops with higher VOT values

A

false - stops with lower VOT values will more often be identified as voiced

119
Q

Ganong 1980
- synthesized words varying just in the VOT of an initial alveolar or velar stop - in one class of series (word-nonword), a voiced stop in that position would form a word and a voiceless stop would form a non-word (e.g. “dash” and “tash”) - in the other condition (nonword-word) an initial voiced stop would form a non-word and a voiceless stop would form a word (e.g. “dask” and “task”) - listeners had to identify the initial sound as voiced or voiceless

true or false:
the hypothesis of this study is that all else being equal, listeners will tend to give the identification answer that yields a word

A

true

120
Q

Ganong 1980
- synthesized words varying just in the VOT of an initial alveolar or velar stop - in one class of series (word-nonword), a voiced stop in that position would form a word and a voiceless stop would form a non-word (e.g. “dash” and “tash”) - in the other condition (nonword-word) an initial voiced stop would form a non-word and a voiceless stop would form a word (e.g. “dask” and “task”) - listeners had to identify the initial sound as voiced or voiceless

true or false:
listeners had higher proportion of voiced identification responses when VOT was lower - for BOTH word and nonword conditions

A

true

121
Q

Ganong 1980
- synthesized words varying just in the VOT of an initial alveolar or velar stop - in one class of series (word-nonword), a voiced stop in that position would form a word and a voiceless stop would form a non-word (e.g. “dash” and “tash”) - in the other condition (nonword-word) an initial voiced stop would form a non-word and a voiceless stop would form a word (e.g. “dask” and “task”) - listeners had to identify the initial sound as voiced or voiceless

the proportion of voiced responses was higher in the word-nonword condition when VOT was lower - why might that be?

A

in that condition a voiced consonant actually made a word, so people used their knowledge of words to identify the consonant

122
Q

Ganong 1980
- synthesized words varying just in the VOT of an initial alveolar or velar stop - in one class of series (word-nonword), a voiced stop in that position would form a word and a voiceless stop would form a non-word (e.g. “dash” and “tash”) - in the other condition (nonword-word) an initial voiced stop would form a non-word and a voiceless stop would form a word (e.g. “dask” and “task”) - listeners had to identify the initial sound as voiced or voiceless

VOT changes affected voicing identification - is this an example of top-down processing or bottom-up processing?

A

bottom-up processing - the physical properties of an individual sound help identify the word

123
Q

Ganong 1980
- synthesized words varying just in the VOT of an initial alveolar or velar stop - in one class of series (word-nonword), a voiced stop in that position would form a word and a voiceless stop would form a non-word (e.g. “dash” and “tash”) - in the other condition (nonword-word) an initial voiced stop would form a non-word and a voiceless stop would form a word (e.g. “dask” and “task”) - listeners had to identify the initial sound as voiced or voiceless

word status affected the voicing identification - is this an example of top-down processing or bottom-up processing?

A

top-down processing - using knowledge of vocabulary to help decide what sound they heard

124
Q

what are phonotactic restrictions?

A

generalizations about what sequences of sounds can occur in a given position in an utterance - for example, at the beginning of a syllable
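
A minimal Python sketch of applying a phonotactic restriction (the onset list is a tiny, incomplete stand-in for the real generalizations): candidate sequences can be ruled out before the acoustics are even consulted.

    # a few legal English syllable onsets - deliberately incomplete
    LEGAL_ONSETS = {"t", "p", "s", "tr", "pr", "pl", "sl", "st", "str"}

    def possible_onset(cluster):
        return cluster in LEGAL_ONSETS

    print(possible_onset("tr"))  # True
    print(possible_onset("tl"))  # False - no English syllable begins with /tl/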

125
Q

Massaro and Cohen 1983
- used speech synthesis to create a series of syllables with a liquid preceding the vowel [i] - the syllables only differed in F3 (low F3 = r, high F3 = l) - ree/lee were preceded by a synthesized consonant: p, t, s, or v - subjects were asked to select which of the possible combinations they heard

what was the main goal of this study?

A

it was expected that the lower the F3, the more likely subjects would choose sounds with r rather than l - but would the identification differ depending on the preceding consonant?

126
Q

Massaro and Cohen 1983
- used speech synthesis to create a series of syllables with a liquid preceding the vowel [i] - the syllables only differed in F3 (low F3 = r, high F3 = l) - ree/lee were preceded by a synthesized consonant: p, t, s, or v - subjects were asked to select which of the possible combinations they heard

true or false:
the highest proportion of r responses was for stimuli beginning with t, which would be incompatible with a following l in English

A

true

127
Q

Massaro and Cohen 1983
- used speech synthesis to create a series of syllables with a liquid preceding the vowel [i] - the syllables only differed in F3 (low F3 = r, high F3 = l) - ree/lee were preceded by a synthesized consonant: p, t, s, or v - subjects were asked to select which of the possible combinations they heard

true or false:
the lowest proportion of r responses was for stimuli beginning with s, which is incompatible with a following r

A

true

128
Q

Massaro and Cohen 1983
- used speech synthesis to create a series of syllables with a liquid preceding the vowel [i] - the syllables only differed in F3 (low F3 = r, high F3 = l) - ree/lee were preceded by a synthesized consonant: p, t, s, or v - subjects were asked to select which of the possible combinations they heard

stimuli beginning with p and v drew mid-level proportions of r responses - why is that?

A

p is compatible with either l or r and v is compatible with neither - the effect being tested here does not apply to these

129
Q

Massaro and Cohen 1983
- used speech synthesis to create a series of syllables with a liquid preceding the vowel [i] - the syllables only differed in F3 (low F3 = r, high F3 = l) - ree/lee were preceded by a synthesized consonant: p, t, s, or v - subjects were asked to select which of the possible combinations they heard

what are the overall results of this study?

A

when F3 was highest, subjects identified the liquid as l regardless of what preceded it; when F3 was lowest, they identified it as r regardless of what preceded it - but across all F3 values they avoided identifying sound sequences that cannot occur in English

130
Q

Massaro and Cohen 1983
- used speech synthesis to create a series of syllables with a liquid preceding the vowel [i] - the syllables only differed in F3 (low F3 = r, high F3 = l) - ree/lee were preceded by a synthesized consonant: p, t, s, or v - subjects were asked to select which of the possible combinations they heard

how is phonotactic knowledge being used here?

A

subjects used their phonotactic knowledge of where English sounds can occur, comparing it with what they heard to judge the likelihood of that sound occurring in that context

131
Q

what is syntax?

A

how words fit together in sentences

132
Q

how do we use syntax in speech processing?

A

we use our knowledge of it to restrict the possible words that could fill a given slot we are trying to identify

133
Q

Miller, Heise, and Lichten 1951
- presented words (real and nonsense) to subjects at varying signal-to-noise ratios and in different contexts; subjects had to identify what was being said

true or false:
the higher the signal to noise ratio, the more accurate the identification was

A

true - quieter noise, louder speech
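
For reference, a Python sketch of the signal-to-noise ratio in decibels (the standard definition; the power values are made up):

    import math

    def snr_db(signal_power, noise_power):
        return 10 * math.log10(signal_power / noise_power)

    print(snr_db(2.0, 1.0))  # ~ +3 dB: speech has twice the noise's power
    print(snr_db(0.5, 1.0))  # ~ -3 dB: noise has twice the speech's power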

134
Q

Miller, Heise, and Lichten 1951
- presented words (real and nonsense) to subjects at varying signal-to-noise ratios and in different contexts; subjects had to identify what was being said

in what context did subjects do better?

A

when real words in sentences were used and even better with digits in a sequence

135
Q

Miller, Heise, and Lichten 1951
- presented words (real and nonsense) to subjects at varying signal-to-noise ratios and in different contexts; subjects had to identify what was being said

why was it significantly harder for subjects to identify nonsense words than actual words in a sentence?

A

for a nonsense word the listener must correctly identify each sound in that word out of an endless possible range - for real words the listener only needs to consider what English word could fit in that context - with digits the pool is even smaller

136
Q

Warren 1970
- in the sentence “the state governors met with their respective legislatures convening in the capital city”, the first s in “legislature” was replaced with a cough or a tone of the same duration - listeners were given the text and asked to circle the sound that had been replaced when they heard it

true or false:
none of the subjects were able to correctly identify the location

A

true

137
Q

Warren 1970
- in the sentence “the state governors met with their respective legislatures convening in the capital city”, the first s in “legislature” was replaced with a cough or a tone of the same duration - listeners were given the text and asked to circle the sound that had been replaced when they heard it

why were none of them able to find it?

A

the context provided so much information that listeners did not have to hear the first s in the word in order to identify it as “legislatures” - they automatically filled in the missing information based on their knowledge of English words

138
Q

Warren 1970
- in the sentence “the state governors met with their respective legislatures convening in the capital city”, the first s in “legislature” was replaced with a cough or a tone of the same duration - listeners were given the text and asked to circle the sound that had been replaced when they heard it

both top-down and bottom-up processing could be used here but which one surpasses the other and causes the phenomenon seen?

A

top-down surpasses - rather than hearing the sounds and determining the word, listeners identified the word from the possibilities based on their knowledge of English

139
Q

true or false:
top-down and bottom-up processing are both useful for different contexts and one usually surpasses the other

A

false - using both at the same time is the most efficient way to process speech rapidly - attack from all angles - and they both can serve as checks for one another in case there is extra noise or unexpected events that make one less effective

140
Q

Sumby and Pollack 1954
- tested whether listeners can identify words better in noise if they can see the speaker’s face than if they can only hear the voice - noise levels were varied over words in both conditions

true or false:
at higher noise levels the accuracy was lower and the longer the word list the lower the accuracy

A

true

141
Q

Sumby and Pollack 1954
- tested whether listeners can identify words better in noise if they can see the speaker’s face than if they can only hear the voice - noise levels were varied over words in both conditions

true or false:
accuracy was greater in the auditory + visual condition, and even more so at higher noise levels

A

true

142
Q

true or false:
visual information can change what sounds we actually hear

A

true

143
Q

McGurk and MacDonald 1976
- recordings were made of the sounds “baba”, “gaga”, “papa”, and “kaka”, and the audio and video tracks were mismatched - the two conditions (video + audio, and audio only) were shown to adults, preschoolers, and primary school kids - subjects were asked to repeat what they heard

true or false:
the video-audio condition had many errors, with subjects resorting to a compromise between what they saw and heard (e.g. v: “gaga”, a: “baba”, they responded with “dada”)

A

true

144
Q

McGurk and MacDonald 1976
- recordings were made of the sounds “baba”, “gaga”, “papa”, and “kaka” and the audios and videos were mismatched - the two conditions were video-audio and audio only shown to adults, preschoolers, and primary school kids - subjects were asked to repeat what they heard

what unexpected result came out of this study?

A

the children were much less susceptible to the McGurk effect and could usually identify the auditory stimulus correctly regardless of the video - though not perfectly (27.5% and 46.5%)

145
Q

McGurk and MacDonald 1976
- recordings were made of the sounds “baba”, “gaga”, “papa”, and “kaka” and the audios and videos were mismatched - the two conditions were video-audio and audio only shown to adults, preschoolers, and primary school kids - subjects were asked to repeat what they heard

at what age did the study find people have fully learned to use visual information?

A

after age 8, relatively late

146
Q

the McGurk effect is very robust - what are some examples?

A

it held true even when:
  • the subjects were informed of how the stimuli were constructed
  • the audio and video were out of sync by as much as 180 ms
  • the audio and video were of different genders
  • the audio and video came from locations up to 90 degrees apart relative to the listener
  • the video was reduced to a set of light points corresponding to face locations

147
Q

response to integrated audiovisual information is particularly strong where?

A

superior temporal sulcus - below the primary speech processing regions

148
Q

true or false:
reflexive phonation occurs from 0-2 months and is coughing, sneezing, and crying

A

true

149
Q

what is an infant’s vocal tract like?

A

like a chimp’s - the tongue takes up much of the space in the mouth (more than in adulthood) and the larynx is high enough that there is no appreciable pharynx

150
Q

at what point in development does a human’s vocal tract develop to the adult form?

A

the first year

151
Q

true or false:
cooing occurs from 1-4 months and is quasivocalic sounds

A

true

152
Q

true or false:
expansion occurs from 3-8 months and is clear vowels, yells, screams, whispers, and raspberries

A

true

153
Q

true or false:
canonical babbling is rhythmically organized, meaningless sequences of speech sounds

A

false - it is strings of alternating consonants and vowels like “bababa” or “mamama”

154
Q

when does canonical babbling occur in child development?

A

5-10 months

155
Q

true or false:
early babbling sounds different depending on the language environment

A

false - it sounds the same no matter the language

156
Q

true or false:
towards the beginning of the babbling process, adults can tell if the babbling is of their language or not

A

false - they can tell towards the end of the process - their production is gradually tuned to match the language environment

157
Q

which portion of babbling resembles the intonation and rhythm of the ambient language?

A

late babbling

158
Q

at what point in development is the typical onset of meaningful speech?

A

10 months

159
Q

true or false:
children’s production abilities are always considerably ahead of their perceptual abilities

A

false - their perception is always ahead of production

160
Q

true or false:
fetuses in utero have higher heart rates when listening to a recording of their mother’s voice than to any other person’s voice

A

true

161
Q

true or false:
the speech children produce is representative of what they know about their language

A

false - it is never fully representative

162
Q

true or false:
children can distinguish sounds they can not produce themselves

A

true

163
Q

Werker and Tees 1984
EXPERIMENT 1
- recordings of English speakers’ “da” and “ba”, Thompson Salish speakers’ “k’i” and “q’i”, and Hindi speakers’ “ta” and “ʈa” - 6-month-olds, English-speaking adults, and Thompson Salish-speaking adults asked to identify whether they heard “k’i” or “q’i” (button press or head turn) - criterion response is 8/10 correct

true or false:
infants from an English speaking environment were much better at distinguishing the sounds than English speaking adults and were almost as good as the Thompson Salish adults

A

true - infants > adults

164
Q

Werker and Tees 1984
EXPERIMENT 1
- recordings of English speakers’ “da” and “ba”, Thompson Salish speakers’ “k’i” and “q’i”, and Hindi speakers’ “ta” and “ʈa” - infants at 6-8, 8-10, and 10-12 months head-turn tested - criterion response is 8/10 correct

what were the results of this study?

A

the youngest group vastly outperformed the others - most in the oldest group couldn’t even reach the criterion

165
Q

Werker and Tees 1984
EXPERIMENT 1
- recordings of English speakers’ “da” and “ba”, Thompson Salish speakers’ “k’i” and “q’i”, and Hindi speakers’ “ta” and “ʈa” - infants at 6-8, 8-10, and 10-12 months head-turn tested - criterion response is 8/10 correct

what type of experiment is this?

A

cross-sectional study - different ages

166
Q

Werker and Tees 1984
EXPERIMENT 3
- recordings of English speakers’ “da” and “ba”, Thompson Salish speakers’ “k’i” and “q’i”, and Hindi speakers’ “ta” and “ʈa” - the same group of children head-turn tested repeatedly over time - criterion response is 8/10 correct

at what point in their development were the children best at discriminating the sounds?

at what point did they lose all ability to do so?

A

6-8 months

a year

167
Q

as children focus on a particular language to master in their environment they lose something else - what is it?

A

the ability to distinguish sounds they aren’t exposed to regularly - they go from versatile generalists -> specialists

168
Q

Kuhl, Tsao and Liu 2003
- 32 infants with an average age of 9.3 months, all from English-language environments with no Mandarin exposure - 16 were exposed to Mandarin, the other 16 to English - tested on whether they could differentiate between 2 Mandarin sounds

true or false:
children from a Mandarin-speaking environment scored higher on the differentiation task than the English-environment group exposed to Mandarin

A

false - they scored about the same

169
Q

Kuhl, Tsao and Liu 2003
- 32 infants with an average age of 9.3 months, all from English-language environments with no Mandarin exposure - 16 were exposed to Mandarin interaction, the other 16 to English interaction - tested on whether they could differentiate between 2 Mandarin sounds

true or false:
the English kids who were exposed to Mandarin scored an average of 65.7% while the kids exposed to English scored 56.7%

A

true

170
Q

Kuhl, Tsao and Liu 2003
EXPERIMENT 2
- 32 infants with an average age of 9.3 months, all from English-language environments with no Mandarin exposure - 16 were exposed to a Mandarin video, the other 16 to an English video - tested on whether they could differentiate between 2 Mandarin sounds

true or false:
the kids exposed to Mandarin did not gain any advantage on the test because they were not being interacted with - they were just listening to a video

A

true

171
Q

true or false:
phonetic learning is socially driven

A

true - interaction required

172
Q

Kuhl, Tsao and Liu 2003
- 32 infants with an average age of 9.3 months, all from English-language environments with no Mandarin exposure - 16 were exposed to Mandarin, the other 16 to English - tested on whether they could differentiate between 2 Mandarin sounds

what was the conclusion of this study?

A

even limited exposure to another language can improve discrimination of sounds in that language

173
Q

true or false:
liquids are mastered late

A

true

174
Q

true or false:
trills are mastered early

A

false - late

175
Q

true or false:
fricatives are mastered early

A

false - late

176
Q

true or false:
stops (nasal and oral) are mastered early

A

true

177
Q

true or false:
vowels are mastered early

A

true

178
Q

true or false:
alternating consonant-vowel (CV) sequences are mastered early

A

true

179
Q

true or false:
when children can’t pronounce words while learning to speak they just ignore/skip over the word

A

false - they systematically replace sounds they can’t produce with ones they can (patterns of replacement)

180
Q

what are some of the common replacement patterns children use? (5)

A
  • replacing non-stops with stops (John -> don)
  • cluster simplification (spoon -> boon)
  • replacing consonants so they are harmonious in place of articulation (sock -> gock)
  • changing voicing patterns (initial stop always voiced, final always voiceless)
  • final consonants are deleted

181
Q

at what point in development is there usually an explosion of vocabulary for children?

A

18 months

182
Q

true or false:
children take a while to speak because they are held back by their inability to distinguish adult sounds

A

false

183
Q

true or false:
children learn to speak late because the relevant muscles of their vocal tracts are not yet strong enough (will be after about a year)

A

true - their challenge is in coordination and control of speech gestures

184
Q

why are stops easy for children to master?

A

they require little motor control, the tongue or lip just has to move to touch an opposing surface

185
Q

why are vowels easy for children to master?

A

they require little motor control, and each vowel has a relatively wide range of acceptable vocal tract shapes

186
Q

why are consonant-vowel alternations easy for children to master?

A

they are just a repeated sequence of opening and closing gestures

187
Q

why are fricatives and approximants harder for children to master?

A

the tongue or lip has to be very precisely positioned to form a passageway narrow enough for turbulence but not too narrow

188
Q

why are the sounds l and r particularly hard for children to master?

A

they require precision AND require different parts of the tongue to make separate closures - at first children only use the tongue as one mass

189
Q

true or false:
the sounds of one’s language aren’t actually simpler to produce or distinguish, the native speaker is just more used to them

A

true

190
Q

true or false:
it wasn’t until the 1980’s that psychologists and linguists started doing systematic acoustic studies of early speech

A

false - the 1970’s

191
Q

true or false:
new speech skills are mastered by kids instantaneously

A

false - it was long believed to be the case only because children’s gradual learning steps are too small for adults to hear

192
Q

Macken and Barton 1980
- 4 children were followed for 8 months starting at age 1.5, recorded every 2 weeks while playing and answering questions - word-initial obstruent stops (b, d, g, p, t, k) were extracted and their VOT was measured

what are the expected VOT results for adults to produce?

A

word-initial voiced stops have a small positive VOT, voiceless stops are aspirated with a large positive VOT

193
Q

what is VOT?

A

the time interval from the release of a stop to the onset of voicing
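
written as a quick formula (notation mine, inferred from the definition above, not from the cards):

    \mathrm{VOT} = t_{\text{voicing onset}} - t_{\text{stop release}}

so voicing that starts before the release gives a negative VOT, voicing at or just after the release gives a small positive VOT (voiceless unaspirated), and a long lag gives the large positive VOT of aspirated stops - the scale the Macken and Barton cards rely on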

194
Q

Macken and Barton 1980
- 4 children were followed for 8 months starting at age 1.5, recorded every 2 weeks while playing and answering questions - word-initial obstruent stops (b, d, g, p, t, k) were extracted and their VOT was measured

true or false:
in early sessions kids generally had negative VOT for both voiced and voiceless stops

A

false - they had a small positive VOT for both voiced and voiceless - effectively voiceless unaspirated stops, often misheard by transcribers as voiced

195
Q

Macken and Barton 1980
- 4 children were followed for 8 months starting at age 1.5, recorded every 2 weeks while playing and answering questions - word-initial obstruent stops (b, d, g, p, t, k) were extracted and their VOT was measured

true or false:
over the course of the study, the difference in VOT between voiced and voiceless grew, mainly through an increase in the VOT of the voiced class

A

false - an increase in VOT of the voiceless class

196
Q

Macken and Barton 1980
- 4 children were followed for 8 months starting at age 1.5, recorded every 2 weeks while playing and answering questions - word-initial obstruent stops (b, d, g, p, t, k) were extracted and their VOT was measured

why were the changes in kids’ VOT previously seen as instantaneous?

A

the difference was too small for adults to hear

197
Q

Macken and Barton 1980
- 4 children were followed for 8 months starting at age 1.5, recorded every 2 weeks while playing and answering questions - word-initial obstruent stops (b, d, g, p, t, k) were extracted and their VOT was measured

at what point could the adults tell the difference between the children’s productions?

A

when their VOT caught up with the adults’ average VOT for that sound

198
Q

Macken and Barton 1980
- 4 children were followed for 8 months starting at age 1.5, recorded every 2 weeks while playing and answering questions - word-initial obstruent stops (b, d, g, p, t, k) were extracted and their VOT was measured

true or false:
this was one of the first studies to show the gradualness of acquiring phonetic mastery

A

true

199
Q

Macken and Barton 1980
- 4 children were followed for 8 months starting at age 1.5, recorded every 2 weeks while playing and answering questions - word-initial obstruent stops (b, d, g, p, t, k) were extracted and their VOT was measured

at what age did they find that the difference in voiced and voiceless stops is acquired?

A

around 2 years

200
Q

by what age have children generally mastered almost all the sounds of their language?

A

5 years old

201
Q

true or false:
children acquire language with instruction

A

false - no instruction

202
Q

true or false:
the earlier one is exposed to a language, the more likely it is they will attain a native level knowledge of it

A

true

203
Q

what makes up a foreign accent?

A

a pattern of pronunciation of a language by someone who applies the habits of their L1 to the speaking of L2

204
Q

what is an early bilingual?

A

someone who learned both languages early enough to be a native speaker of both

205
Q

what is a late bilingual?

A

someone who acquired more than one language late enough not to be a native speaker of it/them

206
Q

by what age can bilinguals be exposed to an L2 and speak it without a detectable foreign accent?

A

age 6

207
Q

true or false:
with first exposure to a language after 12, the speaker will generally have a foreign accent in the L2

A

false - age 13

208
Q

true or false:
learners can shed their L1 habits in their L2 even when they are regularly exposed to L1

A

false - they are less likely to shake those habits if they have regular exposure to L1

209
Q

what is being referenced when saying the “age of first exposure” to a language?

A

when the person moved to where the L2 is spoken, NOT when they started classes

210
Q

true or false:
after age 6 it is not possible to become fluent in another language

A

false - it is likely you will have an accent but fluency is possible with practice

211
Q

what is code-switching?

A

when a bilingual switches from one language to another under the control of the speaker

212
Q

true or false:
when a fluent bilingual code-switches, the languages become intertwined, with rules and patterns often shifting to the other language

A

false - the languages are autonomous and separable

213
Q

what is cross-linguistic priming?

A

exposure to an item in one language facilitates processing of a related item in another language

214
Q

Kim et al. 1997
- two groups of speakers, early and late bilinguals, both asked to silently describe typical events in their lives in L1 and L2 while fMRI monitored brain activity

true or false:
there was greater overlap in the areas of activity in early bilinguals than late

A

true

215
Q

Kim et al. 1997
- two groups of speakers, early and late bilinguals, both asked to silently describe typical events in their lives in L1 and L2 while fMRI monitored brain activity

what are the results found in reference to the prime areas of speech processing?

A

those areas are occupied early in life and are not available for languages learned later - late L2 acquirers have to use brain areas away from the L1 centers (on the scans, the early bilinguals’ two colors overlapped greatly, while the late bilinguals’ colors were completely separate and next to each other)

216
Q

what is the transfer effect?

A

when the deeply entrenched set of automatic habits for L1 is applied to L2

217
Q

true or false:
unfamiliar sounds in L2 are replaced with familiar sounds of L1

A

true - systematically replaced

218
Q

most dialects in Spanish don’t have a distinction between what?

A

tense and lax vowels

219
Q

when a Spanish speaker is speaking English, one replacement strategy might be to switch which vowels for which?

A

the lax vowels of English with the tense vowels of Spanish

220
Q

Spanish speakers of English are likely to replace English diphthongs with what?

A

their closest equivalent monophthongs, [e] and [o]

221
Q

the most common English vowel [ə] is often replaced by Spanish speakers with what?

A

[a] - the only central vowel in Spanish

222
Q

in Spanish voiceless plosives p, t, and k are what?

A

unaspirated in all positions

223
Q

when a Spanish speaker speaks English they will generally replace the voiceless plosives at the syllable-initial position with what?

A

unaspirated plosives

224
Q

Spanish speakers tend to replace the English r with what?

A

[ɾ] the alveolar tap

225
Q

Spanish speakers tend to replace English voiced fricatives with what?

A

voiceless Spanish ones

226
Q

true or false:
Spanish speakers often do consonant deletion or vowel insertion in English words with three consonants at the onset

A

true - Spanish can only have at most 2 consonants there

227
Q

true or false:
Spanish speakers often do consonant deletion when speaking English words that have 3 consonants in the final position

A

true - Spanish only allows for at most 2 consonants there

228
Q

true or false:
everyone starts replacing L2 sounds with L1 equivalents

A

true

229
Q

why do different speakers of an L2 have different accents?

A

they vary in where they are in the learning process and at what age they began

230
Q

Flege 1991
- compared VOT in word-initial /t/ among Spanish speakers, Spanish learners of English, and English monolinguals

true or false:
when speaking Spanish, Spanish monolinguals, early bilinguals, and late bilinguals all had VOT values in the same range

A

true

231
Q

Flege 1991
- compared VOT in word-initial /t/ among Spanish speakers, Spanish learners of English, and English monolinguals

true or false:
late bilinguals (who learned English after age 6) produced /t/ with a VOT in between the Spanish and English values

A

true

232
Q

Flege 1991
- compared VOT in word-initial /t/ among Spanish speakers, Spanish learners of English, and English monolinguals

true or false:
when speaking English, early Spanish-English bilinguals had the same VOT values as English monolinguals

A

true

233
Q

if a German speaker speaks English, what replacement would they make at the ends of words due to their L1 being German?

A

German has no voiced final obstruents so they would replace the voiced final obstruents of English with their voiceless counterparts (ex: Bob -> Bop)

234
Q

German has no dental fricatives, so German speakers replace the English ones with what?

A

stops or affricates

235
Q

what is the main factor of a foreign accent?

A

the sounds in L2 that have no counterpart in L1 will tend to be replaced by the closest sound in L1

236
Q

besides replacing sounds with similar ones, L2 speakers often will replace what?

A

they replace any sound that occurs in both L1 and L2 when it appears in a position where it couldn’t occur in L1

237
Q

why is a foreign accent so persistent?

A

over our lifetimes we learn production and perception processes that become automatic, which lets us speak quickly and keep up - mastering these skills becomes a liability when learning another language precisely because they are so automatic and ingrained

238
Q

true or false:
consciously realizing that a similar sound in L1 and L2 is actually different can change the unconscious automatic process of producing and perceiving it

A

false - it does NOT change the unconscious process

239
Q

why do early bilinguals not have a foreign accent?

A

they are exposed to both languages early enough to build separate categories for each and have no trouble keeping them separate

240
Q

true or false:
the hardest sounds for L2 learners in the long run are the new sounds unlike anything they’ve heard before

A

false - the hardest are the “false friends” that are close to those in L1 but not the same

241
Q

why are the most similar sounds hardest to master?

A

because they are similar enough to sounds we already know that we subconsciously treat it as fine to just replace them with the L1 versions

242
Q

Flege and Hillenbrand 1984
- speakers of varying French knowledge produced French sentences including the words “tous” and “tu” - [y] in “tu” has no English counterpart but [u] in “tous” is close - native French listeners had to identify which word they heard

true or false:
for the group with the least French experience, their “tu” was much more easily identified by the French listeners than their “tous”, meaning they pronounced it better

A

true

243
Q

Flege and Hillenbrand 1984
- speakers of varying French knowledge produced French sentences including the words “tous” and “tu” - [y] in “tu” has no English counterpart but [u] in “tous” is close - native French listeners had to identify which word they heard

true or false:
there was no significant difference in the ability of the French speakers to identify the “tu” of the most experienced French speakers and the least experienced

A

true - they were equally good at pronouncing the newer/weirder sound

244
Q

Flege and Hillenbrand 1984
- speakers of varying French knowledge produced French sentences including the words “tous” and “tu” - [y] in “tu” has no English counterpart but [u] in “tous” is close - native French listeners had to identify which word they heard

true or false:
the results were that the non-native speakers got closer in F2 values to the native French for the familiar [u] than for the new [y]

A

false - they were closer for the newer [y]

245
Q

what is speech technology?

A

any interface between humans and computers involving speech

246
Q

what is speech recognition?

A

automatic identification of spoken words

247
Q

what is speaker recognition?

A

automatic identification of the person who spoke

248
Q

what is speech synthesis?

A

the production of speech by machines

249
Q

why do companies want to use more speech recognition?

A

the more they can automate customer service and sales, the fewer human employees they need to pay

250
Q

true or false:
humans are more comfortable typing than speaking so they generally prefer a typing interface to a speech one

A

false - they prefer speaking to typing

251
Q

why does the government want to invest in speech recognition?

A

they want an automatic method for filtering recorded speech to locate particular references or voices

252
Q

how does speech recognition work?

A

digitized recordings (numerical versions of spectrograms) of speech samples are stored in memory, each labeled to identify what was said - when a new word is spoken it is digitized too and compared point by point to every sample in memory - the program selects the soundfile in memory with the smallest summed difference from the new soundfile
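
a minimal sketch of this point-by-point comparison, assuming the new word and every stored sample are already quantized spectrograms of the same shape (names and structure here are illustrative, not from any real recognizer):

    import numpy as np

    def recognize(new_sample, templates):
        # templates: dict mapping a word label to a stored quantized
        # spectrogram (rows = time points, columns = frequency bands)
        best_label, best_total = None, float("inf")
        for label, stored in templates.items():
            # point-by-point comparison: sum of absolute differences
            total = np.abs(new_sample - stored).sum()
            if total < best_total:
                best_label, best_total = label, total
        # the label whose soundfile has the smallest summed difference wins
        return best_label

the same-shape assumption is exactly what fails in practice, which is what the alignment and time-warping cards below address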

253
Q

in a quantized spectrogram, what do the lighter vs. darker colors represent?

A

the lighter are lower amplitude, the darker are higher amplitude

254
Q

what is the challenge of alignment in speech recognition?

A

it’s hard to know which points in the two soundfiles should be compared, because even two productions of the same word by the same speaker won’t sound the same or take the same amount of time

255
Q

how is the challenge of alignment between two files solved?

A

expanding or contracting the timescale of one file to find the best match (time warping)
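
the cards don’t name an algorithm, but the standard way to do this is dynamic time warping (DTW) - a minimal sketch under the same spectrogram assumptions as the sketch above:

    import numpy as np

    def dtw_distance(a, b):
        # a, b: quantized spectrograms (rows = time points, columns = bands);
        # cost[i, j] = smallest summed difference aligning a[:i] with b[:j]
        n, m = len(a), len(b)
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = np.abs(a[i - 1] - b[j - 1]).sum()  # frame-to-frame difference
                cost[i, j] = d + min(cost[i - 1, j - 1],  # advance both files
                                     cost[i - 1, j],      # stretch b's timescale
                                     cost[i, j - 1])      # stretch a's timescale
        return cost[n, m]

swapping this in for the raw point-by-point sum lets two productions of different lengths still be compared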

256
Q

what is the challenge of segmentation for speech recognition?

A

people don’t pause between words, so it’s not clear what interval in the soundfile should be matched against the files in memory

257
Q

how is the challenge of segmentation solved in speech recognition?

A

designers push for one-word answers, or the system has to try different segmentations and check which gives the best fit
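
a toy sketch of the try-every-segmentation idea, assuming a hypothetical score(chunk) helper that returns the best template distance for one stretch of frames (naive recursion for clarity - a real system would memoize):

    def best_segmentation(frames, score, min_len=5):
        # option 1: treat the whole stretch as a single word
        best_cost, best_split = score(frames), [frames]
        # option 2: try every cut point and keep whichever fits best
        for cut in range(min_len, len(frames) - min_len + 1):
            tail_cost, tail_split = best_segmentation(frames[cut:], score, min_len)
            cost = score(frames[:cut]) + tail_cost
            if cost < best_cost:
                best_cost, best_split = cost, [frames[:cut]] + tail_split
        return best_cost, best_split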

258
Q

what is the problem of vocabulary size for speech recognition?

A

the larger the vocabulary the system has to keep in memory, the more words it has to search through - recognition takes longer and incorrect responses become more likely

259
Q

what is the challenge of variability in speech recognition?

A

any given word is pronounced differently by different speakers - early dictation programs were speaker-dependent; to be speaker-independent, a system must store samples from a huge variety of speakers

260
Q

in order to be effective a speech recognition program needs to be what?

A

adaptive

261
Q

do successful speech recognition programs use top-down or bottom-up processing?

A

top-down processing

262
Q

true or false:
the challenges for speech tech are the same challenges faced by human listeners

A

true - segmentation, variability due to speaker, variability due to context, and speech errors

263
Q

true or false:
a program can distinguish between two voices that it’s never heard before

A

false - it can’t

264
Q

how does speech synthesis work?

A

from a digital recording, sound is converted into a series of numbers representing the amplitude at each instant in each frequency band
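
a minimal sketch of producing that series of numbers from a recording via a short-time Fourier transform (frame and hop sizes are arbitrary illustrative choices, not from the cards):

    import numpy as np

    def band_amplitudes(samples, frame_len=512, hop=256):
        # slide a window along the recording; each windowed frame is one
        # "instant", and the FFT gives the amplitude in each frequency band
        window = np.hanning(frame_len)
        frames = []
        for start in range(0, len(samples) - frame_len + 1, hop):
            spectrum = np.fft.rfft(samples[start:start + frame_len] * window)
            frames.append(np.abs(spectrum))
        return np.array(frames)  # shape: (instants, frequency bands)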

265
Q

why do speech syntheses not sound like humans?

A

they typically get the intonation wrong, and do not accurately mimic the effects of coarticulation