Speech Perception Flashcards

1
Q

stages of speech processing

A

speech perception + speech comprehension

decode -> segment -> recognise -> integrate

decode:
auditory input -> select speech from the acoustic background + transform it into abstract representations

segment:
word recognition
-activation of lexical candidates
-competition
-retrieval of lexical information

recognition:
utterance interpretation
-syntactic analysis
-thematic processing

integration:
into discourse model

2
Q

phonetics

A

physical properties: loudness, duration, pitch

speech signal is distributed over time: rapidly changing, fast-fading

3
Q

the speech organs

A

air flows from the lungs, through the vocal tract and out of the mouth and nose

pressure fluctuations modulated by shape, constriction of vocal tract->sound waves

sources of sound:
-larynx: regular, periodic vibration of the vocal folds
-> characteristic pitch
-constriction of the vocal tract by lips, palate, tongue
-> phonemes

4
Q

phonetics of vowels

A

mouth as a resonating chamber

vowels: depend on position of tongue, especially up/down, front/back
-> changes shape of resonating chamber

5
Q

phonetics of consonants

A

created by constricting the vocal tract in different ways

phoneme perceived depends on:

  • place of articulation
  • manner of articulation
  • voicing (whether the vocal folds in the voice box vibrate)
6
Q

speech spectrogram

A

loudness of all frequencies over time: frequency (y-axis) x time (x-axis), with loudness shown as the third dimension (darkness)

formants: the dark bands on the spectrogram

  • prominent resonances
  • specific frequencies amplified by the shape of the mouth
7
Q

formant transitions

A
  • formant transitions over time, caused by constrictions of the vocal tract, produce different consonant phonemes
  • for stop consonants (e.g. b, d, g), place of articulation changes the relationship between the 2nd and 3rd formant transitions
8
Q

challenges in speech perception

A
  1. segmentation problem
  2. variability problem

9
Q

problem 1: speech is not segmented in separate words

A

written words have spaces between them to show boundaries

listening to our native language in clear conditions, we identify words despite the ambiguity, but the difficulty is revealed when:

  • listening to a foreign language
  • misunderstanding song lyrics (“mondegreens”)

segmentation is a major problem for understanding unfamiliar language and for automatic speech recognition systems
-> how do listeners overcome the ambiguity of the continuous speech stream in familiar language?

10
Q

“mondegreens”: why are misperceptions particularly common for song lyrics?

A
  • two signals: music, words
  • rhythm of music changes stress, durations
  • tune of music changes intonation, durations
  • articulation may be imprecise
  • pragmatics/semantics of lyrics/poems can be unexpected
11
Q

problem 2: speech is highly variable

A

Substantial variability in the pronunciation of phonemes, syllables and words, both between and within speakers

between speakers:
-gender, accent, language, age all affect acoustic properties of speech

within speakers:

1) linguistic:
- coarticulation: the same phoneme is articulated differently in different words, e.g. s in soon vs seen, because the mouth is already moving towards different vowels
e.g. a recording of leaf played backwards does not sound like feel (and vice versa)

2) non-linguistic:
- physical state, emotions

3) paralinguistic:
- speech rate: durations, precision of phonemes differs
- clarity: special effort to reach articulatory targets

12
Q

effects of audience

A

adult-directed, infant-directed, pet-directed speech

relationship between 1st and 2nd formants -> defines vowels

vowels much more differentiated in speech to infants (hyperarticulation)

13
Q

how do people solve the problems

A

acoustic information (bottom-up):

  • categorical perception
  • prosody
  • lexical stress

information in long-term memory (top-down):
-context effects: phonotactic, lexical, sentence

multi-modal information:
-lip movements

14
Q

bottom-up processing: categorical perception

A
Listening to speech, we hear distinct phonemic categories even though the acoustic changes are gradual, i.e. the speech signal is perceptually categorised into phonemes despite variability in the actual acoustic signal, especially for consonants

This efficient, automatic categorisation into phonemes reduces sensitivity to ambiguity caused by variability

Continuous changes to the 2nd and 3rd formants of synthetic speech yield ‘categorical’ changes in perception from /ba/ to /da/ to /ga/

15
Q

prosody (melody, ups and downs) and stress

A

The melody of spoken language

Intonation contour: patterns of rising/falling pitch that help to chunk speech into meaningful units (e.g. phrases, clauses) and convey aspects of meaning
e.g. rising pitch for questions

Rhythm of speech is influenced by the pattern of prominent vs non-prominent syllables (stress)
– In some languages stress is very regular (e.g. Spanish, Italian) ➔ strong cue to word boundaries for segmentation
– More variable in English – typically on the first syllable, but correlated with grammatical class
e.g. contract: CONtract (noun), conTRACT (verb)

16
Q

context effects (top-down)

A

Phonotactic rules (permissible sound sequences)

i.e. certain sounds can’t occur together within a word
- Combinations of sounds that do not occur within words help segmentation
e.g. [stw] in must win, [tnr] in hit ‘n run; this cue is used even by infants

Lexical knowledge

  • Ganong effect: the same ambiguous phoneme is heard differently in different word contexts (e.g. a sound in between /t/ and /d/ is heard as /t/ in ?ask (task) but as /d/ in ?ash (dash))
  • Interpretation of the same ambiguous sound depends on whether it makes a word, e.g. a sound between /k/ and /g/ is heard as /k/ in ?iss (kiss) but as /g/ in ?ift (gift)

Sentence context
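-Phonemic restoration effect (Warren & Warren, 1970)
e.g. when a phoneme in *eel is replaced by a cough, listeners still hear an intact “wheel”, “heel” or “meal” depending on the sentence context, and cannot correctly locate the cough; the effect is also found with tones/buzzes
BUT the effect disappears if the sound is replaced by silence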

17
Q

Multi-modal effects

A

Visual information influences perception of acoustic information
– The McGurk effect: e.g. an audio /ba/ paired with a video of lips articulating /ga/ is typically perceived as /da/

18
Q

hierarchy of cues

A
top-down (lexical):
-sentential context (pragmatics, syntax, semantics)
-lexical knowledge

bottom-up (sub-lexical):
-segmental: phonotactics, acoustic-phonetics
-metrical prosody: word stress

people prefer to use top-down lexical and semantic information to resolve ambiguities, but make use of lower level phonetic and stress cues in difficult situations

19
Q

summary

A

Listeners need to solve:
– Segmentation problem
– Variability problem

The sounds that make up spoken language are often ambiguous and vary between contexts and speakers

We identify the most likely phoneme/word using a combination of acoustic (bottom-up) information AND contextual (top-down) information

Top-down lexical information is used to resolve ambiguities in optimal conditions, but phonological and prosodic information is used when interpreting degraded or ambiguous acoustic information