Speech Perception & Sentence Comprehension Flashcards
Which of these is incorrect? There can be more than one answer.
Prosody is a general term that refers to the aspects of an utterance not specific to words themselves. It does several things:
A) Distinguishes words
B) Signals emotional content
C) Determines the semantic meaning of a word
D) Allows for sentence parting
Incorrect:
C) Determines the semantic meaning of a word
D) Allows for sentence parting
_____ is the study of speech sounds. It studies place and manner of articulation as well as voicing.
Phonetics
What are some challenges our brain has to overcome when it comes to spoken word recognition?
It must overcome quite a few challenges. The most notable is that there is no 1:1 relationship between a given waveform and a phoneme. There is also great speaker variability (gender, dialect, age, …), and there is the cocktail party effect: how do we distinguish background noise, speech, … from one another?
Researchers often use sound spectrograms in audio signal processing. The series of dark bands on a sound spectrogram are called _____. The first one signals the lowest frequency. The darker a band is, the more _____ it is. It also varies by speaker.
Researchers often use sound spectrograms in audio signal processing. The series of dark bands on a sound spectrogram are called formants. The first one signals the lowest frequency. The darker a band is, the more intense it is. It also varies by speaker.
Choose one or more correct answers.
Which aspects of formants are important for word recognition?
A) Frequency
B) Steady state
C) Transition
D) Reverberation
B) Steady state
C) Transition
Transition: movement of formants. They occur either at the beginning or the end of a syllable.
Steady state: the period between the transitions
→ Transitions correspond to the consonantal portion of the syllable and the steady state to the vowel
Example (‘seek’): compare the duration of ‘ee’ in isolation with its duration in a context like ‘seek’, where it is much shorter. In context, ‘ee’ does not show the same steady-state (flat) formant pattern.
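The transition/steady-state distinction can be illustrated with a small sketch. The formant values, the frame spacing, and the 30 Hz threshold below are made up for illustration; they are assumptions, not measurements from a real recording.

```python
def label_frames(track_hz, max_step_hz=30.0):
    """Label each frame of a formant track as 'transition' or 'steady'.

    A frame counts as 'steady' when the formant moved by no more than
    max_step_hz since the previous frame (the threshold is an assumption).
    """
    labels = []
    for i, freq in enumerate(track_hz):
        if i == 0:
            # The first frame has no predecessor, so compare it with the next one.
            step = abs(track_hz[1] - freq) if len(track_hz) > 1 else 0.0
        else:
            step = abs(freq - track_hz[i - 1])
        labels.append("steady" if step <= max_step_hz else "transition")
    return labels


# Hypothetical second-formant (F2) values in Hz for a consonant-vowel-consonant syllable.
f2_track = [1200, 1500, 1800, 2100, 2300, 2310, 2300, 2305, 2000, 1700]
labels = label_frames(f2_track)

for freq, label in zip(f2_track, labels):
    print(f"{freq:5d} Hz  {label}")
```

In this toy track the rapidly moving frames at the edges come out as transitions (the consonantal portions) and the flat stretch in the middle as the steady state (the vowel).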
Which of these is not correct?
There are three levels of word recognition processing:
A) Phonetic
B) Semantic
C) Phonological
D) Auditory
B) Semantic
Fill in the blanks.
There are two mechanisms for perceiving sound. The _____ perception is found most strongly with consonants, while the _____ perception is more commonly used for vowels. This is because with consonants, _____.
There are two mechanisms for perceiving sound. The categorical perception is found most strongly with consonants, while the gradual perception is more commonly used for vowels. This is because with consonants, listeners are unable to perceive within-category differences, while with vowels the change is more gradual.
Fill in.
The _____ is the point at which the vocal cords begin vibrating, measured relative to the release of the consonant.
voice onset time (VOT)
What is the advantage of having a categorical perception of speech sounds?
It allows us to ignore irrelevant information and to cope better with speaker variation (the same sound produced differently by different speakers).
We can manipulate the voice onset time in categorization tasks. Researchers often do this with /ta/ and /da/. For example, they play /da/ and gradually move towards /ta/, and the participants have to decide which sound they are hearing. At a voice onset time of about 30 ms, there is an uncertainty point. What is this uncertainty point? What does it show?
Experimenters manipulate the voice onset time, which is when the vocal cords begin to vibrate. With voiced consonants, as in /da/, the vocal cords start vibrating almost immediately; with voiceless ones, as in /ta/, vibration starts later. The uncertainty point is where participants give about a 50/50 response as to whether they are hearing da or ta; everywhere else they are very accurate at deciding which one they are hearing. This is because the way we recognise consonants is categorical, and the uncertainty point marks the shift from one category to the other.
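The shape of the responses in such a task can be sketched with a simple curve. The logistic form, the slope value, and the function name `prob_ta` are assumptions chosen for illustration; only the boundary of about 30 ms comes from the card above.

```python
import math


def prob_ta(vot_ms, boundary_ms=30.0, slope=0.5):
    """Probability of answering /ta/ for a given voice onset time.

    Modelled as a logistic curve (an assumption): near 0 well below the
    boundary, near 1 well above it, and exactly 0.5 at the boundary.
    """
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - boundary_ms)))


# Step through a hypothetical /da/-to-/ta/ continuum in 10 ms steps.
for vot in range(0, 61, 10):
    print(f"VOT {vot:2d} ms -> P(/ta/) = {prob_ta(vot):.2f}")
```

The steep jump around the boundary is what categorical perception looks like in the data: stimuli on the same side of the boundary get almost identical responses, even though their voice onset times differ.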
What is the lack of invariants problem?
The lack of invariants problem is that there is no 1:1 mapping between the phonemes of a language and their acoustic instantiations. A given phoneme may have a number of different waveforms, but our brains are very good at distinguishing one phoneme from another regardless of this.
Due to the lack of invariants problem, two theories cropped up to try and solve it. Name the theories, who proposed them and compare them - how do they compare, what are some strong points, what are some problems, …?
The theories are the auditory theory by Stevens and the motor theory by Liberman. Stevens states that the variability is normalized by our systems at the auditory stage. Liberman’s theory, on the other hand, states that there is no variability in the way we move our mouth and use our vocal tract. As such, listening is actually a process of subvocalisation (the listener silently articulates the sound s/he is hearing). The link between articulation and perception is much more direct than the link between acoustic structure and perception. Stevens’s theory is a more general model, which can be applied to all sensory modalities and focuses on perceptual normalization, while Liberman’s is more language-specific. Additionally, only Liberman’s motor theory of speech addresses the McGurk effect (listeners see ga, hear ba, report da), as it is the only theory in which motor movement is relevant.
While Liberman’s theory circumvents the problem of finding acoustic invariants for sounds, minimizes the mechanisms used in production and perception, and explains the McGurk effect, it has some problems. The first is allophonic variation: listeners are bad at discriminating allophones but good at producing distinct ones. The second is that infants cannot produce much but can perceive many contrasts. Stevens’s theory also has problems: it does not adequately address acoustic variability, which involves more than just a power law transformation of intensity.
Fill in.
Syntactic parsing obeys the _____ principle, which states that we make decisions immediately as we encounter each word.
Syntactic parsing obeys the immediacy principle, which states that we make decisions immediately as we encounter each word.
Answer and explain.
Linguistic elements have two syntactic levels. What are the levels?
Linguistic elements have:
* a deep level (syntax)
* a surface level (what we see/hear)
Describe incremental structure building
We process sentences bit by bit, building a structure as we go. The first two things we hear form a constituent, which is the background against which you evaluate new information. For example, we have the constituent ‘the actor’; when a new element arrives, the first constituent is destroyed and a new, bigger one is formed. This happens because whenever we encounter things in language, we have to work out the thematic roles. Language comprehension is exactly this: ‘who does what to whom’. The process is repeated until the sentence is complete.
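The step-by-step growth of the structure can be sketched as a toy procedure. This is only an illustration under strong assumptions: it attaches every new word to the right of one growing constituent, whereas real sentence structure is hierarchical in more complex ways. The function name and the example sentence (beyond ‘the actor’) are made up.

```python
def build_incrementally(words):
    """Fold words into one growing constituent, left to right.

    Returns the list of intermediate structures, one per word heard.
    """
    steps = []
    current = None
    for word in words:
        # Each new word is integrated immediately: the old constituent is
        # replaced by a bigger one that contains it plus the new word.
        current = word if current is None else (current, word)
        steps.append(current)
    return steps


# Hypothetical sentence; each printed line is the structure after one more word.
for step in build_incrementally(["the", "actor", "thanked", "the", "audience"]):
    print(step)
```

Each line of output contains the previous structure nested inside a larger one, which mirrors the idea that the old constituent is replaced by a new, bigger one until the sentence ends.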