lecture 3 - speech and language Flashcards
language
System of visual/vocal symbols that have meaning to user and recipient
use of language
- Communication is one of the most important human behaviours - language has evolved through social contacts among our early ancestors
- Speaking and writing are social behaviours - we learn them from other people and use them to communicate with them
- An effective language system also abides by certain rules
- Harley 2012 - language can be characterised as a system of visual and/or vocal symbols which have meaning to the user and the recipient
- Around 6,000 distinct languages in the world - the world’s largest language is Chinese and the most popular foreign language is English.
Language allows us to consider complex and abstract issues by encoding them in words and then manipulating the words according to specific rules. These rules are the subject of an area of study called linguistics
linguistics
Study of the rules of language
psycholinguistics
Study of the role of cognition in language acquisition, production and comprehension including how verbal behaviour develops
How we turn ideas into air and back again
- The study of linguistics involves determining the ‘rules’ of language and the nature and meaning of written and spoken language, whereas psycholinguistics, the study of verbal behaviour, examines the role of human cognition in language acquisition and comprehension - the integration of psychology and linguistics
- Psycholinguists are interested in how we acquire language, how verbal behaviour develops and how we learn to speak from our interactions with others. They are interested in the interaction between the structure and processing of language
- Psycholinguistics is relatively recent as a discipline, but language has been studied since the early days of experimental psychology, eg by Wundt, often called the father of experimental psychology, who argued that the sentence was the most basic element of speech production and comprehension. Speech production involved the transformation of thought processes into sequences of speech segments; comprehension was the reverse process. The linguist Hermann Paul argued that words, not sentences, were the building blocks of speech
- In the 1920s and 30s Wundt’s form of psychology was challenged by behaviourism, which argued that psychology should concern itself only with observable behaviour. In the 1950s psychology showed renewed interest in the nature of language, spurred by the linguist Chomsky
perception of speech
- Speech is the production of a series of sounds in a continuous stream, punctuated by pauses and modulated by stress and changes in pitch.
- Speech is a more flexible means of communication than writing as the sentences we utter are a string of sounds, some of which are emphasised and some are quickly glided over.
- We can raise pitch of our voice when uttering some words and lower it when speaking others.
- We maintain a regular rhythmic pattern of stress.
- We pause at appropriate times eg between phrases but we do not pause after pronouncing each word
Speech doesn’t come to us as a series of individual words; we must extract the words from a stream of speech.
recognition of speech sounds
- Human auditory system performs complex task of enabling us to recognise speech sounds
- Sound system of speech is phonology
- These sounds vary according to the sounds that precede and follow them, the speaker’s accent and the stress placed on the syllables in which they occur
- Phonemes are the elements of speech - the smallest units of sound that contribute to the meaning of a word eg pin has three phonemes: /p/, /i/ and /n/
- Phonemes are not the same as letters eg ship has 4 letters but three phonemes: /sh/, /i/ and /p/
- In linguistics phonemes are written between two forward slashes to indicate they are phonemes not letters
Identifying phonemes is the first step in recognising speech sounds
communication model - turning ideas into air and back again
An agent produces speech, a recipient receives it, and the recipient’s cognition carries out speech perception
Speech production and perception - making consonants and vowels
Speech is produced by a set of muscles in face, mouth and throat
diagram in notes
Changes in air, changes in meaning - Phonemic (or phonetic) contrasts
- Phoneme: smallest unit of speech sound (≠ letters)
- Pin: /p/ + /i/ + /n/
- Ship: /sh/ + /i/ + /p/
- Group of phonemes: smallest unit of speech that influences meaning
- Bet -> Bit
- Dig -> Gig
- Big -> Pig
Not all changes in sound change meaning
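The contrasts above can be sketched in code. This is only an illustration: the transcriptions are hand-written assumptions in the same informal notation as the cards, and `WORDS` and `is_minimal_pair` are hypothetical names, not part of the lecture.

```python
# Hand-written, informal transcriptions (assumed for illustration only).
WORDS = {
    "pin":  ["p", "i", "n"],
    "ship": ["sh", "i", "p"],
    "bet":  ["b", "e", "t"],
    "bit":  ["b", "i", "t"],
    "dig":  ["d", "i", "g"],
    "gig":  ["g", "i", "g"],
    "big":  ["b", "i", "g"],
    "pig":  ["p", "i", "g"],
}

def is_minimal_pair(word_a, word_b):
    """True if both words have the same number of phonemes
    and differ in exactly one phoneme position."""
    a, b = WORDS[word_a], WORDS[word_b]
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1

# Letters and phonemes are counted separately: 4 letters, 3 phonemes.
print(len("ship"), len(WORDS["ship"]))

for pair in [("bet", "bit"), ("dig", "gig"), ("big", "pig"), ("pin", "gig")]:
    print(pair, is_minimal_pair(*pair))
```

Swapping a single phoneme is enough to change the word, which is what makes the three pairs on the card phonemic contrasts.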
making consonants (VPM)
voice - whether/when the vocal cords vibrate
Voice
“zip”/“sip”
“bat”/“pat”
“dip”/“tip”
place - where in the vocal tract the constriction takes place
Place
“pat”/“tat”/“cat”
“bot”/“dot”/“got”
manner - how the air moves out of the vocal tract/ what sort of constriction takes place
Manner
“nose”/“doze” (nasal/stop)
“dip”/“zip” (stop/fricative)
NB in physical (acoustic) terms, these dimensions are typically continuous, not either/or
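A rough sketch of the VPM idea: each consonant is described by three features, and two consonants contrast on whichever features differ. The feature values are the standard textbook descriptions of these English consonants, treated here as either/or labels even though the acoustics are continuous; `Consonant`, `CONSONANTS` and `differing_dimensions` are hypothetical names.

```python
from typing import NamedTuple

class Consonant(NamedTuple):
    voice: str    # "voiced" or "voiceless"
    place: str    # where the constriction is made
    manner: str   # what sort of constriction it is

# Simplified feature table for the consonants used in the examples above.
CONSONANTS = {
    "p": Consonant("voiceless", "bilabial", "stop"),
    "b": Consonant("voiced",    "bilabial", "stop"),
    "t": Consonant("voiceless", "alveolar", "stop"),
    "d": Consonant("voiced",    "alveolar", "stop"),
    "k": Consonant("voiceless", "velar",    "stop"),
    "g": Consonant("voiced",    "velar",    "stop"),
    "s": Consonant("voiceless", "alveolar", "fricative"),
    "z": Consonant("voiced",    "alveolar", "fricative"),
    "n": Consonant("voiced",    "alveolar", "nasal"),
}

def differing_dimensions(a, b):
    """Return the names of the VPM dimensions on which two consonants differ."""
    x, y = CONSONANTS[a], CONSONANTS[b]
    return [name for name in Consonant._fields if getattr(x, name) != getattr(y, name)]

print(differing_dimensions("z", "s"))   # zip / sip
print(differing_dimensions("p", "t"))   # pat / tat
print(differing_dimensions("n", "d"))   # nose / doze
print(differing_dimensions("d", "z"))   # dip / zip
```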
categorical perception of phonemes
- ‘Pa’ and ‘Ba’ differ in Voice Onset Time (VOT)
- VOT refers to the delay between the start of a speech sound and the onset of vibration of the vocal cords, i.e. when the lips open relative to when the vocal cords start vibrating
‘Pa’ VOT tends to be about 50 ms longer than ‘Ba’
So what do we perceive as we gradually change VOT?
There is huge variability in the actual acoustic signal – due to differences in anatomy and context – so perception groups sounds together into categories
Hypothetically we would expect a gradual shift from ‘ba’ to ‘pa’
and adults should be able to discriminate between each VOT
In fact perception is categorical: there is an abrupt shift, typically at about 20-25 ms
consequences of categorical perception
We’re good at perceiving changes across category boundaries
We’re bad at perceiving changes within category boundaries
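A toy model of why changes across the boundary are easy to hear and changes within a category are not. It assumes a logistic identification curve with its boundary at 22.5 ms (the midpoint of the 20-25 ms range above) and an arbitrary steepness. These numbers and the function names are assumptions for illustration, not fitted to any data.

```python
import math

BOUNDARY_MS = 22.5   # assumed: midpoint of the 20-25 ms range
SLOPE_MS = 2.0       # assumed: smaller values give a more abrupt shift

def p_pa(vot_ms):
    """Assumed probability of labelling a sound 'pa' at a given VOT."""
    return 1.0 / (1.0 + math.exp(-(vot_ms - BOUNDARY_MS) / SLOPE_MS))

def label(vot_ms):
    """Category reported for a given VOT under this toy model."""
    return "pa" if p_pa(vot_ms) >= 0.5 else "ba"

def predicted_discriminability(vot_a, vot_b):
    """If listeners mostly compare labels, two sounds are only as
    discriminable as their labelling probabilities are different."""
    return abs(p_pa(vot_a) - p_pa(vot_b))

# The same 15 ms step, placed across vs within a category.
across = predicted_discriminability(15, 30)   # straddles the boundary
within = predicted_discriminability(40, 55)   # both well inside 'pa'
print(label(15), label(30), round(across, 3))
print(label(40), label(55), round(within, 3))
```

The physical difference is identical in both comparisons, but only the pair that straddles the boundary is predicted to sound different.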
production of speech
- Lisker and Abramson 1970 presented ptps with a series of computer generated sounds consisting of a puff of air followed by an ah. The sounds varied in only one way, the amount of time between the puff and the ah. When we speak we make a puff for pa but not for ba. However, even though the computer always produced a puff, ptps reported that they heard ba when the delay was short and pa when it was long. Ptps discriminated between the phonemes /p/ and /b/ strictly according to the delay in voicing. The experiment demonstrates that the auditory system is capable of detecting very subtle differences.
- The fundamental unit of speech, logically and descriptively, is the phoneme, but research suggests that psychologically the fundamental unit is larger eg the two syllables doo and dee each consist of two phonemes. When spoken, the same phoneme /d/ is heard at the beginning of each. However, when Liberman et al 1967 analysed the sounds of the syllables they found that the beginning phonemes were not the same. In fact they could not cut out a section of a tape recording of the two syllables that would sound like /d/.
- The results suggest that the fundamental unit of speech consists of groups of phonemes such as syllables.
The perception of a phoneme is affected by the sounds that follow it (Ganong 1980). Using a computer to synthesise a novel sound that fell between those of the phonemes /g/ and /k/, Ganong reported that when the sound was followed by ift the ptps heard the word gift but when followed by iss they heard kiss. These results suggest that we recognise speech sounds in pieces larger than individual phonemes.
making vowels
Height
Vertical position of tongue in the mouth
Backness
How far back in the mouth the tongue is.
/i/ (“ee”) – front
/u/ (“oo”) – back
Roundedness
Shape of the lips.
Correlated with tongue position in many languages
formants
= distinctive frequency components/peaks of the acoustic signal that we need to distinguish vowels
F0 – fundamental frequency – frequency of vibration of the vocal cords
First two formants F1 and F2 are usually sufficient to identify a vowel; each vowel has a characteristic combination of F1 and F2
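A minimal sketch of identifying a vowel from F1 and F2 alone, by picking the nearest reference point in F1/F2 space. The reference values are rough, illustrative figures of the order typically reported for adult male English speakers. They are assumptions, not measurements from the lecture, and `VOWEL_FORMANTS_HZ` and `nearest_vowel` are hypothetical names.

```python
import math

# Approximate (F1, F2) in Hz; illustrative values only.
VOWEL_FORMANTS_HZ = {
    "i (ee)": (270, 2290),   # high front vowel: low F1, high F2
    "u (oo)": (300, 870),    # high back vowel: low F1, low F2
    "a (ah)": (730, 1090),   # low back vowel: high F1
}

def nearest_vowel(f1_hz, f2_hz):
    """Return the reference vowel whose (F1, F2) point is closest
    (Euclidean distance) to the measured pair."""
    return min(
        VOWEL_FORMANTS_HZ,
        key=lambda v: math.dist((f1_hz, f2_hz), VOWEL_FORMANTS_HZ[v]),
    )

print(nearest_vowel(280, 2200))
print(nearest_vowel(320, 900))
print(nearest_vowel(700, 1100))
```

In this table "ee" and "oo" have almost the same F1 and are separated mainly by F2, which is consistent with the front/back contrast on the vowel card.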
speech perception
The role of formants (peaks of the acoustic signal) in perception
What else contributes to how we perceive speech?
Is it just the sound?
the McGurk Effect
McGurk effect is an example of how one sensory modality (vision) can influence another (hearing)
Speech perception is a multi-modal process; based not only on auditory but also on visual cues
you hear things that aren’t there
Word recognition - the importance of context
Computer generated novel sounds that fell between /g/ and /k/ (Ganong, 1980)
When followed by “ift” -> “gift”
When followed by “iss” -> “kiss”
Speech is full of hesitations, sloppy word productions etc.
47% recognition of isolated words vs 100% recognition within context of original conversation (Pollack & Pickett, 1964)
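One way to picture the gift/kiss result is as acoustic evidence combined with a bias towards real words. The sketch below is a toy model and not Ganong's analysis: the two-word lexicon, the `LEXICAL_BOOST` weighting and the function name are all assumptions made for illustration.

```python
LEXICON = {"gift", "kiss"}   # toy lexicon (assumed)
LEXICAL_BOOST = 4.0          # assumed extra weight for real words

def perceive(p_g_acoustic, rest):
    """Choose between 'g' + rest and 'k' + rest.

    p_g_acoustic: how strongly the sound itself favours /g/ (0 to 1);
                  0.5 means fully ambiguous between /g/ and /k/.
    rest:         the letters that follow the ambiguous sound.
    """
    g_word, k_word = "g" + rest, "k" + rest
    score_g = p_g_acoustic * (LEXICAL_BOOST if g_word in LEXICON else 1.0)
    score_k = (1.0 - p_g_acoustic) * (LEXICAL_BOOST if k_word in LEXICON else 1.0)
    return g_word if score_g >= score_k else k_word

# The same ambiguous sound is heard differently depending on what follows.
print(perceive(0.5, "ift"))
print(perceive(0.5, "iss"))

# A clear /g/ is still heard as /g/ even though "giss" is not a word.
print(perceive(0.95, "iss"))
```

In this sketch the context only tips the balance when the sound itself is close to ambiguous, which is the situation the synthesised /g/-/k/ sounds were designed to create.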
Brief recap - ideas -> air -> ideas
- Producing speech involves movement
- fine control of the vocal tract to shape the sound wave to convey particular meaning
- Perceiving speech involves ‘reconstructing’ the meaning from the sound wave
- range of sources
○ key clues (but they’re only ‘clues’) in the wave
○ knowledge of language
○ Contextual cues
○ multimodal sources of information including visual cues
procedures to investigate speech perception in infants
1) High amplitude sucking - rate of sucking increases when new sound is detected, then slows down again if sound is repeated
2) Head turn preference - If infants turn their head and listen for longer (or shorter) to one type of stimulus compared to another, then they must be able to perceptually distinguish them (requires being able to hold head up)
3) Preferential looking - If infants look for longer (or shorter) at one type of stimulus compared to another whilst hearing the name of the stimulus (eg Dad), then they must recognise the name
Language learning
- Using the high amplitude sucking procedure, infants as young as 1 month old could tell the difference between consonants ‘ba’ and ‘pa’ (Eimas et al., 1971).
- However, phonemes vary between different languages. E.g. ‘ba’ and ‘pa’ are common to English but are less common in other languages.
So is this ability to distinguish between phonemes language specific?
language learning - categorical perception
Categorical perception
Phonemic categories are language specific
English versus Hindi
/d/ (alveolar) - /D/ (retroflex)
/t/ (alveolar) - /T/ (retroflex)
English versus Japanese
/l/ (alveolar) - /r/ (retroflex)
Infants are sensitive to all categorical boundaries in the first 6 months
By 12 months, they become sensitive only to their native categories