Language Flashcards
What is a phoneme?
The smallest unit of sound in a language that is capable of conveying a distinction in meaning. e.g. bat vs pat
Consonants
Defined by features that describe how the airflow is obstructed and how that obstruction shapes the sound.
- Voicing: when do vocal cords begin to vibrate?
- Place: where is the obstruction being made?
- Manner: how is the air passed through?
Voicing
Voiced and voiceless phonemes differ in voice onset time (VOT): the time it takes for the vocal cords to start vibrating.
Vowels
Defined by features that affect the shape of the mouth and where the vowel is pronounced
- Tongue position: how high and how far forward is the tongue, i.e. how big is the space in the mouth?
- Lip posture: are the lips rounded or unrounded?
Language differences
Different languages have different phoneme inventories: English has about 40 phonemes, while some Polynesian languages have as few as 11.
Categorical perception
Consonant phoneme boundaries are perceived categorically. As a result of this categorical perception, native speakers of one language can’t hear the phonemes of other languages very well (unless they happen to be the same)
How are phonemes learned? Developmental trajectory
0-4 months
- infants can discriminate phonetic contrasts in all languages
- Around 3 months, they start to babble (mainly vowels at this point)
4-8 months
- Infants begin to learn the speech sounds of their language
- babbling becomes more speech-like (‘da-da’)
8-12 months
- lose the ability to perceive sound distinctions that are not in their language
- babble sounds appropriate to their language
Phoneme learning is just category learning!
We can therefore test whether we use the same mechanisms and representations to learn phoneme categories as any other kind of category.
is there reason to believe that the distribution of sounds in the input affects what phonemes people learn?
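As a toy illustration of how this could work (a minimal sketch with made-up VOT values, not data from any study): if the VOT values a learner hears are bimodally distributed, a simple clustering procedure can recover two categories from the raw values alone.

```python
# Toy sketch: recovering two phoneme categories (e.g. /b/ vs /p/) from the
# distribution of voice onset times (VOT) alone, via 1-D k-means.
# All numbers are invented for illustration.
import random

random.seed(0)

# Bimodal input: short-lag VOTs (~10 ms, /b/-like) and long-lag VOTs (~60 ms, /p/-like)
vots = [random.gauss(10, 5) for _ in range(100)] + \
       [random.gauss(60, 8) for _ in range(100)]
random.shuffle(vots)

# Simple 1-D k-means with k = 2: assign each VOT to the nearer centre, update centres
centres = [min(vots), max(vots)]
for _ in range(20):
    clusters = [[], []]
    for v in vots:
        clusters[0 if abs(v - centres[0]) < abs(v - centres[1]) else 1].append(v)
    centres = [sum(c) / len(c) for c in clusters]

print("Recovered category centres (ms):", [round(c, 1) for c in centres])
# The two centres land near 10 ms and 60 ms, i.e. the two 'categories' in the input.
```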
Empirical evidence: training study
How does phoneme learning link to other aspects of language learning?
- Training studies: teaching adults new phonemes can help them learn words with those phonemes
Results: training adults on a new phoneme contrast improved their ability to learn words with that phoneme contrast.
- Longitudinal experiments: infants who are better at hearing phonemes have better vocabularies later on.
6 months: infants trained to hear new phonetic contrast. Measure: trials to criterion=speed of habituation (faster=better learning)
13 months: parents filled out vocabulary list
CDI = communicative development inventory (measure of vocab size)
RESULTS: phoneme learning at 6 months predicts later vocabulary size
The problem of word segmentation
The 'spaces' between words can't be heard; fluent speech is continuous.
Possible solutions
- Language-specific phonotactic constraints: restrictions on which sequences of sounds are permissible in that language (Which is more likely to be a word in English? Thipe? Ndimi?)
- Language-specific prosodic constraints: which stress patterns are common (How should I pronounce this word?)
Problems with phonotactics and prosody alone?
- They only get you so far - the constraints aren't sufficient to produce a good full segmentation on their own
- For both, you need to know something about what things are words before you can use them
Another idea: Transition probabilities
Probably used in conjunction with these other cues
- this is a type of statistical or distributional learning (i.e. it is based on observing the statistical distribution of things)
High transitional probability between syllables - probably within a word; low transitional probability - probably a word boundary
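A minimal sketch of the computation, using invented syllables in the spirit of these artificial-language studies: TP(y | x) = count(xy) / count(x), and low TPs suggest a word boundary.

```python
# Minimal sketch: transitional probability TP(y | x) = count(xy) / count(x)
# over a syllable stream. The words and syllables are invented examples.
import random
from collections import Counter

random.seed(1)
words = ["tupiro", "golabu", "padoti"]            # hypothetical "words"
stream = [random.choice(words) for _ in range(300)]

syllables = []
for w in stream:
    syllables += [w[i:i+2] for i in range(0, len(w), 2)]

unigrams = Counter(syllables)
bigrams = Counter(zip(syllables, syllables[1:]))

def tp(x, y):
    """TP(y | x) = count(xy) / count(x)."""
    return bigrams[(x, y)] / unigrams[x]

print(round(tp("tu", "pi"), 2))   # within a word  -> 1.0
print(round(tp("ro", "go"), 2))   # across a word boundary -> roughly 1/3
```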
An empirical test
Can people segment words in an artificial language simply on the basis of transition probabilities?
- test by seeing if they recognise the difference between partial words and non-words
- habituate infants to a long stream of this speech. After they get bored, play either a speech stream containing partial words like ‘daku’ or non-words like ‘kupa’
- Infants listened longer (indicating surprise) to the non-words than the partial words
One problem: the partial words and non-words differed not just in transitional probabilities, but also in frequency.
Idea for empirical test
Match for frequency by making a language with words of varying frequency
Results - infants listened longer (indicating surprise) to partial words than the words
Transitional probabilities seem fairly effective! Leads to several questions, though…
- how well does this scale to real language? (i.e. how much of the word segmentation problem do TPs solve?)
- what kinds of things can people do this sort of statistical learning over?
How well does this scale? 2 ways to answer this
- experiment with people - use languages that are less artificial
- computational model - take a program that can calculate TPs, give it a corpus of typical data and see how many words it segments correctly
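A toy version of the second route (my own sketch; the mini-lexicon and the 0.6 threshold are arbitrary assumptions): generate an unsegmented stream, insert a boundary wherever the TP drops, and score the proposed words against the lexicon that generated the stream.

```python
# Sketch of a TP-based segmenter evaluated against the words that generated the stream.
import random
from collections import Counter

random.seed(2)
lexicon = ["tupiro", "golabu", "padoti", "bidaku"]        # hypothetical words
stream_words = [random.choice(lexicon) for _ in range(500)]
sylls = [w[i:i+2] for w in stream_words for i in range(0, len(w), 2)]

uni = Counter(sylls)
bi = Counter(zip(sylls, sylls[1:]))
tp = lambda x, y: bi[(x, y)] / uni[x]

# Segment: start a new word whenever the TP into the next syllable is low.
threshold = 0.6
segments, current = [], [sylls[0]]
for x, y in zip(sylls, sylls[1:]):
    if tp(x, y) < threshold:
        segments.append("".join(current))
        current = []
    current.append(y)
segments.append("".join(current))

correct = sum(s in lexicon for s in segments)
print(f"{correct}/{len(segments)} proposed words match the generating lexicon")
```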
What does TP learning apply to?
1. Can we learn them if it's not about language?
2. Can we learn them if it's not over adjacent units?
TP summary
TPs are probably very useful for word segmentation
- people (even infants) can track them
- Scales to real language moderately well
- computational algorithms using them segment okay
- not a language-specific skill
How does word segmentation link to other aspects of language learning?
- learning words
- learning lexical categories (e.g. nouns, verbs etc)
- improved efficiency/speed of processing
word segmentation –> word learning
it is easier for people to learn words that they have previously learned how to segment from rapid speech
Why is word learning difficult?
Many possible meanings for every single word. e.g. ‘quidditch’ - weird armour stuff, sitting on a broom, the game he’s playing, being a wizard, suspended in air..
Children do quite well at word learning
The first words generally appear between 8 and 14 months, but there is tremendous individual variability
After the first words there is generally a vocabulary spurt, characterised by faster learning.
This can occur between 14 and 24 months, depending on the child
Word learning continues to accelerate throughout childhood and slows down in adulthood
Age of acquisition effects
Interestingly, the age at which a word was learned has effects later in life. It impacts:
- speed of lexical retrieval
- lexical decision tasks
- word familiarity
Learning words quickly
The fast rate of word learning in later life is probably in part due to people’s capacities for fast mapping: learning the correct referent of a word after only one or two labellings
How do people learn words?
A bias of some sort is necessary to solve the word-learning problem.
Possible biases:
1. Labels are ‘special’ to people (especially infants) in some way
2. it’s not that labels are special; rather people can learn over time about how labels are used
Are labels special to infants?
Even if labels do appear to be special to infants, why would that be?
- Auditory (or verbal) input is more interesting/easier to encode or remember
- They know that labels are for communication, and that is special.
Do infants/children have a preference for auditory input?
Experiment: prediction task in which the visual and auditory cues predict different things. Which do people use?
Which side do people think the puppet comes out of? (visual predicts right, auditory predicts left)
Results - children prefer to make predictions on the basis of the auditory cue, at least if the visual cue is complex
Results: problems
- these auditory cues were not language
- doesn’t explain previous experiment
- might be simple stimulus complexity, not visual/auditory (but then, maybe that describes the real world)
- Preference in a prediction task doesn’t necessarily map onto word learning
therefore, we can’t really draw conclusions about whether labels are special because they are auditory
Are labels special to infants: 2. They know that labels are for communication, and that is special
How would the fact that labels are for communication matter?
- May be a lot more interested in it; it really matters, even (or especially) in a child’s world
- May guide what kinds of inferences about categories make sense (e.g. knowing people want to be clear and informative)
Infants use social cues
Infants only learn labels if the speaker is looking at the object
Children do not learn labels if the speaker doesn’t seem to know what’s going on
15-month-olds only learn certain labels if they are presented by a live speaker, rather than by a voice recorder
Mutual exclusivity
e.g. Which one of these is a greebo?
Some have suggested that this indicates that children have an innate bias to assume that each object has only one label.
Problem: many objects have multiple labels- e.g. cat, kitten, pet, animal
Another idea: mutual exclusivity arises out of the logic of conversation - ‘I know the person knows the word for shoe, so why would they call it a greebo? Therefore a greebo must be the other thing.’
Bias possibility 2: it's not that labels are special; rather, people can learn over time about how labels are used, e.g. the shape bias
The Shape Bias: The assumption that labels for objects classify by shape rather than colour or texture.
May be learned based on statistical associations between words and features of the categories they pick out.
Another example of statistical learning
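A toy sketch of that statistical story (the objects and features below are invented): if, across objects sharing a label, shape is far more consistent than colour, a learner tracking label-feature co-occurrences could acquire the shape bias from the statistics alone.

```python
# Toy sketch: labels co-vary with shape more reliably than with colour.
from collections import defaultdict

# (label, shape, colour) observations -- hypothetical data
observations = [
    ("ball", "round", "red"), ("ball", "round", "blue"), ("ball", "round", "green"),
    ("cup", "cylindrical", "white"), ("cup", "cylindrical", "red"),
    ("block", "cubic", "blue"), ("block", "cubic", "yellow"),
]

def consistency(feature_index):
    """Average probability of the most common feature value within each label."""
    by_label = defaultdict(list)
    for obs in observations:
        by_label[obs[0]].append(obs[feature_index])
    scores = [max(vals.count(v) for v in set(vals)) / len(vals)
              for vals in by_label.values()]
    return sum(scores) / len(scores)

print("shape consistency:", round(consistency(1), 2))    # 1.0 -- labels pick out shapes
print("colour consistency:", round(consistency(2), 2))   # well below 1.0
```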
Links between word learning and other aspects of language
1. word learning and sentence processing
2. word learning and grammar
Sentence processing in infants
“where’s the baby?”
If they were looking at the dog, the time taken to shift their gaze to the baby is an indication of their linguistic processing speed
Higher vocabulary is associated with faster sentence processing
Grammar in young children
Higher vocabulary is associated with more grammatical complexity
Psychological differences between open-class (content) and closed-class (function) words
- they are disrupted differently in different kinds of aphasia (brain damage)
- children’s first words are almost always mostly open class (mummy, doggie, cookie, want, up)
- closed class words are the ones that second-language learners have the most trouble with
Parts of speech are associated with different roles in a sentence
Subject - generally denotes the actor or the agent of the action
verb - the action
object - generally denotes the non-actor things involved in the action (often called patients)
these roles, e.g. subject, object, are often called arguments
Many languages have a default word order
English: primary word order is Subject-Verb-Object (SVO) ‘I eat broccoli’
Secondary word order is Object-Verb-Subject (OVS) ‘my homework was eaten by the dog’
Verbs govern the grammar
Most linguists agree that, regardless of the word order, verbs in every language are the ‘heads’ of the sentence - they are in charge! They determine what the arguments are; ‘like’, for example, requires both a subject and an object.
‘Bob likes money.’ Subject: ‘like’ requires someone (Bob) to do the liking. Object: ‘like’ requires something that is liked (money).
The problem of overgeneralisation
A.k.a. the problem of no negative evidence/the logical problem of language acquisition
- different verbs take different arguments; how are these learned?
idea 1: mimicry
The problem with mimicry
It doesn't seem to explain people's behaviour: people produce forms they have never heard. They don't just generalise arbitrarily, though.
Verb argument patterns
There are patterns in which kinds of arguments a verb can take
how do people learn the exceptions?
It can't simply be that people just don't say things they haven't heard… this is the problem of overgeneralisation
How do you figure out which words you haven’t heard are allowable?
One way to do this is to receive negative evidence: information about what isn’t allowed
Negative evidence
Do children receive negative evidence when they are language learning?
- No, kids seem not to receive (or not to attend to, when they do receive it) much negative evidence.
- Adults tend to correct only the truth of a child's utterance, not the syntax: ‘Mama isn't boy, he a girl.’ ‘That's right.’
- When adults (rarely) try to correct a child's syntax, the kid doesn't get it.
Maybe they are sensitive to more subtle kinds of negative evidence?
For instance, maybe there are statistically different rates of ‘rephrasing’ when the child says something ungrammatical.
It's hard to tell whether the child is actually attentive to evidence this subtle - or, even if they were, whether it would be sufficient to solve the overgeneralisation problem.
It also wouldn’t explain why, when people do over-generalise completely novel items, they don’t do so arbitrarily
Semantic Bootstrapping
Maybe people notice semantic regularities among the verbs that can take certain arguments, and generalise according to those
This would explain why people don’t over-generalise arbitrarily; indeed, people do seem to take semantics into account
problem: people have not succeeded in coming up with what those subtle meaning differences might be.
Statistical learning?
Maybe people take note of how often they have failed to hear something (relative to the number of times they could have heard it, if it were grammatical) and, after a while, take that absence as evidence that it is ungrammatical.
this is called implicit negative evidence
Implicit negative evidence
At first, when you've only heard a verb a few times, most argument patterns seem OK...
But once you've heard the verb many times, it begins to be suspicious that you haven't heard it with certain arguments
This predicts that you should be less likely to over-generalise verbs if they are very frequent
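A back-of-the-envelope sketch of this 'suspicious absence' logic (the 20% frame rate is an arbitrary assumption): if frame F would appear on some fraction of a verb's uses were it grammatical, the probability of never hearing it shrinks rapidly as the verb's frequency grows.

```python
# Sketch of implicit negative evidence as suspicious absence.
# If a verb allowed frame F, suppose F would appear on about 20% of its uses.
# Then the probability of *never* hearing F across N uses is (1 - 0.2) ** N.
p_frame = 0.2

for n_uses in [5, 20, 100]:
    p_absent_if_allowed = (1 - p_frame) ** n_uses
    print(f"heard verb {n_uses:>3} times, never in frame F: "
          f"P(absence | F allowed) = {p_absent_if_allowed:.4f}")

# For a rare verb the absence is unsurprising; for a frequent verb it becomes
# strong (implicit) evidence that the frame is ungrammatical -- which is why
# over-generalisation is predicted to be rarer for high-frequency verbs.
```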
How to explain non-arbitrary generalisation of completely novel verbs, like moop?
Statistical relationships based on:
1. Frequency of occurrence
2. semantic co-occurrence may also help
Summary: verb arguments
- Learning verb arguments is an example of the problem of overgeneralisation
- People do not simply mimic what they hear, and they overgeneralise in a non-arbitrary way
- People are aware of semantics, but semantic bootstrapping is probably not sufficient to explain how they overcome the problem
- Implicit negative evidence can capture this qualitatively and quantitatively
Does implicit negative evidence solve everything?
No, it still doesn’t explain how children know what arguments are in the first place, or how they know which of an infinite number of semantic features might be plausible.
but it does explain how, once you know the features, you can learn even without ever getting any negative evidence
Morphology
A morpheme is the smallest unit in a language that conveys meaning
The process of combining morphemes in order to convey meaning is called inflection, and the rules governing how this can be done constitute inflectional morphology
Usually, inflections are combined with a root or stem, which is the head of the resulting word in the same way that a verb is the head of a sentence.
Usually morphemes reflect different grammatical categories, although languages differ in what kinds of inflections exist or are obligatory
Kinds of grammatical categories
Tense: indicates when in time a situation took place (often overlaps with aspect, which indicates the ongoingness of an action, and mood, which indicates things like conditionality). Examples: present - walk; past - walked; future - will walk; perfect - have walked; progressive - is walking.
Verb tense
There are two kinds of past tense verbs in English: regular and irregular
Regular verbs
The majority of verbs are regular; in them, the past tense is formed by adding -ed to the stem of the verb: walk - walked, smile - smiled, push - pushed
New Verbs
English makes it easy to turn nouns and adjectives into verbs. As a result, we constantly have new verbs entering our language
Almost all new verbs can be put into past tense by adding -ed. This indicates that the past tense rule is productive and has some psychological reality.
irregular verbs
- May occur in clusters based on similarities in the sound of the stem:
sing - sang, ring - rang, shrink - shrank, drink - drank
drive - drove, ride - rode
blow - blew, grow - grew
- Others are more idiosyncratic: go - went, eat - ate, teach - taught
- These clusters are also psychologically real; people are uncertain about the past tense of verbs that sound like they should be in a cluster, but aren't
Acquisition of verbs- How do children acquire inflectional morphology of verbs?
General U-shaped curve of acquisition:
- Early on: few verbs are used, but most of them are used correctly
- Then, the past-tense rule is often over-generalised (e.g. ‘goed’ instead of ‘went’)
- Eventually, these mistakes are corrected
The rules vs. statistics debate
This debate is quite ill-defined.
- are the regularities learned represented as actual rules, or are they simply statistical patterns?
Rule-based - the representations are rules, and behaviour results from the mind applying the rule to the situation ‘When in situation A, do B.’
Statistical - the representations are simply statistical patterns which tend to be followed if the situation is right. ‘B responses tend to occur in situation A.’
Rule-based/symbolic models
These tend to be (deterministic) computer programs. The idea is that the brain is acting, metaphorically, like a computer.
input–> processing –> output.
Importantly, this approach theorises that - even if the brain doesn’t actually have little rules floating around in it - the best approximation to its behaviour is that it is a ‘cognitive machine’ that acts as if it implements such rules
Statistical/non-symbolic models
These are also computer programs, but the most common kind of non-symbolic model is one modelled on the structure of the brain. These models are called connectionist models.
How connectionists systems work
Input nodes –> hidden nodes –> output nodes.
Nodes are like neurons (or groups of neurons)
Information enters the input nodes, then passes along the links, which changes the later nodes and affects the output behaviour
If there were errors, the system changes the links so as to avoid future errors - thus, it learns
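A toy sketch of this error-driven learning (not the actual past-tense model: just a tiny network on an XOR-style task where hidden nodes are essential):

```python
# Toy connectionist sketch: a tiny input -> hidden -> output network that
# adjusts its links to reduce error on a small pattern-association task.
import numpy as np

rng = np.random.default_rng(0)

# Four input patterns and the output each should produce (XOR-style toy task)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(0, 1, (2, 8))     # links: input nodes  -> hidden nodes
W2 = rng.normal(0, 1, (8, 1))     # links: hidden nodes -> output node
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(20000):
    hidden = sigmoid(X @ W1)              # activation flows along the links
    output = sigmoid(hidden @ W2)
    error = Y - output                    # compare the output with the target
    # Change the links in proportion to their contribution to the error
    d_out = error * output * (1 - output)
    d_hid = (d_out @ W2.T) * hidden * (1 - hidden)
    W2 += 0.5 * hidden.T @ d_out
    W1 += 0.5 * X.T @ d_hid

print(np.round(sigmoid(sigmoid(X @ W1) @ W2), 2))   # should end up near [0, 1, 1, 0]
```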
Statistical/non-symbolic models: overall, these sorts of models
- have representations of knowledge that are distributed over lots of nodes (i.e. non-symbolic)
- learn slowly on the basis of statistical patterns in the data
- produce probabilistic behaviour
Two theories of verb learning
Words and Rules (WR)
- Two systems working together. Grammar: regular verbs are learned as a rule. Lexicon: irregular verbs are simply memorised (see the sketch below).
Connectionist (PDP)
- All verbs are learned in the same way, by a network (the brain) that does pattern association based on the sounds (phonology) of the words. Regular verbs simply have stronger patterns.
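A minimal sketch of the WR idea (my own toy illustration, with a four-word irregular ‘lexicon’):

```python
# Minimal sketch of Words and Rules: check the memorised lexicon of exceptions
# first; if the verb isn't there, apply the regular "+ed" rule.
IRREGULAR_LEXICON = {"go": "went", "eat": "ate", "sing": "sang", "teach": "taught"}

def past_tense(verb: str) -> str:
    if verb in IRREGULAR_LEXICON:          # Lexicon: memorised exception
        return IRREGULAR_LEXICON[verb]
    return verb + "ed"                     # Grammar: productive rule

print(past_tense("walk"))   # walked
print(past_tense("go"))     # went
print(past_tense("moop"))   # mooped -- the rule generalises to novel verbs
```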
Which theory best accounts for the empirical evidence about how we learn the past tense of English verbs?
Both account for many of the same empirical phenomena - U-shaped curve of acquisition
Words and rules (WR)
- at first, the child simply memorises each word. When the child notices that there is a rule, she applies it too broadly. Finally, she slowly memorises the exceptions.
Connectionist (PDP)
- The connectionist system learns slowly, picking up high-frequency items first
The ‘+ed’ pattern is very common, and so is over-generalised
Eventually exceptions are learned
Criticisms of connectionist models
- Produce some very odd verb forms, quite unlike anything people produce. This can only be fixed by building the network so that it essentially hardwires a rule.
- Doesn't explain why some verbs (e.g. ring) can have different past tenses depending on their etymology or context rather than their phonology: ‘she rang her grandma’ vs. ‘the walls ringed the city’
Criticisms of words and rules
- predicts that people should go from 0% to 100% using the rule very fast, but they don’t appear to
- People do not apply the regular past tense uniformly: phonology, frequency, or semantics play a role.
- ‘frink’: if presented in a context like that of ‘drink’, people said the past tense was ‘frank’, but if presented in a context like that of ‘blink’, people said the past tense was ‘frinked’
Effect of encoding verb tense
In English, tense is part of the word and obligatory. In Indonesian, it is neither part of the word nor obligatory, but separate particles can indicate it.
Hierarchical phrase structure
‘she saw the man with the telescope’
- This sentence has hierarchical phrase structure: it consists of phrases nested hierarchically within one another
Regular grammars
Can be written as a list of possible production rules.
Do not capture hierarchical phrase structure.
Context-free grammars
Do capture hierarchical phrase structure. Words in sentences are produced based on the phrase structure of the sentence, resulting in parse trees.
can also be written as a list of production rules
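A toy context-free grammar written as production rules, plus a recursive generator that builds sentences with nested phrase structure (the grammar and vocabulary are my own small example):

```python
# Sketch: a context-free grammar as production rules, expanded recursively.
import random

GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["Det", "N"], ["Det", "N", "PP"]],
    "VP": [["V", "NP"], ["V", "NP", "PP"]],
    "PP": [["P", "NP"]],
    "Det": [["the"]], "N": [["woman"], ["man"], ["telescope"]],
    "V": [["saw"]], "P": [["with"]],
}

def expand(symbol):
    """Recursively expand a symbol; anything without a rule is a terminal word."""
    if symbol not in GRAMMAR:
        return [symbol]
    production = random.choice(GRAMMAR[symbol])
    return [word for sym in production for word in expand(sym)]

random.seed(4)
print(" ".join(expand("S")))
# e.g. "the woman saw the man with the telescope" -- the PP can attach inside
# the NP or the VP, which is the hierarchical ambiguity in the example sentence.
```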
Which are better descriptions of human language?
Context-free grammars: they capture hierarchical phrase structure
Why believe that language has hierarchical phrase structure?
Argument from simplicity: word-chain grammars would have to be enormously large and complicated to account for long-distance dependencies.
Argument from substitution: phrases appear to have psychological reality: we can substitute instances of a kind of phrase freely