Week 11: Language Flashcards

1
Q

How is memory for language different from memory for events?

A
  • Episodic memory: memory for events!
    -Semantic memory: memory for facts
2
Q

Differences between episodic and semantic memory

A
  • Semantic memory more resistant to forgetting or brain damage than episodic memory
  • Retrieval from episodic memory often described as “mental time travel” – re-experiencing events
  • Retrieval from semantic memory is often automatic and does not have the same experience
3
Q

The Lexical Decision Task

A
  • Participants are presented with letter strings that are either words or nonwords and must decide which is which
    -In this task, accuracy is often at maximal levels (unless participants are pressured to respond very quickly). Therefore, the dependent variable in these tasks is the response time (RT). The core component of these RTs is assumed to be the latency of lexical access
4
Q

Lexical access is enhanced by:

A
  • Repetition priming – there are faster RTs for repeated words than non-repeated words, even if they are separated by other words.
  • Semantic priming – there are faster RTs for words semantically related to the just presented word (e.g. faster RTs for ‘doctor’ after being preceded with ‘nurse’).
5
Q

What causes priming effects?

A

  • Spreading activation – reading a word increases its activation. Reading a word also increases the activation of related words in the lexicon. This activation decays over time, which is why priming effects are often short-lived
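The mechanism above can be sketched in a few lines of code. This is a minimal illustration only: the lexicon, association weights, boost, and decay rate are made-up assumptions, not parameters from any actual model.

```python
# Toy sketch of spreading activation with decay.
lexicon = {
    "nurse":  {"doctor": 0.8},
    "doctor": {"nurse": 0.8},
    "bread":  {"butter": 0.9},
    "butter": {"bread": 0.9},
}
activation = {word: 0.0 for word in lexicon}

def read_word(word, boost=1.0, spread=0.5):
    """Reading a word raises its activation and spreads part of
    the boost to its associates in the lexicon."""
    activation[word] += boost
    for neighbor, weight in lexicon[word].items():
        activation[neighbor] += boost * spread * weight

def decay(rate=0.3):
    """Activation decays over time, so priming is short-lived."""
    for word in activation:
        activation[word] *= 1 - rate

read_word("nurse")
# "doctor" is now pre-activated, so it would be recognized faster
print(activation["doctor"] > activation["bread"])  # True
decay()
print(round(activation["doctor"], 2))  # 0.28
```

Reading "nurse" partially activates "doctor" (semantic priming), and each decay step shrinks all activations, which is why the priming advantage fades.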

6
Q

The word frequency effect:

A

  • High frequency words have faster RTs than low frequency words.
  • Frequency refers to how common a word is in the natural language. High frequency words are commonly used words, whilst low frequency words are uncommon words.
  • This is often quantified by a corpus analysis – counting the frequencies of each word across a large number of texts. Older estimates of word frequency came from books, but today there are very large digital databases that are used (e.g., subtitles in films, conversations on Twitter)

7
Q

Estimates of word frequency

A
  • Older estimates of word frequency came from books (Kucera & Francis, 1967)
  • Today, there are very large digital databases that are used
  • Subtitles in films (SUBTLEX database)
  • Conversations on Twitter
8
Q

Implications of the word frequency effect

A
  • Word frequency effect implies that HF words are accessed more easily in the mental lexicon
  • Some have even argued that the mental lexicon is searched in a serial fashion by word frequency (Murray & Forster, 2004)
  • Advantage for HF words is eliminated when words are repeated. Repetition boosts words to their maximum level of activation
9
Q

Does the word frequency effect really reflect faster reading times for HF words? experiment

A
  • Research in eyetracking says "yes"!
  • Eyetrackers measure where people are looking on a screen and for how long
  • Rayner and Duffy (1986): longer gaze durations to LF words than to HF words while reading sentences
10
Q

Word frequency effects in recall tasks

A

In recall tasks, there are advantages for high frequency words. However, these advantages only occur in "pure" lists of words (when lists are composed entirely of HF or LF words), and there is little to no frequency effect when mixed lists of HF & LF words are studied (Gillund & Shiffrin, 1984). This is referred to as the mixed-list paradox.

11
Q

Word frequency effect in recognition tasks

A

Recognition memory shows an advantage for low frequency words. Low frequency words have a higher hit rate (more ‘yes’ responses to studied words) and a lower false alarm rate (fewer ‘yes’ responses to new words) compared to high frequency words. This pattern is referred to as the mirror effect.
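Hit rates and false alarm rates, as defined above, are simple proportions. A sketch with made-up response data (the word lists and "yes" responses are illustrative assumptions):

```python
# Hit rate and false-alarm rate from a toy recognition test.
studied   = ["cat", "house", "vex", "gloam"]       # old (studied) items
new_items = ["dog", "tree", "fen", "sough"]        # new items (lures)
said_yes  = {"cat", "house", "vex", "gloam", "dog"}  # participant's "yes" responses

hits = sum(w in said_yes for w in studied)
false_alarms = sum(w in said_yes for w in new_items)

hit_rate = hits / len(studied)            # "yes" to studied words
fa_rate = false_alarms / len(new_items)   # "yes" to new words
print(hit_rate, fa_rate)  # 1.0 0.25
```

The mirror effect is the finding that, computed this way, LF words show a higher hit rate *and* a lower false alarm rate than HF words.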

12
Q

What is the cause of the word frequency effect?

A
  • There has yet to be a single unified explanation of word frequency effects across all tasks
    Lexical decision:
  • Stronger "base level activation" for HF words
  • In other words, HF words are already active from their heavy repetition in language
    Free recall:
  • HF words have stronger associations to other HF words, making their associations easier to learn in an experiment
  • This is evident in free association data – HF words tend to elicit other HF words
    Recognition:
  • HF words are more similar to other HF words, both semantically and in terms of their perceptual characteristics (they have more overlap in their letters and phonemes)
13
Q

Why are there so many different explanations of word frequency effects?

A
  • Word frequency is correlated with many other variables!
  • Word length: high frequency words tend to be shorter
  • Concreteness: low frequency words tend to refer to concrete things while high frequency words tend to be abstract
  • Neighborhood size: high frequency words have more similar words in the lexicon
  • E.g., a common (HF) word like HOT also has other similar words like TOT, ROT, and POT, but a less common word like COMPUTER doesn’t have as many similar words
14
Q

Does word frequency even matter on its own? context variability

A
  • Stronger predictor of lexical decision latencies: context variability (Adelman, Brown, & Quesada, 2006)
  • Context variability is defined as the number of documents a word occurs in
  • E.g., words like “where” or “people” are used across many linguistic contexts, words like “dog” or “baseball” are used in particular contexts
  • It is different from word frequency
  • A high frequency word that is repeated a lot in one context is a low context variability word
  • Likewise, a word could have low frequency overall but appear in a lot of different contexts
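The distinction above is easy to make concrete: word frequency counts total occurrences, context variability counts how many documents a word appears in. A sketch (the three tiny "documents" are illustrative assumptions):

```python
from collections import Counter
import re

documents = [
    "the people went where the people go",
    "where do people live",
    "the dog barked at the dog next door",
]

word_freq = Counter()  # total occurrences (word frequency)
doc_freq = Counter()   # number of documents (context variability)
for doc in documents:
    words = re.findall(r"[a-z]+", doc.lower())
    word_freq.update(words)        # count every occurrence
    doc_freq.update(set(words))    # count each document only once

# "dog" is repeated, but only within one context;
# "people" occurs across contexts.
print(word_freq["dog"], doc_freq["dog"])        # 2 1
print(word_freq["people"], doc_freq["people"])  # 3 2
```

Here "dog" is a higher-repetition but low context variability word, while "people" spreads its occurrences across contexts, matching the card's distinction.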
15
Q

Does word frequency even matter on its own? context variability vs word frequency

A
  • Adelman et al. (2006) found that context variability, not word frequency, predicts performance in lexical decision
  • Word frequency had almost no effect when context variability was controlled
  • High context variability words have shorter RTs than low context variability words
    -Almost no effect of word frequency after contextual diversity is controlled
    -Clear negative relationship between context variability and RT when word frequency is controlled!
16
Q

Why would there be such strong context variability advantages?

A
  • Adelman et al. (2006) related these findings to the rational analysis of memory and language by John Anderson
  • Rational analysis states that cognition – and memory in particular - is shaped around need probability in the environment
  • Recency is one example: we tend to need recent things more than non-recent things, which may be why human memory is centered around recency
  • High context variability words are more likely to be needed in future contexts than low context variability words
  • Analogy: high context variability words are like tools that can be used in a lot of different situations (e.g., hammer, swiss army knife) – you’re most likely to use these in future situations
  • Low context variability words have very specific or niche usages
17
Q

Context variability and memory

A
  • Studies on context variability were directly motivated by findings that memory benefits from presenting words in different contexts
  • Stronger benefits of repetition when words occur in different contexts (e.g., different backgrounds or font colors) than when presented in the same context
  • Stronger memory when repetitions are separated in time than massed consecutively (the spacing effect)
  • Similar advantages for low context variability words in language have been found in memory tasks
  • Free recall and recognition memory also show advantages for low CD words over high CD words
18
Q

Classical approaches to language and word identification

A

“Classical” (traditional) approaches emphasize rules
* When reading, we use rules about spelling-sound correspondence
* When hearing speech, we use rules about how words begin and end to understand where word boundaries are
* E.g., in English, words tend to end with consonants, so we can use this to infer when a word has ended and another has begun
* Most languages have exceptions to rules
* These exceptions are stored in long-term memory
* Reading:
* RULE: "x" is pronounced as /ks/
* EXCEPTION: "Bordeaux", where the "x" is silent

19
Q

Problems with Classical approaches to language and word identification

A

There are a number of problems with the idea that word perception operates only via rules and exceptions
* Not always clear when to prioritize rules or exceptions
* Not clear how rules are acquired during linguistic development
* Brain damage/aging rarely shows the complete loss of rules
* Brain damage instead suggests “graceful degradation” – loss of some specific words or phrases
* Not clear how context affects perception

20
Q

Context influences letter perception

A
  • Letters are perceived more accurately when they are in words than when they are in non-words or random letter strings
  • Faster perception of the letter "A" in "CATS" than in "ZAZX"
  • Classical approaches could not explain this effect
21
Q

Overview of Interactive activation model of letters and word perception (McClelland & Rumelhart, 1981)

A
  • This is a computational model of how we perceive words
  • Computational model: a theory made explicit with computations
  • We don’t know whether this is the truth or not – we can postulate (suggest the existence of) some unobserved mechanisms and evaluate how well they can explain phenomena
  • The model explains how context affects performance through the interaction of its various mechanisms, namely how top-down and bottom-up perception influence each other
22
Q

Interactive activation model of letters and word perception (McClelland & Rumelhart, 1981)

A

Model consists of three layers:
* Feature layer: basic perceptual features like lines in text or handwriting
* Letter layer: abstract letters which may look like the features but may not
* Word layer: word representations in the mental lexicon
Activation flows back and forth between these layers
* Higher activation = stronger perception
Lateral inhibition:
* In the word layer, the activations inhibit each other so only one word can be strongly activated
* This is why we tend to only perceive a single word rather than multiple words
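One bottom-up/inhibition/top-down cycle can be sketched as follows. This is a toy illustration: the two-word lexicon, starting activations, and parameter values are assumptions, not the actual McClelland & Rumelhart (1981) parameters, and the feature layer is omitted.

```python
# Toy sketch of one interactive-activation update cycle.
words = {"CAT": ["C", "A", "T"], "CART": ["C", "A", "R", "T"]}
letter_act = {"C": 1.0, "A": 1.0, "T": 1.0, "R": 0.0}  # "CAT" was shown
word_act = {w: 0.0 for w in words}

# Bottom-up: active letters activate the words that contain them
for w, letters in words.items():
    word_act[w] = sum(letter_act[l] for l in letters) / len(letters)

# Lateral inhibition: each word is suppressed by its competitors,
# so only one word ends up strongly activated
inhibition = 0.4
net = {w: a - inhibition * sum(o for ww, o in word_act.items() if ww != w)
       for w, a in word_act.items()}
winner = max(net, key=net.get)

# Top-down feedback: the winning word boosts its own letters
for l in words[winner]:
    letter_act[l] += 0.2

print(winner)  # CAT
```

"CAT" wins because all of its letters are active ("CART" is diluted by the inactive "R"), and its feedback then strengthens the letters C, A, and T, which is the top-down context effect.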

23
Q

Top-down and bottom-up perception in context effects

A

Bottom up: sensory perception from the environment
* Features from the stimulus – these become activated when a letter string is perceived
* The activations of the features are used to activate the letters that contain them
* The activations of the letters activate the words that contain them (e.g., the letters C, A, and T activate CAT but also CART)
Top down: knowledge and expectations shaping our perception
* Word layer in the model "feeds back" to influence the letters
* When words become activated, they add activation to their own letters, but inhibit letters that are not present in them
* E.g., the word "cats" strengthens the letters "c", "a", "t", and "s", but inhibits other letters like "x" and "z" that are not present in the word

24
Q

The Interactive Activation Model

A
  • The interactive activation model has become the cornerstone of theories of reading
  • They are all centered around interactions between the bottom-up influences of perception and the top-down expectations from our understanding of words
  • …and they can also be used to understand speech perception
25
Q

Why should we study speech perception?

A
  • For a long time, a question in speech perception has been: how do we segment speech?
  • Audio is completely continuous – there are no actual breaks between words when we speak, even though it sounds like there are!
  • How do we mentally divide up continuous audio into discrete words?
26
Q

The TRACE model (McClelland & Elman, 1986) for speech perception

A
  • TRACE is basically the interactive activation model applied to speech perception
  • Feature layer, phoneme layer, and a word layer
  • ”Phonemes” refer to the basic sounds in a language
  • Phonemes and letters are not necessarily the same thing!
  • Letters can correspond to multiple phonemes
  • The “c” in count is not the same as the “c” in cylinder
  • One key difference:
  • In reading, all of the letters are available simultaneously
  • In TRACE, the phonemes are activated one at a time as the speech signal is processed
27
Q

What TRACE can explain

A

Right context effects (Thompson, 1984)
* Many times spoken words have missing phonemes - they are either misheard or not pronounced, but we can understand words just fine!
* Gift being pronounced as ift – we likely still hear it as “gift”
* Because "ift" occurs after the (missing) "g", this implies that what we hear in the present can alter our understanding of the past
* How does TRACE explain this?
* “Gift” may be the only word that “ift” can activate!
* "Gift" becomes activated and feeds back to /g/ in the phoneme layer
Speech segmentation

28
Q

What TRACE can’t explain

A

Influence of semantics on word perception!
* Let’s say you hear “The _ing had feathers”
* Which word do you think this was? “WING” or “RING”?
* People generally think it’s “WING” because it’s semantically consistent with what came after (“feathers”; Szostak & Pitt, 2013)
* Another example: ”BIG GIRL” and “BIG EARL” can sound almost identical! How do we hear one but not the other?
* Linguistic context! We say “Big girls don’t cry”, not “Big Earls don’t cry!”
* TRACE would require some additional semantic layer to further constrain it

29
Q

What do these models tell us about language perception more generally?

A
  • Reading and speech perception can be modeled simply as an interaction between bottom-up perception and top-down knowledge
  • The model can perceive these without using rules and exceptions! And this might be a good thing!
  • Many linguists discuss word perception in terms of rules, e.g., English words tend to end with hard consonants like /k/
  • But there are always exceptions – how does the system know how to manage both rules and the many exceptions that are present?
30
Q

Language learning history

A
  • This is an extremely old question! Many philosophers have debated whether language is inborn or acquired
  • BF Skinner in 1957 argued that language is learned via operant conditioning
  • Example: If a child says “Mom can you give me milk?” and receives milk, there is reinforcement of the successful use of language
  • Repeated reinforcement of successful uses of language lead to its acquisition
31
Q

Noam Chomsky and learning via operant conditioning

A
  • Chomsky, a linguist, wrote a scathing review of Skinner’s book
  • He argued that it was virtually impossible for language to be learned via operant conditioning
  • Key problem: the poverty of the stimulus
  • Translation: Children just aren’t exposed to that much language!
  • Children often produce sentences that they have never even heard before (e.g., a child saying “I hate you mommy!”)
  • Chomsky argued that language learning is innate and due to a universal grammar
  • All languages are mapped onto this grammar
32
Q

Impact of Chomsky’s critique

A
  • Enormous!
  • Led to a renewed interest in nativist accounts of language learning (biologically preprogrammed, innate)
  • Many researchers have documented the extremely rapid rise in language use through the early years
  • 5-year-olds learn on average 2,000-3,000 words a year – many words a day!
  • Was also one of the cornerstones of the cognitive revolution in the 1960’s and the death of behaviorism
  • Behaviorism was entirely about stimulus – response associations
  • After the cognitive revolution, researchers began considering internal representations as a mediator between stimuli and responses
33
Q

Chomsky’s account of language comprehension

A
  • Noam Chomsky argued that sentence comprehension is first and foremost dependent on syntax
  • Syntax: rules for word order
  • This is another example of a “classical” approach to language comprehension, also referred to as a “structural” approach
  • Sentences are divided into their parts of speech and grouped into noun phrases and verb phrases

    -Syntax is processed first; word meanings are then used to create the meaning of the sentence

34
Q

Problems with the classical account of sentence comprehension

A

Sentence interpretations cannot always be recovered using rules!
* “The spy saw the policeman with binoculars” vs. ”The spy saw the policeman with a revolver”
* In the first case, the spy has the binoculars; in the second case, the policeman has the revolver
* But the structure is nearly identical
> The word meanings determine the structural interpretation, not the other way around

35
Q

Parallel distributed processing (PDP) accounts of language acquisition

A
  • In the 1980’s, a number of neural network models of language acquisition were developed, also referred to as connectionist models
  • The interactive activation model and TRACE are similar, but these models do not learn
  • No changes in connections between words, letters, or phonemes occur during training
  • These networks embody the following principles
  • Learning by the difference between predictions and what was heard
  • On each iteration, the model makes a prediction of some kind
  • If the prediction is in error, the connections in the network are modified to better predict future outputs
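The prediction-error learning loop described above can be sketched with a simple delta rule. This is illustrative only: the tiny input/target patterns and learning rate are assumptions, not an actual PDP language model.

```python
# Minimal error-driven learning: connections (weights) are adjusted
# by the difference between the prediction and the correct output.
inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
targets = [0.0, 1.0, 1.0]   # what the network "hears" (correct outputs)

weights = [0.0, 0.0]        # the network starts with no knowledge
lr = 0.1                    # learning rate

for _ in range(200):        # many iterations, small adjustments
    for x, t in zip(inputs, targets):
        prediction = sum(w * xi for w, xi in zip(weights, x))
        error = t - prediction             # prediction vs. target
        for i in range(len(weights)):      # modify the connections
            weights[i] += lr * error * x[i]

# After training, predictions closely approximate the targets
print(round(sum(w * xi for w, xi in zip(weights, inputs[1])), 2))
```

The model begins performing poorly (all-zero weights), and performance improves gradually across iterations as the errors shrink, mirroring the training dynamics the card describes.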
36
Q

Parallel distributed processing (PDP) accounts of language acquisition thought on knowledge

A
  • Knowledge is distributed across many connections, like in the human brain
  • Knowledge is not stored in fixed units anywhere, unlike in classical accounts of language
  • This allows for graceful degradation
  • If you cut certain units or connections in the network, they don’t lose entire words or phrases
  • Each word is represented across many units, so losing a small number is not consequential
  • The only learning that takes place is modifications of connections in the network – no new units are added
  • Connections in the model can be thought of as associations or relationships
  • The models learn relationships between different levels of language
  • These can be relationships between the way words appear and how they sound (Plaut et al., 1996)
  • This can also be relationships between words in a sentence (Elman, 1990)
37
Q

Parallel distributed processing (PDP) accounts of language acquisition: the learning process

A
  • The networks do not start with any knowledge!
  • Models often begin performing very poorly
  • They learn across many, many iterations – adjusting in response to the errors they make
  • Performance gradually increases through the course of training until it approximates human performance
  • Errors made during the course of training are another testbed for such models
  • These errors should resemble the errors that humans make
38
Q

Past Tense Acquisition problem

A
  • Past tenses are of interest because of irregular verbs
  • Most verbs are made past tense by adding “-ed”
  • However, there are several other past tense verbs such as ran and went that don’t conform to this pattern
  • Even crazier – children often go through a phase where they get worse in their use of irregular past tense forms
  • E.g., a child will use the word “ran” at around age 3
  • Later, the child says “runned”!
  • Eventually, the child properly uses both regular and irregular verbs
  • Steven Pinker and others have argued that this is due to the usage of rules
  • The erroneous “runned” reflects an overusage of the rule
  • The exceptions to the rule are eventually learned and this allows children to perform well
39
Q

Rumelhart and McClelland’s (1986) model of Past Tense Acquisition

A
  • Present tense verb is presented to the network on the INPUT layer
  • Word is converted into “Wickelfeatures” – trigrams of the letters
  • The word “Foster” is broken up into all consecutive three-letter combinations: FOS-OST-STE-TER
  • The Wickelfeatures of the present tense verb are used to produce a past tense version of the word
  • Converted back into the letter string of the predicted past tense word
  • How does the model learn to produce past tense verbs?
  • Connections are present between each layer
  • When an error is made, the connections are adjusted based on the difference between the current prediction and the correct past tense verb
40
Q

Key point from Rumelhart and McClelland’s (1986)

A

They argued that general learning mechanisms are sufficient, rather than rules or syntax
* Their model – and other PDP models – merely learn on the difference between the correct input and the prediction from the network

41
Q

Criticisms of Rumelhart and McClelland’s (1986) model

A
  • Steven Pinker and colleagues heavily criticized this model on a number of grounds
  • The model does not succeed on all exception words
  • There are certain neurological double dissociations that support the idea that verb use is subserved by two systems
  • “Double dissociations” – manipulation of a variable affects system 1 but not system 2, manipulation of another variable affects system 2 but not system 1
  • Patients with Alzheimer’s, who have LTM deficits, have difficulty with irregular past tense verbs
  • Patients with Parkinson’s disease, who have damage to the basal ganglia, have difficulty with regular past tense verbs but not irregular ones
  • Severing connections in connectionist models tends to affect the irregular verbs but not the regular ones
42
Q

Other criticisms of connectionist models

A
  • They are sensitive to the training sets
    Very different behavior emerges from different training sets!
  • They can learn things people can’t learn
    The learning algorithms are so powerful they can reproduce just about any patterns with enough training
  • They are difficult to understand!
    If you don’t understand how the model worked, you’re not alone
    They often reproduce patterns of interest after very small incremental adjustments to connections across thousands of iterations of training
    Often the creators of the models cannot explain how the models succeed
43
Q

So who is right?

A

Modern neural network models: deep learning models, which are used by Google and others
* Neural network models with many layers (around 10-20 layers)
* These models are used a lot for web searches, face recognition, etc
* Modern language production models like GPT-3 are extremely impressive, but not without their criticisms