Lecture 11 - Language Flashcards

1
Q

What is the difference between episodic and semantic memories?

A

Episodic memory is the memory of an event or episode - it includes memory of contextual details, such as the spatial and temporal location of the event.
Semantic memory is the memory of facts and language.

2
Q

Is episodic memory or semantic memory more resistant to memory decline from brain damage or certain diseases, such as Alzheimer's?

A

Semantic memory tends to be more resistant to forgetting than episodic memory.
One possible reason is that semantic memories are encoded in a wide network of neurons, whereas more specific episodic memories may be encoded in a small network of neurons.

3
Q

What is a Lexical Decision Task?

A

A Lexical Decision Task is used to study language and involves showing participants strings of letters and asking them if they are words or not.

4
Q

In Lexical Decision Tasks, what is the DEPENDENT variable?

A

The Response Times (RTs). This is because accuracy in Lexical Decision Tasks is very high, but response times differ.
Response Time in these tasks is considered to be representative of LEXICAL ACCESS, or how readily available these words are in our minds.

5
Q

What are some ways that improve LEXICAL ACCESS?

A

Repetition Priming - RTs become faster when words are repeated, even if there are other words in between.
Semantic Priming - RTs are faster when the prior words presented are associated with the word, such as pet then dog.

6
Q

The fact that Lexical Decision Task RTs are faster with Repetition Priming and Semantic Priming tells us what?

A

It tells us that the way we access information in our minds has something to do with associations of concepts or networks in our brains.

7
Q

What is one of the major explanations for why we see priming effects (such as Repetition or Semantic) in Lexical Decision Tasks?

A

Spreading Activation (or Activation Monitoring Theory), which says that when we are presented with something, such as a particular word, that word is not only activated in our minds/networks of neurons, but associated words, events, etc. are also activated.
That is why we see faster RTs when an associated word is presented prior to a certain word.
This activation decays rapidly over time, which is why Priming Effects tend to be short-lived.
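
The idea of activation spreading to associates and then decaying can be sketched as a toy network. This is only an illustration under stated assumptions: the words, link strengths and decay rate below are made-up values, and the function names are hypothetical, not part of any published model.

```python
# Toy spreading-activation network. All words, link strengths, and the
# decay rate are made-up illustrative values, not data from the lecture.
LINKS = {
    "pet": {"dog": 0.6, "cat": 0.6},
    "dog": {"pet": 0.6, "bone": 0.4},
    "cat": {"pet": 0.6},
    "bone": {"dog": 0.4},
    "chair": {},
}
DECAY = 0.5  # fraction of activation that survives each time step (assumed)


def present(activation, word):
    """Fully activate a presented word and pass a share to its neighbours."""
    new = dict(activation)
    new[word] = 1.0
    for neighbour, strength in LINKS[word].items():
        new[neighbour] = min(1.0, new[neighbour] + strength)
    return new


def decay(activation, steps=1):
    """Let every node's activation fade over a number of time steps."""
    return {w: a * DECAY ** steps for w, a in activation.items()}


resting = {word: 0.0 for word in LINKS}
primed = present(resting, "pet")   # "dog" is now partly active, "chair" is not
later = decay(primed, steps=3)     # the head start for "dog" has mostly faded
print(primed["dog"], primed["chair"], later["dog"])
```

In this sketch, a word that is already partly active would need less extra input to be recognised, which is one way to read the faster RTs after a related prime; after a few decay steps the advantage is nearly gone, matching the short-lived nature of priming.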

8
Q

What is the Word Frequency Effect seen in Lexical Decision Tasks? And what does it tell us about how words are accessed in the mind?

A

The Word Frequency Effect refers to the fact that participants have faster response times in Lexical Decision Tasks for more common words.
The fact that we have quicker access to more common words has been taken to mean either that these words are encoded in more places/have stronger networks, or that we search through known words during the task and more common words are near the top of the list.

9
Q

What is meant by WORD FREQUENCY and how is it determined?

A

FREQUENT WORDS are just common words.
Word Frequency has been determined in different ways. In the past it was counted from books; now it is done digitally, using databases of movie subtitles or conversations on X.

10
Q

When will the Word Frequency Effect not be seen?

A

The Word Frequency Effect is eliminated when the word is repeated in a Lexical Decision Task.
So, what this looks like in the data is that when a word is shown for the first time in an LDT, the response time depends on the frequency of that word.
However, when the word is repeated, the frequency of the word does not affect RT. Repeated words have the same (faster) RT regardless of frequency.
This is understood to occur because once words are repeated they are about as activated as we can measure: there is a limit to how fast we can read and respond, so even if the words become more activated, that cannot show up in RTs.

11
Q

Do people read high frequency words faster?

A

Yes, eyetracking data can confirm this.

12
Q

What is the MIXED LISTS PARADOX when it comes to word frequency effects in memory tasks?

A

Pure lists of words are lists that contain words of the same frequency/commonality only.
When using pure lists, memory is better for high frequency words than for low frequency words.
When using mixed lists, on the other hand, recall is no better for either the high or the low frequency words, i.e. memory is not influenced by the frequency of the words.

13
Q

What is the MIRROR EFFECT seen in memory recognition tasks of words?
Hint: hits and foils (false alarms)

A

In memory recognition tasks, the hit rate for low frequency words is higher, and the false recognition of foils (false alarms, aka false positives) is lower.
For high frequency words the hit rate is lower, and the false alarm rate is higher.
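
To make the hit and false alarm terminology concrete, here is a small sketch of how the two rates could be computed from recognition responses. The response counts are invented purely to show the mirror pattern, and `rates` is a hypothetical helper name.

```python
def rates(responses):
    """responses: list of (item_was_studied, participant_said_old) pairs."""
    hits = sum(1 for studied, said_old in responses if studied and said_old)
    false_alarms = sum(1 for studied, said_old in responses if not studied and said_old)
    n_studied = sum(1 for studied, _ in responses if studied)
    n_foils = len(responses) - n_studied
    return hits / n_studied, false_alarms / n_foils


# Invented data: 10 studied words and 10 foils per frequency class.
low_freq = [(True, True)] * 8 + [(True, False)] * 2 + [(False, True)] * 1 + [(False, False)] * 9
high_freq = [(True, True)] * 6 + [(True, False)] * 4 + [(False, True)] * 3 + [(False, False)] * 7

low_rates = rates(low_freq)    # (hit rate, false alarm rate) for low frequency words
high_rates = rates(high_freq)  # (hit rate, false alarm rate) for high frequency words
print(low_rates, high_rates)
```

With these made-up numbers the low frequency words have the higher hit rate and the lower false alarm rate, so the two measures move in opposite directions, which is the "mirror" in the mirror effect.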

14
Q

From a lexical decision task perspective, high frequency words are accessed/recognised more quickly as words because they are already activated to some degree because we use them so frequently and therefore require less activation than lower frequency words.
T or F?

A

True.

15
Q

In free recall tasks using words, what is one of the main explanations for why high frequency words have higher levels of recall?

A

One explanation is that high frequency words have a lot of associations with other high frequency words, so it is easy to form associations between them, which aids recall.

16
Q

Word frequency is correlated with a lot of different variables, and so understanding why it is we see certain effects of word frequency on memory can be complex.
What are some of the other variables that are associated with word frequency?

A

Word length - high frequency words tend to be short, such as “and”, “the”.
Concreteness - low frequency words tend to refer to concrete things, whereas high frequency words are more abstract.
Neighbourhood size - high frequency words have lots of similar words, whereas low frequency words tend not to have that many similar words.

17
Q

What is context variability when it comes to words in a language?

A

Context variability refers to the number of contexts words are used in.
So, words with high context variability can be used in a wide range of contexts, whereas words with low context variability occur in only a small number of contexts.
Whilst word frequency and context variability are correlated, they can differ in important ways.

18
Q

In a study by Adelman et al. (2006) that looked at context variability in lexical decision tasks, what did they find?

A

They found that CONTEXT VARIABILITY not WORD FREQUENCY predicts performance in lexical decision tasks.
High context variability words had faster RTs than low context variability words.
When context variability was controlled for, word frequency had almost no effect on RTs in a lexical decision task.

19
Q

When context variability is controlled for, does word frequency have an impact on RTs in lexical decision tasks?

A

No.

20
Q

Why do we see strong context variability advantages? That is, why do we better remember high context variability words?
Adelman et al. (2006) offered an explanation based on their finding that context variability, as opposed to word frequency, is associated with lower response times in LDTs, i.e. we have better memory of high context variability words.

A

The Rational Analysis of Memory by John Anderson states that memory (and cognition in general) is shaped by need probability in the environment.
So, we are more likely to need high context variability words than low context variability words. They may therefore be easier to access and generally more activated, which gives us a better ability to remember them.

21
Q

Do we have better memory for words when they are presented in multiple fonts and/or backgrounds?

A

Yes.

22
Q

Is memory for words better when the words are repeated consecutively or when words are repeated in a temporally spaced manner?

A

Words are better remembered when they are repeated in a temporally spaced manner (known as the SPACING EFFECT).

23
Q

High context variability words have faster RTs compared to low context variability words in Lexical Decision Tasks.
When it comes to recognition and recall of words, out of high and low context variability words which shows advantages to memory?

A

In regards to episodic memory, as measured by recognition and recall memory tasks, low context variability words show an advantage to memory (i.e. participants had better memory for these words), compared to high context variability words.
This has been suggested to occur because it is easier to remember that a word was on a study list if it is not used in many contexts: it stands out to us and our memory of it is not interfered with by other associations. For example, we may have used a high context variability word several times that morning, and so it is less clear to us whether that is why we are thinking of/recalling the word, as opposed to having seen it on the study list.

24
Q

What are “Classical” Models of Word Identification based on?

A

Classical Models of word identification are based on the idea that we use rules to identify words (either when we are reading or hearing language).
One example of a rule is that certain letters make certain sounds; knowing this allows us to read words even if we have never come across them before.
In English most words end in a consonant, which allows us to make out where words end in a stream of speech.
Most languages have rules and exceptions to those rules, which are just other rules, and so we learn those as well and store them in long-term memory.

25
Q

What are some of the limitations of classical approaches to Word Identification?

A

The idea that we use rules and exceptions in language and word identification has several limitations:
When to prioritise rules over exceptions can be unclear.
Brain damage rarely leads to a complete loss of rules, but rather tends to cause loss of particular words and/or phrases.
It’s unclear how context affects perception of words (see “the cat” example).

26
Q

What was one of the theories proposed as an alternative to Classical approaches to word identification?

A

The Interactive Activation Model of Letter and Word Perception, first proposed by McClelland and Rumelhart (1981).
This model is a computational model.
This model proposes a top-down and bottom-up interaction between its three layers:
1) Feature Layer
2) Letter Layer
3) Word Layer
The higher the activation between these layers the stronger the perception.

27
Q

Which layer is most susceptible to lateral inhibition and how is this helpful?

A

Lateral inhibition occurs most strongly at the word layer to prevent us from seeing or recognising multiple words in the same word.

28
Q

How does the Interactive Activation Model of word identification explain how context influences word identification?

A

The Interactive Activation Model of Word Identification states that context influences word identification in both a bottom-up and top-down way.
Bottom-up influence:
We see lines and shapes (feature layer), which activate letters in the letter layer, which then activate words in the word layer.
Top-down influence:
Word activation inhibits letter activation in the letter layer.
e.g. if we recognise “cat” then we inhibit activation of letters such as “x” or “p”, which are not in the word.
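
The bottom-up and top-down flow described above can be sketched as a tiny update loop. This is a heavily simplified illustration, not the 1981 model itself: the feature layer is left out, the three-word vocabulary and the `EXCITE`/`INHIBIT` values are assumptions, and `step` is a hypothetical function name.

```python
# Toy two-layer sketch (letters and words only; the feature layer is omitted).
WORDS = ["cat", "cap", "car"]
EXCITE = 0.3   # letter -> word and word -> letter support (assumed value)
INHIBIT = 0.2  # inhibition from competing units (assumed value)


def step(letter_act, word_act):
    """One update cycle. letter_act[pos][letter] holds letter activations."""
    # Bottom-up: each word collects support from its letters at each position.
    bottom_up = {
        w: sum(letter_act[pos].get(ch, 0.0) for pos, ch in enumerate(w))
        for w in WORDS
    }
    # Lateral inhibition at the word layer: rivals push each other down.
    new_words = {}
    for w in WORDS:
        rivals = sum(word_act[r] for r in WORDS if r != w)
        new_words[w] = max(0.0, word_act[w] + EXCITE * bottom_up[w] - INHIBIT * rivals)
    # Top-down: words support their own letters and inhibit the others.
    new_letters = []
    for pos, acts in enumerate(letter_act):
        updated = {}
        for ch, a in acts.items():
            support = sum(new_words[w] for w in WORDS if w[pos] == ch)
            against = sum(new_words[w] for w in WORDS if w[pos] != ch)
            updated[ch] = min(1.0, max(0.0, a + EXCITE * support - INHIBIT * against))
        new_letters.append(updated)
    return new_letters, new_words


# "c" and "a" are seen clearly; the last letter is smudged but looks most like "t".
letters = [{"c": 1.0}, {"a": 1.0}, {"t": 0.6, "p": 0.3, "r": 0.3}]
words = {w: 0.0 for w in WORDS}
for _ in range(3):
    letters, words = step(letters, words)
print(words, letters[2])
```

In this sketch "cat" ends up as the most active word, and the competing letter "p" in the final position is driven below its starting activation, which mirrors the "cat" inhibiting "p" example in the card.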

29
Q

What is the sequence of bottom-up influence in the Interactive Activation Model of word identification?

A

Feature layer to letter layer to word layer.

30
Q

What is the top-down sequence of influence in the Interactive Activation Model of Word Identification?

A

Word layer to letter layer.

31
Q

What is the cornerstone Model for theories of reading?

A

The Interactive Activation Model of Word identification.

32
Q

Why is speech perception so interesting and confusing to computational models?

A

Well, when we hear natural speech we are actually hearing a continuous combination of sounds. Words are not actually separated by pauses.
This has been difficult for computational models, as it is unclear how we are able to hear a continuous stream of sound, yet be able to hear it as discrete words.

33
Q

What is the TRACE model (McClelland & Elman, 1986)?

A

The TRACE MODEL of speech perception is very similar to the Interactive Activation Model of Word identification, but instead of having a letter layer it has a phoneme layer.
Phonemes are activated one at a time as speech is processed.

34
Q

What is the IPA?

A

International Phonetic Alphabet.

35
Q

What is the RIGHT CONTEXT EFFECT?

A

The Right Context Effect refers to the phenomenon where we understand a word even if we do not hear one of the phonemes in it, if we hear it in the right context.
An example would be hearing, “I have to go feed my _at.”
Given the context we hear, or at least perceive, the sentence as “I have to go feed my cat.” despite not hearing the “c” phoneme.

36
Q

What is a phoneme?

A

A basic sound in language.

37
Q

How does TRACE explain the right context effect?

A

An example of a right context effect would be hearing “I’m going to feed my _at.” TRACE would say that, given the context, the word “cat” is the only word that fits what we heard. The word “cat” feeds back to the phoneme level and activates the phoneme that “c” makes in “cat”.
In other words, “what we hear in the present can alter our understanding of what we heard in the past.”

38
Q

What is one of the main things that TRACE cannot explain about speech perception?

A

TRACE is yet to explain how semantic context influences the way we perceive speech. (To be honest, I’m not sure how this is different to the right context effect.)
TRACE would need to have a semantics layer to be able to explain how semantic context influences speech perception.

39
Q

What is the key way that the Interactive Activation Model of Word identification and TRACE model of speech perception differ to Classical approaches to word and speech perception?

A

The key difference is that these models are based on the interaction between our knowledge and our perception and are NOT reliant on rules or exceptions. This is entirely different to classical models, which are based largely on the use of rules and exceptions.

40
Q

How does BF Skinner propose we learn language?

A

Through operant conditioning - reinforcement and punishment.

41
Q

Noam Chomsky wrote a scathing review of Skinner’s proposal of operant conditioning as the mode of language learning.
What was the gist of Chomsky’s critique?

A

Chomsky’s main critique is referred to as the “poverty of the stimulus” critique. It states that learning language via operant conditioning cannot account for the fact that children (and all of us) generate novel sentences most of the time. There is no way that we have received punishment or reward for these sentences because they are NOVEL.
Chomsky’s proposal is that of UNIVERSAL GRAMMAR, which states that we are born with an innate capacity to learn language, based on a hard-wired understanding of grammar (a universal grammar that all languages map on to).

42
Q

On average, how many words does a 5-year-old learn a year?

A

2,000-3,000 words a year.

43
Q

Why was Chomsky’s critique of Skinner’s operant conditioning approach to language learning the death of behaviourism?

A

Because behaviourism was based almost solely on the idea that we learn through stimulus-response associations. If this idea simply cannot explain how we learn language (one of the major achievements of the human race) then does it have much footing at all?

44
Q

What is Chomsky’s theory of language acquisition?

A

UNIVERSAL GRAMMAR.

45
Q

According to Chomsky, what is sentence comprehension first and foremost based on?

A

SYNTAX, i.e. rules for word order.

46
Q

Chomsky argues that we put SYNTAX before SEMANTICS.
T or F?

A

True.

47
Q

What are two major issues with Chomsky’s (the classical) approach to language acquisition?

A

1) Chomsky proposes that syntax comes before semantics. However, it is semantics, rather than sentence structure, that seems to influence the way we interpret sentences.
2) Chomsky proposes that language acquisition is an innate ability. However, we have yet to find any evidence to support this theory. (But to be fair we are far from being able to study this directly.)

48
Q

What are Parallel Distributed Processing accounts of language acquisition?

A

These models are also referred to as CONNECTIONIST models.
They are based on neural network models of language.
An example is ChatGPT.

49
Q

Are Parallel Distributed Processing Models of Language Acquisition the same as TRACE or Interactive Activation Models?

A

No.
TRACE and IAM do not allow for learning: the connections between words, letters and phonemes do NOT change.
In Parallel Distributed Processing Models, however, learning does occur.
PDP models are based on a network of connections. Knowledge is spread out across a network of connections, as opposed to being stored in discrete units or sections. This idea can explain “graceful degradation”: some areas can be lost or affected, but because the knowledge is spread out across the network it may not be completely lost.
The connections between words, phonemes, letters, etc. are like relationships, and PDP models LEARN these relationships. So, whilst no information may be added, the connections or relationships can evolve and change.

50
Q

Do children tend to go through a phase of language learning where they can use past-tense irregular verbs correctly and then they start using them incorrectly and then go back to using them correctly?
What is Steven Pinker’s interpretation of what has happened here?

A

Yes.
Steven Pinker proposes that this occurs in three stages. At first the child uses the correct past-tense verb because they have heard it before. Then they learn the general rule but do not realize there are exceptions, so they start making mistakes with irregular verbs. Finally they learn the irregular verb rules and so stop making the mistakes as often.

51
Q

What is the name of the PDP model that produces patterns of past-tense verb learning for both regular and irregular verbs that match that of children?

A

The model is Rumelhart and McClelland’s (1986) model of Past Tense Acquisition (which involves Wickelfeatures, which are based on trigrams of the sounds in a verb).
This model states that we have connections between four layers: phonological representations of present tense verbs, Wickelfeatures of present tense verbs, Wickelfeatures of past tense verbs, and phonological representations of past tense verbs.
We try out past tense verbs and then based on error feedback adjust the connections between these layers.
This model proposes that we learn irregular past tense verbs via error feedback, and not really based on rules.
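
As a rough illustration of the trigram idea behind Wickelfeatures, the sketch below splits a verb into overlapping three-unit chunks with a boundary marker. It uses letters as a stand-in for sounds and is not the model's actual encoding; the function name `trigrams` and the `#` boundary symbol are assumptions for this example.

```python
def trigrams(word, boundary="#"):
    """Split a word into overlapping three-character chunks, marking word edges."""
    padded = boundary + word + boundary
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


present_tense = trigrams("come")
past_tense = trigrams("came")
# Chunks shared by both forms versus chunks that differ between them.
shared = set(present_tense) & set(past_tense)
print(present_tense, past_tense, shared)
```

Representing each verb as a set of chunks like this, rather than as a whole word, is what lets a network adjust connections between present and past tense forms piece by piece from error feedback.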

52
Q

Which model of language acquisition does Steven Pinker highly criticise?

A

The Model of Past Tense Acquisition proposed by Rumelhart and McClelland in 1986.
He criticised it for a myriad of reasons, one being that it does not explain or predict the learning of all irregular past tense verbs.

53
Q

What are some of the criticisms of Parallel Distributed Processing Models, such as ChatGPT, as models of language acquisition?

A

They are very sensitive to the training set they receive.
They can learn things people cannot learn.
They are difficult to understand - even their creators do not fully understand them, hence the push toward EXPLAINABLE AI.

54
Q

What is the definition of CONTEXT VARIABILITY?

A

The number of documents a word appears in.
That is, the number of contexts a word can be used in.
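
The document-count definition above can be contrasted with plain word frequency in a short sketch. The mini-corpus is invented for illustration, and the function names are hypothetical; real studies would use large corpora such as the subtitle databases mentioned earlier.

```python
from collections import Counter

# Hypothetical mini-corpus; each string stands in for one document.
DOCUMENTS = [
    "the dog chased the ball",
    "the judge read the verdict and the verdict was final",
    "a dog and a cat",
    "the verdict the verdict the verdict",
]


def word_frequency(documents):
    """Total number of times each word occurs across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(doc.split())
    return counts


def context_variability(documents):
    """Number of distinct documents each word appears in."""
    counts = Counter()
    for doc in documents:
        counts.update(set(doc.split()))  # count each word once per document
    return counts


frequency = word_frequency(DOCUMENTS)
variability = context_variability(DOCUMENTS)
print(frequency["verdict"], variability["verdict"])
print(frequency["dog"], variability["dog"])
```

In this made-up corpus "verdict" occurs more often than "dog" but appears in the same number of documents, which shows how the two measures can come apart even though they are usually correlated.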