Reasoning and decision making Flashcards

1
Q

Lecture 1: Judging Probabilities

A

We will focus on some of the ways in which intuitive probability judgments violate the prescriptions of probability theory, and what these patterns of response reveal about how people estimate probabilities. Two broad approaches will be considered:
* The idea that probability (and other) judgments are sometimes “biased” because people use simplifying strategies that reduce effort but are prone to error
* The idea that we should consider the ecological context in which judgments are made, and that apparent biases may be rational responses given the informational and cognitive constraints of the human decision-maker

2
Q

Two approaches to how people make probability judgments

A

The fallibility of probability judgements was central to the Heuristics and Biases research program developed by Amos Tversky and Daniel Kahneman in the 1970s.

The “two systems”/ “heuristics and biases” approaches suggest that probability judgments (and other kinds of judgment) are biased/violate rational prescriptions because the human judge has adopted a quick-and-dirty strategy rather than a more effortful consideration of relevant material.
An alternative framework emphasizes the role of ecological conditions and informational constraints on people’s judgments and decisions. We will consider two examples.

3
Q

The “Availability” Heuristic

A

If an event happens a lot then it will be easy to think of many past instances, so basing judgments on availability is sensible and people’s frequency judgments are often very accurate. However, availability can entail bias if:
1. our experience of past events does not reflect their true frequencies, or
2. events are easy to recall for some reason other than their frequency of occurrence.
One observation that is often taken as evidence for the availability heuristic is that people commonly overestimate the frequency or probability of rare events and underestimate common ones. For example, Lichtenstein et al. (1978) had participants estimate the number of US deaths per year due to 40 causes ranging from very rare (e.g., botulism, with one death per 100 million people) to very common (e.g., stroke: 102 000 deaths per 100 million people).
Participants systematically over-estimated the former (the rare causes) and under-estimated the latter (the common causes): when estimated frequencies are plotted against the true frequencies, the best-fitting curve lies above the line of perfect accuracy for rare events and below it for common ones.
This pattern is often attributed to availability: rare events are often given disproportionate publicity and are correspondingly more mentally-available than their environmental frequency would merit.
However:
* The bias here is in the environment (the over-reporting of rare events) rather than a consequence of a flawed estimation strategy.
* This kind of effect does not directly demonstrate availability-based judgment, because no assessment of the ease-of-retrieval has been made.
* The tendency to over-/under-estimate rare and common events can be explained in other ways. In particular, it can be seen as an instance of the general central tendency of judgment, whereby estimates for extreme items are biased towards the mean of the set. This central tendency is widespread and can be seen as an optimizing strategy – when one is uncertain, guessing the mean of the distribution is sensible – without invoking availability.

A stronger demonstration of the use of the availability heuristic comes from Tversky and Kahneman (1973). Participants listened to a list of 39 names. In one condition the names comprised 19 famous men and 20 less famous women; in another condition it comprised 19 famous women and 20 less famous men.
After listening to the list, some participants had to write down as many names as they could recall; others were asked whether the list contained more names of men or of women.
* In the recall task, participants retrieved more of the famous names (12.3 out of 19) than the non-famous names (8.4 out of 20). That is, famous names were more available.
* Crucially, 80 out of 99 participants judged the gender category that contained more famous names to be more frequent. (E.g., the people given a list of 19 famous men and 20 less famous women reported that there were more men than women in the list.)
It seems that people made their proportion estimates by assessing the ease with which examples of each come to mind. When one category was easier to retrieve (via the fame manipulation) it was judged more frequent, even when it was actually experienced less often.

4
Q

The Conjunction Fallacy

A

The availability heuristic is posited to produce judgments that deviate from the rules of probability theory. A basic axiom of probability theory is that the probability of event “A” cannot be less than the probability of the conjunction “A and B”. However, subjective probability estimates sometimes violate this principle, demonstrating the conjunction fallacy.
For example, Tversky and Kahneman (1983) gave some participants the following problem:
* “In four pages of a novel (about 2,000 words), how many words would you expect to find that have the form _ _ _ _ i n g (seven-letter words that end with ‘ing’)?”
Other participants were asked:
* to estimate the number of words of the form _ _ _ _ _ n _ (seven-letter words with n as the penultimate letter).
All ing words have n as the penultimate letter, so the number of n words must be at least as large as the number of ing words. However, participants violated this principle: they estimated, on average, 13.4 ing words but only 4.7 n words.
Tversky and Kahneman (1983) took this as evidence that people are basing their judgments on the mental availability of relevant instances: it is easy to think of “ing” words (for example, by thinking of words that rhyme) but we are less accustomed to organizing/retrieving words based on their penultimate letter, so n words are harder to retrieve and thus seem rarer. If/when participants apply a more systematic mental search strategy, we would expect the conjunction fallacy to disappear.
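The subset relation here is mechanical: any seven-letter word ending in “ing” necessarily has n as its penultimate letter. As a minimal illustration (the word list below is hypothetical, not from the study), a simple pattern-matching check shows that the “n” count can never be smaller than the “ing” count:

```python
import re

# Hypothetical word list, purely for illustration.
words = ["bathing", "farming", "nothing", "jumping",
         "gardens", "cunning", "almonds", "antonym"]

ing_pattern = re.compile(r"^.{4}ing$")   # seven letters ending in "ing"
n_pattern = re.compile(r"^.{5}n.$")      # seven letters with n as penultimate letter

ing_words = [w for w in words if ing_pattern.match(w)]
n_words = [w for w in words if n_pattern.match(w)]

# Every "ing" word also matches the "n" pattern, so it forms a subset:
assert set(ing_words) <= set(n_words)
print(len(ing_words), len(n_words))  # the second count cannot be the smaller one
```

Participants’ mean estimates (13.4 vs 4.7) reverse this ordering, which is why they count as a violation of the conjunction rule.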

5
Q

Base Rate Neglect

A

Similarity-based judgments are insensitive to prior probabilities: the extent to which I look like the sort of person who might be proficient at ballet is independent of the proportion of ballet dancers in the population, for example.
So, judgments based on representativeness will be largely independent of base rates.
In one demonstration, Kahneman and Tversky (1973) told participants that a panel of psychologists had interviewed a number of engineers and lawyers and produced character sketches of each person. They were told that 5 such descriptions had been randomly selected and that they should rate, from 0-100, the likelihood that each sketch described one of the engineers (rather than one of the lawyers).
Some participants were told that the population from which the descriptions were drawn consisted of 30 engineers and 70 lawyers. Others were told that the population comprised 70 engineers and 30 lawyers. That is, Kahneman and Tversky manipulated the base rates for the two possible outcomes.
Below is an example personality sketch:
“Jack is a 45 year old man. He is married and has four children. He is generally conservative, careful, and ambitious. He shows no interest in political and social issues and spends most of his free time on his many hobbies which include home carpentry, sailing, and mathematical puzzles. The probability that Jack is one of the 30 [or 70, depending on the condition] engineers in the sample of 100 is ______ %”
The descriptions varied in how similar they were to the stereotypical lawyer/engineer.
* Crucially, people judged the probability that Jack is an engineer to be much the same when the description was purportedly drawn from a population of mostly engineers as when it was drawn from a population of mostly lawyers.
This is an example of base rate neglect: the personality description might provide some information about Jack’s likely occupation, but this should be combined with information about the number of engineers and lawyers in the population from which his description was randomly drawn. However, people ignored these base probabilities. Kahneman and Tversky argue that:
1. People assess the extent to which the description of Jack is similar to (or representative of) each of the two categories – lawyers and engineers.
2. To the extent that Jack is more similar to the stereotypical engineer, he is more likely to be judged an engineer.
3. Because this assessment of similarity is independent of the prevalence of lawyers and engineers in the population, the resulting probability judgment is independent of the base rates for these two professions.
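To see what ignoring the base rates costs, here is a minimal sketch of what Bayes’ theorem prescribes. The likelihood ratio of 2 (the sketch being twice as likely to describe an engineer as a lawyer) is purely hypothetical, chosen for illustration:

```python
def posterior_engineer(prior_engineer, likelihood_ratio):
    """p(engineer | sketch), given the prior and the likelihood ratio
    p(sketch | engineer) / p(sketch | lawyer)."""
    prior_odds = prior_engineer / (1 - prior_engineer)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1 + posterior_odds)

# Hypothetical likelihood ratio of 2 under the two base-rate conditions:
print(round(posterior_engineer(0.30, 2.0), 2))  # 30 engineers, 70 lawyers -> ~0.46
print(round(posterior_engineer(0.70, 2.0), 2))  # 70 engineers, 30 lawyers -> ~0.82
```

The same evidence should therefore lead to quite different judgments in the two base-rate conditions, whereas participants gave much the same answer in both.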
More direct evidence for the role of representativeness comes from Kahneman and Tversky (1973), who gave participants the following personality sketch:
“Tom W. is of high intelligence, although lacking in true creativity. He has a need for order and clarity, and for neat and tidy systems in which every detail finds its appropriate place. His writing is rather dull and mechanical, occasionally enlivened by somewhat corny puns and by flashes of imagination of the sci-fi type. He has a strong drive for competence. He seems to have little feeling and little sympathy for other people and does not enjoy interacting with others. Self-centred, he nonetheless has a deep moral sense.”
They were also given a list of 9 academic subject areas (e.g., computer science).
* The prediction group was told that the sketch of Tom was prepared by a psychologist during Tom’s final year in high school, and that Tom is now a graduate student. They were asked to rank the 9 academic subjects by the probability that Tom W. is specializing in that topic.
* The base-rate group was not shown the Tom W. sketch but “consider[ed] all first year graduate students in the US today” and indicated the percentage of students in each of the 9 subject areas – that is, they estimated the base rates for each subject area.
* The representativeness group ranked the 9 subject areas by the degree to which Tom W. “resembles a typical graduate student” in that subject area.
Across the 9 subjects, probability judgments were very highly correlated with representativeness judgments (r = .97) but negatively correlated with base-rate judgments (r= -.65). That is, predictions were based on how representative people perceive Tom W. to be of the various fields, and ignored the prior probability that a randomly-selected student would belong to those fields (base rate neglect).

6
Q

The “Representativeness” Heuristic

A

Kahneman and Tversky also suggested that people use a representativeness heuristic. The idea is that:
* when estimating a probability – for example, how likely it is that a person belongs to a particular category or the probability that an observed sample was drawn from a particular population – people assess the similarity between the outcome and the category (or between the sample and the population).
Suppose that you meet a new person at a party and try to estimate the probability that he or she has tried internet dating. The idea is that you base your judgment on the similarity between the person and your stereotype of internet-daters – that is, on the extent to which the person is representative of the category “people who have tried internet dating”.
More generally, the representativeness heuristic involves “an assessment of the degree of correspondence between a sample and a population, an instance and a category, an act and an actor or, more generally, between an outcome and a model.” (Tversky & Kahneman, 1983, p. 295).
As with the availability heuristic, we can see evidence for this strategy by looking at the biases and axiom-violations that creep into people’s intuitive judgments.

7
Q

The “Anchor-and-Adjust” Heuristic

A

So far we have considered how people use information about the target quantity to reach a probability or frequency estimate. Our judgments are also shaped by candidate response values. In particular, anchoring refers to the assimilation of a numeric estimate towards another value, the anchor.
Anchors can come from many sources. Often, our own recent judgments serve as anchors for the current estimate. For example, Matthews and Stewart (2009) had people estimate the prices of shoes from Top Shop; the judgments on trial n positively correlated with the judgments on trial n-1 for 26 out of 28 participants.
Anchors can also be externally provided, and ostensibly irrelevant. In a famous demonstration, Tversky and Kahneman (1974) spun a wheel of fortune that landed on 10 (a low anchor) for one group of participants and on 65 (a high anchor) for another group. Participants were asked whether the percentage of African countries in the United Nations was more or less than the anchor, and then asked for their best estimate of the true value. The median estimate was 25 in the low anchor condition and 45 in the high anchor condition – that is, the participants’ judgments were pulled towards the anchor values.
Similarly, Chapman and Johnson (1999) had people write down the last 2 digits of their social security number and treat it as a probability (e.g., “14%”). Participants were asked whether the probability that a Republican would win the 1996 US Presidential Election was more or less than this probability, prior to giving their best estimate of the true probability. The larger the anchor, the larger the best estimate, with a correlation of r = 0.45.
The most famous account of anchoring effects is the “anchor-and-adjust” heuristic; the idea is that we use the anchor as an initial estimate of the target value and adjust from that starting point in the right direction; because the adjustment is effortful, we often adjust insufficiently and so our judgment is biased towards the anchor value.
This probably happens sometimes, but there are contraindications. For example, in the “wheel of fortune” task described above, warning people about anchoring effects and/or giving them an incentive to be accurate often has little effect on the extent to which people anchor on the provided value (e.g., Epley & Gilovich, 2005), which doesn’t fit with the idea that the anchoring effect reflects a “lazy” or “intuitive” judgment system that can be over-ridden by effortful deliberation.
Other mechanisms that might contribute towards anchoring/assimilation effects include:
* The idea that consideration of the anchor as a possible value for the estimated quantity activates relevant semantic knowledge (e.g., when considering 12% as a possible value for the probability of a Republican win, we call to mind relevant information about the state of the economy, public perceptions of the candidates, etc.); this activated knowledge then shapes or biases our final estimate (Chapman & Johnson, 1999)
* The idea that an anchor value changes our perception of the magnitude of other candidate values (e.g., if we’ve just been thinking about a 12% probability, 50% seems quite large; if we’ve been considering 88%, 50% seems quite small; Frederick & Mochon, 2011).
* The idea that externally-presented anchors may be seen as a “hint” or suggestion, even if they are ostensibly uninformative (after all, doesn’t the fact that the experimenter is getting me to consider a number generated by a wheel of fortune suggest that they want me to be influenced by it in some way?)
These possibilities are not mutually exclusive – and note that they do not all fit with the idea that anchoring stems from the application of quick-and-easy-but-biasing heuristics.

8
Q

Ecology and Adaptation-Example 1: Natural Frequency Formats

A

One example of an “ecological” argument comes from the effect of natural frequencies on base rate neglect. In the “two systems”/“heuristics and biases” view, the problems caused by using availability or representativeness as the basis for probability judgments can be overcome by evoking “System 2” – i.e., by employing a more effortful processing strategy. Consistent with this, there is evidence that people can discount a potentially-misleading but readily-accessible cue such as stimulus familiarity (e.g., Oppenheimer, 2003). But can we do anything other than alert people to possible bias and/or tell them to put more effort into their judgment?
Some researchers argue that people do much better at probability tasks when the information is presented in a way that matches our supposed “evolved” cognitive capacities for handling this kind of information. In particular, it has been argued that humans evolved to process frequencies (counts) obtained by sampling the environment, rather than normalized probabilities.

For example, consider the following problem:
“For a woman at age 40 who participates in routine screening, the probability of breast cancer is 1%. If a woman has breast cancer, the probability is 80% that she will have a positive mammogram. If a woman does not have breast cancer, the probability is 10% that she will still have a positive mammogram.
Imagine a woman from this age group with a positive mammogram. What is the probability that she actually has breast cancer?”

When Eddy (1982) gave this problem to physicians, 95 out of 100 gave estimates between 0.70 and 0.80. An estimate of 80% demonstrates the inverse fallacy: it confuses the probability of a positive test result given the presence of cancer, p(positive|cancer), with the probability of cancer given the positive test result, p(cancer|positive). These probabilities are not the same: the chances of cancer given a positive test depend on the base rate (prior probability) of cancer in the population. A positive test is more likely to indicate cancer when cancer is widespread than when it is very rare. But the physicians (and most people) tend to ignore this base rate information.

Probability theory tells us how we should update our beliefs (e.g., that a person has cancer) in the light of new information. Suppose we have a hypothesis H and know that the prior probability that it is true is p(H) and the probability that it is false is p(not-H) = 1 − p(H). We then encounter some new data, D. The conditional probability of obtaining those data under the hypothesis is p(D|H). (That is, p(D|H) is the probability of obtaining these data if the hypothesis is true).
Bayes’ theorem tells us how we should update our beliefs to give the posterior probability that H is true, given our prior belief and the new data:

p(H|D) = p(D|H) p(H) / [ p(D|H) p(H) + p(D|not-H) p(not-H) ]

In the cancer example:
* H is the hypothesis that the person has cancer
* p(H) is the base rate of cancer in the population (the prior probability that a randomly selected person has cancer) and equals 0.01
* p(not-H) is the prior probability that a person does not have cancer and equals 0.99
* p(D|H) is the probability of getting a positive test result given that the person has cancer, and equals 0.8
* p(D|not-H) is the probability of getting a positive test result given that the person does not have cancer, and equals 0.1
Thus:

p(H|D) = (0.8 × 0.01) / (0.8 × 0.01 + 0.1 × 0.99) = 0.008 / 0.107 ≈ 0.075

In other words, given the positive test result the probability that the person has cancer is still only about 7.5%.
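As a minimal sketch, the same calculation in code (using the values given above):

```python
def bayes_posterior(prior, p_data_given_h, p_data_given_not_h):
    """Posterior p(H|D) from Bayes' theorem (two mutually exclusive hypotheses)."""
    numerator = p_data_given_h * prior
    denominator = numerator + p_data_given_not_h * (1 - prior)
    return numerator / denominator

# p(cancer) = .01, p(positive | cancer) = .8, p(positive | no cancer) = .1
print(round(bayes_posterior(0.01, 0.8, 0.1), 3))  # 0.075
```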
Gigerenzer and colleagues have argued that probabilities only make sense when conceived as long-run frequencies, and that it does not make sense to talk about the probability of a one-off event (e.g., that a given person has a disease). Rather, Gigerenzer argues that humans evolved to keep track of event frequencies, estimated over time by “natural sampling” (i.e., encountering different types of events and remembering the number of times they occur).
Correspondingly, if we re-express the diagnosis problem in terms of natural frequencies (number of events of each type) rather than normalized probabilities, then people should find it much easier.
Consider this re-expression of the previous problem:
“Ten out of every 1000 women at age 40 who participate in routine screening have breast cancer. Of these ten women with breast cancer, eight will have a positive mammogram. Of the remaining 990 women without breast cancer, 99 will still have a positive mammogram.
Imagine a group of 40 year old women with positive mammograms. How many of them actually have breast cancer? ____ out of _____”
As Hoffrage and Gigerenzer (1998) note, now the answer can easily be “seen” to be 8 out of 107 = 7.5%. They found that only 8% of physicians answered correctly (gave a judgment within 5% of the true value) in the original wording of the task, but that this increased to 46% with the natural frequency format.
More generally, this representation of the problem means that the answer is simply the number of true positives divided by the total number of positives; there is no need to keep track of the base rate, explaining base-rate neglect when problems are presented in standard probability format. In other words, the task is difficult in the original version because the use of normalized probabilities (which necessitate the explicit incorporation of base rates/priors) deviates from how we “naturally” evaluate chance.
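One way to see the “natural sampling” idea is to simulate it: encounter women one at a time with the stated rates, keep simple counts, and the answer drops out as a ratio of counts with no explicit base-rate term. This is only an illustrative sketch (the sample size is arbitrary):

```python
import random

random.seed(1)
true_pos = false_pos = 0

# Simulated natural sampling with the rates from the screening problem.
for _ in range(100_000):
    has_cancer = random.random() < 0.01              # 1% base rate
    positive = random.random() < (0.8 if has_cancer else 0.1)
    if positive:
        if has_cancer:
            true_pos += 1
        else:
            false_pos += 1

# The judgment is just "positives with cancer / all positives":
print(round(true_pos / (true_pos + false_pos), 3))   # close to 0.075
```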
Similar evolutionary ideas have been advocated by others (e.g., Cosmides and Tooby, 1996), but there are alternative explanations for why the natural frequency format makes the task easier. In particular, it has been suggested that it simply clarifies the set relations between the various event categories, and that any manipulation which achieves this will have the same beneficial effect.
Irrespective of the basis for the natural frequency format effects, some authors have argued that base rate neglect is not as common as the heuristics-and-biases programme would have us believe (e.g., Koehler, 1994). Quite often in natural contexts, people are sensitive to prior probabilities and update their beliefs appropriately.

9
Q

Example 2: The Gambler’s Fallacy and Hot Hand Fallacy

A

A different illustration of potential “ecological rationality” comes from a consideration of how people judge the probability of streaks of a random outcome. For a sequence of independent events, the probability of a particular outcome is the same irrespective of what has happened in the past: the probability of getting “heads” from a fair coin is the same after a run of 3 heads as after a run of 3 tails. However, subjective probabilities often violate this independence axiom.
For example, Croson and Sundali (2005) examined roulette betting patterns from a Nevada casino. They focused on “even money” outcomes (where the two outcomes are equally likely, such as “red or black” or “odd or even”; if you bet on the right outcome, you get back twice what you staked) and looked at bets as a function of the number of times that an outcome had occurred in a row (e.g., a streak of two would mean that the last two spins had both come up red or had both come up black).
Croson and Sundali examined the proportion of bets that were “with” and “against” the streak at each run length. As the run-length increased, people were increasingly likely to bet that the next outcome would be the opposite of the streak; after a run of 6 or more, 85% of bets were that the streak would end, even though this probability remains fixed at .50.
The belief that a run of one outcome increases the probability of another (when the events are actually independent) is called the gambler’s fallacy.
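A quick simulation (an illustrative sketch, not from the lecture) makes the independence point concrete: for a fair, independent process, the probability that the next outcome repeats the streak stays at about .50 however long the run has been:

```python
import random

random.seed(0)
spins = [random.choice("RB") for _ in range(200_000)]  # fair, independent outcomes

def p_streak_continues(run_length):
    """Estimate p(next outcome matches the streak) after a run of `run_length`."""
    continues = total = 0
    for i in range(run_length, len(spins)):
        run = spins[i - run_length:i]
        if len(set(run)) == 1:                 # the last `run_length` spins matched
            total += 1
            continues += spins[i] == run[-1]
    return continues / total

for k in range(1, 7):
    print(k, round(p_streak_continues(k), 3))  # all close to 0.5
```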
The gambler’s fallacy is often attributed to the representativeness heuristic: people expect a “local” sequence to be representative of the underlying process (Tversky & Kahneman, 1974). I know that a coin should, in the long run, produce equal numbers of heads and tails, so I expect any sequence of coin tosses to have this property. A run of heads means that a tails outcome will make the local sequence more representative of the data-generating process.
The gambler’s fallacy is widespread, but sometimes people show the opposite tendency by believing that a streak elevates the probability that the same outcome will occur again. In a famous demonstration, Gilovich et al. (1985) found that basketball players’ shooting accuracy was independent of their recent performance: the probability of scoring was the same after a run of “baskets” as after a run of “misses”. However, basketball fans believed that a player’s next shot was more likely to score after a run of successful shots than after a run of misses – a so-called “belief in the hot hand” or “hot hand fallacy”.
Gilovich et al.’s statistical analysis has been questioned (it is hard to establish that the outcomes of each basketball shot really are independent events), but the hot hand fallacy has been found in situations where the success of consecutive attempts really cannot be any guide to future performance. For example, Ayton and Fischer (2004) had people play a roulette-style game and found that their confidence in their predictions for the next outcome was greater after a run of successful predictions – even though the probability of them being right next time must be independent of past success because the roulette spins are random.
Belief in the hot hand has again been attributed to the representativeness heuristic: a run of one outcome doesn’t seem representative of randomness, leading people to conclude that the underlying process is not random (Gilovich et al., 1985).

Some researchers have objected that it is problematic to use the same mechanism to “explain” two completely contradictory findings: the belief that a streak will end (the gambler’s fallacy, GF) and the belief that it will continue (the hot hand, HH).
Ayton and Fischer (2004) therefore offered an alternative account, based on ecological considerations. Their argument runs:
* Many physical processes involve sampling without replacement, which results in diminishing probability for a given outcome the more times that it has occurred. For example, if you rummage blindly in your cutlery drawer for spoons, removing the spoons as you find them, then the probability that the next item will be a spoon decreases as your hunt progresses.
* Correspondingly, the GF reflects an over-generalization of this ecologically-sensible principle to other random, mechanical processes – e.g., roulette wheels and coin tosses – about which we have very limited experience.
* By contrast, many aspects of intentional human performance really do show positive recency. If you practice a new game, your shooting success will increase. So the hot hand fallacy can be seen as an appropriate generalization of this principle to situations which also use human performance, but where the outcome probabilities are in fact independent.
In support of these ideas, Ayton and Fischer (2004) presented sequences of outcomes with varying alternation rates (AR; a low AR means the next outcome is unlikely to be different from the last, giving many long runs of one outcome; a high AR means lots of short runs). Participants had to judge which of two processes generated each sequence (e.g., a series of basketball shots or a sequence of coin tosses). As the streak length increased, participants were more likely to attribute the sequence to intentional human performance like basketball than to a random mechanical process like coin-flipping.
A related but distinct account comes from elegant work by Hahn and Warren (2009). With an infinitely long sequence of coin flips, all sequences of a given length occur with equal probability – for example, the sequence HHHH will occur with the same frequency as HHHT, so believing that a run of heads means it’ll be tails next time is indeed a fallacy. However, Hahn and Warren noted that humans do not experience or remember infinitely-long sequences – and for shorter sequences, the probabilities of encountering HHHT and HHHH are not equal. In one illustration, Hahn and Warren simulated 10,000 sequences of 10 coin flips. The pattern HHHH appeared in only about 25% of the sequences, whereas HHHT occurred in about 35% of the simulated samples. In other words, if we had 10,000 people each of whom had experienced 10 flips of a fair coin, it would be perfectly reasonable for more of them to expect a sequence HHH to end with a T than with another H.
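A sketch of that kind of simulation (the seed is arbitrary, so the exact percentages will vary slightly from run to run):

```python
import random

random.seed(0)
n_sequences, length = 10_000, 10
counts = {"HHHH": 0, "HHHT": 0}

for _ in range(n_sequences):
    seq = "".join(random.choice("HT") for _ in range(length))
    for pattern in counts:
        counts[pattern] += pattern in seq     # does the pattern occur anywhere?

for pattern, count in counts.items():
    print(pattern, round(count / n_sequences, 2))
# Roughly .25 of the short sequences contain HHHH but roughly .35 contain HHHT,
# in line with the figures quoted above.
```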
This work provides one example of a broader principle – namely, that the supposed “fallacies” of human judgment and decision-making are often perfectly rational given the finite and imperfect information afforded by the environment and our limited mental capacities.

10
Q

Conclusion for judging probability

A

Conclusions
We have identified several key phenomena and ideas:
1. Human probability judgments do not always follow the laws of probability, and these deviations illuminate the judgment process.
2. One broad framework posits the use of simplifying strategies that reduce effort at the expense of sometimes introducing bias. In particular, people sometimes simplify judgments by substituting an easier-to-evaluate entity for the target dimension: the availability and representativeness heuristics are two examples.
3. Judgments often assimilate towards anchor values. There are many types of anchor and many mechanisms that underlie this assimilation.
4. We can also consider probability judgments in their ecological context. One idea is that humans evolved to process frequencies, not normalized probabilities, although this interpretation of frequency-format effects is debatable.
5. Likewise, we can see phenomena such as the gambler’s fallacy as reflecting ecological experience with different types of generating process.
6. We have focused on probability judgments, but these kinds of ideas and effects apply to many other kinds of judgment.

11
Q

Lecture 2: Reasoning

A

We will discuss how human reasoning deviates from the prescriptions of formal logic in a range of tasks, and how the systematic patterns of success and failure on these tasks inform our theorizing about the mental operations that underlie human reasoning. Four broad approaches will be considered:
1. People may solve reasoning problems by using simple heuristics (rules of thumb) rather than engaging actual reasoning processes
2. People often make “errors” because the use of language in formal logic differs from that of everyday life
3. The “Mental Models” framework provides an example of an algorithmic description of the steps by which people reason
4. Responses in reasoning tasks are highly sensitive to the framing of the task and the participant’s background beliefs

12
Q

Two types of reasoning

A

Inductive reasoning involves drawing general conclusions from particular instances. For example, given the premise “I have fallen asleep in every Psychology lecture so far”, one might draw the conclusion “I will always fall asleep in Psychology lectures”. Inductive reasoning takes many forms and is central to scientific research, but the conclusions are not necessarily true; there is always the possibility that the next Psychology lecture will manage to hold your attention throughout.
Deductive reasoning involves drawing conclusions which follow necessarily from the premises; if we accept that the premises are true, and if the argument follows the rules of logic, then the conclusion has to be true, too.
We will focus on two kinds of deductive reasoning – propositional reasoning and syllogistic reasoning.

13
Q

Two types of deductive reasoning

A

propositional reasoning and syllogistic reasoning.

14
Q

Syllogistic reasoning

A

The study of Aristotelian syllogisms (aka quantitative syllogisms) provides an alternative approach to the psychological processes that underlie reasoning. Syllogisms typically comprise two premises and a conclusion, and involve the quantifiers all, no, some, and some…not.
The following is an example:
All people who teach psychology are psychologists
Jon teaches psychology
Therefore, Jon is a psychologist
Such arguments may be valid or invalid. Validity is determined by the structure of the argument – the relations between the premises and the conclusion. A valid argument is one where, if one accepts the truth of the premises, then the conclusion is also true. The above example is a valid argument. Of course, one might not accept the premises (in fact, Jon doesn’t have a degree in psychology, he just works as one), but that doesn’t change the validity.
The combination of quantifiers (all, no, some, some…not) and order of terms (e.g., all a are b vs all b are a) gives a total of 512 two-premise syllogisms, most of which are regarded by logicians as invalid.
Studies of syllogistic reasoning typically present the two premises and either ask participants “what follows?” or present a conclusion and have them indicate whether it is valid or invalid.
Imperfect performance
Despite their simple structure, syllogistic reasoning problems can be very hard. For example, in a review of the literature, Roberts and Sykes (2005) found that problems of the form: “all a are b; all b are c; what follows?” were correctly solved by 88% of participants (valid conclusion: “all a are c”). However, given a problem of the form: “all b are a; all b are c; what follows?” only 8% of participants correctly concluded that “some a are c” (or, equivalently, that “some c are a”).
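For readers who want to check validity claims like these for themselves, here is a minimal brute-force sketch (not part of the lecture): a syllogism is treated as valid if the conclusion holds in every possible “world”, where a world just specifies which combinations of the three categories are inhabited. Following the Aristotelian treatment assumed here, each category is required to be non-empty (existential import):

```python
from itertools import product

# Each individual has one of 8 "types": membership in categories a, b, c.
TYPES = list(product([False, True], repeat=3))
A, B, C = 0, 1, 2

def worlds():
    """All choices of which types are inhabited, keeping only worlds where
    a, b and c are each non-empty (existential import)."""
    for inhabited in product([False, True], repeat=len(TYPES)):
        present = [t for t, keep in zip(TYPES, inhabited) if keep]
        if present and all(any(t[i] for t in present) for i in range(3)):
            yield present

def all_are(w, i, j):
    return all(t[j] for t in w if t[i])

def some_are(w, i, j):
    return any(t[i] and t[j] for t in w)

def valid(premises, conclusion):
    return all(conclusion(w) for w in worlds() if all(p(w) for p in premises))

# "All a are b; all b are c" entails "all a are c":
print(valid([lambda w: all_are(w, A, B), lambda w: all_are(w, B, C)],
            lambda w: all_are(w, A, C)))    # True

# "All b are a; all b are c" entails "some a are c", but not "all a are c":
print(valid([lambda w: all_are(w, B, A), lambda w: all_are(w, B, C)],
            lambda w: some_are(w, A, C)))   # True
print(valid([lambda w: all_are(w, B, A), lambda w: all_are(w, B, C)],
            lambda w: all_are(w, A, C)))    # False
```

Enumerating inhabited types is enough because, for syllogisms, only which of the eight membership patterns are occupied matters, not how many individuals there are.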
By studying how structural features of the problem change performance, we can try to develop models of how people go about solving this kind of problem. We consider four approaches to understanding performance on these kinds of reasoning task.

15
Q

four approaches to understanding performance on syllogistic reasoning task.

A

Approach 1: Identify simplifying strategies
Approach 2: Focus on interpretation of the terms
Approach 3: Posit a sequence of processing steps – the “Mental Models” framework
Approach 4: Consider the role of framing and experience

16
Q

Approach 1: Identify simplifying strategies

A

One suggestion is that many people do not actually engage in any reasoning at all when confronted with syllogistic reasoning problems. Rather, they may base their responses on simple heuristics.
An early example is atmosphere theory, according to which the mood of the premises influences judgments about what the mood of the conclusion should be. “Mood” means whether the statement is affirmative or negative, and whether it is universal or particular. (E.g., “all…” is universal and affirmative, whereas “some are not…” is particular and negative).
Begg and Denny (1969) gave participants 64 reasoning problems comprising two premises and a choice of four conclusions. For example:
All a are b
All b are c
(Choice of conclusions: All c are a; Some c are a; No c are a; Some c are not a)
Participants indicated which if any of the 4 conclusions followed from the premises. Nineteen of the 64 problems had a valid solution among the 4 options presented; the authors focussed on responses for the other 45 problems, where choosing any of the options constituted an error.
* When both premises were positive, 79% of conclusions endorsed were positive
* When at least one premise was negative, 73% of chosen conclusions were negative
* When both premises were universal, 77% of chosen conclusions were universal
* When at least one premise was particular, 90% of chosen conclusions were particular
So this is evidence that the “atmosphere” (quality and quantity) of the premises shapes beliefs about the validity of different possible conclusions – e.g., universal premises lead people to assert universal conclusions.
Crucially, however, atmosphere theory fails to explain how participants decide whether or not a syllogism has a valid conclusion at all: when participants are given two premises and asked “what follows?”, they correctly identify that there is no valid inference 29-40% of the time (Roberts & Sykes, 2005). The idea that their conclusions are guided by the “atmosphere” of the premises doesn’t capture this.

17
Q

Approach 2: Focus on interpretation of the terms

A

“Errors” in syllogistic reasoning partly reflect differences between the use of language in formal logic and in everyday life.
For example, consider two arguments:
VALID:
All A are B
All B are C
Therefore, All A are C

INVALID:
All A are B
All C are B
Therefore, All A are C
If I take “All C are B” to mean that “All C are B and vice-versa” then the invalid argument on the right would be equivalent to the valid one on the left, and it would be fine to accept the conclusion. Likewise, in logic “Some” means “Some and perhaps but not necessarily all”, but in everyday speech we typically use “Some” to mean “Some but not all”.
In one demonstration, Ceraso and Provitera (1971) presented wooden blocks and had people reason about their properties. In the “traditional” version of the task, people were given syllogisms such as:
All blocks with holes are red
All blocks with holes are triangular
Only 1 out of 40 people correctly identified “Some red blocks are triangular” as the valid inference; more than half endorsed “All red blocks are triangular”, which is what we’d expect if they take “All A are B” to imply “All B are A”.
In a modified version of the task, people were given more explicit instructions about the interpretation of the premises, such as:
Whenever I have a block with a hole it is red, but not all red blocks have holes
Whenever I have a block with a hole it is triangular, but not all triangular blocks have holes
The proportion of people who correctly responded “Some red blocks are triangular” rose to 27 out of 40. Across a number of such problems, people scored an average of 58% correct with the traditional format but 94% correct with the modified versions.
So, these authors argue that syllogistic reasoning errors arise because people don’t properly apprehend the premises in the way that the experimenter intends. However, it is unlikely that premise misapprehension accounts for the full spectrum of performance on this kind of task.

18
Q

Approach 3: Posit a sequence of processing steps – the “Mental Models” framework

A

A more sophisticated and very wide-ranging account of the mental operations that underlie reasoning comes with the mental models framework developed by Philip Johnson-Laird (e.g., Bucciarelli & Johnson-Laird, 1999; Johnson-Laird, Byrne, & Schaeken, 1992). The mental models approach has been applied to many types of reasoning problem; it posits that reasoning involves three stages:
1. Comprehension: use language and background knowledge to construct a mental model of the state of the world that is implied by the premises
2. Description: combine the models implied by the premises into a composite, and use this to try to draw a conclusion that goes beyond re-iterating the premises.
3. Validation: search for alternative models. If all of these are consistent with the initial conclusion, it is judged valid; if one or more of the new models contradict the conclusion, reject it and try to construct an alternative which can then be validated.
For example, consider the following:
All Psychologists are Comedians
All Comedians are Psychopaths
What follows?

We start by constructing a model of the first premise:
Psychologist Comedian
Psychologist Comedian
…

Each row represents a conjunction of items (e.g., a psychologist-comedian); you generate an arbitrary number of instances of each case – I have listed two above, but for simplicity we could list just one. (Sometimes authors would write the “Psychologist” exemplars in square brackets to signify that the Psychologist item is exhaustively represented – it cannot appear in any other possibility and must always be paired with “Comedian”.) The three dots signify that there are other instances and possibilities that could be represented but which aren’t yet included in the model. In particular, we could include something that is a non-psychologist comedian, or a non-psychologist non-comedian.
Likewise, the second premise leads to a model like this:
Comedian Psychopath
Comedian Psychopath
…

During the “description” stage, the reasoner attempts to construct an integrated representation of the information in the premises. For this example, only one such model can be constructed:
Psychologist Comedian Psychopath
In this case, the “validation” step would fail to find any other models, from which it follows that “All Psychologists are Psychopaths” – the correct inference (well, the valid conclusion!).
This kind of one-model syllogism should be relatively easy to solve because there is only one model that is consistent with the premises. Other multiple-model syllogisms are more challenging, because there are several possible ways of combining the information in the premises. For example:
No Artists are Bakers
All Bakers are Candlemakers
What follows?

Here we might construct an initial model (using A, B, and C for the Artists, Bakers, and Candlemakers):
A
B C

Which would lead to the preliminary conclusion that “No Artists are Candlemakers” (or that “No Candlemakers are Artists”).
However, searching for alternative models during the validation step reveals that a second model is possible:
A
A C
B C

This model acknowledges the possibility of an artist-candlemaker, which refutes the initial conclusion. A new conclusion, consistent with both of the models, is that “Some Artists are not Candlemakers”. However, a third model can also be constructed:
A C
B C

Again, this refutes the previous conclusion. The only conclusion that is consistent with all three mental models is that “Some Candlemakers are not Artists”.
The mental models approach also describes what happens when there is no valid conclusion. For example, consider:
No Aardvarks are Bigots
No Bigots are Chocolate-lovers
What follows?

An initial model might be:
A
B
C

Leading to the conclusion “No A are C”. However, one can also construct a model:
A C
B
in which all A are C. There is no conclusion that is consistent with both models, so the correct response is “no valid conclusion”.
So:
* If a reasoner fails to consider all of the alternative models, he or she is less likely to draw the correct inference – so multiple model syllogisms will be harder than single-model ones
* Considering more models will require more time, effort, and processing-capacity – so multiple-model syllogisms will take longer to solve, and people with greater working memory, or those with more time/inclination to work on the task, will do better
Copeland and Radvansky (2004) tested these predictions:
* First, participants completed a working memory span assessment
* Next, participants were presented with syllogisms such as “All cyclists are coffee drinkers; All coffee drinkers are surgeons” along with all 9 possible conclusions (the 8 combinations of the two end terms “Cyclists” and “Surgeons” with the four quantifiers “All”, “None”, “Some” and “Some…not”, plus the option “no valid conclusion”).
The following table shows accuracy and response-time data, organized according to the number of models supported by each syllogism:

Syllogism type	% correct	RT (s)
One-model	87	25
Two-model	40	29
Three-model	34	33

As you can see:
* Problems with more possible mental models were solved less accurately, consistent with people failing to consider all of the possible states implied by the premises
* Problems with more possible models were solved progressively more slowly, consistent with it taking time to construct each model and check the validity of preliminary conclusions
* In addition, participants with higher WM span were more accurate and faster at the reasoning tasks, particularly for more complex syllogisms, consistent with model-construction being a resource-intensive activity
* Analysis of the response choices showed that they were better predicted by the mental models theory than by simple heuristics such as the “atmosphere” approach described above
These data fit with the idea that model construction is a time- and resource-demanding activity. However, they are not direct evidence for the mental model strategy – indeed, responses were similarly well-described by an alternative “probability heuristics” model.
Newstead, Handley, and Buck (1999) sought more direct evidence. Participants were given premises such as:
All of the buskers are computer operators
None of the computer operators are boxers
Participants had to write down the conclusion (or indicate that there was no valid conclusion). Straight after answering each problem, participants were given a list of the 9 possible conclusions and indicated which they had considered when coming up with their response. The results were as follows:

Measure	Single-model syllogisms	Multiple-model syllogisms	Indeterminate syllogisms (no valid conclusion can be drawn)
% correct	70	12	19
Number of conclusions considered	1.05	1.12	1.12

So, although multiple-model syllogisms are harder, people did not try to construct more models for them. In addition, there was no correlation between the number of models constructed and the proportion of syllogisms solved correctly.
Newstead et al. (1999, p.354) argue that “reasoners are able to construct alternative models – for example, when their first model leads to an unbelievable conclusion – but that they normally construct only one”.
If this is correct, a key issue would be: what determines how the initial model is constructed? The next section offers some ideas.

19
Q

Approach 4: Consider the role of framing and experience

A

A final approach to the study of reasoning emphasizes the contribution of background knowledge and the role that reasoning plays in natural conversation. Syllogistic reasoning is affected by the framing of the problem and the participant’s prior experiences. One crucial demonstration came from Evans et al. (1983), who gave people valid and invalid syllogisms with believable and unbelievable conclusions. Examples are shown in the table, along with the mean proportion of people who accepted each argument as valid for each type of problem:

Valid argument, believable conclusion:
No cigarettes are inexpensive
Some addictive things are inexpensive
Therefore, some addictive things are not cigarettes
(accepted by 89%)

Valid argument, unbelievable conclusion:
No addictive things are inexpensive
Some cigarettes are inexpensive
Therefore, some cigarettes are not addictive
(accepted by 56%)

Invalid argument, believable conclusion:
No addictive things are inexpensive
Some cigarettes are inexpensive
Therefore, some addictive things are not cigarettes
(accepted by 71%)

Invalid argument, unbelievable conclusion:
No cigarettes are inexpensive
Some addictive things are inexpensive
Therefore, some cigarettes are not addictive
(accepted by 10%)

In this study, plausibility increased the judged validity of both valid and invalid arguments: judgments about argument validity are influenced by beliefs both about the conclusions themselves and about the probability that those conclusions will be true.
There have been many attempts to explain this so-called belief bias.
* The selective scrutiny hypothesis – another example of a heuristic approach to reasoning – posits that people initially evaluate the plausibility of the conclusion. If it is reasonable, they accept it – without engaging in any actual “reasoning” at all; scrutiny of the logical connection between premises and conclusion only arises when the conclusion is unbelievable.
However, in the example from Evans et al. (1983) above – and in a meta-analysis of similar studies by Klauer et al. (2000) – validity does affect the acceptance of believable arguments (believable conclusions are accepted more often when the argument is valid than when it is invalid), so people cannot be skipping the logic entirely whenever the conclusion is plausible.
* The misinterpreted necessity hypothesis conjectures that people don’t know how to respond when a conclusion is possible but not logically necessary. (E.g. All A are B; All B are C; Therefore, all C are A. It might be true that all C are A, but it is not a logical necessity.) In such cases, they might use believability to make their decision.
However, as you can see in the above example (and in other studies), belief influences acceptance even when conclusions are deductively valid – so the effect is not limited to indeterminate cases.
Rather than belief influencing judgments before (or instead of) reasoning (selective scrutiny) or after reasoning (misinterpreted necessity), some have argued that belief exerts two separate effects: (1) inducing an overall bias to accept/reject the conclusion, and (2) shaping the reasoning process itself.
Klauer et al. (2000) developed a framework which incorporates this idea (see also Evans et al., 2001). The basic ideas are that:
* People typically generate just one mental model, because of capacity limits (see also the evidence from Newstead et al., 1999, above)
* If the conclusion is believable, people attempt to construct a model that is consistent with this claim
* If the conclusion is unbelievable, they attempt to construct a model which refutes this claim (e.g., if the unbelievable conclusion was “All A are C”, they would try to construct a model where “Some A are not C” that was consistent with the information in the premises).
* When the attempt to construct the “desired” model fails, the participant is in a state of uncertainty and will be somewhat swayed by their belief about the base-rate probability that the conclusion is valid.
The nice feature of this theory is that it combines a description of the cognitive operations by which people reason with the idea that these operations will be shaped by prior beliefs and biases – and in so doing it captures the interacting effects of validity, believability, and base-rates reported by Klauer et al. (2000).
However, we need to be cautious. Evidence that belief bias affects reasoning rather than just leading to an overall boost to the probability of responding “valid” comes from the interaction between believability and validity (the effect of believability is larger for invalid arguments). But this analysis uses “proportion correct” as the index of performance. As you might know from studies of perception and memory, using proportion correct means assuming that there is a linear relationship between hit rate and false alarm rate, but this is rare; rather, data such as these are often more appropriately analysed in a signal detection framework. When researchers have applied a signal detection analysis to the effect of believability on people’s ability to discriminate between valid and invalid arguments, they often find a criterion shift (i.e., a bias towards judging all arguments Valid when they have plausible conclusions) but no effect on discriminability (the underlying ability to distinguish valid from invalid arguments). This argues against the idea that prior knowledge/belief qualitatively changes the reasoning process in the manner envisaged by the selective scrutiny or mental models accounts (see, for example, Trippas et al, 2018).
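For reference, the equal-variance Gaussian signal detection measures referred to here can be computed as below. The hit and false-alarm rates are hypothetical (chosen to illustrate a criterion shift with little change in discriminability), not the Evans et al. data:

```python
from statistics import NormalDist

z = NormalDist().inv_cdf   # inverse of the standard normal CDF

def sdt_measures(hit_rate, false_alarm_rate):
    """Hits = accepting valid arguments; false alarms = accepting invalid ones.
    Returns d' (discriminability) and c (response criterion)."""
    d_prime = z(hit_rate) - z(false_alarm_rate)
    criterion = -0.5 * (z(hit_rate) + z(false_alarm_rate))
    return d_prime, criterion

# Hypothetical acceptance rates for believable vs unbelievable conclusions:
for label, hits, fas in [("believable", 0.80, 0.55), ("unbelievable", 0.55, 0.25)]:
    d, c = sdt_measures(hits, fas)
    print(label, "d' =", round(d, 2), "c =", round(c, 2))
# Similar d' values combined with a lower (more liberal) criterion for believable
# conclusions is the signature of a pure bias shift.
```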

20
Q

Propositional Reasoning

A

The same four approaches can be found in studies of propositional reasoning. Propositional reasoning involves reasoning about propositions built from the logical connectives If, And, Not, and Or.

For example, consider the following proposition:

If it is raining, then I take the bus.

Now suppose you learn that it is raining, and infer that I therefore took the bus. This is a valid type of inference, called the modus ponens (MP). Equally, suppose you learn that I did not take the bus, and conclude that it is not raining. This, again, is a valid inference, called the modus tollens (MT). Now suppose you learn that I took the bus and conclude that it must be raining. This is called affirming the consequent (AC) and is usually regarded as a fallacy (I might have taken the bus even though it was sunny). Likewise, you might learn that it is not raining, and conclude that I did not take the bus. This is called denial of the antecedent (DA), and is again regarded as an error (because again, I might take the bus in the sunshine).
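A small sketch (not part of the lecture) makes these definitions concrete: enumerate the states of the world that are consistent with “If it is raining, then I take the bus” read as a material conditional, and check which of the four inferences hold in every consistent state:

```python
from itertools import product

# Each world assigns truth values to "raining" and "took the bus".
worlds = list(product([False, True], repeat=2))

def consistent(raining, bus):
    """Worlds allowed by "If it is raining, then I take the bus"
    (read as a material conditional)."""
    return (not raining) or bus

def follows(premise, conclusion):
    """Does the conclusion hold in every consistent world satisfying the premise?"""
    relevant = [(r, b) for r, b in worlds if consistent(r, b) and premise(r, b)]
    return all(conclusion(r, b) for r, b in relevant)

print("MP:", follows(lambda r, b: r,     lambda r, b: b))      # True  (valid)
print("MT:", follows(lambda r, b: not b, lambda r, b: not r))  # True  (valid)
print("AC:", follows(lambda r, b: b,     lambda r, b: r))      # False (fallacy)
print("DA:", follows(lambda r, b: not r, lambda r, b: not b))  # False (fallacy)
```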

Studies of this kind of reasoning task often use abstract constructions such as “if p then q” to minimize any contribution of background knowledge/beliefs about the likely truth of different possible conclusions. Schroyens et al. (2001) collated data from many experiments and examined the frequency with which people made each of the four types of inference, as follows (with reminders of how these would apply to the “if it is raining…” example in parentheses).

* Modus ponens (MP) (It is raining; conclude I took the bus): endorsed on 96.8% of occasions
* Denial of the antecedent (DA) (It is not raining; conclude I did not take the bus): 56.0%
* Affirmation of the consequent (AC) (I took the bus; conclude it is raining): 64.0%
* Modus tollens (MT) (I did not take the bus; conclude it is not raining): 74.2%

Clearly, people frequently commit the “fallacies” of denying the antecedent and affirming the consequent.

Related to propositional reasoning is a famous task – the four-card selection task – developed by Wason (1968). Wason laid out four cards in front of participants and told them that each card has a number on one side and a letter on the other. The cards were like this:

D K 3 7
They were also given the conditional sentence:
“If there is a D on one side, then there is a 3 on the other side.”
The Experimenter pointed to each card in turn and asked whether knowing what was on the other side would allow him to find out whether the rule was true. (In more recent versions, the participant is simply asked to indicate only those cards that they’d need to turn over to see whether the rule is true or false).
Turning over the D card is a fairly obvious step; if there was anything other than a 3 on the reverse, it would mean the rule was false, so this is a “correct” card choice. Typically, a large proportion of participants also choose the 3 card, presumably thinking that there should be a D on the other side. However, this is not the “right” answer; the rule doesn’t say that there has to be a D on the other side of every 3, so this card is irrelevant. The correct choice is the 7, because if there is a D on the other side of that then the rule is false.
However, in Wason’s original study, only 1 out of 34 people chose D and 7. So people’s reasoning deviates from formal logic.
More generally, the rule that participants have to test can be phrased as:
“If P then Q”
and the cards as:
P not-P Q not-Q
89% 16% 62% 25%

with the correct choice being P and not-Q. The numbers below each option are the proportions of participants who selected each card, collapsing across a large number of experiments using the task (Oaksford and Chater, 1994). Clearly, the “failure” to choose the P,not-Q combination is widespread.
Early theorizing attributed this to a confirmation bias – a tendency to seek evidence that the rule is true rather than trying to falsify it. Indeed, the task was seen as relevant to broader debates about how scientists can and should approach theory testing. However, this idea was quickly shown to be inadequate (see below).
Again, we can consider four approaches to understanding performance on these kinds of reasoning problems.

Approach 1: Identify simplifying strategies
Just as for syllogistic reasoning, one approach to propositional reasoning involves identifying the simplifying strategies (heuristics) that people sometimes use to reach a solution. For the 4-card selection task, one potential strategy was uncovered by Evans and Lynch (1973), who varied the conditional rule that participants had to test. Examples, along with the proportion of people selecting each card, are shown below:
Cards: S (P), 9 (Q), G (not-P), 4 (not-Q)

“If there is an S on one side, then there will be a 9 on the other side” (If P then Q):
S 88%, 9 50%, G 8%, 4 33%

“If there is an S on one side, then there will not be a 9 on the other side” (If P then not-Q):
S 92%, 9 58%, G 4%, 4 8%
Confirmatory testing should lead people given the second rule to select S and 4 (i.e., to seek instances where the rule is true), but in fact participants selected the cards which were mentioned in the rule (S and 9). In the case of “if P then not-Q”, this actually leads to the logically correct response!
Thus, simply choosing items that are explicitly mentioned in the problem statement – a “matching heuristic” – might be one simplifying strategy when faced with this kind of task.

Approach 2: Focus on interpretation of the terms
As for syllogistic reasoning, propositional reasoning “errors” may often reflect the participant’s interpretation of the terms. For example, in the selection task: “If there is a D on one side…” might be taken to mean “If there is a D on the top of the card [i.e., on the part I can see]…”. In this case, the only card you need to turn over to check the rule is the “D” card.
Likewise, some people might take “If” to be biconditional – i.e., to mean “If and only if…”, so that “If P, then Q” also means “If Q, then P”. Under this interpretation of “If”, the participant would need to turn over all 4 cards – or just the “P” and “Q” cards if they think the rule applies to the visible faces of the cards.
Gebauer and Laming (1997) argued that the common selection of “P” and “Q” results from just this pattern of understanding of the rule (see also Osman & Laming, 2001). You can see how the biconditional interpretation of “If” would also lead to the Affirmation of the Consequent and Denial of the Antecedent fallacies.

Approach 3: Posit a sequence of processing steps – the “Mental Models” framework
Heuristic responding (e.g., the matching rule) and interpretational confusion don’t capture all of the challenges posed by people’s conditional inferences. For example, why is the Modus Tollens harder than the Modus Ponens? The error rates for these should be the same no matter how people interpret “If”, but the MT is more difficult. Likewise, those accounts don’t describe what actually happens when people engage in reasoning.

Johnson-Laird and colleagues have applied their mental models approach to propositional reasoning, too.
For example, suppose you are given the conditional proposition:
If there is a Circle, then there is a Triangle
The idea is that this leads to an initial model which just relates the items explicitly mentioned in the conditional rule:
Circle Triangle

Like before, each row represents a conjunction of items (a circle paired with a triangle). The categorical premise “There is a Circle” then leads easily to the conclusion “There is a Triangle” (the valid, Modus Ponens argument). Likewise, the initial model leads to the Affirmation of the Consequent (AC) fallacy: from the model, the premise “There is a Triangle” leads to the conclusion “There is a Circle”.
In contrast, being told “There is not a Triangle” leads to no inference because this initial model does not include any representation of “No Triangle” cases. As we saw above, people do often fail to draw the valid, Modus Tollens inference that “There is not a Circle”.
The AC fallacy and the failure to draw the valid MT inference will both be avoided if we expend the mental effort to “flesh out” the other mental models that are consistent with the information in the conditional by constructing models that explicitly incorporate “No circles” and “No triangles”:
Circle Triangle
No Circle No Triangle
No Circle Triangle
With this fully explicit set of possibilities, the “No Triangle” premise leaves us with only one model – that in which there is “No Circle” (i.e., we draw the Modus Tollens inference). Likewise, we avoid affirming the consequent – given the premise “There is a Triangle”, inspecting the models reveals that the presence of a Triangle does not permit a conclusion about the presence or absence of a Circle.
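As a rough illustration of this idea (my own sketch, not an implementation from the Mental Models literature), the fleshed-out models can be represented as a set of explicit possibilities; a categorical premise filters that set, and a conclusion is licensed only if every remaining model agrees on it.

```python
# Illustrative sketch: mental models as a set of explicit possibilities.
fully_explicit_models = [
    {"Circle": True,  "Triangle": True},
    {"Circle": False, "Triangle": False},
    {"Circle": False, "Triangle": True},
]

def conclude(models, premise_key, premise_value, query_key):
    """Keep only models consistent with the premise; report the queried
    item only if all remaining models agree on it."""
    remaining = [m for m in models if m[premise_key] == premise_value]
    values = {m[query_key] for m in remaining}
    return values.pop() if len(values) == 1 else "no conclusion"

# Modus Tollens: "There is not a Triangle" leaves only the No-Circle model.
print(conclude(fully_explicit_models, "Triangle", False, "Circle"))  # False (no Circle)

# Affirming the consequent is avoided: "There is a Triangle" leaves models
# both with and without a Circle, so nothing follows about the Circle.
print(conclude(fully_explicit_models, "Triangle", True, "Circle"))   # "no conclusion"
```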
The same kind of approach can be used for Wason’s selection task (see e.g., Ragni et al., 2018). Just as for syllogistic reasoning, there is disagreement about the adequacy of the mental models approach (e.g., Baratgin et al., 2015).

Approach 4: Consider the role of function and experience
As before, framing and experience shape people’s responses in propositional reasoning tasks, and these effects illuminate the underlying mental processes.
Focusing on the selection task: one early observation was that performance is made easier by using familiar materials. Another key finding is that the selection task seems to be easier when it is cast in terms of familiar social rules rather than abstract symbols.
For example, Griggs and Cox (1982) asked people to imagine that they are police officers responsible for ensuring that people conform to the rule “If a person is drinking beer, then the person must be over 19 years of age”. Each card represented a person, with their age on one side and what they are drinking on the other; the task was to “select the card or cards that you definitely need to turn over to determine whether or not people are violating the rule”.
The cards were:
Drinking a Beer Drinking a Coke Age 22 Age 16
The same participants also completed a standard, abstract version of the task with the rule “If a card has an ‘A’ on one side, then it has a ‘3’ on the other side” and the cards A, B, 2, and 3. (Task order was counterbalanced.)
* With the Abstract version of the task, no participant selected the P,not-Q combination (i.e., no-one chose A and 2).
* With the thematic version, it is much easier to see that we need to turn over the P and not-Q cards (i.e., “Drinking a Beer” and “Age 16”) to see whether the rule has been violated; and 29 out of 40 participants responded correctly.
Memory cueing?
Why did the change of format make the task so much easier? The task has subtly changed from reasoning about the tests needed to establish the truth/falsity of a proposition to identifying cases where a rule has been violated. Reasoning about obligations and permissible behaviours is called deontic reasoning, and is arguably a different type of thinking from that required by the abstract selection task.
One explanation for superior performance with the deontic version of the task is that people have prior experience with the rule in question; Griggs and Cox (1982) interpreted their results as reflecting retrieval of previous experience with the rule (and with instances that would violate it).
Support for this idea comes from cross-cultural studies in which thematic framing only improved performance for participants whose country has a rule of that kind (e.g., Cheng & Holyoak, 1985).
Cheater detection?
Familiarity with the materials cannot be the whole explanation for superior performance with modified versions of the selection task, however, because we also see improvements with rules that are completely novel.
For example, Cosmides (1989) administered versions of the selection task that involved a fictional Polynesian tribe (the Kaluame) and rules such as:
“If a man eats cassava root, then he must have a tattoo on his face”
with cards representing 4 men:
Tattoo No tattoo Eats cassava Eats molo nuts
The task is to see if the rule is being broken. In one condition, the rule was framed as a simple description of co-occurrence observed by an anthropologist. The proportion of participants who chose the “P, not-Q” combination (i.e., “Eats cassava” and “No tattoo”) was only 21%, similar to the proportion seen with abstract materials. However, when the same rule was framed as a social contract (cassava root being an aphrodisiac that the tribe’s Elders decree should be limited to married men, who are distinguished by having facial tattoos), the proportion of people who chose P not-Q rocketed to 75%.
Cosmides argued that humans have an evolved sensitivity to violations of social contracts, which can be thought of as conditionals of the form “If you take a benefit, then you pay a cost”.
In a subsequent test, Cosmides employed a “switched” version of the social contract rule:
“If a man has a tattoo on his face, then he eats cassava root”
Tattoo No tattoo Eats cassava Eats molo nuts
The logically-correct answer is now to turn over the “Tattoo” and “Eats molo nuts” cards, but only 4% of people did this. However, if people are still on the hunt for cheats, they will focus on the “no tattoo” and “eats cassava” cards. 67% of people did this.
Cosmides took these findings as evidence for evolved “social contract algorithms” which underlie human reasoning.
Although very famous, the evolved cheater-detection idea is deeply flawed. One basic problem is that we see facilitation of performance on the selection task (i.e., increased P, not-Q selections) with rules which cannot realistically be described as “If you take a benefit, then you pay a cost”. For example, Manktelow (p.84) reports a study in which the rule was “If you clear up spilt blood, then you must wear rubber gloves”. Approximately 75% of participants correctly chose the P, not-Q cards (i.e., Clearing up Blood, and Not Wearing Gloves) with this framing, but “clearing up spilt blood” doesn’t constitute a “benefit” from a social contract!
Relevance and Utility
A more general approach posits that choices in the selection task depend on the relevance or utility of the various cards to the question that they think they are being asked. There are various versions of this basic idea, but the general framework can be related to some of the foregoing findings:
* The items mentioned in the rule are likely to seem particularly relevant and, as we saw above, the matching bias suggests that people select these irrespective of the rule they are being asked to test
* In the “social contract” versions of the task studied by Cosmides, there is high utility attached to finding a cheat. Likewise, there is value in identifying a nurse who doesn’t wear gloves when clearing up blood
Girotto et al. (2001) provide evidence that it is the perceived relevance/value of the options, rather than the detection of rule-violations, which determines choices in the selection task. Participants took the role of employees in a travel agency, and went through four successive versions of the Selection Task, as follows:
* “True descriptive”. It is 1979. A customer would like to travel to East Africa but is allergic to the cholera immunization. You seek to show the customer that there is a rule that “If a person travels to any East African country, then that person must be immunized against cholera”. There are four cards representing countries and the vaccines they require: Somalia; Sweden; Requires Cholera; Requires None. Which cards do you need to turn over to find out whether the rule is true? 65% of people chose the P,Q combination (Somalia and Cholera); only 9% chose P,not-Q (Somalia, None). This replicated the usual finding of poor performance on the selection task
* “True deontic”. Now your boss asks you to check whether customers have followed the rule. The four cards represent travellers and their immunization status: Mr Neri, Ethiopia; Mr Verdi, Canada; Immunized, Cholera; Immunized, None. Which cards do you need to turn over to see if people have followed the rule? Now only 26% of participants made the P,Q selection (Neri, Cholera), whereas 62% chose P,not-Q (Neri, None)
At this point it looks like framing the task as one of rule-violation boosts the selection of the logically-correct P,not-Q combination, as predicted by theories that argue for specialized systems for deontic reasoning/cheater detection.
But the experiment continued…
* “False descriptive”. Now it is the present day and you are thinking about going to East Africa yourself, and are allergic to the cholera immunization. However, you think that the immunization is no longer required. Your boss disagrees. You are confronted with cards representing countries and their required immunizations: Kenya; Ireland; Requires Cholera; Requires None. Which do you have to turn over to see whether it is true that “If a person travels to any East African country, then that person must be immunized against cholera”? Now 15% chose P,Q (Kenya, Cholera) and 47% chose P,not-Q (Kenya, None)
Although this is not a situation where we are invited to detect rule-breakers, people have been led to the correct P,not-Q selection by the perceived relevance of those options – the framing leads one to regard it as important to establish that the boss is wrong.
The final condition was:
* “False deontic”. It turns out that you were right and that the rule is no longer applicable. Your boss is worried that she may have mis-informed customers, and asks you to check client records to see whether customers have followed the rule. The four cards are: Mr Rossi, Eritrea; Mr Bianco, France; Immunized, Cholera; Immunized, None. 71% chose P,Q (Rossi, Cholera) and only 15% chose P,not-Q (Rossi, None).
Even though the framing is now deontic, people are making the classic selection-task error of choosing P and Q. Why? Because the wording of the task gives great relevance to the possibility that people might have followed a rule unnecessarily (and might therefore sue the company, for example).
These effects reiterate that we have come a long way from studying “pure” propositional reasoning. Indeed, Sperber and Girotto (2002, p.277) argue:
“relevance-guided comprehension processes tend to determine participants’ performance and pre-empt the use of other inferential capacities. Because of this, the value of the selection task as a tool for studying human inference has been grossly overestimated.”

Integrating approaches
The approaches discussed above are not mutually exclusive. Different people will employ different strategies (e.g., heuristic responding vs a rigorous attempt to construct mental models) at different times. Furthermore, the effects of experience and relevance discussed in the preceding section are not incompatible with accounts that emphasize interpretation of terms or the construction of mental models. As just two examples:
* The interpretation of “If” as conditional or biconditional can depend on the content of the rule. For example, Wagner-Egger (2007) used two deontic versions of the 4-card task and probed not just participants’ card selections, but also their understanding of the rule (by asking what would have to be on the reverse of each card, assuming the rule is true). In one version, the rule was “If a customer is drinking beer, then he/she must be over 18 years of age”. Most participants interpreted “If” as a conditional (i.e., you might be over 18 and not drink beer!) and made the “correct” P,not-Q (“beer”, “under 18”) card selection – as we saw in previous work. In another version, the rule was “If a customer spends more than 100 Swiss Francs, then he/she receives a free gift”. Now most participants interpreted the rule biconditionally (i.e., as meaning “if and only if you spend the money do you get the gift”) and the P,not-Q card selection pattern was less frequent than turning over all 4 cards (as one should, if one has adopted the biconditional interpretation of “If”).
* In the Mental Models framework, past knowledge affects the ease with which a full set of models will be fleshed-out. For example, “If it was foggy, then the match was cancelled. The match was not cancelled.” This readily leads to the usually-difficult Modus Tollens conclusion (“It was not foggy”) because existing knowledge of the fog-sports relationship means we readily flesh out the full set of mental models supported by the conditional “If, then” statement.
* Likewise, Mental Models theory can accommodate the effects of relevance and of differing interpretations of the premises (e.g., conditional vs biconditional), by assuming that these factors influence the formulation of the “initial” model and the fully explicit set of models (see Ragni et al., 2018).
The approaches and ideas discussed in this lecture are therefore not necessarily in opposition to one another.

The “New Paradigm”
In the past few years, there has emerged a new way of thinking about the kinds of tasks and effects that we have discussed. The core idea of this “new paradigm” is that the way humans approach such tasks rests upon the calculus of probability rather than the calculus of logic. Conventional logic requires us to accept statements such as “All A are B” or “If A, then B” as absolutely true, or to reject them as absolutely false. In real discourse, such statements involve degrees of probability or belief. For example, we might attach a high probability to the conditional: “If Abdul is a Baker, then he likes Cakes” – e.g., this statement might be judged to be likely to be true, but not certain (after all, nothing ever is). We can also assign some degree of belief to Abdul being a Baker and to him liking Cakes. If someone now asserts that “Abdul is a Baker”, we can update all of these beliefs – in Abdul’s status as a Baker, in his probable liking for Cakes, and potentially in the truth of the conditional “If, then” statement, too. There is an extensive and complex body of work in this area, which is beyond the scope of the current lectures – but the suggested reading gives an introduction to this approach.

21
Q

Conclusions for deductive reasoning

A

Reasoning is a huge topic. Nonetheless, general principles emerge:
1. Human reasoning often deviates from formal logic
2. We can identify simplifying strategies that capture some aspects of performance
3. People often interpret the terms of reasoning problems differently from the intended meaning of the experimenter, but otherwise reason appropriately
4. We can try to develop detailed accounts of the steps by which people reason; Mental Models theory is one prominent example
5. Responses are greatly influenced by the framing of the problem, the participant’s background knowledge, and the way that they interpret the task
6. In fact, these contextual and interpretational effects mean that, in many cases, our “reasoning” experiments may not be studying the type of reasoning that the experimenters originally intended at all!
7. Nonetheless, examining the patterns of performance across multiple versions of the tasks can illuminate the mental processes that underlie performance on these tasks and, more generally, tell us something about how people “think” when they tackle complex problems

22
Q

Lecture 3: Risky Choice

A

This lecture will cover some of the key findings and theories in the cognitive psychology of risky choice, focusing on “decisions from description”. We will examine:
* How typical behaviour deviates from the prescriptions of conventional theories of rational choice
* How these results led to Prospect Theory as a model of human decisions under risk
* Some of the empirical and conceptual problems/limitations of Prospect theory
Introduction
A decision is a choice between alternatives that is intended to produce a desired or favourable outcome.

23
Q

Types of choices

A

multi-attribute choice
Most decisions involve multi-attribute choice – one must select between 2 or more options that differ in 2 or more attributes (e.g., choosing between 3 phones that differ in price, screen size, and battery life).

inter-temporal choice
In inter-temporal choice, one of the attributes that varies is time (e.g., would you rather receive £10 right now, or £25 one year from today?). People tend to discount future rewards heavily, and discount rates vary considerably across individuals, so choices like this can be used to study how far people plan for the future.

risky choice
In risky choice, one or more of the possible outcomes are probabilistic (i.e., they are not certain to occur). Sometimes the probabilities are not known precisely, in which case the decision may be referred to as “under uncertainty” or “under ambiguity” (there is some disagreement about terminology; some authors use “risk” and “uncertainty” interchangeably to mean situations whose outcomes are probabilistic.)

24
Q

Risky choices are made “from description” (information about the options is explicitly presented – e.g., in writing); other choices are made “from experience” (the decision-maker has to learn the outcomes and their probabilities by repeatedly sampling the environment).

A

Sometimes risky choices are made “from description” (information about the options is explicitly presented – e.g., in writing); other choices are made “from experience” (the decision-maker has to learn the outcomes and their probabilities by repeatedly sampling the environment). There are important differences between these two kinds of task (see e.g., Wulff et al., 2018).
The archetypal paradigm for studying risky choice involves presenting a choice between gambles, such as:
A. An 80% chance of £4000 (and a 20% chance of nothing)
B. £3000 for sure
Would you rather play A or B?
The role of this type of choice task in studies of decision-making has been compared to the role of Drosophila in genetics, and it has led to sophisticated accounts of how people evaluate and choose between risky prospects.

25
Q

Expected value

A

One potentially-rational way to choose between gambles would be to calculate the expected value (EV) of each option by weighting each outcome by its probability:
EV = p1a1 + p2a2 + … + pnan
where the p and a values are the probabilities and amounts that make up the option. For the example above, this would mean:
A. An 80% chance of £4000. EV = (0.80 x 4000) + (0.20 x 0) = £3200
B. £3000 for sure. EV = 1.0 x 3000 = £3000
So option A is better; if we played this game over and over again and always chose option A, we’d end up an average of £200 better off each time.
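As a purely illustrative sketch (not from the lecture materials), the expected-value calculation for the two options can be written as a few lines of code:

```python
# Expected value: weight each amount by its probability and sum.
def expected_value(outcomes):
    """outcomes: list of (probability, amount) pairs whose probabilities sum to 1."""
    return sum(p * amount for p, amount in outcomes)

option_a = [(0.80, 4000), (0.20, 0)]  # 80% chance of £4000
option_b = [(1.00, 3000)]             # £3000 for sure

print(expected_value(option_a))  # 3200.0
print(expected_value(option_b))  # 3000.0
```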

This clearly doesn’t describe people’s choices. When Kahneman and Tversky (1979) gave people the choice between A and B (using Israeli currency rather than sterling – but here and throughout we use £ signs to make things more concrete), 80% chose the guaranteed £3000 rather than the risky option with higher expected returns – they were risk averse for gains.

26
Q

Expected utility

A

With a concave utility function, larger rewards do bring more utility, but each additional pound adds less than the one before. Converting the £4000 and £3000 into utilities therefore shrinks the advantage of the larger amount, and once the 80% probability is factored in, the risky £4000 can be worth less, subjectively, than the certain £3000.
Economists long ago replaced expected value with expected utility. Utility can be thought of as the subjective value of an outcome, and is some transformation u(a) of the objective amount. The expected utility of an option is:
EU = p1u(a1) + p2u(a2) + … + pnu(an)
(Expected value is then a special case where u is the identity function.)
Crucially, decisions made according to expected utility are rational, in the sense that they conform to and follow from a set of axioms whose reasonableness it is hard to dispute (for example, that if A is preferred to B and B is preferred to C then A is preferred to C).
EU theory readily accommodates risk aversion by positing that the utility function is concave – that is, people have diminishing sensitivity to increasingly large gains so that the subjective value of (for example) £200 is not twice that of £100. The graph below illustrates a concave utility function:

Returning to our example:
A. An 80% chance of £4000: EU = 0.8 x u(4000) + 0.2 x u(0)
B. £3000 for sure. EU = 1.0 x u(3000)
The preference for B simply requires that the utility of £3000 is more than 80% of the utility of £4000, which we can see is the case with the utility function plotted above.
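To see how concavity produces the majority preference for B, here is a small sketch using u(a) = √a; the square root is just one convenient concave function chosen for illustration, not the function assumed by any particular study.

```python
# Expected utility with a concave (square-root) utility function.
def u(amount):
    return amount ** 0.5          # diminishing sensitivity to larger gains

def expected_utility(outcomes):
    return sum(p * u(a) for p, a in outcomes)

option_a = [(0.80, 4000), (0.20, 0)]
option_b = [(1.00, 3000)]

print(expected_utility(option_a))  # ~50.6
print(expected_utility(option_b))  # ~54.8 -> the sure £3000 is preferred
```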
Violations of expected utility and the rise of Prospect Theory
Expected utility remains a key idea in economics, and is a good normative model (a statement of how people should make decisions), but it has been comprehensively debunked as a description of how people actually behave. The catalogue of violations of EU theory is too great to cover in detail, so we will focus on some key points which led to the development of a major alternative, Prospect Theory.

27
Q

Reference dependence

A

A fundamental problem is that EU theory concerns the subjective value of final outcomes – one’s ultimate state of wealth after playing a gamble, for example. But real decisions often violate this principle.
For example, Kahneman and Tversky (1979) presented two decision tasks:
Task 1: In addition to whatever you own, you have been given £1000. You are now asked to choose between:
A. A 50% chance to gain £1000
B. Gaining £500 for sure
Task 2: In addition to whatever you own, you have been given £2000. You are now asked to choose between:
C. A 50% chance to lose £1000
D. Losing £500 for sure
For Task 1, 84% of participants chose the sure gain – as we saw before, people are often risk-averse for gains. For Task 2, 69% chose option C, the riskier choice.
Crucially, the final-state entailed by option C is the same as for option A – a 50-50 chance of ending up £1000 or £2000 better off than you were before the task; likewise, options B and D both mean a guaranteed net increase of £1500. So surely I should either prefer A and C or B and D?

The preference reversal between the two versions of the task violates rationality and the Expected Utility account of decision-making.
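The equivalence of the final states can be checked directly; the following is a small illustrative sketch (my own code, not part of the original materials):

```python
# Options A/C and B/D lead to identical distributions over final wealth
# (relative to whatever you owned beforehand).
def final_states(endowment, outcomes):
    """outcomes: list of (probability, change) pairs."""
    return sorted((p, endowment + change) for p, change in outcomes)

a = final_states(1000, [(0.5, 1000), (0.5, 0)])   # 50% chance to gain £1000
b = final_states(1000, [(1.0, 500)])              # gain £500 for sure
c = final_states(2000, [(0.5, -1000), (0.5, 0)])  # 50% chance to lose £1000
d = final_states(2000, [(1.0, -500)])             # lose £500 for sure

print(a == c)  # True: both end at £1000 or £2000 with equal probability
print(b == d)  # True: both end at £1500 for certain
```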

28
Q

The Prospect Theory value function

A

Rather than basing decisions on the expected utility of end-states, people seem to focus on changes in wealth with respect to a reference point. This reference point is usually the status quo, but may also be an aspiration level or some other salient value.
Kahneman and Tversky’s (1979) Prospect Theory posits an S-shaped value function which is concave for gains and convex for losses, where gains and losses are defined with respect to the current reference point. That is, people show diminishing sensitivity to progressively larger increases from the reference point, and diminishing sensitivity to progressively larger decreases from the reference point, too. The value function is sketched below.

In Task 1 above, the initial endowment of £1000 is assimilated into the current reference point, and both options A and B are construed (and explicitly represented in the text of the problem) as gains. Diminishing sensitivity means that the guaranteed gain of £500 has more than half the subjective value of a gain of £1000, so people take the safe option.
In Task 2, the initial £2000 endowment again enters the “status quo”, and the options are construed as involving a loss. Diminishing sensitivity means that the sure loss of £500 seems worse than a 50% chance of losing a full £1000, so people are prepared to take the risk.
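As a sketch of how the S-shaped value function produces this reversal, the code below uses the functional form and parameter values often quoted for (cumulative) Prospect Theory – v(x) = x^0.88 for gains and v(x) = −2.25(−x)^0.88 for losses. These particular numbers are Tversky and Kahneman’s (1992) median estimates and are used here purely for illustration; for simplicity, probabilities are not transformed.

```python
# Illustrative Prospect Theory value function (parameters from Tversky &
# Kahneman, 1992, used here only as an example).
ALPHA, BETA, LAMBDA = 0.88, 0.88, 2.25

def v(x):
    if x >= 0:
        return x ** ALPHA               # concave for gains
    return -LAMBDA * ((-x) ** BETA)     # convex and steeper for losses

def pt_value(outcomes):
    """outcomes: list of (probability, gain-or-loss) pairs; this sketch
    weights by raw probabilities (no decision weights)."""
    return sum(p * v(x) for p, x in outcomes)

# Task 1 (gain frame): the sure £500 beats the 50% chance of £1000.
print(pt_value([(1.0, 500)]) > pt_value([(0.5, 1000), (0.5, 0)]))    # True
# Task 2 (loss frame): the 50% chance of losing £1000 beats a sure -£500.
print(pt_value([(0.5, -1000), (0.5, 0)]) > pt_value([(1.0, -500)]))  # True
```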
This framing effect is not just seen with money. For example, Tversky and Kahneman (1981) famously presented people with the following problem:
“Imagine that the US is preparing for an outbreak of an unusual disease, which is expected to kill 600 people. Two alternative programs to combat the disease have been proposed…

(The framing manipulation below leads to different choices even though the expected outcomes are exactly the same.)
If program A is adopted, 200 people will be saved
If program B is adopted, there is 1/3 probability that 600 people will be saved and 2/3 probability that no people will be saved”
72% chose program A, the “safe” program.
In a separate version, the programs were re-framed in terms of the number of lives lost rather than saved:
If program A is adopted, 400 people will die
If program B is adopted, there is 1/3 probability that nobody will die and 2/3 probability that 600 people will die
Now 78% chose the “risky” option B. Again, framing identical final outcomes as gains or losses has caused a preference reversal, consistent with the idea that there is an S-shaped value function centred on a reference point.


29
Q

Loss aversion

A

One other feature of the S-shaped value function posited by Prospect Theory is that it is steeper in the loss-domain than it is for gains – typically, it’s assumed to be about twice as steep. This reflects the idea that “losses loom larger than gains” – that a loss of a given magnitude has greater subjective magnitude than a gain of the same size.
As an example, would you take a bet that offered a 50% chance of winning £1000 and a 50% chance of losing £1000? Most people say “no”, demonstrating loss aversion.
(Recall the pattern above: risk aversion for gains, risk seeking for losses; rejecting this mixed gamble reflects loss aversion.)
That is, the prospective loss outweighs the prospective gain even though the amounts and probabilities are identical.

30
Q

The Endowment Effect

A

Once you already own something, it is harder to give it up.
One famous putative demonstration of loss aversion is the endowment effect, which is the finding that people value an item they already own more than they would be prepared to pay for the same item if they did not own it.
In one illustration, Knetsch (1989) gave students a choice between a 400g bar of Swiss chocolate and a new coffee mug.
* Some participants simply indicated which product they would like to take (no initial endowment)
* Other participants were given (endowed with) the mug and had it in their possession for a few minutes before being asked whether they would like to swap it for the chocolate bar
* A third group were given the chocolate bar and later asked whether they would like to swap it for the mug
The percentage of people who chose each product in each group was as follows:

	Chose Mug over Chocolate	Chose Chocolate over Mug
No initial endowment	56%	44%
Endowed with mug	89%	11%
Endowed with chocolate	10%	90%

So, the mug and chocolate are, absent ownership, similarly attractive and we would expect half the people who are given the mug to prefer the chocolate and vice-versa. However, once the object is in people’s possession, they are reluctant to swap it.

This endowment effect is argued to be irrational and a hindrance to the proper operation of markets. It is often taken to be evidence for (or an instance of) a steeper value function for losses than for gains: once you’ve got the mug (or chocolate), giving it up seems like a loss and acquiring the other product seems like a gain, and the value function means that “losses loom larger than gains”.
However, this view is not universal.
* Economists often argue that endowment effects are rational or artefactual. For example, there may be a “transaction cost” associated with the effort of making an exchange. Similarly, when people are asked to state their “buying price” (willingness to pay) or “selling price” (willingness to accept), they may not respond honestly, or may misunderstand how the market works. Correspondingly, there is evidence that the effect doesn’t arise in some situations – e.g., in domains where the buyers and sellers have considerable market experience.
* More generally, loss aversion is not a universal feature of decision-making; it depends on the probabilities and amounts involved, on past experience with other gambles, and on the framing of the task (see Ert & Erev, 2013, for a demonstration).

31
Q

Decision weights

A

So far we have seen violations of expected utility which suggest a reference-dependent, S-shaped value function which is steeper for losses than for gains.
What about the representation of probabilities? Recall that in EU theory, utilities are weighted by the probability of each outcome. Is this tenable? Extensive violations of rationality suggest not.
Some of the most famous examples come from the economist Allais, whose “paradox” was an early problem for Expected Utility theory. A version of this problem is provided by Kahneman and Tversky (1979).

PROBLEM 1
Option A: £2500 with probability 0.33; £2400 with probability 0.66; £0 with probability 0.01
Option B: £2400 with certainty
82% chose B

PROBLEM 2
Option C: £2500 with probability 0.33; £0 with probability 0.67
Option D: £2400 with probability 0.34; £0 with probability 0.66
83% chose C

This pattern of choice again violates rationality and expected utility theory. To see why, note that we can re-write the options as follows:
p = .66 p = .33 p = .01
Option A 2400 2500 0
Option B 2400 2400 2400
Option C 0 2500 0
Option D 0 2400 2400
It’s now clear that A and B both offer a 66% chance of winning £2400, so the choice between them has to be based on the difference between, on the one hand, a 33% chance of £2500 and a 1% chance of nothing, and on the other a 34% chance of £2400. This is exactly the same as for options C and D; both have a 66% chance of winning nothing, so the choice has to be based on the same set of probabilities and amounts as in the former version of the problem. Correspondingly, people should choose the same way in both tasks, and the preference reversal shows a violation of rationality.
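The contradiction can also be shown algebraically; the short derivation below is my own working, following the standard analysis of this common-consequence problem:

```latex
\begin{align*}
\text{B} \succ \text{A} &\iff u(2400) > 0.33\,u(2500) + 0.66\,u(2400) + 0.01\,u(0)\\
                        &\iff 0.34\,u(2400) > 0.33\,u(2500) + 0.01\,u(0),\\
\text{C} \succ \text{D} &\iff 0.33\,u(2500) + 0.67\,u(0) > 0.34\,u(2400) + 0.66\,u(0)\\
                        &\iff 0.33\,u(2500) + 0.01\,u(0) > 0.34\,u(2400).
\end{align*}
```

The two derived inequalities contradict one another, so no utility function u can generate the modal pattern of preferring both B and C.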

32
Q

The “certainty effect”

A

Tversky and Kahneman (1981) take this as an example of the certainty effect – the idea that people disproportionately weight outcomes which are guaranteed to occur (or not occur), so that “a reduction of the probability of an outcome by a constant factor has more impact when the outcome was initially certain than when it was merely probable” (Tversky & Kahneman, 1981, p. 455). In Problem 1, option B seems particularly attractive because the outcome is certain.
As another example, participants were asked to choose between:
A. A 50% chance to win a tour of England, France, and Italy
B. A one week tour of England, with certainty
78% chose B. From an expected utility perspective, this implies that 0.5 x u(England, France, Italy) is less than u(England) – i.e., that the utility of a trip to England, France, and Italy is less than twice the utility of a trip to England alone.
However, when faced with:
C. A 5% chance to win a tour of England, France, and Italy
D. A 10% chance to win a tour of England
67% chose C, which implies that 0.05 x u(England, France, and Italy) > 0.1 x u(England). In other words, u(England, France, Italy) is now more than twice the utility of a trip to England alone.
When both probabilities are low, people are willing to take the gamble on the larger prize; but when one option is certain, they disproportionately prefer the certain one – the certainty effect.

33
Q

Non-linear treatment of probabilities

A

More generally, people seem to overweight extreme probabilities and under-weight moderate-to-large ones. As an example:
Gonzalez and Wu (1999) presented the following:
“You have two lotteries to win $250. One offers a 5% chance to win the prize and the other offers a 30% chance to win the prize.

A: You can improve the chances of winning the first lottery from 5 to 10%.
B: You can improve the chances of winning the second lottery from 30 to 35%.

Which of these two improvements, or increases, seems like a more significant change?”

75% chose option A, so an increase from 5 to 10% felt more important than the same absolute increment in the mid-range of probabilities.
Then, the authors constructed a new version by adding 60 to all of the values involved. Now the 5% increase seemed more significant for option B (that is, a rise from 90% to 95% felt more important than a rise from 65% to 70%).

In short, people overweight low-probability events and underweight moderate-to-high ones: a change near either end of the probability scale (e.g., 5% to 10%, or 90% to 95%) feels more significant than the same change in the middle of the range. Overweighting of small probabilities may help explain, for example, exaggerated fear of rare events such as terrorism.

Kahneman and Tversky proposed a decision weight function – a non-linear mapping between stated probabilities and the weight given to the corresponding outcome when forming an overall evaluation of a prospect. (Note that this is not the same as positing a mapping between physical probabilities and the subjective representation of how a given probability “feels”.) Prospect Theory’s decision-weight function is plotted below.
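A sketch of one commonly used one-parameter weighting function is shown below (the functional form and the estimate γ ≈ 0.61 come from Tversky & Kahneman, 1992; they are used here only as an illustration of the general shape):

```python
# Illustrative probability-weighting function: overweights small
# probabilities and underweights moderate-to-large ones.
GAMMA = 0.61  # Tversky & Kahneman's (1992) estimate, used as an example

def w(p):
    return p ** GAMMA / ((p ** GAMMA + (1 - p) ** GAMMA) ** (1 / GAMMA))

for p in (0.01, 0.05, 0.10, 0.50, 0.90, 0.99):
    print(f"p = {p:.2f}  ->  w(p) = {w(p):.3f}")
# Small probabilities receive weights larger than p; moderate-to-large
# probabilities receive weights smaller than p, consistent with the
# certainty effect.
```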

34
Q

A brief summary of Prospect Theory

A

Prospect Theory was proposed by Kahneman and Tversky (1979) as a descriptive account of human decisions under risk, and has been extremely influential. Its core components comprise:
1. An Editing stage, during which a number of principles are used to simplify the options and ready them for evaluation. For example, if a gamble promised a 50% chance of £10, a 10% chance of £10, and a 40% chance of nothing, the first two options will be combined into a single “60% chance of £10” representation. Similarly, inconsequential differences in amounts will be ignored (“rounded”).
2. A reference point is selected which determines whether outcomes are construed as gains or losses, with the value of these outcomes being determined by the S-shaped value function
3. The subjective values of the outcomes are multiplied by the decision-weight transformations of their associated probabilities
The overall value of a prospect (gamble) is then: w(p1)v(a1) + w(p2)v(a2) + … where w is the decision-weight function, v is the value function, and pi and ai are the probabilities and amounts for each outcome.
The original version of Prospect Theory (Kahneman & Tversky, 1979) had a number of limitations, including being limited to gambles with only two outcomes.
Cumulative Prospect Theory (Tversky & Kahneman, 1992) addressed some of these problems. (Note: the probability weighting function above actually comes from CPT – so you might see a slightly different version of this curve in some papers/textbooks.)
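Putting the pieces together, here is an illustrative evaluation of the original A-vs-B gamble using the overall formula w(p)v(a), again with Tversky and Kahneman’s (1992) parameter estimates purely as example values (gains only, no editing stage):

```python
# Illustrative Prospect Theory evaluation of a simple gamble over gains.
ALPHA, GAMMA = 0.88, 0.61   # example parameters (Tversky & Kahneman, 1992)

def v(x):                    # value function for gains
    return x ** ALPHA

def w(p):                    # probability-weighting function
    return p ** GAMMA / ((p ** GAMMA + (1 - p) ** GAMMA) ** (1 / GAMMA))

def prospect_value(outcomes):
    """outcomes: list of (probability, amount) pairs (gains only here)."""
    return sum(w(p) * v(a) for p, a in outcomes)

option_a = [(0.80, 4000), (0.20, 0)]   # 80% chance of £4000
option_b = [(1.00, 3000)]              # £3000 for sure

print(prospect_value(option_a))  # lower overall value...
print(prospect_value(option_b))  # ...than the sure £3000, matching the modal choice
```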

35
Q

A brief critique of Prospect Theory

A

Prospect Theory is one of the most successful psychological theories (the original paper has been cited by other papers over 30 thousand times). However, it has a number of shortcomings.
Limited Scope – even for risky choice
There are violations of rationality in risky choice which Prospect Theory doesn’t address. We consider two examples, both of which relate to the idea that people do not have stable psychoeconomic functions that map objective quantities (e.g., amounts of money) onto patterns of behaviour.
Example 1: Valuation vs Choice
People’s valuations of two gambles can contradict their choices when asked to pick between them.
Lichtenstein and Slovic (1971) provide one demonstration:
* First, participants were presented with pairs of bets and indicated which they would prefer to play.
o One bet (the “P bet”) had a high probability of winning a relatively small amount. E.g., a 95% chance to win $2.50 and a 5% chance to lose $0.75
o The other (the “$ bet”) had a small chance of winning a large amount of money. E.g., a 40% chance of winning $8.50 and a 60% chance of losing $1.50
* About an hour after completing this choice task, the participant was shown the bets one at a time, told that he or she owned a ticket entitling them to play the gamble, and asked to state the minimum price for which they would be prepared to sell the ticket. (That is, they were asked to express in dollars how much the bet was worth to them.)
On average, P-bets and $-bets were chosen about equally often. However, the key finding was that participants regularly showed preference reversals: they gave a higher valuation for the $-bet than for the corresponding P-bet, even though they chose the P-bet in the choice-task. 73% of participants showed this reversal for every pair where they originally chose the P-bet. By contrast, hardly anyone showed a reversal in the other direction (offering a higher valuation for the P-bet when the $-bet had originally been chosen; only 17% of people ever did this).
Economists were extremely resistant to accepting these preference reversals and spent great efforts trying to demonstrate that they were methodological artefacts (e.g., that people weren’t really stating their selling price during the valuation task). However, the findings are robust and replicate across diverse participant groups and when people play for real money.
These results suggest that preferences are constructed in response to the elicitation procedure rather than simply revealed by it. They imply that there is no stable value function relating objective and subjective value.
One explanation for this effect concerns response compatibility. In the valuation task, responses are on the same scale as the rewards/losses offered by the bet, so that aspect of the gamble will dominate and $-bets are valued more highly than P-bets (because the monetary returns are greater).
Example 2: Decoy effects
The same instability is illustrated by decoy effects. A famous example is the asymmetric dominance effect (aka the attraction effect), a widely-used marketing trick. As one illustration, Ariely (2009) describes a genuine advert in the Economist magazine offering:
* A. One-year on-line only subscription. $59.
* B. One-year print subscription. $125
* C. One year print and web subscription. $125
When he offered this selection to MIT students, 16% chose A and 84% chose C. (Unsurprisingly, no-one chose the print-only version when the print-and-on-line subscription cost the same).
However, when the print-only option was removed from the set, the preferences reversed: 68% chose A and 32% chose C.
This violates a core assumption of conventional rational choice theory – namely, the independence from irrelevant alternatives: my preference between two options should not depend on other options. After all, whether I prefer an apple to an orange shouldn’t depend on whether I had the option of choosing a banana.
The asymmetric dominance effect also shapes risky choice. For example, Wedell (1991) presented choices between a relatively “safe” option such as “A 50% chance of $20” and a relatively risky option such as “A 30% chance of $33”. An asymmetrically-dominated decoy such as “A 25% chance of $33” boosted preference for the risky option, whereas a decoy such as “A 50% chance of $18” boosted preference for the safer option.
Two other decoy effects are also widely-discussed:
* The similarity effect: when the decoy is similar to option A it draws choice share from A and boosts relative preference for B
* The compromise effect: when the decoy is more extreme than option A on both dimensions, it boosts choice share for A
This kind of context effect has been explained in many ways (see Howes et al., 2016, for a review), most of which emphasize the importance of relative judgment – in the Economist example, the target item (e.g., the print+online package) is clearly equal to or better than the decoy (print only) option on both relevant dimensions (price and quantity; it costs more but you get the same). The online-only version only “beats” the decoy on one dimension (affordability). So these comparisons lead to selection of the target (for example, because it has higher “rank position” in the set of three, or because it is easier to justify the choice of this item).
Whatever the mechanism (and there may well be more than one), these context effects aren’t captured by Prospect Theory – and they point to a lability of preferences that questions the basic approach of trying to infer psychoeconomic functions from choices.
Empirical Problems
Beyond the difficulty of accommodating choice-valuation preference reversals, there are several phenomena in risky choice which contradict Prospect Theory. Many of these are reviewed by Birnbaum (2008). One example is provided by Weber and Chapman (2005), who varied both the size of the stakes and the probabilities of the various outcomes, using gambles like the following:
Low stakes, low probabilities: a 2% chance of $6 or a 4% chance of $3
High stakes, low probabilities: a 2% chance of $600 or a 4% chance of $300
Low stakes, high probabilities: a 40% chance of $6 or an 80% chance of $3
High stakes, high probabilities: a 40% chance of $600 or an 80% chance of $300

The proportion of people choosing the risky (lower probability) option in each condition is plotted below:
When the stakes are low, people don’t care about the probability. When the stakes are high, people care more about the probability.

  • When the outcomes have high probabilities, people are less risk-averse when the stakes are low. (This has been labelled the “peanuts effect”: people don’t mind taking a risk when playing for peanuts).
  • This peanuts effect can be explained by the value function of Prospect Theory: the subjective difference between (say) $6 and $3 may seem bigger than the difference between $600 and $300, so it seems more worth taking a gamble in the low-stakes case.
  • However, the peanuts effect is much smaller (indeed, perhaps non-existent) when the probabilities are small.
  • Prospect theory can’t readily accommodate this. We’d have to assume that the value function is less concave when the probabilities are low, but it doesn’t really make sense to posit that the subjective value of a particular amount of money depends on the probability of receiving it (at least, not within the overall conceptual framework of Prospect theory).
    While the core ideas of reference-point-dependent valuations, diminishing sensitivity to gains and losses, and non-linear transformations of probabilities into subjective “weights” all have a lot of merit, the overall edifice of Prospect Theory struggles to account for a number of effects like these.
    Conceptual problems – the lack of mechanism
    One concern is ecological validity: choices between hypothetical gambles may not capture how people handle real risks, where behaviour can look quite different. People also read pragmatic and social cues into the way options are described (e.g., inferring what a doctor really means when quoting a mortality rate), and presenting explicit numbers rather than verbal descriptions alone can produce more “rational”-looking responses.
    A parallel problem is that Prospect Theory is purely descriptive; it says that decisions will be “as if” people combine amounts and probabilities in a particular way, but makes no claims or predictions about the actual processes by which people reach a decision.
    More recent theories have specified process accounts of the sampling, integration, and comparison processes that underlie risky choice.

One example is Stewart et al.’s (2006) Decision by Sampling account, which emphasizes the role of memory retrieval in constructing subjective value. A given amount of money, for example, will be valued by retrieving other amounts of money from memory and performing pairwise comparisons against the target value to establish the target’s rank-position in the set. This rank-position defines the subjective magnitude of the outcome; the same process provides the subjective magnitude of probabilities.

Crucially, this account predicts that subjective values, and hence choices, will depend upon the memory-retrieval set against which the current attribute is compared. We should therefore be able to shift people’s preferences by exposing them to different amounts of money (or probabilities) earlier in the session. Stewart and colleagues have shown that this is just what happens (Stewart et al., 2003).
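As a toy illustration of the rank-based valuation idea (my own sketch, not Stewart et al.’s implementation), the subjective value of a target amount can be treated as its relative rank within a sample of comparison amounts retrieved from memory:

```python
# Toy illustration of rank-based valuation (Decision by Sampling style).
def relative_rank(target, comparison_sample):
    """Proportion of sampled comparison values that the target beats."""
    return sum(target > x for x in comparison_sample) / len(comparison_sample)

# The same £60 feels large against a memory sample of small amounts...
small_context = [5, 10, 20, 30, 40, 50, 80]
# ...but small against a sample dominated by large amounts.
large_context = [50, 100, 200, 400, 800, 1000, 2000]

print(relative_rank(60, small_context))  # ~0.86
print(relative_rank(60, large_context))  # ~0.14
```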

36
Q

Lecture 4: Emotion and Decision-Making

A

We will:
* Briefly survey some ideas about the nature of emotions and their relationship to bodily states
* Examine in detail one particular view of how such physiologically-grounded emotional signals may guide decisions

37
Q

Appraisal theorists such as Scherer (1984)

A

respond that the cognitive appraisals that underlie emotion need not be conscious, and have gone on to propose various appraisal dimensions that are presumed to shape the evocation of emotion –including the extent to which an event is certain, under one’s own control, the responsibility of other people, requiring of effort, attention-capturing, and so on.

38
Q

Where are we now?

A

Although contemporary emotion theorists might disagree about whether there are categorical “basic” emotions or a graded “core affect”, and about the precise roles of cognition, physiology, and culture, there is broad consensus that emotion has:
* A cognitive component (the evaluation of objects and events)
* A physiological component (changes in somatic state)
* A motivational component (action tendencies)
* An expressive component (facial and vocal signals)
* A subjective component (the feeling of the emotion)
A recent review (Scherer & Moors, 2019) offers the following diagram to depict the common ground between contemporary views of emotion. Note that the elements in the diagram are linked by double-headed arrows, in recognition that they are dynamic and interactive.

So:
* Physiological changes are an important component of emotion
* There probably aren’t clear-cut physiological “fingerprints” for specific emotion…
* …or simple, unidirectional causal pathways between the different components of emotion
Bear these points in mind when we progress to consider the role of emotion in decision making!

39
Q

Lerner et al (2015)

A

A recent review by Lerner et al (2015) highlights the myriad ways in which emotion can influence decision-making, as illustrated by the diagram below. In this framework, people make decisions by consciously or non-consciously evaluating information about the possible outcomes associated with different courses of action (e.g., choosing a particular gamble). This evaluation is shaped by characteristics of the options (e.g., the probability of winning each possible prize) and of the decision-maker (e.g., the extent to which you value monetary rewards). This evaluation process is presumed to be shaped by one’s current emotional state, and this state is in turn shaped by a range of factors, including background emotions that reflect relatively stable aspects of the decision-maker (e.g., chronic anxiety) or transient, incidental responses to external events (e.g., the news that you have just been promoted). The characteristics of the options (e.g., the possible outcomes and their probabilities) may also themselves elicit emotions – e.g., fear or excitement – as may the act of evaluation (e.g., it can be distressing to engage in a difficult choice, so your mood may worsen as you try to make the decision).

We don’t have time to discuss all of these possible influences, so we will focus on one important component of the process: the arrow between “expected outcomes” and “conscious or non-conscious evaluation”. Specifically, we will discuss one very influential account of how anticipated emotion shapes decision-making: the somatic marker hypothesis.

40
Q

The somatic marker hypothesis

A

The somatic marker hypothesis (SMH) arose from studies of patients with neurological damage – in particular, those with damage to the amygdala, and those with damage to the orbitofrontal cortex/ventro-medial prefrontal cortex. (The anatomical and functional distinctions between the OFC and vmPFC are disputed.) In brief, a somatic marker is a bodily/emotional signal linked to prior experience of similar situations.

Regarding the amygdala:
* Patients DR and SE, with bilateral damage, showed selective impairment in the recognition of “fear” from face photographs (mean 4/10 correct vs. 8.6/10 for controls; performance was normal for other emotions; Calder et al., 1996).
* Lesions to the amygdala abolish/reduce the acquisition of fearful responses to initially neutral stimuli which are repeatedly paired with aversive outcomes (e.g., Blanchard & Blanchard, 1972)
* Patients with amygdala lesions show less declarative memory for emotional material. For example, participants were shown a slide show with accompanying narrative which included some emotional elements (scenes of surgery) and 24 hours later answered questions about what they had seen. Controls showed better memory for the most evocative slide than for the other slides; patients with amygdala damage did not (Adolphs et al., 1997).
* The superior recall of information about emotionally arousing pictures (both pleasant and unpleasant) relative to retrieval of neutral information has been found to correlate with the size of the amygdala activation during encoding (Hamann et al., 1999)
Regarding the vmPFC:
* Lesion patients show increased emotional reactivity (e.g., frustration, irritability) and decreased emotionality (i.e., they are judged to be less responsive, show blunted affect, and are socially withdrawn; Anderson et al., 2006).
* Patients also fail to show the elevation in skin conductance response (SCR – a measure of sweating) that usually accompanies viewing a socially-evocative slide image (e.g., a mutilated body; Damasio et al., 1990).
* When faced with moral dilemmas which pit the emotionally-charged sacrifice of one person against the greater loss of other lives, lesion patients were more likely to choose the “utilitarian” response (e.g., shoving one person under a train to save five others; Koenigs et al., 2007)
Amygdala and vmPFC damage and decision-making
The classic case of Phineas Gage, whose injury damaged the ventromedial frontal cortex, illustrates how such lesions can leave general intellect intact while producing poor real-life decisions. On this view, emotions are important for rational decision-making: without emotional signals it is hard to weigh the options against one another. The prior memories linked to emotional/bodily states are the “somatic markers”.
Antonio Damasio and colleagues noted that vmPFC patients often show impaired “real life” decision-making, but are indistinguishable from “normal” participants in terms of general intellect and performance on a range of neuropsychological tests.

41
Q

The Iowa Gambling Task (IGT)

A

Damasio and colleagues developed a simple gambling game, the Iowa Gambling Task, as a model for real-life decision-making, where one must balance the possibility of big rewards with the risk of substantial losses.
Participants must work out which decks lead to the most profit. Patients with vmPFC damage tend to keep choosing the disadvantageous decks – that is, they have trouble making advantageous decisions – whereas healthy participants gradually move towards decks C and D. (One criticism is that changing the payoff schedule or shuffling the order of cards within the decks can alter these patterns of performance.)

Participants are given $2000 of toy money and confronted with 4 decks of cards, labelled A, B, C, and D. Each card they turn over yields a reward; some also yield a penalty. The goal is to make as much money as possible. Participants turn over a total of 100 cards, although they don’t know in advance how many they will be asked to turn. Every card in Decks A and B produces a large reward ($100) but the intermittent losses are hefty (Deck A gives losses of $150 to $350 every two to three cards; Deck B gives a loss of $1250 every ten cards); Decks C and D give smaller wins of $50 but also incur smaller losses ($50 every few cards for C; $250 every 10 cards for D). In the long run, choosing C or D is more advantageous than choosing A or B.
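To see why C and D are advantageous, here is a rough sketch of the net outcome per 10 cards. The exact payoff schedules vary across versions of the task; the numbers below follow the description above and standard implementations, and are illustrative only.

```python
# Rough sketch of net outcomes per 10 cards in a standard version of the
# Iowa Gambling Task (illustrative numbers; exact schedules vary).
decks = {
    "A": {"win_per_card": 100, "losses_per_10_cards": 1250},  # frequent losses
    "B": {"win_per_card": 100, "losses_per_10_cards": 1250},  # one big loss
    "C": {"win_per_card": 50,  "losses_per_10_cards": 250},   # frequent small losses
    "D": {"win_per_card": 50,  "losses_per_10_cards": 250},   # one moderate loss
}

for name, d in decks.items():
    net = 10 * d["win_per_card"] - d["losses_per_10_cards"]
    print(f"Deck {name}: net {net:+d} per 10 cards")
# Decks A and B lose money in the long run (-250 per 10 cards);
# C and D gain (+250 per 10 cards), so they are the advantageous decks.
```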
Bechara et al. (1994) gave this task to 6 patients with ventromedial frontal lobe damage (including EVR), 9 patients with other types of damage (e.g., occipital lesions) and 21 matched healthy participants. The mean proportion of selections from each deck for vmPFC patients and healthy controls are plotted below.

The patients were more likely than the controls to choose the “bad” decks (those with high rewards but larger losses). Trial-by-trial analysis showed that controls initially sampled from all decks and then gravitated towards C and D, whereas vmPFC patients continued to draw from A and B throughout the task. Brain-damaged control participants performed in the same way as healthy participants, so there is something lesion-specific about the choices made by vmPFC patients.
* Bechara and colleagues argue that one cannot explicitly keep track of the gains and losses associated with each deck and must develop a “feeling” for which decks are risky/profitable

42
Q

Bechara et al. (1996)

A

In a subsequent study, Bechara et al. (1996) recorded the skin conductance response (SCR) of the participants as they played the game. SCR (also known as galvanic skin response) is a measure of sweating, and is taken to indicate physiological arousal. The results are shown below:

  • Reward SCRs (the reaction to the positive outcome that was initially revealed when participants turned a card) were similar for patients and controls
  • Likewise, Punishment SCRs (the reaction when the card also entailed a loss, which was announced after the reward) were similar for both groups
  • However, there was a big difference in anticipatory responses – the arousal immediately before choosing a deck. Controls show more arousal immediately prior to choosing a risky deck (A or B) than before choosing a safe one (C or D); patients with ventral prefrontal cortex damage show no such sensitivity
    In a subsequent study, Bechara et al. (1999) used the IGT to compare patients with amygdala lesions to those with vmPFC damage.
  • Like the vmPFC patients, those with amygdala lesions did not learn to select from the good decks
  • Again like vmPFC patients, amygdala lesions abolished the anticipatory SCRs that preceded selection from a bad deck
  • However, unlike the vmPFC group, amygdala damage also eliminated the SCRs that accompanied the rewarding/punishing outcomes
    So, one idea is that the amygdala is involved in associating particular stimuli or actions with affectively-meaningful outcomes, but that the vmPFC is crucial in re-activating these representations at the time of choice (see below).

What are people conscious of?
Bechara et al. (1997) ran another version of the study, this time interrupting the game after 20 trials (when the participant had experienced some gains and losses) and every 10 trials thereafter, and asking:
1. “Tell me all you know about what is going on in this game”
2. “Tell me how you feel about this game”
The authors divided responses into four periods:
1. Pre-punishment: the early stages, when people had sampled all 4 decks without encountering any losses. Neither patients nor controls generated anticipatory SCRs.
2. “Pre-hunch”. After a few losses (around cards 10-20) controls began to generate anticipatory SCRs but “all indicated that they did not have a clue about what was going on”
3. “Hunch”. After about 50 cards, normal participants expressed a “hunch” that A and B were risky and generated anticipatory SCRs when they selected these decks. vmPFC patients did not generate anticipatory SCRs or express a hunch
4. “Conceptual”. By about the 80th trial, 7/10 normal participants conceptually explained why A and B were worse than C and D. Even the 3 who did not reach this point continued to generate anticipatory SCRs and avoided the risky decks. 3/6 patients reached the conceptual period, but none of the patients showed anticipatory SCRs and all continued to favour the “bad” decks
Bechara et al argue that “in normal individuals, nonconscious biases guide behaviour before conscious knowledge does. Without the help of such biases, overt knowledge may be insufficient to ensure advantageous behaviour” (p. 1293).
This, broadly, is the somatic marker hypothesis: a given situation activates “dispositional knowledge” of the emotional experiences previously associated with the various options/outcomes. The vmPFC is seen as a key structure in storing such knowledge, whose activation is posited to trigger autonomic responses (including those which lead to skin conductance changes).
These nonconscious processes are argued to bias the more deliberative, “cognitive” decision-making that is based on overt knowledge about past actions and outcomes. In other words, the vmPFC (and other structures, including the amygdala) are taken to play a role in “re-living” the emotional/somatic experiences associated with particular response options, and these somatic changes – which may be outside conscious awareness – shape or bias the conscious decision process.
This view of emotion shares much with the James-Lange view, in that emotion is presumed to result from the brain’s processing of somatic signals. Damasio avoids the problems associated with a crude version of this idea by positing an “as-if” loop (an idea also found in James and other early writers). As described by Dunn et al. (2006): “somatic markers can reflect actions of the body proper (the ‘body’ loop) or the brain’s representation of the action expected to take place in the body (the ‘as-if’ loop). In other words, the brain can construct a forward model of changes it expects in the body, allowing the organism to respond more rapidly to external stimuli without waiting for that activity to actually emerge in the periphery”.
The somatic marker hypothesis
In outline: when a decision is being made, stored emotional associations are reactivated alongside factual knowledge about the options, and these feelings help steer the choice.
Body loop: the situation actually triggers physiological changes in the body, which are then sensed and fed back into the decision.
“As-if” body loop: the brain simulates the expected bodily response, so people can anticipate how they might feel (and still show some physiological response) without waiting for the body itself to react.

43
Q

Problems with the IGT and Somatic Marker Hypothesis

A

Damasio’s work had a huge impact and the IGT is still very widely-used. However, the task has a number of problems which undermine the very substantial claims of the marker hypothesis.

Do we need physiological responses to perform the task?
Heims et al. (2005) examined patients with pure autonomic failure (PAF), which involves degeneration of autonomic neurons and failure of the autonomic nervous system to regulate bodily states. For example, patients do not show increased heart rate or blood pressure when under stress, and lack skin conductance responses to emotive stimuli.
- If PAF patients can still perform the task, somatic cues may not be necessary for advantageous choice.
- A related point (developed below): decks A and B are “bad” but also involve larger outcomes, and when the outcome structure or card order is changed, the pattern of learning shown by vmPFC patients and controls changes too.

The PAF patients were significantly more likely than controls to select the good decks – certainly not evidence that one needs somatic changes to signal which deck to pick.
Do SCRs really signal anticipated outcomes?
Damasio and colleagues interpret the high SCRs that precede selection from the “bad” decks as being a somatic representation of the negative outcomes that will follow.
Tomb et al. (2002) point out that the bad decks (A and B) don’t just have net negative outcomes, they also involve much larger amounts of money (both for wins and for losses), and the variance of the outcomes from these decks is higher. The greater anticipatory SCRs might therefore reflect the uncertainty associated with selecting from one of these decks, rather than their long-run profitability.
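To illustrate the confound, here is a rough sketch using simplified per-card outcomes based on the standard schedule described earlier (assuming roughly five loss cards per 10-card block; the exact card orders differ in the real task):

```python
import statistics

# Illustrative net outcomes for a 10-card block from a "bad" and a "good" deck.
deck_A = [100] * 5 + [100 - 250] * 5   # wins of $100; five cards also lose ~$250
deck_C = [50] * 5 + [50 - 50] * 5      # wins of $50; five cards also lose $50

for name, outcomes in [("A (bad)", deck_A), ("C (good)", deck_C)]:
    print(name,
          "mean per card:", statistics.mean(outcomes),
          "SD:", round(statistics.stdev(outcomes), 1))
# Deck A has the lower mean but much larger, more variable outcomes, so a bigger
# anticipatory SCR before choosing it could reflect magnitude/uncertainty
# rather than the deck being disadvantageous.
```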
Tomb et al. ran a new version of the task in which the “good” decks had higher-magnitude/more variable outcomes than the “bad” ones: A and B always returned $2250 but had mean losses of $1500 every 10 cards; C and D had wins of $250 and mean losses of $1000 every 10 cards. So now A and B are better than C and D, despite still having larger/more variable outcomes.
* The standard IGT replicated previous findings: people most often chose the good decks (C and D) and generated larger anticipatory SCRs when they selected from the bad decks (A and B).
* In the modified version, people still chose the good decks (now A and B) but the SCRs were larger preceding selection from these decks.
Tomb et al. therefore argued that anticipatory physiological responses do not signal that the deck is “bad”.
Damasio and colleagues (2002) countered that, in the modified version of the task used by Tomb et al., the anticipatory SCRs for the “good decks” signal their “goodness” – but if the same physiological signals indicate both positive and negative outcomes, how can they be used to guide choice?
Do people really “decide advantageously before knowing the advantageous strategy”?
Maia and McClelland (2004) challenged the idea that somatic states “unconsciously” signal the advantageous/disadvantageous outcomes. They point out that Bechara et al.’s study (1997) has several shortcomings, including:
1. “Good” and “Bad” decks are defined by the experimenter, based on the long-run expected returns of each. But rational behaviour from a participant should be determined by the experiences they have actually had prior to the point of choosing. E.g., if all past experience with decks A and B has been positive so far (because it’s early in the experiment and the punishments haven’t yet started) then these are the “good decks”.
2. The questions that Bechara et al. used to probe “conscious knowledge” are hopelessly vague. People might well be able to articulate their understanding of the outcomes associated with each deck if we asked them more sensitive questions.
Maia and McClelland ran a version of the IGT in which participants were asked after trials 20, 30, 40… to answer a series of more penetrating questions, including:
1. rating how bad/good each deck is on a scale from -10 to +10;
2. estimating the average winnings, the frequency of losing, and the average size of loss if they chose a given deck for the next 10 trials (allowing the authors to calculate people’s implicit “expected return”, as sketched below);
3. giving their own estimate of the “average net result” that would come from selecting each deck for the next 10 trials.
The authors then looked to see whether participants’ responses indicated which two decks were “best” at each point in the study.
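As an illustration of how such estimates can be combined into an implicit expected return, here is a hedged sketch (the function name and the exact combination rule are assumptions; Maia and McClelland’s own calculation may differ in detail):

```python
def implicit_expected_return(avg_win_per_card, p_loss_per_card, avg_loss_size, n_cards=10):
    """Combine a participant's three estimates into an implied net return over
    the next n_cards selections from one deck (avg_loss_size entered as a
    positive number)."""
    expected_gain = n_cards * avg_win_per_card
    expected_loss = n_cards * p_loss_per_card * avg_loss_size
    return expected_gain - expected_loss

# Illustrative values roughly matching the standard schedule:
print(implicit_expected_return(100, 0.5, 250))   # deck A: -250.0 per 10 cards
print(implicit_expected_return(50,  0.5, 50))    # deck C: +250.0 per 10 cards
```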
* Participants’ explicit knowledge was good: typically about 18 of the 20 participants gave responses which indicated that they knew which decks were best, right from the first question period. This level of performance was often higher than their behavioural performance – people continued to explore the risky, “bad” decks on some trials.
* Looking specifically at trials where participants behaved advantageously (selected one of the two best decks, given their past experience), virtually all participants had explicit knowledge of which decks were best.
So, there might be “unconscious” knowledge (perhaps represented by somatic markers) that shapes decision making, but the data from Maia and McClelland (2004) show that we don’t need to posit this to explain performance on the gambling task.
What about the vmPFC patients?
Farah and Fellows (2005) point out that, in the standard IGT, the first trials from all decks are wins; losses only emerge after several cards from a given deck have been turned over. Patients with vmPFC lesions might do badly because the initial positive experiences set up a response tendency which they fail to overcome once the negative outcomes for the high-stakes decks start to arrive.
To test this they compared vmPFC patients with normal controls on the standard IGT and on a modified version in which the losses associated with each deck were made apparent within the first few selections.
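A minimal illustration of the manipulation (the card values below are placeholders, not Farah and Fellows’ actual decks):

```python
# Placeholder net-outcome sequence for one "bad" deck (values illustrative only).
# Standard order: the first several cards are all wins, so a preference for the
# deck can be established before any loss is experienced.
standard_order  = [100, 100, 100, 100, 100, -150, 100, -250, 100, -350]

# Reordered version in the spirit of Farah and Fellows (2005): the same cards,
# but a loss is encountered within the first few selections.
reordered_early = [100, -150, 100, 100, -250, 100, 100, 100, -350, 100]

def first_loss_position(cards):
    """1-based position of the first losing card in a sequence."""
    return next(i for i, value in enumerate(cards, start=1) if value < 0)

print(first_loss_position(standard_order))    # 6
print(first_loss_position(reordered_early))   # 2
```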
* In the standard version, control participants chose from the good decks more often than did ventral prefrontal patients (62% vs 50%).
* In the shuffled version, there was little difference between the participant groups: both chose from the advantageous decks most of the time (68% for patients, 72% for controls)
So these authors argue that the patients show a deficit in reversal learning rather than a failure to associate particular choices with long-run negative outcomes.
Not a dead loss…
The Iowa Gambling Task and Somatic Marker Hypotheses are prime examples of how interesting new methodologies and ideas can generate a huge amount of research which shows, fairly rapidly, that the original (rather over-stated) claims go beyond what the data can support – and, indeed, that the task may not be measuring what it was intended to. This pattern is repeated throughout psychology research. Nonetheless, we do make progress, and elements of the task and Damasio’s theorizing are useful and important.
First, there is evidence that somatic states do signal forthcoming outcomes in a way that might guide choice. Dunn et al. (2010) used a modified version of the IGT called the “intuitive reasoning task” (IRT).
* On each trial participants selected from one of four decks and then had to guess whether the colour of their chosen card would match that of another card that was about to appear on-screen; a correct guess won money, an incorrect guess lost it.
* The computer was rigged such that participants’ predictions were classified as correct on 60% of trials for decks A and B but only 40% of trials for decks C and D.
* Decks A and C involved relatively small wins/losses; decks B and D used higher-magnitude outcomes. Points won during the game translated into real financial rewards at the end (a sketch of the trial structure is given below).
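A minimal sketch of one IRT trial under these rigged probabilities (the stake sizes and function names are assumptions for illustration; only the 60%/40% rigging and the small/large distinction come from the description above):

```python
import random

# Rigged probability that the participant's colour guess is classed as "correct".
P_CORRECT = {"A": 0.6, "B": 0.6, "C": 0.4, "D": 0.4}

# Assumed stake sizes: A and C small, B and D larger (illustrative values only).
STAKE = {"A": 50, "B": 100, "C": 50, "D": 100}

def irt_trial(deck, rng=random):
    """One trial: the guess is classed correct with the deck's rigged probability;
    a correct guess wins the stake, an incorrect one loses it."""
    correct = rng.random() < P_CORRECT[deck]
    return STAKE[deck] if correct else -STAKE[deck]

# Expected value per trial is positive for A/B and negative for C/D;
# the sign does not depend on stake size.
for deck in "ABCD":
    print(deck, round((2 * P_CORRECT[deck] - 1) * STAKE[deck], 2))
# A: 10.0, B: 20.0, C: -10.0, D: -20.0
```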
The IRT avoids some of the pitfalls of the IGT:
1. Outcome magnitude/variability is no longer confounded with “goodness” of the deck
2. There is no longer a fixed sequence of cards with a long delay before the first losses for the “bad” decks, so reversal learning is not an issue
3. Moreover, with a separate group of participants the authors followed Maia and McClelland’s (2004) approach to probing conscious knowledge of deck outcomes and found little evidence for overt knowledge; they argue that the IRT really does tap “intuitive reasoning”, although probing and operationalizing conscious knowledge is so fraught with difficulties that we should be sceptical about this.
Heart rate and electrodermal activity (skin conductance) were recorded prior to each choice. At the end of the task, the authors also probed people’s ability to assess their own bodily states – i.e., their interoception – by having them report how many heartbeats they experienced in a certain time period and comparing this with the true number, assessed by electrocardiogram.
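Heartbeat-counting tasks of this kind are commonly scored with a Schandry-style accuracy index; a minimal sketch is below (this scoring rule is an assumption for illustration, not necessarily the exact index used by Dunn et al., 2010):

```python
def heartbeat_counting_accuracy(recorded, counted):
    """Schandry-style score for one interval: 1 - |recorded - counted| / recorded.
    1.0 means the report matched the ECG exactly; lower values mean poorer
    interoception. (Assumed scoring; the published index may differ.)"""
    return 1 - abs(recorded - counted) / recorded

# (ECG-recorded beats, reported beats) for a few counting intervals:
intervals = [(45, 40), (60, 52), (30, 30)]
score = sum(heartbeat_counting_accuracy(r, c) for r, c in intervals) / len(intervals)
print(round(score, 3))   # 0.919
```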
The key results were:
* On average, people chose more from the “good” decks (A and B) than the “bad” ones (C and D), and their responses were unaffected by the size of the wins/losses.
* Participants showed reduced electrodermal activity and slower heart rates prior to selecting from the good decks
* The difference between the anticipatory response to good decks and bad decks correlated highly with behavioural performance. That is, people who experienced larger “warning” signals in their physiology made more advantageous choices
* Interoceptive accuracy moderated the relationship between anticipatory responses and performance: the effect of physiological warning signals prior to selection from a bad deck was more pronounced for people with good interoception
So, there is evidence that physiological signals may guide decision-making, and that this is more pronounced for people who are “in tune” with these signals. More broadly, there is widespread consensus that the amygdala and vmPFC play a key role in representing and associating rewards with actions (see e.g. Hiser & Koenigs, 2018, for a review). The SMH doesn’t offer a complete account of the role of emotion in decision-making, but the development and testing of the hypothesis has helped advance our understanding and clarify our thinking about this topic.