Reasoning and decision making Flashcards
Lecture 1: Judging Probabilities
We will focus on some of the ways in which intuitive probability judgments violate the prescriptions of probability theory, and what these patterns of response reveal about how people estimate probabilities. Two broad approaches will be considered:
* The idea that probability (and other) judgments are sometimes “biased” because people use simplifying strategies that reduce effort but are prone to error
* The idea that we should consider the ecological context in which judgments are made, and that apparent biases may be rational responses given the informational and cognitive constraints of the human decision-maker
Two approaches to how people make probability judgments
The fallibility of probability judgements was central to the Heuristics and Biases research program developed by Amos Tversky and Daniel Kahneman in the 1970s.
The “two systems”/ “heuristics and biases” approaches suggest that probability judgments (and other kinds of judgment) are biased/violate rational prescriptions because the human judge has adopted a quick-and-dirty strategy rather than a more effortful consideration of relevant material.
An alternative framework emphasizes the role of ecological conditions and informational constraints on people’s judgments and decisions. We will consider two examples.
The “Availability” Heuristic
If an event happens a lot then it will be easy to think of many past instances, so basing judgments on availability is sensible and people’s frequency judgments are often very accurate. However, availability can entail bias if:
1. our experience of past events does not reflect their true frequencies, or
2. events are easy to recall for some reason other than their frequency of occurrence.
One observation that is often taken as evidence for the availability heuristic is that people commonly overestimate the frequency or probability of rare events and underestimate common ones. For example, Lichtenstein et al. (1978) had participants estimate the number of US deaths per year due to 40 causes ranging from very rare (e.g., botulism, with one death per 100 million people) to very common (e.g., stroke: 102 000 deaths per 100 million people).
As shown in the graph below, participants systematically over-estimated the rare causes and under-estimated the common ones (the curve shows the best-fitting relationship between real and estimated frequencies; the straight line shows what would happen if people’s judgments were accurate).
This pattern is often attributed to availability: rare events are often given disproportionate publicity and are correspondingly more mentally-available than their environmental frequency would merit.
However:
* The bias here is in the environment (the over-reporting of rare events) rather than a consequence of a flawed estimation strategy.
* This kind of effect does not directly demonstrate availability-based judgment, because no assessment of the ease-of-retrieval has been made.
* The tendency to over-/under-estimate rare and common events can be explained in other ways. In particular, it can be seen as an instance of the general central tendency of judgment, whereby estimates for extreme items are biased towards the mean of the set. This central tendency is widespread and can be seen as an optimizing strategy – when one is uncertain, guessing the mean of the distribution is sensible – without invoking availability (a minimal illustration follows below).
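A minimal sketch (not from the lecture) of why guessing near the mean is a reasonable default under uncertainty; the lognormal "true frequencies" are an arbitrary choice for illustration:

```python
import random

random.seed(1)

# Hypothetical "true" frequencies for a set of items (e.g., causes of death).
true_values = [random.lognormvariate(3, 1) for _ in range(1000)]
set_mean = sum(true_values) / len(true_values)

def mean_sq_error(guess, values):
    """Average squared error if the same guess is offered for every item."""
    return sum((guess - v) ** 2 for v in values) / len(values)

# With no item-specific information, guessing the mean of the set minimises
# average squared error; guessing an extreme value does far worse.
for guess in (set_mean, min(true_values), max(true_values)):
    print(round(guess, 1), "->", round(mean_sq_error(guess, true_values), 1))
```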
A stronger demonstration of the use of the availability heuristic comes from Tversky and Kahneman (1973). Participants listened to a list of 39 names. In one condition the names comprised 19 famous men and 20 less famous women; in another condition it comprised 19 famous women and 20 less famous men.
After listening to the list, some participants had to write down as many names as they could recall; others were asked whether the list contained more names of men or of women.
* In the recall task, participants retrieved more of the famous names (12.3 out of 19) than the non-famous names (8.4 out of 20). That is, famous names were more available.
* Crucially, 80 out of 99 participants judged the gender category that contained more famous names to be more frequent. (E.g., the people given a list of 19 famous men and 20 less famous women reported that there were more men than women in the list).
It seems that people made their proportion estimates by assessing the ease with which examples of each come to mind. When one category was easier to retrieve (via the fame manipulation) it was judged more frequent, even when it was actually experienced less often.
The Conjunction Fallacy
The availability heuristic is posited to produce judgments that deviate from the rules of probability theory. A basic axiom of probability theory is that the probability of event “A” cannot be less than the probability of the conjunction “A and B”. However, subjective probability estimates sometimes violate this principle, demonstrating the conjunction fallacy.
For example, Tversky and Kahneman (1983) gave some participants the following problem:
* “In four pages of a novel (about 2,000 words), how many words would you expect to find that have the form _ _ _ _ i n g (seven-letter words that end with “ing”)?”
Other participants were asked:
* “How many words would you expect to find that have the form _ _ _ _ _ n _ (seven-letter words with n as the penultimate letter)?”
All ing words have n as the penultimate letter, so the number of n words must be at least as large as the number of ing words. However, participants violated this principle: they estimated, on average, 13.4 ing words but only 4.7 n words.
Tversky and Kahneman (1983) took this as evidence that people are basing their judgments on the mental availability of relevant instances: it is easy to think of “ing” words (for example, by thinking of words that rhyme) but we are less accustomed to organizing/retrieving words based on their penultimate letter, so n words are harder to retrieve and thus seem rarer. If/when participants apply a more systematic mental search strategy, we would expect the conjunction fallacy to disappear.
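The set relation that makes this a fallacy can be checked directly; a toy illustration with a made-up word list (not from the paper):

```python
import re

# A made-up mini word list; the real question concerned a 2,000-word passage.
words = ["bringing", "nothing", "seventy", "pending", "morning", "ceiling",
         "defined", "moments", "warning", "lending", "husband", "evening"]

seven_letter = [w for w in words if len(w) == 7]
ing_words = [w for w in seven_letter if w.endswith("ing")]
penultimate_n = [w for w in seven_letter if re.fullmatch(r".{5}n.", w)]

# Every _ _ _ _ i n g word also has n as its penultimate letter, so the second
# set can never be smaller than the first.
assert set(ing_words) <= set(penultimate_n)
print(len(ing_words), "ing words;", len(penultimate_n), "words with penultimate n")
```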
Base Rate Neglect
Similarity-based judgments are insensitive to prior probabilities: the extent to which I look like the sort of person who might be proficient at ballet is independent of the proportion of ballet dancers in the population, for example.
So, judgments based on representativeness will be largely independent of base rates.
In one demonstration, Kahneman and Tversky (1973) told participants that a panel of psychologists had interviewed a number of engineers and lawyers and produced character sketches of each person. They were told that 5 such descriptions had been randomly selected and that they should rate, from 0-100, the likelihood that each sketch described one of the engineers (rather than one of the lawyers).
Some participants were told that the population from which the descriptions were drawn consisted of 30 engineers and 70 lawyers. Others were told that the population comprised 70 engineers and 30 lawyers. That is, Kahneman and Tversky manipulated the base rates for the two possible outcomes.
Below is an example personality sketch:
“Jack is a 45 year old man. He is married and has four children. He is generally conservative, careful, and ambitious. He shows no interest in political and social issues and spends most of his free time on his many hobbies which include home carpentry, sailing, and mathematical puzzles. The probability that Jack is one of the 30 [or 70, depending on the condition] engineers in the sample of 100 is ______ %”
The descriptions varied in how similar they were to the stereotypical lawyer/engineer.
* Crucially, people judged the probability that Jack is an engineer to be much the same when the description was purportedly drawn from a population of mostly engineers as when it was drawn from a population of mostly lawyers.
This is an example of base rate neglect: the personality description might provide some information about Jack’s likely occupation, but this should be combined with information about the number of engineers and lawyers in the population from which his description was randomly drawn. However, people ignored these base probabilities. Kahneman and Tversky argue that:
1. People assess the extent to which the description of Jack is similar to (or representative of) each of the two categories – lawyers and engineers.
2. To the extent that Jack is more similar to the stereotypical engineer, he is more likely to be judged an engineer.
3. Because this assessment of similarity is independent of the prevalence of lawyers and engineers in the population, the resulting probability judgment is independent of the base rates for these two professions.
More direct evidence for the role of representativeness comes from Kahneman and Tversky (1973), who gave participants the following personality sketch:
“Tom W. is of high intelligence, although lacking in true creativity. He has a need for order and clarity, and for neat and tidy systems in which every detail finds its appropriate place. His writing is rather dull and mechanical, occasionally enlivened by somewhat corny puns and by flashes of imagination of the sci-fi type. He has a strong drive for competence. He seems to have little feeling and little sympathy for other people and does not enjoy interacting with others. Self-centred, he nonetheless has a deep moral sense.”
They were also given a list of 9 academic subject areas (e.g., computer science).
* The prediction group was told that the sketch of Tom was prepared by a psychologist during Tom’s final year in high school, and that Tom is now a graduate student. They were asked to rank the 9 academic subjects by the probability that Tom W. is specializing in that topic.
* The base-rate group was not shown the Tom W. sketch but “consider[ed] all first year graduate students in the US today” and indicated the percentage of students in each of the 9 subject areas – that is, they estimated the base rates for each subject area.
* The representativeness group ranked the 9 subject areas by the degree to which Tom W. “resembles a typical graduate student” in that subject area.
Across the 9 subjects, probability judgments were very highly correlated with representativeness judgments (r = .97) but negatively correlated with base-rate judgments (r= -.65). That is, predictions were based on how representative people perceive Tom W. to be of the various fields, and ignored the prior probability that a randomly-selected student would belong to those fields (base rate neglect).
The “Representativeness” Heuristic
Kahneman and Tversky also suggested that people use a representativeness heuristic. The idea is that:
* when estimating a probability – for example, how likely it is that a person belongs to a particular category or the probability that an observed sample was drawn from a particular population – people assess the similarity between the outcome and the category (or between the sample and the population).
Suppose that you meet a new person at a party and try to estimate the probability that he or she has tried internet dating. The idea is that you base your judgment on the similarity between the person and your stereotype of internet-daters – that is, on the extent to which the person is representative of the category “people who have tried internet dating”.
More generally, the representativeness heuristic involves “an assessment of the degree of correspondence between a sample and a population, an instance and a category, an act and an actor or, more generally, between an outcome and a model.” (Tversky & Kahneman, 1983, p. 295).
As with the availability heuristic, we can see evidence for this strategy by looking at the biases and axiom-violations that creep into people’s intuitive judgments.
The “Anchor-and-Adjust” Heuristic
So far we have considered how people use information about the target quantity to reach a probability or frequency estimate. Our judgments are also shaped by candidate response values. In particular, anchoring refers to the assimilation of a numeric estimate towards another value, the anchor.
Anchors can come from many sources. Often, our own recent judgments serve as anchors for the current estimate. For example, Matthews and Stewart (2009) had people estimate the prices of shoes from Top Shop; the judgments on trial n positively correlated with the judgments on trial n-1 for 26 out of 28 participants.
Anchors can also be externally provided, and ostensibly irrelevant. In a famous demonstration, Tversky and Kahneman (1974) spun a wheel of fortune that landed on 10 (a low anchor) for one group of participants and on 65 (a high anchor) for another group. Participants were asked whether the proportion of African countries in the United Nations was more or less than the anchor, and then asked for their best estimate of the true value. The median estimate was 25 in the low anchor condition and 45 in the high anchor condition – that is, the participants’ judgments were pulled towards the anchor values.
Similarly, Chapman and Johnson (1999) had people write down the last 2 digits of their social security number and treat it as a probability (e.g., “14%”). Participants were asked whether the probability that a Republican would win the 1996 US Presidential Election was more or less than this probability, prior to giving their best estimate of the true probability. The larger the anchor, the larger the best estimate, with a correlation of r = 0.45.
The most famous account of anchoring effects is the “anchor-and-adjust” heuristic; the idea is that we use the anchor as an initial estimate of the target value and adjust from that starting point in the right direction; because the adjustment is effortful, we often adjust insufficiently and so our judgment is biased towards the anchor value.
This probably happens sometimes, but there are contraindications. For example, in the “wheel of fortune” task described above, warning people about anchoring effects and/or giving them an incentive to be accurate often has little effect on the extent to which people anchor on the provided value (e.g., Epley & Gilovich, 2005), which doesn’t fit with the idea that the anchoring effect reflects a “lazy” or “intuitive” judgment system that can be over-ridden by effortful deliberation.
Other mechanisms that might contribute towards anchoring/assimilation effects include:
* The idea that consideration of the anchor as a possible value for the estimated quantity activates relevant semantic knowledge (e.g., when considering 12% as a possible value for the probability of a Republican win, we call to mind relevant information about the state of the economy, public perceptions of the candidates, etc.; this activated knowledge then shapes or biases our final estimate; Chapman & Johnson, 1999)
* The idea that an anchor value changes our perception of the magnitude of other candidate values (e.g., if we’ve just been thinking about a 12% probability, 50% seems quite large; if we’ve been considering 88%, 50% seems quite small; Frederick & Mochon, 2011).
* The idea that externally-presented anchors may be seen as a “hint” or suggestion, even if they are ostensibly uninformative (after all, doesn’t the fact that the experimenter is getting me to consider a number generated by a wheel of fortune suggest that they want me to be influenced by it in some way?)
These possibilities are not mutually exclusive – and note that they do not all fit with the idea that anchoring stems from the application of quick-and-easy-but-biasing heuristics.
Ecology and Adaptation – Example 1: Natural Frequency Formats
One example of an “ecological” argument comes from the effect of natural frequencies on base rate neglect. In the “two systems”/“heuristics and biases” view, the problems caused by using availability or representativeness as the basis for probability judgments can be overcome by evoking “System 2” – i.e., by employing a more effortful processing strategy. Consistent with this, there is evidence that people can discount a potentially-misleading but readily-accessible cue such as stimulus familiarity (e.g., Oppenheimer, 2003). But can we do anything other than alert people to possible bias and/or tell them to put more effort into their judgment?
Some researchers argue that people do much better at probability tasks when the information is presented in a way that matches our supposed “evolved” cognitive capacities for handling this kind of information. In particular, it has been argued that humans evolved to process frequencies (counts) obtained by sampling the environment, rather than normalized probabilities.
For example, consider the following problem:
“For a woman at age 40 who participates in routine screening, the probability of breast cancer is 1%. If a woman has breast cancer, the probability is 80% that she will have a positive mammogram. If a woman does not have breast cancer, the probability is 10% that she will still have a positive mammogram.
Imagine a woman from this age group with a positive mammogram. What is the probability that she actually has breast cancer?”
When Eddy (1982) gave this problem to physicians, 95 out of 100 gave estimates between 0.70 and 0.80. An estimate of 80% demonstrates the inverse fallacy: it confuses the probability of a positive test result given the presence of cancer, p(positive|cancer), with the probability of cancer given the positive test result, p(cancer|positive). These probabilities are not the same: the chances of cancer given a positive test depend on the base rate (prior probability) of cancer in the population. A positive test is more likely to indicate cancer when cancer is widespread than when it is very rare. But the physicians (and most people) tend to ignore this base rate information.
Probability theory tells us how we should update our beliefs (e.g., that a person has cancer) in the light of new information. Suppose we have a hypothesis H and know the prior probability that it is true, p(H), and the probability that it is false, p(not-H) = 1 − p(H). We then encounter some new data, D. The conditional probability of obtaining those data under the hypothesis is p(D|H). (That is, p(D|H) is the probability of obtaining these data if the hypothesis is true.)
Bayes’ theorem tells us how we should update our beliefs to give the posterior probability that H is true, given our prior belief and the new data:
p(H|D) = p(D|H) p(H) / [ p(D|H) p(H) + p(D|not-H) p(not-H) ]
In the cancer example:
* H is the hypothesis that the person has cancer
* p(H) is the base rate of cancer in the population (the prior probability that a randomly selected person has cancer) and equals 0.01
* p(not-H) is the prior probability that a person does not have cancer and equals 0.99
* p(D|H) = p(positive|cancer) is the probability of getting a positive test result given that the person has cancer, and equals 0.8
* p(D|not-H) = p(positive|no cancer) is the probability of getting a positive test result given that the person does not have cancer, and equals 0.1
Thus:
p(cancer|positive) = (0.8 × 0.01) / (0.8 × 0.01 + 0.1 × 0.99) = 0.008 / 0.107 ≈ 0.075
In other words, given the positive test result the probability that the person has cancer is still only 7.5%.
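A minimal sketch of the same calculation in code (the function name is mine; the numbers are those given in the problem):

```python
def posterior(prior, p_pos_given_cancer, p_pos_given_no_cancer):
    """Bayes' theorem for a binary hypothesis: returns p(cancer | positive)."""
    p_positive = p_pos_given_cancer * prior + p_pos_given_no_cancer * (1 - prior)
    return p_pos_given_cancer * prior / p_positive

# Base rate 1%, hit rate 80%, false-positive rate 10% (values from the problem).
print(posterior(0.01, 0.80, 0.10))  # ~0.0748, i.e. roughly 7.5%
```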
Gigerenzer and colleagues have argued that probabilities only make sense when conceived as long-run frequencies, and that it does not make sense to talk about the probability of a one-off event (e.g., that a given person has a disease). Rather, Gigerenzer argues that humans evolved to keep track of event frequencies, estimated over time by “natural sampling” (i.e., encountering different types of events and remembering the number of times they occur).
Correspondingly, if we re-express the diagnosis problem in terms of natural frequencies (number of events of each type) rather than normalized probabilities, then people should find it much easier.
Consider this re-expression of the previous problem:
“Ten out of every 1000 women at age 40 who participate in routine screening have breast cancer. Of these ten women with breast cancer, eight will have a positive mammogram. Of the remaining 990 women without breast cancer, 99 will still have a positive mammogram.
Imagine a group of 40 year old women with positive mammograms. How many of them actually have breast cancer? ____ out of _____”
As Hoffrage and Gigerenzer (1998) note, now the answer can easily be “seen” to be 8 out of 107 = 7.5%. They found that only 8% of physicians answered correctly (gave a judgment within 5% of the true value) in the original wording of the task, but that this increased to 46% with the natural frequency format.
More generally, this representation of the problem means that the answer is simply the number of true positives divided by the total number of positives; there is no need to keep track of the base rate, explaining base-rate neglect when problems are presented in standard probability format. In other words, the task is difficult in the original version because the use of normalized probabilities (which necessitate the explicit incorporation of base rates/priors) deviates from how we “naturally” evaluate chance.
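In the natural-frequency format the same answer reduces to a single division over counts; a sketch using the numbers from the re-expressed problem:

```python
# Counts from the natural-frequency version: 8 of the 10 women with cancer and
# 99 of the 990 women without cancer test positive.
true_positives = 8
false_positives = 99

# No separate base-rate term is needed; the counts already embody it.
print(true_positives / (true_positives + false_positives))  # ~0.075
```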
Similar evolutionary ideas have been advocated by others (e.g., Cosmides and Tooby, 1996), but there are alternative explanations for why the natural frequency format makes the task easier. In particular, it has been suggested that it simply clarifies the set relations between the various event categories, and that any manipulation which achieves this will have the same beneficial effect.
Irrespective of the basis for the natural frequency format effects, some authors have argued that base rate neglect is not as common as the heuristics-and-biases programme would have us believe (e.g., Koehler, 1994). Quite often in natural contexts, people are sensitive to prior probabilities and update their beliefs appropriately.
Example 2: The Gambler’s Fallacy and Hot Hand Fallacy
A different illustration of potential “ecological rationality” comes from a consideration of how people judge the probability of streaks of a random outcome. For a sequence of independent events, the probability of a particular outcome is the same irrespective of what has happened in the past: the probability of getting “heads” from a fair coin is the same after a run of 3 heads as after a run of 3 tails. However, subjective probabilities often violate this independence axiom.
For example, Croson and Sundali (2005) examined roulette betting patterns from a Nevada casino. They focused on “even money” outcomes (where the two outcomes are equally likely, such as “red or black” or “odd or even”; if you bet on the right outcome, you get back twice what you staked) and looked at bets as a function of the number of times that an outcome had occurred in a row (e.g., a streak of two would mean that the last two spins both came up red or both came up black).
The graph below shows the proportion of bets that were “with” (white bars) and “against” (black bars) the streaks. As the run-length increased, people were increasingly likely to bet that the next outcome would be the opposite of the streak; after a run of 6 or more, 85% of bets were that the streak would end, even though this probability remains fixed at .50.
The belief that a run of one outcome increases the probability of another (when the events are actually independent) is called the gambler’s fallacy.
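The independence claim itself is easy to check by simulation; a minimal sketch with a simulated fair coin (not from the lecture):

```python
import random

random.seed(0)
flips = [random.choice("HT") for _ in range(1_000_000)]

# Outcome immediately following a run of three heads vs three tails.
after_hhh = [flips[i] for i in range(3, len(flips)) if flips[i - 3:i] == ["H", "H", "H"]]
after_ttt = [flips[i] for i in range(3, len(flips)) if flips[i - 3:i] == ["T", "T", "T"]]

# Both proportions come out close to .50: the coin has no memory.
print(after_hhh.count("H") / len(after_hhh))
print(after_ttt.count("H") / len(after_ttt))
```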
The gambler’s fallacy is often attributed to the representativeness heuristic: people expect a “local” sequence to be representative of the underlying process (Tversky & Kahneman, 1974). I know that a coin should, in the long run, produce equal numbers of heads and tails, so I expect any sequence of coin tosses to have this property. A run of heads means that a tails outcome will make the local sequence more representative of the data-generating process.
The gambler’s fallacy is widespread, but sometimes people show the opposite tendency by believing that a streak elevates the probability that the same outcome will occur again. In a famous demonstration, Gilovich et al. (1985) found that basketball players’ shooting accuracy was independent of their recent performance: the probability of scoring was the same after a run of “baskets” as after a run of “misses”. However, basketball fans believed that a player’s next shot was more likely to score after a run of successful shots than after a run of misses – a so-called “belief in the hot hand” or “hot hand fallacy”.
Gilovich et al.’s statistical analysis has been questioned (it is hard to establish that the outcomes of each basketball shot really are independent events), but the hot hand fallacy has been found in situations where the success of consecutive attempts really cannot be any guide to future performance. For example, Ayton and Fischer (2004) had people play a roulette-style game and found that their confidence in their predictions for the next outcome was greater after a run of successful predictions – even though the probability of them being right next time must be independent of past success because the roulette spins are random.
Belief in the hot hand has again been attributed to the representativeness heuristic: a run of one outcome doesn’t seem representative of randomness, leading people to conclude that the underlying process is not random (Gilovich et al., 1985).
Some researchers have objected that it is problematic to use the same mechanism to “explain” two completely contradictory findings (belief that a streak will end in the GF and that it will continue in the HH).
Ayton and Fischer (2004) therefore offered an alternative account, based on ecological considerations. Their argument runs:
* Many physical processes involve sampling without replacement, which results in diminishing probability for a given outcome the more times that it has occurred. For example, if you rummage blindly in your cutlery drawer for spoons, removing the spoons as you find them, then the probability that the next item will be a spoon decreases as your hunt progresses.
* Correspondingly, the GF reflects an over-generalization of this ecologically-sensible principle to other random, mechanical processes – e.g., roulette wheels and coin tosses – about which we have very limited experience.
* By contrast, many aspects of intentional human performance really do show positive recency. If you practice a new game, your shooting success will increase. So the hot hand fallacy can be seen as an appropriate generalization of this principle to situations which also use human performance, but where the outcome probabilities are in fact independent.
In support of these ideas, Ayton and Fischer (2004) presented sequences of outcomes with varying alternation rates (AR; a low AR means the next outcome is unlikely to be different from the last, giving many long runs of one outcome; a high AR means lots of short runs). Participants had to judge which of two processes generated each sequence (e.g., a series of basketball shots or a sequence of coin tosses). As the streak length increased, participants were more likely to attribute the sequence to intentional human performance like basketball than to a random mechanical process like coin-flipping.
A related but distinct account comes from elegant work by Hahn and Warren (2009). With an infinitely long sequence of coin flips, all sequences of a given length occur with equal probability – for example, the sequence HHHH will occur with the same frequency as HHHT, so believing that a run of heads means it’ll be tails next time is indeed a fallacy. However, Hahn and Warren noted that humans do not experience or remember infinitely-long sequences – and for shorter sequences, the probabilities of encountering HHHT and HHHH are not equal. In one illustration, Hahn and Warren simulated 10,000 sequences of 10 coin flips. The pattern HHHH only appeared in about 25% of the sequences, whereas HHHT occurred in about 35% of the simulated samples. In other words, if we had 10,000 people each of whom had experienced 10 flips of a fair coin, it would be perfectly reasonable for more of them to expect a sequence HHH to end with a T than with another H.
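A sketch of the kind of simulation Hahn and Warren describe (the exact counts will vary slightly with the random seed):

```python
import random

random.seed(0)
n_sequences, length = 10_000, 10

def contains(seq, pattern):
    """True if the pattern occurs anywhere in the sequence."""
    return any(seq[i:i + len(pattern)] == pattern for i in range(len(seq) - len(pattern) + 1))

counts = {"HHHH": 0, "HHHT": 0}
for _ in range(n_sequences):
    seq = "".join(random.choice("HT") for _ in range(length))
    for pattern in counts:
        if contains(seq, pattern):
            counts[pattern] += 1

# HHHT shows up in more of the short sequences than HHHH, because HHHH overlaps
# with itself and so its occurrences cluster together.
print(counts)
```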
This work provides one example of a broader principle – namely, that the supposed “fallacies” of human judgment and decision-making are often perfectly rational given the finite and imperfect information afforded by the environment and our limited mental capacities.
Conclusion for judging probability
Conclusions
We have identified several key phenomena and ideas:
1. Human probability judgments do not always follow the laws of probability, and these deviations illuminate the judgment process.
2. One broad framework posits the use of simplifying strategies that reduce effort at the expense of sometimes introducing bias. In particular, people sometimes simplify judgments by substituting an easier-to-evaluate entity for the target dimension: the availability and representativeness heuristics are two examples.
3. Judgments often assimilate towards anchor values. There are many types of anchor and many mechanisms that underlie this assimilation.
4. We can also consider probability judgments in their ecological context. One idea is that humans evolved to process frequencies, not normalized probabilities, although this interpretation of frequency-format effects is debatable.
5. Likewise, we can see phenomena such as the gambler’s fallacy as reflecting ecological experience with different types of generating process.
6. We have focused on probability judgments, but these kinds of ideas and effects apply to many other kinds of judgment.
Lecture 2: Reasoning
We will discuss how human reasoning deviates from the prescriptions of formal logic in a range of tasks, and how the systematic patterns of success and failure on these tasks inform our theorizing about the mental operations that underlie human reasoning. Four broad approaches will be considered:
1. People may solve reasoning problems by using simple heuristics (rules of thumb) rather than engaging actual reasoning processes
2. People often make “errors” because the use of language in formal logic differs from that of everyday life
3. The “Mental Models” framework provides an example of an algorithmic description of the steps by which people reason
4. Responses in reasoning tasks are highly sensitive to the framing of the task and the participant’s background beliefs
Two types of reasoning
Inductive reasoning involves drawing general conclusions from particular instances. For example, given the premise “I have fallen asleep in every Psychology lecture so far”, one might draw the conclusion “I will always fall asleep in Psychology lectures”. Inductive reasoning takes many forms and is central to scientific research, but the conclusions are not necessarily true; there is always the possibility that the next Psychology lecture will manage to hold your attention throughout.
Deductive reasoning involves drawing conclusions which follow necessarily from the premises; if we accept that the premises are true, and if the argument follows the rules of logic, then the conclusion has to be true, too.
We will focus on two kinds of deductive reasoning – propositional reasoning and syllogistic reasoning.
Two types of deductive reasoning
propositional reasoning and syllogistic reasoning.
Syllogistic reasoning
The study of Aristotelian syllogisms (also known as quantified syllogisms) offers another window onto the psychological processes that underlie reasoning. Syllogisms typically comprise two premises and a conclusion, and involve the quantifiers all, no, some, and some…not.
The following is an example:
All people who teach psychology are psychologists
Jon teaches psychology
Therefore, Jon is a psychologist
Such arguments may be valid or invalid. Validity is determined by the structure of the argument – the relations between the premises and the conclusion. A valid argument is one where, if one accepts the truth of the premises, then the conclusion is also true. The above example is a valid argument. Of course, one might not accept the premises (in fact, Jon doesn’t have a degree in psychology; he just works as a psychologist), but that doesn’t change the validity.
The combination of quantifiers (all, no, some, some…not) and order of terms (e.g., all a are b vs all b are a) gives a total of 512 two-premise syllogisms, most of which are regarded by logicians as invalid.
Studies of syllogistic reasoning typically present the two premises and either ask participants “what follows?” or present a conclusion and have them indicate whether it is valid or invalid.
Imperfect performance
Despite their simple structure, syllogistic reasoning problems can be very hard. For example, in a review of the literature, Roberts and Sykes (2005) found that problems of the form: “all a are b; all b are c; what follows?” were correctly solved by 88% of participants (valid conclusion: “all a are c”). However, given a problem of the form: “all b are a; all b are c; what follows?” only 8% of participants correctly concluded that “some a are c” (or, equivalently, that “some c are a”).
By studying how structural features of the problem change performance, we can try to develop models of how people go about solving this kind of problem. We consider four approaches to understanding performance on these kinds of reasoning task.
Four approaches to understanding performance on syllogistic reasoning tasks
Approach 1: Identify simplifying strategies
Approach 2: Focus on interpretation of the terms
Approach 3: Posit a sequence of processing steps – the “Mental Models” framework
Approach 4: Consider the role of framing and experience
Approach 1: Identify simplifying strategies
One suggestion is that many people do not actually engage in any reasoning at all when confronted with syllogistic reasoning problems. Rather, they may base their responses on simple heuristics.
An early example is atmosphere theory, according to which the mood of the premises influences judgments about what the mood of the conclusion should be. “Mood” means whether the statement is affirmative or negative, and whether it is universal or particular. (E.g., “all…” is universal and affirmative, whereas “some are not…” is particular and negative).
Begg and Denny (1969) gave participants 64 reasoning problems comprising two premises and a choice of four conclusions. For example:
All a are b
All b are c
Conclusion options: All c are a; Some c are a; No c are a; Some c are not a
Participants indicated which if any of the 4 conclusions followed from the premises. Nineteen of the 64 problems had a valid solution among the 4 options presented; the authors focussed on responses for the other 45 problems, where choosing any of the options constituted an error.
* When both premises were positive, 79% of conclusions endorsed were positive
* When at least one premise was negative, 73% of chosen conclusions were negative
* When both premises were universal, 77% of chosen conclusions were universal
* When at least one premise was particular, 90% of chosen conclusions were particular
So this is evidence that the “atmosphere” (quality and quantity) of the premises shapes beliefs about the validity of different possible conclusions – e.g., universal premises lead people to assert universal conclusions.
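The atmosphere predictions can be written as a tiny rule of thumb; a sketch in which the mood labels are my own shorthand rather than anything from Begg and Denny:

```python
def atmosphere_prediction(premise_moods):
    """Predict the conclusion mood that the atmosphere of the premises favours.

    Each mood is a (quality, quantity) pair, e.g. ("affirmative", "universal")
    for "All a are b" or ("negative", "particular") for "Some a are not b".
    """
    qualities = [quality for quality, _ in premise_moods]
    quantities = [quantity for _, quantity in premise_moods]
    quality = "negative" if "negative" in qualities else "affirmative"
    quantity = "particular" if "particular" in quantities else "universal"
    return quality, quantity

# "All a are b" + "Some b are not c" -> predicted conclusion: negative and particular.
print(atmosphere_prediction([("affirmative", "universal"), ("negative", "particular")]))
```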
Crucially, however, atmosphere theory fails to explain how participants decide whether or not a syllogism has a valid conclusion at all. When participants are given two premises and asked “what follows?”, they correctly identify that there is no valid conclusion 29-40% of the time (Roberts & Sykes, 2005). The idea that their responses are guided purely by the “atmosphere” of the premises doesn’t capture this.
Approach 2: Focus on interpretation of the terms
“Errors” in syllogistic reasoning partly reflect differences between the use of language in formal logic and in everyday life.
For example, consider two arguments:
VALID:
All A are B
All B are C
Therefore, All A are C

INVALID:
All A are B
All C are B
Therefore, All A are C
If I take “All C are B” to mean “All C are B and vice-versa”, then the invalid argument on the right is equivalent to the valid one on the left, and it would be fine to accept the conclusion. Likewise, in logic “Some” means “at least one, and possibly all”, but in everyday speech we typically use “Some” to mean “Some but not all”.
In one demonstration, Ceraso and Provitera (1971) presented wooden blocks and had people reason about their properties. In the “traditional” version of the task, people were given syllogisms such as:
All blocks with holes are red
All blocks with holes are triangular
Only 1 out of 40 people correctly identified “Some red blocks are triangular” as the valid inference; more than half endorsed “All red blocks are triangular”, which is what we’d expect if they take “All A are B” to imply “All B are A”.
In a modified version of the task, people were given more explicit instructions about the interpretation of the premises, such as:
Whenever I have a block with a hole it is red, but not all red blocks have holes
Whenever I have a block with a hole it is triangular, but not all triangular blocks have holes
The proportion of people who correctly responded “Some red blocks are triangular” rose to 27 out of 40. Across a number of such problems, people scored an average of 58% correct with the traditional format but 94% correct with the modified versions.
So, these authors argue that syllogistic reasoning errors arise because people don’t properly apprehend the premises in the way that the experimenter intends. However, it is unlikely that premise misapprehension accounts for the full spectrum of performance on this kind of task.