Planning Flashcards
- 2 Long-term or short-term maximizing: Do animals plan ahead?
- 2.1 Delayed reinforcement, impulsiveness, and temporal discounting
The psychological literature on learning and choice suggests that animals seldom anticipate events more than a few seconds or minutes in the future (W. Roberts 2002). Even a small delay between response and reinforcer has a devastating effect on the rate of learning (see Bouton 2007). Learning with delayed reinforcers can be improved in various ways, for example by introducing one or more stimulus changes between response and reinforcer, but the delays that can be bridged in this way are generally minutes at the most.
Even the knowledge that free food will come later does not decrease rats’ willingness to work quite hard for food available in the present (Timberlake 1984). In Section 11.1 we have seen other evidence of apparent insensitivity to long-term gain, for instance in the suggestion that animals choose among options on the basis of the delay to the next scheduled reward, that is, short-term rather than long-term E/T. Nowhere is this more evident than in so-called self-control experiments, experiments which in fact demonstrate exactly the opposite, namely impulsiveness, also referred to as preference for immediacy or temporal myopia (W. Roberts 2002).
By analogy with situations in which people might exhibit self control by, say, rejecting a beer now in the interest of a safe drive home later, subjects in experiments
on self control choose between a short delay to a relatively small reward and a longer delay to a larger reward. In the typical design, diagrammed in Figure 11.12, the total durations of trials are equated across options by adjusting the delay between reward
and the beginning of the next trial.
Thus the long delay/large reward option gives more reward per trial and thereby maximizes intake over a session even when it gives the same E/T as the small reward/short delay option if T is measured from trial onset to reward. Most animals that have been tested strongly prefer a short/small option. For example, given the choice between 2 seconds of eating after a 2-second wait and 6 seconds of eating after a 6-second wait, pigeons and rats choose the short/small option on 97% and 80% of trials respectively (Tobin and Logue 1994), but cynomolgus monkeys show self control with similar parameters (Tobin et al. 1996) as do human adults (Tobin and Logue 1994).
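The arithmetic behind this design can be sketched as follows. The delays and eating times are the ones given in the example above; the 30-second total trial duration is a hypothetical value chosen only for illustration, since the text says only that trial durations are equated across options.

```python
from fractions import Fraction

# Seconds of waiting and seconds of eating, taken from the example in the text.
OPTIONS = {
    "short/small": {"delay_s": 2, "reward_s": 2},
    "long/large": {"delay_s": 6, "reward_s": 6},
}

# Hypothetical total trial duration (not from the text); both options are
# padded to this length by adjusting the delay after the reward.
TRIAL_S = 30


def short_term_rate(option):
    """E/T with T measured from trial onset to reward."""
    return Fraction(option["reward_s"], option["delay_s"])


def long_term_rate(option, trial_s=TRIAL_S):
    """E/T with T measured over the whole equated trial."""
    return Fraction(option["reward_s"], trial_s)


for name, option in OPTIONS.items():
    print(name, "short-term:", short_term_rate(option),
          "long-term:", long_term_rate(option))
```

With these numbers the two options are tied on the short-term currency, but the long/large option yields three times as much eating per unit of session time, whatever common trial duration is assumed.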
Although humans are not necessarily very good at delaying gratification in real-life economic situations (Fehr 2002), the dramatic failures of other species to show even modest self control (or, equivalently, patience) in laboratory tests suggest that we may be better at it than most other animals. In a direct test of this notion, Jeffrey Stevens and colleagues (Stevens, Hallinan, and Hauser 2005; Rosati et al. 2007) tested humans, marmosets, tamarins, chimpanzees, and bonobos with the delay to the larger reward titrated until a long/large option was chosen as often as an immediate/small one.
Most interestingly, when chimpanzees and university students chose between two food items immediately and six items delayed two minutes, the chimpanzees chose the delayed reward on over 70% of trials, whereas the students chose it on fewer than 20% of trials. When money was substituted for food, however, the students chose the delayed reward on nearly 60% of trials. The authors conclude from the differences between monkeys on the one hand and apes and humans on the other that ‘‘core components of the capacity for future-oriented decisions’’ are shared across the ape/human lineage (Rosati et al. 2007, 1663).
The effect of money versus food on humans’ choices should serve as a reminder that conclusions about species differences here, as elsewhere, must be based on multiple tests. When food is involved, species differences in impulsiveness may be related to body weight, with smaller animals being more impulsive (Tobin and Logue 1994), or to feeding ecology, with animals that catch active prey like moving insects
being more impulsive (Stevens, Hallinan, and Hauser 2005).
The ability to delay reward may also be important in social contexts. In theory, some reciprocal social
exchanges involve performing a costly altruistic act in anticipation of the favor being returned hours or days later (see Chapter 12), but the few relevant data provide little evidence that reciprocal social exchanges could be based on such self control (Stevens and Hauser 2004). For example, chimpanzees wait at most eight minutes to exchange a small piece of cookie for a large one (Dufour et al. 2007).
But all the data mentioned here are from animals waiting (or not) for food. Self-control might be more evident with resources that are unlikely to disappear or be lost to competitors, such as a water hole or a safe shelter within an animal’s territory. A psychological explanation for impulsiveness is that rewards are discounted in value the more they are delayed. Discounting can be tracked in a titration procedure
(see Section 11.1.4) to discover how much immediate food is psychologically equivalent to a given amount of delayed food.
Consistent with self-control experiments, this procedure reveals very steep discounting functions for rats and pigeons. For example, to a pigeon food delayed 2–4 seconds is worth less than half the same amount of immediate food (Green et al. 2004). An informal functional explanation is that psychological discounting is an adaptation to the uncertainty of the future: delayed rewards should be devalued in proportion to the probability that they will decay, be lost to competitors, or the like.
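One way to make this functional argument concrete, as a sketch only: assume (this is not stated in the text) that a delayed reward of size $A$ is lost at a constant rate $\lambda$ per second, where $\lambda$ is a hypothetical hazard parameter. Devaluing the reward by the probability that it is still available after a delay $D$ then gives an exponential discounting function.

```latex
\begin{align}
P(\text{reward still available at delay } D) &= e^{-\lambda D} \\
V(D) &= A \, e^{-\lambda D} \\
V(D_{1/2}) = \tfrac{1}{2} A \quad &\Longleftrightarrow \quad D_{1/2} = \frac{\ln 2}{\lambda}
\end{align}
```

Under this assumption, a reward that has lost more than half its value after 2 seconds would imply $\lambda > \ln 2 / 2 \approx 0.35$ per second, that is, food that on average remains available for less than about 3 seconds.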
As the adage has it, ‘‘a bird in the hand is worth two in the bush.’’ But the discounting functions just described seem more extreme than any natural situation demands. One alternative explanation is that the short-term rate maximization they reflect does lead to maximizing long-term intake rates in natural situations, as when choosing whether to stay in a patch or move on (Stephens, Brown, and Ydenberg 2004).
Thus discounting is consistent with evidence in
Section 11.1 that animals can behave much as predicted by models based on maximizing long-term E/T even when responding on the basis of shorter-term currencies. In any case, none of the evidence summarized here indicates that animals anticipate events days or even hours in the future.
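The steep discounting and the titration procedure described in this section can be illustrated with a minimal sketch. It assumes a hyperbolic discounting function, V = A / (1 + kD), a form commonly used in the discounting literature but not given in the text. The discount rate `k = 0.6` per second is hypothetical, chosen only so that a 2-second delay cuts value to less than half, as in the pigeon example; it is not fitted to any data. The titration is simplified to a bisection on the immediate amount with a noise-free simulated subject, whereas real procedures adjust amounts across trials from actual choices.

```python
def hyperbolic_value(amount, delay_s, k):
    """Subjective value of `amount` delivered after `delay_s` seconds.

    Assumed hyperbolic form; `k` is a hypothetical discount rate per second.
    """
    return amount / (1.0 + k * delay_s)


def titrate_immediate_equivalent(amount, delay_s, k, n_steps=20):
    """Adjust an immediate amount until the simulated subject is indifferent
    between it and `amount` delayed by `delay_s` seconds."""
    low, high = 0.0, amount
    delayed_value = hyperbolic_value(amount, delay_s, k)
    for _ in range(n_steps):
        immediate = (low + high) / 2.0
        if delayed_value > immediate:
            low = immediate    # subject picks the delayed option: raise the offer
        else:
            high = immediate   # subject picks the immediate option: lower the offer
    return (low + high) / 2.0


K = 0.6  # hypothetical discount rate, per second
for delay in (2, 4):
    equivalent = titrate_immediate_equivalent(1.0, delay, K)
    print(f"delay {delay} s: immediate equivalent = {equivalent:.3f} of the delayed amount")
```

The value returned is the fraction of the delayed amount that the simulated subject treats as equivalent when offered immediately; plotting it against delay would trace out the discounting function.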
11.2 Long-term or short-term maximizing: Do animals plan ahead?
Learning and memory allow animals to behave in ways that prepare them for the future, but without any explicit representation of the future (or the past) as such. For
example, conditioned responses express present knowledge about stimuli experienced in the past. Similarly, animals may respond adaptively to recurring daily or seasonal cues by migrating, hibernating, building nests, or caching food, but members
of each new generation do so before experiencing the consequences of their behavior and thus presumably without foreseeing those consequences.
The Monarch butterflies that fly from Canada to Mexico each fall are the grandchildren or even more distant descendants of the last Monarchs that made the trip south, and they find the wintering grounds without ever having contact with experienced individuals. Food-caching bird species whose development has been studied begin to cache early in development even when hand-raised (Clayton 1994), and as adults food-storers express a compulsion to cache without regard to consequences (de Kort et al. 2007).
Given the power of selection together with learning to produce future-oriented behaviors that do not demand interpretation as planning, what would behavioral
evidence for future planning look like? This question hardly arose in comparative cognition research until recently, when Suddendorf and Corballis (1997) drew attention to a claim by Bischof and Kohler that animals, unlike adult humans, are cognitively ‘‘stuck in the present’’ (see also W. Roberts 2002). They dubbed this
claim, that no animal engages in mental time travel, recreating the past and imagining the future, the Bischof-Kohler hypothesis.
By itself such a hypothesis is meaningless, even empty, because we have no direct access to animals’ mental events. But this consideration has not discouraged attempts to demonstrate planning for the future in nonhuman species. As with studies of animal episodic-like memory (Chapter 7), the convincingness of these demonstrations depends on how well they fit a clear set of behavioral criteria.
Suddendorf and Busby (2005) proposed that to be evidence of planning, a behavior or combination of behaviors must be novel (thus ruling out conditioned responses, migration, and the like), and it must function in the service of a motivational state other than one the animal is in at the time of performing it. For instance, like a shopper heading to the supermarket after an ample dinner, an animal that can plan would amass resources against future hunger or thirst even while sated.
This criterion helps to rule out behaviors acquired or maintained through long-delayed reinforcement, assuming they would not continue without the relevant motivation. Another criterion that rules out gradual learning with delayed reinforcement is that the
behavior should be shown reliably as soon as the required information is provided.
Finally, planning-like behavior should not be domain-specific but be capable of being expressed in more than one context. Even human children do not show behavior that fits all these criteria until they are four or five years old (Suddendorf and Busby 2005), and so far no other animal has been shown to do so. For instance, planning a route among familiar sites can be seen (cf. Chapter 8 and Janson 2007) as choosing among present stimuli, the cues visible from the animal’s present location, which are associated with different delays, energy expenditures, and/or amounts of reward.
In a study conceived as a test of the Bischof-Kohler hypothesis, Naqshbandi and Roberts (2006) allowed monkeys or rats to choose between two quantities of food, dates for the monkeys and raisins for the rats. The animals naturally preferred the larger amount, but eating so many dates or raisins at once demonstrably made the animals thirsty. To test whether they could foresee their future thirst while sated with water, the animals were then exposed to a regime in which water was removed from their cages when the foods were offered but returned sooner on trials when the smaller amount was chosen.
With the monkeys, choice of the larger number of dates did fall after about 6 trials and in the one monkey tested it recovered when baseline conditions were reinstated. Rats did not show a comparable effect, but
they showed only a weak and variable preference for the larger quantity in the first place. In any case, the fact that preference changed gradually, if at all, means this
example fails the test of planning and suggests that delayed reinforcement or punishment was operating in some way.
In an experiment very much like one suggested by Tulving (2005) as a test of mental time travel for children, Mulcahy and Call (2006) tested whether bonobos and orangutans save tools for future use. The animals first learned to use a tool to get grapes from a dispenser. Then while the apparatus was blocked they had opportunity to choose one object from a collection of objects in the test room and take it to an adjoining chamber, where they waited for an hour before being readmitted to the test room with the apparatus available.
All animals tested took a tool on some occasions, but their performance was very spotty. For instance, one orangutan took a tool four times in a row on the first eight trials and then not again till trial 14. An anthropocentric viewpoint, that is, folk psychology, would seem to predict that once an animal understood that planning ahead is helpful it would plan on every trial. Moreover, in this task how often a particular tool is taken at random most likely depends on the alternative objects offered and how often they have been paired with food or otherwise used in the recent past, and this was neither well specified nor investigated here.
And finally, because using the tool resulted in a treat of grapes that the animals presumably always desired, planning for a future need was not tested (Suddendorf
2006). Somewhat stronger evidence comes from a similar study (Osvath and Osvath 2008) in which two chimpanzees and an orangutan nearly always chose a tube for sucking juice from a container an hour before opportunity to use it. Whether the animals were planning or simply taking the object most strongly associated with food was addressed by making a piece of favorite fruit one of the options. All animals still
chose the tube first on at least half the trials.
Moreover, when a second choice of fruit versus tube was given after an animal already had a tube, all animals chose the fruit. They also showed appropriate choice of a stick tool. A next step with studies of this
kind would be to offer multiple functional tools and seek evidence that choice anticipates a specific future task. However, if such behavior were found, it would be necessary to show how it was different from a conditional discrimination based on present cues to what tool can be used next.
In any case, the candidate that so far fits the largest number of criteria for planning comes not from choice of tools but from food storing in scrub jays (Raby et al. 2007). The birds in this study lived in large cages with three compartments (‘‘rooms’’; Figure 11.13). After first acquiring information about which room had food in the
morning, they behaved as if planning for breakfast by caching food items in the evening where they were most likely to be needed.
For example, in Experiment 1 they first experienced three cycles of a treatment in which they received ‘‘breakfast’’ of pine nuts in the morning in one end room; in the other end room, no breakfast was
provided until 2 hours after daylight. In the test, for the first time whole pine nuts were provided in the central room in the evening along with sand-filled trays for
caching in the two end rooms. The birds cached three times as many pine nuts in the ‘‘no breakfast room’’ as in the ‘‘breakfast room.’’
Importantly, all the data came from this first test, before the birds had experienced the consequences of their choices. Similarly, birds learned to expect breakfast in both rooms in the second experiment, peanuts in one and dog kibble in the other. On their first opportunity to cache peanuts and dog kibble in the evening, they distributed their caches so as to provide each room
with more of the food it usually lacked.
Although this study was greeted (e.g., by Shettleworth 2007) as an advance over earlier ones with primates, it lacks a control for the possibility that scrub jays were
expressing a natural tendency to spread out caches of a given food type irrespective of information about how this would determine what they had to eat the next day
(Premack 2007; Suddendorf and Corballis 2008b; but see Clayton et al. 2008). For an animal that caches different kinds of items (and that, as we saw in Chapter 7, can remember what it cached where), a strategy of distributing items of each type as widely as possible would help to defeat predators that might raid just one of those types.
In any case, the birds’ hoarding here is not clearly behavior for a future need because although they could both eat and cache during the test they may have been
somewhat hungry. Correia and colleagues (Correia, Dickinson, and Clayton 2007) used stimulus-specific satiety to address this issue. Birds were sated on one of two foods, peanuts or dog kibble, by prefeeding them with it just before opportunity to cache both foods. Such prefeeding selectively suppresses not only consumption but caching of the prefed item.
Here, however, some birds were additionally prefed the
alternative item just before opportunity to recover their caches. If they could foresee that they would not want this item at the time of retrieval, they should suppress
caching of it initially rather than of the item they were just prefed. Although the findings of this study appear under a title proclaiming positive results, the birds
cached so little in the test trials, some of them not at all, that the best conclusion here is ‘‘provocative but not proven.’’ Moreover, even if more substantial data were
consistent with those published so far, they can be interpreted as a novel and subtle adaptation of the food-hoarding system rather than evidence for a more domain-general ‘‘mental time travel’’ (Premack 2007; Suddendorf and Corballis 2008b).
As with episodic memory (Chapter 7), an important impetus for the set of studies reviewed here is the wish to understand the neural substrate of human mental time travel. Mentally recreating the past and imagining the future turn out to share neural underpinnings in normal adults, and patients with impaired episodic memory may also have difficulty thinking about the future (Addis, Wong, and Schacter 2007).
None of this seems very surprising. Both conscious and unconscious memory presumably were selected in the first place because they allow past experience to
influence future behavior. Indeed, it seems plausible that the adaptive value of episodic memory, in the sense of ‘‘mental time travel’’ into the past, lies entirely in allowing its possessor to imagine and thus plan for the future.
Autonoetic consciousness and concomitant future planning may indeed be uniquely human (Suddendorf and Corballis 2008a), but other animals clearly share with humans multiple kinds of future-oriented behavior (W. Roberts et al. 2008; Raby and Clayton 2009). Notwithstanding the challenge laid down by the Bischof-Kohler hypothesis, a more productive way forward may be to look for the components of planning, which species show them, and under what conditions (Raby and Clayton 2009).