goal-directed behaviour and habits Flashcards
stimuli to response learning
S-R habit learning
stimuli to outcomes learning
S-O pavlovian learning
response to outcome learning
A-O goal-directed learning
instrumental learning
a change in beh produced by a causal relationship between the beh & a biologically important stim
S-R habit learning
- thorndike’s “law of effect”
- satisfying consequences (reinforcers) strengthen the connection between a stim & the response
- annoying consequences (punishers) weaken the connection between a stim & the response
- thorndike developed this theory from how quickly cats could escape from a puzzle box
- presentations of the stim elicit the instrumental action as a response
instrumental actions
- action-outcome (A-O) learning
- “human actions are those behaviours that persons have chosen to perform, and perform for a reason” (Greve, 2001)
- actor has an intention to execute the beh
- has a belief about the causal relation between the action and the outcome
- beh produces a desired outcome
- an intention forms when the belief & the desire are both present
- cog theories implement this as a practical inference
adams & dickinson (1981)
outcome value
- instrumental training - rats trained to press a lever to obtain a reward (brown pellet) whilst a second reward (sugar pellet) was presented noncontingently
- outcome devaluation - for some rats (D-N) the brown pellet, earned during training, was paired with LiCl, whilst for other rats (N-D) the noncontingent pellet was paired with LiCl
- extinction test - rats in both groups were tested for lever pressing in extinction
- during extinction, rats in the D-N group pressed less than rats in the N-D group –> sensitivity to outcome devaluation
- reinforced tests revealed that rats in both groups had learned the aversion in the devaluation phase
satiety specific outcome devaluation
- 3 phases
- training - rats pressed a lever & obtained one type of pellet
- before the test in extinction, rats were pre-fed with either the same pellet they had worked for or a diff pellet
- lever presses during extinction: less instrumental beh when rats were pre-fed with the pellet experienced during training than when pre-fed with the diff pellet
contingency degradation
- initially, each time the animal performs the action it gets a pellet
- the relation is then degraded by having some actions not followed by a pellet
- e.g. a 50% relationship between action and pellet
- alternatively, unpaired (noncontingent) outcomes: give free pellets that do not depend on the action
hammond (1980)
contingency degradation
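- rats are sensitive to the probability that an action produces an outcome, P(O|A)
- increasing the probability of free outcomes, P(O|no A), reduces responding: the effect of one probability depends on the other
- contingency is calculated from both probabilities: ΔP = P(O|A) − P(O|no A)
- instrumental conditioning is linked to the two probabilities, not just to A-O pairings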
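A minimal Python sketch of the contingency calculation, assuming contingency is expressed as ΔP = P(O|A) − P(O|no A) and that behaviour is scored in discrete time periods. The function name `delta_p` and all counts are hypothetical illustrations, not Hammond's data.

```python
def delta_p(outcomes_with_action, action_periods, outcomes_without_action, no_action_periods):
    """Contingency (delta-P) between an action and an outcome.

    Hypothetical helper: arguments are counts of time periods, not data from Hammond (1980).
    """
    p_o_given_a = outcomes_with_action / action_periods            # P(O|A)
    p_o_given_no_a = outcomes_without_action / no_action_periods   # P(O|no A)
    return p_o_given_a - p_o_given_no_a


# every action rewarded, no free pellets: perfect positive contingency
print(delta_p(50, 50, 0, 100))    # 1.0
# 50% of actions rewarded, no free pellets: still a positive contingency
print(delta_p(25, 50, 0, 100))    # 0.5
# degraded: free pellets as likely as earned ones, so the action makes no difference
print(delta_p(25, 50, 50, 100))   # 0.0
```

The last case shows why free outcomes degrade the contingency even though half of the actions are still followed by a pellet.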
variables that determine whether beh is G-D or habitual
- amount of training
- schedules of reinforcement
- choice
- contiguity
S-R and A-O learning
- A-O and S-R learning can be revealed by diff amounts of training
- adams (1982) trained rats to lever press for sucrose pellets, but he varied the amount of training (100 vs 500 lever presses)
- then devalued the outcome & tested in extinction & on reacquisition
- during test, rats with minimal training (100) showed goal-directed beh: less pressing in the devalued group relative to the non-devalued control
- rats that received extended training (500) showed no ev of sensitivity to devaluation, indicative of habits
schedules of reinforcement
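- ratio schedules (outcome earned after a number of responses) model an environment in which resources are constantly replenished (unlimited)
- interval schedules (outcome available to the first response after an interval has elapsed) model an environment with depleting resources that regenerate after a fixed or variable interval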
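A rough Python simulation of the two schedule types, assuming 1-s time steps, a random-ratio 10 (RR-10) and a random-interval 30 s (RI-30) schedule. The function `simulate`, these parameters and the pressing probabilities are hypothetical. It illustrates the key difference: on the ratio schedule, rewards scale with how often the rat presses, whereas on the interval schedule pressing faster earns little more because the reward has to regenerate.

```python
import random


def simulate(schedule, press_prob, seconds=3600, ratio=10, interval=30, seed=0):
    """Return (presses, rewards) for a simulated rat on a random-ratio or random-interval schedule.

    Hypothetical assumptions: time runs in 1-s steps; the rat presses with
    probability `press_prob` each second; on RR-`ratio` each press is rewarded
    with probability 1/ratio (resources never run out); on RI-`interval` a reward
    becomes available with probability 1/interval per second and then waits for
    the next press (resources deplete and regenerate).
    """
    if schedule not in ("ratio", "interval"):
        raise ValueError("schedule must be 'ratio' or 'interval'")
    rng = random.Random(seed)
    presses = rewards = 0
    armed = False  # interval schedule only: is a reward waiting to be collected?
    for _ in range(seconds):
        if schedule == "interval" and not armed and rng.random() < 1 / interval:
            armed = True  # the resource has regenerated
        if rng.random() < press_prob:
            presses += 1
            if schedule == "ratio":
                if rng.random() < 1 / ratio:
                    rewards += 1
            elif armed:
                rewards += 1
                armed = False  # collected: depleted until it regenerates
    return presses, rewards


for schedule in ("ratio", "interval"):
    for press_prob in (0.1, 1.0):
        presses, rewards = simulate(schedule, press_prob)
        print(f"{schedule:<8} press_prob={press_prob}: {presses} presses, {rewards} rewards")
```

Under these assumptions, a tenfold increase in press rate should raise ratio rewards roughly tenfold but interval rewards only modestly (exact counts depend on the seed). This weak link between response rate and reward rate on interval schedules is the kind of difference the rate-correlation account (perez & dickinson, 2020) uses to explain why interval training favours habits.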
dickinson et al. (1983)
schedules of reinforcement
- compared random ratio and variable interval schedules of reinforcement
- used an outcome devaluation procedure in which they paired the O with sickness (LiCl)
- outcome devaluation - pellet present during training was paired with LiCl in groups devalued (D) but not in control groups (N)
- followed by a test of lever pressing in extinction
- only rats trained on a ratio schedule showed an outcome devaluation effect (goal-directed)
- interval-trained rats: response rate lower, but still not 0 & unaffected by devaluation –> habitual
amount of training - choice
- Colwill & Rescorla (1985) attempted to replicate Adams's (1982) findings & failed to find an effect of amount of training
- they trained rats concurrently on two levers, each producing a different outcome
- they tested in a choice procedure
- after training, one of the outcomes was devalued
- amount of training was varied across the different levers
- two levers & two rewards give a choice, which seems to minimise habit formation
kosaki & dickinson (2010)
choice
- the noncontingent group had only one lever
- both groups received the same total amount of reinforcer, obtained in diff ways
- rate of responding was lower when rats had a choice
- all rats learned though
- training with a single lever replicated Adams's results: after extended training, devaluation didn't matter (habitual)
- with choice, rats were still goal-directed
contiguity - temporal closeness between A-O
- most research investigating action-outcome learning has focused on immediate consequences
- actions followed by delayed outcomes may better capture many of the decisions and actions that we make in our day-to-day activities
- foraging animals make choices for consequences delayed in time
- humans save for retirement, or for children’s education
- scientists have ideas, pursue findings, do the research, and then publish the research many years down the line
Urcelay & Jonkman (2019)
contiguity
- assessed whether A-O contiguity has an effect on sensitivity to satiety-specific outcome devaluation (OD)
- trained to press lever in 2 diff contexts with diff levers and pellets
- one context: pellets followed immediately after lever press
- other context: pellets presented 20s after a lever press
- during outcome devaluation, one of the pellets was pre-fed & rats were immediately tested for lever pressing in extinction
- repeated 4 times so that all rats were prefed with same or diff pellets in each context
- outcome devaluation effect in the immediate context
- no effect in the delayed context –> delayed outcomes facilitate habit formation
computational accounts
perez & dickinson (2020)
- goal-directed system: computes the response-outcome rate correlation, which determines current G-D strength
- habit system: accumulates habit strength from summed reward prediction-errors
- the two strengths are summed
- the sum determines whether a beh is emitted or not
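A heavily simplified Python sketch of the two-system idea, assuming: G-D strength = rate correlation between per-period response and outcome counts × the outcome's current value; habit strength updated by a delta rule on reward prediction error; and the two simply added. All function names, the learning rate `alpha` and the numbers are hypothetical; this is not perez & dickinson's published model, which is specified in more detail.

```python
import math


def rate_correlation(responses, outcomes):
    """Pearson correlation between per-period response counts and outcome counts.

    Returns 0.0 when either series is constant, since the correlation is undefined.
    """
    n = len(responses)
    mean_r = sum(responses) / n
    mean_o = sum(outcomes) / n
    cov = sum((r - mean_r) * (o - mean_o) for r, o in zip(responses, outcomes))
    var_r = sum((r - mean_r) ** 2 for r in responses)
    var_o = sum((o - mean_o) ** 2 for o in outcomes)
    if var_r == 0 or var_o == 0:
        return 0.0
    return cov / math.sqrt(var_r * var_o)


def goal_directed_strength(responses, outcomes, outcome_value):
    # G-D strength is scaled by the outcome's *current* value,
    # so devaluation (value -> 0) removes it immediately
    return rate_correlation(responses, outcomes) * outcome_value


def update_habit(habit, reward, alpha=0.1):
    # delta rule: each rewarded response adds a fraction of the prediction error;
    # the outcome's current value is not consulted, so devaluation leaves habit intact
    prediction_error = reward - habit
    return habit + alpha * prediction_error


def response_strength(goal, habit):
    # the two systems are simply summed to drive responding
    return goal + habit


# hypothetical training record: outcomes track responses (ratio-like), 50 rewarded responses
responses, outcomes = [1, 2, 3], [2, 4, 6]
habit = 0.0
for _ in range(50):
    habit = update_habit(habit, reward=1.0)

before = response_strength(goal_directed_strength(responses, outcomes, 1.0), habit)
after = response_strength(goal_directed_strength(responses, outcomes, 0.0), habit)
print(f"before devaluation: {before:.3f}, after devaluation: {after:.3f}")
```

After devaluation only the habit component remains, so responding is reduced but persists. If outcomes did not track responses (e.g. a constant outcome rate, as on interval schedules), `rate_correlation` would be near 0 and habit would dominate even before devaluation.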
habit account of OCD
- Gillan et al. (2014) used an avoidance task & compared OCD patients & controls in their sensitivity to outcome devaluation
- devaluation tested after a small amount of training
- & again after extended training
- training: no difference in skin conductance responses to the stimuli between controls & OCD patients
- early devaluation test: both groups reduced responding for the side that was disconnected, showing they understood the task & devaluation procedure
- devaluation after extended training: smaller devaluation effect in OCD patients, who were more likely to have an urge to respond
a habit account of drug addiction
- drugs of abuse exert their effects through neural systems involved in feeding & sexual behs
- everitt et al. (2001) proposed that drug addiction can (in part) be understood as a transition from goal-directed to habitual beh that is exacerbated by the drug’s effects on similar neural systems as food
- dopamine release in ventral striatum
- drug-seeking develops into a habit quicker than responding for a food reinforcer
corbit et al. (2012)
habit account of drug addiction
- trained rats to press lever for small amount of beer
- kept training with either beer or sucrose
- after one week they were goal-directed
- after two weeks they were still goal-directed
- after four weeks they were no longer goal-directed
- over time behaviour became habitual
- even without training in between (bottom two graphs) this pattern is still seen
- so the loss of goal-directed control is not because rats became used to the devaluation test
non-contingent alcohol facilitating habit formation for sucrose
corbit et al. (2012)
- press for sucrose
- behaviour goal-directed after two and eight weeks
- rats worked for sucrose, but were also given ethanol after training
- after 8 weeks, responding for sucrose in rats also given ethanol had become habitual