Credit Assignment Flashcards
What is the credit assignment problem?
The credit assignment problem asks how credit for an outcome should be distributed over the sequence of steps that led to it, so that the sequence yielding maximal reward (the optimal policy) can be identified.
Credit assignment equation
$V_i = p_1 V_1 + p_2 V_2 + \dots$
What does the optimal policy in credit assignment need to account for?
The optimal policy needs to take into account the current state (whose value is $V_i$), the value of each state reached by a step ($V_n$), and the probability ($p$) of each state transition.
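As a minimal sketch of the weighted sum in the credit assignment equation, the snippet below computes the value of a state from its successor states. The function name `state_value`, the (probability, value) input format, and the numbers are assumptions made for illustration; none of them come from the cards.

```python
def state_value(transitions):
    """Return the probability-weighted sum of successor state values.

    `transitions` is a list of (probability, value) pairs. This input
    format is an assumption made for this illustration.
    """
    total_p = sum(p for p, _ in transitions)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("transition probabilities must sum to 1")
    return sum(p * v for p, v in transitions)


# Hypothetical example: two successor states with made-up numbers.
v_i = state_value([(0.25, 4.0), (0.75, 8.0)])
print(v_i)  # → 7.0
```

The likelier transition dominates the result, so the value of the current state sits closer to 8 than to 4.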
How will an algorithm determine the optimal policy?
The algorithm balances two kinds of decision: it exploits by choosing the steps currently estimated to give the greatest reward, and it explores by trying other paths to learn their values. Both are needed to find the optimal policy.
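One common way to trade off exploiting and exploring is an epsilon-greedy rule. The card does not name this rule, so treat the following as an illustrative sketch rather than the method the card has in mind. The three-action task, its reward means, and all parameter values are hypothetical.

```python
import random


def epsilon_greedy(estimates, epsilon, rng):
    """Pick an action index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))  # explore: any action at random
    # Exploit: the action with the highest current value estimate.
    return max(range(len(estimates)), key=lambda a: estimates[a])


def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Learn action values from noisy rewards on a hypothetical task."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)
    counts = [0] * len(true_means)
    for _ in range(steps):
        a = epsilon_greedy(estimates, epsilon, rng)
        reward = rng.gauss(true_means[a], 1.0)  # noisy reward for action a
        counts[a] += 1
        # Incremental running average of the rewards seen for this action.
        estimates[a] += (reward - estimates[a]) / counts[a]
    return estimates, counts


estimates, counts = run_bandit([0.2, 0.5, 1.0])
print(estimates, counts)
```

With `epsilon=0` the agent never explores and can lock onto whichever action looked best early on. With `epsilon=1` it never uses what it has learned.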
What is the temporal credit assignment problem?
The temporal credit assignment problem takes into consideration that the value of the reward diminishes with increasing time through the discount factor (gamma). Gamma tends to 1 if the value of the reward minimally diminishes over time and tends to 0 if the value of the reward greatly diminishes over time.
What is the equation for the temporal credit assignment problem?
$V(t) = r(t) + \gamma V(t+1)$
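The recursion can be worked through backwards over a short episode to see how a late reward is credited to earlier time steps. This is a sketch that assumes a finite episode with a value of 0 after the final step. The function name `discounted_values` and the reward sequence are illustrative.

```python
def discounted_values(rewards, gamma):
    """Compute V(t) = r(t) + gamma * V(t+1) backwards over a finite episode.

    Assumes the value after the final step is 0 (an assumption of this sketch).
    """
    values = [0.0] * len(rewards)
    next_value = 0.0
    for t in reversed(range(len(rewards))):
        values[t] = rewards[t] + gamma * next_value
        next_value = values[t]
    return values


# Hypothetical episode: a single reward of 1 delivered at the last step.
rewards = [0.0, 0.0, 1.0]
print(discounted_values(rewards, gamma=0.5))  # → [0.25, 0.5, 1.0]
print(discounted_values(rewards, gamma=1.0))  # → [1.0, 1.0, 1.0]
```

With gamma at 1 every earlier step receives full credit for the final reward. With gamma at 0.5 the credit halves for each step further from the reward.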
Outline the processes involved in perceptual decision making
A decision variable evolves until a stopping mechanism commits the process to a particular choice (Schall, 2001).
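A decision variable with a stopping rule can be sketched as a noisy accumulator that runs until it reaches a bound. This is a generic illustration, not the model from the cited work. The function name and the parameters (`drift`, `bound`, `noise_sd`, `max_steps`) and their default values are assumptions.

```python
import random


def accumulate_to_bound(drift, bound=1.0, noise_sd=0.1, max_steps=10000, seed=0):
    """Integrate noisy evidence until the decision variable hits a bound.

    Returns (choice, steps): choice is +1 or -1 for the bound reached,
    or 0 if neither bound was reached within max_steps.
    """
    rng = random.Random(seed)
    dv = 0.0  # the decision variable
    for step in range(1, max_steps + 1):
        dv += drift + rng.gauss(0.0, noise_sd)  # add momentary evidence
        if dv >= bound:
            return 1, step  # stopping rule: commit to the positive choice
        if dv <= -bound:
            return -1, step  # stopping rule: commit to the negative choice
    return 0, max_steps


choice, steps = accumulate_to_bound(drift=0.02)
print(choice, steps)
```

In this sketch, stronger evidence (a larger `drift`) tends to reach the bound in fewer steps, and a higher `bound` trades speed for more accumulated evidence before committing.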
What first evidence suggested that the LIP was involved in decision making?
Studies demonstrating that neuronal firing in the LIP could predict the direction, timing and magnitude of saccadic eye movements in monkeys viewing random-dot motion for reward (Shadlen and Newsome, 2001).
What variation to initial LIP studies suggested that LIP encodes temporal properties?
When the visual stimulus was presented only briefly, LIP activity persisted for up to 800 ms until a decision was made, which corresponded to the saccade (Huk and Shadlen, 2005). This indicates that the LIP is involved in temporal integration of the response.
What have the spike rates of LIP neurones been interpreted as?
The spike rates of single LIP neurons have been interpreted as direct neural correlates of an evolving decision variable (Gold and Shadlen, 2007).
What regions besides the LIP have been implicated in decision making?
Neuronal firing in the middle temporal visual area (MT) has been implicated in representing the motion stimulus (Britten et al., 1993).
Outline a model for decision making by LIP
Together, these properties have given rise to a model where LIP neurons either integrate, or reflect the integration of, motion evidence from area MT in favour of a decision.
What study opposes LIP as the decision maker?
Silencing the LIP does not impair decision making (Katz et al., 2016), which suggests that direction-related signals in the LIP may be a result of feedback or extensive training.