Estimating a reinforcement learning model with attention-weighted decisions reveals that paying full attention to the relevant factor is not necessary for success
By Genela Morris and Anna Rubinchik
Location: Bloomfield 424
Wednesday 26 April 2017, 10:30 - 11:30
Reinforcement learning (RL) describes the process by which, over a series of trial-and-error attempts, actions that culminate in a reward are strengthened. When the actions are based on sensory cues, an association is formed between the sensory cue, the action and the reward. Computational, behavioral and neurobiological accounts of this process have been very successful in explaining simple stimulus-response learning. However, real-life cues are often complex combinations of multidimensional stimuli. In such cases, the task of assigning reward relevance to the correct feature of the cue is not trivial, and the underlying cognitive process is poorly understood. To tackle this question we adapted an intra-dimensional / extra-dimensional set-shifting paradigm to train rodents on a multidimensional sensory discrimination task. In our setup, stimuli of different modalities (spatial, olfactory and color) are combined into complex cues and manipulated independently. In each set, only a single stimulus dimension is relevant for reward. To distinguish between learning and decision-making, we combined an adapted classical reinforcement learning model with a decision rule that chooses an alternative according to a weighted average of learnt values, where the weights reflect the assumed relevance of each dimension and can be associated with attention. This model outperformed a simple reinforcement learning model over all available stimuli. Examination of the parameters that best describe the animals' behavior reveals that a high success rate can be achieved with a surprisingly low weight on the relevant dimension. Such low weights are particularly prominent after an extra-dimensional shift phase in the animals' training, offering an explanation for the delayed learning in this phase and suggesting an added value to staying alert to future rule changes in the environment.
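The attention-weighted decision rule described in the abstract can be sketched as follows. This is a minimal illustrative model, not the authors' fitted implementation: the parameter values (learning rate `alpha`, softmax inverse temperature `beta`, attention weights `phi`) and the task structure (three stimulus dimensions with two features each, dimension 0 relevant) are assumptions chosen for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative task: 3 stimulus dimensions (e.g. spatial, olfactory, color),
# 2 features per dimension; only dimension 0 determines reward.
N_DIMS, N_FEATURES = 3, 2
alpha, beta = 0.2, 5.0                  # learning rate, softmax inverse temperature
phi = np.array([0.5, 0.25, 0.25])       # attention weight per dimension (sums to 1)
                                        # note: the relevant dimension gets only 0.5

Q = np.zeros((N_DIMS, N_FEATURES))      # learnt value of each feature in each dimension

def choose(options):
    """Pick an alternative by softmax over attention-weighted feature values.

    options: array (n_alternatives, N_DIMS) of feature indices per dimension.
    The value of an alternative is the phi-weighted average of the Q-values
    of its features -- the decision rule from the abstract.
    """
    values = np.array([phi @ Q[np.arange(N_DIMS), opt] for opt in options])
    p = np.exp(beta * values)
    p /= p.sum()
    return rng.choice(len(options), p=p)

def update(option, reward):
    """Standard delta-rule update applied to every feature of the chosen option."""
    for d in range(N_DIMS):
        Q[d, option[d]] += alpha * (reward - Q[d, option[d]])

# Simulate: on each trial two alternatives differ on the relevant dimension
# (features 0 vs 1) and carry random features on the irrelevant dimensions.
rewards = []
for _ in range(1000):
    options = np.array([[0, rng.integers(2), rng.integers(2)],
                        [1, rng.integers(2), rng.integers(2)]])
    c = choose(options)
    r = 1.0 if options[c, 0] == 0 else 0.0   # reward only tracks dimension 0
    update(options[c], r)
    rewards.append(r)

late_accuracy = sum(rewards[-200:]) / 200
```

Even with only half of the attention weight placed on the relevant dimension, the simulated agent's late-trial success rate is high, which is the sense in which full attention to the relevant factor is not necessary for success.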