On the computational and neural characterisation of reward learning behaviour

Ban, Kitti (2024) On the computational and neural characterisation of reward learning behaviour. PhD thesis, University of Glasgow.


Do we learn differently from better- or worse-than-expected decision outcomes? Over the past decades, converging evidence has emerged about the crucial role of the dopaminergic system in guiding learning through signalling reward prediction errors. However, a complete characterisation of how this learning process is influenced by feedback valence, surprise, and uncertainty is still lacking. The current thesis focuses on exploring the differential behavioural and neural mechanisms related to learning from positive versus negative decision outcomes whilst examining the influence of uncertainty on these processes.
In our first experiment, we collected simultaneous EEG and eye-tracking data during a probabilistic reversal learning task. Using multivariate EEG analysis, we replicated the two distinct spatiotemporal reward learning systems reported by Fouragnan and colleagues (2015). Given that locus-coeruleus-noradrenaline (LC-NA) activity is difficult to directly measure non-invasively in humans, we used the pupil response as a proxy for LC-NA activity. We showed that the increased feedback-related pupil response to negative compared to positive outcomes is exclusively driven by increased negative feedback processing in both the early and the late system. Additionally, a stronger coupling between early, but not late, system activity and the feedback-evoked pupil response was linked to reduced performance and to increased uncertainty and exploration propensity. In line with existing research implicating the LC-NA network in uncertainty signalling and network resets, we propose that when internal estimates of environmental uncertainty surge in response to negative feedback, the early system, regulated by noradrenergic activity, interrupts processing in structures of the late system. Such network resets may aid flexible adaptation to changing environments by simultaneously reducing the influence of learned value representations and increasing the neural gain of new information.
Our second experimental chapter extended the above study by examining post-feedback response adaptation as a function of early and late system activity. Specifically, we utilised hierarchical drift diffusion modelling, in which the drift rate and boundary separation were constrained by trial-wise and valence-specific early and late system activity. We hypothesised that an LC-NA-induced interruption in reward learning structures would reduce subsequent evidence accumulation as learned value representations become less influential and participants consider a reversal in reward contingencies more likely. Consistent with this hypothesis, we found that increased negative feedback processing by the early and late systems reduced evidence accumulation in the next trial. Furthermore, a stronger association between the feedback-locked pupil response and early system activity following negative outcomes was significantly associated with the degree of drift rate reduction prompted by the early system. This result implies that LC-NA-mediated network resets may be primarily associated with the early system, which in turn may down-regulate late system activity.
Our final study explored differential value learning in the Balloon Analogue Risk Task (BART) under varying levels of uncertainty. By deriving differential learning rates from the newly developed Scaled Target Learning model, we showed that participants preferentially learn from positive compared to negative feedback under increased levels of uncertainty. Furthermore, the degree of this learning bias was negatively related to performance under the highest level of uncertainty. These results provide further evidence for differential mechanisms implicated in positive and negative feedback processing and indicate the important modulatory role of uncertainty in reward learning.
Together, this thesis provides novel insights into the valence-specific neural and behavioural characteristics associated with feedback processing. Our results also highlight the important modulatory role uncertainty and noradrenaline play in reward learning and thus provide a more complete depiction of reward learning behaviour.
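
The idea of differential learning rates mentioned in the abstract can be illustrated with a minimal sketch. The code below is a generic delta-rule learner with separate learning rates for positive and negative reward prediction errors; it is not the Scaled Target Learning model developed in the thesis, and the names AsymmetricLearner, alpha_pos, alpha_neg and learning_bias, as well as the starting value of 0.5, are assumptions introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class AsymmetricLearner:
    """Delta-rule learner with valence-specific learning rates.

    Illustrative only: a generic asymmetric update, not the thesis's model.
    """

    alpha_pos: float  # learning rate after better-than-expected outcomes
    alpha_neg: float  # learning rate after worse-than-expected outcomes
    value: float = 0.5  # initial expected reward (assumed starting point)

    def update(self, reward: float) -> float:
        """Update the value estimate and return the reward prediction error."""
        rpe = reward - self.value  # positive if outcome beat the expectation
        alpha = self.alpha_pos if rpe >= 0 else self.alpha_neg
        self.value += alpha * rpe
        return rpe


def learning_bias(alpha_pos: float, alpha_neg: float) -> float:
    """Normalised asymmetry in [-1, 1]; positive values favour positive feedback."""
    return (alpha_pos - alpha_neg) / (alpha_pos + alpha_neg)


if __name__ == "__main__":
    learner = AsymmetricLearner(alpha_pos=0.4, alpha_neg=0.1)
    for outcome in (1.0, 0.0):  # one rewarded trial, then one unrewarded trial
        rpe = learner.update(outcome)
        print(f"outcome={outcome:.0f}  rpe={rpe:+.2f}  value={learner.value:.2f}")
    print(f"bias={learning_bias(0.4, 0.1):.2f}")
```

With alpha_pos larger than alpha_neg, the value estimate rises quickly after a reward and falls only slightly after an omission, which is one simple way to express a preference for learning from positive feedback.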

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Subjects: R Medicine > RC Internal medicine > RC0321 Neuroscience. Biological psychiatry. Neuropsychiatry
Colleges/Schools: College of Medical Veterinary and Life Sciences > School of Psychology & Neuroscience
Supervisor's Name: Lages, Dr. Martin and Philiastides, Professor Marios
Date of Award: 2024
Depositing User: Theses Team
Unique ID: glathesis:2024-84313
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 10 May 2024 14:13
Last Modified: 15 May 2024 09:40
Thesis DOI: 10.5525/gla.thesis.84313
URI: https://theses.gla.ac.uk/id/eprint/84313
