Dimensionless Bayesian Model-Based Reinforcement Learning

Charvet, Valentin (2024) Dimensionless Bayesian Model-Based Reinforcement Learning. PhD thesis, University of Glasgow.

Full text available as:
2024CharvetPhD.pdf (PDF, 6MB)

Abstract

This work explores an approach for improving the robustness of Model-Based Reinforcement Learning algorithms by transforming the observation and decision spaces with the Buckingham-Π theorem. This theorem belongs to the field of Dimensional Analysis (DA), which studies the link between physical measurements and the units they are expressed in. The Buckingham-Π theorem provides a dimensionality reduction technique through power laws between the variables. The transformation can be applied to the inputs and outputs of statistical learning models to increase their robustness. We extend prior work to study the impact of this procedure, called non-dimensionalization, through its equivariance properties on stationary dynamic systems. Our method stems from increasing the level of a priori physics knowledge within Machine Learning models; that additional knowledge is introduced implicitly through the constraints the non-dimensionalization procedure imposes. The results in this thesis suggest this approach is well suited for zero-shot transfer learning without data augmentation.
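
As a concrete illustration (not code from the thesis), the dimensionless groups promised by the theorem can be read off the null space of a system's dimensional matrix. The sketch below, a hypothetical example using sympy, recovers the single group Π = gT²/l for the simple pendulum from the dimensions of its period T, length l, gravitational acceleration g, and mass m:

import sympy as sp

# Dimensional matrix of a simple pendulum: one column per variable
# (period T, length l, gravity g, mass m), one row per fundamental
# dimension (mass M, length L, time T).
D = sp.Matrix([
    [0, 0, 0, 1],   # M exponents
    [0, 1, 1, 0],   # L exponents
    [1, 0, -2, 0],  # T exponents
])

# Buckingham-Pi: each null-space basis vector of D holds the exponents
# of one dimensionless group; here n - rank(D) = 4 - 3 = 1 group.
for exponents in D.nullspace():
    print(exponents.T)  # exponents [2, -1, 1, 0]  ->  Pi = T**2 * g / l

With four variables and three independent fundamental dimensions, the description collapses onto a single dimensionless coordinate; this power-law reduction is the kind the thesis exploits.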

Throughout this thesis, we conduct the experiments on pendulum and cartpole environments within numerical simulations. First, we propose a framework for applying the Buckingham theorem to dynamic systems. We show that under a full-rank assumption, we can transform the state variables as a function of the static variables. This transformation in turn yields estimators that are resilient to perturbations of the underlying dynamics. We include comparisons between Gaussian Process and Multi-Layer Perceptron models for the regression task. The estimators are able to maintain good predictive performance in the presence of distribution shift. Second, we propose a method to circumvent the need to measure all the variables required by the transformation. With a probabilistic approach, we infer the hidden variables and constrain their dimensions. We expose two cases for this latent variable model, one that requires observations of the hidden variables during training and one that does not. Finally, we apply the previous findings to a Reinforcement Learning problem. To do so, we modify the Contextual Markov Decision Process (MDP) and non-dimensionalize the state and action spaces. Subsequently, we propose a generic model-based policy search algorithm within the dimensionless Π-MDP and demonstrate results with Gaussian Process dynamics models. We show that within the evaluated environments, the dimensionless controller is more robust than its natural counterpart.
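
To make the regression setting concrete, the sketch below (an illustrative reconstruction on synthetic data, not the thesis code) non-dimensionalizes pendulum states before fitting a scikit-learn Gaussian Process. For the uncontrolled pendulum, θ̈ = -(g/l)·sin θ, so the dimensionless acceleration θ̈·(l/g) = -sin θ is identical in every (l, g) context, which is what lets a single model transfer across dynamics:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def nondimensionalize(theta, theta_dot, length, gravity):
    # The angle is already dimensionless; the angular velocity (units 1/T)
    # is scaled by the pendulum's natural time sqrt(l / g).
    return np.stack([theta, theta_dot * np.sqrt(length / gravity)], axis=-1)

rng = np.random.default_rng(0)
theta = rng.uniform(-np.pi, np.pi, 200)
theta_dot = rng.uniform(-8.0, 8.0, 200)
length = rng.uniform(0.5, 2.0, 200)     # contexts with different pole lengths

X = nondimensionalize(theta, theta_dot, length, gravity=9.81)
y = -np.sin(theta)  # dimensionless acceleration theta_ddot * (l / g)

# In dimensionless coordinates the target no longer depends on the context,
# so the regressor trained here applies unchanged to unseen lengths.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-4)
gp.fit(X, y)
print(gp.predict(nondimensionalize(np.array([0.3]), np.array([1.0]),
                                   np.array([5.0]), gravity=9.81)))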

We have shown the benefits of the transformation for generalizing predictions under distribution shift. The simplicity of the approach allows it to be applied to different domains such as regression and sequential decision-making. Our experiments suggest the Buckingham transformation is a promising avenue for statistical modelling under distribution shift.
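
In the sequential setting, the same idea amounts to wrapping the environment so the policy only ever sees dimensionless coordinates. A minimal sketch, assuming a gymnasium Pendulum-v1 environment and the velocity scaling above (the wrapper and its parameters are illustrative, not the thesis implementation):

import numpy as np
import gymnasium as gym

class DimensionlessObservation(gym.ObservationWrapper):
    # Rescales Pendulum-v1 observations [cos(theta), sin(theta), theta_dot]
    # into dimensionless coordinates; the torque action would analogously
    # be divided by m * g * l before entering a dimensionless policy.
    def __init__(self, env, length=1.0, gravity=9.81):
        super().__init__(env)
        self.tau = np.sqrt(length / gravity)  # natural time scale

    def observation(self, obs):
        cos_th, sin_th, theta_dot = obs
        return np.array([cos_th, sin_th, theta_dot * self.tau], dtype=np.float32)

env = DimensionlessObservation(gym.make("Pendulum-v1"))

A policy or dynamics model trained against such a wrapper is tied to the dimensionless Π-MDP rather than to one particular (m, l, g) configuration.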

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Colleges/Schools: College of Science and Engineering > School of Computing Science
Supervisor's Name: Murray-Smith, Professor Roderick
Date of Award: 2024
Depositing User: Theses Team
Unique ID: glathesis:2024-84765
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 20 Dec 2024 09:53
Last Modified: 20 Dec 2024 10:18
Thesis DOI: 10.5525/gla.thesis.84765
URI: https://theses.gla.ac.uk/id/eprint/84765
