Structural learning for continuous data using graphical models

Szili, Benjamin (2022) Structural learning for continuous data using graphical models. PhD thesis, University of Glasgow.


The field of statistics, and science as a whole, has continuously improved its study of increasingly complex structures, focusing not only on their individual components but also on how they interact with each other and on their dependencies. This thesis is centered on learning the structure of data using graphical models.

Graphical models allow the visual representation of dependence relations between a set of random variables through graphs, specified by some type of model formula. Whether the goal is probabilistic inference, mainly dealing with belief propagation, or causal inference, focusing on interventions, it is very important to be able to learn the underlying structure given a set of variables.

In graph theory, this can be achieved by applying structural learning algorithms, either based on constraints or some scoring function. While there are a number of such algorithms that have been used extensively and are effective, they are somewhat limited by the type of relationships they can learn.

Current methods excel at learning a network structure when the variables of interest are discrete or, if continuous, Gaussian. The main objective of this thesis is therefore to create a structural learning algorithm that can establish dependence relations between variables that are neither discrete nor Gaussian.

Initially, relevant literature and key methodology on graphical models, kernel methods and information theory were reviewed. The aim was to use a measure that can reliably detect pairwise and conditional dependencies between random variables that are not necessarily Gaussian. The main contribution of the thesis is then the incorporation of kernel methods and Mutual Information to create a new structural learning algorithm, using graphs to visualize the learned structure.

The resulting algorithm, which has three variants, is then applied in a number of settings. First, a simulation setup using the post non-linear noise model on synthetic data was examined to compare the performance of the new algorithm with that of a current approach. As the structure was known from the setup, the focus in this context was to see whether the algorithm successfully learned the structure.
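
As a rough illustration of why such a setting is hard for Gaussian-based methods, the sketch below draws data from a toy post non-linear model of the form y = g(f(x) + e) and compares a linear correlation with a simple mutual information estimate. It is not the algorithm or the estimator developed in the thesis: the functions f and g, the noise level, the sample size and the histogram-based estimator are all assumptions chosen only for illustration.

```python
import numpy as np


def simulate_pnl(n, rng):
    """Draw (x, y) from a toy post non-linear model y = g(f(x) + e)."""
    x = rng.uniform(-2.0, 2.0, size=n)
    e = rng.normal(0.0, 0.2, size=n)  # additive noise inside the outer function
    inner = x ** 2                    # f: hypothetical inner non-linearity
    y = np.tanh(inner + e)            # g: hypothetical invertible outer distortion
    return x, y


def binned_mutual_information(a, b, bins=10):
    """Plug-in mutual information estimate (in nats) from a 2-D histogram."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)  # marginal of a, shape (bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)  # marginal of b, shape (1, bins)
    mask = p_ab > 0                        # skip empty cells to avoid log(0)
    return float(np.sum(p_ab[mask] * np.log(p_ab[mask] / (p_a @ p_b)[mask])))


rng = np.random.default_rng(0)
x, y = simulate_pnl(5000, rng)
z = rng.normal(size=5000)  # independent of x by construction

corr_xy = float(np.corrcoef(x, y)[0, 1])   # linear association between x and y
mi_xy = binned_mutual_information(x, y)    # dependent pair
mi_xz = binned_mutual_information(x, z)    # independent pair, for reference

print(f"corr(x, y) = {corr_xy:.3f}")
print(f"MI(x, y)   = {mi_xy:.3f} nats")
print(f"MI(x, z)   = {mi_xz:.3f} nats")
```

Because y depends on x only through x squared, the linear correlation stays close to zero even though y is strongly dependent on x, whereas the mutual information estimate for the dependent pair should be clearly larger than for the independent pair. The histogram estimator is a crude stand-in here; it is biased and sensitive to the number of bins.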

The remaining two settings then present cases where the new algorithm can be applied to provide insight and improve inference by accounting for the dependence structure. The first of these settings focuses on distinguishing handwritten digits, initially using Gaussian Process latent variable models (GP-LVMs). The second applies the algorithm to the field of phonetics, where the task is to identify speakers based on sound data.

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Subjects: Q Science > QA Mathematics
Colleges/Schools: College of Science and Engineering > School of Mathematics and Statistics
Supervisor's Name: Niu, Dr. Mu and Neocleous, Dr. Tereza
Date of Award: 2022
Depositing User: Theses Team
Unique ID: glathesis:2022-83691
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 29 Jun 2023 13:44
Last Modified: 29 Jun 2023 13:44
Thesis DOI: 10.5525/gla.thesis.83691
