Flexible joint modelling of multivariate extreme and non-extreme events

Hu, Chenglei (2026) Flexible joint modelling of multivariate extreme and non-extreme events. PhD thesis, University of Glasgow.

Abstract

In fields such as finance and environmental science, modelling the entire distribution of events with a particular focus on extremes is critical for risk management. Extreme Value Theory (EVT) offers a rigorous framework for such modelling. Initially developed to study the asymptotic behaviour of maxima of i.i.d. sequences, EVT was later extended to characterise the tails of distributions. A widely used tool in univariate EVT is the peaks-over-threshold (PoT) method, which approximates the tail of a distribution using the Generalised Pareto Distribution (GPD) above a sufficiently high threshold. This has motivated a “sliced” modelling framework that combines a separate distribution for the bulk (below the threshold) with a GPD for the tail (above it).
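As a rough illustration of the univariate sliced construction described above, the following Python sketch glues a bulk distribution to a GPD tail at a threshold u. The gamma bulk, the parameter values and the function names are assumptions chosen for illustration; they are not taken from the thesis.

```python
import numpy as np
from scipy import stats


def sliced_cdf(x, u, bulk, xi, sigma):
    """CDF of a sliced model: bulk CDF up to u, rescaled GPD above u."""
    x = np.asarray(x, dtype=float)
    p_u = bulk.cdf(u)  # probability mass assigned to the bulk
    tail = p_u + (1.0 - p_u) * stats.genpareto.cdf(x - u, c=xi, scale=sigma)
    return np.where(x <= u, bulk.cdf(x), tail)


def sliced_pdf(x, u, bulk, xi, sigma):
    """Density of the same model; the tail piece is weighted by P(X > u)."""
    x = np.asarray(x, dtype=float)
    p_u = bulk.cdf(u)
    tail = (1.0 - p_u) * stats.genpareto.pdf(x - u, c=xi, scale=sigma)
    return np.where(x <= u, bulk.pdf(x), tail)


# Illustrative (assumed) settings: gamma bulk, threshold u = 3,
# GPD shape xi = 0.2 and scale sigma = 1.
bulk = stats.gamma(a=2.0)
u, xi, sigma = 3.0, 0.2, 1.0

left = float(sliced_pdf(u, u, bulk, xi, sigma))          # density at u (bulk side)
right = float(sliced_pdf(u + 1e-9, u, bulk, xi, sigma))  # density just above u
print(left, right)
```

With these toy values the CDF is continuous at u but the density is not: the bulk density at u and the weighted GPD density just above it differ. This is the kind of boundary discontinuity that the abstract goes on to list as a limitation of piecewise constructions.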

This thesis extends the sliced model to the multivariate setting and proposes three frameworks that either address the practical challenges arising in such extensions or provide alternative approaches for joint modelling of the bulk and tail.

Our first contribution is a multivariate analogue of the sliced model, combining a parametric bulk distribution with a multivariate GPD (mGPD) for the tail. The threshold separating bulk and tail is treated as a free parameter to avoid manual specification. Simulation studies demonstrate that the model robustly estimates marginal behaviour and both bulk and tail dependence, even under misspecification (e.g., when data are asymptotically independent but the model assumes asymptotic dependence). However, three limitations hinder scalability and realism in higher dimensions or large datasets. First, the mGPD is infinitely parameterised, with only a few closed-form representations available, risking bias if the true dependence deviates from these forms. Second, the piecewise construction introduces discontinuities at the bulk-tail boundary, both in the margins and in the dependence structure, which are unrealistic for large datasets. Third, structural inconsistency arises: while the mGPD is always asymptotically dependent, the bulk model (e.g., Gaussian) may be asymptotically independent, leading to conflicts in dependence representation. Moreover, the fixed dependence class of the mGPD limits its applicability in contexts such as spatial modelling, where tail dependence may vary with distance.

To address the first issue, we introduce GPDFlow, a novel mGPD framework in which the dependence structure is modelled using normalising flows, a flexible class of generative models. Unlike classic mGPDs, GPDFlow avoids closed-form constraints and instead learns a parameterised dependence structure through flows, with density evaluation performed numerically. GPDFlow explicitly transforms light-tailed distributions into heavy-tailed ones, overcoming typical limitations of generative models. It performs well in describing data in which only a subset of the variables is extreme and outperforms standard mGPDs in estimating both marginal behaviour and tail dependence.

To address the issues of discontinuity and fixed asymptotic dependence, we develop a second framework combining the extended GPD (eGP) with a latent Gaussian model, implemented in the R-INLA package using the integrated nested Laplace approximation (INLA). The eGP is a sub-asymptotic distribution that retains the key properties of the GPD while avoiding the need for threshold specification, yielding a fully continuous model. Dependence is captured through latent Gaussian fields, ensuring coherence and continuity across the entire distribution. We illustrate this approach in a one-month-ahead spatio-temporal wildfire forecast application in Portugal, focusing on moderate and extreme burn areas. A two-stage ensemble design integrates environmental and historical data: in the first stage, an XGBoost model learns complex covariate patterns, producing pseudo-covariates that feed into the second-stage latent Gaussian model. This addresses key limitations of the INLA framework in handling high-dimensional covariates and obtaining future environmental inputs in retrospective analyses. The eGP model and associated priors are now fully implemented in R-INLA and publicly available to users.
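To make the idea of a threshold-free, fully continuous margin concrete, the sketch below implements one commonly used extended GPD form, in which the GPD distribution function H is raised to a power kappa, so that F(x) = H(x)^kappa. Whether the thesis or R-INLA uses exactly this parameterisation is an assumption, and the function names are hypothetical.

```python
import numpy as np
from scipy import stats


def egp_cdf(x, kappa, sigma, xi):
    """Power-type extended GPD: F(x) = H(x)^kappa, with H the GPD CDF."""
    return stats.genpareto.cdf(x, c=xi, scale=sigma) ** kappa


def egp_pdf(x, kappa, sigma, xi):
    """Density by the chain rule: kappa * H(x)^(kappa - 1) * h(x)."""
    H = stats.genpareto.cdf(x, c=xi, scale=sigma)
    h = stats.genpareto.pdf(x, c=xi, scale=sigma)
    return kappa * H ** (kappa - 1.0) * h


def egp_ppf(p, kappa, sigma, xi):
    """Quantile function: invert the power, then the GPD."""
    return stats.genpareto.ppf(np.asarray(p) ** (1.0 / kappa), c=xi, scale=sigma)


def egp_rvs(size, kappa, sigma, xi, rng):
    """Inverse-transform sampling from uniform draws."""
    return egp_ppf(rng.uniform(size=size), kappa, sigma, xi)


# Illustrative (assumed) parameter values.
rng = np.random.default_rng(1)
sample = egp_rvs(1000, kappa=2.0, sigma=1.5, xi=0.1, rng=rng)
print(sample.mean())
```

In this form, kappa = 1 recovers the ordinary GPD, kappa shapes the lower part of the distribution, and the upper tail keeps the GPD shape parameter xi, so no threshold has to be chosen.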

Finally, to explore a purely deep learning-based solution without asymptotic constraints or rigid latent structures, we propose a model tailored to the EVA Data Challenge 2025, which involves estimating the expected number of daily precipitation extremes across a 5×5 spatial grid over 165 years. We use a long short-term memory (LSTM) network to encode spatio-temporal patterns and condition a denoising diffusion probabilistic model (DDPM) on the resulting hidden states. The model operates on log-transformed, zero-adjusted precipitation data. For comparison, we also develop a sliced model with conditionally independent margins, using a Weibull distribution for the bulk and a GPD for the tail. The diffusion-based model performs better for five out of six target quantities in the data challenge evaluated at lower thresholds, and accurately captures tail heaviness, as validated by marginal GEV shape parameter analysis on simulated and real data.
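For readers unfamiliar with DDPMs, the sketch below shows only the standard forward (noising) step, x_t = sqrt(abar_t) x_0 + sqrt(1 - abar_t) eps, which a denoising network is trained to reverse. The linear noise schedule, the number of steps, the toy data and the use of log1p as a stand-in for the log transform are all assumptions; the LSTM conditioning and the thesis's actual architecture are not reproduced here.

```python
import numpy as np


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Assumed linear variance schedule, as commonly used with DDPMs."""
    return np.linspace(beta_start, beta_end, num_steps)


def forward_noise(x0, t, alpha_bar, rng):
    """Draw x_t given x_0 in one shot, returning the noise that was used."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps


num_steps = 1000
betas = linear_beta_schedule(num_steps)
alpha_bar = np.cumprod(1.0 - betas)  # cumulative signal retention

rng = np.random.default_rng(0)
# Toy stand-in for transformed precipitation on a flattened 5x5 grid
# (8 hypothetical days, 25 grid cells); not the challenge data.
x0 = np.log1p(rng.gamma(shape=0.5, scale=4.0, size=(8, 25)))

t = num_steps - 1
xt, eps = forward_noise(x0, t, alpha_bar, rng)
print(alpha_bar[0], alpha_bar[-1])
```

By the final step almost none of the original signal remains, so generation starts from approximately Gaussian noise and relies on the learned reverse process, conditioned in the thesis on LSTM hidden states, to recover realistic fields.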

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Additional Information: Supported by funding from a College of Science and Engineering scholarship, University of Glasgow.
Subjects: Q Science > QA Mathematics
Colleges/Schools: College of Science and Engineering > School of Mathematics and Statistics
Funder's Name: College of Science and Engineering, University of Glasgow
Supervisor's Name: Castro-Camilo, Dr. Daniela
Date of Award: 2026
Depositing User: Theses Team
Unique ID: glathesis:2026-85833
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 20 Mar 2026 14:16
Last Modified: 20 Mar 2026 14:16
URI: https://theses.gla.ac.uk/id/eprint/85833
