Spatial modelling of soil and air pollution extremes

Cuba, M. Daniela (2024) Spatial modelling of soil and air pollution extremes. PhD thesis, University of Glasgow.

Abstract

In this thesis, novel spatial statistical methods for unreplicated bivariate heavy metal soil contamination and replicated PM2.5 air pollution are developed by combining existing statistical approaches with extreme value theory. An introduction to the motivation behind this research is given in Chapter 1 while the necessary statistical and applied background for this research is given in Chapters 2 and 3, respectively.

In Chapter 4, the extremal dependence between threshold exceedances of heavy metal contaminants in the Glasgow Conurbation is investigated using two extreme value models with different extremal dependence structures, both of which ignore the spatial dimension of the contaminants. The results show that for most contaminant pairs, exceedances of moderately low quantile thresholds (u < 0.95) exhibit constant dependence, which can be modelled using a rigid dependence model, while exceedances of extreme quantile thresholds (u > 0.95) almost always display decaying dependence, requiring the flexible dependence of subasymptotic models. More specifically, the results show that chromium migrates differently from other elements, resulting in strongly decaying extremal dependence. Further evidence of this difference in behaviour is provided in the literature, as chromium is less likely to migrate regardless of conditions and persists in the soil longer than other heavy metals.

In Chapter 5, a spatial model for the application is developed, which uses a bivariate mixture model approach to model the body and tail of the heavy metal distributions. Our approach is tailored to the case of unreplicated observations, which is non-standard in the extreme value theory literature. The body of the contaminant distributions is modelled using a Gaussian distribution, while the tails are modelled using a Gaussian-generalised Pareto composition. The body-tail components of both contaminants are modelled jointly under a coregionalisation framework, allowing the tail components to share a scaled spatial random effect, effectively accounting for the dependence in the tails. The model shows that the probability of exceeding a safety threshold is high on the south bank of the River Clyde in urban Glasgow and in some villages to the east - all areas of historical industrial activity and mining legacies.
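As a rough illustration of how dependence can be compared across quantile thresholds u, as in Chapter 4, the sketch below computes an empirical version of the commonly used summary chi(u) = P(F_Y(Y) > u | F_X(X) > u). This is an assumption made for illustration only: the abstract does not name the dependence measure or estimator used in the thesis, and the function name `empirical_chi` is hypothetical.

```python
import numpy as np


def empirical_chi(x, y, u):
    """Empirical chi(u): share of points with F_Y(Y) > u among those with F_X(X) > u.

    Illustrative only; not taken from the thesis. Margins are transformed
    to approximately uniform scale with ranks.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Rank transform: ranks 1..n divided by n + 1 lie strictly inside (0, 1).
    fx = (np.argsort(np.argsort(x)) + 1) / (n + 1)
    fy = (np.argsort(np.argsort(y)) + 1) / (n + 1)
    exceed_x = fx > u
    if not exceed_x.any():
        # No exceedances at this threshold, so chi(u) is undefined.
        return float("nan")
    return float(np.mean(fy[exceed_x] > u))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=1000)
    b = rng.normal(size=1000)
    # Independent samples: chi(u) should shrink towards 0 as u approaches 1.
    for u in (0.80, 0.90, 0.95):
        print(u, empirical_chi(a, b, u))
```

Under this summary, values of chi(u) that stay roughly flat as u increases correspond to the constant dependence described above, while values that fall towards zero correspond to decaying dependence.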

In Chapter 6, we present an approach for data fusion of PM2.5 air pollution extremes in the Greater London region. Data fusion models are generally motivated by the need to integrate information from different data sources to obtain a better description of the underlying phenomenon. In this case, we fuse remote-sensing data (modelled data), which enjoy complete spatial and temporal records, with in-situ measurements from observation stations. The proposed model is tailored to threshold exceedances, which represent extreme concentrations of PM2.5: it enhances the remote-sensing data (EAC4 dataset) so that they better represent the threshold exceedances observed at the stations of the AURN, a high-quality air quality monitoring network in the UK. Results show that the extremes data fusion model improves the threshold exceedances reported by the EAC4 model, in the sense that they better approximate in-situ measurements. The extremes data fusion model also outperforms a competing data fusion approach based on the Gaussian distribution. A map of fused observations shows different spatial patterns from the modelled observations, assigning higher concentrations to locations on the coast - a pattern that is corroborated by the air pollution literature.

Finally, Chapter 7 presents my contribution to challenges C2 and C4 of the EVA 2023 Data Challenge, a competition organised for the EVA 2023 conference in Milan, Italy. In C2, the organisers ask for an extrapolated value that minimises an arbitrary loss function. To address the question, we propose a novel approach to extrapolate high quantiles under an application-specific loss function using an extreme-weighted bootstrap. C4 asks for an estimate of the probability of joint exceedance in a high-dimensional setting, for which we propose using a probabilistic principal component analysis (PPCA) model. The methods were found to have mixed success, and we discuss the limitations and potential of these models.

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Subjects: G Geography. Anthropology. Recreation > GE Environmental Sciences
H Social Sciences > HA Statistics
Colleges/Schools: College of Science and Engineering > School of Mathematics and Statistics > Statistics
Supervisor's Name: Castro-Camilo, Dr. Daniela and Scott, Professor Marian
Date of Award: 2024
Depositing User: Theses Team
Unique ID: glathesis:2024-84667
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 07 Nov 2024 09:31
Last Modified: 07 Nov 2024 09:38
Thesis DOI: 10.5525/gla.thesis.84667
URI: https://theses.gla.ac.uk/id/eprint/84667
