Investigating uncertainty and emulating process-based models with multivariate outputs, applied to aquaculture

Currie, Michael (2022) Investigating uncertainty and emulating process-based models with multivariate outputs, applied to aquaculture. PhD thesis, University of Glasgow.

Abstract

There remain environmental challenges which can only accurately be assessed by process-based modelling. An example of this is the monitoring of the environmental impacts of aquaculture, where the logistical difficulty and cost of collecting data over large areas make mathematical modelling the more effective approach. Such approaches are computationally intensive and do not account for uncertainty. NewDEPOMOD is an example of a process-based model that is used to model the environmental impacts of aquaculture. This thesis provides an in-depth investigation of uncertainty in such a model using sensitivity analysis, and proposes a novel statistical emulation framework to approximate the output from NewDEPOMOD, reducing the computational cost. NewDEPOMOD is a complex mathematical model that was developed in order to estimate and predict the transportation of waste particles from fish farms to their deposition on the seabed. It features a number of different types of input, representing features such as the fish farm physical structure, flow speeds and waste transportation properties. In addition, the output produced by NewDEPOMOD provides a measure known as Solids Flux in grid cells across the domain, representing the environmental impact. This can be visualised as either a univariate or multivariate output. The univariate outputs produced by NewDEPOMOD are the Total Area Impacted and 99th Percentile of Solids Flux, which provide a measure of the size and intensity of the impact on the seabed. In collaboration with the Scottish Environment Protection Agency (SEPA), with application to fish farm sites around the coast of Scotland, a set of inputs was identified as being of most importance for investigating the effect of their uncertainty on the NewDEPOMOD outputs. In this thesis, sensitivity analyses are conducted at multiple fish farm sites, classed as high and low energy based on their flow speeds, using random forest models.
Random forest models are proposed as they are flexible, efficient, and the importance values produced by the models can be used to rank the inputs based on their influence on the output data. To assess the impact of changing the input values on the output maps produced by NewDEPOMOD, traditional univariate sensitivity analysis techniques are expanded here to develop novel sensitivity analysis methods for considering multivariate model outputs. Three different approaches to investigating the output maps are considered: 1) shape analysis based on a landmark approach for identifying the main shape of the impact, 2) bivariate functional analysis where the output maps are considered as smooth surfaces, and 3) a grid cell approach where the Solids Flux in each grid cell is considered individually. The performance of each approach was considered individually before developing a framework, using a subset of the approaches, that could be applied to multiple sites to assess parameter uncertainty, and hence the impact of altering the inputs on the output maps. The application of statistical emulation to model the univariate outputs from NewDEPOMOD, reducing the computational cost, is a novel approach. The methods proposed for the emulation are random forests and Gaussian processes, which both provide flexibility and allow for fast predictions for new data in comparison to the time taken to run NewDEPOMOD. For each site, training data will be used to fit the emulation models for each approach before using a test set of data to assess their predictive performance. Root Mean Squared Error (RMSE) and the Mean Absolute Error (MAE) are both considered as measures of how well the emulators perform and allow for comparisons to be made between the approaches. Further investigation assesses the suitability of a single emulator to be used at all sites, or whether the emulators should be built individually for each site.
In practice, correlated outputs are more realistic in such a scenario and hence the emulation framework for the univariate outputs is expanded to consider the univariate outputs together as a correlated multivariate output. Extensions to the random forest and Gaussian process models are proposed which account for correlation between the outputs. The predictive performance for both approaches can be reviewed using RMSE and MAE to determine if there are improvements when modelling the univariate outputs together as a correlated output. This research provides a deeper understanding of NewDEPOMOD through the development of novel sensitivity analysis and emulation tools for computationally efficient analyses of data on the impact of fish farms. Remarks on the approaches used and their results are provided throughout this thesis, along with potential future extensions to the research.
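
As an illustration of the RMSE and MAE comparison mentioned in the abstract, the sketch below computes both measures for two emulators on a held-out test set. It is not taken from the thesis: the function names, the two emulators and all numeric values are hypothetical and chosen only to show how the two measures can disagree.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between simulator outputs and emulator predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error between simulator outputs and emulator predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    absolute = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(absolute) / len(absolute)


# Hypothetical test-set values (made up for illustration, not NewDEPOMOD output).
simulator_output = [10.0, 12.0, 14.0, 16.0]
predictions = {
    "emulator_a": [11.0, 11.0, 15.0, 15.0],  # small error at every test point
    "emulator_b": [10.0, 12.0, 14.0, 20.0],  # exact except for one large miss
}

# Score each emulator on the same test set so the results are comparable.
scores = {
    name: {"rmse": rmse(simulator_output, pred), "mae": mae(simulator_output, pred)}
    for name, pred in predictions.items()
}

for name, s in scores.items():
    print(f"{name}: RMSE={s['rmse']:.3f}  MAE={s['mae']:.3f}")
```

In this made-up example both emulators have the same MAE, but the second has a larger RMSE because squaring penalises its single large error more heavily, which is one reason to report both measures side by side.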

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Subjects: H Social Sciences > HA Statistics
Colleges/Schools: College of Science and Engineering > School of Mathematics and Statistics > Statistics
Supervisor's Name: Miller, Professor Claire and Scott, Professor Marian
Date of Award: 2022
Depositing User: Theses Team
Unique ID: glathesis:2022-83227
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 25 Oct 2022 11:51
Last Modified: 25 Oct 2022 11:56
Thesis DOI: 10.5525/gla.thesis.83227
URI: https://theses.gla.ac.uk/id/eprint/83227
