The application of supervised machine learning techniques to determine photometric redshifts

Janiurek, Lara (2023) The application of supervised machine learning techniques to determine photometric redshifts. MRes thesis, University of Glasgow.

Abstract

The inference of the Hubble constant (H0) from gravitational wave data has opened a new way to probe the expansion of the Universe. The use of dark sirens, which are mergers of binary black hole systems, to measure H0 may shed considerable light on the current Hubble tension. Galaxy redshift surveys are a key ingredient for the application of these dark sirens in the measurement of H0. Most binary black hole merger events are not expected to have an associated electromagnetic counterpart, so measuring H0 using these sirens requires identifying the redshifts of potential host galaxies and marginalising over them. Photometric redshift surveys often contain significant statistical or systematic errors, which may adversely affect the Hubble constant inference. Improving the performance of dark sirens in the future observing runs of the LIGO-Virgo-KAGRA (LVK) network requires a better understanding of the photometric redshift errors. The redshift values currently used by the LVK for cosmological inference are assumed to have an associated Gaussian error; however, a true quantification of the redshift posteriors would give a more accurate result in the overall inference of H0. Spectroscopic redshifts are difficult to obtain, and many physical photometric techniques rely on cosmological models that could introduce bias into the redshift measurements. Machine learning techniques are advantageous in that they do not rely on assumed cosmological models.
In this work, the random forest algorithm GALPRO is implemented to generate photometric redshift posteriors. It is initially calibrated using a truth dataset compiled by Zhou et al. The initial calibration is successful and analysis suggests that the redshift posterior distributions are largely non-Gaussian. This further reinforces the need for a reliable method to generate redshift posteriors to better represent these photometric errors in the inference of H0.
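As a rough illustration of how a set of posterior samples might be checked for non-Gaussianity, the sketch below computes the sample skewness and excess kurtosis, both of which are zero for a Gaussian. This is not the analysis used in the thesis, which the abstract does not describe; the function name and the toy bimodal sample are hypothetical and invented purely for illustration.

```python
import math
import random


def skewness_and_excess_kurtosis(samples):
    """Return (skewness, excess kurtosis) from population central moments.

    Both quantities are 0 for a Gaussian, so values far from 0 hint at a
    non-Gaussian posterior. This is an illustrative diagnostic only.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    if m2 == 0:
        raise ValueError("samples have zero variance")
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Hypothetical bimodal redshift posterior: a toy mixture, not real survey data.
random.seed(1)
toy_posterior = [
    random.gauss(0.3, 0.02) if random.random() < 0.7 else random.gauss(0.8, 0.05)
    for _ in range(5000)
]
skew, kurt = skewness_and_excess_kurtosis(toy_posterior)
print(f"skewness = {skew:.2f}, excess kurtosis = {kurt:.2f}")
```

Summary statistics like these only flag departures from a Gaussian shape; they do not replace inspecting the full posterior for each galaxy.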
Tests were run using the Zhou et al. dataset to determine how statistically similar the training and testing datasets from a survey must be for GALPRO to be applicable. It was found that the training and testing datasets must have similar redshift distributions and overlap by at least 90% in the band ranges to give accurate results. GALPRO was then trained using the Zhou et al. dataset and applied to a sample from the PanSTARRS survey to explore whether GALPRO could be trained using a trusted dataset and applied to a general, new survey. It was shown that no matter how statistically equivalent the two surveys were, GALPRO could not produce accurate redshift posteriors for the new survey. The Zhou et al. and PanSTARRS surveys had very similar redshift distributions and overlapped in each input band by over 90%. Despite this, application of the algorithm still resulted in a catastrophic failure, indicating that there must be some underlying fundamental difference between the two surveys that causes the program to fail. This work serves as a cautionary tale in the application of random forests to new surveys when generating photometric redshift posteriors.
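The band-range overlap criterion mentioned above could be checked along the following lines. This is a minimal sketch under an assumed definition: the overlap in a band is taken to be the fraction of test magnitudes that fall within the minimum-to-maximum range of the training magnitudes. The abstract does not state how the thesis defines overlap, and the function names, band labels and magnitude values here are hypothetical.

```python
def band_overlap_fraction(train_mags, test_mags):
    """Fraction of test magnitudes lying inside the training range [min, max].

    Assumed definition of "overlap"; the thesis may define it differently.
    """
    if not train_mags or not test_mags:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(train_mags), max(train_mags)
    inside = sum(1 for m in test_mags if lo <= m <= hi)
    return inside / len(test_mags)


def bands_below_threshold(train, test, threshold=0.9):
    """Return {band: fraction} for every band whose overlap is below threshold."""
    failing = {}
    for band in train:
        frac = band_overlap_fraction(train[band], test[band])
        if frac < threshold:
            failing[band] = frac
    return failing


# Toy magnitudes, invented purely for illustration (not survey data).
train = {"g": [18.0, 19.5, 21.0, 22.5], "r": [17.5, 19.0, 20.5, 22.0]}
test = {"g": [18.5, 20.0, 22.0, 23.5], "r": [18.0, 19.0, 20.0, 21.0]}
failing = bands_below_threshold(train, test)
print(failing)
```

As the PanSTARRS result shows, passing a check of this kind is at most a necessary condition: the two surveys overlapped by over 90% in every band and the algorithm still failed.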

Item Type: Thesis (MRes)
Qualification Level: Masters
Subjects: Q Science > QC Physics
Colleges/Schools: College of Science and Engineering > School of Physics and Astronomy
Supervisor's Name: Hendry, Professor Martin
Date of Award: 2023
Depositing User: Theses Team
Unique ID: glathesis:2023-83369
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 19 Jan 2023 12:41
Last Modified: 19 Jan 2023 14:38
Thesis DOI: 10.5525/gla.thesis.83369
URI: https://theses.gla.ac.uk/id/eprint/83369
