Mwanga, Emmanuel Peter (2024) Using machine learning and infrared spectroscopy for rapid assessment of key entomological indicators of malaria transmission. PhD thesis, University of Glasgow.
Abstract
Summary
Malaria vector surveillance is a critical element in control and elimination programs in endemic regions, serving to assess current transmission levels, vector species behaviours, and the efficacy of control interventions. Key surveillance metrics typically include the density and diversity of biting Anopheles mosquitoes, their blood-feeding histories, parasite prevalence within vectors, and the age structure of adult mosquito populations, among other indicators. However, conventional methods for monitoring these metrics are often costly, labour-intensive, and time-consuming, underscoring the need for scalable, simple, and cost-effective alternatives. The work presented in this thesis aligns with the recommendations of key policy organisations, including the World Health Organisation (WHO), which advocate for integrating effective surveillance into malaria control strategies in endemic regions.
The primary aim of my PhD project was to demonstrate that the emerging approach of mid-infrared spectroscopy combined with machine learning (MIRS-ML), a method that analyses biochemical signals generated by infrared light absorption in a sample, can offer high-throughput and accurate assessments of entomological and parasitological indicators of malaria transmission. The project was therefore designed to provide field validation for this technology by addressing several critical gaps that stand in the way of effective implementation of MIRS-ML in vector surveillance. These gaps included: 1) the necessity for field-calibrated models to predict key entomological indicators of malaria across diverse settings, 2) the need to demonstrate the efficacy of this approach in areas where Anopheles funestus is predominant, as this species is the most significant malaria vector in East and Southern Africa but had not been analysed using MIRS-ML, 3) the need to apply this approach to multiple indicators in both laboratory and field settings, and 4) the necessity to show that infectious mosquitoes harbouring Plasmodium sporozoites in their salivary glands can be reliably detected using MIRS-ML.
The specific objectives of my PhD thesis were therefore as follows: 1) To evaluate the usefulness of transfer learning and dimensionality reduction techniques for improving the generalisability and transferability of MIRS-ML-based predictions for mosquito age classifications, 2) To demonstrate the application of MIRS-ML in classifying epidemiologically relevant age categories of adult female An. funestus mosquitoes, 3) To demonstrate the field applicability of MIRS-ML for identifying blood meal sources in field-collected An. funestus mosquitoes, 4) To validate the field applicability of MIRS-ML for detecting Plasmodium-infected An. funestus mosquitoes, and 5) To explore key lessons learned from infrared-based entomological and parasitological studies, and to outline future directions for the use of MIRS-ML in malaria surveillance. The field studies were conducted in an area in Southeastern Tanzania where An. funestus accounts for more than 80% of malaria transmission.
In objective 1 (Chapter 2), I explored whether dimensionality reduction and transfer learning could improve the generalisability of MIRS-based age predictions. Here, the dimensionality of the spectral data was reduced using unsupervised principal component analysis (PCA) or t-distributed stochastic neighbour embedding (t-SNE), and the reduced data were then used to train deep learning and standard ML models. Transfer learning was also used to reduce computational costs and enhance generalisability when predicting mosquito ages from new populations. The findings indicated that while dimensionality reduction alone did not improve generalisability, it did reduce computational time. Transfer learning was crucial for achieving generalisable MIRS-ML models for mosquito age prediction, suggesting that combining it with dimensionality reduction can improve the efficiency, transferability, and dissemination of these models.
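As a rough illustration of the PCA step described above, the sketch below projects a matrix of spectra onto its leading principal components using a singular value decomposition. It is not the thesis code: the function name `pca_reduce`, the choice of 8 components, and the randomly generated 60 x 200 matrix standing in for real spectra are all assumptions made for this example.

```python
import numpy as np


def pca_reduce(spectra, n_components):
    """Project each row of `spectra` onto the top principal components.

    Hypothetical helper for illustration only; the thesis does not
    specify its implementation.
    """
    mean = spectra.mean(axis=0)
    centered = spectra - mean
    # SVD of the centered data: the rows of `vt` are the principal axes,
    # ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    scores = centered @ components.T
    # Fraction of total variance captured by each retained component.
    variance = singular_values ** 2
    explained = variance[:n_components] / variance.sum()
    return scores, components, mean, explained


# Synthetic stand-in for spectra (NOT real data):
# 60 samples, each with absorbance values at 200 wavenumbers.
rng = np.random.default_rng(0)
spectra = rng.normal(size=(60, 200))

scores, components, mean, explained = pca_reduce(spectra, n_components=8)
print(scores.shape)  # (60, 8)
```

The reduced `scores` matrix, rather than the full spectra, would then be passed to a downstream classifier, which is where the reduction in computational time reported above would come from.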
In objective 2 (Chapter 3), I focused on applying MIRS-ML to rapidly classify the epidemiologically relevant age categories of An. funestus. Spectral data were divided into two age categories: 1-9 days (young, non-infectious) and 10-16 days (old, potentially infectious). PCA was used to reduce dimensionality, and a set of standard ML models and a multi-layer perceptron (MLP) were trained to predict mosquito age categories. The results demonstrated the effectiveness of MIRS-ML in quickly classifying epidemiologically relevant age groups of An. funestus. Because the technique had previously been applied to Anopheles gambiae, Anopheles arabiensis and Anopheles coluzzii, this demonstration on An. funestus supports the potential of this low-cost, reagent-free approach for widespread use across all major Afro-tropical malaria vectors.
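The two-class labelling described above can be sketched as a simple binning of chronological age. The function name `age_category` and the example ages are hypothetical; only the 1-9 day and 10-16 day boundaries come from the text.

```python
def age_category(age_days):
    """Map age in days to the two classes used in Chapter 3.

    1-9 days   -> "young" (non-infectious)
    10-16 days -> "old"   (potentially infectious)
    Ages outside 1-16 days are rejected, since the abstract does not
    say how they were handled.
    """
    if 1 <= age_days <= 9:
        return "young"
    if 10 <= age_days <= 16:
        return "old"
    raise ValueError(f"age {age_days} is outside the 1-16 day range")


# Hypothetical ages for illustration, including both class boundaries.
ages = [1, 5, 9, 10, 16]
labels = [age_category(a) for a in ages]
print(labels)  # ['young', 'young', 'young', 'old', 'old']
```

Labels produced this way would serve as the training targets for the classifiers, with the PCA-reduced spectra as inputs.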
In objective 3 (Chapter 4), I demonstrated the first field application of MIRS-ML for assessing the blood-feeding histories of malaria vectors, with direct comparison to polymerase chain reaction (PCR) assays. After scanning mosquito samples on a spectrometer, blood meals were confirmed by PCR to establish the ‘ground truth’ for training ML models. Logistic regression and MLP models achieved over 88% accuracy in predicting mosquito blood meal sources, and the resulting human blood index (HBI) estimates closely matched the PCR-based standard HBI. This chapter provided evidence for the utility of MIRS-ML as a complementary surveillance tool in settings where conventional molecular techniques are impractical: its cost-effectiveness, simplicity, scalability, and generalisability outweigh the minor gaps in HBI estimation.
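To make the comparison above concrete, the following sketch computes the HBI (the proportion of identified blood meals taken from humans) from PCR-confirmed host labels and from model predictions, alongside the classification accuracy. The host labels, the "bovine" alternative host, and both helper functions are invented for illustration and do not reproduce the thesis data or results.

```python
def human_blood_index(hosts):
    """Proportion of blood meals whose identified host is human."""
    if not hosts:
        raise ValueError("at least one blood meal is required")
    return sum(host == "human" for host in hosts) / len(hosts)


def accuracy(reference, predicted):
    """Fraction of samples where the prediction matches the reference."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have equal length")
    matches = sum(r == p for r, p in zip(reference, predicted))
    return matches / len(reference)


# Hypothetical labels: PCR is treated as ground truth, `predicted`
# stands in for the output of a trained classifier.
pcr = ["human", "bovine", "human", "human",
       "bovine", "human", "bovine", "human"]
predicted = ["human", "bovine", "human", "bovine",
             "bovine", "human", "bovine", "human"]

print(accuracy(pcr, predicted))      # 0.875
print(human_blood_index(pcr))        # 0.625
print(human_blood_index(predicted))  # 0.5
```

In this toy case a single misclassified human blood meal lowers the predicted HBI below the PCR-based value, which shows how per-sample errors translate into the kind of gap in HBI estimation mentioned above.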
In objective 4 (Chapter 5), I demonstrated the first field application of MIRS-ML for rapid and accurate detection of Plasmodium sporozoites in wild-caught An. funestus mosquitoes without requiring laboratory reagents. Desiccated mosquito heads and thoraxes were scanned by MIRS, and sporozoite infections were confirmed by enzyme-linked immunosorbent assay (ELISA) and PCR to establish references for training ML models. The ML models predicted sporozoite-infected mosquito samples with ∼92% classification accuracy, highlighting the potential of MIRS-ML to enhance surveillance in malaria-endemic regions.
Building on the findings from objectives 1 to 4, Chapter 6 discusses key lessons learned from infrared-based entomological and parasitological studies and explores the future prospects for MIRS-ML in malaria surveillance. While significant advances have been made, challenges such as improving model generalisability across different environments and enhancing the interpretability of biochemical signals remain. Transfer learning can improve model performance, but no single approach fully addresses the variability of field samples. The broader implementation of MIRS-ML for malaria surveillance will require continuous data generation, model validation, and the development of deployment-ready systems, including the potential use of pooled samples, as current scanning is limited to individual mosquitoes.
In conclusion, this thesis demonstrates the potential of MIRS-ML for reagent-free assessments of key entomological indicators of malaria transmission in field settings, including mosquito age, blood-feeding histories, and Plasmodium infections. Another important advancement was the successful application of transfer learning and dimensionality reduction to improve model generalisability and computational efficiency across different mosquito populations. MIRS-ML achieved high accuracy in classifying epidemiologically relevant age groups, identifying blood meal sources, and detecting sporozoite-infected mosquitoes. While challenges such as data variability and model robustness remain, this research highlights the potential of the MIRS-ML approach as a powerful, reagent-free alternative to traditional surveillance methods. Future work should focus on optimising model performance and developing deployment-ready systems for multi-variable assessments in real-world settings, particularly in resource-limited, malaria-endemic regions.
Item Type: | Thesis (PhD) |
---|---|
Qualification Level: | Doctoral |
Additional Information: | Supported by funding from the Wellcome Trust fellowship, Gates Foundation, Medical Research Council (MRC), and the American Society of Tropical Medicine and Hygiene (ASTMH). |
Subjects: | R Medicine > RB Pathology |
Colleges/Schools: | College of Medical Veterinary and Life Sciences > School of Biodiversity, One Health & Veterinary Medicine |
Supervisor's Name: | Babayan, Dr. Simon, Okumu, Professor Fredros and Baldini, Dr. Francesco |
Date of Award: | 2024 |
Depositing User: | Theses Team |
Unique ID: | glathesis:2024-84824 |
Copyright: | Copyright of this thesis is held by the author. |
Date Deposited: | 20 Jan 2025 16:31 |
Last Modified: | 21 Jan 2025 09:39 |
Thesis DOI: | 10.5525/gla.thesis.84824 |
URI: | https://theses.gla.ac.uk/id/eprint/84824 |