Multimodal machine learning in medical screenings

Nguyen, Thuy Trinh (2023) Multimodal machine learning in medical screenings. MSc(R) thesis, University of Glasgow.

Full text available as:
[thumbnail of 2023NguyenMSc(R).pdf] PDF
Download (2MB)


The healthcare industry, with its high demand and standards, has long been considered a crucial area for technology-based innovation. However, the medical field often relies on experience-based evaluation. Limited resources, overloading capacity, and a lack of accessibility can hinder timely medical care and diagnosis delivery. In light of these challenges, automated medical screening as a decision-making aid is highly recommended. With the increasing availability of data and the need to explore the complementary effect among modalities, multimodal machine learning has emerged as a potential area of technology. Its impact has been witnessed across a wide range of domains, prompting the question of how far machine learning can be leveraged to automate processes in even more complex and high-risk sectors.

This paper delves into the realm of multimodal machine learning in the field of automated medical screening and evaluates the potential of this area of study in mental disorder detection, a highly important area of healthcare. First, we conduct a scoping review targeted at high-impact papers to highlight the trends and directions of multimodal machine learning in screening prevalent mental disorders such as depression, stress, and bipolar disorder. The review provides a comprehensive list of popular datasets and extensively studied modalities. The review also proposes an end-to-end pipeline for multimodal machine learning applications, covering essential steps from preprocessing, representation, and fusion, to modelling and evaluation. While cross-modality interaction has been considered a promising factor to leverage fusion among multimodalities, the number of existing multimodal fusion methods employing this mechanism is rather limited. This study investigates multimodal fusion in more detail through the proposal of Autofusion, an autoencoder-infused fusion technique that harnesses the cross-modality interaction among different modalities. The technique is evaluated on DementiaBank’s Pitt corpus to detect Alzheimer’s disease, leveraging the power of cross-modality interaction. Autofusion achieves a promising performance of 79.89% in accuracy, 83.85% in recall, 81.72% in precision, and 82.47% in F1. The technique consistently outperforms all unimodal methods by an average of 5.24% across all metrics. Our method consistently outperforms early fusion and late fusion. Especially against the late fusion hard-voting technique, our method outperforms by an average of 20% across all metrics. Further, empirical results show that the cross-modality interaction term enhances the model performance by 2-3% across metrics. This research highlights the promising impact of cross-modality interaction in multimodal machine learning and calls for further research to unlock its full potential.

Item Type: Thesis (MSc(R))
Qualification Level: Masters
Keywords: Multimodal machine learning, automated medical screening, mental disorder detection, Alzheimer’s disease detection.
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
Colleges/Schools: College of Science and Engineering > School of Computing Science
Supervisor's Name: Nguyen, Dr. Hoang D. and Deligianni, Dr. Fani
Date of Award: 2023
Depositing User: Theses Team
Unique ID: glathesis:2023-83883
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 11 Oct 2023 12:21
Last Modified: 26 Oct 2023 13:33
Thesis DOI: 10.5525/gla.thesis.83883

Actions (login required)

View Item View Item


Downloads per month over past year